Model:
TheBloke/orca_mini_13B-GPTQ
Chat & support: my new Discord server
Want to contribute? TheBloke's Patreon page
These files are GPTQ 4-bit model files for Pankaj Mathur's Orca Mini 13B.
It is the result of quantising to 4-bit using GPTQ-for-LLaMa.
Prompt template:

```
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
prompt

### Response:
```

or, with an input:

```
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
prompt

### Input:
input

### Response:
```
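For clarity, here is a minimal sketch of filling this template in plain Python; the helper name and default system message are illustrative, not part of the model card:

```python
# Hypothetical helper: fills the orca_mini prompt template shown above.
def make_prompt(prompt,
                system="You are an AI assistant that follows instruction extremely well. Help as much as you can.",
                input=None):
    if input:
        # Second template variant, with an ### Input: section.
        return f"### System:\n{system}\n\n### User:\n{prompt}\n\n### Input:\n{input}\n\n### Response:\n"
    return f"### System:\n{system}\n\n### User:\n{prompt}\n\n### Response:\n"
```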
Please make sure you're using the latest version of text-generation-webui.
It is strongly recommended to use the text-generation-webui one-click installers unless you know how to make a manual install.
First make sure you have AutoGPTQ installed:

pip install auto-gptq
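Optionally, a quick sanity check that the install worked; this assumes a CUDA-capable PyTorch build, and the version attribute is standard for AutoGPTQ releases but worth verifying on your install:

```python
import torch
import auto_gptq

print(auto_gptq.__version__)      # confirms AutoGPTQ imports and reports its version
print(torch.cuda.is_available())  # the example below loads the model onto cuda:0
```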
Then try the following example code:
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/orca_mini_13B-GPTQ"
model_basename = "orca-mini-13b-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

# Prompt template for orca_mini models (see "Prompt template" above).
system = "You are an AI assistant that follows instruction extremely well. Help as much as you can."
prompt = "Tell me about AI"
prompt_template = f'''### System:
{system}

### User:
{prompt}

### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline.
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ.
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
orca-mini-13b-GPTQ-4bit-128g.no-act.order.safetensors
This will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with recent GPTQ-for-LLaMa in Triton mode; if you have problems, please use AutoGPTQ instead.
It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
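For reference, those settings correspond to the following AutoGPTQ quantisation config. This is illustrative only; the provided file already embeds its own config, which is why the loading example above passes quantize_config=None:

```python
from auto_gptq import BaseQuantizeConfig

# Illustrative: the quantisation settings described above.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # group_size 128 for higher inference accuracy
    desc_act=False,  # --act-order disabled for compatibility and speed
)
```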
For further support, and discussions on these models and AI in general, join us at:
Thanks to the chirper.ai team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
Patreon special mentions: Pyrater, WelcomeToTheClub, Kalila, Mano Prime, Trenton Dambrowitz, Spiking Neurons AB, Pierre Kircher, Fen Risland, Kevin Schuppel, Luke, Rainer Wilmers, vamX, Gabriel Puliatti, Alex, Karl Bernard, Ajan Kanaga, Talal Aujan, Space Cruiser, ya boyyy, biorpg, Johann-Peter Hartmann, Asp the Wyvern, Ai Maven, Ghost, Preetika Verma, Nikolai Manek, trip7s trip, John Detwiler, Fred von Graf, Artur Olbinski, subjectnull, John Villwock, Junyu Yang, Rod A, Lone Striker, Chris McCloskey, Iucharbius, Matthew Berman, Illia Dulskyi, Khalefa Al-Ahmad, Imad Khwaja, chris gileta, Willem Michiel, Greatston Gnanesh, Derek Yates, K, Alps Aficionado, Oscar Rangel, David Flickinger, Luke Pendergrass, Deep Realms, Eugene Pentland, Cory Kujawski, terasurfer, Jonathan Leane, senxiiz, Joseph William Delisle, Sean Connelly, webtim, zynix, Nathan LeClaire.
Thank you to all my generous patrons and donaters!
An OpenLLaMA-13B model trained on explain-tuned datasets, created using instructions and inputs from the WizardLM, Alpaca and Dolly-V2 datasets and applying the dataset construction approaches from the Orca Research Paper.
We built explain-tuned versions of the WizardLM dataset (~70K), Alpaca dataset (~52K) and Dolly-V2 dataset (~15K) using the approaches from the Orca Research Paper.
We leveraged all 15 system instructions provided in the Orca Research Paper to generate these custom datasets, in contrast to the vanilla instruction-tuning approach used by the original datasets (a sketch of this idea follows below).
This helps the student model (i.e. this model) learn the thought process from the teacher model, ChatGPT (the gpt-3.5-turbo-0301 version).
Please see the example usage further below, which shows how the system prompt is added before each instruction.
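A rough sketch of that construction idea; the record fields and everything except the one system instruction quoted in this card are placeholders, not the actual dataset code:

```python
import random

# Only the system instruction quoted in this card is shown; the Orca Research
# Paper provides 15 in total.
ORCA_SYSTEM_INSTRUCTIONS = [
    "You are an AI assistant that follows instruction extremely well. Help as much as you can.",
    # ... remaining 14 system instructions from the Orca Research Paper ...
]

def to_explain_tuned_example(record):
    # Pair each WizardLM/Alpaca/Dolly-V2 record with a sampled system
    # instruction, rather than vanilla instruction tuning with no system prompt.
    return {
        "system": random.choice(ORCA_SYSTEM_INSTRUCTIONS),
        "instruction": record["instruction"],
        "input": record.get("input"),
        "response": record["output"],
    }
```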
The training configurations are given in the table below.
Training took around 15 hours on 8x A100 (80G) GPUs and cost $180 using Lambda Labs.
We used DeepSpeed with fully sharded data parallelism, writing our own fine-tuning scripts and leveraging some of the model training code provided by the amazing OpenAlpaca repo.
Here are some of the parameters used during training:
| Parameter | Value |
|---|---|
| batch_size | 16 |
| train_micro_batch_size_per_gpu | 2 |
| gradient_accumulation_steps | 1 |
| Learning rate | 2e-5 |
| Max length | 1024 |
| Epochs | 3 |
| Optimizer | AdamW |
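To make the table concrete, here is a sketch of how these values might map onto a DeepSpeed config; this is an assumption for illustration, not the authors' actual configuration (ZeRO stage 3 matches the fully sharded data parallelism mentioned above, and fp16 is a guess):

```python
# Hypothetical DeepSpeed config dict reflecting the table above.
ds_config = {
    "train_batch_size": 16,              # batch_size: 8 GPUs x 2 micro-batch x 1 accumulation step
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 1,
    "zero_optimization": {"stage": 3},   # fully sharded data parallelism
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5},          # learning rate from the table
    },
    "fp16": {"enabled": True},           # assumption: mixed precision on A100s
}
```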
Here is an example showing how to use this model:
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Hugging Face model_path
model_path = 'psmathur/orca_mini_13b'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto',
)

# generate text function
def generate_text(system, instruction, input=None):
    if input:
        prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
    else:
        prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

    tokens = tokenizer.encode(prompt)
    tokens = torch.LongTensor(tokens).unsqueeze(0)
    tokens = tokens.to('cuda')

    instance = {'input_ids': tokens, 'top_p': 1.0, 'temperature': 0.7, 'generate_len': 1024, 'top_k': 50}

    length = len(tokens[0])
    with torch.no_grad():
        rest = model.generate(
            input_ids=tokens,
            max_length=length + instance['generate_len'],
            use_cache=True,
            do_sample=True,
            top_p=instance['top_p'],
            temperature=instance['temperature'],
            top_k=instance['top_k'],
        )
    # Decode only the newly generated tokens, skipping the prompt.
    output = rest[0][length:]
    string = tokenizer.decode(output, skip_special_tokens=True)
    return f'[!] Response: {string}'

# Sample test instruction used by Youtuber Sam Witteveen https://www.youtube.com/@samwitteveenai
system = 'You are an AI assistant that follows instruction extremely well. Help as much as you can.'
instruction = 'Write a letter to Sam Altman, CEO of OpenAI, requesting him to convert GPT4 a private model by OpenAI to an open source project'
print(generate_text(system, instruction))
```
[!] Response: Dear Sam Altman, I am writing to request that you convert the GPT4 private model developed by OpenAI to an open source project. As a user of OpenAI, I have been waiting for the day when I can use the advanced natural language processing capabilities of GPT4 in a more open and accessible way. While OpenAI has made significant progress in developing AI applications, it has primarily focused on building private models that are not accessible to the general public. However, with the recent release of GPT-3, there is a growing demand for more open and accessible AI tools. Converting GPT4 to an open source project would allow for greater transparency, collaboration, and innovation. It would also help to build trust in the technology and ensure that it is used ethically and responsibly. I urge you to consider converting GPT4 to an open source project. This would be a significant contribution to the AI community and would help to create a more open and accessible future. Thank you for your consideration. Sincerely, [Your Name]
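A hypothetical follow-up call showing the optional input argument of generate_text; the instruction and input text here are made up for illustration:

```python
print(generate_text(
    system='You are an AI assistant that follows instruction extremely well. Help as much as you can.',
    instruction='Summarise the following text in one sentence.',
    input='Orca Mini 13B is an OpenLLaMA-13B model fine-tuned on explain-tuned datasets built from WizardLM, Alpaca and Dolly-V2.',
))
```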
P.S. I am #opentowork and #collaboration; if you can help, please reach out to me at psmathur.public@gmail.com
Next goals:
Limitations & Biases:
This model can produce factually incorrect output and should not be relied on to produce factually accurate information. It was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.
Disclaimer:
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model commercially. Please consult an attorney before using this model for commercial purposes.
Citation:
If you found wizardlm_alpaca_dolly_orca_open_llama_13b useful in your research or applications, please kindly cite it using the following BibTeX:
@misc{wizardlm_alpaca_dolly_orca_open_llama_13b,
  author = {Pankaj Mathur},
  title = {wizardlm_alpaca_dolly_orca_open_llama_13b: An explain tuned OpenLLaMA-13b model on custom wizardlm, alpaca, & dolly datasets},
  year = {2023},
  publisher = {GitHub, HuggingFace},
  journal = {GitHub repository, HuggingFace repository},
  howpublished = {\url{https://github.com/pankajarm/wizardlm_alpaca_dolly_orca_open_llama_13b}, \url{https://huggingface.co/psmathur/wizardlm_alpaca_dolly_orca_open_llama_13b}},
}
@software{openlm2023openllama,
  author = {Xinyang Geng and Hao Liu},
  title = {OpenLLaMA: An Open Reproduction of LLaMA},
  month = {May},
  year = {2023},
  url = {https://github.com/openlm-research/open_llama}
}

@misc{openalpaca,
  author = {Yixuan Su and Tian Lan and Deng Cai},
  title = {OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/yxuansu/OpenAlpaca}},
}

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}