Model:
TheBloke/orca_mini_7B-GPTQ
Chat & support: my new Discord server
Want to contribute? TheBloke's Patreon page
These files are GPTQ 4-bit model files for Pankaj Mathur's Orca Mini 7B.
It is the result of quantising to 4-bit using GPTQ-for-LLaMa.
Prompt template:

```
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
prompt

### Response:
```

or

```
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
prompt

### Input:
input

### Response:
```
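As a minimal sketch, the two templates above can be produced with a small formatting helper. The function name `make_prompt` is ours, not from the original repo; the same logic appears in the full `generate_text` example later in this card.

```python
# Minimal sketch: build an orca_mini prompt from the templates above.
# The helper name make_prompt is hypothetical.
def make_prompt(system, instruction, input=None):
    if input:
        return (f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
                f"### Input:\n{input}\n\n### Response:\n")
    return f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"
```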
Please make sure you're using the latest version of text-generation-webui.
First make sure you have AutoGPTQ installed:
```
pip install auto-gptq
```
Then try the following example code:
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/orca_mini_7B-GPTQ"
model_basename = "orca-mini-7b-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

# Build the prompt using this model's template (see "Prompt template" above).
prompt = "Tell me about AI"
prompt_template = f'''### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
{prompt}

### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline.
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ:
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
orca-mini-7b-GPTQ-4bit-128g.no-act.order.safetensors
This will work with AutoGPTQ, ExLlama, and the CUDA versions of GPTQ-for-LLaMa. There are reports of issues with GPTQ-for-LLaMa's Triton mode. If you have problems, please use AutoGPTQ instead.
It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
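For reference, here is a sketch of how these settings map onto AutoGPTQ's `BaseQuantizeConfig`. The authoritative values live in the repo's quantize_config.json; the code below simply restates what is described above.

```python
from auto_gptq import BaseQuantizeConfig

# Sketch of the quantisation settings described above; the authoritative
# values are in the repo's quantize_config.json.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantisation
    group_size=128,  # group_size 128, for inference accuracy
    desc_act=False,  # no --act-order / desc_act, for compatibility and speed
)
```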
For further support, and discussions on these models and AI in general, join us at:
Thanks to the chirper.ai team!
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
Special thanks to: 卡本托德·卡尔卡拉, 丽欧布朗, 真田省三, 球状翻译服务, Dmitriy Samsonov.
Patreon special mentions: Pyrater, WelcomeToTheClub, Kalila, Mano Prime, Trenton Dambrowitz, Spiking Neurons AB, Pierre Kircher, Fen Risland, Kevin Schuppel, Luke, Rainer Wilmers, vamX, Gabriel Puliatti, Alex, Karl Bernard, Ajan Kanaga, Talal Aujan, Space Cruiser, ya boyyy, biorpg, Johann-Peter Hartmann, Asp the Wyvern, Ai Maven, Ghost, Preetika Verma, Nikolai Manek, trip7s trip, John Detwiler, Fred von Graf, Artur Olbinski, subjectnull, John Villwock, Junyu Yang, Rod A, Lone Striker, Chris McCloskey, Iucharbius, Matthew Berman, Illia Dulskyi, Khalefa Al-Ahmad, Imad Khwaja, chris gileta, Willem Michiel, Greatston Gnanesh, Derek Yates, K, Alps Aficionado, Oscar Rangel, David Flickinger, Luke Pendergrass, Deep Realms, Eugene Pentland, Cory Kujawski, terasurfer, Jonathan Leane, senxiiz, Joseph William Delisle, Sean Connelly, webtim, zynix, Nathan LeClaire.
Thank you to all my generous patrons and donaters!
This is an OpenLLaMa-7B model trained on explain-tuned datasets, created using instructions and input from the WizardLM, Alpaca and Dolly-V2 datasets and applying the dataset construction approaches from the Orca Research Paper.
We built explain-tuned versions of the WizardLM dataset (~70K), Alpaca dataset (~52K) and Dolly-V2 dataset (~15K), using the 15 system instructions provided in the Orca Research Paper to generate custom datasets.
This helps the student model (i.e. this model) learn the thought process of the teacher model (ChatGPT, gpt-3.5-turbo-0301 version), unlike the vanilla instruction-tuning approaches used by the original datasets.
See the example usage below, which shows how a System prompt is added before each instruction.
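As a rough illustration only (the field names and content below are hypothetical, not the actual dataset schema), an explain-tuned record pairs one of the Orca-style system instructions with an original instruction and the teacher's detailed response:

```python
# Hypothetical explain-tuned training record (illustrative only).
# The system instruction is one of the 15 Orca-style prompts used to
# elicit detailed, step-by-step output from the teacher model.
record = {
    "system": ("You are an AI assistant. Provide a detailed answer so the "
               "user doesn't need to search outside to understand the answer."),
    "instruction": "Why does ice float on water?",
    "input": "",
    "response": "...detailed explanation generated by gpt-3.5-turbo-0301...",
}
```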
The training configuration is shown in the table below.
Training took around 7 hours on 8x A100 (80G) GPUs and cost $84, using Lambda Labs.
We used DeepSpeed with fully sharded data parallelism, also known as ZeRO stage 3, writing our own fine-tuning script and leveraging some of the model training code provided by the amazing OpenAlpaca repo.
Here are some of the parameters used during training (a sketch of a matching DeepSpeed config follows the table):
| Parameter | Value |
| --- | --- |
| batch_size | 32 |
| train_micro_batch_size_per_gpu | 2 |
| gradient_accumulation_steps | 2 |
| Learning rate | 2e-5 |
| Max length | 1024 |
| Epochs | 3 |
| Optimizer | AdamW |
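As a hedged sketch, a DeepSpeed configuration consistent with ZeRO stage 3 and the table above might look like the following; the exact config used for the run is not published here, so any field beyond the table values (e.g. fp16) is an assumption.

```python
# Hypothetical DeepSpeed config consistent with the table above.
# Effective batch size: 2 micro-batch x 2 accumulation steps x 8 GPUs = 32.
ds_config = {
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "zero_optimization": {"stage": 3},  # fully sharded data parallel (ZeRO-3)
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
    "fp16": {"enabled": True},          # assumption, not stated in the card
}
```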
Here is an example of how to use this model:
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Hugging Face model_path
model_path = 'psmathur/orca_mini_7b'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto',
)

# Generate text function
def generate_text(system, instruction, input=None):
    if input:
        prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
    else:
        prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

    tokens = tokenizer.encode(prompt)
    tokens = torch.LongTensor(tokens).unsqueeze(0)
    tokens = tokens.to('cuda')

    instance = {'input_ids': tokens, 'top_p': 1.0, 'temperature': 0.7,
                'generate_len': 1024, 'top_k': 50}

    length = len(tokens[0])
    with torch.no_grad():
        rest = model.generate(
            input_ids=tokens,
            max_length=length + instance['generate_len'],
            use_cache=True,
            do_sample=True,
            top_p=instance['top_p'],
            temperature=instance['temperature'],
            top_k=instance['top_k'],
        )
    output = rest[0][length:]
    string = tokenizer.decode(output, skip_special_tokens=True)
    return f'[!] Response: {string}'

# Sample test instruction used by YouTuber Sam Witteveen https://www.youtube.com/@samwitteveenai
system = 'You are an AI assistant that follows instruction extremely well. Help as much as you can.'
instruction = 'Write a letter to Sam Altman, CEO of OpenAI, requesting him to convert GPT4 a private model by OpenAI to an open source project'
print(generate_text(system, instruction))
```
[!] Response: Dear Sam Altman, I am writing to request that you convert the GPT4 private model developed by OpenAI to an open source project. As a user of OpenAI, I have been waiting for the day when I can use the advanced natural language processing capabilities of GPT4 in a more open and accessible way. While OpenAI has made significant progress in developing AI applications, it has primarily focused on building private models that are not accessible to the general public. However, with the recent release of GPT-3, there is a growing demand for more open and accessible AI tools. Converting GPT4 to an open source project would allow for greater transparency, collaboration, and innovation. It would also help to build trust in the technology and ensure that it is used ethically and responsibly. I urge you to consider converting GPT4 to an open source project. This would be a significant contribution to the AI community and would help to create a more open and accessible future. Thank you for your consideration. Sincerely, [Your Name]
Note: I am #opentowork and open to #collaboration; if you can help, please reach out to me at psmathur.public@gmail.com
Next goals:
Limitations and biases:
This model can produce factually incorrect output, and should not be relied on to produce factually accurate information. It was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
Disclaimer:
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Citation:
If you found wizardlm_alpaca_dolly_orca_open_llama_7b useful in your research or applications, please cite using the following BibTeX:
@misc{wizardlm_alpaca_dolly_orca_open_llama_7b,
  author = {Pankaj Mathur},
  title = {wizardlm_alpaca_dolly_orca_open_llama_7b: An explain tuned OpenLLaMA-7b model on custom wizardlm, alpaca, & dolly datasets},
  year = {2023},
  publisher = {GitHub, HuggingFace},
  journal = {GitHub repository, HuggingFace repository},
  howpublished = {\url{https://github.com/pankajarm/wizardlm_alpaca_dolly_orca_open_llama_7b}, \url{https://huggingface.co/psmathur/wizardlm_alpaca_dolly_orca_open_llama_7b}},
}
@software{openlm2023openllama,
  author = {Xinyang Geng and Hao Liu},
  title = {OpenLLaMA: An Open Reproduction of LLaMA},
  month = {May},
  year = {2023},
  url = {https://github.com/openlm-research/open_llama}
}
@misc{openalpaca,
  author = {Yixuan Su and Tian Lan and Deng Cai},
  title = {OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/yxuansu/OpenAlpaca}},
}
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}