Model:

TheBloke/falcon-7b-instruct-GPTQ

Task:

Text generation

Library:

Transformers

Datasets:

tiiuae/falcon-refinedweb

Language:

Other:

RefinedWebModel custom_code text-generation-inference

Preprints:

arxiv:2205.14135 arxiv:1911.02150 arxiv:2005.14165 arxiv:2104.09864

License:

apache-2.0


Chat & support: my new Discord server

Want to contribute? TheBloke's Patreon page

Falcon-7B-Instruct GPTQ

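This repo contains an experimental 4-bit GPTQ model of Falcon-7B-Instruct.

It is the result of quantising the model to 4-bit using AutoGPTQ.

Performance

Please note that inference with this GPTQ model is currently very slow.

Better performance may be possible with the latest GPTQ-for-LLaMa code, but I have not tested it myself.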

Prompt template

A helpful assistant who helps the user with any questions asked.
User: prompt
Assistant:
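A minimal sketch of filling in this template from Python. `SYSTEM_LINE` and `build_prompt` are hypothetical names used only for illustration; the string they produce is the same three-line template shown here, with the model expected to continue after "Assistant:".

```python
# Hypothetical helper for the prompt template above.
SYSTEM_LINE = "A helpful assistant who helps the user with any questions asked."


def build_prompt(user_message: str) -> str:
    """Fill the single-turn template; the model continues after 'Assistant:'."""
    return f"{SYSTEM_LINE}\nUser: {user_message}\nAssistant:"


if __name__ == "__main__":
    print(build_prompt("Tell me about AI"))
```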

AutoGPTQ

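AutoGPTQ is required: GITHUB_ACTIONS=true pip install auto-gptq

AutoGPTQ provides pre-compiled wheels for Windows and Linux, built against CUDA toolkit 11.7 or 11.8.

If you are running CUDA toolkit 12.x, you will need to compile AutoGPTQ yourself, as follows: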

git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .

These manual steps require the Nvidia CUDA toolkit to be installed.
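The wheel-versus-source decision above can be written as a small check. This is a sketch that encodes only what this card states (prebuilt wheels for CUDA 11.7 and 11.8); `needs_source_build` is a hypothetical helper, and newer AutoGPTQ releases may ship wheels for other CUDA versions.

```python
def needs_source_build(cuda_version: str) -> bool:
    """Return True if the card lists no prebuilt AutoGPTQ wheel for this CUDA toolkit.

    Assumption: only CUDA 11.7 and 11.8 have wheels, as stated in the card;
    every other version (including 12.x) is treated as needing a source build.
    """
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) not in {(11, 7), (11, 8)}


if __name__ == "__main__":
    for version in ("11.8", "12.1"):
        print(version, "-> build from source" if needs_source_build(version) else "-> use wheel")
```

Pass in the version of your installed CUDA toolkit, for example as reported by `nvcc --version`.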

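How to download and use this model in text-generation-webui

Launch text-generation-webui.

Click the "Model" tab.

Untick "Autoload model".

Under "Download custom model or LoRA", enter "TheBloke/falcon-7B-instruct-GPTQ".

Click "Download".

Wait until it says the download has finished.

Click the "Refresh" icon next to "Model" in the top left.

In the "Model" dropdown, choose the model you just downloaded: "falcon-7B-instruct-GPTQ".

Set "Loader" to "AutoGPTQ". This model will not work with ExLlama. It might work with the latest GPTQ-for-LLaMa, but I have not tested that.

Tick "Trust Remote Code", then click "Save settings".

Click "Reload".

Once it says the model is loaded, click the "Text Generation" tab and enter a prompt!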

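About trust_remote_code

Please be aware that this command line argument causes Python code provided by Falcon to be executed on your machine.

This code is currently required because Falcon is too new to be supported natively by Hugging Face transformers. At some point in the future transformers will support the model natively, and trust_remote_code will then no longer be needed.

In this repo you can see two .py files: these are the files that get executed. They are copied from the base repo at Falcon-7B-Instruct.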
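Before enabling trust_remote_code, it can be worth listing exactly which Python files a repo ships so you can read them first. A minimal sketch; `remote_code_files` is a hypothetical helper that simply filters a list of repo filenames.

```python
from typing import Iterable, List


def remote_code_files(repo_files: Iterable[str]) -> List[str]:
    """Return the .py files in a repo listing, sorted, for manual review.

    Deliberately conservative: every .py file is flagged, although only the
    modules referenced by the model config (and their imports) are loaded.
    """
    return sorted(name for name in repo_files if name.endswith(".py"))


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    print(remote_code_files(["config.json", "b_model.py", "a_config.py", "README.md"]))
```

In practice the filename list can come from `huggingface_hub.list_repo_files("TheBloke/falcon-7b-instruct-GPTQ")`, which returns the paths stored in a model repo.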

Simple Python example code

To run this code you need to install AutoGPTQ and einops:

GITHUB_ACTIONS=true pip install auto-gptq
pip install einops

Then you can run this example code:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/falcon-7b-instruct-GPTQ"
# You could also download the model locally, and access it there
# model_name_or_path = "/path/to/TheBloke_falcon-7b-instruct-GPTQ"

model_basename = "gptq_model-4bit-64g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
prompt_template=f'''A helpful assistant who helps the user with any questions asked.
User: {prompt}
Assistant:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Note that if you use pipeline, you will see a spurious error message saying the model type is not supported
# This can be ignored!  Or you can hide it with the following logging line:
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    do_sample=True,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
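To make the sampling arguments in the example above less opaque, here is a simplified NumPy sketch of what temperature, top_p and repetition_penalty do to a single next-token distribution. It is an illustration, not the transformers implementation, and `sample_next_token` is a hypothetical function. In transformers these settings only take effect when sampling is enabled (do_sample=True).

```python
import numpy as np


def sample_next_token(logits, generated_ids, temperature=0.7, top_p=0.95,
                      repetition_penalty=1.15, rng=None):
    """Simplified temperature + top-p + repetition-penalty sampling for one step."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(logits, dtype=np.float64).copy()

    # Repetition penalty: make already-generated tokens less likely
    # (positive logits are divided, negative logits multiplied).
    for tok in set(generated_ids):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty

    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scores = scores / temperature
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()

    # Top-p (nucleus): keep the smallest set of most likely tokens whose
    # cumulative probability reaches top_p, then renormalise.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()
    return int(rng.choice(len(filtered), p=filtered))
```

For example, `sample_next_token([5.0, 5.0], [0], top_p=0.5)` always returns 1: the penalty lowers token 0, and the nucleus then contains only token 1.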

Provided files

gptq_model-4bit-64g.safetensors

This will work with AutoGPTQ 0.2.0 and later.

It was created with groupsize 64 to give higher inference quality, and without desc_act (act-order) to increase inference speed.
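To illustrate what "groupsize 64" means: each run of 64 consecutive weights gets its own scale and zero point, so the 4-bit codes only have to cover that group's range. The sketch below uses plain round-to-nearest asymmetric quantisation, which is a swapped-in simplification and not GPTQ's error-compensating algorithm; `quantize_groupwise` and `dequantize_groupwise` are hypothetical names.

```python
import numpy as np


def quantize_groupwise(weights, bits=4, group_size=64):
    """Round-to-nearest asymmetric quantisation, one scale/zero point per group.

    Assumes len(weights) is a multiple of group_size. GPTQ itself also adjusts
    the not-yet-quantised weights to compensate for rounding error; not done here.
    """
    w = np.asarray(weights, dtype=np.float32).reshape(-1, group_size)
    qmax = 2 ** bits - 1                         # 15 for 4-bit codes
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    zero = np.round(-w_min / scale)              # integer code that stands for 0.0
    q = np.clip(np.round(w / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero


def dequantize_groupwise(q, scale, zero):
    """Map the integer codes back to approximate float weights."""
    return ((q.astype(np.float32) - zero) * scale).reshape(-1)


if __name__ == "__main__":
    w = np.random.default_rng(0).normal(size=256).astype(np.float32)
    q, scale, zero = quantize_groupwise(w)       # 256 weights -> 4 groups of 64
    err = np.abs(dequantize_groupwise(q, scale, zero) - w).max()
    print(f"groups: {q.shape[0]}, max abs error: {err:.4f}")
```

Smaller groups give each scale a narrower range to cover, which lowers rounding error at the cost of storing more scales and zero points. act-order (desc_act) changes the order in which GPTQ processes columns and is not modelled here.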

gptq_model-4bit-64g.safetensors
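- Works with AutoGPTQ CUDA 0.2.0 and later.
  - At this time it does not work with AutoGPTQ Triton, but support will hopefully be added in future.
- Works with text-generation-webui using --trust-remote-code
- Does not work with any version of GPTQ-for-LLaMa
- Parameters: Groupsize = 64. No act-order.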

Discord

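For further support, and discussions on these models and AI in general, join us at: TheBloke AI's Discord server

Thanks, and how to contribute.

Thanks to the chirper.ai team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine-tuning/training.

If you're able and willing to contribute, it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.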

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Luke from CarbonQuill, Aemon Algiz.

Patreon special mentions: RoA, Lone Striker, Gabriel Puliatti, Derek Yates, Randy H, Jonathan Leane, Eugene Pentland, Karl Bernard, Viktor Bowallius, senxiiz, Daniel P. Andersen, Pierre Kircher, Deep Realms, Cory Kujawski, Oscar Rangel, Fen Risland, Ajan Kanaga, LangChain4j, webtim, Nikolai Manek, Trenton Dambrowitz, Raven Klaugh, Kalila, Khalefa Al-Ahmad, Chris McCloskey, Luke @flexchar, Ai Maven, Dave, Asp the Wyvern, Sean Connelly, Imad Khwaja, Space Cruiser, Rainer Wilmers, subjectnull, Alps Aficionado, Willian Hasse, Fred von Graf, Artur Olbinski, Johann-Peter Hartmann, WelcomeToTheClub, Willem Michiel, Michael Levine, Iucharbius, Spiking Neurons AB, K, biorpg, John Villwock, Pyrater, Greatston Gnanesh, Mano Prime, Junyu Yang, Stephen Murray, John Detwiler, Luke Pendergrass, terasurfer, Pieter, zynix, Edmond Seymore, theTransient, Nathan LeClaire, vamX, Kevin Schuppel, Preetika Verma, ya boyyy, Alex, SuperWojo, Ghost, Joseph William Delisle, Matthew Berman, Talal Aujan, chris gileta, Illia Dulskyi.

Thank you to all my generous patrons and donors!

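✨ Original model card: Falcon-7B-Instruct

Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets. It is made available under the TII Falcon LLM License.

Paper coming soon 😊.

Why use Falcon-7B-Instruct?

You are looking for a ready-to-use chat/instruct model based on Falcon-7B.
Falcon-7B is a strong base model, outperforming comparable open-source models (e.g., MPT-7B, StableLM, RedPajama, etc.), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. See the OpenLLM Leaderboard for details.
It features an architecture optimized for inference, with FlashAttention (Dao et al., 2022) and multiquery attention (Shazeer et al., 2019).

💬 This is an instruct model, which may not be ideal for further fine-tuning. If you are interested in building your own instruct/chat model, we recommend starting from Falcon-7B.

🔥 Looking for an even more powerful model? Falcon-40B-Instruct is Falcon-7B-Instruct's big brother!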

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

💥 Falcon LLMs require PyTorch 2.0 for use with transformers!
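A quick way to check the PyTorch 2.0 requirement before loading the model. This is a sketch; `meets_min_torch` is a hypothetical helper that compares only the major and minor version numbers and ignores suffixes such as "+cu118".

```python
import re


def meets_min_torch(version: str, minimum=(2, 0)) -> bool:
    """Return True if a version string like '2.0.1+cu118' is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


if __name__ == "__main__":
    print(meets_min_torch("2.0.1+cu118"), meets_min_torch("1.13.1"))
```

With PyTorch installed, call it as `meets_min_torch(torch.__version__)`.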

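Model Card for Falcon-7B-Instruct

Model Details

Model Description

Developed by: https://www.tii.ae;
Model type: Causal decoder-only;
Language(s) (NLP): English and French;
License: TII Falcon LLM License;
Fine-tuned from model: Falcon-7B.

Model Source

Paper: coming soon.

Uses

Direct Use

Falcon-7B-Instruct has been fine-tuned on a mixture of instruct and chat datasets.

Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

Bias, Risks, and Limitations

Falcon-7B-Instruct is mostly trained on English data, and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

Recommendations

We recommend that users of Falcon-7B-Instruct develop guardrails and take appropriate precautions for any production use.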

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Training Details

Training Data

Falcon-7B-Instruct was fine-tuned on a 250M-token mixture of instruct/chat datasets.

Data source	Fraction	Tokens	Description
Bai ze	65%	164M	chat
GPT4All	25%	62M	instruct
GPTeacher	5%	11M	instruct
RefinedWeb-English	5%	13M	massive web crawl

The data was tokenized with the Falcon-7B/40B tokenizer.
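As a sanity check on the mixture table, the token counts can be compared with the listed fractions. This sketch only uses the numbers in the table; it shows that the four sources sum to the stated 250M tokens and that each listed fraction is within about one percentage point of its token share, so the fractions appear to be rounded.

```python
# Token counts (millions) and listed fractions, in table order.
tokens_m = [164, 62, 11, 13]
listed_fraction = [0.65, 0.25, 0.05, 0.05]

total_m = sum(tokens_m)                            # should match the stated 250M
computed = [t / total_m for t in tokens_m]         # share of tokens per source
max_gap = max(abs(c - f) for c, f in zip(computed, listed_fraction))

if __name__ == "__main__":
    print(f"total = {total_m}M tokens, largest fraction gap = {max_gap:.3f}")
```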

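Evaluation

Paper coming soon.

See the OpenLLM Leaderboard for early results.

Note that this model variant is not optimized for NLP benchmarks.

Technical Specifications

For more information about pretraining, see Falcon-7B.

Model Architecture and Objective

Falcon-7B is a causal decoder-only model trained on a causal language modeling task (i.e., predicting the next token).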
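The next-token objective is a cross-entropy between the logits at position t and the actual token at position t+1. A minimal NumPy sketch for one sequence; `causal_lm_loss` is a hypothetical name, and real training code batches this, masks padding, and works over the full 65,024-token vocabulary.

```python
import numpy as np


def causal_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy for one sequence.

    logits: shape (seq_len, vocab_size); row t scores the token at t + 1.
    token_ids: shape (seq_len,), the input tokens.
    """
    logits = np.asarray(logits, dtype=np.float64)
    token_ids = np.asarray(token_ids)
    # Shift: predictions at positions 0..T-2 are scored against tokens 1..T-1.
    pred = logits[:-1]
    target = token_ids[1:]
    # Numerically stable log-softmax over the vocabulary.
    pred = pred - pred.max(axis=-1, keepdims=True)
    log_probs = pred - np.log(np.exp(pred).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(target)), target].mean())
```

With all-zero (uniform) logits over a vocabulary of 10, the loss is log(10), the cost of guessing blindly.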

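The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with the following differences:

Positional embeddings: rotary (Su et al., 2021);
Attention: multiquery (Shazeer et al., 2019) and FlashAttention (Dao et al., 2022);
Decoder block: parallel attention/MLP with a single layer norm.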
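For the rotary positional embeddings listed above, the sketch below shows one common formulation (the "rotate-half" pairing used by several open implementations). It is illustrative and not claimed to match Falcon's code line for line; `rotary_embed` is a hypothetical name and base=10000 is the usual default from Su et al., assumed here.

```python
import numpy as np


def rotary_embed(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, head_dim).

    Dimension i is paired with i + head_dim/2, and each pair is rotated by
    the angle position * base**(-2i/head_dim).
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-np.arange(half) * 2.0 / head_dim)     # (half,)
    angles = np.outer(np.arange(seq_len), inv_freq)            # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each pair is rotated by an angle proportional to position, the dot product between a rotated query at position m and a rotated key at position n depends only on m - n, which is how attention scores pick up relative position.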

Hyperparameter	Value	Comment
Layers	32
d_model	4544	Increased to compensate for multiquery
head_dim	64	Reduced to optimise for FlashAttention
Vocabulary	65024
Sequence length	2048
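Multiquery attention is one reason the architecture is described as optimized for inference: it shrinks the key/value cache kept during generation. The sketch below estimates that cache from the table, assuming 71 query heads (d_model / head_dim), a single shared K/V head, and 16-bit cache values. The multi-head figure is a hypothetical comparison, not a Falcon configuration.

```python
# Falcon-7B hyperparameters from the table above.
N_LAYERS = 32
D_MODEL = 4544
HEAD_DIM = 64
SEQ_LEN = 2048
N_QUERY_HEADS = D_MODEL // HEAD_DIM  # 4544 / 64 = 71


def kv_cache_bytes_per_token(n_kv_heads, n_layers=N_LAYERS,
                             head_dim=HEAD_DIM, bytes_per_value=2):
    """Key + value cache size for one token, assuming 16-bit (fp16/bf16) values."""
    # Factor 2: every layer stores one key vector and one value vector per K/V head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Hypothetical standard multi-head layout: one K/V head per query head.
mha_per_token = kv_cache_bytes_per_token(N_QUERY_HEADS)
# Multiquery: all query heads share a single K/V head.
mqa_per_token = kv_cache_bytes_per_token(1)

if __name__ == "__main__":
    for name, size in (("multi-head", mha_per_token), ("multiquery", mqa_per_token)):
        print(f"{name}: {size:,} bytes/token, "
              f"{size * SEQ_LEN / 2**20:.0f} MiB per {SEQ_LEN}-token sequence")
```

With these numbers the multiquery cache is 71 times smaller: 8,192 bytes per token (16 MiB for a full 2048-token sequence) versus 581,632 bytes per token (1,136 MiB) for the hypothetical multi-head layout.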

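Compute Infrastructure

Hardware

Falcon-7B-Instruct was trained on AWS SageMaker, on 32 A100 40GB GPUs in P4d instances.

Software

Falcon-7B-Instruct was trained on a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.).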

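Citation

Paper coming soon 😊.

License

Falcon-7B-Instruct is made available under the TII Falcon LLM License. Broadly speaking:

You can freely use our models for research and/or personal purposes;
You are allowed to share and build derivatives of these models, but you are required to give attribution and to share alike under the same license;
For commercial use, you are exempt from royalty payments if the attributable revenue is below $1M per year; otherwise you should enter into a commercial agreement with TII.

Contact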

falconllm@tii.ae

Author:

Tom Jobbins

Dataset size:

5.54 GB