Model:

TheBloke/baichuan-7B-GPTQ

License:

other

arXiv:

arxiv:2009.03300 arxiv:1910.07467

Other:

custom_code baichuan

Task:

Text Generation

Libraries:

Transformers


Chat & support: my new Discord server

Want to contribute? TheBloke's Patreon page

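Baichuan Inc's Baichuan 7B GPTQ

These files are GPTQ 4-bit model files for Baichuan Inc's Baichuan 7B.

They are the result of quantising the model to 4-bit using AutoGPTQ.

Repositories available

Experimental first GPTQ, requires the latest AutoGPTQ code

This is the first quantisation of a brand-new model type.

It works only with AutoGPTQ, and only with the latest version of AutoGPTQ compiled from source.

To install the latest AutoGPTQ from source, follow these steps: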

Linux:

pip uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
GITHUB_ACTIONS=true pip install .

Windows (command prompt):

pip uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
set GITHUB_ACTIONS=true
pip install .
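After running the install steps above, it can help to confirm which AutoGPTQ build is actually present in the active environment. The following is a minimal sketch using only the Python standard library. The distribution name auto-gptq is taken from the pip uninstall command above; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "auto-gptq" is the distribution name used by the pip commands above.
    version = installed_version("auto-gptq")
    print("auto-gptq:", version if version is not None else "not installed")
```

If this prints "not installed", the source build was probably installed into a different environment than the one running Python.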

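Trust Remote Code

Because this is a new model type that is not yet supported by Transformers, you must run inference with Trust Remote Code enabled.

In text-generation-webui, you can do this by ticking "Trust Remote Code" in the UI, or by passing --trust-remote-code on the command line.

In Python code, pass trust_remote_code=True to both the AutoTokenizer.from_pretrained() and AutoGPTQForCausalLM.from_quantized() calls.

Prompt template

A general prompt template is not known at this time.

The example given in the README is a 1-shot categorisation: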

Hamlet->Shakespeare\nOne Hundred Years of Solitude->
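The 1-shot pattern above is just "work->author" pairs joined by newlines, with the final pair left open for the model to complete. A small sketch of building such a prompt programmatically follows. The helper name build_one_shot_prompt is hypothetical, and the "->" separator is simply copied from the README example rather than being a documented template.

```python
def build_one_shot_prompt(examples, query, separator="->"):
    """Join solved (work, author) pairs, then leave the final query open."""
    lines = [f"{work}{separator}{author}" for work, author in examples]
    lines.append(f"{query}{separator}")
    return "\n".join(lines)


prompt = build_one_shot_prompt(
    [("Hamlet", "Shakespeare")],
    "One Hundred Years of Solitude",
)
print(prompt)
```

Passing more than one solved pair turns the same helper into a few-shot prompt builder.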

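How to easily download and use this model in text-generation-webui

Please make sure you are using the latest version of text-generation-webui.

Click the "Model" tab.

Untick "Autoload".

Under "Download custom model or LoRA", enter "TheBloke/baichuan-7B-GPTQ".

Click "Download".

The model will start downloading. Once it has finished, it will say "Done".

Choose the AutoGPTQ loader.

In the top left, click the refresh icon next to "Model".

In the "Model" dropdown, choose the model you just downloaded: baichuan-7B-GPTQ

Tick "Trust Remote Code", then click "Save Settings", followed by "Reload".

The model will load automatically and is then ready for use.

Once you are ready, click the "Text Generation" tab and enter a prompt to get started.

How to use this GPTQ model from Python code

First make sure you have installed the latest AutoGPTQ from source, as described above.

Then try the following example code: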

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = 'TheBloke/baichuan-7B-GPTQ'
# Or you can clone the model locally and reference it on disk, eg with:
# model_name_or_path = "/path/to/TheBloke_baichuan-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        device_map="auto",
        trust_remote_code=True)

# This is the example from the Baichuan README
inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

# Here's my own example, which sometimes kind of works.
inputs = tokenizer('USER:Write a story about llamas\nASSISTANT:', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=500,repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

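Provided files

gptq_model-4bit-128g.safetensors

This will work only with the latest AutoGPTQ, compiled from source.

gptq_model-4bit-128g.safetensors
- Works only with the latest AutoGPTQ, compiled from source.
- Requires trust_remote_code.
- Works with text-generation-webui, but not yet with the one-click installers unless AutoGPTQ is manually recompiled.
- Parameters: Groupsize = 128. Act Order / desc_act = False.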
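To illustrate what "4-bit" and "Groupsize = 128" mean for the stored weights, here is a minimal pure-Python sketch of group-wise round-to-nearest quantisation. This is not the GPTQ algorithm itself, which additionally compensates for quantisation error using calibration data, and it does not reproduce the AutoGPTQ file format. The function names and the asymmetric min/max scheme are assumptions chosen for illustration.

```python
import math


def quantize_groups(weights, bits=4, group_size=128):
    """Round-to-nearest quantisation with one (scale, minimum) pair per group."""
    qmax = 2 ** bits - 1  # 15 for 4-bit codes
    groups = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for constant groups
        codes = [round((w - lo) / scale) for w in group]
        groups.append((codes, scale, lo))
    return groups


def dequantize_groups(groups):
    """Reconstruct approximate weights from the integer codes."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(lo + code * scale for code in codes)
    return restored


# Illustrative data: 256 values, so two groups of 128.
weights = [math.sin(0.1 * i) for i in range(256)]
groups = quantize_groups(weights)
restored = dequantize_groups(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(groups), max_error)
```

A smaller group size stores more scale values and usually lowers the reconstruction error, at the cost of a slightly larger file.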

Discord

For further support, and discussions on these models and AI in general, join us at:

TheBloke AI's Discord server

Thanks, and how to contribute

Thanks to the chirper.ai team!

Many people have asked whether they can contribute. I enjoy providing models and helping people, and would love to be able to spend more time doing so, as well as expanding into new projects such as fine-tuning and training.

If you are able and willing to contribute, it will be most gratefully received and will help me to keep providing more models and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, and other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.

Patreon special mentions: Mano Prime, Fen Risland, Derek Yates, Preetika Verma, webtim, Sean Connelly, Alps Aficionado, Karl Bernard, Junyu Yang, Nathan LeClaire, Chris McCloskey, Lone Striker, Asp the Wyvern, Eugene Pentland, Imad Khwaja, trip7s trip, WelcomeToTheClub, John Detwiler, Artur Olbinski, Khalefa Al-Ahmad, Trenton Dambrowitz, Talal Aujan, Kevin Schuppel, Luke Pendergrass, Pyrater, Joseph William Delisle, terasurfer, vamX, Gabriel Puliatti, David Flickinger, Jonathan Leane, Iucharbius, Luke, Deep Realms, Cory Kujawski, ya boyyy, Illia Dulskyi, senxiiz, Johann-Peter Hartmann, John Villwock, K, Ghost, Spiking Neurons AB, Nikolai Manek, Rainer Wilmers, Pierre Kircher, biorpg, Space Cruiser, Ai Maven, subjectnull, Willem Michiel, Ajan Kanaga, Kalila, chris gileta, Oscar Rangel

Thank you to all my generous patrons and donors!

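Original model card: Baichuan Inc's Baichuan 7B

baichuan-7B

baichuan-7B is an open-source, large-scale pre-trained model developed by Baichuan Intelligent Technology. It is based on the Transformer architecture, has 7 billion parameters, and was trained on approximately 1.2 trillion tokens. It supports both Chinese and English, with a context window length of 4096. On the standard authoritative Chinese and English benchmarks (C-EVAL/MMLU) it achieves the best results among models of the same size.

If you wish to use baichuan-7B (for inference, fine-tuning, etc.), we recommend using the accompanying code library, baichuan-7B.

Why use baichuan-7B

Among models of the same size, baichuan-7B achieves the current state-of-the-art (SOTA) level; see the MMLU results below.
baichuan-7B is trained on proprietary bilingual Chinese-English corpora and optimised for Chinese, reaching SOTA level on C-Eval.
Unlike LLaMA, which completely prohibits commercial use, baichuan-7B uses a more permissive open-source license that allows commercial use.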

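How to get started with the model

The following is a 1-shot inference task using baichuan-7B: given a work, the model should produce its author. The correct output is "夜雨寄北->李商隐".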

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

The following is a task of performing 1-shot inference using baichuan-7B, where the author's name is given based on the work, with the correct output being "One Hundred Years of Solitude->Gabriel Garcia Marquez"

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64,repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Model details

Model description

Developed by: Baichuan Intelligent Technology (百川智能)
Email: opensource@baichuan-inc.com
Language(s) (NLP): Chinese/English
License: baichuan-7B License

Model sources

The overall model is based on the standard Transformer structure, and we have adopted the same model design as LLaMA:

Position Embedding: We use rotary-embedding, which is the position encoding scheme adopted by most models at this stage, and it has excellent extrapolation capabilities.
Feedforward Layer: We use SwiGLU. The feedforward changes to (8/3) times the size of the hidden layer, that is, 11008.
Layer Normalization: Pre-Normalization based on RMSNorm.

The specific parameters are as follows:

Hyperparameter	Value
n_parameters	7000559616
n_layers	32
n_heads	32
d_model	4096
vocab size	64000
sequence length	4096
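The hyperparameters in the table can be cross-checked against the reported parameter count with a short calculation. The sketch below makes several assumptions that the text above does not state explicitly: the feedforward size is 8/3 of d_model rounded up to a multiple of 256 (the LLaMA convention), the linear layers have no bias terms, the input embedding and output projection are stored separately, and rotary embeddings contribute no trainable parameters. The function names are hypothetical.

```python
import math


def ffn_hidden_size(d_model, multiple_of=256):
    """8/3 of d_model, rounded up to a multiple of `multiple_of` (assumed)."""
    return multiple_of * math.ceil(8 * d_model / (3 * multiple_of))


def count_parameters(n_layers, d_model, vocab_size):
    d_ff = ffn_hidden_size(d_model)
    attention = 4 * d_model * d_model    # Q, K, V and output projections
    feed_forward = 3 * d_model * d_ff    # SwiGLU uses three weight matrices
    norms = 2 * d_model                  # two RMSNorm weight vectors per layer
    per_layer = attention + feed_forward + norms
    embeddings = 2 * vocab_size * d_model  # input embedding + output head (assumed untied)
    final_norm = d_model
    return n_layers * per_layer + embeddings + final_norm


d_ff = ffn_hidden_size(4096)
total = count_parameters(n_layers=32, d_model=4096, vocab_size=64000)
print(d_ff)   # 11008
print(total)  # 7000559616
```

Under these assumptions the result matches the n_parameters value in the table exactly, which suggests the assumptions are consistent with the published configuration.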

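Uses

Downstream use

We have also open-sourced the training code that accompanies this model, which can be used for efficient fine-tuning on downstream tasks; see baichuan-7B for details.

Out-of-scope use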

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

Bias, risks, and limitations

baichuan-7B can produce factually incorrect output, and should not be relied on to produce factually accurate information. baichuan-7B was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

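Training details

For the specific training settings, see baichuan-7B.

Evaluation

Chinese evaluation

C-Eval

The C-Eval dataset is a comprehensive evaluation dataset for Chinese foundation models, covering 52 subjects and four difficulty levels. We used the dataset's dev set as the source of few-shot examples and ran a 5-shot test on the test set.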

Model 5-shot	Average	Avg(Hard)	STEM	Social Sciences	Humanities	Others
GPT-4	68.7	54.9	67.1	77.6	64.5	67.8
ChatGPT	54.4	41.4	52.9	61.8	50.9	53.6
Claude-v1.3	54.2	39.0	51.9	61.7	52.1	53.7
Claude-instant-v1.0	45.9	35.5	43.1	53.8	44.2	45.4
moss-moon-003-base (16B)	27.4	24.5	27.0	29.1	27.2	26.9
Ziya-LLaMA-13B-pretrain	30.2	22.7	27.7	34.4	32.0	28.9
LLaMA-7B-hf	27.1	25.9	27.1	26.8	27.9	26.3
ChatGLM-6B	34.5	23.1	30.4	39.6	37.4	34.5
Falcon-7B	25.8	24.3	25.8	26.0	25.8	25.6
Open-LLaMA-v2-pretrain (7B)	24.0	22.5	23.1	25.3	25.2	23.2
TigerBot-7B-base	25.7	27.0	27.3	24.7	23.4	26.1
Aquila-7B *	25.5	25.2	25.6	24.6	25.2	26.6
BLOOM-7B	22.8	20.2	21.8	23.3	23.9	23.3
BLOOMZ-7B	35.7	25.8	31.3	43.5	36.6	35.6
baichuan-7B	42.8	31.5	38.2	52.0	46.2	39.3
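The 5-shot setup described above (few-shot examples drawn from the dev set, followed by an unanswered test question) can be sketched as follows. The prompt layout, the "Answer:" marker, and all names are assumptions made for illustration; the actual evaluation harness used for these results may format prompts differently.

```python
CHOICE_LABELS = "ABCD"


def format_question(question, options, answer=None):
    """Render one multiple-choice item; leave the answer blank if unknown."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, options)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(dev_examples, test_item, k=5):
    """Prepend k solved dev-set examples to an unsolved test question."""
    shots = [format_question(q, opts, ans) for q, opts, ans in dev_examples[:k]]
    shots.append(format_question(*test_item))
    return "\n\n".join(shots)


# Placeholder data, not taken from any real benchmark.
dev_examples = [
    (f"Question {i}?", ["w", "x", "y", "z"], "A") for i in range(6)
]
test_item = ("Test question?", ["w", "x", "y", "z"])
prompt = build_few_shot_prompt(dev_examples, test_item)
print(prompt)
```

The model's continuation after the final "Answer:" is then compared with the reference label to compute accuracy.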

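Gaokao

Gaokao is a dataset that uses questions from the Chinese college entrance examination (Gaokao) to evaluate large language models, assessing their language ability and logical reasoning ability. We kept only the single-answer multiple-choice questions and ran a uniform 5-shot test on all models.

The results are as follows.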

Model	Average
Open-LLaMA-v2-pretrain	21.41
Ziya-LLaMA-13B-pretrain	23.17
Falcon-7B	23.98
TigerBot-7B-base	25.94
LLaMA-7B	27.81
ChatGLM-6B	21.41
BLOOM-7B	26.96
BLOOMZ-7B	28.72
Aquila-7B *	24.39
baichuan-7B	36.24

AGIEval

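AGIEval aims to evaluate a model's general abilities in tasks related to cognition and problem solving. We kept only the four-option, single-answer multiple-choice questions and, after random partitioning, ran a uniform 5-shot test on all models.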

Model	Average
Open-LLaMA-v2-pretrain	23.49
Ziya-LLaMA-13B-pretrain	27.64
Falcon-7B	27.18
TigerBot-7B-base	25.19
LLaMA-7B	28.17
ChatGLM-6B	23.49
BLOOM-7B	26.55
BLOOMZ-7B	30.27
Aquila-7B *	25.58
baichuan-7B	34.44

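*The Aquila results are taken from the official BAAI (智源) website and are for reference only.

English Leaderboard

In addition to Chinese, we also tested the model's performance in English.

MMLU

MMLU is an English evaluation dataset comprising 57 multiple-choice tasks, covering elementary mathematics, American history, computer science, law, and more. Its difficulty ranges from high-school level to expert level, and it is a mainstream LLM evaluation dataset.

We adopted the open-source evaluation scheme, and the final 5-shot results are as follows: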

Model	Humanities	Social Sciences	STEM	Other	Average
LLaMA-7B 2	34.0	38.3	30.5	38.1	35.1
Falcon-7B 1	-	-	-	-	35.0
mpt-7B 1	-	-	-	-	35.6
ChatGLM-6B 0	35.4	41.0	31.3	40.5	36.9
BLOOM 7B 0	25.0	24.4	26.5	26.4	25.5
BLOOMZ 7B 0	31.3	42.1	34.4	39.0	36.1
moss-moon-003-base (16B) 0	24.2	22.8	22.4	24.4	23.6
moss-moon-003-sft (16B) 0	30.5	33.8	29.3	34.4	31.9
baichuan-7B 0	38.4	48.9	35.6	48.1	42.3

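The number after each model name in the Model column (a superscript in the original) indicates the source of the result.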

0:reimplemented
1:https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
2:https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu

Author:

Tom Jobbins

Size:

4.12 GB