Model:

togethercomputer/GPT-NeoXT-Chat-Base-20B

English

Feel free to try out our OpenChatKit feedback app!

GPT-NeoXT-Chat-Base-20B-v0.16

TL;DR: As part of OpenChatKit (codebase available here), GPT-NeoXT-Chat-Base-20B-v0.16 is a 20B-parameter language model, fine-tuned from EleutherAI's GPT-NeoX with over 40 million instructions on 100% carbon-negative compute.

GPT-NeoXT-Chat-Base-20B-v0.16 is based on EleutherAI's GPT-NeoX model and is fine-tuned with data focusing on conversational interactions. We focused the tuning on several tasks, such as question answering, classification, extraction, and summarization. The model was fine-tuned on a collection of 43 million high-quality instructions. Our partners LAION and Ontocord.ai helped curate the dataset the model is based on. You can read more about this process and the availability of the dataset in LAION's blog post here.

In addition to the fine-tuning described above, GPT-NeoXT-Chat-Base-20B-v0.16 has undergone further fine-tuning on a small amount of feedback data. This allows the model to better adapt to human preferences in conversation.

Model Details

  • Developed by: Together Computer.
  • Model type: Language Model.
  • Language(s): English.
  • License: Apache 2.0.
  • Model Description: A 20B parameter open-source chat model, fine-tuned from EleutherAI's NeoX with over 40 million instructions on 100% carbon-negative compute.
  • Resources for more information: GitHub Repository

Quick Start

GPU Inference

This requires a GPU with 48GB of memory.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B", torch_dtype=torch.float16)
model = model.to('cuda:0')
# infer
inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
output_str = tokenizer.decode(outputs[0])
print(output_str)
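Because the model continues the `<human>:`/`<bot>:` transcript, sampled text can run past the bot's reply into a hallucinated `<human>:` turn. A minimal post-processing helper can trim the decoded string back to just the reply; the helper below is an illustrative sketch, not part of the released code:

```python
def extract_bot_reply(decoded: str, prompt: str) -> str:
    """Return only the bot's reply from a decoded generation.

    Strips the original prompt prefix, then truncates at the first
    follow-on "<human>:" turn the model may have produced.
    """
    # Drop the prompt prefix if the decode includes it.
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    # Cut at the next human turn, if the model generated one.
    cut = decoded.find("<human>:")
    if cut != -1:
        decoded = decoded[:cut]
    return decoded.strip()
```

With the snippet above, `extract_bot_reply(output_str, "<human>: Hello!\n<bot>:")` yields only the bot's answer to the first turn.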

GPU Inference in Int8

This requires a GPU with 24GB of memory, plus the bitsandbytes package for 8-bit loading.

from transformers import AutoTokenizer, AutoModelForCausalLM
# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B", device_map="auto", load_in_8bit=True)
# infer
inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
output_str = tokenizer.decode(outputs[0])
print(output_str)

CPU Inference

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-NeoXT-Chat-Base-20B", torch_dtype=torch.bfloat16)
# infer
inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
output_str = tokenizer.decode(outputs[0])
print(output_str)
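All three snippets above hard-code a single-turn prompt. Multi-turn conversations use the same plain-text format, alternating `<human>:` and `<bot>:` turns and ending with an open `<bot>:` tag for the model to complete. A small builder function (an illustrative sketch, not an official API) might look like:

```python
def build_prompt(turns):
    """Render a list of (speaker, text) pairs into the model's chat
    format, ending with an open "<bot>:" turn for generation.

    speaker is "human" or "bot"; text is the utterance.
    """
    lines = [f"<{speaker}>: {text}" for speaker, text in turns]
    lines.append("<bot>:")  # open turn for the model to complete
    return "\n".join(lines)
```

For example, `build_prompt([("human", "Hello!")])` produces the same `"<human>: Hello!\n<bot>:"` string used in the inference examples above.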

Strengths of the model

There are several tasks that OpenChatKit excels at, including:

  • Example 1: Summarization and question answering within context.
  • **Summarize a long document into a single sentence and conduct question answering related to the document, with multiple rounds**
    
    <human>: Last year, the travel industry saw a big rebound in demand — and that demand is showing no signs of slowing down this spring break travel season. Planes and hotels will be full, travelers will likely face long queues, cancellations, massive crowds and plenty of other travel nightmares. But perhaps the most frustrating thing you’ll have to prepare for this spring break is if your luggage goes missing. You’ve got to somehow track down your belongings that were in the airline’s care and then wait to be reunited — all while trying to enjoy the start of your long-awaited vacation. As far as travel nightmares go, lost luggage is up there as one of the worst.
    
    To be honest up front, I am great at strategically losing things — or rather, losing stuff with style. (I hope you appreciate the “Toy Story” reference.) So when a product comes along that’s a perfect way to keep track of my belongings — especially when traveling — I’m adding it to my travel bag.
    
    I’ve been a big fan of Apple’s $29 item tracker, dubbed AirTag. The long-rumored AirTag product finally launched in 2021, and while it’s not the first item tracker to exist, it has two key features that set it apart from the rest to make it the ultimate travel companion: access to the Find My network and Precision Finding technology.
    
    The Federal Aviation Administration has even ruled that having an AirTag in your luggage is permitted and, in fact, not a safety hazard. The devices only contain 0.1 grams of lithium, which falls below the FAA’s limit.
    
    “Luggage tracking devices powered by lithium metal cells that have 0.3 grams or less of lithium can be used on checked baggage,” the FAA said in a statement. “Apple AirTags meet this threshold; other luggage tracking devices may not.”
    
    But, you may be wondering what makes them so special and why you should carry one in your checked luggage this spring break? Let’s explain why.
    
    Summarize the above into a single paragraph.
    
    <bot>: If you're traveling this spring break, you might want to consider investing in an item tracker to help you find your lost luggage. Apple's AirTag is the best option, according to a new report.
    
    <human>: But isn't it unsafe to put AirTag in a luggage?
    
    <bot>: The Federal Aviation Administration has ruled that having an AirTag in your luggage is permitted and, in fact, not a safety hazard.
    
    <human>: Ah, why is that?
    
    <bot>: The devices only contain 0.1 grams of lithium, which falls below the FAA’s limit.
    
  • Example 2: Extraction.
  • **Extract structured information from unstructured documents, as illustrated in the following example.**
    
    <human>: Generate a table summarizing the options outlined in this email. 
    
    Team, we have to make a decision on the product plan for Goocrux v2. 
    
    The first option will take us 3 weeks of development, and will achieve performance of 80ms, but means that the UI has to change. 
    
    The second option will take us 4 weeks of development, and will achieve performance of 60ms, without any UI change. 
    
    The third option will take the longest -- 6 weeks of development, will achieve performance of 30ms, without any UI change. 
    
    A markdown table with four columns: (1) Option ID, (2) Development time, (3) Performance, (4) UI Change:
    
    <bot>: #.
    | ID | Option ID | Development time | Performance | UI Change |
    | --------- | --------- | --------- | --------- | --------- |
    | 1 | 1 | 3 weeks | 80ms | Yes |
    | 2 | 2 | 4 weeks | 60ms | No |
    | 3 | 3 | 6 weeks | 30ms | No |
    
  • Example 3: Classification.
  • **Classifying a sentence or paragraph into different categories.**
    
    <human>: Classify the sentiment of the following sentence into Positive, Neutral, or Negative: Super excited about teaching Stanford’s first course on Large Language Models! Check the syllabus out here
    <bot>: Positive
    
    <human>: Classify the sentiment of the following sentence into Positive, Neutral, or Negative: How about the following sentence: It is raining outside and I feel so blue
    <bot>: Negative
    

In addition, the model does well on few-shot prompts. For both classification and extraction, the model performs even better with few-shot prompting, as in most HELM tasks. Contact us if you're interested in trying few-shot prompts with the model.
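One way to lay out a few-shot sentiment prompt is as repeated `<human>:`/`<bot>:` pairs ending with the unanswered query, reusing the task phrasing from Example 3 above. The helper name and example sentences below are illustrative, not part of the released code:

```python
# Labeled examples for the few-shot prefix (illustrative).
FEW_SHOT = [
    ("It is raining outside and I feel so blue", "Negative"),
    ("Super excited about teaching this course!", "Positive"),
]

def few_shot_prompt(sentence, examples=FEW_SHOT):
    """Build a few-shot sentiment-classification prompt in the
    <human>/<bot> chat format used by the model."""
    task = ("Classify the sentiment of the following sentence into "
            "Positive, Neutral, or Negative: ")
    parts = [f"<human>: {task}{text}\n<bot>: {label}"
             for text, label in examples]
    # Final query is left open for the model to answer.
    parts.append(f"<human>: {task}{sentence}\n<bot>:")
    return "\n".join(parts)
```

The resulting string can be tokenized and passed to `model.generate` exactly like the single-turn prompts in the Quick Start section.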

Weaknesses of the model

That said, there are several areas where we have more work to do, and we need your help! Some of these include:

  • Knowledge-based closed question answering: The chatbot may produce incorrect results. Be sure to fact-check its answers, and if possible provide feedback with the corrected information.
  • Coding tasks: The chatbot was not trained on a large enough corpus of source code to excel at writing code. We welcome contributions of additional datasets to improve this!
  • Repetition: Sometimes the chatbot will repeat its response. We're working on improving this, but in the meantime you can click the refresh button to start a new conversation.
  • Context switching: If you change the topic mid-conversation, the chatbot often cannot make the switch automatically and will keep giving answers related to the prior topic.
  • Creative writing and longer answers: The chatbot does not generate long, creative text such as an essay or story.

We are excited to work with you to address these weaknesses through your feedback, additional datasets, and improved accuracy.

Uses

Direct Use

The model is intended for research purposes. Possible research areas and tasks include:

  • Safe deployment of models which have the potential to generate harmful content.
  • Probing and understanding the limitations and biases of dialogue models or language models.
  • Generation of artworks and use in design and other artistic processes.
  • Applications in educational or creative tools.
  • Research on dialogue models or language models.

Excluded uses are described below.

Misuse, Malicious Use, and Out-of-Scope Use

The OpenChatKit community provides GPT-NeoXT-Chat-Base-20B-v0.16 as an open-source tool for building chatbots. The community is not responsible for any misuse, malicious use, or out-of-scope use of the model. It is the responsibility of the end user to ensure that the model is used in a responsible and ethical manner.

Out-of-Scope Use

GPT-NeoXT-Chat-Base-20B-v0.16 is designed for use in chatbot applications and may not perform well for other use cases outside its intended scope. For example, it may not be suitable for safety-critical applications or for making decisions that have a significant impact on individuals or society. It is important to consider the limitations of the model and to use it only for its intended purpose.

Misuse and Malicious Use

GPT-NeoXT-Chat-Base-20B-v0.16 is designed for use in chatbot applications and should not be used for any other purpose. Misuse of the model, such as using it to engage in illegal or unethical activities, is strictly prohibited and goes against the principles of the OpenChatKit community project.

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

  • Generating fake news, misinformation, or propaganda.
  • Promoting hate speech, discrimination, or violence against individuals or groups.
  • Impersonating individuals or organizations without their consent.
  • Engaging in cyberbullying or harassment.
  • Defamatory content.
  • Spamming or scamming.
  • Sharing confidential or sensitive information without proper authorization.
  • Violating the terms of use of the model or of the data used to train it.
  • Creating automated bots for malicious purposes, such as spreading malware, phishing scams, or spam.

Limitations

GPT-NeoXT-Chat-Base-20B-v0.16, like other language-model-based chatbots, has limitations that should be taken into consideration. For example, the model may not always provide accurate or relevant answers, particularly for questions that are complex, ambiguous, or outside its training data. We therefore welcome contributions from individuals and organizations, and encourage collaboration toward creating a more robust and inclusive chatbot.

Training

Training Data

Please refer to togethercomputer/OpenDataHub.

Training Procedure

  • Hardware: 2 x 8 x A100 GPUs
  • Optimizer: 8bit-AdamW
  • Gradient Accumulations: 2
  • Batch: 2 x 2 x 64 x 2048 = 524288 tokens
  • Learning rate: warmup to 1e-6 for 100 steps and then kept constant
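The stated batch size can be sanity-checked by multiplying out the factorization. The card does not spell out what each factor denotes beyond the gradient-accumulation steps, so the variable names below are an interpretation; the arithmetic simply verifies the product:

```python
# Verify the stated global batch size of 2 x 2 x 64 x 2048 tokens.
grad_accum = 2   # gradient accumulation steps (from the card)
factor = 2       # second factor in the factorization (unspecified in the card)
seqs = 64        # sequences per micro-batch (per the factorization)
seq_len = 2048   # tokens per sequence

sequences_per_step = grad_accum * factor * seqs   # 256 sequences
tokens_per_step = sequences_per_step * seq_len    # 524288 tokens
print(tokens_per_step)
```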

Community

Join us on Together Discord.