Lotus-12B

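Lotus-12B is a GPT-NeoX 12B model fine-tuned on 2.5 GB of varied light novels, erotica, annotated literature, and public-domain conversations, for generating novel-like fictional text and dialogue.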

Model Description

The base model used for fine-tuning is Pythia 12B Deduped, a 12-billion-parameter autoregressive language model trained on The Pile.

Training Data & Annotative Prompting

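The fine-tuning data was gathered from various sources, such as Project Gutenberg. In the annotated fiction dataset, each text is preceded by tags that help steer generation toward a particular style. The following example prompt shows how to use these tags.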

[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror; Tags: 3rdperson, scary; Style: Dark ]
***
When a traveler in north central Massachusetts takes the wrong fork...

Conversation data, scraped from My Discord Server and from publicly available subreddits on Reddit, is formatted as follows:

[ Title: (2019) Cars getting transported on an open deck catch on fire after salty water shorts their batteries; Genre: CatastrophicFailure ]
***
Anonymous: Daaaaaamn try explaining that one to the owners
EDIT: who keeps reposting this for my comment to get 3k upvotes?
Anonymous: "Your car caught fire from some water"
Irythros: Lol, I wonder if any compensation was in order
Anonymous: Almost all of the carriers offer insurance but it isn’t cheap. I guarantee most of those owners declined the insurance.

These annotations can be mixed and matched to help steer generation toward a specific style.
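As a minimal sketch of how such annotated prompts could be assembled, the helper below joins a set of fields into the bracketed header, adds the `***` separator, and then appends either conversation turns or opening text. The function name `build_prompt`, its parameters, and the chat example values are hypothetical and only mirror the layout of the two examples above; they are not part of the model or of any library.

```python
def build_prompt(annotations, text="", turns=None):
    """Build an annotated prompt: bracketed header, *** separator, then the body."""
    parts = []
    for key, value in annotations.items():
        # Multi-valued fields such as Tags are comma-separated in the examples.
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        parts.append(f"{key}: {value}")
    lines = ["[ " + "; ".join(parts) + " ]", "***"]
    # Conversation turns are written one per line as "Speaker: message".
    for speaker, message in (turns or []):
        lines.append(f"{speaker}: {message}")
    if text:
        lines.append(text)
    return "\n".join(lines)


# Fiction prompt using the fields from the annotated example above.
story_prompt = build_prompt(
    {
        "Title": "The Dunwich Horror",
        "Author": "H. P. Lovecraft",
        "Genre": "Horror",
        "Tags": ["3rdperson", "scary"],
        "Style": "Dark",
    },
    text="When a traveler",
)

# Conversation prompt that mixes in a Style field (made-up values).
chat_prompt = build_prompt(
    {"Title": "A late-night chat", "Genre": "Conversation", "Style": "Dark"},
    turns=[("Anonymous", "Did you hear that noise?")],
)

print(story_prompt)
print(chat_prompt)
```

Which combinations of fields actually work well is not documented here, so any mix of annotations should be treated as something to experiment with.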

Downstream Uses

This model can be used for entertainment purposes and as a creative writing assistant for fiction writers and chatbots.

Example Code

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('hakurei/lotus-12B')
tokenizer = AutoTokenizer.from_pretrained('hakurei/lotus-12B')

prompt = '''[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler'''

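# Tokenize the prompt into a batch of one sequence of PyTorch tensors.
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Sample up to 100 tokens beyond the prompt with nucleus sampling.
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=1.0,
    top_p=0.9,
    repetition_penalty=1.2,
    max_length=len(input_ids[0]) + 100,
    pad_token_id=tokenizer.eos_token_id,
)

# The decoded text includes the original prompt followed by the continuation.
generated_text = tokenizer.decode(output[0])
print(generated_text)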

An example of the output this code might produce:

[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler comes to an unknown region, his thoughts turn inevitably towards the old gods and legends which cluster around its appearance. It is not that he believes in them or suspects their reality—but merely because they are present somewhere else in creation just as truly as himself, and so belong of necessity in any landscape whose features cannot be altogether strange to him. Moreover, man has been prone from ancient times to brood over those things most connected with the places where he dwells. Thus the Olympian deities who ruled Hyper
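Because the generated text repeats the annotated header, it can be convenient to separate the header fields from the story itself. The following is a minimal sketch of one way to do that; `split_annotated_text` is a hypothetical helper, not part of `transformers`, and it assumes field values contain no semicolons.

```python
def split_annotated_text(generated):
    """Split generated text into (annotations, body) at the first *** line."""
    header, sep, body = generated.partition("\n***\n")
    if not sep:
        # No separator found: treat the whole string as body text.
        return {}, generated
    annotations = {}
    # Drop the surrounding brackets, then read "Key: value" pairs split on ";".
    for field in header.strip().strip("[]").split(";"):
        key, colon, value = field.partition(":")
        if colon:
            annotations[key.strip()] = value.strip()
    return annotations, body


sample = (
    "[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]\n"
    "***\n"
    "When a traveler comes to an unknown region"
)
fields, story = split_annotated_text(sample)
print(fields)
print(story)
```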

Team Members and Acknowledgements

This project would not have been possible without the work done by EleutherAI. Thank you!

To reach us, you can join our Discord server.