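This model is a latent diffusion model trained to generate artwork in the style of the Japanese artist ヒトこもる/Hitokomoru. The current model was fine-tuned from waifu-diffusion-1-4 (wd-1-4-anime_e2.ckpt) with a learning rate of 2.0e-6, 15,000 training steps, and a batch size of 4, on 257 artworks collected from Danbooru. It continues the interrupted fine-tune of hitokomoru-diffusion, which was itself fine-tuned from Anything V3.0. The dataset was preprocessed with the Aspect Ratio Bucketing Tool so that it could be converted to latents and trained at non-square resolutions. Like other anime-style Stable Diffusion models, it also supports Danbooru tags for generating images.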
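To make the idea of aspect-ratio bucketing concrete, here is a minimal sketch. It is not the Aspect Ratio Bucketing Tool itself: the function names, the 64-pixel grid, the side limits, and the 512x768 pixel budget are all assumptions chosen for illustration. It enumerates candidate resolutions under a pixel budget and assigns each image to the bucket whose aspect ratio is closest.

```python
from typing import List, Tuple

Bucket = Tuple[int, int]  # (width, height)


def make_buckets(max_area: int = 512 * 768, step: int = 64,
                 min_side: int = 256, max_side: int = 1024) -> List[Bucket]:
    """Enumerate buckets whose sides lie on a `step` grid and whose area fits the budget.

    All defaults are illustrative assumptions, not values taken from the model card.
    """
    buckets = []
    for width in range(min_side, max_side + 1, step):
        # Largest grid-aligned height that keeps width * height within max_area.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.append((width, height))
    return buckets


def assign_bucket(width: int, height: int, buckets: List[Bucket]) -> Bucket:
    """Return the bucket whose aspect ratio is closest to that of the image."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


if __name__ == "__main__":
    buckets = make_buckets()
    # A portrait image and a landscape image land in different buckets.
    print(assign_bucket(1000, 1500, buckets))
    print(assign_bucket(1920, 1080, buckets))
```

In a real training setup each image would then be resized and cropped to its bucket resolution, and batches would be drawn from one bucket at a time so that every tensor in a batch has the same shape.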
e.g. 1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lightning, detailed sky, garden
Negative prompt: worst quality, low quality, medium quality, deleted, lowres, comic, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, signature, watermark, username, blurry
Positive prompt (quality tags): masterpiece, best quality, high quality, absurdres
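One convenient way to apply the two tag lists above is to keep them as constants and merge the quality tags with your subject tags. The sketch below is only an illustration: `build_prompts` is a hypothetical helper, and putting the quality tags first simply mirrors the sample prompts in this card.

```python
from typing import Iterable, Tuple

# Tag lists copied from the model card.
QUALITY_TAGS = ["masterpiece", "best quality", "high quality", "absurdres"]
NEGATIVE_TAGS = [
    "worst quality", "low quality", "medium quality", "deleted", "lowres",
    "comic", "bad anatomy", "bad hands", "text", "error", "missing fingers",
    "extra digit", "fewer digits", "cropped", "jpeg artifacts", "signature",
    "watermark", "username", "blurry",
]


def build_prompts(subject_tags: Iterable[str]) -> Tuple[str, str]:
    """Return (prompt, negative_prompt) as comma-separated tag strings.

    Quality tags are placed first, followed by the subject tags;
    duplicate and empty tags are dropped while the order is preserved.
    """
    seen = set()
    ordered = []
    for tag in QUALITY_TAGS + list(subject_tags):
        tag = tag.strip()
        if tag and tag not in seen:
            seen.add(tag)
            ordered.append(tag)
    return ", ".join(ordered), ", ".join(NEGATIVE_TAGS)


if __name__ == "__main__":
    prompt, negative_prompt = build_prompts(["1girl", "white hair", "golden eyes"])
    print(prompt)
    print(negative_prompt)
```

The two returned strings can be passed as `prompt` and `negative_prompt` to a Stable Diffusion pipeline.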
This model can be used just like any other Stable Diffusion model. For more information, see Stable Diffusion. You can also export the model to ONNX, MPS and/or FLAX/JAX.
To run the pipeline, first install the following dependencies:
pip install diffusers transformers accelerate scipy safetensors
Running the pipeline (if you don't swap the scheduler, it runs with the default DDIM; in this example we swap it for DPMSolverMultistepScheduler):
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

model_id = "Linaqruf/hitokomoru-diffusion-v2"

# Load the pipeline weights in half precision
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
# Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler here instead
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "masterpiece, best quality, high quality, 1girl, solo, sitting, confident expression, long blonde hair, blue eyes, formal dress"
negative_prompt = "worst quality, low quality, medium quality, deleted, lowres, comic, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, signature, watermark, username, blurry"

with autocast("cuda"):
    image = pipe(
        prompt,
        negative_prompt=negative_prompt,
        width=512,
        height=728,
        guidance_scale=12,
        num_inference_steps=50,
    ).images[0]

image.save("anime_girl.png")
Here are some cherry-picked samples:
masterpiece, best quality, high quality, 1girl, solo, sitting, confident expression, long blonde hair, blue eyes, formal dress, jewelry, make-up, luxury, close-up, face, upper body.
Negative prompt: worst quality, low quality, medium quality, deleted, lowres, comic, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 994051800, Size: 512x768, Model hash: ea61e913a0, Model: hitokomoru-v2, Batch size: 2, Batch pos: 0, Denoising strength: 0.6, Clip skip: 2, ENSD: 31337, Hires upscale: 1.5, Hires steps: 20, Hires upscaler: Latent (nearest-exact)
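The settings portion of the sample above (everything from `Steps:` onward) is a comma-separated list of `key: value` pairs, so it can be turned into a dictionary with a few lines of Python. The sketch below is illustrative only: `parse_settings` and `to_pipeline_kwargs` are hypothetical helpers, and the parser assumes that no value contains the sequence `", "`, which holds for this sample but may not hold in general.

```python
from typing import Dict


def parse_settings(line: str) -> Dict[str, str]:
    """Split 'Key: value, Key: value, ...' into a dict of strings.

    Assumes values never contain ', ' (true for the sample in this card).
    """
    settings = {}
    for item in line.split(", "):
        key, sep, value = item.partition(": ")
        if sep:  # skip fragments that are not 'key: value' pairs
            settings[key.strip()] = value.strip()
    return settings


def to_pipeline_kwargs(settings: Dict[str, str]) -> dict:
    """Map the settings that have a direct counterpart in a diffusers pipeline call."""
    width, height = (int(side) for side in settings["Size"].split("x"))
    return {
        "num_inference_steps": int(settings["Steps"]),
        "guidance_scale": float(settings["CFG scale"]),
        "width": width,
        "height": height,
    }


if __name__ == "__main__":
    sample = (
        "Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 994051800, "
        "Size: 512x768, Model hash: ea61e913a0, Model: hitokomoru-v2, "
        "Batch size: 2, Batch pos: 0, Denoising strength: 0.6, Clip skip: 2, "
        "ENSD: 31337, Hires upscale: 1.5, Hires steps: 20, "
        "Hires upscaler: Latent (nearest-exact)"
    )
    settings = parse_settings(sample)
    print(settings["Sampler"])
    print(to_pipeline_kwargs(settings))
```

Only the step count, guidance scale, and image size are mapped here. The remaining settings (sampler variant, clip skip, ENSD, and the hires upscaling pass) have no one-to-one argument in a plain `StableDiffusionPipeline` call, so reproducing the sample exactly with diffusers should not be expected.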
This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies: