A latent diffusion model trained on artwork by the Japanese artist ヒトこもる/Hitokomoru. The current model is fine-tuned from waifu-diffusion-1-4 (wd-1-4-anime_e2.ckpt) with a learning rate of 2.0e-6, 15,000 training steps, and a batch size of 4 on 257 artworks collected from Danbooru. This model is intended as a continuation of hitokomoru-diffusion, which was fine-tuned from Anything V3.0. The dataset was preprocessed with the Aspect Ratio Bucketing Tool so that it could be converted to latents and trained at non-square resolutions. Like other anime-style Stable Diffusion models, it supports Danbooru tags for generating images.
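To make the training description above more concrete, here is a simplified sketch of the general idea behind aspect-ratio bucketing, plus a quick estimate of how many passes over the 257 images the stated schedule implies. This is not the Aspect Ratio Bucketing Tool's actual code. The 512×512 pixel budget, the 64 px step and the 256–1024 px side range are assumptions for illustration. The epoch estimate assumes every step sees a full batch of 4.

```python
# Simplified aspect-ratio bucketing sketch (assumed settings, not the tool's own):
# every bucket has sides that are multiples of `step` and an area <= max_area.

def make_buckets(max_area=512 * 512, step=64, min_dim=256, max_dim=1024):
    """Return sorted (width, height) buckets within the pixel budget."""
    buckets = set()
    for width in range(min_dim, max_dim + 1, step):
        # Tallest height (a multiple of `step`) keeping width * height <= max_area
        height = min(max_dim, (max_area // width) // step * step)
        if height >= min_dim:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored bucket for the other orientation
    return sorted(buckets)


def assign_bucket(img_width, img_height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = img_width / img_height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


def approx_epochs(steps=15000, batch_size=4, num_images=257):
    """Approximate passes over the dataset, assuming every batch is full."""
    return steps * batch_size / num_images


buckets = make_buckets()
print(assign_bucket(1000, 1000, buckets))  # square image -> (512, 512)
print(round(approx_epochs(), 1))           # 60000 / 257 -> 233.5
```

Because each image keeps roughly its own aspect ratio, a portrait artwork is trained at a portrait resolution instead of being center-cropped to a square.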
Example prompt: 1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden
Negative prompt: worst quality, low quality, medium quality, deleted, lowres, comic, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, signature, watermark, username, blurry
Quality tags: masterpiece, best quality, high quality, absurdres
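The tag lists above can be combined into prompt strings programmatically. Below is a minimal sketch using hypothetical helpers (`join_tags`, `build_prompts`, not part of any library). Placing quality tags first is an assumed convention, taken from the ordering of the example prompt later in this card.

```python
# Hypothetical helpers for composing Danbooru-tag prompts from the lists above.
QUALITY_TAGS = ["masterpiece", "best quality", "high quality", "absurdres"]
NEGATIVE_TAGS = [
    "worst quality", "low quality", "medium quality", "deleted", "lowres",
    "comic", "bad anatomy", "bad hands", "text", "error", "missing fingers",
    "extra digit", "fewer digits", "cropped", "jpeg artifacts", "signature",
    "watermark", "username", "blurry",
]


def join_tags(tags):
    """Join tags with ', ', dropping blanks and duplicates while keeping order."""
    unique = []
    for tag in tags:
        tag = tag.strip()
        if tag and tag not in unique:
            unique.append(tag)
    return ", ".join(unique)


def build_prompts(subject_tags):
    """Return (prompt, negative_prompt) with the quality tags placed first."""
    prompt = join_tags(QUALITY_TAGS + list(subject_tags))
    return prompt, join_tags(NEGATIVE_TAGS)


prompt, negative_prompt = build_prompts(["1girl", "white hair", "golden eyes", "flower meadow"])
print(prompt)
```

The two strings can be passed directly as the `prompt` and `negative_prompt` arguments of a `StableDiffusionPipeline` call.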
This model can be used just like any other Stable Diffusion model. For more information, please see the Stable Diffusion documentation. You can also export the model to ONNX or FLAX/JAX, or run it on Apple's MPS backend.
Install the dependencies below to run the pipeline:
pip install diffusers transformers accelerate scipy safetensors
Running the pipeline (if you don't swap the scheduler, it will run with the default DDIM; in this example we swap it for DPMSolverMultistepScheduler):
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

model_id = "Linaqruf/hitokomoru-diffusion-v2"

# Load the weights in half precision to reduce GPU memory use
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
# Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler instead of the default
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "masterpiece, best quality, high quality, 1girl, solo, sitting, confident expression, long blonde hair, blue eyes, formal dress"
negative_prompt = "worst quality, low quality, medium quality, deleted, lowres, comic, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, signature, watermark, username, blurry"

# 512x768 portrait, matching the sample resolution below
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=768,
    guidance_scale=12,
    num_inference_steps=50,
).images[0]
image.save("anime_girl.png")
Here are some cherry-picked samples:
masterpiece, best quality, high quality, 1girl, solo, sitting, confident expression, long blonde hair, blue eyes, formal dress, jewelry, make-up, luxury, close-up, face, upper body
Negative prompt: worst quality, low quality, medium quality, deleted, lowres, comic, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 994051800, Size: 512x768, Model hash: ea61e913a0, Model: hitokomoru-v2, Batch size: 2, Batch pos: 0, Denoising strength: 0.6, Clip skip: 2, ENSD: 31337, Hires upscale: 1.5, Hires steps: 20, Hires upscaler: Latent (nearest-exact)
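The sample's generation settings use the "Key: value, Key: value" format written by AUTOMATIC1111-style web UIs. Here is a minimal sketch that parses that settings text and maps a few fields onto the diffusers call used earlier. The helper names are hypothetical. The mapping covers only the first pass (`num_inference_steps`, `guidance_scale`, `width`, `height`). The hires second pass (1.5× upscale, 20 steps, denoising strength 0.6) is only reported as a target size, not reproduced.

```python
# Settings text copied from the sample above (AUTOMATIC1111-style metadata).
SETTINGS = (
    "Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 994051800, "
    "Size: 512x768, Model hash: ea61e913a0, Model: hitokomoru-v2, Batch size: 2, "
    "Batch pos: 0, Denoising strength: 0.6, Clip skip: 2, ENSD: 31337, "
    "Hires upscale: 1.5, Hires steps: 20, Hires upscaler: Latent (nearest-exact)"
)


def parse_settings(text):
    """Split 'Key: value' pairs separated by ', ' into a dict of strings.
    Assumes no value itself contains ', ' (true for the line above)."""
    settings = {}
    for field in text.split(", "):
        key, sep, value = field.partition(": ")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def base_and_hires_size(settings):
    """Return the first-pass (width, height) and the size after the hires upscale."""
    width, height = (int(v) for v in settings["Size"].split("x"))
    scale = float(settings.get("Hires upscale", "1"))
    return (width, height), (round(width * scale), round(height * scale))


def to_diffusers_kwargs(settings):
    """Map first-pass settings onto StableDiffusionPipeline call arguments."""
    (width, height), _ = base_and_hires_size(settings)
    return {
        "num_inference_steps": int(settings["Steps"]),
        "guidance_scale": float(settings["CFG scale"]),
        "width": width,
        "height": height,
    }


settings = parse_settings(SETTINGS)
print(base_and_hires_size(settings))  # ((512, 768), (768, 1152))
print(to_diffusers_kwargs(settings))
```

The resulting dict can be unpacked into the pipeline call, for example `pipe(prompt, negative_prompt=negative_prompt, **kwargs)`. Outputs are not expected to match the web UI exactly, because the sampler, seed handling and hires pass differ.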
This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies: