Model:

lllyasviel/control_v11p_sd15_scribble

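Controlnet - v1.1 - Scribble Version

Controlnet v1.1 is the successor of Controlnet v1.0. It was released in lllyasviel/ControlNet-v1-1 by Lvmin Zhang.

This checkpoint is a conversion of the original checkpoint into diffusers format. It can be used together with Stable Diffusion, for example runwayml/stable-diffusion-v1-5.

For more details, please also have a look at the Diffusers docs.

ControlNet is a neural network structure that controls diffusion models by adding extra conditions.

This checkpoint corresponds to the ControlNet conditioned on scribble images.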

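Model Details

Introduction

Controlnet was proposed in Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang and Maneesh Agrawala.

The abstract reads as follows:

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.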

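Example

We recommend using this checkpoint with Stable Diffusion v1-5, because that is the model it was trained on. Experimentally, the checkpoint can also be used with other diffusion models, such as DreamBooth-fine-tuned Stable Diffusion.

Note: If you want to process an image to create the auxiliary conditioning, you need the following external dependencies: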

  • Install https://github.com/patrickvonplaten/controlnet_aux
  • $ pip install controlnet_aux==0.3.0
    
  • Install diffusers and related packages:
  • $ pip install diffusers transformers accelerate
    
  • Run the code:
  • import os
    import torch
    from diffusers.utils import load_image
    from controlnet_aux import HEDdetector
    
    from diffusers import (
        ControlNetModel,
        StableDiffusionControlNetPipeline,
        UniPCMultistepScheduler,
    )
    
    checkpoint = "lllyasviel/control_v11p_sd15_scribble"
    
    # Input photo from which the scribble condition is derived
    image = load_image(
        "https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/input.png"
    )
    
    prompt = "royal chamber with fancy bed"
    
    # HED edge detector; scribble=True post-processes the edges into scribble-like strokes
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    control_image = processor(image, scribble=True)
    
    # Create the output directory first, otherwise save() fails
    os.makedirs("./images", exist_ok=True)
    control_image.save("./images/control.png")
    
    # Load the ControlNet and attach it to Stable Diffusion v1-5 in half precision
    controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    )
    
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    # Keep sub-models on the CPU until they are needed, reducing GPU memory use
    pipe.enable_model_cpu_offload()
    
    generator = torch.manual_seed(0)
    image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]
    
    image.save('./images/image_out.png')
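The example above derives the scribble from a photo with `HEDdetector(..., scribble=True)`, whose output appears as light strokes on a dark background. If you draw a scribble yourself as dark strokes on white paper, you may need to convert it into that form first. The following is a minimal sketch, assuming a grayscale drawing stored as nested lists of 0-255 values; the helper name `to_control_scribble` and the threshold of 128 are hypothetical choices, not part of controlnet_aux or diffusers.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values (hypothetical format)


def to_control_scribble(drawing: Gray, threshold: int = 128) -> Gray:
    """Turn dark-on-light strokes into a binary white-on-black map.

    Pixels darker than `threshold` are treated as stroke pixels (255);
    everything else becomes background (0).
    """
    return [[255 if px < threshold else 0 for px in row] for row in drawing]


if __name__ == "__main__":
    # A 3x3 drawing: white paper (255) with one dark vertical stroke (0).
    drawing = [
        [255, 0, 255],
        [255, 0, 255],
        [255, 0, 255],
    ]
    for row in to_control_scribble(drawing):
        print(row)  # [0, 255, 0]
```

In practice you would apply the same per-pixel mapping to a real image (for example a PIL image converted to grayscale) before passing it as `image=` to the pipeline.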
    

Other released checkpoints v1-1

The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning:

Control Image Overview | Condition Image
Trained with canny edge detection | A monochrome image with white edges on a black background.
Trained with pixel to pixel instruction | No condition.
Trained with image inpainting | No condition.
Trained with multi-level line segment detection | An image with annotated line segments.
Trained with depth estimation | An image with depth information, usually represented as a grayscale image.
Trained with surface normal estimation | An image with surface normal information, usually represented as a color-coded image.
Trained with image segmentation | An image with segmented regions, usually represented as a color-coded image.
Trained with line art generation | An image with line art, usually black lines on a white background.
Trained with anime line art generation | An image with anime-style line art.
Trained with human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons.
Trained with scribble-based image generation | An image with scribbles, usually random or user-drawn strokes.
Trained with soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect.
Trained with image shuffling | An image with shuffled patches or regions.
Trained with image tiling | A blurry image or part of an image.

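Improvements in Scribble 1.1:

  • The training dataset of the previous cnet 1.0 had several problems: (1) a small group of grayscale human images was duplicated thousands of times (!), which made the previous model somewhat likely to generate grayscale human images; (2) some images were of low quality, very blurry, or had significant JPEG artifacts; (3) a small group of images had wrong paired prompts because of a mistake in our data processing scripts. The new model fixes all of these training-dataset problems and should be more reasonable in many cases.
  • We found that users sometimes like to draw very thick scribbles, so we used more aggressive random morphological transforms to synthesize scribbles. The model should work well even when the scribbles are relatively thick: in the training data, scribble widths range from 1 pixel up to 24 pixels on a 512-pixel canvas.
  • Resumed from Scribble 1.0 and trained for a further 200 GPU hours on A100 80G.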
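Problem (1) in the list above, a small set of images repeated thousands of times, is the kind of issue an exact-duplicate filter can catch. The following is a minimal sketch, assuming each training sample is available as raw bytes. It keys samples by a SHA-256 content hash and keeps at most `max_copies` of each. The function name `cap_duplicates` and the cap are hypothetical and do not describe how the ControlNet authors actually cleaned their data.

```python
import hashlib
from typing import Dict, Iterable, List


def cap_duplicates(samples: Iterable[bytes], max_copies: int = 1) -> List[bytes]:
    """Keep at most `max_copies` byte-identical copies of each sample, preserving order."""
    seen: Dict[str, int] = {}
    kept: List[bytes] = []
    for data in samples:
        key = hashlib.sha256(data).hexdigest()  # identical bytes -> identical key
        count = seen.get(key, 0)
        if count < max_copies:
            kept.append(data)
        seen[key] = count + 1
    return kept


if __name__ == "__main__":
    # Hypothetical dataset: one image repeated 1000 times plus two unique images.
    dataset = [b"gray-human"] * 1000 + [b"cat", b"dog"]
    cleaned = cap_duplicates(dataset)
    print(len(dataset), "->", len(cleaned))  # 1002 -> 3
```

A content hash only catches byte-identical copies. Re-encoded or resized duplicates would need a perceptual similarity measure instead.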
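The list above also mentions random morphological transforms used to synthesize thicker scribbles. To illustrate one such transform, here is a minimal sketch, assuming a binary mask stored as nested lists (1 = stroke). It dilates the strokes with a square structuring element whose half-width is drawn at random. The function names and the radius range are hypothetical and are not the authors' data pipeline. A radius r turns a 1-pixel line into one 2r+1 pixels wide, so radii up to 11 keep widths at or below 23 pixels, within the 24-pixel maximum stated for the training data.

```python
import random
from typing import List

Mask = List[List[int]]  # 1 = stroke pixel, 0 = background


def dilate(mask: Mask, radius: int) -> Mask:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Paint the square neighbourhood, clipped to the canvas.
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def random_thicken(mask: Mask, max_radius: int, rng: random.Random) -> Mask:
    """Dilate by a radius drawn uniformly from 0..max_radius (0 keeps 1-px lines)."""
    return dilate(mask, rng.randint(0, max_radius))


if __name__ == "__main__":
    # 9x9 canvas with a 1-pixel-wide vertical line in column 4.
    canvas = [[1 if x == 4 else 0 for x in range(9)] for _ in range(9)]
    thick = dilate(canvas, 2)
    print(sum(thick[0]))  # 5: the line is now 5 pixels wide
    print(sum(map(sum, random_thicken(canvas, 3, random.Random(0)))))
```

The nested loops keep the sketch dependency-free. A real pipeline would more likely use an optimized morphology routine on image arrays.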

More information

For more information, please also have a look at the Diffusers ControlNet Blog Post and the official docs.