


Controlnet - v1.1 - Scribble Version

Controlnet v1.1 is the successor to Controlnet v1.0. It was released in lllyasviel/ControlNet-v1-1 and developed by Lvmin Zhang.

This checkpoint is a conversion of the original checkpoint into diffusers format. It can be used in combination with Stable Diffusion, such as runwayml/stable-diffusion-v1-5.

For more details, please also have a look at the Diffusers docs.





Controlnet was proposed in Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang and Maneesh Agrawala.


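We present a neural network structure, ControlNet, to control pretrained large diffusion models so that they support additional input conditions. ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (fewer than 50k samples). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs such as edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.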
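The abstract does not spell out the mechanism. In the ControlNet paper, a trainable copy of a frozen pretrained block is attached through "zero convolutions" (1x1 convolutions whose weights and biases are initialized to zero), giving y_c = F(x; Θ) + Z(F(x + Z(c; Θ_z1); Θ_c); Θ_z2). The toy NumPy sketch below is an illustration only: a single 1x1-conv block with ReLU stands in for a real U-Net encoder block (an assumption for brevity), and the helper names are hypothetical. It shows why training can start without disturbing the pretrained model: at initialization the condition c has no effect on the output.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution on a (C, H, W) feature map: a per-pixel linear map over channels."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

rng = np.random.default_rng(0)
C = 4

# Toy stand-in for a frozen, pretrained block F(x; theta): a fixed 1x1 conv followed by ReLU.
theta_w, theta_b = rng.standard_normal((C, C)), rng.standard_normal(C)

def frozen_block(x):
    return np.maximum(conv1x1(x, theta_w, theta_b), 0.0)

# Trainable copy F(.; theta_c), initialized from the frozen weights.
copy_w, copy_b = theta_w.copy(), theta_b.copy()

def trainable_copy(x):
    return np.maximum(conv1x1(x, copy_w, copy_b), 0.0)

# Zero convolutions Z(.; theta_z1) and Z(.; theta_z2): weights and biases start at 0.
z1_w, z1_b = np.zeros((C, C)), np.zeros(C)
z2_w, z2_b = np.zeros((C, C)), np.zeros(C)

def controlled_block(x, c):
    """y_c = F(x) + Z2(F_copy(x + Z1(c)))."""
    control = conv1x1(trainable_copy(x + conv1x1(c, z1_w, z1_b)), z2_w, z2_b)
    return frozen_block(x) + control

x = rng.standard_normal((C, 8, 8))  # input feature map
c = rng.standard_normal((C, 8, 8))  # encoded condition, e.g. a scribble
# At initialization the control branch outputs zeros, so the frozen block's output is unchanged.
print(np.allclose(controlled_block(x, c), frozen_block(x)))  # prints True
```

Once training updates the zero-convolution weights away from zero, the condition starts to steer the output while the frozen weights stay untouched.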


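It is recommended to use this checkpoint with Stable Diffusion v1-5, since it was trained on Stable Diffusion v1-5. Experimentally, the checkpoint can also be used with other diffusion models, such as dreamboothed Stable Diffusion.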


  • Install https://github.com/patrickvonplaten/controlnet_aux
  • $ pip install controlnet_aux==0.3.0
  • Install diffusers and related packages:
  • $ pip install diffusers transformers accelerate
  • Run the code:
  • import torch
    from controlnet_aux import HEDdetector
    from diffusers import (
        ControlNetModel,
        StableDiffusionControlNetPipeline,
        UniPCMultistepScheduler,
    )
    from diffusers.utils import load_image
    checkpoint = "lllyasviel/control_v11p_sd15_scribble"
    # Input image to convert into a scribble. The original URL is not shown here;
    # "path/to/input.png" is a placeholder for your own image path or URL.
    image = load_image("path/to/input.png")
    prompt = "royal chamber with fancy bed"
    # Detect HED edges and turn them into a scribble-style control image.
    processor = HEDdetector.from_pretrained("lllyasviel/Annotators")
    control_image = processor(image, scribble=True)
    # Load the scribble ControlNet and attach it to Stable Diffusion v1-5 in half precision.
    controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    )
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.to("cuda")  # float16 inference assumes a CUDA GPU is available
    generator = torch.manual_seed(0)
    image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]
    image.save("image_out.png")
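If you want to supply your own strokes rather than deriving them from a photo with `HEDdetector`, the control image can also be rasterized directly. The sketch below assumes the checkpoint expects white strokes on a black background, like the scribble output of the preprocessor above; `draw_scribble` is a hypothetical helper, not part of diffusers or controlnet_aux. Each stroke is drawn by marking every pixel whose distance to a line segment is at most half the requested width.

```python
import numpy as np

def draw_scribble(strokes, width=3, size=512):
    """Rasterize polylines as white strokes on a black size x size canvas.

    strokes: list of polylines, each a list of (x, y) pixel coordinates.
    width:   approximate stroke width in pixels.
    Returns a uint8 array of shape (size, size): 255 on strokes, 0 elsewhere.
    """
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float64)
    canvas = np.zeros((size, size), dtype=np.uint8)
    half = width / 2.0
    for polyline in strokes:
        pts = np.asarray(polyline, dtype=np.float64)
        for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
            dx, dy = x1 - x0, y1 - y0
            seg_len2 = dx * dx + dy * dy
            if seg_len2 == 0:
                t = np.zeros_like(xs)  # degenerate segment: treat it as a single point
            else:
                # Projection of each pixel onto the segment, clamped to the segment's ends.
                t = np.clip(((xs - x0) * dx + (ys - y0) * dy) / seg_len2, 0.0, 1.0)
            # Distance from each pixel to the closest point on the segment.
            dist = np.hypot(xs - (x0 + t * dx), ys - (y0 + t * dy))
            canvas[dist <= half] = 255
    return canvas

canvas = draw_scribble([[(100, 256), (400, 256)], [(256, 100), (300, 400)]], width=5)
```

To pass the result to the pipeline, converting it with Pillow, e.g. `Image.fromarray(canvas).convert("RGB")`, should give a 3-channel image suitable for the `image=` argument.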


The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning:

| Model Name | Control Image Overview | Condition Image |
|---|---|---|
| 12320321 | Trained with canny edge detection | A monochrome image with white edges on a black background. |
| 12323321 | Trained with pixel to pixel instruction | No condition. |
| 12326321 | Trained with image inpainting | No condition. |
| 12329321 | Trained with multi-level line segment detection | An image with annotated line segments. |
| 12332321 | Trained with depth estimation | An image with depth information, usually represented as a grayscale image. |
| 12335321 | Trained with surface normal estimation | An image with surface normal information, usually represented as a color-coded image. |
| 12338321 | Trained with image segmentation | An image with segmented regions, usually represented as a color-coded image. |
| 12341321 | Trained with line art generation | An image with line art, usually black lines on a white background. |
| 12344321 | Trained with anime line art generation | An image with anime-style line art. |
| 12347321 | Trained with human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons. |
| 12350321 | Trained with scribble-based image generation | An image with scribbles, usually random or user-drawn strokes. |
| 12353321 | Trained with soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect. |
| 12356321 | Trained with image shuffling | An image with shuffled patches or regions. |
| 12359321 | Trained with image tiling | A blurry image or part of an image. |

Improvements in Scribble 1.1:

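  • The training dataset of the previous ControlNet 1.0 had several problems: (1) a small group of grayscale human images was duplicated thousands of times (!), which made the previous model somewhat likely to generate grayscale human images; (2) some images were of low quality, very blurry, or had significant JPEG artifacts; (3) because of a bug in the data processing scripts, a small group of images was paired with the wrong prompts. The new model fixes all of these dataset problems and should behave more reasonably in many cases.
  • We found that users sometimes like to draw very thick scribbles, so we used more aggressive random morphological transforms to synthesize scribbles. The model should work well even when the scribbles are relatively thick (the widest scribbles in the training data are 24 pixels wide on a 512-pixel canvas; the narrowest are 1 pixel wide).
  • Resumed from Scribble 1.0 and trained for a further 200 GPU hours on A100 80G.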
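The text does not say which morphological transforms were used. Purely as an illustration, the sketch below thickens a 1-pixel scribble by binary dilation with a randomly sized square kernel, keeping the resulting width within the 1 to 24 pixel range mentioned above. `dilate` and `random_thicken` are hypothetical helpers, not the authors' augmentation code, and a square kernel is only one possible choice (diagonal strokes come out somewhat wider than horizontal ones).

```python
import numpy as np

def dilate(mask, radius):
    """Binary dilation of a 2-D mask with a (2*radius+1) x (2*radius+1) square kernel."""
    if radius <= 0:
        return mask.copy()
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant")  # zero padding around the border
    out = np.zeros_like(mask)
    # A pixel is set if any pixel inside its window is set: take the max over all shifts.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            np.maximum(out, padded[dy:dy + h, dx:dx + w], out=out)
    return out

def random_thicken(thin_mask, rng, max_width=24):
    """Randomly thicken a 1-px scribble mask; returns (thick_mask, stroke_width)."""
    # Dilating a 1-px horizontal or vertical stroke by radius r makes it 2*r + 1 px wide,
    # so the largest radius keeps the width at or below max_width.
    max_radius = (max_width - 1) // 2
    radius = int(rng.integers(0, max_radius + 1))
    return dilate(thin_mask, radius), 2 * radius + 1

thin = np.zeros((512, 512), dtype=np.uint8)
thin[256, 100:400] = 255  # a 1-px horizontal stroke
thick, width = random_thicken(thin, np.random.default_rng(0))
```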


For more information, please also have a look at the Diffusers ControlNet Blog Post and the official docs.