Model:
lllyasviel/control_v11p_sd15_scribble
ControlNet v1.1 is the successor model of ControlNet v1.0, released in lllyasviel/ControlNet-v1-1 by Lvmin Zhang.
This checkpoint is a conversion of the original checkpoint into diffusers format. It can be used in combination with Stable Diffusion, such as runwayml/stable-diffusion-v1-5.
For more details, please also check out the 🧨 Diffusers docs.
ControlNet is a neural network structure to control diffusion models by adding extra conditions.
This checkpoint corresponds to the ControlNet conditioned on scribble images.
Developed by: Lvmin Zhang, Maneesh Agrawala
Model type: Diffusion-based text-to-image generation model
Language(s): English
License: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.
Resources for more information: GitHub Repository, Paper.
Cite as:
```
@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
ControlNet was proposed in Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang and Maneesh Agrawala.
The abstract reads as follows:
We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.
It is recommended to use the checkpoint with Stable Diffusion v1-5, as the checkpoint has been trained on it. Experimentally, the checkpoint can also be used with other diffusion models, such as dreamboothed stable diffusion; a minimal sketch of that substitution follows.
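Swapping the base model is a one-line change. Below is a minimal sketch, where `./my-dreamboothed-sd15` is a hypothetical placeholder for any Stable Diffusion v1-5-derived checkpoint, not a real repository:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble", torch_dtype=torch.float16
)
# Hypothetical placeholder: any SD v1-5-derived (e.g. dreamboothed) checkpoint
# can be substituted here while keeping the same ControlNet weights.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "./my-dreamboothed-sd15", controlnet=controlnet, torch_dtype=torch.float16
)
```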
Note: if you want to process an image to create the auxiliary conditioning, the following external dependencies are required:
$ pip install controlnet_aux==0.3.0
$ pip install diffusers transformers accelerate
```python
# Load the input image, extract a scribble map with HED, then condition
# Stable Diffusion v1-5 on it through the scribble ControlNet.
import os

import torch
from controlnet_aux import HEDdetector
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.utils import load_image

checkpoint = "lllyasviel/control_v11p_sd15_scribble"

image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/input.png"
)
prompt = "royal chamber with fancy bed"

# HED edge detector; scribble=True post-processes the edges into scribble-like strokes.
processor = HEDdetector.from_pretrained("lllyasviel/Annotators")
control_image = processor(image, scribble=True)

os.makedirs("images", exist_ok=True)  # make sure the output directory exists
control_image.save("./images/control.png")

controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

# Faster sampler and CPU offloading to reduce GPU memory usage.
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(0)
image = pipe(
    prompt, num_inference_steps=30, generator=generator, image=control_image
).images[0]
image.save("./images/image_out.png")
```
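The example above uses HEDdetector as the scribble preprocessor. To our understanding, controlnet_aux also ships a PidiNetDetector that accepts the same `scribble=True` flag and can be swapped in as an alternative preprocessor; the sketch below assumes that, and the output filename is our own choice:

```python
from controlnet_aux import PidiNetDetector
from diffusers.utils import load_image

image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/input.png"
)

# PidiNet edge detector; scribble=True again converts the edges into scribble strokes.
processor = PidiNetDetector.from_pretrained("lllyasviel/Annotators")
control_image = processor(image, scribble=True)
control_image.save("./images/control_pidi.png")
```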
The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning (a loading sketch follows the table):
Model Name | Control Image Overview | Condition Image |
---|---|---|
lllyasviel/control_v11p_sd15_canny | Trained with canny edge detection | A monochrome image with white edges on a black background. |
lllyasviel/control_v11e_sd15_ip2p | Trained with pixel to pixel instruction | No condition. |
lllyasviel/control_v11p_sd15_inpaint | Trained with image inpainting | No condition. |
lllyasviel/control_v11p_sd15_mlsd | Trained with multi-level line segment detection | An image with annotated line segments. |
lllyasviel/control_v11f1p_sd15_depth | Trained with depth estimation | An image with depth information, usually represented as a grayscale image. |
lllyasviel/control_v11p_sd15_normalbae | Trained with surface normal estimation | An image with surface normal information, usually represented as a color-coded image. |
lllyasviel/control_v11p_sd15_seg | Trained with image segmentation | An image with segmented regions, usually represented as a color-coded image. |
lllyasviel/control_v11p_sd15_lineart | Trained with line art generation | An image with line art, usually black lines on a white background. |
lllyasviel/control_v11p_sd15s2_lineart_anime | Trained with anime line art generation | An image with anime-style line art. |
lllyasviel/control_v11p_sd15_openpose | Trained with human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons. |
lllyasviel/control_v11p_sd15_scribble | Trained with scribble-based image generation | An image with scribbles, usually random or user-drawn strokes. |
lllyasviel/control_v11p_sd15_softedge | Trained with soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect. |
lllyasviel/control_v11e_sd15_shuffle | Trained with image shuffling | An image with shuffled patches or regions. |
lllyasviel/control_v11f1e_sd15_tile | Trained with image tiling | A blurry image or part of an image. |
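All of these checkpoints share the same loading API; only the repository ID (and the matching preprocessor for the condition image) changes. A minimal sketch using the canny checkpoint from the table:

```python
import torch
from diffusers import ControlNetModel

# Any checkpoint from the table loads the same way; only the repo ID differs.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
```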
For more information, please also check out the Diffusers ControlNet Blog Post, and have a look at the official docs.