Model:

lllyasviel/control_v11p_sd15_seg

Task:

Image-to-Image

Library:

Diffusers

Other:

art controlnet stable-diffusion controlnet-v1-1

Preprint:

arxiv:2302.05543

License:

openrail

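ControlNet - v1.1 - seg Version

ControlNet v1.1 is the successor to ControlNet v1.0 and was released in lllyasviel/ControlNet-v1-1 by Lvmin Zhang.

This checkpoint is a conversion of the original checkpoint into the diffusers format. It can be used in combination with Stable Diffusion, such as runwayml/stable-diffusion-v1-5.

For more details, see the Diffusers docs.

ControlNet is a neural network structure that controls diffusion models by adding extra conditions.

This checkpoint corresponds to the ControlNet conditioned on segmentation (seg) images.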

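Model Details

Developed by: Lvmin Zhang, Maneesh Agrawala
Model type: Diffusion-based text-to-image generation model
Language(s): English
License: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license, on which our license is based.
Resources for more information: GitHub Repository, Paper.
Cite as: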

@misc{zhang2023adding, title={Adding Conditional Control to Text-to-Image Diffusion Models}, author={Lvmin Zhang and Maneesh Agrawala}, year={2023}, eprint={2302.05543}, archivePrefix={arXiv}, primaryClass={cs.CV}}

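Introduction

ControlNet was proposed in Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang and Maneesh Agrawala.

The abstract reads as follows:

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.

Example

It is recommended to use this checkpoint with Stable Diffusion v1-5, since it was trained on that model. Experimentally, the checkpoint can also be used with other diffusion models, such as a DreamBooth-tuned Stable Diffusion.

Note: If you want to process an image to create the auxiliary conditioning, the external dependencies shown below are required:

Install diffusers and related packages: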

$ pip install diffusers transformers accelerate
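
Before running the longer example, it can help to confirm that the packages from the install command are actually present. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these libraries. It checks only the three packages named in the command, not torch, numpy, or Pillow, which the later code also imports.

```python
from importlib import metadata

# Packages named in the install command above.
REQUIRED = ("diffusers", "transformers", "accelerate")


def installed_versions(packages=REQUIRED):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as not installed should be installed before continuing.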

Define the color table needed later:

import numpy as np

ada_palette = np.asarray([
      [0, 0, 0],
      [120, 120, 120],
      [180, 120, 120],
      [6, 230, 230],
      [80, 50, 50],
      [4, 200, 3],
      [120, 120, 80],
      [140, 140, 140],
      [204, 5, 255],
      [230, 230, 230],
      [4, 250, 7],
      [224, 5, 255],
      [235, 255, 7],
      [150, 5, 61],
      [120, 120, 70],
      [8, 255, 51],
      [255, 6, 82],
      [143, 255, 140],
      [204, 255, 4],
      [255, 51, 7],
      [204, 70, 3],
      [0, 102, 200],
      [61, 230, 250],
      [255, 6, 51],
      [11, 102, 255],
      [255, 7, 71],
      [255, 9, 224],
      [9, 7, 230],
      [220, 220, 220],
      [255, 9, 92],
      [112, 9, 255],
      [8, 255, 214],
      [7, 255, 224],
      [255, 184, 6],
      [10, 255, 71],
      [255, 41, 10],
      [7, 255, 255],
      [224, 255, 8],
      [102, 8, 255],
      [255, 61, 6],
      [255, 194, 7],
      [255, 122, 8],
      [0, 255, 20],
      [255, 8, 41],
      [255, 5, 153],
      [6, 51, 255],
      [235, 12, 255],
      [160, 150, 20],
      [0, 163, 255],
      [140, 140, 140],
      [250, 10, 15],
      [20, 255, 0],
      [31, 255, 0],
      [255, 31, 0],
      [255, 224, 0],
      [153, 255, 0],
      [0, 0, 255],
      [255, 71, 0],
      [0, 235, 255],
      [0, 173, 255],
      [31, 0, 255],
      [11, 200, 200],
      [255, 82, 0],
      [0, 255, 245],
      [0, 61, 255],
      [0, 255, 112],
      [0, 255, 133],
      [255, 0, 0],
      [255, 163, 0],
      [255, 102, 0],
      [194, 255, 0],
      [0, 143, 255],
      [51, 255, 0],
      [0, 82, 255],
      [0, 255, 41],
      [0, 255, 173],
      [10, 0, 255],
      [173, 255, 0],
      [0, 255, 153],
      [255, 92, 0],
      [255, 0, 255],
      [255, 0, 245],
      [255, 0, 102],
      [255, 173, 0],
      [255, 0, 20],
      [255, 184, 184],
      [0, 31, 255],
      [0, 255, 61],
      [0, 71, 255],
      [255, 0, 204],
      [0, 255, 194],
      [0, 255, 82],
      [0, 10, 255],
      [0, 112, 255],
      [51, 0, 255],
      [0, 194, 255],
      [0, 122, 255],
      [0, 255, 163],
      [255, 153, 0],
      [0, 255, 10],
      [255, 112, 0],
      [143, 255, 0],
      [82, 0, 255],
      [163, 255, 0],
      [255, 235, 0],
      [8, 184, 170],
      [133, 0, 255],
      [0, 255, 92],
      [184, 0, 255],
      [255, 0, 31],
      [0, 184, 255],
      [0, 214, 255],
      [255, 0, 112],
      [92, 255, 0],
      [0, 224, 255],
      [112, 224, 255],
      [70, 184, 160],
      [163, 0, 255],
      [153, 0, 255],
      [71, 255, 0],
      [255, 0, 163],
      [255, 204, 0],
      [255, 0, 143],
      [0, 255, 235],
      [133, 255, 0],
      [255, 0, 235],
      [245, 0, 255],
      [255, 0, 122],
      [255, 245, 0],
      [10, 190, 212],
      [214, 255, 0],
      [0, 204, 255],
      [20, 0, 255],
      [255, 255, 0],
      [0, 153, 255],
      [0, 41, 255],
      [0, 255, 204],
      [41, 0, 255],
      [41, 255, 0],
      [173, 0, 255],
      [0, 245, 255],
      [71, 0, 255],
      [122, 0, 255],
      [0, 255, 184],
      [0, 92, 255],
      [184, 255, 0],
      [0, 133, 255],
      [255, 214, 0],
      [25, 194, 194],
      [102, 255, 0],
      [92, 0, 255],
  ])
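
A color table like this maps each class index to one RGB row, so it is worth knowing whether any color is reused: a repeated color means two classes look identical in the control image. The sketch below is an illustration only; `duplicate_colors` is a hypothetical helper, and the small palette is made up for the demonstration.

```python
import numpy as np


def duplicate_colors(palette):
    """Return the RGB rows that occur more than once in `palette`."""
    rows, counts = np.unique(np.asarray(palette), axis=0, return_counts=True)
    return [tuple(row) for row in rows[counts > 1].tolist()]


# Hypothetical 4-row palette, for illustration only.
toy_palette = [[0, 0, 0], [140, 140, 140], [4, 200, 3], [140, 140, 140]]
toy_duplicates = duplicate_colors(toy_palette)
```

Running `duplicate_colors(ada_palette)` on the table above would report any repeated entries; in that table, `[140, 140, 140]` is listed twice.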

Run the code:

import os

import numpy as np
import torch
from diffusers.utils import load_image
from PIL import Image
from transformers import AutoImageProcessor, UperNetForSemanticSegmentation

from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)

image_processor = AutoImageProcessor.from_pretrained("openmmlab/upernet-convnext-small")
image_segmentor = UperNetForSemanticSegmentation.from_pretrained("openmmlab/upernet-convnext-small")

checkpoint = "lllyasviel/control_v11p_sd15_seg"

image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/input.png"
)

prompt = "old house in stormy weather with rain and wind"

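# Run the segmentation model on the input image (no gradients needed).
pixel_values = image_processor(image, return_tensors="pt").pixel_values
with torch.no_grad():
    outputs = image_segmentor(pixel_values)
# Resize the prediction to the input size; PIL's size is (width, height), so reverse it.
seg = image_processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]

# Paint each class label with its palette color to build the control image.
color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8)  # height, width, 3
for label, color in enumerate(ada_palette):
    color_seg[seg == label, :] = color
control_image = Image.fromarray(color_seg)

# Create the output directory first; saving fails if it does not exist.
os.makedirs("images", exist_ok=True)
control_image.save("./images/control.png")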

controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]

image.save('images/image_out.png')
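
The loop in the example above colors the segmentation map one label at a time. The same label-to-color mapping can be sketched as a single NumPy indexing operation. This is an alternative formulation, not the method used by the original code; `colorize` is a hypothetical helper, and the small palette and label map are made up for illustration.

```python
import numpy as np


def colorize(seg, palette):
    """Map an (H, W) array of integer class labels to an (H, W, 3) uint8 color image."""
    seg = np.asarray(seg)
    palette = np.asarray(palette, dtype=np.uint8)
    if seg.min() < 0 or seg.max() >= len(palette):
        raise ValueError("label outside the palette range")
    # Integer-array indexing: each pixel's label selects one palette row.
    return palette[seg]


# Hypothetical 3-color palette and 2x2 label map, for illustration only.
toy_palette = [[0, 0, 0], [120, 120, 120], [180, 120, 120]]
toy_seg = [[0, 1], [2, 1]]
toy_color = colorize(toy_seg, toy_palette)
```

Assuming `seg` is a CPU tensor as in the example above, the loop could be replaced with something like `color_seg = colorize(seg.cpu().numpy(), ada_palette)`. One difference in behavior: the loop leaves pixels with labels beyond the palette black, whereas this sketch raises an error.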

Other released checkpoints v1-1

The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning:


Model Name	Control Image Overview	Condition Image
12319321	Trained with canny edge detection	A monochrome image with white edges on a black background.
12322321	Trained with pixel to pixel instruction	No condition.
12325321	Trained with image inpainting	No condition.
12328321	Trained with multi-level line segment detection	An image with annotated line segments.
12331321	Trained with depth estimation	An image with depth information, usually represented as a grayscale image.
12334321	Trained with surface normal estimation	An image with surface normal information, usually represented as a color-coded image.
12337321	Trained with image segmentation	An image with segmented regions, usually represented as a color-coded image.
12340321	Trained with line art generation	An image with line art, usually black lines on a white background.
12343321	Trained with anime line art generation	An image with anime-style line art.
12346321	Trained with human pose estimation	An image with human poses, usually represented as a set of keypoints or skeletons.
12349321	Trained with scribble-based image generation	An image with scribbles, usually random or user-drawn strokes.
12352321	Trained with soft edge image generation	An image with soft edges, usually to create a more painterly or artistic effect.
12355321	Trained with image shuffling	An image with shuffled patches or regions.
12358321	Trained with image tiling	A blurry image or part of an image.

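Improvements in Segmentation 1.1:

The COCO protocol is supported. The previous Segmentation 1.0 supports about 150 colors; Segmentation 1.1 additionally supports another 182 colors from COCO.
Resumed from Segmentation 1.0. All previous inputs should still work.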
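
Because the model is conditioned on specific palette colors, a hand-drawn or edited control image may contain colors that belong to no class. The sketch below shows one way to detect them. It is only an illustration under stated assumptions: `unknown_colors` is a hypothetical helper, the input is assumed to be an RGB image without an alpha channel, and the palette and image are made up (the COCO color list is not reproduced in this document).

```python
import numpy as np


def unknown_colors(image, palette):
    """Return the distinct RGB colors in `image` that do not appear in `palette`."""
    pixels = np.asarray(image, dtype=np.uint8).reshape(-1, 3)
    known = {tuple(c) for c in np.asarray(palette, dtype=np.uint8).tolist()}
    present = {tuple(c) for c in np.unique(pixels, axis=0).tolist()}
    return sorted(present - known)


# Hypothetical palette and 2x2 RGB image, for illustration only.
toy_palette = [[0, 0, 0], [120, 120, 120]]
toy_image = [
    [[0, 0, 0], [120, 120, 120]],
    [[1, 2, 3], [0, 0, 0]],
]
toy_unknown = unknown_colors(toy_image, toy_palette)
```

An empty result means every pixel uses a palette color. Anti-aliased edges or JPEG compression typically introduce off-palette colors, so control images are best stored in a lossless format such as PNG.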
