Model:
lllyasviel/control_v11p_sd15_openpose
ControlNet v1.1 is the successor model of ControlNet v1.0, released in lllyasviel/ControlNet-v1-1 by Lvmin Zhang.
This checkpoint is a conversion of the original checkpoint into the diffusers format. It can be used in combination with Stable Diffusion, e.g. runwayml/stable-diffusion-v1-5.
For more details, please refer to the 🧨 Diffusers docs.
ControlNet is a neural network structure that controls diffusion models by adding extra conditions.
This checkpoint corresponds to the ControlNet conditioned on openpose images.
Developed by: Lvmin Zhang, Maneesh Agrawala
Model type: Diffusion-based text-to-image generation model
Language(s): English
License: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.
Resources for more information: GitHub Repository, Paper
Cite as:
```
@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
ControlNet was proposed by Lvmin Zhang and Maneesh Agrawala in Adding Conditional Control to Text-to-Image Diffusion Models.
The abstract reads as follows:
We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.
It is recommended to use this checkpoint with Stable Diffusion v1-5, as the checkpoint was trained on it. Experimentally, the checkpoint can also be used with other diffusion models, such as a dreamboothed Stable Diffusion; a variant sketch follows the code example below.
Note: if you want to process an image to create the auxiliary conditioning, external dependencies are required, installed as shown below:
```sh
$ pip install controlnet_aux==0.3.0
$ pip install diffusers transformers accelerate
```
```py
import torch
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)

checkpoint = "lllyasviel/control_v11p_sd15_openpose"

# Load the input image and set the prompt
image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/input.png"
)
prompt = "chef in the kitchen"

# Extract an openpose skeleton (including hands and face) to use as the control image
processor = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
control_image = processor(image, hand_and_face=True)
control_image.save("./images/control.png")

# Load the ControlNet and plug it into a Stable Diffusion v1-5 pipeline
controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# Generate an image conditioned on the extracted pose
generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]
image.save('images/image_out.png')
```
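As noted above, the checkpoint can experimentally be driven by other diffusion models such as a dreamboothed Stable Diffusion. The following is a minimal sketch of that variant, not part of the original card: the repo id `sd-dreambooth-library/herge-style` is only an illustrative community DreamBooth checkpoint, and any Stable Diffusion v1-5 derived model should slot in the same way.

```py
import torch
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)

# Same openpose preprocessing as in the example above
image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/input.png"
)
control_image = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")(
    image, hand_and_face=True
)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)

# Assumption: an illustrative dreamboothed SD v1-5 checkpoint; substitute your own
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "sd-dreambooth-library/herge-style",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# Include the model's DreamBooth trigger token in the prompt if it has one;
# the pose is preserved while the dreamboothed style drives the appearance.
generator = torch.manual_seed(0)
image = pipe(
    "chef in the kitchen", num_inference_steps=30, generator=generator, image=control_image
).images[0]
```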
The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning:
| Model Name | Trained with | Control Image Overview |
|---|---|---|
| lllyasviel/control_v11p_sd15_canny | Canny edge detection | A monochrome image with white edges on a black background. |
| lllyasviel/control_v11e_sd15_ip2p | Pixel-to-pixel instruction | No condition. |
| lllyasviel/control_v11p_sd15_inpaint | Image inpainting | No condition. |
| lllyasviel/control_v11p_sd15_mlsd | Multi-level line segment detection | An image with annotated line segments. |
| lllyasviel/control_v11f1p_sd15_depth | Depth estimation | An image with depth information, usually represented as a grayscale image. |
| lllyasviel/control_v11p_sd15_normalbae | Surface normal estimation | An image with surface normal information, usually represented as a color-coded image. |
| lllyasviel/control_v11p_sd15_seg | Image segmentation | An image with segmented regions, usually represented as a color-coded image. |
| lllyasviel/control_v11p_sd15_lineart | Line art generation | An image with line art, usually black lines on a white background. |
| lllyasviel/control_v11p_sd15s2_lineart_anime | Anime line art generation | An image with anime-style line art. |
| lllyasviel/control_v11p_sd15_openpose | Human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons. |
| lllyasviel/control_v11p_sd15_scribble | Scribble-based image generation | An image with scribbles, usually random or user-drawn strokes. |
| lllyasviel/control_v11p_sd15_softedge | Soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect. |
| lllyasviel/control_v11e_sd15_shuffle | Image shuffling | An image with shuffled patches or regions. |
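Every checkpoint in the table loads the same way: only the repo id and the preprocessor that produces the control image change. Below is a hedged sketch, assuming the canny checkpoint from the table and the `CannyDetector` helper shipped with controlnet_aux; check each model card for the recommended preprocessor.

```py
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Same pipeline wiring as the openpose example, with the canny checkpoint
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/input.png"
)
# Canny preprocessing yields white edges on a black background, matching the
# "Control Image Overview" for this checkpoint in the table above
control_image = CannyDetector()(image)

image_out = pipe(
    "chef in the kitchen", image=control_image, num_inference_steps=30
).images[0]
```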
For more information, please refer to the Diffusers ControlNet Blog Post and check out the official docs.