Latent Diffusion Models (LDM) for Super-Resolution

Paper: High-Resolution Image Synthesis with Latent Diffusion Models

Abstract:

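By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days, and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes, and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.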
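The central idea of the abstract, running diffusion on a compressed latent instead of on pixels, can be illustrated with a toy sketch. It assumes numpy and uses 4x4 average pooling as a hypothetical stand-in for the pretrained encoder (the real model uses a learned autoencoder, not pooling). The noising step is the standard DDPM forward process q(z_t | z_0) = N(sqrt(alpha_bar_t) z_0, (1 - alpha_bar_t) I) with a linear beta schedule; the schedule endpoints and the helper names are illustrative assumptions, not values taken from this checkpoint.

```python
import numpy as np

def toy_encoder(image: np.ndarray, factor: int = 4) -> np.ndarray:
    """Hypothetical stand-in for the pretrained encoder: average-pool an
    (H, W, C) image by `factor` in each spatial dimension."""
    h, w, c = image.shape
    assert h % factor == 0 and w % factor == 0, "size must be divisible by factor"
    return image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s) for a linear beta schedule (assumed endpoints)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(z0: np.ndarray, t: int, alpha_bars: np.ndarray, rng: np.random.Generator):
    """Sample z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I); also return the noise."""
    eps = rng.standard_normal(z0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps, eps

rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(512, 512, 3))  # toy "pixel-space" image
z0 = toy_encoder(image)                              # (128, 128, 3) toy latent
alpha_bars = linear_alpha_bars()
zt, eps = forward_noise(z0, t=500, alpha_bars=alpha_bars, rng=rng)

# Each denoising step now operates on 16x fewer values than in pixel space.
print(image.size / z0.size)  # 16.0
```

The 16x figure is only the element-count reduction of this toy setup; the actual savings of an LDM depend on the autoencoder's downsampling factor and latent channel count.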

Authors

Robin Rombach、Andreas Blattmann、Dominik Lorenz、Patrick Esser、Björn Ommer

Usage

Inference with a pipeline

!pip install git+https://github.com/huggingface/diffusers.git

import requests
from PIL import Image
from io import BytesIO
from diffusers import LDMSuperResolutionPipeline
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "CompVis/ldm-super-resolution-4x-openimages"

# load model and scheduler
pipeline = LDMSuperResolutionPipeline.from_pretrained(model_id)
pipeline = pipeline.to(device)

# download a sample low-resolution image
url = "https://user-images.githubusercontent.com/38061659/199705896-b48e17b8-b231-47cd-a270-4ffa5a93fa3e.png"
response = requests.get(url)
response.raise_for_status()  # fail early if the download did not succeed
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))  # the 4x model returns a 512x512 image

# run pipeline in inference (sample random noise and denoise)
upscaled_image = pipeline(low_res_img, num_inference_steps=100, eta=1).images[0]
# save image
upscaled_image.save("ldm_generated_image.png")
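To make the `eta` argument and the super-resolution conditioning more concrete, here is a toy, numpy-only sketch of a sampling loop in the spirit of the pipeline. It assumes (a) the low-resolution image is scaled to [-1, 1], put in channel-first layout, and concatenated with the noisy latent along the channel axis before each denoiser call, which is our understanding of how the diffusers super-resolution pipeline conditions its UNet, and (b) a DDIM update, in which `eta=0` is deterministic and `eta=1` injects DDPM-like noise. The helpers `preprocess`, `ddim_step` and `sample`, the dummy denoiser, the schedule and the 3-channel latent are all hypothetical; the real pipeline uses the checkpoint's UNet, scheduler and VQ-VAE decoder, and the decoding step that produces the 4x larger image is omitted.

```python
import numpy as np

def preprocess(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) uint8 image to a (3, H, W) float array in [-1, 1]."""
    x = image_hwc_uint8.astype(np.float32) / 255.0
    return 2.0 * x.transpose(2, 0, 1) - 1.0

def ddim_step(x_t, eps_pred, a_t, a_prev, eta, rng):
    """One DDIM update from cumulative alpha a_t to a_prev (eta=0: deterministic)."""
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    sigma = eta * np.sqrt((1.0 - a_prev) / (1.0 - a_t)) * np.sqrt(1.0 - a_t / a_prev)
    # Clamp at 0 to guard against tiny negative values from floating-point error.
    direction = np.sqrt(max(1.0 - a_prev - sigma**2, 0.0)) * eps_pred
    noise = sigma * rng.standard_normal(x_t.shape) if eta > 0 else 0.0
    return np.sqrt(a_prev) * x0_pred + direction + noise

def sample(denoiser, low_res, alpha_bars, timesteps, eta=1.0, seed=0):
    """Toy conditional sampler; the latent has the same spatial size as `low_res`."""
    rng = np.random.default_rng(seed)
    latents = rng.standard_normal(low_res.shape)
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # After the last timestep, step to a_prev = 1 (fully denoised).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        model_input = np.concatenate([latents, low_res], axis=0)  # channel concat
        eps_pred = denoiser(model_input, t)
        latents = ddim_step(latents, eps_pred, a_t, a_prev, eta, rng)
    return latents  # the real pipeline would decode these with the VQ-VAE

alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
timesteps = list(range(999, -1, -100))  # 10 steps: 999, 899, ..., 99
low_res = preprocess(np.zeros((32, 32, 3), dtype=np.uint8))

# Hypothetical dummy denoiser: predicts zero noise for the 3 latent channels.
def dummy_denoiser(x, t):
    return np.zeros((3,) + x.shape[1:])

out = sample(dummy_denoiser, low_res, alpha_bars, timesteps, eta=1.0)
print(out.shape)  # (3, 32, 32)
```

With `eta=1`, as in the usage example above, every intermediate step adds fresh Gaussian noise, so results vary with the random seed; `eta=0` makes the loop deterministic given the initial latents.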

Author:

CompVis

Dataset size:

1.26 GB