Image-to-Image Translation with FLUX.1: Methods and Applications

Published by alex on October 17, 2024

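This article walks you through generating a new image from an existing image and a text prompt.

First, we briefly cover how latent diffusion models work. Then we look at how SDEdit modifies the backward diffusion process so that an image can be edited according to a text prompt.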

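Background: Latent Diffusion

Latent diffusion runs the diffusion process in a lower-dimensional latent space. Let's define that latent space:

A variational autoencoder (VAE) projects the image from pixel space (the RGB-height-width representation that humans understand) into a smaller latent space. This compression keeps enough information to reconstruct the image later. The diffusion process operates in this latent space because it is computationally cheaper and less sensitive to irrelevant pixel-space details.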
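To get a feel for how much smaller the latent space is, here is a small arithmetic sketch. The downsampling factor of 8 and the 4 latent channels are illustrative assumptions (values commonly cited for the Stable Diffusion VAE), and the helper names are hypothetical; other models, including FLUX.1, may use different values.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for an RGB image.

    downsample and latent_channels are illustrative assumptions,
    not values read from any particular checkpoint.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """How many pixel-space values correspond to one latent value."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (3 * height * width) / (c * h * w)


shape = latent_shape(512, 512)
ratio = compression_ratio(512, 512)
print(shape, ratio)
```

Under these assumptions, every diffusion step touches far fewer numbers than it would in pixel space, which is where the cost saving comes from.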

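Now, let's explain latent diffusion:

The diffusion process has two parts:

Forward diffusion: a scheduled, non-learned process that turns a natural image into pure noise over multiple steps.
Backward diffusion: a learned process that reconstructs a natural-looking image from pure noise.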

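During forward diffusion, noise is added to the latent space following a specific schedule, progressing from weak to strong. Compared with one-shot generation methods such as GANs, this multi-step approach simplifies the network's task. The backward process is learned through likelihood maximization, which is easier to optimize than an adversarial loss.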
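The weak-to-strong schedule can be made concrete with a toy example. This is a minimal sketch that assumes a DDPM-style linear variance schedule and uses only the standard library; the function names and the schedule constants are illustrative assumptions, and FLUX.1 itself uses a different (flow-matching) formulation.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, growing linearly from weak to strong."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def signal_fraction(betas):
    """Cumulative product of (1 - beta): how much original signal survives."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent, t, alpha_bars, rng):
    """Jump straight to step t: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    a_bar = alpha_bars[t]
    return [
        math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
        for x in latent
    ]


rng = random.Random(0)
alpha_bars = signal_fraction(linear_beta_schedule(1000))
latent = [1.0, -0.5, 0.25, 0.0]  # toy stand-in for a flattened latent
slightly_noisy = add_noise(latent, 10, alpha_bars, rng)
almost_pure_noise = add_noise(latent, 999, alpha_bars, rng)
print(alpha_bars[10], alpha_bars[999])
```

The surviving signal fraction shrinks monotonically with the step index, so early steps barely disturb the latent while the last steps leave almost nothing but noise.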

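Text Conditioning

Generation is also conditioned on extra information such as text, which is the prompt you might give to a Stable Diffusion or FLUX.1 model. While the diffusion model learns the backward process, this text is included as a "hint". The text is encoded with a model such as CLIP or T5 and fed to the UNet or Transformer to guide it toward the right original image that was perturbed by noise.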

SDEdit

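The idea behind SDEdit is simple: instead of starting the backward process from pure random noise (step 1 of the standard backward process), it starts from the input image plus scaled random noise, and then runs the regular backward diffusion process. The steps are as follows:

Load the input image and preprocess it for the VAE.
Run it through the VAE and sample one output (the VAE returns a distribution, so we need to sample to get one instance of it).
Pick a starting step t_i of the backward diffusion process.
Sample some noise scaled to the level of t_i and add it to the latent image representation.
Start the backward diffusion process from t_i using the noisy latent image and the prompt.
Project the result back to pixel space using the VAE.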
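The third and fourth steps above (picking t_i and noising the latent to that level) can be sketched with plain Python lists. The blending rule, the evenly spaced noise levels, and the function name `sdedit_start` are assumptions chosen for illustration in a flow-matching style; this is not the pipeline's actual code.

```python
import random


def sdedit_start(latent, sigmas, start_index, rng):
    """Pick a start step and noise the image latent to that level.

    sigmas runs from 1.0 (pure noise) down to 0.0 (clean image).
    The blend (1 - sigma) * latent + sigma * noise is an assumed
    flow-matching-style noising rule, used here for illustration.
    """
    sigma = sigmas[start_index]
    noisy = [(1.0 - sigma) * x + sigma * rng.gauss(0.0, 1.0) for x in latent]
    remaining = sigmas[start_index:]  # the backward process only visits these
    return noisy, remaining


num_steps = 10
sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]  # 1.0 down to 0.0
latent = [0.8, -0.3, 0.1]  # toy stand-in for the sampled VAE latent

rng = random.Random(0)
noisy_latent, remaining = sdedit_start(latent, sigmas, start_index=3, rng=rng)
print(remaining[0], len(remaining) - 1)  # starting noise level, steps left
```

Starting at index 0 reduces to ordinary text-to-image generation, since the input latent is fully replaced by noise; starting later keeps more of the input image and leaves fewer denoising steps to change it.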

Code

Here is how to run this workflow with the Hugging Face diffusers library:

First, install the dependencies:

pip install git+https://github.com/huggingface/diffusers.git optimum-quanto

For now, you need to install diffusers from source, because this feature is not yet available on PyPI.

Next, load the FluxImg2Img pipeline:

import io
import os

import requests
import torch
from diffusers import FluxImg2ImgPipeline
from optimum.quanto import freeze, qint4, qint8, quantize
from PIL import Image

# Model to load; override with the MODEL_PATH environment variable.
MODEL_PATH = os.getenv("MODEL_PATH", "black-forest-labs/FLUX.1-dev")
pipeline = FluxImg2ImgPipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)

# Quantize both text encoders to 4-bit weights, then freeze them.
quantize(pipeline.text_encoder, weights=qint4, exclude="proj_out")
freeze(pipeline.text_encoder)
quantize(pipeline.text_encoder_2, weights=qint4, exclude="proj_out")
freeze(pipeline.text_encoder_2)

# Quantize the transformer to 8-bit weights, then freeze it.
quantize(pipeline.transformer, weights=qint8, exclude="proj_out")
freeze(pipeline.transformer)

pipeline = pipeline.to("cuda")
# Fixed seed so that runs are reproducible.
generator = torch.Generator(device="cuda").manual_seed(100)

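This code loads the pipeline and quantizes parts of it so that it fits on the L4 GPU available on Colab.

Now, let's define a utility function that loads an image at the right size without distortion: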

def resize_image_center_crop(image_path_or_url, target_width, target_height):
    """
    Resizes an image while maintaining aspect ratio using center cropping.
    Handles both local file paths and URLs.
    Args:
        image_path_or_url: Path to the image file or URL.
        target_width: Desired width of the output image.
        target_height: Desired height of the output image.
    Returns:
        A PIL Image object with the resized image, or None if there's an error.
    """
    try:
        if image_path_or_url.startswith(('http://', 'https://')):  # Check if it's a URL
            response = requests.get(image_path_or_url, timeout=30)  # Avoid hanging forever on a stalled download
            response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
            img = Image.open(io.BytesIO(response.content))
        else:  # Assume it's a local file path
            img = Image.open(image_path_or_url)
        img_width, img_height = img.size
        # Calculate aspect ratios
        aspect_ratio_img = img_width / img_height
        aspect_ratio_target = target_width / target_height
        # Determine cropping box
        if aspect_ratio_img > aspect_ratio_target:  # Image is wider than target
            new_width = int(img_height * aspect_ratio_target)
            left = (img_width - new_width) // 2
            right = left + new_width
            top = 0
            bottom = img_height
        else:  # Image is taller or equal to target
            new_height = int(img_width / aspect_ratio_target)
            left = 0
            right = img_width
            top = (img_height - new_height) // 2
            bottom = top + new_height
        # Crop the image
        cropped_img = img.crop((left, top, right, bottom))
        # Resize to target dimensions
        resized_img = cropped_img.resize((target_width, target_height), Image.LANCZOS)
        return resized_img
    except (FileNotFoundError, requests.exceptions.RequestException, IOError) as e:
        print(f"Error: Could not open or process image from '{image_path_or_url}'.  Error: {e}")
        return None
    except Exception as e: #Catch other potential exceptions during image processing.
        print(f"An unexpected error occurred: {e}")
        return None

Finally, let's load the image and run the pipeline:

# Download the source photo and center-crop it to 1024x1024.
url = "https://images.unsplash.com/photo-1609665558965-8e4c789cd7c5?ixlib=rb-4.0.3&q=85&fm=jpg&crop=entropy&cs=srgb&dl=sven-mieke-G-8B32scqMc-unsplash.jpg"
image = resize_image_center_crop(image_path_or_url=url, target_width=1024, target_height=1024)
prompt = "A picture of a Tiger"
# strength=0.9 adds a lot of noise, so the result can differ substantially from the input.
image2 = pipeline(prompt, image=image, guidance_scale=3.5, generator=generator, height=1024, width=1024, num_inference_steps=28, strength=0.9).images[0]

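This transforms the following image:

[Image: input photo]

into this one:

[Image: generated result]

You can see that the cat in the result has a pose and shape similar to the original, but the carpet is a different color. This shows that the model followed the structure of the original image while taking some liberties to better match the text prompt.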

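Two parameters matter here:

Number of inference steps (num_inference_steps): the number of denoising steps in the backward diffusion process. More steps generally mean better quality but a longer generation time.
Strength (strength): controls how much noise is added, or equivalently how far back in the diffusion process you start. A smaller value means smaller changes to the input image; a larger value means bigger changes.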
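The two parameters interact: the strength decides how much of the step schedule is actually run. The sketch below shows one plausible mapping, modeled on how I understand diffusers image-to-image pipelines to behave. Treat the exact rounding and the helper name `steps_actually_run` as assumptions rather than the library's documented API.

```python
def steps_actually_run(num_inference_steps, strength):
    """Return (first_step_index, number_of_denoising_steps) for an img2img run.

    Assumed mapping, for illustration: skip the first (1 - strength) share
    of the schedule, rounding the skipped count down.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_steps = min(num_inference_steps * strength, num_inference_steps)
    first_step = int(max(num_inference_steps - init_steps, 0))
    return first_step, num_inference_steps - first_step


for strength in (0.3, 0.6, 0.9, 1.0):
    print(strength, steps_actually_run(28, strength))
```

Under this assumed mapping, 28 steps at a strength of 0.9 skips the first 2 steps and runs the remaining 26, while a strength of 1.0 runs all 28 and effectively ignores the input image.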

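Conclusion

You now know how image-to-image latent diffusion works and how to run it in Python. In my tests, the results of this approach can be hit-and-miss: I usually need to adjust the number of steps, the strength, and the prompt to get the output to follow the prompt more closely.

Source: https://medium.com/towards-data-science/image-to-image-translation-with-flux-1-intuition-and-tutorial-001fc521ebe6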
