Dataset:

THUDM/ImageRewardDB

Task:

Text-to-Image

Size:

100K<n<1M

Preprint:

arxiv:2304.05977

License:

apache-2.0

ImageRewardDB

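Dataset Summary

ImageRewardDB is a comprehensive text-to-image comparison dataset focused on human preferences in text-to-image generation. It consists of 137K pairs of expert comparisons, based on text prompts and the corresponding model outputs from DiffusionDB. To build ImageRewardDB, we designed a pipeline tailored to it: we established criteria for quantitative assessment and annotator training, optimized the labeling experience, and ensured quality validation. ImageRewardDB is now publicly available on Hugging Face Datasets.

Note: All images in ImageRewardDB are collected from DiffusionDB. In addition, we grouped together the images that correspond to the same prompt.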

Languages

All text in the dataset is in English.

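Four Subsets

Because ImageRewardDB contains a large number of images, we provide four subsets at different scales to support different needs. The validation and test splits are the same for all subsets. The validation split (1.10GB) contains 412 prompts and 2.6K images (7.32K pairs), and the test split (1.16GB) contains 466 prompts and 2.7K images (7.23K pairs). The train split at each scale is as follows: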

Subset	Num of Pairs	Num of Images	Num of Prompts	Size
ImageRewardDB 1K	17.6K	6.2K	1K	2.7GB
ImageRewardDB 2K	35.5K	12.5K	2K	5.5GB
ImageRewardDB 4K	71.0K	25.1K	4K	10.8GB
ImageRewardDB 8K	141.1K	49.9K	8K	20.9GB

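Dataset Structure

All the data in this repository is stored in a well-organized way. The 62.6K images in ImageRewardDB are divided into several folders, stored in the corresponding directories under "./images" according to their split. Each folder contains around 500 prompts, their corresponding images, and a JSON file. The JSON file links each image to its prompt and annotations. The file structure is as follows: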

# ImageRewardDB
./
├── images
│   ├── train
│   │   ├── train_1
│   │   │   ├── 0a1ed3a5-04f6-4a1b-aee6-d584e7c8ed9c.webp
│   │   │   ├── 0a58cfa8-ff61-4d31-9757-27322aec3aaf.webp
│   │   │   ├── [...]
│   │   │   └── train_1.json
│   │   ├── train_2
│   │   ├── train_3
│   │   ├── [...]
│   │   └── train_32
│   ├── validation
│   │   └── [...]
│   └── test
│       └── [...]
├── metadata-train.parquet
├── metadata-validation.parquet
└── metadata-test.parquet

Each subfolder is named {split_name}_{part_id}, and its JSON file has the same name as the subfolder. Each image is a lossless WebP file with a unique name generated by UUID.
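The naming convention above can be sketched in a few lines of Python. The helper names `part_paths` and `split_and_part` are hypothetical and not part of the repository; the sketch only assumes the {split_name}_{part_id} layout shown in the file tree.

```python
from pathlib import PurePosixPath


def part_paths(split_name: str, part_id: int, root: str = "images"):
    """Return (folder, json_file) for one part, following {split_name}_{part_id}."""
    folder = PurePosixPath(root) / split_name / f"{split_name}_{part_id}"
    # The JSON file carries the same name as its subfolder.
    return folder, folder / f"{folder.name}.json"


def split_and_part(image_path: str):
    """Recover (split_name, part_id) from a path like images/train/train_1/<uuid>.webp."""
    folder_name = PurePosixPath(image_path).parent.name
    split_name, _, part_id = folder_name.rpartition("_")
    return split_name, int(part_id)


folder, json_file = part_paths("train", 1)
print(folder)     # images/train/train_1
print(json_file)  # images/train/train_1/train_1.json
```

Splitting on the last underscore keeps the helper usable even if a split name were to contain an underscore itself.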

Data Instances

For example, below is the entry for the image 0280642d-f69f-41d1-8598-5a44e296aa8b.webp in train_1.json.

{
  "image_path": "images/train/train_1/0280642d-f69f-41d1-8598-5a44e296aa8b.webp",
  "prompt_id": "000864-0061",
  "prompt": "painting of a holy woman, decorated, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
  "classification": "People",
  "image_amount_in_total": 9,
  "rank": 5,
  "overall_rating": 4,
  "image_text_alignment_rating": 3,
  "fidelity_rating": 4
}

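Data Fields

image: The image object
prompt_id: The id of the corresponding prompt
prompt: The text of the corresponding prompt
classification: The classification of the corresponding prompt
image_amount_in_total: The total number of images related to the prompt
rank: The relative rank of the image among all related images
overall_rating: The overall score of the image
image_text_alignment_rating: The score of how well the generated image matches the given text
fidelity_rating: The score of whether the output image is true to the shape and characteristics that the object should have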
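To illustrate how per-image ranks relate to comparison pairs, here is a minimal sketch that derives pairs from records sharing a prompt. It rests on two assumptions that this card does not state: that a smaller `rank` value means a more preferred image, and that images with equal rank are ties that yield no pair. The function name `preference_pairs` is hypothetical, and the records are expected to look like the JSON entries (with an `image_path` key).

```python
from collections import defaultdict
from itertools import combinations


def preference_pairs(records):
    """Yield (preferred_path, other_path) for images that share a prompt.

    Assumption: smaller `rank` is better; equal ranks are ties and are skipped.
    """
    by_prompt = defaultdict(list)
    for rec in records:
        by_prompt[rec["prompt_id"]].append(rec)

    for group in by_prompt.values():
        for a, b in combinations(group, 2):
            if a["rank"] == b["rank"]:
                continue  # tie: no preference expressed
            better, worse = (a, b) if a["rank"] < b["rank"] else (b, a)
            yield better["image_path"], worse["image_path"]
```

Under these assumptions, a prompt with n images contributes at most n(n-1)/2 pairs, and fewer when some images are tied.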

Data Splits

As mentioned above, every subset we provide has three splits: "train", "validation", and "test". All subsets share the same validation and test splits.

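Dataset Metadata

We also include three metadata tables, metadata-train.parquet, metadata-validation.parquet, and metadata-test.parquet, to help you access and understand ImageRewardDB without downloading the Zip files.

All the tables share the same schema, and each row refers to one image. The schema is shown below; the JSON files mentioned above use the same schema: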

Column	Type	Description
image_path	string	The relative path of the image in the repository.
prompt_id	string	The id of the corresponding prompt.
prompt	string	The text of the corresponding prompt.
classification	string	The classification of the corresponding prompt.
image_amount_in_total	int	Total amount of images related to the prompt.
rank	int	The relative rank of the image in all related images.
overall_rating	int	The overall score of this image.
image_text_alignment_rating	int	The score of how well the generated image matches the given text.
fidelity_rating	int	The score of whether the output image is true to the shape and characteristics that the object should have.

Below is an example row from metadata-train.parquet.

image_path	prompt_id	prompt	classification	image_amount_in_total	rank	overall_rating	image_text_alignment_rating	fidelity_rating
images/train/train_1/1b4b2d61-89c2-4091-a1c0-f547ad5065cb.webp	001324-0093	a magical forest that separates the good world from the dark world, ...	Outdoor Scenes	8	3	6	6	6
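The schema above can be checked programmatically. The following is a minimal sketch using only the standard library; `EXPECTED_COLUMNS` and `schema_problems` are hypothetical names, and the check works on plain Python dicts such as entries parsed from the JSON files. Rows read from Parquet through other libraries may carry NumPy integer types and would need converting first.

```python
# Column names and types copied from the schema table; the mapping itself
# is an illustrative helper, not something shipped with the dataset.
EXPECTED_COLUMNS = {
    "image_path": str,
    "prompt_id": str,
    "prompt": str,
    "classification": str,
    "image_amount_in_total": int,
    "rank": int,
    "overall_rating": int,
    "image_text_alignment_rating": int,
    "fidelity_rating": int,
}


def schema_problems(row: dict) -> list:
    """Return a list of human-readable schema violations (empty if none)."""
    problems = []
    for column, expected in EXPECTED_COLUMNS.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            actual = type(row[column]).__name__
            problems.append(f"{column}: expected {expected.__name__}, got {actual}")
    return problems


# The example row shown above; the prompt is truncated in the source and kept as-is.
example_row = {
    "image_path": "images/train/train_1/1b4b2d61-89c2-4091-a1c0-f547ad5065cb.webp",
    "prompt_id": "001324-0093",
    "prompt": "a magical forest that separates the good world from the dark world, ...",
    "classification": "Outdoor Scenes",
    "image_amount_in_total": 8,
    "rank": 3,
    "overall_rating": 6,
    "image_text_alignment_rating": 6,
    "fidelity_rating": 6,
}
print(schema_problems(example_row))  # []
```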

Loading ImageRewardDB

You can easily load ImageRewardDB with the Hugging Face Datasets library. As mentioned above, we provide four subsets at the 1k, 2k, 4k, and 8k scales. You can load them as follows:

from datasets import load_dataset

# Load the 1K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "1k")

# Load the 2K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "2k")

# Load the 4K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "4k")

# Load the 8K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "8k")
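Once a split is loaded, its rows can be treated as dictionaries keyed by the field names listed under Data Fields. As a rough sketch of a first exploration step, the hypothetical helper below averages a rating per prompt classification. It accepts any iterable of dict-like rows, so it can be tried on a few hand-written rows before being pointed at a real split (for example `dataset["train"]`, where iteration may be slow because each image is decoded).

```python
from collections import defaultdict


def mean_rating_by_classification(rows, field="overall_rating"):
    """Average `field` per prompt classification over an iterable of rows."""
    totals = defaultdict(lambda: [0, 0])  # classification -> [sum, count]
    for row in rows:
        entry = totals[row["classification"]]
        entry[0] += row[field]
        entry[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


# Hand-written rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"classification": "People", "overall_rating": 4},
    {"classification": "People", "overall_rating": 6},
    {"classification": "Outdoor Scenes", "overall_rating": 6},
]
print(mean_rating_by_classification(sample_rows))
```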

Additional Information

Licensing Information

The ImageRewardDB dataset is available under the Apache License 2.0. The Python code in this repository is available under the MIT License.

Citation Information

@misc{xu2023imagereward,
      title={ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation}, 
      author={Jiazheng Xu and Xiao Liu and Yuchen Wu and Yuxuan Tong and Qinkai Li and Ming Ding and Jie Tang and Yuxiao Dong},
      year={2023},
      eprint={2304.05977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Author:

THUDM

Dataset size:

22.08 GB