Dataset:

THUDM/ImageRewardDB

Task:

Text-to-Image

Language:

English

Size:

100K<n<1M

Preprint:

arxiv:2304.05977

License:

apache-2.0

ImageRewardDB

Dataset Summary

ImageRewardDB is a comprehensive text-to-image comparison dataset focused on human preferences for text-to-image generation. It consists of 137k pairs of expert comparisons, based on text prompts and corresponding model outputs from DiffusionDB. To build ImageRewardDB, we designed a dedicated annotation pipeline that establishes criteria for quantitative assessment and annotator training, optimizes the labeling experience, and ensures quality validation. ImageRewardDB is now publicly available as a Hugging Face Dataset.

Notice: All images in ImageRewardDB are collected from DiffusionDB. In addition, images corresponding to the same prompt are grouped together.

Languages

The text in the dataset is all in English.

Four Subsets

Because ImageRewardDB contains a large number of images, we provide four subsets at different scales to support different needs. The validation and test splits are the same for all subsets. The validation split (1.10GB) contains 412 prompts and 2.6K images (7.32K pairs), and the test split (1.16GB) contains 466 prompts and 2.7K images (7.23K pairs). The train split at each scale is as follows:

Subset	Num of Pairs	Num of Images	Num of Prompts	Size
ImageRewardDB 1K	17.6K	6.2K	1K	2.7GB
ImageRewardDB 2K	35.5K	12.5K	2K	5.5GB
ImageRewardDB 4K	71.0K	25.1K	4K	10.8GB
ImageRewardDB 8K	141.1K	49.9K	8K	20.9GB

Dataset Structure

The 62.6K images in ImageRewardDB are split into several folders, stored under "./images" in the directory of their split. Each folder contains around 500 prompts, their corresponding images, and a JSON file that links each image with its prompt and annotation. The file structure is as follows:

# ImageRewardDB
./
├── images
│   ├── train
│   │   ├── train_1
│   │   │   ├── 0a1ed3a5-04f6-4a1b-aee6-d584e7c8ed9c.webp
│   │   │   ├── 0a58cfa8-ff61-4d31-9757-27322aec3aaf.webp
│   │   │   ├── [...]
│   │   │   └── train_1.json
│   │   ├── train_2
│   │   ├── train_3
│   │   ├── [...]
│   │   └── train_32
│   ├── validation
│   │   └── [...]
│   └── test
│       └── [...]
├── metadata-train.parquet
├── metadata-validation.parquet
└── metadata-test.parquet

The sub-folders are named {split_name}_{part_id}, and each JSON file has the same name as its sub-folder. Each image is a lossless WebP file with a unique name generated as a UUID.
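The naming rule above makes it possible to locate a sub-folder's JSON file from just the split name and part id. The following is a minimal sketch, not part of the official tooling: the helper names `part_json_path` and `load_part_records` are hypothetical, and it assumes that each JSON file holds a list of per-image records.

```python
import json
from pathlib import Path


def part_json_path(root, split, part_id):
    """Build the path to a sub-folder's JSON file,
    e.g. <root>/images/train/train_1/train_1.json."""
    folder = f"{split}_{part_id}"
    return Path(root) / "images" / split / folder / f"{folder}.json"


def load_part_records(root, split, part_id):
    """Read the annotation records of one sub-folder.

    Assumption: the JSON file contains a list of per-image records.
    """
    with open(part_json_path(root, split, part_id), encoding="utf-8") as f:
        return json.load(f)
```

For example, `part_json_path("repo", "train", 1)` points at `repo/images/train/train_1/train_1.json`, where `repo` stands for a local copy of the repository.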

Data Instances

For instance, below is the information stored in train_1.json for the image 0280642d-f69f-41d1-8598-5a44e296aa8b.webp.

{
  "image_path": "images/train/train_1/0280642d-f69f-41d1-8598-5a44e296aa8b.webp",
  "prompt_id": "000864-0061",
  "prompt": "painting of a holy woman, decorated, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
  "classification": "People",
  "image_amount_in_total": 9,
  "rank": 5,
  "overall_rating": 4,
  "image_text_alignment_rating": 3,
  "fidelity_rating": 4
}

Data Fields

image: The image object
prompt_id: The id of the corresponding prompt
prompt: The text of the corresponding prompt
classification: The classification of the corresponding prompt
image_amount_in_total: Total number of images related to the prompt
rank: The relative rank of the image among all related images
overall_rating: The overall score of this image
image_text_alignment_rating: The score of how well the generated image matches the given text
fidelity_rating: The score of whether the output image is true to the shape and characteristics that the object should have
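Because every image carries the `prompt_id` and `rank` fields described above, pairwise comparisons can be derived by grouping records by prompt; a prompt with n ranked images yields at most n(n-1)/2 pairs. The sketch below is one possible way to do this, not necessarily how the published pair counts were produced. It assumes that a smaller `rank` means a more preferred image and that images with equal rank are ties to be skipped; the function name `build_preference_pairs` is hypothetical, and records are keyed as in the JSON example (`image_path`).

```python
from collections import defaultdict
from itertools import combinations


def build_preference_pairs(records):
    """Return (prompt_id, preferred_path, other_path) tuples.

    Assumptions: a smaller `rank` means a more preferred image, and
    images with equal rank are treated as ties and skipped.
    """
    by_prompt = defaultdict(list)
    for rec in records:
        by_prompt[rec["prompt_id"]].append(rec)

    pairs = []
    for prompt_id, group in by_prompt.items():
        for a, b in combinations(group, 2):
            if a["rank"] == b["rank"]:
                continue  # tie: no preference to record
            better, worse = (a, b) if a["rank"] < b["rank"] else (b, a)
            pairs.append((prompt_id, better["image_path"], worse["image_path"]))
    return pairs
```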

Data Splits

As mentioned above, every subset has three splits: "train", "validation", and "test". All subsets share the same validation and test splits.

Dataset Metadata

We also include three metadata tables, metadata-train.parquet, metadata-validation.parquet, and metadata-test.parquet, to help you access and explore ImageRewardDB without downloading the Zip files.

All the tables share the same schema, and each row refers to an image. The schema is shown below; the JSON files mentioned above share the same schema:

Column	Type	Description
image_path	string	The relative path of the image in the repository.
prompt_id	string	The id of the corresponding prompt.
prompt	string	The text of the corresponding prompt.
classification	string	The classification of the corresponding prompt.
image_amount_in_total	int	Total number of images related to the prompt.
rank	int	The relative rank of the image among all related images.
overall_rating	int	The overall score of this image.
image_text_alignment_rating	int	The score of how well the generated image matches the given text.
fidelity_rating	int	The score of whether the output image is true to the shape and characteristics that the object should have.
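Since the metadata tables and the JSON files share this schema, a small check can confirm that a record has every column with the expected type before it is used. This is an illustrative sketch only: `EXPECTED_COLUMNS` and `check_row` are hypothetical names, and the check assumes plain Python values (integers read through pandas may be NumPy types and would need converting first).

```python
# Column names and types copied from the schema table above.
EXPECTED_COLUMNS = {
    "image_path": str,
    "prompt_id": str,
    "prompt": str,
    "classification": str,
    "image_amount_in_total": int,
    "rank": int,
    "overall_rating": int,
    "image_text_alignment_rating": int,
    "fidelity_rating": int,
}


def check_row(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```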

Below is an example row from metadata-train.parquet.

image_path	prompt_id	prompt	classification	image_amount_in_total	rank	overall_rating	image_text_alignment_rating	fidelity_rating
images/train/train_1/1b4b2d61-89c2-4091-a1c0-f547ad5065cb.webp	001324-0093	a magical forest that separates the good world from the dark world, ...	Outdoor Scenes	8	3	6	6	6

Loading ImageRewardDB

You can load ImageRewardDB with the Hugging Face Datasets library. As mentioned before, we provide four subsets at the scales of 1k, 2k, 4k, and 8k. You can load them as follows:

from datasets import load_dataset

# Load the 1K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "1k")

# Load the 2K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "2k")

# Load the 4K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "4k")

# Load the 8K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "8k")

Additional Information

Licensing Information

The ImageRewardDB dataset is available under the Apache License 2.0. The Python code in this repository is available under the MIT License.

Citation Information

@misc{xu2023imagereward,
      title={ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation}, 
      author={Jiazheng Xu and Xiao Liu and Yuchen Wu and Yuxuan Tong and Qinkai Li and Ming Ding and Jie Tang and Yuxiao Dong},
      year={2023},
      eprint={2304.05977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Author:

THUDM

Dataset size:

22.08 GB