Dataset:
THUDM/ImageRewardDB
ImageRewardDB is a comprehensive text-to-image comparison dataset focusing on human preference in text-to-image generation. It consists of 137k pairs of expert comparisons, based on text prompts and corresponding model outputs from DiffusionDB. To build ImageRewardDB, we designed a dedicated pipeline that establishes criteria for quantitative assessment and annotator training, optimizes the labeling experience, and ensures quality validation. ImageRewardDB is now publicly available as a Hugging Face Dataset.
Note: All images in ImageRewardDB are collected from DiffusionDB; images corresponding to the same prompt are grouped together.
The text in the dataset is all in English.
Because ImageRewardDB contains a large number of images, we provide four subsets at different scales to support different needs. The validation and test splits are the same for all subsets. The validation split (1.10GB) contains 412 prompts and 2.6K images (7.32K pairs), and the test split (1.16GB) contains 466 prompts and 2.7K images (7.23K pairs). The train split of each subset is as follows:
Subset | Num of Pairs | Num of Images | Num of Prompts | Size |
---|---|---|---|---|
ImageRewardDB 1K | 17.6K | 6.2K | 1K | 2.7GB |
ImageRewardDB 2K | 35.5K | 12.5K | 2K | 5.5GB |
ImageRewardDB 4K | 71.0K | 25.1K | 4K | 10.8GB |
ImageRewardDB 8K | 141.1K | 49.9K | 8K | 20.9GB |
The 62.6K images in ImageRewardDB are split into several folders, stored under "./images" in the directory that matches their split. Each folder contains around 500 prompts, their corresponding images, and a JSON file. The JSON file links each image with its corresponding prompt and annotation. The file structure is as follows:
    # ImageRewardDB
    ./
    ├── images
    │   ├── train
    │   │   ├── train_1
    │   │   │   ├── 0a1ed3a5-04f6-4a1b-aee6-d584e7c8ed9c.webp
    │   │   │   ├── 0a58cfa8-ff61-4d31-9757-27322aec3aaf.webp
    │   │   │   ├── [...]
    │   │   │   └── train_1.json
    │   │   ├── train_2
    │   │   ├── train_3
    │   │   ├── [...]
    │   │   └── train_32
    │   ├── validation
    │   │   └── [...]
    │   └── test
    │       └── [...]
    ├── metadata-train.parquet
    ├── metadata-validation.parquet
    └── metadata-test.parquet
Each sub-folder is named {split_name}_{part_id}, and its JSON file has the same name as the sub-folder. Each image is a lossless WebP file with a unique name generated by UUID.
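The naming convention above can be sketched in a few lines of Python. The helper names `annotation_json_path` and `is_uuid_image_name` are hypothetical and not part of the repository; the path layout follows the file structure shown earlier.

```python
import uuid
from pathlib import PurePosixPath


def annotation_json_path(split_name: str, part_id: int) -> PurePosixPath:
    """Return the relative path of the JSON file for one sub-folder."""
    folder = f"{split_name}_{part_id}"  # e.g. "train_1"
    return PurePosixPath("images") / split_name / folder / f"{folder}.json"


def is_uuid_image_name(file_name: str) -> bool:
    """Return True if the name has the form '<hyphenated-uuid>.webp'."""
    path = PurePosixPath(file_name)
    if path.suffix != ".webp":
        return False
    try:
        # Round-trip through uuid.UUID so only the canonical hyphenated form passes.
        return str(uuid.UUID(path.stem)) == path.stem.lower()
    except ValueError:
        return False


print(annotation_json_path("train", 1))
print(is_uuid_image_name("0a1ed3a5-04f6-4a1b-aee6-d584e7c8ed9c.webp"))
```

A check like this can help separate image files from the JSON file when iterating over a sub-folder.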
For instance, below is the information stored in train_1.json for one image.
    {
        "image_path": "images/train/train_1/0280642d-f69f-41d1-8598-5a44e296aa8b.webp",
        "prompt_id": "000864-0061",
        "prompt": "painting of a holy woman, decorated, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
        "classification": "People",
        "image_amount_in_total": 9,
        "rank": 5,
        "overall_rating": 4,
        "image_text_alignment_rating": 3,
        "fidelity_rating": 4
    }
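A minimal sketch of reading such entries and grouping them by prompt follows. It assumes that a sub-folder's JSON file holds a list of objects with the fields shown above; the list layout is an assumption, and the shortened file names and prompt ids in the sample are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for the contents of a file such as train_1.json.
# Assumption: the file is a JSON list of per-image entries.
SAMPLE = """
[
  {"image_path": "images/train/train_1/aaa.webp", "prompt_id": "p-0001",
   "image_amount_in_total": 2, "rank": 2, "overall_rating": 4},
  {"image_path": "images/train/train_1/bbb.webp", "prompt_id": "p-0001",
   "image_amount_in_total": 2, "rank": 1, "overall_rating": 6},
  {"image_path": "images/train/train_1/ccc.webp", "prompt_id": "p-0002",
   "image_amount_in_total": 1, "rank": 1, "overall_rating": 5}
]
"""


def group_by_prompt(entries):
    """Map each prompt_id to its entries, ordered by ascending rank value."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["prompt_id"]].append(entry)
    for items in groups.values():
        items.sort(key=lambda e: e["rank"])
    return dict(groups)


groups = group_by_prompt(json.loads(SAMPLE))
for prompt_id, items in groups.items():
    print(prompt_id, [e["image_path"] for e in items])
```

For a real file, `json.loads(SAMPLE)` would be replaced by `json.load` on the opened JSON file.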
As mentioned above, every subset has three splits: "train", "validation", and "test". All subsets share the same validation and test splits.
We also include three metadata tables, metadata-train.parquet, metadata-validation.parquet, and metadata-test.parquet, to help you access and understand ImageRewardDB without downloading the Zip files.
All the tables share the same schema, and each row refers to an image. The JSON files mentioned above use the same schema, which is shown below:
Column | Type | Description |
---|---|---|
image_path | string | The relative path of the image in the repository. |
prompt_id | string | The id of the corresponding prompt. |
prompt | string | The text of the corresponding prompt. |
classification | string | The classification of the corresponding prompt. |
image_amount_in_total | int | Total number of images related to the prompt. |
rank | int | The relative rank of the image among all images related to the prompt. |
overall_rating | int | The overall score of this image. |
image_text_alignment_rating | int | The score of how well the generated image matches the given text. |
fidelity_rating | int | The score of whether the output image is true to the shape and characteristics that the object should have. |
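The `rank` column can be turned into pairwise comparisons for one prompt. The sketch below assumes that a smaller `rank` value means a more preferred image and that equal ranks are ties to be skipped; neither convention is stated here, so both should be checked against the data. The helper name `preference_pairs` and the sample rows are hypothetical.

```python
from itertools import combinations


def preference_pairs(rows):
    """Return (preferred_path, other_path) tuples for the images of one prompt.

    Assumptions: a smaller rank value is better, and equal ranks are ties.
    """
    prompt_ids = {row["prompt_id"] for row in rows}
    if len(prompt_ids) > 1:
        raise ValueError("all rows must belong to the same prompt")

    pairs = []
    for a, b in combinations(rows, 2):
        if a["rank"] == b["rank"]:
            continue  # tie: no preference can be derived
        better, worse = (a, b) if a["rank"] < b["rank"] else (b, a)
        pairs.append((better["image_path"], worse["image_path"]))
    return pairs


# Hypothetical rows that follow the schema above (unused columns omitted).
rows = [
    {"image_path": "a.webp", "prompt_id": "p-0001", "rank": 1},
    {"image_path": "b.webp", "prompt_id": "p-0001", "rank": 2},
    {"image_path": "c.webp", "prompt_id": "p-0001", "rank": 2},
]
pairs = preference_pairs(rows)
print(pairs)
```

Under these assumptions, a prompt with n images yields at most n(n-1)/2 pairs, and fewer when some images share a rank.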
Below is an example row from metadata-train.parquet.
image_path | prompt_id | prompt | classification | image_amount_in_total | rank | overall_rating | image_text_alignment_rating | fidelity_rating |
---|---|---|---|---|---|---|---|---|
images/train/train_1/1b4b2d61-89c2-4091-a1c0-f547ad5065cb.webp | 001324-0093 | a magical forest that separates the good world from the dark world, ... | Outdoor Scenes | 8 | 3 | 6 | 6 | 6 |
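Because every row describes one image, simple statistics can be computed from the metadata alone. The sketch below averages a rating column per `classification` over rows represented as dictionaries. The helper name `mean_rating_by_class` and the sample rows are hypothetical. With pandas and a Parquet engine installed, such rows could presumably be obtained with `pandas.read_parquet("metadata-train.parquet").to_dict("records")`, which is not exercised here.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_class(rows, column="overall_rating"):
    """Average one rating column for each prompt classification."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["classification"]].append(row[column])
    return {name: mean(values) for name, values in buckets.items()}


# Hypothetical rows that follow the metadata schema (unused columns omitted).
rows = [
    {"classification": "Outdoor Scenes", "overall_rating": 6, "fidelity_rating": 6},
    {"classification": "Outdoor Scenes", "overall_rating": 4, "fidelity_rating": 5},
    {"classification": "People", "overall_rating": 4, "fidelity_rating": 4},
]
overall = mean_rating_by_class(rows)
fidelity = mean_rating_by_class(rows, column="fidelity_rating")
print(overall)
print(fidelity)
```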
You can use the Hugging Face Datasets library to load ImageRewardDB. As mentioned before, we provide four subsets at the scales of 1k, 2k, 4k, and 8k. You can load them as follows:
    from datasets import load_dataset

    # Load the 1K-scale dataset
    dataset = load_dataset("THUDM/ImageRewardDB", "1k")

    # Load the 2K-scale dataset
    dataset = load_dataset("THUDM/ImageRewardDB", "2k")

    # Load the 4K-scale dataset
    dataset = load_dataset("THUDM/ImageRewardDB", "4k")

    # Load the 8K-scale dataset
    dataset = load_dataset("THUDM/ImageRewardDB", "8k")
The ImageRewardDB dataset is available under the Apache License 2.0. The Python code in this repository is available under the MIT License.
    @misc{xu2023imagereward,
          title={ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation},
          author={Jiazheng Xu and Xiao Liu and Yuchen Wu and Yuxuan Tong and Qinkai Li and Ming Ding and Jie Tang and Yuxiao Dong},
          year={2023},
          eprint={2304.05977},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }