Dataset: THUDM/ImageRewardDB
Task: Text-to-Image
Language: en
Size: 100K<n<1M
Preprint: arxiv:2304.05977
License: apache-2.0

ImageRewardDB

Dataset Summary

ImageRewardDB is a comprehensive text-to-image comparison dataset focused on human preferences in text-to-image generation. It consists of 137k pairs of expert comparisons, based on text prompts and the corresponding model outputs from DiffusionDB. To build ImageRewardDB, we designed a dedicated annotation pipeline: we established criteria for quantitative assessment and annotator training, optimized the labeling experience, and ensured quality validation. ImageRewardDB is publicly available on Hugging Face Datasets.

Note: All images in ImageRewardDB are collected from DiffusionDB. In addition, we grouped together the images that correspond to the same prompt.

Languages

The text in the dataset is all in English.

Four Subsets

Because ImageRewardDB contains a large number of images, we provide four subsets of different scales to support different needs. All subsets share the same validation and test splits: the validation split (1.10 GB) contains 412 prompts and 2.6K images (7.32K pairs), and the test split (1.16 GB) contains 466 prompts and 2.7K images (7.23K pairs). The train split at each scale is summarized below:

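| Subset           | Num of Pairs | Num of Images | Num of Prompts | Size    |
|------------------|--------------|---------------|----------------|---------|
| ImageRewardDB 1K | 17.6K        | 6.2K          | 1K             | 2.7 GB  |
| ImageRewardDB 2K | 35.5K        | 12.5K         | 2K             | 5.5 GB  |
| ImageRewardDB 4K | 71.0K        | 25.1K         | 4K             | 10.8 GB |
| ImageRewardDB 8K | 141.1K       | 49.9K         | 8K             | 20.9 GB |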
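As a rough aid for choosing a scale, the sketch below encodes the train-split figures from the table and returns the largest subset whose download fits a given disk budget. The config names ("1k" to "8k") follow the loading examples later in this card; the helper name largest_subset_within is hypothetical. By default only the train split is counted, since the shared validation (1.10 GB) and test (1.16 GB) splits are the same for every subset.

```python
# Train-split figures copied from the subset table (sizes in GB).
# largest_subset_within is a hypothetical helper, not part of any library.
TRAIN_SUBSETS = {
    "1k": {"pairs": 17_600, "images": 6_200, "prompts": 1_000, "size_gb": 2.7},
    "2k": {"pairs": 35_500, "images": 12_500, "prompts": 2_000, "size_gb": 5.5},
    "4k": {"pairs": 71_000, "images": 25_100, "prompts": 4_000, "size_gb": 10.8},
    "8k": {"pairs": 141_100, "images": 49_900, "prompts": 8_000, "size_gb": 20.9},
}

# Validation (1.10 GB) and test (1.16 GB) splits are shared by every subset.
SHARED_EVAL_GB = 1.10 + 1.16


def largest_subset_within(budget_gb: float, include_eval: bool = False):
    """Return the largest config name whose download fits budget_gb, or None."""
    overhead = SHARED_EVAL_GB if include_eval else 0.0
    fitting = [
        name
        for name, info in TRAIN_SUBSETS.items()
        if info["size_gb"] + overhead <= budget_gb
    ]
    # Dict order is 1k < 2k < 4k < 8k, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None
```

For example, a 6 GB budget fits the 2K train split on its own, but only the 1K subset once the shared validation and test splits are included.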

Dataset Structure

The 62.6K images in ImageRewardDB are split into several folders, stored under "./images" in one directory per split. Each folder contains around 500 prompts, their corresponding images, and a JSON file that links each image to its prompt and annotation. The file structure is as follows:

# ImageRewardDB
./
├── images
│   ├── train
│   │   ├── train_1
│   │   │   ├── 0a1ed3a5-04f6-4a1b-aee6-d584e7c8ed9c.webp
│   │   │   ├── 0a58cfa8-ff61-4d31-9757-27322aec3aaf.webp
│   │   │   ├── [...]
│   │   │   └── train_1.json
│   │   ├── train_2
│   │   ├── train_3
│   │   ├── [...]
│   │   └── train_32
│   ├── validation
│   │   └── [...]
│   └── test
│       └── [...]
├── metadata-train.parquet
├── metadata-validation.parquet
└── metadata-test.parquet

Each sub-folder is named {split_name}_{part_id}, and its JSON file has the same name as the sub-folder. Each image is a lossless WebP file with a unique UUID-based file name.
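A minimal standard-library sketch of reading this layout, assuming the images have already been extracted under ./images and that each {split_name}_{part_id}.json file holds a JSON list of per-image records like the one shown in the next section. The list-of-records shape is an assumption, not something this card states, and iter_split_records is a hypothetical name.

```python
import json
import re
from pathlib import Path
from typing import Iterator

# Sub-folders follow the {split_name}_{part_id} convention, e.g. "train_1".
PART_DIR = re.compile(r"^(train|validation|test)_(\d+)$")


def iter_split_records(root: Path, split: str) -> Iterator[dict]:
    """Yield annotation records for one split, visiting parts in numeric order.

    Assumes each part's JSON file is a list of per-image records.
    """
    split_dir = root / "images" / split
    parts = []
    for folder in split_dir.iterdir():
        match = PART_DIR.match(folder.name)
        if folder.is_dir() and match and match.group(1) == split:
            parts.append((int(match.group(2)), folder))
    # Sort by the numeric part_id so train_2 comes before train_10.
    for _, folder in sorted(parts):
        json_file = folder / f"{folder.name}.json"  # same name as the folder
        with json_file.open(encoding="utf-8") as f:
            for record in json.load(f):
                yield record
```

Sorting on the parsed integer rather than the folder name keeps the parts in their natural order, which a plain string sort would not.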

Data Instances

For instance, below is the entry for 0280642d-f69f-41d1-8598-5a44e296aa8b.webp in train_1.json:

{
  "image_path": "images/train/train_1/0280642d-f69f-41d1-8598-5a44e296aa8b.webp",
  "prompt_id": "000864-0061",
  "prompt": "painting of a holy woman, decorated, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
  "classification": "People",
  "image_amount_in_total": 9,
  "rank": 5,
  "overall_rating": 4,
  "image_text_alignment_rating": 3,
  "fidelity_rating": 4
}
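To work with these records in typed Python code, one option is a small dataclass mirroring the keys in the example above. This is a sketch: the class name ImageRecord is hypothetical, and the field types follow the metadata schema described later in this card.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ImageRecord:
    """One annotated image, mirroring the keys of a train_*.json entry."""
    image_path: str
    prompt_id: str
    prompt: str
    classification: str
    image_amount_in_total: int
    rank: int
    overall_rating: int
    image_text_alignment_rating: int
    fidelity_rating: int

    @classmethod
    def from_dict(cls, raw: dict) -> "ImageRecord":
        # Keep only known keys so unexpected extra keys do not raise TypeError.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


# The example entry above (prompt text shortened here for brevity).
example = json.loads("""{
  "image_path": "images/train/train_1/0280642d-f69f-41d1-8598-5a44e296aa8b.webp",
  "prompt_id": "000864-0061",
  "prompt": "painting of a holy woman, decorated, intricate",
  "classification": "People",
  "image_amount_in_total": 9,
  "rank": 5,
  "overall_rating": 4,
  "image_text_alignment_rating": 3,
  "fidelity_rating": 4
}""")
record = ImageRecord.from_dict(example)
```

Making the class frozen keeps annotations read-only once loaded, and a missing key still raises a TypeError, which surfaces malformed entries early.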

Data Fields

  • image: The image object (the JSON files and metadata tables store its relative path in image_path instead)
  • prompt_id: The ID of the corresponding prompt
  • prompt: The text of the corresponding prompt
  • classification: The category of the corresponding prompt
  • image_amount_in_total: The total number of images generated for the prompt
  • rank: The relative rank of the image among all images for the same prompt
  • overall_rating: The overall score of the image
  • image_text_alignment_rating: How well the generated image matches the given text
  • fidelity_rating: How faithfully the image reproduces the shape and characteristics that the depicted object should have
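Because every image carries a rank among the images generated for the same prompt, pairwise comparisons can be derived by grouping records on prompt_id. The sketch below assumes that a smaller rank means a more preferred image and that images with equal rank form no pair; neither convention is stated in this card, so check them against the paper before relying on the output. The function name build_preference_pairs is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_preference_pairs(records):
    """Return (preferred_path, other_path) tuples within each prompt group.

    Assumptions (not confirmed by the dataset card):
      * a smaller "rank" value means a better image;
      * images with the same rank are treated as ties and skipped.
    """
    by_prompt = defaultdict(list)
    for rec in records:
        by_prompt[rec["prompt_id"]].append(rec)

    pairs = []
    for group in by_prompt.values():
        for a, b in combinations(group, 2):
            if a["rank"] == b["rank"]:
                continue  # tie: no preference to record
            better, worse = (a, b) if a["rank"] < b["rank"] else (b, a)
            pairs.append((better["image_path"], worse["image_path"]))
    return pairs
```

Pairs are only formed inside a prompt group, since ranks are relative to the images for one prompt and are not comparable across prompts.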

Data Splits

As mentioned above, every subset has three splits, "train", "validation", and "test", and all subsets share the same validation and test splits.

Dataset Metadata

We also include three metadata tables, metadata-train.parquet, metadata-validation.parquet, and metadata-test.parquet, to help you access and explore ImageRewardDB without downloading the Zip files.

All three tables share the same schema, with one row per image. The JSON files described above use the same schema:

| Column                      | Type   | Description |
|-----------------------------|--------|-------------|
| image_path                  | string | The relative path of the image in the repository. |
| prompt_id                   | string | The ID of the corresponding prompt. |
| prompt                      | string | The text of the corresponding prompt. |
| classification              | string | The category of the corresponding prompt. |
| image_amount_in_total       | int    | The total number of images generated for the prompt. |
| rank                        | int    | The relative rank of the image among all images for the same prompt. |
| overall_rating              | int    | The overall score of the image. |
| image_text_alignment_rating | int    | How well the generated image matches the given text. |
| fidelity_rating             | int    | How faithfully the image reproduces the shape and characteristics that the depicted object should have. |
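Since the JSON files and the three metadata tables share this schema, a single column-to-type mapping can sanity-check a row from either source. A minimal standard-library sketch; SCHEMA and check_row are hypothetical names, and the Python types simply transcribe the string/int column types in the table above.

```python
import numbers

# Column name -> expected type, transcribed from the schema table.
# numbers.Integral also accepts NumPy integer scalars (e.g. from pandas rows).
SCHEMA = {
    "image_path": str,
    "prompt_id": str,
    "prompt": str,
    "classification": str,
    "image_amount_in_total": numbers.Integral,
    "rank": numbers.Integral,
    "overall_rating": numbers.Integral,
    "image_text_alignment_rating": numbers.Integral,
    "fidelity_rating": numbers.Integral,
}


def check_row(row: dict) -> list:
    """Return human-readable problems; an empty list means the row fits."""
    problems = []
    for column, expected in SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
            continue
        value = row[column]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{column}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

To check a metadata table, rows could come from pandas, for example pandas.read_parquet("metadata-train.parquet").to_dict("records"), which needs a Parquet engine such as pyarrow installed.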

Below is an example row from metadata-train.parquet.

| image_path | prompt_id | prompt | classification | image_amount_in_total | rank | overall_rating | image_text_alignment_rating | fidelity_rating |
|---|---|---|---|---|---|---|---|---|
| images/train/train_1/1b4b2d61-89c2-4091-a1c0-f547ad5065cb.webp | 001324-0093 | a magical forest that separates the good world from the dark world, ... | Outdoor Scenes | 8 | 3 | 6 | 6 | 6 |

Loading ImageRewardDB

You can use the Hugging Face Datasets library to load ImageRewardDB. As mentioned above, we provide four subsets at the 1K, 2K, 4K, and 8K scales; load the one you need as follows:

from datasets import load_dataset

# Load the 1K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "1k")

# Load the 2K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "2k")

# Load the 4K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "4k")

# Load the 8K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "8k")

Additional Information

Licensing Information

The ImageRewardDB dataset is available under the Apache License 2.0. The Python code in this repository is available under the MIT License.

Citation Information

@misc{xu2023imagereward,
      title={ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation}, 
      author={Jiazheng Xu and Xiao Liu and Yuchen Wu and Yuxuan Tong and Qinkai Li and Ming Ding and Jie Tang and Yuxiao Dong},
      year={2023},
      eprint={2304.05977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}