Dataset:

Bingsu/Cat_and_Dog

Languages:

en

Size:

1K<n<10K

Source datasets:

original

License:

cc0-1.0

Dataset Summary

A dataset from Kaggle with duplicate data removed.

Data Fields

The data instances have the following fields:

  • image : A PIL.Image.Image object containing the image. When you access the image column, e.g. dataset[0]["image"], the image file is decoded automatically. Decoding a large number of image files can take a significant amount of time, so query the sample index before the "image" column: dataset[0]["image"] should always be preferred over dataset["image"][0].
  • labels : An int classification label.
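
The access-order advice above can be illustrated with a toy model. This is a minimal sketch, not the `datasets` implementation: `ToyImageDataset` and its `decode_calls` counter are hypothetical names introduced here, and strings stand in for image files. It assumes, as the note above describes, that column access decodes every file while row access decodes only the requested one.

```python
class ToyImageDataset:
    """Toy stand-in for an image dataset that decodes files on access.

    Hypothetical class for illustration only; it is not part of `datasets`.
    """

    def __init__(self, encoded_files):
        self._files = list(encoded_files)
        self.decode_calls = 0  # counts how many files were decoded

    def _decode(self, raw):
        # Placeholder for real image decoding.
        self.decode_calls += 1
        return f"decoded({raw})"

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access: decode only the requested example.
            return {"image": self._decode(self._files[key])}
        if key == "image":
            # Column access: decode every file in the column.
            return [self._decode(raw) for raw in self._files]
        raise KeyError(key)


files = [f"img_{i}.jpg" for i in range(1000)]

# Preferred order: index the row first, then the column.
row_first = ToyImageDataset(files)
image_a = row_first[0]["image"]
row_first_decodes = row_first.decode_calls

# Discouraged order: materialize the whole column, then index it.
column_first = ToyImageDataset(files)
image_b = column_first["image"][0]
column_first_decodes = column_first.decode_calls

print(row_first_decodes, column_first_decodes)
```

Both orders return the same image, but in this toy model the column-first order pays for decoding every file before the first element is picked out.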

Class Label Mappings:

{
  "cat": 0,
  "dog": 1
}
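
To turn integer labels back into class names, the mapping above can be inverted. The following is a small sketch in plain Python; `label2id`, `id2label`, and `decode_labels` are illustrative names chosen here, not identifiers defined by the dataset.

```python
# Mapping copied from the class label mappings above.
label2id = {"cat": 0, "dog": 1}

# Inverse mapping: integer label -> class name.
id2label = {idx: name for name, idx in label2id.items()}


def decode_labels(label_ids):
    """Convert a sequence of integer labels into class names."""
    return [id2label[i] for i in label_ids]


predicted = decode_labels([0, 1, 1, 0])
print(predicted)
```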

Data Splits

                train   test
# of examples   8000    2000

>>> from datasets import load_dataset

>>> dataset = load_dataset("Bingsu/Cat_and_Dog")
>>> dataset
DatasetDict({
    train: Dataset({
        features: ['image', 'labels'],
        num_rows: 8000
    })
    test: Dataset({
        features: ['image', 'labels'],
        num_rows: 2000
    })
})

>>> dataset["train"].features
{'image': Image(decode=True, id=None), 'labels': ClassLabel(num_classes=2, names=['cat', 'dog'], id=None)}