Dataset:

sasha/dog-food

Language:

en

Multilinguality:

monolingual

Size:

1K<n<10K

Language creators:

found

Annotation creators:

found

Source datasets:

original

Dataset Card for the Dog vs. Food (a.k.a. Dog Food) Dataset

Dataset Summary

This is a dataset for binary image classification, between 'dog' and 'food' classes.

The 'dog' class contains images of dogs that look like fried chicken or muffins, and the 'food' class contains images of (you guessed it) fried chicken and muffins.

Supported Tasks and Leaderboards

TBC

Languages

The labels are in English (['dog', 'food']).

Dataset Structure

Data Instances

A sample from the training set is provided below:

{
  'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=300x470 at 0x7F176094EF28>,
  'label': 0
}

Data Fields

  • image: A PIL.JpegImagePlugin.JpegImageFile object containing the image (300x470 in the sample above). Accessing the image column, e.g. dataset[0]["image"], automatically decodes the image file. Decoding a large number of image files can take a significant amount of time, so query the sample index before the "image" column: dataset[0]["image"] should always be preferred over dataset["image"][0].
  • label: An integer, 0 or 1, with the following correspondence: 0 = dog, 1 = food.
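
The access pattern described in the field list can be sketched as follows. This is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the dataset can be loaded under the identifier `sasha/dog-food`. The names `ID2LABEL`, `load_dog_food`, and `first_example` are illustrative helpers, not part of the dataset.

```python
# Label ids as documented in the Data Fields list.
ID2LABEL = {0: "dog", 1: "food"}


def load_dog_food(split="train"):
    """Load one split. Assumes the `datasets` library and network access."""
    # Imported lazily so the helpers below can be used without the library.
    from datasets import load_dataset
    return load_dataset("sasha/dog-food", split=split)


def first_example(dataset):
    """Return (image, label name) for the first row.

    Indexing the row first (dataset[0]) decodes a single image, whereas
    dataset["image"][0] would decode the whole column before indexing.
    """
    row = dataset[0]
    return row["image"], ID2LABEL[row["label"]]
```

With the real dataset, `first_example(load_dog_food("train"))` would return a decoded PIL image together with either "dog" or "food".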

Data Splits

Train (2100 images) and Test (900 images)

Dataset Creation

Curation Rationale

N/A

Source Data

Initial Data Collection and Normalization

This dataset was taken from the qw2243c/Image-Recognition-Dogs-Fried-Chicken-or-Blueberry-Muffins? GitHub repository. The 'chicken' and 'muffin' categories were merged into a single 'food' category, and 10% of the data was randomly split off for validation.
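
The merge-and-split step described above could look roughly like the sketch below. It is not the curators' actual script: the source category name 'dog', the `(path, category)` input format, the fixed seed, and the function name `merge_and_split` are all assumptions made for illustration. Only the 'chicken' and 'muffin' category names and the 10% figure come from the text.

```python
import random

# 'chicken' and 'muffin' are named in the card; 'dog' is an assumed source name.
CATEGORY_TO_LABEL = {"dog": "dog", "chicken": "food", "muffin": "food"}


def merge_and_split(items, holdout_fraction=0.1, seed=0):
    """Relabel (path, source_category) pairs and split off a random holdout.

    Returns a tuple (kept, held_out) of lists of (path, label) pairs.
    """
    relabeled = [(path, CATEGORY_TO_LABEL[cat]) for path, cat in items]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(relabeled)
    n_holdout = round(len(relabeled) * holdout_fraction)
    return relabeled[n_holdout:], relabeled[:n_holdout]
```

Shuffling once and slicing keeps the two parts disjoint, so no image ends up in both.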

Annotations

Annotation process

This data was scraped from the internet and annotated based on the query words.

Personal and Sensitive Information

N/A

Considerations for Using the Data

Social Impact of Dataset

N/A

Discussion of Biases

This dataset is imbalanced: it has twice as many images of food (2000) as of dogs (1000), due to the original labeling. This should be taken into account when evaluating models.
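
To make the effect of the imbalance concrete, the sketch below computes the accuracy of a model that always predicts the majority class, along with inverse-frequency class weights. It assumes the overall counts stated above (1000 dog, 2000 food); per-split counts are not given in this card and may differ. The weighting formula is one common convention, not something the dataset prescribes.

```python
# Overall class counts as stated in the card.
CLASS_COUNTS = {"dog": 1000, "food": 2000}


def majority_baseline_accuracy(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (number of classes * class count)."""
    total = sum(counts.values())
    return {name: total / (len(counts) * n) for name, n in counts.items()}


baseline = majority_baseline_accuracy(CLASS_COUNTS)  # 2000 / 3000
weights = balanced_class_weights(CLASS_COUNTS)       # dog: 1.5, food: 0.75
```

A classifier therefore needs to exceed roughly 67% accuracy before it beats the trivial "always food" predictor, which is why per-class metrics are more informative here than raw accuracy.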

Other Known Limitations

N/A

Additional Information

Dataset Curators

This dataset was created by @lanceyjt, @yl3829, @wesleytao, @qw2243c, and @asyouhaveknown.

Licensing Information

The original GitHub repository does not indicate any licensing information.

Citation Information

N/A

Contributions

Thanks to @sashavor for adding this dataset.