Visual Recognition Datasets for Deep Learning

November 19, 2020

Several visual recognition datasets have become standard benchmarks for supervised learning (Caltech101, Caltech256, CaltechBirds, CIFAR-10 and CIFAR-100) and for unsupervised or self-taught learning (STL10) across different object categories. Visual recognition mainly covers image classification, image segmentation and localization, object detection and related problems. Many of these datasets are available through the APIs of popular deep learning frameworks. In this article I’ll mention some that can be imported directly and used to train models.

The CIFAR (Canadian Institute for Advanced Research) datasets are labelled subsets of the 80 Million Tiny Images dataset, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

The datasets can be found on the official website of the Computer Science department of the University of Toronto.

The California Institute of Technology (Caltech) is a private research institute. The Caltech vision datasets are listed under its Computational Vision section.

The STL10 dataset was inspired by CIFAR-10 and is available on the official website of the Computer Science department at Stanford University.

Taking our discussion of visual recognition datasets further, today we will look at the features of these datasets, along with some Python code snippets showing how to use them.

CIFAR10

Released in 2009 by Alex Krizhevsky, CIFAR-10 contains 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The images are 32×32 pixels in RGB format. There are 60000 images in total, 6000 for each of the 10 classes, which makes the dataset completely balanced. This low-resolution dataset is used for basic image classification problems, on which convolutional neural networks have achieved strong results. In 2018, the paper “GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism” reported an error rate of 1%, the lowest at the time of writing.

Dataset size: 132.40 MiB

Data split pattern: Train set contains 50000 images, and test set contains 10000 images.
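As a quick sanity check on the figures above, the sketch below recomputes the class balance and the size of the decoded pixel data. The per-class split of 5000 training and 1000 test images comes from the official CIFAR page rather than from the text above, and the variable names are purely illustrative.

```python
# Figures from the text above; balanced train/test splits are an assumption
# taken from the official CIFAR page.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6_000
TRAIN_SIZE, TEST_SIZE = 50_000, 10_000

total_images = NUM_CLASSES * IMAGES_PER_CLASS
train_per_class = TRAIN_SIZE // NUM_CLASSES
test_per_class = TEST_SIZE // NUM_CLASSES

# Decoded size: 32x32 pixels, 3 colour channels, one byte per channel.
bytes_per_image = 32 * 32 * 3
raw_mib = total_images * bytes_per_image / 2**20

print(total_images, train_per_class, test_per_class)
print(f"{raw_mib:.2f} MiB of raw pixels")
```

The second figure is for uncompressed pixels held in memory, so it need not match the on-disk size quoted above.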

Code Snippet:

Using TensorFlow

import tensorflow_datasets as tfds
train,test = tfds.load('cifar10', split=['train', 'test'])
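The call above returns raw `tf.data.Dataset` objects. A minimal sketch of turning them into batched, normalised input pipelines might look like the following. The function name, batch size and shuffle buffer are illustrative assumptions, and `tf.data.AUTOTUNE` assumes TensorFlow 2.4 or later.

```python
def make_cifar10_pipelines(batch_size=128):
    """Hypothetical helper: returns batched (image, label) pipelines for CIFAR-10."""
    # Imported lazily so that defining the helper does not require TensorFlow.
    import tensorflow as tf
    import tensorflow_datasets as tfds

    def to_float(image, label):
        # Scale uint8 pixels from [0, 255] to float32 values in [0, 1].
        return tf.cast(image, tf.float32) / 255.0, label

    # as_supervised=True yields (image, label) tuples instead of dictionaries.
    train, test = tfds.load('cifar10', split=['train', 'test'], as_supervised=True)

    train = (train.map(to_float, num_parallel_calls=tf.data.AUTOTUNE)
                  .shuffle(10_000)  # assumed buffer size
                  .batch(batch_size)
                  .prefetch(tf.data.AUTOTUNE))
    test = (test.map(to_float, num_parallel_calls=tf.data.AUTOTUNE)
                .batch(batch_size)
                .prefetch(tf.data.AUTOTUNE))
    return train, test
```

Calling `make_cifar10_pipelines()` downloads the dataset on first use. The resulting datasets can be passed to a Keras `model.fit` call.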

Using PyTorch

from torchvision import transforms, datasets
train = datasets.CIFAR10('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
test = datasets.CIFAR10('', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
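To iterate over these datasets in mini-batches, they are usually wrapped in a `DataLoader`. The sketch below shows one way to do that. The helper name, the `data` directory and the batch size are assumptions made for illustration.

```python
def make_cifar10_loaders(root='data', batch_size=128):
    """Hypothetical helper: returns (train_loader, test_loader) for CIFAR-10."""
    # Imported lazily so that defining the helper does not require PyTorch.
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # ToTensor converts each image to a float tensor of shape (3, 32, 32) in [0, 1].
    transform = transforms.ToTensor()
    train = datasets.CIFAR10(root, train=True, download=True, transform=transform)
    test = datasets.CIFAR10(root, train=False, download=True, transform=transform)

    # Shuffle only the training data; keep the test order fixed.
    train_loader = DataLoader(train, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader
```

Each batch drawn from `train_loader` is an `(images, labels)` pair, where `images` has shape `(batch_size, 3, 32, 32)`.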

CIFAR100

Also released in 2009, CIFAR-100 is an extension of CIFAR-10 with 100 classes. Each class contains 600 images, 500 for training and 100 for testing, in the same 32×32 colour format.

The classes are grouped into 20 superclasses: aquatic mammals, fish, flowers, food containers, fruit and vegetables, household electrical devices, household furniture, insects, large carnivores, large man-made outdoor things, large natural outdoor scenes, large omnivores and herbivores, medium-sized mammals, non-insect invertebrates, people, reptiles, trees, small mammals, vehicles 1, vehicles 2.
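Each superclass groups five fine-grained classes, so every image carries both a fine and a coarse label. The sketch below shows that relationship for a handful of superclasses. The fine class names are taken from the official CIFAR-100 page, not from the text above, and the mapping is deliberately partial.

```python
# Partial mapping: only 5 of the 20 superclasses are listed here.
SUPERCLASSES = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "fish": ["aquarium fish", "flatfish", "ray", "shark", "trout"],
    "flowers": ["orchids", "poppies", "roses", "sunflowers", "tulips"],
    "vehicles 1": ["bicycle", "bus", "motorcycle", "pickup truck", "train"],
    "vehicles 2": ["lawn-mower", "rocket", "streetcar", "tank", "tractor"],
}

# Reverse lookup from a fine class name to its superclass.
FINE_TO_COARSE = {
    fine: coarse
    for coarse, fines in SUPERCLASSES.items()
    for fine in fines
}

NUM_SUPERCLASSES = 20
FINE_PER_SUPERCLASS = 5
total_fine_classes = NUM_SUPERCLASSES * FINE_PER_SUPERCLASS

print(FINE_TO_COARSE["dolphin"])
print(total_fine_classes)
```

A coarse label of this kind is handy when a 20-way problem is enough, or when evaluating whether a model's mistakes at least stay within the right superclass.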

Dataset size: 132.03 MiB

Data split pattern: Train set contains 50000 images, and test set contains 10000 images.

Code Snippet

Using TensorFlow

import tensorflow_datasets as tfds
train,test = tfds.load('cifar100', split=['train', 'test'])

Using PyTorch

from torchvision import transforms, datasets
train = datasets.CIFAR100('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
test = datasets.CIFAR100('', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

STL 10

Released in 2011 by Adam Coates, Honglak Lee and Andrew Y. Ng and presented in the paper “An Analysis of Single-Layer Networks in Unsupervised Feature Learning”, this dataset was inspired by CIFAR-10 and built for unsupervised feature learning. It contains 10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship and truck. Each image is 96×96 pixels in RGB format. The labelled images are taken from ImageNet. The main challenge lies in the unlabelled data, which is much larger than the labelled data and contains images similar to the labelled ones but drawn from a different distribution.

Dataset size: 1.86 GiB

Data split: the train set has 5000 images (500 per class), the test set has 8000 images (800 per class), and there are 100000 unlabelled images. The unlabelled images are drawn from a broader distribution of classes: in addition to the classes in the labelled set, they include other types of animals (bears, rabbits, etc.) and vehicles (trains, buses, etc.).
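To get a feel for how heavily the split leans towards unlabelled data, the sketch below works out the unlabelled share and the size of the decoded unlabelled images. The numbers come from the text above. The dictionary keys and variable names are illustrative and are not guaranteed to match any framework's split names.

```python
SPLIT_SIZES = {"train": 5_000, "test": 8_000, "unlabelled": 100_000}

labelled = SPLIT_SIZES["train"] + SPLIT_SIZES["test"]
unlabelled_share = SPLIT_SIZES["unlabelled"] / sum(SPLIT_SIZES.values())

# Decoded size: 96x96 pixels, 3 colour channels, one byte per channel.
bytes_per_image = 96 * 96 * 3
unlabelled_gib = SPLIT_SIZES["unlabelled"] * bytes_per_image / 2**30

print(labelled)
print(f"{unlabelled_share:.1%} of all images are unlabelled")
print(f"{unlabelled_gib:.2f} GiB of raw unlabelled pixels")
```

Converting those pixels to 32-bit floats multiplies the memory footprint by four, so streaming the unlabelled split in batches is usually more practical than holding it all in memory.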

Code Snippet

Using TensorFlow

import tensorflow_datasets as tfds
stl = tfds.load('stl10')

Using PyTorch

from torchvision import transforms, datasets
root = ''  # download directory; '' means the current directory
train = datasets.STL10(root, split='train', download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
test = datasets.STL10(root, split='test', download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
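Since the unlabelled images are the main point of STL10, it is worth knowing how to load them as well. The sketch below uses torchvision's `unlabeled` split. The helper name and the `data` directory are assumptions made for illustration.

```python
def load_stl10_unlabeled(root='data'):
    """Hypothetical helper: returns the 100000 unlabelled STL10 images."""
    # Imported lazily so that defining the helper does not require torchvision.
    from torchvision import datasets, transforms

    return datasets.STL10(root, split='unlabeled', download=True,
                          transform=transforms.ToTensor())
```

torchvision assigns the placeholder label -1 to every image in this split, so the targets should be ignored during unsupervised training.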

CALTECH 101

Released in 2003 by Fei-Fei Li (creator of ImageNet), Rob Fergus and Pietro Perona, this dataset contains 101 object categories plus an extra background clutter class (for background rejection testing). Images were collected by downloading and then manually screening out those that did not fit their category, and each image is labelled with a single object. Bounding boxes around the objects are also provided. There are around 9000 images in total, with roughly 40 to 800 images per class. Image dimensions vary, at roughly 200 to 300 pixels on a side.

Dataset size: 131 MB

Data Split Pattern: the training set contains 3060 images, and the testing set contains 6084 images.
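The split sizes above can be cross-checked with a little arithmetic. The sketch below assumes the training set is spread evenly over the 101 object classes plus the background class. That reading is an inference from the numbers, not something stated above.

```python
NUM_OBJECT_CLASSES = 101
NUM_CLASSES = NUM_OBJECT_CLASSES + 1  # plus the background clutter class
TRAIN_SIZE, TEST_SIZE = 3_060, 6_084

total_images = TRAIN_SIZE + TEST_SIZE
# If the training set is balanced, the remainder should be zero.
train_per_class, remainder = divmod(TRAIN_SIZE, NUM_CLASSES)

print(total_images)
print(train_per_class, remainder)
```

Under that assumption the training set holds 30 images per class, while the test set inherits the class imbalance of the full dataset.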

Code Snippet

Using TensorFlow

import tensorflow_datasets as tfds
train,test = tfds.load('caltech101', split=['train', 'test'])

CALTECH 256

Released in 2006 by Greg Griffin, Alex Holub and Pietro Perona, Caltech256 improves on Caltech101: the number of object categories is more than doubled, and the minimum number of images per category is increased from 31 to 80. The background clutter class is also larger. Images are not left-right aligned, which makes the categories more challenging. Duplicate images were removed using the SIFT algorithm. The total number of images increased from 9144 to 30,607, spanning 257 categories. The dataset is described in the paper “Caltech-256 Object Category Dataset”. The dataset can be found here.

Dataset Size: 1.2 GB

An implementation of the Caltech 256 dataset for image classification is shown in this notebook.

CALTECH BIRDS 2010

Released in 2010 by Peter Welinder, Steve Branson, Takeshi Mita, Catherine Wah, Florian Schroff, Serge Belongie and Pietro Perona, Caltech-UCSD Birds 200 (CUB-200-2010) contains images of 200 bird species (mostly North American). Bird species classification is a challenging task for both humans and computers. All images are annotated with segmentation labels, bounding boxes and a set of attribute labels such as shape, colour and pattern. The images were downloaded from Flickr and then annotated with the help of Amazon Mechanical Turk. The dataset was created to study subordinate categorization, which is not possible with datasets that focus on basic-level categories.

Dataset size: 659.64 MiB

Data Split Pattern: training set contains 3000 images and test set contains 3033 images.

Code Snippet

Using TensorFlow

import tensorflow_datasets as tfds
train,test = tfds.load('caltech_birds2010', split=['train', 'test'])

CALTECH BIRDS 2011

Released in 2011 by Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona and Serge Belongie, this dataset extends its predecessor. Caltech-UCSD Birds 200-2011 contains the same 200 bird species with a total of 11788 images. Each image is annotated with 15 part locations, 312 binary attributes and 1 bounding box.

Dataset size: 1.11 GiB

Data split pattern: train set contains 5994 images, and test set contains 5794 images.
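The per-image annotations described above can be pictured as a small record type. The sketch below is a hypothetical container, not the format used by the dataset files or by any library. The field names, the `(x, y, width, height)` box layout and the use of `None` for a missing part are assumptions, and the example values are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NUM_PARTS = 15        # part locations per image (from the text above)
NUM_ATTRIBUTES = 312  # binary attributes per image (from the text above)

Point = Tuple[float, float]


@dataclass
class BirdAnnotation:
    """Hypothetical container for the annotations of one image."""
    bounding_box: Tuple[float, float, float, float]  # assumed (x, y, width, height)
    part_locations: List[Optional[Point]]            # None marks a missing part
    attributes: List[bool]

    def __post_init__(self) -> None:
        # Reject records that do not match the counts described in the text.
        if len(self.part_locations) != NUM_PARTS:
            raise ValueError(f"expected {NUM_PARTS} part locations")
        if len(self.attributes) != NUM_ATTRIBUTES:
            raise ValueError(f"expected {NUM_ATTRIBUTES} attributes")


# Placeholder values only; these are not taken from the dataset.
example = BirdAnnotation(
    bounding_box=(0.0, 0.0, 100.0, 100.0),
    part_locations=[None] * NUM_PARTS,
    attributes=[False] * NUM_ATTRIBUTES,
)

TRAIN_SIZE, TEST_SIZE = 5_994, 5_794
total_images = TRAIN_SIZE + TEST_SIZE
print(total_images)
```

The final line confirms that the two splits add up to the 11788 images mentioned above.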

Code Snippet

Using TensorFlow

import tensorflow_datasets as tfds
train,test = tfds.load('caltech_birds2011', split=['train', 'test'])

This article has been published from the source link without modifications to the text. Only the headline has been changed.
