Using CNN for Image Classification on CIFAR-10 Dataset

Devashree Madhugiri
7 min read · Aug 5, 2021
CIFAR-10 Image classification using CNN (Image Source: Author)

A convolutional neural network (CNN) is a type of deep learning neural network commonly used for image recognition, image classification, object detection, etc.

CIFAR-10 is a very popular computer vision dataset provided by the Canadian Institute For Advanced Research (CIFAR). This dataset is used in many types of deep learning research for object recognition. Details about the CIFAR-10 dataset are available here.

Image Source: https://www.cs.toronto.edu/~kriz/cifar.html

The ‘10’ in the CIFAR-10 dataset name refers to its 10 classes. As shown in the image above, these classes are: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, and Truck.

The dataset contains 60,000 32x32 color images from these 10 classes, with 6,000 images per class. The training set contains 50,000 images and the test set contains 10,000 images.

All images in the dataset are color images of shape (32, 32, 3), where 32 x 32 is the resolution in pixels and 3 is the number of channels, i.e. R-G-B (Red, Green & Blue).

Let’s start by importing all required libraries and the dataset.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten, Dropout
from tensorflow.keras.layers import GlobalMaxPooling2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras import regularizers, optimizers
from tensorflow.keras.utils import to_categorical
from sklearn.metrics import accuracy_score

import warnings
warnings.filterwarnings('ignore')

print("Tensorflow version:", tf.__version__)
print("Keras version:", keras.__version__)

The Keras datasets module already comes with the CIFAR-10 dataset, so we can import it directly from ‘keras.datasets’.

from tensorflow.keras.datasets import cifar10
(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()
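
As a quick sanity check, we can print the array shapes; the values shown in the comments follow from the dataset description above:

print(X_train.shape, Y_train.shape)   # (50000, 32, 32, 3) (50000, 1)
print(X_test.shape, Y_test.shape)     # (10000, 32, 32, 3) (10000, 1)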

We need to normalize the data to scale down the pixel values. Currently, all the image pixels are in the range 0–255, and we want values between 0 and 1, so we divide all the pixel values by 255. Since CIFAR-10 has 10 classes, we use the ‘to_categorical()’ method to one-hot encode the labels.

# Normalizing
X_train = X_train/255
X_test = X_test/255
# One-Hot-Encoding
Y_train_en = to_categorical(Y_train,10)
Y_test_en = to_categorical(Y_test,10)

Now it’s time to build our model. We are going to use a Convolutional Neural Network (CNN). The base model consists of the layers listed below; each building block is described after the code:

# Base Model
model = Sequential()
model.add(Conv2D(32, (4,4), input_shape=(32,32,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(32, (4,4), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
  • Convolution or Conv2D Layer: A convolution layer is used to extract features from the image or part of an image. Here, we are defining three parameters -
  1. Filters — It refers to the number of filters to be applied in the convolution. Eg. 32 or 64.
  2. Kernel_size — It refers to the length of the convolution window. Eg. (3,3) or (4,4).
  3. Activation — It refers to the activation function applied to the layer’s output. Eg. ReLU, Leaky ReLU, Tanh, Sigmoid.
  • Pooling or MaxPooling2D Layer: In this layer, we are scaling down the size of an image. We are keeping the size (2,2) for the pooling layer.
  • Flatten Layer: This layer converts the n-dimensional array to 1 dimensional array.
  • Dense Layer: This layer is a fully connected layer i.e.all the neurons in the current layer are connected to the next layer. For our model, we are setting the first dense layer with 128 neurons and the second dense layer with 10 neurons.
  • model.compile() function: This function is used to compile the model. Here, we are defining three parameters -
  1. Loss function — It is used to evaluate how well our algorithm models the dataset. We can select options like ‘Categorical cross entropy’, ‘Binary cross entropy’, ‘sparse categorical cross entropy’ depending on our dataset.
  2. Optimizer — With this we can change the attributes of a neural network like weights and learning rate. Here, we can choose from different optimizers like Adam, AdaDelta, SGD etc.
  3. Metrics — It is used to understand the performance of our model. Eg. Accuracy, Mean Squared Error etc.
  • model.fit() function: This function is used to train our model; it takes the training data (and, optionally, validation data) and fits the model to it. Here, we are defining -
  1. Epochs — Number of times we pass the complete dataset forward and backward through the neural network.
  2. Verbose — Option to control the training output. Eg. verbose = 0 will print nothing, verbose = 1 will print a progress bar and one line per epoch, while verbose = 2 will print one line per epoch.
  • model.summary() function: This function is used to see the output shapes and the number of parameters in all layers of our model.
model.summary()
history = model.fit(X_train, Y_train_en, epochs=20, verbose=1, validation_data=(X_test, Y_test_en))
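
To inspect how training went, we can plot the loss curves stored in the History object returned by model.fit(). A minimal sketch using the matplotlib import from above:

# Plot training vs. validation loss per epoch
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()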

After running our base model for 20 epochs, we get 87.79% training accuracy and 66.90% test accuracy. This looks good, but the validation loss keeps increasing from the first epoch to the last, i.e. our base model is overfitting. Overfitting means that the model gives good results on the training dataset but fails to do so on the test data. Let’s try to get better accuracy by reducing the overfitting. We can achieve this by adding a few dropout layers to our model. We now build a model that drops 50% of the units after the first pooling layer and 25% after the second.

# Model_1 with Dropouts
model_1 = Sequential()
model_1.add(Conv2D(64, (4,4), input_shape=(32,32,3), activation='relu'))
model_1.add(MaxPooling2D(pool_size=(2,2)))
model_1.add(Dropout(0.5))
model_1.add(Conv2D(64, (4,4), activation='relu'))
model_1.add(MaxPooling2D(pool_size=(2,2)))
model_1.add(Dropout(0.25))
model_1.add(Flatten())
model_1.add(Dense(256, activation='relu'))
model_1.add(Dense(10, activation='softmax'))
model_1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
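
Training model_1 follows the same pattern as the base model; a minimal sketch of the fit() call, using the 30 epochs referred to below:

model_1.summary()
history_1 = model_1.fit(X_train, Y_train_en, epochs=30, verbose=1, validation_data=(X_test, Y_test_en))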

After running our model_1 with 30 Epochs, we can see that the validation accuracy has improved from 66.90% to 70.95% due to reduction in our model’s validation loss from 1.2931 to 0.9977. We will continue to train our model by increasing the number of epochs and by adding more filters.

# Model_2 with more filters
model_2 = Sequential()
model_2.add(Conv2D(64, (4,4), input_shape=(32,32,3), activation='relu'))
model_2.add(Conv2D(64, (4,4), activation='relu'))
model_2.add(MaxPooling2D(pool_size=(2,2)))
model_2.add(Dropout(0.4))
model_2.add(Conv2D(128, (4,4), activation='relu'))
model_2.add(Conv2D(128, (4,4), activation='relu'))
model_2.add(MaxPooling2D(pool_size=(2,2)))
model_2.add(Dropout(0.4))
model_2.add(Flatten())
model_2.add(Dense(1024, activation='relu'))
model_2.add(Dense(1024, activation='relu'))
model_2.add(Dense(units=10, activation='softmax'))
model_2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
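
Similarly, a minimal sketch of the training call for model_2, using the 50 epochs referred to below:

model_2.summary()
history_2 = model_2.fit(X_train, Y_train_en, epochs=50, verbose=1, validation_data=(X_test, Y_test_en))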

After running our model_2 with 50 Epochs, we can see that the validation accuracy has further improved from 70.95% to 75.49%.

Thus, the model accuracy improves from 66.90% to 70.95% and finally to 75.49%. Overall, with these two models the validation accuracy improved by about 9%, so we can say that the deep learning model has been trained reasonably well. Additionally, we can improve the model further with some data augmentation during preprocessing.

We know that deep learning models tend to learn features automatically from the data, implying that with more training data we can expect the model to perform better. In particular, for image classification tasks with high-dimensional input samples, a deep learning model is likely to perform better if it has been trained on a sufficiently large number of image samples. This is often a concern, since huge amounts of labelled image data might not be easily available for training.

In the CIFAR-10 dataset, although 6,000 images per class might sound sufficient for training the model, the overall dataset size is still small. Having more images per class to train the deep learning model would help improve its accuracy. A well-known solution for this is ‘image data augmentation’. This technique artificially expands the training dataset by generating modified versions of the existing images, which also helps to avoid overfitting. With the newly created image variations, we can try to improve the generalization of the deep learning model.

The Keras library provides image data augmentation through the ‘ImageDataGenerator’ class, an API that generates images in batches in real time, i.e. the image variations are produced on the fly during training. These variations can be produced through a combination of different operations such as geometric transformations (random rotation, shear, zoom, flip, crop or translation), changes to the RGB color channels, filtering (kernel filters for sharpening or blurring the image), random erasing (deleting a random part of the image), etc. It is important to note that no two generated images will be exactly the same.
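
As a sketch of how this could be used (the augmentation settings below are illustrative assumptions, not tuned values), ImageDataGenerator can be configured with a few geometric transformations and passed to model.fit() through its flow() method:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings (assumed values, tune as needed)
datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# Generate augmented batches in real time and train on them
history_aug = model_2.fit(datagen.flow(X_train, Y_train_en, batch_size=64),
                          epochs=50, verbose=1,
                          validation_data=(X_test, Y_test_en))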

In conclusion, it might be interesting to use data augmentation on the CIFAR-10 dataset to increase the robustness of this deep learning model.
