
Institute of Science and Technology

Tribhuvan University

A Seminar Report
On
“IMAGE CLASSIFICATION USING CNN”

Submitted To:

Central Department of Computer Science and Information Technology


Tribhuvan University, Kirtipur, Kathmandu, Nepal

In partial fulfillment of the requirement for Master’s Degree in Computer Science and Information Technology

Submitted By:
Dhan Bahadur Pun
Roll No.: 18/077

June 20, 2021


Supervisor’s Recommendation

I hereby recommend that this seminar report, prepared under my supervision by Dhan
Bahadur Pun and entitled “IMAGE CLASSIFICATION USING CNN”, be accepted in partial
fulfillment of the requirements for the degree of Master of Science in Computer
Science and Information Technology. To the best of my knowledge, this is an original work in
computer science.

…….…………………………

Asst. Prof. Jagdish Bhatta

Central Department of Computer Science and Information Technology

LETTER OF APPROVAL

This is to certify that the seminar report prepared by Mr. Dhan Bahadur Pun, entitled
“IMAGE CLASSIFICATION USING CNN”, in partial fulfillment of the requirements
for the degree of Master of Science in Computer Science and Information Technology
has been well studied. In our opinion, it is satisfactory in scope and quality as a
project for the required degree.

Evaluation Committee

…………………………
Asst. Prof. Nawaraj Paudel
(HOD)
Central Department of Computer Science and Information Technology

…………………………
Asst. Prof. Jagdish Bhatta
(Supervisor)
Central Department of Computer Science and Information Technology

………………………………

(Internal)

ACKNOWLEDGEMENT

The Seminar entitled “IMAGE CLASSIFICATION USING CNN” has been conducted
to satisfy the partial requirement for the degree of Master of Science in Computer Science
and Information Technology, Tribhuvan University.

First, I would like to express my appreciation to all those who made it possible for me to
complete this seminar report. Special gratitude goes to my supervisor, Asst. Prof. Jagdish Bhatta,
for his stimulating suggestions and encouragement, which helped me to coordinate this
project, especially in writing this seminar report.

Dhan Bahadur Pun (18/077)

ABSTRACT

Image classification is the process of categorizing and labeling groups of pixels in a digital
image into one of several classes. This seminar focuses on the study of the Convolution
Neural Network (CNN). A convolution neural network is a neural network that has one or more
convolution layers and is used mainly for image processing, classification, segmentation,
etc. In this study, the VGG16, GoogLeNet, and AlexNet architectures were reviewed; their reported
top-5 accuracies on the ImageNet dataset are 92.7%, 93.33%, and 83%, respectively. The CIFAR10 dataset is
used to train a Convolution Neural Network.

In this study, the CIFAR10 images are passed through the different layers of a convolution
neural network. First, an input image of size 32 x 32 x 1 is passed to the first convolution layer,
and the output of the first convolution layer is passed to the second convolution layer. The
output of the second layer is then flattened into a one-dimensional array and passed to two
fully connected layers, followed by a final layer with a softmax function. After this training
process, the model achieved an accuracy of 75%.

TABLE OF CONTENTS

ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 Problem Statement
1.3 Objectives
CHAPTER 2 BACKGROUND STUDY AND LITERATURE REVIEW
2.1 Background Study
2.1.1 Neural network
2.1.2 CNN
2.2 Literature Review
2.2.1 Image classification using various CNN architectures
CHAPTER 3 METHODOLOGY
3.1 Methodology
3.2 Data Set Description
3.3 Data Preprocessing
3.4 Classification using CNN
3.4.1 Input layer
3.4.2 Convolutional Layer
3.4.3 Pooling Layer
3.4.4 Flatten Layers
3.4.5 Fully Connected Layers
CHAPTER 4 IMPLEMENTATION
4.1 Numpy
4.2 Matplotlib
4.3 Keras
4.4 Python
4.5 Implementation of Convolution Neural Network
CHAPTER 5 RESULT AND ANALYSIS
5.1 Predicted result according to Actual data
5.1.1 Accuracy
CHAPTER 6 CONCLUSION
References

LIST OF FIGURES

Figure 1.1 Architecture of CNN [2]
Figure 3.1 Methodology
Figure 3.2 Operation of convolution with kernel 3, no padding, and stride 1 [7]
Figure 3.3 Max Pooling
Figure 3.4 Flattening of 3×3 image matrix into 9×1 vector
Figure 3.5 Fully connected layers
Figure 3.6 Model Architecture
Figure 5.1 Accuracy Movement

LIST OF ABBREVIATIONS

1D One-dimensional

2D Two-dimensional

3D Three-dimensional

CNN Convolutional Neural Network

GPU Graphics Processing Unit

ReLU Rectified Linear Unit

RGB Red, Green, and Blue

CHAPTER 1 INTRODUCTION

1.1 Introduction

Image classification is the process of categorizing and labeling groups of pixels or
vectors within an image based on specific rules. It is also described as the task of
extracting information classes from a raster image, in which every pixel in a digital
image is assigned to one of several classes. The resulting classified raster can be used to
create thematic maps, for example of the land cover present in the image. Image
classification plays an important role in remote sensing and is used in applications such
as monitoring environmental change, agriculture, land use, and urban planning. More
generally, image classification is the task in which a computer analyzes an image and
identifies the class the image falls under. A class is essentially a label, for instance ‘car’
or ‘animal’. For example, given an input image of a cat, image classification is the
process of the computer analyzing the image and reporting that it is a cat.

There are several techniques for image classification, such as supervised and unsupervised
classification, Artificial Neural Networks, SVM, K-Nearest Neighbor, Naïve Bayes, the
Random Forest algorithm, and Convolution Neural Networks (CNNs). This report describes
Convolution Neural Networks in detail. CNNs are very similar to ordinary neural networks:
they are composed of artificial neurons that have learnable weights and biases. Each
neuron receives some inputs and performs a dot product [1]. Artificial neurons are
mathematical functions that calculate the weighted sum of multiple inputs and produce an output,
and the weights define the behavior of each neuron. The convolution neural network is a
specialized type of neural network model designed for working with 2D (image) data,
although it can also be used with 1D (text or audio) and 3D (video) data.
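The neuron computation described above can be sketched in a few lines of NumPy. This is a minimal illustration only. The inputs, weights, and bias are hypothetical values. ReLU is assumed as the activation function; it is the one used later in this report.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, then ReLU."""
    z = np.dot(inputs, weights) + bias   # dot product of the inputs and the learnable weights
    return max(0.0, z)                   # ReLU activation (assumed for this sketch)

x = np.array([0.5, 0.2, 0.9])    # hypothetical inputs (e.g. normalized pixel values)
w = np.array([0.4, -0.6, 0.3])   # hypothetical learned weights
b = 0.1                          # hypothetical learned bias
print(neuron(x, w, b))
```

Changing the weights changes the output for the same inputs, which is what "the weights define the behavior of each neuron" means in practice.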

A CNN uses different layers to classify an image. The first layer of a convolutional neural
network is the convolution layer, which gives the network its name. This layer performs an
operation called convolution, which extracts features from the input signal.

When an input image is provided to a convolution neural network, each of its layers
generates several activation maps. Each neuron takes a patch of pixels as input,
multiplies their color values by its weights, sums them up, and runs the result through the
activation function. The first convolution layer detects basic features such as horizontal,
vertical, and diagonal edges, and the output of each layer is the input of the next layer [2].

The next layer is called the pooling layer. The pooling operation, which is chosen according to
the application, includes max pooling, min pooling, and average pooling. Pooling is
mainly used to reduce the dimensionality of the feature maps produced by the
convolution operation and also to select the most significant features [3]. Because of the
complexity of CNNs, ReLU is the common choice of activation function for passing
gradients during training by back-propagation. Back-propagation is used to train feed-forward
networks, in which signals propagate in only one direction, from the input layer to the
output layer; the error is then propagated backward from the output to adjust the weights.

The fully connected layers are the final layers in the CNN structure. There can be one or more
of them, and they are placed after a sequence of convolution and pooling layers. This part is also
called the classification layer, which takes the output of the final convolution layer as input.
Based on the activation map of the final convolution layer, the classification layer outputs
a set of confidence scores that specify how likely the image is to belong to each class [1]. For
example, in a CNN that detects cows, elephants, and tigers, the output of the final layer is the
probability that the input image contains each of those animals. The last fully connected
layer is known as the softmax classifier; it determines the probability of each
class label over N classes.
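The NumPy function below is a minimal sketch of what the softmax classifier computes: it turns raw scores into probabilities over N classes. The three scores for the cow, elephant, and tiger example are hypothetical. Subtracting the maximum score is a standard numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores (logits) into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical scores from the last layer for three classes: cow, elephant, tiger
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.round(3))        # probability for each class
print(probs.argmax())        # index of the most likely class (here: cow)
```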

Figure 1.1 Architecture of CNN [2]

1.2 Problem Statement

Image classification is the process of assigning labels to images according to their classes.
Consider images of two categories, Dog and Cat: when an image is provided to the
classification model, the model assigns a label to the image according to its category. A CNN
follows a hierarchical model. It reduces the number of parameters to be trained, but it can be
slow because of the large number of operations it performs, such as convolution and max pooling.
If a CNN has many layers, the training process takes a lot of time; without a good GPU,
training takes even longer and may not reach an accurate result in reasonable time. A CNN
also needs a large dataset to train the neural network.

1.3 Objectives

The main objective of this study is to classify images according to their categories or
classes using a convolution neural network and to predict their labels.

CHAPTER 2 BACKGROUND STUDY AND LITERATURE REVIEW

2.1 Background Study

2.1.1 Neural network

A neural network works similarly to the human brain’s neural network. A neuron in a neural
network is a mathematical function that collects and classifies information according to a
specific architecture. The network bears a strong resemblance to statistical methods such
as curve fitting and regression analysis.

2.1.2 CNN

A Convolution Neural Network is a deep learning algorithm which can take in an image,
assign weights and biases to various objects in the image and be able to differentiate one
from the other.

2.2 Literature Review

2.2.1 Image classification using various CNN architectures

VGG16: VGG16 is one of the most preferred CNN architectures. It was developed by Simonyan and
Zisserman from the University of Oxford and has 16 weight layers: 13 convolution layers with
3 x 3 filters, stride 1, and same padding, interleaved with max pooling layers of 2 x 2 filters with
stride 2. At the end, it has two fully connected layers followed by a final fully connected layer
with a softmax function for output. It was applied to the ImageNet challenge in 2014, where
the network achieved 92.7% top-5 test accuracy on the ImageNet dataset [3].

GoogLeNet: GoogLeNet is based on the inception network, which is also a popular CNN
architecture. There are several versions of the inception network, including Inception version 1,
2, and 3. The first version of the inception network is called GoogLeNet, developed by a team
at Google in 2014. It is 22 layers deep (27 layers if the pooling layers are counted), and the network
achieved 93.33% top-5 accuracy on the ImageNet dataset [5].

AlexNet: AlexNet is one of the most popular neural network architectures, developed by Alex
Krizhevsky in 2012. It has a total of eight layers: the first five are convolution layers and
the last three are fully connected layers [5]. The first two convolution layers are each followed by
a max pooling layer. The third, fourth, and fifth convolution layers are connected directly to one
another, and the output of the fifth layer (after max pooling) is passed to the fully connected layers.
The outputs of the convolution and fully connected layers pass through the ReLU function, and the
final output layer is connected to a softmax activation. It achieved 83% top-5
accuracy [3].

CHAPTER 3 METHODOLOGY

3.1 Methodology

The main steps in the image classification process are shown in the following diagram.

Figure 3.1 Methodology
In this seminar, the CIFAR10 dataset is used. The dataset is first loaded, the loaded data is then
preprocessed, and the preprocessed data is used to train a Convolution Neural Network.
Finally, the trained model is used to predict the classes of the test images.

3.2 Data Set Description

The CIFAR10 dataset is used in this seminar. CIFAR10 (named after the Canadian Institute for
Advanced Research) is a collection of images. CIFAR10 contains 60,000 color images of size 32 x 32
in 10 different classes, with 6,000 images per class. The 10 classes
represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are
50,000 training images and 10,000 test images.

Name of dataset | No. of classes | Image size | Images per class | No. of training images | No. of test images | Total no. of images
CIFAR10 | 10 | 32x32 | 6000 | 50000 | 10000 | 60000

Label | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Name | Airplane | Car | Bird | Cat | Deer | Dog | Frog | Horse | Ship | Truck

3.3 Data Preprocessing

In this seminar, the CIFAR10 dataset is used, which is available in the “keras.datasets”
library. First, the dataset is loaded from the library using the function load_data() and split into
training and testing arrays. Then normalization, reshaping, and transformation
techniques are used for data preprocessing to enhance some important image features.

In the data normalization process, every element of the two arrays, training and testing, is divided
by 255 so that all array values lie between 0 and 1. Before normalization, the training and testing
arrays contain values between 0 and 255, where 0 means completely black and 255 means
completely white.

Training = Training / 255 and Testing = Testing / 255
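The NumPy sketch below shows this normalization step. It assumes the images are stored as uint8 arrays, which is how keras.datasets.cifar10.load_data() returns them. The tiny arrays are hypothetical stand-ins for the real training and testing arrays.

```python
import numpy as np

# Hypothetical stand-ins for the CIFAR10 arrays (uint8 pixel values 0-255)
training = np.array([[[0, 128, 255]]], dtype=np.uint8)
testing = np.array([[[64, 255, 0]]], dtype=np.uint8)

# Convert to float first so the result is not stuck in integer arithmetic,
# then scale every pixel from the range 0-255 into the range 0-1.
training = training.astype("float32") / 255.0
testing = testing.astype("float32") / 255.0

print(training.min(), training.max())
```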

In the reshaping process, the reshape function is used to change the dimensions of the arrays
so that the data fits the input size expected by the network.

In the transformation process, each image is converted from RGB color to grayscale. The
cvtColor(img, color) function, available in the cv2 (OpenCV) library, is used for this
conversion, where the second argument is the conversion code (for example cv2.COLOR_RGB2GRAY).
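The transformation and reshaping steps can be sketched in NumPy as follows. The report uses cv2.cvtColor for the grayscale conversion. To avoid an OpenCV dependency, this sketch instead applies the luma weights that OpenCV documents for RGB-to-gray conversion (0.299 R + 0.587 G + 0.114 B), so the result should closely match cvtColor. The random batch stands in for real CIFAR10 images, and the helper name rgb_to_gray is hypothetical.

```python
import numpy as np

def rgb_to_gray(images):
    """Convert a batch of RGB images (N, H, W, 3) to grayscale (N, H, W).

    Uses the luma weights 0.299, 0.587, 0.114, the same weights OpenCV
    documents for cv2.cvtColor(img, cv2.COLOR_RGB2GRAY).
    """
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return images @ weights          # weighted sum over the last (channel) axis

# Hypothetical batch of 2 normalized 32x32 RGB images
images = np.random.default_rng(0).random((2, 32, 32, 3), dtype=np.float32)

gray = rgb_to_gray(images)           # shape (2, 32, 32)
gray = gray.reshape(-1, 32, 32, 1)   # add the single channel axis: (2, 32, 32, 1)
print(gray.shape)
```

The final reshape produces the (32, 32, 1) per-image shape that the input layer in Section 3.4.1 expects.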

3.4 Classification using CNN

A CNN consists of different components in its architecture, which are described below.

3.4.1 Input layer

The input layer is the first layer of the CNN. It takes images as input and passes them on to
further layers for feature extraction. The image size used in this seminar is (32, 32, 1),
which means the image is 32x32 in height and breadth and has a single channel (grayscale only).
Each image pixel has a value between 0 and 1.

3.4.2 Convolutional Layer

The convolution layer is the first layer used to extract the various features from the images,
and the majority of the computation occurs in this layer. In this seminar, the Conv2D function,
available in the Keras library, is used for the convolution layer. It takes filters, kernel_size,
activation, padding, and input_shape as parameters. The input_shape parameter takes the shape of the
input image described in the input layer above. The functions of the other parameters are
described below.

3.4.2.1 Filters:

The filters parameter is given to the convolution layer to set the number of filters. Here, 32 is
given as the number of filters, which means the layer can detect 32 different features, such as
edges, in the input image.

3.4.2.2 Kernel Size

The function of this parameter is to determine the size of the filters. In this seminar, 3
is given as the kernel size, which means each of the 32 filters is 3x3 in
height and breadth.

3.4.2.3 Stride

Stride is the number of pixels, or distance, that the kernel moves over the input matrix at each
step. A larger stride yields a smaller output. Stride values of two or more are used less
commonly in convolution layers. In this seminar, a stride of 1 is used.

3.4.2.4 Padding

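Padding is the process of adding layers of zeros around the border of an input image. It allows
the kernel to be applied at the edges of the image and controls the size of the output: with
"same" padding and stride 1, the output has the same height and width as the input, while with
no ("valid") padding the output shrinks. All images in this seminar have the same size (32x32),
and, as the model architecture in Figure 3.6 describes, the convolution layers use no padding.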

Figure 3.2 Operation of convolution with kernel 3, no padding, and stride 1 [7]

The figure above describes how the convolution operation is performed. In the figure, the image
is 5x5 in size, each pixel has its own value, and the kernel has size 3, with no padding and a
stride of one. The size of the feature map (the output of the convolution) for this kernel size,
padding, and stride is calculated using the following formula.

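$$ n_h^{\text{out}} \times n_w^{\text{out}} = \left( \left\lfloor \frac{n_h + 2p - f}{s} \right\rfloor + 1 \right) \times \left( \left\lfloor \frac{n_w + 2p - f}{s} \right\rfloor + 1 \right) \qquad (1) $$

Where $n_h$ and $n_w$ are the height and width of the input image, $n_h^{\text{out}}$ and $n_w^{\text{out}}$ are the height and width of the output feature map, $p$ is the padding, $f$ is the filter (kernel) size, and $s$ is the stride.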
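Substituting the values of the Figure 3.2 example into equation (1), with $n_h = n_w = 5$, $p = 0$, $f = 3$ and $s = 1$, gives the 3x3 output size:

```latex
n_h^{\text{out}} \times n_w^{\text{out}}
  = \left( \left\lfloor \frac{5 + 2 \cdot 0 - 3}{1} \right\rfloor + 1 \right)
    \times \left( \left\lfloor \frac{5 + 2 \cdot 0 - 3}{1} \right\rfloor + 1 \right)
  = 3 \times 3
```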

The filter is applied to an area of the image by calculating the dot product between the input
pixels and the filter. This dot product is then stored in an output array whose size is determined
by equation (1). For example, for the first 3x3 area of the image:

1*1 + 2*0 + 1*1 + 2*0 + 0*1 + 0*0 + 1*1 + 0*0 + 2*1 = 5

so 5 is stored in the first pixel of the output matrix. The filter is then moved one pixel from
left to right, and the element-wise products are again calculated and added. When the filter
reaches the end of a row, it returns to the left edge and moves one pixel down, until the whole
image has been covered.
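The sliding-window computation described above can be written as a short NumPy sketch of a valid (no padding) convolution. As in CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation. The 3x3 kernel and the top-left 3x3 patch are read off the products in the worked example. The remaining pixels of the 5x5 image are hypothetical, because the full values of the figure are not reproduced in the text.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D convolution as used in CNNs (cross-correlation):
    slide the kernel over the image and sum the element-wise products."""
    f = kernel.shape[0]
    out_h = (image.shape[0] - f) // stride + 1      # equation (1) with p = 0
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):                          # move down one row at a time
        for j in range(out_w):                      # move left to right
            patch = image[i * stride:i * stride + f, j * stride:j * stride + f]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 5x5 image whose top-left 3x3 patch matches the worked example
image = np.array([
    [1, 2, 1, 0, 1],
    [2, 0, 0, 1, 0],
    [1, 0, 2, 1, 1],
    [0, 1, 1, 0, 2],
    [1, 0, 2, 1, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])
feature_map = conv2d(image, kernel)
print(feature_map.shape)   # (3, 3), as predicted by equation (1)
print(feature_map[0, 0])   # 5.0, i.e. 1*1+2*0+1*1+2*0+0*1+0*0+1*1+0*0+2*1
```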

After each convolution operation, a convolution neural network applies a Rectified Linear
Unit (ReLU) transformation to the feature map to introduce nonlinearity into the
model.

3.4.2.5 ReLU Layer

The ReLU activation function is applied in the convolution layer to increase non-linearity in
the CNN, because images are made of different objects and are highly non-linear. Without
this function, image classification would be treated as a linear problem, while it is
actually a non-linear one. ReLU removes negative values from an activation map by setting
them to zero. The activation function used in this CNN is the rectified linear unit, which is
mathematically described as:

$$ g(z) = \max(0, z) \qquad (2) $$
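Equation (2) is applied element-wise to a feature map. The NumPy sketch below shows this on a hypothetical 2x2 feature map.

```python
import numpy as np

def relu(feature_map):
    """Equation (2): g(z) = max(0, z), applied element-wise."""
    return np.maximum(0, feature_map)

# Hypothetical feature map produced by a convolution
fm = np.array([[ 3.0, -1.5],
               [-0.2,  4.0]])
print(relu(fm))   # negative values become 0, positive values are unchanged
```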

3.4.3 Pooling Layer

Pooling layers are commonly used immediately after convolution layers. The primary
aim of a pooling layer is to decrease the size of the convolved feature map to reduce the
computation and the number of parameters in the network. In this seminar, the max pooling
operation is used, which is described below.

3.4.3.1 Max Pooling Layer

In max pooling, the largest element is taken from each region of the feature map. As the filter
moves across the input, it selects the pixel with the maximum value to send to the output array.
In this seminar, max pooling with a filter (pool) size of 2×2 and a stride of 2 is used. The figure
below describes how the max pooling operation is performed.

Figure 3.3 Max Pooling

In Figure 3.3, max pooling works by placing a 2×2 window on the feature map and
picking the largest value in that window. The 2×2 window is moved two pixels at a time (stride 2)
from left to right, and then down, through the entire feature map, picking the largest value at
each position.
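The 2×2, stride-2 max pooling described above can be sketched in NumPy as follows. The 4x4 feature map is a hypothetical example, not the one shown in Figure 3.3.

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1   # equation (1) with p = 0
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()                      # largest value in the window
    return out

# Hypothetical 4x4 feature map
fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]])
print(max_pool(fm))   # 2x2 output: one maximum per window
```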

3.4.4 Flatten Layers

After the pooled feature map is obtained, the next step is to flatten it. Flattening converts
the data into a 1D array so that it can be input to the next layer. It transforms the entire
pooled feature map into a single column, which is then connected to the
classification model, called the fully connected layer.

Figure 3.4 Flattening of 3×3 image matrix into 9×1 vector
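Flattening amounts to a reshape. The minimal NumPy sketch below flattens a hypothetical 3x3 matrix into a 9x1 column. It also flattens the 5 x 5 x 64 pooled output of the model described in this report, which gives 1600 values.

```python
import numpy as np

# Hypothetical 3x3 pooled feature map (one channel)
pooled = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
column = pooled.reshape(-1, 1)   # flatten row by row into a 9x1 column vector
print(column.shape)              # (9, 1)

# The last pooling output of this report's model, (5, 5, 64), flattens to 1600 values
last_pool = np.zeros((5, 5, 64))
print(last_pool.flatten().shape)   # (1600,)
```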

3.4.5 Fully Connected Layers

The fully connected layers, also called dense layers, can be one or more layers and are
placed after a sequence of convolution and pooling layers. In this seminar, three dense
layers are used. The first two dense layers have 128 neurons each and use the relu activation
function. The last dense layer has 10 neurons, one for each image class, and uses the softmax
activation function to classify the inputs by producing a probability between 0 and 1 for each class.

In these layers, information is passed forward through the network and the prediction error is
determined. The error is then backpropagated through the network to improve the
prediction. The fully connected layers perform the classification task based on the features
extracted by the previous layers.
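A fully connected (dense) layer computes a weighted sum of all its inputs for every neuron. The NumPy sketch below runs the forward pass of the dense stack described above: 1600 inputs, two ReLU layers of 128 neurons, and a 10-way softmax output. The weights are randomly initialized and untrained, and the initialization scheme is an assumption. The helper name dense is hypothetical, and training by backpropagation is not shown.

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(x, n_units, activation):
    """One fully connected layer with freshly initialized (untrained) weights."""
    w = rng.normal(0.0, 0.01, size=(x.shape[0], n_units))  # one weight per input-neuron pair
    b = np.zeros(n_units)
    z = x @ w + b                                          # weighted sum for every neuron
    if activation == "relu":
        return np.maximum(0, z)
    e = np.exp(z - z.max())                                # softmax for the output layer
    return e / e.sum()

x = rng.random(1600)              # hypothetical flattened feature vector (5 x 5 x 64)
h1 = dense(x, 128, "relu")        # first dense layer
h2 = dense(h1, 128, "relu")       # second dense layer
probs = dense(h2, 10, "softmax")  # output layer: one probability per CIFAR10 class
print(probs.shape, probs.sum())
```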

Figure 3.5 Fully connected layers

In this study, the images are classified using the following architecture.

Figure 3.6 Model Architecture

The figure above shows the complete architecture of the CNN and how the input image is
classified. The input image size is 32 x 32 x 1. This image is passed to the Conv 1 layer with a
filter of size 3, stride 1, no padding, and 32 filters, so the output matrix is 30 x 30 x 32; this output
size is calculated using equation (1). The same calculation applies to the Conv 2 layer and to
the first Pooling layer, which uses a filter of size 2, stride 2, and no padding. After the Pooling
layer, the Conv 3 and Conv 4 layers are applied in the same way as Conv 1 and Conv 2, but with
64 filters instead of 32, followed by the same kind of Pooling layer, whose output size is
5 x 5 x 64. This output is then flattened into a one-dimensional vector of size 1600 x 1
(5 × 5 × 64 = 1600). The flattened vector is passed to two fully connected layers (FC 3 and FC 4)
of size 128 x 1, and finally the image is classified into one of the 10 classes.
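The sizes stated above can be checked by applying equation (1) layer by layer, as in the short Python sketch below. It assumes no padding (p = 0) for every layer, as the text states for Conv 1 and the pooling layers. The intermediate sizes for Conv 2, Conv 3, and Conv 4 are derived from the formula rather than quoted from Figure 3.6.

```python
def output_size(n, f, p=0, s=1):
    """Equation (1) for one spatial dimension."""
    return (n + 2 * p - f) // s + 1

# Layers as described in the text: (name, kernel/pool size, stride, output channels)
layers = [
    ("Conv 1", 3, 1, 32),
    ("Conv 2", 3, 1, 32),
    ("Pool 1", 2, 2, None),   # pooling keeps the number of channels
    ("Conv 3", 3, 1, 64),
    ("Conv 4", 3, 1, 64),
    ("Pool 2", 2, 2, None),
]

h, w, c = 32, 32, 1   # grayscale input image
for name, f, s, channels in layers:
    h, w = output_size(h, f, s=s), output_size(w, f, s=s)
    c = channels or c
    print(f"{name}: {h} x {w} x {c}")
print("Flatten:", h * w * c)
```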


CHAPTER 4 IMPLEMENTATION

4.1 Numpy

An image is a two-dimensional data structure with a height and a width, so a NumPy array is
used to store the image data in a two-dimensional array. NumPy arrays are used for faster
numerical calculation.

4.2 Matplotlib

Matplotlib is a widely used Python library for plotting graphs such as bar charts,
histograms, and pie charts. In this seminar, Matplotlib is used for model evaluation: to show the
accuracy score and loss values on the test data in graphs, and to analyze the
results.

4.3 Keras

Here, the Keras library is used to implement the whole convolution neural network, including
the convolution layers, pooling layers, and fully connected layers.

4.4 Python

Python is an object-oriented programming language. The whole program is written in the
Python programming language.

4.5 Implementation of Convolution Neural Network

• Keras Conv2D is a 2D convolution layer that creates a convolution kernel, which is
convolved with the layer input to produce a tensor of outputs. The Conv2D function takes
filters, kernel size, activation, and padding as parameters. The filters parameter sets the
number of output channels (feature maps) the layer produces; 32 filters are used. The
kernel_size parameter determines the dimensions of the kernel and is usually an odd integer;
a kernel size of 3 is used. The stride specifies the step of the convolution along the height
and width of the input; its default value is (1, 1), and the default value is used. The padding
can take one of two values, valid or same. The activation function is applied after performing
the convolution; the relu activation function is used.

• Keras MaxPool2D is the function used to create a pooling layer in the network. Its operation
selects the maximum element from the region of the feature map covered by the filter. This
layer takes two parameters, the pool size and the stride; pool size = (2, 2) and
stride = (2, 2) are used.

• Keras Flatten is the function used to create a flatten layer in the network. It needs no
required arguments, and it flattens the 2D output of the convolution and pooling layers into a
1D vector.

• Keras Dense is the function used to create a fully connected layer in the network. It takes
units and activation as arguments. The units parameter is a positive integer giving the number
of neurons, that is, the output size of the layer; 128 units are used. The activation parameter
applies an element-wise activation function in the dense layer; the relu activation function is used.
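The four Keras functions above can be combined into a minimal sketch of the model in Figure 3.6. This assumes TensorFlow 2.x with its bundled Keras (tf.keras). The layer sizes follow Section 3.4. The optimizer, loss, and metric passed to compile() are assumptions, since the report does not state them. The all-zero input only checks the output shape of the untrained model.

```python
# Assumes TensorFlow 2.x (tf.keras) is installed
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),                         # grayscale 32x32 input
    layers.Conv2D(32, kernel_size=3, activation="relu"),   # Conv 1 -> 30x30x32
    layers.Conv2D(32, kernel_size=3, activation="relu"),   # Conv 2 -> 28x28x32
    layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),    # -> 14x14x32
    layers.Conv2D(64, kernel_size=3, activation="relu"),   # Conv 3 -> 12x12x64
    layers.Conv2D(64, kernel_size=3, activation="relu"),   # Conv 4 -> 10x10x64
    layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),    # -> 5x5x64
    layers.Flatten(),                                      # -> 1600
    layers.Dense(128, activation="relu"),                  # FC 3
    layers.Dense(128, activation="relu"),                  # FC 4
    layers.Dense(10, activation="softmax"),                # one probability per class
])

# Optimizer, loss and metric are assumptions; the report does not state them
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Untrained model: the probabilities are meaningless, only the shape is of interest
probs = model.predict(np.zeros((1, 32, 32, 1), dtype="float32"), verbose=0)
print(probs.shape)
```

Padding is left at its Conv2D default ("valid"), so each 3x3 convolution shrinks the feature map by two pixels per dimension. This reproduces the sizes stated in Section 3.4.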

CHAPTER 5 RESULT AND ANALYSIS

5.1 Predicted result according to Actual data

There are 1000 test images in each class. With 10 classes, the test set contains 10000 images
in total. After training the model, the following predictions were obtained, compared with the
actual labels, as shown in the table below.

S.N | Label | Correct | Incorrect
0 | Airplane | 801 | 199
1 | Automobile | 808 | 192
2 | Bird | 555 | 445
3 | Cat | 522 | 478
4 | Deer | 798 | 202
5 | Dog | 661 | 339
6 | Frog | 811 | 189
7 | Horse | 842 | 158
8 | Ship | 859 | 141
9 | Truck | 890 | 110
  | Total | 7547 | 2453

The table above shows the correct and incorrect predictions on the test dataset for each of the
10 classes. For instance, 801 of the 1000 airplane test images are predicted as airplane, and 199
are predicted as other classes. The values for the other classes (Automobile, Bird, Cat, Deer,
Dog, Frog, Horse, Ship, and Truck) are read in the same way.

5.1.1 Accuracy:

The accuracy is the total number of correct predictions divided by the total number of test
samples, so the accuracy score measures the proportion of image samples that are predicted
correctly. It is calculated using the following formula.

$$ AS = \frac{CP}{TDs} \times 100 \qquad (3) $$

Where AS = Accuracy Score, CP = number of Correct Predictions, and TDs = Total number of test samples (Total Datasets).
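Equation (3) can be applied directly to the counts in the table of Section 5.1. The Python sketch below recomputes the per-class and overall accuracy from those published counts and introduces no new data. The dictionary layout and the function name accuracy_score are illustrative choices.

```python
# Correct / incorrect predictions per class, copied from the table in Section 5.1
results = {
    "Airplane": (801, 199), "Automobile": (808, 192), "Bird": (555, 445),
    "Cat": (522, 478), "Deer": (798, 202), "Dog": (661, 339),
    "Frog": (811, 189), "Horse": (842, 158), "Ship": (859, 141),
    "Truck": (890, 110),
}

def accuracy_score(correct, total):
    """Equation (3): AS = CP / TDs x 100."""
    return correct / total * 100

for label, (correct, incorrect) in results.items():
    print(f"{label:<10} {accuracy_score(correct, correct + incorrect):.1f}%")

cp = sum(c for c, _ in results.values())           # 7547 correct predictions
tds = sum(c + i for c, i in results.values())      # 10000 test images
print(f"Overall: {accuracy_score(cp, tds):.2f}%")  # 75.47%
```

The per-class figures show where the 75% overall score comes from. Truck (89.0%) and Ship (85.9%) are recognized best, and Cat (52.2%) and Bird (55.5%) worst.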

Figure 5.1 Accuracy Movement


The graph above shows the accuracy score. The overall model accuracy score, calculated from
equation (3), is 75%: 7547 of the 10000 test samples are predicted correctly
(7547 / 10000 × 100 = 75.47%). In other words, this CNN model predicts about 75 percent of
the test data correctly.

CHAPTER 6 CONCLUSION

During this study, image classification using a CNN was implemented and analyzed. It was
found that a CNN classifies images through different layers: convolution layers, pooling
layers, and fully connected layers. After training the Convolution Neural Network on the input
image samples, an accuracy of 75% was achieved, which shows that a Convolution Neural
Network can be used for image classification.

References

[1] J. Jeong, "Towards Data Science," 24 January 2019. [Online]. Available: www.towardsdatascience.com/the-most-intuitive-and-easiest-guide-for-convolution-neural-network.

[2] A. Bonner, "Towards Data Science," 2 February 2018. [Online]. Available: www.towardsdatascience/wtf-is-image-classification-8e78a8235acb.

[3] K. Simonyan and A. Zisserman, "VGG16," 20 November 2018. [Online]. Available: https://neurohive.io/en/popular-networks/vgg16/.

[4] R. Yamashita, M. Nishio, R. K. Gkan and K. Togashi, "CNN," pp. 05-19, 2018.

[5] S. Saha, "Towards Data Science," 15 December 2018. [Online]. Available: www.towardsdatascience.com.

[6] D. Stutz, "Understanding Convolutional Neural Networks," 2014.

[7] D. Pawar, "Dipti Pawar," 14 August 2018. [Online]. Available: medium.com/@dipti.rohan.pawar/improving-performance-of-convolution-neural-network-2ecfe0207de7.

[8] V. Kurama, "Paperspace Blog," 2020. [Online]. Available: https://www.blog.paperspace.com/popular-deep-learning-architecture-alexnet-vgg-googlnet. [Accessed 19 July 2021].
