AI & Machine Vision Coursework

Implementation of Deep Learning for classification of natural images


Table of Contents
Abstract
1. Introduction
2. Research Methodology
2.1 Method for image classification
2.2 Current literature
2.3 Residual learning
2.4 Batch normalization
3. MatLab simulation process
3.1 Pre-processing of data
3.2 Implementation of pre-trained CNN
3.3 Splitting data into test and train
3.4 Image pre-processing using Squeezenet
3.5 Feature extraction and training SVM classifier
3.6 Performance evaluation
4. Results
5. Conclusion
6. References

Abstract
The convolutional neural network (CNN) has been the dominant algorithm in computer vision in
recent years. Chollet (2017) reported that CNNs achieve superhuman performance on complex visual
tasks given robust computational power. One of the best-performing architectures was used in this
study for transfer learning. It builds on ideas from other well-known CNN architectures such as
GoogLeNet, ResNet and SqueezeNet, and replaces the inception modules with a special layer known
as depthwise separable convolution (Liu et al, 2018).

1. Introduction

The convolutional neural network (CNN) is a robust machine learning technique within deep
learning. It is widely employed to learn from a wide range of images and yields rich
characteristic features for identification and classification. CNN features outperform
hand-crafted features such as SURF, HOG and LBP. A simple way to leverage the power of a CNN
without investing time and effort in training is to employ a pre-trained network as a feature
extractor. Applying transfer learning to a pre-trained network is much easier than training a
network from scratch. Pre-trained neural networks are used for classification, feature
extraction and transfer learning. Pre-trained networks differ in their characteristics, and one
can be selected based on the nature of the problem. The key traits of a pre-trained network are
accuracy, speed and size. Choosing a suitable network is a tradeoff among these traits.

The ideal pre-trained network is both accurate and fast. Classification accuracy is usually
plotted against prediction time, measured with a mini-batch size of 128 and expressed relative
to the fastest network. The classification accuracy on the ImageNet validation set is the most
common way to measure the accuracy of networks trained on ImageNet. Networks that are accurate
on ImageNet are also often accurate when applied to other natural image data sets using
transfer learning or feature extraction. This generalization is possible because the networks
have learned to extract powerful and informative features from natural images that generalize
to other similar data sets. However, high accuracy on ImageNet does not always transfer
directly to other tasks, so it is a good idea to try multiple networks. There are different
methods to
estimate prediction and classification accuracy, especially on the ImageNet validation set,
and different sources employ different estimation methods. In some cases an ensemble of
multiple models is employed, and in some scenarios every image is assessed several times using
multiple crops. In other cases the top-5 accuracy is reported instead of the standard top-1
accuracy. Because of such variations, it is usually not possible to compare accuracy values
obtained with different methods directly. The accuracies of pre-trained networks in the MatLab
Deep Learning Toolbox™ are the standard (top-1) accuracies computed using a single model and a
single central image crop.

The main research objective is to build an image identification and classification system for
natural images: features are extracted from the images and used to train a machine learning
algorithm that recognizes which class a given natural image belongs to (Liu et al, 2018). This
report is divided into five sections: Introduction, Research Methodology, MatLab simulation,
Results and Conclusion. The Introduction explains the objective of the present study and
describes deep learning techniques and the CNN. The Research Methodology section explains the
implementation procedure of the deep learning technique for image classification. The MatLab
simulation section explains the simulation procedure for natural image identification,
classification, feature extraction and batch normalization. The Results section presents the
performance assessment using a confusion matrix. The Conclusion summarizes the results of the
current study and the scope for future improvement.

2. Research Methodology
2.1 Method for image classification
The research method for image identification and classification involves three steps: training
a neural network, validating the neural network and testing the neural network. This is also
called multilayer classification. The validation step assesses the efficacy of the specific
neural network in prediction, classification and identification as needed, and a separate
validation data set is employed for this purpose. The validation data is taken from a portion
of the train/test split. Upon completion of the validation, the testing dataset is classified
into the different object classes using the trained network.

2.2 Current literature
The benefit of employing a pre-trained image classification network is that it extracts strong
and informative features from natural images, which can be used as a starting point for
learning a new task. Most pre-trained networks are trained on a subset of the ImageNet
database, which is used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC).
These networks are trained on more than a million images and can classify images into 1000
object classes such as flowers, birds, animals, utensils etc. Transfer learning from a general
group of images to a specific set of images generally enhances the prediction accuracy (Afzal
et al, 2015). This method also allows the Deep Learning Toolbox and its CNN architectures to
classify effectively even if the training data set is small.

2.3 Residual learning
The training accuracy degrades as the depth of the network increases. Hence, ResNet-50 was
chosen for this study, and residual learning was implemented to handle this accuracy issue
effectively (Liu et al, 2018; Bengio et al, 2017). In this method, a few layers are stacked
and fitted to a residual mapping of the image patterns, which makes the mapping easier to
learn. This method is used to train the CNN, which also enhances the accuracy and thereby
improves object identification and classification.

2.4 Batch normalization
According to Chan et al (2015), batch normalization addresses internal covariate shift by
scaling and shifting the inputs to the non-linearity in each layer. Batch normalization was
performed during activation and back propagation to enhance efficiency, so that the actual
training time could be reduced.

3. MatLab simulation process
3.1 Pre-processing of data
A common issue in image classification studies is that image sizes differ within the dataset.
Images that differ in height and width cannot be stacked into an array as input for a machine
learning algorithm. Resizing produces an output image of the required size by interpolating
pixel colours, which gives a continuous transition between neighbouring pixels. In this study,
bicubic interpolation is employed for image resizing. Though this method is computationally
quite expensive, it is more robust than other interpolation techniques and yields better
results.
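
To make the bicubic resizing step more concrete, the following is a minimal Python sketch of cubic convolution interpolation along one axis; a bicubic resize applies the same weights along the rows and then along the columns. It is an illustration only and not the MATLAB routine used in the study: the function names, the kernel parameter a = -0.5 and the edge replication are assumptions.

```python
import math


def cubic_kernel(x, a=-0.5):
    """Cubic convolution weight for a sample at distance x (a = -0.5 is an assumed, common choice)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def interpolate(samples, position):
    """Interpolate a 1-D signal at a fractional position from its 4 nearest samples."""
    base = math.floor(position)
    value = 0.0
    for offset in range(-1, 3):
        index = base + offset
        clamped = min(max(index, 0), len(samples) - 1)  # replicate edge samples
        value += samples[clamped] * cubic_kernel(position - index)
    return value


def resize_1d(samples, new_length):
    """Resample one row of pixel values to new_length points spanning the same extent."""
    if new_length == 1:
        return [float(samples[0])]
    scale = (len(samples) - 1) / (new_length - 1)
    return [interpolate(samples, i * scale) for i in range(new_length)]


row = [0.0, 1.0, 2.0, 3.0]   # hypothetical pixel intensities
print(resize_1d(row, 7))
```

Each output value is a weighted sum of four neighbouring input pixels per axis (sixteen in 2-D), which is why bicubic resizing costs more than nearest-neighbour or bilinear resizing.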

Gaussian blur is one of the most widely used techniques to reduce noise and enhance image
structures at varying scales. A kernel value of 3 was found to be suitable for reducing the
noise in the images.
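
As a rough illustration of this step, the Python sketch below builds a small Gaussian kernel and convolves it with a grayscale image. It assumes that the kernel value of 3 refers to a 3 x 3 kernel; the standard deviation of 1.0, the edge replication, the function names and the toy image are likewise assumptions, and the code is not the MATLAB routine used in the study.

```python
import math


def gaussian_kernel(size=3, sigma=1.0):
    """Build a normalized size x size Gaussian kernel (sigma is an assumed value)."""
    half = size // 2
    weights = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-half, half + 1)]
        for dy in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def blur(image, kernel):
    """Convolve a grayscale image (list of rows) with the kernel, replicating edge pixels."""
    height, width = len(image), len(image[0])
    half = len(kernel) // 2
    output = []
    for y in range(height):
        out_row = []
        for x in range(width):
            acc = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    sy = min(max(y + ky - half, 0), height - 1)
                    sx = min(max(x + kx - half, 0), width - 1)
                    acc += weight * image[sy][sx]
            out_row.append(acc)
        output.append(out_row)
    return output


# Hypothetical 5 x 5 image: flat background with a single noisy pixel in the centre.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 90
smoothed = blur(noisy, gaussian_kernel(size=3, sigma=1.0))
print(round(smoothed[2][2], 2))
```

The isolated bright pixel is pulled towards its neighbours while flat regions are left unchanged, which is the noise-reducing behaviour described above.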

3.2 Implementation of pre-trained CNN

The current study employed a pre-trained CNN known as 'ResNet 50', which is installed in
MatLab as an add-on. ResNet-50 is a 50-layer classification network that has already been
trained on more than a million images divided into 1000 object classes from the ImageNet
dataset (Deng et al, 2009). This network has a very good response time and is also robust in
its accuracy, which is the key reason for selecting it as the preferred network for this
study. The network has an input image size of 224x224x3. This size is defined by the first
layer of the network and is given by "network.Layers(1).InputSize" in the code.

The residual block is the primary unit of the "Resnet" network. The intermediate layers make
up the bulk of the CNN. These are a series of convolutional layers, interspersed with
rectified linear units (ReLU) and pooling layers. Following these layers is a fully-connected
layer. The final layer is the classification layer and its properties depend on the
classification task. In this example, the CNN model that was loaded was trained to solve a
1000-way classification problem. Thus the classification layer has 1000 classes from the
ImageNet dataset.

The "Resnet" network is employed to categorize the "Naturalimagedata" dataset, which contains
images of dog, car, airplane, flower, fruit, motorbike, person and cat. Images from the
"Naturalimagedata" categories 'dog', 'car', 'airplane', 'flower' and 'fruit' are displayed in
MATLAB and depicted below.

3.3 Splitting data into test and train

The data was split into train and test sets with a 67:33 split ratio. The splitting was done
in a random manner in order to avoid bias of any type. Initially, stratified splitting with
shuffling was applied to all features and labels, with 67% for training and the remaining 33%
for validation and testing. Secondly, the remaining 33% of the dataset was split by stratified
sampling into 20% for validation and 13% for testing.

3.4 Image pre-processing using Squeezenet

The Resnet-50 neural network can process images of size 224 x 224. However, the dataset might
not have all images rendered at the same size, and manually resizing the images is very time
consuming and demands the use of a complex algorithm. This issue was resolved using
'AugmentedImageDatastore', which converts grayscale images into RGB. This function can also
resize the images to 224 x 224, and it was therefore used to augment the data for training,
validation and testing.

3.5 Feature extraction and training SVM classifier

ResNet-50 is a convolutional neural network that is 50 layers deep. The pre-trained CNN is
available in the Deep Learning Toolbox and is trained on more than a million images from the
ImageNet database. The pre-trained network can classify images into 1000 object categories,
such as animals, fruits, flowers, cars, bikes etc. As a result, the network has learned rich
feature representations for a wide range of images. The network has an image input size of
224-by-224. More pre-trained networks are available in MATLAB.

The first layer captures blobs and edges, and the output of this first layer is subsequently
passed through the deeper layers, where higher-level features are extracted. There are
different types of activation functions which can be used to extract the features from the
images. Softmax activation was used in this study to extract features. The mini-batch size was
50 for this study, which fits into the GPU memory. After the feature extraction process was
complete, a multiclass support vector machine (SVM) was trained using the "fitcecoc" function,
with the learners set to linear and a fast stochastic gradient descent solver employed. This
speeds up the SVM training process when working with the features extracted by the CNN.
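
The idea behind training a multiclass linear SVM on extracted features can be sketched in plain Python as follows. This is not the "fitcecoc" implementation: it assumes one-vs-one coding, omits the bias term (which is only reasonable because the toy features are centred), and uses made-up 2-D vectors as stand-ins for CNN features. All function names, class names and hyperparameters are hypothetical.

```python
import math
import random
from itertools import combinations


def train_linear_svm(features, targets, epochs=50, rate=0.1, lam=0.01, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted by stochastic gradient descent.

    targets are +1 or -1. No bias term is used, which assumes centred features.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)                      # visit samples in random order
        for i in order:
            x, y = features[i], targets[i]
            margin = y * sum(w * v for w, v in zip(weights, x))
            weights = [w * (1.0 - rate * lam) for w in weights]       # L2 shrinkage
            if margin < 1.0:                                          # hinge subgradient step
                weights = [w + rate * y * v for w, v in zip(weights, x)]
    return weights


def fit_one_vs_one(features, labels):
    """Train one binary SVM per pair of classes (one-vs-one coding, assumed here)."""
    models = {}
    for pos, neg in combinations(sorted(set(labels)), 2):
        pair_x = [x for x, label in zip(features, labels) if label in (pos, neg)]
        pair_y = [1 if label == pos else -1 for label in labels if label in (pos, neg)]
        models[(pos, neg)] = train_linear_svm(pair_x, pair_y)
    return models


def predict(models, x):
    """Each pairwise SVM votes for one class; the class with the most votes wins."""
    votes = {}
    for (pos, neg), weights in models.items():
        score = sum(w * v for w, v in zip(weights, x))
        winner = pos if score >= 0 else neg
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)


def toy_feature(angle_deg):
    """Hypothetical 2-D stand-in for a CNN feature vector."""
    angle = math.radians(angle_deg)
    return [math.cos(angle), math.sin(angle)]


centres = {"car": 0, "cat": 120, "dog": 240}    # made-up class directions
train_x, train_y = [], []
for name, centre in centres.items():
    for offset in (-10, 0, 10):
        train_x.append(toy_feature(centre + offset))
        train_y.append(name)

models = fit_one_vs_one(train_x, train_y)
print([predict(models, toy_feature(a)) for a in (5, 125, 235)])
```

Because the CNN features are computed once and the classifier on top is linear, each training step is only a dot product and a vector update, which is consistent with the fast training reported above.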
3.6 Performance evaluation

The classifier performance evaluation was conducted using the test dataset. It assesses how
accurately the classifier recognizes and classifies the images by class, by comparing the
original label of each image with the predicted label. The validation was conducted using the
validation dataset and assesses whether the classifier predicts the category to which each
image belongs. The accuracy of the SVM classifier prediction is 0.98.

4. Results

The fruit class contains more images than the other image classes. There are eight image
classes in total in the dataset.

The accuracy of the classifier on the predicted labels is 0.98. This indicates that the
classifier predicts the image classes with an average accuracy of 98%.
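
The relation between a confusion matrix, the accuracy value and the false positive and false negative counts can be illustrated with a short Python sketch. The labels below are made up purely so that 49 of 50 predictions are correct; they are not the data of the study, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(count for (true, pred), count in matrix.items() if true == pred)
    return correct / sum(matrix.values())


def false_positives(matrix, label):
    """Samples predicted as `label` whose true class is different."""
    return sum(c for (true, pred), c in matrix.items() if pred == label and true != label)


def false_negatives(matrix, label):
    """Samples of class `label` that were predicted as another class."""
    return sum(c for (true, pred), c in matrix.items() if true == label and pred != label)


# Made-up example: 50 test images, one motorbike mispredicted as a car.
true_labels = ["car"] * 20 + ["motorbike"] * 15 + ["person"] * 15
predicted = ["car"] * 20 + ["car"] + ["motorbike"] * 14 + ["person"] * 15

matrix = confusion_matrix(true_labels, predicted)
print(accuracy(matrix))
print(false_positives(matrix, "car"), false_negatives(matrix, "motorbike"))
```

In this toy case the single error counts as a false positive for the car class and as a false negative for the motorbike class, while the overall accuracy is 49/50.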

The above graph illustrates the training accuracy, the validation accuracy and the losses
during validation and training. The training loss is higher than the validation loss. However,
the training accuracy is also somewhat higher than the validation accuracy.

The model produces fewer false positives than false negatives. A false positive occurred for a
toy car, and a false negative occurred where a healthcare imaging device was predicted instead
of a patient (person). It is also not surprising that motorbikes are commonly mispredicted as
cars, since both share features such as a metal chassis, wheels etc. The false negatives
obtained for cars are exotic images of cars and cars shot from top angles with other objects,
which were classified as non-cars.

5. Conclusion
The current study analysed the Natural Images dataset, and the model is not restricted to
classifying only the images in this dataset but can also classify a wide array of other
images. The model has superior accuracy in identification and classification, which is
relevant to other computer vision applications as well. The study also revealed some issues
with the machine learning algorithm: the model has to be pre-trained and has to use residual
learning to handle identification and classification with higher accuracy. The ability of the
augmented image datastore function to reduce the image dimensions specifically assisted and
sped up the training process. The data augmentation function helped to resolve the issues with
the images, while cropping and Gaussian blur reduced the noise in the images and aided
training, especially for classes with fewer images. Machine learning techniques like SVM show
higher and near-ideal performance when used with the pre-trained network ResNet-50, with 98%
accuracy for identification and prediction.

6. References
Liu, K., Liu, H., Chan, K., Liu, T., and Pei, S. "Age Estimation via Fusion of Depthwise
Separable Convolutional Neural Networks," 2018 IEEE International Workshop on Information
Forensics and Security (WIFS), Hong Kong, Hong Kong, 2018, pp. 1–8.

Deng, J., et al., "Imagenet: A large-scale hierarchical image database." IEEE Conference on
Computer Vision and Pattern Recognition, 2009.

Donahue, J., et al., "Decaf: A deep convolutional activation feature for generic visual
recognition." arXiv preprint arXiv:1310.1531, 2013.
