Literature Survey

2019 2nd International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT)
A Comparative Study of CNN and AlexNet for Detection of Disease in Potato and Mango Leaf
Sunayana Arya, Rajeev Singh
G.B. Pant University of Agriculture & Technology (GBPUAT), Pantnagar, Uttarakhand
978-1-7281-1772-0 ©2019 IEEE
Abstract - Deep Learning (DL) is a fast-growing and broad part of the machine learning family. Deep learning commonly uses Convolutional Neural Networks (CNN) for image classification because they give highly accurate results on real-world problems. Several pre-trained CNN architectures are available, such as AlexNet, GoogleNet, DenseNet, SqueezeNet, ResNet and VGGNet. In this study, we use a CNN and the AlexNet architecture to detect disease in mango and potato leaves and compare the accuracy and efficiency of the two architectures. A dataset of 4004 images was used for this work. The potato images were taken from the PlantVillage website, while the mango images were collected at a GBPUAT field location. The results show that AlexNet achieves higher accuracy than the CNN architecture.
Keywords - Image classification, Deep Learning, Convolutional Neural Network Architecture, AlexNet Architecture.
I. INTRODUCTION
India is the world's largest producer of mango and ranks second in potato production. Mango (Mangifera indica), known as the 'king of fruit', is a plant indigenous to South Asia and is grown in almost all states of India, while potato is the most common food crop and is used by various food industries. Despite their importance, both crops suffer heavy production losses due to disease.
The common disease in mango is anthracnose, caused by infection with the fungus Colletotrichum gloeosporioides; the major factor behind anthracnose is high humidity during the cropping season. Early blight is a common disease of potato caused by the fungus Alternaria solani, which primarily affects leaves and stems. These diseases result in significant losses to farmers and to agricultural productivity.
Therefore, it is essential to recognize diseases early so that plants can be safeguarded, resulting in increased income and productivity. Initially, people followed the instructions of experts to identify and prevent diseases, but this process takes a long time for a large field and is also very expensive [1]. Many machine learning based models were later introduced that take less time to train and to detect disease, but these techniques also have limitations: they use small datasets, which leads to overfitting, and they are not fully automated, because feature extraction relies on features designed by experts, known as handcrafted characteristics [2]. These limitations were overcome by Deep Learning (DL) techniques, which use a large dataset to overcome the overfitting problem and extract relevant features by themselves [3]. DL is a sub-field of machine learning consisting of a huge number of highly interconnected processing elements (neurons) that work together to solve specific problems. Deep learning has been successfully applied to various domains such as bioinformatics, agriculture and drug design [4][5].
The remainder of this paper is organized as follows. Section II discusses related work, Section III briefly describes the methodology used, Section IV discusses the implementation and Section V outlines the results.
II. RELATED WORK
Erika Fujita et al. (2016) applied the AlexNet architecture to detect disease in cucumber plants. Their dataset contained 7520 images in eight classes (seven diseased and one healthy): melon yellow spot virus (MYSV), zucchini yellow mosaic virus (ZYMV), cucurbit chlorotic yellows virus (CCYV), cucumber mosaic virus (CMV), papaya ring spot virus (PRSV), watermelon mosaic virus (WMV) and kyuri green mottle mosaic virus (KGMMV). In the pre-processing step, images were augmented by rotating, shifting and mirroring to enlarge the dataset, so that the disease can be identified even when a leaf is photographed under different conditions [6]. The images were then resized to 224×224 pixels in RGB format. After training, the model easily detects diseased and non-diseased leaves, because the convolutional neural network efficiently extracts the information needed for classification, which increases accuracy.
Lucas G. Nachtigall et al. (2016) used the AlexNet architecture to detect disease in apple leaves with a dataset of 2539 images, comprising 5 diseased classes (potassium deficiency, magnesium deficiency, scab damage, Glomerella stain and herbicide damage) and one class of healthy leaves. After training, the CNN architecture classifies the image and accurately detects the disease in the leaf [7]. This architecture achieved 97.3% accuracy.
Mohammed Brahimi et al. (2017) proposed the AlexNet and GoogleNet architectures to detect diseases in tomato leaves using a large dataset of 14,828 images containing 9 classes of diseased leaves: tomato yellow leaf curl virus, tomato mosaic virus, target spot, spider mites, septoria spot, leaf mold, late blight, early blight and bacterial spot. The model is divided into three phases. In the first phase (pre-processing), background removal, color space conversion and image
resizing is done [8]. In the second phase (feature extraction), features such as color, edge and texture are extracted automatically using the GLCM (grey-level co-occurrence matrix). Finally, in the classification phase, a new image is tested to determine the disease. The architecture achieves 99.18% accuracy.
Yang Lu et al. (2017) utilized AlexNet and GoogleNet to classify rice leaf diseases. In this study, the dataset consists of 500 images of rice leaves divided into 10 disease classes, such as rice blast, rice false smut, rice brown spot, rice bakanae disease, rice sheath blight, rice sheath rot, rice seedling blight, rice bacterial leaf blight and rice bacterial. The motivation for the convolutional architecture is to provide an easy-to-use system that detects early-stage infections from a single photograph of a diseased leaf, because manual identification becomes difficult for a large dataset [9]. This model achieves 95.48% accuracy.
Aravind Krishnaswamy Rangarajan et al. (2018) presented a novel approach based on convolutional neural networks for identifying disease in tomato leaves, using the standard AlexNet and VGG-16 architectures. The dataset, collected from PlantVillage, contains 13,262 images after augmentation (six diseased and one healthy class). Transfer learning was used, applying the pre-trained architectures to the new task [10]. The authors concluded that accuracy can be increased by changing the number of images, since it directly influences the performance of the model, and by setting the weight and bias learning rates and the minibatch size. This model achieves 97.49% accuracy.
Serawork Wallelign et al. (2018) investigated the feasibility of convolutional neural networks for plant disease detection. The LeNet architecture was used as the CNN for building a classifier, with a dataset of 12,673 images (3 diseased and 1 healthy class) covering diseases such as Septoria leaf blight, frogeye leaf spot and downy mildew, collected from PlantVillage. Data augmentation was used to enlarge the dataset, which also helps to reduce overfitting [11]. The dataset was then prepared in three versions: grayscale, color and segmented leaves. After classification, the color images achieved the highest accuracy, which indicates that color images are useful for extracting important features.
Prajwala TM et al. (2018) proposed the LeNet architecture for identifying and detecting disease in tomato leaves. Their dataset contains 18,160 images (10 diseased and 1 healthy class) covering the kinds of disease that affect the tomato crop, such as bacterial leaf spot, septoria leaf spot and yellow leaf curl. The methodology is divided into three major parts: data acquisition, pre-processing and classification [12]. In data acquisition, images in RGB format are collected from PlantVillage. In the second step, the images are resized and normalized using the mean and standard deviation so that all pixel values lie in the same range, with appropriate values chosen for normalization and termination, which keeps computation feasible. Finally, in the classification step, the model determines whether a new image is diseased or not. The model achieves 95.95% accuracy.
Xihai Zhang et al. (2018) presented the identification and diagnosis of diseases in maize leaves. Two CNN architectures, GoogleNet and Cifar10, were used to detect disease in maize plants with a dataset of 500 images covering Northern leaf blight, Southern leaf blight, rust, brown spot, round spot, Curvularia leaf spot, gray leaf spot, dwarf mosaic and one class of healthy leaves. After data augmentation, the dataset grew to 3060 images, of which 2448 were used for training and 612 for testing. GoogleNet has far fewer parameters than the VGG-16 architecture [13]. After pre-processing and training, the model accurately predicts which disease is present in the leaf. This model achieves 98.9% accuracy.
III. METHODOLOGY UTILIZED
Data Collection: A large dataset is required to train a DL-based model and to analyse the performance of the algorithm. For classification, we used potato and mango leaf images. The potato images were collected from PlantVillage (an open database of images published in 2016), while the mango leaf images were self-acquired, captured in a real-time environment at one of the field locations of GBPUAT.
The images in the dataset are grouped into 4 classes to differentiate between healthy and diseased leaves: 2 classes represent diseased leaves and the other 2 represent healthy leaves. Only RGB (Red, Green, Blue) images are used in this work. The steps of the methodology are shown in Figure 1.
Figure 1: Methodology Utilized
A. Pre-processing
Our dataset contains images in different formats with varying resolution and quality, as some images were downloaded from PlantVillage and others were captured with a camera. We resize the images to 150×150 for the CNN and 227×227 for the AlexNet architecture to obtain better feature extraction, reduce training time and ensure consistency. The pre-processed images of potato and mango are shown in Figure 2 and Figure 3.
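The resizing step, together with two transforms of the kind listed under data augmentation (rotation, and a brightness change as an intensity transformation), can be illustrated with a minimal pure-Python sketch. It assumes an image is a nested list of RGB tuples and uses nearest-neighbour sampling; the paper does not say which interpolation method or library was used, and the function names here are hypothetical. A real pipeline would normally rely on an image library such as Pillow.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel 0..255
Image = List[List[Pixel]]      # image[row][col]

# Target sizes from the text: 150x150 for the CNN, 227x227 for AlexNet.
TARGET_SIZE = {"cnn": (150, 150), "alexnet": (227, 227)}


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize (assumed; the paper names no method)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise (one form of image rotation)."""
    return [list(row) for row in zip(*img[::-1])]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every channel by `factor`, clipping the result to 0..255."""
    def clip(v: float) -> int:
        return max(0, min(255, int(round(v))))
    return [[tuple(clip(ch * factor) for ch in px) for px in row] for row in img]


if __name__ == "__main__":
    # Synthetic 300x400 image standing in for a camera photo of a leaf.
    photo = [[(r % 256, c % 256, 128) for c in range(400)] for r in range(300)]
    for model, (h, w) in TARGET_SIZE.items():
        resized = resize_nearest(photo, h, w)
        print(model, len(resized), "x", len(resized[0]))   # cnn 150 x 150, alexnet 227 x 227
    rotated = rotate_90(photo)
    print("rotated:", len(rotated), "x", len(rotated[0]))  # rotated: 400 x 300
    print("brighter pixel:", adjust_brightness(photo, 1.2)[0][0])
```

Nearest-neighbour sampling keeps the sketch short; smoother resampling filters generally preserve fine lesion texture better when images are shrunk.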
B. Data Augmentation
Data augmentation is performed on the images to increase the size of the dataset. It uses various transformation techniques such as affine transformation, perspective transformation, image rotation and intensity transformation (color, brightness and contrast) [14]. Finally, a database containing 4004 images was created; example augmented images are shown in Figure 4 & Figure 5.
Figure 2: An example from PlantVillage potato images: diseased leaf (early blight) and healthy leaf
Figure 3: An example from our own mango dataset: diseased leaf (anthracnose) and healthy leaf
Figure 4: (a) Potato augmented images
Figure 5: (b) Mango augmented images
C. Classification
Initially, the entire database was split into three sets: the training set, the validation set and the test set. The 4004 images were divided randomly so that 80% of them formed the training set and 20% formed the test and validation sets. The validation set is kept separate from the test set but is considered part of the training data; it is generally used to select parameters and to reduce overfitting [15].
Previous work observed that, for classification, using grayscale or black & white images, or segmenting the leaves from the image background, does not increase accuracy; hence this step is not included in our classification process [16]. To detect disease and compare the accuracy of the models, we used transfer learning, which reuses pre-existing neural networks. After modification and retraining with our own training images, the considered models give more precise outcomes.
All experiments were performed on a 64-bit operating system with an Intel Core i3-6006U CPU, 4 GB RAM and a 500 GB hard disk.
IV. IMPLEMENTATION
We used two standard deep learning architectures, a CNN and AlexNet, to detect disease in potato and mango leaves.
Convolutional Neural Network (CNN): The CNN architecture consists of convolutional (conv) layers, ReLU (rectified linear unit) layers, pooling layers and a pair of fully connected (FC) layers.
Convolution (Conv) Layer: The first layer of the CNN architecture, where most of the computation is performed. Conv layers act as feature extractors: each consists of a collection of feature maps [17] that capture feature descriptions of the input image. The conv layer performs the convolution operation together with ReLU and is followed by a pooling layer. The CNN input has the shape (batch size, channels, height, width), where channels = 3 for RGB and 1 for greyscale.
ReLU: The ReLU layer is also known as the activation layer. It is applied after every convolution layer to introduce non-linearity into the network. ReLU changes all negative values to 0 and reduces the vanishing gradient problem, which makes training faster.
Pooling Layer: The pooling layer decreases the spatial resolution of the feature maps; within a feature map, the neurons share their weights. Average pooling, max pooling, multiscale orderless pooling and stochastic pooling are common pooling operations.
Flatten Layer: The flatten layer sits between the convolutional and fully connected layers and transforms 2-dimensional data into a single feature vector.
Fully Connected Layer: In this layer, every neuron is connected to every neuron of the previous layer. The fully connected layer takes the feature vector as input and uses it to classify the input image with the softmax function.
An image of size 150×150 is taken as input to the first convolution layer, which uses ReLU and 3×3 filters; the pooling layer then down-samples with 2×2 filters, which reduces the feature map size. The output of this layer is the input to the second convolution layer, which convolves it with 3×3 filters, as shown in Figure 5. The feature maps are processed further and passed to the fully connected layers, which produce the output. The main parameters of the convolutional network are described below:
1. Depth: Depth is the number of filters used in the convolution, which sets the number of neurons (feature maps) in the output layer.
2. Stride: Stride is the number of pixels by which the filter matrix is moved across the input image at each step.
3. Feature map: A feature map is the output of one kernel applied to the previous layer. The filter is shifted across the image one pixel (one stride) at a time, activating the neurons at each position, and the output is stored in the feature map.
4. Zero-padding: Zero-padding adds zeros around the border of the input matrix. It is used when the spatial size of the input volume needs to be preserved in the output volume.
Figure 5. Convolutional Neural Network
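How kernel size, stride and zero-padding set the size of each feature map follows the standard relation out = floor((n - k + 2p) / s) + 1 for an n×n input, a k×k kernel, padding p and stride s. The sketch below traces the CNN path described above under assumptions the text does not state (two conv/pool stages, stride 1 and no padding for the 3×3 convolutions, non-overlapping 2×2 max pooling), and the helper names are hypothetical. It also shows ReLU, 2×2 max pooling and flattening on a tiny single-channel feature map, and checks the relation against AlexNet's first convolution (227×227 input, 11×11 kernels, stride 4), which gives the familiar 55×55 output.

```python
from typing import List

Matrix = List[List[float]]


def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a conv or pooling window: (n - k + 2p) // s + 1."""
    return (n - k + 2 * padding) // stride + 1


def relu(fmap: Matrix) -> Matrix:
    """Replace every negative activation with 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling (stride 2); an odd last row/column is dropped."""
    h, w = len(fmap) // 2 * 2, len(fmap[0]) // 2 * 2
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def flatten(fmap: Matrix) -> List[float]:
    """Turn a 2-D feature map into a single feature vector."""
    return [v for row in fmap for v in row]


def trace_cnn(n: int = 150) -> List[int]:
    """Assumed path: conv3x3 -> pool2x2 -> conv3x3 -> pool2x2."""
    sizes = [n]
    n = conv_output_size(n, 3)             # conv1: stride 1, no padding (assumed)
    sizes.append(n)
    n = conv_output_size(n, 2, stride=2)   # pool1
    sizes.append(n)
    n = conv_output_size(n, 3)             # conv2
    sizes.append(n)
    n = conv_output_size(n, 2, stride=2)   # pool2
    sizes.append(n)
    return sizes


if __name__ == "__main__":
    print(trace_cnn())                          # [150, 148, 74, 72, 36]
    print(conv_output_size(227, 11, stride=4))  # 55 (AlexNet conv1)
    fmap = [[-1.0, 2.0, 0.5, -3.0],
            [4.0, -2.0, 1.0, 0.0],
            [0.0, 1.5, -0.5, 2.5],
            [-4.0, 3.0, 1.0, -1.0]]
    pooled = max_pool_2x2(relu(fmap))
    print(pooled, flatten(pooled))              # [[4.0, 1.0], [3.0, 2.5]] [4.0, 1.0, 3.0, 2.5]
```

Depth does not enter the formula: the number of filters only sets how many such feature maps (output channels) a layer produces.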
AlexNet Architecture: The AlexNet architecture was developed by Alex Krizhevsky et al. [18] and won the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) in 2012. AlexNet consists of 5 convolutional (conv) layers and 3 pooling (Pool) layers, followed by three fully connected (FC) layers. To reduce overfitting, these fully connected layers are used with dropout layers. The convolution layers convolve the image with a number of filters, generating feature maps. A rectified linear unit (ReLU) layer is used along with each convolutional layer, as it performs a non-linear operation and converts all negative values to zero. The task of the pooling layer is to reduce the spatial dimension of the feature maps derived from the previous layer.
In this architecture, the first layer (conv1) performs convolution together with ReLU and max pooling, using 96 filters of size 11×11 on an input image of size 227×227 with a stride of 4 pixels. ReLU, the activation function, is applied to the outputs of the convolution layers and the fully connected layers, because it performs a non-linear operation and changes all negative activation values to 0. The output of conv1 is filtered by conv2, which has 256 kernels of size 5×5. The conv3 and conv4 layers have 384 kernels of size 3×3, whereas the fifth convolutional layer has 256 kernels of size 3×3. The AlexNet architecture is illustrated in Figure 6.
Figure 6: AlexNet Architecture
Table-1 shows the hyperparameters used in both the CNN and AlexNet architectures, namely the solver type (SGD), base learning rate, momentum and batch size. In DL, hyperparameters are the variables that describe the structure of the network and how it is trained.
Solver type: The SGD (stochastic gradient descent) algorithm is used as the solver in both architectures. Its aim is to minimize an objective function defined as a sum of functions [19].
Learning rate: The learning rate is the step size used to update the weights during training. Selecting an appropriate value is essential: if the value is too large, training may diverge, and if it is too small, training will take a long time to converge.
Momentum: Momentum is used with the SGD algorithm to move the weights along an average of past gradients instead of only the current gradient.
Batch size: The batch size is the number of images (samples) passed through the network at a time.
Hyperparameter       CNN     AlexNet
Solver type          SGD     SGD
Base learning rate   0.001   0.001
Momentum             0.9     0.9
Batch size           32      32
Table-1 Hyperparameters used for training experiment
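To make the Table-1 settings concrete, the sketch below runs mini-batch SGD with momentum (learning rate 0.001, momentum 0.9, batch size 32) on a toy one-parameter least-squares problem instead of a CNN. The data, the model and the function names are hypothetical; the velocity form v = μv - η∇L, w = w + v is one common way of writing momentum, and frameworks differ slightly in how they formulate it.

```python
import random
from typing import Iterator, List, Tuple

# Hyperparameters from Table-1; the data and model below are a toy stand-in.
LEARNING_RATE = 0.001
MOMENTUM = 0.9
BATCH_SIZE = 32

Sample = Tuple[float, float]


def make_data(n: int = 1000, true_w: float = 3.0, seed: int = 0) -> List[Sample]:
    """Hypothetical pairs (x, y) with y = true_w * x plus small noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, true_w * x + rng.gauss(0.0, 0.01)))
    return data


def batches(data: List[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Yield consecutive mini-batches; the last one may be smaller."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]


def sgd_momentum(data: List[Sample], epochs: int = 50, seed: int = 0) -> float:
    """Fit y ~ w * x by minimising mean squared error with momentum SGD."""
    rng = random.Random(seed)
    samples = list(data)          # shuffle a copy, not the caller's list
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)      # new sample order every epoch
        for batch in batches(samples, BATCH_SIZE):
            # Gradient of mean((w*x - y)^2) with respect to w over the batch.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            # Velocity is a decaying accumulation of past gradient steps.
            velocity = MOMENTUM * velocity - LEARNING_RATE * grad
            w += velocity
    return w


if __name__ == "__main__":
    data = make_data()
    print("batches per epoch:", sum(1 for _ in batches(data, BATCH_SIZE)))  # 32
    print("fitted w:", round(sgd_momentum(data), 3))
```

With a constant learning rate, this velocity form produces the same weight sequence as the variant that accumulates raw gradients and then scales them by the learning rate; in both, momentum smooths the noisy mini-batch gradients.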
V. EXPERIMENTAL RESULTS
For training and validation we used 3523 images, and for testing 481 images. The training accuracy of the CNN is 93.06% and the training accuracy of AlexNet is 99.75% (Figure 7 and Figure 8). To evaluate the test classification performance of the models, confusion matrices are used (Figure 9 and Figure 10). A confusion matrix contains the counts of true positives, true negatives, false positives and false negatives; higher values on the diagonal indicate more correct predictions. Table-2 and Table-3 give the accuracy, precision and recall derived from the confusion matrices of the CNN and AlexNet architectures. The results show that the AlexNet architecture achieves the higher accuracy (98.33%) compared to the CNN (90.85%).
Figure 7: CNN graph
Figure 8: AlexNet graph
Figure 9: Confusion Matrix of CNN
Figure 10: Confusion matrix of AlexNet
Class      Precision   Recall
Class 0    0.97        0.73
Class 1    0.80        0.98
Class 2    0.97        0.92
Class 3    0.92        0.97
Accuracy (all classes): 90.85%
Table-2 Performance Metrics of CNN
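The per-class precision and recall and the overall accuracy reported in the tables can be derived from a multi-class confusion matrix as in the sketch below. It assumes rows are true classes and columns are predicted classes, and the 4×4 counts are made up for illustration; they are not the confusion matrices in Figure 9 or Figure 10.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for each class.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))   # column sum = TP + FP
        actual_k = sum(cm[k])                           # row sum = TP + FN
        metrics[k] = {
            "precision": tp / predicted_k if predicted_k else 0.0,
            "recall": tp / actual_k if actual_k else 0.0,
        }
    return metrics


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


if __name__ == "__main__":
    # Hypothetical counts for four classes (not the paper's data).
    cm = [
        [45, 3, 1, 1],
        [2, 48, 0, 0],
        [0, 1, 47, 2],
        [1, 0, 2, 47],
    ]
    for cls, m in per_class_metrics(cm).items():
        print(f"Class {cls}: precision={m['precision']:.2f} recall={m['recall']:.2f}")
    print(f"Accuracy: {accuracy(cm):.2%}")   # Accuracy: 93.50%
```

Precision divides the diagonal entry by a column sum and recall by a row sum, so transposing the matrix swaps the two metrics; it is worth checking which convention a given confusion-matrix plot uses.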

Class      Precision   Recall
Class 0    1.00        0.94
Class 1    0.94        0.99
Class 2    0.99        1.00
Class 3    1.00        1.00
Accuracy (all classes): 98.33%
Table-3 Performance Metrics of AlexNet
After classifying and comparing the deep learning based models, the results of this evaluation (Figure 9 & Figure 10) clearly show that accuracy can be increased by using the AlexNet architecture. AlexNet achieves the highest accuracy, 98.33%, while the CNN architecture achieves 90.85% (Table-2 & Table-3). AlexNet takes much more time to train (average 50 min) than the CNN (average 20 min); after training, both models are able to predict whether a leaf is diseased or healthy.
The AlexNet deep learning model is an effective means of classification. The results show that the model performs well on the dataset and can be used as a classifier for potato and mango leaf diseases. In the experiment, the model stabilized at around 15 epochs, and the measurements over the last 12 epochs show no significant improvement. The implementation also has minimal hardware requirements, as the models can be trained on a CPU without additional hardware, unlike most neural networks, which need high computational resources such as a GPU to speed up training. This is due to the smaller number of training parameters, as both the CNN and AlexNet models use small filter sizes and small training images. Therefore, these models give an easy and efficient way to detect plant disease with comparable outcomes.
CONCLUSION
This paper presents a comparative study of disease classification with the CNN and AlexNet architectures on potato and mango leaves. The results show that the AlexNet architecture has higher precision and higher recall than the CNN architecture. Precision is the fraction of predicted positives that are actually positive, while recall is the fraction of actual positives that are correctly predicted; high scores for both show that the classifier returns accurate results. The AlexNet architecture takes longer to train because it has more layers than the CNN architecture, but it gives better results in classifying diseased leaves against healthy leaves. Future work will be the development of a mobile application that detects the disease and helps farmers: farmers will take pictures of diseased leaves, and the application will predict the disease and show them the solution for it. This will be very useful for farmers, as it will be effective as well as less time consuming for large fields. During this work it was also realized that DL architectures can distinguish important features from insignificant ones in a set of images.
REFERENCES
[1] Arpita Patel, Barkha Joshi, "A Survey on the Plant Leaf Disease Detection Techniques", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 6, Issue 1, ISO 3297:2007.
[2] Nilay Ganatra, Atul Patel, "A Survey on Diseases Detection and Classification of Agriculture Products using Image Processing and Machine Learning", International Journal of Computer Applications (0975 – 8887), Vol. 180, No. 13, January 2018.
[3] Andreas Kamilaris, Francesc X. Prenafeta-Boldú, "Deep Learning in Agriculture: A Survey", Computers and Electronics in Agriculture, April 201
[4] R. Vargas, A. Mosavi, L. Ruiz, "Deep Learning: A Review", Advances in Intelligent Systems and Computing, August 2017.
[5] Ahmed Ali Mohammed Al-Saffar, Hai Tao, Mohammed Ahmed Talab, "Review of Deep Convolution Neural Network in Image Classification", International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications, 2017.
[6] Erika Fujita, Yusuke Kawasaki, Hiroyuki Uga, Satoshi Kagiwada, Hitoshi Iyatomi, "Basic Investigation on a Robust and Practical Plant Diagnosis System", 15th IEEE International Conference on Machine Learning and Applications, 2016.
[7] Mohammed Brahimi, Kamel Boukhalfa, Abdelouahab Moussaoui, "Deep Learning for Tomato Diseases: Classification and Symptoms Visualization", Applied Artificial Intelligence: An International Journal, ISSN: 0883-9514 (Print), 1087-6545 (Online), 2017.
[8] Yang Lu, Shujuan Yi, Nianyin Zeng, Yurong Li, Yong Zhang, "Identification of rice diseases using deep convolutional neural networks", Neurocomputing 267 (2017) 378–384.
[9] Aravind Krishnaswamy Rangarajan, Raja Purushothaman, Aniirudh Ramesh, "Tomato crop disease classification using pre-trained deep learning algorithm", Elsevier, International Conference on Robotics and Smart Manufacturing (RoSMa2018).
[10] Prajwala TM, Alla Pranathi, Kandiraju Sai Ashritha, Nagaratna B. Chittaragi, Shashidhar G. Koolagudi, "Tomato Leaf Disease Detection using Convolutional Neural Networks", Proceedings of 2018 Eleventh International Conference on Contemporary Computing (IC3), 2-4 August 2018.
[11] Waseem Rawat, Zenghui Wang, "Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review", unpublished.
[12] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", International Conference on Neural Information Processing Systems - Volume 1, June 2016.
[13] Md Zahangir Alom, Tarek M. Taha, Chris Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Brian C. Van Essen, Abdul A. S. Awwal, Vijayan K. Asari, "The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches", unpublished.
[14] Neena Aloysius, Geetha M, "A Review on Deep Convolutional Neural Networks", International Conference on Communication and Signal Processing, April 6-8, 2017.
[15] S. Arivazhagan, S. Vineth Ligi, "Mango Leaf Diseases Identification Using Convolutional Neural Network", International Journal of Pure and Applied Mathematics, Vol. 120, No. 6, 11067-11079, ISSN: 1314-3395, August 2018.
[16] Myeongsuk Pak, Sanghoon Kim, "A Review of Deep Learning in Image Recognition", unpublished.
[17] Neena Aloysius, Geetha M, "A Review on Deep Convolutional Neural Networks", International Conference on Communication and Signal Processing, April 6-8, 2017.
[18] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", International Conference on Neural Information Processing Systems - Volume 1, June 2016.
[19] Waseem Rawat, Zenghui Wang, "Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review", unpublished.
