
Disease Detection on the Leaves of the Tomato Plants by Using Deep Learning


Halil Durmuş, Ece Olcay Güneş, Mürvet Kırcı
Department of Electronics and Communication Engineering
Istanbul Technical University, Electrical and Electronics
Engineering Faculty, Maslak, Istanbul, Turkey
durmushalil@gmail.com, ece.gune@itu.edu.tr, ucerm@itu.edu.tr

Abstract—The aim of this work is to detect the diseases that occur on plants in tomato fields or in their greenhouses. For this purpose, deep learning was used to detect the various diseases on the leaves of tomato plants. In the study, it was aimed that the deep learning algorithm should run in real time on the robot, so that the robot will be able to detect the diseases of the plants while wandering manually or autonomously in the field or in the greenhouse. Likewise, diseases can also be detected from close-up photographs taken from the plants by sensors built into fabricated greenhouses. The diseases examined in this study cause physical changes in the leaves of the tomato plant, and these changes can be seen with RGB cameras. In previous studies, standard feature extraction methods on plant leaf images have been used to detect diseases. In this study, deep learning methods were used instead. Deep learning architecture selection was the key issue for the implementation, so two different deep learning network architectures were tested: first AlexNet and then SqueezeNet. For both of these deep learning networks, training and validation were done on the Nvidia Jetson TX1. Tomato leaf images from the PlantVillage dataset have been used for the training. Ten different classes, including healthy images, are used. The trained networks were also tested on images from the internet.

Keywords—precision farming; deep learning; plant diseases; mobile computing

I. INTRODUCTION

Tomato is one of the most produced crops around the world. According to statistics obtained from the Food and Agriculture Organization of the United Nations, approximately 170,750 kilotons of tomato were produced worldwide in 2014 [1]. According to the Turkish Statistical Institute, Turkey produced 12,600 kilotons of tomato in 2016 [2]. These production quantities are affected by the pests and diseases that occur in tomato plants. To prevent these diseases and pests, costly methods and various pesticides are used in agriculture. The widespread use of these chemical methods harms plant and human health and also affects the environment negatively. These methods also increase production costs.

The diseases and pests affect the leaflets and leaves, the roots, the stems, and the fruits of tomato plants [3]. Phenological changes on the leaves and leaflets of tomato plants can be abnormal growth, discoloration, spots, damage, wilting, desiccation, and necrosis [3]. In this study, diseases and pests affecting leaves and leaflets were examined. Since the PlantVillage [4] dataset is used in this work, only the diseases included in the dataset are considered. These are bacterial spot, early blight, late blight, leaf mold, septoria leaf spot, spider mites, target spot, mosaic virus, and yellow leaf curl virus. Healthy images are also included.

Precision farming can be used to fight against these diseases and pests, and by using it the need for these chemical or costly methods can be reduced. In precision farming, information technologies such as sensor networks, remote sensing, and robotics are used in agricultural fields. For precise agricultural applications, such as spraying medicine only on the affected area, it is necessary to determine the region where the plant diseases occur and spread. Operators, static stations, sensor networks, drones, and mobile robots are used for detection in precision farming. The biggest disadvantage of these tools is that they cannot inspect the field like an expert. In order to do precision farming, these tools must be able to process and make inferences from the collected data like an expert in the fields or in the greenhouses.

Image processing is widely used in precision agriculture on the images taken by remote sensing devices or field instruments. In the literature, image processing has been used widely in agriculture, for example for weed detection and fruit grading [5], for detecting, quantifying, and classifying plant diseases [6], and for phenotyping plant disease symptoms [7]. Recently, deep learning has been used for such detection [8]. Mohanty et al. utilized deep learning to detect diseases from the leaves of various plants [8].

Deep learning is the state-of-the-art machine learning method that utilizes artificial neural networks (ANNs) with hidden layers. Before the deep learning trend, classification tasks were done by using semantic features such as corners, edges, and shapes. Afterwards, these features are used in various classifiers such as support vector machines or k-nearest neighbors.

In this work, deep learning is used to detect diseases from the leaves of tomato plants. Two different deep learning network architectures, AlexNet [9] and SqueezeNet [10], are trained and tested on the tomato images of the PlantVillage dataset. Both training and testing are done on the mobile supercomputer Nvidia Jetson TX1.
II. MOTIVATION

The motivation of this work is to detect plant diseases using robot platforms. The diseases and pests that are to be detected harm the leaves of the plants. These harmful effects change the physical appearance of the leaf, so the cause of the harm can be detected from images taken with cameras. In this case, a mobile computer and a standard RGB camera are needed for disease detection on a robot. Since the disease detection is done on the robot, the mobile computer must be powerful enough for the computational task. The mobile computer also has additional tasks such as autonomous navigation, driving motors, and image processing. That is why the detection code must be lightweight in terms of computational cost and RAM size.
The recent machine learning trend, deep learning, achieves high accuracy in classification tasks, and applications of deep learning methods have started to develop in interdisciplinary areas. Mohanty et al. trained deep learning models on the PlantVillage dataset [8]. Their study was promising in terms of mobile computing. In their study, they used the whole PlantVillage dataset and tested AlexNet and GoogleNet [11]. The AlexNet model size is 227.6 Mbyte and the GoogleNet model size is 51.1 Mbyte; for mobile applications, both of these networks are relatively large. Updating the network will be an issue, and the RAM requirements and computational cost of both networks are also relatively high. In this work, the recently introduced SqueezeNet is used. The model size of SqueezeNet is 2.9 Mbyte while the accuracy remains almost the same.
III. METHOD

A. Deep Learning

As LeCun et al. defined in their survey, deep learning is a representation learning method [12]. Here, representation learning means that the algorithm finds the best way to represent the data. The algorithm finds this representation through optimization instead of semantic features. With this learning procedure, there is no need to do feature engineering, because the features are extracted automatically.

Mathematically, deep learning is an artificial neural network with hidden layers. In Fig. 1a, a fully connected neural network with two hidden layers is shown, and in Fig. 1b, the schematic representation of a perceptron is shown. Perceptrons are inspired by living neuron cells. Each perceptron has multiple inputs (in Fig. 1b, only five inputs are shown) and an activation function. The mathematical formulation of the neuron is given in (1). The activation function makes the response of the neuron non-linear; without activation functions, the network is only a linear combination of its inputs. There are many activation functions in the literature, such as sigmoid, tanh, and ReLU. ReLU is one of the most used activation functions and has been found quicker to train [9].

Fig. 1. a) Deep neural network, b) Schematic representation of a perceptron.

y = f(x_1 w_1 + x_2 w_2 + x_3 w_3 + x_4 w_4 + x_5 w_5)    (1)
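As a concrete illustration of Eq. (1), here is a minimal NumPy sketch that evaluates a single perceptron with five inputs and a ReLU activation; the input and weight values (and the optional bias) are invented for the example and are not taken from the trained networks.

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), applied element-wise
    return np.maximum(0.0, z)

def perceptron(x, w, b=0.0, activation=relu):
    # Weighted sum of the inputs followed by a non-linear activation,
    # i.e. y = f(x1*w1 + ... + x5*w5) as in Eq. (1)
    return activation(np.dot(x, w) + b)

# Five example inputs and five example weights, as in Fig. 1b
x = np.array([0.2, -1.0, 0.5, 0.3, 0.8])
w = np.array([0.1, 0.4, -0.6, 0.9, 0.2])
print(perceptron(x, w))
```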
In Fig. 1a, each circle is a perceptron. At the beginning of training, the weights of each perceptron are initialized; the initialization can, for example, be sampled from a Gaussian, together with a bias. In every iteration step, all training data passes through the network. Since this is supervised learning, a loss is calculated between the ground truth and the output of the network. The loss is the input of the optimization algorithm, which updates the weights according to this loss. One of the most used optimization algorithms is Stochastic Gradient Descent (SGD). Briefly, SGD minimizes the loss through iterations by updating the weights according to the gradient.
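The loop below is a minimal, self-contained illustration of the SGD update just described, using a toy linear neuron with a squared-error loss; it is not the Caffe solver used later in the paper, and all values are synthetic.

```python
import numpy as np

# Toy supervised problem: recover w_true from (input, ground truth) pairs
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # toy training data
w_true = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
y = X @ w_true                              # ground-truth outputs

w = np.zeros(5)                             # weights initialized (here: zeros)
lr = 0.01                                   # learning rate
for epoch in range(50):                     # each epoch passes over all data
    for xi, yi in zip(X, y):
        pred = xi @ w                       # forward pass
        grad = 2.0 * (pred - yi) * xi       # gradient of (pred - yi)^2 w.r.t. w
        w -= lr * grad                      # SGD step against the gradient
print(np.round(w, 2))                       # approaches w_true
```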
For image-like, high-dimensional and relatively big data, there is a specialized network architecture called the convolutional neural network (CNN). CNNs were used for the first time to detect handwritten digits in documents [13]. CNNs consist of convolutional layers, pooling layers, activation function layers, dropout layers, and fully connected layers. Convolutional layers hold the results of the convolution of filters, or kernels, with the previous layer. These filters consist of weights and biases to be learned; the purpose of the optimization is to generate kernels that represent the data with as little error as possible. Pooling layers are used for down-sampling, to lower the neuron count and reduce overfitting. The most used pooling type is max pooling, which takes the maximum value in the pooling window. Activation function layers add non-linearity to the network; the most used activation function is ReLU. Dropout layers are used to prevent overfitting by randomly shutting down neurons in the network. Fully connected layers are used to calculate the class probabilities or scores, and their result can be the input of the classifier. A well-known and widely used classifier is the softmax classifier.

In Fig. 2, a well-known CNN called AlexNet is shown [9]. There are many kernels for each layer, and every kernel is initialized randomly at the beginning of training. The kernels are learned through the optimization. The depth dimension of the convolutional layers is decided by the number of kernels in the previous layer. This means each kernel maps the previous layer to a new space through the convolution.

Fig. 2. Convolutional neural network.
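To make the two core operations concrete, the following toy NumPy sketch convolves one randomly initialized kernel with a single-channel patch, applies ReLU, and max-pools the result; real CNN layers operate on many channels and kernels at once, so this is only a didactic simplification.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid 2-D convolution (cross-correlation, as in most CNN libraries)
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fm, size=2):
    # Non-overlapping max pooling: keep the maximum of each size x size window
    H, W = fm.shape
    H, W = H - H % size, W - W % size
    return fm[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

image = np.random.rand(8, 8)                          # a toy single-channel patch
kernel = np.random.randn(3, 3)                        # one randomly initialized kernel
activation = np.maximum(0.0, conv2d(image, kernel))   # convolution + ReLU
pooled = max_pool(activation)                         # down-sampled feature map
print(activation.shape, pooled.shape)                 # (6, 6) -> (3, 3)
```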
There are many different network architectures. In this work, AlexNet and SqueezeNet are used for the disease detection. In the remainder of this section, these two networks are described.
1) AlexNet
In 2012, Krizhevsky et al. entered the image classification competition called ImageNet with their network architecture called AlexNet, and they won with great distinction. AlexNet became the starting point of the new CNN trend. In AlexNet, they used ReLU, local response normalization, and overlapping pooling, and they trained the network on multiple GPUs. In the following years, the winners of ImageNet have used deep learning. One of the revolutionary points of AlexNet is training on the GPU. GPUs with many cores have stepped up training speeds on very big data; for instance, ImageNet contained 1.2 million high resolution images in 2012.

The AlexNet architecture can be seen in Fig. 3, on the left. AlexNet has five convolutional layers, each followed by a ReLU layer. Normalization layers are added to help generalization [9]. The features at convolutional layer 5 are fed to the fully connected network after pooling. As described previously, the fully connected layers calculate the class probabilities, and they contain dropout layers to reduce overfitting. The last fully connected layer (FC8) produces the class scores of the input image, and these scores are classified by the softmax classifier.
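As a small illustration of that last step, the snippet below applies a softmax to a hypothetical vector of ten class scores, the role FC8 plays for the ten tomato classes; the score values are invented for the example.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(scores - np.max(scores))
    return e / np.sum(e)

# Hypothetical raw scores from a 10-way final layer (one per tomato class)
fc8_scores = np.array([2.1, 0.3, -1.2, 0.0, 4.5, 1.1, -0.7, 0.2, 0.9, -2.0])
probs = softmax(fc8_scores)
print(probs.argmax(), round(probs.max(), 3))   # index and probability of top class
```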
2) SqueezeNet
After the publication of AlexNet, many CNN architectures have been proposed. The primary aim of most of these methods is increasing accuracy. Apart from these studies, compression of deep learning networks and models has gained importance, because some of the biggest deep learning applications may be on mobile devices such as phones, cameras, and autonomous devices. One of the aims of this work is detecting diseases on mobile devices or computers. Therefore, the first aim of this work was searching for a network architecture that is lightweight in terms of model size. For this purpose, SqueezeNet has been selected.

SqueezeNet was proposed by Iandola et al. [10] and is a good display of network architecture engineering. To reduce the model size, SqueezeNet is built on three design strategies: reducing filter sizes, decreasing the number of input channels, and down-sampling late in the network [10].
SqueezeNet employs a new fire module. The fire module consists of a squeeze layer with 1x1 filters (decreasing the input channels of the 3x3 filters) and an expand layer with a combination of 1x1 and 3x3 filters (reducing the filter size). The squeeze and expand layers are followed by ReLU layers.

The SqueezeNet v1.1 architecture is shown in Fig. 3, on the right. In SqueezeNet v1.1, the numbers of kernels in the fire modules and convolutional layers are lower than in the original publication. As seen from Fig. 3, the network starts with a convolutional layer, followed by eight fire modules, followed by a convolutional layer and the softmax classifier.

Fig. 3. Visualization of AlexNet (left) and SqueezeNet v1.1 (right).
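The following NumPy sketch mimics the structure of a fire module with random weights and a toy input; the chosen filter counts (16 squeeze, 64 + 64 expand) are illustrative, and the code is not the actual SqueezeNet implementation.

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution == per-pixel linear map over channels
    # x: (C_in, H, W), w: (C_out, C_in)  ->  (C_out, H, W)
    return np.einsum('oc,chw->ohw', w, x)

def conv3x3(x, w):
    # 3x3 convolution with zero padding so the spatial size is preserved
    # x: (C_in, H, W), w: (C_out, C_in, 3, 3)
    C_in, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], H, W))
    for i in range(H):
        for j in range(W):
            out[:, i, j] = np.tensordot(w, xp[:, i:i + 3, j:j + 3], axes=3)
    return out

def fire_module(x, s1x1, e1x1, e3x3):
    relu = lambda z: np.maximum(0.0, z)
    C_in = x.shape[0]
    # Squeeze: 1x1 filters reduce the channels fed to the 3x3 filters
    squeezed = relu(conv1x1(x, np.random.randn(s1x1, C_in)))
    # Expand: a mix of 1x1 and 3x3 filters, concatenated along the channel axis
    out1 = relu(conv1x1(squeezed, np.random.randn(e1x1, s1x1)))
    out3 = relu(conv3x3(squeezed, np.random.randn(e3x3, s1x1, 3, 3)))
    return np.concatenate([out1, out3], axis=0)

x = np.random.rand(64, 16, 16)                              # toy input: 64 channels, 16x16
print(fire_module(x, s1x1=16, e1x1=64, e3x3=64).shape)      # -> (128, 16, 16)
```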
B. Dataset

The PlantVillage dataset is used in this work. The PlantVillage dataset contains 54,309 labelled images of 14 different crops. In this work, only images of tomato leaves are used. There are ten different classes for the tomato images, including healthy ones. Sample images from the dataset are shown in Fig. 4. Mohanty et al. tried different dataset combinations in their work [8]; there, AlexNet reached an accuracy of 0.9722 when trained from scratch on segmented images with an 80% train / 20% test set division. The reason for the selection of segmented images here is that there will be upcoming work on extracting leaf images from complex backgrounds; in that work, the extracted leaf images won't have any background.

Fig. 4. Sample images from the PlantVillage dataset (first row, left to right: bacterial spot, early blight, healthy, late blight, leaf mold; second row, left to right: septoria leaf spot, spider mites, target spot, mosaic virus, yellow leaf curl virus).
C. Hardware and Software Environment

Training and testing are done on the Nvidia Jetson TX1. The Nvidia Jetson TX1 has 256 CUDA cores, a quad-core ARM processor, 4 GB RAM, 16 GB eMMC, and other peripherals. The board is shown in Fig. 5, and it runs Nvidia's Ubuntu distribution.

The convolutional neural networks are trained with the deep learning framework Caffe [14]. Caffe is written in C++ and has Python bindings for ease of use. In Caffe, the network architecture is defined in proto files, and these files are trained or tested on the data with prebuilt programs. The dataset is provided as an lmdb file, and the training parameters are set in the solver file. The AlexNet architecture is taken from the Caffe library, and SqueezeNet v1.1 is downloaded from the authors' GitHub repository. Usually, training is done on workstations with GPUs or on GPU clusters. In this work, training is done on the board, and this constraint brings limitations to training: training takes longer and training batch sizes are smaller. Using a small batch size decreased the accuracy, as will be shown in the results.

Fig. 5. Nvidia Jetson TX1.
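A rough sketch of how a single forward pass can be run and timed from the Caffe Python bindings is shown below; the file names and the 'data' input blob name are assumptions based on typical Caffe deploy files, not artifacts distributed with this work.

```python
import time
import numpy as np
import caffe

caffe.set_mode_gpu()                        # run on the Jetson TX1 GPU

# Hypothetical file names: a deploy prototxt and the trained weights
net = caffe.Net('deploy.prototxt', 'tomato_squeezenet.caffemodel', caffe.TEST)

# Fill the input blob with a preprocessed image (dummy data here)
net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)

start = time.time()
out = net.forward()                         # forward pass only, as in the timing results
elapsed_ms = (time.time() - start) * 1000.0

probs = out[list(out.keys())[0]].flatten()
print('top class: %d, inference time: %.1f ms' % (probs.argmax(), elapsed_ms))
```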
IV. RESULTS

The trained models are tested on the validation set using the GPU. The results can be seen in TABLE I. The accuracy values are the results from the Caffe tests. AlexNet performed slightly better than SqueezeNet. The reason the results of [8] are not reached is that in the training phase the batch size was set to 20 due to the limited RAM, which lowered the accuracy.

It is clearly seen that the SqueezeNet model is almost 80 times smaller than AlexNet. The sizes of the models are taken directly from the Caffe model files. In the SqueezeNet paper, it is stated that the model size is lower than 0.5 Mbyte [10]; the cause of this difference can be the Caffe format.

The inference time results are taken from the Python API. The network set-up time is not included in the inference time; only the forward pass is reported in the table. The inference time varies by about 5 milliseconds between different tests.

TABLE I. Comparison between AlexNet and SqueezeNet v1.1.

Network Name            AlexNet        SqueezeNet
Accuracy on Test Set    0.9565         0.943
Model Size              227.6 Mbyte    2.9 Mbyte
Inference Time          ~150 ms        ~50 ms
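As a quick arithmetic check of the "almost 80 times" figure, using the model sizes from TABLE I:

```python
# Model sizes from TABLE I (Caffe model files)
alexnet_mb, squeezenet_mb = 227.6, 2.9
print(round(alexnet_mb / squeezenet_mb, 1))   # 78.5, i.e. almost 80 times smaller
```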
V. CONCLUSION

It is shown that SqueezeNet is a good candidate for mobile deep learning classification due to its lightweight model and low computational needs. Another advantage of using a smaller network is updating the model: when the mobile application is updated over mobile communication, it will cost less data and the update will be faster.

On-board training brings the capability of training in the field. If a new disease is detected, or there are new images for the already trained diseases, the deep learning model can be retrained in the field.

The board shown in Fig. 5 was used as the brain of the robot in previous work [15]. In that study, the board was used to gather data and generate an RGB-D map of the greenhouses. At that point, the board had navigation, control, and data acquisition tasks. With this work, real-time disease detection capability is added to the robot. In the future, there will be a study on leaf extraction from complex backgrounds to complete the system.

ACKNOWLEDGMENT

This research was funded by the T.R. Ministry of Food, Agriculture and Livestock, and the ITU TARBIL Environmental Agriculture Informatics Applied Research Center.

REFERENCES

[1] "Food and Agriculture Organization of The United Nations," [Online]. Available: http://www.fao.org/home/en/. [Accessed 2017].
[2] "Turkish Statistical Institute," [Online]. Available: http://www.turkstat.gov.tr/Start.do. [Accessed 2017].
[3] D. Blancard, Compendium of Tomato Diseases and Pests, Second ed., Manson Publishing Ltd, 2012.
[4] D. P. Hughes and M. Salathé, "An Open Access Repository of Images on Plant Health to Enable the Development of Mobile Disease Diagnostics," eprint arXiv:1511.08060, 2015.
[5] A. Vibhute and S. K. Bodhe, "Applications of Image Processing in Agriculture: A Survey," International Journal of Computer Applications, vol. 52, no. 2, pp. 34-40, 2012.
[6] J. G. B. Garcia, "Digital Image Processing Techniques for Detecting, Quantifying and Classifying Plant Diseases," SpringerPlus, 2013.
[7] A. M. Mutka and R. S. Bart, "Image-Based Phenotyping of Plant Disease Symptoms," Frontiers in Plant Science, vol. 5, pp. 1-8, 2015.
[8] S. P. Mohanty, D. Hughes and M. Salathé, "Using Deep Learning for Image-Based Plant Disease Detection," eprint arXiv:1604.03169, 2016.
[9] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, 2012.
[10] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally and K. Keutzer, "SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size," eprint arXiv:1602.07360v4, pp. 1-13, 2016.
[11] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, "Going Deeper with Convolutions," eprint arXiv:1409.4842, pp. 1-12, 2014.
[12] Y. LeCun, Y. Bengio and G. Hinton, "Deep Learning," Nature, vol. 521, pp. 436-444, 2015.
[13] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[14] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, "Caffe: Convolutional Architecture for Fast Feature Embedding," eprint arXiv:1408.5093, pp. 1-4, 2014.
[15] H. Durmuş, E. O. Güneş and M. Kırcı, "Data Acquisition from Greenhouses by Using Autonomous Mobile Robot," in Agro-Geoinformatics 2016, Tianjin, 2016.
