Smote-DL: A Deep Learning Based Plant Disease Detection Method
Abstract: Over time, computer vision, machine learning and deep learning have been widely used to detect disease in plant leaves. Most work in this area focuses on building accurate models but not on the false predictions, which can be a serious cause for concern: misdiagnosis of a plant leaf could cause large-scale crop destruction. We used a publicly available dataset which contained four categories of images belonging to Apple Plant-

... most fundamental aspect in plant disease detection, which is reducing the number of false predictions that might lead to misdiagnosis, the result of which could be large-scale crop destruction. Our main contributions in this paper are:

1. We have done extensive data pre-processing and visualization to understand the data and then used SMOTE to handle our imbalanced dataset.
2021 6th International Conference for Convergence in Technology (I2CT) | 978-1-7281-8876-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/I2CT51068.2021.9417920
Authorized licensed use limited to: Linkoping University Library. Downloaded on June 20,2021 at 11:43:51 UTC from IEEE Xplore. Restrictions apply.
... identification methods used in agriculture, which are direct and indirect [9]. P. Moghadam et al. used hyperspectral imaging (VNIR and SWIR) and machine learning techniques for the detection of Tomato Spotted Wilt Virus (TSWV) in capsicum plants. They obtained an accuracy of 90% with their trained model [10]. Vijai Singh et al. detected the unhealthy region of plant leaves using image processing and a genetic algorithm. They used leaf samples in RGB format, applied image acquisition and pre-processing to each sample, then segmented the components using the genetic algorithm and obtained useful segments to classify the leaf disease. They performed the experiment in MATLAB [11]. S. Aasha Nandhini et al. provided a web enabled disease detection system (WEDDS) based on compressed sensing (CS) to detect and classify the disease in leaves. They uploaded the CS measurements of the segmented leaf to the cloud, retrieved them at the monitoring site and extracted the features from them. They did the analysis and classification using a support vector machine (SVM). They achieved an overall accuracy of 98.5% and a classification accuracy of 98.4% [12].

III. PROBLEM STATEMENT
From the above section it is evident that the use of computer vision to solve the problem of manual disease detection is increasing, which reduces human effort. We also saw a few works from recent years which claim high accuracy for their machine learning or deep learning models, yet none of them claims to perform multiple disease detection in a single leaf with high accuracy along with proper identification. There are variations in the symptoms of a disease due to the age difference of plants, the severity of the disease itself, genetic variation due to hybrid genetic modifications and the light intensity of the images, which cause the accuracy to fall very low when it comes to testing on actual images from a crop field. Due to this drop in accuracy, misdiagnosis is prone to occur, which could lead to crop damage on a large scale. To prevent this, the proposed deep learning or machine learning model should be able to predict a leaf with a single disease with high accuracy, and it should clearly distinguish the case of a single leaf with multiple diseases, which will increase the overall accuracy and reduce the false predictions. One of the main reasons for the lack of research into this is the lack of images having a single leaf with multiple diseases. Thus what is needed is a dataset of leaf images with multiple diseases along with a good deep learning model that has low false predictions and high accuracy.

IV. PROPOSED WORK

A. Data Collection
The image dataset [9] was gathered from the Kaggle competition "Plant Pathology 2020-FGVC7" [8], organized in April-May 2020. The dataset contains images of apple leaf diseases belonging to 4 categories: "healthy", "multiple_diseases", "scab" and "rust". "scab" and "rust" refer to the apple scab and apple rust diseases, and "multiple_diseases" refers to the images having multiple diseases. A few samples of the four categories of images can be seen in Fig 1, 2, 3 and 4. As explained in the problem statement section, the lack of a dataset of leaf images with multiple diseases is a major obstacle. With the new dataset that we have gathered, we can easily perform multiple disease detection in a single leaf using neural networks, which will now identify such leaves as a separate category and not detect them as any other disease.

Fig. 1. Samples of healthy leaf images
Fig. 2. Samples of multiple diseased leaf images
Fig. 3. Samples of Apple Scab diseased leaf images
Fig. 4. Samples of Apple Rust diseased leaf images

B. Data Pre-Processing and Visualization
Data pre-processing and visualization is a very important step before building machine learning and deep learning models. It gives us an insight into the data and shows the distribution of the data. We therefore looked into the data before going on to the model selection and training part. The dataset contains 3642 images of apple leaves divided into Train and Test, which belong to 4 categories: "scab" refers to the Apple Scab leaf disease, "rust" refers to the Apple Rust leaf disease, "multiple_diseases" refers to the apple leaf images having multiple diseases and "healthy" refers to the healthy apple leaf images.
Since we are dealing with a leaf disease detection dataset, it is interesting to observe the spread of the Red, Green and Blue channel values in the images to identify which channel is dominant in the healthy leaf and which is dominant in the diseased leaf. We first analyzed some random images from the training set to see the channel distribution and understand the intensity of channel values in diseased and healthy leaf images. Fig 5 shows the RGB values in a healthy leaf image, Fig 6 shows the RGB values of the diseased part of a leaf having only one disease and Fig 7 shows the RGB values of a multiple diseased leaf; these values are of the diseased part as shown in the image. From Table 1, we can see that the blue values are quite low in the case of the healthy leaf and high for the diseased leaf images. This suggests that the blue channel is the key point of our observation and the main key to detecting the leaf disease. Yet we need to study the channel distribution in more detail to be sure of our observation.
Fig. 5. RGB value in a healthy leaf
Fig. 6. RGB value in a leaf with single disease
Fig. 7. RGB value in a leaf with multiple disease

The three RGB values obtained are presented in Table-1.

TABLE I. COMPARISON OF RGB VALUES OF LEAVES

Leaf Category                 Red   Green   Blue
Healthy                        60     137     33
Single Disease in a Leaf      115     111     57
Multiple Disease in a Leaf    141     128    115

Thus we plotted the channel distribution of RGB values.
Fig 8, 9 and 10 show the channel distribution of the Red, Green and Blue values. Upon observing Fig 8, which is the red channel distribution plot, we found that the plot is slightly right-skewed (positive skew) and the red channel values are roughly normally distributed. The red channel values are concentrated around 100, which can also be observed from the plot. We can also observe that there is large variation in the average red channel values across the dataset. Fig 9 is the plot for the green channel distribution. Compared to the red channel plot in Fig 8, the green channel values have a more uniform distribution but a smaller peak. We can also clearly observe that this plot has a left skew and a larger mode of 140, around which the green channel values are concentrated. This indicates the presence of more green color in the dataset, with a better distribution than the red channel. This is to be expected because the images are of leaves, which are green in color. Fig 10 is the plot for the blue channel distribution, which is the most uniformly distributed of the three channel plots. This channel also shows more variation than the other two channels across the entire dataset. The plot has a slight leftward or minimal skew. Fig 11 shows the overall distribution of the RGB channels, from which we can observe that the channel values are concentrated around 105 with a roughly normal distribution. Fig 12 shows the combined plot of RGB channel values, which shows the variation of the channels throughout the images. We also observed green to be the most pronounced color, followed by red and blue. Fig 13 shows the mean value vs. color channel plot, which is in alignment with the values we obtained from the channel plots. For red we found the values to be concentrated at 100, which is the same as observed in Fig 13 for the red channel. The same holds for the green and blue channels. Given this variation of the blue channel in the images, it is becoming clearer that it is the key point for leaf disease detection.
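To make the channel comparison concrete, the following is a minimal sketch in plain Python of how per-channel mean values could be computed for a patch of pixels, and how the blue share of the Table I values compares across categories. The function names `channel_means` and `blue_share` and the three-pixel `patch` are illustrative assumptions and not part of the original work; only the Table I numbers are taken from the paper.

```python
from statistics import mean


def channel_means(pixels):
    """Return the mean (R, G, B) of an iterable of (r, g, b) tuples."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one pixel is required")
    reds, greens, blues = zip(*pixels)
    return mean(reds), mean(greens), mean(blues)


def blue_share(rgb):
    """Fraction of the total channel intensity contributed by blue."""
    r, g, b = rgb
    return b / (r + g + b)


# Values copied from Table I (leaf category -> R, G, B).
TABLE_I = {
    "Healthy": (60, 137, 33),
    "Single Disease in a Leaf": (115, 111, 57),
    "Multiple Disease in a Leaf": (141, 128, 115),
}

# Hypothetical three-pixel patch, used only to show the call shape.
patch = [(60, 140, 30), (58, 136, 36), (62, 135, 33)]
print(channel_means(patch))

for category, rgb in TABLE_I.items():
    print(f"{category}: blue share = {blue_share(rgb):.3f}")
```

With the Table I values, the blue share rises from about 0.14 for the healthy leaf to about 0.30 for the leaf with multiple diseases, which is consistent with the observation that the blue channel separates healthy from diseased regions.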
Fig. 15. Parallel categories plot showing relationship of categories
... that line. Thus we applied SMOTE re-sampling; the "multiple disease" category is the minority class in our case, which will be oversampled. Then we split the dataset into training and validation sets, which helps us evaluate the model better. We took 80% of the dataset for training and the remaining 20% for validation.

D. Transfer Learning
We used Transfer Learning to train the pre-trained models on the dataset. The main reason for choosing Transfer Learning is that there can be many classes which closely resemble the leaf dataset, so the features extracted by the pre-trained models in Table 1 can be used for training on our dataset, which increases the overall accuracy. This process also reduces the training time of the models.

E. Ensemble Algorithm
Ensemble methods like bagging and boosting have proven to be a good way to build classifiers with increased accuracy. Generally, ensemble methods have been found to take only one performance metric into account, accuracy, which is not always the only metric for the evaluation of models. Often in real-life classification problems like ours we have to deal with an imbalanced dataset, and the penalties of every wrong prediction could be both deadly and economically disastrous. In our scenario, Plant Disease Classification, every wrong prediction can lead to wrong chemical use, which can ultimately destroy the crops and cause huge economic loss. So the model cannot be judged only on accuracy, which simply gives the proportion of correct predictions in the dataset. F1 score is another parameter which is used along with accuracy to test a model's performance. It is the harmonic mean of recall and precision. Recall and Precision are the two parameters upon which the F1 Score depends, so instead of tracking these two values separately we only need to focus on getting a good F1 Score; a good F1 score would mean good recall and precision values. Thus in this paper we have proposed a novel ensemble method which, unlike other ensemble methods, takes both accuracy and F1 score into account for choosing the best classifier. The proposed algorithm, Algorithm 1 and 2, shows the detailed steps of the approach.

Algorithm 1: Ensemble Algorithm
Input: $C = \{C_1, \ldots, C_k\}$, the set of classifiers; $k$, the number of classifiers
       $D$, the training dataset
Output: Set P1 = $\{(C_i, A_i, F1_i), \ldots\}$, sorted in decreasing order of $A_i$
        Set P2 = $\{(C_i, A_i, F1_i), (\ldots), \ldots\}$, sorted in decreasing order of $F1_i$
1. Set $C$ = set of classifiers
2. For i = 1 to k, do
3.    Train $C_i$ on $D$.
4.    Evaluate the model.
5.    Calculate and save the accuracy in $A_i$ and the F1 Score in $F1_i$.
6.    Store $C_i$, $A_i$ and $F1_i$ as a tuple in Set P1 and P2.
7. Sort P1 in decreasing order of $A_i$ and keep the top k/2 elements.
8. Sort P2 in decreasing order of $F1_i$.

Algorithm 2
Input: Set P1 = $\{(C_i, A_i, F1_i), \ldots\}$, sorted in decreasing order of $A_i$
       Set P2 = $\{(C_i, A_i, F1_i), (\ldots), \ldots\}$, sorted in decreasing order of $F1_i$
Output: $C_{best}$ (best classifier)
1. For i = 1 to k, do
2.    From the P2 set, pick $C_i$ and $F1_i$. Let these be $C_1$ and $F1_{11}$.
3.    Check if $C_1$ and $F1_{11}$ are present simultaneously in any tuple of P1. If yes, then $C_1$ is the model we are looking for and $C_{best} = C_1$; break the loop.
4.    Else continue from step 2.
5. If no such $C_1$ is found that lies between $F1_{11}$ and $F1_{1(k/2)}$, then $C_1$ (the first model from the P2 set) is the best model.

V. RESULTS AND DISCUSSIONS

TABLE II. ACCURACY, F1 SCORE AND PRECISION OF CLASSIFIERS

S.No.  Pre-Trained Model  Accuracy  F1 Score  Precision
1      MobileNetV2        0.8677    0.8509    0.8827
2      InceptionV3        0.8236    0.8036    0.7848
3      VGG19              0.669     0.6518    0.6376
4      DenseNet           0.9228    0.9124    0.9193
5      VGG16              0.6335    0.6219    0.6146
6      Xception           0.8787    0.8658    0.8615
7      EfficientNetB7     0.9146    0.91903   0.9261
8      NASNet Large       0.9090    0.9078    0.9077

Fig. 20. Accuracy vs F1 Score comparison of classifiers

Table 2 presents a summary of the Accuracy, F1 Score and Precision of the various classifiers which were trained using the Transfer Learning process and compared using the Ensemble Method. Since our proposed work uses a unique ensemble method, Algorithm 1 and 2, to come up with the best classifier, one cannot simply say that the model with the best accuracy is the best performing model. According to Algorithm 1 and 2, the best model should have a high F1 Score and its accuracy should lie in the top k/2 accuracies, where k (the total number of classifiers) = 8. After applying the algorithm, our Ensemble Method came up with EfficientNetB7 as the best classifier, as it had the best F1 Score and its accuracy also lies in the top 4 accuracies. The plot in Fig 20 presents the comparison of accuracy and F1 Score for all the models. It can be clearly observed that EfficientNetB7 has the highest F1 Score while its accuracy is only slightly less than the top accuracy, which satisfies Algorithm 1 and 2. Thus the proposed algorithm comes up with a classifier that has a good F1 Score, which indicates good recall and precision and fewer wrong predictions. It also predicts whether a leaf image has multiple diseases or a single disease, which reduces false predictions in a real-life scenario.
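The selection logic of Algorithms 1 and 2 can be sketched in a few lines of Python. This is one possible reading of the two algorithms, not the authors' implementation: training and evaluation are skipped, the metrics are simply copied from Table II, and the names `Result`, `select_best` and `TABLE_II` are illustrative assumptions.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    accuracy: float
    f1: float


def select_best(results):
    """Pick the highest-F1 classifier whose accuracy is in the top k/2."""
    if not results:
        raise ValueError("at least one classifier result is required")
    k = len(results)
    # Algorithm 1, step 7: sort by accuracy and keep the top k/2 entries (P1).
    p1 = sorted(results, key=lambda r: r.accuracy, reverse=True)[: k // 2]
    # Algorithm 1, step 8: sort by F1 score (P2).
    p2 = sorted(results, key=lambda r: r.f1, reverse=True)
    top_accuracy_names = {r.name for r in p1}
    # Algorithm 2: walk P2 from the best F1 downwards and stop at the
    # first classifier that also appears in P1.
    for candidate in p2:
        if candidate.name in top_accuracy_names:
            return candidate
    # Algorithm 2, step 5: fall back to the first model of P2.
    return p2[0]


# Accuracy and F1 score copied from Table II.
TABLE_II = [
    Result("MobileNetV2", 0.8677, 0.8509),
    Result("InceptionV3", 0.8236, 0.8036),
    Result("VGG19", 0.669, 0.6518),
    Result("DenseNet", 0.9228, 0.9124),
    Result("VGG16", 0.6335, 0.6219),
    Result("Xception", 0.8787, 0.8658),
    Result("EfficientNetB7", 0.9146, 0.91903),
    Result("NASNet Large", 0.9090, 0.9078),
]

best = select_best(TABLE_II)
print(best.name)  # EfficientNetB7 for the Table II values
```

For the Table II values, the top four accuracies belong to DenseNet, EfficientNetB7, NASNet Large and Xception, and EfficientNetB7 has the highest F1 score among all eight models, so the loop stops at its first candidate.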
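The SMOTE re-sampling applied to the minority "multiple disease" class can be illustrated with a small standard-library sketch. The paper does not state which implementation or feature representation was used, so everything here is an assumption: the function name `smote`, its parameters, and the two-dimensional toy points are hypothetical, and in practice a library implementation such as the `SMOTE` class in imbalanced-learn would more likely be used. The sketch only shows the core idea of interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points on segments between a minority sample
    and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D feature vectors standing in for minority-class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays inside the region already covered by that class, which is why the oversampled class keeps a plausible shape.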
VI. CONCLUSION
In this paper we have presented a unique Ensemble algorithm, Algorithm 1 and 2, which uses both accuracy and F1 Score to choose the best classifier from among the list of classifiers [Table 1]. We performed data pre-processing and found that the blue channel was more abundant in the diseased part as compared to the healthy part. We then analyzed the data and found it to be imbalanced, as the multiple diseased leaf category had only 91 images, which is only 5% of the dataset. We applied the SMOTE resampling method to handle the imbalanced dataset. In our experiments, our proposed algorithm came up with EfficientNetB7 as the best classifier, which has the best F1 score and whose accuracy was also among the top k/2 classifiers. Using our proposed algorithm and the new dataset (having multiple disease leaf as a separate category), our proposed work successfully reduces false predictions, firstly by predicting whether a leaf image has a single disease or multiple diseases, and secondly because the classifier has low false predictions and good accuracy, as our algorithm chooses a classifier with good F1 score and accuracy.

REFERENCES
[1] Singh, Vijai, Namita Sharma, and Shikha Singh. "A review of imaging techniques for plant disease detection." Artificial Intelligence in Agriculture (2020).
[2] Shah, Jitesh P., Harshadkumar B. Prajapati, and Vipul K. Dabhi. "A survey on detection and classification of rice plant diseases." In 2016 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), pp. 1-8. IEEE, 2016.
[3] Shruthi, U., V. Nagaveni, and B. K. Raghavendra. "A review on machine learning classification techniques for plant disease detection." In 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), pp. 281-284. IEEE, 2019.
[4] Shrivastava, Vimal K., Monoj K. Pradhan, Sonajharia Minz, and Mahesh P. Thakur. "Rice plant disease classification using transfer learning of deep convolution neural network." International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences (2019).
[5] Ferentinos, Konstantinos P. "Deep learning models for plant disease detection and diagnosis." Computers and Electronics in Agriculture 145 (2018): 311-318.
[6] Mohanty, Sharada P., David P. Hughes, and Marcel Salathé. "Using deep learning for image-based plant disease detection." Frontiers in plant science 7 (2016): 1419.
[7] Khirade, Sachin D., and A. B. Patil. "Plant disease detection using image processing." In 2015 International conference on computing communication control and automation, pp. 768-771. IEEE, 2015.
[8] Mahlein, Anne-Katrin. "Plant disease detection by imaging sensors–parallels and specific demands for precision agriculture and plant phenotyping." Plant disease 100, no. 2 (2016): 241-251.
[9] Fang, Yi, and Ramaraja P. Ramasamy. "Current and prospective methods for plant disease detection." Biosensors 5, no. 3 (2015): 537-561.
[10] Moghadam, Peyman, Daniel Ward, Ethan Goan, Srimal Jayawardena, Pavan Sikka, and Emili Hernandez. "Plant disease detection using hyperspectral imaging." In 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8. IEEE, 2017.
[11] Singh, Vijai, and A. K. Misra. "Detection of unhealthy region of plant leaves using image processing and genetic algorithm." In 2015 International Conference on Advances in Computer Engineering and Applications, pp. 1028-1032. IEEE, 2015.
[12] Nandhini, S. Aasha, Radha Hemalatha, S. Radha, and K. Indumathi. "Web enabled plant disease detection system for agricultural applications using WMSN." Wireless Personal Communications 102, no. 2 (2018): 725-740.