
INTELLIGENT DISEASE DETECTION SYSTEM FOR EARLY BLIGHT OF TOMATO USING FOLDSCOPE: A PILOT STUDY


P. Maheswari
School of Mechanical Engineering
SASTRA Deemed University
Thanjavur, India
maheswari@sastra.ac.in

P. Raja
School of Mechanical Engineering
SASTRA Deemed University
Thanjavur, India
raja_sastra@yahoo.com

N.M. Ghangaonkar
Department of Botany
Chandamal Tarachand Bora College
Shirur, Pune, India
ctborainfo68@gmail.com

Abstract— Early disease identification plays a vital role in modern agriculture in mitigating large production losses. This paper presents a method to identify Alternaria solani, the pathogen that causes early blight fungal disease in tomato leaves, using a foldscope and machine learning algorithms. The foldscope is a paper microscope invented by Manu Prakash and his team, and it is a low-cost alternative to bulky and expensive conventional microscopes. The foldscope can be attached to a high-resolution mobile phone to obtain magnified images. Images of Alternaria solani were captured using this setup and then classified using various machine learning algorithms. The quadratic Support Vector Machine (SVM) classifier gives the highest classification accuracy, 89% in the prediction phase, among the machine learning algorithms compared.

Keywords— Agriculture, foldscope, early blight disease, Alternaria solani, machine learning algorithms, tomato crop

I. INTRODUCTION
Agriculture is the dominant sector in determining the economic status of the vast majority of countries. Crop production is affected every year by the intrusion of pathogens, changing climatic conditions, etc. Among the cultivated crops, tomato (Solanum lycopersicum), a horticultural crop, is one of the most common vegetables grown all over the world. Tomato is in high demand worldwide because of its nutritional value, so higher production is needed to meet consumer demand. Like other crops, tomato production is greatly affected by various parasitic microorganisms (fungal, bacterial, viral, etc.) and by changing environmental conditions [1]. Hence, early-stage identification of disease protects the crops and increases the yield significantly.

Precision agriculture is an integrated system that uses several methodologies to accumulate different data and analyses them effectively to increase production in a cost-effective manner [2]. Precision agriculture combined with image processing and machine learning algorithms finds immense application in identifying crop diseases at an early stage [3]. The main idea of machine learning is that the algorithm learns from the given input data by itself, without explicit programming, and improves its performance on the given task. Supervised and unsupervised learning are the two main categories of machine learning algorithms. In supervised learning, both the input data and the desired output data are provided, and predictions on future data are made on that basis. In unsupervised learning, only the input data are provided, and the algorithm finds structure in the data without any guiding information. Some of the supervised and unsupervised learning algorithms are linear regression, K-Nearest Neighbours (KNN), Linear Discriminant Analysis (LDA), Artificial Neural Networks (ANN), SVM, decision trees, clustering, Principal Component Analysis (PCA) and random forests, which are used in many applications depending on the requirement [4]. The four supervised learning algorithms used in the proposed work are LDA, KNN, linear SVM and quadratic SVM. LDA is a conventional multivariate technique used for dimension reduction and classification. KNN is a non-parametric supervised learning algorithm used for classification and regression [4, 5]. Image classification using SVM provides robustness, accuracy and effectiveness even with a small set of training samples. SVMs are binary classifiers, but they can also be adapted to multiclass classification tasks [5, 6].

A. Early blight of tomato
Early blight is a fungal disease that affects the leaves, fruit and stem of the tomato plant. Among the common fungal diseases, early blight is the most dominant, serious and damaging. Crop production is heavily affected by early blight, as it causes premature defoliation and results in heavy yield losses by decreasing the quality and quantity of the fruit. The symptoms of early blight on tomato leaves are: 1) small dark spots that enlarge into circular lesions consisting of concentric rings, and 2) lesions that appear first on the older foliage [7]. High rainfall, overhead irrigation, crowded plantations, dew and extended periods of leaf wetness are favourable conditions for disease development [8]. Fig. 1 shows images of a healthy tomato leaf and a tomato leaf with early blight.

(a) Healthy tomato leaf; (b) Early blight disease of tomato leaf
Fig. 1. Healthy and diseased tomato leaves

The primary causal agent of early blight is the fungus Alternaria solani. Fig. 2 shows the morphology of Alternaria solani, which causes early blight in tomato leaves.

(a) Multiple Alternaria solani pathogens; (b) Single Alternaria solani pathogen
Fig. 2. Images of Alternaria solani captured using foldscope
Funding agency: Department of Biotechnology, Government of India (Grant No. BT/IN/Indo-US/Foldscope/39/2015)



Alternaria solani reproduces asexually by conidia. Morphologically, these conidia are bell-shaped, with horizontal and vertical septation [7, 8].

B. Foldscope – A paper microscope
The foldscope is a new low-cost paper microscope for exploring biological and non-biological science. Biological samples (such as fungi and bacteria) as well as non-biological samples (e.g., particulates detrimental to air quality) can be visualized using the foldscope. Compared with a conventional microscope, it is portable and water-proof [9]. The assembly of the foldscope is explained in detail at [10]. The assembled foldscope, with its various parts, is shown in Fig. 3.

Fig. 3. Assembled foldscope

A foldscope kit consists of a punched sheet of cardstock, a spherical glass lens, three square magnetic couplers, an LED and a diffuser panel, along with a watch battery that powers the LED. At present, the lens used in the foldscope has a magnification of 140X and a resolution of 2 microns. The foldscope weighs just 8 grams, so it can easily be attached to a mobile phone to obtain images for processing [11]. The foldscope attached to the mobile phone is shown in Fig. 4.

Fig. 4. Foldscope attached with mobile phone

C. Feature extraction from texture
Texture is the spatial arrangement of local intensity values in an image, correlated within the areas of the visual scene that correspond to surface regions. Repetition, directionality and complexity are the three properties that describe texture, and texture manifests some periodicity of basic patterns. The Gray Level Co-occurrence Matrix (GLCM) is used for feature extraction because it describes the texture details present in an image well. The textural descriptors contrast, energy, correlation and homogeneity are four important features that characterize the texture of a given image. Contrast measures the local variations in the GLCM. Energy is the sum of the squared elements of the GLCM. Correlation measures the joint probability of occurrence of the specified pixel pairs. Homogeneity measures the closeness of the distribution of GLCM elements to the GLCM diagonal [12]. The four textural descriptors and their corresponding formulae are given in Table I.

TABLE I. TEXTURE FEATURES AND THEIR FORMULAE
Features | Formulae
Contrast, $I_{con}$ | $I_{con} = \sum_{i,j} \lvert i-j \rvert^{2}\, p(i,j)$
Energy, $I_{e}$ | $I_{e} = \sum_{i,j} p(i,j)^{2}$
Correlation, $I_{corr}$ | $I_{corr} = \sum_{i,j} \frac{(i-\mu_i)(j-\mu_j)\, p(i,j)}{\sigma_i \sigma_j}$
Homogeneity, $I_{homo}$ | $I_{homo} = \sum_{i,j} \frac{p(i,j)}{1+\lvert i-j \rvert}$

where $p(i,j)$ is the normalized GLCM entry, i.e., the probability that a pixel with gray level $i$ has a neighbour with gray level $j$ at the specified offset; $\mu_i$ and $\mu_j$ are the means, and $\sigma_i$ and $\sigma_j$ the standard deviations, of the row ($i$) and column ($j$) marginal distributions of $p$, respectively.

D. Machine learning algorithms
Image classification using machine learning algorithms is one of the major areas of focus in precision agriculture. Machine learning, a subfield of artificial intelligence, is a data-driven approach that learns from the given input data [13]. The four machine learning algorithms used in this work are briefly described below.

LDA is mostly used for dimension reduction and classification problems. In this method, the sample mean and covariance matrices are computed for the different groups (classes) of training samples. LDA uses the Fisher discriminant criterion, defined as the ratio of the between-class variance to the within-class variance; the optimal solution maximizes this ratio [14]. Among all the machine learning algorithms, KNN is the simplest. In KNN, an object is assigned to the class that is most common among its k nearest neighbours in the feature space [15].

SVM is one of the most popular supervised learning algorithms. Using a kernel function, SVM maps the training data nonlinearly from the input space to a feature space of higher dimensionality, where the data are classified. SVM classification constructs hyper-planes that separate the class labels of the different cases. SVM supports both regression and classification tasks and can handle multiple categorical and continuous variables [16].

In linear SVM, the hyper-planes are selected so that no points lie between the two classes of data, and the distance between the classes (the margin) is then maximized. In the feature space there are many separating hyper-planes; the one that maximizes the distance between the datasets is called the optimal separating hyper-plane. In quadratic SVM (an SVM with a second-order polynomial kernel), Lagrange multipliers are used to obtain the optimal separating hyper-plane with maximum margin [17].

Various research works have applied image processing and machine learning algorithms to identify diseases in tomato leaves based on the external appearance of the leaves [18, 19]. However, this is the first work that detects early blight fungal disease in tomato leaves based on pathogen identification using a paper microscope (i.e., the foldscope) and machine learning algorithms. The captured images are classified using the above four machine learning algorithms, and their accuracies are compared.

The paper is organized as follows: Section II describes the proposed method, Section III presents the simulation results of the machine learning algorithms, and Section IV presents the conclusion and future directions.

II. PROPOSED METHOD
In this work, the pathogen that causes early blight in tomato leaves is identified using a foldscope and machine learning algorithms. The functional blocks of the proposed system are shown in Fig. 5. First, infected tomato leaves are collected from the agricultural field and undergo a staining procedure. The specimen slide is then inserted into the foldscope coupled with a mobile phone, and the pathogen is captured. The captured images are given to the learning algorithms described above for classification, and the performance is measured. The steps of the proposed system are discussed in the following subsections.

A. Infected leaf specimen
Infected tomato leaves are collected from the agricultural farm. The portions containing a lesion (a dark spot surrounded by concentric rings) are cut into fine bits, and a staining procedure is carried out on these small leaf pieces. The pieces are inoculated aseptically on Potato Dextrose Agar (PDA) nutrient medium and incubated at 27°C for 10 days. The resulting colonies are analysed by preparing slides. The fungal culture is purified and maintained on PDA slants. The prepared specimen slide is inserted into the foldscope for observation.

B. Capturing of pathogen images using foldscope coupled with mobile phone
The specimen slide is inserted into the foldscope attached to a high-resolution mobile phone. The infected tomato leaf specimen is visualized by adjusting the sample stage and the panning guide of the foldscope. While the specimen is brought into focus with the focus ramp of the foldscope and the mobile camera is zoomed, the pathogen is located and captured on the mobile phone.

C. Training phase
The training dataset consists of 100 images taken with the foldscope: 60 images of the typical Alternaria solani pathogen and 40 images of another pathogen (non-Alternaria solani). For the non-Alternaria solani class, Septoria lycopersici images are used. The images are labelled according to their category. The textural descriptors $I_{con}$, $I_{e}$, $I_{corr}$ and $I_{homo}$ are extracted from these images using the GLCM.

The extracted features and their corresponding labels are then tabulated; this table is the input to the learning algorithms for constructing the model using the Classification Learner app in MATLAB (R2017a).
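The Table I descriptors can be computed directly from a normalized GLCM. The sketch below is a plain-Python illustration, not the authors' MATLAB code. It assumes an 8-bit grayscale image quantized to 8 levels, a one-pixel-to-the-right offset and non-symmetric counting. These settings are our assumption, chosen to resemble the usual MATLAB `graycomatrix` defaults, because the paper does not state its GLCM settings. The helper names `quantize`, `glcm` and `texture_features` are hypothetical.

```python
import math


def quantize(image, levels=8):
    """Map 8-bit gray values (0-255) to `levels` equal-width bins 0 .. levels-1 (assumed scheme)."""
    return [[pixel * levels // 256 for pixel in row] for row in image]


def glcm(image, levels, dx=1, dy=0, symmetric=False):
    """Normalized gray-level co-occurrence matrix p(i, j).

    Counts how often a pixel with level i has a neighbour with level j at the
    offset (dy rows, dx columns), then divides by the total number of pairs.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, energy, correlation and homogeneity as written in Table I."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    # Means and standard deviations of the row (i) and column (j) marginals.
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        # Undefined for a constant image (zero standard deviation).
        "correlation": cov / (sd_i * sd_j) if sd_i * sd_j > 0 else float("nan"),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


if __name__ == "__main__":
    # Hypothetical 4x4 grayscale patch, not data from the paper.
    patch = [
        [30, 32, 200, 210],
        [28, 35, 198, 205],
        [31, 120, 125, 202],
        [29, 118, 122, 119],
    ]
    print(texture_features(glcm(quantize(patch), levels=8)))
```

Each training image then contributes one row of four numbers, and its label is added to that row to form the table that is passed to the learners.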
[Block diagram] Training phase: training images and their labels → feature extraction → learning algorithms (LDA, KNN, linear SVM and quadratic SVM) → trained classifier model. Prediction phase: input infected tomato leaf (test image) → capturing of pathogen image using foldscope with mobile phone → feature extraction → trained classifier model → output label (Pathogen: Alternaria solani; Disease: Early blight).
Fig. 5. Functional blocks of proposed method
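Fig. 5 lists linear and quadratic SVM among the learners, and Section I-D describes how an SVM kernel implicitly maps the features into a higher-dimensional space. The sketch below makes that mapping concrete for a second-order polynomial kernel, $K(\mathbf{x}, \mathbf{z}) = (1 + \mathbf{x}^\top \mathbf{z})^2$. As far as we know, this is the kernel behind MATLAB's "Quadratic SVM" preset, but that is an assumption and the paper does not state it. The sketch also shows the dual-form decision function. The support vectors, multipliers `alphas` and `bias` used with `svm_decision` are hypothetical. In a real model they come from solving the SVM dual problem, which this sketch does not do.

```python
import math
from itertools import combinations


def quadratic_kernel(x, z):
    """Second-order polynomial kernel K(x, z) = (1 + x.z)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


def quadratic_feature_map(x):
    """Explicit feature map phi such that phi(x).phi(z) == (1 + x.z)^2.

    For d = 4 GLCM features this gives 1 + 4 + 4 + 6 = 15 dimensions.
    """
    r2 = math.sqrt(2.0)
    return ([1.0]
            + [r2 * a for a in x]                      # linear terms
            + [a * a for a in x]                       # squared terms
            + [r2 * x[i] * x[j]                        # cross terms, i < j
               for i, j in combinations(range(len(x)), 2)])


def svm_decision(x, support_vectors, alphas, labels, bias):
    """Dual-form decision value f(x) = sum_j alpha_j y_j K(x_j, x) + b; sign(f) is the class."""
    return sum(a * y * quadratic_kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + bias


if __name__ == "__main__":
    x = [0.04, 0.29, 0.98, 0.98]  # hypothetical feature vectors
    z = [0.12, 0.18, 0.93, 0.94]
    explicit = sum(a * b for a, b in zip(quadratic_feature_map(x), quadratic_feature_map(z)))
    print(quadratic_kernel(x, z), explicit)  # equal up to rounding
```

The kernel evaluates the 15-dimensional inner product without building the 15-dimensional vectors. This is why a quadratic SVM can fit curved boundaries in the four-feature space at little extra cost.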


The parameters of the learning algorithms are defined as follows. Let $m$ be the number of observations ($m = 100$ training images in the proposed method). For these $m$ observations, the training set $T$ is defined as

$$T = \{(\mathbf{x}_1, y_{1,l}), (\mathbf{x}_2, y_{2,l}), \ldots, (\mathbf{x}_m, y_{m,l})\} \qquad (1)$$

where $\mathbf{x}_j$ is the $d$-dimensional feature vector of the $j$th image,

$$\mathbf{x}_j = \{x_1, \ldots, x_d\},$$

with $d = 4$ in the proposed work, namely $I_{con}$, $I_{e}$, $I_{corr}$ and $I_{homo}$, and $y_{j,l}$ is the class label. Two classes are used in the proposed work, $y_{j,l} \in \{-1, +1\}$: if $y_{j,l} = +1$, the pathogen is classified as Alternaria solani, and if $y_{j,l} = -1$, it is classified as non-Alternaria solani.

These parameters are the inputs to the four learning algorithms, from which the trained model is developed.
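With the training set $T$ of (1) and labels $y_{j,l} \in \{-1, +1\}$, two-class Fisher LDA has a closed form. It maximizes $J(\mathbf{w}) = \frac{\mathbf{w}^\top S_B \mathbf{w}}{\mathbf{w}^\top S_W \mathbf{w}}$, and the maximizer is $\mathbf{w} \propto S_W^{-1}(\boldsymbol{\mu}_{+} - \boldsymbol{\mu}_{-})$. Here $S_W$ is the within-class scatter and $\boldsymbol{\mu}_{\pm}$ are the class means. The sketch below is a plain-Python illustration of that criterion, not MATLAB's `fitcdiscr`. The midpoint threshold (equal priors), the small ridge term and all function names are our assumptions. The training vectors in the demo are hypothetical.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def scatter_matrix(vectors, mu):
    """Sum of outer products (x - mu)(x - mu)^T over the samples of one class."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [v[k] - mu[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]
    return s


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_fisher_lda(T, ridge=1e-9):
    """Two-class Fisher LDA on T = [(x_j, y_j)] with y_j in {+1, -1}."""
    pos = [x for x, y in T if y == +1]
    neg = [x for x, y in T if y == -1]
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    s_pos, s_neg = scatter_matrix(pos, mu_pos), scatter_matrix(neg, mu_neg)
    d = len(mu_pos)
    # Within-class scatter S_W, plus a tiny ridge so the system stays solvable.
    s_w = [[s_pos[a][b] + s_neg[a][b] + (ridge if a == b else 0.0) for b in range(d)]
           for a in range(d)]
    w = solve(s_w, [p - q for p, q in zip(mu_pos, mu_neg)])
    threshold = 0.5 * (dot(w, mu_pos) + dot(w, mu_neg))  # midpoint of projected means
    return w, threshold


def predict_lda(model, x):
    w, threshold = model
    return +1 if dot(w, x) >= threshold else -1


if __name__ == "__main__":
    # Hypothetical (I_con, I_e, I_corr, I_homo) vectors; +1 = A. solani, -1 = other.
    T = [([0.05, 0.30, 0.97, 0.97], +1), ([0.06, 0.27, 0.96, 0.97], +1),
         ([0.04, 0.32, 0.98, 0.98], +1), ([0.07, 0.28, 0.97, 0.96], +1),
         ([0.12, 0.18, 0.93, 0.94], -1), ([0.15, 0.16, 0.91, 0.93], -1),
         ([0.11, 0.21, 0.94, 0.95], -1), ([0.14, 0.17, 0.92, 0.92], -1)]
    model = fit_fisher_lda(T)
    print(predict_lda(model, [0.06, 0.29, 0.97, 0.97]))
```

Because $S_W^{-1}$ is positive definite, the projected mean of the $+1$ class always lies above the midpoint threshold, so the sign convention of the labels is preserved.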
D. Prediction phase
In the prediction phase, 40 test images are used, and the class of each image is predicted with the already constructed models of the four learning algorithms. Of these 40 images, 25 are Alternaria solani and the remaining 15 are non-Alternaria solani. The test images were captured using the foldscope attached to the mobile phone, and feature extraction was performed on them. The features of each image (contrast, energy, correlation and homogeneity) are tabulated, and the table is given to the trained classifier model to predict the class of the pathogen.

E. Classification of test pathogen images
The classifiers trained using LDA, KNN, linear SVM and quadratic SVM separate the two classes based on the GLCM features; the SVMs do so by mapping the feature vectors into a higher-dimensional feature space. Each classifier forms a decision boundary between the two classes in the GLCM feature space, classifies the pathogen images with the trained model and predicts the label (i.e., Alternaria solani or non-Alternaria solani). The flow of the learning algorithms used in the proposed system is shown in Fig. 6 and is described as follows. The training images are initially labelled as either Alternaria solani or non-Alternaria solani, and features are extracted from them. The learning parameters, namely the predictors (textural features) and the responses (two classes), are defined, and training is started using the four learning algorithms. A 5-fold cross-validation process is then applied, and the trained model is obtained. This model is used to predict the new test images in the prediction phase.

Fig. 6. Flow chart for the learning algorithms

III. SIMULATION RESULTS
Computational experiments are carried out to show the efficacy of the proposed method. The experiments are executed on a Windows 10 laptop with an Intel® Core™ i5-6200U processor (2.4 GHz) and 8 GB RAM. All programs are written and run in MATLAB (R2017a).

The pathogen images captured using the foldscope are transferred to the MATLAB (R2017a) workspace. Preprocessing, such as greyscale conversion, is applied to these images, followed by feature extraction. The textural features described in Table I are extracted from the test pathogen images. The values of the textural descriptors for one example test image are given in Table II.

TABLE II. FEATURES EXTRACTED FROM THE EXAMPLE INPUT TEST IMAGE AND THEIR VALUES
Texture descriptors | Values
Contrast, $I_{con}$ | 0.040013712
Energy, $I_{e}$ | 0.289466882
Correlation, $I_{corr}$ | 0.98525872
Homogeneity, $I_{homo}$ | 0.979993311

Based on these values for each test image, the trained classifiers of the four machine learning algorithms predict the label. The descriptors are calculated for all 40 test images in the prediction phase, and a table is formed from the obtained feature values. This table is given as input to the already trained classifier model, which predicts the pathogen class along with the specific disease type.
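Section II-E applies 5-fold cross-validation before the model is used on the 40 test images. The sketch below shows the idea with the KNN classifier from Section I-D. Every training image is predicted exactly once, by a model trained on the other four folds, and the accuracy is pooled over all images. The stratified round-robin fold assignment and `k = 3` are our assumptions. The paper does not report the number of neighbours, and MATLAB's own partitioning may differ in detail. The demo data are synthetic and hypothetical.

```python
import math
import random


def knn_predict(train, x, k=3):
    """Majority vote among the k training samples nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into folds that keep the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:
            folds[position % n_folds].append(i)
            position += 1
    return folds


def cross_val_accuracy(data, n_folds=5, k=3, seed=0):
    """Pooled accuracy: each sample is predicted by a model trained on the other folds."""
    folds = stratified_folds([y for _, y in data], n_folds, seed)
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        correct += sum(knn_predict(train, data[i][0], k) == data[i][1] for i in fold)
    return correct / len(data)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 60/40 split mirroring the class sizes of the training set (hypothetical values).
    demo = ([((rng.gauss(0.05, 0.01), rng.gauss(0.30, 0.02)), +1) for _ in range(60)]
            + [((rng.gauss(0.12, 0.01), rng.gauss(0.18, 0.02)), -1) for _ in range(40)])
    print(cross_val_accuracy(demo))
```

With 60 and 40 images, each fold holds 12 Alternaria solani and 8 non-Alternaria solani images. Every validation fold therefore keeps the 60:40 class ratio of the full training set.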
A. Prediction Results
The test pathogen images are predicted by all four classifier models, which assign a label based on the textural descriptor values. The accuracy of the quadratic SVM in the prediction phase is 89%.

The scatter plot of the training images of the two pathogen classes for the quadratic SVM is shown in Fig. 7. In this plot, the blue dots correspond to the Alternaria solani pathogen and the red dots to non-Alternaria solani. The confusion matrix of the trained quadratic SVM model is shown in Fig. 8. A confusion matrix is a performance measure that summarizes the numbers of correct and incorrect predictions as count values for each class.

Fig. 7. Scatter plot for the quadratic SVM trained model

Fig. 8. Confusion matrix for the quadratic SVM trained model

From the confusion matrix, the trained quadratic SVM model correctly predicted 58 Alternaria solani images and incorrectly predicted 2. For non-Alternaria solani, 33 predictions are correct and 7 are incorrect. Fig. 9 shows the Receiver Operating Characteristic (ROC) curve of the trained quadratic SVM model, which plots the true positive rate against the false positive rate at different threshold settings. The Area Under the ROC Curve (AUC) is 0.97 for the quadratic SVM. The AUC is an aggregate measure of performance across all possible classification thresholds, and its value lies between 0 and 1. The predicted Alternaria solani pathogen images from the test set are shown in Fig. 10 and Fig. 11.

Fig. 9. ROC curve for the quadratic SVM trained model

Prediction results
Pathogen class : Alternaria solani
Disease type : Early blight
Prediction Accuracy : 89% by quadratic SVM
Fig. 10. Predicted test result of Alternaria solani image using quadratic SVM

Prediction results
Pathogen class : Alternaria solani
Disease type : Early blight
Prediction Accuracy : 85.4% by linear SVM
Fig. 11. Predicted test result of Alternaria solani image using linear SVM
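The confusion-matrix counts above can be turned into the reported metrics with a few lines. The sketch treats Alternaria solani as the positive class, so TP = 58, FN = 2, TN = 33 and FP = 7 for the quadratic SVM. It also checks that the training-phase counts of Table IV reproduce the training accuracies of Table III. The AUC helper uses the rank (Mann-Whitney) interpretation of the area under the ROC curve. The reported AUC of 0.97 cannot be recomputed here because the classifier scores are not given, so the scores in the tests are hypothetical. The function and variable names are ours.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, true positive rate and false positive rate from 2x2 counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),  # y-axis of the ROC curve
        "fpr": fp / (fp + tn),  # x-axis of the ROC curve
    }


def auc_rank(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Training-phase counts from Table IV, positive class = Alternaria solani:
# (correct A. solani = TP, correct non-A. solani = TN,
#  incorrect A. solani = FN, incorrect non-A. solani = FP)
TABLE_IV = {
    "LDA": (59, 26, 1, 14),
    "KNN": (54, 31, 6, 9),
    "Linear SVM": (59, 29, 1, 11),
    "Quadratic SVM": (58, 33, 2, 7),
}

if __name__ == "__main__":
    for name, (tp, tn, fn, fp) in TABLE_IV.items():
        print(name, confusion_metrics(tp, fn, tn, fp))
```

For the quadratic SVM this gives an accuracy of (58 + 33) / 100 = 0.91, a TPR of 58/60 and an FPR of 7/40 = 0.175. That is one operating point on the ROC curve.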
B. Performance evaluation of the proposed method
One of the metrics used to evaluate classification models is accuracy, defined as the ratio of the number of correct predictions to the total number of predictions. The accuracy values (in percent) of the various machine learning algorithms in both phases (i.e., training and prediction) are given in Table III. The quadratic SVM gives the highest accuracy in both phases: 91% in training and 89% in prediction. This is attributed to the quadratic SVM providing an optimal margin of separation between the two classes and a more effective classification through the dual formulation of the optimization problem. Hence, the quadratic SVM provides better prediction than the other three classifier models.

TABLE III. COMPARISON OF ACCURACY BETWEEN VARIOUS LEARNING ALGORITHMS
Learning algorithms | Training phase accuracy (%) | Prediction phase accuracy (%)
LDA | 85.0 | 84.0
KNN | 85.0 | 83.0
Linear SVM | 88.0 | 85.4
Quadratic SVM | 91.0 | 89.0

Table IV gives the counts of correct and incorrect predictions obtained from the training-phase confusion matrices (100 training images) of all four implemented learning algorithms.

TABLE IV. CORRECT AND INCORRECT PREDICTION VALUES OF FOUR LEARNING ALGORITHMS
Learning algorithms | Correct: Alternaria solani | Correct: non-Alternaria solani | Incorrect: Alternaria solani | Incorrect: non-Alternaria solani
LDA | 59 | 26 | 1 | 14
KNN | 54 | 31 | 6 | 9
Linear SVM | 59 | 29 | 1 | 11
Quadratic SVM | 58 | 33 | 2 | 7

IV. CONCLUSION AND FUTURE WORK
In this work, images of the pathogen from diseased tomato leaves are captured using a foldscope (i.e., a cost-effective paper microscope) attached to a mobile phone. Using various machine learning algorithms, the proposed method predicts whether the given input image is Alternaria solani or non-Alternaria solani. Among the four implemented algorithms, the quadratic SVM produced the highest accuracy in both the training and prediction phases (91% and 89%, respectively). The training and prediction accuracy can be further improved by enlarging the dataset. In future, the work can be extended by developing a disease guidance system, employing deep learning algorithms, that identifies various diseases, assesses their severity level and recommends treatment (i.e., the application of appropriate pesticides).

ACKNOWLEDGMENT
This work is funded by DBT (Department of Biotechnology), Government of India (Grant No. BT/IN/Indo-US/Foldscope/39/2015). The authors would like to acknowledge Dr. Durga Prasad Awasthi, Assistant Professor, Department of Plant Pathology, College of Agriculture, Tripura, for his support.

REFERENCES
[1] P. Adhikari, Y. Oh and D. R. Panthee, “Current status of early blight resistance in tomato – an update,” Int. J. Mol. Sci., vol. 18, issue 10, September 2017.
[2] S. Dimitriadis and C. Goumopoulos, “Applying machine learning to extract new knowledge in precision agriculture,” 2008 Panhellenic Conference on Informatics, Greece, pp. 100-104, 2008.
[3] P. Moghadam, D. Ward, E. Goan, S. Jayawardena, P. Sikka and E. Hernandez, “Plant disease detection using hyperspectral imaging,” IEEE International Conference on Digital Image Computing: Techniques and Applications, Sydney, pp. 1-8, 2017.
[4] E. Mwebaze and G. Owomugisha, “Machine learning for plant disease incidence and severity measurements from leaf images,” 15th IEEE International Conference on Machine Learning and Applications, USA, pp. 158-163, 2016.
[5] Y. Tian, Y. Shi, and X. Liu, “Recent advances on support vector machines research,” Int. J. Techno. Econ. Dev. Econ., vol. 18, issue 1, pp. 5-33, April 2012.
[6] X. Sun, L. Liu, H. Wang, W. Song and J. Lu, “Image classification via support vector machine,” IEEE 4th International Conference on Computer Science and Network Technology, Harbin, pp. 485-489, 2015.
[7] P. R. Pawari, A. M. Bhosale and Y. P. Lolage, “Alternaria blight of tomato (Lycopersicon esculentum Mill.),” Int. J. Adv. Technol. Inno. Res., vol. 8, issue 9, pp. 1727-1728, August 2016.
[8] U. Mokhtar, M. A. S. Ali, A. E. Haseenian and H. Hefny, “Tomato leaves diseases detection approach based on support vector machines,” 11th International Conference on Computer Engineering, Cairo, pp. 246-250, 2015.
[9] J. S. Cybulski, J. Clements and M. Prakash, “Foldscope: origami-based paper microscope,” PLoS One, vol. 9, issue 6, June 2014.
[10] https://www.foldscope.com/tutorials/
[11] https://en.wikipedia.org/wiki/foldscope
[12] J. Sharma, J. K. Rai and R. P. Tewari, “Co-occurrence matrix and statistical features as an approach for mass classification,” IEEE International Conference on Advances in Computing, Communications and Informatics, New Delhi, pp. 2369-2373, 2014.
[13] S. R. Maniyath, “Plant disease detection using machine learning,” IEEE International Conference on Design Innovations for 3Cs Compute Communicate Control, Bangalore, pp. 41-45, 2018.
[14] H. W. Luo, L. N. Yang, Y. M. Li, H. L. Yuan and Y. Y. Tang, “Feature extraction based on discriminant analysis with penalty constraint for hyperspectral image classification,” IEEE International Conference on Machine Learning and Cybernetics, Tianjin, pp. 931-936, 2013.
[15] C. Li, S. Zhang, H. Zhang, L. Pang, K. Lam, C. Hui and S. Zhang, “Using the K-nearest neighbor algorithm for the classification of lymph node metastasis in gastric cancer,” Interdiscip. J. Comput. Math. Methods Med., pp. 1-11, October 2012.
[16] S. R. Suralkar, A. H. Karode, and P. W. Pawade, “Texture image classification using support vector machine,” Int. J. Comp. Tech. Appl., vol. 3, issue 1, pp. 71-75, February 2012.
[17] K. Wang, X. Wang, and Y. Zhong, “A weighted feature support vector machines method for semantic image classification,” IEEE International Conference on Measuring Technology and Mechatronics Automation, Changsha, pp. 377-380, 2010.
[18] S. R. Kamlapurkar, “Detection of plant leaf disease using image processing approach,” Int. J. Sci. Res. Publ., vol. 6, issue 2, February 2016.
[19] K. R. Gavhale and U. Gawande, “An overview of the research on plant leaves disease detection using image processing techniques,” IOSR J. Comput. Eng., vol. 16, issue 1, pp. 10-16, January 2014.
