Professional Documents
Culture Documents
net/publication/335945621
CITATIONS READS
0 492
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Shantha Jayalal on 17 October 2019.
Keywords: K-Means Clustering, Local Binary Pattern, Figure 1. Passion fruit production in previous years
Support Vector Machine, L*a*b
Figure 1 shows how the production of passion fruit in
1. Introduction Sri Lanka since 2006. The production of passion fruit
was steadily grown between 2006 and 2014. An average
Passion fruit (Passiflora edulis) is a very famous fruit
500MT can be observed during this period. In 2014, there
in modern society. Because of its nutrients and health
was a 635MT production and it increased sharply until
benefits, such as prevention of cancer, controlling blood
2016 when the production of passion fruit reached to
pressure and preventing hyperlipidemia. It has a huge
1484MT. It was the maximum production between 2006
demand in both local and international markets. Passion
and 2017. After 2016, production declines suddenly. In provided to find the user intention. The best result was
2017, passion fruit production was 731MT from got using morphology feature extraction. Experimental
470.27ha extent. When considering the ratio of evaluation of this approach was effective and 82%
production and hectare in 2016 and 2017, there was the accurate to identify pomegranate disease.
2.92 (MT per ha) in 2016. It was 1.56(MT per ha) in
2017. The decline of this ratio is 1.36(MT per ha). When In paper [5], the authors presented the image
considering only the production, it was around 50.7% processing based approach for fruit disease detection.
decline in 2017. First, read input image and transformed it from RGB to
L*a*b colour space. Because the colour information in
This production loss was around Rs.90 million in the L*a*b colour space is stored in only two channels.
2017 [3]. Fruit Research and Development Institute Input images were partitioned into four segments using
(FRDI) mentions “Pests, diseases and bad climate mainly K-means cluster in this research. Because the empirical
caused to decline the production in 2017”. After observations it was found that using 3 or 4 clusters yield
spreading diseases, farmers have to use fungicides, good segmentation results. GCH, LBP, CCV and CLBP
remove infected parts and burn them with a systematic were used for feature extraction. More accurate results
way. So initial disease identification is very important to could be taken using CLBP feature extraction technique.
prevent spreading diseases. However, this prevention K-means clustering was used for segmentation. Those
cannot be successful due to the lack of agricultural field segmented images were extracted to label each pixel in
officers. the image. SVM algorithm was used for training and
classification of fruit disease. Authors used apple as a test
Nowadays technology plays a vital role in all the case and evaluated the classification model for three
fields but till today traditional methodologies are used for types of apple diseases, which were apple rot, apple
agriculture in Sri Lanka. However the large scale of blotch and apple scab. The accuracy of this approach was
agricultural countries use technologies like MRI, X-ray achieved by up to 93%.
imaging for detecting the quality of the fruits. But these
technologies are costly for farmers to afford, occupy In paper [6], the author had used SVM classification
large space, users need to have the knowledge to use and for identifying and classifying the grape leaf diseases.
analyze the results. Grape leaf images were taken using a digital camera and
those were used to both training and testing the system.
However, today passion fruit disease identification is Collected images included the leaves infected by
done manually by experienced people, but due to so Powdery Mildew and Downy Mildew. Removing
many environmental changes and lack of resources for background noise and resizing to 300*300 PX to improve
getting information, the prediction is becoming tough. So the image quality were done under the image
the main purpose of this research is to develop the preprocessing. Gaussian filtering had been used to
classifier model using image processing, which will be remove noise in the image. K-means clustering was used
able to identify passion fruit diseases accurately. for segmenting an image into three groups. Features were
extracted based on both colour and texture for taking
2. Literature review accurate disease information. Finally, the classification
model was used to detect the leaf disease. LSVM was
Recently, many people have done researches for used in this research for the classification of leaf
detecting fruit and vegetable diseases using image diseases. This system could detect and classify the
processing and deep learning. According to the research examined disease successfully. The accuracy of this
paper [4], authors had used image processing technology system was 88.89%.
to identify the pomegranate diseases. Image
preprocessing was the first step of the methodology. In paper [7], authors had used image processing
Image resizing was done under the image preprocessing. technology for identifying the leaf diseases. First authors
Because digital camera had been used to capture the selected the plants, which were affected by the disease
images in this study. The size of those images was very and then took the snapshot of the diseased leaf. Contrast
large and take more time to process. So all images were enhancement and converting RGB to HIS was done
resized to 300 x 300 PX. Morphology, colour and CCV under the image preprocessing step. K-means clustering
features were used for feature extraction. K-means algorithm was used to cluster the object based on the
clustering technique was used for partitioning the training feature of leaf into k number of groups. SVM algorithm
dataset according to their features. After the clustering, had been used in this system for classification purpose.
SVM was used for classification to identify the image as SVM is a statistical learning-based solver. Finally, when
infected or non-infected. An intent search technique was entered a diseased leaf image to a system, the system was
able to detect the leaf disease successfully. and figure 4, which are the categories of passion fruits
0000000000000000 that are collected under the data acquisition.
1 0 0 0 0 0 100000
0 1 0 0 0 0 010000
0 0 1 0 0 0 001000
Figure 5. Image segmentation result images
The fourth phase of the methodology is feature Disease name Non- Scab Woodiness
extraction. Local Binary Pattern (LBP) mechanism was disease disease Virus
used for feature extraction. Segmented images were
taken from folders one by one and each image was No. of images 13 34 40
converted to grey-scale. Then a neighbourhood was (1st dataset)
selected for each pixel in grey-scale image surrounding No. of images 26 67 80
the center. An LBP value was calculated for this center (2nd dataset)
pixel and stored the output 2D array with the same width
and height as the taken image. No. of images 39 98 120
(3rd dataset)
Then calculating the LBP for center pixel was started
from top right neighbourhood pixel clockwise. The
results of binary values were stored in an 8-bit array,
which was converted to decimal later. These converted 3.5.2. Considering disease names and its stages
decimal values were stored in the output LBP 2D array.
A histogram for each image was computed using the Stages of the diseases are considered here. There are
output 2D array. For this implementation, the 3x3 pixel three types of stages, such as mild (I), moderate (II) and
severe (III). According to those stages, there are seven 3.6. Training & Testing
categories in this image sets. Those are non-disease,
mild-scab, moderate-scab, severe-scab, mild-woodiness, The last phase of the methodology is training and
moderate-woodiness and severe-woodiness. Table 5 testing the model. Support Vector Machine (SVM)
shows the labels, which were assigned for each category. algorithm was used for classification as mentioned in the
Table 6 shows the count of images, which were included methodology. A set of mathematical functions are used in
in each dataset. SVM algorithms, which are defined as kernels. The
function of the kernel is to take data as input and
Table 5. Labelling according to diseases' stage transform those data into the required form. SVM
Non- Scab Woodiness Mild Moderate Severe Label algorithms are used with different types of kernels.
disease disease virus Gaussian, sigmoid, hyperbolic tangent and polynomial
kernel are some of them. Among those kernels, the
1 0 0 0 0 0 100000
polynomial kernel is popular in image processing. So, the
0 1 0 1 0 0 010100
polynomial kernel was used for the SVM algorithm as
0 1 0 0 1 0 010010 the kernel in this study. Each CSV files were used for
0 1 0 0 0 1 010001 training the SVM separately. 70% of the data from each
0 0 1 1 0 0 001100 dataset was used for training purpose. Every training
0 0 1 0 1 0 001010 time, CSV file rows were shuffled for increasing the
accuracy. When considering the testing the model, the
0 0 1 0 0 1 001001
remaining 30% of data from each dataset were used. In
here, the model predicted the label for each image and
Table 6. Seven classes of datasets image counts those were evaluated using actual labels. The accuracy of
Disease Non- Scab Disease Woodiness these models can be defined as,
disease Virus
Stage I II III I II III
1st set 13 12 12 10 13 15 12
4. Experimental Results
2nd set 26 24 23 20 26 30 24
While labelling the features CSV file, three labels
rd
3 set 39 36 33 29 39 45 36 were used for identifying passion fruit disease according
to its name. These CSV files are called as “3 classes
dataset”. Another way is using seven labels for features
These image sets were preprocessed, segmented and CSV file to identify passion fruit diseases according to its
feature extracted separately. Four CSV files were stage. This CSV files are called as “7 classes dataset”.
received for every image set. Original image name and Every training and testing time, rows of CSV files were
label included into a one CSV file, which called a “label shuffled randomly for increasing the accuracy of the
file”. This file was used for giving labels for each row in model. Each CSV file was trained and tested in five times
feature extracted CSV file. Name column in feature and accuracy was taken. Average of those accuracies was
extracted and label files were matched. When similar taken as the accuracy of each model. Figure 7 shows the
names were found, a particular label in label file was total number of images, which were used to create each
given to label column in the feature extracted file. After dataset.
that, this dataset can be used for training and testing the
model. Using this image dataset, four types of CSV files were
created. Such as Grey, HSV, L*a*b and RGB, which are
A sample feature CSV file is shown in figure 6. Image the colour models that were used to evaluate in this
name, feature histogram values and labels are included in study.
this dataset.
348 image records are included in these CSV files and 4.2.1. First dataset results
labelled by 3 labels. Each CSV file was used five times
for getting accuracy and by using those accuracies, got 348 image records are included in these CSV files and
the average accuracy of the model. Those models labelled by 7 labels. Each CSV file was used five times
accuracies are discussed in table 7. for getting accuracy and by using those accuracies, got
the average accuracy of the model. Those models
Table 7. Accuracy of models using 3 classes of first accuracies are discussed in table 10.
dataset
Table 10. Accuracy of models using 7 classes of first
1st 2nd 3rd 4th 5th Average dataset
time time time time time Accuracy
Grey 76.9 75.0 68.3 71.2 69.2 72% 1st 2nd 3rd 4th 5th Average
HSV 67.3 75.9 69.2 72.1 69.2 70% time time time time time Accuracy
L*a*b 78.9 73.1 78.9 81.7 81.7 79% Grey 64.4 63.5 57.7 64.4 59.6 62%
RGB 70.2 69.2 75.0 63.5 71.2 70% HSV 63.5 57.7 55.8 52.9 56.7 57%
L*a*b 65.4 66.4 68.3 70.2 62.5 66%
RGB 63.5 63.5 59.6 70.2 61.5 64%
4.1.2. Second dataset results
692 image records are included in these CSV files and 4.2.2. Second dataset results
labelled by 3 labels. Each CSV file was used five times
for getting accuracy and by using those accuracies, got 692 image records are included in these CSV files and
the average accuracy of the model. Those models labelled by 7 labels. Each CSV file was used five times
accuracies are discussed in table 8. for getting accuracy and by using those accuracies, got
the average accuracy of the model. Those models
Table 8. Accuracy of models using 3 classes of accuracies are discussed in table 11.
second dataset
Table 11. Accuracy of models using 7 classes of
1st 2nd 3rd 4th 5th Average second dataset
time time time time time Accuracy
Grey 63.9 68.3 69.2 74.0 67.3 68% 1st 2nd 3rd 4th 5th Average
HSV time time time time time Accuracy
65.4 59.1 52.4 65.4 60.9 60%
L*a*b
Grey 53.4 54.8 49.5 51.4 56.7 53%
72.1 73.6 75.5 69.2 69.7 72%
RGB HSV 45.7 48.6 51.9 49.0 52.9 50%
71.2 67.3 67.3 67.3 73.6 69%
L*a*b 60.6 61.1 62.0 58.7 60.6 61%
RGB 61.5 59.1 58.2 58.7 52.9 58%
4.1.3. Third dataset results
1028 image records are included in these CSV files 4.2.3. Third dataset results
and labelled by 3 labels. Each CSV file was used five
times for getting accuracy and by using those accuracies, 1028 image records are included in these CSV files
got the average accuracy of the model. Those models and labelled by 7 labels. Each CSV file was used five
accuracies are discussed in table 9. times for getting accuracy and by using those accuracies,
got the average accuracy of the model. Those models
Table 9. Accuracy of models using 3 classes of third accuracies are discussed in table 12.
dataset
1st 2nd 3rd 4th 5th Average Table 12. Accuracy of models using 7 classes of third
time time time time time Accuracy dataset
Grey 58.8 59.4 65.9 61.7 60.7 61% 1st 2nd 3rd 4th 5th Average
HSV 56.1 58.4 56.5 54.9 57.4 56% time time time time time Accuracy
L*a*b 68.5 64.3 61.4 65.9 67.2 65% Grey 47.1 49.7 53.3 53.9 49.7 51%
RGB 58.8 64.3 60.4 58.1 59.4 60% HSV 47.4 40.3 41.2 43.8 47.1 44%
L*a*b 56.8 54.9 63.6 58.8 62.3 59%
RGB 47.4 44.2 45.8 51.9 47.4 47%
5. Conclusions
In this study, the support vector machine algorithm
was used for creating the models, which were built
according to passion fruit diseases and its stages. The
passion fruit diseases can be identified in the average
accuracy of 79% and its stage can be identified in the
average accuracy of 66%.
Figure 9 shows the accuracy comparison according When considered the 7 classes datasets, the highest
to 7 classes datasets. L*a*b gives the highest accuracies average accuracy of the model was received by the first
and HSV gives the lowest accuracies among the dataset, which also included 87 images. Those images
preprocessing colour models. When considering the grey were categorized as non-disease (13), mild-scab (12),
colour model, it takes the second place of the accuracy moderate-scab (12), severe-scab (10), mild-woodiness
comparison chart. The third place of accuracy (13), moderate-woodiness (15) and severe-woodiness
comparison chart is taken by the dataset, which images (12). A number of images in each category has been
were not used any preprocessed colour model. That is the mentioned in parenthesis. The model had 66% accuracy
RGB colour model. According to figure 9, the accuracy and it was received by the L*a*b colour model.
of each model decreases, when increasing the number of
images in each dataset.
6. References
[1] Sema, A. & Maiti, S. C. Status & Prospects of
Passion Fruit Industry in Northeast India.
Information bulletin, Central Institute of Horticulture,
Medziphema, Nagaland, 2009