A MASTER’S THESIS
BY
MEQUANNT KAHSAY
JUNE 2019
AASTU
2019
ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY
By
MEQUANNT KAHSAY
A Thesis Submitted as a Partial Fulfillment to the Requirements for the Award of the
Degree of Master of Science in Software Engineering
to
JUNE 2019
Declaration
I hereby declare that this thesis entitled “Classification of Wheat Leaf Septoria Disease
Using Image Processing and Machine Learning Techniques” was prepared by me, with
the guidance of my advisor. The work contained herein is my own except where explicitly
stated otherwise in the text, and that this work has not been submitted, in whole or in part,
for any other degree or professional qualification.
Mequannt Kahsay_______________________________________________
Witnessed by:
Approval Page
This is to certify that the thesis prepared by Mr. Mequannt Kahsay entitled
“Classification of Wheat Leaf Septoria Disease Using Image Processing and Machine
Learning Techniques” and submitted as a partial fulfillment for the award of the Degree
of Master of Science in Software Engineering complies with the regulations of the
university and meets the accepted standards with respect to originality, content and quality.
Abstract
Education, health and food security are the three main concerns of developing countries,
including Ethiopia, and agriculture is the most powerful factor in the growth of the
Ethiopian economy. Sustaining the population requires, at a minimum, a planned food
security program, and such programs depend on sufficient productivity of farming fields.
One way to make a field productive is careful attention to its elements, beginning with
growing healthy plants or crops. To achieve this, a farmer or an agriculture expert should
monitor and diagnose the field and make decisions accordingly. Farmers and agriculture
experts examine crops visually. However, this evaluation process is tedious, time
consuming, and less accurate, which can cause a high risk of loss later. Image processing
and machine learning have been extensively used in various disease diagnosis approaches.
They have been applied both to images captured by visible-light cameras and to images
from equipment that captures information at non-visible wavelengths, assisting experts in
selecting the right measure and treatment. In this research work, an image captured with a
digital camera is used as input and enhanced with various preprocessing techniques,
followed by a color-based segmentation method to separate the regions of interest; features
are then extracted using the Gray Level Co-occurrence Matrix. Classification of the input
image is performed at the final stage using four different supervised learning algorithms
to classify it into one of two classes, ‘healthy’ and ‘infected’. All the work is done using
Python and supporting libraries. Naïve Bayes, k-Nearest Neighbor, Support Vector
Machines and Random Forest are the classification algorithms taken for comparison.
Based on a confusion matrix evaluation, Random Forest was found to be the best, with a
classification accuracy of 98.7%.
Acknowledgments
I would like to forward my deepest appreciation and thanks to my advisor Sreenivasa Rao
Vuda (PhD) for his constructive guidance throughout the work.
I would also like to thank all my family, friends and colleagues for their personal
commitment and contribution to the success of the graduate program, especially my friend
Temesgen Tsegai for his every bit of support.
Table of Contents
Declaration
Approval Page
Abstract
Abbreviations and Acronyms
List of Tables
List of Figures
1. INTRODUCTION
1.1 Background
1.2 Motivation
1.3 Statement of the Problem
1.4 Objectives
1.4.1 General Objective
1.4.2 Specific Objectives
1.5 Methodology
1.5.1 Data Collection
1.5.2 Dataset Preparation
1.5.3 Image Data Processing
1.5.4 Image Classification
1.5.5 Performance Evaluation
1.6 Scope and Limitation
1.7 Significance
1.8 Organization of the Thesis
2. LITERATURE REVIEW AND RELATED WORK
2.1 Introduction
2.2 Wheat Overview
2.3 Wheat Disease Types
2.4 Diseases Affecting Wheat Leaf
2.5 Digital Image Processing
2.5.1 Low-Level Processing
2.5.2 Mid-Level Processing
2.5.3 High-Level Processing
2.6 Basic Image Processing Steps
2.6.1 Image Acquisition
2.6.2 Image Preprocessing
2.6.3 Image Segmentation
2.6.4 Morphological Process
2.6.5 Feature Extraction
2.7 Machine Learning
2.7.1 Supervised Learning
2.7.2 Unsupervised Learning
2.7.3 Semi-supervised Learning
2.7.4 Some Popular Classification Algorithms
2.8 Applications of Image Processing and Machine Learning in Agriculture
2.9 Related Work
3. PROPOSED SYSTEM
3.1 Introduction
3.2 System Architecture
3.3 Data Collection and Dataset Preparation
3.3.1 Data Collection
3.3.2 Dataset Preparation
3.4 Image Processing Tasks
3.4.1 Image Preprocessing
3.4.2 Color-based Segmentation
3.4.3 GLCM Features Extraction
3.5 Classification Algorithms
3.5.1 Naïve Bayes Classifier
3.5.2 k-Nearest Neighbor Classifier
3.5.3 Support Vector Machines Classifier
3.5.6 Random Forest Classifier
3.6 Performance Evaluation Metrics
3.7 Software Tools and Libraries
4. EXPERIMENTAL RESULTS AND DISCUSSION
4.1 Setting up Development Environment
4.2 Training Phase
4.3 Testing Phase
4.4 Performance Evaluation Results
5. CONCLUSION AND RECOMMENDATION
5.1 Conclusion
5.2 Recommendation
References
Abbreviations and Acronyms
ADLI Agricultural Development Led Industrialization
ANN Artificial Neural Network
CV Computer Vision
DIP Digital Image Processing
EIAR Ethiopian Institute of Agricultural Research
FDRE Federal Democratic Republic of Ethiopia
GLCM Gray Level Co-occurrence Matrix
GTP Growth and Transformation Plan
HRC Holetta Research Center
HSV Hue, Saturation, Value
k-NN k-Nearest Neighbors
ML Machine Learning
NB Naïve Bayes
SVM Support Vector Machine
RF Random Forest
RGB Red, Green, Blue
List of Tables
Table 2.1: Summary of related work
Table 4.1: Confusion matrix result for different classifiers
Table 4.2: Summary of evaluation result
List of Figures
Figure 2.1: Wheat leaf diseases (Top – left)
Figure 2.2: Low level processing
Figure 2.3: Mid-level processing
Figure 2.4: High-level processing
Figure 2.5: Basic steps in image processing
Figure 3.1: Proposed system architecture
Figure 3.2: Preprocessing tasks
Figure 3.3: Segmentation tasks
Figure 3.4: Feature extraction tasks
Figure 3.5: Gray Level Co-occurrence Matrix (GLCM) calculation
Figure 3.6: ML Classification tasks
Figure 3.7: Evaluation metric samples
Figure 3.8: Development tools
Figure 4.1: Different color conversions
Figure 4.2: Image dilation result
Figure 4.3: Segmentation results
Figure 4.4: Summarized output of tasks
Figure 4.5: Classified image based on different classifiers prediction
Figure 4.6: Confusion matrix distribution result
Figure 4.7: Accuracy, Sensitivity, and Specificity comparison
CHAPTER ONE
INTRODUCTION
1.1 Background
Successful implementation of education, health and agriculture programs is a main concern
of developing countries like Ethiopia, both for the sustainable development of the countries
and to ease the lives of their citizens [1]. The Ethiopian government’s Agricultural
Development Led Industrialization (ADLI) is a central pillar of economic policy in the
recently completed plan for accelerated and sustained development to end poverty [2]. In
the agricultural sector, Ethiopia has a comprehensive and consistent set of policies and
strategies, which reflects the importance of the sector in the nation’s development goals.
From this fact we can see that the Ethiopian economy is highly dependent on the success of
the agriculture sector. Additionally, according to [3], smallholder agriculture is the most
important sector of the Ethiopian economy. More than 80% of the population lives in rural
areas, and their main source of income is agriculture. The agricultural sector accounts for
about 45% of GDP, almost 90% of exports, and 85% of employment. Food security
nonetheless remains a key challenge. The government, through the allocation of more than
15% of the total budget, along with development partners, has demonstrated a strong
commitment to the sector, although a significant portion of this directly targets the
relatively large and chronically food-insecure population. While such a strategy is expected
to strengthen the livelihoods of food-insecure households, long-term food security cannot
be achieved through exclusive attention to vulnerability. Success will require
complementary efforts to enhance agricultural growth, and thereby reduce food prices and
diversify rural livelihoods. To achieve this, good-quality and planned productivity of
farming fields plays an essential role. One of the best ways to keep a farming field
productive is proper care of crop health. Unless proper care is taken in the field, crops can
suffer serious effects that reduce product quality, quantity or productivity and ultimately
put food security in question.
Over the past few years, Ethiopia has planned and launched different development
programs targeting the transformation and industrialization of the existing agriculture-based
economy. To this end, the Government has already implemented the first Growth and
Transformation Plan (GTP I) and is on the way to implementing the second, GTP II. One of
the main goals of GTP II has been the conversion of the agriculture-led system to an
industry-led system. Unfortunately, most of the current industries are based on manual
systems, with very limited, partially automated components, especially in the agricultural
sector [4]. For this reason, automation of agricultural practice is in high demand for rapid
growth.
A plant disease is any abnormal condition that damages a plant and reduces its quality,
productivity, or usefulness to consumers (humans and/or animals). Biology categorizes
plant diseases into two broad classes: infectious (biotic) and non-infectious (abiotic).
Biotic diseases are caused by living parasitic organisms that take their nutrition from the
plants. These organisms can be fungi, bacteria, viruses or other parasites.
Abiotic diseases are caused by environmental factors. Some examples of these factors are
listed below.
• Nutrition,
• Moisture,
• Temperature,
• Meteorological condition,
• Toxic chemicals and others.
1.2 Motivation
As agriculture is the backbone of the Ethiopian economy, every citizen should participate
in the successful implementation of programs in the sector. I believe that not only farmers
or agricultural experts but everyone should contribute, so as a software engineering
researcher I have to relate my field of study to the major concern of my country. Today’s
technology advancements have many advantages for agricultural practices and
implementations. Artificial intelligence is one of these technologies, and image processing
and machine learning are emerging technologies that can benefit Ethiopian agriculture,
especially in crop disease detection. Wheat is one of the most important cereal crops in
Ethiopia and is produced across a large area of the country. Production of the crop is
constrained by several infectious diseases, including rust and Septoria diseases, which are
the major bottlenecks of wheat production in Ethiopia [5].
Significant progress in the area of artificial intelligence and image processing has led to
a good number of real-world applications in industrial processes, business
implementation, medical science, biological science, material science and the like.
Developments in certain disciplines of computer science, such as image processing,
machine learning, pattern recognition and deep learning, promise the technological support
required to tackle the various issues in computer vision. Image processing technology has
been, and is being, applied in many areas, and agriculture is one of them. Image processing
is applied for weed detection. Weeds are plants growing in the wrong place on a farm; they
compete with the crop for water, light, nutrients and space, reducing yield and the effective
use of machinery. Because weed control is important from an agricultural point of view,
many researchers have developed various methods based on image processing. Weed
detection techniques use algorithms based on edge detection, color detection, and
classification. Image processing is also applied for fruit grading: the need for accurate
sorting of fruits, foods and other agricultural products arises because of increased
expectations for food quality and safety standards. Image processing in the agriculture and
food industries has been applied in the detection, recognition and classification of different
diseases, including defects such as dark spots or cracks on crops [7].
This study will try to explore and address the following research questions:
• What possible algorithms are available for leaf disease classification?
• What is the performance of the models for classification of wheat leaf septoria
disease?
1.4 Objectives
1.4.1 General Objective
The main objective of this research work is to design and develop a model for classification
of wheat leaf septoria diseases using image processing and machine learning techniques.
1.5 Methodology
The research work is to be carried out in the following major phases. Related literature
will also be reviewed during the research work in order to better understand the actual
situation of the problem and the technology available, and finally to arrive at a good
solution.
work, local images from nearby agricultural fields and from the Internet are to be used for
model training and model testing purposes; these leaf image files will first be sorted
manually into healthy and infected categories. The dataset will be prepared based on a
70/30 strategy, in which 70% of the dataset is used for training and the remaining 30% is
used for testing the model.
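The 70/30 strategy can be illustrated with a minimal sketch that uses only the Python standard library. The file names, the labels attached to them, the helper name `split_dataset` and the fixed random seed are hypothetical and serve only to make the example reproducible; they are not taken from the actual dataset.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=42):
    """Shuffle a copy of items and split it into training and testing lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labelled file names (assumption, for illustration only).
samples = [
    (f"leaf_{i:03d}.jpg", "healthy" if i % 2 == 0 else "infected")
    for i in range(100)
]

train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))
```

Shuffling before cutting avoids a split in which one class ends up mostly in the training or mostly in the testing portion when the files are stored class by class.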
The segmentation stage is another essential part of the system: it helps to focus only on
the desired region of the original leaf image. The acquired image may contain unnecessary
parts captured by the camera, so segmentation is done to obtain those portions of the
preprocessed image whose pixels have certain values (most of the time these are expected
to be signs of disease). Several types of image segmentation techniques exist in the area of
image processing and computer vision, each with its own advantages and disadvantages
depending on the problem it is applied to, i.e., the image to be segmented. Thresholding,
Fuzzy C-Means, Watershed and K-means clustering, among many others, are examples.
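As an illustration of the simplest technique in this list, global thresholding, the following sketch keeps only the pixels whose gray value exceeds a chosen threshold. The 4x4 image, the threshold value of 128 and the function names are assumptions made for the example; this is not the color-based segmentation used in the proposed system.

```python
def threshold_mask(gray, threshold):
    """Return a binary mask: 1 where the pixel value exceeds threshold, else 0."""
    return [[1 if px > threshold else 0 for px in row] for row in gray]


def apply_mask(gray, mask):
    """Keep pixel values where the mask is 1 and set all other pixels to 0."""
    return [
        [px if m else 0 for px, m in zip(row, mask_row)]
        for row, mask_row in zip(gray, mask)
    ]


# Toy 4x4 grayscale image (values 0-255); bright pixels stand in for a lesion.
gray = [
    [10, 12, 200, 210],
    [11, 190, 220, 15],
    [9, 14, 205, 13],
    [8, 10, 12, 11],
]

mask = threshold_mask(gray, threshold=128)
segmented = apply_mask(gray, mask)
for row in mask:
    print(row)
```

The mask marks the region of interest, and `segmented` retains the original intensities only inside that region, which is the form later stages would work on.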
reason, additional morphological processes are required to obtain a clear and distinct
region of disease. In the implementation, grayscale dilation, erosion, or a combination of
them (i.e., opening or closing) can be used depending on the image being processed. Feature
extraction and selection from an image play a critical role in the performance of any
subsequent classifier. After segmentation and morphological processing, a set of features is
extracted from the resulting image. In the feature extraction stage, every image is assigned
a feature vector that identifies it and is used to distinguish it from other images. Higher
classifier accuracy can be achieved by selecting good feature extraction techniques. Since
an image is a large data set, using all the pixel values in classification can create
computational overhead; as a result, features with distinguishing properties should be
extracted and then selected.
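To show how an image can be reduced to a short feature vector, the sketch below builds a Gray Level Co-occurrence Matrix for a toy image and derives two common texture features, contrast and energy. The 4x4 image with four gray levels, the horizontal offset of one pixel, the non-symmetric counting and the choice of these two features are assumptions for illustration, not the exact configuration of this work.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) for pixel pairs offset by (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def normalize(matrix):
    """Turn raw counts into joint probabilities that sum to 1."""
    total = sum(sum(row) for row in matrix)
    return [[value / total for value in row] for row in matrix]


def contrast(p):
    """Weight each probability by the squared gray-level difference."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Sum of squared probabilities; high for uniform textures."""
    return sum(v * v for row in p for v in row)


# Toy image with gray levels 0..3 (assumption, for illustration only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

counts = glcm(image, levels=4)
p = normalize(counts)
feature_vector = [contrast(p), energy(p)]
print(counts)
print(feature_vector)
```

Two numbers now stand in for sixteen pixel values; on real images the reduction is far larger, which is the computational saving described above.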
$$\text{Sensitivity (\%)} = \frac{TP}{TP + FN} \times 100$$
$$\text{Specificity (\%)} = \frac{TN}{TN + FP} \times 100$$
$$\text{Accuracy (\%)} = \frac{TP + TN}{TP + FP + TN + FN} \times 100$$
FP – False Positive
FN – False Negative
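The three measures can be computed directly from the four confusion matrix counts, as in the following minimal sketch. TP and TN are taken here to mean true positives and true negatives, the function name `evaluate` is hypothetical, and the counts in the example are invented for illustration; they are not results of this work.

```python
def evaluate(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Invented counts (assumption): 50 infected and 50 healthy test images.
sens, spec, acc = evaluate(tp=45, tn=40, fp=10, fn=5)
print(f"Sensitivity: {sens:.1f}%  Specificity: {spec:.1f}%  Accuracy: {acc:.1f}%")
```

Reporting sensitivity and specificity alongside accuracy shows whether a classifier errs more on infected or on healthy leaves, which accuracy alone would hide.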
1.6 Scope and Limitation
As detection and classification of leaf disease is a very vast area of study, it is important
to set a boundary on the tasks covered in order to obtain better outcomes. As a result, the
research work here is limited to classifying an input image into one of two classes, i.e.,
Healthy (uninfected leaf) and Infected (leaf infected with septoria disease). The test case is
the crop ‘Wheat’, the disease ‘Septoria tritici’, and the crop part ‘leaf’; the stem, head and
other parts of the crop are not considered.
1.7 Significance
In this research work, a wheat leaf disease classification model is developed using image
processing and machine learning techniques. It might seem important only for farmers and
some agriculture experts, but in fact it can help every individual who benefits from the
agricultural sector, in other words almost everybody who consumes agricultural products
in daily life. Finally, as research, this work can add some input to the area of applications
of image processing and artificial intelligence in Ethiopian agriculture, and can also be a
starting point for other researchers and practitioners.
CHAPTER TWO
LITERATURE REVIEW AND RELATED WORK
2.1 Introduction
In this chapter, the first four sections give an overview of wheat and wheat disease types.
Sections 2.5 and 2.6 then cover what image processing is and what its basic steps are,
respectively, and the following section introduces the background of machine learning and
the different techniques in the domain. Section 2.8 then lists some applications of image
processing and machine learning in agriculture. At the end of the chapter, Section 2.9
reviews related previous work done by different researchers.
Different parts of wheat can be affected by different types of diseases. Here, the names of
known diseases that affect the lower stems and roots and diseases that affect the heads and
grain are listed. Then, diseases that affect the leaves are detailed, since the focus of this
research is on wheat leaf disease in general and on the disease named Septoria in particular.
Septoria tritici blotch: this fungal disease causes tan, elongated lesions on wheat leaves.
Lesions may have a brown body margin with yellow, but the degree of yellowing varies
among varieties. Rows of dark, reproductive structures produced by the fungus are key
diagnostic features and can often be seen without magnification. This disease is also known
as speckled leaf blotch. Conditions for disease development include temperatures between
59 and 77 °F and periods of rainy or humid weather that last for more than one day. Disease
outbreaks occur more prevalently on lower leaves in the early spring after cool, wet
conditions. The pathogen will start to decline as temperatures increase [10].
Bacterial streak: early symptoms of bacterial streak include small, water-soaked areas
between leaf veins. These water-soaked areas become tan streaks within a few days. When
the disease is severe, streaks may merge to form large, irregular areas of dead tissue. When
droplets are present, the bacteria causing this disease may discharge from the lesions and
dry to form a clear, thin film. This film flakes easily and is visible when the leaf is viewed
from different angles [9].
Barley yellow dwarf: this viral disease causes wheat leaves to have a yellow or red
discoloration. The discoloration is often more intense near the tip of affected leaves, giving
them a flame-like appearance. Barley yellow dwarf often occurs in patches within a field.
The size and distribution of these patches depends on the feeding activity of aphids, which
spread barley yellow dwarf virus. Infected plants within these patches may be shorter than
neighboring healthy plants.
Leaf rust: small, orangish-brown lesions are key features of leaf rust infections. These
blister-like lesions are most common on leaves but can occur on the leaf sheath, which
extends from the base of the leaf blade to the stem node. Lesions caused by leaf rust are
normally smaller, rounder, and cause less tearing of the leaf tissue than those caused by
stem rust. Conditions for disease development include temperatures between 64 and 77 °F
with high humidity or moisture. After spores land on leaves, infection is completed in 6 to
8 hours and disease symptoms can develop within 7 days [10].
Tan spot: the key diagnostic feature of tan spot is tan lesions with a yellow margin. Mature
tan spot lesions often have a dark area in the center. Lesions may merge as they expand,
resulting in large sections of diseased leaf tissue. The fungus that causes tan spot survives
in the remains of previous wheat crops and produces small, black reproductive structures
in the spring.
Powdery mildew: signs of powdery mildew include white fungal growths on leaves and
leaf sheaths. Fungal growth is largely limited to outer plant surfaces and can be easily
wiped away by rubbing a finger across affected areas. Mature areas of fungal growth may
have dark, reproductive structures mixed with the white, cottony growth of the fungus.
Conditions for disease development are optimal between 59 and 72 °F with high humidity,
and are more prevalent in dense foliage and areas of heavy fertilization. The pathogen
survives on volunteer wheat, and in the spring, when wheat growth resumes, powdery
mildew symptoms can appear first on the older leaves [9] [10].
Wheat soilborne mosaic: winter wheat infected by wheat soilborne mosaic develops a pale-
yellow discoloration shortly after breaking dormancy in the spring. The incidence of wheat
soilborne mosaic is often greater in low areas of a field, where moist soil conditions favor
growth of the protozoa that spread this viral disease. Leaves of infected plants often have
a mosaic pattern of dark green blotches on a pale-yellow background. Symptoms normally
fade when warm weather slows the viral activity within infected plants.
Wheat streak mosaic: leaves of plants infected with wheat streak mosaic have a bright
yellow streaking. Symptoms are often most severe near the leaf tip. The virus that causes
wheat streak mosaic survives in volunteer wheat and is spread by wheat curl mites. The
disease is often most severe in areas of a field that are closest to these sources of the disease
and mites. Commonly, plants infected with wheat streak mosaic also are infected with High
Plains disease and Triticum mosaic. The symptoms of these diseases are nearly identical.
Disease severity is greater when plants are infected by more than one virus.
Wheat spindle streak mosaic: causes a yellow discoloration to wheat seedlings. This yellow
discoloration is often most intense in the wettest areas of a field. Leaves of infected plants
have long, yellow streaks that are slightly wider in the middle than at their ends. Symptoms
are similar to wheat soilborne mosaic, and plants often are infected with both diseases.
Symptoms fade when higher temperatures reduce viral activity within infected plants.
Stripe rust: causes yellow, blister-like lesions that are arranged in stripes. The disease is
most common on leaves, but head tissue also can develop symptoms when disease is
severe. This disease is sometimes referred to as yellow rust. Conditions for disease
development are optimal at 50 to 64 °F with intermittent rain or dew. High levels of
disease can occur in years with cool, wet springs, mild winters, and cool summers, which
allow spores to survive from season to season. Stripe rust can overwinter on leaf tissue,
volunteer wheat, and other grass hosts, as it can survive temperatures as low as 23 °F. The
spores rapidly decline at temperatures above 59 °F.
Stem rust: causes blister-like lesions on leaves, leaf sheaths, and stems. Infection of glumes
and awns is also possible. The reddish-brown spores of the fungus cause considerable
tearing as they burst through the outer layers of the plant tissues. Mature stem rust lesions
are more elongated than those of leaf rust. Stem rust is typically most severe on later-
maturing wheat varieties.
Stagonospora leaf blotch: the lesions of Stagonospora leaf blotch are normally brown or
tan, surrounded by a thin, yellow halo. Lesions caused by Stagonospora leaf blotch are
more irregular in shape and often have a darker color than those of tan spot. The presence
of small, honey-colored fungal reproductive structures is diagnostic for Stagonospora
nodorum blotch; however, these reproductive structures are only visible with considerable
magnification. This disease also may affect heads late in the growing season. Conditions
for disease development are more prevalent in dense foliage and areas of heavy
fertilization. No-till and minimum tillage increase the risk of occurrence of disease in
continuous or short-rotation wheat production. Disease outbreaks are promoted by wet,
warm weather. The optimal temperature for symptom development is between 68 and 80 °F [9]
[10].
Figure 2.1 illustrates different wheat diseases that affect the leaf. The image in the
top-left corner shows a wheat leaf infected with septoria, and the image to its right shows
a wheat leaf infected with powdery mildew. The remaining images follow from left to right
and from top to bottom, in the same order as the names in the caption.
Figure 2.1: Wheat leaf diseases (from top left): Septoria, Powdery mildew, Yellow dwarf,
Soilborne mosaic, Leaf rust, Stem rust, Tan spot, Stripe rust, Stagonospora, Bacterial
streak, Spindle streak mosaic, Streak mosaic
2.5.1 Low-Level Processing
Low-level image processing involves operations that are considered primitive, such as
image preprocessing to reduce noise, image sharpening, contrast enhancement, and
image resizing. Here both the input and the output of the process are images, as Figure 2.1
depicts.
[Figure: Original Image → Preprocessing Function (contrasting, noise removal, resizing, sharpening) → Processed Image]
[Figure: Processed Image → Image Segmentation and Extraction → Features / Attributes]
[Figure: Image Features → Recognition and Interpretation → Extracted Information]
2.6 Basic Image Processing Steps
The successful implementation of an application using digital image processing involves
the following fundamental steps in most works. Each fundamental step comprises several
sub-steps and techniques. The stages listed in Figure 2.5 are considered to be the major
ones; the sub-stages involved in the actual work are explained next. The steps are chained
so that the output of each step is the input of the step that follows it.
[Figure: overall system steps: Image Acquisition → Image Pre-processing → ROI Segmentation → Feature Extraction → Feature Selection → Classification → Performance Evaluation]
acquisition. The acquired image is completely unprocessed from the point of view of the
intended image processing application. The original image is the result of whatever
hardware was used to generate it, and in some fields it is very important to have a
consistent baseline from which to work. The actual hardware device can be anything
from a desktop scanner to a massive optical telescope.
A raw, unprocessed image is acquired from different sources depending on the
application or type of process. For instance, an image can be acquired with a digital
camera, cell phone camera or web camera for applications that capture an environmental
scene, such as face recognition, object or animal identification, plant disease detection
and many others. Images can also come from medical imaging equipment such as X-ray or
MRI machines for medical diagnosis applications like tumor detection and cancer
detection, or from satellites for geographic information handling and related applications.
To simplify computation, an image can be resized to a lightweight version of the original
by reducing the number of pixels; this task is called image resizing. It is recommended
only when resizing does not affect the meaning or appearance of the original image.
Image conversion is one part of preprocessing and can include converting an image
between different formats or color spaces for ease of computation. For example, color
images are converted into grayscale or black-and-white images because images with fewer
color ranges are easier to process than those with many color possibilities, such as RGB.
Image conversion is not always appropriate, however; it can be skipped when the
conversion would affect the meaning of an image. For example, color images provide more
detailed information than gray images. Keeping the color image unconverted, or converting
it back to a color image, is quite important for the detection of leaf diseases.
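The resizing and conversion steps described above can be sketched in a few lines. This is only an illustration in plain Python: the function names, the nearest-neighbour sampling and the luminance weights 0.299, 0.587 and 0.114 are assumptions made for this sketch, not choices taken from this work.

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (rows of (R, G, B) tuples, 0-255) to gray levels.

    The weights are the common luminance weights; they are an assumption
    of this sketch.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def resize_nearest(image, new_height, new_width):
    """Resize a 2-D image by nearest-neighbour sampling."""
    old_height, old_width = len(image), len(image[0])
    return [[image[(i * old_height) // new_height][(j * old_width) // new_width]
             for j in range(new_width)]
            for i in range(new_height)]


# Tiny 2 x 2 example image: red, green, blue and white pixels.
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(rgb)
smaller = resize_nearest(gray, 1, 1)   # keeps only the top-left pixel
```

Note that the grayscale version discards exactly the color information that the paragraph above identifies as important for leaf disease detection, so a pipeline would typically keep the RGB image alongside it.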
Several image segmentation techniques exist in the area of image processing and
computer vision, each with its own advantages and disadvantages depending on the
problem it is applied to, i.e., the image to be segmented [14]. All of these techniques
derive from two basic approaches to segmentation, i.e., region-based or edge-based
approaches. Every technique can be applied to different images to perform the
required segmentation. These techniques can also be classified into three categories
[15]. Some of the most widely used techniques are discussed next.
The popular techniques used for image segmentation are thresholding, edge-based
detection, region-based, clustering-based, watershed-based and artificial neural
network-based techniques. They differ from each other in the method they use for
segmentation.
Global Thresholding: This is done by using an appropriate threshold value T, which is
constant for the whole image. On the basis of T, the output image q(x,y) can be
obtained from the input image p(x,y) as:
\[ q(x,y) = \begin{cases} 1, & p(x,y) > T \\ 0, & p(x,y) \le T \end{cases} \]
Variable Thresholding: In this type of thresholding, the value of T can vary over the image.
This can further be of two types:
• Local Threshold: The value of T depends upon the neighborhood of x and y.
• Adaptive Threshold: The value of T is a function of x and y.
Multiple Thresholding: In this type of thresholding, there are multiple threshold values such
as T0 and T1. The threshold values can be computed with the help of the peaks of the
image histogram. Using these thresholds, each pixel of the output image is assigned one of
several levels according to the intensity interval, bounded by T0 and T1, in which the
corresponding input pixel falls.
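The three kinds of thresholding can be illustrated with a short plain-Python sketch. The function names, the use of a neighbourhood mean as the local threshold, and the output labels 0, 1 and 2 are assumptions made for illustration; other conventions are equally valid.

```python
def global_threshold(image, t):
    """Binary output: 1 where the pixel exceeds the single threshold t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]


def local_mean_threshold(image, radius=1, offset=0):
    """Variable threshold: T at (x, y) is the mean of the surrounding window."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            t = sum(window) / len(window) - offset
            row.append(1 if image[y][x] > t else 0)
        out.append(row)
    return out


def multi_threshold(image, t0, t1):
    """Three output levels (0, 1, 2) from two thresholds, assuming t0 < t1."""
    return [[0 if p <= t0 else (1 if p <= t1 else 2) for p in row] for row in image]


image = [[10, 200],
         [90, 150]]
binary = global_threshold(image, 100)
levels = multi_threshold(image, 50, 160)
```

In practice the threshold values would be derived from the image histogram rather than fixed by hand as in this example.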
Hue is an attribute of light that distinguishes one color from another, for example red
from green or yellow. Saturation describes the amount of whiteness of a light source in a
given image, while Intensity or Value is a measure of the brightness of a given image.
The mathematical formulas that convert the RGB color space to HSI are given as follows.
\[ H = \arccos\left\{ \frac{[(R - G) + (R - B)]/2}{[(R - G)^2 + (R - B)(G - B)]^{1/2}} \right\} \]
\[ S = 1 - \frac{3}{R + G + B}\,\min[R, G, B] \]
\[ V = \frac{1}{3}(R + G + B) \]
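As a hedged illustration, the conversion can be written for a single pixel as follows. The sketch assumes RGB components scaled to [0, 1], uses the standard denominator term (R − B)(G − B), and adds the usual rule that the hue is 360° minus the arccos angle when B > G; the function name and the handling of gray and black pixels are choices made for this sketch.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H in degrees, S, I)."""
    total = r + g + b
    intensity = total / 3.0
    if total == 0:
        return 0.0, 0.0, 0.0                 # black: hue and saturation set to 0
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        return 0.0, saturation, intensity    # gray pixel: hue undefined, set to 0
    # Clamp the ratio to guard against rounding slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, numerator / denominator))
    theta = math.degrees(math.acos(ratio))
    hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity


hue, saturation, intensity = rgb_to_hsi(0.0, 1.0, 0.0)   # pure green pixel
```

Pure red, green and blue map to hues of approximately 0°, 120° and 240° respectively, which is a convenient sanity check for any implementation.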
The watershed-based methods use the concept of topographic interpretation. The image
intensity is viewed as a surface of basins, each with a hole at its minimum from which
water spills. When the water reaches the border of a basin, adjacent basins would merge.
To maintain separation between basins, dams are required, and these dams form the
borders of the segmented regions. The dams are constructed using dilation. The watershed
methods consider the gradient of the image as a topographic surface. Pixels that have a
higher gradient are represented as continuous boundaries [15].
Edge detection techniques are well-developed image processing techniques in their own
right. Edge-based segmentation methods rely on rapid changes of intensity value in an
image, because a single intensity value does not provide good information about edges.
Edge detection techniques locate edges where either the first derivative of intensity is
greater than a particular threshold or the second derivative has zero crossings. In
edge-based segmentation, the edges are first detected and then connected together to
form the object boundaries that segment the required regions. The two basic edge-based
segmentation methods are gray histogram and gradient-based methods. To detect the
edges, one of the basic edge detection operators, such as the Sobel, Canny or Roberts
operator, can be used. The result of these methods is basically a binary image. These are
structural techniques based on discontinuity detection.
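The first-derivative criterion can be illustrated with the Sobel operator. The following plain-Python sketch computes the gradient magnitude at every interior pixel and marks it as an edge when the magnitude exceeds a threshold; the function name, the threshold value and the decision to leave border pixels at 0 are assumptions of the sketch.

```python
import math

# Standard 3 x 3 Sobel kernels for the horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(image, threshold):
    """Return a binary edge map; border pixels are left as 0."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if math.hypot(gx, gy) > threshold:
                edges[y][x] = 1
    return edges


# A vertical step edge: dark left half, bright right half.
step = [[0, 0, 100, 100] for _ in range(4)]
edge_map = sobel_edges(step, threshold=100)
```

The output is a binary image, as the text notes; linking the detected edge pixels into closed object boundaries is a separate step that this sketch does not attempt.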
2.6.3.5 Segmentation by Region
Region-based segmentation methods segment the image into various regions that have
similar characteristics. There are two basic techniques based on this method [12].
Clustering-based techniques segment the image into clusters whose pixels have similar
characteristics. Data clustering divides the data elements into clusters so that elements
of the same cluster are more similar to each other than to elements of other clusters.
There are two basic categories of clustering methods: the hierarchical method and the
partition-based method. Hierarchical methods are based on the concept of trees: the root
of the tree represents the whole database and the internal nodes represent the clusters.
Partition-based methods, on the other hand, use iterative optimization to minimize an
objective function. Within these two categories there are several algorithms for finding
clusters. There are two basic types of clustering [15].
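k-means is a typical partition-based method: it alternates between assigning each element to the nearest cluster centre and recomputing the centres, which iteratively reduces the within-cluster squared distance. The sketch below clusters scalar pixel intensities in plain Python. The function name and the deterministic, evenly spaced initial centres are assumptions made so that the example is reproducible; they are not prescribed by the text.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values (e.g. pixel intensities) into k groups.

    Returns (centres, labels). Centres start evenly spaced between the
    minimum and maximum value, a simple choice made for this sketch.
    """
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: index of the nearest centre for every value.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break                      # converged: no centre moved
        centres = new_centres
    return centres, labels


# Flattened intensities of a tiny image: a dark group and a bright group.
pixels = [10, 12, 11, 200, 205, 198]
centres, labels = kmeans_1d(pixels, k=2)
```

For a color image the same two steps apply with pixels treated as vectors (for example in RGB or HSI) and a Euclidean distance in place of the absolute difference.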
The artificial neural network-based segmentation methods simulate the learning strategies
of the human brain for the purpose of decision making. Nowadays this method is mostly
used for the segmentation of medical images, where it separates the required image from
the background. A neural network is made of a large number of connected nodes, and each
connection has a particular weight. The method involves two basic steps: extracting
features and segmentation by the neural network [15].
2.6.4 Morphological Process
Morphological processing is an image post-processing technique based on the shape of
the object to be processed. It is normally performed on binary images. After segmentation,
some unnecessary parts may still exist or some necessary parts may have been removed.
The unwanted parts need to be removed and the necessary parts need to be recovered.
For this reason, additional morphological processing is required to obtain a clear and
distinct region of interest. In the implementation, grayscale dilation, erosion or a
combination of them, i.e., opening or closing, can be used depending on the image under
process.
This process needs two inputs: one is the original image, and the other is the
structuring element or kernel, which decides the nature of the operation [17].
In addition to the separate operations above, there are combined techniques called opening
and closing, which apply erosion and dilation one after the other. Opening is erosion
followed by dilation and is useful for removing noise. Closing is the reverse of opening,
dilation followed by erosion, and is useful for filling small holes inside the segmented
region or small black points on the region [17].
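A minimal plain-Python sketch of the four operations on a binary image follows. It assumes a square structuring element of odd size, and it treats pixels outside the image as 1 for erosion and 0 for dilation so that the image border itself is not altered; the function names are chosen for this sketch.

```python
def _apply(image, size, reducer, pad_value):
    """Slide a size x size square structuring element over a binary image."""
    r = size // 2
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[j][i] if 0 <= j < height and 0 <= i < width else pad_value
                      for j in range(y - r, y + r + 1)
                      for i in range(x - r, x + r + 1)]
            row.append(reducer(window))
        out.append(row)
    return out


def erode(image, size=3):
    """A pixel stays 1 only if every pixel under the element is 1."""
    return _apply(image, size, min, 1)


def dilate(image, size=3):
    """A pixel becomes 1 if any pixel under the element is 1."""
    return _apply(image, size, max, 0)


def opening(image, size=3):
    """Erosion followed by dilation: removes small isolated foreground noise."""
    return dilate(erode(image, size), size)


def closing(image, size=3):
    """Dilation followed by erosion: fills small holes in the foreground."""
    return erode(dilate(image, size), size)


# A single noisy foreground pixel disappears after opening.
noise = [[0] * 5 for _ in range(5)]
noise[2][2] = 1
cleaned = opening(noise)
```

The size and shape of the structuring element decide how large a speck is removed or how large a hole is filled, which is why the text describes it as deciding the nature of the operation.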
There are two main tasks involved in machine learning: learning (training) and
prediction. The system is given a set of examples called training data. The primary
goal is to automatically acquire an effective and accurate model from the training data.
The training data provides the domain knowledge, i.e., the characteristics of the domain
from which the examples are drawn. This is a typical task for inductive learning and is
usually called concept learning or learning from examples. The larger the amount of
training data, the better the model will usually be. The second phase of machine learning
is prediction, in which a set of inputs is mapped to the corresponding target values. The
main challenge of machine learning is to create a model with good prediction performance
on the test data, i.e., a model that generalizes well to unknown data.
Supervised learning problems can be further grouped into regression and classification
problems. Both aim to construct a succinct model that can predict the value of the
dependent attribute from the other attributes. The difference between the two tasks is
that the dependent attribute is numerical for regression and categorical for
classification.
2.7.1.1 Classification
A classification problem is one in which the output variable is a category, such as “red”
or “blue”, or “disease” and “no disease”. A classification model attempts to draw some
conclusion from observed values. Given one or more inputs, a classification model will try
to predict the value of one or more outcomes.
2.7.1.2 Regression
A regression problem is one in which the output variable is a real or continuous value,
such as “salary” or “weight”. Many different models can be used; the simplest is linear
regression, which tries to fit the data with the hyperplane that best fits the points.
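With a single input variable the hyperplane reduces to a straight line, and the least-squares fit has a closed form. The sketch below is an illustration only; the function name and the sample values are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
prediction = slope * 4 + intercept   # predicted value for a new input x = 4
```

The output is a continuous number, in contrast to the categorical output of a classifier.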
Clustering is the task of dividing the population or data points into a number of groups
such that data points in the same group are more similar to each other than to those in
other groups. In simple words, the aim is to segregate groups with similar traits and
assign them to clusters.
A hierarchical clustering method produces a classification in which small clusters of very
similar data points are nested within larger clusters of less closely related data points.
Hierarchical agglomerative methods generate a classification in a bottom-up manner, by a
series of agglomerations in which small clusters, initially containing individual data
points, are fused together to form progressively larger clusters.
typically small. If k = 1, then the sample is simply assigned to the class of its nearest
neighbor. In binary (two-class) classification problems, it is helpful to choose k to be an
odd number, as this avoids tied votes. The nearest neighbor method is easy to implement
and gives quite good results if the features are chosen carefully. The classifier works
well on basic recognition problems. The main disadvantage of the k-NN algorithm is that
it is a lazy learner, i.e., it does not learn anything from the training data and simply
uses the training data itself for classification. Another disadvantage is that the method
is rather slow when there is a large number of training examples, as the algorithm must
compute and sort the distances to all the training data at each prediction. It is also not
robust to noisy data when there is a large number of training examples. The most serious
disadvantage of nearest neighbor methods is that they are very sensitive to the presence
of irrelevant parameters.
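The behaviour described above, no training phase and a full distance computation plus sort at every prediction, is visible in a minimal plain-Python sketch. The function name, the Euclidean distance and the toy feature values are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Every prediction computes and sorts the distance to all training samples.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors for two classes.
features = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["healthy", "healthy", "healthy", "septoria", "septoria", "septoria"]
predicted = knn_predict(features, labels, (1.1, 1.0), k=3)
```

Because every feature contributes equally to the distance, an irrelevant feature with a large range can dominate the vote, which is the sensitivity the text warns about.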
Support vector machine (SVM) is a non-linear classifier. It is a relatively recent machine
learning algorithm that is used in many pattern recognition problems, including texture
classification. In SVM, the input data is non-linearly mapped to linearly separable data in
some high-dimensional space, providing good classification performance. SVM maximizes
the marginal distance between different classes. The division of classes is carried out with
different kernels. SVM is designed to work with only two classes by determining the
hyperplane that divides them. This is done by maximizing the margin from the hyperplane
to the two classes. The samples closest to the margin, which were selected to determine the
hyperplane, are known as support vectors. Multiclass classification is also possible and is
basically built up from several two-class SVMs, using either one-versus-all or
one-versus-one. The winning class is then determined by the highest output function or the
maximum number of votes, respectively. The main advantage of SVM is its high prediction
accuracy. It is flexible and less sensitive to errors in the training examples.
Like neural networks, the computational complexity of SVMs does not depend on the
dimensionality of the input space. However, this classifier involves a long training time.
It is also difficult to understand the learned function (weights). A large number of
support vectors may be used from the training set to perform the classification task, which
can cause an unbalanced result.
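The two ways of combining two-class SVMs into a multiclass decision can be sketched as follows. Only the combination rules are shown: the linear decision functions use hand-picked, hypothetical weights rather than trained ones, and all names in the sketch are illustrative.

```python
from collections import Counter


def decision(weights, bias, x):
    """Signed output of one linear two-class classifier: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict_one_vs_all(models, x):
    """models maps class name -> (weights, bias); the highest output wins."""
    return max(models, key=lambda name: decision(*models[name], x))


def predict_one_vs_one(pair_models, x):
    """pair_models maps (class_a, class_b) -> (weights, bias).

    A non-negative output is a vote for class_a, otherwise for class_b;
    the class with the most votes wins.
    """
    votes = Counter()
    for (class_a, class_b), (weights, bias) in pair_models.items():
        votes[class_a if decision(weights, bias, x) >= 0 else class_b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical, untrained weights for three classes in a 2-D feature space.
one_vs_all = {"A": ((1, 0), 0), "B": ((0, 1), 0), "C": ((-1, -1), 0)}
one_vs_one = {("A", "B"): ((1, -1), 0),
              ("A", "C"): ((2, 1), 0),
              ("B", "C"): ((1, 2), 0)}
sample = (2.0, 0.1)
winner_ova = predict_one_vs_all(one_vs_all, sample)
winner_ovo = predict_one_vs_one(one_vs_one, sample)
```

With N classes, one-versus-all needs N two-class classifiers while one-versus-one needs N(N − 1)/2, which is part of the trade-off between the two schemes.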
2.7.4.3 Artificial Neural Network
Many researchers combine image processing techniques with machine learning
techniques to address different problems in agriculture. An overview of some of these
applications is given here.
Crop management includes activities that improve the growth, development and yield
of agricultural crops, such as pest management, irrigation management and weed
detection. Image processing techniques can also be used in the management of crops.
In pest management, insects have been detected, and wireless sensor networks are used
for irrigation. In addition, weed detection is used for crop assessment using remote sensing.
The use of image processing for early detection of plant leaf diseases could be a valuable
source of information for executing proper disease detection, plant growth management
strategies and disease control measures to prevent the development and spread of
diseases [20] [21].
Nutrient deficiencies and the content of various plants have been identified from the
leaves and skin of the product using image processing techniques. Image processing and
machine learning can be used to improve and maintain the quality of fruits and vegetables
and to classify agricultural products. Many researchers work on fruit quality inspection,
sorting and grading applications.
The authors of [22] published a paper titled “Detection of plant leaf disease using image
segmentation and soft computing techniques” that focuses on leaf diseases of a variety of
fruit plants. The algorithm starts with an input image. After acquisition, the image is
preprocessed to remove unwanted distortions and improve its quality; the image is then
clipped to the desired part and smoothed using filtering techniques. Color and texture
features are then extracted with the help of GLCM; the texture features computed include
local homogeneity, contrast, cluster shade, energy, and cluster prominence. The
classification is first done using the Minimum Distance Criterion with K-Means clustering,
which shows an accuracy of 86.54%; the detection accuracy is improved to 93.63% by the
proposed algorithm. In the second phase, classification is done using an SVM classifier;
with the proposed algorithm the reported detection accuracy is 95.71%. The proposed
algorithm is tested on ten different species that include banana, beans, jackfruit, lemon,
mango, potato and tomato leaf images.
In [23], the researchers introduce a new approach for the detection of grape leaf diseases
using image processing, which aims to minimize loss and increase profit through
automation. Leaf images infected by powdery mildew and downy mildew are collected and
preprocessed for further actions. The preprocessing includes resizing every input image to
300x300 for easier computation, thresholding to select the green pixels, and applying a
Gaussian filter to remove noisy pixels from the image. For segmentation, k-means
clustering is used to cluster the preprocessed image into three separate groups. Color
features are used for downy mildew and texture features for powdery mildew. A total of 18
features (9 color and 9 texture features) are extracted to get better accuracy, and they
are combined before the training phase. A new classifier is proposed, but a first analysis
is made using SVM and ANN classifiers. The proposed fusion classification then ensembles
the SVM and ANN classifiers to generate a base classifier for grape leaf disease detection
in order to improve the classification process. The classification system aims at
benefiting from the capabilities of both the SVM and ANN algorithms. Based on the detected
disease, the proper mixture of fungicides is recommended to grape farmers. In the work, a
total of 137 grape leaf images (containing both initial-stage and final-stage images)
are used, of which 75 are downy mildew images and 62 are powdery mildew images.
For the training phase 60 downy and 50 powdery images are used, and 15 downy and 12
powdery images are used for testing. Finally, an accuracy of 93.33% for downy mildew and
83.33% for powdery mildew using SVM, 86.67% for downy mildew and 91.67% for powdery
mildew using ANN, and 100% for both using the proposed method is reported.
However, the absence of healthy leaf images in the training and testing phases can affect
the reliability of the overall result.
The classification of tomato leaf diseases using image processing and a classification
tree is studied by [24]. The diseases include tomato late blight, septoria spot, bacterial
canker, bacterial spot, and tomato leaf curl. The dataset has 95 leaf images with fungal
late blight, 80 with bacterial leaf spot, 78 with leaf curl, 46 with bacterial canker, 26
with septoria and 58 images of healthy leaves, a total of 383 images of tomato leaves. In
the proposed method, both the testing and training modules start with a segmentation
phase using Otsu’s technique. Because an image captured with a digital camera can contain
unwanted picture elements, starting directly with segmentation can affect the accuracy of
obtaining the desired area of interest and the reliability of the overall system result;
some image preprocessing or morphological techniques should have been employed.
After segmentation, color, shape and texture features of every segmented image are
extracted using the MATLAB GLCM function. Out of these 383, 345 images were selected
randomly and used for repetitive training and 39 images for repetitive testing using a
classification tree. Finally, an accuracy of 97.3% is scored.
In the paper by [25], a web-enabled disease detection system (WEDDS) based on
compressed sensing (CS) is proposed to detect and classify diseases in leaves.
A statistical thresholding strategy is proposed for segmentation of the diseased leaf.
CS measurements of the segmented leaf are uploaded to the cloud to reduce the storage
complexity. Images are acquired with a camera that is triggered only when a color change
on the leaf is detected, thereby reducing memory and energy consumption. After
preprocessing, segmentation is performed with a mean-based thresholding
strategy. At the monitoring site, the measurements are retrieved and the GLCM features
are extracted from the reconstructed segmented image. Analysis and classification are
done using a support vector machine classifier. The performance of the proposed system has
been evaluated in terms of accuracy and compared with existing techniques. The
system was also evaluated experimentally using a Raspberry Pi 3 board. The results show
that the proposed method provides an overall detection accuracy of 98.5% and a
classification accuracy of 98.4%. However, the dataset used to train and test the system
was inadequate: a total of 30 plant images were used, 20 for training and 10 for
testing, which calls into question how well the candidate classifier, i.e., SVM, can be
trained and evaluated with such a small number of images.
The scholars in [27] describe color image identification, which mainly includes image
capturing, image processing with identification, and diagnosis and classification of
disease. The method adopted in each process differs according to the characteristics of
the different types of diseases. In preprocessing, based on an analysis of the histograms
of the three channels (red, green and blue) and a comparison with the content of the
original unprocessed image, the green channel image is found to be the clearest one. As a
result, the green channel is chosen for the conversion of the image to gray. In this
research, Sobel edge detection based on exploration is employed. The other important
finding from this research is the grading of the infected region according to the
percentage of scab area. The proposed embedded wheat leaf rust detection system uses a
Samsung S3C244A0 microprocessor for hardware, embedded Linux as the operating system and
Qt to write the application software. The system captures images by means of a web camera
connected to an ARM microcontroller through USB, and the result is displayed on a dedicated
LCD. The authors' description of the hardware is not clear: they first state that a
microprocessor is used and later refer to an ARM microcontroller for the same
purpose, which is contradictory. Nevertheless, the proposed system has a recognition
accuracy of up to 96.2%.
The study [28] concerns five types of plant leaf diseases: Alternaria alternata,
anthracnose, bacterial blight, cercospora leaf spot, and mosaic. A total of 348
images are collected, of which 227 are used for training and the remaining 121
for testing. The collected images come in different forms and dimensions;
hence they are preprocessed with appropriate techniques to bring them to the same
dimension, suppress unwanted distortions and remove noisy pixels.
In the segmentation phase, K-means clustering and Otsu’s detection are applied to bring out
the region of interest. Next, color co-occurrence is used to extract color, shape and
texture features. A support vector machine (SVM) is used for the classification phase,
and the overall recognition rate is found to be 92.4%.
According to [29], an extensive dataset was generated to train and validate the proposed
algorithm. The dataset contains a total of 8178 images, which includes
3338 leaf images infected with rust, 2744 infected with septoria, 1568 infected with tan
spot and 1116 healthy leaf images. The pictures were taken of expanded leaves from
the upper leaf surface, avoiding pictures with symptoms or signs on the margins of
the leaves and avoiding direct light. As the preprocessing and
segmentation phase, the authors propose three alternative techniques for feeding the input
image to the neural network. In the first technique, the high-resolution image is resized
to the network input size, following common approaches for deep learning architectures.
In this approach, the whole image is downsampled regardless of the size of the leaf within
the image. In the second technique, the image is cropped to the bounding rectangle that
contains the leaf. The extent of the leaf is provided by expert
annotation at the training stage and by the end user at test time. Intuitively, cropping
the image to the leaf bounding rectangle reduces the loss of detail by discarding
non-relevant areas of the original image before downsampling, especially for early-stage
diseases. As an alternative, to avoid a significant downscaling of the portion of the image
that might contain potential diseases (i.e., the leaf mask), which could degrade the visual
features of early-stage diseases, they propose a third technique that segments the input
image into a group of roughly homogeneous regions called super-pixels. The technique
adapts a k-means clustering-based approach in a specified color space while keeping a
configurable spatial coherence so as to enforce a certain compactness. All super-pixels not
intersecting with the leaf mask are discarded, while the rest are independently resized to
the neural network’s input size. For the classification task, a deep residual neural
network with 50 layers and a 224 x 224 input image size was selected, given the required
trade-off between network capacity and complexity and training set size. The
training workflow comprises a three-stage learning process. In the first stage,
pretraining, a network is trained from scratch, initialized with pseudo-random weights.
The second stage corresponds to a constrained fine-tuning. The final training stage
completes the fine-tuning by starting from the weights resulting from the previous stage
and unfreezing all the layers, thus yielding a free, unconstrained training. The algorithm
was tested on wheat leaf images with three different diseases, septoria, tan spot and rust,
and scores an average accuracy of 96%. It has been deployed in a real smartphone
application, an auxiliary application developed to allow image acquisition and automatic
picture recognition. The application works on Android, iOS and Windows Phone. It
provides fast disease identification, with a response time of around 5 seconds.
The main goal of the system proposed by [20] is to detect and recognize the class of disease
in the image. The authors try to detect the object accurately and to identify the disease
group to which the plant belongs. The underlying idea is to extend an object detection
framework and adapt it with different feature extractors that detect diseases in the image.
Leaf images of commercial crops, including cereal crops, vegetable crops and fruit plants
such as sugarcane, cotton, potato, carrot, chilly, brinjal, rice, wheat, banana and guava,
are selected for the purpose of the research. No evaluation result is reported in this
paper.
In the research by [13], RGB images of various leaves are acquired using a digital camera.
The input image is converted from RGB to HSI format in order to mask and remove green
pixels. The purpose of a color space is to facilitate the specification of colors in some
standard, generally accepted way. The HSI (hue, saturation, intensity) color model is a
popular color model because it is based on human perception. The region of interest is
segmented with Otsu’s method. Useful features are extracted by computing the texture
features using the color co-occurrence methodology. The color co-occurrence texture
analysis method is developed through the SGDM. The gray level co-occurrence methodology
is a statistical way to describe the shape by statistically sampling the way certain gray
levels occur in relation to other gray levels. These matrices measure the probability that
a pixel at one particular gray level will occur at a distinct distance and orientation from
any pixel, given that this pixel has a second particular gray level. Classification of the
disease using a genetic algorithm is the last step. Finally, the output from the classifier
block is fed to the microcontroller unit. The microcontroller checks the result and, through
a driver IC that drives a DC motor, moves a conveyor belt in the corresponding direction.
The healthy leaves can be collected in a basket and used for further applications.
The authors in [30] present an automatic identification of Ethiopian coffee plant diseases
that occur on the leaf, and also provide a suitable segmentation technique for identifying
the three types of Ethiopian coffee diseases. In the paper different classifiers, such as
artificial neural network (ANN), k-Nearest Neighbors (K-NN), Naïve Bayes and a hybrid of
self-organizing map (SOM) and radial basis function (RBF), were used. They also used five
segmentation techniques, i.e., Otsu, FCM, K-means, Gaussian distribution and the
combination of K-means and Gaussian distribution, and conducted an experiment for each
technique to find the most suitable one. Their overall result showed that the combined
segmentation technique is better than Otsu, FCM, K-means and Gaussian distribution alone,
and that the combined RBF and SOM classifier, together with the combination of K-means and
Gaussian distribution, achieves a performance of 92.10%.
The author of [31] conducted research at the Ethiopian Institute of Textile and Fashion
Technology, Bahir Dar University, focusing on maize leaf diseases. The researcher took
800 maize leaf images, of which 80% were used for training and the remaining 20% for
testing the model. For the recognition and classification analysis, 7 texture, 6 color and
9 morphological features (22 features in total) were extracted from each image. To build
the recognition and classification model, K-Nearest Neighbor and Artificial Neural Network
classification techniques were used. Based on the experimental results using combined
texture, color and morphology features, the Artificial Neural Network classifier performed
better than the K-Nearest Neighbor classifier. With the combined features, the Artificial
Neural Network classifier showed good accuracy for the different disease classes: 92.5% for
common rust, 100% for leaf blight, 90% for healthy leaf and 95.0% for leaf spot, with an
overall performance of 94.4%.
In the work [32] a recognition system capable of identifying plants by using the images of
their leaves has been developed. A mobile application was also developed to allow a user
to take pictures of leaves and upload them to a server. The server runs pre-processing and
feature extraction techniques on the image before a pattern matcher compares the
information from this image with the ones in the database in order to get potential matches.
The extracted features are the length and width of the leaf, the area of the leaf, the
perimeter of the leaf, the hull area, the hull perimeter, a distance map along the vertical
and horizontal axes, a color histogram and a centroid-based radial distance map. A k-Nearest
Neighbor classifier was implemented and tested on 640 leaves belonging to 32 different
species of plants. An accuracy of 83.5% was obtained. The system was further enhanced
by using information obtained from a color histogram, which increased the recognition
accuracy to 87.3%.
The paper by [33] reviews the detection and classification of plant leaf diseases using
several techniques. These techniques include the Otsu method, image compression, image
cropping and image denoising, as well as K-means clustering to delineate the diseased
regions of the images. Neural networks, including back propagation (BP) networks, radial
basis function (RBF) neural networks, generalized regression networks (GRNNs) and
probabilistic neural networks (PNNs), are also used to diagnose wheat and grape diseases.
Many other diseases, such as orchid leaf disease, rubber tree leaf disease, apple fruit
disease and chili plant disease, can also be detected using other approaches like fuzzy
logic, multi-class Support Vector Machine and Local Binary Pattern. The paper gives a brief
explanation of all these diseases and their detection.
Another paper [34] presents an image segmentation algorithm that is used for automatic
detection and classification of wheat leaf diseases. It also surveys different disease
classification techniques that can be used for wheat leaf disease detection. Image
segmentation, which is an important aspect of wheat leaf disease detection, is done using a
genetic algorithm.
Based on the related work reviewed above, most researchers have focused only on training
their models with a common image setting, i.e., a prepared angle of acquisition, the same
lighting, the same background and so on. In practice, however, training samples of this
type might reduce the overall performance of the model. This research therefore tries to
address these limitations by training a model with samples that are acquired under
different conditions.
Table 2.1 recaps the related work section by giving an overview of the work of a few authors.
Table 2.1: Summary of related work

| SN. | Author(s) and Year | Techniques used | Application (Focus area) | Observed weakness | Results and remarks |
|---|---|---|---|---|---|
| 1 | Anand R. et al. [16] (2016) | K-means clustering with ANN | Detection of disease on brinjal leaf | Tested with single classifier | N/A |
| 2 | H. Sabrol and K. Satish [24] (2016) | Otsu’s segmentation, color, shape and texture feature extraction with classification tree | Tomato plant disease classification | Ready-made image background was used | 97.3% classification accuracy |
| 3 | Pranjali B. Padol and S. D. Sawant [23] (2016) | K-means clustering, ANN with SVM classification | Downy and powdery mildew grape leaf disease classification | Absence of healthy samples in training and testing phases | Accuracies found: SVM 93.33% (Downy), 83.33% (Powdery); ANN 86.67% (Downy), 91.67% (Powdery); Fusion 100% (both) |
| 4 | Vijay Singh and A.K. Misra [35] (2016) | K-means, GLCM, minimum distance criterion and SVM | Detection of different plant leaves | | 95.71% detection accuracy |
| 5 | Aasha Nandhini et al. [25] (2017) | GLCM, SVM | Web based plant leaf disease detection | Tested with single classifier | 98.4% classification and 98.5% detection accuracies |
| 6 | Kangshun Li et al. [26] (2017) | Image processing and evolutionary algorithm | Maize leaf diseases and pests | | N/A |
| 7 | Peifeng Xu et al. [27] (2017) | Embedded image processing with ARM9 microcontroller | Wheat leaf rust detection and grading | Image was taken at a specific and prepared condition | 96.2% recognition with 92.3% accuracy |
| 8 | Pooja V. et al. [28] (2017) | Otsu’s detection, RGB to HSI conversion, SVM classifier | Plant leaf diseases identification | Proper evaluation technique missed | 92.4% rate of recognition on five different leaf disease types |
| 9 | R. Meena Prakash et al. (2017) | K-means clustering, GLCM features extraction, SVM classifier | Detection and classification of plant leaf disease | Stable and solid color was used as background for all samples | 90% classification accuracy |
| 10 | Mrinal Kumar et al. (2017) | K-means clustering, GLCM, PNN | Wheat leaf disease detection | | N/A |
| 11 | Akila et al. (2018) | Image processing and deep learning | Plant leaf disease detection | | N/A |
| 12 | Artzai Picon et al. [29] (2018) | Image processing with deep convolutional neural network | Wheat crop disease classification | All samples were taken at the same angle of acquisition | 96% accuracy |
| 13 | Arya M. S. et al. [13] (2018) | Image processing and genetic algorithm with Arduino | Detection of unhealthy plant leaves | | N/A |
CHAPTER THREE
PROPOSED SYSTEM
3.1 Introduction
This chapter has 6 sections, excluding this one. These sections give detailed information
about the architecture of the system, the concepts and algorithms used, the evaluation metrics
and their representation in relation to the study, and finally the software tools and libraries
used to carry out the implementation.
[Figure: system architecture. Both phases pass through Preprocessing, Segmentation and
Feature Extraction; the remaining blocks are Model Training, Knowledge Base, Classification
and Classified Image.]
Generally, this architecture describes the overall process followed to classify a particular
input image into one of two classes. As shown in the figure, the system has two separate
phases, the training phase and the testing phase, which differ slightly. The two phases are
treated separately: the testing phase follows once the training phase has come to an end. The
training phase starts by importing a batch of input images, which means that several images
are processed one after the other, independently of each other. In the testing phase a single
image is imported. After the image is imported, both phases pass through the same processes,
which serve the same purpose. The preprocessing, segmentation and feature extraction
functionalities are the same for both phases (a detailed design and explanation is found in
the coming sections). After feature extraction the phases follow different paths. The training
phase provides a feature vector with a label as input for the model to train on, and the
result is stored in the knowledge base. The testing phase provides a feature vector to the
model and expects a label in return; the classifier returns that label from the previously
trained knowledge base.
This task has a number of sub-tasks, such as image type conversion for ease of computation,
image enhancement for visual clarity, noise removal for process consistency, image scaling,
resizing or shrinking to reduce complexity and speed up computation, filter functions and
other techniques that are necessary to improve the quality of the input leaf image. These
sub-tasks can be used one or more times during the implementation, and their order of
appearance is not necessarily the same as shown in Figure 3.2.

Image resizing is applied for ease of computation, since large image files can slow down the
overall performance of a model. Resizing is applied in a way that does not damage the
information found in the original image. Histogram equalization is another pre-processing
sub-task; it is applied for contrast enhancement, which makes it possible to get detailed
information from the input image. The input image is originally in the RGB/BGR color space,
but for additional purposes this color image can be converted to different color spaces.
Conversion to grayscale helps because many image processing algorithms work well with
grayscale images, which are easier to process than RGB images. Because the number of possible
pixel values varies between color spaces, most algorithms treat input images differently.
Additionally, in HSV/HSI images it is easier for the machine to identify the dominant color
than in RGB images.

Image dilation, which is sometimes also called a postprocessing technique, is applied to
recover pixels that were lost by the techniques applied before it, mostly by segmentation.
Dilation helps to fill in these missed pixels so that the desired unbiased information is
obtained. However, dilation does not only recover missed pixels; it might also add
unnecessary pixels, and these additions sometimes turn out to be noise. To remove this noise,
denoising techniques such as median filtering are applied.

Figure 3.2: Preprocessing tasks (Resizing, Histogram Equalization, Color Space
Transformation, Dilation, Denoising)
3.4.2 Color-based Segmentation
Segmentation is another essential part of the image processing stage. It helps to focus only
on the desired region of the whole leaf image, typically the image that results from the
pre-processing task. Color-based segmentation is one of the stochastic types and works on the
discrete pixel values of the image. Foreground and background regions are separated based on
the color range in which they are found. A wheat leaf by nature has a range of color between
light green and dark green. The exact range can depend on the nature of the crop or on the
imaging conditions of the particular environment and acquisition tool. If the leaf is infected
with a disease, some other color can be seen. For instance, if the leaf is infected with
‘Septoria tritici’, a lesion colored between yellow and brown is seen. It is easy for the
human eye to detect and recognize such changes, but for a machine it can be difficult. For
this reason, the background and foreground regions of the image should be separated first.
Once the desired foreground region is obtained, the background region is eliminated (colored
black) since it is not needed in the coming processes.

In the foreground image, one color range for the healthy leaf and two different color ranges
for the infected leaf are expected. To eliminate the background, all these ranges are first
joined and treated as a single range of color. The foreground image is then separated into
healthy and infected regions with the same technique. At this point the relevant part of the
input image has been selected and the segmentation task ends; feature extraction comes next.

Figure 3.3: Segmentation tasks (FG and BG Regions Separation, Healthy Region Separation,
Mask Healthy Region, Infected Region Separation)
While local features have a position and are a function of a local image region. Texture defines
the consistency of patterns and colors in an image. Gray Level Co-occurrence matrix (GLCM)
uses the adjacency concept in images. The basic idea is that it looks for pairs of adjacent
pixel values that occur in an image and keeps recording them over the entire image. The
co-occurrence matrix can be constructed based on the orientation and distance between image
pixels. Texture patterns are governed by the periodic occurrence of certain gray levels.
Consequently, the repetition of the same gray levels at predefined relative positions can
indicate the presence of a specific texture. Several texture features, such as entropy,
energy, contrast and homogeneity, can be extracted from the co-occurrence matrix. A gray level
co-occurrence matrix P(i, j) is defined based on a displacement vector dxy = (dx, dy). The
pairs of pixels separated by the distance dxy that have gray levels i and j are counted and
the results are stored in P(i, j) [36].

Figure 3.4: Feature extraction tasks (Texture Features, GLCM)
As seen in Figure 3.5, the pair of gray-level pixel values 1 and 2 occurs twice in the image
and hence the GLCM records it as two. The pair of pixel values 1 and 3 occurs only once in the
image and thus the GLCM records it as one.
Here, adjacency is counted only from left to right. In practice there are four types of
adjacency, and hence four GLCM matrices are constructed for a single image. The four types of
adjacency are as follows.
• Left-to-Right
• Top-to-Bottom
• Top Left-to-Bottom Right
• Top Right-to-Bottom Left
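
The counting described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name glcm, the direction names and the 3x3 test image are assumptions made for the example (the image is not the one in Figure 3.5), and the thesis implementation itself relies on a library rather than on hand-written loops.

```python
def glcm(image, dx, dy, levels):
    """Count co-occurrences of gray levels (i, j) for pixel pairs
    separated by the displacement (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose second pixel lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


# The four adjacency types listed above, written as (dx, dy) displacements.
DIRECTIONS = {
    "left_to_right": (1, 0),
    "top_to_bottom": (0, 1),
    "top_left_to_bottom_right": (1, 1),
    "top_right_to_bottom_left": (-1, 1),
}

# Hypothetical 3x3 image with gray levels 0..3.
image = [
    [1, 2, 3],
    [1, 2, 0],
    [1, 3, 0],
]

# One GLCM per adjacency type, i.e. four matrices for a single image.
glcms = {
    name: glcm(image, dx, dy, levels=4)
    for name, (dx, dy) in DIRECTIONS.items()
}
```

In this small image the left-to-right pair (1, 2) occurs twice and the pair (1, 3) once, so the corresponding cells of the left-to-right matrix hold 2 and 1.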
Some of the texture features that are extracted from the segmented image are explained below.
Homogeneity returns a value that measures the closeness of the distribution of elements in the
GLCM to the GLCM diagonal. Homogeneity=1 for a diagonal GLCM.
$$\text{Homogeneity} = \sum_{i,j=0}^{N-1} \frac{P_{ij}}{1 + (i - j)^2}$$
Contrast returns a measure of the intensity contrast between a pixel and its neighbor over the
whole image. Contrast is 0 for a constant image.
$$\text{Contrast} = \sum_{i,j=0}^{N-1} P_{ij}\,(i - j)^2$$
Energy returns the sum of squared elements in the GLCM. Energy = 1 for a constant image.
$$\text{Energy} = \sum_{i,j=0}^{N-1} (P_{ij})^2$$
Correlation returns a measure of how correlated a pixel is to its neighbor over the whole image.
Correlation is 1 or -1 for a perfectly positively or negatively correlated image and correlation
is NaN for a constant image.
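$$\text{Correlation} = \sum_{i,j=0}^{N-1} P_{ij}\,\frac{(i - \mu)(j - \mu)}{\sigma^2}$$
Where
• $\mu$ = the GLCM mean, calculated as
$$\mu = \sum_{i,j=0}^{N-1} i\,P_{ij}$$
• $\sigma^2$ = the variance of the intensities of all reference pixels in the relationships
that contributed to the GLCM, calculated as
$$\sigma^2 = \sum_{i,j=0}^{N-1} P_{ij}\,(i - \mu)^2$$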
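
The four formulas can be checked with a short sketch. It assumes a symmetric, normalized GLCM (so that a single mean and variance suffice, as in the formulas above); the helper names normalize and texture_features and the two small example matrices are hypothetical.

```python
import math


def normalize(glcm):
    """Turn raw co-occurrence counts into probabilities that sum to 1."""
    total = sum(sum(row) for row in glcm)
    return [[value / total for value in row] for row in glcm]


def texture_features(p):
    """Homogeneity, contrast, energy and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v ** 2 for _, _, v in cells)
    mu = sum(i * v for i, _, v in cells)
    variance = sum(v * (i - mu) ** 2 for i, _, v in cells)
    if variance == 0:
        # A constant image has zero variance, so correlation is undefined.
        correlation = math.nan
    else:
        correlation = sum(
            v * (i - mu) * (j - mu) for i, j, v in cells
        ) / variance
    return {
        "homogeneity": homogeneity,
        "contrast": contrast,
        "energy": energy,
        "correlation": correlation,
    }


# Hypothetical diagonal GLCM: neighbouring pixels always share a gray level.
diagonal = texture_features(normalize([[2, 0], [0, 2]]))

# Hypothetical constant image: every pixel pair is (0, 0).
constant = texture_features(normalize([[4, 0], [0, 0]]))
```

The two examples reproduce the special cases stated in the text: homogeneity is 1 for the diagonal GLCM, while the constant image gives energy 1, contrast 0 and a NaN correlation.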
[Figure: classification stage. Blocks: ML Algorithm, Supervised Learning Algorithms,
Classified Data, Performance Evaluation.]
properties independently contribute to the probability. The classification approach is described
as follows [37].
Assume that there are N classes of patterns C1, C2, . . . , CN, and an unknown pattern x in a
d-dimensional feature space, x = [x1, x2, . . . , xd]. Hence the pattern is characterized by d
features. The problem of pattern classification is to compute the probability that the
pattern x belongs to each class Ci, i = 1, 2, . . . , N. The pattern is assigned to the class
Ck for which this probability is a maximum. When classifying a pattern with Bayesian
classification, we distinguish two kinds of probabilities: the prior probability and the
posterior probability. The prior probability indicates the probability that the pattern
belongs to a class, say Ck, based on prior belief, evidence or knowledge. This probability is
chosen before making any measurements, i.e., even before selection or extraction of a feature.
Sometimes this probability may be modeled using a Gaussian distribution, if the previous
evidence suggests it.
In cases where there exists no prior knowledge about the class membership of the pattern,
usually a uniform distribution is used to model it. For example, in a four-class problem, we
may choose the prior probability as 0.25, assuming that the pattern is equally likely to belong
to any of the four classes. The posterior probability P(Ci|x), on the other hand, indicates the
final probability of belongingness of the pattern x to a class Ci. The posterior probability is
computed based on the feature vector of the pattern, class conditional probability density
functions P(x|Ci) for each class Ci and prior probability P(Ci) of each class Ci.
Naïve Bayes classification states that the posterior probability of a pattern belonging to a
pattern class Ck is given by:
$$P(C_k \mid x) = \frac{P(x \mid C_k)\,P(C_k)}{\sum_{i=1}^{N} P(x \mid C_i)\,P(C_i)}$$
The denominator $\sum_{i=1}^{N} P(x \mid C_i)\,P(C_i) = P(x)$ is a scaling term which yields
the normalized value of the posterior probability that the pattern x belongs to class Ck.
Hence, x belongs to class Cp where P(Cp|x) = max{P(C1|x), P(C2|x), . . . , P(CN|x)}
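
A minimal sketch of this decision rule is given below. It assumes that each feature follows a Gaussian distribution within each class and that the features are independent (the "naïve" assumption); the function names, the two-feature example and all means, variances and priors are hypothetical values chosen for illustration, not values from the experiments.

```python
import math


def gaussian_pdf(value, mean, variance):
    """Density of a one-dimensional Gaussian (assumed likelihood model)."""
    exponent = -((value - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


def posteriors(x, class_params, priors):
    """Return P(C_k | x) for every class using Bayes' rule."""
    joint = {}
    for label, params in class_params.items():
        # Naive assumption: P(x | C_k) is the product of per-feature densities.
        likelihood = 1.0
        for value, (mean, variance) in zip(x, params):
            likelihood *= gaussian_pdf(value, mean, variance)
        joint[label] = likelihood * priors[label]
    evidence = sum(joint.values())  # the scaling term P(x)
    return {label: value / evidence for label, value in joint.items()}


# Hypothetical per-class (mean, variance) for two texture features.
class_params = {
    "HEALTHY": [(0.2, 0.01), (0.8, 0.01)],
    "INFECTED": [(0.6, 0.01), (0.3, 0.01)],
}
# Uniform prior for a two-class problem.
priors = {"HEALTHY": 0.5, "INFECTED": 0.5}

result = posteriors([0.2, 0.8], class_params, priors)
predicted = max(result, key=result.get)  # class with the largest posterior
```

Because of the division by the evidence term, the posteriors in result sum to 1, and the pattern is assigned to the class with the largest value.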
make use of those neighbors for determination of the class of the input. In K-NN the
classification, i.e., the decision about which class a given point belongs to, is based on the
minimum distance between the given point and the other points. It is less suitable for a large
number of training examples, and it is not robust to noisy data. For the leaf classification,
the Euclidean distance between the test samples and the training samples is calculated. In
this way the most similar training samples are found, and the class of the test sample is
determined accordingly. A sample is classified by a majority vote of its k nearest neighbors:
it is assigned to the class most common among them. k is a positive integer, typically small.
If k = 1, the sample is simply assigned to the class of its nearest neighbor. In binary
(two-class) classification problems, it is helpful to choose k to be an odd number, as this
avoids tied votes.
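
The voting rule can be sketched as follows. The function name knn_predict and the small two-feature training set are hypothetical; the sketch only illustrates the Euclidean distance and majority vote described above.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, sample, k=3):
    """Assign the class most common among the k nearest training samples."""
    distances = sorted(
        (euclidean(features, sample), label)
        for features, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set.
train_x = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
    [0.80, 0.90], [0.90, 0.80], [0.85, 0.85],
]
train_y = ["HEALTHY"] * 3 + ["INFECTED"] * 3

# k is odd, so a two-class vote cannot be tied.
label_a = knn_predict(train_x, train_y, [0.20, 0.20], k=3)
label_b = knn_predict(train_x, train_y, [0.70, 0.90], k=3)
```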
The SVM algorithm is implemented in practice using a kernel. A kernel transforms the input
data space into the required form, which helps to build a more accurate classifier. For this
implementation a linear kernel is applied. A linear kernel is the ordinary dot product of any
two given observations: the product of two vectors is the sum of the products of each pair of
input values.
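
As a small illustration, the linear kernel is just the dot product below. The decision function, weights and bias are hypothetical and only show how a trained linear SVM would use the kernel; they are not the parameters learned in this work.

```python
def linear_kernel(u, v):
    """Dot product: the sum of the products of each pair of input values."""
    return sum(a * b for a, b in zip(u, v))


def classify(x, weights, bias):
    """Hypothetical linear decision rule built on the kernel."""
    score = linear_kernel(weights, x) + bias
    return "INFECTED" if score >= 0 else "HEALTHY"


kernel_value = linear_kernel([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6

# Assumed weights and bias, for illustration only.
label = classify([0.9, 0.1], weights=[1.0, -1.0], bias=-0.5)
```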
2. Construct a decision tree for each sample and get a prediction result from each decision
tree.
3. Perform a vote for each predicted result.
4. Select the prediction result with the most votes as the final prediction.
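
The steps above can be sketched as follows. Two things in this sketch are assumptions: the random samples are drawn as bootstrap samples (sampling with replacement), and each "tree" is replaced by a very simple stand-in that predicts the class whose mean feature value is closest to the input. A real random forest grows full decision trees; the names train_stump and forest_predict and the data are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows):
    """Hypothetical stand-in for a decision tree: predict the class whose
    mean feature value is closest to the input value."""
    values_by_label = {}
    for value, label in rows:
        values_by_label.setdefault(label, []).append(value)
    means = {
        label: sum(values) / len(values)
        for label, values in values_by_label.items()
    }
    return lambda value: min(means, key=lambda lab: abs(value - means[lab]))


def forest_predict(rows, value, n_trees=25, seed=0):
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_trees):
        # Assumed sampling step: bootstrap sample of the training rows.
        sample = [rng.choice(rows) for _ in rows]
        tree = train_stump(sample)        # step 2: build one tree per sample
        predictions.append(tree(value))   # step 2: collect its prediction
    votes = Counter(predictions)          # step 3: vote over the predictions
    return votes.most_common(1)[0][0]     # step 4: most-voted class wins


# Hypothetical one-feature training rows: (feature value, label).
rows = [
    (0.10, "HEALTHY"), (0.15, "HEALTHY"), (0.20, "HEALTHY"),
    (0.80, "INFECTED"), (0.85, "INFECTED"), (0.90, "INFECTED"),
]

low = forest_predict(rows, 0.12)
high = forest_predict(rows, 0.88)
```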
The sensitivity measures how many of the positive samples have been recognized as TP:
$$\text{Sensitivity (\%)} = \frac{TP}{TP + FN} \times 100$$
The specificity measures how many of the negative samples have been recognized as TN:
$$\text{Specificity (\%)} = \frac{TN}{TN + FP} \times 100$$
The classification accuracy is defined as the ratio of the correctly recognized samples
(either positive or negative) to the total number of samples:
$$\text{Accuracy (\%)} = \frac{TP + TN}{\text{Total}} \times 100$$
The false positive rate (FPR) and false negative rate (FNR) measure the rate of the falsely
recognized samples:
$$\text{FNR (\%)} = \frac{FN}{FN + TP} \times 100 \quad \text{and} \quad \text{FPR (\%)} = \frac{FP}{FP + TN} \times 100$$
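
These formulas translate directly into code. The sketch below uses a hypothetical helper named evaluate; the counts passed to it are the Random Forest counts reported in Table 4.1, used here only as a worked example.

```python
def evaluate(tp, tn, fp, fn):
    """Compute the evaluation metrics (in percent) from confusion counts.

    Assumes at least one actual positive and one actual negative sample,
    otherwise the corresponding ratio would divide by zero.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "fnr": 100 * fn / (fn + tp),
        "fpr": 100 * fp / (fp + tn),
    }


# Random Forest counts from Table 4.1 (81 test samples).
rf = evaluate(tp=46, tn=34, fp=0, fn=1)
```

With these counts the accuracy is 80/81 and the sensitivity is 46/47, which rounds to the Random Forest values listed in Table 4.2.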
Spyder 3.3.4
CHAPTER FOUR
Python 3.7.2, a recent version of Python, is installed for coding, development and testing
of the prototype.
The Spyder IDE under Anaconda Navigator is used for coding (details about the tool are found
in Section 3.7).
The loop starts with the image reading task. After the image data is collected and the dataset
is prepared, the input images are ready for processing. Images are first treated with
preprocessing techniques to obtain better results in the coming steps. The very first step in
our code is to read an image with the OpenCV Python method imread(‘img_name.jpg’). The JPG
format is used because of its popularity and wide availability across image acquisition
tools; almost all digital cameras and mobile phones produce image files in the JPG format. No
format conversion is applied to input images in formats other than .JPG, so it is important
to keep in mind that all input images for both the training and testing phases must be .JPG
images. Otherwise Python might fail to recognize an image and could terminate the execution
before further processing.
[Figure panels: (a) BGR color space, (b) converted to grayscale, (c) converted to HSV]
The OpenCV method resize(img,(256,256)) is applied to the input image to resize it to
256x256 pixels for ease of computation. After resizing, cvtColor(img,
cv2.COLOR_BGR2HSV) is used to convert the original RGB color space (BGR in the case of
OpenCV) to the HSV color space. Then the inRange() method extracts the most relevant color
ranges: HSV color points between (36,0,0) and (100,255,255) for green, and HSV color points
between (8,0,0) and (36,255,255) for yellow-brownish. These color ranges represent the
possible actual colors of a wheat leaf image: the green range for the healthy part and the
yellow-brownish range for the infected part, since the target disease septoria is
characterized by a yellow-brownish color (‘Septoria’ is a fungal disease that causes tan,
elongated lesions on wheat leaves; the lesions have a yellow margin with a brown body). The
two ranges are assigned to two separate variables. The main purpose of the color ranges is to
filter out the leaf part from the whole input image, in other words, to segment the region of
interest by separating foreground and background colors. To obtain a meaningful foreground,
the parts of the image whose color values fall in either of the two ranges should be brought
together. For this reason, the method bitwise_or() is applied to the variables that hold the
two color ranges. By applying bitwise_or() once again, this time with a different number of
arguments, the foreground of the input image (the part of the image that holds the leaf) is
segmented.
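
The logic of these steps can be illustrated without OpenCV. The sketch below is a pure-Python stand-in, not the OpenCV code used in the implementation: in_range mimics the inclusive range test of cv2.inRange, bitwise_or joins the two masks, and apply_mask plays the role of the second, masked bitwise_or() call. The helper names and the tiny 2x2 HSV image are hypothetical; the two color ranges are the ones given in the text, in OpenCV's 8-bit HSV convention (H from 0 to 179, S and V from 0 to 255).

```python
GREEN = ((36, 0, 0), (100, 255, 255))         # healthy leaf range
YELLOW_BROWN = ((8, 0, 0), (36, 255, 255))    # infected leaf range


def in_range(hsv_image, lower, upper):
    """255 where every channel lies inside [lower, upper], otherwise 0."""
    return [
        [
            255 if all(lo <= ch <= hi
                       for ch, lo, hi in zip(pixel, lower, upper)) else 0
            for pixel in row
        ]
        for row in hsv_image
    ]


def bitwise_or(mask_a, mask_b):
    """Join two masks: a pixel is kept if it is set in either mask."""
    return [[a | b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def apply_mask(image, mask):
    """Keep masked pixels and colour everything else black."""
    return [
        [pixel if m else (0, 0, 0) for pixel, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# Hypothetical 2x2 HSV image: green, yellow-brown, and two background pixels.
hsv = [
    [(60, 200, 200), (20, 180, 160)],
    [(120, 50, 50), (0, 0, 255)],
]

healthy_mask = in_range(hsv, *GREEN)
infected_mask = in_range(hsv, *YELLOW_BROWN)
leaf_mask = bitwise_or(healthy_mask, infected_mask)
foreground = apply_mask(hsv, leaf_mask)
```

Keeping healthy_mask and infected_mask as separate variables mirrors the text: the joined mask removes the background, while the individual masks can later separate the healthy and infected regions.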
Figure 4.2: Image dilation result: (a) original input image, (b) foreground image,
(c) dilated foreground image
If necessary parts are excluded from the foreground image, additional techniques are applied
to bring these parts back. For this, the image is first converted to grayscale and then to a
binary image with the methods cvtColor(image, cv2.COLOR_BGR2GRAY) and cv2.threshold()
respectively. Image dilation with cv2.dilate() is applied to the binary image to recover the
parts lost during the preprocessing and segmentation operations, as seen in Figure 4.2.
Finally, the dilated binary image is used as a mask on the resized input image to segment out
the desired foreground (the leaf part only), Figure 4.3 (c). This segmented image is then
used as input for the next step, i.e., feature extraction.
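
The effect of thresholding, dilation and masking can be seen on a tiny example. The sketch below is a pure-Python stand-in for the cv2.threshold() and cv2.dilate() calls, not the OpenCV code itself; the helper names, the 5x5 grayscale image, the threshold of 127 and the 3x3 neighbourhood are assumptions made for illustration.

```python
def threshold(gray, level):
    """Binary image: 255 where the pixel is brighter than level, else 0."""
    return [[255 if value > level else 0 for value in row] for row in gray]


def dilate(binary, radius=1):
    """Set a pixel if any pixel in its (2*radius+1)^2 neighbourhood is set."""
    rows, cols = len(binary), len(binary[0])
    return [
        [
            255 if any(
                binary[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ) else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def apply_mask(image, mask):
    """Keep the original pixel where the mask is set, black elsewhere."""
    return [[value if m else 0 for value, m in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


# Hypothetical grayscale image: bright leaf ring, dim lesion in the centre,
# dark background (value 10) around it.
gray = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 200, 10],
    [10, 200, 90, 200, 10],
    [10, 200, 200, 200, 10],
    [10, 10, 10, 10, 10],
]

binary = threshold(gray, 127)        # the dim centre pixel is lost here
dilated = dilate(binary)             # dilation fills the hole again
segmented = apply_mask(gray, dilated)
```

The centre pixel is recovered by the dilation, but the background pixels next to the leaf are pulled into the mask as well, which is the kind of unwanted addition that the later denoising step is meant to handle.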
Figure 4.3: Segmentation results: (a) original input image, (b) healthy part segmented,
(c) infected part segmented
GLCM texture features are extracted from the segmented image using the mahotas image
processing library for Python. There are 13 of these texture features: angular second moment,
contrast, correlation, sum of squares (variance), inverse difference moment, sum average, sum
variance, sum entropy, entropy, difference variance, difference entropy, information measures
of correlation and maximal correlation coefficient. These features form a feature vector
which, together with the training labels, serves as the input for training a classification
model.
In the testing phase, unlike in the methodology explanation, two different approaches are
followed for the implementation, and a separate program is written for each approach. Both
programs run the same code, with a slight difference at testing time. The first approach is an
independent test for a single image, in which all classification models are called one after
the other for manual testing. A single input image is passed to the testing module and the
classification models put their predictions on the top-left corner of the original image. For
this approach, Figure 4.4 shows the individual output of the main image processing tasks
together with the prediction, and Figure 4.5 shows the prediction of each classifier for the
provided feature vector.
[Figure panels: (a) original input image, (b) foreground image, (c) healthy part separated,
(d) infected part separated, (e) features extracted, (f) classified image]
The second approach is an automatic test implemented to evaluate each classification model
separately. The same strategy is used to test every model: an automatic data splitter divides
the total samples into 70% for training and 30% for testing. Based on this approach a
confusion matrix is created and used for evaluation following the metrics discussed in
Section 3.6. The details of the results are explained in Section 4.4.
[Figure panels: (a) NB predicted, (b) K-NN predicted, (c) SVM predicted, (d) RF predicted]
4.4 Performance Evaluation Results
All classification models are evaluated with the help of a 2x2 confusion matrix. This
confusion matrix shows the number of correct and incorrect predictions made by the
classification model compared to the target values in the data, by counting the test samples
under four categories (TP, TN, FN, FP). These categories are discussed in detail under
Section 3.6; this section presents an interpretation of the metrics that are calculated from
these four categories.
Accuracy: overall, how often is the classifier correct? How correctly does the classifier
predict healthy leaves as ‘HEALTHY’ and leaves infected with septoria disease as ‘INFECTED’?
Classification error / misclassification: how often is the classifier incorrect in predicting
the expected class of an input?
Sensitivity: also known as true positive rate or recall. When the actual value is positive,
how often is the prediction correct? In other words, how sensitive is the classifier in
detecting positive instances? In our case, how well does the classifier classify all
available infected leaves as ‘INFECTED’?
Specificity: when the actual value is negative, how often is the prediction correct? How
specific or selective is the classifier in predicting negative instances? In our case, how
well does the classifier recognize the healthy samples?
Precision: when a positive value is predicted, how often is the prediction correct? How
precise is the classifier in predicting positive instances? Of the samples predicted as
‘INFECTED’, how many are really infected?
False positive rate: when the actual value is negative, how often is the prediction incorrect?
In our case, how likely is a model to misclassify healthy samples as ‘INFECTED’?
Based on these evaluation metrics, the different classification models have the following
results.
Figure 4.6: Confusion matrix distribution result
The four pie charts in Figure 4.6 illustrate the distribution of the testing samples in
relation to their predicted and actual classes. For all classifiers the true positive
category is the largest, which shows that all classifiers are good at recognizing infected
samples as ‘INFECTED’. The classifier in the first pie chart, Naïve Bayes, is the only one
with false positive samples; no other classifier predicted any. This shows that NB predicts
healthy samples as ‘INFECTED’ more often than the others. K-NN and SVM are equal at
predicting infected samples correctly. SVM is as good as RF at predicting healthy samples,
but RF is better than SVM at predicting infected samples correctly, which gives RF a better
accuracy.
In Table 4.1, the counts of the generated 2x2 confusion matrix are listed together with their
share of the total testing samples. The first column under each classification model gives
the number of samples found in the generated confusion matrix, while the second column under
each model gives the percentage of that count in relation to the total samples taken by the
splitter.
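Table 4.1: Confusion matrix result for different classifiers

| Samples | Naïve Bayes (count) | Naïve Bayes (%) | K-NN (count) | K-NN (%) | SVM (count) | SVM (%) | RF (count) | RF (%) |
|---|---|---|---|---|---|---|---|---|
| True Positive | 38 | 46.9% | 44 | 54.3% | 44 | 54.3% | 46 | 56.7% |
| True Negative | 27 | 33.3% | 33 | 40.7% | 34 | 41.9% | 34 | 41.9% |
| False Positive | 6 | 7.4% | 0 | 0% | 0 | 0% | 0 | 0% |
| False Negative | 10 | 12.3% | 4 | 4.9% | 3 | 3.7% | 1 | 1.23% |
| Total test samples | 81 | | 81 | | 81 | | 81 | |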
The chart in Figure 4.7 shows how the performance scores change across the candidate
classification models. The chart has three lines that represent the most commonly used
evaluation metrics: specificity (top line), accuracy (middle line) and sensitivity (bottom
line). The classification models are listed on the x-axis and their corresponding evaluation
results are on the y-axis. All the lines lie between almost 80% and exactly 100%, which shows
that these classification models achieved good performance results.
[Figure 4.7: line chart of Performance Score (y-axis, 50% to 100%) against Classifier
(x-axis: NB, kNN, SVM, RF)]
The top line (specificity) sits exactly at 100% for three of the classifiers, K-NN, SVM and
RF, which shows that they classify all healthy samples (the true negatives) correctly,
whereas NB failed to classify some of the healthy samples as ‘HEALTHY’. Similarly, the middle
line (accuracy) improves toward the right of the x-axis; the last classifier, RF, predicts
the correct class of samples better than the preceding classifiers. The bottom line
(sensitivity) shows the growth in correctly predicting infected samples as ‘INFECTED’. It
starts below 80% and reaches almost 100% at the end. NB missed many infected samples, K-NN
raised the sensitivity to about 91%, SVM raised it to about 93%, and RF predicts almost all
infected samples as ‘INFECTED’ with only a slight misclassification; its sensitivity of 97.8%
is the best of all the candidate algorithms. Table 4.2 summarizes the evaluation results for
the different classification models based on the formulae under Section 3.6.
| Metric | Naïve Bayes | K-NN | SVM | RF |
|---|---|---|---|---|
| Accuracy | 80.2% | 95% | 96.2% | 98.7% |
| Sensitivity | 79.1% | 91.6% | 93.6% | 97.8% |
| Specificity | 81.8% | 100% | 100% | 100% |
| Precision | 86.3% | 100% | 100% | 100% |
| Error | 19.7% | 5% | 3.8% | 1.3% |
| False Positive Rate | 18.1% | 0% | 0% | 0% |
In general, Random Forest and SVM take more training time than Naïve Bayes and K-NN, but
their prediction is faster. In some cases of the experiment Naïve Bayes and K-NN outperform
SVM. However, the training time and performance depend entirely on the scenario and the
dataset used.
CHAPTER FIVE
5.1 Conclusion
Different types of crops can be infected by several kinds of diseases. In practice, the main
approach to detecting and identifying these diseases is naked-eye observation by experts. This
requires continuous monitoring by experts, and the evaluation process is tedious,
time-consuming, and less accurate. The main aim of this research work is to design and implement
a model for classifying wheat leaf images with septoria tritici disease using a combination of
image processing and machine learning techniques. The machine learning algorithms and techniques
were generally applied at the classification stage, while the image processing algorithms and
techniques were applied in the prior stages of the whole process.
The implementation started with preprocessing and color-based segmentation. GLCM features were
then extracted from the segmented image, and these features (in addition to labels) were used as
training inputs to the classification model. For classification, four different ML models were
created and trained for separate testing and comparison. According to the evaluation applied, NB
scored a classification accuracy of 80.2%, a sensitivity of 79.1%, and a specificity of 81.8%.
K-NN scored an accuracy of 95%, a sensitivity of 91.6%, and a specificity of 100%. SVM scored an
accuracy of 96.2%, a sensitivity of 93.6%, and a specificity of 100%. RF scored an accuracy of
98.7%, a sensitivity of 97.8%, and a specificity of 100%. From this we can conclude that all the
models except NB are successful at identifying the healthy samples and good at identifying the
infected ones. Overall, the RF model outperforms the others in all evaluations, and NB is the
model with the lowest scores.
This research work tried to identify possible supervised classification algorithms for binary
classification of wheat leaf ‘septoria tritici’ disease and the performance these algorithms can
achieve. The selected and compared algorithms are not the only algorithms available for such an
application. Hopefully, the results of this research will contribute to studies on machine
learning based agricultural automation.
5.2 Recommendation
The Ethiopian economy depends heavily on the successful practice of agriculture. Image
processing and machine learning technologies are of primary importance in identifying diseases
of agricultural products such as cereal crops, fruits, vegetables, and others. In Ethiopia, as
per the related work review, no research has been conducted to classify wheat leaf disease in
this direction to support the sector, although some researchers have contributed by taking cases
such as maize leaf and coffee leaf disease identification [31] [40]. This is still not enough.
Hence, this research may motivate researchers to work more in the area. Researchers who are
interested in designing and developing a system for plant leaf disease classification or
identification with similar techniques can adopt the facts, algorithms, rules, and feature
analysis concepts used in this study, with or without modification. In addition to the work
already done, the following recommendations are made for further improvement and research.
• In this research only a single disease of a single cereal crop was considered, but many
disease types can affect different types of cereal crops. Interested researchers can
enlarge the scope of this research to address the additional diseases that are
discussed in Chapter Two.
• Other acquisition methods can be used for similar problems; drone technology and
satellite imagery are possible ways of acquisition that could help provide fully
automated and improved solutions.
• Desktop, mobile, or web-based applications can be developed to deliver the solution
to its primary beneficiaries.
References
[1] Solomon Mulugeta Kassa, in Girimite SciTech, Addis Ababa, Rohobot Publishers, 2018,
p. 258.
[4] Addis Ababa Science and Technology University, Curriculum for MSc in Software
Engineering, 2017.
[5] Endale, Hailu; Getaneh, Weldeab, "Survey of Rust and Septoria Leaf Blotch Diseases of
Wheat in Central Ethiopia and Virulence Diversity of Stem Rust Puccinia graminis f. sp.
tritici," Advances in Crop Science and Technology, vol. II, no. 3, pp. 1-5, 2015.
[9] Wheat Leaf Identification, Kansas State University, Agricultural Experiment Station,
2017.
[10] Wafaa M, Haggag, "Wheat Diseases in Egypt and its management," Journal of Applied
Sciences Research, vol. IX, no. 1, pp. 46-50, 2013.
[12] Rafael C, Gonzalez; Richard E, Woods, Digital Image Processing, Upper Saddle River,
New Jersey: Pearson Education, Inc., 2008.
[13] Arya, M S; Anjali, K; Divya, Uni, "Detection of Unhealthy Plant Leaves Using Image
Processing and Genetic Algorithm with Arduino," IEEE, 2018.
[16] Anand, R; Veni, S; Aravinth, J, "An Application of image processing techniques for
Detection of Diseases on Brinjal Leaves Using K-Means Clustering Method," in
International Conference On Recent Trends In Information Technology, 2016.
[19] Janwale Asaram, Pandurng; Santosh S, Lomte, "Digital Image Processing Applications
in Agriculture: A Survey," International Journal of Advanced Research in Computer
Science and Software Engineering, vol. V, no. 3, pp. 622-624, 2015.
[20] M, Akila; P, Deepak, "Detection and Classification of Plant Leaf Diseases by Using
Deep Learning Algorithm," International Journal of Engineering Research and
Technology, vol. VI, no. 7, pp. 1-5, 2018.
[21] "Detection and Classification of Plant Leaf Diseases Using Image Processing
Techniques: A Review," International Journal of Recent Advances in Engineering &
Technology, vol. II, no. 3, pp. 1-7, 2014.
[22] Vijay, Singh; AK, Musta, "Detection of plant leaf diseases using image segmentation
and soft computing techniques," Information Processing in Agriculture, pp. 1-9, 2016.
[23] Pranjali B, Padol; S D, Sawant, "Fusion Classification Technique Used to Detect Downy
and Powdery Mildew Grape Leaf Diseases," International Conference on Global Trends
in Signal Processing, Information Computing and Communication, pp. 298-301, 2016.
[24] H, Sabrol; K, Satish, "Tomato Plant Disease Classification in Digital Images using
Classification Tree," in International Conference on Communication and Signal
Processing, 2016.
[25] S. Aasha, Nandhini; R, Hemalatha; S., Radha; K., Indumathi, "Web Enabled Plant
Disease Detection System for Agricultural Applications Using WMSN," Springer
Journals, 2017.
[26] Kangshun, Li; Lu, Doing; Dongbo, Zhang; Zhengping, Ling; Yu, Xue, "The Research of
Disease Spots Extraction Based on Evolutionary Algorithm," Hindawi Journal of
Optimization, 2017.
[27] Peifeng, Xu; Gangshan, Wu; Yijia, Guo; Xianoyin, Chen; Heating, Yang; Rongbiao,
Zhang, "Automatic Wheat Leaf Rust Detection and Grading Diagnosis via Embedded
Image Processing System," International Congress of Information and Communication
Technology, no. 107, pp. 836-841, 2017.
[28] Pooja, D; Rahul, Das; Kanchan, V, "Identification of Plant Leaf Disease Using Image
Processing Techniques," in IEEE International Conference on Technological
Innovations in ICT for Agriculture and Rural Development, 2017.
[29] Artzai, Picon; Aitor, Alvarez-Gila; Maximiliam, Seitz; Amalia, Ortiz-Barredo; Home,
Echazarra, "Deep convolutional neural networks for mobile capture device-based crop
disease classification in the wild," Computers and Electronics in Agriculture, pp. 1-13,
2018.
[30] D. M. Abrham, G. M. Seffi and M. A. Dagnachew, Image Analysis for Ethiopian Coffee
Plant, Bahir Dar University.
[31] A. Enquhone, "Maize Leaf Diseases Recognition and Classification Based on Imaging
and Machine Learning Techniques," International Journal of Innovative Research in
Computer and Communication Engineering, vol. 5, no. 12, 2017.
[32] M. Trishen, R. Mahess, K. Somveer and P. Sameerchand, "Plant leaf recognition using
shape features and colour histogram with k-NN classifiers," in Second International
Symposium on Computer Vision and The Internet, 2015.
[33] R. Nikita and J. S. Gill, "An Overview on Detection and Classification of Plant Diseases
in Image Processing," International Journal of Scientific Engineering and Research
(IJSER), vol. V, no. 5, pp. 114-117, 2014.
[34] S. Vijai and K. M. A, "Detection of plant leaf diseases using image segmentation and
soft computing techniques," Information Processing in Agriculture, vol. 4,
pp. 41-49, 2017.
[35] Vijai, Singh; A, K. Misra, "Detection of plant leaf diseases using image segmentation
and soft computing techniques," Information Processing in Agriculture,
vol. 4, pp. 41-49, 2017.
[36] Nikos Petrellis, A Review of Image Processing Techniques Common in Human and
Plant Disease Diagnosis, 2018.
[39] "Spyder Website," The Spyder Website Contributors, 2018. [Online]. [Accessed April
2019].
[41] Enquhone, Alehegn, "Maize Leaf Diseases Recognition and Classification Based on
Imaging and Machine Learning Techniques," International Journal of Innovative
Research in Computer and Communication Engineering, vol. 5, no. 12, 2017.
[42] Nikos Petrellis, A Review of Image Processing Techniques Common in Human and
Plant Disease Diagnosis, Computer Science and Engineering Department,
Technological Educational Institute of Thessaly, 2018.
[43] Michael Beyeler, Machine Learning for OpenCV, Birmingham: Packt Publishing Ltd.,
2017.
[44] Gabriel Garrido, Prateek Joshi, OpenCV 3.x with Python By Example, Second Edition,
Birmingham: Packt Publishing Ltd., 2018.
[45] Samuel Gebreselassie, Mekbib G. Haile, Mathias Kalkuhl, The Wheat Sector in
Ethiopia: Current Status and Key Challenges for Future Value Chain Development,
Bonn: Zef, Center for Development Research, University of Bonn, 2017.