Leaf Disease Detection Using Decision Tree

A Dissertation Work Entitled on
LEAF DISEASE DETECTION USING DECISION TREE

A project Report submitted in partial fulfillment of the requirements for the award of the
degree of
MASTER OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
BY
FATIMA MASOOD
160617742120
Under the Esteemed Guidance of
SUMAYYA AFREEN
Professor, Department of Computer Science
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Stanley College of Engineering and Technology for Women
(Affiliated to Osmania University & Approved by AICTE, Accredited by NBA,
Accredited by NAAC “A” Grade)
Chapel Road, Abids, Hyderabad.
2019-2020
CERTIFICATE
This is to certify that the dissertation work entitled LEAF DISEASE DETECTION
USING DECISION TREE, submitted by Ms. FATIMA MASOOD (Roll No.
160617742120), a student of the Department of Computer Science & Engineering, Stanley
College of Engineering and Technology for Women, in partial fulfillment of the
requirements for the award of the degree of Master of Technology with Computer
Science and Engineering as specialization, is a record of bonafide work carried out by
her during the academic year 2019-2020.
Signature of the Supervisor
Dr. Srinivasu Badugu
PROFESSOR,
Department of CSE,
Stanley College of Engineering and Technology for Women

Signature of the Head of the Dept
Dr. B.V Ramana Murthy
PROFESSOR & HOD,
Department of CSE,
Stanley College of Engineering and Technology for Women
DECLARATION
I declare that the work reported in the dissertation entitled LEAF DISEASE
DETECTION USING DECISION TREE is a record of the work done by me in the
Department of Computer Science and Engineering.
No part of the thesis is copied from books, journals, or the internet; wherever material from
such sources has been used, the same has been duly acknowledged in the text. The reported
data is based on the dissertation work done entirely by me and not copied from any other source.
Name: FATIMA MASOOD

Roll No.:160617742120
ACKNOWLEDGEMENT
I feel satisfied presenting the project entitled “Leaf disease
detection using decision tree” with a favourable and warm
reception; it would not have come into existence without
the active support given to me from various quarters.
I thank Dr. Satya Prasad Lanka, the Principal, Stanley
College of Engineering and Technology for Women, for his
timely cooperation and for providing me all the required
facilities to complete the project successfully. I am extremely
grateful to the Head of the Department, Dr. B.V Ramana Murthy,
and to Dr. B. Srinivasu for providing excellent computing
facilities and a pleasant atmosphere for completing my project.
I would like to express my deep sense of respect and gratitude
towards my guide SUMAYYA AFREEN, who has been the
guiding force behind this work. I am greatly indebted to my guide
for the constant encouragement and invaluable advice, and for
propelling me further in every aspect of my academic life.
FATIMA MASOOD
(160617742120)
ABSTRACT
The economy depends heavily on agricultural productivity. Plant diseases matter greatly in
agriculture because they occur naturally, and failure to control them has serious consequences
for plants and therefore for the quality, quantity, and productivity of the produce. Timely and
accurate diagnosis of leaf diseases plays a major part in preventing losses in productivity and
in agricultural output. Detecting plant diseases with automated techniques is beneficial because
it reduces the effort of monitoring large farms and catches symptoms at a very early stage, as
soon as they appear on the leaves. Many researchers have proposed leaf disease detection
techniques, but the existing systems have low detection accuracy. The proposed system uses a
decision tree to identify and classify leaf disease, and increases detection accuracy while
taking less time than the existing system.
The present work proposes a methodology for detecting plant diseases early and accurately,
using diverse image processing techniques and a convolutional neural network (CNN).
Farmers experience great difficulties in changing from one disease control policy to another.
Relying on pure naked-eye observation to detect and classify diseases can be expensive.
Various plant diseases pose a great threat to the agricultural sector by reducing the life of the
plants. The present work aims to develop a simple detection system for plant diseases. The
work begins with capturing the images, which are then filtered and segmented. Texture and
color features are then extracted from the segmentation result, and a CNN is trained on the
feature values that best distinguish healthy from diseased samples. Experimental results
showed that classification by the CNN using this feature set performs better, with an accuracy
of 91%.
KEYWORDS: Plant disease and pest detection, Preprocessing, Feature Extraction,
Decision Tree Method, Classification
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 General Approaches
1.2 Problem Statement
1.3 Motivation
1.4 Aim & Scope
1.5 Background
1.6 Thesis organisation
2 LITERATURE SURVEY
2.1 Limitation of existing work
3 PROBLEM ANALYSIS & DESIGN
3.1 Experimental methods & algorithm
3.1.1 Decision tree modeling
3.2 CNN (convolutional neural network)
3.3 Model building
3.3.1 Proposed system
3.4 Software and hardware requirements
3.5 System designs
4 IMPLEMENTATION
4.1 Software used
4.2 Data preprocessing
5 RESULT
6 CONCLUSION & FUTURE WORKS
REFERENCES
LIST OF TABLES
Table 2.1 Comparative study of all authorship techniques
Table 5.1 Dataset of classifications
Table 5.9 Analyzing the various plant leaves
Table 5.10 Detection accuracy
Table 5.11 Comparison and accuracy
Table 5.12 Disease accuracy
LIST OF FIGURES
Fig 1.1 Three categories of these Machine Learning algorithm
Fig 1.2 Category of Approaches
Fig 1.3 Applications of DRL Approaches
Fig 3.1 Decision Tree
Fig 3.2 CNN
Fig 3.5 Architecture of proposed system
Fig 3.6 Waterfall Model
Fig 3.7 Use case model
Fig 3.8 Sequence model
Fig 3.9 Activity model
Fig 5.2 Sample images (healthy)
Fig 5.3 Sample images (unhealthy)
Fig 5.4 I/O of an image
Fig 5.5 Browse images
Fig 5.6 Analyses the images
Fig 5.7 Status of leaf
Fig 5.8 Remedies to protect the plant
Chapter 1
INTRODUCTION
Machine Learning (ML), a subset of Artificial Intelligence (AI) that dates back to the 1950s,
has revolutionized several fields in the last few decades.
Machine learning, as a powerful approach to achieving Artificial Intelligence, has been widely
used in pattern recognition, a very basic skill for humans but a challenge for machines.
Nowadays, with the development of computer technology, pattern recognition has become an
essential technique in the field of Artificial Intelligence. Pattern recognition can identify
letters, images, voices, or other objects, and can also identify status, extent, or other
abstractions. Machine learning is the scientific study of the statistical procedures and methods
that computer systems use to perform such functions without explicit instructions, relying on
patterns and inference instead. It is regarded as a part of Artificial Intelligence.
MACHINE LEARNING ALGORITHM

Machine learning algorithms build a mathematical model from sample data, called "training
data", in order to make predictions without being explicitly programmed for the task.
India is an agriculture-dependent country. 70% of the Indian economy depends on agriculture,
but leaf infections cause the loss of major crops, which results in economic loss. In plants,
diseases usually occur on leaves, fruit, buds, and branches, and damage the plant. For this
reason, it is very important to identify the disease first and take the necessary precautions
before it spreads to other trees. The fight against diseases and plant pests is therefore the
most important issue in agriculture. The current method for detecting plant diseases is simple
naked-eye observation by professionals who identify and detect plant diseases. This requires a
large team of experts and ongoing monitoring of the plants, which is very costly on large
farms. Consulting experts is therefore both expensive and time-consuming. In such
circumstances, the proposed method is useful for monitoring the area. Image processing is also
used for image-based automatic monitoring, process control, and robot guidance. Visual
recognition of plant diseases is more laborious and less accurate, and can be done only in
limited areas. Automatic detection techniques require very little effort and very little time,
and are more accurate.
Brown and yellow spots and early and late scorch are frequently observed in plants, while
other diseases are fungal, viral, or bacterial. Image analysis is used to assess the affected
area and to determine the colour variation within it. Image segmentation is the mechanism
whereby an image is divided or clustered into several sections. Today there are many ways to
segment images, from simple threshold methods to sophisticated segmentation methods. The
segmentation process is based on various features in the image. In the proposed system, leaf
infection detection and diagnosis is performed with a decision tree method, and its benefit is
demonstrated by measuring execution time. In the literature, studies are usually carried out
with synthetic images of plants and pests. The proposed system was tested on this dataset, and
the results clearly show that it can be used in real applications.
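The simple threshold method mentioned above can be illustrated with a minimal sketch. It is not the segmentation used in this work: the function names, the tiny grayscale grid, and the assumption that affected pixels are brighter than the threshold are all hypothetical and chosen only for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale intensities in the range 0..255


def threshold_segment(image: Image, threshold: int) -> List[List[int]]:
    """Return a binary mask: 1 where the pixel exceeds the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def affected_fraction(mask: List[List[int]]) -> float:
    """Fraction of pixels marked 1 in the mask (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    marked = sum(sum(row) for row in mask)
    return marked / total if total else 0.0


# Hypothetical 2x2 image: two bright pixels, two dark pixels.
leaf = [[10, 200], [220, 30]]
mask = threshold_segment(leaf, threshold=128)
print(mask, affected_fraction(mask))
```

In practice the threshold would have to be chosen per image or per colour channel; the fixed value here only shows how a mask yields an estimate of the affected area.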
• Machine learning (ML) is a branch of Artificial Intelligence that pushes forward the
idea that, given access to the right data, machines can learn by themselves how to
solve a specific problem. It may seem that Machine Learning, from now on referred to
as ML, is something very novel, but in fact ML started in the 1950s. It has been
known by different names throughout its history: statistical learning, pattern
recognition, etc.
• ML has become a trendy topic because it is now easier to use and far more effective.
This can be attributed to the improvement in the computing power available to the
common user and to the quantity of data that is handled and digitised at present (Big
Data).
• Machine learning is a category of algorithms that allows software applications to become
more accurate in predicting outcomes without being explicitly programmed. The basic
premise of machine learning is to build algorithms that can receive input data and use
statistical analysis to predict an output, updating that output as new data becomes
available. In malicious-URL detection, for example, machine learning approaches use a
set of URLs as training data, which can sometimes be blacklists, and learn from their
statistical properties a prediction function that classifies a URL as malicious or
benign. Unlike blacklisting methods, this gives them the ability to generalize to new
URLs [6].
• The main objective of ML is to recognize patterns and take automatic decisions based on
previous training. Following the description usually attributed to Arthur Samuel, the
main purpose of machine learning is to learn how to work without human interaction
and without being explicitly programmed, trying to copy how the human brain works.
This is cheaper and faster than manual programming [7].
Agriculture is the mother of all cultures. It has played a key role in the development of human
civilization. Agricultural practices such as irrigation, crop rotation, fertilizers, and pesticides
were developed long ago, but have made great strides in the past century. By the early 19th
century, agricultural techniques had so improved that yield per land unit was many times that
seen in the Middle Ages. The agricultural production system is the outcome of a complex
interaction of soil, seed, and agrochemicals (including fertilizers). Judicious management of
all the inputs is therefore essential for the sustainability of such a complex system. The
focus on enhancing productivity without considering the ecological impact has resulted
in environmental degradation. Productivity can, however, be enhanced in a sustainable manner
without adverse consequences. Plants exist everywhere we live, as well as in places where we
do not. Many of them carry significant information for the development of human
society. As plant diseases are inevitable, detecting disease plays a major role in the
field of agriculture. Plant disease is one of the crucial causes that reduce the quantity and
degrade the quality of agricultural products.
Diseases and insect pests are the major problems that threaten pomegranate cultivation. These
require careful diagnosis and timely handling to protect the crops from heavy losses [2]. In
the pomegranate plant, diseases can be found in various parts such as the fruit, stem, and
leaves. Major diseases that affect pomegranate fruit include bacterial blight (Xanthomonas
axonopodis pv. punicae) and anthracnose.
1.2 PROBLEM STATEMENT
India is an agriculture-dependent country. 70% of the Indian economy depends on
agriculture, but leaf infections cause the loss of major crops, which results in economic
loss. In plants, diseases usually occur on leaves, fruit, buds, and branches, and damage the
plant. For this reason, it is very important to identify the disease first and take the
necessary precautions before it spreads to other trees. The fight against diseases and plant
pests is therefore the most important issue in agriculture. The current method for detecting
plant diseases is simple naked-eye observation by professionals who identify and detect plant
diseases. This requires a large team of experts and ongoing monitoring of the plants, which is
very costly on large farms. Consulting experts is therefore both expensive and time-consuming.
In such circumstances, the proposed method is useful for monitoring the area. Image processing
is also used for image-based automatic monitoring, process control, and robot guidance. Visual
recognition of plant diseases is more laborious and less accurate, and can be done only in
limited areas. Automatic detection techniques require very little effort and very little time,
and are more accurate.
1.2.1 Pesticides in Freshwater Ecosystems
Pesticides are currently applied on a large scale in agricultural crops, but also in urban areas,
private gardens, and households. Pesticides enter freshwater ecosystems via, for example,
surface runoff, spray drift, or wastewater treatment plants. The study by Ippolito et al.
shows that more than 40% of the global land area is at risk of insecticide runoff, as displayed
in the figure. The authors modelled insecticide exposure using the runoff potential model [7].
Up to 18% of the global land area is predicted to cause high to very high insecticide runoff
into draining freshwaters. The parameters that contribute to a high runoff potential are
predominantly pesticide use, proportion of cropland, precipitation, slope, and soil
characteristics. To validate the exposure estimates, the authors compared the predicted runoff
potential with pesticide concentrations measured in streams in field studies in Europe and
Australia. While the runoff potential model mainly represents the risk potential of certain
regions towards pesticide exposure, high pesticide concentrations in freshwater systems have
also been reported in several studies. Examples are included in a recent study by Stehle and
Schulz [8], which detected insecticide concentrations exceeding regulatory thresholds in 50% of
the investigated concentrations at a global scale. Also, Malaj et al. [9] reviewed the available
exposure monitoring studies of organic pollutants in European freshwater systems and
identified pesticides as one of the major contributors to toxicant exposure of freshwater
ecosystems.
FIG: Global insecticide runoff potential map. The map shows the spatial distribution of
potential insecticide runoff to stream ecosystems considering agricultural activities,
geomorphological and climatic conditions. The class boundaries of the runoff potential are
−3, −2, −1, and 0. (Reprinted from Ippolito et al. [2], with permission.)
FIG: Concentration–response relationships between the pesticide concentration (Toxic Unit)
and mean overall taxa richness of stream invertebrates. The relationships are given for
species and family richness.
1.3 MOTIVATION
The fight against diseases and plant pests is the most important issue in agriculture. The
current method for detecting plant diseases is simple naked-eye observation by
professionals who identify and detect plant diseases. This requires a large team of experts and
ongoing monitoring of the plants, which is very costly on large farms. Consulting experts is
therefore both expensive and time-consuming. In such circumstances, the proposed method is
useful for monitoring the area. Image processing is also used for image-based automatic
monitoring, process control, and robot guidance.
1.4 AIM AND SCOPE OF THE PROJECT
1.4.1 DECISION TREE
A decision tree builds classification or regression models in the form of a tree structure. It
breaks down a dataset into smaller and smaller subsets while, at the same time, an associated
decision tree is incrementally developed. The final result is a tree with decision nodes and
leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast,
and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost
decision node in a tree, which corresponds to the best predictor, is called the root node.
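The structure described above can be sketched as nested dictionaries. This is a minimal illustration, not the model built in this work: only Outlook, its three branches, and the Play decision come from the text, while the Humidity and Windy nodes and the leaf values are assumptions borrowed from the usual textbook example.

```python
# Decision nodes are dicts: {"attribute": name, "branches": {value: subtree}}.
# Leaf nodes are plain strings holding the decision for "Play".
TREE = {
    "attribute": "Outlook",  # root node
    "branches": {
        "Sunny": {  # assumed sub-tree, for illustration only
            "attribute": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
        "Overcast": "Yes",
        "Rainy": {  # assumed sub-tree, for illustration only
            "attribute": "Windy",
            "branches": {"True": "No", "False": "Yes"},
        },
    },
}


def classify(tree, sample):
    """Walk from the root to a leaf, following the branch that matches the sample."""
    node = tree
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node


print(classify(TREE, {"Outlook": "Sunny", "Humidity": "High"}))
```

Classification cost is bounded by the depth of the tree, since each sample visits one decision node per level.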
Decision trees can handle both categorical and numerical data. The core algorithm for
building decision trees, called ID3 and developed by J. R. Quinlan, employs a top-down, greedy
search through the space of possible branches with no backtracking. ID3 uses Entropy and
Information Gain to construct a decision tree. In the ZeroR model there is no predictor; in the
OneR model we try to find the single best predictor; naive Bayes includes all predictors using
Bayes' rule and an independence assumption between predictors; a decision tree includes
all predictors with dependence assumptions between predictors.
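The two quantities that ID3 relies on can be made concrete with a short sketch. For a set S with class proportions p_i, entropy is H(S) = -sum(p_i * log2(p_i)), and the information gain of an attribute A is H(S) minus the size-weighted average entropy of the subsets obtained by splitting on A. The data below are toy rows assumed for illustration (the Outlook/Play counts of the usual textbook example), not data from this project.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Assumed toy data: 14 rows, 9 "Yes" and 5 "No".
ROWS = (
    [{"Outlook": "Sunny", "Play": "Yes"}] * 2
    + [{"Outlook": "Sunny", "Play": "No"}] * 3
    + [{"Outlook": "Overcast", "Play": "Yes"}] * 4
    + [{"Outlook": "Rainy", "Play": "Yes"}] * 3
    + [{"Outlook": "Rainy", "Play": "No"}] * 2
)

print(round(entropy([r["Play"] for r in ROWS]), 3))          # about 0.940
print(round(information_gain(ROWS, "Outlook", "Play"), 3))   # about 0.247
```

ID3 would compute this gain for every candidate attribute, split on the one with the largest value, and repeat on each subset without revisiting earlier choices.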
The aim of this research is as follows:
• To identify and classify leaf disease and increase detection accuracy while taking less
time than the existing system.
The main objective of this project is to detect the unhealthy region of plant leaves using
machine learning. Specifically, it is to detect the diseased portion of the leaf in an image,
to extract features from the detected portion, and to recognize the disease from the detected
portion through machine learning algorithms, i.e. the CNN and decision tree algorithms.
1.4.2 MACHINE LEARNING SCOPE
Machine learning is a promising approach to achieving human-computer integration and can
be applied in many fields of computing. Machine learning is not a single method, as it
comprises many different computer algorithms. These algorithms aim to solve different machine
learning tasks. Ultimately, all of them help the computer to act more like a human.
Machine learning is already applied in many fields, for instance pattern recognition,
Artificial Intelligence, computer vision, data mining, text categorization, and so on. Machine
learning gives a new way to develop the intelligence of machines. It also offers an
easier way to help people analyse data from huge data sets. Learning methods are a
complicated topic and take many different forms. Everyone has different methods
of studying, and so does the machine. We can categorize machine learning systems by
different criteria. In general, we can separate learning problems into two main categories:
supervised learning and unsupervised learning. Learning is a very important feature of
Artificial Intelligence. Many scientists have tried to explain learning and give it a proper
definition. However, learning is not easy to cover in a few simple sentences. Many
computer scientists, sociologists, logicians, and other scientists have discussed this for a
long time. Some scientists think learning is an adaptive skill that lets the system perform a
similar task better the next time (Simon 1987). Others claim that learning is a process of
collecting knowledge (Feigenbaum 1977). Even though there is no agreed definition of
learning, we still need to give a definition for machine learning. In general, machine
learning aims to find out how computer algorithms can be improved automatically
through experience (Mitchell 1997). Machine learning has an important position in the field
of Artificial Intelligence, of which it is a subset. At the beginning of the development of
Artificial Intelligence (AI), AI systems did not have a thorough learning ability, so the whole
system was not perfect. For instance, a computer could not adjust itself when it faced
problems. Moreover, the computer could not automatically collect and discover new knowledge.
A. Types of DL approaches:
Like machine learning, deep learning approaches can be categorized as follows: supervised,
semi-supervised or partially supervised, and unsupervised. In addition, there is another
category of learning called Reinforcement Learning (RL) or Deep RL (DRL) which are often
discussed under the scope of semi supervised or sometimes under unsupervised learning
approaches.
Fig. 1. AI: Artificial Intelligence, ML, NN, DL, and Spiking Neural Networks (SNN)
according to [294].
1) Supervised Learning
• Supervised learning is a commonly used machine learning approach that appears
in many different fields of computer science. In the supervised learning method, the
computer builds a learning model based on the training data set.
• Using this learning model, the computer can predict or analyze new information. With
suitable algorithms, it can search for the best result and reduce the error rate by
itself. Supervised learning is mainly used for two kinds of task: classification and
regression.
• In supervised learning, each sample that the developer gives the computer comes with
its classification information (a label). The computer analyzes these samples to gain
learning experience, so that the error rate is reduced when a classifier later
recognizes each pattern.
• Supervised learning is a learning technique that uses labeled data. In the case of
supervised DL approaches, the environment has a set of inputs and corresponding
outputs $(x_t, y_t) \sim \rho$. For example, if for input $x_t$ the intelligent agent
predicts $\hat{y}_t = f(x_t)$, the agent will receive a loss value $l(y_t, \hat{y}_t)$.
The agent will then iteratively modify the network parameters for better approximation
of the desired outputs. After successful training, the agent will be able to get the
correct answers to questions from the environment. There are different supervised
learning approaches for deep learning, including Deep Neural Networks (DNN),
Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including
Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU). These networks are
described in Sections 2, 3, 4, and 5 respectively.
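The predict, measure loss, adjust parameters cycle described above can be sketched with the smallest possible model. This is only an illustration under stated assumptions: a one-variable linear predictor f(x) = w*x + b stands in for the network, the loss is the squared error l(y, y_hat) = 0.5*(y_hat - y)^2, and the four labeled samples are invented for the example.

```python
def train_linear(samples, learning_rate=0.05, epochs=500):
    """Fit f(x) = w*x + b to labeled (x, y) pairs by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            y_hat = w * x + b          # the agent's prediction for input x
            error = y_hat - y          # derivative of 0.5*(y_hat - y)^2 w.r.t. y_hat
            w -= learning_rate * error * x   # gradient step on the weight
            b -= learning_rate * error       # gradient step on the bias
    return w, b


# Hypothetical labeled data generated from y = 2x + 1.
SAMPLES = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = train_linear(SAMPLES)
print(round(w, 3), round(b, 3))
```

A deep network follows the same loop, with the two scalar updates replaced by backpropagated gradients over all network parameters.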
2) Semi-supervised Learning
• Semi-supervised learning is learning that occurs based on partially labeled datasets
(often also called reinforcement learning). Section 8 of this study surveys DRL
approaches. In some cases, DRL and Generative Adversarial Networks (GAN) are
used as semi-supervised learning techniques. Additionally, RNNs, including LSTM and
GRU, are used for semi-supervised learning as well.
3) Unsupervised learning
• Unsupervised learning systems are ones that can learn without the presence of data
labels. In this case, the agent learns the internal representation or important features
to discover unknown relationships or structure within the input data. Clustering,
dimensionality reduction, and generative techniques are often considered unsupervised
learning approaches. Several members of the deep learning family are good at
clustering and non-linear dimensionality reduction, including Auto Encoders (AE),
Restricted Boltzmann Machines (RBM), and the recently developed GAN. In addition,
RNNs, such as LSTM and RL, are also used for unsupervised learning in many
application domains [243]. Sections 6 and 7 discuss RNNs and LSTMs in detail.
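As a small illustration of learning structure without labels, the sketch below clusters unlabeled one-dimensional points with k-means. It is a stand-in chosen for brevity, not one of the deep models named above; the points and the starting centroids are assumptions made for the example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled measurements forming two visible groups.
POINTS = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

centers, groups = kmeans_1d(POINTS, centroids=[0.0, 6.0])
print(centers, groups)
```

No labels are supplied at any point; the grouping emerges from distances alone, which is the sense in which the method is unsupervised.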
4) Deep Reinforcement Learning (DRL)
• Deep Reinforcement Learning is a learning technique for use in unknown
environments. DRL began in 2013 with Google DeepMind [5, 6]. Since then,
several advanced methods have been proposed based on RL. Here is an example of
RL: the environment samples an input $x_t \sim \rho$, the agent predicts
$\hat{y}_t = f(x_t)$, and the agent receives a cost $c_t \sim P(c_t \mid x_t, \hat{y}_t)$,
where $P$ is an unknown probability distribution. In other words, the environment asks
the agent a question and gives a noisy score as the answer. Sometimes this approach is
called semi-supervised learning as well. Many semi-supervised and unsupervised
techniques have been implemented based on this concept.
• In RL, we do not have a straightforward loss function, which makes learning harder
than in traditional supervised approaches. The fundamental differences between
RL and supervised learning are: first, you do not have full access to the function you
are trying to optimize and must query it through interaction; and second, you are
interacting with a state-based environment, so the input $x_t$ depends on previous
actions.
• Depending on the problem scope or space, you can decide which type of RL needs
to be applied to solve a task. If the problem has a lot of parameters to be optimized,
DRL is the best way to go. If the problem has fewer parameters to optimize, a
derivative-free RL approach is good. Examples of this are annealing, cross-entropy
methods, and SPSA. We conclude this section with a quote from Yann LeCun:
• “If intelligence was a cake, unsupervised learning would be the cake, supervised
learning would be the icing, and reinforcement learning would be the cherry.” –
Yann LeCun
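The derivative-free setting described in this section, where the agent only sees a noisy cost and has no gradient, can be sketched with the cross-entropy method. This is a minimal illustration under stated assumptions: the environment `noisy_cost`, its optimum at 3.0, the noise level, and all hyperparameters are hypothetical.

```python
import random


def noisy_cost(theta, rng):
    """Hypothetical environment: true optimum at theta = 3, observed with noise."""
    return (theta - 3.0) ** 2 + rng.gauss(0.0, 0.1)


def cross_entropy_method(cost, rng, mean=0.0, std=2.0,
                         population=50, elite=10, iterations=30):
    """Sample candidates, keep the lowest-cost ones, and refit the sampling Gaussian."""
    for _ in range(iterations):
        candidates = [rng.gauss(mean, std) for _ in range(population)]
        ranked = sorted(candidates, key=lambda theta: cost(theta, rng))
        best = ranked[:elite]
        mean = sum(best) / elite
        variance = sum((theta - mean) ** 2 for theta in best) / elite
        std = max(variance ** 0.5, 1e-3)  # floor keeps the search from collapsing
    return mean


rng = random.Random(0)  # fixed seed so the run is repeatable
print(cross_entropy_method(noisy_cost, rng))
```

The agent never differentiates the cost; it only queries it through interaction, which matches the first of the two differences from supervised learning noted above.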
What are anomalies?
Anomalies are also referred to as abnormalities, deviants, or outliers in the data mining and
statistics literature (Aggarwal [2013]). As illustrated in Figure 3, N1 and N2 are regions
containing the majority of observations and are hence considered normal data instance regions,
whereas the region O3 and the data points O1 and O2 are located further away from the bulk of
the data points and are hence considered anomalies.
1.5 BACKGROUND
Since the computer was invented, it has affected our daily life. It improves the quality
of our lives and makes them more convenient and more efficient. A fascinating idea is to let
a computer think and learn like a human. Basically, machine learning lets a computer
develop learning skills by itself from given knowledge. Pattern recognition can be thought of
as a computer being able to recognize different kinds of objects. Machine learning is therefore
closely connected with pattern recognition. In this project, the object is the Iris flower. The
Iris data set contains three different classes: Setosa, Versicolor, and Virginica. The
designed recognition system will distinguish these three classes of Iris.
Machine learning, as a powerful approach to achieve Artificial Intelligence, has been widely
used in pattern recognition, a very basic skill for humans but a challenge for machines.
Nowadays, with the development of computer technology, pattern recognition has become an
essential technique in the field of Artificial Intelligence. Pattern recognition can
identify letters, images, voices or other objects, and it can also identify status,
extent or other abstractions.
1.5.1 Supervised Learning:
Supervised learning is the machine learning task of inferring a function from labeled training
data. The training data consists of a set of training examples. In supervised learning, each
example is a pair consisting of an input object and a desired output value. A supervised
learning algorithm analyzes the training data and produces an inferred function, which can be
used for mapping new examples. In the optimal scenario, the algorithm correctly determines the
class labels for unseen instances. This requires the learning algorithm to generalize from the
training data to unseen situations in a “reasonable” way. Classification is a supervised
learning problem where there is an input, X, an output, Y, and the task is to learn the mapping
from the input to the output. In machine learning methods, the class is detected by using
classification approaches based on a training dataset.
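The mapping from an input X to an output Y described above can be illustrated with a very small sketch. The code below is not the classifier used in this work; it is a 1-nearest-neighbour example in plain Python, and the feature names and values (mean green intensity, spot ratio) are hypothetical and chosen only for illustration.

```python
# Minimal supervised-learning sketch: learn a mapping from inputs X to labels Y.
# All feature values and labels below are made up for illustration only.

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fit(X, Y):
    """'Training' for 1-nearest-neighbour simply stores the labeled pairs."""
    return list(zip(X, Y))

def predict(model, x):
    """Return the label of the stored training example closest to x."""
    _, label = min(model, key=lambda pair: squared_distance(pair[0], x))
    return label

# Hypothetical training set: (mean green intensity, spot ratio) -> class label.
X_train = [(0.80, 0.02), (0.75, 0.05), (0.40, 0.30), (0.35, 0.45)]
Y_train = ["healthy", "healthy", "diseased", "diseased"]

model = fit(X_train, Y_train)
prediction = predict(model, (0.42, 0.33))  # label for an unseen instance
```

The training pairs play the role of the labeled examples, and predict is the inferred function applied to an instance that was not part of the training data.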
In supervised learning, the output datasets are provided, and they are used to train the
machine to produce the desired outputs. The manual extraction of patterns from data has
occurred for centuries. Early methods of identifying patterns in data include Bayes'
theorem (1700s) and regression analysis (1800s). The increasing power of computer
technology has increased data collection, storage, and manipulation. As data sets have
grown in size and complexity, direct hands-on data analysis has increasingly been
augmented with indirect, automatic data processing. This has been aided by other
algorithms (1950s), decision trees (1960s) and support vector machines (1990s) [9].
Support Vector Machine (SVM)
The support vector machine is one of the supervised learning models, with associated learning
algorithms, used for classification and regression analysis. SVM can perform linear
classification by building a model that assigns new instances to one class or the other. In
addition, SVM can efficiently perform nonlinear classification using kernel methods. This
makes SVM easily applicable to both linear and nonlinear data. SVM has been used in various
real-world classification problems. It has been widely applied to text classification, as
it can significantly reduce the size of the training set in both the standard inductive and
transductive settings [10]. SVM is also used for image classification. A support vector
machine active learning (SVM Active) algorithm proposed by Edward Chang and Simon
Tong “has achieved significantly higher search accuracy than traditional query refinement
schemes” [11].
Some advantages of SVM are:
 SVM is more effective than most classifiers in many applications involving very high
dimensional data.
 SVM can still be effective in cases where the number of dimensions is greater than the
number of samples.
 SVM is also memory efficient because it uses a subset of training points in the decision
function (called support vectors).
 SVM is versatile as different Kernel functions can be specified for the decision function.
Common kernels are provided, but it is also possible to specify custom kernels.
Some disadvantages of SVM are:
 If the number of features is much greater than the number of samples, SVM is prone to
over-fitting unless the kernel function and regularization term are chosen carefully.
 SVMs do not directly provide probability estimates; these are calculated using
cross-validation, which is expensive, especially for large data sets.
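To make the role of the kernel function and the support vectors more concrete, the following sketch evaluates an SVM-style decision function in plain Python. It is only an illustration under stated assumptions: the support vectors, coefficients, bias and the gamma value are hypothetical, and no training is performed here.

```python
import math

def linear_kernel(u, v):
    """Linear kernel: the ordinary dot product."""
    return sum(ui * vi for ui, vi in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, dual_coefs, bias, kernel):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias; the sign of f(x) gives the class."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical model: two support vectors, coefficients (alpha_i * y_i) and bias.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
dual_coefs = [1.0, -1.0]
bias = 0.0

x_new = (2.0, 0.5)
score_linear = decision_function(x_new, support_vectors, dual_coefs, bias, linear_kernel)
score_rbf = decision_function(x_new, support_vectors, dual_coefs, bias, rbf_kernel)
```

Only the support vectors enter the sum, which is why the decision function is memory efficient, and swapping the kernel argument changes the shape of the decision boundary without changing the rest of the code.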
1.6 THESIS ORGANISATION:
The thesis consists of six chapters, organized as follows:
 Chapter One: Introduction: an overview of leaf disease detection.
 Chapter Two: Literature review: this chapter provides an overview of related work in
leaf disease detection and a summary of articles published by other researchers.
 Chapter Three: Problem Analysis and Design: this chapter provides a brief description
of the problems of the existing system, the proposed system, the different feature
extraction techniques and the evaluation metrics. It also gives an overview of the
software used to evaluate the proposed method and the system design.
 Chapter Four: Implementation: the datasets used in this research, and details of the
experiments and the classification algorithm.
 Chapter Five: Results: comparison of results.
 Chapter Six: Conclusion and future work.
CHAPTER-2
LITERATURE SURVEY
The authors of paper [1] describe a processing scheme with four main steps. First, a color
transformation structure is created for the RGB input image: RGB is used for color generation,
and the image is converted from RGB to HSI, which is used as a color descriptor. In the second
step, the green pixels are masked and removed using a threshold value. Third, using previously
computed threshold levels, the green pixels are removed and masked for the useful segments
that are extracted first in this step, while the image is segmented. Segmentation is performed
in the fourth and last main step.
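The color transformation and green-pixel masking steps summarized above can be sketched as follows. This is not the implementation of [1]: the RGB-to-HSI formulas are the standard ones, while the hue band (60 to 180 degrees) and the saturation threshold used to decide that a pixel is green are assumptions made only for this example.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert RGB values in [0, 1] to (hue in degrees, saturation, intensity)."""
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0
    saturation = 1.0 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        hue = 0.0  # gray pixel: hue is undefined, 0 is used by convention
    else:
        hue = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        if b > g:
            hue = 360.0 - hue
    return hue, saturation, intensity

def mask_green(pixels, hue_range=(60.0, 180.0), min_saturation=0.2):
    """Set mostly-green pixels to black; thresholds are illustrative assumptions."""
    masked = []
    for (r, g, b) in pixels:
        hue, sat, _ = rgb_to_hsi(r, g, b)
        is_green = hue_range[0] <= hue <= hue_range[1] and sat >= min_saturation
        masked.append((0.0, 0.0, 0.0) if is_green else (r, g, b))
    return masked

# Three hypothetical pixels: healthy green, brownish spot, gray background.
pixels = [(0.1, 0.8, 0.1), (0.6, 0.4, 0.2), (0.5, 0.5, 0.5)]
masked = mask_green(pixels)
```

After masking, only the non-green pixels remain, so later steps can concentrate on regions that may show disease symptoms.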
Mrunalini et al. [2] present techniques for classifying and identifying various diseases that
affect plants. In the Indian economy, machine learning based recognition systems will prove
to be very useful because they save effort, money, and time. The approach described there for
extracting a set of features is the color co-occurrence method. A neural network is
used to automatically detect disease in the leaves. The proposed approach can significantly
contribute to the accurate detection of leaf diseases with little computational effort, and
appears to be an important approach for stem and root diseases as well.
Kulkarni et al. present methods for early and accurate detection of plant diseases
using artificial neural networks (ANN) and various image processing techniques. Because the
proposed approach is based on an ANN classifier for classification and a Gabor filter for
feature extraction, it achieves better results, with recognition rates of up to 91%. The
ANN-based classifier classifies various plant diseases and uses a combination of texture,
color, and features to identify these diseases [3].
The authors present disease detection in Malus domestica using effective methods such as
K-means clustering, texture and color analysis [4].
To classify and recognize various plants, the texture and color features that usually appear
in normal and affected areas are used. In the future, K-means clustering, a Bayes classifier
and a principal component classifier can also be used for classification. [7]
In this article, disease detection is carried out by comparing the effect of the HSI,
CIELAB and YCbCr color spaces. The images are smoothed using a median filter. In the final
step, the threshold is calculated by applying the Otsu method to a color component in order to
find the disease spots. The experimental results contain noise caused by the background,
camera flash and leaf veins. The CIELAB color model is used to eliminate this noise.
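The Otsu thresholding step mentioned above can be sketched in plain Python. The function below is a generic textbook version of the method that picks the threshold maximizing the between-class variance of a histogram; it is not taken from the surveyed article, and the sample intensity values are invented for the example.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    `values` is a flat list of integer intensities in [0, levels - 1].
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue              # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                 # all pixels are at or below t
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Hypothetical color-component values: dark disease spots and bright leaf area.
sample = [20, 22, 25, 30, 200, 210, 220, 230]
threshold = otsu_threshold(sample)
spot_mask = [v <= threshold for v in sample]
```

On a real image the values list would hold one color component of every pixel, and the resulting mask would mark the candidate disease spots.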
Table 2.1: Summary of survey articles

S.NO | AUTHOR | YEAR OF PUBLICATION | DESCRIPTION | PROS | CONS
1 | Prof. Sanjay B. et al. | 2013 | Agricultural plant leaf disease detection using image processing. | Vision-based detection algorithm with masking of the green pixels and color co-occurrence method. | NNs can be used to increase the recognition rate of the classification process.
2 | Mrunalini R | 2015 | An application of K-means clustering and artificial intelligence in pattern recognition for crop diseases. | K-means clustering algorithm with neural networks for automatic detection of leaf diseases. | Artificial neural networks and fuzzy logic with other soft computing techniques can be used to classify the crop diseases.
3 | Anand H. Kulkarni | 2016 | Applying image processing technique to detect plant diseases. | Gabor filter for feature extraction and ANN classifier for classification. | Recognition rate can be increased.
4 | Sabah Bashir | 2017 | Remote area plant disease detection using image processing. | Texture segmentation by co-occurrence matrix method and K-means clustering technique. | Bayes classifier, K-means clustering and principal component classifier can be used to classify various plants.
6 | Piyush Chaudhary | 2018 | Color transform based approach for disease spot detection on plant leaf. | Median filter is used for image smoothing and the threshold can be calculated by applying the Otsu method. | Disease spot area can be computed for assessment of loss in agricultural crops. Disease can be classified by calculating the dimensions of the disease spot.
7 | Muammer and Davut | 2019 | Plant disease and pest detection using deep learning. | Used for leaf detection by using deep neural networks for deep feature extraction. | Recognition rate can be more effective.
2.1 Limitations of existing work:
• In some cases, the application still does not produce accurate results. Further
optimization is needed.
• Prior information is needed for segmentation.
• Database extension is required for greater accuracy.
• Only a few diseases are covered. Therefore, the work must be expanded to cover more
diseases.
• Possible causes of misclassification include: disease symptoms vary from one plant to
another, the features need to be optimized, and more training samples are needed to cover
more cases and predict the actual disease.
CHAPTER - 3
PROBLEM ANALYSIS AND DESIGN
This chapter gives a brief description of the problem. It also discusses the potential solution
to the problem, i.e., the proposed system, and the different feature extraction techniques. I
discuss the requirements needed to accomplish the aim of this work, the software development
life cycle method followed during this research, and the system design.
EXPERIMENTAL METHODS AND ALGORITHMS USED
3.1 DECISION TREE MODELLING
In computational complexity, the decision tree model is the model of computation in which an
algorithm is basically a decision tree, i.e., a sequence of branching operations based on
comparisons of some quantities, the comparisons being assigned unit computational cost.
The branching operations are called "tests" or "queries". In this setting the algorithm in
question may be viewed as a computation of a Boolean function
$f:\{0,1\}^{n}\rightarrow \{0,1\}$, where the input is a series
of queries and the output is the final decision. Each query may be dependent on previous
queries.
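As a small illustration of this model, the sketch below represents a decision tree as nested dictionaries, evaluates it on an input of n = 3 bits, and counts how many queries were made. The tree shown computes the logical OR of the bits; the dictionary layout (keys "query", "zero" and "one") is a hypothetical encoding chosen for this example.

```python
def evaluate(tree, bits):
    """Walk a decision tree and return (output, number of queries made)."""
    queries = 0
    node = tree
    while isinstance(node, dict):          # internal node: one query (test)
        queries += 1
        node = node["one"] if bits[node["query"]] else node["zero"]
    return node, queries                   # leaf: the final decision (0 or 1)

# Decision tree computing OR of three bits: each query depends on earlier answers.
or_tree = {
    "query": 0, "one": 1,
    "zero": {"query": 1, "one": 1,
             "zero": {"query": 2, "one": 1, "zero": 0}},
}

result_fast = evaluate(or_tree, (1, 0, 0))   # decided after a single query
result_slow = evaluate(or_tree, (0, 0, 0))   # needs all three queries
```

The largest number of queries over all inputs (three for this tree) is the cost that the decision tree complexity of the function is measured by.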
Several variants of decision tree models have been introduced, depending on the complexity
of the operations allowed in the computation of a single comparison and the way of
branching.
Decision tree models are instrumental in establishing lower bounds in complexity theory
for certain classes of computational problems and algorithms. The computational complexity
of a problem or an algorithm expressed in terms of the decision tree model is called its
decision tree complexity or query complexity.
A decision tree is also a decision support tool that uses a tree-like model of decisions and
their possible consequences, including chance event outcomes, resource costs, and utility.
It is one way to display an algorithm that only contains conditional control statements.
Decision trees are commonly used in operations research, specifically in decision analysis, to
help identify a strategy most likely to reach a goal but are also a popular tool in machine
learning.
A decision tree consists of three types of nodes:

• Decision nodes – typically represented by squares
• Chance nodes – typically represented by circles
• End nodes – typically represented by triangles
Decision trees are commonly used in operations research and operations management. If, in
practice, decisions must be taken online with no recall under incomplete knowledge, a
decision tree should be paralleled by a probability model as a best choice model or online
selection model algorithm. Another use of decision trees is as a descriptive means for
calculating conditional probabilities.
Decision trees, influence diagrams, utility functions, and other decision analysis tools
and methods are taught to undergraduate students in schools of business, health
economics, and public health, and are examples of operations research or management
science methods.
Fig: Decision Tree
3.2 CNN (Convolutional Neural Network)
A CNN is a specific type of artificial neural network that uses perceptrons, a machine learning
unit algorithm, for supervised learning, to analyze data. CNNs apply to image processing,
natural language processing and other kinds of cognitive tasks.
Convolutional neural networks, like other neural networks, are made up of neurons with learnable
weights and biases. Each neuron receives several inputs, takes a weighted sum over them,
passes it through an activation function and responds with an output.
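The behaviour of a single neuron described above can be written in a few lines. The sketch below assumes a sigmoid activation, and the inputs, weights and bias are arbitrary example values, not parameters from this work.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Example values chosen only for illustration.
output = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.0)
```

During training, the weights and the bias are the learnable quantities that are adjusted to reduce the loss.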
Convolutional neural networks are designed to process data through multiple layers of
arrays. This type of neural network is used in applications such as image recognition or face
recognition. The primary difference between a CNN and an ordinary neural network is
that a CNN takes its input as a two-dimensional array and operates directly on the images,
rather than relying on a separate feature extraction step as other neural networks do.
CNNs are the dominant approach to recognition problems. Top
companies like Google and Facebook have invested in research and development of
recognition projects to get such tasks done with greater speed.
CNNs are similar to ordinary fully connected neural networks. These
convolutional networks have learnable weights and biases. Every neuron
in the network receives an input and performs a dot product on it, optionally followed by a
non-linearity. The whole network expresses a single differentiable score function, which
maps the input through the various layers to class scores. Finally, a loss
function at the end evaluates the performance of the model.
The convolutional neural network is different from the standard Neural Network in the
sense that there is an explicit assumption of input as an image.
Fig: CNN
CNN OVERVIEW
This network structure was first proposed by Fukushima in 1988 [48]. It was not widely used
however due to limits of computation hardware for training the network. In the 1990s, LeCun
et al. applied a gradient-based learning algorithm to CNNs and obtained successful results for
the handwritten digit classification problem [49]. After that, researchers further improved
CNNs and reported state-of-the-art results in many recognition tasks. CNNs have several
advantages over DNNs, including being more similar to the human visual processing system,
being highly optimized in structure for processing 2D and 3D images, and being effective at
learning and extracting abstractions of 2D features. The max pooling layer of CNNs is
effective in absorbing shape variations. Moreover, composed of sparse connections with tied
weights, CNNs have significantly fewer parameters than a fully connected network of similar
size. Most of all, CNNs are trained with the gradient-based learning algorithm, and suffer less
from the diminishing gradient problem. Given that the gradient-based algorithm trains the
whole network to minimize an error criterion directly, CNNs can produce highly optimized
weights.
Fig. 11. The overall architecture of the CNN includes an input layer, multiple alternating
convolution and max-pooling layers, one fully-connected layer and one classification layer.
The CNN architecture consists of a combination of three types of layers: convolution,
max-pooling, and classification. There are two types of layers in the low and middle levels of
the network: convolutional layers and max-pooling layers. The even-numbered layers are for
convolutions and the odd-numbered layers are for max-pooling operations. The output nodes
of the convolution and max-pooling layers are grouped into 2D planes called feature
maps. Each plane of a layer is usually derived from the combination of one or more planes of
previous layers. The nodes of a plane are connected to a small region of each connected
plane of the previous layer. Each node of the convolution layer extracts features from the
input images by convolution operations on the input nodes.
Higher-level features are derived from features propagated from lower-level layers. As the
features propagate to the highest layer or level, the dimensions of the features are reduced
depending on the size of the kernel for the convolutional and max-pooling operations
respectively. However, the number of feature maps usually increases in order to represent better
features of the input images and ensure classification accuracy. The output of the last layer
of the CNN is used as the input to a fully connected network, which is called the classification
layer. Feed-forward neural networks have been used as the classification layer as they have
better performance [50, 58]. In the classification layer, the desired number of features is
selected as inputs with respect to the dimension of the weight matrix of the final neural
network. However, the fully connected layers are expensive in terms of network or learning
parameters. Nowadays, there are several new techniques, including average pooling and
global average pooling, that are used as an alternative to fully connected networks. The score
of the respective class is calculated in the top classification layer using a soft-max layer.
Based on the highest score, the classifier gives the output for the corresponding classes.
Mathematical details on the different layers of CNNs are discussed in the following section.
1) Convolution Layer
In this layer, feature maps from previous layers are convolved with learnable kernels. The
output of the kernels goes through a linear or non-linear activation function (such as sigmoid,
hyperbolic tangent, Softmax, rectified linear, and identity functions) to form the output
feature maps. Each of the output feature maps can be combined with more than one input
feature map. In general, we have that
$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right)$$
where $x_j^l$ is the output of the current layer, $x_i^{l-1}$ is the previous layer output,
$k_{ij}^l$ is the kernel for the present layer, and $b_j^l$ are the biases for the current
layer. $M_j$ represents a selection of input maps. For each output map, an additive bias $b$
is given. However, the input maps will be convolved with distinct kernels to generate the
corresponding output maps. The output maps finally go through a linear or non-linear
activation function $f$ (such as sigmoid, hyperbolic tangent, Softmax, rectified linear, or
identity functions).
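The convolution layer computation for one output map can be sketched in plain Python as follows. This is an illustrative sketch only: the sliding-window product is implemented without flipping the kernel (cross-correlation, as is common in deep learning libraries), the activation is assumed to be the rectified linear function, and the input map, kernel and bias are made-up values.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(v):
    """Rectified linear activation."""
    return max(0.0, v)

def conv_layer_output(input_maps, kernels, bias, activation=relu):
    """One output map: activation(sum_i input_map_i * kernel_i + bias)."""
    partial = [conv2d_valid(x, k) for x, k in zip(input_maps, kernels)]
    rows, cols = len(partial[0]), len(partial[0][0])
    return [[activation(sum(p[r][c] for p in partial) + bias)
             for c in range(cols)]
            for r in range(rows)]

# One 3x3 input map and one 2x2 kernel (values are illustrative only).
x1 = [[1, 2, 0],
      [0, 1, 3],
      [4, 0, 1]]
k1 = [[1, 0],
      [0, -1]]
out = conv_layer_output([x1], [k1], bias=0.5)
```

With several input maps, one kernel per input map is passed and their responses are summed before the bias and activation are applied, which corresponds to the sum over the selected input maps.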
2) Sub-sampling Layer
The sub-sampling layer performs the down-sampling operation on the input maps. This is
commonly known as the pooling layer. In this layer, the number of input and output feature
maps does not change. For example, if there are $N$ input maps, then there will be exactly $N$
output maps. Due to the down-sampling operation, the size of each dimension of the output
maps will be reduced, depending on the size of the down-sampling mask. For example, if a
2×2 down-sampling kernel is used, then each output dimension will be half of the
corresponding input dimension for all the images.
This operation can be formulated as
$$x_j^l = \mathrm{down}\left(x_j^{l-1}\right)$$
where down(·) represents a sub-sampling function. Two types of operations are mostly
performed in this layer: average pooling or max-pooling. In the case of the average pooling
approach, the function usually sums over N×N patches of the feature maps from the
previous layer and takes the average value. On the other hand, in the case of max-pooling,
the highest value is selected from the N×N patches of the feature maps. Therefore, each output
map dimension is reduced by a factor of N. In some special cases, each output map is multiplied
by a scalar. Some alternative sub-sampling layers have been proposed, such as the fractional
max-pooling layer and sub-sampling with convolution.
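The two common sub-sampling operations can be illustrated with a short sketch. The function below assumes non-overlapping N×N patches (stride equal to the patch size), and the 4×4 feature map is an invented example.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Down-sample a 2D feature map over non-overlapping size x size patches."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            patch = [feature_map[r + a][c + b]
                     for a in range(size) for b in range(size)]
            if mode == "max":
                pooled_row.append(max(patch))               # max-pooling
            else:
                pooled_row.append(sum(patch) / len(patch))  # average pooling
        pooled.append(pooled_row)
    return pooled

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
max_pooled = pool2d(fmap, 2, "max")
avg_pooled = pool2d(fmap, 2, "avg")
```

With a 2×2 patch, the 4×4 input map becomes a 2×2 output map, so each dimension is halved.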
3) Classification Layer
This is the fully connected layer which computes the score of each class from the extracted
features from a convolutional layer in the preceding steps. The final layer feature maps are
represented as vectors with scalar values which are passed to the fully connected layers. The
fully connected feed-forward neural layers are used as a soft-max classification layer. There
are no strict rules on the number of layers which are incorporated in the network model.
However, in most cases, two to four layers have been observed in different architectures
including LeNet [49], AlexNet [7], and VGG Net [9]. As the fully connected layers are
expensive in terms of computation, alternative approaches have been proposed during the last
few years. These include the global average pooling layer and the average pooling layer
which help to reduce the number of parameters in the network significantly.
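A minimal sketch of a soft-max classification layer is given below. It assumes a single fully connected layer on top of an already flattened feature vector; the weights, biases and feature values are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    """Fully connected layer followed by soft-max; returns (class index, probs)."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    return probs.index(max(probs)), probs

# Hypothetical flattened features and a 2-class weight matrix.
features = [1.0, 2.0]
weights = [[0.5, 0.1],    # class 0
           [-0.3, 0.8]]   # class 1
biases = [0.0, 0.0]
label, probs = classify(features, weights, biases)
```

The class with the highest probability is reported, which matches the rule of choosing the class with the highest score.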
In the backward propagation through the CNNs, the fully connected layers update following
the general approach of fully connected neural networks (FCNN). The filters of the
convolutional layers are updated by performing the full convolutional operation on the
feature maps between the convolutional layer and its immediate previous layer.
Fig. Example of convolution and pooling operation.
A convolutional neural network (CNN) is a neural network that has one or more
convolutional layers and is used mainly for image processing, classification, segmentation
and also for other autocorrelated data. A convolution is essentially sliding a filter over the
input.
3.3 Model Building
Machine learning is concerned with classification; in this research it is used to classify
healthy and unhealthy plants. Our work is based on the morphological characteristics of plant
leaves.
3.3.1 PROPOSED SYSTEM
The following algorithm describes the step by step approach for the proposed model.
Figure: Architecture of proposed system
1) Input of an image – This is the first step, where images are captured with a digital
camera or obtained from a website.
2) Pre-processing of an image – The input image is pre-processed to improve image quality and
to remove unwanted distortion. The leaf image is cropped to obtain the region of interest,
and then the image is smoothed using a smoothing filter. Image
enhancement is also done to increase contrast.
3) Segmentation – This refers to dividing an image into groups of pixels based on several
criteria. The segmentation algorithm takes an image as input and outputs a collection of
regions.
4) Feature extraction – A type of dimensionality reduction that efficiently represents the
interesting parts of the image as a compact feature vector. This approach is useful when image
sizes are large and a reduced feature representation is needed to quickly complete tasks such
as image matching and retrieval.
5) Machine learning algorithms – These are used in a variety of applications, such as email
filtering and image processing, where it is difficult or impossible to develop conventional
algorithms to perform the task effectively. Here, such an algorithm is used to identify plant
leaf diseases.
6) The knowledge base – It is a collection of data organized in a form that facilitates analysis
through an automatic deductive process. Here the machine learning algorithm provides all
data about leaf disease.
7) Machine Learning Classifier – The machine learning classifier used here is a decision
tree algorithm. Decision trees are graphical representations of decision situations that
are used when complex branching occurs in a structured decision process. Decision trees are
predictive models based on a series of branched Boolean tests that draw more general
conclusions from specific facts.
8) Results – This is the last step in the whole process. Here we compare our algorithm with
other algorithms to show that it achieves better results.
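The classifier step of the pipeline above can be sketched as a small decision tree built from Boolean threshold tests. The code below is only an illustrative sketch and not the implementation used in this work: it assumes the Gini impurity as the split criterion, and the two features (spot ratio, mean green intensity) and all values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels):
    """Recursively grow the tree until no split improves the impurity."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree([r for r, _ in left], [y for _, y in left]),
            "right": build_tree([r for r, _ in right], [y for _, y in right])}

def predict(tree, row):
    """Follow the Boolean tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical feature vectors: (spot ratio, mean green intensity).
rows = [(0.05, 0.70), (0.10, 0.65), (0.40, 0.30),
        (0.55, 0.25), (0.08, 0.72), (0.60, 0.20)]
labels = ["healthy", "healthy", "diseased", "diseased", "healthy", "diseased"]

tree = build_tree(rows, labels)
prediction = predict(tree, (0.50, 0.28))
```

Each internal node holds one Boolean test on a single feature, so the path from the root to a leaf can be read as a chain of conditions that leads to the predicted class.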
3.3 SOFTWARE REQUIREMENT SPECIFICATION
SOFTWARE USED
The software used is Python, run from the Anaconda Prompt with the browser-based Jupyter
Notebook; the Pandas package is also imported.
3.3.1 HARDWARE REQUIREMENTS:

RAM : 8GB
Hard Disk : 20GB
Processor : Intel core i5
3.3.2 SOFTWARE REQUIREMENTS:
Programming Language: Python 3.7.0 Anaconda prompt
Integrated Development Environment: IDLE 3.7.0
Operating System: Windows 10
Packages: Pandas
3.3.3 FUNCTIONAL REQUIREMENTS:

The classifier model built must be able to correctly classify.
3.3.4 NON-FUNCTIONAL REQUIREMENTS:

The model built must take little time and provide high accuracy.
3.3.5 SOFTWARE DEVELOPMENT LIFE CYCLE MODEL:
3.3.5.1 Waterfall Model

The model we opted for is the Waterfall model. It is also referred to as a linear-sequential
life cycle model. It is very simple to understand and use. In a waterfall model, each phase
must be completed before the next phase can begin, and there is no overlapping of
phases. The Waterfall model is the earliest SDLC approach that was used for software
development.
The Waterfall model illustrates the software development process as a linear
sequential flow: any phase in the development process begins only when the
previous phase is complete.
Figure 3.1 - Waterfall Model

3.3.5.2 Advantages:
 Simple and easy to understand and use
 Easy to manage due to the rigidity of the model. Each phase has specific deliverables
and a review process.
 Phases are processed and completed one at a time.
 Works well for smaller projects where requirements are very well understood.
 Clearly defined stages.
 Well understood milestones.
 Easy to arrange tasks.
3.4 SYSTEM DESIGN
In this section we discuss the Unified Modelling Language and the different UML
diagrams.
3.4.1 Introduction to the UML:
The Unified Modeling Language (UML) is a standard language for writing software blueprints.
The UML is a language for visualizing, specifying, constructing, and documenting. It
is a language that provides a vocabulary and the rules for combining words in that vocabulary
for the purpose of communication. A modeling language is a language whose vocabulary and
rules focus on the conceptual and physical representation of a system. The vocabulary and
rules of a language tell us how to create and read well-formed models, but they do not tell
us what models we should create and when we should create them.
Visualizing
The UML is more than just a bunch of graphical symbols. In UML each symbol has well
defined semantics. In this manner one developer can write a model in the UML and
another developer or even another tool can interpret the model unambiguously.
Specifying
Specifying means building models that are precise, unambiguous and
complete.
UML addresses the specification of all the important analysis, design and implementation
decisions that must be made in developing and deploying a software-intensive system.
Constructing
UML is not a visual programming language, but its models can be directly connected to a
variety of programming languages. This means that it is possible to map from a model in
the UML to a programming language such as Java, C++ or Visual Basic, or even to tables
in a relational database or the persistent store of an object-oriented database. This mapping
permits forward engineering: the generation of code from a UML model into a
programming language. Reverse engineering is also possible: you can reconstruct a
model from an implementation back into the UML.
Documenting
UML is a language for Documenting. A software organization produces all sorts of
artifacts in addition to raw executable code. These artifacts include Requirements,
Architecture, Design, Source-code, Project plans, Test, Prototype, Release. Such artifacts
are not only the deliverables of a project, they are also critical in controlling, measuring
and communicating about a system during its development and after its deployment.
3.4.2 Diagrams in UML:
A diagram is the graphical presentation of a set of elements, most often rendered as a
connected graph of vertices (things) and arcs (relationships). We draw diagrams to
visualize a system from different perspectives, so a diagram is a projection into a system.
Some of the UML diagrams:
Use case Diagrams:
Use case diagram shows a set of use cases and actors (a special kind of class) and their
relationships. They address the static use case view of a system. These diagrams are
important in organizing and modelling the behaviour of a system.
Figure 3.2 - Use Case Diagram
Interaction Diagrams:
An interaction diagram shows an interaction consisting of a set of objects and their
relationships, including the messages that may be dispatched among them. Interaction
diagrams address the dynamic view of a system.
There are two kinds of interaction diagrams:
a) Sequence diagram
A sequence diagram is an interaction diagram that emphasizes the time ordering of
messages.
Figure 3.3 Sequence Diagram
b) Collaboration diagram
A collaboration diagram is an interaction diagram that emphasizes the structural
organization of the objects that send and receive messages.
Sequence and collaboration diagrams are isomorphic, meaning that you can take one and
transform it into the other.
Activity Diagram:
An activity diagram is a special kind of state chart diagram that shows the flow from
activity to activity within a system. Activity diagrams address the dynamic view of a system.
They are important in modeling the functions of a system and emphasize the flow of control
among objects.
Figure 3.4 – Activity Diagram
CHAPTER 4
IMPLEMENTATION
4.1 SOFTWARE USED
The software used is Python, run from the Anaconda Prompt with the browser-based Jupyter
Notebook; the Pandas package is also imported.
4.1.1 Anaconda Command Prompt
Anaconda is an open source distribution of the Python and R programming languages. It
is used in data science, machine learning and deep learning-related applications, and aims at
simplifying package management and deployment. Anaconda Distribution is used by over 7
million users, and it includes more than 300 data science packages suitable for Windows,
Linux, and macOS.
Anaconda is one of several Python distributions. Python on its own comes without the editors,
tools and packages needed for this work; this is where Anaconda comes into the picture. Python
distributions provide the Python interpreter together with a collection of Python editors,
tools and packages.
The package management tool is part of the Anaconda software package. Install Anaconda by
navigating to the Anaconda download page. Scroll down to the “Anaconda for Windows”
portion of the web page. Download the Python 3.5 version by clicking on the “Windows 64-bit
Graphical Installer” link. It is a big download, so it is best to be on a fast network. Open
the installer file you just downloaded. It should be named something like
Anaconda [version]-Windows-x86_64. This action will guide you through the Anaconda
installation on Windows. The last step of the installation process will ask whether you want
to add Anaconda to your PATH environment variable. Ensure this option is checked.
4.1.2 Jupyter Notebook
On Windows, you can run Jupyter via the shortcut Anaconda adds to your start menu, which
will open a new tab in your default web browser that should look something like the
following screenshot.
Fig. 4.1 Jupyter Notebook
This isn’t a notebook just yet, but don’t panic! There’s not much to it. This is the Notebook
Dashboard, specifically designed for managing your Jupyter Notebooks. Think of it as the
launch pad for exploring, editing and creating your notebooks.
Be aware that the dashboard gives you access only to the files and sub-folders contained
within Jupyter’s start-up directory; however, the start-up directory can be changed. It is also
possible to start the dashboard on any system via the command prompt (or terminal on UNIX
systems) by entering the command jupyter notebook; in this case, the current working
directory will be the start-up directory. The astute reader may have noticed that the URL for
the dashboard is something like http://localhost:8888/tre. Here, localhost is not a website; it
indicates that the content is being served from your local machine: your own computer.
Jupyter’s Notebooks and dashboard are web apps, and Jupyter starts up a local Python server
to serve these apps to your web browser, making it essentially platform independent and
opening the door to easier sharing on the web.
The dashboard’s interface is mostly self-explanatory — though we will come back to it
briefly later. So what are we waiting for? Browse to the folder in which you would like to
create your first notebook, click the “New” drop-down button in the top-right and select
“Python 3”.
4.2 DATA PRE-PROCESSING
Data preprocessing is a data mining technique that involves transforming raw data into an
understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in
certain behaviors or trends, and is likely to contain many errors.
Data preprocessing is a proven method of resolving such issues. Data preprocessing prepares
raw data for further processing.
● Data Cleaning: Data is cleansed through processes such as filling in missing values,
smoothing the noisy data, or resolving the inconsistencies in the data.
● Data Integration: Data with different representations are put together and conflicts within
the data are resolved.
● Data Transformation: Data is normalized, aggregated and generalized.
● Data Reduction: This step aims to present a reduced representation of the data in a data
warehouse.
● Data Discretization: Reduces the number of values of a continuous attribute by dividing
the range of the attribute into intervals.
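The cleaning, transformation and discretization steps listed above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the code used in this work. The feature column (mean green-channel intensity per leaf image), its values, and all function names are assumptions made for the example.

```python
from statistics import mean


def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, n_bins):
    """Data discretization: map values in [0, 1] to equal-width bin indices 0..n_bins-1."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


# Hypothetical feature column (assumption): mean green-channel intensity per image,
# with one missing measurement.
raw = [120.0, None, 80.0, 200.0, 100.0]

cleaned = fill_missing(raw)
normalized = min_max_normalize(cleaned)
bins = discretize(normalized, n_bins=4)

print("cleaned:   ", cleaned)
print("normalized:", [round(v, 3) for v in normalized])
print("bins:      ", bins)
```

With Pandas, the same steps would typically be written with DataFrame.fillna for cleaning and pandas.cut for discretization; the plain-Python version is shown only to make each step explicit.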
CHAPTER 5
RESULTS
To classify leaf diseases, a collection of leaf images is required. The images were downloaded
from Kaggle (kaggle.com), and images from this dataset are used to train the model. The
dataset contains more than 1000 leaf images taken in different environments. The number of
images per class is shown in Table 4.1. The model is trained on healthy images (Fig 4.2) and
unhealthy images (Fig 4.3). The proposed system quickly detects an unhealthy leaf and
reports remedies to the farmer. The detection accuracy of the proposed system is above 95%.
S.NO   TYPE OF DISEASE      COUNT
1      Healthy              345
2      Late blight          700
3      Bacterial Spot       400
4      Yellow Curl Virus    355
5      Anthracnose          200
Table 4.1 Dataset of classifications
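As a quick check on the dataset table above, the class counts can be summed and converted to shares. The sketch below uses only the counts listed in the table; the variable names and the imbalance ratio are illustrative additions, not part of the original analysis.

```python
# Image counts per class, copied from the dataset table above.
counts = {
    "Healthy": 345,
    "Late blight": 700,
    "Bacterial Spot": 400,
    "Yellow Curl Virus": 355,
    "Anthracnose": 200,
}

total = sum(counts.values())

# Share of the whole dataset that each class represents.
shares = {name: n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:<18} {n:>5} {shares[name]:>7.1%}")
print(f"{'Total':<18} {total:>5}")

# Ratio between the largest and smallest class, a rough indicator of class imbalance.
imbalance = max(counts.values()) / min(counts.values())
print(f"Largest/smallest class ratio: {imbalance:.1f}")
```

A ratio well above 1 suggests that overall accuracy should be read together with per-class results, since a classifier can score well overall while doing poorly on the smallest class.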
Fig 4.2 Sample Images of Healthy Leaves
Fig 4.3 Sample Images of Unhealthy Leaves
Fig 4.4 Input of an Image
Fig: Browse image
Fig: Analyses the image
Fig: Status of Leaf
Fig: Remedies to protect the plant
Fig: Analyzing the various plant leaves
Experimental Results
Detection accuracy is measured on 5 different types of leaves, with 100 images used for
training and a few images held out for testing. The proposed method is compared with two
existing algorithms, K-means and SVM, as shown in Table 4.3. The detection accuracy is
shown in Fig 4.9 and Fig 4.10.
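The text does not state how detection accuracy is computed. A minimal sketch, assuming the usual definition (the fraction of test images whose predicted class matches the true class), is given below. The function names and the sample labels are hypothetical and are not results from this work.

```python
from collections import Counter


def detection_accuracy(true_labels, predicted_labels):
    """Fraction of test images whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one test image is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each true class."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels for a tiny test set (assumption, for illustration only).
y_true = ["Healthy", "Late blight", "Late blight", "Bacterial Spot", "Healthy"]
y_pred = ["Healthy", "Late blight", "Bacterial Spot", "Bacterial Spot", "Healthy"]

overall = detection_accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)

print(f"Overall accuracy: {overall:.0%}")
for label, acc in by_class.items():
    print(f"{label:<15} {acc:.0%}")
```

Applying the same function to the predictions of each classifier on one shared test set gives directly comparable accuracy values for the proposed method, K-means and SVM.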
Table 4.3 Detection Accuracy
Fig 4.9 Detection Accuracy
Fig 4.10 Detection Accuracy for detecting the exact d
B. SCREENSHOTS
ACCURACY RESULTS
Fig B.1 Input of an Image
Fig B.2 Browse Image
Fig B.3 Analyses the image
Fig B.4 Status of the Leaf
Fig B.5 Remedies to protect the plants
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.1 CONCLUSION
This work provides information on the classification techniques used to detect diseases in
plant leaves, as well as on image segmentation algorithms that can be used for automatic
detection and classification of diseases in plant leaves.
Lady's fingers, tomatoes, roses, citrus fruits, potatoes and jasmine are some of the ten plant
types tested with the proposed algorithm.
Diseases related to these plants are therefore used for identification. Optimal results were
obtained with very little computational effort, which also shows the efficiency of the
proposed algorithm in the detection and classification of leaf diseases.
Another advantage of this method is that plant diseases can be detected at an early stage. To
increase the recognition rate in the classification process, artificial neural networks, Bayesian
classifiers, fuzzy logic, and hybrid algorithms can also be used.
6.2 FUTURE ENHANCEMENT
To further increase the recognition rate in the classification process, artificial neural
networks, Bayesian classifiers, fuzzy logic, and hybrid algorithms can be used.
These algorithms may yield more accurate results.
CHAPTER 7
REFERENCE
[1] Alom M. Z., Taha T. M., Yakopcic C., Westberg S., Hasan M., Esesn B. C. V. et al.
(2018). The story begins with alexnet: a comprehensive study of part-time learning
approaches. arXiv: 1803.01164.
[2] Barbedo J.G.A. (2018a). Factors that influence the use of comprehensive training in
detecting plant diseases. Biosyst. 172, 84-91. 10.1016 / j.biosystemseng.2018.05.013
[3] Barbedo J.G.A. (2018b). Effect of size and variation of data sets on the effectiveness of
transfer training and depth for classification of plant diseases. Comput Electron. Agric 153,
46-53. 10.1016 / j.compag.2018.08.013
[4] Brahimi M., Arsenovich M., Laraba S., Sladojevich S., Buhalfa K., Musau A. (2018).
Deep Learning for Plant Diseases: Visualization of maps for recognition and highlighting in
human and machine learning, ed. Zhou J., Chen F., editor. (Cham: Springer International
Publishing;), 93-117.
[5] Chalapathy R., Chawla S. (2019). In-depth learning to detect anomalies: survey. arXiv:
1901.03407.
[6] Ching T., Himmelstein D.S., Beaulieu-Jones B.K., Kalinin A.A., Do B.T., Way G.P. et al.
, (2018). Opportunities and obstacles for comprehensive education in biology and medicine.
J. R. Soc. The interface. 15: 20170387. 10.1098 / rsif.2017.0387
[7] Knillmann S., Liess M. (2019). Effects of pesticides on ecosystems. Cham: Springer
International Publishing.
[8] Dhaygude Sanjay B, Kumbhar Nitin P (2013), "Detection of leaf plant diseases through
image processing", Int J Adv Res Electron Electro Instrum, Volume 2 (1).
[9] Bashir Sabah, Sharma Navadepe (2012). Remote detection of plant diseases through
image processing. IOSR J Electron Commun Eng; 2 (6): 31-4. ISSN: 2278-2834.
[10] Chaudhary Piyush et al. (2012). Color transformation based approach for detecting
diseases in the leaves of IntComput SciTel Telecommun; 3 (6).
[11] Kulkarni Anand H., Ashwin Patil RK (2012). The use of image processing technology to
detect plant diseases. Input J Mod Eng Res; 2 (5): 3661-4.
[12] Mrunalini R Badnakhe, Deshmukh Prashant R (2011). The use of K-means grouping and
artificial intelligence to identify patterns of cultural disease. Int ConfAdvInfTechnol; IPCSIT.
53
[13] Christina, Ruth, Greeshma Liz Shajan, and B. Ankayarkanni. "CART-A Statistical
Model for Predicting QoE using Machine Learning in Smartphones." In IOP Conference
Series: Materials Science and Engineering, vol. 590, no. 1, p. 012001. IOP
Publishing, 2019.
[14] M. J. N. V. S. K. Asrith, K. P. Reddy and Sujihelen, "Face Recognition and Weapon
Detection from Very Low Resolution Image," 2018 International Conference on Emerging
Trends and Innovations In Engineering And Technological Research
(ICETIETR), Ernakulam, 2018, pp. 1-5.
[15] Chandy, Abraham. "RGBD ANALYSIS FOR FINDING THE DIFFERENT STAGES
OF MATURITY OF FRUITS IN FARMING." Journal of Innovative Image Processing
(JIIP)1, no. 02 (2019): 111-121.