MACHINE LEARNING
Synopsis Report submitted in partial fulfillment
of the requirement for the degree of
B. E.(Information Technology)
Submitted By
SAUMITRA APTE
MANOR DESHMUKH
YOGITA SANGALE
University of Mumbai
2018-19
CERTIFICATE OF APPROVAL
For
Project Synopsis
SAUMITRA APTE
MANOR DESHMUKH
YOGITA SANGALE
Dr. P. S. Patekar
Declaration
We declare that this written submission represents our ideas in our own words and
where others' ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source
in our submission. We understand that any violation of the above will be cause for
disciplinary action by the Institute and can also evoke penal action from the sources which
have thus not been properly cited or from whom proper permission has not been taken when
needed.
Date:
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
ABSTRACT
Crop diseases are a major threat to food security, but their rapid identification remains
difficult in many parts of the world due to the lack of the necessary infrastructure. The
combination of increasing global smartphone penetration and recent advances in computer
vision has paved the way for smartphone-assisted disease diagnosis. We use several
machine learning algorithms, namely Random Forests, SVM, KNN and CNN, to distinguish
between healthy and diseased grape leaves in the datasets created. Our proposed project
comprises four phases of implementation: dataset creation, feature extraction,
training the classifier, and classification. To extract features from an image we use the
Histogram of Oriented Gradients (HOG). Overall, training machine learning models on the large,
publicly available datasets gives us a clear path to detecting plant diseases on a large
scale.
1. INTRODUCTION
Machine learning is a field of artificial intelligence that uses statistical techniques to give
computer systems the ability to "learn" (e.g., progressively improve performance on a specific
task) from data, without being explicitly programmed. Machine learning algorithms are often
categorized as supervised or unsupervised.
Supervised algorithms require a data scientist or data analyst with machine learning skills to
provide both input and desired output, in addition to furnishing feedback about the accuracy
of predictions during algorithm training.
Unsupervised algorithms do not need to be trained with desired outcome data. Instead, they
look for structure in unlabeled data, for example by grouping similar samples into clusters.
Classification is the process of predicting the class of given data points. Classes are
sometimes called targets, labels or categories. Classification predictive modeling is the task
of approximating a mapping function (f) from input variables (X) to discrete output variables
(y).
Our system will classify healthy and diseased leaves using different classification
algorithms, i.e. supervised learning. After obtaining the required results, we will also try
deep learning models such as convolutional neural networks.
Accuracy is defined as the ratio of the number of correct predictions to the total number of
predictions. Accuracy will be the metric used to compare the above-mentioned
algorithms, and the algorithm with the highest accuracy will be selected for
further development of the project.
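As a minimal sketch of the accuracy measure described above, the following Python functions compute the ratio of correct predictions to total predictions and pick the best-scoring algorithm. The function names and the example labels are illustrative assumptions, not results from our project.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def best_algorithm(y_true, predictions_by_algorithm):
    """Return (name, accuracy) of the algorithm with the highest accuracy."""
    scores = {name: accuracy(y_true, y_pred)
              for name, y_pred in predictions_by_algorithm.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical labels: 1 = diseased, 0 = healthy (made-up data for illustration).
y_true = [1, 0, 1, 1, 0]
predictions = {
    "random_forest": [1, 0, 1, 0, 0],  # 4 of 5 correct
    "knn": [1, 1, 0, 1, 0],            # 3 of 5 correct
}
print(best_algorithm(y_true, predictions))
```

In a real comparison the predictions would come from classifiers evaluated on held-out test images rather than from hand-written lists.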
Our system will replace the manual process of taking a sample leaf to the nearest village
knowledge center or agricultural facility. This will enhance the lives of Indian farmers and
increase production.
2. AIM AND OBJECTIVES
Our aim is to create a machine learning model which can accurately classify healthy and
diseased grape leaves. We then plan to use this model to create an application which will be
accessible to farmers.
Our objectives for each phase are as follows:
1. Use the PlantVillage dataset available on GitHub, processing each image to extract
HOG (Histogram of Oriented Gradients) features which can be used by our algorithms.
2. Find the best-performing algorithm, with a target accuracy above 93%, which we
consider the threshold for a trustworthy model.
3. Use this machine learning model to create an application which is user friendly
for farmers.
3. LITERATURE SURVEYED
4. PROBLEM STATEMENT
Crop diseases are a major threat to food security, but their rapid identification remains
difficult in many parts of the world due to the lack of the necessary infrastructure. The
combination of increasing global smartphone penetration and recent advances in computer
vision has paved the way for smartphone-assisted disease diagnosis.
We aim to create an application which can process images of infected and healthy
grape leaves and prepare them for feature extraction. The extracted features will be
used as the dataset for machine learning algorithms such as Random Forests, KNN, SVM and CNN.
5. SCOPE
Modern technologies have given human society the ability to produce enough food to meet
the demand of more than 7 billion people. However, food security remains threatened by a
number of factors including climate change, the decline in pollinators, plant diseases, and
others. Plant diseases are not only a threat to food security at the global scale, but can also
have disastrous consequences for smallholder farmers whose livelihoods depend on healthy
crops. Thus, a good way to make disease-diagnosis applications user friendly is to ask users to
take a photo of their crops and to return a detailed result based on that photo.
Our project narrows this idea down to the detection of diseases in grape crops. Our system
will be able to:
1. Successfully identify grape plant diseases.
2. Provide the farmer with details about the disease.
3. Provide the farmer with treatment options and related information about pesticides and
their usage.
6. PROPOSED SYSTEM
7. METHODOLOGY
To determine whether a leaf is diseased or healthy, four steps must be followed:
preprocessing, feature extraction, training of the classifier, and classification. Preprocessing
brings all the images to a reduced, uniform size. Features are then extracted from each
preprocessed image with the help of HOG. HOG is a feature
descriptor used for object detection. In this feature descriptor, the appearance of an object and
its outline are described by the distribution of its intensity gradients. One advantage of HOG
feature extraction is that it operates on local cells, so the descriptor is largely unaffected by
small geometric and photometric transformations of the image.
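The preprocessing and feature extraction steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not our final implementation: the function names, the cell size and the number of bins are hypothetical choices, each cell is normalised on its own (the full HOG descriptor normalises over overlapping blocks of cells), and in practice a library routine such as scikit-image's `hog` would normally be used.

```python
import math


def resize_nearest(image, size):
    """Resize a grayscale image (list of rows) to size x size by nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [[image[r * height // size][c * width // size] for c in range(size)]
            for r in range(size)]


def hog_features(image, cell=4, bins=9):
    """Simplified HOG: one L2-normalised orientation histogram per cell."""
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height - cell + 1, cell):
        for left in range(0, width - cell + 1, cell):
            hist = [0.0] * bins
            for r in range(top, top + cell):
                for c in range(left, left + cell):
                    # Centred differences, clamped at the image border.
                    gx = image[r][min(c + 1, width - 1)] - image[r][max(c - 1, 0)]
                    gy = image[min(r + 1, height - 1)][c] - image[max(r - 1, 0)][c]
                    magnitude = math.hypot(gx, gy)
                    # Unsigned orientation in [0, 180) degrees.
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    hist[int(angle / (180.0 / bins)) % bins] += magnitude
            norm = math.sqrt(sum(v * v for v in hist)) or 1.0
            features.extend(v / norm for v in hist)
    return features


# Made-up 16x16 image with a vertical edge: dark left half, bright right half.
leaf = [[0] * 8 + [255] * 8 for _ in range(16)]
small = resize_nearest(leaf, 8)   # preprocessing: reduced, uniform size
vector = hog_features(small)      # 4 cells x 9 bins = 36 values
print(len(vector))
```

Because every image is first brought to the same size, every feature vector has the same length, which is what the classifiers in the next step require.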
The algorithm we plan to implement is the random forest classifier. Random forests are flexible
and can be used for both classification and regression. Compared to other
machine learning techniques such as SVM, Gaussian Naïve Bayes, logistic regression and linear
discriminant analysis, random forests may give higher accuracy with a smaller image
dataset. The following figure shows the architecture of our proposed algorithm.
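To illustrate the idea behind a random forest, the sketch below trains many small trees on bootstrap samples with random feature subsets and classifies by majority vote. It is a toy version under stated assumptions: each tree is only a one-level decision stump, the function names are hypothetical, and the feature vectors are made up. A real project would more likely rely on a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No valid split in this sample: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], 0.0, majority, majority)
    return best[1:]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each seeing a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature_ids = rng.sample(range(n_features), k)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature_ids))
    return forest


def predict(forest, row):
    """Classify one feature vector by majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Made-up feature vectors standing in for HOG features.
healthy = [[0.10, 0.20, 0.15, 0.10], [0.20, 0.10, 0.10, 0.25],
           [0.15, 0.25, 0.20, 0.15], [0.05, 0.15, 0.25, 0.20]]
diseased = [[0.80, 0.90, 0.85, 0.80], [0.90, 0.80, 0.95, 0.85],
            [0.85, 0.95, 0.80, 0.90], [0.95, 0.85, 0.90, 0.95]]
forest = train_forest(healthy + diseased, ["healthy"] * 4 + ["diseased"] * 4)
print(predict(forest, [0.0, 0.0, 0.0, 0.0]), predict(forest, [1.0, 1.0, 1.0, 1.0]))
```

Averaging many weak, decorrelated trees is what makes the ensemble more robust than any single tree, which is one reason the approach can work with a comparatively small dataset.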
8. ANALYSIS
The process model to be used for our project is the incremental model.
Reasons to use the incremental model:
1. Our project guide will review the project on a weekly basis and provide us with
suggestions and improvements. We will have to change our approach accordingly.
2. It is more flexible and requires less time.
3. It is easier to manage risks because risky pieces are identified and handled during their
iteration.
4. It is easier to test and debug during a smaller iteration.
5. It generates working software quickly and early during the software life cycle.
8.2 Feasibility Study
1. EXECUTIVE SUMMARY
Crop diseases are a major threat to food security, but their rapid identification remains
difficult in many parts of the world due to the lack of the necessary infrastructure. The
combination of increasing global smartphone penetration and recent advances in computer
vision has paved the way for smartphone-assisted disease diagnosis.
2. DESCRIPTION OF PRODUCTS AND SERVICES
This application will be very user friendly, even for older people who are not tech savvy.
We will also implement a high level of abstraction, so the user will not need to be aware that the
results are produced by complex machine learning algorithms. Users
will only have to take a picture of the crop, and they will get the results after some processing time.
3. TECHNOLOGY CONSIDERATIONS
We will require a good quality camera with at least 4 megapixels, which can capture RGB
images. The images will be converted to features using image processing; this will require a
decent processor such as Intel Pentium.
4. PRODUCT/SERVICE MARKETPLACE
The target users for this project are young, tech-savvy farmers. The application
will be launched on platforms like the Google Play Store.
5. MARKETING STRATEGY
We have planned an advertising campaign on social media applications like Facebook,
Twitter, Instagram, Snapchat, etc.
6. FINDINGS AND RECOMMENDATIONS
Parts of this project idea have already been implemented elsewhere, and their success
suggests that our project is also likely to succeed. The constraints identified do not slow down or
stop the project, and hence we have selected this approach.
8.3 Timeline Chart
8.4 Cost Analysis
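a) Effort
$$E = a_b (\mathrm{KLOC})^{b_b} = 3.0 \times (6)^{1.12} \approx 22.32$$
b) Project duration
$$D = c_b (E)^{d_b} = 2.5 \times (22.32)^{0.35} \approx 7.412$$
c) Number of persons
$$N = E / D = 22.32 / 7.412 \approx 3.011 \approx 3 \text{ persons}$$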
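The calculation above can be reproduced with a short Python sketch. The coefficient values 3.0, 1.12, 2.5 and 0.35 and the size of 6 KLOC are taken from the figures above; the function name, the parameter names and the units in the comments are assumptions, based on these coefficients appearing to match the semi-detached mode of the basic COCOMO model.

```python
def basic_cocomo(kloc, a=3.0, b=1.12, c=2.5, d=0.35):
    """Estimate effort, duration and staffing from the project size in KLOC."""
    effort = a * kloc ** b        # assumed unit: person-months
    duration = c * effort ** d    # assumed unit: months
    staff = effort / duration     # average number of persons
    return effort, duration, staff


effort, duration, staff = basic_cocomo(6)
print(f"E = {effort:.2f}, D = {duration:.2f}, N = {staff:.2f}")
```

Running the sketch gives roughly 22.3 for the effort, 7.4 for the duration and 3 persons, in line with the hand calculation.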
9. DESIGN
9.1 DATA FLOW DIAGRAM
In the level 0 DFD, the input is a leaf image. The analysis process is performed on the infected
leaf, and the output is the disease result.
In the level 1 DFD, the input is a leaf image taken by a mobile or digital camera. The
analysis process applies machine learning and image processing
techniques to the infected leaf, and the output is the disease result, including the
disease name, remedy usage and disease details.
9.3 USE CASE DIAGRAM
10. HARDWARE AND SOFTWARE REQUIREMENTS
REFERENCES