BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
V. Lokesh (19W61A0578)
L. Ganesh (19W61A0546)
Y. Dileep (19W61A0584)
J. Bharath Vamsi (19W61A0532)
E. Chanikya (19W61A0520)
Under the Esteemed Guidance of
Miss K. Sakunthala,
Assistant Professor
Certificate
This is to certify that this project work entitled "CROP
RECOMMENDATION SYSTEM USING MACHINE LEARNING" is the bonafide work
carried out by V. Lokesh (19W61A0578), L. Ganesh (19W61A0546), Y. Dileep (19W61A0584), J. Bharath Vamsi (19W61A0532) and E. Chanikya (19W61A0520), submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering, during the year 2019-2023.
External Examiner
ACKNOWLEDGEMENT
It is indeed with a great sense of pleasure and immense gratitude that we acknowledge the help of these individuals. We feel elated in expressing our gratitude to our guide, Miss K. Sakunthala, Assistant Professor in the Department of Computer Science and Engineering, for her valuable guidance. She has been a constant source of inspiration for us, and we are deeply thankful to her for her support and invaluable advice. We would like to thank our Head of the Department, Dr. M. Murali Krishna, for his constructive criticism throughout our project. We are highly indebted to our Principal, Dr. Y. Srinivasa Rao, for the facilities provided to accomplish this project. We are extremely grateful to our departmental staff members, lab technicians and non-teaching staff for their help throughout our project. Finally, we express our heartfelt thanks to all of our friends who helped us in the successful completion of this project.
Project Members
V. Lokesh (19W61A0578)
Y. Dileep (19W61A0584)
L. Ganesh (19W61A0546)
E. Chanikya (19W61A0520)
DECLARATION
We do hereby declare that the work embodied in this project report entitled "CROP RECOMMENDATION SYSTEM USING MACHINE LEARNING" is the outcome of genuine research work carried out by us under the direct supervision of Miss K. Sakunthala, Assistant Professor, Department of Computer Science and Engineering, and is submitted by us to Sri Sivani College of Engineering. The work is original and has not been submitted elsewhere for the award of any other degree or diploma.
Project Members
V. Lokesh (19W61A0578)
L. Ganesh (19W61A0546)
Y. Dileep (19W61A0584)
J. Bharath Vamsi (19W61A0532)
E. Chanikya (19W61A0520)
INDEX
1 INTRODUCTION 1
1.1 OVERVIEW 1
1.2 IDENTIFICATION/NEED 2
2 LITERATURE SURVEY 5
3 SYSTEM ANALYSIS&DESIGN 9
4.2.1 PYTHON 26
5.2.10 SUGGESTIONS 66
5.2.11 SUGGESTIONS 67
6.1 CONCLUSION 75
ABSTRACT
India being an agricultural country, its economy predominantly depends on agricultural yield growth and agro-industry products. Data mining is an emerging research field in crop yield analysis. Yield prediction is a very important problem in agriculture: every farmer is interested in knowing how much yield to expect and which crop is suitable for the land. The system analyzes related attributes such as location and pH value, from which the alkalinity of the soil is determined, along with the percentage of nutrients such as Nitrogen (N), Phosphorous (P) and Potassium (K). Using the location together with third-party applications such as APIs for weather and temperature, the type of soil, the nutrient value of the soil in that region, the amount of rainfall and the soil composition can be determined. All these attributes are analyzed, and the data is trained with suitable machine learning algorithms such as SVM, Random Forest, KNN and a Voting Classifier to create a model. The system aims to be precise and accurate in predicting crop yield and to give the end user proper recommendations about the required fertilizer ratio based on the atmospheric and soil parameters of the land, which helps increase the crop yield and the farmer's revenue. Thus, the proposed system takes data regarding the quality of the soil and weather-related information as input: soil quality attributes such as Nitrogen, Phosphorous, Potassium and pH value, and weather-related information like Rainfall, Temperature and Humidity, to predict the better crop. In our project we take the datasets from the Kaggle website.
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Agriculture is one of the most important occupations practiced in our country. It is the broadest economic sector and plays an important role in the overall development of the country. About 60% of the land in the country is used for agriculture in order to meet the needs of 1.2 billion people. Thus, modernization of agriculture is very important and will lead the farmers of our country towards profit. Data analytics (DA) is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software. Earlier, yield prediction was performed by considering the farmer's experience on a particular field and crop. However, as conditions change very rapidly, farmers are forced to cultivate more and more crops. In the current situation, many of them do not have enough knowledge about the new crops and are not completely aware of the benefits of farming them. Also, farm productivity can be increased by understanding and forecasting crop performance in a variety of environmental conditions. Thus, the proposed system takes data regarding the quality of the soil and weather-related information as input: soil quality attributes such as Nitrogen, Phosphorous, Potassium and pH value, and weather-related information like Rainfall, Temperature and Humidity. In our project we take the datasets from the Kaggle website.
1.2 IDENTIFICATION/NEED
Crop prediction is a widespread problem. During the growing season, a farmer is curious to know how much yield to expect. In earlier times, yield prediction relied on the farmer's long-term experience with specific yields, crops and climatic conditions. With the existing system, the farmer goes directly for yield prediction rather than for crop prediction; but unless the correct crop is predicted, how can the yield be better? Additionally, existing systems do not consider pesticides or the environmental and meteorological parameters related to the crop. Promoting and sustaining agricultural production at a faster pace is one of the essential conditions for agricultural improvement. Any crop's production grows either through an increase in cultivated area, an improvement in yield, or both. In India, the prospect of widening the area under any crop hardly exists, except by increasing cropping intensity or by crop replacement. So, variations in crop productivity continue to trouble the sector and generate serious distress, and there is a need for a good technique for crop prediction in order to overcome the existing problem.
SCOPE
Applying the Naive Bayes data mining technique for crop selection depends on the nature of the Naive Bayes probability model, which can be trained very easily in a supervised learning setting. In many practical applications, parameter estimation for Naive Bayes uses the method of maximum likelihood; in other words, one can work with the Naive Bayes model without committing to Bayesian probability or using any other Bayesian methods.
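The Naive Bayes approach described above can be sketched with scikit-learn's GaussianNB. This is a minimal illustration under assumed data: the feature layout (N, P, K, temperature, humidity, pH, rainfall) follows the attributes described in this report, but the sample rows are made-up values, not the actual dataset.

```python
# Minimal sketch: training a Gaussian Naive Bayes crop selector with scikit-learn.
# Each row holds [N, P, K, temperature, humidity, pH, rainfall]; the values are
# illustrative only, not taken from the real Kaggle dataset.
from sklearn.naive_bayes import GaussianNB

X_train = [
    [90, 42, 43, 20.8, 82.0, 6.5, 202.9],
    [85, 58, 41, 21.7, 80.3, 7.0, 226.6],
    [60, 55, 44, 23.0, 82.3, 7.8, 263.9],
    [40, 72, 77, 17.0, 16.9, 7.4, 88.5],
    [23, 72, 84, 19.0, 17.1, 6.9, 79.9],
]
y_train = ["rice", "rice", "rice", "chickpea", "chickpea"]

model = GaussianNB().fit(X_train, y_train)

# Predict a crop for a new soil/weather sample.
print(model.predict([[80, 50, 40, 22.0, 81.0, 6.8, 210.0]])[0])  # → rice
```

Being trained in a supervised fashion, the model only needs labelled rows; each class's feature likelihoods are estimated independently, which is what makes Naive Bayes so quick to train.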
ADVANTAGES
DISADVANTAGES
OBJECTIVE
Achieving maximum crop yield at minimum cost is one of the goals of agricultural production. Early detection and management of problems associated with crop yield indicators can help increase yield and subsequent profit. By influencing regional weather patterns, large-scale meteorological phenomena can have a significant impact on agricultural production. Predictions could be used by crop managers to minimize losses when unfavorable conditions may occur, and to maximize production when there is potential for favorable growing conditions.
CHAPTER 2
LITERATURE SURVEY
They accomplished significant work for Indian farmers by building an efficient crop recommendation system. They developed the system using classifier models such as the Decision Tree classifier, KNN, and the Naive Bayes classifier. The proposed system can be used to determine the best time for sowing, plant growth, and harvesting. They used different classifiers to achieve better accuracy; for instance, the Decision Tree shows less accuracy when the dataset has more variation, while Naive Bayes gives better accuracy than the Decision Tree on such datasets. The best advantage of the system is that it can easily be adapted and used to test different crops.
They concluded that their paper builds an improvised system for crop yield using supervised machine learning algorithms, with the objectives of providing an easy-to-use user interface, increasing the accuracy of crop yield prediction, and investigating different climatic parameters such as cloud cover, rainfall, and temperature. The proposed system focused on the state of Maharashtra for implementation, and for data gathering they used government sites such as www.data.gov.in. For crop yield prediction they used algorithms such as the Random Forest algorithm, and for convenience they developed a web page so that it is easy for everyone to use. The main advantage of the proposed system is an accuracy rate of more than 75 percent across all the crops and regions selected in the study.
They inferred that their paper surveys the different uses of machine learning in the farming sector, and also helps in selecting a suitable crop, suitable land, and a suitable season using these techniques. The algorithms used are Naive Bayes and K-Nearest Neighbor, and they are evaluated on the accuracy of their predictions.
2.4 AMIT KUMAR ET AL.:
They concluded that their paper helps in predicting crop arrangements, maximizing yield rates, and creating benefits for the farmers. It also applies machine learning in farming for predicting crop diseases, analyzing crop patterns, and designing different irrigation schemes. The algorithms used are artificial neural networks. The serious issue with a neural network is that the particular network which suits the problem best is difficult to find and involves trial and error. The second issue with neural networks is hardware dependence: as the algorithm involves many computations in the backward and forward passes, training needs more resources, and determining a proper network structure requires experience and time. The proposed system also focuses on crop selection using environmental as well as economic factors. The system uses the economic factor, that is, the price of the crop, which plays a significant role when crops have the same yield but different prices. The system additionally uses another technique, crop sequencing, which gives a full sequence of crops which can be grown throughout the season.
They concluded that their paper helps in improving the yield rate of crops by using rule-based mining. The paper uses association rule mining to predict the yield of the crop; the algorithms used are the k-Means algorithm, a clustering technique, and a priori association rule mining. The significant limitation is the use of association rule mining for the prediction of crop yield: association rule mining sometimes generates too many rules, and the accuracy of the prediction decreases. Likewise, the rules tend to vary with the dataset, and so do the results. The proposed system mainly focuses on the problem of yield prediction for a crop, which plays a vital role in crop selection, as a farmer can choose the crop with maximum yield. The system uses association rule mining to discover rules and crops with maximum yield, and centers on the creation of a prediction model which may be used for future prediction of crop yield.
They concluded that their paper helps in improving the yield rate of crops by applying classification techniques and comparing the parameters. The paper explains the use of different algorithms to achieve this; the algorithms proposed are the Bayesian algorithm, the k-Means algorithm, a clustering algorithm, and the Support Vector Machine. The drawback is that no proper accuracy or performance figures for the proposed algorithms are mentioned in the paper; it is a survey paper that only suggests the use of the algorithms, and no implementation evidence is provided. The technique applied in this paper for crop selection centers only on the plants which may be grown according to the season. The proposed approach decides the choice of crop(s) mainly based on the predicted yield price, supported by parameters such as environment, soil type, water density, and crop type. It takes crops, their planting time, plantation days, and the expected yield price for the season as input, and finds a sequence of crops whose production per day is maximum over the season.
This paper describes and details a list of methods used. In India there are many different agricultural crops, and their production depends on several kinds of factors, such as natural science, the economy, and also geographical factors. By applying such procedures and techniques to the historical yields of different crops, it is possible to obtain knowledge which can be helpful to farmers and government organizations for making good decisions and for improving policies which help increase production. In this article, the work is on the use of data mining techniques to extract information from agricultural records in order to estimate better crop yields for the main crops in the principal regions of India. In this work it was found that the accurate prediction of different specified crop yields across various regions will help the farmers of India, so that Indian farmers can plant different crops in different districts.
2.8 VISHNUVARDHAN ET AL.:
They examined how cultivation in India is dealing with the hard problem of making the most of crop productivity, with more than 60 percent of the crop still depending on monsoon rainfall. Current developments in information technology for the agriculture field have opened up an interesting research area for forecasting crop yield. Yield prediction is a significant problem that remains to be addressed based on the available data, and data mining techniques are a good choice for this purpose. Different data mining techniques are used and evaluated in agriculture for approximating the coming year's crop production. This paper presents a brief investigation of crop yield forecasting using the Multiple Linear Regression (MLR) technique and a density-based clustering technique for a specific region, the East Godavari district of Andhra Pradesh in India. An effort is made to obtain a region-specific crop yield analysis by applying both Multiple Linear Regression and density-based clustering. These models were tested with respect to all the districts of Andhra Pradesh, after which the evaluation was narrowed down to just the East Godavari district.
CHAPTER 3
SYSTEM ANALYSIS & DESIGN
3.1 EXISTING SYSTEM
Niketa et al. in 2016 indicated that the yield of the crop depends on the seasonal climate. In India, climate conditions vary unpredictably, and in times of drought farmers face serious problems. Taking this into consideration, they used machine learning algorithms to suggest crops to farmers for a better yield. They took data from previous years to estimate future data, and used the SMO classifier in WEKA to classify the results. The main factors taken into consideration are minimum temperature, maximum temperature, average temperature, and the previous year's crop and yield information. Using the SMO tool they classified the previous data into two classes: high yield and low yield.
Eswari et al. in 2018 indicated that the yield of the crop depends on precipitation and the average, minimum and maximum temperature. Apart from that, they took one more attribute, crop evapotranspiration, which is a function of both the weather and the growth stage of the plant; this attribute is taken into consideration to get a good decision on the yield of the crops. They collected the dataset with these attributes, sent it as input to a Bayesian network, classified it into two classes named true and false, compared the classifications with the observed ones using a confusion matrix, and computed the accuracy. Finally, they concluded that crop yield prediction with Naive Bayes and Bayesian networks gives high accuracy compared to the SMO classifier, and that forecasting crop yield in different climate and cropping scenarios will be beneficial.
The obtained result for crop yield prediction using the SMO classifier gives less accuracy than Naive Bayes, the multilayer perceptron, and the Bayesian network. Previously, yield was predicted on the basis of the farmer's prior experience; but now weather conditions may change drastically, so farmers cannot simply guess the yield.
3.2 PROPOSED SYSTEM
In the proposed system, we develop Prediction of the crop using the efficient
algorithm.
The challenge in it is to build the efficient model to predict the better crop
Here in this project we use machine learning algorithms like Voting classifier which
is nothing but hybrid classification/ensemble of models. In our project the Voting classifier is
an ensemble of models that are obtained from SVM, Random-Forest and KNN. Which can
enhance the accuracy and it can give a better prediction system.
Early detection of problems and management of those problems can help the farmers for
better crop yield.
For the better understanding of the crop yield, we need to study of the huge data with the help
of machine learning algorithm so it will give the accurate prediction of crop and suggest the
farmer for a better crop.
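The ensemble just described can be sketched with scikit-learn's VotingClassifier. This is a minimal illustration, not the project's actual code: synthetic make_classification data stands in for the Kaggle crop dataset, and the hyperparameters are defaults chosen for the example.

```python
# Sketch of the hybrid Voting Classifier: an ensemble of SVM, Random Forest
# and KNN. Synthetic data stands in for the Kaggle crop dataset so the
# example runs on its own; 7 features mirror N, P, K, temperature, humidity,
# pH and rainfall.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=7, n_informative=5,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

voting = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",  # average the predicted class probabilities
)
voting.fit(X_train, y_train)
print("test accuracy:", voting.score(X_test, y_test))
```

Soft voting averages each member's class probabilities, so a confident Random Forest can outweigh an uncertain KNN; with voting="hard" each model would instead cast a single vote for its predicted class.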
3.3.1 ECONOMICAL FEASIBILITY:
This study is carried out to check the economic impact the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available; only the customized products had to be purchased. Here, in this project, we used limited resources which are well within the limits of our project budget, and hence it is justified that the project is economically feasible.
3.3.2 TECHNICAL FEASIBILITY:
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. A feasibility study evaluates the project's potential for success. The technologies used, such as Python, are open source and versatile for developing various applications. Hence, it is justified that the project is technically feasible.
3.3.3 SOCIAL FEASIBILITY:
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must accept it as a necessity. The level of acceptance by the users depends on the methods employed to educate the user about the system and to make him familiar with it. His confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system. The developed system is useful for farmers and other cultivators, which in turn benefits society, and hence it is justified that it is socially feasible.
3.4 SYSTEM REQUIREMENTS
The most common set of requirements defined by any operating system or software application is the physical computer resources, also known as hardware. The minimal hardware requirements are as follows:
1. Processor : Pentium IV, 2.4 GHz
2. Main Memory : 8 GB RAM
3. Hard Disk Drive : 1 TB
4. Keyboard : 104 keys
3.5 SYSTEM ARCHITECTURE
The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects.
GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
Use case diagrams are used to gather the requirements of a system including internal
and external influences. These requirements are mostly design requirements. Hence, when a
system is analyzed to gather its functionalities, use cases are prepared and actors are
identified.
When the initial task is complete, use case diagrams are modelled to present the
outside view.
FIG 3.2 USE CASE DIAGRAM
Fig 3.3 SEQUENCE DIAGRAM
An activity diagram is basically a flowchart representing the flow from one activity to another; an activity can be described as an operation of the system. The control flow is drawn from one operation to another, and this flow can be sequential, branched, or concurrent. Activity diagrams deal with all types of flow control by using different elements such as fork, join, etc.
FIG 3.4 ACTIVITY DIAGRAM
The input design is the link between the information system and the user. It comprises developing the specifications and procedures for data preparation, that is, the steps necessary to put transaction data into a usable form for processing. This can be achieved by having the computer read data from a written or printed document, or by having people key the data directly into the system. The design of input focuses on controlling the amount of input required, controlling errors, avoiding delay, avoiding extra steps and keeping the process simple. The input is designed in such a way that it provides security and ease of use while retaining privacy. Input design considered the following things:
OBJECTIVES
2. It is achieved by creating user-friendly screens for data entry to handle large volumes of data. The goal of designing input is to make data entry easier and free from errors. The data entry screen is designed so that all data manipulations can be performed; it also provides record-viewing facilities.
3. When the data is entered, it is checked for validity. Data can be entered with the help of screens, and appropriate messages are provided when needed so that the user is never left confused. Thus the objective of input design is to create an input layout that is easy to follow.
The design of output is the most important task of any system. During output design,
developers identify the type of outputs needed, and consider the necessary output controls
and prototype report layouts.
Computer output is the most important and most direct source of information to the user. The system is accepted by the user only on the basis of the quality of its output: if the output is not of good quality, the user is likely to reject the system. Therefore, an effective output design is a major criterion for deciding the overall quality of the system.
While designing the output one should try to accomplish the following:
Output Design Objectives
To develop output design that serves the intended purpose and eliminates the
production of unwanted output.
To develop an output design that meets the end user's requirements.
To form the output in appropriate format and direct it to the right person.
In this project an effort is made to predict the better crop and also the fertilizers required to increase its yield. Our project is implemented using a Voting Classifier, which is nothing but an ensemble of models; in our project the Voting Classifier is an ensemble of models obtained from SVM, Random Forest and KNN. By implementing the Voting Classifier and taking inputs regarding both the quality of the soil and the environmental conditions, we obtained better accuracy, because the yield of a crop depends not only on the quality of the soil but also on environmental conditions like Temperature, Humidity and Rainfall. The accuracy of our project is approximately 97%.
CHAPTER 4
IMPLEMENTATION
4.1 MODULE DESCRIPTION
DATA PRE-PROCESSING
Here the raw crop data is cleaned and metadata is appended to it: categorical values are converted to integers so that the data is easy to train. In this pre-processing, we first load the metadata, attach it to the data, and replace the converted values with metadata. The unwanted entries in the list are then removed and the data is divided into train and test sets. For this split we import train_test_split from scikit-learn, which splits the pre-processed data into train and test sets according to the weights given in the code. The division into test and train is done as 0.2 and 0.8, that is, 20 and 80 percent respectively.
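The pre-processing step can be sketched as follows. This is an illustrative example, not the project's actual code: the tiny inline DataFrame stands in for the Kaggle crop dataset, and the column names are assumptions based on the attributes described in this report.

```python
# Sketch of the pre-processing described above: encode the text crop labels
# as integers and split the data 80/20 into train and test sets.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# A tiny stand-in for the Kaggle crop dataset (column names assumed).
df = pd.DataFrame({
    "N": [90, 85, 40, 23], "P": [42, 58, 72, 72], "K": [43, 41, 77, 84],
    "ph": [6.5, 7.0, 7.4, 6.9], "rainfall": [202.9, 226.6, 88.5, 79.9],
    "label": ["rice", "rice", "chickpea", "chickpea"],
})

encoder = LabelEncoder()              # text labels -> integers
y = encoder.fit_transform(df["label"])
X = df.drop(columns=["label"])

# test_size=0.2 gives the 20/80 split mentioned above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))  # → 3 1
```

Keeping the encoder around matters: encoder.inverse_transform turns the model's integer predictions back into crop names for display to the user.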
Model creation:
We create the machine learning model by fitting the chosen algorithms on the training part of the data.
Model evaluation:
We apply the machine learning algorithm to the testing part and get the accuracy of the model.
Prediction:
This module is based on the GUI part. We create a web page using Bootstrap that takes the inputs (Nitrogen, Phosphorous, Potassium, pH value, Humidity, Rainfall, Temperature). We then get the data from the user, compare it with the dataset values, and finally predict the crop to be planted.
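The prediction flow can be sketched as follows. The recommend_crop helper and the tiny inline training set are hypothetical stand-ins for illustration; the real project trains on the full Kaggle dataset and reads these fields from the Bootstrap web form.

```python
# Sketch of the prediction module: values collected from the web form
# (N, P, K, temperature, humidity, pH, rainfall) are mapped onto the feature
# order the model expects and the predicted crop is returned. The inline
# training rows are illustrative only.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[90, 42, 43, 20.8, 82.0, 6.5, 202.9],
           [85, 58, 41, 21.7, 80.3, 7.0, 226.6],
           [40, 72, 77, 17.0, 16.9, 7.4, 88.5],
           [23, 72, 84, 19.0, 17.1, 6.9, 79.9]]
y_train = ["rice", "rice", "chickpea", "chickpea"]
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

def recommend_crop(form):
    """Map the web-form fields onto the feature order the model expects."""
    features = [[form["N"], form["P"], form["K"], form["temperature"],
                 form["humidity"], form["ph"], form["rainfall"]]]
    return model.predict(features)[0]

print(recommend_crop({"N": 80, "P": 50, "K": 40, "temperature": 22.0,
                      "humidity": 81.0, "ph": 6.8, "rainfall": 210.0}))  # → rice
```

In the real application this function would sit behind the web page's submit handler, with the trained model loaded once at startup instead of being fitted on every request.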
Methodology:
The user gives the values of Nitrogen, Phosphorus, Potassium, pH value, Rainfall, Humidity and Temperature. We have already trained on the dataset; the given values are compared with the dataset and finally the result is displayed: which seed should be cultivated in that particular place.
This is the sample dataset used in this project. The data in Table I is used to predict the crop based on 7 factors: Nitrogen, Phosphorous, Potassium, pH value, Rainfall, Humidity, and Temperature. We create and train a machine learning model to predict the crop, and from Table II we predict the fertilizer that should be used to get the proper yield: its input parameters are the quantities of Nitrogen, Phosphorus and Potassium, and the output is the respective fertilizer to be used. In the input parameters, 1, 2, 3, 4, 5, 6, 7 represent the soil quality attributes respectively.
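The fertilizer step of Table II can be sketched as a simple rule. The thresholds and the fertilizer names used here (Urea for N, DAP for P, MOP for K) are illustrative assumptions, not the actual contents of Table II.

```python
# Hedged sketch of the Table II idea: suggest a fertilizer for whichever
# nutrient falls furthest below its ideal level. The ideal levels and the
# fertilizer names (Urea supplies N, DAP supplies P, MOP supplies K) are
# illustrative assumptions, not the real table entries.
def recommend_fertilizer(n, p, k, ideal=(90, 40, 40)):
    deficits = {"Urea": ideal[0] - n, "DAP": ideal[1] - p, "MOP": ideal[2] - k}
    name, gap = max(deficits.items(), key=lambda item: item[1])
    return name if gap > 0 else "no fertilizer needed"

print(recommend_fertilizer(40, 35, 38))  # → Urea
print(recommend_fertilizer(95, 45, 50))  # → no fertilizer needed
```

In the project itself this mapping would come from Table II rather than fixed thresholds, but the shape of the logic, soil N-P-K in, fertilizer name out, is the same.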
Necessary Packages:
1. NumPy
2. Pandas
3. Matplotlib (pyplot)
4. Scikit-learn
5. TensorFlow
6. Jupyter
Sample dataset of crop prediction
Sample dataset for fertilizer prediction
4.2 SOFTWARE ENVIRONMENT:
Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until December 2008. At that time, the development team decided to release version 3.0, which contained a few relatively small but significant changes that were not backward compatible with the 2.x versions. Python 2 and 3 are very similar, and some features of Python 3 have been backported to Python 2, but in general they remain not quite compatible.
Both Python 2 and 3 have continued to be maintained and developed, with periodic
release updates for both. As of this writing, the most recent versions available are 2.7.15 and
3.6.5. However, an official End of Life date of January 1, 2020 has been established for
Python 2, after which time it will no longer be maintained. If you are a newcomer to Python, it is recommended that you focus on Python 3.
Python is still maintained by a core development team at the Institute, and Guido is
still in charge, having been given the title of BDFL (Benevolent Dictator For Life) by the
Python community. The name Python, by the way, derives not from the snake, but from the
British comedy troupe Monty Python’s Flying Circus, of which Guido was, and presumably
still is, a fan. It is common to find references to Monty Python sketches and movies scattered
throughout the Python documentation.
4.2.1 PYTHON:
Python is a high-level, general-purpose, open-source programming language. It is both object-oriented and procedural. Python is an extremely powerful language, yet it is very easy to learn and is a good choice for most professional programmers.
Python is popular:-
Python has been growing in popularity over the last few years. The 2018 Stack Overflow Developer Survey ranked Python as the 7th most popular technology and the number one most wanted technology of the year. World-class software development companies around the globe use Python every single day. According to research by Dice, Python is also one of the hottest skills to have, and it is the most popular programming language in the world based on the Popularity of Programming Language Index.
Python is interpreted:-
Many languages are compiled, meaning the source code you create needs to be
translated into machine code, the language of your computer’s processor, before it can be run.
Programs written in an interpreted language are passed straight to an interpreter that runs
them directly.
This makes for a quicker development cycle because you just type in your code and
run it, without the intermediate compilation step.
One potential downside to interpreted languages is execution speed. Programs that are
compiled into the native language of the computer processor tend to run more quickly than
interpreted programs. For some applications that are particularly computationally intensive,
like graphics processing or intense number crunching, this can be limiting.
Python is Free:-
The Python interpreter is developed under an OSI-approved open-source license, making it free to install, use, and distribute, even for commercial purposes. A version of the interpreter is available for virtually any platform, including all flavours of Unix, Windows, macOS, smartphones and tablets, and probably anything else you have ever heard of. A version even exists for the half dozen people remaining who use OS/2.
Python is Portable:-
Because Python code is interpreted and not compiled into native machine instructions,
code written for one platform will work on any other platform that has the Python interpreter
installed. (This is true of any interpreted language, not just Python.)
Python is Simple:-
As programming languages go, Python is relatively uncluttered, and the developers
have deliberately kept it that way.
A rough estimate of the complexity of a language can be gleaned from the number of
keywords or reserved words in the language. These are words that are reserved for special
meaning by the compiler or interpreter because they designate specific built-in functionality
of the language.
Python 3 has 33 keywords, and Python 2 has 31. By contrast, C++ has 62, Java has
53, and Visual Basic has more than 120, though these latter examples probably vary
somewhat by implementation or dialect.
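As a quick check of this claim, Python's own keyword module lists the reserved words of whatever interpreter is running (the exact count quoted above varies slightly between Python versions):

```python
import keyword

# The reserved words of the interpreter currently running;
# the text quotes 33 for Python 3, and newer versions add a few more
print(len(keyword.kwlist))
print(keyword.kwlist[:5])
```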
Python code has a simple and clean structure that is easy to learn and easy to read. In
fact, as you will see, the language definition enforces code structure that is easy to read.
For all its syntactical simplicity, Python supports most constructs that would be
expected in a very high-level language, including complex dynamic data types, structured and
functional programming, and object-oriented programming.
It also provides built-in support for common tasks such as string
manipulation or GUI programming. Python accomplishes what many programming
languages don't: the language itself is simply designed, but it is very versatile in terms of
what you can accomplish with it.
Conclusion:-
This section gave an overview of the Python programming language. Python is a great
option, whether you are a beginning programmer looking to learn the basics, an experienced
programmer designing a large application, or anywhere in between. The basics of Python are
easily grasped, and yet its capabilities are vast. Proceed to the next section to learn how to
acquire and install Python on your computer.
Python drew inspiration from other programming languages like C, C++, Java, Perl,
and Lisp.
Python has a very easy-to-read syntax. Some of Python's syntax comes from C,
because the reference implementation of Python is written in C. But Python uses whitespace
to delimit code: spaces or tabs organize code into blocks. This is different from C, where a
semicolon ends each statement and curly braces ({}) group code. Using whitespace to delimit
code makes Python a very easy-to-read language.
Its standard library is made up of many functions that come with Python when it is
installed. Many other libraries available on the Internet extend what the Python language
can do. These libraries make it a powerful language that is used in many different areas:
Web development
Scientific programming
Desktop GUIs
Network programming
Game programming
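As a tiny illustration of this "batteries included" standard library, the snippet below summarises some data and serialises it for the web using only modules that ship with every Python installation (the readings are made-up values):

```python
import json
import statistics

# Hypothetical sensor readings (illustrative values only)
readings = [21.5, 22.5, 23.0, 23.0]

# statistics and json both ship with Python itself - no installs needed
average = statistics.mean(readings)
print(json.dumps({"average_temperature": average}))  # prints {"average_temperature": 22.5}
```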
crop_model.py
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Load the crop-recommendation dataset: features first, crop label in the last column
crop = pd.read_csv('Data/crop_recommendation.csv')
X = crop.iloc[:, :-1].values
Y = crop.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

# Base estimators for the ensemble
models = []
models.append(('rf', RandomForestClassifier(n_estimators=21)))
models.append(('gnb', GaussianNB()))
models.append(('knn1', KNeighborsClassifier(n_neighbors=1)))
models.append(('knn3', KNeighborsClassifier(n_neighbors=3)))
models.append(('knn5', KNeighborsClassifier(n_neighbors=5)))
models.append(('knn7', KNeighborsClassifier(n_neighbors=7)))
models.append(('knn9', KNeighborsClassifier(n_neighbors=9)))

# Combine the base models into a soft-voting ensemble
vot_soft = VotingClassifier(estimators=models, voting='soft')
scores = cross_val_score(vot_soft, X_train, y_train, cv=5)
vot_soft.fit(X_train, y_train)
y_pred = vot_soft.predict(X_test)
print("Accuracy: ", scores.mean())

# Persist the trained ensemble for the Flask app
pkl_filename = 'Crop_Recommendation.pkl'
with open(pkl_filename, 'wb') as Model_pkl:
    pickle.dump(vot_soft, Model_pkl)
app.py
from flask import Flask, render_template, request, Markup
import pandas as pd
import os
import numpy as np
import pickle
from keras.models import load_model
from keras.preprocessing import image

# fertilizer_dict maps N/P/K status keys such as "NHigh" to advice strings;
# it is defined elsewhere in the project (module name assumed here)
from fertilizer_dict import fertilizer_dict

# CNN model for pest identification
classifier = load_model('Trained_model.h5')
classifier._make_predict_function()

# Trained voting-classifier ensemble for crop recommendation
crop_recommendation_model_path = 'Crop_Recommendation.pkl'
with open(crop_recommendation_model_path, 'rb') as model_file:
    crop_recommendation_model = pickle.load(model_file)

app = Flask(__name__)

@app.route('/fertilizer-predict', methods=['POST'])
def fertilizer_recommend():
    crop_name = str(request.form['cropname'])
    N_filled = int(request.form['nitrogen'])
    P_filled = int(request.form['phosphorous'])
    K_filled = int(request.form['potassium'])

    # Look up the ideal N, P, K values for the chosen crop (column names assumed)
    df = pd.read_csv('Data/Crop_NPK.csv')
    N_desired = df[df['Crop'] == crop_name]['N'].iloc[0]
    P_desired = df[df['Crop'] == crop_name]['P'].iloc[0]
    K_desired = df[df['Crop'] == crop_name]['K'].iloc[0]

    n = N_desired - N_filled
    p = P_desired - P_filled
    k = K_desired - K_filled

    if n < 0:
        key1 = "NHigh"
    elif n > 0:
        key1 = "Nlow"
    else:
        key1 = "NNo"

    if p < 0:
        key2 = "PHigh"
    elif p > 0:
        key2 = "Plow"
    else:
        key2 = "PNo"

    if k < 0:
        key3 = "KHigh"
    elif k > 0:
        key3 = "Klow"
    else:
        key3 = "KNo"

    abs_n = abs(n)
    abs_p = abs(p)
    abs_k = abs(k)

    response1 = Markup(str(fertilizer_dict[key1]))
    response2 = Markup(str(fertilizer_dict[key2]))
    response3 = Markup(str(fertilizer_dict[key3]))
    # Template name assumed; the source listing does not show the return statement
    return render_template('Fertilizer-Result.html', recommendation1=response1,
                           recommendation2=response2, recommendation3=response3,
                           diff_n=abs_n, diff_p=abs_p, diff_k=abs_k)

def pred_pest(pest):
    try:
        test_image = image.load_img(pest, target_size=(64, 64))
        test_image = image.img_to_array(test_image)
        test_image = np.expand_dims(test_image, axis=0)
        result = classifier.predict_classes(test_image)
        return result
    except:
        return 'x'

@app.route("/")
@app.route("/index.html")
def index():
    return render_template("index.html")

@app.route("/CropRecommendation.html")
def crop():
    return render_template("CropRecommendation.html")

@app.route("/FertilizerRecommendation.html")
def fertilizer():
    return render_template("FertilizerRecommendation.html")

@app.route("/PesticideRecommendation.html")
def pesticide():
    return render_template("PesticideRecommendation.html")

@app.route("/predict", methods=['POST'])  # route path assumed
def predict():
    if request.method == 'POST':
        file = request.files['image']
        filename = file.filename
        file_path = os.path.join('static/user_uploaded', filename)  # upload folder assumed
        file.save(file_path)
        pred = pred_pest(pest=file_path)
        if pred == 'x':
            return render_template('unaptfile.html')
        if pred[0] == 0:
            pest_identified = 'aphids'
        elif pred[0] == 1:
            pest_identified = 'armyworm'
        elif pred[0] == 2:
            pest_identified = 'beetle'
        elif pred[0] == 3:
            pest_identified = 'bollworm'
        elif pred[0] == 4:
            pest_identified = 'earthworm'
        elif pred[0] == 5:
            pest_identified = 'grasshopper'
        elif pred[0] == 6:
            pest_identified = 'mites'
        elif pred[0] == 7:
            pest_identified = 'mosquito'
        elif pred[0] == 8:
            pest_identified = 'sawfly'
        elif pred[0] == 9:
            pest_identified = 'unknown'  # the class-9 label is missing from the source listing
        return render_template(pest_identified + '.html')

@app.route('/crop_prediction', methods=['POST'])
def crop_prediction():
    if request.method == 'POST':
        N = int(request.form['nitrogen'])
        P = int(request.form['phosphorous'])
        K = int(request.form['potassium'])
        ph = float(request.form['ph'])
        rainfall = float(request.form['rainfall'])
        temperature = float(request.form['temperature'])
        humidity = float(request.form['humidity'])
        # Feature order must match the order used to train the model
        data = np.array([[N, P, K, temperature, humidity, ph, rainfall]])
        my_prediction = crop_recommendation_model.predict(data)
        final_prediction = my_prediction[0]
        # Template name assumed; the source listing does not show the return statement
        return render_template('crop-result.html', prediction=final_prediction)

if __name__ == '__main__':
    app.run(debug=True)
cnn_model.py
import h5py  # needed by Keras to save models in HDF5 format
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator

# Part 1 - Building the CNN
classifier = Sequential()

# Step 1 - Convolution (three convolutional layers, as described in the text;
# the filter counts are assumed, since they are missing from the source listing)
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Conv2D(64, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Conv2D(128, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
classifier.add(Dense(256, activation='relu'))
classifier.add(Dropout(0.5))
# Output layer: one unit per pest class (10 classes, matching app.py)
classifier.add(Dense(10, activation='softmax'))

classifier.compile(
    optimizer = 'adam',
    loss = 'categorical_crossentropy',
    metrics = ['accuracy'])

# Part 2 - Fitting the CNN to the images
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory(
    'Data/train',
    target_size=(64, 64),
    batch_size=32,
    class_mode='categorical')
test_set = test_datagen.flow_from_directory(
    'Data/test',
    target_size=(64, 64),
    batch_size=32,
    class_mode='categorical')

model = classifier.fit_generator(
    training_set,
    steps_per_epoch=100,
    epochs=100,
    validation_data = test_set,
    validation_steps = 6500)

classifier.save('Trained_Model.h5')

# summarize history for accuracy
print(model.history.keys())
plt.plot(model.history['acc'])
plt.plot(model.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.show()

# summarize history for loss
plt.plot(model.history['loss'])
plt.plot(model.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.show()
An SVM model is a representation of the examples as points in space, mapped so
that the examples of the separate categories are divided by a clear gap that is as wide as
possible. In addition to performing linear classification, SVMs can efficiently perform
non-linear classification by implicitly mapping their inputs into high-dimensional feature
spaces.
Given a set of training examples, each marked as belonging to one of two
categories, an SVM training algorithm builds a model that assigns new examples to one
category or the other, making it a non-probabilistic binary linear classifier.
In our project, for the SVM algorithm we used a polynomial kernel (kernel='poly')
with degree values of 1, 2, 3, 4 and 5 to generate models.
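A minimal sketch of this degree sweep, using scikit-learn's SVC on the bundled iris dataset as a stand-in (the project itself trains on the crop dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in data; the project loads crop_recommendation.csv

# Try polynomial kernels of increasing degree, as in the project
for degree in [1, 2, 3, 4, 5]:
    clf = SVC(kernel='poly', degree=degree)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"degree={degree}: mean CV accuracy {score:.3f}")
```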
Fig: 4.3. Example of SVM algorithm
In our project, for the random forest algorithm we used 21 estimators (decision
trees) to generate the model.
Fig: 4.4. Example of RANDOM FOREST algorithm
In our project, for the KNN algorithm we used values of n = 1, 3, 5, 7 and 9 to generate models.
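A minimal sketch of this sweep over n, again using scikit-learn and the bundled iris dataset as a stand-in for the crop data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data; the project loads crop_recommendation.csv

# Try the same odd neighbour counts as the project
for n in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=n)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"n_neighbors={n}: mean CV accuracy {score:.3f}")
```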
Fig: 4.5. Example of KNN algorithm
VOTING CLASSIFIER
1. Hard Voting: In hard voting, the predicted output class is the class that
receives the majority of the votes, i.e. the class predicted most often by the
individual classifiers. Suppose three classifiers predict the output classes
(A, A, B); the majority predicted A, so A will be the final prediction.
2. Soft Voting: In soft voting, the output class is the prediction based on the
average of the probabilities given to that class. Suppose, for some input, three
models give prediction probabilities for class A of (0.30, 0.47, 0.53) and for class B
of (0.20, 0.32, 0.40). The average for class A is 0.4333 and for B it is 0.3067; the
winner is class A because it has the highest averaged probability.
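The averaging above can be reproduced directly (the probabilities are the ones quoted in the text):

```python
# Per-classifier probabilities for each class, as quoted above
prob_A = [0.30, 0.47, 0.53]
prob_B = [0.20, 0.32, 0.40]

avg_A = sum(prob_A) / len(prob_A)  # 0.4333...
avg_B = sum(prob_B) / len(prob_B)  # 0.3066...

winner = 'A' if avg_A > avg_B else 'B'
print(f"average A = {avg_A:.4f}, average B = {avg_B:.4f}, winner = {winner}")
```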
Now we combine the models obtained from the three classifiers above: Random Forest,
KNN and SVM.
In our project we used soft voting, which means the output class is predicted based on
the average of the probabilities given to that class.
In our project we use convolutional neural networks to identify the type of
insect when an image of an insect is uploaded, because we can predict the pesticides for an
insect only after knowing which kind of insect it is.
We used 3 convolutional layers in our project.
A convolutional neural network has multiple hidden layers that help in extracting
information from an image.
The four important layers in CNN are:
1. Convolution layer
2. ReLU layer
3. Pooling layer
4. Fully connected layer
Convolution Layer
This is the first step in the process of extracting valuable features from an image. A
convolution layer has several filters that perform the convolution operation. Every image is
considered as a matrix of pixel values.
ReLU layer
ReLU stands for the rectified linear unit. Once the feature maps are extracted, the next
step is to move them to a ReLU layer.
ReLU performs an element-wise operation and sets all the negative pixels to 0. It
introduces non-linearity to the network, and the generated output is a rectified feature map.
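The element-wise rectification is one line in NumPy: every negative entry of the feature map becomes 0, and positives pass through unchanged (the feature-map values here are made up):

```python
import numpy as np

# A small, made-up feature map with some negative activations
feature_map = np.array([[-2.0, 1.5],
                        [0.0, -0.5]])

rectified = np.maximum(feature_map, 0)  # element-wise max(x, 0), i.e. ReLU
print(rectified)
```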
Pooling Layer
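Max pooling, used in this project via MaxPooling2D(pool_size=(2, 2)), down-samples the rectified feature map by keeping only the largest activation in each window. A minimal NumPy sketch on a made-up 4x4 feature map:

```python
import numpy as np

# A small, made-up feature map
fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 2],
                 [2, 2, 3, 4]])

# 2x2 max pooling with stride 2: take the max of each non-overlapping 2x2 window
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled.tolist())  # [[4, 2], [2, 5]]
```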
Fig: Example for CNN algorithm
5. SYSTEM TESTING
&
SCREEN SHOTS
CHAPTER 5.
SYSTEM TESTING
5.1 INTRODUCTION
In a generalized way, we can say that system testing is a type of testing whose
main aim is to make sure that the system performs efficiently and seamlessly. The process of
testing is applied to a program with the main aim of discovering undetected errors, errors
which could otherwise have damaged the future of the software. A test case that brings up a
high possibility of discovering an error is considered successful. Such a successful test helps
to reveal errors that are still unknown.
TESTING METHODOLOGIES
A test plan is a document which describes the approach, scope, resources and
schedule of the intended testing activities. It helps to identify each test item, the features
which are to be tested, the testing tasks, who will perform each task, how independent the
tester is, the environment in which the test takes place, the test design technique, the entry
and exit criteria to be used together with the rationale for their choice, and any risks that
require contingency planning. It can also be referred to as the record of the test-planning
process. Test plans are usually prepared with significant input from test engineers.
5.1.1 UNIT TESTING:
Unit testing involves the design of test cases that help validate the internal
program logic; all decision branches and internal code are validated. It takes place after an
individual unit is completed, before integration. The unit test thus performs a basic-level
test at the component stage and tests a particular business process, system configuration,
etc. It ensures that each unique path of the process performs precisely to the documented
specifications and contains clearly defined inputs with expected results.
5.1.2 INTEGRATION TESTING:
These tests are designed to test integrated software items to determine whether they
really execute as a single program or application. The testing is event driven and is thus
concerned with the basic outcome of screens and fields. Integration tests demonstrate that,
although the components were individually satisfactory (as already shown by successful unit
testing), their combination is also apt and sound. This type of testing is specifically aimed at
exposing the issues that come up when components are combined.
The following are the types of Integration Testing:
In bottom-up testing, each module at the lower levels is tested with higher-level
modules until all modules are tested. The primary purpose of this integration testing is
to test the interfaces among the various modules making up each subsystem. This
integration testing uses test drivers to drive and pass appropriate data to the lower-level
modules.
Advantages:
In bottom-up testing, no stubs are required.
A principal advantage of this integration testing is that several disjoint
subsystems can be tested simultaneously.
Disadvantages:
Driver modules must be produced.
This testing becomes complex when the system is made up of a large number of
small subsystems.
5.1.3 FUNCTIONAL TESTING
The functional tests help in providing a systematic demonstration that the functions
tested are available as specified by the technical requirements, the system documentation and
the user manual.
5.1.4 WHITE BOX TESTING:
White box testing is the type of testing in which the internal components of the
software are open to and can be examined by the tester. It is therefore a complex type of
testing process. All the data structures, components, etc. are tested by the tester to find
possible bugs or errors. It is used in situations in which black box testing is incapable of
finding a bug. It is a complex type of testing which takes more time to apply.
5.1.5 BLACK BOX TESTING:
Black box testing is the type of testing in which the internal components of the
software are hidden, and only the inputs and outputs of the system are available to the tester
for finding a bug. It is therefore a simpler type of testing; a programmer with basic knowledge
can also perform it. It is less time-consuming than white box testing. It is very successful for
software that is less complex and straightforward in nature. It is also less costly than white
box testing.
5.1.6 Acceptance Testing:
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
TEST CASE
Testing, as explained earlier, is the process of discovering all possible weak points in
the finalized software product. Testing exercises the working of sub-assemblies, components,
assemblies and the complete product. The software is taken through these exercises with the
main aim of making sure that it meets the business requirements and user expectations and
does not fail abruptly. Several types of tests are used today, and each test type addresses a
specific testing requirement.
Advantages (of top-down testing):
Modules are debugged separately.
Few or no drivers are needed.
It is more stable and accurate at the aggregate level.
Disadvantages:
Many stubs are needed.
Modules at lower levels are tested inadequately.
Test case 2 - Accepting pH value. Input: pH = 110. Expected output: value should not be
accepted. Actual output: value not accepted. Result: pass.
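The rejection of pH = 110 in the test case above corresponds to a simple server-side range check. A hypothetical sketch (the function name is an assumption; the bounds follow from the chemical pH scale of 0 to 14):

```python
def is_valid_ph(value):
    """Accept only values on the chemical pH scale (0 to 14)."""
    try:
        ph = float(value)
    except (TypeError, ValueError):
        return False  # reject non-numeric input outright
    return 0.0 <= ph <= 14.0

print(is_valid_ph(6.5))  # True  (a typical soil pH)
print(is_valid_ph(110))  # False (the rejected input from the test case)
```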
Test case - Recommending a fertilizer based on input values. Input: crop you want to
grow = coconut.
5.2 SCREEN SHOTS
5.2.2 HOME PAGE
5.2.3 TAKING INPUTS FOR CROP PREDICTION
5.2.4 CROP PREDICTED
5.2.5 TAKING INPUTS FOR CROP PREDICTION
5.2.6. CROP PREDICTED
5.2.7. TAKING INPUTS FOR CROP PREDICTION
5.2.8. CROP PREDICTED
5.2.9. TAKING INPUT TO PREDICT FERTILIZERS
5.2.10. Suggestions
5.2.11. Suggestions
5.2.12. Information about fertilizers.
5.2.13. Suggestions about fertilizers.
5.2.14: PREDICTING PESTICIDE
5.2.16: IDENTIFIED PESTICIDE
5.2.17: TAKING INPUT TO PREDICT PESTICIDES
5.2.19: IDENTIFIED PEST
Enter Required values for manual analysis:
Temperature Requirement:
6. CONCLUSION
&
FUTURE SCOPE
Chapter 6.
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
The proposed work presents a crop prediction framework using a voting classifier,
which is an ensemble of models. In our project the voting classifier ensembles the models
obtained from SVM, Random Forest and KNN, and predicts the crop with greater accuracy.
In this way the framework will help reduce the challenges faced by farmers and discourage
them from attempting suicide. It will act as a medium to give farmers the effective
information needed to get high yields and thereby maximize profits, which in turn will
reduce suicide rates and lessen their hardships.
This will lead to an increase in the country's overall profit. In our project we found
that accurate prediction of the yields of different specified crops across different districts
will help farmers, who can then plant different crops in different districts. In the near future,
geospatial analysis can be added to improve accuracy and to incorporate better geographical
data.
7. BIBLIOGRAPHY
&
REFERENCES
CHAPTER 7.
BIBLIOGRAPHY AND REFERENCES
[1] Mayank Champaneri, Chaitanya Chandvidkar, Darpan Chachpara, Mansing Rathod,
"Crop Yield Prediction Using Machine Learning", International Journal of Science and
Research, April 2020.
[2] Pavan Patil, Virendra Panpatil, Prof. Shrikant Kokate, "Crop Prediction System using
Machine Learning Algorithms", International Research Journal of Engineering and
Technology, Feb 2020.
[3] Ramesh Medar, Shweta, Vijay S. Rajpurohit, "Crop Yield Prediction using Machine
Learning Techniques", 5th International Conference for Convergence in Technology, 2019.
[4] Trupti Bhange, Swati Shekapure, Komal Pawar, Harshada Choudhari, "Survey Paper on
Prediction of Crop Yield and Suitable Crop", International Journal of Innovative Research in
Science, Engineering and Technology, May 2019.
[6] Nishit Jain, Amit Kumar, Sahil Garud, Vishal Pradhan, Prajakta Kulkarni, "Crop
Selection Method Based on Various Environmental Factors Using Machine Learning",
International Research Journal of Engineering and Technology (IRJET), Feb 2017.
[7] Rakesh Kumar, M. P. Singh, Prabhat Kumar, J. P. Singh, "Crop Selection Method to
Maximize Crop Yield Rate using Machine Learning Technique", 2015 International
Conference on Smart Technologies and Management for Computing, Communication,
Controls, Energy and Materials (ICSTM), Vel Tech Rangarajan Dr. Sagunthala R&D Institute
of Science and Technology, Chennai, T.N., India, May 2015.
[8] Rajshekhar Borate, "Applying Data Mining Techniques to Predict Annual Yield of Major
Crops and Recommend Planting Different Crops in Different Districts in India", International
Journal of Novel Research in Computer Science and Software Engineering, Vol. 3, Issue 1,
pp. 34-37, April 2016.
[9] D. Ramesh, B. Vishnu Vardhan, "Analysis of Crop Yield Prediction using Data Mining
Techniques", International Journal of Research in Engineering and Technology (IJRET),
Vol. 4, 2015.
[11] Igor Oliveira, Renato L. F. Cunha, Bruno Silva, Marco A. S. Netto, "A Scalable
Machine Learning System for Pre-Season Agriculture Yield Forecast", IEEE, 2018.
[12] Neha Rale, Raxitkumar Solanki, Doina Bein, James Andro-Vasko, Wolfgang Bein,
"Prediction of Crop Cultivation", IEEE, 2019.
[13] Md. Tahmid Shakoor, Karishma Rahman, Sumaiya Nasrin Rayta, Amitabha
Chakrabarty, "Agricultural Production Output Prediction Using Supervised Machine
Learning Techniques", IEEE, 2017.
[14] G. Srivatsa Sharma, Shah Nawaz Mandal, Shruti Kulkarni, Monica R. Mundada,
Meeradevi, "Predictive Analysis to Improve Crop Yield Using a Neural Network Model",
IEEE, 2018.
[15] Rashmi Priya, Dharavath Ramesh, "Crop Prediction on the Region Belts of India:
A Naïve Bayes MapReduce Precision Agricultural Model", IEEE, 2018.