
SOIL HEALTH PREDICTOR AND CROP

RECOMMENDER USING NEURAL NETWORKS


A project report submitted in partial fulfillment
of the requirements for the award of the Degree of
Bachelor of Technology
In
Computer Science and Engineering (CSE)

By
Rohith Reddy (19011P0515)
Sowmya Raye (19011P0523)
Sahithi V (19011P0528)
K Soumya (19011P0529)

Under the guidance of


Dr. R. SRIDEVI
Professor of CSE, JNTUH-UCEH
Department of Computer Science and Engineering

JNTUH University College of Engineering,

Science & Technology, Hyderabad - 500085.

DECLARATION BY THE CANDIDATES

We, Rohith Reddy (19011P0515), Sowmya Raye (19011P0523), Sahithi V (19011P0528) and
K Soumya (19011P0529), hereby certify that the project report entitled “Soil Health Predictor
and Crop Recommender using Neural Networks”, carried out under the guidance of Dr. R.
Sridevi, is submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science and Engineering. This is a record of bona fide
work carried out by us, and the results embodied in this project have not been reproduced or
copied from any source.
The results embodied in this project report have not been submitted to any other University or
Institute for the award of any other degree or diploma.
Rohith Reddy(19011P0515),
Sowmya Raye(19011P0523),
Sahithi V(19011P0528),
K Soumya (19011P0529)
Department of Computer Science and Engineering

JNTUH University College of Engineering,

Science & Technology, Hyderabad - 500085.

CERTIFICATE BY THE SUPERVISOR

This is to certify that the project report entitled “Soil Health Predictor and Crop Recommender
using Neural Networks”, being submitted by Rohith Reddy (19011P0515), Sowmya
Raye (19011P0523), Sahithi V (19011P0528) and K Soumya (19011P0529) for the award of the
degree of Bachelor of Technology in Computer Science & Engineering, is a record of bona fide
work carried out by them, and the results produced have been verified and found satisfactory.

Dr. R. Sridevi,

Professor,

(Department of Computer Science & Engineering)


Department of Computer Science and Engineering

JNTUH University College of Engineering,

Science & Technology, Hyderabad - 500085.

CERTIFICATE BY THE HEAD

This is to certify that the project report “Soil Health Predictor and Crop Recommender using
Neural Networks”, being submitted by Rohith Reddy (19011P0515), Sowmya
Raye (19011P0523), Sahithi V (19011P0528) and K Soumya (19011P0529) in partial fulfillment, is
a record of bona fide work carried out by them. The results have been verified and found satisfactory.

Dr D. Vasumathi

Professor & Head of the Department

(Department of Computer Science & Engineering)

DATE :
ACKNOWLEDGEMENT

We would like to express our sincere thanks to our supervisor, Dr. R. Sridevi, Professor in the Computer
Science and Engineering Department, JNTUH-CEH, for her admirable guidance and inspiration,
both theoretical and practical, and most importantly for the drive to complete the project
successfully. Working under such an eminent guide was our privilege.

We owe a debt of gratitude to Dr. D. Vasumathi, Professor & Head of the Department of
Computer Science & Engineering, for her kind consideration and encouragement in carrying out
this project successfully.

We are grateful to the Project Review Committee members and the Department of Computer
Science & Engineering, who helped us complete this project successfully with their
valuable suggestions and support.

We thank our parents for their love, care and moral support, without which we would
not have been able to complete this project. They have been a constant source of inspiration for all our
academic endeavors. Last but not least, we thank the Almighty for making us a part of the
world.

Rohith Reddy(19011P0515),
Sowmya Raye(19011P0523),
Sahithi V(19011P0528),
K Soumya(19011P0529).
ABSTRACT

The problem at hand is that every soil is different and has different features; based on those
features, this study recommends the best crop that can be grown on that soil for maximum yield. 75%
of soil health depends on the macronutrients Nitrogen, Potassium, and Phosphorus.
These macronutrients, other micronutrients, and features such as texture are used to train various
machine learning models (decision tree, SVM, random forest, etc.) and a few Neural Network
models. Based on various classification metrics and the accuracies of the models, this work
presents the Neural Network model as the best model for crop recommendation.

A fertility scale has also been defined based on the research conducted, and a soil fertility
calculator has been implemented that outputs a soil fertility score. A UI (website) integrated
with our model has also been implemented, through which the farmer or end user can easily
use the crop recommender and the fertility score calculator.
TABLE OF CONTENTS

1. INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.3 Objectives
1.4 Chapter Organization

2. LITERATURE SURVEY
2.1 Introduction
2.2 Existing Systems
2.3 Drawbacks of Existing Systems

3. SYSTEM REQUIREMENTS
3.1 Software Requirements
3.2 Hardware Requirements
3.3 Functional Requirements
3.4 Non-Functional Requirements

4. SYSTEM ARCHITECTURE AND DESIGN
4.1 Introduction
4.2 System Architecture
4.3 Flowchart

5. PROPOSED METHODOLOGY
5.1 Proposed System
5.2 Data Collection
5.3 Data Preprocessing
5.3.1 Feature Selection
5.3.2 Normalization
5.4 Training and Testing Phase
5.5 Models
5.5.1 Decision Tree
5.5.2 Support Vector Machines
5.5.3 Random Forest
5.5.4 Artificial Neural Networks
5.6 Choosing Best Model
5.7 Graphical Observation
5.7.1 Training and Validation Accuracy
5.7.2 Training and Validation Loss

6. IMPLEMENTATION
6.1 Code Snippets

7. RESULTS

8. CONCLUSION AND FUTURE SCOPE
8.1 Conclusion
8.2 Future Scope

9. REFERENCES
CHAPTER 1

INTRODUCTION
1. INTRODUCTION

1.1 OVERVIEW

Over the next 40 years, soil health is predicted to decline to a point where world food production
drops by 40% while the population rises to 9.3 billion. This is a direct result of unsustainable
farming methods adopted without proper knowledge of soil health. Researchers at the University of Iowa
found that farmers had the highest suicide rate of all occupations, owing to little
or no profit. To help farmers get the best yield, we build a Deep Learning model
that recommends the crop providing the maximum yield based on the various features of a
particular soil. We also provide the user with a soil fertility score, which helps the user
assess how healthy or fertile their soil is.

1.2 PROBLEM STATEMENT

60% of India's population belongs to the agricultural community. Many farmers lack the
know-how about which crops they can grow for maximum yield and income. Our main
motivation for this project comes from the many news reports of farmers
committing suicide on a daily basis. As engineers, we want to help these farmers as much as we
can, at our level, with our crop recommender and fertility score calculator.

Our research showed that few innovations have been made in this field. There are a few
projects that recommend crops, but they do not seem to take into consideration soil features,
which are the most important factors for crop growth. We therefore decided to build
a machine learning/deep learning model that takes these features into account to recommend
crops and give a fertility score.
1.3 OBJECTIVES

The problem we are solving is that every soil is different and has different features, so
based on those features we will provide the best crop that can be grown on that soil for
maximum yield. 75% of soil health depends on the macronutrients Nitrogen, Potassium, and
Phosphorus. We will use these macronutrients, other micronutrients, and features such as
texture to train various machine learning models (decision tree, SVM, random forest, etc.). We
will also train a few Neural Network models and, based on various classification metrics
and the accuracies of the models, arrive at the best model for recommending the right crops
to grow. We will also define a fertility scale based on our research and implement a
soil fertility calculator that gives a soil fertility score as the result. Finally, we will implement a
highly accessible UI (web or app) through which the farmer or end user can use the crop
recommender and fertility score calculator with ease.

1.4 CHAPTER ORGANIZATION

Chapter 2: Provides an overview of the literature survey.

Chapter 3: Gives an overview of the requirements.

Chapter 4: Gives an overview of the system architecture and design.

Chapter 5: Provides detailed information on the proposed system.

Chapter 6: Describes the implementation of the project.

Chapter 7: Deals with the testing and results of the proposed system.

Chapter 8: Gives the conclusions drawn from the results and future enhancements.

Chapter 9: Lists the references.
CHAPTER 2

LITERATURE SURVEY
2. LITERATURE SURVEY

2.1 INTRODUCTION

We carried out a literature survey to acquire a clear idea of crop recommendation and ML
techniques in the agricultural sector. Machine learning and Neural Networks have been used in
the agriculture sector for several years, and machine learning algorithms are applied extensively
to crop yield prediction. Many ML techniques and Neural Network models have been suggested
for this challenging problem. Predicting crops is not an easy task, as there are many countries
and many types of crops and soil. Although crops vary greatly around the world, the basic
nutrients that every crop requires are quite similar.

2.2 EXISTING SYSTEMS

Shilpa Mangesh Pandey et al. (2021) developed a model in which an integrated predictor
system helps farmers predict the production of a specific crop. The built-in recommender
system lets the user explore possible crops and their yield in order to make more informed
judgments. Various machine learning methods, such as Artificial Neural Networks and Support Vector
Machines, were implemented and evaluated on yield datasets from the states of Karnataka and
Maharashtra, and the accuracy of each algorithm was compared with the others. The findings
indicate that the RF algorithm, with an accuracy of 95%, is the most accurate of the standard
methods applied to the provided datasets. The proposed model also investigated the timing of
fertilizer application and suggested the optimal period.
Priyadharshini A et al. (2021) presented a system that helps farmers select the
optimal crop by supplying information that regular farmers do not keep track of, thereby reducing
the risk of crop failure and boosting agricultural yield. It also saves them from suffering losses.
It is anticipated that millions of farmers across the country will be able to access a web interface
and a mobile app that deliver crop cultivation recommendations.

S Bangaru Kamatchi et al. (2019) constructed a model for improving crop yield using a
recommendation program driven by weather analysis. Given the dynamics of climate and the
theory of climate prediction, weather forecasting is a demanding task in the technological
and scientific realms. Artificial Neural Networks (ANN) are being applied in weather
prediction research. The research employs classification, recommendation, and prediction
methods. To increase the success rate of the recommendation program, regularization of the
ANN is incorporated into the hybrid approach via CBR.

Mummaleti Keerthana et al. (2021) devised a technique for predicting agricultural yield
from previously gathered data, addressing the problem with various ML algorithms. A
combination of the Decision Tree and AdaBoost algorithms is used to predict the result
accurately. They found that adding a decision tree improves accuracy relative to other
methods, and more accurate predictions can help increase yield. The system helps farmers
determine which crops can be grown in a given area depending on the characteristics
considered. The authors selected the top ten crops, using climate and area as the criteria.
Atharva Jadhav et al. (2022) created a crop recommendation system using
variables such as the soil's composition, the local temperature, and the soil's pH. Using machine
learning algorithms, they achieved 98% accuracy in predicting the appropriate crop to grow.

S. Pudumalar et al. (2016) suggested a method that employs the Majority Voting technique,
one of the most well-known ensembling techniques. There is no limit to the number of base
learners that can be used in the voting procedure, but there must be at least two. The
learners are selected so that each is competent on its own while complementing the others.
The more competent the learners, the greater the likelihood of accurate prediction. The
learners must be complementary because, if one or a few of them make a mistake, the
likelihood that the others will correct it is high. Each learner builds its own model, which
is trained on the specified training dataset. When classifying a new record, each model
predicts the class independently, and the class that receives the most votes is chosen.
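The voting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pudumalar et al.'s implementation: the function name `majority_vote` and the crop labels are hypothetical, and ties are resolved in favour of the label that appears first.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class predicted by the most base learners for one record.

    `predictions` holds one label per base learner (hypothetical helper).
    Counter.most_common orders equal counts by first appearance, so a tie
    goes to the label that was seen first.
    """
    if len(predictions) < 2:
        raise ValueError("majority voting needs at least two base learners")
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Three hypothetical base learners classify the same new record independently;
# the second learner errs and is outvoted by the other two.
votes = ["rice", "maize", "rice"]
print(majority_vote(votes))  # rice
```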

Using historical data, Dr. Y. Jeevan Nagendra Kumar et al. (2020) created a technique to
anticipate crop production. Agricultural yield predictions are made using data mining
techniques, and the Random Forest algorithm is used to predict the optimal crop yield. Crop
yield prediction is one of the most useful applications in agriculture, since greater precision
makes farming more profitable. The suggested methodology helps farmers obtain information
about the requirements and costs of various crops and guides them on which crop to cultivate
in their area. Many crop varieties are included in the model, and farmers in India stand to
benefit from accurate forecasts for many specific crops across districts.
Bhawana Sharma et al. (2020) reviewed 39 articles covering various aspects of agriculture,
including crop management using machine learning and image processing, crop development,
disease diagnosis, pest identification, and soil and water conservation. They found that
certain methods yield superior accuracy, which supports the application of machine learning
in the agricultural sector. In crop management, however, none of the reviewed algorithms
placed particular emphasis on crop maturity classification, an area where ML is expected to
aid crop growth monitoring. The research therefore suggests a methodology that employs ML
and image-processing approaches to assess crop growth from photographs.

Sudha Bhatia et al. (2021) advocated soil testing to understand the soil's composition, which
is essential for determining soil fertility and well-being. It is also generally accepted that
Potassium, Nitrogen, and Phosphorus are the three most essential nutrients for plant growth
and productivity in most crops. Based on their study of the soil samples, they concluded that
the third sample, which contained a mixture with silver (Ag) particles, had relatively higher
amounts of three nutrients: Ca, P, and Mg. This was mostly attributed to the deposited silver
particles, which acted as an environmentally benign manure and may have stimulated crop
development. These results demonstrate that soil treated with silver nanoparticles enables
improved plant growth and production.

Sadia Afrin et al. (2018) aimed to assess the relationship among crop yields, soil nutrients,
and climatic conditions using various data mining approaches. Several machine learning
approaches were used to determine an optimal model. In their clustering investigation, PAM
gave better results than the other models implemented, and among the four regression
algorithms employed, the extended linear model offered the lowest RMSE. The proposed work
may be valuable, although agricultural study or expertise is still required to discover how to
avoid difficulties and obtain the most output from different soil features and environmental
circumstances. Although the research was based on the soil components and agricultural
cultivation of high- and medium-altitude regions, the same research can be extended to cover
additional farmland categories and other soil components, such as water level, manures, and
soil composition.

Ganesh Babu R et al. (2020) proposed soil-based crop identification and fertilizer management
for agriculture. Multiple detectors were deployed at locations across the Thanjavur district,
and an application was built and deployed on the farmers' mobile phones. The soil test results
are entered into the UI, which determines whether a given plant is the best choice for that
area. The UI also transmits all relevant information, such as fertilizer requirements and soil
composition, to a field controller. The program was developed to select crops based on soil
testing and evaluates several field factors using the information collected. Based on this data,
the controller combines the required quantities of each fertilizer and distributes them uniformly
throughout the field. The model was implemented using software such as MATLAB, and the
output shows that the proposed technique accurately predicts the type of crops to be grown and
other features for agricultural areas. The hardware for the suggested method has been built and
tested in the field.

Aditya Motwani et al. (2022) offer a mechanism for crop production and soil analysis that
employs RF and CNN algorithms to tackle several long-standing difficulties in the Indian
agricultural sector and boost the average farmer's profitability. The study also describes a
user interface that functions as a marketplace between users and prospective agricultural
buyers, removing the need for an intermediary.

Akshar Tripathi et al. (2022) showed that soil features influence crop yield and that soil
health is essential to increased production. With precise cultivation studies and synchronous
data collection, the precision of the suggested DLMLP model can be enhanced. The study paves
the way for future research using SAR datasets with longer wavelengths to provide a temporal
soil health evaluation. Such remote-sensing-based soil health studies are extremely valuable to
scientists for identifying methods and actions that help keep several soil quality indicators at
proper levels. Beyond optical remote-sensing features and the light scattered by the top
vegetation cover of ripening crops, which enable SOC assessment, this work illustrates the
significant sensitivity of C-band SAR data for forecasting conductivity and soil water levels.

Takalani Oretha Mufamadi et al. (2020) constructed a crop recommendation system based on
soil parameters. To develop their CR system, they applied the RF and SVM algorithms and
found that RF performs best for soil classification, which serves as the system's base. Their
final system uses the RF algorithm and achieves 91.1% accuracy. In addition to soil pH, they
examined the three key soil nutrients: nitrogen, phosphorus, and potassium.

Younes OMMANE et al. (2022) give a comprehensive study of forty publications from 2010 to
2020 on crop recommendation, along with a summary of the major accomplishments and current
concerns. The purpose of the research was to provide insight into the types of solutions
proposed in recent years. The study can serve as a starting point for researchers who would
like to conduct investigations in this area and offers a solid awareness of current trends in
the field.

2.3 DRAWBACKS OF EXISTING SYSTEMS

A few major limitations of the existing systems emerged during the literature review. Many of
the papers reviewed did not take key features into account, such as the N, P, and K values of
the soil in question or the texture of the soil, and even those that considered the
macronutrients did not consider the micronutrients, which also play a vital role. Another
limitation is that most of the papers implemented only machine learning techniques and did
not try any deep learning (neural network) models; they also did not clearly state the
parameters used to decide which machine learning algorithm gives the optimal result. A few
papers performed only qualitative analysis; they did not implement any models, although they
do help with data pre-processing and feature selection.
CHAPTER 3

SYSTEM REQUIREMENTS
3. REQUIREMENTS

3.1 SOFTWARE REQUIREMENTS

● Operating System : Windows 8 or higher.


Windows is a group of several graphical operating system families, all of which are
developed, marketed and sold by Microsoft. Windows 10 is one of the most widely used
versions of Windows across the world.

● Python 3.7.x or higher:
Python is an interpreted, high-level programming language for general-purpose
programming. It was created by the Dutch software engineer Guido van Rossum, who designed
the language to solve some problems he saw in the computer languages of the time, and was
first released in 1991. Python has a design philosophy that emphasizes code readability, and
a syntax that allows programmers to express concepts in fewer lines of code, notably using
significant whitespace. It provides constructs that enable clear programming on both small
and large scales. Python features a dynamic type system and automatic memory management. It
supports multiple programming paradigms, including object-oriented, imperative,
functional, and procedural, and has a large and comprehensive standard library. Python
interpreters are available for many operating systems. CPython, the reference
implementation of Python, is open-source software and has a community-based
development model, as do nearly all of its variant implementations. CPython is managed
by the non-profit Python Software Foundation.

Python can be used for pretty much anything: one significant advantage of learning
Python is that it is a general-purpose language that can be applied to a large variety of
projects. Below are just some of the most common fields where Python is used:
Data science
Scientific and mathematical computing
Web development
Computer graphics
Basic game development
Mapping and geography (GIS software)

● Google Colab:
Colaboratory ("Colab" for short) is a data analysis and machine learning tool that allows
you to combine executable Python code and rich text along with charts, images, HTML,
LaTeX and more into a single document stored in Google Drive. It connects to powerful
Google Cloud Platform runtimes and enables you to easily share your work and
collaborate with others.

3.2 HARDWARE REQUIREMENTS

Hardware requirements for installing the software (including Anaconda):

Disk Space: Minimum 32 GB
Processor: 1.4 GHz, 64-bit
Memory: 512 MB

3.3 FUNCTIONAL REQUIREMENTS

ID       REQUIREMENT

FR1.1    The user provides the soil features and receives a crop
         recommendation along with a soil fertility score.

Table 1: Functional Requirements

3.4 NON FUNCTIONAL REQUIREMENTS

ID       REQUIREMENT

NFR1.1   Ease of access: how easily the project can be used by everyone

NFR1.2   Reliability: how reliable the model's output is compared with other models

NFR1.3   Scalability: how easily the project can give the desired output while
         asking for fewer parameters

Table 2: Non-Functional Requirements


CHAPTER 4

SYSTEM ARCHITECTURE AND DESIGN


4. SYSTEM ARCHITECTURE AND DESIGN

4.1 INTRODUCTION

This system mainly focuses on predicting the most suitable crop by learning from historical
data. The factors considered are macronutrients, micronutrients, soil texture and the district
to which the soil belongs. The system builds a model that predicts the outcome by ensembling
ML techniques and Neural Networks.

4.2 SYSTEM ARCHITECTURE

Fig 1. Architecture Diagram


4.3 FLOWCHART

Fig 2. Class Diagram of Crop Detection Model


CHAPTER 5

PROPOSED METHODOLOGY
5. PROPOSED METHODOLOGY

5.1 PROPOSED SYSTEM

The problem at hand is that every soil is different and has different features; based on those
features, this study provides the best crop that can be grown on that soil for maximum yield. 75%
of soil health depends on the macronutrients Nitrogen, Potassium, and Phosphorus. These
macronutrients, other micronutrients, and features such as texture are used to train various
machine learning models (decision tree, SVM, random forest, etc.) and a few Neural Network
models. Based on various classification metrics and the accuracies of the models, the best
model is chosen for crop recommendation. A fertility scale is defined based on the research
conducted, and a soil fertility calculator is implemented that gives a soil fertility score as
the result.

5.2 DATA COLLECTION

Data collection is the process of gathering information on variables of interest. The data
related to food and agriculture was collected from Kaggle and GitHub. The dataset contains 18
columns and 2736 instances, consisting of various soil nutrients, the soil's pH value,
texture, and the district the soil is from, against the crops. After collecting the raw data, we
performed data pre-processing and feature selection. The columns of the data frame include
<pH, EC, OC, N, P, K, S, Ca, Mg, Zn, Cu, Fe, Mn, Mo, Tex, District, label>. The data contains no
missing values, but the large differences in the value ranges of the columns are controlled by
scaling.

Columns that are of no use to the analysis were dropped, because using only relevant columns
makes it easier to recommend crops. Merging the columns of the various datasets and dropping
unneeded columns results in 16 columns.
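A minimal pandas sketch of the merge, check and drop sequence described above. The two source tables, their values and the `Remarks` column are hypothetical placeholders rather than the actual Kaggle/GitHub files, and the rows are assumed to be already aligned one-to-one across sources.

```python
import pandas as pd

# Two hypothetical source tables describing the same soil samples, row by row.
nutrients = pd.DataFrame({"N": [120.0, 95.0], "P": [22.0, 18.0], "K": [310.0, 280.0]})
metadata = pd.DataFrame({
    "Tex": ["Loamy", "Sandy"],
    "District": ["District A", "District B"],
    "label": ["rice", "groundnut"],
    "Remarks": ["-", "-"],  # hypothetical column with no analytical value
})

# Merge the columns of both sources side by side (rows aligned by index).
df = pd.concat([nutrients, metadata], axis=1)

# Confirm there are no empty entries before further processing.
if df.isna().sum().sum() != 0:
    raise ValueError("dataset contains missing values")

# Drop columns that are of no use to the analysis.
df = df.drop(columns=["Remarks"])
print(df.columns.tolist())  # ['N', 'P', 'K', 'Tex', 'District', 'label']
```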
Table 3: Dataset

5.3 DATA PREPROCESSING

Pre-processing is the approach used to convert raw, unanalyzed data into a structured
dataset. When data is gathered from different sources, it is often collected in an
unstructured format that is not directly useful for analysis. The dataset has two categorical
columns, that is, variables that contain label values rather than numerical values. Such
columns have a limited number of possible values, as with the crop names and district names
here. Many machine learning algorithms cannot operate on label data directly, so the labeled
data must be converted to numerical data using an encoding technique.
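As a sketch of what label encoding does, the hypothetical helper below assigns integer codes to category names in sorted order, which is the same ordering scikit-learn's `LabelEncoder` uses for its `classes_`. The texture values are illustrative only.

```python
def label_encode(values):
    """Map each distinct label to an integer code 0..k-1, in sorted order.

    Hypothetical helper that mirrors scikit-learn's LabelEncoder ordering.
    Returns the encoded list and the sorted list of classes.
    """
    classes = sorted(set(values))
    code_of = {label: code for code, label in enumerate(classes)}
    return [code_of[v] for v in values], classes


textures = ["Sandy", "Clay", "Loamy", "Clay"]  # hypothetical sample
codes, classes = label_encode(textures)
print(classes)  # ['Clay', 'Loamy', 'Sandy']
print(codes)    # [2, 0, 1, 0]
```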

5.3.1 FEATURE SELECTION

While developing a machine learning model, only some of the variables in the dataset are
useful for building the model; the remaining features are either redundant or irrelevant.
Feeding the model all of these redundant and irrelevant features may reduce its overall
performance and accuracy. Hence it is important to identify and select the most appropriate
features from the data and remove the irrelevant or less important ones, which is the purpose
of feature selection in machine learning.

To find outliers in the data features, we plotted various box plots and other graphs, and
observed that the Mo and B nutrient columns have many outliers, so we excluded them from the
features.
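Box plots flag as outliers the points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule numerically. The readings and the 10% cut-off (`MAX_OUTLIER_FRACTION`) are hypothetical, since the report chose the dropped columns by visual inspection of the plots.

```python
import numpy as np


def iqr_outlier_count(values, k=1.5):
    """Count points outside [Q1 - k*IQR, Q3 + k*IQR], the box-plot whisker rule."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)


# Hypothetical readings for three nutrient columns.
columns = {
    "N":  [110, 120, 125, 118, 122, 130, 115, 119],
    "Mo": [0.10, 0.12, 0.11, 0.13, 0.10, 2.50, 0.12, 3.10],
    "B":  [0.50, 0.55, 0.52, 4.80, 0.51, 0.53, 5.20, 0.54],
}

MAX_OUTLIER_FRACTION = 0.1  # hypothetical cut-off, not taken from the report
kept = [name for name, vals in columns.items()
        if iqr_outlier_count(vals) / len(vals) <= MAX_OUTLIER_FRACTION]
print(kept)  # ['N']
```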

5.3.2 NORMALIZATION

We used label encoding on the district and texture values to convert them into numerical data
and performed normalization on all the features.
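The report does not name the normalization method, so the sketch below assumes min-max scaling to [0, 1], which behaves like scikit-learn's `MinMaxScaler` (including mapping constant columns to 0). The sample rows are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale every column of X to [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return (X - col_min) / col_range


# Hypothetical rows: pH, N, label-encoded district
X = [[6.5, 120.0, 2], [7.8, 300.0, 0], [5.9, 210.0, 1]]
print(min_max_normalize(X))
```

In practice the minimum and range would be computed on the training split only and then reused for the test split, so that no information from the test data leaks into training.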

Label Encoding :

The dataset after removing the features with many outliers and performing label encoding on
crop names and district names is as follows:
Fig 4. Normalization of data

5.4 TRAINING AND TESTING PHASE

A crucial step in preparing the data is splitting it into training and testing sets. The dataset
is split into two sets, the training dataset and the testing dataset. Training a model usually
requires as many data points as possible, so the data is typically split unequally. Commonly
used split ratios are 70/30 or 80/20 for training/testing. The training dataset (80% of the
dataset here) is the initial dataset used to train the ML algorithms to learn and produce the
right predictions. The test dataset (the remaining 20%) is used to evaluate how well the
Machine Learning algorithms have been trained on the training dataset. The training dataset
cannot be reused in the testing phase, because the Machine Learning algorithm already knows
the expected outcomes for it, which defeats the purpose of testing.

For training we used 80% of the data instances, i.e. 2189 instances, and for testing we used
20%, i.e. 548 instances.
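With scikit-learn, an 80/20 split can be produced with `train_test_split`. The sketch below uses placeholder arrays sized to the 2736 instances given in Section 5.2; scikit-learn rounds the test share up, so this yields 548 test rows and 2188 training rows. The seed `random_state=42` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 2736                          # instance count given in Section 5.2
X = np.arange(n_rows).reshape(-1, 1)   # placeholder features, one row per instance
y = np.zeros(n_rows)                   # placeholder labels

# 80/20 split; the test size is ceil(0.2 * n_rows) and the rest goes to training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 2188 548
```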
5.5 MODELS

Before choosing an algorithm, the candidates should be evaluated to find the one that best fits
our dataset. When a dataset is used to train ML models, we try different models on the same
problem; a suitable model neither underfits nor overfits the data. Here, we compare a few
classification models using their accuracy and other classification metrics.

● Decision Tree
● Support Vector Machines with kernels as RBF and polynomial
● Random Forest
● Artificial Neural Networks
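The comparison of these four model families can be sketched with scikit-learn. This is an illustration on synthetic data from `make_classification`, not the project's dataset or code: the feature and class counts, the hyperparameters, and the use of `MLPClassifier` as a stand-in for the neural network are all assumptions. Accuracy on a held-out split is used as the comparison metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the soil table: 14 numeric features, 5 crop classes.
X, y = make_classification(n_samples=600, n_features=14, n_informative=8,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM (RBF)": SVC(kernel="rbf"),
    "SVM (polynomial)": SVC(kernel="poly", degree=3),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000,
                               random_state=0),
}

# Train every candidate on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {acc:.3f}")
```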

5.5.1 DECISION TREE

Decision Tree is a supervised learning technique that can be used for both classification and
regression problems, but it is mostly preferred for solving classification problems. It is a
tree-structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome. A decision tree has two
types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have
multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any
further branches. The decisions, or tests, are performed on the basis of the features of the given
dataset. A decision tree is a graphical representation for obtaining all the possible solutions to
a problem or decision based on given conditions. It is called a decision tree because, like a tree,
it starts with the root node, which expands into further branches and constructs a tree-like
structure. To build the tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm. A decision tree simply asks a question and, based on the answer
(Yes/No), further splits the tree into subtrees.
The diagram below shows the general structure of a decision tree:

Fig 5. Decision Tree Structure
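CART chooses each question by minimising an impurity measure over the candidate splits; common implementations such as scikit-learn's `DecisionTreeClassifier` use Gini impurity by default. The sketch below scores every candidate threshold on a single feature. The function names, the pH readings and the crop labels are made up for illustration and are not taken from the project's dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split 'value <= t'.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split: impurity of the parent node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = ((lo + hi) / 2, weighted)
    return best


# Made-up soil pH readings with a crop label for each sample.
ph = [5.0, 5.5, 6.0, 7.0, 7.5, 8.0]
crop = ["crop_a", "crop_a", "crop_a", "crop_b", "crop_b", "crop_b"]
print(gini(crop))                # 0.5
print(best_threshold(ph, crop))  # (6.5, 0.0)
```

A full tree repeats this search over every feature at every node and recurses on the two children until the leaves are pure or a stopping rule is met.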

The accuracy of the decision tree classifier is 99.63%. The classification report is as follows:

Fig 6. Classification report of Decision Tree Classifier


5.5.2 SUPPORT VECTOR MACHINE

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is
used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that segregates
n-dimensional space into classes, so that new data points can easily be put in the correct
category in the future. This best decision boundary is called a hyperplane. SVM chooses the
extreme points/vectors that help in creating the hyperplane. These extreme cases are called
support vectors, and hence the algorithm is termed Support Vector Machine.

We use a kernelized SVM for data that is not linearly separable. Suppose we have some
non-linearly separable data in one dimension. By mapping each 1-D data point to a corresponding
2-D ordered pair, we can transform this data into two dimensions, where it becomes linearly
separable. In the same way, non-linearly separable data in any dimension can often be mapped to a
higher dimension in which it becomes linearly separable. This is a very powerful and general
transformation. A kernel is simply a measure of similarity between data points: given two data
points in the original feature space, the kernel function of a kernelized SVM returns their
similarity in the newly transformed feature space.
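The one-dimensional example can be made concrete with a few made-up points: class 0 lies near the origin and class 1 lies on both sides of it, so no single threshold on x separates them. Mapping each point x to the ordered pair (x, x²) makes the classes separable by the horizontal line x² = 2. The mapping `phi`, the helper `separable_by_threshold` and the data are illustrative assumptions, not part of the project.

```python
def phi(x):
    """Map a 1-D point to the 2-D ordered pair (x, x^2)."""
    return (x, x * x)


def separable_by_threshold(points, labels):
    """True if one threshold puts every point of one class below every point of the other."""
    zeros = [p for p, y in zip(points, labels) if y == 0]
    ones = [p for p, y in zip(points, labels) if y == 1]
    return max(zeros) < min(ones) or max(ones) < min(zeros)


xs = [-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0]
ys = [1, 1, 0, 0, 0, 1, 1]

# In 1-D, class 1 surrounds class 0, so no single cut on x works.
print(separable_by_threshold(xs, ys))  # False

# After the mapping, the second coordinate alone separates the classes:
# class-0 points have x^2 <= 0.25 and class-1 points have x^2 >= 6.25,
# so the line x^2 = 2 is a valid linear boundary in the 2-D space.
squares = [phi(x)[1] for x in xs]
print(separable_by_threshold(squares, ys))  # True
```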

Many kernel functions are available; two of the most popular are:

1. Radial Basis Function (RBF) kernel: The similarity between two points in the
transformed feature space is an exponentially decaying function of the distance
between the vectors in the original input space. RBF is the default kernel in many
SVM implementations. The accuracy of the SVM with the RBF kernel is 86.13%.

2. Polynomial kernel: The polynomial kernel takes an additional parameter, ‘degree’, that
controls the model’s complexity and the computational cost of the transformation. The
accuracy of the SVM with the polynomial kernel is 94.89%. The classification report is as
follows:

Fig 7. Classification report of SVM with Polynomial Kernel

We implemented two SVM models, one with a polynomial kernel and one with an RBF kernel. Based
on their accuracies (94.89% versus 86.13%), the SVM with the polynomial kernel performs better.
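The two kernels can be written as small functions. This sketch follows the parameterisation used by scikit-learn's `SVC` (`gamma`, `coef0`, `degree`): K_rbf(x, y) = exp(-gamma·||x − y||²) and K_poly(x, y) = (gamma·⟨x, y⟩ + coef0)^degree. The parameter values are illustrative assumptions; the report does not state which values were used. The last part checks the kernel trick for a degree-2 polynomial kernel in 1-D.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2): equals 1 for identical points, decays with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree; 'degree' sets the model's complexity."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


p, q = [1.0, 2.0], [2.0, 0.0]
print(rbf_kernel(p, p))         # 1.0
print(polynomial_kernel(p, q))  # 27.0, i.e. (1 * 2 + 1) ** 3


# Kernel trick check in 1-D with degree=2, gamma=1, coef0=1:
# (x * y + 1) ** 2 equals the dot product of the explicit feature maps
# phi(x) = (x**2, sqrt(2) * x, 1), so the kernel never has to build phi(x).
def phi(x):
    return (x * x, math.sqrt(2) * x, 1.0)


x, y = 2.0, 3.0
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
print(math.isclose(polynomial_kernel([x], [y], degree=2), explicit))  # True
```

Raising `degree` adds higher-order feature interactions, which is why it controls both the model's complexity and the cost of the transformation.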

5.5.3 RANDOM FOREST

The Random Forest classifier creates a set of decision trees, each from a randomly selected subset
of the training set, and then collects the votes from the different decision trees to decide the
final prediction. Instead of relying on one decision tree, the random forest takes the prediction
from each tree and outputs the class that receives the majority of the votes.

A greater number of trees in the forest generally leads to higher accuracy and reduces the risk of
overfitting.
The diagram below explains the working of the Random Forest algorithm:

Fig 8 : Working of Random Forest classifier

The Random Forest classifier is one of the most widely used algorithms in crop recommendation
systems. In the training phase, ’N’ decision trees are generated, and each tree assigns an input
to one of the output classes; the final output is decided by their votes. Increasing the number of
decision trees generally increases prediction accuracy, although accuracy is not directly
proportional to the number of trees and levels off once enough trees are used.
The algorithm has 3 parameters:

• “n”: the number of trees to grow.

• “m”: the number of variables considered as split candidates at each split.

• “size”: the node size, i.e. the minimum number of observations in a terminal node.

By implementing the Random Forest Classifier, the accuracy obtained is 99.78%.
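The bagging-and-voting idea can be sketched from scratch in plain Python. This is a simplified illustration, not the library implementation the project used: each "tree" is a depth-1 stump trained on a bootstrap sample, it considers a random subset of `m` features and picks the split with the fewest training errors (instead of Gini impurity), and the forest returns the majority vote. `n_trees` plays the role of "n" and `m` the role of "m"; "size" is not modelled because stumps have a fixed depth. The function names, the toy (N, K) data and the parameter values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Depth-1 tree: the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, bf, bt, bl, br = best
    return lambda r: bl if r[bf] <= bt else br


def fit_forest(rows, labels, n_trees=25, m=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and m random features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        features = rng.sample(range(n_features), m)     # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Made-up (N, K) readings: both nutrients are low for crop_a and high for crop_b.
rows = [(10, 12), (15, 18), (20, 25), (25, 14), (30, 28),
        (70, 75), (75, 82), (80, 71), (85, 88), (90, 79)]
labels = ["crop_a"] * 5 + ["crop_b"] * 5

forest = fit_forest(rows, labels, n_trees=25, m=1)
print(predict(forest, (5, 5)))    # crop_a
print(predict(forest, (95, 95)))  # crop_b
```

An odd number of trees avoids ties in a two-class vote; a stump whose bootstrap sample happens to contain a single class is simply outvoted by the others.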


5.5.4 ARTIFICIAL NEURAL NETWORKS

Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets,
are computing systems inspired by the biological neural networks that constitute animal brains.

An ANN is based on a collection of connected units or nodes called artificial neurons, which
loosely model the neurons in a biological brain. Each connection, like the synapses in a
biological brain, can transmit a signal to other neurons. An artificial neuron receives signals then
processes them and can signal neurons connected to it. The "signal" at a connection is a real
number, and the output of each neuron is computed by some non-linear function of the sum of its
inputs. The connections are called edges. Neurons and edges typically have a weight that adjusts
as learning proceeds. The weight increases or decreases the strength of the signal at a connection.
Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that
threshold.

Typically, neurons are aggregated into layers. Different layers may perform different
transformations on their inputs. Signals travel from the first layer (the input layer), to the last
layer (the output layer), possibly after traversing the layers multiple times.

Fig 9 : Structure of ANN
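The computation described above, where each neuron applies a non-linear function to the weighted sum of its inputs, can be sketched for one fully connected (Dense) layer in plain Python. The weights, biases and inputs are arbitrary numbers chosen for illustration, and ReLU is used as the non-linearity; the helper names are hypothetical.

```python
def relu(z):
    """ReLU activation: keep positive values, clip negatives to 0."""
    return max(0.0, z)


def dense_layer(inputs, weights, biases, activation=relu):
    """One fully connected layer.

    weights[j][i] is the weight on the edge from input i to neuron j;
    neuron j outputs activation(sum_i weights[j][i] * inputs[i] + biases[j]).
    """
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Arbitrary example: 3 inputs (e.g. scaled N, P, K) feeding 2 neurons.
x = [0.5, 1.0, -1.0]
W = [[1.0, 2.0, 0.5],   # weights into neuron 0
     [-1.0, 0.0, 1.0]]  # weights into neuron 1
b = [0.0, 0.5]
print(dense_layer(x, W, b))  # [2.0, 0.0]
```

Stacking such layers, each feeding its outputs to the next, gives the input-to-output flow shown in the figure; training adjusts `W` and `b`.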


Two types of ANN were implemented for this project:

1. ANN-1: The first ANN model is trained for 20 epochs and has 5 Dense layers, using ReLU as
the activation function on alternate layers, sigmoid on the last layer and the Adam optimizer.
It achieves an accuracy of 85%. The classification report is as follows:

Fig 10. Classification report of ANN-1

2. ANN-2: The second ANN model is trained for 20 epochs and has 5 Dense layers, using ReLU as
the activation function on all the other layers, softmax on the last layer and the RMSprop
optimizer. This model achieves an accuracy of 95%. The classification report is as follows:

Fig 11. Classification report of ANN-2

Since ANN-2 has a higher accuracy than ANN-1, ANN-2 is chosen for predicting the crop names.
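One visible difference between the two models is the output activation. With several crop classes, softmax turns the last layer's scores into a probability distribution over the crops (the outputs sum to 1), whereas sigmoid scores each class independently. The sketch compares the two on arbitrary logits; it illustrates the activations only and is not a claim about why ANN-2 scored higher.

```python
import math


def sigmoid(z):
    """Logistic function, applied to each output independently."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Exponentiate and normalise so the outputs sum to 1.

    Subtracting the maximum first avoids overflow without changing the result.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # arbitrary scores for three hypothetical crops
soft = softmax(logits)
sig = [sigmoid(z) for z in logits]

print(round(sum(soft), 6))    # 1.0
print(sum(sig) > 1)           # True: sigmoid outputs do not form a distribution
print(soft.index(max(soft)))  # 0, the predicted class
```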
5.6 CHOOSING BEST MODEL

We checked the various classification metrics of all the models to see which model is superior.
Even though a couple of the machine learning models achieve higher accuracies than the neural
network models, we attribute this to the limited size of the dataset; with a bigger dataset, we
expect the computational power consumed by the ML models to be much greater than that of the
neural network models.

Fig 12. Accuracy comparison of all the models

The accuracy comparison of the two neural networks is shown in Fig 13. We therefore choose the
ANN as the superior model, with ANN-2 as the optimized neural network model.

Fig 13. Accuracy comparison of ANN models
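The accuracies and classification reports compared above are built from a few counts per class. The sketch computes accuracy and per-class precision, recall, F1 score and support from true and predicted labels, following the same definitions as scikit-learn's `accuracy_score` and `classification_report`. The helper names and the eight labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_report(y_true, y_pred):
    """Return {class: (precision, recall, f1, support)} for every true class."""
    report = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1, tp + fn)  # support = true count of c
    return report


# Made-up labels for eight test samples.
y_true = ["rice", "rice", "maize", "maize", "maize", "jute", "jute", "rice"]
y_pred = ["rice", "maize", "maize", "maize", "jute", "jute", "jute", "rice"]
print(accuracy(y_true, y_pred))  # 0.75
for crop, (p, r, f, n) in per_class_report(y_true, y_pred).items():
    print(f"{crop:6s} precision={p:.2f} recall={r:.2f} f1={f:.2f} support={n}")
```

Accuracy alone can hide a weak class; the per-class figures show, for example, that a model may have perfect precision for one crop while still missing some of its samples.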


5.7 GRAPHICAL OBSERVATION

5.7.1 TRAINING AND VALIDATION ACCURACY

Fig 14. Training and Validation Accuracy

5.7.2 TRAINING AND VALIDATION LOSS

Fig 15. Training and Validation Loss


CHAPTER 6

IMPLEMENTATION
6. IMPLEMENTATION

6.1 CODE SNIPPETS


CHAPTER 7

RESULTS
7. RESULTS

We checked the various classification metrics of all the models to see which model is superior.
Even though a couple of the machine learning models achieve higher accuracies than the neural
network models, we attribute this to the limited data; with a bigger dataset, we expect the
computational power consumed by the ML models to be much greater than that of the neural network
models. ANN Model 2, the optimized neural network model, is therefore the superior model: we have
chosen it as our crop recommender and integrated it with the UI to provide a seamless service to
users. For the fertility score, we have made a custom scale, based on a literature review and
research, which gives a score from 0 to 10, where 0 means least fertile and 10 means most fertile.
We have also implemented a District model, which predicts the district the soil belongs to based
on its features. At present it covers a limited number of districts, but the feature becomes more
useful as data from more districts is added. For the UI, a website called Crop Booster has been
created and integrated with our ANN model. The user enters data such as pH, N, P and K, and the
output is displayed on the right. Users are also given the option to generate random data.

Fig 16. UI for Crop Booster


CHAPTER 8

CONCLUSION AND FUTURE SCOPE


8. CONCLUSION AND FUTURE SCOPE

8.1 CONCLUSION

We were able to build a system model that determines a suitable crop recommendation by training
on the collected dataset. This system model is implemented with the help of a few machine
learning algorithms, by training and testing on the collected data. The model also predicts the
fertility score and the district to which the soil belongs based on its features. We implemented a
highly accessible UI (web or app) through which the farmer or end user can use the crop
recommender and the fertility score calculator with ease.

8.2 FUTURE SCOPE

In the future, we will include recommendations for improving soil health when a low value is
determined on the soil fertility scale, and we will also attempt to integrate our software with IoT
devices in order to automate the process using sensors. For greater reach, the UI will be made
more accessible and clearer.
CHAPTER 9

REFERENCES
9. REFERENCES

• Shilpa Mangesh Pande, Dr. Prem Kumar Ramesh, Anmol, B.R Aishwarya, Karuna Rohilla,
Kumar Shaurya, 2021, Crop Recommender System Using Machine Learning Approach, IEEE

• Mummaleti Keerthana, K J M Meghana, Dr. Modepalli Kavitha, Siginamsetty Pravallika, 2021,
An Ensemble Algorithm for Crop Yield Prediction, IEEE

• Sadia Afrin, Abu Talha Khan, Mahrin Mahia, Rahbar Ahsan, Mahbub Rahman Mishal, Wasit
Ahmed, Rashedur M. Rahman, 2021, Analysis of Soil Properties and Climatic Data to Predict
Crop Yields and Cluster Different Agricultural Regions of Bangladesh, IEEE

• Ganesh Babu R, Chellaswamy C, Geetha T S, Venkatachalam K, Mulla M A, Daniel Raj T, 2020,
Soil Test Based Smart Agriculture Management System, IEEE

• Bhawana Sharma, Jay Kant Pratap Singh Yadav, Sunita Yadav, 2020, Predict Crop Production in
India Using Machine Learning Technique: A Survey, IEEE

• Sudha Bhatia, Reshmi S Nair, Ved Prakash Mishra, 2021, Nutrient Analysis of Soil Samples
Treated with Agrochemicals, IEEE

• S. Bangaru Kamatchi, R. Parvathi, 2019, Improvement of Crop Production Using Recommender
System by Weather Forecasts, Elsevier

• Akshay Tripathi, Reet Kamal Tiwari, Surya Prakash Tiwari, 2022, A deep learning multi-layer
perceptron and remote sensing approach for soil health-based crop yield estimation, Elsevier

• Priyadharshini A, Swapneel Chakraborty, Aayush Kumar, Omen Rajendra Pooniwala, 2021,
Intelligent Crop Recommendation System using Machine Learning, IEEE

• Dr. Y. Jeevan Nagendra Kumar, V.S. Vaishnavi, K. Neha, V.G.R.R. Devi, 2020, Supervised
Machine learning Approach for Crop Yield Prediction in Agriculture Sector, IEEE

• Atharva Jadhav, Nihar Riswadkar, Pranay Jadhav, Yash Gogawale, 2022, Crop Recommendation
System Using Machine Learning Algorithms, IRJET

• S. Pudumalar, E. Ramanuja, R. Harine Rajashree, C. Kavya, T. Kiruthika, J. Nisha, 2016, Crop
Recommendation System for Precision Agriculture, IEEE

• Aditya Motwani, Param Patil, Vatsa Nagaria, Shobhit Verma, Sunil Ghane, 2022, Soil Analysis
and Crop Recommendation using Machine Learning

• Takalani Orifha Mufamadi, Ritesh Ajoodha, 2020, Crop Recommendation using Machine Learning
Algorithms and Soil Attributes Data, IEEE

• Hicham Chouikh, Younes Ommane, Mohamed Amine Rhanbouri, 2022, Machine Learning based
Recommender Systems for Crop Selection: A Systematic Literature Review, Research Square
