
An Efficient Architecture for the Accurate Prediction of Crop Using Ensemble Learning


Saloni Jain¹, Anurag Gautam² and S. Priya³*

¹ SRM Institute of Science and Technology, sj4446@srmist.edu.in
² SRM Institute of Science and Technology, ag7711@srmist.edu.in
³ Assistant Professor, SRM Institute of Science and Technology, priyas3@srmist.edu.in

Abstract. Agriculture is not just a supply of resources but a way of life. It is
the main source of food, fibre, and other raw materials, and about 70% of the
Indian population depends on agriculture and its allied activities. The most
common problem faced by Indian farmers is low productivity. This issue arises
because farmers do not choose their crops based on the quality of their own soil
and the prevailing climatic factors taken together. This concern can be addressed
with the assistance of machine learning, which has proven to be a successful
strategy for predictive analysis. This study proposes a recommendation system
that suggests suitable crops based on soil and climatic parameters with high
accuracy using an ensemble learning algorithm. Thus, the system will greatly
help farmers make well-informed decisions.
Keywords: ensemble learning, recommendation system, machine learning,
smart agriculture

1 Introduction

A farmer's decision about which crop to grow is usually clouded by instinct and
other considerations such as the pursuit of immediate gains, insufficient
understanding of market demand, overestimation of a soil's capacity to support a
particular crop, and so on. A misinformed decision on the farmer's part may put
major pressure on the family's financial situation, and is conceivably one of the
many causes behind the numerous farmer suicides that we hear about in the news.

In a country like India, where agriculture and its allied sectors contribute
substantially to the Gross Value Added (GVA), such a wrong perception would have
an adverse impact not only on the farmer's family but also on the overall economy
of a region. For this reason, we have identified a farmer's dilemma in selecting
the crop to grow, given the soil and environmental parameters, as a challenging
problem. The need of the hour is to design an application that offers predictive
insight to Indian farmers, thereby helping them make an informed decision about
which crop to grow.

Existing forecasting and monitoring strategies attempt to address this issue.
While these strategies are helpful, there is no ideal solution for suggesting a
crop. A few of the disadvantages of the existing frameworks are inadequate
analysis, a poor choice of algorithms and an inefficient selection of attributes,
all of which can influence the crop yield prediction. The proposed framework
helps to overcome these disadvantages. With this in mind, we propose a framework
that considers environmental parameters (temperature, rainfall, humidity) and
soil characteristics (pH, nitrogen, phosphorus and potassium values) before
recommending the most appropriate crop to the user.

This research work proposes an ensemble learning-based crop recommendation
framework utilizing strategies such as bagging and boosting. The proposed model
is effective in recommending a crop based on different feature inputs. The steps
of this work are collecting the data, pre-processing it, designing a
recommendation model, and training and testing the recommendation model.

The primary step in pre-processing is the selection of features and parameters.
Since the dataset is unbalanced, the second pre-processing step is to balance it.
The proposed recommendation model is then developed, trained and tested to
recommend a suitable crop to the farmer. This study focuses on putting forward an
ensemble learning-based recommendation model that prescribes the most appropriate
crop for the given inputs. The performance of the model is validated against
different features, and the ensemble learning-based recommendation model is
effective in making recommendations to the user.

2 Related Work

The present project was conceived after going through various previously
published studies. These studies covered different aspects of the problem,
including the parameters, methods and performance measures used. To bring in
novelty and take the topic a step further, we decided to work with a technique
that has not yet been applied in this field. This literature review is intended
to provide an insight into the research papers previously published on this
topic.
Viviliya et al. authored the research paper “The Design of Hybrid Crop
Recommendation System using Machine Learning Algorithms” [1]. This article puts
forward a hybrid crop recommendation framework utilizing algorithms such as J48,
association rules and Naive Bayes. The recommendation framework is designed
around traits such as atmospheric and geographic features, the type of soil, the
region, the soil nutrient content, the water table and the temperature.
Another study was conducted by Dhruv Parikh et al. Their work, “Machine Learning
Based Crop Recommendation System” [2], uses scikit-learn to implement the machine
learning algorithms and Tkinter to build a graphical user interface, predicting
the most suitable crop from different input parameters while letting the user
interact with the model easily.
The paper “Crop Prediction Using Machine Learning” [3], authored by Lisha
Varghese et al., takes nitrogen (N), phosphorus (P), potassium (K) and pH values
into consideration to determine the best crop for the given soil conditions. It
uses algorithms such as Naïve Bayes, k-Nearest Neighbours, Decision Tree and
Support Vector Machine to recommend the most suitable crop for the input soil
parameters.
An application program named "Crop Advisor" was devised to forecast the impact of
atmospheric features on crop yield in the work done by Veenadhari et al. Their
publication “Machine Learning approach for Forecasting Crop Yield based on
Climatic Parameters” [4] uses the C4.5 algorithm to determine the climatic
parameters with the greatest impact on the yield of a selected crop. The software
indicates the relative influence of the different climatic parameters on harvest
yield.

The crop recommendation system in the study “Crop Recommendation System through
Soil Analysis Using Classification in Machine Learning” [5] by Nishitha et al.
applies the k-Nearest Neighbour classification algorithm, a supervised ML
algorithm, to suggest appropriate crops with higher efficiency and accuracy. The
model lists the appropriate crops based on the soil parameters used to train it
and leaves it to the farmers to choose which crop they want to sow.

3 Proposed Approach
The proposed system is depicted by the architecture diagram shown in Fig. 1. A
database of pre-processed data is split into two parts: training data and testing
data. The training data is used to train the ensemble model, which results in a
recommendation model that is further tested for accuracy and subjected to
cross-validation. The final model uses the testing data to make the concluding
predictions.

Fig. 1 Architecture Diagram of Proposed System

3.1 Data Pre-processing

Initially, the raw data was collected from the UCI repository in two files. One
file contains the soil parameters, namely the pH, nitrogen, phosphorus and
potassium values, with the corresponding crop labels. The other file contains the
climatic parameters, namely temperature, humidity and rainfall, with the
corresponding crop labels. The number of data rows in the two files is also
unequal. To solve this problem and obtain one proper dataset with all the
parameters and their corresponding crop label, we identify the crop labels common
to both files and copy all the parameter values into a new file against the
corresponding crop label. Fig. 2 shows a list of all the parameters in the
processed dataset, and Fig. 3 and 4 give further insight into the dataset's size,
shape and label counts.

Fig. 2 Parameter columns of the processed database


Fig. 3 Size and shape of the processed database

Fig. 4 Label counts of processed dataset
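
The merging step described above can be sketched roughly as follows. The file
names, column names and the row-wise alignment within each crop label are
assumptions made for illustration, not the exact procedure used in this work.

```python
import pandas as pd

# Hypothetical file and column names; the actual UCI files may differ.
soil = pd.read_csv("soil_parameters.csv")        # N, P, K, ph, label
climate = pd.read_csv("climate_parameters.csv")  # temperature, humidity, rainfall, label

# Keep only the crop labels that appear in both files.
common = set(soil["label"]) & set(climate["label"])
soil = soil[soil["label"].isin(common)]
climate = climate[climate["label"].isin(common)]

# Align rows within each label and drop the surplus rows of the longer file,
# so every row of the merged dataset has all seven parameters plus its label.
soil["i"] = soil.groupby("label").cumcount()
climate["i"] = climate.groupby("label").cumcount()
merged = soil.merge(climate, on=["label", "i"]).drop(columns="i")
merged.to_csv("crop_recommendation.csv", index=False)
```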

3.2 Ensemble Learning

Ensemble learning is a popular machine learning technique that combines multiple
models to improve the overall accuracy of machine learning algorithms [6].
Ensemble modelling is a great way to enhance model performance. It is a
repetitive process in which different models are intentionally created and
combined to deal with a particular computing problem.
3.3 Bagging

Bagging stands for "Bootstrap Aggregation" and is used to reduce the variance of
the prediction model. Bagging generates additional training data from the initial
dataset by random sampling with replacement, so some values may be repeated in
each new training dataset, and every item has an equal chance of appearing in a
new dataset. These multiple datasets are used to train different models in
parallel. The forecasts of the various ensemble models are then averaged; for
classification, a voting mechanism is used and the majority vote is taken as the
result. Bagging not only reduces variance but also brings the prediction closer
to the expected outcome. This method is used by algorithms such as Random Forest,
in which many high-variance decision trees are combined. Each tree is grown using
a randomly selected subset of the parameters, and a random forest is formed by
combining several such random trees.
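
As a minimal illustration of the bagging idea with a Random Forest, a
scikit-learn sketch is given below. The column names and hyper-parameter values
are assumptions for illustration, not the exact configuration used in this study.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv("crop_recommendation.csv")   # hypothetical merged file from Section 3.1
X = data[["N", "P", "K", "ph", "temperature", "humidity", "rainfall"]]
y = data["label"]

# Each of the 100 trees is grown on a bootstrap sample of the rows and splits on
# a random subset of the features; the forest aggregates their votes for the label.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)
```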

3.4 Boosting

Boosting is a sequential ensemble method which repeatedly adjusts the observation
weights according to the latest classification: if an observation is
misclassified, the weight of that specific observation is increased. The term
"boosting" commonly refers to algorithms which turn a weak base learner into a
strong learner. Boosting reduces bias error and builds robust predictive models.
The erroneously predicted data points in each iteration are identified and their
weights are increased. XGBoost, short for eXtreme Gradient Boosting, is a
boosting algorithm created to improve speed and performance compared to ordinary
gradient boosted decision trees.
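
A comparable sketch of the boosting side with XGBoost follows; again the column
names and hyper-parameter values are illustrative assumptions rather than the
exact settings used in this work.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

data = pd.read_csv("crop_recommendation.csv")   # hypothetical merged file from Section 3.1
X = data[["N", "P", "K", "ph", "temperature", "humidity", "rainfall"]]
y = LabelEncoder().fit_transform(data["label"])  # XGBoost expects integer class labels

# Trees are added sequentially; each new tree corrects the examples the current
# ensemble predicts poorly, which is how boosting reduces bias error.
booster = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=6)
booster.fit(X, y)
```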

4 Building Recommender Model

The dataset is divided into two parts, training data and test data, as shown in
Fig. 5. The trained Random Forest and XGBoost models are obtained by applying the
ensemble learning algorithms to the training dataset and evaluating them on the
test dataset. The soil properties, which include the pH, nitrogen, phosphorus and
potassium values, and the climatic parameters, namely temperature, humidity and
rainfall, are provided as input data to the recommender model. The model looks
for the crop whose profile is closest to the entered values, and its output is
the name of the crop best suited to those values.

Fig. 5 Splitting of Dataset into train and test data
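
The overall train/test workflow can be sketched as below. The split ratio, random
seed and example input values are assumptions for illustration and may differ
from the exact setup used in this study.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

data = pd.read_csv("crop_recommendation.csv")   # hypothetical merged file from Section 3.1
X = data[["N", "P", "K", "ph", "temperature", "humidity", "rainfall"]]
y = data["label"]

# Hold out a test set for the evaluation reported in Section 4.6.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Recommend a crop for one set of soil and climate readings (made-up values).
sample = pd.DataFrame([[90, 42, 43, 6.5, 21.0, 82.0, 200.0]], columns=X.columns)
print(model.predict(sample)[0])

# Per-class precision, recall, f1-score and support, as in Table 1.
print(classification_report(y_test, model.predict(X_test)))
```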

4.1 Accuracy
Accuracy is evaluated by dividing the number of correct predictions made by the
model by the total number of predictions made by the model. It is a metric used
to judge a classification model's performance: the fraction of predictions that
the model got right. It can be represented as:
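
In the usual notation, with TP, TN, FP and FN denoting true positives, true
negatives, false positives and false negatives respectively:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]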

4.2 Precision

Precision is evaluated by dividing the number of true positives by the sum of the
numbers of true positives and false positives. It is a metric which evaluates the
ability of a classification model not to mark an occurrence as positive when it
is in fact negative. It can be represented as:
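
Using the same notation as above:

\[ \text{Precision} = \frac{TP}{TP + FP} \]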

4.3 Recall

Recall is evaluated, as shown below, by dividing the number of true positives by
the sum of the numbers of true positives and false negatives.
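
In the same notation:

\[ \text{Recall} = \frac{TP}{TP + FN} \]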

4.4 f1-score

The f1-score is the weighted harmonic mean of precision and recall. A value of
1.0 is the best possible score and a value of 0.0 the worst. f1-scores are
generally lower than accuracy measures because they embed precision and recall
into their computation [7].
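
In terms of the precision and recall defined above, the standard formulation is:

\[ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]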
4.5 Support

Support is the total number of true instances of each class in the given dataset.

4.6 Results

Table 1 shows the classification reports of both the bagging and the boosting
methods, and Fig. 6 compares the accuracy of the two algorithms. As evident, the
boosting method using the XGBoost algorithm yields better accuracy than the
bagging method.

Classes        Precision       Recall          f1-score        Support
               RF     XGB      RF     XGB      RF     XGB      RF    XGB
apple          1.00   1.00     1.00   1.00     1.00   1.00     13    13
banana         1.00   1.00     1.00   1.00     1.00   1.00     17    17
blackgram      0.94   1.00     1.00   1.00     0.97   1.00     16    16
chickpea       1.00   1.00     1.00   1.00     1.00   1.00     21    21
coconut        1.00   1.00     1.00   1.00     1.00   1.00     21    21
coffee         1.00   1.00     1.00   1.00     1.00   1.00     22    22
cotton         1.00   1.00     1.00   1.00     1.00   1.00     20    20
grapes         1.00   1.00     1.00   1.00     1.00   1.00     18    18
jute           0.90   0.96     1.00   0.93     0.95   0.95     28    28
kidneybeans    1.00   1.00     1.00   1.00     1.00   1.00     14    14
lentil         1.00   1.00     1.00   1.00     1.00   1.00     23    23
maize          1.00   1.00     1.00   1.00     1.00   1.00     21    21
mango          1.00   1.00     1.00   1.00     1.00   1.00     26    26
mothbeans      1.00   1.00     0.95   1.00     0.97   1.00     19    19
mungbean       1.00   1.00     1.00   1.00     1.00   1.00     24    24
muskmelon      1.00   1.00     1.00   1.00     1.00   1.00     23    23
orange         1.00   1.00     1.00   1.00     1.00   1.00     29    29
papaya         1.00   1.00     1.00   1.00     1.00   1.00     19    19
pigeonpeas     1.00   1.00     1.00   1.00     1.00   1.00     18    18
pomegranate    1.00   1.00     1.00   1.00     1.00   1.00     17    17
rice           1.00   0.88     0.81   0.94     0.90   0.91     16    16
watermelon     1.00   1.00     1.00   1.00     1.00   1.00     15    15

accuracy                                       0.99   0.99     440   440
macro avg      0.99   0.99     0.99   0.99     0.99   0.99     440   440
weighted avg   0.99   0.99     0.99   0.99     0.99   0.99     440   440
Table 1 Classification reports of the Random Forest (RF, Bagging) and XGBoost (XGB, Boosting) models

Fig. 6 Accuracy comparison of Random Forest (Bagging) and XGBoost (Boosting) models

5 Conclusion
A large and sufficient dataset collected from the UCI repository is divided into
two parts, viz., the training dataset and the testing dataset. The ensemble model
is supplied with the training dataset to create the crop recommendation model.
Once the model is generated with maximum accuracy and minimum error, the testing
dataset is fed to it to generate the outputs. The model finally makes its
predictions and recommends the crops that can be sown according to the entered
soil and climatic parameters, with an accuracy of about 99.31%.

References

1. Viviliya, B. and Vaidhehi, V., “The Design of Hybrid Crop Recommendation System using Machine Learning Algorithms”

2. Dhruv Piyush Parikh, Jugal Jain, Tanishq Gupta and Rishit Hemant Dabhade, “Machine Learning Based Crop Recommendation System”

3. Kevin Tom Thomas, Varsha S, Merin Mary Saji, Lisha Varghese and Jinu Thomas, “Crop Prediction Using Machine Learning”

4. S. Veenadhari, Bharat Misra and C. D. Singh, “Machine Learning approach for Forecasting Crop Yield based on Climatic Parameters”

5. A. K. Mariappan, C. Madhumitha, P. Nishitha and S. Nivedhitha, “Crop Recommendation System through Soil Analysis Using Classification in Machine Learning”

6. https://bdtechtalks.com/2020/11/12/what-is-ensemble-learning/

7. https://medium.com/@kohlishivam5522/understanding-a-classification-report-for-your-machine-learning-model-88815e2ce397

8. K. Anguraj, B. Thiyaneswaran, G. Megashree, J. G. Preetha Shri, S. Navya and J. Jayanthi, “Crop Recommendation on Analyzing Soil Using Machine Learning”
