1 Introduction
A farmer’s decision about which crop to grow is often clouded by intuition and
by non-agronomic factors such as the desire for immediate profit, limited
knowledge of market demand, and overestimation of the soil’s capacity to support
a particular crop. A misinformed decision on the farmer’s part can put serious
strain on the family’s finances. This may be one of the many causes behind the
numerous farmer suicides reported in the news.
In a country like India, where agriculture and its allied sectors contribute a
substantial share of Gross Value Added (GVA), such a misjudgment harms not only
the farmer’s family but also the economy of the whole region. For this reason,
we have identified the farmer’s dilemma over which crop to grow, given the soil
and environmental conditions, as a significant problem. The need of the hour is
to design an application that offers predictive insight to Indian farmers and
thereby helps them make an informed decision about which crop to grow.
Existing forecasting and monitoring techniques can address this issue in part.
While these techniques are helpful, none offers an ideal way to recommend a
crop. The shortcomings of the existing systems include inadequate analysis, poor
choice of algorithms, and inefficient selection of attributes, all of which can
affect the crop yield. With this in mind, we propose a framework that overcomes
these shortcomings by considering climatic parameters (temperature, rainfall,
humidity) and soil characteristics (pH value, nitrogen value, phosphorus value,
potassium value) before recommending the most suitable crop to the user.
2 Related Work
The present project was conceived after reviewing several previously published
studies. These studies cover various aspects of the problem, including the
parameters, methods and performance measures. To add novelty and take the topic
a step further, we decided to work with a technique that has not yet been used
in this field. This literature review summarizes the relevant research papers
published on this topic.
Viviliya et al. authored a study titled “The Design of Hybrid Crop
Recommendation System using Machine Learning Algorithms” [1]. The article
presents a hybrid crop recommendation framework that uses algorithms such as
J48, association rules and Naive Bayes. The recommendation framework is designed
around attributes such as atmospheric and geographic features, including soil
type, region, soil nutrient content, water table and temperature.
Another study was conducted by Dhruv Parikh et al. Their work, “Machine Learning
Based Crop Recommendation System” [2], uses scikit-learn to implement machine
learning algorithms that predict the most suitable crop from different input
parameters, and uses Tkinter to provide a graphical user interface through which
the user can interact with the model more easily.
The paper “Crop Prediction Using Machine Learning” [3] authored by Lisha
Varghese et al. takes nitrogen (N), phosphorus (P), potassium (K) and pH values
into consideration to determine the best crop for the given soil conditions. It uses
various algorithms like Naïve Bayes, k-Nearest Neighbours, Decision Tree and
Support Vector Machine to recommend the most suitable crop for the input soil
parameters.
An application named "Crop Advisor" was developed by Veenadhari et al. to
forecast the influence of climatic parameters on crop yield. Their publication
“Machine Learning approach for Forecasting Crop Yield based on Climatic
Parameters” [4] uses the C4.5 algorithm to determine the climatic parameters
with the greatest influence on the yield of selected crops. The software
provides an indication of the relative influence of different climatic
parameters on crop yield.
3 Proposed approach
The proposed system is depicted in the architecture diagram in Fig. 1. A
database of pre-processed data is split into two parts: the training data and
the testing data. The training data is used to train the ensemble model,
producing a recommendation model that is then evaluated for accuracy and
subjected to cross-validation. The final model uses the testing data to make the
concluding predictions.
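The split and cross-validation steps described above can be sketched as follows.
This is a minimal illustration using only the Python standard library; the 80/20
split ratio, the choice of five folds, and the function names `train_test_split`
and `k_fold_indices` are assumptions made for the example and are not stated in
the paper.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    indices = list(range(n_rows))
    # Fold i holds every k-th index starting at i.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, val_idx

# Placeholder rows; in practice each row would hold the seven parameters and a label.
rows = list(range(100))
train, test = train_test_split(rows)         # assumed 80/20 split
folds = list(k_fold_indices(len(train), 5))  # assumed 5-fold cross-validation
```

Each fold serves once as the validation set while the remaining folds are used
for training, and the held-out test set is touched only after the model is final.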
Initially, the raw data is collected from the UCI repository in two files. One
data file contains the soil parameters, namely pH value, nitrogen value,
phosphorus value and potassium value, along with the corresponding crop labels.
The other data file contains the climatic parameters, namely temperature,
humidity and rainfall, along with the corresponding crop labels. The two files
also contain unequal numbers of rows. To resolve this and obtain a single
database with all the parameters and their corresponding crop labels, we
identify the labels common to both files and copy all the parameter values into
a new file with the corresponding crop label. Fig. 2 lists all the parameters in
the processed dataset, and Figs. 3 and 4 give further insight into the
database’s size, shape and label counts.
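One way to combine the two files on their common crop labels is sketched below.
The paper does not say how rows are paired within a label, so pairing them by
position and truncating to the shorter group is an assumption, as are the column
names, the helper names and the sample values, which are purely illustrative.

```python
from collections import defaultdict

def group_by_label(rows, label_key="label"):
    """Group row dictionaries by their crop label."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[label_key]].append(row)
    return grouped

def merge_on_common_labels(soil_rows, climate_rows, label_key="label"):
    """Keep only labels present in both inputs and join their parameters."""
    soil = group_by_label(soil_rows, label_key)
    climate = group_by_label(climate_rows, label_key)
    merged = []
    for label in sorted(set(soil) & set(climate)):
        # Assumption: rows are paired by position; zip stops at the shorter
        # group, which handles the unequal row counts by truncation.
        for s, c in zip(soil[label], climate[label]):
            row = {k: v for k, v in s.items() if k != label_key}
            row.update({k: v for k, v in c.items() if k != label_key})
            row[label_key] = label
            merged.append(row)
    return merged

# Illustrative values only, not taken from the actual dataset.
soil_rows = [
    {"N": 90, "P": 42, "K": 43, "ph": 6.5, "label": "rice"},
    {"N": 85, "P": 58, "K": 41, "ph": 7.0, "label": "rice"},
    {"N": 71, "P": 54, "K": 16, "ph": 5.7, "label": "maize"},
    {"N": 118, "P": 46, "K": 20, "ph": 6.9, "label": "cotton"},
]
climate_rows = [
    {"temperature": 20.9, "humidity": 82.0, "rainfall": 202.9, "label": "rice"},
    {"temperature": 22.6, "humidity": 63.7, "rainfall": 87.8, "label": "maize"},
    {"temperature": 18.1, "humidity": 65.2, "rainfall": 74.5, "label": "maize"},
    {"temperature": 23.5, "humidity": 79.6, "rainfall": 174.8, "label": "jute"},
]
merged = merge_on_common_labels(soil_rows, climate_rows)
```

Labels that appear in only one file (here "cotton" and "jute") are dropped, and
every merged row carries all seven parameters plus the crop label.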
Bagging stands for "Bootstrap Aggregation" and is used to reduce the variance of
a predictive model. Bagging generates additional training sets from the original
dataset by random sampling with replacement, so some observations may be
repeated in each new training set. Every item has an equal chance of appearing
in a new training set. These multiple training sets are used to train different
models in parallel. The predictions of the individual models are then combined:
by averaging, or, for classification, by a voting mechanism in which the
majority vote determines the final class. Bagging thereby reduces variance and
steers the prediction towards the expected outcome. The method is used by
algorithms such as Random Forest, which combines many high-variance decision
trees. Random Forest randomly selects a subset of the parameters to grow each
tree, and the forest is formed by combining several such random trees.
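The two core ingredients of bagging, bootstrap sampling and majority voting, can
be sketched in a few lines. This is an illustration rather than the paper's
implementation: the base learner `train_majority_stub` is a deliberately trivial
stand-in for a decision tree, and the data, seed and ensemble size of 25 are
assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items uniformly at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the most common label among the given predictions."""
    return Counter(predictions).most_common(1)[0][0]

def train_majority_stub(sample):
    """Toy base learner: always predicts the most frequent label in its sample."""
    label = majority_vote([lbl for _, lbl in sample])
    return lambda features: label

rng = random.Random(42)
# Illustrative (features, label) pairs; the single feature stands in for pH.
data = [((6.5,), "rice")] * 6 + [((5.8,), "maize")] * 4

# Train each model on its own bootstrap sample, then combine by voting.
models = [train_majority_stub(bootstrap_sample(data, rng)) for _ in range(25)]
votes = [model((6.1,)) for model in models]
prediction = majority_vote(votes)
```

In a Random Forest the stub would be replaced by a decision tree grown on a
random subset of the parameters, but the sampling and voting steps keep the same
shape.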
3.4 Boosting
The dataset is divided into two parts, training data and test data, as shown in
Fig. 5. The Random Forest and XGBoost models are obtained by fitting the
ensemble learning algorithms to the training data and are then evaluated on the
test data. The soil properties (pH value, nitrogen, phosphorus and potassium)
and the climatic parameters (temperature, humidity and rainfall) are provided as
input to the recommender model. The model looks for the crop whose values are
closest to those entered, and outputs the name of the crop best suited to them.
4.1 Accuracy
Accuracy is the number of correct predictions made by the model divided by the
total number of predictions made by the model. It is a metric used to judge a
classification model’s performance: the fraction of predictions that the model
got right. It can be represented as:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
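As a minimal sketch, the ratio described above can be computed directly from
the true and predicted labels. The function name `accuracy` and the sample
labels are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Illustrative labels: 4 of the 5 predictions are correct.
y_true = ["rice", "maize", "rice", "cotton", "maize"]
y_pred = ["rice", "maize", "cotton", "cotton", "maize"]
score = accuracy(y_true, y_pred)
```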
4.2 Precision
Precision is the number of true positives (TP) divided by the sum of true
positives and false positives (FP). It measures the ability of a classification
model not to label an instance as positive when it is in fact negative. It can
be represented as:

$$\text{Precision} = \frac{TP}{TP + FP}$$
4.3 Recall
Recall is the number of true positives (TP) divided by the sum of true positives
and false negatives (FN):

$$\text{Recall} = \frac{TP}{TP + FN}$$
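For a multi-class problem such as crop recommendation, precision and recall are
usually computed per class by treating one crop as the positive class. The
sketch below does this for a single class; the function name, the sample labels
and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative labels: for "rice", TP = 2, FP = 2, FN = 1.
y_true = ["rice", "rice", "rice", "maize", "maize", "cotton"]
y_pred = ["rice", "rice", "maize", "rice", "rice", "cotton"]
precision, recall = precision_recall(y_true, y_pred, positive="rice")
```

Here precision is 2/4 because two of the four "rice" predictions are wrong, while
recall is 2/3 because one of the three actual "rice" samples is missed.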
4.4 f1-score
The f1-score is the harmonic mean of precision and recall. The best possible
score is 1.0 and the worst is 0.0. F1-scores are generally lower than accuracy
measures because they embed precision and recall into their computation. [7]
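Written out, the harmonic mean of precision and recall takes the form below. The
worked numbers (precision 0.8, recall 0.6) are illustrative and are not results
from this study.

```latex
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},
\qquad \text{e.g.} \qquad
F_1 = \frac{2 \cdot 0.8 \cdot 0.6}{0.8 + 0.6} = \frac{0.96}{1.4} \approx 0.686
```

The result lies below the arithmetic mean of 0.7, which shows how the harmonic
mean penalizes an imbalance between precision and recall.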
4.5 Support
Support is the number of actual occurrences of the class in the given dataset.
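The metrics in this section are typically reported together, one row per class.
The following sketch builds such a per-class report with the standard library
only; the function name `classification_report`, the dictionary layout and the
sample labels are assumptions for illustration.

```python
from collections import Counter

def classification_report(y_true, y_pred):
    """Per-class precision, recall, f1-score and support."""
    support = Counter(y_true)  # actual occurrences of each class
    report = {}
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)  # TP + FP
        precision = tp / predicted if predicted else 0.0
        recall = tp / support[label]                 # TP / (TP + FN)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": support[label]}
    return report

# Illustrative labels only.
y_true = ["rice", "rice", "rice", "maize", "maize", "cotton"]
y_pred = ["rice", "rice", "maize", "maize", "maize", "cotton"]
report = classification_report(y_true, y_pred)
```

The supports always sum to the number of samples, which gives a quick sanity
check on any classification report.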
4.6 Results
Table 1 shows the classification reports of both the bagging and boosting
methods. Fig. 6 compares the accuracy of the two algorithms. As is evident, the
boosting method using the XGBoost algorithm yields better accuracy than the
bagging method using Random Forest.
Fig. 6 Accuracy comparison of Random Forest (Bagging) and XGBoost (Boosting) models
5 Conclusion
A sufficiently large dataset is collected from the UCI repository and divided
into two parts, viz., the training dataset and the testing dataset. The ensemble
model is supplied with the training dataset to create the crop recommendation
model. Once a model with the highest accuracy and lowest error is obtained, the
testing dataset is fed to it to generate the outputs. The model then recommends
the crops that can be sown for the entered soil and climatic parameters, with an
accuracy of about 99.31%.
References
1. Viviliya, B. and Vaidhehi, V., “The Design of Hybrid Crop Recommendation System using
Machine Learning Algorithms”
2. Dhruv Piyush Parikh, Jugal Jain, Tanishq Gupta and Rishit Hemant Dabhade, “Machine
Learning Based Crop Recommendation System”
3. Kevin Tom Thomas, Varsha S, Merin Mary Saji, Lisha Varghese, Er. Jinu Thomas, “Crop
Prediction Using Machine Learning”
4. S.Veenadhari, Dr. Bharat Misra, Dr. CD Singh, “Machine Learning approach for
Forecasting Crop Yield based on Climatic Parameters”
6. https://bdtechtalks.com/2020/11/12/what-is-ensemble-learning/
7. https://medium.com/@kohlishivam5522/understanding-a-classification-report-for-your-machine-learning-model-88815e2ce397