Authors:-
Pratima Chaudhury
Kalinga Institute of Industrial Technology
3rd Year Information Technology B.Tech
Abstract-
Global climate change has badly affected most agricultural crops in India. This project allows farmers to estimate
the yield of their crops before cultivation and thus helps them make the necessary decisions. It uses Random
Forest, a machine learning algorithm. Although factors such as weather, temperature, humidity, and rainfall have
been researched, there are no adequate solutions or inventions to resolve the situation we face. In countries like
India, the agricultural sector also contributes to economic growth in many ways. In addition, the processing is
useful for forecasting the production of crop yields.
Keywords: Machine Learning; Crop Yield Prediction; Random Forest Algorithm
1. Introduction
India's agricultural history began in the Indus Valley Civilization period. India ranks second in the world in this
sector. Agriculture and allied sectors accounted for 20.2% of GVA (gross value added) in fiscal year 2020-2021,
which is 1.8% higher than in fiscal year 2019-2020, and for 18.8%, with 42.6% of the workforce, in fiscal year
2021-2022. In terms of net cultivated area, India leads the world with 9.6% of all arable land, followed by the US
(8.9%), China (8.8%), and Russia (8.8%). According to demographics, India's socio-economic fabric is mostly based
on agriculture. The GDP contribution of agriculture in India is declining significantly as industrialization rises.
Integration with technology is not at the desired level, which is a problem for the Indian agricultural sector and the
reason why its full potential is not being used. It is difficult for farmers to predict rainfall and temperature, which
affect crop yield, as a result of the overuse of industrial technologies and non-renewable resources. Here, machine
learning can help farmers by using algorithms such as RNN, LSTM, and others to predict trends in temperature,
rainfall, and crop yield. Because crops can be planned in advance in accordance with the prediction, this will ease
farmers' lives a little and increase the yield and quality of their harvest. The practical implementation of machine
learning techniques and its quantification are the main topics of this study. In order to obtain a consistent trend,
the work presented here additionally takes into account the erratic data from the temperature and rainfall
databases. Contrary to the customary practice of predicting crop yields by taking into account only one factor at a
time, this method takes all of the factors into account.
The remainder of the paper is structured as follows. Section 2 contains a literature survey of the research done
before this paper. Section 3 contains the methodology, which briefly describes the different algorithms and the
requirements for ML algorithms. Section 4 contains the proposed model. Section 5 contains brief details of the data
sources and datasets. Section 6 contains the prediction result obtained after using the formula. Section 7 contains
the results and analysis obtained after processing the data in the Random Forest model. Section 8 contains the pros
and cons of the proposed model. Section 9 contains the conclusion of the paper and future uses of the proposed model.
2. Literature Survey
On a dataset from the Indian government, experiments by Aruvansh Nigam, Saksham Garg, and Archit Agrawal [1]
showed that the Random Forest machine learning method provides the best yield forecast accuracy.
Balamurugan [2] implemented crop yield prediction using only the random forest classifier. Various features
such as rainfall, temperature, and season were taken into account to predict the crop yield.
According to Dr. Y. Jeevan Nagendra Kumar [3], supervised learning allows machine learning algorithms to forecast
an objective or outcome. This study focuses on supervised learning methods for predicting crop yields.
Jig Han et al. [4] used a random forest algorithm to predict global and regional crop yields for potato, maize, and
wheat, using environmental variables such as soil, climate, photoperiod, fertilization data, and water.
Leo Breiman [5] focuses on the random forest algorithm's accuracy, strength, and correlation. The random forest
algorithm generates decision trees from different data samples, predicts the data from each subset, and then
provides the best answer for the system by voting.
Mishra [6] has theoretically described various machine learning techniques that can be applied in various forecasting
areas.
Using data mining techniques, Shastry et al. [7] fitted various regression models to forecast crop yield in India. The
crop yields of maize, wheat, and cotton are studied using time series data, soil, and weather parameters.
The research of Manjula et al. [8] aimed to propose and implement a rule-based system to predict crop yield
production from past data by using association rule mining on agricultural data from 2000 to 2012.
Saeed Khaki, Lizhi Wang, and Sotirios V. Archontoulis [9] use a CNN-RNN model for crop yield prediction. It is used
to capture the time dependencies of environmental factors and the genetic improvement of seeds over time without
having their genotype information.
Mayank Champaneri, Darpan Chachpara, Chaitanya Chandvidkar, and Mansing Rathod [10] use the random forest
algorithm for crop yield prediction.
Thomas van Klompenburg, Ayalew Kassahun, and Cagatay Catal [11] examine different deep learning methods to find
the best-performing models for crop yield prediction.
S. Vinson Joshua and A. Selwin Mich Priyadharson [12] used General Regression Neural Networks (GRNN), Back
Propagation Neural Networks (BPNN), and Support Vector Machines (SVM) for crop yield prediction.
2 Crops Arecanut, Other Kharif pulses, Rice, Banana, Cashewnut, Coconut, Dry ginger, Sugarcane, Sweet potato,
Tapioca, Black pepper, Dry chillies, other oilseeds, Turmeric, Maize, Moong(Green Gram), Urad, Arhar/Tur,
Groundnut, Sunflower, Bajra, Castor seed, Cotton(lint), Horse-gram, Jowar, Korra, Ragi, Tobacco, Gram,
Wheat, Masoor, Sesamum, Linseed, Safflower, Onion, other misc. Pulses, Samai, Small millets, Coriander,
Potato, Other Rabi pulses, Soyabean, Beans & Mutter(Vegetable), Bhindi, Brinjal, Citrus Fruit, Cucumber,
Grapes, Mango, Orange, other Fibers, Other Fresh Fruits, Other Vegetables, Papaya, Pome Fruits, Tomato,
Rapeseed & Mustard, Mesta, Cowpea(Lobia), Lemon, Pomegranate, Sapota, Cabbage, Peas, Niger seed,
Bottle Gourd, Sannhamp, Varagu, Garlic, Ginger, Oilseeds total, Pulses total, Jute, Peas & beans (Pulses),
Blackgram, Paddy, Pineapple, Barley, Khesari, Guar seed, Other Cereals & Millets, Cond-spcs other, Turnip,
Carrot, Redish, Arcanut (Processed), Atcanut (Raw),Cashew Nut Processed, Cashew Nut Raw, Cardamom,
Rubber, Bitter Gourd, Drum Stick, JackFruit, Snake Guard, Pump Kin, Tea, Coffee, Cauliflower, Other Citrus
Fruit, Water Melon, Total foodgrain, Kapas, Colocasia, Lentil, Bean, Jobster, Perilla, Rajmash Kholar,
Ricebean (nagadal), Ash Gourd, Beet Root, Lab-Lab, Ribbed Gourd, Yam, Apple, Peach, Pear, Plums,
Litchi, Ber, Other Dry Fruit, Jute & mesta
5. Proposed Model
The proposed model, shown in Fig[4], is a Random Forest model. It works in several steps:
1. When the algorithm starts, the data sets are loaded into the model and graphs are made from them. Random
samples are then taken from the data sets and processed into a form suitable for constructing decision
trees.
2. The decision trees are built using an attribute selection process, and the selected attributes are data
points [subset] chosen by the user. Each decision tree then receives its data and creates a set of rules and
formulas to predict the result. Each tree uses a different set of data and forms different rules for
prediction.
3. The result from each decision tree is taken and voted upon by the Random Forest classifier, and the result
that gets the highest number of votes is selected as the final result.
4. The final result is displayed and graphs are made according to the result.
Fig[4] Proposed System Flowchart
The Random Forest algorithm is illustrated in Pseudo-code(1) in Table[2]. Out of all the features, K random
features are chosen, and a node d is computed among them using the best split point and then split into daughter
nodes. Repeating this process produces N trees. In this area of prediction, Random Forest provides the highest
accuracy because it trains N trees, and more trees generally lead to greater accuracy, although the gain levels
off as N grows. It can also manage enormous volumes of data.
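The node-splitting idea described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the pseudo-code from Table[2]: the function names (`gini`, `best_split`, `bootstrap`, `grow_root_nodes`), the use of Gini impurity as the split criterion, and the toy data are all hypothetical choices made for the example. Only the root node of each tree is computed; a full tree would repeat the same step on each daughter node.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y, k, rng):
    """Pick k random features and return the (feature, threshold, impurity)
    with the lowest weighted Gini impurity, or None if no split separates rows."""
    candidates = rng.sample(range(len(X[0])), k)
    best = None
    for f in candidates:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send rows to both daughter nodes
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[2]:
                best = (f, threshold, impurity)
    return best

def bootstrap(X, y, rng):
    """Draw len(X) rows with replacement so each tree sees its own sample."""
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]

def grow_root_nodes(X, y, n_trees, k, seed=0):
    """Compute the root split of each of the N trees."""
    rng = random.Random(seed)
    roots = []
    for _ in range(n_trees):
        Xb, yb = bootstrap(X, y, rng)
        roots.append(best_split(Xb, yb, k, rng))
    return roots

# Hypothetical toy data: [rainfall_mm, temperature_c] -> yield class
X = [[800, 27], [900, 30], [450, 31], [400, 26]]
y = ["high", "high", "low", "low"]
roots = grow_root_nodes(X, y, n_trees=5, k=1)
```

Because each tree draws its own bootstrap sample and its own K candidate features, the N root splits usually differ from one another, which is what gives the forest its diversity.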
The voting process is highlighted in Pseudo-code(2) in Table[3] to provide the final result. Each trained tree utilizes a
random set of data to predict an outcome for each event. This process is repeated numerous times, saving the
results for each event. Next, the voting process is started, and each tree casts votes for each outcome. The outcome
with the most votes is then chosen as the outcome for the event. If two results are in conflict, the data are again voted
on, and the result with the highest number of votes is chosen.
Prediction with the trained random forest algorithm uses pseudocode(2), as shown in Fig[5]:
1. We used the test features and each random decision tree to predict the output and the outcome, which was
then saved.
2. The vote given by each decision tree for each predicted event was then calculated.
3. Finally, we looked at the most popular predicted outcome, which is the random forest algorithm's final
forecast.
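The three prediction steps above can be sketched as a simple majority vote. This is a minimal illustration, not the pseudocode(2) referenced in the text: the function name `forest_vote` and the three stand-in "trees" (plain functions with hand-written rules) are hypothetical and exist only for the example.

```python
from collections import Counter

def forest_vote(trees, features):
    """Collect one prediction per tree and return (winner, tally)."""
    votes = [tree(features) for tree in trees]  # step 1: predict and save
    tally = Counter(votes)                      # step 2: count votes per outcome
    winner, _ = tally.most_common(1)[0]         # step 3: most popular outcome
    return winner, tally

# Hypothetical stand-ins for trained decision trees: each maps
# (rainfall_mm, temperature_c) to a yield class using its own rule.
trees = [
    lambda f: "high" if f[0] > 600 else "low",
    lambda f: "high" if f[1] < 29 else "low",
    lambda f: "high" if f[0] > 850 else "low",
]

winner, tally = forest_vote(trees, (800, 27))
print(winner, dict(tally))
```

In this sketch a tie simply falls to whichever outcome was counted first, which is a simplification of the re-vote described earlier in this section.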
Fig[7] Prediction of Yield through Linear Regression & Random Forest Model
After comparing linear regression and random forest regression, we performed an analysis on decision trees, which
revealed that the decision tree's R2 value was 0.93, as shown in Fig[8]. This was significantly lower than the R2
score of the random forest, indicating that the random forest was the most effective technique for the dataset in
question, with an accuracy of 95.32 percent and a standard deviation of 4.72%, as shown in Fig[9].
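For readers unfamiliar with the metrics quoted above, the sketch below shows how an R2 score and a mean accuracy with its standard deviation can be computed. It is an illustration under assumptions: the paper does not state how its accuracy and standard deviation were obtained, so the example assumes they summarize scores from repeated evaluations (for instance, cross-validation folds). The function names and all numbers are hypothetical and are not the paper's data.

```python
from statistics import mean, pstdev

def r2_score(y_true, y_pred):
    """Coefficient of determination: R2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)              # total sum of squares
    return 1.0 - ss_res / ss_tot

def summarize(scores):
    """Mean and population standard deviation of a list of scores."""
    return mean(scores), pstdev(scores)

# Hypothetical yields (tonnes per hectare) and model predictions
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
r2 = r2_score(y_true, y_pred)

# Hypothetical scores from three evaluation folds (assumed procedure)
fold_scores = [0.90, 0.95, 1.00]
avg, spread = summarize(fold_scores)
print(f"R2 = {r2:.2f}, accuracy = {avg:.2%} +/- {spread:.2%}")
```

An R2 of 1.0 means the predictions match the observed yields exactly, while 0.0 means the model does no better than always predicting the mean yield.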
10. References
1. Breiman, L. Random Forests. Machine Learning 45, 5–32 (2001). https://doi.org/10.1023/A:1010933404324
2. U. Muthaiah, "Predicting yield of the crop using machine learning algorithm", International journal of
engineering sciences & research Technology(IJESRT), 5.164,UGC Approved (2018)
3. Mishra, Subhadra & Mishra, Debahuti & Santra, Gour. (2016). Applications of Machine Learning Techniques
in Agricultural Crop Production: A Review Paper. Indian Journal of Science and Technology. 9.
10.17485/ijst/2016/v9i38/95032.
4. Breiman, L. (2001). Random Forests. Machine Learning, 45, 5-32.
5. Mahore, Pallavi & Bardekar, Dr. (2021). Crop Yield Prediction. International Journal of Scientific Research in
Computer Science, Engineering and Information Technology. 561-569. 10.32628/CSEIT2173168.
6. Champaneri, Mayank & Chachpara, Darpan & Chandvidkar, Chaitanya & Rathod, Mansing. (2020). CROP
YIELD PREDICTION USING MACHINE LEARNING. International Journal of Science and Research (IJSR).
9. 2.
7. Khaki S, Wang L, Archontoulis SV. A CNN-RNN Framework for Crop Yield Prediction. Front Plant Sci. 2020
Jan 24;10:1750. doi: 10.3389/fpls.2019.01750. PMID: 32038699; PMCID: PMC6993602.
8. Champaneri, Mayank & Chachpara, Darpan & Chandvidkar, Chaitanya & Rathod, Mansing. (2020). CROP
YIELD PREDICTION USING MACHINE LEARNING. International Journal of Science and Research (IJSR).
9. 2.
9. van Klompenburg, T., Kassahun, A., & Catal, C. (2020). Crop yield prediction using machine learning: A
systematic literature review. Computers and Electronics in Agriculture, 177, [105709].
https://doi.org/10.1016/j.compag.2020.105709
10. Anakha Venugopal, Aparna S, Jinsu Mani, Rima Mathew, Vinu Williams, 2021, Crop Yield Prediction using
Machine Learning Algorithms, INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH &
TECHNOLOGY (IJERT) NCREIS – 2021 (Volume 09 – Issue 13),
11. S. Vinson Joshua, A. Selwin Mich Priyadharson, R. Kannadasan, A. Ahmad Khan, W. Lawanont et al.,
"Crop yield prediction using machine learning approaches on a wide spectrum," Computers, Materials &
Continua, vol. 72, no.3, pp. 5663–5679, 2022.
12. Lontsi Saadio, Cedric and Adoni, Wilfried Yves Hamilton and Aworka, Rubby and Zoueu, Jérémie
Thouakesseh and Kalala Mutombo, Franck and Kimpolo, Charles Lebon Mberi, Crops Yield Prediction
Based on Machine Learning Models: Case of West African Countries. Available at SSRN:
https://ssrn.com/abstract=4003105 or http://dx.doi.org/10.2139/ssrn.40