Professional Documents
Culture Documents
net/publication/370769944
CITATIONS READS
2 105
4 authors:
32 PUBLICATIONS 140 CITATIONS
Université Ibn Tofail, Faculté des Sciences
44 PUBLICATIONS 187 CITATIONS
SEE PROFILE
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Rachid Ed-Daoudi on 15 May 2023.
ABSTRACT
In Morocco, agriculture is an important sector that contributes to the country’s economy and food security. Accu-
rately predicting crop yields is crucial for farmers, policy makers, and other stakeholders to make informed deci-
sions regarding resource allocation and food security. This paper investigates the potential of Machine Learning
algorithms for improving the accuracy of crop yield predictions in Morocco. The study examines various factors
that affect crop yields, including weather patterns, soil moisture levels, and rainfall, and how these factors can be
incorporated into Machine Learning models. The performance of different algorithms, including Decision Trees,
Random Forests, and Neural Networks, is evaluated and compared to traditional statistical models used for crop
prediction. The study demonstrated that the Machine Learning algorithms outperformed the Statistical models in
predicting crop yields. Specifically, the Machine Learning algorithms achieved mean squared error values between
0.10 and 0.23 and coefficient of determination values ranging from 0.78 to 0.90, while the Statistical models had
mean squared error values ranging from 0.16 to 0.24 and coefficient of determination values ranging from 0.76 to
0.84. The Feed Forward Artificial Neural Network algorithm had the lowest mean squared error value (0.10) and
the highest R² value (0.90), indicating that it performed the best among the three Machine Learning algorithms.
These results suggest that Machine Learning algorithms can significantly improve the accuracy of crop yield pre-
dictions in Morocco, potentially leading to improved food security and optimized resource allocation for farmers.
Keywords: crop yield prediction; machine learning algorithms; statistical models; model evaluation.
392
Journal of Ecological Engineering 2023, 24(6), 392–400
& Burke, 2010]. As a result, there is a growing in- will summarize the key findings and provide rec-
terest in exploring alternative approaches for crop ommendations for the use of ML algorithms for
yield prediction that can incorporate multiple fac- crop yield prediction in Morocco.
tors and improve the accuracy and scalability of
yield estimates.
In recent years, advancements in technol- METHODOLOGY
ogy and data collection methods have provided
an opportunity to improve crop yield predictions In this section, we outline the methodology
using Machine Learning (ML) algorithms [Me- used to collect and analyze data to evaluate the
renda et al., 2020]. ML algorithms can analyze performance of ML algorithms for crop yield pre-
large amounts of data, identify patterns, and diction in Morocco for local crops such as wheat,
make predictions based on the relationships be- apples, dates, almonds, and olives. The following
tween different variables. Several studies have steps were taken to carry out the study:
explored the potential of ML algorithms for crop
yield prediction. For instance, [Cao, et al., 2021] Data collection
investigated the use of machine learning models
for predicting rice yield in China, and found that Data was collected by “The Regional Agri-
the models outperformed traditional statistical cultural Development Office (RAD)” in Ouarza-
models. Similarly, Studies by Drummond et al. zate Morocco, i.e. the “Office Régional de Mise
[2003] Ruß & Kruse, [2009], Bocca & Rodrigues en Valeur Agricole de Ouarzazate (ORMVAO)”
[2016] and Fortin et al. [2011] have compared [ORMVAO, 2023]. The office is responsible for
traditional statistical models with artificial neural implementing policies and programs aimed at im-
networks (ANNs) for improved crop yield mod- proving the agricultural sector in the southeastern
eling. These studies suggest that ML algorithms region of Morocco. Its involvement in Research
have the potential to significantly improve crop and Development (R&D) aims to improve pro-
yield prediction in different regions [Chlingaryan ductivity, sustainability, and economic develop-
et al., 2018]. ment in the agricultural sector.
However, while some studies have investigat- The data was collected using various sensors
ed the use of ML algorithms for crop yield pre- such as yield monitors, soil moisture sensors, and
diction in other regions, there is a research gap in weather stations. The data includes information
understanding their effectiveness for local crops on weather patterns, soil moisture levels, rainfall,
in Morocco [Idrissi & Nadem, 2022]. Therefore, and crop yields for the past several years. The
in this paper, we aim to investigate the potential data was collected in a consistent and standard-
of ML algorithms for improving the accuracy of ized format to ensure its accuracy and reliability.
crop yield predictions for local crops in Morocco. Temperature and rainfall data are collected from
Specifically, we will evaluate the performance historical climate data from the National Oce-
of different ML algorithms, including Decision anic and Atmospheric Administration (NOAA)
Trees, Random Forests, and Neural Networks, [NOAA, 2023] and World Meteorological Orga-
and compare their results to traditional statistical nization (WMO) [WMO, 2023]. Altogether, the
models used for crop prediction. By doing so, this dataset we employed contained 38 records and 4
study seeks to fill the research gap and provide in- features of the years between 1984 and 2022.
sights into the feasibility of using ML algorithms
for crop yield prediction in Morocco. Data pre-processing
This paper is organized as follows. The in-
troduction provides background information on The collected data was pre-processed to re-
Morocco’s agriculture sector and the importance move any missing values, outliers, and irrelevant
of crop yield prediction, while the methodology information. Missing values were imputed using
section outlines the steps taken to collect and a mean, median, or mode method depending on
analyze data using ML algorithms. The results the distribution of the data. Outliers were identi-
section will present the performance evaluation fied using a box plot and removed using the In-
of ML algorithms and comparison to traditional terquartile Range (IQR) method, where values
statistical models, and the discussion section will outside the range of and were considered outliers
interpret and analyze the findings. The conclusion and removed. The removal of outliers was based
393
Journal of Ecological Engineering 2023, 24(6), 392–400
on the principle that they can significantly affect previous studies on crop yield prediction [De-
the accuracy of the prediction model. The remain- wangan et al., 2022].
ing data was then used for further analysis. Data To further justify the selection of Decision
normalization was also performed to ensure that Trees, Random Forests, and Neural Networks
all variables had similar ranges and distributions for our study, we based our decision on a review
[Kotsiantis & Kanellopoulos, 2006]. of the literature. Previous studies have demon-
Crop yield data structure is described in Ta- strated the effectiveness of these algorithms in
ble 1. This Table give an overview of the crop predicting crop yields by handling complex rela-
tionships between variables [Nigam et al., 2019].
yields for the last three years in the study area.
Decision Trees, for example, are known for their
Table 2 summarizes the result of the features
ability to handle both categorical and continuous
data explanatory analysis. The Table 2 presents
data and can provide clear visualization of deci-
the statistical summary of monthly temperature, sion rules [Kamath et al., 2021]. Random Forests
monthly soil moisture, and monthly rainfall over have been shown to be effective in dealing with
the course of 455 entries. noisy and high-dimensional data, and are able
to produce accurate predictions by combining
Algorithm selection multiple decision trees [Li et al., 2010]. Neural
Networks have also been shown to be effective
Three ML algorithms, Decision Trees (DT), in handling complex relationships between vari-
Rrandom Forests (RF), and Neural Networks ables and have been successfully used in crop
(NN), were selected for this study [Nigam et yield prediction models [Dewangan et al., 2022].
al., 2019]. These algorithms were chosen for Therefore, we believe that these three algorithms
their ability to handle complex relationships are suitable for predicting crop yields in our
between variables and their performance in study area and can provide valuable insights for
394
Journal of Ecological Engineering 2023, 24(6), 392–400
395
𝑛𝑛𝑛𝑛
Journal of Ecological Engineering
1 2023,
2 24(6), 392–400
��𝑌𝑌𝑌𝑌𝑖𝑖𝑖𝑖 − 𝑌𝑌𝑌𝑌�𝑖𝑖𝑖𝑖 �
𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀𝑀 =
𝑛𝑛𝑛𝑛
Y – the actual 𝑖𝑖𝑖𝑖=1
yield; 4. Sensitivity analysis – a sensitivity analysis was
Ŷ – the predicted yield. performed to determine the impact of changes
The R² was calculated using the formula: in individual variables on the overall perfor-
2 mance of the ML algorithms [Liu et al., 2021].
∑�𝑌𝑌𝑌𝑌𝑖𝑖𝑖𝑖 − 𝑌𝑌𝑌𝑌�𝑖𝑖𝑖𝑖� This analysis helped to identify the most im-
𝑅𝑅𝑅𝑅² = 1 − � � (2)
∑(𝑌𝑌𝑌𝑌𝑖𝑖𝑖𝑖 − 𝑌𝑌𝑌𝑌�)2 portant variables affecting crop yields and their
relative importance.
where: ∑(Yi − Ŷi)2 – the sum of squared residu-
als, calculated as the sum of the squared The methodology used in this study was de-
differences between each predicted value signed to provide a comprehensive evaluation of
Ŷi and its corresponding actual value Yi; the performance of ML algorithms for crop yield
∑(Yi − Y̅i)2 – the total sum of squares, prediction in Morocco for local crops. The results
calculated as the sum of the squared dif- of this study will provide key informations about
ferences between each actual value Yi and the feasibility of using ML algorithms for crop
the mean of all actual values Y̅. yield prediction in Morocco.
396
Journal of Ecological Engineering 2023, 24(6), 392–400
The results of the study were based on the • multiple linear regression – the MLR model
evaluation of the ML algorithms for crop yield had an MSE of 0.24 and an R² of 0.76 for
prediction in Morocco for local crops such as wheat, 0.22 and 0.78 for apples, 0.20 and 0.80
wheat, apples, dates, almonds, and olives. The al- for dates, 0.18 and 0.82 for almonds, and 0.16
gorithms were evaluated using MSE and R² as the and 0.84 for olives. These results indicate that
evaluation metrics. The results of the evaluation the MLR model was less accurate than the ML
are presented in Table 3. The results presented in- algorithms with higher mean deviations and
dicate that: lower R² values.
• decision trees – the DT algorithm had an MSE • time-series analysis – the TSA had an MSE of
of 0.23 and an R² of 0.78 for wheat, 0.21 and 0.23 and an R² of 0.77 for wheat, 0.21 and 0.79
0.79 for apples, 0.19 and 0.81 for dates, 0.17 for apples, 0.19 and 0.81 for dates, 0.21 and
and 0.83 for almonds, and 0.15 and 0.85 for 0.79 for almonds, and 0.16 and 0.884 for ol-
olives. These results indicate that the DT al- ives. The results show that the TSA performed
gorithm was able to make accurate predictions similarly to the DT algorithm with similar
with a mean deviation of less than 0.23 units mean deviations and R² values.
for the crops.
• random forests – the RF algorithm had an To better explain the results obtained in Table 4,
MSE of 0.19 and an R² of 0.81 for wheat, 0.17 it should be noted that the ML algorithms used
and 0.83 for apples, 0.15 and 0.85 for dates, in this study, are designed to handle complex re-
0.13 and 0.87 for almonds, and 0.11 and 0.89 lationships between variables and make predic-
for olives. These results show that the RF al- tions based on patterns in the data. On the other
gorithm performed better than the DT algo- hand, the traditional statistical models used,
rithm with lower mean deviations and higher rely on specific response functions between
R² values. yields and independent variables, and may not
• neural network – the FFANN algorithm had an capture the nonlinear relationships between the
MSE of 0.18 and an R² of 0.82 for wheat, 0.16 variables affecting crop yields. Therefore, the
and 0.84 for apples, 0.14 and 0.86 for dates, superior performance of ML algorithms over
0.12 and 0.88 for almonds, and 0.10 and 0.90 traditional statistical models in predicting crop
for olives. The results show that the FFANN yields can be attributed to their ability to cap-
algorithm performed the best among the three ture the complex interactions between various
algorithms with the lowest mean deviations factors affecting crop yields, which may not be
and the highest R² values. captured by traditional statistical models. Ad-
ditionally, it should be noted that the perfor-
The results of the comparison between the mance of the models may also depend on the
ML algorithms and the traditional statistical mod- quality and structure of the input data, as well
els are shown in Table 4. The results presented as the specific architecture and parameters of
indicate that: the ML models used in the study.
397
Journal of Ecological Engineering 2023, 24(6), 392–400
The results of the sensitivity analysis are ML algorithms for comparison. Additionally,
shown below: our sensitivity analysis demonstrates the most
• wheat – the most important variables affect- important variables affecting crop yields for
ing crop yields for wheat were found to be wheat, apples, dates, almonds, and olives in the
temperature, rainfall, and soil moisture levels. context of Morocco. By improving the accura-
Changes in these variables had a significant cy of crop yield predictions, farmers can make
impact on crop yields, with rainfall having the informed decisions regarding resource alloca-
greatest impact. tion [Evans & Sadler, 2008], such as water and
• apples – the most important variables affect- fertilizer use, which can lead to increased crop
ing crop yields for apples were found to be yields and improved food security [Fan et al.,
temperature, soil moisture levels, and rainfall. 2012]. In addition, the results of the sensitivity
Changes in these variables had a significant analysis provide Significant findings about the
impact on crop yields, with rainfall having the most important variables affecting crop yields
greatest impact. and their relative importance. These results can
• dates – the most important variables affecting be used to optimize resource allocation and
crop yields for dates were found to be tem- improve food security in Morocco [Balaghi et
perature, soil moisture levels, and rainfall. al., 2008]. For example, the results showed that
Changes in these variables had a significant temperature was the most important variable
impact on crop yields, with temperature hav- affecting crop yields for wheat, apples, dates,
ing the greatest impact. It’s important to note almonds, and olives. By considering this infor-
that date palm trees exhibit a moderate degree mation, farmers can implement management
of alternation, meaning that a greater yield in practices that can mitigate the effects of tem-
a given year is frequently succeeded by a re-
perature on crops. For example, they can adjust
duced yield in the next year.
planting and harvesting times, use irrigation to
• almonds – the most important variables affect-
regulate soil moisture, and apply fertilizers and
ing crop yields for almonds were found to be
other soil amendments to improve soil health
temperature, soil moisture levels, and rainfall.
and nutrient availability [Wang & Hooks,
2011]. These management practices can help to
ensure that crops are able to tolerate and thrive
DISCUSSION in different temperature conditions, which can
The results of the study show the effective- ultimately lead to improved crop yields and
ness of ML algorithms in predicting crop yields food security [Dhankher & Foyer, 2018].
for local crops such as wheat, apples, dates, al- Our study uses well-established methods
monds, and olives in Morocco. The evaluation of for data pre-processing, algorithm selection,
the algorithms was performed using MSE and R² and evaluation, ensuring the reliability and
as the evaluation metrics. The results showed that validity of our findings. By demonstrating the
the FFANN algorithm performed the best among potential of ML algorithms for crop yield pre-
the three algorithms with the lowest mean devia- diction in Morocco, this study fills an existing
tions and the highest R² values. research gap and provides a basis for further
These results contribute to the growing body research in this area.
of literature on the use of ML algorithms for crop In terms of innovation, our study is one of
yield prediction, providing valuable insights the first to evaluate the effectiveness of mul-
into the potential of these methods in addressing tiple ML algorithms for crop yield prediction
food security and resource allocation challenges in Morocco. Therefore, our study can serve as
in Morocco and beyond. In comparison to tradi- a starting point for further research in this area,
tional statistical models, such as Multiple Linear leading to more accurate and effective methods
Regression and Time-series Analysis, our study for crop yield prediction in the region. By con-
highlights the superiority of ML algorithms in tinuing to refine and improve these algorithms,
this context. the accuracy of crop yield predictions can be
While similar studies have been conducted further improved [Sharma, et al., 2020], which
in other regions, our study is unique in its focus can lead to increased food security and im-
on local crops in Morocco and its use of multiple proved resource allocation.
398
Journal of Ecological Engineering 2023, 24(6), 392–400
399
Journal of Ecological Engineering 2023, 24(6), 392–400
19. Kotsiantis, S.B., Kanellopoulos, D.N. 2006. As- 27. Zaefizadeh, M., Khayatnezhad, M., Gholamin, R.
sociation rules mining: A recent overview. GESTS 2011. Comparison of multiple linear regressions
International Transactions on Computer Science and (MLR) and artificial neural network (ANN) in pre-
Engineering, 32(1), 71–82. dicting the yield using its components in the hulless
20. Nigam, A., Garg, S., Agrawal, A., Agrawal, P. 2019. barley. American-Eurasian Journal of Agricultural
Crop yield prediction using machine learning al- & Environmental Science, 10(1), 60–64.
gorithms. In 2019 Fifth International Conference 28. Liu, L.W., Hsieh, S.H., Lin, S.J., Wang, Y.M., Lin,
on Image Information Processing (ICIIP) IEEE, W.S. 2021. Rice blast (Magnaporthe oryzae) occur-
125–130. rence prediction and the key factor sensitivity analy-
21. Kamath, P., Patil, P.S., Sushma S., Sowmya, S. sis by machine learning. Agronomy, 11(4), 771.
2021. Crop yield forecasting using data mining. 29. Evans, R.G., Sadler, E.J. 2008. Methods and tech-
Global Transitions Proceedings, 2(2), 402–407. nologies to improve efficiency of water use. Water
22. Li, H.B., Wang, W., Ding, H.W., Dong, J. 2010. Resources Research, 44(7), Jul.
Trees weighting random forest method for classify- 30. Fan, M., Shen, J., Yuan, L., Jiang, R., Chen, X.,
ing high-dimensional noisy data. In 2010 IEEE 7th Davies, W.J., Zhang, F. 2012. Improving crop pro-
International Conference on E-Business Engineer- ductivity and resource use efficiency to ensure food
ing (ICEBE), IEEE, 160–163. security and environmental quality in China. Journal
23. Dewangan, U., Talwekar, R.H., Bera, S. 2022. Sys- of Experimental Botany, 63(1), 13–24. https://doi.
tematic Literature Review on Crop Yield Predic- org/10.1093/jxb/err248
tion using Machine & Deep Learning Algorithm. In 31. Balaghi, R., Tychon, B., Eerens, H., Jlibene, M.
2022 5th International Conference on Advances in 2008. Empirical regression models using NDVI,
Science and Technology (ICAST) Mumbai, India, rainfall and temperature data for the early predic-
654–661. tion of wheat grain yields in Morocco. International
24. Dahikar, S.S., Rode, S.V. 2014. Agricultural crop Journal of Applied Earth Observation and Geoinfor-
yield prediction using artificial neural network ap- mation, 10(4), 438–452.
proach. International journal of innovative research 32. Wang, K.H., Hooks, C.R. 2011. Managing soil
in electrical, electronics, instrumentation and con- health and soil health bioindicators through the
trol engineering, 2(1), 683–686. use of cover crops and other sustainable practices.
25. Brownlee, J. 2021. How to Prepare Data Chapter, 4, 1–18.
For Machine Learning. Retrieved from 33. Dhankher, O.P., Foyer, C.H. 2018. Climate resilient
h t t p s : / / m a c h i n e l e a r n i n g m a s t e r y. c o m / crops for improving global food security and safety.
how-to-prepare-data-for-machine-learning/ Plant, Cell & Environment, 41, 877–884. https://doi.
26. Haque, F.F., Abdelgawad, A., Yanambaka, V.P., org/10.1111/pce.13207
Yelamarthi, K. 2020. Crop Yield Analysis Us- 34. Sharma, A., Jain, A., Gupta, P., Chowdary, V. 2020.
ing Machine Learning Algorithms. In 2020 IEEE Machine learning applications for precision agri-
6th World Forum on Internet of Things (WF-IoT) culture: A comprehensive review. IEEE Access, 9,
IEEE, 1–2. 4843–4873.
400