Ayesha Asad…………L17-4203
Eiman Waheed………L17-4202
Salman Ahmed…………L17-4286
Title: Forecasting the yield using biomass calculated from satellite images.
is the sole contribution of the author(s) and no part hereof has been reproduced on as it is basis
(cut and paste) which can be considered as Plagiarism. All referenced parts have been used to
argue the idea and have been cited properly. I/We will be responsible and liable for any
consequence if violation of this declaration is determined.
Date: 25-06-2021
Student 1
Name: Ayesha Asad
Signature:
Student 2
Name: Eiman Waheed
Signature:
Student 3
Name: Salman Ahmed
Signature:
Table of Contents
List of Tables
List of Figures
Abstract
Introduction
  Goals and Objectives
  Scope of the Project
  Definitions, Acronyms, and Abbreviations
    1.3.1 Abbreviations
Literature Survey / Related Work
  Yield Prediction using Machine Learning Algorithms
    2.1.1 Yield Prediction using Multiple Linear Regression
    2.1.2 Yield Prediction using Neural Network
    2.1.3 Yield Prediction using Support Vector Machine
    2.1.4 Yield Prediction using Random Forest
  Hyperspectral Imaging
    2.2.1 Vegetation Indices derived from hyperspectral images
    2.2.2 Machine Learning for Remote Sensed Data
  Literature Review Summary Table
Requirements and Design
  Functional Requirements
  Non-Functional Requirements
    3.2.1 Reusability
    3.2.2 Reliability
    3.2.3 Extensibility
    3.2.4 Performance
    3.2.5 Robustness
  Hardware and Software Requirements
    3.3.1 Hardware Requirements
    3.3.2 Software Requirements
  System Architecture
    3.4.1 Architecture Diagram
    3.4.2 System Modules
  Yield Estimation
Implementation
  Implementation
    4.1.1 Data collection
    4.1.2 Preprocessing of Data
    4.1.3 Building and Training Machine Learning Model
Experimental Results and Analysis
  Validation of LAI prediction using Random Forest Regression
  Validation of Yield prediction using LAI by Random Forest Regression
Conclusion
References
List of Tables
Table 1: Work done on remote sensing data
List of Figures
Figure 1: MLR equation
Figure 2: Predictor Variables Combinations
Figure 3: Relationship between NDVI and Yield
Figure 4: Artificial Neural Network
Figure 5: Crop prediction results
Figure 6: Yield graph
Figure 7: List of ML algorithms
Figure 8: Schematic Diagram of 3D-CNN
Figure 9: Setup of the Fourier transform
Figure 10: System Architecture
Figure 11: List of Vegetation Indices
Figure 12: Rahim Yar Khan Area
Figure 13: GeoJSON coordinates
Figure 14: Data collection in EO browser
Figure 15: Data Layer in QGIS
Figure 16: Point Sampling Tool
Figure 17: Loading Data Module
Figure 18: Preprocessing Data Module
Figure 19: Training Data Module
Figure 20: Output module
Figure 21: Training Data Module
Figure 22: Output module
Figure 23: Loading Data Module
Figure 24: Preprocessing Data Module
Figure 25: Training Data Module
Figure 26: Output Module
Forecasting the yield using biomass calculated from satellite images 1
Abstract
Population growth has increased the need to plan crop production in advance and to forecast
crop yields from biomass. Yield forecasting is essential, particularly in countries whose
economies depend mainly on agriculture. Such forecasts help anticipate shortfalls in crop
yield so that imports and exports can be planned effectively. Estimating yield before the
crops are harvested, using satellite remote sensing techniques, has therefore become very
important. Our project focuses on estimating how much yield a crop will produce, using
different vegetation indices derived from satellite images. For this purpose, we evaluated
different techniques and scenarios and computed time series of vegetation indices, which are
then used to estimate yield. Vegetation indices are among the most effective predictors of a
plant's future yield. We used Sentinel-2 satellite data to derive the vegetation indices. In
remote sensing, a vegetation index describes the vegetation biomass at each pixel and is
computed from the reflectance of several spectral bands. We calculated the Normalized
Difference Vegetation Index (NDVI), which measures the greenness of a plant and helps in
estimating the yield. We also calculated the Leaf Area Index (LAI) with the help of NDVI. LAI
is the leaf area per unit ground area; it was obtained using the SNAP toolbox, an efficient
software package for processing satellite data. After implementing this methodology, yield
was predicted using Linear Regression and Random Forest Regression.
Introduction
Agriculture is an economically important profession in Pakistan, and most of the country's
economy depends on it. To increase knowledge and productivity, conventional agricultural
practice must change by introducing crop yield forecasting, so that planners and
administrators can formulate budgets and policies in advance, which helps regulate the
economy of the country [1, 2]. The economy of our country is under stress, and it is the need
of the hour to allocate the budget effectively and without waste; crop yield prediction can
deliver immediate economic gains. Pakistan is one of the largest producers of several crops,
most people's income depends on this profession, and advance knowledge of yield can bring the
country considerable profit. Hence, developing economically beneficial computer applications
is advantageous for the country's economy.
Our project helps administrators and decision makers formulate policies and budgets by
providing an application that forecasts crop yield from real-time data using remote sensing
imagery. Recent advances in technology have paved the way for smart agriculture around the
world. Using Artificial Intelligence, yield can be computed at different time intervals
before the crops are harvested. Satellite remote sensing offers a unique view of the
condition of, and active changes in, land, coastal, and ocean ecosystems. The application
will use remote sensing imagery to predict crop yield with different regression models that
track the increase in the plant's biomass from the initial stage of growth to maturity.
The first chapter gives a detailed introduction to the project, including its goals,
objectives, and scope. The second chapter presents a detailed literature survey. The third
chapter covers the requirements and design and gives the detailed methodology of our project.
The fourth chapter describes the implementation of the work done so far.
technique which will be used by us. Some other techniques will also be used that will be
decided later.
1.3.1 Abbreviations
The most commonly used abbreviations in this document are:
• LAI: Leaf Area Index, the active surface area covered by the plant, defined as leaf area
per unit ground surface area.
• NDVI: Normalized Difference Vegetation Index, which indicates the amount of green matter
in the plant, estimated from the red light the plant absorbs for photosynthesis and the
near-infrared light it reflects.
• EVI: Enhanced Vegetation Index, a vegetation index that improves on NDVI by being more
sensitive in areas of dense vegetation and by reducing atmospheric and soil background
effects.
• PAR: Photosynthetically Active Radiation, the amount of light currently available to the
plant for photosynthesis.
Literature Survey / Related Work 4
Dataset used
The dataset used in the experiment was derived from Sentinel-2 and covered 11-12 years of
information; it was split into training and testing data [16]. Two indices derived from this
data were used, NDVI and PAR, which were preprocessed using sensors such as AVHRR.
The MLR model was trained using different combinations of predictor variables against the
dependent variable, as shown in Figure 2.
The Normalized Difference Vegetation Index (NDVI) is an effective tool for crop yield
estimation. A direct relationship has been found between crop yield and NDVI, since the NDVI
value represents the yield level at each pixel [8]. Spectral vegetation indices such as NDVI
and the Enhanced Vegetation Index (EVI) are used rather than climatic variables because
spectral data is readily available anywhere in the world [9]. Researchers have obtained
satisfactory results with these indices on samples from various seasons. NDVI is used in
various countries both for crop yield prediction and for monitoring crop growth, which is
needed for sound planning and budgeting.
yield estimation and prediction have been carried out using various techniques, including
remote sensing imagery. Extensive research has shown that remote sensing imagery alone did
not achieve efficient prediction because of environmental limitations such as weather, soil
moisture, and rooting zone, which affect the average crop growth cycle [4]. Researchers have
therefore been collecting more information to get better results, and for prediction
purposes the term Leaf Area Index (LAI) was coined, which measures the plant's canopy [3].
The LAI value changes over time, which indicates plant growth, and hence various crop growth
models can be used to estimate the plant's biomass [5].
Dataset used
The dataset used in this experiment was obtained in collaboration with the Bangladesh Bureau
of Statistics (BBS); it spans 8 to 10 years and has more than 2000 entries [18]. The data was
preprocessed and split into categorical attributes with multiple table values. It was then
divided into training and testing sets, and yield prediction was obtained after training.
Neural Network
The ANN model used in this experiment is a multi-layer perceptron. Satellite data is
advantageous for yield estimation because of its temporal resolution; NDVI is often used
because it is based on the visible and infrared regions, and the Enhanced Vegetation Index
(EVI) is also used in various prediction models. Researchers have collected samples from
selected regions of the world, such as Argentina, for use in the statistical prediction
model. The effect of NDVI on plant growth was observed immediately, and it was concluded that
efficiency increased by 0.1%, which means an improvement in the yield production estimate
[11]. The NN model has one layer and is built into a bagging algorithm, using bagging blocks
for training and testing the output, as shown in Figure 4.
Forecasting yield and monitoring crop condition are critical because of the serious economic
and social consequences of food shortage. This study mainly aims to determine the period in
which cereal grain yield can be predicted most accurately using NDVI for regions of Central
Europe. Identifying this period is important because the results of previous studies are
inconsistent and point to different periods as crucial. To study this, a long period was
analysed at intervals of a few days, from early spring to the start of harvest for most
cereal species. This makes it possible to predict cereal grain yield with higher accuracy
and, if possible, well before the harvest.
crop model, and this model took crop type as an input variable rather than using long lists
of variables [14]. The dataset covered by the model had a resolution of 100 m. The results
were satisfactory, so the model can easily be used in other agricultural sectors; the
predicted and output yields are shown in Figure 6.
Different machine learning techniques have been developed, and plant samples were collected
in three fields according to the monthly production over various seasons. For each field,
ten to twelve sample points were taken and results were predicted using the LAI equation [4].
The two most widely used machine learning algorithms are MLR and NN, as shown in Figure 7.
Hyperspectral Imaging
Remote sensing gathers information about a process or an object without direct physical
contact with the object or area under observation, typically on Earth. It is used in fields
such as geography, ecology, and meteorology. Nowadays remote sensing mostly refers to
satellite-based or airborne technology that helps us identify and study specific geographical
areas and distant objects using precise algorithms, and measure the environment around us
based on the signals received [3]. Many scientists have adopted imaging spectrometry, or
hyperspectral imaging. Before the scientific community embraced it, the technique advanced
through developments in electronics, computing, and software during the 1980s and 1990s [4].
the green substance in leaves are in the visible region of the electromagnetic spectrum, at
wavelengths of 400-700 nm; plants show high reflectance values in the near infrared
(700-1300 nm), which gives information about structural properties and biomass. PAR covers
wavelengths from 400 to 700 nm and indicates the amount of light absorbed by plants for
photosynthesis. In the past, models of crop production and yield required an estimate of
canopy leaf area index or radiation absorption; however, direct measurement of LAI or light
absorption can be tedious and time-consuming. The objective of that study was to develop
relationships involving absorbed photosynthetically active radiation (PAR) [12]. Estimates
of crop growth, and therefore yield forecasts, are inaccurate under unfavourable growing
conditions. Plant variables, which also play a significant part in crop growth, can be
derived using optical remote sensing. Through LAI and APAR, optical remote sensing can
provide information on the actual status of agricultural crops during the growing season,
thereby offering the possibility of calibrating the growth model [12]. The net increase in
crop dry matter under non-stress conditions can be modelled with a commonly applied approach
that assumes the amount of dry plant biomass produced is proportional to the intercepted
photosynthetically active radiation (IPAR). The slope of this relationship, the
'radiation-use efficiency', is often assumed to be constant for a crop species [11].
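The proportional relationship between dry biomass and IPAR described above can be illustrated with a short sketch. It assumes daily IPAR values in MJ m^-2 and a constant slope in g MJ^-1; the function name `accumulate_biomass` and all numbers are hypothetical and chosen only for illustration, not taken from [11].

```python
def accumulate_biomass(daily_ipar, rue):
    """Return cumulative dry biomass (g m^-2) after each day.

    Assumes the daily biomass gain is proportional to intercepted PAR:
    gain = rue * ipar, with a constant slope (rue).
    """
    if rue < 0:
        raise ValueError("rue must be non-negative")
    total = 0.0
    series = []
    for ipar in daily_ipar:
        total += rue * ipar  # daily gain under non-stress conditions
        series.append(total)
    return series


if __name__ == "__main__":
    # Hypothetical IPAR values (MJ m^-2 per day) and slope (g per MJ).
    ipar = [5.0, 6.0, 4.0]
    print(accumulate_biomass(ipar, rue=2.0))
```

Because the slope is held constant, the curve depends only on how much radiation the canopy intercepts, which is why LAI (through light interception) is a useful input to such models.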
The Sentinel-2 mission gives a completely new outlook on assessing crop development and
estimating yield with remote sensing techniques [18]. It was used to estimate cotton
production in advance in the US, with Sentinel-2 data from Texas and Georgia. The BEPS
ecosystem model was used to simulate cotton gross primary production (GPP), driven by Leaf
Area Index information derived from Sentinel-2. The simulated GPP values at 20-m grid
spacing were aggregated to a regional level, 17 states in total. The comparison showed that
the simulated GPP explained 85% of the variation in cotton production. The study suggests
that integrating Sentinel-2 Leaf Area Index time series into the ecosystem model gives
reliable estimates of cotton yield [18].
Table 1: Work done on remote sensing data

| No. | Paper | Authors | Year | Dataset | Results | Methodology |
|---|---|---|---|---|---|---|
| 1. | LAI assessment of wheat and potato crops by VENus and Sentinel-2 bands | Ittai Herrmann, Agustin Pimstein, Karnieli, Cohen | 2011 | Spectral data was collected by VENus and Sentinel-2 bands | LAI is spectrally achieved high. | LAI is achieved using red edge spectral bands. |
| 3. | Retrieving leaf area index from SPOT4 satellite data, [1] | M. Aboelghar, S. Arafat, A. Saleh, S. Naeem, M. Shirbeny, A. Belal | 2010 | LAI field measurements collected through LAI-plant canopy analyzer device | Algorithm predicted LAI with 95% confidence for rice varieties. | NDVI-LAI inversion models were generated which give accuracy for three different rice varieties. The models generated are empirical models which are limited to area. |
| 2. | Leaf area index from CHRIS satellite data and applications in plant yield estimation, [2] | AM Smith, Nadeau, J. Freemantle, Hans Wehn | 2005 | Data from Compact High Resolution Imaging Spectrometer | Good relationship between ground based and remote sensing derived LAI. | LAI is successfully estimated using CHRIS satellite data which is used in crop modelling. |
| 4. | Retrieving leaf area index using remote sensing: theories, methods and sensors, [4] | Zheng, Moskal | 2009 | Forest tree species data obtained from remote sensing | Accurate and reliable leaf area index is achieved. | Direct and indirect methods are used for obtaining LAI. |
| 5. | Comparing methods for estimating leaf area index by multi angular remote sensing in winter wheat, [5] | He, Ren, Wang, Liu, Zhang, Liu, Feng, Guo | 2020 | 221 samples were used which were partitioned into two databases. | Satisfactory performance was achieved with coefficient of determination >0.72. | Multi angular remote sensing is used which included four methods for estimating LAI. |
| 6. | Remotely sensed rice yield prediction using multi-temporal NDVI data derived from NOAA's-AVHRR, [6] | Huang, Wang | 2013 | Remote sensed data of rice crops collected in five growing seasons. | Predicted yield of rice was significantly high. | Regression models are developed for NDVI estimation for rice yield estimation given the area of production in five growing seasons. |
| 7. | Yield prediction with machine learning algorithms and satellite images, [7] | Alireza Sharifi | 2020 | Field and meteorological data collected from Sentinel-2. | The accuracy of crop estimation model is achieved highest when the time interval between crop and harvest is smaller. | Multi resource data based estimation model is developed and machine learning algorithm gave satisfactory results with three evaluation indicators. |
| 8. | Normalized difference vegetation index as a tool for wheat yield estimation: A case study for Faisalabad, Pakistan, [8] | Sultana, Ali, Ahmad, Mubeen, M. Zia-ul-Haq, Ahmad | 2014 | Soil samples were used for the experiment. | Nitrogen treatment gave maximum NDVI value. | Split plot design was used as a model to experiment keeping in view the nitrogen rates. |
| 9. | Analysis of relationship between cereal yield and NDVI for selected regions of Central Europe based on MODIS satellite data, [9] | Panek, Gozdowski | 2020 | Remote sensing data for our vegetation seasons was obtained by MODIS sensor. | Strong correlation NDVI and yield was observed. | Linear regression model was developed to study correlation between NDVI and crop yield. |
| 10. | Simulating tropical forage growth and biomass accumulation: an overview of model development and application, [10] | Andrade, Santos, Pezzopane, Araujo, Pedreira, Marin, Lara | 2015 | Local region samples. | Effective biomass accumulation was achieved. | Different empirical models were developed to predict crop growth and biomass accumulation. |
| 11. | Maize Radiation use efficiency under optimal growth conditions, [11] | Lindquist, Arkebauer, Walters, Cassman, Dobermann | 2005 | Maize data was collected over five growing seasons. | Observed values of biomass accumulation and PAR were effective. | Short interval crop growth rate method was used to obtain PAR by using two methods. |
| 12. | Overview of Hyperspectral Image Classification, [12] | Wenjing Lv, Xiaofei Wang | 2020 | Classification into different types and sub types of learning | Different classification methods to be combined for hyperspectral images | Remote sensing techniques are making use of hyperspectral images more widespread. Supervised, semi-supervised and unsupervised classifications have sub categories to represent different techniques for hyperspectral images. |
| 13. | A new deep convolutional neural network for fast hyperspectral image classification, [13] | Paoletti, Haut, Plaza | 2018 | Three-dimensional network using both spatial and spectral information | Reduction of computation time and clearly increased the efficiency | A 3-D neural network is used for the efficiency of CNN and mirror strategy is used for processing image border. This model proved to be better than different ANN techniques. |
| 14. | Hyperspectral imaging using the single-pixel Fourier transform, [14] | Jin, Hui, Wang, Huang, Shi, Ying, Liu, Qing Ye, Zhou, Tian | 2017 | Single-pixel technique. | Accurately constructed image. | Proved to be better than current spectrometry approaches with a wide spectral range of 400-1100nm and spectral resolution of 1nm with less measurement data up to 6.25%. |
| 15. | Remote Sensing For Precision Agriculture: Sentinel-2 Improved Features And Applications, [15] | Segarra, Luisa Buchaillot, Araus, C. Kefauver | 2020 | Sentinel-2 A+B satellite | Better spatial and spectral resolution and wide application ranges. | Sentinel-2 A+B is enhanced version of Sentinel-2 with drastically improved agricultural capabilities, biotic and abiotic factors and higher resolution. |
| 16. | Cotton Yield Estimate Using Sentinel-2 Data and an Ecosystem Model over the Southern US, [16] | Liming He, Georgy Mostovoy | 2019 | Sentinel-2 biophysical data. | 85% variation in cotton yield | GPP values were replicated to the country level at 20m grid spacing. Sentinel-2 provides reliable estimation of cotton yields. |
| 17. | Monitoring urban growth and land use change detection with GIS and remote sensing techniques in Daqahlia governorate Egypt, [17] | Hegazy, Kaloop | 2015 | Land use change detection by GIS. | Built up area increased from 26 to 255km^2 more than 30% and agricultural land reduced by 33% | Information on urban growth, land use and land cover is very effective for local govt and urban planners for enhanced developmental plans. |
| 18. | Surveillance of Arthropod Vector-Borne Infectious Diseases Using Remote Sensing Techniques, [18] | Kalluri, Gilruth, Rogers, Szczur | 2007 | Satellite data and epidemiological data | Status of remote sensing studies of arthropod vector-borne diseases. | The use of remote sensing techniques to map vector-borne diseases has changed meaningfully over the previous 25 years. |
Functional Requirements
The functional requirements that constitute the system are listed below.
• The system will obtain the required Sentinel data by first obtaining the GeoJSON
coordinates using JavaScript.
• The system will preprocess the data using the SNAP toolbox, which builds the raster
images used to calculate the vegetation indices.
• The system will produce LAI and NDVI files in TIFF format.
• The system will extract the corresponding NDVI and LAI values using the Spatial Analyst
toolbox in ArcMap.
• The system will validate the data using a regression model on the LAI and NDVI values.
• The system will fetch the predicted results.
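The first requirement, obtaining GeoJSON coordinates for the area of interest, can be sketched as follows. The report obtains these coordinates with JavaScript; Python is used here only for illustration. The helper name `bbox_to_geojson` and the coordinates in the usage part are hypothetical placeholders, not the actual study boundary.

```python
import json


def bbox_to_geojson(min_lon, min_lat, max_lon, max_lat):
    """Build a GeoJSON Polygon geometry for a bounding box.

    GeoJSON positions are [longitude, latitude]; the ring is closed
    by repeating the first position at the end.
    """
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bounding box is empty or inverted")
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


if __name__ == "__main__":
    # Hypothetical box; replace with the real field boundary.
    aoi = bbox_to_geojson(70.2, 28.3, 70.4, 28.5)
    print(json.dumps(aoi))
```

A geometry of this shape is what image search and download tools typically accept when selecting scenes for an area of interest.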
Non-Functional Requirements
The non-functional requirements listed below support the efficient working of the system.
3.2.1 Reusability
The system will be reusable: it provides a consistent, workable environment every time the
user runs it. The system components should be modular enough to produce correct results.
3.2.2 Reliability
The system will be reliable and will provide immediate, correct results for each type of
crop. It will be self-explanatory so that the user can understand it easily.
3.2.3 Extensibility
The system will be extensible so that researchers can build on it for other research in the
agriculture sector. It will provide a well-organized interface so that researchers benefit
from it.
3.2.4 Performance
The system will provide accurate results and perform well on any kind of crop data. It will
run efficiently on any kind of computer system so that crop data is extracted and processed
well.
3.2.5 Robustness
The system will process data efficiently without losing any important information and shall
remain unaffected if any damage occurs.
Requirements and Design 20
System Architecture
3.4.1 Architecture Diagram
SNAP software which has the built-in functionality of producing the result in GeoTIFF/TIFF
format files.
$$\mathrm{NDVI}_{\lambda_1,\lambda_2} = \frac{\lambda_2 - \lambda_1}{\lambda_2 + \lambda_1} \tag{1}$$
Training Model
Once the data has been finalized and passed through the above steps, the next step is to train
the model, which is selected on the basis of results reported in different research papers. The
most widely used model is the regression model, a machine learning algorithm.
Analyze Results
After training the model, the results are analyzed using RMSE, standard deviation, variance
and other statistical measures, which provide the required information and help check whether
the model is overfitting.
Display Results
After the training and testing data have been passed to the model and the predictions have
been made, the system prints the required results to the console so that the user can follow
them.
Yield Estimation
This section covers the calibration of yield results. The report focuses on crop yield
prediction, a step-wise procedure carried out using the linear regression method. It is
important to implement a remote sensing-based framework for assessing crop development
alongside field estimates of the crop’s biophysical parameters. This study aims to predict
the expected yield of rice-growing areas using Sentinel-2 satellite data. As discussed in
previous chapters, the main methodology involves finding the vegetation indices that are the
best estimators for forecasting the yield. After an in-depth analysis, we narrowed the choice
down to the Leaf Area Index (LAI) as the direct estimator for yield prediction, as it proved
to be the strongest predictor. The vegetation indices used for crop yield
estimation are:
The above figure lists the different vegetation indices and the formulas used to calculate
them for crop yield estimation. Previous work used a variety of indices for yield estimation;
an in-depth analysis of those indices shows that LAI outperforms all of them, and it is also
the key variable in evaluating a plant’s biomass. Linear regression models were used to assess
which index performs best, and after a detailed analysis we chose LAI for crop yield
estimation. Several articles report that, among the vegetation indices, LAI, which is derived
from the red and near-infrared bands, gave the best results for yield prediction. Linear
regression models the relationship between an independent and a dependent variable and is
used to predict the value of one variable from another. The following sections describe which
variables are used and how the yield is predicted.
Implementation
The following are the details of how we gathered our data, processed it and applied the chosen
algorithms to obtain the outputs.
Implementation
The following is the detailed implementation of the work we have done so far; the sections
after it explain the work further.
The method covers how to obtain the remotely sensed data and then how to apply regression
analysis to that data to get accurate yield results. Rice and cotton yield data were considered
for this report. The data was collected from the official Sentinel Hub website, which provides
real-time data. Data can be acquired on the basis of various agricultural indices, including
atmospheric and environmental factors, and these factors give a clear indication of the
conditions at the crop site. The target area is Rahim Yar Khan, and its geographical
coordinates are obtained from the GeoJSON shown in the image below. Once the Area of
Interest (AOI) is marked, the data is collected using the time filter available on the website;
the data can also be visualized for convenience. The data is then downloaded, which requires
a large amount of disk space.
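As an illustration of what a GeoJSON area of interest looks like, the following minimal sketch builds a rectangular polygon in Python. The coordinates are a hypothetical bounding box placed roughly around Rahim Yar Khan for illustration only; they are not the AOI used in this project. GeoJSON stores each position as [longitude, latitude], and the first and last positions of a polygon ring must be identical.

```python
import json

# Hypothetical bounding box (illustrative only, not the project's actual AOI).
# GeoJSON positions are ordered [longitude, latitude].
aoi = {
    "type": "Feature",
    "properties": {"name": "AOI (illustrative)"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [70.20, 28.35],
            [70.40, 28.35],
            [70.40, 28.50],
            [70.20, 28.50],
            [70.20, 28.35],  # ring is closed: same as the first position
        ]],
    },
}

ring = aoi["geometry"]["coordinates"][0]

# Serialize to text, e.g. for pasting into a browser-based AOI tool.
geojson_text = json.dumps(aoi, indent=2)
print(geojson_text)
```

A closed ring with explicit longitude/latitude order avoids the most common mistake when marking an AOI, which is swapping the two coordinates.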
Our scope is focused on acquiring the vegetation indices, especially the Normalized Difference
Vegetation Index (NDVI), which is a useful tool for calibrating the Leaf Area Index (LAI). These
vegetation indices are based on reflectance in different bands, which is collected from
real-time data, and useful information is extracted using satellite imagery processing
toolboxes such as SNAP and ArcMap. Our research focuses on the area of Rahim Yar Khan, and
the data was collected from the start of the Kharif season (April 2020-June 2020) until its
end. The acquired data is in the “tiff” format, which carries additional information associated
with the data, such as map projections. The exact NDVI values are extracted with the SNAP
toolbox, which uses different bands to calculate NDVI and gives the range of NDVI values
within the real-time data. Once the exact values are determined, they are used to calculate the
LAI value, which is the best estimator of a plant’s biomass and is further used in the yield
regression analysis. To collect the data, different geographical points have to be used in
order to get exact crop data. NDVI can be obtained in several formats, for example as a pure
image in which different color levels distinguish the NDVI values. We have used the exact
NDVI values for our project; this data is of high importance and will further be used with the
real-time crop data. The advanced search facility in EO Browser helped us obtain data with
0% cloud coverage.
$$\mathrm{NDVI} = \frac{NIR - RED}{NIR + RED} \tag{2}$$
Here, RED is the reflectance in the red wavelength and NIR is the reflectance in the
near-infrared wavelength. We use Sentinel-2 imagery to calculate NDVI.
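The NDVI calculation from the RED and NIR reflectances can be sketched with NumPy as follows. This is a minimal illustration, not the SNAP processing used in the project: the function name `ndvi` and the reflectance values are hypothetical. For Sentinel-2, the red and near-infrared bands are commonly taken as B4 and B8.

```python
import numpy as np

def ndvi(red, nir):
    """Compute NDVI = (NIR - RED) / (NIR + RED) element-wise."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    # Pixels where both bands are zero (e.g. no-data) are left as NaN
    # instead of triggering a division by zero.
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical reflectance values for a 2x2 patch (not real Sentinel-2 data).
red_band = np.array([[0.10, 0.20], [0.05, 0.00]])
nir_band = np.array([[0.50, 0.20], [0.45, 0.00]])

ndvi_patch = ndvi(red_band, nir_band)
print(ndvi_patch)
```

Dense green vegetation reflects strongly in NIR and absorbs red light, so its NDVI approaches 1, while bare soil gives values near 0.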
To pre-process the Sentinel-2 images, NDVI is calculated for each crop separately.
Sentinel-2 provides multispectral images; in this step, the crop’s spectral information was
collected and statistical analyses were performed. We downloaded the Sentinel-2 imagery from
the Sentinel-2 website for pre-processing, selected the images without clouds and transformed
The core purpose of extracting the data from QGIS is to obtain the training and testing data,
which is exported to CSV or XLSX format. QGIS has a plugin called “Point Sampling Tool”,
which is first installed and then used to sample the required data at the points given as
input to this module.
Y=a+bX (3)
Other machine learning models are complex and computationally expensive for a dataset of our
size. Neural networks are expensive to train and require a very large dataset, which makes
them slow. SVM, as discussed above, is not sensitive enough to detect anomalies in the data.
This left us with regression techniques, among which multiple linear regression is
particularly useful because it can incorporate multiple independent variables to predict the
yield. Normalized Difference Vegetation Index (NDVI) statistics are used to assess crop
condition and to forecast yield and production in various regions across the globe. Remote
sensing and spectral reflectance provide information about different vegetation indices,
including NDVI. NDVI relies on the fact that actively growing plants absorb radiation in the
visible part of the spectrum while strongly reflecting radiation in the near-infrared region.
NDVI is calculated with the help of different approximation equations. The data is processed
and used to approximate the yield. Linear regression and correlation techniques are used to
establish a relationship between yield and NDVI. There is a very strong correlation between
NDVI and crop yield: the two turned out to be strongly positively correlated at the stem
growth, striking and maturity stages. The NDVI values were obtained from real-time satellite
data. Regression equations and the coefficient of determination were used to assess the linear
regression. The lowest NDVI values occurred in March and the highest in June.
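The linear relationship Y = a + bX between NDVI and yield, together with its coefficient of determination, can be sketched with SciPy as follows. This is a minimal illustration under stated assumptions: the NDVI and yield values are hypothetical, not the project's data, and `scipy.stats.linregress` is one possible way to fit the line.

```python
import numpy as np
from scipy import stats

# Hypothetical NDVI / yield pairs, for illustration only.
ndvi_values = np.array([0.35, 0.42, 0.50, 0.58, 0.66, 0.73])
yield_values = np.array([1.9, 2.3, 2.8, 3.1, 3.7, 4.0])  # e.g. tonnes per hectare

# Ordinary least-squares fit of Y = a + bX.
result = stats.linregress(ndvi_values, yield_values)
a = result.intercept
b = result.slope

# For simple linear regression, R^2 is the squared correlation coefficient.
r_squared = result.rvalue ** 2

print(f"Y = {a:.3f} + {b:.3f} X, R^2 = {r_squared:.3f}")
```

A positive slope b together with an R^2 close to 1 corresponds to the strong positive correlation described above.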
Libraries used
• CSV module for file reading and writing.
• NumPy module for scientific computation.
• Matplotlib module for creating figures and graphs.
• Scikit-learn module for machine learning algorithms.
• SciPy module for computational purposes.
To preprocess the data, we used the “reshape” function in Python, which changes the shape of
an array of loaded data without changing the data values themselves. The data was divided
into training and testing sets in a suitable ratio for prediction. Python is widely used for
machine learning, and the “sklearn.linear_model” module was used to implement the Linear
Regression model. The following piece of code shows the preprocessing of data.
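A minimal sketch of this preprocessing step, assuming a CSV file with hypothetical columns `ndvi` and `lai`, is given below. The sample values are invented for illustration, the CSV is read from an in-memory string so that the example is self-contained, and the 30% test share is an assumption that mirrors the 70-30 split reported later for the yield data.

```python
import csv
import io

import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the CSV exported from QGIS; column names and values are hypothetical.
csv_text = """ndvi,lai
0.31,1.2
0.38,1.6
0.44,2.0
0.49,2.3
0.55,2.7
0.61,3.1
0.66,3.4
0.72,3.9
0.78,4.3
0.83,4.6
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
ndvi = np.array([float(row["ndvi"]) for row in rows])
lai = np.array([float(row["lai"]) for row in rows])

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
# reshape only changes the shape of the array, not its values.
X = ndvi.reshape(-1, 1)
y = lai

# Assumed 70/30 split; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(X.shape, X_train.shape, X_test.shape)
```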
Now that the required training and testing data has been loaded, we use it for regression
analysis, which gives the predicted LAI value for each actual LAI value. The training data is
passed to the fit function of the model, which then makes predictions on the testing data.
The following piece of code illustrates the required functionality.
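The fit and predict steps can be sketched as follows. This is an illustrative example, not the project's original listing: the NDVI and LAI values are synthetic, generated from an assumed linear relation with noise, and the 70/30 split and variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: an assumed linear NDVI-LAI relation plus noise (not real data).
rng = np.random.default_rng(0)
ndvi = rng.uniform(0.2, 0.9, size=50)
lai = 0.5 + 5.0 * ndvi + rng.normal(0.0, 0.2, size=50)

X = ndvi.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, lai, test_size=0.3, random_state=0
)

# Fit on the training data only, then predict LAI for the held-out test data.
model = LinearRegression()
model.fit(X_train, y_train)
lai_pred = model.predict(X_test)

print("intercept a =", model.intercept_)
print("slope b =", model.coef_[0])
```

Keeping the test data out of `fit` is what allows the predicted values to be compared fairly with the actual ones.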
After training the model, we passed the testing data to it to get the desired results. There
are numerous relationships between LAI and different vegetation indices. Most of them are
based on the Normalized Difference Vegetation Index, as it shows the greenness of a plant and
is more accurate in determining plant health and estimating yield. We used NDVI to calculate
our LAI values and, from them, to estimate the yield. The more accurate the LAI is, the closer
we are to an accurate yield prediction. Both vegetation indices are affected by temperature,
humidity, water, soil, etc. The following piece of code shows the actual and
predicted values.
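A minimal sketch of printing actual against predicted LAI values with a Random Forest Regressor is given below. The data is synthetic and the value `n_estimators=100` is an assumption chosen for illustration; the original listing is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic NDVI-LAI data for illustration only.
rng = np.random.default_rng(1)
ndvi = rng.uniform(0.2, 0.9, size=60)
lai = 0.5 + 5.0 * ndvi + rng.normal(0.0, 0.2, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    ndvi.reshape(-1, 1), lai, test_size=0.3, random_state=1
)

# n_estimators is the number of decision trees in the forest (assumed value).
forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)
lai_pred = forest.predict(X_test)

# Show each actual value next to its prediction.
for actual, predicted in zip(y_test, lai_pred):
    print(f"actual LAI = {actual:.2f}   predicted LAI = {predicted:.2f}")
```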
Here, n_estimators sets the number of trees built in the forest. After training the model, we
passed the testing data to it to get the desired results. The actual and predicted LAI values
are displayed in the following figure.
The dependent and independent variables are determined with the help of regression equations,
where the LAI values depend on different vegetation indices, including NDVI. This relation is
determined with the help of different equations. After this, the yield is estimated from LAI.
Higher LAI values indicate a higher predicted yield for a given crop. NDVI and LAI are
strongly positively correlated with each other. The coefficient of determination R2 gives the
proportion of variation in the dependent variable that can be explained by the independent
variable. If the root mean square error (RMSE) is zero, the predicted and measured values are
identical; otherwise they differ. For a good result, the RMSE is expected to be between 10%
and 20%.
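The two measures can be computed as in the following minimal sketch. The actual and predicted values are hypothetical, and expressing RMSE as a percentage of the mean actual value is an assumption about how the 10% to 20% range is meant.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values, for illustration only.
actual = np.array([2.0, 3.0, 4.0, 5.0])
predicted = np.array([2.5, 2.5, 4.5, 4.5])

# RMSE: square root of the mean squared difference.
rmse = np.sqrt(mean_squared_error(actual, predicted))

# R^2: share of the variance in the actual values explained by the predictions.
r2 = r2_score(actual, predicted)

# Assumed interpretation: RMSE relative to the mean of the actual values, in percent.
relative_rmse = 100.0 * rmse / actual.mean()

print(f"RMSE = {rmse:.3f}, R^2 = {r2:.3f}, relative RMSE = {relative_rmse:.1f}%")
```

In this example every prediction is off by 0.5, so the RMSE is 0.5, which is about 14% of the mean actual value of 3.5.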
The LAI and yield were calculated using different instruments, and the relevant methods are
applied location-wise. To assess the difference between the calculated yield and the actual
yield, the root mean square error (RMSE) is used, which indicates whether the predicted yield
agrees with the validated yield. Similarly, the coefficient of determination relates the actual
and predicted values. In this case, ground-based and satellite-based LAI and yield values are
obtained, which show whether the linear regression model is performing well. Thus, for this
crop type, a linear regression model is developed that gives the highest coefficient of
determination, which indicates that the predicted value is close to the real one. The remotely
sensed values of LAI help to predict the crop yield; past research has observed that LAI
outperforms the other vegetation indices for estimating crop yield, and that its value
provides information about the yield one month prior to harvest, which significantly reduces
labor-intensive manual work. This section summarizes how to calculate correct yield estimates
for proper results.
To preprocess the data, we used the “reshape” function, which reshapes an array of data
without changing the actual data. The data was divided in a 70-30% ratio for training and
testing respectively. The “sklearn.linear_model” module is responsible for implementing the
regression models. Following is the code for all this work.
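The 70-30% split of the yield data can be sketched as follows. The LAI and yield values are synthetic and the variable names are hypothetical; only the reshape step and the split ratio follow the description above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic LAI values and an assumed linear yield relation (illustration only).
lai = np.linspace(1.0, 6.0, 20)
crop_yield = 0.5 + 0.8 * lai

# One feature per sample: reshape the 1-D array into a column.
X = lai.reshape(-1, 1)

# test_size=0.3 keeps 30% of the samples for testing and 70% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, crop_yield, test_size=0.3, random_state=42
)

print("training samples:", len(X_train))
print("testing samples:", len(X_test))
```

With 20 samples this gives 14 training and 6 testing samples, and every original value ends up in exactly one of the two sets.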
With 70% of the data used for training and 30% for testing, the loaded data is used for
regression analysis, which provides predicted yield values that we can compare with the actual
yield values. The above section illustrated the prediction of LAI using Linear Regression and
Random Forest Regression. We now show how to build and train a Random Forest Regressor for
calculating yield in a similar manner. All the modules for retrieving and preprocessing data
remain the same except for the model, which is shown in the following piece of code.
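A minimal sketch of a Random Forest Regressor that predicts yield from LAI is given below. The data is synthetic, and the number of trees, the random seeds and the variable names are assumptions made for illustration rather than the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic LAI-yield data: an assumed linear relation plus noise (not real data).
rng = np.random.default_rng(7)
lai = rng.uniform(1.0, 6.0, size=80)
crop_yield = 0.5 + 0.8 * lai + rng.normal(0.0, 0.15, size=80)

# Same preprocessing as before: reshape to a column and split 70/30.
X_train, X_test, y_train, y_test = train_test_split(
    lai.reshape(-1, 1), crop_yield, test_size=0.3, random_state=7
)

# Only the model changes: a random forest instead of linear regression.
yield_model = RandomForestRegressor(n_estimators=100, random_state=7)
yield_model.fit(X_train, y_train)
yield_pred = yield_model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, yield_pred))
print(f"RMSE on the test data = {rmse:.3f}")
```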
The data was preprocessed and then passed to the Random Forest Regression model for training.
Predicted values were obtained using the testing data. The result, predicted yield vs. actual
yield, is shown below.
Yield has been calculated using LAI values, which are considered one of the most important
parameters for yield prediction. Many other factors may affect the yield, but the results that
can be achieved using LAI make it a reliable parameter. Previous studies on this topic
reported good results, but external factors such as temperature, rain, vegetation
supersaturation, soil water content and nutrients affect the yield even when it has been
predicted correctly. Prediction can be improved if these factors are taken into consideration.
Yield prediction is important for farmers as well as the government: it can prevent major
losses, help prepare for food shortages and help estimate the surplus available for export.
This section shows the results of yield prediction using LAI by the Random Forest Regression
method.
Conclusion
The growing population has increased the demand for food all across the world. It has become
necessary to predict crop yield beforehand so that crop production can be planned. In this
way, farmers and agricultural companies can estimate how much biomass is required for each
type of crop and how much yield it will produce. It also helps in preparing for natural
disasters, since crops can be stored in advance. In this project, we used remote sensing
techniques to predict the yield. Various works have predicted yield using data from different
satellites. Our study focused on using the Sentinel-2 satellite to obtain our real-time data.
Previous works show that researchers have applied different artificial intelligence and
machine learning models to calculate the yield. Most papers use different vegetation indices
to estimate the yield. These indices often depend upon each other, and multiple equations have
been used in different papers in this regard.
We have successfully used the Normalized Difference Vegetation Index (NDVI) and Leaf Area
Index (LAI) to determine the yield, and we tried to improve the accuracy of the results by
reducing the root mean square error between the original data and the satellite data. In this
way, better results can be obtained and our objectives achieved. Farmers will be able to learn
from this the best time to harvest their crops for maximum yield. The challenges we faced were
mostly related to data collection and improving the accuracy of the model. We incorporated
concepts of artificial intelligence and machine learning in this project, which provided a
platform for solving this complex problem along with the underlying problems we encountered
during the project setup. There were some obstacles, such as getting access to some of the
software and applications needed to achieve the results.
Researchers planning to carry our work forward will need to research thoroughly how to obtain
the dataset. Moreover, although previous crop yield prediction models have achieved similar
results using NN or regression techniques, there are possible underlying limitations such as
the sensitivity of the data and the crop type. This research was based solely on crop findings
for cotton and wheat under Pakistan’s atmospheric conditions. If the data had been available
free of cost and had met the given optimum conditions, the research findings could have been
improved considerably.
References 36
References
[1] I. Herrmann, A. Pimstein, A. Karneili and Y. Cohen, "Research Gate," August 2011.
[Online]. Available:
https://www.researchgate.net/publication/256850034_LAI_assessment_of_wheat_and_
potato_crops_by_VENmS_and_Sentinel-2_bands. [Accessed 20 September 2020].
[2] R. N. Sahoo, S. S. Ray and M. K. R, "Research Gate," March 2015. [Online]. Available:
https://www.researchgate.net/publication/273321445_Hyperspectral_remote_sensing_
of_agriculture. [Accessed 20 September 2020].
[3] W. Lv and X. Wang, "Overview of hyperspectral image classification," 2020.
[4] J. Segerra, M. L. Buchaillot, J. L. Araus and S. C. Kefauver, "Remote sensing for
precision agriulture: Sentinal-2 improved features and applications," Agronomy Journal,
2020.
[5] S. jin, w. hui, Y. Wang, K. Huang, Q. Shi, C. Ying, D. Liu, Q. Ye, W. Zhou and J. Tian,
"Hyperspectral imaging using the single pixel fourier transform technique," 2017.
[6] M. Aboelghar, S.Arafat, A.Saleh, S.Naeem, M.Shirbeny and A.Belal, "Retrieving leaf
area iedx from SPOT4 satellite data," The Egyptian Journel of Remote Sensing and
Space Science, vol. 13, no. 2, pp. 121-127, 2010.
[7] A. M. Smith, C. Nadeau, J. Freemantle and H. Wehn, "Leaf area index from CHRIS
satellite data and applications in plant yield estimation," 2005.
[8] S. J. Maas, "Using satellite data to improve model estimates of crop yield," Agronomy
journal, vol. 80, no. 4, 1988.
[9] G. Zheng and L. M. Moskal, "Retrieving leaf area index using remote sensing: Theories,
mothods and sensors," 2009.
[10] L. He, X. Ren, Y. Wang, B. Liu, H. Zhang, W. Liu and W. Feng, "Comparing methods
for estimating leaf area index by multiangular remote sensing in winter wheat," 2020.
[11] J. Huang, X. Wang, X. Li, H. Tian and Z. Pan, "Remotely sensed rice yield prediction
using multi-temporal NDVI data derived frolm NOAA's-AVHRR," 2013.
[12] A. Sharifi, "Yield prediction with machine learning algorithms and satellite images,"
Journal of the science of food and agriculture, 2020.
[13] S. R. Sultana, A. Ali, A. Ahmed, M. Mubeen, M. Z. u. Haq and S. Ahmad, "Normalized
difference vegetation index as a tool for wheat yield estimation: A case study from
Faisalabad, Pakistan," vol. 2014, 2014.
[14] E. Paneka and D. Gozdowski, "Analysis of relationship between cereal yield and NDVI
for selected regions of central Europe based on MODIS satellite data," 2020.
[15] A. S. Andrade, P. M. Santos, J. R. M. Pezzopane, L. C. d. Araujo, B. C. Pedreira, C. G.
S. Pedreria, F. Marin and M. Lara, "Simulating tropical forage growth and biomass
accumulation: An overview of model development and application," Grass and forage
science, vol. v71, no. 1, 2015.
[16] J. L. Lindquist, T. J. Arkebauer, D. T. Walters, K. G. Cassman and A. Dobermann,
"Maize radiation use efficiency under optimal growth conditions," Agronomy Journal,
vol. v97, no. 1, 2005.
[17] M. E. Paoletti, J. M. Haut, J. Plaza and A. Plaza, "A new deep convolutional neural
network for fast hyperspectral image classification," ISPRS Journal of Photogrammetry
and remote sensing, vol. 145, pp. 120-147, 2018.
[18] L. He and G. Mostovoy, "Cotton Yield estimate using Sentinal-2 data and an ecosystem
model for the southren US," Remote Sensing, 2019.
Forecasting the yield using biomass calculated from satellite images 37
[19] S. Kalluri, P. Gilruth, D. Rogers and M. Szczur, "Surveillance of anthropod vector borne
infectious diseases using remote sensing techniques: A review," 2007.
[20] I. R. Hegazy and M. R. Kaloop, "Monitoring Urban growth and land use change
detection with GIS and remote sensing techniques in Daqahlia governorate Egypt,"
International Journal of sustainable built enviornment, vol. 4, no. 1, pp. 117-124, 2015.