
National University of Computer and Emerging Sciences

Forecasting the yield using biomass calculated from satellite images

Ayesha Asad…………L17-4203
Eiman Waheed………L17-4202
Salman Ahmed…………L17-4286

Supervisor: Dr. Irfan Younas

B.S. Computer Science


Final Year Project
December 2020
Anti-Plagiarism Declaration
This is to declare that the above publication produced under the:

Title: Forecasting the yield using biomass calculated from satellite images.
is the sole contribution of the author(s) and no part hereof has been reproduced on an as-is basis
(cut and paste) which can be considered as plagiarism. All referenced parts have been used to
argue the idea and have been cited properly. I/We will be responsible and liable for any
consequence if a violation of this declaration is determined.

Date: 25-06-2021
Student 1
Name: Ayesha Asad

Signature:

Student 2
Name: Eiman Waheed

Signature:

Student 3
Name: Salman Ahmed

Signature:
Table of Contents
Table of Contents
List of Tables
List of Figures
Abstract
1 Introduction
1.1 Goals and Objectives
1.2 Scope of the Project
1.3 Definitions, Acronyms, and Abbreviations
1.3.1 Abbreviations
2 Literature Survey / Related Work
2.1 Yield Prediction using Machine Learning Algorithms
2.1.1 Yield Prediction using Multiple Linear Regression
2.1.2 Yield Prediction using Neural Network
2.1.3 Yield Prediction using Support Vector Machine
2.1.4 Yield Prediction using Random Forest
2.2 Hyperspectral Imaging
2.2.1 Vegetation Indices derived from hyperspectral images
2.2.2 Machine Learning for Remote Sensed Data
2.3 Literature Review Summary Table
3 Requirements and Design
3.1 Functional Requirements
3.2 Non-Functional Requirements
3.2.1 Reusability
3.2.2 Reliability
3.2.3 Extensibility
3.2.4 Performance
3.2.5 Robustness
3.3 Hardware and Software Requirements
3.3.1 Hardware Requirements
3.3.2 Software Requirements
3.4 System Architecture
3.4.1 Architecture Diagram
3.4.2 System Modules
3.5 Yield Estimation
4 Implementation
4.1 Implementation
4.1.1 Data collection
4.1.2 Preprocessing of Data
4.1.3 Building and Training Machine Learning Model
5 Experimental Results and Analysis
5.1 Validation of LAI prediction using Random Forest Regression
5.2 Validation of Yield prediction using LAI by Random Forest Regression
6 Conclusion
References
List of Tables
Table 1: Work done on remote sensing data
List of Figures
Figure 1: MLR equation
Figure 2: Predictor Variables Combinations
Figure 3: Relationship between NDVI and Yield
Figure 4: Artificial Neural Network
Figure 5: crop prediction results
Figure 6: Yield graph
Figure 7: List of ML algorithms
Figure 8: Schematic Diagram of 3D-CNN
Figure 9: Setup of the Fourier transform
Figure 10: System Architecture
Figure 11: List of Vegetation Indices
Figure 12: Rahim Yar Khan Area
Figure 13: GeoJSON coordinates
Figure 14: Data collection in EO browser
Figure 15: Data Layer in QGIS
Figure 16: Point Sampling Tool
Figure 17: Loading Data Module
Figure 18: Preprocessing Data Module
Figure 19: Training Data Module
Figure 20: Output module
Figure 21: Training Data Module
Figure 22: Output module
Figure 23: Loading Data Module
Figure 24: Preprocessing Data Module
Figure 25: Training Data Module
Figure 26: Output Module
Forecasting the yield using biomass calculated from satellite images 1

Abstract
Population growth has increased the need to plan crop production in advance and to forecast crop
yields from biomass. Yield forecasting is essential, particularly in countries whose economies
depend mainly on agriculture, because it helps anticipate reductions in crop yield so that imports
and exports can be planned effectively. Estimating yield before harvest using satellite remote
sensing techniques has therefore become very important. Our project focuses on predicting how much
a crop will yield from different vegetation indices derived from satellite images. For this
purpose, we evaluated different techniques and scenarios and computed time-series data of
vegetation indices, which are then used to estimate yield. Deriving vegetation indices and
tracking their values over the season is an effective way to assess a crop's future yield. We used
the Sentinel-2 satellite to collect data for the vegetation indices. In remote sensing, a
vegetation index summarizes the vegetation biomass in each pixel of an image and is computed from
the reflectance in several spectral bands. We calculated the Normalized Difference Vegetation
Index (NDVI), which indicates the greenness of a plant and helps in estimating yield. We also
calculated the Leaf Area Index (LAI), the leaf area per unit ground area, with the help of NDVI,
using the SNAP toolbox, an efficient software package for processing satellite data. After
implementing this methodology, we predicted yield using Linear Regression and Random Forest
Regression.
1 Introduction
Agriculture is a major economic sector in Pakistan, and much of the country’s economy depends on
it. To increase knowledge and productivity, conventional agricultural practice needs a means of
forecasting crop yield, so that planners and administrators can formulate budgets and policies in
advance and thereby help regulate the country’s economy [1, 2]. The economy is under stress, and
budgets must be allocated effectively and without waste; advance prediction of crop yield supports
this and can bring direct economic benefit. Pakistan is one of the largest producers of several
crops, and many people’s incomes depend on farming, so advance knowledge of yield can be highly
profitable for the country. Developing computer-based applications for this purpose is therefore
economically worthwhile.
Our project helps administrators and decision makers formulate policies and budgets by providing
an application that forecasts crop yield from real-time remote sensing imagery. Recent advances
in technology have enabled smart agriculture elsewhere in the world. Using artificial
intelligence, yield can be estimated at different intervals before the crop is harvested.
Satellite remote sensing offers a unique view of the condition of, and ongoing changes in, land,
coastal, and ocean ecosystems. Our application uses remote sensing imagery to track the increase
in a plant’s biomass from the initial growth stage to maturity, and predicts crop yield from it
using different regression models.
The first chapter introduces the project, including its goals, objectives, and scope. The second
chapter presents a detailed literature survey. The third chapter covers requirements and design,
including the detailed methodology of our project. The fourth chapter describes the
implementation of the work done so far.

1.1 Goals and Objectives


Yield forecasting matters at many levels: to stakeholders, farmers, decision makers, and policy
makers alike. It has long been a challenge for researchers and developers because of the lack of
real-time data and the influence of various environmental factors [1]. Our project aims to make
yield forecasting from biomass easier and has the following objectives:
- Apply remote sensing imagery to different regression models to predict crop yield in time.
- Reduce the overall cost to the agricultural community of assessing the state of the crop
during cultivation.
- Limit crop losses due to various calamities and identify the areas planted with a specific
crop using remote sensing.

1.2 Scope of the Project


The scope of our project covers several techniques. We will modify typical agricultural practice
by introducing a means of forecasting crop yield, so that planners and farmers can formulate
budgets and policies in advance, to the benefit of agriculture in Pakistan. We will use
techniques such as hyperspectral imaging to forecast crop yields. Vegetation indices and the Leaf
Area Index (LAI) are key factors in calculating plant yield and will also be used in this
project, as will the widely used Normalized Difference Vegetation Index (NDVI). Any further
techniques will be decided later.

1.3 Definitions, Acronyms, and Abbreviations


Hyperspectral Images: Hyperspectral images are images in which a continuous spectrum is measured
for each pixel. The spectral resolution is normally given in nanometers or wavenumbers.
Remote Sensing: Remote sensing is the process of detecting and monitoring the physical
characteristics of an area by measuring its reflected and emitted radiation at a distance
(typically from satellite or aircraft). Special cameras collect remotely sensed images, which
help researchers “sense” things about the Earth.
Neural Network: Neural networks are a class of machine learning algorithms inspired by the
activity of the human brain. They consist of layers, including an input layer, one or more hidden
layers, and an output layer.
Multiple Linear Regression: Multiple linear regression is a statistical technique, widely used in
machine learning, that predicts an outcome from several predictor variables. It extends simple
linear regression, which uses only one predictor variable.
Random Forest: Random forest is an ensemble method for regression, classification, and other
learning tasks that combines the predictions of multiple decision trees, each trained on a random
sample of the training data.

1.3.1 Abbreviations
The abbreviations most commonly used in this document are:
• LAI: Leaf Area Index, which describes the active surface area covered by the plant and is
defined as leaf area per unit ground surface area.
• NDVI: Normalized Difference Vegetation Index, which indicates the amount of green matter in the
plant, based on how strongly it absorbs visible (red) light for photosynthesis and reflects
near-infrared light.
• EVI: Enhanced Vegetation Index, a vegetation index designed to be more sensitive than NDVI over
dense vegetation and less affected by atmospheric and soil-background effects.
• PAR: Photosynthetically Active Radiation, the amount of light (400-700 nm) currently available
to the plant for photosynthesis.
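The NDVI and EVI definitions above can be made concrete with their standard band formulas. The sketch below is illustrative only: it assumes surface-reflectance inputs scaled to 0-1 and, for Sentinel-2, takes near-infrared, red, and blue from bands B8, B4, and B2. The function names are our own, and the EVI coefficients are the commonly used MODIS values, not necessarily those used in this project.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red), in [-1, 1]."""
    denom = nir + red
    if denom == 0:
        return 0.0  # no signal in either band; treated here as non-vegetated
    return (nir - red) / denom


def evi(nir: float, red: float, blue: float,
        g: float = 2.5, c1: float = 6.0, c2: float = 7.5, l: float = 1.0) -> float:
    """Enhanced Vegetation Index with the commonly used MODIS coefficients (assumed)."""
    return g * (nir - red) / (nir + c1 * red - c2 * blue + l)


# Hypothetical reflectances for one Sentinel-2 pixel (B8 = NIR, B4 = red, B2 = blue).
b8, b4, b2 = 0.40, 0.10, 0.05
print(round(ndvi(b8, b4), 3))      # prints 0.6
print(round(evi(b8, b4, b2), 3))   # prints 0.462
```

Healthy vegetation reflects strongly in NIR and absorbs red light, so a pixel like the one above gives a high NDVI, while bare soil or water gives values near zero or below.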
2 Literature Survey / Related Work


This chapter reviews previous research related to our topic and describes each technique in
detail.

2.1 Yield Prediction using Machine Learning Algorithms


Yield can be predicted from biomass, which reflects plant vigor. Several indices are used to
estimate biomass, and hence field production, including the leaf area index, biomass
accumulation, and NDVI; these are discussed in detail in the following sections. Yield estimates
can be produced with a variety of direct and indirect methods. This section describes previous
research that uses vegetation indices with machine learning approaches to predict crop yield.

2.1.1 Yield Prediction using Multiple Linear Regression


Crop yield can be predicted using linear regression, a technique widely used in the research
literature. Vegetation indices such as LAI, NDVI, and EVI serve as predictor variables for
estimating yield. The leaf area index (LAI), defined as the total leaf area per unit ground area,
is a significant parameter for estimating a plant’s biomass and an essential characteristic of
its canopy. LAI is an important input to crop yield estimation models and allows productivity to
be predicted early and efficiently [3]. Linear regression is a statistical method that models the
relationship between a dependent variable and one or more independent variables.
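As a minimal sketch of the linear-regression idea (not the exact procedure of any cited study), the least-squares line for a single predictor has a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The NDVI and yield values below are invented for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical field-level data: seasonal mean NDVI vs. yield (t/ha).
ndvi_values = [0.45, 0.52, 0.60, 0.68, 0.75]
yields = [2.9, 3.3, 3.8, 4.2, 4.6]
a, b = fit_simple_linear(ndvi_values, yields)
print(f"yield = {a:.2f} + {b:.2f} * NDVI")
```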

Dataset used
The dataset used in the experiment [16] was derived from Sentinel-2, covered 11-12 years, and was
split into training and testing data. Two vegetation indices, NDVI and PAR, were derived from this
data and preprocessed using data from sensors such as AVHRR.

Multiple Linear Regression Model


The model developed in this experiment had one dependent variable and multiple independent
variables; Figure 1 shows the MLR equation.

Figure 1: MLR equation


MLR equation for multiple predictor variables.
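For reference, the standard form of the MLR equation, which we assume is what Figure 1 depicts, is given below. Here y is the dependent variable (for example, yield), x1 to xp are the predictor variables (for example, vegetation indices), β0 is the intercept, β1 to βp are the regression coefficients, and ε is the error term. The second line is the usual ordinary least-squares estimate of the coefficients, where X is the design matrix with a leading column of ones.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon
\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```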

The MLR model was trained using different combinations of predictor variables against the
dependent variable, as shown in Figure 2.

Figure 2: Predictor Variables Combinations


Predictor combinations of vegetation indices for MLR.
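Training MLR on different predictor combinations can be sketched as fitting one least-squares model per subset of predictors and ranking the subsets. The plain-Python sketch below solves the normal equations directly; the predictor names, the data, and the use of R² as the ranking metric are our assumptions (the study in [16] reports a skill score instead), and in practice a library such as NumPy or scikit-learn would do the fitting.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(rows, y):
    """Least-squares MLR via the normal equations (X^T X) beta = X^T y.
    Returns [beta0, beta1, ...] where beta0 is the intercept."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column of ones
    p = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(p)] for i in range(p)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(rows, y, beta):
    """Coefficient of determination of the fitted model on (rows, y)."""
    preds = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rank_combinations(features, y):
    """Fit one MLR per non-empty subset of predictors; return (R^2, names, beta), best first."""
    names = list(features)
    results = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            rows = list(zip(*(features[name] for name in combo)))
            beta = fit_mlr(rows, y)
            results.append((r_squared(rows, y, beta), combo, beta))
    return sorted(results, key=lambda t: t[0], reverse=True)


# Hypothetical predictors for six fields; yield is constructed as 1 + 2*NDVI + 0.01*PAR.
features = {
    "ndvi": [0.30, 0.42, 0.55, 0.61, 0.70, 0.78],
    "par": [310.0, 295.0, 340.0, 360.0, 330.0, 390.0],
}
yield_t_ha = [1 + 2 * n + 0.01 * p for n, p in zip(features["ndvi"], features["par"])]
for r2, combo, beta in rank_combinations(features, yield_t_ha):
    print(combo, round(r2, 3))
```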

Results of the experimentation


The MLR model was trained on a large dataset and gave satisfactory results [16]: at a 95%
confidence level it achieved a skill score of 0.245, the highest among the compared methods, and
outperformed the other machine learning algorithms. NDVI proved to be the most effective
vegetation index for estimating yield, as the relationship between NDVI and yield was linear, as
shown in Figure 3.

Figure 3: Relationship between NDVI and Yield


NDVI and yield relationship graph obtained by using MLR.

NDVI is an effective tool for crop yield estimation: a direct relationship has been found between
crop yield and NDVI, because the NDVI value of each pixel reflects the yield level at that pixel
[8]. Spectral vegetation indices such as NDVI and the enhanced vegetation index (EVI) are used
instead of climatic variables because spectral data is readily available anywhere in the world
[9]. Researchers using these indices have obtained satisfactory results on samples from various
seasons. NDVI is used in many countries both to predict crop yield and to monitor crop growth,
information that is needed day to day for sound planning and budgeting.

2.1.2 Yield Prediction using Neural Network


A neural network is inspired by the biological human brain and produces outputs based on what it
has learned from training data; its layers are trained with the backpropagation method to reduce
the error. Neural networks have also been used in several studies to predict yield. Crop yield
has previously been estimated and predicted with various techniques, including remote sensing
imagery, but extensive research has shown that remote sensing imagery alone does not fully
achieve accurate prediction, because environmental factors such as weather, soil moisture, and
the rooting zone affect the average crop growth cycle [4]. Researchers have therefore collected
additional information to obtain better results, and for prediction they use the Leaf Area Index
(LAI), which measures a plant’s canopy [3]. LAI changes over time as the plant grows, so various
crop growth models can use it to estimate the plant’s biomass [5].
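The backpropagation idea can be illustrated with a deliberately tiny network: one input (for example NDVI), one tanh hidden layer, and one linear output, trained by per-sample gradient descent on the squared error. This is a plain-Python teaching sketch with assumed hyperparameters (4 hidden units, learning rate 0.1, 2000 epochs), not the MLP architecture used in the cited studies.

```python
import math
import random


def train_mlp(xs, ys, hidden=4, lr=0.1, epochs=2000, seed=0):
    """Train a network with one input, one tanh hidden layer and one linear output,
    using per-sample backpropagation on the loss 0.5 * (prediction - target) ** 2.
    Returns a predict(x) function."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias

    def forward(x):
        h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
        return h, sum(v * hv for v, hv in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h, out = forward(x)
            err = out - y  # derivative of the loss with respect to the output
            for j in range(hidden):
                # Error propagated back to hidden unit j through tanh' = 1 - tanh^2,
                # computed before w2[j] is updated.
                grad_h = err * w2[j] * (1 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err

    return lambda x: forward(x)[1]


# Hypothetical NDVI -> yield (t/ha) pairs following a simple linear trend.
xs = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
ys = [2 * x + 1 for x in xs]
model = train_mlp(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"training MSE: {mse:.4f}")
```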

Dataset used
The dataset used in this experiment was obtained in collaboration with the Bangladesh Bureau of
Statistics (BBS); it spans 8 to 10 years and contains more than 2000 entries [18]. The data was
preprocessed into categorical attributes stored in multiple tables and divided into training and
testing sets; after training, the model was used to predict yield.

Neural Network
The ANN model used in this experiment is a multi-layer perceptron (MLP). Satellite data is well
suited to yield estimation because of its temporal resolution. NDVI, which is based on the
visible and near-infrared regions, is often used, and the Enhanced Vegetation Index (EVI) is also
used in various prediction models. Researchers have collected samples from selected regions of
the world, such as Argentina, for use in statistical prediction models and observed the effect of
NDVI on plant growth up to the forecast date; they concluded that NDVI increased the efficiency of
yield estimation by 0.1% [11]. The NN model has a single layer and is embedded in a bagging
algorithm, whose bagging blocks are used to train and test the model, as shown in Figure 4.
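Bagging (bootstrap aggregating) trains several copies of a base model on resamples of the training data drawn with replacement and averages their predictions. The sketch below shows only this wrapper; for brevity it uses a least-squares line as the base learner instead of the neural network in the cited work, and the helper names and data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Base learner: least-squares line y = a + b * x, returned as a predict function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # degenerate resample (one distinct x): flat line
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def bagging(xs, ys, fit, n_models=25, seed=0):
    """Fit `fit` on n_models bootstrap resamples and predict with their average."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of size n
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Hypothetical noisy NDVI -> yield (t/ha) observations.
xs = [0.30, 0.40, 0.45, 0.55, 0.60, 0.70, 0.80]
ys = [2.1, 2.5, 2.9, 3.0, 3.4, 3.9, 4.1]
predict = bagging(xs, ys, fit_line)
print(round(predict(0.5), 2))
```

Averaging over resamples mainly reduces the variance of unstable base learners, which is why bagging is commonly paired with neural networks and decision trees.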

Figure 4: Artificial Neural Network


Artificial Neural Network architecture used in the recent paper.

Results of the experimentation


Root mean squared error (RMSE) was used to determine which algorithm performed better. The
results show that the NN performed slightly better than MLR on the land data used here, because
the relationships between the variables were weak for MLR; note, however, that the NN takes more
time to test and validate, as shown in Figure 5.
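RMSE is the square root of the mean squared difference between observed and predicted values, expressed in the same units as the target (for example t/ha); a lower value means a better fit. A small sketch with hypothetical numbers:

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical observed yields and predictions from two models (t/ha).
observed = [3.1, 2.8, 3.6, 4.0]
model_a = [3.0, 2.9, 3.4, 4.2]
model_b = [3.3, 2.5, 3.9, 3.6]
print(rmse(observed, model_a), rmse(observed, model_b))  # the lower RMSE is the better model
```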
Figure 5: crop prediction results


Crop yield prediction results using Neural Network.

Yield forecasting and monitoring of agricultural conditions are critical because food shortages
have serious economic and social consequences. This study mainly aims to determine the period in
which oat grain yield can be predicted accurately using NDVI in Central Europe. Identifying this
period is important because the results of previous studies are inconsistent and indicate
different periods as critical. To examine this, a long time range was analyzed at intervals of a
few days, from late winter to the beginning of the harvest of most cereal species. This makes it
possible to predict the grain yield of oats with higher precision and, if possible, long before
the harvest.

2.1.3 Yield Prediction using Support Vector Machine


A support vector machine (SVM), implemented in the MATLAB environment, was used in the experiment
in [17]; SVMs have been widely used machine learning models for the last 15 years. Biomass
accumulation, the change in biomass from one period of time to another, is used for crop yield
estimation and prediction, and several models have been developed to predict it efficiently
[12]. In [17], a yield prediction model was developed entirely on the basis of SVM analysis.
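For readers unfamiliar with SVMs, the sketch below trains a linear soft-margin SVM for a binary task by stochastic sub-gradient descent on the regularized hinge loss (the Pegasos algorithm). It is a plain-Python illustration only: the cited work used MATLAB, its kernel and parameters are not stated here, and the regularization strength, features, and labels below are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos: stochastic sub-gradient descent on
    lam / 2 * ||w||^2 + mean(max(0, 1 - y_i * w . x_i)).
    Labels must be +1 or -1. A constant 1.0 feature is appended to every sample so the
    bias is learned as the last weight (and, in this simple variant, regularized too)."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the regularizer
            if margin < 1.0:  # hinge loss active: step toward this sample's label
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical standardized features (e.g. NDVI and rainfall anomalies) with
# good-yield (+1) / poor-yield (-1) labels.
X = [(1.2, 0.8), (0.9, 1.1), (1.0, 1.3), (-1.1, -0.9), (-0.8, -1.2), (-1.3, -0.7)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```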

Analysis of the result


Data was collected from the China Meteorological Administration database and had 53 dimensions
[17]. The dataset was encoded in binary form: one class indicated yes or no, and the other class
represented the remaining factors. Fivefold cross-validation with different parameter settings
was used on the training data, and the results were not as strong as those of MLR and NN.
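Fivefold cross-validation splits the data into five folds, trains on four of them and tests on the remaining one, rotating until every sample has been tested exactly once. The helper below is a hypothetical index-splitting sketch using a shuffled round-robin assignment; libraries such as scikit-learn provide equivalent utilities (for example `KFold`).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs for k-fold cross-validation.
    Indices are shuffled once, then dealt round-robin into k folds, so every sample is in
    exactly one test fold and fold sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


# Example: 12 hypothetical samples, fivefold; a model would be trained on `train`,
# scored on `test`, and the five scores averaged.
for train, test in k_fold_indices(12, k=5):
    print(len(train), len(test))
```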

2.1.4 Yield Prediction using Random Forest


Random Forest (RF) has also been used for crop yield prediction. In [14], rather than using data
from a single field, multiple fields were incorporated, and data was collected over several years
to obtain more precise results. The spatial resolution of these yield predictions is a key aspect,
as it allows management to be tailored to different fields within a farm, or even at the
sub-field level. Traditionally, farmers and their advisers estimate their yield targets based on
past experience and seasonal conditions, and then use these estimates as a guide for crop
management decisions [13].
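To make the Random Forest idea concrete, the sketch below grows regression trees on bootstrap resamples, considers a random subset of features at each split, and averages the trees' predictions. It is a compact plain-Python illustration with assumed hyperparameters (number of trees, depth, leaf size, features per split); it is not the model of [14], and in practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used.

```python
import random


def _sse(values):
    """Sum of squared deviations from the mean (the impurity used for regression splits)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, ys, depth, min_leaf, n_feats, rng):
    """Grow a regression tree greedily. Leaves hold the mean target of their samples;
    internal nodes are tuples (feature, threshold, left_subtree, right_subtree)."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(ys) < 2 * min_leaf:
        return mean
    best = None
    for f in rng.sample(range(len(rows[0])), n_feats):  # random feature subset
        for thr in sorted(set(r[f] for r in rows)):
            left = [i for i, r in enumerate(rows) if r[f] <= thr]
            right = [i for i, r in enumerate(rows) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([ys[i] for i in left]) + _sse([ys[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return mean
    _, f, thr, left, right = best

    def grow(part):
        return build_tree([rows[i] for i in part], [ys[i] for i in part],
                          depth - 1, min_leaf, n_feats, rng)

    return (f, thr, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def random_forest(rows, ys, n_trees=20, depth=4, min_leaf=2, n_feats=1, seed=0):
    """Train n_trees trees on bootstrap resamples; predict with their average."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([rows[i] for i in idx], [ys[i] for i in idx],
                                depth, min_leaf, n_feats, rng))
    return lambda row: sum(predict_tree(t, row) for t in trees) / len(trees)


# Hypothetical samples: [NDVI, LAI] -> yield (t/ha); here yield depends on NDVI only.
data_rng = random.Random(1)
rows = [[i / 20, data_rng.uniform(1.0, 4.0)] for i in range(1, 21)]
ys = [2 + 5 * ndvi for ndvi, _ in rows]
forest = random_forest(rows, ys)
print(round(forest([0.2, 2.5]), 2), round(forest([0.9, 2.5]), 2))
```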

Analysis of the result


The dataset covered wheat and rice crops, with different seasonal traits and values over a
specific time interval. RF and STC were used together in the predictive crop model, which took
crop type as an input variable instead of building long, separate sets of variables [14]. The
dataset had a spatial resolution of 100 m. The results were satisfactory, suggesting that the
model can readily be used in other agricultural sectors; the predicted and observed yields are
shown in Figure 6.

Figure 6: Yield graph


Predicted and Observed yield graph.

Various machine learning techniques have been developed, and plant samples have been collected
based on monthly production over various seasons in three fields. For each field, ten to twelve
sample points were taken and results were predicted using an LAI equation [4]. The two most
widely used machine learning algorithms are MLR and NN, as shown in Figure 7.

Figure 7: List of ML algorithms


List of Machine Learning Algorithms widely used.

2.2 Hyperspectral Imaging
Remote sensing gathers information about an object or area, typically on Earth, without direct
physical contact with it. It is used in various fields such as geography, ecology, and
meteorology. Today, remote sensing is mainly satellite- or aircraft-based technology that, with
precise algorithms, helps us identify and study specific geographical areas and distant objects
and measure the environment around us from the signals they produce [3]. Imaging spectrometry,
or hyperspectral imaging, is now widely acknowledged in science. Before the scientific community
embraced it, the technique matured alongside advances in electronics, computing, and software
during the 1980s and 1990s [4].

2.2.1 Vegetation Indices derived from hyperspectral images


Several studies have shown that a number of vegetation indices and related variables, such as
LAI, NDVI, PAR, and biomass, can be derived from hyperspectral images and then used to predict
crop yield.

Significance of LAI and NDVI


As the works cited above show, vegetation indices give accurate results. The three indices NDVI,
LAI, and greenness are all associated with absorbed PAR, so a relationship must be established
between PAR and these three indices [13]. LAI and absorbed photosynthetically active radiation
(APAR) are the most significant yield factors, and they can be assessed, for instance, using a
simple reflectance model or a vegetation index [13]. Emerging techniques relate production
dynamics directly to climatic effects on ecosystems; using LAI, the relationship between these
two factors has been well established and can indicate crop production in advance [6]. The
above-ground biomass, estimated from various indices derived from remotely sensed data, is
converted into a crop yield prediction [7]. LAI research has now moved from a descriptive and
quantitative stage to a process-based modeling stage, owing to the integration of remotely
sensed datasets and ecosystem models. Canopy structure here means the spatial distribution of
above-ground plant material, such as branches, leaves, flowers, and fruit, which affects air and
leaf temperature, the microclimate, and soil evaporation.
Hyperspectral images are used to extract LAI; because LAI remains unchanged when the image
resolution changes, it can readily be measured as a biophysical parameter over various ranges,
typically for broad leaves [6]. Plants have various biophysical, physiological, and structural
characteristics that determine their reflectance, for example of green light, and these
characteristics capture detailed information about canopy structure [6]. Remote sensing imagery
and spectral reflectance have been used to address existing challenges in crop monitoring and
growth modeling, and the deployment of various sensors has enabled more reliable and powerful
crop assessment using vegetation indices such as LAI, NDVI, and EVI; NDVI can also be used to
estimate LAI [10]. NDVI is based on the principle that healthy, active plants absorb radiation in
the visible region and strongly reflect near-infrared radiation; it is used to measure the amount
of green vegetation in an area [10]. Using Earth observation satellite data, it is possible not
only to monitor the current state of the atmosphere and the direction of its change, but also to
forecast crop yields and monitor agricultural production at local and regional levels.
Researchers around the world have shown that satellite data and NDVI can help forecast crop
yields [11].

Significance of Plant’s Biomass


Plants have biophysical and physiological compositions and structures that can be inferred from the radiation they reflect in the electromagnetic spectrum. Various satellite sensors are used to measure this information, and studies show that Sentinel-2 produces efficient results in this regard [12]. Chlorophyll, the green pigment in leaves, absorbs strongly in the visible region of the electromagnetic spectrum (400-700 nm), while plants show high reflectance in the near infrared (700-1300 nm), which gives information about structural properties and biomass. Photosynthetically active radiation (PAR) spans the 400-700 nm range and indicates the amount of light available to plants for photosynthesis. In the past, crop production and yield models required an estimate of canopy leaf area index or of radiation absorption; however, directly measuring LAI or light absorption can be laborious and time-consuming. The objective of that investigation was to establish relationships involving the absorbed photosynthetically active radiation (PAR) [12]. Assessments of crop development, and therefore yield forecasts, are inaccurate under variable growing conditions. Plant variables derived from optical remote sensing also play a significant role in monitoring crop growth. Through LAI and APAR, optical remote sensing can give efficient results: it reveals the actual status of agricultural crops during the growing season, which offers the chance to calibrate crop growth models [12]. The net increase in crop dry matter under non-stress conditions is commonly modelled by assuming that the amount of dry plant biomass produced is proportional to the intercepted photosynthetically active radiation (IPAR). The slope of this relationship, the 'radiation-use efficiency', is usually assumed to be constant for a crop species [11].
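The radiation-use efficiency relationship described above can be turned into a short calculation. The sketch below assumes the common formulation in which daily dry-matter gain equals RUE multiplied by IPAR, and it estimates IPAR from LAI with a Beer-Lambert light-extinction term, IPAR = PAR(1 - exp(-k·LAI)). The extinction coefficient k = 0.5, the RUE of 3 g per MJ and the daily inputs are illustrative assumptions, not values measured in this project.

```python
import numpy as np

def intercepted_par(par, lai, k=0.5):
    """Daily intercepted PAR (MJ m^-2) from incident PAR and LAI.

    Assumes Beer-Lambert light extinction through the canopy:
    IPAR = PAR * (1 - exp(-k * LAI)); k = 0.5 is an illustrative value.
    """
    par = np.asarray(par, dtype=float)
    lai = np.asarray(lai, dtype=float)
    return par * (1.0 - np.exp(-k * lai))

def accumulated_biomass(par, lai, rue=3.0, k=0.5):
    """Cumulative dry matter (g m^-2), assuming a constant RUE (g MJ^-1)."""
    daily_gain = rue * intercepted_par(par, lai, k)
    return np.cumsum(daily_gain)

# Hypothetical five-day series of incident PAR (MJ m^-2 day^-1) and LAI.
par = [10.0, 11.0, 9.5, 12.0, 10.5]
lai = [1.0, 1.5, 2.0, 2.5, 3.0]
print(accumulated_biomass(par, lai))
```

Because RUE is treated as a constant, biomass grows fastest on days when both incident PAR and LAI are high, which is why LAI appears as a driver of biomass throughout this report.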

2.2.2 Machine Learning for Remotely Sensed Data


Artificial neural networks have been used extensively with remotely sensed imagery, and convolutional neural networks (CNNs) have proved especially valuable. CNNs have been applied to image classification and recognition on very large datasets, including for crop yield prediction. In [16], a CNN is used as a three-dimensional network that extracts spatial and spectral information together.
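To illustrate how a 3-D CNN receives spatial and spectral information together, the sketch below cuts a small neighbourhood cube (patch × patch × bands) around each pixel of a hyperspectral image and stacks the cubes into a 5-D array. Image borders are handled by mirror (reflect) padding, in the spirit of the mirror strategy for image borders listed in Table 1. The array shapes, patch size and function names are assumptions for illustration; the network itself is not shown.

```python
import numpy as np

def extract_patch(cube, row, col, patch=5):
    """Return a (patch, patch, bands) neighbourhood centred on (row, col).

    `cube` is a hyperspectral image of shape (height, width, bands).
    Mirror (reflect) padding gives edge pixels a full-size patch.
    """
    if patch % 2 == 0:
        raise ValueError("patch size must be odd")
    r = patch // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    # After padding, original pixel (row, col) sits at (row + r, col + r).
    return padded[row:row + patch, col:col + patch, :]

def patches_for_3d_cnn(cube, coords, patch=5):
    """Stack patches into shape (N, patch, patch, bands, 1), a channels-last
    layout commonly fed to 3-D convolution layers."""
    stack = np.stack([extract_patch(cube, r, c, patch) for r, c in coords])
    return stack[..., np.newaxis]

# Hypothetical 10 x 10 image with 20 spectral bands.
rng = np.random.default_rng(0)
cube = rng.random((10, 10, 20))
x = patches_for_3d_cnn(cube, [(0, 0), (5, 5), (9, 9)], patch=5)
print(x.shape)  # (3, 5, 5, 20, 1)
```

A 3-D convolution kernel sliding over such a cube mixes neighbouring pixels and neighbouring bands at once, which is the property that distinguishes this approach from per-pixel spectral classification.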

Convolutional Neural Network


The proposed network worked efficiently on hyperspectral images of crop fields, saving considerable time while producing accurate results. The experiments used two well-known hyperspectral datasets. The first was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over patches of land and various crops in Indiana. AVIRIS Indiana Pines has 145×145 pixels and 224 spectral bands from 400 to 2500 nm. After the initial observations, four zero bands and 20 further bands were removed because of atmospheric (water) absorption. The second dataset was acquired by the ROSIS sensor over the city of Pavia and shows an urban scene with various built structures [16]. It has 103 spectral bands and 610×340 pixels, covering the range 0.43 to 0.86 μm.

Figure 8: Schematic Diagram of 3D-CNN

Workflow of the 3D-CNN for satellite data.

2.2.2.1.1 Results achieved


This method uses a single-pixel imaging approach. Compared with current spectrometry methods, the system has a wide spectral range of 400-1100 nm with a fine spectral resolution of 1 nm. The performance of the technique was verified by non-destructive testing of potatoes [17]. Hyperspectral imaging equipment is widely used in fields such as food testing and biotechnology. A Fourier transform hyperspectral imaging system is used to increase light throughput and processing speed within a compact instrument.
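The single-pixel Fourier principle described above can be illustrated with a small, noise-free simulation. Each sinusoidal illumination pattern is projected onto a scene and a single detector records only the total reflected light; four phase-shifted patterns per spatial frequency recover one Fourier coefficient, and an inverse FFT rebuilds the image. This is a simplified model of Fourier single-pixel imaging in general, under our own assumptions (pattern offset `a`, contrast `b`, an 8 × 8 test scene), not the instrument used in the cited study; a hyperspectral system would resolve the detector reading per wavelength.

```python
import numpy as np

def measure_coefficient(scene, fx, fy, a=0.5, b=0.5):
    """Simulate a single-pixel detector recovering one Fourier coefficient.

    Four sinusoidal patterns with phase shifts 0, pi/2, pi, 3pi/2 are
    projected; each detector reading is sum(scene * pattern).
    """
    n_rows, n_cols = scene.shape
    x, y = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    theta = 2 * np.pi * (fx * x / n_rows + fy * y / n_cols)
    d = {}
    for phi in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2):
        pattern = a + b * np.cos(theta + phi)
        d[phi] = np.sum(scene * pattern)  # one detector reading
    # Four-step phase shifting: (D0 - Dpi) + j(Dpi/2 - D3pi/2) = 2b * F(fx, fy)
    return ((d[0.0] - d[np.pi]) + 1j * (d[np.pi / 2] - d[3 * np.pi / 2])) / (2 * b)

def reconstruct(scene):
    """Measure every Fourier coefficient, then invert with an inverse FFT."""
    spectrum = np.zeros(scene.shape, dtype=complex)
    for fx in range(scene.shape[0]):
        for fy in range(scene.shape[1]):
            spectrum[fx, fy] = measure_coefficient(scene, fx, fy)
    return np.real(np.fft.ifft2(spectrum)), spectrum

rng = np.random.default_rng(1)
scene = rng.random((8, 8))
image, spectrum = reconstruct(scene)
print(np.allclose(image, scene))  # True
```

The constant offset `a` cancels in the differences, so only the sinusoidal part of each pattern contributes; measuring only a subset of low-frequency coefficients would trade image detail for fewer measurements.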

Figure 9: Setup of the Fourier transform hyperspectral imaging system based on single-pixel technology.

The Sentinel-2 mission offers a new way of assessing crop development and estimating yield with remote sensing techniques [18]. It was used to estimate cotton production in advance in the US, where the BEPS ecosystem model was used to simulate cotton gross primary productivity (GPP). Sentinel-2 data over Texas and Georgia were used for this purpose, and the ecosystem model was driven by Leaf Area Index information derived accurately from Sentinel-2. The simulated GPP values at 20 m grid spacing were aggregated at a regional level covering 17 states, and the aggregated GPP explained 85% of the variation in cotton production. The results suggest that integrating Sentinel-2 LAI time series with the ecosystem model yields reliable estimates of cotton yield [18].
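The aggregation step described above, in which gridded GPP is summarised per region and compared with reported yields, can be sketched as follows. Region labels, GPP values and yields are synthetic placeholders, and the "percentage of variation explained" is computed as the R² of a simple linear fit. This is our illustration of the idea, not the processing chain of the cited study.

```python
import numpy as np

def mean_gpp_per_region(gpp_grid, region_ids, n_regions):
    """Average gridded GPP (e.g. 20 m cells) over each integer region label."""
    gpp = np.asarray(gpp_grid, dtype=float).ravel()
    ids = np.asarray(region_ids).ravel()
    totals = np.bincount(ids, weights=gpp, minlength=n_regions)
    counts = np.bincount(ids, minlength=n_regions)
    return totals / counts

def r_squared(x, y):
    """R^2 of an ordinary least-squares line y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns [slope, intercept]
    residuals = y - (a + b * x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic example: a 6 x 6 grid split into 4 regions of 9 cells each.
rng = np.random.default_rng(2)
region_ids = np.repeat(np.arange(4), 9).reshape(6, 6)
gpp_grid = rng.uniform(800, 1600, size=(6, 6))
region_gpp = mean_gpp_per_region(gpp_grid, region_ids, n_regions=4)
yields = 0.9 * region_gpp + rng.normal(0, 20, size=4)  # hypothetical yields
print(region_gpp, r_squared(region_gpp, yields))
```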

Artificial Neural Network


Artificial neural networks are a popular classification technique because they process information in a way inspired by biological neurons. However, they have drawbacks: they require massive amounts of training data and are slow. Hyperspectral image classification approaches have incorporated the spatial information of hyperspectral images. Classification based on spectral data alone treats each pixel as a one-dimensional spectral vector: spectral features are extracted from each pixel and then used for classification through feature analysis. In K-means clustering, the centroids of the data points are computed and each point is assigned so as to minimize the sum of squared distances to its centroid; this has been applied to geospatial crop data [15].
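The one-dimensional spectral-vector approach and the K-means step described above can be sketched in NumPy. Each pixel's spectrum is treated as a feature vector; pixels are assigned to the nearest centroid and centroids are moved to the mean of their pixels, which reduces the within-cluster sum of squared distances. The farthest-first initialisation, iteration limit and synthetic four-band spectra are assumptions for illustration; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import numpy as np

def _assign(spectra, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per pixel."""
    d2 = ((spectra[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(spectra, k, n_iter=50, seed=0):
    """Cluster pixel spectra of shape (n_pixels, n_bands) into k groups."""
    spectra = np.asarray(spectra, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation: a random first centroid, then repeatedly
    # the pixel farthest from all centroids chosen so far.
    centroids = [spectra[rng.integers(len(spectra))]]
    for _ in range(1, k):
        d2 = np.min([((spectra - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(spectra[d2.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = _assign(spectra, centroids)
        # Move each centroid to the mean of its pixels (keep it if the cluster is empty).
        new_centroids = np.array([
            spectra[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    labels = _assign(spectra, centroids)
    inertia = float(((spectra - centroids[labels]) ** 2).sum())  # within-cluster SSE
    return labels, centroids, inertia

# Synthetic [blue, green, red, NIR] spectra: vegetation (low red, high NIR) vs soil.
rng = np.random.default_rng(3)
veg = np.array([0.05, 0.08, 0.04, 0.45]) + rng.normal(0, 0.01, (50, 4))
soil = np.array([0.15, 0.20, 0.25, 0.30]) + rng.normal(0, 0.01, (50, 4))
pixels = np.vstack([veg, soil])
labels, centroids, inertia = kmeans(pixels, k=2)
print(np.bincount(labels), inertia)
```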

2.2.2.2.1 Results achieved


Leaf area index, leaf chlorophyll content and crop cover fraction, commonly known as crop descriptors, can be readily estimated using ANNs from the visible and near-infrared spectral regions, which give an indication of crop yield. Estimating the accumulation of above-ground dry biomass, however, requires additional information, and a study has shown that such information can be extracted from optical remote sensing data to estimate above-ground dry biomass. Plants interact with incident solar radiation of different wavelengths in three ways: transmission, reflection and absorption.
As satellites became widely used for crop assessment and yield prediction, the European Space Agency launched Sentinel-2A and Sentinel-2B, two identical satellites, to support precision agriculture across the globe, with improved spectral and spatial bands [19]. The Sentinel-2A+B constellation provides better technical features for crop monitoring, and the software environment for the satellites was also enhanced. Earlier Sentinel-2 work and Sentinel-2A+B studies were reviewed together for specific applications, including biotic and abiotic stress detection and other crop management systems. Compared with earlier Sentinel-2 use, Sentinel-2A+B shows much improved crop monitoring and forecasting, which has greatly advanced agricultural monitoring across the globe. Sentinel-2 serves a wide range of useful purposes, but further developments can still be made.

Literature Review Summary Table


This table summarizes past research papers from 1988 to 2020.
Table 1: Work done on remote sensing data

| No. | Name, reference | Authors | Year | Input | Output | Description |
|---|---|---|---|---|---|---|
| 1 | LAI assessment of wheat and potato crops by VENµS and Sentinel-2 bands | Ittai Herrmann, Agustin Pimstein, Karnieli, Cohen | 2011 | Spectral data collected by VENµS and Sentinel-2 bands | LAI estimated spectrally with high accuracy | LAI is estimated using the red-edge Sentinel-2 spectral bands. |
| 2 | Hyperspectral remote sensing of agriculture | Sahoo, Ray, K. R. | 2015 | Optical data | Quantitative estimation was developed and good results were achieved | Spectral properties were discussed with reference to the electromagnetic spectrum. |
| 3 | Retrieving leaf area index from SPOT4 satellite data [1] | M. Aboelghar, S. Arafat, A. Saleh, S. Naeem, M. Shirbeny, A. Belal | 2010 | LAI field measurements collected with an LAI plant canopy analyzer device | Algorithm predicted LAI with 95% confidence for rice varieties | NDVI-LAI inversion models were generated, giving accuracy for three different rice varieties. The models are empirical and limited to the study area. |
| 4 | Leaf area index from CHRIS satellite data and applications in plant yield estimation [2] | God AM Smith, Nadeau, J. Freemantle, Hans Wehn | 2005 | Data from the Compact High Resolution Imaging Spectrometer (CHRIS) | Relationship between ground-based and remote-sensing-derived LAI | LAI is successfully estimated from CHRIS satellite data and used in crop modelling. |
| 5 | Using satellite data to improve model estimates of crop yield [3] | S. J. Maas | 1988 | Data for ten fields captured by satellite | 2% increase in efficiency achieved by the model for simulating growth | The model used three variables and produced effective results with an increase of 2%. |
| 6 | Retrieving leaf area index using remote sensing: theories, methods and sensors [4] | Zheng, Moskal | 2009 | Forest tree species data obtained from remote sensing | Accurate and reliable leaf area index | Direct and indirect methods are used for obtaining LAI. |
| 7 | Comparing methods for estimating leaf area index by multi-angular remote sensing in winter wheat [5] | He, Ren, Wang, Liu, Zhang, Liu, Feng, Guo | 2020 | 221 samples, partitioned into two datasets | Satisfactory performance with coefficient of determination > 0.72 | Multi-angular remote sensing is used, comparing four methods for estimating LAI. |
| 8 | Remotely sensed rice yield prediction using multi-temporal NDVI data derived from NOAA's-AVHRR [6] | Huang, Wang | 2013 | Remotely sensed data of rice crops collected over five growing seasons | Predicted rice yield was significantly high | Regression models based on NDVI are developed for rice yield estimation, given the production area, over five growing seasons. |
| 9 | Yield prediction with machine learning algorithms and satellite images [7] | Alireza Sharifi | 2020 | Field and meteorological data, and data collected from Sentinel-2 | Accuracy of the crop estimation model is highest when the time interval between the crop and the harvest is smaller | A multi-resource data-based model is developed; the machine learning algorithm gave satisfactory results on three evaluation indicators. |
| 10 | Normalized difference vegetation index as a tool for wheat yield estimation: A case study for Faisalabad, Pakistan [8] | Sultana, Ali, Ahmad, Mubeen, M. Zia-ul-Haq, Ahmad | 2014 | Soil samples were used for the experiment | Nitrogen treatment gave the maximum NDVI value | A split-plot design was used as the experimental model, taking the nitrogen rates into account. |
| 11 | Analysis of relationship between cereal yield and NDVI for selected regions of Central Europe based on MODIS satellite data [9] | Panek, Gozdowski | 2020 | Remote sensing data for the vegetation seasons obtained by the MODIS sensor | Strong correlation between NDVI and yield was observed | A linear regression model was developed to study the correlation between NDVI and crop yield. |
| 12 | Simulating tropical forage growth and biomass accumulation: an overview of model development and application [10] | Andrade, Santos, Pezzopane, Araujo, Pedreira, Marin, Lara | 2015 | Local region samples | Effective biomass accumulation was achieved | Different empirical models were developed to predict crop growth and biomass accumulation. |
| 13 | Maize radiation use efficiency under optimal growth conditions [11] | Lindquist, Arkebauer, Walters, Cassman, Dobermann | 2005 | Maize data collected over five growing seasons | Observed values of biomass accumulation and PAR were effective | The short-interval crop growth rate method was used, with PAR obtained by two methods. |
| 14 | Overview of Hyperspectral Image Classification [12] | Wenjing Lv, Xiaofei Wang | 2020 | Different classification methods to be combined for hyperspectral images | Classification into different learning types and sub-types | Remote sensing techniques are making the use of hyperspectral images more widespread. Supervised, semi-supervised and unsupervised classifications have sub-categories representing different techniques for hyperspectral images. |
| 15 | A new deep convolutional neural network for fast hyperspectral image classification [13] | Paoletti, Haut, Plaza | 2018 | Three-dimensional network using both spatial and spectral information | Reduced computation time and clearly increased efficiency | A 3-D neural network is used to improve CNN efficiency, and a mirror strategy is used to process image borders. This model proved to be better than other ANN techniques. |
| 16 | Hyperspectral imaging using the single-pixel Fourier transform technique [14] | Jin, Hui, Wang, Huang, Shi, Ying, Liu, Qing Ye, Zhou, Tian | 2017 | Single-pixel technique | Accurately reconstructed image | Proved to be better than current spectrometry approaches, with a wide spectral range of 400-1100 nm, a spectral resolution of 1 nm and less measurement data (up to 6.25%). |
| 17 | Remote Sensing for Precision Agriculture: Sentinel-2 Improved Features and Applications [15] | Segarra, Luisa Buchaillot, Araus, C. Kefauver | 2020 | Sentinel-2 A+B satellite | Better spatial and spectral resolution and wide application ranges | Sentinel-2 A+B is an enhanced version of Sentinel-2 with drastically improved agricultural capabilities, including detection of biotic and abiotic factors, and higher resolution. |
| 18 | Cotton Yield Estimate Using Sentinel-2 Data and an Ecosystem Model over the Southern US [16] | Liming He, Georgy Mostovoy | 2019 | Sentinel-2 biophysical data | Explained 85% of the variation in cotton yield | GPP values simulated at 20 m grid spacing were aggregated to the county level. Sentinel-2 provides reliable estimation of cotton yields. |
| 19 | Monitoring urban growth and land use change detection with GIS and remote sensing techniques in Daqahlia governorate, Egypt [17] | Hegazy, Kaloop | 2015 | Land use change detection by GIS | Built-up area increased from 26 to 255 km² (more than 30%) and agricultural land was reduced by 33% | Information on urban growth, land use and land cover is very effective for local government and urban planners in preparing enhanced development plans. |
| 20 | Surveillance of Arthropod Vector-Borne Infectious Diseases Using Remote Sensing Techniques [18] | Kalluri, Gilruth, Rogers, Szczur | 2007 | Satellite data and epidemiological data | Status of remote sensing studies of arthropod vector-borne diseases | The use of remote sensing techniques to map vector-borne diseases has changed meaningfully over the previous 25 years. |

Requirements and Design


This chapter discusses the functional and non-functional requirements of our system, along with the relevant details of the system design and its working.

Functional Requirements
The functional requirements of the system are listed below.
• The system will be able to obtain the required Sentinel data by first obtaining the GeoJSON coordinates of the area of interest using JavaScript.
• The system will be able to preprocess the data using the SNAP toolbox, which builds the raster images used to calculate vegetation indices.
• The system will be able to produce LAI and NDVI files in TIFF format.
• The system will be able to extract the corresponding NDVI and LAI values using the Spatial Analyst toolbox in ArcMap.
• The system will be able to validate the data using a regression model built on LAI and NDVI values.
• The system will be able to return the predicted results.

Non-Functional Requirements
The non-functional requirements listed below support the efficient working of the system.

3.2.1 Reusability
The system will be reusable, providing a consistent and workable environment every time it is used. Its components should be modular enough to produce correct results.

3.2.2 Reliability
The system will be reliable and provide prompt, correct results for each type of crop. It will be self-explanatory so that users can understand it easily.

3.2.3 Extensibility
The system will be extensible so that researchers can build on it for other research in the agriculture sector. It will provide a well-organized interface that researchers can benefit from.

3.2.4 Performance
The system will provide accurate results and perform well on any kind of crop data. It will be able to run efficiently on any kind of computer system so that crop data is extracted and processed well.

3.2.5 Robustness
The system will process data efficiently without losing any important information and shall remain stable if any failure occurs.

Hardware and Software Requirements


For our research, we need the hardware and software listed below.

3.3.1 Hardware Requirements


The hardware requirements for our system to become operational are listed below.
• Stable internet connection.
• Graphics Processing Unit (GPU).
• Enough memory to support large datasets.
• Operating system: macOS, on a fast machine.

3.3.2 Software Requirements


The software requirements considered for our system are listed below:
• SNAP Toolbox
• ArcMap
• QGIS
• PyCharm
• Jupyter Notebook
• OpenCV
• Planet API
• JSON
• Rasterio
• Matplotlib
• TensorFlow
• SentinelHub
• Google Colab
• Stable browser: Chrome or Firefox.

System Architecture
3.4.1 Architecture Diagram

Figure 10: System Architecture


Detailed system architecture of our model.

3.4.2 System Modules


The system modules through which our research flows are described below.

Data Retriever Module


The sole purpose of this module is to collect the required data through SentinelHub. The system can also obtain the data through the Planet web portal, which requires a license and offers a 14-day trial. Alternatively, the data can be collected with Python, which imports the Planet API and requests real-time data by specifying data constraints (filters).
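As a rough illustration of requesting Sentinel-2 data programmatically, the sketch below only builds (and does not send) a request body in the JSON shape used by the Sentinel Hub Process API, with an evalscript that returns NDVI from bands B04 (red) and B08 (near-infrared). The bounding box is a hypothetical placeholder rather than the project's actual AOI, the Sentinel-2 L2A collection is our assumption, the time range and 0% cloud filter mirror the Kharif 2020 collection described in the implementation chapter, and authentication (an OAuth token) is deliberately left out. Field names should be checked against the current Sentinel Hub documentation before use.

```python
import json

# Evalscript (JavaScript, evaluated server-side by Sentinel Hub) returning NDVI per pixel.
EVALSCRIPT = """
//VERSION=3
function setup() {
  return {input: ["B04", "B08"], output: {bands: 1, sampleType: "FLOAT32"}};
}
function evaluatePixel(s) {
  return [(s.B08 - s.B04) / (s.B08 + s.B04)];
}
"""

def build_process_request(bbox, time_from, time_to, width=512, height=512,
                          max_cloud=0):
    """Return a Process API request body as a Python dict.

    bbox is (min_lon, min_lat, max_lon, max_lat) in WGS84 (EPSG:4326).
    """
    return {
        "input": {
            "bounds": {
                "bbox": list(bbox),
                "properties": {"crs": "http://www.opengis.net/def/crs/EPSG/0/4326"},
            },
            "data": [{
                "type": "sentinel-2-l2a",
                "dataFilter": {
                    "timeRange": {"from": time_from, "to": time_to},
                    "maxCloudCoverage": max_cloud,
                },
            }],
        },
        "output": {
            "width": width,
            "height": height,
            "responses": [{"identifier": "default", "format": {"type": "image/tiff"}}],
        },
        "evalscript": EVALSCRIPT,
    }

# Hypothetical AOI bounding box; replace with the coordinates from the project's GeoJSON.
aoi_bbox = (70.0, 28.0, 70.5, 28.5)
body = build_process_request(aoi_bbox, "2020-04-01T00:00:00Z", "2020-06-30T23:59:59Z")
print(json.dumps(body, indent=2)[:200])
```

In use, such a body would be sent as an HTTP POST to `https://services.sentinel-hub.com/api/v1/process` with a bearer token, and the response would be a single-band GeoTIFF of NDVI values.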

Vegetation Index Calculator


The next module calculates the different vegetation indices used in our research. These indices can be computed using the formula associated with each of them, such as the normalized difference form in equation (1), or extracted directly with the SNAP software, which has built-in functionality to produce the results as GeoTIFF/TIFF files.

$$\mathrm{NDVI}_{\lambda_1,\lambda_2} = \frac{\lambda_2 - \lambda_1}{\lambda_2 + \lambda_1} \qquad (1)$$

where λ1 and λ2 denote the reflectance in the two bands (red and near-infrared, respectively, for NDVI).

Data Preprocessing module


Next, the data will be preprocessed using ArcMap and QGIS, which analyse the data, generate random point values using the Spatial Analyst tool, and export the vegetation index values to an Excel sheet.

Training Model
Once the data has been finalized and passed through the above steps, the next step is to train our model, which is selected based on the results of different research papers. The most widely used model is the regression model, a machine learning algorithm.

Analyze Results
After training, the results will be analyzed using RMSE, standard deviation, variance and other statistical measures, which give the required information and show whether the model is overfitting.
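The evaluation step can be sketched with plain NumPy: RMSE, the standard deviation and variance of the residuals, and R², computed separately on training and test data, so that a much lower training error than test error flags possible overfitting. The metric set, the `gap_ratio` threshold, the function names and the yield values are illustrative assumptions; scikit-learn's `mean_squared_error` and `r2_score` compute the same error quantities.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """RMSE, residual spread and R^2 for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "residual_std": float(np.std(residuals)),
        "residual_var": float(np.var(residuals)),
        "r2": float(1.0 - ss_res / ss_tot),
    }

def looks_overfit(train_report, test_report, gap_ratio=1.5):
    """Flag overfitting when test RMSE exceeds train RMSE by more than gap_ratio."""
    return test_report["rmse"] > gap_ratio * train_report["rmse"]

# Hypothetical observed vs predicted yields (t/ha) for training and test fields.
train = regression_report([3.1, 3.4, 3.9, 4.2], [3.0, 3.5, 3.8, 4.3])
test = regression_report([3.3, 4.0], [3.6, 3.6])
print(train["rmse"], test["rmse"], looks_overfit(train, test))
```

Lower RMSE means predictions are closer to the observed values; comparing train and test RMSE is what reveals overfitting, since a single RMSE value cannot.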

Display Results
After the training and testing data have been passed to the model and predictions have been made, the system will print the results to the console so that the user can follow them.

Yield Estimation
This section describes the calibration of yield results. Since this paper focuses on crop yield prediction, yield is predicted step by step using the linear regression method. It is important to implement a remote sensing-based framework for assessing crop development that agrees with field estimates of crop biophysical parameters. This study applies the approach to rice-growing areas to predict the average yield using Sentinel-2 satellite data. As discussed in previous chapters, the main methodology involves finding the vegetation indices that are the best estimators for forecasting yield. Through comparative analysis, we chose Leaf Area Index (LAI) as the direct estimator for yield prediction, as it has been shown to be the strongest predictor. The vegetation indices used for crop yield estimation are:

Figure 11: List of Vegetation Indices

This figure shows the formulas of the different vegetation indices to be used.

The figure above shows the different vegetation indices and their formulas for crop yield estimation. Previous work used different indices for yield estimation, and comparative analyses of those indices found that LAI outperforms all of them; LAI is also the key variable in evaluating plant biomass. Linear regression models were used to estimate which index performs best, and after detailed analysis we chose LAI for crop yield estimation. Several articles recommended LAI, which is derived from the red and near-infrared bands, as giving the best yield prediction results among vegetation indices. Linear regression models the relationship between an independent and a dependent variable and is used to predict the value of one variable from the other. In this paper, we describe which variables are used and how the yield is predicted.

Implementation
The following sections describe how we gathered and processed our data and applied the chosen algorithm to obtain the outputs.

Implementation
This section presents the implementation work done so far; the subsections below explain each part in detail.

4.1.1 Data collection


As discussed in the previous chapters, crop yield prediction is a significant task for policy makers and budget allocators. The main goal is to predict yield precisely, and different techniques have been evaluated in research-level work. A linear regression model is used because it captures the linear relationship between variables. To compare predicted and actual values, the root mean squared error (RMSE) is used: the lower the RMSE, the closer the predictions are to the observed values. The main task therefore relies on the dataset, which is the most critical part of any model calibration. The following figure shows the AOI used in this report.

Figure 12: Rahim Yar Khan Area


Point of interest of Rahim Yar Khan marked on the EO map.

The method concerns how to obtain the remotely sensed data and then how to apply regression analysis to that data to obtain accurate yield results. The rice and cotton data considered in this report were collected from the official Sentinel Hub website, which provides real-time data. There are different ways to acquire data based on agricultural indices, which include atmospheric and environmental factors; these factors give a clear indication of the conditions at the crop site. The target area is Rahim Yar Khan, and its geographical coordinates are obtained from the GeoJSON shown in the figure below. Once the Area of Interest (AOI) is marked, the data is collected using the time filter available on the website and can also be visualized for ease. The downloaded data requires considerable disk space.
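The AOI step can be illustrated with Python's standard `json` module: a GeoJSON polygon (positions in longitude, latitude order, as GeoJSON specifies) is parsed and reduced to the bounding box that data requests typically need. The coordinates below are hypothetical placeholders, not the actual Rahim Yar Khan coordinates shown in Figure 13, and only `Polygon` geometries are handled.

```python
import json

# Hypothetical GeoJSON polygon; GeoJSON positions are [longitude, latitude].
AOI_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {"name": "hypothetical AOI"},
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[70.1, 28.2], [70.5, 28.2], [70.5, 28.6],
                       [70.1, 28.6], [70.1, 28.2]]]
    }
  }]
}
"""

def polygon_bbox(geojson_text):
    """Return (min_lon, min_lat, max_lon, max_lat) over all polygon rings."""
    data = json.loads(geojson_text)
    lons, lats = [], []
    for feature in data["features"]:
        geom = feature["geometry"]
        if geom["type"] != "Polygon":
            raise ValueError("only Polygon geometries are handled in this sketch")
        for ring in geom["coordinates"]:
            for lon, lat in ring:
                lons.append(lon)
                lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)

print(polygon_bbox(AOI_GEOJSON))  # (70.1, 28.2, 70.5, 28.6)
```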

Figure 13: GeoJSON coordinates


Geographic coordinates of the Rahim Yar Khan area.

Our scope focuses on acquiring vegetation indices, especially the Normalized Difference Vegetation Index (NDVI), which is a useful tool for calibrating Leaf Area Index (LAI). Vegetation indices capture characteristic reflectance in different bands of the real-time data, and useful information is extracted using satellite image processing tools such as SNAP and ArcMap. Our research focuses on the Rahim Yar Khan area, with data collected from the start of the Kharif season (April 2020-June 2020) until its end. The acquired data is in TIFF format, which carries additional information such as map projections. The exact NDVI values are extracted with the SNAP toolbox, which computes NDVI from the relevant bands and reports the range of NDVI values in the data. Once determined, these values are used to calculate LAI, the best estimator of plant biomass, which is then used in the yield regression analysis. Different geographical points have to be sampled to obtain exact crop data. These values can be obtained in several formats; for example, NDVI can be obtained as a pure image in which colour levels distinguish NDVI ranges. We used the exact NDVI values instead, because this data is of high importance for the subsequent crop analysis. The advanced search facility in EO Browser helped us obtain data with 0% cloud coverage.

Figure 14: Data collection in EO browser


Advanced search facility in EO browser for data collection.

4.1.2 Preprocessing of Data


Leaf area index (LAI) is the most important vegetation parameter for calculating biomass and estimating crop yield. In this study, LAI is derived from remote sensing data rather than from field measurements. Most yield estimation methods and models use LAI as a canopy parameter. To calculate LAI, we first need the Normalized Difference Vegetation Index (NDVI). Canopy properties in remote sensing data are usually analysed with Spectral Vegetation Indices (SVIs), and NDVI, the most general SVI, is the one we needed for this study. It can be calculated as:
$$\mathrm{NDVI} = \frac{NIR - RED}{NIR + RED} \qquad (2)$$

Here RED is the reflectance in the red wavelength and NIR is the reflectance in the near-infrared wavelength. NDVI is calculated from Sentinel-2 imagery.
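Equation (2) maps directly onto array arithmetic. The sketch below assumes the red and near-infrared reflectances are already loaded as NumPy arrays (for Sentinel-2 these are bands B4 and B8, both at 10 m resolution); pixels where NIR + RED is zero are set to NaN instead of dividing by zero. Reading the GeoTIFF itself, for example with Rasterio, is left out, and the sample reflectance values are invented.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED), element-wise; NaN where the sum is 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Invented 2 x 2 reflectance values for Sentinel-2 B8 (NIR) and B4 (red).
b8 = np.array([[0.45, 0.30], [0.10, 0.00]])
b4 = np.array([[0.05, 0.10], [0.12, 0.00]])
print(ndvi(b8, b4))
```

Because reflectances are non-negative, every finite result lies between -1 and +1: dense vegetation (high NIR, low red) approaches +1, while water, where red can exceed NIR, gives negative values.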
To preprocess the Sentinel-2 images, NDVI is calculated for each crop separately. Sentinel-2 provides multispectral images; in this step, each crop's spectral information was collected and statistical analysis was performed. We downloaded the Sentinel-2 imagery from the Sentinel-2 website for preprocessing, selected cloud-free images and transformed the geographical coordinates as required. We then cropped the Sentinel-2 images to the crop fields of our study area. Since NDVI is a normalized combination of the red and NIR bands, as its formula shows, it produces values between -1 and +1. These NDVI values are then classified. Early in the season, NDVI values are low (close to 0, or even negative because water dominates the fields). The values gradually increase as the plants grow, before grain formation. After a crop-specific period, the index decreases as the grain matures, until harvest approaches. NDVI values are important for understanding crop health from different perspectives. For example, for rice, high NDVI values suggest high chlorophyll content; chlorophyll is an important part of rice because it drives the photosynthetic activity that produces carbohydrates, which support the growth of the plant.
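The seasonal NDVI pattern described above (low early in the season, rising with growth, falling towards harvest) suggests locating the peak of an NDVI time series, since the model described later works best on peak (maximum) values. The sketch below finds the peak date and bins NDVI into coarse classes. The class edges, dates and values are illustrative assumptions, not calibrated thresholds for rice or cotton in Rahim Yar Khan.

```python
import numpy as np

def peak_of_season(dates, ndvi_series):
    """Return (date, value) of the maximum NDVI, ignoring missing (NaN) values."""
    values = np.asarray(ndvi_series, dtype=float)
    i = int(np.nanargmax(values))
    return dates[i], float(values[i])

def classify_ndvi(values, edges=(0.0, 0.2, 0.5)):
    """Bin NDVI: 0 = water/non-vegetated (< 0), 1 = bare or sparse,
    2 = moderate vegetation, 3 = dense vegetation. Edges are illustrative."""
    return np.digitize(np.asarray(values, dtype=float), bins=edges)

# Invented acquisitions over one field; NaN marks a cloudy, unusable date.
dates = ["2020-04-05", "2020-04-25", "2020-05-15", "2020-06-04", "2020-06-24"]
series = [-0.05, 0.18, 0.46, 0.71, np.nan]
print(peak_of_season(dates, series))  # ('2020-06-04', 0.71)
print(classify_ndvi([-0.05, 0.18, 0.46, 0.71]))  # [0 1 2 3]
```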

Data Extraction using QGIS


QGIS is also used for preprocessing the data. It is geographic information system software used to access, preprocess and explore geospatial data and maps. We obtained the data from Sentinel Hub and preprocessed it further in QGIS.

Figure 15: Data Layer in QGIS


LAI and NDVI layers obtained from the data.

The core purpose of extracting data in QGIS is to obtain training and testing data, which is exported to CSV or XLSX format. A QGIS plugin called “Point Sampling Tool” is installed and used to extract the layer values at the data points specified as input to the tool.
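A rough NumPy equivalent of what the Point Sampling Tool produces is sketched below: random pixel locations are drawn, the LAI and NDVI raster values at those pixels are read, and the rows are written as CSV with Python's `csv` module. The synthetic rasters, column names, in-memory output buffer and fixed random seed are assumptions; the actual workflow used the QGIS plugin on georeferenced layers, which also records map coordinates.

```python
import csv
import io
import numpy as np

def sample_points(layers, n_points, seed=0):
    """Sample co-registered rasters at random, distinct pixel locations.

    layers: dict of name -> 2-D array, all with the same shape.
    Returns a list of dict rows with row/col indices and one value per layer.
    """
    shapes = {arr.shape for arr in layers.values()}
    if len(shapes) != 1:
        raise ValueError("all layers must share the same shape")
    height, width = shapes.pop()
    rng = np.random.default_rng(seed)
    flat = rng.choice(height * width, size=n_points, replace=False)
    rows, cols = np.unravel_index(flat, (height, width))
    return [
        {"row": int(r), "col": int(c),
         **{name: float(arr[r, c]) for name, arr in layers.items()}}
        for r, c in zip(rows, cols)
    ]

def write_csv(samples, handle):
    """Write sample rows to an open text handle with a header line."""
    writer = csv.DictWriter(handle, fieldnames=list(samples[0].keys()))
    writer.writeheader()
    writer.writerows(samples)

rng = np.random.default_rng(4)
ndvi_layer = rng.uniform(0.1, 0.8, size=(50, 50))
lai_layer = 5.0 * ndvi_layer  # synthetic stand-in for an LAI raster
samples = sample_points({"NDVI": ndvi_layer, "LAI": lai_layer}, n_points=10)
buffer = io.StringIO()
write_csv(samples, buffer)
print(buffer.getvalue().splitlines()[0])  # row,col,NDVI,LAI
```

To write a real file, pass a handle opened with `open("points.csv", "w", newline="")` instead of the in-memory buffer.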

Figure 16: Point Sampling Tool


Point Sampling Tool in QGIS for extracting random points.

4.1.3 Building and Training Machine Learning Model


As the findings above show, machine learning is widely used, and the literature review shows that neural networks and linear regression models are especially common. Different vegetation indices have been used in a variety of papers, and the real question is which vegetation indices should be used as independent variables to predict yield (the dependent variable). Equation (3) shows the relationship between these variables, where X is the independent variable, Y is the dependent variable, a is the intercept and b is the slope of the line. Thus, an empirical formula has to be calculated to predict cotton and rice yield in the Kharif harvest season. The model developed works best on the peak (maximum) predicted LAI. Depending on the crop type, the linear regression for cotton crop yield estimation is:

$$Y = a + bX \qquad (3)$$
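Equation (3) can be fitted by ordinary least squares, for which the slope and intercept have a closed form: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The sketch below applies this with peak LAI as X and yield as Y; the five LAI/yield pairs are invented for illustration and are not this project's field data.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of Y = a + bX; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    a = y_mean - b * x_mean
    return a, b

# Invented peak-LAI values and yields (t/ha) for five fields.
peak_lai = [2.1, 2.8, 3.3, 3.9, 4.5]
yield_t_ha = [2.0, 2.5, 2.9, 3.3, 3.8]
a, b = fit_line(peak_lai, yield_t_ha)
print(f"Y = {a:.3f} + {b:.3f} X")
print("predicted yield at LAI 3.5:", a + b * 3.5)
```

`np.polyfit(x, y, 1)` returns the same two coefficients, ordered as [slope, intercept].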

Other machine learning models are more complex and computationally expensive relative to the
size of our dataset. Neural networks are expensive to train and require a large amount of data,
which makes them slow. As discussed above, SVM is not sensitive enough to detect anomalies in
the data. This left us with regression techniques. Among these, multiple linear regression is
useful because it can incorporate several independent variables to predict yield. Normalized
Difference Vegetation Index (NDVI) statistics are used to assess crop state and to forecast
yield and production in various regions across the globe. Remote sensing of spectral
reflectance provides information about NDVI and other vegetation indices. NDVI relies on the
fact that actively growing plants strongly absorb radiation in the visible (red) part of the
spectrum while strongly reflecting radiation in the near-infrared region.
NDVI is calculated from the satellite reflectance bands, and the processed data is then used to
approximate the yield. Linear regression and correlation techniques are used to establish the
relationship between yield and NDVI, and there is a very strong correlation between the two.
NDVI and crop yield were found to be strongly positively correlated at the stem growth,
striking and maturity stages. The NDVI values were obtained from real-time satellite
observations. Regression equations and the coefficient of determination were used to build and
evaluate the linear regression. The lowest NDVI values occurred in March and the highest in June.
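NDVI is defined as (NIR − Red) / (NIR + Red). This definition is why dense, healthy vegetation, which has low red and high near-infrared reflectance, produces values close to 1. The following sketch computes NDVI per pixel. For Sentinel-2 the usual choice is band B8 for NIR and band B4 for red. The report does not state which bands were used, so that pairing is an assumption here.

```python
import numpy as np

def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).
    Pixels where NIR + Red == 0 are set to NaN instead of dividing by zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Example reflectances (assumed values): a vegetated pixel and an empty pixel
values = ndvi([0.5, 0.0], [0.1, 0.0])
```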

Libraries used
• CSV module for file reading and writing.
• NumPy module for scientific computation.
• Matplotlib module for creating figures and graphs.
• Scikit-learn module for the machine learning (regression) algorithms.
• SciPy module for computational purposes.

Leaf Area Index prediction using Linear Regression


As discussed in the data-gathering sections above, the real-time data, which has been exported
to an Excel sheet, is used to predict LAI values. The file is imported in Python and the required
data is loaded. The following piece of code shows the import of the data.

Figure 17: Loading Data Module


Piece of code for loading the data for prediction.
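The actual loading code appears only as a figure. The following is a minimal sketch that uses the CSV module listed above. It assumes the Excel sheet was saved as CSV with header columns named "NDVI" and "LAI". Those column names and the helper `load_columns` are hypothetical.

```python
import csv
import numpy as np

def load_columns(path, x_col, y_col):
    """Read two numeric columns from a CSV file into NumPy arrays,
    skipping rows where either value is blank."""
    xs, ys = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row[x_col].strip() and row[y_col].strip():
                xs.append(float(row[x_col]))
                ys.append(float(row[y_col]))
    return np.array(xs), np.array(ys)
```

Usage would be along the lines of `ndvi, lai = load_columns("lai_data.csv", "NDVI", "LAI")`, where the file name is again only a placeholder.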

To preprocess the data, we used the “reshape” function, which changes the shape of the loaded
data array without changing its values. The data was then divided into training and testing sets
for prediction. The Linear Regression model is implemented with the “sklearn.linear_model”
module of scikit-learn, a widely used Python library for machine learning. The following piece
of code shows the preprocessing of the data.

Figure 18: Preprocessing Data Module


Piece of code for preprocessing the data for prediction.
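Reshaping is needed because scikit-learn estimators expect the features as a 2-D array of shape (n_samples, n_features). A single NDVI column must therefore be turned into an (n, 1) matrix. The following sketch shows this step together with the split. The 30% test share is assumed from the 70/30 split reported later for the yield model, because the ratio used for LAI is not stated. The helper name `prepare` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def prepare(x, y, test_size=0.3, seed=42):
    """Reshape a 1-D feature vector into the (n_samples, 1) matrix that
    scikit-learn expects, then split into training and testing sets."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)  # same values, new shape
    y = np.asarray(y, dtype=float)
    # Returns X_train, X_test, y_train, y_test
    return train_test_split(X, y, test_size=test_size, random_state=seed)
```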

With the training and testing data loaded, we use them for regression analysis, which gives a
predicted LAI value for each actual LAI value. The model is fitted on the training data using its
fit function and then predicts LAI for the testing data. The following piece of code illustrates
this functionality.

Figure 19: Training Data Module


Piece of code for training and testing the data for prediction.
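The fit/predict cycle described above can be sketched with scikit-learn's `LinearRegression`. The NDVI/LAI pairs in this sketch are synthetic, generated from an assumed linear relation, and are not the project's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical NDVI/LAI pairs generated for illustration only
rng = np.random.default_rng(0)
ndvi = rng.uniform(0.1, 0.9, size=100)
lai = 0.5 + 5.0 * ndvi + rng.normal(0.0, 0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    ndvi.reshape(-1, 1), lai, test_size=0.3, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)       # learn intercept and slope from the training data
lai_pred = model.predict(X_test)  # predicted LAI for each test sample

for actual, predicted in zip(y_test[:5], lai_pred[:5]):
    print(f"actual LAI = {actual:.2f}, predicted LAI = {predicted:.2f}")
```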

After training, we passed the testing data to the model to obtain the predictions. There are
numerous relationships between LAI and different vegetation indices. Most of them are based on
the Normalized Difference Vegetation Index, because it reflects the greenness of a plant and is
more accurate for determining plant health and estimating yield. We used NDVI to calculate our
LAI values and, from them, to estimate the yield. The more accurate the LAI, the more accurate
the yield prediction. Both NDVI and LAI are affected by temperature, humidity, water, soil and
other factors. The following piece of code shows the actual and predicted values.

Figure 20: Output module


Piece of code for displaying the results achieved after
prediction.

Leaf Area Index prediction using Random Forest Regression


Random Forest Regression has been widely used to predict LAI values. Random Forest is an
ensemble learning technique that performs regression by building several decision trees and
averaging their predictions. The previous section illustrated LAI prediction using Linear
Regression. Here we show how to build and train a Random Forest Regressor. All the data loading
and preprocessing modules remain the same. Only the model changes, as shown in the following
piece of code.

Figure 21: Training Data Module


Piece of code for training and testing the data for prediction.

Here, n_estimators sets the number of decision trees built in the forest. After training, we
passed the testing data to the model to obtain the predictions. The actual and predicted LAI
values are displayed as shown in the following figure.
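The following is a minimal sketch of the Random Forest step with scikit-learn's `RandomForestRegressor`, where `n_estimators` is the number of trees. The report does not give the value it used, so 100 is only an assumption (it is also scikit-learn's default). The data is synthetic, generated from an assumed non-linear NDVI to LAI relation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical non-linear NDVI -> LAI relation, for illustration only
rng = np.random.default_rng(1)
ndvi = rng.uniform(0.1, 0.9, size=200)
lai = 6.0 * ndvi ** 2 + rng.normal(0.0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    ndvi.reshape(-1, 1), lai, test_size=0.3, random_state=1)

# Each of the n_estimators trees is grown on a bootstrap sample;
# the forest's prediction is the average of the trees' predictions.
forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)
lai_pred = forest.predict(X_test)
```

Unlike the linear model, the forest needs no assumed functional form, so it can follow the curved relation in this synthetic data.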

Figure 22: Output module


Piece of code for displaying the results achieved after
prediction.

The dependent and independent variables are again decided using the regression equations. LAI
is the dependent variable, and the vegetation indices, including NDVI, are the independent
variables. After this relation is determined, the yield is estimated from LAI. Higher LAI values
indicate a higher predicted yield for a given crop, and NDVI and LAI are strongly positively
correlated with each other. The coefficient of determination, R², gives the proportion of the
variation in the dependent variable that is explained by the independent variable. A root mean
square error (RMSE) of zero means the predicted and measured values are identical. Larger values
indicate larger differences. For a good model, the RMSE is expected to be between 10% and 20%.
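The three validation measures used in this report (R², RMSE and max error) can be computed directly from actual and predicted values, as the following sketch shows. The report states the RMSE target as a percentage but does not say how it is normalized. This sketch assumes division by the mean observed value, so `rmse_percent` is an illustrative choice. scikit-learn also provides `r2_score`, `mean_squared_error` and `max_error` in `sklearn.metrics` for the same purpose.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, RMSE, RMSE as a percentage of the mean observation (assumed
    normalization), and maximum absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2) # total variation
    rmse = np.sqrt(np.mean(residuals ** 2))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "rmse_percent": 100.0 * rmse / y_true.mean(),
        "max_error": np.max(np.abs(residuals)),
    }
```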
LAI and yield were measured with different instruments, and the methods were applied location
by location. The difference between the calculated yield and the actual yield must therefore be
assessed. For this purpose the RMSE is used, which indicates whether the predicted yield agrees
with the validated yield. Similarly, the coefficient of determination relates the actual and
predicted values. Here, ground-based and satellite-based LAI and yield values are compared to
validate whether the linear regression model performs well. Thus, for this crop type, the linear
regression model was developed to give the highest coefficient of determination, which
indicates that the predicted values are close to the real ones. Remote-sensing estimates of LAI
help to predict crop yield. Past research shows that LAI outperforms the other vegetation indices
for crop yield estimation, and that LAI provides information about yield about one month before
harvest, which significantly reduces labor-intensive manual work. This section summarized how to
calculate accurate yield estimates.

Yield Prediction using Random Forest Regression


Having predicted LAI above, we again use Random Forest regression, this time to predict yield.
The data consisted of LAI values and previous yield values for Punjab. The accuracy of the yield
prediction depends on the accuracy of LAI. Using remote sensing and Random Forest regression,
LAI was estimated with high accuracy, which helped to obtain good yield predictions. We now show
how to build and train the Random Forest Regressor for yield. The data was stored in an Excel
sheet, which we imported in Python to load the required data. The following piece of code shows
the import of the data.

Figure 23: Loading Data Module


Piece of code for loading the data for prediction.

To preprocess the data, we used the “reshape” function, which changes the shape of the data
array without changing the actual data. The data was divided 70/30 into training and testing
sets. The regression models are implemented with scikit-learn. The Random Forest Regressor is
provided by its “sklearn.ensemble” module. The code for these steps is shown below.

Figure 24: Preprocessing Data Module


Piece of code for pre-processing the data for prediction.

With 70% of the data used for training and 30% for testing, we use the loaded data for regression
analysis. This provides predicted yield values that can be compared with the actual yield values.
The previous sections illustrated LAI prediction using Linear Regression and Random Forest
Regression. We now build and train a Random Forest Regressor for yield in the same manner. All
the data loading and preprocessing modules remain the same. Only the model changes, as shown in
the following piece of code.
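Together, the LAI and yield models form a two-stage chain: NDVI to LAI, then LAI to yield. The following minimal sketch illustrates that chain with two Random Forest Regressors and the 70/30 split described above. The data is synthetic, not the Punjab data. The NDVI-to-LAI and LAI-to-yield relations, the `n_estimators` value and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 300
ndvi = rng.uniform(0.2, 0.9, size=n)
lai = 6.0 * ndvi ** 2 + rng.normal(0.0, 0.1, size=n)          # hypothetical NDVI -> LAI
crop_yield = 1.0 + 0.8 * lai + rng.normal(0.0, 0.2, size=n)    # hypothetical LAI -> yield

# 70/30 split of sample indices, shared by both stages
idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=7)

# Stage 1: NDVI -> LAI
lai_model = RandomForestRegressor(n_estimators=100, random_state=7)
lai_model.fit(ndvi[idx_train].reshape(-1, 1), lai[idx_train])

# Stage 2: LAI -> yield, trained on the observed LAI of the training samples
yield_model = RandomForestRegressor(n_estimators=100, random_state=7)
yield_model.fit(lai[idx_train].reshape(-1, 1), crop_yield[idx_train])

# Inference: test samples only supply NDVI, so LAI is predicted first
lai_hat = lai_model.predict(ndvi[idx_test].reshape(-1, 1))
yield_hat = yield_model.predict(lai_hat.reshape(-1, 1))
rmse = np.sqrt(np.mean((crop_yield[idx_test] - yield_hat) ** 2))
```

In this design, stage 2 is trained on observed LAI but applied to predicted LAI, so errors from stage 1 carry into the yield estimate. Training stage 2 on out-of-fold LAI predictions instead is one possible alternative.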

Figure 25: Training Data Module


Piece of code for training and testing the data for prediction.

The data was preprocessed and then passed to the Random Forest Regression model for training,
and predicted values were obtained for the testing data. The following figure shows the
predicted yield against the actual yield.

Figure 26: Output Module


Piece of code for displaying the results achieved after
prediction.

Yield was calculated from LAI values, and LAI is considered one of the most important parameters
for yield prediction. Many other factors may affect yield, but the results achievable with LAI
make it a reliable parameter. Previous studies on this topic have reported good results. However,
external factors such as temperature, rainfall, vegetation supersaturation, soil water content and
nutrients affect the yield even when it has been predicted correctly. The prediction could be
improved by taking these factors into account. Yield prediction is important for farmers as well
as the government. It can prevent major losses, help prepare for food shortages, and help
estimate surplus supply for export. This section showed the results of yield prediction from LAI
using Random Forest Regression.

Experimental Results and Analysis


Validation of LAI prediction using Random Forest Regression
The Random Forest regression model was developed to predict LAI from NDVI. Results are shown in
Figure 22. The RMSE is 0.1953 and the max error is 0.93. The differences between actual and
predicted values are very small. Therefore, the Random Forest regression model is valid for
predicting LAI from NDVI, with an accuracy of 93%. These results are given below.

Figure 23: Results achieved


Coefficient of determination, RMSE, Max error values of RF
for LAI.

Validation of Yield prediction using LAI by Random Forest Regression
The Random Forest regression model was developed to predict yield from LAI, which was calculated
with an accuracy of 93%. Results are shown in Figure 23. The RMSE is 0.93 and the max error is
2.72. The differences between actual and predicted values are not significant. Therefore, the
Random Forest regression model is valid for predicting yield from LAI, with an accuracy of 72%.
These results are given below.

Figure 24: Results achieved


Coefficient of determination, RMSE, Max error values of RF
for Yield.

Conclusion
Population growth has increased the demand for food all across the world. This makes it necessary
to predict crop yield beforehand so that crop production can be planned. In this way, farmers and
agricultural companies can estimate how much biomass is required for each type of crop and how
much yield it will produce. This also helps in case of natural disasters, since crops can be
stored in advance. In this project, we used remote sensing techniques to predict yield. Various
works have predicted yield using different satellites. Our study used the Sentinel-2 satellite
to obtain real-time data. Previous works show that researchers have applied different artificial
intelligence and machine learning models to calculate yield, and most papers use different
vegetation indices to estimate it. These indices often depend on each other, and multiple
equations have been used across papers in this regard.
We successfully used the Normalized Difference Vegetation Index (NDVI) and Leaf Area Index (LAI)
to determine the yield. We tried to improve the accuracy of the results by reducing the root mean
square error between the original data and the satellite data. In this way, better results can
be obtained and our objectives met. Through this, farmers will be able to know the best time to
harvest their crops at full yield. The challenges we faced were mostly in data collection and in
improving the accuracy of the model. Incorporating artificial intelligence and machine learning
concepts gave us a platform for solving this complex problem, along with the underlying problems
we encountered during the project setup. There were also some obstacles, such as getting access
to some of the software and applications needed to achieve the results.
Researchers planning to carry this work forward will need to research thoroughly how to obtain
the dataset. Moreover, previous crop yield prediction models have achieved similar results using
neural networks or regression techniques. Even so, there are possible underlying limitations,
such as the sensitivity of the data and the crop type. This research was based solely on cotton
and wheat under Pakistan’s atmospheric conditions. Had the data been available free of cost and
under the required optimum conditions, the research findings could have been improved
considerably.

References
[1] I. Herrmann, A. Pimstein, A. Karneili and Y. Cohen, "LAI assessment of wheat and potato
crops by VENμS and Sentinel-2 bands," ResearchGate, August 2011. [Online]. Available:
https://www.researchgate.net/publication/256850034_LAI_assessment_of_wheat_and_potato_crops_by_VENmS_and_Sentinel-2_bands.
[Accessed 20 September 2020].
[2] R. N. Sahoo, S. S. Ray and M. K. R, "Hyperspectral remote sensing of agriculture,"
ResearchGate, March 2015. [Online]. Available:
https://www.researchgate.net/publication/273321445_Hyperspectral_remote_sensing_of_agriculture.
[Accessed 20 September 2020].
[3] W. Lv and X. Wang, "Overview of hyperspectral image classification," 2020.
[4] J. Segerra, M. L. Buchaillot, J. L. Araus and S. C. Kefauver, "Remote sensing for
precision agriculture: Sentinel-2 improved features and applications," Agronomy Journal,
2020.
[5] S. Jin, W. Hui, Y. Wang, K. Huang, Q. Shi, C. Ying, D. Liu, Q. Ye, W. Zhou and J. Tian,
"Hyperspectral imaging using the single-pixel Fourier transform technique," 2017.
[6] M. Aboelghar, S. Arafat, A. Saleh, S. Naeem, M. Shirbeny and A. Belal, "Retrieving leaf
area index from SPOT4 satellite data," The Egyptian Journal of Remote Sensing and
Space Science, vol. 13, no. 2, pp. 121-127, 2010.
[7] A. M. Smith, C. Nadeau, J. Freemantle and H. Wehn, "Leaf area index from CHRIS
satellite data and applications in plant yield estimation," 2005.
[8] S. J. Maas, "Using satellite data to improve model estimates of crop yield," Agronomy
Journal, vol. 80, no. 4, 1988.
[9] G. Zheng and L. M. Moskal, "Retrieving leaf area index using remote sensing: Theories,
methods and sensors," 2009.
[10] L. He, X. Ren, Y. Wang, B. Liu, H. Zhang, W. Liu and W. Feng, "Comparing methods
for estimating leaf area index by multiangular remote sensing in winter wheat," 2020.
[11] J. Huang, X. Wang, X. Li, H. Tian and Z. Pan, "Remotely sensed rice yield prediction
using multi-temporal NDVI data derived from NOAA's AVHRR," 2013.
[12] A. Sharifi, "Yield prediction with machine learning algorithms and satellite images,"
Journal of the Science of Food and Agriculture, 2020.
[13] S. R. Sultana, A. Ali, A. Ahmed, M. Mubeen, M. Z. u. Haq and S. Ahmad, "Normalized
difference vegetation index as a tool for wheat yield estimation: A case study from
Faisalabad, Pakistan," vol. 2014, 2014.
[14] E. Paneka and D. Gozdowski, "Analysis of relationship between cereal yield and NDVI
for selected regions of central Europe based on MODIS satellite data," 2020.
[15] A. S. Andrade, P. M. Santos, J. R. M. Pezzopane, L. C. d. Araujo, B. C. Pedreira, C. G.
S. Pedreria, F. Marin and M. Lara, "Simulating tropical forage growth and biomass
accumulation: An overview of model development and application," Grass and Forage
Science, vol. 71, no. 1, 2015.
[16] J. L. Lindquist, T. J. Arkebauer, D. T. Walters, K. G. Cassman and A. Dobermann,
"Maize radiation use efficiency under optimal growth conditions," Agronomy Journal,
vol. 97, no. 1, 2005.
[17] M. E. Paoletti, J. M. Haut, J. Plaza and A. Plaza, "A new deep convolutional neural
network for fast hyperspectral image classification," ISPRS Journal of Photogrammetry
and Remote Sensing, vol. 145, pp. 120-147, 2018.
[18] L. He and G. Mostovoy, "Cotton yield estimate using Sentinel-2 data and an ecosystem
model for the southern US," Remote Sensing, 2019.

[19] S. Kalluri, P. Gilruth, D. Rogers and M. Szczur, "Surveillance of arthropod vector-borne
infectious diseases using remote sensing techniques: A review," 2007.
[20] I. R. Hegazy and M. R. Kaloop, "Monitoring urban growth and land use change
detection with GIS and remote sensing techniques in Daqahlia governorate Egypt,"
International Journal of Sustainable Built Environment, vol. 4, no. 1, pp. 117-124, 2015.
