
Computers and Electronics in Agriculture 153 (2018) 213–225

Contents lists available at ScienceDirect

Computers and Electronics in Agriculture


journal homepage: www.elsevier.com/locate/compag

Original papers

Integration of high resolution remotely sensed data and machine learning techniques for spatial prediction of soil properties and corn yield

Sami Khanal a,*, John Fulton b, Andrew Klopfenstein b, Nathan Douridas c, Scott Shearer b

a Department of Food, Agricultural and Biological Engineering, Ohio State University, Wooster, OH 44691, USA
b Department of Food, Agricultural and Biological Engineering, Ohio State University, Columbus, OH 43210, USA
c Farm Science Review, Ohio State University, London, OH 43140, USA

ARTICLE INFO

Keywords: Remote sensing; Soil; DEM; Yield; Mapping

ABSTRACT

Widespread adoption of precision agriculture requires timely acquisition of low-cost, high quality soil and crop yield maps. Integration of remotely sensed data and machine learning algorithms offers a cost- and time-effective approach for spatial prediction of soil properties and crop yield compared to conventional approaches. The objectives of this study were to: (i) evaluate the role of remotely sensed images; (ii) compare the performance of various machine learning algorithms; and (iii) identify the importance of remotely sensed image-derived variables, in spatial prediction of soil properties and corn yield. This study integrated field based data on five soil properties (i.e., soil organic matter (SOM), cation exchange capacity (CEC), magnesium (Mg), potassium (K), and pH) and yield monitor based corn yield data with multispectral aerial images and topographic data, both collected in 2013, from seven fields at the Molly Caren Farm near London, Ohio. Digital elevation model data, at a resolution of 1 m, were used to derive topographic properties of the fields. Multispectral images collected at bare-soil conditions, at a resolution of 0.30 m, were used to derive soil and vegetation indices. Models developed for prediction of soil properties and corn yield using linear regression (LM) and five machine learning algorithms (i.e., Random Forest (RF); Neural Network (NN); Support Vector Machine (SVM) with radial and linear kernel functions; Gradient Boosting Model (GBM); and Cubist (CU)) were evaluated in terms of coefficient of determination (R²) and root mean square error (RMSE). Machine learning algorithms were found to outperform the LM algorithm in most cases, with a higher R² and lower RMSE. Based on models for seven fields, on average, NN provided the highest accuracy for SOM (R² = 0.64, RMSE = 0.44) and CEC (R² = 0.67, RMSE = 2.35); SVM for K (R² = 0.21, RMSE = 0.49) and Mg (R² = 0.22, RMSE = 4.57); and GBM for pH (R² = 0.15, RMSE = 0.62). For corn yield, RF consistently outperformed other models and provided higher accuracy (R² = 0.53, RMSE = 0.97). Soil and vegetation indices based on bare-soil imagery played a more significant role in demonstrating in-field variability of corn yield and soil properties than topographic variables. The accuracy of the models developed for prediction of soil properties and corn yield observed in this study suggested that the approach of integrating remotely sensed data and machine learning algorithms is promising for mapping soil properties and corn yield at a local scale, which can be useful in locating areas of potential concern and implementing site-specific farming practices.

1. Introduction

Accurate and detailed information on soil properties and crop health is essential for optimization of farm management practices for sustainable production of agricultural goods and services (Souza et al., 2016; Yao et al., 2016), as well as for environmental modeling, and environmental risk assessment and management. High resolution maps of soil properties and crop yields enable producers and the agricultural community to identify in-field variability in soil and crop health and target areas within the field for soil fertility interventions, improved crop productivity, and better economic outcomes.

Traditional approaches for mapping soil properties and crop yield have mostly relied on field surveys and the use of costly equipment. Soil sampling and laboratory analyses are conducted for evaluating soil health, and harvester-mounted yield monitors are used for understanding the spatial variability in crop yield. These approaches, however, are time consuming and expensive, especially when mapping needs to be done at regional, national, and global scales (Mulder et al., 2011;


Corresponding author.
E-mail address: Khanal.3@osu.edu (S. Khanal).

https://doi.org/10.1016/j.compag.2018.07.016
Received 9 January 2018; Received in revised form 17 April 2018; Accepted 8 July 2018
Available online 23 August 2018
0168-1699/ © 2018 Elsevier B.V. All rights reserved.

Table 1
Basic characteristics of the fields studied, including field size, slope, dominant soil map unit, dominant soil order, number of soil samples, and field management practices.

| Field | Size (ha) | Slope (%) | Soil map unit | Dominant soil order | Sample number | Tillage | Crop rotation |
|---|---|---|---|---|---|---|---|
| 1B | 11 | 4.37 | Ochraqualfs (40.7%), Argiaquolls (31%), Epiaqualfs (18%), Argiudolls (10.3%) | Alfisols | 27 | NT | C-C-S |
| 1C | 5.3 | 5.86 | Ochraqualfs (74%), Argiaquolls (26%) | Alfisols | 17 | CT | C-S-C |
| 1D | 6.5 | 4.35 | Ochraqualfs (94.8%), Argiaquolls (5.2%) | Alfisols | 20 | NT | C-S-C |
| 9A | 13.3 | 5.7 | Ochraqualfs (58%), Argiaquolls (42%) | Alfisols | 39 | CT | C-S-C |
| 12D | 17.5 | 4.98 | Argiaquolls (46%), Hapludalfs (27.9%), Ochraqualfs (23.8%) | Mollisols | 49 | CT | W-S-C |
| MISD | 12 | 9.26 | Ochraqualfs (82.5%), Argiaquolls (18.5%) | Alfisols | 36 | CT | S-W-C |
| PENIN | 3.8 | 9.6 | Ochraqualfs (98%) | Alfisols | 12 | NT | C-C-S |

Tillage: NT – no till; CT – conventional tillage (i.e., a field cultivator was used prior to planting the crop). Crop rotation: C – corn; S – soybean; W – wheat.

Yang et al., 2014). Furthermore, these approaches have several limitations. For example, yield monitor based data can only be collected at harvest and, thus, cannot be used for in-season crop management. Also, these data are spatially coarse and fail to capture in-field variability in soil and crop health (Souza et al., 2016).

Remotely sensed images have the potential to overcome the limitations of traditional approaches and improve the spatial coverage of soil and crop yield data (Peng et al., 2015; Stevens et al., 2013; Yao et al., 2016). Studies have demonstrated that many soil properties can be estimated by integrating georeferenced field collected soil and crop data with spectral properties of soil acquired by sensors onboard satellites and aircraft. Dobos et al. (2001) found the Advanced Very High Resolution Radiometer (AVHRR) satellite data and DEM derived terrain variables to be powerful in characterizing soil-forming environments and delineating soil patterns on a regional scale. Scudiero et al. (2014) found multi-year spectral reflectance data from Landsat to be a reliable indicator of soil salinity in the western San Joaquin Valley in California, USA. Several studies have also focused on crop yield mapping by integrating remotely sensed images acquired from satellites (Lobell et al., 2015), aircraft (Yang et al., 2014), and unmanned aerial vehicles (Geipel et al., 2014; Shi et al., 2016).

Despite prior efforts, further exploration of the application of remotely sensed data for mapping of soil properties and crop yield is needed. The success in prediction and mapping of soil properties, and crop health and yield, using remotely sensed data depends to a large extent on the availability, quality, and timing of remotely sensed data collection (Blasch et al., 2015), as well as the approaches used for model development (Forkuor et al., 2017; Morellos et al., 2016). Prior studies have mostly focused on estimating crop yield and soil properties at regional scales rather than for individual fields (Lobell et al., 2015). These studies used satellite acquired remotely sensed images with coarse spatial resolution. Mapping of soil properties and crop yield at coarse resolution is of limited use for resource assessment and management at a field scale, whereas maps at high resolution can help the agricultural and environmental community to cost-effectively detect and characterize the extent of soil and crop health issues. This information can be used for prescription-based farming that helps improve the economic outcome and environmental footprint associated with agricultural practices.

A linear regression algorithm is the most commonly used approach to estimate crop yield and soil properties (Geipel et al., 2014; Lobell et al., 2015). However, it has limitations in handling the non-linear relationships between response and predictor variables that usually exist in heterogeneous agricultural landscapes. There are several machine learning algorithms that can overcome this limitation and provide better prediction of soil variables and crop yield. However, comparisons of the traditional linear regression algorithm to machine learning algorithms for prediction of soil properties and crop yield are limited. In addition to understanding the performance of various models in mapping soil properties and crop yield, there is a need to identify the relative importance of variables for enhancing the predictive ability of the models.

The objectives of this study were to: (i) examine the role of remotely sensed images; (ii) evaluate the performance of linear regression and machine learning algorithms; and (iii) identify the importance of remotely sensed image-derived variables, for prediction and mapping of soil properties and corn yield. Seven statistical models were developed for predicting corn yield and soil properties. Soil properties examined in this study included soil organic matter (SOM), cation exchange capacity (CEC), potassium (K), magnesium (Mg), and pH. Prior studies (Forkuor et al., 2017; Morellos et al., 2016) have used remotely sensed data for mapping of soil properties; however, this is to our knowledge the first evaluation of remotely sensed images of bare soil surface at a spatial resolution < 1 m from multiple fields for prediction and mapping of both soil properties and corn yield.

2. Materials and methods

2.1. Study area

Fields examined in this study are located in the northwest part (83°26′14.3″–83°26′49.24″W, 39°56′37.82″–39°57′28.7″N) of Madison County, Ohio, USA. The dominant soil types in these fields are Ochraqualfs (Crosby-Lewisburg Complex), Argiaquolls (Kokomo Silty Clay Loam, Westland silty clay loam), and Hapludalfs (Miamian Silt Loam, Eldean silt loam, Thackery variant silt loam) (Table 1). These fields are gently rolling, with the mean slope ranging from 4.35 to 9.26%. The average elevation of the fields is 311 m. The mean annual rainfall (1981–2016) is 998 mm, with approximately 58% of annual rainfall occurring between April and September. The mean annual temperature is 10.9 °C, with daily temperatures ranging from −6.7 (minimum) to 29.2 °C (maximum).

A strong spatial variability in soil properties was observed in the study area. Soil properties were characterized by a large range and high standard deviation, with SOM in the range of 1.2–4.9 (%), CEC of 6–27.3 (meq/100 g), K of 1.2–5.9 (%), Mg of 10.2–36.7 (%), and pH of 5–7.8 (Table 2).

2.2. Data

2.2.1. Soil and crop data

A total of 200 soil samples were collected from seven bare fields (Table 1) on October 1, 2013. In each field, samples were taken at a depth of 18 cm on 1-acre intervals. The samples were air-dried at 49 °C (120 °F) for 24 h, sieved, and sent to the Spectrum Analytic lab (Spectrum Analytic, 2017) for soil analyses. As field 12D has very different soil map units compared to the six other fields (Table 1), soil samples were classified into two dominant soil orders (Alfisols and Mollisols), and a “group” was introduced as an independent variable for model development.

Corn yield data were available for only one field (i.e., 12D), and thus, the models for corn yield prediction were focused on this field only. Corn yield data were recorded by a John Deere yield monitoring system during harvest. The yield monitor was calibrated before and


during the harvest to minimize the potential error in yield estimates. Despite proper calibration, due to changes in machine orientation and speed, a yield monitor is likely to provide erroneous yield estimates at field edges (Lyle et al., 2014). Thus, a 20 m buffer was established inside the field edges, and the yield data from that buffer were excluded from the analyses. Additionally, yield data were checked for errors, and errors were removed using the workflow discussed by Sudduth and Drumm (2007).

Table 2
Summary of soil properties and corn yield for study area.

| Soil variables | Minimum | Maximum | Mean | Standard deviation |
|---|---|---|---|---|
| All data | | | | |
| SOM (%) | 1.20 | 4.90 | 2.31 | 0.71 |
| CEC (meq/100 g) | 6.00 | 27.30 | 15.31 | 4.18 |
| K (%) | 1.20 | 5.90 | 2.28 | 0.56 |
| Mg (%) | 10.20 | 36.70 | 26.24 | 5.24 |
| pH | 5 | 7.8 | 6.75 | 0.65 |
| Corn yield (t/ha) | 6.4 | 18.0 | 14.39 | 1.47 |
| Train data | | | | |
| SOM (%) | 1.2 | 4.9 | 2.32 | 0.68 |
| CEC (meq/100 g) | 6 | 27.3 | 15.35 | 4.17 |
| K (%) | 1.2 | 5.9 | 2.29 | 0.58 |
| Mg (%) | 10.2 | 36.7 | 26.44 | 5.10 |
| pH | 5 | 7.8 | 6.76 | 0.65 |
| Corn yield (t/ha) | 6.11 | 18.0 | 14.4 | 1.47 |
| Test data | | | | |
| SOM (%) | 1.2 | 4.6 | 2.26 | 0.77 |
| CEC (meq/100 g) | 7.1 | 22.5 | 15.1 | 4.15 |
| K (%) | 1.6 | 3.6 | 2.2 | 0.4 |
| Mg (%) | 11.7 | 36.7 | 25.36 | 5.6 |
| pH | 5.4 | 7.7 | 6.71 | 0.66 |
| Corn yield (t/ha) | 6.9 | 17.9 | 14.41 | 1.44 |

Note: Train and test data indicate soil samples and corn yield observations used for model development and validation, respectively.

2.2.2. Remotely sensed data

Remotely sensed data used in this study included high spatial resolution multispectral images collected from bare fields and a digital elevation model (DEM). Multispectral images (Fig. 1) were obtained in May of 2013 under the Ohio Statewide Imagery Program. They were collected with a Leica ADS80 digital camera onboard an aircraft, rectified using LiDAR data, and have visible (red, green, and blue) and near-infrared wavebands at 0.30 m spatial resolution. Six soil and vegetation indices that were found useful in digital soil mapping (Ray et al., 2004) were calculated using combinations of the spectral bands in the multispectral images. Table 3 provides further details on the spectral indices considered in the study. Terrain variables (Table 4) were extracted using DEM data, with 0.76 m resolution, available from the Ohio Geographically Referenced Information Program. Prior to the calculation of terrain variables for analyses, the DEM was pre-processed to generate a depression-free DEM. To ensure proper integration of the datasets, the DEM was resampled to 0.30 m resolution, the resolution of the multispectral images, using the bilinear interpolation method.

To minimize the potential variance among pixels that might have been introduced by various factors, such as microtopography, image processing, and scanning, a low-pass filter with a 5 by 5 cell mask was applied to each band of the multispectral images and the DEM (Hively et al., 2011). Bare soil imagery for field 12D was classified into three soil color classes (dark, medium, and light) using supervised algorithms including support vector machine (with linear and radial kernels), random forest, and neural network in R software. Among these algorithms, support vector machine with the radial function provided the highest classification accuracy of 81%. Details of these algorithms are provided in Section 2.3.1.

Spectral bands, spectral indices, and terrain properties were extracted at locations used for collecting soil samples and yield data using ArcGIS software, and related with soil properties and corn yield to establish the relationship between remotely sensed and field-measured

Fig. 1. Seven fields used in the study. Circles and stars indicate spatial locations of soil samples used for model development and validation, respectively. Field 12D was used for corn yield prediction. The background image is a multispectral image displayed with a combination of red, green and blue wavebands. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Table 3
Soil and vegetation indices considered for the analyses.

| Indices | Formula | Index property | Reference |
|---|---|---|---|
| Brightness Index (BI) | $\left(\frac{R^2 + G^2 + B^2}{3}\right)^{0.5}$ | Average reflectance magnitude | Ray et al. (2004) |
| Saturation Index (SI) | $\frac{R - B}{R + B}$ | Spectral slope | Ray et al. (2004) |
| Hue Index (HI) | $\frac{2R - G - B}{G - B}$ | Primary colors | Ray et al. (2004) |
| Coloration Index (CI) | $\frac{R - G}{R + G}$ | Soil color | Ray et al. (2004) |
| Redness Index (RI) | $\frac{R^2}{B \times G^3}$ | Hematite content | Ray et al. (2004) |
| Normalized Difference Vegetation Index (NDVI) | $\frac{NIR - R}{NIR + R}$ | Health and amount of vegetation | Ray et al. (2004) |
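As a minimal sketch of how the indices in Table 3 might be computed for a single pixel, the following Python function applies the six formulas to red, green, blue, and near-infrared band values. The function name, the example band values, and the handling of zero denominators are illustrative assumptions; the study itself derived these indices from the imagery and does not describe an implementation.

```python
import math


def spectral_indices(r, g, b, nir):
    """Return the six Table 3 indices for one pixel's band values."""

    def ratio(num, den):
        # Assumption: a zero denominator yields NaN; the paper does not
        # say how such pixels were treated.
        return num / den if den != 0 else float("nan")

    return {
        "BI": math.sqrt((r ** 2 + g ** 2 + b ** 2) / 3.0),  # brightness
        "SI": ratio(r - b, r + b),                          # saturation
        "HI": ratio(2 * r - g - b, g - b),                  # hue
        "CI": ratio(r - g, r + g),                          # coloration
        "RI": ratio(r ** 2, b * g ** 3),                    # redness
        "NDVI": ratio(nir - r, nir + r),                    # vegetation
    }


# Hypothetical band values for one bare-soil pixel, for illustration only.
pixel = spectral_indices(r=0.30, g=0.20, b=0.10, nir=0.50)
```

In practice the same arithmetic would be applied to whole raster bands at once; the per-pixel form is used here only to keep the formulas easy to compare with the table.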

Table 4
Terrain variables considered in the study.

| Parameters | Definition | Units | References |
|---|---|---|---|
| Elevation (Elev) | Height above sea level | Meter | |
| Slope | Inclination of the land surface | Degree | Allen et al. (2014) |
| Aspect | Direction the slope faces | Degree | Davy and Koen (2014) |
| Roughness (Rough) | Difference between maximum and minimum elevation | – | Wilson et al. (2007) |
| Terrain Ruggedness Index (TRI) | Amount of elevation difference between neighboring areas | – | Riley (1999) |
| Topographic Position Index (TPI) | Measure of where a location is in the overall landscape | – | Wilson et al. (2007) |
| Flow Direction (FlowDir) | Path of water flow | – | Kitchingman and Lai (2004) |
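To make three of the definitions in Table 4 concrete, the sketch below computes roughness, TPI, and TRI for one interior DEM cell from its 3 × 3 neighbourhood. The window size, the function names, and the example elevations are assumptions made for illustration. TRI is written here as the mean absolute difference between the centre cell and its eight neighbours, a form commonly attributed to Wilson et al. (2007); Riley (1999) formulates it with squared differences, and the paper does not state which variant was computed.

```python
def window3x3(dem, row, col):
    """Return the 3x3 neighbourhood of an interior cell as a flat list."""
    return [dem[r][c]
            for r in range(row - 1, row + 2)
            for c in range(col - 1, col + 2)]


def terrain_metrics(dem, row, col):
    """Roughness, TPI and TRI for one interior cell (assumed 3x3 window)."""
    cells = window3x3(dem, row, col)
    centre = cells[4]
    neighbours = cells[:4] + cells[5:]
    return {
        # Difference between the maximum and minimum elevation in the window.
        "Rough": max(cells) - min(cells),
        # Centre elevation relative to the mean of its eight neighbours.
        "TPI": centre - sum(neighbours) / 8.0,
        # Assumed variant: mean absolute difference to the eight neighbours.
        "TRI": sum(abs(centre - z) for z in neighbours) / 8.0,
    }


# Hypothetical elevations (m) on a uniform slope, for illustration only.
dem = [
    [310.0, 310.5, 311.0],
    [310.5, 311.0, 311.5],
    [311.0, 311.5, 312.0],
]
metrics = terrain_metrics(dem, 1, 1)
```

On this uniform slope the centre cell sits exactly at the mean of its neighbours, so TPI is zero while roughness and TRI are not, which illustrates why the three variables carry different information.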

data. To extract information from images and relate it to corn yield, a rectangle with the length equal to the harvester swath width (6.32 m) and width equal to the distance between logged data points (2 m) in a row was drawn around each logged yield data point. This was done because the size of the area represented by each logged point in a yield monitor is proportional to the header width and the distance travelled by the combine harvester between logged data points. Image related information was extracted at these polygons using the zonal statistics function of the Spatial Analyst tool in ArcGIS. Each rectangle included 97 pixels from the images with 0.30 m resolution, and an average of these pixel values was related to a yield observation. It is our understanding that the DEM and multispectral images used in this study are the highest spatial resolution datasets ever used in mapping soil properties.

2.3. Statistical analyses

All statistical analyses were performed using R software. Six statistical algorithms, linear regression (LM), random forest regression (RF), support vector machine (SVM), stochastic gradient boosting model (GBM), neural network (NN), and cubist (CU), were used to develop models for predicting soil properties and corn yield (SVM was fitted with both linear and radial kernels, giving seven models in total). The performance of these models was then compared to determine the best model. For the model development, the statistical package “caret” was used. The “caret” package allows fitting and comparison of numerous linear and nonlinear regression models under a unified framework (Kuhn and Johnson, 2013). To provide an unbiased sense of model effectiveness, the total data were randomly split into training and test sets at a 4:1 ratio, where the training set was used for model calibration and the test set was used for model evaluation. Although the splitting of data was random, mean values of corn yield, SOM, CEC, K, Mg, and pH in the training and test sets were ensured to be similar so that the calibrated models were well trained to predict the range of soil properties and crop yield in the test dataset (Table 2).

2.3.1. Statistical models

During the model design, soil properties and corn yield were the dependent variables, and spectral, soil color class, and terrain variables were the independent or predictor variables. One soil property was modeled at a time as the dependent variable against all predictor variables. For each model, the adjusted R² and residual standard error were considered. During model development, random numbers are used for splitting data for resampling and parameter estimation (Kuhn and Johnson, 2013). To control this randomness, so that the same resampling sets were used during cross-validation of all models and results were reproducible and comparable between models, the same random number seed was set prior to the development and training of all the models. Variables were scaled prior to model runs to ensure that all variables were on the same scale. Seven different models (briefly discussed below) were developed for each soil parameter and corn yield estimation. The details on the background and mathematical function of these models can be found in Kuhn and Johnson (2013).
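The 4:1 random split, the fixed random seed, and the scaling of variables described above can be illustrated with a short Python sketch. This is not the authors' R workflow: the function names, the seed value, the z-score form of scaling (using training-set statistics), and the example SOM values are all assumptions chosen for illustration.

```python
import random
import statistics


def split_train_test(values, test_fraction=0.2, seed=42):
    """Randomly split a list into training and test parts (4:1 by default)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(values)))
    rng.shuffle(indices)
    n_test = round(len(values) * test_fraction)
    test = [values[i] for i in indices[:n_test]]
    train = [values[i] for i in indices[n_test:]]
    return train, test


def standardize(train, test):
    """Z-score both sets using the training mean and standard deviation."""
    mean = statistics.mean(train)
    sd = statistics.stdev(train)

    def scale(xs):
        return [(x - mean) / sd for x in xs]

    return scale(train), scale(test)


# Hypothetical SOM values (%), for illustration only.
som = [1.2, 1.8, 2.0, 2.1, 2.3, 2.4, 2.6, 2.9, 3.5, 4.9]
train, test = split_train_test(som)
train_z, test_z = standardize(train, test)
```

Comparing `statistics.mean(train)` with `statistics.mean(test)` after such a split is one simple way to check that the two sets cover a similar range, in the spirit of the comparison reported in Table 2.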

Table 5
Explanatory variables selected for modeling of each soil parameter, and corn yield after stepwise regression.

| Parameter | Spectral variables | Other variables |
|---|---|---|
| Soil Organic Matter (SOM) | Red, blue, NIR, NDVI, BI, CI, RI, SI | TRI, Group |
| Cation Exchange Capacity (CEC) | Red, green, blue, NDVI, BI | FlowDir |
| Potassium (K) | Green, blue, NIR, NDVI, BI, CI, HI, RI, SI | Elev, TRI, Slope |
| Magnesium (Mg) | NIR, BI, CI, HI, RI | Elev, Group, Slope, TRI, FlowDir |
| pH | Red, BI, HI, RI | Group, FlowDir, Slope, TRI, Rough |
| Yield (Approach 1) | Red, green, blue, NIR, NDVI, BI, CI, HI, RI, SI, soil class | Elev, Aspect, TRI, TPI, FlowDir |
| Yield (Approach 2) | | SOM, CEC, Mg, K, pH |


2.3.1.1. Linear regression model (LM). It explains the dependent variable by means of a linear combination of predictor variables. For the LM analysis, the “lm” function was used. A stepwise regression was conducted to address the problem of multi-collinearity in the regression model. Stepwise regression identifies a subset of predictors based on their statistical significance, using approaches such as forward selection, backward elimination, and a combination of the two. For stepwise regression, the “stepAIC” function, based on both the forward and backward search methods, was used. This function is available in the “MASS” package of the R software, and it uses the AIC statistic as the criterion for variable selection. Table 5 provides the list of predictors identified by stepwise regression.

2.3.1.2. Random forest (RF). It is an ensemble learning method that is used for both classification and regression problems. It operates by constructing multiple decision trees and outputting either a class for classification or the mean prediction of the individual trees for regression. Each tree in the forest is independently constructed using a unique bootstrap sample of the training data. At each node, the best split is chosen from a randomly selected subset of predictors. Unlike linear regression, it requires no assumption about the probability distribution of the predictor variables, and is robust against nonlinearity and overfitting. In the study, the RF model was developed using the “rf” method available through the “randomForest” package. The model’s performance was optimized by tuning parameters such as the number of predictors that are randomly sampled as candidates for each split (i.e., mtry) and the number of trees to grow in the forest (i.e., ntree).

2.3.1.3. Support Vector Machine (SVM). This is a machine-learning method that constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification or regression. A good separation between hyperplanes is achieved through different types of kernel functions such as linear, radial, sigmoid, and polynomial. For simplicity, only linear and radial kernel functions were selected in this study. The SVM models were tuned based on the bandwidth, cost parameter, and insensitive loss function.

2.3.1.4. Stochastic gradient boosting (GBM). This is another data mining approach that combines the advantages of nonparametric tree-based methods and the strengths of boosting algorithms. Instead of using the complete training data, it performs boosting on only a fraction of the training data at each step, leading to a gradual improvement in the prediction accuracy. In the study, the model’s performance was optimized by tuning parameters such as tree depth, number of trees, and shrinkage.

2.3.1.5. Neural network (NN). It is a powerful nonlinear regression approach designed to model or mimic some properties of biological neural networks. It consists of interconnected processing elements called nodes or neurons that work together to produce an output function. The connections between nodes are described by weights, which are randomly chosen at the beginning but are adjusted iteratively if the predicted output does not match the output of a training dataset. The resilient backpropagation algorithm (rprop) was used because of its promising capabilities compared to other algorithms (Riedmiller and Braun, 1993). NN models were optimized by tuning parameters such as the number of hidden layers and the decay rate.

2.3.1.6. Cubist model (CU). This is a data-mining technique for generating data driven rule-based predictive models. It works in a similar way to decision tree regression models. A tree is created where the terminal leaves contain linear regression models. At each step of the tree, there are intermediate linear models. The tree is then reduced to a set of rules that initially are paths from the top of the tree to the bottom, and the linear model is then adjusted to reduce the absolute error. This model is easy to understand and interpret, and was found to have good predictive power (Minasny and McBratney, 2008). The model was tuned with 10 committees and 5 neighbors to reduce the root mean square error.

The performance of the machine learning algorithms was enhanced by tuning several parameters specific to each model (discussed above) using tenfold cross-validation with five repetitions in the “caret” package. A grid search strategy was used while optimizing the parameters specific to the models. The parameters selected for model tuning are provided in the supporting document (Table S1; see supplementary document).

To allow comparison with the other models, the same set of predictors (Table 5) was maintained for all other models. For corn yield prediction, two approaches were used. The first approach used only remotely sensed image-derived independent variables, and the second approach used only soil variables, which were predicted using remotely sensed data.

2.3.2. Model assessment and validation

The performance of the statistical models was assessed using a repeated k-fold cross-validation resampling technique on the training set, and then validated with the test set. For k-fold cross-validation, samples are randomly partitioned into k sets of roughly equal size. A model is fit using all samples except the first subset, and the held-out samples are used to estimate performance measures. The first subset is then returned to the training set, and the process repeats with the second subset held out, and so on. This approach tests the performance of a model on every instance in the available data set without having used it in the training phase. In this study, a 10-fold cross-validation was repeated five times, resulting in 50 different held-out subsets for testing model efficacy. The results were then aggregated and summarized.

Statistics including adjusted R square (hereafter referred to as “R²”) and root mean squared error (RMSE) were used to evaluate the effectiveness of the models in predicting soil properties. R² can be interpreted as the proportion of the information in the data that is explained by the model. It is a measure of correlation, not accuracy. A model with a high R² value may not necessarily lead to accurate prediction, and could systematically and significantly over- and/or under-predict the data. Thus, RMSE (Eq. (1)), a function of the model residuals (i.e., observed values minus model predictions) that represents how far, on average, the residuals are from zero, or the average distance between observed values (O) and model predictions (P), was also used.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \qquad (1)$$

2.3.3. Variable importance

A variable importance measure was estimated to understand the relative importance of predictors to the outcome of the various models. The variable importance measure is estimated based on a method specific to the model (Kuhn, 2017). For instance, in LM, it is computed based on the absolute value of the t-statistic of each model parameter. In RF, it is computed based on two common measures: increased mean square error (IncMSE) and increased impurity index (IncNodePurity). IncMSE measures the change in predictive power by constructing trees with and without a predictor. IncNodePurity measures the total decrease in node impurity from splitting on a predictor in the tree construction process, and is averaged over all trees. For this study, the IncMSE measure was used. In NN, variable importance is estimated based on the combination of the absolute values of weights. In CU, it is estimated based on the percentage of times each variable is used in a condition and/or linear model. The varImp function in the “caret” package in the R software was used for this purpose.
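The resampling scheme of Section 2.3.2 (10-fold cross-validation repeated five times, giving 50 held-out subsets) and the RMSE of Eq. (1) can be sketched in plain Python as follows. This is an illustrative stand-in for the resampling that the “caret” package performs in R, not a reproduction of it: the function names, the seed, and the training-set size of 160 samples (four fifths of 200, assuming an exact 4:1 split) are assumptions.

```python
import math
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=1):
    """Yield (train_idx, heldout_idx) pairs for repeated k-fold CV."""
    rng = random.Random(seed)  # one seed so every model sees the same folds
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Partition the shuffled indices into k folds of roughly equal size.
        folds = [order[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            yield train, held_out


def rmse(observed, predicted):
    """Root mean square error between observations and predictions, Eq. (1)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


# Assumed training-set size, for illustration only.
resamples = list(repeated_kfold(n_samples=160))
```

Each model would be fitted on `train` and scored with `rmse` on `held_out` for all 50 resamples, and the scores then averaged. Reusing the same seed for every algorithm keeps the folds identical across models, which is the purpose of the fixed seed described in Section 2.3.1.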


Fig. 2. Correlation among 24 variables for all seven fields. The abbreviations of the variables are provided in Tables 3 and 4. Red text indicates correlation values that are not significant at p < 0.10. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

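Table 6
Model performance for estimation of soil properties for all seven fields.

Cross-validation with training dataset

| Model | SOM R² | SOM RMSE | CEC R² | CEC RMSE | K R² | K RMSE | Mg R² | Mg RMSE | pH R² | pH RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| LM | 0.55 | 0.47 | 0.65 | 2.40 | 0.23 | 0.55 | 0.21 | 4.69 | 0.14 | 0.63 |
| RF | 0.56 | 0.47 | 0.61 | 2.53 | 0.19 | 0.50 | 0.10 | 4.96 | 0.13 | 0.63 |
| SVML | 0.54 | 0.48 | 0.65 | 2.40 | 0.18 | 0.50 | 0.22 | 4.57 | 0.14 | 0.63 |
| SVMR | 0.56 | 0.47 | 0.62 | 2.50 | 0.21 | 0.49 | 0.11 | 4.85 | 0.09 | 0.64 |
| GBM | 0.58 | 0.46 | 0.63 | 2.45 | 0.18 | 0.50 | 0.08 | 4.99 | 0.15 | 0.62 |
| NN | 0.61 | 0.44 | 0.67 | 2.35 | 0.16 | 0.51 | 0.11 | 5.03 | 0.12 | 0.65 |
| CU | 0.60 | 0.46 | 0.61 | 2.51 | 0.22 | 0.51 | 0.11 | 5.04 | 0.16 | 0.63 |

Validation with test dataset

| Model | SOM R² | SOM RMSE | CEC R² | CEC RMSE | K R² | K RMSE | Mg R² | Mg RMSE | pH R² | pH RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| LM | 0.56 | 0.61 | 0.61 | 3.08 | 0.12 | 0.55 | 0.27 | 5.16 | 0.13 | 0.62 |
| RF | 0.56 | 0.53 | 0.63 | 3.02 | 0.15 | 0.53 | 0.01 | 5.97 | 0.13 | 0.62 |
| SVML | 0.49 | 0.63 | 0.60 | 3.16 | 0.08 | 0.56 | 0.23 | 5.21 | 0.09 | 0.64 |
| SVMR | 0.44 | 0.55 | 0.57 | 3.19 | 0.11 | 0.55 | 0.13 | 5.57 | 0.05 | 0.65 |
| GBM | 0.50 | 0.63 | 0.62 | 3.09 | 0.21 | 0.51 | 0.09 | 5.67 | 0.07 | 0.63 |
| NN | 0.55 | 0.53 | 0.53 | 3.10 | 0.11 | 0.53 | 0.02 | 6.10 | 0.12 | 0.62 |
| CU | 0.51 | 0.57 | 0.60 | 3.15 | 0.08 | 0.58 | 0.04 | 6.01 | 0.08 | 0.68 |

Overall dataset

| Model | SOM R² | SOM RMSE | CEC R² | CEC RMSE | K R² | K RMSE | Mg R² | Mg RMSE | pH R² | pH RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| LM | 0.55 | 0.50 | 0.64 | 2.54 | 0.21 | 0.55 | 0.22 | 4.78 | 0.13 | 0.63 |
| RF | 0.56 | 0.48 | 0.62 | 2.62 | 0.18 | 0.51 | 0.08 | 5.16 | 0.13 | 0.63 |
| SVML | 0.53 | 0.51 | 0.64 | 2.55 | 0.16 | 0.51 | 0.22 | 4.70 | 0.13 | 0.63 |
| SVMR | 0.54 | 0.49 | 0.61 | 2.64 | 0.19 | 0.50 | 0.11 | 4.99 | 0.09 | 0.64 |
| GBM | 0.56 | 0.49 | 0.63 | 2.58 | 0.18 | 0.50 | 0.08 | 5.13 | 0.13 | 0.62 |
| NN | 0.60 | 0.46 | 0.64 | 2.50 | 0.15 | 0.51 | 0.09 | 5.24 | 0.12 | 0.64 |
| CU | 0.58 | 0.48 | 0.61 | 2.64 | 0.19 | 0.52 | 0.10 | 5.23 | 0.14 | 0.64 |

Note: LM – Linear Model, RF – Random Forest, SVML – Support Vector Machine with linear kernel, SVMR – Support Vector Machine with radial kernel, GBM – Gradient Boosting Model, NN – Neural Network, CU – Cubist Model. Bold text indicates the model with the least RMSE and the highest R² for each soil parameter.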


Fig. 3. Plots of predicted versus observed soil organic matter (% SOM), cation exchange capacity (CEC meq/100 g), magnesium (Mg), potassium (K), and pH for the
training and test datasets. Predictions for SOM and CEC were based on neural network; K, Mg and pH were based on support vector machine with radial function,
support vector machines with linear function, and gradient boosting model, respectively; and predicted corn yield was based on random forest algorithm.

3. Results

3.1. Relationship between soil properties, yield and remote sensing data

Soil properties were more highly correlated with the individual wavebands (Red, Green and Blue) and with the soil and vegetation indices than with the terrain properties of the fields. Terrain characteristics were correlated with bare soil imagery, but not as much as the soil properties. Among seven terrain properties, elevation was found to have the highest correlation with the individual wavebands such as green and blue, and RI (Fig. 2). Yield of field 12D was found to have a higher correlation with soil indices such as BI, HI and CI, followed by Mg, CEC, SOM and elevation of the field (results not shown).

3.2. Model performance

Assessment of the models used for prediction of five soil properties for all seven fields suggested that high resolution remotely sensed data can predict CEC with relatively higher accuracy, followed by SOM, Mg, K, and pH (Table 6). During cross-validation of the models, average R2 values ranged from 0.61 to 0.67 for CEC, 0.54 to 0.63 for SOM, 0.16 to 0.23 for K, 0.09 to 0.121 for Mg, and 0.09 to 0.16 for pH (Table 6). Variability in R2 values during cross-validation of models is provided in the supporting document (Figs. S1 and S2). Except for a few models for CEC, K, and Mg, performances of the majority of the models were found to be poor for the test sets (Table 6). With the test sets, R2 values were relatively lower, and RMSE values were relatively higher. This could be attributed to a larger number of samples allocated for model development and fewer for model validation.

While evaluating the models that provided the highest accuracy for prediction of soil properties at a field level, it was found that although the overall performance of the models developed by integrating the information of seven fields together was low, the models could predict soil properties of some fields with higher accuracy than for others (Table 8, Fig. 3). For example, for field 1B, the NN model predicted SOM with R2 = 0.85, but for PENIN, R2 = 0.21. Similarly, the overall performance of models for K, Mg and pH was low (R2 = 0.19, 0.22 and 0.13 for K, Mg and pH, respectively), but the models predicted these variables with higher R2 values for some fields. Mg was predicted with R2 = 0.55 for 1C; K was predicted with R2 = 0.56 for 1B; and pH was


Table 7
Model performance for estimation of corn yield based on remotely sensed image-derived variables.

          Cross-Validation with    Validation with     Overall Dataset
          Training Dataset         Test Dataset
Models    R2      RMSE             R2      RMSE        R2      RMSE
LM        0.34    1.14             0.35    1.17        0.34    1.15
RF        0.52    0.97             0.56    0.97        0.53    0.97
SVML      0.33    1.15             0.35    1.18        0.33    1.16
SVMR      0.44    1.05             0.48    1.05        0.45    1.05
GBM       0.40    1.08             0.43    1.10        0.41    1.08
NN        0.37    1.11             0.39    1.14        0.37    1.12
CU        0.51    0.98             0.55    0.98        0.52    0.98
Note: Bold texts indicate the model with the least RMSE and the highest R2.
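
The selection rule in the note to Table 7 (least RMSE and highest R2) can be sketched as a simple ranking. The R2 and RMSE values below are the overall-dataset columns of Table 7; the tie-breaking order (RMSE first, then R2) is an assumption made for this illustration and is not stated in the paper.

```python
# Overall-dataset (R2, RMSE) pairs copied from Table 7.
table7_overall = {
    "LM": (0.34, 1.15),
    "RF": (0.53, 0.97),
    "SVML": (0.33, 1.16),
    "SVMR": (0.45, 1.05),
    "GBM": (0.41, 1.08),
    "NN": (0.37, 1.12),
    "CU": (0.52, 0.98),
}

def rank_models(scores):
    """Order model names by lowest RMSE, breaking ties by highest R2.

    The tie-breaking order is an assumption for this sketch.
    """
    return sorted(scores, key=lambda name: (scores[name][1], -scores[name][0]))

ranking = rank_models(table7_overall)
best_model = ranking[0]
print(best_model, ranking)
```

With these values RF ranks first and CU second, which agrees with the comparison of the two models in the text.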

predicted with R2 = 0.73 for 1C.

Table 7 shows average R2 and RMSE of seven models for both cross-validation and validation stages during corn yield prediction using remotely sensed data-derived variables. R2 values ranged from 0.32 to 0.51 during the model cross-validation, and 0.30 to 0.51 during the validation phase.

3.3. Comparison of model performance

For five soil properties, no model was found to have a consistently superior performance during both cross-validation and validation phases (Table 6). However, most of the time, machine learning models performed better than LM. For instance, during model development, NN performed better in prediction of SOM and CEC with a higher R2 and lower RMSE. SVM with linear and radial kernel functions performed better in prediction of Mg and K, respectively. GBM performed marginally better in prediction of pH. However, during model validation with the test dataset, RF performed better than NN for both SOM and CEC. GBM performed better in K prediction, and LM performed marginally better for pH.

For two approaches (first based on only remotely sensed data-derived variables, and the second based on only soil variables) for corn yield prediction, RF and CU models consistently performed better (i.e., higher R2 and lower RMSE) than other models during both cross-validation and validation phases. Although RF and CU models had the same R2, RF performed marginally better, as indicated by its lower RMSE (Table 7).

Superiority of machine learning models over the LM model could be attributed to the existence of non-linear relationships between the response and predictor variables that machine learning algorithms can integrate during model development. Differences in accuracy of corn yield prediction models developed using only remotely sensed data-

Table 8
Model performance for prediction of soil properties at field level.

                   SOM           CEC           K             Mg            pH
Field    Dataset   R2    RMSE    R2    RMSE    R2    RMSE    R2    RMSE    R2    RMSE
Overall  All       0.60  0.46    0.64  2.50    0.19  0.50    0.22  4.70    0.13  0.62
         Train     0.61  0.44    0.67  2.35    0.11  0.55    0.22  4.57    0.15  0.62
         Test      0.55  0.53    0.53  3.10    0.19  0.50    0.23  5.21    0.13  0.62
1B       All       0.85  0.43    0.78  2.31    0.56  0.16    0.20  4.78    0.27  0.63
         Train     0.91  0.43    0.78  2.13    0.60  0.16    0.36  4.12    0.35  0.58
         Test      0.41  0.45    0.77  2.64    0.45  0.17    0.09  6.98    0.05  0.82
1C       All       0.75  0.42    0.71  2.27    0.15  0.44    0.55  3.25    0.73  0.45
         Train     0.66  0.44    0.73  2.18    0.23  0.41    0.55  3.19    0.85  0.44
         Test      1.00  0.26    1.00  2.87    1.00  0.60    1.00  3.65    0.77  0.48
1D       All       0.53  0.30    0.47  2.32    0.05  0.50    0.04  4.28    0.41  0.49
         Train     0.62  0.30    0.52  2.29    0.08  0.51    0.09  4.03    0.38  0.49
         Test      0.79  0.35    0.00  2.46    0.97  0.44    0.32  5.46    0.99  0.46
9A       All       0.76  0.49    0.67  2.55    0.07  0.42    0.18  5.37    0.42  0.66
         Train     0.76  0.40    0.67  2.31    0.07  0.46    0.18  4.98    0.42  0.56
         Test      0.76  0.68    0.92  3.15    0.51  0.30    0.23  6.37    0.05  0.90
PENIN    All       0.20  0.33    0.17  2.31    0.00  0.62    0.00  4.10    0.01  0.49
         Train     0.07  0.34    0.26  2.81    0.00  0.73    0.00  4.25    0.07  0.39
         Test      0.91  0.33    0.31  0.49    0.01  0.27    0.07  3.79    0.02  0.63
MISD     All       0.68  0.45    0.57  2.40    0.33  0.41    0.33  5.21    0.24  0.65
         Train     0.69  0.44    0.55  2.50    0.33  0.45    0.31  5.38    0.25  0.64
         Test      0.69  0.50    0.77  1.92    0.79  0.12    0.62  4.48    0.35  0.71
12D      All       0.32  0.48    0.67  2.07    0.37  0.68    0.22  3.82    0.36  0.38
         Train     0.31  0.46    0.72  2.03    0.37  0.70    0.17  4.02    0.33  0.39
         Test      0.43  0.59    0.20  2.27    0.64  0.50    0.69  2.29    0.52  0.31
*Texts that are bold and highlighted for overall dataset indicate R2 > 0.50.
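
Table 8 breaks the performance of a model fitted on all fields down by individual field. A minimal sketch of such a per-field evaluation is given here, assuming R2 is the squared Pearson correlation; the field labels "A" and "B" and all values are hypothetical and are not data from this study.

```python
from collections import defaultdict

def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of R2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)

def per_field_r2(records):
    """Group (field, observed, predicted) records by field and score each group."""
    groups = defaultdict(lambda: ([], []))
    for field, obs, pred in records:
        groups[field][0].append(obs)
        groups[field][1].append(pred)
    return {field: r_squared(obs, pred) for field, (obs, pred) in groups.items()}

# Hypothetical pooled predictions: field "A" is tracked well, field "B" poorly.
records = [
    ("A", 2.0, 2.2), ("A", 3.0, 2.9), ("A", 4.0, 4.1), ("A", 5.0, 4.8),
    ("B", 2.0, 3.6), ("B", 3.0, 3.4), ("B", 4.0, 3.5), ("B", 5.0, 3.7),
]

field_scores = per_field_r2(records)
print({field: round(score, 2) for field, score in field_scores.items()})
```

The same pooled model can therefore score very differently from field to field, which is the pattern Table 8 reports.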


derived variables (Table 7), and only soil parameters (Table S2; see supplementary document) suggested that remotely sensed data-derived variables have high potential to provide better estimates of corn yield than those based on soil properties.

3.4. Variable importance in model development

Of the 18 variables considered in the model development, only a few variables were found to have significant influence on the prediction of soil properties (Table 5). While only six variables were found to have

Fig. 4. A comparison of importance scores for selected variables used in six statistical models for: (a) Soil Organic Matter, (b) Cation Exchange Capacity, (c)
Potassium, (d) Magnesium, (e) pH, and (f) corn yield. Importance scores for variables were scaled between 0 and 100. Importance scores of variables in SVM models
with radial and linear kernel functions were the same.


significant influence on the prediction of CEC, ten variables were found to be significant for prediction of SOM and Mg. For CEC, K and pH, 6, 12 and 9 variables, respectively, had significant influence. For corn yield prediction, 15 variables were found to have significant influence.

Based on the analyses of importance scores of selected variables for prediction of soil properties and corn yield (Table 5), the influence of variables was found to vary with the model. For instance, the variable Group contributed the most to the prediction accuracy of LM and NN based models for SOM, suggesting the importance of soil type during SOM estimation. However, the variable NIR contributed the most for RF, SVM, GBM and CU models for SOM. In general, the predictors that contributed the most to the accuracy of the majority of models for SOM (Fig. 4a) were NIR, Red, and SI. Similarly, red, green and BI were the three top predictors for the majority of the models for CEC. Unlike SOM and CEC, importance scores of selected variables for Mg, K, pH and corn yield prediction were more distributed. The top three variables were CI, HI, and BI for Mg; NIR, SI, and CI for K; TRI, Rough, and Slope for pH; and SI, CI, and HI for corn yield. Except for pH, spectral bands and indices were consistently identified as important predictors for soil properties. For corn yield prediction, variables such as FlowDir, SI, and NDVI were found to contribute the most to the RF model, which had superior performance compared to other models (Fig. 4f).
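
Fig. 4 reports importance scores scaled between 0 and 100. One way such scaling could be done is min-max rescaling, sketched here; the choice of min-max scaling is an assumption, and the raw scores are hypothetical values chosen only to illustrate the ranking step.

```python
def scale_importance(raw_scores):
    """Min-max rescale raw importance scores to the range 0 to 100."""
    lowest = min(raw_scores.values())
    highest = max(raw_scores.values())
    if highest == lowest:
        # All variables equally important: assign the top score to each.
        return {name: 100.0 for name in raw_scores}
    span = highest - lowest
    return {name: 100.0 * (value - lowest) / span for name, value in raw_scores.items()}

# Hypothetical raw importance scores for four predictors (not from the study).
raw = {"NIR": 0.42, "Red": 0.30, "SI": 0.24, "Slope": 0.06}

scaled = scale_importance(raw)
top_three = sorted(scaled, key=scaled.get, reverse=True)[:3]
print(top_three)
```

Rescaling within each model makes scores comparable across models whose raw importance measures are on different scales, at the cost of hiding how large the raw differences were.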

Fig. 5. Maps showing (a) visual image of bare soil, and predicted (b) SOM (%), (c) CEC, (d) K, (e) Mg, and (f) pH in the study region with observed values at sampling locations overlaid. Note: Predicted maps for SOM and CEC were based on the NN model. SVM with radial and linear kernel functions were used for K and Mg, respectively; and GBM was used for pH prediction.


Fig. 6. Maps of (left) observed, and (right) predicted corn yield in t/ha. Note: predicted yield map was based on RF model.

3.5. Mapping the spatial distribution of soil properties and corn yield

The model with the highest R2 and the lowest RMSE (Table 6) was selected to create high resolution maps for each soil property, and corn yield (Figs. 5–6). The geographical distribution of predicted soil properties was found to be similar to that of observed soil properties for most of the fields. For fields 1B, 1C, 1D, 9A, and MISD, the geographical distributions of observed and predicted SOM and CEC were very similar. When the geographical distribution of soil properties was examined against three soil color classes, it was found that both SOM and CEC were highly correlated with soil color (results not shown). Dark color soil corresponded to areas with higher SOM and CEC, and lower K and pH, and vice versa (Fig. 5a–c). This congruence of soil properties with soil color suggested that in-field variability of some soil properties can be estimated based on color of bare soil images.

The model predicted corn yield reasonably well with an average difference of 1.48% (± 8.85% standard deviation) between predicted and observed corn yield. The observed corn yield ranged from 6.1 to 17.9 t⋅ha−1, and predicted corn yield ranged from 9.5 to 15.24 t⋅ha−1 (Fig. 6). Except for a few locations in the center and west parts of the field, the geographical distribution of predicted corn yield was found to be similar to the pattern of observed corn yield, suggesting that the model could capture the spatial variability for most of the observed low and high spots in the field.

4. Discussion

4.1. Models for prediction of soil properties and corn yield

In the study, remotely sensed image-derived variables were integrated with field collected data to develop models for predicting soil properties and corn yield. Because the fields were heterogeneous (i.e., different in terms of agricultural practices and soil properties; Table 1) and models were developed for all seven fields combined, instead of for each field, the overall accuracy of the models was low. When the models’ performances were evaluated for individual fields, the accuracies however were higher for some fields (Table 8). This suggests that a model developed at a plot or field level performs better than a model developed at a larger geographic scale. Studies (Barnes et al., 2000; Stevens et al., 2013) have also noted that the use of multispectral data for predicting the spatial distribution of soil properties can achieve optimal results when the study is conducted at a plot level or in an area with uniform soil surface characteristics. Nonetheless, the range of R2 values obtained in this study is better than or comparable to other studies conducted at local (Morellos et al., 2016; Thomasson et al., 2001) or regional scales (Forkuor et al., 2017; Peng et al., 2015) that considered only spectral data or a combination of spectral, climate and terrain variables.

The accuracy of the models in this study might have been influenced by several things, including the difference in timing between soil sample collection and image acquisition, and the use of a limited set of machine learning algorithms. For instance, the bare soil imagery was acquired in May and the soil samples were collected in October. Thus, a model developed based on data collected around the same time might be of interest to improve the accuracy of the prediction. In the study, we observed improvement in prediction of soil properties and corn yield with the use of machine learning algorithms compared with the linear regression algorithm. For example, the NN model, closely followed by CU, produced the most accurate prediction for SOM. Similarly, CU and RF models produced the most accurate prediction of CEC and corn yield, respectively. These findings are similar to prior studies that have found models based on machine learning algorithms to be superior to ones using linear regression (Hahn and Gloaguen, 2008; Minasny and McBratney, 2008; Peng et al., 2015). The increase in model accuracy from using machine learning algorithms is due to the ability of these algorithms to handle the non-linear relationships that are typically observed between crop, soil, environmental and topographical variables.

This study suggested that no single machine learning algorithm is best for evaluating all soil parameters and crop yield at all locations, and that multiple models should be evaluated to enhance the accuracy of prediction estimates. Similar observations were found in prior studies as well. Ließ et al. (2016) reported that GBM performed better than NN, RF, and SVM in prediction of soil organic carbon in a complex tropical mountain landscape in Ecuador. However, Were et al. (2015) found SVM to be the best method to predict SOC stocks in the Afromontane Forest in Eastern Africa. Rossel and Behrens (2010) reported that the smallest RMSE values were found with the SVM approach used for prediction of three soil properties, including SOC, clay content, and pH. Jeong et al. (2016) found RF to be a more effective machine learning method for crop yield predictions at regional and global scales compared to LM. Uno et al. (2005) reported NN to provide better corn yield prediction compared to the LM approach.

In this study, we evaluated the performance of seven of the most popular models. There are however other machine learning algorithms, such as multivariate adaptive regression splines, K Nearest Neighbor, and various types of neural network (e.g., convolutional, recursive, recurrent,


feedforward). The use of these machine learning algorithms may help improve the accuracy of the models beyond that of the algorithms examined in the study. Thus, we suggest examining the performance of these models in future work.

Another approach to improve the model accuracy might be the use of advanced algorithms for variable selection such as genetic algorithms. In this study, variables for the linear regression models were selected using the most commonly used stepwise AIC approach. Genetic algorithms, inspired by the laws of genetics, try to find optimal solutions to complex problems, which is usually the case in the context of agriculture. It is thus useful to explore the role of genetic algorithms in future studies related to soil properties and yield estimation.

4.2. Important variables for modeling of soil properties and corn yield

This study demonstrated that the information derived from multispectral images contributes more to improving the prediction of soil properties than terrain information derived from DEM. This is consistent with the findings of Dobos et al. (2001), who combined coarse resolution AVHRR satellite data and DEM derived terrain variables to characterize the soil-forming environment. Among the variables derived using spectral information of multispectral images, it was interesting to note that NDVI of bare soil imagery was found to have significant influence on prediction of the majority of the soil properties evaluated in this study, including SOM, CEC, K and pH, although it is a commonly used index for representing vegetation growth. This finding was found to be consistent with previous studies (Escadafal and Huete, 1993; Huete and Tucker, 1991) that also found NDVI to be sensitive to mineral constituents of soil. This study also showed that bare soil imagery can be a good indicator of potential corn yield pattern in a field. Understanding of the potential spatial variability in corn yield patterns based on soil spectral information and topographic conditions prior to planting might give farmers enough time to take preventive actions, such as levelling of high elevation areas and fertilization of areas with poor fertility, to maintain crop quality and yield.

4.3. Applicability of the models

The statistical models for prediction of soil properties in this study were calibrated and tested with data collected from seven fields in one year. The corn yield was predicted based on one field with one year of data. Thus, the models developed in this study cannot be generalized for the prediction of the same soil parameters and corn yield in other soil types and geographic regions. To reinforce the findings of this study as well as to strengthen the model’s predictive capability over the wide range of soil properties and field management practices, further studies should be carried out with more data from multiple years, and from other fields with varying management practices and soil types. Nevertheless, the analyses presented in this study demonstrated that remotely sensed data and machine learning approaches could be adopted for cost-effective prediction of soil properties and crop yield at high spatial resolution.

5. Conclusions

High spatial resolution mapping of soil properties and crop yield is required for proper management of crop and soil health, which is needed for improving crop productivity and lowering the agriculture related negative environmental footprint. This study demonstrated that the use of high spatial resolution (< 1 m) multispectral bare soil image and terrain data can capture in-field variability of soil properties, including SOM, CEC, K, Mg, and pH, and corn yield. The performance of seven statistical models, including LM, RF, SVM with linear and radial kernel functions, GBM, NN, and CU, was compared for their ability to predict soil properties and corn yield, and the machine learning algorithms were found to outperform the LM algorithm most of the time. However, no model was found to consistently outperform other models for prediction of soil properties. NN performed better in prediction of SOM and CEC with a higher R2 and lower RMSE, while SVM models with linear and radial kernel functions performed better for prediction of Mg and K, respectively. For pH and corn yield prediction, GBM and RF models, respectively, performed better than other models. For seven fields, models for SOM, CEC, Mg, K, and pH showed R2 in the range of 0.2–0.85, 0.17–0.78, 0.0–0.55, 0.0–0.56, and 0.0–0.73, respectively. For corn yield, RF consistently outperformed other models and provided R2 = 0.53. These findings suggest that remotely sensed data can serve as a surrogate for more intensive soil sampling and costly yield monitoring systems.

Variables based on multispectral bare soil images were found to be the most important predictors for enhancing the model’s accuracy for the spatial prediction of soil properties, including SOM, CEC, K and Mg. Topographic variables were found to have more influence in the prediction of pH. For corn yield, both spectral and topographic information were important. Despite the high variability in topography and farm management practices of the seven fields, the accuracy obtained in prediction of soil properties and corn yield in this study is promising for high resolution mapping of soil properties and corn yield at a local scale. High resolution maps of soil properties and crop yield help farmers to identify areas of potential concern prior to planting and manage them for improved crop productivity.

Acknowledgements

This work was supported in part by funds from programs at the Ohio State University: the Field to Faucet program (Grant No. F2F-000004), and the Ohio Agricultural Research and Development Center (OARDC) (SEEDS: the OARDC Research Enhancement Competitive Grants Program).

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.compag.2018.07.016.

References

Allen, D.E., Pringle, M.J., Bray, S., Hall, T.J., O’Reagain, P.O., Phelps, D., Cobon, D.H., Bloesch, P.M., Dalal, R.C., 2014. What determines soil organic carbon stocks in the grazing lands of north-eastern Australia? Soil Res. 51, 695–706.
Barnes, E.M., Baker, M.G., et al., 2000. Multispectral data for mapping soil texture: possibilities and limitations. Appl. Eng. Agric. 16, 731–746.
Blasch, G., Spengler, D., Itzerott, S., Wessolek, G., 2015. Organic matter modeling at the landscape scale based on multitemporal soil pattern analysis using rapideye data. Remote Sens. 7, 11125–11150. https://doi.org/10.3390/rs70911125.
Davy, M.C., Koen, T.B., 2014. Variations in soil organic carbon for two soil types and six land uses in the Murray Catchment, New South Wales, Australia. Soil Res. 51, 631–644.
Dobos, E., Montanarella, L., Nègre, T., Micheli, E., 2001. A regional scale soil mapping approach using integrated AVHRR and DEM data. Int. J. Appl. Earth Obs. Geoinf. 3, 30–42.
Escadafal, R., Huete, A.R., 1993. Soil optical properties and environmental applications of remote sensing. Int. Arch. Photogramm. Remote Sens. 29, 709–715.
Forkuor, G., Hounkpatin, O.K.L., Welp, G., Thiel, M., 2017. High resolution mapping of soil properties using Remote Sensing variables in south-western Burkina Faso: a comparison of machine learning and multiple linear regression models. PLoS One 12, 1–21. https://doi.org/10.1371/journal.pone.0170478.
Geipel, J., Link, J., Claupein, W., 2014. Combined spectral and spatial modeling of corn yield based on aerial images and crop surface models acquired with an unmanned aircraft system. Remote Sens. 6, 10335–10355.
Hahn, C., Gloaguen, R., 2008. Estimation of soil types by non linear analysis of remote sensing data. Nonlinear Process. Geophys. 15, 115–126.
Hively, W.D., McCarty, G.W., Reeves, J.B., Lang, M.W., Oesterling, R.A., Delwiche, S.R., 2011. Use of airborne hyperspectral imagery to map soil properties in tilled agricultural fields. Appl. Environ. Soil Sci.
Huete, A.R., Tucker, C.J., 1991. Investigation of soil influences in AVHRR red and near-infrared vegetation index imagery. Int. J. Remote Sens. 12, 1223–1242.
Jeong, J.H., Resop, J.P., Mueller, N.D., Fleisher, D.H., Yun, K., Butler, E.E., Timlin, D.J., Shim, K.M., Gerber, J.S., Reddy, V.R., Kim, S.H., 2016. Random forests for global and regional crop yield predictions. PLoS One 11, 1–15. https://doi.org/10.1371/journal.pone.0156571.
Kitchingman, A., Lai, S., 2004. Inferences on potential seamount locations from mid-resolution bathymetric data. Focus (Madison). 32, 128.
Kuhn, M., 2017. CARET: Classification and Regression Training [WWW Document]. URL https://github.com/topepo/caret/ (accessed 10.1.17).
Kuhn, M., Johnson, K., 2013. Applied Predictive Modeling. https://doi.org/10.1007/978-1-4614-6849-3.
Ließ, M., Schmidt, J., Glaser, B., 2016. Improving the spatial prediction of soil organic carbon stocks in a complex tropical mountain landscape by methodological specifications in machine learning approaches. PLoS One 11, e0153673.
Lobell, D.B., Thau, D., Seifert, C., Engle, E., Little, B., 2015. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 164, 324–333. https://doi.org/10.1016/j.rse.2015.04.021.
Lyle, G., Bryan, B.A., Ostendorf, B., 2014. Post-processing methods to eliminate erroneous grain yield measurements: review and directions for future development. Precis. Agric. 15, 377–402.
Minasny, B., McBratney, A.B., 2008. Regression rules as a tool for predicting soil properties from infrared reflectance spectroscopy. Chemom. Intell. Lab. Syst. 94, 72–79. https://doi.org/10.1016/j.chemolab.2008.06.003.
Morellos, A., Pantazi, X.-E., Moshou, D., Alexandridis, T., Whetton, R., Tziotzios, G., Wiebensohn, J., Bill, R., Mouazen, A.M., 2016. Machine learning based prediction of soil total nitrogen, organic carbon and moisture content by using VIS-NIR spectroscopy. Biosyst. Eng. 152, 104–116. https://doi.org/10.1016/j.biosystemseng.2016.04.018.
Mulder, V.L., De Bruin, S., Schaepman, M.E., Mayr, T.R., 2011. The use of remote sensing in soil and terrain mapping-a review. Geoderma 162, 1–19.
Peng, Y., Xiong, X., Adhikari, K., Knadel, M., Grunwald, S., Greve, M.H., 2015. Modeling soil organic carbon at regional scale by combining multi-spectral images with laboratory spectra. PLoS One 10. https://doi.org/10.1371/journal.pone.0142295.
Ray, S.S., Singh, J.P., Das, G., Panigrahy, S., 2004. Use of high resolution remote sensing data for generating site-specific soil management plan. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 35, 127–132.
Riedmiller, M., Braun, H., 1993. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: Neural Networks, International Conference on. pp. 586–591.
Riley, S.J., 1999. Index that quantifies topographic heterogeneity. Intermt. J. Sci. 5, 23–27.
Rossel, R.A.V., Behrens, T., 2010. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158, 46–54. https://doi.org/10.1016/j.geoderma.2009.12.025.
Scudiero, E., Skaggs, T.H., Corwin, D.L., 2014. Regional scale soil salinity evaluation using Landsat 7, western San Joaquin Valley, California, USA. Geoderma Reg. 2, 82–90.
Shi, Y., Thomasson, J.A., Murray, S.C., Pugh, N.A., Rooney, W.L., Shafian, S., Rajan, N., Rouze, G., Morgan, C.L.S., Neely, H.L., et al., 2016. Unmanned aerial vehicles for high-throughput phenotyping and agronomic research. PLoS One 11, e0159781.
Souza, E.G., Bazzi, C.L., Khosla, R., Uribe-Opazo, M.A., Reich, R.M., 2016. Interpolation type and data computation of crop yield maps is important for precision crop production. J. Plant Nutr. 39, 531–538. https://doi.org/10.1080/01904167.2015.1124893.
Spectrum Analytic, 2017. Analysis Services [WWW Document]. URL https://www.spectrumanalytic.com/services/analysis/agsoil.html.
Stevens, A., Nocita, M., Tóth, G., Montanarella, L., van Wesemael, B., 2013. Prediction of soil organic carbon at the European scale by visible and near infrared reflectance spectroscopy. PLoS One 8, e66409.
Sudduth, K.A., Drummond, S.T., 2007. Yield editor: software for removing errors from crop yield maps. Agron. J. 99, 1471–1482. https://doi.org/10.2134/agronj2006.0326.
Thomasson, J.A., Sui, R., Cox, M.S., Al-Rajehy, A., 2001. Soil reflectance sensing for determining soil properties in precision agriculture. Trans. ASAE 44, 1445–1453. https://doi.org/10.13031/2013.7002.
Uno, Y., Prasher, S.O., Lacroix, R., Goel, P.K., Karimi, Y., Viau, A., Patel, R.M., 2005. Artificial neural networks to predict corn yield from Compact Airborne Spectrographic Imager data. Comput. Electron. Agric. 47, 149–161. https://doi.org/10.1016/j.compag.2004.11.014.
Were, K., Bui, D.T., Dick, Ø.B., Singh, B.R., 2015. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Indic. 52, 394–403.
Wilson, M.F.J., O’Connell, B., Brown, C., Guinan, J.C., Grehan, A.J., 2007. Multiscale terrain analysis of multibeam bathymetry data for habitat mapping on the continental slope. Mar. Geod. 30, 3–35.
Yang, C., Westbrook, J.K., Suh, C.P.-C., Martin, D.E., Hoffmann, W.C., Lan, Y., Fritz, B.K., Goolsby, J.A., 2014. An airborne multispectral imaging system based on two consumer-grade cameras for agricultural remote sensing. Remote Sens. 6, 5257–5278.
Yao, R.J., Yang, J.S., Wu, D.H., Xie, W.P., Gao, P., Wang, X.P., 2016. Characterizing spatial-temporal changes of soil and crop parameters for precision management in a coastal rainfed agroecosystem. Agron. J. 108, 2462–2477.
