
CSIRO PUBLISHING

Soil Research, 2019, 57, 387–396


https://doi.org/10.1071/SR18319

Digital mapping of topsoil pH by random forest with residual kriging (RFRK) in a hilly region

Lei Wang A, Wei Wu B and Hong-Bin Liu A,C

A College of Resources and Environment, Southwest University, Beibei, Chongqing 400715, China.
B College of Computer and Information Science, Southwest University, Beibei, Chongqing 400715, China.
C Corresponding author. Email: lhbin@swu.edu.cn

Abstract. Soil pH is a vital attribute of soil fertility. The accurate and efficient prediction of soil pH can provide the
necessary basic information for agricultural development. In the present study, random forest with residual kriging
(RFRK) was used to predict soil pH based on stratum, climate, vegetation and topography in a hilly region. The
performance of RFRK was compared with those of the classification and regression tree (CART) and the random forest
(RF). Comparative results showed that RFRK provided the best performance. The corresponding values of Lin’s
concordance correlation coefficient, coefficient of determination, mean absolute error and root mean square error were as
follows: 0.70, 0.51, 0.44 and 0.61 for CART; 0.80, 0.70, 0.34 and 0.48 for RF; and 0.88, 0.80, 0.25 and 0.39 for
RFRK. Stratum and average annual temperature were the most important factors affecting the soil pH in the study area.
Results indicate that RFRK is a feasible and reliable tool for predicting soil pH in hilly regions.

Additional keywords: classification and regression tree; digital soil mapping; soil pH.

Received 22 October 2018, accepted 1 March 2019, published online 9 April 2019

Introduction
Soil pH, as an important attribute of soil, exerts an impact on soil fertility (Filippi et al. 2018), plant growth (Schwamberger and Sims 1991) and other soil properties (Heggelund et al. 2014; Tu et al. 2018). The accurate and efficient prediction of soil pH is vital for agricultural development (Ou et al. 2017), ecological modelling (Lauber et al. 2009) and environmental pollution management (Ye et al. 2014). Traditional soil nutrient prediction methods are mainly based on soil-survey mapping (McBratney et al. 2000), which is not only time- and labour-consuming, but also cannot easily provide sufficiently detailed information (Band and Moore 1995). As an alternative, digital soil mapping (DSM) is used to accurately and efficiently describe the soil spatial distribution (Zhu et al. 2010).

Various prediction methods, such as geostatistical methods and multiple linear regression, have been used to produce the spatial distribution of soil properties. However, multiple linear regression relies on linear assumptions and does not consider the spatial autocorrelation of soil properties (Guo et al. 2013). Moreover, the relationships between soil properties and environmental variables are often non-linear and complex (Lark 1999; Tan et al. 2017). Geostatistical methods only consider the spatial autocorrelation of the target variable and ignore the influences of environmental variables (Guo et al. 2015).

To overcome these limitations, a new hybrid approach called random forest with residual kriging (RFRK) was introduced for soil nutrient prediction (Li et al. 2011; Guo et al. 2015). RFRK was developed on the basis of the random forest (RF) and combines RF and ordinary kriging (OK). This method inherits the advantages of RF and OK by taking into account the non-linear relationships between soil properties and environmental variables and the spatial autocorrelation of target variables (Guo et al. 2015). Previous studies have demonstrated that RFRK outperforms other ordinary DSM methods, such as support vector machine (SVM), OK, inverse distance squared (IDS), stepwise linear regression (SLR) and RF (Li et al. 2011; Guo et al. 2015; Tziachris et al. 2019; Szatmári and Pásztor 2019). Li et al. (2011) used different methods to predict mud content in the south-west Australian margin and found that RF and RFRK outperformed SVM, OK and IDS. Guo et al. (2015) applied RFRK to predict the soil organic matter of a rubber plantation in Hainan Island, China and found that RFRK achieved better results than SLR and RF. Tziachris et al. (2019) used hybrid spatial models to predict soil organic matter and found that RFRK outperformed other hybrid and ordinary machine learning models. Szatmári and Pásztor (2019) predicted the soil organic carbon store and found that RFRK was more accurate than universal kriging, sequential Gaussian simulation and quantile regression forest. Moreover, RFRK was used to model other kinds of spatial distribution, such as PM2.5, grassland leaf area index (LAI), radionuclide map and precipitation (Liu et al. 2018; Li et al. 2016; Viscarra Rossel et al. 2014; Kim and Park 2016). Liu et al. (2018) applied RFRK to predict PM2.5, and the model achieved refined spatial resolution and satisfactory results. Li et al. (2016) predicted grassland LAI, and findings indicated that RFRK outperformed partial least-squares regression,

Journal compilation © CSIRO 2019 www.publish.csiro.au/journals/sr


388 Soil Research L. Wang et al.

artificial neural networks (ANNs) and RF. Viscarra Rossel et al. (2014) used RFRK to generate a radionuclide map that was sufficiently accurate. However, although RFRK achieved the aforementioned advantages and has been increasingly used in modelling spatial distribution in various fields, its application in predicting soil pH remains limited.

Soil is a complex entity that is influenced by several factors (McBratney et al. 2003). Dokuchaev (1883) systematically proposed that soil was developed from long-term interaction of climate, biology, topography and parent material. Jenny (1941, 1980) emphasised that soil should be considered a function of climate, organisms, relief, parent material and time (abbreviated as ‘clorpt’). Soil pH, as an important attribute of soil, is affected by numerous factors (Liu et al. 2013; Mosleh et al. 2016; Tan et al. 2017; Pahlavan-Rad and Akbarimoghaddam 2018). Tan et al. (2017) predicted soil pH distribution in a hilly region based on terrain attributes, climate indicators and geological units by using geographically weighted regression. Liu et al. (2013) found that soil pH distribution was controlled by precipitation, topography, soil type and vegetation type based on IDW, splines, OK and co-kriging. Mosleh et al. (2016) generated a soil pH map based on topography, remote sensing, geology, soil type and geomorphology by using ANN, boosted regression tree, generalised linear model and multiple linear regression. The present work considers the four factors of stratum, climate, vegetation and topography, which correspondingly represent parent material, climate, organism and terrain, to predict soil pH in a hilly area. These data are easy to obtain with appropriate precision.

The hilly region is widely distributed in south-west China and has developed as important agricultural land over time. Therefore, accurately and efficiently predicting the spatial distribution of soil nutrients and analysing the controlling environmental factors in the hilly region are important not only for DSM theory but also for local agricultural development. The present study aimed to (1) compare the performance of various methods; (2) use the best method to predict the spatial distribution of soil pH; and (3) investigate the importance of environmental factors that control soil pH variability. The study was conducted in a typical hilly area with complex topography in south-west China.

Materials and methods

Study area
The study area, Changshou District (106°49′–107°27′E, 29°43′–30°12′N) (Fig. 1), is located in the upper reaches of the Yangtze River in south-west China. The total area is 1423 km². The terrain is characterised as a typical hilly region. The elevation (ELE) mainly varies from 500 to 900 m, with the highest and lowest points at 1034 and 175 m respectively. Soil types include paddy soil, purple soil, fluvo-aquic soil, yellow loam soil and lime (rock) soil (Xi et al. 1998). Soil textures comprise loam, clay and sand. The strata consist of Triassic limestone (ST1), Triassic Xujiahe Formation sandstone (ST2), lower Jurassic Ziliujing Formation siltstone (ST3), middle Jurassic Shaximiao Formation siltstone (ST4) and upper Jurassic Suining Formation sand–mudstone (ST5) (Xi et al. 1998). The climate is subtropical monsoon with annual average temperature, precipitation, sunshine hours and frost-free period of 17.5°C, 1154.8 mm, 1184.3 h and 336 days respectively.

Data collection
A total of 5162 topsoil samples were collected from the agricultural soil fertility survey database of Changshou District. The soil samples were obtained after the crop harvesting season. Each soil sample was a composite of 10–15 sub-samples collected within a radius of 10 m at a depth of 20 cm. The soil samples were dried at room temperature and passed through a 2-mm soil sieve. Soil pH was measured in soil–water suspension (1 : 2.5) with a pH meter.

Terrain indicators were derived from the ASTER GDEM V2 (Global Digital Elevation Model) with a resolution of 30 m. The dataset was provided by the International Scientific and Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn). ELE (Fig. 1), valley depth (VD) (Fig. 2a), vertical distance to channel network (VDCN) (Fig. 2b), relative slope position (RSP) (Fig. 2c) and topographic wetness index (TWI) (Beven and Kirkby 1979) (Fig. 2d) were calculated by using SAGA GIS v.6.4.

The normalised difference vegetation index (NDVI), which was downloaded from http://ladsweb.nascom.nasa.gov (Fig. 3), was used to represent vegetation. The data are part of the MODIS series land level 3 standard data product (MYD13Q1), which belongs to the Aqua afternoon satellite, with a spatial resolution of 250 m and a time resolution of 16 days. A total of 230 satellite images were collected from 2003 to 2012. All images were subjected to atmospheric, radiation and geometric correction. In addition, the Savitzky–Golay filter approach was used to remove noise from the NDVI time series data (Chen et al. 2004). The average of the 2003–12 dataset was calculated with the ArcGIS v.10.5 software. To match the DEM resolution, the NDVI maps were resampled to 30 m resolution by the nearest neighbour approach with the ArcGIS v.10.5 software.

Annual average temperature (AVTP) (Fig. 4a) and annual average precipitation (AVPR) (Fig. 4b) were collated from the World Meteorological Database (2ed) (http://www.worldclim.org) with a resolution of 1000 m (Fick and Hijmans 2017). The climate maps were also resampled to 30 m resolution to match the DEM resolution.

The strata (Fig. 5) were digitised from the geological map at a scale of 1 : 500 000. The study area comprised five strata, namely, ST1, ST2, ST3, ST4 and ST5.

Methodology
Classification and regression tree (CART) is a machine learning algorithm (Breiman et al. 1984) that is widely used in classifying or predicting soil properties (Ließ et al. 2012; Wu et al. 2018). CART generates a series of binary trees by cyclically analysing the training dataset, and the final leaf nodes are assigned as the classification or prediction values. Here, CART was run in SPSS v.22.0, with the minimal parent node, minimal child node and maximum tree depth set as 20, 10 and 20 respectively.

RF is derived from CART (Breiman 2001). In general, RF consists of a large number of CARTs, and each CART independently performs classification or prediction operation.
Digital soil pH mapping by RFRK: hilly region Soil Research 389

Fig. 1. Location of study area and distribution of soil samples. The map of Changshou District (Chongqing, China) shows training points, validation points, water and elevation (175–1034 m).

The results of all CARTs are then voted on or averaged to give the final result. Moreover, RF can determine the importance of input variables by calculating the mean square error (MSE) or the Gini index of out-of-bag data, and those variables with high MSE or Gini index values are considered important. The current work applied the relative importance (RI) by normalising the two indices to determine the influence of environmental variables. RF was run in MATLAB ver. 2016a, with nTree (number of CARTs) and mTry (number of feature variables on each non-leaf node) of 500 and 3 respectively.
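The normalisation behind RI is not spelled out in the text. As a minimal sketch, assuming RI is simply each variable's raw importance score expressed as a percentage of the total (so that the RIs sum to 100%), it could be computed as follows. The function name and the raw scores are hypothetical and are not taken from the study.

```python
def relative_importance(raw_scores):
    """Scale non-negative importance scores so that they sum to 100%.

    Assumption: RI_i = 100 * score_i / sum(scores). The paper does not
    state its exact normalisation, so this is only one plausible reading.
    """
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    return {name: 100.0 * score / total for name, score in raw_scores.items()}


# Hypothetical raw out-of-bag importance scores (illustrative values only).
raw = {"ST": 4.0, "AVTP": 2.5, "VD": 2.0, "TWI": 1.5}
ri = relative_importance(raw)

# Print the variables from most to least important.
for name, value in sorted(ri.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.2f}%")
```

Because the scores are rescaled by their total, only their ratios matter, which makes RI values comparable across importance indices that are measured on different scales.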

Fig. 2. Terrain indicators: (a) valley depth (VD; 0–288.61 m), (b) vertical distance to channel network (VDCN; 0–361.76 m), (c) relative slope position (RSP; 0–1) and (d) topographic wetness index (TWI; 9.35–14.36).

RFRK is a combination of RF and OK. The RFRK procedure can be summarised as follows:
Step 1: Soil pH is predicted by using RF with environmental variables, and then residuals are calculated.
Step 2: OK is used to interpolate the residuals.
Step 3: The interpolated residuals are added to the soil pH predicted in Step 1.

The formula of RFRK is

$$P_{\mathrm{RFRK}}(\mathrm{pH}) = P_{\mathrm{RF}}(\mathrm{pH}) + R_{\mathrm{OK}}(\mathrm{pH}) \qquad (1)$$

where $P_{\mathrm{RFRK}}(\mathrm{pH})$ is the soil pH value predicted by RFRK, $P_{\mathrm{RF}}(\mathrm{pH})$ is the soil pH value predicted by RF, and $R_{\mathrm{OK}}(\mathrm{pH})$ is the interpolated residual of RF.

OK is a single-variable interpolation method based on the spatial autocorrelation of the target variable. Therefore, OK requires that the target variable be spatially autocorrelated. A semi-variance function was used to determine the spatial autocorrelation of the RF residual. The nugget (Co) value, the sill (Co + C) value and the optimal semi-variogram model were calculated with the GS+ v.7.0 software. The nugget effect was calculated with the Co/(Co + C) formula. According to Goovaerts (1999), the spatial autocorrelation is strong if the nugget effect is less than 25%, medium if between 25% and 75% and weak if greater than 75%.

Analysis of variance (ANOVA) was used to analyse the differences in soil pH amongst strata and between training and validation datasets, and Pearson’s correlation coefficient was used to determine the relationships between pH and quantitative environmental variables. ANOVA and Pearson’s correlation coefficient were calculated in SPSS v.22.0.

The digital soil pH maps, as predicted by CART, RF and RFRK, were generated in ArcGIS v.10.5.

Fig. 3. Vegetation: normalised difference vegetation index (NDVI; 0.03–0.67).

Fig. 4. Climate factors: (a) annual average temperature (AVTP; 15.17–18.38°C) and (b) annual average precipitation (AVPR; 1161–1231 mm).

Fig. 5. Strata: Triassic limestone (ST1); Triassic Xujiahe Formation sandstone (ST2); lower Jurassic Ziliujing Formation siltstone (ST3); middle Jurassic Shaximiao Formation siltstone (ST4); upper Jurassic Suining Formation sand–mudstone (ST5).

Validation
In evaluating model prediction performance, a separate dataset containing 1062 samples (~20%) was randomly selected from 5162 samples (total) as the validation dataset. Lin’s concordance correlation coefficient (LCCC) (Lin 1989, 2000), coefficient of determination (R2), mean absolute error (MAE) and root mean square error (RMSE) were used to evaluate model accuracy. LCCC measures the agreement of two variables along a 45° line. R2 (i.e. the square of the correlation coefficient) can be used to explain the proportion of variation of the dependent variables in the model. MAE is the arithmetic mean of the absolute values of the error between measured and predicted values. RMSE is the root mean square error between the measured values and the predicted values. High LCCC and R2 values and low MAE and RMSE values indicate good modelling performance. The equations of these indices are as follows:

$$\mathrm{LCCC} = \frac{2 r S_M S_P}{S_M^2 + S_P^2 + (\bar{M} - \bar{P})^2} \qquad (2)$$

$$R^2 = \frac{\sum_{i=1}^{n} (P_i - \bar{M})^2}{\sum_{i=1}^{n} (M_i - \bar{M})^2} \qquad (3)$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |M_i - P_i| \qquad (4)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (M_i - P_i)^2} \qquad (5)$$

where $r$ is the Pearson’s correlation coefficient; $S_M$ and $S_P$ are the standard deviations of the measured values and the predicted values respectively; $\bar{M}$ and $\bar{P}$ are the mean values of the measured values and the predicted values respectively; and $M_i$, $P_i$ and $n$ are the measured values, the predicted values, and the validation sample number respectively.

The relative enhancement (RE) of the accuracy indicators was used to evaluate the performance improvement of RFRK over CART and RF models. The equation is

$$\mathrm{RE} = \frac{|R_{\mathrm{RFRK}} - R|}{R} \times 100\% \qquad (6)$$

where RE is the relative enhancement, $R_{\mathrm{RFRK}}$ is the accuracy indicator of RFRK, and $R$ is the accuracy indicator of CART or RF.

Results

Descriptive statistics of soil pH
The descriptive statistics of soil pH (Table 1) indicate that the training dataset was highly similar to the validation dataset, and the minimal pH and the maximal pH were 4.2 and 8.3 respectively. The ANOVA results did not show significant differences in soil pH between training and validation datasets. Soil pH exhibited medium variability, and the coefficients of variation (CVs) were 14.53% and 14.77% for the training dataset and the validation dataset respectively (Nielsen and Bouma 1985).

Table 1. Statistical characteristics of soil pH
CV, coefficient of variation

Item                n     Min   Max   Mean  s.d.  CV      Skewness  Kurtosis
Total dataset       5162  4.20  8.30  5.90  0.86  14.58%  0.56      0.52
Training dataset    4100  4.20  8.30  5.90  0.86  14.53%  0.56      0.54
Validation dataset  1062  4.20  8.30  5.87  0.87  14.77%  0.57      0.45

Significant differences in soil pH were found for the strata (Table 2). On average, soils developed from ST2, ST3 and ST4 were weakly acidic, whereas soils developed from ST1 and ST5 were neutral. Soils developed from ST3 and ST5 exhibited the lowest and highest pH values respectively.

Table 2. Relationship between soil pH and strata (mean ± s.d.)
Note: Letters denote l.s.d. at P < 0.05. ST1, Triassic limestone; ST2, Triassic Xujiahe Formation sandstone; ST3, lower Jurassic Ziliujing Formation siltstone; ST4, middle Jurassic Shaximiao Formation siltstone; ST5, upper Jurassic Suining Formation sand–mudstone

Item  ST1           ST2           ST3           ST4           ST5
pH    6.53 ± 0.79d  6.08 ± 0.86e  5.52 ± 0.81b  5.65 ± 0.73b  6.77 ± 0.69a

The results of Pearson’s correlation indicated that soil pH was negatively correlated with RSP, ELE, AVPR and AVTP and positively correlated with VD and NDVI (Table 3).

Table 3. Pearson’s correlation coefficients between soil pH and quantitative environmental variables
Note: *P < 0.05 and **P < 0.01. AVTP, annual average temperature; AVPR, annual average precipitation; NDVI, normalised difference vegetation index; VD, valley depth; TWI, topographic wetness index; VDCN, vertical distance to channel network; ELE, elevation; RSP, relative slope position

Item  AVTP    AVPR    NDVI   VD       TWI     VDCN     ELE     RSP
pH    -0.024  -0.022  0.019  0.119**  -0.027  -0.07**  -0.025  -0.131**

Semi-variance of RF residual
The calculated semi-variance of the RF residual is shown in Fig. 6. The nugget (Co) and sill (Co + C) values were 0.0045 and 0.0458 respectively. The nugget effect [Co/(Co + C)] was 9.83%, which indicates that the residual had strong spatial autocorrelation (Goovaerts 1999). The optimal semi-variogram model was an exponential model with a separation distance of 690 m.

Comparison of model performance
The prediction accuracy of RFRK was compared with those of CART and RF by using the aforementioned separate validation dataset. RFRK obtained comparatively high LCCC and R2 values and low MAE and RMSE values (Table 4). RFRK achieved remarkable improvements over CART and RF. The LCCC and R2 values were 0.70 and 0.51 for CART, 0.80 and 0.70 for RF, and 0.88 and 0.80 for RFRK respectively. The REs in LCCC of RFRK over CART and RF were 26% and 10% respectively. The REs in R2 of RFRK over CART and RF were 57% and 14% respectively. The MAE and RMSE values were 0.44 and 0.61 for CART, 0.34 and 0.48 for RF and 0.25 and 0.39 for RFRK respectively. The REs in MAE of RFRK over CART and RF were 43% and 26% respectively. The REs in RMSE of RFRK over CART and RF were 36% and 19% respectively.

Table 4. Lin’s concordance correlation coefficient (LCCC), coefficient of determination (R2), mean absolute error (MAE) and root mean square error (RMSE) of classification and regression tree (CART), random forest (RF) and random forest with residual kriging (RFRK)

Item  CART  RF    RFRK
LCCC  0.70  0.80  0.88
R2    0.51  0.70  0.80
MAE   0.44  0.34  0.25
RMSE  0.61  0.48  0.39
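As a cross-check on the accuracy indices and the relative enhancements, the following Python sketch implements Eqns 2–6 and recomputes RE from the Table 4 values. It is an illustration rather than the authors' code: the function names are hypothetical, population (1/n) variances are assumed for LCCC, and R2 is computed as the ratio of sums of squares given in Eqn 3.

```python
import math


def accuracy_indices(measured, predicted):
    """Return LCCC, R2, MAE and RMSE for paired measured/predicted values."""
    n = len(measured)
    m_bar = sum(measured) / n
    p_bar = sum(predicted) / n
    # Population (1/n) variances and covariance; 2 * cov equals 2 r S_M S_P.
    var_m = sum((m - m_bar) ** 2 for m in measured) / n
    var_p = sum((p - p_bar) ** 2 for p in predicted) / n
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted)) / n
    lccc = 2.0 * cov / (var_m + var_p + (m_bar - p_bar) ** 2)          # Eqn 2
    r2 = (sum((p - m_bar) ** 2 for p in predicted)
          / sum((m - m_bar) ** 2 for m in measured))                   # Eqn 3
    mae = sum(abs(m - p) for m, p in zip(measured, predicted)) / n     # Eqn 4
    rmse = math.sqrt(sum((m - p) ** 2
                         for m, p in zip(measured, predicted)) / n)    # Eqn 5
    return {"LCCC": lccc, "R2": r2, "MAE": mae, "RMSE": rmse}


def relative_enhancement(rfrk_value, other_value):
    """Relative enhancement (%) of RFRK over another model (Eqn 6)."""
    return abs(rfrk_value - other_value) / other_value * 100.0


# Accuracy indicators as reported in Table 4.
table4 = {
    "LCCC": {"CART": 0.70, "RF": 0.80, "RFRK": 0.88},
    "R2":   {"CART": 0.51, "RF": 0.70, "RFRK": 0.80},
    "MAE":  {"CART": 0.44, "RF": 0.34, "RFRK": 0.25},
    "RMSE": {"CART": 0.61, "RF": 0.48, "RFRK": 0.39},
}

for index, row in table4.items():
    for model in ("CART", "RF"):
        re = relative_enhancement(row["RFRK"], row[model])
        print(f"RE in {index} of RFRK over {model}: {re:.0f}%")

# Small hypothetical example for the indices themselves.
demo = accuracy_indices([5.0, 6.0, 7.0], [5.5, 6.0, 6.5])
print({name: round(value, 3) for name, value in demo.items()})
```

Because Eqn 6 uses an absolute difference, RE is positive both for indices where larger is better (LCCC, R2) and for those where smaller is better (MAE, RMSE).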

Table 5. Relative importance (RI) of variables produced by random forest
ST, stratum; AVTP, annual average temperature; VD, valley depth; AVPR, annual average precipitation; ELE, elevation; NDVI, normalised difference vegetation index; VDCN, vertical distance to channel network; RSP, relative slope position; TWI, topographic wetness index

Item  ST      AVTP    VD      AVPR    ELE     NDVI   VDCN   RSP    TWI
RI    23.51%  14.21%  11.91%  11.66%  11.12%  9.83%  6.10%  5.92%  5.74%

Fig. 7. Soil pH map produced by the classification and regression tree. Legend classes: 4.6–5.5, 5.6–6.5 and 6.6–7.5.

Fig. 6. Semivariogram of the random forest residual (isotropic variogram; semi-variance plotted against separation distance, h). Exponential model (Co = 0.00450; Co + C = 0.04580; Ao = 690.00; R2 = 0.748; RSS = 1.892E–05). Co, nugget value; Co + C, sill value; C, partial sill value; Ao, separation distance; R2, coefficient of determination; RSS, residual sum of squares.
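The nugget effect reported for the RF residual follows directly from the fitted Co and Co + C values, as the short Python sketch below shows. The function names are hypothetical, and the exponential model is assumed to have the form γ(h) = Co + C[1 − exp(−h/Ao)]; the exact parameterisation used by GS+ should be checked before reusing the curve.

```python
import math


def nugget_effect(nugget, sill):
    """Nugget effect Co / (Co + C), expressed as a percentage."""
    return 100.0 * nugget / sill


def exponential_semivariance(h, nugget, sill, a0):
    """Assumed exponential model: Co + C * (1 - exp(-h / Ao)) for h > 0."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / a0))


CO, SILL, A0 = 0.0045, 0.0458, 690.0  # fitted values reported in Fig. 6

effect = nugget_effect(CO, SILL)
print(f"nugget effect = {effect:.2f}%")
print("strong spatial autocorrelation" if effect < 25.0
      else "not in the strong class")

# Modelled semi-variance at a few lags (metres), including the largest lag
# shown on the variogram axis.
for lag in (100.0, 690.0, 2070.0, 28837.71):
    value = exponential_semivariance(lag, CO, SILL, A0)
    print(f"h = {lag:>9.2f} m  gamma = {value:.4f}")
```

Under this model form the curve rises quickly and is practically flat at the sill for lags of several kilometres, which is consistent with the low nugget-to-sill ratio.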
These results indicate that RFRK is the optimal model for predicting soil pH at the study site.

Relative importance (RI) of variables
According to the RI results, the two most important variables were ST and AVTP with RIs of 23.51% and 14.21% respectively (Table 5). VD, AVPR and ELE obtained similar RI values.

Digital mapping of soil pH
The three maps of soil pH produced by CART, RF and RFRK showed similar spatial distributions (Figs 7, 8 and 9). The ranges of soil pH of CART, RF and RFRK were 4.6–7.5, 4.4–8.0 and 4.2–8.3 respectively. The maximum pH values of CART and RF were underestimated, whereas the minimum pH values were overestimated. These findings indicate that CART and RF have ‘smoothing effects’. According to the RFRK results, 0.09%, 33.1%, 55.25%, 11.45% and 0.11% of the total area can be classified as strongly acidic (pH ≤ 4.5), acidic (pH = 4.6–5.5), weakly acidic (pH = 5.6–6.5), neutral (pH = 6.6–7.5) and alkaline (pH > 7.5) soils respectively. Acidic soils (strongly acidic, acidic and weakly acidic combined) represent 88.44% of the total area, and this proportion is higher than those of neutral and alkaline soils. Thus, the soils in Changshou District are generally acidic.

Fig. 8. Soil pH map produced by random forest. Legend classes: 4.4–4.5, 4.6–5.5, 5.6–6.5, 6.6–7.5 and 7.6–8.0.

Discussion

Main influencing factors of soil pH
Strata greatly affect soil properties (Kassai and Sisák 2018). Moreover, strata can be used to determine soil type, which is the

basis of soil properties. In the study area, soils that developed from the Jurassic siltstone strata exhibited remarkably lower pH values than those from the Jurassic sand–mudstone stratum. According to Vaysse and Lagacherie (2015), a parent material with mineralogy and texture index is the most important factor of soil pH prediction. Kassai and Sisák (2018) reported that strata are among the best predictors of soil properties. Many researchers also found that parent material can significantly affect soil pH (Gray et al. 2016; Tan et al. 2017).

Fig. 9. Soil pH map produced by random forest with residual kriging. Legend classes: 4.2–4.5, 4.6–5.5, 5.6–6.5, 6.6–7.5 and 7.6–8.3.

AVTP and AVPR (i.e. climatic indices) were the second and fourth most important influencing factors of soil pH in the study area. AVTP affects soil properties by accelerating the weathering and eluviation of rocks, whereas AVPR leads to soil eluviation. The current study site is located in a subtropical monsoon climate area with high temperature and precipitation. Under strong leaching, the soils have become weakly acidic or acidic, although the strata in this area originally had relatively high contents of Ca2+ and Mg2+. Some researchers have emphasised that soil pH is significantly and negatively related to precipitation (Helyar et al. 1990; Ou et al. 2017), whilst other researchers have found that climate is the main influencing factor of soil pH (Vaysse and Lagacherie 2015; Kassai and Sisák 2018).

The effects of topography on soil pH cannot be ignored. VD was the most important terrain factor affecting soil pH in the study area. Topographical indices, such as VD, ELE, TWI, VDCN and RSP, affect soil pH mainly through water and heat redistribution. In the current work, soil pH was negatively correlated with TWI, VDCN, RSP and ELE and positively correlated with VD. This finding implies that soil pH decreases as altitude increases. This may be due to the leaching effect. With the increase of altitude, the leaching effect of rainfall becomes increasingly intense, and the soil salt-based materials tend to leach easily, which leads to acidic soils. Smith et al. (2002) and Dharumarajan et al. (2017) reported similar results, i.e. soil pH decreases with the increase in altitude.

Vegetation was also an important factor of soil pH in the study area. The effect of vegetation on soil pH can be attributed to the remaining organisms in the soil after harvest. Previous studies have shown that vegetation exerts an impact on soil pH (Andrade and Mendonça-Santos 2016; Dharumarajan et al. 2017; Ou et al. 2017). Andriesse and Schelhaas (1987) found that soil pH is increased when crop residues are burned, whereas soil pH is decreased when crop residues are buried.

Comparison of CART, RF and RFRK
The results of LCCC, R2, MAE and RMSE suggest that RFRK outperforms CART and RF in predicting soil pH. This finding may be attributed to the fact that RFRK inherits the advantages of RF and OK: it considers both the non-linear relationships amongst variables and the spatial autocorrelation of the residual. The residual is the part of a model that represents unexplained information (Guo et al. 2015). Hengl et al. (2004) reported that if residuals are spatially autocorrelated, then they can be interpolated by kriging and combined with stepwise regression to improve model performance. Kumar et al. (2012) and Guo et al. (2015) reported that adding kriged residual to prediction models can remarkably improve prediction accuracy.

RFRK also inherits the advantages of RF, i.e. the model does not need to assume relationships between target and predictive variables, and it does not lead to over-fitting. These advantages enable RFRK to easily deal with various types of variables.

Conclusion
This research predicted the spatial distributions of soil pH by using CART, RF and RFRK based on stratum, topography, vegetation and climate indicators. The results from using an independent validation dataset imply that RFRK outperforms CART and RF. RFRK remarkably improved model performance and enhanced prediction accuracy. The outcomes of RF indicate that stratum and AVTP are the two most important indicators of soil pH prediction in the study area. Thus, RFRK is a feasible and reliable tool for predicting soil properties in hilly regions.

Conflicts of interest
The authors declare no conflicts of interest.

Acknowledgements
This research did not receive any specific funding.

References
Andrade SFD, Mendonça-Santos MDL (2016) Predição da fertilidade do solo do polo agrícola do Rio de Janeiro por meio de modelagem solo x paisagem. Pesquisa Agropecuária Brasileira 51, 1386–1395. doi:10.1590/s0100-204x2016000900037 [In Portuguese with an English abstract]
Andriesse JP, Schelhaas RM (1987) A monitoring study on nutrient cycles in soils used for shifting cultivation under various climatic conditions in tropical Asia. III. The effects of land clearing through burning on fertility level. Agriculture, Ecosystems & Environment 19, 311–332. doi:10.1016/0167-8809(87)90059-4
Band LE, Moore ID (1995) Scale: landscape attributes and geographical information systems. Hydrological Processes 9, 401–422. doi:10.1002/hyp.3360090312

Beven KJ, Kirkby MJ (1979) A physically based, variable contributing area model of basin hydrology. Hydrological Sciences Bulletin 24, 43–69. doi:10.1080/02626667909491834
Breiman L (2001) Random forests. Machine Learning 45, 5–32. doi:10.1023/A:1010933404324
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) ‘Classification and regression trees.’ (Chapman & Hall/CRC: Boca Raton, FL)
Chen J, Jonsson P, Tamura M, Gu Z, Matsushita B, Eklundh L (2004) A simple method for reconstructing a high-quality NDVI time-series data set based on the Savitzky–Golay filter. Remote Sensing of Environment 91, 332–344. doi:10.1016/j.rse.2004.03.014
Dharumarajan S, Hegde R, Singh SK (2017) Spatial prediction of major soil properties using random forest techniques – a case study in semi-arid tropics of South India. Geoderma Regional 10, 154–162. doi:10.1016/j.geodrs.2017.07.005
Dokuchaev VV (1883) ‘The Russian chernozem report to the free economic society.’ (Imperial University of St. Petersburg: St. Petersburg) [in Russian]
Fick SE, Hijmans RJ (2017) Worldclim 2: New 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37, 4302–4315. doi:10.1002/joc.5086
Filippi P, Cattle SR, Bishop TFA, Odeh IOA, Pringle MJ (2018) Digital soil monitoring of top- and sub-soil pH with bivariate linear mixed models. Geoderma 322, 149–162. doi:10.1016/j.geoderma.2018.02.033
Goovaerts P (1999) Geostatistics in soil science: state-of-the-art and perspectives. Geoderma 89, 1–45. doi:10.1016/S0016-7061(98)00078-0
Gray JM, Bishop TFA, Wilford JR (2016) Lithology and soil relationships for soil modelling and mapping. Catena 147, 429–440. doi:10.1016/j.catena.2016.07.045
Guo PT, Wu W, Sheng QK, Li MF, Liu HB, Wang ZY (2013) Prediction of soil organic matter using artificial neural network and topographic indicators in hilly areas. Nutrient Cycling in Agroecosystems 95, 333–344. doi:10.1007/s10705-013-9566-9
Guo PT, Li MF, Luo W, Tang QF, Liu ZW, Lin ZM (2015) Digital mapping of soil organic matter for rubber plantation at regional scale: an application of random forest plus residuals kriging approach. Geoderma 237–238, 49–59. doi:10.1016/j.geoderma.2014.08.009
Heggelund LR, Diez-Ortiz M, Lofts S, Lahive E, Jurkschat K, Wojnarowicz J, Cedergreen N, Spurgeon D, Svendsen C (2014) Soil pH effects on the comparative toxicity of dissolved zinc, non-nano and nano ZnO to the earthworm Eisenia fetida. Nanotoxicology 8, 559–572. doi:10.3109/17435390.2013.809808
Helyar KR, Cregan PD, Godyn DL (1990) Soil acidity in New South Wales – current pH values and estimates of acidification rates. Australian
Li J, Heap AD, Potter P, Daniell JJ (2011) Application of machine learning methods to spatial interpolation of environmental variables. Environmental Modelling & Software 26, 1647–1659. doi:10.1016/j.envsoft.2011.07.004
Li ZW, Wang JH, Tang H, Huang CQ, Yang F, Chen BR, Wang X, Xin XP, Ge Y (2016) Predicting grassland leaf area index in the meadow steppes of northern China: a comparative study of regression approaches and hybrid geostatistical methods. Remote Sensing 8, 632. doi:10.3390/rs8080632
Ließ M, Glaser B, Huwe B (2012) Uncertainty in the spatial prediction of soil texture. Geoderma 170, 70–79. doi:10.1016/j.geoderma.2011.10.010
Lin LIK (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255–268. doi:10.2307/2532051
Lin LIK (2000) A note on the concordance correlation coefficient. Biometrics 56, 324–325.
Liu ZP, Shao MA, Wang YQ (2013) Large-scale spatial interpolation of soil pH across the loess plateau, China. Environmental Earth Sciences 69, 2731–2741. doi:10.1007/s12665-012-2095-z
Liu Y, Cao G, Zhao N, Mulligan K, Ye X (2018) Improve ground-level PM2.5 concentration mapping using a random forests-based geostatistical approach. Environmental Pollution 235, 272–282. doi:10.1016/j.envpol.2017.12.070
McBratney AB, Odeh IOA, Bishop TFA, Dunbar MS, Shatar TM (2000) An overview of pedometric techniques for use in soil survey. Geoderma 97, 293–327. doi:10.1016/S0016-7061(00)00043-4
McBratney AB, Santos MLM, Minasny B (2003) On digital soil mapping. Geoderma 117, 3–52.
Mosleh Z, Salehi MH, Jafari A, Borujeni IE, Mehnatkesh A (2016) The effectiveness of digital soil mapping to predict soil properties over low-relief areas. Environmental Monitoring and Assessment 188, 195. doi:10.1007/s10661-016-5204-8
Nielsen DR, Bouma J (Eds) (1985) ‘Soil spatial variability: proceedings of a workshop of the ISSS and the SSSA, Las Vegas, USA, 30 November–1 December 1984.’ (Pudoc: Wageningen, The Netherlands)
Ou Y, Rousseau AN, Wang L, Yan B (2017) Spatio-temporal patterns of soil organic carbon and pH in relation to environmental factors – a case study of the black soil region of northeastern China. Agriculture, Ecosystems & Environment 245, 22–31. doi:10.1016/j.agee.2017.05.003
Pahlavan-Rad MR, Akbarimoghaddam A (2018) Spatial variability of soil texture fractions and pH in a flood plain (case study from eastern Iran). Catena 160, 275–281. doi:10.1016/j.catena.2017.10.002
Schwamberger EC, Sims JL (1991) Effects of soil pH, nitrogen source,
Journal of Soil Research 28, 523–537. doi:10.1071/SR9900523 phosphorus, and molybdenum on early growth and mineral nutrition of
Hengl T, Heuvelink GBM, Stein A (2004) A generic framework for spatial burley tobacco. Communications in Soil Science and Plant Analysis 22,
prediction of soil variables based on regression-kriging. Geoderma 120, 641–657. doi:10.1080/00103629109368444
75–93. doi:10.1016/j.geoderma.2003.08.018 Smith JL, Halvorson JJ, Bolton H (2002) Soil properties and microbial
Jenny H (1941) ‘Factors of soil formation.’ (McGraw-Hill: New York, NY) activity across a 500 m elevation gradient in a semi-arid environment.
Jenny H (1980) ‘The soil resources.’ (Springer-Verlag: New York, NY) Soil Biology & Biochemistry 34, 1749–1757. doi:10.1016/S0038-0717
Kassai P, Sisák I (2018) The role of geology in the spatial prediction of soil (02)00162-1
properties in the watershed of Lake Balaton, Hungary. Geologia Szatmári G, Pásztor L (2019) Comparison of various uncertainty modelling
Croatica 71, 29–39. doi:10.4154/gc.2018.04 approaches based on geostatistics and machine learning algorithms.
Kim Y, Park NW (2016) Spatial disaggregation of coarse scale satellite- Geoderma 337, 1329–1340. doi:10.1016/j.geoderma.2018.09.008
based precipitation data using machine learning model and residual Tan X, Guo PT, Wu W, Li MF, Liu HB (2017) Prediction of soil properties
kriging. Journal of Climate Research 11, 183–195. doi:10.14383/ by using geographically weighted regression at a regional scale. Soil
cri.2016.11.2.183[in Korean with English abstract] Research 55, 318–331. doi:10.1071/SR16177
Kumar S, Lal R, Liu D (2012) A geographically weighted regression kriging Tu C, He T, Lu X, Luo Y, Smith P (2018) Extent to which pH and
approach for mapping soil organic carbon stock. Geoderma 189–190, topographic factors control soil organic carbon level in dry farming
627–634. doi:10.1016/j.geoderma.2012.05.022 cropland soils of the mountainous region of southwest China. Catena
Lark RM (1999) Soil–landform relationships at within-field scales: an 163, 204–209. doi:10.1016/j.catena.2017.12.028
investigation using continuous classification. Geoderma 92, 141–165. Tziachris P, Aschonitis V, Chatzistathis T, Papadopoulou M (2019)
doi:10.1016/S0016-7061(99)00028-2 Assessment of spatial hybrid methods for predicting soil organic
Lauber CL, Hamady M, Knight R, Fierer N (2009) Pyrosequencing-based matter using DEM derivatives and soil parameters. Catena 174,
assessment of soil pH as a predictor of soil bacterial community 206–216. doi:10.1016/j.catena.2018.11.010
structure at the continental scale. Applied and Environmental Vaysse K, Lagacherie P (2015) Evaluating digital soil mapping approaches
Microbiology 75, 5111–5120. doi:10.1128/AEM.00335-09 for mapping GlobalSoilMap soil properties from legacy data in
396 Soil Research L. Wang et al.

Languedoc-Roussillon (France). Geoderma Regional 4, 20–30. Xi CF, Zhu KG, Zhou MZ, Du GH, Li XR, Zhang SY, Yang BQ, Hou CQ,
doi:10.1016/j.geodrs.2014.11.003 Tang JC, Zhou CH (1998) ‘Soils of China.’ (Chinese Agriculture Press:
Viscarra Rossel RA, Webster R, Kidd D (2014) Mapping gamma radiation Beijing) [in Chinese]
and its uncertainty from weathering products in a Tasmanian landscape Ye X, Li H, Ma Y, Wu L (2014) The bioaccumulation of Cd in rice
with a proximal sensor and random forest kriging. Earth Surface grains in paddy soils as affected and predicted by soil properties.
Processes and Landforms 39, 735–748. Journal of Soils and Sediments 14, 1407–1416. doi:10.1007/
doi:10.1002/esp.3476 s11368-014-0901-9
Wu W, Li AD, He XH, Ma R, Liu HB, Lv JK (2018) A comparison of support Zhu AX, Qi F, Moore A, Burt JE (2010) Prediction of soil properties using
vector machines, artificial neural network and classification tree for fuzzy membership values. Geoderma 158, 199–206. doi:10.1016/j.
identifying soil texture classes in southwest China. Computers and geoderma.2010.05.001
Electronics in Agriculture 144, 86–93.
doi:10.1016/j.compag.2017.11.037 Handling Editor: Abdul Mouazen

www.publish.csiro.au/journals/sr
