
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 60, 2022, 4505815

A Global Meta-Analysis of Soil Salinity Prediction Integrating Satellite Remote Sensing, Soil Sampling, and Machine Learning
Haiyang Shi, Olaf Hellwich, Senior Member, IEEE, Geping Luo, Chunbo Chen, Huili He,
Friday Uchenna Ochege, Tim Van de Voorde, Alishir Kurban, and Philippe de Maeyer

Abstract— Despite growing interest among researchers, satellite-based prediction of soil salinity remains highly uncertain. The improvements in prediction accuracy reported in previous studies are usually limited to a single area. We performed a meta-analysis of regional satellite-based soil salinity predictions combined with in situ soil sampling and machine learning. Based on the collected R² and root-mean-square error (RMSE) values, we evaluated the effects of various features on model accuracy and established a Bayesian network to evaluate the joint causal effect of multiple features. The most significant differences were found in soil sampling schemes and characteristics of the study area, including the mean and variability of salinity (averaged R² of 0.75 for soil sample sets with lower salinity variation and 0.62 for others), climate type (R² of 0.64 in arid areas and 0.74 in others), soil texture (R² of 0.66 in sandy areas and 0.57 in others), and the interval between the sampling date and the satellite data acquisition date (R² of 0.53 for intervals over 15 days and 0.65 for others). Generally, the choice of satellite data has limited effects on model performance, although Sentinel-2 performed better (R² = 0.72) than Landsat (R² = 0.66). Subsample collection for each sample should focus on subpixel-scale spatial heterogeneity within the satellite pixel rather than on the number of subsamples. It is also necessary to select appropriate vegetation and salinity indices for different satellite data under different vegetation conditions. Among algorithms, random forests (R² = 0.70) and support vector machines (R² = 0.71) performed best.
Index Terms— Hyperspectral, machine learning, multispectral,
remote sensing, satellite, soil salinity.
NOMENCLATURE
NIR Near-infrared band.
SWIR Short-wave infrared band.
UAV Unmanned aerial vehicle.
SSC Soil salt content.
ECe Electrical conductivity of saturated soil extract.
EC1:5 Electrical conductivity of a 1:5 soil–water dilution ratio.
SEM Structural equation modeling.
BN Bayesian network.
PRISMA Preferred reporting items for systematic reviews and meta-analyses.
EM Expectation–maximization.
MI Mutual information.
FVC Fractional vegetation cover.
XGBOOST Extreme gradient boosting.
RF Random forest.
MLR Multiple linear regression.
ANN Artificial neural network.
SVM Support vector machine.
PLSR Partial least squares regression.
SD Standard deviation.
NDVI Normalized difference vegetation index.
EVI Enhanced vegetation index.
SAVI Soil-adjusted vegetation index.
CRSI Canopy response salinity index.
SI Salinity index.
VI_SI Vegetation and salinity index.

Manuscript received June 16, 2021; revised July 30, 2021 and August 18, 2021; accepted August 29, 2021. Date of publication September 15, 2021; date of current version January 21, 2022. This research has been supported by the National Natural Science Foundation of China (grant nos. U1803243 and 41877012), the Strategic Priority Research Programme of the Chinese Academy of Sciences (grant no. XDA20060302), the Team project of the Chinese Academy of Sciences (grant no. 2018-YDYLTD-002), the West Light Foundation of the Chinese Academy of Sciences (grant no. 2018-XBQNXZ-B-011), and High-End Foreign Experts Project. (Corresponding author: Geping Luo.)

Haiyang Shi and Huili He are with the State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Beijing 100049, China, also with the College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China, also with the Sino-Belgian Joint Laboratory of Geo-Information, Ghent University, 9000 Ghent, Belgium, and also with the Department of Geography, Ghent University, 9000 Ghent, Belgium (e-mail: haiyang.shi@ugent.be; huili.he@ugent.be).

Olaf Hellwich is with the Department of Computer Vision and Remote Sensing, Technische Universität Berlin, 10623 Berlin, Germany (e-mail: olaf.hellwich@tu-berlin.de).

Geping Luo is with the State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Beijing 100049, China, and also with the College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: luogp@ms.xjb.ac.cn).

Chunbo Chen, Friday Uchenna Ochege, and Alishir Kurban are with Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Beijing 100049, China (e-mail: ccb_8586@ms.xjb.ac.cn; friday@ms.xjb.ac.cn; alishir@ms.xjb.ac.cn).

Tim Van de Voorde is with the Department of Geography, Ghent University, 9000 Ghent, Belgium (e-mail: tim.vandevoorde@ugent.be).

Philippe de Maeyer is with Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Beijing 100049, China, also with the College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China, also with the Sino-Belgian Joint Laboratory of Geo-Information, Ghent University, 9000 Ghent, Belgium, and also with the Department of Geography, Ghent University, 9000 Ghent, Belgium (e-mail: philippe.demaeyer@ugent.be).

This article has supplementary downloadable material available at https://doi.org/10.1109/TGRS.2021.3109819, provided by the authors.

Digital Object Identifier 10.1109/TGRS.2021.3109819
1558-0644 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: THE LIBRARY OF CHINESE ACADEMY OF SCIENCES. Downloaded on February 07,2022 at 10:02:33 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

SOIL salinization is a global ecological problem leading to soil degradation; it is considered one of the main factors limiting the sustainable development of irrigated croplands [1], [2] and other economic activities [3]. Soil salinity is governed by a combination of natural and anthropogenic factors, such as landform [4], [5], land surface altitude [6], soil drainage condition [7], irrigation, climate, and soil texture. Rapid and accurate regional prediction of soil salinity is essential for decision-makers formulating management policies for regions at risk of salinization, helping to avoid unreasonable use of water and land resources in irrigated areas and the continued degradation of land ecosystems.

In arid areas, excessive irrigation raises the shallow groundwater level and causes secondary salinization if drainage is poor [8]; 20% of irrigated land worldwide is affected by salinity, especially in Australia, China, Egypt, Uzbekistan, India, Iran, Iraq, Mexico, and Pakistan [9]. With increasing population and food demand, the potential expansion of irrigation in arid/semiarid regions may expose more croplands to salinization in the future. Because it provides regional and dynamic information from imagery, satellite remote sensing (multispectral, hyperspectral, and active microwave) has become an important approach to monitoring regional soil salinity [2], [10]. One of the most commonly used approaches is to establish a regression model between pixel-scale image data and the in situ salinity of ground soil samples; the established model is then used to estimate soil salinity at a regional scale from the reflectance values of the images. With the increasing popularity of machine learning in remote sensing applications in recent years, algorithms such as RF, SVM, and ANN have enabled researchers to incorporate more satellite imagery variables and auxiliary data (e.g., terrain, vegetation, soil texture/moisture, and hydrological data) into the regression model to fit the nonlinear relationship with the salinity of soil samples. Such studies, using various methods and satellite data, have been applied to salt-affected areas around the world [11]–[16]. However, a comprehensive and systematic evaluation of how to improve their performance and universality is still lacking.

Researchers have made considerable efforts to improve the accuracy of soil salinity prediction using machine learning and remote sensing at the regional scale. Most published papers demonstrate the effectiveness of their proposed improvements by comparing the accuracy of the developed predictive model with existing models. However, the improvements reported in these studies are usually limited to a single area, which makes it difficult to derive a general guideline for selecting suitable remote sensing data, features, and models. In addition, even studies comparing similar modeling methods may report conflicting results, and it is challenging to obtain general recommendations from such isolated studies. Consequently, questions such as “Which remote sensing data is the best?” and “How can we effectively improve the prediction accuracy of soil salinity?” remain unanswered. We therefore need to synthesize the collective knowledge of peer-reviewed journal articles to determine which remote sensing data, feature selection, machine learning models, and other factors contribute most to improving prediction accuracy, and under which conditions they are most effective.

The factors that lead to differences in the performance of such models are diverse, including the satellite images used, soil sampling schemes, characteristics of the study area, and algorithms, as described in the following.

1) Satellite Images: Multispectral satellite imagery has been the preferred data source for mapping and monitoring soil salinity, largely owing to its low cost. The most commonly used multispectral source for soil salinity estimation is Landsat (e.g., TM, ETM+, and OLI) [17], [18], which has a 16-day revisit period and is favored for its open access and relatively high 30-m resolution. By contrast, Sentinel-2 [19]–[21] has a higher resolution of 10–20 m and a shorter revisit interval of five days. Compared with Landsat and Sentinel-2, MODIS has the advantage of a shorter revisit interval for capturing salinity dynamics but the disadvantage of much lower spatial resolution. The NIR and SWIR bands of these sensors are considered effective for distinguishing the spectra of saline soils [2], [10]. Various VI_SIs calculated from multiple bands have also been used as input variables for prediction models, but it is still unclear when they are most effective. The limitations of these multispectral methods can be attributed, among other factors, to their relatively low spectral resolution, especially in areas with spectral mixing and subpixel spatial heterogeneity. Hyperspectral satellite data can purportedly be used to solve spectral mixture problems caused by the insufficient spectral resolution of traditional multispectral approaches, and some hyperspectral data, such as EO-1 Hyperion and HJ-1, have been used for soil salinity estimation [22], [23]. However, the limited availability of these hyperspectral datasets may restrict their large-scale application. Besides, active (radar) microwave imagery, such as Sentinel-1 backscatter data, which has shown advantages in soil moisture retrieval [26], has also been used to estimate soil salinity [24], [25]. However, characterizing saline soils using radar imagery and complex dielectric constants determined by radar backscattering inversion techniques requires some soil moisture, as the measurements depend principally on the dielectric constant and permittivity determined by moisture conditions and chemical and biological compositions [27]. UAV remote sensing has also been used for soil salinity estimation [28] and may have advantages owing to its high resolution; however, its ability to estimate salinity at a regional scale remains very limited compared with satellite remote sensing approaches.

2) Soil Sampling Schemes: The following considerations apply.
a) Differences in soil sampling schemes affect model performance. In study areas of different sizes,


the spatial density and representativeness of soil samples will affect the predictive capability of the established model. In addition, different sampling schemes have been used at the subpixel scale. In some studies, the salt content of a single soil sample represents the salinity of one pixel, whereas in others, the average value of multiple (approximately 3 to 5) adjacent samples represents the salinity of one pixel or plot, to reduce errors and enhance representativeness.
b) For study areas with multiple land use and land cover types, the representativeness of soil sampling should be considered in advance.
c) The temporal representativeness of soil sampling should also be considered. Topsoil salinity may change rapidly in response to fluctuations in hydrological and meteorological variables [2] and to human activities such as irrigation. To minimize the effect of differences in surface salt between the soil sampling date and the satellite image acquisition date (except for studies using multiyear data [29], [30]), these factors need to be considered jointly when designing the soil sampling scheme. However, apart from studies in which soil sampling and remote sensing data acquisition are scheduled on the same day, the errors caused by the time difference between the two have rarely been analyzed.
d) Different soil salinity measurement methods, such as SSC, ECe, and EC1:5, express the target variable of the models in different units (e.g., dS/m and g/kg) or dimensions. This may be one of the reasons why it is difficult to compare model accuracy across studies.
e) Sampling depths also vary among studies. Although most are described as the same “estimation of surface salt,” the actual sampling depths may be 0–5, 0–15, or 0–20 cm, among others, which differ from the depth that can be detected by remote sensing.
f) In some sampling schemes, the salinity value of each sampling point or plot is represented by the mean value of several surrounding subsamples. This approach is believed to reduce error because it accounts for the representativeness of the sample and its match with the satellite image pixel. However, whether this approach is reasonable, and how to trade off representativeness against the number of subsamples per sample and the plot size (both of which determine the sampling cost), is still unclear.

3) Characteristics of the Study Area: The contributions of climate, topography, soil texture, and other factors to topsoil salt accumulation vary among study areas. The drier the climate, the higher the topsoil salinity may be, owing to extremely high potential evaporation and insufficient leaching by precipitation. Soil texture may affect topsoil salinity through the transport of soil water and salt. In low-lying terrain, salt may accumulate due to a lack of drainage. Model performance may therefore also vary among study areas with different characteristics. Applications in arid/semiarid areas differ from those in other areas owing to differences in climate, vegetation coverage, human activities, and topsoil salt dynamics. The ability to accurately estimate soil salinity varies with soil moisture content, salt pureness, coatings, and spectral contrast with other surface features (e.g., braided stream beds, eroded terrain surfaces with truncated soils, and nonsaline silt-rich structural crusts) of the study areas [2]. Vegetation in the study area may also cause spectral confusion, especially in the green and red bands [31]. Therefore, the mechanisms of surface salinity retrieval probably differ between salinized bare land with low vegetation coverage and salt-affected cropland with higher vegetation coverage. In salinized bare land, spectral interference from vegetation may come mainly from salt-tolerant halophytes; in cropland, it comes mainly from the crop and varies with the stage of the growing season. Because vegetation and soil moisture are potentially more dynamic in cropland, soil salinity retrieval there may be more complex. Some vegetation indices (e.g., NDVI, EVI, and CRSI) have also been used to infer salinity indirectly by monitoring canopy temperature and crop growth [18], [32], [33], based on the assumed negative correlation between soil salinity and crop growth. These approaches assume a steady state of soil salinity and use multiyear data to filter out factors that may be more transient, such as irrigation water management, pests, and diseases; however, it is still uncertain how real salinity dynamics may cause this steady-state assumption to fail. In addition, the topographic variability, hydrometeorological variability, and intensity of human activities (e.g., irrigation) in the study area may determine the effectiveness of introducing auxiliary data (e.g., topographic variables derived from DEM). If the terrain of the study area is very flat, adding DEM-derived terrain data to the model may not provide additional information related to salt transport and thus cannot substantially improve model performance.

4) Algorithms: Different algorithms may have advantages when used to fit the relationship between the salinity of soil samples and the covariates. For example, neural networks may have advantages in nonlinear fitting [34], and RF may avoid overfitting owing to the randomness it introduces [35]. However, it is still unclear which algorithm is most universal for fitting the relationship between soil salinity and covariates, and why it performs better.
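Item 1) notes that vegetation and salinity indices (VI_SIs) derived from several bands are common model inputs, but the text does not give their formulas. As an illustration only, the sketch below computes a few widely used indices from surface-reflectance values with NumPy. The function name `spectral_indices`, the SAVI soil factor of 0.5, and the choice of SI = sqrt(blue × red) among several published salinity-index variants are assumptions of this sketch, not choices reported by the included studies.

```python
import numpy as np


def spectral_indices(blue, green, red, nir, savi_l=0.5):
    """Compute illustrative vegetation and salinity indices from reflectance.

    Hypothetical helper: inputs are scalars or same-shaped arrays of surface
    reflectance (0-1). Formulas follow commonly cited forms; individual
    studies may use other variants.
    """
    blue, green, red, nir = (
        np.asarray(band, dtype=float) for band in (blue, green, red, nir)
    )
    # Normalized difference vegetation index.
    ndvi = (nir - red) / (nir + red)
    # Soil-adjusted vegetation index with soil-brightness factor savi_l.
    savi = (1.0 + savi_l) * (nir - red) / (nir + red + savi_l)
    # Enhanced vegetation index with the standard MODIS coefficients.
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    # One salinity-index variant (assumption): sqrt(blue * red).
    si = np.sqrt(blue * red)
    # Canopy response salinity index; NaN where the ratio under sqrt is negative.
    crsi = np.sqrt((nir * red - green * blue) / (nir * red + green * blue))
    return {"NDVI": ndvi, "SAVI": savi, "EVI": evi, "SI": si, "CRSI": crsi}


# Example: one pixel with hypothetical reflectance values.
indices = spectral_indices(blue=0.08, green=0.10, red=0.12, nir=0.30)
for name, value in indices.items():
    print(f"{name}: {float(value):.3f}")
```

The same call works element-wise on whole image arrays, for example Landsat 8 OLI bands 2 to 5 or Sentinel-2 bands B2, B3, B4, and B8 supplied as blue, green, red, and NIR after conversion to surface reflectance.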


The main purpose of this study was to conduct a meta-analysis of published satellite remote sensing predictions of regional soil salinity combined with in situ soil sampling and machine learning. We define “meta-analysis” as a method that systematically integrates results from peer-reviewed studies and, through statistical comparison, assesses the patterns of effects (e.g., the statistical metric of prediction model performance) by satellite data type and other features of interest. Furthermore, we analyze the complex relationships among the collected variables, focusing on model performance. Compared with SEM [36], which has been used in meta-analyses to analyze the relationships among such multiple factors [37]–[39], a BN may have advantages in integrating expert knowledge and in inference [40], [41]. Therefore, in this study, we use a BN to analyze the causal relationships within the process of integrating soil sampling, remote sensing data analysis, and machine learning. The remainder of this article is organized as follows. We first detail the methods used to conduct the meta-analysis, including selection and data
extraction from published studies and the subsequent analysis. Next, we present the results of the evaluation of model performance grouped by satellite data type, soil sampling scheme, characteristics of the study area, and algorithm. Third, we analyze the joint probabilistic effects of multiple features on model performance using the BN, as a tool to guide future practical studies that weigh accuracy against cost. Finally, we discuss the implications of the findings for remote sensing prediction of soil salinity. The results of this synthesis provide guidance on the expected performance of soil salinity prediction under various conditions. Moreover, the proposed use of a BN for causal multifeature meta-analysis can be adopted in other meta-analyses (e.g., a meta-analysis of remote sensing estimation of soil moisture).

II. METHODOLOGY

A. Protocol for Selecting the Sample of Articles

Based on the Scopus database, we designed and applied a general query on title, abstract, and keywords to include relevant articles, with the “OR” operator applied among the expressions in Table I. We followed PRISMA [42] in the selection process (Fig. 1). The criteria used to include articles are as follows.
1) Articles were limited to regional soil salinity prediction using satellite data. Articles mainly focusing on UAV sensors or ground in situ hyperspectral sensors were not included.
2) Only articles reporting soil salinity sampling were included.
3) Only articles using multivariable regression were included. Articles aimed at classifying soil salinity levels were not included.
4) Only articles reporting the coefficient of determination R² of the validation or test step as the measure of model performance were included. R² is the proportion of the variance in the dependent variable that is predictable from the independent variable(s); it measures how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model. It was selected over other accuracy measures because it is the most frequently reported and thus yields a larger sample of articles.
5) Only articles published in English-language journals were included.
6) Two global-scale studies [43], [44] were excluded because their numbers of soil samples and spatial scales were considerably larger than those of the other included articles, which were applied at the watershed scale (the focus of this meta-analysis).

Fig. 1. PRISMA flow diagram for including articles.

B. Features of the Prediction Processes Evaluated

We consider a variety of features involved in the soil salinity prediction process (Fig. 2) of the included papers, listed in Table I (see the Supplementary Material). These variables can be roughly categorized into three groups: algorithms, remote sensing and auxiliary data, and characteristics of the study area and soil sampling. We recorded the algorithms used in the articles, and the performance of each model is treated as one data record. If multiple algorithms are applied comparatively to the same dataset and region, multiple data records are extracted. Likewise, when the same algorithm is used with different data or features, each model is recorded as a separate record. The subsequent meta-analysis is performed on these records. R² (in the validation phase) is extracted as the main indicator of model performance, and the RMSE is also included in the evaluation if reported in the article. However, the units and dimensions of RMSE differ among salt measurement methods. Therefore, we use conversion factors to convert the RMSE, mean, maximum, and SD of the salinity of soil sample collections uniformly to ECe in dS/m. We extracted the average soil texture of soil samples from the SoilGrids1km [45] data of the basin-scale HydroSHEDS


product in Table I (see the Supplementary Material), and the conversion factors are determined from these extracted soil texture features (Table II in the Supplementary Material) based on empirical relationships [46], [47]. As a result, most of the studies were assigned medium conversion factors because their sand content (%) and clay content (%) are not very high.

TABLE I
ARTICLE SEARCH QUERY DESIGN: ‘[A1 OR A2 OR A3…] AND [B1 OR B2…] AND [C1 OR C2…]’
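Subsection B uses validation R² as the main metric and rescales RMSE, mean, maximum, and SD values to ECe in dS/m. A minimal sketch of both steps follows, assuming paired observed and predicted validation values and one multiplicative conversion factor per measurement method. The helper names and the factor values in `HYPOTHETICAL_TO_ECE` are placeholders for illustration; the texture-dependent factors actually applied are given in Table II of the Supplementary Material and are not reproduced here. Here R² is computed as 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation between observed and predicted values, which generally differs on validation data.

```python
import numpy as np

# Placeholder multiplicative factors mapping each reported measure to ECe (dS/m).
# Hypothetical values for illustration only; real factors depend on soil texture.
HYPOTHETICAL_TO_ECE = {"ECe": 1.0, "EC1:5": 6.0, "SSC": 1.5}


def validation_r2(observed, predicted):
    """Coefficient of determination on validation data: 1 - SS_res / SS_tot."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(observed, predicted):
    """Root-mean-square error in the units of the observations."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def to_ece(value, method, factors=HYPOTHETICAL_TO_ECE):
    """Rescale an RMSE, mean, maximum, or SD reported under `method` to ECe."""
    return value * factors[method]


observed = [2.0, 4.0, 6.0, 8.0]    # hypothetical EC1:5 values of validation samples
predicted = [2.5, 3.5, 6.5, 7.5]
print(validation_r2(observed, predicted))          # -> 0.95
print(to_ece(rmse(observed, predicted), "EC1:5"))  # -> 3.0 with the placeholder factor
```

A single multiplicative factor keeps the conversion consistent across statistics, because RMSE, mean, maximum, and SD all scale linearly when the data are multiplied by a constant; an additive offset, by contrast, would shift the mean and maximum but leave SD and RMSE unchanged.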

C. BN Used to Model the Joint Causal Effects of Multifeatures as a Tool Guiding Future Research
In addition to the above analysis of the impact of each individual feature, it is necessary to model the causal relationship between model performance and the joint effects of multiple features. We do this with a BN because of its potential to incorporate expert knowledge, which makes it usable as a tool for guiding future research. Concretely, we process the multidimensional data collected from the articles into a training dataset with R² and RMSE as the target variables and the other variables as independent features. Based on empirical knowledge, we determine the directed causal links between nodes and the discretization of the states of each variable node. The BN is then parameterized by the training data, and the joint probability is calculated.

Fig. 2. Features of the soil salinity prediction process. The central part shows the general process of soil salinity prediction modeling using remote sensing, soil sampling, and machine learning methods. The upper part shows the potential sources of errors when using different remote sensing data. The lower part shows the factors that may cause variation in prediction accuracy at the stage of ground in situ soil sampling.

The nodes of a BN represent random variables $X_1, \ldots, X_n$ with their joint probability distribution [48] calculated as

$$P(X) = P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid pa(X_i)\big) \tag{1}$$

where $pa(X_i)$ represents the values of the parent nodes of node $X_i$. With expert knowledge, the prior conditional probability tables of the BN can be specified. We then use the EM algorithm [49] to incorporate the observational data extracted from the included articles into the BN and update the conditional probability tables.

To evaluate the sensitivity of model performance nodes to various characteristics, we use a sensitivity analysis of the BN based on MI, which indicates the reduction in entropy of the distribution of the child node caused by fixing the state of the parent node. The greater the MI value, the higher the sensitivity of the child node to the parent node, indicating stronger causality. MI is computed as

$$MI = H(Q) - H(Q \mid F) = \sum_{q}\sum_{f} P(q, f)\,\log_2 \frac{P(q, f)}{P(q)\,P(f)} \tag{2}$$

where $H$ represents the entropy, $Q$ represents the target node, $F$ represents the set of other nodes, and $q$ and $f$ represent the states of $Q$ and $F$, respectively.

We also develop another BN that includes only the studies using Landsat data (the most commonly used), with a tradeoff analysis of cost and accuracy. The costs include transportation cost, soil analysis cost, and labor cost for sampling, and they are calculated as follows:

$$C_1 = aL + bn \tag{3}$$

$$C_2 = \frac{N d}{5 \times 30}\, n p \tag{4}$$

$$C_3 = C_1 + C_2 \tag{5}$$

where $C_1$ is the sum of the transportation cost and soil analysis cost, $C_2$ is the labor cost for sampling, $C_3$ is the total cost, $a$ is the fuel cost per kilometer, $b$ is the soil analysis cost per soil sample, $L$ is the estimated total distance traveled during the sampling campaign, $n$ is the number of samples, $N$ is the number of subsamples of each sample, $d$ is the width of the sampling plot, and $p$ is the labor cost per sample under the five-subsample and


30 × 30 m plot condition. In fact, sampling and labor costs differ among regions and research institutes worldwide, so this study uses these values only as an example of a definite cost.

TABLE II
TRADEOFF ANALYSIS BETWEEN ACCURACY AND COST UNDER DIFFERENT COMBINATIONS OF VARIABLE STATUS

III. RESULTS

A. Articles Included in the Meta-Analysis

A total of 57 articles were selected and included in the meta-analysis through the PRISMA selection flow (Fig. 1). The geographical coverage of these articles was mainly Asia (especially Iran and Xinjiang, China), Africa, and North America (Fig. 3). Most of them are located in arid and semiarid regions with high potential evaporation and low precipitation, where irrigated agriculture is particularly affected by soil salinization (especially in the Middle East, Central Asia, and North Africa). Studies in nonarid areas are mostly near the sea, where salinization is caused by seawater intrusion into coastal groundwater. The number of published articles within this domain has increased rapidly in recent years (Fig. 4), and the combination of satellite remote sensing and machine learning for regional soil salinity prediction has attracted growing interest from researchers.

Fig. 3. Location of studies included in the meta-analysis (n = 57 experiments).

Fig. 4. Number of publications per year.

B. Meta-Analysis: Evaluation of the Effects of Various Features

The distribution of R² is compared across the categories of each type of characteristic (see Fig. 5). These categories are divided into three groups focusing on algorithms, satellite and auxiliary data, and characteristics of the study area and soil sampling. The analysis results are as follows.
1) In Terms of Algorithms: The articles using Cubist showed the highest average R². Among frequently used algorithms, RF and SVM performed better than ANN and MLR, while PLSR performed the worst. This ranking also agrees with internal comparisons in studies that developed multiple models on the same datasets and model parameters (see Fig. 6), in which the potentially confounding effects of other features on model performance are strictly limited.
2) In Terms of Remote Sensing and Auxiliary Data: Model performance differs with the satellite data used. Among the commonly used multispectral satellites, Sentinel-2 performed significantly better than Landsat, which is the most frequently used. The performance of the relatively rarely used MODIS was similar to that of Landsat. Models using active microwave radar data such as Sentinel-1 also performed well, with performance between that of Sentinel-2 and Landsat. The performance of models using hyperspectral satellite data, such as HJ-1 and EO-1 Hyperion, is similar to that of Landsat. Unexpectedly, models combining data from multiple satellites did not show strong performance. Using a combination of SWIR and NIR is more effective than using NIR alone. In addition, using hydrometeorological data and terrain data derived from DEM slightly improved the performance.
3) In Terms of the Characteristics of the Study Area and Soil Sampling: A medium S/n value corresponds to higher performance. The interval between the soil sampling date and the remote sensing data acquisition


Fig. 6. Comparison of selected pairs of regression algorithms based on R². Regression algorithms: RF, MLR, ANN, SVM, and PLSR.

Fig. 5. Summary and test results of feature categories in the prediction process, displayed in groups: (a) algorithms, (b) remote sensing and auxiliary data, and (c) characteristics of the study area and sampling. L, M, S1, S2, DEM, and Hydro represent Landsat, MODIS, Sentinel-1, Sentinel-2, terrain data derived from DEM, and hydrometeorological data, respectively. Combinations are indicated by underscores (e.g., DEM_S_Hydro represents the combined use of terrain data, soil texture, and hydrometeorological data). S/n is the area of the study area divided by the number of soil samples. FVC is the fractional vegetation coverage of the study area. In the box plots, the center lines represent the means, and the box limits represent the interquartile ranges.

date significantly affected the model performance, and Fig. 7. Correlation matrix of evaluated features focusing on R 2 and RMSE.
the subgroup with an interval greater than 15 days RMSE_c, max_c, mean_c, and SD_c are, respectively, the converted RMSE of
performed worst. It illustrates that the matching of the models and the converted max value, mean value, and SD of salinity derived
from soil samples, respectively. Sand0_30 is the sand content of the 0–30-cm
sampling date and the satellite image acquisition date topsoil. S/n is the value of the study area divided by sample numbers.
is critical due to the correlation between the surface
salt and the remotely sensed spectral data may decrease
when the interval is more than 15 days. Therefore, when irrigation water volume/frequency and evaporation. The
using satellite data with a revisit period of more than average sand content fraction and FVC in the sampling
15 days (e.g., Landsat), incorporating the revisit period area are indirectly extracted from the literature with
and the projected quality of remote sensing images a certain uncertainty. Higher sand content corresponds
into the design of the sampling scheme may improve to better model performance, which may be related to
the model performance. When applied to collections lower spectral interference of the topsoil moisture due
of samples with higher mean salinity or its variation, to low soil water retention, and also, the salinity of
the average model performed worse. For the sample topsoil with high sand content is inherently relatively
collections with higher salinity, the correlation between low. Also, a higher FVC corresponds to a better model
salinity and spectrums may be lower. The relatively poor performance but not significantly. In addition, unexpect-
performance of models in arid and semiarid regions may edly, no significant differences in model performance
be related to both high salinity and its variation, as well were observed under low, medium, and high FVC
as stronger salt vertical transport due to the higher conditions.
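The 15-day finding in item 3) can be turned into a simple check before fieldwork. The Python sketch below pairs each soil sampling date with the nearest usable satellite acquisition and flags pairs whose gap exceeds 15 days. It is a minimal illustration under stated assumptions and not a procedure taken from the included studies. The helper names are hypothetical, and the acquisition list is generated from an assumed fixed 16-day, Landsat-like schedule with some scenes dropped as cloudy. Real schedules would have to come from the scene catalog and be screened for image quality.

```python
from datetime import date, timedelta

# Interval threshold taken from the subgroup analysis in item 3):
# records with more than 15 days between sampling and imaging performed worst.
MAX_INTERVAL_DAYS = 15


def revisit_schedule(first_pass: date, last_day: date, revisit_days: int = 16) -> list[date]:
    """Hypothetical regular acquisition dates (no clouds, no missing scenes)."""
    dates = []
    current = first_pass
    while current <= last_day:
        dates.append(current)
        current += timedelta(days=revisit_days)
    return dates


def nearest_acquisition(sample_day: date, acquisitions: list[date]) -> tuple[date, int]:
    """Return the acquisition closest to sample_day and the gap in days.

    Ties are resolved in favor of the first listed acquisition.
    """
    best = min(acquisitions, key=lambda d: abs((d - sample_day).days))
    return best, abs((best - sample_day).days)


def flag_samples(sample_days: list[date], acquisitions: list[date]) -> list[tuple[date, date, int, bool]]:
    """Pair each sampling date with its nearest scene; True means the gap is acceptable."""
    rows = []
    for day in sample_days:
        image_day, gap = nearest_acquisition(day, acquisitions)
        rows.append((day, image_day, gap, gap <= MAX_INTERVAL_DAYS))
    return rows


if __name__ == "__main__":
    # Hypothetical 16-day schedule with two scenes dropped as cloudy.
    cloudy = {date(2020, 7, 3), date(2020, 7, 19)}
    scenes = [d for d in revisit_schedule(date(2020, 6, 1), date(2020, 9, 30)) if d not in cloudy]
    for row in flag_samples([date(2020, 7, 10), date(2020, 8, 10)], scenes):
        print(row)
```

With these hypothetical dates, the July 10 sample is matched to the June 17 scene (23 days) and is flagged. The August 10 sample is matched to the August 4 scene (6 days). Matching on the absolute day difference is the simplest choice. A stricter variant might check cloud-free pixels at each sample location rather than whole scenes.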

4505815 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 60, 2022
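The subgroup contrasts reported above compare mean R² between two groups of study records. Examples are intervals over 15 days versus the rest, and sandy versus other soils. The sketch below shows one common way to quantify such a contrast: Welch's unequal-variance t statistic with Welch-Satterthwaite degrees of freedom, written in plain Python. It assumes each record contributes one R² value. The test behind Fig. 5 is not specified in this section, so the choice of Welch's test, the helper names, and the toy records are illustrative assumptions only.

```python
from collections.abc import Callable
from math import sqrt
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values; statistics.variance uses n - 1.
    """
    n_a, n_b = len(group_a), len(group_b)
    sq_se_a = variance(group_a) / n_a  # squared standard error of mean(group_a)
    sq_se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(sq_se_a + sq_se_b)
    df = (sq_se_a + sq_se_b) ** 2 / (sq_se_a**2 / (n_a - 1) + sq_se_b**2 / (n_b - 1))
    return t, df


def compare_subgroups(records: list[dict], feature: str,
                      in_group: Callable[[float], bool]) -> dict:
    """Split per-record R² values by a condition on one feature and compare the means."""
    inside = [r["r2"] for r in records if in_group(r[feature])]
    outside = [r["r2"] for r in records if not in_group(r[feature])]
    t, df = welch_t(inside, outside)
    return {"mean_in": mean(inside), "mean_out": mean(outside),
            "n_in": len(inside), "n_out": len(outside), "t": t, "df": df}


if __name__ == "__main__":
    # Toy records for illustration only; these are NOT values from the meta-analysis.
    toy = [
        {"interval_days": 20, "r2": 0.50},
        {"interval_days": 30, "r2": 0.55},
        {"interval_days": 18, "r2": 0.52},
        {"interval_days": 3, "r2": 0.70},
        {"interval_days": 7, "r2": 0.62},
        {"interval_days": 0, "r2": 0.66},
    ]
    print(compare_subgroups(toy, "interval_days", lambda d: d > 15))
```

A p-value would then follow from the t distribution. SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test in one call. Note that the sketch fails with a division by zero if both groups have zero variance.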

Fig. 8. Effect of the number of subsamples per soil sample and the size of the sampled plot (measured by its width) on the model performance using different satellite images. "non": the information was not reported in the corresponding article.

Fig. 9. Effect of the VI/salinity index calculated from the bands of different satellite images on the model performance when applied to study areas with different FVC. FVC is classified into levels one, two, and three, which represent low (mainly bare land), medium (a mixture of cropland and bare land), and high (mainly cropland) vegetation coverage, respectively. Note that records that did not use any of the five indices NDVI, EVI, SAVI, CRSI, and SI are not included. In the box plots, the center lines represent the means, and the box limits represent the interquartile ranges.

We also evaluated the studies that reported all of R², RMSE, and the max, mean, and SD of the salinity of the sample collection, with the RMSE and salinity statistics normalized to ECe (Fig. 7). Under a multiobjective evaluation criterion, features that are both negatively correlated with RMSE and positively correlated with R² are the most effective for improving model performance. S/n satisfies this condition, but not significantly (p-value > 0.1). In addition, the use of Landsat and FVC showed significant negative correlations with RMSE, suggesting that both are associated with improved model predictions.

C. Meta-Analysis: The Potentially Applicable Plot Sampling Design and Vegetation/SI When Using Different Satellite Images

The selection of vegetation and salinity indices (SI) for prediction model construction and the field-scale soil sampling design are often of more practical concern. On one hand, the number of subsamples per sample and the area covered by the sampling plot may be related to the pixel size of the satellite image. Therefore, we evaluated the effect of the sampling design in studies using different satellites (see Fig. 8). Larger sampling plots tended to have more subsamples. In the studies using only Landsat data, plots smaller than 20 × 20 m² correspond to fewer than four subsamples, and plots of 30 × 30 m² or larger correspond to more than five subsamples. In the studies using only Sentinel-2 data, plots smaller than 10 × 10 m² correspond to fewer than five subsamples, and plots 20 m wide or larger correspond to more than ten subsamples. This shows that when the sample plot is small (smaller than a satellite image pixel), researchers tended to use a small number of subsamples at lower cost. When the sampling plot reaches or exceeds one image pixel, researchers used more subsamples to try to reduce the error. When the sample plots were further enlarged (e.g., the 500 × 500 m² plots used with Landsat data), the number of subsamples did not increase accordingly. At the 30-m resolution of Landsat, the model performance of studies using five subsamples on 30 × 30 m² plots was only slightly better than that of studies using 3–4 subsamples on 20 × 20 m² plots. In summary, the improvement in model performance gained from using multiple subsamples may be insignificant.

On the other hand, because the wavelength ranges of the bands differ among satellite images, the evaluation of the various vegetation/salinity indices was subdivided by satellite (see Fig. 9). Among all VI_SIs, CRSI is the most useful for improving the model performance. Under medium FVC conditions, model performance is better than under low and high FVC conditions. Under medium and high FVC conditions, indices that simultaneously reflect vegetation conditions, such as NDVI, EVI, SAVI, and CRSI, performed better than SI, especially in studies using Landsat data. Under low FVC conditions, SI performed better than the vegetation indices. In addition, CRSI and SI computed from Sentinel-2 bands performed correspondingly better than CRSI and SI computed from Landsat bands under the various FVC conditions. This shows that appropriate vegetation and salinity indices need to be selected for different satellite data under different FVC conditions.

D. Joint Causal Effects of Multifeatures and the Tradeoff Between Accuracy and Cost Based on the BN

To build the BN with all records of the included studies (BNall), we constructed the causal structure corresponding to the prediction process and quantified its conditional probability tables using the data collected from the articles (see Fig. 10). R² and RMSE are treated as the main target variables, and a variety of features are treated as parent nodes that influence the target variables through the conditional probability formula. Based on experience and expert knowledge, we set the value range of each state of each feature node. The EM algorithm was then used to compile the network. The sensitivity analysis of BNall (Fig. 12) shows that R² and RMSE are more sensitive to S/n, auxiliary data, and FVC, which is consistent with the results of the single-feature analysis. When the joint influence of multiple features is considered, the model performance is less sensitive to the satellite remote
sensing data used than to other features. Among the different satellite data, the model performance is most sensitive to Sentinel-2 and least sensitive to MODIS.

We further developed a more practical BN for studies using Landsat data (BNLandsat) (Fig. 11). It includes more detailed variables (e.g., plot size, the number of subsamples, and the VI_SIs calculated from Landsat image bands) and also the tradeoff between accuracy and cost (transportation cost, soil sample salinity analysis cost, and field sampling labor cost). As with BNall, the R² and RMSE nodes of BNLandsat are more sensitive to FVC and S/n (see Fig. 12). They are also sensitive to the algorithm, the combination of vegetation/salinity indices, the SD of the sample salinity, and the number of days between soil sampling and satellite image acquisition. The R² node of BNLandsat is not significantly sensitive to the plot size or the number of subsamples.

In addition, the established BN can serve as a tool in future research to project the performance and cost of soil salinity prediction models. By interactively entering findings at some nodes (i.e., fixing their states), we can infer the posterior probability distributions of R², RMSE, and the cost-related nodes (see Fig. 13). Changes in the probability distributions of nodes can help us recognize the potential impacts of various variables. We also performed a scenario analysis with BNLandsat to demonstrate the effectiveness of the BN in analyzing the tradeoff between accuracy and cost under different combinations of variable states (Table II). The results show that the target variables (e.g., the accuracy and cost nodes) are not optimized simultaneously under the different combinations of node states, possibly because of the potential tradeoff between them.

Fig. 10. Joint probability effects of features on the model performance (R² and RMSE) of the BN based on all studies (BNall). The values before and after the "±" indicate the mean and SD of the distribution, respectively. S/n: study area divided by the number of samples. FVC: fractional vegetation coverage. L, M, S1, S2, DEM, and Hydro represent Landsat, MODIS, Sentinel-1, Sentinel-2, terrain data derived from DEM, and hydrometeorological data, respectively.

Fig. 11. Joint probability effects of features on the model performance (R² and RMSE) of the BN based on studies using Landsat data (BNLandsat), with the tradeoff between accuracy and cost included. The values before and after the "±" indicate the mean and SD of the distribution, respectively. S/n: study area divided by the number of samples. FVC: fractional vegetation coverage. VI_SI: vegetation and salinity indices.

Fig. 12. Sensitivity of R² and RMSE to the findings at various feature nodes of (a) BNall, based on all studies, and (b) BNLandsat, based on studies using Landsat data. MI: mutual information between nodes. MI-R² and MI-RMSE indicate the sensitivity of R² and RMSE to the various nodes, respectively.

IV. DISCUSSION

Many articles have evaluated various satellite remote sensing data and machine learning algorithms to improve the prediction accuracy of regional soil salinity. To date, the results of these studies have not been combined and cannot provide a final guideline for selecting the soil salinity prediction process. Our work filled this gap by performing a meta-analysis of peer-reviewed studies, by statistically quantifying
the improvement in model performance achieved with different features and regression algorithms. By determining the current state of the art from several years of published research, we aimed to provide pragmatic guidance for improving the accuracy of soil salinity prediction. We performed a meta-analysis of published regional satellite remote sensing predictions of soil salinity combined with in situ soil sampling and machine learning to better understand this approach and to guide future research. The effects of multiple features on model performance were analyzed separately and jointly. Our analysis has shown that, despite the growing interest among researchers, satellite-based predictions of soil salinity remain uncertain, with potential for measurement errors. The main practical application of our results is to help researchers decide which improvement methods are most promising, using the developed BN as an interactive and visual tool. With a clearer understanding of the improvements expected from including different features, we can set work priorities and avoid practices that may decrease model accuracy in the actual modeling and prediction process.

Fig. 13. Inference based on BNLandsat with the states of some nodes determined.

A. Opportunities and Challenges Exist for Improving Model Performance Based on the Main Findings of This Meta-Analysis

Compared with the differences in soil sampling schemes and study area characteristics, the choice of satellite imagery has limited effects on model performance. This indicates that the effects of the different temporal, spatial, and spectral resolutions of various satellite remote sensing data on accuracy may be overestimated. Sentinel-2 and Landsat are the two most commonly used multispectral satellite data for salinity prediction, and Sentinel-2 performed slightly better than Landsat, probably owing to its higher spatial and temporal resolution [50]–[52]. The satisfactory performance of MODIS illustrates the importance of temporal granularity, which is also consistent with the significant impact of the interval between the sampling date and the remote sensing data acquisition date. When satellite data are used for soil salinity prediction, a fine temporal granularity may compensate for the negative effects of low spatial resolution, which calls for further study of the spatial and temporal errors. The role of auxiliary data may also be limited, especially in cultivated areas where topographical, hydrological, and meteorological variables may not be the dominant factors of soil salinity variation.

The most significant differences in model performance were found among soil sampling schemes and study area characteristics. First, the higher the mean value and SD of the salinity of the sample collection, the poorer the model performance. This shows the difficulty of establishing a quantitative regression relationship between high salinity values and covariates. Sample collections with high salinity variability may be accompanied by strong spatial heterogeneity, which may lead to strong spectral confusion and worse model performance [2]. Second, the temporal error cannot be ignored when the sampling date and the remote sensing data acquisition date differ. In previous studies, such date intervals possibly arose from the researchers' assumption that surface salt is relatively stable within the interval. However, this study shows that the assumption calls for more caution, because of surface salinity dynamics caused by human activities, hydrology, climate, and vegetation conditions, especially in irrigated areas. Third, the applicability of some VI_SIs indeed differs under different vegetation coverage conditions. For example, under low FVC conditions, the average performance of SI, which focuses on differences in the spectral reflectance of salinized bare soil, is better than that of the vegetation indices, which focus on vegetation conditions. With the rapid development of machine learning methods, it may still be feasible to use some ineffective variables as input, but if the number of samples is limited, feature selection may still be necessary [53]. Fourth, designing subsamples to match satellite image pixels may also be meaningful, but too many subsamples may be unnecessary, and more attention should be paid to the subpixel-scale spatial heterogeneity of the subsamples. This is also related to cost. A large number of subsamples in each plot is significantly more expensive in labor and salt chemical analysis, especially in a large plot such as a 500-m-wide MODIS pixel.

In summary, most of the satellite data currently used for regional soil salinity prediction are of practical use, and we may need to pay more attention to matching the soil sampling schemes and the characteristics of the study area
with these satellite data, based on an in-depth understanding of the specific response mechanism of remote sensing data to salinity variation, especially in terms of temporal and spatial granularity.

B. Choice of the Features Involves Tradeoffs Between Accuracy and Costs

In practical implementation and prediction, we need to consider the sampling scheme, the remote sensing data, and the tradeoff between the expected accuracy and the cost. The relevant costs include satellite data processing, plot subsample collection, transportation, and soil sample analysis. The BN built from previous studies can provide practical, quantitative guidance by modeling the joint effects of multiple features on model performance. When multiple variables change together, it can identify the main influencing factors and estimate the probability distribution of model performance with the states of some feature nodes determined. The BN can also guide soil sampling schemes through diagnostic analysis. For example, if only moderate accuracy is needed, the BN can recommend a soil sampling density for different satellite data. The tradeoff between accuracy and cost may also vary with the research aims. Most articles included in this meta-analysis focused on achieving high prediction accuracy with high-spatial-resolution datasets. However, high spatial resolution may not be essential in practical agricultural management, because coarse-scale salinity prediction is sufficient to support management decisions (e.g., determining the sowing area before the growing season). MODIS is at a disadvantage in spatial resolution, but its temporal resolution is significantly better than that of Sentinel-2 and Landsat. Its time-series data [33] may therefore be more effective for predicting salinity dynamics during the growing season.

C. Uncertainties and Limitations of This Meta-Analysis

The potential uncertainties and limitations of the results of this meta-analysis are as follows.

1) Publication Bias and Weight: This study did not pay much attention to publication bias because the number of articles that could be included in the meta-analysis is relatively limited. Previous meta-analyses often weighted studies by the quality of the publishing journal and the public availability of research data [54], [55]. Unfortunately, most of the articles included in this study did not publicly share their soil sample data and established models. Their preferences in soil sampling and their machine learning parameter settings are not transparent, which makes it difficult to evaluate the reliability of the results of a single study, although most of the remote sensing data used were obtained from open sources. In addition, previous meta-analyses often weighted the effects of included studies by sample size and the variance of experimental results [56]–[58]. However, in this study, the number of samples was treated as one of the evaluated features because of the lack of convincing methods to determine the weights among the models of the included articles.

2) Limitations of the Criteria for Including Articles: For the quantitative evaluation of model accuracy, we only selected studies that established multivariate regression models and reported R². Studies focusing on the classification of soil salinity levels were not included, nor were studies that established a univariate regression model. This may leave the evaluation of some features incomplete. The remote sensing datasets used in the included articles are still mainly multispectral; thus, the evaluation of studies using active/passive microwave remote sensing [59], [60] and hyperspectral data is insufficient. In addition, when relating the measured soil salinity to remote sensing data, this study did not single out the studies that used multiyear remote sensing data focusing on the root zone [29], [30]. For these studies, the time interval was still calculated from the remote sensing acquisition date in the same year as the soil sampling. However, because only a few studies used multiyear data, the impact on the evaluation results may be limited. Also, only articles published in English-language journals were included. However, many studies may have been published in other languages (e.g., Chinese, Spanish, and Arabic), because most areas affected by soil salinization are in non-English-speaking countries.

3) Uncertainty of the Extracted Feature Information: First, because most studies use far more soil samples than covariates in the regression model, we did not convert R² to adjusted R² in this study, although this may cause partial overestimation of the positive effects of several features. Second, the soil texture extracted from HydroSHEDS may introduce errors when converted from the watershed scale to the field scale at the in situ soil sample sites. Because the conversion of the mean, maximum, SD, and model RMSE of salinity measured with different methods relied on this extracted soil texture, it may also introduce errors. Thus, owing to the uncertainty in the normalization of salinity values, the RMSE-based evaluation in this article may be more biased than the R²-based evaluation. Third, the classification of FVC, which jointly considers the land use and land cover of the study area and the date of soil sampling, may introduce errors due to subjectivity. Fourth, the interference of soil moisture [61] with the spectra of salinized land was not specifically studied, because in situ measured soil moisture data were rarely used in the published articles selected for the meta-analysis. How much temporal variability in soil moisture affects the imagery may depend on the temporal resolution of the satellite. For example, the five-day revisit of Sentinel-2 is less likely to be affected by soil moisture fluctuations caused by precipitation
or irrigation than the 16-day revisit of Landsat. Part of the influence of different irrigation methods can also be attributed to temporal soil moisture variation. Compared with flood-irrigated areas, soil moisture fluctuation may be smaller in drip-irrigated areas because of the higher irrigation frequency and lower irrigation volume per event. Therefore, the possibility of model failure due to large soil moisture fluctuations may be lower there than in flood-irrigated areas. Fifth, a large number of studies are located in arid and extremely arid regions, where low precipitation has little effect on the leaching of surface salt. We therefore did not distinguish between wet and dry seasons in this study. Sixth, most of the soil salinity samples in the included articles are located in low-lying flat lands, which are often more salt-affected. We therefore did not consider slope differences among the study areas. However, a few included studies may still include some sampling sites in undulating terrain, which may lead to overestimation of the effectiveness of DEM-derived topographic variables in explaining soil salinity variation. Correspondingly, in a small area of cropland, the contribution of topographical variables to improving model accuracy may be limited. For small-scale undulating terrain, the commonly used 90-m DEM data may be too coarse to capture the subpixel spatial heterogeneity of salinity. Finally, the various parameter settings, model structures, and optimization forms of the models were not treated differently; instead, the models were grouped into algorithm families. This may also introduce uncertainty into the evaluation.

4) Independence Between Features: Although multiple features describing the characteristics of the study area were extracted, the independence between these features was not taken into consideration. For example, the mean salinity of the sample collection, FVC, soil texture, and whether the study area belongs to an arid/semiarid region may not be independent. This may interfere with the evaluation of the effect of a single feature.

5) Potential Expert Knowledge Integrated Into the BN: In this study, the BN was parameterized entirely from the data collected from published articles. In future applications, expert knowledge on the remote sensing estimation of soil salinity could first be incorporated by specifying the prior distribution (the relationship between multiple features and their potential effects on accuracy). The observation records of the included articles could then be used to update the BN, achieving the best integration and use of qualitative and quantitative information. As new records are included, the assessment objectives may change, and the structure of the BN may also need to be adapted.

V. CONCLUSION

We performed a meta-analysis of regional satellite-based soil salinity predictions combined with in situ soil sampling and machine learning. We evaluated the various features involved in the prediction process and established a BN based on multiple features. The BN evaluates their joint causal effect and guides future research on the tradeoff between accuracy and cost. The main conclusions of this study are as follows.

1) The most significant differences in model performance were found among soil sampling schemes and study area characteristics. These include the mean and variability of the salinity of the soil samples (averaged R² of 0.75 in sample sets with lower salinity variation and 0.62 in others) and the climate type (averaged R² of 0.64 in arid/semiarid areas and 0.74 in others). They also include the soil texture of the study area (averaged R² of 0.66 in sandy areas with sand content higher than 40% and 0.57 in others) and the interval between the sampling date and the satellite data acquisition date (averaged R² of 0.53 for intervals over 15 days and 0.65 in others).

2) Using different satellite data has limited effects on model performance; among them, Sentinel-2 (averaged R² = 0.72) performed better than Landsat (averaged R² = 0.66). The sampling of subsamples for each sample should focus on their subpixel-scale spatial heterogeneity across the various satellite data rather than on the number of subsamples. It is also necessary to select appropriate VI_SIs for different satellite data under different FVC conditions.

3) RF (averaged R² = 0.70) and SVM (averaged R² = 0.71) performed best among all evaluated algorithms.

4) The established BN is practical for guiding future research on the tradeoff between accuracy and cost.

ACKNOWLEDGMENT

The built BN is publicly available at https://doi.org/10.5281/zenodo.5146240.

REFERENCES

[1] F. Ghassemi, A. J. Jakeman, and H. A. Nix, Salinisation of Land and Water Resources: Human Causes, Extent, Management and Case Studies. Wallingford, U.K.: CAB International, 1995.
[2] G. I. Metternicht and J. A. Zinck, “Remote sensing of soil salinity: Potentials and constraints,” Remote Sens. Environ., vol. 85, no. 1, pp. 1–20, 2003, doi: 10.1016/S0034-4257(02)00188-8.
[3] J. Chen and V. Mueller, “Coastal climate change, soil salinity and human migration in Bangladesh,” Nature Climate Change, vol. 8, no. 11, pp. 981–985, Nov. 2018, doi: 10.1038/s41558-018-0313-8.
[4] R. M. Abou Samra and R. R. Ali, “The development of an overlay model to predict soil salinity risks by using remote sensing and GIS techniques: A case study in soils around Idku Lake, Egypt,” Environ. Monitor. Assessment, vol. 190, no. 12, p. 706, Nov. 2018, doi: 10.1007/s10661-018-7079-3.
[5] R. R. Ali and F. S. Moghanm, “Variation of soil properties over the landforms around Idku Lake, Egypt,” Egyptian J. Remote Sens. Space Sci., vol. 16, no. 1, pp. 91–101, Jun. 2013, doi: 10.1016/j.ejrs.2013.04.001.
[6] N. Bakr and R. R. Ali, “Statistical relationship between land surface altitude and soil salinity in the enclosed desert depressions of arid regions,” Arabian J. Geosci., vol. 12, no. 23, p. 715, Nov. 2019, doi: 10.1007/s12517-019-4969-9.
[7] M. El Bastawesy, R. R. Ali, K. Al Harbi, and A. Faid, “Impact of the geomorphology and soil management on the development of waterlogging in closed drainage basins of Egypt and Saudi Arabia,” Environ. Earth Sci., vol. 68, no. 5, pp. 1271–1283, Mar. 2013, doi: 10.1007/s12665-012-1826-5.
[8] P. S. Minhas, T. B. Ramos, A. Ben-Gal, and L. S. Pereira, “Coping with salinity in irrigated agriculture: Crop evapotranspiration and water management issues,” Agricult. Water Manage., vol. 227, Jan. 2020, Art. no. 105832, doi: 10.1016/j.agwat.2019.105832.
[9] F. Ghassemi, A. J. Jakeman, and H. A. Nix. (1995). Salinisation of Land and Water Resources: Human Causes, Extent, Management and Case Studies. Accessed: Jul. 22, 2021. [Online]. Available: https://www.cabdirect.org/cabdirect/abstract/19976767459
[10] A. Allbed and L. Kumar, “Soil salinity mapping and monitoring in arid and semi-arid regions using remote sensing technology: A review,” Adv. Remote Sens., vol. 2, no. 4, pp. 373–385, 2013, doi: 10.4236/ars.2013.24040.
[11] A. A. A. Aldabaa, D. C. Weindorf, S. Chakraborty, A. Sharma, and B. Li, “Combination of proximal and remote sensing methods for rapid soil salinity quantification,” Geoderma, vols. 239–240, pp. 34–46, Feb. 2015, doi: 10.1016/j.geoderma.2014.09.011.
[12] H. Fathizad, M. Ali Hakimzadeh Ardakani, H. Sodaiezadeh, R. Kerry, and R. Taghizadeh-Mehrjardi, “Investigation of the spatial and temporal variation of soil salinity using random forests in the central desert of Iran,” Geoderma, vol. 365, Apr. 2020, Art. no. 114233, doi: 10.1016/j.geoderma.2020.114233.
[13] H. Jiang, Y. Rusuli, T. Amuti, and Q. He, “Quantitative assessment of soil salinity using multi-source remote sensing data based on the support vector machine and artificial neural network,” Int. J. Remote Sens., vol. 40, no. 1, pp. 284–306, Jan. 2019, doi: 10.1080/01431161.2018.1513180.
[14] L. Ma et al., “Modeling variations in soil salinity in the oasis of Junggar Basin, China,” Land Degradation Develop., vol. 29, no. 3, pp. 551–562, Mar. 2018.
[15] R. Taghizadeh-Mehrjardi et al., “Improving the spatial prediction of soil salinity in arid regions using wavelet transformation and support vector regression models,” Geoderma, vol. 383, Feb. 2021, Art. no. 114793, doi: 10.1016/j.geoderma.2020.114793.
[16] X. Wang, F. Zhang, J. Ding, A. Latif, and V. C. Johnson, “Estimation of soil salt content (SSC) in the Ebinur Lake Wetland National Nature Reserve (ELWNNR), Northwest China, based on a Bootstrap-BP neural network model and optimal spectral indices,” Sci. Total Environ., vol. 615, pp. 918–930, Feb. 2018.
[17] M. M. Taghadosi and M. Hasanlou, “Trend analysis of soil salinity in different land cover types using Landsat time series data (case study Bakhtegan Salt Lake),” Int. Arch. Photogramm., Remote Sens. Spatial Inf. Sci., vol. 42, pp. 251–257, Sep. 2017.
[18] E. Scudiero, T. H. Skaggs, and D. L. Corwin, “Regional-scale soil salinity assessment using Landsat ETM+ canopy reflectance,” Remote Sens. Environ., vol. 169, pp. 335–343, Nov. 2015, doi: 10.1016/j.rse.2015.08.026.
[19] M. M. Taghadosi and M. Hasanlou, “Developing geographic weighted regression (GWR) technique for monitoring soil salinity using Sentinel-2 multispectral imagery,” Environ. Earth Sci., vol. 80, no. 3, p. 75, Jan. 2021, doi: 10.1007/s12665-020-09345-0.
[20] A. Zarei, M. Hasanlou, and M. Mahdianpari, “A comparison of machine learning models for soil salinity estimation using multi-spectral earth observation data,” ISPRS Ann. Photogramm., Remote Sens. Spatial Inf. Sci., vol. 3, pp. 257–263, Sep. 2021.
[21] M. M. Taghadosi, M. Hasanlou, and K. Eftekhari, “Retrieval of soil salinity from Sentinel-2 multispectral imagery,” Eur. J. Remote Sens., vol. 52, no. 1, pp. 138–154, Jan. 2019, doi: 10.1080/22797254.2019.1571870.
[22] L. Bai, C. Wang, S. Zang, C. Wu, J. Luo, and Y. Wu, “Mapping soil alkalinity and salinity in Northern Songnen Plain, China with the HJ-1 hyperspectral imager data and partial least squares regression,” Sensors, vol. 18, no. 11, p. 3855, Nov. 2018, doi: 10.3390/s18113855.
[23] W. Yong-Ling, G. Peng, and Z. Zhi-Liang, “A spectral index for estimating soil salinity in the Yellow River Delta Region of China using EO-1 Hyperion data,” Pedosphere, vol. 20, no. 3, pp. 378–388, 2010.
[24] M. M. Taghadosi and M. Hasanlou, “Soil salinity mapping using dual-polarized SAR Sentinel-1 imagery,” Int. J. Remote Sens., vol. 40, no. 1, pp. 237–252, 2018, doi: 10.1080/01431161.2018.1512767.
[25] P. Hoa et al., “Soil salinity mapping using SAR Sentinel-1 data and advanced machine learning algorithms: A case study at Ben Tre Province of the Mekong River Delta (Vietnam),” Remote Sens., vol. 11, no. 2, p. 128, Jan. 2019, doi: 10.3390/rs11020128.
[26] H. Lievens et al., “Joint Sentinel-1 and SMAP data assimilation to improve soil moisture estimates,” Geophys. Res. Lett., vol. 44, no. 12, pp. 6145–6153, Jun. 2017, doi: 10.1002/2017GL073904.
[27] B. Mougenot, M. Pouget, and G. F. Epema, “Remote sensing of salt affected soils,” Remote Sens. Rev., vol. 7, nos. 3–4, pp. 241–259, Nov. 1993.
[28] K. Ivushkin et al., “UAV based soil salinity assessment of cropland,” Geoderma, vol. 338, pp. 502–512, Mar. 2019, doi: 10.1016/j.geoderma.2018.09.046.
[29] W. Wu et al., “Mapping soil salinity changes using remote sensing in central Iraq,” Geoderma Regional, vols. 2–3, pp. 21–31, Nov. 2014, doi: 10.1016/j.geodrs.2014.09.002.
[30] E. Scudiero, T. H. Skaggs, and D. L. Corwin, “Regional scale soil salinity evaluation using Landsat 7, Western San Joaquin Valley, California, USA,” Geoderma Regional, vols. 2–3, pp. 82–90, Nov. 2014, doi: 10.1016/j.geodrs.2014.10.004.
[31] B. R. M. Rao et al., “Spectral behaviour of salt-affected soils,” Int. J. Remote Sens., vol. 16, no. 12, pp. 2125–2136, Aug. 1995.
[32] K. Ivushkin, H. Bartholomeus, A. K. Bregt, and A. Pulatov, “Satellite thermography for soil salinity assessment of cropped areas in Uzbekistan,” Land Degradation Develop., vol. 28, no. 3, pp. 870–877, Apr. 2017, doi: 10.1002/ldr.2670.
[33] T.-T. Zhang, J.-G. Qi, Y. Gao, Z.-T. Ouyang, S.-L. Zeng, and B. Zhao, “Detecting soil salinity with MODIS time series VI data,” Ecol. Indicators, vol. 52, pp. 480–489, May 2015, doi: 10.1016/j.ecolind.2015.01.004.
[34] S. Chen and S. A. Billings, “Neural networks for nonlinear dynamic system modelling and identification,” Int. J. Control, vol. 56, no. 2, pp. 319–346, 1992.
[35] A. Liaw and M. Wiener, “Classification and regression by randomForest,” R News, vol. 2, no. 3, pp. 18–22, 2002.
[36] J. B. Ullman and P. M. Bentler, “Structural equation modeling,” in Handbook of Psychology, 2nd ed. Hoboken, NJ, USA: Wiley, 2012, doi: 10.1002/9781118133880.hop202023.
[37] M. W.-L. Cheung, “MetaSEM: An R package for meta-analysis using structural equation modeling,” Frontiers Psychol., vol. 5, p. 1521, Jan. 2015.
[38] Z. Li et al., “Microbes drive global soil nitrogen mineralization and availability,” Global Change Biol., vol. 25, no. 3, pp. 1078–1088, Mar. 2019, doi: 10.1111/gcb.14557.
[39] T. Sun, Y. Wang, D. Hui, X. Jing, and W. Feng, “Soil properties rather than climate and ecosystem type control the vertical variations of soil organic carbon, microbial carbon, and microbial quotient,” Soil Biol. Biochem., vol. 148, Sep. 2020, Art. no. 107905, doi: 10.1016/j.soilbio.2020.107905.
[40] B. G. Marcot, “Metrics for evaluating performance and uncertainty of Bayesian network models,” Ecol. Model., vol. 230, pp. 50–62, Apr. 2012.
[41] B. G. Marcot and T. D. Penman, “Advances in Bayesian network modelling: Integration of modelling technologies,” Environ. Model. Softw., vol. 111, pp. 386–393, Jan. 2019.
[42] D. Moher, A. Liberati, J. Tetzlaff, D. G. Altman, and the PRISMA Group, “Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement,” PLoS Med., vol. 6, no. 7, 2009, Art. no. e1000097.
[43] A. Hassani, A. Azapagic, and N. Shokri, “Predicting long-term dynamics of soil salinity and sodicity on a global scale,” Proc. Nat. Acad. Sci. USA, vol. 117, no. 53, pp. 33017–33027, Dec. 2020, doi: 10.1073/pnas.2013771117.
[44] K. Ivushkin, H. Bartholomeus, A. K. Bregt, A. Pulatov, B. Kempen, and L. de Sousa, “Global mapping of soil salinity change,” Remote Sens. Environ., vol. 231, Sep. 2019, Art. no. 111260.
[45] T. Hengl et al., “SoilGrids1km—Global soil information based on automated mapping,” PLoS ONE, vol. 9, no. 8, Aug. 2014, Art. no. e105992, doi: 10.1371/journal.pone.0105992.
[46] S. A. Shahid, M. Zaman, and L. Heng, “Introduction to soil salinity, sodicity and diagnostics techniques,” in Guideline for Salinity Assessment, Mitigation and Adaptation Using Nuclear and Related Techniques, M. Zaman, S. A. Shahid, and L. Heng, Eds. Cham, Switzerland: Springer, 2018, pp. 1–42, doi: 10.1007/978-3-319-96190-3_1.
[47] P. Slavich and G. Petterson, “Estimating the electrical conductivity of saturated paste extracts from 1:5 soil:water suspensions and texture,” Soil Res., vol. 31, no. 1, pp. 73–81, 1993.


[48] J. Pearl, “Bayesian networks: A model of self-activated memory for evidential reasoning,” in Proc. 7th Conf. Cogn. Sci. Soc., Irvine, CA, USA: Univ. California, 1985, pp. 15–17.
[49] T. K. Moon, “The expectation-maximization algorithm,” IEEE Signal Process. Mag., vol. 13, no. 6, pp. 47–60, Nov. 1996.
[50] E. Davis, C. Wang, and K. Dow, “Comparing Sentinel-2 MSI and Landsat 8 OLI in soil salinity detection: A case study of agricultural lands in Coastal North Carolina,” Int. J. Remote Sens., vol. 40, no. 16, pp. 6134–6153, Aug. 2019, doi: 10.1080/01431161.2019.1587205.
[51] T. Gorji, A. Yildirim, N. Hamzehpour, A. Tanik, and E. Sertel, “Soil salinity analysis of Urmia Lake Basin using Landsat-8 OLI and Sentinel-2A based spectral indices and electrical conductivity measurements,” Ecol. Indicators, vol. 112, May 2020, Art. no. 106173, doi: 10.1016/j.ecolind.2020.106173.
[52] T. B. Ramos et al., “Soil salinity assessment using vegetation indices derived from Sentinel-2 multispectral data. Application to Lezíria Grande, Portugal,” Agricult. Water Manage., vol. 241, Nov. 2020, Art. no. 106387, doi: 10.1016/j.agwat.2020.106387.
[53] H. Xu et al., “AGA-SVR-based selection of feature subsets and optimization of parameter in regional soil salinization monitoring,” Int. J. Remote Sens., vol. 41, no. 12, pp. 4470–4495, Jun. 2020.
[54] M. Borenstein, L. V. Hedges, J. P. Higgins, and H. R. Rothstein, Introduction to Meta-Analysis. Hoboken, NJ, USA: Wiley, 2011.
[55] A. P. Field and R. Gillett, “How to do a meta-analysis,” Brit. J. Math. Stat. Psychol., vol. 63, no. 3, pp. 665–694, 2010.
[56] D. C. Adams, J. Gurevitch, and M. S. Rosenberg, “Resampling tests for meta-analysis of ecological data,” Ecology, vol. 78, no. 4, pp. 1277–1283, 1997.
[57] A. Don, J. Schumacher, and A. Freibauer, “Impact of tropical land-use change on soil organic carbon stocks–a meta-analysis,” Global Change Biol., vol. 17, no. 4, pp. 1658–1670, 2011, doi: 10.1111/j.1365-2486.2010.02336.x.
[58] Q. Liu et al., “How does biochar influence soil N cycle? A meta-analysis,” Plant Soil, vol. 426, nos. 1–2, pp. 211–225, May 2018.
[59] T. J. Jackson and P. E. O’Neill, “Salinity effects on the microwave emission of soils,” IEEE Trans. Geosci. Remote Sens., vol. GE-25, no. 2, pp. 214–220, Mar. 1987, doi: 10.1109/TGRS.1987.289820.
[60] Y. Wu, W. Wang, S. Zhao, and S. Liu, “Dielectric properties of saline soils and an improved dielectric model in C-band,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 1, pp. 440–452, Jan. 2015, doi: 10.1109/TGRS.2014.2323424.
[61] X. Yang and Y. Yu, “Estimating soil salinity under various moisture conditions: An experimental study,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2525–2533, May 2017, doi: 10.1109/TGRS.2016.2646420.

Haiyang Shi is currently pursuing the joint Ph.D. degree with Ghent University, Ghent, Belgium, and Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences (XIEG), Urumqi, China. His research interests include hydrology and machine learning applications in remote sensing and geography.

Olaf Hellwich (Senior Member, IEEE) was born in 1962. He received the B.S. degree in surveying engineering from the University of New Brunswick, Fredericton, NB, Canada, in 1986, and the Ph.D. degree from the Technische Universität München, Munich, Germany, in 1997, with a dissertation on line extraction from SAR data using a Markov random field model (Linienextraktion aus SAR-Daten mit einem Markoff-Zufallsfeld-Modell).
He headed the Remote Sensing Group, Department of Photogrammetry and Remote Sensing, Technische Universität München. Since 2001, he has been a Professor with the Technische Universität Berlin (TUB), Berlin, Germany, initially for photogrammetry and cartography and, since 2004, for computer vision and remote sensing. From 2006 to 2009, he was the Dean of the Faculty of Electrical Engineering and Computer Science, TUB. His research interests include 3-D object reconstruction, object recognition, and synthetic aperture radar remote sensing. Most recently, he has focused on the discovery and use of object shape priors in 3-D reconstruction.
Dr. Hellwich was a recipient of the Hansa Luftbild Prize of the German Society for Photogrammetry and Remote Sensing in 2000.

Geping Luo was born in 1968. He graduated from the State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Resources, Chinese Academy of Sciences, in June 2002, with the Ph.D. degree in cartography and geographic information system. Since 2005, he has been a Professor and a Doctoral Supervisor with Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, and a Professor with the University of Chinese Academy of Sciences. He currently combines ecological models, regional climate/land surface process models, remote sensing, and empirical statistical models with big data analysis and machine learning methods to study the ecological and climate effects of land use and cover change, remote sensing, and GIS applications.

Chunbo Chen was born in Neijiang, Sichuan, China, in 1985. He received the Ph.D. degree in cartography and geographic information system from the Chinese Academy of Sciences, Urumqi, China, in 2018.
In 2018, he joined the Team of Land Change and Ecological Modeling, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences. His research interests include big data mining, ecological modeling, and remote sensing observation.

Huili He was born in Hami, Xinjiang, China, in March 1993. She is currently pursuing the Ph.D. degree with Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China. Her doctoral research has mainly concentrated on the climatic effects of land use/land cover change in the Aral Sea region, based on a regional climate model.

Friday Uchenna Ochege received the B.Sc. and M.Sc. degrees in geography and cartography from the University of Nigeria, Nsukka, Nigeria, in 2010 and 2014, respectively, and the Ph.D. degree in cartography and geographic information system from the University of Chinese Academy of Sciences, Beijing, China, in 2021. He is currently a Post-Doctoral Research Fellow with the Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, China. His research interests include remote sensing of vegetation, evapotranspiration, land use/cover and climate changes, and the application of machine learning methods in remote sensing and geographical research.

Tim Van de Voorde received the M.Sc. degree in geography/geographic information systems, the M.A. degree in business economics, and the Ph.D. degree in geography from Vrije Universiteit Brussel (VUB), Brussels, Belgium, in 1997, 2006, and 2011, respectively.
He has been working as a Research Associate at the Cartography and GIS Research Group, VUB, since 1998. In 2017, he became a part-time Professor at the Geography Department, Ghent University, Ghent, Belgium. His research interests include urban remote sensing, urban greenspaces, ecosystem services, GIS and remote sensing applications in archeology, and deep learning applications in remote sensing and geography.

Alishir Kurban is currently a Professor at Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China. He has tested new 3-D modeling technology for research on archeological site erosion and for 3-D change detection of arid land surfaces, including changes in aboveground vegetation biomass. He has produced over 30 thematic maps at various scales, including land cover/land use maps, vegetation maps, and natural resources maps for the Xinjiang Region. He has published four monographs and over 50 research articles in scientific journals in Uyghur, Chinese, and English. He has supervised and co-supervised over 30 M.Sc. and Ph.D. dissertations. He has organized or co-organized several international scientific conferences and summer schools in the field of cartography and GIS, including Intercarto-InterGIS 14 and the GISCA and SilkGIS conference series in Urumqi, Bishkek, Tashkent, and Isfahan. His research interests include the application of remote sensing and geographic information technology in arid environment and ecosystem research, with a special focus on natural resource mapping, land cover and land use mapping, vegetation mapping, and urban mapping using remote sensing and GIS.

Philippe De Maeyer is currently a Senior Full Professor in cartography and GIS and the Chair of the Department of Geography, Ghent University. He is also a Full Member of the Royal Academy of Overseas Science, Belgium. His research covers the history of cartography and map making and the use of GIS in historical (map) studies (especially from the 18th to the 20th century); the use of GIS and remote sensing for land cover/land use issues and changes (including climate change, especially in Central Asia); the use of GIS and remote sensing in archeology (e.g., the Silk Road in Xinjiang, China, and Campeche, Mexico); risk modeling of floods and other natural hazards (especially the direct economic impact and the social, ecological, and cultural impacts) in Belgium, Small Island Developing States (SIDS), and elsewhere; the use of GIS for accessibility studies, e.g., accessibility to health care; and indoor routing (algorithms and landmarks).
