
Geoderma 405 (2022) 115399

Contents lists available at ScienceDirect

Geoderma
journal homepage: www.elsevier.com/locate/geoderma

Assessing toxic metal chromium in the soil in coal mining areas via
proximal sensing: Prerequisites for land rehabilitation and
sustainable development
Jingzhe Wang a, Xianjun Hu d, Tiezhu Shi a, b, Li He c, *, Weifang Hu e, Guofeng Wu a, b
a MNR Key Laboratory for Geo-Environmental Monitoring of Great Bay Area & Guangdong Key Laboratory of Urban Informatics & Shenzhen Key Laboratory of Spatial Smart Sensing and Services, Shenzhen University, Shenzhen 518060, China
b School of Architecture & Urban Planning, Shenzhen University, 518060 Shenzhen, China
c College of Mechatronics and Control Engineering, Shenzhen University, 518060 Shenzhen, China
d School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China
e Institute of Agricultural Resources and Environment, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China

ARTICLE INFO

Handling Editor: Cristine Morgan

Keywords:
Soil pollution mapping
Soil chromium (Cr)
Proximal soil sensing
Auxiliary soil attributes
Three-band spectral indices (TBI)
Variogram model

ABSTRACT

The rapid and accurate determination of soil chromium (Cr) is crucial for preventing toxic element pollution in soils and ensuring ecological security. Proximal sensing technology uses visible and near-infrared (Vis-NIR) diffuse reflectance spectroscopy, which has been demonstrated to be a viable approach for monitoring soil Cr concentrations. However, at trace levels, soil Cr is not especially spectrally active, which limits the practical use of the corresponding spectral data for quantifying soil Cr concentrations. In this study, we hypothesized that fused proximal sensing and soil auxiliary attributes (including organic matter (OM) and pH) could improve estimation of Cr concentrations in the soil. Additionally, introducing best-fit variogram models could, in theory, improve spatial visualization. To address these hypotheses, we collected 168 soil samples from the open coal mine area in the Eastern Junggar Basin, China. Fractional-order derivative (FOD) pretreatment and optimal band combination methods were implemented for spectral data mining and the derivation of spectral parameters, respectively. Soil Cr estimation models were calibrated with a partial least squares (PLS) approach through four designed strategies with different predictors: (I) full Vis-NIR variables, (II) effective three-band spectral indices (TBIs), (III) the effective TBIs and OM, and (IV) the effective TBIs, OM, and pH. The results suggest that FOD could identify abundant spectral variability. Compared with full Vis-NIR variables, the effective TBIs can effectively magnify the subtle spectral signals concerning soil Cr. The optimal estimation model was determined as Strategy IV, indicating that the introduction of soil auxiliary attributes (pH and OM) can improve the estimation performance of the model; notably, the coefficient of determination (R²) and ratio of performance to interquartile distance (RPIQ) were 0.87 and 2.68, respectively. Based on the optimal semivariance model, we used kriging interpolation to map regional soil Cr. In the study area, the soil Cr distribution features strong spatial dependence and strong associations. Our study might inspire further research on soil contamination mapping based on proximal Vis-NIR sensors.

1. Introduction

1.1. Background

Toxic metal contamination in the soil is a global environmental issue, and a notable threat to ecological systems and human well-being (McBratney et al. 2014). Chromium (Cr) and its compounds are widely used in the coal, steel, electroplating, and tanning industries; therefore, Cr has become a typical pollutant of toxic metal contamination in China, especially in coal industrial areas and mining wastelands (Gao and Xia 2011; Wang et al. 2011). In addition, Cr is one of the most harmful toxic elements and causes severe human health implications (Ellis et al. 2002).
Considering its various sources and high toxicity, Cr has been defined

* Corresponding author at: College of Mechatronics and Control Engineering, Shenzhen University, 518060 Shenzhen, China.
E-mail address: heli@szu.edu.cn (L. He).

https://doi.org/10.1016/j.geoderma.2021.115399
Received 14 December 2020; Received in revised form 9 August 2021; Accepted 11 August 2021
Available online 25 August 2021
0016-7061/© 2021 Elsevier B.V. All rights reserved.

as a priority monitoring and control pollutant according to the Chinese government (Hu et al. 2020; Xu et al. 2019). Referring to the “Outline of the 13th Five-Year Plan for Land and Resources (2016–2020)”, over 5000 km² of the geological environment of historical mines needs to be restored, and the reclamation of mining wastelands (ecological rehabilitation) is an increasing need (Zhou and Cao 2020). Mining wastelands are preferred for agricultural production after ecological restoration in China (Gao and Xia 2011). However, land rehabilitation may cause secondary toxic metal contamination based on the filling materials and restricted technologies used (Cao 2007).

The two main valence states for Cr in soils are Cr(III) and Cr(VI) (Xu et al. 2019). The Cr(VI) state is toxic and can be taken up and accumulated by plants and absorbed by humans, in whom it has carcinogenic effects. Overlying plants, including grain crops and other cash crops, may be affected by Cr (Liu et al. 2017; Shi et al. 2018a). Consequently, it is foreseeable that this specific toxic metal will be absorbed and then accumulate in the human body and/or wildlife via the food chain (Hong et al. 2019c). Therefore, there is a tremendous need to identify Cr-contaminated soil and further assess Cr concentrations. As a first step to obtaining a sustainable solution to toxic metal contamination in the soil and controlling the corresponding eco-environmental impacts, quantitative assessments are needed for land rehabilitation and sustainable development (Minasny and McBratney 2010). Therefore, it is imperative to be able to rapidly, accurately, and reliably characterize the soil Cr concentrations for possible remediation.

Compared with regular field investigations, proximal sensing technology uses visible and near-infrared (Vis-NIR) diffuse reflectance spectroscopy methods, which have been demonstrated to be eco-friendly and accurately relate spectral reflectance information to specific soil properties (Jia et al. 2021; Wan et al. 2020). The development of proximal sensing and remote sensing technologies has provided reliable approaches for monitoring soil Cr concentrations in time- and cost-saving modes (Shi et al. 2018a). Related studies in the fields of pedometrics, spectral pedology, and environmental science have been conducted (Hu et al. 2020; Viscarra Rossel et al. 2006b).

1.2. Related works

Since 1997, many studies have assessed and/or retrieved concentrations of toxic metals in the soil via proximal sensing (Malley and Williams 1997; Shi et al. 2014a; Shi et al. 2018a). Based on the test objects and media, current main approaches can be summarized as direct approaches, indirect approaches and integrated approaches.

1.2.1. Direct approaches

Some toxic metal elements in the soil can be adsorbed by soil organic matter, clay minerals, iron manganese oxides, and carbonate minerals. These soil components can lead to changes in the soil spectral shape and reflectance and display specific spectral absorption features (Liu et al. 2018; Mustafa et al. 2020). The correlation between copper (Cu) and the reflectance spectrum is mainly affected by organic matter (OM); those of lead (Pb), zinc (Zn), cobalt (Co), and nickel (Ni) are mainly affected by clay minerals and iron manganese oxides. Wu et al. (2007) investigated the spectral response by adding known amounts of heavy metals into soil. The results showed that the spectral features for heavy metals only appear in the Vis-NIR region for transitional elements at very high concentrations, which are rarely found in common natural soils. These mechanisms lay the foundation for monitoring toxic metal contamination in the soil based on diffuse spectral reflectance. A direct approach, as the term suggests, is to assess the concentrations of soil toxic metals based on in situ and laboratory spectral measurements at high spectral resolutions in the visible, near-infrared, mid-infrared, and thermal infrared regions (Cheng et al. 2019). Notably, spectral reflectance can be directly obtained from a spectroradiometer. Subsequently, a quantitative inversion model can be established by integrating laboratory-based toxic metal concentrations and the selected sensitive wavelengths (Horta et al. 2015; Shi et al. 2014a; Shi et al. 2018a).

In essence, the spectral estimation of soil Cr concentrations is a typical application of quantitative remote sensing. However, the number of environmental variables in a model calibration system is commonly much larger than that for remote sensing observations, and the inversion is by nature an ill-posed problem (Liang et al. 2012). In terms of soil, Cr is not a decisive element in forming the soil reflectance spectrum. Consequently, Baveye and Laba (2015) noted that a persistent Vis-NIR myth seems to exist in proximal soil sensing. To overcome this specific ill-posed inversion problem, some studies have focused on regularization methods, including the introduction of multisource data and prior knowledge and the integration of multiple data-mining algorithms (Wang et al. 2020; Xu et al. 2019).

1.2.2. Indirect approaches

In most toxic metal-polluted areas, the concentrations of metals in the soil are measured at low or trace levels; moreover, some metals are featureless in raw spectral reflectance (Wu et al. 2007; Xia et al. 2007). Considering these challenges, some scholars have developed new assessment methods. Specifically, researchers have developed indirect approaches to quantitatively or qualitatively assess toxic metal elements in the soil (Gholizadeh and Kopačková 2019; Zhang et al. 2020a). Such estimations are not directly obtained based on a target toxic metal. Currently, there are two main types of indirect methods: (1) those that consider the potential relationships between pollutant-related materials and target toxic metals and (2) those based on the reflectance spectra of vegetation.

The toxic metal elements in the soil are often associated with soil physiochemical properties (Hong et al. 2019c). For example, heavy metal ions in the soil can easily form stable compounds with OM through chelation or complexation. Exogenous heavy metals entering the soil can also be adsorbed by soil clay minerals, iron oxides, and OM (Chakraborty et al. 2017; Wang et al. 2014). These related components have significant spectral features; hence, indirect estimation via soil physiochemical properties (OM, pH, and texture) and Vis-NIR spectroscopy is feasible (Cheng et al. 2019; Wang et al. 2014).

Plants grown in polluted soil can absorb toxic metal elements via their root systems, and the elements then accumulate in their organs (stem, leaf, and spike) (Gholizadeh and Kopačková 2019). These toxic metals lead to the destruction of carrier proteins in leaves, and further inhibit plant growth (Shi et al. 2016). Accordingly, the changing levels of leaf nitrogen, chlorophyll, and protein result in changes in the spectral reflectance of the leaf and canopy (Zhang et al. 2020a). Some scholars have also systematically investigated the physiochemical responses of plants to toxic metal elements and established corresponding quantitative retrieval models (Li et al. 2015; Shi et al. 2018a). Published studies have shown that these indirect measurements of toxic metals provide reliable prediction accuracy. Nevertheless, it is noted that the performance of indirect approaches partly depends on the correlation between the toxic metal elements and auxiliary attributes as well as the stress severity to plants (Shi et al. 2018a; Zhang et al. 2020a). Related works should only be performed if the prerequisites are met; otherwise, the estimation accuracy may be unsatisfactory, further leading to error accumulation.

1.2.3. Integrated approaches

The soil system, a natural system formed by long-term evolution, has synergistic relationships between various physiochemical processes and environmental variables that affect its development (McBratney et al. 2014). This relationship is summarized as a soil-landscape model and such models can be used for the rapid characterization of toxic metals in the soil (Minasny and McBratney 2016). Existing studies have shown that the differences in landscape variables are closely related to regional soil toxic metal contamination and can be summarized into some soil–landscape correlation functions, such as Jenny’s model and the


SCORPAN framework (McBratney et al. 2003). In terms of toxic metals, some scholars have used the integrated approach to perform digital soil mapping (DSM) and conduct point estimation (Shi et al. 2018b; Wadoux et al. 2020). Compared with direct and indirect approaches, the integrated approach can not only estimate the attributes of the soil at a point but also visualize toxic metal-polluted areas. However, the quantitative characterization performance is usually unsatisfactory, restricted by soil auxiliary attributes with coarse spatiotemporal resolutions (Tóth et al. 2016).

Vis-NIR spectral reflectance data are typically stored in large and complex datasets characterized by ultrahigh spectral resolutions, many wavebands, and abundant information (Ben-Dor et al. 1997; Viscarra Rossel et al. 2006a). Therefore, effective spectral pretreatment steps are crucial for enhancing the internal differences in spectral variables and revealing key information (Liu et al. 2019; Nawar et al. 2016). Fractional-order derivative (FOD) methods can reduce such complex datasets and minimize the loss of spectral information (Abulaiti et al. 2020; Hong et al. 2019a; Wang et al. 2018a; Zhang et al., 2020b). However, the integrated application of FOD methods and soil auxiliary variables in the quantitative characterization of soil toxic metals based on soil proximal sensing has not been explored. Close relationships exist between Cr and different soil covariates (OM and pH) (Shi et al. 2014a). To improve predictive ability, the potential concatenation of FOD-pretreated Vis-NIR spectroscopy and auxiliary attributes is a highly anticipated task. Moreover, existing studies often focus on point estimation, but mapping the distribution of soil properties is more useful for better decision making about soil resources (Jia et al. 2021).

1.3. Contributions

Inspired by these approaches, we hypothesize that fused proximal sensing and soil auxiliary attributes can improve estimates of Cr concentrations in the soil. A precise variogram estimator is essential for realizing spatial distribution mapping of soil Cr. Therefore, the introduction of best-fit variogram models could improve the spatial visualization of results. Specifically, the main objectives of this study are (1) to explore the potential of considering soil auxiliary information (OM and pH) in soil Cr spectral estimation; (2) to compare the performance of different modeling modes (introducing auxiliary attributes or not); and (3) to produce a soil Cr map based on the integration of proximal sensing and the kriging interpolation method. The main contributions of this study include developing an integrated estimation methodology for toxic metals (Cr) and applying the method in an arid mining wasteland.

2. Methodology

2.1. Study area

The study area, the eastern Junggar coalfield (also known as the Zhundong coalfield), is well known for its massive coal reserves (over 3.9 × 10¹¹ tons) and is the largest integrated coalfield in China, accounting for approximately 7.2% of China’s total reserves (Imin et al. 2020). Previous geological surveys have revealed that local coal seams are strongly associated with Cr elements (Zhou et al. 2010). The eastern Junggar coalfield is located in the Gobi Desert region in the southwestern Karamaili Nature Reserve on the northern slope of the Tianshan Mountains and adjacent to the eastern Junggar Basin in China. This specific coalfield area is also named after its geographic location (Fig. 1a and b). With an extremely arid continental climate, the study area receives scarce precipitation (183.5 mm yr⁻¹) and experiences intense evaporation (2070.3 mm yr⁻¹). The average wind speed of the local prevailing northwestern winds is 2.0 m s⁻¹, reaching a maximum of 16 m s⁻¹. The total area of the eastern Junggar coalfield is approximately 1.3 × 10⁴ km². Referring to the World Reference Base (WRB) soil classification, the local soil types are dominated by Skeletic and Yermic (Imin et al. 2020; Nachtergaele et al. 2000). Local vegetation cover is

Fig. 1. Maps of the study area and the distribution of sampling sites (a: Xinjiang in China; b: Geographical location of the eastern Junggar coalfield; c: Soil sampling sites across the study area; d: Typical landscape photograph of sampling sites (Photograph credit: Jingzhe Wang); and e: Sampling schema (four points) within quadrat).


extremely low (<10%), with the main plant types being Anabasis salsa (IOD) to derivatives of any order and be used to obtain precise in­
(C. A. Mey.) Benth. ex Volkens, Ephedra przewalskii Stapf, Haloxylon terpolations between the 0-order derivative (original) and the IOD to
ammodendron (C. A. Mey.) Bunge, concomitant extreme xerophytes such identify changes in the details of the spectrum for different IODs; this
as Salsola collina Pall., and Ceratoides latens (J. F. Gmel.) Reveal et approach has high application potential in reducing spectral noise and
Holmgren, Iljinia regelii (Bunge) Korov., and Sarcozygium xanthoxylon increasing the resolution of spectral features (Wang et al. 2018a; Wang
Bunge (Zeng et al. 2020). In recent years, the habitation of the eastern et al. 2020). The Grünwald–Letnikov equation is a discrete definition
Junggar Basin has attracted enormous governmental concern. As a that is convenient for numerical calculations and highly computation­
result, assessments of soil Cr are needed for regional land rehabilitation ally efficient, so this specific formula is used in this study (Schmitt
and sustainable development. 1998). The related mathematical expressions of FOD are as follows:
$$d^{q}f(x)=\lim_{h\to 0}\frac{1}{h^{q}}\sum_{m=0}^{(t-a)/h}(-1)^{m}\frac{\Gamma(q+1)}{m!\,\Gamma(q-m+1)}f(x-mh) \tag{1}$$
The soil sampling investigation was conducted in June 2019. The
sampling strategy was designed based on the land use/land cover, soil where q represents the derivative order; h represents the step interval;
types, soil surface characteristics, previous field sampling experience, and t and a are the upper and lower limits of the fractional derivative
and accessibility of the study area to the designed investigation sites. order, respectively.
Each quadrat was set as 10 m × 10 m (Fig. 1e). At each sampling point, The formula contains the gamma function (Γ(β)):
four subsamples ranging from 0 to 20 cm were collected via a wooden
$$\Gamma(\beta)=\int_{0}^{\infty}e^{-t}\,t^{\beta-1}\,dt=(\beta-1)! \tag{2}$$
shovel. All subsamples were fully mixed to generate representative
composite samples, placed into sealed plastic bags and labelled. The
corresponding geographic coordinates were simultaneously recorded

with a portable high precision GPS (GARMIN Oregon 550, positioning When a one-dimensional spectrum in the data interval of [a, t] is set
accuracy ≤ 5 m). Notably, the sampling campaigns including collection as f(λ), the wavelength interval is portioned into n equal divisions based
and preservation, strictly complied with the Technical Specification for on the spectral step (h). Because the spectral data in this study had
Soil Environmental Monitoring (HJ/T 166–2004) released by the State already been resampled to a resolution of 10 nm, h was set to 1.
Environmental Protection Agency of China (2004) to avoid the potential Consequently, the FOD formula from Eq. (1) can be defined as:
cross-contamination of collected soil samples.
Under laboratory conditions, all samples were air dried and ground.
$$\frac{d^{q}f(\lambda)}{d\lambda^{q}}\approx f(\lambda)+(-q)f(\lambda-1)+\frac{(-q)(-q+1)}{2}f(\lambda-2)+\cdots+\frac{\Gamma(-q+n)}{n!\,\Gamma(-q)}f(\lambda-n) \tag{3}$$
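As an illustration of Eq. (3), the following Python sketch evaluates the Grünwald–Letnikov sum on a spectrum sampled with a unit step (h = 1). It is not the authors' code: the function names `gl_coefficients` and `fod`, the recursive form of the weights, and the truncation of the sum at the first band are assumptions made for this example.

```python
import numpy as np


def gl_coefficients(q, n_terms):
    """Weights of f(lambda - m) in the Grünwald–Letnikov sum: 1, -q, (-q)(-q+1)/2, ...

    Built with the recursion w_m = w_{m-1} * (m - 1 - q) / m (an assumed,
    numerically convenient form that avoids evaluating gamma functions).
    """
    w = np.empty(n_terms)
    w[0] = 1.0
    for m in range(1, n_terms):
        w[m] = w[m - 1] * (m - 1 - q) / m
    return w


def fod(spectrum, q):
    """Fractional-order derivative of order q for a 1-D spectrum with unit step."""
    f = np.asarray(spectrum, dtype=float)
    w = gl_coefficients(q, f.size)
    out = np.empty(f.size)
    for i in range(f.size):
        # Sum over the current band and all earlier bands:
        # w_0*f[i] + w_1*f[i-1] + ... + w_i*f[0].
        # Bands before the first one are simply left out (assumption).
        out[i] = np.dot(w[: i + 1], f[i::-1])
    return out
```

If the nine orders between 0.0 and 2.0 are assumed to be evenly spaced, they could be generated with `np.arange(0, 2.25, 0.25)` and passed one at a time to `fod`. With q = 1 or q = 2 the function reproduces the ordinary first and second differences, apart from the first one or two bands.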
Moreover, they were sieved to ≤0.15 mm and <2 mm for subsequent
Notably, if q is set to 0, 1, or 2, Eq. (3) represents the non-processed
chemical analysis and spectral measurements, respectively (Hong et al.
original spectrum, the classic first derivative or the classic second de­
2019a). After appropriate pretreatments, all chemical measurements
rivative, respectively. In this investigation, a total of nine FOD series
were conducted immediately to reduce the potential effects of exposed
(from order 0.0 to 2.0) of the soil spectrum were calculated using Eq. (3)
time/amounts/mixed time as much as possible. The OM content was determined
to mine potential ‘hidden’ information.
based on the potassium dichromate method. The soil Cr concentration
Previous studies have demonstrated that spectral indices can reduce
was determined using an atomic absorption spectrophotometer (Hitachi
the redundancy of the spectral matrix and be used to select sensitive
Z-2000, Japan). The soil pH was determined at a solid/water ratio of
spectral parameters related to soil attributes (Tian et al. 2013; Wang
1:2.5 with a precise pH meter (Shanghai Leici, PHBJ-260, China) ac­
et al. 2019). A spectral index composed of three spectral bands is a better
cording to the method of Bao (2005).
representation than an index based on a single band or two bands (Li
et al. 2014; Zhang et al., 2020b). In this study, five three-band spectral
indices (TBIs) are investigated at three wavelengths (R1, R2, and R3) over the range of 400–2400 nm based on Eqs. (4)–(8). Moreover, the correlation coefficients of the TBIs with the soil Cr are calculated. Five types of TBIs were used to explore their relationships with Cr concentrations and further extract the spectral parameters. It is noted that R1 ≠ R2 ≠ R3. For each potential combination, the TBI with the maximum absolute correlation coefficient (|r|) with the Cr concentration is used to determine the optimal band combination and spectral variables for subsequent estimation analysis. The optimal band combination algorithm was run by programming Eqs. (4)–(8) in MATLAB 2019b (MathWorks, Natick, MA, USA).

2.3. Spectral measurements and preprocessing

The spectral measurements of all pretreated soil samples were obtained with an ASD FieldSpec® 3 portable spectrometer (Malvern Panalytical, Malvern, UK) installed in a dark room with controlled lighting conditions. Detailed technical specifications of the spectroradiometer are available at https://www.malvernpanalytical.com/en/products/product-range/asd-range. The detailed steps of sample preparation and the geometric parameters of the spectrometer setup are as follows (Shi et al. 2014b). All soil samples were evenly placed in a container with a 12-cm diameter and a 1.8-cm depth. A standardized Spectralon white plate
with 99% reflectance was applied to calibrate the spectrometer before
each measurement. Measurements were made with a Hi-Brite Contact Probe using a halogen bulb (color temperature of 2901 ± 10 K) for illumination. The collected Vis-NIR reflectance spectra covered wavelengths from 350 nm to 2500 nm with a final spectral resolution of 1 nm. Each sample was repeatedly scanned 10 times and then averaged to create the representative spectrum. In this study, the spectral wavelengths were narrowed (400–2400 nm) to reduce the inherent high-frequency noise of the spectrometer. Subsequently, the original reflectance spectra were smoothed by the second-order Savitzky-Golay

$$\mathrm{TBI}_1(R_1,R_2,R_3)=\frac{R_1-R_2}{R_1+R_3} \tag{4}$$

$$\mathrm{TBI}_2(R_1,R_2,R_3)=\frac{R_1}{R_2\times R_3} \tag{5}$$

$$\mathrm{TBI}_3(R_1,R_2,R_3)=\frac{R_1-R_2}{R_1-2R_2+R_3} \tag{6}$$

$$\mathrm{TBI}_4(R_1,R_2,R_3)=\frac{R_1}{R_2+R_3} \tag{7}$$

$$\mathrm{TBI}_5(R_1,R_2,R_3)=\frac{R_1-R_2}{R_2-R_3} \tag{8}$$
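The band search described with Eqs. (4)–(8) can be sketched in Python as follows. This is an illustrative brute-force version, not the MATLAB program used in the study; the names `TBI_FUNCS` and `best_band_combination`, the skipping of non-finite or constant index values, and the reading of R1, R2, and R3 as any three distinct bands are assumptions.

```python
from itertools import permutations

import numpy as np

# The five three-band indices of Eqs. (4)-(8); each takes reflectance at three bands.
TBI_FUNCS = {
    "TBI1": lambda r1, r2, r3: (r1 - r2) / (r1 + r3),
    "TBI2": lambda r1, r2, r3: r1 / (r2 * r3),
    "TBI3": lambda r1, r2, r3: (r1 - r2) / (r1 - 2 * r2 + r3),
    "TBI4": lambda r1, r2, r3: r1 / (r2 + r3),
    "TBI5": lambda r1, r2, r3: (r1 - r2) / (r2 - r3),
}


def best_band_combination(spectra, target, index_fn):
    """Return (r, (i, j, k)) for the band triple whose index has the largest |r|.

    spectra: array of shape (n_samples, n_bands); target: measured Cr values.
    """
    spectra = np.asarray(spectra, dtype=float)
    target = np.asarray(target, dtype=float)
    best_r, best_bands = 0.0, None
    # Ordered triples of three distinct band positions.
    for i, j, k in permutations(range(spectra.shape[1]), 3):
        with np.errstate(divide="ignore", invalid="ignore"):
            values = index_fn(spectra[:, i], spectra[:, j], spectra[:, k])
        # Skip indices that blow up or do not vary (assumed handling).
        if not np.all(np.isfinite(values)) or values.max() == values.min():
            continue
        r = np.corrcoef(values, target)[0, 1]
        if abs(r) > abs(best_r):
            best_r, best_bands = r, (i, j, k)
    return best_r, best_bands
```

With the 201 bands that a 10 nm sampling of 400–2400 nm yields, the loop visits about 8 × 10⁶ ordered triples per index, so a vectorized implementation would be preferable in practice.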
method with a window size of 9 wavelengths. To further simplify the
spectral matrix and reduce redundant information, the spectral data for 2.4. Modeling strategies and statistical analysis
each soil sample were downsampled at an interval of 10 nm prior to
spectral preprocessing (Zhang et al. 2021). Then, spectral pretreatment In this study, the Kennard-Stone algorithm was employed to establish
with the FOD method was conducted. the calibration and validation datasets. Based on the concept of this
The FOD algorithm can extend the common integer-order derivative specific partitioning algorithm, two-thirds of the whole dataset was


selected as calibration dataset (n = 112); therefore, the remaining operations were performed via the packages of PLS (version 2.7–3),
samples (n = 56, accounting for one-third of the data) were regarded as geoR (version 1.8–1), gstat (version 2.0–7), and corrplot (version 0.84)
the validation dataset. Additionally, we conducted an analysis of vari­ in the R platform (version 3.4.0) (Pebesma and Heuvelink, 2016; R
ance (ANOVA), as proposed by Levene (1960) to examine the data dis­ Development Core Team 2019; Ribeiro et al., 2007; Wei et al. 2017).
tributions for both partitioned sub datasets and prove reasonable
statistical status between them. 3. Results
The partial least squares (PLS) method is one of the most widely
adopted regression strategies in spectral modeling (Viscarra Rossel et al. 3.1. Exploratory data analysis
2006a). The PLS approach was developed from principal component
regression and can simultaneously model predictive multiple dependent The statistical descriptions of the soil Cr concentrations (mg kg− 1) of
variables, reduce dimensionality and avoid multicollinearity (Nawar the 168 collected soil samples are summarized in Table 2. For the entire
et al. 2016; Wold et al. 1983). Compared with the conventional multi- dataset, the Cr concentration varied between 14.31 and 110.07, with an
linear regression (MLR), PLS is a more appropriate approach for calibrating average of 53.67 mg kg⁻¹. The mean Cr concentration was slightly higher
models with severe collinearity in the independent variables (about 1.09 times) than the Xinjiang soil background value (49.30 mg kg⁻¹). According to
(spectral data in this study), particularly in cases where the sample size the pollution level (78 mg kg− 1) in China’s Soil Environmental Quality
is small (Wang et al. 2019). In this study, PLS analysis is employed to Control Standards (GB36600-2018), 122 samples showed different de­
assess the potential relationships between dependent and independent grees of Cr pollution, with pollution rate of 72.62%. The data approxi­
variables. Leave-one-out cross-validation is used to prevent model mately obey a normal distribution with a relatively small coefficient of
overfitting and improve the performance of the model. In this study, we variation (CV) of 0.34. In terms of the dataset partitions, the results of
designed four modeling strategies for the quantitative assessment of soil Levene’s test indicated no significant difference in variance between the calibration
Cr concentrations (Table 1). To evaluate the performance of the calibrated and validation datasets at the 0.05 significance level (p
models, several statistical parameters were compared between = 0.97). The OM and pH values varied from 0.26 to 33.37 g kg⁻¹ and
laboratory measurements and model-estimated values based on the in­ 7.60 to 10.60, respectively. The mean pH value indicated that the
dependent validation set: the coefficient of determination (R2), root regional surface soil is strongly alkaline.
mean square error (RMSE), and the ratio of performance to the inter­ To further investigate the estimation performance based on Vis-NIR
quartile range (RPIQ). RPIQ is a highly effective measure, especially of data and auxiliary information, we explore the internal correlations
asymmetric (nonnormal) behavior (Bellon-Maurel et al. 2010). among different soil parameters (Fig. 2). It is obvious that Cr was
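The three validation statistics named above can be written out in a few lines of Python. This is a sketch under stated assumptions: R² is taken as 1 minus the ratio of residual to total sum of squares (the text does not say which definition was used), the quartiles follow NumPy's default linear interpolation, and the function names are hypothetical.

```python
import numpy as np


def rmse(y_obs, y_pred):
    """Root mean square error between measured and estimated values."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))


def r2(y_obs, y_pred):
    """Coefficient of determination, here defined as 1 - SS_res / SS_tot."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rpiq(y_obs, y_pred):
    """Ratio of performance to interquartile range: (Q3 - Q1) of the observations / RMSE."""
    q1, q3 = np.percentile(np.asarray(y_obs, float), [25, 75])
    return float((q3 - q1) / rmse(y_obs, y_pred))
```

Because RPIQ scales the error by the interquartile range rather than the standard deviation, it does not rely on the observations being normally distributed.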
In addition to the common local accuracy of the estimation models, significantly correlated with OM with a correlation coefficient (r-value)
we investigated variograms to support kriging interpolation. Variograms of 0.517 (p < 0.001). In addition, a significant negative correlation was
were introduced to examine the spatial structures of the lab- and model- observed between Cr and pH (r = − 0.254, p < 0.01), whereas no sig­
based soil Cr in the study area. The dependence of soil Cr in space was nificant correlation relationship between pH and OM was detected (r =
evaluated through a variogram analysis using semivariance values. The − 0.081). These results suggest that soil Cr estimation may benefit from
related formula is shown in Eq. (9): the selected soil auxiliary information (OM and pH).
$$\gamma(h)=\frac{1}{2N(h)}\sum_{i=1}^{N(h)}\left[E(x_i)-E(x_i+h)\right]^{2} \tag{9}$$
In terms of the spectral dimension, we identified and selected the
where γ(h) represents the
major wavelength based on Pearson correlation analysis. Considering
average semivariance of the soil Cr, h represents the lag (distance be­
the sample size (n = 168), the P-value thresholds for statistical signifi­
tween pairs), N(h) is the number of paired sampling points based the lag
cance of 0.01 and 0.001 were 0.201 and 0.255, respectively. Fig. 3 il­
h, and E(xi ) and E(xi +h) represent the observed values of soil Cr at the
lustrates the sensitive wavelengths of different soil parameters over the
given locations of xi and xi + h, respectively. Specifically, x is the co­
spectral range of 400–2400 nm. The correlations between Cr and the
ordinate location of the point. Variograms provide evidence of spatial
FOD processed spectra for all wavelengths were calculated, and the
autocorrelation if the semivariances are lower at small lags than at
maximum absolute correlation for the 0.75-order reflectance was 0.51.
larger lags, i.e., sampling locations located close to each other display
Moreover, the important spectral wavelengths of soil Cr (400–760 nm
similar values. Common variogram models, i.e., spherical, circular,
and 1800–2300 nm), OM (580–1000 nm and 2000–2400 nm), and the
exponential, and Gaussian models were tested in this study (Oliver and
pH (880–1860 nm and 2070–2400 nm) partly overlapped. In other
Webster 2014). In this step, we also conduct the independent validation
words, some similar spectral ranges were identified, suggesting that the
(n = 56) to fairly assess the performance of semivariogram function. The
arduous spectral estimation of spectrally featureless Cr can be improved
best fit model with lowest value of RMSE was selected for subsequent
by considering the effects of soil auxiliary parameters. Consequently, the
analysis. More detailed descriptions of variogram analysis and the
OM and pH were introduced during the modeling process in subsequent
kriging method are available in previous studies and will not be repeated
model calibrations.
here (Chakraborty et al. 2017; Oliver and Webster 2015).
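Eq. (9) can be illustrated with a short Python sketch that groups sample pairs by separation distance and averages their squared differences. The study itself used the R packages geoR and gstat; the function names, the lag-bin edges, and the spherical model parameterization (nugget, total sill, range) below are assumptions for illustration only.

```python
import numpy as np


def empirical_semivariogram(coords, values, lag_edges):
    """Empirical semivariance per lag bin: sum of squared differences / (2 * N(h)).

    coords: (n, 2) sample locations; values: (n,) soil Cr values;
    lag_edges: increasing distances that delimit the lag bins (assumed input).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    lag_edges = np.asarray(lag_edges, dtype=float)
    i, j = np.triu_indices(len(values), k=1)            # every pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    n_bins = len(lag_edges) - 1
    gamma = np.full(n_bins, np.nan)                     # NaN marks empty bins
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (dist > lag_edges[b]) & (dist <= lag_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = sqdiff[in_bin].sum() / (2.0 * counts[b])
    return gamma, counts


def spherical_model(h, nugget, sill, range_):
    """Spherical variogram model; `sill` is the total sill reached at h >= range_."""
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / range_ - 0.5 * (h / range_) ** 3)
    return np.where(h < range_, rising, sill)
```

A candidate model such as `spherical_model` would then be fitted to the `gamma` values, and the competing models compared by their validation RMSE, as described above.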
In this study, all mathematical calculations and geostatistical
Table 1
Detailed specifications of the applied modeling strategies and the corresponding input variables used in this study.

Strategy  Predictor variables      Detailed concept
I         Full Vis-NIR variables   400–2400 nm
II        Effective TBI            The selected optimal spectral band combination based on the maximum correlation coefficient (|r|) for soil Cr
III       Effective TBI + OM       Optimal spectral band combination and soil organic matter contents
IV        Effective TBI + OM + pH  Optimal spectral band combination, soil organic matter contents and soil pH

NOTE: Vis-NIR represents visible and near-infrared diffuse reflectance spectroscopy; TBI represents three-band spectral index.

3.2. Spectral feature detection

In this study, we performed FOD calculations for each smoothed continuous spectrum to detect spectral features (Fig. 4). The spectral features, including absorption and wave band shapes, are wide and considerably overlap; therefore, the corresponding explanations of the original reflectance spectra are difficult to establish. As the FOD order increased, the FOD reflectivity of the general spectral range narrowed and tended to 0, suggesting that spectral baseline drift and mixed overlapping peaks are gradually eliminated to some extent. The comparison of results makes it clear that the FOD-pretreated spectrum can express the variations in spectral details. That is, this specific spectral preprocessing approach improves the analytical resolution and further magnifies the spectral absorption valleys, especially at wavelengths of 1600–2100 nm. These characteristics indicate that the FOD processing of the segmented spectrum was effective, and it is possible to extract


Table 2
Descriptive statistics of soil Cr and the related auxiliary information.
Variable      Dataset      Count  Minimum  Mean   Median  Maximum  Standard deviation  Interquartile range  Coefficient of variation  Skewness
Cr (mg kg−1)  Entire       168    14.31    53.67  54.02   110.07   18.37               23.12                0.34                      0.11
              Calibration  112    14.31    53.34  56.11   110.07   18.46               23.24                0.35                      −0.04
              Validation   56     18.04    54.32  52.15   105.00   18.35               23.05                0.34                      0.45
OM (g kg−1)   Entire       168    0.26     6.39   3.31    33.37    6.63                6.87                 1.03                      1.93
pH            Entire       168    7.50     9.04   9.10    10.50    0.83                1.40                 0.09                      –

NOTE: OM represents organic matter. The coefficient of variation for the pH is not statistically meaningful because the pH is an interval variable.

Fig. 2. Scatter matrix based on the correlations among the soil Cr concentration, pH, and OM (n = 168). Solid red lines indicate fitting curves. The double asterisks
(**) indicate p < 0.01 and the triple asterisks (***) indicate p < 0.001.

Fig. 3. Distribution of sensitive wavelengths based on the correlation coefficients (r) between various soil parameters and raw wavelengths (400–2400 nm). The
color blue represents the corresponding selected wavelength and statistical significance (p < 0.01).

effective spectral information. The absorption peaks (500 nm, 1413 nm, and 2200 nm) affected by organic components and clay minerals, such as smectite, kaolinite, and illite, can provide crucial information on soil constituents. These abovementioned investigations also laid the foundation for subsequent spectral modeling.
To understand the spectral features and synergy among multiple bands in detail, we traversed all wavelengths within the Vis-NIR region, thus preserving all potential combinations of the three separate spectral


Fig. 4. Average FOD spectral curves of the collected soil samples from order 0.0 to 2.0 (increment of 0.25 per step). The black solid line and colored shaded region in
each subfigure represent the average reflectivity and spectral standard deviation, respectively.
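
As a rough illustration of how a fractional-order derivative of a spectrum can be computed, the sketch below assumes the Grünwald–Letnikov definition, in which each derivative value is a weighted sum of the reflectance at the current band and the preceding bands. The article's own Eqs. (1)–(3) are not reproduced in this section, so the function names (`gl_weights`, `fod`) and the `step` parameter are illustrative assumptions rather than the authors' implementation.

```python
def gl_weights(alpha, n):
    """Return the first n Grünwald–Letnikov weights for derivative order alpha.

    w_0 = 1 and w_k = w_{k-1} * (k - 1 - alpha) / k, which equals (-1)^k * C(alpha, k).
    """
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - alpha) / k)
    return weights


def fod(spectrum, alpha, step=1.0):
    """Approximate the alpha-order derivative of a sampled spectrum.

    spectrum: reflectance values at equally spaced wavelengths.
    step: wavelength spacing (assumed constant; 1.0 means "band index" units).
    The value at band i uses bands i, i-1, ..., 0, so early bands see a short history.
    """
    n = len(spectrum)
    weights = gl_weights(alpha, n)
    scale = step ** alpha
    return [
        sum(weights[k] * spectrum[i - k] for k in range(i + 1)) / scale
        for i in range(n)
    ]


if __name__ == "__main__":
    reflectance = [0.20, 0.22, 0.25, 0.29, 0.34]  # made-up values for illustration
    for order in (0.0, 0.75, 1.0, 2.0):
        print(order, [round(v, 4) for v in fod(reflectance, order)])
```

With `alpha = 0` the spectrum is returned unchanged, and with `alpha = 1` or `alpha = 2` the weights collapse to the ordinary first and second backward differences, which is why intermediate orders such as 0.75 can be read as a blend between the original spectrum and its slope.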

bands. Correspondingly, the contour map of the correlations (r) between the Cr concentration and six forms of the TBI for different FOD spectra were analyzed (Table 3 and Fig. 5). Similar to the distribution of sensitive wavelengths from Fig. 3, the obtained important bands selected based on the TBIs were mainly concentrated within the regions of 400–700 nm and 1700–2400 nm. The optimal result was generated by TBI1 based on 0.75-order reflectance spectra (max |r| = 0.68). The correlation coefficient was improved by 0.12 compared to that of the original reflectance (0-order). Additionally, TBI1 generated the most band combinations (n = 4). In terms of the results based on various FOD reflectance spectra, the maximum correlation coefficients (|r|) of the modes of 0-, 0.25-, 0.5-, 0.75-, 1-, 1.25-, 1.5-, 1.75-, and 2nd-order reflectance were 0.58, 0.54, 0.62, 0.68, 0.64, 0.63, 0.63, 0.62, and 0.63, respectively. These results suggest that FOD pretreatment can provide additional detailed spectral parameters associated with soil Cr rather than the original reflectance.

3.3. Model performance and comparison

In this study, the predictor variables of various modeling strategies are defined in Tables 1 and 3. To investigate whether the application of TBIs can improve the model, the PLS model was employed to calibrate the estimation models, and the results were compared for four different modeling strategies. Thus, a total of 36 PLS models (nine FOD orders and four strategies) were calibrated based on the designed variable predictors and soil Cr concentrations as the input and response variables, respectively. The corresponding descriptive evaluation statistics are summarized in Fig. 6.
The introduction of spectral transformations improved model performance. For example, in strategy I, the model based on the original

Table 3
Sensitive TBIs derived from the optimal band combination algorithm and FOD-pretreated spectrum (1D indicates the Pearson correlation between Cr and the
reflectance spectra, and the maximum absolute correlation coefficient (|r|) in bold indicates that the corresponding band combination is most effective in this study).
            Optimal band combination and corresponding correlation coefficient (|r|)
FOD   1D    TBI1                         |r|   TBI2                   |r|   TBI3                                    |r|   TBI4                   |r|   TBI5                         |r|
0     0.34  (R550-R470)/(R550-R620)      0.56  R620/(R2180 × R570)    0.54  (R550-R470)/(R550-2 × R1740 + R1670)    0.58  R1370/(R1710 + R1050)  0.53  (R650-R1920)/(R1920-R600)    0.55
0.25  0.37  (R1800-R1820)/(R1800-R1900)  0.54  R510/(R2630 × R1940)   0.46  (R950-R2280)/(R950-2 × R2280 + R1180)   0.49  R1280/(R1620 + R1070)  0.44  (R2160-R940)/(R940-R2310)    0.47
0.50  0.44  (R1960-R2210)/(R1960-R2300)  0.62  R2330/(R2320 × R1430)  0.61  (R1170-R1990)/(R1170-2 × R1990 + R950)  0.54  R2210/(R1900 + R580)   0.56  (R890-R2000)/(R2000-R1320)   0.57
0.75  0.51  (R1120-R850)/(R1120-R1950)   0.68  R1650/(R1310 × R1210)  0.49  (R1560-R1180)/(R1560-2 × R1180 + R740)  0.51  R1700/(R1390 + R500)   0.54  (R2290-R1850)/(R1850-R840)   0.57
1     0.50  (R750-R660)/(R750-R1840)     0.54  R1820/(R1230 × R810)   0.64  (R2140-R1130)/(R2140-2 × R1130 + R980)  0.58  R1150/(R1840 + R780)   0.56  (R640-R750)/(R750-R1300)     0.59
1.25  0.46  (R1970-R1410)/(R1970-R2290)  0.53  R1030/(R970 × R760)    0.63  (R1440-R720)/(R1440-2 × R720 + R1440)   0.54  R2200/(R2290 + R1970)  0.55  (R470-R2380)/(R2380-R800)    0.58
1.50  0.31  (R1000-R1200)/(R1000-R1690)  0.63  R1070/(R2180 × R1290)  0.54  (R2100-R830)/(R2100-2 × R830 + R760)    0.61  R2150/(R1480 + R1320)  0.53  (R2080-R2240)/(R2240-R1640)  0.58
1.75  0.38  (R780-R2000)/(R780-R1960)    0.57  R1420/(R1040 × R1000)  0.58  (R1260-R410)/(R1260-2 × R410 + R700)    0.62  R2000/(R1960 + R780)   0.61  (R2200-R2330)/(R2330-R1820)  0.57
2     0.40  (R1650-R1970)/(R1650-R2260)  0.60  R490/(R1840 × R1690)   0.63  (R1510-R810)/(R1510-2 × R810 + R1540)   0.62  R1970/(R2260 + R1650)  0.60  (R590-R450)/(R450-R700)      0.61
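
The band-combination search summarized in Table 3 can be sketched as an exhaustive loop over band triplets. The example below is a minimal illustration for the TBI1 form, (Ri − Rj)/(Ri − Rk), and assumes each sample is stored as a dictionary mapping wavelength to reflectance. The helper names (`pearson`, `tbi1`, `best_tbi1`) and the toy data are hypothetical and are not taken from the article.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def tbi1(r_i, r_j, r_k):
    """Three-band index of the TBI1 form: (Ri - Rj) / (Ri - Rk)."""
    return (r_i - r_j) / (r_i - r_k)


def best_tbi1(spectra, target):
    """Search all ordered band triplets and return (r, (i, j, k)) with the largest |r|.

    spectra: list of {wavelength: reflectance} dictionaries, one per sample.
    target: measured property (e.g. Cr concentration), one value per sample.
    """
    bands = sorted(spectra[0])
    best_r, best_bands = 0.0, None
    for i, j, k in permutations(bands, 3):
        try:
            index = [tbi1(s[i], s[j], s[k]) for s in spectra]
        except ZeroDivisionError:
            continue  # skip triplets where Ri equals Rk for some sample
        r = pearson(index, target)
        if abs(r) > abs(best_r):
            best_r, best_bands = r, (i, j, k)
    return best_r, best_bands


if __name__ == "__main__":
    # Toy data: five samples, four bands (values are invented for illustration).
    toy = [
        {400: 0.10, 500: 0.20, 600: 0.35, 700: 0.50},
        {400: 0.12, 500: 0.18, 600: 0.30, 700: 0.55},
        {400: 0.15, 500: 0.25, 600: 0.33, 700: 0.45},
        {400: 0.11, 500: 0.28, 600: 0.40, 700: 0.52},
        {400: 0.14, 500: 0.21, 600: 0.29, 700: 0.60},
    ]
    cr = [53.0, 48.0, 61.0, 70.0, 47.0]
    print(best_tbi1(toy, cr))
```

The number of ordered triplets grows with the cube of the band count, so a full 400–2400 nm search at fine spectral resolution is costly in pure Python; an array-based implementation would normally be preferred in practice.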


Fig. 5. Correlation between the Cr concentration and different TBIs for the entire dataset: (a) 0-order (raw data), (b) 0.25-order, (c) 0.5-order, (d) 0.75-order, (e) 1-
order (first-order), (f) 1.25-order, (g) 1.5-order, (h) 1.75-order, and (i) 2.0-order (second-order) derivatives. The right color bar in each subfigure is the range of
correlation (r) values, and the x-, y-, and z-axes are spectral wavelengths from 400 to 2400 nm.

spectrum (R2v = 0.67 and RMSEv = 10.47 mg kg−1) was inferior to that based on the 0.75-order spectrum (R2v = 0.79 and RMSEv = 8.98 mg kg−1). The results in Table 3 also confirm this finding. Generally, the estimation performance of the model with strategy II (with effective TBIs as predictor variables) was better than that for strategy I (with full Vis-NIR variables as predictors), as reflected by the higher R2 and lower RMSE values. This result suggested that the estimation models constructed considering TBIs were better than those based on the original spectrum, regardless of whether FOD processing was performed. When one soil auxiliary attribute (OM) was considered (strategy III), the quantitative performance of the models improved, with satisfactory R2cv (0.67–0.84) and R2v (0.71–0.84) values. When both soil auxiliary variables and effective TBIs were employed as inputs (strategy IV), the prediction accuracies of the model further increased. For each modeling strategy, we selected the optimal quantitative model to illustrate the relationships between the measured and estimated Cr concentrations (Fig. 7).
Compared with the original spectrum (order = 0) and IOD spectrum (i.e., the first-order and second-order derivatives), a specific FOD pretreatment could improve the estimation performance. Specifically, the model based on 0.75-order reflectance spectra was optimal, with the best prediction ability (R2cv = 0.84, RMSEcv = 6.33 mg kg−1, R2v = 0.87, RMSEv = 7.34 mg kg−1, and RPIQ = 2.68). Regarding the modeling strategy, the fitted lines and 95% confidence ellipses for the results of the PLS models based on the 0.75-order spectra were well grouped along the 1:1 line. These fitted lines suggest that a strong correlation exists between the laboratory-measured and estimated Cr concentrations. However, it is noted that none of the calibrated models reproduced high Cr values (≥80 mg kg−1) based on the selected predictors. Among all the modeling schemes, the PLS models based on strategy IV performed best. Moreover, the R2 and RMSE values, general statistical indicators, were similar between the calibration and validation datasets for all established models, highlighting the robust quantitative performance of the models (neither overfitting nor underfitting occurred). Consequently, these selected optimal models and outcomes were used for the subsequent spatial visualization of soil Cr levels.

3.4. Spatial visualization of soil Cr levels

The spatial heterogeneity statistics of soil Cr are indispensable for predicting its spatial distribution. The experimental semivariograms and related parameters are illustrated in Fig. 8 and Table 4. For both the lab- and model-based Cr concentrations, the corresponding fitted variograms all follow exponential models with good fitting performance. The parameters of the variogram estimated from strategy IV (Fig. 8e) are similar to those estimated for the real soil Cr concentration (laboratory-based measurements, Fig. 8a) at the same locations, and the ranges, nuggets, and structural variances are also similar. Moreover, the nugget/sill ratios are well below 25%, which reflects the strong spatial dependence of regional soil Cr concentrations.
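
The experimental semivariograms referred to in Section 3.4 can be illustrated with the classical (Matheron) estimator, in which half the mean squared difference between paired observations is computed per lag-distance bin. The sketch below is only a minimal version under stated assumptions: planar coordinates, isotropy, and equal-width lag bins. The function name and parameters are hypothetical, and the article does not state which software or binning was actually used.

```python
from math import hypot


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return a list of (lag_center, semivariance, pair_count) per lag bin.

    coords: list of (x, y) sampling locations in consistent units (e.g. km).
    values: measured or estimated concentrations at those locations.
    Bin b covers distances in [b * lag_width, (b + 1) * lag_width).
    Bins without any pair report a semivariance of None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            distance = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            b = int(distance // lag_width)
            if b < n_lags:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    result = []
    for b in range(n_lags):
        gamma = sums[b] / (2 * counts[b]) if counts[b] else None
        result.append(((b + 0.5) * lag_width, gamma, counts[b]))
    return result


if __name__ == "__main__":
    # Four collinear points with invented values, purely for illustration.
    pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
    z = [1.0, 2.0, 4.0, 7.0]
    for row in empirical_semivariogram(pts, z, lag_width=1.0, n_lags=3):
        print(row)
```

Semivariances that rise with lag distance and then level off are the pattern that the fitted models in Table 4 describe with a nugget, a sill, and a range.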


Fig. 6. Accuracy statistics for models based on four modeling strategies: (I) full Vis-NIR variables, (II) effective TBIs, (III) the combination of effective TBIs and OM,
and (IV) the combination of effective TBIs, OM and the pH. (a) Latent variables (LVs), (b) coefficient of determination for the calibration process (R2cv), (c) root mean
squared error (mg kg− 1) for the calibration process (RMSEcv), (d) coefficient of determination for the validation process (R2v), (e) root mean squared error (mg kg− 1)
for the validation process (RMSEv), and (f) the ratio of performance to interquartile distance (RPIQ).
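
The accuracy statistics named in the Fig. 6 caption can be computed as in the sketch below. It assumes the common definitions R2 = 1 − SSres/SStot and RPIQ = (Q3 − Q1)/RMSE, with quartiles taken from the observed values. The article does not spell out its exact formulas or quartile convention in this section, so recomputed values need not match the reported ones.

```python
from math import sqrt
from statistics import quantiles


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed, predicted):
    """Ratio of performance to interquartile distance: IQR of observations / RMSE."""
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]    # invented values for illustration
    pred = [1.5, 2.5, 3.5, 4.5, 5.5]
    print(r_squared(obs, pred), rmse(obs, pred), rpiq(obs, pred))
```

Because RPIQ scales the error by the interquartile range rather than the standard deviation, it is less sensitive to skewed distributions, which is one reason it is often reported alongside R2 and RMSE in soil spectroscopy.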

Considering the satisfactory performance of the semivariogram models, we produced soil Cr concentration maps (laboratory-based and strategy IV model-based maps) by integrating the optimal semivariogram function and kriging method (Fig. 9). Both spatial distributions exhibited similar distribution trends: lower values were concentrated in the middle part of the study area, and prominently higher values were mainly observed in the western part of the region, thus validating the accuracy of the prediction model. By comparison, the western part of the investigation area was not obviously contaminated, and low Cr concentrations were observed.

4. Discussion and perspectives

Over the past several decades, many scholars have suggested that Vis-NIR reflectance spectroscopy is an effective approach for the measurement of soil toxic metal concentrations (Gholizadeh and Kopačková 2019; Jia et al. 2021; Shi et al. 2014a). Considering the limited performance of conventional linear models, an increasing number of studies have suggested that advanced machine learning (ML) algorithms can effectively capture both linear and nonlinear relationships between specific soil properties and Vis-NIR datasets (Jia et al. 2021). Despite the broad application of ML algorithms in proximal soil sensing, there have long been debates regarding their use and model interpretability (Padarian et al. 2020; Wang et al. 2018a). The corresponding uncertainties, including strong randomness, overfitting and/or underfitting, are often overlooked, which may significantly threaten the modeling accuracy and robustness. Although our main interest is in the predictive performance of the models, it is informative to interpret the structure of the models. Machine learning algorithms are often considered “black boxes” since the models derived from these algorithms are difficult to interpret (Zhu et al. 2017). Even though some algorithms, such as the RF and the Cubist, provide variable importance measures, the magnitude and direction of the predictor effects are unknown. Models built with PLS are easier to interpret. Under these conditions, we aimed to mitigate some of the criticisms and explore the possibility of perpetuating the Vis-NIR myth based on PLS (Baveye and Laba 2015).
Among the available spectral analysis methods, the FOD approach is a powerful mathematical method that can highlight the differences in spectral information (Wang et al. 2020). The development of the FOD algorithm was based on the extension of the IOD method, and its rationality and excellent performance have been recognized (Abulaiti et al. 2020). Many existing studies have shown that the FOD algorithm can remove baseline drift and overlapping peaks from spectral data (Hong et al. 2019a; Wang et al. 2018b; Zhang et al. 2020b). Moreover, this algorithm can capture the details of spectral curves and improve the estimation accuracy of soil characteristic values. Wang et al. (2018b) and Hong et al. (2020) investigated the performances of the FOD approach based on different parameters, such as the Pearson correlation, peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). These studies indicated that the FOD algorithm possesses the


Fig. 7. Scatterplots of the lab-measured versus estimated Cr concentration for PLS models with the optimal predictor variables based on four modeling strategies. The
blue, red, and black lines in each sub-figure represent the 1:1 line, fitted line and confidence ellipse with 95% probability, respectively.

Fig. 8. Empirical variograms of the soil Cr concentrations at 168 sites obtained from (a) laboratory-measured values and the corresponding estimated values with the
PLS model based on (b) strategy I, (c) strategy II, (d) strategy III, and (e) strategy IV, respectively.

capacity to gather information because the reflectance of the x-band of the FOD spectrum is obtained by summing the reflectance of the other bands with certain weights (Eqs. (1)–(3)). In this study, the application of the FOD method was feasible and effective for extracting spectral features, and the obtained correlation coefficient values at various FODs were generally better than those of raw data (Table 3). Commonly, the


Table 4
Parameters of the semivariogram models.
                              Model        Range (km)  Structural variance (Sill)  Nugget  Nugget/Sill ratio  R2
Laboratory-measured Cr        Exponential  14.53       1.01                        0.02    2.24%              0.82
Estimated Cr for strategy I   Exponential  20.39       0.88                        0.15    17.43%             0.71
Estimated Cr for strategy II  Exponential  25.59       0.88                        0.19    21.20%             0.78
Estimated Cr for strategy III Exponential  11.85       0.97                        0.00    0.28%              0.77
Estimated Cr for strategy IV  Exponential  13.82       1.02                        0.01    1.24%              0.80
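
To show how parameters such as those in Table 4 are used, the sketch below evaluates an exponential semivariogram model and classifies spatial dependence from the nugget/sill ratio. Two assumptions are made here and are not confirmed by the article: the range is treated as a practical range, so that γ(h) = c0 + c(1 − exp(−3h/a)), and ratios of 25–75% and above 75% are labeled "moderate" and "weak" following a widely used convention (the article itself only states that ratios below 25% indicate strong dependence). Because Table 4 reports rounded values, ratios recomputed from it will differ slightly from the printed percentages.

```python
from math import exp


def exponential_variogram(h, nugget, partial_sill, practical_range):
    """Exponential model: gamma(h) = nugget + partial_sill * (1 - exp(-3h / range)).

    By convention gamma(0) = 0; the nugget appears as a jump just above h = 0.
    The factor 3 assumes 'range' means the practical range (about 95% of the sill).
    """
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - exp(-3.0 * h / practical_range))


def spatial_dependence(nugget, sill):
    """Classify spatial dependence from the nugget/sill ratio (in percent)."""
    ratio = 100.0 * nugget / sill
    if ratio < 25.0:
        return "strong"
    if ratio <= 75.0:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    # Rounded strategy IV parameters from Table 4: range 13.82 km, sill 1.02, nugget 0.01.
    for h in (0.0, 5.0, 13.82, 30.0):
        print(h, round(exponential_variogram(h, 0.01, 1.02, 13.82), 3))
    print(spatial_dependence(0.01, 1.02))
```

A fitted model of this kind supplies the semivariances that a kriging system needs at arbitrary separation distances, which is how the maps in Fig. 9 depend on the parameters listed in the table.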

Fig. 9. Distribution maps of soil Cr levels predicted with the kriging method based on (a) field measured values and (b) optimal PLS-estimated values for strategy IV.

corresponding weights are related to the spectral distance values (Schmitt 1998). The FOD method uses the same metric. Based on the mathematical definitions of the first derivative (slope of the spectrum) and second derivative (curvature of the spectrum), we can consider the FOD an effective indicator for quantifying the sensitivity to the slope and curvature of a spectrum. The spectral shape of the collected soil samples was relatively flat. In other words, the slope effect was more obvious than that of the curvature. Consequently, the optimal FOD (0.75-order derivative) was located in the range of low-frequency FODs (0–1). The optimal band combination algorithm (based on TBIs in this study) traverses all wavelength spectral signals, and through the combination of two or more wavelengths, the spectral variation characteristics can be strengthened and noise can be weakened. Many studies have adopted this method but have mainly focused on conventional two-dimensional spectral indices (difference index, normalized difference index, or ratio index) rather than TBIs (Hong et al. 2019b; Wang et al. 2018b). Based on the absolute value of the correlation coefficient (|r|), we explored and selected the potential optimal band combination (Table 3 and Fig. 5). Based on 0.75-order reflectance spectra, TBI1 ((R1120-R850)/(R1120-R1950)) generated the optimal result (max |r| = 0.68). The results showed that the FOD algorithm performed better than the IOD approach.
In existing studies of soil toxic metal estimation, scholars often paid limited attention to soil auxiliary information. With reference to field investigations and the core concept of DSM, we designed four different


modeling strategies and corresponding predictor variables in this study (Table 1). Subsequently, we compared the performance of different modeling modes (introducing auxiliary attributes or not). Among them, we found that Strategy IV (Effective TBI + OM + pH) was superior to the others. The improved performance was mainly attributed to the introduction of auxiliary covariates. In fact, we also attempted to estimate soil Cr based on auxiliary attributes independently without spectral information. Not unexpectedly, the accuracy was poor (R2v = 0.542, RMSEv = 12.25 mg kg−1, and RPIQ = 1.48), indicating that nearly half of the Cr information was still not captured. Correspondingly, this specific model was excluded from this study. According to Hong et al. (2019c), using auxiliary attributes to enhance the predictive performance may be promising, but Vis-NIR spectroscopy is indispensable. Soil study is usually considered to be a multi-factor problem (Viscarra Rossel et al. 2006b). Therefore, estimation merely using auxiliary attributes is undesirable, and proximal soil sensing (especially Vis-NIR spectroscopy) is still crucial and attractive in Cr estimation. The full Vis-NIR variables and effective TBI reflect the soil properties more comprehensively, including not only OM and pH but also texture and clay minerals (Minasny and McBratney 2016; Rossel and Behrens 2010). Therefore, we believe using Vis-NIR variables and effective TBI can better estimate the soil Cr than merely using auxiliary attributes.
In soil systems, Cr is strongly associated with OM and the pH via the metal complexation effect (Shi et al. 2018a). The first reaction process when Cr enters the soil is adsorption. Many external factors affect the adsorption behavior of Cr, such as the soil type, OM level, pH, clay mineral content and amounts of active iron and aluminum oxides (Wu et al. 2007). Specifically, variations in the pH may affect the form of and binding reactions involving soil Cr. As the major provider of variable charges, OM has a direct influence on the adsorption of chromium ions in the soil. Humic acid is an essential component of OM. The contained carbonyl, carboxyl, and hydroxyl groups mainly contribute to the negative surface charge of the soil (Jia et al. 2021; Rossel and Behrens 2010). In fact, chromium is a low-mobility element, especially under moderately oxidizing and reducing conditions and near-neutral pH values. In soil, Cr behavior is governed by pH, redox potential (Eh), and soil organic matter. Its adsorption by clay is also highly dependent on pH. Cr(VI) adsorption decreases with increasing pH, and Cr(III) adsorption increases with increasing pH. The dominant effect of organic matter is the stimulation of the reduction of Cr(VI) to Cr(III), the rate of which increases with soil acidity (Kabata-Pendias 2004; Salminen et al. 2005). In general, whether it is OM, pH, or soil texture that connects potentially toxic elements to spectral data, the spectral estimation of soil Cr is mainly dependent on its correlation with the spectrally active components, unless the Cr is present at very high concentrations (>4000 mg kg−1).
In this study, FOD pretreatment and optimal band combination methods were implemented for spectral data mining and the derivation of spectral parameters, respectively. These operations avoided introducing too many irrelevant variables, which would otherwise decrease model accuracy. PLS, a traditional but effective regression approach, was utilized to establish soil Cr estimation models for the four designed modeling schemes on the above basis. The results showed that the model based on the combination of TBIs and soil auxiliary information is promising (R2v = 0.87 and RPIQ = 2.68). The differences between max |r| and R2 values may be attributed to potential outliers in the different datasets. Moreover, many previous studies have investigated the spectral behavior and/or responses with auxiliary components in tailings, stream sediment, and soil before constructing the models (Jeong et al. 2018; Lim et al. 2019; Shin et al. 2019). Although these models were all site specific and the equations established in one study cannot be extended to other sites with different metal types and soils, these results indicate that geological processes associated with the formation of soils and tailings are the major controlling factors of spectral responses to heavy metal contamination. Similar to the aforementioned published results, the Cr is closely associated with auxiliary covariates in an arid region (eastern Junggar coalfield). Therefore, there are clear spectral estimation mechanisms that support the use of soil auxiliary attributes such as the OM and pH levels in spectral estimation in this study.
Geostatistical methods can be effectively applied to analyze the characteristics of the spatial distribution structures of regional variables, and the semivariance function (variogram) is commonly used. Under appropriate prior assumptions, kriging based on the prior covariance produces the best linear unbiased prediction at unsampled locations. Moreover, kriging is among the most common practical methods applied in spatial analysis studies (Oliver and Webster 2015). Therefore, we predicted the spatial distribution of Cr based on estimated values and a kriging interpolation method with optimal semivariograms. The semivariogram models reflected the accuracy of the predicted spatial distribution of Cr in the soil. The results indicated that the Cr distribution features strong spatial dependence (Table 4). In addition, obvious Cr enrichment was detected in the study area (Fig. 9). The eastern Junggar coalfield is located in the Gobi Desert region in the southwestern Karamaili Nature Reserve with extreme environmental conditions. Prior to production activities related to coal mining, the study area was not exposed to human disturbances; therefore, the local soil was under natural conditions. Heavy industry has been promoted in the recent past, and industrial discharges from the local metal smelting, raw material, and manufacturing industries are considered other primary sources (Imin et al. 2020). From the distribution map, the results of previous studies, and the main contents in the regional coal deposit, we speculate that the variance in regional soil Cr is affected by the parent soil materials and industrial discharge sources.
It is noted that OM and pH measurements in the laboratory are cheap and that general institutes can perform them. However, these laboratory measurements (especially for SOM) take additional time, which makes the prediction of Cr less rapid. In this context, advanced proximal soil sensing technology and portable X-ray fluorescence spectroscopy (pXRF) have become popular recently (Wan et al. 2020; Xu et al. 2019). The technology in itself is well established in the laboratory for geochemical and chemical analysis, but it was only after its incorporation in a portable handheld device that it became popular for fast, low-cost, in situ environmental monitoring, including detecting the occurrence of higher concentrations of heavy metals in the soil. The pXRF sensor provides adequate analytical accuracy and has been used successfully in existing studies (Horta et al. 2021; Ravansari et al. 2020; Shin et al. 2019). The integration of pXRF with Vis-NIR has displayed considerable potential to improve the estimation performance for potentially toxic elements. Though not considered in this study, we intend to jointly use pXRF with Vis-NIR data for long-term soil Cr mapping in the future and believe we could further reduce the potential uncertainty and improve the accuracy of the predictions. To reduce potential uncertainty and make more solid predictions, we can investigate the geochemical homogeneity of soil sample compositions based on X-ray diffraction (XRD) analysis, inductively coupled plasma mass spectrometry (ICP-MS), and pXRF measurements (Lim et al. 2019). Beyond these aforementioned works, we also intend to conduct studies on the spatial mapping of soil toxic metals based on advanced DSM technology to obtain the optimal mapping strategy. Assessing the spatial distribution of soil Cr is only the first step; our ultimate goal is to protect the local ecology and environment, ensure the development of natural resources and ecological governance, and achieve sustainable development.

5. Conclusion

In this study, we used the FOD algorithm to preprocess the spectral dataset and investigated the potential for integrating soil auxiliary data and optimal band combinations for soil Cr estimation. The FOD algorithm was effectively used to process spectral curves, which improved the resolution of the spectrum and illustrated detailed variations in spectral features. In addition, the correlation coefficients of the


constructed TBIs indicated that the FOD approach can increase the Hong, Y., Guo, L., Chen, S., Linderman, M., Mouazen, A.M., Yu, L., Chen, Y., Liu, Y.,
Liu, Y., Cheng, H., Liu, Y.i., 2020. Exploring the potential of airborne hyperspectral
correlation between soil Cr and spectral data. Our results indicated that
image for estimating topsoil organic carbon: effects of fractional-order derivative
the introduction of soil auxiliary attributes (pH and OM) improved the and optimal band combination algorithm. Geoderma 365, 114228.
model estimation performance, with R2 and RPIQ values of 0.87 and Horta, A., Malone, B., Stockmann, U., Minasny, B., Bishop, T.F.A., McBratney, A.B.,
2.68, respectively, for the validation data set. The integration of the Pallasser, R., Pozza, L., 2015. Potential of integrated field spectroscopy and spatial
analysis for enhanced assessment of soil contamination: a prospective review.
proximal sensing and kriging interpolation methods yielded a soil Cr Geoderma 241-242, 180–209.
map with satisfactory performance. In the study area, the soil Cr dis­ Horta, A., Azevedo, L., Neves, J., Soares, A., Pozza, L., 2021. Integrating portable X-ray
tribution features strong spatial dependence and strong associations. fluorescence (pXRF) measurement uncertainty for accurate soil contamination
mapping. Geoderma 382, 114712.
Although Vis-NIR is of limited practical use in monitoring soil contam­ Hu, B., Shao, S., Ni, H., Fu, Z., Hu, L., Zhou, Y., Min, X., She, S., Chen, S., Huang, M.,
ination, we should be hopeful about its potential applications. This study Zhou, L., Li, Y., Shi, Z., 2020. Current status, spatial features, health risks, and
might inspire further research on soil contamination mapping in arid potential driving factors of soil heavy metal pollution in China at province level.
Environ. Pollut. 266, 114961.
regions based on proximal Vis-NIR sensors. The resulting conclusions Imin, B., Abliz, A., Shi, Q., Liu, S., Hao, L.i., 2020. Quantitatively assessing the risks and
improve our understanding of the effects of FOD technology and soil possible sources of toxic metals in soil from an arid, coal-dependent industrial region
auxiliary covariates on the spectral estimation of soil Cr levels. in NW China. J. Geochem. Explor. 212, 106505.
Jeong, Y., Yu, J., Wang, L., & Shin, J.H., 2018. Spectral Responses of As and Pb
Contamination in Tailings of a Hydrothermal Ore Deposit: A Case Study of
Declaration of Competing Interest Samgwang Mine, South Korea. Remote Sensing, 10.
Jia, X., O’Connor, D., Shi, Z., Hou, D., 2021. VIRS based detection in combination with
machine learning for mapping soil pollution. Environ. Pollut. 268, 115845.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors wish to thank Prof. Jianli Ding and Dr. Dong Zhang for their helpful suggestions and their assistance with sample collection. This study was supported by the National Natural Science Foundation of China (41890854 and 41701476), the National Key Research and Development Program of China (2019YFB2102703), the Guangdong Basic and Applied Basic Research Foundation (2020A1515111142), and the China Postdoctoral Science Foundation (2020M672776). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The influence of spectral pretreatment on the selection of representative calibration
samples for soil organic matter estimation using Vis-NIR reflectance spectroscopy.
References Remote Sens. 11, 450.
Liu, J., Zhang, Y., Wang, H., Du, Y., 2018. Study on the prediction of soil heavy metal
elements content based on visible near-infrared spectroscopy. Spectrochim. Acta
Abulaiti, Y., Sawut, M., Maimaitiaili, B., Chunyue, M.a., 2020. A possible fractional order
Part A Mol. Biomol. Spectrosc. 199, 43–49.
derivative and optimized spectral indices for assessing total nitrogen content in
Malley, D.F., Williams, P.C., 1997. Use of near-infrared reflectance spectroscopy in
cotton. Comput. Electron. Agric. 171, 105275.
prediction of heavy metals in freshwater sediment by their association with organic
Bao, S., 2005. Soil and Agricultural Chemistry Analysis. China Agriculture Press, Beijing.
matter. Environ. Sci. Technol. 31, 3461–3467.
Baveye, P.C., Laba, M., 2015. Visible and near-infrared reflectance spectroscopy is of
McBratney, A., Field, D.J., Koch, A., 2014. The dimensions of soil security. Geoderma
limited practical use to monitor soil contamination by heavy metals. J. Hazard.
213, 203–213.
Mater. 285, 137–139.
McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping.
Bellon-Maurel, V., Fernandez-Ahumada, E., Palagos, B., Roger, J.-M., McBratney, A.,
Geoderma 117 (1-2), 3–52.
2010. Critical review of chemometric indicators commonly used for assessing the
Minasny, B., McBratney, A.B., 2010. Methodologies for Global Soil Mapping. In:
quality of the prediction of soil attributes by NIR spectroscopy. TrAC, Trends Anal.
Boettinger, J.L., Howell, D.W., Moore, A.C., Hartemink, A.E., Kienast-Brown, S.
Chem. 29, 1073–1081.
(Eds.), Digital Soil Mapping: Bridging Research, Environmental Application, and
Ben-Dor, E., Inbar, Y., Chen, Y., 1997. The reflectance spectra of organic matter in the
Operation. Springer, Netherlands, Dordrecht, pp. 429–436.
visible near-infrared and short wave infrared region (400–2500 nm) during a
Minasny, B., McBratney, A.B., 2016. Digital soil mapping: A brief history and some
controlled decomposition process. Remote Sens. Environ. 61, 1–15.
lessons. Geoderma 264, 301–311.
Cao, X., 2007. Regulating mine land reclamation in developing countries: the case of
Mustafa, B.M., Al-Quraishi, A.M.F., Gholizadeh, A., Saberioon, M., 2020. Proximal Soil
China. Land Use Policy 24 (2), 472–483.
Sensing for Soil Monitoring. In: Al-Quraishi, A.M.F., Negm, A.M. (Eds.),
Chakraborty, S., Weindorf, D.C., Deb, S., Li, B., Paul, S., Choudhury, A., Ray, D.P., 2017.
Environmental Remote Sensing and GIS in Iraq. Springer International Publishing,
Rapid assessment of regional soil arsenic pollution risk via diffuse reflectance
Cham, pp. 95–118.
spectroscopy. Geoderma 289, 72–81.
Nachtergaele, F.O., Spaargaren, O., Deckers, J.A., Ahrens, B., 2000. New developments
Cheng, H., Shen, R., Chen, Y., Wan, Q., Shi, T., Wang, J., Wan, Y., Hong, Y., Li, X., 2019.
in soil classification: world reference base for soil resources. Geoderma 96 (4),
Estimating heavy metal concentrations in suburban soils with reflectance
345–357.
spectroscopy. Geoderma 336, 59–67.
Nawar, S., Buddenbaum, H., Hill, J., Kozak, J., Mouazen, A.M., 2016. Estimating the soil
Ellis, A.S., Johnson, T.M., Bullen, T.D., 2002. Chromium isotopes and the fate of
clay content and organic matter by means of different calibration methods of vis-NIR
hexavalent chromium in the environment. Science 295, 2060–2062.
diffuse reflectance spectroscopy. Soil Tillage Res. 155, 510–522.
Gao, Y., Xia, J., 2011. Chromium contamination accident in China: viewing environment
Oliver, M.A., Webster, R., 2014. A tutorial guide to geostatistics: Computing and
policy of China. Environ. Sci. Technol. 45, 8605–8606.
modelling variograms and kriging. CATENA 113, 56–69.
Gholizadeh, A., Kopačková, V., 2019. Detecting vegetation stress as a soil contamination
Oliver, M.A., Webster, R., 2015. Geostatistical Prediction: Kriging. Basic Steps in
proxy: a review of optical proximal and remote sensing techniques. Int. J. Environ.
Geostatistics: The Variogram and Kriging. Springer International Publishing, Cham,
Sci. Technol. 16, 2511–2524.
pp. 43–69.
Hong, Y., Chen, S., Liu, Y., Zhang, Y., Yu, L., Chen, Y., Liu, Y., Cheng, H., Liu, Y., 2019a.
Padarian, J., Minasny, B., McBratney, A.B., 2020. Machine learning and soil sciences: a
Combination of fractional order derivative and memory-based learning algorithm to
review aided by machine learning tools. Soil 6, 35–52.
improve the estimation accuracy of soil organic matter by visible and near-infrared
Pebesma, E., Heuvelink, G., 2016. Spatio-temporal interpolation using gstat. R J. 8,
spectroscopy. CATENA 174, 104–116.
204–218.
Hong, Y., Liu, Y., Chen, Y., Liu, Y., Yu, L., Liu, Y., Cheng, H., 2019b. Application of
R Development Core Team, 2019. R: A language and environment for statistical
fractional-order derivative in the quantitative estimation of soil organic matter
computing. In: http://www.R-project.org.R.Foundation.for.Statistical.Computing.
content through visible and near-infrared spectroscopy. Geoderma 337, 758–769.
Ravansari, R., Wilson, S.C., & Tighe, M., 2020. Portable X-ray fluorescence for
Hong, Y., Shen, R., Cheng, H., Chen, S., Chen, Y., Guo, L., He, J., Liu, Y., Yu, L., Liu, Y.,
environmental assessment of soils: Not just a point and shoot method. Environment
2019c. Cadmium concentration estimation in peri-urban agricultural soils: using
International, 134, 105250.
reflectance spectroscopy, soil auxiliary information, or a combination of both?
Geoderma 354, 113875.

13
J. Wang et al. Geoderma 405 (2022) 115399

Ribeiro Jr, P.J., Diggle, P.J., Ribeiro Jr, M.P.J., Suggests, M., 2007. The geoR package. Wang, J., Ding, J., Abulimiti, A., Cai, L., 2018a. Quantitative estimation of soil salinity by
R news 1, 14–18. means of different modeling methods and visible-near infrared (VIS–NIR)
Rossel, R.A.V., Behrens, T., 2010. Using data mining to model and interpret soil diffuse spectroscopy, Ebinur Lake Wetland, Northwest China. PeerJ 6, e4703.
reflectance spectra. Geoderma 158, 46–54. Wang, J., Ding, J., Yu, D., Ma, X., Zhang, Z., Ge, X., Teng, D., Li, X., Liang, J., Lizaga, I.,
Salminen, R., Batista, M., Bidovec, M., Demetriades, A., De Vivo, B., De Vos, W., Chen, X., Yuan, L., Guo, Y., 2019. Capability of Sentinel-2 MSI data for monitoring
Duris, M., Gilucis, A., Gregorauskiene, V., Halamić, J., 2005. Geochemical Atlas of and mapping of soil salinity in dry and wet seasons in the Ebinur Lake region,
Europe, Part 1 — Background Information, Methodology and Maps. Geological Xinjiang, China. Geoderma 353, 172–187.
Survey of Finland, Espoo. Wang, J., Shi, T., Yu, D., Teng, D., Ge, X., Zhang, Z., Yang, X., Wang, H., Wu, G., 2020.
Schmitt, J.M., 1998. Fractional derivative analysis of diffuse reflectance spectra. Appl. Ensemble machine-learning-based framework for estimating total nitrogen
Spectrosc. 52, 840–846. concentration in water using drone-borne hyperspectral imagery of emergent plants:
Shi, T., Chen, Y., Liu, Y., Wu, G., 2014a. Visible and near-infrared reflectance a case study in an arid oasis, NW China. Environ. Pollut. 266, 115412. https://doi.
spectroscopy—an alternative for monitoring soil contamination by heavy metals. org/10.1016/j.envpol.2020.115412.
J. Hazard. Mater. 265, 166–176. Wang, X., Zhang, F., Kung, H.-T., Johnson, V.C., 2018b. New methods for improving the
Shi, T., Wang, J., Chen, Y., Wu, G., 2016. Improving the prediction of arsenic contents in remote sensing estimation of soil organic matter content (SOMC) in the Ebinur Lake
agricultural soils by combining the reflectance spectroscopy of soils and rice plants. Wetland National Nature Reserve (ELWNNR) in northwest China. Remote Sens.
Int. J. Appl. Earth Obs. Geoinf. 52, 95–103. Environ. 218, 104–118.
Shi, T., Guo, L., Chen, Y., Wang, W., Shi, Z., Li, Q., Wu, G., 2018a. Proximal and remote Wei, T., Simko, V., Levy, M., Xie, Y., Jin, Y., & Zemla, J. (2017). Corrplot: visualization of
sensing techniques for mapping of soil contamination with heavy metals. Appl. a correlation matrix. R package version 0.84. In: https://CRAN.R-project.org/pack
Spectrosc. Rev. 53, 783–805. age=corrplot.
Shi, T., Hu, Z., Shi, Z., Guo, L., Chen, Y., Li, Q., Wu, G., 2018b. Geo-detection of factors Wold, S., Martens, H., & Wold, H. (1983). The multivariate calibration problem in
controlling spatial patterns of heavy metals in urban topsoil using multi-source data. chemistry solved by the PLS method. In (pp. 286-293). Berlin, Heidelberg: Springer
Sci. Total Environ. 643, 451–459. Berlin Heidelberg.
Shi, Z., Wang, Q., Peng, J., Ji, W., Liu, H., Li, X., Rossel, R.A.V., 2014b. Development of a Wu, Y., Chen, J., Ji, J., Gong, P., Liao, Q., Tian, Q., Ma, H., 2007. A mechanism study of
national VNIR soil-spectral library for soil classification and prediction of organic reflectance spectroscopy for investigating heavy metals in soils. Soil Sci. Soc. Am. J.
matter concentrations. Sci. China Earth Sci. 57, 1671–1680. 71, 918–926.
Shin, J.H., Yu, J., Wang, L., Kim, J., Koh, S.-M., Kim, S.-O., 2019. Spectral responses of Xia, X.Q., Mao, Y.Q., Ji, J.F., Ma, H.R., Chen, J., Liao, Q.L., 2007. Reflectance
heavy metal contaminated soils in the vicinity of a hydrothermal ore deposit: a case spectroscopy study of Cd contamination in the sediments of the Changjiang River,
study of Boksu Mine, South Korea. IEEE Trans. Geosci. Remote Sens. 57, 4092–4106. China. Environ. Sci. Technol. 41, 3449–3454.
State Environmental Protection Agency of China (2004). Technical Specification for Soil Xu, D., Chen, S., Viscarra Rossel, R.A., Biswas, A., Li, S., Zhou, Y., Shi, Z., 2019. X-ray
Environmental Monitoring (HJ/T 166–2004). In. Beijing: China Environmental Press fluorescence and visible near infrared sensor fusion for predicting soil chromium
Beijing. content. Geoderma 352, 61–69.
Tian, Y., Zhang, J., Yao, X., Cao, W., Zhu, Y., 2013. Laboratory assessment of three Zeng, Q., Shen, L., Yang, J., 2020. Potential impacts of mining of super-thick coal seam
quantitative methods for estimating the organic matter content of soils in China on the local environment in arid Eastern Junggar coalfield, Xinjiang region, China.
based on visible/near-infrared reflectance spectra. Geoderma 202-203, 161–170. Environ. Earth Sci. 79, 88.
Tóth, G., Hermann, T., Szatmári, G., Pásztor, L., 2016. Maps of heavy metals in the soils Zhang, Z., Ding, J., Wang, J., Ge, X., 2020b. Prediction of soil organic matter in
of the European Union and proposed priority areas for detailed assessment. Sci. Total northwestern China using fractional-order derivative spectroscopy and modified
Environ. 565, 1054–1062. normalized difference indices. CATENA 185, 104257.
Viscarra Rossel, R.A., McGlynn, R.N., McBratney, A.B., 2006a. Determining the Zhang, Z., Ding, J., Zhu, C., Wang, J., Ma, G., Ge, X., Li, Z., Han, L., 2021. Strategies for
composition of mineral-organic mixes using UV–vis–NIR diffuse reflectance the efficient estimation of soil organic matter in salt-affected soils through Vis-NIR
spectroscopy. Geoderma 137, 70–82. spectroscopy: optimal band combination algorithm and spectral degradation.
Viscarra Rossel, R.A., Walvoort, D.J.J., McBratney, A.B., Janik, L.J., Skjemstad, J.O., Geoderma 382, 114729.
2006b. Visible, near infrared, mid infrared or combined diffuse reflectance Zhang, S., Fei, T., You, X., Wan, Y., Wang, Y., & Bian, M., 2020. Two hyperspectral
spectroscopy for simultaneous assessment of various soil properties. Geoderma 131, indices for detecting cadmium and lead contamination from arice canopy spectrum.
59–75. Land Degradation & Development, n/a.
Wadoux, A.-C., Minasny, B., McBratney, A.B., 2020. Machine learning for digital soil Zhou, J., Cao, X., 2020. What is the policy improvement of China’s land consolidation?
mapping: applications, challenges and suggested solutions. Earth Sci. Rev. 210, Evidence from completed land consolidation projects in Shaanxi Province. Land Use
103359. Policy 99, 104847.
Wan, M., Hu, W., Qu, M., Li, W., Zhang, C., Kang, J., Hong, Y., Chen, Y., Huang, B., 2020. Zhou, J., Zhuang, X., Alastuey, A., Querol, X., Li, J., 2010. Geochemistry and mineralogy
Rapid estimation of soil cation exchange capacity through sensor data fusion of of coal in the recently explored Zhundong large coal field in the Junggar basin,
portable XRF spectrometry and Vis-NIR spectroscopy. Geoderma 363, 114163. Xinjiang province, China. Int. J. Coal Geol. 82, 51–67.
Wang, Z.-X., Chen, J.-Q., Chai, L.-Y., Yang, Z.-H., Huang, S.-H., Zheng, Y.u., 2011. Zhu, X.X., Tuia, D., Mou, L., Xia, G.-S., Zhang, L., Xu, F., Fraundorfer, F., 2017. Deep
Environmental impact and site-specific human health risks of chromium in the learning in remote sensing: a comprehensive review and list of resources. IEEE
vicinity of a ferro-alloy manufactory, China. J. Hazard. Mater. 190, 980–985. Geosci. Remote Sens. Mag. 5, 8–36.
Wang, J., Cui, L., Gao, W., Shi, T., Chen, Y., Gao, Y., 2014. Prediction of low heavy metal
concentrations in agricultural soils using visible and near-infrared reflectance
spectroscopy. Geoderma 216, 1–9.

14

You might also like