
Computers and Electronics in Agriculture 177 (2020) 105710



journal homepage: www.elsevier.com/locate/compag

Use of color parameters in the grouping of soil samples produces more accurate predictions of soil texture and soil organic carbon

José Janderson Ferreira Costa (a,*), Élvio Giasson (a), Elisângela Benedet da Silva (b), João Augusto Coblinski (a), Tales Tiecher (a)
(a) Department of Soil Science, School of Agronomy, Federal University of Rio Grande do Sul State, Bento Gonçalves Avenue 7712, 91540-000 Porto Alegre, Brazil
(b) Agricultural Research and Rural Extension Corporation of Santa Catarina, Rodovia Admar Gonzaga 1347, 88034-901 Florianópolis, Santa Catarina, Brazil

ARTICLE INFO

Keywords:
Reflectance spectroscopy
Soil organic carbon
Soil texture
Color parameters
Multivariate calibration models

ABSTRACT

Prediction of soil properties such as texture and soil organic carbon (SOC) content by reflectance spectroscopy (RS) is influenced by the heterogeneity of soil samples used to calibrate multivariate models. These soil properties are directly related to color, which, in turn, can be estimated by color parameters derived from the visible (Vis) spectrum at no additional cost. At present, only a few publications have addressed the effect that input data structure and soil heterogeneity have on model performance. Therefore, the objectives of this study were to use Vis-based-color parameters combined with multivariate statistical techniques to group soil samples and to compare the results of different SOC, clay, sand and silt prediction models. Soil sampling was conducted over an area of approximately 500 ha in the region of São Joaquim National Park, Santa Catarina State, Brazil, where a total of 260 soil samples were collected. Soil reflectance data were obtained by Vis-NIR-SWIR spectroscopy in the laboratory, using a spectroradiometer that covers the 350–2500 nm range. Soil organic carbon content was determined by dry combustion in an elemental analyzer. Sand, silt and clay fractions were determined using the pipette method. Twenty-two color parameters were derived from the Vis spectrum with the use of colorimetry models. To define the most appropriate number of soil sample clusters, two multivariate statistical analyses, principal component analysis (PCA) and cluster analysis of the samples, were applied to the color parameter values. Partial least squares regression (PLSR) and support vector machines (SVM) multivariate models were calibrated for each cluster and also for the models without stratification using 260 soil samples and 95 selected samples (MWS-260 and MWS-95). Overall, the PLSR model performed better than the SVM model, as confirmed by the statistical difference between RMSE results. Multivariate statistical analyses applied to color parameters were able to group soil samples with similar characteristics, reducing data amplitude and improving the accuracy of soil property predictions. These analyses demonstrated that the use of Vis-based-color parameters to group soil samples can be a quick and inexpensive way to increase the potential of spectroscopy to accurately predict soil physical and chemical properties.

1. Introduction

Understanding the spatial variability of soil properties may improve management and conservation strategies, as well as conscious use of soil resources (Gholizadeh et al., 2018). More data on the spatial variability of soil properties are necessary to support the development and implementation of management strategies (Nocita et al., 2014; Viscarra Rossel et al., 2016). However, due to the high costs of laboratory chemical analysis, collecting such data has not yet become economically viable (Nanni et al., 2011).

Remote and proximal sensing techniques have been proposed as efficient methods to study soil properties. Among these methods, reflectance spectroscopy (RS) in the visible, near and shortwave infrared (Vis-NIR-SWIR) regions (350–2500 nm) is a technique capable of measuring various soil properties at a low cost (O’Rourke and Holden, 2011). Reflectance spectroscopy is widely used in soil analysis laboratories and has recently seen increasing field application. Soil spectral signatures allow inferring information about soil physical, chemical and mineralogical composition, and such inference is based on multivariate calibration models developed from spectral libraries that correlate reflectance spectra with laboratory-measured soil property data (Gupta et al., 2018; Pabón et al., 2019; Stevens et al.,


Corresponding author at: Federal University of Rio Grande do Sul State, Brazil.
E-mail address: janderson.ferreirac@gmail.com (J.J.F. Costa).

https://doi.org/10.1016/j.compag.2020.105710
Received 6 May 2020; Received in revised form 4 August 2020; Accepted 9 August 2020
Available online 20 August 2020
0168-1699/ © 2020 Elsevier B.V. All rights reserved.

2013).

Due to the wide variation in the mineralogical composition and physical and chemical properties of soils, many studies have used spectral samples and libraries that represent the variability of soils, as follows: at local scale (Dotto et al., 2016; Gholizadeh et al., 2016; Silva et al., 2016); at regional scale (Franceschini et al., 2013; Ramirez-Lopez et al., 2013); at national scale (Araújo et al., 2014; Shi et al., 2014; Viscarra Rossel and Chen, 2011); at continental scale (Nocita et al., 2014; Stevens and Ramirez Lopez, 2014), and at global scale (Ramirez-Lopez et al., 2013; Viscarra Rossel et al., 2016). Other studies recommend an adaptation of the models according to the characteristics of the local samples, called “spiking” samples (Guerrero et al., 2014, 2010; Zhao et al., 2018), where representative local samples of the study area are inserted in spectral libraries on a regional scale. The development of these libraries with a large number of samples is intended to ensure that local samples fall within the domain of the model (Grinand et al., 2012; Viscarra Rossel and Webster, 2012). However, there is no certainty that they do, because soils have highly variable features, even at a regional scale, which makes assessment more difficult and affects model performance (Stenberg et al., 2010; Viscarra Rossel et al., 2016).

To improve model performance, many studies have used different types of pre-processing techniques and multivariate models (Table 1). Allo et al. (2020), on a local scale, with 95 samples, applied the PLSR algorithm and three pre-processing techniques to a data set with soil organic carbon (SOC) content ranging from 0.4 to 13% and found that the best performance was obtained with the use of a combination of pre-processing techniques (DET + FDSG). Allory et al. (2019) observed a better predictive capacity using a combination of pre-processing techniques (SG + SNV) than when only one pre-processing technique was used. Likewise, Dotto et al. (2017), on a local scale, with 299 soil samples and clay content ranging from 21 to 78% and sand ranging from 1 to 35%, obtained R2 of 0.62 and RMSE of 6.8% for clay and R2 of 0.25 and RMSE of 6.4% for sand, using the SVM algorithm and the DET pre-processing technique.

However, the prediction results obtained by Demattê et al. (2016), at different scales, showed differences in the performance of models for SOC content and for sand, clay and silt fractions, which may have been influenced by the number of samples in the calibration data set. This demonstrates that the input data structure is a factor that also influences the predictive capacity of the models.

At present, only a few publications have addressed the effect that input data structure and soil heterogeneity have on model performance. Jiang et al. (2017) collected soil samples in surface (0–0.1 m) and subsurface (0.1–0.3 m) layers in central China and identified differences in the variability of SOC levels for each soil depth, which affected the performance of prediction models. Araújo et al. (2014), by segregating data from a soil spectral library into more homogeneous groups in relation to the mineralogy of the clay fraction, obtained a reduction of 21% in the prediction error of that fraction. Moura-Bueno et al. (2019) found that segregating samples based on soil classes, uses and layers improved the accuracy of multivariate models in the prediction of soil carbon. However, one possibility that can still be explored is the establishment of a criterion for grouping samples based on soil color and dividing them into homogeneous groups according to soil features. There are many known relationships between soil color and its mineralogical, physical and chemical characteristics (Aitkenhead et al., 2013b, 2013a; Ben-Dor et al., 1997; Davey et al., 1975; Murti and Satyanarayana, 1971; Viscarra Rossel et al., 2009; Vodyanitskii and Savichev, 2017). Examples of variations in soil color associated with soil properties include the dark color of organic matter (OM), the white color of quartz, calcite and other carbonates, the red color of hematite (α-Fe2O3) and the yellow color of goethite (α-FeOOH) (Aitkenhead et al., 2013b, 2013a). Thus, prior clustering of soils by color may favor modeling, providing prediction models with subgroups with less variability of organic matter, quartz, iron oxides or other elements.

The easiest and fastest way to measure or characterize soil color is through the Munsell Color System (Munsell Soil Color Charts, 2000), which classifies colors into three components: hue, value, and chroma, where hue is the dominant spectral color; value indicates the degree of lightness or darkness of a color, and chroma is the purity or strength of the spectral color (Soil Survey Division Staff, 2017). However, more recent approaches to color measurement include the use of computer-coupled sensors such as colorimeters, spectrophotometers and spectroradiometers (Levin et al., 2005; Moreno-Ramón et al., 2014; Viscarra Rossel et al., 2008). These devices record the energy reflected by the soil, which absorbs certain wavelengths, and the combination of the reflected wavelengths determines soil color (Schanda, 2007; Viscarra Rossel et al., 2009). Thus, soil color can be derived from spectral curves generated for previous studies at no additional cost and can be represented by color space models.

So far, although color parameters have been applied in studies to estimate soil color, mineral composition and clay content (Aitkenhead et al., 2013a; Dominguez et al., 2010; Viscarra Rossel et al., 2009), in the identification of SOC content (Vodyanitskii and Savichev, 2017), and in the discrimination of sediment source (Martínez-Carreras et al., 2010a, 2010b; Tiecher et al., 2015), no attempt is available in the literature that uses color parameters to stratify a set of soil samples and group them into homogeneous groups to predict soil properties. Mouazen et al. (2007) evaluated soil color using the Munsell chart for use in soil classification. They used the parameters Hue, Value and Chroma to characterize soil colors and PCA to discriminate samples. However, color can be better represented when a larger number of parameters representing color in different scales are used (Viscarra Rossel et al., 2006). Therefore, the objectives of this study were to use Vis-based-color parameters combined with multivariate statistical techniques to group soil samples and to compare the results of different SOC, clay, sand and silt prediction models.

2. Material and methods

2.1. Characterization of the study area

The study area is located in São Joaquim National Park (SJNP), southern plateau region of the State of Santa Catarina (SC), Brazil (Fig. 1). This Conservation Unit (CU) has an approximate area of 49,300 ha, with very rugged relief and altitudes ranging from 300 to 1826 m. The park is located in a cold region, and the average annual temperature ranges from 12 °C to 14 °C (Vianna et al., 2015). It is the only park in Brazil devoted to the conservation of Araucaria (Araucaria angustifolia) and covers four main types of phytophysiognomy (Klein, 1978): Araucaria forest, subtropical forest, dwarf cloud forest or dwarf cloud highland forest and campos gerais de altitude (natural highland grasslands). The soils are mostly formed by volcanic and sedimentary rocks of the Paraná Basin of the Serra Geral group (Serra Geral, Palmas, Paranapanema, Gramado and Botucatu formation) and the Passa Dois group (Rio do Rastro and Teresina formation) (Embrapa, 2004; Wildner et al., 2014). The volcanic rocks of the Serra Geral formation are dominant and characterized by basalts and andesites (basic) and rhyolites and rhyodacites (acidic). The soils in this region usually have humic or prominent A horizons (Dalmolin et al., 2017; Embrapa, 2004), with very high exchangeable Al contents and high potential acidity, low sum of base cations and base saturation, and with an accentuated undulating relief. These soils are moderately shallow and can be classified as Entisols and Inceptisols (Dalmolin et al., 2017; Embrapa, 2004).

Research modules of the Biodiversity Research Program, Atlantic Forest, Santa Catarina (PPBio-MA-SC) were installed in this area. The module covers 500 ha (1 × 5 km) and is composed of two east-west accesses. Inside the module, 20 plots were installed: 10 terrestrial (Fig. 1) and 10 riparian plots. The plots are 250 m corridors along the contour lines and are demarcated by 6 pickets, one every 50 m (0, 50, 100, 150, 200 and 250 m). A total of 260 soil samples were collected using two sampling systems. In the first sampling system 145 soil samples were


Table 1
Selection of studies that applied different types of pre-processing and multivariate models to estimate SOC, sand, silt and clay contents using spectroscopy.
Scale (b)  M. method (a)  S. pre-processing  Samples  SOC R2v  SOC RMSE%  Clay R2v  Clay RMSE%  Sand R2v  Sand RMSE%  Silt R2v  Silt RMSE%  Source

Local PLSR PCA 144 0.84 0.1 0.78 3.9 0.79 3.4 0.65 2.6 (Margenot et al., 2020)
Local PLSR - 95 0.92 0.8 n.a. n.a. n.a. n.a. n.a. n.a. (Allo et al., 2020)
Local PLSR DET + FDSG 95 0.94 0.7 n.a. n.a. n.a. n.a. n.a. n.a. (Allo et al., 2020)
Local PLSR SNV + FDSG 95 0.92 0.8 n.a. n.a. n.a. n.a. n.a. n.a. (Allo et al., 2020)
Local PLSR SG 186 0.40 1.0 n.a. n.a. n.a. n.a. n.a. n.a. (Allory et al., 2019)
Local PLSR SG + SNV 186 0.83 0.5 n.a. n.a. n.a. n.a. n.a. n.a. (Allory et al., 2019)
Local PLSR SG + FDSG 186 0.83 0.5 n.a. n.a. n.a. n.a. n.a. n.a. (Allory et al., 2019)
Local PLSR SG + SDSG 186 0.82 0.5 n.a. n.a. n.a. n.a. n.a. n.a. (Allory et al., 2019)
National Cubist CRR 39,284 n.a. n.a. 0.88 7.6 0.87 10.3 n.a. n.a. (Demattê et al., 2019)
Local PLSR MSC 90 0.66 0.2 n.a. n.a. n.a. n.a. n.a. n.a. (Hutengs et al., 2019)
Regional PLSR SG 1013 n.a. n.a. 0.73 5.4 0.72 9.2 0.59 5.9 (Vasava et al., 2019)
National PLSR SNV, SG 10,802 n.a. n.a. 0.97 1.9 0.96 6.6 0.94 5.6 (Jaconi et al., 2019)
Local PLSR SG 148 0.74 0.2 n.a. n.a. n.a. n.a. n.a. n.a. (Nawar and Mouazen, 2018)
Regional PLSR SG 591 0.80 0.1 0.58 6.2 0.67 7.8 0.70 7.8 (Xu et al., 2018)
Local SVM DET 299 0.86 0.4 0.62 6.8 0.25 6.4 0.50 6.2 (Dotto et al., 2017)
Local SVM CRR 299 0.86 0.4 0.58 7.2 0.25 6.3 0.50 6.2 (Dotto et al., 2017)
Local PLSR CRR 299 0.83 0.4 0.52 7.5 0.17 6.5 0.56 5.3 (Dotto et al., 2017)
Local PLSR CRR + BR 299 0.79 0.5 0.45 8.3 0.18 6.5 0.44 6.7 (Dotto et al., 2017)
Local PLSR DET 299 0.72 0.5 0.40 8.8 0.19 6.5 0.44 6.7 (Dotto et al., 2017)
Local SVR FDSG 300 n.a. n.a. 0.53 7.9 0.18 5.5 0.39 9.1 (Duda et al., 2017)
Local PCR – 216 0.69 0.9 n.a. n.a. n.a. n.a. n.a. n.a. (Lucà et al., 2017)
Local PLSR SNV 216 0.79 0.7 n.a. n.a. n.a. n.a. n.a. n.a. (Lucà et al., 2017)
Local SVM – 216 0.82 0.7 n.a. n.a. n.a. n.a. n.a. n.a. (Lucà et al., 2017)
Local PLSR MSC, FDSG 200 n.a. n.a. 0.78 6.9 0.82 6.7 0.42 2.1 (Nanni et al., 2017)
Local PLSR SG/SNV 434 0.71 0.6 0.78 6.2 0.62 11.5 0.36 9.5 (Pinheiro et al., 2017)
Local Cubist SDSG/SNV 257 n.a. n.a. 0.70 14.7 0.50 18.3 0.00 16.7 (Zhang et al., 2017)
Regional PLSR PCA 7185 0.63 0.3 0.85 9.8 0.86 10.2 0.51 3.8 (Demattê et al., 2016)
Regional PLSR PCA 903 0.65 0.3 0.77 10.4 0.78 11.8 0.55 3.3 (Demattê et al., 2016)
Regional PLSR PCA 3093 0.61 0.2 0.67 3.2 0.64 4.2 0.30 2.6 (Demattê et al., 2016)
Local PLSR PCA 621 0.62 0.2 0.77 7.3 0.77 7.3 0.21 1.4 (Demattê et al., 2016)
Local PLSR PCA 563 0.54 0.3 0.74 11.3 0.77 12.7 0.71 3.1 (Demattê et al., 2016)
Local PLSR PCA 843 0.66 0.1 0.71 2.8 0.68 3.8 0.21 3.0 (Demattê et al., 2016)
Local PLSR PCA 541 0.61 0.4 0.55 7.6 0.45 6.7 0.50 3.8 (Demattê et al., 2016)
Global Cubist CRR 23,631 0.92 1.1 0.80 10.3 0.68 18.8 0.79 10.3 (Viscarra Rossel et al., 2016)
Local PLSR SNV 306 n.a. n.a. 0.78 3.1 n.a. n.a. n.a. n.a. (Camargo et al., 2015)
Regional PLSR SDSG 7172 0.60 0.6 0.82 10.9 n.a. n.a. n.a. n.a. (Araújo et al., 2014)
Regional SVM SDSG 7172 0.69 0.5 0.89 8.9 n.a. n.a. n.a. n.a. (Araújo et al., 2014)
Local PLSR SNV, SG, MSC 129 n.a. n.a. 0.92 5.2 0.87 7.3 0.80 3.1 (Franceschini et al., 2013)
Regional PLSR RAW 129 0.65 0.4 n.a. n.a. n.a. n.a. n.a. n.a. (Cambule et al., 2012)
Regional PLSR RAW-FDSG 129 0.85 0.3 n.a. n.a. n.a. n.a. n.a. n.a. (Cambule et al., 2012)
Regional PLSR SGD-FDSG 129 0.81 0.3 n.a. n.a. n.a. n.a. n.a. n.a. (Cambule et al., 2012)
Regional PLSR MSC 129 0.75 0.3 n.a. n.a. n.a. n.a. n.a. n.a. (Cambule et al., 2012)
Regional PLSR MSC – SGD 129 0.77 0.3 n.a. n.a. n.a. n.a. n.a. n.a. (Cambule et al., 2012)
Regional PLSR SNV 129 0.73 0.4 n.a. n.a. n.a. n.a. n.a. n.a. (Cambule et al., 2012)
Regional PLSR MSC – FDSG 129 0.78 0.3 n.a. n.a. n.a. n.a. n.a. n.a. (Cambule et al., 2012)
Regional PLSR MSC – FDSG 129 0.84 0.3 n.a. n.a. n.a. n.a. n.a. n.a. (Cambule et al., 2012)
Local PLSR SNV 148 n.a. n.a. 0.74 12.0 0.56 24.0 0.46 49.0 (Vendrame et al., 2012)
Global BTR SMO, FDSG 3793 0.82 0.9 0.73 9.5 n.a. n.a. n.a. n.a. (Brown et al., 2006)

(a) PLSR: partial least squares regression; SVM: support vector machine; GPR: Gaussian process regression; RF: random forest; MARS: multivariate adaptive regression splines; BTR: boosted regression trees; BR: band ratio; CRR: continuum removal; MSC: multiplicative scatter-correction; DET: detrend; SDSG: Savitzky–Golay second derivative; FDSG: Savitzky–Golay first derivative; SNV: standard normal variate; PCA: principal component analysis; DER: derivative; R2v: coefficient of determination; RMSEv: root mean square error of validation; n.a.: not available.
(b) The criterion for differentiating the scale of a spectral library is usually the spatial coverage and the number of soil samples.
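
The two accuracy indices reported in Table 1 can be made concrete with a short sketch. The Python below is illustrative only. It assumes that R2v is computed as 1 − SS_res/SS_tot (some of the cited studies may instead report the squared correlation between observed and predicted values), and the observed and predicted numbers are hypothetical, not taken from any study in the table.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical validation set (e.g. a soil property in %), for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

validation_rmse = rmse(observed, predicted)
validation_r2 = r_squared(observed, predicted)
```

Because RMSE is expressed in the units of the property, values for SOC and for the particle size fractions in Table 1 are not directly comparable with each other.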

collected (in terrestrial plots) using the PPBio protocol, with adaptation in the collection depth according to GlobalSoilMap.Net (0–10, 10–20, 20–30 and 30–60 cm depth). In the second sampling system 115 soil samples were collected (outside the plots, in different locations of the module), using the conditioned Latin Hypercube Sampling (cLHS) approach (Minasny and McBratney, 2006). This sampling used a set of continuous or categorical environmental covariables that represent the “predictor space” in which the most representative landscape sites were selected.

A set of environmental covariables was selected to compose the predictor space of the module based on the methodology proposed by Stumpf et al. (2016), with adaptations. The selection was based on the lowest correlation (r < 0.4) between the environmental covariables to avoid collinearity (Mulder et al., 2012). Another selection was then made following the criterion of higher correlation (r > 0.4) between the covariables selected in the previous step and the SOC content obtained by soil analysis of the samples collected in terrestrial plots. Thus, two variables with higher and lower r were selected, namely altitude and slope.

2.2. Spectral reflectance measurements and laboratory analysis

Soil reflectance data were obtained by Vis-NIR-SWIR spectroscopy, based on the absorption spectrum produced after the reading of the soil samples with a spectral sensor. The spectral data were obtained using a FieldSpec 3 spectroradiometer (Analytical Spectral Devices, Boulder, USA) with the ASD Contact Probe® set that covers the spectral range 350–2500 nm at 1 nm resolution. For spectra acquisition, the samples


Fig. 1. Research module and location of plots where soil samples were collected at São Joaquim National Park, in Santa Catarina State, Brazil, with 95 soil collection points.

were air-dried, ground and sieved (2 mm). The samples were placed in petri dishes (7.5 cm in diameter and 2 cm in height) and their surface was homogenized so that there was no interference in the readings. We used a tripod to hold the probe in a fixed position and moved the sample until it was in contact with the Contact Probe. The sensor was calibrated at the beginning of spectral measurements and every 20 min during readings using a Spectralon white plate (Labsphere, North Sutton, NH, USA) with approximately 100% reflectance. Three readings were obtained for each sample and the mean spectral curve was used for analysis. Spectra were acquired according to Ben Dor et al. (2015) and Romero et al. (2018). To reduce interference in the spectra caused by the soil-device-light source environment combination, the original spectral curves were only smoothed using a Savitzky-Golay filter (with adjustment through a second-order polynomial and 1 nm search window).

For the determination of SOC contents, subsamples with approximately 3 g of soil were milled and sieved (< 0.25 mm). The subsamples were stored in Eppendorf vials for dry combustion analysis in a Flash 2000 Organic Elemental Analyzer (Zobeck et al., 2013). The particle size fractions of clay, silt, and sand were determined using the pipette method (Donagemma et al., 2017).

2.3. Vis-based-color parameters calculation

The methodological sequence applied in this study is shown in the flow chart of Fig. 2. Twenty-two (22) components were derived from the Vis spectrum using various colorimetry models described in detail by Viscarra Rossel et al. (2006). The Commission Internationale de l'Eclairage (CIE, 1996) proposed the CIE models to facilitate visualization and to standardize color models. XYZ tristimulus values were calculated based on the color-matching functions defined in 1931 by the CIE (CIE, 1996), where Y represents the brightness and X and Z are virtual components of the primary spectra. The derived XYZ values were then transformed into nineteen other color parameters from several color space models (i.e., RGB, Munsell HVC, CIE xyY, CIE La*b*, CIE Lu*v*, CIE Lc*h*, and CMYK chromaticity coordinates) using the Munsell Conversion software (WallkillColor, 2019). In the CIE xyY system, Y represents luminance and x and y represent color variations from blue to red and blue to green, respectively. In the CIE La*b* and CIE Lu*v* systems, L represents brightness or luminance, and a* and b* and u* and v* represent chromaticity coordinates, as opponent red-green and blue-yellow scales. The CIE Lc*h* model represents a transformation of the CIE La*b* spherical color space into cylindrical coordinates, resulting in hue (h*) and chroma (c*) values. The RGB system forms a cube comprising red (R), green (G) and blue (B) orthogonal axes, from which each color can be produced by a mixture of these three primary colors. The CMYK model takes its name from the colors that form the system: Cyan, Magenta, Yellow and Black. The Munsell HVC system used in soil science describes soil color using hue (H), value (V), and chroma (C). Table 2 presents a summary of the color space models used in this study and the abbreviations for all 22 calculated color parameters.

2.4. Descriptive and multivariate statistical analyses

Descriptive statistics were calculated for SOC contents and sand, silt and clay fractions, showing a great variability of these properties, especially for SOC contents (Table 3). Thus, criteria had to be established for dividing the group of samples into subgroups. Two multivariate statistical analyses were applied to the color parameter values: Principal Component Analysis (PCA) and Cluster Analysis of the samples (or cluster), both performed on the R computing platform (R Development Core Team, 2017).
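
The geometric relationships between the color spaces described in Section 2.3 can be sketched in a few lines. The Python below is a minimal illustration and not the Munsell Conversion software used in the study. It shows only the standard CIE definitions of the xyY chromaticity coordinates obtained from XYZ, and of the cylindrical L c* h* form of L a* b*. The function names are hypothetical, and the inputs are the mean values listed in Table 3, used purely as example numbers.

```python
import math


def xyz_to_xyy(X, Y, Z):
    """CIE XYZ tristimulus values -> CIE xyY (chromaticity x, y and luminance Y)."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined when X + Y + Z = 0")
    return X / total, Y / total, Y


def lab_to_lch(L, a, b):
    """CIE L a* b* -> CIE L c* h*: chroma is the radius, hue the angle in degrees."""
    chroma = math.hypot(a, b)
    hue = math.degrees(math.atan2(b, a)) % 360.0
    return L, chroma, hue


# Example inputs: mean X, Y, Z and mean L, a*, b* as listed in Table 3.
x, y, Y_lum = xyz_to_xyy(8.22, 7.75, 4.92)
L_star, c_star, h_star = lab_to_lch(32.27, 5.13, 14.63)
```

Converting the tabulated means does not have to reproduce the tabulated mean x, y, c* and h* exactly, because the mean of a ratio generally differs from the ratio of the means.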


Fig. 2. Methodological sequence of spectral reflectance measurements and laboratory analysis. In the sequence, the Vis-based-color parameters calculation, the
multivariate analysis and the construction of the prediction models. MWS-260: model without stratification using 260 soil samples; MWS-95: model without stra-
tification using 95 selected samples.
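
The multivariate-analysis step in the sequence of Fig. 2 starts by standardizing the color parameters and projecting them onto principal components. The sketch below illustrates that idea in plain Python under stated assumptions. The study itself used R (the scale and princomp functions); here the leading component is found by power iteration on the covariance matrix of the z-scores, a technique swapped in only to avoid external libraries. The three-column data set is invented for illustration and all function names are hypothetical.

```python
import math


def standardize(rows):
    """Column-wise z-scores: subtract the mean, divide by the sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def leading_eigenvector(z, iterations=200):
    """First principal axis of z by power iteration (assumes a dominant eigenvalue)."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
           for i in range(p)]
    e = [1.0 / math.sqrt(p)] * p  # uniform unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * e[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        e = [v / norm for v in w]
    return e


def pc_scores(z, e):
    """Score of each sample on one component: B1*e1 + B2*e2 + ... + Bn*en."""
    return [sum(b * w for b, w in zip(row, e)) for row in z]


# Invented color parameters (three columns) for five hypothetical samples.
samples = [
    [20.0, 2.0, 5.0],
    [25.0, 3.0, 8.0],
    [30.0, 4.0, 11.0],
    [40.0, 6.0, 18.0],
    [50.0, 9.0, 27.0],
]
z = standardize(samples)
e1 = leading_eigenvector(z)
pc1 = pc_scores(z, e1)
```

Power iteration recovers only the first component. A full decomposition, as returned by a statistics package, would be needed to obtain the several components tested for clustering.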

2.5. Principal component analysis (PCA)

Principal component analysis was applied to color parameter values to reduce the dimensionality of the data in multivariate space and to visualize distribution patterns and structures that allow the identification of clusters and outliers (Galvãdo et al., 1995). Each principal component is represented by an axis orthogonal to the other axes, which is formed by a linear combination of the original variables. According to this technique, the distance preserved between the descriptors was the Euclidean distance and the relationships identified are linear (Borcard et al., 2011). The descriptors used in this study were the color parameters: XYZ, RGB, Munsell HVC color coordinates, CIE xyY, CIE La*b*, CIE Lu*v*, CIE Lc*h* and CMYK. As these variables are measured in different units, the values had to be converted to the same scale to equalize the statistical weight of all variables, thus simplifying mathematical relationships, as suggested by Legendre and Legendre (1998). Then, the variables were linearly transformed through a data translation and expansion process, by subtracting a constant (mean) from each value and then dividing the result by another constant (standard deviation). This transformation is called standardization or z-scores, and is often used in PCA. The scale function of the base package was used to perform this transformation. PCA was applied with the princomp function of the stats package on descriptor values to extract information from the 22 color parameters across the entire sample set, on a single 260 × 22 matrix. Principal component scores were used to determine the optimal number of clusters in the subsequent step. A general implementation of PCA is given by Eq. (1).

$$PC_1 = (B_1 \cdot e_{1,1}) + (B_2 \cdot e_{1,2}) + \dots + (B_n \cdot e_{1,n})$$


$$PC_2 = (B_1 \cdot e_{2,1}) + (B_2 \cdot e_{2,2}) + \dots + (B_n \cdot e_{2,n})$$

$$\vdots$$

$$PC_n = (B_1 \cdot e_{n,1}) + (B_2 \cdot e_{n,2}) + \dots + (B_n \cdot e_{n,n}) \qquad (1)$$

where: PC is the principal component (scores); B is the original data (color parameters); e is the eigenvector (contribution of each original variable to the score), and n is the number of parameters.

Table 2
Vis-based-color parameters derived from different color space models and calculated using Munsell Conversion software (WallkillColor, 2019).
Color space template    Color parameters                                      Parameter abbreviation
RGB                     Red                                                   R
                        Green                                                 G
                        Blue                                                  B
CIE xyY                 Chromatic coordinate x                                x
                        Chromatic coordinate y                                y
                        Brightness                                            Y
CIE XYZ                 Virtual component X                                   X
                        Virtual component Z                                   Z
CIE Luv                 Metric lightness function                             L
                        Chromatic coordinate opponent red-green scales        u*
                        Chromatic coordinate opponent blue-yellow scales      v*
CIE Lab                 Chromatic coordinate opponent red-green scales        a*
                        Chromatic coordinate opponent blue-yellow scales      b*
CIE Lch                 CIE chroma                                            c*
                        CIE hue                                               h*
CMYK                    Cyan                                                  C
                        Magenta                                               M
                        Yellow                                                Ye
                        Black                                                 K
Munsell HVC             Hue                                                   Hue
                        Value                                                 Value
                        Chroma                                                Chroma

Table 3
Descriptive statistics of soil properties and color parameters.
Soil properties and parameters  Mean  SD  Minimum  Maximum  CV %
SOC (%) 11.07 7.70 0.64 34.84 70
Clay (%) 34.19 16.53 8.83 65.72 48
Sand (%) 27.30 19.95 2.77 79.74 73
Silt (%) 38.50 12.39 6.74 65.53 32
Hue 9.04YR 0.48YR 7.69YR 10.00YR 5
Value 3.14 0.74 1.64 5.15 23
Chroma 2.47 1.07 0.73 5.24 43
X 8.22 4.08 2.37 22.58 50
Y 7.75 3.72 2.31 21.16 48
Z 4.92 1.52 1.89 13.55 31
x 0.38 0.02 0.34 0.43 6
y 0.37 0.02 0.34 0.39 4
L 32.27 7.64 17.01 53.12 24
a* 5.13 2.15 1.89 10.84 42
b* 14.63 6.78 3.99 31.23 46
u* 14.08 7.05 3.41 33.04 50
v* 16.02 7.98 3.55 35.78 50
c* 15.52 7.10 4.41 32.82 46
h* 70.19 2.01 64.57 74.82 3
R 92.02 23.21 50.00 157.00 25
G 73.78 15.29 44.00 120.00 21
B 55.64 6.71 37.00 89.00 12
C 19.03 0.16 19.00 20.00 1
M 26.65 3.23 22.00 35.00 12
Ye 33.76 6.94 23.00 52.00 21
K 43.20 9.11 18.00 60.00 21
SD: standard deviation; CV: coefficient of variation.

2.6. Cluster analysis of samples

For the stratification of the set of soil samples, Fuzzy K-means (FKM) clustering was applied to the PCA scores to discriminate different soil samples based on color parameter values, as described in detail by Terra et al. (2018). The most appropriate number of clusters and PCA scores for clustering were established by the following indices: partition coefficient (PC) (Eq. (2)), partition entropy (PE) (Eq. (3)) and modified partition coefficient (MPC) (Eq. (4)) (Ferraro and Giordani, 2015; Wu and Yang, 2005). Two (2) to 4 PCA scores and 2 to 8 clusters were tested. Based on the best PC and MPC scores, the ideal number of clusters was defined as three. Therefore, the group of 260 soil samples was divided into three subgroups, as follows: the first with 91 samples; the second with 95 samples and the third with 74 samples. To facilitate understanding, the subgroups were considered as clusters and the whole sample set containing all samples was considered as the model without stratification (MWS-260). The model without stratification with 95 selected samples was considered as MWS-95.

$$PC(k) = \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{k} u_{ij}^{2} \qquad (2)$$

where $1/k \le PC(k) \le 1$. In general, an optimal cluster number $k^*$ is found by solving $\max_{2 \le k \le n-1} PC(k)$ to produce the best clustering performance for the data set; $u_{ij}$ is the degree of membership of sample $j$ in cluster $i$.

$$PE(k) = -\frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{k} u_{ij} \log_2 u_{ij} \qquad (3)$$

where $0 \le PE(k) \le \log_2 k$. In general, an optimal $k^*$ is found by solving $\min_{2 \le k \le n-1} PE(k)$ to produce the best clustering performance for the data set.

$$MPC(k) = 1 - \frac{k}{k-1} \left(1 - PC(k)\right) \qquad (4)$$

where $0 \le MPC(k) \le 1$. In general, an optimal cluster number $k^*$ is found by solving $\max_{2 \le k \le n-1} MPC(k)$ to produce the best clustering performance for the data set.

2.7. Predictive modeling and validation

Predictive models for SOC, sand, silt and clay contents were calibrated for each cluster and also for the MWS-260 and MWS-95. Data were divided into two sets with 80% of samples for calibration (randomly selected) and 20% for validation. Thus, the calibration sets of clusters 1, 2 and 3 contained 73, 76 and 59 samples, respectively (i.e. 80% of the samples of each cluster). The validation sets of clusters 1, 2 and 3 contained 18, 19 and 15 samples, respectively (i.e. 20% of the samples of each cluster). In total, 208 samples were used to calibrate the models and 52 samples were predicted. To build the MWS-260 model, the same 208 samples from the clusters were used to calibrate the model and 52 samples were used for validation. MWS with 95 selected samples were also built, with a number of samples in the same range as the clusters. Thus, the stratified and non-stratified groups are fully equivalent and allowed a fair comparison of accuracy indices. Two accuracy indices were used to assess the performance of the predictive models: the coefficient of determination (R2v), which ranges from 0 to 1 and provides the proportion of variation explained by the model, and the root mean square error (RMSEv), which measures the overall accuracy of the prediction model.

In the prediction models, each time the model is run, a different R2 and RMSE value can be obtained, since random sampling is performed in the sample set (randomized). To verify the variation in the prediction of the models, the following simulations were performed: 1st) For each


cluster the prediction models were run 18 times (3 clusters × 18 repetitions = 54 prediction models for each soil property); 2nd) the same was done for the MWS-260 and MWS-95 (1 group × 18 repetitions for each soil property). The number of repetitions was defined according to the power of the statistical test (95%) to verify the variability of prediction of the models. Power is the probability that the null hypothesis will be rejected in favor of the alternative hypothesis when the null hypothesis is false (Uttley, 2019). The power analysis was performed with the stats R package (R Development Core Team, 2017).
Thus, estimates were obtained considering the average of R2 and RMSE, and the variation presented by the models. The models were calibrated with the combination of the Vis-NIR-SWIR spectral curves corresponding to each soil. Two multivariate methods were implemented to evaluate the predictive performance: partial least squares regression (PLSR) in the pls package (Liland, 2013) and support vector machines (SVM) in the e1071 package (Dimitriadou et al., 2015). To compare the performance of the models, Student's t-test was applied to RMSE values to check for differences between the means of the clusters and of the MWS-260 and MWS-95. Student's t-test was performed using the stats R package (R Development Core Team, 2017).

3. Results and discussion

3.1. Descriptive statistics

Soil organic carbon and sand have coefficients of variation (CV) greater than 50%, characterizing the heterogeneity of these soil physical and chemical properties (Table 3). Silt was the soil property with the lowest CV, followed by clay, 32 and 48%, respectively. The variability of SOC and sand contents is associated with the humic characteristics of the soils conferred by climate condition, slope and geological formation of the area (Dalmolin et al., 2017). The presence of altitude fields with accumulation of SOC in the superficial layers of the soil in the flat areas (shallow soils) and the formation of pine forests on the slopes (deeper soils) are also responsible for the granulometric and SOC content variability of the samples. This variability is reflected in both the distribution of soil properties and the quantified soil color measurements (Table 3). Soil hues ranged from 7.69YR to 10.00YR, with the value ranging from 1.64 to 5.15 units and chroma from 0.73 to 5.24 units. Since Munsell's soil color measurement is subjective (Post et al., 1994), a Vis-based color may provide fast quantitative measurements of soil color (Viscarra Rossel et al., 2003) and assist in the selection of the predominant soil color.

3.2. Principal component analysis (PCA)

According to the results of PCA (Fig. 3), in the first principal component (PC1), which accounts for 84% of the total variation, virtually all parameters had significant contribution values (0.92 on average), except for h* (0.54) and C (0.41), which had the lowest positive contribution values. The second principal component (PC2), accounting for 7%, is associated with a strong influence of Hue (0.86) and h* (0.83). The color parameters that best determined the behavior of the soil sample set were value, chroma, CIE Lb*, CIE Lv*c*, RG and MYK (0.92 on average) in the first axis and Hue (0.86) in the second axis. These results are consistent with Viscarra Rossel et al. (2003) who, in their studies of the most appropriate color models for quantitative description of soil color and its relationship with SOC contents, found that the parameters lightness (L), R from the RGB model and v* from the CIE Lu*v* model had higher correlations with SOC content. Similarly, Baumann et al. (2016), when using soil color and spectroscopy to predict SOC, found that lightness (L) has a good correlation with SOC contents. Lightness parameters (L) vary according to geographic region and land use and are related to SOC content (Spielvogel et al., 2004). The a* value of CIE La*b* models is related to the red color of soils that may be influenced by Fe oxides, mainly hematite. The b* value of CIE La*b* models is related to the yellow color of soils, reflecting differences in geological and pedogenetic conditions (Fischer et al., 2010), and can be an estimate of goethite content, with a good contribution (0.98) in PC1 (Fig. 3). Thus, Vis-based-color parameters can be applied to determine soil color, mineral composition and clay content (Aitkenhead et al., 2013a; Dominguez et al., 2010; Viscarra Rossel et al., 2009), and in the identification of SOC content (Vodyanitskii and Savichev, 2017).

3.3. Cluster analysis

Fuzzy k-means (FKM) clustering was used to partition the soil sample set into more homogeneous groups (clusters). Based on cluster evaluations (Table 4) and considering the color parameter values, soil samples were divided into 3 clusters. The first two PCA scores were used, which explained 91% of data variability (Table 4). The choice of the optimal number of clusters considered higher PC (0.99) and MPC (0.99) values, indicating good partition, and also lower PE values (0.02), which imply a sharper partition (“crisp class”), as suggested by Bezdek et al. (1984) and Viscarra Rossel et al. (2016).
In FKM grouping, soil samples were sorted according to their characteristics. The average spectra of all samples in each cluster showed similar spectral absorption peak characteristics (Fig. 5a). However, there was a statistically significant difference in the mean reflectance value of the clusters, particularly because of the difference in mean SOC contents and the tendency for reflectance to decrease with increased SOC contents. For example, cluster 1 had the highest mean reflectance values and lowest SOC contents (4% on average) (Figs. 4a and 5a). In contrast, cluster 2 had the lowest mean reflectance and the highest SOC contents (18% on average). These findings are consistent with Liu et al. (2019), who applied cluster analysis to soil samples and found that changes in SOC content affected soil clustering. A similar spectral behavior was reported by Stenberg et al. (2010) and Nocita et al. (2014), who found differences in reflectance for organic soils in the NIR wavelength range. Another key aspect in spectral analysis is the fact that the soils in cluster 2 had the highest contents of SOC and sand (Fig. 4a and c). However, SOC has low reflectance and, when it occurs together with materials of higher reflectance (such as quartz, the dominant component of sand), it masks the spectral response, reducing reflectance and hence the contrast of material absorption features (Demattê et al., 2019). Absorption at 2207 nm related to clay minerals was less evident for cluster 2, as it grouped samples with lower clay content. By contrast, absorption was more noticeable in cluster 1 because it grouped samples with higher clay content (Figs. 4b and 5a).
Cluster 1 samples showed well-defined iron oxide (FeIII) absorption features in the bands at 480, 550 and 850 nm (Fig. 5b), indicating the presence of goethite and hematite (Dalmolin et al., 2005; Moura-Bueno et al., 2019). Absorption bands were identified at wavelengths 1400 and 2200 nm due to molecular vibrations of OH and Al-OH groups in clay minerals, mainly related to the presence of kaolinite and/or montmorillonite (Dalmolin et al., 2005). In cluster 1, at certain wavelengths, higher energy absorption rates were observed because the samples had clay texture and higher iron oxide content (FeIII), while in the samples of clusters 2 and 3, with a lower percentage of clay and lower oxide content, this behavior was not observed (Fig. 5a and b).
Soil organic carbon content influenced the shape and albedo of the whole spectral curve (Fig. 5a and b). According to the literature, this soil property can have a greater influence in the region between 400 and 1000 nm. This characteristic was observed in samples with high SOC content, which showed reduced albedo throughout the spectral curve, mainly in samples of clusters 2 and 3, with larger reductions in the region between 400 and 1000 nm. This reduction in albedo due to a higher SOC content was also observed by Liu et al. (2019) and Moura-Bueno et al. (2019), who found that SOC content variability influenced the behavior of spectral curves.
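The cluster validity indices used in Section 3.3 to choose the number of clusters can be illustrated with a short sketch. It uses the commonly cited textbook definitions of the partition coefficient (PC), partition entropy (PE) and modified partition coefficient (MPC) for a fuzzy membership matrix; this is an assumption, since the software used in the study may scale or normalize these indices differently. The function name `partition_indices` and the toy membership matrices are hypothetical.

```python
import math


def partition_indices(memberships):
    """Return (PC, PE, MPC) for a fuzzy membership matrix.

    memberships: one row per sample; each row holds that sample's membership
    degrees in the c clusters and is assumed to sum to 1.
    """
    n = len(memberships)
    c = len(memberships[0])
    # Partition coefficient: mean of squared memberships (1/c for a fully
    # fuzzy partition, 1 for a crisp one).
    pc = sum(u * u for row in memberships for u in row) / n
    # Partition entropy: 0 for a crisp partition, ln(c) for a fully fuzzy one.
    pe = -sum(u * math.log(u) for row in memberships for u in row if u > 0) / n
    # Modified partition coefficient: PC rescaled to the range 0..1.
    mpc = 1 - (c / (c - 1)) * (1 - pc)
    return pc, pe, mpc


# Toy examples (not study data): a crisp and a fully fuzzy 3-cluster partition.
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fuzzy = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]

print(partition_indices(crisp))
print(partition_indices(fuzzy))
```

Under these definitions, PC and MPC close to 1 together with PE close to 0 indicate a sharp partition, which is the pattern reported for the 3-cluster solution.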


Fig. 3. Result of principal component analysis of color parameters that best explained variation in soil information based on soil color.

Table 4
Fuzzy k-means clustering assessments and selection of the optimal number of clusters considering the higher PC and MPC values and also lower PE values.

Number of   Cumulative      Index   Number of clusters
scores      variation (%)           8      7      6      5      4      3      2
2           91              PC      0.93   0.95   0.95   0.98   0.97   0.99   0.98
                            PE      0.13   0.09   0.10   0.04   0.05   0.02   0.04
                            MPC     0.92   0.94   0.93   0.97   0.97   0.99   0.95
3           96              PC      0.97   0.98   0.98   0.96   0.95   0.92   0.94
                            PE      0.97   0.05   0.06   0.09   0.10   0.15   0.10
                            MPC     0.97   0.98   0.97   0.95   0.94   0.88   0.89
4           99              PC      0.97   0.99   0.99   0.96   0.93   0.90   0.92
                            PE      0.06   0.03   0.04   0.08   0.14   0.18   0.14
                            MPC     0.97   0.99   0.98   0.96   0.91   0.86   0.84

PC: partition coefficient; PE: partition entropy; MPC: modified partition coefficient.

3.4. Prediction performance of spectral models

Assessment of the performance of the models considered the 18 repetitions for each prediction model (MWS-260, MWS-95 and clusters). Thus, the average R2v and RMSEv values were considered to assess the performance of the two methods. The results (Table 5) show changes in the prediction of soil properties, and the prediction performances of the PLSR models were on average better than those of the SVM models. In SOC modeling with the PLSR method, cluster 3 obtained the best accuracy, with mean R2 and RMSE values of 0.88 and 2.3%, respectively, while cluster 1 obtained the lowest indices, with mean R2 and RMSE values of 0.63 and 1.4%, respectively. With the SVM method, cluster 3 obtained the best indices, with mean R2 and RMSE values of 0.82 and 2.7%, respectively, while cluster 2 obtained the highest mean RMSE (2.9%). In the prediction of clay content with the PLSR method, cluster 2 obtained the best indices, with mean R2 and RMSE values of 0.72 and 7%, respectively, while cluster 3 obtained the highest mean RMSE (9%). With the SVM method, cluster 2 obtained mean R2 and RMSE values of 0.59 and 8.3%, respectively, while cluster 1 had the lowest performance, with mean R2 and RMSE values of 0.30 and 9.6%, respectively. When using the MWS with the lower number of samples (n = 95), the calibration models showed, on average, poorer prediction of the SOC content and clay fraction for both multivariate models. For the SOC content, the RMSE increased, on average, from 2.6 to 3.2% and from 2.6 to 3.3%, using the SVM and PLSR models, respectively. Likewise, for the clay fraction, the RMSE increased, on average, from 8.8 to 10.7% and from 8.8 to 11%, using the SVM and PLSR models, respectively. The standard deviations indicate greater absolute variability in the cluster validations (Table 5), especially for the cluster containing soil samples with lower SOC contents. With the repetitions, more accurate prediction values can be obtained, considering the changes in each model, without depending on the choice of the best or worst prediction model.
In general, the error of the PLSR model (clusters and MWS-260) was smaller than that observed for the SVM model (Table 5). The mean error of the models was often related to increased SOC content (Table 5 and Fig. 4a). For example, for cluster 1, the mean SOC content was 4% and RMSE values were on average 1.2 and 1.4% for the SVM and PLSR models, respectively. In turn, the mean SOC content of cluster 2 was 18% and RMSE values were on average 2.9 and 2.6% for the SVM and


Fig. 4. Statistical distribution and variability of SOC (a), clay (b), sand (c) and silt (d) contents for the model without stratification (MWS-260 and MWS-95) and clusters. Median values, 1st and 3rd quartiles.

PLSR models, respectively. According to Nocita et al. (2014), prediction errors tend to increase with increasing SOC content. This result was confirmed by the low reflectance of cluster 2 (Fig. 5a), and is explained by the higher OM content, which has a low reflectance level and masks the reflectance of the samples (Nocita et al., 2014; Stenberg et al., 2010).
The mean error of the models increased with increasing clay content (Table 5). Cluster 2 had the lowest clay content (21% on average), with RMSE values of 8.3 and 7%, on average, for the SVM and PLSR models, respectively. In cluster 1, which had the highest clay content (45%, on average), RMSE values were 9.6 and 7.3% for the SVM and PLSR models, respectively.
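The repeated-validation procedure behind these averages (random 80/20 splits, 18 repetitions, mean and spread of R2 and RMSE) can be sketched as follows. This is only an illustration under stated assumptions: a one-variable least-squares line stands in for the PLSR and SVM models, the data are synthetic, R2 is computed as 1 − SSres/SStot, and all function names (`split_sizes`, `repeated_validation`, and so on) are hypothetical.

```python
import math
import random
import statistics


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, computed here as 1 - SSres / SStot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def split_sizes(n, calib_fraction=0.8):
    """Sizes of the calibration and validation sets for n samples."""
    n_calib = round(calib_fraction * n)
    return n_calib, n - n_calib


def fit_line(x, y):
    """Ordinary least-squares line; a stand-in for the multivariate models."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def repeated_validation(x, y, repetitions=18, seed=0):
    """Run repeated random 80/20 splits; return one (R2, RMSE) pair per run."""
    rng = random.Random(seed)
    n_calib, _ = split_sizes(len(x))
    results = []
    for _ in range(repetitions):
        idx = list(range(len(x)))
        rng.shuffle(idx)
        calib, valid = idx[:n_calib], idx[n_calib:]
        slope, intercept = fit_line([x[i] for i in calib], [y[i] for i in calib])
        obs = [y[i] for i in valid]
        pred = [slope * x[i] + intercept for i in valid]
        results.append((r2(obs, pred), rmse(obs, pred)))
    return results


# Synthetic data (not study data): 95 samples, the size of cluster 2.
data_rng = random.Random(42)
x = [data_rng.uniform(0, 1) for _ in range(95)]
y = [20 * xi + data_rng.gauss(0, 2) for xi in x]

results = repeated_validation(x, y)
rmse_values = [r[1] for r in results]
print(split_sizes(95))
print(round(statistics.mean(rmse_values), 2), round(statistics.stdev(rmse_values), 2))
```

Each repetition draws a different validation set, so R2 and RMSE vary from run to run; reporting their mean and standard deviation avoids depending on a single favorable or unfavorable split.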

Fig. 5. Mean reflectance spectra of clusters based on color parameters (a) and the first derivative of spectral curves, highlighting spectral bands related to soil components and changes in reflectance intensity (b).


Table 5
Predictive performance of soil properties for the validation set considering the model without stratification (MWS-260 and MWS-95) and the performance of the
models after cluster analysis based on color parameters.
Cluster 1 (n = 91) Cluster 2 (n = 95) Cluster 3 (n = 74) MWS (n = 260) MWS (n = 95)

SVM PLSR SVM PLSR SVM PLSR SVM PLSR SVM PLSR

R2 RMSE R2 RMSE R2 RMSE R2 RMSE R2 RMSE R2 RMSE R2 RMSE R2 RMSE R2 RMSE R2 RMSE

SOC/Repet. % % % % % % % % % %
1 0.91 0.8 0.86 1.5 0.86 2.9 0.85 2.3 0.82 3.3 0.86 2.5 0.93 2.2 0.91 2.4 0.81 3.8 0.81 3.8
2 0.82 0.7 0.72 1.1 0.80 2.6 0.87 2.7 0.85 3.1 0.87 3.0 0.91 2.5 0.92 2.3 0.81 3.0 0.80 3.1
3 0.66 0.8 0.64 1.0 0.78 2.3 0.82 2.2 0.73 4.9 0.82 3.9 0.89 3.2 0.91 2.7 0.84 3.2 0.83 3.4
4 0.76 1.2 0.74 1.3 0.69 2.5 0.76 2.3 0.82 1.8 0.87 1.4 0.91 2.1 0.90 2.3 0.82 4.7 0.83 4.4
5 0.81 0.9 0.82 1.0 0.70 2.9 0.73 2.6 0.82 1.6 0.86 1.7 0.87 2.5 0.86 2.7 0.80 3.7 0.81 3.7
6 0.72 1.3 0.74 1.2 0.84 3.1 0.86 2.5 0.91 1.8 0.97 1.3 0.88 2.3 0.88 2.3 0.79 3.4 0.78 3.2
7 0.79 0.8 0.58 1.4 0.70 3.5 0.78 3.1 0.72 3.5 0.85 2.3 0.91 2.5 0.92 2.3 0.94 2.4 0.92 2.7
8 0.11 3.6 0.26 3.3 0.86 2.6 0.88 2.6 0.82 4.4 0.90 3.4 0.85 3.6 0.88 3.3 0.82 2.8 0.82 3.1
9 0.94 0.9 0.90 1.1 0.74 3.2 0.75 3.1 0.82 1.9 0.88 1.6 0.83 2.9 0.83 3.0 0.86 3.1 0.84 3.4
10 0.66 1.1 0.56 1.5 0.59 4.1 0.70 3.0 0.83 2.6 0.89 2.1 0.89 2.3 0.88 2.6 0.90 2.8 0.90 2.7
11 0.66 1.0 0.66 1.5 0.73 3.5 0.86 2.2 0.94 0.9 0.91 1.7 0.90 2.3 0.90 2.4 0.74 3.7 0.71 3.9
12 0.68 1.0 0.56 1.2 0.81 2.8 0.82 2.9 0.87 1.4 0.87 1.6 0.91 2.3 0.91 2.3 0.89 2.9 0.87 3.3
13 0.83 0.7 0.71 0.9 0.80 2.5 0.89 2.0 0.66 3.5 0.84 3.3 0.92 2.3 0.92 2.4 0.91 2.5 0.91 2.5
14 0.12 3.6 0.27 3.2 0.85 2.1 0.85 2.3 0.87 2.1 0.94 2.3 0.86 2.7 0.86 2.7 0.80 3.4 0.81 3.7
15 0.60 1.0 0.56 1.1 0.75 2.7 0.83 2.5 0.89 4.1 0.90 3.5 0.86 3.1 0.90 2.7 0.87 3.7 0.90 3.1
16 0.67 0.9 0.44 1.1 0.78 2.7 0.88 2.0 0.70 3.2 0.81 2.7 0.90 2.3 0.87 2.7 0.95 2.2 0.95 2.3
17 0.69 1.0 0.54 1.2 0.76 3.4 0.82 2.8 0.81 2.0 0.82 1.9 0.87 2.7 0.86 2.8 0.89 2.7 0.89 2.7
18 0.83 0.6 0.73 0.9 0.70 2.9 0.74 2.9 0.96 2.4 0.96 1.5 0.88 2.9 0.88 2.8 0.87 4.3 0.90 3.9
Mean 0.68 1.2 0.63 1.4 0.76 2.9 0.82 2.6 0.82 2.7 0.88 2.3 0.89 2.6 0.89 2.6 0.85 3.2 0.85 3.3
SD 0.23 0.9 0.18 0.7 0.07 0.5 0.06 0.4 0.08 1.1 0.05 0.8 0.03 0.4 0.03 0.3 0.06 0.7 0.06 0.6
Max. 0.94 3.6 0.90 3.3 0.86 4.1 0.89 3.1 0.96 4.9 0.97 3.9 0.93 3.6 0.92 3.3 0.95 4.7 0.95 4.4
Min. 0.11 0.6 0.26 0.9 0.59 2.1 0.70 2.0 0.66 0.9 0.81 1.3 0.83 2.1 0.83 2.3 0.74 2.2 0.71 2.3

Clay
1 0.63 6.4 0.67 6.0 0.82 8.1 0.90 5.4 0.76 7.2 0.88 5.4 0.76 8.2 0.77 7.9 0.73 9.9 0.74 9.7
2 0.19 10.5 0.64 6.4 0.80 5.5 0.88 4.3 0.77 8.9 0.90 6.3 0.77 8.3 0.76 8.4 0.49 11.9 0.34 14.4
3 0.56 8.0 0.67 6.7 0.71 7.8 0.86 5.4 0.90 6.9 0.88 6.8 0.71 9.2 0.67 9.7 0.59 12.6 0.65 11.7
4 0.34 9.5 0.50 6.9 0.80 7.1 0.81 7.3 0.84 7.4 0.82 7.5 0.76 8.4 0.81 7.5 0.60 11.0 0.67 10.0
5 0.47 8.5 0.61 6.9 0.59 6.4 0.79 5.2 0.65 9.4 0.70 8.0 0.70 8.7 0.73 8.1 0.65 11.5 0.67 11.0
6 0.46 8.2 0.59 6.9 0.81 5.6 0.79 5.9 0.62 9.2 0.75 8.8 0.76 7.7 0.72 8.3 0.72 11.4 0.71 10.9
7 0.24 9.5 0.51 7.1 0.23 11.1 0.78 6.0 0.62 9.8 0.66 8.8 0.64 9.8 0.65 9.3 0.78 8.0 0.85 7.0
8 0.43 8.0 0.54 7.2 0.69 8.4 0.77 6.8 0.75 7.9 0.82 8.9 0.71 8.7 0.74 8.3 0.45 11.5 0.54 10.8
9 0.16 10.3 0.48 7.2 0.63 10.2 0.76 7.9 0.70 9.8 0.74 9.1 0.65 10.2 0.67 9.9 0.91 6.3 0.87 7.7
10 0.27 7.8 0.27 7.4 0.60 8.1 0.68 8.7 0.66 9.3 0.68 9.5 0.79 7.8 0.72 8.9 0.63 10.8 0.64 11.6
11 0.20 9.2 0.39 7.6 0.52 7.8 0.67 6.4 0.47 11.3 0.62 9.7 0.74 8.9 0.73 9.1 0.48 13.1 0.41 12.9
12 0.30 9.4 0.52 7.7 0.63 6.6 0.66 6.3 0.69 9.9 0.68 9.7 0.73 8.4 0.67 9.2 0.73 9.4 0.72 9.5
13 0.02 13.4 0.48 7.8 0.69 7.0 0.66 6.9 0.76 9.3 0.79 9.8 0.81 7.4 0.78 7.9 0.69 9.9 0.67 10.1
14 0.44 9.3 0.56 7.8 0.50 8.5 0.65 8.0 0.52 11.8 0.61 10.4 0.59 10.0 0.63 9.6 0.74 8.5 0.59 11.2
15 0.38 9.6 0.64 7.9 0.48 10.1 0.61 9.6 0.51 12.3 0.57 10.6 0.71 9.1 0.72 8.9 0.70 11.7 0.71 10.5
16 0.18 11.9 0.60 7.9 0.34 13.0 0.60 10.1 0.63 12.0 0.74 10.9 0.75 8.3 0.71 8.9 0.66 9.7 0.45 13.0
17 0.17 10.4 0.64 8.0 0.39 11.7 0.54 10.0 0.66 10.7 0.80 11.2 0.72 8.7 0.74 8.4 0.53 12.9 0.45 14.5
18 0.00 12.9 0.50 8.0 0.44 7.0 0.54 6.3 0.48 13.3 0.64 11.3 0.69 10.1 0.71 9.7 0.63 11.9 0.66 11.1
Mean 0.30 9.6 0.54 7.3 0.59 8.3 0.72 7.0 0.67 9.8 0.74 9.0 0.72 8.8 0.72 8.8 0.65 10.7 0.63 11.0
SD 0.17 1.8 0.10 0.6 0.17 2.1 0.11 1.7 0.12 1.8 0.10 1.7 0.05 0.8 0.05 0.7 0.12 1.8 0.14 2.0
Max. 0.63 13.4 0.67 8.0 0.82 13.0 0.90 10.1 0.90 13.3 0.90 11.3 0.81 10.2 0.81 9.9 0.91 13.1 0.87 14.5
Min. 0.00 6.4 0.27 6.0 0.23 5.5 0.54 4.3 0.47 6.9 0.57 5.4 0.59 7.4 0.63 7.5 0.45 6.3 0.34 7.0

MWS-260: model without stratification using 260 soil samples.


MWS-95: model without stratification using 95 selected samples.
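The summary rows of Table 5 (Mean, SD, Max., Min.) can be checked against the 18 repetitions. The sketch below does this for one column, the SOC RMSE values of the PLSR model for cluster 3, copied from the table. It assumes the sample standard deviation (n − 1 denominator), which the article does not state; the variable names are illustrative.

```python
import statistics

# SOC RMSE (%) of the PLSR model for cluster 3, repetitions 1 to 18 (Table 5).
rmse_cluster3_plsr = [
    2.5, 3.0, 3.9, 1.4, 1.7, 1.3, 2.3, 3.4, 1.6,
    2.1, 1.7, 1.6, 3.3, 2.3, 3.5, 2.7, 1.9, 1.5,
]

mean_rmse = statistics.mean(rmse_cluster3_plsr)
sd_rmse = statistics.stdev(rmse_cluster3_plsr)  # sample SD (assumption)

print(f"Mean {mean_rmse:.1f}  SD {sd_rmse:.1f}  "
      f"Max {max(rmse_cluster3_plsr)}  Min {min(rmse_cluster3_plsr)}")
```

Rounded to one decimal place, these values agree with the Mean (2.3), SD (0.8), Max. (3.9) and Min. (1.3) entries reported for this column.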

3.5. Prediction of soil properties using clusters and the MWS-260 and MWS-95

Cluster analysis based on soil color affected the prediction of SOC contents (Table 6 and Fig. 6). In general, the error obtained in the clusters was smaller than the error obtained in the MWS-260 and MWS-95. The mean comparison test of RMSE values for both methods was performed at an alpha level of 0.05. According to the results obtained, all soil properties studied had a significant error reduction, and the prediction performances of the PLSR models were, on average, better than those of the SVM models. In SOC modeling using the PLSR method, the mean RMSE values were 2.6 ± 0.3 for the MWS-260 and 2.1 ± 0.8 for the clusters, representing a 19% reduction in the prediction error. Likewise, when using the MWS with 95 selected samples, the reduction of the error was 36% (Table 6 and Fig. 6). For the SVM method, the mean RMSE values were 2.6 ± 0.4 for the MWS-260 and 2.3 ± 1.1 for the clusters, a difference that was not significant. However, when using the MWS with 95 selected samples, there was a significant 28% reduction in the prediction error. These values are close to those observed in a study on SOC prediction conducted by Jaconi et al. (2017), who applied cluster methods based on soil depth, land use, pH, and soil texture data to develop individual spectral models for each subset. According to the authors, the stratification of the dataset reduced the error of the clustered models by 22% compared to the MWS. Similarly, Nocita et al. (2014), when predicting SOC content, selected soil samples based on spectral distance and spectral combination with sand (predictor variable). According to the authors, prediction errors were higher in soils with higher SOC content.
In clay prediction using the PLSR method, the mean RMSE values were 8.8 ± 0.7 for the MWS-260 and 7.8 ± 1.6 for the clusters, representing a significant 11% reduction in the prediction error. Likewise, when using the MWS with 95 selected samples, the reduction of the


Table 6
Statistically significant difference and reduction of prediction error in the results of prediction with PLSR and SVM methods for each soil property.
Attributes Method Process Mean of RMSE value (%)* Error reduction (%)* Mean of RMSE value (%)** Error reduction (%)**

SOC PLSR Clusters 2.1 ± 0.8 a 19 2.7 ± 0.6 a ns


MWS-260 2.6 ± 0.3 b 2.7 ± 0.2 a
Clusters 2.1 ± 0.8 a 36 – –
MWS-95 3.3 ± 0.6 b – –
SVM Clusters 2.3 ± 1.1 a ns 2.7 ± 0.6 a ns
MWS-260 2.6 ± 0.4 a 2.7 ± 0.4 a
Clusters 2.3 ± 1.1 a 28 – –
MWS-95 3.2 ± 0.7 b – –

Clay PLSR Clusters 7.8 ± 1.6 a 11 10.0 ± 2.5 a ns


MWS-260 8.8 ± 0.7 b 9.3 ± 1.1 a
Clusters 7.8 ± 1.6 a 29 – –
MWS-95 11.0 ± 2.0 b – –
SVM Clusters 9.2 ± 2.0 a ns 10.0 ± 2.4 a ns
MWS-260 8.8 ± 0.8 a 9.3 ± 1.2 a
Clusters 9.2 ± 2.0 a 14 – –
MWS-95 10.7 ± 1.8 b – –

Sand PLSR Clusters 9.1 ± 3.5 a 17 10.5 ± 2.5 a ns


MWS-260 10.9 ± 0.7 b 10.8 ± 1.0 a
Clusters 9.1 ± 3.5 a 25 – –
MWS-95 12.2 ± 1.9 b – –
SVM Clusters 11.1 ± 4.6 a ns 11.0 ± 2.5 a ns
MWS-260 11.4 ± 0.6 a 10.6 ± 1.0 a
Clusters 11.1 ± 4.6 a ns – –
MWS-95 12.8 ± 2.1 a – –

Silt PLSR Clusters 9.2 ± 1.6 a ns 10.4 ± 1.8 a ns


MWS-260 9.7 ± 0.8 a 9.7 ± 0.7 a
Clusters 9.2 ± 1.6 a 12 – –
MWS-95 10.4 ± 1.8 b – –
SVM Clusters 10.4 ± 2.0 a ns 11.0 ± 2.1 a 12
MWS-260 9.9 ± 1.0 a 9.8 ± 0.7 b
Clusters 10.4 ± 2.0 a ns – –
MWS-95 10.5 ± 1.9 a – –

MWS-260: model without stratification using 260 soil samples; MWS-95: model without stratification using 95 selected samples; *Considering only the Vis region to
group soil samples; **Considering the Vis-NIR-SWIR region to group the soil samples; Equal letters indicate no statistically significant difference and different letters
indicate a statistically significant difference between mean RMSE values for the clusters and the model without stratification (MWS-260), according to Student’s t-test
at p < 0.05. ns, not significant.
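The error-reduction percentages in Table 6 appear to follow from the mean RMSE values when the reduction is expressed relative to the model without stratification. The sketch below makes that assumption explicit and applies it to the rounded means for the Vis-based grouping with the PLSR method; the function name `error_reduction` is hypothetical.

```python
def error_reduction(rmse_reference, rmse_clusters):
    """Percentage by which the clustered RMSE is below the reference RMSE."""
    return 100 * (rmse_reference - rmse_clusters) / rmse_reference


# (reference mean RMSE, clustered mean RMSE), PLSR method, Vis-based grouping.
comparisons = {
    "SOC vs MWS-260": (2.6, 2.1),
    "SOC vs MWS-95": (3.3, 2.1),
    "Clay vs MWS-260": (8.8, 7.8),
    "Clay vs MWS-95": (11.0, 7.8),
    "Sand vs MWS-260": (10.9, 9.1),
    "Sand vs MWS-95": (12.2, 9.1),
}

for label, (reference, clusters) in comparisons.items():
    print(f"{label}: {error_reduction(reference, clusters):.0f}%")
```

With this definition, the rounded results match the reductions listed in the table for these comparisons (19, 36, 11, 29, 17 and 25%).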

error was significant, at 29% for the PLSR and 14% for the SVM (Table 6 and Fig. 6). Regarding the prediction of sand content using the PLSR method, error reduction was also significant, with mean RMSE values of 10.9 ± 0.7 for the MWS-260 and 9.1 ± 3.5 for the clusters, representing a 17% reduction in the prediction error. Likewise, when using the MWS with 95 selected samples, the reduction of the error was 25%. For silt prediction using the PLSR method, there was a significant 12% reduction of the error when using the MWS with 95 selected samples. With the SVM method, the average RMSE values of the clusters were higher than those in the MWS-260, indicating the poor performance of this methodology in silt prediction.
Since the soils assessed in this study contained comprehensive information on various soil properties (Table 3 and Fig. 4), multivariate analysis grouped soil samples based on color parameters and grouped the complex dataset into clusters with similar spectral characteristics, thus reducing the possible interference of the variability of these properties in prediction (Guerrero et al., 2010; Shi et al., 2015). In contrast, the MWS-260 had soil samples with greater data variability, resulting in a higher prediction error. Thus, data variability, represented by mean and standard deviation, influenced the final accuracy of the prediction models, as expressed in R2 and RMSE. The results revealed that predictions using data with larger variability have higher R2 but also higher mean RMSE values, which is not desirable, since the higher the root mean square error, the greater the differences between the predicted and measured values. According to Guerrero et al. (2014) and Moura-Bueno et al. (2019), if soil samples are similar to the overall prediction set, the model will provide more accurate results.
When comparing the sample grouping methods (Table 6), the method that considered only the Vis region obtained the largest difference between the mean RMSE values (for clusters and MWS-260), with significant reductions in error for SOC content (19%, PLSR method), clay (11%, PLSR method) and sand content (17%, PLSR method), while the method that considered the Vis-NIR-SWIR region did not show significant reductions among the mean RMSE values.
Following the application of this methodology, it is necessary to understand why color parameters grouped the samples based on the relationships between soil features and color. For some soil properties the explanation for a relationship with soil color may be clearer than for other properties, for example: (a) organic matter is a primary constituent of soil color, imparting dark colors to surface horizons and some subsurface horizons (illuviation); (b) quartz (in the sand) gives the soil a white color, increasing the intensity of soil reflectance; and (c) iron oxides (FeIII), present in well-drained and weathered soils, impart mainly a red (hematite) or yellow (goethite) color (Aitkenhead et al., 2013a). Each of these factors impacts soil color, thus allowing the establishment of clustering techniques based on color parameters.
Unlike other attempts to improve prediction models, the present study sought simpler solutions, which do not entail additional costs, to increase the potential to improve the prediction of soil properties. Other approaches used simple stratification methods based on soil depth, pH and texture to divide the data into subgroups (Jaconi et al., 2017). The strategies for clustering soil samples are generally based on soil properties according to depths and also the types of land use and cover, or their combinations (Stevens et al., 2013; Vasques et al., 2010). Stevens et al. (2013) added new predictors to the spectral matrix in an attempt to improve the models of prediction of SOC content. These authors tested sand and clay fractions as auxiliary predictors. However, as with


Fig. 6. Performance of prediction models considering the model without stratification (MWS-260 and MWS-95) and performance of the models after cluster analysis based on color parameters. Median values, 1st and 3rd quartiles.

SOC, these soil physical properties also need to be determined in the laboratory. Moreover, using them as predictor variables makes prediction more expensive. In contrast, clustering samples by color does not entail higher costs, as color can be obtained from color parameters derived from spectral curves (Vis). Moreover, this is a promising method that can be used in laboratories for more accurate results in the determination of soil properties.
The potential for discrimination of multivariate statistical analysis applied to color parameters was possible due to the variability of soil characteristics for SOC content and texture. Thus, in the grouping of samples it was possible to separate 3 clusters with well-defined patterns. In a set of soils with greater variability in their features, the potential of this methodology could be even greater.

4. Conclusions

This study used color parameters derived from the visible spectrum to group soil samples in order to predict soil physical and chemical properties. Moreover, we compared the different prediction models: the models of the sample sets grouped based on color parameters, as well as the models without stratification using 260 soil samples and 95 selected samples (MWS-260 and MWS-95). Multivariate statistical analysis applied to color parameters was able to group soil samples with similar characteristics, reducing data amplitude and improving the accuracy of predictions of soil properties.
There was a reduction in prediction error with stratification, which was significant for soil organic carbon (SOC) content, followed by clay, sand and silt content. Prediction errors for SOC content using the PLSR model were on average 2.6 ± 0.3% for the MWS-260 and 2.1 ± 0.8% for clusters, representing a 19% reduction in the prediction error. However, when using the MWS with 95 selected samples, the error reduction was greater, at 36%.
For clay content, prediction errors using the PLSR model were on average 8.8 ± 0.7% for the MWS-260 and 7.8 ± 1.6% for clusters, representing an 11% reduction in prediction error. However, when using the MWS with 95 selected samples, the error reduction was greater, at 29% for the PLSR and 14% for the SVM. Overall, the PLSR model performed better than the SVM model, as confirmed by the statistical difference between RMSE results.
Therefore, the use of Vis-based-color parameters to group soil samples can be a quick and inexpensive way to increase the potential of spectroscopy to accurately predict soil physical and chemical properties.

CRediT authorship contribution statement

José Janderson Ferreira Costa: Conceptualization, Methodology, Software, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Élvio Giasson: Conceptualization, Methodology, Writing - review & editing, Supervision. Elisângela Benedet da Silva: Methodology, Software, Investigation, Writing - review & editing. João Augusto Coblinski: Software, Writing - review & editing. Tales Tiecher: Conceptualization, Methodology, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors thank the Coordination for the Improvement of Higher Education Personnel (CAPES) - Finance Code 001 for the PhD scholarship. The second author thanks the National Council for Scientific and Technological Development (CNPq) for the research productivity grant. The authors also thank the Graduate Program in Remote Sensing of the UFRGS for the use of the equipment for Vis-NIR-SWIR spectrum measurements, and the


Biodiversity Research Program, Atlantic Forest, Santa Catarina (PPBio-MA-SC) and Agricultural Research and Rural Extension Corporation of Santa Catarina (EPAGRI) for providing the data.

References

Aitkenhead, M., Coull, M., Towers, W., Hudson, G., Black, H.I.J., 2013a. Prediction of soil characteristics and colour using data from the National Soils Inventory of Scotland. Geoderma 200–201, 99–107. https://doi.org/10.1016/j.geoderma.2013.02.013.
Aitkenhead, M., Donnelly, D., Coull, M., Black, H., 2013b. E-SMART: environmental sensing for monitoring and advising in real-time. In: IFIP Advances in Information and Communication Technology, pp. 129–142. https://doi.org/10.1007/978-3-642-41151-9_13.
Allo, M., Todoroff, P., Jameux, M., Stern, M., Paulin, L., Albrecht, A., 2020. Prediction of tropical volcanic soil organic carbon stocks by visible-near- and mid-infrared spectroscopy. Catena 189, 104452. https://doi.org/10.1016/j.catena.2020.104452.
Allory, V., Cambou, A., Moulin, P., Schwartz, C., Cannavo, P., Vidal-Beaudet, L., Barthès, B.G., 2019. Quantification of soil organic carbon stock in urban soils using visible and near infrared reflectance spectroscopy (VNIRS) in situ or in laboratory conditions. Sci. Total Environ. 686, 764–773. https://doi.org/10.1016/j.scitotenv.2019.05.192.
Araújo, S.R., Wetterlind, J., Demattê, J.A.M., Stenberg, B., 2014. Improving the prediction performance of a large tropical vis-NIR spectroscopic soil library from Brazil by clustering into smaller subsets or use of data mining calibration techniques. Eur. J. Soil Sci. 65, 718–729. https://doi.org/10.1111/ejss.12165.
Dominguez, J.S., Roman, A.G., Prieto, F.G., Acevedo, O.S., 2010. Sistema de Notación Munsell y CIELab como herramienta para evaluación de color en suelos. Rev. Mex. ciencias agrícolas 3, 141–155.
Donagemma, G.K., Viana, J.H.M., Almeida, B.G. de, Ruiz, H.A., Klein, V.A., Dechen, S.C.F., Fernandes, R.B.A., 2017. Padronização de métodos para análise granulométrica no Brasil. Embrapa 3, 573 p. https://doi.org/ISSN1517-5685.
Dotto, A.C., Dalmolin, R.S.D., Grunwald, S., ten Caten, A., Pereira Filho, W., 2017. Two preprocessing techniques to reduce model covariables in soil property predictions by Vis-NIR spectroscopy. Soil Tillage Res. 172, 59–68. https://doi.org/10.1016/j.still.2017.05.008.
Dotto, A.C., Dalmolin, R.S.D., ten Caten, A., Moura-Bueno, J.M., 2016. Potential of spectroradiometry to classify soil clay content. Rev. Bras. Cienc. do Solo 40, 1–8. https://doi.org/10.1590/18069657rbcs20151105.
Duda, B.M., Weindorf, D.C., Chakraborty, S., Li, B., Man, T., Paulette, L., Deb, S., 2017. Soil characterization across catenas via advanced proximal sensors. Geoderma 298, 78–91. https://doi.org/10.1016/j.geoderma.2017.03.017.
Embrapa, 2004. Empresa Brasileira de Pesquisa Agropecuária. Solos do Estado de Santa Catarina. CD-ROM, mapa color – (Embrapa Solos. Boletim de Pesquisa e Desenvolvimento; n. 46, Rio de Janeiro.
Ferraro, M.B., Giordani, P., 2015. A toolbox for fuzzy clustering using the R programming language. Fuzzy Sets Syst. 279, 1–16. https://doi.org/10.1016/j.fss.2015.05.001.
Fischer, M., Bossdorf, O., Gockel, S., Hänsel, F., Hemp, A., Hessenmöller, D., Korte, G., Nieschulze, J., Pfeiffer, S., Prati, D., Renner, S., Schöning, I., Schumacher, U., Wells, K., Buscot, F., Kalko, E.K.V., Linsenmair, K.E., Schulze, E.D., Weisser, W.W., 2010. Implementing large-scale and long-term functional biodiversity research: The Biodiversity Exploratories. Basic Appl. Ecol. 11, 473–485. https://doi.org/10.1016/j.
Baumann, K., Schöning, I., Schrumpf, M., Ellerbrock, R.H., Leinweber, P., 2016. Rapid baae.2010.07.009.
assessment of soil organic matter: soil color analysis and Fourier transform infrared Franceschini, M.H.D., Demattê, J.A.M., Sato, M.V., Vicente, L.E., Grego, C.R., 2013.
spectroscopy. Geoderma 278, 49–57. https://doi.org/10.1016/j.geoderma.2016.05. Abordagens semiquantitativa e quantitativa na avaliação da textura do solo por
012. espectroscopia de reflectância bidirecional no VIS-NIR-SWIR. Pesqui. Agropecu. Bras.
Ben-Dor, E., Inbar, Y., Chen, Y., 1997. The reflectance spectra of organic matter in the 48, 1569–1582. https://doi.org/10.1590/S0100-204X2013001200006.
visible near-infrared and short wave infrared region (400–2500 nm) during a con- Galvãdo, L.S., Vitorello, Í., Paradella, W.R., 1995. Spectroradiometric discrimination of
trolled decomposition process. Remote Sens. Environ. 61, 1–15. https://doi.org/10. laterites with principal components analysis and additive modeling. Remote Sens.
1016/S0034-4257(96)00120-4. Environ. 53, 70–75. https://doi.org/10.1016/0034-4257(95)00040-8.
Ben Dor, E., Ong, C., Lau, I.C., 2015. Reflectance measurements of soils in the laboratory: Gholizadeh, A., Borůvka, L., Saberioon, M., Vašát, R., 2016. A memory-based learning
Standards and protocols. Geoderma 245–246, 112–124. https://doi.org/10.1016/j. approach as compared to other data mining algorithms for the prediction of soil
geoderma.2015.01.002. texture using diffuse reflectance spectra. Remote Sens. 8, 341. https://doi.org/10.
Bezdek, J.C., Ehrlich, R., Full, W., 1984. FCM: The fuzzy c-means clustering algorithm. 3390/rs8040341.
Comput. Geosci. 10, 191–203. https://doi.org/10.1016/0098-3004(84)90020-7. Gholizadeh, A., Žižala, D., Saberioon, M., Borůvka, L., 2018. Soil organic carbon and
Borcard, D., Gillet, F., Legendre, P., 2011. Numerical Ecology with R, Numerical Ecology texture retrieving and mapping using proximal, airborne and Sentinel-2 spectral
with R. https://doi.org/10.1007/978-1-4419-7976-6. imaging. Remote Sens. Environ. 218, 89–103. https://doi.org/10.1016/j.rse.2018.
Brown, D.J., Shepherd, K.D., Walsh, M.G., Dewayne Mays, M., Reinsch, T.G., 2006. 09.015.
Global soil characterization with VNIR diffuse reflectance spectroscopy. Geoderma Grinand, C., Barthès, B.G., Brunet, D., Kouakoua, E., Arrouays, D., Jolivet, C., Caria, G.,
132, 273–290. https://doi.org/10.1016/j.geoderma.2005.04.025. Bernoux, M., 2012. Prediction of soil organic and inorganic carbon contents at a
Camargo, L.A., Marques, J., Barrón, V., Alleoni, L.R.F., Barbosa, R.S., Pereira, G.T., 2015. national scale (France) using mid-infrared reflectance spectroscopy (MIRS). Eur. J.
Mapping of clay, iron oxide and adsorbed phosphate in Oxisols using diffuse re- Soil Sci. 63, 141–151. https://doi.org/10.1111/j.1365-2389.2012.01429.x.
flectance spectroscopy. Geoderma 251–252, 124–132. https://doi.org/10.1016/j. Guerrero, C., Stenberg, B., Wetterlind, J., Viscarra Rossel, R.A., Maestre, F.T., Mouazen,
geoderma.2015.03.027. A.M., Zornoza, R., Ruiz-Sinoga, J.D., Kuang, B., 2014. Assessment of soil organic
Cambule, A.H., Rossiter, D.G., Stoorvogel, J.J., Smaling, E.M.A., 2012. Building a near carbon at local scale with spiked NIR calibrations: effects of selection and extra-
infrared spectral library for soil organic carbon estimation in the Limpopo National weighting on the spiking subset. Eur. J. Soil Sci. 65, 248–263. https://doi.org/10.
Park, Mozambique. Geoderma 183–184, 41–48. https://doi.org/10.1016/j. 1111/ejss.12129.
geoderma.2012.03.011. Guerrero, C., Zornoza, R., Gómez, I., Mataix-Beneyto, J., 2010. Spiking of NIR regional
CIE, 1996. Commission Internationale de l’Éclairage [WWW Document]. Colourimetry. models using samples from target sites: effect of model size on prediction accuracy.
second ed.. Vienna CIE Publ. Geoderma 158, 66–77. https://doi.org/10.1016/j.geoderma.2009.12.021.
Dalmolin, R.S.D., Gonçalves, C.N., Klamt, E., Dick, D.P., 2005. Relação entre os con- Gupta, S., Islam, S., Hasan, M.M., 2018. Analysis of impervious land-cover expansion
stituintes do solo e seu comportamento espectral. Ciência Rural 35, 481–489. https:// using remote sensing and GIS: a case study of Sylhet sadar upazila. Appl. Geogr. 98,
doi.org/10.1590/s0103-84782005000200042. 156–165. https://doi.org/10.1016/j.apgeog.2018.07.012.
Dalmolin, R.S.D., Pedron, F. de A., Almeida, J.A. de, Curcio, G.R., 2017. Solos do Planalto Hutengs, C., Seidel, M., Oertel, F., Ludwig, B., Vohland, M., 2019. In situ and laboratory
das Araucárias. In: Curi, N., Ker, J.C., Novais, R.F., Vidal-Torrado, P., Schaefer, C.E.G. soil spectroscopy with portable visible-to-near-infrared and mid-infrared instruments
R. (Eds.), Pedologia - Solos Dos Biomas Brasileiros. Sociedade Brasileira de Ciência do for the assessment of organic carbon in soils. Geoderma 355, 113900. https://doi.
Solo, Viçosa, MG, p. 597. org/10.1016/j.geoderma.2019.113900.
Davey, B.G., Russell, J.D., Wilson, M.J., 1975. Iron oxide and clay minerals and their Jaconi, A., Don, A., Freibauer, A., 2017. Prediction of soil organic carbon at the country
relation to colours of red and yellow podzolic soils near Sydney, Australia. Geoderma scale: stratification strategies for near-infrared data. Eur. J. Soil Sci. 68, 919–929.
14, 125–138. https://doi.org/10.1016/0016-7061(75)90071-3. https://doi.org/10.1111/ejss.12485.
Demattê, J.A.M., Bellinaso, H., Araújo, S.R., Rizzo, R., Souza, A.B., 2016. Spectral re- Jaconi, A., Vos, C., Don, A., 2019. Near infrared spectroscopy as an easy and precise
gionalization of tropical soils in the estimation of soil attributes. Rev. Cienc. Agron. method to estimate soil texture. Geoderma 337, 906–913. https://doi.org/10.1016/j.
47, 589–598. https://doi.org/10.5935/1806-6690.20160071. geoderma.2018.10.038.
Demattê, J.A.M., Dotto, A.C., Paiva, A.F.S., Sato, M. V., Dalmolin, R.S.D., de Araújo, M. do Jiang, Q., Li, Q., Wang, X., Wu, Y., Yang, X., Liu, F., 2017. Estimation of soil organic
S.B., da Silva, E.B., Nanni, M.R., ten Caten, A., Noronha, N.C., Lacerda, M.P.C., de carbon and total nitrogen in different soil layers using VNIR spectroscopy: Effects of
Araújo Filho, J.C., Rizzo, R., Bellinaso, H., Francelino, M.R., Schaefer, C.E.G.R., spiking on model applicability. Geoderma 293, 54–63. https://doi.org/10.1016/j.
Vicente, L.E., dos Santos, U.J., de Sá Barretto Sampaio, E. V., Menezes, R.S.C., de geoderma.2017.01.030.
Souza, J.J.L.L., Abrahão, W.A.P., Coelho, R.M., Grego, C.R., Lani, J.L., Fernandes, A. Klein, R.M., 1978. Mapa fitogeográfico do estado de Santa Catarina. In : Reitz, R. (ed.).
R., Gonçalves, D.A.M., Silva, S.H.G., de Menezes, M.D., Curi, N., Couto, E.G., dos Flora Ilustrada Catarinense. Herbário Barbosa Rodrigues, Itajaí. 24p. Roberto Miguel
Anjos, L.H.C., Ceddia, M.B., Pinheiro, É.F.M., Grunwald, S., Vasques, G.M., Marques Klein 24.
Júnior, J., da Silva, A.J., Barreto, M.C. d. V., Nóbrega, G.N., da Silva, M.Z., de Souza, Legendre, P., Legendre, L., 1998. Numerical Ecology, 2nd edition. (Developments
S.F., Valladares, G.S., Viana, J.H.M., da Silva Terra, F., Horák-Terra, I., Fiorio, P.R., Environ. Model. 20) 24, 870 p. https://doi.org/10.1017/CBO9781107415324.004.
da Silva, R.C., Frade Júnior, E.F., Lima, R.H.C., Alba, J.M.F., de Souza Junior, V.S., Levin, N., Ben-Dor, E., Singer, A., 2005. A digital camera as a tool to measure colour
Brefin, M.D.L.M.S., Ruivo, M.D.L.P., Ferreira, T.O., Brait, M.A., Caetano, N.R., indices and related properties of sandy soils in semi-arid environments. Int. J. Remote
Bringhenti, I., de Sousa Mendes, W., Safanelli, J.L., Guimarães, C.C.B., Poppiel, R.R., Sens. 26, 5475–5492. https://doi.org/10.1080/01431160500099444.
e Souza, A.B., Quesada, C.A., do Couto, H.T.Z., 2019. The Brazilian Soil Spectral Liland, B.-H.M. and R.W. and K.H., 2013. {pls}: Partial Least Squares and Principal
Library (BSSL): A general view, application and challenges. Geoderma 354. https:// Component regression, Packages R CRAN.
doi.org/10.1016/j.geoderma.2019.05.043. Liu, S., Shen, H., Chen, S., Zhao, X., Biswas, A., Jia, X., Shi, Z., Fang, J., 2019. Estimating
Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., Weingessel, A., Leisch, M.F., 2015. Misc forest soil organic carbon content using vis-NIR spectroscopy: Implications for large-
functions of the Department of Statistics, Probability Theory Group (Formerly: scale soil carbon spectroscopic assessment. Geoderma 348, 37–44. https://doi.org/
E1071), Package: ‘e1071,’ R Software package, avaliable at http://cran.rproject.org/ 10.1016/j.geoderma.2019.04.003.
web/packages/e1071/index.html. Lucà, F., Conforti, M., Castrignanò, A., Matteucci, G., Buttafuoco, G., 2017. Effect of

13
J.J.F. Costa, et al. Computers and Electronics in Agriculture 177 (2020) 105710

calibration set size on prediction at local scale of soil carbon by Vis-NIR spectroscopy. https://doi.org/10.1007/s11430-013-4808-x.
Geoderma 288, 175–183. https://doi.org/10.1016/j.geoderma.2016.11.015. Silva, E.B., Ten Caten, A., Dalmolin, R.S.D., Dotto, A.C., Silva, W.C., Giasson, E., 2016.
Margenot, A., O’Neill, T., Sommer, R., Akella, V., 2020. Predicting soil permanganate Estimating soil texture from a limited region of the Visible/Near-Infrared Spectrum.
oxidizable carbon (POXC) by coupling DRIFT spectroscopy and artificial neural In: Hartemink, A.E., Minasny, B. (Eds.), Digital Soil Morphometrics, Progress in Soil
networks (ANN). Comput. Electron. Agric. 168, 105098. https://doi.org/10.1016/j. Science. Springer International Publishing, Cham, pp. 73–87. https://doi.org/10.
compag.2019.105098. 1007/978-3-319-28295-4.
Martínez-Carreras, N., Krein, A., Udelhoven, T., Gallart, F., Iffly, J.F., Hoffmann, L., Soil Survey Division Staff, 2017. Soil survey manual, United States Department of
Pfister, L., Walling, D.E., 2010a. A rapid spectral-reflectance-based fingerprinting Agricultur, Handbook No. 18. pp. 120–125 120–125. https://doi.org/10.1097/
approach for documenting suspended sediment sources during storm runoff events. J. 00010694-195112000-00022.
Soils Sediments 10, 400–413. https://doi.org/10.1007/s11368-009-0162-1. Spielvogel, S., Knicker, H., Kögel-Knabner, I., 2004. Soil organic matter composition and
Martínez-Carreras, N., Udelhoven, T., Krein, A., Gallart, F., Iffly, J.F., Ziebel, J., soil lightness. J. Plant Nutr. Soil Sci. 167, 545–555. https://doi.org/10.1002/jpln.
Hoffmann, L., Pfister, L., Walling, D.E., 2010b. The use of sediment colour measured 200421424.
by diffuse reflectance spectrometry to determine sediment sources: Application to the Stenberg, B., Viscarra Rossel, R.A., Mouazen, A.M., Wetterlind, J., 2010. Visible and Near
Attert River catchment (Luxembourg). J. Hydrol. 382, 49–63. https://doi.org/10. Infrared Spectroscopy in Soil Science. Adv. Agron. https://doi.org/10.1016/S0065-
1016/j.jhydrol.2009.12.017. 2113(10)07005-7.
Minasny, B., McBratney, A.B., 2006. A conditioned Latin hypercube method for sampling Stevens, A., Nocita, M., Tóth, G., Montanarella, L., van Wesemael, B., 2013. Prediction of
in the presence of ancillary information. Comput. Geosci. 32, 1378–1388. https:// soil organic carbon at the european scale by visible and near infraRed reflectance
doi.org/10.1016/j.cageo.2005.12.009. spectroscopy. PLoS ONE 8, 13 p. https://doi.org/10.1371/journal.pone.0066409.
Moreno-Ramón, H., Marqués-Mateu, Á., Ibáñez-Asensio, S., 2014. Significance of soil Stevens, A., Ramirez Lopez, L., 2014. An introduction to the prospectr package 1–22.
lightness versus physicochemical soil properties in semiarid areas. Arid L. Res. Stumpf, F., Schmidt, K., Behrens, T., Schönbrodt-Stitt, S., Buzzo, G., Dumperth, C.,
Manag. 28, 371–382. https://doi.org/10.1080/15324982.2014.882871. Wadoux, A., Xiang, W., Scholten, T., 2016. Incorporating limited field operability and
Mouazen, A.M., Karoui, R., Deckers, J., De Baerdemaeker, J., Ramon, H., 2007. Potential legacy soil samples in a hypercube sampling design for digital soil mapping. J. Plant
of visible and near-infrared spectroscopy to derive colour groups utilising the Munsell Nutr. Soil Sci. 179, 499–509. https://doi.org/10.1002/jpln.201500313.
soil colour charts. Biosyst. Eng. 97. https://doi.org/10.1016/j.biosystemseng.2007. Terra, F.S., Demattê, J.A.M., Viscarra Rossel, R.A., 2018. Proximal spectral sensing in
03.023. pedological assessments: vis–NIR spectra for soil classification based on weathering
Moura-Bueno, J.M., Dalmolin, R.S.D., ten Caten, A., Dotto, A.C., Demattê, J.A.M., 2019. and pedogenesis. Geoderma 318, 123–136. https://doi.org/10.1016/j.geoderma.
Stratification of a local VIS-NIR-SWIR spectral library by homogeneity criteria yields 2017.10.053.
more accurate soil organic carbon predictions. Geoderma 337, 565–581. https://doi. Tiecher, T., Caner, L., Minella, J.P.G., dos Santos, D.R., 2015. Combining visible-based-
org/10.1016/j.geoderma.2018.10.015. color parameters and geochemical tracers to improve sediment source discrimination
Mulder, V.L., de Bruin, S., Schaepman, M.E., 2012. Representing major soil variability at and apportionment. Sci. Total Environ. 527–528, 135–149. https://doi.org/10.1016/
regional scale by constrained Latin Hypercube Sampling of remote sensing data. Int. j.scitotenv.2015.04.103.
J. Appl. Earth Obs. Geoinf. 21, 301–310. https://doi.org/10.1016/j.jag.2012.07.004. Uttley, J., 2019. Power analysis, sample size, and assessment of statistical
Munsell Soil Color Charts, 2000. Munsell Soil Color Charts (revised). Munsell Color. assumptions—Improving the evidential value of lighting research. LEUKOS – J. Illum.
Murti, G.S.R.K., Satyanarayana, K.V.S., 1971. Influence of chemical characteristics in the Eng. Soc. North Am. https://doi.org/10.1080/15502724.2018.1533851.
development of soil colour. Geoderma 5, 243–248. https://doi.org/10.1016/0016- Vasava, H.B., Gupta, A., Arora, R., Das, B.S., 2019. Assessment of soil texture from
7061(71)90013-9. spectral reflectance data of bulk soil samples and their dry-sieved aggregate size
Nanni, M.R., Cezar, E., da Silva Junior, C.A., Silva, G.F.C., da Silva Gualberto, A.A., 2017. fractions. Geoderma 337, 914–926. https://doi.org/10.1016/j.geoderma.2018.11.
Partial least squares regression (PLSR) associated with spectral response to predict 004.
soil attributes in transitional lithologies. Arch. Agron. Soil Sci. 00, 1–14. https://doi. Vasques, G.M., Grunwald, S., Harris, W.G., 2010. Spectroscopic models of soil organic
org/10.1080/03650340.2017.1373185. carbon in Florida, USA. J. Environ. Qual. 39, 923–934. https://doi.org/10.2134/
Nanni, M.R., Povh, F.P., Alexandre, J., Demattê, M., Berti, R., 2011. Optimum size in grid jeq2009.0314.
soil sampling for variable rate application in site-specific management 386–392. Vendrame, P.R.S., Marchão, R.L., Brunet, D., Becquer, T., 2012. The potential of NIR
https://doi.org/https://doi.org/10.1590/S0103-90162011000300017. spectroscopy to predict soil texture and mineralogy in Cerrado Latosols. Eur. J. Soil
Nawar, S., Mouazen, A.M., 2018. Optimal sample selection for measurement of soil or- Sci. 63, 743–753. https://doi.org/10.1111/j.1365-2389.2012.01483.x.
ganic carbon using on-line vis-NIR spectroscopy. Comput. Electron. Agric. 151, Vianna, L.F. de N., Silva, E.B. da, Massignam, A.M., Oliveira, S.N. de, 2015. Aplicação de
469–477. https://doi.org/10.1016/j.compag.2018.06.042. descritores de heterogeneidade ambiental na seleção de áreas para sistemas de par-
Nocita, M., Stevens, A., Toth, G., Panagos, P., van Wesemael, B., Montanarella, L., 2014. celas amostrais: um estudo de caso para a determinação de rotspots potenciais de
Prediction of soil organic carbon content by diffuse reflectance spectroscopy using a biodiversidade. Geografia 40, 211–239.
local partial least square regression approach. Soil Biol. Biochem. 68, 337–347. Viscarra Rossel, R.A., Behrens, T., Ben-Dor, E., Brown, D.J., Demattê, J.A.M., Shepherd,
https://doi.org/10.1016/j.soilbio.2013.10.022. K.D., Shi, Z., Stenberg, B., Stevens, A., Adamchuk, V., Aïchi, H., Barthès, B.G.,
O’Rourke, S.M., Holden, N.M., 2011. Optical sensing and chemometric analysis of soil Bartholomeus, H.M., Bayer, A.D., Bernoux, M., Böttcher, K., Brodský, L., Du, C.W.,
organic carbon – a cost effective alternative to conventional laboratory methods? Soil Chappell, A., Fouad, Y., Genot, V., Gomez, C., Grunwald, S., Gubler, A., Guerrero, C.,
Use Manag. 27, 143–155. https://doi.org/10.1111/j.1475-2743.2011.00337.x. Hedley, C.B., Knadel, M., Morrás, H.J.M., Nocita, M., Ramirez-Lopez, L., Roudier, P.,
Pabón, R.E.C., de Souza Filho, C.R., de Oliveira, W.J., 2019. Reflectance and imaging Campos, E.M.R., Sanborn, P., Sellitto, V.M., Sudduth, K.A., Rawlins, B.G., Walter, C.,
spectroscopy applied to detection of petroleum hydrocarbon pollution in bare soils. Winowiecki, L.A., Hong, S.Y., Ji, W., 2016. A global spectral library to characterize
Sci. Total Environ. 649, 1224–1236. https://doi.org/10.1016/j.scitotenv.2018.08. the world’s soil. Earth-Sci. Rev. 155, 198–230. https://doi.org/10.1016/j.earscirev.
231. 2016.01.012.
Pinheiro, É., Ceddia, M., Clingensmith, C., Grunwald, S., Vasques, G., 2017. Prediction of Viscarra Rossel, R.A., Cattle, S.R., Ortega, A., Fouad, Y., 2009. In situ measurements of
soil physical and chemical properties by visible and near-infrared diffuse reflectance soil colour, mineral composition and clay content by vis-NIR spectroscopy. Geoderma
spectroscopy in the Central Amazon. Remote Sens. 9, 293. https://doi.org/10.3390/ 150, 253–266. https://doi.org/10.1016/j.geoderma.2009.01.025.
rs9040293. Viscarra Rossel, R.A., Chen, C., 2011. Digitally mapping the information content of
Post, D.F., Lucas, W.M., White, S.A., Ehasz, M.J., Batchily, A.K., Horvath, E.H., 1994. visible-near infrared spectra of surficial Australian soils. Remote Sens. Environ. 115,
Relations between soil color and landsat reflectance on semiarid rangelands. Soil Sci. 1443–1455. https://doi.org/10.1016/j.rse.2011.02.004.
Soc. Am. J. 58, 1809. https://doi.org/10.2136/sssaj1994.03615995005800060033x. Viscarra Rossel, R.A., Fouad, Y., Walter, C., 2008. Using a digital camera to measure soil
R Development Core Team, 2017. A Language and Environment for Statistical organic carbon and iron contents. Biosyst. Eng. 100, 149–159. https://doi.org/10.
Computing. R Found. Stat. Comput. 2, https://www.R-project.org. https://doi.org/R 1016/j.biosystemseng.2008.02.007.
Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL Viscarra Rossel, R.A., Minasny, B., Roudier, P., McBratney, A.B., 2006. Colour space
http://www.R-project.org. models for soil science. Geoderma 133, 320–337. https://doi.org/10.1016/j.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Demattê, J.A.M., Scholten, T., geoderma.2005.07.017.
2013. The spectrum-based learner: A new local approach for modeling soil vis-NIR Viscarra Rossel, R.A., Walter, C., Fouad, Y., 2003. Assessment of two reflectance tech-
spectra of complex datasets. Geoderma 195–196, 268–279. https://doi.org/10.1016/ niques for the quantification of the within-field spatial variability of soil organic
j.geoderma.2012.12.014. carbon. Precis. Agric. 697–703.
Romero, D.J., Ben-Dor, E., Demattê, J.A.M., Souza, A.B. e., Vicente, L.E., Tavares, T.R., Viscarra Rossel, R.A., Webster, R., 2012. Predicting soil properties from the Australian
Martello, M., Strabeli, T.F., da Silva Barros, P.P., Fiorio, P.R., Gallo, B.C., Sato, M.V., soil visible-near infrared spectroscopic database. Eur. J. Soil Sci. 63, 848–860.
Eitelwein, M.T., 2018. Internal soil standard method for the Brazilian soil spectral https://doi.org/10.1111/j.1365-2389.2012.01495.x.
library: Performance and proximate analysis. Geoderma 312, 95–103. https://doi. Vodyanitskii, Y.N., Savichev, A.T., 2017. The influence of organic matter on soil color
org/10.1016/j.geoderma.2017.09.014. using the regression equations of optical parameters in the system CIE- L*a*b*. Ann.
Schanda, J., 2007. Colorimetry: Understanding the CIE System. John Wiley & Sons Inc, Agrar. Sci. 15, 380–385. https://doi.org/10.1016/j.aasci.2017.05.023.
Hoboken, New Jersey. WallkillColor, 2019. Munsell Conversion Software. http://wallkillcolor.com/. Accessed
Shi, Z., Ji, W., Viscarra Rossel, R.A., Chen, S., Zhou, Y., 2015. Prediction of soil organic October 18, 2019.
matter using a spatially constrained local partial least squares regression and the Wildner, W., Camozzato, E., Toniolo, J.A., Binotto, R.B., Iglesias, C.M.F., Laux, J.H., 2014.
Chinese vis-NIR spectral library. Eur. J. Soil Sci. 66, 679–687. https://doi.org/10. Mapa geológico do Estado de Santa Catarina. Porto Alegre: CPRM, 2014. Escala
1111/ejss.12272. 1:500.000. Geologia do Brasil e de cartografia geológica regional. [WWW
Shi, Z., Wang, Q.L., Peng, J., Ji, W.J., Liu, H.J., Li, X., Viscarra Rossel, R.A., 2014. Document]. Cia. Pesqui. Recur. Minerais, Ministério Minas e Energia, Serviço
Development of a national VNIR soil-spectral library for soil classification and pre- Geológico do Bras. URL http://geobank.cprm.gov.br/.
diction of organic matter concentrations. Sci. China Earth Sci. 57, 1671–1680. Wu, K.L., Yang, M.S., 2005. A cluster validity index for fuzzy clustering. Pattern Recognit.

14
J.J.F. Costa, et al. Computers and Electronics in Agriculture 177 (2020) 105710

Lett. 26, 1275–1291. https://doi.org/10.1016/J.PATREC.2004.11.022. https://doi.org/10.2136/sssaj2016.08.0253.


Xu, D., Ma, W., Chen, S., Jiang, Q., He, K., Shi, Z., 2018. Assessment of important soil Zhao, D., Zhao, X., Khongnawang, T., Arshad, M., Triantafilis, J., 2018. A Vis-NIR spectral
properties related to Chinese Soil Taxonomy based on vis–NIR reflectance spectro- library to predict clay in Australian cotton growing soil. Soil Sci. Soc. Am. J. 82,
scopy. Comput. Electron. Agric. 144, 1–8. https://doi.org/10.1016/j.compag.2017. 1347–1357. https://doi.org/10.2136/sssaj2018.03.0100.
11.029. Zobeck, T.M., Baddock, M., Scott Van Pelt, R., Tatarko, J., Acosta-Martinez, V., 2013. Soil
Zhang, Y., Biswas, A., Ji, W., Adamchuk, V.I., 2017. Depth-specific prediction of soil property effects on wind erosion of organic soils. Aeolian Res. 10, 43–51. https://doi.
properties in situ using vis-NIR spectroscopy. Soil Sci. Soc. Am. J. 81, 993–1004. org/10.1016/j.aeolia.2012.10.005.

15

You might also like