

Geomorphology 367 (2020) 107305


Geomorphology


Digital mapping of soil parent material in a heterogeneous tropical area


Benito R. Bonfatti, José A.M. Demattê ⁎, Karina P.P. Marques, Raul R. Poppiel, Rodnei Rizzo,
Wanderson de S. Mendes, Nelida E.Q. Silvero, José L. Safanelli
Department of Soil Science, Luiz de Queiroz College of Agriculture, University of São Paulo, Ave. Pádua Dias, 11, Postal Box 09, Piracicaba, São Paulo 13416-900, Brazil

Article info

Article history:
Received 7 November 2019
Received in revised form 9 May 2020
Accepted 13 June 2020
Available online 17 June 2020

Keywords:
Parent material
Soil formation factors
Bare soil image
Machine learning

Abstract

Parent material is one of the five factors in soil formation. Studies on parent material allow interpreting soil genesis processes and improve our knowledge of specific soil attributes. However, soil parent material maps at detailed cartographic scale (finer than 1:100,000) are rare in tropical areas, and parent material is usually inferred from poorly detailed geological data, which generally group different lithologies into single units. Thus, we propose a methodology to map soil parent material based on remote sensing and machine learning in a geologically very complex area. The study site covers 1378 km² in São Paulo State, Brazil. Prediction models used data from 280 geological observation points, a digital elevation model (spatial resolution of 5 m, upscaled to 30 m) and multitemporal Landsat images in a range of 30 years. We evaluated six classification algorithms, namely random forest, decision tree, support vector machine, multinomial logistic regression, K-means (unsupervised classification), and object-based image analysis with maximum likelihood classification. Environmental covariates were grouped to create different scenarios combining terrain derivatives, hydrologic covariates, topsoil spectral reflectance, and spatial coordinates. A bare soil image, elaborated using 30 years of Landsat data, was evaluated as a covariate to predict soil parent material. Predictions were validated using three different strategies: cross-validation, a separate validation dataset (20%), and comparison with legacy geological maps (information from two areas with geological maps at fine scale). We also assessed the correspondence between the map of predicted soil parent material and data of soil particle size from 571 soil sampling points. The random forest algorithm presented the best validation performance, whereas the group of terrain derivatives and hydrologic covariates explained most of the model variation. The produced parent material map was coherent with the spatial distribution of soil particle size across the study area.
© 2020 Elsevier B.V. All rights reserved.

1. Introduction

From a pedological perspective, parent material is an important factor of soil formation. Physical and chemical weathering processes, influenced by other soil formation factors (climate, relief, organisms, and time) and soil dynamics, originate small particles that form soils (Jenny, 1941). Although its prevailing importance is reduced over time (Wilson, 2019), parent material is still one of the main factors acting in thin soil profiles, in soils with recent origin, or in deep soil horizons (Schaetzl and Anderson, 2005).

Several soil properties can be directly related to parent material (Gray et al., 2016). For instance, distribution of soil particle size is greatly influenced by mineral formation of the parent material. Tropical soils developed from basalt are likely to present higher clay contents than those developed from sandstone (Birkeland, 1999; Schaetzl and Anderson, 2005). In addition, the mineral composition influences the weathering rate and can help indicate the relative soil age (Jenny, 1941; Schaetzl and Anderson, 2005). Revealing the spatial distribution of parent material supports the delineation and characterization of soil classes, as well as the evaluation of the rate of soil formation (McBratney et al., 2003; Willgoose, 2018).

A common issue when analyzing the spatial distribution of parent materials is the rareness of supporting maps in an appropriate representation (Lacoste et al., 2011). Therefore, geological maps can be alternatively used (Zeraatpisheh et al., 2017; Bogunovic et al., 2018; Poppiel et al., 2019; Silvero et al., 2019). Nevertheless, we can note subtle differences between geological maps and soil parent material maps, which can be observed due to the different purposes. In geological maps, lithologies are grouped according to a number of complex criteria (chronology, structures, environments of sedimentation, evolution processes of the Earth's crust); whereas, soil parent material maps aim to understand the sources of soil formation and its influence, where lithologies or deposited material are analyzed

⁎ Corresponding author at: Department of Soil Science, Luiz de Queiroz College of Agriculture, University of São Paulo, Ave. Pádua Dias, 11, P.O. Box 09, Piracicaba, São Paulo 13416-900, Brazil.
E-mail address: jamdemat@usp.br (J.A.M. Demattê).

https://doi.org/10.1016/j.geomorph.2020.107305
0169-555X/© 2020 Elsevier B.V. All rights reserved.
2 B.R. Bonfatti et al. / Geomorphology 367 (2020) 107305

according to pedological knowledge. Moreover, geological maps represent information about lithologies from surficial or deep layers of the Earth's crust, which are usually grouped into units of rocks formed from similar geological periods. In contrast, parent material maps represent reworked surface lithology, from which soils are formed. For the latter, map delineation in single units is preferable, to allow a preliminary analysis of the influence of each type of parent material in soil formation. Gray and Murphy (1999) classified parent material into in-situ bedrock, secondary transported material, in-situ pedogenic material (an old soil surface), organic material (as peat or alpine humus), and anthropogenic material (landfill or mining waste).

Digital methods have been used to map soil parent material. Heung et al. (2014) mapped it for an area in Canada, using polygon disaggregation, a random forest classifier, and 27 topographic attributes. Lacoste et al. (2011) mapped soil parent material in northwestern France using 17 covariates, such as digital elevation model (DEM) derivatives, geological maps, gamma-ray spectrometry, and a land use map. Miller et al. (2008) created a Quaternary geological map for an area in the USA, linking geological units to the National Cooperative Soil Survey (NCSS) maps, and compared their results with Quaternary geological maps produced by geologists. Prokopovich (1984) conducted similar studies using agricultural soil survey maps to produce engineering geologic maps for an area in California (USA), and Florea et al. (2015) also took a similar approach to map soil parent materials in Romania. Richter et al. (2019) mapped soil parent material in the Arkansas River Valley (USA), using terrain derivatives and separating parent material in erosional and depositional areas. They used a rule-based approach, combining threshold values for the topographic position index (TPI), multi-resolution valley bottom flatness (MRVBF) and vertical distance to channel network (VDCN). Dobos et al. (2013) mapped soil parent material using MODIS images from visible to thermal spectral bands, coupled with terrain attributes to separate regions with consolidated and unconsolidated materials across Central Europe, based on a maximum likelihood supervised classification algorithm.

The use of digital mapping elements such as machine learning algorithms and satellite-based covariates has emerged as an important alternative to spatially predict parent materials. However, a relevant issue arises when using satellite images. Vegetation covers most of the soil surface, particularly in tropical areas with intensive agriculture or dense natural vegetation. This imposes significant limitations when predicting surface targets, because vegetation hides soil reflectance from the satellite sensors. An alternative is using bare soil images, which can be elaborated based on multitemporal processing. This technique provides information about topsoil reflectance in areas without continuous coverage, which can occur at any moment within the defined time series of the satellite images. It produces a synthetic soil image (SYSI) by mining satellite data to retrieve bare soil pixels using index-based classification rules (Demattê et al., 2018). To date, SYSIs have seldom been used as a predictor for mapping geology or soil parent material.

Therefore, we expect that covariates obtained from optical satellite imagery, coupled with terrain derivatives, have potential to indicate the spatial distribution of different parent materials across a highly complex geological area. In this study, we aimed to evaluate a digital mapping framework to predict soil parent materials based on remote sensing data coupled with machine learning and geoprocessing methods. In addition, we aimed to: a) evaluate the importance of different groups of covariates to predict parent materials; b) investigate the effectiveness of using bare topsoil reflectance from SYSIs to improve model predictions; and c) compare predictions obtained from different machine learning methods.

2. Study area and geological background

The study was conducted in the municipality of Piracicaba, which covers an area of 1378 km² in São Paulo State, Brazil (Fig. 1). Climate is temperate and warm with dry winters (Cwa of the Köppen climate classification system). Annual precipitation varies between 1100 and 1450 mm. Average temperature ranges from 22 °C to 24 °C. Predominant vegetation is the Brazilian Savanna, currently converted to agricultural uses (Alvares et al., 2013; Instituto Geográfico e Geológico, 1965). The Piracicaba River is the main river, flowing in E–W direction. The municipality of Piracicaba has diversified geology, where several types of rocks and unconsolidated sediments are found (Fig. 2).

Fundamental stratigraphic units in Brazil are named Formations, which can be clustered into Groups or divided into Members (Petri et al., 1986). Top layers in the stratigraphic profile were formed during the Cenozoic Era (Fig. 3), where the oldest sediments were deposited during the Pleistocene and the most recent ones during the Holocene, across alluvial plains (Instituto Geográfico e Geológico, 1965). Lowering of sea level during the Holocene resulted in sand, gravel and clay deposits mainly in banks of the Piracicaba River (Instituto Geográfico e Geológico, 1965; Bjornberg and Landim, 1966). The main Cenozoic layer in the municipality of Piracicaba comprises the Rio Claro Formation, represented by a fluvial system in a humid climate (Loreti Junior et al., 2014). Poorly-sorted yellow and reddish sandstones with cross-stratification and conglomerates (de Melo, 1995) comprise the formation. Lakes and gullies are common across the area (Del Roveri, 2010).

The Serra Geral Formation originated during the Mesozoic Era, composed of igneous rocks (Loreti Junior et al., 2014), such as diabases and basalts, with occurrences of intertrap sandstones. The Botucatu Formation, also deposited during the Mesozoic Era, consists mainly of sandstones. Geology is represented by reddish sandstones, medium to fine texture with cross-stratification. Sandstones of the Botucatu Formation resulted from accumulation of sediments caused by intense winds, also with contribution of river processes (Carvalho, 1954), with fluvial-aeolian interaction (Côrtes and Perinotto, 2015). The Piramboia Formation was deposited during the Mesozoic Era, and is composed mainly of sandstones. It was deposited in dunes due to the action of winds, and interspersed with fluvial deposits (Caetano-Chang and Tai, 2003).

Regional substrate formed during the Paleozoic Era consists mostly of the Corumbataí Formation, Irati Formation, Tatuí Formation, and Itararé Group. The Corumbataí Formation comprises clayey siltstones, siltstones and variegated shales, and flint. Loreti Junior et al. (2014) studied the material influence on ceramic products, classifying the lithology in clayey siltstones and sandy siltstones. The Irati Formation has different lithologies, as bituminous shales, black shales (not bituminous), limestones, silicified limestones, dolomites, gley siltstones, flint, and occasionally sandstones. The Tatuí Formation has a glacial and fluvio-glacial origin, whose sediments were deposited in aqueous non-marine environments (Instituto Geográfico e Geológico, 1964). The main rocks found are sandstones, siltstones, tillites, and varvites. Diabase dikes are common in the region. The Itararé Group consists of five to six levels of tillites, interspersed with conglomerates and sandstones (Instituto Geográfico e Geológico, 1964), deposited during glacial thaw periods (Loreti Junior et al., 2014). The Itararé Formation is a glacial complex, whereas the Tatuí Formation is a post-glacial complex.

3. Material and methods

3.1. Parent material dataset

A total of 280 georeferenced sampling points, representing geological features, were acquired from three databases. Among them, 128 points were from a database of the Geological Survey of Brazil (Portuguese acronym CPRM), based on rocky outcrops; 125 points were from the “Pólo Cerâmico de Santa Gertrudes” project (Loreti Junior et al., 2014), with observations in areas with contacts between outcropping geological units; and 27 points were from Vidal-Torrado and Lepsch (1999) and Marques et al. (2018) in the District of Tupi, based on soil samples with information about parent material collected in

Fig. 1. Location of the study area in the municipality of Piracicaba, São Paulo State, Brazil. The Digital Elevation Model represents the distribution of different elevations. Yellow circles represent the observation points of the parent material used in this work.

areas of five different geomorphological classes (summits, shoulders, backslopes, footslopes, toeslopes) and different lithologies.

The corresponding surface stratigraphic unit was assigned to each of the 280 sampling points to differentiate parent materials belonging to the different stratigraphic formations, members or groups, which could suggest differently derived soil properties. For instance, sandstones of the Botucatu Formation, originated from eolian dune deposition, have different composition from sandstones of the Itararé Formation with glacial origin. This contrast results in soils with different properties inherited from their parent material. Accordingly, we categorized the sampling points into 11 classes: Alluvial deposits, Sandstones of the Piramboia Formation, Sandstones of the Botucatu Formation, Sandstones of the Itararé Group, Sandstones of the Rio Claro Formation, Unconsolidated clay, Basalts of the Serra Geral Formation, Shales of the Irati Formation, Siltstones of the Tatuí Formation, Sandy siltstones of the Corumbataí Formation, and Clayey siltstones of the Corumbataí Formation. Information about stratigraphic units was obtained from available geological maps (Instituto Geográfico e Geológico, 1966; Loreti Junior et al., 2014; Marques et al., 2018).

3.2. Synthetic soil image (SYSI)

We acquired the dataset of Landsat 4, 5, 7 and 8 imagery (USGS, 2018a, 2018b) from 1982 to 2019, and harmonized its bands to obtain reflectance of bare topsoil surface across the study area, applying the Geospatial Soil Sensing System (Demattê et al., 2018). This data mining technique produces a SYSI, using classification rules to retrieve bare soil pixels and mask non-bare soil pixels (water bodies, burned areas, natural vegetation, and straw) from denser satellite time series, resulting in six multispectral bands (three Vis, one NIR, two SWIR). SYSI gaps or non-bare soil pixels were filled by interpolation using ordinary kriging, after fitting a spherical function to each empirical semivariogram, in order to obtain a spatially continuous image.

For comparison, we also acquired a single-date Landsat 8 image (June 20, 2018, with the minimum cloud cover). A true color composite was implemented with both band sets to visually compare differences between the images. The datapoints were intersected with the SYSI and the Landsat bands, obtaining reflectance values on each band for each point. To minimize band intercorrelation and simplify the model,

Fig. 2. Geological map across the municipality of Piracicaba (Instituto Geográfico e Geológico, 1966).

a Principal Component Analysis (PCA) was implemented in both band sets. The principal components (PCs) tend to concentrate information in the first component, which has the greatest variance. The second principal component has the second most variance not described by the first and so forth (Richards and Jia, 2006; ESRI, 2014). We compared the statistical strength of association between parent material and the SYSI bands, and between parent material and the single-date Landsat image, using the statistics described in the following section.

3.3. Environmental covariates and the statistical association with parent material

The 32 terrain derivatives and hydrologic variables used as parent material predictors (Table 1) were obtained from a DEM, originally at 5 m spatial resolution and scaled up to 30 m. Landscape attributes (covariates 4 to 30) were calculated using SAGA GIS software. Geomorphological classes (covariates 31 and 32) were obtained by two landform classification algorithms. The first was performed using R software, delineating five geomorphological classes, as described by Marques et al. (2018). The second was obtained using the LandMapR software, resulting in 15 geomorphological classes (MacMillan, 2003). Spatial coordinates X and Y were obtained at each datapoint using the coordinate reference system EPSG code 31983 (SIRGAS 2000 / UTM zone 23 K south). We used topsoil reflectance from SYSIs to calculate a set of covariates: principal components using six spectral bands (from 33 to 38), mineral indices (from 39 to 43) and vegetation indices (from 44 to 46) (Table 2).

Statistical strength of association between each covariate and soil parent material was analyzed using R software (R Core Team, 2019). Given that soil parent material is a categorical variable and the environmental covariates encompass categorical and continuous variables, specific methods need to be applied for each condition. By using R software, we calculated a degree of association between a continuous and a categorical variable taking the r² value of a simple linear regression. Association between two categorical covariates, such as geomorphological classes and parent material, was quantified based on the Cramer's V test (Gingrich, 1992; Kearney, 2017) using the cramersV function of R (Navarro, 2015).

3.4. Prediction models

We split the sampling points into two groups: 224 points (80%) for the calibration dataset and 56 points (20%) for the validation dataset, balancing the set of points to cover all geological units (Instituto Geográfico e Geológico, 1966; Loreti Junior et al., 2014; Marques et al., 2018). Then, the calibration dataset was used in six classification algorithms: 1) random forest (RF), 2) decision trees (DT), 3) support vector machine (SVM), 4) multinomial logistic regression (MLR), 5) K-means clustering (KM), and 6) object-based image analysis with maximum likelihood classification (OBIA-ML).
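The 80/20 split balanced over geological units can be illustrated with a minimal Python sketch. This is not the authors' procedure (the paper reports using R and does not give the splitting code); the function name `stratified_split`, the random seed, and the per-class point counts are hypothetical and chosen only so that the totals match the 280, 224 and 56 points mentioned in the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction=0.2, seed=0):
    """Split indices into calibration and validation sets, class by class.

    Each class contributes roughly `validation_fraction` of its points to the
    validation set. Classes with more than one point keep at least one point
    in each subset.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    calibration, validation = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_val = max(1, round(len(members) * validation_fraction))
        if len(members) > 1:
            n_val = min(n_val, len(members) - 1)
        validation.extend(members[:n_val])
        calibration.extend(members[n_val:])
    return sorted(calibration), sorted(validation)


# Hypothetical class sizes for 11 parent material classes (sum = 280);
# the real per-class counts are not given in the text.
class_sizes = [40, 30, 30, 25, 25, 25, 25, 20, 20, 20, 20]
labels = [f"class_{k}" for k, n in enumerate(class_sizes) for _ in range(n)]

cal_idx, val_idx = stratified_split(labels)
print(len(cal_idx), len(val_idx))
```

Sampling within each class, rather than over the pooled points, is what keeps every parent material class represented in both subsets.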

Fig. 3. Stratigraphic profile of the Peripheral Depression in São Paulo State, where the study area is located. The stratigraphic column is divided in layers corresponding to different
geological eras and periods. The central column indicates the estimated average thickness (m) of each layer.
Adapted from Zaine and Perinotto (1996).

RF is a method that correlates independent and dependent variables by sets of classification trees. Tree classification comprises leaf nodes (output variable) and branches with a set of rules. It uses many classifiers and aggregates the results. Thus, it tends to be more accurate than base classifiers (Han et al., 2012). DT is a classifier based on binary trees, where an initial dataset (or a root node) is split into two descendant subsets, which also are split into other subsets. The process is repeated for each subset until it reaches the terminal subsets or leaf nodes as the end product, designated by a class label (Breiman, 1984). SVM is an algorithm that uses an optimal hyperplane to discriminate the data (Vapnik, 1998). In a linear SVM, data can be separated into two classes by a line. Support vectors are the points closest to the decision boundary represented by the line, and determine the margin into which two classes are separated. SVM finds the separating hyperplane for which the margin between the samples is maximized (Rao and Scherer, 2010). MLR is an extension of logistic regression and can deal with three or more unique values for the dependent variable (Hosmer and Lemeshow, 2000). Prediction is based on probability of occurrence of each class, using a non-linear logistic curve varying from 0 to 1. KM is a method almost completely automatic (unsupervised classification) and assumes that the number of clusters is previously known, keeping the number fixed during the entire classification process (Meneses and Almeida, 2012). In OBIA-ML, the image is firstly segmented into homogeneous areas. Then, some segments are selected for training and used by an algorithm, such as ML classification, to find statistics for each class. Subsequently, ML calculates the probability that each segment belongs to each class. Finally, the highest probability class is assigned to each segment (Meneses and Almeida, 2012).

To determine the influence of environmental covariates, four scenarios were evaluated combining different predictors and classification algorithms. Scenario 1 used terrain derivatives and hydrologic variables as predictors of soil parent material coupled with four classifiers: RF, DT, SVM and MLR. For Scenario 2, we used terrain attributes, hydrologic variables, and spectral covariates (SYSIs and their derivatives) as predictors, with the same four classifiers as Scenario 1. Scenario 3 comprised terrain derivatives, hydrologic variables, spectral covariates, and spatial coordinates as predictors, using the same classification methods of previous scenarios. Scenario 4 used only the principal components derived from SYSIs (Table 2) as predictors, coupled with KM and OBIA-ML classifiers. Scenarios 1 to 3 helped to analyze model improvement when

Table 1
Terrain derivatives and hydrologic variables derived from a DEM used for prediction of soil parent material across the municipality of Piracicaba, São Paulo State, Brazil.

No. | Variable | Data description | Type | Mean (Min–Max) | Unit

1 | Elevation | Elevation above mean sea level | Numeric | 530.58 (450.42–774.11) | m
2 | Coordinate X | UTM Latitude | Numeric | 212,901 (181,596–244,206) | m
3 | Coordinate Y | UTM Longitude | Numeric | 7,485,769 (7,464,439–7,507,099) | m
4 | Slope | Local hill slope gradient | Numeric | 7.8 (0–84.6) | %
5 | Aspect | Slope aspect | Numeric | 186.74 (0–360) | °
6 | Analytical Hillshading | Angle between the surface and the incoming light beams | Numeric | 0.79 (0.22–1.63) | rad
7 | Topographic Wetness Index (TWI) | Indicator of spatial distribution and extent of zones of water saturation | Numeric | 8.08 (3.29–25.89) | –
8 | SAGA Wetness Index | TWI to identify flow pattern in flat areas, with small difference in altitude | Numeric | 6.2 (−0.96–12.1) | –
9 | LS Factor | Slope length factor | Numeric | 1.02 (0–23.4) | –
10 | Vertical Distance to Channel Network | Altitude above channel network | Numeric | 25.18 (0–247.98) | m
11 | Valley Depth | Relative position of the valley | Numeric | 32.4 (0–163.27) | m
12 | Slope Height | Vertical distance from the base of the slope to the crest | Numeric | 18.26 (2.24–199.22) | m
13 | Normalized Height | Height position within a reference area | Numeric | 0.5 (0.026–0.99) | –
14 | Standardized Height | Normalized height multiplied with absolute height | Numeric | 271.01 (13.95–762.47) | m
15 | Mid Slope Position | Assigns mid-slope positions with 0. Maximum vertical distances to the mid-slope in valley or crest = 1. | Numeric | 0.48 (0–0.98) | –
16 | Flow Direction | Direction of the flow | Numeric | 4 (0–7) | –
17 | Catchment Slope | Slope to calculate SAGA Wetness Index | Numeric | 0.07 (0–0.66) | rad
18 | Modified Catchment Area | Catchment area based on slope angle and neighboring specific catchment areas | Numeric | 175,186.43 (935.8–7,343,567.5) | m²
19 | Multiresolution index of valley bottom flatness (MRVBF) | Indicator of valley bottoms based on flat low-lying areas | Numeric | 1.11 (0–5.99) | –
20 | Multiresolution index of the ridge top flatness (MRRTF) | Indicator of ridge tops based on elevation with respect to the surrounding areas | Numeric | 0.83 (0–5.98) | –
21 | Overland Flow Distance to Channel Network | Distance from non-channel cells to channel cells | Numeric | 616.15 (0–4243.02) | m
22 | Cross-Sectional Curvature | Tangential curvature | Numeric | 0 (0–0.01) | (1/100)*m
23 | Longitudinal Curvature | Profile curvature | Numeric | 0 (0–0.01) | (1/100)*m
24 | Convergence Index | Convergence/divergence regarding to overland flow | Numeric | 0 (−89.72–95.07) | °
25 | Convexity | Terrain surface convexity | Numeric | 54.3 (17.92–91.69) | %
26 | Topographic Position Index (TPI) | Compare elevation of each cell to the neighborhood | Numeric | 0 (−20.44–24.2) | m
27 | Mass Balance Index (MBI) | Balance between soil mass deposited and eroded | Numeric | 0.01 (−0.8–1.09) | –
28 | Flow Accumulation | Upslope contributing (catchment) area | Numeric | 2217 (1–2,046,304) | n° cells
29 | Vector Ruggedness Measure (VRM) | Measures terrain ruggedness | Numeric | 0 (0–0.12) | –
30 | Channel Network Base Level | A grid with interpolated channel network base level elevations | Numeric | 505.5 (452.11–637.73) | m
31 | Geomorphology 1 | Landform classification | Categorical | 5 classes | –
32 | Geomorphology 2 | Landform classification | Categorical | 15 classes | –
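Section 3.3 measures how strongly each covariate in Table 1 is associated with the categorical parent material class. A minimal Python sketch of the two statistics named there follows. It assumes that the r² for a continuous covariate comes from regressing that covariate on class membership, in which case r² equals the between-class sum of squares divided by the total sum of squares. The Cramér's V below is the plain, uncorrected form and is not claimed to reproduce the output of the R function cited in the text. All function names and sample values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def r_squared_by_class(values, classes):
    """R² of a continuous covariate explained by class membership.

    Computed as between-class sum of squares / total sum of squares.
    """
    n = len(values)
    grand_mean = sum(values) / n
    groups = defaultdict(list)
    for value, cls in zip(values, classes):
        groups[cls].append(value)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("covariate is constant; r² is undefined")
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


def cramers_v(a, b):
    """Cramér's V between two categorical variables (no continuity correction)."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals, col_totals = Counter(a), Counter(b)
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        raise ValueError("each variable needs at least two categories")
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Hypothetical example: elevation (m) at six points in two lithology classes.
elevation = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
lithology = ["basalt"] * 3 + ["sandstone"] * 3
r2_elevation = r_squared_by_class(elevation, lithology)

# Hypothetical landform classes at four points.
v_perfect = cramers_v(["summit", "summit", "toe", "toe"],
                      ["basalt", "basalt", "shale", "shale"])
v_none = cramers_v(["summit", "summit", "toe", "toe"],
                   ["basalt", "shale", "basalt", "shale"])
print(r2_elevation, v_perfect, v_none)
```

Both statistics range from 0 (no association) to 1 (class membership fully determines the covariate), which is what makes continuous and categorical covariates comparable on one scale.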

aggregating each group of covariates (terrain attributes, hydrologic or spectral), while Scenario 4 allowed us to evaluate the capacity of the bare soil images to predict parent material.

3.5. Model evaluation

Three strategies were chosen to validate the methods and scenarios: 10-fold cross-validation, 56 independent validation points (20%), and comparison with two legacy maps of parent material covering the study area. Overall accuracy and unweighted Kappa statistic were used for evaluation, obtained using the caret package in R (Kuhn, 2008).

Firstly, we performed a cross-validation using the entire dataset (280 observations). This dataset was split into 10 folds and the procedure was repeated 10 times, resulting in 100 iterations. It was used for Scenarios 1, 2, and 3, to evaluate the predictions produced by RF, DT, SVM, and MLR. Secondly, we validated the predictions using the independent validation dataset (20%). Thirdly, we compared our predictions to a thematic geological map covering 25 km² of the study area across the District of Tupi (Vidal-Torrado and Lepsch, 1999; Marques et al., 2018), with geological units categorized as unconsolidated clay, shales, sandstones, and siltstones. After rasterization of the vector map, each pixel was compared to each classified pixel for

Table 2
Spectral covariates and indices derived from SYSI bands used for prediction of soil parent material in the municipality of Piracicaba, São Paulo State, Brazil.

No. | Variable | Data description | Type | Mean (Min–Max) | Unit

33–38 | PC1 to PC6 | Principal Components of the SYSI | Numeric | 0.29 (0.076–1.12) | –
39 | Clay Minerals Ratio | SWIR1/SWIR2 | Numeric | 1.14 (1–1.26) | –
40 | Ferrous Minerals Ratio | SWIR/NIR | Numeric | 1.42 (0.93–2.34) | –
41 | Iron Oxide Ratio | Red/Blue | Numeric | 2.02 (0.86–6.23) | –
42 | Soil Composition Index | (SWIR1 − NIR)/(SWIR1 + NIR) | Numeric | 0.16 (−0.17–0.51) | –
43 | Ferric Iron (Fe³⁺) | Red/Green | Numeric | 1.34 (0.73–2.86) | –
44 | Perpendicular Vegetation Index (PVI) | Orthogonal distance to the soil line (1:1), effective under sparse vegetation | Numeric | 0.14 (0.03–0.58) | –
45 | Transformed Soil Adjusted Vegetation Index (TSAVI) | A transformation to improve the SAVI | Numeric | −0.08 (−0.13–0.02) | –
46 | Normalized Difference Vegetation Index (NDVI) | (NIR − Red)/(NIR + Red) | Numeric | 0.20 (−0.18–0.52) | –
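The band ratios listed in Table 2 can be computed per pixel from the six SYSI reflectance bands. The sketch below is illustrative only: the function name and the reflectance values are hypothetical, and the "SWIR" in the Ferrous Minerals Ratio is assumed here to mean SWIR1, which the table does not specify. PVI and TSAVI are left out because they need soil line parameters that the text does not report.

```python
def spectral_indices(blue, green, red, nir, swir1, swir2):
    """Return the ratio-based indices of Table 2 for one pixel.

    Inputs are surface reflectances (unitless). Zero denominators are not
    handled in this sketch.
    """
    return {
        "clay_minerals_ratio": swir1 / swir2,
        # Assumption: Table 2 writes SWIR/NIR; SWIR1 is used here.
        "ferrous_minerals_ratio": swir1 / nir,
        "iron_oxide_ratio": red / blue,
        "soil_composition_index": (swir1 - nir) / (swir1 + nir),
        "ferric_iron": red / green,
        "ndvi": (nir - red) / (nir + red),
    }


# Hypothetical bare soil reflectances for a single pixel.
pixel = spectral_indices(
    blue=0.05, green=0.08, red=0.10, nir=0.20, swir1=0.30, swir2=0.25
)
for name, value in pixel.items():
    print(f"{name}: {value:.3f}")
```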

every predicted map, using a confusion matrix. Finally, we built a confusion matrix between our predictions and a thematic geological map covering 3.5 km² of the study area across the Santa Rita Farm (Demattê et al., 2004), with geological units categorized as basalts or siltstones.

Since most soils from the study area present pedogenetic horizons, their attributes must match or be consistent with their parent material. The soil particle size is highly influenced by the parent material; therefore, we measured the relationship between the predicted parent material and a soil texture dataset. The predicted map was intersected with 571 soil sampling points from an available soil database from the Brazilian Soil Spectral Library (Demattê et al., 2019), containing 2074 results of soil analyses at different depths. Averages of clay, silt, and sand contents (g kg⁻¹) were calculated and compared for soil samples of each predicted parent material. In addition, we performed a visual inspection of the thematic geological map with a scale of 1:100,000 and the soil parent material map predicted for the study area, to verify their spatial correspondence.

4. Results and discussion

4.1. Comparison between SYSI and a single-date Landsat image

The Landsat true color composite (Fig. 4a), for a single date, shows all land covers in the Piracicaba municipality, while the SYSI image composite (Fig. 4b) represents only bare soils identified by a multitemporal analysis. Dark shadows in Fig. 4b represent locations of permanent vegetation, straw, or water bodies, not identified as uncovered surfaces and masked by the multitemporal process. Differences between the SYSI and the Landsat image true color composites are clearly observed, as the SYSI could retrieve a larger extension of bare soil spectral reflectance, corresponding to 62% of the area, whereas the single-date image shows vegetation covering almost the whole surface. Similarly, Demattê et al. (2018) applied the same analysis for a larger extension in São Paulo State, reaching 68% of bare soil. The smaller the area of permanent vegetation or water bodies, the greater the technique's ability to identify bare soils.
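The multitemporal logic behind the SYSI (Section 3.2), and the bare soil coverage compared above, can be illustrated with a toy Python example. This is a sketch under stated assumptions and not the Geospatial Soil Sensing System itself: the text only says that index-based classification rules retrieve bare soil pixels, so the single NDVI rule, the threshold value `NDVI_BARE_MAX`, the use of a per-band median, and all reflectance values are hypothetical.

```python
from statistics import median

# Hypothetical threshold: a pixel counts as bare soil when NDVI is below it.
NDVI_BARE_MAX = 0.25


def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one observation."""
    return (nir - red) / (nir + red)


def bare_soil_composite(scenes):
    """Build a per-pixel bare soil composite from a time series.

    `scenes` is a list of scenes; each scene is a list with one entry per
    pixel, either a (red, nir) tuple or None when the pixel is masked
    (for example by clouds). Pixels never observed as bare soil stay None.
    """
    n_pixels = len(scenes[0])
    composite = []
    for p in range(n_pixels):
        bare = [
            scene[p]
            for scene in scenes
            if scene[p] is not None and ndvi(*scene[p]) < NDVI_BARE_MAX
        ]
        if bare:
            # Assumption: aggregate bare observations by the per-band median.
            composite.append(tuple(median(band) for band in zip(*bare)))
        else:
            composite.append(None)
    return composite


def bare_fraction(composite):
    """Share of pixels with at least one bare soil observation."""
    return sum(px is not None for px in composite) / len(composite)


# Three hypothetical dates, four pixels, (red, nir) reflectance pairs.
scenes = [
    [(0.10, 0.12), (0.05, 0.40), (0.05, 0.45), None],
    [(0.12, 0.14), (0.11, 0.13), (0.04, 0.50), (0.05, 0.40)],
    [(0.05, 0.40), (0.05, 0.45), (0.05, 0.42), (0.06, 0.38)],
]
composite = bare_soil_composite(scenes)
print(composite, bare_fraction(composite))
```

In this toy series no single date shows more than two bare pixels, and the pixels that are never bare remain as gaps, which corresponds to the masked areas in Fig. 4b before interpolation.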

Fig. 4. Overview of a single-date Landsat 8 image (June 20, 2018) and the SYSI bands (bare soil) and association with parent material datapoints in the study area. a) Single-date Landsat 8
image – true color composite; b) SYSI image representing bare soil – true color composite (non-interpolated); c) correlation matrix of SYSI bands, Landsat bands, and the respective PCs.
The dark blue color corresponds to a higher positive correlation, whereas the dark red color corresponds to a higher negative correlation; d) strength of association of soil parent material
with the SYSI bands, with the Landsat bands, and with the PCs.

SYSI bands were strongly intercorrelated, as were the Landsat 8 bands (blue squares in Fig. 4c). PCA was carried out to reduce the effects of this intercorrelation. The intercorrelation between the principal components (PCs) was low for both satellite image sets, with PC1 explaining 94.83% and 95.14% of the variance for the SYSI and Landsat images, respectively, and PC2 explaining 3.73% and 4.01%. The remaining PCs each explained <1% of the variance.

Pearson's correlation coefficients allowed identifying the linear interaction of soil parent material with the SYSI bands, with the Landsat bands, and with the PCs (Fig. 4d). The bare soil image was expected to be more closely related to parent material than the single-date Landsat image, considering the relationship between topsoil components and bedrock or sediment. This was observed in the correlation plot (Fig. 4d), where SYSI bands show higher correlation coefficients than Landsat bands. The B1 SYSI band had the highest correlation value (Blue, r = 0.49), whereas the B3 band (Red, r = 0.4) had the lowest value. The bands from Landsat 8 with the highest correlations were B4 (Red, r = 0.42) and B7 (SWIR2, r = 0.41), and the band with the lowest correlation was B5 (NIR, r = 0.3). All SYSI bands had correlation values higher than 0.4, whereas, for Landsat 8, only B4 and B7 had correlation values higher than 0.4. Among the PCs from the SYSI, PC1 (r = 0.47) and PC4 (r = 0.48) had correlation values close to 0.5. Meanwhile, for Landsat 8, only PC5 (r = 0.49) had a correlation value close to 0.5. Considering the highest correlation with parent material and the uncorrelated components, the six PCs from SYSI bands were chosen as covariates in the prediction models.

Topsoil color is represented by the SYSI, which can be used to investigate soil mineralogy and its relationship with parent material (Goulart et al., 1998). Goethite and hematite are the most common iron oxides in tropical soils, described as yellow-brown and red pigment agents, respectively (Macedo and Bryant, 1989; Anda et al., 2008; Demattê et al., 2018). Contrary to expectations, a high correlation between topsoil color and parent material was not observed, with correlation values from 0.4 to 0.5 (Fig. 4d). Even for the same parent material and the same hematite and goethite contents, color may vary according to several factors, such as moisture content, air temperature, soil organic

Fig. 5. Correlation between covariates and parent material. The graphic divides covariates into three groups: terrain derivatives and hydrologic covariates (Table 1), spectral covariates
(Table 2), and spatial coordinates.

Fig. 6. Parent material maps of the municipality of Piracicaba, predicted by six different classification methods. Models 1 to 4 used terrain derivatives and hydrologic variables (Table 1),
spectral reflectance and derived indices (Table 2), and spatial coordinates (X and Y). Models 5 and 6 used only spectral covariates and derived indices.

Table 3
Percentage (%) of the total area of the municipality of Piracicaba, São Paulo State, predicted for each soil parent material by six classification algorithms.

| Parent material | OP* (%) | RF (%) | DT (%) | SVM (%) | MLR (%) | KM (%) | OBIA ML (%) |
|---|---|---|---|---|---|---|---|
| Alluvial Deposit | 24 | 14.98 | 9.89 | 17.06 | 26.79 | 7.04 | 11.43 |
| Sandstones of the Piramboia Formation | 9.8 | 20.24 | 8.51 | 18.41 | 16.13 | 13.44 | 17.46 |
| Sandstones of the Botucatu Formation | 2.2 | 0.10 | 0.00 | 0.00 | 0.58 | 9.94 | 5.00 |
| Sandstones of the Itararé Group | 12 | 5.67 | 2.57 | 8.69 | 4.67 | 6.96 | 10.98 |
| Unconsolidated Clay | 3.3 | 0.33 | 0.10 | 0.68 | 0.83 | 10.33 | 3.28 |
| Sandstones of the Rio Claro Formation | 4.1 | 5.12 | 13.07 | 3.67 | 10.31 | 3.14 | 17.24 |
| Basalts of the Serra Geral Formation | 13.1 | 15.71 | 2.15 | 20.22 | 6.74 | 5.16 | 4.84 |
| Shales of the Irati Formation | 2.9 | 0.05 | 0.00 | 0.00 | 1.78 | 15.32 | 8.44 |
| Siltstones of the Tatuí Formation | 10 | 7.04 | 16.11 | 6.75 | 9.28 | 9.89 | 2.32 |
| Sandy Siltstones of the Corumbataí Formation | 9.3 | 23.83 | 35.95 | 22.86 | 19.96 | 11.34 | 14.19 |
| Clayey Siltstones of the Corumbataí Formation | 9.3 | 6.93 | 11.65 | 1.66 | 2.93 | 7.44 | 4.82 |

*OP - Total Observation Points; RF - Random Forest; DT - Decision Tree; SVM - Support Vector Machine; MLR - Multinomial Logistic Regression; KM - K-Means clustering; OBIA ML - Object Based Image Analysis with Maximum Likelihood classification.
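The percentages in Table 3 can be turned into areas, which makes the near-absent classes easier to appreciate. The sketch below is only illustrative: it uses the RF column of Table 3 and assumes that the 1378 km2 quoted for the study site in the abstract is the total to which the percentages refer. The names `STUDY_AREA_KM2`, `RF_PERCENT`, and `percent_to_km2` are hypothetical.

```python
# Assumption: the Table 3 percentages refer to the 1378 km2 study site
# quoted in the abstract.
STUDY_AREA_KM2 = 1378.0

# RF column of Table 3 (percentage of the total area per parent material).
RF_PERCENT = {
    "Alluvial Deposit": 14.98,
    "Sandstones of the Piramboia Formation": 20.24,
    "Sandstones of the Botucatu Formation": 0.10,
    "Sandstones of the Itararé Group": 5.67,
    "Unconsolidated Clay": 0.33,
    "Sandstones of the Rio Claro Formation": 5.12,
    "Basalts of the Serra Geral Formation": 15.71,
    "Shales of the Irati Formation": 0.05,
    "Siltstones of the Tatuí Formation": 7.04,
    "Sandy Siltstones of the Corumbataí Formation": 23.83,
    "Clayey Siltstones of the Corumbataí Formation": 6.93,
}


def percent_to_km2(percent, total_km2=STUDY_AREA_KM2):
    """Convert a percentage of the study area into square kilometres."""
    return total_km2 * percent / 100.0


areas_km2 = {name: percent_to_km2(p) for name, p in RF_PERCENT.items()}

# Print the classes from largest to smallest predicted area.
for name, area in sorted(areas_km2.items(), key=lambda item: -item[1]):
    print(f"{name}: {area:.1f} km2")
```

Under this assumption, a class such as the shales of the Irati Formation (0.05% in the RF map) covers less than 1 km2.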

carbon, soil pH, and contents of Fe, Mn, and Cu in the soil (Kampf and Schwertmann, 1983). Pedogenetic processes may also significantly change the surface material. Besides, different parent materials can result in soils with similar features, such as soils developed from shales or basalts. Soils developed from shales have color shades very similar to those of soils developed from basic igneous rocks, hindering visual distinction of parent material (Instituto Geográfico e Geológico, 1964). The same occurs in soils that did not originate from the underlying geology, but from allochthonous material carried from nearby areas, as seen in some regions of the Paulista Peripheral Depression (Vidal-Torrado and Lepsch, 1999). In this case, the relation between the topsoil and the material underneath is likely to be weak.

4.2. Evaluation of the covariates

The correlation coefficients between all covariates and parent material are shown in Fig. 5. The graphic has three sets of covariates: 1) terrain derivatives and hydrologic variables, 2) spatial coordinates, and 3) spectral covariates. The covariate Multiresolution Index of Valley Bottom Flatness (MRVBF) had the highest correlation, whereas the lowest correlation was observed for the covariate Analytical Hillshading, although both are terrain derivatives. Of the spatial coordinates, X had the highest correlation with parent material (r > 0.6). Among the spectral covariates, Ferric Iron Index had the highest correlation, while Clay Mineral Ratio had the lowest correlation. None of the spectral covariates showed correlation values higher than 0.6. The terrain derivatives include seven covariates with correlations higher than 0.6. The prevailing spectral covariates were Ferric Iron (Red/Green), Iron Oxide Ratio (Red/Blue), PC4, PC1, and NDVI (Fig. 5). Terrain derivatives and hydrological covariates showed the highest correlations with parent material, evidencing how topography and hydrology are closely related to geology (Wade, 1935; Larkin and Sharp Jr., 1992; Jencso and Mcglynn, 2011).

High correlation with Elevation (Fig. 5) indicates its great importance to separate parent materials. The association between geology and elevation has been widely studied (Strugale et al., 2007; Holz et al., 2010; Salgado et al., 2015; Marques et al., 2018). MRVBF is an index used to separate valley bottom from hillslope areas, identifying valley bottoms at a range of sizes and slopes (Gallant and Dowling, 2003). It can identify depositional parts of the landscape and could be applied to delineate hydrologic and geomorphic units, as well as to compare catchments quantitatively (Gallant and Dowling, 2003). MRVBF is useful for delineating alluvial deposits and separating them from other lithologies. Channel network and vertical distance to channel network were also effective in identifying depositional areas, such as Alluvial Deposits and Unconsolidated Clay.
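As a rough illustration of the correlation screening discussed in this section, the sketch below computes Pearson's r between one covariate and one parent material class. It rests on assumptions that are not taken from the study: this section does not state how the categorical parent material was encoded, so a single class is coded here as a 0/1 indicator, and the MRVBF values are invented for the example. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical MRVBF values at six observation points (not from the study).
mrvbf = [0.2, 0.5, 1.0, 1.5, 3.0, 4.0]
# Assumed encoding: 1 where the point is an alluvial deposit, 0 otherwise.
is_alluvial = [0, 0, 0, 0, 1, 1]

r = pearson_r(mrvbf, is_alluvial)
print(f"r(MRVBF, alluvial indicator) = {r:.2f}")
```

With a 0/1 indicator, Pearson's r is the point-biserial correlation, so a large positive value means that the covariate tends to be higher where that class occurs.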

Table 4
Validation of different methods used to predict soil parent material in the study area. The overall accuracy and Kappa coefficient were evaluated.

| Models | Cross-validation: Accuracy | Cross-validation: Kappa | Points, CPRM (external dataset): Accuracy | Points, CPRM (external dataset): Kappa | Tupi District: Accuracy | Tupi District: Kappa | Santa Rita Farm: Accuracy | Santa Rita Farm: Kappa |
|---|---|---|---|---|---|---|---|---|
| Scenario 1: Terrain derivatives | | | | | | | | |
| RF* | 0.49 | 0.34 | 0.59 | 0.32 | 0.63 | 0.27 | 0.53 | 0.20 |
| DT | 0.33 | 0.17 | 0.46 | 0.10 | 0.74 | 0.36 | 0.26 | 0.03 |
| SVM | 0.38 | 0.26 | 0.30 | 0.06 | 0.48 | 0.15 | 0.51 | 0.22 |
| MLR | 0.45 | 0.30 | 0.48 | 0.14 | 0.47 | 0.09 | 0.38 | 0.08 |
| Scenario 2: Terrain derivatives + Spectral Covariates (from SYSI) | | | | | | | | |
| RF | 0.49 | 0.33 | 0.63 | 0.39 | 0.68 | 0.35 | 0.64 | 0.27 |
| DT | 0.42 | 0.26 | 0.45 | 0.13 | 0.43 | 0.09 | 0.59 | 0.25 |
| SVM | 0.38 | 0.26 | 0.59 | 0.34 | 0.55 | 0.22 | 0.68 | 0.32 |
| MLR | 0.41 | 0.23 | 0.46 | 0.15 | 0.53 | 0.18 | 0.44 | 0.11 |
| Scenario 3: Terrain derivatives + Spectral Covariates (from SYSI) + Spatial Coordinates | | | | | | | | |
| RF | 0.57 | 0.45 | 0.79 | 0.65 | 0.81 | 0.54 | 0.58 | 0.19 |
| DT | 0.42 | 0.27 | 0.64 | 0.42 | 0.88 | 0.67 | 0.09 | 0.17 |
| SVM | 0.38 | 0.26 | 0.57 | 0.33 | 0.65 | 0.34 | 0.65 | 0.27 |
| MLR | 0.42 | 0.28 | 0.43 | 0.14 | 0.68 | 0.33 | 0.34 | 0.02 |
| Scenario 4: Spectral Covariates (from SYSI) | | | | | | | | |
| KM | – | – | 0.34 | 0.04 | 0.15 | 0.03 | 0.14 | 0.10 |
| OBIA ML | – | – | 0.55 | 0.28 | 0.60 | 0.25 | 0.07 | 0.01 |

*RF - Random Forest; DT - Decision Tree; SVM - Support Vector Machine; MLR - Multinomial Logistic Regression; KM - K-Means clustering; OBIA ML - Object Based Image Analysis with Maximum Likelihood classification.

Fig. 7. Variable importance for parent material prediction by random forest method, using the mean decrease in Gini index. The variable numbers are referenced in Table 1.
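To make the importance measure in Fig. 7 more concrete, the sketch below computes the decrease in Gini impurity produced by a single split on one covariate. It is a simplified illustration, not the study's implementation: a random forest accumulates such decreases over every split and tree in which a covariate is used, whereas this example evaluates one threshold on invented elevation values. The function names and data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, values, threshold):
    """Impurity of the parent node minus the size-weighted impurity of the
    two child nodes obtained by splitting at `threshold`."""
    left = [lab for lab, v in zip(labels, values) if v <= threshold]
    right = [lab for lab, v in zip(labels, values) if v > threshold]
    n = len(labels)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Hypothetical elevations (m asl) and parent material classes, not study data.
elevation = [480, 500, 530, 620, 640, 700]
parent_material = ["alluvial", "alluvial", "alluvial",
                   "sandstone", "sandstone", "basalt"]

decrease = gini_decrease(parent_material, elevation, threshold=550)
print(f"Gini decrease for the split at 550 m: {decrease:.3f}")
```

A covariate whose splits repeatedly produce large decreases, as Elevation does in this toy case by isolating the alluvial points, ends up with a high mean decrease in Gini.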

4.3. Spatial predictions of soil parent material

Predicted maps of soil parent material by the six classification algorithms are shown in Fig. 6. The maps from Models 1 to 4 were produced using the terrain derivatives and hydrologic variables (Table 1), spectral covariates (Table 2), and spatial coordinates X and Y. The maps from Models 5 and 6 were produced using only the PCs from SYSI bands. Generally, in the western part of the study area, sandstone is the predominant parent material, whereas in the central-east, siltstones are dominant. Basalts are common in the eastern part of the study area. The comparison of prediction maps (Fig. 6) showed significant differences. Visually, basalts occur in representative sites on the maps from Models 1 and 3; however, their occurrence is reduced on the maps from Models 2, 4, 5, and 6. Parent material of sedimentary origin is predominant on the maps from Models 2, 4, 5, and 6, whereas igneous rocks were less representative. Alluvial deposits are clear on the maps from Models 1 to 4; however, they are very dispersed on the maps from Models 5 and 6. Unconsolidated clays occur in larger areas on the map from Model 5; however, they are less representative on the maps from Models 1 to 4.

The proportion of each parent material to the total study area, predicted by each classification algorithm, is described in Table 3. Occurrence of sandstones was more representative in the models RF and OBIA ML, where sandstones of the Piramboia Formation were dominant. When K-Means was used, unconsolidated clay occurred in a larger area. Areas where alluvial deposits occurred were larger in MLR and appeared dispersed in the K-Means and OBIA ML maps (Fig. 6). Sites with igneous rocks (basalts and diabases represented by the Serra Geral Formation) were larger in RF and SVM. Shales were the parent material least represented by the models, as well as unconsolidated clay. Among siltstones, sandy siltstones of the Corumbataí Formation prevailed. A visual analysis of the maps showed that sandy siltstones occur mainly near the alluvial deposits, following the river course.

Table 3 and Fig. 6 show how different methods can produce contrasting results, underlining the importance of testing and validating methods before choosing the most appropriate one. Each method predicted different dominant parent material in the study area. Some parent materials had little representation, such as the shales of the Irati Formation, with zero representation in the DT and SVM models but 15.32% in the K-Means model. Sandstones of the Botucatu Formation showed a similar condition. We noted that the DT method produced an artifact and a non-continuous color map, due to the influence of the spatial coordinate X, an important covariate that presented a high correlation with parent material. A tree-structured classification incorrectly divided the resulting map by a vertical line, corresponding to a significant X value to the classification

Fig. 9. Map of soil parent material of the municipality of Piracicaba, predicted by random forest classifier using Scenario 3 (Table 4).

rules. Given the possibility of unpredictable artifacts, the use of spatial coordinates in decision tree models is not advisable.

Meyer et al. (2019) promote an interesting discussion about applying spatial coordinates in prediction models. They indicated that the geolocation variables (i.e. latitude and longitude) can intensify the problem of autocorrelation, especially for spatially clustered data, causing negative effects on the models mainly in areas beyond the location of the training samples. They suggested different validation strategies, which could evaluate spatial dependence or spatial autocorrelation, instead of the commonly used cross-validation. In this work, we adopted several ways of validation, including the common 10-fold cross-validation, independent points of observation, and highly detailed legacy geological maps for the study area. We consider that the inclusion of spatial coordinates deserves a comprehensive study in each prediction area.

4.4. Prediction scenarios and variable importance

Among the prediction scenarios, Scenario 4 had the lowest values for the validation parameters (Table 4). Scenario 1 showed a high importance of terrain derivatives and hydrologic covariates, accounting for most of the variability in the validation datasets. Scenario 2 with spectral covariates showed a moderate improvement compared to Scenario 1, as observed in the validation parameters for the District of Tupi and the Santa Rita Farm. Predictions showed high improvement when using spatial coordinates (Scenario 3), with a considerable increase in accuracy and Kappa coefficient, using cross-validation, independent validation points, and comparison with legacy geological maps for the District of Tupi. Compared to Scenario 1, the average accuracy improved by 11.90% and average Kappa by 25.57% in Scenario 2, and by 18.98% and 72.49% in Scenario 3, respectively. Combination of all covariates (Scenario 3) showed that RF led to higher accuracies in every validation strategy, except for the District of Tupi, in which the DT method had higher values. Thus, Scenario 3 was the best combination of covariates, with RF as the best method.

Scenarios using the terrain derivatives, spatial coordinates and SYSIs proved to be more effective than those using only the PCs of the SYSIs (Table 4). Although SYSI bands showed different color shades, where the difference between soils developed from igneous and sedimentary rocks is visually evident, some parent materials were not well represented, such as alluvial deposits. Despite the difficulties, sandstone sites were sufficiently well represented. We also noted a significant contribution of the spatial covariates, especially the X coordinate, in the different validation strategies. Based on the validation results, and considering the representativeness of the training samples for the entire

Fig. 8. Distribution of parent material by the main covariates used to construct the random forest model. Information was obtained by the intersection between the sampling points used to
train the model and the raster covariates.

Fig. 10. Evaluation of the soil parent material predictions. a) Map of soil parent material predicted by the random forest model. Different parent materials are represented by different
colors and blue dots show the 56 sampling points of validation dataset. b) Predicted parent material of Santa Rita Farm and the correspondent area of the geology map (Demattê et al.,
2004). c) Predicted parent material of Tupi District and the correspondent area of the geology map (Vidal-Torrado and Lepsch, 1999; Marques et al., 2018).

study area, we chose to keep the spatial coordinates in the models, despite the limitations described in Section 4.3.

4.5. Covariates importance

The most important variable used was Elevation, followed by the X coordinate, Channel Network Base Level, Modified Catchment Area, and MRVBF (Fig. 7). Among spectral covariates, Soil Composition Index was the most important, which may detect soil chemical composition, such as iron oxides (Al-Khaier, 2003), whereas PC1 and PC2 appeared among the 15 most important covariates. Rowan and Mars (2003) analyzed Ferric Iron index using ASTER imagery, and we applied the method to Landsat 8 OLI imagery, considering Fe3+ and Al-OH absorption. Iron Oxide Ratio was used to detect reddish colors of rocks and soils provided by ferric iron minerals. This index is sensitive to ferric iron even at low concentrations (Rockwell, 2013).

Distribution of parent material according to the main covariates used in the classification is shown in Fig. 8. Alluvial deposits occurred at elevations below 550 m above sea level (asl) (Fig. 8a). On the other hand, sandstones of the Botucatu Formation appeared frequently in areas higher than 600 m asl. Sandstones of the Piramboia Formation are in a lower position in the stratigraphic column when compared to sandstones of the Botucatu Formation (Fig. 3). However, they do not have a clear lithological distinction (Instituto Geográfico e Geológico, 1964) and are frequently represented jointly in maps (Instituto Geográfico e Geológico, 1966). Unconsolidated clays and sandstones of the Rio Claro Formation occur more frequently around 600 m asl. Sandstones of the Rio Claro Formation occur close to the rivers (Instituto Geográfico e Geológico, 1965; de Melo, 1995). Unconsolidated clay is a different material compared to the bedrock underneath and is probably deposited from the nearby areas with higher elevations where deeply weathered soil profiles often occur, such as Ferralsols (Vidal-Torrado and Lepsch, 1999; Vidal-Torrado et al., 1999). Above 800 m asl, sandstones of the Piramboia and the Botucatu Formations had small occurrence areas, as well as basalts and diabases of the Serra Geral Formation.

Fig. 8b shows parent material distribution by the spatial coordinate X. Sandstones of the Botucatu and the Piramboia Formations and alluvial deposits were more frequent in the western part of the study area, whereas the other parent materials occurred predominantly in the eastern side. Areas of unconsolidated clay occur mainly in the eastern part of the Piracicaba region, which was also observed by Vidal-Torrado and Lepsch (1999) and Marques et al. (2018). The parameter Channel Network Base Level could separate alluvial deposits, with values lower than 500 m asl. Higher values were found in some sections of sandstones of the Botucatu and the Piramboia Formations, and in sandy siltstones of the Corumbataí Formation (Fig. 8c). Alluvial deposits also stood out in Modified Catchment Area (Fig. 8d), with a similar distribution of values. Alluvial deposits were also identified by MRVBF, with most values higher than 2. The other parent materials showed similar distributions, with most values lower than 2 (Fig. 8e). Fig. 8f shows the distribution of parent material by VDCN. As expected, alluvial deposits had the lowest values. Unconsolidated clay had values close to 90 m and sandstones of the Rio Claro Formation had values close to 50 m. Sandstones of the Botucatu Formation occur

in sites with a wide range of VDCN values. The distribution of the other parent materials was similar, generally lower than 100 m.

4.6. Mapping soil parent material

Important differences could be observed between the pre-existing geological map of Piracicaba (scale of 1:100,000) (Fig. 2) and the soil parent material map produced (Fig. 9). Besides different delineation, the geological map also has heterogeneous units, normally grouping various rocks into a single broad unit. In contrast, the predicted parent material map presents a more complex distribution. Rocks in the geological map are divided into six units, whereas the prediction map of parent material comprises 11 units (Figs. 2 and 9).

Prediction showed better representation in the District of Tupi, with units similar to those in the geological map at the scale 1:25,000 provided by previous studies (Vidal-Torrado and Lepsch, 1999; Marques et al., 2018) (Fig. 10). In the geological map at scale 1:100,000 (Fig. 2), a single geological class represents the whole District of Tupi, whereas in the predicted map we could find five parent material classes. The shales of the Irati Formation were the most difficult parent material to predict, possibly because 1) weathered material is highly similar to other lithologies, such as weathered igneous rock (Instituto Geográfico e Geológico, 1965), 2) few sampling points were available to implement the models, or 3) rock texture itself is very similar to that of other sedimentary rocks.

The distribution of soil particle size followed features of the parent materials (Fig. 11). Generally, clayey parent material develops into clayey soils, whereas sandstones develop into sandy soils (Yaalon, 1971). The coarser the grain size of parent material composed mainly of resistant quartz, the coarser the soil particle size (Gray and Murphy, 2002). As observed in Fig. 11, soils developed from sandstones tend to have more sand in their composition. Soils derived from sandstones of the Piramboia Formation, the Rio Claro Formation, alluvial deposits, or sandy siltstones of the Corumbataí Formation may have 60% of sand content or more. However, soils developed from basalts or diabases of the Serra Geral Formation, shales of the Irati Formation, or unconsolidated clay may have close to 60% of clay. The particle size distribution of soils developed from siltstones depends on bedrock composition. For instance, soils developed from clayey siltstones tend to form clayey soils, whereas soils developed from sandy siltstones tend to form soils with sand-rich texture. Non-predominance of silt may indicate the maturity of soil genesis or influence of the parent material composition (Birkeland, 1999; Schaetzl and Anderson, 2005).

Fig. 11. Relationship between parent material predicted by random forest and distribution of soil particle size (sand, silt and clay) up to 20 cm (average for 2054 soil sampling points in the
study area).
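The averaging behind Fig. 11 amounts to grouping topsoil samples by predicted parent material and taking the mean of each particle size fraction. The sketch below illustrates that step under stated assumptions: the sample values are invented (in g kg−1), the class names are shortened, and the function `mean_texture_by_class` is hypothetical rather than the procedure used in the study.

```python
from collections import defaultdict


def mean_texture_by_class(samples):
    """Average sand, silt, and clay contents per parent material class.

    `samples` is an iterable of (parent_material, sand, silt, clay) tuples.
    Returns {parent_material: (mean_sand, mean_silt, mean_clay)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for parent_material, sand, silt, clay in samples:
        totals = sums[parent_material]
        totals[0] += sand
        totals[1] += silt
        totals[2] += clay
        counts[parent_material] += 1
    return {
        pm: tuple(total / counts[pm] for total in totals)
        for pm, totals in sums.items()
    }


# Hypothetical topsoil samples (g kg-1); not data from the study.
samples = [
    ("sandstone", 700, 100, 200),
    ("sandstone", 650, 150, 200),
    ("basalt", 200, 200, 600),
    ("basalt", 150, 250, 600),
]

means = mean_texture_by_class(samples)
for pm, (sand, silt, clay) in means.items():
    print(f"{pm}: sand {sand:.0f}, silt {silt:.0f}, clay {clay:.0f} g kg-1")
```

Comparing these class means is what allows statements such as sandstone-derived soils being sand-rich and basalt-derived soils being clay-rich.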

5. Conclusions

Methods for digital mapping based on remote sensing, machine learning, and geoprocessing techniques, coupled with geological data observations, proved to be a suitable framework for mapping soil parent material under a complex geological condition. The proposed method produced a parent material map with important details not evident in the current geological map available, improving the ability to delineate soil parent material units.

Different statistical procedures produced very distinct maps, highlighting the importance of considering the characteristics of each classification algorithm and the combination of covariates when evaluating the predictions. Random forest classification was the best method to predict parent material, with prediction reaching an overall accuracy of 0.81 and a Kappa coefficient of 0.65, considering different validation strategies. The method allowed spatially distinguishing the parent material, producing a substantially improved map. Topsoil reflectance acquired from a synthetic satellite image enhanced the average overall accuracy and the Kappa coefficient by 12.45% and 37.5%, respectively. These results clearly demonstrated that the incorporation of a bare soil image enhances the model prediction. When spatial coordinates were added, the average overall accuracy and the Kappa coefficient increased by 19% and 74%, respectively. It is not certain that the use of spatial coordinates as covariates will produce maps without artifacts, as observed in the decision tree model, but the results suggest analyzing their inclusion, mainly in areas where the parent material shows an evident distribution in some specific direction.

The predicted map reproduced the tendency in the field of a strong relationship between soil particle size distribution and soil parent material. The correct identification of the parent material could greatly help the soil survey, indicating in advance the most likely soil texture classes to be found. Thus, it is a potential tool to identify areas of similar soil texture.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We would like to thank the São Paulo Research Foundation (FAPESP) for the financial support for the authors (FAPESP grant n° 2018/12678-5, FAPESP grant n° 2014/22262-0, FAPESP grant n° 2015/16172-0, FAPESP grant n° 2018/23760-4, FAPESP grant n° 2016/26124-6, FAPESP grant n° 2016/01597-9, FAPESP grant n° 2020/04306-0). We also thank the Geotechnologies on Soil Science Group - GEOCIS (esalqgeocis.wixsite.com/english).

References

Al-Khaier, F., 2003. Soil Salinity Detection Using Satellite Remote Sensing. 1–59.
Alvares, C.A., Stape, J.L., Sentelhas, P.C., De Moraes Gonçalves, J.L., Sparovek, G., 2013. Köppen’s climate classification map for Brazil. Meteorol. Z. 22 (6), 711–728. https://doi.org/10.1127/0941-2948/2013/0507.
Anda, M., Shamshuddin, J., Fauziah, C.I., Omar, S.R.S., 2008. Mineralogy and Factors Controlling Charge Development of Three Oxisols Developed From Different Parent Materials. 143, pp. 153–167. https://doi.org/10.1016/j.geoderma.2007.10.024.
Birkeland, P.W., 1999. Soils and Geomorphology. 1–430. Oxford University Press.
Bjornberg, S., Landim, B., 1966. Contribuição ao Estudo da Formação Rio Claro (Neocenozóico). Boletim Sociedade Brasileira de Geologia 15 (4).
Bogunovic, I., Trevisani, S., Pereira, P., Vukadinovic, V., 2018. Mapping soil organic matter in the Baranja region (Croatia): geological and anthropic forcing parameters. Sci. Total Environ. 643, 335–345.
Breiman, L., 1984. Classification and Regression Trees. Wadsworth International Group, Michigan, p. 358.
Caetano-chang, M.R., Tai, W.F.T., 2003. Diagênese de Arenitos da Formação Pirambóia no Centro-Leste Paulista. Geociências - UNESP. 22 pp. 33–39.
Carvalho, A.M.V., 1954. Contribuição ao Estudo Petrográfico do Arenito Botucatu no Estado de São Paulo. Boletim Sociedade Brasileira de Geologia 3 (1).
Côrtes, A.R.P., Perinotto, J.A.J., 2015. Fácies e associação de fácies da Formação Piramboia na região de Descalvado (SP). Geologia. USP. 15 (3–4), 23–40. https://doi.org/10.11606/issn.2316-9095.v15i3-4p23-40.
Del Roveri, C., 2010. Petrologia Aplicada da Formação Corumbataí (Região de Rio Claro - SP) e Produtos Cerâmicos. Universidade Estadual Paulista (203 pp.).
Demattê, J.A.M., Campos, R.C., Alves, M.C., Fiorio, P.R., Nanni, M.R., 2004. Visible–NIR reflectance: a new approach on soil evaluation. Geoderma 121 (1–2), 95–112. https://doi.org/10.1016/J.GEODERMA.2003.09.012.
Demattê, J.A.M., Fongaro, C.T., Rizzo, R., Safanelli, J.L., 2018. Geospatial Soil Sensing System (GEOS3): a powerful data mining procedure to retrieve soil spectral reflectance from satellite images. Remote Sens. Environ. 212, 161–175. https://doi.org/10.1016/J.RSE.2018.04.047.
Demattê, J.A.M., Dotto, A.C., Paiva, A.F.S., Sato, M.V., Dalmolin, R.S.D., de Araújo, M. do S.B., da Silva, E.B., Nanni, M.R., ten Caten, A., Noronha, N.C., Lacerda, M.P.C., de Araújo Filho, J.C., Rizzo, R., Bellinaso, H., Francelino, M.R., Schaefer, C.E.G.R., Vicente, L.E., dos Santos, U.J., de Sá Barretto Sampaio, E.V., … do Couto, H.T.Z., 2019. The Brazilian Soil Spectral Library (BSSL): a general view, application and challenges. Geoderma 354, 113793. https://doi.org/10.1016/j.geoderma.2019.05.043.
Dobos, E., Seres, A., Vadnai, P., Michéli, E., Fuchs, M., Láng, V., Bertóti, R.D., 2013. Soil parent material delineation using MODIS and SRTM data. 62 (2), 133–156.
ESRI, E. S. R. I, 2014. ArcGIS Desktop Help 10.3.
Florea, N., Mocanu, V., Cotet, V., Dumitru, S., 2015. Map of soil parent material in Romania. Research Journal of Agricultural Science 47, 57–63.
Gallant, J.C., Dowling, T.I., 2003. A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour. Res. 39 (12). https://doi.org/10.1029/2002WR001426.
Gingrich, P., 1992. Introductory Statistics for the Social Sciences. Department of Sociology and Social Sciences, University of Regina (421 pp.).
Goulart, A.T., Fabris, J.D., Jesus Filho, M.F., Coey, J.M.D., Da Costa, G.M., De Grave, E., 1998. Iron oxides in a soil developed from basalt. Clay Clay Miner. 46 (4), 369–378. https://doi.org/10.1346/CCMN.1998.0460402.
Gray, J.M., Murphy, B.W., 2002. Parent material and world soil distribution. 17th WCSS. 2215, pp. 1–14 Bangkok, Thailand.
Gray, Jonathan M., Murphy, B.W., 1999. Parent material and soils: a guide to the influence of parent material on soil distribution in eastern Australia. DLWC Technical Report No. 45 (122 pp.).
Gray, Jonathan M., Bishop, T.F.A., Wilford, J.R., 2016. Lithology and soil relationships for soil modelling and mapping. Catena 147, 429–440. https://doi.org/10.1016/j.catena.2016.07.045.
Han, J., Kamber, M., Pei, J., 2012. Data Mining: Concepts and Techniques. 744 pp. https://doi.org/10.1016/C2009-0-61819-5.
Heung, B., Bulmer, C.E., Schmidt, M.G., 2014. Predictive soil parent material mapping at a regional scale: A Random Forest approach. Geoderma 214–215, 141–154. https://doi.org/10.1016/J.GEODERMA.2013.09.016.
Holz, M., França, A.B., Souza, P.A., Iannuzzi, R., Rohn, R., 2010. A stratigraphic chart of the Late Carboniferous/Permian succession of the eastern border of the Paraná Basin, Brazil, South America. J. S. Am. Earth Sci. 29 (2), 381–399. https://doi.org/10.1016/j.jsames.2009.04.004.
Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression. 383 pp. https://doi.org/10.1002/0471722146.
Instituto Geográfico e Geológico, 1964. Geologia do Estado de São Paulo. Boletim No 41 (273 pp.).
Instituto Geográfico e Geológico, 1965. Descrição Geológica e Geográfica das Folhas de Piracicaba e São Carlos, SP. Boletim No 43 (54 pp.).
Instituto Geográfico e Geológico, 1966. Folha Geológica de Piracicaba. Escala 1:100.000. IGG.
Jencso, K.G., Mcglynn, B.L., 2011. Hierarchical controls on runoff generation: topographically driven hydrologic connectivity, geology, and vegetation. 47 (November), 1–16. https://doi.org/10.1029/2011WR010666.
Jenny, H., 1941. Factors of Soil Formation: A System of Quantitative Pedology. McGraw-Hill, New York (191 pp.).
Kampf, N., Schwertmann, U., 1983. Goethite and hematite in a climosequence in Southern Brazil and their application in classification of kaolinitic soils. Geoderma 29, 27–39.
Kearney, M.W., 2017. Cramér’s V. In: Allen, M.R. (Ed.), Sage Encyclopedia of Communication Research Methods. SAGE, Thousand Oaks, CA, pp. 1–4.
Kuhn, M., 2008. Building predictive models in R using the caret package. Journal of Statistical Software 28 (5), 1–26. https://doi.org/10.18637/jss.v028.i05.
Lacoste, M., Lemercier, B., Walter, C., 2011. Regional mapping of soil parent material by machine learning based on point data. Geomorphology 133 (1–2), 90–99. https://doi.org/10.1016/j.geomorph.2011.06.026.
Larkin, R.G., Sharp Jr., J.M., 1992. On the relationship between river-basin geomorphology, aquifer hydraulics, and ground-water flow direction in alluvial aquifers. Geol. Soc. Am. Bull. 104, 1608–1620.
Loreti Junior, R., Sardou Filho, R., Caltabeloti, F.P., 2014. Polo Cerâmico de Santa Gertrudes. Programa Geológico do Brasil. CPRM, São Paulo (69 pp.).
Macedo, J., Bryant, R.B., 1989. Preferential microbial reduction of hematite over goethite in a Brazilian Oxisol. Soil Sci. Soc. Am. J. 53 (4), 1114–1118. https://doi.org/10.2136/sssaj1989.03615995005300040022x.
MacMillan, R.A., 2003. LandMapR Software Toolkit - C++ Version: User’s Manual. LandMapper Environmental Solutions Inc., Edmonton, p. 110.
Marques, K.P.P., Demattê, J.A.M., Miller, B.A., Lepsch, I.F., 2018. Geomorphometric Segmentation of Complex Slope Elements for Detailed Digital Soil Mapping in Southeast Brazil. 14, pp. 1–9. https://doi.org/10.1016/j.geodrs.2018.e00175.

McBratney, A., Mendonça Santos, M., Minasny, B., 2003. On digital soil mapping. Schaetzl, R., Anderson, S., 2005. Soils: Genesis and Geomorphology. Cambridge University
Geoderma 117 (1–2), 3–52. https://doi.org/10.1016/S0016-7061(03)00223-4. Press, New York (817 pp.).
de Melo, M.S., 1995. A Formação Rio Claro e Depósitos Associados: Sedimentação Silvero, N.E.Q., Siqueira, D.S., Coelho, R.M., Da Costa Ferreira, D., Marques, J., 2019. Protocol
Neocenozóica na Depressão Periférica Paulista. USP (164 pp.). for the use of legacy data and magnetic signature on soil mapping of São Paulo Cen-
Meneses, P.R., Almeida, T.De., 2012. In: Meneses, P.R., De Almeida, T. (Eds.), Introdução ao tral West, Brazil. Sci. Total Environ. 693, 133463. https://doi.org/10.1016/j.
Processamento de Imagens de Sensoriamento Remoto. UNB, CNPQ, Brasília (266 pp.). scitotenv.2019.07.269.
Meyer, H., Reudenbach, C., Wöllauer, S., Nauss, T., 2019. Importance of spatial predictor Strugale, M., Rostirolla, S.P., Mancini, F., Portela Filho, C.V., Ferreira, F.J.F., de Freitas, R.C.,
variable selection in machine learning applications – moving from data reproduction 2007. Structural framework and Mesozoic–Cenozoic evolution of Ponta Grossa
to spatial prediction. Ecol. Model. 411, 108815. https://doi.org/10.1016/j. Arch, Paraná Basin, southern Brazil. J. S. Am. Earth Sci. 24 (2–4), 203–227. https://
ecolmodel.2019.108815. doi.org/10.1016/j.jsames.2007.05.003.
Miller, B.A., Burras, C.L., Crumpton, W.G., 2008. Using soil surveys to map Quaternary par- USGS, 2018a. Landsat 4–7 Surface Reflectance (LEDAPS) Product Guide. U.S. Geological
ent materials and landforms across the Des Moines Lobe of Iowa and Minnesota. Soil Survey, Sioux Falls (32 pp.).
Survey Horizon 49, 91–95. https://doi.org/10.2136/sh2008.4.0091. USGS, 2018b. Landsat 8 Surface Reflectance Code (LASRC) Product Guide. U.S. Geological
Navarro, D., 2015. Learning Statistics With R: A Tutorial for Psychology Students and Survey, Sioux Falls (33 pp.).
Other Beginners. (Version 0.5). University of Adelaide, Adelaide, Australia. Vapnik, V.N., 1998. Statistical Learning Theory. John Wiley and Sons, Inc. (732 pp.).
Petri, S., Coimbra, A.M., Amaral, G., Ojeda, H.A., Fúlfaro, V.J., Ponçano, W.L., 1986. Código Vidal-Torrado, P., Lepsch, I.F., 1999. Relações Material de Origem/Solo e Pedogênese em
Brasileiro De Nomenclatura Estratigráfica Guia De Nomenclatura Estratigráfica. uma Sequência de Solos Predominantemente Argilosos e Latossólicos sobre Psamitos
Revista Brasileira de Geociências 16 (4), 370–415. https://doi.org/10.25249/0375- na Depressão Periférica Paulista. Revista Brasileira de Ciência Do Solo 23, 357–369.
7536.1986370415. Vidal-Torrado, P., Lepsch, I.F., Castro, S.S., Cooper, M., 1999. Pedogênese em uma
Poppiel, R.R., Lacerda, M.P.C., Safanelli, J.L., Rizzo, R., Oliveira, M.P., Novais, J.J., Demattê, J.A. seqüência Latossolo-Podzólico na borda de um platô na Depressão Periférica Paulista.
M., 2019. Mapping at 30 m resolution of soil attributes at multiple depths in Midwest Revista Brasileira de Ciência Do Solo 23, 909–921. https://doi.org/10.1590/S0100-
Brazil. Remote Sens. 11 (24), 2905. https://doi.org/10.3390/rs11242905. 06831999000400018.
Prokopovich, N.P., 1984. Use of agricultural soil survey maps for engineering geologic Wade, A., 1935. The relationship between topography and geology. Australian Surveyor 5
mapping. Bulletin of the Association of Engineering Geologists xxi (4), 437–447. (6), 367–371. https://doi.org/10.1080/00050326.1935.10436440.
R Core Team, 2019. R: A Language and Environment for Statistical Computing. R Founda- Willgoose, G., 2018. A Coupled Soilscape-landform Evolution Model: Model Formulation
tion for Statistical Computing Retrieved from. https://www.r-project.org/. and Initial Results. pp. 1–38.
Rao, R.P.N., Scherer, R., 2010. Statistical Pattern Recognition and Machine Learning in Wilson, M.J., 2019, November 1. The importance of parent material in soil classification: a
Brain-computer Interfaces. pp. 335–368. review in a historical context. Catena 182, 104131. https://doi.org/10.1016/j.
Richards, J.A., Jia, X., 2006. Remote Sensing Digital Image Analysis. 4th edition. XIX. catena.2019.104131.
Springer, p. 494. Yaalon, D.H., 1971. Soil-forming intervals in time and space. In: Yaalon, D.H. (Ed.),
Richter, J., Owens, P.R., Libohova, Z., Adhikari, K., Fuentes, B., 2019. Catena Mapping parent Paleopedology. University Press, Jerusalem, pp. 29–39.
material as part of a nested approach to soil mapping in the Arkansas River Valley. Zaine, M.F., Perinotto, J.A.J., 1996. Patrimônios Naturais e História Geológica da Região de
Catena 178 (February), 100–108. https://doi.org/10.1016/j.catena.2019.02.031. Rio Claro - SP. Câmara Municipal de Rio Claro e Arquivo Público Histórico do
Rockwell, B.W., 2013. Automated Mapping of Mineral Groups and Green Vegetation From Município de Rio Claro, Rio Claro, pp. 1–99.
Landsat Thematic Mapper Imagery With an Example From the San Juan Mountains, Zeraatpisheh, M., Ayoubi, S., Jafari, A., Finke, P., 2017. Comparing the efficiency of digital
Colorado. U.S. Geological Survey, Virginia https://doi.org/10.3133/sim3252. and conventional soil mapping to predict soil types in a semi-arid region in Iran. Geo-
Rowan, L.C., Mars, J.C., 2003. Lithologic Mapping in the Mountain Pass, California Area morphology 285, 186–204. https://doi.org/10.1016/j.geomorph.2017.02.015.
Using Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER)
Data. 84 pp. 350–366.
Salgado, A.A.R., Bueno, G.T., Diniz, A.D., Marent, B.R., 2015. In: Vieira, B.C., Salgado, A.A.R.,
Santos, L.J.C. (Eds.), Landscapes and Landforms of Brazil. Springer.
