Engineering Geology
journal homepage: www.elsevier.com/locate/enggeo
ARTICLE INFO
Keywords: Landslide susceptibility; Principal component analysis; Multiple logistic regression; Weights of evidence; Geographical information system; Bivariate statistical methods; Natural hazards

ABSTRACT
In this paper, we present a suitable integration of discrete and continuous data in a single methodology based on systematically collected landslide inventory data. Eleven landslide conditioning factors were analyzed and used, of which eight are DEM-derived variables and three are thematic polygon-type variables (shallow geology, geomorphology and soil land-use). Principal Component Analysis (PCA) was used to avoid the effect of multicollinearity. Additionally, three proposals were developed using the Logistic Regression (LR) and Weights of Evidence (WoE) methods, which make efficient use of the continuous and discrete variables, respectively. The performance of each proposal was evaluated by the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curves. The analysis indicated that Proposal 1 with AUC = 0.8578 and Proposal 2 with AUC = 0.8459 give the best LSI assessment, while Proposal 3 with AUC = 0.8054 has the lowest predictive performance of the three. In comparison with the WoE method, our proposal shows an increase in high and very high susceptibility in areas with complex topography, which is consistent with the reported landslides.
P. Goyes-Peñafiel and A. Hernandez-Rojas, Engineering Geology 280 (2021) 105958
* Corresponding author.
E-mail addresses: goyes.yesid@gmail.com (P. Goyes-Peñafiel), maria.hernandez26@correo.uis.edu.co (A. Hernandez-Rojas).
https://doi.org/10.1016/j.enggeo.2020.105958
Received 4 March 2020; Received in revised form 14 August 2020; Accepted 6 December 2020
Available online 13 December 2020
0013-7952/© 2020 Elsevier B.V. All rights reserved.

1. Introduction

Landslides are one of the major natural hazards controlled by a combination of factors such as rock type, slope, and land use type that lead to instability in the terrain (Van Westen et al., 1999). Some of the consequences are property damage, economic loss, and human casualties. These problems have led governments to work on the prevention and mitigation of disasters. For instance, Colombia issued Decree 1077 of 2015, which contains the minimum requirements for producing Landslide Susceptibility Maps (LSM). These requirements include the use of discrete data (vectorial features such as Shallow Geological Units, Geomorphology, and land-use thematic maps) integrated with continuous data (raster) derived from a Digital Elevation Model (DEM), and a landslide field inventory (Servicio Geológico Colombiano, 2017). Therefore, the suitable integration of discrete (also called categorical) and continuous data in a single methodology represents one of the most important challenges in LSM.

LSM describe the spatial distribution of the landslide probability in an area based on local geo-environmental factors (Dai et al., 2002). Each pixel of an LSM has an associated Landslide Susceptibility Index (LSI) value. The methodology to calculate LSI at a regional scale can be roughly divided into two categories: heuristically based and statistically based (Corominas et al., 2014). The heuristically based approach produces an index based on a landslide inventory and the weights assigned to conditioning factors (Luo and Liu, 2018; Van Westen et al., 2008). This can result in very reliable maps, on the condition that the mapping was done with care (Van Westen et al., 1999), but it involves the subjective definition of the weights of each conditioning factor (Lin et al., 2017). The statistically based approach uses past landslides and conditioning factors to produce an objective quantitative LSI map (Luo and Liu, 2018). This approach includes bivariate statistical models such as Weights of Evidence (Hong et al., 2017; Pamela et al., 2018; Mahdadi et al., 2018) and frequency ratio (Aditian et al., 2018), multivariate statistical techniques such as Logistic Regression (Hemasinghe et al., 2018; Lombardo and Mai, 2018; Kadavi et al., 2019; Yang et al., 2019; Cantarino et al., 2019; Zhao et al., 2019), as well as non-linear methods such as support vector machine (Tien Bui et al., 2019; Achour and Pourghasemi, 2019; Hu et al., 2019; Bui et al., 2020) and artificial neural network (Cantarino et al., 2019; Wang et al., 2019; Abbaszadeh Shahri et al., 2019; Bragagnolo et al., 2020).

Logistic Regression has been used in numerous landslide susceptibility assessments to predict the probability of a landslide occurring from independent explanatory variables, providing accurate and reliable results (Pradhan, 2010; Ozdemir and Altural, 2013; Tsangaratos
and Ilia, 2016; Hemasinghe et al., 2018; Oh et al., 2018). Both continuous and categorical variables can be used for Logistic Regression analyses (Lin et al., 2017; Mahdadi et al., 2018). However, according to Zhao et al. (2019), continuous variables need to be converted to categorical variables or vice versa. In the context of landslides, categories such as geology or land use have no intrinsic ordering. Additionally, for variables of ordinal type (similar to the categorical ones but with an explicit ordering of the classes, such as slope or aspect classified in ranges), a classification is necessary, which implies a subjective selection of the ranges chosen for each study zone.

On the other hand, the Weights of Evidence method calculates the weight for each causative factor of a landslide based on the presence or absence of landslides within the area. The fundamental assumption of this method is that future landslides will occur under conditions similar to those that contributed to previous landslides. It also assumes that causative factors for the mapped landslides remain constant over time (Regmi et al., 2010; Pamela et al., 2018; Mahdadi et al., 2018; Ozdemir and Altural, 2013). This method uses discrete data or continuous data divided into several classes (ordinal data), and overlaps each one with the landslide map to calculate their weights based on the Bayes theorem (Mahdadi et al., 2018).

This paper aims to calculate LSI based on the Logistic Regression and Weights of Evidence methods in order to perform a suitable integration of categorical and continuous data in the landslide context. The conditioning factors initially used for Logistic Regression are slope, general curvature, profile curvature, plan curvature, flow length, terrain roughness index, flow accumulation, and topographic wetness index. Logistic Regression requires the use of independent variables; therefore, Principal Component Analysis (PCA) was used to decrease the dimensionality and thus guarantee the independence between the variables of the model. For Weights of Evidence, Shallow Geological Units, Geomorphology, and land-use thematic maps were used. In order to choose the best model in this study, the accuracy of each assessment was evaluated by the Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) value, which are based on sensitivity (true positive rate) and specificity (true negative rate) (Tien Bui et al., 2016; Bui et al., 2020).

2. Materials and methods

2.1. Study area

The Popayan municipality is the capital of the Cauca department and is located in the southwest of Colombia (Fig. 1), between the longitudes 73°48′ W and 76°26′ W and latitudes 2°18′ N and 2°36′ N, with an average altitude of 1985 m above sea level (masl). The total area covers approximately 478.3 km² and the urban area 0.0417 km². This municipality extends from northwest to southeast of the inter-Andean Upper Cauca River Valley, which is limited by the western flank of the Central Cordillera and the eastern flank of the Western Cordillera in the Andes Mountains.

The climate in the region is determined by the proximity of the inter-tropical convergence zone, very high humidity tropical rain forests, and humid temperate and cold zones, resulting in a fairly constant humidity (Bastidas et al., 2004; Servicio Geológico Colombiano, 2015). Precipitation and temperature records are available from the climate station at the Popayan airport. These records show that annual rainfall ranges from 1400 mm to 2500 mm (average 2119 mm). The distribution of rainfall over a year is not uniform, and most rainfall events occur between September and December, with a relatively dry period from June to August. The mean annual temperature is about 18 °C. The average annual minimum temperature is between 12 and 14 °C and the maximum annual temperature is between 23 and 25 °C.

The topography of the Popayan municipality is controlled by two main relief zones. The first one is located in the southeast of the municipality and includes the western flank of the Central Cordillera. This zone covers about 30% of the study area and is characterized by high elevations ranging from 1862 to 3822 masl. The inclination in this mountainous zone is steep, with a mean slope of 25° and slope orientation mainly towards the southeast. These features make this zone prone to landslides. On the other hand, the second zone covers about 70% of the study area and is located at an average altitude of 1800 masl in the inter-Andean Cauca–Patia Basin, being crossed by the Cauca, Hondo, and Palace rivers. This zone, unlike the first one, is characterized by a wide plain covering it. In the urban area and its surroundings, the slopes gradually decline, but towards the west, they are steeper and susceptible
to landslides due to uplifted areas as a result of tectonic activity (Servicio Geológico Colombiano, 2015).

The geologic setting of the study area is the Cauca–Patia Basin, located in the North Andean block (Servicio Geológico Colombiano, 2015). This basin is divided from north to south into the Cauca Subbasin, Popayan Highland, and Patia Subbasin. The Popayan Highland covers most of the study area (70%) and is characterized by the presence of landforms such as fans in the urban area, and hills and exhumed rocks in the west. This landscape is a result of the weathering of pyroclastic rocks, which develops residual soils. The rest of the area (30%) corresponds to landforms related to the Central Cordillera, such as recent and old volcanoes, calderas, and those controlled by tectonic setting. These landforms are developed in plutonic and volcanic rocks to the northeast and, in the southeast, in residual soils of these rocks.

2.2. Landslide inventory map

Eight hundred sixteen landslides were mapped based on geological field survey, historical reports and interpretation of remote sensing images (Fig. 1). The field survey identifies, locates, and classifies landslides. The historical report contains a compilation of historical events associated with slope instability. This data set can be found in the Landslide Information System (SIMMA) on the Colombian Geological Survey web site (simma.sgc.gov.co). Finally, the remote sensing interpretation was made by observing aerial photographs and satellite images with a resolution of 25 m. All the data previously mentioned were provided by the Colombian Geological Survey (SGC in Spanish).

The landslides within the study area are mainly located in residual soils with medium to high slope angles (82%) and, to a lesser extent, in rock units (18%). Their sizes range from 26 m² to 39,428 m², with an average area of 2264 m². According to their type of movement, landslides were classified into slides, falls (also subdivided based on their material type), lateral spreads, and complexes. While falls are not landslides, they can be taken as indicators of an upcoming event. The remaining landslides are presented as "undefined" due to the impossibility of defining their type of movement by remote sensing interpretation (Fig. 2). In addition to the 816 landslides mapped, 200 new landslide occurrence points were created and classified according to their susceptibility to landslide occurrence based on the expert review of the conditioning factors present in each one and the material evidenced in fieldwork.

2.3. Landslide conditioning factors

We introduced 11 landslide variables as relevant and influential to landslide phenomena considering the geo-environmental settings of the research area, the data availability, and Decree 1077 of 2015 of Colombian legislation (Table 1). The categorical variables selected include Shallow Geological Units (p1), Geomorphology (p2), and Soil Land-Use (p3, Corine Land Cover units). The continuous variables were extracted from the DEM with a spatial resolution of 12.5 m created by the Advanced Land Observation Satellite with a Phased Array type L-band Synthetic Aperture Radar (ALOS PALSAR). These include slope angle (x1), Terrain Roughness Index (x2, TRI), Topographic Wetness Index (x3, TWI), plan curvature (x4), profile curvature (x5), general curvature (x6), flow length (x7), and flow accumulation (x8).

Table 1
Variable names and types used in this study. The data was provided by the SGC (Servicio Geológico Colombiano, 2017).
Variable   Name   Data type   Variable type   Description

For categorical variables, Shallow Geological Units are important in slope instability because they represent the material exposed in the terrain with characteristics such as resistance, weathering level, and consistency (Servicio Geológico Colombiano, 2017). Geomorphology registers the landforms and the weathering process by which they are affected (Carvajal, 2012). Soil Land-Use describes the actual conditions of vegetation and the human activity in the territory (Van Westen et al., 2008).

One of the most important variables derived from the DEM is the slope, which influences landslide stability because it controls the velocity of the material (Abbaszadeh Shahri et al., 2019; Zhao et al., 2019; Trigila et al., 2015). TRI indicates the undulation or variation of relief (Servicio Geológico Colombiano, 2015). TWI is used to characterize the spatial distribution of soil saturation and runoff volume (He et al., 2019). Plan, profile, and general curvatures are morphometric parameters that represent the spatial variation of the slope gradient and highlight converging (concave curvature) or diverging (convex curvature) water flows (Trigila et al., 2015). Flow length is the longest upslope distance along the flow path from each cell to the top of a drainage divide (Regmi et al., 2010). Finally, flow accumulation is the number of
upstream cells that flow into each cell of the study area (Trigila et al., 2015).

2.4. Logistic regression and principal component analysis

Logistic Regression (LR) is a quantitative statistical method extensively used for landslide susceptibility analysis. It is used to predict the probability of the presence of landslides as a function of predictor variables (Daya et al., 2018). Those variables can be continuous, discrete or any combination of both types, and they do not necessarily have normal distributions (Mahdadi et al., 2018).

$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = Z, \qquad Z = b_0 + \sum_{i=1}^{n} b_i \cdot x_i \tag{1}$$

where $p$ is the probability of landslide occurrence, $Z$ is a linear combination of $n$ explanatory variables $\{x_i\}_{i=1}^{n}$ and the regression coefficients $\{b_i\}_{i=1}^{n}$, and $b_0$ is the intercept (also called the bias).

Once the regression coefficients and the intercept are obtained, it is possible to calculate the probability $p$ of occurrence for any value of the explanatory variables.

$$p = \frac{e^{Z}}{1 + e^{Z}} = \frac{1}{1 + e^{-Z}} \tag{2}$$

The initial process in constructing the landslide susceptibility model is determining the optimal combination of factors (Ozdemir and Altural, 2013; He et al., 2019; Wang et al., 2019). Thus, we quantified the multicollinearity and estimation ability of each factor with Pearson's correlation coefficient. Multicollinearity exists when significant correlations are found among the predictors. Its presence may negatively affect the interpretation of regression coefficients (Patriche et al., 2016; Lin et al., 2017). In order to detect multicollinearity, a tolerance test can be applied: the tolerance of a predictor is computed as 1 − R², where R² is the coefficient of determination obtained by regressing that predictor on the others, and tolerance values below 0.2 indicate multicollinearity (Bai et al., 2010). Consequently, correlated (mutually dependent) variables must be removed from the landslide susceptibility model in order to increase the efficiency of the LR. However, a strategy that requires less supervision is to reduce the dimensionality of the variables with Principal Component Analysis (PCA), which deals with possible multicollinearity (Baeza and Corominas, 2001). Lei et al. (2011) state that, mathematically, PCA is a process that decomposes the covariance matrix of a data matrix into two parts: eigenvalues and column eigenvectors. The reduction process is achieved by taking n variables b1, b2, …, bn and combining them to produce principal components PC1, PC2, …, PCn that are uncorrelated. Moreover, with a few of the PCs, it is possible to preserve a high percentage (explained variance) of the total variance (Awange et al., 2018; Daya et al., 2018). Thus, the set of correlated variables can be reduced to a smaller number of new variables which are independent of each other, each being a linear combination of the original variables. Additionally, the relationship between the resulting predictors can be calculated and checked so that there are no observed instances of multicollinearity (Gwelo, 2019).

[…] assigned to each of the different classes within a factor map is classified (e.g. each geological unit within a Shallow Geological map). An extensive explanation of WoE is presented in Ilia and Tsangaratos (2016). The following equations illustrate the Bayes theorem (Van Westen et al., 2003).

$$W_i^{+} = \log\left(\frac{P\{B_i \mid S\}}{P\{B_i \mid \bar{S}\}}\right), \tag{3}$$

and

$$W_i^{-} = \log\left(\frac{P\{\bar{B}_i \mid S\}}{P\{\bar{B}_i \mid \bar{S}\}}\right), \tag{4}$$

where $B_i$ = presence of a potential landslide conditioning factor, $\bar{B}_i$ = absence of a potential landslide conditioning factor, $S$ = presence of a landslide, and $\bar{S}$ = absence of a landslide.

Based on Eqs. (3) and (4), Van Westen (2002) describes the weights of evidence in numbers of pixels, as shown in Eq. (5). Fig. 3 shows the relation between potential landslide conditioning factors and landslides in terms of the number of pixels (Npix).

$$P\{B_i \mid S\} = \frac{\mathrm{Npix}_1}{\mathrm{Npix}_1 + \mathrm{Npix}_2}, \quad P\{B_i \mid \bar{S}\} = \frac{\mathrm{Npix}_3}{\mathrm{Npix}_3 + \mathrm{Npix}_4}, \quad P\{\bar{B}_i \mid S\} = \frac{\mathrm{Npix}_2}{\mathrm{Npix}_2 + \mathrm{Npix}_1}, \quad P\{\bar{B}_i \mid \bar{S}\} = \frac{\mathrm{Npix}_4}{\mathrm{Npix}_4 + \mathrm{Npix}_3} \tag{5}$$

where $\mathrm{Npix}_1$, $\mathrm{Npix}_2$, $\mathrm{Npix}_3$ and $\mathrm{Npix}_4$ are the pixel counts of the four possible combinations of $B_i$, $\bar{B}_i$, $S$, $\bar{S}$, as shown in Fig. 3a.

2.6. Methodology

In this study we calculated the LSI based on LR and WoE methods in order to perform a suitable integration of categorical and continuous data in the landslide context. The geospatial database of the Popayan municipality was provided by the SGC, which involved a landslide
inventory and 11 landslide-related variables including Shallow Geological Units, Geomorphology, land-use, slope, general curvature, profile curvature, plan curvature, flow length, terrain roughness index, flow accumulation, and topographic wetness index.

The spatial data was collected in a database in geopackage (.gpkg) format. All information was projected into the MAGNA-SIRGAS / Colombia West zone coordinate system (EPSG: 3115). Additionally, the boundary of the study area was checked for all vector layers. For the raster-derived data, it was verified that all layers had the same number of pixels and a pixel size of 12.5 × 12.5 m.

The first stage of the methodology (Fig. 4a) is the pre-processing of the categorical and continuous variables. The categorical variables (p1, p2, p3) are used for the calculation of LSI with the WoE method (Eq. 5), and the result is a new continuous variable. The raster-derived variables (x1, x2, x3, x4, x5, x6, x7, x8) may have multicollinearity, which is evidenced in the correlation matrix. Therefore, to avoid the multicollinearity, we applied PCA to reduce the dimensionality to two principal components. Once the pre-processing is finished, three continuous variables are obtained: PC1, PC2 and LSI.

In the second stage, three proposals were carried out for the integration of the variables PC1, PC2, and LSI. The relationship between the three continuous variables and the proposals performed are shown in Fig. 4b. The objective of these proposals is to evaluate the best strategy to calculate the final LSI. We do this by analyzing the ROC curves and the area under the curve (AUC) of each of the proposals. In Proposal 3, the sigmoid function F(z) = 1/(1 + exp(−z)) is presented, which is a particular case of the logistic regression where b0 = 0 and bi = 1.

3. Results and discussion

3.1. Landslide factor analysis

The correlation matrix in Fig. 5 shows a high positive correlation for the variable pairs (x3, x5) and (x6, x7), and a high negative correlation for the pairs (x6, x8) and (x7, x8). Here we applied a dimensionality reduction where the explained variance using 2 principal components is 68.59%. By using those principal components, which have almost zero correlation, the pre-processing data mentioned in Fig. 4 were calculated.

For Proposal 1, the LR model uses PC1, PC2 and LSI. All of the coefficients are significant in this LR model. The p-values (P > |z|) for all variables are below 0.05, and thus the log(odds) are all statistically significant according to the Wald test (|z| > 2) shown in Table 2.

In Proposal 2, LR is performed for PC1 and PC2 and, separately, for LSI. The coefficients (Table 3) give similar values to Proposal 1 because the three variables do not have multicollinearity, and thus only a minimal change in AUC value is obtained. To calculate LSI2, the LR result for PC1 and PC2 and the LR result for LSI are multiplied.

Finally, Proposal 3 is a special case where the sigmoid function represents the logit probability with values equal to 1 for the coefficient (bi = 1) and to 0 for the intercept (b0 = 0). This normalizes the LSI values in a range from 0 to 1, so they can be compared with the results obtained with LR. Proposals 2 and 3 are practically a special case of each other.
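The three proposals can be illustrated with a short Python sketch. It plugs the coefficients reported in Tables 2 and 3 into Eq. (2); it is an illustration under stated assumptions, not the authors' implementation. In particular, assigning the first intercept of Table 3 to the PC1/PC2 regression and the second to the LSI regression is our reading of the table, the choice of the WoE-based LSI as the argument `z` of Proposal 3 is an assumption, and the example pixel values are hypothetical. The function and constant names are ours.

```python
import math

# Proposal 1: a single regression on PC1, PC2 and LSI (coefficients from Table 2).
PROPOSAL_1 = {"intercept": 0.9353, "PC1": 0.7215, "PC2": -0.5473, "LSI": 0.2868}

# Proposal 2: two separate regressions (coefficients from Table 3).
# Assumption: the first intercept belongs to the PC model, the second to the LSI model.
PROPOSAL_2_PCS = {"intercept": 0.6342, "PC1": 0.8238, "PC2": -0.6146}
PROPOSAL_2_LSI = {"intercept": 0.7890, "LSI": 0.3607}


def sigmoid(z):
    """F(z) = 1 / (1 + exp(-z)), the inverse of the logit in Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-z))


def proposal_1(pc1, pc2, lsi):
    """One logistic model combining the three continuous variables."""
    c = PROPOSAL_1
    return sigmoid(c["intercept"] + c["PC1"] * pc1 + c["PC2"] * pc2 + c["LSI"] * lsi)


def proposal_2(pc1, pc2, lsi):
    """Product of two logistic models: one for (PC1, PC2), one for LSI."""
    a, b = PROPOSAL_2_PCS, PROPOSAL_2_LSI
    p_pcs = sigmoid(a["intercept"] + a["PC1"] * pc1 + a["PC2"] * pc2)
    p_lsi = sigmoid(b["intercept"] + b["LSI"] * lsi)
    return p_pcs * p_lsi


def proposal_3(z):
    """Sigmoid with b0 = 0 and bi = 1, i.e. no fitted coefficients."""
    return sigmoid(z)


# Hypothetical values for one pixel (not taken from the study data).
pixel = {"pc1": 1.2, "pc2": -0.4, "lsi": 0.8}
lsi1 = proposal_1(**pixel)
lsi2 = proposal_2(**pixel)
lsi3 = proposal_3(pixel["lsi"])  # assumption: z is the WoE-based LSI
print(f"LSI1={lsi1:.3f}  LSI2={lsi2:.3f}  LSI3={lsi3:.3f}")
```

Because Proposal 2 multiplies two probabilities, its output can never exceed either factor, which is one plausible reason for the lower LSI2 values reported in the text. The inputs would need to be on the same scale as the variables used to fit the published coefficients, which the source does not specify.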
Fig. 4. Methodological flow chart of the study. (a) Pre-processing data generation using principal component analysis and Weights of Evidence. (b) Three proposals for LSI calculation based on integrating Logistic Regression (LR) and sigmoid function F(z).
Fig. 5. Correlation matrix for continuous variables.
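The pre-processing stage of Fig. 4a can be sketched with NumPy as follows. The sketch computes the WoE weights of one factor class from pixel counts (Eqs. 3 to 5), a Pearson correlation matrix, and a PCA by eigendecomposition of the covariance matrix. Several things here are assumptions rather than details from the study: the data are synthetic, the variables are standardised before PCA, the pixel counts are invented, and the function names are ours.

```python
import numpy as np


def woe_weights(npix1, npix2, npix3, npix4):
    """Return (W+, W-) for one class of a factor map, following Eqs. (3)-(5).

    npix1: landslide pixels inside the class       (B and S)
    npix2: landslide pixels outside the class      (not B and S)
    npix3: non-landslide pixels inside the class   (B and not S)
    npix4: non-landslide pixels outside the class  (not B and not S)
    All four counts are assumed to be positive.
    """
    p_b_s = npix1 / (npix1 + npix2)
    p_b_ns = npix3 / (npix3 + npix4)
    p_nb_s = npix2 / (npix2 + npix1)
    p_nb_ns = npix4 / (npix4 + npix3)
    return float(np.log(p_b_s / p_b_ns)), float(np.log(p_nb_s / p_nb_ns))


def pca(X, n_components=2):
    """PCA via eigendecomposition of the covariance matrix.

    Returns the component scores and the fraction of variance they explain.
    Assumption: columns are standardised first (zero mean, unit variance).
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: symmetric matrix
    order = np.argsort(eigvals)[::-1]           # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xs @ eigvecs[:, :n_components]
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return scores, float(explained)


# Synthetic stand-in for the eight DEM-derived variables x1..x8 (not the study data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.5 * rng.normal(size=(1000, 8))

corr = np.corrcoef(X, rowvar=False)             # 8 x 8 Pearson correlation matrix
scores, explained = pca(X, n_components=2)
pc_corr = np.corrcoef(scores, rowvar=False)[0, 1]

# Invented pixel counts for one class of a categorical map.
w_plus, w_minus = woe_weights(npix1=30, npix2=70, npix3=100, npix4=900)

print(f"explained variance (2 PCs): {explained:.2%}")
print(f"correlation between PC1 and PC2: {pc_corr:.1e}")
print(f"W+ = {w_plus:.4f}, W- = {w_minus:.4f}")
```

A positive W+ together with a negative W- indicates that the class is over-represented among landslide pixels. The near-zero correlation between the two component scores is the property that makes them usable as independent predictors in the logistic regression.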
Table 2
Logistic Regression summary for Proposal 1.
Variable     Coef.     Std. Error   z        P > |z|
PC1           0.7215   0.137         5.269   0.000
PC2          -0.5473   0.144        -3.803   0.000
LSI           0.2868   0.072         4.008   0.000
Intercept     0.9353   0.210         4.453   0.000

Table 3
Logistic Regression summary for Proposal 2.
Variable     Coef.     Std. Error   z        P > |z|
PC1           0.8238   0.135         6.092   0.0000
PC2          -0.6146   0.140        -4.395   0.0000
Intercept     0.6342   0.186         3.411   0.0006
LSI           0.3607   0.066         5.506   0.0000
Intercept     0.7890   0.175         4.497   0.0000

[…] values (>0.9) are located towards the northwest and southeast, where landslides were previously mapped. This is consistent with the geological conditions that act as triggers, such as: residual soils derived from pyroclastic deposits developed in steep slopes, landforms belonging to volcanic and structural environments related to the Central Cordillera, and soil land-use with a low cover of vegetation. In contrast, the lowest (<0.3) and medium (0.5–0.7) LSI1 values are located in the center of the Popayan municipality, including the majority of the urban area, and in the Cauca, Hondo, and Palace river valleys. The reason for these low values is that this zone constitutes a large fan with a plain topography as a product of the accumulation of volcanic and alluvial material, which is consistent with the few landslides reported in this area (see Supplementary material).

Fig. 6b shows the LSI map generated with Proposal 2. The steepest zones with volcanic and structural landforms located in the northwest and towards the southeast have LSI2 values lower than LSI1 (0.6–0.85), and more critically those located in the eastern part of the study area (see dashed line), with LSI2 values ranging from 0.1 to 0.5. The residual soil of the urban area shows the same behavior, developing more homogenized zones with LSI values closer to zero than LSI1, although they remain high in the south and east because of the presence of volcanic landforms. LSI2 produces very low values for drainages (<0.05), and zones with reported landslides continue to have high LSI values.

Fig. 6c shows the LSI map obtained with Proposal 3. LSI3 shows very low values (<0.02) compared to Proposals 1 and 2 in the volcanic–alluvial fan (a large area of plain topography), although in the urban area there are still zones with high LSI values due to volcanic landforms. The steep slopes in the eastern zone (see dashed line) continue showing low LSI values (<0.05) compared with LSI1, which has values close to one. The remaining areas have similar patterns to the LSI2 map, excluding the landforms related to the tectonic setting, which present very low LSI3 values. These are not accurate, because many landslides are reported in these landforms.

Fig. 6. Landslide Susceptibility Index maps: (a) Proposal 1 model; (b) Proposal 2 model; (c) Proposal 3 model; (d) WoE model. The black dashed circle is the zone with the main difference between all methods.

The zoomed areas in Fig. 6 show the results obtained with the three proposals and the WoE method in more detail. In Fig. 6a, the zoomed
area covers the Cauca River riverbed. The low LSI1 values in the river valley (<0.2) are consistent because these zones have low dips. Furthermore, Fig. 6a demonstrates the contrast between the LSI1 values as a product of the relief change, passing from steep slopes of volcanic landforms with high LSI1 values (>0.85) towards the west to plain topography with low LSI1 values (<0.5) resulting from moderate dips located in a volcanic–alluvial fan in the east. The majority of the reported landslides show very high LSI1 values (>0.9), indicating the accuracy of this proposal.

Fig. 6b shows the zoomed area for the LSI2 map. The low LSI2 values (<0.1) are also present in the Cauca River riverbed. Two of the main differences with LSI1 are that the west zone does not have very high LSI2 values (>0.7) and that the east zone is a bit more homogenized because the LSI2 values there are lower (<0.1). Landslides still reach high LSI values, but not as high as in the LSI1 map (>0.8).

The zoomed area of LSI3 shows even lower LSI values. The plain topography stands out because the LSI3 values are close to zero (<0.02). The Cauca River riverbed still presents low values, but those are moderate in the volcanic landforms (<0.6), which is not accurate because there are landslides in this zone.

Finally, the zoomed area for the LSI map obtained with the WoE method is shown in Fig. 6d. The two zones with high and low LSI values are still distinguishable and even more homogenized than in the LSI1, LSI2 and LSI3 maps. Unlike the values from those maps, the Cauca River riverbed presents moderate LSI values. Some landslides located in the volcanic–alluvial fan have low LSI values, and the landslide occurrence points labeled as "absent" report high LSI values. These results are not very suitable because they are the opposite of what is expected from the reported landslides and the geological conditions of the zone.

3.2.1. Validation and comparison of proposals

The ROC curves for the four models are shown in Fig. 7. The ROC curve was calculated based on Vakhshoori and Zare (2016) for the WoE method. This statistical method and its associated AUC value are effective for evaluating the performance of different models. The model with the largest AUC is considered the best model (Zhou et al., 2018). The ROC curve for Proposal 1 is the highest, followed by the curve of Proposal 2. Furthermore, the curves of Proposal 3 and the WoE method present a similar pattern, although the former is a bit higher than the latter. They are both lower than those of Proposals 1 and 2.

According to the statistics, the AUC values (Fig. 7) of the three proposals are better than that of the WoE method. Specifically, Proposal 1 has the largest AUC value of 0.8578, while Proposal 2 exhibits a slightly lower AUC (0.8459). Similar numerical characteristics are also found in Proposal 3 and the WoE method, where the AUC value for the latter (0.8030) is slightly lower than the former (0.8054). Proposals 1 and 2 show an optimal performance of the LSI calculation based on the ROC curves and AUC values. The comparison of the models shows that the AUC value of Proposal 1 is a little greater than that of Proposal 2, and that both differ more markedly from Proposal 3 and the WoE method. This shows that the models obtained with Proposals 1 and 2 exhibit the highest reliability in predicting landslides from the input data. The least reliable methods are Proposal 3 and the WoE method.

Table 4 shows a set of ranges according to 5 quantiles for the LSIs and natural breaks for WoE. A significant variation between the ranges can be noticed mainly for the 80th percentile, which is related to the very high susceptibility category. A classification of LSI1 and LSI3 to assess the difference in susceptibility categories was made using the quantiles obtained for LSI1, and natural breaks were used for WoE. The result of the classification is shown in Fig. 8. LSI1 is the only map with a substantial percentage of pixels (20%) in the "very high" susceptibility category. The other proposals have less than 1%. This can be seen in the black dashed circle of Fig. 6. Additionally, for the "very low" category, the percentage of pixels increases from 20% to 42% and 61% for LSI1, LSI2 and LSI3, respectively.

4. Conclusions

In this study, we calculated the LSI based on LR and WoE methods to perform a suitable integration of categorical and continuous data in the landslide context. The result is a capable tool for landslide susceptibility mapping that can be implemented by the local and regional Colombian government authorities as an element of diagnosis for territorial planning in order to reduce the risk of disasters and develop environmental control. Furthermore, the same methodology can be applied to zones with similar conditions, particularly for tropical countries in response to climate change and the need for adaptation to new factors that could trigger landslides.

The independence of continuous variables was evaluated using PCA. According to the three proposals, the LSI is low in the middle section of the study area due to soft slopes, plain topography, and residual soils. In contrast, the highest values are located in steep slopes related to volcanic and structural landforms. The results show that Proposal 1 with AUC = 0.8578 and Proposal 2 with AUC = 0.8459 are better approaches for landslide susceptibility modeling in our study area than Proposal 3 and WoE. The success of Proposals 1 and 2 is very promising for landslide spatial prediction, and their LSI maps could represent an initial assessment for any municipality planning project with the objective of implementing more detailed studies in those areas previously identified as highly susceptible.
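The AUC values used to rank the models come from ROC curves, which plot the true positive rate against the false positive rate as the LSI threshold is lowered. A minimal NumPy sketch of that computation is given here for illustration. It is not the authors' code: the labels and scores are synthetic, the function name `roc_auc` is ours, and the area is obtained with the trapezoidal rule.

```python
import numpy as np


def roc_auc(labels, scores):
    """Return (fpr, tpr, auc) for binary labels and continuous scores.

    labels: 1 for landslide, 0 for non-landslide (both classes must be present)
    scores: susceptibility values, higher meaning more susceptible
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)

    order = np.argsort(-scores, kind="mergesort")   # descending score
    labels = labels[order]
    sorted_scores = scores[order]

    # Keep one ROC point per distinct score so tied scores move together.
    last = np.r_[np.nonzero(np.diff(sorted_scores))[0], labels.size - 1]
    tps = np.cumsum(labels)[last]                   # true positives per threshold
    fps = np.cumsum(~labels)[last]                  # false positives per threshold

    tpr = np.concatenate(([0.0], tps / tps[-1]))    # sensitivity
    fpr = np.concatenate(([0.0], fps / fps[-1]))    # 1 - specificity
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc


# Synthetic example: landslide points receive somewhat higher scores on average.
rng = np.random.default_rng(1)
demo_labels = rng.integers(0, 2, size=500)
demo_scores = 0.3 * demo_labels + rng.uniform(size=500)
_, _, auc_demo = roc_auc(demo_labels, demo_scores)
print(f"AUC on synthetic data: {auc_demo:.3f}")
```

An AUC of 0.5 corresponds to a model that ranks landslide and non-landslide pixels no better than chance, and 1.0 to a perfect ranking, which is the scale on which the reported values between 0.80 and 0.86 should be read.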
Table 4
Statistical analysis using the percentile classification system. *Natural breaks were used for WoE.
LSI1   LSI2   LSI3   Percentile   WoE*
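The percentile classification behind Table 4 can be sketched as follows: break values are taken from the quantiles of one map and then applied to another, so that the share of pixels per susceptibility class can be compared. This is a hedged illustration only. The 20/40/60/80 percentile breaks and the class names other than "very low" and "very high" are assumptions, the two maps are synthetic, and the natural-breaks classification used for WoE is not reproduced.

```python
import numpy as np

# Assumed class names; the source names only "very low", "high" and "very high".
CLASSES = ["very low", "low", "medium", "high", "very high"]


def quantile_breaks(lsi, percentiles=(20, 40, 60, 80)):
    """Break values that split a map into five equal-count classes."""
    return np.percentile(lsi, percentiles)


def class_shares(lsi, breaks):
    """Fraction of pixels falling into each class defined by the breaks."""
    idx = np.digitize(lsi, breaks, right=True)      # class index 0..len(breaks)
    counts = np.bincount(idx, minlength=len(breaks) + 1)
    return counts / counts.sum()


# Synthetic maps: map_b is pushed towards zero relative to map_a.
rng = np.random.default_rng(2)
map_a = rng.uniform(size=10_000)
map_b = map_a ** 3

breaks_a = quantile_breaks(map_a)
shares_a = class_shares(map_a, breaks_a)            # about 20% per class by design
shares_b = class_shares(map_b, breaks_a)            # classified with map_a's breaks

for name, sa, sb in zip(CLASSES, shares_a, shares_b):
    print(f"{name:>9}: map_a {sa:6.1%}   map_b {sb:6.1%}")
```

Classifying a second map with the first map's breaks makes shifts between categories visible, in the same spirit as the growth of the "very low" category from LSI1 to LSI3 described in the text.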
Tsangaratos, P., Ilia, I., 2016. Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. Catena 145, 164–179.
Vakhshoori, V., Zare, M., 2016. Landslide susceptibility mapping by comparing weight of evidence, fuzzy logic, and frequency ratio methods. Geomatics, Natural Hazards and Risk 7, 1731–1752.
Van Westen, C.J., 2002. Weights of evidence modeling for landslide susceptibility mapping. Technical Report. International Institute for Geoinformation Science and Earth Observation (ITC), Enschede.
Van Westen, C.J., Seijmonsbergen, A.C., Mantovani, F., 1999. Comparing landslide hazard maps. Nat. Hazards 20, 137–158.
Van Westen, C.J., Rengers, N., Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Nat. Hazards 30, 399–419.
Van Westen, C.J., Castellanos, E., Kuriakose, S.L., 2008. Spatial data for landslide susceptibility, hazard, and vulnerability assessment: an overview. Eng. Geol. 102, 112–131.
Varnes, D., 1978. Slope movement, types and processes. In: Schuster, R., Krizek, R. (Eds.), Landslides: Analysis and control 176 Chapter 2. Transportation Research Board, Washington, D.C., pp. 11–33.
Wang, Y., Fang, Z., Hong, H., 2019. Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China. Sci. Total Environ. 666, 975–993.
Yang, J., Song, C., Yang, Y., Xu, C., Guo, F., Xie, L., 2019. New method for landslide susceptibility mapping supported by spatial logistic regression and GeoDetector: A case study of Duwen Highway Basin, Sichuan Province, China. Geomorphology 324, 62–71.
Zhao, Y., Wang, R., Jiang, Y., Liu, H., Wei, Z., 2019. GIS-based logistic regression for rainfall-induced landslide susceptibility mapping under different grid sizes in Yueqing, Southeastern China. Eng. Geol. 259, 105147.
Zhou, C., Yin, K., Cao, Y., Ahmed, B., Li, Y., Catani, F., Pourghasemi, H.R., 2018. Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the three Gorges Reservoir area, China. Comput. Geosci. 112, 23–37.