
Engineering Geology 280 (2021) 105958

Contents lists available at ScienceDirect

Engineering Geology
journal homepage: www.elsevier.com/locate/enggeo

Landslide susceptibility index based on the integration of logistic regression and weights of evidence: A case study in Popayan, Colombia
Paul Goyes-Peñafiel *, Alejandra Hernandez-Rojas
School of Geology, Universidad Industrial de Santander, Colombia, 680002

ARTICLE INFO

Keywords:
Landslide susceptibility
Principal component analysis
Multiple logistic regression
Weights of evidence
Geographical information system
Bivariate statistical methods
Natural hazards

ABSTRACT

In this paper, we present a suitable integration of discrete and continuous data in a single methodology based on systematically collected landslide inventory data. Eleven landslide conditioning factors were analyzed and used, of which eight are DEM–derived variables and three are thematic polygon–type variables (shallow geology, geomorphology, and soil land–use). Principal Component Analysis (PCA) was used to avoid the effect of multicollinearity. Additionally, three proposals were developed using the Logistic Regression (LR) and Weights of Evidence (WoE) methods, which make efficient use of the continuous and discrete variables, respectively. The performance of each proposal was evaluated by the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curves. The analysis indicated that Proposal 1 (AUC = 0.8578) and Proposal 2 (AUC = 0.8459) give the best landslide susceptibility index (LSI) assessment, while Proposal 3 (AUC = 0.8054) shows the lowest predictive performance. In comparison with the WoE method, our proposal shows an increase in high and very high susceptibility in areas with complex topography, which is consistent with the reported landslides.

1. Introduction

Landslides are one of the major natural hazards and are controlled by a combination of factors such as rock type, slope, and land use type that lead to instability in the terrain (Van Westen et al., 1999). Their consequences include property damage, economic loss, and human casualties. These problems have led governments to work on the prevention and mitigation of disasters. For instance, Colombia issued Decree 1077 of 2015, which contains the minimum requirements for preparing Landslide Susceptibility Maps (LSM). These requirements include the use of discrete data (vector features such as Shallow Geological Units, Geomorphology, and land–use thematic maps) integrated with continuous data (raster) derived from a Digital Elevation Model (DEM), and a landslide field inventory (Servicio Geológico Colombiano, 2017). Therefore, the suitable integration of discrete (also called categorical) and continuous data in a single methodology represents one of the most important challenges in LSM.
An LSM describes the spatial distribution of landslide probability in an area based on local geo–environmental factors (Dai et al., 2002). Each pixel of an LSM has an associated Landslide Susceptibility Index (LSI) value. The methodologies used to calculate the LSI at a regional scale can be roughly divided into two categories: heuristically based and statistically based (Corominas et al., 2014). The heuristically based approach produces an index based on a landslide inventory and the weights assigned to conditioning factors (Luo and Liu, 2018; Van Westen et al., 2008). This can result in very reliable maps, provided that the mapping was done with care (Van Westen et al., 1999), but it involves the subjective definition of the weight of each conditioning factor (Lin et al., 2017). The statistically based approach uses past landslides and conditioning factors to produce an objective quantitative LSI map (Luo and Liu, 2018). This approach includes bivariate statistical models such as Weights of Evidence (Hong et al., 2017; Pamela et al., 2018; Mahdadi et al., 2018) and frequency ratio (Aditian et al., 2018), multivariate statistical techniques such as Logistic Regression (Hemasinghe et al., 2018; Lombardo and Mai, 2018; Kadavi et al., 2019; Yang et al., 2019; Cantarino et al., 2019; Zhao et al., 2019), and non–linear methods such as support vector machines (Tien Bui et al., 2019; Achour and Pourghasemi, 2019; Hu et al., 2019; Bui et al., 2020) and artificial neural networks (Cantarino et al., 2019; Wang et al., 2019; Abbaszadeh Shahri et al., 2019; Bragagnolo et al., 2020).
Logistic Regression has been used in numerous landslide susceptibility assessments to predict the probability of a landslide occurring from independent explanatory variables, providing accurate and reliable results (Pradhan, 2010; Ozdemir and Altural, 2013; Tsangaratos

* Corresponding author.
E-mail addresses: goyes.yesid@gmail.com (P. Goyes-Peñafiel), maria.hernandez26@correo.uis.edu.co (A. Hernandez-Rojas).

https://doi.org/10.1016/j.enggeo.2020.105958
Received 4 March 2020; Received in revised form 14 August 2020; Accepted 6 December 2020
Available online 13 December 2020
0013-7952/© 2020 Elsevier B.V. All rights reserved.
P. Goyes-Peñafiel and A. Hernandez-Rojas Engineering Geology 280 (2021) 105958

and Ilia, 2016; Hemasinghe et al., 2018; Oh et al., 2018). Both continuous and categorical variables can be used in Logistic Regression analyses (Lin et al., 2017; Mahdadi et al., 2018). However, according to Zhao et al. (2019), continuous variables need to be converted to categorical variables or vice versa. In the context of landslides, categories such as geology or land–use have no intrinsic ordering. Additionally, for variables of ordinal type (similar to categorical ones but with an explicit ordering, such as slope or aspect classified into ranges), a classification is necessary, which implies a subjective selection of the ranges chosen for each study zone.
On the other hand, the Weights of Evidence method calculates the weight of each causative factor of a landslide based on the presence or absence of landslides within the area. The fundamental assumption of this method is that future landslides will occur under conditions similar to those that contributed to previous landslides. It also assumes that the causative factors of the mapped landslides remain constant over time (Regmi et al., 2010; Pamela et al., 2018; Mahdadi et al., 2018; Ozdemir and Altural, 2013). This method uses discrete data, or continuous data divided into several classes (ordinal data), and overlays each one with the landslide map to calculate its weights based on Bayes' theorem (Mahdadi et al., 2018).
This paper aims to calculate the LSI based on the Logistic Regression and Weights of Evidence methods in order to perform a suitable integration of categorical and continuous data in the landslide context. The conditioning factors initially used for Logistic Regression are slope, general curvature, profile curvature, plan curvature, flow length, terrain roughness index, flow accumulation, and topographic wetness index. Logistic Regression requires independent variables; therefore, Principal Component Analysis (PCA) was used to decrease the dimensionality and thus ensure that the model variables are uncorrelated. For Weights of Evidence, the Shallow Geological Units, Geomorphology, and land–use thematic maps were used. To choose the best model, the accuracy of each assessment was evaluated with Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC), which are based on sensitivity (true positive rate) and specificity (true negative rate) (Tien Bui et al., 2016; Bui et al., 2020).

2. Materials and methods

2.1. Study area

The Popayan municipality is the capital of the Cauca department and is located in the southwest of Colombia (Fig. 1), between longitudes 73°48′ W and 76°26′ W and latitudes 2°18′ N and 2°36′ N, with an average altitude of 1985 m above sea level (masl). The total area covers approximately 478.3 km² and the urban area 0.0417 km². This municipality extends from northwest to southeast of the inter–Andean Upper Cauca River Valley, which is bounded by the western flank of the Central Cordillera and the eastern flank of the Western Cordillera in the Andes Mountains.
The climate in the region is determined by the proximity of the inter–tropical convergence zone, very high humidity tropical rainforests, and humid temperate and cold zones, resulting in a fairly constant humidity (Bastidas et al., 2004; Servicio Geológico Colombiano, 2015). Precipitation and temperature records are available from the climate station at the Popayan airport. These records show annual rainfall ranging from 1400 mm to 2500 mm (average 2119 mm). Rainfall is not uniformly distributed over the year: most rainfall events occur between September and December, with a relatively dry period from June to August. The mean annual temperature is about 18 °C. The average annual minimum temperature is between 12 and 14 °C and the maximum annual temperature is between 23 and 25 °C.
The topography of the Popayan municipality is controlled by two main relief zones. The first one is located in the southeast of the municipality and includes the western flank of the Central Cordillera. This zone covers about 30% of the study area and is characterized by high elevations ranging from 1862 to 3822 masl. Slopes in this mountainous zone are steep, with a mean slope of 25° and slope orientation mainly towards the southeast. These features make this zone prone to landslides. The second zone covers about 70% of the study area and is located at an average altitude of 1800 masl in the inter–Andean Cauca–Patia Basin, crossed by the Cauca, Hondo, and Palace rivers. Unlike the first zone, it is characterized by a wide plain. In the urban area and its surroundings, the slopes gradually decline, but towards the west they are steeper and susceptible

Fig. 1. Location of the study area in Colombia and the landslide inventory.


to landslides due to uplifted areas resulting from tectonic activity (Servicio Geológico Colombiano, 2015).
The study area lies in the Cauca–Patia Basin, located in the North Andean block (Servicio Geológico Colombiano, 2015). From north to south, this basin is divided into the Cauca Subbasin, the Popayan Highland, and the Patia Subbasin. The Popayan Highland covers most of the study area (70%) and is characterized by landforms such as fans in the urban area, and hills and exhumed rocks in the west. This landscape results from the weathering of pyroclastic rocks, which develops residual soils. The rest of the area (30%) corresponds to landforms related to the Central Cordillera, such as recent and old volcanoes, calderas, and landforms controlled by the tectonic setting. These landforms are developed in plutonic and volcanic rocks to the northeast and in residual soils of these rocks to the southeast.

2.2. Landslide inventory map

Eight hundred sixteen landslides were mapped based on geological field surveys, historical reports, and the interpretation of remote sensing images (Fig. 1). The field survey identifies, locates, and classifies landslides. The historical report contains a compilation of historical events associated with slope instability; this data set can be found in the Landslide Information System (SIMMA) on the Colombian Geological Survey website (simma.sgc.gov.co). Finally, the remote sensing interpretation was made by observing aerial photographs and satellite images with 25 m resolution. All of these data were provided by the Colombian Geological Survey (SGC in Spanish).
The landslides within the study area are mainly located in residual soils with medium to high slope angles (82%) and, to a lesser extent, in rock units (18%). Their sizes vary from 26 m² to 39,428 m², with an average area of 2264 m². According to their type of movement, landslides were classified into slides, falls (also subdivided based on their material type), lateral spreads, and complex movements. Although falls are not landslides, they can be taken as indicators of an upcoming event. The remaining landslides are presented as "undefined" because their type of movement could not be determined by remote sensing interpretation (Fig. 2). In addition to the 816 landslides mapped, 200 new landslide occurrence points were created and classified according to their susceptibility to landslide occurrence, based on an expert review of the conditioning factors present at each one and the material observed in fieldwork.

2.3. Landslide conditioning factors

We introduced 11 landslide variables as relevant and influential to landslide phenomena, considering the geo–environmental setting of the research area, the data availability, and Decree 1077 of 2015 of Colombian legislation (Table 1). The selected categorical variables are Shallow Geological Units (p1), Geomorphology (p2), and Soil Land–Use (p3, Corine Land Cover units). The continuous variables were extracted from a DEM with a spatial resolution of 12.5 m, derived from data of the Advanced Land Observing Satellite Phased Array type L–band Synthetic Aperture Radar (ALOS PALSAR). These include slope angle (x1), Terrain Roughness Index (x2, TRI), Topographic Wetness Index (x3, TWI), plan curvature (x4), profile curvature (x5), general curvature (x6), flow length (x7), and flow accumulation (x8).
Among the categorical variables, Shallow Geological Units are important in slope instability because they represent the material exposed in the terrain with characteristics such as resistance, weathering level, and consistency (Servicio Geológico Colombiano, 2017). Geomorphology records the landforms and the weathering processes by which they are affected (Carvajal, 2012). Soil Land–Use describes the current conditions of vegetation and human activity in the territory (Van Westen et al., 2008).
One of the most important variables derived from the DEM is the slope, which influences landslide stability because it controls the velocity of the material (Abbaszadeh Shahri et al., 2019; Zhao et al., 2019; Trigila et al., 2015). TRI indicates the undulation or variation of relief (Servicio Geológico Colombiano, 2015). TWI is used to characterize the spatial distribution of soil saturation and runoff volume (He et al., 2019). Plan, profile, and general curvatures are morphometric parameters that represent the spatial variation of the slope gradient and highlight converging (concave curvature) or diverging (convex curvature) water flows (Trigila et al., 2015). Flow length is the longest upslope distance along the flow path from each cell to the top of a drainage divide (Regmi et al., 2010). Finally, flow accumulation is the number of

Table 1
Variable names and types used in this study. The data were provided by the SGC (Servicio Geológico Colombiano, 2017).

Variable | Name | Data type | Variable type | Description
Y | Landslide inventory | Point | Binary | Landslide presence or absence
p1 | Shallow Geological Unit | Polygon | Categorical | Deposit map
p2 | Geomorphology | Polygon | Categorical | Landforms and morphogenetic units
p3 | Soil Land–Use | Polygon | Categorical | Corine Land Cover units
x1 | Slope | Raster | Continuous | Derived from ALOS PALSAR DEM
x2 | Terrain Roughness Index (TRI) | Raster | Continuous | Derived from ALOS PALSAR DEM
x3 | Topographic Wetness Index (TWI) | Raster | Continuous | Derived from ALOS PALSAR DEM
x4 | Plan curvature | Raster | Continuous | Derived from ALOS PALSAR DEM
x5 | Profile curvature | Raster | Continuous | Derived from ALOS PALSAR DEM
x6 | General curvature | Raster | Continuous | Derived from ALOS PALSAR DEM
x7 | Flow Length | Raster | Continuous | Derived from ALOS PALSAR DEM
x8 | Flow Accumulation | Raster | Continuous | Derived from ALOS PALSAR DEM

Fig. 2. Classification of landslides in the study area according to Varnes (1978). (a) Landslide classification histogram based on the type of movement. (b) Fall classification histogram based on the type of material.


upstream cells that flow into each cell of the study area (Trigila et al., 2015).

2.4. Logistic regression and principal component analysis

Logistic Regression (LR) is a quantitative statistical method extensively used for landslide susceptibility analysis. It predicts the probability of landslide presence as a function of predictor variables (Daya et al., 2018). These variables can be continuous, discrete, or any combination of both types, and they do not necessarily have normal distributions (Mahdadi et al., 2018).

$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = Z, \qquad Z = b_0 + \sum_{i=1}^{n} b_i \, x_i \tag{1}$$

where $p$ is the probability of landslide occurrence, $Z$ is a linear combination of the $n$ explanatory variables $\{x_i\}_{i=1}^{n}$ with the regression coefficients $\{b_i\}_{i=1}^{n}$, and $b_0$ is the intercept (also called the bias).
Once the regression coefficients and the intercept are obtained, the probability $p$ of occurrence can be calculated for any value of the explanatory variables:

$$p = \frac{e^{Z}}{1 + e^{Z}} = \frac{1}{1 + e^{-Z}} \tag{2}$$

The first step in constructing the landslide susceptibility model is to determine the optimal combination of factors (Ozdemir and Altural, 2013; He et al., 2019; Wang et al., 2019). Thus, we quantified the multicollinearity and estimation ability of each factor with Pearson's correlation coefficient. Multicollinearity exists when significant correlations are found among the predictors, and it may negatively affect the interpretation of the regression coefficients (Patriche et al., 2016; Lin et al., 2017). To detect multicollinearity, a tolerance test can be applied in terms of the coefficient of determination $R^2$ obtained by regressing each predictor on the remaining ones: the tolerance is computed as $1 - R^2$, and values less than 0.2 indicate multicollinearity (Bai et al., 2010). The collinear variables must then be removed from the landslide susceptibility model to increase the efficiency of the LR. However, a strategy that requires less supervision is to reduce the dimensionality of the variables with Principal Component Analysis (PCA) (Baeza and Corominas, 2001). According to Lei et al. (2011), PCA mathematically decomposes the covariance matrix of the data into eigenvalues and column eigenvectors. The reduction combines the $n$ variables $x_1, x_2, \ldots, x_n$ into uncorrelated principal components $PC_1, PC_2, \ldots, PC_n$. Moreover, a few PCs can preserve a high percentage of the total variance (the explained variance) (Awange et al., 2018; Daya et al., 2018). Thus, the set of correlated variables can be reduced to a smaller number of mutually uncorrelated variables, each of which is a linear combination of the original ones. Additionally, the relationship between the predictors can be calculated and checked to confirm that no multicollinearity remains (Gwelo, 2019).

2.5. Weight of evidence

The Weight of Evidence (WoE) method is based on Bayes' theorem and on the concepts of prior and posterior probability. We used WoE to statistically calculate the importance of evidential variables by assessing the spatial relationship between the distribution of the areas affected by landslides, known landslide locations, and the distribution of the analyzed landslide susceptibility variables and evidential themes (Van Westen et al., 2003; Ilia et al., 2010; Ilia and Tsangaratos, 2016). Positive and negative weights ($W_i^{+}$ and $W_i^{-}$, respectively) are assigned to each class of a classified factor map (e.g., each geological unit within a Shallow Geological Units map). An extensive explanation of WoE is presented in Ilia and Tsangaratos (2016). The weights are defined from conditional probabilities as follows (Van Westen et al., 2003):

$$W_i^{+} = \log\left(\frac{P\{B_i \mid S\}}{P\{B_i \mid \bar{S}\}}\right) \tag{3}$$

and

$$W_i^{-} = \log\left(\frac{P\{\bar{B}_i \mid S\}}{P\{\bar{B}_i \mid \bar{S}\}}\right) \tag{4}$$

where $B_i$ = presence of a potential landslide conditioning factor, $\bar{B}_i$ = absence of a potential landslide conditioning factor, $S$ = presence of a landslide, and $\bar{S}$ = absence of a landslide.
Based on Eqs. (3) and (4), Van Westen (2002) expresses the weights of evidence in numbers of pixels, as shown in Eq. (5). Fig. 3 shows the relation between potential landslide conditioning factors and landslides in terms of the number of pixels (Npix).

$$\begin{aligned} P\{B_i \mid S\} &= \frac{\mathrm{Npix}_1}{\mathrm{Npix}_1 + \mathrm{Npix}_2}, & P\{B_i \mid \bar{S}\} &= \frac{\mathrm{Npix}_3}{\mathrm{Npix}_3 + \mathrm{Npix}_4}, \\ P\{\bar{B}_i \mid S\} &= \frac{\mathrm{Npix}_2}{\mathrm{Npix}_2 + \mathrm{Npix}_1}, & P\{\bar{B}_i \mid \bar{S}\} &= \frac{\mathrm{Npix}_4}{\mathrm{Npix}_4 + \mathrm{Npix}_3} \end{aligned} \tag{5}$$

where $\mathrm{Npix}_1$, $\mathrm{Npix}_2$, $\mathrm{Npix}_3$ and $\mathrm{Npix}_4$ are the four possible combinations of $B_i$, $\bar{B}_i$, $S$ and $\bar{S}$, as shown in Fig. 3a.

Fig. 3. (a) Four possible combinations of a potential landslide conditioning factor and a landslide inventory map. Npix = number of pixels. (b) Graphical representation of the landslide and potential landslide conditioning factor relationship. Modified from Van Westen (2002) and Servicio Geológico Colombiano (2017).

2.6. Methodology

In this study, we calculated the LSI based on the LR and WoE methods in order to perform a suitable integration of categorical and continuous data in the landslide context. The geospatial database of the Popayan municipality was provided by the SGC and comprised a landslide


inventory and 11 landslide-related variables: Shallow Geological Units, Geomorphology, land–use, slope, general curvature, profile curvature, plan curvature, flow length, terrain roughness index, flow accumulation, and topographic wetness index.
The spatial data were collected in a database in GeoPackage (.gpkg) format. All information was projected into the MAGNA–SIRGAS / Colombia West zone coordinate system (EPSG: 3115). Additionally, the boundary of the study area was checked for all vector layers. For the raster–derived data, it was verified that all layers had the same number of pixels and a pixel size of 12.5 × 12.5 m.
The first stage of the methodology (Fig. 4a) is the pre–processing of the categorical and continuous variables. The categorical variables (p1, p2, p3) are used to calculate the LSI with the WoE method (Eq. 5), and the result is a new continuous variable. The raster–derived variables (x1, x2, x3, x4, x5, x6, x7, x8) may have multicollinearity, which is evidenced in the correlation matrix. Therefore, to avoid multicollinearity, we applied PCA to reduce the dimensionality to two principal components. Once the pre–processing is finished, three continuous variables are obtained: PC1, PC2 and LSI.
In the second stage, three proposals were carried out for the integration of the variables PC1, PC2, and LSI. The relationship between the three continuous variables and the proposals performed is shown in Fig. 4b. The objective of these proposals is to evaluate the best strategy to calculate the final LSI, which we do by analyzing the ROC curves and the area under the curve (AUC) of each proposal. In Proposal 3, the sigmoid function F(z) = 1/(1 + exp(−z)) is used, which is a particular case of the logistic regression where b0 = 0 and bi = 1.

3. Results and discussion

3.1. Landslide factor analysis

The correlation matrix in Fig. 5 shows a high positive correlation for the variable pairs (x3, x5) and (x6, x7), and a high negative correlation for (x6, x8) and (x7, x8). We therefore applied a dimensionality reduction in which the first two principal components explain 68.59% of the total variance. The pre–processing data shown in Fig. 4 were calculated using these principal components, which have almost zero correlation.
For Proposal 1, the LR model uses PC1, PC2 and LSI. All coefficients are significant in this model: the p–values (P > |z|) for all variables are below 0.05 and every coefficient satisfies |z| > 2 in the Wald test (Table 2), so all log(odds) coefficients are statistically significant.
In Proposal 2, LR is performed for PC1 and PC2 and, separately, for LSI. The coefficients (Table 3) are similar to those of Proposal 1 because the three variables do not have multicollinearity, and thus the AUC value changes only slightly. To calculate LSI2, the LR probabilities obtained for PC1 and PC2 and for LSI are multiplied.
Finally, Proposal 3 is a special case of Eqs. (1) and (2) with all coefficients equal to 1 (bi = 1) and the intercept equal to 0 (b0 = 0). This normalizes the LSI values to the range 0 to 1, so they can be compared with the results obtained with LR. Proposals 2 and 3 are practically a special case of each other.
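The three combination rules described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the coefficients are transcribed from Tables 2 and 3, the inputs are assumed to be PC1, PC2 and LSI on the same scale used to fit those models (the text does not state that scale), Proposal 3 is read literally as Eq. (1) with b0 = 0 and bi = 1 applied to the sum PC1 + PC2 + LSI, and the rank-based (Mann–Whitney) `auc` helper and the toy inputs are our own illustrative choices.

```python
import numpy as np

def sigmoid(z):
    """Logistic function of Eq. (2): p = 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

# Coefficients transcribed from Table 2 (Proposal 1) and Table 3 (Proposal 2).
P1 = {"b0": 0.9353, "PC1": 0.7215, "PC2": -0.5473, "LSI": 0.2868}
P2_PCS = {"b0": 0.6342, "PC1": 0.8238, "PC2": -0.6146}
P2_LSI = {"b0": 0.7890, "LSI": 0.3607}

def proposal1(pc1, pc2, lsi):
    """LSI1: a single LR model on PC1, PC2 and the WoE-based LSI."""
    z = P1["b0"] + P1["PC1"] * pc1 + P1["PC2"] * pc2 + P1["LSI"] * lsi
    return sigmoid(z)

def proposal2(pc1, pc2, lsi):
    """LSI2: product of two separately fitted LR probabilities."""
    p_pcs = sigmoid(P2_PCS["b0"] + P2_PCS["PC1"] * pc1 + P2_PCS["PC2"] * pc2)
    p_lsi = sigmoid(P2_LSI["b0"] + P2_LSI["LSI"] * lsi)
    return p_pcs * p_lsi

def proposal3(pc1, pc2, lsi):
    """LSI3 (assumed reading): Eq. (1) with b0 = 0 and every bi = 1, then the sigmoid."""
    return sigmoid(np.asarray(pc1) + np.asarray(pc2) + np.asarray(lsi))

def auc(scores, labels):
    """Rank-based AUC: chance that a landslide point outscores a non-landslide point."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]                  # every positive/negative pair
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)   # ties count as one half
    return wins / (pos.size * neg.size)

if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not data from the study.
    pc1 = np.array([1.2, 0.4, -0.8, -1.5])
    pc2 = np.array([-0.5, 0.1, 0.6, 0.9])
    lsi = np.array([2.0, 1.0, -0.5, -1.0])
    observed = np.array([1, 1, 0, 0])
    for name, f in [("LSI1", proposal1), ("LSI2", proposal2), ("LSI3", proposal3)]:
        s = f(pc1, pc2, lsi)
        print(name, np.round(s, 3), "AUC =", auc(s, observed))
```

At PC1 = PC2 = LSI = 0, for example, the three functions return about 0.72, 0.45 and 0.50. Because Proposal 2 multiplies two probabilities, its score can never exceed either factor, which is one possible reading of why LSI2 tends to be lower than LSI1 in parts of the maps.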

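The pre–processing stage of Fig. 4a (PCA on the continuous rasters and WoE weights on the categorical maps) can likewise be sketched with NumPy on flattened pixel arrays. Several details are assumptions rather than statements from the paper: each continuous variable is standardized before PCA; Eqs. (3) and (4) use natural logarithms; the pixel counts follow the labeling under which the four ratios of Eq. (5) are consistent (Npix1: class present and landslide; Npix2: class absent and landslide; Npix3: class present and no landslide; Npix4: neither); and the hypothetical function `woe_index` aggregates the factor maps by summing the contrast W+ − W− of the class at each pixel, a convention assumed here, not the documented procedure of the authors.

```python
import numpy as np

def pca_reduce(X, k=2):
    """Project the continuous variables (columns x1..x8 of X) onto the first k PCs.

    Returns the PC scores and the fraction of total variance they explain.
    Standardizing each column first is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)            # proportional to the correlation matrix
    eigval, eigvec = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]         # reorder from largest to smallest
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = Z @ eigvec[:, :k]               # PC1..PCk, mutually uncorrelated
    return scores, eigval[:k].sum() / eigval.sum()

def woe_weights(factor, landslide):
    """W+ and W- (Eqs. 3-5) for every class of one categorical map.

    factor    -- array of class codes (e.g. one code per geological unit)
    landslide -- boolean array of the same shape, True on landslide pixels
    Returns {class: (w_plus, w_minus)}; empty counts yield +/-inf or nan.
    """
    f = np.asarray(factor).ravel()
    s = np.asarray(landslide, dtype=bool).ravel()
    weights = {}
    with np.errstate(divide="ignore", invalid="ignore"):
        for cls in np.unique(f):
            b = f == cls
            npix1 = np.sum(b & s)      # class present, landslide present
            npix2 = np.sum(~b & s)     # class absent, landslide present
            npix3 = np.sum(b & ~s)     # class present, no landslide
            npix4 = np.sum(~b & ~s)    # class absent, no landslide
            w_plus = np.log((npix1 / (npix1 + npix2)) / (npix3 / (npix3 + npix4)))
            w_minus = np.log((npix2 / (npix2 + npix1)) / (npix4 / (npix4 + npix3)))
            weights[cls.item()] = (float(w_plus), float(w_minus))
    return weights

def woe_index(factors, landslide):
    """Hypothetical aggregation: sum of contrasts W+ - W- over all factor maps."""
    index = np.zeros(np.shape(landslide), dtype=float)
    for factor in factors:
        w = woe_weights(factor, landslide)
        contrast = {c: wp - wm for c, (wp, wm) in w.items()}
        index += np.vectorize(lambda c: contrast[c], otypes=[float])(factor)
    return index
```

In the workflow of Fig. 4, the two `scores` columns would play the role of PC1 and PC2, and an index built from p1, p2 and p3 the role of LSI, before the three variables are combined in the proposals of Fig. 4b.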
3.2. Landslide susceptibility maps

Landslides are very complex processes that are controlled by many topographical and environmental factors, and LSMs are of great value for visually identifying landslide–prone areas (Wang et al., 2019). For this reason, the main aim of this work is to calculate the LSI based on the integration of the LR and WoE methods through the three proposals shown in Fig. 6a–c. In addition, a fourth model was calculated with the WoE method (Fig. 6d) according to the parameters described by the Servicio Geológico Colombiano (2015), which use Shallow Geological Units, Geomorphology, Soil Land–Use, Slope, General curvature, and Topographic Wetness Index as conditioning factors. For the four models, the landslide inventory was used for modeling with the WoE method, and the landslide occurrence points were used to construct and verify the LR model.
Fig. 6a shows the LSI map obtained with Proposal 1. The highest LSI1

Fig. 4. Methodological flow chart of the study. (a) Pre–processing data generation using principal component analysis and Weight of Evidence. (b) Three proposals for LSI calculation based on integrating Logistic Regression (LR) and the sigmoid function F(z).

Fig. 5. Correlation matrix for continuous variables.


values (>0.9) are located towards the northwest and southeast, where landslides were previously mapped. This is consistent with the geological conditions that favor instability, such as residual soils derived from pyroclastic deposits developed on steep slopes, landforms belonging to volcanic and structural environments related to the Central Cordillera, and soil land–use with low vegetation cover. In contrast, the lowest (<0.3) and medium (0.5–0.7) LSI1 values are located in the center of the Popayan municipality, including most of the urban area, and in the Cauca, Hondo, and Palace river valleys. These values are low because this zone is a large fan with plain topography formed by the accumulation of volcanic and alluvial material, which is consistent with the few landslides reported in this area (see Supplementary material).

Table 2
Logistic Regression summary for Proposal 1.

Variable | Coef. | Std. Error | z | P > |z|
PC1 | 0.7215 | 0.137 | 5.269 | 0.000
PC2 | −0.5473 | 0.144 | −3.803 | 0.000
LSI | 0.2868 | 0.072 | 4.008 | 0.000
Intercept | 0.9353 | 0.210 | 4.453 | 0.000

Table 3
Logistic Regression summary for Proposal 2 (two separate models: one for PC1 and PC2, one for LSI).

Variable | Coef. | Std. Error | z | P > |z|
PC1 | 0.8238 | 0.135 | 6.092 | 0.0000
PC2 | −0.6146 | 0.140 | −4.395 | 0.0000
Intercept | 0.6342 | 0.186 | 3.411 | 0.0006
LSI | 0.3607 | 0.066 | 5.506 | 0.0000
Intercept | 0.7890 | 0.175 | 4.497 | 0.0000

Fig. 6b shows the LSI map generated with Proposal 2. The steepest zones with volcanic and structural landforms, located in the northwest and towards the southeast, have LSI2 values lower than LSI1 (0.6–0.85), and the zones located in the eastern part of the study area (see dashed circle) are even lower, with LSI2 values ranging from 0.1 to 0.5. The residual soil of the urban area shows the same behavior, developing more homogeneous zones with LSI values closer to zero than LSI1, although values remain high in the south and east because of the presence of volcanic landforms. LSI2 produces very low values for drainages (<0.05), and zones with reported landslides continue to show high LSI values.
Fig. 6c shows the LSI map obtained with Proposal 3. LSI3 shows very low values (<0.02) compared to Proposals 1 and 2 in the volcanic–alluvial fan (a large area of plain topography), although in the urban area there are still zones with high LSI values due to volcanic landforms. The steep slopes in the eastern zone (see dashed circle) continue to show low LSI values (<0.05) compared with LSI1, which has values close to one. The remaining areas have patterns similar to the LSI2 map, except for the landforms related to the tectonic setting, which present very low LSI3 values. These values are not accurate, because many landslides are reported in these landforms.
The zoomed areas in Fig. 6 show the results obtained with the three proposals and the WoE method in more detail. In Fig. 6a, the zoomed

Fig. 6. Landslide Susceptibility Index maps: (a) Proposal 1 model; (b) Proposal 2 model; (c) Proposal 3 model; (d) WoE model. The black dashed circle is the zone
with the main difference between all methods.


area covers the Cauca River riverbed. The low LSI1 values in the river valley (<0.2) are consistent with the low dips of these zones. Furthermore, Fig. 6a shows the contrast in LSI1 values produced by the change in relief, from steep slopes of volcanic landforms with high LSI1 values (>0.85) in the west to plain topography with low LSI1 values (<0.5), resulting from moderate dips, in a volcanic–alluvial fan in the east. Most of the reported landslides show very high LSI1 values (>0.9), indicating the accuracy of this proposal.
Fig. 6b shows the zoomed area for the LSI2 map. Low LSI2 values (<0.1) are also present in the Cauca River riverbed. Two main differences from LSI1 are that the west zone does not have very high LSI2 values (>0.7) and that the east zone is somewhat more homogeneous because it contains lower LSI2 values (<0.1). Landslides still reach high LSI values, but not as high as in the LSI1 map (>0.8).
The zoomed area of LSI3 shows even more low LSI values. The plain topography stands out because its LSI3 values are close to zero (<0.02). The Cauca River riverbed still presents low values, but the values in the volcanic landforms are only moderate (<0.6), which is not accurate because there are landslides in this zone.
Finally, the zoomed area for the LSI map obtained with the WoE method is shown in Fig. 6d. The two zones with high and low LSI values are still distinguishable and even more homogeneous than in the LSI1, LSI2 and LSI3 maps. Unlike those maps, the Cauca River riverbed presents moderate LSI values. Some landslides located in the volcanic–alluvial fan have low LSI values, and the landslide occurrence points labeled as "absent" show high LSI values. These results are not very suitable because they are the opposite of what is expected from the reported landslides and the geological conditions of the zone.

3.2.1. Validation and comparison of proposals

The ROC curves for the four models are shown in Fig. 7. For the WoE method, the ROC curve was calculated following Vakhshoori and Zare (2016). The ROC curve and its associated AUC value are effective for evaluating the performance of different models, and the model with the largest AUC is considered the best (Zhou et al., 2018). The ROC curve for Proposal 1 is the highest, followed by those of Proposal 2, Proposal 3 and the WoE method; the AUC of the WoE method (0.8030) is slightly lower than that of Proposal 3 (0.8054). Proposals 1 and 2 show the best performance of the LSI calculation based on the ROC curves and AUC values. The AUC value of Proposal 1 is slightly greater than that of Proposal 2 and clearly greater than those of Proposal 3 and the WoE method. Thus, the models obtained with Proposals 1 and 2 exhibit the highest reliability in predicting landslides from the input data, whereas Proposal 3 and the WoE method are the least reliable.
Table 4 shows the class ranges obtained from 5 quantiles for the LSIs and from natural breaks for WoE. The ranges vary markedly, mainly at the 80th percentile, which bounds the very high susceptibility category. To assess the difference in susceptibility categories, LSI1 to LSI3 were classified using the quantiles obtained for LSI1, and natural breaks were used for WoE. The result of the classification is shown in Fig. 8. LSI1 has by far the highest percentage of pixels (20%) in the "very high" susceptibility category, whereas the other proposals have less than 1%. This can be seen in the black dashed circle of Fig. 6. Additionally, the percentage of pixels in the "very low" category increases from 20% for LSI1 to 42% for LSI2 and 61% for LSI3.

4. Conclusions

In this study, we calculated the LSI based on the LR and WoE methods to perform a suitable integration of categorical and continuous data in the landslide context. The result is a useful tool for landslide susceptibility mapping that can be implemented by local and regional Colombian government authorities as a diagnostic element for territorial planning, in order to reduce disaster risk and support environmental control. Furthermore, the same methodology can be applied to zones with similar conditions, particularly in tropical countries, in response to climate change and the need to adapt to new factors that could trigger landslides.
The independence of the continuous variables was evaluated using PCA. According to the three proposals, the LSI is low in the middle section of the study area due to gentle slopes, plain topography, and residual soils. In
Furthermore, the curves of Proposal 3 and the WoE method present a contrast, the highest values are located in steep slopes related to vol­
similar pattern, although the former is a bit higher than the latter. They canic and structural landforms. The results show that Proposal 1 with
both are lower than those in Proposals 1 and 2. AUC = 0.8578 and Proposal 2 with AUC = 0.8459 are better approaches
According to the statistics, the AUC values (Fig. 7) in the three for landslide susceptibility modeling in our study area than Proposal 3
proposals are better than the WoE method. Specifically, Proposal 1 has and WoE. The success of Proposals 1 and 2 are very promising for
the largest AUC value of 0.8578, while Proposal 2 exhibits a slightly landslide spatial prediction and their LSI maps could represent an initial
lower AUC (0.8569). Similar numerical characteristics are also found in assessment for any municipality planning project with the objective of
implementing more detailed studies in those areas previously identified
as highly susceptible.
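The two computations behind the validation in Section 3.2.1 and the classification in Table 4, scoring an LSI map by the area under its ROC curve and binning LSI values into five quantile classes whose cut points come from LSI1, can be sketched in a few lines. This is a minimal illustration in plain Python under stated assumptions, not the authors' implementation: the function names, the toy data and the intermediate class labels ("low", "moderate", "high") are hypothetical, and the AUC is computed with the rank-based (Mann–Whitney) formulation, which equals the trapezoidal area under the empirical ROC curve, rather than with the specific procedure of Vakhshoori and Zare (2016). The natural-breaks classification used for WoE is not reproduced.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence

# Hypothetical labels: the text names only the "very low" and "very high" categories.
CLASSES = ["very low", "low", "moderate", "high", "very high"]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) AUC, used here as a stand-in for the ROC procedure:
    the probability that a randomly chosen landslide cell (label 1) has a higher
    LSI than a randomly chosen non-landslide cell (label 0). Ties count as one
    half, which makes the result equal to the trapezoidal area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both landslide (1) and non-landslide (0) cells")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def quintile_thresholds(reference: Sequence[float]) -> List[float]:
    """Cut points at the 20th, 40th, 60th and 80th percentiles of a reference map
    (statistics.quantiles with its default 'exclusive' interpolation)."""
    return quantiles(reference, n=5)


def class_percentages(values: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Percentage of cells in each class delimited by sorted thresholds.
    A value equal to a threshold is assigned to the upper class."""
    counts = [0] * (len(thresholds) + 1)
    for v in values:
        counts[bisect_right(thresholds, v)] += 1
    return [100.0 * c / len(values) for c in counts]


if __name__ == "__main__":
    # Hypothetical toy data, not values from the study area.
    lsi1 = [0.92, 0.85, 0.40, 0.71, 0.15, 0.30, 0.05, 0.60, 0.55, 0.20]
    observed = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
    print("AUC of LSI1:", roc_auc(lsi1, observed))

    # Cut points come from LSI1 and are reused for a second (hypothetical) map.
    cuts = quintile_thresholds(lsi1)
    lsi2 = [0.6 * v for v in lsi1]
    for name, p1, p2 in zip(CLASSES, class_percentages(lsi1, cuts), class_percentages(lsi2, cuts)):
        print(f"{name:>9}: LSI1 {p1:5.1f}%  LSI2 {p2:5.1f}%")
```

Classifying LSI1 with its own cut points places about 20% of the cells in each class, consistent with the 20% shares reported for LSI1 in the "very high" and "very low" categories; applying the same cut points to a map with generally lower values, as the LSI2 and LSI3 ranges in Table 4 suggest, shifts cells toward the "very low" class. The pairwise AUC loop is O(n·m) and is written for clarity; for full rasters a sort-based rank computation would be preferable.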

Table 4
Statistical analysis using the percentile classification system. *Natural breaks were used for WoE.

Percentile   LSI1            LSI2            LSI3            WoE*
0th          < 0.2911        < 0.1401        < 0.0124        < −9.4076
20th         0.2911–0.5551   0.1401–0.2762   0.0124–0.0779   −9.4076 to −5.3979
40th         0.5551–0.7523   0.2762–0.4260   0.0779–0.2742   −5.3979 to −1.3881
60th         0.7523–0.8890   0.4260–0.5881   0.2742–0.5419   −1.3881 to 2.6216
80th         > 0.8890        > 0.5881        > 0.5419        > 2.6216

Fig. 7. ROC curves for Proposals 1, 2 and 3 and the WoE method.

Declaration of Competing Interest

The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence the work reported in this paper.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Fig. 8. Susceptibility classification for each proposal.

Acknowledgements

The authors would like to thank the Colombian Geological Survey (Servicio Geológico Colombiano – SGC, sgc.gov.co in Spanish) for providing the information used to prepare this research. We thank the reviewers for their careful reading of the manuscript and their constructive remarks.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.enggeo.2020.105958.

References

Abbaszadeh Shahri, A., Spross, J., Johansson, F., Larsson, S., 2019. Landslide susceptibility hazard map in Southwest Sweden using artificial neural network. Catena 183, 104225.
Achour, Y., Pourghasemi, H.R., 2019. How do machine learning techniques help in increasing accuracy of landslide susceptibility maps? Geoscience Frontiers 11, 871–883.
Aditian, A., Kubota, T., Shinohara, Y., 2018. Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia. Geomorphology 318, 101–111.
Awange, J.L., Paláncz, B., Lewis, R.H., Völgyesi, L., 2018. Mathematical Geosciences: Hybrid Symbolic-Numeric Methods. Springer.
Baeza, C., Corominas, J., 2001. Assessment of shallow landslide susceptibility by means of multivariate statistical techniques. Earth Surf. Process. Landf. 26, 1251–1263.
Bai, S.B., Wang, J., Lü, G.N., Zhou, P.G., Hou, S.S., Xu, S.N., 2010. GIS-based logistic regression for landslide susceptibility mapping of the Zhongxian segment in the Three Gorges area, China. Geomorphology 115, 23–31.
Bastidas, A.E., Rodriguez, E., Jaramillo, M., Solarte, E., 2004. Simulation model of absorption and scattering properties of laser light applied to urban aerosols over the city of Popayan, Colombia. In: Singh, U.N. (Ed.), Laser Radar Techniques for Atmospheric Sensing. SPIE vol. 5575, p. 147.
Bragagnolo, L., da Silva, R.V., Grzybowski, J.M.V., 2020. Landslide susceptibility mapping with r.landslide: A free open-source GIS-integrated tool based on Artificial Neural Networks. Environmental Modelling and Software 123.
Bui, D.T., Tsangaratos, P., Nguyen, V.T., Liem, N.V., Trinh, P.T., 2020. Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment. Catena 188, 104426.
Cantarino, I., Carrion, M.A., Goerlich, F., Martinez Ibañez, V., 2019. A ROC analysis-based classification method for landslide susceptibility maps. Landslides 16, 265–282.
Carvajal, J.H., 2012. Propuesta de estandarización de la cartografía geomorfológica en Colombia. Technical Report, Servicio Geológico Colombiano, Bogotá.
Corominas, J., van Westen, C., Frattini, P., Cascini, L., Malet, J.P., Fotopoulou, S., Catani, F., Van Den Eeckhaut, M., Mavrouli, O., Agliardi, F., Pitilakis, K., Winter, M.G., Pastor, M., Ferlisi, S., Tofani, V., Hervás, J., Smith, J.T., 2014. Recommendations for the quantitative analysis of landslide risk. Bull. Eng. Geol. Environ. 73, 209–263.
Dai, F., Lee, C., Ngai, Y., 2002. Landslide risk assessment and management: an overview. Eng. Geol. 64, 65–87.
Daya, S.B., Cheng, Q., Agterberg, F., 2018. Handbook of Mathematical Geosciences. Springer International Publishing, Cham.
Gwelo, A., 2019. Principal Components to Overcome Multicollinearity Problem. Oradea Journal of Business and Economics 4, 79–91.
He, Q., Shahabi, H., Shirzadi, A., Li, S., Chen, W., Wang, N., Chai, H., Bian, H., Ma, J., Chen, Y., Wang, X., Chapi, K., Ahmad, B.B., 2019. Landslide spatial modelling using novel bivariate statistical based Naïve Bayes, RBF Classifier, and RBF Network machine learning algorithms. Sci. Total Environ. 663, 1–15.
Hemasinghe, H., Rangali, R.S.S., Deshapriya, N.L., Samarakoon, L., 2018. Landslide susceptibility mapping using logistic regression model (a case study in Badulla District, Sri Lanka). Procedia Engineering 212, 1046–1053.
Hong, H., Ilia, I., Tsangaratos, P., Chen, W., Xu, C., 2017. A hybrid fuzzy weight of evidence method in landslide susceptibility analysis on the Wuyuan area, China. Geomorphology 290, 1–16.
Hu, Q., Zhou, Y., Wang, S., Wang, F., 2019. Machine learning and fractal theory models for landslide susceptibility mapping: Case study from the Jinsha River Basin. Geomorphology 106975.
Ilia, I., Tsangaratos, P., 2016. Applying weight of evidence method and sensitivity analysis to produce a landslide susceptibility map. Landslides 13, 379–397.
Ilia, I., Tsangaratos, P., Koumantakis, I., Rozos, D., 2010. Application of a bayesian approach in GIS based model for evaluating landslide susceptibility. Case study KIMI area, Euboea, Greece. Bull. Geol. Soc. Greece 43, 1590–1600.
Kadavi, P.R., Lee, C.-W., Lee, S., 2019. Landslide-susceptibility mapping in Gangwon-do, South Korea, using logistic regression and decision tree models. Environ. Earth Sci. 78, 116.
Lei, T.C., Wan, S., Chou, T.Y., Pai, H.C., 2011. The knowledge expression on debris flow potential analysis through PCA + LDA and rough sets theory: A case study of Chen-Yu-Lan watershed, Nantou, Taiwan. Environ. Earth Sci. 63, 981–997.
Lin, G.F., Chang, M.J., Huang, Y.C., Ho, J.Y., 2017. Assessment of susceptibility to rainfall-induced landslides using improved self-organizing linear output map, support vector machine, and logistic regression. Eng. Geol. 224, 62–74.
Lombardo, L., Mai, P.M., 2018. Presenting logistic regression-based landslide susceptibility results. Eng. Geol. 244, 14–24.
Luo, W., Liu, C.C., 2018. Innovative landslide susceptibility mapping supported by geomorphon and geographical detector methods. Landslides 15, 465–474.
Mahdadi, F., Boumezbeur, A., Hadji, R., Kanungo, D.P., Zahri, F., 2018. GIS-based landslide susceptibility assessment using statistical models: a case study from Souk Ahras province, N-E Algeria. Arabian Journal of Geosciences 11.
Oh, H.-J., Kadavi, P.R., Lee, C.-W., Lee, S., 2018. Evaluation of landslide susceptibility mapping by evidential belief function, logistic regression and support vector machine models. Geomatics, Natural Hazards and Risk 9, 1053–1070.
Ozdemir, A., Altural, T., 2013. A comparative study of frequency ratio, weights of evidence and logistic regression methods for landslide susceptibility mapping: Sultan mountains, SW Turkey. J. Asian Earth Sci. 64, 180–197.
Pamela, Sadisun, I.A., Arifianti, Y., 2018. Weights of Evidence Method for Landslide Susceptibility Mapping in Takengon, Central Aceh, Indonesia. IOP Conference Series: Earth and Environmental Science 118.
Patriche, C.V., Pirnau, R., Grozavu, A., Rosca, B., 2016. A Comparative Analysis of Binary Logistic Regression and Analytical Hierarchy Process for Landslide Susceptibility Assessment in the Dobrov River Basin, Romania. Pedosphere 26, 335–350.
Pradhan, B., 2010. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia. Adv. Space Res. 45, 1244–1256.
Regmi, N.R., Giardino, J.R., Vitek, J.D., 2010. Modeling susceptibility to landslides using the weight of evidence approach: Western Colorado, USA. Geomorphology 115, 172–187.
Tien Bui, D., Tuan, T.A., Klempe, H., Pradhan, B., Revhaug, I., 2016. Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13, 361–378.
Servicio Geológico Colombiano, 2015. Zonificación geomecánica y de amenaza por movimientos en masa del municipio de Popayán - Cauca. Imprenta Nacional de Colombia (Technical Report, Dirección de Geoamenazas).
Servicio Geológico Colombiano, 2017. Guía Metodológica para la Zonificación de Amenaza por Movimientos en Masa Escala 1:25000. Imprenta Nacional de Colombia.
Tien Bui, D., Hoang, N.-D., Nguyen, H., Tran, X.-L., 2019. Spatial prediction of shallow landslide using Bat algorithm optimized machine learning approach: A case study in Lang Son Province, Vietnam. Adv. Eng. Inform. 42, 100978.
Trigila, A., Iadanza, C., Esposito, C., Scarascia-Mugnozza, G., 2015. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 249, 119–136.
Tsangaratos, P., Ilia, I., 2016. Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. Catena 145, 164–179.
Vakhshoori, V., Zare, M., 2016. Landslide susceptibility mapping by comparing weight of evidence, fuzzy logic, and frequency ratio methods. Geomatics, Natural Hazards and Risk 7, 1731–1752.
Van Westen, C.J., 2002. Weights of evidence modeling for landslide susceptibility mapping. Technical Report, International Institute for Geoinformation Science and Earth Observation (ITC), Enschede.
Van Westen, C.J., Seijmonsbergen, A.C., Mantovani, F., 1999. Comparing landslide hazard maps. Nat. Hazards 20, 137–158.
Van Westen, C.J., Rengers, N., Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Nat. Hazards 30, 399–419.
Van Westen, C.J., Castellanos, E., Kuriakose, S.L., 2008. Spatial data for landslide susceptibility, hazard, and vulnerability assessment: an overview. Eng. Geol. 102, 112–131.
Varnes, D., 1978. Slope movement types and processes. In: Schuster, R., Krizek, R. (Eds.), Landslides: Analysis and Control, Special Report 176, Chapter 2. Transportation Research Board, Washington, D.C., pp. 11–33.
Wang, Y., Fang, Z., Hong, H., 2019. Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China. Sci. Total Environ. 666, 975–993.
Yang, J., Song, C., Yang, Y., Xu, C., Guo, F., Xie, L., 2019. New method for landslide susceptibility mapping supported by spatial logistic regression and GeoDetector: A case study of Duwen Highway Basin, Sichuan Province, China. Geomorphology 324, 62–71.
Zhao, Y., Wang, R., Jiang, Y., Liu, H., Wei, Z., 2019. GIS-based logistic regression for rainfall-induced landslide susceptibility mapping under different grid sizes in Yueqing, Southeastern China. Eng. Geol. 259, 105147.
Zhou, C., Yin, K., Cao, Y., Ahmed, B., Li, Y., Catani, F., Pourghasemi, H.R., 2018. Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the Three Gorges Reservoir area, China. Comput. Geosci. 112, 23–37.
