ARTICLE INFO
Article history:
Received 12 November 2018
Received in revised form 17 January 2019
Accepted 16 February 2019
Available online 22 February 2019
Editor: Ralf Ludwig
Keywords: Landslide susceptibility; Deep learning; Convolutional neural network; Data presentation algorithm; Yanshan County
ABSTRACT
Assessments of landslide disasters are becoming increasingly urgent. The aim of this study is to investigate a convolutional neural network (CNN) framework for landslide susceptibility mapping (LSM) in Yanshan County, China. The two primary contributions of this study are summarized as follows. First, to the best of our knowledge, this report describes the first time that the CNN framework is used for LSM. Second, different data representation algorithms are developed to construct three novel CNN architectures. In this work, sixteen influencing factors associated with landslide occurrence were considered and historical landslide locations were randomly divided into training (70% of the total) and validation (30%) sets. Validation of these CNNs was performed using different commonly used measures in comparison to several of the most popular machine learning and deep learning methods. The experimental results demonstrated that the proportions of highly susceptible zones in all of the CNN landslide susceptibility maps are highly similar and lower than 30%, which indicates that these CNNs are more practical for landslide prevention and management than conventional methods. Furthermore, the proposed CNN framework achieved higher or comparable prediction accuracy. Specifically, the overall accuracy (OA) and Matthews correlation coefficient (MCC) of the proposed CNNs were 3.94%–7.45% and 0.079–0.151 higher, respectively, than those of the optimized support vector machine (SVM).
© 2019 Elsevier B.V. All rights reserved.
⁎ Corresponding author.
⁎⁎ Correspondence to: H. Hong, Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing 210023, China.
E-mail addresses: cug.yi.wang@gmail.com (Y. Wang), 171301013@stu.njnu.edu.cn, hong_haoyuan@outlook.com (H. Hong).
https://doi.org/10.1016/j.scitotenv.2019.02.263
0048-9697/© 2019 Elsevier B.V. All rights reserved.
976 Y. Wang et al. / Science of the Total Environment 666 (2019) 975–993
surface characteristics that may cause landslides (Bui et al., 2017; Varnes, 1984). Land use is a key factor that contributes to landslide occurrences and has an important impact on the stability of slopes in terms of vegetative coverage (Pham et al., 2016c). Lithology is one of the most commonly used factors in LSM, and some geological formations are more favourable to landslides. Rainfall is a key landslide-inducing factor because it can influence the shear strength of slopes (Pham et al., 2016b; Varnes, 1984). The factors of distance to faults, roads and rivers have an important impact on the spread and size of landslides in the study area (Pham et al., 2016a; Pham et al., 2015). According to previous works, environmental conditions and the available data of the study area, sixteen influencing factors were considered in this work for landslide analysis, including the morphological factors of altitude, aspect, slope, plan curvature, profile curvature, STI, SPI and TWI, the geological factors of lithology and distance to faults, the land cover factors of land use, NDVI, soil and distance to roads, and the hydrological factors of distance to rivers and rainfall. In addition, selection of a suitable terrain mapping unit is significant for LSM. Among all terrain mapping units, grid cells are the most popular for raster-based GIS users in modelling landslide susceptibility. Using ArcGIS 10.2 software, each factor was converted into a spatially defined map layer with a grid size of 25 × 25 m, which is in accord with the digital elevation model (DEM) data generated from 1:50,000 scale topographic maps.
and validation sets may make the prediction process very complex. Meanwhile, the curse of dimensionality always results in poor prediction accuracies. Feature selection is beneficial for applying attribute selection methods and obtaining high-quality data. Furthermore, this step can remove extraneous and redundant features from all the available attributes. In this work, two influencing factor evaluators, multicollinearity analysis and GR, are considered and introduced in the following subsections.
3.3.1. Multicollinearity analysis
To estimate the correlation between the landslide influencing factors, multicollinearity analysis was applied in this study. Multicollinearity is a statistical phenomenon in which there exists a strong correlation between two or more predictor variables in a multiple regression model (O'Brien, 2007). In this study, tolerance (TOL) and variance inflation factor (VIF) were used to detect multicollinearity among influencing factors. Let $X = \{X_1, X_2, \ldots, X_N\}$ define a given independent variable set and $R_j^2$ denote the coefficient of determination when the jth independent variable $X_j$ is regressed on all other predictor variables in the model. The VIF value is computed as follows:
$$\mathrm{VIF} = \frac{1}{1 - R_j^2} \qquad (1)$$
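As a minimal illustration of Eq. (1), the sketch below converts a coefficient of determination into VIF and TOL. It assumes the conventional definition TOL = 1 − R_j² = 1/VIF, which is not spelled out in this section, and it takes R_j² as given rather than fitting the regression. The function name and the sample value are hypothetical.

```python
def vif_and_tol(r_squared):
    """Return (VIF, TOL) for one factor, given R_j^2 from regressing
    that factor on all the other factors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    tol = 1.0 - r_squared   # assumed conventional definition of tolerance
    vif = 1.0 / tol         # Eq. (1)
    return vif, tol


# Hypothetical R_j^2 value, chosen only to show a borderline case.
vif, tol = vif_and_tol(0.9)
print(round(vif, 3), round(tol, 3))
```

With R_j² = 0.9 the factor sits at VIF = 10, the threshold value quoted in Section 4.1.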
are multicollinearity and should be removed from the landslide prediction models.
3.3.2. Gain ratio method
In this study, the feature selection method of gain ratio (GR) was used to select an optimal subset to improve the prediction performance in LSM (Dash and Liu, 1997). The information gain measure is used to select an attribute at each node of the decision tree. The GR, an extension of information gain, was then proposed to overcome its bias. For clarification, the GR method is introduced as follows.
Let S be a training set and n be the total number of samples, with the expected information given by
$$H(S) = -\sum_{i=1}^{n} p_i \log_2(p_i) \qquad (2)$$
where $p_i$ is the probability that a sample belongs to class $C_i$. The attribute A has m values and its average entropy is given by
$$E(A) = -\sum_{i=1}^{n} p_i H(S) \qquad (3)$$
and the information gain on attribute A is
$$\mathrm{Gain}(A) = H(S) - E(A) \qquad (4)$$
The split information value represents the potential information obtained by splitting S into m parts corresponding to m outcomes on attribute A and can be computed as follows:
$$\mathrm{SplitInfo}_A(S) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2\left(\frac{S_i}{S}\right) \qquad (5)$$
Finally, the GR is defined as follows:
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(S)} \qquad (6)$$
The average merit (AM) derived from this method indicates the importance of each conditioning factor to landslide occurrence. If this value is less than or equal to 0, the corresponding factor is considered to be an irrelevant attribute and should be excluded from prediction, whereas the remaining factors are used in the following prediction process.
3.4. CNN
A CNN model, exhibiting robust performance in visual image analysis, is a class of feed-forward NN whose artificial neurons respond to a portion of the surrounding elements (Girshick, 2015). This implies that a CNN is a variation of a multilayer perceptron consisting of one or more convolution, max pooling and fully connected layers (Hoo-Chang et al., 2016). The structure of a typical CNN is shown in Fig. 3. A basic CNN always has input, convolutional, max pooling, fully connected and output layers. The input layer is an m × n matrix in which every element has a feature value; thus, the input data can be represented as a two-dimensional feature map. Each convolutional layer consists of several convolutional units, and the parameters of every unit are optimized by a back-propagation algorithm. The purpose of a convolutional manipulation is to extract different features of the input layer (Sharif Razavian et al., 2014). The first convolutional layer may only extract some low-level features such as lines, edges and corners. More convolutional layers can iteratively learn more intricate representations from low-level features. Pooling is a critical manipulation in the CNN technique (Szegedy et al., 2015). In fact, it is a form of down-sampling to reduce the dimensionality of feature maps, without altering the depth of these maps. Max pooling is the most common manipulation among the different pooling approaches. The aim of this manipulation is to divide the feature
maps into a number of rectangular areas and produce the maximum value for each area. Furthermore, this manipulation can continuously reduce the dimensionality of data; hence, the number of parameters and the computational cost decrease. Consequently, the over-fitting problem can be mitigated. The fully connected layer reorganizes extracted representations to reduce the loss of feature information, and the output layer produces classification results. It should be noted that the relative position of discovered features to other features is more important than the precise location, which is very distinctive to CNN.
CNNs are skilled in visual recognition and combine convolutional, max pooling and fully connected layers in different ways. For example, LeNet-5 is a CNN structure used in handwritten digit recognition and it can effectively solve some visually related problems (LeCun et al., 1995). However, LeNet-5 cannot be directly employed in landslide susceptibility analysis. Therefore, in this paper, we proposed new CNN architectures that are suitable for landslide susceptibility analysis and compared them with the traditional CNN structure of LeNet-5.
Generally, a commonly used LeNet-5 is comprised of eight layers (LeCun et al., 1998). Given n × n input data, C1 is a convolutional layer with six (n – 4) × (n – 4) feature maps and each unit in the feature maps is connected to a 5 × 5 neighbourhood in the input map. S2 is a max pooling layer for subsampling with six $(n-4)^2/2^2$ feature maps. The max pooling kernel size is 2 × 2 and each unit in the feature maps is connected to the previous layer. C3 is another convolutional layer with sixteen $(n-12)^2/2^2$ feature maps in which each unit is connected to a 5 × 5 neighbourhood in the previous layer. The layer S4 performs the max pooling process and has sixteen $(n-12)^2/4^2$ feature maps. F5 and F6 are fully connected layers with 120 and 84 neural units, respectively. Finally, the output layer produces two neural units to indicate the binary classification results of “landslide” and “non-landslide”. Fig. 4 shows the architecture of LeNet-5 when n = 24.
3.5. The proposed CNN architectures
The aim of this work is to develop a novel CNN framework for regional landslide susceptibility analysis. However, landslide data have different expressions to fit the CNN architectures, which may influence landslide susceptibility results. In this section, three different data representation forms were used to construct the novel CNN architectures,
including one-dimensional, two-dimensional and three-dimensional forms of data representation. Therefore, we construct different CNN structures to fit different data representations. In the following, we term the CNNs coupled with the one-dimensional, two-dimensional and three-dimensional data representation algorithms CNN-1D, CNN-2D and CNN-3D, respectively.
3.5.1. CNN-1D
For LSM, the input data can be regarded as a picture in which each pixel has several landslide influencing attributes. Therefore, each grid cell of the input data is represented by a column vector with a length defined by the number of landslide influencing factors. Moreover, each element in this vector corresponds to a landslide influencing attribute. In this work, we develop a 1D CNN structure to directly extract the information from landslide influencing factors for landslide susceptibility analysis. The 1D CNN architecture consists of one convolutional layer, one max pooling layer and one fully connected layer. Assuming that there are n landslide influencing factors in the input data, the convolutional layer filters the input data with N kernels with a size of m × 1, and thus this layer has N feature vectors with lengths of (n – m + 1). Each element in the feature vector is connected to an m × 1 neighbourhood in the input vector. The max pooling layer has a size of a × 1 and its result is composed of N vectors with a length of $\frac{n-m+1}{a}$. The fully connected layer with k neural units follows the previous layer to represent the extracted features. Finally, we produce two neural units in the output layer to resolve a binary classification problem. Fig. 5 illustrates the architecture of CNN-1D when n = 15, N = 20, m = 3, a = 2 and k = 50.
3.5.2. CNN-2D
The probability of landslide occurrence of each grid cell is related to all the influencing factors. This implies that each grid cell has a set of unique attribute values that can reflect the potential of a landslide occurring. As mentioned in Section 3.4, the CNN technique has been successfully applied in image processing. To apply this technique for LSM, initialization must be performed by converting a one-dimensional input grid cell (vector) comprised of different attribute features into a two-dimensional matrix. In this work, we compare the number of landslide influencing factors with the number of attribute values of each factor, and choose the maximum of these two numbers to be the size of the corresponding two-dimensional matrix. For example, there are 24 lithological categories in the study area, a number that is larger than that of the landslide influencing factors (15). Therefore, we constructed a 24 × 24 matrix for each grid cell. Fig. 6 illustrates the conversion manipulation from a one-dimensional input grid cell (vector) to a two-dimensional matrix. Specifically, for each column vector in this matrix, the element at the position that corresponds to the attribute value of the factor is assigned 1, and the other elements in this vector are assigned 0.
There are two convolutional layers and two max pooling layers in the proposed 2D CNN structure. Each convolutional layer is followed by a dropout layer. It is assumed that each landslide grid cell unit is transformed into an n × n matrix. The first convolutional layer filters the input data with N kernels with a size of m × m, and thus this layer has N (n–m + 1) × (n–m + 1) feature maps. Each grid cell in the feature maps is connected to an m × m neighbourhood in the input map. The dropout manipulation temporarily discards the NN units according to a certain probability during the training process of the CNN network. This alleviates the over-fitting problem and improves classification accuracies. Following the convolutional step, a dropout manipulation is used. The max pooling layer has a size of a × a. Therefore, the results of this layer consist of N matrices with a size of $\frac{n-m+1}{a} \times \frac{n-m+1}{a}$, which are then transferred to the second convolutional layer with M kernels whose size is m × m, followed by a dropout manipulation. The results of this convolutional layer consist of M matrices with a size of $\frac{n-(a+1)(m-1)}{a} \times \frac{n-(a+1)(m-1)}{a}$, which are then transferred to the second max pooling layer, whose size is a × a. The resultant M feature maps with a size of $\frac{n-(a+1)(m-1)}{a^2} \times \frac{n-(a+1)(m-1)}{a^2}$ are produced by this max pooling layer. A fully connected layer with k neural units follows
Fig. 6. Two-dimensional data form. (a) Input data and (b) the conversion of a 1D grid cell to a 2D matrix.
the previous layer to reorganize the extracted features. Finally, the output layer produces two neural units to indicate binary classification results of “landslide” and “non-landslide”. Fig. 6 illustrates the two-dimensional data form and Fig. 7 shows the architecture of CNN-2D when n = 24, m = 3, N = 20, M = 15, a = 2 and k = 78.
3.5.3. CNN-3D
Regarding the three-dimensional data representation, the input data of the study area can be represented as a three-dimensional matrix of size c × n × n, where n denotes the row and column of each data layer and c represents the number of the landslide influencing factors. For example, given a neighbourhood of each grid cell with a size of 7 × 7, the three-dimensional data representation is illustrated in Fig. 8. Under these circumstances, we proposed a 3D CNN architecture to extract the influencing factors information and spatial relation to predict the probability of landslide occurrence. Specifically, the three-dimensional CNN network was composed of one convolutional layer of N kernels with a size of m × m × m, one max pooling layer and one fully connected layer. Provided the c × n × n input data, the convolutional layer has N feature maps with a size of (c–m + 1) × (n–m + 1) × (n–m + 1). Each grid cell is connected to an m × m × m neighbourhood in the input data. The next hidden layer is a max pooling
layer with a size of a × a × a. Thus, the output results from the max pooling layer have N feature maps with a size of $\frac{c-m+1}{a} \times \frac{n-m+1}{a} \times \frac{n-m+1}{a}$. A fully connected layer with k neural units follows the max pooling layer to learn the extracted features. Finally, we position two neural units in the output layer to represent “landslide” and “non-landslide”. Fig. 9 illustrates the architecture of CNN-3D when c = 15, n = 7, N = 20, m = 3, a = 2 and k = 78.
3.5.4. The related parameters
Parameter settings have a significant impact on the performance of classification/prediction methods. In this subsection, some related parameters used in the proposed CNNs are briefly introduced. On the one hand, the influence of activation functions and loss functions in the CNN architecture is very important. In ANN techniques without an activation function, the output of each layer is always a linear function of its previous layer. However, it is very difficult to represent the actual situation using this linear relationship (Huang and Babri, 1998). To solve this problem, the activation function technique was used to fit the output data because it can effectively convert linear relationships to nonlinear relationships through predefined activation (nonlinear) functions (Dahl et al., 2012). A rectified linear unit (ReLU) function was used in the CNN networks (Dahl et al., 2013). The ReLU function is one of the most common and efficient activation functions in CNN, and it has two main advantages. First, this function can overcome the vanishing gradient problem. Second, it is more efficient and effective for training prediction methods than other activation functions (Maas et al., 2013). The categorical cross-entropy function and the adaptive gradient (AdaGrad) algorithm were employed as the loss function and its optimizer, respectively (Anthimopoulos et al., 2016). The AdaGrad method can provide a constraint on the learning rate and use different learning rates for each learning parameter per iteration (Duchi et al., 2011). The softmax function was used to produce a posteriori probability for each grid cell (Lawrence et al., 1997). On the other hand, dropout manipulation plays an important role in improving prediction performance because it can temporarily discard the neural network units according to a particular probability during the training process (Hinton et al., 2012). Specifically, dropout manipulation forces a neural unit to work with other randomly selected neural units to reduce over-fitting and the coadaptation between hidden units (Srivastava et al., 2014). Furthermore, this manipulation can enhance the generalization of prediction methods (Dahl et al., 2013).
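The feature-map sizes quoted in Sections 3.5.1–3.5.3 can be checked with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes stride-1 convolutions without padding and non-overlapping max pooling that rounds down. The size expressions in the text are fractional for some settings (for example 13/2 in CNN-1D), so the rounding rule is an assumption and the original networks may have handled it differently. All function names are hypothetical.

```python
def conv_out(size, kernel):
    """Output length of a stride-1 convolution without padding: size - kernel + 1."""
    return size - kernel + 1


def pool_out(size, window):
    """Output length of non-overlapping max pooling (assumed to round down)."""
    return size // window


def cnn_1d(n, m, a):
    """Length of each pooled feature vector in CNN-1D."""
    return pool_out(conv_out(n, m), a)


def cnn_2d(n, m, a):
    """Side lengths of each feature map after the second pooling in CNN-2D."""
    side = pool_out(conv_out(n, m), a)      # first convolution + pooling
    side = pool_out(conv_out(side, m), a)   # second convolution + pooling
    return (side, side)


def cnn_3d(c, n, m, a):
    """Size of each pooled feature map in CNN-3D for c x n x n input."""
    return (pool_out(conv_out(c, m), a),
            pool_out(conv_out(n, m), a),
            pool_out(conv_out(n, m), a))


# Settings quoted for Figs. 5, 7 and 9.
print(cnn_1d(15, 3, 2))      # 6
print(cnn_2d(24, 3, 2))      # (4, 4)
print(cnn_3d(15, 7, 3, 2))   # (6, 2, 2)
```

For CNN-2D with n = 24, m = 3 and a = 2, the intermediate sides are 22, 11 and 9, which agrees with the expression [n − (a + 1)(m − 1)]/a = 9 given for the second convolutional layer.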
3.6. Model evaluation methods
To evaluate the performance of the proposed framework, measurements of the OA and ROC curve were used (Chen et al., 2017c; Pham et al., 2017b; Tsangaratos and Ilia, 2016). The OA value is the ratio of the number of correctly classified grid cells to the total number of grid cells, calculated as follows:
$$\mathrm{OA} = \frac{a}{b} \times 100\% \qquad (7)$$
where a and b denote the number of correctly classified landslide or non-landslide grid cells and the total number of grid cells in the validation set, respectively. A higher OA value implies better classification performance. The ROC curve is a standard technique for the performance evaluation of landslide prediction methods (Bradley, 1997). It is produced by plotting the true positive (TP) rate against the false positive (FP) rate at various threshold values. The TP rate and the FP rate are also referred to as “sensitivity” and “100 − specificity” in statistics, respectively. Moreover, the area under the ROC curve (AUC) has been widely used to quantitatively evaluate the performance of LSM approaches (Mandal and Mandal, 2018; Pham et al., 2017a; Wang et al., 2017). Specifically, a prediction approach is considered good if the AUC value is close to 1 (Tsangaratos et al., 2017; Zhu et al., 2018).
In ML, the Matthews correlation coefficient (MCC) has been used as a measure of binary classifications, even if the two classes are of very different sizes (Matthews, 1975). The MCC is defined as follows:
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (8)$$
where TP and TN (true negative) represent the number of landslide and non-landslide samples that are correctly classified, whereas FP and FN (false negative) denote the number of non-landslide and landslide samples that are misclassified. Moreover, this measure is actually a correlation coefficient between the observed and predicted classes. Generally, a final result is regarded as a perfect prediction if the MCC value equals 1, whereas MCC values of 0 and −1 represent a random prediction and a total disagreement between prediction and observation, respectively.
Moreover, a chi-square test was used to evaluate the significance of the differences between the compared methods (Kuncheva, 2004). It is based on a prior hypothesis that LSM methods have no significant difference (Tallarida and Murray, 1987). Chi-square and p values were calculated for validation. In general, if the p value is smaller than 0.05 and the chi-square value is higher than 3.841, there is a significant difference between the two LSM methods (Pham et al., 2017a).
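Eqs. (7) and (8) translate directly into code. The sketch below is a minimal illustration in which OA is computed from the four confusion-matrix counts, so that a = TP + TN and b = TP + TN + FP + FN. Returning 0 when the MCC denominator vanishes is a common convention assumed here, and the counts are hypothetical values rather than results from this study.

```python
import math


def overall_accuracy(tp, tn, fp, fn):
    """Eq. (7): percentage of correctly classified grid cells."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Eq. (8): Matthews correlation coefficient.

    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts for a validation set of 100 cells.
tp, tn, fp, fn = 40, 45, 5, 10
print(overall_accuracy(tp, tn, fp, fn))
print(round(mcc(tp, tn, fp, fn), 3))
```

A perfect classifier (FP = FN = 0) gives MCC = 1, and a classifier that misclassifies every cell (TP = TN = 0) gives MCC = −1, matching the interpretation above.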
Table 1
Multicollinearity analysis of landslide influencing factors.
TOL VIF
Table 2
Spatial relationship between each landslide conditioning factor and landslide using the FR model.
Factor Class No. of landslides Percentage of landslides No. of pixels in domain Percentage of domain FR
4. Results
4.1. Relative importance analysis of influencing factors
The predictive capability of all the landslide influencing factors was evaluated using the training set based on multicollinearity analysis and the GR method. Table 1 lists the results of the multicollinearity analysis of landslide influencing factors. The factor of STI is shown to have a VIF value of 10.466, which is larger than the threshold value (10). Therefore, it should be removed from the prediction processes.
For the GR method, factors with higher weights are more significant to the prediction methods, whereas factors with weights of zero cannot contribute to landslide susceptibility modelling and should be excluded from further analysis. Fig. 10 shows the AM of each influencing factor. Among these factors, the land use factor has the highest AM value of 0.0617, which indicates that it is more important than the other factors. The AM values of NDVI, altitude, lithology, slope, soil, distance to rivers, aspect, SPI and rainfall are between 0.0151 and 0.0474. Furthermore, the AM values of distance to faults, distance to roads, plan curvature, TWI and profile curvature are positive but <0.01, which indicates that they contribute little to the models. According to the previous analysis, all the remaining landslide influencing factors with AM values greater than zero contribute to the LSM.
4.2. Influencing factors analyses using FR model
The relationship between landslide occurrence and related influencing factors using an FR model is summarized in Table 2. If the FR value is >1, the corresponding area is more prone to landslide occurrence (Oh et al., 2011). In Table 2, for altitude, the <200 m class has the lowest FR value of 0.19 and the FR values of other classes are all above 1. Results regarding aspect demonstrated that the highest FR value of 2.37 belongs to the southeast class, in which >26% of the landslides occurred. In the case of distance to faults, the highest FR value of 1.14 belongs to the 0–2000 m and 5000–8000 m classes. However, for the factors of distance to rivers and distance to roads, the FR value increases as the distance increases. Regarding the results of the land-use factor, the forest class has the highest FR value of 1.52, accounting for 80% of the landslide occurrences, which indicates that this class has significant importance to landslide occurrence. Lithology results revealed that the W class has the highest FR value of 4.57, indicating the highest probability of landslide occurrence. For the NDVI factor, the FR value increases as the factor class value increases. Rainfall is a critical factor in landslide incidence that influences soil structure; thus, the highest rainfall class of >1400 mm has the highest FR value of 1.45. With respect to the plan curvature and profile curvature factors, the <−0.4 and >0.2 classes have the highest FR values of 1.2 and 1.13, accounting for over 37% and 47% of landslide occurrences in the study area, respectively. The >60° class of the slope factor, with the highest FR value of 12.71, has a much greater probability of landslide occurrence than the other classes. The highest FR value of 2.15 corresponds to the LVh soil type. For the SPI factor, >50% of landslides occurred in the >50 class, which had the highest FR value of 1.35. The relationship between landslide occurrence and TWI showed that the >11 and 7–9 classes have the highest and lowest FR values of 2.12 and 0.72, respectively. Fig. 11 presents reclassification maps of all the landslide influencing factors for better visual inspection.
4.3. Model validation and comparison
In this subsection, we used the three proposed CNN methods and the LeNet-5 method introduced in Section 3 for experiments in landslide susceptibility analysis. All source codes of the methods previously mentioned were implemented in Python under the well-known TensorFlow framework (http://www.tensorflow.org). TensorFlow is an open source software library using data flow graphs and has been widely used in ML and DL. The experimental results were produced using a PC equipped with an Intel Core i5-8400 processor and Nvidia GeForce GTX 660 graphics card. To construct the CNNs, all the parameter settings were optimized through a training process using the trial-and-error method, as shown in Table 3.
As the landslide prediction methods were constructed using the training set, each grid cell in the study area was assigned a susceptibility index. After assigning weights to the factor classes, the landslide susceptibility map was prepared in an ArcGIS environment. For better visualization, the indices were reclassified using the natural breaks method into five levels of very low, low, moderate, high and very high. Fig. 12 illustrates landslide susceptibility maps obtained by different CNNs and Fig. 13 shows the distribution of each class per landslide susceptibility map. It can be observed that the northernmost part of all susceptibility maps is categorized as very low and low susceptible zones. The results of CNN-1D and CNN-2D were very similar. However, the very high class in Fig. 12 (a) was evenly distributed in the study area, whereas this class in Fig. 12 (b) was mainly concentrated in the northwest and southernmost parts of the study area. For the result of CNN-3D in Fig. 12 (c), more than half of the study area was marked as very low, whereas the percentage was <10% for the low, moderate and high classes. The
Fig. 11. Thematic maps of the study area. (a) Altitude, (b) aspect, (c) distance to faults, (d) land use, (e) lithology, (f) NDVI, (g) plan curvature, (h) profile curvature, (i) rainfall, (j) distance
to rivers, (k) distance to roads, (l) slope, (m) soil, (n) SPI and (o) TWI.
result of LeNet-5 in Fig. 12 (d) demonstrated the largest very high susceptible zones, approximately 30% of the study area. It should be noted that the sum of the proportions of the very high and high classes is similar for all the proposed CNNs. Specifically, the very high and high classes of the CNN-1D, CNN-2D and CNN-3D are 25.42%, 29.52% and 25.36%, respectively, which demonstrates that the proposed methods are more practical for landslide prevention and management than the LeNet-5 method with a proportion of 44.76%.
Table 3
Parameter settings of the CNNs.
Method    Parameter settings
CNN-1D    Convolutional kernel size: 3 × 1; max pooling kernel size: 2 × 1; number of iterations: 300; activation function: ReLU; optimizer: AdaGrad
CNN-2D    Convolutional kernel size: 3 × 3; max pooling kernel size: 2 × 2; number of iterations: 100; activation function: ReLU; optimizer: AdaGrad; dropout rate: 0.4 and 0.3
CNN-3D    Convolutional kernel size: 3 × 3 × 3; max pooling kernel size: 2 × 2 × 2; number of iterations: 300; activation function: ReLU; optimizer: AdaGrad
LeNet-5   Convolutional kernel size: 5 × 5; max pooling kernel size: 2 × 2; number of iterations: 15; activation function: ReLU; optimizer: AdaGrad
Table 4 lists the OA and MCC values of the three proposed CNNs and the LeNet-5 method. The proposed CNNs achieve higher OA values than LeNet-5. Specifically, the CNN-2D method achieved the highest OA value of 77.63%, which is approximately 3% higher than that of LeNet-5, followed by CNN-3D and CNN-1D with OA values of 75.88% and 74.12%, respectively. In addition, CNN-2D obtained the highest MCC value of 0.555, followed by CNN-3D, LeNet-5 and CNN-1D with MCC values of 0.518, 0.510 and 0.483, respectively.
The ROC curves of all the methods using the validation set are shown in Fig. 14. The CNN-2D method demonstrated better predictive power than the other models in terms of AUC. Specifically, the CNN-3D and LeNet-5 methods achieved very similar AUC values of 0.806 and 0.807, respectively, and the CNN-1D method obtained the lowest AUC value of 0.799. Furthermore, a chi-square test was used to evaluate the significance of the differences between the prediction methods. If the chi-square value was larger than 3.841 and the significance level (p) was lower than 0.05, then the difference between the prediction methods was significant.
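The two thresholds in this criterion are linked: 3.841 is the critical value of a chi-square distribution with one degree of freedom at the 0.05 level. The sketch below illustrates that link using the identity p = erfc(sqrt(x/2)), which holds for one degree of freedom. The single degree of freedom is inferred from the quoted critical value rather than stated in the text, and the way the test statistic itself is obtained from two methods' predictions is not described here, so it is not reproduced. Function names are hypothetical.

```python
import math


def chi2_p_value_1dof(statistic):
    """Upper-tail p value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def is_significant(statistic, alpha=0.05, critical=3.841):
    """Apply the two threshold conditions quoted in the text."""
    return statistic > critical and chi2_p_value_1dof(statistic) < alpha


print(round(chi2_p_value_1dof(3.841), 3))   # approximately 0.05
print(is_significant(5.0), is_significant(1.0))
```

Because the p value decreases monotonically as the statistic grows, the two conditions select essentially the same cases under this assumption.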
Fig. 12. Landslide susceptibility maps for different CNN prediction methods. (a) CNN-1D; (b) CNN-2D; (c) CNN-3D; (d) LeNet-5.
Table 5 lists the chi-square values and significance levels for the different CNNs. The differences between all pairs of CNNs are significant because their chi-square and significance level values clearly satisfy the threshold conditions mentioned above.
To further validate the effectiveness of the proposed CNNs, CNN-2D, which demonstrated the best performance in the previous experiments, was selected for comparison with several of the most popular ML and DL methods. Deep neural networks (DNNs) are typically feed-forward networks in which data flow from the input layer to the output layer without looping back (Cireşan et al., 2012a). Initially, a DNN creates a map of virtual neural units and connects these units through weights. Then, the input data are multiplied by the weights to produce a probability between 0 and 1. The selected DNN is a five-layer network architecture that includes four fully connected hidden layers with 50, 30, 20 and 10 neural units. The output layer produces the prediction results with two neural units, representing landslide and non-landslide units, respectively. As a classical and robust model, the SVM classifier with a radial basis function (RBF) kernel was used for comparison. The optimal C (2^7) and γ (2^−9) for SVM were obtained using five-fold cross validation over search ranges from 2^−5 to 2^15 and from 2^−15 to 2^5, respectively. Fig. 15 presents the landslide susceptibility maps of the DNN and SVM methods.
Table 6 lists the OA and MCC values of the three DL and SVM methods. CNN-2D achieved the highest OA value of 77.63%, which is 7.45 percentage points higher than that of the optimized SVM (70.18%), followed by LeNet-5 and DNN with OA values of 73.25% and 71.05%, respectively. In terms of MCC, CNN-2D also achieved a much higher value than the optimized SVM: CNN-2D obtained the highest MCC value of 0.555, followed by LeNet-5, DNN and SVM with MCC values of 0.510, 0.421 and 0.404, respectively.
Fig. 16 shows the ROC curves of all the methods using the validation set. CNN-2D demonstrated better predictive power than the optimized SVM in terms of AUC. Specifically, the two CNNs, CNN-2D and LeNet-5, achieved AUC values of 0.813 and 0.807, respectively, and DNN obtained an AUC value of approximately 0.8.
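For readers who wish to reproduce the two measures reported in Table 6, the sketch below computes OA and MCC from the four cells of a binary confusion matrix using their standard definitions. The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def overall_accuracy(tp, tn, fp, fn):
    """Share of correctly classified samples among all samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical validation counts: landslide = positive class.
    tp, tn, fp, fn = 40, 45, 5, 10
    print("OA :", overall_accuracy(tp, tn, fp, fn))
    print("MCC:", matthews_corrcoef(tp, tn, fp, fn))
```

Unlike OA, MCC uses all four cells of the confusion matrix, which is why the two measures can rank methods with different margins.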
4.4. Parameter analysis

In this subsection, the impact of dropout manipulation on landslide spatial prediction is discussed first. Then, the influence of the activation function in the CNN architecture is addressed. It should be noted that, for simplicity, these parameters were analysed using CNN-2D.
In the first experiment, we constructed two networks for comparison and trained them using the same training set. The first network was constructed using CNN-2D with two dropout manipulations, with dropout rates of 0.4 and 0.3, respectively. The second network was built using CNN-2D without any dropout manipulation. The OA values obtained from the two networks are plotted in Fig. 17. In this figure, the term “Epoch” denotes the number of times that the network is trained using the entire training data. The OA value obtained from CNN-2D is effectively improved by including a dropout manipulation after each convolutional process in the network.
In the second experiment, two activation functions, ReLU and tanh, were compared. The OA values obtained using CNN-2D with the two activation functions are plotted in Fig. 18. Although the OA results obtained using CNN-2D with ReLU and tanh were not stable and oscillated, the CNN with ReLU achieved higher OA values than that with tanh in most instances. In conclusion, the CNN with ReLU as the activation function can produce more reliable prediction results.

5. Discussion

and SVM classifiers in Yanshan County, China. Furthermore, different data representations converted from raw landslide data are presented and fitted in the three proposed CNN architectures of CNN-1D, CNN-2D and CNN-3D, respectively.
Before analysing landslide susceptibility, it is very important to assess the predictive capability of all of the influencing factors. To achieve this objective, multicollinearity analysis was used to estimate correlations between these factors and the GR method was employed to rank the importance of these factors. The multicollinearity analysis results revealed that the STI factor has strong multicollinearity and should be removed from the subsequent process. Regarding the results of the GR method, the land use and NDVI factors had higher AM values than the other factors, indicating that the two factors were more important for landslide occurrence. On the one hand, the FR values of the grass and forest classes are much greater than those of the other three classes of the land use factor. On the other hand, NDVI can accurately display surface vegetation coverage. It should be noted that landslides continue to occur in mountainous areas due to rainfall and external forces, even on slopes covered with significant vegetation.
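The FR values referred to in the discussion of the land use factor can be illustrated with a minimal sketch. It assumes the commonly used definition of the frequency ratio, namely the share of landslides falling in a class divided by the share of the study area covered by that class; the function name and all counts are hypothetical and are not data from the study.

```python
def frequency_ratio(class_landslides, total_landslides, class_cells, total_cells):
    """Frequency ratio of one factor class (assumed standard definition).

    A value above 1 means landslides are over-represented in the class
    relative to the area that the class occupies.
    """
    landslide_share = class_landslides / total_landslides
    area_share = class_cells / total_cells
    return landslide_share / area_share


if __name__ == "__main__":
    # Hypothetical counts: (landslides in class, grid cells in class).
    classes = {"class_a": (30, 2000), "class_b": (50, 3000), "class_c": (20, 5000)}
    total_landslides = sum(n for n, _ in classes.values())
    total_cells = sum(c for _, c in classes.values())
    for name, (n, c) in classes.items():
        print(name, round(frequency_ratio(n, total_landslides, c, total_cells), 3))
```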
Table 4
Performance of different methods.
Table 5
Chi-square values and significant levels of different CNNs.
Comparative pairs | Chi-square value | p value | Significance level
Table 6
Performance of different prediction methods.
Method | OA value | MCC
Fig. 15. Landslide susceptibility maps of DNN (a) and SVM (b).
Fig. 17. OA values obtained using CNN-2D with and without dropout manipulations.
Fig. 18. OA values obtained using CNN-2D with ReLU and tanh activation functions.
Furthermore, the three proposed data representation forms provide a new way to handle raw landslide data. The experimental results showed that CNN-2D is superior to the classical DL technique of DNN and the conventional ML technique of SVM, which indicated that the three proposed CNNs may be promising and robust techniques for LSM.

6. Conclusions

This work investigates the application of a CNN framework for LSM in the case of Yanshan County, China. The framework is a useful methodology that can be applied to other areas in the world with similar characteristics. The proposed CNNs were validated in the study area based on the analysis of sixteen influencing factors that were derived from different ancillary data. The final landslide susceptibility maps of the study area were obtained using these CNNs in comparison to the conventional ML and DL methods of SVM, DNN and LeNet-5. The validation of the results was conducted on the basis of the objective measures of OA, MCC, ROC and AUC. The experimental results confirmed the following conclusions. First, the landslide susceptibility maps obtained using the proposed CNNs are more practical for landslide prevention and management than those from conventional methods. Second, the prediction results obtained using the proposed CNNs are better than

Acknowledgements

References

Andrieu, C., De Freitas, N., Doucet, A., Jordan, M.I., 2003. An introduction to MCMC for machine learning. Mach. Learn. 50, 5–43.
Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., Mougiakakou, S., 2016. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans. Med. Imaging 35, 1207–1216.
Bengio, Y., Courville, A., Vincent, P., 2013. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828.
Bradley, A.P., 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 1145–1159.
Broeckx, J., Vanmaercke, M., Duchateau, R., Poesen, J., 2018. A data-based landslide susceptibility map of Africa. Earth Sci. Rev. 185, 102–121.
Bui, D.T., et al., 2016a. GIS-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with AdaBoost, Bagging, and MultiBoost ensemble frameworks. Environmental Earth Sciences 75, 1101.
Bui, D.T., Tuan, T.A., Klempe, H., Pradhan, B., Revhaug, I., 2016b. Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13, 361–378.
Bui, D.T., et al., 2017. Spatial prediction of rainfall-induced landslides for the Lao Cai area (Vietnam) using a hybrid intelligent approach of least squares support vector machines inference model and artificial bee colony optimization. Landslides 14, 447–458.
Chen, W., Pourghasemi, H.R., Zhao, Z., 2017a. A GIS-based comparative study of Dempster-Shafer, logistic regression and artificial neural network models for landslide susceptibility mapping. Geocarto International 32, 367–385.
Chen, W., et al., 2017b. GIS-based landslide susceptibility modelling: a comparative assessment of kernel logistic regression, Naïve-Bayes tree, and alternating decision tree models. Geomatics, Natural Hazards and Risk 8, 950–973.
Chen, W., et al., 2017c. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 151, 147–160.
Chen, W., Pourghasemi, H.R., Naghibi, S.A., 2018a. A comparative study of landslide susceptibility maps produced using support vector machine with different kernel functions and entropy data mining models in China. Bull. Eng. Geol. Environ. 77, 647–664.
Chen, W., et al., 2018b. GIS-based landslide susceptibility evaluation using a novel hybrid integration approach of bivariate statistical based random forest method. Catena 164, 135–149.
Chigira, M., Wu, X., Inokuchi, T., Wang, G., 2010. Landslides induced by the 2008 Wenchuan earthquake, Sichuan, China. Geomorphology 118, 225–238.
Cireşan, D., Meier, U., Masci, J., Schmidhuber, J., 2012a. Multi-column deep neural network for traffic sign classification. Neural Netw. 32, 333–338.
Cireşan, D., Meier, U., Schmidhuber, J., 2012b. Multi-column deep neural networks for image classification. 2012 IEEE Conference on Computer Vision and Pattern Recognition, 3642–3649.
Dahl, G.E., Yu, D., Deng, L., Acero, A., 2012. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 30–42.
Dahl, G.E., Sainath, T.N., Hinton, G.E., 2013. Improving deep neural networks for LVCSR using rectified linear units and dropout. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 8609–8613.
Dai, F., Lee, C., Li, J., Xu, Z., 2001. Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environ. Geol. 40, 381–391.
Dash, M., Liu, H., 1997. Feature selection for classification. Intelligent Data Analysis 1, 131–156.
Ding, A., Zhang, Q., Zhou, X., Dai, B., 2016. Automatic recognition of landslide based on CNN and texture change detection. Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, 444–448.
Duchi, J., Hazan, E., Singer, Y., 2011. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159.
Girshick, R., 2015. Fast R-CNN. Proceedings of the IEEE international conference on computer vision 1440–1448.
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R., 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
Hong, H., Pourghasemi, H.R., Pourtaghi, Z.S., 2016a. Landslide susceptibility assessment in Lianhua County (China): a comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology 259, 105–118.
Hong, H., et al., 2016b. Spatial prediction of landslide hazard at the Luxi area (China) using support vector machines. Environmental Earth Sciences 75, 40.
Hong, H., Pradhan, B., Sameen, M.I., Chen, W., Xu, C., 2017a. Spatial prediction of rotational landslide using geographically weighted regression, logistic regression, and support vector machine models in Xing Guo area (China). Geomatics, Natural Hazards and Risk 8, 1997–2022.
Hong, H., Tsangaratos, P., Ilia, I., Chen, W., Xu, C., 2017b. Comparing the Performance of a Logistic Regression and a Random Forest Model in Landslide Susceptibility Assessments. The Case of Wuyaun Area, China, Workshop on World Landslide Forum. Springer, pp. 1043–1050.
Hong, H., et al., 2018. Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China). Catena 163, 399–413.
Hoo-Chang, S., et al., 2016. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35, 1285.
Huang, G.-B., Babri, H.A., 1998. Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions. IEEE Trans. Neural Netw. 9, 224–229.
Kavzoglu, T., Sahin, E.K., Colkesen, I., 2014. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11, 425–439.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Proces. Syst. 1097–1105.
Kuncheva, L.I., 2004. Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons.
Lawrence, S., Giles, C.L., Tsoi, A.C., Back, A.D., 1997. Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Netw. 8, 98–113.
LeCun, Y., et al., 1995. Comparison of Learning Algorithms for Handwritten Digit Recognition, International Conference on Artificial Neural Networks. Perth, Australia, pp. 53–60.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436.
Maas, A.L., Hannun, A.Y., Ng, A.Y., 2013. Rectifier nonlinearities improve neural network acoustic models, Proc. ICML, p. 3.
Mandal, S., Mandal, K., 2018. Modeling and mapping landslide susceptibility zones using GIS based multivariate binary logistic regression (LR) model in the Rorachu river basin of eastern Sikkim Himalaya, India. Modeling Earth Systems and Environment 1–20.
Matthews, B.W., 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure 405, 442–451.
O'Brien, R.M., 2007. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 41, 673–690.
Oh, H.-J., Pradhan, B., 2011. Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Comput. Geosci. 37, 1264–1276.
Oh, H.-J., Kim, Y.-S., Choi, J.-K., Park, E., Lee, S., 2011. GIS mapping of regional probabilistic groundwater potential in the area of Pohang City, Korea. J. Hydrol. 399, 158–172.
Pham, B.T., Tien Bui, D., Indra, P., Dholakia, M., 2015. Landslide susceptibility assessment at a part of Uttarakhand Himalaya, India using GIS–based statistical approach of frequency ratio method. Int J Eng Res Technol 4, 338–344.
Pham, B.T., Bui, D., Prakash, I., Dholakia, M., 2016a. Evaluation of predictive ability of support vector machines and naive Bayes trees methods for spatial prediction of landslides in Uttarakhand state (India) using GIS. J. Geom. 10, 71–79.
Pham, B.T., Bui, D.T., Dholakia, M., Prakash, I., Pham, H.V., 2016b. A comparative study of least square support vector machines and multiclass alternating decision trees for spatial prediction of rainfall-induced landslides in a tropical cyclones area. Geotech. Geol. Eng. 34, 1807–1824.
Pham, B.T., Bui, D.T., Prakash, I., Dholakia, M., 2016c. Rotation forest fuzzy rule-based classifier ensemble for spatial prediction of landslides using GIS. Nat. Hazards 83, 97–127.
Pham, B.T., et al., 2017a. A novel ensemble classifier of rotation forest and Naïve Bayer for landslide susceptibility assessment at the Luc Yen district, Yen Bai Province (Viet Nam) using GIS. Geomatics, Natural Hazards and Risk 8, 649–671.
Pham, B.T., Bui, D.T., Pourghasemi, H.R., Indra, P., Dholakia, M., 2017b. Landslide susceptibility assessment in the Uttarakhand area (India) using GIS: a comparison study of prediction capability of naïve bayes, multilayer perceptron neural networks, and functional trees methods. Theor. Appl. Climatol. 128, 255–273.
Pham, B.T., Bui, D.T., Prakash, I., Nguyen, L.H., Dholakia, M., 2017c. A comparative study of sequential minimal optimization-based support vector machines, vote feature intervals, and logistic regression in landslide susceptibility assessment using GIS. Environmental Earth Sciences 76, 371.
Pham, B.T., Tien Bui, D., Prakash, I., Dholakia, M.B., 2017d. Hybrid integration of multilayer perceptron neural networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena 149, 52–63.
Pham, B.T., Shirzadi, A., Bui, D.T., Prakash, I., Dholakia, M., 2018. A hybrid machine learning ensemble approach based on a radial basis function neural network and rotation forest for landslide susceptibility modeling: a case study in the Himalayan area, India. International Journal of Sediment Research 33, 157–170.
Pourghasemi, H.R., Mohammady, M., Pradhan, B., 2012. Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. Catena 97, 71–84.
Pulighe, G., Baiocchi, V., Lupia, F., 2016. Horizontal accuracy assessment of very high resolution Google Earth images in the city of Rome, Italy. International Journal of Digital Earth 9, 342–362.
Reichenbach, P., Rossi, M., Malamud, B., Mihir, M., Guzzetti, F., 2018. A review of statistically-based landslide susceptibility models. Earth Sci. Rev. 180, 60–91.
Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Proces. Syst. 91–99.
Schmidhuber, J., 2015. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117.
Sharif Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S., 2014. CNN features off-the-shelf: an astounding baseline for recognition, Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 806–813.
Shirzadi, A., et al., 2017. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environmental Earth Sciences 76, 60.
Simard, P.Y., Steinkraus, D., Platt, J.C., 2003. Best practices for convolutional neural networks applied to visual document analysis. null. IEEE 958.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1929–1958.
Szegedy, C., et al., 2015. Going deeper with convolutions. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 1–9.
Tallarida, R.J., Murray, R.B., 1987. Chi-square Test, Manual of Pharmacologic Calculations. Springer, pp. 140–142.
Tsangaratos, P., Ilia, I., 2016. Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. Catena 145, 164–179.
Tsangaratos, P., Ilia, I., Hong, H., Chen, W., Xu, C., 2017. Applying Information Theory and GIS-based quantitative methods to produce landslide susceptibility maps in Nancheng County, China. Landslides 14, 1091–1111.
Varnes, D.J., 1984. Landslide Hazard Zonation: A Review of Principles and Practice.
Wang, L.-J., Guo, M., Sawada, K., Lin, J., Zhang, J., 2016. A comparative study of landslide susceptibility maps using logistic regression, frequency ratio, decision tree, weights of evidence and artificial neural network. Geosci. J. 20, 117–136.
Wang, Q., Wang, Y., Niu, R., Peng, L., 2017. Integration of information theory, K-means cluster analysis and the logistic regression model for landslide susceptibility mapping in the Three Gorges Area, China. Remote Sens. 9, 938.
Yingying Tian, C.X., Hong, Haoyuan, Zhou, Qing, Wang, Duo, 2019. Mapping earthquake-triggered landslide susceptibility by use of artificial neural network (ANN) models an example of the 2013 Minxian (China) Mw 5.9 event. Geomatics, Natural Hazards and Risk 10, 1–25.
Yu, H., Ma, Y., Wang, L., Zhai, Y., Wang, X., 2017. A landslide intelligent detection method based on CNN and RSG_R, Mechatronics and Automation (ICMA), 2017 IEEE International Conference on. IEEE 40–44.
Zhu, A.-X., et al., 2018. A comparative study of an expert knowledge-based model and two data-driven models for landslide susceptibility mapping. Catena 166, 317–327.