
Science of the Total Environment 666 (2019) 975–993


Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China
Yi Wang a,⁎, Zhice Fang a, Haoyuan Hong b,c,d,⁎⁎
a Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China
b Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing 210023, China
c State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing 210023, China
d Jiangsu Centre for Collaborative Innovation in Geographic Information Resource Development and Application, Nanjing, Jiangsu 210023, China

HIGHLIGHTS

• Convolutional neural networks for landslide susceptibility mapping are carried out for the first time.
• Three novel data representation algorithms are developed to fit the CNN architectures.
• A comparative study on the proposed methods under the CNN framework is implemented in Yanshan County, China.
• The proposed CNNs are superior to the state-of-the-art machine learning and deep learning methods.

ARTICLE INFO

Article history:
Received 12 November 2018
Received in revised form 17 January 2019
Accepted 16 February 2019
Available online 22 February 2019

Editor: Ralf Ludwig

Keywords:
Landslide susceptibility
Deep learning
Convolutional neural network
Data presentation algorithm
Yanshan County

ABSTRACT

Assessments of landslide disasters are becoming increasingly urgent. The aim of this study is to investigate a convolutional neural network (CNN) framework for landslide susceptibility mapping (LSM) in Yanshan County, China. The two primary contributions of this study are summarized as follows. First, to the best of our knowledge, this report describes the first time that the CNN framework is used for LSM. Second, different data representation algorithms are developed to construct three novel CNN architectures. In this work, sixteen influencing factors associated with landslide occurrence were considered and historical landslide locations were randomly divided into training (70% of the total) and validation (30%) sets. Validation of these CNNs was performed using different commonly used measures in comparison to several of the most popular machine learning and deep learning methods. The experimental results demonstrated that the proportions of highly susceptible zones in all of the CNN landslide susceptibility maps are highly similar and lower than 30%, which indicates that these CNNs are more practical for landslide prevention and management than conventional methods. Furthermore, the proposed CNN framework achieved higher or comparable prediction accuracy. Specifically, the proposed CNNs were 3.94%–7.45% and 0.079–0.151 higher than those of the optimized support vector machine (SVM) in terms of overall accuracy (OA) and Matthews correlation coefficient (MCC), respectively.
© 2019 Elsevier B.V. All rights reserved.

⁎ Corresponding author.
⁎⁎ Correspondence to: H. Hong, Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing 210023, China.
E-mail addresses: cug.yi.wang@gmail.com (Y. Wang), 171301013@stu.njnu.edu.cn, hong_haoyuan@outlook.com (H. Hong).

https://doi.org/10.1016/j.scitotenv.2019.02.263
0048-9697/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Landslides are among the most common and catastrophic natural hazards, causing numerous casualties and fatalities every year (Kavzoglu et al., 2014). Many influencing factors contribute to landslide occurrences (Bui et al., 2016a). Therefore, landslide spatial prediction using these influencing factors is essential in preventing and decreasing landslide damages.

Recently, various machine learning (ML) methods have been employed in regional landslide spatial prediction, including naïve Bayes (Pham et al., 2016a; Shirzadi et al., 2017; Tsangaratos and Ilia, 2016), decision trees (Chen et al., 2017b; Wang et al., 2016), support vector machines (SVMs) (Bui et al., 2016b; Hong et al., 2016b; Hong et al., 2017a; Pham et al., 2016a), random forests (Chen et al., 2018b; Hong et al., 2016a; Hong et al., 2017b) and artificial neural networks (ANNs) (Bui et al., 2016b; Chen et al., 2017a; Yingying Tian et al., 2019). Therefore, advanced ML approaches are considerably promising for landslide spatial prediction. For example, with the rapid development of neural network (NN) techniques, the classification capability of an NN to fit a decision boundary plane has become significantly more reliable (LeCun et al., 2015). However, the previously mentioned ML approaches directly classify the input data and cannot uncover more representative features from these data to further improve classification accuracies (Andrieu et al., 2003).

To address this problem, the deep learning (DL) framework has recently received more attention. DL is one of the most popular branches in the ML field based on learning (in a supervised, semi-supervised or unsupervised way) data representations, which is different from task-specific approaches (Bengio et al., 2013; LeCun et al., 2015; Schmidhuber, 2015). This framework has been applied in various fields and can achieve results comparable or superior to human experts (Cireşan et al., 2012b; Krizhevsky et al., 2012). Among the different DL techniques, the convolutional neural network (CNN) plays a significant role in addressing pattern recognition problems (Ren et al., 2015). Specifically, it is capable of recognizing patterns with extreme variability (such as handwritten characters) using convolutional and pooling layers, which reflect the translation-invariant nature of most images (Simard et al., 2003). More recently, the CNN technique has been used to detect and recognize landslides using remote sensing images (Ding et al., 2016; Yu et al., 2017). However, none of these studies assessed the effectiveness of CNN to map landslide susceptible zones.

The objective of this study is to present a CNN framework for regional landslide susceptibility analysis in Yanshan County, China. The two primary contributions of this work can be summarized as follows. First, to the best of our knowledge, this report describes the first time that the CNN framework is used for landslide susceptibility mapping (LSM). We attempt to explore the powerful prediction capability of CNNs and provide meaningful information for further research. Second, different data representation algorithms are developed to construct three novel CNN architectures to fit the landslide prediction process. To assess the effectiveness of CNNs, several commonly used measures of overall accuracy (OA), Matthews correlation coefficient (MCC), receiver operating characteristic (ROC) curve and area under curve (AUC) were used for comparison with conventional methods.

2. Study area

Yanshan County is located in the municipality of Shangrao, in northeastern Jiangxi Province, China, covering an area of approximately 2180 km² and extending between 117°25′E and 118°0′E longitude and 27°48′N and 28°25′N latitude. The population of the study area is 477,715 people, of whom 202,263 live in towns. Approximately 60.0% of the study area has a slope gradient that is <20°, whereas areas with a slope gradient >40° account for only 5% of the total area, with altitudes ranging from −27 m to 2144 m above sea level, as shown in Fig. 1.

Yanshan County has a subtropical climate. According to the Jiangxi Province Meteorological Bureau (http://www.weather.gov.cn), the average annual rainfall during 1959–2016 ranged between 1700 and 2100 mm, based on the historical records of 27 rain gauge stations. Average annual temperature and annual sunshine were between 17.2 and 19.6 °C and 1792 h, respectively.

3. Material and methodology

The methodological hierarchy in this work is based on the CNN framework and four CNN architectures are considered for LSM. Specifically, the gain ratio (GR) method is first used to select the most significant influencing factors. Then, three data representation forms are performed to convert the input data to a series of subimages (with different dimensions) to fit the CNN architecture. Next, these subimages are placed into the CNN architectures for landslide susceptibility modelling using the training set. Finally, the final prediction results obtained from the CNNs are evaluated using the validation set based on qualitative and quantitative analysis. The flowchart of the proposed CNN framework is illustrated in Fig. 2. Some related techniques are introduced in the following subsections.

3.1. Landslide inventory mapping

Landslide inventory mapping can be interpreted from visual inspection using high-resolution remote sensing images (Pham et al., 2017b; Pham et al., 2017c; Pham et al., 2017d; Tsangaratos et al., 2017). Google Earth™ has been widely used for landslide detection and LSM by using remote sensing images with different resolutions, including Worldview, IKONOS, QuickBird, SPOT, Pléiades, Gaofen and Landsat images (Broeckx et al., 2018; Chigira et al., 2010; Pulighe et al., 2016). Therefore, the landslide inventory map for Yanshan County, with 380 historical landslide locations, was compiled from records provided by the Jiangxi Department of Land and Resources (http://www.jxgtt.gov.cn) and the Jiangxi Meteorological Bureau (http://www.weather.gov.cn), from field surveys and from image interpretation using Google Earth™. These landslide locations include both rotational slides and translational slides. The smallest and largest landslides are approximately 172 m² and 15,722 m², respectively, and the average area of all landslides is 2531 m². In this study, all of the landslide locations previously mentioned were used to construct a landslide dataset that more accurately represents the geo-environmental settings of the landslide areas, of which 266 locations (70%) were randomly selected for training and the remaining 114 locations (30%) for validation. The landslide inventory map of the study area is shown in Fig. 1.

3.2. Landslide influencing factors

The environmental conditions of landslide occurrences are crucial to landslide susceptibility prediction. Therefore, historical landslide events were used to construct relevant LSM methods based on the assumption that landslides that occur in the future are subjected to the same environments as the previous landslides.

The selection of influencing factors for evaluating landslide hazards is a key step of LSM. There are hundreds of influencing factors that affect landslides (Reichenbach et al., 2018). It is necessary to select appropriate factors to produce a reliable landslide susceptibility map. For example, slope can reflect the steepness of the study area and steep slopes are highly prone to landslides (Varnes, 1984). Generally, gentle slopes appear to have lower landslide susceptibility than steep slopes (Dai et al., 2001). Aspect is related to landslide occurrences because slopes with different orientations experience different effects of precipitation and solar radiation (Pourghasemi et al., 2012; Varnes, 1984). Plan curvature and profile curvature can effectively reflect terrain complexity and topography (Oh and Pradhan, 2011). Different soil types have different

Fig. 1. Location of the study area.

surface characteristics that may cause landslides (Bui et al., 2017; Varnes, 1984). Land use is a key factor that contributes to landslide occurrences and has an important impact on the stability of slopes in terms of vegetative coverage (Pham et al., 2016c). Lithology is one of the most commonly used factors in LSM and some geological formations are more favourable to landslides. Rainfall is a key landslide-inducing factor because it can influence the shear strength of slopes (Pham et al., 2016b; Varnes, 1984). The factors of distance to faults, roads and rivers have an important impact on the spread and size of landslides in the study area (Pham et al., 2016a; Pham et al., 2015). According to previous works, environmental conditions and the available data of the study area, sixteen influencing factors were considered in this work for landslide analysis, including the morphological factors of altitude, aspect, slope, plan curvature, profile curvature, STI, SPI and TWI, the geological factors of lithology and distance to faults, the land cover factors of land use, NDVI, soil and distance to roads, and the hydrological factors of distance to rivers and rainfall. In addition, selection of a suitable terrain mapping unit is significant for LSM. Among all terrain mapping units, grid cells are the most popular for raster-based GIS users in modelling landslide susceptibility. Using ArcGIS 10.2 software, each factor was converted into spatially defined map layers with a grid size of 25 × 25 m, which is in accord with the digital elevation model (DEM) data generated from 1:50,000 scale topographic maps.

3.3. Influencing factor evaluators

Feature selection is very important in the field of data mining, especially in landslide spatial prediction. High dimensionality of the training and validation sets may make the prediction process very complex. Meanwhile, the curse of dimensionality always results in poor prediction accuracies. Feature selection with attribute selection methods is beneficial for obtaining high-quality data. Furthermore, this step can remove extraneous and redundant features from all the available attributes. In this work, two influencing factor evaluators, multicollinearity analysis and GR, are considered and introduced in the following subsections.

3.3.1. Multicollinearity analysis

To estimate the correlation between the landslides' influencing factors, multicollinearity analysis was applied in this study. Multicollinearity is a statistical phenomenon in which there exists a high correlation between two or more predictor variables in a multiple regression model (O'Brien, 2007). In this study, tolerance (TOL) and variance inflation factor (VIF) were used to detect multicollinearity among influencing factors. Let $X = \{X_1, X_2, \ldots, X_N\}$ define a given independent variable set and $R_j^2$ denote the coefficient of determination when the $j$th independent variable $X_j$ is regressed on all other predictor variables in the model. The VIF value is computed as follows:

\[ \mathrm{VIF} = \frac{1}{1 - R_j^2} \qquad (1) \]

The TOL value is the reciprocal of the VIF value and represents the degree of linear correlation between independent variables. If the VIF value is above 10 or the TOL value is <0.1, the corresponding factors

Fig. 2. Flowchart of the proposed CNN framework.

are multicollinear and should be removed from the landslide prediction models.

3.3.2. Gain ratio method

In this study, the feature selection method of gain ratio (GR) was used to select an optimal subset to improve the prediction performance in LSM (Dash and Liu, 1997). The information gain measure is used to select an attribute at each node of the decision tree. Then, the GR, which is an extension of information gain, is proposed to overcome the bias. For clarification, the GR method is introduced as follows.

Let $S$ be a training set and $n$ be the total of samples, with the expected information given by

\[ H(S) = -\sum_{i=1}^{n} p_i \log_2(p_i) \qquad (2) \]

where $p_i$ is the probability that a sample belongs to class $C_i$. The attribute $A$ has $m$ values and its average entropy is given by

\[ E(A) = -\sum_{i=1}^{n} p_i H(S) \qquad (3) \]

and the information gain on attribute $A$ is

\[ \mathrm{Gain}(A) = H(S) - E(A) \qquad (4) \]

The split information value represents the potential information obtained by splitting $S$ into $m$ parts corresponding to $m$ outcomes on attribute $A$ and can be computed as follows:

\[ \mathrm{SplitInfo}_A(S) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2\left(\frac{S_i}{S}\right) \qquad (5) \]

Finally, the GR is defined as follows:

\[ \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(S)} \qquad (6) \]

The average merit (AM) derived from this method uncovers the importance between conditioning factors and landslide occurrence. If this value is less than or equal to 0, the corresponding factor is considered to be an irrelevant attribute and should be excluded from prediction, whereas the remaining factors are used in the following prediction process.

3.4. CNN

A CNN model, exhibiting robust performance in visual image analysis, is a class of feed-forward NN whose artificial neurons respond to a portion of the surrounding elements (Girshick, 2015). This implies that a CNN is a variation of a multilayer perceptron consisting of one or more convolution, max pooling and fully connected layers (Hoo-Chang et al., 2016). The structure of a typical CNN is shown in Fig. 3. A basic CNN always has input, convolutional, max pooling, fully connected and output layers. The input layer is an m × n matrix in which every element has a feature value; thus, the input data can be represented as a two-dimensional feature map. Each convolutional layer consists of several convolutional units, and parameters of every unit are optimized by a back-propagation algorithm. The purpose of a convolutional manipulation is to extract different features of the input layer (Sharif Razavian et al., 2014). The first convolutional layer may only extract some low-level features such as lines, edges and corners. More convolutional layers can iteratively learn more intricate representations from low-level features. Pooling is a critical manipulation in the CNN technique (Szegedy et al., 2015). In fact, it is a form of down-sampling to reduce the dimensionality of feature maps, without altering the depth of these maps. Max pooling is the most common manipulation in different pooling approaches. The aim of this manipulation is to divide the feature

Fig. 3. Generalized CNN architecture.

maps into a number of rectangular areas and produce the maximum value for each area. Furthermore, this manipulation can continuously reduce the dimensionality of data, hence the number of parameters and amount of computational cost decrease. Consequently, the over-fitting problem can be avoided. The fully connected layer reorganizes extracted representations to reduce the loss of feature information, and the output layer produces classification results. It should be noted that the relative position of discovered features to other features is more important than the precise location, which is very distinctive to CNN.

CNNs are skilled in visual recognition, having different convolutional, max pooling and fully connected layers. For example, LeNet-5 is a CNN structure used in handwritten digit recognition and it can effectively solve some visually related problems (LeCun et al., 1995). However, LeNet-5 cannot be directly employed in landslide susceptibility analysis. Therefore, in this paper, we proposed new CNN architectures that are suitable to landslide susceptibility analysis and compared them with the traditional CNN structure of LeNet-5.

Generally, a commonly used LeNet-5 is comprised of eight layers (LeCun et al., 1998). Given n × n input data, C1 is a convolutional layer with six (n − 4) × (n − 4) feature maps and each unit in the feature maps is connected to a 5 × 5 neighbourhood in the input map. S2 is a max pooling layer for subsampling with six $\frac{(n-4)^2}{2^2}$ feature maps. The max pooling kernel size is 2 × 2 and each unit in the feature maps is connected to the previous layer. C3 is another convolutional layer with sixteen $\frac{(n-12)^2}{2^2}$ feature maps, in which each unit is connected to a 5 × 5 neighbourhood in the previous layer. The layer S4 performs the max pooling process and has sixteen $\frac{(n-12)^2}{4^2}$ feature maps. F5 and F6 are fully connected layers with 120 and 84 neural units, respectively. Finally, the output layer produces two neural units to indicate the binary classification results of “landslide” and “non-landslide”. Fig. 4 shows the architecture of LeNet-5 when n = 24.

3.5. The proposed CNN architectures

The aim of this work is to develop a novel CNN framework for regional landslide susceptibility analysis. However, landslide data have different expressions to fit the CNN architectures, which may influence landslide susceptibility results. In this section, three different data representation forms were used to construct the novel CNN architectures,

Fig. 4. LeNet-5 architecture.
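As a quick check of the LeNet-5 feature-map sizes quoted in Section 3.4, the sketch below propagates an n × n input through C1, S2, C3 and S4. It assumes unpadded 5 × 5 convolutions with stride 1 and non-overlapping 2 × 2 max pooling, which is consistent with the (n − 4) and (n − 12) expressions in the text; the function names are illustrative and do not come from the original work.

```python
def conv_out(size, kernel):
    """Side length after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, window):
    """Side length after non-overlapping pooling (floor division assumed)."""
    return size // window


def lenet5_sizes(n):
    """Feature-map side lengths of C1, S2, C3 and S4 for an n x n input."""
    c1 = conv_out(n, 5)    # (n - 4)
    s2 = pool_out(c1, 2)   # (n - 4) / 2
    c3 = conv_out(s2, 5)   # (n - 12) / 2
    s4 = pool_out(c3, 2)   # (n - 12) / 4
    return {"C1": c1, "S2": s2, "C3": c3, "S4": s4}


sizes = lenet5_sizes(24)
print(sizes)  # {'C1': 20, 'S2': 10, 'C3': 6, 'S4': 3}
```

For n = 24 every division is exact, so the floor-division assumption does not affect the result.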



including one-dimensional, two-dimensional and three-dimensional forms of data representation. Therefore, we construct different CNN structures to fit different data representations. In the following, we term the CNN coupling with the one-dimensional, two-dimensional and three-dimensional data representation algorithms as CNN-1D, CNN-2D and CNN-3D, respectively.

3.5.1. CNN-1D

For LSM, the input data can be regarded as a picture in which each pixel has several landslide influencing attributes. Therefore, each grid cell of the input data is represented by a column vector with a length defined by the number of landslide influencing factors. Moreover, each element in this vector corresponds to a landslide influencing attribute. In this work, we develop a 1D CNN structure to directly extract the information from landslide influencing factors and landslide susceptibility analysis. The 1D CNN architecture consists of one convolutional layer, one max pooling layer and one fully connected layer. Assuming that there are n landslide influencing factors in the input data, the convolutional layer filters the input data with N kernels with a size of m × 1, and thus this layer has N feature vectors with lengths of (n − m + 1). Each element in the feature vector is connected to an m × 1 neighbourhood in the input vector. The max pooling layer has a size of a × 1 and its result is composed of N vectors with a length of $\frac{n-m+1}{a}$. The fully connected layer with k neural units follows the previous layer to represent the extracted features. Finally, we produce two neural units in the output layer to resolve a binary classification problem. Fig. 5 illustrates the architecture of CNN-1D when n = 15, N = 20, m = 3, a = 2 and k = 50.

3.5.2. CNN-2D

The probability of landslide occurrence of each grid cell is related to all the influencing factors. This implies that each grid cell has a set of unique attribute values that can reflect the potential of a landslide occurring. As mentioned in Section 3.3, the CNN technique has been successfully applied in image processing. To apply this technique for LSM, initialization must be performed by converting a one-dimensional input grid cell (vector) comprised of different attribute features into a two-dimensional matrix. In this work, we compare the number of landslide influencing factors with that of attribute values of each factor, and choose the maximum of these two numbers to be the size of the corresponding two-dimensional matrix. For example, there are 24 lithological categories in the study area, a number that is larger than that of the landslide influencing factors (15). Therefore, we constructed a 24 × 24 matrix for each grid cell. Fig. 6 illustrates the conversion manipulation from a one-dimensional input grid cell (vector) to a two-dimensional matrix. Specifically, for each column vector in this matrix, the element at the position that corresponds to the attribute value is assigned to be 1, and the other elements in this vector are assigned to be 0.

There is one convolutional layer and one max pooling layer in the proposed 2D CNN structure. Each convolution layer is followed by a dropout layer. It is assumed that each landslide grid cell unit is transformed into an n × n matrix. The first convolutional layer filters the input data with N kernels with a size of m × m, and thus this layer has twenty (n − m + 1) × (n − m + 1) feature maps. Each grid cell in the feature maps is connected to an m × m neighbourhood in the input map. The dropout manipulation temporarily discards the NN units according to a certain probability during the training process of the CNN network. This addresses the over-fitting problem and improves classification accuracies. Following the convolutional step, a dropout manipulation is used. The max pooling layer has a size of a × a. Therefore, the results of this layer consist of N matrices with a size of $\frac{n-m+1}{a} \times \frac{n-m+1}{a}$, which are then transferred to the second convolutional layer with M kernels whose size is m × m, followed by a dropout manipulation. The results of this convolutional layer consist of M matrices with a size of $\frac{n-(a+1)(m-1)}{a} \times \frac{n-(a+1)(m-1)}{a}$, which are then transferred to the second max pooling layer, whose size is a × a. The resultant M feature maps with a size of $\frac{n-(a+1)(m-1)}{a^2} \times \frac{n-(a+1)(m-1)}{a^2}$ are produced by this max pooling layer. A fully connected layer with k neural units follows

Fig. 5. CNN-1D architecture.



Fig. 6. Two-dimensional data form. (a) Input data and (b) the conversion of a 1D grid cell to a 2D matrix.
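The conversion illustrated in Fig. 6 can be sketched as follows. This is a minimal illustration, assuming each influencing factor has already been reclassified into integer class indices starting at 0; the function name `to_2d_matrix` and the example class indices are hypothetical. Columns beyond the number of factors are left as zeros here, which is one plausible padding choice and not necessarily the one used by the authors.

```python
def to_2d_matrix(class_indices, size):
    """Encode one grid cell as a size x size matrix of 0s and 1s.

    Column j holds a single 1 in the row given by the class index of
    factor j; all other entries are 0. Unused columns stay all-zero.
    """
    if len(class_indices) > size:
        raise ValueError("more factors than matrix columns")
    matrix = [[0] * size for _ in range(size)]
    for col, cls in enumerate(class_indices):
        if not 0 <= cls < size:
            raise ValueError(f"class index {cls} out of range for factor {col}")
        matrix[cls][col] = 1
    return matrix


# Hypothetical grid cell with 15 factors; the values are made-up class indices.
cell = [3, 0, 7, 1, 23, 2, 5, 4, 0, 1, 2, 6, 3, 1, 0]
m = to_2d_matrix(cell, size=24)
```

Each of the first 15 columns of `m` then contains exactly one 1, so the matrix holds the same information as the original vector in an image-like layout.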

the previous layer to reorganize the extracted features. Finally, the output layer produces two neural units to indicate binary classification results of “landslide” and “non-landslide”. Fig. 6 illustrates the two-dimensional data form and Fig. 7 shows the architecture of CNN-2D when n = 24, m = 3, N = 20, M = 15, a = 2 and k = 78.

3.5.3. CNN-3D

Regarding the three-dimensional data representation, the input data of the study area can be represented as a three-dimensional matrix of size c × n × n, where n denotes the row and column of each data layer and c represents the number of the landslide influencing factors. For example, given a neighbourhood of each grid cell with a size of 7 × 7, the three-dimensional data representation is illustrated in Fig. 8. Under these circumstances, we proposed a 3D CNN architecture to extract the influencing factors information and spatial relation to predict the probability of landslide occurrence. Specifically, the three-dimensional CNN network was composed of one convolutional layer of N kernels with a size of m × m × m, one max pooling layer and one fully connected layer. Provided the c × n × n input data, the convolutional layer has N feature maps with a size of (c − m + 1) × (n − m + 1) × (n − m + 1). Each grid cell is connected to an m × m × m neighbourhood in the input data. The next hidden layer is a max pooling

Fig. 7. 2D CNN architecture.

layer with a size of a × a. Thus, the output results from the max pooling layer have N feature maps with a size of $\frac{c-m+1}{a} \times \frac{n-m+1}{a} \times \frac{n-m+1}{a}$. A fully connected layer with k neural units follows the max pooling layer to learn the extracted features. Finally, we position two neural units in the output layer to represent “landslide” and “non-landslide”. Fig. 9 illustrates the architecture of CNN-3D when c = 15, n = 7, N = 20, m = 3, a = 2 and k = 78.

3.5.4. The related parameters

Parameter settings have a significant impact on the performance of classification/prediction methods. In this subsection, some related parameters used in the proposed CNNs are briefly introduced. On the one hand, the influence of activation functions and loss functions in the CNN architecture is very important. In the ANN techniques, the output of each layer is always a linear function of its previous layer. However, it is very difficult to represent the actual situation using this linear relationship (Huang and Babri, 1998). To solve this problem, the activation function technique was used to fit the output data because it can effectively convert linear relationships to nonlinear relationships through predefined activation (nonlinear) functions (Dahl et al., 2012). A rectified linear unit (ReLU) function was used in CNN networks (Dahl et al., 2013). The ReLU function is one of the most common and efficient activation functions in CNN, and it has two main advantages: First, this function can overcome the problem of gradient disappearance. Second, it is more efficient and effective for training prediction methods than other activation functions (Maas et al., 2013). The categorical cross-entropy function and advanced adaptive gradient (AdaGrad) algorithm were employed as a loss function and its optimizer, respectively (Anthimopoulos et al., 2016). The AdaGrad method can provide a constraint on the learning rate and use different learning rates for each learning parameter per iteration (Duchi et al., 2011). The softmax function was used to produce a posteriori probability for each grid cell (Lawrence et al., 1997). On the other hand, dropout manipulation plays an important role in improving prediction performance because it can temporarily discard the neural network units according to a particular probability during the training process (Hinton et al., 2012). Specifically, dropout manipulation forces a neural unit to work with other randomly selected neural units to reduce over-fitting and the coadaptation between hidden units (Srivastava et al., 2014). Furthermore, this manipulation can enhance the generalization of prediction methods (Dahl et al., 2013).
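To make the roles of the components named in Section 3.5.4 concrete, the following is a minimal pure-Python sketch of ReLU, softmax, categorical cross-entropy and a single AdaGrad update. The learning rate (0.01), the epsilon term and all numeric inputs are assumptions chosen for illustration; the paper does not report these values here, and the authors' actual implementation may differ.

```python
import math


def relu(x):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, one_hot):
    """Loss for one sample given predicted probabilities and a one-hot target."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t > 0)


def adagrad_step(params, grads, accum, lr=0.01, eps=1e-8):
    """One AdaGrad update; accum is the running sum of squared gradients."""
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g
        new_accum.append(a)
        # Each parameter gets its own effective learning rate lr / sqrt(a).
        new_params.append(p - lr * g / (math.sqrt(a) + eps))
    return new_params, new_accum


hidden = relu([-1.0, 0.5, 2.0])
probs = softmax([2.0, 0.0])  # [P(landslide), P(non-landslide)]
loss = categorical_cross_entropy(probs, [1, 0])
params, accum = adagrad_step([0.5, -0.3], [0.2, -0.1], [0.0, 0.0])
```

Because AdaGrad divides by the accumulated gradient magnitude, parameters with large past gradients take smaller steps, which is the per-parameter learning-rate behaviour described above.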

Fig. 8. Three-dimensional data form.



Fig. 9. 3D CNN architecture.

3.6. Model evaluation methods In ML, the Matthews correlation coefficient (MCC) has been used as
To evaluate the performance of the proposed framework, measurements of the OA and ROC curve were used (Chen et al., 2017c; Pham et al., 2017b; Tsangaratos and Ilia, 2016). The OA value is the ratio of the number of correctly classified grid cells to the total number of grid cells, calculated as follows:

$$\mathrm{OA} = \frac{a}{b} \times 100\% \tag{7}$$

where a denotes the number of correctly classified landslide or non-landslide grid cells and b denotes the total number of grid cells in the validation set. A higher OA value implies better classification precision. The ROC curve is a standard technique for the performance evaluation of landslide prediction methods (Bradley, 1997). It is produced by plotting the true positive (TP) rate against the false positive (FP) rate at various threshold values. The TP rate and the FP rate are also referred to as “sensitivity” and “100-specificity” in statistics, respectively. Moreover, the AUC measure has been widely used to quantitatively evaluate the performance of LSM approaches (Mandal and Mandal, 2018; Pham et al., 2017a; Wang et al., 2017). Specifically, a prediction approach is considered good if the AUC value is close to 1 (Tsangaratos et al., 2017; Zhu et al., 2018).

The Matthews correlation coefficient (MCC) is a measure of binary classifications that can be used even if the two classes are of very different sizes (Matthews, 1975). The MCC is defined as follows:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{8}$$

where TP and TN (true negative) represent the number of landslide and non-landslide samples that are correctly classified, whereas FP and FN (false negative) denote the number of non-landslide and landslide samples that are misclassified. Moreover, this measure is actually a correlation coefficient between the observed and predicted classes. Generally, a final result is regarded as a perfect prediction if the MCC value equals 1, whereas MCC values of 0 and −1 represent a random prediction and a total disagreement between prediction and observation, respectively.

Moreover, a chi-square test was used to evaluate whether the prediction methods differ significantly (Kuncheva, 2004). It is based on the null hypothesis that the LSM methods have no significant difference (Tallarida and Murray, 1987). Chi-square and p values were calculated for validation. In general, if the p value is smaller than 0.05 and the chi-square value is higher than 3.841, there is a significant difference between the two LSM methods (Pham et al., 2017a).
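As an illustration of Eqs. (7) and (8), the following minimal sketch computes the OA and the MCC from the four confusion-matrix counts. The function names and the sample counts are hypothetical and are not taken from the study; returning 0 when the MCC denominator vanishes is a common convention assumed here, not something stated in the text.

```python
import math


def overall_accuracy(tp, tn, fp, fn):
    """OA (Eq. 7): correctly classified cells over all cells, in percent."""
    total = tp + tn + fp + fn
    return 100.0 * (tp + tn) / total


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC (Eq. 8). Returns 0.0 if any marginal sum is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical validation counts, for illustration only.
tp, tn, fp, fn = 90, 85, 29, 24
print(f"OA  = {overall_accuracy(tp, tn, fp, fn):.2f}%")
print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Because the MCC uses all four counts, it stays informative when landslide and non-landslide samples are unbalanced, whereas the OA alone can be dominated by the larger class.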

Table 1
Multicollinearity analysis of landslide influencing factors.

Landslide influencing factor    TOL      VIF
Altitude                        0.249    4.020
Aspect                          0.935    1.069
Distance to faults              0.865    1.156
Land use                        0.695    1.438
Lithology                       0.776    1.289
NDVI                            0.700    1.428
Plan curvature                  0.569    1.756
Profile curvature               0.726    1.378
Rainfall                        0.590    1.695
Distance to rivers              0.828    1.207
Distance to roads               0.852    1.174
Slope                           0.310    3.221
Soil                            0.351    2.846
SPI                             0.102    9.802
STI                             0.096    10.466
TWI                             0.443    2.258
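The screening applied to Table 1 can be sketched as follows. The TOL and VIF values are copied from the table and the threshold of 10 is the one quoted in Section 4.1; the helper names are hypothetical. The sketch also checks the standard identity TOL = 1/VIF, which the tabulated values satisfy up to rounding.

```python
# (TOL, VIF) pairs copied from Table 1.
TABLE_1 = {
    "Altitude": (0.249, 4.020),
    "Aspect": (0.935, 1.069),
    "Distance to faults": (0.865, 1.156),
    "Land use": (0.695, 1.438),
    "Lithology": (0.776, 1.289),
    "NDVI": (0.700, 1.428),
    "Plan curvature": (0.569, 1.756),
    "Profile curvature": (0.726, 1.378),
    "Rainfall": (0.590, 1.695),
    "Distance to rivers": (0.828, 1.207),
    "Distance to roads": (0.852, 1.174),
    "Slope": (0.310, 3.221),
    "Soil": (0.351, 2.846),
    "SPI": (0.102, 9.802),
    "STI": (0.096, 10.466),
    "TWI": (0.443, 2.258),
}

VIF_THRESHOLD = 10.0  # threshold value quoted in Section 4.1


def flag_collinear(table, threshold=VIF_THRESHOLD):
    """Return the factors whose VIF exceeds the threshold."""
    return [name for name, (_tol, vif) in table.items() if vif > threshold]


def max_reciprocal_gap(table):
    """Largest |1/VIF - TOL| over all factors (a rounding check)."""
    return max(abs(1.0 / vif - tol) for tol, vif in table.values())


print(flag_collinear(TABLE_1))
print(max_reciprocal_gap(TABLE_1))
```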
Fig. 10. AM of each landslide influencing factor using the GR method.

Table 2
Spatial relationship between each landslide conditioning factor and landslide using the FR model.

Factor Class No. of landslides Percentage of landslides No. of pixels in domain Percentage of domain FR

Altitude (m) 200–400 108 28.42 736,559 21.27 1.34
400–600 79 20.79 497,448 14.36 1.45
600–1000 108 28.42 593,363 17.13 1.66
1000–1400 36 9.47 232,973 6.73 1.41
>1400 22 5.79 125,653 3.63 1.60
Aspect Flat 0 0.00 21,778 0.63 0.00
North 9 2.37 435,037 12.56 0.19
Northeast 14 3.68 438,426 12.66 0.29
East 68 17.89 459,958 13.28 1.35
Southeast 102 26.84 392,302 11.33 2.37
South 75 19.74 375,231 10.83 1.82
Southwest 62 16.32 389,014 11.23 1.45
West 34 8.95 489,863 14.14 0.63
Northwest 16 4.21 461,844 13.33 0.32
Distance to faults (m) 0–2000 138 36.32 1,107,234 31.97 1.14
2000–5000 128 33.68 1,123,993 32.45 1.04
5000–8000 79 20.79 630,206 18.20 1.14
8000–11,000 26 6.84 323,193 9.33 0.73
>11,000 9 2.37 278,827 8.05 0.29
Landuse Grass 65 17.11 974,430 28.13 0.61
Water 0 0.00 14,468 0.42 0.00
Forest 304 80.00 1,819,269 52.53 1.52
Farmland 3 0.79 261,041 7.54 0.10
Residential 5 1.32 362,690 10.47 0.13
Bare 3 0.79 31,555 0.91 0.87
Lithology A 0 0.00 20,399 0.59 0.00
B 0 0.00 428,033 12.36 0.00
C 0 0.00 10,201 0.29 0.00
D 59 15.53 435,322 12.57 1.24
E 57 15.00 277,368 8.01 1.87
F 0 0.00 23,628 0.68 0.00
G 4 1.05 138,451 4.00 0.26
H 0 0.00 76,832 2.22 0.00
I 30 7.89 165,735 4.79 1.65
J 3 0.79 122,923 3.55 0.22
K 7 1.84 155,381 4.49 0.41
L 11 2.89 83,728 2.42 1.20
M 110 28.95 841,897 24.31 1.19
N 1 0.26 11,530 0.33 0.79
O 40 10.53 263,127 7.60 1.39
P 2 0.53 24,377 0.70 0.75
Q 0 0.00 5872 0.17 0.00
R 1 0.26 57,458 1.66 0.16
S 15 3.95 176,934 5.11 0.77
T 0 0.00 29,718 0.86 0.00
U 5 1.32 37,506 1.08 1.22
V 7 1.84 21,033 0.61 3.03
W 28 7.37 55,893 1.61 4.57
X 0 0.00 107 0.00 0.00
NDVI <0.1 27 7.11 982,767 28.38 0.25
0.1–0.2 50 13.16 895,248 25.85 0.51
0.2–0.3 174 45.79 1,119,767 32.33 1.42
0.3–0.4 116 30.53 425,473 12.28 2.48
>0.4 13 3.42 40,198 1.16 2.95
Plan curvature <−0.4 143 37.63 1,081,806 31.23 1.20
−0.4 to 0.5 133 35.00 1,399,644 40.41 0.87
>0.5 104 27.37 982,003 28.35 0.97
Profile curvature <−1 67 17.63 667,079 19.26 0.92
−1 to 0.2 133 35.00 1,346,078 38.87 0.90
>0.2 180 47.37 1,450,296 41.87 1.13
Rainfall <1000 0 0.00 62,560 1.81 0.00
1000–1200 88 23.16 828,235 23.91 0.97
1200–1300 91 23.95 1,251,534 36.14 0.66
1300–1400 143 37.63 955,719 27.59 1.36
>1400 58 15.26 365,405 10.55 1.45
Distance to rivers (m) <200 58 15.26 1,064,718 30.74 0.50
200–400 85 22.37 903,585 26.09 0.86
400–700 152 40.00 997,555 28.80 1.39
>700 85 22.37 497,595 14.37 1.56
Distance to roads (m) <600 127 33.42 1,274,023 36.78 0.91
600–1300 125 32.89 1,108,941 32.02 1.03
1300–2300 92 24.21 808,140 23.33 1.04
>2300 36 9.47 272,349 7.86 1.20
Slope (°) 0–10 44 11.58 1,041,810 30.08 0.38

10–20 129 33.95 1,056,375 30.50 1.11


20–30 109 28.68 754,343 21.78 1.32
30–40 62 16.32 434,489 12.54 1.30
40–50 30 7.89 151,131 4.36 1.81
50–60 4 1.05 23,871 0.69 1.53
>60 2 0.53 1434 0.04 12.71
Soil ATc 15 3.95 522,583 15.09 0.26
ACh 147 38.68 1,680,645 48.53 0.80
ACu 84 22.11 646,566 18.67 1.18
LVh 51 13.42 216,258 6.24 2.15
ALh 82 21.58 378,741 10.94 1.97
CMu 1 0.26 18,660 0.54 0.49
SPI <10 33 8.68 708,150 20.45 0.42
10–20 58 15.26 567,134 16.37 0.93
20–35 52 13.68 484,677 13.99 0.98
35–50 25 6.58 274,544 7.93 0.83
>50 212 55.79 1,428,948 41.26 1.35
TWI <5 82 21.58 562,670 16.25 1.33
5–7 200 52.63 1,927,531 55.65 0.95
7–9 56 14.74 705,905 20.38 0.72
9–11 31 8.16 220,069 6.35 1.28
>11 11 2.89 47,278 1.37 2.12
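The FR column of Table 2 can be reproduced as the percentage of landslides in a class divided by the percentage of the domain covered by that class. The sketch below does this for the aspect factor, using the landslide and pixel counts listed in the table; the function and variable names are hypothetical, and the totals are obtained by summing the aspect rows rather than being quoted from the text.

```python
# Aspect classes from Table 2: class -> (number of landslides, number of pixels).
ASPECT = {
    "Flat": (0, 21778),
    "North": (9, 435037),
    "Northeast": (14, 438426),
    "East": (68, 459958),
    "Southeast": (102, 392302),
    "South": (75, 375231),
    "Southwest": (62, 389014),
    "West": (34, 489863),
    "Northwest": (16, 461844),
}


def frequency_ratio(classes):
    """FR per class: share of landslides divided by share of the domain."""
    total_landslides = sum(n for n, _ in classes.values())
    total_pixels = sum(p for _, p in classes.values())
    return {
        name: (n / total_landslides) / (p / total_pixels)
        for name, (n, p) in classes.items()
    }


for name, fr in frequency_ratio(ASPECT).items():
    print(f"{name:<10} FR = {fr:.2f}")
```

An FR above 1 means that the class holds a larger share of landslides than of the study area, which is how the southeast class stands out for aspect.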

4. Results

4.1. Relative importance analysis of influencing factors

The predictive capability of all the landslide influencing factors was evaluated using the training set based on multicollinearity analysis and the GR method. Table 1 lists the results of the multicollinearity analysis of landslide influencing factors. The STI factor has a VIF value of 10.466, which is larger than the threshold value (10). Therefore, it should be removed from the prediction processes.

For the GR method, factors with higher weights are more significant to the prediction methods, whereas factors with weights of zero cannot contribute to landslide susceptibility modelling and should be excluded from further analysis. Fig. 10 shows the AM of each influencing factor. Among these factors, the landuse factor has the highest AM value of 0.0617, which indicates that it is more important than the other factors. The AM values of NDVI, altitude, lithology, slope, soil, distance to rivers, aspect, SPI and rainfall are between 0.0474 and 0.0151. Furthermore, the AM values of distance to faults, distance to roads, plan curvature, TWI and profile curvature are positive but <0.01, which indicates that they contribute little to the models. According to the previous analysis, all the remaining landslide influencing factors with AM values greater than zero contribute to the LSM.

4.2. Influencing factors analyses using FR model

The relationship between landslide occurrence and related influencing factors using an FR model is summarized in Table 2. If the FR value is >1, the corresponding area is more prone to landslide occurrence (Oh et al., 2011). In Table 2, for altitude, the <200 m class has the lowest FR value of 0.19 and the FR values of other classes are all above 1. Results regarding aspect demonstrated that the highest FR value of 2.37 belongs to the southeast class, in which >26% of the landslides occurred. In the case of distance to faults, the highest FR value of 1.14 belongs to the 0–2000 m and 5000–8000 m classes. However, for the factors of distance to rivers and distance to roads, the FR value increases as the distance increases. Regarding the results of the land-use factor, the forest class has the highest FR value of 1.52, accounting for 80% of the landslide occurrences, which indicates that this class has significant importance to landslide occurrence. Lithology results revealed that the W class has the highest FR value of 4.57, indicating the highest probability of landslide occurrence. For the NDVI factor, the FR value increases as the factor class value increases. Rainfall is a critical factor in landslide incidence that influences soil structure; thus, the highest rainfall class of >1400 mm has the highest FR value of 1.45. With respect to the plan curvature and profile curvature factors, the <−0.4 and >0.2 classes have the highest FR values of 1.2 and 1.13, accounting for over 37% and 47% of landslide occurrences in the study area, respectively. The >60° class of the slope factor, with the highest FR value of 12.71, has a much greater probability of landslide occurrence than the other classes. The highest FR value of 2.15 corresponds to the LVh soil type. For the SPI factor, >50% of landslides occurred in the >50 class, which had the highest FR value of 1.35. The relationship between landslide occurrence and TWI showed that the >11 and 7–9 classes have the highest and lowest FR values of 2.12 and 0.72, respectively. Fig. 11 presents reclassification maps of all the landslide influencing factors for better visual inspection.

Fig. 11. Thematic maps of the study area. (a) Altitude, (b) aspect, (c) distance to faults, (d) land use, (e) lithology, (f) NDVI, (g) plan curvature, (h) profile curvature, (i) rainfall, (j) distance to rivers, (k) distance to roads, (l) slope, (m) soil, (n) SPI and (o) TWI.

4.3. Model validation and comparison

In this subsection, we used the three proposed CNN and LeNet-5 methods introduced in Section 3 for experiments in landslide susceptibility analysis. All source codes of the methods previously mentioned were implemented in Python under the well-known TensorFlow framework (http://www.tensorflow.org). TensorFlow is an open source software library using data flow graphs and has been widely used in ML and DL. The experimental results were produced using a PC equipped with an Intel Core i5-8400 processor and Nvidia GeForce GTX 660 graphics card. To construct the CNNs, all the parameter settings were optimized through a training process using the trial-and-error method, as shown in Table 3.

Table 3
Parameter settings of the CNNs.

Method     Parameter settings
CNN-1D     Convolutional kernel size: 3 × 1; max pooling kernel size: 2 × 1; number of iterations: 300; activation function: ReLU; optimizer: AdaGrad
CNN-2D     Convolutional kernel size: 3 × 3; max pooling kernel size: 2 × 2; number of iterations: 100; activation function: ReLU; optimizer: AdaGrad; dropout rate: 0.4 and 0.3
CNN-3D     Convolutional kernel size: 3 × 3 × 3; max pooling kernel size: 2 × 2 × 2; number of iterations: 300; activation function: ReLU; optimizer: AdaGrad
LeNet-5    Convolutional kernel size: 5 × 5; max pooling kernel size: 2 × 2; number of iterations: 15; activation function: ReLU; optimizer: AdaGrad

As the landslide prediction methods were constructed using the training set, each grid cell in the study area was assigned a susceptibility index. After assigning weights to the factor classes, the landslide susceptibility map was prepared in an ArcGIS environment. For better visualization, the indices were reclassified using the natural breaks method into five levels of very low, low, moderate, high and very high. Fig. 12 illustrates landslide susceptibility maps obtained by different CNNs and Fig. 13 shows the distribution of each class per landslide susceptibility map. It can be observed that the northernmost part of all susceptibility maps is categorized as very low and low susceptible zones. The results of CNN-1D and CNN-2D were very similar. However, the very high class in Fig. 12 (a) was evenly distributed in the study area, whereas this class in Fig. 12 (b) was mainly concentrated in the northwest and southernmost parts of the study area. For the result of CNN-3D in Fig. 12 (c), more than half of the study area was marked as very low, whereas the percentage was <10% for the low, moderate and high classes. The result of LeNet-5 in Fig. 12 (d) demonstrated the largest very high susceptible zones, approximately 30% of the study area. It should be noted that the sum of the proportions of the very high and high classes is similar for all the proposed CNNs. Specifically, the very high and high classes of the CNN-1D, CNN-2D and CNN-3D are 25.42%, 29.52% and 25.36%, respectively, which demonstrates that the proposed methods are more practical for landslide prevention and management than the LeNet-5 method with a proportion of 44.76%.

Table 4 lists the OA and MCC values of the three proposed CNNs and the LeNet-5 method. The proposed CNNs achieve higher OA values than LeNet-5. Specifically, the CNN-2D method achieved the highest OA value of 77.63%, which is 4.38 percentage points higher than that of LeNet-5 (73.25%), followed by CNN-3D and CNN-1D with OA values of 75.88% and 74.12%, respectively. In addition, CNN-2D obtained the highest MCC value of 0.555, followed by CNN-3D, LeNet-5 and CNN-1D with MCC values of 0.518, 0.510 and 0.483, respectively.

The ROC curves of all the methods using the validation set are shown in Fig. 14. The CNN-2D method demonstrated better predictive power than the other models in terms of AUC. Specifically, the CNN-3D and LeNet-5 methods achieved very similar AUC values of 0.806 and 0.807, respectively, and the CNN-1D method obtained the lowest AUC value of 0.799. Furthermore, a chi-square test was used to evaluate whether the differences between the prediction methods were significant. If the chi-square value was larger than 3.841 and the significance level (p) was lower than 0.05, then the difference between the prediction methods was considered significant.
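The two thresholds quoted above are linked: 3.841 is the critical value of the chi-square distribution with one degree of freedom at the 0.05 level. The sketch below makes this explicit using only the Python standard library. It assumes a test with one degree of freedom, which is consistent with the 3.841 threshold but is not stated explicitly in the text; the function names are hypothetical.

```python
import math


def chi2_p_value_df1(chi2_value):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For 1 degree of freedom, X = Z**2 with Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2_value / 2.0))


def is_significant(chi2_value, alpha=0.05, critical=3.841):
    """Apply both threshold conditions described in the text."""
    return chi2_value > critical and chi2_p_value_df1(chi2_value) < alpha


print(chi2_p_value_df1(3.841))  # close to 0.05
print(is_significant(279.002))  # chi-square value of CNN-1D vs CNN-2D in Table 5
```

Under this assumption the two conditions are nearly equivalent, so checking either one leads to the same decision except very close to the threshold.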

Fig. 12. Landslide susceptibility maps for different CNN prediction methods. (a) CNN-1D; (b) CNN-2D; (c) CNN-3D; (d) LeNet-5.

Table 5 lists the chi-square values and significance levels of different CNNs. All the CNNs differ significantly from one another because their chi-square and significance level values clearly satisfy the threshold conditions mentioned above.

To further validate the effectiveness of the proposed CNNs, CNN-2D, which demonstrated the best performance in the previous experiments, was selected to be compared with several of the most popular ML and DL methods. Deep neural networks (DNNs) are typically feed-forward networks in which data flows from the input layer to the output layer without looping back (Cireşan et al., 2012a). Initially, a DNN creates a virtual neural unit map and connects these neural units by weighting them. Then, the input data are multiplied by the weights, producing a probability between 0 and 1. The selected DNN is a five-layer network architecture including four hidden fully connected layers. The four hidden layers have 50, 30, 20 and 10 neural units. The output layer obtains prediction results with two neural units, representing landslide and non-landslide units, respectively. As a classical and robust model, the SVM classifier with a radial basis function (RBF) kernel was used for comparison. The optimal C ($2^{7}$) and γ ($2^{-9}$) for SVM were obtained using five-fold cross validation, with C ranging from $2^{-5}$ to $2^{15}$ and γ ranging from $2^{-15}$ to $2^{5}$. Fig. 15 presents the landslide susceptibility maps of the DNN and SVM methods.

Table 6 lists the OA and MCC values of the three DL and SVM methods. CNN-2D achieved the highest OA value of 77.63%, which is approximately 7% higher than that of the optimized SVM (70.18%), followed by LeNet-5 and DNN with OA values of 73.25% and 71.05%, respectively. In terms of MCC, CNN-2D achieved a much higher value than the optimized SVM. Specifically, CNN-2D obtained the highest MCC value of 0.555, followed by LeNet-5, DNN and SVM with MCC values of 0.510, 0.421 and 0.404, respectively.

Fig. 16 shows the ROC curves of all the methods using the validation set. CNN-2D demonstrated better predictive power than the optimized SVM in terms of AUC. Specifically, the two CNNs of CNN-2D and LeNet-5 achieved AUC values of 0.813 and 0.807, respectively, and DNN obtained an AUC value of approximately 0.8.
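The five-layer DNN described above (four hidden layers with 50, 30, 20 and 10 units and a two-unit output layer) can be sketched as a plain forward pass. This is a minimal illustration in pure Python, not the authors' TensorFlow implementation. The input size of 15, the ReLU hidden activations, the softmax output and the random weight initialisation are all assumptions made for the example.

```python
import math
import random

# Input size 15 is an assumption (sixteen factors minus the removed STI factor).
LAYER_SIZES = [15, 50, 30, 20, 10, 2]


def init_layers(sizes, seed=0):
    """Create (weights, biases) for each fully connected layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one factor vector through the network and return class probabilities."""
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in z]  # hidden layers: ReLU (assumed)
        else:
            top = max(z)  # output layer: numerically stable softmax (assumed)
            exps = [math.exp(v - top) for v in z]
            total = sum(exps)
            x = [e / total for e in exps]
    return x


layers = init_layers(LAYER_SIZES)
probs = forward(layers, [0.5] * LAYER_SIZES[0])  # hypothetical normalised factor vector
print(probs)  # [P(landslide), P(non-landslide)] under untrained random weights
```

With untrained weights the two probabilities are close to 0.5; training would adjust the weights so that the first output approximates the landslide susceptibility index.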

Fig. 13. Percentages of different landslide susceptibility classes.

4.4. Parameter analysis

In this subsection, the impact of dropout manipulation on landslide spatial prediction is discussed first. Then, the influence of activation functions in the CNN architecture is addressed. It should be noted that, for simplicity, these parameters were analysed using CNN-2D.

In the first experiment, we constructed two networks for comparison and trained them using the same training set. The first network structure was constructed using CNN-2D with two dropout manipulations, which have dropout rates of 0.4 and 0.3, respectively. The second network structure was built using the CNN-2D method without any dropout manipulation. The plot of OA values obtained from the two networks is shown in Fig. 17. In this figure, the term “Epoch” denotes the number of times that the network is trained using the entire training data. The OA value obtained from CNN-2D is shown to be effectively improved by including a dropout manipulation after each convolutional process in the network.

In the second experiment, two activation functions, ReLU and tanh, were considered for comparison. The plot of OA values obtained using CNN-2D with the two activation functions is illustrated in Fig. 18. Although the OA results obtained using CNN-2D with ReLU and tanh were not stable and oscillated, the CNN with ReLU achieved higher OA values than that with tanh in most instances. In conclusion, the CNN with ReLU as the activation function can produce more reliable prediction results.

5. Discussion

Landslides are very complex processes that are controlled by many topographical and environmental factors. Moreover, LSM is of great significance for analysing landslide prone areas in a visual way. Therefore, the primary objective of this work is to apply CNN architectures for regional landslide susceptibility analyses and compare them with DNN and SVM classifiers in Yanshan County, China. Furthermore, different data representations converted from raw landslide data are presented and fitted in the three proposed CNN architectures of CNN-1D, CNN-2D and CNN-3D, respectively.

Before analysing landslide susceptibility, it is very important to assess the predictive capability of all of the influencing factors. To achieve this objective, multicollinearity analysis was used to estimate correlations between these factors and the GR method was employed to rank the importance of these factors. The multicollinearity analysis results revealed that the STI factor has strong multicollinearity and should be removed from the subsequent process. Regarding the results of the GR method, the landuse and NDVI factors had higher AM values than the other factors, indicating that the two factors were more important for landslide occurrence. On the one hand, the FR values of the grass and forest classes are much greater than the other three classes with regard to the land use factor. On the other hand, NDVI can accurately display surface vegetation coverage. It should be noted that landslides continue to occur in mountainous areas due to rainfall and external forces, even on slopes covered with significant vegetation.
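The two settings examined in Section 4.4, dropout and the choice of activation function, can be illustrated with a short pure-Python sketch. The dropout rates of 0.4 and 0.3 are those reported for CNN-2D. The use of inverted dropout (rescaling the surviving units during training), the helper names and the sample feature values are assumptions made for illustration and are not taken from the authors' code.

```python
import math
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def tanh(values):
    """Hyperbolic tangent applied element-wise."""
    return [math.tanh(v) for v in values]


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and rescale the rest.

    At inference time (training=False) the values pass through unchanged.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
features = [-1.0, 0.5, 2.0, -0.3, 1.2]  # hypothetical convolution outputs

after_first = dropout(relu(features), rate=0.4, rng=rng)      # first dropout (rate 0.4)
after_second = dropout(relu(after_first), rate=0.3, rng=rng)  # second dropout (rate 0.3)
print(after_first)
print(after_second)
print(tanh(features))  # alternative activation compared in the second experiment
```

ReLU leaves positive responses unbounded and sets negative ones to zero, whereas tanh squashes all responses into the interval (−1, 1); dropout randomly silences units during training only, which is why it acts as a regulariser rather than changing the trained model's inference path.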

Table 4
Performance of different methods.

Method     OA value    MCC
CNN-1D     74.12%      0.483
CNN-2D     77.63%      0.555
CNN-3D     75.88%      0.518
LeNet-5    73.25%      0.510
Fig. 14. ROC curves for all CNNs using the verification set.
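The AUC values that summarise ROC curves such as these can be computed without drawing the curve: the AUC equals the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample, with ties counted as one half. The sketch below implements this rank-based definition; the function name and the sample labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for landslide, 0 for non-landslide.
    scores: predicted susceptibility indices; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation samples, for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))
```

The double loop is quadratic in the number of samples, which is acceptable for a sketch; a sort-based rank computation gives the same value more efficiently on large validation sets.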

Table 5
Chi-square values and significance levels of different CNNs.

Comparative pairs     Chi-square value    p value    Significance level
CNN-1D vs CNN-2D      279.002             <0.0001    Yes
CNN-1D vs CNN-3D      276.312             <0.0001    Yes
CNN-1D vs LeNet-5     197.787             <0.0001    Yes
CNN-2D vs CNN-3D      256.968             <0.0001    Yes
CNN-2D vs LeNet-5     343.222             <0.0001    Yes
CNN-3D vs LeNet-5     340.595             <0.0001    Yes

Table 6
Performance of different prediction methods.

Method     OA value    MCC
CNN-2D     77.63%      0.555
LeNet-5    73.25%      0.510
DNN        71.05%      0.421
SVM        70.18%      0.404

For the selected landslide influencing factors, three CNN architectures were constructed to compare with the conventional LeNet-5 method. CNN can effectively extract spatial information using local connections and can significantly reduce the number of network parameters by sharing weights. The architecture of CNN-1D could exploit the local correlation and gradually learn more intricate representations from factor vectors. In the case of CNN-2D, because the CNN technique initially exhibited excellent performance in the visual image analysis field, we converted each one-dimensional factor vector to a two-dimensional matrix to sufficiently extract the hidden valuable features. Moreover, CNN-3D not only learns factor representations but also extracts local spatial information. The proposed CNNs obtained higher OA and MCC values than those of LeNet-5, and they also achieved better prediction performance than the most popular DNN and SVM classifiers in the subsequent experiments. Finally, CNN-2D achieved the highest AUC value of 0.813 using the validation set, which reveals that this two-dimensional structure can effectively improve prediction performance and may be a promising method for future studies.

Recently, many ML methods have been applied and compared for landslide spatial prediction in a given area, including decision tree (Chen et al., 2017c), logistic regression (Tsangaratos and Ilia, 2016), artificial neural network (Chen et al., 2017a) and SVM (Chen et al., 2018a). Additionally, different ensemble methods have been developed, including AdaBoost (Hong et al., 2018), bagging (Pham et al., 2017a) and rotation forest (Pham et al., 2018). The conventional ML methods have a limited ability to process data in their raw form (LeCun et al., 2015). However, the DL technique is a powerful improvement to ML and plays an increasingly important role in computer vision, image processing and natural language processing. The DL technique can automatically explore the representation needed for prediction from raw data. Therefore, it is promising to explore the possibility of applying powerful DL methods in landslide susceptibility assessments.

Fig. 16. ROC curves for four methods using the verification set.

Fig. 15. Landslide susceptibility maps of DNN (a) and SVM (b).

Furthermore, the three proposed data representation forms provide a new way to handle raw landslide data. The experimental results showed that CNN-2D is superior to the classical DL technique of DNN and the conventional ML technique of SVM, which indicated that the three proposed CNNs may be promising and robust techniques for LSM.

Fig. 17. OA values obtained using CNN-2D with and without dropout manipulations.

Fig. 18. OA values obtained using CNN-2D with ReLU and tanh activation functions.

6. Conclusions

This work investigates the application of a CNN framework for LSM in the case of Yanshan County, China. The framework is a useful methodology that can be applied to other areas in the world with similar characteristics. The proposed CNNs were validated in the study area based on the analysis of sixteen influencing factors that were derived from different ancillary data. The final landslide susceptibility maps of the study area were obtained using these CNNs in comparison to the conventional ML and DL methods of SVM, DNN and LeNet-5. The validation of the results was conducted on the basis of the objective measures of OA, MCC, ROC and AUC. The experimental results confirmed the following conclusions. First, the landslide susceptibility maps obtained using the proposed CNNs are more practical for landslide prevention and management than those from conventional methods. Second, the prediction results obtained using the proposed CNNs are better than or comparable to those of the optimized SVM in terms of OA, MCC and AUC. Among the proposed CNNs, the CNN-2D method achieved the highest OA, MCC and AUC values, and it can be used to produce reliable landslide susceptibility maps. Finally, the prediction accuracies of LSM using a CNN framework can be effectively improved through two strategies: (1) the inclusion of a dropout manipulation after a convolutional step when constructing the CNN architecture and (2) the selection of ReLU as the activation function. In summary, CNNs are very promising for landslide spatial prediction. In the future, our research will investigate more efficient DL architectures for LSM.

Acknowledgements

We express our gratitude to Ralf Ludwig, editor of the journal Science of the Total Environment, and the three anonymous reviewers for their valuable comments and suggestions that improved the quality of our paper. This work was supported by the National Natural Science Foundation of China (61271408).

References

Andrieu, C., De Freitas, N., Doucet, A., Jordan, M.I., 2003. An introduction to MCMC for machine learning. Mach. Learn. 50, 5–43.
Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., Mougiakakou, S., 2016. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans. Med. Imaging 35, 1207–1216.
Bengio, Y., Courville, A., Vincent, P., 2013. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828.
Bradley, A.P., 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 1145–1159.
Broeckx, J., Vanmaercke, M., Duchateau, R., Poesen, J., 2018. A data-based landslide susceptibility map of Africa. Earth Sci. Rev. 185, 102–121.
Bui, D.T., et al., 2016a. GIS-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with AdaBoost, Bagging, and MultiBoost ensemble frameworks. Environmental Earth Sciences 75, 1101.
Bui, D.T., Tuan, T.A., Klempe, H., Pradhan, B., Revhaug, I., 2016b. Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13, 361–378.
Bui, D.T., et al., 2017. Spatial prediction of rainfall-induced landslides for the Lao Cai area (Vietnam) using a hybrid intelligent approach of least squares support vector machines inference model and artificial bee colony optimization. Landslides 14, 447–458.
Chen, W., Pourghasemi, H.R., Zhao, Z., 2017a. A GIS-based comparative study of Dempster-Shafer, logistic regression and artificial neural network models for landslide susceptibility mapping. Geocarto international 32, 367–385.
Chen, W., et al., 2017b. GIS-based landslide susceptibility modelling: a comparative assessment of kernel logistic regression, Naïve-Bayes tree, and alternating decision tree models. Geomatics, Natural Hazards and Risk 8, 950–973.
Chen, W., et al., 2017c. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 151, 147–160.
Chen, W., Pourghasemi, H.R., Naghibi, S.A., 2018a. A comparative study of landslide susceptibility maps produced using support vector machine with different kernel functions and entropy data mining models in China. Bull. Eng. Geol. Environ. 77, 647–664.
Chen, W., et al., 2018b. GIS-based landslide susceptibility evaluation using a novel hybrid integration approach of bivariate statistical based random forest method. Catena 164, 135–149.
Chigira, M., Wu, X., Inokuchi, T., Wang, G., 2010. Landslides induced by the 2008 Wenchuan earthquake, Sichuan, China. Geomorphology 118, 225–238.
Cireşan, D., Meier, U., Masci, J., Schmidhuber, J., 2012a. Multi-column deep neural network for traffic sign classification. Neural Netw. 32, 333–338.
Cireşan, D., Meier, U., Schmidhuber, J., 2012b. Multi-column deep neural networks for image classification. 2012 IEEE Conference on Computer Vision and Pattern Recognition, 3642–3649.
Dahl, G.E., Yu, D., Deng, L., Acero, A., 2012. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 30–42.
Dahl, G.E., Sainath, T.N., Hinton, G.E., 2013. Improving deep neural networks for LVCSR using rectified linear units and dropout. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 8609–8613.
Dai, F., Lee, C., Li, J., Xu, Z., 2001. Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environ. Geol. 40, 381–391.
Dash, M., Liu, H., 1997. Feature selection for classification. Intelligent data analysis 1, 131–156.
Ding, A., Zhang, Q., Zhou, X., Dai, B., 2016. Automatic recognition of landslide based on CNN and texture change detection, Chinese Association of Automation (YAC). Youth Academic Annual Conference of. IEEE 444–448.
Duchi, J., Hazan, E., Singer, Y., 2011. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159.

Girshick, R., 2015. Fast R-CNN. Proceedings of the IEEE international conference on computer vision 1440–1448.
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R., 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
Hong, H., Pourghasemi, H.R., Pourtaghi, Z.S., 2016a. Landslide susceptibility assessment in Lianhua County (China): a comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology 259, 105–118.
Hong, H., et al., 2016b. Spatial prediction of landslide hazard at the Luxi area (China) using support vector machines. Environmental Earth Sciences 75, 40.
Hong, H., Pradhan, B., Sameen, M.I., Chen, W., Xu, C., 2017a. Spatial prediction of rotational landslide using geographically weighted regression, logistic regression, and support vector machine models in Xing Guo area (China). Geomatics, Natural Hazards and Risk 8, 1997–2022.
Hong, H., Tsangaratos, P., Ilia, I., Chen, W., Xu, C., 2017b. Comparing the Performance of a Logistic Regression and a Random Forest Model in Landslide Susceptibility Assessments. The Case of Wuyaun Area, China, Workshop on World Landslide Forum. Springer, pp. 1043–1050.
Hong, H., et al., 2018. Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China). Catena 163, 399–413.
Hoo-Chang, S., et al., 2016. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35, 1285.
Huang, G.-B., Babri, H.A., 1998. Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions. IEEE Trans. Neural Netw. 9, 224–229.
Kavzoglu, T., Sahin, E.K., Colkesen, I., 2014. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11, 425–439.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Proces. Syst. 1097–1105.
Kuncheva, L.I., 2004. Combining Pattern Classifiers: Methods and Algorithms. John Wiley
Pham, B.T., Bui, D.T., Prakash, I., Dholakia, M., 2016c. Rotation forest fuzzy rule-based classifier ensemble for spatial prediction of landslides using GIS. Nat. Hazards 83, 97–127.
Pham, B.T., et al., 2017a. A novel ensemble classifier of rotation forest and Naïve Bayes for landslide susceptibility assessment at the Luc Yen district, Yen Bai Province (Viet Nam) using GIS. Geomatics, Natural Hazards and Risk 8, 649–671.
Pham, B.T., Bui, D.T., Pourghasemi, H.R., Indra, P., Dholakia, M., 2017b. Landslide susceptibility assessment in the Uttarakhand area (India) using GIS: a comparison study of prediction capability of naïve bayes, multilayer perceptron neural networks, and functional trees methods. Theor. Appl. Climatol. 128, 255–273.
Pham, B.T., Bui, D.T., Prakash, I., Nguyen, L.H., Dholakia, M., 2017c. A comparative study of sequential minimal optimization-based support vector machines, vote feature intervals, and logistic regression in landslide susceptibility assessment using GIS. Environmental Earth Sciences 76, 371.
Pham, B.T., Tien Bui, D., Prakash, I., Dholakia, M.B., 2017d. Hybrid integration of multilayer perceptron neural networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena 149, Part 1, 52–63.
Pham, B.T., Shirzadi, A., Bui, D.T., Prakash, I., Dholakia, M., 2018. A hybrid machine learning ensemble approach based on a radial basis function neural network and rotation forest for landslide susceptibility modeling: a case study in the Himalayan area, India. International Journal of Sediment Research 33, 157–170.
Pourghasemi, H.R., Mohammady, M., Pradhan, B., 2012. Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. Catena 97, 71–84.
Pulighe, G., Baiocchi, V., Lupia, F., 2016. Horizontal accuracy assessment of very high resolution Google Earth images in the city of Rome, Italy. International Journal of Digital Earth 9, 342–362.
Reichenbach, P., Rossi, M., Malamud, B., Mihir, M., Guzzetti, F., 2018. A review of statistically-based landslide susceptibility models. Earth Sci. Rev. 180, 60–91.
Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Proces. Syst. 91–99.
Schmidhuber, J., 2015. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117.
Sharif Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S., 2014. CNN features off-the-shelf: an astounding baseline for recognition, Proceedings of the IEEE conference on com-
& Sons. puter vision and pattern recognition workshops, pp. 806–813.
Lawrence, S., Giles, C.L., Tsoi, A.C., Back, A.D., 1997. Face recognition: a convolutional Shirzadi, A., et al., 2017. Shallow landslide susceptibility assessment using a novel hybrid
neural-network approach. IEEE Trans. Neural Netw. 8, 98–113. intelligence approach. Environmental Earth Sciences 76, 60.
LeCun, Y., et al., 1995. Comparison of Learning Algorithms for Handwritten Digit Recogni- Simard, P.Y., Steinkraus, D., Platt, J.C., 2003. Best practices for convolutional neural net-
tion, International Conference on Artificial Neural Networks. Perth, Australia, works applied to visual document analysis. null. IEEE 958.
pp. 53–60. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to doc- simple way to prevent neural networks from overfitting. The Journal of Machine
ument recognition. Proc. IEEE 86, 2278–2324. Learning Research 15, 1929–1958.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. nature 521, 436. Szegedy, C., et al., 2015. Going deeper with convolutions. Proc. IEEE Conf. Comput. Vis.
Maas, A.L., Hannun, A.Y., Ng, A.Y., 2013. Rectifier nonlinearities improve neural network Pattern Recognit. 1–9.
acoustic models, Proc. icml, pp. 3. Tallarida, R.J., Murray, R.B., 1987. Chi-square Test, Manual of Pharmacologic Calculations.
Mandal, S., Mandal, K., 2018. Modeling and mapping landslide susceptibility zones using Springer, pp. 140–142.
GIS based multivariate binary logistic regression (LR) model in the Rorachu river Tsangaratos, P., Ilia, I., 2016. Comparison of a logistic regression and Naïve Bayes classifier
basin of eastern Sikkim Himalaya, India. Modeling Earth Systems and Environment in landslide susceptibility assessments: the influence of models complexity and train-
1–20. ing dataset size. Catena 145, 164–179.
Matthews, B.W., 1975. Comparison of the predicted and observed secondary structure of Tsangaratos, P., Ilia, I., Hong, H., Chen, W., Xu, C., 2017. Applying Information Theory and
T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure 405, GIS-based quantitative methods to produce landslide susceptibility maps in
442–451. Nancheng County, China. Landslides 14, 1091–1111.
O'brien, R.M., 2007. A caution regarding rules of thumb for variance inflation factors. Qual. Varnes, D.J., 1984. Landslide Hazard Zonation: A Review of Principles and Practice.
Quant. 41, 673–690. Wang, L.-J., Guo, M., Sawada, K., Lin, J., Zhang, J., 2016. A comparative study of landslide
Oh, H.-J., Pradhan, B., 2011. Application of a neuro-fuzzy model to landslide-susceptibility susceptibility maps using logistic regression, frequency ratio, decision tree, weights
mapping for shallow landslides in a tropical hilly area. Comput. Geosci. 37, of evidence and artificial neural network. Geosci. J. 20, 117–136.
1264–1276. Wang, Q., Wang, Y., Niu, R., Peng, L., 2017. Integration of information theory, K-means
Oh, H.-J., Kim, Y.-S., Choi, J.-K., Park, E., Lee, S., 2011. GIS mapping of regional probabilistic cluster analysis and the logistic regression model for landslide susceptibility mapping
groundwater potential in the area of Pohang City, Korea. J. Hydrol. 399, 158–172. in the Three Gorges Area, China. Remote Sens. 9, 938.
Pham, B.T., Tien Bui, D., Indra, P., Dholakia, M., 2015. Landslide susceptibility assessment Yingying Tian, C.X., Hong, Haoyuan, Zhou, Qing, Wang, Duo, 2019. Mapping earthquake-
at a part of Uttarakhand Himalaya, India using GIS–based statistical approach of fre- triggered landslide susceptibility by use of artificial neural network (ANN) models an
quency ratio method. Int J Eng Res Technol 4, 338–344. example of the 2013 Minxian (China) Mw 5.9 event. Geomatics, Natural Hazards and
Pham, B.T., Bui, D., Prakash, I., Dholakia, M., 2016a. Evaluation of predictive ability of sup- Risk. 10, 1–25.
port vector machines and naive Bayes trees methods for spatial prediction of land- Yu, H., Ma, Y., Wang, L., Zhai, Y., Wang, X., 2017. A landslide intelligent detection method
slides in Uttarakhand state (India) using GIS. J. Geom. 10, 71–79. based on CNN and RSG_R, Mechatronics and Automation (ICMA), 2017 IEEE Interna-
Pham, B.T., Bui, D.T., Dholakia, M., Prakash, I., Pham, H.V., 2016b. A comparative study of tional Conference on. IEEE 40–44.
least square support vector machines and multiclass alternating decision trees for Zhu, A.-X., et al., 2018. A comparative study of an expert knowledge-based model and two
spatial prediction of rainfall-induced landslides in a tropical cyclones area. Geotech. data-driven models for landslide susceptibility mapping. Catena 166, 317–327.
Geol. Eng. 34, 1807–1824.
