MutarandPradhan2019 IEEEACCESS FullCitation

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/333498072
A Spatial Ensemble Model for Rockfall Source Identiﬁcation from High

Resolution LiDAR Data and GIS
Article in IEEE Access · May 2019

DOI: 10.1109/ACCESS.2019.2919977
CITATIONS READS
22 186
2 authors:
Ali Fanos Biswajeet Pradhan

Universiti Putra Malaysia University of Technology Sydney
12 PUBLICATIONS 291 CITATIONS 1,066 PUBLICATIONS 61,248 CITATIONS
SEE PROFILE SEE PROFILE
All content following this page was uploaded by Biswajeet Pradhan on 10 November 2021.
The user has requested enhancement of the downloaded file.

Received May 15, 2019, accepted May 27, 2019, date of publication May 30, 2019, date of current version June 20, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2919977
A Spatial Ensemble Model for Rockfall Source

Identification From High Resolution
LiDAR Data and GIS
ALI MUTAR FANOS1 AND BISWAJEET PRADHAN 2,3 , (Senior Member, IEEE)
1 Department of Civil Engineering, Faculty of Engineering, Universiti Putra Malaysia (UPM), Serdang 43400, Malaysia
2 Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and Information Technology, University of Technology
Sydney, Ultimo, NSW 2007, Australia

3 Department of Energy and Mineral Resources Engineering, Choongmu-gwan, Sejong University, Seoul 05006, South Korea
Corresponding author: Biswajeet Pradhan (biswajet.pradhan@uts.edu.au)

This work was supported by the Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), UTS, under
Grant 323930, Grant 321740.2232335, and Grant 321740.2232357.
ABSTRACT Rockfall source identification is the most challenging task in rockfall hazard and risk assess-
ment. This difficulty rises in the areas where there is a presence of other types of the landslide, such as shallow
landslide and debris flow. The aim of this paper is to develop and test a hybrid model that can accurately
identify the source areas. High-resolution light detection and ranging data (LiDAR) was employed to derive
the digital terrain model (DTM), from which several conditioning factors were extracted. These conditioning
factors were optimized utilizing the ant colony optimization (ACO). Different machine learning algorithms,
namely, logistic regression (LR), random tree (RT), random forest (RF), support vector machine (SVM), and
artificial neural network (ANN), in addition to their ensemble models (stacking, bagging, and voting), were
examined. This is based on the selected best subset of conditioning factors and inventory dataset. Stacking
LR-RT (the best fit model) was then utilized to produce the probabilities of different landslide types. On the
other hand, the Gaussian mixture model (GMM) was optimized and applied for automatically identifying the
slope threshold of the occurrence of the different landslide types. In order to reduce the model sensitivity to
the alteration in various conditioning factors and to improve the model computations performance, land use
probability area was formed. The rockfall sources were identified by integrating the probability maps and
the reclassified slope raster based on the GMM results. The accuracy assessment reveals that the developed
hybrid model can identify the probable rockfall regions with an accuracy of 0.95 based on the validation
dataset and 0.94 on based the training dataset. The slope thresholds calculated by GMM were found to
be > 58◦ , 22◦ –58◦ , and 9◦ –22◦ for rockfall, shallow landslide, and debris flow, respectively. This indicates
that the model can be generalized and replicated in different regions, and the proposed method can be applied
in various landslides studies.
INDEX TERMS Remote sensing, machine learning, rockfall, hybrid model, GIS, LiDAR.
I. INTRODUCTION goods. Rockfall can be caused by rainfall, weathering, joint-

Rockfall is a natural phenomenon encompassing a downs- ing, man-made or the combination of them [2].
lope movement of a detached boulder or groups of boul- Rockfall source identification is of essential signifi-
ders including free falling/flying, bouncing, sliding, and cance for generating rockfall hazard maps [3]. Numerous
rolling [1]. Rockfall can result in serious damage to infras- approaches have been developed for the rockfall sources
tructures and structures, even with small rock magnitude, identification at local and regional scales [4], and to assess
consequently representing a pertinent risk for people and the resulting susceptibility [5], or hazard [6]. The availability
of high-resolution Digital Terrain Models (DTM) allows the
topography analysis at a regional scale with high levels of
The associate editor coordinating the review of this manuscript and
details. Nevertheless, the identification of probable rockfall
approving it for publication was Lefei Zhang. sources at a regional scale is one of the major challenges
2169-3536 2019 IEEE. Translations and content mining are permitted for academic research only.
74570 Personal use is also permitted, but republication/redistribution requires IEEE permission. VOLUME 7, 2019
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
A. M. Fanos, B. Pradhan: Spatial Ensemble Model for Rockfall Source Identification
in the assessment of rockfall hazard [7]. Rockfall source Despite the numerous attempts that have been conducted
regions, that correspond to the detachment regions of rocks, for rockfall source identification using photogrammetry or
are normally taken from evidences such as scree deposits at laser scanning data in local scale [21]–[23], and the few
foot-slope and talus slope [8], historical and field inventory studies in regional scale [4], There is a main problem which
of fallen rocks. Therefore, a DTM-based geomorphomet- is still not considered. This problem is when the study area
ric methods become more appropriate for identification of contains other types of landslides that have nearly comparable
probable rockfall sources [6]. The potential rockfall sources geomorphometric characteristics such as shallow landslide,
are identified using the slope angle distribution obtained rockfall, and debris flow. Nevertheless, Fanos [24] proposed
from high-resolution DTM crossed with other information a model for rockfall source identification using a single
obtained from topographic and geological maps as GIS lay- machine learning algorithm. However, their study did not
ers. The distribution of slope angle can be decomposed in take into account the optimization of the algorithm hyperpa-
many Gaussian distributions that can be considered as mor- rameters that can significantly influence the derived results.
phological units characteristics [9]. Moreover, they used a limited conditioning factors without
The steep slope angle is one of the major elements optimization or examining the multicollinearity among these
necessary for rockfall initiation [10]. The processes of factors. In addition, the GMM was run without optimization
gravity-driven surfaces in mountainous areas are exceedingly and based on the inventory dataset not on the geomorpholog-
associated with the topography steepness and the relief mor- ical units of the slope. Therefore, this paper aims to develop
phology, and thus reflects these instabilities [11]. The stabil- and test an ensemble model using high-resolution LiDAR
ity slope angles rely on the type of rock and the mechanical data to solve such problem. This model is based on the
and geometrical properties of the discontinuities set [12]. combination of machine learning algorithms and Gaussian
Consequently, it can be considered that the distribution of Mixture Model (GMM). The former is to produce the prob-
slope angle reflects the relief type, such as plains, alluvial, ability maps of different landslide types (rockfall, shallow
glacial, etc., and the rocks mechanical properties. Therefore, landslide, and debris flow), and the latter is to determine the
various rock types and morphologies result in a range of slope thresholds of different landslide types. Kinta Valley,
slope angle values that are characteristic for a particular Malaysia was selected to perform the proposed ensemble
morphotectonic setting and expressed in the distribution of model.
slope angle [13].
In the past decades landslide probability, hazard, and
risk have been widely analyzed and explored. Neverthe- II. STUDY AREA
less, the selection of optimized conditioning factors in such Rockfall incidents are frequently occurred in Malaysia due
application remains a difficult mission [4]. The mapping to the high and steep terrain in addition to high density of
of landslide probability employs topological, hydrological, rainfall. Kinta Valley which is situated in the eastern part
geological, and environmental factors. Some researchers of Malaysia (Fig. 1), is one of the most hardly hit dis-
presume that the precision and accuracy of the produced prob- tricts in Malaysia which contributes to the economic and
ability maps increase by increasing the number of condition- tourism sectors. Approximately, the Kinta district has an area
ing factors. In contrast, other researchers have proved that a of 1,950 km2 . Geographically, the area is located approxi-
small number of conditioning factors are adequate to generate mately between the northeast corner (101◦ 80 33, 4◦ 390 0400 )
landslide probability map with a reasonable accuracy [14]. and the southwest corner (101◦ 30 3200 , 4◦ 310 3100 ).
Depending on the aim, there are numerous categories of The geological setting of Kinta Valley and adjacent regions
data mining tasks. Classification is one of those categories. are limestone bedrock, granitic hills, and mine. Consequently,
The main aim of classification is to gain knowledge of hidden Kinta Valley and its surrounding areas have encountered
patterns to make prediction about the class of some unknown several engineering geologic problems including rockfall,
data [15], [16]. For data classification, most of the standard landslide, and debris flow. The limestone bedrocks rise over
machine learning algorithms can be utilized effectively for the alluvial plain in every region forming limestone hills with
classification precision if class labels are equally scattered. very steep slopes (sub- vertical to vertical). The study area
However, such algorithms show less or poor learning per- consists of diverse lithology with a high coverage of igneous
formance in case of classifying the imbalanced dataset that rocks. Such features are existing in the high-altitude areas
have variation in the class labels [17]. On the other hand, on the east and west sides of Kinta Valley. The sedimen-
some new methods were proposed for classification issues tary (limestone) and metamorphic rocks (marble) and are
such as robust manifold matrix factorization [18] and deep widely present in Kinta Valley [25].
learning [19]. In order to get the accuracy of classification Kinta Valley receives high precipitation along the year
algorithms, one or more algorithms can be combined and with average annual rainfall of 321 mm. In addition, it expe-
can get the reasonable accuracy. The process of combin- riences tropical climate with temperature ranging between
ing the multiple algorithms is called ensembling [18]. The 23 ◦ C and 35 ◦ C. The humidity in the study area is rela-
most popular ensemble models are voting, bagging, and tively high ∼81.5% (Meteorological Service Department of
stacking [20]. Malaysia).
VOLUME 7, 2019 74571

FIGURE 1. Study area.
III. USED DATA

This section demonstrates the dataset that were used to imple-
ment the proposed ensemble model. LiDAR data is the main
dataset that were used to produce a high-resolution DTM and
main conditioning factors. The LiDAR data were gathered
in 2015 with a flight height of 1000 m and average point
density of 8 pts./m2 using an airborne laser scanning system
(ALS). In addition, different GIS layers were used as condi-
tioning factors. Inventory data were also utilized to train and
validate the models.
A. ROCKFALL CONDITIONING FACTORS

Landslides and rockfalls are controlled by various
conditioning factors and cannot be explained using individ-
ual conditioning factors [4]. Therefore, this study employs
many conditioning factors for identifying the rockfall sources
within Kinta Valley. These factors encompassed hydrologi-
cal, morphological, geological, anthropogenic, and environ-
mental factors (Fig. 2), which are commonly mentioned in the
literature. Various conditioning factors were obtained from
LiDAR dataset, satellite images, and government agencies
data archives.
The morphological factors (elevation, aspect, slope, and
curvature) were extracted from LiDAR and GIS spatial anal-
ysis tools. Furthermore, four hydrological factors included
Stream Power Index (SPI), Sediment Transport Index (STI),
Topographic Roughness Index (TRI) and Topographic Wet-
ness Index (TWI). Other factors such as flow length, distance
to stream, distances to roads and lineaments were included.
Geological factors such as lineaments are main triggering
for landslides [26]. The anthropogenic factors included land
use. The land use map was produced by classifying a high-
resolution aerial photo (0.1 m) with supervised SVM method FIGURE 2. Landslide conditioning factors.
74572 VOLUME 7, 2019

subsets (training (70%) and testing (30%)) ensuring each

group has all the landslides types [27].
IV. METHODOLOGY
This section presents an overview of the proposed hybrid
model to identify the potential rockfall sources using
high-resolution LiDAR dataset. The model is based on an
ensemble stacking LR-RT and GMM methods. The details
of this model and its implementation and validation methods
are given in the following sub-sections. The ensemble models
were implemented through Python, whereas the GMM and
ACO were implemented using MATLAB (2017).
A. OVERALL METHODOLOGY
The developing steps of the proposed model are illustrated
in Figure 3, which include four major steps. In the first stage,
field and remote sensing dataset were obtained. ALS dataset,
landslides inventory, and a GIS dataset that contains topo-
graphical, geological, rainfall, and vegetation were prepared.
The acquired datasets were pre-processed in the second step.
In addition, the conditioning factors were optimized using
ACO. The ACO was run with different subsets of the con-
ditioning factors to find the best subset for each landslides
types. Three various machine learning algorithms (RF, SVM,
and LR) were employed to validate and compare the obtained
results. The third stage was the core-processing module in the
FIGURE 2. (Continued.) Landslide conditioning factors. proposed method, which consisted of developing the stacking
LR - RT and GMM models. The stacking LR - RT model
was developed to produce a probability of landslide, rockfall,
with an accuracy of (91%). Field surveys were conducted to and debris flow occurrences in the area using the inventory
verify the land use map. A total number of twelve classes dataset and the best subset of the conditioning factors. The
were specified as shown in Figure 2. The lithology of Kinta stacking LR - RT model was employed after performing an
Valley is predominantly limestone / marble. Other types
include sandstone and granite are also presence in the study
area.
Finally, areas with low vegetation density are more
exposed to erosion and increase instability than highly vege-
tated regions. The low vegetation coverage can simply result
in landslides formation. In the current research, vegetation
density was utilized as a conditioning factor for mapping
landslide and rockfalls probabilities.
B. INVENTORY DATASET
The landslide inventory was obtained from different sources
involving historical records, field surveys, and remote sens-
ing. SPOT fused images and aerial photo (0.1 m) were uti-
lized for the visual identification of landslides in the selected
area. Field surveys were performed to map the landslides
can be occurred beneath vegetation or in regions that can-
not be detected from the satellite images. Multiple in-situ
campaigns were performed utilizing a GNSS device. A total
number of 179 samples with their related characteristics were
prepared for assessment (Figure 1). The inventory dataset
include three types of landslide namely rockfall, shallow FIGURE 3. The workflow of the proposed integrated model for detecting
landslides, and debris flow. The inventory was split into two potential rockfall sources.
VOLUME 7, 2019 74573

elaborate comparative study with other individual ensemble producing probability map is to evaluate the significance of
models such as LR, RT, RF, SVM, and ANN and their each factor. The conditioning factors building is a hard mis-
ensembles (bagging, voting, and stacking). The selection sion and no particular rules exists to define the number of the
of the base models hyperparameters is significantly affect conditioning factors that adequate for a particular probability
the performance a model. In the current study, grid search assessment. Moreover, no frameworks exist for the chosen
method was used to select these parameters. The best model of conditioning factors. Such factors are normally selected
was selected based on the standard accuracy metrics such as based on the experts’ opinions [32].
overall accuracy, Receiver Operating Characteristics (ROC) In this research two conditioning factors datasets were
curves, and Prediction Rate Curve (PRC). Then the best fit built within the environment of GIS. The first dataset
ensemble model (stacking LR – RT) was trained based on were obtained from the airborne LiDAR data encompass-
the inventory dataset with the best subset of the conditioning ing of eight landslide conditioning factors: altitude, aspect,
factors and thus the landslide probability was generated for slope, curvature, sediment transport index (STI), topographic
each type. On the other hand, The GMM is developed to roughness index (TRI), topographic wetness index (TWI),
determine the distributions of slope angle and deriving the and stream power index (SPI). The second dataset was the
geomorphological units in the selected area. The GMM was addition of other conditioning factors: environmental and
first optimized to find the best number of Gaussian compo- geological factors of, land use/cover (LULC), distance from
nents. Consequently, GMM was run using the slope dataset road and river, soil, and geology. Seventeen conditioning
of the selected study areas to determine the slope thresh- factors were used for landslide probability mapping.
olds of each landslides types automatically. The final step The analysis of multicollinearity is a significant step in
composed of mapping, validations, and model comparisons landslide probability. The presence of a near-linear relation
with other models. Consequently, the produced landslides among factors can produce a division-by-zero problem during
probabilities were integrated with the reclassified slope based regression calculations. Such problem can result in abortion
on the obtained thresholds from GMM and then land-use of the calculations and inexact relationship; division by a very
probability area to detect the potential landslide, debris flow, small quantity still deforms the result. Consequently, it is sig-
and rockfall source areas. nificant to analyze the landslide conditioning factors before
probability mapping. The collinear (dependent) factors can be
B. DTM GENERATION identified through multicollinearity analysis by examining a
Filtering process is fundamental step to derive an accu- correlation matrix built via computing R2 . Several quantita-
rate Digital Terrain Model (DTM). This because of the raw tive approaches are existing for multicollinearity detection,
LiDAR data contain ground and up-ground points. Several such as examination the eigenvalue in a correlation matrix,
filtering methods are existed for generating DTM, and this evaluation the variance infiation factor (VIF), and pairwise
study employs two various algorithms: morphology-based scatter plots. In the current research, the VIF values were cal-
filtering and multi-scale comparison. The former is suitable culated to detect the multicollinearity for each conditioning
for terrain with small features and steep terrains whereas the factor. Moreover, communality similar to R2 was computed
later appropriate for urban area [28]. Simple Morphological for each factor.
Filter (SMRF) is a morphology-based method proposed by After multicollinearity was performed, Ant Colony Opti-
Pingel et al. [29]. The multi-resolution hierarchical classifi- mization (ACO) with three different types of machine learn-
cation (MHC) is a multi-scale comparison method developed ing algorithms (RF, SVM, and LR), was tested and compared.
by Chen et al. [30]. Generally, the advantage of the multiscale Consequently, the best fit method was utilized to identify the
comparison filtering methods is deriving an accurate DTM in best subset of landslides conditioning factors. The area under
urban regions with different man-made natural objects [28]. curve (AUC) and overall accuracy were used to assess the
Both SMRF and MHC were employed in this research to filter obtained results.
the point clouds of different regions of the study area. In last decade ACO has attracted an increasing num-
Once the point clouds were filtered by removing the non- ber of researchers, and its applications have developed
ground points, both Kriging and IDW were tested based on remarkably [33]. ACO was efficiently employed in sev-
cross-validation method. IDW was found as an appropriate eral remote sensing applications, such as optimal attribute
interpolation method due to the high density of the point subset selection [14], image segmentation [34], selection of
cloud [31]. Consequently, IDW interpolation method was parameters [35], and object derivation [36]. The advantages
used to generate the DTM based on the remaining points with of ACO are, first, it makes a probabilistic decision in terms
a resolution of 0.5 m. of local heuristic information and artificial pheromone trails
and consequently permits the examination of a huge solu-
C. OPTIMIZATION OF THE CONDITIONING FACTORS tions number than greedy heuristic [36]. Second, it assists
The variances between the factors characteristics should be the evaporation of pheromone trail, that is a process which
assessed to generate landslide probability maps that utilizes result in decreasing of pheromone trail intensity along time.
different conditioning factors. The conditioning factors char- The evaporation of pheromone aids to prevent the fast algo-
acteristics differ from area to another, thus, the first step in rithm convergence toward a suboptimal area [37]. Third, in
74574 VOLUME 7, 2019

the rule exploring context, an ACO algorithm can imple- TABLE 1. The selected landslide conditioning factors.
ment a robust, flexible search for an optimal terms com-
bination (logical conditions) including the predictor factors
values [38]. Moreover, ACO provides better and more reli-
able results than genetic algorithm (GA) and the common
sequence method (CSM) [39], [40]. The overall workflow of
ACO-based attribute selection is presented in Figure. 4.
training dataset into two separate groups, train many base

learners using the first group, test the base learners using
the second group, and utilizing the prediction from the pre-
vious step as an input, and a proper response as the output,
train a higher level learner. Boosting is an algorithm that
can be presented as a model averaging approach. It basically
developed for classification. However, it also can be used
for regression applications. First, a weak classifier is created
then a sequence of models are constructed iteratively. Every
model trained using the dataset in which points misclassified
via the former model are assigned more weight. Eventually,
the sequence models are weighted based on their success and
the results are gathered through voting thus producing a final
model. Boosting means applying a weak classifier, running
it multiple times on the training data and then allowing the
FIGURE 4. The workflow of ACO-based factors selection.
learned classifier to vote. On the other hand, voting ensembles
Several conditioning factors are available for landslide create multiple models (typically of different algorithms), and
probability mapping. However, not all the factors are useful, simple statistics (such as computing the majority and mean)
therefore, these factors should be evaluated, and only impor- are utilized to combine prediction.
tant (related) factors must be chosen to derive better landslide In this study, several base models such as LR, RT, RF,
probability result. Consequently, ACO method was employed SVM, ANN were used. In addition, three ensemble meth-
in this research to choose important factors in the probability ods, stacking, boosting, and voting was investigated. The
of landslide. A total of 17 conditioning factors (Table 1), were grid search method was employed in order to optimize the
primary chosen according to the condition of the study area base models. The best model is then decided according
and literature. Afterward, the best factors subset with their to AUC and overall accuracy. A grid search is a standard
attributes was selected via ACO, that can attain the high- search method for selecting sub-optimal hyperparameters of
est potential classification result. Different machine learning a machine learning / statistical model. Suppose there are k
algorithms (RF, SVM, and LR) were utilized for the evalu- parameters and each of them has ci values, then the number
ation process. The conditioning factors were split into two of search possibilities (P) is:
different groups (70%) training and (30%) testing. In order to k
Y
perform ACO factors selection, the ACO parameters values P= ci (1)
should be set according to previous works and a preliminary i=1
analysis and [41].
E. GAUSSIAN MIXTURE MODEL (GMM)
D. ENSEMBLE MODEL Gaussian Mixture Model (GMM) is a parametric probability
Ensemble models are methods that combine multiple base density function represented as a weighted sum of Gaus-
models to create a more robust model that can produce sian component densities [46]. GMM is frequently utilized
improved results. Ensemble models are often more accurate as a parametric model of the probabilities distribution of
than single models [42]–[44]. There are several ensemble features or continuous measurements in biometric systems.
methods such as stacking, boosting, and voting. Stacking is The parameters of GMM are derived from training dataset
defined as a way of combining multiple machine learning utilizing the Maximum A Posteriori (MAP) estimation from
models [45]. It usually combines models of different types a well-trained prior model or the iterative Expectation-
of classifiers. The basic steps of stacking are: divide the Maximization (EM) algorithm.
VOLUME 7, 2019 74575

A GMM is a weighted sum of M component Gaussian optimized, and then penalize it with the parameters number
densities as given by the equation, in the model (the complexity of the model). Nevertheless,
M
the AIC penalizes for complexity less severely than BIC.
X Thus, the BIC tends to select simpler model that may underfit,
p(X/λ) = wi g (x/µi , 6i ) (2)
whereas AIC tends to select more complex model that may
i=1
overfit. Therefore, it is better to look at AIC and BIC criteria
In which x is a D-dimensional continuous-valued data to decide the best model. The best fit model is with lower AIC
vector (i.e. features or measurements), wi , i = 1, . . . .M, are or BIC values.
the mixture weights, and g (X /µi , 6i ) , i = 1, . . . , M are the The GMM was trained on slope dataset extracted from
component Gaussian densities. Each component density is a three different areas within the area of interest. Iterative
D-variate Gaussian function of the form, search based on AIC and BIC values was performed to select
X 1 the hyperparameters of the model. The output of the GMM
g X /µi , = P 1/2 model was thresholds of slope angle (MUs), which allows
i
(2π)D/2 i
detecting the probable sources of various landslide types
1 X−1
automatically.
exp − (X − µi )0 (X − µi ) (3)
2 i
With mean vector µi and covariance matrix i The mix-

P
F. ACCURACY ASSESSMENT
PM
ture weights satisfy the constraint that i=1 wi = 1. The proposed ensemble model was validated based on the
The complete Gaussian mixture model is parameterized by success and prediction rate curves. These curves show the
the mixture weights, covariance matrices, and mean vectors cumulative frequency graph and explain the known landslides
from all component densities. These parameters are collec- percentage that fall into probability level ranks [49]. The
tively represented through the notation, success rate was generated from the landslides utilized for
n Xo training whereas the prediction rate was generated using the
λ = wi , µi , i = 1, . . . , M. (4) landslides utilized for validation. Moreover, AUC was used
i
to identify the accuracy of the probability maps qualitatively
Many variants are existing on the GMM P illustrated in in which greater AUC means higher accuracy attained [43].
Equation (3). The covariance matrices, i , can be con- In addition, some of the recently proposed ensemble mod-
strained or full rank to be diagonal. In addition, parameters els were tested to assess their performance in compari-
can be tied or shared among the components of Gaussian, son with the proposed ensemble model (Stacking LR-RT).
such as having a frequent covariance matrix for all compo- These models are comprised of random forest and frequency
nents. The selection of model setting (parameter tying, diag- ratio (RFFR) [50], evidential belief function based logis-
onal or full covariance matrices, and components number) is tic model tree (EBF-LMT) [51], and Rotation Forest-based
normally identified via the magnitude of dataset available for Radial Basis Function (RFRBF) neural network [52].
the estimation of the GMM parameters and how the GMM is Field verification was also carried out to validate the iden-
utilized in a specified biometric application. tified rockfall source areas. Several locations that distributed
The terrain morphology displays the slope angle char- over the whole study area, were selected randomly from the
acteristics that can be straight linked to the geomorphic produced map and then in-situ validation was performed.
processes implicated in the stability of slopes. The rockfall
sources are frequently exist in the steepest morphological
units [47]. Consequently, Michoud et al. [48] have con- V. RESULTS AND DISCUSSION
structed a DEM-based geomorphometric method to identify This section presents the main results and discusses the
these morphological units and thus rockfall sources. Using major findings obtained from this work. First, a summary
this method, the distribution of slope angle is decomposed statistics of modeling data and its pre-processing is given.
in many Gaussian distributions that can be considered mor- Then, the results of stacking LR - RF model including the
phological units characteristics such as rock cliff, steep slope, probability maps, optimization results, and comparisons with
moderate steep, foot slope, and plain. When the slope angle other methods are explained. After that, the results of GMM
exceeds a certain threshold, the terrain is considered a prob- are presented. Finally, the validation and field verification
able source of rockfall which in turn is determined when the were discussed.
Gaussian distribution of the morphological units rock cliff
become dominant over the steep slopes units [48]. A. SUMMARY STATISTICS
In many applications, the P components number (k), and The inventory data had 179 sampling points. The slope angles
proper covariance structure, i , are unknown. In order to in the landslide data samples ranged from 19◦ to 49◦ with an
tune a GMM comparison of information criteria has to be per- average slope angle of 34.75◦ (std. = 13.43◦ ). On the other
form. Akaike’s Information Criterion (AIC) and the Bayesian hand, the slope angles in the rockfall data samples had a min-
Information Criterion (BIC) are the most common informa- imum of 47◦ and a maximum of 76.39◦ . The average slope
tion criteria. AIC and BIC take the negative log likelihood, angle was 65.86◦ , and the standard deviation was 11.34◦ .
74576 VOLUME 7, 2019

TABLE 2. The assessment of multicollinearity for the conditioning factors

in landslide, rockfall, and debris flow data samples.
For debris flow the slope angles ranged from 7◦ to 25◦ and
the average slope angle was 15.13◦ .
In addition, since the sampling points (landslide, debris
flow, and rockfall) are subjected to strong correlations in
different conditioning factors, the multicollinearity of the
factors was analyzed using the Variance Inflated Factor (VIF)
method. Table 2 shows the VIF values calculated among the
conditioning factors in the landslide, rockfall, and debris flow
data samples. According to the previous works [26], a VIF
value of greater than four is considered high collinear. Thus,
the corresponding factors should be removed from further
analysis. The highest VIF value is 3.107 for slope factor in
the landslide data samples, 3.272 for STI factor in the rockfall
data samples, and 3.155 for altitude factor in the debris flow
FIGURE 5. Assessment of the factors subset selected by ACO based on
data samples. As a result, none of the factors was removed. overall accuracy for (a) landslide, (b) rockfall, and (c) debris flow.
B. THE RESULT OF THE OPTIMIZED CONDITIONING

FACTORS
ACO was used to choose the optimum conditioning fac- In which x and x are the original and scaled values, respec-
tors subset for shallow landslide, rockfall, and debris flow tively, and mini is the minimum value of factor i and max i is
probability. The ACO approach was performed in MATLAB the maximum values of factor i.
R2016b. The preparation of testing and training datasets were With the training and testing datasets, many percent sub-
conducted in Excel sheets and GIS. Scaling process was per- sets of factors were examined with the RF, SVM, and
formed for all used factors during the data processing stage. LR algorithms using the area under curve and overall
This to minimize computation complexity and to prevent accuracy.
factors with high numerical ranges from dominating those In this study, the implementation of ACO approach has the
with low numerical ranges. Equation (5) was used to linearly fiexibility to set the factors number needed to be chosen. Ten
scale each factor to [0, 1]: experiments were performed to choose the best subset starting
from 10% to 100% of the 17 selected factors. 100 iterations
x−mini were conducted in each experiment and then the best subset
x= (5) was chosen. Figure 5 shows the best factors subsets selected
maxi − mini
VOLUME 7, 2019 74577
TABLE 3. Selected conditioning factors based on ACO for each percent subset for shallow landslide.
TABLE 4. Selected conditioning factors based on ACO for each percent subset for rockfall.
by ACO. It also illustrates the classification performance for shallow landslide, rockfall, and debris flow. The ideal
according to the overall accuracy. The highest overall accu- number of factors among 17 available factors is 14, 12, and 15
racy was observed in experiment 8, 7, and 9 for shallow for shallow landslide, rockfall, and debris flow, respectively.
landslide, rockfall, and debris flow, respectively. In which 14, Careful analysis of Table 3 indicates that many factors, such
12, and 15 factors were selected by the algorithm for shallow as slope, distance to road, and distance to river are strongly
landslide, rockfall, and debris flow, respectively. Accord- affect the shallow landslide probability. In addition, RF out-
ingly, this subset was considered the best factors subset for performs LR and SVM, respectively. However, with the three
the dataset and the study area. evaluators the best subset was in experiment 8 with 80 %
The selected factors in each experiment for shallow land- of conditioning factors (14 factors). Table 4 shows the best
slide, rockfall, and debris flow are shown in Table 3, Table 4, subset of factors for rockfall probability. Slope, distance to
and Table 5, respectively, and the factors names are presented lineament and TRI, are found highly important for rockfall
in Table 1. probability mapping. The best subset was observed in experi-
The result indicates that utilizing a big number of condi- ment 7 with 70 % of conditioning factors (12 factors). The
tioning factors does not necessarily increase the accuracy of highest overall accuracy and AUC were recorded with RF
the calcification. The experiments illustrate that the accuracy evaluator which are 86 and 0.89, respectively. The best factors
of classification deteriorated after experiment 8, 7, and 9 subset for debris flow is shown in Table 5. Factors such
74578 VOLUME 7, 2019

FIGURE 6. Probability maps based on stacking LR – RT model of (a) shallow landslide, (b) rockfall, and (c) debris flow.
TABLE 5. Selected conditioning factors based on ACO for each percent subset for debris flow.
TABLE 6. Accuracy assessment of the individual algorithms for different landslide types.
as distance to road, TRI, and vegetation density are found conditioning factors (15 factors). LR outperforms RF and
strongly influence the probability of debris flow. Neverthe- SVM with overall accuracy of 80. However, based on AUC,
less, the best subset is in experiment 9 with 90 % of the RF outperforms LR and SVM, respectively.
VOLUME 7, 2019 74579

The optimal k value was determined to be 5 for all the

three areas (Gunung Rapat, Gua Tambun, and Gunung Lang)
(Figure 7). The average number of iterations and regulariza-
tion value were 500 and 0.01, respectively. The means and
standard deviations of five slope angle distributions (MU) are
also computed. For Gunung Lang area, GMM has determined
5 MU values, 1.78, 6.07, 16.04, 41.25, and 63.54, for five
geomorphological units, plains, foot slopes, moderately steep
slopes, steep slopes, and cliffs. The plain unit indicates low
slope angles corresponding to the fluvial and fluvioglacial
deposits. Foot slope is gentle slope angles featuring the lower
part of the hillslope characterized by colluvial fans, debris
flow, and landslide deposits. Moderately steep slopes and
steep slopes are the units those containing deposits and rocky
outcrops covered with vegetation. On the other hand, cliffs are
very steep slopes, which correspond to rocky outcrops. The
slope angles distributions for Gua Tambun area were 1.92,
5.18, 13.35, 37.60, and 63.78. On the other hand, the MU
values for Gunung Rapat area were 1.46, 6.23, 16.43, 43.21,
and 66.31. The debris flow threshold identified when moder-
ate steep slope dominates foot-slope. Whereas, shallow land-
slide determined when steep slope dominates moderate steep
slope. On the other hand, rockfall source area located when
cliff dominates steep slope. In Gunung Lang area, the slope
thresholds were determined as 10◦ – 23◦ , 23◦ – 59◦ , and >
59◦ for debris flow, shallow landslide, and rockfall, respec-
tively. Whereas, in Gua Tambun area, the slope thresholds
were found 9◦ – 19◦ , 19◦ – 57◦ , and > 57◦ for debris flow,
shallow landslide, and rockfall, respectively. On the other
FIGURE 7. Slope distribution and thresholds based on GMM for (a) Gua
hand, the slope thresholds were calculated as 9◦ – 24◦ , 24◦
Tambun, (b) Gunung Rapat, and (c) Gunung Lang. – 58◦ , and > 58◦ for debris flow, shallow landslide, and
rockfall, respectively. Therefore, the used slope thresholds
are 9◦ – 22◦ , 22◦ – 58◦ , and > 58◦ for debris flow, shallow
C. THE RESULTS OF STACKING LR-RT
landslide, and rockfall, respectively.
The produced probability maps of shallow landslide, rockfall,
and debris flow based on stacking LR - RF model are illus- E. LAND-USE PROBABILITY AREA
trated in Figure 6. Figure 6a shows the probability map of Land-use probability area is a raster encompasses of two
shallow landslide occurrence, the probability maps of rockfall classes that represent the regions which are probable or not
and debris flow occurrences are shown in Figure 6b and probable to encounter landslides. This raster is formed based
Figure 6c, respectively. The stacking LR - RT model pre- on the land-use map. The land-use factor is classified into two
dicted the occurrence probability, ranging from 0 (no poten- classes by integrating residential building, river, transporta-
tial of landslide/rockfall/debris flow occurrence) to 1 (very tion, other building, a cemetery, and water body in one class
high potential of landslide/rockfall/debris flow occurrence). and the other class contains forest, mixed vegetation, and
However, to simplify the interpretation of the maps, the open land. The advantage of this binary raster is reducing the
probability values were reclassified into five classes using model sensitivity to the spatial variation in the conditioning
quantile method. The classes were very low, low, moderate, factors and the computing time. In average, 23% of Kinta
high, and very high. According to the generated probability Valley is prone to landslides occurrences.
maps Gua Tambun, Gunung Rapat, and Gunung Lang areas
are highly potential for landslides occurrences. However, F. RESULT OF ROCKFALL SOURCES REGIONS
the highest probability density was observed in rockfall prob- The intersections of slope angles thresholds determined by
ability map following by shallow landslide and debris flow GMM and probability values resulted in the potential land-
probability maps, respectively. slide and rockfall source areas. This section presents the
results of detecting potential landslide and rockfall source
D. THE RESULT OF GAUSSIAN MIXTURE MODEL (GMM) areas. The potential sources of landslides were detected by
This section presents the distributions of slope angle and intersecting the slope angles threshold (23◦ ) and the proba-
thresholds calculated through the optimized GMM model. bility values estimated through stacking LR – RT model.
74580 VOLUME 7, 2019

TABLE 7. Accuracy assessment of bagging models based on the selected algorithms.
TABLE 8. Accuracy assessment of voting models based on the selected algorithms.
TABLE 9. Accuracy assessment of stacking models based on the selected algorithms
The identification of probable rockfall sources was per- PRC, and overall accuracy. RF outperforms ANN, LR, SVM,
formed by crossing the obtained threshold of slope angle and RT, respectively. The highest accuracy (ROC = 0.89,
(> 58◦ ) and the probability values of the sacking LR - RT. PRC = 0.86, and overall accuracy = 85.62), was achieved
Figure 8 demonstrates the result of potential rockfall sources by RF with rockfall data samples. However, all algorithms
in the study areas. The inventories of rockfalls are mainly achieved the highest accuracy with rockfall data samples in
situated in the regions those detected as probable source areas comparison with shallow landslide and debris flow. On the
proving that the proposed model is eligible to identify the other hand, LR, RT, and ANN performed better with debris
sources of both rockfalls and landslides in the selected areas. flow data samples than shallow landslide. Whereas, RF and
SVM performed better with shallow landslide data sam-
G. THE ASSESSMENT OF THE MODEL ACCURACY ples than debris flow. The comparison of bagging models
In order to validate the proposed ensemble model for pre- is shown in Table 7. RF outperformed the other machine
dicting the probable landslides the success (ROC) and pre- learning algorithms. Nevertheless, RF in bagging model was
diction (PRC) rate curves were employed [47]. The ROC slightly improved in comparison with individual RF this is
is generated based on the training landslide dataset while because RF itself is ensemble model. On the other hand,
PRC was produced utilizing the validation landslide dataset. the bagging models of the other algorithms showed bet-
Moreover, the area under curve (AUC) is used to identify the ter improvement than the individual implementation of the
accuracy of the probability maps [49]–[54]. same algorithms. Table 8 demonstrates the comparison of
Table 6 illustrates the comparison among the various voting models. Nine voting ensemble models were tested
machine learning algorithms individually based on ROC, based on the selected machine learning algorithms using
VOLUME 7, 2019 74581

FIGURE 8. The probable rockfall source areas and the inventory dataset at (a) Gua Tambun, (b) Gunung Rapat, and (c) Gunung Lang.
TABLE 10. Comparison of the recently proposed ensemble models.
rockfall, shallow landslide, and debris flow data samples. LR – RT model was employed to produce the probability
The accuracy of some algorithms was improved by adding maps of shallow landslide, rockfall, and debris flow. Table 10
another algorithm while some the accuracy of some algo- shows the comparative result of the assessment of the pro-
rithms was decreased by adding another algorithm. For posed ensemble models in recent literature.
instance, the accuracy of LR was improved by adding RT, Since the stacking LR – RT model performs well on both
RF, and ANN, whereas the accuracy of RF was decreased datasets (training and validation), the proposed ensemble
by adding SVM and ANN. Table 9 illustrates the compar- model is thus considered effective for landslides probability
ison of different stacking models. The stacking LR – RT mapping. In addition, as the model achieved a good accuracy
model outperformed the other tested models with all land- on validation landslides, it is expected to perform well in other
slides data samples (shallow landslide, rockfall, and debris regions that have similar or nearly similar condition as the
flow). The highest accuracy that achieved using stacking tested area.
LR – RT model with rockfall data samples is 0.94, 0.95,
and 92.89 of ROC, PRC, and overall accuracy. However, H. FIELD VERIFICATION
the model achieved good accuracy with shallow landslide In-situ survey was carried out to verify the produced map of
and debris flow of 0.86, 0.88, and 86.46, and 0.84, 0.81, rockfall sources areas. Several locations were selected ran-
and 83.95 of ROC, PRC, and overall accuracy, respectively. domly that distributed on the entire study area. Figure 9 shows
In comparison with some other tested models in the literature, the selected locations and the in-situ photos. All locations
the stacking LR - RT model achieved better accuracy than were found reasonably prone to rockfall incidents.
a hybrid Bagging based Support Vector Machines (BSVM) Discontinuities and fractures are clear in the selected
model [54], Decision tree (DT)-based CHi-squared Auto- locations especially along the cliff face. In addition, some
matic Interaction Detection (CHAID) model [56], a hybrid inventory fallen rocks were observed in various locations
approach of Reduced Error Pruning Trees (REPT) and Ran- close to the foot-slope. According to different interviews that
dom Subspace Ensemble (RSS) [57], and a hybrid fuzzy performed in the study area, these areas were encountered
weight of evidence model [58]. Consequently, the stacking rockfall incidents in different periods of time.
74582 VOLUME 7, 2019

RF achieved the highest overall accuracy of 86.64%, 92.89%,

and 83.95% with shallow landslide, rockfall, and debris flow
data samples, respectively. In addition, as the model achieved
a good accuracy on validation landslides, it is expected to
perform well in other regions that have similar or nearly
similar condition as the tested area. Although the proposed
ensemble model achieved good accuracy with both testing
and validation dataset, it is difficult to differentiate rockfall
from other landslides types relying on just the produced
probability maps. This is due to the overlapping in some areas
that are probable to encounter different types of landslides.
On the other hand, the slope thresholds obtained through
GMM model contributed to solving this problem thus identify
FIGURE 9. The verified in-situ locations within the study area. each landslide type accurately. However, the proposed model
can be tested in another study area to prove the performance
of the model. In addition, the accuracy can be improved by
increasing the number of the inventory dataset.
VI. CONCLUSION
Rockfalls are the main threat for people and their properties ACKNOWLEDGMENT
in addition to the transportation ways and infrastructure. The The authors wish to thank the Department of Mineral
identification of rockfall source areas is a key element in and Geosciences, the Department of Surveying Malaysia,
rockfall hazard and risk assessment. However, rockfall source the Federal Department of Town and Country Planning
identification is the most crucial and challenging task. This Malaysia for the data provided.
difficulty rises where the study area contains different land-
slide types, such as shallow landslide and debris flow that REFERENCES
have similar geo-morphometric characteristics. In order to [1] D. J. Varnes, ‘‘Slope movement types and processes,’’ Special Rep.,
solve such problem, this paper developed an ensemble model vol. 176, no. 176, pp. 11–33, 1978.
[2] M. K. Ansari, M. Ahmad, R. Singh, and T. N. Singh, ‘‘2D and 3D rockfall
for rockfall sources identification in presence of another hazard analysis and protection measures for Saptashrungi Gad Temple,
landslide types. This based on stacking LR-RT and Gaussian Vani, Nashik, Maharashtra—A case study,’’ J. Geol. Soc. India, vol. 91,
Mixture Model (GMM). no. 1, pp. 47–56, Jan. 2018.
[3] A. M. Fanos and B. Pradhan, ‘‘Laser scanning systems and techniques in
LiDAR data provide fundamental GIS information for sev- rockfall source identification and risk assessment: A critical review,’’ Earth
eral geospatial applications, including site selection, urban Syst. Environ., vol. 2, no. 2, pp. 163–182, 2018.
planning, natural hazard and risk evaluation. ACO was uti- [4] K. Messenzehl, H. Meyer, J.-C. Otto, T. Hoffmann, and R. Dikau,
‘‘Regional-scale controls on the spatial activity of rockfalls (Turtmann Val-
lized due to its advantage and efficiency in comparison with ley, Swiss Alps)—A multivariate modeling approach,’’ Geomorphology,
many other methods for the selection of the best factors subset vol. 287, pp. 29–45, Jun. 2017.
fourteen among seventeen conditioning factors (80% of the [5] B, Matasci, M. Jaboyedoff, A. Loye, A. Pedrazzini, M.-H. Derron, and
G. Pedrozzi, ‘‘Impacts of fracturing patterns on the rockfall susceptibil-
used factors), were selected for shallow landslide probability ity and erosion rate of stratified limestone,’’ Geomorphology, vol. 241,
based on the highest accuracy of 0.87 and 82% for AUC and pp. 83–97, Jul. 2015.
overall accuracy, respectively. On the other hand, twelve of [6] A. M. Fanos and B. Pradhan, ‘‘Multi-scenario rockfall hazard assess-
ment using LiDAR data and GIS,’’ Geotech. Geol. Eng., vol. 34, no. 5,
the best subset of conditioning factors were selected for the pp. 1375–1393, 2016.
producing of rockfall probability based on the highest overall [7] B. Pradhan and A. M. Fanos, ‘‘Rockfall hazard assessment: An overview,’’
accuracy (86%) and area under curve (AUC = 0.89). For in Laser Scanning Applications in Landslide Assessment. Cham,
Switzerland: Springer, 2017, pp. 299–322.
the generating of debris flow probability map, fifteen on the [8] P. Frattini, G. Crosta, A. Carrara, and F. Agliardi, ‘‘Assessment of rockfall
best conditioning factors were chosen based on the highest susceptibility by integrating statistical and physically-based approaches,’’
accuracy of 0.85 and 79% for AUC and overall accuracy. Geomorphology, vol. 94, nos. 3–4, pp. 419–437, 2008.
[9] C. Corona, D. Trappmann, and M. Stoffel, ‘‘Parameterization of rockfall
The GMM model was integrated with stacking LR – RT source areas and magnitudes with ecological recorders: When disturbances
model to automatically determining the slope thresholds and in trees serve the calibration and validation of simulation runs,’’ Geomor-
detecting the high probability of landslide/rockfall/debris phology, vol. 202, pp. 33–42, Nov. 2013.
[10] L. Losasso, M. Jaboyedoff, and F. Sdao, ‘‘Potential rock fall source areas
flow occurrences combined. The GMM was precisely defined identification and rock fall propagation in the province of Potenza territory
the distributions of slope angle and identify the thresh- using an empirically distributed approach,’’ Landslides, vol. 14, no. 5,
olds of slope angle that permitted the source identification pp. 1593–1602, Oct. 2017.
[11] E. Hoek, J. D. Bray, Rock Slope Engineering. Boca Raton, FL, USA: CRC
of debris flow (at 9◦ – 22◦ ), shallow landslide (at 22◦ – Press, 2014.
58◦ ), and rockfall (at > 58◦ ). On the other hand, stacking [12] D. R. Montgomery and M. T. Brandon, ‘‘Topographic controls on erosion
LR - RF attained the best result in comparison with other rates in tectonically active mountain ranges,’’ Earth Planet. Sci. Lett.,
vol. 201, nos. 3–4, pp. 481–489, 2002.
single and ensemble models such as LR, RF, RF, SVM, and [13] A. N. Strahler, ‘‘Statistical analysis in geomorphic research,’’ J. Geol.,
ANN, and their voting and bagging models. Stacking LR - vol. 62, no. 1, pp. 1–25, Jan. 1954.
VOLUME 7, 2019 74583

[14] M. N. Jebur, B. Pradhan, and M. S. Tehrany, ‘‘Optimization of landslide [36] L.-L. Li and J.-K. Wang, ‘‘SAR image ship detection based on ant colony
conditioning factors using very high-resolution airborne laser scanning optimization,’’ in Proc. 5th Int. Congr. Image Signal Process., Oct. 2012,
(LiDAR) data at catchment scale,’’ Remote Sens. Environ., vol. 152, pp. 1100–1103.
pp. 150–165, Sep. 2014. [37] J. Gottlieb, M. Puchta, and C. Solnon, ‘‘A study of greedy, local search, and
[15] F. Luo, B. Du, L. Zhang, L. Zhang, and D. Tao, ‘‘Feature learning ant colony optimization approaches for car sequencing problems,’’ in Proc.
using spatial-spectral Hypergraph discriminant analysis for hyperspectral Workshops Appl. Evol. Comput. Berlin, Germany: Springer, Apr. 2003,
image,’’ IEEE Trans. Cybern., vol. 49, no. 7, pp. 2406–2419, Jul. 2019. pp. 246–257.
[16] L. Zhang, L. Zhang, B. Du, J. You, and D. Tao, ‘‘Hyperspectral image [38] M. Dorigo and T. Stützle, ‘‘The ant colony optimization metaheuristic:
unsupervised classification by robust manifold matrix factorization,’’ Inf. Algorithms, applications, and advances,’’ in Handbook of Metaheuristics.
Sci., vol. 485, pp. 154–169, Jun. 2019. Boston, MA, USA: Springer, 2003, pp. 250–285.
[17] Y. Gu, Q. Wang, X. Jia, and J. A. Benediktsson, ‘‘A novel MKL model [39] R. S. Parpinelli, H. S. Lopes, and A. A. Freitas, ‘‘Data mining with an ant
of integrating LiDAR data and MSI for urban area classification,’’ IEEE colony optimization algorithm,’’ IEEE Trans. Evol. Comput., vol. 6, no. 4,
Trans. Geosci. Remote Sens., vol. 53, no. 10, pp. 5312–5326, Oct. 2015. pp. 321–332, Aug. 2002.
[18] S. P. Potharaju and M. Sreedevi, ‘‘Ensembled rule based classification [40] R. Putha, L. Quadrifoglio, and E. Zechman, ‘‘Comparing ant colony
algorithms for predicting imbalanced kidney disease data,’’ J. Eng. Sci. optimization and genetic algorithm approaches for solving traffic sig-
Technol. Rev., vol. 9, no. 5, pp. 201–207, 2016. nal coordination under oversaturation conditions,’’ Comput. Aided Civil
[19] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, and F. Fraundorfer, Infrastruct. Eng., vol. 27, no. 1, pp. 14–28, 2012.
‘‘Deep learning in remote sensing: A comprehensive review and list of [41] Z. Jiang, M. Zhou, J. Tong, H. Jiang, Y. Yang, A. Wang, and Z. You, ‘‘Com-
resources,’’ IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8–36, paring an ant colony algorithm with a genetic algorithm for replugging tour
Dec. 2017. planning of seedling transplanter,’’ Comput. Electron. Agricult., vol. 113,
[20] M. Akour, I. Alsmadi, and I. Alazzam, ‘‘Software fault proneness pre- pp. 225–233, Apr. 2015.
diction: A comparative study between bagging, boosting, and stacking [42] D. N. Kumar and M. J. Reddy, ‘‘Ant colony optimization for multi-purpose
ensemble and base learner methods,’’ Int. J. Data Anal. Techn. Strategies, reservoir operation,’’ Water Resour. Manage., vol. 20, no. 6, pp. 879–898,
vol. 9, no. 1, pp. 1–16, 2017. 2006.
[21] A. M. Fanos, B. Pradhan, A. A. Aziz, M. N. Jebur, and H.-J. Park, ‘‘Assess- [43] A. M. Youssef, H. R. Pourghasemi, Z. S. Pourtaghi, and M. M. Al-Katheeri,
ment of multi-scenario rockfall hazard based on mechanical parameters ‘‘Landslide susceptibility mapping using random forest, boosted regression
using high-resolution airborne laser scanning data and GIS in a tropical tree, classification and regression tree, and general linear models and
area,’’ Environ. Earth Sci., vol. 75, no. 15, p. 1129, Aug. 2016. comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi
[22] B. Pradhan and A. M. Fanos, ‘‘Application of LiDAR in rockfall hazard Arabia,’’ Landslides, vol. 13, no. 5, pp. 839–856, 2016.
assessment in tropical region,’’ in Laser Scanning Applications in Land- [44] B. T. Pham, B. Pradhan, D. T. Bui, I. Prakash, and M. B. Dholakia,
slide Assessment. Cham, Switzerland: Springer, 2017, pp. 323–359. ‘‘A comparative study of different machine learning methods for landslide
susceptibility assessment: A case study of Uttarakhand area (India),’’
[23] R. Muzzillo, L. Losasso, and F. Sdao, ‘‘Rockfall source areas assessment
Environ. Model. Softw., vol. 84, pp. 240–250, Oct. 2016.
in an area of the Pollino national park (Southern Italy),’’ in Proc. Int. Conf.
Comput. Sci. Appl. Cham, Switzerland: Springer, Jul. 2018, pp. 366–379. [45] L. Xiao, Y. Zhang, and G. Peng, ‘‘Landslide susceptibility assessment
using integrated deep learning algorithm along the China-Nepal highway,’’
[24] A. M. Fanos, B. Pradhan, S. Mansor, Z. M. Yusoff, and A. F. B. Abdullah,
Sensors, vol. 18, no. 12, p. 4436, 2018.
‘‘A hybrid model using machine learning methods and GIS for potential
[46] Z. Ouyang, X. Sun, J. Chen, D. Yue, and T. Zhang, ‘‘Multi-view stacking
rockfall source identification from airborne laser scanning data,’’ Land-
ensemble for power consumption anomaly detection in the context of
slides, vol. 15, no. 9, pp. 1833–1850, 2018.
industrial Internet of Things,’’ IEEE Access, vol. 6, pp. 9623–9631, 2018.
[25] B. Pradhan, M. H. Abokharima, M. N. Jebur, and M. S. Tehrany, ‘‘Land
[47] Y. Chen, T. T. Georgiou, and A. Tannenbaum, ‘‘Optimal transport for
subsidence susceptibility mapping at Kinta Valley (Malaysia) using the
Gaussian mixture models,’’ IEEE Access, vol. 7, pp. 6269–6278, 2019.
evidential belief function model in GIS,’’ Natural Hazards, vol. 73, no. 2,
[48] C. Michoud, M.-H. Derron, P. Horton, M. Jaboyedoff, F.-J. Baillifard,
pp. 1019–1042, 2014.
A. Loye, P. Nicolet, A. Pedrazzini, and A. Queyrel, ‘‘Rockfall hazard and
[26] Y. Wu, B. Jiang, and N. Lu, ‘‘A descriptor system approach for estimation
risk assessments along roads at a regional scale: Example in Swiss Alps,’’
of incipient faults with application to high-speed railway traction devices,’’
Natural Hazards Earth Syst. Sci., vol. 12, no. 3, pp. 615–629, Mar. 2012.
IEEE Trans. Syst., Man, Cybern. Syst., to be published.
[49] A. Loye, M. Jaboyedoff, and A. Pedrazzini, ‘‘Identification of potential
[27] H. Hong, B. Pradhan, M. I. Sameen, W. Chen, and C. Xu, ‘‘Spatial pre- rockfall source areas at a regional scale using a DEM-based geomor-
diction of rotational landslide using geographically weighted regression, phometric analysis,’’ Natural Hazards Earth Syst. Sci., vol. 9, no. 5,
logistic regression, and support vector machine models in Xing Guo area pp. 1643–1653, 2009.
(China),’’ Geomatics, Natural Hazards Risk, vol. 8, no. 2, pp. 1997–2022, [50] N. Intarawichian and S. Dasananda, ‘‘Frequency ratio model based land-
2017. slide susceptibility mapping in lower Mae Chaem watershed, Northern
[28] Z. Chen, B. Gao, and B. Devereux, ‘‘State-of-the-art: DTM generation Thailand,’’ Environ. Earth Sci., vol. 64, no. 8, pp. 2271–2285, 2011.
using airborne LIDAR data,’’ Sensors, vol. 17, no. 1, p. 150, 2017. [51] A. Kornejady, H. R. Pourghasemi, and S. F. Afzali, ‘‘Presentation of RFFR
[29] T. J. Pingel, K. C. Clarke, and W. A. McBride, ‘‘An improved simple new ensemble model for landslide susceptibility assessment in Iran,’’ in
morphological filter for the terrain classification of airborne LIDAR data,’’ Landslides: Theory, Practice and Modelling. Cham, Switzerland: Springer,
ISPRS J. Photogramm. Remote Sens., vol. 77, pp. 21–30, Mar. 2013. 2019, pp. 123–143.
[30] C. Chen, Y. Li, W. Li, and H. Dai, ‘‘A multiresolution hierarchical clas- [52] W. Chen, H. Shahabi, A. Shirzadi, T. Li, C. Guo, H. Hong, W. Li, D. Pan,
sification algorithm for filtering airborne LiDAR data,’’ ISPRS J. Pho- J. Hui, M. Ma, M. Xi, and B. B. Ahmad, ‘‘A novel ensemble approach
togramm. Remote Sens., vol. 82, pp. 1–9, Aug. 2013. of bivariate statistical-based logistic model tree classifier for landslide
[31] J. Li and A. D. Heap, ‘‘Spatial interpolation methods applied in the environ- susceptibility assessment,’’ Geocarto Int., vol. 33, no. 12, pp. 1398–1420,
mental sciences: A review,’’ Environ. Model. Softw., vol. 53, pp. 173–189, 2018.
Mar. 2014. [53] B. T. Pham, A. Shirzadi, D. T. Bui, I. Prakash, and M. B. Dholakia,
[32] W. M. Abdulwahid and B. Pradhan, ‘‘Landslide vulnerability and risk ‘‘A hybrid machine learning ensemble approach based on a radial basis
assessment for multi-hazard scenarios using airborne laser scanning data function neural network and rotation forest for landslide susceptibility
(LiDAR),’’ Landslides, vol. 14, no. 3, pp. 1057–1076, 2017. modeling: A case study in the Himalayan area, India,’’ Int. J. Sediment
[33] J. Shang, X. Wang, X. Wu, Y. Sun, Q. Ding, J.-X. Liu, and H. Zhang, Res., vol. 33, no. 2, pp. 157–170, 2018.
‘‘A review of ant colony optimization based methods for detecting epistatic [54] J. Mathew, V. K. Jha, and G. S. Rawat, ‘‘Landslide susceptibility zonation
interactions,’’ IEEE Access, vol. 7, pp. 13497–13509, 2019. mapping and its validation in part of Garhwal Lesser Himalaya, India,
[34] L.-Y. Cao and L.-Z. Xia, ‘‘An ant colony optimization approach for SAR using binary logistic regression analysis and receiver operating character-
image segmentation,’’ in Proc. Int. Conf. Wavelet Anal. Pattern Recognit., istic curve method,’’ Landslides, vol. 6, no. 1, pp. 17–26, 2009.
Nov. 2007, pp. 296–300. [55] H. R. Pourghasemi, A. G. Jirandeh, B. Pradhan, C. Xu, and C. Gokceoglu,
[35] H. B. Alwan and K. R. Ku-Mahamud, ‘‘Optimizing support vector machine ‘‘Landslide susceptibility mapping using support vector machine and GIS
parameters using continuous ant colony optimization,’’ in Proc. 7th Int. at the Golestan Province, Iran,’’ J. Earth Syst. Sci., vol. 122, no. 2,
Conf. Comput. Converg. Technol. (ICCCT), Dec. 2012, pp. 164–169. pp. 349–369, 2013.
74584 VOLUME 7, 2019

[56] B. T. Pham, D. T. Bui, and I, Prakash, ‘‘Bagging based support vector BISWAJEET PRADHAN received the Habilita-
machines for spatial prediction of landslides,’’ Environ. Earth Sci., vol. 77, tion degree in remote sensing from the Dres-
no. 4, p. 146, Feb. 2018. den University of Technology, Germany, in 2011.
[57] O. F. Althuwaynee, B. Pradhan, H.-J. Park, and J. H. Lee, ‘‘A novel He is currently the Director of the Centre for
ensemble decision tree-based CHi-squared automatic interaction detection Advanced Modelling and Geospatial Information
(CHAID) and multivariate logistic regression models in landslide suscep- Systems (CAMGIS), Faculty of Engineering and
tibility mapping,’’ Landslides, vol. 11, no. 6, pp. 1063–1078, 2014. IT. He is also the Distinguished Professor with the
[58] B. T. Pham and I. Prakash, ‘‘A novel hybrid model of bagging-based Naïve
University of Technology Sydney. He is also an
Bayes Trees for landslide susceptibility assessment,’’ Bull. Eng. Geol.
internationally established Scientist in the fields
Environ., vol. 78, no. 3, pp. 1911–1925, 2017.
[59] H. Hong, I. Ilia, P. Tsangaratos, W. Chen, and C. Xu, ‘‘A hybrid fuzzy of geospatial information systems (GIS), remote
weight of evidence method in landslide susceptibility analysis on the sensing and image processing, complex modeling/geo-computing, machine
Wuyuan area, China,’’ Geomorphology, vol. 290, pp. 1–16, Aug. 2017. learning and soft-computing applications, natural hazards, and environmen-
tal modeling. Since 2015, he has been serving as the Ambassador Scientist
for the Alexander Humboldt Foundation, Germany. Out of his more than
450 articles, more than 400 have been published in science citation index
(SCI/SCIE) technical journals. He has authored eight books and 13 book
chapters. He was a recipient of the Alexander von Humboldt Fellowship
ALI MUTAR FANOS was born in Baghdad, from Germany. He received 55 awards in recognition of his excellence in
Iraq, in 1987. He received the bachelor’s degree teaching, service, and research, since 2006. He was also a recipient of the
in surveying engineering from the University of Alexander von Humboldt Research Fellowship from Germany. From 2016 to
Baghdad, Iraq, in 2008, and the M.Sc. degree 2018, he was listed as the World’s Most Highly Cited Researcher by Clarivate
in GIS and geomatic engineering from Univer- Analytics Report as one of the world’s most influential mind. In 2018,
siti Putra Malaysia (UPM), in 2016, where he he was awarded as the World Class Professor by the Ministry of Research,
is currently pursuing the Ph.D. degree in geo- Technology and Higher Education, Indonesia. He is also an Associate Editor
graphic information system (GIS) with the Faculty and an Editorial Member of more than eight ISI journals. He has widely
of Engineering. Since 2008, he has been a Senior travelled abroad, visiting more than 52 countries to present his research
Engineer with the State Commission on Survey, findings.
Ministry of Water Resources, in different projects in Iraq.
VOLUME 7, 2019 74585
View publication stats

MutarandPradhan2019 IEEEACCESS FullCitation

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

MutarandPradhan2019 IEEEACCESS FullCitation

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

A Spatial Ensemble Model for Rockfall Source Identiﬁcation from High

Article in IEEE Access · May 2019

Ali Fanos Biswajeet Pradhan

SEE PROFILE SEE PROFILE

The user has requested enhancement of the downloaded file.

A Spatial Ensemble Model for Rockfall Source

Sydney, Ultimo, NSW 2007, Australia

Corresponding author: Biswajeet Pradhan (biswajet.pradhan@uts.edu.au)

I. INTRODUCTION goods. Rockfall can be caused by rainfall, weathering, joint-

VOLUME 7, 2019 74571

FIGURE 1. Study area.

III. USED DATA

A. ROCKFALL CONDITIONING FACTORS

74572 VOLUME 7, 2019

subsets (training (70%) and testing (30%)) ensuring each

VOLUME 7, 2019 74573

74574 VOLUME 7, 2019

training dataset into two separate groups, train many base

VOLUME 7, 2019 74575

With mean vector µi and covariance matrix i The mix-

74576 VOLUME 7, 2019

TABLE 2. The assessment of multicollinearity for the conditioning factors

B. THE RESULT OF THE OPTIMIZED CONDITIONING

74578 VOLUME 7, 2019

VOLUME 7, 2019 74579

The optimal k value was determined to be 5 for all the

74580 VOLUME 7, 2019

TABLE 7. Accuracy assessment of bagging models based on the selected algorithms.

TABLE 8. Accuracy assessment of voting models based on the selected algorithms.

TABLE 9. Accuracy assessment of stacking models based on the selected algorithms

VOLUME 7, 2019 74581

TABLE 10. Comparison of the recently proposed ensemble models.

74582 VOLUME 7, 2019

RF achieved the highest overall accuracy of 86.64%, 92.89%,

VOLUME 7, 2019 74583

74584 VOLUME 7, 2019

VOLUME 7, 2019 74585

View publication stats

You might also like