
Flash‑flood susceptibility mapping: a novel credal decision tree‑based ensemble approaches

Article in Earth Science Informatics · August 2023


DOI: 10.1007/s12145-023-01057-w



Earth Science Informatics
https://doi.org/10.1007/s12145-023-01057-w

RESEARCH

Dingying Yang1 · Ting Zhang1 · Alireza Arabameri2 · M. Santosh3,4 · Ujwal Deep Saha5 · Aznarul Islam6

Received: 2 May 2023 / Accepted: 18 July 2023


© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023

Abstract
The escalation of flash floods and the devastation they cause, especially in the arid and semi-arid regions of the world, calls for precise mapping of flash flood susceptible zones. In this study, we applied six novel credal decision tree (CDT)-based models: (1) CDT, (2) CDT-Alternating Decision Tree (ADTree), (3) CDT-Reduced Error Pruning Tree (REPT), (4) CDT-Rotation Forest (RF), (5) CDT-Function Tree (FT), and (6) CDT-Naïve Bayes Tree (NBTree). To prepare the flash flood susceptibility maps (FFSM), 206 flood locations were selected in the Neka-roud watershed of Iran, with 70% used as training data and 30% as testing data. Moreover, 18 flood conditioning factors were considered for FFSM, and a multicollinearity test was performed to assess the role of the factors. Our results show that the distance from the stream plays a vital role in flash floods. The CDT-FT is the best-fit model of the six algorithms employed in this study, as demonstrated by the highest values of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve (AUROC 0.986 for training data and 0.981 for testing data). Our study provides a novel approach and a useful tool for flood management.

Keywords Flash flood mapping · Machine learning algorithms · Credal decision tree · Novel Ensemble models · Flood
management · Neka-roud watershed

Communicated by: H. Babaie

* Ting Zhang, zhangtingqwea@sina.com
Dingying Yang, yangdingyingqwe@sina.com
Alireza Arabameri, Alireza.ameri91@yahoo.com
M. Santosh, santosh@cugb.edu.cn
Ujwal Deep Saha, sahaujwal.geo@gmail.com
Aznarul Islam, aznarulislam@gmail.com

1 College of Civil Engineering, Fuzhou University, Fuzhou 350116, China
2 Department of Geomorphology, Tarbiat Modares University, Tehran 14117‑13116, Iran
3 School of Earth Sciences and Resources, China University of Geosciences Beijing, 29 Xueyuan Road, Beijing 100083, China
4 Department of Earth Science, University of Adelaide, Adelaide, SA 5005, Australia
5 Department of Geography, Vidyasagar College, Kolkata 700006, India
6 Department of Geography, Aliah University, 17 Gorachand Road, Kolkata 700 014, West Bengal, India

Introduction

Flash floods are a common natural hazard worldwide, especially in regions that receive short-duration, intense downpours on relatively steep slopes (Gao et al. 2023; Ghosh et al. 2023). Compared with other forms of flooding such as fluvial or coastal floods, flash floods are often considered more disastrous because they occur abruptly, leaving the local community unprepared (Islam and Ghosh 2022). Awareness of the importance of flash flood management (FFM) has been increasing from the local to the global scale. Thus, mapping and monitoring flash floods have become one of the most important facets of FFM. Traditionally, Analytical Hierarchy Process (AHP) and Frequency Ratio (FR) models have been widely used to generate flash flood susceptibility maps (FFSM) (Tariq et al. 2022). Elkhrachy (2015) computed a composite flood hazard index (FHI) using satellite images in a GIS framework to better map the flash flood
area in Najran City, Kingdom of Saudi Arabia. Alarifi et al. (2022) mapped flash flood zones of the Wadi Hali in southwestern Saudi Arabia using the AHP method in a GIS environment, which provided a useful tool for regional planning. Similarly, Dash et al. (2022) used multi-criteria decision analysis and GIS to produce FFSM in the Himalayan region of India, which is characterized by flash floods during the monsoon season. Aldhshan et al. (2019) used SAR data-based binary algorithms to map the flash flood susceptible zones in Bangladesh.

In recent years, owing to advances in computer science (Huang et al. 2021; Zhang et al. 2022a, b; Li et al. 2022a, b; Dang et al. 2023; Wang et al. 2023a) and remote sensing (Zhuo et al. 2022; Zhou et al. 2021, 2022a, 2022b; Yin et al. 2022, 2023a, b), modeling has been widely used (Li et al. 2020; Xie et al. 2021; Wu et al. 2022; Zhang et al. 2022a, b; Yang et al. 2022; Zhu et al. 2022). Many data-driven modelling approaches based on machine learning (ML) and deep learning (DL) algorithms have been used to identify flash flood-susceptible zones (Saleh et al. 2020). For example, Shahabi et al. (2021a, b) produced FFSM using a novel deep learning model based on deep belief networks, backpropagation and a genetic algorithm. Saleh et al. (2022) applied genetic algorithms and ensemble methods to map the flood susceptible zones in an urban area. Similarly, Chakrabortty et al. (2022) used ML-based algorithms such as support vector machine (SVM), random forest (RF), linear regression (LR), and decision tree (DT) to map the flash floods of the Kangsabati River Basin, India. Yin et al. (2023a, b) used SVM, K-nearest neighbor (KNN), and RF within an improved blending machine learning approach for mapping. Moreover, Islam et al. (2021) mapped the flood susceptibility of the Teesta River Basin, Bangladesh, and found that the Dagging model was superior, followed by RF, ANN, SVM, and Random Subspace (RS).

Previous studies identified that flash floods are mainly concentrated in the monsoon seasons, particularly in the arid and semi-arid regions of the world (AlMahasneh et al. 2021; Singh et al. 1993). In January 2020, three days of heavy rain led to flash floods across the southern provinces of Iran (NASA Earth Observatory 2023). The Iranian flash floods have drawn the attention of scientists and stakeholders worldwide. Pouyan et al. (2021) suggested multi-hazard (MHR) mapping in the context of Iran for better management of natural resources and human lives. Several studies have addressed the Iranian flash floods, most of which were concerned with FFSM based on traditional, statistical, stochastic, ML, and DL methods (Shahabi et al. 2021a). The methods used to produce FFSM have evolved from traditional approaches (e.g. AHP and FR) to ML approaches (e.g. SVM and DT). Although the applications of these methods are well documented, the credal decision tree (CDT)-based ensemble methods provide superior algorithms for the modelling of spatial events. For example, Arabameri et al. (2021a) applied these methods in the context of gully erosion, while Nguyen et al. (2020) used these models for groundwater flow modelling. Similarly, Gui et al. (2023a, b) applied CDT-based models for the spatial prediction of landslides. This approach provides a high level of reliability in spatial prediction modelling. Arabameri et al. (2020a, b) have shown that meta-classifiers achieve higher accuracy than base classifiers. Similarly, Roy and Saha (2020) have shown that ensembles applied together with base classifiers have a better predictive capacity than classifiers applied individually. Thus, it is recommended to set a novel base classifier and apply a novel meta-classifier on it (Arabameri et al. 2021a, b), as the ensemble approach outperforms individual machine learning algorithms (MLA) in precision. Compared with other statistical models and individual MLA applications, ensemble models with base classifiers are more accurate for FFSM. However, these novel ensemble methods have not been formulated or applied in the context of FFM globally.

In the present study, we applied novel credal-based ensemble approaches to evaluate the severity of flash floods in Iran. This is perhaps the first attempt to apply certain proven, powerful ensemble models in mapping FFSM. A few of these hybrid models have already been applied in previous works on FFSM, but the approaches were somewhat disparate, as those studies did not apply ensemble models exclusively. For example, Khosravi et al. (2018) applied a mixture of MCDM models and machine learning algorithms. Gudiyangada Nachappa et al. (2020) also applied a combination of MCDM models, supervised machine learning algorithms and ensemble methods. The selected ensemble models with a CDT base classifier have been applied mostly in mapping landslide susceptibility or gully erosion, where they were documented as best-fit models with a relatively higher degree of significance. Thus, we used these CDT-based ensemble models in this study area to identify the best-fit model with the maximum achievable degree of significance. Examining the contribution of the morphometric factors together with other flood-affecting factors is a new approach to understanding floods at the basin scale. Moreover, a new hybrid method, CDT-FT, is applied for the first time in a scientific study that deals with mapping flood susceptibility.

Our study addresses the following objectives: (1) to formulate and apply the novel credal-based ensemble methods for flood susceptibility mapping, and (2) to delineate the flash flood susceptibility zones and their spatial characteristics based on the best-fit model of the study. The present study is based on the geospatial and statistical data at
the watershed scale in Iran to find the best possible solution in the context of flash floods and to aid policymakers in future planning and development. Moreover, as this study is probably the first attempt of its kind in FFM on a global scale, the approach can be applied to other similar regions for flood risk management.

Methodology

Study area

The Neka-roud watershed lies between longitudes 53° 04′ 08.81″ E and 54° 08′ 53.65″ E and latitudes 35° 58′ 34.67″ N and 36° 28′ 34.68″ N, in Mazandaran province, northern Iran (Fig. 1). The total drainage area of the Neka-roud watershed is 3768.33 km². The perimeter of the study area is 1929.79 km and the basin length of the watershed is 141.95 km. Elevations in the study area range from 95 to 3711 m a.s.l., with a mean of 1443 m. The study area is humid to semi-humid and experiences a mean annual rainfall of 690 mm; annual rainfall ranges between 399.9 mm and 1331.6 mm. The mean annual temperature is 12.8 °C. The north and northwest parts of the study area are flat, but the south and southeast are mountainous. The maximum slope of the study area is 76.22° and the mean is 16.58°. The southern and southeastern landscapes are much rougher terrains. The topography is quite heterogeneous: 39.85% of the region is convex, 37.46% is concave, and 22.67% is flat. Nine lithological groups outcrop in the watershed (Geology Survey of Iran, 1997), with 24.6% Mc (red conglomerate and sandstone) and 22.53% Tre (thick-bedded grey oolitic limestone; thin-platy, yellow to pinkish shaly limestone with worm tracks and well to thick-bedded dolomite and dolomitic limestone). More than 60% of the study area is covered with forest, 21% is rangeland and 17% is agricultural land.

Fig. 1 a Location of the study area in Iran, b location of the study area in Mazandaran and Semnan provinces, c location of training and validation flood data in the study area, d, e, f representative photos of floods in the study area

Preparation of flood inventory map

In this study, novel credal decision tree-based ensemble models were used to map flash flood susceptibility (FFS). The methodological steps are illustrated in Fig. 2. Here, the Credal Decision Tree (CDT) and its five ensembles are used to produce a flash flood susceptibility map, with the aim of improving model accuracy compared with traditional machine learning approaches (MLAs). The workflow involves the following steps: (1) preparation of the flood inventory map; (2) selection of the necessary flood conditioning factors; (3) modeling FFS using CDT and its ensembles, namely Alternating Decision Tree (ADTree), Function Tree (FT), Rotation Forest (RF), Naïve Bayes Tree (NBTree), and Reduced Error Pruning Tree (REPT); (4) validation and efficiency assessment of the models; and (5) preparation of the flash flood susceptibility map.

Fig. 2 Flowchart of the methodology adopted in the present study

For preparing the FFS map, the flood location was considered as the dependent variable and the effective flood conditioning factors were selected as independent variables. A flood inventory map is essential for the spatial prediction of flash flood probability. To prepare it, we used the reports of the Water Resources Department, Iran, from 2001 to 2019, investigative reports on disaster management of Mazandaran Province, and extensive field investigation. In the study area, a total of 206 flood sites were identified and randomly divided into two data sets, i.e. training (70%) and validation (30%) datasets (Fig. 1). According to the reports from the agency, the floods caused a huge loss of property and lives (Fig. 1).

Preparation of conditioning factors

Generally, there is no universal rule for selecting conditioning factors for FFS. The selection of the conditioning factors varies from one place to another depending upon the environmental conditions and the pattern of historical occurrence of flash flood events (Gui et al. 2023a, b). Here, a total of 18 conditioning factors were chosen to prepare the FFS map. Four different perspectives, namely topographical condition, hydrological condition, environmental setup and basin morphometry, were used to select the parameters relevant to this study area (Fig. 3).

Topographic factors

Elevation, slope and surface curvature are used as topographic parameters. Elevation is an essential factor in evaluating the flashiness of flood events, as water tends to accumulate more at lower elevations (Cao et al. 2016). In rugged mountainous terrain, elevation defines the nature of the fluvial process, as the discharge volume is related to the upstream catchment area. Here, the elevation ranges from 95 to 3711 m.

Surface slope acts positively to increase the flashiness of water discharge, and it is a key component in understanding the topographic properties of the region (Rahmati et al. 2016). The slope gradient is a major factor controlling the velocity of water moving along the surface as well as the direction of runoff, and it also controls the infiltration and surface accumulation of water (Chapi et al. 2017). Here, the slope varies from almost flat land (0°) to highly sloping (77°).


Fig. 3 Flood conditioning factors. a elevation, b slope, c topographic wetness index (TWI), d stream power index (SPI), e rainfall, f NDVI, g land use/land cover (LU/LC), h lithology, i curvature, j distance to stream (DtStream), k distance to residential (DtResidential), l texture ratio, m mean bifurcation ratio (MBR), n compactness coefficient, o form factor, p elongation ratio, q constant of channel maintenance (CCM), r infiltration number
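
Each of the 206 inventory sites is described by the factors mapped in Fig. 3 and is assigned at random to the training (70%) or validation (30%) subset. The study does not report the exact subset sizes or how the random draw was made, so the sketch below is only one plausible way to do it. The function name `split_inventory`, the site identifiers and the seed value are all assumptions made for illustration.

```python
import random


def split_inventory(site_ids, train_fraction=0.7, seed=1):
    """Randomly divide inventory sites into training and validation subsets.

    Illustrative sketch: the seed is arbitrary and only makes the shuffle
    reproducible; it is not taken from the study.
    """
    ids = list(site_ids)
    random.Random(seed).shuffle(ids)              # reproducible shuffle
    n_train = round(train_fraction * len(ids))    # 70% of 206 rounds to 144
    return ids[:n_train], ids[n_train:]


sites = range(1, 207)                             # 206 hypothetical site IDs
train, valid = split_inventory(sites)
print(len(train), len(valid))
```

A purely random split like this one ignores spatial clustering of flood sites; whether the authors controlled for that is not stated.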


Curvature is a direct derivative of topography and a quantitative measure of the distortion of a point on the terrain surface, including the topographic planform (Torcivia and López 2020). Curvature can be classified into convex, concave and flat classes. Although there is an ongoing debate about run-off generation on different curvature classes, the nature of curvature determines the magnitude of discharge accumulation (Chapi et al. 2017). Here, these three topographic components were computed using the ALOS DEM with a 12.5 m spatial resolution. The DEM was downloaded
free of cost from https://vertex.daac.asf.alaska.edu. Here, the curvature also exhibits a wide range (−27 to 23).

Hydrological factors

To measure the hydrological influences of the catchment, mean annual rainfall, distance to stream, Topographic Wetness Index (TWI) and Stream Power Index (SPI) were considered. The duration and intensity of rainfall influence the flashiness of river discharge. Short-duration torrential rainfall can create an exceptionally high peak of river discharge that lasts for a short time (Lu et al. 2020; Peptenatu et al. 2020). We prepared a rainfall map using 30 years of daily average rainfall data measured at 12 stations located within the watershed. These data were collected from the website of the Iran Meteorological Association. The point-source rainfall data were interpolated using the Kriging method.

Areas lying closest to the river are more prone to flooding, and this probability decreases with increasing distance from the river unless an abnormally high magnitude flood event occurs (Tehrany et al. 2015). The river network of the catchment was extracted using the ALOS DEM, and the distance-to-stream map was based on the Euclidean distance from channels; it exhibited a minimum distance of 0 m and a maximum of 2788 m in the watershed.

The topographic wetness index (TWI) is a hydrological metric that defines the location of possible water accumulation and flood possibility for each pixel of a watershed (Nhu et al. 2020). It relates the specific catchment area to the slope angle. Here, it was calculated using the ALOS DEM and formulated as follows:

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \tag{1}$$

where A_s is the total upslope area that drains through a point (per unit contour length) and β is the slope gradient (in degrees). The natural slope classification method was finally used to prepare the TWI raster, which ranges from 0.79 to 23.

The Stream Power Index (SPI) provides a measure of flow power at any given point within a watershed (Poudyal et al. 2010). A higher SPI indicates greater stream power and a possible positive correlation with the flashiness of channel discharge (Moore and Wilson 1992). SPI is formulated as follows:

$$\mathrm{SPI} = A_s \times \tan\beta \tag{2}$$

where A_s is the upstream contributing area and β is the slope gradient (in degrees). The SPI raster ranges from 6 to 25.

Environmental factors

It is particularly difficult to analyse the role of the environmental setup in moderating or intensifying the nature of flash floods. Researchers have used NDVI to identify the nature of vegetation coverage on different slope facets (Gao 1996). Dense vegetal cover tends to moderate the havoc of flash floods by intercepting a part of the runoff (Tehrany et al. 2015). Higher vegetation cover was assumed to induce a lower probability of flood occurrence in the study area. Here, the NDVI is computed using Eq. (3).

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R} \tag{3}$$

where NIR is the near-infrared band (band 5) and R is the red band (band 4). We used a Landsat 8 OLI image acquired in 2018 to prepare the NDVI map; the resulting raster ranges from −0.06 to 0.55.

Land use can affect the infiltration capacity of any terrain (Islam et al. 2021). A bare surface is capable of producing greater runoff than a vegetated surface developed on the same lithological foundation (Rahmati et al. 2016). We used the Landsat 8 OLI image acquired in 2018 to prepare the LULC map of the watershed using the maximum likelihood classifier in a supervised classification scheme. The entire watershed was classified into 5 classes: agricultural land, orchards, forest, rangeland and water.

The lithology of the terrain controls the infiltration capacity and thus influences the nature of runoff (Shahabi et al. 2021a, b). Flooding can be affected by lithology and structure, especially porosity, permeability, and fractures (Derbyshire et al. 2013). Resistant rocks with lower porosity restrict infiltration and thus contribute positively to flooding. A geological map prepared at 1:100000 scale by the Geological Society of Iran was used to map the lithological condition of the watershed. The entire area contains a total of 9 different lithological classes.

Morphometric condition

The morphometric attributes of the watershed were obtained at the sub-watershed level. The watershed was divided into 42 sub-watersheds. The sub-watershed polygons were used in calculating the areal morphometric parameters and the ALOS DEM (12.5 m) was used in computing the relief parameters.

The basin Texture Ratio (T_R) is calculated by dividing the total number of streams of all orders by the perimeter of the watershed. A higher T_R indicates greater intensity of dissection and higher erosion and thus higher runoff generation (Hamid 2013). This raster ranged from 0.861 to 3.68.

The Bifurcation Ratio (R_b) is formulated as the ratio of the total number of channels in a particular order to that in the next higher order. Here, the average R_b was calculated for each of the sub-watersheds. A high bifurcation ratio indicates a greater number of 1st and 2nd order streams thus
leading to higher erosion and generation of high runoff (Schumm 1956). This raster varied from 3.52 to 41.26.

The ratio between the watershed area and its perimeter is conceived as the Compactness Coefficient (C_c), while the Form Factor (F_f) is the ratio between the basin area and the squared length of the basin. The Elongation Ratio (E_r) is the square root of the ratio between basin area and basin length. These three parameters are direct indicators of basin shape, which influences runoff generation significantly (Horton 1945). Elongated watersheds generate flat discharge peaks of longer duration compared to circular watersheds (Schumm 1956). This raster ranged from 1.15 to 1.89.

The constant of channel maintenance (CCM) is the reciprocal of drainage density. It influences the degree of runoff generation per unit area of a watershed. A lower value of CCM indicates higher areal occupancy within a watershed under corresponding channel segments. It allows for generating more runoff on a higher slope (Schumm 1956). This raster contained a minimum of 0.803 and a maximum of 1.28.

The Infiltration number (I_f), the ratio between drainage density and drainage frequency, is an important factor in studying flash flood characteristics. A higher I_f indicates higher runoff generation and thus lower infiltration capacity (Alam et al. 2021). Watersheds with impervious surfaces, higher slope components, and a lack of vegetal cover will register a high I_f. This raster had a minimum of 0.141 and a maximum of 0.927.

Evaluation of multicollinearity

Multicollinearity occurs when two or more predictor variables in a regression model are highly correlated with each other. This can lead to unstable and unreliable coefficient estimates and make it difficult to interpret the effects of individual predictors on the outcome variable. It is therefore necessary to identify how far collinearity affects the overall result of the model. The Variance Inflation Factor (VIF) is a common measure of multicollinearity. VIF measures the extent to which the variance of the estimated coefficient of a predictor variable is increased due to the correlation between that variable and the other predictor variables in the model (Kariminejad et al. 2019). The VIF for a predictor is calculated using Eq. (4).

$$\mathrm{VIF} = \frac{1}{1 - R^2} \tag{4}$$

where R² is the coefficient of determination for the regression model. A VIF value of 1 indicates no multicollinearity, values above 1 indicate increasing levels of multicollinearity, and a VIF value of 5 or 10 is considered to indicate problematic multicollinearity (Pourghasemi et al. 2013; Kariminejad et al. 2019).

Modeling approaches

Credal Decision Tree (CDT)

A credal decision tree (CDT) represents uncertain or imprecise information about the probabilities of events. CDTs allow probabilities to be represented as intervals or sets of possible values. Abellán and Moral (2003) introduced the CDT to solve classification problems through a credal set application in which each node represents a decision point and each branch represents a possible decision or event outcome. CDT applies split criteria in which imprecise uncertainty and probability are considered. The probabilities associated with each branch are represented as intervals or sets of possible values, rather than point estimates. The leaves of the tree represent the final decision or outcome, which can also be represented as an interval or set of possible values. The entire uncertainty (EU) is defined using Eq. (5).

$$\mathrm{EU}(n) = \mathrm{NC}(n) + \mathrm{RC}(n) \tag{5}$$

where n denotes the credal set, and NC and RC represent the common non-specificity and randomness, respectively.

For a general credal set on frame X, NC can be calculated using Eq. (6):

$$\mathrm{NC}(n) = \sum_{A \subset X} m_n(A)\,\ln(|A|) \tag{6}$$

where m is the mass of credal set n and A is the power set of X.

For a general credal set on frame X, RC can be calculated using Eq. (7):

$$\mathrm{RC}(n) = \max\left\{ -\sum_{x \in X} p_x \ln p_x \right\} \tag{7}$$

where the maximum is taken over all probability distributions of credal set n.

Rotation Forest (RF)

RF is an ensemble method that improves the performance of weak classifiers. It is an ensemble learning algorithm that combines the outputs of multiple decision tree classifiers by applying a transformation to the input data before building each tree (Rodríguez et al. 2006). The base of the RF model depends on the Random Forest algorithm, but the RF model is capable of handling both multi-dimensional and small datasets. This ensemble, combined with CDT, is widely used for solving multi-criteria decision problems (Arabameri et al. 2021a, b). The classification using the RF model can be assessed using Eq. (8).

$$V_{\alpha}(n) = \sum_{j=1}^{l} f_{m,n}\left(n S_{b}^{j}\right), \quad (j = 1, \ldots, d) \tag{8}$$
$$n = \arg\max\left(V_{\alpha}(n)\right), \quad (v \in D) \tag{9}$$

Alternating decision tree (ADTree)

ADTree is a widely applied ensemble model based on the principle of boosting. It was developed by Freund and Mason (1999) and is effectively used in solving multi-criteria decision-making (MCDM) problems. An ADTree is a type of decision tree that uses alternating tests at each node, where each test corresponds to a feature and its threshold value.

ADTrees are particularly useful when dealing with noisy data, and they can effectively prune away features not likely to contribute to the classification. The model is highly reliable in forecasting problems, as it is regarded as consistent in providing good accuracy (Arabameri et al. 2021a, b). The ADTree-CDT can handle both uncertainty and noise in the data, while also pruning away irrelevant features. The algorithm also allows for the construction of more compact and interpretable decision trees than traditional CDTs. The ADTree model can be assessed using Eq. (10).

$$T(b) = 2\left(\sqrt{V_{+}(b)\,V_{-}(b)} + \sqrt{V_{+}(-b)\,V_{-}(-b)}\right) + V' \tag{10}$$

V_+(b) and V_-(b) refer to the complete weight of the calibration data, V′ represents the overall weight of the dataset which does not fit the forecast node, and b represents partition testing. The optimum level of partition testing is obtained by determining the least value of T. In the repetitive split test, the pruning method applied in this approach is expressed using Eq. (11).

$$T_{\mathrm{pure}} = 2\left(\sqrt{V_{+}} + \sqrt{V_{-}}\right) + V' \tag{11}$$

Here, T_pure signifies the lowest threshold of T.

Naïve Bayes Tree (NBTree)

The Naïve Bayes Tree (NBT) is the model most commonly used by researchers because it takes very little computer memory, performs efficiently and is easy to interpret (Wang et al. 2015). It combines the Naïve Bayes (NB) and Decision Tree (DT) algorithms based on the Bayes theorem (Kohavi 1996). This model uses an entropy model for growing trees and splits data at a node, while a leaf is generated on the data with a local NB model at that specific node. In this model, the pre-pruning technique decides whether the data are split at the node or a leaf is generated on the data using the NB model. Here, the gain ratio values are computed to control tree growth using Eq. (12).

$$\mathrm{Gain\ ratio}(x, Y) = \frac{E(Y) - \sum_{i=1}^{n} \frac{|Y_i|}{|Y|}\,E(Y_i)}{-\sum_{i=1}^{n} \frac{|Y_i|}{|Y|}\,\log_2 \frac{|Y_i|}{|Y|}} \tag{12}$$

where |Y_i| is the number of flash flood conditioning factors belonging to the class Y_i and E is the entropy function. The independence assumption between the conditioning factors x_1, x_2, …, x_n is included in the NBT as class conditional independence (Shirzadi et al. 2017).

Reduced Error Pruning Tree (REPT)

Reduced Error Pruning (REP) is a commonly used pruning technique that reduces the complexity of a decision tree while still maintaining its accuracy (Khosravi et al. 2018). The basic idea behind reduced error pruning is to iteratively remove branches from the tree that do not improve its accuracy on a validation set. The REPT is a hybrid approach of the REP method which builds a decision tree based on the information gain and is capable of reducing overfitting in decision trees without any significant loss of accuracy (Polo et al. 2008). The DT builds the classification tree by looking for the input variable with the highest gain ratio (Tien Bui et al. 2012). The REP pruning method is used to decrease the complexity of the tree structure by removing some leaves and branches of the tree. The maximum gain ratio can be computed using Eq. (12).

Generation of flash flood susceptibility maps

Following the training and validation of the flash flood models, the susceptibility maps were obtained using the following steps. Firstly, the probability of flood occurrence for each pixel was generated using the probability distribution functions of the CDT, CDT-ADTree, CDT-REPTree, CDT-RF, CDT-FT, and CDT-NBTree models in the packages and codes included in the Weka software. The model parameters utilized for training the models are listed in Table 1. Lastly, the susceptibility maps were reclassified using the quantile method into five classes: very low, low, moderate, high, and very high. The quantile method was used since it provides a more comprehensive analysis for both linear and nonlinear models in practical problems and makes a useful supplement for general regression models (Ching and Phoon 2023; He et al. 2019).

Table 1 List of the parameters used in different models

No  Parameter                                     CDT    CDT-ADTree  CDT-REPTree  CDT-RF  CDT-FT  CDT-NBTree
1   KTH Root Attribute                            1      -           -            -       -       -
2   S Value                                       1.0    -           -            -       -       -
3   Initial class value count                     0      -           -            -       -       -
4   Maximum tree depth                            -1     -           -            -       -       -
5   Minimum total weight of instances in a leaf   2.0    -           -            -       -       -
6   Minimum proportion of the variance            0.002  -           -            -       -       -
7   Number of Decimal Places                      3      3           3            3       3       3
8   Number of Folds                               2      -           3            -       -       -
9   Number of Seed                                1      1           1            1       1       1
10  Batch Size                                    -      100         100          100     100     100
11  No. of trees                                  -      32          3854         11      75      36
12  No. of correct instances                      -      5398        4412         6542    6325    5391
13  Maximum Group                                 -      -           -            3       -       -
14  Minimum Group                                 -      -           -            3       -       -
15  The number of iteration                       -      -           -            17      -       -
16  Removal percentage                            -      -           -            50      -       -
17  The minimum number of instance                -      -           -            -       15      -
18  The number boosting iterations                -      -           -            -       15      -
19  Trim Beta                                     -      -           -            -       0       -
20  The maximum depth of the trees                -      -           -1           -       -       -
21  The minimum number,                           -      -           2            -       -       -
22  The minimum variance probability              -      -           0.001        -       -       -

Validation of best fit model

Performance assessment is one of the most important steps in scientific work (Wang et al. 2022, 2023b; Zhao and Wang 2022; Qi et al. 2022; Zhou et al. 2022a, b, c; Yue et al. 2021). The performance of the models is evaluated by the validation method (Li et al. 2020, 2021; Liu et al. 2023).

Receiver Operating Characteristic (ROC) was used to validate the results derived from all 6 models. The selection and comparison of the training data and validation data are important as they can influence the validation results (Nguyen et al. 2020). Thus, a careful selection of validation data sets is important. 30% of the flood inventory points were selected to validate the models using the Area Under Curve (AUC) of the ROC technique. The ROC curve is a commonly used method for the validation of binary classification models (Termeh et al. 2019). The curve is created by computing sensitivity and specificity: it is constructed by plotting the values of two statistical indexes, “sensitivity” and “1-specificity”, on the ordinate and abscissa respectively. AUC (Eq. 13) is often utilized quantitatively to validate and compare the predictive capability of the models. An AUC from 0.5 to 0.6 denotes an incompetent model, a value between 0.6 and 0.7 indicates poor performance, a value between 0.7 and 0.8 indicates moderate performance, and a value greater than 0.8 indicates a high fitness of the model with the dataset (Saha and Bhattacharya 2020).

$$ AUC = \frac{\sum TP + \sum TN}{P + N} \tag{13} $$

Apart from the AUROC indicator, True Skill Statistics (TSS) was also used to test the significance of the models. It combines both sensitivity and specificity so that both omission and commission errors are accounted for (Allouche et al. 2006). It is traditionally used for measuring the accuracy of weather models (McBride and Ebert 2000). TSS considers both omission and commission errors and success as a result of random guessing. It ranges from −1 to +1; +1 indicates perfect agreement and values of zero or less indicate a performance no better than random. It is calculated as follows:

$$ TSS = \frac{ad - bc}{(a + c)(b + d)} = \text{Sensitivity} + \text{Specificity} - 1 \tag{14} $$

Seed Cell Area Index (SCAI)

The Seed Cell Area Index was used to measure the accuracy of the applied forecast models in this study. SCAI is one of the most commonly used measures to test the performance of forecasting models (Arabameri et al. 2021a, b). It was developed by Süzen and Doyuran (2004) as a ratio between the total amount of pixels of the particular FFS class and the total amount of pixels of flooded areas within that category. This bivariate statistical method denotes the relationship between occurrences of flash floods and their conditioning factors (Gudiyangada Nachappa et al. 2020). If the SCAI value


decreases from a very low to a very high class of FFS, the model is regarded as excellent (Arabameri et al. 2020a, b).

Results

Analysis of feature selection

The conditioning factors of FFS are selected based on ground knowledge. Here, the topographic derivatives, spatial variations in climatic factors, fluviometric properties, spatial allocation of human practices and basin morphometric parameters are given the major thrust. To complete this task scientifically, correlation-based factor selection is considered as it is capable of representing the individual degree of influence of the selected factors. This method was initially applied to 18 factors and, according to the correlation values, the hierarchy has Distance to stream at the top followed by TWI, SPI, Slope, Elevation, LU/LC, Rainfall, MBR, Infiltration number, Form factor, Texture ratio, Elongation ratio, Distance from residential, NDVI, Lithology, CCM, Compactness and Curvature. Closeness to the channel with resident soil moisture, the relative volume of flow accumulation and its varying degree of slope alongside the respective elevation initially determine the flood susceptibility scenario in this study area (Fig. 4).

Outcome of multi-collinearity test

The collinearity problem among the selected parameters was tested by calculating the VIF value and the tolerance value. Tolerance is the reciprocal of VIF. Elevation and Curvature received the maximum and minimum VIF of 5.747 and 1.181 respectively, with corresponding tolerance values of 0.174 and 0.847. Based on the VIF values, no direct linear relationship persists between any variable and the others, so all these variables can be used in computing flood-susceptible areas (Table 2).

Model validation

The training data has been used for preparing the flood susceptibility map while the testing data is utilised to validate the relative performances of the six models applied in this study (Fig. 5). Based on the respective area under the ROC curve, the CDT-FT is found to be the best-fit model for both the training and testing data, with AUROC 0.986 and 0.981 respectively. For both the training and testing data, the second-best performance has been registered by the CDT-REPT model, with AUROC values of 0.983 and 0.979 respectively (Fig. 6).

With respect to both the training and testing data sets, the CDT-REPT model is followed by CDT-RF (AUROC 0.979 and 0.976 respectively), CDT-ADTree (0.970 and 0.970 respectively) and CDT-NBTree (0.967 and 0.953 respectively). The least fit model among all the applied Credal Decision Tree functions is CDT, with AUROC 0.955 and 0.920 for the training and validation datasets. To test the efficiency of these models the TSS was also computed. The TSS is highest for the CDT-FT model for both the training and testing datasets; the values of 0.875 and 0.872 respectively signify a near-perfect agreement for the model.
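As a rough illustration of the validation metrics used here, the following Python sketch evaluates the TSS of Eq. (14) from the four cells of a confusion matrix and maps an AUC value to the qualitative bands described in the validation section. The cell labels (a = true positives, b = false positives, c = false negatives, d = true negatives) follow the convention of Allouche et al. (2006) and are an assumption, since the cells are not defined in this paper; the function names, the example counts and the handling of band boundaries are likewise illustrative and are not taken from the study.

```python
def true_skill_statistic(a, b, c, d):
    """TSS as sensitivity + specificity - 1.

    Assumed cell labels: a = TP, b = FP, c = FN, d = TN.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return sensitivity + specificity - 1.0


def tss_closed_form(a, b, c, d):
    """Same statistic via the closed form (ad - bc) / ((a + c)(b + d))."""
    return (a * d - b * c) / ((a + c) * (b + d))


def interpret_auc(auc):
    """Qualitative AUC bands from the text (boundary handling is assumed)."""
    if auc > 0.8:
        return "high"
    if auc >= 0.7:
        return "moderate"
    if auc >= 0.6:
        return "poor"
    return "incompetent"


# Hypothetical counts for illustration only (not data from the study).
a, b, c, d = 56, 4, 6, 58
print(round(true_skill_statistic(a, b, c, d), 3))
print(round(tss_closed_form(a, b, c, d), 3))
print(interpret_auc(0.981))  # AUROC reported for CDT-FT on the testing data
```

Both functions return the same value for any valid confusion matrix, which is a quick way to confirm that the two forms of Eq. (14) agree.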

Fig. 4  Importance of conditioning factors based on FT model. Bar values (importance, axis 0.0 to 0.5): DtStream 0.406, TWI 0.304, SPI 0.264, Slope 0.156, Elevation 0.123, LU/LC 0.061, Rainfall 0.058, MBR 0.036, Infiltration number 0.034, Form factor 0.027, Texture ratio 0.025, Elongation ratio 0.023, DtResidential 0.021, NDVI 0.021, Lithology 0.016, CCM 0.009, Compactness 0.009, Curvature 0.006
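The multi-collinearity screening rests on the reciprocal relation between tolerance and VIF. A minimal Python sketch of that arithmetic is given below, using the tolerance values reported for Elevation and Curvature. The function name and the screening threshold of 10 are assumptions for illustration (10 is a commonly used rule of thumb, not a value stated in this study).

```python
def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance, where tolerance = 1 - R^2 of a
    factor regressed on the remaining factors."""
    return 1.0 / tolerance


# Tolerance values reported in the text for the two extreme factors.
reported_tolerance = {"Elevation": 0.174, "Curvature": 0.847}

# Assumed screening threshold (rule of thumb, not stated in the study).
VIF_THRESHOLD = 10.0

for factor, tol in reported_tolerance.items():
    vif = vif_from_tolerance(tol)
    status = "acceptable" if vif < VIF_THRESHOLD else "collinear"
    print(f"{factor}: tolerance={tol:.3f}, VIF={vif:.3f}, {status}")
```

Rounded to three decimals, the computed values match the reported VIF of 5.747 for Elevation and 1.181 for Curvature.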


Table 2  Multi-Collinearity test

Factor                                    Tolerance  VIF
Texture ratio                             0.330      3.030
Curvature                                 0.847      1.181
Elevation                                 0.174      5.747
Distance to stream                        0.444      2.252
Mean bifurcation ratio (MBR)              0.469      2.132
Compactness Coefficient                   0.251      3.984
Form factor                               0.278      3.597
Elongation ratio                          0.287      3.484
NDVI                                      0.687      1.456
Constant of channel maintenance (CCM)     0.423      2.364
Infiltration number                       0.210      4.762
Rainfall                                  0.238      4.202
Distance to residential                   0.610      1.639
Slope                                     0.292      3.425
Stream power index (SPI)                  0.380      2.632
Topography wetness index (TWI)            0.250      4.000
Land use/land cover (LU/LC)               0.534      1.873
Lithology                                 0.596      1.678

For both the training and validation datasets, the CDT-FT model is followed by CDT-REPT with TSS = 0.871 and 0.869 respectively. CDT-RF has registered TSS = 0.862 and 0.861 respectively while CDT-ADTree has registered TSS = 0.857 and 0.857 respectively. CDT-NBTree has registered TSS = 0.844 and 0.837 respectively while CDT has TSS = 0.832 and 0.811 respectively (Fig. 7). All six models have registered AUROC above 0.9, which signifies an excellent fit for all the predicted outcomes (Yesilnacar 2005), and a TSS value greater than 0.80 signifies a near-perfect fit. These quantifications indicate that all the models are relatively good to use, although the comparison of these values shows the best performance for CDT-FT among all.

Spatial characteristics of the best-fit model

In this study, the spatial extent of flood susceptibility is measured using 6 models: CDT, CDT-ADTree, CDT-FT, CDT-RF, CDT-REPT, and CDT-NBTree. The CDT-FT was found to be the best-fit model in predicting the most significant extent of flood susceptibility in the study area. The output raster is classified into five classes (Fig. 8).

The high and very highly susceptible classes of flood susceptibility are mostly concentrated around the river channels within the valleys while the slope and elevation of the valley walls restrict flood inundation effects. The very low susceptible class contains 41.46% of the total area. The low susceptibility class contains 20.55%, the moderately susceptible class contains 14.64%, the highly susceptible class covers 11.96% and the very highly susceptible class covers 11.39% of the total study area (Fig. 9). Roughly all the training sample sites converge within the regions of high to very high susceptibility to flood hazard.

Model efficiency

The Frequency Ratio (FR) for the five classes of flood susceptibility predicted by the CDT-FT model is calculated to compare the flash flood-susceptible pixels and flood occurrence points. The very highly susceptible regions to flash flood hazard, having the maximum FR of 8.330 among all the classes, signify a relatively high efficiency of the CDT-FT model in predicting the flood susceptible regions. Moreover, the SCAI, another method to validate the CDT-FT model’s efficiency, is also applied. The decreasing SCAI to 0.120 at the very highly susceptible class signifies maximum occurrences of previous flood events within this class (Fig. 10). The measured correspondence of previously occurred flood events to the flood susceptible zones predicted by the CDT-FT model is good enough for use by land-use planners in regulating certain land-use practices within different zones according to the severity of flood occurrences.

Discussion

Researchers have developed multiple machine learning (Khosravi et al. 2018; Pham et al. 2020), deep learning and MCDM techniques (Pradhan 2010; Rahmati et al. 2016) to predict FFS in different river basins located in diverse climatic regions. In this study, different novel CDT-based ensemble algorithms were applied for better prediction of FFS in the Neka Roud watershed located within the Mazandaran Province. To the best of our knowledge, although these models have been used separately in different studies (Khosravi et al. 2018), this hybrid/ensemble approach including all these models as an ensemble to the CDT model has not previously been employed in predicting FFS areas. Though the development of different MLA approaches has increased the predictive capacity for riverine hazards compared to statistical and mathematical approaches, many researchers have employed different ensemble models with decision tree algorithms in predicting different environmental hazards and registered a comparatively better performance (Pham et al. 2019; Arabameri et al. 2021a, b; Islam et al. 2021).

The risk of flash floods has increased due to climate change and changes in land use practices with increasing population. Thus, to cater to this environmental problem, development of models with better predictive capacity is required. The robust methodology in this study that incorporates the application of the ensemble approach and multi-tier


Fig. 5  Flood hazard susceptibility maps. a credal decision tree (CDT), b CDT-Alternating Decision Tree (ADTree), c CDT-function tree (FT), d CDT-rotation forest (RF), e CDT-reduced error pruning tree (REPTree), f CDT-Naïve Bayes tree (NBTree)

performance assessment of applied models provides scientific understanding along with an outlook that is significantly accurate to cater to the problem of FFS zone prediction.

A total of five CDT-based ensemble models and CDT itself have been applied in this present study, where the relative importance of the conditioning factors has been tested through a correlation study. The collinearity problem within the selected dataset was also checked using the VIF method and a careful selection of training and testing datasets has increased the reliability of the approach. AUROC values were considered for identifying the best-fit model. The results of model validation indicate an enhancement of a single CDT-based classifier by the ensemble framework used, for better flash flood susceptibility mapping. Possibly, varying degrees of tree generation because of the difference in sub-dataset construction may have reduced the performance of individual CDT compared to ensemble models (Abellán and Moral 2003). The integration of meta-classifiers such as ADTree, REPTree, RF, FT, and NBTree with a base classifier such as CDT usually helps the base classifier in lowering noise and bias levels within the dataset. It results in higher prediction accuracy. These meta-classifiers are important algorithms for improving the accuracy of individual classification as they fuse different classifications. The errors generated by the base classification are moved to a domain which is calculated on comparatively smaller training datasets, which makes the ensemble models useful for low classification too. These models are capable of reducing


Fig. 6  The area under the receiver operating characteristic (AUROC) values for used models. a training data, b validation data

Fig. 7  Efficiency and true skill statistic (TSS) values for used models. a training, b validation data. Bar values, (a) training: CDT 0.832, CDT-NBTree 0.844, CDT-ADTree 0.857, CDT-RF 0.862, CDT-REPT 0.871, CDT-FT 0.875; (b) validation: CDT 0.811, CDT-NBTree 0.837, CDT-ADTree 0.857, CDT-RF 0.861, CDT-REPT 0.869, CDT-FT 0.872


the average errors in terms of bias. In these hybrid methods, the original training dataset is divided into multiple sub-datasets, which can be treated at the same time. TSS was calculated for all the applied models on both the training and testing datasets, where the highest value, for CDT-FT, indicates better performance compared to other models. Moreover, SCAI and FR were computed for the best-fit CDT-FT model, where the decreasing SCAI with increasing level of susceptibility indicates significant performance and reliability of the best-fit model.

Fig. 8  Flood hazard susceptibility classes produced by the CDT-FT ensemble model (the best model)

Fig. 9  Percentage of susceptibility classes produced by CDT-FT model: Very Low 41.46, Low 20.55, Moderate 14.64, High 11.96, Very High 11.39

The relative degree of association of conditioning factors with occurrences of flash floods indicates the prominence of increasing distance from the active channels. With increasing distance from the rivers, the degree of FFS tends to get moderated. Similarly, the degree of FFS was also observed to be higher in the low elevation and steep areas where depression-bound topographic conditions face the wrath of flash flood impact. This observation was found to be in agreement with studies by Khosravi et al. (2018) and Shahabi et al. (2021a, b). Khosravi et al. (2018) utilized four machine learning algorithms (LMT, REPT, NBT, and ADT) using 11 conditioning factors tested by the IGR method to model flooding in Haraz. They argued that steeper ground had less time for water to infiltrate. Shahabi et al. (2021a, b) used 11 conditioning factors to predict the FFS in Haraz Watershed using a novel deep learning approach where they found slope as the most influential factor in FFS, followed by distance to river function, and argued that since the watershed is mountainous, the steep slopes transfer water quickly downstream leading to overtopping of the river banks.

Similar to the outcome of this study, distance to river function was found to be the most influential in FFS by Tien Bui et al. (2019). Pham et al. (2020) also found the distance to river function as the most influential while using CDT as the base model along with four different ensemble models. In this

Fig. 10  Frequency ratio (FR) and seed cell area index (SCAI) trend of susceptibility classes produced by CDT-FT model. SCAI (left axis, 0 to 90): Very Low 80.441, Low 39.870, Moderate 28.406, High 3.314, Very High 0.120. FR (right axis, 0 to 9): Very Low 0.012, Low 0.025, Moderate 0.035, High 0.302, Very High 8.330
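Under their usual definitions FR and SCAI are reciprocal measures, which can be checked with a short Python sketch. It assumes SCAI is the class area share divided by the share of flood locations in that class, and FR is the inverse ratio. The flood share of each class is back-calculated from the reported SCAI and the class area percentages; these implied flood shares are derived here for illustration and are not reported in the study. Function and variable names are hypothetical.

```python
CLASSES = ["very low", "low", "moderate", "high", "very high"]
area_pct = [41.46, 20.55, 14.64, 11.96, 11.39]       # class area shares (%)
scai_reported = [80.441, 39.870, 28.406, 3.314, 0.120]
fr_reported = [0.012, 0.025, 0.035, 0.302, 8.330]


def scai(area_share, flood_share):
    """Seed cell area index: area share over flood (seed cell) share."""
    return area_share / flood_share


def frequency_ratio(area_share, flood_share):
    """Frequency ratio: flood share over area share (assumed definition)."""
    return flood_share / area_share


# Implied flood share per class, back-calculated from the reported SCAI.
flood_pct = [a / s for a, s in zip(area_pct, scai_reported)]

for name, a, f, fr_rep in zip(CLASSES, area_pct, flood_pct, fr_reported):
    print(f"{name:>9}: SCAI={scai(a, f):8.3f}  "
          f"FR={frequency_ratio(a, f):6.3f}  reported FR={fr_rep:.3f}")

# SCAI should fall monotonically from the very low to the very high class.
is_decreasing = all(x > y for x, y in zip(scai_reported, scai_reported[1:]))
print("SCAI decreasing:", is_decreasing)
```

With these inputs the recomputed FR values agree with the reported ones to within rounding, and the implied flood shares sum to roughly 100%, which suggests the reported FR and SCAI values are mutually consistent.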


study, TWI was found following the distance to river function as an influencing factor of flash flood generation. It indicates the proximity of water accumulation with its relation to soil moisture. Higher FFS was observed to be associated with higher TWI, which is in agreement with the observation made by Shahabi et al. (2021a, b). Since this study area contains a higher range of elevation, rainfall, which is considered here as a conditioning factor, tends to flow quickly over the higher ground, and the steep slope facets direct this runoff towards areas with better run-off accumulating terrain conditions. Although rainfall is considered one of the important conditioning factors, the result does not reflect what we expected. This is possibly because of two reasons: one, we did not consider the relative magnitude of flash floods and two, the steep valley slope has generated run-off and transferred the discharge quickly to the low elevation segment of the river. Similarly, the lithological characteristics of the study area do not influence the nature of the flash flood significantly. Khosravi et al. (2018) considered rainfall along with other factors in their modeling and assessment of flooding in one of China's most flood-prone areas. Using the IGR technique, they found that rainfall ranked last, below NDVI and lithology, in explaining flood incidence.

It is observed that the ensemble approach contributes significantly to boosting the predictive capacity of CDT. These hybrid models have all registered AUROC above 0.9, comparatively better than the individual CDT. The AUROC of CDT is 0.92, which is itself an excellent fit, and is surpassed by its ensembles. The comparatively lower performance of individual CDT is possibly due to the formation of a more diverse tree network, while the addition of different ensembles has significantly improved the predictive capacity. Nguyen et al. (2019), He et al. (2019), and Gui et al. (2023a, b) have reported that using ensembles with a base like CDT increases the predictive capacity of MLAs. Ensembles like ADTree (Arabameri et al. 2021a, b), REPT (Khosravi et al. 2018), RF (He et al. 2019) and NBTree (Khosravi et al. 2018, 2019) have been tested in previous studies where these meta-classifiers have significantly increased the model performance. Here, the CDT-FT model has outperformed all of them. The accurate estimation of flood susceptibility is an essential process for safeguarding property and lives and developing effective mitigation measures. Marking the areas with a hierarchical degree of flood risk is essential to tame the extent of the damage. Therefore, researchers are trying new and robust techniques in ML using the DT method with reliable application of hybrid models to obtain highly precise and accurate results, which help in proposing flood management plans (Wang et al. 2019). The methodology used in this study brings reliability in applying the CDT-based ensembles if the conditioning factors are carefully chosen depending upon the environmental condition.

Conclusions

In the present study, six CDT-based novel ensemble models were applied to prepare the FFS of the Neka-roud watershed in Iran. For modelling purposes, 206 flood sites were taken with 70% as training data and 30% as testing data. Moreover, 18 flood conditioning factors were considered for the preparation of the FFS. The correlation-based factor selection, performed alongside the multi-collinearity diagnostic test (VIF), shows that distance to stream (Dt-stream) was the most crucial conditioning factor followed by TWI, SPI, slope and elevation. Based on the respective area under the ROC curve, the CDT-FT is found to be the best-fit model for both the training and testing data, with AUROC 0.986 and 0.981 respectively, followed by the second-best fit model, CDT-REPT, with AUROC values of 0.983 and 0.979 respectively. Other models in order of performance are CDT-RF, CDT-ADTree, CDT-NBTree, and CDT. This shows that CDT-based ensembles perform better than the simple CDT model for flood susceptibility mapping. Based on the best-fit model (CDT-FT), the flood susceptible areas are concentrated in and around the channels of the river courses of the watershed, and the study found that ~12% of the study area falls under the highly flash flood susceptible zone while ~11% of the watershed is very highly susceptible to flooding. As flash flood susceptibility mapping plays a pivotal role in the flood management of any region, especially the arid tracts of Iran, the novel ensemble models will be highly beneficial for various stakeholders to better comprehend the dynamics of the flash floods and plan accordingly to minimize loss of properties and lives.

Acknowledgements Not applicable.

Author Contributions Conceptualization, A.A., D.Y.; methodology, A.A., D.Y., and T.Z.; software, A.A.; validation, A.A.; formal analysis, A.A. and T.Z.; investigation, A.A. and M.S.; resources, A.A. and M.S.; data curation, A.A.; writing (original draft preparation), A.A., D.Y., M.S., U.D.S., and A.I.; writing (review and editing), T.Z., A.A., D.Y., M.S., U.D.S., and A.I. All authors have read and agreed to the published version of the manuscript.

Data availability The data used in this research are available from the corresponding author upon reasonable request.

Declarations

Ethical approval The authors confirm that this article is original research and has not been published or presented previously in any journal or conference in any language (in whole or in part).

Consent to participate Not applicable.

Competing interests The authors have no conflict of interest.


References

Abellán J, Moral S (2003) Building classification trees using the total uncertainty criterion. Int J Intell Syst 18(12):1215–1225
Alam A, Ahmed B, Sammonds P (2021) Flash flood susceptibility assessment using the parameters of drainage basin morphometry in SE Bangladesh. Quat Int 575–576:295–307
Alarifi SS, Abdelkareem M, Abdalla F, Alotaibi M (2022) Flash flood hazard mapping using remote sensing and GIS techniques in Southwestern Saudi Arabia. Sustainability 14(21):14145
Aldhshan SR, Mohammed OZ, Shafri HM (2019) Flash flood area mapping using sentinel-1 SAR data: a case study of eight upazilas in Sunamganj district, Bangladesh. In IOP Conference Series: Earth Environ Sci 357(1):012034. IOP Publishing
AlMahasneh L, Abuhamoor D, Al Sane K, Haddad NJ (2021) Assessment and mapping of flash flood hazard severity in Jordan. Int J River Basin Manage, 1–15
Arabameri A, Karimi-Sangchini E, Pal SC et al (2020a) Novel credal decision tree-based ensemble approaches for predicting the landslide susceptibility. Remote Sens 12:3389
Arabameri A, Sadhasivam N, Turabieh H et al (2021a) Credal decision tree based novel ensemble models for spatial assessment of gully erosion and sustainable management. Sci Rep 11:31–47
Arabameri A, Sadhasivam N, Turabieh H, Mafarja M, Rezaie F, Pal SC, Santosh M (2021b) Credal decision tree based novel ensemble models for spatial assessment of gully erosion and sustainable management. Sci Rep 11(1):3147
Arabameri A, Saha S, Roy J, Chen W, Blaschke T, Tien Bui D (2020b) Landslide susceptibility evaluation and management using different machine learning methods in the Gallicash River Watershed, Iran. Remote Sens 12:475
Cao C, Xu P, Wang Y, Chen J, Zheng L, Niu C (2016) Flash flood hazard susceptibility mapping using frequency ratio and statistical index methods in coalmine subsidence areas. Sustainability 8(9):948
Chakrabortty R, Chandra Pal S, Rezaie F, Arabameri A, Lee S, Roy P, Moayedi H (2022) Flash-flood hazard susceptibility mapping in Kangsabati River Basin, India. Geocarto Int 37(23):6713–6735
Ching J, Phoon K-K (2023) Quantile value method versus design value method for calibration of reliability-based geotechnical codes. Struct Saf 44:47–58
Chapi K, Singh VP, Shirzadi A, Shahabi H, Bui DT, Pham BT, Khosravi K (2017) A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ Model Softw 95:229–245
Dash P, Mukherjee K, Ghosh S (2022) Flash flood susceptibility mapping of a himalayan river basin using multi-criteria decision-analysis and GIS. Advances in Remote Sensing Technology and the Three Poles, 257–267
Dang P, Cui J, Liu Q, Li Y (2023) Influence of source uncertainty on stochastic ground motion simulation: a case study of the 2022 Mw 6.6 Luding, China, earthquake. Stochastic Environ Res Risk Assess. https://doi.org/10.1007/s00477-023-02427-y
Derbyshire E, Hails JR, Gregory KJ (2013) Geomorphological processes: Studies in Physical Geography. Elsevier
Elkhrachy I (2015) Flash flood hazard mapping using satellite images and GIS tools: a case study of Najran City, Kingdom of Saudi Arabia (KSA). Egypt J Remote Sens Space Sci 18(2):261–278
Freund Y, Mason L (1999) The alternating decision tree learning algorithm. ICML, 124–133
Gao BC (1996) NDWI - A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens Environ 58(3):257–266
Gao C, Zhang B, Shao S, Hao M, Zhang Y, Xu Y, Wang Z (2023) Risk assessment and zoning of flood disaster in Wuchengxiyu Region China. Urban Clim 49:101562. https://doi.org/10.1016/j.uclim.2023.101562
Ghosh S, Roy S, Islam A, Shit PK, Datta DK, Islam MS, Das BC (2023) Floods of ganga-brahmaputra-meghna delta in context. Floods in the Ganga–Brahmaputra–Meghna Delta. Springer International Publishing, Cham, pp 1–17
Gudiyangada Nachappa T, Kienberger S, Meena SR, Hölbling D, Blaschke T (2020) Comparison and validation of per-pixel and object-based approaches for landslide susceptibility mapping. Geomat Nat Hazards Risk 11:572–600
Gui J, Pérez-Rey I, Yao M, Zhao F, Chen W (2023a) Credal-decision-tree-based ensembles for spatial prediction of landslides. Water 15:605. https://doi.org/10.3390/w15030605
Gui J, Pérez-Rey I, Yao M, Zhao F, Chen W (2023b) Credal-decision-tree-based ensembles for spatial prediction of landslides. Water 15:605. https://doi.org/10.3390/w15030605
Hamid RAHA (2013) Application of morphometric analysis for geo-hydrological studies using geo-spatial technology - a case study of vishav drainage basin. J Waste Water Treat Anal 4:1–12
He Q, Xu Z, Li S, Li R, Zhang S, Wang N, Pham BT, Chen W (2019) Novel entropy and rotation forest-based credal decision tree classifier for landslide susceptibility modeling. Entropy 21(2):106
Horton RE (1945) Erosional development of streams and their drainage basins; hydrophysical approach to quantitative morphology. Geol Soc Am Bull 56:275–370
Huang S, Lyu Y, Sha H, Xiu L (2021) Seismic performance assessment of unsaturated soil slope in different groundwater levels. Landslides 18(8):2813–2833. https://doi.org/10.1007/s10346-021-01674-w
Islam A, Ghosh S (2022) Community-based riverine flood risk assessment and evaluating its drivers: evidence from Rarh Plains of India. Appl Spat Anal Policy 15(1):1–47
Islam ARMT, Talukdar S, Mahato S, Kundu S, Eibek KU, Pham QB, Linh NTT (2021) Flood susceptibility modelling using advanced ensemble machine learning models. Geosci Front 12(3):101075
Kariminejad N, Hosseinalizadeh M, Pourghasemi HR, Bernatek-Jakiel A, Campetella G, Ownegh M (2019) Evaluation of factors affecting gully headcut location using summary statistics and the maximum entropy model: Golestan Province, NE Iran. Sci Tot Env 677:281–298. https://doi.org/10.1016/j.scitotenv.2019.04.306
Khosravi K, Pham BT, Chapi K et al (2018) A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci Tot Env 627(218):744–755
Kohavi R (1996) Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. Kdd, pp. 202–207
Lu Y, Bookman R, Waldmann N, Marco S (2020) A 45 kyr laminae record from the Dead Sea: Implications for basin erosion and floods recurrence. Quat Sci Rev 229:106143
Li J, Wang Z, Wu X, Xu C, Guo S, Chen X (2020) Toward monitoring short-term droughts using a novel daily scale, standardized antecedent precipitation evapotranspiration index. J Hydrometeorol 21(5):891–908. https://doi.org/10.1175/JHM-D-19-0298.1
Li W, Zhu J, Fu L, Zhu Q, Xie Y, Hu Y (2021) An augmented representation method of debris flow scenes to improve public perception. Int J Geograph Inf Sci 35(8):1521–1544. https://doi.org/10.1080/13658816.2020.1833016
Li T, Xia T, Wang H, Tu Z, Tarkoma S, Han Z, Hui P (2022) Smartphone app usage analysis: datasets, methods, and applications. IEEE Comm Surv Tutor 24(2):937–966. https://doi.org/10.1109/COMST.2022.3163176
Li Q, Song D, Yuan C, Nie W (2022) An image recognition method for the deformation area of open-pit rock slopes under variable rainfall. Measurement 188:110544. https://doi.org/10.1016/j.measurement.2021.110544
Liu Z, Feng J, Uden L (2023) From technology opportunities to ideas generation via cross-cutting patent analysis: Application


of generative topographic mapping and link prediction. Technol Forecast Soc Change 192:122565. https://doi.org/10.1016/j.techfore.2023.122565
McBride JL, Ebert EE (2000) Verification of quantitative precipitation forecasts from operational numerical weather prediction models over Australia. Weather Forecast 15:103–121
Moore ID, Wilson JP (1992) Length-slope factors for the revised universal soil loss equation: simplified method of estimation. J Soil Water Conserv 47(5):423–428
NASA Earth Observatory (2023) Flash Flooding in Iran. https://earthobservatory.nasa.gov/images/146150/flash-flooding-in-iran. Accessed 20 Apr 2023
Nguyen PT, Ha DH, Nguyen HD, Van Phong T, Trinh PT, Al-Ansari N, Prakash I (2020) Improvement of credal decision trees using ensemble frameworks for groundwater potential modeling. Sustainability 12(7):2622
Nguyen VT, Tran T, Ha N, Ngo VL, Al-Ansari N, Van Phong T, Nguyen DH, Malek M, Amini A, Prakash I et al (2019) GIS based novel hybrid computational intelligence models for mapping landslide susceptibility: A Case Study at Da Lat City, Vietnam. Sustainability 11:7118
Nhu VH, Ngo PTT, Pham TD, Dou J, Song X, Hoang ND, Tran DA, Cao DP, Aydilek İB, Amiri M, Costache R, Hoa PV, Tien BD (2020) A New hybrid firefly–PSO optimized random subspace tree intelligence for torrential rainfall-induced flash flood susceptible mapping. Remote Sens 12(7):2688. https://doi.org/10.3390/rs12172688
Peptenatu D, Grecu A, Simion AG, Gruia KA, Andronache I, Draghici CC, Diaconu DC (2020) Deforestation and frequency of floods in Romania. Water Resour Manag Romania. Springer, pp. 279–306
Pourghasemi H, Pradhan B, Gokceoglu C, Mohammady M, Moradi H (2013) Application of weights-of-evidence and certainty factor models and their comparison in landslide susceptibility mapping at Haraz watershed, Iran. Arab J Geosci 6(7):2351–2365
Pham BT, Avand M, Janizadeh S, Phong TV, Al-Ansari N, Ho LS, Das S, Le HV, Amini A, Bozchaloei SK (2020) GIS based hybrid computational approaches for flash flood susceptibility assessment. Water 12(3):683
of the Torsa River course; Model Earth Syst Environ https://doi.org/10.1007/s40808-020-00967-8
Saleh A, Yuzir A, Abustan I (2020) Flash flood susceptibility modelling: A review. In IOP Conference Series: Materials Science and Engineering 712(1):012005. IOP Publishing
Saleh A, Yuzir A, Sabtu N, Abujayyab SK, Bunmi MR, Pham QB (2022) Flash flood susceptibility mapping in urban area using genetic algorithm and ensemble method. Geocarto Int, 1–30
Schumm SA (1956) Evolution of drainage systems and slopes in badlands at Perth Amboy, New Jersey. Geol Soc Am Bull 67:597–646
Shahabi H, Shirzadi A, Ronoud S, Asadi S, Pham BT, Mansouripour F, Bui DT (2021a) Flash flood susceptibility mapping using a novel deep learning model based on deep belief network, back propagation and genetic algorithm. Geosci Front 12(3):101100
Shahabi H, Shirzadi A, Ronoud S et al (2021b) Flash flood susceptibility mapping using a novel deep learning model based on deep belief network, back propagation and genetic algorithm. Geosci Front 12(2021):101100
Shirzadi A, Bui DT, Pham BT, Solaimani K, Chapi K, Kavian A, Shahabi H, Revhaug I (2017) Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ Earth Sci 76(2):60
Singh N, Vangani NS, Sharma JR (1993) Flash flood damage mapping in arid environment using satellite remote sensing - a case study of Pali region. J Ind Soc Remote Sens 21:75–86
Süzen ML, Doyuran V (2004) A comparison of the GIS based landslide susceptibility assessment methods: Multivariate versus bivariate. Environ Geol 45:665–679
Tariq A, Yan J, Ghaffar B, Qin S, Mousa BG, Sharifi A, Aslam M (2022) Flash flood susceptibility assessment and zonation by integrating analytic hierarchy process and frequency ratio model with diverse spatial data. Water 14(19):3069
Tehrany MS, Pradhan B, Jebur MN (2015) Flood susceptibility analysis and its verification using a novel ensemble support vector machine and frequency ratio method. Stoch Env Res Risk A 29(4):1149–1165
Pham BT, Prakash I, Singh SK, Shirzadi A, Shahabi H, Bui DT (2019) Termeh SVR, Khosravi K, Sartaj M, Keesstra SD, Tsai FTC, Dijksma
Landslide susceptibility modeling using Reduced Error Pruning R, Pham BT (2019) Optimization of an adaptive neuro-fuzzy
Trees and different ensemble techniques: hybrid machine learning inference system for groundwater potential mapping. Hydrogeol
approaches. CATENA 175:203–218 J 27:2511–2534
Polo JL, Berzal F, Cubero JC (2008) Class-oriented reduction of deci- Tien Bui D, Ngo PTT, Pham TD, Jaafari A, Minh NQ, Hoa PV, Samui
sion tree complexity. International Symposium on Methodologies P (2019) A novel hybrid approach based on a swarm intelligence
for Intelligent Systems. Springer, pp. 48–57 optimized extreme learning machine for flash flood susceptibility
Poudyal CP, Chang C, Oh HJ, Lee S (2010) Landslide susceptibil- mapping. Catena 179:184–196
ity maps comparing frequency ratio and artificial neural net- Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012) Landslide sus-
works: a case study from the Nepal Himalaya. Environ Earth Sci ceptibility assessment in Vietnam using support vector machines,
61(5):1049–1064 decision tree, and naive bayes models. Math Probl Eng 2012
Pouyan S, Pourghasemi HR, Bordbar M, Rahmanian S, Clague JJ Torcivia CEG, López NNR (2020) Preliminary Morphometric analysis:
(2021) A multi-hazard map-based flooding, gully erosion, forest río talacasto basin, central precordillera of San Juan, Argentina.
fires, and earthquakes in Iran. Sci Rep 11(1):1–19 Advances in Geomorphology and Quaternary Studies in Argen-
Pradhan B (2010) Flood susceptible mapping and risk area delineation tina. Springer, Cham, pp. 158–168
using logistic regression, GIS and remote sensing. J Spat Hydrol 9 Wang S, Jiang L, Li C (2015) Adapting naive Bayes tree for text clas-
Qi M, Cui S, Chang X, Xu Y, Meng H, Wang Y, Arif M (2022) Multi- sification. Knowl Inf Syst 44(1):77–89
region nonuniform brightness correction algorithm based on Wang Y, Hong H, Chen W, Li S, Pamučar D, Gigović L, Duan H
l-channel gamma transform. Security and communication net- (2019) A hybrid GIS multi-criteria decision-making method for
works, 2022. https://​doi.​org/​10.​1155/​2022/​26759​50 flood susceptibility mapping at Shangyou. China Remote Sens
Rahmati O, Pourghasemi HR, Zeinivand H (2016) Flood susceptibility 11(1):62
mapping using frequency ratio and weights-of-evidence models in Wang Y, Xu N, Liu A, Li W, Zhang Y (2022) High-Order interaction
the Golastan Province. Iran Geocarto Int 31(1):42–70 learning for image captioning. IEEE Trans Circuits Syst Video
Rodríguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: A Technol 32(7):4417–4430. https://​doi.​org/​10.​1109/​TCSVT.​2021.​
new classifier ensemble method. IEEE Trans Pattern Anal 31210​62
28:1619–1630 Wang S, Hu X, Sun J, Liu J (2023a) Hyperspectral anomaly detection
Saha UD, Bhattacharya S (2020) Application of multi-criteria deci- using ensemble and robust collaborative representation. Inf Sci
sion-making approach for ascertaining the avulsion potentiality 624:748–760. https://​doi.​org/​10.​1016/j.​ins.​2022.​12.​096


Wang Y, Su Y, Li W, Xiao J, Li X, Liu A (2023) Dual-path rare content enhancement network for image and text matching. IEEE Trans Circ Syst Video Technol. https://doi.org/10.1109/TCSVT.2023.3254530
Wu X, Guo S, Qian S, Wang Z, Lai C, Li J, Liu P (2022) Long-range precipitation forecast based on multipole and preceding fluctuations of sea surface temperature. Int J Climatol 42(15):8024–8039. https://doi.org/10.1002/joc.7690
Xie X, Xie B, Cheng J, Chu Q, Dooling T (2021) A simple Monte Carlo method for estimating the chance of a cyclone impact. Nat Hazards 107(3):2573–2582. https://doi.org/10.1007/s11069-021-04505-2
Yang J, Fu LY, Zhang Y et al (2022) Temperature- and pressure-dependent pore microstructures using static and dynamic moduli and their correlation. Rock Mech Rock Eng 55:4073–4092. https://doi.org/10.1007/s00603-022-02829-4
Yesilnacar EK (2005) The application of computational intelligence to landslide susceptibility mapping in Turkey. Ph.D. thesis, Department of Geomatics, University of Melbourne
Yin L, Wang L, Zheng W, Ge L, Tian J, Liu Y, Liu S (2022) Evaluation of empirical atmospheric models using Swarm-C satellite data. Atmosphere 13(2):294. https://doi.org/10.3390/atmos13020294
Yin Y, Zhang X, Guan Z, Chen Y, Liu C, Yang T (2023) Flash flood susceptibility mapping based on catchments using an improved Blending machine learning approach. Hydrol Res 54(4):557–579
Yin L, Wang L, Tian J, Yin Z, Liu M, Zheng W (2023) Atmospheric density inversion based on Swarm-C satellite accelerometer. Appl Sci 13(6):3610. https://doi.org/10.3390/app13063610
Yue Z, Zhou W, Li T (2021) Impact of the Indian Ocean Dipole on evolution of the subsequent ENSO: relative roles of dynamic and thermodynamic processes. J Clim 34(9):3591–3607. https://doi.org/10.1175/JCLI-D-20-0487.1
Zhang Y, Luo J, Zhang Y, Huang Y, Cai X, Yang J, Zhang Y (2022) Resolution enhancement for large-scale real beam mapping based on adaptive low-rank approximation. IEEE Trans Geosci Remote Sens 60:1–21. https://doi.org/10.1109/TGRS.2022.3202073
Zhang C, Yin Y, Yan H, Zhu S, Li B, Hou X, Yang Y (2022) Centrifuge modeling of multi-row stabilizing piles reinforced reservoir landslide with different row spacings. Landslides. https://doi.org/10.1007/s10346-022-01994-5
Zhao L, Wang L (2022) A new lightweight network based on MobileNetV3. KSII Trans Internet Inf Syst 16:1–15
Zhou G, Zhang R, Huang S (2021) Generalized buffering algorithm. IEEE Access 9:27140–27157. https://doi.org/10.1109/ACCESS.2021.3057719
Zhou J, Wang L, Zhong X, Yao T, Qi J, Wang Y, Xue Y (2022) Quantifying the major drivers for the expanding lakes in the interior Tibetan Plateau. Sci Bull 67(5):474–478. https://doi.org/10.1016/j.scib.2021.11.010
Zhou G, Song B, Liang P, Xu J, Yue T (2022b) Voids filling of DEM with multiattention generative adversarial network model. Remote Sensing (Basel, Switzerland) 14(5):1206. https://doi.org/10.3390/rs14051206
Zhou L, Ye Y, Tang T, Nan K, Qin Y (2022c) Robust matching for SAR and optical images using multiscale convolutional gradient features. IEEE Geosci Remote Sens Lett 19:1–5. https://doi.org/10.1109/LGRS.2021.3105567
Zhuo Z, Du L, Lu X, Chen J, Cao Z (2022) Smoothed Lv distribution based three-dimensional imaging for spinning space debris. IEEE Trans Geosci Remote Sens 60:1–13. https://doi.org/10.1109/TGRS.2022.3174677
Zhu X, Xu Z, Liu Z, Liu M, Yin Z, Yin L, Zheng W (2022) Impact of dam construction on precipitation: a regional perspective. Mar Freshw Res. https://doi.org/10.1071/MF22135

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
