Environmental Research Letters

ACCEPTED MANUSCRIPT • OPEN ACCESS

Leveraging machine learning for predicting flash flood damages in the Southeast US

To cite this article before publication: Atieh Alipour et al 2020 Environ. Res. Lett. in press https://doi.org/10.1088/1748-9326/ab6edd

Manuscript version: Accepted Manuscript


Accepted Manuscript is “the version of the article accepted for publication including all changes made as a result of the peer review process,
and which may also include the addition to the article by IOP Publishing of a header, an article ID, a cover sheet and/or an ‘Accepted
Manuscript’ watermark, but excluding any other editing, typesetting or other changes made by IOP Publishing and/or its licensors”

This Accepted Manuscript is © 2020 The Author(s). Published by IOP Publishing Ltd.

As the Version of Record of this article is going to be / has been published on a gold open access basis under a CC BY 3.0 licence, this Accepted
Manuscript is available for reuse under a CC BY 3.0 licence immediately.

Everyone is permitted to use all or part of the original content in this article, provided that they adhere to all the terms of the licence
https://creativecommons.org/licences/by/3.0

Although reasonable endeavours have been taken to obtain all necessary permissions from third parties to include their copyrighted content
within this article, their full citation and copyright line may not be present in this Accepted Manuscript version. Before using any content from this
article, please refer to the Version of Record on IOPscience once published for full citation and copyright details, as permissions may be required.
All third party content is fully copyright protected and is not published on a gold open access basis under a CC BY licence, unless that is
specifically stated in the figure caption in the Version of Record.



AUTHOR SUBMITTED MANUSCRIPT - ERL-107644.R2

Leveraging Machine Learning for Predicting Flash Flood Damages in the Southeast US

Atieh Alipour*, Ali Ahmadalipour, Peyman Abbaszadeh, Hamid Moradkhani

Center for Complex Hydrosystems Research, Department of Civil, Construction and Environmental Engineering, University of Alabama, Tuscaloosa, AL, USA

*Corresponding author: aalipour@crimson.ua.edu

Abstract

Flash flooding is a recurrent natural hazard with substantial impacts in the Southeast U.S. (SEUS) due to the frequent torrential rainfalls that occur in the region, which are triggered by tropical storms, thunderstorms, and hurricanes. Flash floods are costly natural hazards, primarily due to their rapid onset. Therefore, predicting property damages of flash floods is imperative for proactive disaster management. Here, we present a systematic framework that considers a variety of features explaining different components of risk (i.e., hazard, vulnerability, and exposure), and examine multiple Machine Learning (ML) methods to predict flash flood damages. A large database of more than 14,000 flash flood events is used for training and testing the methodology, and a multitude of data sources is utilized to acquire reliable information related to each event. A variable selection approach was employed to alleviate the complexity of the dataset and facilitate the model development process. The Random Forest (RF) method was then used to map the identified input covariates to a target variable (i.e., property damage). The RF model was implemented in two modes: first, as a binary classifier to estimate whether a region of interest was damaged in a particular flood event, and then as a regression model to predict the amount of property damage associated with each event. The results indicate that the proposed approach is successful not only in classifying damaging events (with an accuracy of 81%), but also in predicting flash flood damage in good agreement with the observed property damages. This study is among the few efforts to predict flash flood damages across a large domain using mesoscale input variables, and the findings demonstrate the effectiveness of the proposed methodology.

Keywords: Flash flood, risk, flood damage, machine learning

1. Introduction

The Southeast U.S. (SEUS) is known to be susceptible to flash flooding due to the frequent high-intensity rainfalls triggered by tropical storms, thunderstorms, and hurricanes (Smith and Smith 2015; Czajkowski et al. 2011; Orville and Huffines 2001). During the last two decades, widespread flash flood events have caused significant economic damage in this region. Recent studies have shown that the frequency of flash flooding is increasing in the SEUS (Alipour et al. 2020). Therefore, predicting property damages of flash floods is crucial for attaining proactive disaster management in this region.

Generally, risk refers to the potential losses of a particular hazard (Ahmadalipour et al. 2019; Armenakis et al. 2017; Cardona et al. 2012), which is characterized as a function of three major components: hazard, vulnerability, and exposure (Adger 2006; Dang et al. 2011; Winsemius et al. 2013; Koks et al. 2015; Budiyono et al. 2015). Assessing flash flood risk components has been the subject of several studies. Recently, Ahmadalipour and Moradkhani (2019) investigated the spatiotemporal characteristics of flash flooding hazard over the Contiguous United States (CONUS). Also, Khajehei et al. (2020) assessed the socio-economic vulnerability to flash flooding at the county scale across the entire CONUS while accounting for flash flood characteristics including duration, frequency, magnitude, and severity.

The conventional approaches for modeling flood risk mostly depend on the flood water depth to estimate the associated damage (Aerts et al. 2014; Velasco et al. 2014). Several recent studies have shown that considering multivariate data improves the damage estimates (Wagenaar et al. 2017). Therefore, over the past few years, several studies have evaluated flood risk in various regions of the globe (van Berchum et al. 2018; Arnell and Gosling 2016; de Moel et al. 2015) using a multitude of variables representing hazard, vulnerability, and exposure.

Recent advances in Machine Learning (ML) techniques have led to significant improvements in flood risk assessment (Lai et al. 2016; Wang et al. 2015). Artificial neural network, decision tree, logistic regression, random forest, regression tree, and support vector machine are the most widely used ML models for flood risk assessment (Gotham et al. 2018; Kourgialas and Karatzas 2017; Mojaddadi et al. 2017; Nafari and Ngo 2018; Shafapour Tehrany et al. 2019; Terti et al. 2019). Table 1S lists all of the factors used in these studies. Although some of these works assessed flood damage prediction, few attempted to predict the potential property damage of flash flooding events. In addition, the majority of damage prediction studies are conducted over small-scale regional domains and are applicable specifically to the region of interest (Garrote et al. 2016; Scheuer et al. 2011).

Therefore, in this study, we propose a risk-based and physically informed model for near real-time estimation of the potential property damages of flash flood events across the SEUS. Several influential factors, including geographic, socioeconomic, and climatic features, are utilized as input to the ML model in order to predict the property damage of each flash flood event. This study also presents a unique model input structure/topology with which the ML model produces improved results and which could serve as a universal approach to predict potential property damage in any region of interest. The model was trained and tested on a large database consisting of more than 14,000 flash flood events during 1996-2017. The overarching research objective of this study is to develop a risk-informed mesoscale flash flood damage prediction model across the SEUS that assists decision makers and insurance companies dealing with flood risk assessment.

2. Study Area and Data

In this study, several data sources have been utilized to acquire the information for flash flood events as well as physical and geographical characteristics across the SEUS during 1996 to 2017. Each dataset and its characteristics are explained in the following sections.

2.1. Study Area

The study area encompasses nine southeastern U.S. states (referred to as SEUS in this study): Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, and Tennessee. The climate of this region varies with latitude, topography, and proximity to the Atlantic Ocean and the Gulf of Mexico (Ingram et al. 2013). The high-pressure system known as the Bermuda High commonly draws moisture from the Atlantic Ocean and the Gulf of Mexico, and causes warm and humid summers in the SEUS along with frequent thunderstorms (Zhu and Liang 2013). Based on the 2017 U.S. census estimation, over 61 million people reside in the 755 counties of the SEUS. A large number of flash flood events have impacted the SEUS in the past couple of decades and imposed billions of dollars in damage on SEUS residents.

2.2. NOAA Storm Events Database

The National Oceanic and Atmospheric Administration (NOAA) Storm Events database is a comprehensive repository that provides information for different types of natural disasters, such as flash flooding, across the U.S. from 1996 to present. This information includes the beginning and termination date and time, location, the associated injuries and fatalities, amount of damage to properties and crops, and event narrative (Ashley and Ashley 2008; Sharif et al. 2015; Konisky et al. 2016; Hamidi et al. 2017; Shah et al. 2017). In this study, we have used the NOAA Storm Events database to obtain the information for 14,317 flash flood events, including the onset time, duration, date, location, and property damages during 1996 to 2017.

2.3. NLDAS-2 Hourly Precipitation

The precipitation data from Phase 2 of the North American Land Data Assimilation System (NLDAS-2) are available at 1/8th-degree spatial resolution (about 12 km) and hourly temporal resolution from January 1979 to present (Xia et al. 2012). The hourly NLDAS-2 precipitation data are generated from different in situ and remote sensing data sources (Yu et al. 2017).

Since flash floods generally occur in small catchments, usually less than 1000 km² (Villarini et al. 2010; Llasat et al. 2016), the hourly NLDAS-2 precipitation data are upscaled to a 0.3° grid (using bilinear interpolation) so that each cell represents an approximate inundated area of 1000 km². Then, the location, start time, and duration of each flash flood event (acquired from the NOAA Storm Events database) are utilized to extract the mean and cumulative precipitation during each flash flood event from the NLDAS-2 data. The mean and the cumulative precipitation represent the intensity and severity of flash flood events, respectively, both of which are important characteristics for identifying flash flood hazard.
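
As a minimal sketch of the event-window extraction described above, the following Python function computes the mean (intensity) and cumulative (severity) precipitation over an event. It assumes the hourly series for the grid cell containing the event has already been read into a list; the function name, the argument names, and the sample values are hypothetical and are not taken from the original study.

```python
def event_precip_stats(hourly_mm, start_index, duration_hours):
    """Return (mean, cumulative) precipitation over an event window.

    hourly_mm      -- hourly precipitation values for one grid cell (assumed input)
    start_index    -- index of the hour in which the event starts
    duration_hours -- event duration in whole hours
    """
    window = hourly_mm[start_index:start_index + duration_hours]
    if not window:
        raise ValueError("event window is empty")
    cumulative = sum(window)           # proxy for event severity
    mean = cumulative / len(window)    # proxy for event intensity
    return mean, cumulative


# Illustrative values only: a 3-hour event starting at hour 1.
mean_mm, total_mm = event_precip_stats([0, 2, 4, 6, 0], 1, 3)
```

In practice the grid cell would be selected from the event location, and the window from the reported start time and duration; both lookups are omitted here.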

2.4. GTOPO30 Topography Data

GTOPO30 has a 1-km spatial resolution and has been used in many studies for the estimation of multiple topographical indices (Durand et al. 2019; Marlier et al. 2015; Folk et al. 2018; Abbaszadeh et al. 2019b). In this study, we used GTOPO30 to derive several topographic factors, including altitude, slope, flow accumulation, and topographic roughness index (TRI), at different spatial resolutions (i.e., 1, 3, and 30 kilometers) corresponding to each flash flood event.

2.5. Zillow Database

Zillow real estate provides an index known as the Zillow Home Value Index (ZHVI), which is the median home value for a specific geographic region and housing type from 1996 to present. Several studies have utilized this product for risk analysis (Watson et al. 2016; Miller 2018; Morckel 2017). In this study, we used ZHVI to evaluate the median home value for all homes in each county during 1996 to 2017. The median home value is an indicator of flash flood exposure. The Zillow dataset does not include information for all counties in all years, so we used machine learning to predict the missing values, as explained in more detail in section 3.1.

2.6. U.S. Census Bureau Database

The U.S. Census Bureau aims to provide accurate data on the people and economy of the nation, as well as geographic information including boundary maps of states, counties, places, and census tracts, through questionnaires every 10 years. Here, we have extracted the population of SEUS counties from this dataset to analyze the flash flood exposure. We also derived the area of each county from the SEUS counties shapefile to estimate the population density as another indicator of flash flood exposure.

2.7. The Centers for Disease Control and Prevention’s Social Vulnerability Index

The Centers for Disease Control and Prevention’s (CDC) Social Vulnerability Index (SVI) is based on 15 social factors, including unemployment, minority status, and disability. These factors are divided into four themes, namely socioeconomic status, household composition and disability, minority status and language, and housing and transportation (Cimellaro et al. 2016). The SVI values are available for the years 2000, 2010, 2014, and 2016. Since these data are not available for all years during 1996 to 2017, for simplicity and consistency we used the 2016 SVI at the county level to evaluate vulnerability to flash flood events.

3. Methodology

This study proposes a risk-based model for flash flood damage prediction over the SEUS, a valuable tool for decision makers and insurance companies. The framework of the proposed approach is presented in Figure 1.

Figure 1. Schematic representation of the proposed framework for flash flood damage prediction. In the figure, ANN (MLP) stands for Artificial Neural Network (Multilayer Perceptron), and RF is Random Forest.

3.1. Filling the Gaps in Zillow Dataset

One of the variables used in this study is the median home value, which explains the flash flood exposure. We utilized the Zillow dataset to extract this information for each flash flood event during 1996 to 2017 over the SEUS. Unfortunately, the median home value is not available for all counties and all years in the study period. To cope with this shortcoming, we utilized an Artificial Neural Network (ANN) to predict the missing median home values. ANN models are suitable for modeling a wide variety of nonlinear problems by extrapolating the relationships between a set of inputs and the output without any prior assumption about, or knowledge of, the underlying physics of the process (ASCE Task Committee 2000a; ASCE Task Committee 2000b; Mitra et al. 2016; Asadi et al. 2013). There are several versions of ANNs that could be adopted to find missing home values. Our appraisal analysis suggests that a simple ANN-MLP structure suffices to properly estimate the missing home values in this study. A typical ANN-MLP structure consists of three layers: an input layer, a hidden layer that contains the neurons, and an output layer. ANN models with high-dimensional inputs that include irrelevant variables perform poorly (Bowden et al. 2005a; Bowden et al. 2005b; Wu et al. 2014). In this study, our input variables only include the centroid latitude and longitude of the counties, the year, and the corresponding county population, while the output is the median home value for each specific year and county. The number of neurons in the hidden layer was selected by a trial-and-error approach. Out of 16610 samples (755 counties in the SEUS during the 1996-2017 period: 755×22), 9853 cases were available in the Zillow dataset and the remaining median home values (16610 - 9853 = 6757 cases) were missing.

There are several methods for splitting the data into different subsets for training, validating, and testing the model (Bowden et al. 2002; Wu et al. 2013). Here, we randomly separated the 9853 samples into three groups: training (70% of data), to train and calibrate the ANN model; validation (15% of data), to validate the trained model and avoid potential overfitting; and testing (15% of data), to verify the performance of the trained model. It is important to note that random separation of the data set supports the generalizability of the trained model. We normalized the input variables and trained the ANN-MLP model using the training dataset. Validation is an important part of the modeling (Humphrey et al.). Here, we used the validation data for early stopping during the model development process. The trained model was verified using the testing dataset, as shown in Figure 2. The result shows a high correlation between the model output and the actual median home values reported by Zillow. Therefore, the trained model was used to estimate the missing median home values in the Zillow dataset.

Figure 2. Verification result of the ANN-MLP model for the testing period. The model is used to fill in the missing median home values of the Zillow dataset; R = correlation coefficient.
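
The sample counts and the 70/15/15 random split described above can be reproduced with a short sketch. This is only an illustration under stated assumptions: the function name, the rounding rule, and the fixed seed are hypothetical choices, and the original study does not specify how fractional sample counts were handled.

```python
import random

N_COUNTIES = 755
N_YEARS = 22                      # 1996 through 2017, inclusive
N_AVAILABLE = 9853                # county-years present in the Zillow data

n_total = N_COUNTIES * N_YEARS    # all county-year samples
n_missing = n_total - N_AVAILABLE # samples the ANN has to fill in


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly split sample indices into training, validation, and testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seed is an assumption, for repeatability
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # remainder, roughly 15%
    return train, val, test


train_idx, val_idx, test_idx = split_indices(N_AVAILABLE)
```

The validation subset would then be used for early stopping and the testing subset for the final check, as described in the text.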

3.2. Variable Selection

Variable selection is a common procedure for model development in artificial intelligence. It helps remove redundant predictors that add noise to the major estimators, and it saves computation time. Additionally, it reduces the potential for model overfitting. Figure 3 illustrates the variable selection process; the final selected features are shown in red, yellow, blue, and grey, representing exposure, vulnerability, hazard, and spatiotemporal features, respectively.

We selected our variables in several steps, so that we would be able to address one issue at a time. The geomorphologic features of the inundated area, namely altitude, slope, flow accumulation, and topographic roughness index, were extracted at different spatial resolutions (1, 3, and 30 kilometers). The correlation between each resolution and the reported damage was estimated, and the resolution with the highest correlation was selected in this step. Please note that we also used the Spearman correlation coefficient, which assesses the monotonic relationship between two variables (whether linear or not), and found that the selected variables were the same as those obtained with the Pearson correlation. Afterwards, we used the Variance Inflation Factor (VIF) approach to remove multicollinear variables. VIF is calculated as $1/(1 - R^2)$, where $R$ is the correlation computed for each pair of the predictor variables. To further reduce the dimension of our input variables, we also performed a leave-one-out approach in which one input variable at a time was removed and the prediction was repeated. The set of variables yielding the most accurate prediction result was selected as our final model input. Most of the selected variables (e.g., duration and median home value) represent the hazardousness of the flash flood events and the amount of exposed property. The household composition and disability index represents the percentage of people aged 65 or older, people aged 17 or younger, civilians with a disability, and single-parent households. We chose this factor because it had a higher correlation with the flash flood property damage. This group of people may reside in regions that are more prone to flooding due to either a lack of awareness or their financial standing. The location of the flash flooding enables our model to predict the damage for a large region, and the timing variables (i.e., month and onset time) are indicators of factors that are not included in our study (e.g., soil moisture). We compared the models’ performance (classification and regression scenarios) with and without the variable selection approach. The comparison showed that the proposed variable selection procedure, which comprises three main components (correlation coefficient, VIF, and leave-one-out), helps ensure that our ML models are fed the most appropriate input variables and yields generalizable models that are not overfitted.
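
The pairwise VIF computation described above can be sketched as follows. The code follows the definition given in the text, VIF = 1/(1 - R²) with R the Pearson correlation of a pair of predictors; note that VIF is more commonly computed from the R² of regressing one predictor on all the others, so this pairwise form is a simplification. The function names and the sample vectors are hypothetical, and the text does not state the VIF threshold used to drop a variable.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def pairwise_vif(x, y):
    """VIF for a pair of predictors, using VIF = 1 / (1 - R^2)."""
    r = pearson(x, y)
    r_squared = r * r
    if r_squared >= 1.0:
        return math.inf        # perfectly collinear pair
    return 1.0 / (1.0 - r_squared)


# Illustrative predictors only (not data from the study).
slope = [1, 2, 3, 4, 5]
roughness = [2, 1, 4, 3, 5]
vif = pairwise_vif(slope, roughness)
```

A pair with a large VIF signals that one of the two predictors is largely redundant and is a candidate for removal.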

Figure 3. Flowchart of the variable selection method, and the final 11 chosen variables (at the bottom) that are used as input to the Random Forest model. Red, yellow, blue, and gray colors are used for variables representing exposure, vulnerability, hazard, and spatiotemporal features, respectively.

Figure 4 shows the spatial variation of input features including vulnerability, population of each county in 2017, median home value in 2017, mean duration of flash floods during 1996 to 2017, long-term average intensity of flash flood events during 1996 to 2017, flow accumulation, and slope. Figure 4 also illustrates the monthly and diurnal distribution of flash flood events during 1996 to 2017. This figure indicates that flash floods are more frequent during spring and summer (April to September), and the onset is more likely to happen in the afternoon (3pm-7pm).

Figure 4. The spatial variation of input features used for predicting flash flood damages. a) The 2016 relative vulnerability index (household composition & disability); b) Population of each county in 2017; c) Median home value in 2017; d) Mean duration of flash floods during 1996 to 2017; e) Long-term average intensity of flash flood events during 1996 to 2017; f) flow accumulation; g) slope for each county; and the monthly (h) and diurnal (i) distribution of flash flood events during 1996 to 2017.

3.3. Random Forest

The objective of this study is to build a model that can predict flash flood damage using the event characteristics as the input variables. In this study, we used Random Forest (RF) for classification and prediction of flash flood property damage. RF, proposed by Breiman (2001), is an ensemble learning method that generates multiple decision trees, each using a subset of samples randomly selected with replacement. This method is suitable for both regression and classification problems. Owing to its randomized and decorrelated trees, RF is able to capture the connection between the input and output variables even when their relationship is very complex and nonlinear (He et al. 2016; Hong et al. 2016).

In this study, RF was used in two modes, classification and regression (see Figure 5). For the classification problem, we transformed the damage values to a binary zero-and-one scoring system, such that zero represents events with no property damage and one refers to any damage value greater than zero. In the regression mode, RF is used to estimate the relationships between the predictors and the output variable (damage). To deal with the skewness of the data, for the regression model, both input and output variables were transformed using Box-Cox and log transformations. We randomly split the data set into two groups, 85% of the data for training and the remaining 15% for testing. In both the classification and regression models, using a trial-and-error approach, we found that 1000 trees yield promising performance. Using 1000 trees improves the model performance compared with a smaller number of trees, while increasing the number of trees beyond 1000 results in only very minor improvement and adds significantly to the computational cost. The model was also verified using the testing dataset.
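
The two target preparations described above (binary labels for classification, and a skewness-reducing transform for regression) can be sketched in a few lines. This is an illustrative sketch only: the function names, the sample damage values, and the choice of the Box-Cox parameter are hypothetical, and the text does not state which variables received which transform or which parameter values were used.

```python
import math


def to_binary_label(damage_usd):
    """Classification target: 1 if the event caused any property damage, else 0."""
    return 1 if damage_usd > 0 else 0


def box_cox(value, lam):
    """Box-Cox transform of a strictly positive value.

    lam == 0 reduces to the natural logarithm, so the log transform
    is a special case of the Box-Cox family.
    """
    if value <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    if lam == 0:
        return math.log(value)
    return (value ** lam - 1.0) / lam


# Illustrative damage values in USD (not data from the study).
damages = [0, 5000, 0, 120000]
labels = [to_binary_label(d) for d in damages]

# Assumption: only events with positive damage are transformed for regression.
log_damages = [box_cox(d, 0) for d in damages if d > 0]
```

Because the transform is undefined at zero, non-damaging events have to be excluded or offset before it is applied; the sketch simply excludes them.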

Figure 5. The schematic representation of the flash flood damage prediction framework. In the figure, RF stands for Random Forest, and AUC is the area under the relative-operating characteristic curve.

4. Results and Discussion

The results are discussed in two subsections below. Section 4.1 reports the performance of the proposed classification model for classifying flash flood events as damaging or non-damaging, and section 4.2 explains the effectiveness of the regression model for flash flood damage prediction.
8
9 269 4.1. Damaging vs. Non-damaging Classification

cri
10
11 270 Here, sensitivity (true positive rate) and specificity (true negative rate) are utilized to assess
12
13 271 the performance of the developed classifier model (Lin et al. 2019). Sensitivity measures the
14
15
272 proportion of positives that are correctly identified (i.e. the events that actually caused property
16

us
17
18 273 damage and the model correctly classified them as damaging events) and specificity assesses the
19
20 274 proportion of negatives that are correctly determined (i.e. the events that actually caused no
21
22
23
24
25
26
27
28
275

276

277
an
property damage and the model correctly classified them as non-damaging events), both of which

range from zero to one with an ideal value equal to one, which is indication of perfect model

accuracy. Therefore, sensitivity and specificity are calculated using the following equations:
dM
29
30 𝑁𝑜. 𝑜𝑓 𝑡𝑟𝑢𝑒 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑 𝑑𝑎𝑚𝑎𝑔𝑖𝑛𝑔 𝑒𝑣𝑒𝑛𝑡𝑠
278 𝑆𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 = (1)
31 𝑇𝑜𝑡𝑎𝑙 𝑁𝑜. 𝑜𝑓 𝑑𝑎𝑚𝑎𝑔𝑖𝑛𝑔 𝑒𝑣𝑒𝑛𝑡𝑠
32
33 𝑁𝑜. 𝑜𝑓 𝑡𝑟𝑢𝑒 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑 𝑛𝑜𝑛−𝑑𝑎𝑚𝑎𝑔𝑖𝑛𝑔 𝑒𝑣𝑒𝑛𝑡𝑠
34 279 𝑆𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦 = (2)
𝑇𝑜𝑡𝑎𝑙 𝑁𝑜. 𝑜𝑓 𝑛𝑜𝑛−𝑑𝑎𝑚𝑎𝑔𝑖𝑛𝑔 𝑒𝑣𝑒𝑛𝑡𝑠
35
36

Figure 6 shows the performance of the RF classifier model. The locations of correctly (blue) and incorrectly (red) classified events in the testing dataset, as well as the sensitivity and specificity of the model for each state, are shown in this figure. Figures 6a and 6b show the results for damaging and non-damaging events, respectively. The numbers of correct (true) and incorrect (false) classifications are shown in each figure panel as well. The overall performance of the model is fairly high in classifying both damaging and non-damaging events. The sensitivity and specificity for the states of Alabama, Florida, and Louisiana are considerably high (greater than 0.75), which indicates the higher reliability of the classification model in these states. Sensitivity and specificity are inversely related: as sensitivity increases, specificity tends to decrease, and vice versa (Parikh et al. 2008). This is particularly apparent for Mississippi and North Carolina. Although North Carolina has a low value of sensitivity, it has a high value of specificity (>0.9). Conversely, a high value of sensitivity and a low value of specificity are observed for Mississippi.

1
2
3
4

pt
5
6
7
8
9

cri
10
11
12
13
14
15
16

us
17
18
19
20
21
22
23
24
25
26
27
28
an
dM
29
30
31
32
33
34
35
36
37
38
pte

39
40
41
42
43
44
45
46
ce

47
48
49 293
50
Figure 6. The performance of the proposed binary damage classification approach for (a) damaging and (b) non-damaging flash flood events. The blue and red colors indicate the true and false predictions, respectively. The points on the map show the location of flash flood events. The total number of correct and incorrect predicted events are shown for both cases. On the right side of the panels, the sensitivity and specificity of the model are shown for each state.

To better understand the classifier model's performance, the overall sensitivity and specificity of the model, as well as its accuracy (see Eq. 3), are presented in Figure 7a. Together, these three measures indicate the reliability of the model in classifying flash flood damage.
$$\text{Accuracy} = \frac{\text{No. of true predicted damaging events} + \text{No. of true predicted non-damaging events}}{\text{Total No. of events}} \tag{3}$$

Moreover, to further evaluate the ML model, we estimated the area under the Relative-Operating Characteristic (ROC) curve. The ROC curve represents the model's trade-off between the false positive rate (1 − specificity) and the true positive rate (sensitivity). The area under the curve (AUC) summarizes the accuracy of the model; it ranges from 0.5 to 1, where 1 is the ideal value (Rahmati and Pourghasemi 2017; Chapi et al. 2017). The dashed line in Figure 7a corresponds to an AUC of 0.5, which indicates a model that performs no better than random guessing. Several studies have employed AUC to measure the performance of classifier models. For instance, Joo et al. (2019) used a Bayesian network to integrate the weights of different variables that affect flood damage and reported an AUC value of 0.67 for their method. The high AUC value (0.87) shown in Figure 7a is an indication of the reliability of the proposed model.
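The accuracy of Eq. (3) and the AUC can be sketched in a few lines of Python. The AUC below uses the rank-based interpretation (the probability that a randomly chosen damaging event receives a higher score than a randomly chosen non-damaging event, with ties counted as one half), which is equivalent to the area under the ROC curve. The function names and example values are illustrative assumptions, not the study's implementation.

```python
def accuracy(observed, predicted):
    """Fraction of events whose predicted class matches the observed class (Eq. 3)."""
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return correct / len(observed)


def roc_auc(observed, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumed coding (illustrative): 1 = damaging, 0 = non-damaging;
    `scores` are predicted probabilities of the damaging class.
    """
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # damaging event ranked above non-damaging event
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical values, for illustration only
observed = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(observed, predicted), roc_auc(observed, scores))
```

The pairwise loop is quadratic in the number of events; for large datasets a sort-based rank computation would be the more practical choice.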
Figure 7b shows the importance of each variable in the developed classifier model. The importance of each variable is calculated based on the increase in the prediction error when the values of that variable are randomly permuted. As can be seen from the figure, the most important variables are those describing the location of the event (latitude and longitude). This implies that by considering the location of events along with other geographic, socioeconomic, and flood factors, we can extend our prediction to larger domains. Flow accumulation is the least important feature; however, the leave-one-out approach mentioned earlier in Section 3.2 indicated that keeping this variable increases the accuracy.
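The idea of permutation-based importance can be sketched generically as follows. This is a simplified illustration that works with any prediction function; it is not the exact procedure of the study (Random Forest implementations often compute this on out-of-bag samples). The toy classifier, feature values, and function names are hypothetical.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in misclassification rate when each feature is shuffled."""
    rng = random.Random(seed)

    def error(rows):
        return sum(1 for row, label in zip(rows, y) if predict(row) != label) / len(y)

    baseline = error(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving all other features untouched
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(error(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted classifier: it uses feature 0 only
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0


X = [[0.1, 5.0], [0.9, 3.0], [0.8, 4.0], [0.2, 1.0], [0.7, 2.0], [0.3, 6.0]]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y)
print(importances)
```

In this toy case the second feature has zero importance because the classifier never uses it, whereas shuffling the first feature increases the error.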
Figure 7. a) The Relative-Operating Characteristic (ROC) curve of the proposed Random Forest classifier; AUC = Area Under Curve. b) The relative importance of features for the random forest classifier model.

4.2. Damage Prediction Model
RF is used not only as a classifier; it is also implemented to predict the amount of property damage from a particular flash flood event. The flash flood events that caused property damage were randomly divided into two parts: training (85% of the dataset) and testing (15% of the dataset). The results of the developed model are evaluated using two performance measures, the correlation coefficient (R) and bias, both of which have been commonly used to measure the accuracy and performance of ML models (Abbaszadeh et al. 2019a; Neri et al. 2019; Gavahi et al. 2019; Shastry and Durand 2019). Here, the regression (i.e. damage prediction) model is evaluated for the training, testing, and entire datasets, and the results are shown in Figure 8. The statistical measures shown in this figure indicate a satisfactory agreement between the observed and predicted values. However, a slightly negative bias is observed in the model (mean of -$1100 for testing and -$1010 overall).
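The two performance measures can be sketched as follows. The sign convention for bias (predicted minus observed, so that a negative value means underestimation) is an assumption made here for illustration, as are the function names and the damage values.

```python
import math


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient (R) between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_bias(observed, predicted):
    """Mean of (predicted - observed); the sign convention is assumed, not from the source."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical property damages in dollars, for illustration only
observed = [10000.0, 20000.0, 30000.0, 40000.0]
predicted = [9000.0, 19000.0, 29000.0, 39000.0]
r = correlation_coefficient(observed, predicted)
bias = mean_bias(observed, predicted)
print(r, bias)
```

The example shows why both measures are reported together: a model can correlate perfectly with the observations and still underestimate every event by a constant amount.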
Figure 8. The performance of the Random Forest model in prediction of flash flood damage over the SEUS. The subplots indicate the histogram of bias during the training, testing, and the entire dataset (totaling 5500, 970, and 6470 events, respectively). The axis titles for all the panels are the same.

The findings of several studies suggest that climate change will increase the likelihood of flooding events (Marsooli et al. 2019; Sisco et al. 2017; Yin et al. 2018; Zhang et al. 2018), and therefore proactive disaster risk management strategies are required. The framework proposed in this study can help decision makers and insurance agencies to better allocate resources and inform communities about the hazard posed by flash flood events (Shao et al. 2019).

5. Summary and Conclusion

This study proposed a risk-based and physically informed model for predicting flash flood property damages across the Southeast U.S. (SEUS) using a variety of influential factors, including geographic, socioeconomic, and climatic features. We selected Random Forest (RF) as the central model. The model was trained and tested using information acquired from various data sources for a large number of flash flood events during the period from 1996 to 2017. RF was implemented in two different modes, classification and regression. In the classification mode, we estimated whether the flash flood caused any property damage, and then, in the regression mode, the amount of property damage was predicted. Various statistical measures were employed to evaluate the performance of both the classifier and regression models, and the results indicated the reliability of the developed framework.

The findings of this study suggest the applicability and accuracy of the RF model for predicting property damages associated with flash flood events over a large domain. For future work, researchers are encouraged to develop probabilistic models for predicting flash flood damage. Moreover, additional predictors, such as watershed properties, can be incorporated into the model.

6. Acknowledgment

We would like to acknowledge the National Centers for Environmental Information for providing access to the NOAA Storm Events Database. We also appreciate the data provided by the North American Land Data Assimilation Systems. The authors declare no competing interests.

7. Data statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

Abbaszadeh, P., H. Moradkhani, and D. N. Daescu, 2019a: The Quest for Model Uncertainty Quantification: A Hybrid Ensemble and Variational Data Assimilation Framework. Water Resour. Res., 55, 2407–2431, doi:10.1029/2018WR023629.
Abbaszadeh, P., H. Moradkhani, and X. Zhan, 2019b: Downscaling SMAP Radiometer Soil Moisture Over the CONUS Using an Ensemble Learning Method. Water Resour. Res., 55, 324–344, doi:10.1029/2018WR023354.
Adger, W. N., 2006: Vulnerability. Glob. Environ. Chang., 16, 268–281, doi:10.1016/j.gloenvcha.2006.02.006. https://linkinghub.elsevier.com/retrieve/pii/S0959378006000422.
Aerts, J.C., Botzen, W.W., Emanuel, K., Lin, N., De Moel, H. and Michel-Kerjan, E.O., 2014: Evaluating flood resilience strategies for coastal megacities. Science, 344(6183), pp. 473-475.
Ahmadalipour, A., H. Moradkhani, A. Castelletti, and N. Magliocca, 2019: Future drought risk in Africa: Integrating vulnerability, climate change, and population growth. Sci. Total Environ., 662, 672–686, doi:10.1016/j.scitotenv.2019.01.278. https://doi.org/10.1016/j.scitotenv.2019.01.278.
Ahmadalipour, A., and H. Moradkhani, 2019: A data-driven analysis of flash flood hazard, fatalities, and damages over the CONUS during 1996-2017. Journal of Hydrology, doi:10.1016/j.jhydrol.2019.124106.
Alipour, A., A. Ahmadalipour, and H. Moradkhani, 2020: Assessing flash flood hazard and damages in southeast U.S. (SEUS). J. Flood Risk Manag. (Under review)
Armenakis, C., E. Du, S. Natesan, R. Persad, and Y. Zhang, 2017: Flood Risk Assessment in Urban Areas Based on Spatial Analytics and Social Factors. Geosciences, 7, 123, doi:10.3390/geosciences7040123.
Arnell, N. W., and S. N. Gosling, 2016: The impacts of climate change on river flood risk at the global scale. Clim. Change, 134, 387–401, doi:10.1007/s10584-014-1084-5.
Asadi, S., J. Shahrabi, P. Abbaszadeh, and S. Tabanmehr, 2013: A new hybrid artificial neural networks for rainfall-runoff process modeling. Neurocomputing, 121, 470–480, doi:10.1016/j.neucom.2013.05.023. http://dx.doi.org/10.1016/j.neucom.2013.05.023.
ASCE Task Committee, 2000a: Artificial neural networks in hydrology-I: Preliminary concepts. Journal of Hydrologic Engineering, ASCE, 5 (2), pp. 115-123.
ASCE Task Committee, 2000b: Artificial neural networks in hydrology-II: Hydrologic applications. Journal of Hydrologic Engineering, ASCE, 5 (2), pp. 124-137.
Ashley, S. T., and W. S. Ashley, 2008: Flood fatalities in the United States. J. Appl. Meteorol. Climatol., 47, 805–818.
van Berchum, E. C., W. Mobley, S. N. Jonkman, J. S. Timmermans, J. H. Kwakkel, and S. D. Brody, 2018: Evaluation of flood risk reduction strategies through combinations of interventions. J. Flood Risk Manag., 1–17, doi:10.1111/jfr3.12506.
Bowden, G. J., Dandy, G. C., and Maier, H. R., 2005a: Input determination for neural network models in water resources applications. Part 1 - background and methodology. Journal of Hydrology, 301(1-4), 75-92.
Bowden, G. J., Maier, H. R., and Dandy, G. C., 2005b: Input determination for neural network models in water resources applications. Part 2. Case study: forecasting salinity in a river. Journal of Hydrology, 301(1-4), 93-107.
Bowden, G. J., Maier, H. R., and Dandy, G. C., 2002: Optimal division of data for neural network models in water resources applications. Water Resour. Res., 38(2), 1010.
Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, doi:10.1023/A:1010933404324. http://link.springer.com/article/10.1023/A:1010933404324.
Budiyono, Y., J. Aerts, J. J. Brinkman, M. A. Marfai, and P. Ward, 2015: Flood risk assessment for delta mega-cities: a case study of Jakarta. Nat. Hazards, 75, 389–413, doi:10.1007/s11069-014-1327-9.
Cardona, O. D., and Coauthors, 2012: Determinants of risk: Exposure and vulnerability. Manag. Risks Extrem. Events Disasters to Adv. Clim. Chang. Adapt. Spec. Rep. Intergov. Panel Clim. Chang., 9781107025, 65–108, doi:10.1017/CBO9781139177245.005.
Chapi, K., V. P. Singh, A. Shirzadi, H. Shahabi, D. T. Bui, B. T. Pham, and K. Khosravi, 2017: A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw., 95, 229–245, doi:10.1016/j.envsoft.2017.06.012. http://dx.doi.org/10.1016/j.envsoft.2017.06.012.
Cimellaro, G. P., C. Renschler, A. M. Reinhorn, and L. Arendt, 2016: PEOPLES: A Framework for Evaluating Resilience. J. Struct. Eng., 142, 04016063, doi:10.1061/(asce)st.1943-541x.0001514.
Czajkowski, J., K. Simmons, and D. Sutter, 2011: An analysis of coastal and inland fatalities in landfalling US hurricanes. Nat. Hazards, 59, 1513–1531, doi:10.1007/s11069-011-9849-x.
Durand, F., C. G. Piecuch, M. Becker, F. Papa, S. V. Raju, J. U. Khan, and R. M. Ponte, 2019: Impact of Continental Freshwater Runoff on Coastal Sea Level. Surv. Geophys., doi:10.1007/s10712-019-09536-w. https://doi.org/10.1007/s10712-019-09536-w.
Folk, R. A., C. J. Visger, P. S. Soltis, D. E. Soltis, and R. P. Guralnick, 2018: Geographic Range Dynamics Drove Ancient Hybridization in a Lineage of Angiosperms. Am. Nat., 192, 171–187, doi:10.1086/698120.
Garrote, J., F. M. Alvarenga, and A. Díez-Herrero, 2016: Quantification of flash flood economic risk using ultra-detailed stage–damage functions and 2-D hydraulic models. J. Hydrol., 541, 611–625, doi:10.1016/j.jhydrol.2016.02.006. http://dx.doi.org/10.1016/j.jhydrol.2016.02.006.
Gavahi, K., S. J. Mousavi, and K. Ponnambalam, 2019: Adaptive forecast-based real-time optimal reservoir operations: application to lake Urmia. J. Hydroinformatics, 1–18, doi:10.2166/hydro.2019.005.
Gotham, K. F., R. Campanella, K. Lauve-Moon, and B. Powers, 2018: Hazard Experience, Geophysical Vulnerability, and Flood Risk Perceptions in a Postdisaster City, the Case of New Orleans. Risk Anal., 38, 345–356, doi:10.1111/risa.12830.
Hamidi, A., N. Devineni, J. F. Booth, A. Hosten, R. R. Ferraro, and R. Khanbilvardi, 2017: Classifying Urban Rainfall Extremes Using Weather Radar Data: An Application to the Greater New York Area. J. Hydrometeorol., 18, 611–623, doi:10.1175/JHM-D-16-0193.1. http://journals.ametsoc.org/doi/10.1175/JHM-D-16-0193.1.
Hasanzadeh Nafari, R., T. Ngo, and P. Mendis, 2016: An Assessment of the Effectiveness of Tree-Based Models for Multi-Variate Flood Damage Assessment in Australia. Water, 8, 282, doi:10.3390/w8070282. http://www.mdpi.com/2073-4441/8/7/282.
He, X. G., N. W. Chaney, M. Schleiss, and J. Sheffield, 2016: Spatial downscaling of precipitation using adaptable random forests. Water Resour. Res., 52 (10), pp. 8217-8237.
Humphrey, G. B., Maier, H. R., Wu, W., Mount, N. J., Dandy, G. C., Abrahart, R. J., and Dawson, C. W., 2017: Improved validation framework and R-package for artificial neural network models. Environmental Modelling and Software, 92, 82-106, doi:10.1016/j.envsoft.2017.01.023.
Hong, H., H. R. Pourghasemi, and Z. S. Pourtaghi, 2016: Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology, 259, 105–118, doi:10.1016/j.geomorph.2016.02.012. http://dx.doi.org/10.1016/j.geomorph.2016.02.012.
Ingram, K. T., K. Dow, L. Carter, and J. Anderson, 2013: Climate of the Southeast United States.
Joo, H., C. Choi, J. Kim, D. Kim, S. Kim, and H. S. Kim, 2019: A Bayesian network-based integrated for Flood Risk Assessment (InFRA). Sustain., 11, doi:10.3390/su11133733.
Khajehei, S., A. Ahmadalipour, W. Shao, and H. Moradkhani, 2020: A Place-based Assessment of Flash Flood Hazard and Vulnerability in the Contiguous United States. Sci. Rep., 10:448, doi:10.1038/s41598-019-57349-z.
Khosravi, K., B. T. Pham, K. Chapi, A. Shirzadi, H. Shahabi, I. Revhaug, I. Prakash, and D. Tien Bui, 2018: A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ., 627, 744–755, doi:10.1016/j.scitotenv.2018.01.266. https://doi.org/10.1016/j.scitotenv.2018.01.266.
Koks, E. E., B. Jongman, T. G. Husby, and W. J. W. Botzen, 2015: Combining hazard, exposure and social vulnerability to provide lessons for flood risk management. Environ. Sci. Policy, 47, 42–52, doi:10.1016/j.envsci.2014.10.013. http://dx.doi.org/10.1016/j.envsci.2014.10.013.
Konisky, D. M., L. Hughes, and C. H. Kaylor, 2016: Extreme weather events and climate change concern. Clim. Change, 134, 533–547, doi:10.1007/s10584-015-1555-3.
Kourgialas, N. N., and G. P. Karatzas, 2017: A national scale flood hazard mapping methodology: The case of Greece – Protection and adaptation policy approaches. Sci. Total Environ., 601–602, 441–452, doi:10.1016/j.scitotenv.2017.05.197. http://dx.doi.org/10.1016/j.scitotenv.2017.05.197.
Lin, L., L. Di, J. Tang, E. Yu, C. Zhang, M. S. Rahman, R. Shrestha, and L. Kang, 2019: Improvement and validation of NASA/MODIS NRT global flood mapping. Remote Sens., 11, doi:10.3390/rs11020205.
Llasat, M. C., R. Marcos, M. Turco, J. Gilabert, and M. Llasat-Botija, 2016: Trends in flash flood events versus convective precipitation in the Mediterranean region: The case of Catalonia. J. Hydrol., 541, 24–37, doi:10.1016/j.jhydrol.2016.05.040. http://dx.doi.org/10.1016/j.jhydrol.2016.05.040.
Marlier, M. E., and Coauthors, 2015: Regional air quality impacts of future fire emissions in Sumatra and Kalimantan. Environ. Res. Lett., 10, doi:10.1088/1748-9326/10/5/054010.
Marsooli, R., Lin, N., Emanuel, K. and Feng, K., 2019: Climate change exacerbates hurricane flood hazards along US Atlantic and Gulf Coasts in spatially varying patterns. Nature Communications, 10(1), pp. 1-9.
Miller, J. A., 2018: Credit Downgrade Threat as a Non-regulatory Driver for Flood Risk Mitigation and Sea Level Rise Adaptation.
Mitra, P., and Coauthors, 2016: Flood forecasting using Internet of things and artificial neural networks. 7th IEEE Annu. Inf. Technol. Electron. Mob. Commun. Conf. IEEE IEMCON 2016, 1–5, doi:10.1109/IEMCON.2016.7746363.
de Moel, H., B. Jongman, H. Kreibich, B. Merz, E. Penning-Rowsell, and P. J. Ward, 2015: Flood risk assessments at different spatial scales. Mitig. Adapt. Strateg. Glob. Chang., 20, 865–890, doi:10.1007/s11027-015-9654-z.
Mojaddadi, H., B. Pradhan, H. Nampak, N. Ahmad, and A. H. bin Ghazali, 2017: Ensemble machine-learning-based geospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS. Geomatics, Nat. Hazards Risk, 8, 1080–1102, doi:10.1080/19475705.2017.1294113. https://doi.org/10.1080/19475705.2017.1294113.
Morckel, V., 2017: Using suitability analysis to select and prioritize naturalization efforts in legacy cities: An example from Flint, Michigan. Urban For. Urban Green., 27, 343–351, doi:10.1016/j.ufug.2017.09.006. http://dx.doi.org/10.1016/j.ufug.2017.09.006.
Nafari, R. H., and T. Ngo, 2018: Predictive applications of Australian flood loss models after a temporal and spatial transfer. Geomatics, Nat. Hazards Risk, 9, 416–430, doi:10.1080/19475705.2018.1445666. https://doi.org/10.1080/19475705.2018.1445666.
Neri, A., G. Villarini, K. A. Salvi, L. J. Slater, and F. Napolitano, 2019: On the decadal predictability of the frequency of flood events across the U.S. Midwest. Int. J. Climatol., 39, 1796–1804, doi:10.1002/joc.5915.
Orville, R., and G. Huffines, 2001: Cloud-to-Ground Lightning in the United States: NLDN Results in the First Decade, 1989–98. Mon. Weather Rev., 129, 1179–1193.
Parikh, R., A. Mathai, S. Parikh, G. C. Sekhar, and R. Thomas, 2008: Understanding and using sensitivity, specificity and predictive values. Indian J. Ophthalmol., 56, 45–50, doi:10.4103/0301-4738.37595.
Rahmati, O., and H. R. Pourghasemi, 2017: Identification of Critical Flood Prone Areas in Data-Scarce and Ungauged Regions: A Comparison of Three Data Mining Models. Water Resour. Manag., 31, 1473–1487, doi:10.1007/s11269-017-1589-6.
Scheuer, S., D. Haase, and V. Meyer, 2011: Exploring multicriteria flood vulnerability by integrating economic, social and ecological dimensions of flood risk and coping capacity: From a starting point view towards an end point view of vulnerability. Nat. Hazards, 58, 731–751, doi:10.1007/s11069-010-9666-7.
Sisco, M.R., Bosetti, V. and Weber, E.U., 2017: When do extreme weather events generate attention to climate change? Climatic Change, 143(1-2), pp. 227-241.
Shafapour Tehrany, M., L. Kumar, M. Neamah Jebur, and F. Shabani, 2019: Evaluating the application of the statistical index method in flood susceptibility mapping and its comparison with frequency ratio and logistic regression methods. Geomatics, Nat. Hazards Risk, 10, 79–101, doi:10.1080/19475705.2018.1506509. https://doi.org/10.1080/19475705.2018.1506509.
Shah, V., K. R. Kirsch, D. Cervantes, D. F. Zane, T. Haywood, and J. A. Horney, 2017: Flash flood swift water rescues, Texas, 2005–2014. Clim. Risk Manag., 17, 11–20, doi:10.1016/j.crm.2017.06.003. http://dx.doi.org/10.1016/j.crm.2017.06.003.
Sharif, H. O., T. L. Jackson, M. M. Hossain, and D. Zane, 2015: Analysis of Flood Fatalities in Texas. Nat. Hazards Rev., 16, 04014016, doi:10.1061/(ASCE)NH.1527-6996.0000145. http://ascelibrary.org/doi/10.1061/%28ASCE%29NH.1527-6996.0000145.
Shao, W., Feng, K., and Lin, N., 2019: Predicting support for flood mitigation based on flood insurance purchase behavior. Environmental Research Letters, 14(5), p. 054014.
Shastry, A., and M. Durand, 2019: Utilizing Flood Inundation Observations to Obtain Floodplain Topography in Data-Scarce Regions. Front. Earth Sci., 6, 1–10, doi:10.3389/feart.2018.00243.
Smith, B., and J. Smith, 2015: The Flashiest Watersheds in the Contiguous United States. J. Hydrometeorol., 16, 2365–2381, doi:10.1175/JHM-D-14-0217.1.
Terti, G., I. Ruin, J. J. Gourley, P. Kirstetter, Z. Flamig, J. Blanchet, A. Arthur, and S. Anquetin, 2019: Toward Probabilistic Prediction of Flash Flood Human Impacts. Risk Anal., 39, 140–161, doi:10.1111/risa.12921.
Villarini, G., W. F. Krajewski, A. A. Ntelekos, K. P. Georgakakos, and J. A. Smith, 2010: Towards probabilistic forecasting of flash floods: The combined effects of uncertainty in radar-rainfall and flash flood guidance. J. Hydrol., 394, 275–284, doi:10.1016/j.jhydrol.2010.02.014. http://dx.doi.org/10.1016/j.jhydrol.2010.02.014.
Watson, K. B., T. Ricketts, G. Galford, S. Polasky, and J. O’Niel-Dunne, 2016: Quantifying flood mitigation services: The economic value of Otter Creek wetlands and floodplains to Middlebury, VT. Ecol. Econ., 130, 16–24, doi:10.1016/j.ecolecon.2016.05.015. http://dx.doi.org/10.1016/j.ecolecon.2016.05.015.
Winsemius, H. C., L. P. H. Van Beek, B. Jongman, P. J. Ward, and A. Bouwman, 2013: A framework for global river flood risk assessments. Hydrol. Earth Syst. Sci., 17, 1871–1892, doi:10.5194/hess-17-1871-2013.
Wu, W., May, R., Maier, H., and Dandy, G., 2013: A benchmarking approach for comparing data splitting methods for modeling water resources parameters using artificial neural networks. Water Resources Research, 49(11), 7598–7614.
Wu, W., Dandy, G. C., and Maier, H. R., 2014: Protocol for developing ANN models and its application to the assessment of the quality of the ANN model development process in drinking water quality modelling. Environmental Modelling & Software, 54(0), 108-127.
Xia, Y., and Coauthors, 2012: Continental-scale water and energy flux analysis and validation for the North American Land Data Assimilation System project phase 2 (NLDAS-2): 1. Intercomparison and application of model products. J. Geophys. Res. Atmos., 117, n/a-n/a, doi:10.1029/2011JD016048. http://doi.wiley.com/10.1029/2011JD016048.
Yin, J., Gentine, P., Zhou, S., Sullivan, S.C., Wang, R., Zhang, Y. and Guo, S., 2018: Large increase in global storm runoff extremes driven by climate and anthropogenic changes. Nature Communications, 9(1), p. 4389.
Yu, L., S. Zhong, W. E. Heilman, and X. Bian, 2017: A comparison of the effects of El Niño and El Niño Modoki on subdaily extreme precipitation occurrences across the contiguous United States. J. Geophys. Res., 122, 7401–7415, doi:10.1002/2017JD026683.
Zhang, W., Villarini, G., Vecchi, G.A. and Smith, J.A., 2018: Urbanization exacerbated the rainfall and flooding caused by hurricane Harvey in Houston. Nature, 563(7731), p. 384.
Zhu, J., and X. Z. Liang, 2013: Impacts of the Bermuda high on regional climate and ozone over the United States. J. Clim., 26, 1018–1032, doi:10.1175/JCLI-D-12-00168.1.

Figure captions

Figure 1. Schematic representation of the proposed framework for flash flood damage prediction. In the figure, ANN (MLP) stands for Artificial Neural Network (Multilayer Perceptron), and RF is Random Forest.

Figure 2. Verification result for the ANN-MLP model for the testing period that is used to fill out the missing values of the median home values of the Zillow dataset; R = correlation coefficient.

Figure 3. Flowchart of the variable selection method, and the final 11 chosen variables (at the bottom) that are used as input to the Random Forests model. Red, yellow, blue, and gray colors are used for variables representing exposure, vulnerability, hazard, and spatiotemporal features, respectively.

Figure 4. The spatial variation of input features used for predicting flash flood damages. a) The 2016 relative vulnerability index (household composition & disability); b) Population of each county in 2017; c) Median home value in 2017; d) Mean duration of flash floods during 1996 to 2017; e) Long-term average intensity of flash flood events during 1996 to 2017; f) flow accumulation; g) slope for each county; and the monthly (h) and diurnal (i) distribution of flash flood events during 1996 to 2017.

Figure 5. The schematic representation of the flash flood damage prediction framework. In the figure, RF stands for Random Forest, and AUC is the area under relative-operating characteristic curve.

Figure 6. The performance of the proposed binary damage classification approach for (a) damaging and (b) non-damaging flash flood events. The blue and red colors indicate the true and false predictions, respectively. The points on the map show the location of flash flood events. The total number of correct and incorrect predicted events are shown for both cases. On the right side of the panels, the sensitivity and specificity of the model are shown for each state.

Figure 7. a) The Relative-Operating Characteristic (ROC) curve of the proposed Random Forest classifier; AUC = Area Under Curve. b) The relative importance of features for the random forest classifier model.

Figure 8. The performance of the Random Forest model in prediction of flash flood damage over the SEUS. The subplots indicate the histogram of bias during the training, testing, and the entire dataset (totaling 5500, 970, and 6470 events, respectively). The axis titles for all the panels are the same.