
Journal of Hydrology 470–471 (2012) 302–315


Journal of Hydrology
journal homepage: www.elsevier.com/locate/jhydrol

Classifiers for the detection of flood-prone areas using remote sensed elevation data
Massimiliano Degiorgis, Giorgio Gnecco, Silvia Gorni, Giorgio Roth ⇑, Marcello Sanguineti,
Angela Celeste Taramasso
University of Genoa, Genoa, Italy

Article info

Article history:
Received 31 May 2012
Received in revised form 9 August 2012
Accepted 3 September 2012
Available online 11 September 2012
This manuscript was handled by Corrado Corradini, Editor-in-Chief, with the
assistance of Magdeline Laba, Associate Editor

Keywords:
Flood hazard
Flood risk management
Receiver operating characteristics
Linear classifiers and support-vector machines
Parameter optimization
Shuttle radar topography mission

Summary

A technique is presented for the identification of the areas subject to flooding hazard. Starting from
remote sensed elevation data and existing flood hazard maps – usually available for limited areas –
the relationships between selected quantitative morphologic features and the flooding hazard are first
identified and then used to extend the hazard information to the entire catchment. This is performed
through techniques of pattern classification, such as linear classifiers based on quantitative morphologic
features, and support vector machines with linear and Gaussian kernels. The experiment starts by
discriminating between flood-prone areas and marginal hazard areas. Multiclass classifiers are subsequently
used to graduate the hazard. Their designs amount to solving suitable optimization problems. Several
performance measures are considered in comparing the different classifiers, such as the area under the
receiver operating characteristics curve, and the sum of the false positive and false negative rates.
The procedure has been validated for the Tanaro basin, a tributary to the major Italian river, the Po.
Results show a high reliability: the classifier properly identifies 93% of flood-prone areas, and only 14%
of the areas subject to a marginal hazard are improperly assigned. An increase of this latter value up
to 19% is detected when the same structure is applied for hazard graduation. Results derived from the
application to different catchments seem to qualitatively indicate the ability of the classifier to perform
well also outside the calibration region.
Pattern classification techniques should be considered when the identification of flood-prone areas and
hazard grading is required for large regions (e.g., for civil protection or insurance purposes) or when a
first identification is needed (e.g., to address further detailed flood-mapping activities).
© 2012 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Address: University of Genoa, Via Montallegro 1, Genoa
16145, Italy. Tel.: +39 0103532486; fax: +39 0103532546.
E-mail address: giorgio.roth@unige.it (G. Roth).

1. Introduction

Flooding is one of the most significant natural risks. Its impact concerns almost all
the components of global communities, independently of their geographic location and
their social and economic structures. The mapping of the hazard component of the
flooding risk is frequently identified as the basic element on which risk mitigation
strategies should be developed. Consequently, many countries regulate hazard and risk
mapping by law. In 1973 the Congress of the United States of America, through the
Flood Disaster Protection Act (Pub. L. No. 93-234; 87 Stat. 975), recognized the
relevance of the flooding hazard and called for the identification of floodplain areas
and related hazard areas. More recently, in 2007, the European Commission, through the
Flood Directive (2007/60/EC), required European Union member states to produce flood
hazard and flood risk maps. While adverse effects on asset values, people and the
environment should in perspective be included in the analysis (de Moel et al., 2009;
Ghizzoni et al., 2010, 2012), the still limited availability of flood-prone and
hazard-grading maps requires an effort to be directed toward the completion of this
knowledge.
Flood hazard maps constitute the result of a modeling chain that usually starts from
the collection of historical information about past flood events, which allows for the
first recognition of potentially hazardous reaches and river sections. A hydrologic
analysis is then performed to define flow peak discharges and related hydrographs for
assigned return periods. Those are the input for hydraulic flow propagation models,
which allow for the description of water levels along the reaches under examination.
While hydrologic analyses usually involve the availability of relevant discharge time
series, the hydraulic analysis requires the knowledge of both river channel geometry
and pertinent characteristics, such as the surface roughness and boundary conditions.
At this stage, critical sections for the given return periods can be identified. They
constitute the starting point of the modeling of the inundation process over the
floodplains. A variety of different models can be assumed to represent the involved
physical processes, and the choice of the right model depends upon both the

0022-1694/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jhydrol.2012.09.006

characteristics of the physical system under analysis and the accuracy of the expected
results (Guzzetti et al., 2005; Horritt and Bates, 2002; Hunter et al., 2007).
The above outlined procedure is well established and able to accurately recognize
flood-prone areas down to the scale of the single building. On the other hand, it is
expensive and time consuming; moreover, it requires information not readily available
for all areas. For all these reasons, even in developed countries, the complete mapping
of flood-prone areas is far from being achieved.
The development and processing of Digital Elevation Models (DEMs) is a subject of
increasing interest for a number of environmental disciplines. Consequently, the
availability of new technologies to measure surface elevation (e.g., GPS, SAR, SAR
interferometry, radar and laser altimetry) has made the application of DEM-based models
more attractive. Moreover, in recent years the DEM-based automatic characterization of
hydrological and morphological features (e.g. drainage area, stream channels, valley
bottoms, and floodplain identification) has become a practice for hydrologists and
geomorphologists, substituting time-consuming manual procedures (Bates et al., 2003;
Dodov and Foufoula-Georgiou, 2005; Gallant and Dowling, 2003; Giannoni et al., 2005,
2008; Manfreda et al., 2011; Nardi et al., 2006, 2008; Noman et al., 2001). Among global
elevation sources, in the following reference is made to HydroSHEDS (Hydrological data
and maps based on SHuttle Elevation Derivatives at multiple Scales; see Lehner et al.
(2008) for technical information), based on the NASA Shuttle Radar Topography Mission
(SRTM).
In the present contribution, a possible approach is delineated for flood-prone area
identification and hazard grading from Basin Authorities and remote sensed elevation
datasets. For this purpose, a number of DEM-derived quantitative morphologic features
were selected: local slope, contributing area, site elevation and distance from the
potential source of flooding, and surface concavity. On the other hand, flood hazard
maps provided by Basin Authorities, and available for limited portions of the basin
area, complete the dataset for the calibration of relationships between the selected
morphologic features and the flooding hazard. Once these relationships are identified
and calibrated, the extension of the hazard information to the entire catchment can be
performed. This is achieved through techniques of pattern classification such as the
use of linear classifiers based on one or two of the morphologic features, and of
Support Vector Machines (SVMs) with linear and Gaussian kernels (Franc and Hlávač, 2004;
Vapnik, 1998). The experiments have been made first by discriminating between
flood-prone areas and marginal hazard areas, then multiclass classifiers have been used
for hazard graduation. For the purpose of the present paper, a marginal hazard level is
introduced: it distinguishes areas that are subject to the flood hazard with a return
time greater than that used to identify flood-prone areas.
This paper is organized as follows. Section 2 presents the Tanaro case-study area.
Section 3 introduces the elevation dataset and the methods used to prune the drainage
network, i.e. the source of the flood hazard. The marginal hazard concept is presented
in Section 4 together with a simple procedure able to identify areas that are subject
to this hazard level. Section 5 describes the selected morphological features, i.e.
those for which the relation with the flooding hazard is to be evaluated. Performance
measures are intensively used to identify the best classifier and to validate the
procedure. They are introduced in Section 6. Linear classifiers and SVM with linear and
Gaussian kernels are presented in Sections 7 and 8, respectively. A procedure for
hazard graduation in recognized flood-prone areas is delineated in Section 9. Finally,
Section 10 presents the application and validation of the procedure with reference to
the Tanaro case study as well as the qualitative validation performed within portions
of the Tevere, the Dora Baltea and the Quirra catchments (Italy). Section 11 is a brief
concluding section. To make the paper self-contained, an appendix on basic results from
Statistical Learning Theory is included.

2. The Tanaro basin case study

Tanaro is a 276 km-long river in North-western Italy (Fig. 1). It rises in the Ligurian
Alps, close to the border with France, and is the most significant right-side tributary
to the Po River in terms of length, drainage area (partly Alpine and partly Apennine)
and discharge. At its junction with the Po River, the Tanaro drains about 8000 km²,
500 km² of which are in mountainous terrain. Major tributaries are the Stura di Demonte
(contributing area 1430 km²), Alto Tanaro (1840 km²), Belbo (480 km²), Bormida
(1640 km²) and Orba (840 km²). Morphological variability allows for the identification
of three main zones with characteristic behaviors. The mountain part, with an average
6% slope, very steep

Fig. 1. Tanaro basin: location map (a) and hillshaded representation of the SRTM dataset (b).

catchments and deep river beds; the mild part, with an average 1% slope, shallower
river beds and mildly steep catchments; finally, the alluvial part, characterized by
very low slope values. Unique among the Po right-side tributaries, the river has an
Alpine origin. Nevertheless, the Ligurian Alps are not high enough, and are located too
close to the sea, to allow for the formation of snow fields or glaciers large enough to
provide a steady source of water during the dry summer season. Furthermore, the Alpine
zone forms only a part of the basin drained by the Tanaro. The discharge is therefore
subject to a great deal of variation, and the seasonal regime of the river is more
typical of an Apennine torrent, with maximum discharges in spring and autumn and a very
small flow rate in summer. The river is highly prone to flooding. During the last two
centuries, the Tanaro basin was affected by floods on 136 occasions, the most
devastating being that of November 1994, when the whole of the river valley was
affected by severe flooding (Marchi et al., 1996; Luino, 2002). Due to these
characteristics, the Tanaro was selected as case study for the design of risk scenarios
for the flooding hazard (Ghizzoni et al., 2010).
Fig. 1 incorporates a hillshaded representation of the Tanaro basin elevation data. The
DEM used in this study to describe the drainage basin surface is derived from the NASA
SRTM mission with a 3 arc-seconds resolution. For the study area, this corresponds to a
DEM grid size of about 85 × 85 m. While this resolution is far from allowing for the
recognition of flood control structures – such as levees, dikes and weirs – it is
accurate enough to describe the local terrain morphology (a discussion on the influence
of DEM source and grid size on the delineation of flood-prone areas is provided in
Manfreda et al. (2011)).
Fig. 2 depicts the Po Basin Authority results relevant to the present study
(www.adbpo.it). Panel (a) shows the portion of the drainage network for which
hydrologic and hydraulic studies were performed to define flood-prone areas and hazard
graduation. Panel (b) shows the areas that are recognized as subject to possible floods
produced by the reaches that constitute the network in Panel (a). Panel (c) shows,
within flood-prone areas, the hazard graduation. The flooding hazard is graduated in
three classes: high, medium, and low.
From Fig. 2, one could realize that the work needed to complete the knowledge of the
flooding hazard for the Tanaro basin is far from being finished. When analyzing the
hazard for an element located outside recognized flood-prone areas, i.e., within the
gray areas in Fig. 2c, this incomplete knowledge presents severe drawbacks. In fact,
since many non-studied tributaries are present, each location in the catchment seems to
be potentially subject to the flooding hazard.

3. Drainage network as the hazard source

In the context of the present work, streams constitute the
source of the hazard. Within a given drainage basin, the hazard is
known when flood-prone areas and hazard grading are available
for all the streams that constitute its drainage network. The first
step is therefore to define a reference drainage network, which
should include all major streams, their tributaries, and sub-tribu-
taries down to a certain limit below which one could expect that
no relevant inundation process could be generated. The way in
which such limit is identified depends on the specific application
for which flood maps are produced. For the purpose of the present
work, the extension of the network will be fixed in terms of catch-
ment drainage density, Dd, defined as the ratio of the total drainage
network length to the basin area.
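
As a rough illustration of the drainage density definition given above, the following Python sketch estimates Dd from a raster channel mask. The function name, the boolean-mask representation, and the approximation that each channel cell contributes one cell width of stream length are assumptions made for illustration only; they are not the procedure used in the study.

```python
import numpy as np

def drainage_density(channel_mask, basin_mask, cell_size_m):
    """Estimate drainage density (in 1/m) on a regular grid.

    Assumption (hypothetical simplification): every channel cell
    contributes one cell width of stream length, so diagonal steps
    are not lengthened.
    """
    channel_mask = np.asarray(channel_mask, dtype=bool)
    basin_mask = np.asarray(basin_mask, dtype=bool)
    n_basin = int(basin_mask.sum())
    if n_basin == 0:
        raise ValueError("basin_mask contains no cells")
    # Total network length: channel cells inside the basin times cell width.
    total_length_m = int((channel_mask & basin_mask).sum()) * cell_size_m
    # Basin area: number of basin cells times the area of one cell.
    basin_area_m2 = n_basin * cell_size_m ** 2
    return total_length_m / basin_area_m2

# Toy example: a 4 x 4 basin of 100 m cells with one straight 4-cell channel.
basin = np.ones((4, 4), dtype=bool)
channels = np.zeros((4, 4), dtype=bool)
channels[:, 1] = True
dd_toy = drainage_density(channels, basin, cell_size_m=100.0)
```

A vector representation of the network would give a more accurate length; the raster count is used here only because it keeps the ratio of length to area easy to follow.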
Elevation data were obtained from HydroSHEDS
(hydrosheds.cr.usgs.gov/index.php). The HydroSHEDS database offers
hydrologic-related information at both regional and global scales
in a format compatible with GIS applications. These include, e.g.,
drainage directions and networks, water divides, and contributing
areas. For the aim of the present study, the Void-filled (DEM–VOID)
and the Hydrologically conditioned (DEM–CON) elevation models
are utilized. Both DEMs are derived from SRTM data. In DEM–VOID
missing data-points are filled and main elevation inconsistencies
removed. DEM–CON is further conditioned to produce a coherent
drainage, one in which the spatial location of the DEM-derived
drainage network matches its actual position (detailed specifications
are provided in hydrosheds.cr.usgs.gov/datasets.php).
Finally, one could note that conditioning procedures modify elevation
data. This limits the use of the DEM–CON to drainage network
identification procedures. Quantitative measures of morphologic
characteristics involving elevation data, e.g., local slope or
elevation difference between points, have therefore been derived from
DEM–VOID data. This dataset has been chosen for its completeness and,
for the sake of generality of the proposed procedure, used without
alterations. However, one could notice that different procedures,
potentially able to modify results, can be adopted to remove DEM
inconsistencies (for a discussion on these topics see Santini et al., 2009).

Fig. 2. Summary of the Po Basin Authority studies for the Tanaro basin: drainage
network (a), flood-prone areas (b) and hazard graduation (c). Red, yellow and blue
respectively indicate high, medium and low hazard areas. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of
this article.)

Fig. 3. Reference drainage network of the Tanaro basin. Streams studied by the Po
Basin Authority are depicted in dark-blue. (For interpretation of the references to
color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Marginal hazard areas of the Tanaro basin (green) identified according to the
Po Basin Authority hazard studies. Marginal hazard areas are recognized as those
cells (i) directly drained by the network of Fig. 2a, (ii) not prone to floods according
to Fig. 2c, and (iii) not flowing through the streams depicted in light-blue in Fig. 3.
(For interpretation of the references to color in this figure legend, the reader is
referred to the web version of this article.)

Different methods allow for the identification of the drainage network from a DEM (see,
e.g., Band, 1986; Giannoni et al., 2005; O’Callaghan and Mark, 1984; Rodriguez-Iturbe
and Rinaldo, 1997; Roth et al., 1996; Tarboton et al., 1991). In the context of the
present work, the drainage network is pruned through the procedure proposed by Giannoni
et al. (2005), which takes into account contributing area, A, and local slope, S, in
the form $AS^k$. On the basis of this area-slope criterion, a channel is expected to
start from locations where the quantity $AS^k$ exceeds a threshold value. Once a
channel is generated, its path to the outlet is identified by following the maximum
slope direction.
This procedure produces a non-uniform drainage density, an important attribute when
accurate recognition of the extension of the drainage system is essential. In this
framework, the k exponent of the threshold expression is substantially responsible for
drainage density redistribution within the basin. In fact, this parameter, by assigning
different importance to the slope, augments the influence of high slope values in steep
mountain zones, and does the opposite in flat areas. The k = 1.7 value is assumed in
the present work (for a complete discussion on this topic see Giannoni et al., 2005).
The threshold above which a drainage is produced is here fixed to achieve, for the
Tanaro catchment, the average drainage density value obtained by the Po Basin
Authority, that is $AS^k \geq 5 \times 10^4$ m² and $D_d = 0.74$ m⁻¹. The resulting
drainage network is depicted in Fig. 3, from which one could both remark the
significant increase in the extension of the network, and guess (e.g., from DEM
shadows) the presence of small tributaries still not considered as potential hazard
sources. Obviously, landscape dissection by surface transport processes starts well
below the dimension here determined by the threshold value assumption (see, e.g.,
Montgomery and Dietrich, 1988). Nevertheless, the small size of un-dissected
catchments, as well as their location in hilly and mountain areas, allows assuming that
all main hazard sources are taken into account, at least for the aims of the present
work. Moreover, a possible downsizing to the scales of surface dissection and channel
generation processes will result in an all-pervading hazard source.

4. Marginal hazard identification

For the purpose of the present paper, a marginal hazard level is assigned to the areas
that are subject to the flood hazard with a return time greater than that used to
identify flood-prone areas. The hazard in such areas is less than low, and tends to
zero. As a consequence, the flooding hazard for a given location can be set to marginal
if (i) all the potential flood risk sources have been taken into account and (ii) for
each single risk source the location is outside its flood-prone area.
The calibration of classifiers able to discriminate the hazard level needs a training
set that includes elements that belong to all the classes: high, medium, low, and
marginal. For this purpose, marginal hazard areas associated with the network of Fig. 2
should be identified.
Marginal hazard areas are here recognized as the ensemble of the DEM cells that are:
(i) directly drained by the studied network of Fig. 2a; (ii) not recognized as prone to
floods according to Fig. 2b; and (iii) not flowing through the streams depicted in
light-blue in Fig. 3. Note that the last condition holds only for those locations not
subject to be flooded from a non-studied stream.
In Fig. 4 the results of this identification procedure are presented; the map now
includes four hazard classes (high, medium, low, and marginal), although unclassified
areas are still present.

5. Flood-related basic morphologic features

Binary classifiers need a dataset of features on which calibration and predictions are
to be performed. Basin surface topography and morphology are here represented by a DEM:
an obvious ingredient of the dataset is therefore the location of the cell under exam
(latitude and longitude). The hazard class (high, medium, low, or marginal) will be
provided for calibration purposes where available. Other features should be related to
the physical process under investigation, and available for the entire area under
study, in this case the Tanaro catchment. In this work, simple morphologic features are
taken into account, leaving their matching and weighting to the classifier structure.
The selected features, specified for each DEM cell, are: distance from the nearest
stream, D, elevation to the nearest stream, H, surface curvature, ΔH (defined as the
Laplacian of the elevation), contributing area, A, and local slope, S, estimated as the
maximum slope among the eight possible flow directions that connect the cell under exam
to the adjacent cells. Contributing area and local slope are the main ingredients of
the Topographic Index (TI) first introduced by Kirkby (1975) and recently modified to
detect flood-prone areas from DEMs by Manfreda et al. (2011). In the present work, these
two features will be considered separately, together, and mixed with all other features.
Two features are related to the cell location with respect to the nearest stream. The
distance from the nearest stream is the length of the path, identified by following the
maximum slope direction, which connects the cell under exam to the nearest element of
the reference drainage network (Fig. 3). The elevation to the nearest stream is the
difference between the elevation of the cell under exam and the elevation of the final
point of the above-identified path. As suggested by intuition, large distance and
elevation values

correspond to a lower flood hazard. Finally, the surface curvature is related to the
ability to discriminate convex divides from concave valley bottoms, more prone to be
flooded.

6. Dataset and performance measures

As a result of the above, the available dataset is composed of 187,306 labeled data
points pertaining to both the Tanaro basin flood-prone areas recognized by the Po Basin
Authority, and to conterminous areas subject to a marginal flood hazard. Each data
point contains latitude and longitude plus the following five features: D, H, ΔH, A, S.
Within flood-prone areas, the dataset provides also the hazard level: high, medium or
low. For the sake of simplicity, we first ignore the hazard level, and divide data
points into two classes: class 0 for marginal hazard data points, and class 1 for data
points with high, medium or low hazard level. The data points belonging to class 0 are
131,785 (about 70% of the size of the dataset); the ones from class 1 are 55,521 (about
30% of the size of the dataset).
Since the dataset was necessarily sampled from a portion of the geographical region
under investigation, we did not use latitude and longitude to train the classifiers.
While these two features may be useful to classify data points that are in the
neighborhood of some training sample, they may be misleading for the classification of
data points coming from sub-regions where no training sample is available.
To investigate the dependence of simulation results on the size of the dataset, some
simulations were performed on both the whole dataset and suitably chosen subsets.
Indeed, the first option may be time-consuming for large datasets. To reduce simulation
time, only a few training samples may be sufficient to evaluate empirical estimates of
the same metrics computed using the whole dataset.
In some applications, because of the different importance assigned to the occurrence of
each of the two events, achieving a small average misclassification error on one of the
two classes may be more significant than obtaining a small average misclassification
error on the other class. During the training phase, assigning two different weights to
the average misclassification errors on the two classes may satisfy this need. However,
sometimes the exact values of such weights are difficult to establish. A second way to
satisfy this need is through the so-called Receiver Operating Characteristics (ROC)
curve (see, e.g., Fawcett, 2006), which is defined in terms of false positive and true
positive rates. We recall that, for any given binary classifier, the false positive
rate, $r_{fp}$, is the probability that a sample coming from class 0 is erroneously
classified as a sample coming from class 1, i.e. a marginal hazard area is classified
as flood prone. Similarly, the false negative rate, $r_{fn}$, is the probability that a
sample coming from class 1 is erroneously classified as a sample coming from class 0.
The true positive rate, $r_{tp}$, and the true negative rate, $r_{tn}$, are simply
obtained as $r_{tp} = (1 - r_{fn})$ and $r_{tn} = (1 - r_{fp})$. The ROC curve is
defined as the set of pairs $(r_{fp}, r_{tp})$ obtained by varying the threshold of a
binary classifier. The Area Under the ROC Curve (also known as AUC) is a common measure
of performance, used to compare different kinds of binary classifiers. This measure
does not require one to assign different weights to the average misclassification
errors on the two classes, or to specify their a priori probabilities. In general, the
area under the ROC curve is between 0 and 1. The larger the AUC, the better the
classifier. AUC values greater than 0.5 correspond to classifiers performing better
than chance. Indeed, a completely random classifier (one that is no better at
recognizing true positives than flipping a coin) has an area under the ROC curve of
0.5. Instead, a classifier with 0 false positives and 0 false negatives has AUC = 1. In
the following, $r_{fp}$, $r_{tp}$, and AUC are used as performance measures of the
binary classifiers.

7. Linear binary classifiers

A pictorial representation of linear binary classifiers is provided in Fig. 5a. They
are introduced in this section, while SVM with linear and Gaussian kernels (Fig. 5b)
are presented in Section 8. In the first case, classifiers use only one (Section 7.1)
or two (Section 7.2) of the selected features presented in Section 5 whereas, in the
second case, all five features are jointly used. To compare different classifiers, we
use performance measures introduced in Section 6. The respective ROC curves are first
obtained by varying the classification threshold. Then, corresponding AUCs are
evaluated, and the binary classifier with the largest AUC value is finally selected.

7.1. Classifiers based on a single feature and ROC curves

We first considered linear binary classifiers based on a single feature, chosen among
the following: D, H, ΔH, A, and S. Data were then normalized in such a way that each
normalized feature – obtained after translation and scaling of the original feature –
lies between −1 and 1. Consequently, also the threshold in the classifiers was
normalized.
Single feature classifiers have the advantage of being particularly simple and quick to
be trained, since only the threshold has to be set. Fig. 6 shows the ROC curves
associated with the five linear binary classifiers obtained by separately thresholding
each of the five features, and varying the threshold. Table 1 shows the corresponding
AUCs. For the classifiers based on H, D, and S, each data point was assigned to the
class 0 if the feature was above the threshold, and to the class 1 if it was under the
threshold. Instead, for the classifiers based on the features ΔH and A, each data point
was assigned to the class 0 if the corresponding feature was under the threshold, and
to the class 1 if it was above the threshold. The reason for such different choices is
that these two different rules allow one to obtain ROC curves whose AUCs are greater
than 0.5 for all the five classifiers.
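
The single-feature thresholding and the ROC and AUC measures described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab code: the function names, the threshold grid of 201 values, the trapezoidal AUC estimate and the toy data are all assumptions introduced here.

```python
import numpy as np

def normalize(x):
    """Translate and scale a feature so that it lies in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def roc_single_feature(x, y, class1_below=True, n_thresholds=201):
    """ROC points (r_fp, r_tp) of a single-feature threshold classifier.

    x: normalized feature in [-1, 1]; y: labels (0 = marginal, 1 = flood prone).
    class1_below=True assigns class 1 when the feature is under the
    threshold (the rule quoted for H, D and S); False reverses the rule.
    The grid size n_thresholds is an arbitrary choice for this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    thresholds = np.linspace(-1.0, 1.0, n_thresholds)
    r_fp = np.empty(n_thresholds)
    r_tp = np.empty(n_thresholds)
    for i, t in enumerate(thresholds):
        pred = (x < t) if class1_below else (x > t)
        r_fp[i] = pred[y == 0].mean()  # class 0 samples labeled as class 1
        r_tp[i] = pred[y == 1].mean()  # class 1 samples labeled as class 1
    return thresholds, r_fp, r_tp

def auc(r_fp, r_tp):
    """Trapezoidal area under the ROC curve, closed at (0, 0) and (1, 1)."""
    order = np.lexsort((r_tp, r_fp))  # sort by r_fp, ties broken by r_tp
    fp = np.concatenate(([0.0], r_fp[order], [1.0]))
    tp = np.concatenate(([0.0], r_tp[order], [1.0]))
    return float(np.sum(np.diff(fp) * (tp[1:] + tp[:-1]) / 2.0))

# Hypothetical, perfectly separable toy data (low values belong to class 1).
x_toy = normalize([-0.9, -0.8, 0.5, 0.9])
y_toy = np.array([1, 1, 0, 0])
_, fp_toy, tp_toy = roc_single_feature(x_toy, y_toy, class1_below=True)
auc_toy = auc(fp_toy, tp_toy)
_, fp_rev, tp_rev = roc_single_feature(x_toy, y_toy, class1_below=False)
auc_reversed = auc(fp_rev, tp_rev)
```

On the toy data the two assignment rules give complementary areas, which illustrates why the direction of the rule is chosen per feature so that the AUC exceeds 0.5.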

Fig. 5. Pictorial representation of linear classifiers (a) and SVM classifiers (b). For graphical reasons, the case of two features $x_1$ and $x_2$ is considered. Note that in (a) a
single-feature linear classifier corresponds to a horizontal or vertical separating line. In (b), again for graphical reasons, the case $H = \mathbb{R}^2$ is illustrated.

Fig. 6. Receiver Operating Characteristics (ROC) curve for the five selected features.
ROC curves are obtained by applying a threshold to one of the five features in the
dataset, and by varying the threshold.

Fig. 7. Performance of the single-feature-classifier based on relative elevation.
Performance curves are obtained by varying the threshold.

Table 1
AUCs for the five linear binary classifiers obtained by thresholding one of the features
in the dataset, and varying the threshold. Note that a perfect classifier has AUC = 1
and a random binary classifier has AUC = 0.5.

       H       ΔH      S       A       D
AUC    0.9399  0.5459  0.7028  0.5081  0.8628

One could note that for the classifier based on relative elevation (i.e., H) a quite
good performance in terms of the AUC was obtained. Fig. 7 shows the false positive rate
and the true positive rate as functions of the normalized threshold, for the linear
binary classifier associated with this feature.
The best value of the normalized threshold is obtained by minimizing the sum of the
false positive rate and the false negative rate $r_{fp} + (1 - r_{tp})$ (note that
equal weights have been assigned to the two rates). Since the associated ROC curve can
be well approximated by a concave and continuously differentiable function, this is
obtained by finding a value $k \in \mathbb{R}$ such that the line $r_{tp} = r_{fp} + k$
is tangent to the ROC curve. The value of the optimal normalized threshold so obtained
is H = 0.8460, for which one has $r_{fp} = 0.1599$ (i.e., $r_{tn} = 0.8401$),
$r_{tp} = 0.9146$, and $r_{fp} + (1 - r_{tp}) = 0.2453$.

7.2. Classifiers based on two features and ROC curves

We now consider the case of linear binary classifiers based on two features,
investigating the distributions of the projections of the data on the subspaces
associated with various pairs of features. Even though the distributions of two classes
could partially overlap, better classification performance may be in general obtained
by considering linear binary classifiers acting on two features instead of one feature,
or more generally nonlinear binary classifiers such as those introduced in Section 8.
We proceed as follows. Let $(x_1, x_2)$ be any of the 10 possible pairs of normalized
features selected among the available five normalized features (D, H, ΔH, A and S).
Then, we search for a linear binary classifier based on such two normalized features
$x_1$ and $x_2$, whose performance (in terms of the sum of the error rates and the AUC)
is better than the one of any of the previous linear binary classifiers, which are
based on one single feature.
More specifically, for each choice of $x_1$ and $x_2$, we consider candidate
classifiers of the form

    $S(a_1 x_1 + a_2 x_2 - a_3)$    (1)

where $S(\cdot)$ is the Heaviside step function (see Appendix A for its definition),
$x_1$ and $x_2$ are the selected normalized features, $a_1$ and $a_2$ the associated
coefficients in the linear combination, and $a_3$ the threshold. Among these
classifiers, we search for one classifier that minimizes $r_{fp} + (1 - r_{tp})$. We
call $r^{\circ}_{fp}$ and $r^{\circ}_{tp}$ the so-obtained optimal values. We repeat the
procedure for each of the 10 possible pairs of normalized features. In the particular
case in which one between $x_1$ and $x_2$ is the normalized feature H, the condition
$r^{\circ}_{fp} + (1 - r^{\circ}_{tp}) \leq 0.2453$ holds a priori, since any linear
binary classifier based on H alone is a particular case of a linear binary classifier
based on two features, one of which is H (hopefully, one may obtain
$r^{\circ}_{fp} + (1 - r^{\circ}_{tp}) < 0.2453$).
The classifier described by (1) can be considered as a combination of two
single-feature classifiers, with suitable weights on the features and a suitable
threshold. Note that, for any $c > 0$, one has
$S(a_1 x_1 + a_2 x_2 - a_3) = S(c a_1 x_1 + c a_2 x_2 - c a_3)$. This, combined with the
normalization of $x_1$ and $x_2$, allows one to restrict the search for the optimal
parameters to $a_1 = \cos(\theta)$, $a_2 = \sin(\theta)$,
$a_3 = t(\cos(\theta) + \sin(\theta))$, where $\theta \in [0, 2\pi)$ and
$t \in [-1, 1]$.
Approximately optimal parameters can be searched for by discretizing $\theta$ and $t$.
In the simulations, 360 values for $\theta$ were considered, and 1000 for $t$. All the
simulations were performed using Matlab 7.7 on a personal computer with a 2.40 GHz
Core2 Quad Q6600 CPU and 2 GB of RAM.
Table 2 shows the obtained results, for the particular cases in which the two rates
$r_{fp}$ and $r_{tp}$ are replaced by their empirical estimates, obtained by sampling
i.i.d. the given data distribution, 5000

and 20,000 times, respectively (in both cases, the same samples were used to evaluate $r_{fp}$ and $r_{tp}$, for each of the 10 kinds of classifiers). Instead, Table 3 shows the results obtained by computing the true $r_{fp}$ and $r_{tp}$ (i.e., the ones obtained using the whole dataset), and also the corresponding approximately optimal parameters $\theta$ and $t$.

Table 2
Empirical false positive rate, $r_{fp}$, and true positive rate, $r_{tp}$, and the sum $r_{fp} + (1 - r_{tp})$ for the (approximately) optimal two features linear binary classifiers. The samples were obtained by sampling i.i.d. from the given data distribution (for n = 5000 and n = 20,000).

Pairs of     n = 5000                               n = 20,000
features     r_fp     r_tp     r_fp + (1 - r_tp)    r_fp     r_tp     r_fp + (1 - r_tp)
H, DH        0.1658   0.9209   0.2449               0.1563   0.9166   0.2397
H, S         0.1899   0.9566   0.2333               0.1803   0.9546   0.2256
H, A         0.1689   0.9230   0.2459               0.1656   0.9257   0.2399
H, D         0.1277   0.9062   0.2215               0.1523   0.9483   0.2040
DH, S        0.5053   0.8775   0.6278               0.5406   0.9173   0.6233
DH, A        0.7585   0.9454   0.8131               0.7569   0.9369   0.8200
DH, D        0.2258   0.7691   0.4567               0.2481   0.8034   0.4447
S, A         0.5980   0.9454   0.6526               0.5473   0.9064   0.6409
S, D         0.1854   0.8258   0.3957               0.2016   0.8656   0.3360
A, D         0.2249   0.7691   0.4559               0.2572   0.8095   0.4476

Table 3
False positive rate, $r_{fp}$, true positive rate, $r_{tp}$, the sum $r_{fp} + (1 - r_{tp})$, and the $\theta$ and $t$ parameters for the (approximately) optimal two features linear binary classifiers.

Pairs of features   r_fp     r_tp     r_fp + (1 - r_tp)   theta   t
H, DH               0.1636   0.9212   0.2424              183°    0.794
H, S                0.1629   0.9340   0.2289              171°    0.824
H, A                0.1732   0.9289   0.2443              182°    0.846
H, D                0.1369   0.9283   0.2086              186°    0.838
DH, S               0.5466   0.9200   0.6266              248°    0.586
DH, A               0.7795   0.9528   0.8267              16°     0.182
DH, D               0.2270   0.7789   0.4481              275°    0.954
S, A                0.5920   0.9444   0.6477              183°    0.870
S, D                0.1938   0.8521   0.3417              219°    0.874
A, D                0.2403   0.7899   0.4504              284°    0.790

The results of Tables 2 and 3 are quite similar, in accordance to Appendix A about Statistical Learning Theory. Note, however, that the samples used in Table 2 are less than 2.7% and 10.7% of the whole dataset, respectively. Moreover, Tables 2 and 3 show that even with 5000 i.i.d. samples one is able to obtain the optimal selection of $x_1$ and $x_2$ (i.e., $x_1$ = H and $x_2$ = D).

Let $S(a_1^\circ x_1 + a_2^\circ x_2 - a_3^\circ)$ be an (approximately) optimal linear binary classifier and consider the choices $x_1$ = H and $x_2$ = D. The classifier associates the pattern $(x_1, x_2)$ to the class 0 if $a_1^\circ x_1 + a_2^\circ x_2 \le a_3^\circ$, to the class 1 otherwise. Then, the ROC curve associated with such a classifier is obtained by varying only its threshold $a_3$ (while fixing $a_1 = a_1^\circ$ and $a_2 = a_2^\circ$), and plotting the resulting pairs $(r_{fp}, r_{tp})$. Hopefully, the resulting AUC may be greater than 0.9399. This is indeed the case, since in the simulation the obtained AUC was 0.9575. Fig. 8 compares the ROC curve associated with such a binary classifier and the ones associated with the linear binary classifiers obtained by thresholding one of the two features. Then, Table 4 shows the corresponding AUCs.

Table 4
AUCs for the (approximately) optimal two features linear binary classifier for the choices $x_1$ = H and $x_2$ = D, and for the linear binary classifiers obtained by thresholding one of the same two features.

        H and D   H        D
AUC     0.9575    0.9399   0.8628
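
The discretized search over $\theta$ and $t$ described in Section 7.2 can be sketched as follows. This is a minimal illustration in Python, not the Matlab code used for the simulations: the function names, the evenly spaced grids, and the convention that label 1 marks flood-prone points are assumptions made here, and both classes are assumed to be non-empty.

```python
import math


def error_sum(points, labels, a1, a2, a3):
    """Return r_fp + (1 - r_tp) for the classifier S(a1*x1 + a2*x2 - a3).

    Assumes both classes occur in `labels` (1 = flood prone, 0 = marginal).
    """
    fp = tp = neg = pos = 0
    for (x1, x2), y in zip(points, labels):
        pred = 1 if a1 * x1 + a2 * x2 - a3 > 0 else 0  # Heaviside step
        if y == 1:
            pos += 1
            tp += pred
        else:
            neg += 1
            fp += pred
    return fp / neg + (1.0 - tp / pos)


def grid_search(points, labels, n_theta=360, n_t=1000):
    """Search a1 = cos(theta), a2 = sin(theta), a3 = t*(cos(theta) + sin(theta)).

    theta is sampled evenly in [0, 2*pi) and t evenly in [-1, 1]; the even
    spacing is an assumption of this sketch.
    Returns (best error sum, best theta in radians, best t).
    """
    best = (float("inf"), None, None)
    for i in range(n_theta):
        theta = 2.0 * math.pi * i / n_theta
        a1, a2 = math.cos(theta), math.sin(theta)
        for j in range(n_t):
            t = -1.0 + 2.0 * j / (n_t - 1)
            score = error_sum(points, labels, a1, a2, t * (a1 + a2))
            if score < best[0]:
                best = (score, theta, t)
    return best
```

The cost of the search grows as the number of grid points times the number of samples, so evaluating it on an i.i.d. subsample, as done for Table 2, reduces the running time proportionally.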

Fig. 8. Receiver Operating Characteristics (ROC) curve and AUC for the best two-features classifier, i.e. the one based on relative elevation and distance from the nearest stream. ROC curves and AUCs for the two single-feature-classifiers based on the same two features are also reported.

8. Binary SVMs with linear and Gaussian kernel and ROC curves

Next, to evaluate linear binary classifiers based on more than two features, we trained a binary SVM classifier with linear kernel and L1-soft margin (see Appendix B for a short introduction to SVMs and Fig. 5b for a pictorial representation), using all the five features and giving the same weights to samples coming from either of the two classes. In the simulations, we used the Sequential Minimal Optimization (SMO) algorithm (Platt, 1998) (as implemented in the Matlab Statistical Pattern Recognition Toolbox (Franc and Hlávač, 2004)) to solve the associated optimization problem, choosing C = 5 for the regularization parameter. Note that, since the SVM is using a linear kernel, the risk of over-fitting with even a few samples is low, as the VC-dimension of the class of indicator functions that the algorithm can produce as outputs is at most 6. However, in order to reduce further such a risk (and also the simulation time), the training set was obtained selecting randomly about 25% of the dataset, and the performance metrics $r_{fp}$ and $r_{tp}$ were evaluated first on a validation set made up of about the remaining 75% of the dataset, and then on the whole dataset.

Table 5 shows the weight vector w and the bias b of the so-obtained binary SVM classifier, and the AUCs associated with the ROC curves obtained by varying its bias while fixing the weight vector to w, and whose rates $r_{fp}$ and $r_{tp}$ are computed on the validation set and on the whole dataset, respectively (as expected, the two ROC curves resulted actually almost overlapping, see Fig. 9). Note that the corresponding values of the AUCs (0.9593 and 0.9595,

respectively) are greater than any of the AUCs obtained in the previous simulations, and that the largest components (in absolute value) of the weight vector w are the ones associated with the normalized features H and D (i.e., the ones for which the best performances were obtained for the case of the linear binary classifiers based on two features).

Table 5
Weight vector, w, bias, b, and AUC for the binary SVM classifier with linear kernel obtained after training. The ROC curve associated with the binary SVM classifier was obtained by varying the bias while fixing the weight vector. Two different AUCs are shown, associated to ROC curves whose rates $r_{fp}$ and $r_{tp}$ are evaluated on the validation set and on the whole dataset.

w_H       w_DH     w_S      w_A      w_D      b         AUC validation set   AUC dataset
28.7222   2.5115   1.6437   0.0485   4.8648   26.7012   0.9593               0.9595

Fig. 9 shows the ROC curve of the obtained binary SVM classifier and the one of the optimal linear binary classifier shown in Table 3, for the choice $x_1$ = H and $x_2$ = D. We also report that the trained binary SVM classifier with linear kernel has $r_{fp} = 0.0931$ and $r_{tp} = 0.8696$, and that the minimum value of $r_{fp} + (1 - r_{tp})$ on the associated ROC curve is obtained on the point ($r_{fp} = 0.1359$, $r_{tp} = 0.9277$), for which one has $r_{fp} + (1 - r_{tp}) = 0.2082$.

Fig. 9. ROC curve and AUC for the SVM classifier with linear kernel. The ROC curve and the AUC for the best two-feature classifier are also reported.

Finally, we considered the case of a Gaussian kernel (the actual kernel used in the simulation was $K(x, y) = \exp\{-\|x - y\|^2 / (2\sigma^2)\}$, where x, y are 5-dimensional vectors of normalized features). Due to the normalization of each component of the data vector, one has $0 \le \|x - y\| \le 2\sqrt{5}$ for all x, y in the dataset, so in the simulation we made the choice $\sigma = 0.4$ in order to avoid over-fitting and, at the same time, to make sure that, for each x in the dataset, the value assumed by the kernel K(x, y) is nearly 0 for points y in the dataset that are far from x. The only other change with respect to the simulation performed in the case of the binary SVM classifier with linear kernel was a significant reduction in the sizes of both the training set and the validation set (their cardinalities were chosen about 30% of the ones of the corresponding sets chosen for the simulation performed in the case of the linear kernel). This was made in order to reduce further the simulation time.

The corresponding AUCs were 0.9568 and 0.9563, respectively (these results are also reported in Table 6, together with the number of support vectors out of 12500 training samples). The ROC curve whose rates $r_{fp}$ and $r_{tp}$ are evaluated on the whole dataset is practically coincident with the one obtained in the case of the linear kernel, so it is not reported. However, there was no improvement in the AUC with respect to the results shown in Table 5 for the linear kernel (actually, a slight worsening was obtained, since the corresponding values in Table 5 are 0.9593 and 0.9595). Among the possible reasons of this behavior, we recall that, to our knowledge, the standard SVM optimization problem with L1-soft margin is not directly related to the optimization of the AUC, even though some relationships between SVM optimization and AUC optimization (in a suitable feature space) hold when an L2-soft margin penalty is chosen (Rakotomamonjy, 2004).

Table 6
Number of support vectors and AUC for the binary SVM classifier with Gaussian kernel obtained after training. The ROC curve associated with the binary SVM classifier was obtained by varying the bias while fixing the weight vector. Two different kinds of AUCs are shown, associated to ROC curves whose rates $r_{fp}$ and $r_{tp}$ are evaluated on the validation set and on the whole dataset.

# Support vectors   AUC validation set   AUC dataset
3139                0.9568               0.9563

Finally, we report that for the trained binary SVM classifier with Gaussian kernel one has $r_{fp} = 0.0763$ and $r_{tp} = 0.8346$, and that the minimum value of $r_{fp} + (1 - r_{tp})$ on the associated ROC curve is obtained on the point ($r_{fp} = 0.1299$, $r_{tp} = 0.9219$), for which one has $r_{fp} + (1 - r_{tp}) = 0.2080$. This is the best result obtained in all the simulations concerning the minimization of $r_{fp} + (1 - r_{tp})$. However, the improvement with respect to other classifiers is not very large (0.2080 vs. 0.2082 derived from Table 3); this is likely due to a slight overlapping between the two classes, which may not be removed without incurring over-fitting (nor without adding more features to the ones available). Finally, we observe that the simulations performed for the classifiers based on two features described in Section 7 provided only slightly worse results with respect to the best ones, but with significantly smaller simulation times. Moreover, such simulation results are easier to interpret than those obtained for the case of the SVM with Gaussian kernel, since the weight assigned to a normalized feature in a linear classifier provides a measure of the usefulness of such feature for the classification problem at hand, while such information is not provided by the support vectors in the case of the SVM with Gaussian kernel.

9. Hazard graduation

We now take into account hazard graduation in three classes: high, medium, and low. Among the 55,521 data points (of class 1) that are labeled as flood prone, 16,659 of them (about 30%) are further labeled as areas subject to a low flood hazard (subclass 1), whereas other 20,099 (about 36%) are subject to a medium flood hazard (subclass 2) and the remaining 18,763 ones (about 34%) are subject to a high flood hazard (subclass 3). To separate the various subclasses with suitable binary classifiers, we propose the following procedure. We first defined a new class 0′ (low + marginal) made up of all the data points from class 0 and

the ones from subclass 1, and a new class 1′ (high + medium) made up of all the data points from subclasses 2 and 3. Similarly, we defined a new class 0″ (medium + low + marginal) made up of all the data points from class 0 and the ones from subclasses 1 and 2, and a new class 1″ (high) made up of all the data points from subclass 3. In this way, it was possible to apply the same binary classification techniques used in the previous simulations to first separate classes 0′ and 1′, then classes 0″ and 1″.

In particular, we considered only linear binary classifiers based on the two normalized features H and D, since this was the best architecture (in terms of both the performance and the simulation time) found in the previous simulations to separate classes 0 and 1. The obtained results are shown in Table 7, whereas Fig. 10 shows the obtained ROC curves.

Table 7
Binary classification problems with classes [(marginal) vs. (high + medium + low)], [(low + marginal) vs. (high + medium)], and [(medium + low + marginal) vs. (high)], respectively: false positive rate, $r_{fp}$, true positive rate, $r_{tp}$, the sum $r_{fp} + (1 - r_{tp})$, the $\theta$ and $t$ parameters, and the AUC for the (approximately) optimal linear binary classifiers for the choices $x_1$ = H and $x_2$ = D.

Hazard classes                          r_fp     r_tp     r_fp + (1 - r_tp)   theta   t       AUC
(Marginal) vs. (high + medium + low)    0.1369   0.9283   0.2086              186°    0.838   0.9575
(Low + marginal) vs. (high + medium)    0.1590   0.9270   0.2319              190°    0.850   0.9446
(Medium + low + marginal) vs. (high)    0.1967   0.9254   0.2713              190°    0.862   0.9250

Fig. 10. Hazard graduation. ROC curve for the binary [(low + marginal) vs. (high + medium)] (a), and the [(medium + low + marginal) vs. (high)] (b) hazard classifiers. The two-feature classifier of Fig. 8 is also reported for comparison purposes. All classifiers are based on relative elevation and distance from the nearest stream. AUC values are also reported.
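
The ROC curves and AUCs reported for the linear classifiers (Tables 4 and 7, Figs. 8 and 10) are obtained by fixing the weights and varying only the threshold. A minimal sketch of that sweep is given below. It assumes that each data point has already been reduced to the score $a_1 x_1 + a_2 x_2$ and that label 1 marks the positive class; the function names and the trapezoidal rule for the area are choices made for this illustration, since the text does not state how the AUC was integrated.

```python
def roc_points(scores, labels):
    """ROC points (r_fp, r_tp) obtained by lowering the threshold over the scores.

    A point is classified as 1 when its score exceeds the threshold, so
    lowering the threshold admits points in order of decreasing score.
    Assumes both classes occur in `labels`.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    k = 0
    while k < len(order):
        s = scores[order[k]]
        # Points sharing the same score cross the threshold together.
        while k < len(order) and scores[order[k]] == s:
            if labels[order[k]] == 1:
                tp += 1
            else:
                fp += 1
            k += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def min_error_sum(points):
    """Smallest value of r_fp + (1 - r_tp) attained on the ROC curve."""
    return min(fp + (1.0 - tp) for fp, tp in points)
```

The same routine applies to the SVM of Section 8 if the scores are taken to be the inner products with the fixed weight vector, since varying the bias is equivalent to varying the threshold.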

Table 8
Performances of the three multiclass classifiers in terms of their identification of the hazard level. The results are in %. Best performances are highlighted in bold.

Hazard level                   Multiclass     Multiclass     Multiclass
                               classifier 1   classifier 2   classifier 3
Correctly identified           72.3           74.8           75.2
Overestimated by 1 level       13.2           11.9           11.5
Overestimated by 2 levels      7.3            5.5            5.4
Overestimated by 3 levels      4.0            1.9            1.2
Underestimated by 1 level      2.2            4.2            5.0
Underestimated by 2 levels     0.8            1.3            1.3
Underestimated by 3 levels     0.2            0.3            0.3

Fig. 11. Tanaro basin. Flood-prone areas (a) and hazard graduation (b) identified according to the two-features classifiers based on relative elevation and distance from the nearest stream. Compare respectively with Fig. 2b and c. Red, yellow and blue respectively indicate high, medium and low hazard areas. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 9
Performances of the three multiclass classifiers in terms of the extensions of the four kinds of sub-regions identified by the classifiers (high, medium, low, and marginal hazard). The results are also shown in % with respect to the total extension of the region investigated (around the river Tanaro). The best performances of each row (with respect to the data available from the Po Basin Authority, B.A.) are highlighted in bold.

Hazard     B.A. (river   Multiclass     Classifier 1    Multiclass     Classifier 2    Multiclass     Classifier 3
level      Tanaro)       classifier 1   vs. B.A. (%)    classifier 2   vs. B.A. (%)    classifier 3   vs. B.A. (%)
Marginal   70            63             -10.7           66             -6.3            66             -6.3
Low        9             5              -40.3           7              -20.5           7              -20.5
Medium     11            5              -54.7           7              -37.1           10             -5.6
High       10            27             +169.3          20             +101.8          17             +68.2
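
The two kinds of summaries reported in Tables 8 and 9 can be sketched as follows. This is an illustrative computation only: the ordinal encoding of the four hazard levels, the function names, and the assumption that every data point covers the same area (so that extensions are proportional to counts) are choices made here and are not stated in the text.

```python
from collections import Counter

# Ordinal encoding of the hazard levels (an assumption of this sketch).
LEVELS = {"marginal": 0, "low": 1, "medium": 2, "high": 3}


def level_shift_percentages(reference, predicted):
    """Table 8 style summary.

    Maps each shift d in -3..3 to the percentage of points whose predicted
    level is d levels above (d > 0, overestimated) or below (d < 0,
    underestimated) the reference level; d = 0 means correctly identified.
    """
    shifts = Counter(LEVELS[p] - LEVELS[r] for r, p in zip(reference, predicted))
    n = len(reference)
    return {d: 100.0 * shifts[d] / n for d in range(-3, 4)}


def extension_shares(levels):
    """Percentage of the investigated region assigned to each hazard level."""
    counts = Counter(levels)
    n = len(levels)
    return {name: 100.0 * counts[name] / n for name in LEVELS}


def relative_difference(reference, predicted):
    """Table 9 style summary: signed difference of each predicted share
    with respect to the reference share, in percent of the reference share."""
    ref = extension_shares(reference)
    pred = extension_shares(predicted)
    return {name: 100.0 * (pred[name] - ref[name]) / ref[name]
            for name in LEVELS if ref[name] > 0}
```

A positive relative difference indicates that the classifier assigns a larger area to that level than the reference map does, as for the high hazard row of Table 9.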

Fig. 12. Tanaro basin. Composite of Po Basin Authority predictions [Fig. 2b] and hazard graduation identified according to the two-features classifiers based on relative elevation and distance from the nearest stream [Fig. 11b]. When available, Basin Authority predictions are preferred due to their higher reliability. Red, yellow, blue and green respectively indicate high, medium, low and marginal hazard areas. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 13. Flood hazard predictions for a portion of the Tevere (Central Italy) basin: Basin Authority data (a), marginal hazard areas (b), two-features classifiers based on relative elevation and distance from the nearest stream (c), and composite picture (d). Red, yellow and blue respectively indicate high, medium, low hazard areas. Dark and light green indicate marginal hazard areas as identified from Basin Authority data or by the binary classifier, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The final multiclass classifier merges the outputs of the three "inner" binary classifiers associated with the three pairs of classes in the following way:

(1) A data point is considered at marginal hazard level if it is associated by the three binary classifiers to the classes 0, 0′ and 0″, respectively.
(2) It is considered at low hazard level if it is associated to the classes 1, 0′ and 0″, respectively.
(3) It is considered at medium hazard level if it is associated to the classes 1, 1′ and 0″, respectively.
(4) It is considered at high hazard level if it is associated to the classes 1, 1′ and 1″, respectively.

Note that such a multiclass classifier is in general nonlinear, despite the fact that the inner binary classifiers are linear.

At this point, we observe that the three inner binary classifiers on which the multiclass classifier is based are obtained by giving equal weights to the error rates associated with the classes 0 and 1, 0′ and 1′, and 0″ and 1″, respectively. Indeed, each of them is obtained by minimizing $r_{fp} + (1 - r_{tp})$. A different criterion consists in using different weights to the error rates associated with the two classes of each of the three inner binary classifiers. For instance, giving more weight to one of the two classes may provide better performance of the binary classifier with respect to that class, without a significant decrease in performance with respect to the other class. Then, we defined three multiclass classifiers (each of which is based on three binary classifiers with classes 0 and 1, 0′ and 1′, and 0″ and 1″, respectively) in the following way:

(1) For Multiclass Classifier 1, all the three binary classifiers are obtained by minimizing $r_{fp} + (1 - r_{tp})$, as described above.
(2) For Multiclass Classifier 2, the binary classifier with classes 0 and 1 is obtained by minimizing $1.5\,r_{fp} + (1 - r_{tp})$, the one with classes 0′ and 1′ by minimizing $2\,r_{fp} + (1 - r_{tp})$, and the one with classes 0″ and 1″ by minimizing $2.5\,r_{fp} + (1 - r_{tp})$.

(3) For Multiclass Classifier 3, the binary classifier with classes 0 and 1 is obtained by minimizing $1.5\,r_{fp} + (1 - r_{tp})$, the one with classes 0′ and 1′ by minimizing $2\,r_{fp} + (1 - r_{tp})$, and the one with classes 0″ and 1″ by minimizing $3\,r_{fp} + (1 - r_{tp})$.

One can observe that the different coefficients assigned to the error rates allow one to take into account the different numbers of training samples for the two classes of each of the inner binary classifiers, and to give more weight to the "more important" classes (e.g., one may want to give more weight to the error made when associating a marginal hazard level to an area with high, medium or low hazard level, with respect to the weight assigned to the error made in associating a high, medium, or low hazard level to an area with marginal hazard level). Finally, by varying the same coefficients, one can control the extensions of the sub-regions classified as at a high, medium, low, or marginal hazard level.

Tables 8 and 9 show the results obtained for the three multiclass classifiers described above. As shown by the tables, the best overall performances were obtained for Multiclass Classifier 3.

10. Application and results

The application of the above described procedure to the streams studied by the Po Basin Authority for the Tanaro catchment is presented in Fig. 11 with reference to the two-feature linear classifier based on relative elevation and distance from the nearest stream. Consequently, results are limited to the streams of Fig. 2a for which the flood prone status and the hazard graduation are already available, and depicted in Fig. 2b and c.

One could notice that the overall extension and location of flood-prone areas (Fig. 2b vs. Fig. 11a) is well replicated by the classifier. As a result of the adopted calibration procedure, an overestimation error is envisaged. This is confirmed by Fig. 11a, which shows a small overestimation of flood-prone areas, and by the value $r_{fp} = 0.1369$ which characterizes this classifier (see Table 3).

Fig. 11b shows the classifier results for hazard graduation. When compared with Fig. 2c, hazard overestimation is detected. Again, overestimation is definitely to be preferred over underestimation, and may be partially anticipated by the selected calibration procedure. Values of $r_{fp}$ increase up to $r_{fp} = 0.1967$. This latter value is associated to the identification of high hazard areas. This may be partially due to the DEM grid size, which does not allow for the recognition of flood defense structures such as levees, dams and weirs. In fact, overestimation is easily detected in flat valley areas, where levees are more effective, and tends to vanish moving upstream toward hilly and mountainous areas.

Classifiers should be applied to predict flood-prone areas and hazard graduation in non-studied areas. Fig. 12 shows, for the entire Tanaro basin, a composite of the Po Basin Authority predictions and the results of the two-feature linear classifier based on relative elevation and distance from the nearest stream. Basin Authority predictions, resulting from intensive field surveys and from valuable hydrologic and hydraulic studies, are characterized by a higher reliability. Consequently, in Fig. 12 they are overlaid on the classifier results. Since the classifier now identifies flood-prone areas for the entire catchment, areas subject to a marginal hazard could also be depicted.

Fig. 14. Flood hazard predictions for a portion of the Dora Baltea (Valle d'Aosta Region, Northwestern Italy) basin: Basin Authority data (a), marginal hazard areas (b), two-features classifiers based on relative elevation and distance from the nearest stream (c), and composite picture (d). Red, yellow and blue respectively indicate high, medium, low hazard areas. Dark and light green indicate marginal hazard areas as identified from Basin Authority data or by the binary classifier, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
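
The hazard maps discussed in this section rely on the merging rule and on the weighted calibration criteria introduced in Section 9, which can be sketched as follows. The dictionary and function names are hypothetical, and returning `None` for output combinations that the four listed cases do not cover is an assumption of this sketch, since the text does not say how such combinations are handled.

```python
# Outputs of the three inner binary classifiers, in the order
# (0 vs. 1, 0' vs. 1', 0'' vs. 1''), mapped to the hazard level
# given by the four cases listed in Section 9.
MERGE_RULE = {
    (0, 0, 0): "marginal",
    (1, 0, 0): "low",
    (1, 1, 0): "medium",
    (1, 1, 1): "high",
}

# Weight applied to r_fp when calibrating each inner binary classifier,
# for Multiclass Classifiers 1, 2 and 3 as described in the text.
FP_WEIGHTS = {
    1: (1.0, 1.0, 1.0),
    2: (1.5, 2.0, 2.5),
    3: (1.5, 2.0, 3.0),
}


def merge_outputs(c, c_prime, c_second):
    """Hazard level for the three binary outputs.

    Returns None for combinations not covered by the four listed cases
    (an assumption of this sketch).
    """
    return MERGE_RULE.get((c, c_prime, c_second))


def weighted_cost(r_fp, r_tp, weight):
    """Calibration criterion weight * r_fp + (1 - r_tp)."""
    return weight * r_fp + (1.0 - r_tp)
```

With a weight of 1 the criterion reduces to the sum $r_{fp} + (1 - r_{tp})$ used for Multiclass Classifier 1; larger weights penalize false positives more heavily.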

The linear binary classifier calibrated on the Tanaro basin data provided by the Po Basin Authority, and complemented by the marginal hazard identification procedure described in Section 4, can be applied also outside the Tanaro catchment. While quantitative measures of classifier performance are meaningless outside the calibration catchment, qualitative comparisons will shed some light on the possible extensive application of classifiers to wide areas, at least to obtain a preliminary description of the flooding hazard. The two-feature linear classifier based on relative elevation and distance from the nearest stream, calibrated with the Tanaro data set, has therefore been applied to three different cases: the Tevere basin in Central Italy, the Dora Baltea basin in Northwestern Italy, and the Quirra basin in the Sardinia island, Italy. Results are presented in Figs. 13–15, respectively.

The three cases were selected for their specific characteristics. The Tevere River represents the case of a well-studied catchment: flood-prone areas and hazard graduation are available for almost the entire mainstream and for a number of tributaries. The Dora Baltea River is studied intermittently, along the mainstream only. The Quirra River is studied for its final reach only.

Results depicted in Figs. 13–15 show that the classifier is able to provide a good description of flood-prone areas and hazard graduation. In all cases, simulation results mimic well the Basin Authorities predictions, and composite pictures do not show abrupt changes at the interface between the two data sources. As commented with reference to the Tanaro basin results, a small overestimation is produced with respect to the prediction of flood-prone areas extension and for high hazard areas identification.

Fig. 15. Flood hazard predictions for a portion of the Quirra (Sardinia, Italy) basin: Basin Authority data (a), marginal hazard areas (b), two-features classifiers based on relative elevation and distance from the nearest stream (c), and composite picture (d). Red, yellow and blue respectively indicate high, medium, low hazard areas. Dark and light green indicate marginal hazard areas as identified from Basin Authority data or by the binary classifier, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

11. Conclusions

A simple linear binary classifier, based on two features related to the location of the site under examination with respect to the nearest hazard source, allows distinguishing flood-prone areas. The two best-performing features, selected among the five available, seem to confirm the intuition that an increasing distance from the risk sources corresponds to a lower hazard. The first feature is the length of the path that hydrologically connects the location under examination to the nearest element of the drainage network. The second feature is the difference in elevation between the cell under examination and the final point of the same path. The identification is performed with a high reliability: for the Tanaro case study, 93% of flood-prone areas are properly recognized by the classifier, and only 14% of the areas subject to a marginal hazard are improperly assigned.

The same structure can be applied for hazard graduation. While a negligible reduction in the performances of the resulting multiclass classifier is observed in terms of its ability to correctly recognize high hazard areas, an increase of false positives up to 19% is detected. This partially originates from the selected optimization procedure, whose main goal is the correct identification of flood-prone areas, and is partially due to the DEM resolution. This, in fact, seems high enough to describe the local terrain morphology, but far from allowing the recognition of flood control structures.

Results derived from the application to different catchments seem to qualitatively indicate the ability of the classifier to perform well also outside the calibration region.

Pattern classification techniques should be taken into account when the completeness in the identification of flood-prone areas and hazard grading is required for large regions or when a first identification is desired. This may be the case (i) for specific applications, such as those related to the insurance market; (ii) for specific areas, for which the available information is limited; or (iii) whenever a cost to benefit ratio should address further detailed flood-mapping activities.

Appendix A. Some results from Statistical Learning Theory

We recall the following results from Statistical Learning Theory (Vapnik, 1998). For a Borel-measurable set $Z \subseteq \mathbb{R}^d$, a random variable z with values in Z and with associated distribution F, and a set of binary-valued functions $Q(\cdot, a): Z \to \{0, 1\}$ (functions of this form are called indicator functions), parameterized by $a \in A$, Theorem 4.4 from Section 4.9.2 in Vapnik (1998) provides the following uniform probabilistic bound on the absolute value of the difference between the expected risk $\int_Z Q(z, a)\,dF(z)$ and the empirical risk $\frac{1}{n}\sum_{i=1}^{n} Q(z_i, a)$ (where the n samples $z_i$ are sampled i.i.d. from F):

$$\mathrm{Prob}\left\{ \sup_{a \in A} \left| \int_Z Q(z, a)\,dF(z) - \frac{1}{n}\sum_{i=1}^{n} Q(z_i, a) \right| > \varepsilon \right\} < 4 \exp\left\{ \left( \frac{h\,(1 + \ln(2n/h))}{n} - \varepsilon_*^2 \right) n \right\},$$

where $\varepsilon_* = (\varepsilon - 1/n)$, and h is the VC-dimension of the set $\{Q(\cdot, a): a \in A\}$. This result shows that the empirical risk $\frac{1}{n}\sum_{i=1}^{n} Q(z_i, a)$ is, with high confidence, a good estimate of the expected risk $\int_Z Q(z, a)\,dF(z)$ (uniformly with respect to the choice of the parameter a), when the number of i.i.d. samples n is large with respect to the VC-dimension h.

In the particular case in which $A = \mathbb{R}^m$ and Q(z, a) has the form

$$Q(z, a) = S\left( \sum_{k=1}^{m} a_k f_k(z) \right) \qquad (2)$$

(where $S: \mathbb{R} \to \mathbb{R}$ is the Heaviside step function, defined as

$$S(x) = \begin{cases} 0, & x \le 0, \\ 1, & x > 0 \end{cases} \qquad (3)$$

and the functions $f_k: Z \to \mathbb{R}$ are fixed), then it is known that the VC-dimension h is equal to the number of parameters m (see for instance Example 2 from Section 4.11 in (Vapnik, 1998)).

Now, consider the case in which F is the distribution of the dataset and let the training set be i.i.d. sampled from such a distribution (note that, since the dataset is finite, the i.i.d. assumption implies possible repetitions of the samples inside the training set). Then, let the empirical risk $\frac{1}{n}\sum_{i=1}^{n} Q(z_i, a)$ represent either an empirical false positive rate (i.e., the one evaluated on a training set), or an empirical true positive one. The previous results, when applied to the linear binary classifiers considered in the paper (where the number of tunable parameters is at most 6, hence the VC-dimensions h of the associated classes of indicator functions are at most 6) show that the empirical false positive rate and the

of a binary classifier), a fixed nonlinear map $\varphi: \mathbb{R}^d \to E$, where E is a (usually infinite-dimensional) Euclidean space, a fixed regularization parameter C > 0, and a fixed parameter $p \in \{1, 2\}$, a binary SVM classifier with $L_p$-soft margin is obtained by solving the following optimization problem (see, e.g., Franc and Hlávač, 2004):

$$(w^\circ, b^\circ) = \arg\min_{w \in E,\; b \in \mathbb{R}} \left( \frac{1}{2}\|w\|_E^2 + C \sum_{i=1}^{n} \xi_i^p \right)$$

$$\text{s.t.} \quad \begin{cases} \langle w, x_i \rangle_E + b \le -1 + \xi_i, & y_i = 0, \\ \langle w, x_i \rangle_E + b \ge +1 - \xi_i, & y_i = 1, \\ \xi_i \ge 0, & i = 1, \ldots, n. \end{cases}$$

The resulting binary SVM classifier is $S(\langle w^\circ, x \rangle_E + b^\circ)$ where $S(\cdot)$ is the Heaviside step function and $x \in \mathbb{R}^d$ is the input to the classifier. It can be proved (see, e.g., Vapnik, 1998) that this classifier can be also expressed as

$$S\left( \sum_{i=1}^{n} a_i (2 y_i - 1) K(x, x_i) + b^\circ \right) \qquad (4)$$

where $a_i \in \mathbb{R}$ are suitable coefficients, and the function $K(x, y) := \langle \varphi(x), \varphi(y) \rangle_E$ is called kernel. For some spaces E, the kernel K has a simple expression. This is the case, e.g., of the linear kernel $K(x, y) = \langle x, y \rangle$ and the Gaussian kernel $K(x, y) = \exp\{-\|x - y\|^2 / (2\sigma^2)\}$, where $\sigma > 0$ is a fixed parameter. It often happens that only a small subset of the coefficients $a_i$ (with respect to the total number n) is different from 0; the corresponding vectors $\varphi(x_i)$ are called support vectors.

In practice, a binary SVM classifier can be interpreted as a binary linear classifier in the space E (see Fig. 5). It often allows one to separate data that are not linearly separable in the original space (Fig. 5). At the same time, the generalization capability of the binary SVM classifier is often guaranteed by bounds from Statistical Learning Theory like the one reported in Appendix A, since it can be shown that the formulation of the optimization problem (4) is closely related to such bounds.

References

Band, L.E., 1986. Topographic partition of watershed with digital elevation models. Water Resour. Res. 22, 15–24.
Bates, P.D., Marks, K.J., Horritt, M.S., 2003. Optimal use of high resolution topographic data in flood inundation models. Hydrol. Process 17, 537–557.
de Moel, H., van Alphen, J., Aerts, J.C.J.H., 2009. Flood maps in Europe – methods, availability and use. Nat. Hazards Earth Syst. Sci. 9, 289–301.
Dodov, B., Foufoula-Georgiou, E., 2005. Floodplain morphometry extraction from a high resolution digital elevation model: a simple algorithm for regional analysis studies. Res. Rep. Ser. Univ. of Minn. Supercomputing Inst. Minneapolis. <http://static.msi.umn.edu/rreports/2005/1.pdf>.
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874.
Franc, V., Hlávač, V., 2004. Statistical pattern recognition toolbox for Matlab – User's guide, Center for Machine Perception, Czech Technical University, pp. 1213–2365. doi: ftp://cmp.felk.cvut.cz/pub/cmp/articles/Franc-TR-2004-08.pdf.
Gallant, J.C., Dowling, T.I., 2003. A multiresolution index of valley bottom flatness
for mapping depositional areas. Water Resour. Res. 39, 1347.
empirical true positive rate are, with high confidence, good esti- Ghizzoni, T., Roth, G., Rudari, R., 2010. Multivariate skew-t approach to the design of
mates of the (expected) false positive rate and true positive rate, accumulation risk scenarios for the flooding hazard. Adv. Water Resour. 33,
when the number of i.i.d. samples n is large with respect to 6 (this 1243–1255.
Ghizzoni, T., Roth, G., Rudari, R., 2012. Multisite flooding hazard assessment in the
condition is not difficult to be achieved, even with a small proba- Upper Mississippi River. J. Hydrol. 412–413, 101–113.
bility of repetitions in the training set, since the dataset is made Giannoni, F., Roth, G., Rudari, R., 2005. A procedure for drainage network
up of 187306 data points). identification from geomorphology and its application to the prediction of the
hydrologic response. Adv. Water Resour. 28, 567–581.
Giannoni, F., Roth, G., Rudari, R., 2008. A semi-distributed rainfall-runoff model
Appendix B. Binary SVM classifiers based on a geomorphologic approach. Phys. Chem. Earth. 25, 665–671.
Guzzetti, F., Stark, C.P., Salvati, P., 2005. Evaluation of flood and landslide risk to the
population of Italy. Environ. Manage. 36, 15–36.
Given a set of a finite number n of training data (xi, yi) with Horritt, M.S., Bates, P.D., 2002. Evaluation of 1D and 2D numerical models for
xi 2 Rd and yi e {0, 1} (corresponding, e.g., to the classes 0 and 1 predicting river flood inundation. J. Hydrol. 268, 87–99.
Hunter, N.M., Bates, P.D., Horritt, M.S., Wilson, M.D., 2007. Simple spatially-distributed models for predicting flood inundation: a review. Geomorphology 90, 208–225.
Kirkby, M.J., 1975. Hydrograph modelling strategies. Progress Phys. Hum. Geogr., 69–90.
Lehner, B., Verdin, K., Jarvis, A., 2008. New global hydrography derived from spaceborne elevation data. Eos. Trans. AGU 89, 93–94.
Luino, F., 2002. Flooding vulnerability of a town in the Tanaro basin: the case of Ceva (Piedmont–Northwest Italy). In: Thorndycraft, V.R., Benito, G., Barriendos, M., Llasat, M.C., (Eds.), Proc. PHEFRA Workshop "Paleofloods, Historical Data & Climatic Variability: Application in Flood Risk Assessment". Barcelona, Spain, pp. 321–326.
Manfreda, S., Di Leo, M., Sole, A., 2011. Detection of flood-prone areas using digital elevation models. J. Hydrol. Eng. 16, 781–790.
Marchi, E., Roth, G., Siccardi, F., 1996. The Po: centuries of river training. Phys. Chem. Earth 20, 475–478.
Montgomery, D.R., Dietrich, W.E., 1988. Where do channels begin? Nature 336, 232–234.
Nardi, F., Grimaldi, S., Santini, M., Petroselli, A., Ubertini, L., 2008. Hydrogeomorphic properties of simulated drainage patterns using digital elevation models: the flat area issue. Hydrol. Sci. J. 53, 1176–1193.
Nardi, F., Vivoni, E.R., Grimaldi, S., 2006. Investigating a floodplain scaling relation using a hydrogeomorphic delineation method. Water Resour. Res. 42, W09409. http://dx.doi.org/10.1029/2005WR004155.
Noman, N.S., Nelson, E.J., Zundel, A.K., 2001. A review of automated flood plain delineation from digital terrain models. ASCE J. Water Resour. Plann. Manage. 127, 394–402.
O'Callaghan, J.F., Mark, D.M., 1984. The extraction of drainage networks from digital elevation data. Comput. Vision Graphics Image Proc. 28, 323–344.
Platt, C.J., 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. doi: ftp://ftp.research.microsoft.com/pub/tr/tr-98-14.pdf.
Rakotomamonjy, A., 2004. Optimizing area under ROC curve with SVMs. ECAI, Valencia, Spain, 71–80.
Rodriguez-Iturbe, I., Rinaldo, A., 1997. Fractal River Basins: Chance and Self-organization. Cambridge University Press.
Roth, G., La Barbera, P., Greco, M., 1996. On the description of the basin effective drainage structure. J. Hydrol. 187, 119–135.
Santini, M., Grimaldi, S., Nardi, F., Petroselli, A., Rulli, M.C., 2009. Pre-processing algorithms and landslide modeling on remotely sensed DEMs. Geomorphology 113, 110–125.
Tarboton, D.G., Bras, R.L., Rodriguez-Iturbe, I., 1991. On the extraction of channel networks from digital elevation data. Hydrol. Process 5, 81–100.
Vapnik, V.N., 1998. Statistical Learning Theory. Wiley, New York.
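
Supplementary sketch. The two formulas that carry the argument of Appendices A and B, namely the right-hand side of the uniform bound and the kernel expansion (4), can be evaluated numerically. The Python fragment below is only an illustration under stated assumptions and is not the implementation used in the paper: the function names, the toy data, the hand-picked coefficients a_i = 1, the offset b = 0 and the value sigma = 0.5 are all hypothetical, and the training-set size n is taken equal to the full dataset size (187306) purely as an example.

```python
import math

def vc_bound(n, h, eps):
    """Right-hand side of the bound in Appendix A,
    4 * exp{ (h * (1 + ln(2n/h)) / n - (eps - 1/n)**2) * n },
    capped at 1 because it bounds a probability."""
    exponent = h * (1.0 + math.log(2.0 * n / h)) - n * (eps - 1.0 / n) ** 2
    if exponent >= 0.0:
        return 1.0  # bound is vacuous; also avoids overflow in exp
    return min(1.0, 4.0 * math.exp(exponent))

def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return sum(u * v for u, v in zip(x, y))

def gaussian_kernel(x, y, sigma=0.5):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); sigma is an assumed value."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def heaviside(t):
    """Step function S of Eq. (3): 0 for t <= 0, 1 for t > 0."""
    return 1 if t > 0 else 0

def svm_decision(x, train_x, train_y, coeffs, b, kernel):
    """Expression (4): S( sum_i a_i (2 y_i - 1) K(x, x_i) + b )."""
    s = sum(a * (2 * y - 1) * kernel(x, xi)
            for a, xi, y in zip(coeffs, train_x, train_y))
    return heaviside(s + b)

# Toy XOR-like data (hypothetical), not linearly separable in the plane.
train_x = [(0, 0), (1, 1), (0, 1), (1, 0)]
train_y = [0, 0, 1, 1]
coeffs = [1.0, 1.0, 1.0, 1.0]  # hand-picked, NOT obtained by training
b = 0.0

gaussian_labels = [svm_decision(x, train_x, train_y, coeffs, b, gaussian_kernel)
                   for x in train_x]
linear_labels = [svm_decision(x, train_x, train_y, coeffs, b, linear_kernel)
                 for x in train_x]

# Bound for h = 6 parameters, at two tolerance levels eps.
bound_loose = vc_bound(187306, 6, 0.05)
bound_tight = vc_bound(187306, 6, 0.01)
```

With these assumed values, the Gaussian-kernel expansion reproduces the labels of the four toy points while the linear-kernel expansion with the same coefficients does not, which is the separation effect described in Appendix B. The bound is informative only when eps is not too small relative to h and n; for very small eps the right-hand side exceeds 1 and says nothing.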
