
2.18 Statistical Hydrology
S Grimaldi, Università degli Studi della Tuscia, Viterbo, Italy
S-C Kao, Oak Ridge National Laboratory, Oak Ridge, TN, USA
A Castellarin, Università degli Studi di Bologna, Bologna, Italy
S-M Papalexiou, National Technical University of Athens, Zographou, Greece
A Viglione, Technische Universität Wien, Vienna, Austria
F Laio, Politecnico di Torino, Torino, Italy
H Aksoy and A Gedikli, Istanbul Technical University, Istanbul, Turkey
© 2011 Elsevier B.V. All rights reserved.

2.18.1 Introduction
2.18.2 Analysis and Detection of Nonstationarity in Hydrological Time Series
2.18.2.1 The Common Nonstationarity Analysis Methods
2.18.2.1.1 Randomness test
2.18.2.1.2 Detection of trend
2.18.2.1.3 Simple regression on time
2.18.2.1.4 Mann–Kendall test
2.18.2.1.5 Spearman rank order correlation test
2.18.2.1.6 Detection of shifts (segmentation)
2.18.2.1.7 t-Test
2.18.2.1.8 Mann–Whitney test
2.18.2.2 A New Method of Segmentation
2.18.3 Extreme Value Analysis: Distribution Functions and Statistical Inference
2.18.3.1 Probability Distributions for Extreme Events
2.18.3.1.1 Normal distribution
2.18.3.1.2 Lognormal distribution
2.18.3.1.3 Exponential distribution
2.18.3.1.4 Gamma distribution
2.18.3.1.5 Pearson type 3 distribution
2.18.3.1.6 Log-Pearson type 3 distribution
2.18.3.1.7 Extreme value distributions
2.18.3.1.8 Generalized Pareto distribution
2.18.3.1.9 Generalized logistic distribution
2.18.3.2 Parameter Estimation Methods
2.18.3.2.1 Method of moments
2.18.3.2.2 Method of L-moments
2.18.3.2.3 Method of the maximum-likelihood and Bayesian methods
2.18.3.3 Model Verification: Goodness-of-Fit Tests
2.18.4 IDF Curves
2.18.4.1 Definition of IDF Curves and Clarifications
2.18.4.2 Empirical Methods
2.18.4.2.1 Parameter estimation
2.18.4.2.2 Application in a real-world data set
2.18.4.3 Theoretically Consistent Methods
2.18.4.3.1 Parameter estimation
2.18.4.3.2 Application in a real-world data set
2.18.5 Copula Function for Hydrological Application
2.18.5.1 Concepts of Dependence Structure and Copulas
2.18.5.2 Copulas in Hydrologic Applications
2.18.5.3 Remarks on Copulas and Future Research
2.18.6 Regional Frequency Analysis
2.18.6.1 Index-Flood Procedure, Extensions and Evolutions
2.18.6.2 Classical Regionalization Approach
2.18.6.2.1 Estimation of the index flood
2.18.6.2.2 Estimation of the regional dimensionless quantile
2.18.6.2.3 Homogeneity testing
2.18.6.2.4 Choice of a frequency distribution
2.18.6.2.5 Estimation of the regional frequency distribution
2.18.6.2.6 Validation of the regional model
2.18.6.3 Open Problems and New Advances
References

2.18.1 Introduction

Hydrological phenomena such as precipitation, floods, and droughts are inherently random by nature. Due to the complexity of the hydrologic system, these physical processes are not fully understood and reliable deterministic mathematical models are still to be developed. Therefore, in order to provide useful analyses for designing hydraulic facilities and infrastructures, statistical approaches have been commonly adopted.

In the literature and in practical hydrological applications, many statistical methods are used with different aims. Simulation, forecasting, uncertainty analysis, spatial interpolation, and risk analysis are some of the most important ones. The use of statistical analyses is strongly related to data availability and to the quality of observations. Particular emphasis is given to the case of ungauged areas, where the statistical approach is particularly important for developing hydrological analyses without direct observations (the relevance of this issue is well documented by the Decade on Prediction in Ungauged Basins (PUB) promoted by the International Association of Hydrological Sciences (IAHS); Sivapalan et al., 2003).

This chapter describes some statistical topics widely used in hydrology. Among the large number of subjects available in the literature, attention is focused on those that are particularly useful either for innovative hydrological analyses or for an appropriate application of common procedures.

Many statistical methods are strongly affected by specific conditions to be verified on the available data set. For instance, complex procedures used for different important applications usually rely on a very common and simple hypothesis: stationarity. This condition, simple to define but very difficult to verify, is probably the most important in statistical hydrology. For this reason, the first section of this chapter provides a short review of this topic and a detailed description of the segmentation method, a promising procedure for time series trend detection.

Another primary topic described here is univariate extreme value (EV) analysis. The EV approach is the most widely used in hydrology (e.g., for the derivation of return levels for extreme rainfall and flood estimates), and it should be carefully and correctly applied in order to avoid dangerous under- or overestimation of the analyzed design variables (rainfall, runoff). With this aim, in the second section the distribution functions used with hydrological variables are described in detail; moreover, the approaches to parameter estimation and goodness-of-fit testing are reviewed.

Since rainfall is the most-observed hydrological phenomenon, a dedicated section is included in this chapter providing an updated description of the EV-IDF (intensity–duration–frequency) procedure. IDF curves are an invaluable tool in hydrology, having a crucial role in the safe and efficient design of major or minor infrastructures (e.g., water dams, urban hydraulic works, flood design, etc.) that affect human lives. IDF curves have been in use for almost a century, and the many different forms and methods proposed and studied through the years underline their importance. During all those years, IDF curves have evolved from purely empirical forms to theoretically more consistent ones, while today their study still remains an active field of research. In this text, some of the most commonly used forms and techniques are presented and applied to a real-world data set. The search of the literature and the application presented here reveal that some commonly used techniques and forms of IDF curves may result in underestimating the rainfall intensity, especially for large return periods, and thus should be used with caution. More advanced forms and estimation procedures are described and compared to the most commonly used ones in practical applications.

Until now the efforts of hydrologists were primarily devoted to analyzing single parameters (flood peak, rainfall intensity, etc.), not because it is not important to consider other variables (e.g., flood duration and flood volume, or rainfall duration and volume) but because of the absence of a flexible approach to jointly analyze these different but useful variables. However, this is now finally possible, thanks to the relatively recent introduction of the copula function. This statistical and mathematical method is quickly evolving and numerous applications are described in the literature. Since this approach is promising and could change and improve many hydrological procedures, a specific section on the copula function is included in this chapter, providing an updated review useful for hydrological applications.

As mentioned at the beginning of this section, the ungauged basin is a sensitive problem. Most small basins (<150 km²) are characterized by poor hydrological observations (usually only a few rain gauges are available), which has stimulated intense research on statistical methods for regional frequency analysis. Therefore, the last section includes a review and a specific description of this important topic.

This chapter is written by a group of researchers who are members of the Statistics in Hydrology (STAHY) Working Group, recently launched by the International Association of Hydrological Sciences (IAHS) with the purpose of sharing knowledge and stimulating research activities on statistical hydrology.

2.18.2 Analysis and Detection of Nonstationarity in Hydrological Time Series

Hydrological time series used in water resources planning studies are very often supposed to meet the stationarity hypothesis. Under steady-state natural conditions, time series exhibit regular fluctuations around a mean value; however,
when the natural conditions change markedly, they may form trends or exhibit jumps. Hydrological data series frequently show this type of significant nonstationarity due to several reasons (human activities, climate change, etc.).

A random process is an indexed family $(x_t)_{t \in I}$ of random variables, which may be discrete time if $I$ is a set of integers. A discrete random time process $x = (x_1, x_2, \ldots, x_n)$ is said to be stationary if, for every $k$ and $n$, the distribution of $x_{k+1}, x_{k+2}, \ldots, x_{k+n}$ is the same as the distribution of $x_1, x_2, \ldots, x_n$ (Basseville and Nikiforov, 1993). In other words, a random process or variable is said to be strictly stationary if its statistical properties do not vary with time, and hence are independent of changes of time origin.

Trends, jumps/shifts, seasonality/periodicity, or nonrandomness in a hydrological time series can be referred to as components of the time series. The presence of these components makes the time series nonstationary. Indeed, nonstationarity is affected by persistence and scaling issues (Koutsoyiannis, 2006).

Hydrological time series frequently exhibit nonstationary behavior; for example, flow and precipitation or rainfall stay below or above the long-term mean (Rao and Yu, 1986), although they are generally assumed to be stationary at the annual scale. When the time interval used is shorter than a year (month, week, or day), the stationarity assumption is no longer valid, simply because of the annual cycle of the Earth around the Sun.

Trends in a time series can result from gradual natural and human-induced disruptive and evolutionary changes in the environment, whereas a jump may result from sudden catastrophic natural events (Haan, 2002). Any change in the time series is most reliable if it is detected by statistical tests and is also supported by physical and historical evidence (Salas et al., 1980). Therefore, it is considered an important issue to identify (detect), describe (test), and remove these components.

2.18.2.1 The Common Nonstationarity Analysis Methods

A number of parametric and nonparametric tests have been suggested in the literature for the detection of trends and jumps, and for checking randomness. These tests are considered to be important for scientific purposes as well as for practicing hydrologists. In what follows, a selection of these tests is briefly described.

2.18.2.1.1 Randomness test

An adapted version of a simple nonparametric run test, reported by Adeloye and Montaseri (2002), is given below. The test consists of the following steps (Figure 1):

1. The median of the observations is determined.
2. Each data item is examined to find out if it exceeds the median. If a data item exceeds the median, this is defined as a case of success, S; if not, this is defined as a case of failure, F. Cases that are exactly equal to the median are excluded.
3. Successes and failures are counted and denoted by $n_1$ and $n_2$, respectively.
4. The total number of runs ($R$) in the data set is determined. A run is a continuous sequence of successes until it is interrupted by a failure or vice versa.
5. The test statistic is computed by

\[ z = \frac{R - \left( \dfrac{2 n_1 n_2}{n_1 + n_2} + 1 \right)}{\sqrt{\dfrac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}}} \tag{1} \]

where $z$ has a standard normal distribution under the null hypothesis, $H_0$, that the sequence of successes and failures is random.

Figure 1 Randomness test (flowchart: successes S and failures F are counted about the median $x_{50}$ as $n_1$ and $n_2$, the number of runs $R$ is determined, $z$ is computed, and $H_0$, that the sequence of Ss and Fs is random, is rejected if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$).
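The flowchart of Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `runs_test` and its return convention are hypothetical, the critical value is taken from the standard library's `statistics.NormalDist`, and the statistic follows Equation (1) with the expected number of runs $2 n_1 n_2/(n_1+n_2) + 1$.

```python
import math
from statistics import NormalDist, median


def runs_test(x, alpha=0.05):
    """Run test about the median; returns (z, z_crit, reject_H0).

    Hypothetical helper written for illustration only.
    """
    x50 = median(x)
    # Success (True) if the value exceeds the median, failure (False) if it
    # is below; values exactly equal to the median are excluded.
    signs = [v > x50 for v in x if v != x50]
    n1 = sum(signs)               # number of successes
    n2 = len(signs) - n1          # number of failures
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one success and one failure")
    # A new run starts at every change between success and failure.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean_r = 2 * n1 * n2 / (n1 + n2) + 1
    var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
             / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    if var_r <= 0:
        raise ValueError("sample too short for the normal approximation")
    z = (runs - mean_r) / math.sqrt(var_r)
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit
```

A strictly alternating series produces too many runs (large positive $z$), whereas a monotone series produces only two runs (large negative $z$); in both cases the sketch rejects $H_0$.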


6. Critical values of the standard normal distribution are obtained for the chosen significance level, $\alpha$, and denoted by $\pm z_{\alpha/2}$.
7. The computed statistic $z$ is compared to the critical values $\pm z_{\alpha/2}$. $H_0$ is rejected if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$.

2.18.2.1.2 Detection of trend

A number of parametric and nonparametric trend detection tests are available in the literature (Berryman et al., 1988; Cluis et al., 1989; Helsel and Hirsch, 1992; Salas, 1993; Fanta et al., 2001; Yue et al., 2002; Burn and Elnur, 2002; Adeloye and Montaseri, 2002; Xiong and Guo, 2004; Koutsoyiannis, 2006). Among these, one parametric and two nonparametric tests are described below.

2.18.2.1.3 Simple regression on time

The simple linear trend line between the variable ($x$) and time ($t$) can be written as

\[ x_t = a + b\,t \tag{2} \]

where $a$ and $b$ are parameters of the regression model. A linear trend exists when the null hypothesis that $b = 0$ is rejected. The null hypothesis is rejected if the test statistic, $T_c$, satisfies

\[ T_c = \left| \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \right| > T_{1-\alpha/2,\,\nu} \tag{3} \]

where $r$ is the cross-correlation coefficient between the variable $x$ ($x_1, x_2, \ldots, x_n$) and time $t = 1, 2, \ldots, n$, and $T_{1-\alpha/2,\,\nu}$ is the $1 - \alpha/2$ quantile of the Student t distribution with $\nu = n - 2$ degrees of freedom.

2.18.2.1.4 Mann–Kendall test

The Mann–Kendall test checks the existence of a trend without specifying if the trend is linear or nonlinear. It is widely reported, for example in Libiseller and Grimvall (2002). The univariate statistic for monotone trend in a time series $x_t$ ($t = 1, 2, \ldots, n$) is defined as

\[ S = \sum_{j < i} \operatorname{sgn}(x_i - x_j) \tag{4} \]

where

\[ \operatorname{sgn}(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -1, & \text{if } x < 0 \end{cases} \tag{5} \]

If no ties are present and the values of $x_1, x_2, \ldots, x_n$ are randomly ordered, the test statistic has expectation zero and variance

\[ V(S) = \frac{n (n - 1)(2n + 5)}{18} \tag{6} \]

When tied groups are present, the equations are modified (Salas, 1993).

2.18.2.1.5 Spearman rank order correlation test

The Spearman rank order correlation nonparametric test is used to investigate the existence of a trend that might be found in the time series. The step-by-step explanation of the test for a time series $x_t$ ($t = 1, \ldots, n$) observed in time $t$ (Figure 2) is as follows:

1. Ranks, $Rx_t$, are assigned to $x_t$, such that the rank 1 is assigned to the largest $x_t$ and the rank $n$ to the least $x_t$. Where there are ties in the $x_t$, then a rank equal to the average of

the ranks which would have been used had there been no ties is assigned to each of the ties.
2. The difference

\[ d_t = Rx_t - t \tag{7} \]

is computed.
3. The coefficient of trend, $r_s$, is computed by

\[ r_s = 1 - \frac{6 \sum_{t=1}^{n} d_t^2}{n (n^2 - 1)} \tag{8} \]

Under the null hypothesis that the time series has no trend, the variable

\[ T = r_s \sqrt{\frac{n - 2}{1 - r_s^2}} \tag{9} \]

has a Student's t-distribution with $n - 2$ degrees of freedom.
4. The critical values of the t-distribution for the chosen significance level, $\alpha$, and $n - 2$ degrees of freedom are obtained. For a two-tailed test, the critical values are denoted by $\pm T_{\alpha/2,\,n-2}$.
5. The values of $T$ are compared to the critical values. $H_0$ is rejected if $T > T_{\alpha/2,\,n-2}$ or $T < -T_{\alpha/2,\,n-2}$.

Figure 2 Spearman rank order correlation coefficient test (flowchart; $H_0$: the time series has no trend).

2.18.2.1.6 Detection of shifts (segmentation)

Segmentation of a time series is the first step of jump analysis, also called the change point detection problem (or detection of shifts), for which statistical tests such as the Pettitt (1979) and Alexandersson (1986) tests are available in the literature. The simplest case is segmentation with regression by a constant, in which the aim is to determine the change points or boundaries where the average of the current segment is statistically different from the average of the next segment as well as that of the previous one. This shift or jump may be either positive or negative. By using a proper algorithm, the time series is first divided into segments with different mean values. Then the significance of the difference in the mean is tested.

A number of tests are available in the literature to test the significance, that is, to detect whether the time series is consistent. The tests are either parametric or nonparametric, as in the trend detection tests (Hirsch et al., 1993; Chen and Rao, 2002; Fanta et al., 2001; Xiong and Guo, 2004; Wong et al., 2006). Here, the t-test and Mann–Whitney test are described.

2.18.2.1.7 t-Test

A segmentation algorithm can be used for splitting the sample into segments with significantly different means. The segmentation algorithm divides the time series into as many segments as possible. Then, if two or more segments are identified, the starting year of the last segment is chosen as the first year for splitting the time series. The comparison is made between the segments before and after the chosen year. Once segmentation is completed, the jump analysis is performed by using the parametric t-test for which details are given below (Figure 3).

1. The time series is divided into several segments by a segmentation algorithm.
2. The averages of two consecutive segments ($\bar{x}_1$ and $\bar{x}_2$) are calculated and the lengths of the segments ($n_1$ and $n_2$) are determined.
3. The t-statistic is calculated by

\[ T = \frac{\left| \bar{x}_1 - \bar{x}_2 \right|}{s \sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}} \tag{10} \]

with $n_1 + n_2 - 2$ degrees of freedom. $s$ in Equation (10) is the pooled standard deviation given by

\[ s = \sqrt{\frac{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}} \tag{11} \]

4. The null hypothesis that the shift in the average value is insignificant is rejected if the sample $T$ statistic in Equation (10) is greater than the critical value of Student's t-distribution $T_{1-\alpha/2,\,n_1+n_2-2}$ with $n_1 + n_2 - 2$ degrees of freedom.

Figure 3 Detection of shift: t-test (flowchart; $H_0$: the shift in the mean is insignificant).

2.18.2.1.8 Mann–Whitney test

The Mann–Whitney test is used when a time series $x_t$ ($t = 1, 2, \ldots, n$) can be divided into two segments $x_1, x_2, \ldots, x_{n_1}$ and $x_{n_1+1}, x_{n_1+2}, \ldots, x_n$ such that $n_2 = n - n_1$. This is a widely reported test and is briefly given below as reported in Salas (1993): a new series $z_t$ ($t = 1, 2, \ldots, n$) is defined by rearranging the original data $x_t$ in increasing order of magnitude. The
hypothesis that the mean of the first segment is equal to the mean of the second segment is tested by

\[ u_c = \frac{\sum_{t=1}^{n_1} R(x_t) - n_1 (n_1 + n_2 + 1)/2}{\left[ n_1 n_2 (n_1 + n_2 + 1)/12 \right]^{1/2}} \tag{12} \]

where $R(x_t)$ is the rank of the observation $x_t$ in the ordered series $z_t$. The hypothesis of equal means is rejected at the significance level $\alpha$ when $|u_c| > u_{1-\alpha/2}$, where $u_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution.

2.18.2.2 A New Method of Segmentation

In addition to the classical tests, various segmentation algorithms have been developed to determine stationary segments and estimate parameters characterizing each segment. The usual criterion to decide if a change point exists is based on the segmentation cost, defined as the sum of squared deviations of the data from the means of their respective segments. The number of segments has a lower limit of 1 and an upper limit of $n$, the length of the time series, and determines the order of segmentation; a fifth-order segmentation, for instance, means the time series is divided into five segments.

The segmentation procedure of Hubert et al. (1989) and Hubert (2000), used also by Cluis and Laberge (2001), Fortin et al. (2004), Aksoy (2006), Dahamsheh and Aksoy (2007), and Aksoy et al. (2008b), among others, is available in the literature. Some earlier examples developed to determine stationary segments and estimate the parameters characterizing each segment can be found in Appel and Brandt (1983) and Imberger and Ivey (1991). Kehagias (2004) and Kehagias et al. (2006) developed segmentation algorithms based on dynamic programming (DP) and the hidden Markov model (HMM). Gedikli et al. (2008) made available the segmentation algorithm denoted as AUG. Gedikli et al. (2010b) modified the DP algorithm (mDP). The HMM, DP, AUG, and mDP algorithms are all motivated by the segmentation algorithm of Hubert (2000). In what follows, the AUG algorithm is outlined with the aim of piecewise-stationarity analysis of hydrological time series.

The following definitions are required to explain the formulation behind the segmentation algorithm. For details, the reader is referred to Gedikli et al. (2008) and Aksoy et al. (2008a).

Assume that a time series $x = (x_1, x_2, \ldots, x_n)$ is given. Segmentation of such a series can be described by a sequence $t = (t_0, t_1, \ldots, t_K)$ satisfying $0 = t_0 < t_1 < \cdots < t_{K-1} < t_K = n$. The intervals of integers $[t_0 + 1, t_1], [t_1 + 1, t_2], \ldots, [t_{K-1} + 1, t_K]$ are called segments, the times $t_0, t_1, \ldots, t_K$ are called segment boundaries, and $K$, the number of segments, is called the order of the segmentation. In other words, the time points where changes take place are called change points; the interval included between two change points is a segment (of the time series); and the procedure by which the segments of a time series are determined is called time series segmentation.

The set of all segmentations of $\{1, 2, \ldots, n\}$ is denoted by $N$ and the set of all segmentations of order $K$ by $N_K$. Clearly, $N = \bigcup_{K=1}^{n} N_K$. The number of all possible segmentations of $\{1, 2, \ldots, n\}$ is $2^{n-1}$. This can be formulated as an optimization problem. In other words, the optimal segmentation depends on $x$. The segmentation cost $J(t)$ is defined by

\[ J(t) = \sum_{k=1}^{K} d_{t_{k-1}+1,\,t_k} \tag{13} \]

where $d_{s,t}$ (for $0 \le s < t \le T$) is the segment error corresponding to segment $[s, t]$. The segment error depends on the data vector $\{x_s, x_{s+1}, \ldots, x_t\}$. A variety of $d_{s,t}$ functions can be used. In this study,

\[ d_{s,t} = \sum_{\tau=s}^{t} (x_\tau - \mu_{s,t})^2 \tag{14} \]

is used, where the segment mean is given by

\[ \mu_{s,t} = \frac{\sum_{\tau=s}^{t} x_\tau}{t - s + 1} \tag{15} \]

The optimal segmentation, denoted by $\hat{t} = (\hat{t}_0, \hat{t}_1, \ldots, \hat{t}_K)$, is defined as $\hat{t} = \arg\min_{t \in N} J(t)$, and the optimal segmentation of order $K$, denoted by $\hat{t}^{(K)} = (\hat{t}_0^{(K)}, \hat{t}_1^{(K)}, \ldots, \hat{t}_K^{(K)})$, is defined as $\hat{t}^{(K)} = \arg\min_{t \in N_K} J(t)$. The optimal segmentation can be found by exhaustive enumeration of all possible segmentations (and computation of the corresponding $d_{s,t}$). Computationally, this is infeasible, as the total number of segmentations increases exponentially with $T$. In order to obtain fast algorithms, a fast method for computing the costs $d_{s,t}$ is first required. For this aim, the recursive formulation

\[ d_{s,t+1} = d_{s,t} + (t - s + 1)(\mu_{s,t} - \mu_{s,t+1})^2 + (x_{t+1} - \mu_{s,t+1})^2 \tag{16} \]

is easily proved, where

\[ \mu_{s,t+1} = \frac{(t - s + 1)\,\mu_{s,t} + x_{t+1}}{t - s + 2} \tag{17} \]

The segmentation algorithm is based on a branch-and-bound-type technique. The branches are the possible segments of the $k$th-order segmentation. As suggested by Hubert (2000), the upper bound, $u$, of the $k$th segment in the $K$th-order segmentation can trivially be given as

\[ t_k \le u = n - K + k \tag{18} \]

In the segmentation algorithm, the term 'upper bound' is the maximum possible value that $t_k$ can take. The basic idea of the algorithm is to enumerate (branch into) the possible solutions of the segmentation problem but, at the same time, to avoid exhaustive enumeration by eliminating clearly suboptimal solutions (bounds). It is possible to eliminate segmentations by reducing the upper bound of the segments as defined in Equation (18). It is also easy to check that

\[ c^{k}_{t+1} \;\ldots\; c^{k}_{t} \;\ldots\; c^{k+1}_{t} \quad \text{and} \quad c^{k+1}_{t+1} \;\ldots \tag{19} \]

is valid for $t = 2, \ldots, N - 1$ and $k = 1, 2, \ldots, t$. Equation (19) is rather obvious; a detailed derivation of it can be found in Gedikli et al. (2008). In order to reduce the upper bound, $u$, the remaining cost concept is defined as

\[ R^{K,k}_{n,t} = c^{K}_{n} - c^{k}_{t} \tag{20} \]
where $k \le K$ and $t \le n$. This is a unique concept developed to make the algorithm fast.

The segmentation algorithm computes a sequence of optimal segmentations $\hat{t}^{(1)}, \hat{t}^{(2)}, \ldots, \hat{t}^{(k)}$, where $\hat{t}^{(k)}$ is the $k$th-order optimal segmentation. For a given segmentation ($\hat{t}^{(k)}$ for instance), the hypothesis that the means of consecutive segments are significantly different is tested. Determining the optimal order of segmentation, that is, selecting the number of segments, is a subsequent step in the segmentation procedure, for which the Scheffe (1959) test is employed. The test is run on the optimal segmentations $\hat{t}^{(1)}, \hat{t}^{(2)}, \ldots, \hat{t}^{(K)}$. Hubert (2000) accepts $\hat{t}^{(k)}$ as the optimal segmentation when $\hat{t}^{(k+1)}$ is the first lowest order segmentation which is rejected by the Scheffe test (i.e., the first segmentation for which at least two consecutive segments do not show a statistically significant difference in their means). In the AUG algorithm, however, not the first lowest but the highest order segmentation which is accepted by the Scheffe test is considered instead.

The segmentation algorithm was applied to a previously used data set: the annual mean streamflow data of the Senegal River originating from Hubert (2000) and used by Kehagias (2004), Kehagias et al. (2006), and Gedikli et al. (2008). The data set is available on the Internet. A user-friendly software (the AUG-Segmenter version 1.1) based on the above algorithm is now available (Gedikli et al., 2010a). The software is able to segment time series efficiently and fast. Using this software, the Senegal River annual mean streamflow data set is segmented. The length of the data is 84 years for the period 1903–86. The fifth-order segmentation is found to be optimal after the execution of the algorithm (Table 1 and Figure 4).

Figure 4 The fifth-order segmentation of the Senegal River annual streamflow data (horizontal axis: year, 1900–1990; vertical axis: flow, m³ s⁻¹).

2.18.3 Extreme Value Analysis: Distribution Functions and Statistical Inference

The study of the statistics of extreme events is the first step for most hydrological studies. In many situations, historical records containing observations from the past are the only reliable source of information. In the flood context, the analysis of extreme events was introduced at the beginning of the twentieth century (e.g., Fuller, 1914) to replace the earlier design flood procedures, such as envelope curves and empirical formulas, by more objective estimation methods. When longer flood records became available by the middle of the twentieth century and with further theoretical developments such as the extreme value theory of Gumbel (1958), the method rapidly became what Klemeš (1993) termed 'the standard approach to frequency analysis'.

As stressed in Stedinger et al. (1993), "frequency analysis is an information problem." If one had a sufficiently long record of flood flows, rainfall, low flows, etc., then a frequency distribution for a site could be precisely determined, so long as change over time due to urbanization or natural processes did not alter the relationships of concern. However, in most situations, available data are not enough to precisely define the risk of large floods, rainfall, or low flows. This forces hydrologists to use practical knowledge of the processes involved, and efficient and robust statistical techniques, to develop the best estimates of risk that they can. These techniques are generally restricted, with 10–100 sample observations, to estimating events exceeded with a chance of at least 1 in 100, corresponding to exceedance probabilities of 1% or more. In some cases, they are used to estimate the rainfall exceeded with a chance of 1 in 1000 (the rainfall with a return period of 1000 years), and even the flood flows for spillway design exceeded with a chance of 1 in 10 000 (the 10 000-year flood).

In essence, extreme value analysis consists of fitting distribution functions to ordered sequences of observed data and extrapolating the tails of the distribution to low exceedance probabilities. The immediate problem pertains to the way in which the probabilities are estimated and what level of accuracy is associated with such probabilities. The hydrologist should be aware that in practice the true probability distributions of the phenomena in question are not known. Even if they were, their functional representation would likely have too many parameters to be of much practical use. The practical issues are: how to select a reasonable and simple distribution to describe the phenomenon of interest, finding the correct trade-off between estimation bias and variance, which respectively decrease and increase as the number of model parameters increases; how to estimate the distribution's parameters;

Table 1 Change points of the Senegal River annual streamflow data (1903–86)

Segmentation order    Change points
2                     1902  1967  1986
3                     1902  1949  1967  1986
4                     1902  1938  1949  1967  1986
5                     1902  1921  1936  1949  1967  1986
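Table 1 lists, for each order, the segment boundaries $t_0, \ldots, t_K$ of an optimal segmentation. As a rough illustration of how such boundaries could be obtained from the cost of Equations (13) to (15), the sketch below uses plain dynamic programming with prefix sums. It is not the AUG branch-and-bound algorithm and does not include the Scheffe test; the function names are hypothetical, and only the index range of Equation (18) is borrowed to limit the search.

```python
def segment_cost(prefix, prefix_sq, s, t):
    """Sum of squared deviations of x_s..x_t (1-based, inclusive) from
    their mean, i.e. d_{s,t} of Equation (14), via prefix sums."""
    n = t - s + 1
    total = prefix[t] - prefix[s - 1]
    return (prefix_sq[t] - prefix_sq[s - 1]) - total * total / n


def optimal_segmentation(x, K):
    """Return (boundaries [t_0, ..., t_K], cost J) of an order-K
    segmentation minimizing Equation (13). Illustrative DP sketch."""
    n = len(x)
    if not 1 <= K <= n:
        raise ValueError("order K must satisfy 1 <= K <= len(x)")
    prefix, prefix_sq = [0.0], [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    inf = float("inf")
    # cost[k][t]: minimum cost of splitting x_1..x_t into k segments.
    cost = [[inf] * (n + 1) for _ in range(K + 1)]
    back = [[0] * (n + 1) for _ in range(K + 1)]
    cost[0][0] = 0.0
    for k in range(1, K + 1):
        # Equation (18): the k-th boundary cannot exceed n - K + k.
        for t in range(k, n - K + k + 1):
            for s in range(k - 1, t):          # s plays the role of t_{k-1}
                c = cost[k - 1][s] + segment_cost(prefix, prefix_sq, s + 1, t)
                if c < cost[k][t]:
                    cost[k][t] = c
                    back[k][t] = s
    # Trace the boundaries back from t_K = n to t_0 = 0.
    bounds, t = [n], n
    for k in range(K, 0, -1):
        t = back[k][t]
        bounds.append(t)
    bounds.reverse()
    return bounds, cost[K][n]
```

With annual data starting in 1903, a returned boundary list such as `[0, 19, ...]` would correspond to the years 1902, 1921, and so on, which is the convention Table 1 appears to follow (the first boundary is the year before the record begins).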
and thus to obtain risk estimates of satisfactory accuracy for the problem at hand.

2.18.3.1 Probability Distributions for Extreme Events

In this section, several distributions commonly used in hydrology are briefly described. Tables 2 and 3 provide a summary of the probability density functions (PDFs) or cumulative distribution functions (CDFs) of these probability distributions. The moments and L-moments for these distributions are reported in Tables 4 and 5 (see Section 2.18.3.2 for more details).

2.18.3.1.1 Normal distribution

The normal distribution (N) arises from the central limit theorem, which states that if a sequence of random variables $X_i$ are independently and identically distributed with finite variance, then the distribution of the sum of $n$ such random variables tends toward the normal distribution as $n$ becomes large. The important point is that this is true no matter what the probability distribution function of $X$ is. Hydrologic variables, such as annual precipitation, calculated as the sum of the effects of many independent events tend to follow the normal distribution. The main limitations of the normal distribution for describing hydrological variables are that it varies over a continuous range $(-\infty, +\infty)$, while most hydrologic variables are non-negative, and that it is symmetric about the mean, while hydrologic data tend to be skewed. Because of its definition, the normal distribution is not suitable for extreme value analysis.

2.18.3.1.2 Lognormal distribution

If the random variable $Y = \log(X)$ is normally distributed, then $X$ is said to be lognormally distributed (LN). This distribution is applicable to hydrologic variables formed as the products of other variables, because of the central limit theorem, provided that these are independent and identically distributed (see, e.g., Sangal and Biswas (1970), Martins and Stedinger (2001), and Kroll and Vogel (2002) for applications in hydrology). The lognormal distribution has the advantages over the normal distribution that it is bounded ($X > 0$) and that the log transformation tends to reduce the positive skewness commonly found in hydrologic data (especially in extremes), because taking logarithms reduces large numbers proportionately more than small numbers. Some limitations of the lognormal distribution are that it has only two parameters

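Table 2 Commonly used frequency distributions in hydrology

For each distribution: PDF $f_X(x)$, CDF $F_X(x)$, and quantile function $x_F$, followed by the range.

Normal (N)
  $f_X(x) = \frac{1}{\theta_2\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\theta_1}{\theta_2}\right)^2\right]$
  $x_F = \theta_1 + \theta_2\,\Phi^{-1}(F)$
  Range: $-\infty < x < \infty$; $\theta_2 > 0$

Lognormal (LN)
  $f_X(x) = \frac{1}{x\,\theta_2\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log(x)-\theta_1}{\theta_2}\right)^2\right]$
  $x_F = \exp\left[\theta_1 + \theta_2\,\Phi^{-1}(F)\right]$
  Range: $x > 0$; $\theta_2 > 0$

3-par Lognormal (LN3)
  $f_X(x) = \frac{1}{(x-\theta_1)\,\theta_3\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log(x-\theta_1)-\theta_2}{\theta_3}\right)^2\right]$
  $x_F = \theta_1 + \exp\left[\theta_2 + \theta_3\,\Phi^{-1}(F)\right]$
  Range: $x > \theta_1$; $\theta_3 > 0$

Exponential (E)
  $f_X(x) = \frac{1}{\theta_2} \exp\left(-\frac{x-\theta_1}{\theta_2}\right)$
  $F_X(x) = 1 - \exp\left(-\frac{x-\theta_1}{\theta_2}\right)$
  $x_F = \theta_1 - \theta_2 \ln(1-F)$
  Range: $x > \theta_1$ for $\theta_2 > 0$

Gamma (G)
  $f_X(x) = \frac{1}{|\theta_1|\,\Gamma(\theta_2)} \left(\frac{x}{\theta_1}\right)^{\theta_2-1} \exp\left(-\frac{x}{\theta_1}\right)$
  Range: $x \ge 0$

Pearson type 3 (P3)
  $f_X(x) = \frac{1}{|\theta_2|\,\Gamma(\theta_3)} \left(\frac{x-\theta_1}{\theta_2}\right)^{\theta_3-1} \exp\left(-\frac{x-\theta_1}{\theta_2}\right)$
  Range: $\theta_1 < x < \infty$ if $\theta_2 > 0$; $-\infty < x < \theta_1$ if $\theta_2 < 0$; $\theta_3 > 0$

Log-Pearson type 3 (LP3)
  $f_X(x) = \frac{1}{x\,|\theta_2|\,\Gamma(\theta_3)} \left(\frac{\log(x)-\theta_1}{\theta_2}\right)^{\theta_3-1} \exp\left(-\frac{\log(x)-\theta_1}{\theta_2}\right)$
  Range: $\exp(\theta_1) < x < \infty$ if $\theta_2 > 0$; $0 < x < \exp(\theta_1)$ if $\theta_2 < 0$; $\theta_3 > 0$

$\theta_1$, $\theta_2$, and $\theta_3$ are distribution parameters, $\Phi$ is the standard normal CDF, and $\Gamma$ is the gamma function.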
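The quantile functions of Table 2 translate directly into code. The following is a minimal sketch, not part of the original chapter: the function names and the numerical parameter values are illustrative assumptions, natural logarithms are assumed for log(x), and the return period is linked to the non-exceedance probability by T = 1/(1 - F), as in the axis label of Figure 5.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; inv_cdf plays the role of Phi^{-1}


def quantile_normal(F, theta1, theta2):
    """Normal (N): x_F = theta1 + theta2 * Phi^{-1}(F)."""
    return theta1 + theta2 * STD_NORMAL.inv_cdf(F)


def quantile_lognormal(F, theta1, theta2):
    """Lognormal (LN): x_F = exp[theta1 + theta2 * Phi^{-1}(F)]."""
    return exp(theta1 + theta2 * STD_NORMAL.inv_cdf(F))


def quantile_lognormal3(F, theta1, theta2, theta3):
    """LN3: x_F = theta1 + exp[theta2 + theta3 * Phi^{-1}(F)]."""
    return theta1 + exp(theta2 + theta3 * STD_NORMAL.inv_cdf(F))


def quantile_exponential(F, theta1, theta2):
    """Exponential (E): x_F = theta1 - theta2 * ln(1 - F)."""
    return theta1 - theta2 * log(1.0 - F)


def cdf_exponential(x, theta1, theta2):
    """Exponential (E): F_X(x) = 1 - exp[-(x - theta1) / theta2]."""
    return 1.0 - exp(-(x - theta1) / theta2)


# Hypothetical parameter values, chosen only for illustration.
T = 100.0                      # return period in years
F = 1.0 - 1.0 / T              # corresponding non-exceedance probability
x100_normal = quantile_normal(F, 40.0, 10.0)
x100_exponential = quantile_exponential(F, 20.0, 20.0)
```

Evaluating the CDF at a computed quantile should return the original probability F, which is a convenient sanity check when coding any of the other rows of the table.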

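Table 3 Commonly used frequency distributions in hydrology

For each distribution: PDF $f_X(x)$, CDF $F_X(x)$, and quantile function $x_F$, followed by the range.

Gumbel (EV1)
  $f_X(x) = \frac{1}{\theta_2} \exp\left[-\frac{x-\theta_1}{\theta_2} - \exp\left(-\frac{x-\theta_1}{\theta_2}\right)\right]$
  $F_X(x) = \exp\left[-\exp\left(-\frac{x-\theta_1}{\theta_2}\right)\right]$
  $x_F = \theta_1 - \theta_2 \ln[-\ln(F)]$
  Range: $-\infty < x < \infty$

Fréchet (EV2)
  $f_X(x) = \frac{\theta_2}{\theta_1} \left(\frac{\theta_1}{x}\right)^{\theta_2+1} \exp\left[-\left(\frac{\theta_1}{x}\right)^{\theta_2}\right]$
  $F_X(x) = \exp\left[-\left(\frac{\theta_1}{x}\right)^{\theta_2}\right]$
  $x_F = \theta_1 [-\ln(F)]^{-1/\theta_2}$
  Range: $x > 0$; $\theta_1, \theta_2 > 0$

Weibull (EV3)
  $f_X(x) = \frac{\theta_2}{\theta_1} \left(\frac{x}{\theta_1}\right)^{\theta_2-1} \exp\left[-\left(\frac{x}{\theta_1}\right)^{\theta_2}\right]$
  $F_X(x) = 1 - \exp\left[-\left(\frac{x}{\theta_1}\right)^{\theta_2}\right]$
  $x_F = \theta_1 [-\ln(1-F)]^{1/\theta_2}$
  Range: $x > 0$; $\theta_1, \theta_2 > 0$

GEV
  $F_X(x) = \exp\left\{-\left[1 - \theta_3 \frac{x-\theta_1}{\theta_2}\right]^{1/\theta_3}\right\}$
  $x_F = \theta_1 + \frac{\theta_2}{\theta_3}\left[1 - (-\ln(F))^{\theta_3}\right]$
  Range: $x < \theta_1 + \theta_2/\theta_3$ if $\theta_3 > 0$; $x > \theta_1 + \theta_2/\theta_3$ if $\theta_3 < 0$

Generalized Pareto (GP)
  $f_X(x) = \frac{1}{\theta_2}\left[1 - \theta_3 \frac{x-\theta_1}{\theta_2}\right]^{1/\theta_3 - 1}$
  $F_X(x) = 1 - \left[1 - \theta_3 \frac{x-\theta_1}{\theta_2}\right]^{1/\theta_3}$
  $x_F = \theta_1 + \frac{\theta_2}{\theta_3}\left[1 - (1-F)^{\theta_3}\right]$
  Range: $\theta_1 \le x < \infty$ if $\theta_3 < 0$; $\theta_1 \le x \le \theta_1 + \theta_2/\theta_3$ if $\theta_3 > 0$

Generalized Logistic (GL)
  $F_X(x) = \frac{1}{1 + \left[1 - \theta_3 \frac{x-\theta_1}{\theta_2}\right]^{1/\theta_3}}$
  $x_F = \theta_1 + \frac{\theta_2}{\theta_3}\left[1 - \left(\frac{1-F}{F}\right)^{\theta_3}\right]$
  Range: $x > \theta_1 + \theta_2/\theta_3$ if $\theta_3 < 0$; $x < \theta_1 + \theta_2/\theta_3$ if $\theta_3 > 0$

$\theta_1$, $\theta_2$, and $\theta_3$ are distribution parameters, $\Phi$ is the standard normal CDF, and $\Gamma$ is the gamma function.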
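As an illustration of how the GEV row of Table 3 is used in practice, the sketch below codes the CDF and the quantile function, treating the Gumbel (EV1) distribution as the case with zero shape parameter. It is a sketch under stated assumptions rather than a reference implementation: the function names and the numerical parameter values are hypothetical, and the return period is taken as T = 1/(1 - F).

```python
from math import exp, log


def gev_cdf(x, theta1, theta2, theta3):
    """GEV CDF of Table 3; theta3 == 0 is handled as the Gumbel (EV1) case."""
    if theta3 == 0.0:
        return exp(-exp(-(x - theta1) / theta2))
    z = 1.0 - theta3 * (x - theta1) / theta2
    if z <= 0.0:
        # x lies outside the support: above the upper bound when theta3 > 0,
        # below the lower bound when theta3 < 0.
        return 1.0 if theta3 > 0.0 else 0.0
    return exp(-(z ** (1.0 / theta3)))


def gev_quantile(F, theta1, theta2, theta3):
    """GEV quantile x_F of Table 3 for a non-exceedance probability 0 < F < 1."""
    if theta3 == 0.0:
        return theta1 - theta2 * log(-log(F))
    return theta1 + theta2 / theta3 * (1.0 - (-log(F)) ** theta3)


def return_level(T, theta1, theta2, theta3):
    """Quantile associated with a return period of T years, using F = 1 - 1/T."""
    return gev_quantile(1.0 - 1.0 / T, theta1, theta2, theta3)


# Hypothetical parameters: same location and scale, different shape.
x100_gumbel = return_level(100.0, 30.0, 10.0, 0.0)
x100_heavy = return_level(100.0, 30.0, 10.0, -0.1)   # theta3 < 0: bounded below
x100_bounded = return_level(100.0, 30.0, 10.0, 0.1)  # theta3 > 0: bounded above
```

With equal location and scale, a negative shape parameter gives a larger 100-year quantile than the Gumbel case, and a positive one gives a smaller quantile that stays below the upper bound $\theta_1 + \theta_2/\theta_3$.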

and that it requires the logarithms of the data to be symmetric about their mean. Moreover, the lognormal distribution cannot be used when dealing with variables that can assume null values (e.g., discharge in ephemeral rivers).
The three-parameter lognormal distribution (LN3) differs from the two-parameter lognormal (LN) distribution by the introduction of a lower bound (indicated as $\theta_1$ in Table 2) so that if X follows the LN3 distribution, $\log(X - \theta_1)$ is normally distributed.

2.18.3.1.3 Exponential distribution
Some sequences of hydrologic events, such as the occurrence of precipitation, may be considered Poisson processes, in which events occur instantaneously and independently on a time horizon, or along a line. The time between such events, or interarrival time, is described by the exponential distribution (E). In the parameterization of Table 2, $\theta_2$ is the mean interarrival time, that is, the reciprocal of the mean rate of occurrence of the events. The exponential distribution is used to describe the interarrival times of random shocks to hydrologic systems, such as slugs of polluted runoff entering streams as rainfall washes the pollutants off the land surface. The advantage of the exponential distribution is that it is easy to estimate $\theta_2$ from observed data and the exponential distribution lends itself well to theoretical studies, such as a probability model for the linear reservoir (for which the rate is $1/\theta_2 = 1/k$, where k is the storage constant in the linear reservoir). Its disadvantage is that it requires the occurrence of each event to be completely independent of its neighbors, which may not be a valid assumption for the process under study (e.g., the arrival of a front may generate many showers of rain) and this has led investigators to study

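Table 4 Moments and L-moments of commonly used frequency distributions in hydrology

Normal (N)
  Moments: $\mu = \theta_1$; $\sigma = \theta_2$; $\gamma = 0$; $\kappa = 3$
  L-moments: $\lambda_1 = \theta_1$; $\lambda_2 = \pi^{-1/2}\theta_2$; $\tau_3 = 0$; $\tau_4 = 0.1226$

Lognormal (LN)
  Moments: $\mu = \exp(\theta_1 + \theta_2^2/2)$; $\sigma^2 = [\exp(\theta_2^2) - 1]\exp(2\theta_1 + \theta_2^2)$; $\gamma = [\exp(\theta_2^2) + 2]\sqrt{\exp(\theta_2^2) - 1}$; $\kappa = e^{4\theta_2^2} + 2e^{3\theta_2^2} + 3e^{2\theta_2^2} - 3$
  L-moments: $\lambda_1 = \exp(\theta_1 + \theta_2^2/2)$; $\lambda_2 = \exp(\theta_1 + \theta_2^2/2)\,[2\Phi(\theta_2/\sqrt{2}) - 1]$; $\tau_3$: NA, see HW, eq. (A72); $\tau_4$: NA, see HW, eq. (A73)

3-par Lognormal (LN3)
  Moments: $\mu = \theta_1 + \exp(\theta_2 + \theta_3^2/2)$; $\sigma^2 = [\exp(\theta_3^2) - 1]\exp(2\theta_2 + \theta_3^2)$; $\gamma = [\exp(\theta_3^2) + 2]\sqrt{\exp(\theta_3^2) - 1}$; $\kappa = e^{4\theta_3^2} + 2e^{3\theta_3^2} + 3e^{2\theta_3^2} - 3$
  L-moments: $\lambda_1 = \theta_1 + \exp(\theta_2 + \theta_3^2/2)$; $\lambda_2 = \exp(\theta_2 + \theta_3^2/2)\,[2\Phi(\theta_3/\sqrt{2}) - 1]$; $\tau_3$: NA, see HW, eq. (A72); $\tau_4$: NA, see HW, eq. (A73)

Exponential (E)
  Moments: $\mu = \theta_1 + \theta_2$; $\sigma^2 = \theta_2^2$; $\gamma = 2$; $\kappa = 9$
  L-moments: $\lambda_1 = \theta_1 + \theta_2$; $\lambda_2 = \theta_2/2$; $\tau_3 = 1/3$; $\tau_4 = 1/6$

Gamma (G)
  Moments: $\mu = \theta_1\theta_2$; $\sigma^2 = \theta_2\theta_1^2$; $\gamma = 2\,\mathrm{sign}(\theta_1)/\sqrt{\theta_2}$; $\kappa = 6/\theta_2 + 3$
  L-moments: $\lambda_1 = \theta_1\theta_2$; $\lambda_2 = \pi^{-1/2}\theta_1\Gamma(\theta_2 + 1/2)/\Gamma(\theta_2)$; $\tau_3$: NA, see HW, eqs. (A86) and (A88); $\tau_4$: NA, see HW, eqs. (A87) and (A89)

Pearson type 3 (P3)
  Moments: $\mu = \theta_1 + \theta_3\theta_2$; $\sigma^2 = \theta_3\theta_2^2$; $\gamma = 2\,\mathrm{sign}(\theta_2)/\sqrt{\theta_3}$; $\kappa = 6/\theta_3 + 3$
  L-moments: $\lambda_1 = \theta_1 + \theta_2\theta_3$; $\lambda_2 = \pi^{-1/2}\theta_2\Gamma(\theta_3 + 1/2)/\Gamma(\theta_3)$; $\tau_3$: NA, see HW, eqs. (A86) and (A88); $\tau_4$: NA, see HW, eqs. (A87) and (A89)

Log-Pearson type 3 (LP3)
  Moments: $\mu = e^{\theta_1}(1 - \theta_2)^{-\theta_3}$; $\sigma^2 = e^{2\theta_1}\left[(1 - 2\theta_2)^{-\theta_3} - (1 - \theta_2)^{-2\theta_3}\right]$; $\gamma$: NA, see ST, page 18.21; $\kappa$: NA, see ST, page 18.21
  L-moments: $\lambda_1 = e^{\theta_1}(1 - \theta_2)^{-\theta_3}$; $\lambda_2$: NA; $\tau_3$: NA; $\tau_4$: NA

$\theta_1$, $\theta_2$, and $\theta_3$ are distribution parameters, $\Phi$ is the standard normal CDF, and $\Gamma$ is the gamma function. NA indicates that the moment or L-moment is very complicated or not available in analytical form, with reference to Hosking and Wallis (1997) (HW in the table) or Stedinger et al. (1993) (ST in the table) when formulas or approximations are available.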
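The moment relations of Table 4 can be inverted to recover parameters from a given mean and variance. The sketch below does this for the lognormal (LN) row: dividing $\sigma^2$ by $\mu^2$ gives $\exp(\theta_2^2) - 1$, so $\theta_2^2 = \ln(1 + \sigma^2/\mu^2)$ and $\theta_1 = \ln\mu - \theta_2^2/2$. The function names and the data values are hypothetical and serve only as an illustration.

```python
from math import exp, log, sqrt
from statistics import mean, variance


def lognormal_moments(theta1, theta2):
    """Mean and variance of the LN distribution (Table 4)."""
    mu = exp(theta1 + theta2 ** 2 / 2.0)
    var = (exp(theta2 ** 2) - 1.0) * exp(2.0 * theta1 + theta2 ** 2)
    return mu, var


def fit_lognormal_moments(sample):
    """Invert the Table 4 relations using the sample mean and variance."""
    m = mean(sample)
    s2 = variance(sample)               # sample variance with n - 1 denominator
    theta2_sq = log(1.0 + s2 / m ** 2)  # from sigma^2 / mu^2 = exp(theta2^2) - 1
    theta1 = log(m) - theta2_sq / 2.0   # from mu = exp(theta1 + theta2^2 / 2)
    return theta1, sqrt(theta2_sq)


# Hypothetical annual maximum discharges (m^3/s), for illustration only.
annual_peaks = [112.0, 85.0, 240.0, 131.0, 98.0, 176.0, 64.0, 150.0, 310.0, 121.0]
theta1_hat, theta2_hat = fit_lognormal_moments(annual_peaks)
mu_fit, var_fit = lognormal_moments(theta1_hat, theta2_hat)
```

By construction the fitted distribution reproduces the sample mean and variance, which is a useful check that the inversion has been coded correctly.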

various forms of compound Poisson processes, in which $\theta_2$ is considered a random variable instead of a constant. The exponential distribution has been used in extreme value analysis as a simple model of the flood or rainfall exceedances over high thresholds in peak over threshold analyses (see, e.g., Todorovic, 1978).

2.18.3.1.4 Gamma distribution
The time taken for a number of events, n, to occur in a Poisson process is described by the gamma distribution (G), which is the distribution of a sum of n independent and identical exponentially distributed random variables. The gamma distribution has a smoothly varying form and is useful for describing skewed hydrologic variables without the need for log transformation. It has been applied, for example, to describe the distribution of depth of precipitation in storms (see, e.g., Sivapalan et al., 2005; Viglione and Blöschl, 2009). The two-parameter gamma distribution has a lower bound at zero, which is a disadvantage for application to hydrologic variables that have a lower bound larger than zero.

2.18.3.1.5 Pearson type 3 distribution
The Pearson type 3 distribution (P3), also called the three-parameter gamma distribution, introduces a third parameter, the lower bound. This is a very flexible distribution, assuming a number of different shapes as the parameters vary. The normal distribution is a limiting case of the Pearson type 3 distribution, describing a nonskewed variable. The Pearson type 3 distribution was first applied in hydrology by Foster (1924) to describe the probability distribution of annual maximum flood peaks. When the data are very positively skewed, a log transformation is used to reduce the skewness.
Examples of use of the Pearson type 3 distribution in extreme value analysis are Matalas and Wallis (1973), Bobée and Rasmussen (1995), and Kroll and Vogel (2002) among others.

2.18.3.1.6 Log-Pearson type 3 distribution
If log(X) follows a Pearson type 3 distribution, then X is said to follow a log-Pearson type 3 distribution (LP3). This distribution is the standard distribution for frequency analysis of annual maximum floods in the United States (Benson, 1968; Stedinger and Griffis, 2008). As a special case, when log(X) is symmetric about its mean, the log-Pearson type 3 distribution reduces to the lognormal distribution. The location of the bound $\theta_1$ in the log-Pearson type 3 distribution depends on the skewness of the data. If the data are positively skewed, then $\log(X) > \theta_1$ and $\theta_1$ is a lower bound, whereas if the data are negatively skewed, $\log(X) < \theta_1$ and $\theta_1$ is an upper bound.

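Table 5 Moments and L-moments of commonly used frequency distributions in hydrology

Gumbel (EV1)
  Moments: $\mu = \theta_1 + 0.5772\,\theta_2$; $\sigma^2 = \pi^2\theta_2^2/6$; $\gamma = 1.1396$; $\kappa = 5 + 2/5$
  L-moments: $\lambda_1 = \theta_1 + 0.5772\,\theta_2$; $\lambda_2 = \theta_2\ln(2)$; $\tau_3 = 0.1699$; $\tau_4 = 0.1504$

Fréchet (EV2)
  Moments: $\mu = \theta_1\Gamma(1 - 1/\theta_2)$; $\sigma^2 = \theta_1^2\left[\Gamma(1 - 2/\theta_2) - \Gamma^2(1 - 1/\theta_2)\right]$
  L-moments: $\lambda_1 = \theta_1\Gamma(1 - 1/\theta_2)$; $\lambda_2 = \theta_1\Gamma(1 - 1/\theta_2)\,(2^{1/\theta_2} - 1)$

Weibull (EV3)
  Moments: $\mu = \theta_1\Gamma(1 + 1/\theta_2)$; $\sigma^2 = \theta_1^2\left[\Gamma(1 + 2/\theta_2) - \Gamma^2(1 + 1/\theta_2)\right]$
  L-moments: $\lambda_1 = \theta_1\Gamma(1 + 1/\theta_2)$; $\lambda_2 = \theta_1\Gamma(1 + 1/\theta_2)\,(1 - 2^{-1/\theta_2})$

GEV
  Moments: $\mu = \theta_1 + \theta_2[1 - \Gamma(1 + \theta_3)]/\theta_3$; $\sigma^2 = (\theta_2/\theta_3)^2\left[\Gamma(1 + 2\theta_3) - \Gamma^2(1 + \theta_3)\right]$; $\gamma$: NA, see ST, eq. (18.2.19); $\kappa$: NA
  L-moments: $\lambda_1 = \theta_1 + \theta_2[1 - \Gamma(1 + \theta_3)]/\theta_3$; $\lambda_2 = \theta_2(1 - 2^{-\theta_3})\Gamma(1 + \theta_3)/\theta_3$; $\tau_3 = 2(1 - 3^{-\theta_3})/(1 - 2^{-\theta_3}) - 3$; $\tau_4 = \left[5(1 - 4^{-\theta_3}) - 10(1 - 3^{-\theta_3}) + 6(1 - 2^{-\theta_3})\right]/(1 - 2^{-\theta_3})$

Generalized Pareto (GP)
  Moments: $\mu = \theta_1 + \theta_2/(1 + \theta_3)$; $\sigma^2 = \theta_2^2/\left[(1 + \theta_3)^2(1 + 2\theta_3)\right]$; $\gamma = 2(1 - \theta_3)\sqrt{1 + 2\theta_3}/(1 + 3\theta_3)$; $\kappa = 3(1 + 2\theta_3)(3 - \theta_3 + 2\theta_3^2)/\left[(1 + 3\theta_3)(1 + 4\theta_3)\right]$
  L-moments: $\lambda_1 = \theta_1 + \theta_2/(1 + \theta_3)$; $\lambda_2 = \theta_2/[(1 + \theta_3)(2 + \theta_3)]$; $\tau_3 = (1 - \theta_3)/(3 + \theta_3)$; $\tau_4 = (1 - \theta_3)(2 - \theta_3)/[(3 + \theta_3)(4 + \theta_3)]$

Generalized Logistic (GL)
  Moments: $\mu = \theta_1 + \theta_2\left(1/\theta_3 - \pi/\sin(\pi\theta_3)\right)$; $\sigma^2 = \pi\theta_2^2\left[\frac{2}{\theta_3\sin(2\pi\theta_3)} - \frac{\pi}{\sin^2(\pi\theta_3)}\right]$; $\gamma$: NA, see JO, eq. (23.71); $\kappa$: NA, see JO, eq. (23.71)
  L-moments: $\lambda_1 = \theta_1 + \theta_2\left(1/\theta_3 - \pi/\sin(\pi\theta_3)\right)$; $\lambda_2 = \theta_2\theta_3\pi/\sin(\pi\theta_3)$; $\tau_3 = -\theta_3$; $\tau_4 = (1 + 5\theta_3^2)/6$

$\theta_1$, $\theta_2$, and $\theta_3$ are distribution parameters, $\Phi$ is the standard normal CDF, and $\Gamma$ is the gamma function. NA indicates that the moment or L-moment is very complicated or not available in analytical form, with reference to Hosking and Wallis (1997) (HW in the table), Stedinger et al. (1993) (ST in the table), or Johnson et al. (1994) (JO in the table) when formulas or approximations are available.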
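As a worked illustration of the Gumbel (EV1) row of Table 5, the sketch below estimates the two parameters by matching the sample mean and variance (eqs. (22) and (23) of Section 2.18.3.2.1) to $\mu = \theta_1 + 0.5772\,\theta_2$ and $\sigma^2 = \pi^2\theta_2^2/6$, and also evaluates the sample skewness of eq. (24) for comparison with the theoretical value 1.1396. The function names and the data values are hypothetical.

```python
from math import pi, sqrt


def sample_moments(x):
    """Sample mean, variance, and skewness coefficient (eqs. (22)-(24))."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    s = sqrt(s2)
    g = n * sum((xi - xbar) ** 3 for xi in x) / ((n - 1) * (n - 2) * s ** 3)
    return xbar, s2, g


def fit_gumbel_moments(x):
    """Method of moments for the Gumbel distribution, inverting Table 5."""
    xbar, s2, _ = sample_moments(x)
    theta2 = sqrt(6.0 * s2) / pi      # from sigma^2 = pi^2 * theta2^2 / 6
    theta1 = xbar - 0.5772 * theta2   # from mu = theta1 + 0.5772 * theta2
    return theta1, theta2


# Hypothetical annual maximum daily rainfall depths (mm), for illustration only.
annual_max = [48.0, 61.5, 39.2, 75.8, 52.3, 44.1, 90.4, 57.7, 66.0, 41.9, 70.2, 55.5]
xbar, s2, g = sample_moments(annual_max)
theta1_hat, theta2_hat = fit_gumbel_moments(annual_max)
```

Because the Gumbel distribution has a fixed skewness, a sample skewness far from 1.1396 is a first hint that a three-parameter model may be worth considering, subject to the large sampling variability of the skewness estimator in short records.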

The log transformation reduces the skewness of the transformed data and may produce transformed data which are negatively skewed from original data which are positively skewed. In this case, the application of the log-Pearson type 3 distribution would impose an artificial upper bound on the data.
Depending on the values of the parameters, the log-Pearson type 3 distribution can assume many different shapes. Its use is justified by the fact that it has been found to yield good results in many applications, particularly for flood peak data (e.g., Bobée, 1975).

2.18.3.1.7 Extreme value distributions
Extreme values are selected maximum or minimum values of sets of data. For example, the annual maximum discharge at a given location is the largest recorded discharge value during a year, and the annual maximum discharge values for each year of historical record make up a set of extreme values that can be analyzed statistically. Distributions of the extreme values selected from sets of samples of any probability distribution have been shown by Fisher and Tippett (1928) to converge to one of three forms of extreme value distributions, called types I, II, and III, respectively, when the number of selected extreme values is large. Unfortunately, for many hydrologic variables this convergence is too slow for this argument alone to justify adoption of an extreme value distribution as a model of annual maxima and minima. The properties of the three limiting forms were further developed by Gumbel (1941) for the extreme value type I (EV1) distribution, Fréchet (1927) for the extreme value type II (EV2), and Weibull (1939) for the extreme value type III (EV3). The three limiting forms were shown by Jenkinson (1955) to be special cases of a single distribution called the generalized extreme value (GEV) distribution. The three limiting cases are: (1) for $\theta_3 = 0$, the EV1 distribution for which x is unbounded; (2) for $\theta_3 < 0$, the EV2 distribution for which x is bounded from below by $\theta_1 + \theta_2/\theta_3$; (3) for $\theta_3 > 0$, the EV3 distribution for which x is bounded from above by $\theta_1 + \theta_2/\theta_3$.
The EV1 and EV2 distributions are also known as the Gumbel and Fréchet distributions, respectively. Note that the Gumbel and Fréchet distributions are mutually related through the logarithmic transformation, that is, if X is a Fréchet-distributed variable, then log(X) is distributed as a Gumbel. If a variable x is described by the EV3 distribution, then $-x$ is said to have a Weibull distribution. The Gumbel model is widely applied, often gives satisfactory results, and is parsimonious (two parameters), but may underestimate the design rainfall depth or discharge for large return periods (e.g., Koutsoyiannis, 2004a, 2004b). By contrast, the GEV model encounters difficulties when its parameters are estimated using small to medium-size samples, due to the large estimation variance of the shape parameter $\theta_3$.
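The logarithmic link between the Fréchet and Gumbel distributions can be checked numerically from the CDFs of Table 3. Substituting $x = e^y$ in the Fréchet CDF gives $\exp[-\exp(-\theta_2 (y - \ln\theta_1))]$, which is a Gumbel CDF with location $\ln\theta_1$ and scale $1/\theta_2$. The sketch below assumes natural logarithms; the function names and parameter values are hypothetical.

```python
from math import exp, log


def frechet_cdf(x, theta1, theta2):
    """Frechet (EV2) CDF of Table 3, valid for x > 0."""
    return exp(-((theta1 / x) ** theta2))


def gumbel_cdf(y, theta1, theta2):
    """Gumbel (EV1) CDF of Table 3."""
    return exp(-exp(-(y - theta1) / theta2))


# Hypothetical Frechet parameters, for illustration only.
theta1, theta2 = 50.0, 3.0

# If X is Frechet(theta1, theta2), then log(X) is Gumbel(log(theta1), 1/theta2):
# P[log X <= log x] must equal P[X <= x] for every x > 0.
max_gap = max(
    abs(frechet_cdf(x, theta1, theta2)
        - gumbel_cdf(log(x), log(theta1), 1.0 / theta2))
    for x in (20.0, 50.0, 100.0, 400.0)
)
```

The largest discrepancy `max_gap` should be at the level of floating-point rounding.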

Examples of using the extreme value distribution in extreme value analysis are of course very numerous; an extensive list of references would be out of the scope of this chapter.

2.18.3.1.8 Generalized Pareto distribution
The generalized Pareto (GP) distribution is a simple distribution useful for describing events which exceed a specified lower bound, such as all floods above a threshold or daily flows above zero. The GP distribution allows a continuous range of possible shapes that includes both the exponential and Pareto distributions as special cases. The GP distribution is commonly used in the peaks over threshold (POT) approach. Examples of use of the generalized Pareto in extreme value analysis are Hosking and Wallis (1987), Rosbjerg et al. (1992), Madsen et al. (1997a, 1997b), Lang et al. (1999), and Claps and Laio (2003).

2.18.3.1.9 Generalized logistic distribution
The generalized logistic distribution (GL) has been used extensively for maximum rainfall modeling, and in the UK and elsewhere is used in hydrological risk analysis as the standard model for flood frequency estimation (Institute of Hydrology, 1999; Atiem and Harmancioglu, 2006).

2.18.3.2 Parameter Estimation Methods

Fitting a distribution to data sets provides a compact and smoothed representation of the frequency distribution revealed by the available data, and leads to a systematic procedure for extrapolation to frequencies beyond the range of the data set. When flood flows, low flows, rainfall, or water-quality variables are well described by some family of distributions, a task for the hydrologist is to estimate the parameters $\Theta$ of that distribution so that required quantiles and expectations can be calculated with the fitted model. Appropriate choices for distribution functions can be based on examination of the data using probability plots, moment ratios and goodness-of-fit tests (discussed in Section 2.18.3.3), the physical origins of the data, previous experience, and administrative guidelines.
Several approaches are available for estimating the parameters of a distribution. Some commonly used approaches are described in the following subsections.

2.18.3.2.1 Method of moments
The method of moments was first developed by Karl Pearson in 1902. He considered that good estimates of the parameters of a probability distribution are those for which moments of the PDF about the origin are equal to the corresponding moments of the sample data. Pearson originally considered only moments about the origin, but later it became customary to use the variance as the second central moment and the coefficient of skewness as the standardized third central moment, to determine second and third parameters of the distribution if required.
Given a probability density function $f_X(x)$, the mean is defined as

$$\mu = E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx \qquad (21)$$

The second moment about the mean is the variance defined as $\sigma^2 = E[(X - \mu)^2]$. The standard deviation $\sigma$ is the square root of the variance and describes the width or scale of a distribution. These are examples of product moments because they depend upon powers of X. A dimensionless measure of the variability in X, appropriate for use with positive random variables $X \ge 0$, is the coefficient of variation, defined as $\sigma/\mu$. The relative asymmetry of a distribution is described by the coefficient of skewness $\gamma = E[(X - \mu)^3]/\sigma^3$ while the coefficient of kurtosis $\kappa = E[(X - \mu)^4]/\sigma^4$ describes the thickness of the distribution's tails. These four moments are tabled for different distributions in Tables 4 and 5.
From a set of observations $(X_1, \ldots, X_n)$, the standard estimators of the mean, variance, and coefficient of skewness (unbiased for the mean and variance, bias-corrected for the skewness) are

$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (22)$$

$$\hat{\sigma}^2 = S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \qquad (23)$$

$$\hat{\gamma} = \frac{n}{(n-1)(n-2)S^3}\sum_{i=1}^{n} (X_i - \bar{X})^3 \qquad (24)$$

The method of moments consists of inverting the equations in Tables 4 and 5 so as to express the parameters of the distributions in terms of their moments and then using the sample moments to estimate the distribution moments.

2.18.3.2.2 Method of L-moments
L-moments are another way to summarize the statistical properties of hydrologic data. L-moments were introduced by Sillitto (1969) and formalized by Hosking (1990) and are linear combinations of the probability-weighted moments defined by Greenwood et al. (1979). The first L-moment is again the mean, $\lambda_1 = E[X]$. Let $X_{i:n}$ be the ith smallest observation in a sample of size n (i = 1 corresponds to the smallest). Then, for any distribution, the second L-moment is a description of scale based on the expected difference between two randomly selected observations:

$$\lambda_2 = \tfrac{1}{2}E[X_{2:2} - X_{1:2}] \qquad (25)$$

Similarly, the third and fourth L-moments are

$$\lambda_3 = \tfrac{1}{3}E[X_{3:3} - 2X_{2:3} + X_{1:3}] \qquad (26)$$

$$\lambda_4 = \tfrac{1}{4}E[X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}] \qquad (27)$$

and in general

$$\lambda_r = r^{-1}\sum_{j=0}^{r-1} (-1)^j \binom{r-1}{j} E[X_{r-j:r}] \qquad (28)$$
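The population L-moments of eqs. (25)-(28) have sample counterparts built from probability-weighted moments, following the estimators given as eqs. (29)-(31) in this section. The sketch below is one possible direct transcription of those formulas, together with a Gumbel fit that inverts the L-moment relations of Table 5 ($\lambda_2 = \theta_2\ln 2$, $\lambda_1 = \theta_1 + 0.5772\,\theta_2$). The function names and the data values are hypothetical.

```python
from math import comb, log


def sample_pwm(x, k):
    """Sample probability-weighted moment b_k (eq. (31)) of order k."""
    xs = sorted(x)                      # order statistics X_{1:n} <= ... <= X_{n:n}
    n = len(xs)
    total = sum(comb(j - 1, k) * xs[j - 1] for j in range(k + 1, n + 1))
    return total / (n * comb(n - 1, k))


def sample_lmoment(x, r):
    """Sample L-moment l_r (eq. (29)), with the coefficients of eq. (30)."""
    return sum(
        (-1) ** (r - 1 - k) * comb(r - 1, k) * comb(r - 1 + k, k) * sample_pwm(x, k)
        for k in range(r)
    )


def sample_lmoment_ratios(x):
    """Sample mean l1, L-CV t, L-skewness t3, and L-kurtosis t4."""
    l1, l2, l3, l4 = (sample_lmoment(x, r) for r in (1, 2, 3, 4))
    return l1, l2 / l1, l3 / l2, l4 / l2


def fit_gumbel_lmoments(x):
    """Method of L-moments for the Gumbel distribution, inverting Table 5."""
    theta2 = sample_lmoment(x, 2) / log(2.0)
    theta1 = sample_lmoment(x, 1) - 0.5772 * theta2
    return theta1, theta2


# Hypothetical annual maximum daily rainfall depths (mm), for illustration only.
annual_max = [48.0, 61.5, 39.2, 75.8, 52.3, 44.1, 90.4, 57.7, 66.0, 41.9, 70.2, 55.5]
l1, t, t3, t4 = sample_lmoment_ratios(annual_max)
theta1_hat, theta2_hat = fit_gumbel_lmoments(annual_max)
```

For a quick check, the sample [1, 2, 3, 4] gives $l_1 = 2.5$, $l_2 = 5/6$ (half the mean absolute difference between pairs), and $l_3 = 0$, as expected for a symmetric sample.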

The coefficient of L-variation (L-CV) is defined by the ratio of two L-moments as $\tau = \lambda_2/\lambda_1$. Other L-moment ratios are $\tau_3 = \lambda_3/\lambda_2$ and $\tau_4 = \lambda_4/\lambda_2$ that measure the skewness and the kurtosis of the distributions.
Analogously to the method of moments, the method of L-moments consists of inverting the equations of Tables 4 and 5 so as to express the parameters of the distributions in terms of their L-moments and to use the sample L-moments as estimators of distribution L-moments. Sample L-moments are defined as

$$l_r = \sum_{k=0}^{r-1} p_{r-1,k}\,b_k \qquad (29)$$

where the coefficients

$$p_{r,k} = (-1)^{r-k}\binom{r}{k}\binom{r+k}{k} \qquad (30)$$

are those of the 'shifted Legendre polynomials' (see Hosking and Wallis, 1997) and $b_k$ are the sample probability-weighted moments. These are computed from the order statistics $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$, that is, the data values arranged in increasing order, as

$$b_k = n^{-1}\binom{n-1}{k}^{-1}\sum_{j=k+1}^{n}\binom{j-1}{k} X_{j:n} \qquad (31)$$

where n is the sample length and k the order of the probability-weighted moment. Since L-moment estimators are linear functions of the sample values, they should be virtually unbiased and have relatively small sampling variance.
The sample L-CV is defined by the ratio $t = l_2/l_1$, where $l_1$ is the sample mean and $l_2$ a measure of the dispersion around the mean value. Other sample L-moment ratios are $t_3 = l_3/l_2$ and $t_4 = l_4/l_2$ that measure the skewness and the kurtosis of data. According to Hosking (1990), the L-moment ratio estimators are also asymptotically normally distributed and have small bias and variance, especially if compared with the classical coefficients of variation, skewness, and kurtosis (Hosking and Wallis, 1997).
In many hydrological applications an occasional event may be several times larger than other values; when product moments are used, such values can mask the information provided by the other observations, while product moments of the logarithms of sample values can overemphasize small values. In a wide range of hydrologic applications, L-moments provide simple and reasonably efficient estimators of the characteristics of hydrologic data and of a distribution's parameters.

2.18.3.2.3 Method of the maximum-likelihood and Bayesian methods
Still another method that has strong statistical motivation is the maximum-likelihood method. Maximum-likelihood estimators (MLEs) have very good statistical properties in large samples, and experience has shown that they generally do well with records available in hydrology. MLEs sometimes perform poorly when the distribution of the observations deviates in significant ways from the distribution being fitted.
MLE methods provide a computationally convenient way to fit frequency distributions by using different sources of information. In flood frequency analysis, for example, systematic records can be combined with historical events through a proper formulation of the likelihood function in which uncertainties (particularly measurement errors) are also taken into account (see, e.g., Stedinger and Cohn, 1986; O'Connell et al., 2002; O'Connell, 2005). The best parametrization of the assumed flood frequency distribution can then be obtained by maximizing the likelihood function.
Bayesian inference, in addition to the maximum likelihood method, combines prior information (or, for example, regional hydrologic information) with the likelihood function in a posterior probability model that quantifies the belief in the hypothesis (i.e., the flood frequency distribution with a given parameter set) after evidence (the flood data) has been observed. Another advantage of the Bayesian method over the method of maximum likelihood is that it allows the explicit modeling of uncertainty in the parameters of the frequency distribution, which can be used to assign confidence bounds to the estimated flood quantiles.

2.18.3.3 Model Verification: Goodness-of-Fit Tests

Goodness-of-fit criteria are useful for gaining an appreciation for whether the lack of fit is likely to be due to sample-to-sample variability, or whether a particular departure of the data from a model is statistically significant. Model testing and verification are basic steps of statistical inference, and several testing techniques, borrowed from applied statistics, have been applied in the hydrologic field. However, none of these tests has reached a broad consensus in the hydrologic community, possibly due to some complications that inevitably arise when the parameters of the hypothetical distribution are unknown. In order to evaluate which is the best distribution for a specific sample, a first simple step can be the graphical representation of the functions.
The tail behavior of the distributions described in the previous paragraphs is analyzed in Figure 5, which shows the quantile function versus the return period for three-parameter distributions with equal L-moments of order 1, 2, and 3. It is evident that the distributions follow a similar behavior for return periods up to 50 years, but they tend to diverge for larger return periods. This highlights the necessity to consider methods for the choice of the distribution function from a set of candidate distributions. Graphical procedures form a useful visual method of verifying whether a theoretical distribution fits an empirical distribution (a data sample). Among these procedures, probability plots are commonly used in hydrology (Powell, 1943). In most cases, probability plots are constructed to suit the CDF of a particular distribution. Thus, when the distribution function is plotted against the variate, a linear relationship is obtained if the observations are from the hypothetical distribution. Figure 5 does not refer to one particular distribution but is simply a convenient probability plot that highlights the shape of different distributions for very low exceedance probabilities (very high return periods). If one plots a data sample (which,
Figure 5 Frequency plot of the quantile function $x_F$ vs. the return period $T = 1/(1 - F_X(x))$ (years) for some three-parameter distributions with $\lambda_1 = 40$, $\tau = 0.3$, and $\tau_3 = 0.3$. Legend: LN3, P3, LP3, GEV, GP, GL.

Figure 6 Probability distributions in the L-kurtosis ($\tau_4$) vs. L-skewness ($\tau_3$) diagram. Legend: LN3, P3, LP3, GEV, GP, GL, lower bound; E, EV1, N.

for example, has sample L-moments $l_1 = 40$, $t = 0.3$ and $t_3 = 0.3$) using a plotting position on such a graph, one can have an idea of which distribution is better suited to the data. Since the method is subjective, it should always be supplemented by objective goodness-of-fit tests and/or by the application of model selection techniques. Objective procedures for the probabilistic model selection can be found in the specific literature. The subject was first proposed in the work of Akaike (1973), where the principle of maximum entropy was introduced as the theoretical basis for model selection, and by Schwarz (1978) who, by developing a similar idea in a Bayesian context, proposed the Bayesian information criterion for model selection. Applications of objective model selection techniques in statistical hydrology can be found in Strupczewski et al. (2002), Mitosek et al. (2006), Di Baldassarre et al. (2008), and Laio et al. (2009).
Perhaps the most common approach to the choice of the probabilistic model in hydrology is based on the use of L-moment plots, which are used to determine the probability distribution closest to the available sample of data (see, e.g., Hosking, 1990; Chowdhury et al., 1991; Stedinger et al., 1993; Hosking and Wallis, 1997; Peel et al., 2001). Figure 6 represents the distributions treated in this chapter on the L-moment ratio diagram $\tau_3$–$\tau_4$. Two-parameter distributions are shown as points while three-parameter distributions are represented as curves. Because the LP3 distribution has two shape parameters, it covers a two-dimensional area of the L-moment ratio diagram (see Griffis and Stedinger, 2007). The diagram is convenient for plotting sample at-site or regional average L-moment ratios for comparison with the population values. This approach, however, is also not fully objective, because the goodness-of-fit of a distribution to the data is often based only upon graphic judgment. Again, the choice of the distribution should be based on a goodness-of-fit test.
In the following some goodness-of-fit tests are described. The case in which p parameters of the distribution are estimated from the sample will be referred to as 'case p', in analogy to the Stephens (1986) use of 'case 0' for the case when the parameters are fully specified a priori. In case p, the distributions of the test statistics depend on the so-called null hypothesis $H_0$, that is, on the probability distribution that is being tested (e.g., Stephens, 1986). This means that the percentage points, that is, the $100(1 - \alpha)$ percentiles of the distributions of the test statistics ($\alpha$ is the significance level of the test), have to be recalculated for each $H_0$. The method of parameter estimation, the presence of a shape parameter, and the sample size also have an influence on percentage points, and this further complicates the analysis. There is thus the necessity to have a different table for each distribution, and tables of percentage values for some tests and distributional families are still lacking.
Some testing techniques commonly used in the hydrologic field are listed in the following, with reference to available tables of percentage points when applicable.

1. Tests of chi-squared type. The use of the classical Pearson test requires that the range of x is partitioned in classes; a convenient procedure to avoid arbitrariness and maximize the power of the test entails the choice of k equiprobable classes under the hypothesized distribution, with $k = 2n^{0.4}$ (Moore, 1986). The test statistic distribution in case 0 is the chi-squared distribution with $k - 1$ degrees of freedom. In case p the distribution is not completely known, since there is a partial recovery of degrees of freedom of the chi-squared distribution with respect to the commonly recommended value of $k - p - 1$ (p is the number of model parameters), when efficient estimators are used (e.g., Kendall and Stuart, 1977: 455; Moore, 1986). When using maximum likelihood to estimate model parameters, the critical points fall between those of $\chi^2(k - 1)$ and those of $\chi^2(k - p - 1)$, and not even this can be said when moments or L-moments estimators are employed.
2. Tests using the linearity of the probability plot for measuring the goodness of fit. A probability plot is a graph of the ranked observations $x_{(i)}$ versus an approximation of their expected value, $F^{-1}(1 - q_i)$, where $q_i$ is the plotting position, which

can be written as

$$q_i = \frac{i - a}{n + 1 - 2a}$$

where $a < 0.5$ is a coefficient (see Stedinger et al., 1993, table 18.3.1 for standard a values). Appropriate critical values for probability plot tests for the EV1 and normal distributions are tabled by Stedinger et al. (1993). No such tables exist for the GEV with the three parameters estimated from the sample. For the P3 distribution, the testing procedure is described by Vogel and McMartin (1991).
3. Tests based on the comparison of empirical and hypothetical L-moments ratios. The appropriate testing procedure and percentage points are found in Fill and Stedinger (1995) for the EV1 distribution, in Stedinger et al. (1993) for the normal distribution, and in Wang (1998) for the GEV distribution. The test is not available for the P3 distribution.
4. Tests based on the empirical distribution function (EDF). EDF tests are based on the comparison between the hypothetical distribution function and the empirical distribution function, $F_n(x)$, a cumulative probability distribution function that concentrates probability 1/n at each of the n values in a sample. The discrepancy between the two distributions can be measured either with statistics of the form $\max|F_n(x) - F_X(x)|$ (Kolmogorov–Smirnov (KS) test), or using quadratic statistics, $Q^2 = n\int_{\text{all } x} [F_n(x) - F_X(x)]^2\,\Psi(x)\,dF_X(x)$, where $\Psi(x)$ is a weight function. When $\Psi(x) = 1$, one has the Cramér–von Mises statistic, usually called $W^2$, which is a measure of the mean square difference between the empirical and hypothetical CDF; when $\Psi(x) = [F_X(x)(1 - F_X(x))]^{-1}$, the tails of the distribution are weighted more than the central part, and one has the Anderson–Darling statistic, called $A^2$. $W^2$ and $A^2$ are estimated in practice as (e.g., Stephens, 1986)

$$W^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left[F_X(x_{(i)}) - \frac{2i - 1}{2n}\right]^2 \qquad (32)$$

and

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}\left\{(2i - 1)\ln\left(F_X(x_{(i)})\right) + (2n + 1 - 2i)\ln\left(1 - F_X(x_{(i)})\right)\right\} \qquad (33)$$

respectively, where $x_{(i)}$ represents the ith element in the ordered sample.

2.18.4 IDF Curves

Intensity–duration–frequency (IDF) curves are probably one of the most commonly used tools in engineering practice. IDF curves are simple functions between the rainfall intensity i, the timescale k at which the rainfall process is studied, and the return period T (see Section 2.18.4.1 for definitions). Nevertheless, the concept of IDF curves is often misinterpreted, mainly because of the imprecise terms used in its definition. It will be apparent in the following analysis that both terms, 'duration' and 'frequency', are misleading.
Specifically, the term 'duration' is often misinterpreted as the actual time duration of a rainfall episode, while the term is meant for the time interval k, or else the timescale k, over which the rainfall process is averaged. For example, the actual duration of a rainfall episode may be only a fraction of the time interval k, or may be equal to many time intervals k.
In addition, the term 'frequency' traditionally is meant for the number of occurrences of a periodic event during a time unit, and is reciprocal to the period which is defined as the exact time between two successive occurrences of the event. Thus, this may falsely lead to the belief that the rainfall intensity value assigned to a return period T will occur once every T years. Of course, this is wrong, as the correct interpretation is that the rainfall intensity value i, assigned to a return period T, will on average be exceeded once every T years. This means that during a particular period of T years the value i may not be exceeded at all, or exceeded several times, and only on average will it be exceeded once every T years. Thus, frequency in the IDF curves, it could be said, refers to an 'average frequency'.
Given the above, a more correct term for IDF curves would be intensity–timescale–return period curves. Recently, the term 'ombrian curves' has been coined (Papalexiou and Koutsoyiannis, 2008) based on the ancient Greek word 'όμβρος' (pronounced ómbros) meaning rain (sent by the Olympian god Zeus). Nevertheless, as the prevailing term at the moment is IDF curves, this term will be used in the rest of the text.
It seems that Kuichling (1889) was the first to study rainfall in relation to timescale; however, IDF curves, on a basis that is still in use, were established by Bernard (1932). Since then, the importance of IDF curves in engineering necessitated the study and the construction of IDF curves in several parts of the world. For example, in the USA, the US Weather Bureau created a rainfall frequency atlas (Hershfield, 1961); in addition, the NOAA developed maps for the Western US (Miller, 1973) and the Eastern and Central US (Frederick, 1977). Similar maps were constructed for Sri Lanka (Baghirathan, 1978), Namibia (Pitman, 1980), areas of Brazil (Brasil Vieira and Zink de Souza, 1985), Australia (Canterford, 1986), Pennsylvania (Gert, 1987), India (Kothyari, 1992), and
Suitable tables of percentage points for the KS test in the many more. More recently, IDF relationships have been
p-case are found in Stephens (1986) for the EV1 and normal studied for Southeast Asia (Dairaku et al., 2004), Quebec in
distributions. For the GEV and GAM distributions with all the Canada (Mailhot et al., 2007), the Netherlands (Overeem
parameters estimated, the appropriate percentage points are et al., 2008), and for Denmark (Madsen et al., 2009).
instead not known. Percentage points in the p-case for EV1, IDF curves have a great variety of applications, as they are a
NORM, GAM, and GEV distributions can be calculated very convenient and useful tool used in the hydraulic design of
following the procedure described by Laio (2004) for the flood protection infrastructures and in flood risk management
Cramer–von Mises and Anderson–Darling tests. in general. They provide the basic input in models that convert
the rainfall to flood discharge, for example, they provide the rainfall rate for the so-called rational method. Essentially, their usefulness is in predicting the average rainfall intensity value, for a given timescale that depends on the infrastructure's characteristics, and for a given return period that depends on the infrastructure's importance and the desired level of safety.

2.18.4.1 Definition of IDF Curves and Clarifications

IDF curves are mathematical formulas that relate the rainfall intensity i with the timescale k and the return period T, that is, formulas that establish a one-to-one correspondence among rainfall intensity i, timescale k, and return period T.

To clarify, the rainfall intensity is, in general, a continuous time stochastic process, that is, at every time instant t, the rainfall intensity has a value i(t) that can be either zero or positive. Of course, the instantaneous rainfall intensity i(t) cannot be known; however, the rainfall depth can be measured at consecutive time intervals of duration k each, and thus, dividing the rainfall depth by the time duration k results in the average rainfall intensity time series. The time duration k over which the rainfall intensity is averaged is called the timescale k.

As in reality only the average rainfall intensity is used, and for the sake of brevity, the term rainfall intensity will be used instead of the term average rainfall intensity. The rainfall intensity at timescale k can be regarded as a random variable (r.v.), denoted as $I(k)$, following a probability distribution $F_{I(k)}(i)$, whereas the timescale k is a specified quantity and not an r.v. Moreover, it is well known that the return period T assigned to a value of an r.v. is defined as the average time needed for this value to be exceeded, and in a discrete time process it is explicitly related to the probability distribution F of the r.v. by

$$T = \frac{1}{1 - F} \qquad (34)$$

It is noted that T is expressed in the same time units as the timescale k of the discrete time process, for example, if the timescale k is 1 h then T is expressed in hours. Therefore, if the probability distribution $F_{I(k)}(i)$ is known, the rainfall intensity at timescale k and for a return period T can be estimated, given Equation (34), by the quantile function $Q_{I(k)}(T)$ of the distribution $F_{I(k)}(i)$:

$$i(k, T) = F_{I(k)}^{-1}\!\left(1 - \frac{1}{T}\right) = Q_{I(k)}(T) \qquad (35)$$

Nevertheless, the estimation of rainfall intensity for a given return period T and for an arbitrary timescale k within a desired interval, as is often demanded in engineering practice, would require knowledge of the distribution $F_{I(k)}(i)$ for every timescale k within this interval. This is hard to accomplish, if not impossible, as in reality the distribution $F_{I(k)}(i)$ can only be estimated for a few discrete timescales. The construction of IDF curves remedies this problem by using the few estimated distributions $F_{I(k)}(i)$ to establish a function that assigns a rainfall intensity value to any given timescale k and any return period T.

In the literature, there are several different techniques for constructing IDF curves that vary significantly. Regarding the starting series, or the historical samples used for the construction of IDF curves, some methods use annual maxima series (AMS) of rainfall intensity, that is, the annual maximum values of every timescale, and others use partial duration series (PDS), that is, the series of values above a threshold (for comparison see, e.g., Langbein, 1949; Cunnane, 1973; Takeuchi, 1984; Buishand, 1989; Madsen, 1997a, 1997b; Begueria, 2005; Ben-Zvi, 2009). Nevertheless, the use of AMS is by far more popular, as the AMS are usually readily available or can be easily prepared. In addition, the use of AMS offers computational simplicity (as will be apparent in the next sections), because it can be assumed that the probability distribution of the annual maximum rainfall intensity at each timescale $k_j$ belongs to the same family of distributions, that is, the extreme value distributions.

Methodologies for constructing IDF curves vary not only in the samples used, but also in the approaches on which they are based. For example, the classical empirical forms are presented in Chow et al. (1988: 459); a more general approach applied in the United States has been proposed by Chen (1983); general forms consistent with probability theory are given by Koutsoyiannis et al. (1998); forms in relation with L-moments by Hosking and Wallis (2005); approaches in relation with multifractals are given by Bendjoudi et al. (1997) and Veneziano et al. (2007); and in relation with copula functions by Singh and Zhang (2007).

Nevertheless, most forms of IDF curves can be combined in the following general expression:

$$i(k, T) = \frac{g(T)}{h(k)} \quad \left(i \text{ in mm h}^{-1};\ k \text{ in h};\ T \text{ in years}\right) \qquad (36)$$

where g(T) is a function of the return period T and h(k) is a function of the timescale k. Clearly, this expression implies a separable functional dependence of the rainfall intensity i on the return period T and on the timescale k. Even though the theoretical consistency of this assumption has been recently disputed, for moderate and large return periods it provides a close approximation sufficient for practical purposes (Papalexiou and Koutsoyiannis, 2008). Of course, the rainfall intensity in Equation (36) is a monotonically increasing function of the return period T, and a monotonically decreasing function of the timescale k.

2.18.4.2 Empirical Methods

Empirical forms of IDF curves, due to their long history, as they date back to 1932 (Bernard, 1932), are those mostly studied and used in practice, and they are still the most popular forms covered in existing textbooks (e.g., Chow et al., 1988: 459; Wanielista, 1990: 61; Shaw and Shaw, 1998: 228; Mays, 2004: 219). In general, compared to other forms of IDF curves, their expressions are characterized by simplicity, while their parameters are easy to estimate, at least for the simplest forms among them. The most commonly used empirical expression for the return period function is $g(T) = \alpha T^{\beta}$, while others have also been used. In addition, the timescale function can be found in many variations that, however, can all be
combined to the general expression $h(k) = (k^{\gamma} + \delta)^{\varepsilon}$ (for a comparison see, e.g., García-Bartual and Schneider, 2001; Di Baldassarre et al., 2006a). For convenience, the different variations of the return period function g(T) and the timescale function h(k) used in this text are distinctly named:

$$g(T):\quad g_1(T) = \alpha T^{\beta}; \quad g_2(T) = \alpha + \beta \ln T \qquad (37)$$

$$h(k):\quad h_1(k) = k^{\gamma}; \quad h_2(k) = k^{\gamma} + \delta; \quad h_3(k) = (k + \delta)^{\varepsilon}; \quad h_4(k) = (k^{\gamma} + \delta)^{\varepsilon} \qquad (38)$$

where $\alpha$, $\beta$, $\gamma$, $\delta$, and $\varepsilon$ are the parameters to be estimated. Of course, all the different variations of the g(T) and the h(k) functions may be used, thus resulting in several different empirical forms of IDF curves.

2.18.4.2.1 Parameter estimation
The typical estimation procedure of IDF curves' parameters, based on AMS (e.g., Chow et al., 1988: 459), can be summarized in three steps.

Step 1: In the first step, a suitable probability distribution is selected and fitted to each maximum rainfall intensity data set that comprises values of the same timescale $k_j$, where $j = 1, \ldots, m$, with m denoting the total number of different timescales for which data are available.

Clearly, the many distribution choices and the many available distribution fitting methods (e.g., the method of moments and L-moments, or the maximum likelihood and the least-squares error methods) may significantly affect the estimated parameters of the IDF curves.

Nevertheless, a natural choice for the probability distribution to be fitted, given that the groups of rainfall intensity values are annual maximum values, is one of the two maximum extreme value distributions, that is, the Gumbel distribution, given in Equation (39), or the GEV distribution, given in Equation (40) (with parameter $\theta_3 < 0$ in order to be unbounded from above), with the latter comprising the Gumbel distribution as a special case for $\theta_3 = 0$:

$$F_{I(k_j)}(i) = \exp\left\{ -\exp\left[ -\frac{i - \theta_1}{\theta_2} \right] \right\} \quad (\theta_1 \in \mathbb{R},\ \theta_2 > 0) \qquad (39)$$

$$F_{I(k_j)}(i) = \exp\left\{ -\left[ 1 - \theta_3\, \frac{i - \theta_1}{\theta_2} \right]^{1/\theta_3} \right\} \quad (\theta_1 \in \mathbb{R},\ \theta_2 > 0,\ \theta_3 < 0) \qquad (40)$$

where the symbol $I(k_j)$ stands for the annual maximum rainfall intensity at timescale $k_j$. Yet, it should be noted that apart from the Gumbel and GEV distributions, other distributions have also been used to describe annual maxima, for example, the log-Pearson III and lognormal distributions.

Although the Gumbel distribution has been the traditional choice for describing maxima, as it is a parsimonious model and often gives good results, new evidence suggests (Gellens, 2002; Ramesh and Davison, 2002; Koutsoyiannis, 2004a, 2004b; Salvadori and De Michele, 2006) that the Gumbel distribution may seriously underestimate the rainfall intensity for large return periods, and thus its use should be avoided. Alternatively, the GEV distribution, which has gained popularity in the last decade, can be used, taking special care in the estimation of the parameter $\theta_3$. In particular, as typical rainfall samples are usually small, the estimate of the parameter $\theta_3$ may be highly uncertain. In order to remedy this, Koutsoyiannis (2004a, 2004b) proposed to adopt a global value for $\theta_3$, that is, $\theta_3 = -0.15$, as this value resulted from studying many rainfall samples from stations all over the world.

Step 2: In the second step, a set of p characteristic return period values $\{T_1, \ldots, T_l, \ldots, T_p\}$ is defined (e.g., $\{2, 5, 10, 20, 50, \ldots, T_p\}$) and the m fitted distributions from step 1 are used to evaluate the rainfall intensity for the selected return periods and for each of the m timescales $k_j$. This procedure results in a set comprising $m \times p$ points of the form $(i_{j,l}, k_j, T_l)$. The maximum rainfall intensities $i_{j,l}$ can be evaluated using the quantile functions $Q_{I(k_j)}(T_l)$ of the fitted distributions, expressed in relation with the return period T. For the Gumbel and GEV distributions, the quantile functions are given, respectively, by

$$i_{j,l} = Q_{I(k_j)}(T_l) = \theta_1 - \theta_2 \ln\left[ -\ln\left( 1 - \frac{1}{T_l} \right) \right] \qquad (41a)$$

$$i_{j,l} = Q_{I(k_j)}(T_l) = \theta_1 + \frac{\theta_2}{\theta_3} \left\{ 1 - \left[ -\ln\left( 1 - \frac{1}{T_l} \right) \right]^{\theta_3} \right\} \qquad (41b)$$

Step 3: In this final step the parameters of the selected form of IDF curves are estimated.

The parameters of the simplest, and one of the most commonly used, empirical forms of IDF curves, that is, $i(k,T) = g_1(T)/h_1(k)$, can be estimated analytically using the method of multiple linear regression. Taking logarithms of this simple form results in

$$\ln i(k, T) = \ln \alpha + \beta \ln T - \gamma \ln k \qquad (42)$$

which is of the form $y = \xi_0 + \xi_1 x_1 + \xi_2 x_2$, and consequently, the parameters $\alpha$, $\beta$, and $\gamma$ can be estimated by performing a multiple linear regression using the set of $(\ln i_{j,l}, \ln k_j, \ln T_l)$ points evaluated in step 2. The rainfall intensity logarithm $\ln i$ is considered as the dependent variable y, whereas the return period logarithm $\ln T$ and the timescale logarithm $\ln k$ are the independent variables $x_1$ and $x_2$, respectively. The parameters $\alpha$, $\beta$, and $\gamma$ result straightforwardly from the estimated multiple linear regression coefficients $\xi_0$, $\xi_1$, and $\xi_2$, that is, $\alpha = \exp(\xi_0)$, $\beta = \xi_1$, and $\gamma = -\xi_2$.

This technique, however, is not directly applicable in the case of more general forms of IDF curves. Specifically, taking logarithms of $i(k,T) = \alpha T^{\beta}/(k^{\gamma} + \delta)^{\varepsilon}$ results in

$$\ln i(k, T) = \beta \ln T - \varepsilon \ln\left( k^{\gamma} + \delta \right) + \ln \alpha \qquad (43)$$

which is not of the form $y = \xi_0 + \xi_1 x_1 + \xi_2 x_2$. Nevertheless, inspection of Equation (43) suggests that if the timescale function is $h(k) = h_3(k)$, then the term $\ln(k^{\gamma} + \delta)$ in the equation becomes $\ln(k + \delta)$, and thus, multiple regression can be performed by assuming given values of $\delta$. The estimated parameters $\alpha$, $\beta$, and $\varepsilon$ should be those obtained for the value of $\delta$ that minimizes a proper norm between the values of the set of points $(\ln i_{j,l}, \ln k_j, \ln T_l)$ and the ones estimated by the selected form of IDF
curves. Obviously, if the return period function is $g(T) = g_2(T)$, this methodology is not applicable.

In addition, one global way of fitting a function, linear or nonlinear, to a given data set of values, and thus applicable in the parameter estimation of IDF curves, is to minimize the mean square error (MSE) between the values of the given data set and the corresponding values calculated from the function to be fitted. Alternatively, in cases where there are large differences between the values of the given data set, instead of minimizing the MSE, it may be more suitable to minimize the logarithmic SE (log SE). In the case studied here, the MSE and the log SE are given, respectively, by

$$\mathrm{MSE} = \frac{1}{mp} \sum_{j=1}^{m} \sum_{l=1}^{p} \left[ i(k_j, T_l) - Q_{I(k_j)}(T_l) \right]^2 \qquad (44)$$

$$\log \mathrm{SE} = \sum_{j=1}^{m} \sum_{l=1}^{p} \log^2 \left[ \frac{i(k_j, T_l)}{Q_{I(k_j)}(T_l)} \right] \qquad (45)$$

where $i(k_j, T_l)$ is the rainfall intensity value calculated from the selected form of IDF curves, for example, one of the forms resulting from the combinations given in Equations (37) and (38), and $Q_{I(k_j)}(T_l)$ is the quantile function of the distribution fitted in step 1, for the rainfall intensity of timescale $k_j$ and of return period $T_l$.

2.18.4.2.2 Application in a real-world data set
In this section, the methodologies described in Section 2.18.4.2.1 are applied to a real-world data set of rainfall intensities recorded from 1987 to 2004 at the station Ardeemore in the UK. The data set was originally available, from the British Atmospheric Data Centre (BADC), as tipping bucket measurements that were first converted to the 5-min temporal resolution and second, aggregated over several timescales; third, the annual maximum values of each timescale were extracted (British Atmospheric Data Centre, 2006). The summary statistics of the resulting data set of maximum rainfall intensities at several different timescales are presented in Table 6.

As described in Section 2.18.4.2.1, the typical parameter estimation procedure of IDF curves begins with selecting and fitting a theoretical probability distribution to the same timescale data. In this application, for demonstration and comparison, both the Gumbel and the GEV distributions are fitted to the data. In addition, the L-moments method (Hosking, 1990) was selected as a fitting method, as it is robust and easy to apply; especially for the Gumbel and the GEV distributions, it results in analytical equations. The results are presented in Table 7.

The distributions fitted to the empirical data, and the empirical distribution functions according to the Weibull plotting position, are depicted in Figure 7. Clearly, both distributions perform very well up to return period values approximately equal to 20 years. As expected, the difference in estimated rainfall intensity between the two distributions increases monotonically with the return period, and for return period values higher than 100 years, the difference becomes significant, with the GEV distribution resulting in higher rainfall intensity estimates.

Both the Gumbel and the GEV distributions fit equally well to the empirical points (see Figure 7); however, the Gumbel distribution may underestimate the rainfall intensity for high return periods (see step 1 of Section 2.18.4.2.1), and thus, the GEV distribution is preferred to generate the set of $(i_{j,l}, k_j, T_l)$ points described in step 2 of Section 2.18.4.2.1. This argument is reinforced by noticing in Figure 7 that the empirical return period of the highest historical value in the smaller timescales is disproportionately small compared to the one resulting from the Gumbel distribution. For example, the empirical return period of the largest value in the 5 min timescale is 19 years, and while the GEV distribution assigns to that value a theoretical return period approximately equal to 80 years, the corresponding value by the Gumbel distribution is about 180 years.

The selected characteristic return periods $T_l$ are {2, 5, 10, 20, 50, 100, 200, 500, 1000} years, a total of p = 9 values. It is noted that due to the small recorded sample (18 years), the rainfall intensity estimates in high return periods will be uncertain. Furthermore, the number of the selected timescales $k_j$ is m = 11 (see Table 6), and therefore, a set of $m \times p = 99$ points of the form $(i_{j,l}, k_j, T_l)$ is generated. The rainfall intensity values $i_{j,l}$ are calculated using the quantile function of the GEV distribution, given in Equation (41b), for the characteristic return periods $T_l$, with the estimated parameters that correspond to the timescale $k_j$ (see Table 7). The set of generated points is depicted in Figure 8.

Table 6 Summary statistics of maximum rainfall intensity data (mm h−1) at several timescales observed at Ardeemore

Timescale k_j   Sample size   Mean    Standard deviation   Variation coefficient   Skewness coefficient   Minimum   Maximum
5 min           18            44.03   20.77                0.47                    2.04                   20.30     112.83
10 min          18            33.07   19.14                0.58                    2.86                   16.20     103.22
20 min          18            23.00   10.03                0.44                    1.52                   11.35     52.97
30 min          18            17.49   6.62                 0.38                    1.14                   8.27      35.50
60 min          18            11.23   3.64                 0.32                    0.61                   4.90      20.40
2 h             18            7.89    3.26                 0.41                    1.86                   4.19      18.10
3 h             18            5.99    2.05                 0.34                    2.04                   3.13      12.60
6 h             18            4.22    0.91                 0.22                    0.58                   2.70      6.30
12 h            18            2.85    0.70                 0.25                    0.03                   1.50      4.07
24 h            18            1.82    0.40                 0.22                    −0.09                  1.00      2.61
48 h            18            1.25    0.37                 0.30                    0.94                   0.72      2.17
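
The return periods quoted above for the largest 5 min value can be checked with a short calculation. The following is a minimal sketch, assuming the distribution functions of Equations (39) and (40), the return period definition of Equation (34), the Weibull plotting position for the empirical value, and the 5 min parameters listed in Table 7; the function names are illustrative and not taken from the source.

```python
import math

def gumbel_cdf(i, theta1, theta2):
    """Gumbel distribution function, Equation (39)."""
    return math.exp(-math.exp(-(i - theta1) / theta2))

def gev_cdf(i, theta1, theta2, theta3):
    """GEV distribution function, Equation (40), valid for theta3 != 0."""
    return math.exp(-(1.0 - theta3 * (i - theta1) / theta2) ** (1.0 / theta3))

def return_period(F):
    """Return period from the non-exceedance probability, Equation (34)."""
    return 1.0 / (1.0 - F)

def gev_quantile(T, theta1, theta2, theta3):
    """GEV quantile expressed through the return period, Equation (41b)."""
    return theta1 + theta2 / theta3 * (1.0 - (-math.log(1.0 - 1.0 / T)) ** theta3)

# 5 min timescale: largest observation and sample size from Table 6,
# fitted parameters from Table 7 (theta3 = -0.15 fixed a priori).
i_max, n = 112.83, 18
T_emp = (n + 1) / 1                      # Weibull plotting position of the largest value
T_gumbel = return_period(gumbel_cdf(i_max, 35.47, 14.84))
T_gev = return_period(gev_cdf(i_max, 34.54, 12.66, -0.15))

print(f"empirical: {T_emp:.0f} yr, GEV: {T_gev:.0f} yr, Gumbel: {T_gumbel:.0f} yr")
```

Under these assumptions the GEV return period comes out close to 80 years and the Gumbel one close to 180 years, in line with the values quoted in the text.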
Table 7 Estimated sample L-moments of the maximum rainfall intensity in 11 different timescales and the corresponding estimated parameters of the fitted Gumbel and GEV distributions

            Sample L-moments    Gumbel parameters    GEV parameters (a)
Timescale   λ1       λ2         θ1       θ2          θ1       θ2
5 min       44.03    10.29      35.47    14.84       34.54    12.66
10 min      33.07    8.25       26.21    11.90       25.46    10.15
20 min      23.00    5.30       18.59    7.65        18.11    6.53
30 min      17.49    3.62       14.47    5.23        14.14    4.46
60 min      11.23    2.06       9.51     2.97        9.32     2.54
2 h         7.89     1.63       6.53     2.35        6.38     2.01
3 h         5.99     0.97       5.18     1.40        5.09     1.19
6 h         4.22     0.51       3.79     0.74        3.75     0.63
12 h        2.85     0.41       2.51     0.59        2.47     0.50
24 h        1.82     0.23       1.63     0.33        1.61     0.28
48 h        1.25     0.21       1.08     0.30        1.06     0.25

(a) Parameters estimated setting a priori θ3 = −0.15 in all timescales.
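
The analytical L-moment equations mentioned in the text are not reproduced in this section. As an illustration only, the sketch below assumes the standard L-moment relations of Hosking (1990) for the Gumbel distribution and for the GEV distribution with a prescribed shape parameter, written in the parameterization of Equations (39) and (40); the function names are hypothetical. Applied to the 5 min sample L-moments, it should land close to the first row of Table 7.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_from_lmoments(l1, l2):
    """Gumbel location/scale from the first two L-moments (assumed standard relations)."""
    theta2 = l2 / math.log(2.0)
    theta1 = l1 - EULER_GAMMA * theta2
    return theta1, theta2

def gev_from_lmoments_fixed_shape(l1, l2, theta3=-0.15):
    """GEV location/scale from the first two L-moments when theta3 is fixed a priori."""
    g = math.gamma(1.0 + theta3)
    theta2 = l2 * theta3 / ((1.0 - 2.0 ** (-theta3)) * g)
    theta1 = l1 - theta2 * (1.0 - g) / theta3
    return theta1, theta2

# 5 min timescale, sample L-moments from Table 7.
gum = gumbel_from_lmoments(44.03, 10.29)
gev = gev_from_lmoments_fixed_shape(44.03, 10.29)
print("Gumbel theta1, theta2:", round(gum[0], 2), round(gum[1], 2))
print("GEV    theta1, theta2:", round(gev[0], 2), round(gev[1], 2))
```

Small differences in the last digit relative to Table 7 are to be expected, because the tabulated L-moments are themselves rounded.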

Figure 7 Empirical distribution functions (dots) according to the Weibull plotting position, and the Gumbel (dashed lines) and GEV (solid lines) distributions fitted by the method of L-moments, for the data in the timescales given in Table 6, ranging from 5 min to 48 h (from above to below). [Plot: rainfall intensity, i (mm h−1), against return period, T (years); logarithmic axes.]

Figure 8 IDF curves for the Ardeemore station in UK constructed using the typical parameter estimation procedure, and the set of rainfall intensity points generated from the GEV distribution fitted to the empirical data. [Plot: rainfall intensity, i (mm h−1), against timescale, k (5 min to 48 h), for T = 2, 5, 10, 20, 50, 100, 200, 500, and 1000 yr; fitted form i(k,T) = 10.12 T^0.212 / k^0.615.]


Finally, the parameters of the simplest form of IDF curves (i.e., $i(k,T) = \alpha T^{\beta}/k^{\gamma}$) are estimated (see step 3 in Section 2.18.4.2.1) by performing a multiple linear regression on the set of $(\ln i_{j,l}, \ln k_j, \ln T_l)$ points. The estimated multiple linear regression coefficients are $\xi_0 = 2.315$, $\xi_1 = 0.211$, and $\xi_2 = -0.615$, and consequently, the resulting parameters of the IDF curves are $\alpha = 10.12$, $\beta = 0.211$, and $\gamma = 0.615$. The resulting IDF curves are depicted in Figure 8.

In addition to the previous simple form of IDF curves, some more complicated empirical forms were constructed for the Ardeemore station data set, in order to demonstrate and compare their performance, by numerically minimizing the log SE given in Equation (45) (see step 3 in Section 2.18.4.2.1). The log SE minimization was performed using one of the many software packages that include numerical minimization routines. It is worth noting that estimating the parameters of the simple $i(k,T) = \alpha T^{\beta}/k^{\gamma}$ by minimizing the log SE is actually the same as performing the multilinear regression method. The fitted IDF curves, the estimated parameters, and the resulting log SE are presented in Table 8.

As expected, the additional parameters result in smaller log SE, and thus, in a better fit. Nevertheless, especially in the particular case studied here, the difference in log SE among the different forms of IDF curves is not substantial, and therefore, according to the principle of parsimony, a more parsimonious form should be preferred over the most general five-parameter case. The last argument is also supported by Figure 9, where the more general IDF curves of Table 8 are depicted. Clearly, the IDF curves with return period function $g_1(T)$ and the three-parameter timescale function $h_4(k)$, compared with the ones with the two-parameter functions $h_2(k)$ and $h_3(k)$, are not significantly different.
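
Steps 2 and 3 for the simple form can be sketched as follows. This is a minimal illustration, assuming the GEV parameters of Table 7 with θ3 = −0.15, timescales expressed in hours, and an ordinary least-squares fit of Equation (42) solved through the normal equations; the helper names are hypothetical and the original study may have used different software.

```python
import math

# GEV parameters per timescale from Table 7 (theta3 fixed at -0.15); k in hours.
K_HOURS = [5 / 60, 10 / 60, 20 / 60, 30 / 60, 1, 2, 3, 6, 12, 24, 48]
THETA1 = [34.54, 25.46, 18.11, 14.14, 9.32, 6.38, 5.09, 3.75, 2.47, 1.61, 1.06]
THETA2 = [12.66, 10.15, 6.53, 4.46, 2.54, 2.01, 1.19, 0.63, 0.50, 0.28, 0.25]
THETA3 = -0.15
T_YEARS = [2, 5, 10, 20, 50, 100, 200, 500, 1000]

def gev_quantile(T, t1, t2, t3):
    """GEV quantile for return period T, Equation (41b)."""
    return t1 + t2 / t3 * (1.0 - (-math.log(1.0 - 1.0 / T)) ** t3)

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

# Step 2: generate the m x p points; step 3: regress ln i on (1, ln T, ln k).
rows, y = [], []
for k, t1, t2 in zip(K_HOURS, THETA1, THETA2):
    for T in T_YEARS:
        rows.append([1.0, math.log(T), math.log(k)])
        y.append(math.log(gev_quantile(T, t1, t2, THETA3)))

XtX = [[sum(r[a] * r[c] for r in rows) for c in range(3)] for a in range(3)]
Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(3)]
xi0, xi1, xi2 = solve3(XtX, Xty)
alpha, beta, gamma = math.exp(xi0), xi1, -xi2

print(f"points: {len(rows)}, alpha = {alpha:.2f}, beta = {beta:.3f}, gamma = {gamma:.3f}")
```

With the rounded parameters of Table 7 as input, the estimates should fall close to the values reported in the text for the simple form.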
Table 8 Five different empirical forms of IDF curves fitted by minimizing the logarithmic square error

IDF curves i(k,T)             Estimated parameters                           Log SE
                              α        β        γ        δ        ε
αT^β / k^γ                    10.12    0.212    0.615                        0.99
αT^β / (k^γ + δ)              10.47    0.212    0.627    0.020               0.97
αT^β / (k + δ)^ε              10.42    0.212             0.015    0.626      0.95
αT^β / (k^γ + δ)^ε            10.39    0.212    1.810    0.005    0.347      0.91
(α + β ln T) / (k^γ + δ)^ε    7.33     4.66     3.258    0.0003   0.192      0.92
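
For the form with timescale function h3(k), Section 2.18.4.2.1 notes that the regression can be repeated for assumed values of δ. A minimal sketch of that idea is given below, assuming the same generated points as in step 2 (GEV parameters of Table 7, k in hours), a simple grid of candidate δ values, and the sum of squared natural-log residuals as the norm. The grid, its range, and the helper names are assumptions made for illustration; this is a grid search and not the numerical minimization routine used in the original study.

```python
import math

K_HOURS = [5 / 60, 10 / 60, 20 / 60, 30 / 60, 1, 2, 3, 6, 12, 24, 48]
THETA1 = [34.54, 25.46, 18.11, 14.14, 9.32, 6.38, 5.09, 3.75, 2.47, 1.61, 1.06]
THETA2 = [12.66, 10.15, 6.53, 4.46, 2.54, 2.01, 1.19, 0.63, 0.50, 0.28, 0.25]
THETA3 = -0.15
T_YEARS = [2, 5, 10, 20, 50, 100, 200, 500, 1000]

def gev_quantile(T, t1, t2, t3):
    """GEV quantile for return period T, Equation (41b)."""
    return t1 + t2 / t3 * (1.0 - (-math.log(1.0 - 1.0 / T)) ** t3)

# (k, T, i) points of step 2.
POINTS = [(k, T, gev_quantile(T, t1, t2, THETA3))
          for k, t1, t2 in zip(K_HOURS, THETA1, THETA2) for T in T_YEARS]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_given_delta(delta):
    """Regress ln i on (1, ln T, ln(k + delta)); return (sse, alpha, beta, eps)."""
    rows = [[1.0, math.log(T), math.log(k + delta)] for k, T, _ in POINTS]
    y = [math.log(i) for _, _, i in POINTS]
    XtX = [[sum(r[a] * r[c] for r in rows) for c in range(3)] for a in range(3)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(3)]
    c0, c1, c2 = solve(XtX, Xty)
    sse = sum((yi - (c0 + c1 * r[1] + c2 * r[2])) ** 2 for r, yi in zip(rows, y))
    return sse, math.exp(c0), c1, -c2

# Hypothetical grid of candidate delta values: 0.000, 0.001, ..., 0.100 h.
DELTAS = [d / 1000.0 for d in range(0, 101)]
sse, alpha, beta, eps, delta = min(
    (fit_given_delta(d) + (d,) for d in DELTAS), key=lambda t: t[0])

print(f"delta = {delta:.3f}, alpha = {alpha:.2f}, beta = {beta:.3f}, "
      f"eps = {eps:.3f}, SSE = {sse:.3f}")
```

Because δ = 0 is part of the grid, the selected fit can never be worse than the simple form in this norm. The resolution of the grid limits how closely δ can be resolved, so a finer grid or a one-dimensional minimizer could be substituted.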

Figure 9 Four different empirical forms of IDF curves constructed for the Ardeemore station data set using the typical parameter estimation procedure. [Four panels, each plotting rainfall intensity, i (mm h−1), against timescale, k (5 min to 48 h), for T = 2, 5, 10, 20, 50, 100, 200, 500, and 1000 yr. Panel formulas as printed: i(k,T) = 10.47 T^0.212 / (k^0.627 + 0.02); i(k,T) = 10.42 T^0.212 / (k + 0.015)^0.626; i(k,T) = 10.39 T^0.212 / (k^1.81 + 0.005)^0.347; i(k,T) = (7.33 + 4.66 ln T) / (k^3.258 + 0.005)^0.192.]
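
To see how close the fitted forms are to each other, the five expressions of Table 8 can simply be evaluated at a common timescale and return period. The sketch below assumes the parameter values of Table 8, with k in hours and T in years; the dictionary keys and function names are illustrative labels only.

```python
import math

def idf_power(k, T, a, b, g=1.0, d=0.0, e=1.0):
    """i(k,T) = a * T**b / (k**g + d)**e, covering h1 to h4 with g1(T)."""
    return a * T ** b / (k ** g + d) ** e

def idf_log(k, T, a, b, g, d, e):
    """i(k,T) = (a + b ln T) / (k**g + d)**e, that is, g2(T) with h4(k)."""
    return (a + b * math.log(T)) / (k ** g + d) ** e

# Parameter values taken from Table 8.
FORMS = {
    "g1/h1": lambda k, T: idf_power(k, T, 10.12, 0.212, g=0.615),
    "g1/h2": lambda k, T: idf_power(k, T, 10.47, 0.212, g=0.627, d=0.020),
    "g1/h3": lambda k, T: idf_power(k, T, 10.42, 0.212, d=0.015, e=0.626),
    "g1/h4": lambda k, T: idf_power(k, T, 10.39, 0.212, g=1.810, d=0.005, e=0.347),
    "g2/h4": lambda k, T: idf_log(k, T, 7.33, 4.66, 3.258, 0.0003, 0.192),
}

for name, form in FORMS.items():
    one_hour = form(1.0, 10)        # k = 1 h, T = 10 yr
    five_min = form(5 / 60, 10)     # k = 5 min, T = 10 yr
    print(f"{name}: i(1 h, 10 yr) = {one_hour:.2f}, i(5 min, 10 yr) = {five_min:.2f} mm/h")
```

At k = 1 h the forms with the power-law return period function differ only slightly, whereas at k = 5 min the forms that include δ give somewhat lower intensities than the simple form, which is the curvature at small timescales visible in Figure 9.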

Moreover, it is important to note that all IDF curves shown in Figure 9, compared to the simple form depicted in Figure 8, exhibit a slight curvature in small timescales, which is more apparent in the IDF curves with the timescale function $h_4(k)$. This is the effect of the parameter $\delta$, a very important parameter that allows a better fit of the IDF curves in small timescales, or equivalently in the high rainfall intensities. Although in this particular case the resulting curvature is very slight, and thus not important, this is not the general rule, as for a different data set this curvature may be very strong and
Table 9 Theoretically consistent forms of IDF curves fitted by two different methods for the Ardeemore station data set

IDF curves i(k,T), by column:
(1) Q_GEV(T) / (k^γ + δ)^ε   (2) Q_GEV(T) / (k + δ)^ε   (3) Q_GEV(T) / (k + δ)^ε (a)   (4) Q_Gumb(T) / (k^γ + δ)^ε   (5) Q_Gumb(T) / (k + δ)^ε

Method                          Parameter   (1)       (2)      (3)      (4)       (5)
Two-step robust estimation      θ1          9.92      10.02    9.95     10.04     10.14
                                θ2          2.82      2.88     2.67     3.09      3.14
                                θ3          −0.09     −0.09    −0.15
                                γ           2.298                       2.298
                                δ           0.002     0.017    0.017    0.002     0.017
                                ε           0.255     0.593    0.593    0.255     0.593
                                K_KW        12.08     13.38    13.38    12.08     13.38
One-step log SE minimization    θ1          9.65      9.83     9.85     9.85      10.04
                                θ2          3.02      3.07     3.10     3.28      3.36
                                θ3          −0.16     −0.16    −0.15
                                γ           3.089                       3.089
                                δ           0.0004    0.022    0.022    0.0004    0.022
                                ε           0.185     0.577    0.577    0.185     0.577
                                log SE      2.13      2.16     2.16     2.26      2.29

(a) The parameters were estimated by setting a priori θ3 = −0.15.
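
The K_KW rows of Table 9 refer to the Kruskal–Wallis statistic of Equation (51), introduced in Section 2.18.4.3.1. As a minimal sketch of how that statistic could be evaluated for a trial set of timescale-function parameters, the code below assumes that tied values receive their average rank and that the groups are rescaled with h4(k); the function names and the toy data are hypothetical and serve only to exercise the formula.

```python
def kruskal_wallis(groups):
    """K_KW of Equation (51); tied values receive their average rank."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    N = len(pooled)
    rank_sum = [0.0] * len(groups)
    pos = 0
    while pos < N:
        end = pos
        while end + 1 < N and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2.0 + 1.0      # ranks are 1-based
        for q in range(pos, end + 1):
            rank_sum[pooled[q][1]] += avg_rank
        pos = end + 1
    total = 0.0
    for grp, rs in zip(groups, rank_sum):
        n_j = len(grp)
        total += n_j * (rs / n_j - (N + 1) / 2.0) ** 2
    return 12.0 / (N * (N + 1)) * total

def rescaled_groups(groups, timescales, gamma, delta, eps):
    """Multiply each timescale group by h4(k) = (k**gamma + delta)**eps."""
    return [[(k ** gamma + delta) ** eps * v for v in grp]
            for grp, k in zip(groups, timescales)]

# Toy data (not from the source): well separated groups give a large statistic,
# interleaved groups a small one.
print(kruskal_wallis([[1, 2, 3], [4, 5, 6]]))
print(kruskal_wallis([[1, 4, 5], [2, 3, 6]]))
```

In the two-step method, a numerical optimizer would vary γ, δ, and ε, call `rescaled_groups` on the annual maxima of all timescales, and keep the parameter set that gives the smallest value of `kruskal_wallis`.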

thus essential in engineering practice. Consequently, it is proposed that the selected form of IDF curves should include this parameter.

2.18.4.3 Theoretically Consistent Methods
Apart from the classical empirical forms of IDF curves, described in Section 2.18.4.2, there are forms of IDF curves, based on the general Equation (36), that are theoretically more consistent. Koutsoyiannis et al. (1998) proposed that empirically derived return period functions g(T) are unnecessary, as g(T) can be determined from the probability distribution function of the maximum rainfall intensity $I(k)$. Specifically, this method is based on the fact that the probability distribution $F_I(i)$ of the r.v. $I = I(k)h(k)$ is just a scaled version of the distribution of the r.v. $I(k)$, as the function h(k) for a certain timescale k is just a real number. Thus, instead of estimating the probability distribution $F_{I(k)}(i)$ of the r.v. $I(k)$ for several timescales, only the distribution of the r.v. I should be estimated. Consequently, the form of IDF curves would be

$$i(k, T) = \frac{F_I^{-1}(1 - 1/T)}{h(k)} = \frac{Q_I(T)}{h(k)} \qquad (46)$$

where $Q_I(T)$ is the quantile function of the r.v. I, and not some empirically proposed function. Using the extreme value distributions in this framework, as they are the natural choice for describing maxima (although many other distributions have been used), the forms of IDF curves, according to Equation (46) and for the general three-parameter timescale function $h_4(k)$, for the Gumbel and GEV distributions, respectively, become

$$i(k, T) = \frac{Q_{\mathrm{Gumb}}(T)}{h_4(k)} = \frac{\theta_1 - \theta_2 \ln\left[ -\ln\left( 1 - \dfrac{1}{T} \right) \right]}{(k^{\gamma} + \delta)^{\varepsilon}} \qquad (47)$$

$$i(k, T) = \frac{Q_{\mathrm{GEV}}(T)}{h_4(k)} = \frac{\theta_1 + \dfrac{\theta_2}{\theta_3} \left\{ 1 - \left[ -\ln\left( 1 - \dfrac{1}{T} \right) \right]^{\theta_3} \right\}}{(k^{\gamma} + \delta)^{\varepsilon}} \qquad (48)$$

where the symbols $Q_{\mathrm{Gumb}}(T)$ and $Q_{\mathrm{GEV}}(T)$ denote the quantile functions of the Gumbel and GEV distributions, respectively.

Nevertheless, in this theoretical framework, the return period functions in the empirical forms of IDF curves actually correspond to theoretical probability distributions. Specifically, the return period function $g_1(T) = \alpha T^{\beta}$ corresponds to

$$\alpha T^{\beta} = Q_I(T) \;\Leftrightarrow\; \alpha \left( \frac{1}{1 - F_I(i)} \right)^{\beta} = i \;\Leftrightarrow\; F_I(i) = 1 - \left( \frac{i}{\alpha} \right)^{-1/\beta} \qquad (49)$$

which is the celebrated two-parameter Pareto distribution with parameters $\alpha > 0$, $\beta > 0$ and support $i \in [\alpha, \infty)$. In addition, the return period function $g_2(T) = \alpha + \beta \ln T$ corresponds to

$$\alpha + \beta \ln T = Q_I(T) \;\Leftrightarrow\; \alpha + \beta \ln \frac{1}{1 - F_I(i)} = i \;\Leftrightarrow\; F_I(i) = 1 - \exp\left( -\frac{i - \alpha}{\beta} \right) \qquad (50)$$

which is the celebrated two-parameter exponential distribution with parameters $\alpha \in \mathbb{R}$, $\beta > 0$, and support $i \in [\alpha, \infty)$.

2.18.4.3.1 Parameter estimation
Two-step robust estimation method. This method (Koutsoyiannis et al., 1998) estimates the parameters of the IDF curves in two steps. First, it estimates the parameters of the scale function h(k), and second, the parameters of the return period function g(T), which, in this framework, is the quantile
function of a probability distribution. The method is based on the fact that the r.v.'s $I_j = I(k_j)h(k_j)$ should be identically distributed.

Given the above, in the first step, multiplying the values of each timescale $k_j$ group, denoted $\{i_{j,1}, \ldots, i_{j,n_j}\}$, by the value $h(k_j)$ should result in groups from the same population. The function h(k), however, is not known a priori. Consequently, the method assumes a set of values for the h(k) parameters, and subsequently uses an appropriate statistic to check that the resulting timescale groups (i.e., $\{h(k_j)\, i_{j,1}, \ldots, h(k_j)\, i_{j,n_j}\}$) indeed belong to the same population. This naturally leads to the Kruskal–Wallis test (Kruskal and Wallis, 1952), which is a nonparametric test applied to infer whether or not different groups of values belong to the same population. The test statistic $K_{KW}$ is given by

$$K_{KW} = \frac{12}{N(N + 1)} \sum_{j=1}^{m} n_j \left( \bar{r}_j - \frac{N + 1}{2} \right)^2, \qquad \bar{r}_j = \frac{1}{n_j} \sum_{l=1}^{n_j} r_{j,l} \qquad (51)$$

where m is the total number of timescales, $n_j$ is the sample size of the timescale $k_j$ group, N is the total sample size across all groups, $\bar{r}_j$ the average rank of the timescale $k_j$ group, and $r_{j,l}$ the rank (among all data) of the lth data value of the timescale $k_j$ group.

Clearly, different groups from the same population would result in a small value of the $K_{KW}$ statistic. Therefore, the estimated parameters of the scale function h(k) are those that minimize the $K_{KW}$ statistic. Essentially, minimizing the $K_{KW}$ statistic forces the different groups of data to belong to the same population. This minimization can only be accomplished numerically, but numerical optimization can now be routinely performed with widely available software packages.

Once the parameters of the timescale function h(k) are estimated, it is straightforward to estimate the parameters of the return period function g(T). Specifically, the values of all the resulting groups $\{h(k_j)\, i_{j,1}, \ldots, h(k_j)\, i_{j,n_j}\}$ are unified in one sample (which, at least theoretically, should belong to the same population) and the probability distribution that corresponds to the return period function g(T) of the selected form of IDF curve is fitted to this unified sample. The estimated parameters of the fitted distribution are the parameters of the return period function g(T).

One-step least-squares estimation method. The basic difference of this method compared to the least-squares method presented in Section 2.18.4.2.1 is that it uses historical data (Koutsoyiannis et al., 1998). As a result, first, it avoids the procedure of fitting distributions to each timescale $k_j$ group

position, by

$$T_{j,l} = \frac{n_j + 1}{l}, \qquad l = 1, \ldots, n_j \qquad (52)$$

where $n_j$ is the sample size of the timescale $k_j$ group.

Consequently, the MSE and log SE given in Equations (44) and (45), respectively, are modified to

$$\mathrm{MSE} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{l=1}^{n_j} \left[ i(k_j, T_{j,l}) - i_{j,l} \right]^2 \qquad (53)$$

$$\log \mathrm{SE} = \sum_{j=1}^{m} \sum_{l=1}^{n_j} \log^2 \left[ \frac{i(k_j, T_{j,l})}{i_{j,l}} \right] \qquad (54)$$

where $i(k_j, T_{j,l})$ is the rainfall intensity calculated from the selected form of IDF curves for the timescale $k_j$ and the empirical return period $T_{j,l}$ of the historical rainfall intensity value $i_{j,l}$ as defined above.

Therefore, the one-step least-squares error method consists of selecting a form of IDF curves, for example, one of the theoretically consistent forms given in Equations (47) or (48), and numerically minimizing the resulting MSE or log SE between the selected form of IDF curves and the historical data. Again, as noted in Section 2.18.4.2.1, the log SE may be more suitable due to the large differences in the rainfall intensity values in small and large timescales. The estimated parameters are the ones that minimize the MSE or the log SE. This method can also be used with the empirical forms of IDF curves or with any other form.

2.18.4.3.2 Application in a real-world data set
This section demonstrates the applicability of the aforementioned methodologies and presents the consistent forms of IDF curves constructed for the Ardeemore station data set used in Section 2.18.4.2.2. Among the several forms of IDF curves that could emerge by combining different return period functions g(T) and timescale functions h(k), the ones presented in Equations (47) and (48) are used for the two- and three-parameter timescale functions $h_3(k)$ and $h_4(k)$, respectively. Each form of IDF curves, for comparison and demonstration, was fitted using both methods described in Section 2.18.4.3.1, that is, the two-step robust estimation method and the one-step least-squares estimation method. The results are presented in Table 9.

It seems that the one-step least-squares error method is the most straightforward method to apply. Simply, the desired form of IDF curves is selected for an arbitrary set of par-
and generates values using a set of characteristic return peri- ameters, and is used to estimate the rainfall intensity values
ods, and second, it does not depend on the range of the i(kj ,Tj,l) that correspond to the empirical return periods Tj,l
characteristic return period set, as it uses the empirical return given by Equation (52). The numerical minimization of the
periods resulting from the historical data. MSE or of the log SE, given in Equations (53) and (54), re-
In particular, to every rainfall intensity value of every spectively, between the historical data and the ones predicted
timescale kj group of historical data, an empirical return per- by the selected form of IDF curves, results in the estimated
iod can be assigned. Specifically, sorting in decreasing order parameters. The selected forms that were fitted by minimizing
the values of every timescale kj group, for example, the log SE as the estimated parameters are presented in
ij;ð1Þ 4 y4 ij;ðlÞ 4 y4 ij;ðnj Þ , the empirical return period Tj,l of Table 9.
the lth largest rainfall intensity value, denoted by ij,l, of the Among the several variations of fitted IDF curves presented
timescale kj group, is given according, to the Weibull plotting in Table 9, Figure 10 depicts the QGEV(T)/h2(k) for

[Figure 10: four panels, (a) to (d), on logarithmic axes. Horizontal axis: timescale, k (5 min to 48 h). Vertical axis: rainfall intensity, i (mm h−1). Curves are drawn for T = 2, 5, 10, 20, 50, 100, 200, 500, and 1000 yr.]

Figure 10 IDF curves constructed for the Ardeemore station data set. Graphs (a) and (b) depict the $Q_{GEV}(T)/h_2(k)$ for $\theta_3 = -0.15$ and the $Q_{Gumb}(T)/h_2(k)$, respectively, fitted with the robust estimation method, while graphs (c) and (d) depict the same IDF curves fitted with the one-step LSE method. The parameters for each case are given in Table 9.
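
As an illustration of the one-step least-squares procedure of Section 2.18.4.3.1, the following Python sketch assigns empirical return periods to each timescale group, evaluates the log SE of Equation (54), and minimizes it with a crude derivative-free search. The IDF form used here (a Gumbel-type $g(T)$ divided by $h(k) = (1 + k/\theta)^{\eta}$), the assumed Weibull form $T = (n_j + 1)/l$, all function names, and the synthetic data are assumptions made for the example; they are not the forms of Equations (47), (48), or (52), nor the Ardeemore data.

```python
import math

def weibull_return_periods(n):
    # Assumed Weibull plotting position: the l-th largest of n values gets T = (n + 1) / l.
    return [(n + 1) / l for l in range(1, n + 1)]

def idf_intensity(k, T, lam, psi, theta, eta):
    # Hypothetical IDF form i(k, T) = g(T) / h(k), with a Gumbel-type quantile g(T)
    # and a two-parameter timescale function h(k) = (1 + k / theta) ** eta.
    g = lam * (psi - math.log(-math.log(1.0 - 1.0 / T)))
    h = (1.0 + k / theta) ** eta
    return g / h

def log_se(params, groups):
    # Sum over all groups and ranks of the squared log-ratio between modelled
    # and observed intensities (cf. Equation (54)).
    total = 0.0
    for k, values in groups.items():
        ranked = sorted(values, reverse=True)
        for T, i_obs in zip(weibull_return_periods(len(ranked)), ranked):
            i_mod = idf_intensity(k, T, *params)
            if i_mod <= 0.0:
                return math.inf  # outside the admissible parameter region
            total += math.log(i_mod / i_obs) ** 2
    return total

def pattern_search(f, x0, step=0.5, tol=1e-6, max_iter=2000):
    # Derivative-free search with multiplicative steps, so positive parameters stay positive.
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for factor in (math.exp(step), math.exp(-step)):
                trial = list(x)
                trial[i] *= factor
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return x, fx

# Synthetic demonstration: intensities generated from known parameters, timescales in hours.
true_params = (10.0, 3.0, 0.2, 0.7)
groups = {
    k: [idf_intensity(k, T, *true_params) for T in weibull_return_periods(30)]
    for k in (0.25, 1.0, 6.0, 24.0)
}
start = (5.0, 2.0, 0.5, 0.5)
fitted, objective = pattern_search(lambda p: log_se(p, groups), start)
```

In practice a library optimizer would replace `pattern_search`; the hand-written search only keeps the sketch free of dependencies. Swapping `log_se` for a mean-squared-error objective gives the MSE variant of Equation (53).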

$\theta_3 = -0.15$, and the $Q_{Gumb}(T)/h_2(k)$, fitted with robust estimation and by minimizing the log SE. The comparison of the same form of IDF curves fitted by different methods reveals small, albeit noticeable, differences between them. Specifically, the IDF curves fitted with the one-step LSE method, especially for large return periods, are slightly more conservative, that is, the predicted rainfall intensity is higher than the one predicted by the other method. In addition, the one-step LSE method results in stronger curvature in the area of small timescales. This behavior can be explained by the presence of a very large value in the small timescales of the historical data set (see Figure 7), and it is well known that LSE methods, in general, are sensitive to outliers. Of course, as Figure 10 demonstrates, the major difference is between the two forms of IDF curves: the form that uses the quantile of the Gumbel distribution as a return period function, especially for the large return periods, predicts significantly smaller rainfall intensity values than the one that uses the quantile of the GEV distribution. From a mathematical point of view this was expected, but given that the log SE is smaller in the form $Q_{GEV}(T)/h_2(k)$ than in the $Q_{Gumb}(T)/h_2(k)$, this may suggest that adopting the $Q_{Gumb}(T)/h_2(k)$ for design purposes may be a dangerous choice.

2.18.5 Copula Function for Hydrological Application

Since most hydrologic phenomena involve multiple variables across various temporal and spatial scales with significant interdependencies and non-Gaussian behavior, univariate approaches that assume normality or independence among variables may cause significant oversimplification. In order to address the interwoven dependencies between hydrologic variables, the multivariate joint probability distribution needs to be properly modeled. In the past there were attempts focusing on preserving the correct
correlation relationship (e.g., Goel et al., 2000; Singh and Singh, 1991; Yue, 2001), but they usually required more assumptions, and case-specific restrictions (types of marginal distributions, variables, and selected fixed-form joint distributions) may apply. Thus, though the univariate approach may be less realistic, it is sometimes a necessary trade-off between complexity and applicability.

With the need to characterize multidimensional randomness in nature, a flexible approach with general applicability is desirable. Such a method should be able to model different types of probability distributions for hydrologic variables governed by various physical mechanisms (e.g., rainfall intensity, flood peak, and drought severity), while also being able to faithfully describe their dependence structures. These challenges can now be addressed by using a relatively novel statistical tool: copulas. Copulas take their name from their role as functions that couple arbitrary univariate distributions to form the multivariate joint distribution. In other words, they are the mathematical formulations of the entire dependence space rather than a single correlation or dependence measure (e.g., Pearson's linear correlation coefficient). Since all multivariate probability functions (such as multivariate Gaussian and bivariate exponential distributions) can be re-expressed as combinations of their marginal distributions and the corresponding copulas, the use of copulas does not conflict with the existing multivariate techniques, but endows them with more possibilities. In practice, conceptually similar to the selection of the most appropriate PDF for each individual variable, the most suitable dependence structure between variables of interest can be identified by testing various candidate copula functions. Together with the identified marginal distributions, a general joint distribution can then be formed.

Though the core theorem supporting copulas was proposed by Sklar as early as 1959, the number of copula applications did not grow until recently, mostly in the field of finance (see Cherubini et al., 2004). In the hydrologic community, it is still a relatively new concept. Nevertheless, copulas were soon found useful in various types of water resources problems due to their flexibility in modeling multivariate dependence structure. Generally speaking, there are several advantages that make copulas an appealing method for hydrologic topics: (1) they can model non-Gaussian variables; (2) the assumption of statistical independence is not a prerequisite; (3) the analysis proceeds in a parallel fashion and all the existing univariate techniques hold; (4) they are less mathematically challenging than the conventional multivariate statistical approach; and (5) they conveniently help generate sets of random vectors with prescribed marginal distributions and dependence levels. Expecting that copulas will gradually play a more important role in future hydrologic studies, this section aims to provide the general hydrologic audience with the introduction, state-of-the-art applications, limitations, and future research needs of the copula techniques.

2.18.5.1 Concepts of Dependence Structure and Copulas

As the Gaussian distribution has been the most commonly used statistical model for probability distributions and uncertainties, to some engineers and hydrologists the term correlation (or dependence) refers directly to Pearson's linear correlation coefficient $r$. For random variables $X$ and $Y$ with means $\bar{x}$ and $\bar{y}$, $r$ is defined as $E[(X - \bar{x})(Y - \bar{y})]/(\mathrm{Std}[X]\,\mathrm{Std}[Y])$, in which $E[\cdot]$ and $\mathrm{Std}[\cdot]$ are the operators of expectation and standard deviation. Though $r$ is widely adopted, its limitations are less emphasized: (1) $r$ tends to be highly affected by outliers and hence is not suitable for extreme value analysis; (2) the value of $r$ may change if $X$ and/or $Y$ are transformed monotonically (such as by exponentiation) while their rank correlations remain the same; (3) most important of all, $r$ is only adequate for Gaussian (or elliptical) distributions (Nelsen, 2006).

An example is illustrated in Figure 11, where two bivariate distributions and the corresponding realizations are presented. Figure 11(a) shows the bivariate Gaussian distribution with $r = 0.8$. The two-dimensional surface represents the joint PDF $h_{XY}(x,y)$. When integrating either one of the variables over the entire domain $(-\infty, \infty)$, the marginals $f_X(x)$ and $f_Y(y)$ can be obtained, which are plotted on the two sides. In Figure 11(a), both marginals are the typical bell-shaped Gaussian densities. In the other case, the joint distribution shown in Figure 11(b) is clearly not bivariate Gaussian, as $h_{XY}(x,y)$ has a different shape and the realizations reveal dissimilar patterns. However, the marginals shown in Figure 11(b) are still univariate Gaussian (identical to Figure 11(a)), and the correlation coefficient $r$ is again 0.8. This example indicates that: (1) the joint distribution cannot be determined from known marginals alone and (2) the correlation coefficient $r$ is not a sufficient measure of dependence for non-Gaussian distributions.

As a matter of fact, a single dependence measure (e.g., besides Pearson's $r$, Kendall's concordance measure $\tau$ and Spearman's rank correlation coefficient) may not be sufficient to describe the entire dependence space, just as the statistical moments are only a summary of a univariate PDF. This motivates the use of copulas. The first usage of copulas is attributed to Sklar (1959) in a theorem describing how one-dimensional distribution functions can be combined to form multivariate distributions. For $d$-dimensional continuous random variables $\{X_1, \ldots, X_d\}$ with marginal CDFs $u_j = F_{X_j}(x_j)$, $j = 1, \ldots, d$, Sklar showed that there exists one unique $d$-copula $C_{U_1,\ldots,U_d}$ such that

$$C_{U_1,\ldots,U_d}(u_1, \ldots, u_d) = H_{X_1,\ldots,X_d}(x_1, \ldots, x_d) \qquad (55)$$

where $u_j$ is the $j$th marginal and $H_{X_1,\ldots,X_d}$ is the joint CDF of $\{X_1, \ldots, X_d\}$. The copula $C_{U_1,\ldots,U_d}$ can be regarded as a transformation of $H_{X_1,\ldots,X_d}$ from $[-\infty, \infty]^d$ to $[0,1]^d$. The consequence of this transformation is that the marginal distributions are segregated from $H_{X_1,\ldots,X_d}$. Hence, $C_{U_1,\ldots,U_d}$ becomes relevant only to the association between variables, and it gives a complete description of the entire dependence structure. In other words, the characterization of joint distributions can be performed separately for the marginal distributions and for the dependence structure (described by copulas), and therefore the dependence between variables can be clearly revealed.

Among various types of copula function, one-parameter Archimedean copulas have attracted the most attention owing to their several convenient properties. For an Archimedean copula, there exists a generator $\varphi$ such that the following

[Figure 11: two rows, (a) and (b). Each row shows the joint PDF $h_{XY}(x,y)$ as a surface over $x$ and $y$ (from −3 to 3), with the marginals $f_X(x) = \int_{-\infty}^{\infty} h_{XY}(x,y)\,dy$ and $f_Y(y) = \int_{-\infty}^{\infty} h_{XY}(x,y)\,dx$ plotted on the sides, next to a scatter plot of the corresponding samples in the $(x, y)$ plane.]

Figure 11 Illustration of bivariate joint distributions: (a) bivariate Gaussian distribution with $r = 0.8$ and the corresponding samples and (b) joint distribution with Gaussian marginals and Clayton copula and the corresponding samples ($r = 0.8$).
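
The construction behind Figure 11(b) can be sketched in a few lines of Python: dependent uniform pairs are drawn from a Clayton copula and then mapped to standard normal marginals through the inverse normal CDF. The sketch also checks the second limitation of $r$ listed in Section 2.18.5.1, namely that a monotone transformation changes Pearson's $r$ but leaves Kendall's $\tau$ untouched. The dependence parameter 2.0, the sample size, the seed, and the helper names are arbitrary choices for this example, not the settings used to draw the figure.

```python
import math
import random
from statistics import NormalDist

EPS = 1e-12

def clip01(p):
    # Keep probabilities strictly inside (0, 1) so the inverse normal CDF is defined.
    return min(max(p, EPS), 1.0 - EPS)

def sample_clayton(n, theta, rng):
    # Conditional inversion for the Clayton copula (theta > 0 assumed in this sketch):
    # draw u and w uniformly, then solve dC(u, v)/du = w for v.
    pairs = []
    for _ in range(n):
        u, w = clip01(rng.random()), clip01(rng.random())
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, clip01(v)))
    return pairs

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau(xs, ys):
    # (c - d) / C(n, 2), with c concordant and d discordant pairs.
    n, c, d = len(xs), 0, 0
    for a in range(n - 1):
        for b in range(a + 1, n):
            s = (xs[b] - xs[a]) * (ys[b] - ys[a])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return (c - d) / (n * (n - 1) / 2)

rng = random.Random(1)
uv = sample_clayton(1000, theta=2.0, rng=rng)

inv_cdf = NormalDist().inv_cdf          # standard normal marginals
x = [inv_cdf(u) for u, _ in uv]
y = [inv_cdf(v) for _, v in uv]
x_exp = [math.exp(xi) for xi in x]      # strictly increasing transform of X

r_xy, r_exp = pearson_r(x, y), pearson_r(x_exp, y)
tau_xy, tau_exp = kendall_tau(x, y), kendall_tau(x_exp, y)
```

Because the copula is sampled on the unit square first, any other marginal could be substituted by replacing `inv_cdf` with the corresponding quantile function, which is the separation of marginals and dependence structure described in the text.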

relationship holds:

$$\varphi(C(u,v)) = \varphi(u) + \varphi(v) \qquad (56)$$

where the generator $\varphi$ is a continuous, strictly decreasing function defined in $[0,1]$, and $\varphi(1) = 0$. When the generator is $\varphi(t) = -\ln t$, the copula in (56) is $C(u,v) = uv$, which is the special case when the variables are independent. Some commonly used families of one-parameter Archimedean copulas are listed in Table 10, in which $\theta$ is the dependence parameter. It should be noted that not every family of Archimedean copulas can accommodate the entire range of dependencies (from perfectly positive dependence to perfectly negative dependence). The choice of copulas depends on the range of dependence levels they can describe. For instance, Gumbel-Hougaard can only be applied for positive dependence, Ali-Mikhail-Haq is only suitable for weaker dependence ($-0.1817 < \tau < 0.3333$), while Clayton, Frank, and Genest-Ghoudi are suitable for both positive and negative dependencies. Figure 12 shows an example of using the Frank family of copulas to generate random samples with various levels of dependence.

Archimedean copulas find wide applications because they are easy to construct and possess several convenient features. For example, several statistical properties can be simply expressed in terms of $\varphi$, such as the distribution function $K_C$ of copulas (i.e., $K_C(t) = P[C(U,V) \le t]$):

$$K_C(t) = t - \frac{\varphi(t)}{\varphi'(t)}, \qquad t \in [0,1] \qquad (57)$$

The distribution function $K_C$ offers a cumulative probability measure for the set $\{(u,v) \in [0,1]^2 \mid C(u,v) \le t\}$, and it can help project multivariate information onto a single axis. The quantity $K_C$ was also used by Salvadori and De Michele (2004b) for defining the secondary return period for bivariate copulas. Another statistic that can be related to $\varphi$ is the concordance measure Kendall's $\tau$, which is defined as $\tau = P[(X_1 - X_2)(Y_1 - Y_2) > 0] - P[(X_1 - X_2)(Y_1 - Y_2) < 0]$, where $(X_1, Y_1)$ and $(X_2, Y_2)$ are independent and identically distributed random vectors with the same joint CDF $H_{XY}(x,y)$. Kendall's $\tau$ can be interpreted as the difference between the probability of concordance $P[(X_1 - X_2)(Y_1 - Y_2) > 0]$ (for positive dependence) and the probability of discordance $P[(X_1 - X_2)(Y_1 - Y_2) < 0]$ (for negative dependence). The value of Kendall's $\tau$ falls in $[-1, 1]$, where 1 represents total concordance, $-1$ represents total discordance, and 0 represents equal probabilities of concordance and discordance. To obtain the sample estimator of Kendall's $\tau$, let $(x_1, y_1)$ and $(x_2, y_2)$ be two observations from a sample of size $n$; then $\hat{\tau}$ can be estimated by

$$\hat{\tau} = \frac{c - d}{\binom{n}{2}} \qquad (58)$$

where $c$ denotes the number of concordant pairs ($(x_2 - x_1)(y_2 - y_1) > 0$), and $d$ denotes the number of discordant pairs ($(x_2 - x_1)(y_2 - y_1) < 0$). By using the generator $\varphi$, the theoretical Kendall's $\tau$ can be expressed as

$$\tau = 1 + 4 \int_0^1 \frac{\varphi(t)}{\varphi'(t)}\, dt \qquad (59)$$

This useful property leads to the nonparametric procedure of estimating the dependence parameter $\theta$ by equating $\hat{\tau} = \tau$. This nonparametric estimator does not rely on prior information of
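
Table 10 Some commonly used one-parameter Archimedean copulas

| Family | $\varphi(t)$ (a) | Range of $\theta$ (a) | $K_C(t)$ (b) | $\tau$ (b, c) |
| Ali-Mikhail-Haq | $\ln\dfrac{1 - \theta(1-t)}{t}$ | $[-1, 1)$ | $t + \dfrac{t(1 - \theta + \theta t)}{1 - \theta} \ln\dfrac{1 - \theta + \theta t}{t}$ | $1 - \dfrac{2}{3\theta} - \dfrac{2}{3}\left(1 - \dfrac{1}{\theta}\right)^2 \ln(1 - \theta)$ |
| Clayton | $\dfrac{1}{\theta}\left(t^{-\theta} - 1\right)$ | $[-1, 0) \cup (0, \infty)$ | $t\left(1 + \dfrac{1}{\theta}\right) - \dfrac{t^{\theta+1}}{\theta}$ | $\dfrac{\theta}{\theta + 2}$ |
| Frank | $-\ln\dfrac{e^{-\theta t} - 1}{e^{-\theta} - 1}$ | $(-\infty, 0) \cup (0, \infty)$ | $t + \dfrac{e^{\theta t} - 1}{\theta} \ln\dfrac{e^{-\theta} - 1}{e^{-\theta t} - 1}$ | $1 + \dfrac{4}{\theta}\left[D_1(\theta) - 1\right]$ (d) |
| Genest-Ghoudi | $\left(1 - t^{1/\theta}\right)^{\theta}$ | $[1, \infty)$ | $t^{1 - 1/\theta}$ | $\dfrac{2\theta - 3}{2\theta - 1}$ |
| Gumbel-Hougaard | $(-\ln t)^{\theta}$ | $[1, \infty)$ | $t\left(1 - \dfrac{\ln t}{\theta}\right)$ | $\dfrac{\theta - 1}{\theta}$ |

(a) Column $\varphi(t)$ and range of $\theta$ adapted from Nelsen (2006).
(b) Column $K_C(t)$, and $\tau$ of the Genest-Ghoudi family, adopted from Kao and Govindaraju (2007b).
(c) $\tau$ of the Ali-Mikhail-Haq, Frank, and Gumbel-Hougaard families adopted from Zhang and Singh (2007a); $\tau$ of the Clayton family adopted from Grimaldi and Serinaldi (2006b).
(d) $D_1$ is the Debye function of order 1, $D_1(\theta) = \int_0^{\theta} \dfrac{t}{\theta\,(e^{t} - 1)}\, dt$.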
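
The relations in Equations (56), (57), and (59) can be checked numerically for any generator in Table 10. The sketch below does so for the Clayton and Gumbel-Hougaard families: it builds $C(u,v)$ from the generator and its inverse, evaluates $K_C(t)$, and integrates Equation (59) with a midpoint rule so the result can be compared with the closed-form $\tau$ in the table. It also inverts the closed forms to obtain $\theta$ from a sample $\hat{\tau}$, as in the nonparametric procedure described in the text. Function names, the value $\theta = 2$, and the value $\hat{\tau} = 0.4$ are illustrative assumptions.

```python
import math

# Generator, its derivative, and its inverse for two families of Table 10.
def clayton(theta):
    phi = lambda t: (t ** (-theta) - 1.0) / theta
    dphi = lambda t: -t ** (-theta - 1.0)
    inv = lambda s: (1.0 + theta * s) ** (-1.0 / theta)
    return phi, dphi, inv

def gumbel_hougaard(theta):
    phi = lambda t: (-math.log(t)) ** theta
    dphi = lambda t: -theta * (-math.log(t)) ** (theta - 1.0) / t
    inv = lambda s: math.exp(-s ** (1.0 / theta))
    return phi, dphi, inv

def copula(u, v, phi, inv):
    # Equation (56) solved for C: C(u, v) = phi^{-1}(phi(u) + phi(v)).
    return inv(phi(u) + phi(v))

def k_c(t, phi, dphi):
    # Equation (57): K_C(t) = t - phi(t) / phi'(t).
    return t - phi(t) / dphi(t)

def tau_from_generator(phi, dphi, n=20000):
    # Equation (59) evaluated with the midpoint rule on (0, 1).
    h = 1.0 / n
    integral = sum(phi((i + 0.5) * h) / dphi((i + 0.5) * h) for i in range(n)) * h
    return 1.0 + 4.0 * integral

theta = 2.0
phi_c, dphi_c, inv_c = clayton(theta)
phi_g, dphi_g, inv_g = gumbel_hougaard(theta)

tau_clayton = tau_from_generator(phi_c, dphi_c)   # closed form: theta / (theta + 2)
tau_gumbel = tau_from_generator(phi_g, dphi_g)    # closed form: (theta - 1) / theta

# Estimating theta by equating a sample Kendall's tau to the closed forms.
tau_hat = 0.4
theta_clayton = 2.0 * tau_hat / (1.0 - tau_hat)
theta_gumbel = 1.0 / (1.0 - tau_hat)
```

For families without a closed-form inverse of $\tau(\theta)$, such as Frank, the same idea applies with a one-dimensional root search on $\tau(\theta) - \hat{\tau}$.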


[Figure 12: two rows of three panels for the Frank family, with columns labeled $\theta = 10$, $\theta = 0.01$, and $\theta = -10$. Top row: surface plots over the unit square (axes $u$ and $v$, 0 to 1). Bottom row: scatter plots of random samples, $v$ against $u$ (0 to 1).]

Figure 12 Frank family of Archimedean copulas with various levels of dependencies.
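
Samples like those in the lower row of Figure 12 can be generated by conditional inversion: draw $u$ and an auxiliary uniform $w$, then solve $\partial C(u,v)/\partial u = w$ for $v$, which has a closed form for the Frank copula. The Python sketch below does this for the three values of $\theta$ shown in the figure and computes the sample Kendall's $\tau$ of Equation (58) for each. The sampling scheme, sample size, seed, and function names are assumptions for illustration; the chapter does not state how the figure was produced.

```python
import math
import random

def sample_frank(n, theta, rng):
    # Conditional inversion for the Frank copula (theta != 0): with a = exp(-theta * u),
    # exp(-theta * v) = 1 + w * (exp(-theta) - 1) / (w + (1 - w) * a).
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        a = math.exp(-theta * u)
        v = -math.log(1.0 + w * (math.exp(-theta) - 1.0) / (w + (1.0 - w) * a)) / theta
        pairs.append((u, v))
    return pairs

def kendall_tau(pairs):
    # Equation (58): (c - d) / C(n, 2), with c concordant and d discordant pairs.
    n, c, d = len(pairs), 0, 0
    for a in range(n - 1):
        for b in range(a + 1, n):
            s = (pairs[b][0] - pairs[a][0]) * (pairs[b][1] - pairs[a][1])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return (c - d) / (n * (n - 1) / 2)

rng = random.Random(7)
samples = {theta: sample_frank(600, theta, rng) for theta in (10.0, 0.01, -10.0)}
taus = {theta: kendall_tau(pairs) for theta, pairs in samples.items()}
```

Large positive $\theta$ should give strongly concordant samples, large negative $\theta$ strongly discordant ones, and $\theta$ near zero samples that are close to independent, matching the three columns of the figure.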

marginal distributions, and provides a more objective measure of dependence structure. Therefore, the existence of outliers, while affecting the estimation of marginal distributions, would not affect the determination of copulas. This ability to determine the dependence structure independently of the marginals can be an advantage. Further information on statistical inference, detailed theoretical background, and descriptions of copulas can be found in Genest and Rivest (1993), Genest et al. (1995), Joe (1997), Nelsen (2006), and Salvadori et al. (2007).

To examine the appropriateness of a selected copula, one can construct the empirical copula (i.e., the observed probabilities) and apply it to perform goodness-of-fit tests. Similar to the concept of the plotting position formula used in univariate statistical analysis (e.g., the Weibull formula), empirical copulas are rank-based empirical joint cumulative probability measures. For sample size $n$, the $d$-dimensional empirical copula $C_n$ is

$$C_n\left(\frac{k_1}{n}, \frac{k_2}{n}, \ldots, \frac{k_d}{n}\right) = \frac{a}{n} \qquad (60)$$
where $a$ is the number of samples $\{x_1, \ldots, x_d\}$ with $x_1 \le x_{1(k_1)}, \ldots, x_d \le x_{d(k_d)}$, and $x_{1(k_1)}, \ldots, x_{d(k_d)}$ with $1 \le k_1, \ldots, k_d \le n$ are the order statistics from the sample. In an analogous fashion, the empirical distribution function $K_{C_n}$ can be expressed as

$$K_{C_n}\left(\frac{l}{n}\right) = \frac{b}{n} \qquad (61)$$

where $b$ is the number of samples $\{x_1, \ldots, x_d\}$ with $C_n(k_1/n, \ldots, k_d/n) \le l/n$. The empirical copula $C_n$ and the empirical distribution function $K_{C_n}$ are mostly applied for model verification and are treated as the observed (real) dependence structure. Currently, the development of goodness-of-fit tests for copulas remains a major interest. Several applicable tests include the multidimensional KS test (Saunders and Laud, 1980), tests based on the probability integral transformation (Breymann et al., 2003; Genest et al., 2006; Dobric and Schmid, 2007; Kojadinovic, in press), kernel-based smoothing techniques (Fermanian, 2005; Panchenko, 2005; Scaillet, 2007), and the cross-product ratio model (Wallace and Clayton, 2003).

2.18.5.2 Copulas in Hydrologic Applications

Though copulas are a relatively new method in statistical hydrology, the flexibility they offer for constructing joint distributions is now evident from many applications. Copulas were first applied in hydrologic studies by De Michele and Salvadori (2003) and Salvadori and De Michele (2004a) for rainfall frequency analysis. Hourly precipitation data from two raingauges at La Presa (Italy) for 7 years (from 1990 to 1996) were utilized to construct a bivariate model for regular storms, in which the generalized Pareto distribution was selected to describe the marginals, and the Frank family of Archimedean copulas was adopted to construct the dependence structure between rainfall duration and average intensity. This study was further extended to the trivariate level in Salvadori and De Michele (2006), where the dry period between rainfall events was added as the third variable, and again the generalized Pareto distribution and the Frank family of Archimedean copulas were adopted.

For studying extreme rainfall behavior, Grimaldi and Serinaldi (2006b) and Serinaldi et al. (2005) discussed the relationship between design rainfall depth (critical depth, obtained from IDF curves by specifying design duration and return period) and the actual features of extreme rainfall events. Half-hourly rainfall data from 10 raingauges at Umbria (Italy) from 1995 to 2001 were combined with the assumption of regional homogeneity to form a 70-year annual maximum series for analysis. The trivariate model containing critical depth, actual total depth, and peak intensity was constructed via copulas. By providing critical depth, it was expected that important features of extreme rainfall could be obtained. Zhang and Singh (2007a, 2007b) performed multivariate analysis for extreme rainfall events via copulas. Hourly precipitation data from three raingauges in the Amite River basin in Louisiana (US) for 42 years were analyzed. Bivariate rainfall models between total depth (volume), duration, and average intensity were constructed. Several types of conditional and joint return periods were illustrated in their study. Kao and Govindaraju (2007b, 2008) adopted 53 hourly precipitation stations in Indiana, USA, with a minimum record length of 50 years. Extreme rainfall events were analyzed using both bivariate Archimedean copulas and trivariate Plackett copulas. The most appropriate definition for the selection of extreme rainfall samples was suggested. Joint distributions of extreme rainfall events were constructed and used to compute design rainfall estimates. Comparisons between the conventional and copula-based rainfall estimates showed that the traditional univariate analysis provides reasonable estimates of rainfall depths for durations greater than 10 h but fails to capture the peak features of rainfall. Further applications of copulas in rainfall analysis can be found in Kuhn et al. (2007), Singh and Zhang (2007), Evin and Favre (2008), Laux et al. (2009), and Serinaldi (2009).

Copulas were adopted in flood-related problems as well. Favre et al. (2004) applied copulas for multivariate flood frequency analysis for two watersheds in Quebec, Canada. The combined flooding risk of multiple catchments and the joint distribution of peak flows and volume were discussed. Two families of Archimedean copulas (Frank, Clayton), as well as the independent and Farlie-Gumbel-Morgenstern copulas, were investigated. They showed that the conditional probability of flood volumes is quite different from the univariate result. Therefore, return periods of design floods would be different when the joint behavior is taken into account. De Michele et al. (2005) used the Gumbel family of Archimedean copulas to model the dependence between flood peaks and flood volumes. These two margins were analyzed by the generalized extreme value distribution. A bivariate model was constructed to calculate the flood hydrographs for a given return period, and was combined with the linear reservoir model to assess the adequacy of the spillway of the Ceppo Morelli dam in Northern Italy. Zhang and Singh (2006, 2007c) investigated the dependence structure between flood peak, volume, and duration by testing four different families of Archimedean copulas: Gumbel-Hougaard, Ali-Mikhail-Haq, Frank, and Cook-Johnson. The margins were analyzed by using the extreme value type I and log-Pearson type III distributions. They found positive correlations between flood peak and volume, and between flood volume and duration, and concluded that the Gumbel-Hougaard family was most appropriate to characterize the dependence structure. They also applied the copula model in calculating conditional return periods. Other applications of copulas in flood frequency analysis can be found in Grimaldi and Serinaldi (2006a), Shiau et al. (2006), Genest et al. (2007), Renard and Lang (2007), and Serinaldi and Grimaldi (2007).

Besides rainfall and flood frequency analyses, copulas were adopted in other hydrologic topics as well. Salvadori and De Michele (2004b) discussed the use of copulas to assess the return period of hydrological events using bivariate models and defined the secondary return period. Kao and Govindaraju (2007a) quantified the effect of dependence between rainfall duration and average intensity on surface runoff and demonstrated that the use of copulas could result in a simpler, more elegant mathematical treatment of zero runoff probabilities. Copulas were also applied in estimating groundwater parameters using a copula-based geostatistics approach (Bárdossy, 2006; Bárdossy and Li, 2008), drought analysis (Beersma and Buishand, 2004; Shiau, 2006; Shiau et al., 2007; Serinaldi
et al., 2009), multivariate L-moment homogeneity test (Chebana and Ouarda, 2007), tail dependence in hydrologic data (de Waal et al., 2007; Poulin et al., 2007), uncertainty quantification in remote-sensed data (Gebremichael and Krajewski, 2007; Pan et al., 2008; Villarini et al., 2008), rainfall IDF curves (Singh and Zhang, 2007), sea storm and wave height analysis (Wist et al., 2004; de Waal and van Gelder, 2005; De Michele et al., 2007), and atmospheric and climatologic studies (Vrac et al., 2005; Maity and Kumar, 2008; Norris et al., 2008). A review of copulas in Genest and Favre (2007) indicated that the application of copulas in hydrology is still in its nascent stages, and their full potential for analyzing hydrologic problems is yet to be realized. More detailed theoretical background and descriptions for the use of copulas in problems related to water resources can be found in Dupuis (2007), Salvadori et al. (2007), and Salvadori and De Michele (2007).

2.18.5.3 Remarks on Copulas and Future Research

While copulas can help advance hydrologic analysis at multivariate levels and provide broad potential applications, it is important to bear in mind their limitations as well. One should always be aware that the reliability of copulas is founded upon the sufficiency and quality of observations. Copulas, like any other statistical method, can only elucidate the information embedded in the samples. In order to characterize multivariate joint distributions, a much larger sample size is needed. As such, data processing and quality control play equally important roles. The assumption of stationarity should also be examined. This is particularly necessary since the changing climate may cause fundamental changes in the past climate pattern and invalidate predictions based on historic observations.

Another major limitation, which must be overcome for more extensive usage in hydrologic applications, is the curse of dimensionality. Though Sklar's theorem was proposed for a general dimension, most of the current copula functions are valid only at the bivariate level. Choices are limited especially in higher dimensions (>4). This is a major disadvantage particularly when modeling natural phenomena with a complicated dependence structure, such as droughts. Moreover, the mathematical compatibility among various marginals and lower-level dependencies complicates the issue, and it remains an open problem (see Kao and Govindaraju, 2008). Further research is deemed necessary to explore more choices of copulas, expanding the possibilities for modeling complex natural dependence structures.

2.18.6 Regional Frequency Analysis

Regional frequency analysis is widely employed for estimating design variables at ungauged sites or when dealing with data record lengths that are short compared to the recurrence interval of interest (see, e.g., Stedinger et al., 1993). Regionalization procedures embody the first principle proposed by the NRC-US (1988) for hydro-meteorological modeling: 'substitute space for time' by using hydrologic information collected at different locations to compensate for the limited (or absent) information at the site of interest.

The literature proposes several approaches to regionalization (traditional approaches are illustrated for instance in Stedinger et al. (1993), Hosking and Wallis (1997), Pandey and Nguyen (1999), and FEH (1999)) and presents applications to different hydrological problems and contexts such as T-year flood estimation (see, e.g., Dalrymple, 1960; Burn, 1990; Gabriele and Arnell, 1991; Castellarin et al., 2001; Merz and Blöschl, 2005), frequency analysis of low flows (see, e.g., Smakhtin, 2001; Castellarin et al., 2004; Laaha and Blöschl, 2006; Vogel and Kroll, 1992; Furey and Gupta, 2000), and rainfall extremes (Schaefer, 1990; Alila, 1999; Faulkner, 1999; Brath et al., 2003; Di Baldassarre et al., 2006b).

This section makes direct and explicit reference to regional flood frequency analysis (RFFA); nevertheless, the concepts and algorithms presented herein are suitable for regionalization of other hydrological extremes (e.g., low flows and rainstorms). More generally, the main features of a regionalization procedure, which are summarized in the remainder of this section, can be easily extended to broader hydrologic problems such as the prediction of the within-year variability of the streamflow regime in ungauged basins. Examples are reported in the literature that describe how to regionalize annual and long-term flow–duration curves (see, e.g., Fennessey and Vogel, 1990; Smakhtin et al., 1997; Castellarin et al., 2004, 2007) or to predict the streamflow regime in ungauged catchments via simulation by using rainfall-runoff models with regional parameters (see, e.g., Parajka et al., 2007).

2.18.6.1 Index-Flood Procedure, Extensions and Evolutions

A traditional approach to regional frequency analysis is the index-flood procedure (Dalrymple, 1960). The approach has a long history in hydrology and is based on the identification of homogeneous groups of sites (homogeneous regions) for which the frequency distribution of floods (or other hydrological extremes) is the same except for a scale parameter, called the index flood (or in general the index term, e.g., index storm for the regionalization of rainfall extremes), which reflects the local hydrological conditions.

According to the index-flood procedure, the T-year flood at a given site, $Q_T$, can be expressed as the product of two terms: the index flood, $\mu_Q$, and the dimensionless growth factor $q_T$, which describes the relationship between the dimensionless flood and the recurrence interval, $T$ (the so-called growth curve):

$$Q_T = \mu_Q \, q_T \qquad (62)$$

The index flood $\mu_Q$ in (62) is generally a measure of central tendency of the at-site frequency distribution. It is common to refer to the mean of the distribution (see, e.g., Brath et al., 2001), but the literature points out that the median is a valid alternative (see, e.g., Robson and Reed, 1999). The procedure allows the estimation of more reliable growth curves by exploiting information from the entire homogeneous region.

The index-flood approach can be applied with respect to AMS or PDS (also known as POT; see, e.g., Madsen et al., 1997a, 1997b). The remainder of this section refers directly to AMS, which are generally more common and easier to obtain,
even though the concepts and procedures described can be easily extended to PDS (or POT) series.

[Figure 13 image not reproduced. Panels (a) and (b) each show Regions 1–4; the legend of panel (c) distinguishes the ungauged target site, neighboring stations, and non-neighboring stations.]

Figure 13 Approaches for the delineation of homogeneous regions: (a) geographically contiguous regions; (b) noncontiguous homogeneous regions; and (c) hydrologic neighborhoods. From Ouarda TBMJ, Girard C, Cavadias GS and Bobee B (2001) Regional flood frequency estimation with canonical correlation analysis. Journal of Hydrology 254(1–4): 157–173, Fig. 1.

The classical implementation of the index-flood procedure is based on the most restrictive fundamental hypothesis: the existence of homogeneous regions within which the statistical properties of dimensionless flood flows do not vary with location (e.g., dimensionless statistical moments such as the coefficients of variation and skewness are constant). Nevertheless, since the original procedure was proposed, the literature has reported several extensions and evolutions that partly relax this fundamental hypothesis. The hierarchical application of the index-flood procedure, for instance, assumes that statistics of increasing order are constant within a set of nested regions; the larger the order of the statistics, the larger the region (see, e.g., Gabriele and Arnell, 1991). Another relevant example of evolution of the original hypothesis is the region of influence (RoI) approach (see, e.g., Burn, 1990), which replaces the original concept of fixed and contiguous regions with homogeneous pooling groups of sites. The pooling groups are identified in order to maximize the hydrological affinity with the site of interest (focused pooling), and the regionalization procedure enables one to weight the regional hydrological information according to the similarities with the site of interest and to use the at-site information in a very efficient way (see, e.g., Zrinji and Burn, 1996; Castellarin et al., 2001).

Although the techniques for delineating homogeneous pooling groups of sites have been considerably improved since the first studies on regionalization, moving toward more objective and process-related approaches, the definition of homogeneous pooling groups of sites is still a hot topic of regional frequency analysis, highly debated within the scientific community. During the evolution of regionalization techniques, the definition of boundaries between pooling groups of sites evolved steadily, from administrative boundaries to physiographic and meteo-climatic boundaries (panel (a) in Figure 13), and from geographically identifiable boundaries (panels (a) and (b) in Figure 13) to boundaries associated with the particular site of interest (panel (c) in Figure 13; RoI, see, e.g., Burn, 1990; Reed et al., 1999; Ouarda et al., 2001).

The latest advances tend to remove completely the concept of boundaries from the definition of homogeneous regions (pooling groups of sites). Examples in this direction are the models in which regional parameters vary continuously with geomorphoclimatic indices (see, for instance, Alila, 1999; Di Baldassarre et al., 2006b) or the adaptation of geostatistical interpolation techniques to the problem of regionalization of hydrological information.

2.18.6.2 Classical Regionalization Approach

A few main steps characterize any regionalization procedure. If the index flood is selected as the regionalization scheme and one is interested in estimating the flood quantile at a given site, these steps can be summarized as follows (see, e.g., Hosking and Wallis, 1997): (1) estimation of the index flood, $\mu_Q$; (2) estimation of the regional quantile, $q_T$, that is, (2a) identification of a homogeneous pooling group of sites, (2b) choice of a frequency distribution, and (2c) estimation of the regional frequency distribution; and (3) validation of the regional model.

2.18.6.2.1 Estimation of the index flood
Without loss of generality, let us consider the case in which $\mu_Q$ is assumed to be the mean of the distribution. The estimation of the index flood is straightforward when the site of interest is gauged and the record length is sufficiently long. In this case, $\mu_Q$ can be obtained directly by calculating the arithmetic mean of the available observations. At ungauged sites, indirect methods have to be used instead. Multiregression models are probably the most common indirect methods (see, e.g., Brath et al., 2001; Castellarin et al., 2007). They link $\mu_Q$ to an appropriate set of morphological and climatic descriptors of the basin through statistical relations that, for instance, may read as

$$\hat{\mu}_Q = A_0 \, \omega_1^{A_1} \, \omega_2^{A_2} \cdots \omega_n^{A_n} + \vartheta \tag{63}$$

where $\hat{\mu}_Q$ is the index-flood estimate, $\omega_i$, $i = 1, 2, \ldots, n$, are the explanatory variables of the model (i.e., a suitable set of
geomorphologic and climatic indexes), $A_i$, $i = 0, 1, \ldots, n$, are parameters, and $\vartheta$ is the residual of the model. The structure of (63), that is, the selection of the smallest and most efficient set of catchment descriptors, and the values of the parameters can be identified by multivariate stepwise regression analysis (e.g., Weisberg, 1985; Brath et al., 2001). Instead of stepwise regression analysis, alternative multivariate procedures can also be adopted for this task, such as artificial neural networks, principal component analysis, or canonical correlation analysis (see, e.g., Shu and Burn, 2004; Ouarda et al., 2001; Chokmani and Ouarda, 2004).

The literature indicates that statistical indirect models such as (63) are generally more accurate than conceptual indirect models for predicting $\mu_Q$ in ungauged basins (see, e.g., Brath et al., 2001). The latter models attempt to interpret the dynamics of rainfall–runoff transformation and are characterized by a more rigid structure. As a consequence, conceptual models reduce the influence on model parametrization of the specific information that comes from any single gauged station and are therefore typically more robust than statistical models. The literature also reports that direct estimation may be a preferable alternative to indirect methods when 2–5 years of data are available, especially for basins with physiographic and climatic characteristics that differ significantly from the average characteristics of the set of basins considered for the identification of the indirect estimation models.

2.18.6.2.2 Estimation of the regional dimensionless quantile
The literature reports on different approaches for delineating pooling groups of sites, as well as for selecting and estimating the regional frequency distribution (see, e.g., GREHYS, 1996; FEH, 1999). Hosking and Wallis (1997) proposed an integrated approach based entirely on the use of L-moments.

The approach summarizes the frequency regime of the pooling group of sites through the regional L-moment ratios. Regional L-moment ratios can be defined as follows:

$$t_r = \sum_{i=1}^{R} n_i \, t_r^{(i)} \Big/ \sum_{i=1}^{R} n_i \tag{64}$$

where $t_r$ is the regional L-moment ratio of order r (e.g., the L-coefficients of variation, skewness, and kurtosis, L-Cv, L-Cs, and L-Ck, correspond to r = 2, 3, and 4, in this order), $t_r^{(i)}$ is the sample L-moment ratio of order r for site i, which can be computed as described in Section 2.18.3, $n_i$ is the record length for site i, and R is the number of sites in the pooling group. Numerous applications in different contexts and to different regionalization problems (not necessarily confined to the estimation of the design flood) have proved the validity and reliability of the approach, whose main steps are briefly illustrated in this section.

2.18.6.2.3 Homogeneity testing
Once a pooling group of sites has been delineated, its degree of homogeneity has to be tested. The homogeneity of the group of sites is a fundamental requirement for an effective estimation of the T-year quantile (e.g., Lettenmaier et al., 1987; Stedinger and Lu, 1995; see Viglione et al. (2007) for a comparison of some homogeneity tests proposed in the literature). Hosking and Wallis (1997) defined a heterogeneity measure that is a standardized measure of the intersite variability of L-moment ratios. This measure is routinely used by hydrologists to test regional homogeneity.

The Hosking and Wallis (1997) heterogeneity measure assesses the homogeneity of a group of basins at three different levels by focusing on three measures of dispersion for different orders of the sample L-moment ratios.

A measure of dispersion for the L-Cv:

$$V_1 = \frac{\sum_{i=1}^{R} n_i \left( t_2^{(i)} - t_2 \right)^2}{\sum_{i=1}^{R} n_i} \tag{65}$$

A measure of dispersion for both the L-Cv and the L-Cs coefficients in the L-Cv–L-Cs space:

$$V_2 = \frac{\sum_{i=1}^{R} n_i \left[ \left( t_2^{(i)} - t_2 \right)^2 + \left( t_3^{(i)} - t_3 \right)^2 \right]^{1/2}}{\sum_{i=1}^{R} n_i} \tag{66}$$

A measure of dispersion for both the L-Cs and the L-Ck coefficients in the L-Cs–L-Ck space:

$$V_3 = \frac{\sum_{i=1}^{R} n_i \left[ \left( t_3^{(i)} - t_3 \right)^2 + \left( t_4^{(i)} - t_4 \right)^2 \right]^{1/2}}{\sum_{i=1}^{R} n_i} \tag{67}$$

where $t_2$, $t_3$, and $t_4$ are the regional L-Cv, L-Cs, and L-Ck, respectively; $t_2^{(i)}$, $t_3^{(i)}$, $t_4^{(i)}$, and $n_i$ are the values of L-Cv, L-Cs, L-Ck, and the sample size for site i; and R is the number of sites in the pooling group.

The underlying concept of the test is to measure the sample variability of the L-moment ratios and compare it to the variation that would be expected in a homogeneous group. The expected mean value and standard deviation of these dispersion measures for a homogeneous group, namely $\mu_{V_k}$ and $\sigma_{V_k}$, are assessed through repeated simulations, by generating homogeneous groups of basins having the same record lengths as those of the observed data. To avoid any undue commitment to a particular three-parameter distribution, the authors recommend the four-parameter kappa distribution to generate the synthetic groups of flood sequences. The kappa distribution includes, as special cases, several well-known two- and three-parameter distributions (see, e.g., Hosking and Wallis, 1997; Castellarin et al., 2007). The heterogeneity measures are then evaluated using the following expression:

$$H_k = \frac{V_k - \mu_{V_k}}{\sigma_{V_k}}, \quad \text{for } k = 1, 2, 3 \tag{68}$$

Hosking and Wallis suggest that a group of sites may be regarded as ‘acceptably homogeneous’ if $H_k < 1$, ‘possibly heterogeneous’ if $1 \le H_k < 2$, and ‘definitely heterogeneous’ if $H_k \ge 2$. According to the authors, these reference values are guidelines. For instance, the value H = 1 can be regarded as the borderline of whether a redefinition of the region may lead to a meaningful increase in the accuracy of the regional quantile estimate.
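The heterogeneity test of Eqs. (64)–(68) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sample L-moment ratios are computed from unbiased probability-weighted moments, V1 follows Eq. (65) as written in this section, and a Gumbel parent stands in for the four-parameter kappa distribution that Hosking and Wallis recommend for the simulations. The synthetic "observed" region is itself drawn from that Gumbel parent, so it is homogeneous by construction.

```python
import numpy as np

def sample_lmom_ratios(x):
    """Return sample (L-Cv, L-Cs, L-Ck) from unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(n)  # zero-based rank, i.e. (rank - 1)
    b0 = x.mean()
    b1 = np.sum(j * x) / (n * (n - 1))
    b2 = np.sum(j * (j - 1) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum(j * (j - 1) * (j - 2) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l1, l2 = b0, 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return np.array([l2 / l1, l3 / l2, l4 / l2])

def dispersion_measures(samples):
    """V1, V2, V3 of Eqs. (65)-(67) for a list of at-site samples."""
    n = np.array([len(s) for s in samples], dtype=float)
    t = np.array([sample_lmom_ratios(s) for s in samples])  # columns: t2, t3, t4
    t_reg = (n[:, None] * t).sum(axis=0) / n.sum()          # Eq. (64)
    d = t - t_reg
    v1 = np.sum(n * d[:, 0] ** 2) / n.sum()
    v2 = np.sum(n * np.hypot(d[:, 0], d[:, 1])) / n.sum()
    v3 = np.sum(n * np.hypot(d[:, 1], d[:, 2])) / n.sum()
    return np.array([v1, v2, v3])

def heterogeneity(samples, simulate_site, n_sim=500, seed=0):
    """H_k of Eq. (68); simulate_site(n, rng) draws one sample from the assumed parent."""
    rng = np.random.default_rng(seed)
    v_obs = dispersion_measures(samples)
    v_sim = np.array([
        dispersion_measures([simulate_site(len(s), rng) for s in samples])
        for _ in range(n_sim)
    ])
    return (v_obs - v_sim.mean(axis=0)) / v_sim.std(axis=0, ddof=1)

def gumbel_parent(n, rng):
    """Assumed parent (a stand-in for the kappa distribution), roughly unit mean."""
    return rng.gumbel(loc=1.0, scale=0.3, size=n)

# Hypothetical pooling group: five sites with different record lengths.
rng = np.random.default_rng(42)
record_lengths = [25, 40, 32, 51, 28]
region = [gumbel_parent(n, rng) for n in record_lengths]

H = heterogeneity(region, gumbel_parent)
print("H1, H2, H3 =", np.round(H, 2))
```

In a real application the parent used for the simulations would be fitted to the regional L-moment ratios rather than fixed in advance, and the resulting H values would be read against the guideline thresholds given above.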
$H_1$ is the most selective heterogeneity measure. $H_2$ and $H_3$ tend to identify larger homogeneous pooling groups of sites; therefore, using all three measures is well suited to the application of a hierarchical regionalization approach (Gabriele and Arnell, 1991; Castellarin et al., 2001).

2.18.6.2.4 Choice of a frequency distribution
Hosking and Wallis suggest basing the selection of the parent distribution (the frequency distribution for all sites in the pooling group) on the values of the regional L-moments. The authors define a goodness-of-fit measure to be used for selecting the candidate distribution among a family of possible three-parameter distributions. The authors consider as possible candidates the generalized logistic (GLO), generalized Pareto (GPA), lognormal (LN3), Pearson type III (PE3), and GEV (see Section 2.18.3). The measure defined for selecting the three-parameter distribution quantifies how well the L-Cs and the L-Ck of the fitted distribution match the regional average L-Cs and L-Ck. The fit can be geometrically interpreted on a diagram that reports the values of L-Cs and L-Ck on the x- and y-axis (L-moment ratio diagram) as the vertical distance between the point corresponding to the regional average and the curve representing the theoretical relationship between L-Cs and L-Ck for the considered distribution (see Figure 14). Hosking and Wallis (1997), Vogel and Fennessey (1993), and Peel et al. (2001), among others, recommended using L-moment ratio diagrams to guide the selection of the most suitable parent distribution.
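The geometric reading of the fit on the L-moment ratio diagram can be illustrated for the GEV case. The sketch below uses the closed-form L-skewness and L-kurtosis of the GEV as functions of the shape parameter k (sign convention of Hosking and Wallis), finds by bisection the k whose L-skewness equals the regional L-Cs, and returns the vertical distance between the regional L-Ck and the GEV curve. The function names and the regional point are hypothetical. The goodness-of-fit measure of Hosking and Wallis further standardizes this distance using simulations, a step this sketch omits.

```python
import math

def _one_minus_pow(a, k):
    """1 - a**(-k), computed without cancellation for small k."""
    return -math.expm1(-k * math.log(a))

def gev_lskew(k):
    """Theoretical L-skewness of the GEV for shape parameter k."""
    if k == 0.0:  # Gumbel limit
        return 2 * math.log(3) / math.log(2) - 3
    return 2 * _one_minus_pow(3, k) / _one_minus_pow(2, k) - 3

def gev_lkurt(k):
    """Theoretical L-kurtosis of the GEV for shape parameter k."""
    if k == 0.0:  # Gumbel limit
        return 16 - 10 * math.log(3) / math.log(2)
    num = 5 * _one_minus_pow(4, k) - 10 * _one_minus_pow(3, k) + 6 * _one_minus_pow(2, k)
    return num / _one_minus_pow(2, k)

def gev_shape_from_lskew(t3, lo=-0.999, hi=10.0):
    """Invert gev_lskew by bisection; L-skewness decreases as k increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gev_lskew(mid) > t3:
            lo = mid  # curve still too skewed: move toward larger k
        else:
            hi = mid
    return 0.5 * (lo + hi)

def vertical_distance_to_gev(t3_reg, t4_reg):
    """Regional L-Ck minus the GEV L-Ck at the regional L-Cs."""
    k = gev_shape_from_lskew(t3_reg)
    return t4_reg - gev_lkurt(k)

# Hypothetical regional average point on the diagram.
t3_reg, t4_reg = 0.25, 0.20
print("distance to GEV curve:", vertical_distance_to_gev(t3_reg, t4_reg))
```

A distance close to zero means the regional point lies near the GEV curve; repeating the calculation with the curves of the other candidate distributions gives a simple, unstandardized basis for comparison.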

[Figure 14 image not reproduced. All four panels plot L-kurtosis against L-skewness, showing at-site data, regional sample L-moments or sample means, and theoretical curves or points for candidate distributions (U: uniform, E: exponential, G: Gumbel, L: logistic, N: normal; generalized extreme value, generalized Pareto, generalized logistic). Panel (b) labels rainfall durations of 1, 3, 6, 12, and 24 h.]

Figure 14 L-moment ratio diagrams: application to (a) AMS of flood flows; (b) rainfall depths with different durations; (c) a global data set of earthquake magnitudes; and (d) extreme wind speeds at 129 stations in the contiguous United States. (a) From Castellarin A, Burn DH and Brath A (2001) Assessing the effectiveness of hydrological similarity measures for flood frequency analysis. Journal of Hydrology 241: 270–285, Fig. 2. (b) From Brath A, Castellarin A and Montanari A (2003) Assessing the reliability of regional depth-duration-frequency equations for gaged and ungaged sites. Water Resources Research 39(12): 1367–1379, Fig. 3. (c) From Thompson EM, Baise LG and Vogel RM (2007) A global index earthquake approach to probabilistic assessment of extremes. Journal of Geophysical Research 112: B06314, Fig. 1. (d) Personal communication, Eugene Morgan, Tufts University, Department of Civil and Environmental Engineering, 2009.
For details concerning the goodness-of-fit measure, the interested reader is referred to Hosking and Wallis (1997).

The literature documents that the GEV distribution is conceptually appropriate and technically suitable for accurately reproducing the sample frequency distribution of geophysical extremes observed in different geographical contexts around the world (see, e.g., Stedinger et al., 1993; Vogel and Wilson, 1996; Robson and Reed, 1999; Castellarin et al., 2001; Thompson et al., 2007). For instance, Figure 14 shows by means of L-moment ratios the appropriateness of the GEV distribution for flood flows, rainfall extremes, earthquake magnitudes, and wind speed extremes.

2.18.6.2.5 Estimation of the regional frequency distribution
The estimation of the regional distribution can be performed through the method of L-moments. This method is analogous to the method of moments, which is probably the oldest and most widely understood technique for fitting frequency distributions to observed data (Vogel and Fennessey, 1993). The method equates regional L-moment estimates with the theoretical L-moments of the distribution, resulting in a system of nonlinear equations whose unknowns are the parameters to be estimated.

The use of L-moments instead of conventional moments offers several advantages, for instance, the possibility of characterizing a wider range of distributions, and smaller bias and higher robustness of the estimators when applied to short samples (see, e.g., Hosking and Wallis, 1997).

For the case of the GEV distribution, Hosking and Wallis (1997) proposed the following system of equations for the application of the method of L-moments:

$$\hat{k}^R \approx 7.8590\,c + 2.9554\,c^2, \qquad c = \frac{2}{3 + t_3} - \frac{\ln 2}{\ln 3} \tag{69a}$$

$$\hat{\alpha}^R = \frac{t_2 \, \hat{k}^R}{\left( 1 - 2^{-\hat{k}^R} \right) \Gamma\left( 1 + \hat{k}^R \right)}, \qquad \hat{\xi}^R = 1 - \frac{\hat{\alpha}^R \left[ 1 - \Gamma\left( 1 + \hat{k}^R \right) \right]}{\hat{k}^R} \tag{69b}$$

where $\hat{k}^R$, $\hat{\alpha}^R$, and $\hat{\xi}^R$ indicate the regional estimates of the GEV parameters, $t_2$ and $t_3$ are the regional L-Cv and L-Cs, $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt$ is the gamma function, and the regional GEV parent is assumed to have unit mean (i.e., the index flood coincides with the mean of the distribution). The empirical polynomial equation in c has accuracy better than $9 \times 10^{-4}$ for typical L-Cs values. Once the regional parameters are estimated, the regional dimensionless quantiles can be computed as follows:

$$\hat{q}_T = \hat{\xi}^R + \frac{\hat{\alpha}^R}{\hat{k}^R} \left\{ 1 - \left[ -\ln\left( 1 - \frac{1}{T} \right) \right]^{\hat{k}^R} \right\} \quad \text{for } k \neq 0 \tag{70a}$$

$$\hat{q}_T = \hat{\xi}^R - \hat{\alpha}^R \ln\left[ -\ln\left( 1 - \frac{1}{T} \right) \right] \quad \text{for } k = 0 \tag{70b}$$

To improve the accuracy of the regional quantile estimates, the target pooling-group size can be determined according to the 5T guideline (Jakob et al., 1999), which suggests that a pooling group should contain at least 5T station-years of data so as to obtain reasonably accurate estimates of the T-year quantile while avoiding undue extrapolation.

2.18.6.2.6 Validation of the regional model
Regional flood frequency models are generally applied for predicting the flood frequency regime in ungauged basins. Therefore, it is fundamental to quantify the accuracy of the models and the uncertainty of regional estimates when no observation is available at the site of interest. A powerful and easy-to-implement cross-validation technique that can be used for this purpose is the jack-knife resampling procedure (see, e.g., Shao and Tu, 1995; Castellarin, 2007; Castellarin et al., 2007).

Regardless of the regionalization approach or structure of the regional model being considered, the jack-knife procedure is a leave-one-out cross-validation technique that can be described as follows:

1. one gauging station (site i) is removed from the set of R stations;
2. the regional model is constructed for the pooling group of site i by neglecting the data of site i;
3. the quantity of interest (i.e., T-year flood $Q_T$) is estimated at site i through the regional model identified at step 2 (jack-knife estimate); and
4. steps 1–3 are repeated R − 1 times, once for each of the remaining gauges.

The R jack-knife estimates are then compared with the corresponding reference values (i.e., regional estimates that do consider the data observed at site i, or at-site estimates if viable), for instance in terms of relative BIAS, MSE, and the Nash and Sutcliffe efficiency measure (NSE):

$$\mathrm{BIAS} = \frac{1}{R} \sum_{i=1}^{R} \frac{\hat{x}_i^{jk} - x_i}{x_i}$$

$$\mathrm{MSE} = \frac{1}{R} \sum_{i=1}^{R} \left( \frac{\hat{x}_i^{jk} - x_i}{x_i} \right)^2$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{R} \left( \hat{x}_i^{jk} - x_i \right)^2}{\sum_{i=1}^{R} \left( x_i - \bar{x} \right)^2} \tag{71}$$

where $x_i$ and $\hat{x}_i^{jk}$ are respectively the reference and jack-knife estimates for site i; R is the number of sites in the pooling group; and $\bar{x}$ is the average of the R reference values. NSE varies between $-\infty$ and 1, where NSE = 1 indicates a perfect fit, and NSE = 0 stands for a model that performs as efficiently as a mean regional value.

The general structure of the jack-knife procedure can be applied for the cross-validation of any regional model. The actual implementation of step 2 depends on the particular regional model being considered. For example, if a regional multiregression model for the estimation of the index flood is considered, step 2 will involve the calibration of the statistical coefficients of the model (i.e., coefficients $A_j$, with $j = 1, 2, \ldots, n$ in Equation (63)). If the estimation of $Q_T$ is considered instead, step 2 will include the whole regionalization process. Figure 15 reports an example of cross-validation for a regional model adopting the index-flood scheme (see also Castellarin, 2007).
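The regional GEV estimation of Eqs. (69)–(70) and the jack-knife scores of Eq. (71) can be combined in a short sketch. Several simplifications are assumed here and are not part of the procedure described above: the pooling group is fixed (no site-specific pooling), only the dimensionless quantile $q_T$ is validated, the at-site GEV estimates are taken as reference values, NSE is computed in the standard Nash and Sutcliffe form, and the data are synthetic. All function names are hypothetical.

```python
import math
import numpy as np

def lmom_ratios(x):
    """Sample L-Cv and L-Cs from unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(n)  # zero-based rank
    b0 = x.mean()
    b1 = np.sum(j * x) / (n * (n - 1))
    b2 = np.sum(j * (j - 1) * x) / (n * (n - 1) * (n - 2))
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return l2 / b0, l3 / l2

def gev_growth_factor(t2, t3, T):
    """Eqs. (69)-(70): T-year quantile of a unit-mean GEV with L-Cv t2 and L-Cs t3."""
    c = 2.0 / (3.0 + t3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    y = -math.log(1.0 - 1.0 / T)
    if abs(k) < 1e-8:  # Gumbel limit, Eq. (70b)
        alpha = t2 / math.log(2)
        xi = 1.0 - 0.5772156649 * alpha  # Euler-Mascheroni constant
        return xi - alpha * math.log(y)
    g = math.gamma(1.0 + k)
    alpha = t2 * k / ((1.0 - 2.0 ** -k) * g)
    xi = 1.0 - alpha * (1.0 - g) / k
    return xi + alpha / k * (1.0 - y ** k)

def regional_growth_factor(samples, T):
    """Record-length-weighted regional L-moment ratios (Eq. 64), then the GEV quantile."""
    n = np.array([len(s) for s in samples], dtype=float)
    t = np.array([lmom_ratios(s) for s in samples])
    t2, t3 = (n[:, None] * t).sum(axis=0) / n.sum()
    return gev_growth_factor(t2, t3, T)

def jackknife(samples, T):
    """Leave-one-out estimates: site i is predicted without its own data."""
    return np.array([
        regional_growth_factor(samples[:i] + samples[i + 1:], T)
        for i in range(len(samples))
    ])

def scores(x_ref, x_jk):
    """Relative BIAS and MSE, and NSE, between reference and jack-knife estimates."""
    rel = (x_jk - x_ref) / x_ref
    nse = 1.0 - np.sum((x_jk - x_ref) ** 2) / np.sum((x_ref - x_ref.mean()) ** 2)
    return rel.mean(), np.mean(rel ** 2), nse

# Synthetic pooling group: a common Gumbel growth curve scaled by site index floods.
rng = np.random.default_rng(1)
index_floods = [120.0, 85.0, 240.0, 60.0, 150.0, 95.0]
record_lengths = [30, 45, 25, 38, 52, 41]
samples = [mu * rng.gumbel(loc=0.8, scale=0.35, size=n)
           for mu, n in zip(index_floods, record_lengths)]

T = 100
q_ref = np.array([gev_growth_factor(*lmom_ratios(s), T) for s in samples])
q_jk = jackknife(samples, T)
bias, mse, nse = scores(q_ref, q_jk)
print(f"BIAS={bias:.3f}  MSE={mse:.3f}  NSE={nse:.3f}")
```

Because at-site estimates from short records are noisy, the NSE obtained with this choice of reference values can be low or negative even when the region is homogeneous; with a different reference (for example, regional estimates that include site i) the scores would have to be interpreted accordingly.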
[Figure 15 image not reproduced. Left panel: box plots of relative error (vertical axis from −1 to 2) for $Q_T$, $q_T$, and $\mu_Q$. Right panel: map of the study area with numbered gauging stations, subregions W, C, and E, the Adriatic and Tyrrhenian seas, and a 0–100 km scale bar.]

Figure 15 Box plots summarizing the relative error distributions in terms of 25th, 50th, and 75th percentiles, maximum and minimum values, and outliers (circles) for cross-validated flood quantiles of given recurrence interval T, $Q_T$, dimensionless flood quantiles, $q_T$, and index flood, $\mu_Q$, for the three homogeneous regions depicted in the right panel. From Castellarin A, Camorani G and Brath A (2007) Predicting annual and long-term flow-duration curves in ungauged basins. Advances in Water Resources 30(4): 937–953, Fig. 6.

2.18.6.3 Open Problems and New Advances

Regional flood frequency analysis (RFFA) has been a research sector for more than five decades now, yet the scientific community is still very active on this topic. This results from the existence of open problems, as discussed later, but it can also be ascribed to the potential of statistical regionalisation for solving a very common problem in hydrology, that is, prediction in ungauged basins (see, e.g., Sivapalan et al., 2003). For instance, probabilistic interpretation and regionalization of classical deterministic hydrological tools (e.g., flow duration curves, FDC, regional envelope curves of flood flows, REC, etc.) renewed the scientific appeal of these simple methods, further promoting RFFA among hydrological research topics (see, e.g., Castellarin et al. (2004, 2007) for FDC and Castellarin et al. (2005) and Castellarin (2007) for REC).

Some issues and aspects associated with RFFA may perhaps be considered well studied, and the margin of improvement in the accuracy of regional estimates associated with them is probably rather limited. Examples are the choice and estimation of the regional parent distribution or statistical homogeneity testing (Castellarin and Laio, 2006). Some other issues are still critical, and further analyses may significantly improve the accuracy of regional predictions at ungauged sites.

One of these issues is certainly the estimation of the index flood in ungauged basins. Figure 15 eloquently shows for a given case study, but this is a widespread condition (see, e.g., Kjeldsen and Jones, 2007), that the largest amount of uncertainty is associated with this step of the regionalization procedure. Investigators are still dedicating a great deal of effort to the improvement of existing methodologies (see, e.g., Shu and Burn, 2004; Kjeldsen and Jones, 2007) and to the definition of guidelines for the identification of the most reliable and suitable ones depending on the problem at hand (see, e.g., Bocchiola et al., 2003).

Also, classical studies document that intersite correlation among flood flows observed at different sites is typically not negligible (see, e.g., Matalas and Langbein, 1962; Stedinger, 1983) and leads to increases in the variance of regional flood statistics (see, for instance, Hosking and Wallis, 1988). Nevertheless, the impact of cross-correlation on regional estimates is still poorly understood. Recent studies have pointed out that cross-correlation may significantly reduce the regional information content in practical applications, quantified in terms of the equivalent number of independent observations (see, e.g., Troutman and Karlinger, 2003; Castellarin et al., 2005; Castellarin, 2007). This reduction has an impact on the reliability of regional quantiles, as it increases the variance of regional estimators, and can also severely affect the power of statistical tests for assessing the degree of regional homogeneity (Castellarin et al., 2008).

The delineation of homogeneous pooling groups of sites, or catchment classification, is still an open and highly debated problem, on which the scientific community is very active (see, e.g., McDonnell and Woods, 2004). Concerning this issue, the main research activities focus on: (1) the identification of the most descriptive and informative physiographic and climatic catchment descriptors to be used as proxies for the flood frequency regime (see, e.g., Castellarin et al., 2001; Merz and Blöschl, 2005) and (2) the development of pooling procedures that are as objective and unsupervised as possible. Several objective approaches have been proposed in the scientific literature, such as cluster analysis (Burn, 1989) or unsupervised artificial neural networks (ANNs) (see, e.g., Hall and Minns, 1999; Toth, 2009). Furthermore, the scientific community is dedicating increasing attention to the possibilities offered by the application of geostatistical techniques to the problem of statistical regionalisation. These techniques have been shown to have significant potential for regionalization and, for this reason, are briefly presented and discussed in this section.

Geostatistical procedures were originally developed for the spatial interpolation of point data (see, e.g., kriging: Kitanidis, 1997). The literature proposes two different ways to apply geostatistics to the problem of regionalisation of hydrological
information. The first technique is called physiographic space-based interpolation (PSBI) and performs the spatial interpolation of the desired hydrometric variable (e.g., T-year flood, but also annual streamflow, peak flow with a certain return period, low flows, etc.) in the bidimensional space of geomorphoclimatic descriptors (Chokmani and Ouarda, 2004; Castiglioni et al., 2009). The x and y orthogonal coordinates of the bidimensional space are derived from an adequate set of n > 1 geomorphologic and climatic descriptors of the river basin (such as drainage area, main channel length, mean annual precipitation, and indicators of seasonality; see Castellarin et al., 2001) through the application of multivariate techniques, such as principal components or canonical correlation analysis (Shu and Ouarda, 2007). The second technique, named Topological kriging or Topkriging, is a spatial estimation method for streamflow-related variables. It interpolates the streamflow value of interest (i.e., T-year flood, low-flow indices, etc.) along the stream network by taking the area and the nested nature of catchments into account (Skøien et al., 2006; Skøien and Blöschl, 2007).

[Figure 16 image not reproduced. Panel (a): stream-network map with a color scale from 0.30 to 2.00. Panel (b): surface of standardised Q100 plotted over the x and y coordinates of the physiographic space.]

Figure 16 (a) Topkriging: 100-year flood per unit area (color codes in m³ s⁻¹ km⁻²) for a portion of the Mur region (Austria): Topkriging estimates along the stream network and empirical values as circles. From Skøien JO, Merz R and Blöschl G (2006) Top-kriging - geostatistics on stream networks. Hydrology and Earth System Sciences 10(2): 277–287, Fig. 7. (b) PSBI: 3D representation of the standardised value of the 100-year flood over the physiographic space identified for a set of basins in northern central Italy (gauged basins are represented as dots).

The philosophy behind these innovative approaches to regionalization is rather interesting because they enable one to regionalize hydrometric variables without defining or identifying homogeneous regions or pooling groups of sites (see Figure 13). The approaches are particularly appealing for predictions in ungauged basins as they provide a continuous representation of the quantity of interest (e.g., T-year flood) along the stream network (Topkriging) or in the physiographic space (PSBI), providing the user with an estimate of the uncertainty associated with the interpolated value. In particular, the final output of Topkriging is the estimate of the measure of interest (with uncertainty) along the stream network (see Figure 16). The output of PSBI is a little less intuitive. With this technique any given basin (gauged or ungauged) can be represented as a point in the x–y space described above; in the same way, the set of gauged basins of the study area can be represented by a cloud of points in this space. The empirical values of the quantity of interest (e.g., T-year flood) can be represented along the third dimension z for each gauged catchment, and can then be spatially interpolated (with uncertainty) by applying a standard interpolation algorithm (e.g., ordinary or universal kriging). The spatial interpolation enables one to represent the quantity of interest over the entire portion of the x–y space containing empirical data, and therefore to estimate it at ungauged sites lying within the same portion of the space (see Figure 16).

References

Adeloye AJ and Montaseri M (2002) Preliminary streamflow data analyses prior to water resources planning study. Hydrological Sciences Journal 47(5): 679–692.
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN and Csaki F (eds.) Second International Symposium on Information Theory. 281pp. Budapest: Academiai Kiado.
Aksoy H (2006) Hydrological variability of the European part of Turkey. Iranian Journal of Science and Technology 31(B2): 225–236.
Aksoy H, Unal NE, Gedikli A, and Kehagias A (2008a) Fast segmentation algorithms for long hydrometeorological time series. Hydrological Processes 22: 4600–4608.
Aksoy H, Unal NE, Alexandrov V, Dakova S, and Yoon J (2008b) Hydrometeorological analysis of northwestern Turkey with links to climate change. International Journal of Climatology 28: 1047–1060.
Alexandersson H (1986) A homogeneity test applied to precipitation data. Journal of Climatology 6: 661–675.
Alila Y (1999) A hierarchical approach for the regionalization of precipitation annual maxima in Canada. Journal of Geophysical Research 104: 31645–31655.
Appel U and Brandt AV (1983) Adaptive sequential segmentation of piecewise stationary time series. Information Sciences 29: 27–56.
Atiem IA and Harmancioglu NB (2006) Assessment of regional floods using l-moments approach: The case of the river Nile. Water Resources Management 20(5): 723–747.
Baghirathan VR (1978) Rainfall depth-duration-frequency studies for Sri Lanka. Journal of Hydrology 37(3–4): 223–239.
Bárdossy A (2006) Copula-based geostatistical models for groundwater quality parameters. Water Resources Research 42: W11416 (doi:10.1029/2005WR004754).
Bárdossy A and Li J (2008) Geostatistical interpolation using copulas. Water Resources Research 44: W07412 (doi:10.1029/2007WR006115).
Baseville M and Nikiforov IV (1993) Detection of abrupt changes: Theory and application. Englewood Cliffs, NJ: Prentice Hall.
Beersma JJ and Buishand TA (2004) Joint probability of precipitation and discharge deficits in the Netherlands. Water Resources Research 40: W12508 (doi:10.1029/2004WR003265).
Begueria S (2005) Uncertainties in partial duration series modelling of extremes related to the choice of the threshold value. Journal of Hydrology 303(1–4): 215–230.
Bendjoudi H, Hubert P, Schertzer D, and Lovejoy S (1997) Multifractal point of view on rainfall intensity-duration-frequency curves. Comptes Rendus de l’Academie de Sciences – Serie IIa: Sciences de la Terre et des Planetes 325(5): 323–326.
Benson M (1968) Uniform flood-frequency estimating methods for federal agencies. Water Resources Research 4(5): 891–908.
Ben-Zvi A (2009) Rainfall intensity-duration-frequency relationships derived from large partial duration series. Journal of Hydrology 367(1–2): 104–114 (doi:10.1016/j.jhydrol.2009.01.007).
Bernard MM (1932) Formulas for rainfall intensities of long durations. Transaction of the American Society of Civil Engineers 96: 592–624.
Berryman D, Bobee B, Cluis D, and Haemmerli J (1988) Nonparametric tests for trend detection in water quality time series. AWRA Water Resources Bulletin 24(3): 545–556.
Bobée B (1975) Log Pearson type-3 distribution and its application in hydrology. Water Resources Research 11(5): 681–689.
Bobée B and Rasmussen PF (1995) Recent advances in flood frequency-analysis. Reviews of Geophysics 33(S1): 1111–1116.
Bocchiola D, De Michele C, and Rosso R (2003) Review of recent advances in index flood estimation. Hydrology and Earth System Sciences 7(3): 283–296.
Brasil Vieira D and Zink de Souza C (1985) Analysis of the relation intensity–duration–frequency of heavy rains for Ribeirao Preto. ICID Bulletin (International Commission on Irrigation and Drainage) 34(1): 49–55, 64.
Brath A, Castellarin A, and Montanari A (2003) Assessing the reliability of regional depth–duration–frequency equations for gaged and ungaged sites. Water Resources Research 39(12): 1367–1379.
Breymann W, Dias A, and Embrechts P (2003) Dependence structures for multivariate high-frequency data in finance. Quantitative Finance 3: 1–14 (doi:10.1080/713666155).
Chokmani K and Ouarda TBMJ (2004) Physiographical space-based kriging for regional flood frequency estimation at ungauged sites. Water Resources Research 40: W12514.
Chow VT, Maidment DR, and Mays LW (1988) Applied Hydrology, International edn. New York: McGraw-Hill Higher Education.
Chowdhury JU, Stedinger JR, and Lu L (1991) Goodness-of-fit tests for regional generalized extreme value flood distributions. Water Resources Research 27(7): 1765–1776.
Claps P and Laio F (2003) Can continuous streamflow data support flood frequency analysis? An alternative to the partial duration series approach. Water Resources Research 39(8): 1216.
Cluis D and Laberge C (2001) Climate change and trend detection in selected rivers within the Asia-Pacific region. Water International 26(3): 411–424.
Cluis D, Langlois C, van Coillie R, and Laberge C (1989) Development of a software package for trend detection in temporal series: Application to water and industrial effluent quality data for the St. Lawrence river. Environmental Monitoring and Assessment 12: 429–441.
Cunnane C (1973) A particular comparison of annual maxima and partial duration series methods of flood frequency prediction. Journal of Hydrology 18(3–4): 257–271.
Dahamsheh A and Aksoy H (2007) Structural characteristics of annual precipitation data in Jordan. Theoretical and Applied Climatology 88: 201–212.
Dairaku K, Emori S, and Oki T (2004) Rainfall amount, intensity, duration, and frequency relationships in the Mae Chaem watershed in Southeast Asia. Journal of Hydrometeorology 5(3): 458–470.
Dalrymple T (1960) Flood frequency analyses. Water Supply Paper 1543–A. Reston, VA, USA: USGS.
De Michele C and Salvadori G (2003) A Generalized Pareto intensity-duration model of storm rainfall exploiting 2-Copulas. Journal of Geophysical Research 108(D2): 4067 (doi:10.1029/2002JD002534).
British Atmospheric Data Centre (2006) UK Meteorological Office. MIDAS Land
Surface Stations data (1853-current). http://badc.nerc.ac.uk/data/ukmo-midas De Michele C, Salvadori G, Canossi M, Petaccia A, and Rosso R (2005) Bivariate
(accessed March 2010). statistical approach to check adequacy of dam spillway. Journal of Hydrologic
Buishand TA (1989) The partial duration series method with a fixed number of peaks. Engineering 10(1): 50--57 (doi:10.1061/(ASCE)1084-0699(2005)10:1(50)).
Journal of Hydrology 109(1–2): 1--9. De Michele C, Salvadori G, Passoni G, and Vezzoli R (2007) A multivariate model of
Burn DH (1989) Cluster analysis as applied to regional flood frequency. Journal of sea storms using copulas. Coastal Engineering 54(10): 734--751 (doi:10.1016/
Water Resources Planning and Management 115(5): 567--582. j.coastaleng.2007.05.007).
Burn DH (1990) Evaluation of regional flood frequency analysis with a region of de Waal DJ and van Gelder PHAJM (2005) Modelling of extreme wave heights
influence approach. Water Resources Research 26(10): 2257--2265. and periods through copulas. Extremes 8: 345--356 (doi:10.1007/s10687-006-
Burn DH and Elnur MAH (2002) Detection of hydrologic trends and variability. Journal 0006-y).
of Hydrology 255: 107--122. Di Baldassarre G, Brath A, and Montanari A (2006a) Reliability of different depth–
Canterford RP (1986) Frequency analysis of Australian rainfall data as used for flood duration–frequency equations for estimating short-duration design storms. Water
analysis and design. In: Regional Flood Frequency Analysis: Proceedings of the Resources Research 42(12): W12501.
International Symposium on Flood Frequency and Risk Analyses, pp. 293–302. Di Baldassarre G, Castellarin A, and Brath A (2006b) Relationships between statistics
Castellarin A (2007) Probabilistic envelope curves for design-flood estimation at of rainfall extremes and mean annual precipitation: An application for design-storm
ungaged sites. Water Resources Research 43: W04406. estimation in northern central Italy. Hydrology and Earth System Sciences 10:
Castellarin A, Burn DH, and Brath A (2001) Assessing the effectiveness of hydrological 589--601.
similarity measures for flood frequency analysis. Journal of Hydrology 241: Di Baldassarre G, Laio F, and Montanari A (2008) Design flood estimation using model
270--285. selection criteria. Physics and Chemistry of the Earth 34: 606--611 (doi: 10.1016/
Castellarin A, Burn DH, and Brath A (2008) Homogeneity testing: how homogeneous do j.pce.2008.10.066).
heterogeneous cross-correlated regions seem? Journal of Hydrology 360: 67--76. Dobric J and Schmid F (2007) A goodness of fit test for copulas based on the
Castellarin A, Camorani G, and Brath A (2007) Predicting annual and long-term flow- Rosenblatt’s transformation. Computational Statistics and Data Analysis 51(9):
duration curves in ungauged basins. Advances in Water Resources 30(4): 4633--4642 (doi: 10.1016/j.csda.2006.08.012).
937--953. Dupuis DJ (2007) Using copulas in hydrology: Benefits, cautions, and issues. Journal
Castellarin A, Galeati G, Brandimarte L, Montanari A, and Brath A (2004) Regional of Hydrologic Engineering 12(4): 381--393 (doi: 10.1061/(ASCE) 1084-
flow-duration curves: Reliability for ungauged basins. Advances in Water 0699(2007)).
Resources 27: 953--965. Evin G and Favre A-C (2008) A new rainfall model based on the Neyman-Scott process
Castellarin A and Laio F (2006) Regional frequency analysis, obsolete or ongoing? using cubic copulas. Water Resources Research 44: W03433 (doi:10.1029/
Seminar in Italian within the workshop New Frontiers in Hydrology. Potenza, Italy: 2007WR006054).
Università degli Studi della Basilicata. Fanta B, Zaake BT, and Kachroo RK (2001) A study of variability of annual river
Castellarin A, Vogel RM, and Matalas NC (2005) Probabilistic behavior of a regional flow of the southern African region. Hydrological Sciences Journal 46(4):
envelope curve. Water Resources Research 41: W06018. 513--524.
Castiglioni S, Castellarin A, and Montanari A (2009) Prediction of low-flow indices in Faulkner D (1999) Rainfall frequency estimation. In: Flood Estimation Handbook (FEH),
ungauged basins through physiographical space-based interpolation. Journal of Vol. 2, Wallingford: Institute of Hydrology.
Hydrology 378: 272–280. Favre A-C, El Adlouni S, Perreault L, Thiémonge N, and Bobée B (2004) Multivariate
Chebana F and Ouarda TBMJ (2007) Multivariate L-moment homogeneity test. Water hydrological frequency analysis using copulas. Water Resources Research 40:
Resources Research 43: W08406 (doi:10.1029/2006WR005639). W01101 (doi:10.1029/2003WR002456).
Chen C (1983) Rainfall intensity-duration-frequency formulas. Journal of Hydraulic FEH (1999) Flood Estimation Handbook. Wallingford: Institute of Hydrology.
Engineering–ASCE 109(12): 1603--1621. Fennessey NM and Vogel RM (1990) Regional flow-duration curves for ungauged sites
Chen HL and Rao AR (2002) Testing hydrologic time series for stationarity. ASCE in Massachusetts. Journal of Water Resources Planning and Management, ASCE
Journal of Hydrologic Engineering 7(2): 129--136. 116(4): 531--549.
Cherubini U, Luciano E, and Vecchiato W (2004) Copula Methods in Finance. West Fermanian J-D (2005) Goodness-of-fit tests for copulas. Journal of Multivariate
Sussex: Wiley. Analysis 95: 119--152 (doi:10.1016/j.jmva.2004.07.004).

Fill HD and Stedinger JR (1995) L-moment and probability plot correlation coefficient Gumbel EJ (1941) The return period of flood flows. Annals of Mathematical Statistics
goodness-of-fit tests for the Gumbel distribution and impact of autocorrelation. 12: 163--190.
Water Resources Research 31(1): 225--229. Gumbel EJ (1958) Statistics of Extremes. New York: Columbia University Press.
Fisher R and Tippett L (1928) Limiting forms of the frequency distribution of the Hall MJ and Minns AW (1999) The classification of hydrologically homogeneous
largest or smallest member of a sample. Proceedings of the Cambridge regions. Hydrological Sciences–Journal–des Sciences Hydrologiques 44(5):
Philosophical Society 24(2): 180--191. 693--704.
Fortin V, Perreault L, and Salas JD (2004) Retrospective analysis and forecasting Haan CT (2002) Statistical Methods in Hydrology, 2nd edn. Ames, Iowa: Iowa State
of streamflows using a shifting level model. Journal of Hydrology Press.
296: 135--163. Helsel DR and Hirsch RM (1992) Statistical Methods in Water Research. Amsterdam:
Foster H (1924) Theoretical frequency curves and their application to engineering Elsevier.
problems. Transactions of the American Society of Civil Engineers 87: 142--173. Hershfield DM (1961) Rainfall frequency Atlas of the United States for durations from
Fréchet M (1927) Sur la loi de probabilité de l’écart maximum (On the probability law 30 minutes to 24 hours and return periods from 1 to 100 years. US Weather
of maximum values). In: Annales de la Société Polonaise de Mathématique, vol. 6, Bureau Technical Paper 40. Washington, DC: US Government Printing Office.
pp. 93–116. Krakow, Poland. Hirsch RM, Helsel DR, Cohn TA, and Gilroy EJ (1993) Statistical analysis of hydrologic
Frederick RH (1977) Five- to 60-minute precipitation frequency for the eastern and data. In: Maidment DR (ed.) Handbook of Hydrology. New York: McGraw-Hill.
central United States. NOAA Technical Memorandum NWS HYDRO-35. Hosking J and Wallis J (1987) Parameter and quantile estimation for the generalized
Fuller W (1914) Flood flows. Transactions of the American Society of Civil Engineers Pareto distribution. Technometrics 29(3): 339--349.
77: 564--617. Hosking JRM (1990) L-moments: Analysis and estimation of distributions using linear
Furey PR and Gupta VK (2000) Space-time variability of low streamflows in river combinations of order statistics. Journal of the Royal Statistical Society, Series B
networks. Water Resources Research 36(9): 2679--2690. (Methodological) 52: 105--124.
Gabriele S and Arnell N (1991) A hierarchical approach to regional flood frequency Hosking JRM and Wallis JR (1988) The effect of intersite dependence on regional
analysis. Water Resources Research 27(6): 1281--1289. flood frequency-analysis. Water Resources Research 24(4): 588--600.
Garcı́a-Bartual R and Schneider M (2001) Estimating maximum expected short- Hosking JRM and Wallis JR (1997) Regional Frequency Analysis: An Approach Based
duration rainfall intensities from extreme convective storms. Physics and Chemistry on L-Moments. Cambridge: Cambridge University Press.
of the Earth, Part B: Hydrology, Oceans and Atmosphere 26(9): 675--681. Hosking JRM and Wallis JR (2005) Regional Frequency Analysis: An Approach Based
Gebremichael M and Krajewski WF (2007) Application of copulas to modeling on L-Moments. Cambridge: Cambridge University Press.
temporal sampling errors in satellite-derived rainfall estimates. Journal of Hubert P (2000) The segmentation procedure as a tool for discrete modeling of
Hydrologic Engineering 12(4): 404--408 (doi:10.1061/(ASCE)1084- hydrometeorological regimes. Stochastic Environmental Research and Risk
0699(2007)12:4(404)). Assessment 14: 297--304.
Gedikli A, Aksoy H, and Unal NE (2008) Segmentation algorithm for long time series Hubert P, Carbonnel JP, and Chaouche A (1989) Segmentation des series
analysis. Stochastic Environmental Research and Risk Assessment 22: 291--302. hydrométéorologiques – application à des séries de précipitations et de débits de
Gedikli A, Aksoy H, and Unal NE (2010a). AUG-segmenter: A user-friendly tool for l’afrique de l’ouest. Journal of Hydrology 110(3–4): 349--367.
segmentation of long time series. Journal of Hydroinformatics 12(3): 318–328. Imberger J and Ivey GN (1991) On the nature of turbulence in a stratified fluid. Part II:
Gedikli A, Aksoy H, Unal NE, and Kehagias A (2010b) Modified dynamic programming Applications to lakes. Journal of Physical Oceanography 21(5): 659--679.
approach for offline segmentation of long hydrometeorological time series. Institute of Hydrology (1999) Flood Estimation Handbook. Crowmarsh Gifford: Institute
Stochastic Environmental Research and Risk Assessment 24: 547–557. of Hydrology.
Gellens D (2002) Combining regional approach and data extension procedure for Jakob D, Reed DW, and Robson AJ (1999) Choosing a pooling-group. In: Flood
assessing GEV distribution of extreme precipitation in Belgium. Journal of Estimation Handbook, vol. 3, Wallingford: Institute of Hydrology.
Hydrology 268(1–4): 113--126. Jenkinson AF (1955) The frequency distribution of the annual maximum (or minimum)
Genest C and Favre A-C (2007) Everything you always wanted to know about copula of meteorological elements. Quarterly Journal of the Royal Meteorological Society
modeling but were afraid to ask. Journal of Hydrologic Engineering 12(4): 81: 158--171.
347--368 (doi:10.1061/(ASCE)1084-0699(2007)12:4(347)). Joe H (1997) Multivariate models and dependence concepts. London: Chapman and
Genest C, Favre A-C, Béliveau J, and Jacques C (2007) Metaelliptical copulas and Hall.
their use in frequency analysis of multivariate hydrological data. Water Resources Johnson NL, Kotz S, and Balakrishnan N (1994) Continuous Univariate Distributions.
Research 43: W09401 (doi:10.1029/2006WR005275). 2nd edn., vol. 1. New York: Wiley.
Genest C, Ghoudi K, and Rivest L-P (1995) A semiparametric estimation procedure of Kao S-C and Govindaraju RS (2007a) Probabilistic structure of storm surface runoff
dependence parameters in multivariate families of distributions. Biometrika 82(3): considering the dependence between average intensity and storm duration. Water
543--552. Resources Research 43: W06410 (doi:10.1029/2006WR005564).
Genest C, Quessy J-F, and Rémillard B (2006) Goodness-of-fit procedures for copula Kao S-C and Govindaraju RS (2007b) A bivariate rainfall frequency analysis of extreme
models based on the probability integral transformation. Scandinavian Journal of rainfall with implications for design. Journal of Geophysical Research 112: D13119
Statistics 33: 337--366 (doi:10.1111/j.1467-9469.2006.00470.x). (doi:10.1029/2007JD008522).
Genest C and Rivest L-P (1993) Statistical inference procedures for bivariate Kao S-C and Govindaraju RS (2008) Trivariate statistical analysis of extreme rainfall
archimedean copulas. Journal of the American Statistical Association 88(423): events via Plackett family of copulas. Water Resources Research 44: W02415
1034--1043. (doi:10.1029/2007WR006261).
Gert A (1987) Regional rainfall intensity-duration-frequency curves for Pennsylvania. Kehagias A (2004) A hidden Markov model segmentation procedure for hydrological
Water Resources Bulletin 23(3): 479--486. and environmental time series. Stochastic Environmental Research and Risk
Goel NK, Kurothe RS, Mathur BS, and Vogel RM (2000) A derived flood frequency Assessment 18: 117--130.
distribution for correlated rainfall intensity and duration. Journal of Hydrology 228: Kehagias A, Nidelkou E, and Petridis V (2006) A dynamic programming segmentation
56--67 (doi:10.1016/S0022-1694(00)00145-1). procedure for hydrological and environmental time series. Stochastic
Greenwood J, Landwehr J, Matalas N, and Wallis J (1979) Probability weighted Environmental Research and Risk Assessment 20: 77--94.
moments: Definition and relation to parameters of several distributions expressible Kendall MG and Stuart A (1977) The Advanced Theory of Statistics; Vol. 2: Inference
in inverse form. Water Resources Research 15: 1049--1054. and Relationship, 4th edn. London: Griffin and Co.
GREHYS (1996) Inter-comparison of regional flood frequency procedures for Canadian Kitanidis PK (1997) Introduction to Geostatistics: Applications to Hydrogeology.
rivers. Journal of Hydrology 186: 85--103. Cambridge: Cambridge University Press.
Griffis V and Stedinger JR (2007) Log-Pearson type 3 distribution and its application Kjeldsen TR and Jones D (2007) Estimation of an index flood using data transfer
in flood frequency analysis. I: Distribution characteristics. Journal of Hydrologic in the UK. Hydrological Sciences–Journal–des Sciences Hydrologiques
Engineering 12(5): 482--491. 52(1): 86--98.
Grimaldi S and Serinaldi F (2006a) Asymmetric copula in multivariate flood frequency Klemeš V (1993) Probability of extreme hydrometeorological events – a different
analysis. Advances in Water Resources 29(8): 1155--1167 (doi:10.1016/ approach. In: Kundzewicz Z (ed.) Extreme hydrological events: Precipitation,
j.advwatres.2005.09.005). Floods and Droughts, vol. 213, pp. 167--176. Wallingford: IAHS.
Grimaldi S and Serinaldi F (2006b) Design hyetograph analysis with 3-copula Kojadinovic I (2010) Hierarchical clustering of continuous variables based on the
function. Hydrological Sciences Journal 51(2): 223--238 (doi:10.1623/ empirical copula process and permutation linkages. Computational Statistics and
hysj.51.2.223). Data Analysis 54(1): 90–108.

Kothyari UC (1992) Rainfall intensity–duration–frequency formula for India. Journal of McDonnell JJ and Woods RA (2004) On the need for catchment classification. Journal
Hydraulic Engineering – ASCE 118(2): 323--336. of Hydrology 299: 2--3.
Koutsoyiannis D (2004a) Statistics of extremes and estimation of extreme rainfall: I. Merz R and Blöschl G (2005) Flood frequency regionalisation – spatial proximity vs
Theoretical investigation. Hydrological Sciences Journal 49(4): 575--590. catchment attributes. Journal of Hydrology 302(1–4): 283--306.
Koutsoyiannis D (2004b) Statistics of extremes and estimation of extreme rainfall: II. Miller JF (1973) Precipitation Frequency Analysis of the Western United States, NOAA
Empirical investigation of long rainfall records. Hydrological Sciences Journal Atlas 2.
49(4): 591--610. Mitosek HT, Strupczewski WG, and Singh VP (2006) Three procedures for selection of
Koutsoyiannis D (2006) Nonstationarity versus scaling in hydrology. Journal of annual flood peak distribution. Journal of Hydrology 323: 57--73.
Hydrology 324: 239--254. Moore DS (1986) Tests of chi-squared type. In: D’Agostino RB and Stephens MA
Koutsoyiannis D, Kozonis D, and Manetas A (1998) A mathematical framework for (eds.) Goodness-of-Fit Techniques, pp. 63--96. New York: Dekker.
studying rainfall intensity–duration–frequency relationships. Journal of Hydrology Nelsen RB (2006) An Introduction to Copulas. New York: Springer.
206(1–2): 118--135. Norris PM, Oreopoulos L, Hou AY, Tao W-K, and Zeng X (2008) Representation of 3D
Kroll C and Vogel RM (2002) Probability distribution of low streamflow series in the heterogeneous cloud fields using copulas: Theory for water clouds. Quarterly
United States. Journal of Hydrologic Engineering 7(2): 137--146. Journal of the Royal Meteorological Society 134: 1843--1864 (doi:10.1002/
Kruskal WH and Wallis WA (1952) Use of ranks in one-criterion variance analysis. qj.321).
Journal of the American Statistical Association 47: 583--621. NRC-US (1988) US National Research Council, Committee on Techniques for
Kuhn G, Khan S, Ganguly AR, and Branstetter ML (2007) Geospatial–temporal Estimating Probabilities of Extreme Floods, Estimating Probabilities of Extreme
dependence among weekly precipitation extremes with applications to observations Floods, Methods and Recommended Research. Washington DC: National
and climate model simulations in South America. Advances in Water Resources 30: Academies Press.
2401--2423 (doi:10.1016/j.advwatres.2007.05.006). O’Connell DR (2005) Nonparametric Bayesian flood frequency estimation. Journal of
Kuichling E (1889) The relation between the rainfall and the discharge of sewers in Hydrology 313(1–2): 79--96.
populous districts. Transactions of the American Society of Civil Engineers O’Connell DR, Ostenaa DA, Levish DR, and Klinger RE (2002) Bayesian flood
20(140): 782--794. frequency analysis with paleohydrologic bound data. Water Resources Research
Laaha G and Blöschl G (2006) Seasonality indices for regionalizing low flows. 38(5): 1058 (doi:10.1029/2000WR000028).
Hydrological Processes 20: 3851--3878. Ouarda TBMJ, Girard C, Cavadias GS, and Bobee B (2001) Regional flood frequency
Laio F (2004) Cramer–von Mises and Anderson-Darling goodness of fit tests for estimation with canonical correlation analysis. Journal of Hydrology 254(1–4):
extreme value distributions with unknown parameters. Water Resources Research 157--173.
40: W09308 (doi:10.1029/2004WR003204). Overeem A, Buishand A, and Holleman I (2008) Rainfall depth-duration-frequency
Laio F, Di Baldassarre G, and Montanari A (2009) Model selection techniques for the curves and their uncertainties. Journal of Hydrology 348(1–2): 124--134
frequency analysis of hydrological extremes. Water Resources Research 45: (doi:10.1016/j.jhydrol.2007.09.044).
W07416 (doi:10.1029/2007WR006666). Pan M, Wood EF, Wojcik R, and McCabe MF (2008) Estimation of regional terrestrial
Lang M, Ouarda T, and Bobée B (1999) Towards operational guidelines for over- water cycle using multi-sensor remote sensing observations and data assimilation.
threshold modeling. Journal of Hydrology 225(3–4): 103--117. Remote Sensing of Environment 112(4): 1282--1294 (doi:10.1016/j.rse.2007.02.
Langbein WB (1949) Annual floods and the partial duration flood series. Transactions 039).
of the American Geophysical Union 30(6): 879--881. Panchenko V (2005) Goodness-of-fit test for copulas. Physica A 355(1): 176--182
Laux P, Wagner S, Wagner A, Jacobeit J, Bárdossy A, and Kunstmann H (2009) (doi:10.1016/j.physa.2005.02.081).
Modelling daily precipitation features in the Volta Basin of West Africa. International Pandey GR and Nguyen VTV (1999) A comparative study of regression based methods
Journal of Climatology (doi:10.1002/joc.1852). in regional flood frequency analysis. Journal of Hydrology 225(1–2): 92--101.
Lettenmaier DP, Wallis JR, and Wood EF (1987) Effect of regional heterogeneity on Papalexiou S and Koutsoyiannis D (2008) Ombrian curves in a maximum entropy
flood frequency estimation. Water Resources Research 23(2): 313--323. framework. In: European Geosciences Union General Assembly, p. 00702. http://
Libiseller C and Grimvall A (2002) Performance of partial Mann–Kendall tests www.itia.ntua.gr/en/docinfo/851/ (accessed March 2010).
for trend detection in the presence of covariates. Environmetrics 13: Parajka J, Blöschl G, and Merz R (2007) Regional calibration of catchment models:
71--84. Potential for ungauged catchments. Water Resources Research 43: W06406.
Madsen H (1997a) Comparison of annual maximum series and partial duration series Peel MC, Wang QJ, Vogel RM, and McMahon TA (2001) The utility of L-moment ratio
methods for modeling extreme hydrologic events 1. At-site modeling. Water diagrams for selecting a regional probability distribution. Hydrological Sciences
Resources Research 33(4): 747--757. Journal des Sciences Hydrologiques 46(1): 147--156.
Madsen H (1997b) Comparison of annual maximum series and partial duration series Pettitt AN (1979) A non-parametric approach to the change-point detection. Applied
methods for modeling extreme hydrologic events 2. Regional modeling. Water Statistics 28: 126--135.
Resources Research 33(4): 759--769. Pitman WV (1980) A depth–duration–frequency diagram for point rainfall in SWA-
Madsen H, Arnbjerg-Nielsen K, and Mikkelsen P (2009) Update of regional intensity– Namibia. Water SA 6(4): 157--162.
duration–frequency curves in Denmark: Tendency towards increased storm Poulin A, Huard D, Favre A-C, and Pugin S (2007) Importance of tail dependence in
intensities. Atmospheric Research 92(3): 343--349. bivariate frequency analysis. Journal of Hydrologic Engineering 12(4): 394--403
Madsen H, Pearson C, and Rosbjerg D (1997a) Comparison of annual maximum (doi:10.1061/(ASCE)1084-0699(2007)12:4(394)).
series and partial duration series methods for modeling extreme hydrologic events Powell R (1943) A simple method of estimating flood frequencies. Civil Engineering
2. Regional modeling. Water Resources Research 33(4): 759--769. 13: 105--106.
Madsen H, Rasmussen PF, and Rosbjerg D (1997b) Comparison of annual maximum Ramesh N and Davison A (2002) Local models for exploratory analysis of hydrological
series and partial duration series methods for modeling extreme hydrologic events extremes. Journal of Hydrology 256(1–2): 106--119.
1. At-site modeling. Water Resources Research 33(4): 747--757. Rao AR and Yu GH (1986) Detection of nonstationarity in hydrologic time series.
Mailhot A, Duchesne S, Caya D, and Talbot G (2007) Assessment of future change in Management Science 32(9): 1206--1217.
intensity–duration–frequency (IDF) curves for Southern Quebec using the Reed DW, Jakob D, Robinson AJ, Faulkner DS, and Stewart EJ (1999) Regional
Canadian Regional Climate Model (CRCM). Journal of Hydrology 347(1–2): frequency analysis: A new vocabulary. In: Hydrological Extremes: Understanding,
197--210. Predicting, Mitigating, Proc. IUGG 99 Symposium, Publ. no. 255, pp. 237–243.
Maity R and Kumar DN (2008) Probabilistic prediction of hydroclimatic variables with Birmingham: IAHS.
nonparametric quantification of uncertainty. Journal of Geophysical Research 113: Renard B and Lang M (2007) Use of a Gaussian copula for multivariate extreme value
D14105 (doi:10.1029/2008JD009856). analysis: Some case studies in hydrology. Advances in Water Resources 30:
Martins E and Stedinger JR (2001) Historical information in a generalized maximum 897--912 (doi:10.1016/j.advwatres.2006.08.001).
likelihood framework with partial duration and annual maximum series. Water Robson AJ and Reed DW (1999) Flood Estimation Handbook, Vol. 3: Statistical
Resources Research 37(10): 2559--2567. Procedures for Flood Frequency Estimation. Wallingford: Institute of Hydrology.
Matalas N and Wallis J (1973) Eureka–it fits a Pearson type 3 distribution. Water Rosbjerg D, Madsen H, and Rasmussen PF (1992) Prediction in partial duration series
Resources Research 9(2): 281--289. with generalized Pareto distributed exceedances. Water Resources Research
Matalas NC and Langbein WB (1962) Information content of the mean. Journal of 28(11): 3001--3010.
Geophysical Research 67(9): 3441--3448. Salas JD (1993) Analysis and modeling of hydrologic time series. In: Maidment DR
Mays LW (2004) Water Resources Engineering, 2005 edn. New York: Wiley. (ed.) Handbook of Hydrology. New York: McGraw-Hill.

Salas JD, Delleur JW, Yevjevich V, and Lane WL (1980) Applied Modeling of Sklar A (1959) Fonctions de répartition à n dimensions et leurs marges. Publ. Inst.
Hydrologic Time Series. Littleton, CO: Water Resources Publications. Statist. Univ. Paris 8: 229–231.
Salvadori G and De Michele C (2004a) Analytical calculation of storm volume statistics Skøien JO and Blöschl G (2007) Spatiotemporal topological kriging of runoff time
involving Pareto-like intensity-duration marginals. Geophysical Research Letters series. Water Resources Research 43: W09419.
31: L04502 (doi:10.1029/2003GL018767). Skøien JO, Merz R, and Blöschl G (2006) Top-kriging–geostatistics on stream
Salvadori G and De Michele C (2004b) Frequency analysis via copulas: Theoretical networks. Hydrology and Earth System Sciences 10(2): 277--287.
aspects and applications to hydrological events. Water Resources Research 40: Smakhtin VU (2001) Low flow in hydrology: A review. Journal of Hydrology 240:
W12511 (doi:10.1029/2004WR003133). 147--186.
Salvadori G and De Michele C (2006) Statistical characterization of temporal structure Smakhtin VY, Hughes DA, and Creuse-Naudine E (1997) Regionalization of daily flow
of storms. Advances in Water Resources 29(6): 827--842 (doi:10.1016/ characteristics in part of the Eastern Cape, South Africa. Hydrological Sciences
j.advwatres.2005.07.013). Journal des Sciences Hydrologiques 42(6): 919--936.
Salvadori G and De Michele C (2007) On the use of copulas in hydrology: Theory and Stedinger JR (1983) Estimating a regional flood frequency distribution. Water
practice. Journal of Hydrologic Engineering 12(4): 369--380 (doi:10.1061/ Resources Research 19: 503--510.
(ASCE)1084-0699(2007)12:4(369)). Stedinger JR and Cohn TA (1986) Flood frequency analysis with historical and
Salvadori G, De Michele C, Kottegoda NT, and Rosso R (2007). Extremes in nature–An paleoflood information. Water Resources Research 22(5): 785--793.
Approach Using Copulas: Water Sciences and Technology Library Series. vol. 56. Stedinger JR and Griffis VW (2008) Flood frequency analysis in the United States:
New York: Springer. Time to update. Journal of Hydrologic Engineering 13(4): 199--204.
Sangal B and Biswas A (1970) 3-parameter lognormal distribution and its applications Stedinger JR and Lu L (1995) Appraisal of regional and index flood quantile
in hydrology. Water Resources Research 6(2): 505--515. estimators. Stochastic Hydrology and Hydraulics 9(1): 49--75.
Saunders R and Laud P (1980) The multidimensional Kolmogorov goodness-of-fit Stedinger JR, Vogel RM, and Foufoula-Georgiou E (1993) Frequency analysis of
test. Biometrika 67(1): 237. extreme events. In: Maidment DR (ed.) Handbook of Hydrology, Ch. 18. New York:
Scaillet O (2007) Kernel based goodness-of-fit tests for copulas with fixed smoothing McGraw-Hill Inc.
parameters. Journal of Multivariate Analysis 98(3): 533--543 (doi:10.1016/ Stephens MA (1986) Tests based on EDF statistics. In: D’Agostino RB and Stephens
j.jmva.2006.05.006). MA (eds.) Goodness-of-Fit Techniques, pp. 97--194. New York: Dekker.
Schaefer MG (1990) Regional analysis of precipitation annual maxima in Washington Strupczewski WG, Singh VP, and Weglarczyk S (2002) Asymptotic bias of estimation
State. Water Resources Research 26(1): 119--131. methods caused by the assumption of false probability distributions. Journal of
Scheffe M (1959) The analysis of variance. 477 pp. New York: Wiley. Hydrology 258: 122--148.
Schwartz G (1978) Estimating the dimension of a model. Annals of Statistics 6: Takeuchi K (1984) Annual maximum series and partial-duration series-Evaluation of
461--464. Langbein’s formula and Chow’s discussion. Journal of Hydrology 68(1–4):
Serinaldi F (2009) Copula-based mixed models for bivariate rainfall data: An empirical 275--284.
study in regression perspective. Stochastic Environmental Research and Risk Thompson EM, Baise LG, and Vogel RM (2007) A global index earthquake approach to
Assessment 23(5): 677–693. (doi:10.1007/s00477-008-0249-z). probabilistic assessment of extremes. Journal of Geophysical Research 112:
Serinaldi F, Bonaccorso B, Cancelliere A, and Grimaldi G (2009) Probabilistic B06314.
characterization of drought properties through copulas. Physics and Chemistry of the Todorovic P (1978) Stochastic models of floods. Water Resources Research 14(2):
Earth (in press) (doi:10.1016/j.pce.2008.09.004). 345--356.
Serinaldi F and Grimaldi S (2007) Fully nested 3-copula: Procedure and application on Toth E (2009) Classification of hydro-meteorological conditions and multiple artificial
hydrological data. Journal of Hydrologic Engineering 12(4): 420--430 neural networks for streamflow forecasting. Hydrology and Earth System Sciences
(doi:10.1061/(ASCE)1084-0699(2007)12:4(420)). Discussion 6: 897--919.
Serinaldi F, Grimaldi S, Napolitano F, and Ubertini L (2005) A 3-copula function Troutman BM and Karlinger MR (2003) Regional flood probabilities. Water Resources
application for design hyetograph analysis. In: Savic DA, Mariño MA, Savenije Research 39(4): 1095.
HHG, and Bertoni JC (eds.) Sustainable Water Management Solutions for Large Veneziano D, Lepore C, Langousis A, and Furcolo P (2007) Marginal methods of
Cities. Wallingford: IAHS. 293 ISBN 1-901502-97-X. 203-212. intensity-duration-frequency estimation in scaling and nonscaling rainfall. Water
Shao J and Tu D (1995) The Jackknife and Bootstrap. New York: Springer. Resources Research 43(10): W10418, 14.
Shaw E and Shaw EM (1998) Hydrology in Practice, 3rd edn. Abingdon, Oxon: Viglione A and Blöschl G (2009) On the role of storm duration in the mapping of
Routledge. rainfall to flood return periods. Hydrology and Earth System Sciences 13(2):
Shiau J-T (2006) Fitting drought duration and severity with two-dimensional copulas. 205--216.
Water Resources Management 20: 795--815 (doi:10.1007/s11269-005-9008-9). Viglione A, Laio F, and Claps P (2007) A comparison of homogeneity tests for regional
Shiau J-T, Feng S, and Nadarajah S (2007) Assessment of hydrological droughts for frequency analysis. Water Resources Research 43: W03428.
the Yellow River, China, using copulas. Hydrological Processes 21(16): Villarini G, Serinaldi F, and Krajewski WF (2008) Modeling radar-rainfall estimation
2157--2163 (doi:10.1002/hyp.6400). uncertainties using parametric and non-parametric approaches. Advances in Water
Shiau J-T, Wang H-Y, and Tsai C-T (2006) Bivariate frequency analysis of flood using Resources 31(12): 1674--1686 (doi:10.1016/j.advwatres.2008.08.002).
copulas. Journal of American Water Resources Association 42(6): 1549--1564 Vogel RM and Fennessey NM (1993) L-moment diagrams should replace product-
(doi:10.1111/j.1752-1688. 2006.tb06020.x). moment diagrams. Water Resources Research 29(6): 1745--1752.
Shu C and Burn DH (2004) Artificial neural network ensembles and their Vogel RM and Kroll CN (1992) Regional geohydrologic–geomorphic relationships for
application in pooled flood frequency analysis. Water Resources Research the estimation of low-flow statistics. Water Resources Research 28(9): 2451--2458.
40: W09301. Vogel RM and McMartin DE (1991) Probability plot goodness-of-fit and skewness
Shu C and Ouarda TBMJ (2007) Flood frequency analysis at ungauged sites using estimation procedures for the Pearson type 3 distribution. Water Resources
artificial neural networks in canonical correlation analysis physiographic space. Research 27(12): 3149--3158.
Water Resources Research 43: W07438. Vogel RM and Wilson I (1996) Probability distribution of annual maximum, mean, and
Singh K and Singh VP (1991) Derivation of bivariate probability density functions with minimum streamflows in the United States. Journal of Hydrologic Engineering
exponential marginals. Stochastics Hydrology and Hydraulics 5(1): 55--68. 1(2): 69--76.
Singh VP and Zhang L (2007) IDF curves using the Frank Archimedean copula. Vrac M, Chedin A, and Diday E (2005) Clustering a global field of atmospheric profiles
Journal of Hydrologics Engineering 12(6): 651--662 (doi:10.1061/(ASCE)1084- by mixture decomposition of copulas. Journal of Atmospheric and Oceanic
0699(2007)12:6(651)). Technology 22(10): 1445--1459 (doi:10.1175/JTECH1795.1).
Sillitto GP (1969) Derivation of approximants to the inverse distribution function of a Wallace C and Clayton D (2003) Estimating relative recurrence risk ratio. Genetic
continuous univariate population from the order statistics of a sample. Biometrika Epidemiology 25(4): 293--302 (doi: 10.1002/gepi.10270).
56(3): 641--650. Wang QJ (1998) Approximate goodness-of-fit tests of fitted generalized extreme value
Sivapalan M, Blöschl G, Merz R, and Gutknecht D (2005) Linking flood frequency to distributions using LH moments. Water Resources Research 34(12): 3497--3502.
long-term water balance: Incorporating effects of seasonality. Water Resources Wanielista M (1990) Hydrology and Water Quality Control. New York: Wiley.
Research 41(6): W06012 (doi:10.1029/2004WR003439). Weibull W (1939) A statistical theory of the strength of materials, pp. 5–45. Number
Sivapalan, et al. (2003) IAHS decade on predictions in ungauged basins (PUB), 2003– 51. Ingeniors Vetenskaps Akademien (The Royal Swedish Institute for Engineering
2012: Shaping an exciting future for the hydrological sciences. Hydrological Research).
Sciences Journal des Sciences Hydrologiques 48(6): 857--880. Wiesberg S (1985) Applied Linear Regression, 2nd edn. New York: Wiley.

Wist HT, Myrhaug D, and Rue H (2004) Statistical properties of successive wave heights and successive wave periods. Applied Ocean Research 26(3–4): 114–136 (doi:10.1016/j.apor.2005.01.002).
Wong H, Hu BQ, Ip WC, and Xia J (2006) Change-point analysis of hydrological time series using grey relational method. Journal of Hydrology 324: 323–338.
Xiong L and Guo S (2004) Trend test and change-point detection for the annual discharge series of the Yangtze River at the Yichang hydrological station. Hydrological Sciences Journal 49(1): 99–112.
Yue S (2001) The bivariate lognormal distribution to model a multivariate flood episode. Hydrological Processes 14: 2575–2588 (doi:10.1002/hyp.259).
Yue S, Pilon P, and Cavadias G (2002) Power of the Mann-Kendall and Spearman’s rho tests for detecting monotonic trends in hydrological series. Journal of Hydrology 259: 254–271.
Zhang L and Singh VP (2006) Bivariate flood frequency analysis using the copula method. Journal of Hydrologic Engineering 11(2): 150–164 (doi:10.1061/(ASCE)1084-0699(2006)11:2(150)).
Zhang L and Singh VP (2007a) Bivariate rainfall frequency distributions using Archimedean copulas. Journal of Hydrology 332(1–2): 93–109 (doi:10.1016/j.jhydrol.2006.06.033).
Zhang L and Singh VP (2007b) Gumbel-Hougaard copula for trivariate rainfall frequency analysis. Journal of Hydrologic Engineering 12(4): 409–419 (doi:10.1061/(ASCE)1084-0699(2007)).
Zhang L and Singh VP (2007c) Trivariate flood frequency analysis using the Gumbel-Hougaard copula. Journal of Hydrologic Engineering 12(4): 431–439 (doi:10.1061/(ASCE)1084-0699(2007)).
Zrinji Z and Burn DH (1996) Regional flood frequency with hierarchical region of influence. Journal of Water Resources Planning and Management 122(4): 245–252.

Relevant Websites

http://www.stahy.org
STAHY-WG, Statistics in Hydrology Working Group.
http://www.iash.info
What you need, when you need it.
