
Journal of Hydrology 554 (2017) 137–154


Journal of Hydrology
journal homepage: www.elsevier.com/locate/jhydrol

Research papers

Bayesian estimation of extreme flood quantiles using a rainfall-runoff model and a stochastic daily rainfall generator
Veber Costa ⇑, Wilson Fernandes
Dept. of Hydraulic and Water Resources Engineering, Federal University of Minas Gerais, Brazil

Article info

Article history:
Received 30 March 2017
Received in revised form 4 September 2017
Accepted 5 September 2017
Available online 8 September 2017

Keywords:
Extreme floods
Upper-bounded distribution function
Bayesian inference
Daily rainfall stochastic generation
Hydrologic modeling

Abstract

Extreme flood estimation has been a key research topic in hydrological sciences. Reliable estimates of such events are necessary as structures for flood conveyance are continuously evolving in size and complexity and, as a result, their failure-associated hazards become more and more pronounced. Due to this fact, several estimation techniques intended to improve flood frequency analysis and reduce uncertainty in extreme quantile estimation have been addressed in the literature in the last decades. In this paper, we develop a Bayesian framework for the indirect estimation of extreme flood quantiles from rainfall-runoff models. In the proposed approach, an ensemble of long daily rainfall series is simulated with a stochastic generator, which models extreme rainfall amounts with an upper-bounded distribution function, namely, the 4-parameter lognormal model. The rationale behind the generation model is that physical limits for rainfall amounts, and consequently for floods, exist and, by imposing an appropriate upper bound for the probabilistic model, more plausible estimates can be obtained for those rainfall quantiles with very low exceedance probabilities. Daily rainfall time series are converted into streamflows by routing each realization of the synthetic ensemble through a conceptual hydrologic model, the Rio Grande rainfall-runoff model. Calibration of parameters is performed through a nonlinear regression model, by means of the specification of a statistical model for the residuals that is able to accommodate autocorrelation, heteroscedasticity and nonnormality. By combining the outlined steps in a Bayesian structure of analysis, one is able to properly summarize the resulting uncertainty and estimate more accurate credible intervals for a set of flood quantiles of interest. The method for extreme flood indirect estimation was applied to the American River catchment, at the Folsom Dam, in the state of California, USA. Results show that most floods, including exceptionally large non-systematic events, were reasonably estimated with the proposed approach. In addition, by accounting for uncertainties in each modeling step, one is able to obtain a better understanding of the influential factors in large flood formation dynamics.

© 2017 Elsevier B.V. All rights reserved.

⇑ Corresponding author.
E-mail addresses: veber@ehr.ufmg.br (V. Costa), wilson@ehr.ufmg.br (W. Fernandes).

http://dx.doi.org/10.1016/j.jhydrol.2017.09.003
0022-1694/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Floods are one of the most complex phenomena in nature. From their very genesis, the combination of a large collection of intervening factors, as related to the meteorological conditions for extreme rainfall formation, the temporal and spatial patterns of the produced storms, and the particular hydrologic characteristics that may affect the catchment response, makes each flood a virtually unique event, which arises from intricate physical processes with unpredictable outcomes.

Estimation of rare and extreme flood quantiles is a necessary expedient for establishing guidelines for the design and risk assessment of large hydraulic structures related to flood mitigation. However, conventional probabilistic modeling strategies, based on at-site flood frequency analysis, are, in general, ineffective in describing the entire spectrum of maximum flood events, as the available samples are usually small and do not provide a full picture of the behavior of the upper tail of the flood distributional model. In addition, uncertainty estimation, which is a key topic for accounting for the potential variability of the phenomenon being modeled, is frequently performed on the basis of asymptotic properties of parameter and quantile estimators, which might not be suitably met in samples with reduced size.

Several techniques have been developed for improving flood frequency analysis and reducing modeling uncertainties. Most procedures rely on increasing sample sizes of block-maxima by incorporating regional (e.g., Hosking and Wallis, 1997) or non-systematic (e.g., Baker et al., 2002) information. Alternative inference procedures have also gained prominence in the last decades. An example in this context is the peaks-over-threshold framework
(Katz and Parlange, 2002 and references therein). Another interesting approach is that in which maximum-related deterministic quantities, such as the Probable Maximum Precipitation (PMP) and the Probable Maximum Flood (PMF), are accommodated in probabilistic models for hydrologic random variables as estimators for upper bound parameters (Elíasson, 1994, 1997; Takara and Tosa, 1999; Botero and Francés, 2010; Fernandes et al., 2010; Guse et al., 2010; Costa et al., 2015).

The rationale behind the use of distribution functions with bounded upper tails in the modeling of hydrologic block-maxima is the assumption that precipitation has a physical upper bound, which results from the potentially limited supply of moisture to the atmosphere. Once rainfall is assumed to be bounded, this will also be the case for floods. The existence of upper bounds for the referred random variables, however, remains a controversial issue in hydrological sciences. On the one hand, authors such as Yevjevich (1968), Klemes (1987), Yevjevich and Harmancioglu (1987) and Papalexiou and Koutsoyiannis (2006) argue that an exhaustive evaluation of the causal factors in the formation of extreme storms and floods is beyond our current understanding and modeling capability. This fact prevents one from prescribing quantiles with null exceedance probability. Furthermore, according to Hosking and Wallis (1997), the misspecification of upper bounds in flood frequency analysis may entail poor estimates for the quantiles of actual interest in the design of hydraulic structures.

On the other hand, some authors are skeptical about the possibility that rainfall volumes and floods can increase indefinitely. As early as 1936, Horton, an influential researcher in the field of hydrology, had questioned the plausibility of a small stream conveying a flood as large as the ones of the Mississippi River. This line of reasoning is also supported by Boughton (1980, 1999), Laursen (1983) and Takara and Tosa (1999). Objective evidence on the existence of upper bounds for floods was provided by Enzel et al. (1993). In effect, these authors demonstrated that the inclusion of non-systematic information, associated with paleofloods whose occurrence spans a 4,000-year period, does not affect the envelope curves derived on the basis of the systematic records of streamflow block-maxima for the Colorado River. This fact suggests that a natural physical limit for floods exists at this particular location. Similar conclusions were obtained by Jacoby et al. (2008) in the modeling of maximum flood events in Israeli catchments. The adoption of an upper-bounded distribution function in this paper is mainly based on the findings of these two works, although they alone are not expected to provide definitive conclusions on the outlined dispute.

A comprehensive discussion of modeling strategies based on the assumption of upper-bounded hydrologic variables is provided by Fernandes et al. (2010) and Costa et al. (2015). As stated by these authors, a major issue concerning such strategies lies in the estimation of upper bound parameters of theoretical probability functions. In fact, widespread estimation techniques, such as the maximum likelihood approach, are usually unable to provide reliable estimates for the upper bound. It can be shown that, for most models, the maximum likelihood estimates (MLE) for the upper bound parameter are equal or very close to the highest recorded value in the random sample, which, in turn, is unlikely to reflect the most extreme conditions for rainfall or flood formation at a given catchment.

By treating the upper bound parameter as a deterministic quantity, whose magnitude is well beyond the observed data, some drawbacks of the conventional maximum likelihood approach are, at least partially, circumvented in the inference. In effect, because of the location of the upper bound, the upper-bounded model may behave very similarly to either light or heavy-tailed distributions over a wide range of the random variable domain, which should contain most quantiles of practical interest. In addition, as the bounded model has to converge to the upper bound, precipitation and flood quantiles will be restricted to physically plausible values, which is a desirable property for modeling purposes. Nonetheless, PMP and PMF estimates are not univocal and depend on the available samples and estimation techniques. As such, these quantities should not be understood as actual upper bounds, but as quantiles with small (yet non-null) exceedance probability, for which a suitable account of uncertainty is required.

Bayesian analysis emerges as a convenient approach for addressing this problem, since such a school of inference provides appropriate tools for handling and quantifying uncertainty. In this sense, the random nature of PMP and PMF estimates may be incorporated into logical structures of analysis for estimating upper bounds by means of prior uncertainty distributions. Fernandes et al. (2010) proposed a method for flood frequency analysis in which this rationale is utilized. In the referred work, the authors elicited an informative prior distribution for the upper bound parameter on the basis of regional information from a large collection of PMF estimates in the United States. The predictive posterior distribution, as obtained from the bounded model, provided appropriate fits even for paleofloods, which could not be described by unbounded distributions of frequent use in hydrology, such as the Generalized Extreme Value Distribution (GEV). In addition, the credible intervals associated with the bounded model were narrower than the ones obtained for the unbounded Bayesian counterparts, which, to some extent, shows that incorporating an appropriate upper bound to the probabilistic model improves the estimation of extreme quantiles. However, some criticism may be directed at the proposed upper bound prior distribution, since floods are scale-dependent and their mechanisms of formation greatly differ from one catchment to another. Thus, very distinct information sources were aggregated in the prior knowledge regarding the flood upper bound, possibly entailing some loss of physical realism in the posterior analyses.

In this paper, we propose an alternative approach in which maximum flood quantiles are indirectly estimated from rainfall-runoff models. The rationale of the method is to use rainfall volumes as the variable of primary interest, since large areas are characterized by similar meteorological conditions for rainfall formation, which makes rainfall less scale-dependent than floods. Due to this fact, by modeling rainfall amounts, one is expected to elicit more physically realistic and informative prior distributions for upper bound parameters from regional PMP estimates. In addition, samples of daily rainfall volumes are usually larger than those related to streamflows, which makes the former more likely to contain records of extreme events and might provide a more accurate description of the precipitation distribution upper tail. The Bayesian bounded model for maximum rainfall is utilized as a functional component of a recently proposed stochastic daily rainfall generator (Costa et al., 2015), which comprises a well-suited algorithm for simulating long rainfall series while preserving statistical features of the original random sample, and which provides a suitable ensemble of realizations of the phenomenon, allowing one to evaluate a large set of hydrologic responses of a catchment under different yet equally likely input conditions.

Stochastic rainfall generators have been employed in simulation-based hydrologic modeling since the 1960s (Gabriel and Neumann, 1962). During the last two decades, a number of techniques, based on parametric and nonparametric inference procedures, have been developed for constructing at-site and multisite generation models (Wilks, 1998; Yates et al., 2003; Apipattanavis et al., 2007; Basinger et al., 2010; Li et al., 2012). Overall, these models are able to reproduce summary statistics such as means, standard deviations and mean number of wet days per month. Most of them, however, are unfit to produce reliable estimates of
extreme precipitation events (Sharif and Burn, 2006; Furrer and Katz, 2008; Hundecha et al., 2009; Li et al., 2012; Chen and Brissette, 2014). Extensively used light-tailed models, such as the Gamma distribution, often underestimate the frequency and magnitude of rare and extreme rainfall depths, whereas those models with subexponential upper tails, such as the Generalized Pareto, might entail the frequent generation of exceptionally large precipitation amounts, which may prove inconsistent with the physical mechanisms of rainfall formation at particular locations. This constitutes an additional motivation for utilizing a bounded model, as it restrains the generated values to a physically plausible range and, at the same time, mimics the actual behavior of the upper tail of daily rainfall block-maxima for a set of conveniently specified quantiles, according to hydraulic structures design requirements.

The streamflow series, as related to the synthetic daily rainfall ensemble derived from stochastic simulation, are obtained by means of a rainfall-runoff model, namely the Rio Grande model, whose structure is based on the Xinanjiang model, originally proposed by Zhao et al. (1980), and is described in detail in Appendix A. Such a strategy, however, imposes other challenges on the analyst, since new sources of uncertainty, associated with input errors, parameter estimation and model structure, are aggregated into the analysis. Nonetheless, the Bayesian paradigm provides a straightforward framework for accounting for the resulting uncertainty by treating the hydrological model parameters, as well as those of the probabilistic model of calibration residuals, as random variables, whose probabilistic behavior is summarized by a joint posterior distribution. Thus, while constructing flood quantile curves by exploring rainfall synthetic series and different sets of parameters of the hydrological model, as sampled from the joint posterior distribution, the uncertainties are properly quantified and coherently propagated in all modeling steps, which is expected to entail more reliable estimates of rare and extreme floods.

The remainder of this paper is organized as follows. In Section 2, the theoretical foundations of the proposed method are explored. First, a brief description of the mixed model for stochastic daily rainfall generation (Costa et al., 2015) is provided, along with the elicitation of prior uncertainty distributions for the parameters of the 4-parameter lognormal distribution (LN4). Next, calibration of parameters of conceptual rainfall-runoff models under the Bayesian paradigm and the generalized log-likelihood function (Schoups and Vrugt, 2010), which introduces a model for calibration residuals that accommodates autocorrelation, heteroscedasticity and non-Gaussian behavior, are addressed. Finally, a summary of the construction of flood quantile curves and the corresponding uncertainty evaluation is provided. Section 3 describes an application of the proposed method to the American River Catchment, at Folsom Dam. The conclusions and recommendations of the study are presented in Section 4. Lastly, Appendix A provides a description of the Rio Grande rainfall-runoff model.

2. Theoretical background

2.1. The mixed model for stochastic daily rainfall generation

Stochastic generation of daily rainfall is a widespread technique for hydrologic simulation. Comprehensive reviews on the subject may be found in Srikanthan and McMahon (2000) and Chen and Brissette (2014). In general, daily rainfall stochastic generators comprise a two-step simulation scheme. The first step models the occurrence of rainfall, whereas, in the second step, rainfall amounts are estimated on wet days. Rainfall occurrence is often modeled through stochastic processes that are able to reproduce the persistence of wet and dry intervals. Simple modeling alternatives, such as alternating renewal processes (Wilby et al., 1998; Wilks, 1999) and Markov chains (Gabriel and Neumann, 1962; Boughton, 1999), are commonly used approaches for this purpose, as they provide effective means for describing the dependence structures of daily rainfall events.

The modeling of rainfall amounts, on the other hand, is a challenging and complex task. In fact, low to moderate and extreme rainfall depths are inherently related to very distinct physical processes, which may not be entirely accommodated into a single mathematical model. In other words, as they are a result of very particular meteorological conditions, extreme rainfall amounts are expected to be sampled from a different population as compared to the most frequent precipitation volumes. Furthermore, it has been noted by Koutsoyiannis (2004), Papalexiou and Koutsoyiannis (2012), Li et al. (2012) and Papalexiou et al. (2013) that daily rainfall annual block-maxima are less likely to be extracted from distributions with exponential upper tails. In fact, as pointed out by a recently published study by Papalexiou and Koutsoyiannis (2016), the Generalized Gamma model, which has a stretched upper tail as compared to the exponential counterparts, appears to be the most appropriate distributional model for a large set of samples of annual maximum daily rainfall, followed by models with subexponential tails. Therefore, in order to appropriately describe the overall spectrum of rainfall amounts, it may be necessary to employ hybrid models, which arise from the combination of two probability distributions set apart by a suitable threshold: a light-tailed right-skewed distribution, for modeling low to moderate rainfall, and a distribution with heavier tails for modeling the most extreme ones (Hundecha et al., 2009; Li et al., 2012; Ramesh and Onof, 2014).

Strictly parametric hybrid models represent an intuitive choice for composing the stochastic daily rainfall generator structure, as they may readily fit the observed sample of non-null rainfall depths and may suitably model those amounts that exceed a sufficiently high threshold by means of imposing a heavy tail-like decay from the threshold onwards. However, an intractable problem may arise for most sets of candidate distributions for constructing a hybrid model: the resulting probability density function (PDF) will rarely be continuous at the transition point, which prevents one from obtaining parameter estimates from usual estimation methods, such as maximum likelihood, and may cause numerical instability during the simulations (Furrer and Katz, 2008).

Nonparametric models (Lall and Sharma, 1996; Apipattanavis et al., 2007; Basinger et al., 2010) may, to some extent, circumvent the outlined problem, as no theoretical distribution functions are fitted to the data. In this sense, low to moderate and extreme rainfall distributions can be obtained from kernel density estimation, with no need for prescribing a common value for probability density at the threshold. Nonetheless, the most frequently used kernel estimators are light-tailed, which restrains the ability of extrapolation of the nonparametric density for extreme rainfall amounts (Markovich, 2007; Furrer and Katz, 2008; Li et al., 2012) and might entail a poor description of the upper tail of the distribution of maximum daily precipitation.

Costa et al. (2015) proposed an alternative approach, which combines parametric and nonparametric methods in the stochastic generator structure. The authors termed the resulting model the mixed model for stochastic daily rainfall generation. The rationale behind the mixed model is similar to that of hybrid parametric models, i.e., distinct processes of rainfall formation require different modeling approaches. However, as defining a coherent threshold, which ensures the continuity of the PDFs at this transition point, for particular pairs of theoretical probability distributions, may be impossible, one may simulate a given range of the domain of daily precipitation volumes in a nonparametric framework and, thus, trivially bypass the mathematical constraint imposed by the definition of the threshold.
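
The continuity constraint that hybrid parametric models face at the threshold can be illustrated with a minimal sketch. It assumes, purely for illustration, an exponential body below a threshold `u` and a Generalized Pareto tail carrying the remaining probability mass above it; this particular pair of distributions, the function names and the parameter values are hypothetical and are not taken from the models cited above. Under these assumptions, the density is continuous at `u` only when the tail scale equals the reciprocal of the body rate, so the tail cannot be fitted freely.

```python
import math


def hybrid_pdf(x, u, rate, gp_scale, gp_shape):
    """Density of an illustrative hybrid model (assumed form, for illustration only).

    For 0 <= x <= u the density is exponential with the given rate.
    For x > u the probability mass exp(-rate * u) left above the threshold
    is spread according to a Generalized Pareto density on the excess x - u.
    """
    if x < 0.0:
        return 0.0
    if x <= u:
        return rate * math.exp(-rate * x)
    tail_mass = math.exp(-rate * u)  # P(X > u) under the exponential body
    y = x - u
    if gp_shape == 0.0:  # Generalized Pareto reduces to an exponential tail
        return tail_mass * math.exp(-y / gp_scale) / gp_scale
    z = 1.0 + gp_shape * y / gp_scale
    if z <= 0.0:  # outside the support when gp_shape < 0
        return 0.0
    return tail_mass * z ** (-1.0 / gp_shape - 1.0) / gp_scale


def jump_at_threshold(u, rate, gp_scale):
    """Right-hand limit minus left-hand value of the density at the threshold."""
    left = rate * math.exp(-rate * u)
    right = math.exp(-rate * u) / gp_scale
    return right - left


if __name__ == "__main__":
    u, rate = 20.0, 0.1  # hypothetical threshold (mm) and body rate (1/mm)
    for gp_scale in (15.0, 1.0 / rate):
        print(gp_scale, jump_at_threshold(u, rate, gp_scale))
```

In this sketch the jump vanishes only for `gp_scale = 1 / rate`; any other tail scale leaves a discontinuity at the threshold. Resampling the range below the threshold instead of fitting a parametric body, as the mixed model does, removes the need for such a tie between parameters.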

The mixed model is derived from a first-order Markov process, described by a 3-state transition probability matrix (TPM), whose states represent dry days (d), wet days (w) and extremely wet days (e). Such states are intended to simulate the possible conditions of rainfall occurrence by setting apart low to moderate and extreme rainfall amounts in wet days, allowing one to employ different inference strategies for estimating precipitation volumes. The transition probability matrices are defined for every day of the year, by accounting for the empirical frequency of transitions between states in the random sample, which simply corresponds to the MLEs. The daily basis for the construction of the TPMs entails some degree of smoothing of the transition probabilities along the year, as compared to the usual monthly time span, and, to some extent, allows more complex dependence structures in the daily rainfall time series to be reproduced.

Simulation of low to moderate rainfall amounts is based on a data-driven procedure, by means of the bootstrap resampling technique (Efron, 1979). The underlying assumption behind this choice is that the available random samples should provide an appropriate summary of the variability of daily rainfall amounts, for each day of the year. By following this line of reasoning, no conjectures on a theoretical probabilistic model are necessary for describing this range of the random variable domain and no mathematical constraints are imposed for defining the threshold between transition states. In addition, as TPMs are defined for the daily time span, the nonparametric approach entails a more parsimonious model, as the parametric counterpart would require parameter estimation for a set of 365 models and, thus, a higher degree of uncertainty should result. Bootstrapping is performed inside moving windows, centered and symmetrically spread with respect to the day being simulated. The size of the moving windows is user defined. During the simulations, one may choose between all non-null values inside the window, providing larger sets for resampling, or between those records with identical combination of transition states, as a means for preserving, at least partially, the dependence structure of the data.

As for the modeling of extreme rainfall amounts, a fully parametric approach is employed, in order to allow the extrapolation for quantiles that are much larger than the observed records. In the application discussed in this paper, the LN4 distribution is utilized, since such a model presents an explicit upper bound parameter, which ensures that the generated rainfall depths are restrained to a physically plausible range. A detailed description of the referred probability distribution function is provided in Section 2.2.

Parameter estimation of the LN4 model is based on the Bayesian framework. An informative prior uncertainty distribution on the upper bound parameter is elicited by means of aggregating regional information from a large set of PMP estimates, following the suggestions of Fernandes et al. (2010). Since, in general, a single PMP estimate is available in a given catchment, one cannot obtain a full picture of PMP seasonal variability for modeling purposes and, as a result, a single set of parameters, as obtained from the posterior descriptive distribution, is utilized in the stochastic generator structure. This might appear incoherent from a theoretical point of view, as PMP estimates are expected to assume very distinct values in dry and wet seasons of the year. However, by fixing a conveniently high threshold, one may restrain the conditional probabilities of generating extreme rainfall events to very low values during the dry season, making it highly unlikely that daily rainfall amounts with the same order of magnitude of the PMP estimate are generated when physically plausible conditions for this are not met.

In short, the daily rainfall generation algorithm works as follows:

1. The conditional transition probabilities, for each day of the year, are estimated from the historical time series;
2. For each time step of the simulation, a random number is generated from a uniform distribution defined in the interval (0, 1). Such a random number is then compared to the conditional probabilities in the appropriate line of the TPM, which is defined by the rainfall occurrence state of the previous day. This allows one to construct a Markov chain by exploring the transitions between the 3 states of the TPM. The procedure is repeated until a series with the desired size is obtained;
3. From the Markov chain defined in the previous step, a vector of rainfall amounts is estimated. If, for a given day, state d is selected, the algorithm returns a null value. If, on the other hand, the chain is in state w, the rainfall amount is simulated by resampling inside a moving window centered on the day of interest. Finally, if state e holds, the rainfall amount on that day is obtained by the inverse function of the LN4 model; and
4. The overall procedure is repeated until a previously defined number of synthetic time series is simulated.

The performance evaluation of the model is based on the comparison of a collection of daily summary statistics for the synthetic series and those related to the observed ones, both with the same sample size, for each month of the year, as a means for checking the appropriate reproduction of seasonal features of daily rainfall. The summary statistics, as proposed by Srikanthan and Pegram (2009), are: (a) mean; (b) standard deviation; (c) coefficient of skewness; (d) average number of wet days; and (e) annual daily maximum rainfall amount. Due to the random nature of the generation algorithm, a particular synthetic time series may not reflect the statistical features of the original random sample, even though all generated series are considered equally likely realizations of the same stochastic process. Thus, for evaluating the performance of the algorithm, a large set of synthetic daily rainfall series is generated, from which the median values of the performance indexes will be extracted as point estimates, and the 2.5% and 97.5% associated quantiles provide an uncertainty measure. After the validation of the model for synthetic series with the same sample size as the observed records, the previously mentioned summary statistics, except for monthly annual maxima, are computed for 10,000-year long generated series, in order to evaluate whether the statistical similarity holds when very long rainfall series are simulated. This additional validation step is necessary for indirectly constructing flood quantile curves and estimating their corresponding predictive uncertainty. It is worth mentioning that, in the application discussed in Section 3, we have assumed that the ensemble of 10,000-year generated series is simulated from a stationary stochastic process. Such an option results from two modeling aspects. First, no significant monotonic trends or jumps were detected in the observed annual block-maxima sample, when applying, respectively, the Mann-Kendall and Pettitt hypothesis tests at a significance level of 5%. Second, the time window provided by the observed records (about 35 years) is considered too small for identifying a suitable model for any kind of trend, even if one had been, in fact, verified in the data.

2.2. The 4-parameter lognormal model and prior distributions for its parameters

Some models of frequent use in hydrology may present upper-bounded forms, due to particular combinations of parameter estimates. Among these models, one can mention the 3-parameter GEV, the log-Pearson type 3 (LP3) and the Kappa distributions. These bounded models, however, arise from general parametric forms, which also accommodate unbounded upper tails and do not present an explicit upper bound parameter, hindering potential associations with meteorological characteristics of the region under study. In addition, the upper-bounded forms of these models result
in negative coefficients of skewness, which are, for most applications, inappropriate for modeling maximum-related random variables.

Another class of distributions, with fewer applications in hydrologic sciences, comprises bounded models that are right-skewed. Of primary interest for this paper are those distributional models with such a shape characteristic and explicit upper-bound parameter, for which an informative prior uncertainty distribution can be elicited on the grounds of regional information from PMP estimates. Among these models, it is worth mentioning the Transformed Distribution Function, or TDF (Elíasson, 1997), the Extreme Value Type 4 distribution, or EV4 (Kanda, 1981), and the 4-parameter lognormal distribution, or LN4 (Slade, 1936). The latter will be used in the parametric module of the mixed model for daily rainfall generation.

The LN4 distribution is bounded in both lower and upper tails and arises from the following transformation:

$$Y = \ln\left(\frac{X - \varepsilon}{\alpha - X}\right) \qquad (1)$$

in which $\varepsilon$ is the lower bound parameter of the random variable $X$, $\alpha$ is the upper bound parameter, and the transformed random variable $Y$ follows a normal distribution with parameters $(\mu_Y, \sigma_Y^2)$, hereafter denoted by $\mu$ and $\sigma$. From a theoretical point of view, a null value for the annual maximum rainfall volume is, as a matter of fact, a possible realization of the phenomenon being studied. In addition, Fernandes et al. (2010) demonstrated that such an assumption does not significantly affect the upper tail and the overall behavior of the LN4 model. Thus, we assume, with no loss of generality, that the lower bound is zero, in order to obtain a more parsimonious model, with fewer prior uncertainty distributions to be elicited.

The probability density function of a random variable $X$, distributed according to the LN4 model and denoted by $X \sim \mathrm{LN4}(\mu, \sigma, \alpha)$, with location parameter $\mu \in \mathbb{R}$, scale parameter

is that in which $\mu \propto 1$, or, in other words, $\mu$ would follow a uniform distribution, defined in the interval $(-\infty, \infty)$. Such a distribution, however, is improper, since integrating it along the entire parameter domain would not result in a unity value. This fact, per se, is not a constraint for inference, as long as the posterior distribution is proper. However, according to Robert (2007), the use of improper distributions might entail inconsistent posterior results and should be avoided whenever possible. An alternative for circumventing the outlined problem is utilizing flat proper distributions as a means of reflecting the absence of previous knowledge. For instance, the prior uncertainty regarding parameter $\mu$ may be summarized by a normal distribution with very large variance, which implies that approximately the same probability density is attributed to every possible value in the real number line. A similar rationale may be employed for the scale parameter $\sigma$. Nonetheless, such a parameter is only defined for positive non-null real values, which makes the normal model an inappropriate choice for its prior distribution. Thus, a flat Gamma, which has domain $(0, \infty)$, is selected for describing the prior uncertainty on $\sigma$.

As for the upper bound parameter, the subjective prior uncertainty distribution is elicited on the basis of regional information gathered from a large set of 1-day PMP estimates. Insights on the shape and the dispersion of the theoretical distribution of these quantities may be obtained from a frequency histogram. As a rule of thumb, maximum-related hydrologic random variables are right-skewed, which makes distributions such as the 2-parameter lognormal and the Gamma model reasonable candidates for fitting the sample of PMP estimates. In addition, it seems plausible to admit, in view of our understanding of the physical processes encompassed by rainfall formation, that the variability pattern of both PMP and upper bound is similar, as the PMP estimates, although affected by estimation uncertainties, are intended to synthesize the most extreme meteorological conditions that are likely to occur in a given location. Thus, although the referred distributions are not expected to be identical, useful information on the
r 2 Rþ and upper bound parameter a 2 Rþ , is expressed as: prior distribution for the upper bound may be obtained from the
 i probabilistic model fitted to the PMP estimates.
a 1 h  x 
f X ðxjl; r; aÞ ¼ pffiffiffiffiffiffiffi  exp ln  l2
ð2Þ Adopting an unbounded right-skewed probabilistic model for
xða  xÞr 2p 2r2 ax summarizing the prior uncertainty on the upper bound parameter
might appear an incoherent modeling procedure. However, such a
The cumulative distribution function of the LN4 model is given
choice prevents the modeler from specifying, a priori, a finite range
by

for the domain of the upper bound parameter, which implicitly
1  x  l reflects our limited understanding on the processes of rainfall for-
F X ðxjl; r; aÞ ¼ U ln  ð3Þ
r ax r mation under extreme meteorological conditions. In addition, by
using a flexible light-tailed distribution for summarizing the
where U denote the standard normal distribution.
uncertainty on the upper bound, very low probabilities are attrib-
From a modeling perspective, the upper bound is the only
uted to those rainfall amounts that are considered physically
parameter of the LN4 model that can be physically related to mete-
implausible by the modeler. On the basis of these arguments, the
orological characteristics of the catchment. In this sense, informa-
Gamma distribution emerges as a convenient option for eliciting
tion regarding extreme precipitation amounts, such as maximum
the prior distribution of the upper bound parameter.
rainfall-related deterministic quantities or maximum precipitation
The Gamma distribution, with scale parameter ba 2 Rþ and
envelope curves, may be used for eliciting informative or subjec-
shape parameter qa 2 Rþ , may be fully described by: (1) an esti-
tive prior distributions on a. Location and scale parameters, l
mate of the regional coefficient of variation, CVa , since, from the
and r on the other hand, have no clear interpretation with respect
to hydrometeorological processes, and, as a result, eliciting subjec- method of moments, q ^ a ¼ CV2a ; and (2) a statement on the non-
tive prior uncertainty distributions on these parameters becomes a exceedance probability of the at-site PMP estimate, as hyperpa-
complex task. In fact, apart from expressing location and disper- rameter ba can be estimated from Pða 6 PMPjqa ; ba Þ ¼ p
sion measures, l and r also affect the skewness and the upper tail (Fernandes et al., 2010). If one assumes similarity between the
of the LN4 model, respectively. Due to this fact, it becomes difficult variability pattern of PMP and upper bound, CVa may be readily
to clearly associate these parameters to meteorological or hydro- obtained from the former’s sample summary statistics. Attributing
logical covariates, and, thus, aggregate prior data-independent a non-exceedance probability of the at-site PMP estimate, on the
knowledge to the analysis. Therefore, as no consistent prior infor- other hand, is a nontrivial task. In effect, to our knowledge, no
mation on l and r is available, non-informative or objective prior entirely objective techniques for this purpose are discussed in
distributions are prescribed for both parameters. the literature. Due to this fact, an ad hoc procedure is employed
Location parameter l may assume any value in the real number here. As water vapor availability in distinct locations is expected
line. Hence, an intuitive choice for the objective prior distribution to affect both extreme and aggregated annual rainfall amounts,
all regional PMP estimates, including the at-site counterpart, were normalized by the mean annual rainfall, in order to account for spatial variability of precipitation. Next, empirical cumulative non-exceedance probabilities were attributed to the resulting variates with the use of the Weibull plotting position. Finally, the empirical probability obtained for the location of interest is used in the estimation of $\beta_\alpha$.

Fernandes et al. (2010) and Costa et al. (2015) also evaluated the potential effects of eliciting a non-informative prior distribution for the upper bound parameter. These studies showed that the data, alone, are not able to provide useful information for estimating flood and precipitation upper bounds in the absence of prior data-independent knowledge. In other words, after the application of Bayes' theorem, the posterior distribution of $\alpha$ closely resembles the prior counterpart. As a result, when using an objective prior distribution and a quadratic loss function in parametric inference, Costa et al. (2015) obtained point estimates as large as 100,000 mm for the maximum daily rainfall upper bound, which is virtually impossible in any catchment on Earth. Hence, in the application discussed in this paper, only the proposed informative prior uncertainty distribution is employed.

2.3. Calibration of parameters of conceptual hydrologic models in a Bayesian framework

Flood formation processes arise from a complex collection of factors, which account for the variation, in both time and space, of rainfall, losses and storages in a given catchment. Due to the complexity of these processes, it is admittedly unfeasible to quantify the hydrologic response of the catchment by integrating the differential equations that describe mass, momentum and energy transfers between phases of the water cycle, as a result of a particular input condition. Thus, it is usual to resort to simplified mathematical schemes, which describe water transfers by routing the inputs through linear or nonlinear fictional reservoirs. This is the rationale behind conceptual rainfall-runoff models, which can be generally expressed, at each simulation time step, as

$$\tilde{y}_t = h(X\mid\theta)_t + e_t \tag{4}$$

where $\tilde{y}_t$ is the observed streamflow at time $t$, $h(X\mid\theta)_t$ is the hydrologic model response, $X$ denotes the model input vector (which usually comprises mean areal precipitation and evapotranspiration), $\theta$ corresponds to the model parametric vector and $e_t$ is a random error term. Parameters of hydrologic models are intended to synthesize the routing properties of unsaturated and saturated soil phases, as well as the ones of the surface runoff counterpart, in the catchment response. For most models, however, parameters are not measurable quantities. As a result, the parametric vector is generally identified by means of nonlinear regression models, constructed in such a way that the differences between simulated and recorded daily streamflows are minimized. Such an expedient is denoted calibration.

Calibration of parameters of conceptual hydrologic models has been traditionally addressed as an optimization problem (Kavetski et al., 2003). In effect, the modeler's main target is usually estimating a set of parameters that minimizes the simulation residuals, with respect to a particular objective function, by means of a regression model that does not consider input uncertainty. In this sense, if one assumes that the calibration residuals are independent Gaussian variates, a conventional least squares procedure suffices for providing unbiased estimators for the rainfall-runoff model parameter vector. However, such an approach ignores relevant aspects of the hydrologic model structure. First, since water transfers are mainly governed by routing of inputs through the model reservoirs, it is intuitively expected that some degree of autocorrelation is present in the residuals time series. In addition, from a theoretical point of view, such a serial correlation only disappears when a given reservoir in the model structure is completely emptied. This clearly violates the assumption of independent error variates. Second, most calibration criteria based on a single set of parameters have proved unable to accommodate the distinct physical mechanisms that affect different parts of a hydrograph, particularly those related to fast changes in input rates (Kavetski et al., 2003). This fact often entails distinct variances (heteroscedasticity) and skewed distributions for the residuals, which, in turn, violates the assumption that such variables are identically distributed Gaussian variates (Schoups and Vrugt, 2010; Silva et al., 2014).

Specifying a suitable probabilistic model for calibration residuals has been a key research topic in hydrologic modeling (Schoups and Vrugt, 2010; Silva et al., 2014). In fact, the IID Gaussian assumption has been shown to introduce significant bias to parameter estimates, which should probably result in a reduction of the predictive abilities of the hydrologic model (Kavetski et al., 2003; Schoups and Vrugt, 2010; Silva et al., 2014). Thus, by employing a distributional parametric form that is able to accommodate serial correlation and skewed residuals distribution, one is expected to obtain more reliable estimates of the hydrologic model parameters. However, an appropriate error model, alone, may not be sufficient for the correct identification of the parametric vector. In effect, the need for accounting for input uncertainty as a means for reducing bias in parameter estimation has been highlighted in several studies (e.g., Kavetski et al., 2003 and references therein). In this context, an inference framework that allows separating and explicitly treating input and model structure uncertainties appears to be required.

The outlined problem may also be properly addressed by resorting to the Bayesian school of inference. In fact, observed inputs are usually areal average-based estimates, which are expected to be corrupted by measurement errors. Throughout routing procedures in hydrologic simulation, input errors are propagated through the model structure. In addition, the hydrologic model is itself, at best, a simplification of the real physical process and, thus, it also introduces errors to the analysis. Both error sources are to be combined in order to estimate the catchment simulated response. Such a situation is illustrated in Fig. 1. If data-independent prior uncertainty distributions are elicited for input, model structure, model parameters and calibration residuals, Bayes' theorem allows one to obtain a joint posterior distribution, conditioned on both observed input and output records, which summarizes the updated knowledge on the referred quantities. Furthermore, by integrating such a distribution along the parametric space, the predictive uncertainty of each simulated daily streamflow may be properly quantified.

In addition to prior uncertainty distributions, the application of the Bayesian inference framework requires the specification of a likelihood function, which, in the context of rainfall-runoff modeling, describes the degree of similarity between simulated and observed streamflows. Schoups and Vrugt (2010) introduced the generalized log-likelihood function, which is derived from an autocorrelated skewed model for calibration residuals. In such a probabilistic model, simulation of daily streamflow is performed on the basis of a nonlinear regression additive model, which is expressed as

$$\tilde{Y} = E + e \tag{5}$$

where $\tilde{Y}$ is a vector of observed streamflows, $E$ denotes the vector of flow expected values, which is obtained from the hydrologic model, and $e$ is a vector of zero mean modeling residuals, which accounts for model structure errors. At each time step, flow expected values are given by
$$E_t = Y_{h,t}(X\mid\theta)\,\mu_t \tag{6}$$

in which $Y_{h,t}(X\mid\theta)$ is the simulated streamflow and $\mu_t$ is a multiplicative bias factor, which is intended to account for input model errors by amplifying the nonlinearity of the catchment response (Schoups and Vrugt, 2010) and is expressed as

$$\mu_t = \exp(\mu_h Y_{h,t}) \tag{7}$$

where $\mu_h$ is a bias parameter which is inferred from the data. As for calibration residuals, which are described by a joint probability density function and a parametric vector $\theta_e$, the following model is proposed:

$$\Phi_p(B)\,e_t = \sigma_t a_t \tag{8}$$

in which the term $\Phi_p(B) = 1 - \sum_{i=1}^{p}\phi_i B^i$ is an autoregressive polynomial of order $p$, which is intended to remove serial correlation; $B$ is the backshift operator; $\sigma_t$ is the standard deviation at time $t$, expressed as

$$\sigma_t = \sigma_0 + \sigma_1 E_t \tag{9}$$

with parameters $\sigma_0$ and $\sigma_1$, and whose purpose is handling heteroscedasticity; and $a_t$ is an IID random error term, with zero mean and unity standard deviation, described by the skew exponential power (SEP) density function, and is intended to accommodate nonnormality of residuals. The probability density function of a SEP distributed variate $a_t$, with shape parameter $\xi > 0$ and kurtosis parameter $\beta$, defined in $(-1, 1)$, is given by

$$p(a_t\mid\xi,\beta) = \frac{2\sigma_\xi}{\xi+\xi^{-1}}\,\omega_\beta \exp\left\{-c_\beta\,|a_{\xi,t}|^{2/(1+\beta)}\right\} \tag{10}$$

where $a_{\xi,t} = \xi^{-\mathrm{sign}(\mu_\xi+\sigma_\xi a_t)}(\mu_\xi+\sigma_\xi a_t)$, and $\mu_\xi$, $\sigma_\xi$, $c_\beta$ and $\omega_\beta$ are computed as functions of $\xi$ and $\beta$ (see Schoups and Vrugt, 2010, for details). Parameter $\beta$ controls the peakedness of the PDF whereas parameter $\xi$ is related to its skewness. The PDF is symmetric for $\xi = 1$, positively skewed for $\xi > 1$ and negatively skewed otherwise. If $\xi = 1$, then a Gaussian distribution results when $\beta = 0$, a uniform distribution when $\beta = -1$ and a Laplace distribution when $\beta = 1$ (Schoups and Vrugt, 2010). The SEP distribution constitutes a useful parametric form for describing calibration residuals as it relaxes the traditional IID Gaussian assumption for the errors and, at the same time, has the normal probability density as a limiting distribution. In addition, as the SEP distribution may present heavier tails than the Gaussian error model, parameter inference may be effectively more robust to the presence of outliers, which, in turn, can improve the predictive abilities of the rainfall-runoff model.

Fig. 1. Propagation of errors in hydrological modeling. Adapted from Kavetski et al. (2003).

The log-likelihood function of the outlined error model, denoted generalized log-likelihood function (GL), can thus be expressed as (Schoups and Vrugt, 2010)

$$L(\tilde{Y}\mid\theta) = N\ln\frac{2\sigma_\xi\,\omega_\beta}{\xi+\xi^{-1}} - \sum_{t=1}^{N}\ln(\sigma_t) - c_\beta\sum_{t=1}^{N}|a_{\xi,t}|^{2/(1+\beta)} \tag{11}$$

in which $N$ is the sample size, as expressed by the number of daily time spans of the training period. Eq. (11) is a conditional form of the likelihood function, which was shown to present close agreement with the exact counterpart for large $N$ (Sorooshian and Dracup, 1980), this being typically the case in rainfall-runoff modeling.

Performance evaluation of the described calibration procedure is based on the following indexes: (1) root mean squared error (RMSE); (2) ratio of simulated and observed volumes (RV); (3) Nash-Sutcliffe criterion (NS); and (4) Pearson correlation coefficient (r). In addition, a visual assessment of the residuals behavior is performed in order to validate the overall set of prior modeling assumptions. In these procedures, the modes of the marginal posterior distributions are utilized as point estimates for the parameters of the rainfall-runoff model.

2.4. Construction of flood quantile curves

The final step of the proposed method consists in estimating the flood quantile curves and the corresponding predictive uncertainty. In standard Bayesian flood frequency analysis, such a procedure is performed by integrating the inverse cumulative distribution function of the floods distributional model, at each
quantile of interest, along the entire parametric space. The resulting quantile curve arises from the predictive posterior distribution and expresses the variability of both flood formation processes and parameter estimates. For rainfall-runoff modeling, on the other hand, the predictive uncertainty may be estimated through the integration of the hydrologic model parameter joint posterior distribution at each simulated daily streamflow. In our modeling approach, however, none of the outlined strategies for assessing uncertainty is entirely appropriate. In fact, as flood quantiles are indirectly estimated from hydrologic simulation, an alternative uncertainty estimation framework, which comprises characteristics of both modeling strategies, is necessary.

An ad hoc procedure in this context consists in simulating the rainfall-runoff model for a large set of synthetic daily rainfall series and different parameter point estimates, as obtained from the joint posterior distribution, for the hydrologic model parametric vector. For each model run, a set of notable flood quantiles is computed. Next, for a given flood quantile, the resulting estimates are ranked, and, finally, the order statistics corresponding to empirical cumulative probabilities of 0.025, 0.50 and 0.975 are extracted from the quantile samples in order to define a median curve and its credible intervals. This line of reasoning is intended to approximate the integrals of the posterior distribution along the parametric space at each quantile of interest.

For the application discussed in the next section, an ensemble of 1,000 time series, with size 10,000 years, is simulated with the mixed model for daily rainfall generation. Each of these series is assigned a specific realization of the joint distribution of the hydrologic model parameters, which is obtained from calibration. Then, the quantiles corresponding to return periods of 2, 5, 10, 25, 50, 100, 500, 1,000 and 10,000 years are estimated along with the predictive uncertainty. The resulting curves are then compared to available empirical estimates and other flood estimation methods, such as Bayesian flood frequency analysis encompassing bounded and unbounded distributional models.

3. Case study

3.1. Study area and dataset

The method for indirect estimation of extreme flood quantiles was applied to the American River catchment, at a river cross section defined by the Folsom Dam, whose construction was completed in 1955. The Folsom Dam is located upstream of the city of Sacramento, California. At this point, the catchment encompasses a drainage area of approximately 4,820 km², which comprises mountainous regions with glaciated peaks in the upper basin, forests in the middle portion and alluvial plains near Folsom Lake (USBR, 2002). The American River catchment was selected for the present application due to the availability of systematic records, of both streamflow and rainfall, as well as non-systematic flood information, which plays an important role in the description of the upper tail of the maximum floods distribution. In addition, several flood-related hazard studies have been conducted at the Folsom Dam (USBR, 2002), which allows a comparison of performances for different modeling approaches.

Climate in the American River basin is mostly classified as Mediterranean, with a wet season in the winter months (between November and April), in which about 90% of runoff-producing storms occur (USACE, 1987), and a hot dry season throughout the remainder of the year. According to USBR (2002), extreme flood-producing winter storms are primarily related to westerly warm moist Pacific systems, which interact at nearly right angles with the Sierra Nevada, resulting in intensification of orographic effects. Mean annual precipitation (MAP) greatly varies in the catchment due to the influence of elevation. As an example, in the Sacramento Valley the mean annual precipitation is approximately 460 mm whilst in the crests of the Sierra Nevada such a quantity assumes values around 1,800 mm.

In addition to rainfall, snowmelt is also a contributing factor for runoff production, particularly in the upper portions of the American River catchment. Nonetheless, USBR (2002) highlights the limited role of snowmelt in large flood dynamics at Folsom Dam. In fact, the technical report states that the snowmelt runoff rates are usually small and, as a result, most flooding conditions at this location are related to rainfall events. Due to this fact, the effects of snowmelt will be neglected in all subsequent analyses of this paper.

Streamflows at Folsom Dam are strongly regulated. According to USBR (2002), there exist 58 reservoirs upstream of this river section. Five of them, completed by the early 1960s, control 90% of the storage upstream of the Folsom Dam cross section and correspond to 14% of the drainage area of the catchment. Studies of the United States Army Corps of Engineers (USACE, 1998) estimate that, during large storm events, the set of upstream storage structures entails a reduction of 14%–18% in the peaks of inflow hydrographs.

Systematic streamflow gauging has been available since the beginning of the 20th century at the Fair Oaks gauging station (code USGS11446500), which is located immediately downstream of Folsom Lake and has an incremental area of 70 km² as compared to the river section at Folsom Dam. Discharge estimates in these locations are usually considered equivalent (NRC, 1999). Daily streamflow data are available at https://waterdata.usgs.gov. For the purposes of lumped hydrologic simulation, unregulated daily streamflows are required. As limited information on private dam systems is available and the five largest reservoirs in the catchment were only fully implemented in the 1960s, we considered that all discharges measured prior to this date are approximately unregulated. Thus, a streamflow training dataset, comprising the water years between 1940 and 1945, was employed for applying the Bayesian calibration framework, and a 3-year period, encompassing the water years between 1945 and 1948, was utilized for validation. The streamflow sample does not present missing values.

In addition to systematic data, a collection of paleoflood estimates is available at the Fair Oaks streamflow gauging station. By resorting to geological evidence and two-dimensional hydraulic modeling, USBR (2002) reconstructed at least five major floods at the Folsom Dam section: one of them in 1862, three of them between 150 and 700 years ago, and the largest one between 700 and 2000 years ago (using as reference the year 2000). Apart from the 1862 event, the remaining floods have magnitudes up to three times higher than the maximum unregulated observed record, which makes such events useful for evaluating the predictive ability of the proposed method outside the range of systematic records. Readers are referred to USBR (2002) for details on the reconstruction of paleofloods in the American River.

As for daily rainfall data, a set of 5 rain gauging stations was selected for computing the mean areal rainfall by means of the Thiessen polygons method: Blue Canyon (code USW00023225), Colfax (code CA95713), Placerville (code USC00046960), Represa (code USC00047370) and Twin Lakes (code USC00049105). Daily rainfall information was obtained from http://www1.ncdc.noaa.gov. The training dataset for stochastic daily rainfall generation comprised the years between 1961 and 1995, which corresponds to the longest period of record with no missing values in all gauging stations. Time spans coincident with those proposed for systematic streamflow records, 1940–1945 and 1945–1948, were selected, respectively, for calibration and validation of the rainfall-runoff model.
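As an illustration of the Thiessen polygons method mentioned above, the following minimal Python sketch computes a mean areal rainfall as an area-weighted average of the gauge records. The polygon areas and the daily rainfall depths are hypothetical placeholders, since they are not reported in the text; the areas were merely chosen so as to add up to the 4,820 km² drainage area. The function name `mean_areal_rainfall` is likewise an assumption made for this sketch.

```python
# Hypothetical Thiessen polygon areas (km^2) for the five rain gauges.
# These are NOT values from the paper; they only sum to the 4,820 km^2 basin area.
THIESSEN_AREA_KM2 = {
    "Blue Canyon": 1500.0,
    "Colfax": 900.0,
    "Placerville": 1100.0,
    "Represa": 420.0,
    "Twin Lakes": 900.0,
}


def mean_areal_rainfall(daily_mm, areas_km2):
    """Area-weighted (Thiessen) average of gauge rainfall depths, in mm."""
    total_area = sum(areas_km2.values())
    weighted_sum = sum(areas_km2[gauge] * daily_mm[gauge] for gauge in areas_km2)
    return weighted_sum / total_area


# Hypothetical rainfall depths (mm) recorded at each gauge on a single day.
example_day_mm = {
    "Blue Canyon": 40.0,
    "Colfax": 25.0,
    "Placerville": 20.0,
    "Represa": 5.0,
    "Twin Lakes": 35.0,
}

areal_mm = mean_areal_rainfall(example_day_mm, THIESSEN_AREA_KM2)
```

Applying the function to each day of the record would yield the mean areal rainfall series used as input to a lumped model; in practice, the weights would be derived from the actual polygon geometry.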
Finally, evaporation data were obtained from the technical report TR NWS 34 (NOAA, 1982), which provides a summary of monthly potential evaporation rates at several locations in the United States. As the daily time span is used for hydrologic modeling, evaporation estimates on a monthly basis were equally partitioned among the days of the month being simulated. Although, at first sight, this procedure may introduce large input uncertainties to the analysis, Kavetski et al. (2003) and Silva et al. (2014) state that most calibration procedures are somewhat insensitive to changes in evapotranspiration rates. Hence, in the present application, we assume the uncertainty in daily evaporation measurements is at least partially accommodated in the prior distribution elicited for the Rio Grande model parameter k.

3.2. Estimation of the parameters of the LN4 model

Estimation of the parameters of the LN4 model is based on the Bayesian framework, in which prior uncertainty distributions are elicited for $\alpha$, $\mu$ and $\sigma$. As stated in Section 2.2, non-informative prior distributions are utilized for location and scale parameters. In this application, we have considered that $\mu \sim \mathrm{NORMAL}(1.0,\ 1.0\times10^{-6})$ and $\sigma \sim \mathrm{GAMMA}(1.0,\ 1.0\times10^{-8})$, where the hyperparameter values refer, respectively, to the mean and the precision of the Gaussian variate and the scale and shape parameters of the Gamma counterpart (Fernandes et al., 2010; Costa et al., 2015). As for the upper bound parameter, regional information on 1-day PMP estimates is required for constructing the subjective prior uncertainty distribution. A collection of 39 meteorological PMP estimates in the state of California is provided by the hydrometeorological report HMR 59 (NOAA, 1999), including the at-site 1-day PMP estimate at Folsom Dam, which corresponds to 370 mm. Since the locations in which PMP estimation was performed encompass very distinct drainage areas, Depth-Area-Duration (DAD) relations were used for reducing the PMP estimates to appropriate values at the Folsom Dam river section, according to the procedure suggested by the HMR 59 (NOAA, 1999).

A frequency histogram was constructed for the reduced PMP estimates. Such a graphic tool is presented in Fig. 2, along with the PDF of a Gamma distribution function fitted to the PMP sample. It is possible to observe that the histogram is bimodal and slightly skewed to the right. Most parametric distributions of frequent use in hydrology are not able to accommodate sample bimodality. However, the Gamma distribution provides an appropriate fit as far as the positive skewness property is concerned. In addition, goodness-of-fit hypothesis tests, with different power of discrimination in the tails (Kolmogorov-Smirnov, Anderson-Darling and Filliben), resulted in not rejecting the null hypothesis that the PMP sample may be extracted from a Gamma population at the significance level of 5%. As the patterns of variability of both upper bound and PMP estimates are considered similar in our modeling strategy, the Gamma model appears to be a reasonable choice for summarizing the uncertainty regarding $\alpha$ and is, thus, selected for statistical inference. However, as mentioned earlier, the prior distribution of the upper bound parameter is not expected to match that of PMP estimates and hence estimates for the hyperparameters of the upper bound prior distribution are still required.

From the set of reduced PMP estimates, summary statistics were computed and a value of 0.304 was obtained for the sample regional coefficient of variation, which, in the proposed modeling framework, is assumed to equal that of the daily precipitation upper bound prior distribution, $CV_\alpha$. By employing the method of moments, it follows that $\rho_\alpha = 10.821$. Estimation of parameter $\beta_\alpha$ relied on assigning a non-exceedance probability to the at-site estimate, $P(\alpha < 370 \mid \rho_\alpha, \beta_\alpha)$. For this purpose, the 39 PMP estimates were normalized by the corresponding mean annual rainfall, in order to account for spatial variation of moisture availability, and empirical probabilities, as obtained from the Weibull plotting position, were attributed to the ranked adjusted estimates. Following this line of reasoning, the non-exceedance probability for the at-site PMP estimate was found to be 0.084 and thus $\beta_\alpha = 0.0181$. The elicited prior distribution for the upper bound parameter is therefore $\alpha \sim \mathrm{GAMMA}(10.821,\ 0.0181)$.

The posterior distribution of the LN4 parameters is then expressed as

$$
\begin{aligned}
p(\alpha,\mu,\sigma\mid X) \propto{} & \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N} \prod_{i=1}^{N}\frac{\alpha}{x_i(\alpha-x_i)} \exp\left\{-\frac{1}{2}\sum_{i=1}^{N}\left[\frac{\ln\left(\frac{x_i}{\alpha-x_i}\right)-\mu}{\sigma}\right]^2\right\} \\
& \times \frac{\beta_0^{\rho_0}}{\Gamma(\rho_0)}\,\alpha^{\rho_0-1}\exp(-\beta_0\alpha) \\
& \times \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left\{-\frac{1}{2}\left[\frac{\mu-\mu_1}{\sigma_1}\right]^2\right\} \\
& \times \frac{\beta_1^{\rho_1}}{\Gamma(\rho_1)}\,\sigma^{\rho_1-1}\exp(-\beta_1\sigma)
\end{aligned} \tag{12}
$$

where $\beta_0$ and $\rho_0$ are the hyperparameters of the prior distribution of $\alpha$, $\mu_1$ and $\sigma_1$ are the hyperparameters of the prior distribution of $\mu$, and $\beta_1$ and $\rho_1$ are the hyperparameters of the prior distribution of $\sigma$.
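The elicitation of the upper bound prior hyperparameters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gamma prior is taken in the shape and rate form that appears in Eq. (12), the regularized incomplete gamma function is evaluated with a plain series expansion, and the rate is found by bisection. The function names `gamma_cdf` and `elicit_gamma_prior` are hypothetical.

```python
import math


def gamma_cdf(x, shape, rate, tol=1e-12, max_terms=10_000):
    """P(alpha <= x) for a Gamma(shape, rate) variate, via the series
    expansion of the regularized lower incomplete gamma function."""
    z = rate * x
    if z <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    for n in range(1, max_terms):
        term *= z / (shape + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def elicit_gamma_prior(cv, pmp, p_nonexceed):
    """Shape from the method of moments (CV**-2); rate such that
    P(alpha <= pmp) equals the assigned non-exceedance probability."""
    shape = cv ** -2
    lo, hi = 1.0e-6, 1.0  # assumed bracketing interval for the rate
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(pmp, shape, mid) < p_nonexceed:
            lo = mid  # the CDF at a fixed x increases with the rate
        else:
            hi = mid
    return shape, 0.5 * (lo + hi)


# Inputs quoted in the text: CV = 0.304, at-site PMP = 370 mm, p = 0.084.
rho_alpha, beta_alpha = elicit_gamma_prior(cv=0.304, pmp=370.0, p_nonexceed=0.084)
```

With the inputs quoted in the text, the sketch should return a shape close to 10.82 and a rate of approximately 0.018; small departures from the reported 0.0181 may arise from rounding of the inputs.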
The posterior distribution of the LN4 parameters was explored by means of Markov Chain Monte Carlo (MCMC) sampling algorithms. Numerical simulations were performed with software WinBUGS (Lunn et al., 2000). Calculations of the Brooks-Gelman-Rubin statistic (Brooks and Gelman, 1998) showed that the Markov chains attained convergence after approximately 40,000 iterations. In addition, a trial-and-error procedure demonstrated that a lag of 20 is suitable for producing non-correlated samples of the joint posterior distribution. Thus, after discarding the burn-in samples and applying the selected lag, a sample of size 50,000 was retained for subsequent analyses. Plots of marginal variation of parameter values (not shown here) attest that, as desired, no tendencies or changes in variance are verified in the samples of the posterior distribution.
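For readers who wish to reproduce the sampling outside WinBUGS, the unnormalized log-posterior of Eq. (12) and the burn-in and thinning bookkeeping described above can be sketched as follows. The code is a minimal illustration under assumptions: the Gamma priors are taken in shape and rate form, the normal prior precision of 1.0 × 10⁻⁶ is converted to a standard deviation of 1,000, the order (shape, rate) for the prior on the scale parameter is assumed by analogy with the upper bound prior, and the rainfall sample is hypothetical. The names `ln4_log_posterior` and `thin_chain` are not from the paper.

```python
import math


def ln4_log_posterior(alpha, mu, sigma, x,
                      rho0=10.821, beta0=0.0181,  # Gamma prior on alpha (from the text)
                      mu1=1.0, sigma1=1000.0,     # normal prior on mu (precision 1e-6)
                      rho1=1.0, beta1=1.0e-8):    # Gamma prior on sigma (assumed order)
    """Logarithm of the right-hand side of Eq. (12), up to an additive constant."""
    if sigma <= 0.0 or min(x) <= 0.0 or alpha <= max(x):
        return -math.inf  # outside the support of the LN4 model
    n = len(x)
    y = [math.log(xi / (alpha - xi)) for xi in x]
    log_lik = (-n * math.log(math.sqrt(2.0 * math.pi) * sigma)
               + sum(math.log(alpha / (xi * (alpha - xi))) for xi in x)
               - 0.5 * sum(((yi - mu) / sigma) ** 2 for yi in y))
    log_prior_alpha = (rho0 * math.log(beta0) - math.lgamma(rho0)
                       + (rho0 - 1.0) * math.log(alpha) - beta0 * alpha)
    log_prior_mu = (-math.log(math.sqrt(2.0 * math.pi) * sigma1)
                    - 0.5 * ((mu - mu1) / sigma1) ** 2)
    log_prior_sigma = (rho1 * math.log(beta1) - math.lgamma(rho1)
                       + (rho1 - 1.0) * math.log(sigma) - beta1 * sigma)
    return log_lik + log_prior_alpha + log_prior_mu + log_prior_sigma


def thin_chain(draws, burn_in=40_000, lag=20):
    """Discard the burn-in draws and keep every lag-th draw afterwards."""
    return draws[burn_in::lag]


# Hypothetical annual maximum daily rainfall sample (mm), for illustration only.
sample_mm = [62.0, 75.5, 48.3, 91.2, 110.4, 57.8, 83.1, 69.9]
log_post = ln4_log_posterior(489.4, -1.795, 0.424, sample_mm)
```

Any general-purpose sampler, for instance a random-walk Metropolis algorithm, could use `ln4_log_posterior` as its target; returning negative infinity outside the support ensures that proposals with an upper bound below the largest observation are always rejected.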
Fig. 2. Empirical distribution of reduced 1-day PMP estimates and the Gamma distribution fitted to the PMP sample.

Table 1 summarizes some posterior results for parameters $\alpha$, $\mu$ and $\sigma$. One can observe that a physically reasonable point estimate is obtained for the upper bound parameter, which should prevent the simulation of exceptionally large rainfall amounts from the posterior descriptive distribution in the parametric
Table 1
Posterior summaries for the LN4 parameters.

Parameter   Mean    SD      CV      95% HPD (a)
r           0.424   0.083   0.196   (0.300; 0.622)
l           1.795   0.430   0.240   (2.518; 0.824)
a           489.4   162.8   0.332   (218.8; 852.2)

(a) 95% HPD: highest posterior density interval (95% credible interval).

module of the stochastic daily rainfall generator. In addition, such an estimate is about 32% larger than the 1-day PMP counterpart. It is then possible to infer that attributing a null exceedance probability to the at-site PMP estimate might lead to ill-posed estimates of extreme rainfall quantiles. It is also worth mentioning that the posterior distribution of a assigns a higher non-exceedance probability to the at-site PMP estimate, as compared to the prior counterpart. This fact results from the limited range of the records of annual daily rainfall block-maxima. In effect, as the maximum observed value is well below the at-site PMP estimate in the random variable domain, the posterior distribution is expected to move to the left in the real number line.

As no prior information was available for the location and scale parameters, it is not feasible to render an objective evaluation of the posterior inference of l and r. It has to be noted, however, that the marginal posterior distributions of l and r are unimodal and approximately symmetric.

3.3. Calibration and performance evaluation of the mixed model for stochastic daily rainfall generation

Calibration of the mixed model for stochastic daily rainfall generation requires the selection of a threshold between low to moderate and extreme rainfall and the definition of the resampling window size for the former quantities. The threshold level directly affects the behavior of the upper tail of the distribution of maximum daily precipitation. In fact, a very low value for the threshold implies that rainfall volumes will be frequently sampled from the parametric module of the generator and, thus, overestimation of precipitation quantiles for most return periods is likely to occur. A very high value for the threshold, on the other hand, inhibits extreme rainfall generation and may constrain the generated amounts to the range of the observed records. The resampling window size, in turn, controls the variability of low to moderate daily rainfall volumes. A small window will reduce the computational effort. However, as few records are available for resampling, the simulations may not suitably capture the entire range of variation of daily rainfall amounts. Thus, for simulation efficiency of very long series (e.g., 10,000 years), a tradeoff between descriptive ability and calculation demands must be reached in the generation process.

For selecting the threshold, median quantile curves were obtained from ensembles of 1,000 35-year long daily rainfall series, simulated for several threshold levels, and compared to the empirical quantiles, with non-exceedance probabilities given by the Weibull plotting position, by means of the RMSE criterion. The optimum value of the performance index was attained for a threshold of approximately 80 mm. It has to be noted that such a level hinders the occurrence of extreme rainfall during the months of April to October. Nonetheless, as the largest flood events are reported to occur during the winter (USBR, 2002), we considered that such a constraint does not significantly affect the indirect estimation of extreme flood quantiles and that the nonparametric module of the mixed generation model suffices for simulating daily rainfall in the dry season.

As for the resampling window size, partition intervals of 7, 14 and 28 days were tested. Simulations of median curves of 1,000 model runs showed that mean daily rainfall, in each month of the year, is properly reproduced for any of the proposed intervals. However, daily standard deviations in winter months are underestimated for the 7-day window size. The partition intervals of 14 and 28 days, in turn, led to very similar performances regarding daily variability. Thus, in order to reduce the computational effort, the window size of 14 days was selected.

Fig. 3. Comparison of statistics of the simulated series (dashed line) and the observed one (continuous line) for each month of the year: (a) mean daily rainfall; (b) coefficient of variation; (c) coefficient of skewness; (d) average number of wet days; and (e) maximum daily rainfall. The X axis corresponds to the months of the year.

Daily summary statistics were then computed for the calibrated model, considering an ensemble of 1,000 35-year long series. Fig. 3 presents some results. Mean values of daily rainfall are properly reproduced by the mixed model (Fig. 3a) during the entire year, and the 95% credible intervals show little dispersion around the central value, particularly in the dry season. Point estimates of daily coefficients of variation (Fig. 3b) also closely agree with the empirical counterpart, with a single mismatch in July. Wider variation intervals, however, are obtained in the summer months. As

for the coefficient of skewness (Fig. 3c), the empirical estimates were suitably reproduced only in the wet season. Nonetheless, coefficients of skewness of magnitude as high as 10 were simulated by the mixed model. In addition, all monthly estimates, apart from that of July, are contained inside the 95% credible intervals, indicating that the mixed generator is able to model even complex empirical distributions. The mean number of wet days per month (Fig. 3d), in turn, was consistently underestimated by the simulation algorithm. This fact is probably related to the daily time span employed in the construction of the Markov chains, as most models based on monthly (or longer) time spans are effective in simulating such a statistic.

The mixed model also proved to be able to reproduce the maximum daily rainfall amounts in almost all months of the year (Fig. 3e). The exception is January. During the dry season, daily maxima estimates are obtained from the nonparametric module of the generator, which hinders the possibilities of extrapolation in those months. However, in all winter months the daily rainfall maxima are simulated from the LN4 model, which indicates the suitability of the calibrated threshold, at least for the period-of-record utilized for training the mixed model. From these results, it seems reasonable to assume, at least for the present dataset, that the LN4 distribution provides an appropriate description of maximum daily rainfall from the threshold up to the upper bound.

Similar results (not shown here) were obtained for the ensemble of 10,000-year long time series, except for annual daily rainfall maximum estimates, which, as expected, are much larger (e.g., median value of 240 mm in February) than the ones obtained from the 35-year long series. This fact highlights the adequacy of the mixed model for simulating very long series while preserving important statistical features of the original sample.

As the focus of this paper is the modeling of extreme rainfall events, summary statistics related to time intervals other than the day are not addressed here. However, the mixed model shares useful properties with strictly nonparametric models, which allows it to suitably reproduce the variance of monthly and annual rainfall solely by aggregating the daily amounts (Costa et al., 2015). Such a property is not enjoyed by most parametric counterparts (Boughton, 1999) and provides broader scopes for utilizing the mixed model in simulation procedures.

3.4. Calibration of the Rio Grande rainfall-runoff model

The runoff production module of the Rio Grande rainfall-runoff model requires the calibration of 13 parameters (see Appendix A for details). In the proposed Bayesian framework, each of these parameters is assigned a prior uniform uncertainty distribution. Prior data-independent knowledge may be acquired from previous studies with the referred hydrologic model. Benchmark model applications (Zhijia et al., 2013; Lü et al., 2013; Costa et al., 2014; Silva et al., 2014) provide a suitable spectrum for defining uniformly distributed variation ranges for the model parameters. Obviously, some degree of tuning on these ranges is desired for simulation efficiency. Hence, the largest possible ranges were initially admitted and whenever an optimum interval was highlighted by the benchmark applications or convergence was not attained after the maximum pre-defined number of iterations, the initial range was reduced. This should ensure that as little prior information as possible is aggregated to inference. A similar procedure was adopted for defining the uniformly distributed ranges of variation for the parameters of the generalized log-likelihood function, with benchmark model applications provided by Schoups and Vrugt (2010) and Silva et al. (2014). An additional remark with respect to the error model is that autoregressive polynomials of orders 1–4 were considered in the simulations. Table 2 presents the assumed prior uncertainty distributions for both hydrologic and calibration residuals models.

Table 2
Prior uncertainty distributions for the parameters of the hydrological and the calibration residuals.

Model        Parameter   Lower bound   Upper bound
Rio Grande   k           0.500         1.000
             imp         0.000         0.030
             wum         5.000         150.000
             wlm         50.000        250.000
             wdm         5.000         150.000
             sm          40.000        120.000
             b           0.100         1.000
             ex          0.100         2.000
             c           0.010         1.000
             kss         0.050         0.350
             kg          0.050         0.650
             ci          0.400         0.990
             cg          0.980         0.999
Residuals    r0          2.000         1.000
             r1          0.000         1.000
             b           0.900         1.000
             n           0.100         10.000
             /1          0.000         1.000
             lg          0.000         100.000

The joint posterior distribution was again simulated with MCMC sampling algorithms. In this modeling step, the DREAM (Differential Evolution Adaptive Metropolis) algorithm, developed by Vrugt et al. (2008), was employed for numerical simulations. The DREAM algorithm is a well-suited tool for hydrologic modeling since it allows the simulation of several Markov chains in parallel, for exploring the multidimensional parametric space more efficiently. It also adjusts the shape and scale of the proposal distribution throughout the simulations, in order to accelerate the convergence to the equilibrium distribution. In the calibration process, we considered 10 parallel Markov chains and accepted convergence when the Brooks-Gelman-Rubin statistic attained the value of 1.01, which occurred after approximately 80,000 iterations. We also considered a lag of 50 for removing serial correlation, after which a sample of size 1,000 was retained for hydrologic simulation. Fig. 4 illustrates the marginal posterior distributions for the parameters of the Rio Grande rainfall-runoff model.

Evaluating the marginal effects of the parameters in the resulting hydrograph is not a trivial procedure due to the complex structure of dependence between them. In this sense, the joint variation of a given set of parameters might impose opposite nonlinear effects in a particular range of the simulated streamflows, which could attenuate or even conceal the influence of a given parameter in the model output. Nonetheless, the marginal behavior of the parameter posterior distributions may provide insights on the updated knowledge regarding the formation of hydrographs. It is possible to observe from Fig. 4 that the parameters that control the water distribution in the soil, wum, wlm and wdm, are strongly concentrated in the vicinities of the upper bounds assumed a priori. For practical purposes, this limits their influence in the estimation of daily streamflow predictive uncertainty, as little variation is expected in their values when exploring the joint posterior distribution. A similar condition concerns parameters ci, cg, sm and kg: a significant reduction in the range of variation initially admitted and a relative degree of pooling near the marginal distribution modes are also verified, albeit, in these cases, they do not necessarily correspond to the upper bounds of the prior ranges. Parameter ex remained approximately uniformly distributed, although its range of variation was reduced to about 25% of that considered in the prior distribution, assuming values between 1.5 and 2.0. The

Fig. 4. Marginal posterior distributions for the parameters of the Rio Grande rainfall-runoff model.

remaining parameters presented marginal posterior distributions approximately bell-shaped and symmetric around the mean value of the prior uniform interval. It is also worth mentioning that no significant reduction on the prior admitted range was observed for k, imp, b, c and kss (at least for parameter k, such a behavior was expected as large uncertainties are present in evaporation data and they were not accounted for by means of a prior distribution). Therefore, one may expect that the latter 6 parameters, which presented a higher degree of variability in the posterior results, will have a more pronounced effect in the calculations of the predictive uncertainty of daily streamflow estimates.

Fig. 5 presents a comparison between the observed and the simulated hydrographs. The modes of the marginal posterior distributions were used as point estimates for the model parameters in the hydrologic simulation. A visual assessment readily shows that most peak flows are properly reproduced by the Rio Grande rainfall

model during the training period. Of particular interest, the highest simulated flows are, for most cases, in close agreement with the observed flows. This fact suggests that the hydrologic model is able to capture, at least to some extent, the main characteristics of large flood formation dynamics in the catchment. Table 3 provides a summary of performance indexes for calibration and validation periods, also using the modes of marginal posterior distributions as point estimates for the model parameters. One may notice that the Rio Grande rainfall-runoff model explained about 88% of the daily streamflow variation for the training dataset and 81% for the testing counterpart. As index NS imposes larger weights for flows of higher magnitude, it is possible to infer that flood events are also reasonably modeled in the validation phase. In addition, no significant mismatches were verified in the ratios between simulated and observed flows, for daily duration. In fact, a slight tendency of underestimation is perceived, probably due to misrepresentations of the peak flow contribution in a reduced set of large floods between 1940 and 1942. Finally, the relatively high values of the Pearson correlation coefficient, r, for both calibration and validation datasets, indicate that the time coherence is preserved in the simulations.

Fig. 5. Simulated and observed hydrographs for the model training period.

Table 3
Performance indexes for rainfall-runoff modeling in calibration and validation periods.

Performance index   Calibration   Validation
NS                  0.882         0.810
RMSE                21.657        30.489
VR                  0.957         0.928
r                   0.922         0.886

Fig. 6 depicts posterior inferences on the calibration residuals. A plot of simulated versus observed daily streamflows is shown in the upper left panel (Fig. 6a). One may notice that, apart from a small number of points, an overall low degree of scatter is verified, which demonstrates the reasonable predictive ability of the hydrologic model in all portions of the simulated hydrograph. Fig. 6b illustrates the behavior of the normalized residuals with respect to the observed streamflows. It is possible to observe that the heteroscedasticity was successfully removed. In fact, the larger normalized errors are related to the low flow portion of the hydrograph. In other words, at least in this case, the assumption of variable variance improved the representation of the peak flows. The lower left panel (Fig. 6c) presents the autocorrelation function of the calibration residuals, with the 95% confidence intervals, from lags 1–10. It is possible to observe that significant serial correlations occur for most lags, and no apparent tendency of decay with the increase of the lag order was verified. Hence, even an AR(4) time series model was unable to remove the serial correlation of the modeling errors. This fact makes evident the complex time dependency structure of calibration residuals. It is possible that an ARMA model, as suggested by Schoups and Vrugt (2010), would be more appropriate for modeling the residuals time series. However,

Fig. 6. Evaluation of the behavior of calibration residuals. (a) simulated versus observed daily streamflows; (b) plot of the normalized residual versus observed daily
streamflow; (c) residuals autocorrelation function; and (d) empirical (continuous line) and theoretical (dashed line) distributions of the residuals.

as most serial correlation coefficients have low magnitude, the bias introduced in parameter estimates should not be large (Silva et al., 2014). Finally, Fig. 6d illustrates the close agreement between the empirical frequency histogram of calibration residuals, which is slightly right-skewed, and the theoretical probabilistic model. In effect, the SEP model was able to describe shape and dispersion characteristics of the empirical frequency distribution. The skewed-to-the-right property makes inference more robust to outliers, which, in essence, avoids ill-posed regression models. In addition, the right-skewed SEP model suggests that the calibration residuals are heavy-tailed, which is consistent with several previous studies, such as Yang et al. (2007) and Schoups and Vrugt (2010). From these results, it seems reasonable to assume that the joint posterior distribution, as obtained with the generalized log-likelihood function, provides a reliable uncertainty account for the model parameters in the light of the information gathered by the data, which, according to Schoups and Vrugt (2010), is critical for the estimation of the predictive uncertainty.

After calibration and validation of the hydrologic model, the flood quantile curves were constructed. For this purpose, each of the 10,000-year long synthetic daily rainfall series, as obtained from an ensemble of 1,000 runs of the mixed model, was routed through the rainfall-runoff model with a specific parametric vector, and the quantiles of interest were estimated from the daily streamflow time series. Fig. 7 presents the resulting median curve along with the 95% credible intervals. Empirical estimates are shown as circles whereas vertical bars denote the uncertainty ranges of the hydraulic modeling for non-systematic data. The plotting positions for both systematic and non-systematic flood events were obtained from Fernandes et al. (2010). It is possible to observe that the median flood quantile curve (continuous line) provides accurate estimates for annual maximum floods for most return periods, including some of the paleoflood events. In effect, only the point estimate for the largest non-systematic flood was significantly underestimated with the proposed framework. Nonetheless, the 95% credible intervals for such an event contain the entire uncertainty range imposed by hydraulic modeling. Such a situation also holds for other non-systematic events, albeit small departures from the estimated uncertainty bands are verified for two paleofloods. This possibly indicates that uncertainties resulting from the randomness of meteorological processes, the structure of the utilized models and overall parameter estimation were relatively well quantified and propagated during hydrologic synthesis, which should attest, at least for this application, to the adequacy of the method for indirect extreme flood estimation.

Fig. 7. Simulated median flood quantile curve (continuous line) and 95% credible intervals (dashed lines). Circles represent empirical maximum flood quantiles and vertical bars comprise uncertainty ranges for non-systematic data.

Fig. 8. Quantile curves for different inference approaches: (a) indirect flood estimation method as compared to Bayesian FFA with LP III; (b) indirect flood estimation method as compared to Bayesian FFA with GEV; and (c) indirect flood estimation method as compared to Bayesian FFA with LN4.

Fig. 8 presents the results of flood frequency analysis for three different Bayesian modeling approaches, in comparison to the proposed estimation method. The upper panel (Fig. 8a) depicts the quantile curve and uncertainty bands for a positively skewed (therefore unbounded) log-Pearson type 3 distribution (LP3), which originated from the study of USBR (2002). The middle panel (Fig. 8b) comprises point and interval quantile estimation from a GEV distribution, with prior uncertainty on the shape parameter summarized by the geophysical prior distribution (Martins and Stedinger, 2000). Such a model was employed as a benchmark unbounded parametric form in the work of Fernandes et al. (2010). Finally, the lower panel (Fig. 8c) presents the results of the study of Fernandes et al. (2010), which made use of a bounded model, the LN4 distribution, with subjective prior distribution for the upper bound parameter elicited on the basis of regional PMF information. All three approaches included non-systematic

information in the likelihood function. As such, useful information regarding the annual maximum floods upper tail was available, which should improve point and interval estimates of rare and extreme flood quantiles as compared to the case in which only the systematic record is utilized for inference.

As the assumptions on which the outlined approaches and the proposed method were based are not similar, and distinct uncertainty sources were considered in each study, it is not possible to render a direct performance comparison between estimation techniques. However, general remarks regarding the quantile curves and credible intervals can be made in an objective fashion. First, one may notice that both GEV and LP3 models underestimate flood quantiles associated with return periods between 10 and 100 years. This range should contain most quantiles of practical interest for the design of usual hydraulic structures, such as open channels. Thus, risk assessment for such structures may not be properly performed when these probabilistic models are employed. As for the larger flood quantiles, the LP3 model shows reasonable matches, particularly for the paleoflood events. The GEV model, on the other hand, departs from the empirical records for return periods about 300 years, with a noticeable tendency of overestimation for the floods of higher magnitude. In summary, at least in this application, both unbounded models were unable to reproduce the entire range of variability of annual maximum flood events in the catchment, which may limit their use in practical situations.

As for the upper-bounded models approach, both techniques provided reasonable estimates for most empirical records in the quantile curve. In effect, point estimates indirectly obtained from hydrologic simulation were quite similar to the ones derived from the Bayesian flood frequency analysis method proposed by Fernandes et al. (2010), with the former being slightly lower than the latter. Nonetheless, the study of Fernandes et al. (2010) demonstrated that a lower predictive ability is achieved in quantile estimation when only systematic data are employed in inference. In fact, for this situation, most paleoflood events were not suitably modeled. This fact suggests that, as little information on the upper tail of annual maximum daily precipitation is required in the present modeling strategy, broader spectra for application, encompassing those catchments where large flood non-systematic information is not available or the systematic streamflow record is too short, may be associated with the indirect flood estimation method.

Another important aspect concerns interval quantile estimation on the set of inference techniques. It is possible to observe that the credible intervals are consistently narrower for those approaches based on bounded models. Of course, such a behavior is expected since prior information is included in inference in these cases. However, as the bounded models demonstrated, at least in this application, higher predictive ability than the unbounded counterparts, the derived uncertainty bands for the former are deemed to be more realistic. In addition, it appears that accounting for uncertainties in each modeling step and propagating them along the simulations entails the shortest interval widths for all evaluated quantiles. This could indicate another potential advantage of the proposed approach. However, some caution is necessary when comparing the resulting uncertainty bands as the generation of exceptionally large (yet physically plausible) rainfall amounts may present limitations, due to the algorithm for random number simulation utilized in the mixed model. Thus, more extreme rainfall conditions could be obtained if a larger ensemble of daily rainfall series was utilized, which should entail wider credible intervals.

4. Conclusions

This article presented a Bayesian framework for indirect estimation of extreme flood quantiles from a rainfall-runoff model. The theoretical foundations of the method rely on the assumption that daily precipitation is a physically limited quantity, and, as such, should be modeled by a probability distribution with bounded upper tail. A modeling strategy based on aggregating regional information from a large set of 1-day PMP estimates was employed for eliciting a subjective prior uncertainty distribution on the upper bound parameter, and the posterior descriptive distribution of the annual block-maxima bounded model is then used as a functional partition of a stochastic daily rainfall generator, from which long rainfall time series are simulated in order to provide an ensemble of input realizations for hydrologic modeling, flood quantile estimation and predictive uncertainty accounting.

The method for indirect flood estimation was successfully applied to the American River catchment, at the Folsom Dam cross-section. The mixed model for stochastic daily rainfall generation proved capable of capturing the main variability characteristics of daily rainfall at the referred site from a relatively short training dataset, as most sample summary statistics, for both short and long series, were properly reproduced, including those frequently reported to fail in similar models, such as the coefficient of skewness (Srikanthan and Pegram, 2009; Costa et al., 2015). In addition, the LN4 model appears to provide a reasonable description of the annual maximum daily rainfall upper tail, since the most critical rainfall observed events in the wet season were reasonably simulated and extrapolation conditions were, at least in this application, validated by the flood quantile curves. These results are consistent with the previous study of Costa et al. (2015) using the mixed model and demonstrate the potential advantages of synthetically simulating daily rainfall by combining parametric and nonparametric techniques. However, prescribing other distributional models for annual block-maxima in the parametric module of the mixed model is necessary for a meaningful assessment of uncertainty in extreme rainfall simulation due to model selection. Furthermore, alternative procedures for eliciting prior uncertainty distributions for the upper bound and their effects in the posterior descriptive distribution have to be evaluated. Finally, more objective procedures for defining the threshold between low to moderate and extreme rainfall are desirable.

As for the hydrologic modeling, the use of the generalized log-likelihood function allowed one to relax simplifying modeling assumptions on the behavior of calibration residuals. The adequacy of the likelihood function was highlighted by the good match between the simulated and the observed hydrographs and in the diagnostic plots, with the exception of the serial structure of dependency. As pointed out by Schoups and Vrugt (2010), the choice of a suitable statistical model for calibration residuals has a strong influence on the estimation of the predictive uncertainty. Therefore, it is expected that more reliable parameter posterior estimates were obtained in calibration, as compared to traditional least-squares procedures, and a more realistic account for the predictive uncertainty is provided by the proposed approach. Nonetheless, an explicit treatment of both input uncertainties and model structural errors by means of prior uncertainty distributions, as suggested by Renard et al. (2010), along with a formal statistical description of the dependence structure of the model parameters are still required for a broader comprehension of the rainfall-runoff model capabilities and for evaluating the contribution of each of these sources in the resulting predictive uncertainty.

The flood quantile curves derived from hydrologic simulation closely resembled the empirical counterparts, in most ranges of the hydrologic random variable domain. As compared to other studies in the catchment, particularly those based on distributions with unbounded upper tails, the proposed approach led to more accurate point estimates and narrower uncertainty bands for most of the evaluated return periods, which, to some extent, demonstrates the advantages of incorporating appropriate upper bounds

to statistical inference. In addition, as little knowledge on excep- channel reach is available, although, for the purposes of this paper,
tionally large rainfall amounts appears to be necessary for calibrat- it is not necessary.
ing the mixed model for daily rainfall generation, the method could The runoff production module of hydrologic model is based on
be easily adapted for simulating large floods in those catchments where the only information available is the systematic record of streamflows and the upper tails of the annual maximum floods cannot be characterized as precisely. Obviously, a larger set of applications is necessary to allow some generalization. However, the Bayesian indirect flood estimation method points to a promising direction for improving flood frequency analysis, since, by separating and accounting for uncertainties in each modeling step, one is able to obtain a better understanding of the influence of meteorological factors and catchment hydrologic response conditions in extreme flood formation dynamics.

Acknowledgements

The authors wish to acknowledge Simon Michael Papalexiou and an anonymous reviewer for the valuable comments and suggestions, which helped improve the paper. The authors also acknowledge the support to this research from CNPq (“Conselho Nacional de Desenvolvimento Científico e Tecnológico”) and from CAPES (“Coordenação de Aperfeiçoamento de Pessoal de Nível Superior”).

Appendix A. The Rio Grande rainfall-runoff model

The Rio Grande rainfall-runoff model is a conceptual semi-distributed hydrologic model, which was developed by the Department of Hydraulic and Water Resources Engineering of the Federal University of Minas Gerais. In short, the hydrological synthesis of the Rio Grande model comprises two modules: a runoff production module, which controls water transfers in the soil phase by means of a collection of 13 parameters that require calibration, and a flow concentration module, which routes the runoff from different parts of the catchment to its outlet. An additional module, which performs the routing of the produced hydrograph throughout a given [...] the concepts of the Xinanjiang model, first described by Zhao et al. (1980). Such a concept establishes that runoff only occurs when the maximum soil storage is attained (Costa et al., 2014). In other words, the moisture content of the unsaturated portion of the soil must reach the field capacity for runoff to be produced.

Fig. A1 presents a flow chart of the runoff production module. Symbols inside blocks denote inputs, outputs or state variables, whereas those elements outside blocks refer to model parameters. Model inputs comprise areal average daily rainfall (P) and daily pan evaporation (EM). The main output is the total discharge per unit area (CIN). Model state variables are the mean spatial soil water (W), which represents water storage, and the mean spatial free water storage (S), which corresponds to the amount of water in the soil phase that is free to flow. W has three components, namely, WU, WL and WD, representing, respectively, the water storage in upper, lower and deep layers. FR is the contributing area factor; RB is the runoff from the impermeable area of the catchment, while R denotes the runoff from the permeable area, comprising surface (RS), interflow (RI) and groundwater (RG). Runoff components are then transferred into QS, QI and QG, respectively, and through their combination, at each time step, the total discharge CIN is formed.

As for the model parameters, k denotes the ratio of potential evapotranspiration with respect to pan evaporation; wm and b account for the soil water distribution, where wm is the mean spatial soil water capacity, divided into components wum, wlm and wdm, and b is the exponent of the soil water capacity distribution curve; imp represents the impermeable area factor; sm and ex describe the distribution of free water; kss and kg are the coefficients of free water storage to interflow and groundwater flow; cg is the daily recession constant of the groundwater flow and ci denotes the recession constant of the interflow counterpart. For more details on the meaning and typical values of the model parameters, the reader is referred to Zhao et al. (1980) and Costa et al. (2014).
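To make the role of wm and b more concrete, the sketch below computes saturation-excess runoff with the parabolic storage capacity curve commonly used in Xinanjiang-type models. It is only an illustration under stated assumptions: the closed-form expressions are the textbook Xinanjiang formulation, which the text above does not confirm as the exact Rio Grande implementation; the function name and the argument pe (rainfall net of evapotranspiration, in mm) are hypothetical; and the impermeable area, the layered storages and the separation of runoff into components are ignored.

```python
def saturation_excess_runoff(pe, w, wm, b):
    """Runoff depth (mm) produced by net rainfall `pe` (mm).

    Assumed (not taken from the paper): textbook Xinanjiang-type
    parabolic storage capacity curve.
    w  : current mean spatial soil water (mm)
    wm : mean spatial soil water capacity (mm)
    b  : exponent of the soil water capacity distribution curve
    """
    if pe <= 0.0:
        return 0.0
    w = min(w, wm)
    # Largest point storage capacity found in the catchment
    wmm = wm * (1.0 + b)
    # Ordinate of the capacity curve that corresponds to storage w
    a = wmm * (1.0 - (1.0 - w / wm) ** (1.0 / (1.0 + b)))
    if pe + a < wmm:
        # Only the fraction of the catchment already at capacity yields runoff
        return pe - (wm - w) + wm * (1.0 - (pe + a) / wmm) ** (1.0 + b)
    # Whole catchment at capacity: all rainfall above the deficit runs off
    return pe - (wm - w)


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    for w in (50.0, 90.0, 100.0):
        print(w, saturation_excess_runoff(10.0, w, 100.0, 0.3))
```

With b = 0 the capacity is uniform over the catchment and no runoff is produced until the storage reaches wm, which is the limiting case of the concept described above. For b > 0, part of the catchment reaches capacity earlier and some runoff appears before the mean storage is full.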

Fig. A1. Flow chart of the runoff production module of the Rio Grande rainfall-runoff model, which is similar to the Xinanjiang model as described by Zhao et al. (1980).

The flow concentration module conveys the runoff from different portions of the catchment to the main outlet by applying a transfer function, based on Clark’s synthetic unit hydrograph, as originally formulated by the Hydrologic Engineering Center of the US Army Corps of Engineers (HEC, 1998). Such a procedure is intended to account for the different time lags with which runoff pulses reach the channel network and are routed throughout it.

References

Apipattanavis, S., Podestá, G., Rajagopalan, B., Katz, R.W., 2007. A semiparametric multivariate and multisite weather generator. Water Resour. Res. 43. http://dx.doi.org/10.1029/2006WR005714.
Baker, V.R., Webb, R.H., House, P.K., 2002. The Scientific and Societal Value of Paleoflood Hydrology. In: House, P.K., Webb, R.H., Baker, V.R., Levish, D.R. (Eds.), Ancient Floods, Modern Hazards: Principles and Applications of Paleoflood Hydrology. American Geophysical Union, Washington, D.C., pp. 1–19. http://dx.doi.org/10.1029/WS005p0001.
Basinger, M., Montalto, F., Lall, U., 2010. A rainwater harvesting system reliability model based on nonparametric stochastic rainfall generator. J. Hydrol. 392, 105–118. http://dx.doi.org/10.1016/j.jhydrol.2010.07.039.
Botero, B.A., Francés, F., 2010. Estimation of high return period flood quantiles using additional non-systematic information with upper bounded statistical models. Hydrol. Earth Syst. Sci. 14, 2617–2628. http://dx.doi.org/10.5194/hess-14-2617-2010.
Boughton, W.C., 1980. A frequency distribution for annual floods. Water Resour. Res. 16, 347–354. http://dx.doi.org/10.1029/WR016i002p00347.
Boughton, W.C., 1999. A daily rainfall generating model for water yield and flood studies. Melbourne.
Brooks, S.P., Gelman, A., 1998. General methods for monitoring convergence of iterative simulations. J. Comput. Graph. Stat. 7, 434–455. http://dx.doi.org/10.1080/10618600.1998.10474787.
Chen, J., Brissette, F., 2014. Stochastic generation of daily precipitation amounts: review and evaluation of different models. Clim. Res. 59, 189–206. http://dx.doi.org/10.3354/cr01214.
Costa, V., Fernandes, W., Naghettini, M., 2014. Regional models of flow-duration curves of perennial and intermittent streams and their use for calibrating the parameters of a rainfall–runoff model. Hydrol. Sci. J. 59, 262–277. http://dx.doi.org/10.1080/02626667.2013.802093.
Costa, V., Fernandes, W., Naghettini, M., 2015. A Bayesian model for stochastic generation of daily precipitation using an upper-bounded distribution function. Stoch. Environ. Res. Risk Assess. 29, 563–576. http://dx.doi.org/10.1007/s00477-014-0880-9.
Efron, B., 1979. Bootstrap methods: another look at the Jackknife. Ann. Stat. 7, 1–26. http://dx.doi.org/10.1214/aos/1176344552.
Elíasson, J., 1994. Statistical estimates of PMP Values. Nord. Hydrol. 25, 301–312. http://dx.doi.org/10.2166/nh.1994.019.
Elíasson, J., 1997. A statistical model for extreme precipitation. Water Resour. Res. 33, 449–455. http://dx.doi.org/10.1029/96WR03531.
Enzel, Y., Ely, L.L., House, P.K., Baker, V.R., Webb, R.H., 1993. Paleoflood evidence for a natural upper bound to flood magnitudes in the Colorado River Basin. Water Resour. Res. 29, 2287–2297. http://dx.doi.org/10.1029/93WR00411.
Fernandes, W., Naghettini, M., Loschi, R., 2010. A Bayesian approach for estimating extreme flood probabilities with upper-bounded distribution functions. Stoch. Environ. Res. Risk Assess. 24, 1127–1143. http://dx.doi.org/10.1007/s00477-010-0365-4.
Furrer, E.M., Katz, R.W., 2008. Improving the simulation of extreme precipitation events by stochastic weather generators. Water Resour. Res. 44. http://dx.doi.org/10.1029/2008WR007316.
Gabriel, K.R., Neumann, J., 1962. A Markov chain model for daily rainfall occurrence at Tel Aviv. Q. J. R. Meteorol. Soc. 88, 90–95. http://dx.doi.org/10.1002/qj.49708837511.
Guse, B., Hofherr, T., Merz, B., 2010. Introducing empirical and probabilistic regional envelope curves into a mixed bounded distribution function. Hydrol. Earth Syst. Sci. 14, 2465–2478. http://dx.doi.org/10.5194/hess-14-2465-2010.
Hosking, J.R.M., Wallis, J.R., 1997. Regional Frequency Analysis: An Approach Based on L-Moments. Cambridge University Press.
Hundecha, Y., Pahlow, M., Schumann, A., 2009. Modeling of daily precipitation at multiple locations using a mixture of distributions to characterize the extremes. Water Resour. Res. 45. http://dx.doi.org/10.1029/2008WR007453.
HEC - Hydrologic Engineering Center, 1998. HEC-1 Flood Hydrograph Package User’s Manual. Davis, USA.
Jacoby, Y., Grodek, T., Enzel, Y., Porat, N., McDonald, E.V., Dahan, O., 2008. Late Holocene upper bounds of flood magnitudes and twentieth century large floods in the ungauged, hyperarid alluvial Nahal Arava, Israel. Geomorphology 95, 274–294. http://dx.doi.org/10.1016/j.geomorph.2007.06.008.
Kanda, J., 1981. A new extreme value distribution with lower and upper limits for earthquake motions and wind speeds. Theor. Appl. Mech. 31, 351–360.
Katz, R.W., Parlange, M.B., Naveau, P., 2002. Statistics of extremes in hydrology. Adv. Water Resour. 25 (8–12), 1287–1304. http://dx.doi.org/10.1016/S0309-1708(02)00056-8.
Kavetski, D., Franks, S.W., Kuczera, G., 2003. Confronting input uncertainty in environmental modelling. In: Duan, Q., Gupta, H.V., Sorooshian, S., Rousseau, A.N., Turcotte, R. (Eds.), Calibration of Watershed Models. American Geophysical Union, Washington, D.C., pp. 49–68. http://dx.doi.org/10.1029/WS006p0049.
Klemes, V., 1987. Hydrological and engineering relevance of flood frequency analysis. In: Proceedings of International Symposium on Flood Frequency Risk Analysis – Regional Flood Frequency Analysis. D. Reidel Publishing Company, Baton Rouge, pp. 1–18.
Koutsoyiannis, D., 2004. Statistics of extremes and estimation of extreme rainfall: I. Theoretical investigation/Statistiques de valeurs extrêmes et estimation de précipitations extrêmes: I. Recherche théorique. Hydrol. Sci. J. 49. http://dx.doi.org/10.1623/hysj.49.4.575.54430.
Lall, U., Sharma, A., 1996. A nearest neighbor bootstrap for resampling hydrologic time series. Water Resour. Res. 32, 679–693. http://dx.doi.org/10.1029/95WR02966.
Laursen, E.M., 1983. Comment on “Paleohydrology of southwestern Texas” by R. Craig Kochel, Victor R. Baker, and Peter C. Patton. Water Resour. Res., 19, p. 1339. http://dx.doi.org/10.1029/WR019i005p01339.
Li, C., Singh, V.P., Mishra, A.K., 2012. Simulation of the entire range of daily precipitation using a hybrid probability distribution. Water Resour. Res. 48. http://dx.doi.org/10.1029/2011WR011446.
Lü, H., Hou, T., Horton, R., Zhu, Y., Chen, X., Jia, Y., Wang, W., Fu, X., 2013. The streamflow estimation using the Xinanjiang rainfall runoff model and dual state-parameter estimation method. J. Hydrol. 480, 102–114. http://dx.doi.org/10.1016/j.jhydrol.2012.12.011.
Lunn, D.J., Thomas, A., Best, N., Spiegelhalter, D., 2000. WinBUGS – A Bayesian modelling framework: concepts, structure, and extensibility. Stat. Comput. 10, 325–337. http://dx.doi.org/10.1023/A:1008929526011.
Markovich, N., 2007. Nonparametric Analysis of Univariate Heavy-Tailed Data, Wiley Series in Probability and Statistics. John Wiley & Sons Ltd, Chichester, UK. doi:10.1002/9780470723609.
Martins, E.S., Stedinger, J.R., 2000. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resour. Res. 36, 737–744. http://dx.doi.org/10.1029/1999WR900330.
NOAA - National Oceanic and Atmospheric Administration, 1999. Probable Maximum Precipitation for California - Hydrometeorological Report 59. Silver Spring.
NOAA - National Oceanic and Atmospheric Administration, 1982. Mean Monthly, Seasonal, and Annual Pan Evaporation for the United States - Technical Report NWS 34. Washington, D.C.
NRC - National Research Council, 1999. Improving American River Flood Frequency Analyses. The National Academies Press, Washington, D.C. doi:10.17226/6483.
Papalexiou, S.M., Koutsoyiannis, D., 2006. A probabilistic approach to the concept of Probable Maximum Precipitation. Adv. Geosci. 7, 51–54. http://dx.doi.org/10.5194/adgeo-7-51-2006.
Papalexiou, S.M., Koutsoyiannis, D., 2012. Entropy based derivation of probability distributions: a case study to daily rainfall. Adv. Water Resour. 45, 51–57. http://dx.doi.org/10.1016/j.advwatres.2011.11.007.
Papalexiou, S.M., Koutsoyiannis, D., 2016. A global survey on the seasonal variation of the marginal distribution of daily precipitation. Adv. Water Resour. 94, 131–145. http://dx.doi.org/10.1016/j.advwatres.2016.05.005.
Papalexiou, S.M., Koutsoyiannis, D., Makropoulos, C., 2013. How extreme is extreme? An assessment of daily rainfall distribution tails. Hydrol. Earth Syst. Sci. 17, 851–862. http://dx.doi.org/10.5194/hess-17-851-2013.
Ramesh, N.I., Onof, C., 2014. A class of hidden Markov models for regional average rainfall. Hydrol. Sci. Journal. 59 (9), 1704–1717. http://dx.doi.org/10.1080/02626667.2014.881484.
Renard, B., Kavetski, D., Kuczera, G., Thyer, M., Franks, S.W., 2010. Understanding predictive uncertainty in hydrologic modeling: the challenge of identifying input and structural errors. Water Resour. Res. 46. http://dx.doi.org/10.1029/2009WR008328.
Robert, C.P., 2007. The Bayesian Choice. In: Springer Texts in Statistics. 2nd ed. Springer, New York, NY. http://dx.doi.org/10.1007/0-387-71599-1.
Schoups, G., Vrugt, J.A., 2010. A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic, and non-Gaussian errors. Water Resour. Res. 46. http://dx.doi.org/10.1029/2009WR008933.
Sharif, M., Burn, D.H., 2006. Simulating climate change scenarios using an improved K-nearest neighbor model. J. Hydrol. 325, 179–196. http://dx.doi.org/10.1016/j.jhydrol.2005.10.015.
Silva, F., Naghettini, M., Fernandes, W., 2014. Avaliação bayesiana das incertezas nas estimativas dos parâmetros de um modelo chuva-vazão conceitual. Rev. Bras. Recur. Hídricos 19, 148–159. http://dx.doi.org/10.21168/rbrh.v19n4.p148-159.
Slade, J.J., 1936. An asymmetric probability function. Trans. Am. Soc. Civ. Eng. 101, 35–61.
Sorooshian, S., Dracup, J.A., 1980. Stochastic parameter estimation procedures for hydrologic rainfall-runoff models: Correlated and heteroscedastic error cases. Water Resour. Res. 16, 430–442. http://dx.doi.org/10.1029/WR016i002p00430.
Srikanthan, R., McMahon, T.A., 2000. Stochastic generation of climate data: a review. Report 00/16. Melbourne.
Srikanthan, R., Pegram, G.G.S., 2009. A nested multisite daily rainfall stochastic generation model. J. Hydrol. 371, 142–153. http://dx.doi.org/10.1016/j.jhydrol.2009.03.025.
Takara, K., Tosa, K., 1999. Storm and flood frequency analysis using PMP/PMF estimates. In: International Symposium on Floods and Droughts. Nanjing, pp. 7–17.
USACE - U.S. Army Corps of Engineers, 1987. Water Control Manual, Folsom Dam and Lake, American River. Sacramento, USA.
USACE - U.S. Army Corps of Engineers, 1998. American river, California, rain flood flow frequency analysis: Civil Design Branch. Sacramento, USA.
USBR - U.S. Bureau of Reclamation, 2002. Flood hazard analysis - Folsom Dam Central Valley Project. Denver, USA.
Vrugt, J.A., ter Braak, C.J.F., Clark, M.P., Hyman, J.M., Robinson, B.A., 2008. Treatment of input uncertainty in hydrologic modeling: Doing hydrology backward with Markov chain Monte Carlo simulation. Water Resour. Res. 44. http://dx.doi.org/10.1029/2007WR006720.
Wilby, R.L., Wigley, T.M.L., Conway, D., Jones, P.D., Hewitson, B.C., Main, J., Wilks, D.S., 1998. Statistical downscaling of general circulation model output: a comparison of methods. Water Resour. Res. 34, 2995–3008. http://dx.doi.org/10.1029/98WR02577.
Wilks, D.S., 1999. Interannual variability and extreme-value characteristics of several stochastic daily precipitation models. Agric. For. Meteorol. 93, 153–169. http://dx.doi.org/10.1016/S0168-1923(98)00125-7.
Wilks, D.S., 1998. Multisite generalization of a daily stochastic precipitation generation model. J. Hydrol. 210, 178–191. http://dx.doi.org/10.1016/S0022-1694(98)00186-3.
Yang, J., Reichert, P., Abbaspour, K.C., 2007. Bayesian uncertainty analysis in distributed hydrologic modeling: a case study in the Thur River basin (Switzerland). Water Resour. Res. 43. http://dx.doi.org/10.1029/2006WR005497.
Yates, D., Gangopadhyay, S., Rajagopalan, B., Strzepek, K., 2003. A technique for generating regional climate scenarios using a nearest-neighbor algorithm. Water Resour. Res. 39. http://dx.doi.org/10.1029/2002WR001769.
Yevjevich, V., 1968. Misconceptions in hydrology and their consequences. Water Resour. Res. 4, 225–232. http://dx.doi.org/10.1029/WR004i002p00225.
Yevjevich, V., Harmancioglu, N.B., 1987. Some reflections on the future of hydrology. In: IASH Publishing 164 (Ed.), Proceedings of Rome Symposium for the Future: Hydrology in Perspective. International Association of Hydrological Sciences, Wallingford, pp. 405–414.
Zhao, R.J., Zhuang, L.R., Fang, X., Liu, R., Zhang, Q.S., 1980. The Xinanjiang model. In: IAHS Publication 129 (Ed.), Hydrological Forecasting. Proceedings from the Oxford Symposium. pp. 351–356.
Zhijia, L., Penglei, X., Jiahui, T., 2013. Study of the Xinanjiang Model Parameter Calibration. J. Hydrol. Eng. 18, 1513–1521. http://dx.doi.org/10.1061/(ASCE)HE.1943-5584.0000527.
