Towards a Bayesian total error analysis of conceptual rainfall-runoff models: Characterising model error using storm-dependent parameters
George Kuczera a,*
a School of Engineering, University of Newcastle, NSW 2308, Australia
b Department of Civil and Environmental Engineering, Princeton University, Princeton, NJ 08544, USA
Received 19 August 2005; received in revised form 30 April 2006; accepted 9 May 2006
KEYWORDS
Conceptual rainfall-runoff modelling; Parameter calibration; Model error; Input uncertainty; Bayesian parameter estimation; Parameter variation; Model determinism
Summary Calibration and prediction in conceptual rainfall-runoff (CRR) modelling is affected by the uncertainty in the observed forcing/response data and the structural error in the model. This study works towards the goal of developing a robust framework for dealing with these sources of error and focuses on model error. The characterisation of model error in CRR modelling has been thwarted by the convenient but indefensible treatment of CRR models as deterministic descriptions of catchment dynamics. This paper argues that the fluxes in CRR models should be treated as stochastic quantities because their estimation involves spatial and temporal averaging. Acceptance that CRR models are intrinsically stochastic paves the way for a more rational characterisation of model error. The hypothesis advanced in this paper is that CRR model error can be characterised by storm-dependent random variation of one or more CRR model parameters. A simple sensitivity analysis is used to identify the parameters most likely to behave stochastically, with variation in these parameters yielding the largest changes in model predictions as measured by the Nash-Sutcliffe criterion. A Bayesian hierarchical model is then formulated to explicitly differentiate between forcing, response and model error. It provides a very general framework for calibration and prediction, as well as for testing hypotheses regarding model structure and data uncertainty. A case study calibrating a six-parameter CRR model to daily data from the Abercrombie catchment (Australia) demonstrates the considerable potential of this approach. Allowing storm-dependent variation in just two model parameters (with one of the parameters characterising model error and the other reflecting input
* Corresponding author. Tel.: +61 2 49 216038; fax: +61 2 49 216991. E-mail address: george.kuczera@newcastle.edu.au (G. Kuczera). 0022-1694/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jhydrol.2006.05.010
G. Kuczera et al.
uncertainty) yields a substantially improved model fit, raising the Nash-Sutcliffe statistic from 0.74 to 0.94. Of particular significance is the use of posterior diagnostics to test the key assumptions about the data and model errors. The assumption that the storm-dependent parameters are log-normally distributed is only partially supported by the data, which suggests that the parameter hyper-distributions have thicker tails. The results also indicate that in this case study the uncertainty in the rainfall data dominates model uncertainty. © 2006 Elsevier B.V. All rights reserved.
Introduction
Catchment models simulate water balance dynamics at the catchment scale. Because of the significance of water in terrestrial ecosystems, catchment models are an integral part of virtually all environmental models formulated at the catchment scale, and their applications range from catchment water and nutrient balances to biophysical models. This paper focuses on conceptual rainfall-runoff (CRR) models. An important, perhaps defining, feature of CRR models is that their parameters are not directly measurable and must be inferred (calibrated) from the observed data (e.g., Beven and Binley, 1992). The advantage of this class of models is their ability to capture the dominant catchment dynamics while remaining parsimonious and computationally efficient. Characterising the uncertainty in streamflow predicted by a CRR model has attracted the attention of hydrologists over many years. Yet in recent reviews of CRR model calibration, Kuczera and Franks (2002), Kavetski et al. (2002) and Vrugt et al. (2005) note the lack of a robust framework that accounts for all sources of error (input, model and response error). Although Vrugt et al. (2005) propose a simultaneous parameter optimization and data assimilation method for improved uncertainty analysis, their strategy merges input and model structural errors into a single forcing term [p. 11]. One concludes that realistic statistical models of input and model structural error remain to be articulated. The lack of a robust calibration framework has a number of implications for CRR modelling: (i) quantifying the predictive uncertainty in streamflow and other model outputs is problematic; (ii) the regionalisation of CRR model parameters continues to be confounded by biases in the calibrated parameters and unreliable assessment of parameter uncertainty; and (iii) it is difficult to discriminate between competing CRR model hypotheses because poor model performance can hide behind the veil of ignorance about the sources of error.
This paper focuses on a more rigorous characterisation of the uncertainty associated with CRR models. The study builds on the Bayesian total error analysis of Kavetski et al. (2002, 2006c,d), who proposed a parameter estimation methodology that discriminates between input, model and response errors. The main contribution of this work is an explicit characterisation of model error that is open to scrutiny and improvement. When linked with statistical models of input and response error, a basic total error framework emerges. This framework advances both operational hydrology (it improves the quality of predictions and generates more meaningful uncertainty bounds) and scientific hydrology (it yields insights into model error and thus facilitates model development). The paper is organised as follows: after a brief review of CRR modelling, the need for characterising model error is motivated by an example. It is then argued that the notion of a deterministic CRR model is indefensible, at least for the current generation of such models. Although there are many ways to make a CRR model stochastic, a simple approach is to abandon the notion that model parameters are fixed quantities and instead assume that they are random variables. Different probability distributions for these parameters can then be investigated. A starting point is to assume that the CRR model parameters vary from storm to storm. This strategy offers a simple characterisation of model error and, importantly, one that can be tested. A Bayesian inference framework is then developed, which requires the modeller to make explicit assumptions regarding model, input and response uncertainty and allows testing these hypotheses against available evidence. A case study illustrates this approach and highlights the role of diagnostic checks of key assumptions.
The vector q_t contains the true responses of the catchment at time t, which are observable point or spatially/temporally averaged quantities. In the simplest case, q_t is a scalar and contains the streamflow observed at the catchment outlet. Generally the function h(·) is a probability density function (pdf), so that the true response q_t is a random sample from the pdf h(·). The usual practice of assuming that the CRR model is deterministic yields a special form of h(·), the Dirac delta function. The catchment responds to external forcing or inputs. The vector x_t represents the true input and contains one or more observable point or spatially/temporally averaged quantities, typically rainfall and potential evapotranspiration within the catchment at time t. The term s_t refers to the set of internal state and flux variables. An internal variable is one that is not observed or measured for the purpose of testing the model hypothesis. It is stressed that an internal variable may be observable in the sense that there exists a technology to observe or measure it. However, if that technology has not been applied, then the variable is internal to the model and thus cannot be scrutinised. The terms g and j refer respectively to the sets of conceptual and physical parameters. These parameters determine q_t and s_t for a given external forcing x_t. Physical parameters can be estimated using procedures that are independent of observable catchment responses q_t (e.g., laboratory measurements of soil core permeabilities), whereas conceptual parameters can only be inferred (calibrated) by some process involving matching simulated catchment responses to observed values of q_t. In the remainder of this paper we omit reference to the internal states s_t and to the physical parameters j, since they are inferred independently of the conceptual parameters g. The CRR model then reduces to

q_t = h(x_t; g)    (2)
All calibration methods in CRR modelling involve some form of matching simulated responses h(x_t, g) to the observed responses q̃_t. These methods necessarily make assumptions, either explicit or implicit, about how errors arise and propagate through the CRR model to affect the simulated catchment responses (see Kavetski et al. (2002) for a more complete overview). Fig. 1(a) summarises our current understanding of the uncertainty in CRR modelling. There are three distinctly different sources of error. The observation of forcing inputs, particularly rainfall but also potential evapotranspiration (PET), is subject to measurement error and, more importantly, is affected by the sampling uncertainty arising from incomplete sampling of the spatially/temporally distributed random fields. The response of the catchment, typically streamflow discharge at one or more locations, is itself subject to measurement and rating curve errors. Finally, given the simplifications made in deriving CRR models, they cannot be expected to reproduce the true response exactly even if error-free forcing and response data were available; this discrepancy is termed model or structural error. In stark contrast, Fig. 1(b) shows the conceptualisation that underpins calibration methods dominating current practice. The defining features of these methods are listed below:

1. Input error is ignored or assumed negligible, i.e., the observed forcing x̃_t is assumed to be equal to the true forcing x_t.
2. Model and response errors are lumped into a single random process, the simplest being the standard least squares (SLS) error model

q̃_t = h(x_t; g) + e_t    (3)

where e_t is random independent Gaussian error with constant variance. The objective of this paper is to analyse the effect of these assumptions on the calibrated parameters and model predictions, and suggest practical strategies for making CRR calibration more consistent with the error propagation schematic shown in Fig. 1(a).
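The SLS conceptualisation of assumptions 1 and 2 can be sketched as follows. The single-store model `toy_crr`, its parameter value and the synthetic data are hypothetical illustrations, not the Sacramento or log SPM models used in the paper.

```python
import numpy as np

def toy_crr(rain, k):
    """Hypothetical single linear store: runoff flux = k * storage (illustrative only)."""
    s, q = 0.0, []
    for r in rain:
        s += r                # add the day's rainfall to storage
        out = k * s           # runoff flux h(x_t, g)
        s -= out
        q.append(out)
    return np.array(q)

def sls_objective(k, rain, q_obs):
    """SLS lumps model and response error into one additive Gaussian term e_t."""
    e = q_obs - toy_crr(rain, k)          # residuals q~_t - h(x_t, g)
    return float(np.sum(e ** 2))

rng = np.random.default_rng(1)
rain = rng.exponential(5.0, 100)
q_obs = toy_crr(rain, 0.3) + rng.normal(0.0, 0.1, 100)   # synthetic "observed" runoff

# crude grid search for the SLS optimum
ks = np.linspace(0.05, 0.9, 200)
k_hat = float(ks[np.argmin([sls_objective(k, rain, q_obs) for k in ks])])
```

Because the synthetic response here actually satisfies the SLS assumptions, the true parameter is recovered; the paper's point is that real calibration residuals violate them.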
Figure 1 Schematic of error propagation in CRR models (sources of errors shaded grey): (a) the true input x_t drives the true catchment dynamics to produce the true response q_t, with input, model and response errors entering separately; (b) the observed input data x̃_t drive the conceptual catchment model to produce the simulated response h(x̃_t, ·), with model and response errors lumped into a single term e_t. (Taken from Kavetski et al. (2002).)
Figure 2 Runoff scatter plot (observed versus simulated runoff, ML/day, with 90% prediction limits) for the Sacramento model calibrated to two years of daily runoff for the Abercrombie River.
term e_t. Noting that the 90% prediction limit interval represents 60-80% of the simulated runoff, it is highly unlikely that this magnitude of uncertainty is due to errors in estimating runoff, since a gauging station with a stable and well developed rating curve is unlikely to have a coefficient of variation in errors exceeding 5-10%. The evidence therefore strongly suggests that the bulk of the predictive uncertainty is due to structural errors in the model and input data uncertainty, both of which are ignored by the SLS calibration. Deeper insight into the nature of the model and forcing error is provided by Fig. 3, which presents a time series plot
of observed and simulated daily runoff. It is immediately evident that the model error is highly structured and completely at odds with the SLS assumption of independence from one day to the next. There are long runs of systematic over- and under-estimation. Many recessions are systematically mis-specified, while several peaks dominated by quickflow are either spuriously exaggerated or completely missed. These qualitative features are well known to practitioners and researchers; it is generally recognised that model and input errors induce a complex uncertainty structure in the model parameters and predictions. Our focus in this
Figure 3 Runoff time series (observed and simulated, ML/day) for the Sacramento model calibrated to two years of daily rainfall-runoff data for the Abercrombie River using the SLS method.
paper is to disentangle the contribution of model and forcing errors and acquire a deeper understanding of how these uncertainties affect and propagate through CRR models.
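The structured residual behaviour described above can be quantified with a lag-1 autocorrelation diagnostic; under SLS the residuals should be serially independent. The two series below are synthetic stand-ins for illustration, not the Abercrombie residuals.

```python
import numpy as np

def lag1_autocorr(e):
    """Lag-1 autocorrelation of a residual series; SLS assumes it is near zero."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    return float(np.dot(e[:-1], e[1:]) / np.dot(e, e))

rng = np.random.default_rng(0)
white = rng.normal(size=2000)        # independent errors, as SLS assumes
ar1 = np.zeros(2000)                 # persistent errors: long runs of systematic bias
for t in range(1, 2000):
    ar1[t] = 0.9 * ar1[t - 1] + rng.normal()
```

Residual series with long runs of over- and under-estimation, like those in Fig. 3, produce a lag-1 autocorrelation far from zero, flagging the SLS independence assumption as untenable.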
1. Relax the assumption that CRR parameters are time-invariant constants. By allowing some of the CRR parameters to be random variables over some characteristic time scales, stochastic variations in the fluxes can be modelled.
2. Perturb the internal states over some characteristic time scale.

The notion of time-varying parameters is not new. The state-space formulation underlying the Kalman filter naturally allows for time variation in parameters: in the extended Kalman filter, CRR parameters can be treated as state variables that can be randomly perturbed at every update step (see Bras and Rodriguez-Iturbe (1985) for an overview of hydrologic applications). However, as Kavetski et al. (2002) observe, the extended Kalman filter approach to CRR modelling is hampered by assumptions of linearity of the state equation and Gaussian structure of all errors. Although recent work by Vrugt et al. (2005) using the ensemble Kalman filter has shown that model nonlinearity can be accommodated, their approach of lumping model and input error into an additive Gaussian error fails to address the fundamental differences between input and model error. The critical issue to be addressed is the temporal variation of the random perturbations of the model fluxes. For example, if a CRR model uses an hourly time step for computation, should the fluxes also be randomly perturbed at hourly intervals? If the hourly time step is significantly less than the response time of the store receiving the flux, independent hourly perturbations would average out and the store would behave like a low-pass filter responding only to the average component of the input. Some means of representing the persistence in the random perturbation of the flux is therefore necessary. One way to introduce such persistence is to randomly perturb the model parameters at the beginning of each storm event. This strategy is consistent with the idea of storm-dependent parameters explored by Kuczera (1990).
Since the rainfall during a storm event represents the primary (and the most spatially and temporally heterogeneous) forcing of the catchment water balance, flux perturbations are likely to persist over storm-event time scales. This is arguably the simplest hypothesis that allows for persistence in model error and must be judged by its consistency with available evidence. The simplest stochastic perturbation approach partitions the set of conceptual parameters as g = (θ, ω), where ω is the n_d-vector of time-invariant (deterministic) parameters and θ is the n_s-vector of storm-dependent (stochastic) parameters. The latter are then described by sampling distributions (pdfs), such as the following simple Gaussian form:

θ_i ~ N(θ | μ_i, σ_i²), i = 1, ..., n_s    (4)
where N(θ | μ_i, σ_i²) is an independent normal pdf with mean μ_i and variance σ_i². This distribution is stationary (constant mean and variance) if the catchment is assumed not to be undergoing a change over time (natural and anthropogenic changes may invalidate this assumption). An alternative approach to introducing stochasticity into the model fluxes involves randomly perturbing CRR store depths at the beginning of each storm event, while leaving
the CRR parameters time-invariant. This approach can be readily implemented in the state-space formulation of the Kalman filter by treating the depths of CRR stores as state variables and adding random perturbations at the beginning of each storm. However, this approach is fundamentally unsatisfying: the mass balance of a CRR store is deterministic, and the appearance of a mass imbalance is a consequence of spatial and temporal averaging, not a violation of mass conservation. Since it is best to work directly with the more plausible cause of model error, rather than with its symptom, the approach of perturbing CRR store depths is not explored further in this paper.
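Storm-dependent sampling along the lines of eq. (4) can be sketched as follows. The hyper-parameter values are illustrative, and the log-normal option anticipates the case study, where parameters are kept positive by sampling in log space.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_storm_params(mu, sigma, n_storms, lognormal=True):
    """Draw one parameter value per storm epoch from its hyper-distribution.
    Eq. (4) is a normal pdf; exponentiating a normal draw gives the log-normal
    variant used in the case study (mu and sigma then refer to the log-parameter)."""
    z = rng.normal(mu, sigma, size=n_storms)
    return np.exp(z) if lognormal else z

# illustrative: a storm-dependent parameter with hyper-mean ln(0.02) and sigma = 0.3,
# one realisation per storm epoch (71 epochs, as in the case study record)
ks = sample_storm_params(np.log(0.02), 0.3, n_storms=71)
```

Each epoch then runs the CRR model with its own realisation, which is exactly the persistence structure argued for above: constant within a storm, independent between storms.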
(1) The soil store water balance and its fluxes are governed by

ds/dt = rMult · rain − quickf − ssff − rgef − ets    (water balance)
f = f(s; sF, k)    (saturation-soil depth function)
quickf = f · rMult · rain    (quickflow flux)
ssff = f · ssfMax    (subsurface stormflow flux)
rgef = f · rgeMax    (groundwater recharge flux)
ets = pet · (1 − exp(−s))    (actual evapotranspiration flux)    (5)

where sF, k, ssfMax, rgeMax are CRR parameters (all positive) and rMult is a rainfall storm depth multiplier. The relationship between the soil store depth and the model fluxes (excepting ET) is a modified logistic function.

(2) The groundwater store is a linear reservoir with depth h (mm) receiving a recharge flux from the soil store and discharging a baseflow flux into the stream according to

dh/dt = rgef − bf(h)    (water balance)
bf(h) = kBf · h    (baseflow flux)    (6)

where kBf is a CRR parameter.

(3) The stream store is a linear reservoir with depth S (mm) temporarily delaying the progress of water in the stream channel according to

dS/dt = quickf + ssff + bf − kStream · S    (7)

where kStream is a CRR parameter and the streamflow is q = kStream · S.

Figure 4 Schematic of the log SPM model: the soil store, via the saturated area function f, generates subsurface stormflow ssf and recharge rge; the groundwater store of depth h discharges baseflow bf; and the stream linear store produces streamflow q.

Table 1 summarises the log SPM parameters. The seventh parameter, rMult, needs further comment. Kavetski et al. (2002, 2006c,d) used storm-dependent rainfall depth multipliers as an explicit (albeit approximate) representation of input uncertainty, which corresponds to the assumption that the rainfall errors are multiplicative (i.e., rain_true = rain_obs × rMult, with rMult varying from storm to storm). This paper uses the same approach to account for the uncertainty in the observed catchment rainfall.

Table 1 Summary of log SPM parameters

Parameter   Description                                     Expected value(a)
k           Exponent controlling saturated area fraction    0.02
sF          Exponent controlling saturated area fraction    2300
ssfMax      Subsurface stormflow at full saturation         0.62 mm/day
rgeMax      Groundwater recharge rate at full saturation    5.6 mm/day
kBf         Groundwater discharge constant                  6.3 × 10^−5
kStream     Stream discharge constant                       0.47
rMult       Observed storm depth rainfall multiplier        1.21

(a) Expected value obtained by calibrating to two years of daily rainfall-runoff data at Abercrombie River.

The effects of storm-dependent parameter stochasticity were explored using a synthetic daily runoff time series Q_o derived from the two-year daily rainfall record for the Abercrombie River. The series Q_o was generated assuming all the parameters were deterministic (their values are given in Table 1 and were obtained by fitting the log SPM model to the Abercrombie runoff record with a Nash-Sutcliffe statistic of 0.73, the same as obtained with the more complex Sacramento model). Given the same rainfall series, a new runoff time series Q_i was generated with θ_i (the ith log SPM parameter) selected as being stochastic (its distribution is assumed to be log-normal with expected value given in Table 1 and a given coefficient of variation CV), while keeping the remaining parameters time-invariant (values given in Table 1). A new value of θ_i was sampled from the assumed log-normal distribution at the beginning of each storm.

The Nash-Sutcliffe statistic NS(i) was then evaluated for the runoff time series Q_o and Q_i, where Q_i was treated as the simulated time series and Q_o as the traditional observed time series. Fig. 5 presents a plot of NS(i), i = 1, ..., 7, for a range of CVs. Several important observations can be made:

1. The model predictions, and hence the NS statistic, are most sensitive to storm-dependent variation in the parameter k. This parameter specifies how rapidly the saturated area grows as a function of the soil store depth s and controls the production of saturation overland flow and subsurface stormflow. A CV of 20% reduces the NS statistic to values as low as 40-50%.
2. The second most sensitive parameter is the rainfall multiplier parameter rMult. This parameter regulates the magnitude of the error in the rainfall, the primary forcing of the model.
3. The remaining parameters display limited sensitivity to storm-dependent variation, suggesting they are best treated as time-invariant.

Using the insights from Fig. 5, we attempt to replicate the effects of parameter stochasticity on model calibration by the following experiment. A runoff series was generated by log SPM with parameter k made log-normally storm-dependent with a CV of 30% and all other parameters set as time-invariant with values given in Table 1. This time series is shown as the observed series in Fig. 6. The simulated series in Fig. 6 is obtained by SLS fitting the log SPM model to this synthetic observed series assuming all the parameters are deterministic (time-invariant). Comparison with Fig. 3 (which shows calibration to real observed data) reveals qualitative similarities: peaks are either missed or spuriously generated, while baseflow recessions are often systematically in error. Therefore, calibration of a stochastic-parameter model erroneously assuming that its parameters are time-invariant produces model mismatches that are qualitatively similar to those observed in typical hydrological calibrations.
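A daily simulation in the spirit of the log SPM equations can be sketched as follows. The paper gives the saturated-area function only as a modified logistic of the soil depth s, so the particular logistic form below is an assumption, and a simple explicit Euler step stands in for the implicit scheme used later in the paper.

```python
import numpy as np

def log_spm_step(s, h, S, rain, pet, p, dt=1.0):
    """One explicit daily step of a log-SPM-like model (sketch only).
    f(s) is an ASSUMED logistic form; the paper specifies only its general shape."""
    f = 1.0 / (1.0 + p["sF"] * np.exp(-p["k"] * s))   # saturated area fraction (assumed form)
    eff_rain = p["rMult"] * rain                       # rainfall multiplier (input error model)
    quickf = f * eff_rain                              # saturation overland flow
    ssff = f * p["ssfMax"]                             # subsurface stormflow
    rgef = f * p["rgeMax"]                             # groundwater recharge
    ets = pet * (1.0 - np.exp(-s))                     # actual evapotranspiration
    s = max(s + dt * (eff_rain - quickf - ssff - rgef - ets), 0.0)
    bf = p["kBf"] * h                                  # baseflow
    h = max(h + dt * (rgef - bf), 0.0)
    q = p["kStream"] * S                               # streamflow from the stream store
    S = max(S + dt * (quickf + ssff + bf - q), 0.0)
    return s, h, S, q

# Table 1 values, with rMult set to 1 (no input error) for this illustration
p = {"k": 0.02, "sF": 2300.0, "ssfMax": 0.62, "rgeMax": 5.6,
     "kBf": 6.3e-5, "kStream": 0.47, "rMult": 1.0}
s, h, S = 100.0, 50.0, 1.0
rng = np.random.default_rng(3)
q_series = []
for rain in rng.exponential(4.0, 200):                 # synthetic daily rainfall (mm)
    s, h, S, q = log_spm_step(s, h, S, rain, pet=3.0, p=p)
    q_series.append(q)
```

Making one of the parameters in `p` storm-dependent (redrawing it at each storm epoch) reproduces the perturbation experiment behind Figs. 5 and 6.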
Figure 5 Nash-Sutcliffe statistic NS(i) for storm-dependent variation in each log SPM parameter over a range of coefficients of variation.
Figure 6 Time series of synthetic runoff (mm) generated by log SPM, illustrating the effects of ignoring potential stochasticity of the model. The observed series was generated with a storm-dependent k with 30% CV; the simulated series was obtained by SLS calibration assuming all parameters are fixed.
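The sensitivity screen behind Fig. 5 can be sketched as follows. A hypothetical one-store model replaces log SPM so the snippet stays self-contained, but the procedure follows the text: a deterministic reference series Q_o, a perturbed series Q_i with one storm-dependent parameter, and their Nash-Sutcliffe comparison.

```python
import numpy as np

def nash_sutcliffe(q_obs, q_sim):
    """NS = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    q_obs, q_sim = np.asarray(q_obs), np.asarray(q_sim)
    return 1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)

def simulate(rain, storms, k_values):
    """Toy linear store whose parameter may change at each storm epoch.
    storms[i] gives the epoch index of day i; k_values[storms[i]] applies that day."""
    s, q = 50.0, []
    for r, ep in zip(rain, storms):
        s += r
        out = k_values[ep] * s
        s -= out
        q.append(out)
    return np.array(q)

rng = np.random.default_rng(7)
n_days, n_epochs = 730, 71                       # two years, 71 epochs (as in the record)
rain = rng.exponential(4.0, n_days)
storms = np.minimum(np.arange(n_days) * n_epochs // n_days, n_epochs - 1)

k_mean, cv = 0.3, 0.3
q_ref = simulate(rain, storms, np.full(n_epochs, k_mean))   # deterministic Q_o
k_stoch = k_mean * np.exp(rng.normal(0.0, cv, n_epochs))    # log-normal storm-dependent k
q_stoch = simulate(rain, storms, k_stoch)                    # stochastic Q_i
ns = nash_sutcliffe(q_ref, q_stoch)                          # NS drop measures sensitivity
```

Repeating this for each parameter in turn, over a range of CVs, yields a sensitivity plot of the kind shown in Fig. 5.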
(e.g., storm-dependent) parameters. In this section the error propagation process outlined in Fig. 1(a) is formally defined in terms of the hierarchical model shown in Fig. 7. Suppose a hydrologic time series is partitioned into n epochs {(t_i, t_{i+1} − 1), i = 1, ..., n}, where t_i is the time step index corresponding to the beginning of the ith epoch. Each epoch begins with a storm event and ends with a dry spell exceeding a minimum duration. The observed response time series for the ith epoch is q̃_i = {q̃_t; t = t_i, ..., t_{i+1} − 1}, whereas q_i is the true response time series for the ith epoch. The vectors x̃_i and x_i contain the observed and true forcing time series, respectively, for the ith epoch. The BATEA hypothesis of Kavetski et al. (2006c) assumed a function g(x̃_i, φ_i) that maps the observed forcing x̃_i into the true forcing x_i. The function g(·) accounts for the sampling and measurement error in the observed forcing. For example, Kavetski et al. (2002, 2006c) considered the special case of φ_i being a scalar storm depth multiplier, which yields the mapping x_i = φ_i x̃_i (note that in log SPM, the parameter rMult serves as φ). The vector φ_i is assumed to vary from storm to storm and to be a random realisation from the probability model with pdf p(φ | α):

φ_i ~ p(φ | α)    (8)
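A minimal sketch of this multiplicative input error model: one log-normal multiplier φ_i is drawn per storm epoch and applied to the observed rainfall of that epoch. The hyper-parameter values α = (μ, σ) below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)

def true_rainfall(rain_obs_epochs, mu_log_phi=0.0, sigma_log_phi=0.2):
    """Apply one log-normal storm-depth multiplier phi_i per epoch:
    x_i = phi_i * x~_i, with phi_i ~ p(phi | alpha) and alpha = (mu, sigma) assumed."""
    phis = np.exp(rng.normal(mu_log_phi, sigma_log_phi, size=len(rain_obs_epochs)))
    scaled = [phi * np.asarray(x) for phi, x in zip(phis, rain_obs_epochs)]
    return scaled, phis

epochs = [np.array([5.0, 2.0, 0.0]), np.array([12.0, 7.0])]   # observed rain per epoch (mm)
true, phis = true_rainfall(epochs)
```

Note that the multiplier rescales a whole epoch at once, so the within-storm temporal pattern of the observed rainfall is preserved.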
where α is a vector of parameters describing the statistical properties of the input errors (e.g., the mean and variance of the multipliers). Since the same averaged inputs can yield different internal dynamics and thus different catchment responses, we extend the BATEA hypothesis by relaxing the assumption that the CRR model is deterministic in the sense of producing
Figure 7 Hierarchical BATEA model of error propagation: observed input x̃_i; true input x_i = g(x̃_i, φ_i) with input error parameters φ_i ~ p(φ | α); true streamflow q_i = h(x_i, θ_i, ω) with storm-dependent parameters θ_i ~ p(θ | β); observed streamflow q̃_i ~ p(q̃ | q_i, γ) (response error).
a unique response for a given forcing input and a set of CRR parameters. Specifically, we assume that for each epoch there exists a CRR model h(x_i, θ_i, ω) that maps the true forcing x_i into the true response q_i, where θ_i is a set of event-specific CRR model parameters drawn from the hyper-distribution p(θ | β):

θ_i ~ p(θ | β)    (9)

where β are the CRR hyper-parameters (e.g., means and variances of the parameters). The storm-dependent parameters are therefore treated as latent (or hidden) variables. The true response for the ith storm epoch then becomes

q_i = h(x_i, θ_i, ω)    (10)

where ω are the time-invariant CRR parameters. The observed response is corrupted by errors in the gauging process and is assumed to be distributed according to

q̃_i ~ p(q̃ | q_i, γ)    (11)

where p(q̃ | q, γ) describes the response measurement error and is conditioned on the true discharge q and the parameter set γ that characterises the error process. The hierarchical BATEA model is atypical of Bayesian hierarchical models, since the sampling of the true response q is not independent of earlier storm epochs. This complication arises because the time memory of CRR models induces a dependence between storm epochs: storm-dependent parameters of early storm events affect model responses in subsequent events. This dependence requires careful attention and precludes routine application of Bayesian hierarchical model packages.

In the most general case, inference is based on the full posterior pdf p(α, β, ω, γ, θ_{1:n}, φ_{1:n} | Q̃, X̃), where θ_{1:n} = {θ_1, ..., θ_n} contains the sets of storm-dependent CRR parameter realisations for all the storms and φ_{1:n} = {φ_1, ..., φ_n}. Direct evaluation of the integrals required to marginalise this posterior is formidable due to its high dimensionality and strong nonlinearity. Following Kavetski et al. (2002), it is advantageous (both statistically and computationally) to work directly with the full posterior. Our primary interest in this study is to find the most probable (or modal) posterior parameters, given by the maximum of the posterior pdf

(α̂, β̂, ω̂, γ̂, θ̂_{1:n}, φ̂_{1:n}) = arg max over (α, β, ω, γ, θ_{1:n}, φ_{1:n}) of p(α, β, ω, γ, θ_{1:n}, φ_{1:n} | Q̃, X̃)    (13)

To focus on the key issues characterising model uncertainty using storm-dependent parameters, the following simplifications can be made:

1. Since the latent variables of the input and structural error models are both associated with storm epochs, they are combined into θ, the set of storm-dependent parameters.
2. The response measurement error parameters γ are assumed to be known. In the case of streamflow, this is a reasonable assumption since at a well-maintained gauging station there would be numerous flow gaugings to develop the rating curve.

Exploiting these simplifications along with those offered by the hierarchical structure of BATEA yields the following posterior pdf:

p(β, ω, θ_{1:n} | Q̃, X̃, γ) ∝ p(Q̃ | β, ω, θ_{1:n}, X̃, γ) p(β, ω, θ_{1:n} | X̃, γ)
  = p(Q̃ | ω, θ_{1:n}, X̃, γ) p(θ_{1:n} | β, ω, X̃, γ) p(β, ω | X̃, γ)
  ∝ p(Q̃ | ω, θ_{1:n}, X̃, γ) p(θ_{1:n} | β) p(β) p(ω)    (14)

where p(Q̃ | ω, θ_{1:n}, X̃, γ) is the likelihood function (sampling distribution) of Q̃ which, according to Fig. 7, is independent
from β; p(θ_{1:n} | β) is the hyper-distribution of θ_{1:n}, which depends only on β; and p(ω) and p(β) are prior pdfs. In addition to identifying the modal values of the storm-dependent parameters and the parameters of their hyper-distribution by maximising the objective function (14), the full distribution of these quantities could be obtained using a Monte Carlo or Markov chain Monte Carlo method. However, this is nontrivial due to the high dimensionality of the posterior pdf (14) and the dependence between storm epochs described in the previous section. Consequently, this paper limits itself to determining the modal values of all quantities of interest. In the case of the storm-dependent model parameters, this includes the mean and standard deviation of their hyper-distributions and therefore gives significant information regarding the overall shape of these distributions. Avenues for more thorough analysis of posterior BATEA pdfs will be investigated in future papers.
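The modal search works with the negative logarithm of a posterior of the form (14). The sketch below assembles such an objective for a hypothetical one-store model with a Gaussian hyper-distribution on the storm-dependent log-parameters; the prior, data and parameter values are placeholders, not those of the case study.

```python
import numpy as np

def neg_log_posterior(params, rain, storms, q_obs, sigma_resp, prior_logpdf):
    """Negative log of a posterior of form (14): Gaussian likelihood for observed
    runoff, Gaussian hyper-distribution for storm-dependent log-parameters, plus
    a prior on the hyper-parameters (toy one-store model, illustrative only)."""
    mu, log_sigma = params[0], params[1]     # hyper-parameters beta
    log_ks = params[2:]                      # latent storm-dependent log-parameters theta_1:n
    sigma = np.exp(log_sigma)

    # likelihood: run the model with one parameter realisation per storm epoch
    s, q_sim = 50.0, []
    for r, ep in zip(rain, storms):
        s += r
        out = np.exp(log_ks[ep]) * s
        s -= out
        q_sim.append(out)
    q_sim = np.array(q_sim)
    nll = 0.5 * np.sum((q_obs - q_sim) ** 2) / sigma_resp ** 2

    # hyper-distribution p(theta | beta)
    nll += np.sum(0.5 * ((log_ks - mu) / sigma) ** 2 + np.log(sigma))

    # prior p(beta); a uniform p(omega) would add nothing
    nll -= prior_logpdf(mu, sigma)
    return nll

prior = lambda mu, sigma: -0.5 * (mu + 1.2) ** 2 / 0.5 ** 2 - np.log(sigma)
rng = np.random.default_rng(5)
rain = rng.exponential(4.0, 60)
storms = np.arange(60) // 10                 # six storm epochs of ten days each
q_obs = rng.exponential(1.0, 60)             # placeholder "observed" runoff
x0 = np.concatenate([[np.log(0.3), np.log(0.2)], np.full(6, np.log(0.3))])
val = neg_log_posterior(x0, rain, storms, q_obs, sigma_resp=0.25, prior_logpdf=prior)
```

Minimising this objective jointly over the hyper-parameters and the latent storm-dependent parameters yields the posterior mode sought in eq. (13), here with γ fixed as in simplification 2.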
stream water balances; and (iii) it uses an implicit Euler scheme with convergence to machine precision for the nonlinear soil water balance ODE (5). These model implementation techniques enable the use of computationally fast Newton-type methods to maximise the objective function (14). For example, the BATEA calibration to 71 storm events with 146 parameters in the following case study takes about 3 min of CPU time on a standard 2 GHz laptop processor.
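A Newton-type mode search on a log-transformed parameter can be sketched in one dimension as follows; the toy objective and its values are illustrative, not the 146-parameter BATEA problem.

```python
import numpy as np

def neg_log_post_1d(log_k, data_mean, n, prior_mu=0.0, prior_sd=1.0):
    """Toy negative log-posterior in the log-transformed parameter log k.
    Working in log space keeps k positive and reduces nonlinearity."""
    k = np.exp(log_k)
    return 0.5 * n * (data_mean - k) ** 2 + 0.5 * ((log_k - prior_mu) / prior_sd) ** 2

def newton_1d(f, x0, h=1e-5, iters=50):
    """Finite-difference Newton search for the posterior mode."""
    x = x0
    for _ in range(iters):
        g = (f(x + h) - f(x - h)) / (2 * h)              # gradient
        H = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2    # curvature
        if H <= 0:
            H = 1.0                                       # fall back to a gradient step
        x -= g / H
    return x

log_k_hat = newton_1d(lambda z: neg_log_post_1d(z, data_mean=0.8, n=50), x0=0.0)
k_hat = np.exp(log_k_hat)
```

Because the objective is smooth and nearly quadratic in log k near the mode, the Newton iteration converges in a handful of steps, which is the efficiency the implementation techniques above are designed to preserve.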
Case study
The Abercrombie catchment is revisited to explore the hypothesis that storm-dependent parameters adequately describe input (rainfall) and model uncertainty. The likelihood function p(Q̃ | ω, θ_{1:n}, X̃, γ) assumes that the daily streamflow measurement errors are independently and normally distributed with zero mean and a standard deviation of 0.25 mm. This streamflow measurement error model was selected for convenience and suffices for the purpose of this case study. The key point is that the measurement error model is inferred independently of the BATEA calibration, typically by analysis of rating curve residuals. This assists the BATEA inference because it reduces the effects of one of the three sources of uncertainty. Storm epochs were defined by inter-storm dry spells of two or more days, followed by rainfall exceeding a 0.5 mm/day threshold. In the two-year daily record, 71 such storm epochs were identified. The log SPM model was first calibrated using SLS, which yielded an NS statistic of 0.736 (the same as for the more complex Sacramento model). The parameter sF was highly correlated with ssfMax and rgeMax, and was therefore fixed at its SLS value in all subsequent runs involving BATEA. This avoids the confounding effects associated with strong parameter interaction (which are important in practice but lie beyond the scope of this paper). In all calibrations, the model parameters were log-transformed to reduce the parameterisation nonlinearity of the objective function and to ensure that the parameters remain positive. The search for the mode of the posterior pdf (14) was implemented in three steps:

1. The CRR parameters most likely to be storm-dependent were identified using the Nash-Sutcliffe sensitivity plot (Fig. 5); these were k and rMult.
2. The posterior mode of the parameters β, ω and θ_{1:n} was estimated by a quasi-Newton optimisation scheme using the SLS estimates as initial values.
The optimisation was based on the logarithm of the posterior pdf to reduce the nonlinearity of the problem.
3. Posterior diagnostics were evaluated to check the assumptions of the storm-dependent parameter models.

Whereas uniform noninformative priors were specified for the deterministic parameters ω, weakly informative priors were prescribed for the hyper-parameters β, since otherwise the posterior pdf becomes unbounded and hence ill-posed (Kavetski et al., 2006c). For log rMult, the prior on the hyper-mean was a normal distribution with zero mean and standard deviation of 0.1, whereas the prior on the hyper-variance was a scaled inverse-χ² distribution with a single
degree of freedom and a scale of 0.2 (which is equivalent to a CV of 20% on rMult); specifying a single degree of freedom ensures that the prior is proper but very diffuse. For the storm-dependent parameter log k, the prior on the hyper-mean was a Gaussian pdf with a mean of −3.88 (equal to the SLS value) and standard deviation of 0.5, whereas the prior on the hyper-variance was a scaled inverse-χ² distribution with a single degree of freedom and scale of 0.5. Table 2 summarises the hyper-parameters and prior distributions used in this case study. Under these assumptions, the posterior pdf (14) can be expressed using the notation of Gelman et al. (1995, Table A.1) as

p(β, ω, θ_{1:n} | Q̃, X̃, γ) ∝ p(Q̃ | ω, θ_{1:n}, X̃, γ) p(θ_{1:n} | β) p(β) p(ω)    (15)

where

p(Q̃ | ω, θ_{1:n}, X̃, γ) = ∏_{i=1}^{n} N(q̃_i | h(x̃_i rMult_i, θ_i, ω), σ_γ²)
p(θ_{1:n} | β) = ∏_{i=1}^{n} N(θ_i | μ_θ, σ_θ²)
p(β) = N(μ_θ | μ_p, σ_p²) Inv-χ²(σ_θ² | ν, s²)
p(ω) = constant

Here N(z | μ, σ²) is the joint pdf of the vector z whose ith component is independently distributed as a normal variate with mean μ_i and variance σ_i², while Inv-χ²(y | ν, s²) is the joint pdf of the vector y whose ith component is independently distributed as a scaled inverse-χ² variate with degrees of freedom ν_i and scale s_i. The streamflow sampling distribution is N(q̃_i | h(x̃_i rMult_i, θ_i, ω), σ_γ²), where the expected streamflow is given by the log SPM model h(·) with input during the ith storm epoch equal to x̃_i rMult_i, and the standard deviation of measurement error σ_γ is equal to 0.25 mm. The hyper-parameter vector β consists of the mean μ_θ and the variance σ_θ² of the storm-dependent parameters θ. The vectors μ_p and σ_p² are, respectively, the prior mean and variance
Table 2  Summary of distributions used in the application of BATEA in the Abercrombie River case study

Variable            Probability model            Prior distribution
k                   log k ~ N(μ_k, σ_k²)         μ_k ~ N(3.88, 0.5²); σ_k² ~ Inv-χ²(1, 0.5²)
sF (a)              —                            Uniform
ssfMax (a)          —                            Uniform
rgeMax (a)          —                            Uniform
kBF (a)             —                            Uniform
kStream (a)         —                            Uniform
rMult               log rMult ~ N(μ_r, σ_r²)     μ_r ~ N(0, 0.1²); σ_r² ~ Inv-χ²(1, 0.2²)
Streamflow error    N(0, 0.25²)                  —

(a) These variables were assumed time-invariant and therefore no probability model was required.
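As a concrete illustration, the hierarchical prior structure in Table 2 can be sketched in code. This is a hedged sketch rather than the authors' implementation: the function names and the use of NumPy are assumptions, and the printed prior values are taken at face value (signs may have been lost in extraction).

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_scaled_inv_chi2(rng, nu, s2, size=None):
    # If X ~ scaled-Inv-chi2(nu, s2), then X = nu * s2 / chi2_nu
    # (see Gelman et al., 1995, Appendix A).
    return nu * s2 / rng.chisquare(nu, size=size)

# Hyper-priors from Table 2 (values as printed in the text).
mu_k = rng.normal(3.88, 0.5)                    # hyper-mean of log k
var_k = sample_scaled_inv_chi2(rng, 1, 0.5**2)  # hyper-variance of log k
mu_r = rng.normal(0.0, 0.1)                     # hyper-mean of log rMult
var_r = sample_scaled_inv_chi2(rng, 1, 0.2**2)  # hyper-variance of log rMult

# One draw of the storm-dependent parameters for the 71 storm epochs.
log_k = rng.normal(mu_k, np.sqrt(var_k), size=71)
log_rmult = rng.normal(mu_r, np.sqrt(var_r), size=71)
```

Drawing the hyper-parameters first and then the 71 storm-dependent values makes the two-level structure of the hierarchical model explicit.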
of μ_h, while the vectors ν and s are, respectively, the prior degrees of freedom and prior scale of σ_h².

Three BATEA runs using different combinations of storm-dependent parameters were undertaken:

BATEA Run 1: storm-dependent parameter k. According to Fig. 5, stochastic variation of parameter k is likely to account for the largest variation between simulated and observed daily runoff. Table 3 reports the NS statistic for the posterior modal fit, the posterior pdf at the mode, and modal values for the deterministic parameters x and storm-dependent hyper-parameters β. The posterior modal fit uses the values of x and h_{1:n} that maximise the posterior pdf (14); this gives the best possible fit to the observed data because the storm-dependent parameters h_{1:n} have been optimised for each epoch. Table 3 shows that the NS statistic climbs from 0.736 for the SLS fit using deterministic parameters to 0.897 if k is treated as storm-dependent.

BATEA Run 2: storm-dependent parameter rMult. Fig. 5 also suggests that stochastic variation of parameter rMult is likely to have similar sensitivity to parameter k. Table 3 reports that the NS statistic climbs to 0.938.

BATEA Run 3: storm-dependent parameters k and rMult. In the third run, parameters k and rMult are made storm-dependent. Table 3 reports that the NS statistic climbs to 0.947. This suggests that the NS statistic is starting to plateau and that significant further improvement in the goodness-of-fit is unlikely. Fig. 8 presents a time series plot of observed daily runoff and simulated runoff with storm-dependent parameters taking their modal value for each storm epoch. Unlike the SLS fit in Fig. 3, the BATEA fit is excellent, with only small discrepancies at peaks and in recessions. The change in the standard deviation of the hyper-distribution of k and rMult, depending on which parameters are treated stochastically, is also noted. The modal standard deviation of log k is 0.209 in run 1, whereas in run 3 it drops to 0.075.
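The NS values quoted above follow the standard Nash–Sutcliffe definition, which can be computed directly from the observed and simulated series (a minimal sketch; the function name is mine):

```python
import numpy as np

def nash_sutcliffe(q_obs, q_sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual
    sum of squares to the total variance of the observed series."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    return 1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)
```

A perfect fit gives 1.0 and predicting the observed mean gives 0.0, so the climb from 0.736 (SLS) to 0.947 (run 3) represents a substantial reduction in residual variance.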
In run 1, k was the only storm-dependent parameter and had to compensate for both input and model error. In run 3, rMult was allowed to vary between storms and dealt more directly with input errors, thus allowing k to focus on model error. Comparison of the deterministic model parameters across the three runs reveals that their modal values are also sensitive to the choice of storm-dependent parameter. Table 3 shows the results of the SLS calibration (assuming all model parameters, including k and rMult, are time-invariant). Comparison with BATEA run 3 reveals that the parameters have shifted markedly, suggesting that SLS calibration ignoring model and input error can lead to significant parameter bias. This finding extends the results of Kavetski et al. (2002), who demonstrated parameter bias in a synthetic example with corrupt inputs and no model error. However, in the absence of uncertainty measures on the parameters (and since the true parameter values are never known in a real-data study), the suspected bias in this case study cannot be confirmed. The differences between the SLS and BATEA modal estimates for the rainfall multiplier rMult are of interest: SLS infers a modal rMult of 1.20, whereas BATEA run 3 infers a modal value of 0.74. This marked difference in the
Table 3  Summary of BATEA calibration in the Abercrombie River case study

BATEA run   NS statistic   Log-pdf at posterior mode   Parameter       Mean     Std dev
Run 1       0.897          196.5                       loge k          2.788    0.209
                                                       loge sF         7.746
                                                       loge ssfMax     1.351
                                                       loge rgeMax     1.276
                                                       loge kBF        9.171
                                                       loge kStream    0.181
                                                       loge rMult      0.107
Run 2       0.938          4.35                        loge k          2.060
                                                       loge sF         7.746
                                                       loge ssfMax     4.456
                                                       loge rgeMax     3.303
                                                       loge kBF        8.840
                                                       loge kStream    0.984
                                                       loge rMult      0.324    0.271
Run 3       0.947          196.6                       loge k          2.127    0.075
                                                       loge sF         7.746
                                                       loge ssfMax     4.492
                                                       loge rgeMax     3.358
                                                       loge kBF        8.933
                                                       loge kStream    0.973
                                                       loge rMult      0.300    0.272
SLS         0.736          816.8                       loge k          3.864
                                                       loge sF         7.746
                                                       loge ssfMax     0.559
                                                       loge rgeMax     1.721
                                                       loge kBF        10.18
                                                       loge kStream    0.747
                                                       loge rMult      0.185
Figure 8 Time series of observed and calibrated (posterior modal) runoff for the Abercrombie River obtained using BATEA with storm-dependent log SPM parameters: k and rMult.
estimated rainfall error profoundly affects the log SPM simulation of the groundwater store depth h. Over the calibration period, the SLS simulation forces the groundwater store to increase by about 400 mm, while in the BATEA simulation the groundwater store declines by a modest 30 mm over the same two-year period. Because the log SPM model has no capability to lose water, the SLS simulation has to soak up the excess rainfall by storing it in the groundwater store. Such physically unreasonable behaviour of internal model variables is a potential artefact.
Figure 9  Normal probability plots for the calibrated (modal) storm-dependent parameters k and rMult [symbols] and fitted Gaussian hyper-distributions [solid lines].
Posterior diagnostics
While the results presented in Fig. 8 are encouraging (allowing storm-dependent variation in just two parameters significantly improved the fit), it is necessary to probe, using posterior diagnostics, the specific assumptions made in this BATEA analysis. The major assumption is that the storm-dependent parameters are independent realisations from a log-normal distribution. This assumption can be readily evaluated since there are 71 storm epochs and hence 71 realisations. Fig. 9 presents normal probability plots for log k and log rMult along with the fitted hyper-distributions, while Table 4 summarises the Kolmogorov–Smirnov statistics. In the case of log k, the underlying distribution is largely normal, with two outliers being largely responsible for the failure to conform to a normal distribution (it is clear that the fitted hyper-distribution is significantly affected by the outliers). However, the distribution of log rMult seems more complicated: while the normal distribution provides a reasonable first approximation, there are clear systematic departures from normality in the lower tail and near the median. Nonetheless, visual inspection suggests that the log-normal hyper-distribution describes the storm-dependent variation of parameter rMult better than the variation in parameter k.
Table 4  Posterior diagnostics for BATEA run 3

Parameter   Statistic                  Value of statistic   5% significance value
k           Kolmogorov–Smirnov         0.195                0.105
k           Nonparametric runs test    1.94                 1.96
rMult       Kolmogorov–Smirnov         0.139                0.105
rMult       Nonparametric runs test    1.23                 1.96
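The Kolmogorov–Smirnov statistics in Table 4 compare the empirical distribution of the 71 modal parameter values with the fitted Gaussian hyper-distribution. A minimal NumPy sketch of the two-sided statistic follows; the function names are mine, and this is an illustrative implementation rather than the authors' code:

```python
import math
import numpy as np

def normal_cdf(x, mu, sd):
    # CDF of N(mu, sd^2), evaluated elementwise via the error function.
    z = (np.asarray(x, dtype=float) - mu) / (sd * math.sqrt(2.0))
    return np.array([0.5 * (1.0 + math.erf(v)) for v in z])

def ks_statistic(sample, mu, sd):
    """Two-sided Kolmogorov-Smirnov statistic of `sample` against
    N(mu, sd^2): the largest gap between the empirical CDF and the
    fitted CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    f = normal_cdf(x, mu, sd)
    d_plus = (np.arange(1, n + 1) / n - f).max()
    d_minus = (f - np.arange(0, n) / n).max()
    return max(d_plus, d_minus)
```

On this scale, the value of 0.195 for log k against the 5% critical value of 0.105 quantifies the influence of the two outliers visible in Fig. 9.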
Further insight can be gleaned from the time series plots of the storm-dependent parameters shown in Fig. 10, along with the nonparametric runs test statistics in Table 4. While the runs test statistics do not reject the hypothesis that the storm-dependent parameters are independent, inspection of the time series plots reveals that the outliers tend to cluster and that the model parameter values for consecutive storm events are often nearly identical. These second-order effects suggest that the definition of storm epochs requires further refinement. More generally, the optimal definition of the time scale at which the model parameters vary stochastically is unresolved and will be investigated in future work. For example, sampling once a day seems less attractive than sampling at the beginning of a storm, since the latter represents the commencement of a forcing event and thus sets a natural time scale for the system. In the spirit of CRR modelling, the statistical representation of model uncertainty should be parsimonious, favouring fewer latent variables to avoid over-parameterisation (over-fitting) and statistical ill-posedness. Finally, increasing the time resolution of the time-dependent parameters raises the dimensionality of the objective function (14) and yields a progressively more difficult computational problem. The relationship between the input data and the associated latent variables is further explored in Fig. 11, which shows scatter plots of storm rainfall depth versus the storm-dependent parameters. While parameter k exhibits no significant relationship with storm depth, parameter rMult appears to exhibit a statistically significant relationship with storm depth (the p-value on the linear trend slope parameter is 0.0013). However, this is a tenuous relationship: if the two largest storms (storms 2 and 45) were removed, there would be no significant relationship with storm depth.
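The nonparametric runs test used in Table 4 can be sketched as a Wald–Wolfowitz runs test on signs relative to the median. The paper does not give its exact implementation, so this is an assumed form:

```python
import numpy as np

def runs_test_z(series):
    """Wald-Wolfowitz runs test for independence. Counts runs of
    values above/below the median and standardises the count against
    its mean and variance under independence; |z| > 1.96 rejects
    independence at the 5% level."""
    x = np.asarray(series, dtype=float)
    above = x > np.median(x)
    n1 = int(above.sum())            # values above the median
    n2 = len(x) - n1                 # values at or below the median
    runs = 1 + int(np.sum(above[1:] != above[:-1]))
    mean = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * ((n1 + n2) - 1.0)
    )
    return (runs - mean) / np.sqrt(var)
```

Clustering produces fewer runs than expected under independence; the statistics of 1.94 and 1.23 in Table 4 fall just inside the 5% critical value of 1.96.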
The evidence supporting a relationship between rMult and storm depth is therefore inconclusive. Overall, the assumption that k and rMult are independently and log-normally distributed seems reasonable in the Abercrombie case study. However, there is a clear need to accommodate outliers using distributions more kurtotic than the Gaussian distribution. Another assumption that can be tested concerns the likelihood function p(Q̃ | x, h_{1:n}, X̃, c). It is assumed that after
Figure 10  Time series of the calibrated (modal) log SPM storm-dependent parameters: k and rMult.
Figure 11  Scatter plots of storm rainfall depth versus the calibrated (modal) storm-dependent parameters: log k (fitted trend y = 3E-05x − 2.127, R² = 0.0004) and log rMult (fitted trend y = −0.0035x − 0.231, R² = 0.141).
allowance has been made for storm-dependent parameters, the residuals (defined as the difference between observed runoff and runoff computed using the modal parameter values for each storm event) are independently and normally distributed with zero mean and a standard deviation of 0.25 mm. Fig. 12 presents a normal probability plot of the residuals. While the distribution of residuals is symmetric (with mean of 0.001 and standard deviation of 0.245 mm), its tails are considerably fatter than expected for a Gaussian distribution. The generalisation of the normal model described by Box and Tiao (1973) and implemented in BaRE (Thiemann et al., 2001) could therefore be preferable. Finally, Fig. 13 shows the residual time series: while the autocorrelation is not significantly different from zero, the nonparametric runs test yields a statistic of 12.37, strongly rejecting the assumption of independence. However, given the small magnitude of the residuals, this is a relatively minor issue.
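The fat-tail and dependence diagnoses above can be quantified with simple moment checks of the residual series. This is a sketch under the assumption that the residuals are available as an array; the function names are mine:

```python
import numpy as np

def excess_kurtosis(residuals):
    """Sample excess kurtosis: approximately zero for a Gaussian
    sample, positive for distributions with fatter tails."""
    r = np.asarray(residuals, dtype=float)
    z = (r - r.mean()) / r.std()
    return float(np.mean(z ** 4) - 3.0)

def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of the residual series; values near zero
    are consistent with serial independence."""
    r = np.asarray(residuals, dtype=float)
    d = r - r.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))
```

A clearly positive excess kurtosis would support replacing the Gaussian likelihood with the heavier-tailed exponential power family of Box and Tiao (1973).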
Figure 12  Posterior check of the response error model: normal probability plot of the log SPM model residuals (defined as the discrepancy between observed and calibrated runoff).
Figure 13  Time series of the residual errors of the log SPM simulation with modal parameter values estimated using BATEA.
The uncertainty in the rainfall multiplier rMult dominates the overall predictive uncertainty. Conversely, the variation in the model responses due to parameter k is relatively minor, suggesting that in this case study the model error is dominated by the uncertainty in observed inputs.
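The 90% prediction limits shown in Figs. 14 and 15 can be approximated by Monte Carlo propagation of the storm-dependent parameters through the model. The sketch below assumes a generic `simulate` callable standing in for the log SPM model, and uses illustrative (not the paper's) hyper-parameter values:

```python
import numpy as np

rng = np.random.default_rng(7)

def prediction_limits(simulate, mu, sd, n_draws=500, levels=(5.0, 50.0, 95.0)):
    """Draw a storm-dependent parameter from its fitted log-normal
    hyper-distribution, simulate runoff for each draw, and return the
    requested percentiles of the simulated hydrographs."""
    draws = np.exp(rng.normal(mu, sd, size=n_draws))
    sims = np.stack([np.asarray(simulate(d), dtype=float) for d in draws])
    return np.percentile(sims, levels, axis=0)

# Toy stand-in model: runoff proportional to the sampled parameter.
rain = np.array([0.0, 5.0, 12.0, 3.0, 0.0])
lower, median, upper = prediction_limits(lambda k: k * rain, mu=-2.1, sd=0.2)
```

Taking percentiles across the simulated ensemble at each time step yields the median and the 5%/95% limits plotted in the figures.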
Figure 14  Time series of the median and 90% prediction limits due to the uncertainty in storm-dependent parameters k and rMult identified using BATEA.
Figure 15  Time series of the median and 90% prediction limits due to the uncertainty in storm-dependent parameter k only, identified using BATEA.
ent uncertainty in this description. This paper shows that such a framework can be built using formal Bayesian theory. GLUE and BATEA share the recognition that model error is significant and difficult to characterise. However, the conceptual frameworks are fundamentally different. While BATEA is built on the error propagation framework shown in Fig. 1(a) and explicitly differentiates between input, response and model error, GLUE uses parameter uncertainty to represent all sources of error. GLUE remains rooted in the deterministic CRR model, since each CRR time series is generated using time-invariant parameters sampled from the behavioural parameter set, whereas BATEA allows parameters to vary stochastically from storm to storm. The exclusive focus on parameter uncertainty in GLUE creates conceptual difficulties in its derivation from a strict Bayesian perspective. For example, although GLUE uses Bayesian updating, its likelihood functions are not proper; indeed, they are often termed pseudo-likelihood functions in recognition that subjective goodness-of-fit criteria are used to construct the likelihood function and that the likelihood function becomes independent of the number of observations. In contrast, BATEA attempts to directly represent input, response and model error within the standard Bayesian framework, making all assumptions explicit and open to challenge. Seen in this light, BATEA includes some of the philosophical basis of GLUE (which abandons the notion of single true parameters) and seeks to improve on it by explicitly disaggregating input, model and response error using formal Bayesian strategies.
Conclusions
The characterisation of model error in CRR modelling has been thwarted by the convenient but indefensible assumption that CRR models are deterministic descriptions of catchment dynamics. Explicit acceptance that CRR models are fundamentally stochastic paves the way for a more rational characterisation of model error. This paper argues that the fluxes in CRR models are fundamentally stochastic because they involve spatial and temporal averaging. The challenge is to characterise this stochasticity in a way that is consistent with available evidence and is statistically and computationally tractable.
We proposed the hypothesis that the structural error of CRR models can be characterised by storm-dependent random variation of one or more CRR model parameters. A sensitivity analysis was designed to identify the parameters most likely to vary between storms. A Bayesian hierarchical model (BATEA) was then developed to explicitly differentiate between input, response and model error using storm-dependent parameters. The hypothesis that storm-dependent parameters are independent and log-normally distributed was evaluated in a case study. Posterior diagnostics showed that this hypothesis is reasonably consistent with the evidence, although the need to deal with outliers was recognised. This study moves one step closer to a total error formalism that (i) enables a rational assessment of predictive uncertainty, (ii) allows rigorous testing of competing CRR model hypotheses, and (iii) removes parameter biases that can confound regionalisation of CRR parameters. Nonetheless, significant problems remain, most notably the optimal characterisation of the apparent stochasticity of CRR models and the identification of the time scale at which this stochasticity operates. The intuitive approach of storm-dependent parameters proposed in this study is in general agreement with the evidence, yet it may be possible to derive more rigorous stochastic formulations by investigating the mechanics of spatial and temporal averaging. The computational issues of accommodating storm-dependent parameters are formidable because the dimensionality of the problem depends on the number of storms in the calibration time period. In this study, we overcame these difficulties by selecting a numerically smooth CRR model amenable to optimisation using computationally fast Newton-type methods.
We are presently working on methods that avoid the growth of the dimensionality of the problem and permit the use of more common numerically nonsmooth CRR models.
Acknowledgment
This work was partially funded by a grant from the Australian Research Council.
References
Beven, K.J., Binley, A.M., 1992. The future of distributed hydrological models: model calibration and uncertainty prediction. Hydrological Processes 6, 279–298.
Box, G.E.P., Tiao, G.C., 1973. Bayesian Inference in Statistical Analysis. Addison-Wesley, Boston, MA.
Bras, R.L., Rodriguez-Iturbe, I., 1985. Random Functions in Hydrology. Addison-Wesley.
Duan, Q., Sorooshian, S., Gupta, V.K., 1992. Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resources Research 28, 1015–1031.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 1995. Bayesian Data Analysis. Chapman and Hall.
Kavetski, D., Franks, S.W., Kuczera, G., 2002. Confronting input uncertainty in environmental modelling, in calibration of watershed models. In: Duan, Q., Gupta, H., Sorooshian, S., Rousseau, A., Turcotte, R. (Eds.), Water Science and Application Series 6. American Geophysical Union, Washington, DC, pp. 49–68.
Kavetski, D., Kuczera, G., Franks, S.W., 2003. Semi-distributed hydrological modelling: a saturation path perspective on TOPMODEL and VIC. Water Resources Research 39 (9), 1246–1253. doi:10.1029/2003WR00212.
Kavetski, D., Kuczera, G., Franks, S.W., 2006a. Calibration of conceptual hydrological models revisited: 1. Overcoming numerical artefacts. Journal of Hydrology 320 (1–2), 173–186 (The model parameter estimation experiment MOPEX, Sapporo, Japan, Edited by Schaake, J., Qingyun Duan).
Kavetski, D., Kuczera, G., Franks, S.W., 2006b. Calibration of conceptual hydrological models revisited: 2. Improving optimisation and analysis. Journal of Hydrology 320 (1–2), 187–201 (The model parameter estimation experiment MOPEX, Sapporo, Japan, Edited by Schaake, J., Qingyun Duan).
Kavetski, D., Kuczera, G., Franks, S.W., 2006c. Bayesian analysis of input uncertainty in hydrological modelling. I. Theory. Water Resources Research 42, W03407. doi:10.1029/2005WR004368.
Kavetski, D., Kuczera, G., Franks, S.W., 2006d. Bayesian analysis of input uncertainty in hydrological modelling. II. Application. Water Resources Research 42, W03408. doi:10.1029/2005WR004376.
Kuczera, G., 1990. Estimation of runoff-routing model parameters using incompatible storm data. Journal of Hydrology 114 (1–2), 47–60.
Kuczera, G., Franks, S.W., 2002. Testing hydrologic models: fortification or falsification? In: Singh, V.P., Frevert, D.K. (Eds.), Mathematical Modelling of Large Watershed Hydrology. Water Resources Publications, Littleton, CO.
Nocedal, J., Wright, S.J., 1999. Numerical Optimization. Springer-Verlag, New York.
Thiemann, M., Trosset, M., Gupta, H., Sorooshian, S., 2001. Bayesian recursive parameter estimation for hydrological models. Water Resources Research 37, 2521–2535.
Thyer, M., Kuczera, G., Bates, B.C., 1999. Probabilistic optimization for conceptual rainfall-runoff models: a comparison of the shuffled complex evolution and simulated annealing algorithms. Water Resources Research 35 (3), 767–773.
Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., Verstraten, J.M., 2005. Improved treatment of uncertainty in hydrological modelling: combining the strengths of global optimization and data assimilation. Water Resources Research 41, W01017. doi:10.1029/2004WR0030059.