You are on page 1of 9

VOL. $, NO.

4 WATER RESOURCES RESEARCH FOURTH QUARTER 1967

Mathematical
Assessment Synthetic
Hydrology

N. C. MATALAS

U.S. GeologicalSurvey
Washington,D.C.

Abstract. To generate multivariate synthetic sequencesthat will resemble multivariate


historic sequencesin terms of means,standard deviations,skewnesses, lag-one serial correla-
tion, and lag-zero cross-correlations,
a multivariate, weakly stationary processis used. Tech-
niques of regionalizationand maximum likelihood estimatesmay be used to minimize the
biasesassociatedwith the estimatesof the parameters that characterizethe historic sequences.
(Key words:Statistics;hydrology;synthesis;simulation)

Introduction. To design a water resource Fiering, 1966]. Because simulated sequences


system,the planner must take into considera- can be made as long as desired,they may be
tion the interactionsamongthe designvariables dividedinto a large numberof segmentsto ob-
and the hydrology. Becauseof the complexity tain a large set of responsesto aid decision-
of the interactions,analytical evaluationsof a makingwith respectto the designand operation
system's performance are not feasible. Tech- of a system. For simulated sequencesto be
niquesof simulationafford the plannera means useful, they must bear a resemblanceto the
of making these evaluations.Although these historical sequencesin terms of certain proper-
techniquesare not new, their rather widespread ties that adequately characterizethe historic
use today in the designof water resourcesys- sequences. Althoughmany propertiesare needed
tems is due largely to advancesin electronic to characterize historic sequences,only those
computers.With modern high speedelectronic that exert a meaningful influenceon the sys-
computers,the planner may simulate a sys- tem's designneedbe considered.
In the follow-
tem and investigatealternative plans in greater ing paragraphs,the mathematicalfeaturesof
detail than was previously possible. various models for generating synthetic se-
If historic hydrologic sequenceswere very quencesare investigated.
long,they couldbe dividedinto segments whose Markovtan generatingprocesses.The basic
lengths would correspondto the system'seco- model used for generatingsynthetic sequences
nomic time horizon.The varioussegmentscould is the lag-oneMarkov process,
whichis defined
be used to obtain a set of systemresponsesthat as

would provide a means of determiningthe ex-


pected responseand the variability of the re-
sponses. However, the longest historic se-
--[1 -- 0,"(1)]'/"O'xei+
1 (1)
quence rarely will span more years than the
system's economic time horizon, so that his- wherex and x+ denotethe eventsat the time
toric sequencescan provide, in almost all pointsi and i -- 1, respectively,/ and are
practical situations,one and only one response. the mean and standard deviation of x, respec-
Since the historic sequenceis very unlikely tively, p(1) is the lag-oneserial correlation
to be repeatedin the future, the singleresponse coefficientfor x, and e+is a randomcomponent
obtained from this sequenceis unlikely to be that has zero mean and unit variance and is
representativeof the system'sfuture response. independentof x. The values of t, a, and
To obtain a set of system responses,simu- px(1)are unknown,but they may be estimated
lated hydrologicsequences, called syntheticse-
from the historic events. If t, a, and p(1)
quences, may be employed[Thomasand Fier- .are replacedby their estimates,, J, and
ing, 1962,1963; Fiering, 1966; Hu/schmidtand p(),
1 respectively, then equation1 canbe used
, 937
938 r. c. MATALAS

tO generatesynthetic events that resemblethe ess [Papoulis, 1965] that may be used to gen-
historic events in terms of these estimates
erate synthetic events that will resemble the
in the followingmanner. The most recent his- historicevents in terms of , , x, and (1).
toric event is representedby xi, and a value If the lag-one Markov processis taken to
of ei+ is randomly selectedfrom a population representa strictly stationary process,then the
that has zero mean and unit variance. With probability distributions of x and x+ must be
these two values, equation i yields a value for considered.With a strictly stationary process,
x+, the first synthetic event. This synthetic the distribution of xi is identical to x+ for
event now assumesthe role of x and with the all valuesof k, where]kl -- 0, 1, ... [Papoul;s,
random selection of a new value of e+, equa- 1965]. If x is assumed to follow a skewed dis-
tion i yields a new value for x+, the second tribution, then the assumption of strict sta-
synthetic event. This procedure is repeated N fionarity canbe usedto generatesyntheticevents
timesto obtaina syntheticsequenceof N events. that will resemble the historic events in terms
As N tends to infinity, the estimatesof x, , of , , , and x(1). Whether a particular
and px(1) obtained from the synthetic events skeweddistributionhas any physicalrelevance
will approach, , and p(1). is immaterial as long as interest is confined to
In generatingthe synthetic events, the only , , , and (1). In line with this interest,the
requirementimposedon the values of e+ was reasonablenessof stationarity of order greater
that the values be randomly selected from a than 3 need not be questioned.The assumption
populationthat has zero mean and unit variance of strict stationarity is imposedmerelyto provide
for all valuesof i. This requirementlimits equa- a means of generating synthetic events, in a
tion i to representingwhat is called a weakly rather straightforward manner, that will resem-
stationaryprocess[Wold,1954].For sucha proc- ble the historic events in terms of x, , ,
ess, and r are independentof the absolute and (1). Becausethe gamma and log-normal
valuesof i, and the lag-k serialcorrelationcoeffi- distributionsare widely usedto approximatethe
cient,px(k), ]k] - 0, 1, 2, .-. , that pertainsto distributions of hydrologic events, these two
the events x and xi+ dependsonly upon the distributions will be considered to illustrate the
absolutetime difference[i -- (i q- k)] for all use of strict stationarity in the generationof
values of i and k. synthetic events.
If resemblancebetween the synthetic events Let Yi denote a set of random variates, j =
and the historic events is to be extended to the 1, .-. , m, where each variate is generatedby a
skewhessof x, equation i must be modified. lag-oneMarkov process
This modification can take one of several forms,
amongwhich are the following. Y,+,,i = P,(1)yl ,i 'q'-[1 - p,'(1)"/= (4)
To considerskewness,Thomas and Fier{ng
[1963] replacedthe random componente+, by wherey is normally distributedwith zero mean,
+,, whichis definedas unit variance, and has lag-one serial correlation
coefficientequal to pv(1) for all values of j.
e+,.i is normally distributed with zero mean
and unit variance and is independentof y.
wherethe skewness of $, denotedby q,e,is related for all values of j. In terms of first and second
to the estimate of the skewhessof x, denoted order moments of Yi, equation 4 representsa
by -, by weakly stationary process;however, with the
assumptionthat Yi is normallydistributed,equa-
[1 -- a(1)]
(3) tion 4 representsa strictly stationary process,
'Y= [1 -- ,8'(1)]
a/'% sincefor a normal processthe processis com-
If r/+ is assumedto be normally distributed pletely defined by the first and secondorder
with zero mean and unit variance, then + is moments[Papcults, 1965].
approximatelydistributedas gamma with zero For each value of j, equation4 may be used
mean, unit variance, and skewhess3'. With to generatea sequence of y's, wherethe sequences
+ as the random component, the lag-one are mutually independent,so that the random
Markov processis a third-orderstationaryproc- variate z, definedas
SyntheticHydrology 939

mean and unit varianceand is independentof


Zi Yi, i 5 y. In termsof x, the generatingprocessis
is distributed as gamma with , -- m, az -- x,+ - a - {exp [z(1-- p)]}
2m,% - 2 //m, andp(1) = p(1)[Yevd-
jevich, 1966]. To transform the z-sequenceinto (x,- a)p $+ (11)
a sthetic x-sequence,the valuesof x and z are
where p- py(1) and $i+,- exp {[1 --
related by
p(1)]%e+}. Under the assumptionthat x
x. + (m/, and $+are independent

where the x's have mean , sandard deviation p(1) = [exp[ap] - 1]/[exp [a2] - 1]
, coecient of skewess = (8/m)TM,and (12)
lag-one serial correlation coecient p%(1)
The procedurefor generatingsyntheticevents
p(1) : p(1). If , d, and (1) denote he that will resemble the historic events in terms
parametersfor a historic sequence,then the
of x, dx,, and (1) is as follows.The valuesof
sthetic events generatedby equation 6 will
resemble the historic events in ter of these
, , , and x(1) are set equal to the right-
hand sidesof equations7, 8, 9, and 12, where-
parameters.For the events generatedby equa-
upon the solutionsof these equationsgive the
tion 6, the coecient of skewness ' is a function
valuesof a, g, a, and p(1). With the valuesfor
of m, and becausem must be an integer,
/, ay, and py(1), equation 10 may be used to
cannot be set equal, in general, to the historic
generate a sequenceof y's. To thej antilog of
coecientof skewhess . If 2 , then eachvalue of y, the value of a is added to obtain
the procedureoutlined above cannot be used
the synthetic sequenceof x's that will resemble
to generatesynCeric eventsthat are distributed
the historic sequencesin terms of , , ,
as gamma.
and (1). If the valuesof a, g, a, and p(1) had
If the historic events are assumed to follow
been obtained from the logarithms of the his-
a 3-parameterlog-normaldistribution,then syn- toric events rather than in the manner out-
thetic events may be generated that are so
lined above, then equation 10 would lead to a
distributed and that resemble the historic
syntheticsequencethat would not resemblethe
eventsin ter of , d, , and (1).
Assume a to be the lower bound of a random
historicsequencein termsof , dx,, and (1).
The importanceof synthetic events following
variate x, where (x- a) is log-normally dis-
a particular probability distributionis hard to
tributed, so that y: log (x -- a), where loga- evaluate. In studies that deal with truncated
rithm is to the base e, is normally distributed.
events, as would be the case when low flow
For the randomvariate x, the mean, variance
augmentationis being considered,the probabil-
, and the coecient of skewness are related
ity distribution of the synthetic events might
to the lower bound a and to the mean and
be quite important. In such studies,interest is
variancea for the random variate y by focused on the distribution of the durations and
volume deficiencies that are associated with flows
= a exp[1/2a ] (7)
2
that are lessthan certainspecifiedvalues.These
a exp[2(a )] -- exp[a 2y] flow characteristics
for the syntheticeventsmay
(s) not resemble those for the historic events if
only the parameters, d, x, and (1) are con-
exp[3a] -- 3 explay] 2 sideredin the generationof the syntheticevents.
= {exp
[a
]-- 1}/ (9)
A strictly mathematicalapproachto determining
[Aitchisonand Brown, 1957]. Assumey instead whether the probability distribution of the his-
of x to be generatedby lag-oneMarkov process toric events must be consideredin the genera-
tion of the syntheticeventsdoesnot appear to
- = - be feasible. Perhaps the best approach is an
operational one' in a particular situation use
+ [1 - (10)
syntheticsequences that do and do not consider
where + is normally distributed with zero the probability distribution of the historic
90 . C. MATALAS

eventsand determinethe extent to whichsystem E[e+] = 0 sinceE[x+] = E[x,] = 0, where


performanceis affected.Since the true probabil- E denotesthe mathematicalexpectationof the
ity distributionfor hydrologiceventsis unknown, argumentin brackets.If both sidesof equation
more than one distribution should be used to 13 are postmultipliedby x% the transposeof
determine whether the planning decisionsare x, then the expectationsof the various matrix
sensitive to the choice of distribution. productslead to
Multivariate generatingprocesses.The mate-
rial discussed abovepertainsto the generationof M- AMo (14)
synthetic events at a particular station. To where M0 = E[xx'] and M = E[x+xi'].
generate synthetic events for more than one M0 is an (m X m) matrix whosediagonalele-
station, the cross-correlation between the his- ments are the variances:), p - 1, ... , m,
toric events at different stations must be con-
and whoseoff-diagonalelementsare the covari-
sidered as well as the estimates x, x, , and
ancesx()(o(0)()(o, p, q = 1, ... , m, where
j(1) for the historic events at each station. p q. M is an (m X m) matrix whosediagonal
If the cross-correlationswere all equal to zero,
elementsare p,)(1):(), p = 1, ... , m, and
then the syntheticeventsat eachstationcould whoseoff-diagonalelementsare ()(q)(1)(q),
be generatedindependentlyin the manner out- where )(o(1) is the lag-one cross-correlation
lined above. Within a given river basin, the
betweenx) and x(O, p, q - 1, ... , m, where
cross-correlations
are unlikely to be zero, so that pq.
a multivariate generatingprocessis needed to If the elementsof e+are mutuallyindepend-
handle the nonzero cross-correlations.
ent with zero mean and unit variance, E[e+
Before the multivariate generating procedure e+] - I, an (m X m) matrix wherein each
is discussed, the followingnotation is introduced. diagonalelementequalsunity andall off-diagonal
Let x(), p = 1, ..., m denotethe randomvariate elements equal zero. Thus, if both sides of
that pertainsto the pth stationin a river basin. equation 13 are postmultiplied by x+', the
From the historic events for x(p), the estimates
expectationsof the variousmatrix productslead
of the mean, standard deviation, coefficientof to
skewhess,and lag-one serial correlation coeffi-
cient are denotedby ), d), (p), and ix(p)(1), Mo = AMx + BB' (15)
respectively.The lag-zero cross-correlation be-
tween the historic events at stations p and Where
M, andB arethetransposes
ofM, and
q, p, q = 1, ... , m, is denotedby j(p)(q)(0). B, respectively.
There are various ways of generating mul- The elementsof the matrix A are given by
tivariate synthetic sequences,but a rather
A = M, Mo-' (16)
straightforwardway is basedon a multivariate
weakly stationary generatingprocess(Almon' where Mo- is the inverseof Mo. M0, the vari-
personalcommunication) that is definedas ance-covariancematrix, is a nonsingularmatrix,
and therefore its inverse exists. The elements of
x,+ - Ax, + Be,+ (13) the matrix B are given by the solutionof
In equation13,x+ andx are (m X 1) matrices BB'= Mo- MMo-Mx (17)
whosepth elementsare x+(p)--/(p) andxi(p)--
g?), respectively,wherex+) and xi) denote The techniques of principalcomponent anal-
the eventsof x(p)at the time pointsi -]- I and ysis[Harman,1960;Kendall,1961]may be used
i, respectively.The random componente+ is to solve equation 17 for B. Ordinarily these
an (m X 1) matrix whoseelementsare inde- techniquesare appliedto the matrix M0, but
pendentof the elementsof x. A and B are here they must be appliedto the matrix M0 --
(m X m) matrices whose elementsmust be MMo-M '. If the matrix B is determinedby
defined in such a way that the multivariate the techniques of principalcomponent analysis,
then
syntheticsequences generatedby equation13
will resemble the multivariate historic se-
quencesin termsof ), x), x), )(1), and B'B= X (18)
p)(o(0) for all valuesof p and q. where k is an (m X m) diagonal urntrix
SyntheticHydrology 941

whose elements are the eigenvaluesof M0 -- whoseeventsoccurat the time points (i q- 1).
MM M. Thus, if px)(q)(1)is replacedby x)(q)(1)in the
Insofar as synthetic hydrologyis concerned, matrix M, then the multivariate synthetic se-
the elementsof B carry no physicalsignificance. quencesgeneratedby equation13 will resemble
Let B denote the matrix obtainedby the prin- the multivariate historic sequences in terms of
cipal componentsolution of equation 17 and x(p),,(), )(1), and ,)()(0), but not in terms
let 0 denotean (m X m) matrix suchthat 00 - of p)(q)(1). If )(q)(1) is of no interest,p(p)(q)(1)
I, where I is an (m X m) identity matrix. A may be taken to be x)(q)(1),definedby equa-
matrix B*, definedas tion 23, thus avoidingthe actual computationof
)(o(1) from the historicevents.
B* -- BO (19) To account for the coefficients of skewness
where ) the assumptionis made that x) followsa
3-parameterlog-normaldistributionwith lower
B'B* ' = BB' ' (20) bound a(p), so that y) = log (x(p) -- at)) is
may be usedin placeof B in equation13. There normallydistributed,wherelogarithmsare taken
existsmore than one matrix 0 suchthat 00 = I, to the base e. The mean, standard deviation,
coefficientof skewhess,and the lag-one serial
and thereforemanyB* matricesmay be defined,
correlation coefficientfor x) are denoted by
any one of which may be usedin place of B in
equation 13. ), ), %,
a ) and x)(1), respectively,and
The matfleesA and B are functionsof only two for y), the mean, standarddeviation, and lag-
one serial correlation coefficientare denoted by
matflees, Mo and M, that involve only second
order moments. The first order moments tz,), a,(p),andp,(p)(1),respectively.The relations
havebeenaccounted
for in the definitions
of betweenthe parametersof x(p)and y(p)for each
matflees x+ and x. The matrix M involves value of p are given by equations7, 8, and 9.
the lag-one serial correlation coefficientsand The lag-zerocross-correlation betweenx(p) and
the lag-one cross-correlation coefficients.If the x(q>,>(,(0), is related to the lag-zero cross-
lag-onecross-correlations are of no interest,then correlationbetweeny) and y(O,or p?)(q)(0), by
the task of computingthe elementsof M may
be facilitated in the followingway.
The matrix A may be taken to be a diagonal exp[a(P)a()p(p)()(0)]-- 1
matrix whosediagonalelementsare the lag-one {exr [a,(>
] -- 1}/{ exr [,'q'] -- 1}/
serial correlation coefficients'the (p, p)th ele-
ment of A is p ) ()1. With A so defined, the
pth and qth elementsof x+ are The y(p),p = 1, ... , m, are assumedto be
representedby the multivariate weakly sta-
() () () ' () (21) tionary process

Y+ = Y + s' + (25)
- + (22)
which is analogousto the processrepresentedby
whereb., and b., are the (p, s)th and (q, s)th equation13, wherethe elementsof A and B
may be detersned in the manner indicated
elements of B, respectively.If both sides of
above for the mirices A and B of equation 13.
equation21 are multipliedby x(q),then the The solutionfor B' vol yes lhe (m X m) mtrk
expectationsof the various productslead to
M/, whoseelementsare composedof the lag-one
- ()( a ()( ()
(1) serial correlationcoefficientsof y(p),p = 1, .-- ,
m, and the lag-onecross-correlation coefficients
where )(q)(1) denotes the lag-one cross-cor- of y(p)and y(O,p, q = 1, ... , m. If the lag-one
relation for the events generatedby equations cross-correlationsof x() and x(q)are of no interest,
21 and 22. Equation23 showsat for lag-one then the lag-one cross-correlations of y>) and
Markov processes, e lag-onecross-correlationy{q),p)<q)(1), may be definedas
the product of the lag-zerocross-correlation
and the lag-oneserialcorrelationof the variate
942 . c. MATALAS

To generate multivariate synthetic sequences z(e) = [(1 -]-


y), p - 1, ... , m, by means of equation 25,
the parametersof y(P),p -- 1, ... , m, must be
determined in the manner discussed above for and relations for e() and e() in terms of () and
the univariatelog-normalcase.The multivariate x () are

syntheticsequences y), p - 1, ... , m, are


() [2(1-]- R)]-/(x') -]--x()) (30)
transformed into multivariate synthetic se-
quencesx(p), p = 1, ... , m, by taking the e() = [2(1 -- R)]-/(x()-- x()) (31)
antilogs of the synthetic events of y) and The details for deriving theserelationsare given
addinga(p)to eachfor all valuesof p. by Kendall [1961].
The assumptionsof log-normalityand weak The relations expressed by equations 28
stationarityare not necessarily
physicallysignifi- through 31 are based on the assumptionthat
cant. These assumptionsmight be called opera- the elements of e are mutually independent.
tional assumptionsthat are made to achieve a There is a weaknessin this assumptionthat
certain goal: to generatemultivariate synthetic will be discussedlater. Fiering used equations
sequencesthat will resemble the multivariate 30 and 31 to generate 'historic' events for e()
historicsequences in termsof certainparameters. and e(") to obtain the lag-one serial correlation
The use of multivariate analysistechniques coefficientsp,()(1) and p,(")(1). From the as-
for synthetichydrologywas first introducedby sumption that e() and e(') are independent,
Fiering [1964].The principalcomponentanalysis synthetic events for e() and e(') are generated
presented by Fiering yields multivariate syn- by independentlag-one Markov processes. The
thetic sequences that resemblethe multivariate synthetic events for e() and e(') are converted
historic sequencesin terms of into syntheticeventsfor x (x)and x(') by means
,(p)(q)(0),but not in terms of ,)(1), p, q = of equations28 and 29.
1, ... , m. To showthat resemblancein terms of The lag-one serial correlation coefficientsfor
(p)(1) is not achieved,the case of only two e() and e(') are related to p and r by
variates, x() and x("), will be considered.To
simplifynotation,let R = x()(2)(0),p = x()(1), p,(1)(1) = (1) = (p -]--r)/2 (32)
and r = ,()(1). Without loss of generality, Under the assumptionthat e() and e(') are gen-
the assumptionis made that ,() = fix(") = eratedby independentlag-oneMarkov processes,
/z,() = /z,()= 0 and ()= e.)= a,()=
the relations between the lag-one serial correla-
,(") = 1, where ,() and ,() are mean and tion coefficientsfor the synthetic events of x()
standard deviation, respectively,for the first and x(') in terms of p and r are
principal componente(x),and g,() and a,(') are
the mean and standarddeviation,respectively, () (1) = ,()(1) = (p -]- r)/2 (33)
for the secondprincipalcomponente(). where () and x(") denote the lag-one serial
Fiering took as his model
correlation coefficientsfor the two synthetic
sequences.Equation 32 showsthat the lag-one
, = serial correlation coefficients for e() and e(') are
equal, and equation 33 showsthat the lag-one
where, in general, x is an (m X 1) matrix, C serial correlation coefficientsfor the synthetic
is an (m X m') matrix, and e is an (m' X 1) events of x() and x(') are equal. In general,
matrix, wherem' _ m. Whether m' is equal to ()(1) p and (")(1) r. Fiering (personal
or lessthan m dependsupon whether the rank communication) has recognized the limitation
m' of Mo - E[xix] is equal to or lessthan the of his multivariate techniquefor synthetic hy-
order m of Mo. drology and is revising his computer programs
For m -- 2 and R' ,( 1, m' -- 2. Relations for along the lines discussedearlier.
x() and x(') in terms of e() and e(') are
A brief examination of the assumptionthat
the elementsof e are mutually independentis
x() [(1 now considered.If both sidesof equation 27 are
postmultipliedby x_, then the expectations
d-' [(1 R)/2]z2 (28) of the variousmatrix productslead to
SyntheticHydrology 943

M = CJC' (34) the extent to which) is likely to deviate from


() depends
uponthe magnitudeof the standard
whereJ is an (mt X mt) matrix whosediagonal
elementsare the lag-oneserialcorrelationcoeffi- If a syntheticsequenceis dividedinto segments
cientsfor e(p'),pt _ 1, ... , mt, and whose of length n, then each segmentwill yield an
off-diagonalelementsare the lag-onecross-cor- estimate,not of/(p), but of ). The segment
relation coefficientsfor e ') and e(q'), p,qt __ estimates will be distributed about (p) and
1, ... , m'. In general,the off-diagonalelements will approach) as n tends to infinity. For
of J are not zero.Moreover, if x), p = 1, ..-, m, segmentsof syntheticevents,) acts as the
follow a multivariate lag-one Markov process, populationvalue of the mean.
then e('), pt - 1, -. , mt will in general not In terms of momentsabout the origin, the
follow a multivariate lag-one Markov process. meanis simplythe first moment,the standard
Becausethe off-diagonalelementsof J are not deviation is a function of the first and second
equalto zero,eventhoughee = I, the events order moments,and the coefficientof skewhess
for e') cannot be generatedindependentlyof is a functionof the first, second,and third order
the events for (q'). If the off-diagonalelements moments. The various correlation coefficients
are taken into account,then principalcomponent are functions of the first and second order mo-
analysisapplied to equation27 will lead to ments and the time lag between events. For
multivariate synthetic sequences that will re- parametersthat are definedin termsof moments
semble the multivariate historic sequencesin and time lags, their standard errors increase
terms of ()(1) as well as in terms of ), ), with an increase in the order of the moments
and px(p)(q)(0).This change is being intro- and timelag formingthe parameters. Thuswhat
ducedinto the Fiering programs. has been said about bias relative to ?) applies
The multivariate techniquesdiscussedabove to the other parametersas well, but to a greater
were introducedby assumingx), p -- 1, ..' , m, extent. The larger the standarderror is, the
to denote historic hydrologic sequencesat m larger the bias is likely to be.
stations.However, the techniquesare quite gen- In addition to the parameters discussed
eral, and they may be appliedto the casewhere above,otherscouldbe usedto characterize his-
m pertainsto seasons or a combination ofstations toric sequences.However,theseadditionalpa-
and seasons. rameters involve higher order momentsand
Parameter estimation. In the above discus- larger time lagsand consequently larger stand-
sions,resemblance betweensyntheticand his- rd errors. Because biases tend to increase with
toric sequences was limited to the parameters an increasein standard errors, the usefulness
), ), ), (p)(1), )()(0), andp()()(1), of these additionalparametersis of doubtful
p, q -- 1, -.. , m, wherem _ 1. From a statis- value.
tical point of view, eachof theseparameters is Bias can be minimized but never eliminated.
an unbiased and consistent estimator of its
One techniquefor minimizingbias is regional-
respective populationvalue,but with respectto ization,which takesthe form of relatingthe
syntheticsequences eachof the parametersis a parameters,estimatedfrom the historic se-
biasedestimate.To illustrate this bias, consider
quences at a numberof sitesin a basin,to cer-
(p), the estimateof /), obtainedfrom the tain physiographic and meteorologic
character-
historicsequence x(p), that consistsof n events. isticsof the basin [Bensonand Matalas, 1967].
The estimate(p), given by From the characteristics of the subbasins that
pertainto the varioussites,theserelations
yield
() - x,()/n
i=1
(35) values of the parametersthat considerboth
the temporaland spatialvariationthat is in-
is a statisticallyunbiasedestimatorof/(p), be- herent in the historicsequences.In effect,what
causethe expectedvalue of ) equals regionalizationis doingis extending the lengths
and a consistentestimator,becauseits standard of the historicsequences, thereby reducing the
errora)/V approaches
zeroas n tendsto standard errors of the parameters.
infinity. However,for a givenhistoricsequence For particular parameter,it may not be
of n events,) is unlikelyto equal/), and possibleto establisha regionalrelation.If the
variation of the values of the parameter ob- not known. In some cases,it is possibleto
tained from the various historic sequences assessthe differenceson strictly mathematical
can be attributed entirely to chanceas a result grounds,as was done for the variousgenerating
of the finite lengths of the historic sequences, models considered above. When appropriate
then the averageof thesevaluesmay be taken statistical tests cannot be devised and when
as the regional value of the parameter. mathematicalassessment is not possible,a sensi-
Another technique for minimizing bias is tivity analysis might be used to determine
the use of maximum likelihood estimates of the whether discrepanciesbetween values of syn-
parameters, for the standard errors of these thetic and historicparametershave an influence
estimates are smaller than those for estimates on planningdecisions.
based on moments. Maximum likelihood esti- Summary and conclusions. In the above dis-
mators are not always statistieally unbiased, cussions, variousmodelsfor generatingsynthetic
and they cannot be determinedwithout making events were considered.The modelsrepresented
an assumptionabout the probability distribu- stationary processes,in which the order of
tions underlying the events of historic se- stationarity dependedupon the parametersthat.
quences.If maximum likelihood estimators are were usedto characterizethe historic sequences.
to be used,then the model for generatingsyn- For processesthat are strictly stationary, an
thetic eventsmust representa strictly station- assumptionmust be made about the probability
ary process. distributions that underlie the sequencesof
Earlier a strictly stationary processwas used historic events. As long as interest is limited
to generate synthetic events that would re- to the parameters (defined in terms of mo-
semble the historic events in terms of certain ments) that are used to characterize historic
parametersbasedon moments.For this process, events, the choiceof distribution for a strictly
an assumptionwas made about the probability stationary process is unimportant, since the
distribution underlying the historic events. As choicedoesnot affect the values of the param-
long as interest is limited to these parameters, eters.
the assumptionof the probabilitydistributionis Multivariate stationary processeswere used
merely an operational assumption made to to generate multivariate synthetic sequences,
achieve the desired resemblance between the where the sequencesmay pertain to stations,
synthetic and historic sequences.The choice seasons,or a combinationof stationsand sea-
of distribution, as long as a strictly stationary sons. The techniquesof principal component
generatingprocesscan be constructed,is unim- analysiswere mentioned as a convenientmeans
portant, since the choice does not affect.the of determiningthe parametersof the processes.
values of the parameters.However, maximum The set of parameters determined by these
likelihood estimatesare not independentof the techniquesmay be transformedinto many sets,
probability distribution,so that the assumption each of which may be used to generatemulti-
of a particular distribution is not an operational variate synthetic sequencesthat will resemble
assumption,unless system performance is in- multivariate historic sequencesin terms of cer-
sensitive to the choice of distribution. tain parameters that are used to. characterize
A syntheticsequenceconsistsof a finite num- the multivariate historic sequences.
ber of events, where the number of synthetic Although the parametersusedto characterize
eventsis large relative to the numberof events the historic sequencesmay be statistically un-
associatedwith historic sequences.The values biased, they are biased with respect to syn-
of the parametersobtainedfrom the synthetic thetic hydrology. The parameters determined
sequences will differ from the values for the from the historic sequencesare sample esti-
respectiveparameters obtained from the his- mates of their respective population values,
torie sequences, but thesedifferences shouldnot but the sample estimates act as population
be greater than what is expectedby chance. values for the synthetic sequences.Because
Various statistical tests can be used to evaluate the sampleestimatesare unlikely to equaltheir
these differences,but the results of these tests respectivepopulation values, the estimatesare
may not be conclusive, particularly when the operationally biased.
samplingdistributionsfor the parametersare Operational bias is not a consequence of the
SyntheticHydrology 945
generating model; however, becauseof this Fiering, M. B, Multivariate techniquesfor syn-
bias, a practical limit is placedon the complex- thetic hydrology, J. Hydraul. Div., Am. Soc.
Civil Engrs., 90, I-IY5, 43-60, 1964.
ity of the model, where complexity refers to Fiering, M. B, Synthetic hydrology--An assess-
the number of parametersused to characterize ment, in Kneese and Smith, eds., Water Re-
the historic sequences. Parametersthat are de- search, Johns I-Iopkins Press for Resourcesfor
fined in terms of high order momentsor large .the Future, Baltimore, Maryland, 331-341, 1966.
time lags are subject to large standard errors ISarman, I-I. I5., Modern Factor Analysis, Uni-
versity of Chicago Press,Chicago, Illinois, 1960.
and consequentlylarge operationalbiases.Op- ISufschmidt, M. M., and M. B. Fiering, Simula-
erational biases can never be eliminated, but tion Techniques yor Desigr o Water-Resource
they can be minimized by the use of region- Systems,Harvard University Press,Cambridge,
alization to accountfor the temporal and spatial Massachusetts,1966.
variationsinherentin the historicsequences and Kendall, M. G., A Coursein Multivariate Analy-
sis,CharlesGriffin and Company,Limited, Lon-
by the use of maximum likelihoodestimatesof don, England, 1961.
the parameters used to characterizethe historic Papoulis, A., Probability, Random Variables, and
sequences. The use of maximum likelihoodesti- Stochastic Processes,McGraw-I-Ifil Book Com-
matesis somewhatimpairedby the needof mak- pany, New York, 1965.
Thomas, I-I. A., and M. B. Fiering, Mathematical
ing an assumptionabout the probability dis- synthesisof streamflowsequences for the analy-
tributionsthat underlie the historicsequences. sis of river basins by simulation, Chap. 12,
This assumptionis not an operational one, Design oy Water-ResourceSystems,A. Maass,
because the maximum likelihood estimates will M. YIufschmidt, R. Dorfman, I-I. Thomas,
vary with the choiceof distribution. S. Marglin, and G. Fair, YIarvard University
Press, Cambridge, Massachusetts,1962.
Acknowledgments. This study was undertaken Thomas, YI. A., and M. B. Fiering, Statistical
in response to questions about the resemblance analysis of reservoir storage-yield relations,
between synthetic and historic sequencesposed Chap. 1, Operationsresearchin water quality
by R. K. Davis, B. T. Bower, and R. M. Stein- management,Final Report to Bureau oy State
berg of Resources for the Future. Discussionson Services,Div. of Water Supply and Pollution
multivariate time series analysis with M. B Fier- Control,PublicYIealthService,Dept. of YIealth,
ing, Harvard University, and Clopper Almon, Education, and Welfare, 1963.
University of Maryland, are gratefully cknowl- Wold, I-I., A Study in the Analysiso Stationary
edged. Time Series,Almquist and Wiksells,Uppsala,
Sweden, 1954.
REFERENCES Yevdjevich, V. M., Stochastic problems in the
Aitchison, J., and J. A. C. Brown, The Log-Nor- designof reservoirs,
in Kneeseand Smith, eds.,
mal Distribution, Cambridge University Press, Water Research,Johns I-Iopkins Press for Re-
London, England, 1957. sourcesfor the Future, Baltimore,Md., 375-411,
1966.
Benson,M. A., and N. C. Matalas, Synthetic hy-
drologybasedon regionalstatisticalparameters,
Water ResourceRes., this issue,931-935, 1967. (Manuscript receivedJune 1, 1967.)

You might also like