
2. Probability and Probability Distributions – Basic Concepts
HYDROLOGIC PROCESSES may be thought of as stochastic processes. Stochastic in this
sense means involving an element of chance or uncertainty where the process may take on any of
the values of a specified set with a certain probability. An example of a stochastic hydrologic
process is the annual maximum daily rainfall over a period of several years. Here the variable
would be the maximum daily rainfall for each year and the specified set would be the set of
positive numbers.

The instantaneous maximum peak flow observed during a year would be another example
of a stochastic hydrologic process. Table 2.1 contains such a listing for the Kentucky River near
Salvisa, Kentucky. By examining this table it can be seen that there is some order to the values
yet a great deal of randomness exists as well. Even though the peak flow for each of the 99 years
is listed, one cannot estimate with certainty what the peak flow for 1998 was. From the tabulated
data one could surmise that the 1998 peak flow was “probably” between 20,600 cfs and 144,000
cfs. We would like to be able to estimate the magnitude of this “probably”. The stochastic nature
of the process, however, means that one can never estimate with certainty the exact value for the
process (peak discharge) based solely on past observations.

The definition of stochastic given above has some theoretical drawbacks, as we shall see.
Hydrologic processes are continuous processes. The probability of realizing a given value from a
continuous probability distribution is zero. Thus, the probability that a variable will take on a
certain value from a specified set is zero, if the variable is continuous. Practically this presents no
problem since we are generally interested in the probabilities that the variate will be in some
range of values. For instance, we are generally not interested in the probability that the flow rate
will be exactly 100,000 cfs but may desire to estimate the probability that the flow will exceed
100,000 cfs, or be less than 100,000 cfs, or between 90,000 and 120,000 cfs.

With this introduction, several concepts such as probability, continuous variables, and probability
distributions have been mentioned. We will now define these concepts and others as a basis for
considering statistical methods in hydrology.

Table 2.1. Peak discharges (cfs) Kentucky River near Salvisa, Kentucky
Year Flow Year Flow Year Flow
1895 47300 1928 68700 1961 48800
96 54400 29 80100 62 101000
97 87200 30 32300 63 63800
98 65700 31 43100 64 82500
99 91500 32 77000 65 60300
1900 53500 33 53600 66 38900
01 67800 34 70800 67 69900
02 70000 35 89400 68 55200
03 66900 36 62600 69 29400
04 34700 37 112000 70 68400
05 58000 38 44000 71 51700
06 47000 39 84300 72 107000
07 66300 40 45000 73 58600
08 80900 41 28400 74 77900
09 80000 42 46000 75 79600
10 52300 43 80400 76 46300
11 58000 44 55000 77 53400
12 67200 45 72900 78 61800
13 115000 46 71200 79 144000
14 46100 47 46800 80 40600
15 52400 48 84100 81 49600
16 94300 49 61300 82 42900
17 111000 50 87100 83 68800
18 71700 51 70500 84 98400
19 96100 52 77700 85 36500
20 92500 53 44200 86 42100
21 34100 54 20600 87 48800
22 69000 55 85000 88 35100
23 73400 56 82900 89 105000
24 99100 57 88900 90 53600
25 79200 58 60200 91 82900
26 62600 59 40300 92 54600
27 93700 60 50500 93 52100
94 71000

PROBABILITY

In the mathematical development of probability theory, the concern has been not so much
with how to assign probabilities to events, but with what can be done with probabilities once these
assignments are made. In most applied problems in hydrology, one of the most important and difficult tasks is
the initial assignment of probability. We may be interested in the probability that a certain flood
level will be exceeded in any year or that the elevation of a piezometric head may be more than
30 feet below the ground surface for 20 consecutive months. We may want to determine the
capacity required in a storage reservoir so that the probability of being able to meet the projected
water demand is 0.97. To address these problems we must understand what probability means
and how to relate magnitude to probabilities.

The definition of probability has been labored over for many years. One definition that is
easy to grasp is the classical or a priori definition:

If a random event can occur in n equally likely and mutually exclusive ways, and if n_a of
these ways have an attribute A, then the probability of the occurrence of the event having
attribute A is n_a/n, written as

    prob(A) = n_a/n        (2.1)

This definition is an a priori definition because it assumes that one can determine before
the fact all of the equally likely and mutually exclusive ways that an event can occur and all of
the ways that an event with attribute A can occur. The definition is somewhat circular in that
“equally likely” is another way of saying “equally probable” and we end up using the word
“probable” to define probability. This classical definition is widely used in games of chance
such as card games and dice and in selecting objects with certain characteristics from a larger
group of objects. This definition is difficult to apply in hydrology because we generally cannot
divide hydrologic events into equally likely categories. To do that would require knowledge of
the likelihood or probability of the events which is generally the objective of our analysis and not
known before the analysis.

The classical definition of probability takes on more utility in hydrology in terms of
relative frequencies and limits.

If a random event occurs a large number of times n and the event has attribute A
in n_a of these occurrences, then the probability of the occurrence of the event
having attribute A is

    prob(A) = lim_{n→∞} n_a/n        (2.2)

The relative frequency approach to estimating probabilities is empirical in that it is based
on observations. Obviously, we will not have an infinite number of observations. For this
probability estimate to be very accurate, n may have to be quite large. This is frequently a
limitation in hydrology.

The relative frequency concept of probability is the source of the relationship given in
Chapter 1 between the return period, T, of an event and its probability of occurrence, p.

These two definitions of probability can be illustrated by considering the probability of
getting heads in a single flip of a coin. If we know a priori that the coin is balanced and not
biased toward heads or tails, we can apply the first definition. There are two possible and equally
likely outcomes - heads or tails - so n is 2. There is one outcome with a head so n_a is 1. Thus the
probability of a head is ½. If the coin is not balanced so that the two outcomes are not equally
likely, we could not use the a priori definition. We had to know the answer to our question
before we could apply the a priori definition.

Fig 2.1. Coin flipping experiment: probability of getting a "head" (prob(H) versus number of trials, 0 to 250).

This is not the case when the relative frequency definition is used. Obviously we cannot
flip the coin an infinite number of times. We have to resort to a finite sample of flips. Figure 2.1
shows how the estimate of the probability of a head changes as the number of trials (flips)
changes. A trend toward ½ is noted. This is called stochastic convergence toward ½. One
question that might be asked is, "Is the coin unbiased?" One's initial reaction is that more trials
are needed. It can be seen that the probability is slowly converging toward ½ but after 250 trials
is not exactly equal to ½. This is the plight of the hydrologist. Many times more trials or
observations are needed but are not available, and the available data do not clearly indicate a
single answer. This is where probability and statistics come into play.
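The behavior in figure 2.1 is easy to reproduce numerically. The following Python sketch (an illustration added here, not part of the original experiment; the assumed true probability of a head is 0.5) tracks the running relative frequency n_a/n of equation 2.2 as the number of flips grows:

    import random

    random.seed(1)  # fix the sequence of flips so the run is repeatable

    def running_prob_heads(n_trials, p_head=0.5):
        """Running relative-frequency estimate n_a/n of prob(H) (eq. 2.2)."""
        heads = 0
        estimates = []
        for n in range(1, n_trials + 1):
            heads += random.random() < p_head  # a True counts as 1 head
            estimates.append(heads / n)
        return estimates

    est = running_prob_heads(250)
    for n in (10, 50, 100, 250):
        print(f"after {n:3d} flips: estimated prob(H) = {est[n - 1]:.3f}")

As in figure 2.1, the estimate wanders but drifts toward ½; even after 250 flips it is generally not exactly ½.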

Equation 2.2 allows us to estimate probabilities based on observations and does not
require that outcomes be equally likely or that they all be enumerated. This advantage is
somewhat offset in that estimates of probability based on observations are empirical and will
only stochastically converge to the true probability as the number of observations becomes large.
For example, in the case of annual flood peaks, only one value per year is realized. Figure 2.2
shows the probability of an annual peak flow on the Kentucky River exceeding the mean annual
flow as a function of time starting in 1895. Note that each year additional data becomes
available to determine both the mean annual flow and the probability of exceeding that value.
Here again a convergence toward ½ is noted yet not assured. In fact there is no reason to believe
that ½ is the "correct" probability since the probability distribution of annual peak flows is likely
not symmetrical about the mean.

Fig 2.2. Probability that the annual peak flow on the Kentucky River exceeds the mean annual
peak flow (probability versus year, 1895 to 1975).
If two independent sets of observations are available (samples), an estimate of the
probability of an event A could be determined from each set of observations. These two
estimates of prob(A) would not necessarily equal each other nor would either estimate
necessarily equal the true (population) prob(A) based on an infinitely large sample. This dilemma
results in an important area of concern to hydrologists--how many observations are required to
produce “acceptable” estimates for the probabilities of events?

From either equation 2.1 or 2.2 it can be seen that the probability scale ranges from zero
to one. An event having a probability of zero is impossible while one having a probability of one
will happen with certainty. Many hydrologists like to avoid the end points of the probability
scale, zero and one, because they cannot be absolutely certain regarding the occurrence or
nonoccurrence of an event. Sometimes probability is expressed as a percent chance with a scale
ranging from 0% to 100%. Care must be taken to not confuse the percent chance values with
true probabilities. A probability of 1 is very different from a 1% chance of occurrence as the
former implies the event will certainly happen while the latter means it will happen only one
time in 100.

In mathematical statistics and probability, set and measure theory are used in defining
and manipulating probabilities. An experiment is any process that generates values of random
variables. All possible outcomes of an experiment constitute the total sample space known as the
population. Any particular point in the sample space is a sample point or element. An event is a
collection of elements known as a subset.

To each element in the sample space of an experiment a non-negative weight is assigned
such that the sum of the weights on all of the elements is 1. The magnitude of the weight is
proportional to the likelihood that the experiment will result in a particular element. If an
element is quite likely to occur, that element would have a weight near 1. If an element is
quite unlikely to occur, that element would have a weight near zero. For elements outside the
sample space, a weight of zero is assigned. The weights assigned to the elements of the sample
space are known as probabilities. Here again the word likelihood is used to define probability so
that the definition becomes circular.

Letting S represent the sample space; Ei for i = 1, 2, ..., represent elements in S; A and B
represent events in S; and prob(Ei) represent the probability of Ei, it follows that

0 ≤ 𝑝𝑟𝑜𝑏(𝐸𝑖 ) ≤ 1 (2.3)

Since the sample space is made up of the totality of elements in S, we have

𝑆 = ⋃𝑖 𝐸𝑖

where ⋃𝑖 represents the union or total collection of all of the Ei and

𝑝𝑟𝑜𝑏(𝑆) = ∑𝑖 𝑝𝑟𝑜𝑏(𝐸𝑖 ) = 1 (2.4)


An event, A, is made up of a subset of elements in S so that

    A = ⋃_{i=m}^{n} E_i

and

    0 ≤ prob(A) = Σ_{i=m}^{n} prob(E_i) ≤ 1        (2.5)

These concepts are illustrated in figure 2.3 as a Venn diagram.

Fig. 2.3. Venn diagram illustrating a sample space, elements and events.

Using notation from set theory and Venn diagrams, several probabilistic relationships can
be illustrated. If A and B are two events in S, then the probability of A or B, shown as the shaded
area of figure 2.4, is given by

𝑝𝑟𝑜𝑏(𝐴 ∪ 𝐵) = 𝑝𝑟𝑜𝑏(𝐴) + 𝑝𝑟𝑜𝑏(𝐵) − 𝑝𝑟𝑜𝑏(𝐴 ∩ 𝐵) (2.6)

Note that in probability the word “or” means “either or both”. The notation ∪ represents a union
so that A∪B represents all elements in A or B or both. The notation ∩ represents an intersection
so that A∩B represents all elements in both A and B. The last term of equation 2.6 is needed
since prob(A) and prob(B) both include prob(A∩B). Thus prob(A∩B) must be subtracted once so
the net result is only one inclusion of prob(A∩B) on the right-hand side of the equation. If A and
B are mutually exclusive, then both cannot occur and prob(A∩B) = 0. In this case

𝑝𝑟𝑜𝑏(𝐴 ∪ 𝐵) = 𝑝𝑟𝑜𝑏(𝐴) + 𝑝𝑟𝑜𝑏(𝐵)

Fig. 2.4. Venn diagram showing A∩B and A∪B.

Figure 2.3 illustrates the case where events A and B are mutually exclusive while figure
2.4 shows A and B when they are not mutually exclusive.

If Ac represents all elements in the sample space S that are not in A, then

𝐴 ∪ 𝐴𝑐 = 𝑆

Ac is known as the complement of A. Equation 2.4 indicates that

𝑝𝑟𝑜𝑏(𝐴 ∪ 𝐴𝑐 ) = 1

This statement says that the probability of A or Ac is certainty since one or the other must occur.
All of the possibilities have been exhausted. Since A and Ac are mutually exclusive

𝑝𝑟𝑜𝑏(𝐴 ∪ 𝐴𝑐 ) = 𝑝𝑟𝑜𝑏(𝐴) + 𝑝𝑟𝑜𝑏(𝐴𝑐 ) = 1

or we have the very useful result that the probability of an event A is

𝑝𝑟𝑜𝑏(𝐴) = 1 − 𝑝𝑟𝑜𝑏(𝐴𝑐 ) (2.7)

Equation 2.7 often makes it easy to evaluate probability by first evaluating the probability an
outcome will not occur.

An example is evaluating the probability that a peak flow q exceeds some particular flow
qo. A would be all q's greater than qo and Ac would be all q's less than or equal to qo. Because q
must fall in one of these two sets, prob(q>qo) = 1-prob(q≤qo). We show later that for continuous
random variables prob(q=qo)=0, so prob(q≤qo) and prob(q<qo) are interchangeable here.

If the probability of an event B depends on the occurrence of an event A, then we write
prob(B|A), read as the probability of B given A, or the conditional probability of B given A has
occurred. The prob(B) is conditioned on the fact that A has occurred. Referring to figure 2.4 it is
apparent that conditioning on the occurrence of A restricts consideration to A. Our total sample
space is now A. The occurrence of B given that A has occurred is represented by A∩B. Thus the
prob(B|A) is given by

    prob(B|A) = prob(A∩B)/prob(A)        (2.8)

assuming of course that prob(A)≠0. Equation 2.8 can be rearranged to give the probability of A
and B as

𝑝𝑟𝑜𝑏(𝐴 ∩ 𝐵) = 𝑝𝑟𝑜𝑏(𝐴)𝑝𝑟𝑜𝑏(𝐵|𝐴)

Now if 𝑝𝑟𝑜𝑏(𝐵|𝐴) = 𝑝𝑟𝑜𝑏(𝐵), we say that B is independent of A. Thus, the joint probability of
two independent events is the product of their individual probabilities.

𝑝𝑟𝑜𝑏(𝐴 ∩ 𝐵) = 𝑝𝑟𝑜𝑏(𝐴)𝑝𝑟𝑜𝑏(𝐵) (2.9)

Example 2.1. Using the data shown in table 2.1, estimate the probability that a peak flow in
excess of 100,000 cfs will occur in 2 consecutive years on the Kentucky River near Salvisa,
Kentucky.

Solution: From table 2.1 it can be seen that a peak flow of 100,000 cfs was exceeded 7 times in
the 99-year record. If it is assumed that the peak flows from year to year are independent, then
the probability of exceeding 100,000 cfs in any one year is approximately 7/99 or 0.0707.
Applying equation 2.9 the probability of exceeding 100,000 cfs in two successive years is found
to be 0.0707 x 0.0707 or 0.0050.
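The arithmetic of example 2.1 is worth making explicit; a minimal Python sketch (the counts 7 and 99 come from table 2.1):

    # Relative-frequency estimate (eq. 2.2): 7 of the 99 annual peaks in
    # table 2.1 exceeded 100,000 cfs.
    n_exceed, n_years = 7, 99
    p_one_year = n_exceed / n_years        # about 0.0707
    p_two_years = p_one_year ** 2          # eq. 2.9, assuming independence
    print(f"prob(exceedance in one year)           = {p_one_year:.4f}")
    print(f"prob(exceedance in 2 successive years) = {p_two_years:.4f}")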

Example 2.2. A study of daily rainfall at Ashland, Kentucky, has shown that in July the
probability of a rainy day following a rainy day is 0.444, a dry day following a dry day is 0.724,
a rainy day following a dry day is 0.276 and a dry day following a rainy day is 0.556. If it is
observed that a certain July day is rainy, what is the probability that the next two days will also
be rainy?

Solution: Let A be rain on the first day and B be rain on the second day following the initial
rainy day. The probability of A is 0.444 since this is the probability of a rainy day following a
rainy day.
𝑝𝑟𝑜𝑏(𝐴 ∩ 𝐵) = 𝑝𝑟𝑜𝑏(𝐴)𝑝𝑟𝑜𝑏(𝐵|𝐴)

Now the prob(B|A) is also 0.444 since this is the probability of a rainy day following a rainy day.
Therefore

𝑝𝑟𝑜𝑏(𝐴 ∩ 𝐵) = 0.444 × 0.444 = 0.197

The probability of 2 rainy days following a dry day would be 0.276 × 0.444 or 0.122.
Note that the probabilities of wet and dry days are dependent on the previous day. Independence
does not exist. It can be shown that over a long period of time, 67% of the days will be dry and
33% will be rainy with the conditional probabilities as stated. If one had assumed independence,
then the probability of two consecutive rainy days would have been 0.33×0.33 =0.1089
regardless of whether the preceding day had been rainy or dry. Since the probability of a rainy
day following a rainy day is much greater than following a dry day, persistence is said to exist.
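The daily rainfall process of example 2.2 is a two-state Markov chain, and both the conditional calculation and the long-run percentages quoted above can be checked with a short Python sketch (the transition probabilities are those given in the example):

    # transition[a][b] = prob(tomorrow is b | today is a); R = rainy, D = dry
    transition = {
        "R": {"R": 0.444, "D": 0.556},
        "D": {"R": 0.276, "D": 0.724},
    }

    # prob(next two days rainy | today rainy) = 0.444 * 0.444
    p_two = transition["R"]["R"] * transition["R"]["R"]
    print(f"prob(2 more rainy days | rainy day) = {p_two:.3f}")     # 0.197

    # Long-run fraction of rainy days: iterate prob(rain) to convergence.
    p_rain = 0.5                                   # arbitrary starting value
    for _ in range(200):
        p_rain = p_rain * transition["R"]["R"] + (1 - p_rain) * transition["D"]["R"]
    print(f"long-run prob(rainy) = {p_rain:.2f}")                   # about 0.33
    print(f"long-run prob(dry)   = {1 - p_rain:.2f}")               # about 0.67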

TOTAL PROBABILITY THEOREM

If B1, B2, ..., Bn represent a set of mutually exclusive and collectively exhaustive events,
one can determine the probability of another event A from

    prob(A) = Σ_{i=1}^{n} prob(A|B_i) prob(B_i)        (2.10)

This is called the theorem of total probability. Equation 2.10 is illustrated by figure 2.5.

Example 2.3. It is known that the probability that the solar radiation intensity will reach a
threshold value is 0.25 for rainy days and 0.80 for non-rainy days. It is also known that for this
particular location the probability that a day picked at random will be rainy is 0.36. What is the
probability the threshold intensity of solar radiation will be reached on a day picked at random?

Solution: Let A represent the threshold solar radiation intensity, B1 represent a rainy day and B2
a non-rainy day. From equation 2.10, we know that

𝑝𝑟𝑜𝑏(𝐴) = 𝑝𝑟𝑜𝑏(𝐴|𝐵1 )𝑝𝑟𝑜𝑏(𝐵1 ) + 𝑝𝑟𝑜𝑏(𝐴|𝐵2 )𝑝𝑟𝑜𝑏(𝐵2 )

= 0.25 × 0.36 + 0.80(1 − 0.36)

= 0.602

This is an example of a weighted probability. The probability of A given a condition is weighted
by the probability of the condition and summed.
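As a sketch, the weighted sum of equation 2.10 for example 2.3 is a few lines of Python:

    # Theorem of total probability (eq. 2.10) applied to example 2.3
    p_given = {"rainy": 0.25, "not rainy": 0.80}   # prob(A | B_i)
    p_day = {"rainy": 0.36, "not rainy": 0.64}     # prob(B_i)

    p_threshold = sum(p_given[b] * p_day[b] for b in p_day)
    print(f"prob(threshold reached) = {p_threshold:.3f}")   # 0.602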

Fig. 2.5. Venn diagram for theorem of total probability (events B1 through B6 partition the sample space; A overlaps several of the Bi).

BAYES THEOREM

By rewriting equation 2.8 in the form

    prob(A) prob(B_j|A) = prob(B_j) prob(A|B_j)

and then substituting from equation 2.10 for prob(A), we get what is called Bayes Theorem:

    prob(B_j|A) = prob(B_j) prob(A|B_j) / Σ_{i=1}^{n} prob(A|B_i) prob(B_i)        (2.11)

As pointed out by Benjamin and Cornell (1970), this simple derivation of Bayes Theorem belies
its importance. It provides a method for incorporating new information with previous or so-
called prior probability assessments to yield new values for the relative likelihood of events of
interest. These new (conditional) probabilities are called posterior probabilities. Equation 2.11
is the basis of Bayesian Decision Theory. Bayes Theorem provides a means of estimating
probabilities of one event by observing a second event. Such an application is illustrated in
example 2.4.

Example 2.4. The manager of a recreational facility has determined that the probability of 1000
or more visitors on any Sunday in July depends upon the maximum temperature for that Sunday
as shown in the following table. The table also gives the probabilities that the maximum
temperature will fall in the indicated ranges. On a certain Sunday in July, the facility has more
than 1000 visitors. What is the probability that the maximum temperature was in the various
temperature classes?

Temp class     Prob of 1000 or     Prob of being      Prob of Tj given
Tj (°F)        more visitors       in temp class      1000 or more visitors
<60                0.05                0.05                 0.005
60-70              0.20                0.15                 0.059
70-80              0.50                0.20                 0.197
80-90              0.75                0.35                 0.517
90-100             0.50                0.20                 0.197
>100               0.25                0.05                 0.025

Total                                  1.00                 1.000

Solution: Let Tj for j = 1, 2, ..., 6 represent the 6 intervals of temperature. Then from equation
2.11

    prob(T_j | 1000 or more) = prob(1000 or more | T_j) × prob(T_j) / Σ_{i=1}^{6} prob(1000 or more | T_i) × prob(T_i)

                             = prob(1000 or more | T_j) × prob(T_j) / [0.05(0.05) + 0.20(0.15) + ... + 0.25(0.05)]

For example

    prob(<60°F | 1000 or more) = (0.05 × 0.05)/0.507 = 0.005

Similar calculations yield the last column in the above table. Note that Σ_{j=1}^{6} prob(T_j | 1000 or
more) is equal to 1.
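The posterior column is mechanical to compute; a Python sketch of equation 2.11 using the table's values (the class labels are simply dictionary keys):

    # Bayes theorem (eq. 2.11) for example 2.4
    likelihood = {"<60": 0.05, "60-70": 0.20, "70-80": 0.50,
                  "80-90": 0.75, "90-100": 0.50, ">100": 0.25}
    prior = {"<60": 0.05, "60-70": 0.15, "70-80": 0.20,
             "80-90": 0.35, "90-100": 0.20, ">100": 0.05}

    evidence = sum(likelihood[t] * prior[t] for t in prior)    # 0.5075
    posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}

    for t in prior:
        print(f"prob({t:>6} | 1000 or more) = {posterior[t]:.3f}")
    print(f"sum of posteriors = {sum(posterior.values()):.3f}")  # 1.000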

COUNTING

In applying equation 2.1 to determine probabilities, one often encounters situations where
it is impractical to actually enumerate all of the possible ways that an event can occur. To assist
in this matter certain general mathematical formulas have been developed.

If E1, E2, ..., En are mutually exclusive events such that Ei ∩ Ej = 0 for all i ≠ j where 0
represents an empty set and Ei can occur in ni ways, then the compound event E made up of
outcomes E1, E2, ..., En can occur in n1 n2 ...nn ways.

The problem of sampling or selecting a sample of r items from n items is commonly
encountered. Sampling can be done with replacement, so that the item selected is immediately
returned to the population, or without replacement, so that the item is not returned. The order of
sampling may be important in some situations and not in others. Thus, we may have four types
of samples--ordered with replacement, ordered without replacement, unordered with replacement
and unordered without replacement.

In case of an ordered sample with replacement, the number of ways of selecting item 1 is
n since there are n items in the population. Similarly, item 2 can be selected in n ways since the
first item selected is returned to the population. Thus two items can be selected in 𝑛 × 𝑛 ways.
Extending this argument to r selections, the number of ways of selecting r items from n with
replacement and order being important is simply n^r.

If the items are not replaced after being selected, then the first item can be selected in n
ways, the second in n-1 ways and so on until the rth item can be selected in n-r+1 ways. Thus r
ordered items can be selected from n without replacement in n(n-1)(n-2)...(n-r+1) ways. This is
commonly written as (n)r and called the number of permutations of n items taken r at a time.

    (n)_r = n(n−1)(n−2) ... (n−r+1) = n!/(n−r)!

The notation n! is read “n factorial”. n! = n(n-1)(n-2)...(2)(1). By definition 0! = 1.

Unordered sampling without replacement is similar to ordered sampling without
replacement except that in the case of ordered sampling the r items selected can be arranged in r!
ways. That is, an ordered sample of r items can be selected from r items in (r)_r, or r!, ways.
Thus r! of the ordered samples will contain the same elements. The number of different
unordered samples is therefore (n)_r/r!, commonly written as C(n, r) (read "n choose r") and
called the binomial coefficient. The binomial coefficient gives the number of combinations
possible when selecting r items from n without replacement.

    C(n, r) = (n)_r/r! = n!/[(n−r)! r!]        (2.12)
Finally, in unordered sampling, selecting r items from n with replacement is equivalent to
selecting r items from n+r-1 items without replacement. That is, we can consider the population
as containing r-1 items more than it really does since the items selected will be replaced. The
number of ways of selecting r unordered items from n items is then

    C(n+r−1, r) = (n+r−1)!/[(n−1)! r!]
The number of ways of selecting samples under the four above conditions is summarized in the
following table:

                With replacement                       Without replacement

Ordered         n^r                                    (n)_r = n!/(n−r)!

Unordered       C(n+r−1, r) = (n+r−1)!/[(n−1)! r!]     C(n, r) = n!/[(n−r)! r!]

Example 2.5. For a particular watershed, records from 10 rain gages are available. Records
from 3 of the gages are known to be bad. If 4 records are selected at random from the 10 records:

a) What is the probability that 1 bad record will be selected?
b) What is the probability that 3 bad records will be selected?
c) What is the probability that at least 1 bad record will be selected?

Solution: The total number of ways of selecting 4 records from the 10 available records (order
is not important) is

    C(10, 4) = 10!/(6! 4!) = 210

(a) The number of ways of selecting 1 bad record from 3 bad records and 3 good records
from 7 good records is

    C(3, 1) C(7, 3) = [3!/(2! 1!)][7!/(4! 3!)] = 3 × 35 = 105

Applying equation 2.1 and letting A represent 1 bad and 3 good records, the probability of A is
105/210 or 0.500.

(b) The number of ways of selecting 3 bad records and 1 good record is

    C(3, 3) C(7, 1) = 1 × 7 = 7

Thus, the probability of selecting 3 bad records is 7/210 or 0.033.

(c) The probability of at least 1 bad record is equal to the probability of 1 or 2 or 3 bad
records. The probabilities of 1 and 3 bad records are known to be 0.500 and 0.033,
respectively. The probability of 2 bad records is

    C(3, 2) C(7, 2)/210 = 3(21)/210 = 0.300

Thus, the probability of at least 1 bad record is 0.500 + 0.300 + 0.033 = 0.833. This latter result
could also be determined from the fact that the probability of 0 or 1 or 2 or 3 bad records must
equal one. The probability of at least 1 bad record thus equals
    1 − prob(0 bad records) = 1 − C(3, 0) C(7, 4)/210 = 1 − 35/210 = 0.833

Example 2.6. For the situation described in Example 2.5, what is the probability of selecting at
least 2 bad records given that one of the records selected is bad?

Solution:

    prob(at least 2 bad out of 4 | 1 is bad) = prob(at least 2 bad out of 4)/prob(1 bad in 4)

                                             = (0.300 + 0.033)/0.500 = 0.667

GRAPHICAL PRESENTATION

Hydrologists are often faced with large quantities of data. Since it is difficult to grasp the
total data picture from tabulations such as table 2.1, a useful first step in data analysis is to use
graphical procedures. Throughout this book various graphical presentations will be illustrated.
Helsel and Hirsch (1992) present a very good treatment of graphical presentations of hydrologic
data. The appropriate graphical technique depends on the purpose of the analysis. As various
analytic procedures are discussed throughout this book, graphical techniques that supplement the
procedures will be illustrated. Undoubtedly the most common graphical representation of
hydrologic data is a time series plot of magnitude versus time. Figure 1.2 is such a plot for the
peak flow data for the Kentucky River. A plot such as this is useful for detecting obvious trends
in the data in terms of means and variances or serial dependence of data values next to each other
in time. Figure 1.2 reveals no obvious trends in the mean annual peak flow or variances of
annual peak flows. It also shows no strong dependence of peak flow in one year on the peak
flow of the previous year. We previously indicated that figure 1.1, showing the water level in the
Great Salt Lake of Utah, indicated an apparent year-to-year relationship. Such a relationship is
reflected in the serial correlation coefficient. Serial correlation will be discussed in detail later in
the book. For now suffice it to say that the serial correlation for the annual lake level of the
Great Salt Lake is 0.969 and for the annual peak flows on the Kentucky River is -0.067. This
indicates a very strong year-to-year correlation for the Salt Lake data and insignificant
correlation for the peak flow on the Kentucky River. Serial correlation indicates dependence
from one observation to the next. It also indicates persistence. In the Salt Lake data, lake levels
tend to change slowly with high levels following high levels. In the Kentucky River data, no
such pattern is evident.

In conducting a probabilistic analysis, a graphical presentation that is often quite useful is
to plot the data as a frequency histogram. This is done by grouping the data into classes and then
plotting a bar graph with the number or the relative frequency (proportion) of observations in a
class versus the midpoint of the class interval. The midpoint of a class is called the class mark.
The class interval is the difference between the upper and lower class boundaries. Figure 2.6 is
such a plot for the Kentucky River peak flow data. Frequency histograms are of most value for
data that are independent from observation to observation. A frequency histogram for the
Kentucky River data would have the same general shape and location regardless of what period
of record was selected. The Salt Lake data would have a different location if the period 1890 to
1910 was used in comparison to the period 1870 to 1890 since the levels were higher over the
later period. Certainly no satisfactory histogram of levels could be developed for Devils Lake,
ND.

The selection of the class interval and the location of the first class mark can appreciably
affect the appearance of a frequency histogram. The appropriate width for a class interval
depends upon the range of the data, the number of observations, and the behavior of the data.
Several suggestions have been put forth for forming frequency histograms. Spiegel (1961)
suggests that there should be 5 to 20 classes. Steel and Torrie (1960) state that the class interval
should not exceed one-fourth to one-half of the standard deviation of the data. Sturges (1926)
recommends that the number of classes be determined from

𝑚 = 1 + 3.3 log 𝑛 (2.13)

where m is the number of classes, n is the number of data values, and the logarithm to the base
10 is used.

Whatever criterion is used, it should be kept in mind that sensitivity is lost if too few or too
many classes are used. Too few classes will eliminate detail and obscure the basic pattern of the
data. Too many classes result in erratic patterns of alternating high and low frequencies. Figure
2.7 represents the Kentucky River data with too many class intervals. If possible, the class
intervals and class marks should be round figures. This is not a computational or theoretical
consideration, but one aimed at making it easier for those viewing the histogram to grasp its full
meaning.
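For the 99 Kentucky River peak flows, Sturges' rule suggests roughly 8 classes; a sketch:

    import math

    # Sturges' rule (eq. 2.13) for n = 99 observations
    n = 99
    m = 1 + 3.3 * math.log10(n)
    print(f"suggested number of classes: {m:.1f}")   # about 7.6, so 7 or 8

Figure 2.6, for comparison, uses 9 classes of width 15,000 cfs, within the 5 to 20 classes suggested by Spiegel (1961).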

Fig. 2.6. Frequency histogram for the Kentucky River peak flow data (relative frequency versus
peak flow in 1000s cfs; class marks at 22.5, 37.5, ..., 142.5).

Fig. 2.7. Frequency histogram for the Kentucky River data with too many classes (relative
frequency versus peak flow in 1000's cfs).

In some situations it may be desirable to use nonuniform class intervals. In chapter 8 a
situation is presented where the intervals are such that the expected relative frequencies are the
same in each class.

Another common method of presenting data is in the form of a cumulative frequency
distribution. Cumulative frequency distributions show the frequency of events less than (greater
than) some given value. They are formed by ranking the data from the smallest (largest) to the
largest (smallest), dividing the rank by the number of data points and plotting this ratio against
the corresponding data value. If the data are ranked from the smaller (larger) data values to the
larger (smaller) values, the resulting cumulative frequency refers to the frequency of
observations less (more) than or equal to the corresponding data value. Figure 2.8 is a
cumulative frequency plot based on the Kentucky River peak flow data. Again, ranking and
plotting data in this fashion is most meaningful if the data are not serially correlated.
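Forming the "less than or equal" curve of figure 2.8 is just a matter of sorting; a Python sketch (the short sample here is drawn from table 2.1 for illustration, and the full 99-value record would be substituted in practice):

    def cumulative_frequency(data):
        """Rank data smallest to largest; rank/n estimates prob(X <= x)."""
        ordered = sorted(data)
        n = len(ordered)
        return [(x, (rank + 1) / n) for rank, x in enumerate(ordered)]

    # a few values from table 2.1, for illustration only
    sample = [47300, 54400, 87200, 65700, 91500, 53500, 144000, 20600]
    for flow, freq in cumulative_frequency(sample):
        print(f"flow {flow:>7} cfs: cumulative frequency = {freq:.3f}")

The "more than" curve is formed the same way with the ranking reversed.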

Fig. 2.8. Cumulative frequency distribution of Kentucky River data ("less than" and "more than"
curves; cumulative frequency versus flow in 1000s cfs).

RANDOM VARIABLES

Simply stated, a random variable is a real-valued function defined on a sample space. If
the outcome of an experiment, process, or measurement has an element of uncertainty associated
with it such that its value can only be stated probabilistically, the outcome is a random variable.
This means that nearly all quantities in hydrology (flows, precipitation depths, water levels,
storages, roughness coefficients, aquifer properties, water quality parameters, sediment loads,
number of rainy days, drought duration, etc.) are random variables.

Random variables may be discrete or continuous. If the set of values a random variable
can assume is finite (or countably infinite), the random variable is said to be a discrete random
variable. If the set of values a random variable can assume is infinite, the random variable is
said to be a continuous random variable. An example of a discrete random variable would be
the number of rainy days experienced at a particular location over a period of 1 year. The
amount of rain received over the year would be a continuous random variable. For the most part
in this text, capital letters will be used to denote random variables and the corresponding lower
case letter will represent values of the random variable.

It is important to note that any function of a random variable is also a random variable.
That is, if X is a random variable then Z = g(X) is a random variable as well. This follows from
the fact that if X is uncertain, then any function of X must be uncertain as well. Physically, this
means that any hydrologic quantity that is dependent on a random variable is also a random
variable. If runoff is a random variable, then erosion and sediment delivery to a stream is a
random variable. If sediment has absorbed chemicals, then water quality is a random variable.

Example 2.7. Nearly every hydrologic variable can be taken as a random variable. Rainfall for
any duration, streamflow, soil hydraulic properties, time between hydrologic events such as
flows above a certain base or daily rainfalls in excess of some amount, the number of times a
streamflow rate exceeds a given base over a period of a year and daily pan evaporation are all
random variables. Quantities derived from random hydrologic variables are also random
variables. The storage required in a water supply reservoir to meet a given demand is a function
of the demand and the inflow to the reservoir. Since reservoir inflow is a random variable,
required storage is also a random variable. As a matter of fact, the demand that is placed on the
reservoir would be a random variable as well. The velocity of flow through a porous media is a
function of the hydraulic conductivity and hydraulic gradient which are both random variables.
Therefore the flow velocity is a random variable also.

UNIVARIATE PROBABILITY DISTRIBUTIONS

If the random variable X can only take on values x1, x2, ..., xn with probabilities fX(x1),
fX(x2), ..., fX(xn) and Σi fX(xi) = 1, then X is a discrete random variable. With a discrete random
variable there are "lumps" or spikes of probability associated with the values that the random
variable can assume. Figure 2.9 is a typical plot of the distribution of probability associated with
the values that a discrete random variable can assume. This information would constitute a
probability distribution function (pdf) for a discrete random variable.

The cumulative probability distribution (cdf), FX(xk), for this discrete random variable is
shown in figure 2.10. The cumulative distribution represents the probability that X is less than or
equal to xk.

    F_X(x) = Σ_{x_i ≤ x} f_X(x_i)        (2.14)

The cumulative distribution has jumps in it at each xi equal in magnitude to fX(xi) or the
probability that X = xi. The probability that X = xi can be determined from

    f_X(x_i) = F_X(x_i) − F_X(x_{i−1})        (2.16)

Figure 2.9. A discrete probability distribution function (f(x) versus X).

Figure 2.10. A discrete cumulative distribution function (F(x) versus X).

The notation fX(x) and FX(x) denote the pdf and cdf of the discrete random variable X
evaluated at X = x.

Often, continuous data are treated as though they were discrete. Looking again at the
Kentucky River peak flow data, we can define the event A as having a peak flow in the ith class.
Letting ni be the number of observed peak flows in the ith interval and n be the total number of
observed peak flows, the probability that a peak flow is in the ith class is given by

    prob(A) = n_i/n = f_Xi        (2.17)

Thus, the relative frequency, fXi, can be interpreted as a probability estimate, the frequency
histogram can be interpreted as an approximation for a pdf, and the cumulative frequency can be
interpreted as an approximation for a cdf.

Many times it is desirable to treat continuous random variables directly. Continuous
random variables can take on any value in a range of values permitted by the physical processes
involved. Probability distribution functions of continuous random variables are smooth curves.
The pdf of a continuous random variable X is denoted by pX(x). The cdf is denoted by PX(x).
PX(x) represents the probability that X is less than or equal to x or

    P_X(x) = prob(X ≤ x)        (2.18)

The pdf and the cdf are related by

    dP_X(x) = p_X(x) dx        (2.19)

or

    P_X(x) = ∫_{−∞}^{x} p_X(t) dt        (2.20)

The notation pX(x) and PX(x) denote the pdf and cdf, respectively, of the continuous
random variable X evaluated at X = x. Thus pY (a) represents the pdf of the random variable Y
evaluated at Y = a. PY(a) represents the cdf of the random variable Y and gives prob(Y ≤ a).

A function, pX(x), defined on the real line can be a pdf if and only if

1. pX(x) ≥ 0 for all x ∈ R, the range of X (2.21)

2. ∫𝑅 𝑝𝑋 (𝑥)𝑑𝑥 = 1

By definition pX(x) is zero for X outside R. Also PX (xl) = 0 and PX (xu) = 1 where xl and
xu are the lower and upper limits of X in R. For many distributions these limits are -∞ to ∞ or 0
to ∞. It is also apparent that the probability that X takes on a value between a and b is given by
    prob(a ≤ X ≤ b) = ∫_a^b p_X(t) dt        (2.22)

                    = P_X(b) − P_X(a)        (2.23)

The prob(a≤X≤b) is the area under the pdf between a and b. The probability that a random
variable takes on any particular value from a continuous distribution is zero. This can be seen
from
    prob(X = d) = ∫_d^d p_X(t) dt = P_X(d) − P_X(d) = 0

Because the probability that a continuous random variable takes on a specified value is zero, the
expressions prob(a≤X≤b), prob(a<X≤b), prob(a≤X<b) and prob(a<X<b) are all equivalent. It is
also apparent that PX(x) can be interpreted as the probability that X is strictly less than x since
prob(X=x) = 0.

Figures 2.11 and 2.12 illustrate a possible pdf and its corresponding cdf. In addition to
density functions that are symmetrical and bell-shaped, densities may take on a number of
different shapes including distributions that are skewed to the right or left, rectangular,
triangular, exponential, "J"-shaped, and reverse "J"-shaped (Fig. 2.13).

Fig. 2.11. Probability density function p(x). Areas under the curve: A1 = prob(X < a),
A2 = prob(a < X < b), A3 = prob(X > b); total area = 1.

Fig. 2.12. Cumulative probability distribution function P(x), showing prob(X < a),
prob(a < X < b), and prob(X > b).

At this point a cautionary note is added to the effect that the probability density function,
pX(x), is not a probability and can have values exceeding one. The cumulative probability
distribution, PX(x), is a probability [prob(X ≤ x) = PX(x)] and must have values ranging from 0 to
1. Of course, pX(x) and PX(x) are related as indicated by equation 2.20 and knowledge of one
specifies the other.

Example 2.8. Evaluate the constant a for the following expression to be considered a
probability density function:

𝑝𝑋 (𝑥) = 𝑎𝑥 2 , 0≤𝑋≤5

What is the probability that a value selected at random from this distribution will

a) Be less than 2?
b) Fall between 1 and 3?
c) Be larger than 4?
d) Be larger than or equal to 4?
e) Exceed 6?

Solution:

From equation 2.21 we must have

    ∫_0^5 p_X(t) dt = 1

    ∫_0^5 a t² dt = 1

    a t³/3 |_0^5 = 1

    a = 3/125

So

    p_X(x) = 3x²/125

and

    P_X(x) = x³/125    for 0 ≤ X ≤ 5

a) prob(X ≤ 2) = P_X(2) = 8/125

b) prob(1 ≤ X ≤ 3) = P_X(3) − P_X(1) = 26/125

c) prob(X > 4) = 1 − P_X(4) = 1 − 64/125 = 61/125

d) prob(X ≥ 4) = 61/125 since prob(X = 4) = 0

e) prob(X > 6) = 1 − P_X(6). Since P_X(x) = 1 for X > 5, prob(X > 6) = 1 − 1 = 0
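The same answers follow from a few lines of Python once the cdf of example 2.8 is coded; a sketch:

    def cdf(x):
        """P_X(x) = x**3 / 125 on 0 <= x <= 5 (example 2.8)."""
        if x <= 0:
            return 0.0
        if x >= 5:
            return 1.0
        return x ** 3 / 125

    print(f"a) prob(X <= 2)      = {cdf(2):.4f}")           # 8/125
    print(f"b) prob(1 <= X <= 3) = {cdf(3) - cdf(1):.4f}")  # 26/125
    print(f"c) prob(X > 4)       = {1 - cdf(4):.4f}")       # 61/125
    print(f"e) prob(X > 6)       = {1 - cdf(6):.4f}")       # 0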

Piecewise continuous distributions satisfying the requirements for a probability
distribution in which the prob(X=d) is not zero are possible. Such a distribution could be defined
by

    P_X(x) = P_1(x)    for X < d        (2.24)
           = P_2(x)    for X ≥ d

where P2(d) > P1(d), P1(xl) = 0, P2(xu) = 1 and P1(x) and P2(x) are nondecreasing functions of X.
Figure 2.14 is a plot of such a distribution. For this situation the prob (X=d) equals the
magnitude of the jump ΔP at X=d or is equal to P2(d) - P1(d). Any finite number of
discontinuities of this type is possible.

Figure 2.14. A possible piecewise continuous pdf and cdf for the case prob(X=d) ≠ 0. The spike
in the pdf at X = d and the jump ΔP = P2(d) − P1(d) in the cdf both represent prob(X = d).

An example of a distribution as shown in figure 2.14 is the distribution of daily rainfall
amounts. The probability that no rainfall is received, prob(X=0), is finite, whereas the
probability distribution of rain on rainy days would form a continuous distribution. A second
example would be the probability distribution of the water level in some reservoir. The water
level may be maintained at a constant level d as much as possible but may fluctuate below or
above d at times. The distribution shown in figure 2.14 could represent this situation.

The relationship between relative frequency and probability can be envisioned by
considering an experiment whose outcome is a value of the random variable X. Let pX(x) be the
probability density function of X. The probability that a single trial of the experiment will result
in an outcome between X=a and X=b is given by

    prob(a < X < b) = ∫_a^b p_X(x) dx = P_X(b) − P_X(a)

In N independent trials of the experiment, the expected number of outcomes in the interval a to b
would be

    n_{a-b} = N[P_X(b) − P_X(a)]

and the expected relative frequency of outcomes in the interval a to b is

    f_{a-b} = n_{a-b}/N = P_X(b) − P_X(a)

In general, if xi represents the midpoint of an interval of X given by xi − Δxi/2 to xi +
Δxi/2, then the expected relative frequency of outcomes in this interval of repeated, independent
trials of the experiment is given by

    f_Xi = P_X(x_i + Δx_i/2) − P_X(x_i − Δx_i/2)        (2.25a)

Because the right-hand side of this equation represents the area under pX(x) between 𝑥𝑖 −
∆𝑥𝑖 /2 and 𝑥𝑖 + ∆𝑥𝑖 /2, it can be approximated by

𝑓𝑋𝑖 = ∆𝑥𝑖 𝑝𝑋 (𝑥𝑖 ) (2.25b)

Equations 2.25 can be used to determine the expected relative frequency of repeated,
independent outcomes of a random experiment whose outcome is a value of the random variable
X.

If N independent observations of X are available, the actual relative frequency of
outcomes in an interval of width Δxi centered on xi may not equal fXi as given by equation 2.25
since X is a random variable whose behavior can only be described probabilistically. The most
probable outcome or the expected outcome will equal the observed outcome only if pX(x) is truly
the probability density function for X and for an infinitely large number of observations. Even if
the true probability density function is being used, the actual frequency of outcomes in the
interval Δxi approaches the expected number only as the number of trials or observations
becomes very large.

Example 2.9. Plot the expected frequency histogram using the probability density function of
example 2.8 and a class interval of 1/2.

Solution:

    f_Xi = Δx_i p_X(x_i) = p_X(x_i)/2 = 3x_i²/250

The desired plot is shown in figure 2.15.

  xi      pX(xi)     fXi
 0.25     0.0015    0.00075
 0.75     0.0135    0.00675
 1.25     0.0375    0.01875
 1.75     0.0735    0.03675
 2.25     0.1215    0.06075
 2.75     0.1815    0.09075
 3.25     0.2535    0.12675
 3.75     0.3375    0.16875
 4.25     0.4335    0.21675
 4.75     0.5415    0.27075
 Sum                0.9975

Fig. 2.15. Expected frequency histogram for example 2.9 (f(x) versus X).
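The table of example 2.9 takes only a few lines to generate; a sketch:

    # Expected relative frequencies f_Xi = dx * p_X(x_i) for example 2.9
    dx = 0.5
    marks = [0.25 + dx * i for i in range(10)]      # class marks 0.25 ... 4.75

    rows = [(x, 3 * x ** 2 / 125, dx * 3 * x ** 2 / 125) for x in marks]
    for x, p, f in rows:
        print(f"x = {x:4.2f}   p_X(x) = {p:.4f}   f_Xi = {f:.5f}")
    print(f"sum of f_Xi = {sum(f for _, _, f in rows):.4f}")   # 0.9975

The sum is slightly less than 1 because equation 2.25b is only a midpoint approximation to the exact areas of equation 2.25a.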

BIVARIATE DISTRIBUTIONS

The situation frequently arises where one is interested in the simultaneous behavior of
two or more random variables. An example might be the flow rates on two streams near their
confluence. One might like to know the probability of both streams having peak flows
exceeding a given value. A second example might be the probability of a rainfall exceeding 2.5
inches at the same time the soil is nearly saturated. Rainfall depth and soil water content would
be two random variables.

Example 2.10. The magnitude of peak flows from small watersheds is often estimated from the
“Rational Equation” given by Q=CIA where Q is the estimated flow, C is a coefficient, I is a
rainfall intensity, and A is the watershed area. The assumption is made that the return period of
flow will be the same as the return period of the rainfall that is used. To verify this assumption it
is necessary to study the joint probabilities of the two random variables Q and I.

If X and Y are continuous random variables, their joint probability density function is
pX,Y(x, y) and the corresponding cumulative probability distribution is PX,Y(x,y). These two are
related by

    p_{X,Y}(x, y) = ∂²P_{X,Y}(x, y)/∂x∂y        (2.26)

and

    P_{X,Y}(x, y) = prob(X ≤ x and Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} p_{X,Y}(t, s) ds dt        (2.27)

The corresponding relationships for X and Y being discrete random variables are

    f_{X,Y}(x_i, y_j) = prob(X = x_i and Y = y_j)        (2.28)

    F_{X,Y}(x, y) = prob(X ≤ x and Y ≤ y) = Σ_{x_i≤x} Σ_{y_j≤y} f_{X,Y}(x_i, y_j)        (2.29)

It should be noted that the bivariate analogy of equation 2.16 is

    f_{X,Y}(x_i, y_j) = F_{X,Y}(x_i, y_j) − F_{X,Y}(x_i, y_{j−1}) − F_{X,Y}(x_{i−1}, y_j) + F_{X,Y}(x_{i−1}, y_{j−1})        (2.30)

Some of the properties of continuous bivariate distributions are

1) P_{X,Y}(x, ∞) is a cumulative univariate probability function of X only (the cumulative
marginal distribution of X).

2) P_{X,Y}(∞, y) is a cumulative univariate probability function of Y only (the cumulative
marginal distribution of Y).

3) p_{X,Y}(x, y) ≥ 0

4) P_{X,Y}(∞, ∞) = 1

5) P_{X,Y}(−∞, y) = P_{X,Y}(x, −∞) = 0

MARGINAL DISTRIBUTIONS

If one is interested in the behavior of one of a pair of random variables regardless of the
value of the second random variable, the marginal distribution may be used. For instance the
marginal density of X, pX (x), is obtained by integrating pX,Y (x, y) over all possible values of Y.

    p_X(x) = ∫_{−∞}^{∞} p_{X,Y}(x, s) ds        (2.31)

The cumulative marginal distribution is given by

𝑃𝑋 (𝑥) = 𝑃𝑋,𝑌 (𝑥, ∞) = 𝑝𝑟𝑜𝑏(𝑋 ≤ 𝑥 𝑎𝑛𝑑 𝑌 ≤ ∞) (2.32a)

= 𝑝𝑟𝑜𝑏(𝑋 ≤ 𝑥) (2.32b)
           = ∫_{−∞}^{x} ∫_{−∞}^{∞} p_{X,Y}(t, s) ds dt        (2.32c)

           = ∫_{−∞}^{x} p_X(t) dt        (2.32d)

Similarly the marginal density and cumulative marginal distribution of Y are

    p_Y(y) = ∫_{−∞}^{∞} p_{X,Y}(t, y) dt        (2.33)

and

    P_Y(y) = ∫_{−∞}^{y} p_Y(s) ds        (2.34)

The corresponding relationships for a discrete bivariate distribution are

    f_X(x_i) = prob(X = x_i) = Σ_j f_{X,Y}(x_i, y_j)        (2.35a)

    f_Y(y_j) = prob(Y = y_j) = Σ_i f_{X,Y}(x_i, y_j)        (2.35b)

    F_X(x) = prob(X ≤ x) = Σ_{x_i≤x} f_X(x_i) = Σ_{x_i≤x} Σ_j f_{X,Y}(x_i, y_j)        (2.36a)

    F_Y(y) = prob(Y ≤ y) = Σ_{y_j≤y} f_Y(y_j) = Σ_{y_j≤y} Σ_i f_{X,Y}(x_i, y_j)        (2.36b)

CONDITIONAL DISTRIBUTIONS

A marginal distribution is the distribution of one variable regardless of the value of the
second variable. The distribution of one variable with restrictions or conditions placed on the
second variable is called a conditional distribution. Such a distribution might be the distribution
of X given that Y equals y0 or the distribution of Y given that x1 ≤ X ≤ x2.

In general the conditional distribution of X given that Y is in some region R is arrived at
using the same reasoning that was used in obtaining equation 2.8. The total sample space of Y is
now the region R. Because

    ∫_R ∫_{−∞}^{∞} p_{X,Y}(t, s) dt ds = ∫_R p_Y(s) ds

then

    ∫_R p_{X,Y}(x, s) ds / ∫_R p_Y(s) ds

represents a probability density function of X given that Y is in R. The conditional density of X
given Y is in R is given by

    p_{X|Y}(x | Y is in R) = ∫_R p_{X,Y}(x, s) ds / ∫_R p_Y(s) ds        (2.37)

for X and Y continuous. Similarly the conditional distribution of (X|Y is in R) for X and Y
discrete is

    f_{X|Y}(x_i | Y is in R) = Σ_{y_j in R} f_{X,Y}(x_i, y_j) / Σ_{y_j in R} f_Y(y_j)        (2.38)

The determination of conditional probabilities from equations 2.37 and 2.38 is done in the usual
way:

    prob(X is in S | Y is in R) = ∫_S p_{X|Y}(x | Y is in R) dx        (2.39)

for X and Y continuous and

    prob(X is in S | Y is in R) = Σ_{x_i in S} f_{X|Y}(x_i | Y is in R)        (2.40)

for X and Y discrete.


For the special case where X and Y are continuous and the conditional density of X given
Y=y0 is desired, equation 2.37 breaks down since both the numerator and denominator become
zero. In this case

    p_{X|Y}(x | Y = y0) = p_{X,Y}(x, y0) / p_Y(y0)        (2.41)

The proof of this may be found in Neuts (1973). In most statistics books p_{X|Y}(x | Y = y0) is
simply written as

    p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y)        (2.42)

and called the conditional density of X given Y.

All of the above results are symmetrical with respect to X and Y. For example

    prob(Y is in S | X is in R) = ∫_S p_{Y|X}(y | X is in R) dy        (2.43)

for X and Y continuous where

    p_{Y|X}(y | X is in R) = ∫_R p_{X,Y}(t, y) dt / ∫_R p_X(t) dt        (2.44)

If the region R of equation 2.37 is the entire region of definition with respect to Y, then

    ∫_R p_Y(s) ds = 1

and

    ∫_R p_{X,Y}(x, s) ds = p_X(x)

so that

    p_{X|Y}(x | Y is in R) = p_X(x)

This results from the fact that the condition that Y is in R when R encompasses the entire region
of definition of Y is really no restriction but simply a condition stating that Y may take on any
value in its range. In this case p_{X|Y}(x | Y is in R) is identical to the marginal density of X.

INDEPENDENCE

From equation 2.37 or 2.38 it can be seen that in general the conditional density of X
given Y is a function of y. If the random variables X and Y are independent, this functional
relationship disappears (i.e., p_{X|Y}(x|y) is not a function of y). In fact in this case

    p_{X|Y}(x | y) = p_X(x)

or

    f_{X|Y}(x_i | y_j) = f_X(x_i)

or the conditional density equals the marginal density. Furthermore, if X and Y are independent
(continuous or discrete) random variables, their joint density is equal to the product of their
marginal densities.

    p_{X,Y}(x, y) = p_X(x) p_Y(y)        (2.45a)

or

    f_{X,Y}(x_i, y_j) = f_X(x_i) f_Y(y_j)        (2.45b)

The random variables X and Y are independent in the probabilistic sense (stochastically
independent) if and only if their joint density is equal to the product of their marginal densities.

Independence is an extremely important property. A bivariate distribution is much more
difficult to define and to work with than is a univariate distribution. If independence exists and
the proper pdf for X and for Y can be determined, the bivariate distribution for X and Y is given
as the product of these two univariate distributions.

DERIVED DISTRIBUTIONS

Situations often arise where the joint probability distribution of a set of random variables
is known and the distribution of some function or transformation of these variables is desired.
For example the joint probability distribution of the flows in two tributaries of a stream may be
known while the item of interest may be the sum of the flows in the two tributaries. Some
commonly used transformations are translation and/or rotation of axes, logarithmic
transformations, nth root transformations for n equal 2 and 3 and certain trigonometric
transformations.

Thomas (1971) presents the developments that lead to the results presented here
concerning transformations and derived distributions for continuous random variables. The
procedure for discrete random variables is simply an accounting procedure.

Example 2.11. Let X have the distribution function

𝑓𝑋 (𝑥) = 𝑐/𝑥 𝑓𝑜𝑟 𝑋 = 2,3,4,5

Let 𝑌 = 𝑋 2 − 7𝑋 + 12. The probability distribution and possible values of Y can be determined
from the following table.

x 2 3 4 5
y 2 0 0 2
fx(x) c/2 c/3 c/4 c/5

Thus 𝑓𝑌 (𝑦) = 𝑐/3 + 𝑐/4 = 35𝑐/60 𝑓𝑜𝑟 𝑌 = 0

= 𝑐/2 + 𝑐/5 = 42𝑐/60 𝑓𝑜𝑟 𝑌 = 2

= 0 𝑒𝑙𝑠𝑒𝑤ℎ𝑒𝑟𝑒

The value for c can be evaluated from either the requirement that

    Σ_{i=1}^{4} f_X(x_i) = 1    or    Σ_{i=1}^{2} f_Y(y_i) = 1

In either case the value of c will be found to be 60/77.
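The accounting procedure is easily mirrored in code; a sketch using exact fractions:

    from fractions import Fraction

    # f_X(x) = c/x for x = 2, 3, 4, 5; Y = X**2 - 7X + 12 (example 2.11)
    weights = {x: Fraction(1, x) for x in (2, 3, 4, 5)}
    c = 1 / sum(weights.values())                   # c = 60/77

    f_Y = {}
    for x, w in weights.items():
        y = x * x - 7 * x + 12                      # possible values: 0 and 2
        f_Y[y] = f_Y.get(y, Fraction(0)) + c * w    # accumulate probability

    print(f"c = {c}")                               # 60/77
    print(f"f_Y(0) = {f_Y[0]}, f_Y(2) = {f_Y[2]}")  # 35/77 and 42/77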

For a univariate continuous distribution of the random variable X, the distribution of U
where

    U = u(X)        (2.46)

is a monotonic function (u(X) is monotonically increasing if u(x2) ≥ u(x1) for x2 > x1 and
monotonically decreasing if u(x2) ≤ u(x1) for x2 > x1) can be found from

    p_U(u) = p_X(x) |dx/du|        (2.47)

Example 2.12. Find the probability of 0 < U < 10 if U = X² and X is a continuous random variable
with

    p_X(x) = 3x²/125,    0 < X < 5

Solution:

    p_U(u) = p_X(x) |dx/du|

With x = √u,

    dx/du = 1/(2x) = 1/(2√u)

so

    p_U(u) = [3x²/125] · 1/(2√u) = 3x²/(250√u)

or, since U = X²,

    p_U(u) = 3√u/250

A check to see that p_U(u) is a probability density can be made by integrating p_U(u) from 0 to 25:

    ∫_0^25 p_U(u) du = ∫_0^25 (3√u/250) du = 1

Now

    prob(0 < U < 10) = ∫_0^10 (3√u/250) du = 10^{3/2}/125

The same result could have been obtained by noting that

    prob(0 < U < 10) = prob(0 < X < √10) = ∫_0^{√10} (3x²/125) dx = 10^{3/2}/125

In the case of a continuous bivariate density, the transformation from p_{X,Y}(x, y) to
p_{U,V}(u, v), where U = u(X,Y) and V = v(X,Y) are one-to-one continuously differentiable
transformations, can be made by the relationship

    p_{U,V}(u, v) = p_{X,Y}(x, y) |J|        (2.48)

where J, the Jacobian of the transformation, is computed as the determinant of the matrix of
partial derivatives

    J = det | ∂x/∂u  ∂x/∂v |
            | ∂y/∂u  ∂y/∂v |        (2.49)

The limits on U and V must be determined from the individual problem at hand.

Example 2.13. Given that p_{X,Y}(x, y) = (5 − y/2 − x)/14 for 0 < X < 2 and 0 < Y < 2. If U = X
+ Y and V = Y/2, what is the joint probability density function for U and V? What are the
proper limits on U and V?

Solution:

    p_{X,Y}(x, y) = (5 − y/2 − x)/14 = [5 + y/2 − (x + y)]/14

    U = X + Y,    V = Y/2

so that X = U − 2V and Y = 2V. From equation 2.49

    J = det | ∂x/∂u  ∂x/∂v | = det | 1  −2 | = 2
            | ∂y/∂u  ∂y/∂v |       | 0   2 |

and from equation 2.48

    p_{U,V}(u, v) = p_{X,Y}(x, y) |J| = 2[5 + v − u]/14 = (5 + v − u)/7

The limits on U and V can be determined by noting that Y = 2V and X = U − 2V.
Therefore, the limit Y = 0 maps to V = 0, Y = 2 maps to V = 1, X = 0 maps to U = 2V, and X
= 2 maps to U = 2V + 2. These limits are shown in figure 2.16. A check can be made by
integrating p_{U,V}(u, v) over the region 0 < V < 1, 2V < U < 2V + 2.

Fig. 2.16. Mapping from X,Y to U,V for example 2.13.


    ∫_0^1 ∫_{2v}^{2v+2} [(5 + v − u)/7] du dv = (1/7) ∫_0^1 [5u + vu − u²/2]_{2v}^{2v+2} dv

                                              = (1/7) ∫_0^1 (8 − 2v) dv = (8v − v²)/7 |_0^1 = 1

Therefore p_{U,V}(u, v) over the above defined region is indeed a probability density function.
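The same check can be made by simulation: draw (X, Y) from p_{X,Y} by rejection sampling, transform to (U, V), and compare the fraction of points falling in a small rectangle with the value predicted by p_{U,V}(u, v) = (5 + v − u)/7. A sketch (the test rectangle is an arbitrary choice inside the region of figure 2.16):

    import random

    random.seed(3)

    def sample_xy():
        """Rejection-sample (X, Y) from p(x,y) = (5 - y/2 - x)/14 on (0,2)x(0,2)."""
        p_max = 5 / 14                    # maximum of the density, at x = y = 0
        while True:
            x, y = 2 * random.random(), 2 * random.random()
            if random.random() * p_max <= (5 - y / 2 - x) / 14:
                return x, y

    n = 100_000
    count = 0
    for _ in range(n):
        x, y = sample_xy()
        u, v = x + y, y / 2               # the transformation of example 2.13
        if 2.0 < u < 2.2 and 0.4 < v < 0.5:   # arbitrary test rectangle
            count += 1

    # density at the rectangle center times the rectangle area
    expected = (5 + 0.45 - 2.1) / 7 * (0.2 * 0.1)
    print(f"simulated fraction   = {count / n:.5f}")
    print(f"p_UV(2.1, 0.45) x A  = {expected:.5f}")   # about 0.0096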

A special case of a bivariate transformation is when the distribution of U = u(X, Y) is
desired. In this case one method of obtaining pU(u) is to define a dummy random variable V =
v(X, Y). Equation 2.48 is then used to find the joint density of U and V, pU,V(u, v). The
univariate density of U is now the marginal distribution of U found by integrating out V.

Other special cases of bivariate transformations involve the sums, products and quotients
of random variables. If the joint distribution of X and Y is p_{X,Y}(x, y) for X > 0 and Y > 0, the
following results are obtained for the distribution of U as a function of X and Y.

    Function        pdf

    U = X + Y       p_U(u) = ∫ p_{X,Y}(x, u − x) dx        (2.50)

    U = XY          p_U(u) = ∫ (1/x) p_{X,Y}(x, u/x) dx        (2.51)

    U = X/Y         p_U(u) = ∫ (x/u²) p_{X,Y}(x, x/u) dx        (2.52)

    U = Y/X         p_U(u) = ∫ x p_{X,Y}(x, ux) dx        (2.53)

In some cases the function U = u(X) may be such that it is difficult to analytically
determine the distribution of U from the distribution of X. In this case it may be possible to
generate a large sample of X's (chapter 13), calculate the corresponding U's and then fit a
probability distribution to the U's (chapter 6). It should be noted, however, that this empirical
method will not in general satisfy equations 2.47 or 2.48.

MIXED DISTRIBUTIONS

If p_i(x) for i = 1, 2, ..., m represent probability density functions and λ_i for i = 1, 2, ..., m
represent parameters satisfying λ_i ≥ 0 and Σ_{i=1}^{m} λ_i = 1, then

    p_X(x) = Σ_{i=1}^{m} λ_i p_i(x)        (2.54)

is a probability density function known as a mixture or mixed distribution since it is composed of
a mixture of the p_i(x). The parameter λ_i may be thought of as the probability that a random variable
is from the probability distribution p_i(x), and p_i(x) is the probability distribution of X given that X
is from the ith distribution. The cumulative distribution of X is given by

    P_X(x) = ∫_{−∞}^{x} Σ_{i=1}^{m} λ_i p_i(t) dt        (2.55)

           = Σ_{i=1}^{m} λ_i ∫_{−∞}^{x} p_i(t) dt

Mixed distributions in hydrology may be applicable in situations where more than one
distinct cause for an event may exist. For example flood peaks from convective storms might be
described by p1(x) and from hurricane storms by p2(x). If λ1 is the proportion of flood peaks
generated by convective storms and λ2 = (1 − λ1) is the proportion generated by hurricane storms,
then equations 2.54 and 2.55 would describe the probability distribution of flood peaks.

Singh (1974), Hawkins (1974), Singh (1987a, 1987b), Hirschboeck (1987), and Diehl and Potter (1987) discuss procedures for applying mixed distributions in the form of equation 2.42 to flood frequency determinations. Two general approaches are used. One is to allow the data and statistical estimation procedures to determine the mixing parameter, λ, and the parameters of the distributions, pi(x). The other is to use physical information on the actual events to classify them and thus determine λ. Once classified, the two sets of data can be used independently to determine the parameters of the pdfs.

Example 2.12. A certain event has probability 0.3 of being from the distribution $p_1(x) = e^{-x}$, $x > 0$, and probability 0.7 of being from the distribution $p_2(x) = 2e^{-2x}$, $x > 0$. What is the probability that a random observation will be less than 1?

Solution:

$$P_X(x) = 0.3\,P_1(x) + 0.7\,P_2(x) = 0.3\,(1 - e^{-x}) + 0.7\,(1 - e^{-2x})$$

$$P_X(1) = 0.3\,(1 - e^{-1}) + 0.7\,(1 - e^{-2}) = 0.3\,(1 - 0.368) + 0.7\,(1 - 0.135) = 0.795$$
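The arithmetic of this example is easily checked in code. A minimal sketch in Python (the function name mixed_cdf is, of course, just illustrative):

import math

# Mixed distribution of example 2.12: lambda_1 = 0.3 with p1(x) = exp(-x)
# and lambda_2 = 0.7 with p2(x) = 2 exp(-2x).  Both component CDFs are
# available in closed form, so equation 2.43 reduces to a weighted sum.
def mixed_cdf(x, lam=0.3):
    p1 = 1.0 - math.exp(-x)          # CDF of the first component
    p2 = 1.0 - math.exp(-2.0 * x)    # CDF of the second component
    return lam * p1 + (1.0 - lam) * p2

print(mixed_cdf(1.0))                # 0.7949..., i.e. 0.795 as above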

EXERCISES

2.1 (a) Construct the theoretical relative frequency histogram for the sum of values obtained in
tossing two dice.

(b) Toss two dice 100 times and tabulate the frequency of occurrence of the sums of the two
dice. Plot the results on the histogram of part a.

(c) Why do the results of part b not equal the theoretical results of part a? What possible kinds
of errors are involved? Which kind of error was the largest in your case?

2.2 Select a set of data consisting of 50 or more observations. Construct a relative frequency
plot using at least two different groupings of the data. Which of the two groupings do you
prefer? Why?

2.3 In a period of one week, 3 rainy days were observed. If the occurrence of a rainy day is an independent event, how many ways could the sequence consisting of 4 dry and 3 wet days be arranged?

2.4 If the occurrence of a rainy day is an independent event with probability equal to 0.3, what is
the probability of

(a) exactly 3 rainy days in one week?

(b) rain on each of the next three days?

(c) 3 rainy days in a row during any week with the other 4 days dry?

2.5 Consider a coin with the probability of a head equal to p and the probability of a tail equal to
q=1-p.

(a) What is the probability of the sequence HHTHTTH in 7 flips of the coin?

(b) What is the probability of a specified sequence resulting in r H’s and s T’s?

(c) How many ways can r H’s and s T’s be arranged?

(d) What is the probability of r H’s and s T’s without regard to the order of the sequence?

2.6 The distribution given by fX(x) = 1/N for X = 1, 2, 3, ..., N is known as the discrete uniform distribution. In the following consider N ≥ 5.

(a) What is the probability that a random value from fX(x) will be equal to 5?

(b) What is the probability that a random value from fX(x) will be between 3 and 5 inclusive?

(c) What is the probability that in a random sample of 3 values from fX(x) all will be less than 5?

(d) What is the probability that the 3 random values from fX(x) will all be less than 5 given that 1
of the values is less than 5?

(e) If 2 random values are selected from fX(x), what is the probability that one will be less than 5 and the other greater than 5?

(f) For what x from fX(x) is prob(X ≤ x) = 0.5?

2.7 Consider the continuous probability density function pX(x) = a sin²(mx) for 0 < X < π.

(a) What must be the value of a and m?

(b) What is PX(x)?

(c) What is prob(0<X<π/2)?

(d) What is prob(X>a/2 | X<a/4)?

2.8 Consider the continuous probability density function given by pX(x) = 0.25 for 0<X<a.

(a) What is a?

(b) What is prob(X>a/2)?

(c) What is prob(X>a/2|X>a/4)?

(d) What is prob(X>a|X<a/4)?

2.9 Let pX(x) = 0.25 for 0<X<a as in exercise 2.8. What is the distribution of Y = ln X? Sketch
pY(y).

2.10 Many probability distributions can be defined simply by consulting a table of definite integrals. For example, $\int_0^\infty x^{n-1} e^{-x}\, dx$ is equal to Γ(n), where Γ(n) is defined as the gamma function (see chapter 6). Therefore one can define $p_X(x) = x^{n-1} e^{-x}/\Gamma(n)$ to be a probability density function for n > 0 and 0 < X < ∞. This distribution is known as the 1-parameter gamma distribution. Using a table of definite integrals, define several possible continuous probability distributions. Give the appropriate range on X and any parameters.
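As a quick numerical check of this construction (a sketch only, assuming scipy is available; the choice n = 3.5 is arbitrary):

import math
from scipy import integrate

# The 1-parameter gamma density x**(n-1) * exp(-x) / Gamma(n) should
# integrate to 1 over 0 < x < infinity for any n > 0.
n = 3.5
pdf = lambda x: x ** (n - 1) * math.exp(-x) / math.gamma(n)
area, err = integrate.quad(pdf, 0, math.inf)
print(area)              # ~ 1.0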

2.11 The annual inflow into a reservoir (acre-feet) follows a probability density given by pX(x) = 1/(β1 - α1) for α1 < X < β1. The total annual outflow in acre-feet follows a probability distribution given by pY(y) = 1/(β2 - α2) for α2 < Y < β2. Consider that β1 > β2 and α1 < α2.

(a) Calculate the expression for the probability distribution of the annual change in storage.

(b) Plot the probability distribution of the annual change in storage.

(c) If β1 = 100,000, α1 = 20,000, β2 = 70,000 and α2 = 50,000, what is the probability that the change in storage will be (i) negative and (ii) greater than 15,000 acre-feet?

2.12 The probability of receiving more than 1 inch of rain in each month is given in the
following table. If a monthly rainfall record selected at random is found to have more than 1
inch of rain, what is the probability the record is for July? April?

Jan 0.25 Apr 0.40 Jul 0.05 Oct 0.05


Feb 0.30 May 0.20 Aug 0.05 Nov 0.10
Mar 0.35 Jun 0.10 Sep 0.05 Dec 0.20

2.13 It is known that the discharge from a certain plant has a probability of 0.001 of containing a
fish killing pollutant. An instrument used to monitor the discharge will indicate the presence of
the pollutant with probability 0.999 if the pollutant is present and with probability 0.01 if the
pollutant is not present. If the instrument indicates the presence of the pollutant, what is the
probability that the pollutant is really present?

2.14 A potential purchaser of a ferry across a river knows that if a flow of 100,000 cfs or more
occurs, the ferry will be washed downstream, over a low dam and destroyed. He knows that the
probability of a flow of this kind in any year is 0.05. He also knows that for each year that the
ferry operates a net profit of $10,000 is realized. The purchase price of the ferry is $50,000.
Sketch the probability distribution of the potential net profit over a period of years neglecting
interest rates and other complications. Assume that if a flow of 100,000 cfs or more occurs in a
year, the profit for that year is zero.

2.15 Assume that the probability density function of daily rainfall is given by

prob(X = 0) = 0.25

pX(x) = 0.25x, 0 < X < 1.732

pX(x) = 0.866 - 0.25x, 1.732 < X < 3.464

(a) Is this a proper probability density function?

(b) What is prob(X>0.5)?

(c) What is prob(X>0.5 | X≠0)?

2.16 Consider the probability density function given by

pX(x) = λ1/2, 0 < X < 2

pX(x) = (1 - λ1)/4, 2 < X < 6

This is a mixture of 2 uniform distributions.

(a) Sketch pX(x) for λ1 = 0.5.

(b) Sketch pX(x) for λ1 = 0.1.

(c) Sketch pX(x) for λ1 = 0.333.

(d) In a random sample from pX(x), 60% of the values were between 0 and 2. What would be an
estimate for the value of λ1?

2.17 Show that equations 2.38 through 2.41 are valid.
