The instantaneous maximum peak flow observed during a year would be another example
of a stochastic hydrologic process. Table 2.1 contains such a listing for the Kentucky River near
Salvisa, Kentucky. By examining this table it can be seen that there is some order to the values
yet a great deal of randomness exists as well. Even though the peak flow for each of the 99 years
is listed, one cannot estimate with certainty what the peak flow for 1998 was. From the tabulated
data one could surmise that the 1998 peak flow was “probably” between 20,600 cfs and 144,000
cfs. We would like to be able to estimate the magnitude of this “probably”. The stochastic nature
of the process, however, means that one can never estimate with certainty the exact value for the
process (peak discharge) based solely on past observations.
The definition of stochastic given above has some theoretical drawbacks, as we shall see.
Hydrologic processes are continuous processes. The probability of realizing a given value from a
continuous probability distribution is zero. Thus, the probability that a variable will take on a
certain value from a specified set is zero, if the variable is continuous. Practically this presents no
problem since we are generally interested in the probabilities that the variant will be in some
range of values. For instance, we are generally not interested in the probability that the flow rate
will be exactly 100,000 cfs but may desire to estimate the probability that the flow will exceed
100,000 cfs, or be less than 100,000 cfs, or between 90,000 and 120,000 cfs.
With this introduction, several concepts such as probability, continuous processes, and
probability distributions have been introduced. We will now define these concepts and others as a
basis for considering statistical methods in hydrology.
Table 2.1. Peak discharges (cfs) Kentucky River near Salvisa, Kentucky
Year Flow Year Flow Year Flow
1895 47300 1928 68700 1961 48800
96 54400 29 80100 62 101000
97 87200 30 32300 63 63800
98 65700 31 43100 64 82500
99 91500 32 77000 65 60300
1900 53500 33 53600 66 38900
01 67800 34 70800 67 69900
02 70000 35 89400 68 55200
03 66900 36 62600 69 29400
04 34700 37 112000 70 68400
05 58000 38 44000 71 51700
06 47000 39 84300 72 107000
07 66300 40 45000 73 58600
08 80900 41 28400 74 77900
09 80000 42 46000 75 79600
10 52300 43 80400 76 46300
11 58000 44 55000 77 53400
12 67200 45 72900 78 61800
13 115000 46 71200 79 144000
14 46100 47 46800 80 40600
15 52400 48 84100 81 49600
16 94300 49 61300 82 42900
17 111000 50 87100 83 68800
18 71700 51 70500 84 98400
19 96100 52 77700 85 36500
20 92500 53 44200 86 42100
21 34100 54 20600 87 48800
22 69000 55 85000 88 35100
23 73400 56 82900 89 105000
24 99100 57 88900 90 53600
25 79200 58 60200 91 82900
26 62600 59 40300 92 54600
27 93700 60 50500 93 52100
94 71000
PROBABILITY
In the mathematical development of probability theory, the concern has been not so much
how to assign probabilities to events, but what can be done with probability once these assignments
are made. In most applied problems in hydrology, one of the most important and difficult tasks is
the initial assignment of probability. We may be interested in the probability that a certain flood
level will be exceeded in any year or that the elevation of a piezometric head may be more than
30 feet below the ground surface for 20 consecutive months. We may want to determine the
capacity required in a storage reservoir so that the probability of being able to meet the projected
water demand is 0.97. To address these problems we must understand what probability means
and how to relate magnitude to probabilities.
The definition of probability has been labored over for many years. One definition that is
easy to grasp is the classical or a priori definition:
If a random event can occur in n equally likely and mutually exclusive ways, and if na of
these ways have an attribute A, then the probability of the occurrence of the event having
attribute A is na/n written as
prob(A) = na/n (2.1)
This definition is an a priori definition because it assumes that one can determine before
the fact all of the equally likely and mutually exclusive ways that an event can occur and all of
the ways that an event with attribute A can occur. The definition is somewhat circular in that
“equally likely” is another way of saying “equally probable” and we end up using the word
“probable” to define probability. This classical definition is widely used in games of chance
such as card games and dice and in selecting objects with certain characteristics from a larger
group of objects. This definition is difficult to apply in hydrology because we generally cannot
divide hydrologic events into equally likely categories. To do that would require knowledge of
the likelihood or probability of the events which is generally the objective of our analysis and not
known before the analysis.
A definition better suited to most hydrologic problems is the relative frequency definition:

If a random event occurs a large number of times n and the event has attribute A
in na of these occurrences, then the probability of the occurrence of the event
having attribute A is

prob(A) = lim_{n→∞} na/n (2.2)
The relative frequency concept of probability is the source of the relationship given in
Chapter 1 between the return period, T, of an event and its probability of occurrence, p.
[Fig. 2.1. Probability of getting a "head" in a coin-flipping experiment as a function of the number of trials.]
This is not the case when the relative frequency definition is used. Obviously we cannot
flip the coin an infinite number of times. We have to resort to a finite sample of flips. Figure 2.1
shows how the estimate of the probability of a head changes as the number of trials (flips)
changes. A trend toward ½ is noted. This is called stochastic convergence toward ½. One
question that might be asked is, "Is the coin unbiased?" One's initial reaction is that more trials
are needed. It can be seen that the probability is slowly converging toward ½ but after 250 trials
is not exactly equal to ½. This is the plight of the hydrologist. Many times more trials or
observations are needed but are not available. Still, the data do not clearly indicate a single
answer. This is where probability and statistics come into play.
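The stochastic convergence that figure 2.1 illustrates can be reproduced with a short simulation. The sketch below is illustrative only (the function name and seed are our own, not from the text): it tracks the relative-frequency estimate of prob(head) after each flip of a simulated fair coin.

```python
import random

def running_head_probability(n_flips, seed=1):
    """Simulate n_flips of a fair coin and return the running
    relative-frequency estimate of prob(head) after each flip."""
    rng = random.Random(seed)
    heads = 0
    estimates = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        estimates.append(heads / i)
    return estimates

est = running_head_probability(250)
# The estimate wanders widely for small samples but drifts toward 1/2
# as trials accumulate; after 250 flips it is close to, but not exactly, 1/2.
print(est[-1])
```

Rerunning with a different seed gives a different path but the same drift toward ½, which is exactly the hydrologist's situation with a finite record.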
Equation 2.2 allows us to estimate probabilities based on observations and does not
require that outcomes be equally likely or that they all be enumerated. This advantage is
somewhat offset in that estimates of probability based on observations are empirical and will
only stochastically converge to the true probability as the number of observations becomes large.
For example, in the case of annual flood peaks, only one value per year is realized. Figure 2.2
shows the probability of an annual peak flow on the Kentucky River exceeding the mean annual
flow as a function of time starting in 1895. Note that each year additional data become
available to determine both the mean annual flow and the probability of exceeding that value.
Here again a convergence toward ½ is noted yet not assured. In fact, there is no reason to believe
that ½ is the "correct" probability since the probability distribution of annual peak flows is likely
not symmetrical about the mean.
Fig. 2.2. Probability that the annual peak flow on the Kentucky River exceeds the mean annual
peak flow.
If two independent sets of observations are available (samples), an estimate of the
probability of an event A could be determined from each set of observations. These two
estimates of prob(A) would not necessarily equal each other nor would either estimate
necessarily equal the true (population) prob(A) based on an infinitely large sample. This dilemma
results in an important area of concern to hydrologists--how many observations are required to
produce “acceptable” estimates for the probabilities of events?
From either equation 2.1 or 2.2 it can be seen that the probability scale ranges from zero
to one. An event having a probability of zero is impossible while one having a probability of one
will happen with certainty. Many hydrologists like to avoid the end points of the probability
scale, zero and one, because they cannot be absolutely certain regarding the occurrence or
nonoccurrence of an event. Sometimes probability is expressed as a percent chance with a scale
ranging from 0% to 100%. Care must be taken to not confuse the percent chance values with
true probabilities. A probability of 1 is very different from a 1% chance of occurrence as the
former implies the event will certainly happen while the latter means it will happen only one
time in 100.
In mathematical statistics and probability, set and measure theory are used in defining
and manipulating probabilities. An experiment is any process that generates values of random
variables. All possible outcomes of an experiment constitute the total sample space known as the
population. Any particular point in the sample space is a sample point or element. An event is a
collection of elements known as a subset.
Letting S represent the sample space; Ei for i = 1, 2, ..., represent elements in S; A and B
represent events in S; and prob(Ei) represent the probability of Ei, it follows that
0 ≤ 𝑝𝑟𝑜𝑏(𝐸𝑖 ) ≤ 1 (2.3)
𝑆 = ⋃𝑖 𝐸𝑖
𝐴 = ⋃𝑛𝑖=𝑚 𝐸𝑖
and

prob(S) = 1
Fig. 2.3. Venn diagram illustrating a sample space, elements and events.
Using notation from set theory and Venn diagrams, several probabilistic relationships can
be illustrated. If A and B are two events in S, then the probability of A or B, shown as the shaded
area of Figure 2.3, is given by

prob(A ∪ B) = prob(A) + prob(B) − prob(A ∩ B) (2.6)
Note that in probability the word “or” means “either or both”. The notation ∪ represents a union
so that A∪B represents all elements in A or B or both. The notation ∩ represents an intersection
so that A∩B represents all elements in both A and B. The last term of equation 2.6 is needed
since prob(A) and prob(B) both include prob(A∩B). Thus prob(A∩B) must be subtracted once so
the net result is only one inclusion of prob(A∩B) on the right-hand side of the equation. If A and
B are mutually exclusive, then both cannot occur and prob(A∩B) = 0. In this case

prob(A ∪ B) = prob(A) + prob(B)
22
[Fig. 2.4. Venn diagram showing the union A ∪ B of two events A and B that are not mutually exclusive.]
Figure 2.3 illustrates the case where events A and B are mutually exclusive while figure
2.4 shows A and B when they are not mutually exclusive.
If Ac represents all elements in the sample space S that are not in A, then
𝐴 ∪ 𝐴𝑐 = 𝑆
𝑝𝑟𝑜𝑏(𝐴 ∪ 𝐴𝑐 ) = 1
This statement says that the probability of A or Ac is certainty since one or the other must occur.
All of the possibilities have been exhausted. Since A and Ac are mutually exclusive

prob(A ∪ Ac) = prob(A) + prob(Ac) = 1

so that

prob(Ac) = 1 − prob(A) (2.7)
Equation 2.7 often makes it easy to evaluate a probability by first evaluating the probability that an
outcome will not occur.
An example is evaluating the probability that a peak flow q exceeds some particular flow
qo. A would be all q's greater than qo and Ac would be all q's less than qo. Because q must be
either greater than or less than qo, prob(q>qo) = 1-prob(q<qo). We show later that for continuous
random variables prob(q=qo)=0.
The conditional probability of B given that A has occurred, written prob(B|A), is defined as

prob(B|A) = prob(A ∩ B)/prob(A) (2.8)

assuming of course that prob(A) ≠ 0. Equation 2.8 can be rearranged to give the probability of A
and B as

prob(A ∩ B) = prob(A) prob(B|A) (2.9)
Now if 𝑝𝑟𝑜𝑏(𝐵|𝐴) = 𝑝𝑟𝑜𝑏(𝐵), we say that B is independent of A. Thus, the joint probability of
two independent events is the product of their individual probabilities.
Example 2.1. Using the data shown in table 2.1, estimate the probability that a peak flow in
excess of 100,000 cfs will occur in 2 consecutive years on the Kentucky River near Salvisa,
Kentucky.
Solution: From table 2.1 it can be seen that a peak flow of 100,000 cfs was exceeded 7 times in
the 99-year record. If it is assumed that the peak flows from year to year are independent, then
the probability of exceeding 100,000 cfs in any one year is approximately 7/99 or 0.0707.
Applying equation 2.9 the probability of exceeding 100,000 cfs in two successive years is found
to be 0.0707 x 0.0707 or 0.0050.
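The arithmetic of example 2.1 can be written as a short computation. The exceedance count of 7 in 99 years is taken from table 2.1; the variable names are our own.

```python
# Example 2.1 as a computation. Seven of the 99 annual peaks in
# table 2.1 exceed 100,000 cfs.
n_years = 99
n_exceed = 7

p_one_year = n_exceed / n_years   # relative frequency estimate, eq. 2.2
p_two_years = p_one_year ** 2     # eq. 2.9 with year-to-year independence

print(round(p_one_year, 4))   # 0.0707
print(round(p_two_years, 4))  # 0.005
```

The independence assumption is what lets the joint probability collapse to a simple product; it is supported here by the near-zero serial correlation of the peak-flow record discussed later in the chapter.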
Example 2.2. A study of daily rainfall at Ashland, Kentucky, has shown that in July the
probability of a rainy day following a rainy day is 0.444, a dry day following a dry day is 0.724,
a rainy day following a dry day is 0.276 and a dry day following a rainy day is 0.556. If it is
observed that a certain July day is rainy, what is the probability that the next two days will also
be rainy?
Solution: Let A be a rainy day 1 and B be a rainy day 2 following the initial rainy day. The
probability of A is 0.444 since this is the probability of a rainy day following a rainy day.

prob(A ∩ B) = prob(A) prob(B|A)

Now the prob(B|A) is also 0.444 since this is the probability of a rainy day following a rainy day.
Therefore

prob(A ∩ B) = 0.444 × 0.444 = 0.197

The probability of 2 rainy days following a dry day would be 0.276 × 0.444 or 0.122.
Note that the probabilities of wet and dry days are dependent on the previous day. Independence
does not exist. It can be shown that over a long period of time, 67% of the days will be dry and
33% will be rainy with the conditional probabilities as stated. If one had assumed independence,
then the probability of two consecutive rainy days would have been 0.33×0.33 =0.1089
regardless of whether the preceding day had been rainy or dry. Since the probability of a rainy
day following a rainy day is much greater than following a dry day, persistence is said to exist.
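The two-state wet/dry description in example 2.2 is a Markov chain, and the 67%/33% long-run split quoted above can be verified from the transition probabilities. A minimal sketch (variable names ours) solving the stationary balance equation:

```python
# Wet/dry Markov chain for July days at Ashland (example 2.2).
p_rain_after_rain = 0.444
p_rain_after_dry = 0.276

# Probability of two more rainy days after an observed rainy day (eq. 2.9):
p_two_wet = p_rain_after_rain * p_rain_after_rain

# The long-run probability of a rainy day, pi, must satisfy
#   pi = pi * p_rain_after_rain + (1 - pi) * p_rain_after_dry
pi_rain = p_rain_after_dry / (1.0 - p_rain_after_rain + p_rain_after_dry)

print(round(p_two_wet, 3))  # 0.197
print(round(pi_rain, 2))    # 0.33, i.e. about 33% rainy days in the long run
```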
If B1, B2, ..., Bn represent a set of mutually exclusive and collectively exhaustive events,
one can determine the probability of another event A from

prob(A) = Σ_{i=1}^{n} prob(A|Bi) prob(Bi) (2.10)

This is called the theorem of total probability. Equation 2.10 is illustrated by figure 2.5.
Example 2.3. It is known that the probability that the solar radiation intensity will reach a
threshold value is 0.25 for rainy days and 0.80 for non-rainy days. It is also known that for this
particular location the probability that a day picked at random will be rainy is 0.36. What is the
probability the threshold intensity of solar radiation will be reached on a day picked at random?
Solution: Let A represent the threshold solar radiation intensity, B1 represent a rainy day and B2
a non-rainy day. From equation 2.10, we know that

prob(A) = prob(A|B1) prob(B1) + prob(A|B2) prob(B2)
= 0.25 × 0.36 + 0.80 × 0.64
= 0.602
B2 B3
B1 B4
A
B6 B5
BAYES THEOREM

Writing the conditional probability of an event Bj given that A has occurred as

prob(Bj|A) = prob(Bj) prob(A|Bj)/prob(A)

and then substituting from equation 2.10 for prob(A), we get what is called Bayes Theorem:

prob(Bj|A) = prob(Bj) prob(A|Bj) / Σ_{i=1}^{n} prob(A|Bi) prob(Bi) (2.11)
As pointed out by Benjamin and Cornell (1970), this simple derivation of Bayes Theorem belies
its importance. It provides a method for incorporating new information with previous or
so-called prior probability assessments to yield new values for the relative likelihood of events of
interest. These new (conditional) probabilities are called posterior probabilities. Equation 2.11
is the basis of Bayesian Decision Theory. Bayes Theorem provides a means of estimating
probabilities of one event by observing a second event. Such an application is illustrated in
example 2.4.
Example 2.4. The manager of a recreational facility has determined that the probability of 1000
or more visitors on any Sunday in July depends upon the maximum temperature for that Sunday
as shown in the following table. The table also gives the probabilities that the maximum
temperature will fall in the indicated ranges. On a certain Sunday in July, the facility has more
than 1000 visitors. What is the probability that the maximum temperature was in the various
temperature classes?

[Table: prob(1000 or more visitors | temperature class) and prob(temperature class) for six
temperature classes, each column totaling 1.000.]
Solution: Let Tj for j = 1, 2, ..., 6 represent the 6 intervals of temperature. Then from equation
2.11
For example

prob(< 60°F | 1000 or more) = (0.05 × 0.05)/0.507 = 0.005

Similar calculations yield the last column in the above table. Note that Σ_{j=1}^{6} prob(Tj | 1000 or
more) is equal to 1.
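Because the table for example 2.4 is not reproduced here, a sketch of equation 2.11 is instead shown with the numbers of example 2.3 (a supplementary computation of our own, not part of the original example): given that the threshold intensity A was reached, what is the probability the day was rainy?

```python
# Bayes theorem (eq. 2.11) applied to the numbers of example 2.3.
p_B = [0.36, 0.64]          # prob(rainy), prob(non-rainy)
p_A_given_B = [0.25, 0.80]  # prob(threshold reached | rainy), ... | non-rainy

# Denominator of eq. 2.11 is the total probability of A (eq. 2.10):
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
posterior = [pb * pa / p_A for pb, pa in zip(p_B, p_A_given_B)]

print(round(p_A, 3))           # 0.602, as in example 2.3
print(round(posterior[0], 3))  # prob(rainy | threshold reached), about 0.15
```

Observing the second event (the threshold being reached) lowers the probability that the day was rainy from the prior 0.36 to a posterior of about 0.15, which is the sense in which equation 2.11 updates probabilities with new information.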
COUNTING
In applying equation 2.1 to determine probabilities, one often encounters situations where
it is impractical to actually enumerate all of the possible ways that an event can occur. To assist
in this matter certain general mathematical formulas have been developed.
If E1, E2, ..., En are mutually exclusive events such that Ei ∩ Ej = ∅ for all i ≠ j, where ∅
represents the empty set, and Ei can occur in ni ways, then the compound event E made up of
outcomes E1, E2, ..., En can occur in n1 n2 ... nn ways.
The problem of sampling or selecting a sample of r items from n items is commonly
encountered. Sampling can be done with replacement, so that the item selected is immediately
returned to the population, or without replacement, so that the item is not returned. The order of
sampling may be important in some situations and not in others. Thus, we may have four types
of samples--ordered with replacement, ordered without replacement, unordered with replacement
and unordered without replacement.
In case of an ordered sample with replacement, the number of ways of selecting item 1 is
n since there are n items in the population. Similarly, item 2 can be selected in n ways since the
first item selected is returned to the population. Thus two items can be selected in 𝑛 × 𝑛 ways.
Extending this argument to r selections, the number of ways of selecting r items from n with
replacement and order being important is simply nr.
If the items are not replaced after being selected, then the first item can be selected in n
ways, the second in n-1 ways and so on until the rth item can be selected in n-r+1 ways. Thus r
ordered items can be selected from n without replacement in n(n-1)(n-2)...(n-r+1) ways. This is
commonly written as (n)r and called the number of permutations of n items taken r at a time.
If order is not important and the sampling is without replacement, the number of possible
samples is the number of combinations of n items taken r at a time, C(n, r) = n!/[(n − r)! r!]. If
order is not important and the sampling is with replacement, it can be shown that the number of
possible samples is

C(n + r − 1, r) = (n + r − 1)!/[(n − 1)! r!]
The number of ways of selecting samples under the four above conditions is summarized in the
following table:
                With replacement        Without replacement
Ordered         n^r                     (n)r = n!/(n − r)!
Unordered       C(n + r − 1, r)         C(n, r) = n!/[(n − r)! r!]
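The four sampling counts above can be evaluated directly with the standard library's combinatoric functions. A minimal sketch (the function and key names are our own):

```python
import math

def sample_counts(n, r):
    """Number of ways of selecting r items from n under the four
    sampling schemes summarized in the table above."""
    return {
        "ordered, with replacement": n ** r,
        "ordered, without replacement": math.perm(n, r),    # (n)_r
        "unordered, without replacement": math.comb(n, r),  # C(n, r)
        "unordered, with replacement": math.comb(n + r - 1, r),
    }

counts = sample_counts(10, 4)
print(counts["ordered, with replacement"])       # 10000
print(counts["unordered, without replacement"])  # 210
```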
Example 2.5. For a particular watershed, records from 10 rain gages are available. Records
from 3 of the gages are known to be bad. If 4 records are selected at random from the 10
records, what is the probability of selecting (a) exactly 1 bad record, (b) 3 bad records, and
(c) at least 1 bad record?
Solution: The total number of ways of selecting 4 records from the 10 available records (order
is not important) is
C(10, 4) = 10!/(6! 4!) = 210
(a) The number of ways of selecting 1 bad record from 3 bad records and 3 good records
from 7 good records is
C(3, 1) C(7, 3) = [3!/(2! 1!)] [7!/(4! 3!)] = 105
Applying equation 2.1 and letting a = 1 bad and 3 good records, the probability of a is 105/210
or 0.500.
(b) The number of ways of selecting 3 bad records and 1 good record is
C(3, 3) C(7, 1) = 1 × 7 = 7

so the probability of 3 bad records is 7/210 or 0.033.
(c) The probability of at least 1 bad record is equal to the probability of 1 or 2 or 3 bad
records. The probability of 1 and 3 bad records is known to be 0.500 and 0.033
respectively. The probability of 2 bad records is
C(3, 2) C(7, 2)/210 = 3(21)/210 = 0.300
Thus, the probability of at least 1 bad record is 0.500 + 0.300 + 0.033 = 0.833. This latter result
could also be determined from the fact that the probability of 0 or 1 or 2 or 3 bad records must
equal one. The probability of at least 1 bad record thus equals
1 − prob(0 bad records) = 1 − C(3, 0) C(7, 4)/210 = 1 − 35/210 = 0.833
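The counting results of example 2.5 can be checked by computing the full distribution of the number of bad records in the sample. A short sketch (names ours):

```python
import math

# Number of bad records among 4 drawn from 10 gages, 3 of which are bad.
total = math.comb(10, 4)  # 210 equally likely samples
p_exactly = {k: math.comb(3, k) * math.comb(7, 4 - k) / total
             for k in range(4)}

print(round(p_exactly[1], 3))      # (a) exactly 1 bad: 0.5
print(round(p_exactly[3], 3))      # (b) 3 bad: 0.033
print(round(1 - p_exactly[0], 3))  # (c) at least 1 bad: 0.833
```

Because the four probabilities cover every possible outcome, they sum to one, which is the check used at the end of the example.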
Example 2.5 (continued). For the situation described in example 2.5, what is the probability of
selecting at least 2 bad records given that at least one of the records selected is bad?

Solution: Since "at least 2 bad" implies "at least 1 bad", prob(at least 2 bad ∩ at least 1 bad) =
prob(at least 2 bad out of 4), and from equation 2.8

prob(at least 2 bad out of 4 | at least 1 bad) = prob(at least 2 bad out of 4)/prob(at least 1 bad in 4)

= (0.300 + 0.033)/0.833 = 0.400
GRAPHICAL PRESENTATION
Hydrologists are often faced with large quantities of data. Since it is difficult to grasp the
total data picture from tabulations such as table 2.1, a useful first step in data analysis is to use
graphical procedures. Throughout this book various graphical presentations will be illustrated.
Helsel and Hirsch (1992) contains a very good treatment of graphical presentations of hydrologic
data. The appropriate graphical technique depends on the purpose of the analysis. As various
analytic procedures are discussed throughout this book, graphical techniques that supplement the
procedures will be illustrated. Undoubtedly the most common graphical representation of
hydrologic data is a time series plot of magnitude versus time. Figure 1.2 is such a plot for the
peak flow data for the Kentucky River. A plot such as this is useful for detecting obvious trends
in the data in terms of means and variances or serial dependence of data values next to each other
in time. Figure 1.2 reveals no obvious trends in the mean annual peak flow or variances of
annual peak flows. It also shows no strong dependence of peak flow in one year on the peak
flow of the previous year. We previously indicated that figure 1.1 showing the water level in the
Great Salt Lake of Utah indicated an apparent year-to-year relationship. Such a relationship is
reflected in the serial correlation coefficient. Serial correlation will be discussed in detail later in
the book. For now suffice it to say that the serial correlation for the annual lake level for the
Great Salt Lake is 0.969 and for the annual peak flows on the Kentucky River is -0.067. This
indicates a very strong year-to-year correlation for the Salt Lake data and insignificant
correlation for the peak flow on the Kentucky River. Serial correlation indicates dependence
from one observation to the next. It also indicates persistence. In the Salt Lake data, lake levels
tend to change slowly with high levels following high levels. In the Kentucky River data, no
such pattern is evident.
The selection of the class interval and the location of the first class mark can appreciably
affect the appearance of a frequency histogram. The appropriate width for a class interval
depends upon the range of the data, the number of observations, and the behavior of the data.
Several suggestions have been put forth for forming frequency histograms. Spiegel (1961)
suggests that there should be 5 to 20 classes. Steel and Torrie (1960) state that the class interval
should not exceed one-fourth to one-half of the standard deviation of the data. Sturges (1926)
recommends that the number of classes be determined from

m = 1 + 3.3 log n

where m is the number of classes, n is the number of data values, and the logarithm to the base
10 is used.
Whatever criterion is used, it should be kept in mind that sensitivity is lost if too few or too
many classes are used. Too few classes will eliminate detail and obscure the basic pattern of the
data. Too many classes result in erratic patterns of alternating high and low frequencies. Figure
2.7 represents the Kentucky River data with too many class intervals. If possible, the class
intervals and class marks should be round figures. This is not a computational or theoretical
consideration, but one aimed at making it easier for those viewing the histogram to grasp its full
meaning.
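Sturges' rule is easy to apply in code; the sketch below (function name ours) evaluates it for the 99-year Kentucky River record of table 2.1.

```python
import math

def sturges_classes(n):
    """Number of histogram classes suggested by Sturges' rule,
    m = 1 + 3.3 * log10(n), rounded to the nearest whole class."""
    return round(1 + 3.3 * math.log10(n))

# For the 99 annual peaks of table 2.1:
print(sturges_classes(99))  # 8
```

Eight classes falls comfortably inside Spiegel's suggested range of 5 to 20, illustrating that the various rules of thumb tend to agree for samples of this size.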
Fig. 2.6. Frequency histogram for the Kentucky River peak flow data.
Fig. 2.7. Frequency histogram for the Kentucky River data with too many classes.
[Figure: "less than" and "more than" cumulative frequency curves for the Kentucky River peak flow data.]
RANDOM VARIABLES
Random variables may be discrete or continuous. If the set of values a random variable
can assume is finite (or countably infinite), the random variable is said to be a discrete random
variable. If the random variable can take on any value within one or more intervals of the real
line, it is said to be a continuous random variable. An example of a discrete random variable would be
the number of rainy days experienced at a particular location over a period of 1 year. The
amount of rain received over the year would be a continuous random variable. For the most part
in this text, capital letters will be used to denote random variables and the corresponding lower
case letter will represent values of the random variable.
It is important to note that any function of a random variable is also a random variable.
That is, if X is a random variable, then Z = g(X) is a random variable as well. This follows from
the fact that if X is uncertain, then any function of X must be uncertain as well. Physically, this
means that any hydrologic quantity that depends on a random variable is itself a random
variable. If runoff is a random variable, then erosion and sediment delivery to a stream are
random variables. If the sediment has adsorbed chemicals, then water quality is a random variable.
Example 2.6. Nearly every hydrologic variable can be taken as a random variable. Rainfall for
any duration, streamflow, soil hydraulic properties, time between hydrologic events such as
flows above a certain base or daily rainfalls in excess of some amount, the number of times a
streamflow rate exceeds a given base over a period of a year and daily pan evaporation are all
random variables. Quantities derived from random hydrologic variables are also random
variables. The storage required in a water supply reservoir to meet a given demand is a function
of the demand and the inflow to the reservoir. Since reservoir inflow is a random variable,
required storage is also a random variable. As a matter of fact, the demand that is placed on the
reservoir would be a random variable as well. The velocity of flow through a porous media is a
function of the hydraulic conductivity and hydraulic gradient which are both random variables.
Therefore the flow velocity is a random variable also.
If the random variable X can only take on values x1, x2, ..., xn with probabilities fX(x1),
fX(x2), ..., fX(xn) and ΣifX (xi) = 1, then X is a discrete random variable. With a discrete random
variable there are “lumps” or spikes of probability associated with the values that the random
variable can assume. Figure 2.9 is a typical plot of the distribution of probability associated with
the values that a discrete random variable can assume. This information would constitute a
probability distribution function (pdf) for a discrete random variable.
The cumulative probability distribution (cdf), FX(xk), for this discrete random variable is
shown in figure 2.10; that is, FX(xk) = Σ_{i=1}^{k} fX(xi). The cumulative distribution represents the
probability that X is less than or equal to xk.
The cumulative distribution has jumps in it at each xi equal in magnitude to fX(xi) or the
probability that X = xi. The probability that X = xi can be determined from
𝑓𝑋 (𝑥𝑖 ) = 𝐹𝑋 (𝑥𝑖 ) − 𝐹𝑋 (𝑥𝑖−1 ) (2.16)
Figure 2.9. A discrete probability distribution function.
Figure 2.10. A discrete cumulative distribution function.
The notation fX(x) and FX(x) denote the pdf and cdf of the discrete random variable X
evaluated at X = x.
Often, continuous data are treated as though they were discrete. Looking again at the
Kentucky River peak flow data, we can define the event A as having a peak flow in the ith class.
Letting ni be the number of observed peak flows in the ith interval and n be the total number of
observed peak flows, the probability that a peak flow is in the ith class is given by

prob(A) = ni/n = fXi (2.15)
Thus, the relative frequency, fXi, can be interpreted as a probability estimate, the frequency
histogram can be interpreted as an approximation for a pdf, and the cumulative frequency can be
interpreted as an approximation for a cdf.
The pdf, pX(x), and cdf, PX(x), of a continuous random variable X are related by

pX(x) = dPX(x)/dx

or

PX(x) = ∫_{−∞}^{x} pX(t) dt (2.17)
The notation pX(x) and PX(x) denote the pdf and cdf, respectively, of the continuous
random variable X evaluated at X = x. Thus pY (a) represents the pdf of the random variable Y
evaluated at Y = a. PY(a) represents the cdf of the random variable Y and gives prob(Y ≤ a).
A function, pX(x), defined on the real line can be a pdf if and only if

1. pX(x) ≥ 0 for all x
2. ∫R pX(x) dx = 1
By definition pX(x) is zero for X outside R. Also PX (xl) = 0 and PX (xu) = 1 where xl and
xu are the lower and upper limits of X in R. For many distributions these limits are -∞ to ∞ or 0
to ∞. It is also apparent that the probability that X takes on a value between a and b is given by
prob(a ≤ X ≤ b) = ∫_a^b pX(t) dt (2.22)
The prob(a≤X≤b) is the area under the pdf between a and b. The probability that a random
variable takes on any particular value from a continuous distribution is zero. This can be seen
from
prob(X = d) = ∫_d^d pX(t) dt = PX(d) − PX(d) = 0 (2.22)
Because the probability that a continuous random variable takes on a specified value is zero, the
expressions prob(a≤X≤b), prob(a<X≤b), prob(a≤X<b) and prob(a<X<b) are all equivalent. It is
also apparent that PX(x) can be interpreted as the probability that X is strictly less than x since
prob(X=x) = 0.
Figures 2.11 and 2.12 illustrate a possible pdf and its corresponding cdf. In addition to
density functions that are symmetrical and bell-shaped, densities may take on a number of
different shapes including distributions that are skewed to the right or left, rectangular,
triangular, exponential, “J”-shaped, and reverse “J”-shaped (Fig.2.13).
[Fig. 2.11. A continuous probability density function pX(x); the areas A1 = prob(X < a), A2 = prob(a < X < b), and A3 = prob(X > b) sum to the total area of 1.]
[Fig. 2.12. The corresponding cumulative distribution PX(x), showing prob(X < a), prob(a < X < b), and prob(X > b).]
At this point a cautionary note is added to the effect that the probability density function,
pX(x), is not a probability and can have values exceeding one. The cumulative probability
distribution, PX(x), is a probability [prob(X ≤ x) = PX(x)] and must have values ranging from 0 to
1. Of course, pX(x) and PX(x) are related as indicated by equation 2.20 and knowledge of one
specifies the other.
Example 2.7. Evaluate the constant a for the following expression to be considered a
probability density function:
𝑝𝑋 (𝑥) = 𝑎𝑥 2 , 0≤𝑋≤5
What is the probability that a value selected at random from this distribution will
a) Be less than 2?
b) Fall between 1 and 3?
c) Be larger than 4?
d) Be larger than or equal to 4?
e) Exceed 6?
Solution: For pX(x) to be a pdf, the area under it must equal 1:

∫_0^5 a t² dt = a t³/3 |_0^5 = 125a/3 = 1

a = 3/125

So

pX(x) = 3x²/125

and

PX(x) = x³/125 for 0 ≤ X ≤ 5
a) prob(X ≤ 2) = PX(2) = 8/125
b) prob(1 ≤ X ≤ 3) = PX(3) − PX(1) = 26/125
c) prob(X > 4) = 1 − PX(4) = 1 − 64/125 = 61/125
d) prob(X ≥ 4) = 61/125 since prob(X = 4) = 0
e) prob(X > 6) = 1 − PX(6) = 1 − 1 = 0, since PX(x) = 1 for all x ≥ 5
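The analytic answers of example 2.7 can be checked numerically. The sketch below (function names ours) uses a simple midpoint-rule integration of the pdf, which is one way to see that probabilities for a continuous random variable are areas under pX(x).

```python
# Numerical check of example 2.7: pdf p(x) = 3x^2/125 on [0, 5],
# for which the cdf is P(x) = x^3/125.
def pdf(x):
    return 3.0 * x * x / 125.0

def integrate(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(round(integrate(pdf, 0, 5), 4))  # 1.0: total area under the pdf
print(round(integrate(pdf, 0, 2), 4))  # prob(X < 2)   = 8/125  = 0.064
print(round(integrate(pdf, 1, 3), 4))  # prob(1<X<3)   = 26/125 = 0.208
print(round(integrate(pdf, 4, 5), 4))  # prob(X > 4)   = 61/125 = 0.488
```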
Some random variables are partly discrete and partly continuous, so that the cdf is
piecewise continuous with

PX(x) = P1(x) for X < d
      = P2(x) for X ≥ d

where P2(d) > P1(d), P1(xl) = 0, P2(xu) = 1 and P1(x) and P2(x) are nondecreasing functions of X.
Figure 2.14 is a plot of such a distribution. For this situation the prob(X = d) equals the
magnitude of the jump ΔP at X = d or is equal to P2(d) − P1(d). Any finite number of
discontinuities of this type is possible.
Figure 2.14. A possible piecewise continuous pdf and cdf for the case prob(X = d) ≠ 0.
An example of a distribution as shown in figure 2.14 is the distribution of daily rainfall
amounts. The probability that no rainfall is received, prob(X=0), is finite, whereas the
probability distribution of rain on rainy days would form a continuous distribution. A second
example would be the probability distribution of the water level in some reservoir. The water
level may be maintained at a constant level d as much as possible but may fluctuate below or
above d at times. The distribution shown in figure 2.14 could represent this situation.
In N independent trials of the experiment, the expected number of outcomes in the interval a to b
would be

N prob(a ≤ X ≤ b) = N ∫_a^b pX(x) dx

The expected relative frequency, fXi, of outcomes in a class interval of width Δxi centered on xi
is then

fXi = ∫_{xi−Δxi/2}^{xi+Δxi/2} pX(x) dx

Because the right-hand side of this equation represents the area under pX(x) between xi −
Δxi/2 and xi + Δxi/2, it can be approximated by

fXi ≈ pX(xi) Δxi (2.25)

Equation 2.25 can be used to determine the expected relative frequency of repeated,
independent outcomes of a random experiment whose outcome is a value of the random variable
X.
Example 2.8. Plot the expected frequency histogram using the probability density function of
example 2.8 and a class interval of 1/2.
Solution:
fXi = pX(xi) Δxi = pX(xi)/2 = 3xi²/250
xi      pX(xi)    fXi
0.25    0.0015    0.00075
0.75    0.0135    0.00675
1.25    0.0375    0.01875
1.75    0.0735    0.03675
2.25    0.1215    0.06075
2.75    0.1815    0.09075
3.25    0.2535    0.12675
3.75    0.3375    0.16875
4.25    0.4335    0.21675
4.75    0.5415    0.27075
Sum               0.99750
[Histogram: expected relative frequency f(x) plotted against the class midpoints 0.25 through 4.75, rising from near 0 to about 0.27.]
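The table above can be regenerated in a few lines. This sketch (Python used for illustration only) recomputes fXi = pX(xi)Δxi with Δx = 1/2.

```python
# Recompute the expected-frequency table above: f_Xi = pX(xi) * dx with
# pX(x) = 3x^2/125 and a class interval dx = 1/2.
dx = 0.5
xs = [0.25 + dx * i for i in range(10)]       # class midpoints 0.25 ... 4.75
f = [(3 * x ** 2 / 125) * dx for x in xs]     # expected relative frequencies
for x, fi in zip(xs, f):
    print(f"{x:4.2f}  {fi:.5f}")
print("sum =", round(sum(f), 4))              # → sum = 0.9975
```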
BIVARIATE DISTRIBUTIONS
The situation frequently arises where one is interested in the simultaneous behavior of
two or more random variables. An example might be the flow rates on two streams near their
confluence. One might like to know the probability of both streams having peak flows
exceeding a given value. A second example might be the probability of a rainfall exceeding 2.5
inches at the same time the soil is nearly saturated. Rainfall depth and soil water content would
be two random variables.
Example 2.10. The magnitude of peak flows from small watersheds is often estimated from the
“Rational Equation” given by Q=CIA where Q is the estimated flow, C is a coefficient, I is a
rainfall intensity, and A is the watershed area. The assumption is made that the return period of
flow will be the same as the return period of the rainfall that is used. To verify this assumption it
is necessary to study the joint probabilities of the two random variables Q and I.
If X and Y are continuous random variables, their joint probability density function is
pX,Y(x, y) and the corresponding cumulative probability distribution is PX,Y(x,y). These two are
related by
pX,Y(x, y) = ∂²PX,Y(x, y)/∂x ∂y      (2.26)

and

PX,Y(x, y) = prob(X ≤ x and Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y pX,Y(t, s) ds dt      (2.27)
The corresponding relationships for X and Y being discrete random variables are
fX,Y(xi, yj) = FX,Y(xi, yj) − FX,Y(xi, yj−1) − FX,Y(xi−1, yj) + FX,Y(xi−1, yj−1)      (2.28)
3) pX,Y (x, y) ≥ 0
4) PX,Y (∞, ∞) = 1
MARGINAL DISTRIBUTIONS
If one is interested in the behavior of one of a pair of random variables regardless of the
value of the second random variable, the marginal distribution may be used. For instance the
marginal density of X, pX (x), is obtained by integrating pX,Y (x, y) over all possible values of Y.
pX(x) = ∫_{−∞}^∞ pX,Y(x, s) ds      (2.31)

The marginal cumulative distribution of X is

PX(x) = PX,Y(x, ∞)      (2.32a)
      = prob(X ≤ x)      (2.32b)
      = ∫_{−∞}^x ∫_{−∞}^∞ pX,Y(t, s) ds dt      (2.32c)
      = ∫_{−∞}^x pX(t) dt      (2.32d)
and
PY(y) = ∫_{−∞}^y pY(s) ds      (2.33)
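A marginal density can be checked by direct numerical integration of the joint density. The sketch below (an illustration, not part of the text) borrows the joint density pX,Y(x, y) = (5 − y/2 − x)/14 on 0 < x < 2, 0 < y < 2 from Example 2.11 later in the chapter and integrates out Y.

```python
# Sketch of the marginal-density relation pX(x) = ∫ pX,Y(x, s) ds, by
# numeric integration (joint pdf borrowed from Example 2.11 as an example).
def joint(x, y):
    return (5 - y / 2 - x) / 14.0

def marginal_x(x, n=2000):
    """pX(x) = integral of pX,Y(x, s) over 0 < s < 2, by the midpoint rule."""
    h = 2.0 / n
    return sum(joint(x, (j + 0.5) * h) for j in range(n)) * h

# Closed form for comparison: pX(x) = (9 - 2x)/14, so pX(1) = 0.5.
print(marginal_x(1.0), (9 - 2 * 1.0) / 14)
```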
CONDITIONAL DISTRIBUTIONS
A marginal distribution is the distribution of one variable regardless of the value of the
second variable. The distribution of one variable with restrictions or conditions placed on the
second variable is called a conditional distribution. Such a distribution might be the distribution
of X given that Y equals y0 or the distribution of Y given that x1 ≤ X ≤ x2.
If R is a region of values of Y, the conditional density of X given that Y is in R is

pX|Y(x | Y is in R) = [∫_R pX,Y(x, s) ds] / [∫_R pY(s) ds]      (2.37)
for X and Y continuous. Similarly the conditional distribution of (X|Y is in R) for X and Y
discrete is
fX|Y(xi | Y is in R) = [Σ_{yj in R} fX,Y(xi, yj)] / [Σ_{yj in R} fY(yj)]      (2.38)
The determination of conditional probabilities from equations 2.37 and 2.38 is done in the usual way.
prob(X is in S | Y is in R) = Σ_{xi in S} fX|Y(xi | Y is in R)      (2.39)
pX|Y(x | Y = y0) = pX,Y(x, y0)/pY(y0)      (2.40)
The proof of this may be found in Neuts (1973). In most statistics books pX|Y (x|Y = y0) is
simply written as
pX|Y(x | y) = pX,Y(x, y)/pY(y)      (2.41)
All of the above results are symmetrical with respect to X and Y. For example
pY|X(y | X is in R) = [∫_R pX,Y(t, y) dt] / [∫_R pX(t) dt]      (2.42)
If the region R of equation 2.37 is the entire region of definition with respect to Y, then
∫_R pY(s) ds = 1

and

∫_R pX,Y(x, s) ds = pX(x)

so that

pX|Y(x | Y is in R) = pX(x)
This results from the fact that the condition that Y is in R when R encompasses the entire region
of definition of Y is really no restriction but simply a condition stating that Y may take on any
value in its range. In this case pX|Y (x|Y is in R) is identical to the marginal density of X.
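Equation 2.37 can be exercised numerically. The sketch below (illustrative; the joint density is again borrowed from Example 2.11, and R = (1, 2) is an arbitrary choice) builds pX|Y(x | Y in R) by midpoint-rule integration and confirms that it integrates to 1 over the range of X, as any conditional density must.

```python
# Numeric sketch of the conditional density pX|Y(x | Y in R) (equation 2.37)
# for R = (1, 2), using the illustrative joint density
# pX,Y(x, y) = (5 - y/2 - x)/14 on 0 < x < 2, 0 < y < 2.
def joint(x, y):
    return (5 - y / 2 - x) / 14.0

def integrate(f, a, b, n=1000):
    """Midpoint-rule approximation to the integral of f over (a, b)."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# Denominator: integral of pY(s) over R, where pY(s) is itself an integral.
den = integrate(lambda s: integrate(lambda x: joint(x, s), 0, 2), 1, 2, n=200)

def cond(x):
    """pX|Y(x | 1 < Y < 2) per equation 2.37."""
    return integrate(lambda s: joint(x, s), 1, 2) / den

total = integrate(cond, 0, 2)   # a conditional density must integrate to 1
print(round(total, 4))          # → 1.0
```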
47
INDEPENDENCE
From equation 2.37 or 2.38 it can be seen that in general the conditional density of X
given Y is a function of y. If the random variables X and Y are independent, this functional
relationship disappears (i.e. pX|Y (x|Y) is not a function of y). In fact in this case
p X |Y ( x | y ) = p X ( x )
or
f X |Y (xi | y j ) = f X ( xi )
or the conditional density equals the marginal density. Furthermore, if X and Y are independent
(continuous or discrete) random variables, their joint density is equal to the product of their
marginal densities.
pX,Y(x, y) = pX(x) pY(y)      (2.45a)

or

fX,Y(xi, yj) = fX(xi) fY(yj)      (2.45b)
The random variables X and Y are independent in the probabilistic sense (stochastically
independent) if and only if their joint density is equal to the product of their marginal densities.
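The factorization test is easy to apply in practice. As an illustrative sketch (the joint density here is the one used in Example 2.11 later in the chapter; the marginals below are its closed-form integrals), comparing the joint density with the product of the marginals at a single point is enough to show dependence.

```python
# Factorization check for independence: if X and Y were independent, the
# joint density would equal the product of its marginals everywhere.
def joint(x, y):
    return (5 - y / 2 - x) / 14.0   # pX,Y(x, y) on 0 < x < 2, 0 < y < 2

def px(x):
    return (9 - 2 * x) / 14.0       # marginal of X: integral of joint over y

def py(y):
    return (8 - y) / 14.0           # marginal of Y: integral of joint over x

x, y = 0.5, 1.5
print(joint(x, y), px(x) * py(y))   # unequal, so X and Y are dependent
```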
DERIVED DISTRIBUTIONS
Situations often arise where the joint probability distribution of a set of random variables
is known and the distribution of some function or transformation of these variables is desired.
For example the joint probability distribution of the flows in two tributaries of a stream may be
known while the item of interest may be the sum of the flows in the two tributaries. Some
commonly used transformations are translation and/or rotation of axes, logarithmic
transformations, nth root transformations for n equal 2 and 3 and certain trigonometric
transformations.
Thomas (1971) presents the developments that lead to the results presented here
concerning transformations and derived distributions for continuous random variables. The
procedure for discrete random variables is simply an accounting procedure.
Example 2.9. Let X have the distribution function

fX(x) = c/x      for x = 2, 3, 4, 5
      = 0        elsewhere

Let Y = X² − 7X + 12. The probability distribution and possible values of Y can be determined from the following table.

x        2     3     4     5
y        2     0     0     2
fX(x)    c/2   c/3   c/4   c/5

The value for c can be evaluated from the requirement that the probabilities sum to 1, so that c(1/2 + 1/3 + 1/4 + 1/5) = 1 and c = 60/77. Thus Y takes on the value 0 with probability c/3 + c/4 = 5/11 and the value 2 with probability c/2 + c/5 = 6/11.
The probability density function of U, where

U = u(X)      (2.46)

is a monotonic function (u(X) is monotonically increasing if u(x2) ≥ u(x1) for x2 > x1 and monotonically decreasing if u(x2) ≤ u(x1) for x2 > x1), can be found from

pU(u) = pX(x) |dx/du|      (2.47)
Example 2.10. Find the probability of 0<U<10 if U = X2 and X is a continuous random variable
with
pX(x) = 3x²/125      0 < X < 5
Solution:
pU(u) = pX(x) |dx/du|

Since U = X², dx/du = 1/(2x) = 1/(2√u) and

pU(u) = (3x²/125)(1/(2√u)) = 3x²/(250√u)

or, since x² = u,

pU(u) = 3√u/250      0 < U < 25

A check that pU(u) is a probability density can be made by integrating pU(u) from 0 to 25:

∫_0^25 pU(u) du = ∫_0^25 (3√u/250) du = 1

Now

prob(0 < U < 10) = ∫_0^10 (3√u/250) du = 10^(3/2)/125

As a check, prob(0 < U < 10) = prob(0 < X < √10) and

prob(0 < X < √10) = ∫_0^√10 (3x²/125) dx = 10^(3/2)/125
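The derived distribution can also be verified by simulation. The sketch below (illustrative; the inverse-transform sampling scheme is an assumption, not from the text) draws samples of X from pX(x) = 3x²/125 using X = 5W^(1/3) with W uniform on (0, 1), and estimates prob(0 < X² < 10).

```python
import random

# Monte Carlo check of the example above.  X has cdf x^3/125 on (0, 5), so
# the inverse transform X = 5 * W^(1/3), W uniform on (0, 1), samples X.
random.seed(1)
n = 200_000
count = sum(1 for _ in range(n) if (5 * random.random() ** (1 / 3)) ** 2 < 10)
exact = 10 ** 1.5 / 125            # prob(0 < U < 10) = 10^(3/2)/125
print(round(count / n, 3), round(exact, 3))
```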
In the case of a continuous bivariate density, the transformation from pX,Y (x,y) to pU, V
(u, v) where U = u(X,Y) and V = v(X,Y) are one-to-one continuously differentiable
transformations can be made by the relationship
pU,V(u, v) = pX,Y(x, y) |J(x,y / u,v)|      (2.48)

where J(x,y / u,v) is the Jacobian of the transformation computed as the determinant of the matrix of partial derivatives

J(x,y / u,v) = | ∂x/∂u  ∂x/∂v |
               | ∂y/∂u  ∂y/∂v |      (2.49)
The limits on U and V must be determined from the individual problem at hand.
Example 2.11. Given that 𝑝𝑋,𝑌 (𝑥, 𝑦) = (5 − 𝑦/2 − 𝑥)/14 for 0<X<2 and 0<Y<2. If U = X
+ Y and V = Y/2, what is the joint probability density function for U and V? What are the
proper limits on U and V?
Solution:
pX,Y(x, y) = (5 − y/2 − x)/14 = (5 + y/2 − (x + y))/14

U = X + Y        V = Y/2

so that x = u − 2v, y = 2v, and

pU,V(u, v) = pX,Y(x, y) |J(x,y / u,v)|

The Jacobian can be computed directly,

J(x,y / u,v) = | ∂x/∂u  ∂x/∂v | = | 1  −2 | = 2
               | ∂y/∂u  ∂y/∂v |   | 0   2 |

or from the inverse transformation,

J(u,v / x,y) = | ∂u/∂x  ∂u/∂y | = | 1    1  | = 1/2
               | ∂v/∂x  ∂v/∂y |   | 0   1/2 |

so that J(x,y / u,v) = 1/(1/2) = 2. Substituting y/2 = v and x + y = u,

pU,V(u, v) = 2(5 + v − u)/14 = (5 + v − u)/7

The limits 0 < X < 2 and 0 < Y < 2 transform to 0 < v < 1 and 2v < u < 2v + 2. As a check, pU,V(u, v) should integrate to 1 over this region:

(1/7) ∫_0^1 ∫_{2v}^{2v+2} (5 + v − u) du dv = (1/7) ∫_0^1 [(5 + v)u − u²/2]_{2v}^{2v+2} dv
                                            = (1/7) ∫_0^1 (8 − 2v) dv = (8v − v²)/7 |_0^1 = 1
Therefore pU,V(u, v) over the above defined region is indeed a probability density function.
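The same check can be done numerically over the transformed region. The sketch below (illustrative, not part of the text) integrates pU,V(u, v) = (5 + v − u)/7 over 0 < v < 1, 2v < u < 2v + 2 with the midpoint rule.

```python
# Numeric check that the derived density of Example 2.11 integrates to 1
# over its region 0 < v < 1, 2v < u < 2v + 2.
def p_uv(u, v):
    return (5 + v - u) / 7.0

n = 400
hv = 1.0 / n                        # step in v over (0, 1)
hu = 2.0 / n                        # step in u over an interval of width 2
total = 0.0
for i in range(n):
    v = (i + 0.5) * hv
    for j in range(n):
        u = 2 * v + (j + 0.5) * hu  # u runs from 2v to 2v + 2
        total += p_uv(u, v) * hu * hv
print(round(total, 4))              # → 1.0
```

The midpoint rule is exact here because the integrand is linear in both u and v over each cell.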
Other special cases of bivariate transformations involve the sums, products and quotients of random variables. If the joint density of X and Y is pX,Y(x, y) for X > 0 and Y > 0, the following result is obtained for the quotient:

Function      pdf
U = X/Y       pU(u) = ∫ (x/u²) pX,Y(x, x/u) dx      (2.40)
In some cases the function U = u(X) may be such that it is difficult to analytically determine the distribution of U from the distribution of X. In this case it may be possible to generate a large sample of X's (chapter 13), calculate the corresponding U's and then fit a probability distribution to the U's (chapter 6). It should be noted, however, that this empirical method will not in general satisfy equations 2.47 or 2.48.
MIXED DISTRIBUTIONS
If pi(x) for i = 1, 2, ..., m represent probability density functions and λi for i = 1, 2, ..., m represent parameters satisfying λi ≥ 0 and Σ_{i=1}^m λi = 1, then

pX(x) = Σ_{i=1}^m λi pi(x)      (2.54)

is also a probability density function, and its cumulative distribution is

PX(x) = ∫_{−∞}^x Σ_{i=1}^m λi pi(t) dt      (2.55)
      = Σ_{i=1}^m λi ∫_{−∞}^x pi(t) dt
Mixed distributions in hydrology may be applicable in situations where more than one
distinct cause for an event may exist. For example flood peaks from convective storms might be
described by p1(x) and from hurricane storms by p2(x). If λ1 is the proportion of flood peaks
generated by convective storms and λ2 = (1 - λ1), is the proportion generated by hurricane storms,
then equations 2.54 and 2.55 would describe the probability distribution of flood peaks.
Singh (1974), Hawkins (1974), Singh (1987a, 1987b), Hirschboeck (1987), and Diehl and
Potter (1987) discuss procedures for applying mixed distributions in the form of equation 2.54 to
flood frequency determinations. Two general approaches are used. One is to allow the data and
statistical estimation procedures to determine the mixing parameter, λ, and the parameters of the
distributions, pi(x). The other is to use physical information on the actual events to classify them
and thus determine λ. Once classified, the two sets of data can independently be used to
determine the parameters of the pdfs.
Example 2.12. A certain event has probability 0.3 of being from the distribution p1(x) = e^(−x), x > 0. The event may also be from the distribution p2(x) = 2e^(−2x), x > 0. What is the probability that a random observation will be less than 1?

Solution:

prob(X < 1) = 0.3 ∫_0^1 e^(−x) dx + 0.7 ∫_0^1 2e^(−2x) dx
            = 0.3(1 − e^(−1)) + 0.7(1 − e^(−2))
            = 0.795
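The mixture calculation can be confirmed directly (an illustrative check, not part of the original text):

```python
import math

# Check of Example 2.12: pX(x) = 0.3 e^(-x) + 0.7 * 2 e^(-2x) for x > 0,
# so prob(X < 1) = 0.3 (1 - e^(-1)) + 0.7 (1 - e^(-2)).
prob = 0.3 * (1 - math.exp(-1)) + 0.7 * (1 - math.exp(-2))
print(round(prob, 3))   # → 0.795
```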
EXERCISES
2.1 (a) Construct the theoretical relative frequency histogram for the sum of values obtained in
tossing two dice.
(b) Toss two dice 100 times and tabulate the frequency of occurrence of the sums of the two
dice. Plot the results on the histogram of part a.
(c) Why do the results of part b not equal the theoretical results of part a? What possible kinds
of errors are involved? Which kind of error was the largest in your case?
2.2 Select a set of data consisting of 50 or more observations. Construct a relative frequency
plot using at least two different groupings of the data. Which of the two groupings do you
prefer? Why?
2.3 In a period of one week, 3 rainy days were observed. If the occurrence of a rainy day is an
independent event, how many ways could the sequence consisting of 4 dry and 3 wet days be
arranged?
2.4 If the occurrence of a rainy day is an independent event with probability equal to 0.3, what is
the probability of
(c) 3 rainy days in a row during any week with the other 4 days dry?
2.5 Consider a coin with the probability of a head equal to p and the probability of a tail equal to
q=1-p.
(a) What is the probability of the sequence HHTHTTH in 7 flips of the coin?
(b) What is the probability of a specified sequence resulting in r H's and s T's?
(d) What is the probability of r H’s and s T’s without regard to the order of the sequence?
2.6 The distribution given by fx(x) = 1/N for X = 1,2,3,...,N is known as the discrete uniform
distribution. In the following consider N≥5.
(a) What is the probability that a random value from fX(x) will be equal to 5?
(b) What is the probability that a random value from fX(x) will be between 3 and 5 inclusive?
(c) What is the probability that in a random sample of 3 values from fX(x) all will be less than 5?
(d) What is the probability that the 3 random values from fX(x) will all be less than 5 given that 1
of the values is less than 5?
(e) If 2 random values are selected from fX(x), what is the probability that one will be less than 5
and the other greater than 5?
(f) For what x from fX(x) is prob(X ≤ x) = 0.5?
2.7 Consider the continuous probability density function pX(x) = a sin2mx for 0 < X < π.
2.8 Consider the continuous probability density function given by pX(x) = 0.25 for 0<X<a.
(a) What is a?
2.9 Let pX(x) = 0.25 for 0<X<a as in exercise 2.8. What is the distribution of Y = ln X? Sketch
pY(y).
2.10 Many probability distributions can be defined simply by consulting a table of definite
∞
integrals. For example ∫0 𝑥 𝑛−1 𝑒 −𝑥 𝑑𝑥 is equal to Γ(n) where Γ(n) is defined as the gamma
function (see chapter 6). Therefore one can define pX(x)=xn-1 e-x/Γ(n) to be a probability density
function for n>0 and 0<X<∞. This distribution is known as the 1-parameter gamma distribution.
Using a table of definite integrals, define several possible continuous probability distributions.
Give the appropriate range on X and any parameters.
2.11 The annual inflow into a reservoir (acre-feet) follows a probability density given by pX(x) =
1/(β1 − α1) for α1 < x < β1. The total annual outflow in acre-feet follows a probability distribution
given by pY(y) = 1/(β2 − α2) for α2 < y < β2. Consider that β1 > β2 and α1 < α2.
(a) Calculate the expression for the probability distribution of the annual change in storage.
(b) Plot the probability distribution of the annual change in storage.
(c) If β1 = 100,000, α1 = 20,000, β2 = 70,000 and α2 = 50,000, what is the probability that the
change in storage will be (i) negative and (ii) greater than 15,000 acre-feet?
2.12 The probability of receiving more than 1 inch of rain in each month is given in the
following table. If a monthly rainfall record selected at random is found to have more than 1
inch of rain, what is the probability the record is for July? April?
2.13 It is known that the discharge from a certain plant has a probability of 0.001 of containing a
fish killing pollutant. An instrument used to monitor the discharge will indicate the presence of
the pollutant with probability 0.999 if the pollutant is present and with probability 0.01 if the
pollutant is not present. If the instrument indicates the presence of the pollutant, what is the
probability that the pollutant is really present?
2.14 A potential purchaser of a ferry across a river knows that if a flow of 100,000 cfs or more
occurs, the ferry will be washed downstream, over a low dam and destroyed. He knows that the
probability of a flow of this kind in any year is 0.05. He also knows that for each year that the
ferry operates a net profit of $10,000 is realized. The purchase price of the ferry is $50,000.
Sketch the probability distribution of the potential net profit over a period of years neglecting
interest rates and other complications. Assume that if a flow of 100,000 cfs or more occurs in a
year, the profit for that year is zero.
2.15 Assume that the probability density function of daily rainfall is given by
pX(x) = λ1/2      0 < X < 2
(d) In a random sample from pX(x), 60% of the values were between 0 and 2. What would be an
estimate for the value of λ1?