
5. Normal Distribution

The most widely used and most important continuous probability distribution is the
Gaussian, or normal, distribution. It has been widely used because of its early connection with
the "Theory of Errors" and because it has certain useful mathematical properties. Many
statistical techniques, such as analysis of variance and the testing of certain hypotheses, rely on
the assumption of normality. The errors involved in incorrectly assuming normality (purposely
or unknowingly) depend on the use under consideration. Many statistical methods derived under
the assumption of normality remain approximately valid under moderate departures from
normality and are therefore said to be robust.

The very name "normal" distribution is misleading in that it implies that random
variables that are not normally distributed are abnormal in some sense. The Central Limit
Theorem indicates the conditions under which a random variable can be expected to be normally
distributed. In a strict theoretical sense, most hydrologic variables cannot be normally distributed
because the range on any random variable that is normally distributed is the entire real line (-∞ to
+∞). Thus non-negative variables such as rainfall, streamflow, reservoir storage, and so on,
cannot be strictly normally distributed. However, if the mean of a random variable is 3 or 4 times
its standard deviation, the probability that such a normal random variable is less than zero is very
small and can in many cases be neglected.
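This claim is easy to check numerically. The sketch below (standard library only; the helper name `normal_cdf` is ours, not from any library) evaluates the probability that a normal variable falls below zero when its mean sits 3 or 4 standard deviations above zero:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# P(X < 0) when the mean is k standard deviations above zero.
# sigma = 1 without loss of generality, since only (0 - mu)/sigma matters.
for k in (3, 4):
    print(f"mu = {k}*sigma: P(X < 0) = {normal_cdf(0.0, float(k), 1.0):.5f}")
```

With μ = 3σ the probability is about 0.0013, and with μ = 4σ about 0.00003, which is why the normal is often an acceptable model for non-negative hydrologic variables.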

GENERAL NORMAL DISTRIBUTION

The normal distribution is a 2-parameter distribution whose density function is

$$p_X(x) = \frac{1}{\sqrt{2\pi\theta_2^2}}\; e^{-\frac{1}{2}\left(\frac{x-\theta_1}{\theta_2}\right)^2} \qquad -\infty < x < \infty$$

Figure 5.1. Normal distributions with the same mean and different variances.

Figure 5.2. Normal distributions with the same variance and different means.

In examples 3.3 and 3.5 it was shown that if either the method of moments or the method of
maximum likelihood is used to estimate the two parameters of this distribution, the result is
θ₁ = μ and θ₂² = σ², where μ and σ² are the mean and variance of X, respectively. For this reason
the normal distribution is generally written as

$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \qquad -\infty < x < \infty \tag{5.1}$$

Thus, the normal distribution is a two-parameter distribution which is bell-shaped,
continuous, and symmetrical about μ (the coefficient of skew is zero). If μ is held constant and
σ² is varied, the distribution changes as in figure 5.1. If σ² is held constant and μ is varied, the
distribution does not change scale but does change location, as in figure 5.2. The parameters μ
and σ² are sometimes denoted as location and scale parameters. A common notation for
indicating that a random variable is normally distributed with mean μ and variance σ² is N(μ, σ²).

REPRODUCTIVE PROPERTIES

If a random variable X is N(μ, σ²) and Y = a + bX, the distribution of Y can be shown to
be N(a + bμ, b²σ²). Furthermore, if the Xi for i = 1, 2, ..., n are independently and normally
distributed with means μi and variances σi², then Y = a + b1X1 + b2X2 + ... + bnXn is normally
distributed with

$$\mu_Y = a + \sum_{i=1}^{n} b_i \mu_i \tag{5.2}$$

and

$$\sigma_Y^2 = \sum_{i=1}^{n} b_i^2 \sigma_i^2 \tag{5.3}$$

Any linear function of independent normal random variables is also a normal random variable.
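As a sketch of equations 5.2 and 5.3 (the function name and the numerical example are ours, for illustration only), the moments of Y = a + Σ biXi can be computed directly and checked against a small simulation:

```python
import random

def linear_combo_moments(a, bs, mus, sigma2s):
    """Mean and variance of Y = a + sum(b_i * X_i) for independent
    normal X_i (equations 5.2 and 5.3)."""
    mean_y = a + sum(b * m for b, m in zip(bs, mus))
    var_y = sum(b * b * s2 for b, s2 in zip(bs, sigma2s))
    return mean_y, var_y

# Hypothetical example: Y = 2 + X1 - 0.5*X2 with X1 ~ N(3, 1), X2 ~ N(4, 4).
mean_y, var_y = linear_combo_moments(2.0, [1.0, -0.5], [3.0, 4.0], [1.0, 4.0])
print(mean_y, var_y)  # 3.0 2.0

# Monte Carlo check that a simulated Y has roughly these moments.
random.seed(1)
ys = [2.0 + random.gauss(3.0, 1.0) - 0.5 * random.gauss(4.0, 2.0)
      for _ in range(100_000)]
m = sum(ys) / len(ys)
v = sum((y - m) ** 2 for y in ys) / len(ys)
print(round(m, 1), round(v, 1))  # close to 3.0 and 2.0
```

Setting a = 0 and every bi = 1/n reproduces the sample-mean result of example 5.1.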

Example 5.1. If X1, X2, ..., Xn are random observations from the distribution N(μ, σ²), what is
the distribution of $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$?

Solution: X̄ is a linear function of the Xi given by X̄ = (X1 + X2 + ... + Xn)/n, a special case of
equations 5.2 and 5.3 with a = 0 and bi = 1/n. From the reproductive properties of the normal
distribution, X̄ is normally distributed with mean

$$\mu_{\bar{X}} = a + \sum_{i=1}^{n} b_i \mu_i = 0 + \sum_{i=1}^{n} \frac{\mu}{n} = n\,\frac{\mu}{n} = \mu$$

and variance

$$\sigma_{\bar{X}}^2 = \sum_{i=1}^{n} b_i^2 \sigma_i^2 = \sum_{i=1}^{n} \frac{\sigma^2}{n^2} = n\,\frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Therefore X̄ is N(μ, σ²/n).

STANDARD NORMAL DISTRIBUTION

The cumulative distribution function for the normal distribution is

$$P_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2} dt \tag{5.4}$$

Unfortunately, equation 5.4 cannot be evaluated analytically, so approximate methods of
integration are required. If a tabulation of the integral were made, a separate table would be
required for each value of μ and σ². By using the linear transformation

$$Z = \frac{X-\mu}{\sigma}$$

the random variable Z will be N(0, 1). This is a special case of a + bX with a = −μ/σ and b = 1/σ.
The random variable Z is said to be standardized (it has μ = 0 and σ² = 1), and N(0, 1) is said to
be the standard normal distribution. The standard normal distribution is given by
$$p_Z(z) = \frac{1}{\sqrt{2\pi}}\; e^{-z^2/2} \qquad -\infty < z < \infty \tag{5.5}$$

and the cumulative standard normal is given by


$$P_Z(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\; e^{-t^2/2}\, dt \tag{5.6}$$

Figure 5.3. Standard normal distribution (μ = 0, σ² = 1).

Figure 5.3 shows the standard normal distribution which along with the transformation
𝑍 = (𝑋 − 𝜇)/𝜎 contains all of the information shown in figures 5.1 and 5.2. Both pZ(z) and
PZ(z) are widely tabulated. Most tables utilize the symmetry of the normal distribution so that
only positive values of Z are shown. Tables of PZ(z) may show prob(Z < z), prob(0 < Z < z), or
prob(-z < Z < z). Care must be exercised when using normal probability tables to see what values
are tabulated. The table of PZ(z) in the appendix gives prob(Z < z). There are many routines
programmed into computer software to evaluate the normal pdf and cdf. Some approximations
for the standard normal distribution are given below.

A table of PZ(z) shows that 68.26% of the normal distribution is within 1 standard deviation
of the mean, 95.44% within 2 standard deviations, and 99.74% within 3 standard deviations.
These are called the 1, 2, and 3 sigma bounds of the normal distribution. The fact that only
0.26% of the area of the normal distribution lies outside the 3 sigma bounds means that the
probability of a value less than μ − 3σ is only 0.0013, and this is the justification for using the
normal distribution in some instances even though the random variable under consideration may
be bounded by X = 0. If μ is greater than 3σ, the chance that X is less than zero is often
negligible (although this is not always true).
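The sigma bounds can be reproduced with the standard library's error function, since P_Z(z) = (1 + erf(z/√2))/2 (the helper name below is ours):

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Cumulative standard normal, P_Z(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

for k in (1, 2, 3):
    inside = std_normal_cdf(float(k)) - std_normal_cdf(float(-k))
    print(f"{k}-sigma bound contains {100.0 * inside:.2f}% of the distribution")
```

To two decimals the exact values are 68.27%, 95.45%, and 99.73%; older tables round these slightly differently, which accounts for figures such as 68.26% and 99.74% in the text.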

Example 5.2. Compare the 1, 2 and 3 sigma bounds under the assumption of normality and
under no distributional assumptions using Chebyshev's inequality.

Solution: The 1, 2 and 3 sigma bounds of N(μ, σ²) contain 68.26, 95.44 and 99.74% of the
distribution. Thus the probability that X deviates more than σ, 2σ, and 3σ from μ is 0.3174,
0.0456 and 0.0026, respectively.

Chebyshev's inequality states that prob(|X − μ| > kσ) ≤ 1/k². This corresponds to a
probability that X deviates more than σ, 2σ, and 3σ from μ of less than 1.00, less than 0.25, and
less than 0.11, respectively.

Comment: By making no distributional assumptions, we are forced to make very conservative
probability statements. It is to be emphasized that Chebyshev's inequality gives an upper bound
on the probability and not the probability itself.

Example 5.3. As an example of using tables of the normal distribution consider a sample drawn
from a N(15, 25). What is the prob(15.6 < X < 20.4)?

Solution: The desired probability could be evaluated from

$$prob(15.6 \leq X \leq 20.4) = \int_{15.6}^{20.4} \frac{1}{\sqrt{50\pi}}\; e^{-\frac{(x-15)^2}{50}}\, dx$$

However this integral is difficult to evaluate. Making use of the standard normal distribution, we
can transform the limits on X to limits on Z and then use standard normal tables.

x = 15.6 transforms to z = (15.6 - 15.0)/5 = 0.12

x = 20.4 transforms to z = (20.4 - 15.0)/5 = 1.08

The desired probability is

prob(0.12 < Z < 1.08) = P_Z(1.08) − P_Z(0.12)

From the standard normal table, P_Z(1.08) = 0.860 and P_Z(0.12) = 0.548. The desired
probability is 0.860 − 0.548 = 0.312.

APPROXIMATIONS FOR STANDARD NORMAL DISTRIBUTION

Maidment (1993) presents several approximations for the normal distribution. Let PZ(z) =
p for 0.005 ≤ p ≤ 0.995, where Z is the standard normal variate. Then z can be approximated
from

$$z = \frac{p^{0.135} - (1-p)^{0.135}}{0.1975}$$

Let y = −ln(2p). For 0.005 < p < 0.5 an approximation for z is given by

$$z = -\sqrt{\frac{y^2\left[(4y+100)y+205\right]}{\left[(2y+56)y+192\right]y+131}}$$

An approximation for PZ(z) for positive values of z is given by

$$P_Z(z) = 1 - 0.5\exp\left[-\frac{(83z+351)z+562}{703/z+165}\right]$$

Of course, for negative values of z, PZ(z) can be obtained from the tabulated value at |z| as

$$P_Z(z) = 1 - P_Z(|z|)$$
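The three approximations above are straightforward to code. This sketch (function names ours) implements them and compares the third against the exact cumulative normal computed from the error function:

```python
from math import erf, exp, log, sqrt

def z_from_p(p):
    """z such that P_Z(z) = p, for 0.005 <= p <= 0.995."""
    return (p ** 0.135 - (1.0 - p) ** 0.135) / 0.1975

def z_from_p_lower_tail(p):
    """z such that P_Z(z) = p, for 0.005 < p < 0.5."""
    y = -log(2.0 * p)
    num = y * y * ((4.0 * y + 100.0) * y + 205.0)
    den = ((2.0 * y + 56.0) * y + 192.0) * y + 131.0
    return -sqrt(num / den)

def p_from_z(z):
    """P_Z(z) for positive z."""
    return 1.0 - 0.5 * exp(-((83.0 * z + 351.0) * z + 562.0) / (703.0 / z + 165.0))

exact = 0.5 * (1.0 + erf(1.08 / sqrt(2.0)))
print(round(p_from_z(1.08), 5), round(exact, 5))   # 0.85987 vs 0.85993
print(round(z_from_p(0.8599), 3))                   # about 1.078
print(round(z_from_p_lower_tail(0.1587), 3))        # about -1.0
```

All three reproduce the exact values to roughly three decimal places over their stated ranges.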

Example 5.4. Use a normal approximation to determine prob(10.5 < X < 20.4) if X is distributed
N(15, 25).

Solution:

Using the approximation for P_Z(z) with z = (20.4 − 15)/5 = 1.08,

$$prob(Z < 1.08) = 1 - 0.5\exp\left[-\frac{[(83)(1.08)+351]\,1.08+562}{703/1.08+165}\right] = 0.85987$$

prob(0 < Z < 1.08) = 0.85987 − 0.50000 = 0.360

Similarly, for the lower limit z = (10.5 − 15)/5 = −0.9 and, by symmetry, prob(Z < 0.9) = 0.816
so that

prob(0 < Z < 0.9) = prob(−0.9 < Z < 0) = 0.316 and

prob(−0.9 < Z < 1.08) = 0.360 + 0.316 = 0.676

Figure 5.4. Prob(−0.9 < Z < 1.08).

Comment: Many times in solving problems of this type it is useful to sketch a normal
distribution and then shade in the area corresponding to the desired probability. For this problem
the sketch would be as in figure 5.4.

Example 5.5. Repeat example 3.7 assuming the Kentucky River data is N(μ, 22,322²).

Solution: Since X is assumed normal, X̄ is N(μ, 22,322²/n). Therefore

$$Z = \frac{\bar{X}-\mu}{22{,}322/\sqrt{n}}$$

is N(0, 1). From the problem statement, |X̄ − μ| < 10,000. Therefore n must be determined so that

$$prob\left(-\frac{10{,}000\sqrt{n}}{22{,}322} < Z < \frac{10{,}000\sqrt{n}}{22{,}322}\right) = 0.95$$

From the standard normal table it is seen that 95% of the normal distribution is enclosed by
−1.96 < Z < 1.96. From this, n is calculated as

$$n \geq \left(\frac{22{,}322(1.96)}{10{,}000}\right)^2 = 19.14$$

or at least 20 observations are required to be 95% sure that X̄ is within 10,000 cfs of μ if X is
N(μ, 22,322²).

Comment: By assuming normality, the required minimum number of observations has been
reduced from 100 to 20. The Law of Large Numbers placed a lower limit on n without
knowledge of the distribution of X. The price for this ignorance of the distribution of X is seen
to be very great if in fact X is normally distributed.
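The sample-size calculation can be sketched in a few lines (the function name is ours; 1.96 is the 97.5th percentile of the standard normal, and the raw value 19.14 is rounded up to the next whole observation):

```python
from math import ceil

def n_for_confidence(sigma, half_width, z=1.96):
    """Smallest n with prob(|Xbar - mu| < half_width) >= 0.95 when
    the underlying X is normal with standard deviation sigma."""
    n_min = (z * sigma / half_width) ** 2
    return n_min, ceil(n_min)

n_min, n = n_for_confidence(22_322.0, 10_000.0)
print(round(n_min, 2), n)  # 19.14 and 20
```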

CENTRAL LIMIT THEOREM

The conditions under which a random variable might be expected to follow a normal
distribution are specified by the Central Limit Theorem.

If Sn is the sum of n independently and identically distributed random variables Xi, each
having mean μ and variance σ², then in the limit as n approaches infinity, the
distribution of Sn approaches a normal distribution with mean nμ and variance nσ².

In practice if the Xi are identically and independently distributed, n does not have to be very
large for Sn to be approximated by a normal distribution. If interest lies in the central part of the
distribution of Sn, values of n as small as 5 or 6 will result in the normal distribution producing
reasonable approximations to the true distribution of Sn . If interest lies in the tails of the
distribution of Sn as it often does in hydrology, larger values of n may be required.

As stated above, the Central Limit Theorem is of limited value in hydrology since most
hydrologic variables are not the sum of a large number of independently and identically
distributed random variables. Fortunately under some very general conditions it can be shown
that if Xi for i = 1, 2, ..., n is a random variable independent of Xj for j ≠ i with E(Xi) = μi and
Var(Xi) = σi², then the sum Sn = X1 + X2 + ... + Xn approaches a normal distribution with

$$E(S_n) = \sum_{i=1}^{n} \mu_i \qquad \text{and} \qquad Var(S_n) = \sum_{i=1}^{n} \sigma_i^2$$

as n approaches infinity (Thomas 1971). One condition for this generalized Central Limit
Theorem is that each Xi must have a negligible effect on the distribution of Sn (i.e., there cannot
be one or two dominating Xi's).

This general theorem is very useful in that it says that if a hydrologic random variable is the
sum of n independent effects and n is relatively large, the distribution of the variable will be
approximately normal. Again how large n must be depends on the area of interest (central part or
tail of the distribution) and on how good an approximation is needed.

Example 5.6. In the last chapter the gamma distribution for integer values of n was derived as
the sum of n exponentially distributed random variables. The mean and variance of the
exponential distribution are given as 1/λ and 1/λ², respectively. The Central Limit Theorem
gives the mean and variance of the sum of n values from the exponential distribution as n/λ and
n/λ² for large n. This agrees with the mean and variance of the gamma distribution. In chapter 6
the coefficient of skew of the gamma distribution is given as 2/√n, which approaches zero as n
gets large. Thus the sum of n random variables from an exponential distribution is a gamma
random variable whose distribution approaches a normal distribution (with γ approaching 0) as n
gets large.
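The vanishing skew is easy to tabulate, and a small simulation (a sketch; the helper name and the choices n = 25, 50,000 replicates are ours) shows the sample skew of a sum of exponentials agreeing with 2/√n:

```python
import random
from math import sqrt

# Coefficient of skew of a gamma variable with integer shape n: 2 / sqrt(n).
for n in (1, 4, 25, 100):
    print(n, 2.0 / sqrt(n))

def sample_skew(xs):
    """Moment estimate of the coefficient of skew (for illustration)."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

# Sum of n = 25 exponential(lambda = 1) variables; theory says skew = 0.4.
random.seed(7)
sums = [sum(random.expovariate(1.0) for _ in range(25)) for _ in range(50_000)]
print(round(sample_skew(sums), 2))  # near 0.4
```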

CONSTRUCTING PDF CURVES FOR DATA

Frequently the histogram of a set of observed data suggests that the data may be
approximated by a particular probability density function. One way to investigate the goodness
of this approximation is by superimposing a pdf on the frequency histogram and then visually
comparing the two distributions. Statistical procedures for testing the hypothesis that a set of
data can be approximated by a particular distribution are given in chapter 8.

Consider the data of table 2.1 and the frequency histogram of figure 2.6. The probability (or
relative frequency) of a peak flow in any one of the class intervals assuming a normal
distribution can be obtained by integrating the normal distribution over the limits of the class
interval. For example the expected (according to the normal distribution) relative frequency in
the first interval can be calculated from
$$f_{25{,}000} = \int_{19{,}950}^{29{,}950} \frac{1}{\sqrt{2\pi(22{,}322)^2}}\; e^{-\frac{1}{2}\left(\frac{x-66{,}540}{22{,}322}\right)^2} dx$$

Because the mean of the data is 66,540 cfs and the standard deviation is 22,322 cfs, this integral
is easily evaluated using standard normal tables as 0.0322.

An approximation to the relative frequency in a class interval can also be made by using
equation 2.25b:

$$f_{X_i} = \Delta x_i\, p_X(x_i)$$


Using the standard normal distribution through the transformation

$$p_X(x_i) = p_Z(z_i)\left|\frac{dz}{dx}\right| = \frac{p_Z(z_i)}{\sigma} \tag{5.7}$$

for the first class interval, Δxi = 10,000, zi = (25,000 − 66,540)/22,322 = −1.8609, pZ(zi) =
0.0706 (from equation 5.5), and σ is estimated by s = 22,322:

f25,000 = 10,000 × 0.0706/22,322 = 0.0316

Similar calculations for each of the class intervals are shown in table 5.1 with the results plotted
in figure 5.5. The sum of the expected relative frequencies is not one since the entire range of the
normal distribution was not covered.

Table 5.1. Expected relative frequencies according to the normal distribution for the Kentucky
River data

Class Mark                           Expected Relative    Observed Relative
    xi          zi        pZ(zi)     Frequency fXi        Frequency
  25,000     -1.8609      0.0706         0.0316              0.030
  35,000     -1.413       0.1471         0.0659              0.061
  45,000     -0.965       0.2505         0.1122              0.162
  55,000     -0.517       0.3491         0.1564              0.162
  65,000     -0.069       0.3981         0.1783              0.182
  75,000      0.379       0.3714         0.1664              0.131
  85,000      0.827       0.2835         0.1270              0.131
  95,000      1.275       0.1770         0.0793              0.071
 105,000      1.723       0.0904         0.0405              0.030
 115,000      2.171       0.0378         0.0169              0.030
Sum                                      0.9744              0.990

The procedure of integrating pX(x) over each class interval or of using equation 2.25b can
be used for any continuous probability distribution to get the expected relative frequencies for
that distribution.
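Both routes to the expected relative frequency, exact integration over the class interval and the Δx·pX approximation of equation 2.25b, can be sketched as follows (helper names ours), reproducing the first row of table 5.1:

```python
from math import erf, exp, pi, sqrt

MU, SD, WIDTH = 66_540.0, 22_322.0, 10_000.0  # Kentucky River statistics

def std_normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def rel_freq_approx(class_mark):
    """Equation 2.25b with equation 5.7: f = dx * p_Z(z) / sigma."""
    z = (class_mark - MU) / SD
    return WIDTH * std_normal_pdf(z) / SD

def rel_freq_exact(lo, hi):
    """Normal density integrated over the class interval."""
    return std_normal_cdf((hi - MU) / SD) - std_normal_cdf((lo - MU) / SD)

print(round(rel_freq_approx(25_000.0), 4))           # 0.0316
print(round(rel_freq_exact(19_950.0, 29_950.0), 4))  # 0.0322
```

Mapping `rel_freq_approx` over the ten class marks regenerates the fXi column of table 5.1.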

[Figure 5.5 is a bar chart of relative frequency (0.00 to 0.20) versus peak flow (0 to 140
thousand cfs), overlaying the normal distribution on the observed distribution.]

Figure 5.5. Comparison of normal distribution with the observed distribution, Kentucky River
peak flows.

NORMAL APPROXIMATIONS FOR OTHER DISTRIBUTIONS

The normal distribution can be shown to be a good approximation to several other
distributions, both discrete and continuous. Before using the normal to approximate some other
distribution, care must be taken to see that the conditions for the approximation to be valid are
met. Generally the approximations are quite good in the central part of the distribution with the
accuracy dropping off in the tails of the distribution. Throughout our study of distributions, the
sensitivity of the tails of distributions to distributional assumptions will be of concern. This is of
particular importance in hydrology when the magnitude of a rare event is to be estimated since
this estimate must come from the tail of the distribution being used.

Whenever a continuous distribution is used to approximate a discrete distribution, half-
interval corrections must be applied to the continuous distribution. For example, the probability
that X is equal to some non-negative integer x can be evaluated for a discrete distribution. This
same probability is zero if a continuous distribution is used. When a continuous distribution is
used to approximate prob(X = x), the prob(x − 1/2 < X < x + 1/2) must be evaluated. This
illustrates the general rule that a 1/2-interval correction is added to the upper limit and
subtracted from the lower limit. The prob(X = x, x + 1, x + 2, ..., y) in a discrete case is
approximated by prob(x − 1/2 < X < y + 1/2) in the continuous case. The prob(X ≤ x) in a
discrete case is approximated by prob(X < x + 1/2) in the continuous case. More examples
of these corrections are shown in table 5.2.

The Central Limit Theorem provides the mechanism by which the normal distribution
becomes an approximation for several other distributions.

Table 5.2. Corrections for approximating a discrete random variable by a continuous random
variable.

Discrete                Continuous

X = x                   x − 1/2 < X < x + 1/2
x ≤ X ≤ y               x − 1/2 < X < y + 1/2
X ≤ x                   X < x + 1/2
X ≥ x                   X > x − 1/2
X < x                   X < x − 1/2
X > x                   X > x + 1/2

Binomial Distribution

It was stated in chapter 4 that if X is a binomial random variable with parameters n1 and p,
and Y is a binomial random variable with parameters n2 and p, then Z = X + Y is a binomial
random variable with parameters n = n1 + n2 and p. Extending this to the sum of several
binomial random variables, the Central Limit Theorem would indicate that the normal
distribution approximates the binomial distribution if n is large. Thus as n gets large the
distribution of

$$Z = \frac{X-\mu}{\sigma} = \frac{X-np}{\sqrt{np(1-p)}} \tag{5.8}$$

approaches a N(0,1). This is sometimes known as the DeMoivre-Laplace limit theorem (Mood et
al. 1974).

Example 5.7. X is a binomial random variable with n = 25 and p = 0.3. Compare the binomial
and normal approximation to the binomial for evaluating the prob(5 < X < 8).

Solution: Using the binomial distribution this is equivalent to

$$\sum_{i=6}^{8} f_X(i; 25, 0.3) = 0.483$$

Using the normal approximation, the probability is determined as prob(5.5 < X < 8.5), which is
0.476. Therefore the exact probability of 0.483 is approximated by the normal as 0.476 for an
n of 25.
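Example 5.7 can be reproduced directly; the sketch below (helper names ours) evaluates the binomial sum exactly and the normal approximation with the half-interval correction. With full-precision arithmetic the approximation comes out near 0.477; the text's 0.476 reflects table rounding.

```python
from math import comb, erf, sqrt

def binom_pmf(i, n, p):
    """Binomial probability mass function."""
    return comb(n, i) * p ** i * (1.0 - p) ** (n - i)

def std_normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 25, 0.3
exact = sum(binom_pmf(i, n, p) for i in (6, 7, 8))  # prob(5 < X < 8)

# Normal approximation with half-interval correction: prob(5.5 < X < 8.5).
mu, sigma = n * p, sqrt(n * p * (1.0 - p))
approx = std_normal_cdf((8.5 - mu) / sigma) - std_normal_cdf((5.5 - mu) / sigma)
print(round(exact, 3), round(approx, 3))  # 0.483 0.477
```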

Negative Binomial Distribution

Following reasoning similar to that given for the binomial distribution, the negative
binomial distribution with large k can be approximated by a normal distribution. In the case of
the negative binomial, the distribution of

$$Z = \frac{X-\mu}{\sigma} = \frac{X-k/p}{\sqrt{kq/p^2}} \tag{5.9}$$

approaches N(0,1) as k gets large.

Example 5.8. Work example 4.11 using the normal approximation for the negative binomial.

Solution: The desired probability is prob(39.5 < X < 40.5). Using the standard normal
distribution, the limits on Z are

$$Z = \frac{39.5 - 4.0/0.1}{\sqrt{4(0.9)/0.1^2}} = \frac{-0.5}{18.97} = -0.026$$

$$Z = \frac{40.5 - 4.0/0.1}{\sqrt{4(0.9)/0.1^2}} = \frac{0.5}{18.97} = 0.026$$

prob(−0.026 < Z < 0.026) = 0.0208

This compares favorably with the 0.0206 computed using the negative binomial.

Poisson Distribution

The sum of two Poisson random variables with parameters λ1 and λ2 is also a Poisson
random variable with parameter λ = λ1 + λ2. Extending this to the sum of a large number of
Poisson random variables, the Central Limit Theorem indicates that for large λ, the Poisson may
be approximated by a normal distribution. In this case the distribution of
$$Z = \frac{X-\mu}{\sigma} = \frac{X-\lambda}{\sqrt{\lambda}} \tag{5.10}$$

approaches a N(0,1). Since the Poisson is the limiting form of the binomial and the binomial can
be approximated by the normal, it is no surprise that the Poisson can also be approximated by the
normal.
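A quick numerical check of this approximation for a large λ (a sketch; λ = 100 and the cutoff 110 are arbitrary illustrative choices, and the half-interval correction of table 5.2 is applied):

```python
from math import erf, exp, factorial, sqrt

def poisson_cdf(x, lam):
    """prob(X <= x) for a Poisson variable, summed term by term."""
    return sum(exp(-lam) * lam ** i / factorial(i) for i in range(x + 1))

def std_normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

lam = 100
exact = poisson_cdf(110, lam)
approx = std_normal_cdf((110.5 - lam) / sqrt(lam))  # half-interval correction
print(round(exact, 4), round(approx, 4))
```

The two values agree to roughly two decimal places; the agreement deteriorates for small λ, where the Poisson's skew of 1/√λ is no longer negligible.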

Continuous Distributions

Many continuous distributions can be approximated by the normal distribution for certain
values of their parameters. For instance, in example 5.6, it was shown that for large n the gamma
distribution approaches the normal distribution. To make these approximations one merely
equates the mean and variance of the distribution to be approximated to the mean and variance of
the normal and then uses the fact that

$$Z = \frac{X-\mu}{\sigma}$$

is N(0, 1) if X is N(μ, σ²). Not all continuous distributions can be approximated by the normal,
and for those that can, the approximation is valid only for certain parameter values. Things to
look for are parameters that produce near-zero skew, symmetry, and tails that asymptotically
approach pX(x) = 0 as x takes on large and small values. Again it is emphasized that
approximations in the tails of the distributions may not be as good as in the central region of the
distribution.

EXERCISES

5.1 Consider sampling from a normal distribution with a mean of 0 and a variance of 1. What is
the probability of selecting

a) an observation between 0.5 and 1.5?
b) an observation outside the interval −0.5 to +0.5?
c) 3 observations inside and 2 observations outside the interval 0.5 to 1.5?
d) 4 observations inside the interval 0.5 to 1.5, exactly two of which are not in the interval
−0.5 to 1.0?

5.2 What is the probability of selecting an observation at random from a N(100, 2500) that is

a) less than 75?
b) equal to 75?

5.3 For the Kentucky River data of table 2.1, what is the probability of a peak flow exceeding
100,000 cfs if the peaks are assumed to be normally distributed?

5.4 Construct the theoretical distribution for the data of exercise 2.2 if it is assumed that the data
are normally distributed. From a visual comparison with the data histogram, would you say the
data are normally distributed?

5.5 Work exercise 4.1 using the normal approximation to the binomial and plot the results on the
histogram developed for exercise 4.1.

5.6 Show that if X is N(μ, σ2) then Y = a + bX is N(a + bμ, b2 σ2 ).

5.7 For a particular set of data the coefficient of variation is 0.4. If the data are normally
distributed, what percent of the data will be less than 0.0?

5.8 A sample of 150 observations has a mean of 10,000, a standard deviation of 2,500 and is
normally distributed. Plot a frequency histogram showing the number of observations expected
in each interval.

5.9 The appendix contains a listing of the annual runoff from Cave Creek watershed near Fort
Spring, Kentucky. What is the probability that the true mean annual runoff is less than 14.0 in. if
one can assume the true variance is 22.56 in.2? What other assumptions are needed?

5.10 Random digits are the numbers 0, 1, 2, ..., 9 selected in such a fashion that each is equally
likely (i.e., has probability 1/10 of being selected). An experiment is performed by selecting 5
random digits, adding them together and calling their sum X. The experiment is repeated 10
times and the average X̄ of the 10 sums is calculated. What is the probability that X̄ is less than
21.5? (Exercise 13.9 requires that this experiment be carried out.)
5.11 Plot the individual terms of the Poisson distribution for λ = 2. Approximate the Poisson by
the normal and plot the normal approximations on the same graph.

5.12 Repeat exercise 5.11 for λ = 9.

5.13 Assume the data of exercise 4.21 is normally distributed.

a) Within each month, what is the probability of 10 or more rainy days?
b) What is the probability of 20 or more rainy days in the July–August period?
c) What is the difference in assuming the data are normally distributed and in assuming the
data are binomially distributed and approximating the binomial with the normal?

5.14 Plot the observed frequency histogram and the frequency histogram expected from the
normal distribution for the annual peak flows for the following rivers. Discuss how well the
normal approximates the data in terms of the coefficient of variation and skewness. (Note: data
are in the appendix or may be obtained from the internet).

a) North Llano River near Junction, Texas
b) Cumberland River at Cumberland Falls, Kentucky
c) Piscataquis River near Dover-Foxcroft, Maine.

5.15 The occurrence of rainstorms is sometimes considered to be a Poisson process so that the
time between rainstorms is exponentially distributed. If for a certain locality the mean of this
exponential distribution is 10 days, what is the probability that the elapsed time for 15 storms to
occur will exceed 120 days?

5.16 Lane and Osborn (1973) present the following data for the mean number of days with more
than 0.10 inches of precipitation at Tombstone, Arizona. If the occurrence of more than 0.10
inches of rain in any month can be considered as an independent Poisson process, what is the
probability of fewer than 30 days with more than 0.10 inches of rain in one year at Tombstone?

Month    No. of Days        Month    No. of Days
Jan.          2             July          7
Feb.          2             Aug.          7
Mar.          2             Sept.         3
Apr.          1             Oct.          2
May           0             Nov.          2
June          2             Dec.          2
                            Total        32

5.17 An experimenter is measuring the water level in an experimental towing channel. Because
of waves and surges, a single measurement of the water level is known to be inaccurate. Past
experience indicates the variance of these measurements is 0.0025 ft2. How many independent
observations are required to be 90% confident that the mean of all the measurements will be
within 0.02 feet of the true water level?

5.18 At a certain location the annual precipitation is approximately normally distributed with a
mean of 45 in. and a standard deviation of 15 in. Annual runoff can be approximated by 𝑅 =
−7.5 + 0.5𝑃 where R is annual runoff and P is annual precipitation. What is the mean and
variance of annual runoff? What is the probability that the annual runoff will exceed 20 inches?

5.19 Plot a frequency distribution for a mixture of two normal distributions. Use as the first
distribution a N(0, 1) and as the second a N(1, 1). Use as values for the mixing parameter 0.2, 0.5
and 0.8.

