Probability Distributions
Overview
In Unit 5, we assigned probabilities to events. This unit extends the concept of
probability by discussing probability distributions. The social scientist seldom works
directly with the events arising from an experiment. Instead, the focus is placed on the
random variable that arises out of the experiment, that is, on its outcomes and their
probabilities.
In preparation for inferential statistics, we now introduce random variables, classify
them, construct the corresponding probability distributions, introduce the expected value
and variance of these variables, and focus specifically on the Poisson, binomial and normal
probability distributions.
Note to students
This unit contains several activities and one practice assignment at the end of
the unit. You are to work on the activities on your own. If you have any questions
or concerns please post a message in the unit discussion forum so that your
E-tutor can provide assistance to you. The assignment is to be uploaded in the
practice assignment area.
Session 1
Understanding Random Variables and Probability
Distributions
Introduction
In this session we explain the term ‘random variable’ and distinguish between discrete
and continuous random variables before moving on to a discussion of probability distributions.
Objectives
A Random Variable
We begin by considering the following experiment that forms part of a game during a
party among some friends. The rules of the game specify that you are required to toss a
‘fair’ coin twice. Each outcome of a ‘head’ entitles the player to a $1.00 prize while each
outcome of a ‘Tail’ results in the player paying out $1.00.
Recall the sample space from tossing a ‘fair’ coin twice is given by
S = { HH, HT, TH, TT}.
The rules of the game allow us to assign a net prize money to each outcome in the
sample space.
Table 6.1
Net Prize Money from the Experiment
Outcome   Net Prize Money
HH        $2
HT        $0
TH        $0
TT        -$2
Each time that the experiment is repeated, the player stands to get net prize money of
$2.00 or $0.00 or pay out $2.00. We do not, however, know the precise prize money in
advance as these moneys all depend on outcomes that are random.
We typically represent such variables by X. Here X takes on the values of $2, $0 and -$2 in a
random manner; X is called a random variable. We can also represent X as a mapping that
assigns a real-valued number to each of the outcomes of the sample space.
Mapping Diagram
HH → $2
HT → $0
TH → $0
TT → -$2
DEFINITION 6.1
A random variable is a variable that assigns a real-valued number to each outcome
in the sample space of an experiment; the values it takes on are therefore determined
by chance.
The nature of the assignment gives the random variable its name. In our experiment
above, X is the net prize money from the game of tossing two ‘fair’ coins.
Random variables form the core of inferential statistics. There are two types of random
variables:
• discrete, and
• continuous.
Discrete Random Variables
DEFINITION 6.2
A discrete random variable is a random variable that takes on countable values.
The random variable X, arising from our experiment of tossing a ‘fair’ coin twice, takes
on 3 real number dollar values, namely, 2, 0 and -2. These are indeed countable. Hence X
here is a discrete random variable.
If so, do you agree that the name of this random variable should be ‘Age of a Form I
male student from the city secondary school’?
Now that you have been presented with two examples of a random variable, you need
to create your own example of a discrete random variable.
ACTIVITY 6.1.
Choose an experiment and create a discrete random variable that arises from
that experiment.
DEFINITION 6.3
A continuous random variable takes on values that are not countable. In
fact, these variables can take on every value within an interval of the real line.
ACTIVITY 6.2
Before we leave the discussion on random variables, you are required to complete the
next activity.
ACTIVITY 6.3
Indicate which of the following random variables are discrete and which are
continuous:
Recall from Unit 5 that, because the coin used in our experiment was fair, the classical
approach allows us to assign a probability of 0.5 for a head and 0.5 for a tail.
In Unit 5, we also introduced the laws of probability. We can use the multiplication law
to determine the probabilities of the outcomes from two tosses of a fair coin as shown in
the table below.
Table 6.2
Probabilities of Outcomes in the Experiment of Tossing of a Fair Coin Twice
Outcome Probability
HH 0.25
HT 0.25
TH 0.25
TT 0.25
Table 6.3
Probabilities of Outcomes and Related Random Variables
Outcome Net Prize Money Probability
HH $2 0.25
HT $0 0.25
TH $0 0.25
TT -$2 0.25
Looking again at Table 6.3, we observe that we have listed all the values of net prize
money and their corresponding probabilities. Further the sum of the probabilities is 1.
Remember that net prize money is the discrete random variable in this example. If we
were to create a table in which we present all values of the random variables and their
corresponding probabilities, we would refer to this table as a probability distribution of
the discrete random variable, net prize money.
Table 6.4
Probability Distribution of Net Prize Money
Net Prize Money Probability
$2 0.25
$0 0.50
-$2 0.25
Total 1.00
DEFINITION 6.4
The probability distribution of a discrete random variable is a listing (in a table,
graph or formula) of all the values that the variable can assume together with their
corresponding probabilities. The probabilities sum to 1.
If you were asked to develop a probability distribution for a discrete random variable,
you should follow these steps:
1. Define the experiment.
2. Define the related sample space.
3. Perform a sufficiently large number of repetitions of the experiment.
4. Create the frequency table for the outcomes of the repetitions of the experiment.
5. Create the corresponding relative frequency table.
6. In accordance with the relative frequency approach to probability, we can
rename relative frequency as probability. Hence the relative frequency table in
step 5 above becomes the probability distribution.
Note that when the relative frequencies are known for the population, these are seen to
be the theoretical probabilities of the outcomes. Accordingly, the relative frequency table is
seen to be the theoretical probability distribution of the random variable.
ACTIVITY 6.4
Each of the three tables below lists certain values x of a discrete random
variable X. Show that Table 2 is the lone valid probability distribution.
Figure 6.1
Probability Distribution of X (bar chart of P(x) against x = 1, 2, 3, 4; the vertical axis runs from 0 to 0.4)
Table 6.5
Cumulative Probability Distribution
x P(X ≤ x)
1 .25
2 .59
3 .87
4 1.00
How did we get P(X ≤ 2) = 0.59 above? We used the addition law of probability to say
that P(X ≤ 2) = P(X = 1 or 2) = P(X = 1) + P(X=2) = .25 + .34 = .59.
In a similar way, we can justify the other values in Table 6.5 above.
Observe that all the cumulative probabilities are of the type P(X ≤ x ). Alternative
cumulative probabilities are of the form P(X < x), P(X > x ) and P(X ≥ x ) and these give
rise to three other cumulative probability distributions.
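The construction of a cumulative distribution such as Table 6.5 can be sketched in code. Below is a minimal sketch (Python is used purely for illustration; the individual probabilities are those implied by Figure 6.1 and Table 6.5):

```python
# Probabilities P(X = x) for x = 1, 2, 3, 4 (these sum to 1).
pmf = {1: 0.25, 2: 0.34, 3: 0.28, 4: 0.13}

# Build the cumulative distribution P(X <= x) with the addition law:
# accumulate the probabilities in increasing order of x.
cdf = {}
running_total = 0.0
for x in sorted(pmf):
    running_total += pmf[x]
    cdf[x] = round(running_total, 2)

print(cdf)  # {1: 0.25, 2: 0.59, 3: 0.87, 4: 1.0}
```

Each cumulative value is simply the running total of the probabilities up to and including x, exactly as in the hand calculation above.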
ACTIVITY 6.5
Recall Table 6.4. Show that the cumulative probability distribution of Net
Prize Money is given by:
x P(X ≤ x)
-$2 0.25
$0 0.25+0.50 = 0.75
$2 0.75+0.25 = 1.00
Expected Value of a Discrete Random Variable
Suppose that we played this game repeatedly a large number of times, say 4000 times.
From the probability distribution we can say that:
• Net prize money of -$2 is expected in 25% of the games i.e. in 1000 games
• Net prize money of $0 is expected in 50% of the games i.e. in 2000 games
• Net prize money of $2 is expected in 25% of the games i.e. in 1000 games.
What then would be our average net prize money after these 4000 games?
Average net prize money = [1000 × (-$2) + 2000 × $0 + 1000 × $2] / 4000
Simplifying we get
= $0
If therefore we played the game a large number of times, on average, we can expect to
win nothing (and, by extension, lose nothing). The value of $0 so computed is called the
expected value of the discrete random variable X. It is denoted by E(X). What does this
mean?
First, we can say what it does not mean. It certainly does not mean that a player is
guaranteed to win nothing or lose nothing in one game. Rather, some players will win,
some will break even and some will lose but, overall, the winnings of one player will
cancel out the losses of another as the number of games gets sufficiently large.
(-$2)(0.25) + ($0)(0.50) + ($2)(0.25) = $0
In that calculation, we multiplied each value of the random variable X by its
corresponding probability and summed the resulting products. We can rewrite this
calculation as follows:
E(X) = ∑ xi P(xi) (summing over i = 1 to 3)
where x1 = -$2; P(x1) = 0.25 or 1/4; x2 = $0; P(x2) = 0.50 or 2/4; x3 = $2, P(x3) = 0.25 or 1/4.
DEFINITION 6.5
The expected value (or mean) of a discrete random variable X is given by
E(X) = ∑ xi P(xi), where the sum runs over all values xi of X.
Since, by the relative frequency approach to probability, P(xi) = Rel. Freq(xi) = fi/N for
each of the values of i, where N is the total frequency,
E(X) = ∑ xi (fi/N) = (1/N) ∑ xi fi (summing over i = 1 to n).
Hence we say:
• E(X) is equal to the mean of the discrete random variable.
• The mean of a discrete random variable is the value that is expected to occur per
repetition, on average, if an experiment is repeated a large number of times.
• In short, E(X) is called the long run average value of the random variable.
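The E(X) calculation can be sketched in a few lines of code (Python assumed for illustration; the distribution is that of Table 6.4):

```python
# Net prize money distribution from Table 6.4.
distribution = {-2: 0.25, 0: 0.50, 2: 0.25}

# E(X) = sum of x * P(x) over all values of x.
expected_value = sum(x * p for x, p in distribution.items())
print(expected_value)  # 0.0
```

The result confirms the long-run average of $0 derived above.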
ACTIVITY 6.6
A random variable X assumes the values of –1, 0, 1 and 2 only. The probability
that X assumes the value x is given by
P(X = x) = (x – 3)²/30
a. What kind of random variable is X?
b. Show that the information above constitutes a probability distribution.
c. Find the mean value of X.
Properties of Expectation
Property #1 is saying that if a random variable for some reason takes on one value a,
then the expected value (otherwise called its mean) will undoubtedly be the value a.
How do you use the other properties? Let us look at these examples.
EXAMPLE 6.1
EXAMPLE 6.1 cont’d
Suppose that each investor decided to invest only $20,000. Further, each
investor decided to invest $10,000 in a range of local financial institutions and
invest the remaining $10,000 in a range of extra-regional financial institutions.
Let X be the discrete random variable representing the return on the local
investment. Also let Y be the discrete random variable representing the return
on the extra-regional investment. At the end of any given period, the investor’s
total return will be the sum of the local return and the extra-regional return.
Thus, for each investor the total return on the investment of $20,000 can be
represented by the random variable X + Y. The average total return at the end of
any given period will be the E(X + Y). This value will be the sum of the average
return from the local institutions and the average return from the extra-regional
institutions. Essentially, E(X + Y) = E(X) + E(Y) which is Property #5.
Suppose now that the local financial institution offered a bonus of $10 over and
above the return on the investment of $10,000. In addition, the extra-regional
institutions countered by offering a bonus of $13 over and above the return on
the investment of $10,000. Let X be the discrete random variable representing
the return on the local investment. Also let Y be the discrete random variable
representing the return on the extra-regional investment. At the end of any given
period the local investment will yield an average of E(X) + 10 while the extra-
regional investment will yield an average of E(Y) + 13.
The average total yield on the $20,000 investment will be the sum of E(X) + 10
and E(Y) + 13.
By Property #3, E(X) + 10 = E(X + 10) and E(Y) + 13 = E(Y + 13).
E(X) + 10 + E(Y) + 13 = E(X + 10) + E(Y + 13) = E( X + 10 + Y + 13).
X + 10 is a function of X and Y+ 13 is a function of Y.
If we generalise this we see what Property #6 is saying.
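The properties just illustrated can be checked numerically. Below is a minimal sketch (Python is assumed; the pmf for Y is hypothetical and not taken from the text) verifying E(aX + b) = aE(X) + b and, for independent variables, E(X + Y) = E(X) + E(Y):

```python
# X is the net prize money distribution from Table 6.4;
# Y is a hypothetical, independent second variable (illustrative only).
X = {-2: 0.25, 0: 0.50, 2: 0.25}
Y = {10: 0.4, 20: 0.6}

def expectation(pmf):
    """E(X) = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

# Property: E(aX + b) = a*E(X) + b, checked here for a = 3, b = 10.
a, b = 3, 10
lhs = sum((a * x + b) * p for x, p in X.items())
assert abs(lhs - (a * expectation(X) + b)) < 1e-12

# Property: E(X + Y) = E(X) + E(Y), computed over the joint
# distribution P(x, y) = P(x) * P(y) since X and Y are independent.
lhs = sum((x + y) * px * py for x, px in X.items() for y, py in Y.items())
assert abs(lhs - (expectation(X) + expectation(Y))) < 1e-12

print(expectation(X), expectation(Y))  # 0.0 16.0
```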
The expected value of the discrete random variable, E(X) has been shown to be
equal to the mean of the random variable. Mean is a measure of central tendency.
It highlights where the probability distribution of X is centred.
There should also be an associated measure of dispersion for the variable since the
mean alone does not give an adequate description of the shape of the distribution.
Variance is that measure. It is a measure of how the values of the variable X are
spread out or dispersed from the mean.
Consider a discrete random variable X taking on values x1 , x2 , x3 , …… xn with
associated probabilities p1 , p2 , p3 , ……, pn
For the relative frequency table,
Variance = (1/N) ∑ fi (xi – μ)²
DEFINITION 6.6
The variance of a discrete random variable X, denoted Var(X), is given by
Var(X) = ∑ P(xi)(xi – μ)², where μ = E(X).
One shortcoming of variance is its unit of measure. If we are finding, for
example, the variance of the ages of 20 boys, the variance will be measured in
years². This has no interpretation in the real world. We get around this
shortcoming by using another measure known as standard deviation.
DEFINITION 6.7
The standard deviation of the discrete random variable X is the square root of
the variance of X.
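Definitions 6.6 and 6.7 can be applied directly in a short sketch (Python assumed; the distribution used is the one from Figure 6.1 and Table 6.5, so as not to pre-empt Activity 6.7):

```python
import math

# Distribution of X from Figure 6.1 / Table 6.5.
pmf = {1: 0.25, 2: 0.34, 3: 0.28, 4: 0.13}

mu = sum(x * p for x, p in pmf.items())                    # E(X)
variance = sum(p * (x - mu) ** 2 for x, p in pmf.items())  # Definition 6.6
std_dev = math.sqrt(variance)                              # Definition 6.7

print(round(mu, 2), round(variance, 4), round(std_dev, 4))  # 2.29 0.9659 0.9828
```

Note that the standard deviation is back in the original unit of measure of X.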
ACTIVITY 6.7
1. Find the variance and standard deviation of the net prize money variable
from Table 6.4
Property #1 says that if a random variable takes on only one value a, then the
mean will be the value a, and each observation of this variable will have no
dispersion (no deviation) from the mean. Hence variance, which measures the
average squared deviation from the mean, will equal zero.
How do you use the other properties? Let us look at these examples.
EXAMPLE 6.2
Suppose instead that each financial institution decided to add a bonus of $25 to
the amount paid out to each investor of $30,000. Let X be the discrete random
variable representing the return on the investment of $10,000. The average
return on the investment of $30,000 is then E(3X + 25) = E(3X) + 25. Each observation
in this distribution is of the form 3X + 25. Thus the deviation from the mean for
each observation will be 3X + 25 – (3E(X) + 25), which equals 3X – 3E(X), or
3(X – E(X)). In the formula for finding variance we must square this deviation to get
3²(X – E(X))². This procedure is followed for all observations, thus giving rise to
a variance that is 3² times the variance of X. Hence Property #4, which states
Var(aX + b) = a² Var(X).
Suppose that each investor decided to invest only $20,000. Further, each
investor decided to invest $10,000 in a range of local financial institutions and
invest the remaining $10,000 in a range of extra-regional financial institutions.
Let X be the discrete random variable representing the return on the local
investment; and Y represent the discrete random variable for the return on the
extra-regional investment. X and Y are independent of each other given the
differences in the structure of the two markets. For each investor the total yield
will be of the form X + Y. Further E(X + Y) = E(X) + E(Y). Each observation in this
distribution is of the form X + Y. Thus the deviation from the mean for each
observation will be X + Y – (E(X) + E(Y)) which equals (X – E(X)) + (Y – E(Y)). This
procedure is followed for all observations thus giving rise to a variance that is
the same as the sum of the variance of X and variance of Y. Hence Property
#5 which states that Var(X + Y) = Var(X) + Var(Y) when X and Y are mutually
independent.
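Properties #4 and #5 can also be checked numerically. The sketch below (Python; the pmfs are illustrative, with Y hypothetical and not taken from the text) verifies Var(aX + b) = a²Var(X) and, for independent variables, Var(X + Y) = Var(X) + Var(Y):

```python
X = {-2: 0.25, 0: 0.50, 2: 0.25}   # net prize money from Table 6.4
Y = {10: 0.4, 20: 0.6}             # hypothetical independent variable

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum(p * (x - m) ** 2 for x, p in pmf.items())

a, b = 3, 25
# Distribution of aX + b: same probabilities, transformed values.
aXb = {a * x + b: p for x, p in X.items()}
assert abs(var(aXb) - a**2 * var(X)) < 1e-9      # Property #4

# Distribution of X + Y under independence: P(x, y) = P(x)P(y).
XY = {}
for x, px in X.items():
    for y, py in Y.items():
        XY[x + y] = XY.get(x + y, 0) + px * py
assert abs(var(XY) - (var(X) + var(Y))) < 1e-9   # Property #5

print(round(var(X), 2), round(var(Y), 2))  # 2.0 24.0
```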
Discrete Probability Distributions to be Covered in this Course
We begin this section by making the observation that because of the uncountable
nature of the values taken on by continuous random variables, we cannot think of
finding probability by adding probabilities of simple events. Because the values
are uncountable, we will never stop adding. Accordingly, we must modify the
approach used for the discrete random variables.
We call the graph formed by the outline of the relative frequency polygon the
probability density curve for the continuous random variable.
Activity 6.8
You are shown a relative frequency histogram below for a variable defined as
the height of a basketball player measured in feet. Being a relative frequency
histogram the sum of the areas of the bars equals 100%.
You must shade the relative frequency polygon on the histogram below.
Remember that the area under the relative frequency polygon approximates the
area under the relative frequency histogram. Thus the area under the polygon is
also 100%.
Highlight the top outline of the polygon. You would agree that the outline makes
a curve. That curve is the probability density function for the variable defined as
the height of a basketball player.
[Two charts appear here: a relative frequency histogram with Percent (5 to 35) on the vertical axis, and the corresponding smoothed curve with Density on the vertical axis.]
To recap, we confirmed in Activity 6.8 that
• The heights of the bars in the relative frequency histogram sum to 1. Hence
the area under the histogram is 1.
• The relative frequency polygon has an area that approximates the area of the
histogram i.e. it is equal to 1.
• The smoothed outline of the relative frequency polygon is the probability
density curve. Hence the area under the probability density curve also
equals 1.
• The function whose graph is the probability density curve is called the
probability density function of the continuous random variable.
We therefore encounter a new paradigm in which probability is seen as the area under a
probability density curve.
DEFINITION 6.8
The probability density curve is the graph consisting of the outline of the
relative frequency polygon. The total area under the curve equals 1.
DEFINITION 6.9
The probability density function is the function whose graph is the probability
density curve. It is usually represented by f(x). This function takes on only non-
negative values and the total area under the graph of this function equals 1.
In this paradigm, the probability that the continuous random variable lies between any
two given values a and b is given by the area under the probability distribution curve
bounded by the lines x = a and x = b.
Accordingly,
• all probabilities will lie in the range of 0 to 1 inclusive.
• the probability that the random variable assumes a value somewhere in its entire
range equals the entire area under the curve, i.e. an area of 1.
Further, the probability that the continuous random variable assumes a single value is
seen to be the area of a bar with zero width, i.e. such an area equals zero.
Once we know the probability density function, we can refer to cumulative probabilities
such as
1. P(X < a) which is the area under the density curve from the far left to x = a;
2. P(X > a) which is the area under the density curve from x = a to the far right of the curve;
3. P(a < X < b) which is equal to the area under the density curve from x = a to x = b.
DEFINITION 6.10
The expected value of a continuous random variable X with probability density
function f(x) is obtained by integrating the product of x and f(x) over all values of x.
In the discrete case we added the products of x and P(x). We cannot do this for
continuous random variables.
In the continuous case we find the centre of gravity (so to speak) of the area under the
probability distribution curve. Students of Mathematics can show that this centre of
gravity of the area under the curve is computed by integrating the product of x and f(x).
As in the case of the discrete random variables, variance is a measure of the dispersion
of the distribution of the variable.
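For students of Mathematics, the two quantities just described can be written compactly as follows (standard results, stated here for reference, and consistent with the discrete formulas above):

```latex
E(X) = \int_{-\infty}^{\infty} x \, f(x)\, dx,
\qquad
\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx,
\qquad \text{where } \mu = E(X).
```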
Properties of Variance for Discrete Random Variables
• normal distribution,
• student-t distribution,
• chi-square distribution,
• F distribution,
We shall also explore the normal distribution later in this Unit. The other three
distributions will be presented in subsequent units.
Summary
This unit explored discrete and continuous random variables and the probability
distribution of a discrete random variable. We also explained the concept of expected
value for both discrete and continuous variables, the variance of discrete random
variables and the probability density function of the continuous random variable.
Session 2
Binomial and Poisson Probability Distributions
Introduction
Objectives
On completing this session you should be able to:
Binomial Experiments
DEFINITION 6.11
A binomial experiment is one that satisfies the following four conditions:
1. The experiment consists of a fixed number n of identical trials.
2. Each trial can result in one of only two possible outcomes; one outcome is
dubbed ‘success’, the other ‘failure’.
3. The probability of success, p, remains constant from trial to trial.
4. The trials are independent of one another.
The 2nd and 3rd conditions imply that the probability of failure q equals 1 – p.
Examples of Binomial Experiments
ACTIVITY 6.9
Review the examples above and verify that the four conditions apply in each
case.
A random variable arises out of the binomial experiment. We define it in this way.
DEFINITION 6.12
The binomial random variable X is the number of successes observed in the n trials
of a binomial experiment. Note that:
• X is discrete.
In keeping with Definition 6.4, we can derive this distribution as a formula or function
that highlights the following:
• all the values assumed by X,
• their corresponding probabilities,
• the probabilities sum to 1.
The formula is developed out of our attempt to put meaning to the question
‘What does P(X = r) mean?’
P(X = r) means P(r successes in n trials) or equivalently P(r successes and n-r failures).
The event of ‘r successes and n-r failures’ can occur in several ways; each way is called a
combination of r successes and n-r failures.
Suppose, for example, that the first year students comprise exactly 1/3 of the student
population in the Faculty of Social Sciences at UWI and a small random sample of 5
students was selected from this Faculty. How do we find the probability that exactly
2 of the 5 students will be first year?
• By the addition law, the probability of the 10 combinations = 10 × (1/3)² (2/3)³.
The number of combinations, 10, is the binomial coefficient
5C2 = 5!/(2! 3!) = (5×4×3×2×1)/((2×1)×(3×2×1)) = 10.
• Thus the probability of 2 successes can be written as 5C2 × (1/3)² (2/3)³ = 80/243.
We can find the probabilities for 0, 1, 3, 4, and 5 successes respectively in much the same
manner.
• P(0 successes) = 5C0 × (1/3)⁰ (2/3)⁵ = 32/243
• P(1 success) = 5C1 × (1/3)¹ (2/3)⁴ = 80/243
• P(3 successes) = 5C3 × (1/3)³ (2/3)² = 40/243
• P(4 successes) = 5C4 × (1/3)⁴ (2/3)¹ = 10/243
• P(5 successes) = 5C5 × (1/3)⁵ (2/3)⁰ = 1/243
DEFINITION 6.13
Let X be the binomial random variable arising from a binomial experiment with
parameters n and p. The binomial probability formula is given by:
P(X = r) = nCr p^r q^(n-r),  r = 0, 1, 2, …, n.
We can compute any cumulative probability for X with the aid of the binomial
probability formula and the addition law. e.g.
• Probability of less than 2 successes = P(X < 2) = P(X = 0 or 1) = P(X = 0) + P(X = 1)
= 32/243 + 80/243 = 112/243
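The probabilities above can be reproduced with `math.comb` from the Python standard library (a sketch for checking the arithmetic; Python itself is not assumed elsewhere in the course):

```python
from math import comb   # comb(n, r) computes nCr

n, p = 5, 1/3     # 5 students sampled; 1/3 of the population is first-year
q = 1 - p

def binomial_pmf(r):
    """P(X = r) = nCr * p^r * q^(n-r)."""
    return comb(n, r) * p**r * q**(n - r)

# Reproduce the probabilities above, expressed as fractions of 243.
for r in range(n + 1):
    print(r, round(binomial_pmf(r) * 243))   # 32, 80, 80, 40, 10, 1

# Cumulative probability P(X < 2) = P(X = 0) + P(X = 1) = 112/243.
print(round(binomial_pmf(0) + binomial_pmf(1), 4))   # 0.4609
```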
Note that n and p are called the parameters of the binomial distribution. Without values
for n and p we cannot use the binomial distribution formula to compute P(X = r).
EXAMPLE 6.3:
Return to the example involving first year students from the Faculty of Social
Sciences.
Recall that the binomial random variable X was defined as the number of
successes in the 5 students who were selected.
E(X) = np = 5 × 1/3 = 5/3 (What is your interpretation of this value?)
EXAMPLE 6.4
Return to the example involving first year students from the Faculty of Social
Sciences.
Recall that the binomial random variable X was defined as the number of
successes in the 5 students who were selected.
Find the probability that between 1 and 5 students selected would be first year
students.
ACTIVITY 6.10
According to a recently conducted survey, 90% of all male drivers in Trinidad
and Tobago claim to consistently drive over the legal speed limit.
Assume that this result is true for the current male driving population of
Trinidad and Tobago. Find the probability that, in a random sample of 20
male drivers:
(i) none of them drive over the legal speed limit
(ii) at least 2 of them drive over the legal speed limit
(iii) 19 of them DO NOT drive over the legal speed limit
DEFINITION 6.14
The Poisson process fits events that are randomly scattered over time and/or space (i.e.
you cannot predict when or where an event will occur). e.g.
• power outages,
• equipment failures,
• accidents,
• hurricanes.
DEFINITION 6.15
The Poisson random variable X is the number of occurrences of the event in a
given interval of time or space.
Such variables can take on the values 0, 1, 2, … without an upper limit. These values
are countable, so the variable is clearly discrete.
The probability of a given number of occurrences depends on only one value, i.e.
the average number of occurrences in the stated interval of time or space. In fact, in a
Poisson process the only available data is the average number of occurrences in the
stated interval of time or space. This average value is denoted by λ and is called the
parameter of the Poisson distribution.
In keeping with Definition 6.4, we can derive this distribution as a formula or function
that highlights the following:
• all the values assumed by X,
• their corresponding probabilities,
• the probabilities sum to 1.
The formula is developed out of our attempt to put meaning to the question
‘What does P(X = r) mean?’
DEFINITION 6.16
Let X be the discrete random variable arising out of a Poisson process with
parameter λ. The Poisson probability formula is given by:
P(X = r) = e^(–λ) λ^r / r!,  r = 0, 1, 2, …
where e is the value 2.718………..
The formula reinforces the statement that once we know the value of λ we can compute
any probability under the Poisson distribution.
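The formula can be evaluated with the Python standard library. In the sketch below, λ = 2 occurrences per interval is an assumed, illustrative value, not one taken from the text:

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) = e^(-lambda) * lambda^r / r!  for r = 0, 1, 2, ..."""
    return exp(-lam) * lam**r / factorial(r)

lam = 2   # assumed average number of occurrences per interval

print(round(poisson_pmf(0, lam), 4))   # P(X = 0) = 0.1353
# Cumulative probability via the addition law: P(X <= 3).
print(round(sum(poisson_pmf(r, lam) for r in range(4)), 4))   # 0.8571
```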
Example: 6.5
Cumulative probabilities can be computed using the Poisson probability distribution
and the addition law.
Example: 6.6
Find the probability that there will be at most 3 accidents at the Roundabout in
March 2002.
P(X ≤ 3) = P( X = 0 or 1 or 2 or 3)
ACTIVITY 6.11
Calls reaching a telephone exchange over the past five years average 10
per minute.
Find the probability that
a) no calls are received during a one-minute period.
b) between 5 and 9 calls inclusive are received during a one-minute
period.
Recall that
For such a relationship to apply to the binomial it follows that np ≈ npq; this is possible
when q is approximately 1 (which implies that p is approximately 0).
To execute the approximation we simply set the parameter λ = np and substitute into the
Poisson distribution formula.
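The quality of the approximation can be illustrated with a short sketch (Python assumed) comparing an exact binomial probability with its Poisson counterpart when p is small; the choices r = 5, p = 0.02 and n = 100 are for illustration:

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """Exact binomial probability, nCr * p^r * q^(n-r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    """Poisson probability, e^(-lambda) * lambda^r / r!."""
    return exp(-lam) * lam**r / factorial(r)

# Small p, moderately large n: the Poisson with lambda = n*p
# should be close to the exact binomial value.
n, p, r = 100, 0.02, 5
exact = binomial_pmf(r, n, p)
approx = poisson_pmf(r, n * p)
print(round(exact, 4), round(approx, 4))  # 0.0353 0.0361
```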
ACTIVITY 6.12
1. The table below shows the comparative values of P(X = 5) when p = 0.02
for three values of n, viz. 50, 100 and 200.
          Binomial Probability   Poisson Approximation   Error    % Error
n = 50    0.0027                 0.0031                  .0004    15
n = 100   0.0353                 0.0361                  .0008    2
n = 200   0.1579                 0.1563                  .0016    1
Summary
Binomial and Poisson probability distributions were the focus of this session. We
defined the Binomial experiment and the Binomial probability distributions using the
formula:
P(r successes in n trials) = nCr p^r q^(n-r)
We went on to define the conditions for the Poisson process, the Poisson probability
formula and the uses of the Poisson distribution.
Session 3
The Normal Distribution
Introduction
The normal distribution is first and foremost a distribution for continuous random
variables. It is the most widely used continuous distribution and hence it is of great
importance in inferential statistics. In fact, a large number of phenomena in the real
world are either exactly or approximately normally distributed. Variables such as
heights, weights, time intervals, weights of packages, productive life of bulbs etc. all
possess a normal distribution. This session focuses on computation, interpretation and
application of the normal distribution.
Objectives:
At the end of this session, students must be able to:
Let X be the normal random variable. Clearly X is continuous.
All probabilities in the normal distribution are given as areas under the probability
density curve since that is the paradigm used for probability of continuous random
variables. Typical areas under the normal curve are as follows:
• P(X is less than μ) = .50;
• P(X is greater than μ) = .50;
For a normal random variable X, the expected value E(X) = μ and the variance Var(X) =
σ²; hence the standard deviation of X = σ.
Since about 99.7% of the area is trapped within 3 standard deviations of the mean μ, we say
that the range of X is approximately 6σ.
Each combination of μ and σ gives rise to a unique normal curve referred to as N(μ, σ).
Hence μ and σ are called the parameters of the normal distribution; in fact, no probability
can be computed under the normal distribution without values for μ and σ.
Because the combinations of μ and σ can take on infinitely many values, we recognise
that there exists an infinitely large family of normal curves. The areas under each normal
curve N(μ, σ) can be presented in a cumulative probability table. Does this suggest that
we need to access a book containing infinitely many cumulative normal probability
tables? No. The reality is that we need only one probability table, i.e. the Table of the
Standard Normal Distribution. What is this distribution?
DEFINITION 6.17
The standard normal distribution is the normal distribution with mean μ = 0 and
standard deviation σ = 1, i.e. N(0, 1).
DEFINITION 6.18
The random variable that possesses the standard normal distribution is called
the Standard Normal Variable and it is denoted by Z.
• The values of Z are located on the horizontal axis of the standard normal
curve, and are called Z scores otherwise called standard scores.
• Each Z score gives the distance on the z axis between the mean and the point
denoted by z in terms of standard deviations (the unit of measure of z is
therefore standard deviations)
Look at your copy of the Table of the Standard Normal Distribution and
double-check these five areas.
3. Let the corresponding Z scores be z1 and z2.
5. Read off the area from the table of the standard normal curve.
ACTIVITY 6.13
Use the Table of the Standard Normal Distribution to find the following
probabilities:
P(Z < 1.9) P(Z > 2.1)
P(1.9 < Z < 2.1) P(Z > -1.9)
P(-1.9 < Z < 1.9) P(Z < -2.1)
P(0 < Z < 0.44)
In the interest of much needed practice you should also attempt the next two activities.
ACTIVITY 6.14
Use the Table of the Standard Normal Distribution to find the following
probabilities:
• P( Z < 1.48)
• P( Z < -0.93)
• P( Z > 0.50)
• P( Z > -1.66)
• P( -2.15 < Z < 1.96)
• P(0.51 < Z < 1.12)
• P( -1.35 < Z < -0.64)
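Table look-ups of this kind can be double-checked with `statistics.NormalDist` from the Python standard library (Python 3.8+; a checking aid only, not a substitute for learning to read the table). The z scores below are taken from Activity 6.14:

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # the standard normal variable Z

print(round(Z.cdf(1.48), 4))                 # P(Z < 1.48) = 0.9306
print(round(1 - Z.cdf(0.50), 4))             # P(Z > 0.50) = 0.3085
print(round(Z.cdf(1.96) - Z.cdf(-2.15), 4))  # P(-2.15 < Z < 1.96) = 0.9592
```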
In Activities 6.13 and 6.14, you were required to work from the periphery of the standard
normal table to the interior of the table. You must also be able to work from the interior
to the periphery. This is the objective of the next activity.
ACTIVITY 6.15
Find z for each of the following:
• P( Z < z) = .80
The binomial distribution formula requires some ‘clumsy’ computations of the form
nCr and becomes very tedious when n is large.
We explored earlier in this unit the Poisson approximation to the Binomial; this
approximation could also be very tedious when n is large.
Recall
Variable Type      Mean   Variance
Binomial           np     npq
Standard Normal    0      1
What then (we might ask) links the Binomial variable with the Standard Normal
variable? The answer is the Central Limit Theorem (CLT) which we describe as follows:
We will find that the outline of the histogram will approximate the curve of the Standard
Normal distribution i.e. symmetric and bell-shaped, mean = 0, variance = 1.
This is the guarantee provided by the CLT and forms the basis of the approximation.
Mindful of the fact that we are using a distribution for a continuous variable to
approximate the distribution of a discrete variable, we must ‘correct’ the discrete
variable so that it is seen to be continuous; this is done via a Continuity Correction.
DEFINITION 6.19
The continuity correction is the adjustment of 0.5 made to the end points of an
interval of a discrete variable so that the interval can be treated as one for a
continuous variable.
The subtraction of 0.5 from the end point X = r1 and the addition of 0.5 to the end
point X = r2 allows us to rewrite
P(r1 ≤ X ≤ r2),
i.e. the probability of between r1 and r2 successes,
as
P(r1 - 0.5 < X < r2 + 0.5);
i.e. the probability that the ‘continuous’ variable lies between r1 – 0.5 and r2 + 0.5
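The continuity correction can be seen in a minimal sketch (Python; the values n = 100, p = 0.5 and the interval 45 to 55 are assumed purely for illustration):

```python
from math import comb
from statistics import NormalDist

# Illustrative binomial variable: n = 100 trials with p = 0.5.
n, p = 100, 0.5
mu = n * p                          # binomial mean, np
sigma = (n * p * (1 - p)) ** 0.5    # binomial standard deviation, sqrt(npq)

# Exact binomial probability P(45 <= X <= 55).
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(45, 56))

# Normal approximation with the continuity correction:
# P(45 <= X <= 55) becomes P(44.5 < Y < 55.5) for the normal variable Y.
Z = NormalDist()
approx = Z.cdf((55.5 - mu) / sigma) - Z.cdf((44.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```

The two printed values agree closely, which is the guarantee provided by the CLT.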
• Sketch a curve of the standard normal distribution and shade the area
that corresponds to P( z1 < Z < z2)
• Read off the area from the standard normal table.
ACTIVITY 6.16
PRACTICE ASSIGNMENT #1
According to a recently conducted survey, 90% of all drivers in Trinidad and Tobago
claim to consistently drive over the legal speed limit. Assume that this result is true for
the current population of a Caribbean country. Find the probability that, in a random
sample of 20 drivers:
PRACTICE ASSIGNMENT #2
An average of 0.8 accidents occur every day on the roads of Trinidad and Tobago. Find
the probability that
PRACTICE ASSIGNMENT #3
In a recent poll, 81% of parents with children under 18 years of age said that it is more
difficult to raise children now than it was when their parents raised them. Assume that
this percentage is true for the current population of all parents with children under 18
years of age. Find the probability that, in a random sample of 1000 such parents, less
than 800 will hold the above view.
PRACTICE ASSIGNMENT #4
Let X and Y denote two random variables that are normally and independently
distributed with means of 75 and 40 respectively, and standard deviations of 5 and 4
respectively. Define a new random variable U = 3X-2Y.
PRACTICE ASSIGNMENT #5
Let X be the number of cars that a randomly selected auto mechanic repairs on a given
day. The following table lists the probability distribution of X.
x          2     3     4     5     6
P(X = x)   0.05  0.22  *     0.23  0.10
Summary
This final session examined the random variables that are normally distributed,
parameters of the normal distribution and the use of the table of the standard normal
distribution to find probabilities. We also discussed use of the normal distribution to
approximate the binomial distribution.
Wrap Up
In this unit, we have defined random variables and provided a basis for classifying
them as discrete or continuous. We have constructed both the probability distribution
and the cumulative probability distribution for a discrete random variable; defined and
interpreted expected value and variance of a discrete random variable. We also defined
binomial experiments and Poisson processes from which the binomial distribution and
the Poisson distribution respectively arise. The probability distribution formulae for
both these distributions were presented.
We introduced the paradigm by which we conceptualise probability for continuous
random variables; defined the probability density function and related the probability
distribution function to the probability density function for a continuous random
variable; and extended the concepts of expected value and variance to continuous
random variables.
We also interpreted the properties of expected value and variance for both discrete and
continuous random variables.
Special focus was given to the normal distribution, the Poisson approximation to the
binomial distribution and the normal approximation to the binomial distribution.
This unit has introduced several fundamental issues that will be used in Units 7 to 9.