
DEPARTMENT OF MATHEMATICS AND STATISTICS

Memorial University of Newfoundland, St. John’s, Newfoundland
CANADA A1C 5S7    ph. (709) 737-8075    fax (709) 737-3010

Alwell Julius Oyet, PhD    email: aoyet@math.mun.ca

STATISTICS FOR PHYSICAL SCIENCES - LECTURE NOTES

3. DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

§3.1 Random Variables

In any experiment, it is common for the experimenter to focus on either the outcomes of the
experiment or some characteristic or feature of the outcomes. For instance, if we conduct a simple
experiment of tossing a balanced coin twice, we may be interested in

(a) the possible outcomes of the experiment, or

(b) the number of tails that will appear.

We have studied (a) extensively over the past few weeks. Thus, we can say that the set of all possible
outcomes of the coin experiment, called the sample space, is

S = {HH, HT, T H, T T }.

Now, let us define the feature of interest as

X(·) = number of tails in two tosses.

Then, if the outcome was HH, we will find that X(HH) = 0 since this outcome has no tails.
Similarly,
X(HT ) = 1, X(T H) = 1 and X(T T ) = 2.

That is, the function X assigns the values 0, 1 and 2 to the outcomes in S. To fix ideas, let us
consider the experiment in Problem 2, Page 57 with sample space
 
S = { (R, R, R), (R, R, L), (R, R, S), (R, L, R), (R, L, L), (R, L, S), (R, S, R), (R, S, L), (R, S, S),
      (L, R, R), (L, R, L), (L, R, S), (L, L, R), (L, L, L), (L, L, S), (L, S, R), (L, S, L), (L, S, S),
      (S, R, R), (S, R, L), (S, R, S), (S, L, R), (S, L, L), (S, L, S), (S, S, R), (S, S, L), (S, S, S) }.

Define the rule,

X = number of cars that turned right.

We find that

X(R, R, R) = 3, X(R, L, R) = 2, X(R, S, L) = 1, X(L, L, L) = 0, and so on.

If we enumerate all the values that X assigns to the outcomes in S, we will find that the possible
values of X are 0, 1, 2, and 3. The value 0 will occur 8 times because there are eight outcomes with
no R; 1 will occur 12 times; 2 will occur 6 times and 3 will occur only once. In this case, x = 0, 1, 2, 3.
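The counts quoted above can be checked by brute-force enumeration; a minimal Python sketch (the variable names are ours, not from the text):

```python
from itertools import product
from collections import Counter

# Enumerate the 27 outcomes of the three-car experiment and count
# X = number of cars that turned right, for each outcome.
outcomes = list(product("RLS", repeat=3))
counts = Counter(outcome.count("R") for outcome in outcomes)

print(len(outcomes))                 # 27 outcomes in S
print(dict(sorted(counts.items()))) # {0: 8, 1: 12, 2: 6, 3: 1}
```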

We observe that elements of S are assigned different numerical values by the function X. That
means, X is a variable. Secondly, before we actually perform the experiment of tossing the coin, we
do not know what the outcome will be and hence the value of X is also unknown. The value of X
will depend on the observed outcome. The variable X is thus said to be random.

We also note that it is possible to define another rule on S, such as

Y = number of cars that went straight ahead, or

Z = number of cars that turned left.

X, Y , and Z are all rules defined on the same sample space. These rules assign numerical values to
each outcome of the sample space. By definition, any rule which assigns a number to each outcome
of a sample space will be called a random variable. By convention, we shall use uppercase letters to
represent random variables and lowercase letters to represent their possible values.

Definition: Let S be a given sample space of an experiment. A random variable is any function
or rule which assigns a number to each outcome in S.

Example: Consider the experiment in which batteries coming off an assembly line were examined
until a ‘good’ one (S) was obtained with sample space S = {S, FS, FFS, FFFS,· · ·}. Define the
random variable

X = number of batteries examined before the experiment terminates.

Then
X(S) = 1, X(F S) = 2, X(F F S) = 3, · · · .

This implies that the possible values of X are 1, 2, 3, 4, 5, · · · , an infinite sequence of numbers (the set of
natural numbers). In our previous examples the sets of all possible values were finite sequences of
numbers. For the coin tossing experiment the set was {0, 1, 2}, and for the car experiment it was {0, 1, 2, 3}.

A common feature of these random variables is that the set of numbers they assign to the out-
comes of S is either finite or, in the infinite case, the numbers can be listed in a sequence. That is,
there is a first element, a second element, a third element, and so on. Such a set is said to be discrete.

Definition: A random variable is said to be discrete if its set of possible values is a discrete set.
Examples

1. For each of the random variables defined below, describe the set of possible values for the
variable and state whether the variable is discrete.

(a) X = the number of students on a class list for STAT 2510 who were absent on the first day
of classes.

(b) Y = the distance between a point target and a shot aimed at the point in a coin-operated
target game.

(c) Consider the outcome of a student's attempt to log on to morgan. Let Z = the number of
successes in one attempt.

Solution:

(a) x = 0, 1, 2, 3, . . . , N , where N is the total number of students on the class list. The variable
X is discrete.

(b) a ≤ y ≤ b where a is the smallest possible distance and b is the largest possible distance.
The variable Y is not discrete.

(c) z = 0, 1. The variable Z is discrete.

Definition: Any random variable whose only possible outcomes are 0 and 1 is called a
Bernoulli random variable.

2. Problem 10, Page 101

The number of pumps in use at both a six-pump station and a four-pump station will be
determined. Give the possible values for each of the following random variables.

(a) T = the total number of pumps in use.

(b) X = the difference between the number of pumps in use at Stations 1 and 2.

(c) U = the maximum number of pumps in use at either station.

(d) Z = the number of stations having exactly two pumps in use.

Solution:

(a) t = 0, 1, 2, . . . , 10.

(b) x = −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6.

(c) u = 0, 1, 2, 3, 4, 5, 6.

(d) z = 0, 1, 2.

§3.2 Probability Distributions For Discrete Random Variables


Recall the coin tossing experiment in which S = {HH, HT, T H, T T } and

X(·) = number of tails in two tosses.

We found that the possible values of X are x = 0, 1, 2. Now, since one out of the four outcomes in S
has the value x = 0, we can say that

p(0) = P (X = 0) = 1/4.

Similarly, since two of the outcomes in S have the value x = 1 and one of the outcomes has the value
x = 2, we can write

p(1) = P (X = 1) = 2/4 and p(2) = P (X = 2) = 1/4.

These probabilities can also be tabulated as follows

x      0    1    2
p(x)  1/4  1/2  1/4

or written in the form

p(x) = 1/4, if x = 0 or x = 2
       1/2, if x = 1
       0,   otherwise.

That is, in repeated performance of this experiment, the outcomes corresponding to each of the
values x = 0 and x = 2 are expected to occur 25% of the time and so on.

In order to gain a better understanding of what we are doing, let us consider the experiment involving
cars turning right, straight or left. Recall that one of the rules we defined on the sample space of the
experiment was

X = number of cars that turned right.

We also found that the possible values of X are x = 0, 1, 2, 3. Now, since there are eight (8) outcomes
in S with no right turns, we have that p(0) = P (X = 0) = 8/27. Similarly we have,

p(1) = P (X = 1) = 12/27, p(2) = P (X = 2) = 6/27, p(3) = P (X = 3) = 1/27.

That is,

x      0     1      2     3
p(x)  8/27  12/27  6/27  1/27
or

p(x) = 8/27,  if x = 0
       12/27, if x = 1
       6/27,  if x = 2
       1/27,  if x = 3
       0,     otherwise.

Again, we find that a probability of 8/27 is associated with the X value 0, etc. The values of a ran-
dom variable, say X, along with their probabilities collectively specify the probability distribution
or probability mass function of X.

Example: Consider the mortgage example with a sample space of 16 outcomes S = {FFFF, FFFV,
FFVV, FVVV, VVVV, VVVF, VVFF, VFFF, VFFV, VFVV, VVFV, FFVF, FVVF, FVFV, VFVF,
FVFF}. Define the function or rule on S:

X(·) = number of fixed mortgages in a sample of 4.

Based on this definition, we find that

X(F F F F ) = 4; X(F F F V ) = X(V F F F ) = X(F F V F ) = X(F V F F ) = 3;


X(F F V V ) = X(V V F F ) = X(V F F V ) = X(F V V F ) = X(F V F V ) = X(V F V F ) = 2;
X(F V V V ) = X(V V V F ) = X(V F V V ) = X(V V F V ) = 1; X(V V V V ) = 0.

Thus, the possible values of X are 0, 1, 2, 3 and 4 and

p(0) = P (X = 0) = 1/16;  p(1) = P (X = 1) = 4/16 = 1/4;  p(2) = P (X = 2) = 6/16 = 3/8;
p(3) = P (X = 3) = 4/16 = 1/4;  p(4) = P (X = 4) = 1/16.

It is common to represent this in the forms:

x      0     1    2    3    4
p(x)  1/16  1/4  3/8  1/4  1/16

or

p(x) = 1/16, if x = 0 or 4
       1/4,  if x = 1 or 3
       3/8,  if x = 2
       0,    otherwise.

That is, a probability of 1/16 is placed on the X values 0 and 4; a probability of 1/4 is distributed to
the X values 1 and 3; and a probability of 3/8 is associated with the X value 2.

Observe that the probabilities in each of the examples satisfy the following conditions:

(a) p(x) ≥ 0, for all values of x.

(b) Σ p(x) = 1, where the sum is over all possible values x.

Note that any probability mass function (pmf) that does not satisfy these two conditions is not a
legitimate pmf.
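The two conditions are easy to check mechanically. A minimal sketch (the helper name is ours, not from the text):

```python
def is_legitimate_pmf(p, tol=1e-9):
    """Check the two conditions a pmf must satisfy:
    (a) p(x) >= 0 for all x, and (b) the probabilities sum to 1."""
    values = list(p.values())
    return all(v >= 0 for v in values) and abs(sum(values) - 1) < tol

# The coin-tossing pmf from above satisfies both conditions...
coin = {0: 0.25, 1: 0.5, 2: 0.25}
print(is_legitimate_pmf(coin))               # True
# ...while a table whose probabilities do not sum to 1 fails.
print(is_legitimate_pmf({0: 0.7, 1: 0.6}))   # False
```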
Examples

Problem 15, Page 110

Many manufacturers have quality control programs that include inspection of incoming mate-
rials for defects. Suppose a computer manufacturer receives computer boards in lots of five.
Two boards are selected from each lot for inspection. We can represent possible outcomes of
the selection process by pairs. For example, the pair (1,2) represents the selection of boards 1
and 2 for inspection.

(a) List the ten different possible outcomes.

(b) Suppose that boards 1 and 2 are the only defective boards in a lot of five. Two boards are
to be chosen at random. Define X to be the number of defective boards observed among those
inspected. Find the probability distribution of X.

Solution:

(a) S = {(1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5)}.

(b) Define X = number of defective boards observed among those inspected. Then, from part
(a) we find that x = 0, 1, 2. Now

p(0) = 3/10;  p(1) = 6/10;  p(2) = 1/10.

Thus the pmf of X is given by

x      0     1     2
p(x)  3/10  6/10  1/10

or

p(x) = 3/10, if x = 0
       6/10, if x = 1
       1/10, if x = 2
       0,    otherwise.
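Because the ten pairs are equally likely, this pmf can be recovered by direct enumeration; a minimal Python sketch (names are ours, not from the text):

```python
from itertools import combinations
from fractions import Fraction
from collections import Counter

# Boards 1 and 2 are defective; two of the five boards are inspected.
defective = {1, 2}
pairs = list(combinations(range(1, 6), 2))   # the 10 equally likely outcomes
counts = Counter(len(defective & set(pair)) for pair in pairs)
pmf = {x: Fraction(n, len(pairs)) for x, n in sorted(counts.items())}
print(pmf)  # {0: Fraction(3, 10), 1: Fraction(3, 5), 2: Fraction(1, 10)}
```

Note that 6/10 is reduced automatically to 3/5 by Fraction.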

Problem

Consider a group of five potential blood donors - A, B, C, D, and E - of whom only A and B
have type O+ blood. Five blood samples, one from each individual, will be typed in random
order until an O+ individual is identified. Let the rv Y = the number of typings necessary to

identify an O+ individual. Find the pmf of Y .

Solution:

Using a tree diagram, it is easy to see that y = 1, 2, 3, 4 and that

y      1    2     3    4
p(y)  2/5  3/10  1/5  1/10

or

p(y) = 2/5,  if y = 1
       3/10, if y = 2
       1/5,  if y = 3
       1/10, if y = 4
       0,    otherwise.
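In place of a tree diagram, the same pmf can be obtained by enumerating all 5! equally likely typing orders; a sketch (names ours):

```python
from itertools import permutations
from fractions import Fraction
from collections import Counter

# A and B are the O+ donors; the samples are typed in random order,
# so all 120 orderings of the five donors are equally likely.
o_positive = {"A", "B"}
orders = list(permutations("ABCDE"))
first_hit = Counter(
    next(i for i, donor in enumerate(order, start=1) if donor in o_positive)
    for order in orders
)
pmf = {y: Fraction(n, len(orders)) for y, n in sorted(first_hit.items())}
print(pmf)  # {1: Fraction(2, 5), 2: Fraction(3, 10), 3: Fraction(1, 5), 4: Fraction(1, 10)}
```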

Cumulative Distribution Function

Consider the coin tossing experiment with pmf


x 0 1 2

1 1 1
p(x) 4 2 4

We observe that the probability that X will be equal to a value of x that is less than 0 is equal to
zero. That is, P (X < 0) = 0. Furthermore, the probability that X will take a value between 0 and
1 (excluding the endpoints) will also be zero. That is, P (0 < X < 1) = 0. It then follows that,

P (X ≤ 0) = P (X ≤ 0.5) = P (X ≤ 0.789) = . . . = P (X ≤ 0.99999) = P (X = 0) = 1/4.

Similarly, we find that

P (X ≤ 1) = P (X ≤ 1.5) = P (X ≤ 1.789) = . . . = P (X ≤ 1.99999) = P (X = 0) + P (X = 1) = 3/4.

Now,

P (X ≤ 2) = P (X = 0) + P (X = 1) + P (X = 2) = 1.

Since the largest possible value of X is x = 2, we find that

P (X ≤ 2.005) = P (X ≤ 2.3) = P (X ≤ 2.999999) = P (X ≤ 5)


= P (X ≤ any number greater than 2)
= P (X = 0) + P (X = 1) + P (X = 2) = 1.

Let us denote P (X ≤ x) by F (x). Then, we can summarize these results in the following way.

F (x) = 0,   if x < 0
        1/4, if 0 ≤ x < 1
        3/4, if 1 ≤ x < 2
        1,   if x ≥ 2.

Applying the same principle to the computer boards Problem 15, Page 108 we find that

F (x) = 0,    if x < 0
        3/10, if 0 ≤ x < 1
        9/10, if 1 ≤ x < 2
        1,    if x ≥ 2.

We will refer to F (x) as the cumulative distribution function (cdf) of a discrete random variable X.
By definition, the cdf of a discrete rv X, for every number x, is given by

F (x) = P (X ≤ x) = Σ_{all y ≤ x} p(y),

where p(·) is the pmf of X.
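The definition amounts to accumulating the pmf in ascending order of the possible values; a minimal sketch (helper name ours):

```python
from fractions import Fraction

def cdf_from_pmf(pmf):
    """Accumulate a pmf (dict: value -> probability) into the cdf values
    F(x) at each possible value x, taken in ascending order."""
    total, cdf = Fraction(0), {}
    for x in sorted(pmf):
        total += pmf[x]
        cdf[x] = total
    return cdf

# Coin-tossing example: F(0) = 1/4, F(1) = 3/4, F(2) = 1.
coin = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(cdf_from_pmf(coin))
```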

Examples

Problem 14, Page 109

A contractor is required by a county planning department to submit one, two, three, four, or
five forms (depending on the nature of the project) in applying for a building permit. Let Y =
the number of forms required of the next applicant. The probability that y forms are required
is known to be proportional to y - that is p(y) = ky for y = 1, . . . , 5.

(a) What is the value of k ?

(b) What is the probability that at most three forms are required ?

(c) What is the probability that between two and four forms (inclusive) are required ?
(d) Could p(y) = y²/50 for y = 1, . . . , 5 be the pmf of Y ?

Solution:

(a) We will use the fact that any legitimate pmf must satisfy the condition

Σ_{all x values} p(x) = 1

to find the value of k. Now,

Σ_{y=1}^{5} p(y) = Σ_{y=1}^{5} ky = k(1 + 2 + 3 + 4 + 5) = 15k = 1.

Thus, k = 1/15. That is,

p(y) = y/15, if y = 1, 2, . . . , 5
       0,    otherwise.

(b) Let Y = number of forms required of the next applicant. Then, the required probability is

F (3) = P (Y ≤ 3) = P (Y = 1) + P (Y = 2) + P (Y = 3)
      = 1/15 + 2/15 + 3/15 = 6/15 = 0.4.

(c) In this part, the required probability is

P (2 ≤ Y ≤ 4) = P (Y = 2) + P (Y = 3) + P (Y = 4)
              = 2/15 + 3/15 + 4/15 = 9/15 = 0.6.

(d) In this part, we check the two conditions that p(y) must satisfy.

First,

p(y) = y²/50 > 0, for all values of y.

Secondly,

Σ_{y=1}^{5} p(y) = Σ_{y=1}^{5} y²/50 = 1/50 + 4/50 + 9/50 + 16/50 + 25/50 = 55/50 > 1.

Since p(y) fails to satisfy the second condition, it cannot be the pmf of Y .

OTHER PRACTICE PROBLEMS

1. Problem

Let X = the number of tires on a randomly selected automobile that are underinflated.

(a) Which of the following three p(x) functions is a legitimate pmf of X, and why are the other
two not allowed ?

x 0 1 2 3 4
p(x) 0.3 0.2 0.1 0.05 0.05
p(x) 0.4 0.4 0.1 0.4 0.3
p(x) 0.4 0.1 0.2 0.1 0.3

(b) For the legitimate pmf of part a, compute P (2 ≤ X ≤ 4), P (X ≤ 2), and P (X ≠ 0).

(c) If p(x) = c · (5 − x) for x = 0, 1, · · · , 4, what is the value of c ?

2. Problem 18, Page 110

Two fair six-sided dice are tossed independently. Let M = the maximum of the two tosses (so
M(1, 5) = 5, M (3, 3) = 3, etc.).

a. What is the pmf of M ?

b. Determine the cdf of M and graph it.

We note that given the pmf of any random variable, we are able to construct the cdf of the rv. A
question which arises naturally is whether we can do the opposite. That is, given the cdf, can we
obtain the pmf ? The answer is yes. For instance, in Problem 15, Page 108, we find that

P (X ≤ 0) = P (X = 0).

3
Thus, p(0) = P (X = 0) = F (0) = P (X ≤ 0) = 10
. Furthermore,

P (X ≤ 1) = P (X = 0) + P (X = 1).

Therefore,
9 3 6
p(1) = P (X = 1) = P (X ≤ 1) − P (X = 0) = P (X ≤ 1) − P (X ≤ 0) = F (1) − F (0) = − = .
10 10 10
Similarly, we find that
1
p(2) = P (X = 2) = P (X ≤ 2) − P (X ≤ 1) = F (2) − F (1) = 1 − f rac910 = .
10

The idea we have applied here can be generalized. First, we identify the possible values of X from
the cdf. Then we compute the corresponding probabilities by subtraction. That is, in general, if a is
any one of the possible values of X (taking the possible values to be integers, as in our examples), we
have

p(a) = P (X = a) = F (a) − F (a − 1).

Also, let a and b be any two integers such that a ≤ b. Then,

P (a ≤ X ≤ b) = F (b) − F (a − 1).
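Recovering the pmf from the cdf is just a matter of differencing consecutive cdf values; a minimal sketch (helper name ours):

```python
def pmf_from_cdf(cdf):
    """Recover p(a) = F(a) - F(previous value) by differencing consecutive
    cdf values. `cdf` maps each possible value of X to F at that value."""
    pmf, previous = {}, 0.0
    for x in sorted(cdf):
        pmf[x] = cdf[x] - previous
        previous = cdf[x]
    return pmf

# Computer-board example: F(0) = 3/10, F(1) = 9/10, F(2) = 1.
print(pmf_from_cdf({0: 0.3, 1: 0.9, 2: 1.0}))
# approximately {0: 0.3, 1: 0.6, 2: 0.1} (up to floating-point rounding)
```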

Example

Problem 23, Page 110

An insurance company offers policyholders a number of different premium payment options. For
a randomly selected policyholder, let X = the number of months between successive payments.
The cdf of X is as follows:


F (x) = 0,    if x < 1
        0.3,  if 1 ≤ x < 3
        0.4,  if 3 ≤ x < 4
        0.45, if 4 ≤ x < 6
        0.6,  if 6 ≤ x < 12
        1,    if 12 ≤ x.

(a) What is the pmf of X ?

(b) Using just the cdf, compute P (3 ≤ X ≤ 6) and P (4 ≤ X).

Solution:

(a) From the cdf we find that the possible values of X are the lower limits of the range of values
of x in the cdf defined above. That is, possible values of X are, x = 1, 3, 4, 6, and 12. Now,

p(1) = P (X = 1) = F (1) − F (0) = 0.3 − 0 = 0.3.
p(3) = P (X = 3) = F (3) − F (2) = 0.4 − 0.3 = 0.1.
p(4) = P (X = 4) = F (4) − F (3) = 0.45 − 0.4 = 0.05.
p(6) = P (X = 6) = F (6) − F (5) = 0.6 − 0.45 = 0.15.
p(12) = P (X = 12) = F (12) − F (11) = 1 − 0.6 = 0.4.

Thus, the pmf of X is given by

x 1 3 4 6 12

p(x) 0.3 0.1 0.05 0.15 0.4

Recall that a Bernoulli r. v. can only take the values 0 and 1. The general form of the pmf of
Bernoulli r. v.’s is

p(x; α) = 1 − α, if x = 0
          α,     if x = 1
          0,     otherwise,

where 0 < α < 1. The quantity α is called a parameter of the distribution. Different values of α yield
different pmf’s. Thus, the general form is called the family of Bernoulli distributions. For example

p(x; 0.6) = 0.4, if x = 0          p(x; 0.5) = 0.5, if x = 0
            0.6, if x = 1                      0.5, if x = 1
            0,   otherwise;                    0,   otherwise.
§3.3 Expected Values of Discrete Random Variables

Referring back to the experiment of tossing a fair coin twice, we might want to know how many heads
will occur, on the average, or the number of heads we expect to occur. In that case, we have to
compute the average of the possible values of the random variable X = number of heads in two
tosses. Recall that the possible values of X are:

X(HH) = 2; X(HT ) = 1; X(T H) = 1; X(T T ) = 0.

The average of these values is, Mean = (2 + 1 + 1 + 0)/4 = 1. That is, on the average, only a single head
will occur in two tosses or we expect a single head in two tosses. Now, given the pmf of X, we can
also compute the mean value of X. The pmf of X is

x      0     1    2
p(x)  0.25  0.5  0.25

The expected value or mean value of X, denoted by E(X) or µX or simply µ, can then be computed
as

µ = E(X) = 0 × 0.25 + 1 × 0.5 + 2 × 0.25 = 1.

That is, we multiply each value x by the corresponding probability and then sum to find the mean
value. In general, the mean value of a discrete random variable X is defined by

µ = E(X) = Σ_{all x values} x · p(x).
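The definition translates directly into code; a minimal sketch (helper name ours):

```python
def expected_value(pmf):
    """E(X) = sum of x * p(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())

# Coin-tossing example: one head expected, on average, in two tosses.
print(expected_value({0: 0.25, 1: 0.5, 2: 0.25}))  # 1.0
```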

Example

Problem 29, Page 118

An individual who has automobile insurance from a certain company is randomly selected. Let
Y be the number of moving violations for which the individual was cited during the last three
years. The pmf of Y is

y 0 1 2 3
p(y) 0.60 0.25 0.1 0.05

(a) Compute E(Y ).

(b) Suppose an individual with Y violations incurs a surcharge of $100Y 2 . Calculate the ex-
pected amount of the surcharge.

Solution:

(a) Using the formula, we find that

µ = E(Y ) = Σ_{all y values} y · p(y)
  = 0 × 0.6 + 1 × 0.25 + 2 × 0.1 + 3 × 0.05 = 0.6.

(b) We can solve part (b) in two ways. We will use the fact that if h(X) is any function of a
discrete rv X, then

µ_h(X) = E[h(X)] = Σ_{all x values} h(x) · p(x).

In this example, h(Y ) = 100Y ² is the surcharge. It follows that

µ_h(Y ) = E[h(Y )] = Σ_{all y values} h(y) · p(y) = Σ_{y=0}^{3} 100y² × p(y)
        = 100 × 0² × 0.6 + 100 × 1² × 0.25 + 100 × 2² × 0.1 + 100 × 3² × 0.05 = 110.00.

That is, the expected or mean surcharge is $110.00. Alternatively, we can rewrite E[h(Y )] =
E(100Y ²) = 100E(Y ²). Then,

µ_h(Y ) = 100E(Y ²) = 100 Σ_{y=0}^{3} y² × p(y)
        = 100 × [0² × 0.6 + 1² × 0.25 + 2² × 0.1 + 3² × 0.05] = 110.00.


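The rule E[h(X)] = Σ h(x) p(x) can be sketched generically; both parts of the surcharge example then fall out of one helper (names ours):

```python
def expected_h(pmf, h):
    """E[h(X)] = sum of h(x) * p(x) over all possible values x."""
    return sum(h(x) * p for x, p in pmf.items())

# Moving-violations pmf from the example above.
pmf = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}
print(expected_h(pmf, lambda y: y))            # E(Y) = 0.6
print(expected_h(pmf, lambda y: 100 * y**2))   # expected surcharge: 110.0
```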

We now state a rule we can use in computing mean values of linear functions of random variables.

Rule:
Let a and b be constants. If h(X) = aX + b, then E[h(X)] = E(aX + b) = aE(X) + b. In particular,

1. E(aX) = aE(X).

2. E(X + b) = E(X) + b.

Variance of a discrete rv

Another important special case is when h(X) is the square of the deviation of X from the mean
value. That is,

h(X) = (X − µ)², where µ = E(X).

In this case, the expected value of the squared deviation is denoted by σ²_X or V (X), and called the variance
of X. That is, the variance of X is given by

σ²_X = V (X) = E[(X − µ)²] = Σ_{all x values} (x − µ)² · p(x).
If we arrange the values of X in ascending order, the mean value acts as a balance point for the
distribution. The variance then measures how spread out the values of X are about the mean value.
If the values of X are close together, the variance will be small, and large if the X values are far
apart. Thus, the variance of X is a measure of the dispersion of the values of X from the mean. It
can be used as an estimate of the total variability in the observed responses from our experiment.

It is easier to use a shortcut formula for calculating the variance of a random variable. The formula,
which can be easily derived, is given by

σ²_X = V (X) = E[(X − µ)²] = E(X²) − [E(X)]².
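The shortcut form and the definition must agree; a minimal sketch checking both on the coin example (names ours):

```python
def variance(pmf):
    """V(X) = E(X^2) - [E(X)]^2, the shortcut form of E[(X - mu)^2]."""
    mean = sum(x * p for x, p in pmf.items())
    mean_sq = sum(x**2 * p for x, p in pmf.items())
    return mean_sq - mean**2

# Compare against the definition E[(X - mu)^2] directly.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
mu = sum(x * p for x, p in pmf.items())
definition = sum((x - mu)**2 * p for x, p in pmf.items())
print(variance(pmf), definition)  # both 0.5
```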

Example

Problem 31, Page 118

An appliance dealer sells three different models of upright freezers having 13.5, 15.9 and 19.1
cubic feet of storage space, respectively. Let X = the amount of storage space purchased by
the next customer to buy a freezer. Suppose that X has pmf

x 13.5 15.9 19.1
p(x) 0.2 0.5 0.3

(a) Compute E(X), E(X 2 ), V (X).

(b) If the price of a freezer having a capacity X cubic feet is 25X − 8.5, what is the expected
price paid by the next customer to buy a freezer ?

(c) What is the variance of the price 25X − 8.5 paid by the next customer ?

Solution:

(a)

E(X) = 13.5 × 0.2 + 15.9 × 0.5 + 19.1 × 0.3 = 16.38.

E(X²) = 13.5² × 0.2 + 15.9² × 0.5 + 19.1² × 0.3 = 272.298.

V (X) = E(X²) − µ² = 272.298 − 16.38² = 3.9936.

(b) Here, h(X) = 25X − 8.5. Thus, using the linearity property of expectation

E[h(X)] = 25E(X) − 8.5 = 25 × 16.38 − 8.5 = 401.

(c) In this part we are to find V [h(X)]. The easiest way of solving this problem is by using
a rule of variance for linear functions of random variables. That rule says, if h(X) = aX + b,
then
V [h(X)] = V (aX + b) = a² V (X).

In particular, if a = 1,
V [X + b] = V (X).

That is, adding a constant to all values of the rv X will not change the variance of that random
variable. Applying these rules to part (c), we find that if h(X) = 25X − 8.5, then

V [h(X)] = V (25X − 8.5) = 25² V (X) = 25² × 3.9936 = 2496.

We conclude this section with another example.

Example

Problem 34, Page 119


A small drugstore orders copies of a news magazine for its magazine rack each week. Let X =
demand for the magazine, with pmf

x      1     2     3     4     5     6
p(x)  1/15  2/15  3/15  4/15  3/15  2/15

Suppose the store owner actually pays $1.00 for each copy and the price to customers is $2.00.
If magazines left at the end of the week have no salvage value, is it better to order three or
four copies of the magazine ?

Solution:
The number of copies to be ordered will depend on the revenue the store owner expects to make
at the end of the week. Now, let X = demand for the magazine and h(X) = net revenue of the
store owner. Then, if the store owner purchases 3 copies, her net revenue will be

h(X) = 2X − 3, if x = 1, 2, 3
       3,      if x = 4, 5, 6.

It follows that,

E[h(X)] = Σ_{all x values} h(x) · p(x) = Σ_{x=1}^{3} (2x − 3) · p(x) + Σ_{x=4}^{6} 3 · p(x)
        = −1 × 1/15 + 1 × 2/15 + 3 × 3/15 + 3 × 4/15 + 3 × 3/15 + 3 × 2/15 = 37/15.

Thus, the store owner will make a net revenue of approximately $2.47 if 3 copies of the maga-
zine are ordered.

Now, if 4 copies are ordered we find that

h(X) = 2X − 4, if x = 1, 2, 3, 4
       4,      if x = 5, 6.

It follows that,

E[h(X)] = Σ_{all x values} h(x) · p(x) = Σ_{x=1}^{4} (2x − 4) · p(x) + Σ_{x=5}^{6} 4 · p(x)
        = −2 × 1/15 + 0 × 2/15 + 2 × 3/15 + 4 × 4/15 + 4 × 3/15 + 4 × 2/15 = 40/15.

Thus, the store owner will make a net revenue of approximately $2.67 if 4 copies of the maga-
zine are ordered. We will therefore recommend that 4 copies of the magazine should be ordered.
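The two order sizes above can be compared with one general expression for the net revenue; a sketch (the function name is ours, not from the text):

```python
from fractions import Fraction

# Demand pmf for the news-magazine example.
pmf = {x: Fraction(n, 15) for x, n in zip(range(1, 7), [1, 2, 3, 4, 3, 2])}

def expected_net_revenue(copies_ordered):
    """Each copy costs $1 and sells for $2; unsold copies have no salvage
    value, so net revenue is 2 * min(demand, ordered) - ordered."""
    return sum(
        (2 * min(x, copies_ordered) - copies_ordered) * p
        for x, p in pmf.items()
    )

print(expected_net_revenue(3))  # 37/15, about $2.47
print(expected_net_revenue(4))  # 8/3 (= 40/15), about $2.67: order four copies
```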

Practice Problem

An instructor in a technical writing class asked that a certain report be turned in the following
week, adding the restriction that any report exceeding four pages will not be accepted. Let Y
= the number of pages in a randomly chosen student’s report and suppose that Y has pmf

y 1 2 3 4
p(y) 0.01 0.19 0.35 0.45

(a) Compute E(Y ).



(b) Suppose the instructor spends Y minutes grading a paper consisting of Y pages. What

is the expected amount of time E( Y ) spent grading a randomly selected paper ?

§3.4 The Binomial Probability Distribution

There are several experiments with outcomes that can be classified into two distinct categories. For
instance, products from an assembly line can be classified as defective or not defective; a newborn
baby is either a boy or a girl; a drug will either be effective or not effective; and so on. As a form
of notation, let us denote one of the categories by success (S) and the other category by failure (F ).
We emphasize that this is only a notation and does not necessarily mean that success is “good” and
failure is “bad”. For example, a couple that are desperate for a baby girl may view having a baby
boy as a failure. That does not necessarily mean that it is bad to have a boy. Failure in this case
means that they failed in their attempt to have a baby girl. Thus we will use the term success to
represent the outcome that is of interest to the experimenter.

Now consider an experiment that satisfies the following conditions.

(a) The outcomes of the experiment can be classified into two categories which we shall denote
by success (S) or failure (F ).

(b) The experiment is repeated n times under identical conditions.

(c) The outcome of the experiment in one trial does not influence the outcome in any other
trial of the experiment. That is, the trials of the experiment are independent.

(d) The probability of success, which we shall denote by p, is constant from trial to trial.

Any experiment satisfying these conditions shall be called a binomial experiment. In binomial ex-
periments, we shall be interested in the random variable

X = number of successes in n trials.

X is said to be a binomial random variable with parameters n and p and we write X ∼ Bin(n, p).
Now, supposing that n = 4, then there are 2⁴ = 16 possible outcomes in the sample space of the
experiment

S = { SSSS, SSSF, SSF S, SSF F, SF SS, SF SF, SF F S, SF F F,
      F SSS, F SSF, F SF S, F SF F, F F SS, F F SF, F F F S, F F F F }.

From S we can see that the possible values of X are x = 0, 1, 2, 3, 4. To construct the pmf of X we
have to find the probability of X at each of these values. Now, using independence we find that

p(0) = P (X = 0) = P (F F F F ) = P (F ) × P (F ) × P (F ) × P (F ) = (1 − p)⁴ = C(4, 0) p⁰ (1 − p)^{4−0}.

Next,

p(1) = P (X = 1) = P (SF F F or F SF F or F F F S or F F SF )
     = P (SF F F ) + P (F SF F ) + . . . + P (F F SF )
     = P (S) · P (F ) · P (F ) · P (F ) + P (F ) · P (S) · P (F ) · P (F ) + . . .
     = 4 × P (S) · P (F ) · P (F ) · P (F )
     = 4p(1 − p)³
     = C(4, 1) p¹ (1 − p)^{4−1}.

Similarly, we find that

p(2) = P (X = 2) = P (SSF F or SF SF or SF F S or F SF S or F SSF or F F SS)
     = P (SSF F ) + P (SF SF ) + . . . + P (F F SS)
     = P (S) · P (S) · P (F ) · P (F ) + P (S) · P (F ) · P (S) · P (F ) + . . .
     = 6 × P (S) · P (S) · P (F ) · P (F )
     = 6p²(1 − p)²
     = C(4, 2) p² (1 − p)^{4−2};

p(3) = P (X = 3) = 4 × P (S) · P (S) · P (S) · P (F )
     = 4p³(1 − p)
     = C(4, 3) p³ (1 − p)^{4−3};

and

p(4) = P (X = 4) = P (S) · P (S) · P (S) · P (S)
     = p⁴
     = C(4, 4) p⁴ (1 − p)^{4−4}.

Now, if X ∼ Bin(n, p), define b(x; n, p) = P (X = x). Then, from the pattern in the probabilities we
have just computed, we deduce that

b(x; n, p) = P (X = x) = C(n, x) p^x (1 − p)^{n−x}, if x = 0, · · · , n
             0,                                     otherwise,

is the probability mass function of a binomial random variable.
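The pmf translates into a few lines of Python; a minimal sketch (function name ours):

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# With n = 4 and p = 1/2 this reproduces the coefficients 1, 4, 6, 4, 1
# derived above, divided by 16.
print([round(binomial_pmf(x, 4, 0.5), 4) for x in range(5)])
# [0.0625, 0.25, 0.375, 0.25, 0.0625]
```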


Example

Problem 44, Pg 126

Compute the following probabilities directly from the formula for b(x; n, p).
(a) b(3; 8, 0.6)

(b) b(5; 8, 0.6)

(c) P (3 ≤ X ≤ 5) when n = 8 and p = 0.6

Solution:

(a) From the formula,

b(3; 8, 0.6) = P (X = 3) = C(8, 3) 0.6³ (1 − 0.6)^{8−3}
             = 56 × 0.216 × 0.01024 = 0.123863.

(b)

b(5; 8, 0.6) = P (X = 5) = C(8, 5) 0.6⁵ (1 − 0.6)^{8−5}
             = 56 × 0.07776 × 0.064 = 0.2786918.

(c)

P (3 ≤ X ≤ 5) = b(3; 8, 0.6) + b(4; 8, 0.6) + b(5; 8, 0.6)
              = 0.123863 + C(8, 4) 0.6⁴ (1 − 0.6)^{8−4} + 0.2786918
              = 0.123863 + 70 × 0.1296 × 0.0256 + 0.2786918 = 0.634798.

Having discussed the pmf of a binomial rv, it is natural to consider the cumulative distribution
function (cdf) of the random variable. Recall that the cdf of a rv is defined by

F (x) = P (X ≤ x).

If X ∼ Bin(n, p), we denote the cdf by B(x; n, p). That is, for a binomial rv,

B(x; n, p) = P (X ≤ x).

The values of the cumulative binomial probabilities have been evaluated for some values of n and p
and tabulated. These tables can be used rather than computing the probabilities directly from the
formula. We can use the property that a pmf sums to one over all values of the random
variable to show that, for the binomial rv,

E(X) = np; and σ² = V (X) = np(1 − p).
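The closed forms np and np(1 − p) can be checked numerically against the definitions of mean and variance; a sketch (names ours):

```python
from math import comb

n, p = 20, 0.2
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Mean and variance computed from first principles...
mean = sum(x * q for x, q in enumerate(pmf))
var = sum(x**2 * q for x, q in enumerate(pmf)) - mean**2

# ...agree with the binomial formulas np and np(1 - p).
print(round(mean, 6), n * p)            # both 4.0
print(round(var, 6), n * p * (1 - p))   # both 3.2
```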

Examples

Problem 45, Page 126


Use the table of cumulative binomial probabilities to obtain the following probabilities
(a) B(4; 10, 0.3)
(b) b(4; 10, 0.3)
(c) P (X ≥ 2) when X ∼ Bin(10, 0.3)
(d) P (2 < X < 6) when X ∼ Bin(10, 0.3)

Solution:
(a) From the table, B(4; 10, 0.3) = 0.850
(b) Before we can use the table, we have to express the required probability in terms of cumu-
lative probability.

b(4; 10, 0.3) = P (X = 4) = F (4) − F (3)


= B(4; 10, 0.3) − B(3; 10, 0.3) = 0.85 − 0.65 = 0.2.

(c) We use the idea of complements to express the required probability in terms of cumulative
probability.

P (X ≥ 2) = 1 − P (X < 2) = 1 − P (X ≤ 1) = 1 − B(1; 10, 0.3)
          = 1 − 0.149 = 0.851.

(d) Again we need to express the required probability in terms of cumulative probability. We
proceed as follows

P (2 < X < 6) = P (3 ≤ X ≤ 5) = F (5) − F (2)
              = B(5; 10, 0.3) − B(2; 10, 0.3)
              = 0.953 − 0.383 = 0.570.

Problem 48, Page 126

Suppose that only 20% of all drivers come to a complete stop at an intersection having flashing
red lights in all directions when no other cars are visible. What is the probability that, of 20
randomly chosen drivers coming to an intersection under these conditions,

(a) At most 6 will come to a complete stop ?

(b) Exactly 6 will come to a complete stop ?

(c) At least 6 will come to a complete stop ?

(d) How many of the next 20 drivers do you expect to come to a complete stop ?

Solution:

First, we define a random variable, X = number of drivers that will come to a complete stop
out of 20. Note that n = 20 and p = 0.2. Then, X ∼ Bin(20, 0.2).

(a) P (X ≤ 6) = B(6; 20, 0.2) = 0.913 (from the table).

(b)

P (X = 6) = b(6; 20, 0.2) = B(6; 20, 0.2) − B(5; 20, 0.2)


= 0.913 − 0.804 = 0.109.

(c) We use the law of complements to write

P (X ≥ 6) = 1 − B(5; 20, 0.2) = 1 − 0.804 = 0.196.

(d) E(X) = n · p = 20 × 0.2 = 4.

Problem 51, Page 127

Twenty percent of all telephones of a certain type are submitted for service while under war-
ranty. Of these, 60% can be repaired whereas the other 40% must be replaced with new units.
If a company purchases ten of these telephones, what is the probability that exactly two will
end up being replaced under warranty ?

Solution:

Again, we begin by defining a random variable, X = number of telephones to be replaced
under warranty out of 10. Now, n = 10 and p = probability of replacement under warranty,
which is unknown and has to be computed.

Define the events, A = {submitted for service} and B = {replaced under warranty}. Now,
P (A) = 0.2, P (B|A) = 0.4, P (B 0 |A) = 0.6, P (B|A0 ) = 0. Using the law of total probability,
we obtain
we obtain

p = P (B) = P (B|A) · P (A) + P (B|A0 ) · P (A0 )


= 0.2 × 0.4 + 0 = 0.08.

It follows that

b(2; 10, 0.08) = P (X = 2) = C(10, 2) 0.08² (1 − 0.08)^{10−2}
              = 45 × 0.0064 × 0.5132188 = 0.147807.
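The two steps of this solution, total probability followed by the binomial formula, can be sketched together (variable names ours):

```python
from math import comb

# Law of total probability: the chance a purchased phone is eventually
# replaced under warranty is P(B|A) * P(A) + P(B|A') * P(A').
p_submitted = 0.2
p_replaced_given_submitted = 0.4
p = p_replaced_given_submitted * p_submitted + 0.0   # 0.08

# Binomial step: probability exactly 2 of the 10 phones are replaced.
prob = comb(10, 2) * p**2 * (1 - p)**8
print(round(prob, 6))  # 0.147807
```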

§3.5 Hypergeometric Distribution


When we studied counting techniques, we mentioned certain features in a problem we can use to
identify the method of counting to apply. Some of the features we mentioned are

1. the experiment involves selecting n items or objects, without replacement, from a set or popu-
lation of N items or objects.

2. each item in the population can be classified into one of two categories. We will label all items
in one of the categories as successes (S) and the items in the second category as failures (F).
There are M successes in the population; that is, the N items can be split into two
sets of M successes and N − M failures.

3. the n items are selected in such a way that each subset of n items is equally likely to be selected.

24
From these features we can immediately see that p = proportion of successes in the population =
M/N and 1 − p = proportion of failures in the population = 1 − M/N. The sample space of such an
experiment will consist of all possible selections of n items from the N items, without replacement
and without regard to order. That is,

N(S) = \binom{N}{n}.

Now we define a random variable by X = number of successes amongst the n selected items. These x
successes amongst the n selected items can only come from the set of M successes. The total number
of ways in which these x successes can be selected from the M successes is
n_1 = \binom{M}{x}.

If there are x successes amongst the n selected items, then there are obviously n − x failures which
can only be taken from the set containing N − M failures. Now, these n − x failures can be chosen
from the set of N − M failures in
n_2 = \binom{N - M}{n - x}

ways. It follows that the event A, which contains n items with exactly x successes and n − x failures
from the population of N items, will have a total of
N(A) = n_1 × n_2 = \binom{M}{x} × \binom{N - M}{n - x}

combinations. Thus,

P(A) = P(X = x) = \frac{N(A)}{N(S)} = \frac{\binom{M}{x} \binom{N-M}{n-x}}{\binom{N}{n}}.

This probability distribution is the pmf of X, called the hypergeometric distribution and denoted
by h(x; n, M, N). That is, p(x) = P(X = x) = h(x; n, M, N).

Note that the definition of the hypergeometric rv X parallels that of the binomial rv. The main
difference is that the hypergeometric rv counts successes when sampling without replacement from a
population of known finite size N, so the trials are not independent, whereas binomial trials are.
Thus, the selection in the hypergeometric case is said to be from a finite population with

p = M/N and 1 − p = 1 − M/N.

Using ideas from the binomial rv, we can deduce that for the hypergeometric rv X,

E(X) = n · p = n · M/N, and

V(X) = n · p · (1 − p) · (N − n)/(N − 1) = n · (M/N) · (1 − M/N) · (N − n)/(N − 1),

where (N − n)/(N − 1) is called the finite population correction factor (fpc). Notice that when N
is large compared to n, then

(N − n)/(N − 1) ≈ 1.
For instance, take N = 500 and n = 10; then

(N − n)/(N − 1) = 490/499 = 0.9819639.
In situations where N is large compared to n, one can use the binomial probability mass function
to approximate the hypergeometric probability mass function. As a rule of thumb, if n/N ≤ 0.05,
then b(x; n, p) with p = M/N will be a good approximation to h(x; n, M, N).
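This rule of thumb is easy to check numerically. The sketch below uses assumed illustrative values n = 10, M = 100, N = 500 (so n/N = 0.02 ≤ 0.05 and p = 0.2) and compares the two pmfs pointwise:

```python
from math import comb

def hyper_pmf(x, n, M, N):
    # h(x; n, M, N) = C(M, x) C(N - M, n - x) / C(N, n)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, M, N = 10, 100, 500   # n/N = 0.02 <= 0.05, so p = M/N = 0.2
max_diff = max(abs(hyper_pmf(x, n, M, N) - binom_pmf(x, n, M / N))
               for x in range(n + 1))
print(round(max_diff, 4))  # small: the binomial pmf is a good stand-in here
```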
Examples

Problem 67, Page 134


A geologist has collected 10 specimens of basaltic rock and 10 specimens of granite. The
geologist instructs a lab assistant to randomly select 15 of the specimens for analysis.
(a) What is the pmf of the number of granite specimens selected for analysis ?
(b) What is the probability that all specimens of one of the two types of rock are selected for
analysis ?
(c) What is the probability that the number of granite specimens selected for analysis is within
1 standard deviation of its mean value ?

Solution:
In this problem, N = 20, n = 15, M = 10, N − M = 10. Define X = number of granites
amongst 15 selected rocks. Then,
(a)

p(x) = P(X = x) = h(x; 15, 10, 20) = \frac{\binom{10}{x} \binom{10}{15-x}}{\binom{20}{15}},   for x = 5, 6, . . . , 10.

(b) The required probability is the probability that the 15 selected rocks consist of 10 granite
and 5 basaltic specimens or 5 granite and 10 basaltic specimens. That is, the required probability is

P(X = 10) + P(X = 5) = \frac{\binom{10}{10} \binom{10}{5}}{\binom{20}{15}} + \frac{\binom{10}{5} \binom{10}{10}}{\binom{20}{15}} = 0.0325077

(c) Required probability = P (−σ ≤ X − µ ≤ σ) = P (µ − σ ≤ X ≤ µ + σ). Thus, we need to


compute µ = E(X) and σ 2 = V (X) before the required probability can be evaluated. Now,
µ = E(X) = n · M/N = 15 × 10/20 = 7.5.

σ² = V(X) = n · (M/N) · (1 − M/N) · (N − n)/(N − 1) = 15 · (10/20) · (1 − 10/20) · (5/19) = 75/76 = 0.9868421.

Therefore, σ = √0.9868421 = 0.9933992. It follows that

µ − σ = 7.5 − 0.9933992 = 6.5066007 and µ + σ = 7.5 + 0.9933992 = 8.4933993

Then, the required probability is

P(6.5066007 ≤ X ≤ 8.4933993) = P(7 ≤ X ≤ 8) = P(X = 7) + P(X = 8)

= \frac{\binom{10}{7} \binom{10}{8}}{\binom{20}{15}} + \frac{\binom{10}{8} \binom{10}{7}}{\binom{20}{15}} = 0.6965944
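The three parts above can be verified with a short computation; here is a sketch, with an illustrative helper h for the hypergeometric pmf:

```python
from math import comb, sqrt

def h(x, n, M, N):
    # hypergeometric pmf h(x; n, M, N)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

n, M, N = 15, 10, 20
# (b) all specimens of one type are selected
print(round(h(10, n, M, N) + h(5, n, M, N), 7))   # 0.0325077
# (c) within one standard deviation of the mean
mu = n * M / N
sd = sqrt(n * (M / N) * (1 - M / N) * (N - n) / (N - 1))
print(round(mu - sd, 4), round(mu + sd, 4))       # the interval (6.5066, 8.4934)
print(round(h(7, n, M, N) + h(8, n, M, N), 7))    # 0.6965944
```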

Problem 68, Page 134

A personnel director interviewing 11 senior engineers for four job openings has scheduled six
interviews for the first day and five for the second day of interviewing. Assume that the
candidates are interviewed in random order.

(a) What is the probability that x of the top four candidates are interviewed on the first day ?

(b) How many of the top four candidates can be expected to be interviewed on the first day ?

Solution:

We first define the random variable X = number of top four candidates interviewed on the first
day. Now, N = 11, M = 4, n = 6 and N − M = 7.

(a)

P(X = x) = \frac{\binom{4}{x} \binom{7}{6-x}}{\binom{11}{6}},   x = 0, 1, 2, 3, 4

(b)
E(X) = n · M/N = 6 × 4/11 ≈ 2.18.
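As a quick sanity check, the pmf in part (a) should sum to 1 over x = 0, . . . , 4, and its mean should match n · M/N = 24/11; a minimal sketch:

```python
from math import comb

def p68(x):
    # P(X = x) = C(4, x) C(7, 6 - x) / C(11, 6)
    return comb(4, x) * comb(7, 6 - x) / comb(11, 6)

total = sum(p68(x) for x in range(5))
mean = sum(x * p68(x) for x in range(5))
print(round(total, 10))  # 1.0
print(round(mean, 2))    # 2.18
```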
Problem 70, Page 134

A second-stage smog alert has been called in a certain area of Los Angeles County in which
there are 50 industrial firms. An inspector will visit 10 randomly selected firms to check for
violation of regulations.

(a) If 15 of the firms are actually violating at least one regulation, what is the pmf of the
number of firms visited by the inspector that are in violation of at least one regulation ?

(b) If there are 500 firms in the area, of which 150 are in violation, approximate the pmf of
part (a) by a simpler pmf.

(c) For X = number among the 10 visited that are in violation, compute E(X) and V (X) both
for the exact pmf and the approximating pmf in part (b).

Solution:

In this problem, N = 50, n = 10, M = 15, N − M = 35. Define X = number of firms that are
in violation amongst 10 selected for inspection. Then,

(a)

p(x) = P (X = x) = h(x; 10, 15, 50)


à ! à !
15 35
×
x 10 − x
= Ã ! , for x = 0, 1, . . . , 10.
50
10

(b) Here, M = 150, N = 500. It follows that p = 150/500 = 0.3. Therefore,

p(x) = P (X = x) = h(x; 10, 150, 500) ≈ b(x; 10, 0.3)

(c) Using the hypergeometric distribution, we have


E(X) = n · M/N = 10 × 0.3 = 3
and
V(X) = n · (M/N) · (1 − M/N) · (N − n)/(N − 1) = 10 × 0.3 × (1 − 0.3) × 490/499 = 2.062124.
499
Under the binomial approximation, we find that

E(X) = n · p = 10 × 0.3 = 3

and

V (X) = n · p · (1 − p)
= 10 × 0.3 × (1 − 0.3) = 2.1.
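The exact and approximate moments can be placed side by side; a brief sketch of the arithmetic above:

```python
# Exact hypergeometric vs. approximate binomial moments for Problem 70(c)
n, M, N = 10, 150, 500
p = M / N                                     # 0.3
E_exact = n * M / N
V_exact = n * p * (1 - p) * (N - n) / (N - 1)  # includes the fpc (N-n)/(N-1)
E_approx = n * p
V_approx = n * p * (1 - p)                     # no fpc
print(round(E_exact, 6), round(E_approx, 6))   # 3.0 3.0
print(round(V_exact, 6), round(V_approx, 6))   # 2.062124 2.1
```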

§3.6 The Poisson Probability Distribution


Consider an experiment which involves standing at an intersection and counting the number of acci-
dents that occur every 60 minutes. We are most likely going to find that most 60-minute periods have
zero accidents. Thus, the occurrence of an accident at that intersection in any 60-minute period is a
rare event. It is also rare to observe a West Nile virus infection in any given month in Canada. The
Poisson distribution is often used to model counts of rare events within some unit such as time or area.

Definition: A random variable X is said to have a Poisson distribution if the pmf of X is


P(X = x) = p(x; λ) = e^{-λ} λ^x / x!,   x = 0, 1, 2, . . .
for some λ > 0. The parameter λ is frequently the average value of X or a rate per unit time or area.
It is very straightforward to verify that if X follows the Poisson distribution with parameter λ, then

E(X) = V (X) = λ.
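The claim E(X) = V(X) = λ can be checked numerically by truncating the infinite sums that define the mean and variance; a sketch with λ = 8 (truncating at x = 60, beyond which the pmf is negligible):

```python
from math import exp, factorial

lam = 8.0

def pois_pmf(x):
    # p(x; λ) = e^{-λ} λ^x / x!
    return exp(-lam) * lam**x / factorial(x)

# truncate the infinite sums at x = 60; the tail mass there is negligible
mean = sum(x * pois_pmf(x) for x in range(60))
var = sum((x - mean)**2 * pois_pmf(x) for x in range(60))
print(round(mean, 6), round(var, 6))  # 8.0 8.0
```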

Using tables of cumulative Poisson probabilities, it is easy to solve a number of problems.

Example

Problem 76, Page 138

Suppose the number X of tornadoes observed in a particular region during a 1-year period has
a Poisson distribution with λ = 8.

(a) Compute P (X ≤ 5)

(b) Compute P (6 ≤ X ≤ 9)

(c) Compute P (X ≥ 10)

(d) How many tornadoes can be expected to be observed during the 1-year period ? What is
the standard deviation of the number of observed tornadoes ?

Solution:

Let X = number of tornadoes in a region. Using the table of cumulative probabilities with
λ = 8, we find that

(a) P (X ≤ 5) = 0.191

(b) We first express the required probability in the form P (X ≤ x) before we can use the table.

P (6 ≤ X ≤ 9) = P (X ≤ 9) − P (X ≤ 5)
= 0.717 − 0.191 = 0.526.

(c) Here we use the complement rule to write

P (X ≥ 10) = 1 − P (X ≤ 9)
= 1 − 0.717 = 0.283.

(d) E(X) = λ = 8 and σ = √V(X) = √8 ≈ 2.83

Sometimes the number of trials n in a binomial experiment is so large, and the probability of success
p so small, that computing binomial probabilities directly becomes impractical. It can be shown that
in such cases the Poisson pmf can be used to approximate the binomial pmf. That is,

b(x; n, p) ≈ p(x; λ),

where λ = np. As a rule of thumb, this approximation will work well if n ≥ 100, p ≤ 0.01 and
np ≤ 20. Now let us consider an example in which we can apply this principle.

Examples

Problem 79, Page 138

An article in the Los Angeles Times (Dec. 3, 1993) reports that 1 in 200 people carry the
defective gene that causes inherited colon cancer. In a sample of 1000 individuals, what is the
approximate distribution of the number who carry this gene ? Use this distribution to calculate
the approximate probability that

(a) Between 5 and 8 (inclusive) carry the gene.

(b) At least 8 carry the gene.

Solution:
Define X = number of individuals with the defective gene. Here p = 1/200, n = 1000 and
X ∼ Bin(1000, 1/200). Now, since n = 1000 is large, p = 0.005 is small and λ = np = 1000 × 0.005 = 5
is less than 20, we can use the Poisson approximation in place of the binomial pmf.

(a) Using the Poisson pmf, we find that

P (5 ≤ X ≤ 8) = P (X ≤ 8) − P (X ≤ 4)
≈ 0.932 − 0.44 = 0.492.

(b) P (X ≥ 8) = 1 − P (X ≤ 7) ≈ 1 − 0.867 = 0.133.
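Both parts can be checked against the exact binomial probabilities; a minimal sketch (note the unrounded Poisson value for part (a) is 0.491 to three decimals, while the rounded table values above give 0.492):

```python
from math import comb, exp, factorial

n, p = 1000, 1 / 200
lam = n * p  # 5.0

def binom_cdf(x):
    # exact binomial cumulative probability
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def pois_cdf(x):
    # Poisson cumulative probability with rate lam
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

print(round(pois_cdf(8) - pois_cdf(4), 3))   # (a) 0.491
print(round(1 - pois_cdf(7), 3))             # (b) 0.133
# the exact binomial answers are very close:
print(round(binom_cdf(8) - binom_cdf(4), 3), round(1 - binom_cdf(7), 3))
```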

Problem 81, Page 138

The number of requests for assistance received by a towing service is a Poisson process with
rate α = 4 per hour.

(a) Compute the probability that exactly ten requests are received during a particular 2-hour
period.

(b) If the operators of the towing service take a 30-min break for lunch, what is the probability
that they do not miss any calls for assistance ?

(c) How many calls would you expect during their break ?

Solution:

(a) Here, λ = 4 × 2 = 8 for a 2-hour period. Therefore,

P (X = 10) = P (X ≤ 10) − P (X ≤ 9)
= 0.816 − 0.717 = 0.099.

(b) Here, λ = 4 × 0.5 = 2 for the 30-minute break. The operators will not miss any calls if none come in. Thus, the
required probability is

P (X = 0) = P (X ≤ 0) = 0.135.

(c) E(X) = λ = 2.
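The key step in all three parts is scaling the rate: a window of length t hours has λ = αt. A brief sketch:

```python
from math import exp, factorial

def pois_pmf(x, lam):
    # p(x; λ) = e^{-λ} λ^x / x!
    return exp(-lam) * lam**x / factorial(x)

alpha = 4                # requests per hour
lam_2h = alpha * 2       # (a) λ = 8 for a 2-hour period
print(round(pois_pmf(10, lam_2h), 3))    # 0.099
lam_break = alpha * 0.5  # (b), (c) λ = 2 for the 30-minute break
print(round(pois_pmf(0, lam_break), 3))  # 0.135
print(lam_break)         # (c) expected calls during the break: 2.0
```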

Problem 84, Page 139

In proof testing of circuit boards, the probability that any particular diode will fail is 0.01. Suppose
a circuit board contains 200 diodes.

a. How many diodes would you expect to fail, and what is the standard deviation of the
number that are expected to fail ?

b. What is the (approximate) probability that at least four diodes will fail on a randomly
selected board ?

c. If five boards are shipped to a particular customer, how likely is it that at least four of them
will work properly ? (A board works properly only if all its diodes work.)
