
SCHOOL OF MATHEMATICAL SCIENCES

MAT 1034 INTRODUCTION TO PROBABILITY


Chapter 4: Multivariate Probability Distribution

1. Joint Probability Distributions

Our study of random variables and their probability distributions in the preceding chapters is
restricted to one-dimensional sample spaces, where we recorded outcomes of an experiment as
values assumed by a single random variable. There will be situations, however, where we may
find it desirable to record the simultaneous outcomes of several variables. For example:

1. We might measure the amount of precipitate P and volume V of gas released from a
controlled chemical experiment, giving rise to a two-dimensional sample space
consisting of the outcomes (p, v).

2. We might be interested in the hardness H and tensile strength T of cold-drawn copper, resulting in the outcomes (h, t).

3. In a study to determine the likelihood of success in college based on high school data,
we might use a three-dimensional sample space and record for each individual his or
her aptitude test score, high school class rank, and grade-point average at the end of
freshman year in college.

If X and Y are two discrete random variables, the probability distribution for their simultaneous
occurrence can be represented by a function with values f ( x, y ) for any pair of values ( x, y )
within the range of the random variables X and Y. This function is referred to as the joint
probability distribution of X and Y.

Hence, in the discrete case,

f(x, y) = P(X = x, Y = y)

that is, the values f(x, y) give the probability that outcomes x and y occur at the same time.

Definition 1.1
The function f(x, y) is a joint probability distribution or probability mass function of the discrete random variables X and Y if
1. f(x, y) ≥ 0 for all (x, y),
2. Σ_x Σ_y f(x, y) = 1,
3. P(X = x, Y = y) = f(x, y).
For any region A in the xy plane, P[(X, Y) ∈ A] = ΣΣ_{(x,y)∈A} f(x, y).


The information for a discrete joint distribution can be neatly summarized in tabular form as
follows:

                      Y
             y1        …    yn        p(x)
      x1     p(x1,y1)  …    p(x1,yn)  p(x1)
X     ⋮      ⋮               ⋮         ⋮
      xm     p(xm,y1)  …    p(xm,yn)  p(xm)
      p(y)   p(y1)     …    p(yn)     1

Example 1
Given the following joint probability distribution f ( x, y) :
               Y
           0      1      4
      1   0.10   0.05   0.15
X     3   0.05   0.20   0.25
      5   0.15   0.00   0.05

(a) Verify that this is a joint distribution table.


(b) Find P(X = 3, Y = 4).
(c) Find P[(X, Y) ∈ R], where R is the region {(x, y) | x + y ≤ 4}.

Solution
(a)
               Y                    g(x)
           0      1      4
      1   0.10   0.05   0.15        0.3
X     3   0.05   0.20   0.25        0.5
      5   0.15   0.00   0.05        0.2
h(y)      0.30   0.25   0.45        1.0

Since all f(x, y) ≥ 0 and Σ_x Σ_y f(x, y) = 1, this is a valid joint distribution table.

(b) P(X = 3, Y = 4) = f(3, 4) = 0.25

(c) P(X + Y ≤ 4) = f(1, 0) + f(1, 1) + f(3, 0) + f(3, 1)
                 = 0.10 + 0.05 + 0.05 + 0.20 = 0.40
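
As a quick numeric check, parts (a) and (c) can be reproduced with a short Python sketch (the dictionary below simply transcribes the table above):

    # Joint pmf of Example 1, keyed by (x, y).
    f = {(1, 0): 0.10, (1, 1): 0.05, (1, 4): 0.15,
         (3, 0): 0.05, (3, 1): 0.20, (3, 4): 0.25,
         (5, 0): 0.15, (5, 1): 0.00, (5, 4): 0.05}

    # (a) All probabilities are nonnegative and sum to 1.
    assert all(p >= 0 for p in f.values())
    assert abs(sum(f.values()) - 1.0) < 1e-12

    # (c) Sum f(x, y) over the region x + y <= 4.
    print(sum(p for (x, y), p in f.items() if x + y <= 4))   # 0.40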


Example 2
Two refills for a ballpoint pen are selected at random from a box that contains 3 blue refills, 2
red refills, and 3 green refills. If X is the number of blue refills and Y is the number of red refills
selected, find

(a) The joint probability function f(x, y), and

(b) P[(X, Y) ∈ R], where R is the region {(x, y) | x + y ≤ 1}.

Note: 0 ≤ x + y ≤ 2

Example 3
Roll a red die and a green die. Let X1 = number of dots on the red die, X2 = number of dots on
the green die. There are 36 points in the sample space.
Table: Possible Outcomes of Rolling a Red Die and a Green Die. (First number in the pair is
the number on the red die.)


Red\Green     1       2       3       4       5       6
    1       (1,1)   (1,2)   (1,3)   (1,4)   (1,5)   (1,6)
    2       (2,1)   (2,2)   (2,3)   (2,4)   (2,5)   (2,6)
    3       (3,1)   (3,2)   (3,3)   (3,4)   (3,5)   (3,6)
    4       (4,1)   (4,2)   (4,3)   (4,4)   (4,5)   (4,6)
    5       (5,1)   (5,2)   (5,3)   (5,4)   (5,5)   (5,6)
    6       (6,1)   (6,2)   (6,3)   (6,4)   (6,5)   (6,6)

The probability of (1, 1) is 1/36. The probability of (6, 3) is also 1/36.

Now consider P(2 ≤ X1 < 4, 1 < X2 ≤ 3) = f(2, 2) + f(2, 3) + f(3, 2) + f(3, 3) = 4(1/36) = 1/9.

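The same probability can be confirmed by brute-force enumeration of the 36 equally likely outcomes; a minimal Python sketch:

    from fractions import Fraction

    # All 36 equally likely (red, green) outcomes.
    outcomes = [(r, g) for r in range(1, 7) for g in range(1, 7)]

    # Count outcomes with 2 <= X1 < 4 and 1 < X2 <= 3.
    hits = [(r, g) for r, g in outcomes if 2 <= r < 4 and 1 < g <= 3]
    print(Fraction(len(hits), len(outcomes)))   # 1/9
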
When X and Y are continuous random variables, the joint density function f(x, y) is a surface lying above the xy plane, and P[(X, Y) ∈ A], where A is any region in the xy plane, is equal to the volume of the right cylinder bounded by the base A and the surface f(x, y).

Definition 1.2
The function f(x, y) is a joint density function of the continuous random variables X and Y if
1. f(x, y) ≥ 0 for all (x, y),
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1,
3. P[(X, Y) ∈ R] = ∫∫_R f(x, y) dx dy for any region R in the xy plane.

Example 4
A candy company distributes boxes of chocolates with a mixture of creams, toffees, and nuts
coated in both light and dark chocolate. For a randomly selected box, let X and Y, respectively,
be the proportion of the light and dark chocolates that are creams and suppose that the joint
density function is

2
 (2 x + 3 y ) 0  x  1,0  y  1
f ( x, y ) =  5
 0 elsewhere

(a) Verify condition (2) of Definition 1.2.


(b) Find the P[( X , Y )  A] , where A is the region ( x, y ) | 0  x  1 , 1  y  1  .
 2 4 2

Solution
(a) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^1 ∫_0^1 (2/5)(2x + 3y) dx dy
  = ∫_0^1 [2x²/5 + 6xy/5]_{x=0}^{x=1} dy
  = ∫_0^1 (2/5 + 6y/5) dy
  = [2y/5 + 3y²/5]_0^1
  = 2/5 + 3/5 = 1

(b) P[(X, Y) ∈ A] = P(0 < X < 1/2, 1/4 < Y < 1/2)
  = ∫_{1/4}^{1/2} ∫_0^{1/2} (2/5)(2x + 3y) dx dy
  = ∫_{1/4}^{1/2} [2x²/5 + 6xy/5]_{x=0}^{x=1/2} dy
  = ∫_{1/4}^{1/2} (1/10 + 3y/5) dy
  = [y/10 + 3y²/10]_{1/4}^{1/2}
  = (1/10)[(1/2 + 3/4) − (1/4 + 3/16)]
  = 13/160


Example 5
Let f(x, y) = kx be a joint density function on the region R in the plane described by 0 ≤ x ≤ y ≤ 1. Find the value of k.

Example 6
An insurance company insures a large number of drivers. Let X be the random variable
representing the company’s losses under collision insurance, and let Y represent the company’s
losses under liability insurance. X and Y have joint density function

f(x, y) = (2x + 2 − y)/4,  for 0 < x < 1 and 0 < y < 2
          0,               otherwise.

What is the probability that the total loss is at least 1?


Example 7
Let X, Y, and Z have the joint probability density function

f(x, y, z) = kxy²z,  0 < x, y < 1, 0 < z < 2
             0,      elsewhere

(a) Find k.
(b) Find P(X < 1/4, Y > 1/2, 1 < Z < 2).
Solution
(a) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y, z) dx dy dz = 1
  ∫_0^2 ∫_0^1 ∫_0^1 kxy²z dx dy dz = 1
  ∫_0^2 ∫_0^1 k [x²y²z/2]_{x=0}^{x=1} dy dz = 1
  ∫_0^2 ∫_0^1 k (y²z/2) dy dz = 1
  ∫_0^2 k [y³z/6]_{y=0}^{y=1} dz = 1
  ∫_0^2 k (z/6) dz = 1
  k [z²/12]_{z=0}^{z=2} = 1
  k (4/12) = 1
  k = 12/4 = 3

(b) P(X < 1/4, Y > 1/2, 1 < Z < 2) = ∫_1^2 ∫_{1/2}^1 ∫_0^{1/4} 3xy²z dx dy dz
  = ∫_1^2 ∫_{1/2}^1 [3x²y²z/2]_{x=0}^{x=1/4} dy dz
  = ∫_1^2 ∫_{1/2}^1 (3y²z/32) dy dz
  = ∫_1^2 [y³z/32]_{y=1/2}^{y=1} dz
  = ∫_1^2 (z/32 − z/256) dz
  = ∫_1^2 (7z/256) dz
  = [7z²/512]_{z=1}^{z=2}
  = 28/512 − 7/512 = 21/512
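
Both the value of k and the probability in part (b) can be confirmed with scipy.integrate.tplquad (a sketch; tplquad integrates func(innermost, middle, outer), so x is the first lambda argument here):

    from scipy.integrate import tplquad

    # Total mass of x*y^2*z over the region gives 1/k.
    mass, _ = tplquad(lambda x, y, z: x * y**2 * z, 0, 2, 0, 1, 0, 1)
    print(1 / mass)   # 3.0, i.e. k = 3

    # Part (b): x in (0, 1/4), y in (1/2, 1), z in (1, 2).
    prob, _ = tplquad(lambda x, y, z: 3 * x * y**2 * z, 1, 2, 1/2, 1, 0, 1/4)
    print(prob, 21 / 512)   # both 0.041015625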

2. Marginal Distributions

Given the joint probability distribution f ( x, y ) of the discrete random variables X and Y, the
probability distribution g(x) of X alone is obtained by summing f ( x, y ) over the values of Y.
Similarly, the probability distribution h(y) of Y alone is obtained by summing f (x, y) over the
values of X. We define g(x) and h(y) to be the marginal distributions of X and Y, respectively.
When X and Y are continuous random variables, summations are replaced by integrals. We can
now make the following general definitions.

Definition 2.1
The marginal distributions of X alone and of Y alone are

g(x) = Σ_y f(x, y)   and   h(y) = Σ_x f(x, y)

for the discrete case, and

g(x) = ∫_{−∞}^{∞} f(x, y) dy   and   h(y) = ∫_{−∞}^{∞} f(x, y) dx

for the continuous case.

The term marginal is used here because, in the discrete case, the values of g(x) and h(y) are just
the marginal totals of the respective columns and rows when the values of f(x, y) are displayed
in a rectangular table.

Example 8
By referring to Example 1, find the marginal distributions for X and Y.

Solution
Marginal distribution for X
x 1 3 5
g(x) 0.30 0.50 0.20

Marginal distribution for Y

y 0 1 4
h(y) 0.30 0.25 0.45

Example 9
By referring to Example 2, find the marginal distributions of X and Y.


Example 10
Find the marginal distributions of X and Y.

2
 (2 x + 3 y ), 0  x  1, 0  y  1,
f ( x, y ) =  5
 0, elsewhere.

Solution
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^1 (2/5)(2x + 3y) dy
     = [4xy/5 + 3y²/5]_{y=0}^{y=1}
     = (4x + 3)/5

so

g(x) = (4x + 3)/5,  0 ≤ x ≤ 1
       0,           elsewhere

h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^1 (2/5)(2x + 3y) dx
     = [2x²/5 + 6xy/5]_{x=0}^{x=1}
     = (2 + 6y)/5

so

h(y) = (2 + 6y)/5,  0 ≤ y ≤ 1
       0,           elsewhere

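The same marginals can be obtained symbolically, for example with sympy (a sketch):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.Rational(2, 5) * (2 * x + 3 * y)   # joint density of Example 10

    g = sp.integrate(f, (y, 0, 1))   # marginal of X
    h = sp.integrate(f, (x, 0, 1))   # marginal of Y
    print(g)   # 4*x/5 + 3/5
    print(h)   # 6*y/5 + 2/5
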
Example 11
Let f(x, y) = 6x be a joint density function on the region R in the plane described by 0 ≤ x ≤ y ≤ 1.

a) Find marginal density functions for X and Y.


b) Find E(X) and E(Y).


3. Conditional Probability Distributions

In Chapter 1, recall that the conditional probability is given as

P(B | A) = P(A ∩ B) / P(A),  provided P(A) > 0.

If A and B are the events defined by X = x and Y = y, respectively, then

P(Y = y | X = x) = P(X = x, Y = y) / P(X = x) = f(x, y) / g(x),  provided g(x) > 0,

where X and Y are discrete random variables.

The function f(x, y)/g(x) is strictly a function of y with x fixed and satisfies all the conditions of a probability distribution. This is also true when f(x, y) and g(x) are the joint density and marginal distribution, respectively, of continuous random variables. As a result, it is extremely important that we make use of the special type of distribution of the form f(x, y)/g(x) in order to effectively compute conditional probabilities. This type of distribution is called a conditional probability distribution.

Definition 3.1
Let X and Y be two random variables, discrete or continuous. The conditional distribution of the random variable Y given that X = x is

f(y | x) = f(x, y) / g(x),  provided g(x) > 0.

Similarly, the conditional distribution of X given that Y = y is

f(x | y) = f(x, y) / h(y),  provided h(y) > 0.

If we wish to find the probability that the discrete random variable X falls between a and b when it is known that the discrete variable Y = y, we evaluate

P(a < X < b | Y = y) = Σ_{a<x<b} f(x | y),

where the summation extends over all values of X between a and b. When X and Y are continuous, we evaluate

P(a < X < b | Y = y) = ∫_a^b f(x | y) dx.

Example 12
By referring to Example 2, find the conditional distribution of X, given that Y = 1, and use it to determine P(X = 0 | Y = 1).

Therefore, if it is known that 1 of the 2 pen refills selected is red, we have a probability equal to 1/2 that the other refill is not blue.
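
This can be verified directly from the sampling model of Example 2, where f(x, y) = C(3, x) C(2, y) C(3, 2 − x − y) / C(8, 2) counts the ways to draw x blue and y red refills; a Python sketch:

    from math import comb
    from fractions import Fraction

    # Joint pmf of Example 2: x blue and y red among 2 refills drawn
    # from 3 blue, 2 red, 3 green.
    def f(x, y):
        return Fraction(comb(3, x) * comb(2, y) * comb(3, 2 - x - y), comb(8, 2))

    h1 = f(0, 1) + f(1, 1)    # marginal P(Y = 1)
    print(f(0, 1) / h1)       # 1/2 = P(X = 0 | Y = 1)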

Example 13
The joint density for the random variables (X, Y), where X is the unit temperature change and Y is the proportion of spectrum shift that a certain atomic particle produces, is

f(x, y) = 10xy²,  0 < x < y < 1
          0,      elsewhere

a) Find the marginal densities g(x), h(y), and the conditional density f(y | x).


b) Find the probability that the spectrum shifts more than half of the total observations, given the temperature is increased to 0.25 units.

Solution
(a) g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_x^1 10xy² dy
         = [10xy³/3]_{y=x}^{y=1} = (10/3)x(1 − x³),  0 < x < 1

    h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^y 10xy² dx
         = [5x²y²]_{x=0}^{x=y} = 5y⁴,  0 < y < 1

    f(y | x) = f(x, y) / g(x) = 10xy² / [(10/3)x(1 − x³)] = 3y² / (1 − x³),  0 < x < y < 1

(b) P(Y > 1/2 | X = 0.25) = ∫_{1/2}^1 f(y | x = 0.25) dy
                          = ∫_{1/2}^1 3y² / (1 − 0.25³) dy = 8/9
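
The conditional probability in part (b) can be checked numerically (a sketch using scipy):

    from scipy.integrate import quad

    # Conditional density f(y | x) = 3y^2 / (1 - x^3) on x < y < 1.
    f_cond = lambda y, x: 3 * y**2 / (1 - x**3)

    prob, _ = quad(lambda y: f_cond(y, 0.25), 0.5, 1)
    print(prob, 8 / 9)   # both 0.8888...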

Example 14
Given the joint density function

f(x, y) = x(1 + 3y²)/4,  0 < x < 2, 0 < y < 1
          0,             elsewhere

find g(x), h(y), f(x | y), and evaluate P(1/4 < X < 1/2 | Y = 1/3).


Solution
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^1 x(1 + 3y²)/4 dy
     = [xy/4 + xy³/4]_{y=0}^{y=1} = x/2,  0 < x < 2

h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^2 x(1 + 3y²)/4 dx
     = [x²/8 + 3x²y²/8]_{x=0}^{x=2} = (1 + 3y²)/2,  0 < y < 1

f(x | y) = f(x, y) / h(y) = [x(1 + 3y²)/4] · [2/(1 + 3y²)] = x/2,  0 < x < 2,

and

P(1/4 < X < 1/2 | Y = 1/3) = ∫_{1/4}^{1/2} (x/2) dx = 3/64

4. Statistical Independence
If f(x | y) does not depend on y, as is the case in Example 14, then f(x | y) = g(x) and f(x, y) = g(x)h(y). The proof follows by substituting

f(x, y) = f(x | y) h(y)

into the marginal distribution of X. That is,

g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{−∞}^{∞} f(x | y) h(y) dy.

If f(x | y) does not depend on y, we may write

g(x) = f(x | y) ∫_{−∞}^{∞} h(y) dy.

Note that ∫_{−∞}^{∞} h(y) dy = 1 since h(y) is the probability density function of Y. Therefore

g(x) = f(x | y),  and then  f(x, y) = g(x) h(y).

If f(x | y) does not depend on y, then the outcome of the random variable Y has no impact on the outcome of the random variable X. In other words, we say that X and Y are independent random variables.


Definition 4.1
Let X and Y be two random variables, discrete or continuous, with joint probability distribution f(x, y) and marginal distributions g(x) and h(y), respectively. The random variables X and Y are said to be statistically independent if and only if

f(x, y) = g(x) h(y)

for all (x, y) within their range.

The continuous random variables of Example 14 are statistically independent, since the product of the two marginal distributions gives the joint density function. This is obviously not the case, however, for the continuous random variables of Example 13. Checking for statistical independence of discrete random variables requires a more thorough investigation, since it is possible to have the product of the marginal distributions equal to the joint probability distribution for some but not all combinations of (x, y). If you can find any point (x, y) for which f(x, y) is defined such that f(x, y) ≠ g(x)h(y), the discrete variables are not statistically independent.

Example 15
By referring to Example 2, is the number of blue refills in the sample independent of the number
of red refills? (Is X independent of Y?)

Example 16

Let f(x, y) = 6xy²,  0 < x < 1, 0 < y < 1
              0,     elsewhere

Show that X and Y are independent.
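
A symbolic check of the factorization (a sketch with sympy):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = 6 * x * y**2                     # joint density of Example 16

    g = sp.integrate(f, (y, 0, 1))       # marginal of X: 2*x
    h = sp.integrate(f, (x, 0, 1))       # marginal of Y: 3*y**2
    print(sp.simplify(g * h - f) == 0)   # True, so f(x, y) = g(x) h(y)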


5. The Covariance of Two Random Variables


Definition 5.1
Let X and Y be random variables with joint probability distribution f(x, y) and means μ_X and μ_Y, respectively. The covariance of X and Y is

σ_XY = Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]

If X and Y are discrete,

σ_XY = E[(X − μ_X)(Y − μ_Y)] = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x, y)

If X and Y are continuous,

σ_XY = E[(X − μ_X)(Y − μ_Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μ_X)(y − μ_Y) f(x, y) dx dy

Note:
1. Cov(X, Y) is a measurement of the nature of the association between the random variables X and Y. If large values of X often result in large values of Y, or small values of X result in small values of Y, then positive values of X − μ_X will tend to occur with positive values of Y − μ_Y, and negative values of X − μ_X with negative values of Y − μ_Y. Thus, the product (X − μ_X)(Y − μ_Y) will tend to be positive. On the other hand, if large X values often result in small Y values, the product (X − μ_X)(Y − μ_Y) will tend to be negative.
2. The sign of the covariance indicates whether the relationship between two dependent random variables is positive or negative.
3. When X and Y are statistically independent, Cov(X, Y) = 0. The converse is not generally true: two variables may have zero covariance and still not be statistically independent.
4. The covariance only describes the linear relationship between two random variables. Therefore, if the covariance between X and Y is zero, X and Y may still have a nonlinear relationship, which means that they are not necessarily independent.

The alternative and preferred formula for σ_XY is stated in Theorem 5.1.


Theorem 5.1
If X and Y are random variables with means μ_X and μ_Y, respectively, the covariance of X and Y is

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − E[X]E[Y] = E[XY] − μ_X μ_Y

Proof
For the discrete case, we can write

σ_XY = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x, y)
     = Σ_x Σ_y xy f(x, y) − μ_X Σ_x Σ_y y f(x, y) − μ_Y Σ_x Σ_y x f(x, y) + μ_X μ_Y Σ_x Σ_y f(x, y)

Since

μ_X = Σ_x Σ_y x f(x, y),  μ_Y = Σ_x Σ_y y f(x, y),  and  Σ_x Σ_y f(x, y) = 1

for any joint discrete distribution, it follows that

σ_XY = E[XY] − μ_X μ_Y − μ_Y μ_X + μ_X μ_Y = E[XY] − μ_X μ_Y

For the continuous case, the proof is identical with summations replaced by integrals.

Example 17
Find the covariance between X and Y with joint probability function:

               Y
           0      1      4
      1   0.10   0.05   0.15
X     3   0.05   0.20   0.25
      5   0.15   0.00   0.05

Solution
E[XY] = Σ_x Σ_y xy f(x, y)
      = (1)(0) f(1, 0) + (1)(1) f(1, 1) + (1)(4) f(1, 4) + (3)(0) f(3, 0) + (3)(1) f(3, 1) + (3)(4) f(3, 4)
        + (5)(0) f(5, 0) + (5)(1) f(5, 1) + (5)(4) f(5, 4)
      = f(1, 1) + 4 f(1, 4) + 3 f(3, 1) + 12 f(3, 4) + 5 f(5, 1) + 20 f(5, 4)
      = 0.05 + 4(0.15) + 3(0.20) + 12(0.25) + 5(0) + 20(0.05)
      = 5.25
Marginal Distribution for X

x     1     3     5
g(x)  0.30  0.50  0.20

μ_X = Σ_x x g(x) = (1)(0.30) + (3)(0.50) + (5)(0.20) = 2.8


Marginal Distribution for Y

y     0     1     4
h(y)  0.30  0.25  0.45

μ_Y = Σ_y y h(y) = (0)(0.30) + (1)(0.25) + (4)(0.45) = 2.05

σ_XY = E[XY] − μ_X μ_Y
     = 5.25 − (2.8)(2.05)
     = −0.49
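
The whole computation fits in a few lines of Python (the dictionary transcribes the joint table above):

    # Joint pmf of Example 17.
    f = {(1, 0): 0.10, (1, 1): 0.05, (1, 4): 0.15,
         (3, 0): 0.05, (3, 1): 0.20, (3, 4): 0.25,
         (5, 0): 0.15, (5, 1): 0.00, (5, 4): 0.05}

    e_xy = sum(x * y * p for (x, y), p in f.items())   # E[XY] = 5.25
    mu_x = sum(x * p for (x, y), p in f.items())       # 2.8
    mu_y = sum(y * p for (x, y), p in f.items())       # 2.05
    print(e_xy - mu_x * mu_y)                          # -0.49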

Example 18
The fraction X of male runners and the fraction Y of female runners who compete in marathon
races are described by the joint density function
f(x, y) = 8xy,  0 ≤ y ≤ x ≤ 1
          0,    elsewhere

Find the covariance of X and Y.

Solution
g(x) = 4x³,         0 ≤ x ≤ 1;  0 elsewhere
h(y) = 4y(1 − y²),  0 ≤ y ≤ 1;  0 elsewhere

μ_X = E[X] = ∫_0^1 4x⁴ dx = 4/5

μ_Y = E[Y] = ∫_0^1 4y²(1 − y²) dy = 8/15

E[XY] = ∫_0^1 ∫_y^1 8x²y² dx dy = 4/9

σ_XY = E[XY] − μ_X μ_Y = 4/9 − (4/5)(8/15) = 4/225

Example 19
Let X and Y be discrete random variables with a joint probability distribution shown as follows. Show that X and Y are dependent but have zero covariance.

                Y
          −1     0      1
     −1  1/16   3/16   1/16
X     0  3/16    0     3/16
      1  1/16   3/16   1/16


Solution
E[XY] = Σ_x Σ_y xy f(x, y)
      = (−1)(−1) f(−1, −1) + (−1)(1) f(−1, 1) + (1)(−1) f(1, −1) + (1)(1) f(1, 1)
      = 1/16 − 1/16 − 1/16 + 1/16 = 0

Marginal Distribution for X

x     −1    0     1
g(x)  5/16  6/16  5/16

μ_X = Σ_x x g(x) = (−1)(5/16) + (1)(5/16) = 0

Marginal Distribution for Y

y     −1    0     1
h(y)  5/16  6/16  5/16

μ_Y = Σ_y y h(y) = (−1)(5/16) + (1)(5/16) = 0

σ_XY = E[XY] − μ_X μ_Y
     = 0 − 0
     = 0

However, g(−1)h(−1) = (5/16)(5/16) = 25/256 but f(−1, −1) = 1/16. Hence, X and Y are not independent since f(−1, −1) ≠ g(−1)h(−1).


Although the covariance between two random variables does provide information regarding the nature of the relationship, the magnitude of σ_XY does not indicate anything regarding the strength of the relationship, since σ_XY is not scale-free. Its magnitude will depend on the units used to measure both X and Y. There is a scale-free version of the covariance called the correlation coefficient that is used widely in statistics.

Definition 5.2
Let X and Y be random variables with covariance σ_XY and standard deviations σ_X and σ_Y, respectively. The correlation coefficient of X and Y is

ρ_XY = σ_XY / (σ_X σ_Y),  where −1 ≤ ρ_XY ≤ 1.

ρ_XY is free of the units of X and Y. It assumes a value of zero when σ_XY = 0. When there is an exact linear dependency, say Y = a + bX, ρ_XY = 1 if b > 0 and ρ_XY = −1 if b < 0.

Example 20
Find the correlation coefficient between X and Y in Example 17.
Solution
E[X²] = (1²)(0.30) + (3²)(0.50) + (5²)(0.20) = 9.8
E[Y²] = (0²)(0.30) + (1²)(0.25) + (4²)(0.45) = 7.45

σ_X² = 9.8 − (2.8)² = 1.96
σ_Y² = 7.45 − (2.05)² = 3.2475

ρ_XY = σ_XY / (σ_X σ_Y) = −0.49 / (√1.96 √3.2475) = −0.49 / [(1.4)(1.8021)] = −0.1942

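Extending the covariance sketch from Example 17 by a few lines gives the correlation coefficient:

    from math import sqrt

    f = {(1, 0): 0.10, (1, 1): 0.05, (1, 4): 0.15,
         (3, 0): 0.05, (3, 1): 0.20, (3, 4): 0.25,
         (5, 0): 0.15, (5, 1): 0.00, (5, 4): 0.05}

    mu_x = sum(x * p for (x, y), p in f.items())
    mu_y = sum(y * p for (x, y), p in f.items())
    var_x = sum(x**2 * p for (x, y), p in f.items()) - mu_x**2   # 1.96
    var_y = sum(y**2 * p for (x, y), p in f.items()) - mu_y**2   # 3.2475
    cov = sum(x * y * p for (x, y), p in f.items()) - mu_x * mu_y
    print(cov / sqrt(var_x * var_y))   # -0.1942...
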
Example 21
Find the correlation coefficient of X and Y in Example 18.
Solution
E[X²] = ∫_0^1 4x⁵ dx = 2/3
E[Y²] = ∫_0^1 4y³(1 − y²) dy = 1 − 2/3 = 1/3

σ_X² = 2/3 − (4/5)² = 2/75
σ_Y² = 1/3 − (8/15)² = 11/225

ρ_XY = σ_XY / (σ_X σ_Y) = (4/225) / (√(2/75) √(11/225)) = 4/√66 = 0.4924


Note that although the covariance in Example 20 is larger in magnitude (disregarding the sign) than that in Example 21, the relationship of the magnitudes of the correlation coefficients in these two examples is just the reverse. This is evidence that we cannot look at the magnitude of the covariance to decide how strong the relationship is.

Theorem 5.2
E[X ± Y] = E[X] ± E[Y]

If X represents the daily production of some item from machine A and Y the daily production of the same kind of item from machine B, then X + Y represents the total number of items produced daily by both machines. Theorem 5.2 states that the average daily production for both machines is equal to the sum of the average daily production of each machine.

Theorem 5.3
Let X and Y be two independent random variables. Then

E[XY] = E[X] E[Y]

Proof
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy

Since X and Y are independent,

f(x, y) = g(x) h(y),

where g(x) and h(y) are the marginal distributions of X and Y, respectively. Hence

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy g(x) h(y) dx dy = ∫_{−∞}^{∞} x g(x) dx ∫_{−∞}^{∞} y h(y) dy
      = E[X] E[Y]

Theorem 5.3 can be illustrated for discrete variables by considering the experiment of tossing a green die and a red die. Let the random variable X represent the outcome on the green die and the random variable Y represent the outcome on the red die. Then XY represents the product of the numbers that occur on the pair of dice. In the long run, the average of the products of the numbers is equal to the product of the average number that occurs on the green die and the average of the number that occurs on the red die.

Theorem 5.4
If X and Y are independent random variables, then

σ_XY = 0

Proof
σ_XY = E[XY] − μ_X μ_Y
     = E[X] E[Y] − μ_X μ_Y    (X and Y independent, Theorem 5.3)
     = μ_X μ_Y − μ_X μ_Y = 0


Theorem 5.5
If X and Y are random variables with joint probability distribution f(x, y), then

Var(aX + bY + c) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)

Proof
Var(aX + bY + c) = E[((aX + bY + c) − μ_{aX+bY+c})²]
                 = E[((aX + bY + c) − aμ_X − bμ_Y − c)²]
                 = E[(a(X − μ_X) + b(Y − μ_Y))²]
                 = a² E[(X − μ_X)²] + 2ab E[(X − μ_X)(Y − μ_Y)] + b² E[(Y − μ_Y)²]
                 = a² σ_X² + 2ab σ_XY + b² σ_Y²
                 = a² Var(X) + 2ab Cov(X, Y) + b² Var(Y)

Note: Var(aX + bY + c) = Var(aX + bY), and

Var(aX − bY) = a² Var(X) + b² Var(Y) − 2ab Cov(X, Y)

Corollary:

1. Setting b = 0, Var(aX + c) = a² Var(X)
2. Setting a = 1 and b = 0, Var(X + c) = Var(X)
3. Setting b = 0 and c = 0, Var(aX) = a² Var(X)
4. If X and Y are independent random variables, then
   Var(aX + bY) = a² Var(X) + b² Var(Y)
5. If X and Y are independent random variables, then
   Var(aX − bY) = a² Var(X) + b² Var(Y)
6. If X1, X2, …, Xn are independent random variables, then
   Var(a1X1 + a2X2 + … + anXn) = a1² Var(X1) + … + an² Var(Xn)

Corollaries 1 to 3 state that the variance is unchanged if a constant is added to or subtracted from a random variable. The addition or subtraction of a constant simply shifts the values of X to the right or to the left but does not change their variability. However, if a random variable is multiplied or divided by a constant, then Corollaries 1 and 3 state that the variance is multiplied or divided by the square of the constant.
The result stated in Corollary 4 is obtained from Theorem 5.5 by invoking Theorem 5.4. Corollary 5 follows when b in Corollary 4 is replaced by −b.
Generalizing to a linear combination of n independent random variables gives Corollary 6.


Example 22
Suppose E[X] = −3, E[X²] = 13, Var[Y] = 20, E[Y] = 4, and E[XY] = 7. Find Var[5X − 9Y].
Solution
Var[X] = E[X²] − (E[X])²
       = 13 − (−3)² = 4

Cov(X, Y) = E[XY] − E[X] E[Y]
          = 7 − (−3)(4)
          = 19

Var[5X − 9Y] = 25 Var[X] + 81 Var[Y] − 2(5)(9) Cov(X, Y)
             = 25(4) + 81(20) − 90(19) = 100 + 1620 − 1710 = 10

Example 23
If X and Y are random variables with variances σ_X² = 2 and σ_Y² = 4 and covariance σ_XY = −2, find the variance of the random variable Z = 3X − 4Y + 8.
Solution
Var(3X − 4Y + 8) = 9 Var(X) + 16 Var(Y) − 2(3)(4) Cov(X, Y)
                 = (9)(2) + (16)(4) − (24)(−2) = 130

Example 24
Let X and Y denote the amounts of two different types of impurities in a batch of a certain chemical product. Suppose that X and Y are independent random variables with variances σ_X² = 2 and σ_Y² = 3. Find the variance of the random variable Z = 3X − 2Y + 5.
Solution
Var(3X − 2Y + 5) = 9 Var(X) + 4 Var(Y)
                 = (9)(2) + (4)(3) = 30
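
Corollary 4, and hence this result, can be sanity-checked by simulation; a Monte Carlo sketch, with normal distributions chosen arbitrarily since only the variances matter here:

    import random

    random.seed(0)
    n = 200_000
    # Independent X and Y with Var(X) = 2 and Var(Y) = 3.
    xs = [random.gauss(0, 2**0.5) for _ in range(n)]
    ys = [random.gauss(0, 3**0.5) for _ in range(n)]
    zs = [3 * x - 2 * y + 5 for x, y in zip(xs, ys)]

    mean = sum(zs) / n
    print(sum((z - mean)**2 for z in zs) / n)   # close to 9(2) + 4(3) = 30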

6. The Multinomial Probability Distribution


The binomial experiment in chapter 2 becomes a multinomial experiment if we let each trial
have more than two possible outcomes. The classification of a manufactured product as being
light, heavy, or acceptable and the recording of accidents at a certain intersection according to
the day of the week (for example, the number of accidents on Monday, Tuesday, …, Friday)
constitute multinomial experiments. The drawing of a card from a deck with replacement is
also a multinomial experiment if the 4 suits are the outcomes of interest.

In general, if a given trial can result in any one of k possible outcomes E1, E2, …, Ek with probabilities p1, p2, …, pk, then the multinomial distribution will give the probability that E1 occurs x1 times, E2 occurs x2 times, …, and Ek occurs xk times in n independent trials, where

x1 + x2 + … + xk = n.

We shall denote this joint probability distribution by

f(x1, x2, …, xk; p1, p2, …, pk, n),

where p1 + p2 + … + pk = 1, since the result of each trial must be one of the k possible outcomes. The following shows the multinomial distribution.

Multinomial Distribution
If a given trial can result in the k outcomes E1, E2, …, Ek with probabilities p1, p2, …, pk, then the probability distribution of the random variables X1, X2, …, Xk, representing the number of occurrences of E1, E2, …, Ek in n independent trials, is

f(x1, x2, …, xk; p1, p2, …, pk, n) = [n! / (x1! x2! … xk!)] p1^x1 p2^x2 … pk^xk,

with

Σ_{i=1}^{k} xi = n  and  Σ_{i=1}^{k} pi = 1.

Example 25
A certain city has 3 newspapers, A, B, and C. Newspaper A has 50 percent of the readers in the city, newspaper B has 30 percent of the readers, and newspaper C has the remaining 20 percent. Find the probability that, among 8 randomly chosen readers in that city, 5 will read newspaper A, 2 will read newspaper B, and 1 will read newspaper C. (Assume no one reads more than one newspaper.)
Solution
f(5, 2, 1; 0.5, 0.3, 0.2, 8) = [8! / (5! 2! 1!)] (0.5)⁵ (0.3)² (0.2)¹
                             = 0.0945
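
The same value comes from scipy's multinomial distribution (a sketch):

    from scipy.stats import multinomial

    # 8 readers split across papers A, B, C with p = (0.5, 0.3, 0.2).
    print(multinomial.pmf([5, 2, 1], n=8, p=[0.5, 0.3, 0.2]))   # 0.0945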

Example 26
The complexity of arrivals and departures of planes at an airport is such that computer
simulation is often used to model the “ideal” conditions. For a certain airport with three
runways, it is known that in the ideal setting the following are the probabilities that the
individual runways are accessed by a randomly arriving commercial jet:

Runway 1: p1 = 2/9
Runway 2: p2 = 1/6
Runway 3: p3 = 11/18

What is the probability that 6 randomly arriving airplanes are distributed in the following fashion?

Runway 1: 2 airplanes
Runway 2: 1 airplane
Runway 3: 3 airplanes

Solution
f(2, 1, 3; 2/9, 1/6, 11/18, 6) = [6! / (2! 1! 3!)] (2/9)² (1/6)¹ (11/18)³
                               = 0.1127

7. Conditional Expectations

Definition 7.1
If X and Y are any two random variables, the conditional expectation of g(X), given that Y = y, is defined to be

E[g(X) | Y = y] = ∫_{−∞}^{∞} g(x) f(x | y) dx  if X and Y are jointly continuous, and

E[g(X) | Y = y] = Σ_{all x} g(x) p(x | y)  if X and Y are jointly discrete.

Let us denote by E[X | Y] that function of the random variable Y whose value at Y = y is E[X | Y = y]. Note that E[X | Y] is itself a random variable. An extremely important property of conditional expectation is that for all random variables X and Y,

E[X] = E[E[X | Y]]   (7.1)

If Y is a discrete random variable, then Equation (7.1) states that

E[X] = Σ_y E[X | Y = y] P(Y = y)   (7.2)

while if Y is continuous with density f_Y(y), then Equation (7.1) says that

E[X] = ∫_{−∞}^{∞} E[X | Y = y] f_Y(y) dy   (7.3)

The following shows the proof for Equation (7.2). The proof for Equation (7.3) is similar, with the summations replaced by integrals.


Proof
Σ_y E[X | Y = y] P(Y = y) = Σ_y Σ_x x P(X = x | Y = y) P(Y = y)
   = Σ_y Σ_x x [P(X = x, Y = y) / P(Y = y)] P(Y = y)
   = Σ_y Σ_x x P(X = x, Y = y)
   = Σ_x x Σ_y P(X = x, Y = y)
   = Σ_x x P(X = x)
   = E[X]

Example 27
A quality control plan for an assembly line involves sampling n = 10 finished items per day and counting Y, the number of defectives. If p denotes the probability of observing a defective, then Y has a binomial distribution, assuming a large number of items are produced by the line. But p varies from day to day and is assumed to have a uniform distribution on the interval from 0 to 1/4. Find the expected value of Y.
Solution
Y ~ Bin(10, p), where p ~ Uniform(0, 1/4)

E[Y] = ∫_{−∞}^{∞} E[Y | p] f(p) dp

E[Y | p] = np = 10p

f(p) = 4,  0 ≤ p ≤ 1/4
       0,  elsewhere

E[Y] = ∫_0^{1/4} (10p)(4) dp = [20p²]_0^{1/4} = 20/16 = 5/4
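
The answer can be cross-checked both by direct integration and by simulation (a sketch; the Monte Carlo loop draws p uniformly, then a Binomial(10, p) count):

    import random
    from scipy.integrate import quad

    # Direct: E[Y] = integral of (10p)(4) dp over (0, 1/4).
    e_y, _ = quad(lambda p: 10 * p * 4, 0, 1/4)
    print(e_y)   # 1.25 = 5/4

    # Monte Carlo cross-check.
    random.seed(1)
    n = 100_000
    total = 0
    for _ in range(n):
        p = random.uniform(0, 0.25)
        total += sum(random.random() < p for _ in range(10))
    print(total / n)   # approximately 1.25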


Example 28
A professor works in Moon Township and lives in Pittsburgh, about a 25-mile commute. The professor randomly chooses from 3 different routes home in a futile attempt to evade rush hour traffic. The routes are identified by the name of a major bridge along the way. The professor has accumulated data over a lengthy period of time on the mean drive times of the three routes. Using the data summary given below, calculate the overall expected drive time.

Route            Probability of Route   Expected Time of Route (in minutes)
Wickle bridge    0.2                    55
Fort bridge      0.4                    50
Liberty bridge   0.4                    45

Solution
Let X be the drive time, and let
Y = 1 if route Wickle bridge is chosen,
Y = 2 if route Fort bridge is chosen,
Y = 3 if route Liberty bridge is chosen.

E[X] = Σ_y E[X | Y = y] P(Y = y)
     = E[X | Y = 1] P(Y = 1) + E[X | Y = 2] P(Y = 2) + E[X | Y = 3] P(Y = 3)
     = (55)(0.2) + (50)(0.4) + (45)(0.4)
     = 49 minutes
