and thus we are furnished with two r.v.’s or a two-dimensional random vector. Let
us denote by (X, Y) the two relevant r.v.’s or the two-dimensional random vector.
Here are some examples where two r.v.’s arise in a natural way. The pair of r.v.’s
(X, Y) denotes, respectively: The SAT and GPA scores of a student from a specified
student population; the number of customers waiting for service in two lines in your
local favorite bank; the days of a given year that the Dow Jones Industrial Average
index closed with a gain and the corresponding gains; the number of hours a student
spends daily for studying and for other activities; the weight and the height of an
individual from a targeted population; the amount of fertilizer used and the yield of a
certain agricultural commodity; the lifetimes of two components used in an electronic
system; the dosage of a drug used for treating a certain allergy and the number of days
a patient enjoys relief.
We are going to restrict ourselves to the case where both X and Y are either discrete
or of the continuous type. The concepts of probability distribution function, distri-
bution function, and probability density function are defined by a straightforward
generalization of the definition of these concepts in Section 2.2 of Chapter 2. Thus, the
joint probability distribution function of (X, Y), to be denoted by PX,Y, is defined by:
PX,Y(B) = P[(X, Y) ∈ B], B ⊆ ℝ² = ℝ × ℝ, the two-dimensional Euclidean space,
the plane. In particular, by taking B = (−∞, x] × (−∞, y], we obtain the joint d.f.
of X, Y, to be denoted by FX,Y; namely, FX,Y(x, y) = P(X ≤ x, Y ≤ y), x, y ∈ ℝ. The
d.f. FX,Y has properties similar to the ones mentioned in the case of a single r.v.,
namely:
1. 0 ≤ FX,Y(x, y) ≤ 1 for all x, y ∈ ℝ.
Whereas it is, clearly, still true that x1 ≤ x2 and y1 ≤ y2 imply FX,Y (x1 , y1 ) ≤
FX,Y (x2 , y2 ), property #2 in the case of a single r.v. may be restated as follows:
x1 < x2 implies FX (x2 ) − FX (x1 ) ≥ 0. This property is replaced here by:
2. The variation of FX,Y over rectangles with sides parallel to the axes, given in
Figure 4.1, is ≥ 0.
3. FX,Y is continuous from the right (right-continuous); i.e., if xn ↓ x and yn ↓ y, then
FX,Y (xn , yn ) → FX,Y (x, y) as n → ∞.
4. FX,Y(+∞, +∞) = 1 and FX,Y(−∞, −∞) = FX,Y(−∞, y) = FX,Y(x, −∞) = 0
for any x, y ∈ ℝ, where, of course, FX,Y(+∞, +∞) is defined to be the limit of
FX,Y(xn, yn) as xn ↑ ∞ and yn ↑ ∞, and similarly for the remaining cases.
Property #1 is immediate, and property #2 follows by the fact that the variation of
FX,Y as described is simply the probability that the pair (X, Y) lies in the rectangle of
Figure 4.1, or, more precisely, the probability P(x1 < X ≤ x2 , y1 < Y ≤ y2 ), which,
of course, is ≥ 0; the justification of properties #3 and #4 is based on Theorem 2 in
Chapter 2.
Now, suppose that the r.v.'s X and Y are discrete and take on the values xi
and yj, i, j ≥ 1, respectively. Then the joint p.d.f. of X and Y, to be denoted by
fX,Y, is defined by: fX,Y(xi, yj) = P(X = xi, Y = yj), and fX,Y(x, y) = 0 when
(x, y) ≠ (xi, yj), i, j ≥ 1.
FIGURE 4.1
The variation V of FX,Y over the rectangle with corners (x1, y1), (x2, y1), (x1, y2), and
(x2, y2) is V = FX,Y(x1, y1) + FX,Y(x2, y2) − FX,Y(x1, y2) − FX,Y(x2, y1).
Example 1. Each one of the r.v.'s X and Y takes on four values only, 0, 1, 2, 3, with
joint probabilities expressed best in matrix form, as in Table 4.1:

Table 4.1
y\x      0     1     2     3    Totals
0       0.05  0.21  0     0     0.26
1       0.20  0.26  0.08  0     0.54
2       0     0.06  0.07  0.02  0.15
3       0     0     0.03  0.02  0.05
Totals  0.25  0.53  0.18  0.04  1.00

Discussion. The r.v.'s X and Y may represent, for instance, the numbers of customers
waiting for service in two lines in a bank. Then, for example, for (x, y) with x = 2
and y = 1, we have FX,Y(x, y) = FX,Y(2, 1) = Σ_{u≤2, v≤1} fX,Y(u, v) = fX,Y(0, 0) +
fX,Y(0, 1) + fX,Y(1, 0) + fX,Y(1, 1) + fX,Y(2, 0) + fX,Y(2, 1) = 0.05 + 0.20 + 0.21 +
0.26 + 0 + 0.08 = 0.80; also, P(2 ≤ X ≤ 3, 0 ≤ Y ≤ 2) = fX,Y(2, 0) + fX,Y(2, 1) +
fX,Y(2, 2) + fX,Y(3, 0) + fX,Y(3, 1) + fX,Y(3, 2) = 0 + 0.08 + 0.07 + 0 + 0 +
0.02 = 0.17.
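Computations of this kind are easy to check by machine. Here is a minimal Python sketch (ours, not the book's) that encodes Table 4.1 as a dictionary and reproduces both probabilities just obtained:

```python
# A quick numerical check of Example 1; the dictionary encodes Table 4.1.
f = {
    (0, 0): 0.05, (1, 0): 0.21, (2, 0): 0.00, (3, 0): 0.00,
    (0, 1): 0.20, (1, 1): 0.26, (2, 1): 0.08, (3, 1): 0.00,
    (0, 2): 0.00, (1, 2): 0.06, (2, 2): 0.07, (3, 2): 0.02,
    (0, 3): 0.00, (1, 3): 0.00, (2, 3): 0.03, (3, 3): 0.02,
}

def F(x, y):
    """Joint d.f.: F(x, y) = P(X <= x, Y <= y)."""
    return sum(p for (u, v), p in f.items() if u <= x and v <= y)

print(round(F(2, 1), 2))  # 0.8
print(round(sum(p for (u, v), p in f.items()
                if 2 <= u <= 3 and 0 <= v <= 2), 2))  # 0.17
```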
Now, suppose that both X and Y are of the continuous type, and, indeed, a little
bit more; namely, there exists a nonnegative function fX,Y defined on ℝ² such that,
for all x and y in ℝ: FX,Y(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} fX,Y(s, t) ds dt. Then for B ⊆ ℝ² (interpret
B as a familiar geometric figure in ℝ²): P[(X, Y) ∈ B] = ∫∫_B fX,Y(x, y) dx dy,
and, in particular, ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = P[(X, Y) ∈ ℝ²] = 1. The function
fX,Y is called the joint p.d.f. of the r.v.'s X and Y. Analogously to the case of
a single r.v., the relationship (∂²/∂x∂y)FX,Y(x, y) = fX,Y(x, y) holds true (for continuity
points (x, y) of fX,Y), so that not only does the joint p.d.f. determine the joint
d.f. through an integration process, but the converse is also true; i.e., the joint d.f.
determines the joint p.d.f. through differentiation. Again, as in the case of a single
r.v., P(X = x, Y = y) = 0 for all x, y ∈ ℝ; also, if a nonnegative function f, defined
on ℝ², integrates to 1, then there exist two r.v.'s X and Y for which f is their
joint p.d.f.
This section is concluded with a reference to Example 37 in Chapter 1, where two
continuous r.v.’s X and Y arise in a natural manner. Later on (see Subsection 4.5.2),
it may be stipulated that the joint distribution of X and Y is the Bivariate Normal. For
the sake of a simpler illustration, consider the following example.
Example 2. Let the r.v.'s X and Y have the joint p.d.f. fX,Y(x, y) = λ1λ2 e^{−λ1x − λ2y},
x, y > 0 (λ1, λ2 > 0). For example, X and Y may represent the lifetimes of two
components in an electronic system. Derive the joint d.f. FX,Y.

Discussion. For x, y > 0,
FX,Y(x, y) = ∫_0^y ∫_0^x λ1λ2 e^{−λ1s − λ2t} ds dt = (1 − e^{−λ1x})(1 − e^{−λ2y}),
and FX,Y(x, y) = 0 otherwise.
By letting x and y → ∞, we obtain FX,Y(∞, ∞) = 1, which also shows that fX,Y, as
given above, is, indeed, a p.d.f., since FX,Y(∞, ∞) = ∫_0^∞ ∫_0^∞ λ1λ2 e^{−λ1s − λ2t} ds dt.
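As a sanity check of the closed form just derived, one can compare it against direct numerical integration for any particular choice of λ1, λ2, x, y; the values below are illustrative only, and SciPy is assumed to be available.

```python
# Numerical check of Example 2: F(x, y) = (1 - e^{-lam1 x})(1 - e^{-lam2 y}).
import math
from scipy.integrate import dblquad

lam1, lam2, x, y = 1.5, 0.5, 2.0, 3.0  # illustrative values

# dblquad integrates the first (inner) argument s over (0, x),
# and the second (outer) argument t over (0, y).
num, _ = dblquad(lambda s, t: lam1 * lam2 * math.exp(-lam1 * s - lam2 * t),
                 0, y, 0, x)
closed = (1 - math.exp(-lam1 * x)) * (1 - math.exp(-lam2 * y))
print(round(num, 6), round(closed, 6))  # the two numbers agree
```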
Example 3. It is claimed that the function FX,Y given by: FX,Y(x, y) = (1/16)xy(x + y),
0 ≤ x ≤ 2, 0 ≤ y ≤ 2, is the joint d.f. of the r.v.'s X and Y. Then:
(i) Verify that FX,Y is, indeed, a d.f.
(ii) Determine the corresponding joint p.d.f. fX,Y.
(iii) Verify that fX,Y found in part (ii) is, indeed, a p.d.f.
(iv) Calculate the probability: P(0 ≤ X ≤ 1, 1 ≤ Y ≤ 2).

Discussion.
(i) We have to verify the validity of the defining properties (1)–(4). Clearly,
FX,Y(x, y) attains its maximum for x = y = 2, which is 1. Since also
FX,Y(x, y) ≥ 0, the first property holds. Next, for any rectangle as in Figure 4.1
(with 0 ≤ x1 ≤ x2 ≤ 2 and 0 ≤ y1 ≤ y2 ≤ 2), the variation of FX,Y is
(1/16)(x2 − x1)(y2 − y1)(x1 + x2 + y1 + y2) ≥ 0; continuity from the right and the
limiting properties in (4) are immediate, since FX,Y is continuous.
(ii) For 0 < x < 2 and 0 < y < 2,
fX,Y(x, y) = (∂²/∂x∂y)FX,Y(x, y) = (∂²/∂x∂y)[(1/16)(x²y + xy²)] = (1/8)(x + y),
and fX,Y(x, y) = 0 otherwise.
(iii) Since fX,Y is nonnegative, it suffices to check that it integrates to 1; indeed,
∫_0^2 ∫_0^2 (1/8)(x + y) dx dy = (1/8)(2 × 2 + 2 × 2) = 1.
(iv) Here, P(0 ≤ X ≤ 1, 1 ≤ Y ≤ 2) = ∫_1^2 ∫_0^1 (1/8)(x + y) dx dy = (1/8)[∫_1^2 (∫_0^1 x dx) dy +
∫_1^2 (y ∫_0^1 dx) dy] = (1/8)(1/2 × 1 + 1 × 3/2) = 1/4.
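Parts (iii) and (iv) can be double-checked by numerical integration (SciPy assumed):

```python
# Numerical check of Example 3: f(x, y) = (x + y)/8 on [0, 2] x [0, 2].
from scipy.integrate import dblquad

f = lambda x, y: (x + y) / 8

total, _ = dblquad(f, 0, 2, 0, 2)  # x inner over [0, 2], y outer over [0, 2]
prob, _ = dblquad(f, 1, 2, 0, 1)   # y outer in [1, 2], x inner in [0, 1]
print(round(total, 6), round(prob, 6))  # 1.0  0.25
```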
FIGURE 4.2
Range of the pair (x, y): the shaded region bounded by the parabola y = x², the y-axis,
and the line y = 1, with corner point (1, 1).
Example 4. If the function fX,Y is given by: fX,Y(x, y) = cx²y for 0 < x² ≤ y < 1,
x > 0 (and 0 otherwise):
(i) Determine the constant c, so that fX,Y is a p.d.f.
(ii) Calculate the probability: P(0 < X ≤ 3/4, 1/4 ≤ Y < 1).

Discussion.
(i) Clearly, for the function to be nonnegative, c must be > 0. The actual value of
c will be determined through the relationship:
∫∫_{(x,y); 0<x²≤y<1} cx²y dx dy = 1.
The region over which the p.d.f. is positive is the shaded region in Figure 4.2,
determined by a branch of the parabola y = x², the y-axis, and the line segment
connecting the points (0, 1) and (1, 1). Since for each fixed x with 0 < x < 1, y
ranges from x² to 1, we have:
∫∫_{x²≤y<1} cx²y dx dy = c ∫_0^1 x² (∫_{x²}^1 y dy) dx = (c/2) ∫_0^1 x²(1 − x⁴) dx
= (c/2)(1/3 − 1/7) = 2c/21 = 1, and c = 21/2.
(ii) Since y = x² = 1/4 for x = 1/2, it follows that, for each x with 0 < x ≤ 1/2, the range
of y is from 1/4 to 1; on the other hand, for each x with 1/2 < x ≤ 3/4, the range of y
is from x² to 1 (see Figure 4.3).
FIGURE 4.3
Diagram facilitating integration: for 0 < x ≤ 1/2, y ranges from 1/4 to 1; for
1/2 < x ≤ 3/4, y ranges from x² to 1.
Thus,
P(0 < X ≤ 3/4, 1/4 ≤ Y < 1) = c ∫_0^{1/2} x² (∫_{1/4}^1 y dy) dx + c ∫_{1/2}^{3/4} x² (∫_{x²}^1 y dy) dx
= (c/2) ∫_0^{1/2} x² (1 − 1/16) dx + (c/2) ∫_{1/2}^{3/4} x² (1 − x⁴) dx
= (21/4)(15/16)(1/24) + (21/4)(19/192 − 2059/114688) ≈ 0.2051 + 0.4253 = 0.6304.
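The value of c and the probability just computed can be verified numerically; the sketch below (SciPy assumed) integrates the density over the parabolic region directly.

```python
# Numerical check of Example 4: c = 21/2 and P(0 < X <= 3/4, 1/4 <= Y < 1).
from scipy.integrate import dblquad

c = 21 / 2
f = lambda y, x: c * x**2 * y  # y is the inner variable here

total, _ = dblquad(f, 0, 1, lambda x: x**2, 1)               # y from x^2 to 1
prob, _ = dblquad(f, 0, 0.75, lambda x: max(x**2, 0.25), 1)  # region of part (ii)
print(round(total, 4), round(prob, 4))  # 1.0  0.6304
```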
EXERCISES
1.1 Let X and Y be r.v.’s denoting, respectively, the number of cars and buses
lined up at a stoplight at a given point in time, and suppose their joint p.d.f. is
given by the following table:
y\x 0 1 2 3 4 5
1.2 Let X and Y be the r.v.'s denoting the number of children in a family from the
target population and the number of boys among those children, respectively,
and suppose that P(observing a boy) = P(observing a girl) = 0.5.
Calculate the joint p.d.f. fX,Y (x, y) = P(X = x, Y = y), 0 ≤ y ≤ x, x = 0, 1, 2.
Hint. Tabulate the joint probabilities as indicated below by utilizing the
formula:
P(X = x, Y = y) = P(Y = y | X = x)P(X = x).
y\x 0 1 2
0
1
2
1.3 If the r.v.’s X and Y have the joint p.d.f. given by:
fX,Y (x, y) = x + y, 0 < x < 1, 0 < y < 1,
1.6 If the r.v.’s X and Y have the joint p.d.f. fX,Y (x, y) = e−x−y , for x > 0 and
y > 0, compute the following probabilities:
(i) P(X ≤ x); (ii) P(Y ≤ y); (iii) P(X < Y); (iv) P(X + Y ≤ 3).
Hint. For part (iii), draw the picture of the set for which 0 < x < y, and for
part (iv), draw the picture of the set for which 0 < x + y ≤ 3.
1.7 Let X and Y be r.v.'s jointly distributed with p.d.f. fX,Y(x, y) = 2/c², for
0 < x ≤ y < c.
Determine the constant c.
Hint. Draw the picture of the set for which 0 < x ≤ y < c.
1.8 The r.v.’s X and Y have the joint p.d.f. fX,Y given by:
Determine the condition that c1 and c2 must satisfy so that fX,Y is, indeed, a
p.d.f.
Hint. All that can be done here is to find a relation that c1 and c2 satisfy; c1
and c2 cannot be determined separately.
1.10 The joint p.d.f. of the r.v.’s X and Y is given by:
fX,Y (x, y) = cx, x > 0, y > 0, 1 ≤ x + y < 2 (c > 0).
FIGURE 4.4
Diagram facilitating integration: the region between the lines x + y = 1 (i.e.,
y = 1 − x) and x + y = 2 (i.e., y = 2 − x) in the first quadrant, 0 < x < 2.

4.2 Marginal and conditional p.d.f.'s, conditional expectation and variance

For two r.v.'s X and Y with joint p.d.f. fX,Y, the marginal p.d.f.'s are obtained by
fX(x) = Σ_y fX,Y(x, y) or fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy for the discrete and the
continuous case, respectively, and likewise for fY. The conditional p.d.f.'s are then
defined by:
fX|Y (x|y) = fX,Y (x, y)/fY (y) for fixed y with fY (y) > 0,
and
fY|X (y|x) = fX,Y (x, y)/fX (x) for fixed x with fX (x) > 0.
These nonnegative functions are, actually, p.d.f.'s. For example, for the continuous
case:
∫_{−∞}^{∞} fX|Y(x|y) dx = [1/fY(y)] ∫_{−∞}^{∞} fX,Y(x, y) dx = fY(y)/fY(y) = 1,
and similarly for fY|X(y|x); in the discrete case, integrals are replaced by summation
signs. The p.d.f. fX|Y(·|y) is called the conditional p.d.f. of X, given Y = y, and
fY|X(·|x) is the conditional p.d.f. of Y, given X = x. The motivation for this
terminology is as follows: For the discrete case, fX|Y(x|y) = fX,Y(x, y)/fY(y) =
P(X = x, Y = y)/P(Y = y) = P(X = x | Y = y); i.e., fX|Y(x|y) does, indeed, stand
for the conditional probability that X = x, given that Y = y. Likewise for fY|X(·|x).
In the continuous case, the points x and y are to be replaced by “small” intervals
around them.
The concepts introduced so far are now illustrated by means of examples.
Example 5. Refer to Example 1 and derive the marginal and conditional p.d.f.’s
involved.
Discussion. From Table 4.1, we have: fX (0) = 0.25, fX (1) = 0.53, fX (2) = 0.18,
and fX (3) = 0.04; also, fY (0) = 0.26, fY (1) = 0.54, fY (2) = 0.15, and fY (3) = 0.05.
Thus, the probability that there are two people in line one, for instance, regard-
less of how many people are in the other line, is P(X = 2) = fX (2) = 0.18. For
fX|Y(x|y):

y\x        0                  1                  2                  3
0     0.05/0.26 ≈ 0.192  0.21/0.26 ≈ 0.808  0                  0
1     0.20/0.54 ≈ 0.370  0.26/0.54 ≈ 0.482  0.08/0.54 ≈ 0.148  0
2     0                  0.06/0.15 = 0.400  0.07/0.15 ≈ 0.467  0.02/0.15 ≈ 0.133
3     0                  0                  0.03/0.05 = 0.600  0.02/0.05 = 0.400

Likewise, for fY|X(y|x):

y\x        0                  1                  2                  3
0     0.05/0.25 = 0.200  0.21/0.53 ≈ 0.396  0                  0
1     0.20/0.25 = 0.800  0.26/0.53 ≈ 0.491  0.08/0.18 ≈ 0.444  0
2     0                  0.06/0.53 ≈ 0.113  0.07/0.18 ≈ 0.389  0.02/0.04 = 0.500
3     0                  0                  0.03/0.18 ≈ 0.167  0.02/0.04 = 0.500
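The two tables are mechanical to produce once the joint p.m.f. is stored; a short Python sketch (ours) that computes the marginals and a conditional value:

```python
# Marginal and conditional p.m.f.'s of Example 5 from the nonzero entries
# of the joint p.m.f. of Table 4.1.
from collections import defaultdict

f = {(0, 0): 0.05, (1, 0): 0.21, (0, 1): 0.20, (1, 1): 0.26, (2, 1): 0.08,
     (1, 2): 0.06, (2, 2): 0.07, (3, 2): 0.02, (2, 3): 0.03, (3, 3): 0.02}

fX, fY = defaultdict(float), defaultdict(float)
for (x, y), p in f.items():
    fX[x] += p  # marginal of X: sum over y
    fY[y] += p  # marginal of Y: sum over x

# Conditional p.m.f.: fX|Y(x|y) = fX,Y(x, y)/fY(y), where fY(y) > 0.
fX_given_Y = {(x, y): p / fY[y] for (x, y), p in f.items()}

print({x: round(p, 2) for x, p in sorted(fX.items())})
# {0: 0.25, 1: 0.53, 2: 0.18, 3: 0.04}
print(round(fX_given_Y[(1, 0)], 3))  # 0.808, as in the first table
```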
Example 6. Refer to Example 2 and derive the marginal d.f.’s and p.d.f.’s, as well
as the conditional p.d.f.’s, involved.
Example 7. Refer to Example 4 and determine the marginal and conditional p.d.f.’s
fX , fY , fX|Y , and fY|X .
Discussion. We have:
fX(x) = ∫_{x²}^1 (21/2)x²y dy = (21/2)x² ∫_{x²}^1 y dy = (21/4)x²(1 − x⁴), 0 < x < 1,
fY(y) = ∫_0^{√y} (21/2)x²y dx = (21/2)y ∫_0^{√y} x² dx = (7/2)y²√y, 0 < y < 1,
and therefore
fX|Y(x|y) = [(21/2)x²y] / [(7/2)y²√y] = 3x²/(y√y), 0 < x ≤ √y, 0 < y < 1,
fY|X(y|x) = [(21/2)x²y] / [(21/4)x²(1 − x⁴)] = 2y/(1 − x⁴), x² ≤ y < 1, 0 < x < 1.
Example 8. Let the r.v.'s X and Y have the joint p.d.f. fX,Y(x, y) = 8xy,
0 < x ≤ y < 1 (and 0 otherwise). Then:
(i) Verify that fX,Y is, indeed, a p.d.f.
(ii) Determine the marginal and conditional p.d.f.'s fX, fY, fX|Y, and fY|X.
(iii) Calculate EX, Var(X), EY, Var(Y), and E(XY).

Discussion.
(i) Since fX,Y is nonnegative, all we have to check is that it integrates to 1. In fact,
∫_0^1 ∫_0^y 8xy dx dy = 8 ∫_0^1 y (∫_0^y x dx) dy = 4 ∫_0^1 y³ dy = 1.
(ii)
fX(x) = ∫_x^1 8xy dy = 8x ∫_x^1 y dy = 4x(1 − x²), 0 < x < 1,
fY(y) = ∫_0^y 8xy dx = 8y ∫_0^y x dx = 4y³, 0 < y < 1,
and therefore
fX|Y(x|y) = 8xy/(4y³) = 2x/y², 0 < x ≤ y < 1,
fY|X(y|x) = 8xy/[4x(1 − x²)] = 2y/(1 − x²), 0 < x ≤ y < 1.
(iii)
EX = ∫_0^1 x × 4x(1 − x²) dx = 4(∫_0^1 x² dx − ∫_0^1 x⁴ dx) = 4(1/3 − 1/5) = 8/15,
EX² = ∫_0^1 x² × 4x(1 − x²) dx = 4(∫_0^1 x³ dx − ∫_0^1 x⁵ dx) = 4(1/4 − 1/6) = 1/3, so that
Var(X) = EX² − (EX)² = 1/3 − 64/225 = 11/225;
EY = ∫_0^1 y × 4y³ dy = 4 ∫_0^1 y⁴ dy = 4/5,
EY² = ∫_0^1 y² × 4y³ dy = 4 ∫_0^1 y⁵ dy = 2/3, so that
Var(Y) = EY² − (EY)² = 2/3 − 16/25 = 2/75 = 6/225.
Finally,
E(XY) = ∫_0^1 ∫_0^y xy × 8xy dx dy = 8 ∫_0^1 y² (∫_0^y x² dx) dy = (8/3) ∫_0^1 y⁵ dy = 4/9.
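All five moments can be confirmed by numerical integration over the triangular support (SciPy assumed):

```python
# Numerical check of Example 8(iii): f(x, y) = 8xy on 0 < x <= y < 1.
from scipy.integrate import dblquad

f = lambda x, y: 8 * x * y  # x is the inner variable below

EX, _ = dblquad(lambda x, y: x * f(x, y), 0, 1, 0, lambda y: y)
EX2, _ = dblquad(lambda x, y: x**2 * f(x, y), 0, 1, 0, lambda y: y)
EY, _ = dblquad(lambda x, y: y * f(x, y), 0, 1, 0, lambda y: y)
EXY, _ = dblquad(lambda x, y: x * y * f(x, y), 0, 1, 0, lambda y: y)

print(round(EX, 4), round(EX2 - EX**2, 4))  # 0.5333 (= 8/15)  0.0489 (= 11/225)
print(round(EY, 4), round(EXY, 4))          # 0.8 (= 4/5)      0.4444 (= 4/9)
```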
Once a conditional p.d.f. is at hand, an expectation can be defined as done in
relations (1), (2), and (3) of Chapter 3. However, a modified notation will be
needed to reveal the fact that the expectation is calculated with respect to a
conditional p.d.f. The resulting expectation is the conditional expectation of
one r.v., given the other r.v., as specified below:
E(X | Y = yj) = Σ_{xi} xi fX|Y(xi | yj) or E(X | Y = y) = ∫_{−∞}^{∞} x fX|Y(x | y) dx, (2)
and
E(Y | X = xi) = Σ_{yj} yj fY|X(yj | xi) or E(Y | X = x) = ∫_{−∞}^{∞} y fY|X(y | x) dy. (3)
Of course, it is understood that the preceding expectations exist as explained
right after relations (2) and (3) in Chapter 3 were defined. However, unlike the
results in (1)–(3) in Chapter 3 which are numbers, in relations (2) and (3) above
the outcomes depend on yj or y, and xi or x, respectively, which reflect the values
that the “conditioning” r.v.’s assume. For illustrative purposes, let us calculate
some conditional expectations.
Remark 1. However, Var[E(X | Y)] ≤ Var(X), with equality holding if and only
if X is a function of Y (with probability 1). A proof of this fact may be found in
Section 5.3.1 of the book A Course in Mathematical Statistics, 2nd edition (1997),
Academic Press, by G. G. Roussas.
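The inequality in Remark 1 follows from the law of total variance; a short derivation, using E[E(X | Y)] = EX and E[E(X² | Y)] = EX², is sketched below.

```latex
\begin{align*}
\mathrm{Var}(X)
  &= E X^2 - (E X)^2
   = E\bigl[E(X^2 \mid Y)\bigr] - \bigl(E[E(X \mid Y)]\bigr)^2 \\
  &= E\bigl[\mathrm{Var}(X \mid Y) + \{E(X \mid Y)\}^2\bigr]
     - \bigl(E[E(X \mid Y)]\bigr)^2 \\
  &= E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}\bigl[E(X \mid Y)\bigr]
   \;\ge\; \mathrm{Var}\bigl[E(X \mid Y)\bigr].
\end{align*}
```

Equality forces E[Var(X | Y)] = 0, i.e., Var(X | Y) = 0 with probability 1, which is exactly the statement that X is a function of Y.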
Example 13. Verify the first relation in (4), E[E(X | Y)] = EX, for Examples 4
and 8.

Discussion. From Example 7, fX(x) = (21/4)x²(1 − x⁴), 0 < x < 1, so that
EX = ∫_0^1 x × (21/4)x²(1 − x⁴) dx = (21/4)(∫_0^1 x³ dx − ∫_0^1 x⁷ dx) = (21/4)(1/4 − 1/8) = 21/32.
From Example 11, E(X | Y) = (3/4)√Y, 0 < Y < 1, whereas, from Example 7, fY(y) =
(7/2)y²√y, 0 < y < 1, so that
E[E(X | Y)] = ∫_0^1 (3√y/4) × (7/2)y²√y dy = (21/8) ∫_0^1 y³ dy = 21/32 = EX.
(However, Var[E(X | Y)] = Var((3/4)√Y) = (9/16)Var(√Y) = (9/16)[EY − (E√Y)²] =
(9/16)(7/9 − 49/64) = 7/1024 ≈ 0.007 < 0.036 ≈ Var(X), in accordance with Remark 1;
here EY = 7/9, E√Y = 7/8, and Var(X) = EX² − (EX)² = 7/15 − (21/32)².)
Also, from Examples 12 and 8(ii), E(X | Y) = 2Y/3, 0 < Y < 1, and fY(y) = 4y³, so that
E[E(X | Y)] = ∫_0^1 (2y/3) × 4y³ dy = (8/3) ∫_0^1 y⁴ dy = 8/15 = EX.
(However, Var[E(X | Y)] = Var(2Y/3) = (4/9)Var(Y) = (4/9)(2/75) = 8/675 <
11/225 = Var(X), again in accordance with Remark 1.)
The conditional variance of X, given Y, is defined by:
Var(X | Y = yj) = Σ_{xi} [xi − E(X | Y = yj)]² fX|Y(xi | yj), (5)
and
Var(X | Y = y) = ∫_{−∞}^{∞} [x − E(X | Y = y)]² fX|Y(x | y) dx, (6)
for the discrete and the continuous case, respectively. The conditional variances
depend on the values of the conditioning r.v., as was the case for the conditional expectations.
From formulas (5) and (6), it is not hard to see (see also Exercise 2.20) that:
Var(X | Y = yj) = E(X² | Y = yj) − [E(X | Y = yj)]² or
Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]², (7)
for the discrete and the continuous case, respectively.
As an illustration, let us calculate Var(X | Y = y) in the setting of Example 8.

Discussion. By (7),
Var(X | Y = y) = E(X² | Y = y) − [E(X | Y = y)]²
= ∫_0^y x² × (2x/y²) dx − (2y/3)² (by Examples 8(ii) and 12)
= (2/y²) ∫_0^y x³ dx − 4y²/9 = y²/2 − 4y²/9 = y²/18, 0 < y < 1.
EXERCISES
2.1 Refer to Exercise 1.1 and calculate the marginal p.d.f.’s fX and fY .
2.2 Refer to Exercise 1.2 and calculate the marginal p.d.f.’s fX and fY .
2.3 If the joint p.d.f. of the r.v.’s X and Y is given by the following table,
determine the marginal p.d.f.’s fX and fY .
y\x −4 −2 2 4
−2 0 0.25 0 0
−1 0 0 0 0.25
1 0.25 0 0 0
2 0 0 0.25 0
2.4 The r.v.’s X and Y take on the values 1, 2, and 3, as indicated in the following
table:
y\x 1 2 3
(i) Determine the marginal p.d.f.’s fX and fY , and the conditional p.d.f.
fX|Y (· | y), y = 1, 2.
(ii) Calculate: EX, EY, E(X | Y = y), y = 1, 2, and E[E(X | Y)].
(iii) Compare EX and E[E(X | Y)].
(iv) Calculate: Var(X) and Var(Y).
2.6 Let the r.v.'s X and Y have the joint p.d.f.:
fX,Y(x, y) = 2/[n(n + 1)], y = 1, . . . , x; x = 1, . . . , n.
Then compute:
(i) The marginal p.d.f.’s fX and fY .
(ii) The conditional p.d.f.’s fX|Y (· | y) and fY|X (· | x).
(iii) The conditional expectations E(X | Y = y) and E(Y | X = x).
Hint. For part (iii), use the appropriate part of #1 in Table 8 in the Appendix.
(i) Determine the marginal p.d.f.’s fX and fY , and specify the range of the
arguments involved.
(ii) Determine the conditional p.d.f.’s fX|Y (· | y) and fY|X (· | x), and specify
the range of the arguments involved.
(iii) Calculate the (conditional) probability P(X > 2 log 2 | Y = log 2),
where, as always, log stands for the natural logarithm.
2.10 The joint p.d.f. of the r.v.'s X and Y is given by:
fX,Y(x, y) = (1/2)y e^{−xy}, 0 < x < ∞, 0 < y < 2.
(i) Determine the marginal p.d.f. fY .
(ii) Find the conditional p.d.f. fX|Y (· | y), and evaluate it at y = 1/2.
(iii) Compute the conditional expectation E(X | Y = y), and evaluate it at
y = 1/2.
2.12 In reference to Exercise 1.4, calculate:
(i) The marginal p.d.f.’s fX , fY , and the conditional p.d.f. fY|X (· | x); in all
cases, specify the range of the variables involved.
(ii) EY and E(Y | X = x).
(iii) E[E(Y | X)] and observe that it is equal to EY.
(iv) The probability P(Y > 1/2 | X < 1/2).
Hint. For part (i), form first the joint p.d.f. of X and Y. Also, use the
appropriate part of #6 of Table 8 in the Appendix.
2.20 (i) Let X and Y be two discrete r.v.’s with joint p.d.f. fX,Y . Then show that the
conditional variance of X, given Y, satisfies the following relation:
Var(X | Y = yj ) = E(X 2 | Y = yj ) − [E(X | Y = yj )]2 .
(ii) Establish the same relation, if the r.v.’s X and Y are of the continuous
type.
2.21 In reference to Exercise 3.76(iii), (iv) (in this chapter), suppose that
P(Y = y | X = x) = pq^y, y = 0, 1, . . ., where q = 1 − p, so that
E(Y | X = x) = q/p. Further, assume that E(Y | X = x) = x² as in Exercise
3.76. Then determine the values of p = p(x) for x = 0, 1, 2.
2.22 If for the r.v.'s X, Y, the functions g, h: ℝ → ℝ are such that E[g(X)h(Y)] and
Eh(Y) are finite, then show that: E[g(X)h(Y) | X = x] = g(x)E[h(Y) | X = x].
2.23 A couple has X children, where X ∼ P(λ1 ), and each one of the children
produces Y children with Y ∼ P(λ2 ).
(i) What is the expected number of the grandchildren of the couple?
(ii) What is the numerical value in part (i) for λ1 = 4 and λ2 = 3?
2.24 The r.v.’s X and Y (of the continuous type) have joint p.d.f. given by:
fX,Y (x, y) = 8xy, 0 < x < y < 1.
(i) Determine the marginal p.d.f. fX .
(ii) Show that the conditional p.d.f. fY|X(·|x) is given by: fY|X(y|x) = 2y/(1 − x²),
0 < x < y < 1.
(iii) Compute the conditional expectation: E(Y|X = x).
(iv) What is the value of E(Y|X = x) for x = 1/2?
2.25 The (discrete) r.v.’s X and Y have the joint p.d.f. fX,Y where values are given in
the table below:
Y\X 1 2
1 0.2 0.4
2 0.3 0.1
Then determine:
(i) The marginal p.d.f.’s fX , fY .
(ii) The conditional p.d.f. fX|Y(·|y), and the specific values fX|Y(1|1),
fX|Y(2|1), fX|Y(1|2), fX|Y(2|2).
(iii) The EX, the σ²(X), and the σ(X); likewise, EY, σ²(Y), and σ(Y).
(iv) The E(XY).
(v) The Cov(X, Y) and the ρ(X, Y).
(vi) The conditional expectation E(X|Y = 1) and E(X|Y = 2).
(vii) Verify that E[E(X|Y)] = EX.
2.27 The r.v. X has p.d.f. f (x) = 20x3 (1 − x), 0 < x < 1, and the conditional p.d.f.
of Y, given X = x, is U(0, 1 − x). Derive:
(i) The joint p.d.f. of X and Y.
(ii) The marginal p.d.f. of Y.
(iii) The conditional p.d.f. of X, given Y = y.
(iv) The conditional expectations and variances E(Y|X = x), E(X|Y = y);
σ 2 (Y|X = x), σ 2 (X|Y = y).
2.28 The joint p.d.f. of the r.v.'s X and Y is given by: fX,Y(x, y) = cxy²,
0 < x < y < 1 (c > 0).
(i) Determine the constant c.
(ii) Find the marginal p.d.f.’s fX and fY .
(iii) Find the conditional p.d.f.’s fX|Y (·|y) and fY|X (·|x).
(iv) Determine fX|Y (x|y = 1/2) and fY|X (y|x = 1/3).
2.29 The r.v.’s X and Y have joint p.d.f. given by:
fX,Y(x, y) = c²e^{−cy}, 0 < x < y (c > 0).
(i) Determine the marginal p.d.f. fX , and identify it. From this conclude that
c can be any positive constant.
(ii) Determine the conditional p.d.f. fY|X (·|x), and identify it.
(iii) Use part (ii) in order to write down the E(Y|X = x) and the σ 2 (Y|X = x)
without any integrations.
(iv) Find the p.d.f. fY without integration.
2.32 The r.v.'s X and Y have joint p.d.f. given by:
fX,Y(x, y) = cx(1 − x)y², 0 < x < 1, 0 < y < 1 (c > 0).
(iii) Furthermore, find E(X|Y = y) and E(Y|X = x), and evaluate them at
y = 1/3 and x = 1/2.
(iv) Finally, verify that E[E(X|Y)] = EX, and E[E(Y|X)] = EY.
2.37 The r.v.'s X and Y have a joint p.d.f. given by:
fX,Y(x, y) = x + y, 0 < x < 1, 0 < y < 1.

4.3 Expectation of a function of two r.v.'s, joint and marginal m.g.f.'s

For a (real-valued) function g defined on ℝ², the expectation of the r.v. g(X, Y)
is defined by:
Eg(X, Y) = Σ_{xi, yj} g(xi, yj) fX,Y(xi, yj) or Eg(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy, (8)
for the discrete and the continuous case, respectively, provided, of course, the
quantities defined exist. Properties analogous to those in (4) in Chapter 3 apply here,
too. Namely, for c and d constants:
E[cg(X, Y)] = cEg(X, Y), E[cg(X, Y) + d] = cEg(X, Y) + d. (9)
Also, if h is another real-valued function, then (see also Exercise 3.17):
g(X, Y) ≤ h(X, Y) implies Eg(X, Y) ≤ Eh(X, Y), (10)
and, in particular,
g(X) ≤ h(X) implies Eg(X) ≤ Eh(X). (11)
For the special choice of the function g(x, y) = e^{t1x + t2y}, t1, t2 reals, the expectation
E exp(t1X + t2Y) defines a function in t1, t2 for those t1, t2 for which this
expectation is finite. That is:
MX,Y(t1, t2) = Ee^{t1X + t2Y}, (t1, t2) ∈ C ⊆ ℝ². (12)
Thus, for the discrete and the continuous case, we have, respectively,
MX,Y(t1, t2) = Σ_{xi, yj} e^{t1xi + t2yj} fX,Y(xi, yj), (13)
and
MX,Y(t1, t2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{t1x + t2y} fX,Y(x, y) dx dy. (14)
The function MX,Y(·, ·) so defined is called the joint m.g.f. of the r.v.'s X and Y. Clearly,
MX,Y(0, 0) = 1 for any X and Y, and it may happen that C = {(0, 0)} or C ⊂ ℝ² or
C = ℝ². Here are two examples of joint m.g.f.'s.
Example 15. Refer to Example 1 and calculate the joint m.g.f. of the r.v.’s involved.
Example 16. Refer to Example 2 and calculate the joint m.g.f. of the r.v.'s involved.

Discussion. For t1 < λ1 and t2 < λ2,
MX,Y(t1, t2) = ∫_0^∞ ∫_0^∞ e^{t1x + t2y} λ1λ2 e^{−λ1x − λ2y} dx dy = [λ1/(λ1 − t1)] × [λ2/(λ2 − t2)],
and
MX(t1) = λ1/(λ1 − t1), t1 < λ1, MY(t2) = λ2/(λ2 − t2), t2 < λ2. (20)
The joint m.g.f., as defined in (12), has properties analogous to the ones stated in
(12) of Chapter 3. Namely, for c1, c2 and d1, d2 constants:
Mc1X+d1, c2Y+d2(t1, t2) = e^{d1t1 + d2t2} MX,Y(c1t1, c2t2). (21)
Its simple justification is left as an exercise (see Exercise 3.2).
In the present context, a version of the properties stated in (13) of Chapter 3 is
the following:
(∂/∂t1)MX,Y(t1, t2)|_{t1=t2=0} = EX, (∂/∂t2)MX,Y(t1, t2)|_{t1=t2=0} = EY, (22)
and
(∂²/∂t1∂t2)MX,Y(t1, t2)|_{t1=t2=0} = E(XY), (23)
provided one may interchange the order of differentiating and taking expectations.
For example, for (23), we have:
(∂²/∂t1∂t2)MX,Y(t1, t2)|_{t1=t2=0} = (∂²/∂t1∂t2)Ee^{t1X + t2Y}|_{t1=t2=0}
= E[(∂²/∂t1∂t2)e^{t1X + t2Y}]|_{t1=t2=0}
= E[XY e^{t1X + t2Y}]|_{t1=t2=0} = E(XY).
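For the joint m.g.f. of Example 16, these differentiations can be carried out symbolically; a sketch using sympy (assumed available):

```python
# Symbolic check of (22)-(23) for M(t1, t2) = lam1*lam2/((lam1-t1)*(lam2-t2)).
import sympy as sp

t1, t2 = sp.symbols('t1 t2')
lam1, lam2 = sp.symbols('lam1 lam2', positive=True)

M = lam1 * lam2 / ((lam1 - t1) * (lam2 - t2))

EX = sp.diff(M, t1).subs({t1: 0, t2: 0})       # should be 1/lam1
EXY = sp.diff(M, t1, t2).subs({t1: 0, t2: 0})  # should be 1/(lam1*lam2)
print(sp.simplify(EX), sp.simplify(EXY))
```

These are, of course, the familiar moments of two independent Exponential lifetimes.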
Remark 2. Although properties (21) and (22) allow us to obtain moments by means
of the m.g.f.'s of the r.v.'s X and Y, the most significant property of the m.g.f. is that
it allows us (under certain conditions) to retrieve the distribution of the r.v.'s X and Y.
This is done through the so-called inversion formula.
Now, select the function g as follows: g(x, y) = cx + dy, where c and d are
constants. Then, for the continuous case:
Eg(X, Y) = E(cX + dY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (cx + dy) fX,Y(x, y) dx dy
= c ∫_{−∞}^{∞} ∫_{−∞}^{∞} x fX,Y(x, y) dx dy + d ∫_{−∞}^{∞} ∫_{−∞}^{∞} y fX,Y(x, y) dx dy
= c ∫_{−∞}^{∞} x (∫_{−∞}^{∞} fX,Y(x, y) dy) dx + d ∫_{−∞}^{∞} y (∫_{−∞}^{∞} fX,Y(x, y) dx) dy
= c ∫_{−∞}^{∞} x fX(x) dx + d ∫_{−∞}^{∞} y fY(y) dy = cEX + dEY; i.e.,
E(cX + dY) = cEX + dEY, (25)
and the same is true for the discrete case. The covariance of the r.v.'s X and Y is
defined by:
Cov(X, Y) = E[(X − EX)(Y − EY)] = E(XY) − (EX)(EY). (26)
The second equality in (26) follows by multiplying out (X − EX)(Y − EY) and
applying property (25).
The variance of a single r.v. has been looked upon as a measure of dispersion
of the distribution of the r.v. Some motivation will be given subsequently to the
effect that the Cov(X, Y) may be thought of as a measure of the degree to which
X and Y tend to increase or decrease simultaneously when Cov(X, Y) > 0 or to move
toward opposite directions when Cov(X, Y) < 0. This point is sufficiently made by
the following simple example.
Example 17. Consider the events A and B with P(A)P(B) > 0 and set X = IA and
Y = IB for the indicator functions, where IA (s) = 1 if s ∈ A and IA (s) = 0 if s ∈ Ac .
Then, clearly, EX = P(A), EY = P(B), and XY = IA∩B , so that E(XY) = P(A ∩ B).
It follows that Cov(X, Y) = P(A ∩ B) − P(A)P(B), so that Cov(X, Y) > 0 if and only if
P(A ∩ B) > P(A)P(B); i.e., X and Y tend to take their large value (namely, 1)
simultaneously precisely when A and B tend to occur together.

Example 18. In reference to Example 1, calculate E(XY) and Cov(X, Y).

Discussion. The r.v. XY takes on the values 0, 1, 2, 3, 4, 6, and 9 with respective
probabilities:

xy          0     1     2     3     4     6     9
P(XY = xy)  0.46  0.26  0.14  0     0.07  0.05  0.02

Therefore, E(XY) = 1.3 and then, by formula (26), Cov(X, Y) = 1.3 − 1.01 ×
0.99 = 0.3001.
Here, the covariance is positive, and by comparing the values of the conditional
probabilities in Example 5 with the appropriate unconditional probabilities, we see
that this is consonant with the observation just made that X and Y tend to take
simultaneously either “large” values or “small” values. (See also Example 19 later.)
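The numbers in Examples 18 and 19 are easy to reproduce from the joint p.m.f.; a brief Python check:

```python
# Numerical check of Examples 18 and 19, using the nonzero joint
# probabilities of Table 4.1 (Example 1).
from math import sqrt

f = {(0, 0): 0.05, (1, 0): 0.21, (0, 1): 0.20, (1, 1): 0.26, (2, 1): 0.08,
     (1, 2): 0.06, (2, 2): 0.07, (3, 2): 0.02, (2, 3): 0.03, (3, 3): 0.02}

EX = sum(x * p for (x, y), p in f.items())
EY = sum(y * p for (x, y), p in f.items())
EXY = sum(x * y * p for (x, y), p in f.items())
VarX = sum(x**2 * p for (x, y), p in f.items()) - EX**2
VarY = sum(y**2 * p for (x, y), p in f.items()) - EY**2

cov = EXY - EX * EY
print(round(EXY, 2), round(cov, 4))       # 1.3  0.3001
print(round(cov / sqrt(VarX * VarY), 2))  # ~0.5
```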
The result obtained next provides the range of values of the covariance of two
r.v.’s; it is also referred to as a version of the Cauchy-Schwarz inequality.
Theorem 1.
(i) Consider the r.v.’s X and Y with EX = EY = 0 and Var(X) = Var(Y) = 1. Then
always −1 ≤ E(XY) ≤ 1, and E(XY) = 1 if and only if P(X = Y) = 1, and
E(XY) = −1 if and only if P(X = −Y) = 1.
(ii) For any r.v.'s X and Y with finite expectations and positive variances σX² and σY²,
it always holds:
−σXσY ≤ Cov(X, Y) ≤ σXσY, (31)
and Cov(X, Y) = σXσY if and only if P[Y = EY + (σY/σX)(X − EX)] = 1, and
Cov(X, Y) = −σXσY if and only if P[Y = EY − (σY/σX)(X − EX)] = 1.
Proof.
(i) Clearly, 0 ≤ E(X − Y)² = EX² + EY² − 2E(XY) = 2 − 2E(XY), so that
E(XY) ≤ 1; also, 0 ≤ E(X + Y)² = EX² + EY² + 2E(XY) = 2 + 2E(XY), so
that −1 ≤ E(XY). Combining these results, we obtain −1 ≤ E(XY) ≤ 1. As for
equalities, observe that, if P(X = Y) = 1, then E(XY) = EX² = 1, and if P(X =
−Y) = 1, then E(XY) = −EX² = −1. Next, E(XY) = 1 implies E(X − Y)² = 0
or Var(X − Y) = 0. But then P(X − Y = 0) = 1 or P(X = Y) = 1 (see
Exercise 2.4 in Chapter 3). Also, E(XY) = −1 implies E(X + Y)² = 0 or
Var(X + Y) = 0, so that P(X = −Y) = 1 (by the exercise just cited).
(ii) Replace the r.v.'s X and Y by the r.v.'s X* = (X − EX)/σX and Y* = (Y − EY)/σY, for
which EX* = EY* = 0 and Var(X*) = Var(Y*) = 1. Then the inequalities
−1 ≤ E(X*Y*) ≤ 1 become
−1 ≤ E[((X − EX)/σX)((Y − EY)/σY)] ≤ 1, (32)
from which (31) follows. Also, E(X*Y*) = 1 if and only if P(X* = Y*) = 1, which
becomes E[(X − EX)(Y − EY)] = σXσY if and only if P[Y = EY + (σY/σX)(X − EX)] = 1;
and E(X*Y*) = −1 if and only if P(X* = −Y*) = 1, which becomes
E[(X − EX)(Y − EY)] = −σXσY if and only if P[Y = EY − (σY/σX)(X − EX)] = 1.
A restatement of the last two conclusions is: Cov(X, Y) = σXσY if and only if
P[Y = EY + (σY/σX)(X − EX)] = 1, and Cov(X, Y) = −σXσY if and only if
P[Y = EY − (σY/σX)(X − EX)] = 1.
FIGURE 4.5
Lines of perfect linear relation of x and y: y = EY + (σY/σX)(x − EX) and
y = EY − (σY/σX)(x − EX), crossing at the point (EX, EY).
The correlation coefficient of X and Y, ρ(X, Y), is defined by ρ(X, Y) =
Cov(X, Y)/(σXσY), and, as argued above, Cov(X, Y) > 0 indicates that X and Y tend
to take simultaneously either “large” values or “small” values. Since Cov(X, Y) and
ρ(X, Y) have the same sign, the same statement can be made about ρ(X, Y), being
positive if and only if X and Y tend to take simultaneously either “large” values or
“small” values. The same arguments apply for the case that Cov(X, Y) < 0
(equivalently, ρ(X, Y) < 0). This reasoning indicates that ρ(X, Y) may be looked
upon as a measure of linear dependence between X and Y. The pair (X, Y) lies on
the line y = EY + (σY/σX)(x − EX) if ρ(X, Y) = 1; pairs of values of (X, Y) tend to
be arranged along this line if (0 <) ρ(X, Y) < 1, and they tend to move further and
further away from this line as ρ(X, Y) gets closer to 0; the pairs bear no sign of
linear tendency whatever if ρ(X, Y) = 0. Rough arguments also hold for the reverse
assertions. For 0 < ρ(X, Y) ≤ 1, the r.v.'s X and Y are said to be positively
correlated, and uncorrelated if ρ(X, Y) = 0. Likewise, the pair (X, Y) lies on the line
y = EY − (σY/σX)(x − EX) if ρ(X, Y) = −1; pairs of values of (X, Y) tend to be
arranged along this line if −1 < ρ(X, Y) < 0. Again, rough arguments can also be
made for the reverse assertions. For −1 ≤ ρ(X, Y) < 0, the r.v.'s X and Y are said
to be negatively correlated. (See Figure 4.6.)
FIGURE 4.6
Depiction of the correlation coefficient of two r.v.'s; panels (a)–(e) are described in
the text below.
Actually, a more precise argument to this effect can be made by considering the
distance D of the (random) point (X, Y) from the lines y = EY ± (σY/σX)(x − EX). It can
be seen that:
ED² = [2σX²σY²/(σX² + σY²)](1 − |ρ(X, Y)|). (37)
Then one may use the interpretation of the expectation as an average and exploit (37)
in order to arrive at the same reasoning as above, but in a more rigorous way.
As an illustration, let us calculate the ρ(X, Y) for Examples 1 and 8.
Example 19. In reference to Example 1, calculate the Cov(X, Y) and the ρ(X, Y).

Discussion. From Example 18, Cov(X, Y) = 0.3001. Also, EX² = 1.61 and
EY² = 1.59, so that Var(X) = 1.61 − (1.01)² = 0.5899 and Var(Y) = 1.59 − (0.99)² =
0.6099, and therefore
ρ(X, Y) = 0.3001/√(0.5899 × 0.6099) ≈ 0.50.

Example 20. In reference to Example 8, calculate the Cov(X, Y) and the ρ(X, Y).

Discussion. By Example 8(iii) and formula (26), Cov(X, Y) = E(XY) − (EX)(EY) =
4/9 − (8/15)(4/5) = 4/225, so that
ρ(X, Y) = Cov(X, Y)/√(Var(X)Var(Y)) = (4/225)/√((11/225)(6/225)) = 4/√66 ≈ 0.492.
Example 21. Let X and Y be two r.v.'s with finite expectations and equal (finite)
variances, and set U = X + Y and V = X − Y. Then the r.v.'s U and V are uncorrelated.

Discussion. Indeed,
Cov(U, V) = E[(U − EU)(V − EV)] = E{[(X − EX) + (Y − EY)][(X − EX) − (Y − EY)]}
= E(X − EX)² − E(Y − EY)² = Var(X) − Var(Y) = 0.
Figure 4.6 illustrates the behavior of the correlation coefficient ρ(X, Y) of the r.v.'s X
and Y. In (a), ρ(X, Y) = 1; the r.v.'s X and Y are perfectly positively linearly related. In
(b), ρ(X, Y) = −1; the r.v.'s X and Y are perfectly negatively linearly related. In (c),
0 < ρ(X, Y) < 1; the r.v.'s X and Y are positively correlated. In (d), −1 < ρ(X, Y) < 0;
the r.v.'s X and Y are negatively correlated. In (e), ρ(X, Y) = 0; the r.v.'s X and Y are
uncorrelated.
The following result presents an interesting property of the correlation
coefficient.
Theorem 2. Let X and Y be r.v.'s with finite first and second moments and positive
variances, and let c1, c2, d1, d2 be constants with c1c2 ≠ 0. Then:
ρ(c1X + d1, c2Y + d2) = ρ(X, Y) if c1c2 > 0, and
ρ(c1X + d1, c2Y + d2) = −ρ(X, Y) if c1c2 < 0. (38)

Proof. Indeed, Var(c1X + d1) = c1²Var(X), Var(c2Y + d2) = c2²Var(Y), and Cov(c1X +
d1, c2Y + d2) = E{[(c1X + d1) − E(c1X + d1)][(c2Y + d2) − E(c2Y + d2)]} =
E[c1(X − EX) × c2(Y − EY)] = c1c2E[(X − EX)(Y − EY)] = c1c2Cov(X, Y).
Therefore ρ(c1X + d1, c2Y + d2) = c1c2Cov(X, Y)/[|c1c2|√(Var(X)Var(Y))], and the
conclusion follows.
Example 22. Let X and Y be temperatures in two localities measured in the Celsius
scale, and let U and V be the same temperatures measured in the Fahrenheit scale.
Then ρ(X, Y) = ρ(U, V), as it should be. This is so because U = (9/5)X + 32 and
V = (9/5)Y + 32, so that (38) applies with the + sign.
Theorem 3. For two r.v.'s X and Y with finite first and second moments, and
(positive) standard deviations σX and σY, it holds:
Var(X + Y) = σX² + σY² + 2Cov(X, Y) = σX² + σY² + 2σXσY ρ(X, Y), (39)
and
Var(X + Y) = σX² + σY² if X and Y are uncorrelated. (40)
Proof. Since (40) follows immediately from (39), and Cov(X, Y) = σX σY × ρ(X, Y),
it suffices to establish only the first equality in (39). Indeed,
Var(X + Y) = E[(X + Y) − E(X + Y)]² = E[(X − EX) + (Y − EY)]²
= E(X − EX)² + E(Y − EY)² + 2E[(X − EX)(Y − EY)] = σX² + σY² + 2Cov(X, Y).
EXERCISES
3.1 Let X and Y be the r.v.’s denoting the number of sixes when two fair dice are
rolled independently 15 times each. Determine the E(X + Y).
3.2 Show that the joint m.g.f. of two r.v.'s X and Y satisfies the following property,
where c1, c2, d1, and d2 are constants:
Mc1X+d1, c2Y+d2(t1, t2) = e^{d1t1 + d2t2} MX,Y(c1t1, c2t2).
3.9 Refer to Exercise 2.5, and calculate the Cov(X, Y) and the ρ(X, Y).
3.10 Let X be a r.v. taking on the values −2, −1, 1, 2, each with probability 1/4, and
define the r.v. Y by: Y = X 2 . Then calculate the quantities: EX, Var(X), EY,
Var(Y), E(XY), Cov(X, Y), and ρ(X, Y). Are you surprised by the value of
ρ(X, Y)? Explain.
3.11 Refer to Example 8 and compute the covariance Cov(X, Y) and the
correlation coefficient ρ(X, Y). Decide on the kind of correlation of the
r.v.’s X and Y.
3.17 Justify the statement made in relation (10), for both the discrete and the
continuous case.
Hint. Set g(α, β) = E[Y − (αX + β)]², carry out the operations on the
right-hand side in order to get: g(α, β) = EY² + α²EX² + β² + 2αβEX −
2αE(XY) − 2βEY, minimize g(α, β) by equating to 0 the two partial
derivatives in order to find the values α̂ and β̂ given above. Finally, show that
these values α̂ and β̂ do, indeed, minimize g(α, β) by showing that the
2 × 2 matrix of the second-order derivatives has its 1 × 1 and 2 × 2
determinants positive; or that the matrix of the second-order derivatives is
positive definite.
3.18 Consider the (discrete) r.v. X and its distribution given below:
x -2 -1 1 2
3.21 If X and Y are two r.v.'s with (finite) variances σ1², σ2², respectively, and
correlation coefficient ρ, show that Var(X + Y) ≤ (σ1 + σ2)².
3.22 In reference to Exercise 1.51 (in Chapter 3), show that:
(i) E(XY) = (3n² + 7n + 8)/12.
(ii) Cov(X, Y) = (n² + n − 2)/36, and ρ(X, Y) = 1/2.
3.23 If the r.v.’s X and Y have (positive and finite) variances σ 2 (X) and σ 2 (Y),
show that the r.v.’s X and aX + Y are uncorrelated if and only if
a = −Cov(X, Y)/σ 2 (X). Likewise, the r.v.’s Y and X + bY are uncorrelated if
and only if b = −Cov(X, Y)/σ 2 (Y).
3.24 In reference to Exercise 2.28 (in this chapter), compute:
(i) E(XY).
(ii) Cov(X, Y).
(iii) ρ(X, Y).
(iv) E(X/Y), E(Y/X²), and E(XY).
For the discrete case, P[(X1, . . . , Xk) ∈ B] = Σ fX1,...,Xk(x1i, . . . , xki), where the
summation extends over all (x1i, . . . , xki) ∈ B. For the continuous case, the joint
p.d.f. is a nonnegative function fX1,...,Xk such that
FX1,...,Xk(x1, . . . , xk) = ∫_{−∞}^{xk} . . . ∫_{−∞}^{x1} fX1,...,Xk(t1, . . . , tk) dt1 . . . dtk, x1, . . . , xk in ℝ.
It follows that for B ⊆ ℝ^k (where you may interpret B as a familiar geometric figure in
ℝ^k): P[(X1, . . . , Xk) ∈ B] = ∫ . . . ∫_B fX1,...,Xk(x1, . . . , xk) dx1 . . . dxk, and in particular,
∫_{−∞}^{∞} . . . ∫_{−∞}^{∞} fX1,...,Xk(x1, . . . , xk) dx1 . . . dxk (= P[(X1, . . . , Xk) ∈ ℝ^k]) = 1. As in
the two-dimensional case, (∂^k/∂x1 . . . ∂xk)FX1,...,Xk(x1, . . . , xk) = fX1,...,Xk(x1, . . . , xk) (for
continuity points (x1, . . . , xk) of fX1,...,Xk). In the next subsection, three concrete
examples will be presented, one for the discrete case, and two for the continuous
case. In the present k-dimensional case, there are many marginal d.f.'s and p.d.f.'s.
Thus, if in FX1 ,...,Xk (x1 , . . . , xk ), t of the x’s, xj1 , . . . , xjt , are replaced by +∞
(in the sense they are let to tend to +∞), then what is left is the marginal
joint d.f. of the r.v.’s Xi1 , . . . , Xis , FXi1 ,...,Xis , where s + t = k. Likewise, if in
fX1 ,...,Xk (x1 , . . . , xk ), xj1 , . . . , xjt are eliminated through summation (for the discrete
case) or integration (for the continuous case), what is left is the marginal joint p.d.f.
of the r.v.'s Xi1, . . . , Xis, fXi1,...,Xis. Combining joint and marginal joint p.d.f.'s, as
in the two-dimensional case, we obtain a variety of conditional p.d.f.’s. Thus, for
example,
fXj1,...,Xjt|Xi1,...,Xis(xj1, . . . , xjt | xi1, . . . , xis) = fX1,...,Xk(x1, . . . , xk)/fXi1,...,Xis(xi1, . . . , xis).
Remark 3. Relation (44) demonstrates the joint moment generating property of the
joint m.g.f. The joint m.g.f. can also be used for recovering the joint distribution of
the r.v.’s X1 , . . . , Xk as indicated in Remark 2.
Finally, the appropriate versions of relations (39) and (40) become here, by setting
σ²Xi = Var(Xi), i = 1, . . . , k:
Var(X1 + · · · + Xk) = Σ_{i=1}^{k} σ²Xi + 2 Σ_{1≤i<j≤k} Cov(Xi, Xj)
= Σ_{i=1}^{k} σ²Xi + 2 Σ_{1≤i<j≤k} σXiσXj ρ(Xi, Xj), (45)
and
Var(X1 + · · · + Xk) = Σ_{i=1}^{k} σ²Xi if the Xi's are pairwise uncorrelated. (46)
EXERCISES
4.1 If the r.v.'s X1, X2, X3 have the joint p.d.f. fX1,X2,X3(x1, x2, x3) =
c³e^{−c(x1 + x2 + x3)}, x1 > 0, x2 > 0, x3 > 0 (c > 0), determine:
Hint. Use the definition of covariance and the linearity property of the
expectation.

4.5 The Multinomial, the Bivariate Normal, and the multivariate Normal

Consider n independent repetitions of an experiment which may result in any one of
k (mutually exclusive) possible outcomes O1, . . . , Ok, with respective probabilities
p1, . . . , pk that remain the same throughout. Denote by Xi the r.v. of the number of
times outcome Oi occurs,
i = 1, . . . , k. Then the joint p.d.f. of X1 , . . . , Xk is given by:
fX1,...,Xk(x1, . . . , xk) = [n!/(x1! . . . xk!)] p1^{x1} . . . pk^{xk}, (47)
where x1, . . . , xk are ≥ 0 integers with x1 + · · · + xk = n, and, of course, 0 < pi <
1, i = 1, . . . , k, p1 + · · · + pk = 1. The distribution given by (47) is the Multinomial
distribution with parameters n and p1, . . . , pk, and the r.v.'s X1, . . . , Xk are said to
have the Multinomial distribution with these parameters. That the right-hand side of
(47) is the right formula for the joint probabilities P(X1 = x1, . . . , Xk = xk) ensues
as follows: By independence, the probability that Oi occurs xi times, i = 1, . . . , k, in
specified positions, is given by: p1^{x1} . . . pk^{xk}, regardless of the location of the specific
positions of occurrence of the Oi's. The number of different ways of choosing the xi
positions for the occurrence of Oi, i = 1, . . . , k, is equal to:
C(n, x1)C(n − x1, x2) . . . C(n − x1 − · · · − x_{k−1}, xk). Writing
out each term in factorial form and making the obvious cancellations, we arrive at:
n!/(x1! . . . xk!) (see also Exercise 5.1). For illustrative purposes, let us consider the
following example.
Example 24. A fair die is rolled independently 10 times. Find the probability that
faces #1 through #6 occur the following respective number of times: 2, 1, 3, 1, 2,
and 1.
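The answer follows directly from formula (47); the sketch below (SciPy's multinomial p.m.f. assumed) gives ≈ 0.0025.

```python
# Numerical answer to Example 24 via formula (47).
from scipy.stats import multinomial

counts = [2, 1, 3, 1, 2, 1]  # occurrences of faces #1 through #6
print(round(multinomial.pmf(counts, n=10, p=[1/6] * 6), 4))  # 0.0025
```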
Theorem 4. Let the r.v.'s X1, . . . , Xk have the Multinomial distribution with
parameters n and p1, . . . , pk, and, for s + t = k, split them into the two groups
Xi1, . . . , Xis and Xj1, . . . , Xjt. Set r = xi1 + · · · + xis and q = 1 − (pi1 + · · · + pis).
Then:
(i) For ≥ 0 integers xi1, . . . , xis with r ≤ n, we have:
fXi1,...,Xis(xi1, . . . , xis) = P(Xi1 = xi1, . . . , Xis = xis)
= [n!/(xi1! . . . xis!(n − r)!)] pi1^{xi1} . . . pis^{xis} q^{n−r}.
(ii) For ≥ 0 integers xj1, . . . , xjt with xj1 + · · · + xjt = n − r, we have:
fXj1,...,Xjt|Xi1,...,Xis(xj1, . . . , xjt | xi1, . . . , xis)
= P(Xj1 = xj1, . . . , Xjt = xjt | Xi1 = xi1, . . . , Xis = xis)
= P(Xj1 = xj1, . . . , Xjt = xjt, Xi1 = xi1, . . . , Xis = xis)/P(Xi1 = xi1, . . . , Xis = xis)
= {[n!/(xj1! . . . xjt! xi1! . . . xis!)] pj1^{xj1} . . . pjt^{xjt} × pi1^{xi1} . . . pis^{xis}} /
{[n!/(xi1! . . . xis!(n − r)!)] pi1^{xi1} . . . pis^{xis} q^{n−r}}
= [(n − r)!/(xj1! . . . xjt!)] (pj1/q)^{xj1} . . . (pjt/q)^{xjt}.
Example 26. In a genetic experiment, two different varieties of a certain species are
crossed and a specific characteristic of the offspring can occur only at three levels, A,
B, and C, say. According to a proposed model, the probabilities for A, B, and C are
1/12, 3/12, and 8/12, respectively. Out of 60 offspring, calculate:
(i) The probability that 6, 18, and 36 fall into levels A, B, and C, respectively.
(ii) The (conditional) probability that 6 and 18 fall into levels A and B, respectively,
given that 36 have fallen into level C.
Discussion.
(i) Formula (47) applies with n = 60, k = 3, p1 = 1/12, p2 = 3/12, p3 = 8/12,
x1 = 6, x2 = 18, x3 = 36, and yields:
P(X1 = 6, X2 = 18, X3 = 36) = [60!/(6!18!36!)] (1/12)^6 (3/12)^18 (8/12)^36 ≈ 0.011.
(ii) Here, Theorem 4(ii) applies with s = 1, t = 2, xi1 = x3 = 36, xj1 =
x1 = 6, xj2 = x2 = 18, r = 36, so that n − r = 60 − 36 = 24, q = 1 − p3 =
1 − 8/12 = 4/12, and yields:
P(X1 = 6, X2 = 18 | X3 = 36) = [(n − r)!/(x1!x2!)] (p1/q)^{x1} (p2/q)^{x2}
= [24!/(6!18!)] (1/4)^6 (3/4)^18
= C(24, 6) (1/4)^6 (3/4)^18
= 0.1852 (from the Binomial tables).
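Both probabilities can be checked numerically (SciPy assumed); part (ii) reduces, as shown above, to a Binomial probability with 24 trials and success probability 1/4.

```python
# Numerical check of Example 26.
from scipy.stats import binom, multinomial

# (i) P(X1 = 6, X2 = 18, X3 = 36) with n = 60, p = (1/12, 3/12, 8/12):
print(round(multinomial.pmf([6, 18, 36], n=60, p=[1/12, 3/12, 8/12]), 3))  # 0.011

# (ii) P(X1 = 6, X2 = 18 | X3 = 36) = C(24, 6)(1/4)^6 (3/4)^18:
print(round(binom.pmf(6, 24, 1/4), 4))  # 0.1853 (the tables give 0.1852)
```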
By means of (44) and (48), we can find the Cov(Xi, Xj) and the ρ(Xi, Xj) for
any 1 ≤ i < j ≤ k. Indeed, EXi = npi, EXj = npj, Var(Xi) = npi(1 − pi), Var(Xj) =
npj(1 − pj), and E(XiXj) = n(n − 1)pipj. Therefore:
Cov(Xi, Xj) = −npipj and ρ(Xi, Xj) = −[pipj/((1 − pi)(1 − pj))]^{1/2} (49)
(see Exercise 5.4 for details).
Recall that the r.v.'s X and Y are said to have the Bivariate Normal distribution with
parameters μ1, μ2 ∈ ℝ, σ1², σ2² > 0, and ρ ∈ (−1, 1), if their joint p.d.f. is given by:
fX,Y(x, y) = [1/(2πσ1σ2√(1 − ρ²))] e^{−q/2}, x, y ∈ ℝ, where
q = [1/(1 − ρ²)]{[(x − μ1)/σ1]² − 2ρ[(x − μ1)/σ1][(y − μ2)/σ2] + [(y − μ2)/σ2]²}. (50)

FIGURE 4.7
Graphs of the p.d.f. of the Bivariate Normal distribution: (a) centered at the origin;
(b) centered elsewhere in the (x, y)-plane.
FIGURE 4.8
Conditional probability density functions of the Bivariate Normal distribution.
It can be seen that E(XY) = μ1μ2 + ρσ1σ2, so that Cov(X, Y) = E(XY) − (EX)(EY) =
ρσ1σ2, and therefore ρ(X, Y) = ρσ1σ2/(σ1σ2) = ρ. Thus, we have:
Cov(X, Y) = ρσ1σ2 and ρ(X, Y) = ρ. (57)
Relation (57) reveals that the parameter ρ in (50) is, actually, the correlation
coefficient of the r.v.'s X and Y.
Example 27. If the r.v.'s X1 and X2 have the Bivariate Normal distribution with
parameters μ1, μ2, σ1², σ2², and ρ:
(i) Calculate the quantities: E(c1X1 + c2X2), Var(c1X1 + c2X2), where c1, c2 are
constants.
(ii) What do the expressions in part (i) become for: μ1 = −1, μ2 = 3, σ1² = 4, σ2² = 9,
and ρ = 1/2?

Discussion.
(i) E(c1X1 + c2X2) = c1EX1 + c2EX2 = c1μ1 + c2μ2, since Xi ∼ N(μi, σi²), so
that EXi = μi, i = 1, 2. Also, by (39) and (57),
Var(c1X1 + c2X2) = c1²σ1² + c2²σ2² + 2c1c2σ1σ2ρ.
(ii) Here E(c1X1 + c2X2) = −c1 + 3c2 and Var(c1X1 + c2X2) = 4c1² + 9c2² +
2c1c2 × 2 × 3 × (1/2) = 4c1² + 9c2² + 6c1c2.
Example 28. Suppose that the heights of fathers and sons are r.v.'s X and Y,
respectively, having (approximately) the Bivariate Normal distribution with parameters
(expressed in inches) μ1 = 70, σ1 = 2, μ2 = 71, σ2 = 2, and ρ = 0.90. If for a given
pair (father, son) it is observed that X = x = 69, determine:
(i) The conditional distribution of the height of the son.
(ii) The expected height of the son.
(iii) The probability of the height of the son to be more than 72 in.

Discussion.
(i) According to (55), Y | X = x ∼ N(bx, (σ2√(1 − ρ²))²), where
bx = μ2 + ρ(σ2/σ1)(x − μ1) = 71 + 0.90 × (69 − 70) = 70.1, and
σ2√(1 − ρ²) = 2 × √(1 − 0.90²) ≈ 0.87.
(ii) Consequently, E(Y | X = 69) = b69 = 70.1 in.
(iii) P(Y > 72 | X = 69) = P[Z > (72 − 70.1)/0.87] ≈ P(Z > 2.18) = 1 − Φ(2.18) =
1 − 0.9854 = 0.0146, where Z ∼ N(0, 1).
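The three answers follow from the conditional distribution in part (i); a quick check with SciPy's Normal distribution:

```python
# Numerical check of Example 28 (parameters from the text).
from math import sqrt
from scipy.stats import norm

mu1, s1, mu2, s2, rho = 70, 2, 71, 2, 0.90
x = 69

b_x = mu2 + rho * (s2 / s1) * (x - mu1)  # conditional mean: 70.1
sd = s2 * sqrt(1 - rho**2)               # conditional s.d.: ~0.87
print(round(b_x, 2), round(sd, 2))
print(round(norm.sf(72, loc=b_x, scale=sd), 4))  # P(Y > 72 | X = 69) ~ 0.0146
```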
Finally, it can be seen by integration that the joint m.g.f. of X and Y is given by
the formula:
MX,Y(t1, t2) = exp[μ1t1 + μ2t2 + (1/2)(σ1²t1² + 2ρσ1σ2t1t2 + σ2²t2²)], t1, t2 ∈ ℝ; (58)
we choose not to pursue its justification (which can be found, e.g., in pages 158–159,
in the book A Course in Mathematical Statistics, 2nd edition (1997), Academic Press,
by G.G. Roussas). We see, however, easily that:
(∂/∂t1)MX,Y(t1, t2) = (μ1 + σ1²t1 + ρσ1σ2t2)MX,Y(t1, t2),
and hence:
(∂²/∂t1∂t2)MX,Y(t1, t2) = ρσ1σ2MX,Y(t1, t2)
+ (μ1 + σ1²t1 + ρσ1σ2t2)(μ2 + σ2²t2 + ρσ1σ2t1)MX,Y(t1, t2),
which, evaluated at t1 = t2 = 0, yields ρσ1σ2 + μ1μ2 = E(XY), as we have already
seen.
EXERCISES
5.1 Show that for n1, . . . , nk ≥ 0 integers with n1 + · · · + nk = n:
C(n, n1)C(n − n1, n2) · · · C(n − n1 − · · · − n_{k−1}, nk) = n!/(n1!n2! . . . nk!).
Hint. Write out the terms on the left-hand side as factorials, and recall that
0! = 1.
5.2 In a store selling TV sets, it is known that 25% of the customers will purchase
a TV set of brand A, 40% will purchase a TV set of brand B, and 35% will
just be browsing around. For a lot of 10 customers:
(i) What is the probability that two will purchase a TV set of brand A, three
will purchase a TV set of brand B, and five will purchase
neither?
(ii) If it is known that six customers did not purchase a TV set, what is the
(conditional) probability that one of the rest will purchase a TV set of
brand A and three will purchase a TV set of brand B?
Hint. Part (i) is an application of formula (47), and part (ii) is an application
of Theorem 4(i).
5.3 Human blood occurs in four types termed A, B, AB, and O with respective frequencies
pA = 0.40, pB = 0.10, pAB = 0.05, and pO = 0.45. If n donors participate
in a blood drive, denote by XA , XB , XAB , and XO the numbers of donors with
respective blood types A, B, AB, and O. Then XA , XB , XAB , and XO are r.v.’s
having the Multinomial distribution with parameters n and pA , pB , pAB , pO .
Write out the appropriate formulas for the following probabilities:
(i) P(XA = xA, XB = xB, XAB = xAB, XO = xO) for
xA, xB, xAB, and xO nonnegative integers with xA + xB + xAB + xO = n.
(ii) P(XA = xA , XB = xB , XAB = xAB ).
(iii) P(XA = xA , XB = xB ).
(iv) P(XA = xA ).
(v) P(XA = xA , XB = xB , XAB = xAB | XO = xO ).
(vi) P(XA = xA , XB = xB | XAB = xAB , XO = xO ).
(vii) P(XA = xA | XB = xB , XAB = xAB , XO = xO ).
(viii) Give numerical answers
to parts (i)-(vii), if n = 20, and xA = 8, xB = 2, xAB = 1, xO = 9.
Hint. Part (i) is an
application of formula (47) and parts (ii)-(vii) are applications of Theorem 4.
5.4 In conjunction with the Multinomial distribution, show that:
EXi = npi, EXj = npj, Var(Xi) = npi(1 − pi), Var(Xj) = npj(1 − pj),
Cov(Xi, Xj) = −npipj, and ρ(Xi, Xj) = −pipj/[pi(1 − pi)pj(1 − pj)]^{1/2}.
Hint. Use the m.g.f. given in formula (48).
5.5 Refer to Exercises 5.3 and 5.4, and for n = 20, calculate the quantities:
EXA , EXB , EXAB , EXO ; Var(XA ), Var(XB ), Var(XAB ),
5.13 If the r.v.'s X and Y have the Bivariate Normal distribution with parameters
μ1, μ2, σ1², σ2², and ρ, and U = X + Y, V = X − Y, show that:
(i) The r.v.'s U and V also have the Bivariate Normal distribution
with parameters: μ1 + μ2, μ1 − μ2, τ1² = σ1² + 2ρσ1σ2 + σ2², τ2² =
σ1² − 2ρσ1σ2 + σ2², and ρ0 = (σ1² − σ2²)/τ1τ2.
(ii) U ∼ N(μ1 + μ2, τ1²), V ∼ N(μ1 − μ2, τ2²).
(iii) The r.v.'s U and V are uncorrelated if and only if σ1² = σ2².
Hint. For part (i), start
out with the joint m.g.f. of U and V, and express it in terms of the m.g.f. of
X and Y. Then use the formula given in Exercise 5.10, and regroup the terms
in the exponent suitably to arrive at the desired conclusion. Parts (ii) and
(iii) follow from part (i) and known facts about Bivariate Normal distribution.
5.14 Let the r.v.'s X and Y denote the scores in two tests T1 and T2,
and suppose that they have the Bivariate Normal distribution with the following
parameters: μ1 = 82, σ1 = 10, μ2 = 90, σ2 = 9, ρ = 0.75. Compute
the following quantities: P(Y > 92 | X = 84), P(X > Y), P(X + Y > 190).
Hint. The first probability is calculated by the fact that we know
the conditional distribution of Y, given X = x. The last two probabilities are
calculated by the fact that the distributions of X − Y and X + Y are given in
Exercise 5.13.
5.16 Let the r.v.’s X and Y denote the heights of pairs (father, son) in a certain age
bracket, and suppose that they have Bivariate Normal distribution with the
following parameters (measured in inches as appropriate): μ1 = 67, σ1 = 2,
μ2 = 68, σ2 = 1, ρ = 0.8. If it is observed that X = 69, compute the
following quantities: E(Y|X = 69), conditional s.d. of Y|X = 69,
P(Y > 70|X = 69) and P(Y > X).
5.18 Show that the Bivariate Normal p.d.f. f(x, y) given by relation (50) is
maximized for x = μ1, y = μ2, and the maximum is equal to 1/(2πσ1σ2√(1 − ρ²)).
Hint. Set (x − μ1)/σ1 = u, (y − μ2)/σ2 = v, so that the exponent in (50)
depends on (x, y) only through u² − 2ρuv + v², which is to be minimized.
5.20 Show that the intersection of the surface represented by a Bivariate Normal
p.d.f. by any plane perpendicular to the z-axis is an ellipse, and it is a circle
if and only if ρ = 0.
5.21 For any two r.v.'s X1, X2 with finite expectations, it is known that:
E[E(X2 | X1)] = EX2, E[E(X1 | X2)] = EX1.
Suppose now that the r.v.'s have jointly the Bivariate Normal distribution
with parameters μ1, μ2, σ1², σ2², ρ. Then show that:
Var[E(X2 | X1)] ≤ σ2² = σ²(X2), Var[E(X1 | X2)] ≤ σ1² = σ²(X1).