
§9 Conditional distribution

§9.1 Introduction
9.1.1 Let $f(x,y)$, $f_X(x)$ and $f_Y(y)$ be the joint probability (density or mass) function of $(X, Y)$ and the marginal probability functions of $X$ and $Y$, respectively. Consider, for some small $\delta > 0$, the conditional probability
\[
P(X \le x \mid y \le Y \le y + \delta)
= \int_{-\infty}^{x}\int_{y}^{y+\delta} f(u,v)\,dv\,du \Big/ \int_{y}^{y+\delta} f_Y(v)\,dv
\approx \delta \int_{-\infty}^{x} f(u,y)\,du \Big/ \delta f_Y(y)
= \int_{-\infty}^{x} \frac{f(u,y)}{f_Y(y)}\,du,
\]
which suggests
\[
\lim_{\delta \to 0} P(X \le x \mid y \le Y \le y + \delta) = \int_{-\infty}^{x} \frac{f(u,y)}{f_Y(y)}\,du.
\]
[The discrete case can be treated in a similar way.]

This motivates us to define the conditional probability function of X given Y = y to be


\[
f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)},
\]
and the corresponding conditional cdf to be
\[
F_{X|Y}(x|y) = P(X \le x \mid Y = y) =
\begin{cases}
\displaystyle\sum_{\{u:\, u \le x\}} f_{X|Y}(u|y) = \sum_{\{u:\, u \le x\}} P(X = u \mid Y = y) & \text{(discrete case)},\\[1ex]
\displaystyle\int_{-\infty}^{x} f_{X|Y}(u|y)\,du & \text{(continuous case)}.
\end{cases}
\]
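
In the discrete case these definitions amount to normalising each column of the joint pmf table and then cumulating. A minimal Python sketch (the joint pmf below is a hypothetical example, not taken from the notes):

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows index x in {0,1,2}, columns index y in {0,1}.
joint = np.array([[0.10, 0.20],
                  [0.15, 0.25],
                  [0.05, 0.25]])

f_Y = joint.sum(axis=0)                       # marginal pmf of Y
f_X_given_Y = joint / f_Y                     # f_{X|Y}(x|y) = f(x,y) / f_Y(y)
F_X_given_Y = np.cumsum(f_X_given_Y, axis=0)  # conditional cdf: sum over u <= x

print(f_X_given_Y)   # each column sums to 1
print(F_X_given_Y)   # each column ends at 1
```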

9.1.2 By conditioning on $\{Y = y\}$, we limit our scope to those outcomes of $X$ that are possible when $Y$ is observed to be $y$.
Example. Let $X =$ no. of casualties at a road accident and $Y =$ total weight (in tons) of the vehicles involved. The conditional distributions of $X$ given different values of $Y$ may then be very different; e.g. the distribution of $X \mid Y = 10$ may differ from that of $X \mid Y = 1$.

9.1.3 Special case: $Y = \mathbf{1}\{A\}$ (for some event $A$)

For brevity, we write $f(x|A) = f_{X|Y}(x|1)$ and $f(x|A^c) = f_{X|Y}(x|0)$, and similarly for conditional cdf's, $F(x|A)$ and $F(x|A^c)$.

9.1.4 $X, Y$ independent if and only if $f_{X|Y}(x|y) = f_X(x)$ for all $x, y$.

9.1.5 Conditional distributions may similarly be defined for groups of random variables.
For example, for random variables $\mathbf{X} = (X_1, \dots, X_r)$ and $\mathbf{Y} = (Y_1, \dots, Y_s)$, let
\[
f(x_1, \dots, x_r, y_1, \dots, y_s), \qquad f_{\mathbf{X}}(x_1, \dots, x_r) \qquad\text{and}\qquad f_{\mathbf{Y}}(y_1, \dots, y_s)
\]
be the joint probability functions of $(\mathbf{X}, \mathbf{Y})$, $\mathbf{X}$ and $\mathbf{Y}$, respectively. Then

(a) the conditional (joint) probability function of $\mathbf{X}$ given $\mathbf{Y} = (y_1, \dots, y_s)$ is
\[
f_{\mathbf{X}|\mathbf{Y}}(x_1, \dots, x_r \mid y_1, \dots, y_s) = \frac{f(x_1, \dots, x_r, y_1, \dots, y_s)}{f_{\mathbf{Y}}(y_1, \dots, y_s)};
\]
(b) the conditional (joint) cdf of $\mathbf{X}$ given $\mathbf{Y} = (y_1, \dots, y_s)$ is
\[
F_{\mathbf{X}|\mathbf{Y}}(x_1, \dots, x_r \mid y_1, \dots, y_s) =
\begin{cases}
\displaystyle\int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_r} f_{\mathbf{X}|\mathbf{Y}}(u_1, \dots, u_r \mid y_1, \dots, y_s)\,du_r \cdots du_1 & \text{(continuous case)},\\[1ex]
\displaystyle\sum_{u_1 \le x_1} \cdots \sum_{u_r \le x_r} f_{\mathbf{X}|\mathbf{Y}}(u_1, \dots, u_r \mid y_1, \dots, y_s) & \text{(discrete case)}.
\end{cases}
\]

9.1.6 $\mathbf{X}, \mathbf{Y}$ independent if and only if
\[
f_{\mathbf{X}|\mathbf{Y}}(x_1, \dots, x_r \mid y_1, \dots, y_s) = f_{\mathbf{X}}(x_1, \dots, x_r), \qquad \text{for all } x_1, \dots, x_r,\, y_1, \dots, y_s.
\]

9.1.7 Concepts previously established for “unconditional” distributions can be defined analogously for conditional distributions by substituting conditional distribution or probability functions, i.e. $F(\cdot|\cdot)$ or $f(\cdot|\cdot)$, for their unconditional counterparts, i.e. $F(\cdot)$ or $f(\cdot)$.

9.1.8 CONDITIONAL INDEPENDENCE

$X_1, X_2, \dots$ are conditionally independent given $Y$ iff
\[
P(X_1 \le x_1, \dots, X_n \le x_n \mid Y = y) = \prod_{i=1}^{n} P(X_i \le x_i \mid Y = y)
\]
for all $x_1, \dots, x_n, y \in [-\infty, \infty]$ and any $n \in \{1, 2, \dots\}$. The latter condition is equivalent to
\[
f_{X_1,\dots,X_n|Y}(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} f_{X_i|Y}(x_i \mid y)
\]
for all $x_1, \dots, x_n, y \in [-\infty, \infty]$ and any $n \in \{1, 2, \dots\}$, where $f_{X_1,\dots,X_n|Y}(x_1, \dots, x_n \mid y)$ denotes the joint probability function of $(X_1, \dots, X_n)$ conditional on $Y = y$.

9.1.9 Examples.

(i) Let $X_1, X_2, Y$ be Bernoulli random variables such that $f_Y(y) = (1/2)\,\mathbf{1}\{y = 0 \text{ or } 1\}$ and
\[
f_{X_1,X_2|Y}(x_1, x_2 \mid y) = \mathbf{1}\{x_1 = x_2 = y = 0\} + (1/4)\,\mathbf{1}\{y = 1\}, \qquad x_1, x_2 \in \{0, 1\}.
\]
Note that
\[
f_{X_1,X_2|Y}(x_1, x_2 \mid y) =
\begin{cases}
\mathbf{1}\{x_1 = 0\}\,\mathbf{1}\{x_2 = 0\}, & y = 0,\\
(1/2)\,\mathbf{1}\{x_1 = 0 \text{ or } 1\} \times (1/2)\,\mathbf{1}\{x_2 = 0 \text{ or } 1\}, & y = 1.
\end{cases}
\]
Since $f_{X_1,X_2|Y}(x_1, x_2 \mid y)$ can be factorised into a product $f_{X_1|Y}(x_1|y)\,f_{X_2|Y}(x_2|y)$ for $y = 0$ or $1$, $X_1$ and $X_2$ are conditionally independent given $Y$.
However, the (unconditional) joint mass function of $(X_1, X_2)$,
\[
f_{X_1,X_2}(x_1, x_2) = \sum_{y=0}^{1} f_{X_1,X_2|Y}(x_1, x_2 \mid y)\, f_Y(y) = (1/2)\,\mathbf{1}\{x_1 = x_2 = 0\} + 1/8,
\]
cannot be factorised into a product $f_{X_1}(x_1)\,f_{X_2}(x_2)$. Therefore, $X_1, X_2$ are not independent.
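
Both claims are easy to verify by brute-force enumeration; a Python sketch using exactly the pmfs of this example:

```python
from itertools import product

def f_cond(x1, x2, y):
    # f_{X1,X2|Y}(x1,x2|y) = 1{x1=x2=y=0} + (1/4) 1{y=1}
    return float(x1 == x2 == y == 0) + 0.25 * float(y == 1)

f_Y = {0: 0.5, 1: 0.5}

# Unconditional joint pmf of (X1, X2): sum out Y
f_joint = {(x1, x2): sum(f_cond(x1, x2, y) * f_Y[y] for y in (0, 1))
           for x1, x2 in product((0, 1), repeat=2)}

# Marginal pmfs of X1 and X2
f_X1 = {x1: f_joint[(x1, 0)] + f_joint[(x1, 1)] for x1 in (0, 1)}
f_X2 = {x2: f_joint[(0, x2)] + f_joint[(1, x2)] for x2 in (0, 1)}

for x1, x2 in product((0, 1), repeat=2):
    print((x1, x2), f_joint[(x1, x2)], "vs product:", f_X1[x1] * f_X2[x2])
# f(0,0) = 0.625 while f_X1(0) * f_X2(0) = 0.5625, so X1, X2 are dependent.
```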
(ii) Toss a coin $N$ times, where $N \sim \text{Poisson}(\lambda)$. Suppose the coin has probability $p$ of turning up “head”. Let $X =$ no. of heads and $Y = N - X =$ no. of tails. Then
\[
f_{X|N}(x|n) = \binom{n}{x} p^x (1-p)^{n-x}\, \mathbf{1}\big\{x \in \{0, 1, \dots, n\}\big\},
\]
\[
f_{Y|N}(y|n) = \binom{n}{y} p^{n-y} (1-p)^{y}\, \mathbf{1}\big\{y \in \{0, 1, \dots, n\}\big\}.
\]
Conditional joint mass function of $(X, Y)$ given $N = n$:
\[
f_{X,Y|N}(x, y|n) = P(X = x, Y = y \mid N = n)
= \frac{n!}{x!\,y!}\, p^x (1-p)^y\, \mathbf{1}\big\{x, y \in \{0, 1, \dots, n\},\ x + y = n\big\}
\ne f_{X|N}(x|n)\, f_{Y|N}(y|n)
\]
in general — hence $X, Y$ are not conditionally independent given $N$.

Joint mass function of $(X, Y)$:
\[
f(x, y) = \sum_{n=0}^{\infty} f_{X,Y|N}(x, y|n)\, P(N = n)
= \sum_{n=0}^{\infty} \frac{n!}{x!\,y!}\, p^x (1-p)^y\, \mathbf{1}\big\{x, y \in \{0, 1, \dots, n\},\ x + y = n\big\}\, \frac{\lambda^n e^{-\lambda}}{n!}
\]
\[
= \frac{(x+y)!}{x!\,y!}\, p^x (1-p)^y\, \frac{\lambda^{x+y} e^{-\lambda}}{(x+y)!}
= \left( \frac{(p\lambda)^x e^{-p\lambda}}{x!} \right) \left( \frac{\big((1-p)\lambda\big)^y e^{-(1-p)\lambda}}{y!} \right),
\]
so that $X$ and $Y$ are independent Poisson random variables with means $p\lambda$ and $(1-p)\lambda$, respectively.
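
This “Poisson splitting” conclusion can be checked by simulation. A minimal sketch, assuming the arbitrary choices $\lambda = 3$ and $p = 0.4$ (Python):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, n_sim = 3.0, 0.4, 200_000

N = rng.poisson(lam, size=n_sim)
X = rng.binomial(N, p)           # heads among N tosses
Y = N - X                        # tails

print(X.mean(), X.var())         # both close to p*lam     = 1.2
print(Y.mean(), Y.var())         # both close to (1-p)*lam = 1.8
print(np.corrcoef(X, Y)[0, 1])   # close to 0, consistent with independence
```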
(iii) Joint pdf:
\[
f(x, y, z) = 40\, x z\, \mathbf{1}\{x, y, z \ge 0,\ x + z \le 1,\ y + z \le 1\}.
\]
It has been derived in Example §7.1.11(c) that, for $x, y, z \in [0, 1]$,
\[
f_X(x) = \frac{20}{3}\, x (1 - x)^2 (1 + 2x), \qquad f_Y(y) = \frac{5}{3}\,(1 - 4y^3 + 3y^4), \qquad f_Z(z) = 20 z (1 - z)^3.
\]
Thus, for $x, y, z \in [0, 1]$, conditional pdf's can be obtained as follows.

– $X, Y \mid Z = z \sim\, ?$
\[
f_{X,Y|Z}(x, y \mid z) = \frac{f(x, y, z)}{f_Z(z)} = \left( \frac{2x\, \mathbf{1}\{x \le 1 - z\}}{(1 - z)^2} \right) \left( \frac{\mathbf{1}\{y \le 1 - z\}}{1 - z} \right).
\]
The above factorisation suggests that $X$ and $Y$ are conditionally independent given $Z$, and therefore
\[
f_{X|Y,Z}(x \mid y, z) = f_{X|Z}(x|z) = \frac{2x\, \mathbf{1}\{x \le 1 - z\}}{(1 - z)^2},
\]
\[
f_{Y|X,Z}(y \mid x, z) = f_{Y|Z}(y|z) = \frac{\mathbf{1}\{y \le 1 - z\}}{1 - z} \;\Rightarrow\; Y \mid Z = z \sim U[0, 1 - z].
\]

– $X, Z \mid Y = y \sim\, ?$
\[
f_{X,Z|Y}(x, z \mid y) = \frac{f(x, y, z)}{f_Y(y)} = \frac{24\, x z\, \mathbf{1}\{z \le 1 - \max\{x, y\}\}}{1 - 4y^3 + 3y^4}.
\]
Note: $f_{X,Z|Y}(x, z \mid y)$ cannot be expressed as a product of a function of $(x, y)$ and a function of $(z, y)$, which implies that $X$ and $Z$ are not conditionally independent given $Y$.

– $Z \mid (X, Y) = (x, y) \sim\, ?$
\[
f_{Z|X,Y}(z \mid x, y) = \frac{f_{X,Z|Y}(x, z \mid y)}{f_{X|Y}(x|y)} = \frac{f_{X,Z|Y}(x, z \mid y)}{\int_0^1 f_{X,Z|Y}(x, z' \mid y)\,dz'}
= \frac{z\, \mathbf{1}\{z \le 1 - \max\{x, y\}\}}{\int_0^{1 - \max\{x, y\}} z'\,dz'} = \frac{2 z\, \mathbf{1}\{z \le 1 - \max\{x, y\}\}}{\big(1 - \max\{x, y\}\big)^2}.
\]
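
These conditional densities can be spot-checked by Monte Carlo. The Python sketch below (sample size and window width are arbitrary choices) draws from $f$ by rejection sampling and examines behaviour near $Z \approx 0.5$:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    # Rejection sampling from f(x,y,z) = 40xz: propose uniformly on the unit
    # cube, where f/40 = xz <= 1, so accept with probability x*z on the support.
    pts = np.empty((0, 3))
    while len(pts) < n:
        cand = rng.random((200_000, 3))
        x, y, z = cand.T
        keep = (x + z <= 1) & (y + z <= 1) & (rng.random(len(cand)) < x * z)
        pts = np.vstack([pts, cand[keep]])
    return pts[:n]

x, y, z = sample(50_000).T

# Given Z near 0.5, Y should be approximately U[0, 0.5]:
sel = np.abs(z - 0.5) < 0.02
print(y[sel].mean(), y[sel].var())        # ~0.25 and ~0.5**2/12 = 0.0208
print(np.corrcoef(x[sel], y[sel])[0, 1])  # ~0: X, Y cond. independent given Z
```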

§9.2 Conditional expectation


9.2.1 Let $f_{X|Y}(x|y)$ be the conditional probability function of $X$ given $Y = y$. Then, for any function $g(x)$,
\[
E\big[g(X) \mid Y = y\big] =
\begin{cases}
\displaystyle\sum_{x \in \mathrm{supp}(X)} g(x)\, f_{X|Y}(x|y) & \text{(discrete case)},\\[1ex]
\displaystyle\int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\,dx & \text{(continuous case)}.
\end{cases}
\]

9.2.2 Let $\psi(y) = E\big[g(X) \mid Y = y\big]$, a function of $y$. The random variable $\psi(Y)$ is usually written as $E\big[g(X) \mid Y\big]$ for brevity, so that $E\big[g(X) \mid Y = y\big]$ is a realisation of $E\big[g(X) \mid Y\big]$ when $Y$ is observed to be $y$.

9.2.3 $X, Y$ independent $\Rightarrow$ $E[g(X) \mid Y = y] = E[g(X)]$ for all $y$.

Proof: For all $y$, $E[g(X) \mid Y = y] = \int g(x)\, f_{X|Y}(x|y)\,dx = \int g(x)\, f_X(x)\,dx = E[g(X)]$ (discrete case similar).

Note: The above equality may not hold if $X, Y$ are not independent.

9.2.4 Standard properties of $E[\,\cdot\,]$ still hold for conditional expectations $E[\,\cdot \mid Y]$:

• $X_1 \ge X_2 \Rightarrow E[X_1|Y] \ge E[X_2|Y]$.
• For any functions $\alpha(Y), \beta(Y), \gamma(Y)$ of $Y$,
\[
E\big[\alpha(Y) X_1 + \beta(Y) X_2 + \gamma(Y) \mid Y\big] = \alpha(Y)\, E[X_1|Y] + \beta(Y)\, E[X_2|Y] + \gamma(Y).
\]
• $\big|E[X|Y]\big| \le E\big[\,|X| \mid Y\big]$.
• $X_1, X_2$ conditionally independent given $Y$ $\Rightarrow$ $E[X_1 X_2 \mid Y] = E[X_1|Y]\, E[X_2|Y]$.

9.2.5 Concepts built upon $E[\,\cdot\,]$ can be extended to a conditional version. For example,

• CONDITIONAL VARIANCE

  – $\mathrm{Var}(X|Y) = E\big[(X - E[X|Y])^2 \mid Y\big] = E[X^2 \mid Y] - \big(E[X|Y]\big)^2$.
  – For any functions $a(Y), b(Y)$ of $Y$, $\mathrm{Var}\big(a(Y)X + b(Y) \mid Y\big) = a(Y)^2\, \mathrm{Var}(X|Y)$.
  – $\mathrm{Var}(X|Y) \ge 0$.
  – $\mathrm{Var}(X|Y) = 0$ iff $P\big(X = h(Y) \mid Y\big) = 1$ for some function $h(Y)$ of $Y$.
  – $X_1, \dots, X_n$ conditionally independent given $Y$ $\Rightarrow$ $\mathrm{Var}\big(\sum_i X_i \mid Y\big) = \sum_i \mathrm{Var}(X_i \mid Y)$.

• CONDITIONAL COVARIANCE/CORRELATION COEFFICIENT
\[
\mathrm{Cov}(X_1, X_2 \mid Y) = E\big[(X_1 - E[X_1|Y])(X_2 - E[X_2|Y]) \mid Y\big] = E[X_1 X_2 \mid Y] - E[X_1|Y]\, E[X_2|Y],
\]
\[
\rho(X_1, X_2 \mid Y) = \frac{\mathrm{Cov}(X_1, X_2 \mid Y)}{\sqrt{\mathrm{Var}(X_1|Y)\, \mathrm{Var}(X_2|Y)}}.
\]
• CONDITIONAL QUANTILE
The conditional $\alpha$th quantile of $X$ given $Y$ is $\inf\{x \in \mathbb{R} : F_{X|Y}(x|Y) > \alpha\}$.
 
9.2.6 Proposition. (Law of total expectation) For any function $g(x)$, $E\big[E[g(X)|Y]\big] = E[g(X)]$.
Proof: Consider the continuous case (discrete case similar).
\[
E\big[E[g(X)|Y]\big] = \int E\big[g(X) \mid Y = y\big]\, f_Y(y)\,dy = \int \left( \int g(x)\, f_{X|Y}(x|y)\,dx \right) f_Y(y)\,dy
= \int g(x) \left( \int f(x,y)\,dy \right) dx = \int g(x)\, f_X(x)\,dx = E[g(X)].
\]

9.2.7 Proposition. For any event $A$, $E\big[P(A|Y)\big] = P(A)$.
Proof: Note that
\[
E\big[\mathbf{1}\{A\} \mid Y\big] = 1 \times P\big(\mathbf{1}\{A\} = 1 \mid Y\big) + 0 \times P\big(\mathbf{1}\{A\} = 0 \mid Y\big) = P(A|Y).
\]
Similarly, $E\big[\mathbf{1}\{A\}\big] = P(A)$. The result follows by applying Proposition §9.2.6 with $X = \mathbf{1}\{A\}$.

9.2.8 Proposition. (Law of total variance) $\mathrm{Var}(X) = E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big(E[X|Y]\big)$.

Proof:
\[
\begin{aligned}
\mathrm{Var}(X) &= E\big[(X - E[X|Y] + E[X|Y] - E[X])^2\big]\\
&= E\Big[ E\big[(X - E[X|Y])^2 \mid Y\big] \Big] + E\big[(E[X|Y] - E[X])^2\big]
   + 2\, E\Big[ E\big[(X - E[X|Y])(E[X|Y] - E[X]) \mid Y\big] \Big]\\
&= E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big(E[X|Y]\big)
   + 2\, E\Big[ (E[X|Y] - E[X])\, E\big[X - E[X|Y] \mid Y\big] \Big]\\
&= E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big(E[X|Y]\big),
\end{aligned}
\]
since $E\big[X - E[X|Y] \mid Y\big] = 0$.

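Both laws are easy to verify numerically. A minimal sketch, assuming the arbitrary hierarchical model $Y \sim U[0, 1]$ and $X \mid Y = y \sim N\big(y, (1+y)^2\big)$ (Python):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

y = rng.random(n)              # Y ~ U[0, 1]
x = rng.normal(y, 1 + y)       # X | Y=y ~ N(y, (1+y)^2)

cond_mean = y                  # E[X|Y] = Y
cond_var = (1 + y) ** 2        # Var(X|Y) = (1+Y)^2

print(x.mean(), cond_mean.mean())                  # law of total expectation
print(x.var(), cond_var.mean() + cond_mean.var())  # law of total variance
```
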
9.2.9 The results of §9.2.1 to §9.2.8 can be extended to the multivariate case where $Y$ is replaced by $(Y_1, \dots, Y_s)$.

9.2.10 Examples — (cont'd from §9.1.9)

(i) $X \mid N \sim \text{Binomial}(N, p)$ $\Rightarrow$ $E[X|N] = Np$ and $\mathrm{Var}(X|N) = Np(1-p)$. Then
\[
E\big[E[X|N]\big] = p\, E[N] = p\lambda, \qquad
\mathrm{Var}\big(E[X|N]\big) = p^2\, \mathrm{Var}(N) = p^2 \lambda, \qquad
E\big[\mathrm{Var}(X|N)\big] = p(1-p)\, E[N] = p(1-p)\lambda.
\]
But, unconditionally, $X \sim \text{Poisson}(p\lambda)$, which implies $E[X] = \mathrm{Var}(X) = p\lambda$.
This confirms the laws of total expectation and total variance:
\[
E\big[E[X|N]\big] = E[X] \qquad\text{and}\qquad \mathrm{Var}(X) = E\big[\mathrm{Var}(X|N)\big] + \mathrm{Var}\big(E[X|N]\big).
\]

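Since $E[X|N]$ and $\mathrm{Var}(X|N)$ are explicit functions of $N$, the intermediate quantities above can be checked by simulating $N$ alone; a sketch with the arbitrary choices $\lambda = 3$ and $p = 0.4$ (Python):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, p = 3.0, 0.4
N = rng.poisson(lam, size=500_000)

cond_mean = N * p              # E[X|N] = Np
cond_var = N * p * (1 - p)     # Var(X|N) = Np(1-p)

print(cond_mean.mean())                    # ~ p*lam      = 1.2
print(cond_mean.var())                     # ~ p**2 * lam = 0.48
print(cond_var.mean())                     # ~ p(1-p)*lam = 0.72
print(cond_mean.var() + cond_var.mean())   # ~ Var(X) = p*lam = 1.2
```
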
(ii) Consider conditional expectations of $g(X, Y, Z) = XYZ$ given $Z$ and $(X, Y)$, respectively.

– Given $Z$: since $X$ and $Y$ are conditionally independent given $Z$,
\[
E\big[XYZ \mid Z\big] = Z\, E\big[XY \mid Z\big] = Z\, E\big[X \mid Z\big]\, E\big[Y \mid Z\big]
= Z \int_0^{1-Z} \frac{2x^2}{(1 - Z)^2}\,dx \int_0^{1-Z} \frac{y}{1 - Z}\,dy = \frac{Z (1 - Z)^2}{3}.
\]

– Given $(X, Y)$:
\[
E\big[XYZ \mid X, Y\big] = XY\, E\big[Z \mid X, Y\big] = XY \int_0^1 z\, f_{Z|X,Y}(z \mid X, Y)\,dz
= \frac{2XY}{\big(1 - \max\{X, Y\}\big)^2} \int_0^{1 - \max\{X, Y\}} z^2\,dz = \frac{2}{3}\, XY \big(1 - \max\{X, Y\}\big).
\]

Taking further expectations,
\[
E\big[E[XYZ \mid Z]\big] = \int_0^1 E[XYZ \mid Z = z]\, f_Z(z)\,dz = \int_0^1 \frac{z (1 - z)^2}{3}\; 20 z (1 - z)^3\,dz = 5/126,
\]
and
\[
E\big[E[XYZ \mid X, Y]\big] = \int_0^1\!\!\int_0^1 E[XYZ \mid X = x, Y = y]\, f_{X,Y}(x, y)\,dx\,dy
= \int_0^1\!\!\int_0^1 \frac{2}{3}\, x y \big(1 - \max\{x, y\}\big) \left( \int_0^1 f(x, y, z)\,dz \right) dx\,dy
\]
\[
= \frac{40}{3} \int_0^1\!\!\int_0^1 x^2 y \big(1 - \max\{x, y\}\big)^3\,dx\,dy
= \frac{40}{3} \int_0^1 \left( x^2 (1 - x)^3 \int_0^x y\,dy + x^2 \int_x^1 y (1 - y)^3\,dy \right) dx = 5/126.
\]
As expected, the above results agree with that derived in Example §8.1.4(iii):
\[
E\big[E[XYZ \mid Z]\big] = E\big[E[XYZ \mid X, Y]\big] = E[XYZ] = 5/126.
\]

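The common value $5/126 \approx 0.0397$ can also be confirmed by Monte Carlo, reusing the rejection-sampling idea sketched after §9.1.9(iii):

```python
import numpy as np

rng = np.random.default_rng(4)

# Rejection sampling from f(x,y,z) = 40xz on its support, as before.
cand = rng.random((4_000_000, 3))
x, y, z = cand.T
keep = (x + z <= 1) & (y + z <= 1) & (rng.random(len(cand)) < x * z)
x, y, z = x[keep], y[keep], z[keep]   # roughly 100,000 accepted draws

print((x * y * z).mean(), 5 / 126)    # both ~ 0.0397
```
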
9.2.11 Proposition. Suppose $\Omega = A_1 \cup A_2 \cup \cdots$, where the $A_j$'s are mutually exclusive. Then
\[
E[X] = E[X|A_1]\, P(A_1) + E[X|A_2]\, P(A_2) + \cdots.
\]
Proof: Define $Y = j$ if $A_j$ occurs, $j = 1, 2, \dots$. Then
\[
E[X] = E\big[E[X|Y]\big] = \sum_{j=1}^{\infty} E[X \mid Y = j]\, P(Y = j) = \sum_{j=1}^{\infty} E[X|A_j]\, P(A_j).
\]
Note: The expectation of $X$ can be treated as a weighted average of the conditional expectations of $X$ given disjoint sectors of the sample space. The weights are determined by the probabilities of the sectors. The special case $X = \mathbf{1}\{B\}$ reduces to the “law of total probability”.

9.2.12 Proposition. For a random variable $X$ and an event $A$ with $P(A) > 0$,
\[
E[X|A] = \frac{E\big[X\, \mathbf{1}\{A\}\big]}{P(A)}.
\]
Proof: Applying Proposition §9.2.11 with $A_1 = A$, $A_2 = A^c$ and $X$ replaced by $X\, \mathbf{1}\{A\}$, we have
\[
E\big[X\, \mathbf{1}\{A\}\big] = E\big[X\, \mathbf{1}\{A\} \mid A\big]\, P(A) + E\big[X\, \mathbf{1}\{A\} \mid A^c\big]\, P(A^c) = E[X|A]\, P(A).
\]
Note: The special case $X = \mathbf{1}\{B\}$ reduces to the definition of the conditional probability $P(B|A)$.
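
A quick numerical check of this identity, with the arbitrary choices $X \sim N(0, 1)$ and $A = \{X > 1\}$ (Python):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(1_000_000)
a = x > 1.0                          # event A = {X > 1}

print(x[a].mean())                   # E[X|A] estimated directly
print((x * a).mean() / a.mean())     # E[X 1{A}] / P(A); both ~ 1.525
```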

9.2.13 Example. A person is randomly selected from an adult population and his/her height $X$ measured. It is known that the mean height of a man is 1.78 m, and that of a woman is 1.68 m. Men account for 48% of the population. Calculate the mean height of the adult population, $E[X]$.
Answer:
\[
E[X] = E[X \mid \{\text{man}\}]\, P(\text{man}) + E[X \mid \{\text{woman}\}]\, P(\text{woman}) = 1.78\,\text{m} \times 0.48 + 1.68\,\text{m} \times 0.52 = 1.728\,\text{m}.
\]

9.2.14 Example §7.1.11(b) (cont'd)

Recall that $X = 1$ if the fair coin shows a head and $X = 2$ otherwise, and
\[
f(y \mid \{\text{head}\}) = \frac{d}{dy}\, P(U \le y) = (2\pi)^{-1}\, \mathbf{1}\{0 < y < 2\pi\},
\]
\[
f(y \mid \{\text{tail}\}) = \frac{d}{dy}\, P(U + V \le y) = (2\pi)^{-2} \big( \mathbf{1}\{0 < y < 2\pi\}\, y + \mathbf{1}\{2\pi \le y < 4\pi\}\, (4\pi - y) \big).
\]
Calculate $E\big[Y^{-1} \mid Y > X\pi\big]$ and $\mathrm{Var}\big(Y^{-1} \mid Y > X\pi\big)$.


Answer: Consider first, for $j = 0, 1, 2$,
\[
E\big[Y^{-j}\, \mathbf{1}\{Y > x\pi\} \mid X = x\big] =
\begin{cases}
\displaystyle\int_{\pi}^{\infty} y^{-j}\, f(y \mid \{\text{head}\})\,dy = (2\pi)^{-1} \int_{\pi}^{2\pi} y^{-j}\,dy, & x = 1,\\[1ex]
\displaystyle\int_{2\pi}^{\infty} y^{-j}\, f(y \mid \{\text{tail}\})\,dy = (2\pi)^{-2} \int_{2\pi}^{4\pi} y^{-j} (4\pi - y)\,dy, & x = 2
\end{cases}
\]
\[
=
\begin{cases}
\dfrac{1}{2}\, \mathbf{1}\{j = 0\} + \dfrac{\ln 2}{2\pi}\, \mathbf{1}\{j = 1\} + \dfrac{1}{4\pi^2}\, \mathbf{1}\{j = 2\}, & x = 1,\\[1.5ex]
\dfrac{1}{2}\, \mathbf{1}\{j = 0\} + \dfrac{2\ln 2 - 1}{2\pi}\, \mathbf{1}\{j = 1\} + \dfrac{1 - \ln 2}{4\pi^2}\, \mathbf{1}\{j = 2\}, & x = 2.
\end{cases}
\]
Note that by the law of total expectation,
\[
E\big[Y^{-j}\, \mathbf{1}\{Y > X\pi\}\big] = \sum_{x=1}^{2} E\big[Y^{-j}\, \mathbf{1}\{Y > x\pi\} \mid X = x\big]\, P(X = x) = \frac{1}{2} \sum_{x=1}^{2} E\big[Y^{-j}\, \mathbf{1}\{Y > x\pi\} \mid X = x\big].
\]

Putting $j = 0, 1, 2$, respectively, we have
\[
P(Y > X\pi) = \frac{1}{2}\left( \frac{1}{2} + \frac{1}{2} \right) = \frac{1}{2},
\]
\[
E\big[Y^{-1}\, \mathbf{1}\{Y > X\pi\}\big] = \frac{1}{2}\left( \frac{\ln 2}{2\pi} + \frac{2\ln 2 - 1}{2\pi} \right) = \frac{3\ln 2 - 1}{4\pi},
\]
\[
E\big[Y^{-2}\, \mathbf{1}\{Y > X\pi\}\big] = \frac{1}{2}\left( \frac{1}{4\pi^2} + \frac{1 - \ln 2}{4\pi^2} \right) = \frac{2 - \ln 2}{8\pi^2}.
\]

It follows that
\[
E\big[Y^{-1} \mid Y > X\pi\big] = \frac{E\big[Y^{-1}\, \mathbf{1}\{Y > X\pi\}\big]}{P(Y > X\pi)} = \frac{3\ln 2 - 1}{2\pi}.
\]
Similarly, we have
\[
E\big[Y^{-2} \mid Y > X\pi\big] = \frac{E\big[Y^{-2}\, \mathbf{1}\{Y > X\pi\}\big]}{P(Y > X\pi)} = \frac{2 - \ln 2}{4\pi^2},
\]
so that
\[
\mathrm{Var}\big(Y^{-1} \mid Y > X\pi\big) = E\big[Y^{-2} \mid Y > X\pi\big] - \Big( E\big[Y^{-1} \mid Y > X\pi\big] \Big)^2
= \frac{2 - \ln 2}{4\pi^2} - \left( \frac{3\ln 2 - 1}{2\pi} \right)^2 = \frac{1 + 5\ln 2 - 9(\ln 2)^2}{4\pi^2}.
\]
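
A Python simulation following the setup of Example §7.1.11(b) confirms these conditional moments:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000_000

heads = rng.random(n) < 0.5
X = np.where(heads, 1, 2)
U = rng.uniform(0.0, 2 * np.pi, n)
V = rng.uniform(0.0, 2 * np.pi, n)
Y = np.where(heads, U, U + V)   # Y|head ~ U(0, 2*pi); Y|tail ~ U + V

sel = Y > X * np.pi
w = 1.0 / Y[sel]
print(sel.mean())                                    # ~ P(Y > X*pi) = 1/2
print(w.mean(), (3 * np.log(2) - 1) / (2 * np.pi))   # ~ 0.1718
print(w.var(), (1 + 5 * np.log(2) - 9 * np.log(2) ** 2) / (4 * np.pi ** 2))
```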

§9.3 *** More challenges ***


9.3.1 Let $X$ and $Y$ be continuous random variables with joint density function
\[
f(x, y) = C\, \mathbf{1}\{x, y > 0\} \big( y\, \mathbf{1}\{x + y \le 1\} + (x + y)^{-3}\, \mathbf{1}\{x + y > 1\} \big),
\]
for some constant $C > 0$.

(a) Find $C$.
(b) Find the conditional pdf $f_{X|Y}$.
(c) Find the conditional pdf of $X$ given $X > Y$.

9.3.2 Dreydel is an ancient game played by Jews at the Chanukah festival. It can be played by any number, $m$ say, of players. Each player takes turns to spin a four-sided top — the dreydel — marked with the letters N, G, H and S. Before the game starts, each player contributes one unit to the pot, which thus contains $m$ units. Depending on the outcome of his spin, the spinning player

• receives no payoff if N turns up,
• receives the entire pot if G turns up,
• receives half the pot if H turns up,
• contributes 1 unit to the pot if S turns up.

If G turns up, all $m$ players must each contribute one unit to the pot to start the game again.

(a) Show that in the long run, the pot contains $2(m + 1)/3$ units on average.
(b) Is Dreydel a fair game, i.e. one in which no player has an advantage over the others?

9.3.3 Two players compete in a card-drawing game. Each player is given a full pack of 52 cards and draws cards from the pack repeatedly, with replacement, until a stopping condition is met. Player 1 stops as soon as a queen is followed by a queen, which is then followed by a king. Player 2 stops as soon as a queen is followed by a king, which is then followed by a queen. Whoever stops first is the winner. Compare the expected numbers of cards drawn by the two players. Which player does the game favour?

