
§9 Conditional distribution

§9.1 Introduction
9.1.1 Let $f(x,y)$, $f_X(x)$ and $f_Y(y)$ be the joint probability (density or mass) function of $(X, Y)$ and the marginal probability functions of $X$ and $Y$, respectively. Consider, for some small $\delta > 0$, the conditional probability
\[
P(X \le x \mid y \le Y \le y + \delta)
= \int_{-\infty}^{x}\int_{y}^{y+\delta} f(u,v)\,dv\,du \Big/ \int_{y}^{y+\delta} f_Y(v)\,dv
\approx \delta \int_{-\infty}^{x} f(u,y)\,du \Big/ \delta f_Y(y)
= \int_{-\infty}^{x} \frac{f(u,y)}{f_Y(y)}\,du,
\]
which suggests
\[
\lim_{\delta \to 0} P(X \le x \mid y \le Y \le y + \delta) = \int_{-\infty}^{x} \frac{f(u,y)}{f_Y(y)}\,du.
\]
[The discrete case can be treated in a similar way.]

This motivates us to define the conditional probability function of X given Y = y to be


\[
f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)},
\]
and the corresponding conditional cdf to be
\[
F_{X|Y}(x|y) = P(X \le x \mid Y = y) =
\begin{cases}
\displaystyle\sum_{\{u:\, u \le x\}} f_{X|Y}(u|y) = \sum_{\{u:\, u \le x\}} P(X = u \mid Y = y) & \text{(discrete case)},\\[1ex]
\displaystyle\int_{-\infty}^{x} f_{X|Y}(u|y)\,du & \text{(continuous case)}.
\end{cases}
\]
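
In the discrete case these definitions amount to normalising each column of the joint pmf table and then cumulating. A minimal Python sketch (the joint pmf below is a hypothetical example, not taken from the notes):

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows index x in {0,1,2}, columns index y in {0,1}.
joint = np.array([[0.10, 0.20],
                  [0.15, 0.25],
                  [0.05, 0.25]])

f_Y = joint.sum(axis=0)                       # marginal pmf of Y
f_X_given_Y = joint / f_Y                     # f_{X|Y}(x|y) = f(x,y) / f_Y(y)
F_X_given_Y = np.cumsum(f_X_given_Y, axis=0)  # conditional cdf: sum over u <= x

print(f_X_given_Y)   # each column sums to 1
print(F_X_given_Y)   # each column ends at 1
```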

9.1.2 By conditioning on $\{Y = y\}$, we limit our scope to those outcomes of $X$ that are possible when $Y$ is observed to be $y$.
Example. Let $X =$ no. of casualties at a road accident and $Y =$ total weight (in tons) of the vehicles involved. The conditional distributions of $X$ given different values of $Y$ may then be very different; e.g. the distribution of $X \mid Y = 10$ may differ from that of $X \mid Y = 1$.

9.1.3 Special case: $Y = \mathbf{1}\{A\}$ (for some event $A$)

For brevity, we write $f(x|A) = f_{X|Y}(x|1)$ and $f(x|A^c) = f_{X|Y}(x|0)$, and similarly for conditional cdf's, $F(x|A)$ and $F(x|A^c)$.

9.1.4 $X, Y$ independent if and only if $f_{X|Y}(x|y) = f_X(x)$ for all $x, y$.

9.1.5 Conditional distributions may similarly be defined for groups of random variables.
For example, for random variables $\mathbf{X} = (X_1, \dots, X_r)$ and $\mathbf{Y} = (Y_1, \dots, Y_s)$, let
\[
f(x_1, \dots, x_r, y_1, \dots, y_s), \qquad f_{\mathbf{X}}(x_1, \dots, x_r) \qquad\text{and}\qquad f_{\mathbf{Y}}(y_1, \dots, y_s)
\]
be the joint probability functions of $(\mathbf{X}, \mathbf{Y})$, $\mathbf{X}$ and $\mathbf{Y}$, respectively. Then

(a) the conditional (joint) probability function of $\mathbf{X}$ given $\mathbf{Y} = (y_1, \dots, y_s)$ is
\[
f_{\mathbf{X}|\mathbf{Y}}(x_1, \dots, x_r \mid y_1, \dots, y_s) = \frac{f(x_1, \dots, x_r, y_1, \dots, y_s)}{f_{\mathbf{Y}}(y_1, \dots, y_s)};
\]
(b) the conditional (joint) cdf of $\mathbf{X}$ given $\mathbf{Y} = (y_1, \dots, y_s)$ is
\[
F_{\mathbf{X}|\mathbf{Y}}(x_1, \dots, x_r \mid y_1, \dots, y_s) =
\begin{cases}
\displaystyle\int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_r} f_{\mathbf{X}|\mathbf{Y}}(u_1, \dots, u_r \mid y_1, \dots, y_s)\,du_r \cdots du_1 & \text{(continuous case)},\\[1ex]
\displaystyle\sum_{u_1 \le x_1} \cdots \sum_{u_r \le x_r} f_{\mathbf{X}|\mathbf{Y}}(u_1, \dots, u_r \mid y_1, \dots, y_s) & \text{(discrete case)}.
\end{cases}
\]

9.1.6 $\mathbf{X}, \mathbf{Y}$ independent if and only if
\[
f_{\mathbf{X}|\mathbf{Y}}(x_1, \dots, x_r \mid y_1, \dots, y_s) = f_{\mathbf{X}}(x_1, \dots, x_r), \qquad \text{for all } x_1, \dots, x_r,\, y_1, \dots, y_s.
\]

9.1.7 Concepts previously established for “unconditional” distributions can be defined analogously for conditional distributions by substituting conditional distribution or probability functions, i.e. $F(\cdot|\cdot)$ or $f(\cdot|\cdot)$, for their unconditional counterparts, i.e. $F(\cdot)$ or $f(\cdot)$.

9.1.8 CONDITIONAL INDEPENDENCE

$X_1, X_2, \dots$ are conditionally independent given $Y$ iff
\[
P(X_1 \le x_1, \dots, X_n \le x_n \mid Y = y) = \prod_{i=1}^{n} P(X_i \le x_i \mid Y = y)
\]
for all $x_1, \dots, x_n, y \in [-\infty, \infty]$ and any $n \in \{1, 2, \dots\}$. The latter condition is equivalent to
\[
f_{X_1,\dots,X_n|Y}(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} f_{X_i|Y}(x_i \mid y)
\]
for all $x_1, \dots, x_n, y \in [-\infty, \infty]$ and any $n \in \{1, 2, \dots\}$, where $f_{X_1,\dots,X_n|Y}(x_1, \dots, x_n \mid y)$ denotes the joint probability function of $(X_1, \dots, X_n)$ conditional on $Y = y$.

9.1.9 Examples.

(i) Let $X_1, X_2, Y$ be Bernoulli random variables such that $f_Y(y) = (1/2)\,\mathbf{1}\{y = 0 \text{ or } 1\}$ and
\[
f_{X_1,X_2|Y}(x_1, x_2 \mid y) = \mathbf{1}\{x_1 = x_2 = y = 0\} + (1/4)\,\mathbf{1}\{y = 1\}, \qquad x_1, x_2 \in \{0, 1\}.
\]
Note that
\[
f_{X_1,X_2|Y}(x_1, x_2 \mid y) =
\begin{cases}
\mathbf{1}\{x_1 = 0\}\,\mathbf{1}\{x_2 = 0\}, & y = 0,\\
(1/2)\,\mathbf{1}\{x_1 = 0 \text{ or } 1\} \times (1/2)\,\mathbf{1}\{x_2 = 0 \text{ or } 1\}, & y = 1.
\end{cases}
\]
Since $f_{X_1,X_2|Y}(x_1, x_2 \mid y)$ can be factorised into a product $f_{X_1|Y}(x_1|y)\,f_{X_2|Y}(x_2|y)$ for $y = 0$ or $1$, $X_1$ and $X_2$ are conditionally independent given $Y$.
However, the (unconditional) joint mass function of $(X_1, X_2)$,
\[
f_{X_1,X_2}(x_1, x_2) = \sum_{y=0}^{1} f_{X_1,X_2|Y}(x_1, x_2 \mid y)\, f_Y(y) = (1/2)\,\mathbf{1}\{x_1 = x_2 = 0\} + 1/8,
\]
cannot be factorised into a product $f_{X_1}(x_1)\,f_{X_2}(x_2)$. Therefore, $X_1, X_2$ are not independent.
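
Both claims are easy to verify by brute-force enumeration; a Python sketch using exactly the pmfs of this example:

```python
from itertools import product

def f_cond(x1, x2, y):
    # f_{X1,X2|Y}(x1,x2|y) = 1{x1=x2=y=0} + (1/4) 1{y=1}
    return float(x1 == x2 == y == 0) + 0.25 * float(y == 1)

f_Y = {0: 0.5, 1: 0.5}

# Unconditional joint pmf of (X1, X2): sum out Y
f_joint = {(x1, x2): sum(f_cond(x1, x2, y) * f_Y[y] for y in (0, 1))
           for x1, x2 in product((0, 1), repeat=2)}

# Marginal pmfs of X1 and X2
f_X1 = {x1: f_joint[(x1, 0)] + f_joint[(x1, 1)] for x1 in (0, 1)}
f_X2 = {x2: f_joint[(0, x2)] + f_joint[(1, x2)] for x2 in (0, 1)}

for x1, x2 in product((0, 1), repeat=2):
    print((x1, x2), f_joint[(x1, x2)], "vs product:", f_X1[x1] * f_X2[x2])
# f(0,0) = 0.625 while f_X1(0) * f_X2(0) = 0.5625, so X1, X2 are dependent.
```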
(ii) Toss a coin $N$ times, where $N \sim \text{Poisson}(\lambda)$. Suppose the coin has probability $p$ of turning up “head”. Let $X =$ no. of heads and $Y = N - X =$ no. of tails. Then
\[
f_{X|N}(x|n) = \binom{n}{x} p^x (1-p)^{n-x}\, \mathbf{1}\big\{x \in \{0, 1, \dots, n\}\big\},
\]
\[
f_{Y|N}(y|n) = \binom{n}{y} p^{n-y} (1-p)^{y}\, \mathbf{1}\big\{y \in \{0, 1, \dots, n\}\big\}.
\]
Conditional joint mass function of $(X, Y)$ given $N = n$:
\[
f_{X,Y|N}(x, y|n) = P(X = x, Y = y \mid N = n)
= \frac{n!}{x!\,y!}\, p^x (1-p)^y\, \mathbf{1}\big\{x, y \in \{0, 1, \dots, n\},\ x + y = n\big\}
\ne f_{X|N}(x|n)\, f_{Y|N}(y|n)
\]
in general — hence $X, Y$ are not conditionally independent given $N$.

Joint mass function of $(X, Y)$:
\[
f(x, y) = \sum_{n=0}^{\infty} f_{X,Y|N}(x, y|n)\, P(N = n)
= \sum_{n=0}^{\infty} \frac{n!}{x!\,y!}\, p^x (1-p)^y\, \mathbf{1}\big\{x, y \in \{0, 1, \dots, n\},\ x + y = n\big\}\, \frac{\lambda^n e^{-\lambda}}{n!}
\]
\[
= \frac{(x+y)!}{x!\,y!}\, p^x (1-p)^y\, \frac{\lambda^{x+y} e^{-\lambda}}{(x+y)!}
= \left( \frac{(p\lambda)^x e^{-p\lambda}}{x!} \right) \left( \frac{\big((1-p)\lambda\big)^y e^{-(1-p)\lambda}}{y!} \right),
\]
so that $X$ and $Y$ are independent Poisson random variables with means $p\lambda$ and $(1-p)\lambda$, respectively.
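
This “Poisson splitting” conclusion can be checked by simulation. A minimal sketch, assuming the arbitrary choices $\lambda = 3$ and $p = 0.4$ (Python):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, n_sim = 3.0, 0.4, 200_000

N = rng.poisson(lam, size=n_sim)
X = rng.binomial(N, p)           # heads among N tosses
Y = N - X                        # tails

print(X.mean(), X.var())         # both close to p*lam     = 1.2
print(Y.mean(), Y.var())         # both close to (1-p)*lam = 1.8
print(np.corrcoef(X, Y)[0, 1])   # close to 0, consistent with independence
```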
(iii) Joint pdf:
\[
f(x, y, z) = 40\, x z\, \mathbf{1}\{x, y, z \ge 0,\ x + z \le 1,\ y + z \le 1\}.
\]
It has been derived in Example §7.1.11(c) that, for $x, y, z \in [0, 1]$,
\[
f_X(x) = \frac{20}{3}\, x (1 - x)^2 (1 + 2x), \qquad f_Y(y) = \frac{5}{3}\,(1 - 4y^3 + 3y^4), \qquad f_Z(z) = 20 z (1 - z)^3.
\]
Thus, for $x, y, z \in [0, 1]$, conditional pdf's can be obtained as follows.

– $X, Y \mid Z = z \sim\, ?$
\[
f_{X,Y|Z}(x, y \mid z) = \frac{f(x, y, z)}{f_Z(z)} = \left( \frac{2x\, \mathbf{1}\{x \le 1 - z\}}{(1 - z)^2} \right) \left( \frac{\mathbf{1}\{y \le 1 - z\}}{1 - z} \right).
\]
The above factorisation suggests that $X$ and $Y$ are conditionally independent given $Z$, and therefore
\[
f_{X|Y,Z}(x \mid y, z) = f_{X|Z}(x|z) = \frac{2x\, \mathbf{1}\{x \le 1 - z\}}{(1 - z)^2},
\]
\[
f_{Y|X,Z}(y \mid x, z) = f_{Y|Z}(y|z) = \frac{\mathbf{1}\{y \le 1 - z\}}{1 - z} \;\Rightarrow\; Y \mid Z = z \sim U[0, 1 - z].
\]

– $X, Z \mid Y = y \sim\, ?$
\[
f_{X,Z|Y}(x, z \mid y) = \frac{f(x, y, z)}{f_Y(y)} = \frac{24\, x z\, \mathbf{1}\{z \le 1 - \max\{x, y\}\}}{1 - 4y^3 + 3y^4}.
\]
Note: $f_{X,Z|Y}(x, z \mid y)$ cannot be expressed as a product of a function of $(x, y)$ and a function of $(z, y)$, which implies that $X$ and $Z$ are not conditionally independent given $Y$.

– $Z \mid (X, Y) = (x, y) \sim\, ?$
\[
f_{Z|X,Y}(z \mid x, y) = \frac{f_{X,Z|Y}(x, z \mid y)}{f_{X|Y}(x|y)} = \frac{f_{X,Z|Y}(x, z \mid y)}{\int_0^1 f_{X,Z|Y}(x, z' \mid y)\,dz'}
= \frac{z\, \mathbf{1}\{z \le 1 - \max\{x, y\}\}}{\int_0^{1 - \max\{x, y\}} z'\,dz'} = \frac{2 z\, \mathbf{1}\{z \le 1 - \max\{x, y\}\}}{\big(1 - \max\{x, y\}\big)^2}.
\]
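
These conditional densities can be spot-checked by Monte Carlo. The Python sketch below (sample size and window width are arbitrary choices) draws from $f$ by rejection sampling and examines behaviour near $Z \approx 0.5$:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    # Rejection sampling from f(x,y,z) = 40xz: propose uniformly on the unit
    # cube, where f/40 = xz <= 1, so accept with probability x*z on the support.
    pts = np.empty((0, 3))
    while len(pts) < n:
        cand = rng.random((200_000, 3))
        x, y, z = cand.T
        keep = (x + z <= 1) & (y + z <= 1) & (rng.random(len(cand)) < x * z)
        pts = np.vstack([pts, cand[keep]])
    return pts[:n]

x, y, z = sample(50_000).T

# Given Z near 0.5, Y should be approximately U[0, 0.5]:
sel = np.abs(z - 0.5) < 0.02
print(y[sel].mean(), y[sel].var())        # ~0.25 and ~0.5**2/12 = 0.0208
print(np.corrcoef(x[sel], y[sel])[0, 1])  # ~0: X, Y cond. independent given Z
```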

§9.2 Conditional expectation


9.2.1 Let $f_{X|Y}(x|y)$ be the conditional probability function of $X$ given $Y = y$. Then, for any function $g(x)$,
\[
E\big[g(X) \mid Y = y\big] =
\begin{cases}
\displaystyle\sum_{x \in \mathrm{supp}(X)} g(x)\, f_{X|Y}(x|y) & \text{(discrete case)},\\[1ex]
\displaystyle\int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\,dx & \text{(continuous case)}.
\end{cases}
\]

9.2.2 Let $\psi(y) = E\big[g(X) \mid Y = y\big]$, a function of $y$. The random variable $\psi(Y)$ is usually written as $E\big[g(X) \mid Y\big]$ for brevity, so that $E\big[g(X) \mid Y = y\big]$ is a realisation of $E\big[g(X) \mid Y\big]$ when $Y$ is observed to be $y$.

9.2.3 $X, Y$ independent $\Rightarrow$ $E[g(X) \mid Y = y] = E[g(X)]$ for all $y$.

Proof: For all $y$, $E[g(X) \mid Y = y] = \int g(x)\, f_{X|Y}(x|y)\,dx = \int g(x)\, f_X(x)\,dx = E[g(X)]$ (discrete case similar).

Note: The above equality may not hold if $X, Y$ are not independent.

9.2.4 Standard properties of $E[\,\cdot\,]$ still hold for conditional expectations $E[\,\cdot \mid Y]$:

• $X_1 \ge X_2 \Rightarrow E[X_1|Y] \ge E[X_2|Y]$.
• For any functions $\alpha(Y), \beta(Y), \gamma(Y)$ of $Y$,
\[
E\big[\alpha(Y) X_1 + \beta(Y) X_2 + \gamma(Y) \mid Y\big] = \alpha(Y)\, E[X_1|Y] + \beta(Y)\, E[X_2|Y] + \gamma(Y).
\]
• $\big|E[X|Y]\big| \le E\big[\,|X| \mid Y\big]$.
• $X_1, X_2$ conditionally independent given $Y$ $\Rightarrow$ $E[X_1 X_2 \mid Y] = E[X_1|Y]\, E[X_2|Y]$.

9.2.5 Concepts built upon $E[\,\cdot\,]$ can be extended to a conditional version. For example,

• CONDITIONAL VARIANCE

  – $\mathrm{Var}(X|Y) = E\big[(X - E[X|Y])^2 \mid Y\big] = E[X^2 \mid Y] - \big(E[X|Y]\big)^2$.
  – For any functions $a(Y), b(Y)$ of $Y$, $\mathrm{Var}\big(a(Y)X + b(Y) \mid Y\big) = a(Y)^2\, \mathrm{Var}(X|Y)$.
  – $\mathrm{Var}(X|Y) \ge 0$.
  – $\mathrm{Var}(X|Y) = 0$ iff $P\big(X = h(Y) \mid Y\big) = 1$ for some function $h(Y)$ of $Y$.
  – $X_1, \dots, X_n$ conditionally independent given $Y$ $\Rightarrow$ $\mathrm{Var}\big(\sum_i X_i \mid Y\big) = \sum_i \mathrm{Var}(X_i \mid Y)$.

• CONDITIONAL COVARIANCE/CORRELATION COEFFICIENT
\[
\mathrm{Cov}(X_1, X_2 \mid Y) = E\big[(X_1 - E[X_1|Y])(X_2 - E[X_2|Y]) \mid Y\big] = E[X_1 X_2 \mid Y] - E[X_1|Y]\, E[X_2|Y],
\]
\[
\rho(X_1, X_2 \mid Y) = \frac{\mathrm{Cov}(X_1, X_2 \mid Y)}{\sqrt{\mathrm{Var}(X_1|Y)\, \mathrm{Var}(X_2|Y)}}.
\]
• CONDITIONAL QUANTILE
The conditional $\alpha$th quantile of $X$ given $Y$ is $\inf\{x \in \mathbb{R} : F_{X|Y}(x|Y) > \alpha\}$.
 
9.2.6 Proposition. (Law of total expectation) For any function $g(x)$, $E\big[E[g(X)|Y]\big] = E[g(X)]$.
Proof: Consider the continuous case (discrete case similar).
\[
E\big[E[g(X)|Y]\big] = \int E\big[g(X) \mid Y = y\big]\, f_Y(y)\,dy = \int \left( \int g(x)\, f_{X|Y}(x|y)\,dx \right) f_Y(y)\,dy
= \int g(x) \left( \int f(x,y)\,dy \right) dx = \int g(x)\, f_X(x)\,dx = E[g(X)].
\]

9.2.7 Proposition. For any event $A$, $E\big[P(A|Y)\big] = P(A)$.
Proof: Note that
\[
E\big[\mathbf{1}\{A\} \mid Y\big] = 1 \times P\big(\mathbf{1}\{A\} = 1 \mid Y\big) + 0 \times P\big(\mathbf{1}\{A\} = 0 \mid Y\big) = P(A|Y).
\]
Similarly, $E\big[\mathbf{1}\{A\}\big] = P(A)$. The result follows by applying Proposition §9.2.6 with $X = \mathbf{1}\{A\}$.

9.2.8 Proposition. (Law of total variance) $\mathrm{Var}(X) = E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big(E[X|Y]\big)$.

Proof:
\[
\begin{aligned}
\mathrm{Var}(X) &= E\big[(X - E[X|Y] + E[X|Y] - E[X])^2\big]\\
&= E\Big[ E\big[(X - E[X|Y])^2 \mid Y\big] \Big] + E\big[(E[X|Y] - E[X])^2\big]
   + 2\, E\Big[ E\big[(X - E[X|Y])(E[X|Y] - E[X]) \mid Y\big] \Big]\\
&= E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big(E[X|Y]\big)
   + 2\, E\Big[ (E[X|Y] - E[X])\, E\big[X - E[X|Y] \mid Y\big] \Big]\\
&= E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big(E[X|Y]\big),
\end{aligned}
\]
since $E\big[X - E[X|Y] \mid Y\big] = 0$.

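Both laws are easy to verify numerically. A minimal sketch, assuming the arbitrary hierarchical model $Y \sim U[0, 1]$ and $X \mid Y = y \sim N\big(y, (1+y)^2\big)$ (Python):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

y = rng.random(n)              # Y ~ U[0, 1]
x = rng.normal(y, 1 + y)       # X | Y=y ~ N(y, (1+y)^2)

cond_mean = y                  # E[X|Y] = Y
cond_var = (1 + y) ** 2        # Var(X|Y) = (1+Y)^2

print(x.mean(), cond_mean.mean())                  # law of total expectation
print(x.var(), cond_var.mean() + cond_mean.var())  # law of total variance
```
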
9.2.9 The results of §9.2.1 to §9.2.8 can be extended to the multivariate case where $Y$ is replaced by $(Y_1, \dots, Y_s)$.

9.2.10 Examples — (cont'd from §9.1.9)

(i) $X \mid N \sim \text{Binomial}(N, p)$ $\Rightarrow$ $E[X|N] = Np$ and $\mathrm{Var}(X|N) = Np(1-p)$. Then
\[
E\big[E[X|N]\big] = p\, E[N] = p\lambda, \qquad
\mathrm{Var}\big(E[X|N]\big) = p^2\, \mathrm{Var}(N) = p^2 \lambda, \qquad
E\big[\mathrm{Var}(X|N)\big] = p(1-p)\, E[N] = p(1-p)\lambda.
\]
But, unconditionally, $X \sim \text{Poisson}(p\lambda)$, which implies $E[X] = \mathrm{Var}(X) = p\lambda$.
This confirms the laws of total expectation and total variance:
\[
E\big[E[X|N]\big] = E[X] \qquad\text{and}\qquad \mathrm{Var}(X) = E\big[\mathrm{Var}(X|N)\big] + \mathrm{Var}\big(E[X|N]\big).
\]

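Since $E[X|N]$ and $\mathrm{Var}(X|N)$ are explicit functions of $N$, the intermediate quantities above can be checked by simulating $N$ alone; a sketch with the arbitrary choices $\lambda = 3$ and $p = 0.4$ (Python):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, p = 3.0, 0.4
N = rng.poisson(lam, size=500_000)

cond_mean = N * p              # E[X|N] = Np
cond_var = N * p * (1 - p)     # Var(X|N) = Np(1-p)

print(cond_mean.mean())                    # ~ p*lam      = 1.2
print(cond_mean.var())                     # ~ p**2 * lam = 0.48
print(cond_var.mean())                     # ~ p(1-p)*lam = 0.72
print(cond_mean.var() + cond_var.mean())   # ~ Var(X) = p*lam = 1.2
```
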
(ii) Consider conditional expectations of $g(X, Y, Z) = XYZ$ given $Z$ and $(X, Y)$, respectively.

– Given $Z$: since $X$ and $Y$ are conditionally independent given $Z$,
\[
E\big[XYZ \mid Z\big] = Z\, E\big[XY \mid Z\big] = Z\, E\big[X \mid Z\big]\, E\big[Y \mid Z\big]
= Z \int_0^{1-Z} \frac{2x^2}{(1 - Z)^2}\,dx \int_0^{1-Z} \frac{y}{1 - Z}\,dy = \frac{Z (1 - Z)^2}{3}.
\]

– Given $(X, Y)$:
\[
E\big[XYZ \mid X, Y\big] = XY\, E\big[Z \mid X, Y\big] = XY \int_0^1 z\, f_{Z|X,Y}(z \mid X, Y)\,dz
= \frac{2XY}{\big(1 - \max\{X, Y\}\big)^2} \int_0^{1 - \max\{X, Y\}} z^2\,dz = \frac{2}{3}\, XY \big(1 - \max\{X, Y\}\big).
\]

Taking further expectations,
\[
E\big[E[XYZ \mid Z]\big] = \int_0^1 E[XYZ \mid Z = z]\, f_Z(z)\,dz = \int_0^1 \frac{z (1 - z)^2}{3}\; 20 z (1 - z)^3\,dz = 5/126,
\]
and
\[
E\big[E[XYZ \mid X, Y]\big] = \int_0^1\!\!\int_0^1 E[XYZ \mid X = x, Y = y]\, f_{X,Y}(x, y)\,dx\,dy
= \int_0^1\!\!\int_0^1 \frac{2}{3}\, x y \big(1 - \max\{x, y\}\big) \left( \int_0^1 f(x, y, z)\,dz \right) dx\,dy
\]
\[
= \frac{40}{3} \int_0^1\!\!\int_0^1 x^2 y \big(1 - \max\{x, y\}\big)^3\,dx\,dy
= \frac{40}{3} \int_0^1 \left( x^2 (1 - x)^3 \int_0^x y\,dy + x^2 \int_x^1 y (1 - y)^3\,dy \right) dx = 5/126.
\]
As expected, the above results agree with that derived in Example §8.1.4(iii):
\[
E\big[E[XYZ \mid Z]\big] = E\big[E[XYZ \mid X, Y]\big] = E[XYZ] = 5/126.
\]

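The common value $5/126 \approx 0.0397$ can also be confirmed by Monte Carlo, reusing the rejection-sampling idea sketched after §9.1.9(iii):

```python
import numpy as np

rng = np.random.default_rng(4)

# Rejection sampling from f(x,y,z) = 40xz on its support, as before.
cand = rng.random((4_000_000, 3))
x, y, z = cand.T
keep = (x + z <= 1) & (y + z <= 1) & (rng.random(len(cand)) < x * z)
x, y, z = x[keep], y[keep], z[keep]   # roughly 100,000 accepted draws

print((x * y * z).mean(), 5 / 126)    # both ~ 0.0397
```
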
9.2.11 Proposition. Suppose $\Omega = A_1 \cup A_2 \cup \cdots$, where the $A_j$'s are mutually exclusive. Then
\[
E[X] = E[X|A_1]\, P(A_1) + E[X|A_2]\, P(A_2) + \cdots.
\]
Proof: Define $Y = j$ if $A_j$ occurs, $j = 1, 2, \dots$. Then
\[
E[X] = E\big[E[X|Y]\big] = \sum_{j=1}^{\infty} E[X \mid Y = j]\, P(Y = j) = \sum_{j=1}^{\infty} E[X|A_j]\, P(A_j).
\]
Note: The expectation of $X$ can be treated as a weighted average of the conditional expectations of $X$ given disjoint sectors of the sample space. The weights are determined by the probabilities of the sectors. The special case $X = \mathbf{1}\{B\}$ reduces to the “law of total probability”.

9.2.12 Proposition. For a random variable $X$ and an event $A$ with $P(A) > 0$,
\[
E[X|A] = \frac{E\big[X\, \mathbf{1}\{A\}\big]}{P(A)}.
\]
Proof: Applying Proposition §9.2.11 with $A_1 = A$, $A_2 = A^c$ and $X$ replaced by $X\, \mathbf{1}\{A\}$, we have
\[
E\big[X\, \mathbf{1}\{A\}\big] = E\big[X\, \mathbf{1}\{A\} \mid A\big]\, P(A) + E\big[X\, \mathbf{1}\{A\} \mid A^c\big]\, P(A^c) = E[X|A]\, P(A).
\]
Note: The special case $X = \mathbf{1}\{B\}$ reduces to the definition of the conditional probability $P(B|A)$.
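
A quick numerical check of this identity, with the arbitrary choices $X \sim N(0, 1)$ and $A = \{X > 1\}$ (Python):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(1_000_000)
a = x > 1.0                          # event A = {X > 1}

print(x[a].mean())                   # E[X|A] estimated directly
print((x * a).mean() / a.mean())     # E[X 1{A}] / P(A); both ~ 1.525
```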

9.2.13 Example. A person is randomly selected from an adult population and his/her height $X$ measured. It is known that the mean height of a man is 1.78 m, and that of a woman is 1.68 m. Men account for 48% of the population. Calculate the mean height of the adult population, $E[X]$.
Answer:
\[
E[X] = E[X \mid \{\text{man}\}]\, P(\text{man}) + E[X \mid \{\text{woman}\}]\, P(\text{woman}) = 1.78\,\text{m} \times 0.48 + 1.68\,\text{m} \times 0.52 = 1.728\,\text{m}.
\]

9.2.14 Example §7.1.11(b) (cont'd)

Recall that $X = 1$ if the fair coin shows a head and $X = 2$ otherwise, and
\[
f(y \mid \{\text{head}\}) = \frac{d}{dy}\, P(U \le y) = (2\pi)^{-1}\, \mathbf{1}\{0 < y < 2\pi\},
\]
\[
f(y \mid \{\text{tail}\}) = \frac{d}{dy}\, P(U + V \le y) = (2\pi)^{-2} \big( \mathbf{1}\{0 < y < 2\pi\}\, y + \mathbf{1}\{2\pi \le y < 4\pi\}\, (4\pi - y) \big).
\]
Calculate $E\big[Y^{-1} \mid Y > X\pi\big]$ and $\mathrm{Var}\big(Y^{-1} \mid Y > X\pi\big)$.


Answer: Consider first, for $j = 0, 1, 2$,
\[
E\big[Y^{-j}\, \mathbf{1}\{Y > x\pi\} \mid X = x\big] =
\begin{cases}
\displaystyle\int_{\pi}^{\infty} y^{-j}\, f(y \mid \{\text{head}\})\,dy = (2\pi)^{-1} \int_{\pi}^{2\pi} y^{-j}\,dy, & x = 1,\\[1ex]
\displaystyle\int_{2\pi}^{\infty} y^{-j}\, f(y \mid \{\text{tail}\})\,dy = (2\pi)^{-2} \int_{2\pi}^{4\pi} y^{-j} (4\pi - y)\,dy, & x = 2
\end{cases}
\]
\[
=
\begin{cases}
\dfrac{1}{2}\, \mathbf{1}\{j = 0\} + \dfrac{\ln 2}{2\pi}\, \mathbf{1}\{j = 1\} + \dfrac{1}{4\pi^2}\, \mathbf{1}\{j = 2\}, & x = 1,\\[1.5ex]
\dfrac{1}{2}\, \mathbf{1}\{j = 0\} + \dfrac{2\ln 2 - 1}{2\pi}\, \mathbf{1}\{j = 1\} + \dfrac{1 - \ln 2}{4\pi^2}\, \mathbf{1}\{j = 2\}, & x = 2.
\end{cases}
\]
Note that by the law of total expectation,
\[
E\big[Y^{-j}\, \mathbf{1}\{Y > X\pi\}\big] = \sum_{x=1}^{2} E\big[Y^{-j}\, \mathbf{1}\{Y > x\pi\} \mid X = x\big]\, P(X = x) = \frac{1}{2} \sum_{x=1}^{2} E\big[Y^{-j}\, \mathbf{1}\{Y > x\pi\} \mid X = x\big].
\]

Putting $j = 0, 1, 2$, respectively, we have
\[
P(Y > X\pi) = \frac{1}{2}\left( \frac{1}{2} + \frac{1}{2} \right) = \frac{1}{2},
\]
\[
E\big[Y^{-1}\, \mathbf{1}\{Y > X\pi\}\big] = \frac{1}{2}\left( \frac{\ln 2}{2\pi} + \frac{2\ln 2 - 1}{2\pi} \right) = \frac{3\ln 2 - 1}{4\pi},
\]
\[
E\big[Y^{-2}\, \mathbf{1}\{Y > X\pi\}\big] = \frac{1}{2}\left( \frac{1}{4\pi^2} + \frac{1 - \ln 2}{4\pi^2} \right) = \frac{2 - \ln 2}{8\pi^2}.
\]

It follows that
\[
E\big[Y^{-1} \mid Y > X\pi\big] = \frac{E\big[Y^{-1}\, \mathbf{1}\{Y > X\pi\}\big]}{P(Y > X\pi)} = \frac{3\ln 2 - 1}{2\pi}.
\]
Similarly, we have
\[
E\big[Y^{-2} \mid Y > X\pi\big] = \frac{E\big[Y^{-2}\, \mathbf{1}\{Y > X\pi\}\big]}{P(Y > X\pi)} = \frac{2 - \ln 2}{4\pi^2},
\]
so that
\[
\mathrm{Var}\big(Y^{-1} \mid Y > X\pi\big) = E\big[Y^{-2} \mid Y > X\pi\big] - \Big( E\big[Y^{-1} \mid Y > X\pi\big] \Big)^2
= \frac{2 - \ln 2}{4\pi^2} - \left( \frac{3\ln 2 - 1}{2\pi} \right)^2 = \frac{1 + 5\ln 2 - 9(\ln 2)^2}{4\pi^2}.
\]
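
A Python simulation following the setup of Example §7.1.11(b) confirms these conditional moments:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000_000

heads = rng.random(n) < 0.5
X = np.where(heads, 1, 2)
U = rng.uniform(0.0, 2 * np.pi, n)
V = rng.uniform(0.0, 2 * np.pi, n)
Y = np.where(heads, U, U + V)   # Y|head ~ U(0, 2*pi); Y|tail ~ U + V

sel = Y > X * np.pi
w = 1.0 / Y[sel]
print(sel.mean())                                    # ~ P(Y > X*pi) = 1/2
print(w.mean(), (3 * np.log(2) - 1) / (2 * np.pi))   # ~ 0.1718
print(w.var(), (1 + 5 * np.log(2) - 9 * np.log(2) ** 2) / (4 * np.pi ** 2))
```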

§9.3 *** More challenges ***


9.3.1 Let $X$ and $Y$ be continuous random variables with joint density function
\[
f(x, y) = C\, \mathbf{1}\{x, y > 0\} \big( y\, \mathbf{1}\{x + y \le 1\} + (x + y)^{-3}\, \mathbf{1}\{x + y > 1\} \big),
\]
for some constant $C > 0$.

(a) Find $C$.
(b) Find the conditional pdf $f_{X|Y}$.
(c) Find the conditional pdf of $X$ given $X > Y$.

9.3.2 Dreydel is an ancient game played by Jews at the Chanukah festival. It can be played by any number, $m$ say, of players. Each player takes turns to spin a four-sided top — the dreydel — marked with the letters N, G, H and S. Before the game starts, each player contributes one unit to the pot, which thus contains $m$ units. Depending on the outcome of his spin, the spinning player

• receives no payoff if N turns up,
• receives the entire pot if G turns up,
• receives half the pot if H turns up,
• contributes 1 unit to the pot if S turns up.

If G turns up, all $m$ players must each contribute one unit to the pot to start the game again.

(a) Show that in the long run, the pot contains $2(m + 1)/3$ units on average.
(b) Is Dreydel a fair game, i.e. one in which no player has an advantage over the others?

9.3.3 Two players compete in a card-drawing game. Each player is given a full pack of 52 cards and draws cards from the pack repeatedly, with replacement, until a stopping condition is met. Player 1 stops as soon as a queen is followed by a queen, which is then followed by a king. Player 2 stops as soon as a queen is followed by a king, which is then followed by a queen. Whoever stops first is the winner. Compare the expected numbers of cards drawn by the two players. Which player does the game favour?

