Conditional Density Functions and Conditional Expected Values
Recall the conditional distribution function of a r.v. X given an event B:

F_X(x \mid B) = \frac{P\{X \le x,\; B\}}{P(B)}.    (11-1)

Suppose we let

B = \{\, y_1 < Y \le y_2 \,\}.    (11-2)

Substituting (11-2) into (11-1), we get

F_X(x \mid y_1 < Y \le y_2) = \frac{P\{X \le x,\; y_1 < Y \le y_2\}}{P\{y_1 < Y \le y_2\}},    (11-3)

where we have made use of (7-4). But using (3-28) and (7-7) we can rewrite (11-3) as

F_X(x \mid y_1 < Y \le y_2) = \frac{\int_{-\infty}^{x}\int_{y_1}^{y_2} f_{XY}(u,v)\,dv\,du}{\int_{y_1}^{y_2} f_Y(v)\,dv}.    (11-4)

To determine the limiting case F_X(x \mid Y = y), we can let y_1 = y and y_2 = y + \Delta y in (11-4).
This gives

F_X(x \mid y < Y \le y + \Delta y) = \frac{\int_{-\infty}^{x}\int_{y}^{y+\Delta y} f_{XY}(u,v)\,dv\,du}{\int_{y}^{y+\Delta y} f_Y(v)\,dv} \simeq \frac{\left(\int_{-\infty}^{x} f_{XY}(u,y)\,du\right)\Delta y}{f_Y(y)\,\Delta y}.    (11-5)

Letting \Delta y \to 0, we obtain (for f_Y(y) \ne 0)

F_X(x \mid Y = y) = \frac{\int_{-\infty}^{x} f_{XY}(u,y)\,du}{f_Y(y)}.    (11-6)
(To emphasize the conditional nature of the left-hand side, we shall use the subscript X|Y instead of X there.) Thus

F_{X|Y}(x \mid Y = y) = \frac{\int_{-\infty}^{x} f_{XY}(u,y)\,du}{f_Y(y)},    (11-7)

and differentiating with respect to x, the conditional p.d.f. is

f_{X|Y}(x \mid Y = y) = \frac{f_{XY}(x,y)}{f_Y(y)}.    (11-8)
It is easy to see that the left side of (11-8) represents a valid probability density function. In fact,

f_{X|Y}(x \mid Y = y) = \frac{f_{XY}(x,y)}{f_Y(y)} \ge 0,    (11-9)

and

\int_{-\infty}^{+\infty} f_{X|Y}(x \mid Y = y)\,dx = \frac{\int_{-\infty}^{+\infty} f_{XY}(x,y)\,dx}{f_Y(y)} = \frac{f_Y(y)}{f_Y(y)} = 1,    (11-10)

where we have made use of (7-14). From (11-9)-(11-10), (11-8) indeed represents a valid p.d.f., and we shall refer to it as the conditional p.d.f. of the r.v. X given Y = y. We may also write
f_{X|Y}(x \mid Y = y) = f_{X|Y}(x \mid y).    (11-11)

From (11-8) and (11-11), we have

f_{X|Y}(x \mid y) = \frac{f_{XY}(x,y)}{f_Y(y)},    (11-12)
and similarly
f_{Y|X}(y \mid x) = \frac{f_{XY}(x,y)}{f_X(x)}.    (11-13)
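As a quick numerical sanity check of (11-12), the sketch below verifies that the conditional density integrates to 1 in x for any fixed y. The joint density f_XY(x, y) = x + y on the unit square is a made-up example (not from this lecture), chosen because its marginal f_Y(y) = 1/2 + y is easy to compute by hand.

```python
# Sanity check of (11-12): for fixed y, f_{X|Y}(x|y) = f_XY(x,y)/f_Y(y)
# must integrate to 1 over x. Joint density below is a made-up example.

def f_xy(x, y):
    return x + y            # valid joint p.d.f. on the unit square

def f_y(y):
    return 0.5 + y          # marginal: integral of (x + y) dx over (0, 1)

def conditional_mass(y, n=100_000):
    """Midpoint-rule sum of f_{X|Y}(x|y) over x in (0, 1)."""
    dx = 1.0 / n
    return sum(f_xy((i + 0.5) * dx, y) / f_y(y) * dx for i in range(n))

for y in (0.1, 0.5, 0.9):
    print(y, conditional_mass(y))   # each value should be close to 1
```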
If the r.v.s X and Y are independent, then f_{XY}(x,y) = f_X(x)\,f_Y(y), and (11-12)-(11-13) reduce to

f_{X|Y}(x \mid y) = f_X(x), \qquad f_{Y|X}(y \mid x) = f_Y(y),    (11-14)

implying that the conditional p.d.f.s coincide with their unconditional p.d.f.s. This makes sense, since if X and Y are independent r.v.s, information about Y shouldn't be of any help in updating our knowledge about X. In the case of discrete-type r.v.s, (11-12) reduces to

P(X = x_i \mid Y = y_j) = \frac{P(X = x_i,\; Y = y_j)}{P(Y = y_j)}.    (11-15)
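The discrete formula (11-15) is straightforward to exercise directly. A minimal sketch, assuming a small made-up joint p.m.f. table:

```python
# Discrete conditional p.m.f. per (11-15). The joint table is illustrative.
joint = {                      # P(X = x, Y = y)
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

def p_y(y):
    """Marginal P(Y = y): sum the joint over all x."""
    return sum(p for (xi, yi), p in joint.items() if yi == y)

def p_x_given_y(x, y):
    """P(X = x | Y = y) from (11-15)."""
    return joint[(x, y)] / p_y(y)

print(p_x_given_y(0, 1))   # 0.2 / 0.6
print(p_x_given_y(1, 1))   # 0.4 / 0.6
```

Note that for each fixed y_j the conditional probabilities sum to 1, mirroring (11-10) in the discrete case.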
Next we shall illustrate the method of obtaining conditional p.d.f.s through an example.

Example 11.1: Given

f_{XY}(x,y) = \begin{cases} k, & 0 < x < y < 1, \\ 0, & \text{otherwise}, \end{cases}    (11-16)

determine f_{X|Y}(x \mid y) and f_{Y|X}(y \mid x).

[Fig. 11.1: the shaded triangular region 0 < x < y < 1 in the (x, y) plane.]
Solution: The joint p.d.f. is given to be a constant in the shaded region. Normalizing it to unity gives

\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = \int_{0}^{1}\int_{0}^{y} k\,dx\,dy = \int_{0}^{1} k\,y\,dy = \frac{k}{2} = 1 \;\Rightarrow\; k = 2.    (11-17)

The marginal p.d.f.s follow by integrating out the other variable:

f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy = \int_{x}^{1} k\,dy = k(1-x) = 2(1-x), \quad 0 < x < 1,    (11-18)

and

f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx = \int_{0}^{y} k\,dx = k\,y = 2y, \quad 0 < y < 1.    (11-19)

From (11-12)-(11-13), the desired conditional p.d.f.s are

f_{X|Y}(x \mid y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{2}{2y} = \frac{1}{y}, \quad 0 < x < y < 1, \qquad f_{Y|X}(y \mid x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{2}{2(1-x)} = \frac{1}{1-x}, \quad x < y < 1.    (11-20)
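Example 11.1 can be checked by simulation. Since f_{X|Y}(x | y) = 1/y makes X conditionally uniform on (0, y), we should find E(X | Y = y) \approx y/2. A sketch, assuming accept-reject sampling of the triangle and a thin conditioning band around y_0 = 0.8:

```python
# Monte Carlo check of Example 11.1: under f_XY = 2 on 0 < x < y < 1,
# X given Y = y is uniform on (0, y), so E(X | Y = y) = y/2.
import random

random.seed(0)
samples = []
while len(samples) < 200_000:
    x, y = random.random(), random.random()
    if x < y:                       # keep points in the shaded triangle
        samples.append((x, y))

# average X over samples whose Y lies in a thin band around y0 = 0.8
y0, band = 0.8, 0.02
xs = [x for x, y in samples if abs(y - y0) < band]
print(sum(xs) / len(xs))            # should be close to y0 / 2 = 0.4
```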
We can use (11-12)-(11-13) to derive an important result. From there, we also have

f_{XY}(x,y) = f_{X|Y}(x \mid y)\, f_Y(y) = f_{Y|X}(y \mid x)\, f_X(x),    (11-21)

or

f_{Y|X}(y \mid x) = \frac{f_{X|Y}(x \mid y)\, f_Y(y)}{f_X(x)}.    (11-22)
But

f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy = \int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\, f_Y(y)\,dy,    (11-23)

and substituting (11-23) into (11-22), we get

f_{Y|X}(y \mid x) = \frac{f_{X|Y}(x \mid y)\, f_Y(y)}{\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\, f_Y(y)\,dy}.    (11-24)
Equation (11-24) represents the p.d.f. version of Bayes' theorem. To appreciate the full significance of (11-24), one needs to look at communication problems where observations can be used to update our knowledge about unknown parameters. We shall illustrate this using a simple example.

Example 11.2: An unknown random phase \theta is uniformly distributed in the interval (0, 2\pi), and r = \theta + n, where n \sim N(0, \sigma^2). Determine f(\theta \mid r).

Solution: Initially almost nothing about the r.v. \theta is known, so we assume its a-priori p.d.f. to be uniform in the interval (0, 2\pi).
In the equation r = \theta + n, we can think of n as the noise contribution and r as the observation. It is reasonable to assume that \theta and n are independent. In that case

f(r \mid \theta) \sim N(\theta, \sigma^2),    (11-25)

since given \theta, the phase is a constant and r = \theta + n behaves like n shifted by \theta. Using (11-24), this gives the a-posteriori p.d.f. of \theta given r to be (see Fig. 11.2(b))

f(\theta \mid r) = \frac{f(r \mid \theta)\, f(\theta)}{\int_{0}^{2\pi} f(r \mid \theta)\, f(\theta)\,d\theta} = \frac{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(r-\theta)^2/2\sigma^2}\cdot \frac{1}{2\pi}}{\int_{0}^{2\pi} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(r-\theta)^2/2\sigma^2}\cdot \frac{1}{2\pi}\,d\theta} = \varphi(r)\, e^{-(r-\theta)^2/2\sigma^2}, \quad 0 < \theta < 2\pi,    (11-26)

where

\varphi(r) = \left( \int_{0}^{2\pi} e^{-(r-\theta)^2/2\sigma^2}\,d\theta \right)^{-1}.
Notice that the knowledge about the observation r is reflected in the a-posteriori p.d.f. of \theta in Fig. 11.2(b). It is no longer flat like the a-priori p.d.f. in Fig. 11.2(a), and it shows higher probabilities in the neighborhood of \theta = r.

[Fig. 11.2: (a) the flat a-priori p.d.f. f(\theta) = 1/2\pi on (0, 2\pi); (b) the a-posteriori p.d.f. f_{\theta|r}(\theta \mid r), peaked near \theta = r.]
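The posterior in (11-26) is easy to evaluate on a grid. The sketch below (with assumed illustrative values \sigma = 0.5 and r = 2.0) confirms that it integrates to 1 and peaks near \theta = r; the uniform prior cancels out of the ratio, exactly as in (11-26).

```python
# Grid evaluation of the a-posteriori p.d.f. (11-26) for Example 11.2.
# The prior is uniform, so it cancels, leaving the posterior proportional
# to exp(-(r - theta)^2 / (2 sigma^2)) on (0, 2*pi).
import math

sigma, r = 0.5, 2.0                        # assumed values for illustration
n = 4000
d = 2 * math.pi / n
thetas = [(i + 0.5) * d for i in range(n)]
lik = [math.exp(-(r - t) ** 2 / (2 * sigma ** 2)) for t in thetas]
norm = sum(lik) * d                        # the constant 1/phi(r) in (11-26)
post = [v / norm for v in lik]

# the posterior integrates to 1 and peaks near theta = r
print(sum(post) * d)
print(thetas[post.index(max(post))])
```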
Conditional Mean: We can use the conditional p.d.f.s to define the conditional mean. More generally, applying (6-13) to conditional p.d.f.s we get

E(g(X) \mid B) = \int_{-\infty}^{\infty} g(x)\, f_X(x \mid B)\,dx.    (11-27)

Using (11-8) with B = \{Y = y\}, the conditional mean of X given Y = y is

\mu_{X|Y} = E(X \mid Y = y) = \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid y)\,dx,    (11-28)

and similarly

\mu_{Y|X} = E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\,dy.    (11-29)

In a similar manner, the conditional variance of X given Y = y is

\sigma^2_{X|Y} = Var(X \mid Y = y) = E\big((X - \mu_{X|Y})^2 \mid Y = y\big).    (11-30)

We shall illustrate these ideas through an example.

Example 11.3: Let

f_{XY}(x,y) = \begin{cases} 1, & 0 < |y| < x < 1, \\ 0, & \text{otherwise}. \end{cases}    (11-31)

Determine E(X \mid Y) and E(Y \mid X).
Solution: The joint p.d.f. is uniform over the region 0 < |y| < x < 1 (Fig. 11.3). The marginals are

f_X(x) = \int_{-x}^{x} f_{XY}(x,y)\,dy = 2x, \quad 0 < x < 1,

and

f_Y(y) = \int_{|y|}^{1} 1\,dx = 1 - |y|, \quad |y| < 1.

[Fig. 11.3: the region 0 < |y| < x < 1 in the (x, y) plane.]

This gives

f_{X|Y}(x \mid y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{1}{1-|y|}, \quad |y| < x < 1,    (11-32)

and

f_{Y|X}(y \mid x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{1}{2x}, \quad -x < y < x.    (11-33)
Hence

E(X \mid Y = y) = \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid y)\,dx = \int_{|y|}^{1} \frac{x}{1-|y|}\,dx = \frac{1}{1-|y|}\,\frac{x^2}{2}\bigg|_{|y|}^{1} = \frac{1-|y|^2}{2(1-|y|)} = \frac{1+|y|}{2}, \quad |y| < 1,    (11-34)

and

E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\,dy = \int_{-x}^{x} \frac{y}{2x}\,dy = \frac{1}{2x}\,\frac{y^2}{2}\bigg|_{-x}^{x} = 0, \quad 0 < x < 1.    (11-35)
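The conditional means (11-34)-(11-35) can be confirmed by simulation. A sketch, assuming accept-reject sampling from the rectangle (0,1) \times (-1,1) to obtain the uniform joint on 0 < |y| < x < 1, and thin conditioning bands:

```python
# Monte Carlo check of (11-34)-(11-35): with f_XY = 1 on 0 < |y| < x < 1,
# E(X | Y = y) = (1 + |y|)/2 and E(Y | X = x) = 0.
import random

random.seed(1)
pts = []
while len(pts) < 200_000:
    x = random.random()
    y = random.uniform(-1.0, 1.0)
    if abs(y) < x:                  # keep points inside the region
        pts.append((x, y))

y0, band = 0.5, 0.02
xs = [x for x, y in pts if abs(y - y0) < band]
print(sum(xs) / len(xs))    # should be close to (1 + 0.5)/2 = 0.75

x0 = 0.5
ys = [y for x, y in pts if abs(x - x0) < band]
print(sum(ys) / len(ys))    # should be close to 0
```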
It is possible to obtain an interesting generalization of the conditional mean formulas in (11-28)-(11-29). More generally, (11-28) gives

E(g(X) \mid Y = y) = \int_{-\infty}^{+\infty} g(x)\, f_{X|Y}(x \mid y)\,dx.    (11-36)

But

E(g(X)) = \int_{-\infty}^{+\infty} g(x)\, f_X(x)\,dx = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x)\, f_{XY}(x,y)\,dy\,dx
        = \int_{-\infty}^{+\infty} \left( \int_{-\infty}^{+\infty} g(x)\, f_{X|Y}(x \mid y)\,dx \right) f_Y(y)\,dy
        = \int_{-\infty}^{+\infty} E(g(X) \mid Y = y)\, f_Y(y)\,dy = E\{E(g(X) \mid Y = y)\},    (11-37)

where the inner integral in parentheses is E(g(X) \mid Y = y) by (11-36).
Obviously, on the right side of (11-37) the inner expectation is with respect to X and the outer expectation is with respect to Y. Letting g(X) = X in (11-37), we get the interesting identity

E(X) = E\{E(X \mid Y = y)\},    (11-38)

where the inner expectation on the right side is with respect to X and the outer one is with respect to Y. Similarly, we have

E(Y) = E\{E(Y \mid X = x)\}.    (11-39)
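The identity (11-38) can be checked numerically against Example 11.1, where E(X | Y = y) = y/2 and f_Y(y) = 2y, so E\{E(X|Y)\} = \int_0^1 (y/2)(2y)\,dy should match E(X) = \int_0^1 x \cdot 2(1-x)\,dx. A sketch using plain midpoint-rule sums:

```python
# Numerical check of the tower property (11-38) via Example 11.1:
# E(X) from the marginal f_X(x) = 2(1-x) must equal E{E(X|Y)} with
# E(X|Y=y) = y/2 weighted by f_Y(y) = 2y. Both should be 1/3.
n = 100_000
d = 1.0 / n
grid = [(i + 0.5) * d for i in range(n)]

ex_direct = sum(x * 2 * (1 - x) * d for x in grid)    # E(X)
ex_tower = sum((y / 2) * 2 * y * d for y in grid)     # E{E(X|Y)}
print(ex_direct, ex_tower)    # both close to 1/3
```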
The conditional mean turns out to be an important concept in estimation and prediction theory. For example, given an observation of a r.v. X, what can we say about a related r.v. Y? In other words, what is the best predicted value of Y given that X = x? It turns out that if "best" is meant in the sense of minimizing the mean square error between Y and its estimate \hat{Y}, then the conditional mean of Y given X = x, i.e., \hat{Y} = E(Y \mid X = x), is the best estimate for Y (see Lecture 16 for more on Mean Square Estimation).