Chapter 3

Conditional expectation

We will define conditional expectation E(X|A), where X is a r.v. and A is a sub-σ-algebra.
This will be used to define martingales later.

3.1 Basic concepts

Given (Ω, F, P), let A, B ∈ F with P(B) > 0, and let X, Y be two r.v.s.

• P(A) is a measure of uncertainty of an event A.

e.g., what is the chance of rain tomorrow?

• Conditional probability P(A|B) is an updated probability of A after getting new information B:

P(A|B) = P(A ∩ B) / P(B).

e.g., It has not rained for the past week; what is the chance of rain tomorrow?

• Without any information about X, the “best” guess of X is

– the population mean µ = EX (for a not-so-heavy-tailed distribution), since

EX = arg min_C E(C − X)².

– the population median m = F⁻¹(1/2) (for a heavy-tailed distribution), since

m = arg min_C E|C − X|.

– the population mode (for a really heavy-tailed distribution):

mode = arg max_C f(C).

e.g. The salary of a fresh university graduate is $10,000.

• Given some extra information B, the updated “best” guess of X is

– the sub-population (or sectional) mean µ_B = E(X|B) (for a not-so-heavy-tailed distribution), since

µ_B = arg min_C E{(C − X)² | B}.

Note: E(X|B) makes a better guess of X than EX in the L² sense in that

E[E(X|B) − X]² ≤ E[EX − X]².

– the sub-population (or sectional) median m_B = F⁻¹(1/2 | B) (for a heavy-tailed distribution), since

m_B = arg min_C E{|C − X| | B}.

– the sub-population (or sectional) mode (for a really heavy-tailed distribution):

mode_B = arg max_C f(C|B).

e.g. The salary of a fresh university graduate is $15,000, if (s)he works in fin-tech industry.

3.2 Examples of conditional probability

3.3 Definitions of E(X|Y), E(X|σ(Y)) and E(X|A)

Given (Ω, F, P), let A, B ∈ F with P(B) > 0, and let X, Y be two r.v.s.

• The conditional expectation of X given B (P(B) > 0):

E(X|B) = E(X I_B) / P(B),

where I_B(ω) = 1 if ω ∈ B, and 0 otherwise.

• Taking B = {Y = y}, we get

E(X|Y = y) = Σ_{i=1}^∞ x_i P(X = x_i | Y = y),   if X is a discrete r.v.,
           = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx,       if X is a continuous r.v.

An example

Example 3.1 Let X ∼ U(0, 1] with pdf f_X(x) = I_{(0,1]}(x). Let

A_i = {X ∈ ((i − 1)/n, i/n]},   i = 1, 2, ..., n.

Clearly, Ω = Σ_{i=1}^n A_i and P(A_i) = 1/n. From the above definition, we get

E(X|A_i) = ∫_{A_i} x f_X(x) dx / P(A_i) = n ∫_{(i−1)/n}^{i/n} x dx = (n/2) x² |_{(i−1)/n}^{i/n} = (2i − 1)/(2n).

The "best" guess for X, given the A_i's (or σ(A_1, ..., A_n)), is E(X|A_i) = (2i − 1)/(2n), i.e.,

X_n = E(X|A_1) = 1/(2n)          if A_1 occurs (with prob. n⁻¹)
    = E(X|A_2) = 3/(2n)          if A_2 occurs (with prob. n⁻¹)
    ..........
    = E(X|A_n) = (2n − 1)/(2n)   if A_n occurs (with prob. n⁻¹).

These are the updated expectations on the new space σ(A_1, ..., A_n). For example, when n = 5, X_n is a r.v. taking values 0.1, 0.3, 0.5, 0.7, 0.9 with equal probabilities 1/5.

Here is a graphical illustration: [figure: X on (0, 1] together with the step function E(X|σ(A_1, ..., A_n)), which equals (2i − 1)/(2n) on each A_i].
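As a sanity check on Example 3.1 (not part of the original notes), here is a minimal NumPy sketch: it simulates X ∼ U(0, 1], groups the draws by the events A_i, and compares the within-group averages with the closed form (2i − 1)/(2n); the choice n = 5 matches the numerical illustration above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                        # number of intervals A_1, ..., A_n
x = rng.uniform(0.0, 1.0, size=1_000_000)    # draws from X ~ U(0, 1]

for i in range(1, n + 1):
    in_Ai = (x > (i - 1) / n) & (x <= i / n)   # the event A_i = {X in ((i-1)/n, i/n]}
    mc = x[in_Ai].mean()                       # Monte Carlo estimate of E(X | A_i)
    exact = (2 * i - 1) / (2 * n)              # closed form from Example 3.1
    print(f"E(X|A_{i}): simulated {mc:.4f}, exact {exact:.4f}")
```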

Definition 3.1 Let h(y) = E(X|Y = y). Then define E(X|Y ) to be

Z := h(Y ).

Theorem 3.1 Let Y be a discrete r.v. taking values {y_1, ..., y_m} (m could be ∞). Then,

• Z is σ(Y)-measurable (i.e., it depends only on Y).

• E_A X = E_A Z, i.e., E[(X − Z) I_A] = 0, for all A ∈ σ(Y).

Proof. Note that σ(Y) = σ({Y = y_1}, ..., {Y = y_m}).

• {Z ≤ t} = {h(Y) ≤ t} = ∪_{k : h(y_k) ≤ t} {Y = y_k} ∈ σ(Y). So h(Y) is σ(Y)-measurable.

• For A_i = {Y = y_i}, we have

E(h(Y) I_{A_i}) = E(h(y_i) I_{A_i}) = h(y_i) P(A_i) = E(X|A_i) P(A_i) = E(X I_{A_i}).   (3.1)

For general A ∈ σ(Y) = σ(A_1, ..., A_m), it can be shown that A = ∪_{i∈I} A_i, where I ⊂ {1, ..., m}. Then,

E(X I_A) = E(X Σ_{i∈I} I_{A_i}) = Σ_{i∈I} E(X I_{A_i}) = Σ_{i∈I} E[h(Y) I_{A_i}]   (from (3.1))
         = E(h(Y) Σ_{i∈I} I_{A_i}) = E_A h(Y).

For a discrete r.v. Y, knowing Y is equivalent to knowing σ(Y). So we define

E(X|σ(Y)) =: E(X|Y).

Alternatively, the two properties in Theorem 3.1 can be used to define E(X|σ(Y)) for all r.v.s Y (discrete or otherwise).

Definition 3.2 If E|X| < ∞, we define E(X|σ(Y)) to be any r.v. Z such that:

• Z is σ(Y)-measurable, i.e., {Z ≤ t} ∈ σ(Y), ∀t ∈ R.

• E_A X = E_A Z, i.e., E[(X − Z) I_A] = 0, for all A ∈ σ(Y).

This definition can be generalized even further by replacing σ(Y) with a general sub-σ-algebra A.

Definition 3.3 Given (Ω, F, P) and a sub-σ-algebra A ⊂ F. If E|X| < ∞, we define E(X|A) to be any r.v. Z such that

1. Z is A-measurable (i.e., Z depends on A);

2. E_A X = E_A Z, i.e., E[(X − Z) I_A] = 0, for all A ∈ A.
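To make the two conditions of Definition 3.3 concrete, here is a small simulation sketch (an illustration with an arbitrarily chosen joint distribution, not taken from the notes). With A = σ(Y) for a discrete Y, the candidate Z = E(X|Y) is a function of Y (condition 1), and condition 2, E[(X − Z) I_A] = 0, only needs to be checked on the atoms A = {Y = y}:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
y = rng.integers(0, 3, size=N)          # a discrete Y taking the values 0, 1, 2
x = y + rng.normal(0.0, 1.0, size=N)    # X depends on Y plus noise, so E(X|Y = v) is about v

# Z = E(X | sigma(Y)) is built as a function of Y: constant on each atom {Y = v}.
atom_means = np.array([x[y == v].mean() for v in range(3)])
z = atom_means[y]                        # condition 1: Z is sigma(Y)-measurable

# Condition 2: E[(X - Z) I_A] = 0 for every A in sigma(Y);
# it suffices to check it on the atoms A = {Y = v}.
for v in range(3):
    print(f"E[(X - Z) I_(Y={v})] = {np.mean((x - z) * (y == v)):.2e}")
```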

3.4 Existence and uniqueness of E(X|A)

Let

µ(A) := P(A) and ν(A) := E_A X.

Then µ and ν are two σ-finite measures on (Ω, A). Then E_A X = E_A Z can be rewritten as

ν(A) = ∫_A Z dµ.

Then the result follows from the Radon-Nikodym Theorem as shown below.

Theorem 3.2

• E(X|A) is the Radon-Nikodym derivative of the (signed) measure ν(A) := E_A X w.r.t. P.

• Thus, E(X|A) exists, and is unique (by the Radon-Nikodym Theorem).

Proof. Let A ∈ A, and ν(A) := E_A X.

(A) Suppose X ≥ 0. Then

– P and ν are two σ-finite measures on (Ω, A) (Exercise).

– ν is dominated by P, i.e., ν ≪ P.
  Proof. If P(A) = E I_A = 0, then I_A = 0 a.s., hence ν(A) = E(X I_A) = 0.

By the Radon-Nikodym Theorem, there exists a unique Z =: dν/dP such that

(a) Z is A-measurable;

(b) ν(A) =: E_A X = ∫_A Z dP = E_A Z, ∀A ∈ A.

(B) Suppose E|X| < ∞.

Write X = X⁺ − X⁻, where X⁺ = max(0, X) and X⁻ = −min(0, X). Apply (A) to X⁺ and X⁻, respectively, to get the desired result.
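The Radon-Nikodym construction is easy to see on a finite sample space. The following toy sketch (my own illustration with an arbitrary P, X and partition, not from the notes) builds ν(A) = E(X I_A), computes Z = dν/dP atom by atom as ν(atom)/P(atom), and checks the defining property ν(A) = ∫_A Z dP; the resulting Z is exactly the conditional expectation E(X|A):

```python
import numpy as np

# Omega = {0, ..., 5}, P given by point masses, X a fixed function on Omega,
# and A the sigma-algebra generated by the partition {0,1}, {2,3}, {4,5}.
p = np.array([0.1, 0.2, 0.1, 0.2, 0.3, 0.1])     # P({omega})
x = np.array([1.0, 3.0, -2.0, 4.0, 0.5, 2.5])    # X(omega)
partition = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

# nu(A) := E(X I_A); the Radon-Nikodym derivative Z = d(nu)/dP is constant on each atom:
# Z = nu(atom) / P(atom), i.e., the conditional expectation of X given that atom.
z = np.empty_like(x)
for atom in partition:
    z[atom] = np.sum(x[atom] * p[atom]) / np.sum(p[atom])

# Check the defining property nu(A) = integral_A Z dP on every atom (hence on all of A).
for atom in partition:
    nu_A = np.sum(x[atom] * p[atom])
    int_Z = np.sum(z[atom] * p[atom])
    print(f"atom {atom.tolist()}: nu(A) = {nu_A:.4f}, integral_A Z dP = {int_Z:.4f}")
```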

3.5 Geometric interpretation of E(X|A)

Define L²(Ω, G, P) = {r.v.s W on (Ω, G, P) : EW² < ∞}. Then E(X|A) has a very nice interpretation.

Theorem 3.3 We have

E(X|A) = arg min_{Y ∈ L²(Ω, A, P)} E(X − Y)².

Hence, E(X|A) is the projection of X onto L²(Ω, A, P), i.e., the closest point in this subspace to X.

Proof. Note that

E(X − Y)² = E[(X − E(X|A)) + (E(X|A) − Y)]²
          = E(X − E(X|A))² + E(E(X|A) − Y)² + 2C,

where C = 0 (to be shown later). Therefore,

E(X − Y)² = E(X − E(X|A))² + E(E(X|A) − Y)² ≥ E(X − E(X|A))²,

and the equality is attained at Y = E(X|A) ∈ A.

It remains to show C = 0:

C = E[(X − E(X|A))(E(X|A) − Y)]
  = E{E[(X − E(X|A))(E(X|A) − Y)|A]}
  = E{[E(X|A) − Y] E[(X − E(X|A))|A]}
  = E{(E(X|A) − Y) × 0}
  = 0.
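A quick simulation sketch of Theorem 3.3 (not from the notes; the joint law of (X, Y) below is an arbitrary choice with E(X|Y) = Y²): among several σ(Y)-measurable candidates g(Y), the conditional mean attains the smallest value of E(X − g(Y))².

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500_000
y = rng.normal(0.0, 1.0, size=N)
x = y**2 + rng.normal(0.0, 1.0, size=N)   # so E(X | Y) = Y^2

candidates = {
    "E(X|Y) = Y^2": y**2,                 # the projection of X onto L^2(sigma(Y))
    "Y": y,                               # other sigma(Y)-measurable guesses
    "1 + Y": 1.0 + y,
    "EX (constant)": np.full(N, np.mean(x)),
}
for name, g in candidates.items():
    print(f"{name:>15}:  E(X - g(Y))^2 ~ {np.mean((x - g) ** 2):.4f}")
# The conditional mean gives the smallest mean squared error, as Theorem 3.3 asserts.
```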

Alternative geometric definition of E(X|A)

One can define E(X|A) by using the above projection in L²(Ω, A, P), and then use a truncation idea to generalize to L¹(Ω, A, P).

1. Consider everything in L²(Ω, A, P). We have

E_A Z = E_A X, ∀A ∈ A
⟺ E[(X − Z) I_A] = 0, ∀A ∈ A
⟺ E[(X − Z) Y] = 0, ∀Y ∈ L²(Ω, A, P)
⟺ (X − Z) ⊥ Y, ∀Y ∈ L²(Ω, A, P)
   (Here ⊥ means perpendicular, not independent.)
⟺ Z = arg min_{Y ∈ L²(Ω, A, P)} E(X − Y)².

2. Next consider everything in L¹(Ω, A, P). Define

Z_n = E(X⁺ ∧ n | A) − E(X⁻ ∧ n | A),

and Z = lim_n Z_n.

We no longer have the concept of projection in L¹(Ω, A, P), but it can be shown that

E_A Z = E_A X, ∀A ∈ A.

Although it is tempting to define E(X|A) using this geometric argument, it is simpler to define it by a more abstract definition, as we did before.

3.6 Properties of E(X|A)

Recall that, in order to show that E(W |A) = Z, it suffices to show that
(a) Z ∈ A;

(b) E_A W = E_A Z, for any A ∈ A.
Properties of E(X|A) are very much like their unconditional counterparts. Here are some examples; a small simulation check of several of them is sketched after the list.

1. (Linearity). E(aX + bY |A) = aE(X|A) + bE(Y |A).


Proof. Take Z := aE(X|A) + bE(Y|A) and W = aX + bY.

(a) Clearly Z ∈ A since E(X|A) ∈ A and E(Y|A) ∈ A.

(b) For all A ∈ A, we have

E_A(aX + bY) = aE_A X + bE_A Y = aE_A(E(X|A)) + bE_A(E(Y|A))
             = E_A(aE(X|A) + bE(Y|A)) = E_A Z.

2. (Monotonicity). If X < Y, then E(X|A) ≤ E(Y|A).

[Equivalently, if Y > 0, then E(Y|A) ≥ 0.]
Proof. Since X < Y, we have E_A[E(X|A)] = E_A X ≤ E_A Y = E_A[E(Y|A)]. Take
A = {ω : E(X|A) − E(Y|A) ≥ ε > 0} ∈ A. Then P(A) = 0, ∀ε > 0, because otherwise

E_A[E(X|A)] − E_A[E(Y|A)] = E_A[E(X|A) − E(Y|A)] ≥ E_A ε = εP(A) > 0,

which is a contradiction.

3. (Jensen's inequality). If φ is convex, E|X| < ∞ and E|φ(X)| < ∞, then

φ(E(X|A)) ≤ E(φ(X)|A).

Proof. Take µ_A = E(X|A) ∈ A. Since φ is convex, there exists ρ_{µ_A} such that

φ(x) ≥ φ(µ_A) + ρ_{µ_A}(x − µ_A), ∀x ∈ R. (6.2)

Putting x = X and taking conditional expectations, we get

E(φ(X)|A) ≥ E(φ(µ_A)|A) + E[ρ_{µ_A}(X − µ_A)|A]
          = φ(µ_A) + ρ_{µ_A}(µ_A − µ_A)
          = φ(µ_A) = φ(E(X|A)).

4. (Perfect information). If X ∈ A, then E(X|A) = X. In particular, E(C|A) = C.

That is, if we know X, then our best "guess" of X is itself.
Proof. Take Z = X; clearly,

(a) Z ∈ A by assumption;
(b) for all A ∈ A, we have E_A(Z) = E_A(X).

5. (Partial information: taking out what is known.) If X ∈ A, and E|Y| < ∞ and
E|XY| < ∞, then E(XY|A) = XE(Y|A).
Proof. Let Z = XE(Y|A).

(a) Clearly Z ∈ A since X, E(Y|A) ∈ A.

(b) For all A ∈ A, we need to show that

E_A Z =: E_A[XE(Y|A)] = E_A(XY). (6.3)

Proof of (b). We prove this result in the most typical manner.

(i) First suppose that X = I_B with B ∈ A. Since A ∈ A, we have A ∩ B ∈ A. Therefore, (6.3) holds since

E_A[I_B E(Y|A)] = E_{A∩B}[E(Y|A)] = E_{A∩B}(Y) = E_A(I_B Y).

(ii) We can extend (i) to simple X by linearity.

(iii) We can extend (ii) to nonnegative X and Y by the monotone convergence theorem.

(iv) We can extend (iii) to general r.v.s X, Y by splitting them into positive and negative parts.

6. (No information). If X is independent of A, then E(X|A) = E(X).

i.e., if we don't know anything about X, then our best "guess" of X is EX.
Proof. Take Z = EX; clearly,

(a) Clearly EX ∈ A.
(Any constant C is A-measurable as {C ≤ t} is either ∅ or Ω.)
(b) For all A ∈ A, we have

E_A(Z) = E_A(EX) = (EX) E_A 1 = (EX) E I_A = E(X I_A) = E_A X,

where the second-to-last equality follows from the independence of X and A ∈ A.

7. (Tower property). If A_1 ⊂ A_2 ⊂ F, then

(i) E(X|A_1) = E[E(X|A_1)|A_2];

(ii) E(X|A_1) = E[E(X|A_2)|A_1].

That is, the smaller σ-algebra always wins.

Proof.

(i) Notice that E(X|A_1) ∈ A_1 ⊂ A_2, so (i) follows from Property 4 (perfect information).

(ii) Let Y = E(X|A_2) and Z = E(X|A_1). We need to show that Z = E[Y|A_1].
(a) Clearly Z ∈ A_1.
(b) For all A ∈ A_1, we also have A ∈ A_2, and so E_A Z = E_A X = E_A Y.

8. (Average of local averages is the global average, or double expectation rule.)

EX = E(E(X|A)).
Proof. Note that E_A X = E_A(E(X|A)) whenever A ∈ A. Take A = Ω.
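Here is the simulation check promised before the list (a rough sketch with an assumed toy distribution, not part of the notes). With nested σ-algebras A_1 = σ(Y_1) ⊂ A_2 = σ(Y_2) generated by discrete labels, empirical conditional expectations are just within-atom averages, and properties 3 (Jensen), 7 (tower) and 8 (double expectation) can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000
y2 = rng.integers(0, 6, size=N)          # fine label: A_2 = sigma(Y2)
y1 = y2 // 2                             # coarse label: A_1 = sigma(Y1), a sub-sigma-algebra of A_2
x = np.sin(y2) + rng.normal(0.0, 1.0, size=N)

def cond_mean(values, labels):
    """Empirical E(values | labels): replace each point by the average over its atom."""
    out = np.empty_like(values, dtype=float)
    for v in np.unique(labels):
        out[labels == v] = values[labels == v].mean()
    return out

e_x_a1 = cond_mean(x, y1)                # E(X | A_1)
e_x_a2 = cond_mean(x, y2)                # E(X | A_2)

# Property 8 (double expectation): E[E(X|A_1)] = EX.
print("EX          :", x.mean())
print("E[E(X|A_1)] :", e_x_a1.mean())

# Property 7 (tower): E[E(X|A_2)|A_1] = E(X|A_1).
print("max |E[E(X|A_2)|A_1] - E(X|A_1)| =",
      np.max(np.abs(cond_mean(e_x_a2, y1) - e_x_a1)))

# Property 3 (Jensen with phi(x) = x^2): (E(X|A_1))^2 <= E(X^2|A_1) on every atom.
print("min of E(X^2|A_1) - (E(X|A_1))^2 =",
      np.min(cond_mean(x**2, y1) - e_x_a1**2))
```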

Appendix: An introduction to probability

Sets

Definition 3.4 Ω is a sample space.

• A set is a collection of elements in Ω, denoted by A, B, ...

• A class (or family) is a collection of sets of Ω, denoted by

A, B, C, D, E, F, G, ...

• ω ∈ A : ω is an element of A.

• A ⊂ B : the set A is contained in the set B.

• The empty set ∅ ⊂ A for any set A.

Definition 3.5 Define

Aᶜ = {ω : ω ∉ A} (complement)
A ∪ B = {ω : ω ∈ A or ω ∈ B} (union)
A ∩ B = {ω : ω ∈ A and ω ∈ B} (intersection)
∪_{n=1}^∞ A_n = {ω : ω ∈ A_n for some n}
∩_{n=1}^∞ A_n = {ω : ω ∈ A_n for all n}.

Probability measure and random variables

A probability is a (set) function from a σ-algebra to [0, 1]. It is defined on σ-algebras, since the power set P(Ω) can be too large unless Ω is finite or countable.

Definition: A class F is a σ-algebra if

• Ω ∈ F,

• Aᶜ ∈ F whenever A ∈ F,

• ∪_{n=1}^∞ A_n ∈ F whenever A_n ∈ F, n ≥ 1.

Definition: P is a probability measure on (Ω, F) if

• ∀A ∈ F, 0 ≤ P(A) ≤ 1,

• P(Ω) = 1,

• P(Σ_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n) for disjoint A_n ∈ F (here Σ denotes a disjoint union).

(Ω, F, P) is called the probability space.

Definition 3.6 A random variable X : Ω → R is a measurable function, i.e.,

{ω : X(ω) ≤ r} ∈ F.

Example 3.2 Toss a coin. We can construct a probability space (Ω, F, P):

Ω = {H, T}, where "H = Head" and "T = Tail",
F = {all possible subsets of Ω} = {∅, {H}, {T}, Ω},
P(∅) = 0, P({H}) = 1/2, P({T}) = 1/2, P(Ω) = 1.

Here ω = H or T, and |F| = 2² = 4. Let

X = X(ω) = I{ω = H} = I{the toss is "Head"}.

It can be shown that X is a random variable (r.v.), since

{X = 0} = {ω : X(ω) = 0} = {T} ∈ F,   {X = 1} = {ω : X(ω) = 1} = {H} ∈ F.

Expectation

Definition 3.7 Let X be a r.v. on (Ω, A, P).

• For a simple r.v. X = Σ_{i=1}^n a_i I_{A_i} with Σ_{i=1}^n A_i = Ω, A_i ∈ A, define

EX = Σ_{i=1}^n a_i P(A_i).

• For X ≥ 0, there exist simple nonnegative r.v.s X_n(ω) ↗ X(ω) for every ω. We define

EX = lim_{n→∞} EX_n ≤ ∞

(see the sketch following this definition).

• For a general r.v. X, if either EX⁺ < ∞ or EX⁻ < ∞, then define

EX = EX⁺ − EX⁻,

where X⁺ = max{X, 0} = X I_{X ≥ 0}, X⁻ = max{−X, 0} = −X I_{X ≤ 0}.
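Here is the sketch referred to above (an illustration with an assumed distribution, not from the notes): for X ∼ Exponential(1), so that the exact answer EX = 1 is known, the truncated dyadic approximations X_n = min(⌊2ⁿX⌋/2ⁿ, n) are simple r.v.s, and EX_n = Σ_k a_k P(A_k) increases to EX as n grows.

```python
import numpy as np

cdf = lambda t: 1.0 - np.exp(-t)        # CDF of the Exponential(1) distribution, so EX = 1

for n in range(1, 7):
    # X_n takes the value k/2^n on the k-th dyadic interval of length 2^-n (k < n*2^n),
    # and the value n on {X > n}; boundaries do not matter since X is continuous.
    k = np.arange(0, n * 2**n)
    probs = cdf((k + 1) / 2**n) - cdf(k / 2**n)      # P(X in the k-th dyadic interval)
    e_xn = np.sum((k / 2**n) * probs) + n * (1.0 - cdf(n))
    print(f"n = {n}:  E(X_n) = {e_xn:.6f}")
# E(X_n) increases with n and converges to EX = 1.
```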

Properties of expectations

Assume that X, Y, X_1, ..., X_n below are all r.v.s on (Ω, A, P).

• (Absolute integrability). EX is finite if and only if E|X| is finite.

• (Linearity). E(aX + bY) = aEX + bEY.

• (σ-additivity). If A = Σ_{i=1}^∞ A_i, then E_A X = Σ_{i=1}^∞ E_{A_i} X.

• (Positivity). If X ≥ 0 a.s., then EX ≥ 0.

• (Monotonicity). If X_1 ≤ X ≤ X_2 a.s., then EX_1 ≤ EX ≤ EX_2.

• (Mean value theorem). If a ≤ X ≤ b a.s. on A ∈ A, then aP(A) ≤ E_A X ≤ bP(A).

• (Modulus inequality). |EX| ≤ E|X|.

• (Fatou's Lemma). If X_n ≥ 0 a.s., then E(lim inf_n X_n) ≤ lim inf_n EX_n.

• (Monotone Convergence Theorem). If 0 ≤ X_n ↗ X, then lim_n EX_n = EX = E lim_n X_n.

• (Dominated Convergence Theorem). If X_n → X a.s., |X_n| < Y a.s. for all n, and EY < ∞, then lim_n EX_n = EX = E lim_n X_n.

• (Integration term by term). If Σ_{n=1}^∞ E|X_n| < ∞, then Σ_{n=1}^∞ |X_n| < ∞ a.s., so that Σ_{n=1}^∞ X_n converges a.s., and E(Σ_{n=1}^∞ X_n) = Σ_{n=1}^∞ EX_n.

Convergence Concepts

Definition 3.8 Let X, X_1, X_2, ... be r.v.s on (Ω, A, P). We say that

1. X_n → X a.s. if P(lim_{n→∞} X_n = X) = P({ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1.

2. X_n → X in rth mean (r > 0) if lim_{n→∞} E|X_n − X|^r = 0.

3. X_n → X in prob. if lim_{n→∞} P(|X_n − X| > ε) = 0, for all ε > 0.

4. X_n → X in distribution if lim_{n→∞} F_{X_n}(x) = F_X(x) for all continuity points of F_X(x).

Difference between a.s. convergence and convergence in probability

Conceptually, the mode of almost sure convergence keeps track of the values of the random variables at the same sample point as n increases. It requires the convergence of X_n(ω) at almost all sample points. The mode of convergence in probability requires that the event on which X_n and X differ by more than a fixed amount shrinks in probability. This event can be different when n = 10 and when n = 11, and so on. Because of this, convergence in probability does not imply convergence of X_n(ω) for any ω ∈ Ω. The following classical example is a vivid way to illustrate this point.

Example. Let Ω = [0, 1]. Let F be the classical Borel σ-algebra on [0, 1] and P be the uniform probability measure.

For m = 0, 1, 2, ... and k = 0, 1, ..., 2^m − 1, let

X_{2^m + k}(ω) = I{k 2^{−m} < ω ≤ (k + 1) 2^{−m}}.

In plain words, we have defined a sequence of random variables made of indicator functions on intervals of shrinking length 2^{−m}. Yet the union of every 2^m such intervals completely covers the sample space [0, 1] as m increases.

It is seen that

P(|X_n| > 0) ≤ 2^{−m},

where m = ⌊log n / log 2⌋ ≥ log n / log 2 − 1. Hence as n → ∞, P(|X_n| > 0) → 0. This implies X_n → 0 in probability.

At the same time, the sequence

X_1(ω), X_2(ω), X_3(ω), ...

contains infinitely many 0's and 1's for any ω. Thus none of these sequences converges. In other words,

P({ω : X_n(ω) converges}) = 0.

Hence, X_n does not converge to 0 in the mode of "almost surely".
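A short numerical illustration of this example (not part of the notes): for one fixed sample point ω, the indices n with X_n(ω) = 1 keep recurring (exactly one per block 2^m ≤ n < 2^{m+1}), so X_n(ω) does not converge, even though P(|X_n| > 0) = 2^{−m} → 0.

```python
import numpy as np

def X(n, w):
    """The 'typewriter' variable X_n(w) = I{k 2^-m < w <= (k+1) 2^-m}, n = 2^m + k."""
    m = int(np.floor(np.log2(n)))        # n = 2^m + k with 0 <= k < 2^m
    k = n - 2**m
    return 1.0 if k * 2.0**(-m) < w <= (k + 1) * 2.0**(-m) else 0.0

w = 0.37                                  # one fixed sample point
ones = [n for n in range(1, 5000) if X(n, w) == 1.0]
print("indices n <= 5000 with X_n(w) = 1:", ones)     # never stops hitting 1
print("but P(|X_n| > 0) at n = 4096 is 2^-12 =", 2.0**-12)
```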

Relationships between modes of convergence

Theorem 3.4

1. X_n → X a.s. or in L^r (r ≥ 1) ⟹ X_n →_p X ⟹ X_n →_d X.
2. If r > s > 0, then X_n → X in L^r ⟹ X_n → X in L^s.
3. No other implications hold in general.

We also have some partial converses to the above results, i.e., converse results with some
additional assumptions. Here are a few examples.

Theorem 3.5 X_n →_d C ⟺ X_n →_p C, where C is a constant.

Theorem 3.6 (Lebesgue Dominated Convergence Theorem) If X_n →_p X, |X_n| ≤ Y a.s. for all n, and EY^r < ∞ for r > 0, then X_n → X in L^r, which in turn implies that EX_n^r → EX^r.

Theorem 3.7 (Dominated convergence: a.s. convergence implies convergence in mean) If X_n → X a.s., P(|X_n| ≤ Y) = 1 for all n, and EY^r < ∞ for r > 0, then X_n → X in L^r.

Theorem 3.8 (Convergence in probability sufficiently fast implies a.s. convergence) If Σ_{n=1}^∞ P(|X_n − X| > ε) < ∞ for all ε > 0, then X_n → X a.s.

Theorem 3.9 (Sequences converging in probability contain a.s. convergent subsequences) If X_n →_p X, then there exist non-random integers n_1 < n_2 < ... such that X_{n_i} → X a.s.

Theorem 3.10 (Skorokhod's representation theorem) Suppose that X_n →_d X. Then there exist r.v.s Y and {Y_n, n ≥ 1} on ((0, 1), B_{(0,1)}, P = λ_{(0,1)}, the Lebesgue (uniform) measure) s.t.

(1) Y_n and Y have the same d.f.s as X_n and X. That is, X_n =_d Y_n, X =_d Y.

(2) Y_n → Y a.s. as n → ∞.

Definition 3.9 A sequence of r.v.s {Y_n} is uniformly integrable (u.i.) if

lim_{C→∞} sup_n E{|Y_n| I_{|Y_n| > C}} = 0.

Theorem 3.11 (Vitali's Theorem) Suppose that X_n →_p X, and E|X_n|^r < ∞ for all n (i.e., X_n ∈ L^r). Then the following three statements are equivalent.

(i) {|X_n|^r} is u.i.;

(ii) X_n → X in L^r, and E|X|^r < ∞;

(iii) E|X_n|^r → E|X|^r < ∞.

Some more useful theorems are given below.

Theorem 3.12 (Continuous mapping theorem) Let X_1, X_2, ... and X be k-dim random vectors, and g : R^k → R be continuous. Then

(a). X_n → X a.s. ⟹ g(X_n) → g(X) a.s.

(b). X_n →_p X ⟹ g(X_n) →_p g(X).

(c). X_n →_d X ⟹ g(X_n) →_d g(X).

Theorem 3.13 (Slutsky's Theorem) Let X_n →_d X, Y_n →_p C (i.e., Y_n →_d C). Then

(a). X_n + Y_n →_d X + C.

(b). X_n Y_n →_d CX.

(c). X_n / Y_n →_d X/C if C ≠ 0.

Skorokhod’s
representation
theorem
a.s.
-
Xn X
6AA
AA
AA fast enough or
AA subsequence
AA
AA
AA Len on
AA
AA
AA
AA
AA
AA P d
u.i. fast X - X > Xn - X
n
enough

X=C

u.i. - necessary and sufficient


DCT - sometimes too strong

? r
L-
L.TT
Xn X<

u.i.

26
Radon-Nikodym theorem

Definition 3.10 Given two measures µ and ν on (Ω, F), we say that ν is absolutely continuous w.r.t. µ, written as ν ≪ µ (i.e., ν is dominated by µ), if

µ(A) = 0 implies ν(A) = 0.

Theorem 3.14 (Radon-Nikodym Theorem) If ν ≪ µ and µ is σ-finite, there exists a unique F-measurable function f ≥ 0 so that

ν(A) = ∫_A f dµ, ∀A ∈ F.

(f is called the "Radon-Nikodym derivative", often written as dν/dµ.)

Remark 3.1 Clearly, ν ≪ µ and µ is σ-finite ⟺ ∃ f ≥ 0 : ν(A) = ∫_A f dµ.

σ-algebra generated by r.v.s

Definition. Given (Ω, F, P). A is a subset of F. Let X be a r.v.

1. σ(A) is the smallest σ-algebra containing A.

2. σ(X) := σ(∪_{B∈B} {X ∈ B}) = σ(X⁻¹(B)), where B is the Borel σ-algebra.

3. σ(X_1, ..., X_n) := σ(σ(X_1) ∪ ... ∪ σ(X_n)). It is easy to see

σ(X_1) ⊂ σ(X_1, X_2) ⊂ ...... ⊂ σ(X_1, ..., X_n).

Example 3.3 Toss a coin twice. Then we can construct a probability space (Ω, F, P), where

Ω = {HH, HT, TH, TT},
F = {all possible subsets of Ω}
  = {∅, {HH}, {HT}, {TH}, {TT}, {HH, HT}, ...., Ω},   |F| = 2⁴ = 16,
P(HH) = P(HT) = P(TH) = P(TT) = 1/4.

Let X = I{first toss is Head}, Y = I{second toss is Head}. Now

{X = 0} = {ω : X(ω) = 0} = {TH, TT} ∈ F,

{X = 1} = {ω : X(ω) = 1} = {HH, HT} ∈ F.

Now

σ(X) = σ(X = 0, X = 1) = σ({X = 1}ᶜ, {X = 1}) = σ({X = 1})
     = σ({HH, HT}) = {∅, Ω, {HH, HT}, {TH, TT}},
σ(Y) = σ(Y = 0, Y = 1) = σ({Y = 1}ᶜ, {Y = 1}) = σ({Y = 1})
     = σ({HH, TH}) = {∅, Ω, {HH, TH}, {HT, TT}}.

Similarly, it can also be shown that

σ(X, Y) = σ({X = 1}, {Y = 1}) = .... = F.

σ(X) and σ(Y) contain information about possible outcomes for the first and second tosses, respectively. Knowing both σ(X) and σ(Y) (i.e., information about both tosses), given by σ(X, Y), we get F.

More on σ-fields generated by a discrete r.v.

Theorem 3.15 Let X be a discrete r.v. taking distinct values {x_i, 1 ≤ i ≤ n} (where n could be ∞) and let A_i = {ω : X(ω) = x_i}. Then,

• {A_i, i ≥ 1} constitutes a disjoint partition of Ω.

• Choose C = {A_1, A_2, ..., A_n}; then

σ(C) = σ(A_1, A_2, ..., A_n) = σ(A_0, A_1, ..., A_n) = {∪_{i∈I} A_i : I ⊂ {0, 1, ..., n}},

where A_0 = ∅. (Hint: show that the RHS forms a σ-algebra.)
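A small sketch of Theorem 3.15 on a toy partition (my own example, not from the notes): σ(C) is produced by listing all unions of atoms, and the hint can be checked directly by verifying closure under complements and unions.

```python
from itertools import combinations

# Partition C = {A_1, A_2, A_3} of Omega = {1, ..., 6}; atoms as frozensets.
atoms = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
omega = frozenset().union(*atoms)

# sigma(C) = { union of A_i over i in I : I subset of {0, 1, ..., n} }, with A_0 = empty set.
sigma_C = set()
for r in range(len(atoms) + 1):
    for subset in combinations(atoms, r):
        sigma_C.add(frozenset().union(*subset))

print("|sigma(C)| =", len(sigma_C), "= 2^3")
# Check it really is a sigma-algebra: contains Omega and the empty set,
# and is closed under complements and (finite) unions.
assert omega in sigma_C and frozenset() in sigma_C
assert all(omega - A in sigma_C for A in sigma_C)
assert all(A | B in sigma_C for A in sigma_C for B in sigma_C)
print("closed under complements and unions: True")
```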

3.7 Exercises
1. Let Var(X|A) = E(X²|A) − (E(X|A))². Show that

Var(X) = E(Var(X|A)) + Var(E(X|A)).

Remark 3.2 From the exercise, we get Var(X) ≥ Var(E(X|A)). That is, smoothing by local averaging (i.e., E(X|A)) reduces variance.

2. Show that if X and Y are r.v.s with E(Y|A) = X and EX² = EY² < ∞, then X = Y a.s. (i.e., P(X = Y) = 1). [Hint: Work out E(X − Y)².]
