Motivation: Estimating Mean
Example: $X_1, X_2, \dots, X_n$ i.i.d. $N(0, 1)$.
▶ Generate 6 sets of $X_1, X_2, \dots, X_n$ (sample paths).
▶ Note that every sample path appears to converge to the mean 0 as $n$ increases.
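The behavior described above can be reproduced with a short simulation (a sketch; the sample size, seed, and variable names are my own choices), tracking the running mean of each sample path:

```python
import random

random.seed(0)
n = 10_000

for path in range(6):
    total = 0.0
    means = []
    for k in range(1, n + 1):
        total += random.gauss(0.0, 1.0)  # X_k ~ N(0, 1)
        means.append(total / k)          # running mean after k samples
    print(f"path {path}: mean at n=100 is {means[99]:+.3f}, at n=10000 is {means[-1]:+.3f}")
```

Every path's running mean drifts toward 0 as $n$ grows, as the figure on the slide shows.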
Almost Sure Convergence

Definition. $X_n \xrightarrow{a.s.} X$ if $P(\{\omega : X_n(\omega) \to X(\omega)\}) = 1$. This means that the set of $\omega$ such that the sample path or sequence $X_1(\omega), X_2(\omega), \dots$ converges to $X(\omega)$ has probability 1.
Lemma. $P(\max_{i \ge n} |X_i - X| > \epsilon) \to 0$ for any $\epsilon > 0$ iff $X_n \xrightarrow{a.s.} X$.

We prove the forward direction. We will see that the converse holds when discussing convergence in probability.

Let $M_n = \max_{i \ge n} |X_i - X|$, which is a decreasing sequence bounded below by 0. Therefore $M_n \downarrow M$ for some $M$.

Then, for all $\epsilon > 0$, $P(M > \epsilon) \le P(M_n > \epsilon) \to 0$ as $n \to \infty$.

Therefore, from continuity of $P$, $P(M = 0) = 1$ and $M_n \to 0$ a.s. This is equivalent to saying that $X_n \to X$ a.s.
Example †: $X_1, X_2, \dots$ i.i.d. $\sim \mathrm{Bern}(1/2)$. Let $Y_n = 2^n \prod_{i=1}^n X_i$. Show that the sequence $Y_n$ converges to 0 a.s.

For $0 < \epsilon < 1$, we have
$$P\left(\max_{i \ge n} |Y_i - 0| > \epsilon\right) \le P(X_i = 1 \ \forall\, i \le n) = \frac{1}{2^n} \to 0$$
as $n \to \infty$.
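A quick simulation of Example † (a sketch; the trial counts and seed are my own choices). Once a single $X_i = 0$ occurs, the product, and hence every later $Y_n$, is 0 forever:

```python
import random

# Example †: Y_n = 2^n * prod(X_1..X_n) with X_i ~ Bern(1/2).
# After the first X_i = 0, the product is 0, so Y_n = 0 for all later n.
random.seed(1)
for trial in range(5):
    prod = 1
    hit_zero = None
    for n in range(1, 50):
        prod *= random.randint(0, 1)  # X_n ~ Bern(1/2)
        if prod == 0:
            hit_zero = n
            break
    print(f"trial {trial}: Y_n = 0 from n = {hit_zero} onward")
```

Each sample path hits 0 after a few steps (the chance of surviving $n$ steps is $2^{-n}$) and stays there, which is exactly a.s. convergence to 0.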
An important example of a.s. convergence is the Strong Law of Large Numbers (SLLN): If $X_1, X_2, \dots$ are i.i.d. (in fact, pairwise independence suffices) with finite mean $\mu$, then
$$S_n = \frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{a.s.} \mu$$
as $n \to \infty$. The proof of the SLLN is beyond the scope of this course. See [Github: https://github.com/wptay/aipt].
Convergence in Mean Square

Definition. $X_n \to X$ in mean square if $E[(X_n - X)^2] \to 0$ as $n \to \infty$.

Let $X_1, X_2, \dots$ be i.i.d. with finite mean $E[X]$ and variance $\mathrm{var}(X)$. Then $S_n \to E[X]$ in mean square.
" n
!2 #
2
1 X
E (Sn − E[X]) =E (Xi − E[X])
n
i=1
" #
1 X
= 2E (Xi − E[X])(Xj − E[X])
n
i,j
n
1 X
= E(Xi − E[X])2 ∵ Xi s are independent
n2
i=1
1
= var(X) → 0
n
as n → ∞.
Note that the proof works even if the $X_i$s are only pairwise independent, or even only uncorrelated.
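The rate $\mathrm{var}(X)/n$ can be checked numerically (a sketch; the Uniform(0,1) distribution, trial counts, and seed are my own choices):

```python
import random

# Monte Carlo check: for i.i.d. Uniform(0, 1) samples, E[X] = 1/2, var(X) = 1/12,
# so E[(S_n - E[X])^2] should be close to (1/12)/n.
random.seed(0)
n, trials = 100, 20_000
mu, var = 0.5, 1.0 / 12.0

mse = 0.0
for _ in range(trials):
    s_n = sum(random.random() for _ in range(n)) / n  # sample mean S_n
    mse += (s_n - mu) ** 2
mse /= trials

print(f"empirical E[(S_n - mu)^2] = {mse:.6f}, var(X)/n = {var / n:.6f}")
```

The empirical mean-square error matches $\mathrm{var}(X)/n$ closely, illustrating the $1/n$ decay derived above.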
Convergence in mean square does not imply a.s. convergence. Example ♣: $X_1, X_2, \dots$ independent with $X_n = 1$ w.p. $\frac{1}{n}$ and $X_n = 0$ otherwise. Then $E[X_n^2] = \frac{1}{n} \to 0$, so $X_n \to 0$ in mean square. However, for $0 < \epsilon < 1$,
$$P\left(\max_{i \ge n} |X_i| < \epsilon\right) = \lim_{m \to \infty} P\left(\max_{n \le i \le m} |X_i| < \epsilon\right) = \lim_{m \to \infty} \prod_{i=n}^{m} \left(1 - \frac{1}{i}\right)$$
$$= \lim_{m \to \infty} \exp\left(\sum_{i=n}^{m} \log\left(1 - \frac{1}{i}\right)\right) \le \lim_{m \to \infty} \exp\left(-\sum_{i=n}^{m} \frac{1}{i}\right) \le \lim_{m \to \infty} \frac{n}{m} = 0,$$
so $P(\max_{i \ge n} |X_i| > \epsilon) = 1$ for every $n$, and by the earlier lemma $X_n$ does not converge to 0 a.s.
Convergence in Probability

Definition. $X_n \xrightarrow{p} X$ if $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$ for every $\epsilon > 0$.

If $X_n \to X$ a.s., then $X_n \xrightarrow{p} X$.

For the proof, see [Github: https://github.com/wptay/aipt].

If $X_n \to X$ a.s., then $1_{\{|X_n - X| > \epsilon\}} \to 0$ a.s. The Dominated Convergence Theorem then tells us that $P(|X_n - X| > \epsilon) = E[1_{\{|X_n - X| > \epsilon\}}] \to 0$.
Example ♣: $X_1, X_2, \dots$ independent with $X_n = 1$ w.p. $\frac{1}{n}$ and $X_n = 0$ otherwise. For $0 < \epsilon < 1$,
$$P(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0,$$
so the sequence converges to 0 in probability. But we saw before that it does not converge a.s.
If $X_n \to X$ in mean square, then $X_n \xrightarrow{p} X$.

From the Markov inequality,
$$P(|X_n - X| > \epsilon) \le \frac{1}{\epsilon^2} E\left[(X_n - X)^2\right] \to 0.$$
The converse is not true. Example:
$$X_n = \begin{cases} 0 & \text{w.p. } 1 - \frac{1}{n}, \\ n & \text{w.p. } \frac{1}{n}. \end{cases}$$
Convergence in probability:
$$P(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0.$$
But
$$E\left[(X_n - 0)^2\right] = n^2 \cdot \frac{1}{n} = n \to \infty.$$
Convergence in probability is weaker than both convergence a.s. and in mean square.
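The counterexample can be tabulated directly (a small sketch; the values of $n$ are my own choices): the exceedance probability $1/n$ shrinks while the second moment $n$ grows without bound.

```python
# Exact check of the counterexample: X_n = n w.p. 1/n, else 0.
# For any 0 < eps < n, P(|X_n - 0| > eps) = 1/n, while E[X_n^2] = n^2 * (1/n) = n.
for n in (10, 100, 1000):
    p_far = 1.0 / n
    second_moment = n**2 * (1.0 / n)
    print(f"n={n:5d}  P(|X_n|>eps)={p_far:.4f}  E[X_n^2]={second_moment:.0f}")
```

So the sequence converges in probability but not in mean square, confirming the strict separation between the two modes.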
Weak Law of Large Numbers

Suppose $X_1, X_2, \dots$ are such that $E[X_i] = 0$, $E[X_i^2] = \sigma^2 < \infty$, and $E[X_i X_j] \le 0$ for $i \ne j$. Then
$$\frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{p} 0 \quad \text{as } n \to \infty.$$

Proof: We have
$$E\left[\left(\frac{1}{n}\sum_{i=1}^n X_i\right)^2\right] = \frac{1}{n^2}\, E\left[\sum_i X_i^2 + 2\sum_{i<j} X_i X_j\right] \le \frac{1}{n^2} \sum_i E[X_i^2] = \frac{\sigma^2}{n}.$$
By the Markov inequality,
$$P\left(\left|\frac{1}{n}\sum_{i=1}^n X_i\right| > \epsilon\right) \le \frac{1}{\epsilon^2} \cdot \frac{\sigma^2}{n} \to 0,$$
as $n \to \infty$.
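The $\sigma^2/(\epsilon^2 n)$ bound can be compared against empirical frequencies (a sketch; the Uniform$(-1,1)$ distribution, so $\mu = 0$, $\sigma^2 = 1/3$, and the trial counts are my own choices):

```python
import random

# Compare P(|S_n / n| > eps) with the Chebyshev-type bound sigma^2 / (eps^2 n)
# for i.i.d. X_i ~ Uniform(-1, 1), which have mean 0 and variance 1/3.
random.seed(0)
eps, sigma2, trials = 0.1, 1.0 / 3.0, 2000

for n in (10, 100, 1000):
    exceed = sum(
        abs(sum(random.uniform(-1, 1) for _ in range(n)) / n) > eps
        for _ in range(trials)
    )
    print(f"n={n:4d}  empirical={exceed / trials:.3f}  bound={sigma2 / (eps**2 * n):.3f}")
```

The empirical exceedance probability sits below the bound and both decay toward 0, as the WLLN predicts.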
Convergence in Distribution

We say that $X_n \xrightarrow{d} X$ if $E[f(X_n)] \to E[f(X)]$ for all continuous bounded functions $f$.

Equivalently, $X_n \xrightarrow{d} X \iff F_{X_n}(t) \to F_X(t)$ for all continuity points $t$ of $F_X(\cdot)$. [Github: https://github.com/wptay/aipt]

Central Limit Theorem (CLT): If $X_1, X_2, \dots$ are i.i.d. with mean $\mu$ and variance $\sigma^2 < \infty$, then
$$Z_n = \frac{S_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1).$$
CLT Example
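The example figure is not reproduced here; as a stand-in, a simulation (my own choice of $X_i \sim \mathrm{Exp}(1)$, so $\mu = \sigma = 1$) compares the cdf of $Z_n$ with the standard normal cdf at a few points:

```python
import math
import random

# CLT sketch: X_i ~ Exp(1) has mu = sigma = 1, so Z_n = (S_n - mu) / (sigma / sqrt(n))
# should be approximately N(0, 1) for large n.
random.seed(0)
n, trials = 200, 5000

def phi(z):  # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

zs = []
for _ in range(trials):
    s_n = sum(random.expovariate(1.0) for _ in range(n)) / n
    zs.append((s_n - 1.0) / (1.0 / math.sqrt(n)))

for z in (-1.0, 0.0, 1.0):
    emp = sum(v <= z for v in zs) / trials
    print(f"P(Z_n <= {z:+.0f}): empirical {emp:.3f} vs normal {phi(z):.3f}")
```

Even though each $X_i$ is heavily skewed, the empirical cdf of $Z_n$ is already close to the Gaussian cdf at $n = 200$.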
Characteristic Functions

$$\varphi(t) = E\left[e^{itX}\right] = E[\cos(tX)] + i\,E[\sin(tX)],$$
where $i = \sqrt{-1}$. This is the Fourier transform of the pdf of $X$ (when $X$ has a pdf).

Example: The characteristic function of the Gaussian distribution $N(\mu, \sigma^2)$ is
$$\varphi(t) = e^{i\mu t - \frac{\sigma^2 t^2}{2}}.$$
In particular, when $\mu = 0$, $\sigma^2 = 1$,
$$\varphi(t) = e^{-\frac{t^2}{2}}.$$
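The standard normal formula can be verified by Monte Carlo (a sketch; the sample size and seed are my own choices), estimating $E[\cos(tX)]$ and $E[\sin(tX)]$ directly:

```python
import math
import random

# Estimate phi(t) = E[cos(tX)] + i E[sin(tX)] for X ~ N(0, 1) and compare
# with the closed form exp(-t^2 / 2); the imaginary part should vanish.
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

for t in (0.5, 1.0, 2.0):
    re = sum(math.cos(t * x) for x in samples) / len(samples)
    im = sum(math.sin(t * x) for x in samples) / len(samples)
    exact = math.exp(-t * t / 2)
    print(f"t={t}: estimate {re:.4f}{im:+.4f}i, exact {exact:.4f}")
```

The real part matches $e^{-t^2/2}$ and the imaginary part is near 0, as expected for a symmetric distribution.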
Fourier Inversion

Suppose that $\varphi(t)$ is the characteristic function of a r.v. $X$. If $\int |\varphi(t)|\, dt < \infty$, then $X$ has pdf
$$f_X(x) = \frac{1}{2\pi} \int \varphi(t)\, e^{-itx}\, dt.$$

If $X_n \xrightarrow{d} X$, then $\varphi_n(t) \to \varphi(t)$ for all $t \in \mathbb{R}$.

This is obvious because $\varphi_n(t) = E[e^{itX_n}] = E[\cos(tX_n)] + i\,E[\sin(tX_n)]$, $\cos(tx)$ and $\sin(tx)$ are bounded continuous functions, and $X_n \xrightarrow{d} X$.
[Example figure omitted.] The characteristic functions $\varphi_n(t)$ do not converge when $n \to \infty$, which implies that this sequence does not converge in distribution.
Levy’s Continuity Theorem

Suppose $X_n$ has characteristic function $\varphi_n(t)$ with $\varphi_n(t) \to \varphi(t)$ for all $t$, and $\varphi(t)$ is continuous at $t = 0$. Then there exists a r.v. $X$ s.t. $\varphi(t)$ is the characteristic function of $X$ and $X_n \xrightarrow{d} X$.

For the proof, see [Github: https://github.com/wptay/aipt].
Proof of the CLT for an i.i.d. sequence. Without loss of generality, we assume that $\mu = 0$, $\sigma = 1$. We have
$$Z_n = \frac{1}{\sqrt{n}} \sum_{j=1}^n X_j.$$
Let
$$\varphi_n(t) = E\left[\exp\left(it\,\frac{\sum_{j=1}^n X_j}{\sqrt{n}}\right)\right] = \prod_{j=1}^n E\left[\exp\left(it\,\frac{X_j}{\sqrt{n}}\right)\right] = \left(E\left[e^{it X_1/\sqrt{n}}\right]\right)^n = \varphi_1\left(\frac{t}{\sqrt{n}}\right)^n.$$
CLT Proof

From the Taylor series expansion, we have
$$\varphi_1(t) = \varphi_1(0) + \varphi_1'(0)\, t + \frac{\varphi_1''(0)}{2!}\, t^2 + o(t^2).$$
Since $\varphi_1(0) = 1$, $\varphi_1'(0) = E[iX_1 e^{i0X_1}] = i\,E[X_1] = 0$, and $\varphi_1''(0) = E[(iX_1)^2] = -1$, we obtain
$$\varphi_1(t) = 1 - \frac{t^2}{2} + o(t^2).$$
Therefore,
$$\varphi_n(t) = \varphi_1\left(\frac{t}{\sqrt{n}}\right)^n = \left(1 - \frac{t^2}{2n} + o(t^2/n)\right)^n \xrightarrow{n \to \infty} e^{-\frac{t^2}{2}},$$
the characteristic function of $N(0, 1)$ (certainly continuous at $t = 0$). From Levy’s Continuity Theorem, we then have
$$Z_n \xrightarrow{d} N(0, 1).$$
CLT for Discrete R.V.

Example: $X_1, X_2, \dots$ are i.i.d. $\sim \mathrm{Bern}(1/2)$. Note that
$$Z_n = \frac{\sum_{i=1}^n X_i - n/2}{\sqrt{n}/2}$$
is discrete and has no pdf. But its cdf converges to the Gaussian cdf.
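This cdf convergence can be seen numerically (a sketch; the values of $n$, the trial count, and the seed are my own choices):

```python
import math
import random

# For X_i ~ Bern(1/2), Z_n = (sum X_i - n/2) / (sqrt(n)/2) takes discrete values,
# yet its cdf approaches the standard normal cdf as n grows.
random.seed(0)
n, trials = 400, 5000

def phi(z):  # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

zs = []
for _ in range(trials):
    total = sum(random.randint(0, 1) for _ in range(n))  # Binomial(n, 1/2)
    zs.append((total - n / 2) / (math.sqrt(n) / 2))

for z in (-1.0, 0.0, 1.0):
    emp = sum(v <= z for v in zs) / trials
    print(f"P(Z_n <= {z:+.0f}): empirical {emp:.3f} vs normal {phi(z):.3f}")
```

Even though $Z_n$ lives on a lattice and has no pdf, its cdf values line up with the Gaussian cdf, which is exactly the sense in which the CLT applies to discrete random variables.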
CLT for Random Vectors

Summary