
Lecture 7: Convergence and Limit Theorems

▶ Almost sure convergence
▶ Convergence in mean square
▶ Convergence in probability
▶ Convergence in distribution

1
Motivation: Estimating Mean

Want to estimate the mean E[X] of a distribution.


Generate X1, X2, ..., Xn i.i.d. drawn from the distribution and compute the sample mean
\[
S_n = \frac{1}{n}\sum_{i=1}^{n} X_i .
\]

Does Sn converge to E[X]? In what sense? Note that Sn is a random variable.

2
Motivation: Estimating Mean
Example: X1 , X2 , . . . , Xn i.i.d. N (0, 1)
▶ Generate 6 sets of X1 , X2 , . . . , Xn (sample paths).
▶ Note that every sample path appears to be converging to the mean 0 as n increases.
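The sample paths on this slide can be reproduced numerically. A minimal sketch in pure Python (the function name and the horizon 10,000 are my choices, not from the lecture):

```python
import random

random.seed(0)

def sample_mean_path(n, mu=0.0, sigma=1.0):
    """Running sample means S_1, ..., S_n for one i.i.d. N(mu, sigma^2) sequence."""
    total, path = 0.0, []
    for i in range(1, n + 1):
        total += random.gauss(mu, sigma)
        path.append(total / i)
    return path

# Six sample paths, as on the slide; each appears to converge to the mean 0.
paths = [sample_mean_path(10_000) for _ in range(6)]
finals = [p[-1] for p in paths]
print(finals)
```

Plotting each path against n reproduces the picture on the slide: all six curves settle toward 0.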

3
Almost Sure Convergence

Recall that a sequence of numbers x1, x2, ... converges to x if given any ϵ > 0, there exists an n(ϵ) such that for all n ≥ n(ϵ), |xn − x| < ϵ. Equivalently, $\max_{i\ge n} |x_i - x| \to 0$ as n → ∞.

Suppose X1, X2, ... is a sequence of random variables defined on the same probability space (Ω, F, P). For every ω ∈ Ω, we get a sample path X1(ω), X2(ω), ..., which is a sequence of numbers.

We say Xn → X almost surely (a.s.) or with probability 1 (w.p. 1) if


\[
\mathbb{P}\bigl(\{\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\}\bigr) = 1 .
\]

This means that the set of ω such that the sample path or sequence X1 (ω), X2 (ω), . . .
converges to X(ω) has probability 1.

4
Almost Sure Convergence

 
Lemma. $\mathbb{P}\bigl(\max_{i\ge n} |X_i - X| > \epsilon\bigr) \to 0$ for any $\epsilon > 0$ iff $X_n \xrightarrow{a.s.} X$.

We prove the forward direction. We will see that the converse holds when discussing
convergence in probability.
Let $M_n = \max_{i\ge n} |X_i - X|$, which is a decreasing sequence bounded below by 0. Therefore $M_n \downarrow M$ for some $M \ge 0$.
Then, for all ϵ > 0, P(M > ϵ) ≤ P(Mn > ϵ) → 0 as n → ∞.
Therefore, from continuity of P, P(M = 0) = 1 and Mn → 0 a.s. This is equivalent to saying that Xn → X a.s.

5
Example †: X1, X2, ... i.i.d. ∼ Bern(1/2). Let $Y_n = 2^n \prod_{i=1}^{n} X_i$. Show that the sequence Yn converges to 0 a.s.
For 0 < ϵ < 1, we have
\[
\mathbb{P}\Bigl(\max_{i\ge n} |Y_i - 0| > \epsilon\Bigr) \le \mathbb{P}(X_i = 1 \ \forall\, i \le n) = \frac{1}{2^n} \to 0
\]
as n → ∞.
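A quick simulation of Example † (a sketch; the helper name and path counts are mine):

```python
import random

random.seed(1)

def y_path(n):
    """One sample path Y_1, ..., Y_n with Y_k = 2^k * prod_{i<=k} X_i, X_i ~ Bern(1/2)."""
    prod, path = 1, []
    for k in range(1, n + 1):
        prod *= random.getrandbits(1)  # one Bern(1/2) draw
        path.append((2 ** k) * prod)
    return path

# Once some X_i = 0, the path is stuck at 0 forever; the all-ones event has
# probability 2^{-n}, matching the bound on the slide.
paths = [y_path(50) for _ in range(1000)]
frac_zero = sum(p[-1] == 0 for p in paths) / len(paths)
print(frac_zero)
```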
An important example of a.s. convergence is the Strong Law of Large Numbers (SLLN): if X1, X2, ... are i.i.d. (in fact pairwise independence suffices) with finite mean µ, then
\[
S_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{a.s.} \mu
\]
as n → ∞. The proof of the SLLN is beyond the scope of this course. See [Github: https://github.com/wptay/aipt].

6
Convergence in Mean Square

We say Xn → X in mean square or L2 if
\[
\lim_{n\to\infty} \mathbb{E}\bigl[(X_n - X)^2\bigr] = 0 .
\]

Let X1, X2, ... be i.i.d. with finite mean E[X] and finite variance var(X). Then Sn → E[X] in mean square.
\[
\begin{aligned}
\mathbb{E}\bigl[(S_n - \mathbb{E}[X])^2\bigr]
&= \mathbb{E}\Biggl[\Bigl(\frac{1}{n}\sum_{i=1}^{n} (X_i - \mathbb{E}[X])\Bigr)^{2}\Biggr] \\
&= \frac{1}{n^2}\,\mathbb{E}\Biggl[\sum_{i,j} (X_i - \mathbb{E}[X])(X_j - \mathbb{E}[X])\Biggr] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \mathbb{E}\bigl[(X_i - \mathbb{E}[X])^2\bigr] \quad \because X_i\text{s are independent} \\
&= \frac{1}{n}\operatorname{var}(X) \to 0
\end{aligned}
\]
as n → ∞.
Note that the proof works even if the Xi s are only pairwise independent, or even just uncorrelated.
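The rate var(X)/n can be checked by Monte Carlo. A sketch using X ~ Uniform(0, 1) (my choice of distribution, with var(X) = 1/12; it is not one from the slide):

```python
import random

random.seed(2)

def mse_of_sample_mean(n, trials=20_000):
    """Monte-Carlo estimate of E[(S_n - E[X])^2] for X ~ Uniform(0, 1)."""
    total = 0.0
    for _ in range(trials):
        s = sum(random.random() for _ in range(n)) / n
        total += (s - 0.5) ** 2   # E[X] = 1/2 for Uniform(0, 1)
    return total / trials

# The derivation above predicts var(X)/n = 1/(12 n).
mse = {n: mse_of_sample_mean(n) for n in (1, 10, 100)}
for n, est in mse.items():
    print(n, est, 1 / (12 * n))
```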
7
Convergence in Mean Square

Convergence in mean square does not imply convergence a.s.


Example ♣: Let X1, X2, ... be independent random variables such that
\[
X_n = \begin{cases} 0 & \text{w.p. } 1 - \frac{1}{n}, \\[2pt] 1 & \text{w.p. } \frac{1}{n}. \end{cases}
\]
Then $\mathbb{E}\bigl[(X_n - 0)^2\bigr] = \frac{1}{n} \to 0$ as n → ∞. Therefore, this sequence converges to 0 in mean square.
For any 0 < ϵ < 1, using the inequalities $\log(1-x) \le -x$ and $\sum_{i=n}^{m} \frac{1}{i} \ge \log\frac{m}{n}$,
\[
\begin{aligned}
\mathbb{P}\Bigl(\max_{i\ge n} |X_i| < \epsilon\Bigr)
&= \lim_{m\to\infty} \mathbb{P}\Bigl(\max_{n\le i\le m} |X_i| < \epsilon\Bigr)
 = \lim_{m\to\infty} \prod_{i=n}^{m} \Bigl(1 - \frac{1}{i}\Bigr) \\
&= \lim_{m\to\infty} \exp\Biggl(\sum_{i=n}^{m} \log\Bigl(1 - \frac{1}{i}\Bigr)\Biggr)
 \le \lim_{m\to\infty} \exp\Biggl(-\sum_{i=n}^{m} \frac{1}{i}\Biggr)
 \le \lim_{m\to\infty} \frac{n}{m} = 0 .
\end{aligned}
\]

Therefore, it does not converge a.s.

8
Convergence in Mean Square

Convergence a.s. does not imply convergence in mean square.

From Example †, Yn converges to 0 a.s., but since
\[
\mathbb{E}\bigl[(Y_n - 0)^2\bigr] = 2^{2n}\cdot\frac{1}{2^n} = 2^n \to \infty,
\]
the sequence does not converge in mean square.

9
Convergence in Probability

A sequence X1, X2, ... converges to a random variable X in probability if for any ϵ > 0,
\[
\lim_{n\to\infty} \mathbb{P}(|X_n - X| > \epsilon) = 0 .
\]

If $X_n \to X$ a.s., then $X_n \xrightarrow{p} X$.
For a proof, see [Github: https://github.com/wptay/aipt].
If Xn → X a.s., then $\mathbf{1}_{\{|X_n - X| > \epsilon\}} \to 0$ a.s. The Dominated Convergence Theorem then tells us that
\[
\mathbb{E}\bigl[\mathbf{1}_{\{|X_n - X| > \epsilon\}}\bigr] = \mathbb{P}(|X_n - X| > \epsilon) \to 0 .
\]

Example ♣:
\[
\mathbb{P}(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0,
\]
so the sequence converges in probability. But we saw before that it does not converge a.s.
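Example ♣ can be simulated to see both behaviors at once (a sketch; the horizon 1000 and the tail window n = 500..1000 are arbitrary choices of mine):

```python
import random

random.seed(3)

def draw_path(N):
    """One realization of X_1, ..., X_N with P(X_n = 1) = 1/n."""
    return [1 if random.random() < 1 / n else 0 for n in range(1, N + 1)]

paths = [draw_path(1000) for _ in range(5000)]

# In probability: the fraction of paths with X_1000 = 1 is about 1/1000.
frac_one_at_end = sum(p[-1] for p in paths) / len(paths)

# Not a.s.: about half the paths still hit a 1 somewhere in n = 500..1000,
# since 1 - prod_{n=500}^{1000} (1 - 1/n) = 1 - 499/1000; widening the tail
# window never kills this, because prod (1 - 1/i) -> 0.
frac_late_one = sum(any(p[499:]) for p in paths) / len(paths)
print(frac_one_at_end, frac_late_one)
```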
10
Convergence in Probability
If $X_n \to X$ in mean square, then $X_n \xrightarrow{p} X$.
From the Markov inequality,
\[
\mathbb{P}(|X_n - X| > \epsilon) \le \frac{1}{\epsilon^2}\,\mathbb{E}\bigl[(X_n - X)^2\bigr] \to 0 .
\]
The converse is not true. Example:
\[
X_n = \begin{cases} 0 & \text{w.p. } 1 - \frac{1}{n}, \\[2pt] n & \text{w.p. } \frac{1}{n}. \end{cases}
\]
Convergence in probability:
\[
\mathbb{P}(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0 .
\]
But
\[
\mathbb{E}\bigl[(X_n - 0)^2\bigr] = n^2 \cdot \frac{1}{n} = n \to \infty .
\]
Convergence in probability is weaker than both convergence a.s. and in mean square.
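A numeric sketch of the counterexample above (the value n = 1000 and the trial count are mine):

```python
import random

random.seed(4)

def draw(n):
    """One draw of X_n: equals n w.p. 1/n, else 0."""
    return n if random.random() < 1 / n else 0

n, trials = 1000, 200_000
xs = [draw(n) for _ in range(trials)]
frac_nonzero = sum(x != 0 for x in xs) / trials   # estimates P(|X_n| > eps) ~ 1/n
second_moment = sum(x * x for x in xs) / trials   # estimates E[X_n^2] = n
print(frac_nonzero, second_moment)
```

Deviations are rare, but each one is so large that the second moment grows with n instead of vanishing.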
11
Weak Law of Large Numbers

Suppose X1, X2, ... are such that $\mathbb{E}X_i = 0$, $\mathbb{E}X_i^2 = \sigma^2 < \infty$ and $\mathbb{E}[X_i X_j] \le 0$ for $i \ne j$. Then,
\[
\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{p} 0 \quad \text{as } n \to \infty .
\]

Proof:

We have
\[
\mathbb{E}\Biggl[\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr)^{2}\Biggr]
= \frac{1}{n^2}\,\mathbb{E}\Biggl[\sum_{i} X_i^2 + 2\sum_{i<j} X_i X_j\Biggr]
\le \frac{1}{n^2}\sum_{i} \mathbb{E}X_i^2 = \frac{\sigma^2}{n} .
\]

12
Weak Law of Large Numbers

From Chebyshev’s inequality, we then have for any ϵ > 0,

\[
\mathbb{P}\Biggl(\Bigl|\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr| > \epsilon\Biggr) \le \frac{1}{\epsilon^2}\cdot\frac{\sigma^2}{n} \to 0,
\]

as n → ∞.
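The Chebyshev bound above can be compared with simulated deviation probabilities. A sketch with X_i uniform on [−√3, √3] (my choice, so that σ² = 1):

```python
import math
import random

random.seed(5)

def deviation_prob(n, eps=0.1, trials=2000):
    """Monte-Carlo estimate of P(|S_n| > eps), S_n the sample mean."""
    a = math.sqrt(3)  # Uniform(-a, a) has mean 0 and variance 1
    hits = 0
    for _ in range(trials):
        s = sum(random.uniform(-a, a) for _ in range(n)) / n
        hits += abs(s) > eps
    return hits / trials

probs = {n: deviation_prob(n) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(n, p, min(1.0, 1 / (0.1 ** 2 * n)))  # bound sigma^2/(eps^2 n), capped at 1
```

The bound is loose for small n (it exceeds 1) but correctly forces the deviation probability to 0.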

13
Convergence in distribution

We say that $X_n \xrightarrow{d} X$ if for all continuous bounded functions f,
\[
\mathbb{E}[f(X_n)] \to \mathbb{E}[f(X)] .
\]
This is equivalent to saying $X_n \xrightarrow{d} X \iff F_{X_n}(t) \to F_X(t)$ for all continuity points t of $F_X(\cdot)$. [Github: https://github.com/wptay/aipt]

Central Limit Theorem (CLT) for i.i.d. sequences.
Consider i.i.d. random variables X1, X2, ..., with $\mathbb{E}X_i = \mu$ and $\operatorname{var} X_i = \sigma^2 < \infty$. Let $S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then
\[
Z_n = \frac{S_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) .
\]

14
CLT Example
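The figure for this slide is not reproduced here; a Monte-Carlo sketch of the CLT using X_i ~ Exp(1) (my choice of distribution, with µ = σ = 1):

```python
import math
import random

random.seed(6)

def standardized_means(n, trials=20_000):
    """Draws of Z_n = (S_n - mu) / (sigma / sqrt(n)) for X_i ~ Exp(1)."""
    sqrt_n = math.sqrt(n)
    return [(sum(random.expovariate(1.0) for _ in range(n)) / n - 1.0) * sqrt_n
            for _ in range(trials)]

z = standardized_means(200)
# Empirical cdf at 0 and 1 vs. the N(0,1) values Phi(0) = 0.5, Phi(1) ~ 0.8413.
p0 = sum(v <= 0.0 for v in z) / len(z)
p1 = sum(v <= 1.0 for v in z) / len(z)
print(p0, p1)
```

Even though Exp(1) is heavily skewed, the standardized mean is already close to Gaussian at n = 200.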

15
Characteristic Functions

Given a random variable X, the characteristic function of X is $\varphi : \mathbb{R} \to \mathbb{C}$ given by
\[
\varphi(t) = \mathbb{E}\bigl[e^{itX}\bigr] = \mathbb{E}[\cos(tX)] + i\,\mathbb{E}[\sin(tX)],
\]
where $i = \sqrt{-1}$.
This is the Fourier transform of the pdf of X.
Example: The characteristic function of the Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ is
\[
\varphi(t) = e^{i\mu t - \frac{\sigma^2 t^2}{2}} .
\]
In particular, when µ = 0, σ² = 1,
\[
\varphi(t) = e^{-\frac{t^2}{2}} .
\]
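The N(0, 1) formula can be checked against an empirical estimate of E[e^{itX}] (a sketch; the sample size and evaluation points are arbitrary):

```python
import cmath
import math
import random

random.seed(7)

samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def empirical_char(t):
    """Monte-Carlo estimate of phi(t) = E[exp(itX)] from the N(0,1) samples."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

# The slide gives phi(t) = exp(-t^2/2) for N(0,1); note it is real-valued.
est = {t: empirical_char(t) for t in (0.5, 1.0, 2.0)}
for t, v in est.items():
    print(t, v, math.exp(-t * t / 2))
```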

16
Fourier Inversion

Suppose that φ(t) is the characteristic function of a r.v. X. If $\int |\varphi(t)|\,dt < \infty$, then X has pdf
\[
f_X(x) = \frac{1}{2\pi} \int \varphi(t)\,e^{-itx}\,dt .
\]

If $X_n \xrightarrow{d} X$ then $\varphi_n(t) \to \varphi(t)$ for all $t \in \mathbb{R}$.
This is obvious because $\varphi_n(t) = \mathbb{E}\bigl[e^{itX_n}\bigr] = \mathbb{E}[\cos tX_n] + i\,\mathbb{E}[\sin tX_n]$, the functions cos(tx) and sin(tx) are bounded continuous functions, and $X_n \xrightarrow{d} X$.

17
Fourier Inversion

The converse is not true. The sequence $X_n \sim \mathcal{N}(0, n)$ has characteristic function $\varphi_n(t) = e^{-\frac{nt^2}{2}} \xrightarrow{n\to\infty} 0$ for all $t \ne 0$, and $\varphi_n(0) = 1$ for all n. Therefore, φn(t) converges.
But for all x ∈ ℝ,
\[
\mathbb{P}(X_n < x) = \frac{1}{2}\Bigl(1 + \operatorname{erf}\Bigl(\frac{x}{\sqrt{2n}}\Bigr)\Bigr) \to \frac{1}{2}
\]
as n → ∞, which implies that this sequence does not converge in distribution (the pointwise limit 1/2 is not a cdf).

18
Lévy's Continuity Theorem
Suppose $X_n$ has characteristic function $\varphi_n$ with $\varphi_n(t) \to \varphi(t)$ for all t, and φ(t) is continuous at t = 0. Then there exists a random variable X s.t. φ(t) is the characteristic function of X and $X_n \xrightarrow{d} X$.
For a proof, see [Github: https://github.com/wptay/aipt].
Proof of CLT for i.i.d. sequences. Without loss of generality, we assume that µ = 0, σ = 1. We have
\[
Z_n = \frac{1}{\sqrt{n}}\sum_{j=1}^{n} X_j .
\]
Let
\[
\begin{aligned}
\varphi_n(t) &= \mathbb{E}\Biggl[\exp\Bigl(it\,\frac{\sum_{j=1}^{n} X_j}{\sqrt{n}}\Bigr)\Biggr]
= \prod_{j=1}^{n} \mathbb{E}\Bigl[\exp\Bigl(it\,\frac{X_j}{\sqrt{n}}\Bigr)\Bigr] \quad \text{(by independence)} \\
&= \Bigl(\mathbb{E}\bigl[e^{it\frac{X_1}{\sqrt{n}}}\bigr]\Bigr)^{n}
= \varphi_1\Bigl(\frac{t}{\sqrt{n}}\Bigr)^{n} .
\end{aligned}
\]
19
CLT Proof
From the Taylor series expansion, we have
\[
\varphi_1(t) = \varphi_1(0) + \varphi_1'(0)\,t + \frac{\varphi_1''(0)}{2!}\,t^2 + o(t^2) .
\]
Since $\varphi_1(0) = 1$, $\varphi_1'(0) = \mathbb{E}\bigl[iX_1 e^{i0X_1}\bigr] = i\,\mathbb{E}X_1 = 0$, and $\varphi_1''(0) = \mathbb{E}\bigl[(iX_1)^2\bigr] = -1$, we obtain
\[
\varphi_1(t) = 1 - \frac{t^2}{2} + o(t^2) .
\]
Therefore,
\[
\varphi_n(t) = \varphi_1\Bigl(\frac{t}{\sqrt{n}}\Bigr)^{n}
= \Bigl(1 - \frac{t^2}{2n} + o(t^2/n)\Bigr)^{n} \xrightarrow{n\to\infty} e^{-\frac{t^2}{2}},
\]
the characteristic function of $\mathcal{N}(0, 1)$ (certainly continuous at t = 0). From Lévy's Continuity Theorem, we then have
\[
Z_n \xrightarrow{d} \mathcal{N}(0, 1) .
\]
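The limit φ1(t/√n)^n → e^{−t²/2} can be checked exactly for a case not on the slide: X1 = ±1 with probability 1/2 each (mean 0, variance 1), for which φ1(t) = cos t:

```python
import math

def phi_n(t, n):
    """phi_n(t) = phi_1(t / sqrt(n))^n with phi_1(t) = cos(t) (X_1 = +/-1 w.p. 1/2)."""
    return math.cos(t / math.sqrt(n)) ** n

t = 1.7
limit = math.exp(-t * t / 2)   # characteristic function of N(0,1) at t
vals = {n: phi_n(t, n) for n in (10, 100, 10_000)}
print(vals, limit)
```

The values approach the Gaussian characteristic function as n grows, exactly as in the proof.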

20
CLT for Discrete R.V.
Example: X1, X2, ... are i.i.d. ∼ Bern(1/2). Note that
\[
Z_n = \frac{\sum_{i=1}^{n} X_i - n/2}{\sqrt{n}/2}
\]
is discrete and has no pdf. But its cdf converges to the Gaussian cdf.
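This can be checked numerically (a sketch; n = 400 and the evaluation points are my choices):

```python
import math
import random

random.seed(8)

def std_normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_draw(n):
    """One draw of Z_n = (sum X_i - n/2) / (sqrt(n)/2), X_i ~ Bern(1/2)."""
    s = sum(random.getrandbits(1) for _ in range(n))
    return (s - n / 2) / (math.sqrt(n) / 2)

n, trials = 400, 10_000
zs = [z_draw(n) for _ in range(trials)]
# Z_n is discrete (no pdf), but its cdf tracks the Gaussian cdf.
gaps = [abs(sum(z <= x for z in zs) / trials - std_normal_cdf(x))
        for x in (-1.0, 0.0, 1.0)]
print(gaps)
```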

21
CLT for Random Vectors

The CLT applies to i.i.d. sequences of random vectors.


Let X1, X2, ... be i.i.d. random vectors, each having finite mean µ and non-singular covariance matrix Σ. Then
\[
Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} (X_i - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma) .
\]

22
Summary

23
