
Lecture 7: Convergence and Limit Theorems

▶ Almost sure convergence
▶ Convergence in mean square
▶ Convergence in probability
▶ Convergence in distribution

1
Motivation: Estimating Mean

Want to estimate the mean E[X] of a distribution.


Generate X1, X2, ..., Xn i.i.d. drawn from the distribution and compute the sample mean
\[
S_n = \frac{1}{n}\sum_{i=1}^{n} X_i .
\]

Does Sn converge to E[X]? In what sense? Note that Sn is a random variable.

2
Motivation: Estimating Mean
Example: X1 , X2 , . . . , Xn i.i.d. N (0, 1)
▶ Generate 6 sets of X1 , X2 , . . . , Xn (sample paths).
▶ Note that every sample path appears to be converging to the mean 0 as n increases.
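The sample paths on this slide can be reproduced numerically. A minimal sketch in pure Python (the function name and the horizon 10,000 are my choices, not from the lecture):

```python
import random

random.seed(0)

def sample_mean_path(n, mu=0.0, sigma=1.0):
    """Running sample means S_1, ..., S_n for one i.i.d. N(mu, sigma^2) sequence."""
    total, path = 0.0, []
    for i in range(1, n + 1):
        total += random.gauss(mu, sigma)
        path.append(total / i)
    return path

# Six sample paths, as on the slide; each appears to converge to the mean 0.
paths = [sample_mean_path(10_000) for _ in range(6)]
finals = [p[-1] for p in paths]
print(finals)
```

Plotting each path against n reproduces the picture on the slide: all six curves settle toward 0.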

3
Almost Sure Convergence

Recall that a sequence of numbers x1, x2, ... converges to x if given any ϵ > 0, there exists an n(ϵ) such that for all n ≥ n(ϵ), |xn − x| < ϵ. Equivalently, $\max_{i\ge n} |x_i - x| \to 0$ as n → ∞.

Suppose X1, X2, ... is a sequence of random variables defined on the same probability space (Ω, F, P). For every ω ∈ Ω, we get a sample path X1(ω), X2(ω), ..., which is a sequence of numbers.

We say Xn → X almost surely (a.s.) or with probability 1 (w.p. 1) if


\[
\mathbb{P}\bigl(\{\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\}\bigr) = 1 .
\]

This means that the set of ω such that the sample path or sequence X1 (ω), X2 (ω), . . .
converges to X(ω) has probability 1.

4
Almost Sure Convergence

 
Lemma. $\mathbb{P}\bigl(\max_{i\ge n} |X_i - X| > \epsilon\bigr) \to 0$ for any $\epsilon > 0$ iff $X_n \xrightarrow{a.s.} X$.

We prove the forward direction. We will see that the converse holds when discussing
convergence in probability.
Let $M_n = \max_{i\ge n} |X_i - X|$, which is a decreasing sequence bounded below by 0. Therefore $M_n \downarrow M$ for some $M \ge 0$.
Then, for all ϵ > 0, P(M > ϵ) ≤ P(Mn > ϵ) → 0 as n → ∞.
Therefore, from continuity of P, P(M = 0) = 1 and Mn → 0 a.s. This is equivalent to saying that Xn → X a.s.

5
Example †: X1, X2, ... i.i.d. ∼ Bern(1/2). Let $Y_n = 2^n \prod_{i=1}^{n} X_i$. Show that the sequence Yn converges to 0 a.s.
For 0 < ϵ < 1, we have
\[
\mathbb{P}\Bigl(\max_{i\ge n} |Y_i - 0| > \epsilon\Bigr) \le \mathbb{P}(X_i = 1 \ \forall\, i \le n) = \frac{1}{2^n} \to 0
\]
as n → ∞.
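A quick simulation of Example † (a sketch; the helper name and path counts are mine):

```python
import random

random.seed(1)

def y_path(n):
    """One sample path Y_1, ..., Y_n with Y_k = 2^k * prod_{i<=k} X_i, X_i ~ Bern(1/2)."""
    prod, path = 1, []
    for k in range(1, n + 1):
        prod *= random.getrandbits(1)  # one Bern(1/2) draw
        path.append((2 ** k) * prod)
    return path

# Once some X_i = 0, the path is stuck at 0 forever; the all-ones event has
# probability 2^{-n}, matching the bound on the slide.
paths = [y_path(50) for _ in range(1000)]
frac_zero = sum(p[-1] == 0 for p in paths) / len(paths)
print(frac_zero)
```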
An important example of a.s. convergence is the Strong Law of Large Numbers (SLLN): if X1, X2, ... are i.i.d. (in fact pairwise independence suffices) with finite mean µ, then
\[
S_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{a.s.} \mu
\]
as n → ∞. The proof of the SLLN is beyond the scope of this course. See [Github: https://github.com/wptay/aipt].

6
Convergence in Mean Square

We say Xn → X in mean square or L2 if
\[
\lim_{n\to\infty} \mathbb{E}\bigl[(X_n - X)^2\bigr] = 0 .
\]

Let X1, X2, ... be i.i.d. with finite mean E[X] and finite variance var(X). Then Sn → E[X] in mean square.
\[
\begin{aligned}
\mathbb{E}\bigl[(S_n - \mathbb{E}[X])^2\bigr]
&= \mathbb{E}\Biggl[\Bigl(\frac{1}{n}\sum_{i=1}^{n} (X_i - \mathbb{E}[X])\Bigr)^{2}\Biggr] \\
&= \frac{1}{n^2}\,\mathbb{E}\Biggl[\sum_{i,j} (X_i - \mathbb{E}[X])(X_j - \mathbb{E}[X])\Biggr] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \mathbb{E}\bigl[(X_i - \mathbb{E}[X])^2\bigr] \quad \because X_i\text{s are independent} \\
&= \frac{1}{n}\operatorname{var}(X) \to 0
\end{aligned}
\]
as n → ∞.
Note that the proof works even if the Xi s are only pairwise independent, or even just uncorrelated.
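The rate var(X)/n can be checked by Monte Carlo. A sketch using X ~ Uniform(0, 1) (my choice of distribution, with var(X) = 1/12; it is not one from the slide):

```python
import random

random.seed(2)

def mse_of_sample_mean(n, trials=20_000):
    """Monte-Carlo estimate of E[(S_n - E[X])^2] for X ~ Uniform(0, 1)."""
    total = 0.0
    for _ in range(trials):
        s = sum(random.random() for _ in range(n)) / n
        total += (s - 0.5) ** 2   # E[X] = 1/2 for Uniform(0, 1)
    return total / trials

# The derivation above predicts var(X)/n = 1/(12 n).
mse = {n: mse_of_sample_mean(n) for n in (1, 10, 100)}
for n, est in mse.items():
    print(n, est, 1 / (12 * n))
```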
7
Convergence in Mean Square

Convergence in mean square does not imply convergence a.s.


Example ♣: Let X1, X2, ... be independent random variables such that
\[
X_n = \begin{cases} 0 & \text{w.p. } 1 - \frac{1}{n}, \\[2pt] 1 & \text{w.p. } \frac{1}{n}. \end{cases}
\]
Then $\mathbb{E}\bigl[(X_n - 0)^2\bigr] = \frac{1}{n} \to 0$ as n → ∞. Therefore, this sequence converges to 0 in mean square.
For any 0 < ϵ < 1, using the inequalities $\log(1-x) \le -x$ and $\sum_{i=n}^{m} \frac{1}{i} \ge \log\frac{m}{n}$,
\[
\begin{aligned}
\mathbb{P}\Bigl(\max_{i\ge n} |X_i| < \epsilon\Bigr)
&= \lim_{m\to\infty} \mathbb{P}\Bigl(\max_{n\le i\le m} |X_i| < \epsilon\Bigr)
 = \lim_{m\to\infty} \prod_{i=n}^{m} \Bigl(1 - \frac{1}{i}\Bigr) \\
&= \lim_{m\to\infty} \exp\Biggl(\sum_{i=n}^{m} \log\Bigl(1 - \frac{1}{i}\Bigr)\Biggr)
 \le \lim_{m\to\infty} \exp\Biggl(-\sum_{i=n}^{m} \frac{1}{i}\Biggr)
 \le \lim_{m\to\infty} \frac{n}{m} = 0 .
\end{aligned}
\]

Therefore, it does not converge a.s.

8
Convergence in Mean Square

Convergence a.s. does not imply convergence in mean square.

From Example †, Yn converges to 0 a.s., but since
\[
\mathbb{E}\bigl[(Y_n - 0)^2\bigr] = 2^{2n}\cdot\frac{1}{2^n} = 2^n \to \infty,
\]
the sequence does not converge in mean square.

9
Convergence in Probability

A sequence X1, X2, ... converges to a random variable X in probability if for any ϵ > 0,
\[
\lim_{n\to\infty} \mathbb{P}(|X_n - X| > \epsilon) = 0 .
\]

If $X_n \to X$ a.s., then $X_n \xrightarrow{p} X$.
For a proof, see [Github: https://github.com/wptay/aipt].
If Xn → X a.s., then $\mathbf{1}_{\{|X_n - X| > \epsilon\}} \to 0$ a.s. The Dominated Convergence Theorem then tells us that
\[
\mathbb{E}\bigl[\mathbf{1}_{\{|X_n - X| > \epsilon\}}\bigr] = \mathbb{P}(|X_n - X| > \epsilon) \to 0 .
\]

Example ♣:
\[
\mathbb{P}(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0,
\]
so the sequence converges in probability. But we saw before that it does not converge a.s.
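Example ♣ can be simulated to see both behaviors at once (a sketch; the horizon 1000 and the tail window n = 500..1000 are arbitrary choices of mine):

```python
import random

random.seed(3)

def draw_path(N):
    """One realization of X_1, ..., X_N with P(X_n = 1) = 1/n."""
    return [1 if random.random() < 1 / n else 0 for n in range(1, N + 1)]

paths = [draw_path(1000) for _ in range(5000)]

# In probability: the fraction of paths with X_1000 = 1 is about 1/1000.
frac_one_at_end = sum(p[-1] for p in paths) / len(paths)

# Not a.s.: about half the paths still hit a 1 somewhere in n = 500..1000,
# since 1 - prod_{n=500}^{1000} (1 - 1/n) = 1 - 499/1000; widening the tail
# window never kills this, because prod (1 - 1/i) -> 0.
frac_late_one = sum(any(p[499:]) for p in paths) / len(paths)
print(frac_one_at_end, frac_late_one)
```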
10
Convergence in Probability
If $X_n \to X$ in mean square, then $X_n \xrightarrow{p} X$.
From the Markov inequality,
\[
\mathbb{P}(|X_n - X| > \epsilon) \le \frac{1}{\epsilon^2}\,\mathbb{E}\bigl[(X_n - X)^2\bigr] \to 0 .
\]
The converse is not true. Example:
\[
X_n = \begin{cases} 0 & \text{w.p. } 1 - \frac{1}{n}, \\[2pt] n & \text{w.p. } \frac{1}{n}. \end{cases}
\]
Convergence in probability:
\[
\mathbb{P}(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0 .
\]
But
\[
\mathbb{E}\bigl[(X_n - 0)^2\bigr] = n^2 \cdot \frac{1}{n} = n \to \infty .
\]
Convergence in probability is weaker than both convergence a.s. and in mean square.
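A numeric sketch of the counterexample above (the value n = 1000 and the trial count are mine):

```python
import random

random.seed(4)

def draw(n):
    """One draw of X_n: equals n w.p. 1/n, else 0."""
    return n if random.random() < 1 / n else 0

n, trials = 1000, 200_000
xs = [draw(n) for _ in range(trials)]
frac_nonzero = sum(x != 0 for x in xs) / trials   # estimates P(|X_n| > eps) ~ 1/n
second_moment = sum(x * x for x in xs) / trials   # estimates E[X_n^2] = n
print(frac_nonzero, second_moment)
```

Deviations are rare, but each one is so large that the second moment grows with n instead of vanishing.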
11
Weak Law of Large Numbers

Suppose X1, X2, ... are such that $\mathbb{E}X_i = 0$, $\mathbb{E}X_i^2 = \sigma^2 < \infty$ and $\mathbb{E}[X_i X_j] \le 0$ for $i \ne j$. Then,
\[
\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{p} 0 \quad \text{as } n \to \infty .
\]

Proof:

We have
\[
\mathbb{E}\Biggl[\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr)^{2}\Biggr]
= \frac{1}{n^2}\,\mathbb{E}\Biggl[\sum_{i} X_i^2 + 2\sum_{i<j} X_i X_j\Biggr]
\le \frac{1}{n^2}\sum_{i} \mathbb{E}X_i^2 = \frac{\sigma^2}{n} .
\]

12
Weak Law of Large Numbers

From Chebyshev’s inequality, we then have for any ϵ > 0,

\[
\mathbb{P}\Biggl(\Bigl|\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr| > \epsilon\Biggr) \le \frac{1}{\epsilon^2}\cdot\frac{\sigma^2}{n} \to 0,
\]

as n → ∞.
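The Chebyshev bound above can be compared with simulated deviation probabilities. A sketch with X_i uniform on [−√3, √3] (my choice, so that σ² = 1):

```python
import math
import random

random.seed(5)

def deviation_prob(n, eps=0.1, trials=2000):
    """Monte-Carlo estimate of P(|S_n| > eps), S_n the sample mean."""
    a = math.sqrt(3)  # Uniform(-a, a) has mean 0 and variance 1
    hits = 0
    for _ in range(trials):
        s = sum(random.uniform(-a, a) for _ in range(n)) / n
        hits += abs(s) > eps
    return hits / trials

probs = {n: deviation_prob(n) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(n, p, min(1.0, 1 / (0.1 ** 2 * n)))  # bound sigma^2/(eps^2 n), capped at 1
```

The bound is loose for small n (it exceeds 1) but correctly forces the deviation probability to 0.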

13
Convergence in distribution

We say that $X_n \xrightarrow{d} X$ if for all continuous bounded functions f,
\[
\mathbb{E}[f(X_n)] \to \mathbb{E}[f(X)] .
\]
This is equivalent to saying $X_n \xrightarrow{d} X \iff F_{X_n}(t) \to F_X(t)$ for all continuity points t of $F_X(\cdot)$. [Github: https://github.com/wptay/aipt]

Central Limit Theorem (CLT) for i.i.d. sequences.
Consider i.i.d. random variables X1, X2, ..., with $\mathbb{E}X_i = \mu$ and $\operatorname{var} X_i = \sigma^2 < \infty$. Let $S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then
\[
Z_n = \frac{S_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) .
\]

14
CLT Example
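The figure for this slide is not reproduced here; a Monte-Carlo sketch of the CLT using X_i ~ Exp(1) (my choice of distribution, with µ = σ = 1):

```python
import math
import random

random.seed(6)

def standardized_means(n, trials=20_000):
    """Draws of Z_n = (S_n - mu) / (sigma / sqrt(n)) for X_i ~ Exp(1)."""
    sqrt_n = math.sqrt(n)
    return [(sum(random.expovariate(1.0) for _ in range(n)) / n - 1.0) * sqrt_n
            for _ in range(trials)]

z = standardized_means(200)
# Empirical cdf at 0 and 1 vs. the N(0,1) values Phi(0) = 0.5, Phi(1) ~ 0.8413.
p0 = sum(v <= 0.0 for v in z) / len(z)
p1 = sum(v <= 1.0 for v in z) / len(z)
print(p0, p1)
```

Even though Exp(1) is heavily skewed, the standardized mean is already close to Gaussian at n = 200.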

15
Characteristic Functions

Given a random variable X, the characteristic function of X is $\varphi : \mathbb{R} \to \mathbb{C}$ given by
\[
\varphi(t) = \mathbb{E}\bigl[e^{itX}\bigr] = \mathbb{E}[\cos(tX)] + i\,\mathbb{E}[\sin(tX)],
\]
where $i = \sqrt{-1}$.
This is the Fourier transform of the pdf of X.
Example: The characteristic function of the Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ is
\[
\varphi(t) = e^{i\mu t - \frac{\sigma^2 t^2}{2}} .
\]
In particular, when µ = 0, σ² = 1,
\[
\varphi(t) = e^{-\frac{t^2}{2}} .
\]
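The N(0, 1) formula can be checked against an empirical estimate of E[e^{itX}] (a sketch; the sample size and evaluation points are arbitrary):

```python
import cmath
import math
import random

random.seed(7)

samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def empirical_char(t):
    """Monte-Carlo estimate of phi(t) = E[exp(itX)] from the N(0,1) samples."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

# The slide gives phi(t) = exp(-t^2/2) for N(0,1); note it is real-valued.
est = {t: empirical_char(t) for t in (0.5, 1.0, 2.0)}
for t, v in est.items():
    print(t, v, math.exp(-t * t / 2))
```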

16
Fourier Inversion

Suppose that φ(t) is the characteristic function of a r.v. X. If $\int |\varphi(t)|\,dt < \infty$, then X has pdf
\[
f_X(x) = \frac{1}{2\pi} \int \varphi(t)\,e^{-itx}\,dt .
\]

If $X_n \xrightarrow{d} X$ then $\varphi_n(t) \to \varphi(t)$ for all $t \in \mathbb{R}$.
This is obvious because $\varphi_n(t) = \mathbb{E}\bigl[e^{itX_n}\bigr] = \mathbb{E}[\cos tX_n] + i\,\mathbb{E}[\sin tX_n]$, the functions cos(tx) and sin(tx) are bounded continuous functions, and $X_n \xrightarrow{d} X$.

17
Fourier Inversion

The converse is not true. The sequence $X_n \sim \mathcal{N}(0, n)$ has characteristic function $\varphi_n(t) = e^{-\frac{nt^2}{2}} \xrightarrow{n\to\infty} 0$ for all $t \ne 0$, and $\varphi_n(0) = 1$ for all n. Therefore, φn(t) converges.
But for all x ∈ ℝ,
\[
\mathbb{P}(X_n < x) = \frac{1}{2}\Bigl(1 + \operatorname{erf}\Bigl(\frac{x}{\sqrt{2n}}\Bigr)\Bigr) \to \frac{1}{2}
\]
as n → ∞, which implies that this sequence does not converge in distribution (the pointwise limit 1/2 is not a cdf).

18
Lévy's Continuity Theorem
Suppose $X_n$ has characteristic function $\varphi_n$ with $\varphi_n(t) \to \varphi(t)$ for all t, and φ(t) is continuous at t = 0. Then there exists a random variable X s.t. φ(t) is the characteristic function of X and $X_n \xrightarrow{d} X$.
For a proof, see [Github: https://github.com/wptay/aipt].
Proof of CLT for i.i.d. sequences. Without loss of generality, we assume that µ = 0, σ = 1. We have
\[
Z_n = \frac{1}{\sqrt{n}}\sum_{j=1}^{n} X_j .
\]
Let
\[
\begin{aligned}
\varphi_n(t) &= \mathbb{E}\Biggl[\exp\Bigl(it\,\frac{\sum_{j=1}^{n} X_j}{\sqrt{n}}\Bigr)\Biggr]
= \prod_{j=1}^{n} \mathbb{E}\Bigl[\exp\Bigl(it\,\frac{X_j}{\sqrt{n}}\Bigr)\Bigr] \quad \text{(by independence)} \\
&= \Bigl(\mathbb{E}\bigl[e^{it\frac{X_1}{\sqrt{n}}}\bigr]\Bigr)^{n}
= \varphi_1\Bigl(\frac{t}{\sqrt{n}}\Bigr)^{n} .
\end{aligned}
\]
19
CLT Proof
From the Taylor series expansion, we have
\[
\varphi_1(t) = \varphi_1(0) + \varphi_1'(0)\,t + \frac{\varphi_1''(0)}{2!}\,t^2 + o(t^2) .
\]
Since $\varphi_1(0) = 1$, $\varphi_1'(0) = \mathbb{E}\bigl[iX_1 e^{i0X_1}\bigr] = i\,\mathbb{E}X_1 = 0$, and $\varphi_1''(0) = \mathbb{E}\bigl[(iX_1)^2\bigr] = -1$, we obtain
\[
\varphi_1(t) = 1 - \frac{t^2}{2} + o(t^2) .
\]
Therefore,
\[
\varphi_n(t) = \varphi_1\Bigl(\frac{t}{\sqrt{n}}\Bigr)^{n}
= \Bigl(1 - \frac{t^2}{2n} + o(t^2/n)\Bigr)^{n} \xrightarrow{n\to\infty} e^{-\frac{t^2}{2}},
\]
the characteristic function of $\mathcal{N}(0, 1)$ (certainly continuous at t = 0). From Lévy's Continuity Theorem, we then have
\[
Z_n \xrightarrow{d} \mathcal{N}(0, 1) .
\]
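The limit φ1(t/√n)^n → e^{−t²/2} can be checked exactly for a case not on the slide: X1 = ±1 with probability 1/2 each (mean 0, variance 1), for which φ1(t) = cos t:

```python
import math

def phi_n(t, n):
    """phi_n(t) = phi_1(t / sqrt(n))^n with phi_1(t) = cos(t) (X_1 = +/-1 w.p. 1/2)."""
    return math.cos(t / math.sqrt(n)) ** n

t = 1.7
limit = math.exp(-t * t / 2)   # characteristic function of N(0,1) at t
vals = {n: phi_n(t, n) for n in (10, 100, 10_000)}
print(vals, limit)
```

The values approach the Gaussian characteristic function as n grows, exactly as in the proof.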

20
CLT for Discrete R.V.
Example: X1, X2, ... are i.i.d. ∼ Bern(1/2). Note that
\[
Z_n = \frac{\sum_{i=1}^{n} X_i - n/2}{\sqrt{n}/2}
\]
is discrete and has no pdf. But its cdf converges to the Gaussian cdf.
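This can be checked numerically (a sketch; n = 400 and the evaluation points are my choices):

```python
import math
import random

random.seed(8)

def std_normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_draw(n):
    """One draw of Z_n = (sum X_i - n/2) / (sqrt(n)/2), X_i ~ Bern(1/2)."""
    s = sum(random.getrandbits(1) for _ in range(n))
    return (s - n / 2) / (math.sqrt(n) / 2)

n, trials = 400, 10_000
zs = [z_draw(n) for _ in range(trials)]
# Z_n is discrete (no pdf), but its cdf tracks the Gaussian cdf.
gaps = [abs(sum(z <= x for z in zs) / trials - std_normal_cdf(x))
        for x in (-1.0, 0.0, 1.0)]
print(gaps)
```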

21
CLT for Random Vectors

The CLT applies to i.i.d. sequences of random vectors.


Let X1, X2, ... be i.i.d. random vectors, each having finite mean µ and non-singular covariance matrix Σ. Then
\[
Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} (X_i - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma) .
\]

22
Summary

23
