
Math/Stat 394, Winter 2011

F.W. Scholz
Central Limit Theorems and Proofs
The following gives a self-contained treatment of the central limit theorem (CLT). It is based on Lindeberg's (1922) method. To state the CLT which we shall prove, we introduce the following notation. We assume that $X_{n1}, \ldots, X_{nn}$ are independent random variables with means 0 and respective variances $\sigma_{n1}^2, \ldots, \sigma_{nn}^2$ with $\sigma_{n1}^2 + \ldots + \sigma_{nn}^2 = \sigma_n^2 > 0$ for all $n$. Denote the sum $X_{n1} + \ldots + X_{nn}$ by $S_n$ and observe that $S_n$ has mean zero and variance $\sigma_n^2$.
Lindeberg's Central Limit Theorem: If the Lindeberg condition is satisfied, i.e., if for every $\epsilon > 0$ we have that
$$L_n(\epsilon) = \frac{1}{\sigma_n^2} \sum_{i=1}^n E\left[X_{ni}^2\, I_{\{|X_{ni}| \ge \epsilon\sigma_n\}}\right] \to 0 \quad \text{as } n \to \infty,$$
then for every $a \in \mathbb{R}$ we have that
$$P(S_n/\sigma_n \le a) - \Phi(a) \to 0 \quad \text{as } n \to \infty.$$
Proof: Step 1 (convergence of expectations of smooth functions): We will show in Appendix 1 that for certain functions $f$ we have that
$$E[f(S_n/\sigma_n)] - E[f(Z)] \to 0 \quad \text{as } n \to \infty, \qquad (1)$$
where $Z$ denotes a standard normal random variable. If this convergence held for any function $f$ and if we then applied it to
$$f_a(x) = I_{(-\infty,a]}(x) = 1 \text{ if } x \le a \quad \text{and} \quad = 0 \text{ if } x > a,$$
then
$$E[f_a(S_n/\sigma_n)] - E[f_a(Z)] = P(S_n/\sigma_n \le a) - \Phi(a)$$
and the statement of the CLT would follow. Unfortunately, we cannot directly demonstrate the above convergence (1) for all $f$, but only for smooth $f$. Here a smooth $f$ is one that is bounded and has three bounded, continuous derivatives, as stipulated in Lemma 1 of Appendix 1.
Step 2 (sandwiching a step function between smooth functions): We will approximate $f_a(x) = I_{(-\infty,a]}(x)$, which is a step function with step at $x = a$, by sandwiching it between two smooth functions. In fact, for $\delta > 0$ one easily finds (see Appendix 2 for an explicit example) smooth functions $f(x)$ with $f(x) = 1$ for $x \le a$, $f(x)$ monotone decreasing from 1 to 0 on $[a, a+\delta]$, and $f(x) = 0$ for $x \ge a+\delta$. Hence we would have
$$f_a(x) \le f(x) \le f_{a+\delta}(x) \quad \text{for all } x \in \mathbb{R}.$$
[Figure: $f(x)$ descending from 1 to 0 over $[a, a+\delta]$, sandwiched between the step functions $f_a(x)$ and $f_{a+\delta}(x)$.]
Since $f_{a+\delta}(x+\delta) = f_a(x)$, we also get from the previous sandwich inequality (applied at $x+\delta$)
$$f_a(x) = f_{a+\delta}(x+\delta) \ge f(x+\delta).$$
Combining these, we can bracket $f_a(x)$ for all $x \in \mathbb{R}$ by
$$f(x+\delta) \le f_a(x) \le f(x).$$
[Figure: the bracketing $f(x+\delta) \le f_a(x) \le f(x)$, with transition intervals $[a-\delta, a]$ and $[a, a+\delta]$.]
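This bracketing can be spot-checked numerically using the explicit descent polynomial of Appendix 2, rescaled to $[a, a+\delta]$; the grid and the values of $a$ and $\delta$ below are arbitrary illustrative choices:

```python
import numpy as np

def smooth_step(x, a, delta):
    # Appendix 2 polynomial, rescaled: 1 for x <= a, 0 for x >= a + delta
    t = np.clip((x - a) / delta, 0.0, 1.0)
    return 1 - 140 * (t**4 / 4 - 3 * t**5 / 5 + t**6 / 2 - t**7 / 7)

a, delta = 0.0, 0.25
x = np.linspace(-2.0, 2.0, 4001)
f_a = (x <= a).astype(float)                # the indicator I_(-inf, a]
upper = smooth_step(x, a, delta)            # f(x)
lower = smooth_step(x + delta, a, delta)    # f(x + delta)
print(np.all(lower <= f_a + 1e-9) and np.all(f_a <= upper + 1e-9))  # True
```

The small tolerance only absorbs floating-point rounding at the endpoints of the descent interval.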
Step 3 (the approximation argument): The following diagram will clarify the approximation strategy. The inequalities result from the bracketing in the previous step:
$$E\left[f\left(\frac{S_n}{\sigma_n} + \delta\right)\right] \;\le\; E\left[f_a\left(\frac{S_n}{\sigma_n}\right)\right] \;\le\; E\left[f\left(\frac{S_n}{\sigma_n}\right)\right]$$
$$E[f(Z+\delta)] \;\le\; E[f_a(Z)] \;\le\; E[f(Z)]$$
The outer (smooth-$f$) terms of the top row are linked by $\approx$ to the terms directly below them; $\approx$ means that for any fixed $\delta > 0$, and thus for fixed $f$, the terms above and below become arbitrarily close, say within $\epsilon/3$ of each other, as $n \to \infty$ (see Step 1). Further, since $f(Z+\delta) - f(Z) = 0$ for $Z \notin [a-\delta, a+\delta]$ and $|f(Z+\delta) - f(Z)| \le 1$ for $a-\delta \le Z \le a+\delta$, we have
$$|E[f(Z+\delta)] - E[f(Z)]| = \left|E\left[(f(Z+\delta) - f(Z))\, I_{[a-\delta \le Z \le a+\delta]}\right]\right| \le E\left[I_{[a-\delta \le Z \le a+\delta]}\right]$$
$$= P(a-\delta \le Z \le a+\delta) = \Phi(a+\delta) - \Phi(a-\delta) \le \Phi(\delta) - \Phi(-\delta) \le 2\delta\varphi(0) = 2\delta/\sqrt{2\pi}\,.$$
The latter bound can be made as small as $\epsilon/3$, no matter what $f$ is, by taking $\delta = \epsilon\sqrt{2\pi}/6$. With this choice of $\delta$, and $n$ sufficiently large, the above diagram entails that
$$\left|E\left[f_a\left(\frac{S_n}{\sigma_n}\right)\right] - E[f_a(Z)]\right| \le \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon\,.$$
Since this can be done for any $\epsilon > 0$ we have shown that
$$\left|E\left[f_a\left(\frac{S_n}{\sigma_n}\right)\right] - E[f_a(Z)]\right| \to 0 \quad \text{as } n \to \infty\,. \qquad \Box$$
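A quick arithmetic check of the choice $\delta = \epsilon\sqrt{2\pi}/6$ (the value of $\epsilon$ below is an arbitrary illustration):

```python
from math import erf, sqrt, pi

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

eps = 0.3
delta = eps * sqrt(2.0 * pi) / 6.0        # the choice made in Step 3

bound = 2.0 * delta / sqrt(2.0 * pi)      # 2*delta*phi(0), the smoothing bound
print(abs(bound - eps / 3.0) < 1e-12)     # True: the bound equals eps/3
print(Phi(delta) - Phi(-delta) <= bound)  # True: the density never exceeds phi(0)
```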
From Lindeberg's CLT other special versions follow at once. The first, the Lindeberg-Lévy CLT, considers independent identically distributed random variables with finite variance $\sigma^2$. The second, the Liapunov CLT, considers independent, but not necessarily identically distributed, random variables with finite third moments.
Lindeberg-Lévy CLT: Let $Y_1, \ldots, Y_n$ be independent and identically distributed random variables with common mean $\mu$ and finite positive variance $\sigma^2$ and let $T_n = Y_1 + \ldots + Y_n$. Then for all $a \in \mathbb{R}$
$$P\left(\frac{T_n - n\mu}{\sigma\sqrt{n}} \le a\right) - \Phi(a) \to 0 \quad \text{as } n \to \infty.$$
Proof: Let $X_i = Y_i - \mu$; then $X_i$ has mean zero and variance $\sigma^2$. Note that $S_n = X_1 + \ldots + X_n = T_n - n\mu$ has variance $\sigma_n^2 = n\sigma^2$ so that
$$\frac{T_n - n\mu}{\sigma\sqrt{n}} = \frac{S_n}{\sigma_n}\,.$$
The Lindeberg function $L_n(\epsilon)$ becomes
$$L_n(\epsilon) = \frac{1}{n\sigma^2} \sum_{i=1}^n E\left[X_i^2\, I_{[|X_i| \ge \epsilon\sigma\sqrt{n}]}\right] = \frac{1}{n\sigma^2}\, n\,E\left[X_1^2\, I_{[|X_1| \ge \epsilon\sigma\sqrt{n}]}\right] \to 0$$
as $n \to \infty$. For example, for a continuous r.v. with mean zero and finite variance $\sigma^2$ this last convergence follows by noting¹
$$\sigma^2 = \int_{(-a,a)} x^2 f(x)\,dx + \int_{(-a,a)^c} x^2 f(x)\,dx\,,$$
where the first summand on the right converges to $\sigma^2$ as $a \to \infty$. Thus the second term must go to zero. But the second term is just
$$\int_{(-a,a)^c} x^2 f(x)\,dx = E\left[X^2\, I_{[|X| \ge a]}\right]. \qquad \Box$$
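The vanishing of the tail moment $E[X^2 I_{[|X| \ge a]}]$ as $a \to \infty$ is also easy to see empirically; the sketch below uses standard normal draws (an illustrative choice of distribution):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(200_000)   # mean 0, variance 1

# empirical E[X^2 I{|X| >= a}] for growing truncation points a
tail = [float(np.mean(X**2 * (np.abs(X) >= a))) for a in (0.0, 1.0, 2.0, 4.0)]
print(tail)  # starts near sigma^2 = 1 and decreases toward 0
```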
Liapunov CLT: Let $Y_1, \ldots, Y_n$ be independent r.v.'s with means $\mu_1, \ldots, \mu_n$, variances $\sigma_1^2, \ldots, \sigma_n^2$ and finite third absolute central moments, i.e., $E(|Y_i - \mu_i|^3) < \infty$. If Liapunov's condition is satisfied, i.e.,
$$\gamma_n = \frac{1}{\sigma_n^3} \sum_{i=1}^n E\left[|Y_i - \mu_i|^3\right] \to 0 \quad \text{as } n \to \infty,$$
then
$$P\left(\frac{\sum (Y_i - \mu_i)}{\sqrt{\sum \sigma_i^2}} \le a\right) - \Phi(a) \to 0 \quad \text{as } n \to \infty.$$

¹We express the following in terms of densities, but it holds generally, i.e., for discrete r.v.'s as well.
Proof: We will simply show that Liapunov's condition implies Lindeberg's condition. Note that $X_i = Y_i - \mu_i$ has mean zero and variance $\sigma_i^2$. Then
$$\gamma_n = \frac{1}{\sigma_n^3} \sum E\left[|X_i|^3\right] \ge \frac{1}{\sigma_n^3} \sum E\left[|X_i|^3\, I_{[|X_i| \ge \epsilon\sigma_n]}\right] \ge \frac{\epsilon\sigma_n}{\sigma_n^3} \sum E\left[|X_i|^2\, I_{[|X_i| \ge \epsilon\sigma_n]}\right] = \epsilon\, L_n(\epsilon)\,,$$
so that $L_n(\epsilon) \le \gamma_n/\epsilon \to 0$ for every $\epsilon > 0$. $\Box$
Appendix 1

Demonstration of Step 1: A crucial part of the proof is the following form of Taylor's formula. You may want to skip the technical proof, which is provided only for completeness.

Lemma 1 (Taylor's Formula): Let $f$ be a bounded function defined on $\mathbb{R}$ with three bounded continuous derivatives $f^{(0)} = f$, $f^{(1)}$, $f^{(2)}$ and $f^{(3)}$, i.e., with
$$\sup_{x \in \mathbb{R}} \left|f^{(i)}(x)\right| = M_f^{(i)} < \infty \quad \text{for } i = 0, 1, 2, 3.$$
Then for all $h \in \mathbb{R}$
$$g(h) = \sup_{x \in \mathbb{R}} \left|f(x+h) - f(x) - f^{(1)}(x)h - \tfrac{1}{2} f^{(2)}(x)h^2\right| \le K_f \min\left(h^2, |h|^3\right).$$
Proof: From the usual form of Taylor's theorem we have for any $h$
$$f(x+h) - f(x) - f^{(1)}(x)h - \tfrac{1}{2}f^{(2)}(x)h^2 = \frac{h^3}{6} f^{(3)}(\tilde{x})\,,$$
where $\tilde{x}$ lies between $x$ and $x+h$. Hence for all $h \in \mathbb{R}$
$$\left|f(x+h) - f(x) - f^{(1)}(x)h - \tfrac{1}{2}f^{(2)}(x)h^2\right| \le \frac{M_f^{(3)}}{6}\, |h|^3. \qquad (2)$$
On the other hand, by simple application of the triangle inequality ($|a + b + \ldots| \le |a| + |b| + \ldots$) we also have
$$\left|f(x+h) - f(x) - f^{(1)}(x)h - \tfrac{1}{2}f^{(2)}(x)h^2\right| \le M_f^{(0)} + M_f^{(0)} + M_f^{(1)}|h| + \tfrac{1}{2} M_f^{(2)} h^2 = h^2\left(\frac{2M_f^{(0)}}{h^2} + \frac{M_f^{(1)}}{|h|} + \tfrac{1}{2} M_f^{(2)}\right),$$
which for large $h$, say $|h| \ge b_f$, and large $K'$, say $K' = \tfrac{1}{2}M_f^{(2)} + 1$, can be bounded by $K' h^2$. For
$|h| < b_f$ we have from the inequality (2)
$$\left|f(x+h) - f(x) - f^{(1)}(x)h - \tfrac{1}{2}f^{(2)}(x)h^2\right| \le \frac{M_f^{(3)}}{6}\,|h|^3 \le \frac{b_f M_f^{(3)}}{6}\, h^2.$$
Taking
$$K_f = \max\left(\frac{b_f M_f^{(3)}}{6},\; K',\; \frac{M_f^{(3)}}{6}\right)$$
we have for $h \in \mathbb{R}$
$$\left|f(x+h) - f(x) - f^{(1)}(x)h - \tfrac{1}{2}f^{(2)}(x)h^2\right| \le K_f h^2 \quad \text{and} \quad \le K_f |h|^3,$$
hence $g(h) \le \min(K_f h^2, K_f|h|^3)$. $\Box$
From Lemma 1 we obtain the following corollary.

Corollary 1: Under the conditions of Lemma 1 we have
$$f(x+h_1) - f(x+h_2) = \left[f^{(1)}(x)(h_1 - h_2) + \tfrac{1}{2}f^{(2)}(x)(h_1^2 - h_2^2)\right] + r$$
with
$$|r| \le g(h_1) + g(h_2) \le K_f\left[\min(h_1^2, |h_1|^3) + \min(h_2^2, |h_2|^3)\right].$$
Proof:
$$|r| = \left|f(x+h_1) - f(x+h_2) - \left[f^{(1)}(x)(h_1 - h_2) + \tfrac{1}{2}f^{(2)}(x)(h_1^2 - h_2^2)\right]\right|$$
$$\le \left|f(x+h_1) - f(x) - f^{(1)}(x)h_1 - \tfrac{1}{2}f^{(2)}(x)h_1^2\right| + \left|f(x+h_2) - f(x) - f^{(1)}(x)h_2 - \tfrac{1}{2}f^{(2)}(x)h_2^2\right|$$
$$\le g(h_1) + g(h_2)$$
by the triangle inequality. $\Box$
To demonstrate the convergence in (1) consider independent normal random variables $V_{n1}, \ldots, V_{nn}$ with mean zero and respective variances $\sigma_{n1}^2, \ldots, \sigma_{nn}^2$, so that
$$Z = \frac{V_{n1} + \ldots + V_{nn}}{\sigma_n}$$
is normal with mean zero and variance one, i.e., standard normal. These $V_{ni}$'s are also taken to be independent of the $X_{nj}$'s.

First rewrite the approximation difference in the following telescoped fashion (to simplify notation we write $X_i$, $V_i$ and $\sigma_i$ for $X_{ni}$, $V_{ni}$ and $\sigma_{ni}$, respectively):
$$E\left[f\left(\frac{S_n}{\sigma_n}\right)\right] - E[f(Z)]$$
$$= E\left[f\left(\frac{X_1 + \ldots + X_n}{\sigma_n}\right)\right] - E\left[f\left(\frac{X_1 + \ldots + X_{n-1} + V_n}{\sigma_n}\right)\right]$$
$$+ E\left[f\left(\frac{X_1 + \ldots + X_{n-1} + V_n}{\sigma_n}\right)\right] - E\left[f\left(\frac{X_1 + \ldots + X_{n-2} + V_{n-1} + V_n}{\sigma_n}\right)\right]$$
$$+ \ldots$$
$$+ E\left[f\left(\frac{X_1 + V_2 + \ldots + V_n}{\sigma_n}\right)\right] - E\left[f\left(\frac{V_1 + \ldots + V_n}{\sigma_n}\right)\right].$$
Now let $Y_i = X_1 + \ldots + X_i + V_{i+2} + \ldots + V_n$ for $i = 1, \ldots, n-2$ and $Y_0 = V_2 + \ldots + V_n$ and $Y_{n-1} = X_1 + \ldots + X_{n-1}$. With this notation and employing the two-term Taylor expansion of Corollary 1 one can express a typical difference term in the above telescoped sum as
$$E\left[f\left(\frac{Y_i + X_{i+1}}{\sigma_n}\right)\right] - E\left[f\left(\frac{Y_i + V_{i+1}}{\sigma_n}\right)\right]$$
$$= E\left[\left(\frac{X_{i+1} - V_{i+1}}{\sigma_n}\right) f^{(1)}\left(\frac{Y_i}{\sigma_n}\right) + \frac{X_{i+1}^2 - V_{i+1}^2}{2\sigma_n^2}\, f^{(2)}\left(\frac{Y_i}{\sigma_n}\right) + R_{n,i+1}\right]$$
where
$$|R_{n,i+1}| \le g\left(\frac{X_{i+1}}{\sigma_n}\right) + g\left(\frac{V_{i+1}}{\sigma_n}\right) \le K_f\left[\min\left(\frac{X_{i+1}^2}{\sigma_n^2}, \frac{|X_{i+1}|^3}{\sigma_n^3}\right) + \min\left(\frac{V_{i+1}^2}{\sigma_n^2}, \frac{|V_{i+1}|^3}{\sigma_n^3}\right)\right]. \qquad (3)$$
Noting the independence of $Y_i$ and $(X_{i+1}, V_{i+1})$ and $E(X_{i+1}) = E(V_{i+1}) = 0$ we find
$$E\left[\left(\frac{X_{i+1} - V_{i+1}}{\sigma_n}\right) f^{(1)}\left(\frac{Y_i}{\sigma_n}\right)\right] = E\left[\frac{X_{i+1} - V_{i+1}}{\sigma_n}\right] E\left[f^{(1)}\left(\frac{Y_i}{\sigma_n}\right)\right] = 0\,.$$
Similarly,
$$E\left[\frac{X_{i+1}^2 - V_{i+1}^2}{2\sigma_n^2}\, f^{(2)}\left(\frac{Y_i}{\sigma_n}\right)\right] = E\left[\frac{X_{i+1}^2 - V_{i+1}^2}{2\sigma_n^2}\right] E\left[f^{(2)}\left(\frac{Y_i}{\sigma_n}\right)\right] = \frac{\sigma_{i+1}^2 - \sigma_{i+1}^2}{2\sigma_n^2}\, E\left[f^{(2)}\left(\frac{Y_i}{\sigma_n}\right)\right] = 0\,.$$
Thus
$$\left|E\left[f\left(\frac{Y_i + X_{i+1}}{\sigma_n}\right)\right] - E\left[f\left(\frac{Y_i + V_{i+1}}{\sigma_n}\right)\right]\right| \le E|R_{n,i+1}|\,.$$
Using the triangle inequality on the above telescoped sum we have
$$\left|E\left[f\left(\frac{S_n}{\sigma_n}\right)\right] - E[f(Z)]\right| \le \sum_{i=0}^{n-1} E|R_{n,i+1}| \le \sum_{i=0}^{n-1} E\left[g\left(\frac{X_{i+1}}{\sigma_n}\right)\right] + \sum_{i=0}^{n-1} E\left[g\left(\frac{V_{i+1}}{\sigma_n}\right)\right]$$
$$= \sum_{i=1}^{n} E\left[g\left(\frac{X_i}{\sigma_n}\right)\right] + \sum_{i=1}^{n} E\left[g\left(\frac{V_i}{\sigma_n}\right)\right],$$
changing the running index from $i = 0, \ldots, n-1$ to $i = 1, \ldots, n$ in the last step. We will show that both sums on the right can be made arbitrarily small by letting $n \to \infty$. First consider
$$E\left[g\left(\frac{X_i}{\sigma_n}\right)\right] = E\left[g\left(\frac{X_i}{\sigma_n}\right) I_{\{|X_i| \le \epsilon\sigma_n\}}\right] + E\left[g\left(\frac{X_i}{\sigma_n}\right) I_{\{|X_i| > \epsilon\sigma_n\}}\right]$$
and invoke from (3) the bound $K_f|X_i/\sigma_n|^3$ for the first term and the bound $K_f|X_i/\sigma_n|^2$ for the second term to get
$$E\left[g\left(\frac{X_i}{\sigma_n}\right)\right] \le K_f\, E\left[\frac{|X_i|^3}{\sigma_n^3}\, I_{\{|X_i| \le \epsilon\sigma_n\}}\right] + K_f\, E\left[\frac{|X_i|^2}{\sigma_n^2}\, I_{\{|X_i| > \epsilon\sigma_n\}}\right]$$
$$\le K_f\, \epsilon\, E\left[\frac{|X_i|^2}{\sigma_n^2}\, I_{\{|X_i| \le \epsilon\sigma_n\}}\right] + K_f\, E\left[\frac{|X_i|^2}{\sigma_n^2}\, I_{\{|X_i| > \epsilon\sigma_n\}}\right]$$
$$\le K_f\, \epsilon\, E\left[\frac{|X_i|^2}{\sigma_n^2}\right] + K_f\, E\left[\frac{|X_i|^2}{\sigma_n^2}\, I_{\{|X_i| > \epsilon\sigma_n\}}\right] = K_f\, \epsilon\, \frac{\sigma_i^2}{\sigma_n^2} + K_f\, \frac{1}{\sigma_n^2}\, E\left[X_i^2\, I_{\{|X_i| > \epsilon\sigma_n\}}\right].$$
Summing these we get
$$\sum_{i=1}^n E\left[g\left(\frac{X_i}{\sigma_n}\right)\right] \le K_f\, \epsilon \sum_{i=1}^n \frac{\sigma_i^2}{\sigma_n^2} + K_f\, \frac{1}{\sigma_n^2} \sum_{i=1}^n E\left[X_i^2\, I_{\{|X_i| > \epsilon\sigma_n\}}\right]$$
$$= K_f\, \epsilon + K_f\, \frac{1}{\sigma_n^2} \sum_{i=1}^n E\left[X_i^2\, I_{\{|X_i| > \epsilon\sigma_n\}}\right] = K_f\, \epsilon + K_f\, L_n(\epsilon) \to K_f\, \epsilon$$
as $n \to \infty$ by the assumption of the Lindeberg condition.
Similarly one gets
$$\sum_{i=1}^n E\left[g\left(\frac{V_i}{\sigma_n}\right)\right] \le K_f\, \epsilon + K_f\, \frac{1}{\sigma_n^2} \sum_{i=1}^n E\left[V_i^2\, I_{\{|V_i| > \epsilon\sigma_n\}}\right] \to K_f\, \epsilon\,,$$
provided we can show that the Lindeberg condition also holds for the $V_i$'s. These two convergences to $K_f\, \epsilon$ show that the approximation error can be made arbitrarily small as $n \to \infty$, since $\epsilon > 0$ can be any small number, as long as $f$ and hence $K_f$ is kept fixed.
To show the Lindeberg condition for the $V_i$'s, we first note that the Lindeberg condition for the $X_i$'s entails that
$$\max\{\sigma_1/\sigma_n, \ldots, \sigma_n/\sigma_n\} \to 0 \quad \text{as } n \to \infty. \qquad (4)$$
This follows from
$$\frac{\sigma_i^2}{\sigma_n^2} = \frac{1}{\sigma_n^2}\, E\left[X_i^2\, I_{\{|X_i| \le \epsilon\sigma_n\}}\right] + \frac{1}{\sigma_n^2}\, E\left[X_i^2\, I_{\{|X_i| > \epsilon\sigma_n\}}\right]$$
$$\le \frac{1}{\sigma_n^2}\, \sigma_n^2 \epsilon^2\, E\left[I_{\{|X_i| \le \epsilon\sigma_n\}}\right] + \frac{1}{\sigma_n^2}\, E\left[X_i^2\, I_{\{|X_i| > \epsilon\sigma_n\}}\right] \le \epsilon^2 + \frac{1}{\sigma_n^2}\, E\left[X_i^2\, I_{\{|X_i| > \epsilon\sigma_n\}}\right] \le \epsilon^2 + L_n(\epsilon)\,.$$
From the Lindeberg condition it follows that the second term vanishes as $n \to \infty$ and hence $\sigma_i^2/\sigma_n^2 \le 2\epsilon^2$ as $n \to \infty$ for any $\epsilon > 0$. This shows (4).
The Lindeberg condition for the $V_i$'s is seen as follows, using $|V_i/(\epsilon\sigma_n)| > 1$ in the first $\le$:
$$\frac{1}{\sigma_n^2} \sum_{i=1}^n E\left[V_i^2\, I_{\{|V_i| > \epsilon\sigma_n\}}\right] \le \frac{1}{\epsilon\sigma_n^3} \sum_{i=1}^n E\left[|V_i|^3\right] = \frac{1}{\epsilon\sigma_n^3} \sum_{i=1}^n \sigma_i^3\, E\left[|Z|^3\right]$$
$$\le \frac{1}{\epsilon}\, \max\{\sigma_1/\sigma_n, \ldots, \sigma_n/\sigma_n\} \sum_{i=1}^n \frac{\sigma_i^2}{\sigma_n^2}\, E\left[|Z|^3\right] = \frac{1}{\epsilon}\, \max\{\sigma_1/\sigma_n, \ldots, \sigma_n/\sigma_n\}\, E\left[|Z|^3\right] \to 0$$
as $n \to \infty$. $\Box$
Appendix 2

Here we will exhibit an explicit function $f(x)$ which has three continuous and bounded derivatives and which is 1 for $x \le 0$, decreases from 1 to 0 on the interval $[0,1]$, and is 0 for $x \ge 1$. The form of this function is taken from Thomasian (1969). By taking
$$h(x) = f\left(\frac{x - a}{\delta}\right)$$
we immediately get a corresponding smooth function which descends from 1 to 0 over the interval $[a, a+\delta]$ instead.
The function $f(x)$ is given as follows for $x \in [0,1]$:
$$f(x) = 1 - 140\left(\tfrac{1}{4}x^4 - \tfrac{3}{5}x^5 + \tfrac{1}{2}x^6 - \tfrac{1}{7}x^7\right).$$
Then
$$f'(x) = -140\,x^3(1-x)^3 \quad \text{for } x \in [0,1]\,,$$
hence $f$ is monotone decreasing on $[0,1]$. Further, $f(1) = 0$ and $f(0) = 1$. To check smoothness we have to verify that the first three derivatives join continuously at 0 and 1, i.e., are 0. Clearly, $f'(0) = f'(1) = 0$. Next,
$$f''(x) = -420\,x^2(1-x)^2(1-2x) \quad \text{with } f''(0) = f''(1) = 0$$
and
$$f'''(x) = -420\left[2x(1-x)^2(1-2x) - 2(1-x)x^2(1-2x) - 2x^2(1-x)^2\right] \quad \text{with } f'''(0) = f'''(1) = 0\,.$$
References

Lindeberg, J.W. (1922). Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Mathemat. Z., 15, 211-225.

Thomasian, A. (1969). The Structure of Probability Theory with Applications. McGraw-Hill, New York. (pp. 483-493)