
February 23, 2005  Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS)  main-t


Stein's method for normal approximation

Louis H. Y. Chen and Qi-Man Shao

Institute for Mathematical Sciences, National University of Singapore
3 Prince George's Park, Singapore 118402
E-mail: lhychen@ims.nus.edu.sg

and

Department of Statistics and Applied Probability
National University of Singapore
6 Science Drive 2, Singapore 117543;
Department of Mathematics, University of Oregon
Eugene, OR 97403, USA
E-mail: qmshao@darkwing.uoregon.edu
Stein's method originated in 1972 in a paper in the Proceedings of the Sixth Berkeley Symposium. In that paper, he introduced the method in order to determine the accuracy of the normal approximation to the distribution of a sum of dependent random variables satisfying a mixing condition. Since then, many developments have taken place, both in extending the method beyond normal approximation and in applying the method to problems in other areas. In these lecture notes, we focus on univariate normal approximation, with our main emphasis on his approach exploiting an a priori estimate of the concentration function. We begin with a general explanation of Stein's method as applied to the normal distribution. We then go on to consider expectations of smooth functions, first for sums of independent and locally dependent random variables, and then in the more general setting of exchangeable pairs. The later sections are devoted to the use of concentration inequalities, in obtaining both uniform and non-uniform Berry–Esseen bounds for independent random variables. A number of applications are also discussed.
Contents
1 Introduction 2
2 Fundamentals of Stein's method 8
2.1 Characterization 8
2.2 Properties of the solutions 10
2.3 Construction of the Stein identities 11
3 Normal approximation for smooth functions 13
3.1 Independent random variables 14
3.2 Locally dependent random variables 17
3.3 Exchangeable pairs 19
4 Uniform Berry–Esseen bounds: the bounded case 23
4.1 Independent random variables 23
4.2 Binary expansion of a random integer 28
5 Uniform Berry–Esseen bounds: the independent case 31
5.1 The concentration inequality approach 31
5.2 Proving the Berry–Esseen theorem 34
5.3 A lower bound 35
6 Non-uniform Berry–Esseen bounds: the independent case 39
6.1 A non-uniform concentration inequality 39
6.2 The final result 44
7 Uniform and non-uniform bounds under local dependence 48
8 Appendix 53
References 58
1. Introduction

Stein's method is a way of deriving explicit estimates of the accuracy of the approximation of one probability distribution by another. This is accomplished by comparing expectations, as indicated in the title of Stein's (1986) monograph, Approximate computation of expectations. An upper bound is computed for the difference between the expectations of any one of a (large) family of test functions under the two distributions, each family of test functions determining an associated metric. Any such bound in turn implies a corresponding upper bound for the distance between the two distributions, measured with respect to the associated metric.

Thus, if the family of test functions consists of the indicators of all (measurable) subsets, then accuracy is expressed in terms of the total variation
distance $d_{TV}$ between the two distributions:
$$d_{TV}(P, Q) := \sup_{h \in \mathcal{H}} \Big| \int h \, dP - \int h \, dQ \Big| = \sup_{A} |P(A) - Q(A)|,$$
where $\mathcal{H} = \{1_A;\ A \text{ measurable}\}$. If the distributions are on $\mathbb{R}$, and the test functions are the indicators of all half-lines, then accuracy is expressed using Kolmogorov distance, as is customary in Berry–Esseen theorems:
$$d_K(P, Q) := \sup_{h \in \mathcal{H}} \Big| \int h \, dP - \int h \, dQ \Big| = \sup_{z \in \mathbb{R}} |P(-\infty, z] - Q(-\infty, z]|,$$
where $\mathcal{H} = \{1_{(-\infty,z]};\ z \in \mathbb{R}\}$. If the test functions consist of all uniformly Lipschitz functions $h$ with constant bounded by 1, then the Wasserstein distance results:
$$d_W(P, Q) := \sup_{h \in \mathcal{H}} \Big| \int h \, dP - \int h \, dQ \Big|,$$
where $\mathcal{H} = \{h \colon \mathbb{R} \to \mathbb{R};\ \|h'\| \le 1\} =: \mathrm{Lip}(1)$, and where, for a function $g \colon \mathbb{R} \to \mathbb{R}$, $\|g\|$ denotes $\sup_{x \in \mathbb{R}} |g(x)|$. If the test functions are uniformly bounded and uniformly Lipschitz, bounded Wasserstein distances result:
$$d_{BW(k)}(P, Q) := \sup_{h \in \mathcal{H}} \Big| \int h \, dP - \int h \, dQ \Big|,$$
where $\mathcal{H} = \{h \colon \mathbb{R} \to \mathbb{R};\ \|h\| \le 1,\ \|h'\| \le k\}$. Stein's method applies to all of these distances, and to many more.
For normal approximation on $\mathbb{R}$, Stein began with the observation that
$$\mathbb{E}\{f'(Z) - Z f(Z)\} = 0 \qquad (1.1)$$
for any bounded function $f$ with bounded derivative, if $Z$ has the standard normal distribution $N(0,1)$. This can be verified by partial integration:
$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f'(x) e^{-x^2/2} \, dx
= \Big[ \frac{1}{\sqrt{2\pi}} f(x) e^{-x^2/2} \Big]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x f(x) e^{-x^2/2} \, dx
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x f(x) e^{-x^2/2} \, dx.$$
However, the same partial integration can also be used to solve the differential equation
$$f'(x) - x f(x) = g(x), \qquad \lim_{x \to -\infty} f(x) e^{-x^2/2} = 0, \qquad (1.2)$$
for $g$ any bounded function, giving
$$\int_{-\infty}^{y} g(x) e^{-x^2/2} \, dx = \int_{-\infty}^{y} \{f'(x) - x f(x)\} e^{-x^2/2} \, dx
= \int_{-\infty}^{y} \frac{d}{dx} \big\{ f(x) e^{-x^2/2} \big\} \, dx = f(y) e^{-y^2/2},$$
and hence
$$f(y) = e^{y^2/2} \int_{-\infty}^{y} g(x) e^{-x^2/2} \, dx.$$
Note that this $f$ actually satisfies $\lim_{y \to -\infty} f(y) = 0$, because
$$\Phi(y) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-x^2/2} \, dx \sim (|y| \sqrt{2\pi})^{-1} e^{-y^2/2}$$
as $y \to -\infty$, and that $f$ is bounded (and then, by the same argument, has $\lim_{y \to \infty} f(y) = 0$) if and only if $\int_{-\infty}^{\infty} g(x) e^{-x^2/2} \, dx = 0$, or, equivalently, if $\mathbb{E} g(Z) = 0$.
Hence, taking any bounded function $h$, we observe that the function $f_h$, defined by
$$f_h(x) := e^{x^2/2} \int_{-\infty}^{x} \{h(t) - \mathbb{E} h(Z)\} e^{-t^2/2} \, dt, \qquad (1.3)$$
satisfies (1.2) for $g(x) = h(x) - \mathbb{E} h(Z)$; substituting any random variable $W$ for $x$ in (1.2) and taking expectations then yields
$$\mathbb{E}\{f_h'(W) - W f_h(W)\} = \mathbb{E} h(W) - \mathbb{E} h(Z). \qquad (1.4)$$
Thus the characterization (1.1) of the standard normal distribution also delivers an upper bound for normal approximation with respect to any of the distances introduced above: for any class $\mathcal{H}$ of (bounded) test functions $h$,
$$\sup_{h \in \mathcal{H}} |\mathbb{E} h(W) - \mathbb{E} h(Z)| = \sup_{h \in \mathcal{H}} |\mathbb{E}\{f_h'(W) - W f_h(W)\}|. \qquad (1.5)$$
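As a quick numerical sanity check of the characterization (1.1), the identity $\mathbb{E}\{f'(Z) - Z f(Z)\} = 0$ can be tested by Monte Carlo for any particular smooth $f$; the sketch below is our illustration only (the choice $f(x) = \sin x$ is arbitrary, not from the text).

```python
import numpy as np

# Monte Carlo check of Stein's identity E{f'(Z) - Z f(Z)} = 0 for
# standard normal Z, using the bounded test function f(x) = sin(x),
# whose derivative is cos(x).
rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)

# Empirical average of f'(Z) - Z f(Z); should be close to zero.
stein_residual = np.mean(np.cos(z) - z * np.sin(z))
print(stein_residual)
```

Any other bounded differentiable $f$ would serve equally well; the point is only that the residual vanishes up to Monte Carlo error.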
The curious fact is that the supremum on the right hand side of (1.5), which contains only the random variable $W$, is frequently much simpler to bound than that on the left hand side. The differential characterization (1.1) of the normal distribution is reflected in the fact that the quantity $\mathbb{E}\{f'(W) - W f(W)\}$ is often relatively easily shown to be small, when the structure of $W$ is such as to make normal approximation seem plausible; for instance, when $W$ is a sum of individually small and only weakly dependent random variables. This is illustrated in the following elementary argument for the classical central limit theorem.
Suppose that $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables, with mean zero and unit variance, and such that $\mathbb{E}|X_1|^3 < \infty$; let $W = n^{-1/2} \sum_{j=1}^{n} X_j$. We evaluate the quantity $\mathbb{E}\{f'(W) - W f(W)\}$ for a smooth and very well-behaved function $f$. Since $W$ is a sum of identically distributed components, we have
$$\mathbb{E}\{W f(W)\} = n \, \mathbb{E}\{n^{-1/2} X_1 f(W)\},$$
and $W = n^{-1/2} X_1 + W_1$, where $W_1 = n^{-1/2} \sum_{j=2}^{n} X_j$ is independent of $X_1$.
Hence, by Taylor's expansion,
$$\mathbb{E}\{W f(W)\} = n^{1/2} \mathbb{E}\{X_1 f(W_1 + n^{-1/2} X_1)\}
= n^{1/2} \mathbb{E}\{X_1 (f(W_1) + n^{-1/2} X_1 f'(W_1))\} + \eta_1, \qquad (1.6)$$
where $|\eta_1| \le \tfrac{1}{2} n^{-1/2} \mathbb{E}|X_1|^3 \|f''\|$. On the other hand, again by Taylor's expansion,
$$\mathbb{E} f'(W) = \mathbb{E} f'(W_1 + n^{-1/2} X_1) = \mathbb{E} f'(W_1) + \eta_2, \qquad (1.7)$$
where $|\eta_2| \le n^{-1/2} \mathbb{E}|X_1| \, \|f''\|$. But now, since $\mathbb{E} X_1 = 0$ and $\mathbb{E} X_1^2 = 1$, and since $X_1$ and $W_1$ are independent, it follows that
$$|\mathbb{E}\{f'(W) - W f(W)\}| \le n^{-1/2} \big\{ 1 + \tfrac{1}{2} \mathbb{E}|X_1|^3 \big\} \|f''\|,$$
for any $f$ with bounded second derivative. Hence, if $\mathcal{H}$ is any class of test functions and
$$C_{\mathcal{H}} := \sup_{h \in \mathcal{H}} \|f_h''\|,$$
then it follows that
$$\sup_{h \in \mathcal{H}} |\mathbb{E} h(W) - \mathbb{E} h(Z)| \le C_{\mathcal{H}} \, n^{-1/2} \big\{ 1 + \tfrac{1}{2} \mathbb{E}|X_1|^3 \big\}. \qquad (1.8)$$
The inequality (1.8) thus establishes an accuracy of order $O(n^{-1/2})$ for the normal approximation of $W$, with respect to the distance $d_{\mathcal{H}}$ on probability measures on $\mathbb{R}$ defined by
$$d_{\mathcal{H}}(P, Q) = \sup_{h \in \mathcal{H}} \Big| \int h \, dP - \int h \, dQ \Big|,$$
provided only that $C_{\mathcal{H}} < \infty$. For example, as observed by Erickson (1974), the class of test functions $h$ for the Wasserstein distance $d_W$ is just the Lipschitz functions $\mathrm{Lip}(1)$ with constant no greater than 1, and, for this class, $C_{\mathcal{H}} = 2$: see (2.13) of Lemma 2.3. The distinction between the bounds
for different distances is then reflected in the differences between the values of $C_{\mathcal{H}}$. Computing good estimates of $C_{\mathcal{H}}$ involves studying the properties of the solutions $f_h$ to the Stein equation given in (1.3) for $h \in \mathcal{H}$. Fortunately, for normal approximation, a lot is known.
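To see the order-$n^{-1/2}$ bound (1.8) in action, the following sketch (our illustration; the sample sizes are arbitrary) takes Rademacher summands, for which $\mathbb{E}|X_1|^3 = 1$, and the Wasserstein class with $C_{\mathcal{H}} = 2$, comparing the resulting bound $3/\sqrt{n}$ with the simulated gap for the $\mathrm{Lip}(1)$ test function $h(x) = |x|$.

```python
import numpy as np

# Bound (1.8) with C_H = 2 (Wasserstein class) and Rademacher X_i,
# for which E|X_1|^3 = 1: the bound is 2 * n^{-1/2} * (1 + 1/2) = 3/sqrt(n).
rng = np.random.default_rng(1)
n = 400
bound = 3.0 / np.sqrt(n)

# Simulated |E h(W) - E h(Z)| for h(x) = |x|, a Lip(1) function;
# E|Z| = sqrt(2/pi) for standard normal Z.
reps = 200_000
x = rng.choice([-1.0, 1.0], size=(reps, n))
w = x.sum(axis=1) / np.sqrt(n)
gap = abs(np.mean(np.abs(w)) - np.sqrt(2.0 / np.pi))
print(gap, bound)
```

The simulated gap is typically far smaller than the bound, which is of course only an upper estimate.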
It sadly turns out that this simple argument is not enough to prove the Berry–Esseen theorem, which states that
$$\sup_{z \in \mathbb{R}} |\mathbb{P}[W \le z] - \Phi(z)| \le C n^{-1/2} \mathbb{E}|X_1|^3,$$
for a universal constant $C$. This is because, for $\mathcal{H}$ the set of indicators of half-lines, $C_{\mathcal{H}} = \infty$. However, it is possible to refine the argument, by making less crude estimates of $|\eta_1|$ and $|\eta_2|$. Broadly speaking, if $h = 1_{(-\infty,z]}$, and writing $f_z$ for the corresponding solution $f_h$, then $f_z'$ is smooth, except for a jump discontinuity at $z$, as can be seen from (1.2). Hence, taking $|\eta_2|$ for illustration,
$$|f_z'(W_1 + n^{-1/2} X_1) - f_z'(W_1)| \le n^{-1/2} |X_1| \, M(z, f_z) \, (|W_1| + n^{-1/2} |X_1| + 1), \qquad (1.9)$$
where $M(z, f) := \sup_{x \ne z} \{|f''(x)| / (|x| + 1)\}$, provided that $z$ does not lie between $W_1 + n^{-1/2} X_1$ and $W_1$. If it does, the difference is still bounded by $2 \|f_z'\|$. Hence, since both $\sup_{z \in \mathbb{R}} \|f_z'\|$ and $\sup_{z \in \mathbb{R}} M(z, f_z)$ are finite, explicit bounds of order $O(n^{-1/2})$ can still be deduced for the Berry–Esseen theorem, provided that it can be shown that
$$\sup_{z \in \mathbb{R}} \mathbb{P}(W_1 \in [z, z+h]) \le C (h + n^{-1/2} \mathbb{E}|X_1|^3) \qquad (1.10)$$
for some constant $C$ (the corresponding estimates for $|\eta_1|$ are a little more involved, but the argument can be carried through analogously). Inequalities of this form are precisely the concentration inequalities that play a central part in these notes, and the above discussion illustrates their importance.
If the random variables $X_i$ are dependent, these simple arguments are no longer valid. However, they can be adapted to work effectively, if less cleanly, for many weak dependence structures. For instance, if individual summands have little influence on $W$, as is typical when a normal approximation is expected to be good, then it may be possible to decompose $W$ in the form
$$W = n^{-1/2} (X_i + U_i) + W_i,$$
where $W_i$ is almost independent of $X_i$, and $|U_i|$ is not too large, enabling Taylor expansions analogous to (1.6) and (1.7) to be attempted.
However, especially if Kolmogorov distance is to be considered, it pays to be rather less brutal, making direct use of (1.4), and rewriting the errors in exact, integral form; this will appear time and again in what follows. Better bounds can be achieved as a result, and concentration arguments also become feasible: note that finding sharp bounds for a probability such as $\mathbb{P}(a \le W_i \le a + n^{-1/2}(X_i + U_i))$ is not an easy task when $W_i$, $X_i$ and $U_i$ are dependent.
In these lecture notes, we follow Stein's original theme, by presenting his ideas in the context of normal approximation. We shall focus on the approach using concentration inequalities. This approach dates back to Stein's lectures around 1970 (see Ho & Chen (1978)). It was later extended by Chen (1986) to dependent and non-identically distributed random variables with arbitrary index set. A proof of the Berry–Esseen theorem for independent and non-identically distributed random variables using the concentration inequality approach is given in Section 2 of Chen (1998). The approach has further been shown to be effective in obtaining not only uniform but also non-uniform Berry–Esseen bounds for the accuracy of normal approximation for independent random variables (Chen & Shao, 2001) and for locally dependent random fields as well (Chen & Shao, 2004a). As an extension, a randomized concentration inequality was used to establish uniform and non-uniform Berry–Esseen bounds for non-linear statistics in Chen & Shao (2004b). In view of its current successes, the technique seems well worth further exploration.

We also briefly discuss the exchangeable pairs coupling, and use it to obtain a Berry–Esseen bound for the number of 1s in the binary expansion of a random integer. The use of exchangeable pairs was discussed in Stein's monograph (Stein, 1986), and is now widely known (see, for example, Diaconis, Holmes & Reinert (2004)). Other proofs of the Berry–Esseen theorem which will not be included in these lectures are the inductive argument of Barbour & Hall (1984), Stroock (1993) and Bolthausen (1984), the size bias coupling used by Goldstein & Rinott (1996) and the zero bias transformation introduced by Goldstein & Reinert (1997). The inductive argument works well for uniform bounds when the conditional distribution of the sum under consideration, given any one of the variables in the sum, has a form similar to that of the corresponding unconditional distribution of the sum. The size biasing method works well for many combinatorial problems, such as counting the number of vertices in a random graph; theoretically speaking, the zero bias transformation approach works for arbitrary random variables $W$ with mean zero and finite variance. More on these approaches
can be found in Sections 4.5, 5.2 and 8 of Chapter 4 of this volume. We also refer to Goldstein & Reinert (1997) for some of the important features of the zero bias transformation and to Goldstein (2005) for Berry–Esseen bounds for combinatorial central limit theorems and pattern occurrences using zero and size biasing approaches. Short surveys of normal approximation by Stein's method were given in Rinott & Rotar (2000) and in Chen (1998). Stein's method has also been used to establish bounds on the approximation error in the multivariate central limit theorem, in particular by Götze (1991) and by Rinott & Rotar (1996). A collection of expository lectures on a variety of aspects of Stein's method and its applications has recently been edited by Diaconis & Holmes (2004).

These notes are organized as follows. We begin with a more detailed account of Stein's method than the sketch provided in this section. In Section 3, we consider the expectation of smooth functions under independence and local dependence, and also in the setting of an exchangeable pair having the linear regression property. Some applications are given. Section 4 discusses sums of uniformly bounded random variables. Here, it is shown that Berry–Esseen bounds for distribution functions can be obtained without the use of concentration inequalities. Both the independent case and the problem of counting the number of ones in the binary expansion of a random integer are studied, where in the latter case the use of an exchangeable pair is demonstrated. In Sections 5 and 6, concentration inequalities are invoked to obtain both uniform and non-uniform Berry–Esseen bounds for sums of independent random variables. The last section is devoted to obtaining a uniform Berry–Esseen bound under local dependence.
2. Fundamentals of Stein's method

In this section, we follow Stein (1986), and give a more detailed account of the basic method sketched in the introduction.

2.1. Characterization

Let $Z$ be a standard normally distributed random variable, and let $C_{bd}$ be the set of continuous and piecewise continuously differentiable functions $f \colon \mathbb{R} \to \mathbb{R}$ with $\mathbb{E}|f'(Z)| < \infty$. Stein's method rests on the following characterization, which is a slight strengthening of (1.1).
Lemma 2.1: Let $W$ be a real valued random variable. Then $W$ has a standard normal distribution if and only if
$$\mathbb{E} f'(W) = \mathbb{E}\{W f(W)\} \qquad (2.1)$$
for all $f \in C_{bd}$.
Proof: Necessity. If $W$ has a standard normal distribution, then for $f \in C_{bd}$,
$$\mathbb{E} f'(W) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f'(w) e^{-w^2/2} \, dw
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} f'(w) \Big( \int_{-\infty}^{w} (-x) e^{-x^2/2} \, dx \Big) dw
+ \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} f'(w) \Big( \int_{w}^{\infty} x e^{-x^2/2} \, dx \Big) dw.$$
By Fubini's theorem, it thus follows that
$$\mathbb{E} f'(W) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} \Big( \int_{x}^{0} f'(w) \, dw \Big) (-x) e^{-x^2/2} \, dx
+ \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} \Big( \int_{0}^{x} f'(w) \, dw \Big) x e^{-x^2/2} \, dx
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} [f(x) - f(0)] x e^{-x^2/2} \, dx
= \mathbb{E}\{W f(W)\}.$$
Sufficiency. For fixed $z \in \mathbb{R}$, let $f(w) := f_z(w)$ denote the solution of the equation
$$f'(w) - w f(w) = 1_{(-\infty,z]}(w) - \Phi(z). \qquad (2.2)$$
Multiplying by $e^{-w^2/2}$ on both sides of (2.2) yields
$$\big( e^{-w^2/2} f(w) \big)' = e^{-w^2/2} \big[ 1_{(-\infty,z]}(w) - \Phi(z) \big].$$
Thus $f_z$ is given by
$$f_z(w) = e^{w^2/2} \int_{-\infty}^{w} [1_{(-\infty,z]}(x) - \Phi(z)] e^{-x^2/2} \, dx
= -e^{w^2/2} \int_{w}^{\infty} [1_{(-\infty,z]}(x) - \Phi(z)] e^{-x^2/2} \, dx$$
$$= \begin{cases} \sqrt{2\pi}\, e^{w^2/2}\, \Phi(w) [1 - \Phi(z)] & \text{if } w \le z, \\ \sqrt{2\pi}\, e^{w^2/2}\, \Phi(z) [1 - \Phi(w)] & \text{if } w \ge z. \end{cases} \qquad (2.3)$$
By Lemma 2.2 below, $f_z$ is a bounded continuous and piecewise continuously differentiable function. Suppose that (2.1) holds for all $f \in C_{bd}$. Then it holds for $f_z$, and hence, by (2.2),
$$0 = \mathbb{E}[f_z'(W) - W f_z(W)] = \mathbb{E}[1_{(-\infty,z]}(W) - \Phi(z)] = \mathbb{P}(W \le z) - \Phi(z).$$
Thus $W$ has a standard normal distribution.
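The explicit formula (2.3) makes the sufficiency step easy to probe numerically: evaluating $f_z$ and forming a finite-difference derivative away from the kink at $z$ recovers the right-hand side of (2.2). A sketch (our illustration only; the point $w = -1$ and level $z = 0.5$ are arbitrary choices):

```python
import math

# Evaluate f_z from (2.3) and verify the Stein equation (2.2),
# f'(w) - w f(w) = 1_{(-inf, z]}(w) - Phi(z), at a point w != z,
# using a central finite difference for f'.
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f_z(w, z):
    c = math.sqrt(2.0 * math.pi) * math.exp(w * w / 2.0)
    if w <= z:
        return c * Phi(w) * (1.0 - Phi(z))
    return c * Phi(z) * (1.0 - Phi(w))

z, w, h = 0.5, -1.0, 1e-6
lhs = (f_z(w + h, z) - f_z(w - h, z)) / (2.0 * h) - w * f_z(w, z)
rhs = (1.0 if w <= z else 0.0) - Phi(z)
print(lhs, rhs)
```

Repeating the check on either side of $z$ also makes the jump of $f_z'$ at $z$ visible, matching the discussion around (1.9).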
Equation (2.2) is a particular case of the more general Stein equation
$$f'(w) - w f(w) = h(w) - \mathbb{E} h(Z), \qquad (2.4)$$
which is to be solved for $f$, given a real valued measurable function $h$ with $\mathbb{E}|h(Z)| < \infty$: c.f. (1.2). As for (2.3), the solution $f = f_h$ is given by
$$f_h(w) = e^{w^2/2} \int_{-\infty}^{w} [h(x) - \mathbb{E} h(Z)] e^{-x^2/2} \, dx
= -e^{w^2/2} \int_{w}^{\infty} [h(x) - \mathbb{E} h(Z)] e^{-x^2/2} \, dx. \qquad (2.5)$$
2.2. Properties of the solutions

We now list some basic properties of the solutions (2.3) and (2.5) to the Stein equations (2.2) and (2.4). The reasons why we need them have already been indicated in (1.8) and (1.9), where estimates of $\sup_{h \in \mathcal{H}} \|f_h''\|$, $\sup_{z \in \mathbb{R}} \|f_z'\|$ and $\sup_{z \in \mathbb{R}} \sup_{x \ne z} |f_z''(x)|$ were required, in order to determine our error bounds for the various approximations. For the more detailed arguments to come, further properties are also needed. We defer the proofs to the appendix, since they only involve careful real analysis.

We begin by considering the solution $f_z$ to (2.2), given in (2.3).
Lemma 2.2: The function $f_z$ defined by (2.3) is such that
$$w f_z(w) \text{ is an increasing function of } w. \qquad (2.6)$$
Moreover, for all real $w$, $u$ and $v$,
$$|w f_z(w)| \le 1, \qquad |w f_z(w) - u f_z(u)| \le 1, \qquad (2.7)$$
$$|f_z'(w)| \le 1, \qquad |f_z'(w) - f_z'(v)| \le 1, \qquad (2.8)$$
$$0 < f_z(w) \le \min(\sqrt{2\pi}/4,\ 1/|z|) \qquad (2.9)$$
and
$$|(w+u) f_z(w+u) - (w+v) f_z(w+v)| \le (|w| + \sqrt{2\pi}/4)(|u| + |v|). \qquad (2.10)$$
We mostly use (2.8) and (2.9) for our approximations. If one does not care about the constants, one can easily obtain
$$|f_z'(w)| \le 2 \qquad \text{and} \qquad 0 < f_z(w) \le \sqrt{\pi/2}$$
by using the well-known inequality
$$1 - \Phi(w) \le \min\Big( \frac{1}{2},\ \frac{1}{w \sqrt{2\pi}} \Big) e^{-w^2/2}, \qquad w > 0.$$
Next, we consider the solution $f_h$ to the Stein equation (2.4), as given in (2.5), for any bounded absolutely continuous function $h$.

Lemma 2.3: For any absolutely continuous function $h \colon \mathbb{R} \to \mathbb{R}$, the solution $f_h$ given in (2.5) satisfies
$$\|f_h\| \le \min\big( \sqrt{\pi/2}\, \|h(\cdot) - \mathbb{E} h(Z)\|,\ 2 \|h'\| \big), \qquad (2.11)$$
$$\|f_h'\| \le \min\big( 2 \|h(\cdot) - \mathbb{E} h(Z)\|,\ 4 \|h'\| \big) \qquad (2.12)$$
and
$$\|f_h''\| \le 2 \|h'\|. \qquad (2.13)$$
2.3. Construction of the Stein identities

In this section, we return to the elementary use of Stein's method which led to the simple bound (1.8), but express the remainders $\eta_1$ and $\eta_2$ in a form better suited to deriving more advanced results. We now take $\xi_1, \xi_2, \ldots, \xi_n$ to be independent random variables satisfying $\mathbb{E}\xi_i = 0$, $1 \le i \le n$, and such that $\sum_{i=1}^{n} \mathbb{E}\xi_i^2 = 1$. Thus $\xi_i$ corresponds to the normalized random variable $n^{-1/2} X_i$ of the introduction; here, however, we no longer require the $\xi_i$ to be identically distributed. Much as before, we set
$$W = \sum_{i=1}^{n} \xi_i \qquad \text{and} \qquad W^{(i)} = W - \xi_i, \qquad (2.14)$$
and define
$$K_i(t) = \mathbb{E}\{\xi_i (I_{\{0 \le t \le \xi_i\}} - I_{\{\xi_i \le t < 0\}})\}. \qquad (2.15)$$
It is easy to check that $K_i(t) \ge 0$ for all real $t$, that
$$\int_{-\infty}^{\infty} K_i(t) \, dt = \mathbb{E}\xi_i^2 \qquad \text{and that} \qquad \int_{-\infty}^{\infty} |t| K_i(t) \, dt = \tfrac{1}{2} \mathbb{E}|\xi_i|^3. \qquad (2.16)$$
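The identities (2.16) are easy to confirm numerically. The sketch below (our illustration only) takes a single $\xi$ uniform on $[-1, 1]$, for which $\mathbb{E}\xi^2 = 1/3$ and $\mathbb{E}|\xi|^3 = 1/4$, estimates $K(t)$ on a grid by Monte Carlo, and integrates with the trapezoidal rule.

```python
import numpy as np

# Check (2.16) for xi ~ Uniform(-1, 1): the integral of K should be
# E xi^2 = 1/3 and the integral of |t| K should be E|xi|^3 / 2 = 1/8.
rng = np.random.default_rng(2)
xi = rng.uniform(-1.0, 1.0, size=500_000)

t = np.linspace(-1.0, 1.0, 401)
# K(t) = E{ xi * (1{0 <= t <= xi} - 1{xi <= t < 0}) }
K = np.array([np.mean(xi * (((0.0 <= s) & (s <= xi)).astype(float)
                            - ((xi <= s) & (s < 0.0)).astype(float)))
              for s in t])

dt = np.diff(t)
int_K = np.sum(0.5 * (K[1:] + K[:-1]) * dt)
tK = np.abs(t) * K
int_tK = np.sum(0.5 * (tK[1:] + tK[:-1]) * dt)
print(int_K, int_tK)
```

For this distribution one can also compute $K(t) = (1 - t^2)/4$ on $[-1, 1]$ in closed form, so the Monte Carlo estimate can be compared against it pointwise as well.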
Let $h$ be a measurable function with $\mathbb{E}|h(Z)| < \infty$, and let $f = f_h$ be the corresponding solution of the Stein equation (2.4). Our goal is to estimate
$$\mathbb{E} h(W) - \mathbb{E} h(Z) = \mathbb{E}\{f'(W) - W f(W)\}.$$
The following argument is fundamental to the approach, and many of the tricks reappear repeatedly in what follows.

Since $\xi_i$ and $W^{(i)}$ are independent for each $1 \le i \le n$, we have
$$\mathbb{E}\{W f(W)\} = \sum_{i=1}^{n} \mathbb{E}\{\xi_i f(W)\} = \sum_{i=1}^{n} \mathbb{E}\{\xi_i [f(W) - f(W^{(i)})]\},$$
where the last equality follows because $\mathbb{E}\xi_i = 0$. Writing the final difference in integral form, we thus have
$$\mathbb{E}\{W f(W)\} = \sum_{i=1}^{n} \mathbb{E}\Big\{ \xi_i \int_{0}^{\xi_i} f'(W^{(i)} + t) \, dt \Big\}
= \sum_{i=1}^{n} \mathbb{E}\Big\{ \int_{-\infty}^{\infty} f'(W^{(i)} + t)\, \xi_i (I_{\{0 \le t \le \xi_i\}} - I_{\{\xi_i \le t < 0\}}) \, dt \Big\}
= \sum_{i=1}^{n} \int_{-\infty}^{\infty} \mathbb{E}\big\{ f'(W^{(i)} + t) \big\} K_i(t) \, dt, \qquad (2.17)$$
from the definition of $K_i$ and again using independence. However, because
$$\sum_{i=1}^{n} \int_{-\infty}^{\infty} K_i(t) \, dt = \sum_{i=1}^{n} \mathbb{E}\xi_i^2 = 1,$$
it follows that
$$\mathbb{E} f'(W) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} \mathbb{E}\{f'(W)\} K_i(t) \, dt. \qquad (2.18)$$
Thus, by (2.17) and (2.18),
$$\mathbb{E}\{f'(W) - W f(W)\} = \sum_{i=1}^{n} \int_{-\infty}^{\infty} \mathbb{E}\{f'(W) - f'(W^{(i)} + t)\} K_i(t) \, dt. \qquad (2.19)$$
Equations (2.17) and (2.19) play a key role in proving good normal approximations. Note in particular that (2.19) is an equality, replacing the clumsier bounds for $|\eta_1|$ and $|\eta_2|$ which led to (1.8). This more careful treatment of the errors pays big dividends later on. Note also that (2.17) and (2.19) hold for all bounded absolutely continuous $f$.
3. Normal approximation for smooth functions

Our goal in this section is to estimate $\mathbb{E} h(W) - \mathbb{E} h(Z)$ for various classes of random variables $W$, when $h$ is a smooth function satisfying
$$\|h'\| := \sup_{x} |h'(x)| < \infty. \qquad (3.1)$$
Such estimates lead naturally to bounds on the accuracy of the standard normal approximation to $W$ in terms of the Wasserstein distance $d_W$, and hence suffice to prove the central limit theorem. The next theorem highlights this, and also shows that Wasserstein bounds imply bounds, albeit rather weaker, with respect to the Kolmogorov distance $d_K$.
Theorem 3.1: Assume that there exists a $\delta$ such that, for any uniformly Lipschitz function $h$,
$$|\mathbb{E} h(W) - \mathbb{E} h(Z)| \le \delta \|h'\|. \qquad (3.2)$$
Then
$$d_W(\mathcal{L}(W), N(0,1)) := \sup_{h \in \mathrm{Lip}(1)} |\mathbb{E} h(W) - \mathbb{E} h(Z)| \le \delta; \qquad (3.3)$$
$$d_K(\mathcal{L}(W), N(0,1)) := \sup_{z} |\mathbb{P}(W \le z) - \Phi(z)| \le 2 \delta^{1/2}. \qquad (3.4)$$
Proof: The first statement is immediate from the definition of $d_W$. For the second, we can assume that $\delta \le 1/4$, since otherwise (3.4) is trivial. Let $\alpha = \delta^{1/2} (2\pi)^{1/4}$, and define, for fixed $z$,
$$h_\alpha(w) = \begin{cases} 1 & \text{if } w \le z, \\ 0 & \text{if } w \ge z + \alpha, \\ \text{linear} & \text{if } z \le w \le z + \alpha. \end{cases}$$
Then $\|h_\alpha'\| = 1/\alpha$, and hence, by (3.2),
$$\mathbb{P}(W \le z) - \Phi(z) \le \big[ \mathbb{E} h_\alpha(W) - \mathbb{E} h_\alpha(Z) \big] + \big[ \mathbb{E} h_\alpha(Z) - \Phi(z) \big]
\le \frac{\delta}{\alpha} + \mathbb{P}\{z \le Z \le z + \alpha\}
\le \frac{\delta}{\alpha} + \frac{\alpha}{\sqrt{2\pi}},$$
and hence
$$\mathbb{P}(W \le z) - \Phi(z) \le 2 (2\pi)^{-1/4} \delta^{1/2} \le 2 \delta^{1/2}. \qquad (3.5)$$
Similarly, we have
$$\mathbb{P}(W \le z) - \Phi(z) \ge -2 \delta^{1/2}, \qquad (3.6)$$
proving (3.4).

In the next three sections, we show that (3.2) is satisfied with suitably small $\delta$, when $W$ is a sum of (i) independent random variables, or (ii) locally dependent random variables. We also show that (3.2) is satisfied when (iii) $W$ is such that an exchangeable pair $(W, W')$ can be constructed having the linear regression property:
$$\mathbb{E}\{W' \mid W\} = (1 - \lambda) W \quad \text{for some } 0 < \lambda < 1. \qquad (3.7)$$
3.1. Independent random variables

In this section, we use (2.19) to prove (3.2) for $W$ a sum of independent random variables with zero means and finite third moments, extending (1.8) to non-identically distributed random variables.

Theorem 3.2: Let $\xi_1, \xi_2, \ldots, \xi_n$ be independent random variables satisfying $\mathbb{E}\xi_i = 0$ and $\mathbb{E}|\xi_i|^3 < \infty$ for each $1 \le i \le n$, and such that $\sum_{i=1}^{n} \mathbb{E}\xi_i^2 = 1$. Then Theorem 3.1 can be applied with
$$\delta = 3 \sum_{i=1}^{n} \mathbb{E}|\xi_i|^3. \qquad (3.8)$$
In particular, we have
$$\Big| \mathbb{E}|W| - \sqrt{2/\pi} \Big| \le 3 \sum_{i=1}^{n} \mathbb{E}|\xi_i|^3.$$
Proof: It follows from Lemma 2.3 that $\|f_h''\| \le 2 \|h'\|$. Therefore, by (2.19) and the mean value theorem,
$$|\mathbb{E}\{f_h'(W) - W f_h(W)\}| \le \sum_{i=1}^{n} \int_{-\infty}^{\infty} \mathbb{E}|f_h'(W) - f_h'(W^{(i)} + t)| K_i(t) \, dt
\le 2 \|h'\| \sum_{i=1}^{n} \int_{-\infty}^{\infty} \mathbb{E}(|t| + |\xi_i|) K_i(t) \, dt.$$
Using (2.16), it thus follows that
$$|\mathbb{E}\{f_h'(W) - W f_h(W)\}| \le 2 \|h'\| \sum_{i=1}^{n} \big( \mathbb{E}|\xi_i|^3 / 2 + \mathbb{E}|\xi_i|\, \mathbb{E}\xi_i^2 \big) \le 3 \|h'\| \sum_{i=1}^{n} \mathbb{E}|\xi_i|^3.$$
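The corollary $\big| \mathbb{E}|W| - \sqrt{2/\pi} \big| \le 3 \sum_i \mathbb{E}|\xi_i|^3$ of Theorem 3.2 can be checked by simulation. The sketch below (our illustration; the choice of distribution and of $n$ is arbitrary) takes $\xi_i = U_i / \sqrt{n/3}$ with $U_i \sim \mathrm{Uniform}(-1, 1)$, so that $\sum_i \mathbb{E}\xi_i^2 = 1$.

```python
import numpy as np

# Simulation check of | E|W| - sqrt(2/pi) | <= 3 * sum_i E|xi_i|^3
# for xi_i = U_i / sqrt(n/3), U_i ~ Uniform(-1, 1): then E xi_i^2 = 1/n
# and E|xi_i|^3 = (1/4) * (3/n)**1.5.
rng = np.random.default_rng(3)
n, reps = 100, 200_000
u = rng.uniform(-1.0, 1.0, size=(reps, n))
w = u.sum(axis=1) / np.sqrt(n / 3.0)

delta = 3.0 * n * 0.25 * (3.0 / n) ** 1.5   # the bound 3 * sum E|xi_i|^3
gap = abs(np.mean(np.abs(w)) - np.sqrt(2.0 / np.pi))
print(gap, delta)
```

As with (1.8), the observed gap is much smaller than the bound, which only guarantees the $O(n^{-1/2})$ rate.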
It is actually not necessary to assume the existence of finite third moments in Theorem 3.2. The quantity $\delta$ can be computed in terms of the elements appearing in the statement of the Lindeberg central limit theorem.

Theorem 3.3: Let $\xi_1, \xi_2, \ldots, \xi_n$ be independent random variables satisfying $\mathbb{E}\xi_i = 0$ for each $1 \le i \le n$ and such that $\sum_{i=1}^{n} \mathbb{E}\xi_i^2 = 1$. Then Theorem 3.1 can be applied with
$$\delta = 4 (4 \beta_2 + 3 \beta_3), \qquad (3.9)$$
where
$$\beta_2 = \sum_{i=1}^{n} \mathbb{E}\xi_i^2 I_{\{|\xi_i| > 1\}} \qquad \text{and} \qquad \beta_3 = \sum_{i=1}^{n} \mathbb{E}|\xi_i|^3 I_{\{|\xi_i| \le 1\}}. \qquad (3.10)$$
Proof: We use (2.12) and (2.13) to show that
$$|f_h'(W) - f_h'(W^{(i)} + t)| \le \|h'\| \min\big( 8,\ 2(|t| + |\xi_i|) \big) \le 8 \|h'\| (|t| \wedge 1 + |\xi_i| \wedge 1),$$
where $a \wedge b$ denotes $\min(a, b)$. Substituting this bound into (2.19), we obtain
$$|\mathbb{E} h(W) - \mathbb{E} h(Z)| \le 8 \|h'\| \sum_{i=1}^{n} \int_{-\infty}^{\infty} \mathbb{E}(|t| \wedge 1 + |\xi_i| \wedge 1) K_i(t) \, dt. \qquad (3.11)$$
Now
$$x \int_{-\infty}^{\infty} (|t| \wedge 1) \big\{ 1_{[0,x]}(t) - 1_{[x,0)}(t) \big\} \, dt =
\begin{cases} \tfrac{1}{2}|x| + |x|(|x| - 1) & \text{if } |x| > 1, \\ \tfrac{1}{2}|x|^3 & \text{if } |x| \le 1, \end{cases}$$
so that
$$\int_{-\infty}^{\infty} \mathbb{E}\big( |t| \wedge 1 + |\xi_i| \wedge 1 \big) K_i(t) \, dt
= \mathbb{E}\{|\xi_i|(|\xi_i| - 1) I_{\{|\xi_i| > 1\}}\} + \tfrac{1}{2} \mathbb{E}\{|\xi_i| (|\xi_i| \wedge 1)^2\} + \mathbb{E}\{\xi_i^2\, \mathbb{E}(|\xi_i| \wedge 1)\}$$
$$= \mathbb{E}\{\xi_i^2 I_{\{|\xi_i| > 1\}}\} - \tfrac{1}{2} \mathbb{E}\{|\xi_i| I_{\{|\xi_i| > 1\}}\} + \tfrac{1}{2} \mathbb{E}\{|\xi_i|^3 I_{\{|\xi_i| \le 1\}}\} + \mathbb{E}\{\xi_i^2\, \mathbb{E}(|\xi_i| \wedge 1)\}.$$
Adding over $i$, it follows that
$$|\mathbb{E} h(W) - \mathbb{E} h(Z)| \le 8 \|h'\| \Big\{ \beta_2 + \tfrac{1}{2} \beta_3 + \sum_{i=1}^{n} \mathbb{E}\xi_i^2\, \mathbb{E}(|\xi_i| \wedge 1) \Big\}. \qquad (3.12)$$
However, since both $x^2$ and $(x \wedge 1)$ are increasing functions of $x \ge 0$, it follows that, for any random variable $\xi$,
$$\mathbb{E}\xi^2\, \mathbb{E}(|\xi| \wedge 1) \le \mathbb{E}\{\xi^2 (|\xi| \wedge 1)\} = \mathbb{E}|\xi|^3 I_{\{|\xi| \le 1\}} + \mathbb{E}\xi^2 I_{\{|\xi| > 1\}}, \qquad (3.13)$$
so that the final sum is no greater than $\beta_3 + \beta_2$, completing the bound.
Theorem 3.1 and Theorem 3.3 together yield the Lindeberg central limit theorem, as follows. Let $X_1, X_2, \ldots, X_n$ be independent random variables with $\mathbb{E} X_i = 0$ and $\mathbb{E} X_i^2 < \infty$ for each $1 \le i \le n$. Put
$$S_n = \sum_{i=1}^{n} X_i \qquad \text{and} \qquad B_n^2 = \sum_{i=1}^{n} \mathbb{E} X_i^2.$$
To apply Theorems 3.1 and 3.3, let
$$\xi_i = X_i / B_n \qquad \text{and} \qquad W = S_n / B_n. \qquad (3.14)$$
Define $\beta_2$ and $\beta_3$ as in (3.10), and observe that, for any $0 < \varepsilon < 1$,
$$\beta_2 + \beta_3 = \frac{1}{B_n^2} \sum_{i=1}^{n} \mathbb{E}\{X_i^2 I_{\{|X_i| > B_n\}}\} + \frac{1}{B_n^3} \sum_{i=1}^{n} \mathbb{E}\{|X_i|^3 I_{\{|X_i| \le B_n\}}\}$$
$$\le \frac{1}{B_n^2} \sum_{i=1}^{n} \mathbb{E}\{X_i^2 I_{\{|X_i| > B_n\}}\} + \frac{1}{B_n^3} \sum_{i=1}^{n} B_n \mathbb{E}\{X_i^2 I_{\{\varepsilon B_n \le |X_i| \le B_n\}}\} + \frac{1}{B_n^3} \sum_{i=1}^{n} \varepsilon B_n \mathbb{E}\{X_i^2 I_{\{|X_i| < \varepsilon B_n\}}\}$$
$$\le \frac{1}{B_n^2} \sum_{i=1}^{n} \mathbb{E}\{X_i^2 I_{\{|X_i| > \varepsilon B_n\}}\} + \varepsilon. \qquad (3.15)$$
If the Lindeberg condition holds, that is if
$$\frac{1}{B_n^2} \sum_{i=1}^{n} \mathbb{E}\{X_i^2 I_{\{|X_i| > \varepsilon B_n\}}\} \to 0 \quad \text{as } n \to \infty \qquad (3.16)$$
for all $\varepsilon > 0$, then (3.15) implies $\beta_2 + \beta_3 \to 0$ as $n \to \infty$, since $\varepsilon$ is arbitrary. Hence, by Theorems 3.1 and 3.3,
$$\sup_{z} |\mathbb{P}(S_n / B_n \le z) - \Phi(z)| \le 8 (\beta_2 + \beta_3)^{1/2} \to 0 \quad \text{as } n \to \infty.$$
This proves the Lindeberg central limit theorem.
3.2. Locally dependent random variables

An $m$-dependent sequence of random variables $\xi_i$, $i \in \mathbb{Z}$, is one with the property that, for each $i$, the sets of random variables $\{\xi_j,\ j \le i\}$ and $\{\xi_j,\ j > i + m\}$ are independent. As a special case, sequences of independent random variables are 0-dependent. Local dependence generalizes the notion of $m$-dependence to random variables with arbitrary index set. It is applicable, for instance, to random variables indexed by the vertices of a graph, and such that the collections $\{\xi_i,\ i \in I\}$ and $\{\xi_j,\ j \in J\}$ are independent whenever $I \cap J = \emptyset$ and the graph contains no edges $\{i, j\}$ with $i \in I$ and $j \in J$.

Let $J$ be a finite index set of cardinality $n$, and let $\{\xi_i,\ i \in J\}$ be a random field with zero means and finite variances. Define $W = \sum_{i \in J} \xi_i$, and assume that $\mathrm{Var}(W) = 1$. For $A \subset J$, let $\xi_A$ denote $\{\xi_i,\ i \in A\}$ and $A^c = \{j \in J : j \notin A\}$. We introduce the following two assumptions, defining different strengths of local dependence.

(LD1) For each $i \in J$ there exists $A_i \subset J$ such that $\xi_i$ and $\xi_{A_i^c}$ are independent.

(LD2) For each $i \in J$ there exist $A_i \subset B_i \subset J$ such that $\xi_i$ is independent of $\xi_{A_i^c}$ and $\xi_{A_i}$ is independent of $\xi_{B_i^c}$.
We then define $\eta_i = \sum_{j \in A_i} \xi_j$ and $\tau_i = \sum_{j \in B_i} \xi_j$. Note that, for independent random variables $\xi_i$, we can take $A_i = B_i = \{i\}$, for which then $\xi_i = \eta_i = \tau_i$.
Theorem 3.4: Theorem 3.1 can be applied with
$$\delta = 4\, \mathbb{E}\Big| \sum_{i \in J} \{\xi_i \eta_i - \mathbb{E}(\xi_i \eta_i)\} \Big| + \sum_{i \in J} \mathbb{E}|\xi_i \eta_i^2| \qquad (3.17)$$
under (LD1), and with
$$\delta = 2 \sum_{i \in J} \big( \mathbb{E}|\xi_i \eta_i \tau_i| + |\mathbb{E}(\xi_i \eta_i)|\, \mathbb{E}|\tau_i| \big) + \sum_{i \in J} \mathbb{E}|\xi_i \eta_i^2| \qquad (3.18)$$
under (LD2).

Remark: For independent random variables, the value of $\delta$ in (3.18) is $5 \sum_{i \in J} \mathbb{E}|\xi_i|^3$, somewhat larger than the direct bound given in Theorem 3.2.
Proof: We first derive Stein identities similar to (2.17) and (2.19). Let $f = f_h$ be the solution of the Stein equation (2.4). Then
$$\mathbb{E}\{W f(W)\} = \sum_{i \in J} \mathbb{E}\{\xi_i f(W)\} = \sum_{i \in J} \mathbb{E}\{\xi_i [f(W) - f(W - \eta_i)]\},$$
by the independence of $\xi_i$ and $W - \eta_i$. Hence
$$\mathbb{E}\{W f(W)\} = \sum_{i \in J} \mathbb{E}\{\xi_i [f(W) - f(W - \eta_i) - \eta_i f'(W)]\} + \mathbb{E}\Big\{ \Big( \sum_{i \in J} \xi_i \eta_i \Big) f'(W) \Big\}. \qquad (3.19)$$
Now, because $\mathbb{E}\xi_i = 0$ for all $i$, and from (LD1), it follows that
$$1 = \mathbb{E} W^2 = \sum_{i \in J} \sum_{j \in J} \mathbb{E}\{\xi_i \xi_j\} = \sum_{i \in J} \mathbb{E}\{\xi_i \eta_i\},$$
giving
$$\mathbb{E}\{f'(W) - W f(W)\} = -\, \mathbb{E}\Big\{ \sum_{i \in J} \{\xi_i \eta_i - \mathbb{E}(\xi_i \eta_i)\} f'(W) \Big\}
- \sum_{i \in J} \mathbb{E}\{\xi_i [f(W) - f(W - \eta_i) - \eta_i f'(W)]\}. \qquad (3.20)$$
By (2.12) and (2.13), $\|f'\| \le 4 \|h'\|$ and $\|f''\| \le 2 \|h'\|$. Therefore it follows from (3.20) and the Taylor expansion that
$$|\mathbb{E} h(W) - \mathbb{E} h(Z)| \le \|h'\| \Big\{ 4\, \mathbb{E}\Big| \sum_{i \in J} \{\xi_i \eta_i - \mathbb{E}(\xi_i \eta_i)\} \Big| + \sum_{i \in J} \mathbb{E}|\xi_i \eta_i^2| \Big\}.$$
This proves (3.17).
When (LD2) is satisfied, $f'(W - \tau_i)$ and $\xi_i \eta_i$ are independent for each $i \in J$. Hence, using (3.20), we can write
$$|\mathbb{E} h(W) - \mathbb{E} h(Z)| \le \Big| \mathbb{E} \sum_{i \in J} \{\xi_i \eta_i - \mathbb{E}(\xi_i \eta_i)\} \big( f'(W) - f'(W - \tau_i) \big) \Big| + \|h'\| \sum_{i \in J} \mathbb{E}|\xi_i \eta_i^2|$$
$$\le \|h'\| \Big\{ 2 \sum_{i \in J} \big( \mathbb{E}|\xi_i \eta_i \tau_i| + |\mathbb{E}(\xi_i \eta_i)|\, \mathbb{E}|\tau_i| \big) + \sum_{i \in J} \mathbb{E}|\xi_i \eta_i^2| \Big\},$$
as desired.
Here are two examples of locally dependent random fields. We refer to Baldi & Rinott (1989), Rinott (1994), Baldi, Rinott & Stein (1989), Dembo & Rinott (1996), and Chen & Shao (2004) for more details.

Example 1. Graphical dependence.
Consider a set of random variables $\{X_i,\ i \in V\}$ indexed by the vertices of a graph $G = (V, E)$. $G$ is said to be a dependency graph if, for any pair of disjoint sets $\Gamma_1$ and $\Gamma_2$ in $V$ such that no edge in $E$ has one endpoint in $\Gamma_1$ and the other in $\Gamma_2$, the sets of random variables $\{X_i,\ i \in \Gamma_1\}$ and $\{X_i,\ i \in \Gamma_2\}$ are independent. Let $D$ denote the maximal degree of $G$; that is, the maximal number of edges incident to a single vertex. Let
$$A_i = \{i\} \cup \{j \in V : \text{there is an edge connecting } j \text{ and } i\}$$
and $B_i = \bigcup_{j \in A_i} A_j$. Then $\{X_i,\ i \in V\}$ satisfies (LD2). Hence (3.18) holds.
Example 2. The number of local maxima on a graph.
Consider a graph $G = (V, E)$ (which is not necessarily a dependency graph) and independent and identically distributed continuous random variables $\{Y_i,\ i \in V\}$. For $i \in V$ define the 0-1 indicator variable
$$X_i = \begin{cases} 1 & \text{if } Y_i > Y_j \text{ for all } j \in N_i, \\ 0 & \text{otherwise}, \end{cases}$$
where $N_i = \{j \in V : \{i, j\} \in E\}$, so that $X_i = 1$ indicates that $Y_i$ is a local maximum. Let $W = \sum_{i \in V} X_i$ be the number of local maxima. Let
$$A_i = \{i\} \cup N_i \cup \Big( \bigcup_{j \in N_i} N_j \Big) \qquad \text{and} \qquad B_i = \bigcup_{j \in A_i} A_j.$$
Then $\{X_i,\ i \in V\}$ satisfies (LD2), and (3.18) holds.

One can refer to Dembo & Rinott (1996) and Rinott & Rotar (1996) for more examples and results under local dependence.
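A concrete instance of Example 2 (our illustration, on a cycle graph with 10 vertices, not a case treated in the text): the neighbourhoods $N_i$, the (LD2) sets $A_i$ and $B_i$, and the count $W$ of local maxima can all be computed directly.

```python
import numpy as np

# Example 2 on the cycle C_10: N_i holds the two neighbours of i,
# A_i = {i} u N_i u (union of N_j over j in N_i), B_i = union of A_j over j in A_i.
m = 10
N = {i: {(i - 1) % m, (i + 1) % m} for i in range(m)}
A = {i: {i} | N[i] | set().union(*(N[j] for j in N[i])) for i in range(m)}
B = {i: set().union(*(A[j] for j in A[i])) for i in range(m)}

# Draw i.i.d. continuous Y_i and count the local maxima.
rng = np.random.default_rng(4)
Y = rng.uniform(size=m)
X = [all(Y[i] > Y[j] for j in N[i]) for i in range(m)]
W_count = sum(X)
print(sorted(A[0]), sorted(B[0]), W_count)
```

On the cycle, $A_i$ is the radius-2 ball around $i$ and $B_i$ the radius-4 ball; since adjacent vertices cannot both be strict local maxima, $W$ lies between 1 and 5 here.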
3.3. Exchangeable pairs

Let $W$ be a random variable that is not necessarily the partial sum of independent random variables. Suppose that $W$ is approximately normal, and that we want to find how accurate the approximation is. Another basic approach to Stein's method is to introduce a second random variable $W'$ on the same probability space, in such a way that $(W, W')$ is an exchangeable pair; that is, such that $(W, W')$ and $(W', W)$ have the same distribution. The approach makes essential use of the elementary fact that, if $(W, W')$ is an exchangeable pair, then
$$\mathbb{E} g(W, W') = 0 \qquad (3.21)$$
for all antisymmetric measurable functions $g(x, y)$ such that the expected value exists.
A key identity is the following lemma (see Stein 1986).

Lemma 3.5: Let (W, W') be an exchangeable pair of real random variables with finite variance, having the linear regression property

    IE(W' | W) = (1 − λ)W                                                (3.22)

for some 0 < λ < 1. Then

    IEW = 0  and  IE(W' − W)² = 2λ IEW²,                                 (3.23)

and, for every piecewise continuous function f satisfying the growth condition |f(w)| ≤ C(1 + |w|), we have

    IE{Wf(W)} = (1/(2λ)) IE{(W − W')(f(W) − f(W'))}.                     (3.24)

Proof: The proof exploits (3.21) with g(x, y) = (x − y)(f(y) + f(x)), for which IE g(W, W') exists, because of the assumption on f. Then (3.21) gives

    0 = IE{(W − W')(f(W') + f(W))}
      = IE{(W − W')(f(W') − f(W))} + 2 IE{f(W)(W − W')}                  (3.25)
      = IE{(W − W')(f(W') − f(W))} + 2 IE{f(W) IE(W − W' | W)}
      = IE{(W − W')(f(W') − f(W))} + 2λ IE{Wf(W)},

this last by (3.22), and this is just (3.24).
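The identity (3.24) can be checked numerically. The sketch below (our illustration, not part of the notes) builds the exchangeable pair used later in this section, resampling one coordinate of a sum of Rademacher variables, for which λ = 1/n, and compares both sides of (3.24) by Monte Carlo with f(w) = w³.

```python
# Sketch: Monte Carlo check of IE{W f(W)} = (1/(2λ)) IE{(W-W')(f(W)-f(W'))}
# for W a sum of n Rademacher variables scaled by n^{-1/2}, λ = 1/n.
import math
import random

random.seed(1)
n = 5

def sample_pair():
    xi = [random.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(n)]
    W = sum(xi)
    I = random.randrange(n)                       # coordinate to resample
    xi_new = random.choice((-1.0, 1.0)) / math.sqrt(n)
    return W, W - xi[I] + xi_new                  # (W, W'), exchangeable

f = lambda w: w ** 3
N = 100000
lhs = rhs = 0.0
for _ in range(N):
    W, Wp = sample_pair()
    lhs += W * f(W)                               # IE{W f(W)} = IEW^4
    rhs += (n / 2.0) * (W - Wp) * (f(W) - f(Wp))  # (1/(2λ)) (W-W')(f(W)-f(W'))
lhs /= N
rhs /= N
# for this W, IEW^4 = (3n-2)/n = 2.6 when n = 5
```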
Using this lemma, we can deduce the following theorem.

Theorem 3.6: If (W, W') is exchangeable and (3.22) holds, it follows that Theorem 3.1 can be applied with

    δ = 4 IE|1 − (1/(2λ)) IE{(W' − W)² | W}| + (1/(2λ)) IE|W − W'|³.     (3.26)

Remark: In the first term, note that

    IE{1 − (1/(2λ)) IE((W' − W)² | W)} = 1 − (1/(2λ)) IE(W' − W)² = 1 − IEW²,

so that the bound is unlikely to be useful unless IEW² is close to 1.
Proof: Let f = f_h be the solution (2.5) to the Stein equation, and define

    K̂(t) = (W − W'){ I{−(W − W') ≤ t ≤ 0} − I{0 < t ≤ −(W − W')} } ≥ 0,

noting that

    ∫_{−∞}^{∞} K̂(t) dt = (W − W')².                                     (3.27)

By (3.24),

    IE{Wf(W)} = (1/(2λ)) IE{ (W − W') ∫_{−(W−W')}^{0} f'(W + t) dt }
              = (1/(2λ)) IE{ ∫_{−∞}^{∞} f'(W + t) K̂(t) dt }

and

    IEf'(W) = IE{ f'(W)(1 − (1/(2λ))(W − W')²) } + (1/(2λ)) IE{ ∫_{−∞}^{∞} f'(W) K̂(t) dt },

from (3.27). Putting the two together gives

    |IEh(W) − IEh(Z)| = |IE{f'(W) − Wf(W)}|
      = | IE{ f'(W)(1 − (1/(2λ))(W − W')²) } + (1/(2λ)) IE ∫_{−∞}^{∞} (f'(W) − f'(W + t)) K̂(t) dt |
      ≤ | IE{ f'(W)(1 − (1/(2λ)) IE((W − W')² | W)) } |
        + (1/(2λ)) IE| ∫_{−∞}^{∞} (f'(W) − f'(W + t)) K̂(t) dt |,

and now the bounds in Lemma 2.3 give

    |IEh(W) − IEh(Z)| ≤ ‖h'‖{ 4 IE|1 − (1/(2λ)) IE((W − W')² | W)| + (1/λ) IE ∫_{−∞}^{∞} |t| K̂(t) dt }
      = ‖h'‖{ 4 IE|1 − (1/(2λ)) IE((W' − W)² | W)| + (1/(2λ)) IE|W − W'|³ },

from (3.27), as desired.
We use the following example to show how to apply the bound in the above theorem. Let ξ_i be independent random variables with zero means and Σ_{i=1}^n IEξ_i² = 1, and put W = Σ_{i=1}^n ξ_i. Let {ξ'_i, 1 ≤ i ≤ n} be an independent copy of {ξ_i, 1 ≤ i ≤ n}, and let I have uniform distribution on {1, 2, . . . , n}, independent of {ξ_i} and {ξ'_i}. Define W' = W − ξ_I + ξ'_I. Then (W, W') is an exchangeable pair, and

    IE(W' | W) = (1 − 1/n) W,

so that (3.22) is satisfied with λ = 1/n. Direct calculation also gives

    IE|W − W'|³ = (1/n) Σ_{i=1}^n IE|ξ_i − ξ'_i|³ ≤ (8/n) Σ_{i=1}^n IE|ξ_i|³

and

    IE((W − W')² | W) = (1/n){ 1 + Σ_{i=1}^n IE(ξ_i² | W) };              (3.28)

from the latter, we have

    IE|1 − (1/(2λ)) IE{(W' − W)² | W}| = (1/2) IE|1 − IE( Σ_{i=1}^n ξ_i² | W )|
      ≤ (1/2) IE| Σ_{i=1}^n (ξ_i² − IEξ_i²) |.

Thus, if the ξ_i have finite fourth moments, Theorem 3.1 can be applied with

    δ = 2 ( Σ_{i=1}^n Var(ξ_i²) )^{1/2} + 4 Σ_{i=1}^n IE|ξ_i|³
      ≤ 2 ( Σ_{i=1}^n IE|ξ_i|⁴ )^{1/2} + 4 Σ_{i=1}^n IE|ξ_i|³.

In particular, if ξ_i = n^{−1/2} X_i, where the X_i are independent and identically distributed random variables with finite fourth moments, then the bound is of the correct order O(n^{−1/2}).
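For a concrete distribution the order of this bound is easy to inspect. The sketch below (our illustration; the law of the X_i is an arbitrary choice, not from the notes) evaluates δ = 2(Σ Var(ξ_i²))^{1/2} + 4 Σ IE|ξ_i|³ for ξ_i = n^{−1/2} X_i with X_i uniform on {−1, 0, 1} standardised to unit variance, and confirms that √n·δ(n) does not depend on n.

```python
# Sketch: the exchangeable-pair bound δ(n) for standardised uniform {-1,0,1}
# summands is exactly (sqrt(2) + 4*sqrt(1.5)) / sqrt(n).
import math

def delta(n):
    EX2, EX3, EX4 = 2 / 3, 2 / 3, 2 / 3   # IEX^2, IE|X|^3, IEX^4 for X in {-1,0,1}
    s = math.sqrt(EX2)                    # standard deviation of X
    Exi2 = 1 / n                          # IE ξ_i^2 after standardisation
    Exi3 = EX3 / (s ** 3 * n ** 1.5)      # IE|ξ_i|^3
    Exi4 = EX4 / (s ** 4 * n ** 2)        # IE ξ_i^4
    var_xi2 = Exi4 - Exi2 ** 2            # Var(ξ_i^2)
    return 2 * math.sqrt(n * var_xi2) + 4 * n * Exi3

vals = {n: delta(n) for n in (100, 400, 1600)}
```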
On the other hand, if we keep the original form

    | IE{ f'(W)(1 − (1/(2λ)) IE((W − W')² | W)) } |,

which gave rise to the first term on the right hand side of (3.26), then we only need to assume finite third moments. To see this, note that, by (3.28),

    | IE{ f'(W)(1 − (1/(2λ)) IE((W − W')² | W)) } |
      = (1/2) | IE{ f'(W) Σ_{i=1}^n (IEξ_i² − ξ_i²) } |
      = (1/2) | Σ_{i=1}^n IE{ (f'(W) − f'(W^{(i)}))(IEξ_i² − ξ_i²) } |,

because W^{(i)} and ξ_i are independent. This is now bounded using Lemma 2.3 by

    ‖h'‖ Σ_{i=1}^n IE| ξ_i (IEξ_i² − ξ_i²) | ≤ 2‖h'‖ Σ_{i=1}^n IE|ξ_i|³,

and Theorem 3.1 can be applied with

    δ = 6 Σ_{i=1}^n IE|ξ_i|³.

We therefore end this section with the following alternative to (3.26): if (W, W') is exchangeable and (3.22) holds, then

    |IEh(W) − IEh(Z)| ≤ | IE{ f'_h(W)(1 − (1/(2λ))(W − W')²) } | + (1/(2λ)) ‖h'‖ IE|W − W'|³.   (3.29)
4. Uniform Berry–Esseen bounds: the bounded case

In the previous section, the sharpest bounds in Theorem 3.1 were of order O(δ), and were obtained for the Wasserstein distance d_W. Those for the Kolmogorov distance d_K were only of the larger order O(δ^{1/2}). Here, we turn to deriving bounds for d_K which are of comparable order to those for d_W. We begin with the simplest case of independent summands. The method of proof motivates the formulation of a rather general theorem, which is then applied in a dependent setting, that of the number of 1's in the binary expansion of a randomly chosen integer, using an exchangeable pair.
4.1. Independent random variables

Let ξ_1, ξ_2, . . . , ξ_n be independent random variables with zero means and with Σ_{i=1}^n IEξ_i² = 1. We use the notation of Section 2.3:

    W = Σ_{i=1}^n ξ_i,  W^{(i)} = W − ξ_i  and  K_i(t) = IE{ ξ_i ( I{0 ≤ t ≤ ξ_i} − I{ξ_i ≤ t < 0} ) }.

Let f_z be the solution of the Stein equation (2.1). For bounded ξ_i, we are ready to apply (2.17) to obtain the following Berry–Esseen bound.

Theorem 4.1: If |ξ_i| ≤ δ_0 for 1 ≤ i ≤ n, then

    sup_z |IP(W ≤ z) − Φ(z)| ≤ 3.3 δ_0.                                  (4.1)
Proof: Write f = f_z. It follows from (2.17) and because f satisfies the Stein equation (2.1) that

    IE{Wf(W)} = Σ_{i=1}^n ∫_{−∞}^{∞} IE{f'(W^{(i)} + t)} K_i(t) dt
      = Σ_{i=1}^n ∫_{−∞}^{∞} IE{ (W^{(i)} + t) f(W^{(i)} + t) + I{W^{(i)} + t ≤ z} − Φ(z) } K_i(t) dt.

Reorganizing this, and recalling that

    Σ_{i=1}^n ∫_{−∞}^{∞} K_i(t) dt = Σ_{i=1}^n IEξ_i² = 1,

we have

    Σ_{i=1}^n ∫_{−∞}^{∞} IP(W^{(i)} + t ≤ z) K_i(t) dt − Φ(z)
      = Σ_{i=1}^n ∫_{−∞}^{∞} IE{ Wf(W) − (W^{(i)} + t) f(W^{(i)} + t) } K_i(t) dt.   (4.2)

Now, by (2.10),

    Σ_{i=1}^n ∫_{−∞}^{∞} | IE{ Wf(W) − (W^{(i)} + t) f(W^{(i)} + t) } | K_i(t) dt
      ≤ Σ_{i=1}^n ∫_{−∞}^{∞} IE{ (|W^{(i)}| + √(2π)/4)(|ξ_i| + |t|) } K_i(t) dt
      ≤ (1 + √(2π)/4) Σ_{i=1}^n ∫_{−∞}^{∞} (IE|ξ_i| + |t|) K_i(t) dt,

since IE{W^{(i)}}² ≤ 1 and ξ_i and W^{(i)} are independent. Hence, recalling (2.16), we have

    | Σ_{i=1}^n ∫_{−∞}^{∞} IP(W^{(i)} + t ≤ z) K_i(t) dt − Φ(z) |
      ≤ (1 + √(2π)/4) Σ_{i=1}^n { IE|ξ_i| IEξ_i² + (1/2) IE|ξ_i|³ }
      ≤ (3/2)(1 + √(2π)/4) Σ_{i=1}^n IE|ξ_i|³.                           (4.3)

Hence we would be finished if IP(W^{(i)} + t ≤ z) could be replaced by IP(W ≤ z), since we have Σ_{i=1}^n ∫_{−∞}^{∞} K_i(t) dt = 1. Clearly, in view of the fact that IP(W ≤ z) = IP(W^{(i)} ≤ z − ξ_i), we have

    |IP(W^{(i)} + t ≤ z) − IP(W ≤ z)| ≤ IP(z − max{ξ_i, t} ≤ W^{(i)} ≤ z − min{ξ_i, t}),   (4.4)

and the difference should be small if both |t| and |ξ_i| are.

Since the |ξ_i| are uniformly bounded by δ_0, proving that the difference is small is particularly simple. First, we note that |ξ_i| ≤ δ_0 implies also that K_i(t) = 0 for |t| > δ_0, so that we only need to consider (4.4) with both |t| and |ξ_i| bounded by δ_0. But then

    IP(W^{(i)} + t ≤ z) = IP(W − ξ_i + t ≤ z) { ≤ IP(W ≤ z + 2δ_0);
                                                ≥ IP(W ≤ z − 2δ_0). }    (4.5)

Taking z + 2δ_0 for z in the first inequality in (4.5) and substituting it into (4.3) gives

    IP(W ≤ z) − Φ(z) ≤ Φ(z + 2δ_0) − Φ(z) + (3/2)(1 + √(2π)/4) Σ_{i=1}^n IE|ξ_i|³
      ≤ 2δ_0/√(2π) + (3/2)(1 + √(2π)/4) δ_0 ≤ 3.3 δ_0,                   (4.6)

and the corresponding lower bound follows from the second inequality in (4.5) and (4.3) with z − 2δ_0 for z, completing the proof.
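Theorem 4.1 can be tested exactly in a symmetric Bernoulli example, where the distribution of W is a shifted binomial and no simulation is needed. The following sketch (our illustration, not from the notes) computes sup_z |IP(W ≤ z) − Φ(z)| for ξ_i = ±n^{−1/2}, for which δ_0 = n^{−1/2}, and checks it against 3.3 δ_0; the supremum over real z is attained at the atoms of W, where both the left limit and the value of the cdf are compared with Φ.

```python
# Sketch: exact Kolmogorov distance for W = (2*Bin(n,1/2) - n)/sqrt(n).
from math import comb, erf, sqrt

n = 100
delta0 = 1 / sqrt(n)
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))

pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
cdf = 0.0
kolmogorov = 0.0
for k in range(n + 1):
    z = (2 * k - n) / sqrt(n)
    kolmogorov = max(kolmogorov, abs(cdf - Phi(z)))   # just below the atom
    cdf += pmf[k]
    kolmogorov = max(kolmogorov, abs(cdf - Phi(z)))   # at the atom
```

For n = 100 the exact distance is about 0.04, comfortably below 3.3 δ_0 = 0.33.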
One can see from the above approach that the key ingredient of the proof is to rewrite IE{Wf(W)} in terms of a functional of f'. We formulate this in abstract form as follows.

Theorem 4.2: Let W be a real valued random variable having IE|W| ≤ 1, and let f_z be the solution of the Stein equation (2.1). Suppose that there exist random variables R_1, R_2 and M(t) ≥ 0, and constants δ_0, δ_1 and δ_2 that do not depend on z, such that

    ∫_{|t| ≤ δ_0} M(t) dt = 1,                                            (4.7)

    |R_1| ≤ δ_1,  |IE(R_2)| ≤ δ_2                                         (4.8)

and

    IE{Wf_z(W)} = IE{ ∫_{|t| ≤ δ_0} f'_z(W + R_1 + t) M(t) dt } + IE(R_2).   (4.9)

Then it follows that

    sup_z |IP(W ≤ z) − Φ(z)| ≤ 2.1(δ_0 + δ_1) + δ_2.                      (4.10)

Remark: Note that, although the function M is random, its integral always takes the fixed value 1.
Proof: Since f_z satisfies the Stein equation (2.1), we have

    IE{ ∫_{|t| ≤ δ_0} f'_z(W + R_1 + t) M(t) dt }
      = IE{ ∫_{|t| ≤ δ_0} ( I{W + R_1 + t ≤ z} − Φ(z) ) M(t) dt }
        + IE{ ∫_{|t| ≤ δ_0} (W + R_1 + t) f_z(W + R_1 + t) M(t) dt }
      ≤ IE{ ∫_{|t| ≤ δ_0} ( I{W ≤ z + δ_0 + δ_1} − Φ(z) ) M(t) dt }
        + IE{ ∫_{|t| ≤ δ_0} (W + R_1 + t) f_z(W + R_1 + t) M(t) dt },

where the last line follows because R_1 + t ≤ δ_0 + δ_1. Hence, and since ∫_{|t| ≤ δ_0} M(t) dt = 1, we find that

    IE{ ∫_{|t| ≤ δ_0} f'_z(W + R_1 + t) M(t) dt }
      ≤ IP(W ≤ z + δ_0 + δ_1) − Φ(z) + ∫_{|t| ≤ δ_0} IE{ (W + R_1 + t) f_z(W + R_1 + t) M(t) } dt
      ≤ IP(W ≤ z + δ_0 + δ_1) − Φ(z + δ_0 + δ_1) + (δ_0 + δ_1)/√(2π)
        + ∫_{|t| ≤ δ_0} IE{ (W + R_1 + t) f_z(W + R_1 + t) M(t) } dt.

Thus by (4.9), (4.8) and (4.7),

    IP(W ≤ z + δ_0 + δ_1) − Φ(z + δ_0 + δ_1)
      ≥ −(δ_0 + δ_1)/√(2π) − IER_2
        + ∫_{|t| ≤ δ_0} IE{ [ Wf_z(W) − (W + R_1 + t) f_z(W + R_1 + t) ] M(t) } dt
      ≥ −(δ_0 + δ_1)/√(2π) − δ_2 − ∫_{|t| ≤ δ_0} IE{ (|W| + √(2π)/4)(|R_1| + |t|) M(t) } dt,

this last by (2.10), and so

    IP(W ≤ z + δ_0 + δ_1) − Φ(z + δ_0 + δ_1)
      ≥ −(δ_0 + δ_1)/√(2π) − δ_2 − IE{ ∫_{|t| ≤ δ_0} (|W| + √(2π)/4)(δ_0 + δ_1) M(t) dt }
      = −(δ_0 + δ_1)/√(2π) − δ_2 − (IE|W| + √(2π)/4)(δ_0 + δ_1)
      ≥ −2.1(δ_0 + δ_1) − δ_2.                                            (4.11)

A similar argument gives

    IP(W ≤ z − δ_0 − δ_1) − Φ(z − δ_0 − δ_1) ≤ 2.1(δ_0 + δ_1) + δ_2,      (4.12)

and this proves (4.10).
To see why Theorem 4.1 is a special case of Theorem 4.2, let I be independent of {ξ_i, 1 ≤ i ≤ n} with IP(I = i) = IEξ_i² for i = 1, 2, . . . , n. Then we can rewrite (2.17) as

    IE Wf(W) = IE ∫_{|t| ≤ δ_0} f'(W^{(I)} + t) K̄_I(t) dt
             = IE ∫_{|t| ≤ δ_0} f'(W + (W^{(I)} − W) + t) K̄_I(t) dt,

where K̄_i(t) = K_i(t)/IEξ_i². Set R_1 = W^{(I)} − W, R_2 = 0 and M(t) = K̄_I(t) in Theorem 4.2. It is easy to see that conditions (4.7)–(4.9) are satisfied with δ_1 = δ_0 and δ_2 = 0. Hence (4.10) holds with a bound of 4.2 δ_0.

In the next section, we illustrate how to use Theorem 4.2 to get a Berry–Esseen bound for the number of ones in the binary expansion of a random integer, using an exchangeable pair approach.
4.2. Binary expansion of a random integer

Let n ≥ 2 be a natural number and X be a random variable uniformly distributed over the set {0, 1, . . . , n − 1}. Let k be such that 2^{k−1} < n ≤ 2^k. Write the binary expansion of X as

    X = Σ_{i=1}^k X_i 2^{k−i},

and let S = X_1 + · · · + X_k be the number of ones in the binary expansion of X. When n = 2^k, the distribution of S is the binomial distribution for k trials with probability 1/2, and hence can be approximated by a normal distribution. We shall show that the normal approximation is good for any large n.

Theorem 4.3: Let k be such that 2^{k−1} < n ≤ 2^k, and set W = (S − (k/2))/√(k/4). Then

    sup_z |IP(W ≤ z) − Φ(z)| ≤ 6.2 k^{−1/2}.                             (4.13)
Proof: We use the exchangeable pair approach. Let I be a random variable uniformly distributed over the set {1, 2, . . . , k} and independent of X, and let the random variable X' be defined by

    X' = Σ_{i=1}^k X'_i 2^{k−i},

where

    X'_i = X_i        if i ≠ I,
         = 1 − X_I    if i = I and X + (1 − 2X_I) 2^{k−I} < n,
         = 0          if i = I, X_I = 0 and X + 2^{k−I} ≥ n

(note that, when X_I = 1, the second condition always holds, since flipping a one lowers X). Also let S' = Σ_{i=1}^k X'_i and W' = (S' − (k/2))/√(k/4). The ordered pair (X, X') of random variables is exchangeable, and thus the pairs (S, S') and (W, W') are both exchangeable. Hence, for any function f on {0, 1, . . . , k}, we have

    0 = IE{ (S' − S)(f(S) + f(S')) }
      = 2 IE{ (S' − S) f(S) } + IE{ (S' − S)(f(S') − f(S)) }
      = 2 IE{ f(S) IE(S' − S | X) } + IE{ IE( (S' − S)(f(S') − f(S)) | X ) }.   (4.14)
Observe that

    IE(S' − S | X) = IP(S' − S = 1 | X) − IP(S' − S = −1 | X)
      = IP(X_I = 0, X'_I = 1 | X) − IP(X_I = 1 | X)
      = IP(X_I = 0 | X) − IP(X_I = 0, X'_I = 0 | X) − IP(X_I = 1 | X)
      = IE(1 − X_I | X) − IP(X_I = 0, X'_I = 0 | X) − IP(X_I = 1 | X)
      = (1/k) Σ_{i=1}^k (1 − X_i) − (1/k) Σ_{i=1}^k I{X_i = 0, X + 2^{k−i} ≥ n} − (1/k) Σ_{i=1}^k X_i
      = 1 − 2S/k − Q/k,                                                   (4.15)

where Q = Σ_{i=1}^k I{X_i = 0, X + 2^{k−i} ≥ n}. Now rewrite (4.14), with a re-definition of the (arbitrary) function f, as

    k^{1/2} IE{ f(W) IE(S − S' | X) } = (1/2) k^{1/2} IE{ IE( (f(W') − f(W))(S' − S) | X ) }.   (4.16)

The left hand side of (4.16) is

    k^{1/2} IE{ f(W)( −1 + 2S/k + Q/k ) } = IE{ f(W)( (S − k/2)/(k^{1/2}/2) + Q/k^{1/2} ) }
      = IE{Wf(W)} + k^{−1/2} IE{Qf(W)},                                   (4.17)
and the right hand side of (4.16) can be written as

    (1/2) k^{1/2} IE{ IE{ (f(W') − f(W)) I{S' − S = 1} | X }
                    − IE{ (f(W') − f(W)) I{S' − S = −1} | X } }
      = (1/2) k^{1/2} IE{ IE{ (f(W + 2k^{−1/2}) − f(W)) I{S' − S = 1} | X }
                        − IE{ (f(W − 2k^{−1/2}) − f(W)) I{S' − S = −1} | X } }
      = (1/2) k^{1/2} IE{ (f(W + 2k^{−1/2}) − f(W)) IP(S' − S = 1 | X)
                        − (f(W − 2k^{−1/2}) − f(W)) IP(S' − S = −1 | X) }
      = (1/2) k^{1/2} IE{ (f(W + 2k^{−1/2}) − f(W))( 1 − S/k − Q/k )
                        − (f(W − 2k^{−1/2}) − f(W))(S/k) }.
The latter expression can in turn be written as

    IE{ (f(W + 2k^{−1/2}) − f(W)) ((k − S)/(2k^{1/2})) }
      − IE{ (f(W − 2k^{−1/2}) − f(W)) (S/(2k^{1/2})) }
      − IE{ (f(W + 2k^{−1/2}) − f(W)) (Q/(2k^{1/2})) }                    (4.18)
    = IE ∫_{|t| ≤ 2k^{−1/2}} f'(W + t) M(t) dt − IE{ (f(W + 2k^{−1/2}) − f(W)) (Q/(2k^{1/2})) },

where

    M(t) = (k − S)/(2k^{1/2})  for 0 ≤ t ≤ 2/k^{1/2},
         = S/(2k^{1/2})        for −2/k^{1/2} ≤ t < 0.

Note that M(t) ≥ 0 and ∫_{|t| ≤ 2k^{−1/2}} M(t) dt = 1. So, taking f = f_z as given in (2.2), we have

    IE{Wf_z(W)} = ∫_{|t| ≤ 2k^{−1/2}} IE{ f'_z(W + t) M(t) } dt + IE(R_2),   (4.19)

of the form (4.9) with R_1 = 0 and δ_0 = 2k^{−1/2}, where

    R_2 = −(f(W + 2k^{−1/2}) − f(W)) (Q/(2k^{1/2})) − k^{−1/2} Q f(W).

Then, by (2.9), it follows that

    |IE(R_2)| ≤ ( √(2π)/8 + √(2π)/4 ) IEQ/√k ≤ 2k^{−1/2},

since IEQ ≤ 2, as shown below. It thus follows from Theorem 4.2 that

    sup_z |IP(W ≤ z) − Φ(z)| ≤ 2.1 (2k^{−1/2}) + 2k^{−1/2} = 6.2 k^{−1/2},

as desired.

To show that IEQ ≤ 2, simply observe that, from its definition,

    IEQ = Σ_{i=1}^k IP(X_i = 0, X + 2^{k−i} ≥ n) ≤ Σ_{i=1}^k IP(X ≥ n − 2^{k−i})
        = Σ_{i=1}^k 2^{k−i}/n = (2^k − 1)/n ≤ (2^k − 1)/2^{k−1} ≤ 2.

This completes the proof.
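The bound of Theorem 4.3 can be checked exactly, since the distribution of S is computable by direct enumeration. The sketch below (our illustration, not part of the notes) does this for n = 1000, for which k = 10, comparing the exact Kolmogorov distance of W from N(0, 1) with 6.2 k^{−1/2}.

```python
# Sketch: exact distribution of S = number of ones in the binary expansion of
# X uniform on {0,...,n-1}, and the Kolmogorov distance of the standardised W.
from math import erf, sqrt

def kolmogorov_distance(n):
    k = (n - 1).bit_length()               # 2^{k-1} < n <= 2^k
    counts = [0] * (k + 1)
    for x in range(n):
        counts[bin(x).count("1")] += 1
    Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
    cdf, dist = 0.0, 0.0
    for s, c in enumerate(counts):
        z = (s - k / 2) / sqrt(k / 4)      # the atom of W at S = s
        dist = max(dist, abs(cdf - Phi(z)))
        cdf += c / n
        dist = max(dist, abs(cdf - Phi(z)))
    return k, dist

k, dist = kolmogorov_distance(1000)
bound = 6.2 / sqrt(k)
```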
The binary expansion of a random integer has previously been studied by a number of authors. Diaconis (1977) and Stein (1986) also proved that the distribution of S, the number of ones in the binary expansion, is only of order O(k^{−1/2}) away from the B(k, 1/2), whereas Barbour & Chen (1992) further proved that, if a mixture of the B(k − 1, 1/2) and B(k, 1/2) is used as an approximation, the error can be reduced to O(k^{−1}).
5. Uniform Berry–Esseen bounds: the independent case

Let ξ_1, ξ_2, . . . , ξ_n be independent random variables with zero means and Σ_{i=1}^n IEξ_i² = 1. Let

    γ = Σ_{i=1}^n IE|ξ_i|³                                               (5.1)

and W = Σ_{i=1}^n ξ_i. If IE|ξ_i|³ < ∞, then we have the uniform Berry–Esseen inequality

    sup_z |IP(W ≤ z) − Φ(z)| ≤ C_0 γ                                     (5.2)

and the non-uniform Berry–Esseen inequality

    |IP(W ≤ z) − Φ(z)| ≤ C_1 γ (1 + |z|)^{−3}  for all z ∈ IR,           (5.3)

where both C_0 and C_1 are absolute constants. One can take C_0 = 0.7975 [van Beek (1972)] and C_1 = 31.935 [Paditz (1989)]. We shall use the concentration inequality approach to give a direct proof of (5.2) in this section and of (5.3) in the next section, albeit with different constants. We refer to Chen & Shao (2001) for more details.
5.1. The concentration inequality approach

As indicated in (1.10) and by (4.4) in the proof of Theorem 4.1, a key step in proving the Berry–Esseen bound is to have a good bound for the probability IP(a ≤ W^{(i)} ≤ b), where W^{(i)} = Σ_{j≠i} ξ_j. In this section, we prove such a concentration inequality. The idea is once again to use the Stein identity. More precisely, we use the fact that IEf'(W) is close to IE{Wf(W)} when W is close to the normal. If f' is taken to be the indicator 1_{[a,b]}, then IEf'(W) = IP(a ≤ W ≤ b); on the other hand, choosing f(½(a + b)) = 0, it follows that ‖f‖ = ½(b − a), so that

    |IE{Wf(W)}| ≤ ½(b − a) IE|W| ≤ ½(b − a)

if IEW² = 1. Thus, if IEf'(W) and IE{Wf(W)} are close, it follows that IP(a ≤ W ≤ b) is close to ½(b − a). The proof of the inequality below makes this heuristic precise.

Proposition 5.1: We have

    IP(a ≤ W^{(i)} ≤ b) ≤ √2 (b − a) + (1 + √2) γ                        (5.4)

for all real a < b and for every 1 ≤ i ≤ n.
Proof: Define δ = γ/2 and take

    f(w) = −½(b − a) − δ     if w < a − δ,
         = w − ½(b + a)      if a − δ ≤ w ≤ b + δ,
         = ½(b − a) + δ      if w > b + δ,                               (5.5)

so that f' = 1_{[a−δ, b+δ]} and ‖f‖ = ½(b − a) + δ. Set

    M̂_j(t) = ξ_j ( I{−ξ_j ≤ t ≤ 0} − I{0 < t ≤ −ξ_j} ) ≥ 0,             (5.6)
    M̂(t) = Σ_{j=1}^n M̂_j(t),  M(t) = IE M̂(t).

Since ξ_j and W^{(i)} − ξ_j are independent for j ≠ i, ξ_i is independent of W^{(i)} and IEξ_j = 0 for all j, we have

    IE{W^{(i)} f(W^{(i)})} − IE{ξ_i f(W^{(i)} − ξ_i)}
      = Σ_{j=1}^n IE{ ξ_j [ f(W^{(i)}) − f(W^{(i)} − ξ_j) ] }
      = Σ_{j=1}^n IE{ ξ_j ∫_{−ξ_j}^{0} f'(W^{(i)} + t) dt }.

Hence, using (5.6), we have

    IE{W^{(i)} f(W^{(i)})} − IE{ξ_i f(W^{(i)} − ξ_i)}
      = Σ_{j=1}^n IE{ ∫_{−∞}^{∞} f'(W^{(i)} + t) M̂_j(t) dt }
      = IE{ ∫_{−∞}^{∞} f'(W^{(i)} + t) M̂(t) dt }.

At this point, we have reached a precise replacement for the statement "IEf'(W) is close to IE{Wf(W)}" of the heuristic, with W^{(i)} for W. To exploit it, we first note that

    IE{ ∫_{−∞}^{∞} f'(W^{(i)} + t) M̂(t) dt } ≥ IE{ ∫_{|t| ≤ δ} f'(W^{(i)} + t) M̂(t) dt },

because f' ≥ 0 and M̂(t) ≥ 0. But now the definition of f implies that

    IE{ ∫_{|t| ≤ δ} f'(W^{(i)} + t) M̂(t) dt } ≥ IE{ I{a ≤ W^{(i)} ≤ b} ∫_{|t| ≤ δ} M̂(t) dt }
      = IE{ I{a ≤ W^{(i)} ≤ b} Σ_{j=1}^n |ξ_j| min(δ, |ξ_j|) } ≥ H_{1,1} − H_{1,2},   (5.7)

where

    H_{1,1} = IP(a ≤ W^{(i)} ≤ b) Σ_{j=1}^n IE|ξ_j| min(δ, |ξ_j|)

and

    H_{1,2} = IE| Σ_{j=1}^n { |ξ_j| min(δ, |ξ_j|) − IE|ξ_j| min(δ, |ξ_j|) } |.

A direct calculation yields

    min(x, y) ≥ x − x²/(4y)                                              (5.8)

for x > 0 and y > 0, implying that

    Σ_{j=1}^n IE|ξ_j| min(δ, |ξ_j|) ≥ Σ_{j=1}^n { IEξ_j² − IE|ξ_j|³/(4δ) } = 1/2,   (5.9)

by choice of δ, and hence that

    H_{1,1} ≥ ½ IP(a ≤ W^{(i)} ≤ b).                                     (5.10)

By the Hölder inequality,

    H_{1,2} ≤ { Var( Σ_{j=1}^n |ξ_j| min(δ, |ξ_j|) ) }^{1/2}
            ≤ { Σ_{j=1}^n IEξ_j² min(δ, |ξ_j|)² }^{1/2}
            ≤ δ { Σ_{j=1}^n IEξ_j² }^{1/2} = δ.                          (5.11)

Hence

    IE{ ∫_{−∞}^{∞} f'(W^{(i)} + t) M̂(t) dt } ≥ ½ IP(a ≤ W^{(i)} ≤ b) − δ,   (5.12)

to be compared with the equation IEf'(W) = IP(a ≤ W ≤ b) of the heuristic.

On the other hand, recalling that ‖f‖ ≤ ½(b − a) + δ, we have

    IE{W^{(i)} f(W^{(i)})} − IE{ξ_i f(W^{(i)} − ξ_i)}
      ≤ { ½(b − a) + δ } ( IE|W^{(i)}| + IE|ξ_i| )
      ≤ (√2/2) { (IE|W^{(i)}|)² + (IE|ξ_i|)² }^{1/2} (b − a + 2δ)
      ≤ (√2/2) { IE|W^{(i)}|² + IE|ξ_i|² }^{1/2} (b − a + 2δ)
      = (1/√2)(b − a + 2δ),                                              (5.13)

the inequality to be compared with |IE{Wf(W)}| ≤ ½(b − a) from the heuristic. Combining (5.12) and (5.13) thus gives

    IP(a ≤ W^{(i)} ≤ b) ≤ √2 (b − a) + 2(1 + √2) δ = √2 (b − a) + (1 + √2) γ,

as desired.
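A Monte Carlo sanity check of Proposition 5.1 (our illustration; the choice ξ_i = ±n^{−1/2} is an assumption made only for this example, and gives γ = n^{−1/2}):

```python
# Sketch: estimate IP(a <= W^{(1)} <= b) for Rademacher summands and compare
# with the concentration bound sqrt(2)(b-a) + (1+sqrt(2))*gamma of (5.4).
import math
import random

random.seed(2)
n = 25
gamma = n ** -0.5                        # sum of third absolute moments
a, b = -0.5, 0.5

N = 50000
hits = 0
for _ in range(N):
    w1 = sum(random.choice((-1.0, 1.0)) for _ in range(n - 1)) / math.sqrt(n)
    hits += a <= w1 <= b
prob = hits / N
bound = math.sqrt(2) * (b - a) + (1 + math.sqrt(2)) * gamma
```

Here the exact probability is about 0.459, while the bound evaluates to about 1.90; the bound is far from sharp on such a short interval, but it is what the uniform Berry–Esseen proof needs.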
5.2. Proving the Berry–Esseen theorem

We are now ready to prove the classical Berry–Esseen theorem, with a constant of 7.

Theorem 5.2: We have

    sup_z |IP(W ≤ z) − Φ(z)| ≤ 7γ.                                       (5.14)

Proof: As discussed in the previous section, we need to find a way to replace IP(W^{(i)} + t ≤ z) by IP(W ≤ z) in (4.3). However, it follows from (4.4) that

    | Σ_{i=1}^n ∫_{−∞}^{∞} IP(W^{(i)} + t ≤ z) K_i(t) dt − IP(W ≤ z) |
      ≤ Σ_{i=1}^n ∫_{−∞}^{∞} |IP(W^{(i)} + t ≤ z) − IP(W ≤ z)| K_i(t) dt
      = Σ_{i=1}^n ∫_{−∞}^{∞} IE{ IP( z − max(t, ξ_i) ≤ W^{(i)} ≤ z − min(t, ξ_i) | ξ_i ) } K_i(t) dt.

Proposition 5.1 thus gives the bound

    | Σ_{i=1}^n ∫_{−∞}^{∞} IP(W^{(i)} + t ≤ z) K_i(t) dt − IP(W ≤ z) |
      ≤ Σ_{i=1}^n ∫_{−∞}^{∞} IE{ √2 (|t| + |ξ_i|) + (1 + √2) γ } K_i(t) dt
      = (1 + √2) γ + √2 Σ_{i=1}^n ( ½ IE|ξ_i|³ + IE|ξ_i| IEξ_i² )
      ≤ (1 + 2.5√2) γ.                                                   (5.15)

Now by (4.2),

    |IP(W ≤ z) − Φ(z)| ≤ ( 1 + 2.5√2 + 1.5(1 + √(2π)/4) ) γ ≤ 7γ,

which is (5.14).

We remark that, following the above lines of proof, one can prove that

    sup_z |IP(W ≤ z) − Φ(z)| ≤ 7 Σ_{i=1}^n ( IEξ_i² I{|ξ_i| > 1} + IE|ξ_i|³ I{|ξ_i| ≤ 1} ),

dispensing with the third moment assumption. We leave the proof to the reader. With a more refined concentration inequality, the constant 7 can be reduced to 4.1 (see Chen & Shao (2001)).
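Theorem 5.2 can be checked exactly for standardised Bernoulli summands, for which W is a rescaled binomial and γ = (p² + q²)/√(npq). The following sketch (our illustration, not part of the notes) computes the exact Kolmogorov distance and compares it with 7γ.

```python
# Sketch: exact sup_z |IP(W <= z) - Phi(z)| for W = (Bin(n,p) - np)/sqrt(npq),
# evaluated at the atoms, versus the Berry-Esseen bound 7*gamma.
from math import comb, erf, sqrt

n, p = 100, 0.2
q = 1 - p
sd = sqrt(n * p * q)
gamma = n * (p * q * (p ** 2 + q ** 2)) / sd ** 3   # sum of IE|xi_i|^3
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))

cdf, dist = 0.0, 0.0
for k in range(n + 1):
    z = (k - n * p) / sd
    dist = max(dist, abs(cdf - Phi(z)))             # just below the atom
    cdf += comb(n, k) * p ** k * q ** (n - k)
    dist = max(dist, abs(cdf - Phi(z)))             # at the atom
```

Here γ = 0.17, so 7γ = 1.19, while the exact distance is of the order of a few percent.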
5.3. A lower bound

Let X_i, i ≥ 1, be independent random variables with zero means and finite variances, and define B_n² = Σ_{i=1}^n Var X_i. It is known that if the Feller condition

    max_{1≤i≤n} IEX_i²/B_n² → 0                                          (5.16)

is satisfied, then the Lindeberg condition is necessary for the central limit theorem. Barbour & Hall (1984) used Stein's method to provide not only a nice proof of the necessity, but also a lower bound for the Kolmogorov distance between the distribution of W and the normal, which is as close as can be expected to being a multiple of one of Lindeberg's sums B_n^{−2} Σ_{i=1}^n IE{ X_i² I{B_n^{−1}|X_i| > ε} }. Note that no lower bound could simply be a multiple of a Lindeberg sum, since a sum of n identically distributed normal random variables is itself normally distributed, but the corresponding Lindeberg sum is not zero. The next theorem and its proof are due to Barbour & Hall (1984).
Theorem 5.3: Let ξ_1, ξ_2, . . . , ξ_n be independent random variables which have zero means and finite variances IEξ_i² = σ_i², 1 ≤ i ≤ n, and satisfy Σ_{i=1}^n σ_i² = 1. Then there exists an absolute constant C such that for all ε > 0

    (1 − e^{−ε²/4}) Σ_{i=1}^n IE{ ξ_i² I{|ξ_i| > ε} }
      ≤ C { sup_z |IP(W ≤ z) − Φ(z)| + Σ_{i=1}^n σ_i⁴ }.                 (5.17)

Remark: For the sequence of independent random variables {X_i, i ≥ 1}, we take ξ_i = X_i/B_n. Clearly, Feller's negligibility condition (5.16) implies that Σ_{i=1}^n σ_i⁴ ≤ max_{1≤i≤n} σ_i² → 0 as n → ∞. Therefore, if S_n/B_n is asymptotically normal, then

    Σ_{i=1}^n IE{ ξ_i² I{|ξ_i| > ε} } → 0

as n → ∞ for every ε > 0, and the Lindeberg condition is satisfied.
Proof: Once again, the argument starts with the Stein identity

    IE{ f'_h(W) − W f_h(W) } = IEh(W) − IEh(Z),

for h yet to be chosen. Integrating by parts, the right hand side is bounded by

    |IEh(W) − IEh(Z)| = | ∫ h'(w) { IP(W ≤ w) − Φ(w) } dw | ≤ κ ∫ |h'(w)| dw,   (5.18)

where κ = sup_z |IP(W ≤ z) − Φ(z)|. For the left hand side, in the usual way, because ξ_i and W^{(i)} are independent and IEξ_i = 0, we have

    IE{W f_h(W)} = Σ_{i=1}^n IE{ σ_i² f'_h(W^{(i)}) }
      + Σ_{i=1}^n IE{ ξ_i ( f_h(W^{(i)} + ξ_i) − f_h(W^{(i)}) − ξ_i f'_h(W^{(i)}) ) },

and, because Σ_{i=1}^n σ_i² = 1,

    IEf'_h(W) = Σ_{i=1}^n σ_i² IEf'_h(W^{(i)} + ξ_i)
      = Σ_{i=1}^n σ_i² IEf'_h(W^{(i)}) + Σ_{i=1}^n σ_i² IE{ f'_h(W) − f'_h(W^{(i)}) },

with the last term easily bounded by ½ ‖f'''_h‖ Σ_{i=1}^n σ_i⁴. Hence

    | IE{ f'_h(W) − W f_h(W) } + Σ_{i=1}^n IE{ ξ_i² g(W^{(i)}, ξ_i) } | ≤ ½ ‖f'''_h‖ Σ_{i=1}^n σ_i⁴,   (5.19)

where

    g(w, y) = g_h(w, y) = y^{−1} { f_h(w + y) − f_h(w) − y f'_h(w) }.

Intuitively, if W is close to being normally distributed, then

    R_1 := Σ_{i=1}^n IE{ ξ_i² g(W^{(i)}, ξ_i) }  and  R := Σ_{i=1}^n IE{ ξ_i² g(Z, ξ_i) },

for Z ~ N(0, 1) independent of the ξ_i's, should be close to one another. Taking (5.18) and (5.19) together, it follows that we have a lower bound for κ, provided that a function h can be found with ∫ |h'(w)| dw < ∞ for which IEg_h(Z, y) is of constant sign, and provided also that ‖f'''_h‖ < ∞. In practice, it is easier to look for a suitable f, and then define h by setting h(w) = f'(w) − wf(w). The function g is zero for any linear function f, and, for even functions f, it follows that IEg(Z, y) is antisymmetric in y about zero, but odd functions are possibilities; for instance, f(x) = x³ has IEg(Z, y) = y², of constant sign. Unfortunately, this f fails to have finite ∫ |h'(w)| dw, but the choice f(w) = −we^{−w²/2} is a good one; it behaves much like the sum of a linear and a cubic function for those values of w where N(0, 1) puts most of its mass, yet dies away to zero fast when |w| is large. Making the computations,

    IEg(Z, y) = −(1/(y√(2π))) ∫_{−∞}^{∞} { (w + y) e^{−(w+y)²/2} − w e^{−w²/2} − y (1 − w²) e^{−w²/2} } e^{−w²/2} dw
      = −y^{−1} { (y/(2√(2π))) ∫_{−∞}^{∞} exp{ −½(w² + (w + y)²) } dw − y/(2√2) }
      = (1/(2√2)) (1 − e^{−y²/4}) ≥ 0,                                   (5.20)

and IEg(Z, y) ≥ (1/(2√2))(1 − e^{−ε²/4}) whenever |y| ≥ ε. Hence, for this choice of f, we have

    R ≥ (1/(2√2)) (1 − e^{−ε²/4}) Σ_{i=1}^n IE{ ξ_i² I{|ξ_i| > ε} }      (5.21)
for any ε > 0. It thus remains to show that R and R_1 are close enough, after which (5.18), (5.19) and (5.21) complete the proof.

To show this, note first that, taking this choice of f, and setting h(w) = f'(w) − wf(w), we have

    c_1 := ∫_{−∞}^{∞} |h'(w)| dw ≤ 5;  c_2 := ∫_{−∞}^{∞} |f''(w)| dw ≤ 4;  c_3 := sup_w |f'''(w)| = 3.

Now define an intermediate R_2 between R_1 and R, by

    R_2 := Σ_{i=1}^n IE{ ξ_i² g(W*, ξ_i) },

where W* has the same distribution as W, but is independent of the ξ_i's. Then

    R_1 = Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [ f'(W^{(i)} + tξ_i) − f'(W^{(i)}) ] dt }
        = R_2 + Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [ f'(W^{(i)} + tξ_i) − f'(W* + tξ_i) ] dt }
              − Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [ f'(W^{(i)}) − f'(W*) ] dt }.  (5.22)

Now, for any η, and because ξ_i and W^{(i)} are independent, with IEξ_i = 0,

    | IE{ f'(W* + η) − f'(W^{(i)} + η) } |
      = | IE{ f'(W^{(i)} + ξ_i + η) − f'(W^{(i)} + η) } |
      = | IE{ f'(W^{(i)} + ξ_i + η) − f'(W^{(i)} + η) − ξ_i f''(W^{(i)} + η) } |
      ≤ ½ c_3 σ_i²,

by Taylor's theorem. Hence, from (5.22) and because Σ_{i=1}^n σ_i² = 1,

    |R_1 − R_2| ≤ c_3 Σ_{i=1}^n σ_i⁴.                                    (5.23)

Similarly,

    R_2 = R + Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [ f'(W* + tξ_i) − f'(Z + tξ_i) ] dt }
            − Σ_{i=1}^n IE{ ξ_i² ∫_0^1 [ f'(W*) − f'(Z) ] dt },

and, for any η,

    | IEf'(W* + η) − IEf'(Z + η) | = | ∫ f''(w) { IP(W* ≤ w − η) − Φ(w − η) } dw | ≤ c_2 κ,

so that

    |R_2 − R| ≤ 2 c_2 κ.                                                 (5.24)

Combining (5.18) and (5.19) with (5.23) and (5.24), it follows that

    c_1 κ ≥ R_1 − ½ c_3 Σ_{i=1}^n σ_i⁴ ≥ R − (3/2) c_3 Σ_{i=1}^n σ_i⁴ − 2 c_2 κ.

In view of (5.21), collecting terms, it follows that

    (c_1 + 2c_2) κ + (3/2) c_3 Σ_{i=1}^n σ_i⁴ ≥ (1/(2√2)) (1 − e^{−ε²/4}) Σ_{i=1}^n IE{ ξ_i² I{|ξ_i| > ε} }   (5.25)

for any ε > 0. This proves (5.17), with C ≤ 40.
6. Non-uniform Berry–Esseen bounds: the independent case

In this section, we prove a non-uniform Berry–Esseen bound similar to (5.3), following the proof in Chen & Shao (2001). To do this, as in the previous section, we first need a concentration inequality; it must itself now be non-uniform.

6.1. A non-uniform concentration inequality

Let ξ_1, ξ_2, . . . , ξ_n be independent random variables satisfying IEξ_i = 0 for every 1 ≤ i ≤ n and Σ_{i=1}^n IEξ_i² = 1. Let

    ξ̄_i = ξ_i I{ξ_i ≤ 1},  W̄ = Σ_{i=1}^n ξ̄_i,  W̄^{(i)} = W̄ − ξ̄_i.

Proposition 6.1: We have

    IP(a ≤ W̄^{(i)} ≤ b) ≤ e^{−a/2} ( 5(b − a) + 7γ )                    (6.1)

for all real b > a and for every 1 ≤ i ≤ n, where γ = Σ_{i=1}^n IE|ξ_i|³.
To prove it, we first need the following Bennett–Hoeffding inequality.

Lemma 6.2: Let η_1, η_2, . . . , η_n be independent random variables satisfying IEη_i ≤ 0, η_i ≤ α, for 1 ≤ i ≤ n, and Σ_{i=1}^n IEη_i² ≤ B_n². Put S_n = Σ_{i=1}^n η_i. Then

    IEe^{tS_n} ≤ exp{ α^{−2} (e^{tα} − 1 − tα) B_n² }                    (6.2)

for t > 0,

    IP(S_n ≥ x) ≤ exp{ −(B_n²/α²) [ (1 + xα/B_n²) ln(1 + xα/B_n²) − xα/B_n² ] }   (6.3)

and

    IP(S_n ≥ x) ≤ exp{ −x²/(2(B_n² + αx)) }                              (6.4)

for x > 0.

Proof: It is easy to see that (e^s − 1 − s)/s² is an increasing function of s ∈ IR, from which it follows that

    e^{ts} ≤ 1 + ts + (ts)² (e^{tα} − 1 − tα)/(tα)²                      (6.5)

for s ≤ α, if t > 0. Using the properties of the η_i's, we thus have

    IEe^{tS_n} = Π_{i=1}^n IEe^{tη_i}
      ≤ Π_{i=1}^n { 1 + t IEη_i + α^{−2} (e^{tα} − 1 − tα) IEη_i² }
      ≤ Π_{i=1}^n { 1 + α^{−2} (e^{tα} − 1 − tα) IEη_i² }
      ≤ exp{ α^{−2} (e^{tα} − 1 − tα) B_n² }.

This proves (6.2).

To prove inequality (6.3), let

    t = (1/α) ln( 1 + xα/B_n² ).

Then, by (6.2),

    IP(S_n ≥ x) ≤ e^{−tx} IEe^{tS_n} ≤ exp{ −tx + α^{−2} (e^{tα} − 1 − tα) B_n² }
      = exp{ −(B_n²/α²) [ (1 + xα/B_n²) ln(1 + xα/B_n²) − xα/B_n² ] }.

In view of the fact that

    (1 + s) ln(1 + s) − s ≥ s²/(2(1 + s))

for s > 0, (6.4) follows from (6.3).
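The bounds of Lemma 6.2 are easy to compare numerically. The sketch below (our illustration; the distribution of the η_i is an arbitrary choice) takes η_i uniform on {−1, 0, 1}, so that α = 1 and B_n² = 2n/3, and checks that a simulated tail probability sits below the Bennett bound (6.3), which in turn is sharper than the Bernstein-type bound (6.4).

```python
# Sketch: Monte Carlo tail of S_n versus the bounds (6.3) and (6.4).
import math
import random

random.seed(3)
n, alpha = 30, 1.0
B2 = n * 2 / 3                            # B_n^2 for eta uniform on {-1,0,1}
x = 5.0

N = 50000
tail = sum(sum(random.choice((-1, 0, 1)) for _ in range(n)) >= x
           for _ in range(N)) / N

s = x * alpha / B2
bennett = math.exp(-(B2 / alpha ** 2) * ((1 + s) * math.log(1 + s) - s))
bernstein = math.exp(-x ** 2 / (2 * (B2 + alpha * x)))
```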
Proof of Proposition 6.1: It follows from (6.2) with α = 1 and B_n² = 1 that

    IP(a ≤ W̄^{(i)} ≤ b) ≤ e^{−a/2} IEe^{W̄^{(i)}/2} ≤ e^{−a/2} exp(e^{1/2} − 3/2) ≤ 1.19 e^{−a/2}.

Thus, (6.1) holds if 7γ ≥ 1.19.

We now assume that γ ≤ 1.19/7 = 0.17. Much as in the proof of Proposition 5.1, define δ = γ/2 (≤ 0.085), and set

    f(w) = 0                         if w < a − δ,
         = e^{w/2} (w − a + δ)       if a − δ ≤ w ≤ b + δ,
         = e^{w/2} (b − a + 2δ)      if w > b + δ.                       (6.6)

Put

    M̄_i(t) = ξ̄_i ( I{−ξ̄_i ≤ t ≤ 0} − I{0 < t ≤ −ξ̄_i} ),  M̄^{(i)}(t) = Σ_{j≠i} M̄_j(t).

Clearly, M̄^{(i)}(t) ≥ 0, f'(w) ≥ 0, and f'(w) ≥ e^{w/2} for a − δ ≤ w ≤ b + δ. Arguing much as in the derivation of (5.7), and since IEξ̄_j ≤ 0 and f ≥ 0, we obtain

    IE{W̄^{(i)} f(W̄^{(i)})} ≥ Σ_{j≠i} IE{ ξ̄_j [ f(W̄^{(i)}) − f(W̄^{(i)} − ξ̄_j) ] }
      = Σ_{j≠i} IE{ ∫_{−∞}^{∞} f'(W̄^{(i)} + t) M̄_j(t) dt }
      = IE{ ∫_{−∞}^{∞} f'(W̄^{(i)} + t) M̄^{(i)}(t) dt }.

This expression is now bounded below, giving

    IE{W̄^{(i)} f(W̄^{(i)})} ≥ IE{ I{a ≤ W̄^{(i)} ≤ b} ∫_{|t| ≤ δ} f'(W̄^{(i)} + t) M̄^{(i)}(t) dt }
      ≥ IE{ e^{(W̄^{(i)} − δ)/2} I{a ≤ W̄^{(i)} ≤ b} ∫_{|t| ≤ δ} M̄^{(i)}(t) dt }
      ≥ IE{ e^{(W̄^{(i)} − δ)/2} I{a ≤ W̄^{(i)} ≤ b} Σ_{j≠i} |ξ̄_j| min(δ, |ξ̄_j|) }
      ≥ e^{−δ/2} ( H_{2,1} − H_{2,2} ),                                  (6.7)

where

    H_{2,1} = IE{ e^{W̄^{(i)}/2} I{a ≤ W̄^{(i)} ≤ b} } Σ_{j≠i} IE{ |ξ̄_j| min(δ, |ξ̄_j|) },
    H_{2,2} = IE{ e^{W̄^{(i)}/2} | Σ_{j≠i} { |ξ̄_j| min(δ, |ξ̄_j|) − IE|ξ̄_j| min(δ, |ξ̄_j|) } | }.

Noting that δ ≤ .085 and that γ ≤ .17, and following the proof of (5.9), we have

    Σ_{j≠i} IE{ |ξ̄_j| min(δ, |ξ̄_j|) }
      ≥ Σ_{j=1}^n IE{ |ξ_j| min(δ, |ξ_j|) } − IE{ |ξ_i| min(δ, |ξ_i|) } − δ Σ_{j=1}^n IE{ |ξ_j| I{ξ_j > 1} }
      ≥ 0.5 − δγ^{1/3} − δγ ≥ 0.5 − 0.085 (0.17)^{1/3} − 0.085 (0.17) ≥ 0.43.   (6.8)

Hence

    H_{2,1} ≥ 0.43 e^{a/2} IP(a ≤ W̄^{(i)} ≤ b).                         (6.9)

By the Bennett inequality (6.2) again, we have

    IEe^{W̄^{(i)}} ≤ exp(e − 2)

and hence, as for (5.11),

    H_{2,2} ≤ ( IEe^{W̄^{(i)}} )^{1/2} { Var( Σ_{j≠i} |ξ̄_j| min(δ, |ξ̄_j|) ) }^{1/2}
            ≤ exp( e/2 − 1 ) δ ≤ 1.44 δ.                                 (6.10)

As to the left hand side of (6.7), we have

    IE{W̄^{(i)} f(W̄^{(i)})} ≤ (b − a + 2δ) IE{ |W̄^{(i)}| e^{W̄^{(i)}/2} }
      ≤ (b − a + 2δ) ( IE|W̄^{(i)}|² )^{1/2} ( IEe^{W̄^{(i)}} )^{1/2}
      ≤ (b − a + 2δ) exp(e − 2) ≤ 2.06 (b − a + 2δ).

Combining the above inequalities yields

    IP(a ≤ W̄^{(i)} ≤ b) ≤ (e^{−a/2}/0.43) { e^{δ/2} 2.06 (b − a + 2δ) + 1.44 δ }
      ≤ (e^{−a/2}/0.43) { e^{0.0425} 2.06 (b − a + 2δ) + 1.44 δ }
      ≤ e^{−a/2} ( 5(b − a) + 13.4 δ )
      ≤ e^{−a/2} ( 5(b − a) + 7γ ).

This proves (6.1).
Before proving the main theorem, we need the following moment inequality.

Lemma 6.3: Let 2 < p ≤ 3, and let {ξ_i, 1 ≤ i ≤ n} be independent random variables with IEξ_i = 0 and IE|ξ_i|^p < ∞. Put S_n = Σ_{i=1}^n ξ_i and B_n² = Σ_{i=1}^n IEξ_i². Then

    IE|S_n|^p ≤ (p − 1) B_n^p + Σ_{i=1}^n IE|ξ_i|^p.                     (6.11)

Remark: Moment inequalities of this kind were first proved by Rosenthal (1970), in a more general martingale setting.

Proof: Let S_n^{(i)} = S_n − ξ_i. Then

    IE|S_n|^p = Σ_{i=1}^n IE{ ξ_i S_n |S_n|^{p−2} }
      = Σ_{i=1}^n IE{ ξ_i ( S_n |S_n|^{p−2} − S_n^{(i)} |S_n|^{p−2} ) }
        + Σ_{i=1}^n IE{ ξ_i ( S_n^{(i)} |S_n|^{p−2} − S_n^{(i)} |S_n^{(i)}|^{p−2} ) },

once again because ξ_i and S_n^{(i)} are independent, and IEξ_i = 0. Thus we have

    IE|S_n|^p ≤ Σ_{i=1}^n IE{ ξ_i² |S_n|^{p−2} }
        + Σ_{i=1}^n IE{ |ξ_i| |S_n^{(i)}| { (|S_n^{(i)}| + |ξ_i|)^{p−2} − |S_n^{(i)}|^{p−2} } }
      ≤ Σ_{i=1}^n IE{ ξ_i² ( |ξ_i|^{p−2} + |S_n^{(i)}|^{p−2} ) }
        + Σ_{i=1}^n IE{ |ξ_i| |S_n^{(i)}|^{p−1} { (1 + |ξ_i|/|S_n^{(i)}|)^{p−2} − 1 } }.

Since (1 + x)^{p−2} − 1 ≤ (p − 2)x for x ≥ 0, we thus have

    IE|S_n|^p ≤ Σ_{i=1}^n IE|ξ_i|^p + Σ_{i=1}^n IEξ_i² IE|S_n^{(i)}|^{p−2}
        + Σ_{i=1}^n IE{ |ξ_i| |S_n^{(i)}|^{p−1} (p − 2) |ξ_i|/|S_n^{(i)}| }
      = Σ_{i=1}^n IE|ξ_i|^p + (p − 1) Σ_{i=1}^n IEξ_i² IE|S_n^{(i)}|^{p−2},

and Hölder's inequality now gives

    IE|S_n|^p ≤ Σ_{i=1}^n IE|ξ_i|^p + (p − 1) Σ_{i=1}^n IEξ_i² ( IE|S_n^{(i)}|² )^{(p−2)/2}
      ≤ Σ_{i=1}^n IE|ξ_i|^p + (p − 1) B_n^p,

as desired.
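Lemma 6.3 with p = 3 can be verified exactly for Rademacher summands, for which IE|S_n|³ is a finite sum. The sketch below (our illustration, not part of the notes) checks (6.11) for n = 20, where B_n² = n and the bound reads IE|S_n|³ ≤ 2n^{3/2} + n.

```python
# Sketch: exact IE|S_n|^3 for S_n a sum of n independent signs, versus the
# Rosenthal-type bound (p-1) B_n^p + sum of IE|xi_i|^p of (6.11).
from math import comb

n, p = 20, 3
E_abs_Sp = sum(comb(n, k) / 2 ** n * abs(2 * k - n) ** p for k in range(n + 1))
bound = (p - 1) * n ** (p / 2) + n          # B_n^2 = n, IE|xi_i|^p = 1
```

For n = 20 the exact third absolute moment is about 141, against a bound of about 199.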
6.2. The final result

We are now ready to prove the non-uniform Berry–Esseen inequality.

Theorem 6.4: There exists an absolute constant C such that for every real number z,

    |IP(W ≤ z) − Φ(z)| ≤ Cγ/(1 + |z|³).                                  (6.12)

Proof: Without loss of generality, assume z ≥ 0. By (6.11),

    IP(W > z) ≤ (1 + IE|W|³)/(1 + z³) ≤ (1 + 2 + γ)/(1 + z³).
Thus (6.12) holds if γ ≥ 1, and we can now assume γ < 1. Let

    ξ̄_i = ξ_i I{ξ_i ≤ 1},  W̄ = Σ_{i=1}^n ξ̄_i,  W̄^{(i)} = W̄ − ξ̄_i.

Observing that

    {W > z} = {W > z, max_{1≤i≤n} ξ_i > 1} ∪ {W > z, max_{1≤i≤n} ξ_i ≤ 1}
            ⊂ {W > z, max_{1≤i≤n} ξ_i > 1} ∪ {W̄ > z},

we have

    IP(W > z) ≤ IP(W̄ > z) + IP(W > z, max_{1≤i≤n} ξ_i > 1),             (6.13)

and, since clearly W̄ ≤ W,

    IP(W̄ > z) ≤ IP(W > z).                                              (6.14)

Note that

    IP(W > z, max_{1≤i≤n} ξ_i > 1) ≤ Σ_{i=1}^n IP(W > z, ξ_i > 1)
      ≤ Σ_{i=1}^n IP(ξ_i > max(1, z/2)) + Σ_{i=1}^n IP(W^{(i)} > z/2, ξ_i > 1)
      = Σ_{i=1}^n IP(ξ_i > max(1, z/2)) + Σ_{i=1}^n IP(W^{(i)} > z/2) IP(ξ_i > 1)
      ≤ γ/max(1, z/2)³ + Σ_{i=1}^n ( (1 + IE|W^{(i)}|³)/(1 + (z/2)³) ) IE|ξ_i|³
      ≤ Cγ/(1 + z³);

here, and in what follows, C denotes a generic absolute constant, whose value may be different at each appearance. Thus, to prove (6.12), it suffices to show that

    |IP(W̄ ≤ z) − Φ(z)| ≤ Cγ e^{−z/2}.                                   (6.15)
Let $f_z$ be the solution to the Stein equation (2.2), and define
\[
\bar K_i(t) = IE\big\{\bar\xi_i\big(I_{\{0 \le t \le \bar\xi_i\}} - I_{\{\bar\xi_i \le t < 0\}}\big)\big\}.
\]
We follow the proof of (2.17), noting that $\bar\xi_i \le 1$, and that $IE\,\bar\xi_i$ need no longer in general be equal to zero, but may be negative. This gives
\[
IE\{\bar W f_z(\bar W)\} = \sum_{i=1}^n \int_{-\infty}^1 IE f_z'(\bar W^{(i)} + t)\,\bar K_i(t)\,dt
+ \sum_{i=1}^n IE\,\bar\xi_i\,IE f_z(\bar W^{(i)}).
\]
From
\[
\sum_{i=1}^n \int_{-\infty}^1 \bar K_i(t)\,dt = \sum_{i=1}^n IE\,\bar\xi_i^2 = 1 - \sum_{i=1}^n IE\,\xi_i^2 I_{\{\xi_i > 1\}},
\]
we obtain that
\[
IP(\bar W \le z) - \Phi(z) = IE f_z'(\bar W) - IE\{\bar W f_z(\bar W)\}
\]
\[
= \sum_{i=1}^n IE\{\xi_i^2 I_{\{\xi_i > 1\}}\}\,IE f_z'(\bar W)
+ \sum_{i=1}^n \int_{-\infty}^1 IE\big[f_z'(\bar W^{(i)} + \bar\xi_i) - f_z'(\bar W^{(i)} + t)\big]\bar K_i(t)\,dt
\]
\[
\quad + \sum_{i=1}^n IE\{\xi_i I_{\{\xi_i > 1\}}\}\,IE f_z(\bar W^{(i)})
\]
\[
:= R_1 + R_2 + R_3.   (6.16)
\]
By (8.3), (2.8) and (6.2),
\[
IE|f_z'(\bar W)| = IE\{|f_z'(\bar W)|I_{\{\bar W \le z/2\}}\} + IE\{|f_z'(\bar W)|I_{\{\bar W > z/2\}}\}
\]
\[
\le \big(1 + \sqrt{2\pi}(z/2)e^{z^2/8}\big)(1 - \Phi(z)) + IP(\bar W > z/2)
\]
\[
\le \big(1 + \sqrt{2\pi}(z/2)e^{z^2/8}\big)(1 - \Phi(z)) + e^{-z/2}IE e^{\bar W} \le Ce^{-z/2},
\]
and hence
\[
|R_1| \le C\gamma e^{-z/2}.   (6.17)
\]
Similarly, we have $IE f_z(\bar W^{(i)}) \le Ce^{-z/2}$ and
\[
|R_3| \le C\gamma e^{-z/2}.   (6.18)
\]
To estimate $R_2$, write
\[
R_2 = R_{2,1} + R_{2,2},
\]
where
\[
R_{2,1} = \sum_{i=1}^n \int_{-\infty}^1 IE\big[I_{\{\bar W^{(i)} + \bar\xi_i \le z\}} - I_{\{\bar W^{(i)} + t \le z\}}\big]\,\bar K_i(t)\,dt,
\]
\[
R_{2,2} = \sum_{i=1}^n \int_{-\infty}^1 IE\big[(\bar W^{(i)} + \bar\xi_i)f_z(\bar W^{(i)} + \bar\xi_i)
- (\bar W^{(i)} + t)f_z(\bar W^{(i)} + t)\big]\,\bar K_i(t)\,dt.
\]
By Proposition 6.1,
\[
R_{2,1} \le \sum_{i=1}^n \int_{-\infty}^1 IE\big\{I_{\{\bar\xi_i \le t\}}\,IP(z - t < \bar W^{(i)} \le z - \bar\xi_i \mid \bar\xi_i)\big\}\bar K_i(t)\,dt
\]
\[
\le C\sum_{i=1}^n \int_{-\infty}^1 e^{-(z-t)/2}\,IE(|\bar\xi_i| + |t| + \gamma)\,\bar K_i(t)\,dt
\le C\gamma e^{-z/2}.   (6.19)
\]
From Lemma 6.5, proved below, it follows that
\[
R_{2,2} \le \sum_{i=1}^n \int_{-\infty}^1 IE\Big\{I_{\{t \le \bar\xi_i\}}\big[IE\big((\bar W^{(i)} + \bar\xi_i)f_z(\bar W^{(i)} + \bar\xi_i) \mid \bar\xi_i\big)
- IE(\bar W^{(i)} + t)f_z(\bar W^{(i)} + t)\big]\Big\}\bar K_i(t)\,dt
\]
\[
\le Ce^{-z/2}\sum_{i=1}^n \int_{-\infty}^1 IE(|\bar\xi_i| + |t|)\,\bar K_i(t)\,dt \le C\gamma e^{-z/2}.   (6.20)
\]
Therefore
\[
R_2 \le C\gamma e^{-z/2}.   (6.21)
\]
Similarly, we have
\[
R_2 \ge -C\gamma e^{-z/2}.   (6.22)
\]
This proves the theorem.
It remains to prove the following lemma.

Lemma 6.5: For $s < t \le 1$, we have
\[
IE\{(\bar W^{(i)} + t)f_z(\bar W^{(i)} + t)\} - IE\{(\bar W^{(i)} + s)f_z(\bar W^{(i)} + s)\}
\le Ce^{-z/2}(|s| + |t|).   (6.23)
\]
Proof: Let $g(w) = (wf_z(w))'$. Then
\[
IE\{(\bar W^{(i)} + t)f_z(\bar W^{(i)} + t)\} - IE\{(\bar W^{(i)} + s)f_z(\bar W^{(i)} + s)\}
= \int_s^t IE g(\bar W^{(i)} + u)\,du.
\]
From the definition of $g$ and $f_z$, we get
\[
g(w) = \begin{cases}
\big(\sqrt{2\pi}(1 + w^2)e^{w^2/2}(1 - \Phi(w)) - w\big)\,\Phi(z), & w \ge z, \\
\big(\sqrt{2\pi}(1 + w^2)e^{w^2/2}\Phi(w) + w\big)\,(1 - \Phi(z)), & w < z.
\end{cases}
\]
By (2.6), $g(w) \ge 0$ for all real $w$. A direct calculation shows that
\[
\sqrt{2\pi}(1 + w^2)e^{w^2/2}\Phi(w) + w \le 2 \quad \text{for } w \le 0.   (6.24)
\]
Thus, we have
\[
g(w) \le \begin{cases}
4(1 + z^2)e^{z^2/8}(1 - \Phi(z)) & \text{if } w \le z/2, \\
4(1 + z^2)e^{z^2/2}(1 - \Phi(z)) & \text{if } z/2 < w \le z,
\end{cases}
\]
and this latter bound holds also for $w > z$, as can be seen by applying (6.24) with $-w$ for $w$ to the formula for $g$ in $w \ge z$. Hence, for any $u \in [s, t]$, we have
\[
IE g(\bar W^{(i)} + u)
= IE\big\{g(\bar W^{(i)} + u)I_{\{\bar W^{(i)} + u \le z/2\}}\big\} + IE\big\{g(\bar W^{(i)} + u)I_{\{\bar W^{(i)} + u > z/2\}}\big\}
\]
\[
\le 4(1 + z^2)e^{z^2/8}(1 - \Phi(z)) + 4(1 + z^2)e^{z^2/2}(1 - \Phi(z))\,IP(\bar W^{(i)} + u > z/2)
\]
\[
\le Ce^{-z/2} + C(1 + z)e^{-z + 2u}\,IE e^{2\bar W^{(i)}}.
\]
But $u \le t \le 1$, and so
\[
IE g(\bar W^{(i)} + u) \le Ce^{-z/2} + C(1 + z)e^{-z}\,IE e^{2\bar W^{(i)}} \le Ce^{-z/2},
\]
by (6.2). This gives
\[
IE\{(\bar W^{(i)} + t)f_z(\bar W^{(i)} + t)\} - IE\{(\bar W^{(i)} + s)f_z(\bar W^{(i)} + s)\} \le Ce^{-z/2}(|s| + |t|),
\]
proving (6.23).
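To get a feel for the non-uniform bound (6.12) just established, one can compute the left-hand side exactly for a concrete $W$. The sketch below (Python; the symmetric Bernoulli summands and the constant $C = 10$ are illustrative assumptions, not taken from the text) uses $W = \sum_{i=1}^n \varepsilon_i/\sqrt{n}$ with i.i.d. signs $\varepsilon_i$, for which $\gamma = n^{-1/2}$ and $IP(W \le z)$ is an exact binomial probability.

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 400
gamma = n ** -0.5          # gamma = sum_i IE|xi_i|^3 when xi_i = eps_i / sqrt(n)

def P_W_le(z):
    # W = sum_i eps_i / sqrt(n) <= z  iff  #{i: eps_i = +1} <= (n + z*sqrt(n))/2
    k = math.floor((n + z * math.sqrt(n)) / 2)
    k = max(min(k, n), -1)
    return sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n

for z in [0.0, 1.0, 2.0, 4.0, 8.0]:
    err = abs(P_W_le(z) - Phi(z))
    bound = 10 * gamma / (1 + abs(z) ** 3)   # C = 10 chosen for illustration
    print(f"z={z:3.0f}  error={err:.2e}  bound={bound:.2e}")
    assert err <= bound
```

The error shrinks much faster in $z$ than a uniform Berry–Esseen bound would suggest, which is exactly the point of the non-uniform estimate.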
7. Uniform and non-uniform bounds under local dependence

In this section, we extend the discussion of normal approximation under local dependence using Stein's method, which was begun in Section 3.2. Our aim is to establish optimal uniform and non-uniform Berry–Esseen bounds under local dependence.
Throughout this section, let $J$ be an index set of cardinality $n$ and let $\{\xi_i, i \in J\}$ be a random field with zero means and finite variances. Define $W = \sum_{i\in J}\xi_i$ and assume that $\mathrm{Var}(W) = 1$. For $A \subset J$, let $\xi_A$ denote $\{\xi_i, i \in A\}$, $A^c = \{j \in J : j \notin A\}$ and $|A|$ the cardinality of $A$. We introduce four dependence assumptions, the first two of which appeared in Section 3.2.
(LD1) For each $i \in J$ there exists $A_i \subset J$ such that $\xi_i$ and $\xi_{A_i^c}$ are independent.
(LD2) For each $i \in J$ there exist $A_i \subset B_i \subset J$ such that $\xi_i$ is independent of $\xi_{A_i^c}$ and $\xi_{A_i}$ is independent of $\xi_{B_i^c}$.
(LD3) For each $i \in J$ there exist $A_i \subset B_i \subset C_i \subset J$ such that $\xi_i$ is independent of $\xi_{A_i^c}$, $\xi_{A_i}$ is independent of $\xi_{B_i^c}$, and $\xi_{B_i}$ is independent of $\xi_{C_i^c}$.
(LD4$^*$) For each $i \in J$ there exist $A_i \subset B_i \subset B_i^* \subset C_i^* \subset D_i^* \subset J$ such that $\xi_i$ is independent of $\xi_{A_i^c}$, $\xi_{A_i}$ is independent of $\xi_{B_i^c}$, and then $\xi_{A_i}$ is independent of $\{\xi_{A_j}, j \in B_i^{*c}\}$, $\{\xi_{A_l}, l \in B_i^*\}$ is independent of $\{\xi_{A_j}, j \in C_i^{*c}\}$, and $\{\xi_{A_l}, l \in C_i^*\}$ is independent of $\{\xi_{A_j}, j \in D_i^{*c}\}$.
It is clear that (LD4$^*$) implies (LD3), (LD3) yields (LD2), and (LD1) is the weakest assumption. Roughly speaking, (LD4$^*$) is a version of (LD3) for $\{\xi_{A_i}, i \in J\}$. On the other hand, (LD1) in many cases actually implies (LD2), (LD3) and (LD4$^*$), and $B_i$, $C_i$, $B_i^*$, $C_i^*$ and $D_i^*$ could be chosen as:
\[
B_i = \bigcup_{j\in A_i} A_j, \quad C_i = \bigcup_{j\in B_i} A_j, \quad
B_i^* = \bigcup_{j\in A_i} B_j, \quad C_i^* = \bigcup_{j\in B_i^*} B_j
\quad \text{and} \quad D_i^* = \bigcup_{j\in C_i^*} B_j.
\]
We first present a general uniform Berry–Esseen bound under assumption (LD2).
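These unions are easy to compute for a concrete dependency structure. The sketch below (Python; the 1-dependent index set on a line is a made-up example) builds $B_i$, $C_i$, $B_i^*$, $C_i^*$, $D_i^*$ from given $A_i$ exactly by the unions above.

```python
# Build the larger neighbourhoods from A_i, for a 1-dependent sequence on
# J = {0, ..., n-1} where A_i = {i-1, i, i+1} (clipped at the boundary).
n = 10
J = range(n)
A = {i: {j for j in (i - 1, i, i + 1) if 0 <= j < n} for i in J}

def union_over(index_sets, base):
    # For each i, the union of base[j] over j in index_sets[i].
    return {i: set().union(*(base[j] for j in index_sets[i])) for i in J}

B = union_over(A, A)      # B_i  = union of A_j over j in A_i
C = union_over(B, A)      # C_i  = union of A_j over j in B_i
Bs = union_over(A, B)     # B*_i = union of B_j over j in A_i
Cs = union_over(Bs, B)    # C*_i = union of B_j over j in B*_i
Ds = union_over(Cs, B)    # D*_i = union of B_j over j in C*_i

# Away from the boundary these are intervals of radius 1, 2, 3, 3, 5, 7
# around i, so the required inclusions A_i c B_i c ... c D*_i all hold.
print(sorted(A[5]), sorted(B[5]), sorted(C[5]), sorted(Ds[5]))
```

The same helper works for any index set once the innermost neighbourhoods $A_i$ are specified.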
Theorem 7.1: Let $N(B_i) = \{j \in J : B_j \cap B_i \ne \emptyset\}$ and $2 < p \le 4$. Assume that (LD2) is satisfied with $|N(B_i)| \le \kappa$. Then
\[
\sup_z |IP(W \le z) - \Phi(z)|   (7.1)
\]
\[
\le (13 + 11\kappa)\sum_{i\in J}\big(IE|\xi_i|^{3\wedge p} + IE|Y_i|^{3\wedge p}\big)
+ 2.5\Big(\sum_{i\in J}\big(IE|\xi_i|^p + IE|Y_i|^p\big)\Big)^{1/2},
\]
where $Y_i = \sum_{j\in A_i}\xi_j$. In particular, if $IE|\xi_i|^p + IE|Y_i|^p \le \delta^p$ for some $\delta > 0$ and for each $i \in J$, then
\[
\sup_z |IP(W \le z) - \Phi(z)| \le (13 + 11\kappa)\,n\delta^{3\wedge p} + 2.5\,\delta^{p/2}\sqrt{n},   (7.2)
\]
where $n = |J|$.
Note that in many cases $\kappa$ is bounded and $\delta$ is of order $n^{-1/2}$. In those cases, $n\delta^{3\wedge p} + \delta^{p/2}\sqrt{n} = O(n^{-(p-2)/4})$, which is of the best possible order of $n^{-1/2}$ when $p = 4$. However, the cost is the existence of fourth moments. To reduce the assumption on moments, we need the stronger condition (LD3).
Theorem 7.2: Let $2 < p \le 3$. Assume that (LD3) is satisfied with $|N(C_i)| \le \kappa$, where $N(C_i) = \{j \in J : C_i \cap B_j \ne \emptyset\}$. Then
\[
\sup_z |IP(W \le z) - \Phi(z)| \le 75\kappa^{p-1}\sum_{i\in J} IE|\xi_i|^p.   (7.3)
\]
We now present a general non-uniform bound for locally dependent random fields $\{\xi_i, i \in J\}$ under (LD4$^*$).

Theorem 7.3: Assume that $IE|\xi_i|^p < \infty$ for $2 < p \le 3$ and that (LD4$^*$) is satisfied. Let $\kappa = \max_{i\in J}\max\big(|D_i^*|,\, |\{j : i \in D_j^*\}|\big)$. Then
\[
|IP(W \le z) - \Phi(z)| \le C\kappa^p(1 + |z|)^{-p}\sum_{i\in J} IE|\xi_i|^p,   (7.4)
\]
where $C$ is an absolute constant.
The above results can immediately be applied to $m$-dependent random fields. Let $d \ge 1$ and let $Z^d$ denote the $d$-dimensional space of positive integers. The distance between two points $i = (i_1, \dots, i_d)$ and $j = (j_1, \dots, j_d)$ in $Z^d$ is defined by $|i - j| = \max_{1\le l\le d}|i_l - j_l|$, and the distance between two subsets $A$ and $B$ of $Z^d$ is defined by $\rho(A, B) = \inf\{|i - j| : i \in A, j \in B\}$. For a given subset $J$ of $Z^d$, a set of random variables $\{\xi_i, i \in J\}$ is said to be an $m$-dependent random field if $\{\xi_i, i \in A\}$ and $\{\xi_j, j \in B\}$ are independent whenever $\rho(A, B) > m$, for any subsets $A$ and $B$ of $J$. Thus, choosing
\[
A_i = \{j : |j - i| \le m\} \cap J, \qquad B_i = \{j : |j - i| \le 2m\} \cap J,
\]
\[
C_i = \{j : |j - i| \le 3m\} \cap J, \qquad B_i^* = \{j : |j - i| \le 3m\} \cap J,
\]
\[
C_i^* = \{j : |j - i| \le 4m\} \cap J \quad \text{and} \quad D_i^* = \{j : |j - i| \le 5m\} \cap J
\]
in Theorems 7.2 and 7.3 yields a uniform and a non-uniform bound.
Theorem 7.4: Let $\{\xi_i, i \in J\}$ be an $m$-dependent random field with zero means and $IE|\xi_i|^p < \infty$ for $2 < p \le 3$. Then
\[
\sup_z |IP(W \le z) - \Phi(z)| \le 75(10m + 1)^{(p-1)d}\sum_{i\in J} IE|\xi_i|^p   (7.5)
\]
and
\[
|IP(W \le z) - \Phi(z)| \le C(1 + |z|)^{-p}\,11^{pd}(m + 1)^{(p-1)d}\sum_{i\in J} IE|\xi_i|^p,   (7.6)
\]
where $C$ is an absolute constant.
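As a sanity check on Theorem 7.4, one can simulate an $m$-dependent field and look at the uniform error. The sketch below (Python; the $(m+1)$-term moving average of i.i.d. signs, the sizes and the tolerance are all illustrative assumptions, not from the text) builds such a field with $\mathrm{Var}(W) = 1$ exactly and estimates $\sup_z|IP(W \le z) - \Phi(z)|$ empirically.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
m, n, trials = 2, 300, 40_000

# m-dependent field via a moving average: xi_i = (eps_i + ... + eps_{i+m})/c.
# Then W = sum_i xi_i = sum_j coef[j]*eps_j / c, where coef[j] counts the
# windows containing eps_j; c is chosen so that Var(W) = 1 exactly.
eps = rng.choice([-1.0, 1.0], size=(trials, n + m))
coef = np.convolve(np.ones(n), np.ones(m + 1))        # length n + m
W = eps @ coef / math.sqrt(float((coef ** 2).sum()))

zs = np.linspace(-3, 3, 61)
Phi = np.array([0.5 * (1 + math.erf(z / math.sqrt(2))) for z in zs])
emp = (W[:, None] <= zs).mean(axis=0)
err = float(np.abs(emp - Phi).max())
print(f"estimated sup_z |IP(W <= z) - Phi(z)| = {err:.4f}")
assert err < 0.05   # small for large n, as the uniform bound (7.5) predicts
```

The observed error is dominated by the lattice spacing of $W$ and the Monte Carlo noise; increasing $n$ and the number of trials shrinks both.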
The main idea of the proof is similar to that in Sections 3 and 4: first deriving a Stein identity, and then uniform and non-uniform concentration inequalities. We outline some main steps of the proof and refer to Chen & Shao (2004a) for details.

Define
\[
\hat K_i(t) = \xi_i\big\{I(-Y_i \le t < 0) - I(0 \le t \le -Y_i)\big\}, \qquad K_i(t) = IE\hat K_i(t),
\]
\[
\hat K(t) = \sum_{i\in J}\hat K_i(t), \qquad K(t) = IE\hat K(t) = \sum_{i\in J}K_i(t).   (7.7)
\]
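The normalisation $\int_{-\infty}^{\infty} K(t)\,dt = IEW^2$ used below is easy to confirm numerically, since $\int\hat K_i(t)\,dt = \xi_iY_i$. The following sketch (Python; the 1-dependent field $\xi_i = \varepsilon_i\varepsilon_{i+1}$ is a toy example, and we skip the normalisation $\mathrm{Var}(W) = 1$, which the identity itself does not need) compares $\sum_i IE\,\xi_iY_i$ with $IEW^2$ by simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 30, 100_000

# A 1-dependent field: xi_i = eps_i * eps_{i+1} with iid signs eps_j,
# so (LD1) holds with A_i = {i-1, i, i+1} and Y_i = sum of xi_j, j in A_i.
eps = rng.choice([-1.0, 1.0], size=(trials, n + 1))
xi = eps[:, :n] * eps[:, 1:]
W = xi.sum(axis=1)

A = {i: [j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)}
Y = np.stack([xi[:, A[i]].sum(axis=1) for i in range(n)], axis=1)

# The integral of Khat_i over t equals xi_i * Y_i, hence the integral of K
# equals sum_i IE xi_i Y_i, which should match IE W^2 (= n for this field).
int_K = float((xi * Y).mean(axis=0).sum())
print(int_K, float((W ** 2).mean()))     # both close to n = 30
assert abs(int_K - n) < 1.0
```

This is the dependent-field analogue of the computation $\sum_i IE\,\xi_i^2 = 1$ used for independent summands.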
We first derive a Stein identity for $W$. Let $f$ be a bounded absolutely continuous function. Then
\[
IE\{Wf(W)\} = \sum_{i\in J} IE\{\xi_i(f(W) - f(W - Y_i))\}
= \sum_{i\in J} IE\Big\{\xi_i\int_{-Y_i}^0 f'(W + t)\,dt\Big\}
\]
\[
= \sum_{i\in J} IE\Big\{\int_{-\infty}^{\infty} f'(W + t)\hat K_i(t)\,dt\Big\}
= IE\Big\{\int_{-\infty}^{\infty} f'(W + t)\hat K(t)\,dt\Big\},   (7.8)
\]
and hence, by the fact that $\int_{-\infty}^{\infty} K(t)\,dt = IEW^2 = 1$,
\[
IE\{f'(W) - Wf(W)\} = IE\int_{-\infty}^{\infty} f'(W)K(t)\,dt - IE\int_{-\infty}^{\infty} f'(W + t)\hat K(t)\,dt
\]
\[
= IE\int_{-\infty}^{\infty}\big(f'(W) - f'(W + t)\big)K(t)\,dt
+ IEf'(W)\int_{-\infty}^{\infty}\big(K(t) - \hat K(t)\big)\,dt
\]
\[
\quad + IE\int_{-\infty}^{\infty}\big(f'(W + t) - f'(W)\big)\big(K(t) - \hat K(t)\big)\,dt
\]
\[
:= R_1 + R_2 + R_3.
\]
Now let $f = f_z$ be the Stein solution (2.3). Then
\[
|R_1| \le IE\int_{-\infty}^{\infty}(|W| + 1)|tK(t)|\,dt
+ \Big|IE\int_{-\infty}^{\infty}\big(I_{\{W \le z\}} - I_{\{W + t \le z\}}\big)K(t)\,dt\Big|
\]
\[
\le \frac{1}{2}\sum_{i\in J} IE(|W| + 1)\,IE|\xi_i|Y_i^2
+ \int_{-\infty}^{\infty} IP\big(z - \max(t, 0) \le W \le z - \min(t, 0)\big)|K(t)|\,dt
\]
\[
:= R_{1,1} + R_{1,2}.
\]
Estimating $R_{1,1}$ is not so difficult, while $R_{1,2}$ can be estimated by using a concentration inequality given below.
Observe that
\[
R_2 = IE\Big\{f'(W)\sum_{i\in J}\big(IE(\xi_iY_i) - \xi_iY_i\big)\Big\},
\]
which can also be estimated easily. The main difficulty arises from estimating $R_3$. The reader may refer to Chen & Shao (2004a) for details.

To conclude this section, we give the simplest concentration inequality in the paper of Chen & Shao (2004a), and provide a detailed proof to illustrate the difficulty for dependent variables.
Proposition 7.5: Assume (LD1). Then for any real numbers $a < b$,
\[
IP(a \le W \le b) \le 0.625(b - a) + 4r_1 + 4r_2,   (7.9)
\]
where $r_1 = \sum_{i\in J} IE|\xi_i|Y_i^2$ and $r_2 = \int_{-\infty}^{\infty}\mathrm{Var}(\hat K(t))\,dt$.
Proof: Let $\delta = r_1$ and define
\[
f(w) = \begin{cases}
-(b - a + \delta)/2 & \text{for } w \le a - \delta, \\
\tfrac{1}{2\delta}(w - a + \delta)^2 - (b - a + \delta)/2 & \text{for } a - \delta < w \le a, \\
w - (a + b)/2 & \text{for } a < w \le b, \\
-\tfrac{1}{2\delta}(w - b - \delta)^2 + (b - a + \delta)/2 & \text{for } b < w \le b + \delta, \\
(b - a + \delta)/2 & \text{for } w > b + \delta.
\end{cases}   (7.10)
\]
Then $f'$ is a continuous function given by
\[
f'(w) = \begin{cases}
1 & \text{for } a \le w \le b, \\
0 & \text{for } w \le a - \delta \text{ or } w \ge b + \delta, \\
\text{linear} & \text{for } a - \delta \le w \le a \text{ or } b \le w \le b + \delta.
\end{cases}
\]
Clearly $|f(w)| \le (b - a + \delta)/2$. With this $f$, and with $\hat K(t)$ and $K(t)$ as defined in (7.7), we have, by (7.8),
\[
(b - a + \delta)/2 \ge IEWf(W) = IE\int_{-\infty}^{\infty} f'(W + t)\hat K(t)\,dt
\]
\[
:= IEf'(W)\int_{-\infty}^{\infty} K(t)\,dt
+ IE\int_{-\infty}^{\infty}\big(f'(W + t) - f'(W)\big)K(t)\,dt
+ IE\int_{-\infty}^{\infty} f'(W + t)\big(\hat K(t) - K(t)\big)\,dt
\]
\[
:= H_1 + H_2 + H_3.   (7.11)
\]
Clearly,
\[
H_1 = IEf'(W) \ge IP(a \le W \le b).   (7.12)
\]
By the Cauchy inequality,
\[
|H_3| \le (1/8)\,IE\int_{-\infty}^{\infty}[f'(W + t)]^2\,dt
+ 2\,IE\int_{-\infty}^{\infty}\big(\hat K(t) - K(t)\big)^2\,dt
\le (b - a + 2\delta)/8 + 2r_2.   (7.13)
\]
To bound $H_2$, let
\[
L(\delta) = \sup_{x\in R} IP(x \le W \le x + \delta).
\]
Then, by writing
\[
H_2 = IE\int_0^{\infty}\int_0^t f''(W + s)\,ds\,K(t)\,dt - IE\int_{-\infty}^0\int_t^0 f''(W + s)\,ds\,K(t)\,dt
\]
\[
= \delta^{-1}\int_0^{\infty}\int_0^t\big\{IP(a - \delta \le W + s \le a) - IP(b \le W + s \le b + \delta)\big\}\,ds\,K(t)\,dt
\]
\[
\quad - \delta^{-1}\int_{-\infty}^0\int_t^0\big\{IP(a - \delta \le W + s \le a) - IP(b \le W + s \le b + \delta)\big\}\,ds\,K(t)\,dt,
\]
we have
\[
|H_2| \le \delta^{-1}\int_0^{\infty}\int_0^t L(\delta)\,ds\,|K(t)|\,dt
+ \delta^{-1}\int_{-\infty}^0\int_t^0 L(\delta)\,ds\,|K(t)|\,dt
\]
\[
= \delta^{-1}L(\delta)\int_{-\infty}^{\infty}|tK(t)|\,dt \le \frac{1}{2}\delta^{-1}r_1L(\delta) = \frac{1}{2}L(\delta).   (7.14)
\]
It follows from (7.11)–(7.14) that
\[
IP(a \le W \le b) \le 0.625(b - a) + 0.75\delta + 2r_2 + 0.5L(\delta).   (7.15)
\]
Substituting $a = x$ and $b = x + \delta$ in (7.15), we obtain
\[
L(\delta) \le 1.375\delta + 2r_2 + 0.5L(\delta)
\]
and hence
\[
L(\delta) \le 2.75\delta + 4r_2.   (7.16)
\]
Finally, combining (7.15) and (7.16), we obtain (7.9).
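The smoothing function (7.10) is simple to implement, and the three properties used above ($|f| \le (b - a + \delta)/2$, monotonicity, and $f' = 1$ on $[a, b]$) can be checked directly (Python sketch; $a$, $b$, $\delta$ are arbitrary test values).

```python
def f(w, a, b, delta):
    # Piecewise-quadratic smoothing of w -> w - (a+b)/2, truncated, cf. (7.10)
    if w <= a - delta:
        return -(b - a + delta) / 2
    if w <= a:
        return (w - a + delta) ** 2 / (2 * delta) - (b - a + delta) / 2
    if w <= b:
        return w - (a + b) / 2
    if w <= b + delta:
        return -(w - b - delta) ** 2 / (2 * delta) + (b - a + delta) / 2
    return (b - a + delta) / 2

a, b, delta, h = -0.3, 0.7, 0.1, 1e-3
ws = [a - 2 * delta + k * h for k in range(int((b - a + 4 * delta) / h) + 1)]
fs = [f(w, a, b, delta) for w in ws]

assert max(abs(v) for v in fs) <= (b - a + delta) / 2 + 1e-12   # |f| bounded
assert all(v2 >= v1 - 1e-12 for v1, v2 in zip(fs, fs[1:]))      # nondecreasing
slopes = [(v2 - v1) / h for (w, v1), v2 in zip(zip(ws, fs), fs[1:])
          if a < w and w + h < b]
assert all(abs(s - 1) < 1e-9 for s in slopes)                   # f' = 1 on (a, b)
print("properties of (7.10) verified")
```

The quadratic caps are exactly what makes $f''$ bounded by $\delta^{-1}$, which is where the factor $\delta^{-1}$ in the bound on $H_2$ comes from.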
8. Appendix

Here we give detailed proofs of the basic properties of the solutions to the Stein equations (2.2) and (2.4), given in Lemmas 2.2 and 2.3. The proofs of Lemma 2.2 and part of Lemma 2.3 follow Stein (1986), and part of the proof of Lemma 2.3 is due to Stroock (1993).
Proof of Lemma 2.2: Since $f_{-z}(-w) = f_z(w)$, we need only consider the case $z \ge 0$. Note that for $w > 0$,
\[
\int_w^{\infty} e^{-x^2/2}\,dx \le \int_w^{\infty}\frac{x}{w}e^{-x^2/2}\,dx = \frac{e^{-w^2/2}}{w},
\]
which also yields
\[
(1 + w^2)\int_w^{\infty} e^{-x^2/2}\,dx \ge we^{-w^2/2},
\]
by comparing the derivatives of the two functions. Thus
\[
\frac{we^{-w^2/2}}{(1 + w^2)\sqrt{2\pi}} \le 1 - \Phi(w) \le \frac{e^{-w^2/2}}{w\sqrt{2\pi}}.   (8.1)
\]
It follows from (2.3) that
\[
(wf_z(w))' = \begin{cases}
\sqrt{2\pi}\,[1 - \Phi(z)]\Big((1 + w^2)e^{w^2/2}\Phi(w) + \dfrac{w}{\sqrt{2\pi}}\Big) & \text{if } w < z; \\[1ex]
\sqrt{2\pi}\,\Phi(z)\Big((1 + w^2)e^{w^2/2}(1 - \Phi(w)) - \dfrac{w}{\sqrt{2\pi}}\Big) & \text{if } w > z,
\end{cases}
\]
\[
\ge 0,
\]
by (8.1). This proves (2.6).

In view of the fact that
\[
\lim_{w\to-\infty} wf_z(w) = \Phi(z) - 1 \quad \text{and} \quad \lim_{w\to\infty} wf_z(w) = \Phi(z),   (8.2)
\]
(2.7) follows by (2.6).
By (2.2), we have
\[
f_z'(w) = wf_z(w) + I_{\{w \le z\}} - \Phi(z)
= \begin{cases}
wf_z(w) + 1 - \Phi(z) & \text{for } w < z, \\
wf_z(w) - \Phi(z) & \text{for } w > z;
\end{cases}
\]
\[
= \begin{cases}
\big(\sqrt{2\pi}\,we^{w^2/2}\Phi(w) + 1\big)(1 - \Phi(z)) & \text{for } w < z, \\
\big(\sqrt{2\pi}\,we^{w^2/2}(1 - \Phi(w)) - 1\big)\Phi(z) & \text{for } w > z.
\end{cases}   (8.3)
\]
Since $wf_z(w)$ is an increasing function of $w$, by (8.1) and (8.2),
\[
0 < f_z'(w) \le zf_z(z) + 1 - \Phi(z) < 1 \quad \text{for } w < z   (8.4)
\]
and
\[
-1 < zf_z(z) - \Phi(z) \le f_z'(w) < 0 \quad \text{for } w > z.   (8.5)
\]
Hence, for any $w$ and $v$,
\[
|f_z'(w) - f_z'(v)| \le \max\big(1,\; zf_z(z) + 1 - \Phi(z) - (zf_z(z) - \Phi(z))\big) = 1.
\]
This proves (2.8).

Observe that, by (8.4) and (8.5), $f_z$ attains its maximum at $z$. Thus
\[
0 < f_z(w) \le f_z(z) = \sqrt{2\pi}\,e^{z^2/2}\Phi(z)(1 - \Phi(z)).   (8.6)
\]
By (8.1), $f_z(z) \le 1/z$. To finish the proof of (2.9), let
\[
g(z) = \Phi(z)(1 - \Phi(z)) - e^{-z^2/2}/4 \quad \text{and} \quad
g_1(z) = \frac{1}{\sqrt{2\pi}} + \frac{z}{4} - \frac{2\Phi(z)}{\sqrt{2\pi}}.
\]
Observe that $g'(z) = e^{-z^2/2}g_1(z)$ and that
\[
g_1'(z) = \frac{1}{4} - \frac{1}{\pi}e^{-z^2/2}
\begin{cases}
< 0 & \text{if } 0 \le z < z_0, \\
= 0 & \text{if } z = z_0, \\
> 0 & \text{if } z > z_0,
\end{cases}
\]
where $z_0 = (2\ln(4/\pi))^{1/2}$. Thus, $g_1(z)$ is decreasing on $[0, z_0)$ and increasing on $(z_0, \infty)$. Since $g_1(0) = 0$ and $g_1(\infty) = \infty$, there exists $z_1 > 0$ such that $g_1(z) < 0$ for $0 < z < z_1$ and $g_1(z) > 0$ for $z > z_1$. Therefore, $g(z)$ attains its maximum at either $z = 0$ or $z = \infty$, that is,
\[
g(z) \le \max(g(0), g(\infty)) = 0,
\]
which is equivalent to $f_z(z) \le \sqrt{2\pi}/4$. This completes the proof of (2.9).
The last inequality, (2.10), is a consequence of (2.8) and (2.9), by rewriting
\[
(w + u)f_z(w + u) - (w + v)f_z(w + v)
= w\big(f_z(w + u) - f_z(w + v)\big) + uf_z(w + u) - vf_z(w + v)
\]
and using the Taylor expansion.
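The bounds just proved are easy to confirm numerically from the closed form (2.3). The sketch below (Python; the grid and the set of $z$ values are arbitrary, and $|w|$ is kept moderate so that the product $e^{w^2/2}\Phi(w)$ stays well conditioned in double precision) checks (2.8) and (2.9) on a grid.

```python
import math

SQ2PI = math.sqrt(2 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))

def f_z(w, z):
    # Solution (2.3) of the Stein equation f'(w) - w f(w) = I{w<=z} - Phi(z)
    if w <= z:
        return SQ2PI * math.exp(w * w / 2) * Phi(w) * (1 - Phi(z))
    return SQ2PI * math.exp(w * w / 2) * Phi(z) * (1 - Phi(w))

def fprime_z(w, z):
    return w * f_z(w, z) + (1.0 if w <= z else 0.0) - Phi(z)

for z in [0.0, 0.5, 1.5, 3.0]:
    ws = [-5 + k * 0.01 for k in range(1001)]
    # (2.9): 0 < f_z(w) <= f_z(z) <= sqrt(2*pi)/4
    assert all(0 < f_z(w, z) <= f_z(z, z) + 1e-12 for w in ws)
    assert f_z(z, z) <= SQ2PI / 4 + 1e-12
    # (2.8): |f'_z| is bounded by 1
    assert all(abs(fprime_z(w, z)) <= 1 for w in ws)
print("bounds (2.8) and (2.9) verified on a grid")
```

At $z = 0$ the maximum $f_z(z) = \sqrt{2\pi}/4$ is attained exactly, so (2.9) is sharp.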
Proof of Lemma 2.3: We define $\tilde h(w) = h(w) - IEh(Z)$, and then put $c_0 = \sup_w|\tilde h(w)|$, $c_1 = \sup_w|h'(w)|$. Since $\tilde h$ and $f_h$ are unchanged when $h$ is replaced by $h - h(0)$, we may assume that $h(0) = 0$. Therefore $|h(t)| \le c_1|t|$ and $|IEh(Z)| \le c_1IE|Z| = c_1\sqrt{2/\pi}$.
First we verify (2.11). From the definition (2.5) of $f_h$, it follows that
\[
|f_h(w)| \le \begin{cases}
e^{w^2/2}\displaystyle\int_{-\infty}^w|\tilde h(x)|e^{-x^2/2}\,dx & \text{if } w \le 0, \\[1ex]
e^{w^2/2}\displaystyle\int_w^{\infty}|\tilde h(x)|e^{-x^2/2}\,dx & \text{if } w \ge 0
\end{cases}
\]
\[
\le e^{w^2/2}\min\Big(c_0\int_{|w|}^{\infty}e^{-x^2/2}\,dx,\;
c_1\int_{|w|}^{\infty}\big(|x| + \sqrt{2/\pi}\big)e^{-x^2/2}\,dx\Big)
\le \min\big(\sqrt{\pi/2}\,c_0,\; 2c_1\big),
\]
where in the last inequality we used the fact that
\[
e^{w^2/2}\int_{|w|}^{\infty}e^{-x^2/2}\,dx \le \sqrt{\pi/2}.
\]
Next we prove (2.12). By (2.4), for $w \ge 0$,
\[
|f_h'(w)| \le |h(w) - IEh(Z)| + we^{w^2/2}\int_w^{\infty}|h(x) - IEh(Z)|e^{-x^2/2}\,dx
\]
\[
\le |h(w) - IEh(Z)| + c_0we^{w^2/2}\int_w^{\infty}e^{-x^2/2}\,dx \le 2c_0,
\]
by (8.1).
f

(w) wf

(w) f(w) = h

(w),
or equivalently that
(e
w
2
/2
f

(w))

= e
w
2
/2
(f(w) +h

(w)).
Therefore
f

(w) = e
w
2
/2
_

w
(f(x) +h

(x))e
x
2
/2
dx
and, by (2.11),
|f

(w)| 3c
1
e
w
2
/2
_

w
e
x
2
/2
dx 3c
1
_
/2 4c
1
.
Thus we have
sup
w0
|f

(w)| min
_
2c
0
, 4c
1
).
Similarly, the above bound holds for sup
w0
|f

(w)|. This proves (2.12).


Now we prove (2.13). Differentiating (2.4) gives
\[
f_h''(w) = wf_h'(w) + f_h(w) + h'(w)
= (1 + w^2)f_h(w) + w\big(h(w) - IEh(Z)\big) + h'(w).   (8.7)
\]
From
\[
h(x) - IEh(Z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}[h(x) - h(s)]e^{-s^2/2}\,ds
\]
\[
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x\int_s^x h'(t)\,dt\,e^{-s^2/2}\,ds
- \frac{1}{\sqrt{2\pi}}\int_x^{\infty}\int_x^s h'(t)\,dt\,e^{-s^2/2}\,ds
\]
\[
= \int_{-\infty}^x h'(t)\Phi(t)\,dt - \int_x^{\infty}h'(t)(1 - \Phi(t))\,dt,   (8.8)
\]
it follows that
\[
f_h(w) = e^{w^2/2}\int_{-\infty}^w[h(x) - IEh(Z)]e^{-x^2/2}\,dx
\]
\[
= e^{w^2/2}\int_{-\infty}^w\Big\{\int_{-\infty}^x h'(t)\Phi(t)\,dt - \int_x^{\infty}h'(t)(1 - \Phi(t))\,dt\Big\}e^{-x^2/2}\,dx
\]
\[
= -\sqrt{2\pi}\,e^{w^2/2}(1 - \Phi(w))\int_{-\infty}^w h'(t)\Phi(t)\,dt
- \sqrt{2\pi}\,e^{w^2/2}\Phi(w)\int_w^{\infty}h'(t)(1 - \Phi(t))\,dt.   (8.9)
\]
From (8.7)–(8.9) and (8.1) we now obtain
\[
|f_h''(w)| \le |h'(w)| + \big|(1 + w^2)f_h(w) + w(h(w) - IEh(Z))\big|
\]
\[
\le |h'(w)| + \big(\sqrt{2\pi}(1 + w^2)e^{w^2/2}(1 - \Phi(w)) - w\big)\int_{-\infty}^w|h'(t)|\Phi(t)\,dt
\]
\[
\quad + \big(w + \sqrt{2\pi}(1 + w^2)e^{w^2/2}\Phi(w)\big)\int_w^{\infty}|h'(t)|(1 - \Phi(t))\,dt
\]
\[
\le |h'(w)| + c_1\big(\sqrt{2\pi}(1 + w^2)e^{w^2/2}(1 - \Phi(w)) - w\big)\int_{-\infty}^w\Phi(t)\,dt
\]
\[
\quad + c_1\big(w + \sqrt{2\pi}(1 + w^2)e^{w^2/2}\Phi(w)\big)\int_w^{\infty}(1 - \Phi(t))\,dt.   (8.10)
\]
Hence it follows that
\[
|f_h''(w)| \le |h'(w)|
+ c_1\big(\sqrt{2\pi}(1 + w^2)e^{w^2/2}(1 - \Phi(w)) - w\big)\Big(w\Phi(w) + \frac{e^{-w^2/2}}{\sqrt{2\pi}}\Big)
\]
\[
\quad + c_1\big(w + \sqrt{2\pi}(1 + w^2)e^{w^2/2}\Phi(w)\big)\Big(\frac{e^{-w^2/2}}{\sqrt{2\pi}} - w(1 - \Phi(w))\Big)
\]
\[
= |h'(w)| + c_1 \le 2c_1,   (8.11)
\]
as desired.
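For a concrete smooth $h$, the three bounds can be verified numerically; the sketch below (Python; $h = \tanh$ and all grid parameters are illustrative choices, not from the text) computes $f_h$ from (2.5) by quadrature, and obtains $f_h'$ and $f_h''$ from the identities (2.4) and (8.7) rather than by differencing.

```python
import math

h = math.tanh                                    # Lipschitz test function, c1 = 1
c1 = 1.0
step = 0.001
xs = [-8 + k * step for k in range(16001)]       # quadrature grid on [-8, 8]

Eh = sum(h(x) * math.exp(-x * x / 2) for x in xs) * step / math.sqrt(2 * math.pi)
c0 = max(abs(h(x) - Eh) for x in xs)             # sup |h - IEh(Z)|

def f_h(w):
    # (2.5): f_h(w) = e^{w^2/2} * int_{-inf}^{w} (h(x) - IEh(Z)) e^{-x^2/2} dx
    s = 0.0
    for x in xs:
        if x > w:
            break
        s += (h(x) - Eh) * math.exp(-x * x / 2) * step
    return math.exp(w * w / 2) * s

ws = [-3 + k * 0.05 for k in range(121)]
fs = [f_h(w) for w in ws]
d1 = [w * v + (h(w) - Eh) for w, v in zip(ws, fs)]                  # via (2.4)
d2 = [(1 + w * w) * v + w * (h(w) - Eh) + 1 - math.tanh(w) ** 2
      for w, v in zip(ws, fs)]                                      # via (8.7)

assert max(abs(v) for v in fs) <= min(math.sqrt(math.pi / 2) * c0, 2 * c1) + 0.01
assert max(abs(v) for v in d1) <= min(2 * c0, 4 * c1) + 0.05
assert max(abs(v) for v in d2) <= 2 * c1 + 0.1
print("bounds (2.11)-(2.13) hold for h = tanh on the grid")
```

Using (2.4) and (8.7) for the derivatives avoids the numerical noise that finite differencing of the quadrature-computed $f_h$ would introduce.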
Acknowledgement: We are grateful to Andrew Barbour for his careful editing of the paper and, in particular, for substantially improving the exposition in the Introduction.
References

1. P. Baldi & Y. Rinott (1989) On normal approximation of distributions in terms of dependency graphs. Ann. Probab. 17, 1646–1650.
2. P. Baldi, Y. Rinott & C. Stein (1989) A normal approximation for the number of local maxima of a random function on a graph. In: Probability, statistics, and mathematics. Papers in honor of Samuel Karlin, Eds: T. W. Anderson, K. B. Athreya, D. L. Iglehart, pp. 59–81. Academic Press, Boston.
3. A. D. Barbour & L. H. Y. Chen (1992) On the binary expansion of a random integer. Statist. Probab. Lett. 14, 235–241.
4. A. D. Barbour & P. Hall (1984) Stein's method and the Berry–Esseen theorem. Austral. J. Statist. 26, 8–15.
5. P. van Beeck (1972) An application of Fourier methods to the problem of sharpening the Berry–Esseen inequality. Z. Wahrsch. verw. Geb. 23, 187–196.
6. A. Bikelis (1966) Estimates of the remainder in the central limit theorem. Litovsk. Mat. Sb. 6, 323–346 (in Russian).
7. E. Bolthausen (1984) An estimate of the remainder in a combinatorial central limit theorem. Z. Wahrsch. verw. Geb. 66, 379–386.
8. E. Bolthausen & F. Götze (1993) The rate of convergence for multivariate sampling statistics. Ann. Statist. 21, 1692–1710.
9. L. H. Y. Chen (1986) The rate of convergence in a central limit theorem for dependent random variables with arbitrary index set. IMA Preprint Series #243, Univ. Minnesota.
10. L. H. Y. Chen (1998) Stein's method: some perspectives with applications. In: Probability towards 2000, Eds: L. Accardi & C. C. Heyde, Lecture Notes in Statistics 128, pp. 97–122. Springer-Verlag, Berlin.
11. L. H. Y. Chen (2000) Non-uniform bounds in probability approximations using Stein's method. In: Probability and statistical models with applications: a volume in honor of Theophilos Cacoullos, Eds: N. Balakrishnan, Ch. A. Charalambides & M. V. Koutras, pp. 3–14. Chapman & Hall/CRC Press, Boca Raton, Florida.
12. L. H. Y. Chen & Q. M. Shao (2001) A non-uniform Berry–Esseen bound via Stein's method. Probab. Theory Rel. Fields 120, 236–254.
13. L. H. Y. Chen & Q. M. Shao (2004a) Normal approximation under local dependence. Ann. Probab. 32, 1985–2028.
14. L. H. Y. Chen & Q. M. Shao (2004b) Normal approximation for non-linear statistics using a concentration inequality approach. Preprint.
15. A. Dembo & Y. Rinott (1996) Some examples of normal approximations by Stein's method. In: Random Discrete Structures, Eds: D. Aldous & R. Pemantle, pp. 25–44. Springer, New York.
16. P. Diaconis (1977) The distribution of leading digits and uniform distribution mod 1. Ann. Probab. 5, 72–81.
17. P. Diaconis & S. Holmes (2004) Stein's method: expository lectures and applications. IMS Lecture Notes, Vol. 46, Hayward, CA.
18. P. Diaconis, S. Holmes & G. Reinert (2004) Use of exchangeable pairs in the analysis of simulations. In: Stein's method: expository lectures and applications, Eds: P. Diaconis & S. Holmes, pp. 1–26. IMS Lecture Notes Vol. 46, Hayward, CA.
19. R. V. Erickson (1974) $L_1$ bounds for asymptotic normality of $m$-dependent sums using Stein's technique. Ann. Probab. 2, 522–529.
20. C. G. Esseen (1945) Fourier analysis of distribution functions: a mathematical study of the Laplace–Gaussian law. Acta Math. 77, 1–125.
21. W. Feller (1968) On the Berry–Esseen theorem. Z. Wahrsch. verw. Geb. 10, 261–268.
22. L. Goldstein (2005) Berry–Esseen bounds for combinatorial central limit theorems and pattern occurrences, using zero and size biasing. J. Appl. Probab. 42 (to appear).
23. L. Goldstein & G. Reinert (1997) Stein's method and the zero bias transformation with application to simple random sampling. Ann. Appl. Probab. 7, 935–952.
24. L. Goldstein & Y. Rinott (1996) Multivariate normal approximations by Stein's method and size bias couplings. Adv. Appl. Prob. 33, 1–17.
25. F. Götze (1991) On the rate of convergence in the multivariate CLT. Ann. Probab. 19, 724–739.
26. S. T. Ho & L. H. Y. Chen (1978) An $L_p$ bound for the remainder in a combinatorial central limit theorem. Ann. Probab. 6, 231–249.
27. S. V. Nagaev (1965) Some limit theorems for large deviations. Theory Probab. Appl. 10, 214–235.
28. L. Paditz (1989) On the analytical structure of the constant in the nonuniform version of the Esseen inequality. Statistics 20, 453–464.
29. V. V. Petrov (1995) Limit theorems of probability theory: sequences of independent random variables. Oxford Studies in Probability 4, Clarendon Press, Oxford.
30. Y. Rinott (1994) On normal approximation rates for certain sums of dependent random variables. J. Comput. Appl. Math. 55, 135–143.
31. Y. Rinott & V. Rotar (1996) A multivariate CLT for local dependence with $n^{-1/2}\log n$ rate and applications to multivariate graph related statistics. J. Multiv. Anal. 56, 333–350.
32. Y. Rinott & V. Rotar (2000) Normal approximations by Stein's method. Decis. Econ. Finance 23, 15–29.
33. H. P. Rosenthal (1970) On the subspaces of $L_p$ ($p > 2$) spanned by sequences of independent random variables. Israel J. Math. 8, 273–303.
34. C. Stein (1972) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Prob. 2, 583–602. Univ. California Press, Berkeley, CA.
35. C. Stein (1986) Approximate computation of expectations. IMS Lecture Notes Vol. 7, Hayward, CA.
36. D. W. Stroock (1993) Probability theory: an analytic view. Cambridge Univ. Press, Cambridge, U.K.