Professional Documents
Culture Documents
Advisors:
P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg,
I. Olkin, N. Wermuth, S. Zeger
Resampling Methods
for Dependent Data
With 25 Illustrations
, Springer
S.N. Lahiri
Department of Statistics
Iowa State University
Ames, IA 50011-1212
USA
Typesetting: Pages created by the author using a Springer TEX macro package.
www.springer-ny.com
To my parents
Preface
selected material from the later chapters, can be used as a text for a grad-
uate level course. For the first part, familiarity with only basic concepts of
theoretical Statistics is assumed. In particular, no prior exposure to Time
Series is needed. The second part of the book (Chapters 6-12) is written
in the form of a research monograph, with frequent reference to the litera-
ture for the proofs and for further ramification of the topics covered. This
part is primarily intended for researchers in Statistics and Econometrics,
who are interested in learning about the recent advances in this area, or
interested in applying the methodology in their own research. A third po-
tential audience is the practitioners, who may go over the descriptions of
the resampling methods and the worked out numerical examples, but skip
the proofs and other technical discussions. Many of the results presented in
the book are from preprints of papers and are yet to appear in a published
medium. Furthermore, some (potential) open problems have been pointed
out.
Chapter 1 gives a brief description of the "bootstrap principle" and ad-
vocates resampling methods, at a heuristic level, as general methods for
estimating what are called "level-2" (and "higher-level") parameters in the
book. Chapter 2 sketches the historical development of bootstrap methods
since Efron's (1979) seminal work and describes various types of bootstrap
methods that have been proposed in the context of dependent (temporal)
data. Chapter 3 establishes consistency of various block bootstrap meth-
ods for estimating the variance and the distribution function of the sample
mean. Chapter 4 extends these results to general classes of statistics, in-
cluding M-estimators and differentiable statistical functionals, and gives
a number of numerical examples. Chapter 5 starts with a numerical com-
parison of different block bootstrap methods and follows it up with some
theoretical results. Chapter 6 deals with Edgeworth expansions and second-
order properties of block bootstrap methods for normalized and studentized
statistics under dependence. Chapter 7 addresses the important problem
of selecting the optimal block size empirically. Chapter 8 treats bootstrap
based on independent and identically distributed innovations in popular
time series models, such as the autoregressive processes. Chapter 9 deals
with the frequency domain bootstrap. Chapter 10 describes properties of
block bootstrap and subsampling methods for a class of long-range depen-
dent processes. Chapter 11 treats two special topics - viz., extremums of
dependent random variables and sums of heavy-tailed dependent random
variables. As in the independent case, here the block bootstrap fails if the
resample size equals the sample size. A description of the random limit is
given in these problems, but the proofs are omitted. Chapter 12 consid-
ers resampling methods for spatial data under different spatial sampling
designs. It also treats the problem of spatial prediction using resampling
methods. A list of important definitions and technical results are given in
Appendix A, which a reader may consult to refresh his or her memory.
Preface IX
2 Bootstrap Methods 17
2.1 Introduction...................... 17
2.2 IID Bootstrap.. . . . . . . . . . . . . . . . . . . 17
2.3 Inadequacy of IID Bootstrap for Dependent Data. 21
2.4 Bootstrap Based on IID Innovations 23
2.5 Moving Block Bootstrap . . . . . 25
2.6 Nonoverlapping Block Bootstrap 30
2.7 Generalized Block Bootstrap .. 31
2.7.1 Circular Block Bootstrap 33
2.7.2 Stationary Block Bootstrap 34
2.8 Subsampling . . . . . . . . . . . 37
2.9 Transformation-Based Bootstrap 40
2.10 Sieve Bootstrap. . . . . . . . . 41
A 339
B 345
References 349
and
4 1. Scope of Resampling Methods for Dependent Data
where E* denotes the conditional expectation given X!, ... , X n . Note that
these formulas are valid for any estimator On, and do not presuppose any
specific form.
The most common choice of Fn is the empirical distribution function
FnO == n- l I:~=l n(Xi :::; .), where n(A) denotes the indicator function of
a statement A, and it takes the values 0 or 1 according as the statement
A is false or true. In this case, the bootstrap random variables Xi, ... , X;
represent a simple random sample drawn with replacement from the ob-
served sample Xl"'" X n , and as a result, can assume only the n data
values observed, thereby justifying the name "resample." There are other
situations, e.g., the parametric bootstrap, where Xi, ... ,X; are generated
by a different estimated distribution and may take values other than the
observed values Xl"'" X n . However, the basic characteristics of the pop-
ulation are captured through the estimated distribution in both cases and
are reflected in the respective bootstrap observations Xi, ... ,X;. Thus, in
spite of such variations due to different choices of Fn , all these bootstrap
methods fall under the general bootstrap principle described above.
The same can be said about the bootstrap methods that have been pro-
posed in the context of dependent data. Here, the situation is slightly more
complicated because the "population" is not characterized entirely by the
one-dimensional marginal distribution F alone, but requires the knowledge
of the joint distribution of the whole sequence Xl, X 2, . . . . Nonetheless, the
basic principle still applies. Here, we consider the block bootstrap methods
that are most commonly used in the context of general time series data, and
show that these methods fall within the ambit of the bootstrap principle.
For simplicity, we restrict the discussion to the case of the nonoverlapping
block bootstrap (NBB) method (cf. Carlstein (1986)). The principle behind
other block bootstrap methods presented in Chapter 2 can be explained by
straightforward modification of our discussion below.
Suppose that Xl, X 2 , •.. is a sequence of stationary and weakly depen-
dent random variables such that the series of the autocovariances of Xi'S
converges absolutely. To fix ideas, first consider the case where the level-
1 parameter () of interest is the population mean, i.e., () = E(XI)' If we
use On == X n, the sample mean, as an estimator of (), then the distribu-
tion of On - () depends on not only on the marginal distribution of X I,
but it is a functional of the joint distribution of Xl"'" X n . For example,
Var(On) = n- l [Var(XI ) + 2 I:~:11(1- i/n)Cov(XI , Xl+ i )] depends on the
bivariate distribution of Xl and Xi for all 1 :::; i :::; n. Note that since the
process {Xn}ne:: l is assumed to be weakly dependent, the main contribu-
tion to Var(On) comes from the lower-order lag autocovariances and the
total contribution from higher-order lags is negligible. More specifically, if
£ is a large positive integer (but smaller than n), then the total contribution
to Var(On) from lags of order £ or more, viz., I:~:/ (1- i/n)Cov(XI, Xl+ i ),
is bounded in absolute value by I::e ICov(X I , Xl+i)l, which tends to zero
1.1 The Bootstrap Principle 5
e
as goes to infinity. As a consequence, accurate approximations for the
level-2 parameter Var(On) can be generated from the knowledge of the lag
covariances COV(Xl' X Hi ), 0 :S i < e, which depend on the joint distribu-
tion of the shorter series {Xl, ... , X£} of the given sequence of observations
{Xl, ... ,Xn }.
The block bootstrap methods exploit this fact to recreate the relation be-
tween the "population" and the "sample," in a way similar to the iid case.
Suppose that we are interested in approximating the sampling distribution
of a random variable ofthe form Tn = tn(Xn; B), where B == B(P) is a level-1
parameter based on the joint distribution P of Xl, X 2 , •.. and t n (·; B) is in-
variant under permutations of Xl' ... ' X n . For example, we may have Tn =
e
nl/2(Xn - f.l), with B = f.l == EX l . Next, suppose that is an integer such
e e e
that both and n/ are large, e.g., = ln" J for some 0 < 8 < 1, where for
any real number x, l x J denotes the largest integer not exceeding x. Also, for
simplicity, suppose that b == n/ e is an integer. Then, for the NBB method,
the given sequence of observations {Xl, ... ,Xn } is partitioned into b sub-
series or "blocks" {Xl, ... , Xt}, {X£+l, ... , X 2l }, ... , {X(b-l)l+l' ... ' Xn}
of length e, and a set of b blocks are resampled from these observed blocks
to generate the bootstrap sample Xi, ... , X~. The NBB version of Tn is de-
fined as T;' == tn(X;'; en), where X;' == {Xi, ... , X~} and en is an estimator
of B based on the conditional distribution of Xi, ... ,X~.
Again, the bootstrap principle underlying the NBB attempts to recreate
the relation between the population and the sample, although in a slightly
different way. Let Pk denote the joint distribution of (Xl, ... , Xk), k ~ 1,
and let Yl , ... , Yi, denote the b blocks under the NBB, defined by Yl =
{Xl, ... ,Xt}, ... ,Yi, = {X(b-l)l+l, ... ,Xn }. Note that because of sta-
tionarity, each block has the same (e-dimensional joint) distribution Pe.
Furthermore, because of the weak dependence of the original sequence
{Xn}n>l, these blocks are approximately independent for large values of
e. Thus, Yl , ... , Yb gives us a collection of "approximately independent"
and "identically distributed" random vectors with common distribution Pl.
By resampling from the collection Yl , ... , Yb randomly, the "block" boot-
strap method described above actually reproduces the relation between
the sample {Xl, ... , Xn} and the "approximate" population distribution
PI == P£ 0 ···0 Pl, which is "close" to the exact population distribution
Pn because of the weak-dependence assumption on {Xn }n2:l. Indeed, if Pt
denotes the empirical distribution of Yl , ... , Yi" then the joint distribution
of the bootstrap observations Xi, ... ,X~ under the NBB is given by Pl.
Thus, the "resampling population" distribution Pj is close to PI, which is
in turn close to the true underlying population Pn . Hence, the relation be-
tween {Xl, ... , Xn} and Pn in the original problem is reproduced (approx-
imately) by the relation between {Xi, ... , X~} and PI under the NBB. Let
£(W; Q) denote the probability distribution of a random quantity Wunder
a probability measure Q. For the random quantity Tn = tn(Xn; Bn(Pn)) of
6 1. Scope of Resampling Methods for Dependent Data
! L n(T~(j)
B
j=l
E .) ~ Pf(T~ E .) , (1.2)
1.2 Examples
The first example illustrates some of the basic usage of the bootstrap
method with a simulated data set.
(1.3)
where la11 < 1, 1,811 < 1 are constants and {fihEZ is a sequence of
iid random variables with Ef1 = 0, Eff = 1. Figure 1.1 shows a sim-
ulated data set of size n = 100 from the ARMA(l,l) process in (1.3)
with a1 = 0.2, ,81 = 0.3 and with Gaussian variables fl. Suppose we are
interested in estimating the variance and certain other population charac-
teristics of the sample mean Xn = n- 1 E~=l Xi (which, according to our
terminology, are level-2 parameters). For the sake of illustration, suppose
that we decided to use the NBB method with block length £ = 5, say, then
we first form the blocks 8 1 = (Xl, ... , X 5 ), 8 2 = (X6 , ... , X lO ), ... ,820 =
(X96, ... , X lOO ), and then, resample b = 20 blocks 8i, ... , 8 20 with re-
placement from {81 , ... , 8 20 } to generate the block bootstrap observations
Xi,··· ,Xioo·
o 20 40 60 80 100
FIGURE 1.1. A simulated data set of size n = 100 from model (1.3) with (31 = 0.2
and al = 0.3.
(1.4)
8 1. Scope of Resampling Methods for Dependent Data
(1.5)
where Vi denotes the mean of the jth block. Thus, for our example,
Vi = (Xl + .. ·+X5 )/5, V2 = (X6 +·· .+XlO )/5, ... , etc. Applying (1.5) to
the data set of Figure 1.1, we obtain V;;;(Xn) = 8.77 X 10- 3 as an estimate
of Var(Xn). This should be compared to the true value of Var(Xn), which
is 11.74 x 10- 3 under the assumed specification of model (1.3).
8...
from the same Monte-Carlo simulation. For 0 < a < 1, let to; denote the
(smallest) a-quantile of Tn, defined by
(1.7)
(1.8)
For the data set of Figure 1.1 , the (Monte-Carlo) values of to.9 and to.l are
respectively given by 1.3514 and -1.3675, the 1800th and the 200th order
statistics of the 2000 bootstrap replicates of T;. The resulting bootstrap
C1 for /l is thus given by
(-0.1258,0.1287) . (1.9)
10 1. Scope of Resampling Methods for Dependent Data
Note that the true value of the parameter JL in this case is zero. It may be
of some interest to compare this CI with the traditional large-sample CI
for JL based on asymptotic normality of Tn. Indeed, for the given sample, a
80% approximate normal CI for JL is given by
where m : [0, 1] -+ lR. is an unknown function, Xl, ... , Xn E [0, 1] are nonran-
dom design points, and {En}n>l is a sequence of unobservable zero mean
stationary random variables. Here, the function m(·) is a level-1 population
parameter that can be estimated from the observations 'YI , ... , Y n by one
of the "smoothing" methods. For the discussion here, suppose we use the
Nadaraya-Watson kernel estimator (cf. Nadaraya (1964), Watson (1964)):
where J C [0, 1]. For optimum performance, we need to use the estima-
tor mh (-) with bandwidth h = h *, where h * minimizes the risk function
MISE(h). Note that the function MISE(h) and, hence, h* depend on the
sampling distribution of the estimator mh 0 of a level-1 parameter m(·),
1.2 Examples 11
and thus, are level-2 parameters. In this case, one can apply the bootstrap
principle to obtain estimators of the risk function MISE(h) and the opti-
mum bandwidth h*. For independent error variables {En}n~l' estimation
of these second level parameters by resampling methods has been initiated
by Taylor (1989), Faraway and Jhun (1990) (although in the context of
density estimation) and has been treated in detail in Hall (1992) and Shao
and Tu (1995). For the case of dependent errors, estimation of MISE(h) and
h* becomes somewhat more involved due to the effect of serial correlation
of the observations. But the bootstrap principle still works. Indeed, block
bootstrap methods can be used to estimate the level-2 parameters MISE(h)
and h * consistently for a wide class of dependent processes including a class
of long range dependent processes (cf. Hall, Lahiri and Polzehl (1995)). D
estimator of the level-1 parameter. In Chapter 11, we show that the esti-
mator derived using this approach is asymptotically optimal. 0
1.4 Notation
For reference later on, we collect some common notation to be used in the
rest of the book. Let Nand Z, respectively, denote the set of all positive
integers and the set of all integers. Also, let Z+ = {O} U N be the set
of all nonnegative integers. Let JR denote the set of all real numbers and
C the set of all complex numbers. The extended real line is denoted by
lR = JRU{ -oo,oo}. The Borel o--field on a metric space § is denoted by B(§)
and for a set A C §, let cl.(A) and 8A respectively denote the closure and
the boundary of A. For a real number x, let l x J denote the largest integer
not exceeding x, let Ix 1 denote the smallest integer not less than x, and let
x+ = max{ x, O}. For x, Y E JR, let x /\ Y = min{ x, y} and x Vy = max{ x, y}.
Let n(8) denote the indicator function of a statement 8, with n(8) = 1
if 8 is true and n(8) = 0 otherwise. For a subset A of a nonempty set n,
we also write nA for the function nA(W) == n(w E A), wEn. For a finite
set A, we write IAI to denote the size, i.e., the number of elements of A.
Also, for a set A E B(JR k ), kEN, we write vol.(A) to denote the volume or
the Lebesgue measure of A. For kEN, let lh denote the identity matrix
of order k. Let r' denote the transpose of a matrix r. We write a m x n
matrix r with (i,j)-th element '"'Iij, 1 ~ i ~ m, 1 ~ j ~ n as r = (('"'Iij)) or
as r = (("(ij))mxn' Let det(r) denote the determinant of a square matrix
r.
As a convention, (random) vectors in JR k , kEN are regarded as column
vectors in this book. For x = (Xl"'" Xk)', y = (Yl,"" Yk)' E JR k , and 0: =
(0:1, ... ,O:k)' E Z~ (k EN), write xC> = I17=1 Xfi, Ixl = IXll+' "+Ixkl, o:! =
I17=1 O:i!, and Ilxll = (xi+" .+xD l / 2, and write x ~ Y if Xi ~ Yi for all 1 ~
i ~ k. For a m x n matrix A, write IIAII = sup{IIAxll : x E JRn, Ilxll = I} for
the spectral norm of A. For a smooth function h : JRk ----+ JR, let Djh denote
the partial derivative of h with respect to the j-th coordinate, i.e., Djh =
!xh . For 0: E Z~, let DC> denote the differential operator DC> == D~l ... D~k
J
8"'l +···+"'k k 11
- 8X~l , ... ,8x:k on JR . Write ~ = v -1. For a real valued function f defined
14 1. Scope of Resampling Methods for Dependent Data
and
bn » an as n -+ 00 if an = o(bn ) as n -+ 00 .
For {an}n~l' {bn}n~l C (0, (0), we write
an rv bn as n -+ 00 if lim an/b n = 1
n-->CX)
and
and
i.e., for every E > 0, there exists M E (0, (0) such that
Unless otherwise specified, the limits in order symbols are taken letting the
variable "n" tend to infinity. Thus, "an = o(b n )" is the same as "an = o(b n )
as n -+ 00".
Convergence in distribution and convergence in probability of random
entities are respectively denoted by ----+d and ----+p. Almost sure conver-
gence with respect to a measure v is written as a.s. (v) or simply, a.s.,
1.4 Notation 15
if the relevant measure v is clear from the context. In the later case, we
also use "a.s." as an abbreviation for "almost sure" or "almost surely," as
appropriate.
For k-dimensional random vectors X and Y with EIIXI1 2 + EIIYI12 < 00,
we define the covariance matrix of X and Y and the variance matrix of X
as
Cov(X, Y) = E{(X - EX)(Y - EY)'} and Var(X) = Cov(X, X) ,
respectively. For a random variable X and for p E [1,00], we define the
LP-norm of X by
(EIXIP)l/p if p E [1, (0)
IIXllp= {
ess. sup{X} if p = 00 .
For a k x k positive definite matrix ~, let <l>d') and <I>(.;~) both denote
the Gaussian distribution N(O,~) on JRk with mean zero and covariance
matrix ~. Let ¢~ and ¢(';~) both denote the density of <I>~ with respect
to the Lebesgue measure on JRk, given by
¢~(x) =¢(x;~) = (27f)-k/2[det(~)rl/2exp(-x'~-lx/2), x EJR k .
<I>(x; ~) <I>~(x) == r
J(-oo,xl
cp(y;L.)dy, x E JRk,
i
and
<I>(A;~) <I>~(A) == ¢(y; ~)dy, A E B(JR k ) .
2 .1 Introduction
In this chapter, we describe various commonly used bootstrap methods
that have been proposed in the literature. Section 2.2 begins with a brief
description of Efron's (1979) bootstrap method based on simple random
sampling of the data, which forms the basis for almost all other bootstrap
methods. In Section 2.3, we describe the famous example of Singh (1981),
which points out the limitation of this res amp ling scheme for dependent
variables. In Section 2.4, we present bootstrap methods for time-series mod-
els driven by iid variables, such as the autoregression model. In Sections 2.5,
2.6, and 2.7, we describe various block bootstrap methods. A description of
the subs amp ling method is given in Section 2.8. Bootstrap methods based
on the discrete Fourier transform of the data are described in Section 2.9,
while those based on the method of sieves are presented in Section 2.10.
the lID bootstrap may not be the "naive" thing to do for data with a
dependence structure.
We begin with the formulation of the IID bootstrap method of Efron
(1979). For the discussion in this section, assume that Xl, X 2, . .. is a
sequence of iid random variables with common distribution F. Suppose,
Xn = {X1, ... ,Xn } generate the data at hand and let Tn = tn(Xn;F),
n 2': 1 be a random variable of interest. Note that Tn depends on the data
as well as on the underlying unknown distribution F. Typical examples
of Tn include the normalized sample mean Tn == nl/2(Xn - p,)/(J" and the
studentized sample mean Tn == n 1 / 2(X n - p,)/ Sn where Xn = n- 1 2::~=1 Xi,
S;;, = n- 1 2::~=1 (Xi - Xn)2, p, = E(X 1 ), and (J"2 = Var(Xd. Let G n denote
the sampling distribution of Tn. The goal is to find an accurate approxima-
tion to the unknown distribution of Tn or to some population characteris-
tics, e.g., the standard error, of Tn. The bootstrap method of Efron (1979)
provides an effective way of addressing these problems without any model
assumptions on F.
Given X n , we draw a simple random sample X;' = {X;, ... ,X~} of size
m with replacement from X n . Thus, conditional on X n , {X;, ... , X~} are
iid random variables with
Also, let Grn,n denote the conditional distribution of T;',n' given X n . Then
the bootstrap principle advocates Grn,n as an estimator of the unknown
sampling distribution G n of Tn. If, instead of G n , one is interested in esti-
mating only a certain functional t.p( G n ) of the sampling distribution of Tn,
then the corresponding bootstrap estimator is given by plugging-in Grn,n
for G n , i.e., the bootstrap estimator of t.p(G n ) is given by t.p(Grn,n). For
example, if t.p(G n ) = Var(Tn) = J x 2dG n (x) - (J xdG n (X))2, the boot-
strap estimator of Var(Tn) is given by t.p(Grn,n) = Var(T;'.n I Xn) =
2.2 IID Bootstrap 19
E*(X[)k = J
xkdFn(x) = n- 1 fx: .
i=l
(2.1)
Proof: Since Xi, ... ,X~ are iid, by the Berry-Esseen Theorem (see The-
orem A.6, Appendix A)
sup JP*(T~ n :::; x) - <I>(x)J :::; (2.75)Lin , (2.4)
x '
20 2. Bootstrap Methods
3 r:::)
where sn2 = E.(X1• - Xn)
- 2
and ~n = E.IX1 - Xnl 3 j(snyn
A
. Clearly, by
• -
the Strong Law of Large Numbers (SLLN) (see Theorem A.3, Appendix
A),
n
S; = n- 1 LX; - (Xn)2 -7 (52 a.s.
i=l
Actually Theorem 2.1 holds for any resample size mn that goes to infinity
at a rate faster than loglog n, but the proof requires a different argument.
See Arcones and Gine (1989, 1991) for details.
Note that by the Central Limit Theorem (CLT), Tn also converges in
distribution to the N(O, 1) distribution. Hence, it follows that
i.e., the conditional distribution Gn,n of T~,n generated by the IID boot-
strap method provides a valid approximation for the sampling distribution
G n of Tn. Under some additional conditions, Singh (1981) showed that
Therefore, the bootstrap approximation for P(Tn ::; .) is far more accu-
rate than the classical normal approximation, which has an error of or-
der O(n- 1/ 2). Similar optimality properties of the bootstrap approxima-
tion have been established in many important problems. The literature on
bootstrap methods for independent data is quite extensive. By now, there
exist some excellent sources that give comprehensive accounts of the the-
ory and applications of the bootstrap methods for independent data. We
refer the reader to the monographs by Efron (1982), Hall (1992), Mammen
(1992), Efron and Tibshirani (1993), Barbe and Bertail (1995), Shao and
Tu (1995), Davison and Hinkley (1997), and Chernick (1999) for the boot-
strap methodology for independent data. Here, we have described Efron's
(1979) bootstrap for iid data mainly as a prelude to the bootstrap methods
for dependent data considered in later sections, as the basic principles in
both cases are the same. Furthermore, it provides a historical account of
the developments that culminated in formulation of the bootstrap methods
for dependent data.
2.3 Inadequacy of lID Bootstrap for Dependent Data 21
(2.6)
Proof: Note that conditional on X n , Xi, ... ,X~ are iid random variables.
As in the proof of Theorem 2.1, by the Berry-Esseen Theorem, it is enough
to show that
s~ --+ a 2 as n --+ 00 a.s.
22 2. Bootstrap Methods
and
n
L
IXi l3 --+ 0 as n --+ 00, a.s.
n- 3 / 2
i=l
These follow easily from the following lemma. Hence Theorem 2.2 is proved.
D
Thus, for all x =f. 0, the IID bootstrap estimator P* (T;: n ::; x) of the level-
2 parameter P(Tn ::; x) has a mean squared error that tends to a nonzero
number in the limit and the bootstrap estimator of P(Tn ::; x) is not con-
sistent. Therefore, the lID bootstrap method fails drastically for dependent
data. It follows from the proof of Theorem 2.2 that res amp ling individual
Xi'S from the data Xn ignores the dependence structure of the sequence
{Xn }n;:> 1 completely, and thus, fails to account for the lag-covariance terms
(viz., Cov(X1 ,Xl+ i ), 1::; i::; m) in the asymptotic variance.
Following this result, there have been several attempts in the literature to
extend the lID bootstrap method to the dependent case. In the next section,
2.4 Bootstrap Based on lID Innovations 23
Hence, we center the "raw" residuals Ei'S and define the "centered" residuals
where X,';, = {X;, ... , X,';,} and Fn denotes the empirical distribution of
the centered residuals Ei, p < i ~ n. The sampling distribution of Tn
is approximated by the conditional distribution of T,';, n given X n . For cer-
tain time-series models satisfying (2.8), different versio~s of this resampling
scheme have been proposed by Freedman (1984), Efron and Tibshirani
(1986), Swanepoel and van Wyk (1986), and Kreiss and Franke (1992).
The IID-innovation-bootstrap method can be applied with some simple
modifications to popular parametric models for spatial data as well (e.g.,
the spatial autoregression model); see Chapter 7, Cressie (1993).
A special case of model (2.8) is the autoregression model of order p
(AR(p)), given by
(2.9)
e:",n = T(F:",n) ,
where F:",n denotes the empirical distribution of (Xi, ... , X;;').
26 2. Bootstrap Methods
• • • • •
FIGURE 2.1. The collection {B 1 , ••• , B N } of overlapping blocks under the MBB.
j=1
where }j = (Xj , ... , XHp-d and where for any y E ]RP, Dy denotes the
probability measure on ]RP putting unit mass on y. The general version of
the MBB concerns estimators of the form
(2.11)
where T(·) is now a functional defined on a (rich) subset of the set of all
probability measures on ]RP. Here, p ~ 1 may be a fixed integer, or it may
tend to infinity with n suitably. Some important examples of (2.11) are
given below.
n-k
i'n(k) = (n - k)-1 L(XHk - Xn,k)(Xj - Xn,k) ,
j=1
where Xn,k = (n - k)-1 'L;::: Xj. Then, i'n(k) is of the form (2.11) with
p=k+l. 0
Example 2.2: Let 'l/J be a function from]RP x ]Rk into ]Rk such that
n-p+l
L'l/J(Xj, ... ,XHP-liTn)=O.
j=1
28 2. Bootstrap Methods
Example 2.3: Let fO denote the spectral density ofthe process {Xn}n~l.
Then, a lag-window estimator of the spectral density (cf., Chapter 6, Priest-
ley (1981)) is given by
(n-l)
in(>\) = L w(k/P)-rn(k) cos(k'\), .\ E [0, n],
k=-(n-l)
(2.12)
block bootstrap observations xi, ... , x:n are generated by resampling from
the blocks Bi = {Xi, ... , XiH-I}, i = 1, ... , N of X-values. Then, define
bootstrap "analogs" of the p-dimensional variable Yi == (Xi, ... ,XHp - I )'
in terms of Xi, ... , x:n as Yi** == (xt, .. · ,Xt+p_l)', i = 1, ... ,m - p + 1.
Then, the bootstrap version of On under this alternative approach is defined
as
where F:;'~n = L:::~p+I 6y :*. We call this approach of defining the moving
block bootstrap version of On as the "naive" approach, and the other ap-
proach leading to e:nn in (2.12) as the "ordinary" approach of the MBB.
We shall also use the 'terms "naive" and "ordinary" in the context of boot-
strapping estimators of the form (2.11) using other block bootstrap methods
described later in this chapter.
For a comparison of the two approaches, suppose that {Xn}n~l is a
sequence of stationary random variables. Then, for each i, the random
vector Yi = (Xi"'" XHp-d' has the same distribution as (Xl"'" Xp)',
and hence, the resampled vectors Yi* under the "ordinary" approach al-
ways retains the dependence structure of (Xl, ... , Xp)'. However, when the
bootstrap blocks are selected by the "naive" approach, the bootstrap ob-
servations Xt's, that are at lags less than p and that lie near the boundary
of two adjacent resampled blocks B; and B;+1' are independent. Thus the
components of Yi** under the "naive" approach do not retain the depen-
dence structure of (Xl, ... , Xp)'. As a result, the naive approach introduces
additional bias in the bootstrap version e;,;: n of On. We shall, therefore, al-
ways use the "ordinary" form of a block b~otstrap method while defining
the bootstrap version of estimators On given by (2.11). For a numerical
example comparing the naive and the ordinary versions of the MBB and
certain other block bootstrap methods, see Section 4.5.
We conclude this section with two remarks. First, it is easy to see that the
above description of the MBB and the "blocks of blocks" bootstrap applies
almost verbatim if, to begin with, the observations Xl,' .. , Xn were ran-
dom vectors instead of random variables. Second, performance of a MBB
estimator critically depends on the block size e. Since the sampling dis-
tribution of a given estimator typically depends on the joint distribution
of Xl, ... ,Xn , the block size e must grow to infinity with the sample size
n to capture the dependence structure of the series {Xn}n?:l, eventually.
Typical choices of e are of the form e = Cno for some constants C > 0,
6 E (0,1/2). For more on properties of MBB estimators and effects of block
lengths on their performance, see Chapters 3-7.
30 2. Bootstrap Methods
(Here we use the index "2" in the superscipt to denote the blocks for the
NBB resampling scheme. We reserve the index 1 for the MBB and we shall
use the indices 3, 4, etc. for the other block bootstrap methods described
later.) Note that while the blocks in the MBB overlap, the blocks Bi
2 ),s
under the NBB do not. See Figure 2.2. As a result, the collection of blocks
from which the bootstrap blocks are selected is smaller than the collection
for the MBB.
FIGURE 2.2. The collection {Bi2) , ... , B~2)} of nonoverlapping blocks under Carl-
stein's (1986) rule.
The next step in implementing the NBB is exactly the same as that for
the MBB. We select a simple random sample of blocks B;(2), ... , BZ(2) with
replacement from {BF), ... , B?)} for some suitable integer k 2: 1. With
m = k.e, let F;;'\;: denote the empirical distribution of the bootstrap sam-
ple (X2',1, ... ,X2',J!; ... ;X2',{(b_1)£+1}, ... ,X2',m), obtained by writing the
elements of B;(2), ... , BZ(2) in a sequence. Then, the bootstrap version of
an estimator en= T(Fn) is given by
()*(2) =
m,n
T(F*(2))
m,n
. (2.13)
Even though the definition of the bootstrapped estimators are very sim-
ilar for the MBB and for the NBB, the resulting bootstrap versions ()::n n
and ();;i~J have different distributional properties. We illustrate the poi~t
with the simplest case, where en = n- 1 2:7=1 Xj is the sample mean. The
2.7 Generalized Block Bootstrap 31
i=l
N £
N- l 2: (C- 2: X 1 j+i-1)
j=l i=l
£-1 }
N- 1 { nXn - C- 1 ~(C - j)(Xj + Xn-j+d . (2.14)
To obtain a similar expression for E*(O~:J), note that under the NBB,
the bootstrap variables (X2',l, ... ,X2,£),· .. , (X~,(m-Hl)' ... ,X2',m) are iid,
with common distribution
(2.15)
E (0*(2))
* m,n
i=l
b £
b- l 2: (rl 2: X (j-l)Hi)
j=l i=l
(2.16)
weights to the observations toward the beginning and the end of the data
set than to the middle part. Indeed, for R :s: j :s: n - R, the jth obser-
vation Xj appears in exactly R of the blocks {B I , ... , B N }, whereas for
1 :s: j :s: R- 1, Xj and X n - j +1 appear only in j blocks. Since there is no
observation beyond Xn (or prior to Xl), we cannot define new blocks to get
rid of this boundary effect. A similar problem also exists under the NBB
with the observations near the end of the data sequence when n is not a
multiple of R. Politis and Romano (1992b) suggested a simple way out of
this boundary problem. Their idea is to wrap the data around a circle and
form additional blocks using the "circularly defined" observations.· Politis
and Romano (1992b, 1994b) put forward two resampling schemes based on
circular blocks, called the "circular block bootstrap" (CBB) and the "sta-
tionary bootstrap" (SB). Here we describe a generalization of their idea and
formulate the generalized block bootstrap method, which provides a unified
framework for describing different block bootstrap methods, including the
CBB and the SB.
Given the variables Xn = {X I, ... , X n }, first we define a new time series
Yn,i, i 2: 1 by periodic extension. Note that for any i 2: 1, there are integers
k i 2: 0 and ji E [1, n] such that i = kin + ji. Then, i = ji (modulo n). We
define the variables Yn,i, i 2: 1 by the relation Yn,i = X ji . Note that this
is equivalent to writing the variables Xl, ... , Xn repeatedly on a line and
labeling them serially as Yn,i, i 2: 1. See Figure 2.3.
Xn Xl
Yn,n Yn,(n+l)
measure on ®:l ({I, ... , n} x N) == {{ it, Rtl~l : 1 :s: it :s: n,l :s: Rt <
00 forall t 2: I} and for any set A c ®:I({l, ... ,n} x N), fn(-;A)
is a Borel measurable function from ~n into [0,1]. Then, the generalized
block bootstrap (GBB) resamples blocks from the collection {B(i,j) : i 2:
1, j 2: I} according to the transition probability function f n as follows. Let
(h, JI), (12, J 2 ), ... be a sequence of random vectors with conditional joint
distribution f n (Xn; .), given X n . Then, the blocks selected by the GBB
2.7 Generalized Block Bootstrap 33
are given by B(h, Jd, B(h, h), ... (which may not be independent). Let
X C,1,XC,2' ... denote the elements of these resampled blocks. Then, the
bootstrap version of an estimator On = T(Fn) under the GBB is defined as
();J;:,) = T(F;;'~<;')) for a suitable choice of m ~ 1, where F;;'~<;') denotes the
empirical distribution of Xc Xc
1> •.• ' m·
Almost all block bootstrap method~ proposed in the literature can be
shown to be special cases of the GBB. For example, for the MBB based on
a block length i, 1 ::; i::; n, the transition probability function f n is given
by
N
® ((N- L8 8£), x E IRn
CXl
fn(x;·) = 1 j ) X
i=l j=l
As a consequence, the resampled blocks B(h, J 1 ), B(I2 , h), ... , come from
the subcollection {B(i,j) : 1 ::; i ::; N,j = i}, which is the same as the
collection of overlapping blocks {B 1 , • .. , B N} defined in Section 2.5. Simi-
larly, the NBB method can also be shown to be a special case of the GBB.
Here, we consider a few other examples.
fn(x;·) = 1 j ) x (2.17)
i=l j=l
Denote the resampling block indices for the eBB (i.e., the variables Ii'S
in the collection (h, J l ), (h, h), ... whose joint distribution is specified
by the f n (·,·) of (2.17)) by h,1,I3 ,2, .... Then, (2.17) implies that the
variables h,l,!3,2, ... are conditionally iid with P*(I3,1 = i) = n- 1 and
P* (Ji = i) = 1 for all i = 1, ... , n. Since each Xi appears exactly i times in
the collection of blocks {B(i, i), ... , B(n, in,
and since the eBB resamples
the blocks from this collection with equal probability, each of the original
observations Xl' ... ' Xn receives equal weight under the eBB. This prop-
erty distinguishes the eBB from its predecessors, viz., the MBB and the
34 2. Bootstrap Methods
NBB, which suffer from edge effects. This is also evident from the following
observation. Let X3' 1, X3' 2, ... denote the eBB observations obtained by
arranging the eleme~ts of the resampled blocks {B(I3,i,R) : i 2: 1} and let
- .(3)
Xm denote the CBB sample mean based on m bootstrap observations,
where m = kR. for some integer k 2: 1. Then, by (2.17),
E. [m- 1
f
2=1
X3',i]
t
g-lE. [t;X;,i]
g-1 [n- 1
{t,Yn,(Hi-l)}]
R-l[RXn]
Xn . (2.18)
Thus, the conditional expectation of the bootstrap sample mean under the
CBB equals the sample mean of the data X n , a property not shared by the
MBB or the NBB. As noted by Politis and Romano (1992b), this makes it
easier to define the bootstrap version of a pivotal quantity of the form Tn =
- • - .(3) - .
tn(Xn; fJ), where fJ = EX 1 . Under the CBB, Tm,n = tm(Xm ; Xn) gIves
the appropriate bootstrap version of Tn. However, replacing the population
parameter fJ simply by Xn to define the bootstrap version of Tn under the
MBB or the NBB introduces some extra bias and hence, it is no longer the
right thing to do (cf. Lahiri (1992a)). We shall look at properties of the
CBB method in Chapters 3, 4, and 5.
(2.19)
2.7 Generalized Block Bootstrap 35
rn(x;·) = x E ffi. n .
i=l j=l
Mo 1,
Mj inf {i ~ M j -1 + 1 : Wi = I}, j ~ 1.
1 :::; i < n, j = i + 1
1 :::; i < n, j =I- i + 1
(2.20)
i = n, 2:::; j :::; n
i = n,j = 1 .
Thus, Zl takes the values 1, ... , n with probability n- 1 each. Also, for
any k :::0: 1, given that Zk = i, 1 :::; i :::; n, the next index Zk+1 takes
the value i + 1 (modulo n) with probability p + n- 1 (1 - p) and it takes
each of the remaining (n - 1) values with probability n- 1 (1 - p). Thus,
from the second formulation of the SB described earlier, it follows that the
SB observations {X4' ihEN may also be generated by the index variables
{ZdiEN as '
X:,i = X z" i :::0: 1 . (2.21 )
To see that {X4' JiEN is stationary, note that by definition, the transition
matrix Q is doubly stochastic and that it satisfies the relation 7r' Q = 7r'.
Therefore, 7r is the stationary distribution of {ZdiEN and {ZihEN is a
stationary Markov chain. Thus, we have proved the following Theorem.
Theorem 2.3 Let Fin denote the a-field generated by Zi and Xn , i :::0: 1.
Then, conditional on Xn , {X4' i' Fin} iEN is a stationary Markov chain for
each n :::0: 1, i.e., '
and
2.8 Subsampling
Use of different subsets of the data to approximate the bias and variance
of an estimator is a common practice, particularly in the context of iid
observations. For example, the Jackknife bias and variance estimators are
computed using subsets of size n-1 from the full sample Xn = (Xl"'" Xn)
(cf. Efron (1982)). However, as noted recently (see Carlstein (1986), Politis
and Romano (1994a), Hall and Jing (1996), Bickel et al. (1997), and the
references therein), subseries of dependent observations can also be used to
produce valid estimators of the bias, the variances, and more generally, of
the sampling distribution of a statistic under very weak assumptions.
To describe the subsampling method, suppose that On = tn(Xn) is an
estimator of a parameter (), such that for some normalizing constant an > 0,
the probability distribution Qn(x) = P(an(On -()) :::; x) ofthe centered and
scaled estimator On converges weakly to a limit distribution Q(x), i.e.,
N
On(X) = N- l L ll(at(Oi,t - On) :::; x), x E IR , (2.24)
i=l
(2.25)
(2.26)
which is the sample variance of Oi,e 's multiplied by the scaling factor ai a~2.
In (2.25) and (2.26), we need to use the correction factors (a£!a n ) and
(aeJa n )2 to scale up from the level of Oi,R.'S, which are defined using £-
observations, to the level of On, which is defined using n-observations. In
applying a bootstrap method, one typically uses a resample size that is com-
parable to the original sample size, and therefore, such explicit corrections
of the bootstrap bias and variance estimators are usually unnecessary.
In analogy to the bootstrap methods, one may attempt to apply the
subsampling method to a centered variable of the form TIn == (On - ()).
However, this may not be the right thing to do. Indeed, if instead of using
the subsampling method for the scaled random variable an(On - ()), we
consider only the centered variable TIn = (On - ()), then the subsampling
estimator of the distribution Qln, say, of TIn would be given by
N
Qln(X) == N- I L n((Oi,e - On) ::; x), x E lR .
i=l
'Va;ln((~n)
N
Var(X£) + N- I L {[Xi ,£ - B]2 - Var(X£)} - [Pn - B]2
i=1
(2.27)
n
Yn(w) = n- I / 2 L Xj exp( -(wj), wE (-Jr, Jr] ,
j=l
2.10 Sieve Bootstrap 41
where recall that ~ = yCT. Though the Xi's are dependent, a well known
result in time-series states (cf. Brockwell and Davis (1991, Chapter 10);
Lahiri (2003a)) that for any set of distinct ordinates -7r < AI, ... ,Ak :::; 7r,
the Fourier transforms Yn(AI), ... , Yn(Ak) are asymptotically independent.
Furthermore, the original observations Xn admit a representation in terms
of the transformed values Yn = {Yn(Wj) : j E In} as (cf. Brockwell and
Davis (1991, Chapter 10)),
where Wj = 27rj/n and In = {-l(n - 1)/2J, ... , In/2j}. Thus, using the
inversion formula (2.28), we can express a given variable Rn = rn(Xn; B)
also in terms of the transformed values Yn. Since the variables in Yn are
approximately independent, we may (suitably) resample these Y-values to
define the FDB version of Rn. Here, however, some care must be taken since
the (asymptotic) variance of the Y-variables are not necessarily identical.
A more complete description of the FDB method and its properties are
given in Chapter 9.
(2.29)
for Borel sets B in ]R, where Po t:;;1 denotes the probability distribution on
]R induced by the transformation tn (.) under P. As described in Chapter 1,
the bootstrap and other resampling methods are general estimation meth-
ods for estimating the level-2 parameters like Gn(B), Var(Tn) , etc. When
the Xi's are iid with a common distribution F, we may write P = F oo
and an estimator of Gn(B) in (2.29) may be generated by replacing P with
Pn = F:t' in (2.28), where Fn is an estimator of F. However, when the
Xi's are dependent, such a factorization of P does not hold. In this case,
estimation of the level-2 parameter Gn(B) can be thought of as a two-step
procedure where, in the first step, P is approximated by a "simpler" prob-
ability distribution Pn and in the next step, Pn is estimated using the data
{X 1, ... , X n }. The idea of the sieve bootstrap is to choose {i\ }n~ 1 to be
a sieve approximation to P, i.e., {Fn }n>1 is a sequence of probability mea-
sures on (]Roo, 8(]ROO)) such that for each n, Fn+l is a finer approximation
to P than Fn and Fn converges to P (in some suitable sense) as n --+ 00.
42 2. Bootstrap Methods
For the block bootstrap methods like the NBB or the MBB, the first step
approximation Pn is taken to be Pc ® Pc ® ... , where Pc denotes the joint
distribution of the block {Xl, ... ,Xc} of length e. In the second step, Pc
is estimated by the empirical distribution of all overlapping (under MBB)
or nonoverlapping (under NBB) blocks of length e contained in the data.
For a large class of stationary processes, Biihlmann (1997) presents a sieve
bootstrap method based on a sieve of autoregressive processes of increas-
ing order, which we shall briefly describe here. However, other choices of
{Pn }n>l is possible. See Biihlmann (2002) for another interesting proposal
based on variable length Markov chains for finite state space categorical
time series. In general, there is a trade-off between the accuracy and the
range of validity of a given sieve bootstrap method. Typically, one may
choose a sieve to obtain a more accurate bootstrap estimator, but only at
the expense of restricting the applicability to a smaller class of processes
(cf. Lahiri (2002b)).
Let {XdiEZ be a stationary process with EX1 = f-L such that it admits
the one-sided moving average representation
00
00
with 2::;:1,8J < 00. The representation (2.31) suggests that autoregressive
processes of finite orders Pn, n 2': 1, may be used to define a sieve ap-
proximation for the joint distribution P of {XihEz. To describe the sieve
bootstrap based on autoregression, let Xn = {Xl, ... , Xn} denote the ob-
servations from the process {XdiEZ. Let {Pn}n>l be a sequence of positive
p:
integers such that Pn i 00 as n -+ 00, but n- 1 -+ 0 as n -+ 00. The sieve
approximation Pn to P is determined by the autoregressive process
Pn
X i -f-L=L,8j(Xi- j -f-L)+Ei, iEZ. (2.32)
j=l
Next, we fit the AR(Pn) model (2.32) to the data Xn to obtain estimators
of the autoregression parameters b1n, ... ,bpnn (for example, by the least
2.10 Sieve Bootstrap 43
by setting the initial Pn-variables X;, ... ,X;n equal to X n. The autore-
gressive sieve bootstrap version of the estimator Tn = t n (X 1 , ... , Xn) is
now given by
T;",n = trn(X;, ... , X;,), m > Pn .
Under some regularity conditions on the variables {EihEZ of (2.30) and
the sieve parameter Pn, Biihlmann (1997) establishes consistency of the
autoregressive sieve bootstrap. It follows from his results that the autore-
gressive sieve bootstrap provides a more accurate variance estimator for the
class of estimators given by (2.11) than the MBB and the NBB. However,
consistency of the autoregressive sieve bootstrap variance estimators holds
for a more restricted class of processes than the block bootstrap methods.
See Biihlmann (1997), Choi and Hall (2000), and the references therein for
more about the properties of the autoregressive sieve bootstrap.
3
Properties of Block Bootstrap
Methods for the Sample Mean
3 .1 Introduction
In this chapter, we study the first-order properties of the MBB, the NBB,
the eBB, and the SB for the sample mean. Note that for the first three
block bootstrap methods, the block length is nonrandom. In Section 3.2,
we establish consistency of these block bootstrap methods for variance and
distribution function estimations for the sample mean. The SB method uses
a random block length and hence, requires a somewhat different treatment.
We study consistency properties of the SB method for the sample mean in
Section 3.3.
For later reference, we introduce some standard measures of weak depen-
dence for time series. Let (n, F, P) be a probability space and let A and E
be two sub a-fields of F. When A and E are independent, for any A E A
and any BEE, we have the relations ~1 == [P(A n B) - P(A) . P(B)] = 0
and ~2 == [P(BIA) - P(B)] = 0, provided P(A) =1= O. When A and E are
not independent, we may quantify the degree of dependence of A and E
by looking at the maximal values of ~1 or ~2 or of some other similar
quantities. This leads to the following coefficients of dependence:
¢-mixing:
¢(A, B) = { I P(AnB)
P(A) I
- P(B) : A E A,P(A) # O,B E B } . (3.2)
W-mixing:
w(A, B) = sup {I I
p(AnB) - 1 : A E A, B E B, P(A)
P(A)P(B) # 0, P(B) # 0 }
(3.3)
p-mixing:
See Chapter 1 of Doukhan (1994) for the properties of these mixing coef-
ficients. For an index set I c Z, I # 0, the mixing coefficients of a time
series {XihEI at lag m ~ 1 are defined by considering the maximal val-
ues of these coefficients over the <T-fields A = <T({Xi : i ~ k, i E I}) and
B = <T({Xi : i ~ k + m, i E I}) for all k E I. Specifically, we have the
following definition for the a-mixing and the p-mixing cases.
(b) Let p, q, r E (1,00) be any real numbers satisfying p-l +q-l +r- 1 = 1.
Then,
ICov(X, Y)I ::; 8[a(a({X}),a({Y}))F/r IIXllpllYllq .
(c)
In the next section, we establish consistency of the MBB, the NBB and
the eBB method for the sample mean.
where J.L = E(X1) and Xn = n- 1 L:~l Xi. In this section, we establish con-
sistency of the MBB, the NBB, and the CBB estimators ofthe (asymptotic)
covariance matrix
Var(Tn) == ETnT~
of Tn, and also, of the sampling distribution
Gn(x) == P(Tn :::; x), x E ]R.d
of Tn. For simplicity, we suppose that for each of the block bootstrap meth-
ods, b == ln/£J blocks are resampled and thus, the resample size is n1 = b£.
- *(1) - *(2) - *(3)
Write Xn ,Xn ,and Xn for the sample means of the n1 bootstrap
observations based on the MBB, the NBB, and the CBB, respectively. The
bootstrap versions of Tn are then given by
and
X']
n
£ [n -1 "
~
u~3)
~
U(3)1
1.
- Xn n (3.9)
i=l
3.2 Consistency of MBB, NBB, eBB: Sample Mean 49
(b)
2j£
R(j)= L
i=2(j-1)£+1
(3.11)
Var ( t, Wi n f(U 1i ))
< 2 [I:
i=1
ER(j)2 + (J + 1)· L
1'5. k,!;J/2
a((2k - 1)2£ - f) . 4(4f ll !1100)2]
This yields the first term in the upper bound in part (a). The second term in
the bound is obtained similarly by using the inequalities ICov(R(j), R(j +
k))1 ~ C(8) (ER(j)2+<5) 1/(2+<5) (ER(j + k)2+<5) 1/(2+<5) a((k -1)2f _ f).5/(2+<5)
and (ER(j)2+<5) 1/(2+<5) ~ 2f(2+<5,n, and retracing the steps above.
For proving part (b), splitting the sum over odd and even indices, we get
var( 8win!(Ui;»))
b
p*(U; = Ui ) = l/N, 1 ~ i ~ N,
where, recall that, N = n-f+1 and Ui = (Xi +·· ·+XiH-1)/f, i?: 1. Also,
note that X~(1) = b- 1 2:~=1 Ut. Hence, by the conditional independence of
Ui,···,U;,
nvar*(i t,ut)
1
n 1 b- 1Var*(U{)
52 3. Properties of Block Bootstrap Methods for the Sample Mean
?= UiU: - PnP~] ,
N
= £[N- 1 (3.12)
t=1
0(£) as n ~ 00
for any constants WIn, ... , Win E [-1,1]. For d > 1, using this bound
component-wise and using the stationarity of the Xi'S, from (2.14), we get
+ Ell t(i/£)X£-iln}
0([£/n]2) + O([£/n])
O(£/n) as n ~ 00 ,
Next, note that by definition, Un = ;ZX£, and that under the conditions
of Theorem 3.1 (cf. Appendix A), ynXn ~d N(O, ~oo). Hence, by the
(extended) dominated convergence theorem,
> 3f)
This proves Theorem 3.1 for the MBB. Next, consider the NBB. Write
Ut(2) for the ith resampled block average under the NBB. Then, by (2.15),
n1 b-1Var* (U*(2»)
1
Var*(T,;'(2)) ------7p I;oo as n ---+ 00. Finally, for the eBB, note that E*X~(3) =
Xn for any block length C E [1, n]. Hence,
C[
n
-1 ~ U(3)u(3)'
~""
- X X']
n n
(3.17)
i=l
where, recall that, UP) == (Yn,i + ... + Yn,(iH-1))/C, Noting that for 1 :::;
i :::; N, UP) = Ui and that under the conditions of Theorem 3.1, EIIX1 +
... + Xml1 2 :::; C(d)m for all m 2: 1, we get
that for x = (Xl, ... , Xd)', Y = (Yl, ... , Yd)' E jRd, X s:; Y if Xi s:; Yi for all
i = 1, ... ,d.
Theorem 3.2 Suppose that there exists a 6" > 0 such that EIIXl I12+c5 < 00
and L~=l a(n)<l/(2+c5) < 00. Also, suppose that ~oo = LiEZ Cov(Xl , X Hi )
is nonsingular and that £-1 + n- l £ = 0(1) as n ---; 00. Then, for j = 1,2,3,
xErn:. d
I
sup P* (T~(j) s:; x) - P(Tn s:; x) I--+p 0 as n ---; 00 .
Theorem 3.2 shows that like the bootstrap variance estimators, the dis-
tribution function estimators o~fl's are consistent estimators of G n for a
wide range of values of the block length parameter £. Indeed, the conditions
on £ presented in both Theorem 3.1 and Theorem 3.2 are also necessary
for consistency of these bootstrap estimators. If £ remains bounded, then
the block bootstrap methods fail to capture the dependence structure of
the original data sequence and converge to a wrong normal limit as in the
example of Singh (1981) (cf. Section 2.3). On the other hand, if £ goes to
infinity at a rate comparable to the sample size n (violating the condition
n -1 £ = o( 1) as n ---; (0), then there are not enough number of distinct
blocks to recreate a representative image of the "infinite population". It
can be shown (cf. Lahiri (2001)) that, in this case, the estimators oW)
converge to certain random probability measures.
Consistency of the bootstrap estimators oW)'s remains valid over a much
larger class of sets than asserted in the statement of Theorem 3.2. Let (! be a
metric on the set of all probability measures on jRd, metricizing the topology
of weak convergence of probability measures (see (A.2) of Appendix A or
Parthasarathi (1967)). The proof actually shows that under the conditions
of Theorem 3.2,
for j = 1,2,3, where, recall that .c(T~(j) IXn ) denotes the conditional dis-
tribution of T~(j) given X n . Since N(O, ~oo) is an absolutely continuous
distribution on jRd, a result of Rao (1962) on uniformity classes (see Sec-
tion 1.2 of Bhattacharya and Rao (1986)) implies that the convergence of
P*(T;:(j) E .) and of P(Tn E .) to <I>(.; ~oo) is uniform over the collection C
of all Borel-measurable convex subsets of jRd. Hence, it follows that under
the conditions of Theorem 3.2,
for j = 1,2,3.
Consistency of the block bootstrap estimators of [(Tn) continues to hold
when a resample size of an order different from the sample size is chosen (cf.
56 3. Properties of Block Bootstrap Methods for the Sample Mean
Thus,
(3.21 )
3.3 Consistency of the SB: Sample Mean 57
Next, note that (3.19) would follow if for any subsequence {ni}, there is
a further subsequence {nk} C {nil such that
(3.22)
Fix a subsequence {nil. Then, by (3.21) and Theorem 3.1, there exists a
subsequence {nd of {ni} such that as k -+ 00
Var * (T*(l))
nk
-+ I: 00 a.s. (3.23)
Note that T~(l) = L~=l (ut - flnhlijb is a sum of conditionally iid ran-
dom vectors (Ui - fln)Jl7b, ... , (Ub - fln)v'lfb, which, by (3.23), satisfy
Lindeberg's condition along the subsequence nk, almost surely. Hence, by
the CLT for independent random vectors (cf. Theorem A.5, Appendix A),
the conditional distribution C(T~~l) IXnk ) of T~~l) converges to N(O, I: oo )
as k -+ 00, almost surely. Hence, by a multivariate version of Polya's Theo-
rem, (3.22) follows. This proves Theorem 3.2 for the case j = 1. The proof
is similar for j = 2,3. The reader is invited to supply the details. 0
n- 1{
t-1
0
n-k
XiX{+k +_ L
n
t-n-k+1
}
XiX{H_n qk
+ XnX~(l _ qk)
{f n(k) + t n(n - k)' }qk + XnX~ . (3.24)
Next, noting that the bootstrap samples under the SB form a stationary
sequence, we have
[{ E*XiXi' - XnX~} +
~(1- n- 1k){ E*XiXi~k + E*Xi' Xi+k - 2XnX~}].
k=l
(3.25)
Hence, the proposition follows from (3.24) and (3.25). D
For proving the theorem, we need two auxiliary results. The first one is a
standard bound on the cumulants of strongly mixing random vectors. For
later reference, we state it in a slightly more general form than the set up of
Theorem 3.3, allowing nonstationarity of the random vectors {XihEZ, For
any random variables ZI, ... , Zr, (r::::: 1), we define the rth-order cumulant
Kr(ZI, ... , Zr) by
r (*j) j
where extends over all partitions {h, ... , I j } of {1, ... , r} and where
2:(*j)
C(I1"" ,Ij)'s are combinatorial coefficients (cf. Zhurbenko (1972)). It is
easy to check that cumulants are multilinear forms, i.e.,
for any r ::::: 1, s ::::: 1. This identity plays an important role in the proof of
the lemma below.
/j > O. Also, let a1, ... , am be any m unit vectors in]Rd (i.e., Ilaill = 1, 1 :::;
i :::; m) for some 2 :::; m :::; 2r. Then,
IKm(a~Xj,,"" a;"'Xj=) I
IKm(a~Xj,,"" a;"'Xj=)
-IT {(E
2=1
II a~Xjs) (E II a~Xjs)} I
sElinJ, sElinh
where (s,n == sup {(EIIXj II S)l/S : 1 :::; j :::; n} for all s > 0, n~ 1,
and C(m) is a constant that depends only on m, not on n. Next, note
that for any 0 :::; t :::; n - 1, there are at most n . t m - 1 sets of indices
{j1,"" jm} C {I, ... , n} that has maximal gap t. Hence,
l: IKm(a~Xj,,"" a;"'Xj"JI
l:'Oj" ... ,j=:'On
3.3 Consistency of the SB: Sample Mean 61
n-1
< 2)ntm - 1). C(m)· (:+8,n[a(t)],,/m+8
t=o
< C(m)~(r, O)(~+8 .n . (3.30)
This, together with (3.29), completes the proof of the first inequality.
The bound on EIS~I readily follows by using cumulant expansions
for moments (cf. Section 6, Bhattacharya and Rao (1986)) and the first
inequality. Hence, the lemma is proved. D
The next lemma gives an expression for the covariance of the "uncen-
tered" sample cross-covariance estimators
n-j
Un ] ,
A ("
a, (3) -= n -1 '~
" ' X in X{3
i+j'
i=l
Lemma 3.3 Suppose that the conditions of Theorem 3.3 hold and that
EX1 = O. Then, for any a, (3, 'Y, v E Z~ with lal = 1(31 = I'YI = Ivl = 1, and
any 0 S; j, k S; n - 1,
(3.31)
r
Proof of Theorem 3.3: Let n(k) = n- 1 L.~:lk XiXI+k' 0:::; k < nand
r r
En = n(O) + l:~:i qnk(r n(k) + n(kY). Then, by Proposition 3.2,
(3.32)
We prove this by showing that the bias and the variance of each element
of the matrix En go to zero as n -+ 00. For this, we label the elements of a
d x d matrix A by the d-dimensional unit vectors a, {3 E Zi. For example,
if a = (1,0, ... ,0)' and {3 = (0,1,0, ... ,0)', then A( a, (3) would denote the
(1,2)-th element of the matrix A. With this notation, for any a, {3 E Zi
with lal = 1{31 = 1,
-L (EXfXf+k+EXfXf+k). (3.33)
k=n
Hence, the bias part goes to zero. Next, we consider the variance part. Note
that by Lemma 3.3,
+ IRn(j,k;a,,6,1,v)1 (3.35)
Var(tn(a, (3))
n-ln-l
+ LLqnjqnk( max IRn(j,k;al,a2,a3,a4)1)
j=O k=O a;E{a,.6}
< cn-l(~qnkr
n-ln-ln-jn-k
+ C(d)n-2 L L L L sup {IK4(Z~XO'Z~Xt-s'Z~Xj,Z~Xk-j)1 :
j=O k=O 8=1 t=l
IIZili ~ 1, i= 1,2,3,4}
Nl
j*(4)
n
= N-1 1/ 2 '"'(X"(4)
~ tn·
_ X )
i=l
Thus, T~(4) differs from T~(4) only by the inclusion of the additional boot-
strap observations, if any, in the last block that lie beyond the first n
resampled values. It is easy to see that N1 - LK ~ n ~ N1, so that the
difference between nand N1 is at most LK. Since we assume that the
expected value of the block lengths is negligible compared to n, the differ-
ence between these two versions can be shown to be negligible under the
conditions of Theorem 3.4. Theorem 3.4' establishes consistency of the SB
method for estimating the distribution function of Tn.
Theorem 3.4 Suppose that EIIXl 11 6 +O < 00, Eoo is nonsingular, and
L~=l n 5 a(n)6/(6+O) < 00 for some 8 > O. Also, assume that p +
(n 1/ 2 p)-1 -+ 0 as n -+ 00. Then,
From the discussion following Theorem 3.2, it follows that (3.37) is equiv-
alent to
as n-+oo,
sup IE{P(
xEIR V
~ t (8(14,j; Lj ) -
N1 j=1
LjXn) ::; x I Tn) I Xn}
- <I>(x/O"oo) I
X ll(An) I Xn]
To complete the proof of Theorem 3.4 and for later reference, here we
establish some basic properties of the stopping time K and of the (random
length) block sums for the SB method in the following result.

Lemma 3.4 Assume that p + (np)^{-1} → 0 as n → ∞. Let {t_n}_{n≥1} be a
sequence of positive numbers such that t_n → ∞ as n → ∞. Also, let r ≥ 1
be a given integer. Then,

(b) Suppose that the conditions of Theorem 3.4 hold. Let τ̂²(ℓ) =
Var( S(I_{4,1}; ℓ) │ 𝒯_n ), ℓ ≥ 1. Then,
Next, let f(t) = log{ e^{−tn} ( p e^t (1 − q e^t)^{−1} )^{k_0} }, 0 < t < −log q. It is easy
to see that f(t) attains its minimum at t_0 ≡ log[(n − k_0)/n] − log q ∈
(0, −log q). Now, using Taylor's expansion, after some algebra, we get

    P( K ≥ k_0 ) ≤ exp( f(t_0) )

    ≤ P( L_1 = m ) [ 1 + ∑_{k=2}^{∞} P( k ≤ K < k + m ) ] .

Let W_{kn} = P( L_1 = k ), k ≥ 1. Then, by (3.39) and the Cauchy-Schwarz
inequality, we have

    ∑_k E{ [ n^{-1} ∑_{i=1}^{n} ( S(i; k)² − τ²(k) ) ] W_{kn} }²
      ≤ ∑_{k ≤ p^{-1}(log n)²} W_{kn} Var( n^{-1} ∑_{i=1}^{n} S(i; k)² )
        + 2 ∑_{k > p^{-1}(log n)²} W_{kn} { E( n^{-1} ∑_{i=1}^{n} S(i; k)⁴ ) + τ(k)⁴ }

    + C [ ∑_{k > p^{-1}(log n)²} ⋯ + { E(L_1) }^r ∑_{i ≥ p^{-1/2}} |γ(i)| ]

    E{ p E_* τ̂²(L_1) − σ_∞² }² ≤ ⋯ E[ E_*{ n^{-1} ∑_{i=1}^{n} S(i; L_1)⁴ } ]
        + ( E‖X_1‖⁴ ) · ∑_k k⁴ w_{kn} .                                   (3.42)
4
Extensions and Examples

4.1 Introduction
In this chapter, we establish consistency of different block bootstrap meth-
ods for some general classes of estimators and consider some specific exam-
ples illustrating the theoretical results. Section 4.2 establishes consistency
of estimators that may be represented as smooth functions of sample means.
Section 4.3 deals with (generalized) M-estimators, including the maximum
likelihood estimators of parameters, which are defined through estimat-
ing equations. Some special considerations are required while defining the
bootstrap versions of such estimators. We describe the relevant issues in
detail in Section 4.3. Section 4.4 gives results on the bootstrapped empiri-
cal process, and establishes consistency of bootstrap estimators for certain
differentiable statistical functionals. Section 4.5 contains three numerical
examples, illustrating the theoretical results of Sections 4.2-4.4.
    X_i = ( X_{0i}, X_{0i} X_{0(i+1)} )′ , i ∈ ℤ ,

and consider the corresponding estimator, where X̄_{n−k} = (n − k)^{-1} ∑_{i=1}^{n−k} X_i. Thus, this is an example that falls
under the purview of the Smooth Function Model. □
and set Y_i = ( X_{0i}, X_{0i}², X_{0i} X_{0(i+k)} )′, i ∈ ℤ. Then, it is easy to see that the
function H(·) is smooth in a neighborhood of EY_1 and that r(k) and r̂_n(k)
can be expressed as

    r(k) = H( E Y_1 )

and

    r̂_n(k) = H( Ȳ_{n−k} ) ,

where Ȳ_m = m^{-1} ∑_{i=1}^{m} Y_i, m ≥ 1. □
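As a quick illustration of this representation, the following Python sketch (the function H and all variable names are our illustrative choices) computes the lag-k autocovariance both directly and as a smooth function H of the mean of the transformed vectors Y_i:

```python
import numpy as np

def H(y):
    # H(y1, y2, y3) = y3 - y1**2: maps the mean of Y_i = (X_i, X_i^2, X_i X_{i+k})'
    # to the lag-k autocovariance (only the 1st and 3rd coordinates enter here).
    return y[2] - y[0] ** 2

rng = np.random.default_rng(0)
x, k = rng.standard_normal(500), 2
Y = np.column_stack([x[:-k], x[:-k] ** 2, x[:-k] * x[k:]])  # Y_i, i <= n - k
gamma_hat = H(Y.mean(axis=0))            # H applied to the vector of sample means
direct = np.mean(x[:-k] * x[k:]) - np.mean(x[:-k]) ** 2
print(gamma_hat, direct)                  # identical: same smooth function of means
```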
    (4.3)

has no zero on the closed unit disc { |z| ≤ 1 }. Then, the AR(p)-process
{X_{0i}}_{i∈ℤ} admits a representation of the form

    X_{0i} = ∑_{j=0}^{∞} a_j ε_{i−j} , i ∈ ℤ .                            (4.4)
We claim that the parameter vector θ = ( β_1, ..., β_p; σ² )′ and the estimator
θ̂_n = ( β̂_{1n}, ..., β̂_{pn}; σ̂_n² )′ satisfy the requirements of the Smooth Function
Model. To see this, define a new ℝ^{p+2}-valued process {X_i}_{i∈ℤ} by

Let h_1(x) = ( x_2 − x_1² ) and h_2(x) = ( x_3 − x_1², ..., x_{p+2} − x_1² )′, x =
( x_1, ..., x_{p+2} )′ ∈ ℝ^{p+2}. Then, writing X̄_m = m^{-1} ∑_{i=1}^{m} X_i, m ≥ 1, we
have γ̂_n(0) = h_1( X̄_{n−p} ) and γ̂_{p,n} = h_2( X̄_{n−p} ).
Next, let S_p^* (and S_p^{*+}, respectively) denote the collection of all symmet-
ric (and symmetric nonsingular) matrices of order p, and let g_1 : S_p^* → S_p^*
be defined by

    g_1(A) = A^{-1}   if A ∈ S_p^{*+} ,
    g_1(A) = ⋯       otherwise.

Since, for A ∈ S_p^{*+}, the elements of A^{-1} are given by the ratios of the co-
factors of A and the determinant of A, and the determinant of a matrix is a
polynomial in its elements, the components of the function g_1(·) are rational
functions (and, hence, infinitely differentiable functions) of the elements of
its argument at any A ∈ S_p^{*+}. Also, let g_2 : ℝ^p → S_p^* be defined by
Then the estimator θ̂_n = ( β̂_{1n}, ..., β̂_{pn}; σ̂_n² )′ can be expressed as

    ⋯ ,

where

    h_1( X̄_{n−p} ) = H^{(1)}( X̄_{n−p} ) ,   h_2( X̄_{n−p} ) = H^{(2)}( X̄_{n−p} ) ,

and σ̂_n² ⋯, where γ_p = ( γ(1), ..., γ(p) )′ and Γ_p is the p × p matrix with (i,j)-th
element γ(i − j), 1 ≤ i, j ≤ p. Thus, the Yule-Walker estimators also fall
under the purview of the Smooth Function Model. □
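To illustrate, the Yule-Walker estimates can be computed by solving Γ̂_p β̂ = γ̂_p, with σ̂² = γ̂(0) − γ̂_p′ β̂; every ingredient is a smooth function of sample moments. A Python sketch (the function name and the AR(1) test data are our own):

```python
import numpy as np
from scipy.linalg import solve, toeplitz

def yule_walker(x, p):
    """Yule-Walker estimates (sketch): solve Gamma_p beta = gamma_p,
    where Gamma_p has (i,j)-th element gamma_hat(i-j)."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    gamma = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])
    Gamma_p = toeplitz(gamma[:p])           # p x p Toeplitz matrix gamma(i-j)
    beta = solve(Gamma_p, gamma[1:])        # (beta_1, ..., beta_p)'
    sigma2 = gamma[0] - gamma[1:] @ beta    # innovation variance estimate
    return beta, sigma2

rng = np.random.default_rng(2)
e = rng.standard_normal(1000)
x = np.zeros(1000)
for i in range(1, 1000):                    # AR(1) with beta_1 = 0.5
    x[i] = 0.5 * x[i - 1] + e[i]
print(yule_walker(x, p=1))                  # beta_1 estimate close to 0.5
```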
    T_{1n} = √n ( θ̂_n − θ ) ,

where θ̂_n^{*(j)} = H( X̄_n^{*(j)} ) and θ̃_{n,j} = H( E_* X̄_n^{*(j)} ), 1 ≤ j ≤ 4.
Then, we have the following result.
Theorem 4.1 Suppose that the function H is differentiable in a neigh-
borhood N_H ≡ { x ∈ ℝ^d : ‖x − EX_1‖ < 2η } of EX_1 for some η > 0,
∑_{|α|=1} |D^α H(EX_1)| ≠ 0, and that the first-order partial derivatives of H
satisfy a Lipschitz condition of order κ > 0 on N_H. Assume that the condi-
tions of Theorem 3.2 hold for j = 1, 2, 3 and that the conditions of Theorem
3.4 hold for j = 4 (with the transformed sequence {X_i}_{i∈ℤ}). Then,

    sup_{x∈ℝ} | P_*( T_{1n}^{*(j)} ≤ x ) − P( T_{1n} ≤ x ) | →_p 0 as n → ∞

for j = 1, 2, 3, 4.
For proving Theorem 4.1, we shall use a suitable version of the well-
known Slutsky's theorem for conditional distributions.

Lemma 4.1 (A CONDITIONAL SLUTSKY'S THEOREM). For n ∈ ℕ, let
b_n^* and T_n^* be r-dimensional (r ∈ ℕ) and s-dimensional (s ∈ ℕ) random
vectors, and let A_n^* be an r × s random matrix, all defined on a common
probability space (Ω, ℱ, P). Suppose that 𝒳_∞ is a sub-σ-field of ℱ and that
there exist 𝒳_∞-measurable variables A and b, such that
i.e.,

    ϱ( ℒ( A_n^* T_n^* + b_n^* │ 𝒳_∞ ), ν∘g^{-1} ) →_p 0 as n → ∞ ,

where g : ℝ^s → ℝ^r is the mapping g(t) = At + b, t ∈ ℝ^s, and where ν∘g^{-1}
denotes the probability distribution induced by the mapping g on ℝ^r under
ν, i.e., ν∘g^{-1}(B) = ν( g^{-1}(B) ), B ∈ ℬ(ℝ^r).
Lemma 4.1′ If ℒ( T_n^* │ 𝒳_∞ ) →_d ℒ(T) in probability and (4.7) holds, then

For proving the lemma, we shall use the following equivalent form of
condition (4.7):
There exists a sequence {ε_n}_{n≥1} of positive real numbers such that ε_n ↓ 0
as n → ∞ and

    →_p 0 as n → ∞ .                                                     (4.8)

It is clear that (4.8) implies (4.7). To prove the converse, note that by (4.7),
for each k ≥ 2, there exists a positive integer m_k > m_{k−1} such that
where
Fix a subsequence {n_i}. Let ϱ̃_n(ε) be as defined above. Then, by (4.8) and
the conditions of the lemma, there exists a subsequence {n_k} ⊂ {n_i} such
that

    ϱ̃_{n_k}( ε_{n_k} ) → 0 as k → ∞, a.s. (P)                            (4.10)

and

    ϱ( ℒ( T_{n_k}^* │ 𝒳_∞ ), ν ) → 0 as k → ∞, a.s. (P) .                (4.11)

We shall show that (4.9) holds for this choice of the subsequence {n_k}. For
any vector x ∈ ℝ^r, write

    (4.13)

    ϱ( ℒ( x′[ A_{n_k}^* T_{n_k}^* + b_{n_k}^* ] │ 𝒳_∞ ), ν_x ) → 0 as n_k → ∞, a.s. (P)

for all x ∈ ℝ^r. Thus, by the Cramér-Wold device, (4.9) holds and the
lemma is proved. □
    (4.15)

    T_{1n}^{*(j)} = A T_n^{*(j)} + R_n^{*(j)} , j = 1, 2, 3, 4 ,
    (4.16)

for some fixed integers k_1, ..., k_d, not depending on n, where f_1, ..., f_d
denote the components of the function f : ℝ^{d_0} → ℝ^d. It can be easily
shown that the conclusions of Theorem 4.1 continue to hold for θ̃_n if we
replace θ̂_n by θ̃_n and θ̂_n^{*(j)} by the corresponding block bootstrap versions.
In the same vein, it is easy to check that consistency of the block bootstrap
approximations continues to hold in such cases.
Remark 4.2 As in Theorems 3.1 and 3.3, the block bootstrap methods
also provide consistent estimators of the asymptotic variance of the statistic
θ̂_n considered in Theorem 4.1. However, we need to assume somewhat stronger
moment and mixing conditions than those of Theorem 4.1 to establish the
consistency of bootstrap variance estimators. A set of sufficient conditions,
which guarantee (mean squared error or L²-) consistency of these bootstrap
estimators of the asymptotic variance of θ̂_n, will be given in Chapter 5.
4.3 M-Estimators

Suppose that {X_i}_{i∈ℤ} is a stationary process taking values in ℝ^d. Also,
suppose that the parameter of interest θ is defined implicitly as a solution
to the equation

    E Ψ( X_1, ..., X_m; θ ) = 0                                           (4.17)

for some function Ψ : ℝ^{dm+s} → ℝ^s, m, s ∈ ℕ. An M-estimator θ̂_n of θ is
defined as a solution of the 'estimating equation'

    ( n − m + 1 )^{-1} ∑_{i=1}^{n−m+1} Ψ( X_i, ..., X_{i+m−1}; θ̂_n ) = 0 .   (4.18)
    T_{2n}^{*(j)} = √n_0 ( θ̂_n^{*(j)} − θ̃_n^{(j)} ) ,                    (4.20)

to ensure the bootstrap analog of (4.17) at the centering value θ̃_n^{(j)} in the
definition of T_{2n}^{*(j)}. Note that for the CBB or the SB applied to the series
{Y_1, ..., Y_{n_0}}, equation (4.21) reduces to (4.18) and, hence, θ̃_n^{(j)} = θ̂_n for
j = 3, 4. Thus, the original estimator θ̂_n itself may be employed for center-
ing its bootstrap version θ̂_n^{*(j)} for the CBB and the SB. However, for the
MBB and the NBB, θ̃_n^{(j)} need not be equal to θ̂_n and, hence, computation
of the bootstrap version T_{2n}^{*(j)} in (4.20) requires solving an additional set
of equations for the "right" centering constant θ̃_n^{(j)}. It may be tempting to
replace θ̃_n^{(j)} with θ̂_n and define

    T̃_{2n}^{*(j)} = √n_0 ( θ̂_n^{*(j)} − θ̂_n )                           (4.22)
as a bootstrap version of T_{2n} for j = 1, 2. However, for the MBB and the
NBB, centering θ̂_n^{*(j)} at θ̂_n introduces some extra bias, which typically leads
to a worse rate of approximation of ℒ(T_{2n}) by ℒ( T̃_{2n}^{*(j)} │ 𝒳_n ) compared
to the classical normal approximation (cf. Lahiri (1992a)). Indeed, this
"naive centering" can render the bootstrap approximation totally invalid
for M-estimators in linear regression models, as noted by several authors
in the independent case (cf. Freedman (1981), Shorack (1982), and Lahiri
(1992b)).
An altogether different approach to defining the bootstrap version of T_{2n}
is to reproduce the structural relation between equations (4.17) and (4.18)
in the definition of the bootstrap version of the M-estimator itself. Note
that if we replaced θ̂_n in (4.18) by θ, then the expected value of the left
side of (4.18) would be zero. As a result, the estimating function defin-
ing θ̂_n is unbiased at the centering value θ. However, in the definition of
the bootstrapped M-estimator in (4.19), this unbiasedness property of the
estimating function does not always hold. A simple solution to this prob-
lem has been suggested by Shorack (1982) in the context of bootstrapping
M-estimators in a linear regression model with iid errors. Following his
approach, here we define an alternative bootstrap version θ̂_n^{**(j)} of θ̂_n as a
solution to the modified equation

    n_0^{-1} ∑_{i=1}^{n_0} [ Ψ( Y_i^{*(j)}; θ̂_n^{**(j)} ) − ψ̂_j ] = 0 ,     (4.23)
    T_{2n}^{**(j)} = √n_0 ( θ̂_n^{**(j)} − θ̂_n ) .                        (4.24)
An advantage of using (4.24) over (4.20) is that for finding the boot-
strap approximation under the MBB or the NBB, we need to solve only
one set of equations (viz., (4.23)), as compared to solving two sets of
equations (viz., (4.19) and (4.21)) under the first approach. Since ψ̂_j =
n_0^{-1} ∑_{i=1}^{n_0} Ψ( Y_i; θ̂_n ) = 0 for j = 3, 4, the centering is automatic for the
CBB and the SB. As a consequence, both approaches lead to the same
bootstrap version of T_{2n} under the CBB and the SB; a computational sketch is given below.
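The computational difference for the MBB can be sketched in Python as follows (the Huber-type psi function, the data, and all helper names are our illustrative choices; this is one natural implementation of the Shorack-type recentering, not the book's code). The centering quantity ψ̂_1 is simply E_* of the resampled estimating function, computable without any resampling:

```python
import numpy as np
from scipy.optimize import brentq

def psi(y, t):                        # Huber-type psi (illustrative choice)
    return np.clip(y - t, -1.5, 1.5)

def mbb_psi_bar(y, theta_hat, ell):
    """psi-bar for the MBB: E_* of psi(Y*; theta_hat), i.e., the average
    of psi(Y_i; theta_hat) over all N = n - ell + 1 overlapping blocks."""
    n = len(y)
    vals = psi(y, theta_hat)
    return np.mean([vals[i:i + ell].mean() for i in range(n - ell + 1)])

rng = np.random.default_rng(3)
y = rng.standard_normal(200) + 0.3
theta_hat = brentq(lambda t: psi(y, t).mean(), -5, 5)   # solves (4.18), m = 1
psi_bar = mbb_psi_bar(y, theta_hat, ell=10)
# recentered bootstrap equation (4.23) for one MBB resample:
idx = rng.integers(0, 200 - 10 + 1, size=20)            # 20 block starts
ystar = np.concatenate([y[i:i + 10] for i in idx])
theta_star = brentq(lambda t: (psi(ystar, t) - psi_bar).mean(), -5, 5)
print(theta_hat, psi_bar, theta_star)
```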
The following result establishes validity of the block bootstrap ap-
proximations for the two versions T_{2n}^{*(j)} and T_{2n}^{**(j)}, j = 1, 2, 3, 4. Write
Σ_Ψ = lim_{n→∞} Var( n_0^{-1/2} ∑_{i=1}^{n_0} Ψ( Y_i; θ ) ) and let D_Ψ be the s × s matrix
with (i,j)-th element E[ ∂Ψ_i( Y_1; θ )/∂θ_j ], 1 ≤ i, j ≤ s. Also, assume that the
solutions to the estimating equations are measurable and unique.
(i) Ψ(y; t) is differentiable with respect to t for almost all y (under F_Y)
and the first-order partial derivatives of Ψ (in t) satisfy a Lipschitz
condition of order κ ∈ (0, 1], a.s. (F_Y), where F_Y denotes the proba-
bility distribution of Y_1.

(iii) There exists a δ > 0 such that E‖ D^α Ψ( Y_1; θ ) ‖^{2 r_j + δ} < ∞ for all
α ∈ ℤ_+^s with |α| = 0, 1, and Δ( r_j; δ ) < ∞, where r_j = 1 for j = 1, 2, 3
and r_j = 3 for j = 4.

Then,
    f( x_0 ) = x_0 .

Proof: See page 14, Milnor (1965). □
Lemma 4.2 Suppose that A and B are two d × d matrices for some d ∈ ℕ
and A is nonsingular.

(a) If ‖A − B‖ < δ/‖A^{-1}‖ for some δ ∈ (0, 1), then B is nonsingular and

Proof:
(a) Let Γ⁰ = I_d for any d × d matrix Γ. Since ∑_{k=0}^{∞} ‖ I_d −
A^{-1} B ‖^k ≤ ∑_{k=0}^{∞} ( ‖A^{-1}‖ ‖A − B‖ )^k < ∞, (each of the d² components of) the
matrix-valued series ∑_{k=0}^{∞} ( I_d − A^{-1} B )^k is absolutely convergent.
Write Q = ∑_{k=0}^{∞} ( I_d − A^{-1} B )^k. Then,

    (4.25)
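A quick numerical check of this Neumann-series construction (a sketch; the perturbation sizes are arbitrary illustrative choices): since A^{-1}B = I_d − M with ‖M‖ < 1, one has B = A(I_d − M), and hence B^{-1} = QA^{-1} with Q = ∑_k M^k.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
A = np.eye(d) + 0.1 * rng.standard_normal((d, d))
B = A + 0.01 * rng.standard_normal((d, d))    # B close to A

Ainv = np.linalg.inv(A)
M = np.eye(d) - Ainv @ B                      # ||M|| <= ||A^{-1}|| ||A - B|| < 1
Q, term = np.eye(d), np.eye(d)
for _ in range(200):                           # Q = sum_k (I - A^{-1} B)^k
    term = term @ M
    Q += term
print(np.max(np.abs(Q @ Ainv - np.linalg.inv(B))))   # ~ 0: B^{-1} = Q A^{-1}
```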
    ‖R_{1n}‖ ≤ n^{-1} ∑_{i=1}^{n} ∑_{|α|=1} { ‖θ̂_n − θ‖ ⋯ }

and

    ⋯ ≤ E‖ n^{-1} ∑_{i=1}^{n} Ψ( X_i; θ ) ‖² · ( n (log n)^{-2} ) = O( (log n)^{-2} ) .   (4.27)
Lemma 4.2 and condition (ii), D_{Ψ,n} is nonsingular on the set A_{1n}, for n
large. Hence, for n large, on the set A_{1n}, we can write (4.26) as

    (4.29)

Note that the right side of (4.29) is a continuous function of (θ̂_n − θ);
call it g(θ̂_n − θ). Now, using (4.27), (4.29), and the bound on R_{1n},
we see that there exists a C_1 ∈ (1, ∞) such that ‖g(θ̂_n − θ)‖ ≤
C_1 n^{-1/2} (log n) for all ‖θ̂_n − θ‖ ≤ C_1 n^{-1/2} (log n). Thus, setting f(x) =
[ C_1 n^{-1/2} log n ]^{-1} g( [ C_1 n^{-1/2} log n ] x ), x ∈ U, we have a continuous func-
tion f : U → U. Hence, by Proposition 4.1, there exists an x_0 ∈ U such
that f(x_0) = x_0, or equivalently, g( [ C_1 n^{-1/2} log n ] x_0 ) = [ C_1 n^{-1/2} log n ] x_0.
Since, by assumption, (θ̂_n − θ) is the unique solution to (4.29), we must
have θ̂_n − θ = [ C_1 n^{-1/2} log n ] x_0. Therefore, ‖θ̂_n − θ‖ ≤ C_1 n^{-1/2} (log n) on
the set A_{1n}, for n large. Since P(A_{1n}) → 1 as n → ∞, this implies that

    (4.30)

    √n ( θ̂_n − θ ) = ( D_Ψ + o_p(1) )^{-1} [ n^{-1/2} ∑_{i=1}^{n} Ψ( X_i; θ ) + O_p( n^{-κ/2} (log n)^{1+κ} ) ]
      →_d N( 0, D_Ψ^{-1} Σ_Ψ ( D_Ψ^{-1} )′ ) as n → ∞ .
    0 = n_0^{-1} ∑_{i=1}^{n_0} { Z_{0i}^{*(j)} + ∑_{|α|=1} ( θ̂_n^{**(j)} − θ̂_n )^α Z_{αi}^{*(j)} } + R_{1n}^{**(j)} ,

where ‖R_{1n}^{**(j)}‖ ≤ C [ ‖θ̂_n^{**(j)} − θ̂_n‖^{1+κ} + ‖θ̂_n − θ‖^{1+κ} ]. We can rewrite this as

    ( θ̂_n^{**(j)} − θ̂_n ) = [ D_{Ψ,n}^{**(j)} ]^{-1} [ n_0^{-1} ∑_{i=1}^{n_0} ( Z_{0i}^{*(j)} − E_* Z_{0i}^{*(j)} ) + R_{2n}^{**(j)} ]   (4.31)

    ⋯ + ‖R_{2n}^{**(j)}‖ .                                                 (4.32)
    A_n^{**(j)} = { ∑_{|α|=0}^{1} ‖ n_0^{-1} ∑_{i=1}^{n_0} Z_{αi}^{*(j)} − E_* Z_{αi}^{*(j)} ‖ ≤ n^{-1/2} log n } .

Note that by condition (ii) and Lemma 4.2, D_{Ψ,n}^{**(j)} is nonsingular on A_n^{**(j)} ∩
A_n^{(j)} for large n. Hence, as in the proof of part (a), by (4.31), (4.32), and
Proposition 4.1, on the set A_n^{(j)}, there exists a constant C > 0 such that

    P_*( ‖θ̂_n^{**(j)} − θ̂_n‖ ≤ C n^{-1/2} log n )
      ≥ P_*( A_n^{**(j)} )
      ≥ 1 − C n (log n)^{-2} ∑_{|α|=0}^{1} E_* ‖ n_0^{-1} ∑_{i=1}^{n_0} Z_{αi}^{*(j)} − E_* Z_{αi}^{*(j)} ‖²

for n large.
Next write Z̄_{αn}^{*(j)} = n_0^{-1} ∑_{i=1}^{n_0} Z_{αi}^{*(j)} and Z̄_{αk} = k^{-1} ∑_{i=1}^{k} Z_{αi}, k ∈ ℕ,
|α| = 0, 1. Note that E_* Z̄_{αn}^{*(2)} = Z̄_{αn_1}, E_* Z̄_{αn}^{*(3)} = Z̄_{αn}, and as in (3.13),

    P( ( A_n^{(j)} )^c )
      ≤ P( ‖θ̂_n − θ‖ > C n^{-1/2} log n )
        + C n (log n)^{-2} ∑_{|α|=1} E‖ E_*( Z̄_{αn}^{*(j)} ) − E Z_{α1} ‖² .

Hence, for j = 1, 2, 3, part (b) of the theorem now follows from Theorem
3.2, (4.31), (4.35), (4.36), and Lemma 4.1.
Next consider j = 4. In this case, by (4.18) and Wald's lemmas (cf.
Appendix A),
    ψ̂_4 = N_1^{-1} E_*( ∑_{i=1}^{N_1} Ψ( X_i^{*(4)}; θ̂_n ) )
        = N_1^{-1} ( E_* K ) E_*( ∑_{i=1}^{L_1} Ψ( X_i^{*(4)}; θ̂_n ) )
        = N_1^{-1} ( E_* K )( E_* L_1 ) ( n^{-1} ∑_{i=1}^{n} Ψ( X_i; θ̂_n ) )
        = 0 .

Hence, the bootstrapped M-estimator θ̂_n^{**(4)} is a solution of

    ∑_{i=1}^{N_1} Ψ( X_i^{*(4)}; θ̂_n^{**(4)} ) = 0 .                       (4.37)
    1 − P_*( A_n^{**(4)} )
      ≤ C n (log n)^{-2} ∑_{|α|=0}^{1} E_*{ N_1^{-2} ‖ ∑_{i=1}^{N_1} ( Z_{αi}^{*(4)} − E_* Z_{αi}^{*(4)} ) ‖² }
      ≤ C n (log n)^{-2} n^{-2} ∑_{|α|=0}^{1} E_* ‖ ∑_{i=1}^{N_1} ( Z_{αi}^{*(4)} − E_* Z_{αi}^{*(4)} ) ‖²
      ≤ C(s) (log n)^{-2} .                                               (4.39)
Let S_{αk}^{*(4)} be the kth block sum of the Z_{αi}^{*(4)}'s, α ∈ ℤ_+^s, |α| = 0, 1. Then,
using an iterated conditioning argument similar to the one employed in the
proof of Theorem 3.4 (cf. (3.38)), and using Wald's lemmas (cf. Appendix
A), we have

    E_* ∑_{i=1}^{N_1} ( Z_{αi}^{*(4)} − E Z_{α1} )
      = E[ ∑_{k=1}^{K} E{ ( S_{αk}^{*(4)} − L_k E Z_{α1} ) │ 𝒯_n } │ 𝒳_n ]
      = E{ ∑_{k=1}^{K} L_k ( Z̄_{αn} − E Z_{α1} ) │ 𝒳_n }
      = { ( E_* K ) p^{-1} } ( Z̄_{αn} − E Z_{α1} )
and

for all α ∈ ℤ_+^s, |α| = 0, 1. Let τ̂²( ℓ; r, α ) denote the conditional variance of
the rth component of S_{α1}^{*(4)} given 𝒯_n, |α| = 0, 1, r = 1, ..., s. Then, noting
that E( S_{α1}^{*(4)} │ 𝒯_n ) = L_1 Z̄_{αn}, we have

    E_* ‖ S_{α1}^{*(4)} − L_1 Z̄_{αn} ‖² = ∑_{r=1}^{s} E_* τ̂²( L_1; r, α ) .

Hence, using Lemma 3.4 (parts a(iv) and b(ii)) and the above identities,
we get

    1 − P( A_n^{(4)} ) ≤ ⋯ + P( ∑_{|α|=1} ‖ Z̄_{αn} − E Z_{α1} ‖ > c n^{-κ/4} ) .

Now part (b), j = 4, follows from (4.39), (4.40), Theorem 3.4, and Lemma
4.1.
Part (c) of the theorem can be proved using similar arguments. □
where Y_i ≡ ( X_i′, ..., X_{i+m−1}′ )′, i ≥ 1, and where δ_y denotes the probability
measure putting mass one at y. The probability distribution F̂_n^{(m)} serves
as an estimator of the marginal distribution F^{(m)} of the m-dimensional
subseries ( X_1, ..., X_m ).
Example 4.4: Suppose that the process {X_i}_{i∈ℤ} is real-valued. Then, for
any 0 < α < 1/2, the α-trimmed mean is given by

    θ̂_n = [ n ( 1 − 2α ) ]^{-1} ∑_{i=⌊nα⌋+1}^{⌊n(1−α)⌋} X_{i:n} ,          (4.43)

    θ̂_n = ∫_0^1 J(u) F_n^{-1}(u) du .
and this essentially reduces the general problem to the case m = 1 for
the dm-dimensional random vectors Y_1, ..., Y_{n_0}. Hence, without loss of
generality, we set m = 1. Also, for notational simplicity, write F̂_n^{(1)} = F̂_n
and F^{(1)} = F.
Let 𝔻_d denote the space of all real-valued functions on [−∞, ∞]^d that are
continuous from above and have limits from below. We equip 𝔻_d with the
(extended) Skorohod metric (cf. Bickel and Wichura (1971)). Write F_n(x)
and F(x) for the distribution functions corresponding to the probability
measures F̂_n and F, respectively. Thus,

    F_n(x) = n^{-1} ∑_{i=1}^{n} 𝟙( X_i ≤ x )                              (4.44)
and

    F(x) = P( X_1 ≤ x ) ,                                                (4.45)

x ∈ [−∞, ∞]^d. Recall that for any two vectors x ≡ ( x_1, ..., x_d )′ and y ≡
( y_1, ..., y_d )′, x ≤ y means x_i ≤ y_i for all 1 ≤ i ≤ d. Define the empirical
process

    W_n(x) = √n ( F_n(x) − F(x) ) , x ∈ [−∞, ∞]^d .
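A one-line computation of W_n at a grid of points, in a Python sketch (d = 1 and the N(0,1) marginal are our illustrative choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.standard_normal(400)                    # iid case, just for illustration
grid = np.linspace(-3, 3, 13)
Fn = (x[:, None] <= grid).mean(axis=0)          # F_n(x) = n^{-1} sum 1(X_i <= x)
Wn = np.sqrt(len(x)) * (Fn - norm.cdf(grid))    # W_n(x) = sqrt(n)(F_n - F)(x)
print(np.round(Wn, 2))                           # fluctuates like the limit W
```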
Then, under some regularity conditions (cf. Yoshihara (1975)), the empir-
ical process W_n converges weakly (as 𝔻_d-valued random elements) to a
Gaussian process W on [−∞, ∞]^d satisfying

    E W(x) = 0   and   Cov( W(x), W(y) ) = ∑_{k=−∞}^{∞} ⋯ .

Note that 𝔻_d is a complete separable metric space and, hence, there is a
metric, say ϱ, that metrizes the topology of weak convergence on 𝔻_d (cf.
Parthasarathy (1967), Billingsley (1968)). Thus, Theorem 4.3 implies that
and

    T( F̂_n^{(m)} ) − T( F^{(m)} )
      = T^{(1)}( F^{(m)}; F̂_n^{(m)} − F^{(m)} ) + R_n
      = n_0^{-1} ∑_{i=1}^{n_0} T^{(1)}( F^{(m)}; δ_{Y_i} − F^{(m)} ) + R_n
      = n_0^{-1} ∑_{i=1}^{n_0} h( Y_i ) + R_n , say,                       (4.48)

Now part (a) of the theorem follows from (4.48) and the Central Limit
Theorem for dependent random vectors (cf. Theorem A.8, Appendix A).
To prove the second part, for notational simplicity, we assume that n_0 =
b_0 ℓ_0. Also, let Z be a random vector having the N( 0, Σ_∞ ) distribution on
ℝ^s. Then, we need to show that

    (4.50)
whenever ‖G^{(m)} − F^{(m)}‖_∞ < δ. Also, by the linearity of T^{(1)}( F^{(m)}; · ),

    [ ( n_0 − ℓ + 1 ) ℓ ]^{-1} ∑_{j=1}^{n_0−ℓ+1} ∑_{i=j}^{j+ℓ−1} h( Y_i )
      − ( T( E_* F̂_n^{(m)*} ) − T( F^{(m)} ) )
      ⋯
      = ( 1/√n_0 ) ∑ [ h( Y_i^* ) − E_* h( Y_i^* ) ] + R_n^*
      = T_n^* + R_n^* , say,                                              (4.52)

where n^{1/2} R_n^* = R( F̂_n^{(m)*} ) + R( [ E_* F̂_n^{(m)*} ] ).
Let A(x, ε) be the ε-neighborhood of the boundary of (−∞, x], defined by
A(x, ε) = (−∞, x + ε1] \ (−∞, x − ε1], ε > 0, x ∈ ℝ^s, where 1 = (1, ..., 1)′ ∈
ℝ^s. Then, for any ε > 0,

Since Z has a normal distribution, there exist C_0 > 1 and ε_0 > 0 such that
for all 0 < ε < ε_0,

    (4.54)
Next, note that ‖W_n^{(m)}‖_∞ = O_p(1) and that by (an extension of) Theo-
rem 4.3 and the continuous mapping theorem (cf. Theorem 5.1, Billingsley
(1968)), ⋯. Now, fix η ∈ (0, ε_0). Let ε_1 = η/(3C_0). Then, by (4.51) (with ε = ε_1/6M),
there exists M_1 ≥ M such that for all n ≥ M_1,

on A_n ≡ { ‖ E_* F̂_n^{(m)*} − F^{(m)} ‖_∞ ≤ 2M/√n }. Hence, using (4.52) and (4.55),
and the arguments leading to (3.13), for n ≥ M_1, from (4.53) we get

Also, note that by Theorem 3.2, (4.52), and (4.53), Δ_{1n} →_p 0. Hence,
for any 0 < η < ε_0, by (4.54) and (4.56), for sufficiently large n,
as ‖G − F‖^{(k)} → 0 and ‖H − F‖^{(k)} → 0. While Fréchet differentiability
of many robust estimators is known, the notion of strong Fréchet differen-
tiability for statistical functionals is not very well studied in the literature.
Hence, we have established validity of the bootstrap approximation assum-
ing regular Fréchet differentiability only, so that Theorem 4.4 can be readily
applied in such known cases. For results under a further weaker notion of
differentiability, viz., Hadamard differentiability, see Chapter 12.
4.5 Examples

Example 4.5: Let {X_{0i}}_{i∈ℤ} be a stationary real-valued time series with
autocovariance function γ(k) = Cov( X_{0i}, X_{0(i+k)} ), i, k ∈ ℤ. For 0 ≤ k < n,
let γ̂_n(k) = ( n − k )^{-1} ∑_{i=1}^{n−k} X_{0i} X_{0(i+k)} − X̄_{0(n−k)}² be the estimator of γ(k)
introduced in Example 4.1. Then θ̂_n = γ̂_n(k) and θ = γ(k) admit a rep-
resentation satisfying the requirements of the "Smooth Function Model".
Since the function H(·) is infinitely many times differentiable, the conclusions
of Theorem 4.1 hold for γ̂_n(k) and γ(k), provided the time series {X_{0i}}_{i∈ℤ}
satisfies the relevant moment and strong mixing conditions.

For the purpose of illustration, now suppose that {X_{0i}}_{i∈ℤ} is an ARMA
(3,4) process specified by

For the process (4.57), the value of φ_n, found by 10,000 simulation runs, is
given by 1.058, and the value of the level-1 parameter θ is given by −0.0131.
Figure 4.1 below presents a realization of a sample of size n = 102 from
the ARMA process (4.57). We now apply the MBB, the NBB, the CBB,
and the SB to this data set.
FIGURE 4.1. A simulated data set of size n = 102 from the ARMA process
(4.57).
where X̄_{100}^{*(1)} is the sample mean of the first 100 MBB samples and where μ̂_{n,1} ≡
μ̂_{n,1}(ℓ) = E_* X̄_{100}^{*(1)}. The centering variable μ̂_{n,1} may be evaluated without
any resampling by using the formula

where μ̂_{n,1} is computed (only once) using formula (4.59). The Monte-Carlo
approximation to the MBB estimator φ̂_n(1; ℓ) is now given by

    φ̂_n(1; ℓ)^{MC} = B^{-1} ∑_{r=1}^{B} [ ᵣT_{1n}^{*(1)} ]² .             (4.61)

Note that as B → ∞, the average of the [ ᵣT_{1n}^{*(1)} ]²-values tends to the
corresponding expected value E_* [ T_{1n}^{*(1)} ]² ≡ φ̂_n(1; ℓ). Thus, by choosing B
appropriately large, one can get an approximation to the MBB estimator
φ̂_n(1; ℓ) to any given degree of accuracy. In Table 4.1 below, we report the
MBB estimators (along with the other block bootstrap estimators) of φ_n
for the data set of Figure 4.1 for different block sizes, including ℓ = 8.
As mentioned earlier, the "true" value of the target parameter is given by
φ_n = 1.058. The number of bootstrap replicates used here is B = 800.
(This value of B is chosen only for the purpose of illustration. In practice,
a much larger value of B may be desirable depending on the parameter
φ_n.)
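The Monte-Carlo loop behind (4.61) can be sketched in Python as follows (the data generator and helper names are ours; note that μ̂_{n,1} is computed once, without resampling, as the average of the overlapping block means, which is equivalent to the weighting in (4.59)):

```python
import numpy as np

def mbb_phi_hat(x, ell, B, rng):
    """Monte-Carlo approximation (4.61) to the MBB estimator of
    phi_n = E T_{1n}^2 for the sample mean (illustrative sketch)."""
    n = len(x)
    N = n - ell + 1                           # number of overlapping blocks
    b = n // ell                              # blocks per resample
    block_means = np.array([x[i:i + ell].mean() for i in range(N)])
    mu_n1 = block_means.mean()                # E_* of the bootstrap mean, cf. (4.59)
    T = np.empty(B)
    for r in range(B):
        idx = rng.integers(0, N, size=b)      # resample b blocks at random
        xbar_star = block_means[idx].mean()   # mean of the b*ell bootstrap values
        T[r] = np.sqrt(b * ell) * (xbar_star - mu_n1)
    return (T ** 2).mean()                    # -> E_*[T*]^2 as B -> infinity

rng = np.random.default_rng(6)
x = np.convolve(rng.standard_normal(105), [1.0, 0.4])[:100]   # toy MA(1) data
print(mbb_phi_hat(x, ell=8, B=800, rng=rng))
```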
TABLE 4.1. Block bootstrap estimates of the level-2 parameter φ_n = E T_{1n}² based
on different (expected) block sizes, for the data set of Figure 4.1. The true value
of φ_n is given by 1.058.

Block Size   4      6      8      10     15     20
MBB          1.159  1.085  0.881  0.820  1.078  0.884
NBB          1.299  0.904  1.093  0.763  0.879  1.030
CBB          1.020  1.106  0.951  0.812  0.968  0.808
SB           0.935  0.941  0.898  0.810  0.746  0.642
{ ℬ_i^{(2)} : i = 1, ..., b }, where ℬ_i^{(2)} = ℬ_{(i−1)ℓ+1} and b = ⌊100/ℓ⌋. Next, we gen-
erate the NBB samples by resampling k_0 blocks from this collection. For
ℓ = 8, this amounts to resampling k_0 = 13 blocks from the collection of b =
12 disjoint blocks { ℬ_1^{(2)}, ..., ℬ_12^{(2)} } = { {X_1, ..., X_8}, ..., {X_89, ..., X_96} }.
The NBB estimator of the parameter φ_n is given by

    φ̂_n(2; ℓ) = E_* [ T_{1n}^{*(2)} ]² .

Note that for this choice of ℓ, the last 4 X_i-values never appear in the
definition of the NBB estimator φ̂_n(2; ℓ). For Monte-Carlo approximation
to the NBB estimator φ̂_n(2; ℓ), generate B sets of iid random variables
{ ᵣI_1, ..., ᵣI_{k_0} } with the Discrete Uniform distribution on {1, ..., b} and
define the replicates of T_{1n}^{*(2)} as
where X̄_{100}^{*(3)} is the sample mean of the first 100 elements of k_0 resampled
blocks. Note that E_* X̄_{100}^{*(3)} = X̄_{100}, the sample mean of {X_1, ..., X_{100}}, for
any choice of ℓ and, hence, we do not need an additional formula like (4.59)
to find E_* X̄_{100}^{*(3)}. The CBB estimator of φ_n is now given by

Note that, like the CBB, E_* X̄_{100}^{*(4)} = X̄_{100} for any choice of ℓ and, hence,
centering the SB version H( X̄_{100}^{*(4)} ) of the estimator θ̂_n is simpler compared
to centering the MBB and the NBB versions. The SB estimator of φ_n is
given by

    φ̂_n(4; ℓ) ≡ E_* [ T_{1n}^{*(4)} ]² .
For Monte-Carlo evaluation of φ̂_n(4; ℓ), say, again with ℓ = 8, for
each r = 1, ..., B, first we generate iid Geometric(1/8) random variables
{ ᵣL_1, ..., ᵣL_{ᵣK} }, where ᵣK = inf{ 1 ≤ k ≤ 100 : ᵣL_1 + ⋯ + ᵣL_k ≥ 100 }.
Note that ᵣK, the number of resampled blocks under the SB method at
the rth replication, is random and, unlike the first three block bootstrap
methods, the ᵣK's take on different values for different r's. Next, having
generated the ᵣL_i's, we independently generate iid Discrete Uniform
{1, ..., 100} random variables ᵣI_{4,i}, i = 1, ..., ᵣK, for each r = 1, ..., B.
This yields B sets of SB resampled blocks { ℬ( ᵣI_{4,i}; ᵣL_i ) : i = 1, ..., ᵣK },
for r = 1, ..., B, where each set of resampled blocks contains at least 100
SB observations. We compute a replicate ᵣT_{1n}^{*(4)} of T_{1n}^{*(4)} using the first 100
SB observations from the rth set, and combine these to get the Monte-
Carlo value of the SB estimate as φ̂_n(4; ℓ)^{MC} = B^{-1} ∑_{r=1}^{B} [ ᵣT_{1n}^{*(4)} ]². The
estimates for the SB method for the data set of Figure 4.1 are given in
row 4 of Table 4.1. □
    T̃_{1n}^{*(1)} = √n ( γ̂_n^{*(1)}(2) − γ̂_n(2) ) ,                      (4.63)

where γ̂_n^{*(1)}(2) = (100)^{-1} ∑_{i=1}^{100} X_{0i}^* X_{0(i+2)}^* − [ X̄_{n−2}^{*(1)} ]² and X̄_{n−2}^{*(1)} =
(100)^{-1} ∑_i X_{0i}^*. The "naive" MBB estimator of φ_n is given by
Similarly, we can define the naive versions of the other three block boot-
strap estimators. In Table 4.2, we report the "naive" block bootstrap esti-
mators of φ_n = 1.058 based on (expected) block sizes ℓ = 4, 6, 8, 10, 15, 20.

TABLE 4.2. Block bootstrap estimates of φ_n = E T_{1n}² for the data set of Figure 4.1
based on the "naive" approach. Here, the true value of φ_n is 1.058.

Block Size   4      6      8      10     15     20
MBB          1.768  1.242  1.230  1.094  1.297  0.986
NBB          1.275  1.113  0.672  1.578  0.905  0.852
CBB          1.402  1.349  1.077  1.187  0.888  0.937
SB           1.571  1.183  1.126  1.151  0.902  0.760
    r(k) = H( E Y_1 )

and

    r̂_n(k) = H( Ȳ_{1n}, Ȳ_{2n}, Ȳ_{3(n−k)} ) ,

where Ȳ_{jm} = m^{-1} ∑_{i=1}^{m} Y_{ji}, m ≥ 1, 1 ≤ j ≤ 3. Note that in this case,
the estimator r̂_n(k) does not directly fall in the framework of the Smooth
Function Model treated in Section 4.2, since it is a function of averages
of different numbers of X-variables in the first, the second, and the third
co-ordinates. However, by Remark 4.1 of Section 4.2, the block bootstrap
approximations for the sampling distribution of
    sup_x | P_*( √(n − k) ( r̂_n^{*(j)}(k) − r̃_n^{(j)}(k) ) ≤ x ) − P( T_{1n} ≤ x ) |
      →_p 0 as n → ∞ ,
for the data set of Figure 4.1 with k = 2 and n = 102. As in Example
4.5, we define the blocks in terms of the transformed vectors Y_1, ..., Y_{100}
for each of the block bootstrap methods. Suppose, for example, that the
(expected) block size ℓ is chosen to be 6. Thus, for the MBB, the blocks
are { ℬ_i ≡ ( Y_i, ..., Y_{i+5} ) : i = 1, ..., 95 }, for the NBB the blocks are
{ ( Y_1, ..., Y_6 ), ( Y_7, ..., Y_{12} ), ..., ( Y_{91}, ..., Y_{96} ) }, and for the CBB and the
SB, the blocks are defined using the periodic extension of Y_1, ..., Y_{100}.
To generate the bootstrap samples, we resample k_0 blocks for the first
three methods, where k_0 is the smallest integer not less than 100/ℓ. For
ℓ = 6, k_0 = 17. Similarly, for the SB, we resample a random number of
blocks of lengths L_1, ..., L_K, where L_1, L_2, ... are iid Geometric(1/6) vari-
ables and K = inf{ k ≥ 1 : L_1 + ⋯ + L_k ≥ 100 }. Let Y_1^{*(j)}, ..., Y_{100}^{*(j)}
denote the first 100 bootstrap samples under the jth method, j = 1, 2, 3, 4.
Although in Theorem 4.1 we have proved validity of the four block boot-
strap methods for resample sizes n_1 for j = 1, 2, 3 and N_1 for j = 4 mainly
to simplify proofs, consistency of the bootstrap approximations continues to
hold if the resample size is set equal to n. Hence, in practice, a resample
size equal to the size of the observed Y-vectors may be employed for the
"ordinary" block bootstrap estimators. Accordingly, in practice, we define
the block bootstrap versions of T_{1n} as

    T_{1n}^{*(j)} = √100 ( r̂_n^{*(j)}(2) − r̃_n^{(j)}(2) ) , j = 1, 2, 3, 4 ,
where r̂_n^{*(j)}(2) = H( Ȳ_n^{*(j)} ) and r̃_n^{(j)}(2) = H( E_*( Ȳ_n^{*(j)} ) ) with Ȳ_n^{*(j)} ≡
(100)^{-1} ∑_{i=1}^{100} Y_i^{*(j)}. Strictly speaking, this definition of the bootstrapped
statistic does not reflect the difference in the number of variables aver-
aged in different components of the Y_i's in the definition of r̂_n^{*(j)}(2), but the
effect is negligible. The bootstrap estimators of the distribution function
G_{1n}(x) ≡ P( T_{1n} ≤ x ), x ∈ ℝ, are given by

    (4.64)

    (4.66)

    q̂_{1n}^{(j)MC}(a) = the (⌊Ba⌋)-th order statistic of ᵣT_{1n}^{*(j)}, r = 1, ..., B, j = 1, 2, 3, 4 .
[Figure: histogram panels of the bootstrap replicates under the MBB, the
NBB, the CBB, and the SB.]
    Î_{α,percentile}^{(j)} = ( r̂_n(2) − (1/√100) q̂_{1n}^{(j)}( 1 − α/2 ) , r̂_n(2) − (1/√100) q̂_{1n}^{(j)}( α/2 ) )   (4.67)
For computing the interval estimates of r(2) using formula (4.67), the boot-
strap quantiles q̂_{1n}^{(j)}(·)'s in (4.67) are further replaced with their Monte-
Carlo approximations.
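In code, the Monte-Carlo version of (4.64)-(4.67) amounts to sorting the replicates. A Python sketch (here `t_star` stands for the B replicates of T_{1n}^{*(j)} and `r_hat` for r̂_n(2); both names and the toy inputs are ours):

```python
import numpy as np

def percentile_ci(t_star, r_hat, alpha, n_eff=100):
    """Equal-tailed percentile interval (4.67) from bootstrap replicates:
    q_hat(a) is the floor(B*a)-th order statistic of the replicates."""
    t_star = np.sort(t_star)
    B = len(t_star)
    q_lo = t_star[int(np.floor(B * alpha / 2)) - 1]        # q_hat(alpha/2)
    q_hi = t_star[int(np.floor(B * (1 - alpha / 2))) - 1]  # q_hat(1-alpha/2)
    return (r_hat - q_hi / np.sqrt(n_eff),
            r_hat - q_lo / np.sqrt(n_eff))

rng = np.random.default_rng(7)
print(percentile_ci(rng.standard_normal(800), r_hat=0.25, alpha=0.2))
```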
for a given function J : (0, 1) → ℝ, where F_n is the empirical distribution
function of X_1, ..., X_n, and F^{-1} and F_n^{-1} are as defined in Section 4.4. As
discussed there, θ̂_n may be represented as a statistical functional, say θ̂_n =
T(F_n). Fréchet differentiability of T at F depends on the joint behavior of
the functions J(·) and F(·) and may be guaranteed through different sets of
conditions on J(·) and F(·). Here, we state one set of sufficient conditions.
For variations of these conditions, see Serfling (1980) and Fernholz (1983),
and the references therein.
Note that the function F^{-1} is nondecreasing and left continuous. Hence,
F^{-1} corresponds to a measure on ℝ. We assume that

(ii) there exists 0 < a < b < 1 such that J(u) = 0 for all u ∉ [a, b].
By Boos (1979), under assumptions (i) and (ii), the functional T(·) is
Fréchet differentiable at F under the sup-norm ‖·‖_∞. Hence, if we write
T_{2n}^* for the MBB version of T_{2n} based on blocks of size ℓ, then under
the conditions of Theorem 4.4 and under assumptions (i) and (ii) above,
Note that for the α-trimmed mean (0 < α < 1/2), the level-1 parameter
of interest is given by (cf. Example 4.4)

    (4.68)

where the sgn(·) function is defined as sgn(x) = 𝟙( x ≥ 0 ) − 𝟙( x ≤ 0 ), x ∈ ℝ,
and {X_i}_{i∈ℤ} is an ARMA(3,3) process satisfying

    (4.69)

where the ε_i's are iid N(0, 1) variables. Note that the marginal distribution F
of X_i is symmetric (about the origin), continuous, and strictly increasing
over ℝ. Furthermore, X_i has finite moments of all orders and {X_i}_{i∈ℤ} is
strongly mixing with an exponentially decaying mixing coefficient. Thus,
the conditions and conclusions of Theorem 4.4 hold for the centered and
scaled α-trimmed mean T_{2n} = √n ( θ̂_n − θ ), where θ is given by (4.68),
0 < α < 1/2.
Now we consider the performance of the MBB in a finite sample situ-
ation. Figure 4.3 below gives a realization of X_i, i = 1, ..., 250, from the
process (4.69). We apply the MBB with block size ℓ = 10 to estimate the
distribution of T_{2n} for different values of α. As in the previous examples,
we resample b_1 ≡ ⌈n/ℓ⌉ = 25 blocks from the collection of overlapping
blocks ℬ_i = ( X_i, ..., X_{i+9} ), i = 1, ..., 241, to generate the MBB observa-
tions X_1^*, ..., X_{10}^*; ...; X_{241}^*, ..., X_{250}^*. Let X_{(1)}^* ≤ ⋯ ≤ X_{(250)}^* denote the
order statistics corresponding to X_1^*, ..., X_{250}^*. Then, define the MBB
version of T_{2n} as

    T_{2n}^* = √250 ( θ̂_n^* − θ̃_n ) ,

where θ̂_n^* = ∑_{nα ≤ i ≤ n(1−α)} X_{(i)}^* / [ n (1 − 2α) ] and where θ̃_n = (1 −
2α)^{-1} ∫_α^{1−α} F̃_n^{-1}(u) du and F̃_n(x) = E_*[ n^{-1} ∑_{i=1}^{n} 𝟙( X_i^* ≤ x ) ], x ∈ ℝ. Us-
ing arguments similar to (4.59), we can express F̃_n(·) as

    F̃_n(x) = ∑_{i=1}^{n} w_{in} 𝟙( X_i ≤ x ) , x ∈ ℝ ,

where, with N = n − ℓ + 1,

    w_{in} = N^{-1}                 if ℓ ≤ i ≤ N
    w_{in} = i/(Nℓ)                if 1 ≤ i ≤ ℓ − 1
    w_{in} = ( n − i + 1 )/(Nℓ)    if N + 1 ≤ i ≤ n .
With the help of this formula, we may further simplify the definition of
θ̃_n and write down an explicit expression for θ̃_n that may be evaluated
without any resampling. Let X_{(1)} ≤ ⋯ ≤ X_{(n)} denote the order statistics
of X_i, i = 1, ..., n. Also, let w_{(i)} denote the weight associated with the
order statistic X_{(i)}. For example, if X_{(1)} = X_{10} and X_{(2)} = X_3, then
w_{(1)} = w_{10,n} and w_{(2)} = w_{3,n}. Then, the centering variable θ̃_n may be taken
as

where L_α = max{ k : 1 ≤ k ≤ n, ∑_{i=1}^{k} w_{(i)} < α } and U_α = min{ k : 1 ≤
k ≤ n, ∑_{i=1}^{k} w_{(i)} ≥ 1 − α }.
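This weighted centering translates directly into code. The following Python sketch is one natural implementation under our own simplified handling of the boundary indices L_α and U_α (the exact display is not reproduced above, so the boundary rule here is an assumption):

```python
import numpy as np

def mbb_trimmed_center(x, ell, alpha):
    """Centering value theta_tilde_n for the MBB trimmed mean, computed
    from the weights w_(i) attached to the order statistics (no resampling).
    Boundary handling at L_alpha and U_alpha is a simplification."""
    n = len(x)
    N = n - ell + 1
    i = np.arange(1, n + 1)
    w = np.where(i < ell, i / (N * ell),
        np.where(i <= N, 1.0 / N, (n - i + 1) / (N * ell)))
    order = np.argsort(x)
    xs, ws = x[order], w[order]       # X_(i) paired with its weight w_(i)
    cw = np.cumsum(ws)
    keep = (cw > alpha) & (cw - ws < 1 - alpha)   # indices between L_a and U_a
    return (xs * ws)[keep].sum() / (1 - 2 * alpha)

rng = np.random.default_rng(8)
x = rng.standard_normal(250)
print(mbb_trimmed_center(x, ell=10, alpha=0.08))
```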
FIGURE 4.3. A simulated data set of n = 250 Xi-values from model (4.69).
Figure 4.4 below gives the histograms of the MBB estimates of the distri-
bution function G_{2n}(x) ≡ P( T_{2n} ≤ x ), x ∈ ℝ, based on B = 800 bootstrap
replicates for α = 0, 0.08, 0.2, 0.4, and 0.5. Note that α = 0 represents the
case where θ̂_n is the sample mean and α = .5 represents the case where θ̂_n
is the sample median. Although we have verified the conditions of Theorem
4.4 only for 0 < α < 1/2, here we include these limiting α-values to obtain
a more complete picture of how the MBB performs under varying degrees
of trimming. It follows from Figure 4.4 that the bootstrap estimates of the
sampling distribution are more skewed for larger values of α. Although T_{2n}
is asymptotically normal for all these α-values, the "exact" distribution of
T_{2n} is not symmetric for finite sample sizes. The limiting normal distri-
bution fails to reveal this feature of the true sampling distribution. But
the bootstrap estimates of the sampling distribution functions of T_{2n} for
different α-values provide useful information on the skewness of the true
distributions of T_{2n}.
FIGURE 4.4. Histograms of the MBB distribution function estimates of the cen-
tered and scaled α-trimmed mean T_{2n} for α = 0.0, 0.08, 0.2, 0.4, 0.5.
where q̂_{2n}(β), 0 < β < 1, is the βth quantile of the conditional distribution
of T_{2n}^*. For the data set of Figure 4.3, 80% (equal-tailed) CIs based on the
MBB with block size ℓ = 10 are given by
5
Comparison of Block Bootstrap Methods

5.1 Introduction
In this chapter, we compare the performance of the MBB, the NBB, the
CBB, and the SB methods considered in Chapters 3 and 4. In Section 5.2,
we present a simulated data example and illustrate the behavior of the
block bootstrap methods under some simple time series models. Although
the example treats the simple case of the sample mean, it provides a rep-
resentative picture of the properties of the four methods in more general
problems. In the subsequent sections, the empirical findings of Section 5.2
are substantiated through theoretical results that provide a comparison of
the methods in terms of the (asymptotic) MSEs of the bootstrap estimators.
In Section 5.3, we describe the framework for the theoretical comparison.
In Section 5.4, we obtain expansions for the MSEs of the relevant boot-
strap estimators as a function of the block size (expected block size, for
the SB). These expansions provide the basis for the theoretical comparison
of the sampling properties of the bootstrap methods. In Section 5.5, the
main theoretical findings are presented. Here, we compare the bootstrap
methods using the leading terms in the expansions of the MSEs derived
in the previous section. In Section 5.5, we also derive theoretical optimal
(expected) block lengths for each of the block bootstrap estimators and
compare the methods at the corresponding optimal block lengths. Some
conclusions and implications of the theoretical and finite sample simula-
tion results are discussed in Section 5.6. Proofs of two key results from
Section 5.4 are separated out into Section 5.7.
where, in each of the three models, the innovations ε_i's are iid N(0, 1)
random variables.

Figure 5.1 below shows a plot of the MSEs of the block bootstrap es-
timators of φ_n, produced by the MBB, the NBB, and the SB under each
of the models (5.1)-(5.3) and for a sample of size n = 100.

FIGURE 5.1. Mean square errors of the block bootstrap estimators of the level-2
parameter φ_n = nVar(X̄_n) at n = 100 under models (5.1)-(5.3): ARMA(1,1)
(ar = .3, ma = .4), AR(1) (ar = .3), and MA(1) (ma = .4). MSEs are
computed using K = 2000 simulation runs, with B = 500 bootstrap iterations at
each (expected) block length under each simulation run.
~
:2 ARMA(l.l)
0
ar=.3,ma=.4
0
;; n=400
SBB
~
0
10 20 30 40
.'"'
q
AR(l)
'"0 SBB ar=.3
n=4oo
NBB MBB
~
10 20 30 40
:~ 10 20 30
MA(l)
ma=.4
n=400
40
large sample sizes? In the same vein, if the SB has the largest MSE, how
large can the MSE of an SB estimator be relative to that of an MBB
estimator? Also, as the performance of each block bootstrap
method depends on the blocking parameter and there is a different "opti-
mal" block length in each case, how do the MSEs of the "best" estimators
from each of the four methods compare against one another? In the next
few sections, we describe some theoretical results that provide answers to
some of these questions. As we will see, the empirical results of the numer-
ical examples above are in close agreement with our theoretical findings
described below.
(5.4)
and
(5.5)
and compare the performance of the block bootstrap methods for estimat-
ing these, using the MSE criterion. Similar results can be proved for the
bootstrap estimators of the distribution function and for certain other func-
tionals (e.g., the quantiles) of the sampling distribution of en, although a
different set of regularity conditions and arguments would be needed.
For the sake of completeness, we now briefly describe the specific versions
of the bootstrap estimators of φ_{1n} and φ_{2n} considered here. Recall that
we index the methods MBB, NBB, CBB, and SB as method numbers 1, 2, 3,
and 4, respectively, and denote the bootstrap samples generated by the
jth method as X_{j,1}^*, X_{j,2}^*, ..., or as X_1^{*(j)}, X_2^{*(j)}, ..., as convenient, where
j = 1, 2, 3, 4. Let ℓ denote the block length for the first three methods
and the expected block length for the SB method. For a given value of ℓ,
we suppose that b = ⌊n/ℓ⌋ blocks are resampled for the MBB, the NBB,
and the CBB, resulting in n_1 ≡ bℓ bootstrap observations. Denote the
corresponding sample mean by X̄_{n,ℓ}^{*(j)}, j = 1, 2, 3. Thus, the bootstrap
versions of the centered variable T_n = θ̂_n − θ under the MBB, the NBB, and
the CBB are given by

    (5.6)

where X̄_{n,ℓ}^{*(j)} = n_1^{-1} ∑_{i=1}^{n_1} X_{j,i}^* .

Next consider the SB method. Since we denote the expected block length
by ℓ, the block length variables L_1, L_2, ... of the SB method are now con-
ditionally iid Geometric random variables with parameter p = ℓ^{-1}. As a
result, for the comparison here, we consider the SB estimators only cor-
responding to a finite set of values of the parameter p ∈ (0, 1), which are
reciprocals of an integer ℓ in the interval (1, n). Since the typical choice of
p is such that p → 0 as n → ∞, it is possible to find an asymptotically
equivalent choice, p ~ ℓ^{-1}, for a suitable sequence ℓ ≡ ℓ_n → ∞, and thus,
this unified framework does not impose a serious restriction.

For a given value of ℓ, we suppose that under the SB method K =
inf{ 1 ≤ k ≤ n : L_1 + ⋯ + L_k ≥ n } blocks are resampled, resulting in N_1 =
L_1 + ⋯ + L_K bootstrap observations. Let X̄_{n,ℓ}^{*(4)} ≡ n^{-1} ∑_{i=1}^{n} X_{4,i}^* denote
the average of the first n SB observations. As noted earlier, E_* X̄_{n,ℓ}^{*(4)} = X̄_n
for all ℓ. Hence, we define the bootstrap version of the centered variable
T_n = θ̂_n − θ under the SB method by

    (5.7)
Note that the level-2 parameters of interest given in (5.4) and (5.5) are
the first two moments of T_n, viz., φ_{1n} = Bias(θ̂_n) = E T_n and φ_{2n} =
Var(θ̂_n) = Var(T_n). Hence, the bootstrap estimators of φ_{1n} and φ_{2n} are
respectively defined as

and

In the next section, we obtain expansions for the MSEs of the block
bootstrap estimators B̂IAS_j(ℓ) and V̂AR_j(ℓ), j = 1, 2, 3, 4.
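For the sample mean, the MBB variance estimator needs no Monte Carlo at all: with b = ⌊n/ℓ⌋ resampled blocks, Var_* of T_n^{*(1)} reduces to ℓ^{-1} times the empirical variance of the N = n − ℓ + 1 overlapping block sums, a standard identity for the MBB. A Python sketch (the toy MA(1) data and names are ours):

```python
import numpy as np

def mbb_var_hat(x, ell):
    """Closed-form MBB estimator of n*Var(X_bar) for the sample mean:
    ell^{-1} times the empirical variance of the overlapping block sums."""
    n = len(x)
    S = np.array([x[i:i + ell].sum() for i in range(n - ell + 1)])
    return ((S ** 2).mean() - S.mean() ** 2) / ell

rng = np.random.default_rng(9)
x = np.convolve(rng.standard_normal(405), [1.0, 0.4])[:400]   # toy MA(1)
for ell in (4, 8, 16, 32):
    print(ell, round(mbb_var_hat(x, ell), 3))   # bias-variance trade-off in ell
```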
Then, we have the following result on the bias part of the bootstrap
estimators φ̂_{1n}(j; ℓ) and φ̂_{2n}(j; ℓ), j = 1, 2, 3, 4.

Theorem 5.1 Assume that ℓ is such that ℓ^{-1} + n^{-1/2} ℓ = o(1) as n → ∞.

Thus, it follows from Theorem 5.1 that the biases of the bootstrap es-
timators of φ_{1n} and φ_{2n} are identical up to the first-order terms for all
four block bootstrap methods considered here. In particular, contrary to
the common belief, the stationarity of the SB observations X_{4,1}^*, X_{4,2}^*, ...
does not contribute significantly toward reducing the bias of the resulting
bootstrap estimators. Also, the use of either overlapping or nonoverlapping
blocks results in the same amount of bias asymptotically. Since the bias of
a block bootstrap estimator essentially results from replacing the original
data sequence X_1, ..., X_n by independent copies of smaller subsequences,
all the methods perform similarly as long as the (expected) length ℓ of
these subsequences is asymptotically equivalent.

Next we compare the variances of the block bootstrap estimators of φ_{1n}
and φ_{2n}.
Theorem 5.2 Assume that the conditions of Theorem 5.1 on the block
length parameter ℓ and on the index r in Conditions D_r and M_r for the
respective parts hold. Then, there exist symmetric, nonnegative real-valued

    ⋯ + o( n^{-3} ℓ ) , j = 4 ;

(b)

    ⋯ + o( n^{-3} ℓ ) , j = 4 .
Thus, if ARE( θ̂_{1n}; θ̂_{2n} ) < 1, then the sequence of estimators {θ̂_{1n}}_{n≥1}
is less efficient than {θ̂_{2n}}_{n≥1} in the sense that the θ̂_{1n}'s have larger MSEs
than the MSEs of the estimators θ̂_{2n}'s, for large n.

Theorem 5.3 Assume that the conditions of Theorems 5.1 and 5.2 hold
and that A_k ≠ 0, g_k(0) ≠ 0, k = 1, 2.

(i) For ℓ^{-1} + n^{-1/3} ℓ = o(1), for any i, j ∈ {1, 2, 3, 4}, k = 1, 2,

    ARE( φ̂_{kn}(i; ℓ); φ̂_{kn}(j; ℓ) ) → 1 ,

    ARE( φ̂_{kn}(4; ℓ); φ̂_{kn}(2; ℓ) ) ⋯ .
Theorems 5.1-5.3 are due to Lahiri (1999a). Note that the asymptotic rel-
ative efficiency of the SB estimators with respect to the MBB and the
CBB estimators in parts (ii) and (iii) of Theorem 5.3 can be found
by the identity ARE( φ̂_{kn}(4; ℓ); φ̂_{kn}(j; ℓ) ) = ARE( φ̂_{kn}(4; ℓ); φ̂_{kn}(2; ℓ) ) ·
ARE( φ̂_{kn}(2; ℓ); φ̂_{kn}(j; ℓ) ), j = 1, 3, and, hence, are not stated separately.
Also note that parts (i) and (ii) of Theorem 5.3 correspond to the cases
where the leading terms of the MSEs of the block bootstrap estimators
are determined solely by their biases and their variances, respectively. It
follows that for smaller values of the block length parameter ℓ (i.e., un-
der case (i)), all methods have an ARE of 1 with respect to one another.
For large values of ℓ (i.e., under case (ii)), the ARE of the SB is less than
1/2 compared to the other block bootstrap methods based on nonrandom
block lengths. In the intermediate case (i.e., under case (iii)), the MSE has
nontrivial contributions from both the bias part and the variance part. In
this case, the ARE of the NBB or the SB with respect to the MBB and
the CBB lies between 1 and the lower bound on the limits under case (iii). In
particular, the ARE of the SB estimator φ̂_{kn}(4; ℓ) with respect to the MBB
estimator φ̂_{kn}(1; ℓ) under case (iii) lies in the interval (0, 1), depending on
the value of the constant C and the function g_k.
and

    ℓ_{2j}⁰ = argmin{ MSE( V̂AR_j(ℓ) ) : n^ε ≤ ℓ ≤ n^{(1−ε)/2} } ,

1 ≤ j ≤ 4, denote the MSE-optimal block lengths for estimating the bias and
the variance of θ̂_n, where ε is a given (small) positive number. The following result
gives the optimal block lengths ℓ_{kj}⁰, k = 1, 2, j = 1, 2, 3, 4, for estimating
φ_{1n}, φ_{2n} for the four block bootstrap methods considered in this chapter.

Theorem 5.4 Suppose that the conditions of Theorem 5.3 hold. Then, for
k = 1, 2,
    ℓ_{kj}⁰ ~ ( A_k ⋯ / [ π² g_k(0) ] )^{1/3} · n^{1/3} , j = 2 ;

The formulas in Theorem 5.4 for the MBB and the NBB were noted
by Hall, Horowitz and Jing (1995). Note that the optimal block size for
the MBB is larger than that of the NBB by a factor of (3/2)^{1/3}. For the
SB variance estimator of the sample mean, Politis and Romano (1994b)
show that the order of the MSE-optimal expected block length is n^{1/3}. The
explicit formulas for the optimal block sizes for the SB bias and variance
estimators under the Smooth Function Model are due to Lahiri (1999a).
It is clear from the definitions of ℓ_{kj}⁰ that each block bootstrap method
provides the most accurate estimator of the parameter φ_{kn} when it is used
with the corresponding optimal block length. In the next result, we compare
the block bootstrap methods at their best possible performances, i.e., when
each method is used to estimate a given parameter with its MSE-optimal
block length.
Theorem 5.5 Suppose that the conditions of Theorem 5.3 hold.

(a) Then, for k = 1, 2,

    MSE( φ̂_{kn}(j; ℓ_{kj}⁰) ) = 3^{1/3} [ 2π² g_k(0) A_k ]^{2/3} n^{-5/3} + o( n^{-5/3} ) , j = 1, 3 ;

    ⋯ [ { ∫ ( 1 + e^{ιw} ) g_k(w) dw } A_k ]^{2/3} ⋯ , j = 1, 3 ;

Proof of Theorem 5.5: Follows from Theorems 5.1 and 5.2, and the
proof of Theorem 5.4. □

Theorem 5.5 shows that when each method is used with the correspond-
ing MSE-optimal value of ℓ, the MBB and the CBB have an optimal MSE
that is (2/3)^{2/3} times smaller than the optimal MSE for the NBB, and the
MSE of the optimal NBB estimator is, in turn, at least 2^{-2/3}-times smaller
than that of the optimal SB estimator.
The following result shows that the ARE of the SB with respect to the
NBB at the optimal block length admits a lower bound.
Theorem 5.6 Assume that the conditions of Theorem 5.3 hold. Then, for
k = 1,2
Theorem 5.5 is due to Lahiri (1999a). The lower bound result in Theorem
5.6 is due to Politis and White (2003).
part. In this case, the performance of all four methods is comparable, with
all the AREs being equal to 1. This is a simple consequence of the fact
that, asymptotically, the biases of the bootstrap estimators derived from
all four methods have the same leading term. The finite sample simula-
tion example of Section 5.2 also supports this observation. When the block
bootstrap methods are used with block lengths close to the correspond-
ing optimal block lengths, the MBB and the CBB give the most accurate
bootstrap estimators.

Going beyond the bias and the variance of θ̂_n, it is possible to carry out a
comparison of the block bootstrap methods for more complicated function-
als (e.g., quantiles) of the sampling distribution of (a suitably studentized
version of) θ̂_n. For estimating the distribution function and quantiles of a
studentized version of θ̂_n, the optimal block length is of the form const.·n^{1/4}
for all four block bootstrap methods (cf. Hall, Horowitz and Jing (1995),
Lahiri (1999c, 2003c)). In this case, the AREs of the block bootstrap distri-
bution function estimators have an ordering that is exactly the same as that
for the bias and the variance functionals. Indeed, for block lengths growing
at a rate not slower than const.·n^{1/4}, the MBB and the CBB are the most
accurate among the four block bootstrap methods.
The results above show optimality of the MBB and the CBB only
among the four methods considered above. Carlstein, Do, Hall, Hesterberg
and Künsch (1998) have proposed a block bootstrap method, called the
Matched Block Bootstrap (MaBB), where the bootstrap blocks are resam-
pled using a Markov chain. Thus, unlike the four block bootstrap methods
covered here, the resampled blocks under the MaBB are dependent. Un-
der some structural assumptions on the underlying process (e.g., AR(p)
or Markov), Carlstein et al. (1998) show that the MaBB estimator of the
variance of the sample mean has a variance that is of a comparable order
to the variance of the NBB estimator and has a bias that is of smaller or-
der. Thus, the minimum MSE of the MaBB is of a smaller order than the
minimum MSEs for the four methods considered here. Consequently, for
processes with a Markovian structure, the MaBB outperforms the above
methods at the respective optimal block sizes. For more general time se-
ries that may not necessarily have a Markovian structure, Paparoditis and
Politis (2001, 2002) recently proposed a method, called the Tapered Block
Bootstrap (TaBB) method, and showed that the TaBB method yields a
more accurate estimator of the variance-type level-2 parameters than do
the MBB and the CBB methods.
5.7 Proofs

Let S(i; k) = ∑_{j=i}^{i+k−1} Y_{n,j}, i, k ∈ ℕ, denote the partial sums of the periodi-
cally extended time series {Y_{n,i}}_{i≥1}. Recall that ζ_s = ( E‖X_1‖^s )^{1/s}, s > 0,
and for r ∈ ℤ_+ and δ ∈ (0, ∞), let Δ(r; δ) = 1 + ∑_{n=1}^{∞} n^{2r−1} [α(n)]^{δ/(2r+δ)}. Let
μ̂(j; ℓ) = E_* X̄_{n,ℓ}^{*(j)}, c_{α,j} = D^α H( μ̂(j; ℓ) ), 1 ≤ j ≤ 4, c_α = D^α H( X̄_n ), α ∈
ℤ_+^d, and Σ_∞ = lim_{n→∞} Cov( √n X̄_n ).

Under Condition M_r for any r ∈ ℕ, {X_i}_{i∈ℤ} has a (continuous) spectral
density matrix f(·), defined by the relation
5.7.1 Proofs of Theorems 5.1-5.2 for the MBB, the NBB, and
the CBB

Lemma 5.1 Assume that ℓ = O( n^{1−ε} ) for some 0 < ε < 1, E‖X_1‖^{2r+δ} <
∞, and Δ(r; δ) < ∞, for some positive integer r and for some δ > 0. Then,

Proof: We prove part (i) first. Note that by Lemma 3.2, for j = 1, we get

The proof is similar for j = 2. For j = 3, note that for any ν ∈ ℤ_+^d and any
1 < m < n/2,

Hence, the bound for j = 3 follows from (5.9), (5.10), and Lemma 3.2. This
proves part (i).

As for part (ii), note that for all j ∈ {1, 2, 3}, μ̂(j; ℓ) = ∑_{i=1}^{n} w_{ijn} X_i for
some nonrandom weights w_{ijn} with |w_{ijn}| ≤ 1 for all i, j, n. Hence, using
the cumulant expansion for moments and Lemma 3.2, we get (ii).

Part (iii) is a consequence of parts (i) and (ii) and the following result
(cf. Lemma III.3.1 of Ibragimov and Hasminskii (1980)): For zero mean
independent random variables W_1, ..., W_m and for any integer r ≥ 1,

    (5.11)
Proof of Theorem 5.1 for j = 1, 2, 3: We prove the theorem only for the
bias estimators φ̂_{1n}(j; ℓ), j = 1, 2, 3. The proof for the variance estima-
tors φ̂_{2n}(j; ℓ), j = 1, 2, 3, is similar and, hence, is omitted. Without loss of
generality, suppose that μ = 0. (Otherwise, replace the X_i's with the (X_i − μ)'s
in every step below.) Note that by Taylor's expansion of H( X̄_{n,ℓ}^{*(j)} ) around
μ̂(j; ℓ), we have

    φ̂_{1n}(j; ℓ) =

where, after some lengthy and routine algebra, the remainder term R_{1n}(j; ℓ)
can be shown to satisfy the inequality

    | R_{1n}(j; ℓ) |
      ≤ C( ζ_2, a_0, d ) { b^{-1} ‖μ̂(j; ℓ)‖²
        + ( 1 + ‖μ̂(j; ℓ)‖^{a_0} ) · ‖μ̂(j; ℓ)‖ ( b^{-1} E_* ‖ S( I_{j,1}; ℓ )/ℓ ‖² )
        + E_* ( 1 + ‖μ̂(j; ℓ)‖^{a_0} + ‖X̄_{n,ℓ}^{*(j)}‖^{a_0} ) ‖X̄_{n,ℓ}^{*(j)}‖³ } .   (5.14)
    lim_{ℓ→∞} ℓ^{-1} ∑_j E{ ⋯ }
      = ∑_I ∑_J { b( |I| + |J|; p + q ) E Z_∞(I^c) E Z_∞(J^c) E Z_∞(I) Z_∞(J) } ,

    V( k, m; A ) = ∏_{i∈A} t_i′ S( k; m ) ;

    W( k, m; A ) := ∏_{i∈A} s_i′ S( k; m ) ,
m ≥ k ≥ 1. Then, for ℓ^{1/2} ≤ j ≤ ℓ − ℓ^{1/2}, setting m := ⌊ℓ^{1/4}⌋, we have

    ⋯ + S( ℓ + m + 1; ℓ + j ) ) }]
      ≤ C(p, q) · ℓ^{-⋯} max{ ( E‖S(1, k)‖^{p+q} )^{a/(p+q)} ( E‖S(1, ℓ)‖^{p+q} )^{1 − a/(p+q)} :
          1 ≤ a ≤ p + q } .

Note that the variables V( 1, j − m; I^c ), V( j + 1, ℓ; I ) W( j + 1, ℓ; I ), and
W( ℓ + m + 1, ℓ + j; J^c ) are functions of disjoint sets of X_i-variables, which are
separated by m many X_i-variables from one another. Hence, by Proposition
3.1, Lemma 3.2, and Hölder's inequality, with γ = 2r − (p + q) > 0, we get

    lim_{ℓ→∞} E[ ℓ^{-1/2} S(1, ℓ) ]^ν = E Z_∞^ν                            (5.18)

for any ν ∈ ℤ_+^d with |ν| < 2r. Hence, the lemma follows from (5.16)-(5.18)
by observing that for any I, J,

    ℓ^{-1} ∑_{ℓ^{1/2} ≤ j ≤ ℓ−ℓ^{1/2}} ( (j − m)/ℓ )^{(|I^c|+|J^c|)/2} ( (ℓ − j)/ℓ )^{(|I|+|J|)/2} U_{(j−m)} V_{(ℓ−j)} ,

where

    U_j = E V_j(I^c) E W_j(J^c) / j^{(|I^c|+|J^c|)/2} ,
    V_j = E{ V_j(I) W_j(J) } / j^{(|I|+|J|)/2} , j ≥ 1 ,

and U_∞ and V_∞ denote the limits of the sequences {U_n}_{n≥1} and {V_n}_{n≥1}
(cf. (5.18)), respectively. □
    | Var( φ̂_{1n}(j; ℓ) ) − Var( T̃_{n,j} ) | ≤ ⋯ + N^{-1} ℓ² E‖U_{11}‖⁴ ] = O( n^{-3} ℓ ) .

    E| b^{-1} ℓ^{-2} ∑_{|α|=2} c_α ( E_*( S( I_{3,1}; ℓ ) )^α − E_*( S( I_{1,1}; ℓ ) )^α ) |^r
      = o( n^{-r} ⋯ ) .

The expansion for Var( φ̂_{1n}(j; ℓ) ), j = 3, now follows from (5.22) and the
result for the case j = 1.

To prove (5.20), note that for j = 2, with U_{1i}^{(2)} = S( (i − 1)ℓ + 1; ℓ )/√ℓ
and V_{2i} = ∑_{|α|=2} c_α [ U_{1i}^{(2)} ]^α, i ∈ ℤ,

    ⋯ + ∑_{i=1}^{b−1} α( iℓ − ℓ )^{a/(a+3)} ( E‖U_{11}‖^{6+2a} )^{2/(a+3)} ] ,

provided we show that | Cov( U_{11}^α, U_{1(ℓ+1)}^β ) | = o(1) for all |α| = 2 = |β|.
This can be done using arguments similar to those used in the proof
of Lemma 5.2. More precisely, writing U_{1(ℓ+1)} as the sum U_{1(ℓ+1)} =
ℓ^{-1/2} [ S( ℓ + 1, ℓ + m ) + S( ℓ + m + 1, 2ℓ ) ] with m = ⌊ℓ^{1/4}⌋, we have, for
any |α| = |β| = 2,

    ⋯ + 4 ( E‖U_{11}‖⁴ )^{1/2} ( E‖S(1; m)‖⁴ )^{1/4} ( E‖S(1; ℓ − m)‖⁴ )^{1/4}

    ≤ ∑_{m=1}^{2ℓ log n} E{ E_* ‖ S( I_{3,1}; m ) ‖^{2r} p ( 1 − p )^{m−1} } .
σ( X_1, ..., X_n ), and 𝒯_n = ℒ_n ∨ 𝒳_n = the smallest σ-field containing both
ℒ_n and 𝒳_n, n ≥ 1, as in Chapter 3. Then, R̃_{4,n} may be considered as a
random vector on (Ω, ℱ, P) and

    E{ E( ‖R̃_{4,n}‖^{2r} │ 𝒳_n ) } = E( ‖R̃_{4,n}‖^{2r} )
      = E[ E{ E( ‖R̃_{4,n}‖^{2r} │ 𝒯_n ) │ ℒ_n } ] .

Note that the random variables { I_{4,1}, ..., I_{4,n} }, { L_1, ..., L_n }, and
{ X_1, ..., X_n } are all independent. Hence, it follows that, conditional on
ℒ_n, L_K and the w_i's may be treated as nonrandom quantities. Consequently,
by Lemma 3.2,

    E{ E( ‖R̃_{4,n}‖^{2r} │ 𝒯_n ) │ ℒ_n }
      ≤ C(r) max{ E‖ ∑_{i=1}^{m} a_i X_i ‖^{2r} : 1 ≤ m ≤ L_K ∧ n, a_i ∈ {0, 1} }
      ≤ C(r) ζ_{2r+δ}^{2r} Δ( r; δ ) L_K^r .

Therefore, by Lemma 3.4,

Note that ∑_{i=1}^{K} L_i ≤ n + L_K and that, conditional on 𝒯_n, the S( I_{4,i}; L_i ) − L_i X̄_n,
1 ≤ i ≤ K, are zero mean independent (but not necessarily identically
distributed) random vectors. By Lemma 3.4, (5.11), and the inequality
above (5.23),

    E( E_* ‖ X̄_{n,ℓ}^{*(4)} ‖^{2r} )
Lemma 5.4 Let g : (−π, π] → [0, ∞) be a continuous function that is
symmetric about zero. Then,

    (i) lim_{n→∞} p ∫_{−π}^{π} [ e^{ιw}/( 1 − q e^{ιw} ) ] g(w) dw = π g(0) ;

    (ii) lim_{n→∞} p ∫_{−π}^{π} [ e^{ιw}/( 1 − q e^{ιw} ) ] [ e^{ι2w}/( 1 − q² e^{ι2w} ) ] g(w) dw
           = g(0) [ 2 ( 1 + ⋯ + cos w + ( cos(w/2) )² ] ⋯ .

Proof: (i) Since g is real and symmetric, for any M > 1, we have

    h(M) = 2 ∫_{|y|<M/2} [ ( 1 + 2q p^{-1} sin²(yp) ) cos(2py) + q p^{-1} sin²(2yp) ]
             / [ 1 + 4q y² ( (py)^{-2} sin²(py) ) ] · g(2py) dy .

Since x/3 < sin x for all x ∈ (0, π/2], for any M > 1, we have

    ⋯ ≤ C [ ∫_{M<y<(Mp)^{-1}} y^{-2} g(py) dy + ∫_{0<w<M^{-1}} g(w) dw ] ,

and

    lim_{n→∞} h(M)
      = ∫_{M^{-1}<|w|<π} ( 4 sin²(w/2) )^{-1} [ ( 2 sin²(w/2) ) cos w + ( sin w )² ] g(w) dw .   (5.27)
w ∈ (−π, π). Then, using the symmetry of g(w), it can be shown that the
integral on the left side of (ii) equals ∫_{−π}^{π} h_n(w) dw. As before, we split this
integral into three parts, now ranging over the sets [−Mp, Mp], { w : Mp <
|w| < π/2 }, and { w : π/2 ≤ |w| < π }, where M > 1. Arguments similar to
(5.25) and (5.26) yield

    lim_{n→∞} p ∫_{|w|≤Mp} h_n(w) dw = ⋯

and

    ∫_{π/2<|w|<π} | h_n(w) | dw → ⋯ .
Proof of Theorem 5.1, j = 4: We prove the theorem only for φ_{1n}. With-
out loss of generality, let μ = 0. Note that the SB observations { X_{4,i}^* }_{i≥1}
form a stationary dependent sequence. As in Section 5.7.1, using Taylor's
expansion, for j = 4, we have (cf. (5.13)),

Hence, using Hölder's inequality, Proposition 3.1, and Lemma 5.3 as in the
derivation of (5.15), for the case j = 4 we have

    E( R_{1n}(j; ℓ) )² ≤ ⋯ + E{ E_*( ‖X̄_{n,ℓ}^{*(j)}‖⁶ + ‖X̄_{n,ℓ}^{*(j)}‖^{6+2a} ) } ] .   (5.32)

Next,

    ⋯ = ∑_{|α|=1} ∑_{|β|=1} ⋯ [ ⋯
        + ∑_{j=1}^{n−1} ( 1 − n^{-1} j ) q^j { ( σ̂(j; α, β) + σ̂(j; β, α) )
        + ( σ̂(n − j; α, β) + σ̂(n − j; β, α) ) − 2 ( X̄_n )^{α+β} } ]
      = n^{-1} ∑_{|α|=1} ∑_{|β|=1} c_{α+β} [ ∑_{j=0}^{n−1} q_{nj} ( σ̂(j; α, β) + σ̂(j; β, α) )
        − { 1 + 2 ∑_{j=1}^{n−1} ( 1 − n^{-1} j ) q^j } ( X̄_n )^{α+β} ] ,     (5.33)

where q_{nj} = ( 1 − n^{-1} j ) q^j + ( n^{-1} j ) q^{n−j}, 1 ≤ j ≤ n − 1, and q_{n0} = 1/2.
Therefore, by (5.31), (5.32), and (5.33), it follows that
Note that by Taylor's expansion, | 1 − q^j − jp | ≤ j² p²/2 for all j ≥ 1 and
all 0 < p < 1, and that

    p^{-1} ( 1 − q_{nj} )( 1 − n^{-1} j ) → j as n → ∞ for all j ≥ 1 .

Also, by the mixing and moment conditions, ∑_{j=1}^{∞} j² | E X_1^α X_{1+j}^β | < ∞.
Hence, using the Dominated Convergence Theorem (DCT), from (5.34)
we get

    c_{α+β} [ ∑_j q_{nj} ( σ̂(j; α, β) + σ̂(j; β, α) ) ] ⋯

    Var( n^{-1} ∑_{|α|=1} ∑_{|β|=1} ∑_{j=0} ⋯ { 1 − ⋯ } | σ(m; α, γ) | | σ(v + m; β, ν) | )
      = O( n^{-4} s² ) .                                                   (5.40)
Next note that the functions ψ_m(x) ≡ (2π)^{-1/2} exp(ιmx), x ∈ (−π, π],
m ∈ ℤ, form an orthonormal basis of the Hilbert space L²(−π, π] with re-
spect to the inner product ⟨f_1, f_2⟩ = ∫_{−π}^{π} f_1(x) f̄_2(x) dx, f_1, f_2 ∈ L²(−π, π],
and, hence, ∑_{m=−∞}^{∞} ⟨f_1, ψ_m⟩⟨f_2, ψ_m⟩ = ⟨f_1, f_2⟩ for any f_1, f_2 ∈ L²(−π, π].
Now using (5.37)-(5.40) and Condition M_r, it can be shown that for any
unit vectors α, β, γ, ν ∈ ℤ_+^d,

    n^{-3} ∑_{j=0}^{n−2} ∑_{v=1}^{n−1−j} q_{nj} q_{n(j+v)} { ∑_{m=−(n−j)+1}^{(n−j)−v−1} ( 1 − ⋯ ) σ(m; α, γ) σ(v + m; β, ν) }
      = ⋯ ( ∑_{m=−∞}^{∞} ‖ E X_1 X′_{1+m} ‖ ) + O( n^{-4} s² + n^{-2} exp( −(log n)² ) )
      = ⋯ ( ∫_{−π}^{π} e^{−ι(v+m)w} f(w; β, ν) dw ) ] + O( n^{-4} s² )
      = n^{-3} ( ⋯ + q²/( 1 − q² ) ) + O( n^{-4} s² ) .                     (5.41)
By similar arguments, it follows that for any α, β, γ, ν ∈ ℤ_+^d with |α| =
1 = |β| = |γ| = |ν|,

    ∑ ( 1 − n^{-1} ⋯ ) σ( m + j + v; α, ν ) σ( m − j; β, γ ) } = ⋯

and

    n^{-3} ∑_{j=0}^{n−1} q_{nj}² ∑_{m=−(n−j)+1}^{(n−j)−1} ( 1 − n^{-1} ( |m| + j ) ) × ⋯
      = n^{-3} · 2π [ 4^{-1} + q² ( 1 − q² )^{-1} ] ∫_{−π}^{π} f(w; α, γ) f̄(w; β, ν) dw .

Let σ̂_{1n}(j) = ∑_{|α|=1} ∑_{|β|=1} c_{α+β} ( σ̂(j; α, β) + σ̂(j; β, α) ), 0 ≤ j ≤ n − 1.
Then, by (5.36), (5.41)-(5.43), and Lemmas 3.3 and 5.4, we have

    ⋯ ∫_{−π}^{π} { ⋯ + ( q e^{ιw} )² ( 1 − ( q e^{ιw} )² )^{-1} } g_1(w) dw + O( n^{-4} s² )
      = 2π n^{-3} ℓ [ ∫_{−π}^{π} g_1(w) dw + 2π g_1(0) + ∫_{−π}^{π} cos w · g_1(w) dw ]
        + o( n^{-3} ℓ ) .
6
Second-Order Properties

6.1 Introduction
In this chapter, we consider second-order properties of block bootstrap es-
timators for estimating the sampling distribution of a statistic of inter-
est. The basic tool for studying second-order properties of block boot-
strap distribution function estimators is the theory of Edgeworth
expansions. Let θ̂_n be an estimator of a level-1 parameter θ and
T_n = √n ( θ̂_n − θ )/s_n be a scaled version of θ̂_n such that T_n →_d N(0, 1).
If we set s_n to be the (asymptotic) standard deviation of √n ( θ̂_n − θ ), then
T_n is called a normalized or standardized version of θ̂_n. If s_n is an estimator
of the asymptotic standard deviation of √n ( θ̂_n − θ ), then T_n is called a
studentized version of θ̂_n. In many instances, it is possible to expand the
distribution function of T_n in a series of the form

uniformly in x ∈ ℝ, where Φ and φ, respectively, denote the distribution
function and the density (with respect to the Lebesgue measure) of the
standard normal distribution on ℝ, and where p_1(·; γ) is a polynomial such
that its coefficients are (smooth) functions of some population parameters
γ. The right side of (6.1) is called a first-order Edgeworth expansion for
the distribution function of T_n. Next, let T_n^* denote the bootstrap version
of T_n based on one of the several block bootstrap methods presented in
Chapter 2. Under suitable regularity conditions on T_n, on the resampling
mechanism, and on the underlying time series, we can often expand the
    o_p( n^{-1/2} ) ,                                                     (6.3)

    log z = ( log r ) + ιω ,

where ι² = −1. Then, log z is the so-called principal branch of the loga-
rithm, and it is analytic in the domain ℂ \ (−∞, 0].
Let X be a JR.d-valued random vector and let '(t) = E exp(d' X), t E
JR.d denote the characteristic function of X. Note that under the moment
condition EllXlls < 00, log,(t) is s-times differentiable in a neighborhood
of zero. For v E Ziwith 1 ::; Ivi ::; s, let Xv denote the vth cumulant of X,
defined by
(6.4)
Let /-tv denote the vth moment of X, i.e., /-tv = E(XV), 1::; Ivl ::; s. Then,
it is possible to express the cumulants of X in terms of the moments of
X, and vice versa. Note that the exponential and the logarithm functions
admit the power series expansions
and
00 k
log(l + z) = 2:(-1)k-1 Zk , z E C, Izl < 1.
k=l
Using these expansions, we get the formal identity
f: (dt
(1 + Ivl=l
00
where C(VI' ... , Vki iI, ... , ik) are combinatorial constants, depending only
on their arguments, and where the summation L(k) extends over all
VI, . .. ,Vk EZi and il, ... , ik E N satisfying L~=l imvm = v. In the
one-dimensional case, we have the following relations (cf. page 46, Bhat-
tacharya and Rao (1986)):
/-LI
2
/-L2 - /-LI
/-L3 - 3/-L2/-LI + 2/-L~
/-L4 - 4/-L3/-LI - 3/-L~ + 12/-L2/-L~ - 6/-Li
/-L5 - 5/-L4/-LI - 1O/-L3/-L2 + 20/-L3/-L~ + 30/-L~/-LI
- 60/-L2/-Lf + 24/-L~ . (6.7)
For expressions for higher order cumulants, see Petrov (1975) and Kendall
and Stuart (1977).
Cumulants play an important role in the development of Edgeworth ex-
pansions for sums of independent random vectors. To gain some insight
into the derivation of the Edgeworth expansions for sums of independent
random vectors, first we consider the simpler situation involving iid ran-
dom vectors, at a heuristic level. Suppose that X is a JR.d-valued random
vector with EX = 0 and EllXlls < 00 for some integer s ~ 3. Then, in a
neighborhood of t = 0, by Taylor's expansion, we have
where R(t) = o(lltII S ) as Iltll ---- o. Let Tn denote the sum of n independent
copies of X. Then, noting that Xv = EX V = 0 for allivi = 1, by (6.8), for
any given t E IR d , we have
Eexp (d'(n- I / 2 Tn ))
[E exp(d' X/vn)]n
6.2 Edgeworth Expansions for the Mean Under Independence 149
exp(n[ L (d/vn)"Xv/v!+R(t/vn)])
2:'OIvl:'Os
exp( -t'~t/2) [1 +
f
m=l
(~n-rj2{ L (~t)"xv/v!}+o(n-(S-2)j2))m/m!]
r=l Ivl=r+2
(6.9)
(6.10)
for each fixed t E JRd, where PrO's are polynomials. The Edgeworth expan-
sions for n- lj2 T n (or, equivalently, for the scaled sample mean) is obtained
by inverting the expansion for the characteristic function in (6.10). Theoret-
ical justification for this inversion step and for the formal approximations in
(6.9) and (6.10) are quite involved, and will not be presented here. We refer
the interested reader to the monograph of Bhattacharya and Rao (1986)
for details.
Using the heuristic arguments above, we now describe the Edgeworth ex-
pansion theory for the normalized sum of independent (but not necessarily
identically distributed) random vectors. Let Xl, ... ,Xn be a collection of
independent JRd-valued random vectors with EXj = 0 for 1 :::; j :::; n. Let
Xv,j denote the vth cumulant of Xj, 1 :::; j :::; n and let Xv = n- l ~7=1 Xv,j'
Then, define the polynomials Pj (z) == Pj (z; .) in Z E Cd by the formal iden-
tity in u E JR (d. (6.9) and (6.10))
L UjPj(z; {Xv})
00
1+
j=l
It can be shown (d. Lemma 7.1, Bhattacharya and Rao (1986)) that for
each j ~ 1, Pj (z; {Xv}) is a polynomial of degree 3j in Z and its coefficients
are smooth functions of the cumulants Xv of order Ivl :::; j + 2. The density
7/Jn,s of the (s - 2)-th order Edgeworth expansion of the scaled sample mean
Sn == n- lj2 ~7=1 Xj is defined via its Fourier transform
Jexp(d'x)7/Jn,s(x)dx
150 6. Second-Order Properties
(6.12)
'1fJn,s(x) = (27r)-d r
J~d
'1fJ~,s(t)exp(-d'x)dt, x E JRd. (6.13)
Next we evaluate the integral on the right side of (6.13). Note that the
function
(6.14)
is the Fourier transform of the function
(6.15)
(6.16)
(6.17)
for j = 1, ... , s - 2. The exact forms of the polynomials Pj (-; .) are diffi-
cult to write down explicitly for a general d 2:: 1. See Bhattacharya and
Rao (1986), Chapter 7, for an illustrative example. Here we list the first
two polynomials for d = 1. For simplicity, suppose that the Xi's are iid
with mean zero and variance 1. Then, f; = 1. Furthermore, in the one-
dimensional case, the derivatives of the standard normal density function
cp(x) = (27r)-1/2 exp( -x 2 /2), x E JR, may be expressed in terms of the
Hermite polynomials Hk(X)'S, defined by the relation
dk
Hk(X)CP(X) = (-l)k-k (cp(x)), x E JR , (6.18)
dx
6.2 Edgeworth Expansions for the Mean Under Independence 151
See, for example, Hall (1992), page 44 and Petrov (1975), page 137. The
polynomials Pj(-; .), j = 1,2 are given by
1
P1(Xj {Xv}) = "6H3(x)J-L3, x E ~ , (6.20)
_ 1 2 1
P2(Xj {Xv}) = 72H6(x)J-L3 + 24H4(x)[J-L4 - 3], x E JR, (6.21)
whenever
(6.25)
Here v n = (p-n,s Un )-l/(s+d+1) , p-n,s = n- 3/ 2 ",n 1 n.(IIX·11 <
L....J=l EIIX·lls+
J J -
n 1/ 2), and 'Yn(E) L1:<=;jl, ... ,jS+d+l:<=;n sup{IINJi, ... ,j8+d+lI Bn,j(t)1
(16pn,3)-1 ::; Iltll ::; c 4 }, with Bn,j(t) = IEexp(d' Xj)1 + 2P(IIXj ll > v'n),
1 ::; j ::; n, t E ]Rd.
0, 1 :s; j :s; nand n- l 'L7=1 EXn,jX~,j = lId for each n :::: 1. Suppose that
for some integer s :::: 3 and some 15 E (0,1/2),
n
lim sup n- l
n->oo
L EIIXn,jIIS < 00 , (6.27)
j=l
and for some sequence {17n}n>l C (0, (0) with 17n = 0(n-(s-2)/2),
Let C denote the collection of all measurable convex subsets of ]Rd. Then,
(6.30) holds with B = C. For d = 1, if we set B = {( -00, xl : x E ]R}, then
also (6.30) holds and Theorem 6.2 yields a (s - 2)-th order Edgeworth
expansion for the distribution function of Sn.
Next we consider the important special case where the triangular array
{Xn,j : 1 :s; j :s; n }nEN derives from a sequence of iid random vectors
{Xn}n;:::l, i.e., Xn,j = Xj for all 1 :s; j :s; n, n:::: 1. Then, (6.26) and (6.27)
holds if and only if EllXI!Is < 00. And condition (6.28) holds if and only if
E = n~~ n- 1 var(f x
J=l
j) (6.32)
(C.3) There exists 5 E (0,1) such that for all n, m = 1,2, ... with m > 5-1,
there exists a V~~:-measurable random vector X~,m satisfying
(C.4) There exists 5 E (0,1) such that for all i E Z, mEN, A E V~OO, and
BE V'ttm'
(C.5) There exists 5 E (0,1) such that for all m, n, k = 1,2, ... , and A E
n +k
V n-k
(C.6) There exists 5 E (0,1) such that for all m, n = 1,2, ... with 5- 1 <
m < n and for all t E ffi.d with Iltll ~ 5,
not always the most useful one for the verification of the rest of the con-
ditions. See the examples given below, illustrating various choices of the
a-fields Vj's in different problems.
Condition (C.5) is an approximate Markov-type property, which says
that the conditional probability of an event A E V~~Z, given the larger
a-field V{Vj : j =I=- n}, can be approximated with increasing accuracy when
the conditioning a-field V{Vj : 0 < Ij - nl ::; m + k} grows with m. This
condition trivially holds if Xj is Vj-measurable and {XdiEZ is itself a
Markov chain of a finite order. Finally, we consider (C.6). It is a version of
the Cramer condition in the weakly dependent case. Note that if Xj's are
iid and the a-fields Vi's are chosen as Vj = a(Xj), j E Z, then Condition
(C.6) is equivalent to requiring that for some J E (0,1),
8-2
~n,8(X) = [1 +L n- rj2 pr(X)]¢V(X), x E IRd
r=l
for some polynomials PI (.), ... , Pr (-) and for some positive definite matrix
V, where ¢v is the density of the N(O, V) distribution on IRd. The sequence
{XihEZ in the example of Gotze and Hipp (1983) is stationary and m-
dependent with m = 1. Furthermore, Xl has finite moments of all orders
and it satisfies the standard Cramer condition (6.31). However, a "regular"
Edgeworth expansion for the sum of the Xj's does not hold.
Next, we give examples of some important classes of weakly dependent
processes that fit into the above framework and we indicate the choices of
the a-fields V/s and the variables X~,m's for the verification of Conditions
(C.3)-(C.6).
Xi = LajEi-j, i E Z , (6.34)
jEZ
6.3 Edgeworth Expansions for the Mean Under Dependence 157
(6.37)
where a1, ... , a p, {31, ... ,{3q (p EN, q E N) are real numbers and {EdiEZ is
a sequence of iid random variables as in (6.34). We also suppose that the
polynomials a(z) == 1- (a1z+··· +apz P), and (3(z) == 1 + {31Z+··· + (3qzq,
z E C have no common zeros in C and a(z) i= 0 for all z in the closed unit
disc {z E C: Izi ::; I}. Then, it can be shown that there exists a sequence
of constants {adiEz c JR satisfying (6.35) such that representation (6.34)
holds (see, for example, Chapter 3, Brockwell and Davis (1991)). If, in
addition, (3(l)ja(l) i= 0, then LiEz ai i= O. Thus, Conditions (C.3)-(C.6)
hold for the ARMA(p, q) model (6.37), provided E1 satisfies the standard
Cramer's condition (6.36), and the polynomials a(z) and (3(z) satisfy the
regularity conditions pointed out above. D
Example 6.2: Let {EdiEZ be a sequence of iid random variables and let
(6.38)
(6.39)
158 6. Second-Order Properties
See G6tze and Hipp (1983), pages 218-219, for the details of the verifica-
tion of (C.6). As mentioned earlier, in this case Condition (C.1) may be
replaced by the weaker moment condition (6.33) for a valid (8 - 2)-th order
Edgeworth expansion for the probability distribution of Sn. See Theorem
2.2, Lahiri (1993a). 0
more d~tails. 0
Xi = f(Yi), i E Z
00
= 1+ LUrQr,n(t) , (6.41 )
r=l
where Xr,n(t) is the rth cumulant of the random variable t'Sn, t E ~d.
For 1 ::; r ::; 8 - 2, this definition of Qr,n(t) is essentially equivalent to the
definition of the polynomials Pr(-; {Xv}) given in (6.11) in the independent
case. To appreciate why, note that we may replace the sum on the left side
of (6.41) by 2:::3 and formally define the polynomials Qr,nO using the
resulting identity. But this modification does not affect the first (8 - 2)
6.3 Edgeworth Expansions for the Mean Under Dependence 159
+ n- 1/ 2X,.,2(t) + ...
X,.,l(t)
+ n-(S-2)/2 X ,.,s_1 + o(n-(S-2)/2) (6.42)
(for t E JR d fixed) for some polynomials X,., 1(t), ... , X,., k (t), not depending
on n. As a result, for a stationary sequence {XdiEZ' the Edgeworth ex-
pansion for Sn may be written in terms of a set of polynomials that do not
depend on n. See Remark 2.12, Gotze and Hipp (1983) for more details.
Next, with ij,.,n(t) given by (6.41), we define density ~n,s ofthe Edgeworth
expansion for Sn in terms of its Fourier transform, by the relation
JeLt'x~n,s(x)dx
exp( -X2,n(t)/2) [1 + ~ n-,./2ij,.,n(d)] , t E JRd . (6.43)
IEf(Sn) - J fdYn,sl
where (BB)' = {x E ~d : Ilx - yll < E for some y EBB}, E > 0 and BB
denotes the boundary of B.
(6.47)
and, hence,
(6.48)
6.4 Expansions for Functions of Sample Means 161
where T2 = 2: 1.81=12: 1",1=1 D'" H(J.L)D.8 H(J.L)Cov(Xf, Xf). This is often re-
ferred to as the Delta method. Edgeworth expansions for Wn may be derived
by considering higher-order Taylor's expansions of the function H around
J.L. Suppose that for some integer s ~ 3, His s-times continuously differen-
tiable in a neighborhood of J.L. Then, we may express Wn as
s-l
WIn L n-(I"'I-1)/2 D'" H(J.L) [In(Xn - J.L)]'" la! + Rn,s
1"'1=1
Vn,s + Rn,s, say, (6.49)
where Rn,s is a remainder term that, under the moment condition
EIIX1 11 s < 00, satisfies
(6.50)
for some sequence on,s = o(n-(s-2)/2). Here the random variable Vn,s is
called a (s - 2)-th order stochastic approximation to WIn' Under (6.50),
the (s - 2)-th order Edgeworth expansions for WIn and Vn,s coincide. It is
customary to describe the (s - 2)-th order Edgeworth expansion for WIn
using that for Vn,s' Supposing (for the time being) that Xl has sufficiently
many finite moments, the rth cumulant Xr(Vn,s) of Vn,s can be expressed
as
Xr (TT) -
Vn,s = Xr,n,s + 0 (n -(s-2)/2) (6.51)
for 1 ~ r ~ s, where
t E ~, where p~ll (.), ... ,p~1~2(') are polynomials defined by the identity (in
u E~)
1 + ~1
00 [ s ( s-2
~(r!)-1 ~ UjXr,j (ttr
) ] m/ m!
162 6. Second-Order Properties
00
= 1 + :~::>jp1l1(Lt) , (6.54)
j=l
for t ERAs in Section 6.2, the Edgeworth expansion \]!~,ls for Vn,s is the
signed measure having the density (with respect to the Lebesgue measure
on ~)
say, where p~ll(_ d~) is defined by replacing (Lt)j in the definition of the
polynomial p}ll(it) with the differential operator (-1)j d~j' j ;::: 1, and
where ¢r2(X) = (27r7 2)-1/2exp(-x2/27 2), x E R The following result
of Bhattacharya and Ghosh (1978) shows that \]!~:s is a valid (s - 2)-th
order expansion for WIn, i.e., the error of approximating the probability
distribution of WIn by the signed measure \]!~,ls is of the order o(n-(s-2)/2)
uniformly over classes of sets satisfying an analog of (6.46).
for any collection B of Borel subsets ofIR satisfying (6.46) with d = 1 and
E =7 2 .
for polynomials p~2l, ... ,P~~2' where by (6.55) and a change of variables, it
easily follows that p~2l (x) = p~ll (TX), x E R
Next consider the case of studentized statistics. It turns out that in the
independent case, we can also apply Theorem 6.5 with a "suitable H" to
obtain an Edgeworth expansion for the studentized version of en,
given by
(6.57)
BEB
sup IP(W3n E B) - r [1 + ~
iB j=l
n-j/2P13l(X)] ¢(X)dxl = o(n-(s-2)/2)
(6.58)
for any collection 13 of Borel subsets of lR satisfying (6.30) with d = 1, where
2
pFl, ... ,p13 2 are polynomials and where, ¢(x) = (27r)-1/2 exp( _x2 /2),
164 6. Second-Order Properties
(6.61)
where T! = L:1"1=1 L: 1,61=1 C"c,6I: oo (a, (3), c" = D" H(f.-L)/a! and for lal =
1(31 = 1, a, (3 E Z~, I: 00 (a, (3) == I:(a, (3) = limn--+oo E[v'n(Xn - f.-L)]"+,6 =
6.4 Expansions for Functions of Sample Means 165
LjEZ E(X1 - f-L)a (X Hj - f-L)f3. In the dependent case, a valid (s-2)-th order
Edgeworth expansion (s 2': 3) can be derived for the normalized version
(6.62)
n-2
~~Js(x) = 1>(x) +L n-r/2q~2J(x)¢(x), x E JR .
r=l
As mentioned in Section 6.3, under the stationarity of the process {XdiEZ'
the vth cumulant Xv,n of Sn for v E zt,
2 :S Ivl :S s, may be expressed in
the form (cf. (6.42))
-
Xv,l,oo + n -1/2 Xv,2,oo
- + . . . + n -(s-2)/2-Xv,s-l,oo
+ o(n-(S-2)/2) as n ----> 00 (6.64)
for some Xv,j,oo E R The coefficients of the polynomials q~2J, ... ,ql~2 are
smooth functions of the partial derivatives D V H(f-L) , Ivl :S s -1, and of the
constants Xv,j,oo, 1 :S j :S s - 1,2 :S Ivl :S s, appearing in (6.64).
Although under the stationarity assumption on the process {XdiEZ' it
is possible to describe the Edgeworth expansion of W2n in terms of the
polynomials q}2J that do not depend on n, in practice one may group some
of these terms together to describe the Edgeworth expansion in terms of
the moments (or cumulants) of the centered and scaled sample mean Sn
directly. For example, a first-order Edgeworth expansion for P(W2n :S x)
(with s = 3) is given by
(6.65)
x E JR, where the constants K31 and K32 are given by K31 == K31n =
Llal=2 caES~ /Tn and K32 == K32n = [foE(Llal=l caS~)3 - 3T~K31 +
3E{ (Llal=l CaS~)2(Llal=2 CaS~)}l/(6T~). Here, T~ = Var(Llal=l caS~)
and Ca = DaH(f-L)/o;!, 0; E zt.
166 6. Second-Order Properties
7! = L COV(Y1, 1j+d ,
jEZ
(£-1)
f~ = L Wkn[h(Xn)'fn(k)h(Xn )] (6.66)
k=O
n-£ - -.
where r n(k) = n- 1 Lj=l (Xj - Xn)(Xj+k - Xn)', h IS the d x 1 vector of
A
first-order partial derivatives of H, and Wkn'S are lag weights, with WOn = 1
and Wkn = 2w( k / C), 1 :::; k :::; C-1 for some continuous function W : [0, 1) --+
[0, 1] with W (0) = 1. If C --+ CXJ and n / C --+ CXJ as n --+ CXJ, then f~ is consistent
for 7!. We define the studentized version of en as
(6.67)
and Lahiri (1996a). While Gotze and Kiinsch (1996) considered studen-
tized statistics under the Smooth Function Model (6.47), Lahiri (1996a)
considered studentized versions of M-estimators of the regression param-
eters in a multiple linear regression model. Here we follow Gotze and
Kiinsch (1996) to describe the Edgeworth expansion result for W3n . Re-
call the notation Yn = In
L7=1 1j = LIc>I=l cc>S~, Sn = 'L7=1 (Xj -In
fL), and T~ = n-1Var(L~=1 Yi). Let Tfn = 'L~:~ wknEY1Yl+k, 1Tn =
n- 1 L~=l L7=1 L~:~ Wkn E (Yi1j1j+k), and fL3,n = n 2E(Yn)3. Also, let
3 n denote the variance matrix of the (d + 1) x 1 dimensional vector
W4n == (vnYn; S~)' and let a'Y's be constants defined by the identity
~[31t (t)
n,3 Jexp(d'x)dY~,13(x)
1 + -1 . -1 [(fL3n
- - -1Tn) (d) 3 - (d)1T n ]
vn T~ 6 2
-
2
- exp(-t 2 /2)
+ ~(d)
vn t
'Y
a'Y( -l)hID'Y exp( -w' 3 n w/2) I _
w-(t,o, ... ,O)
(6.68)
Then, we have the following result due to Gotze and Kiinsch (1996) on
Edgeworth expansion for the studentized statistic W3n under dependence.
Theorem 6.6 Suppose that Condition (5.Dr) of Section 5.4 on the func-
tion H holds with r = 3, Llal=l!D"H(fL)! i= 0, and that E!!X1!!P+" < 00
for some J > 0 and p 2: 8, pEN. Furthermore, suppose that
Proof: See relations (6) and (7) and Theorem 4.1 of G6tze and Kiinsch
(1996). 0
Note that under the conditions of Theorem 6.6, the second term IT; -Tfn I
on the right side of (6.70) is o(n- 1 / 2 ) if the weight function w(x) == 1 for
all x E [0,1). A drawback of this choice of the weight function is that it
does not guarantee that the estimator f; of the asymptotic variance T! is
always nonnegative. However, under the regularity conditions of Theorem
6.6, the event {r; ::; O} has a negligible probability and it does not affect the
rate of approximation 0(£n-1+ 2 / p ) of the first-order Edgeworth expansion
T~,13((-oo,x]) to P(W3n ::; x). Another class of popular weights are given
by functions w(·) that satisfy w(x) = 1 + 0(x 2) as x --+ 0+. For such
weights, IT; - Tfnl = 0(£-2) and thus, in such cases, £ must grow at a
faster rate than n 1 / 4 to yield an error of o(n- 1 / 2 ) in (6.70).
(6.71)
6.5 Second-Order Properties of Block Bootstrap Methods 169
where n1 = b.e, ()~ = H(X~), and, with iln == E*(X~), On = H(iln) and T~ =
n1·Var*(LI",I=1 D'" H(iln)(X~)"'). Note that conditional on Xl' ... ' X n, X~
is the average of a collection of b iid random vectors. Hence, an expansion
for W2'n may be derived using the Edgeworth expansion theory of Sections
6.2 and 6.4 for independent random vectors. The exact form of the first-
order Edgeworth expansion for W2'n is given by
and
[y'nE* ( L c",(S~)"') 3
1"'1=1
(6.73)
sup Ip(W2n
xER
~ x) - Y~,J3(( -00, xl) 1= 0(n-1) .
Hence, by part (a) and (6.65),
The first term is of the order 0(n- 1/ 2 e), by Lemma 3.1. It is easy to check
that the second term is of the order 0(e- 1 ). This completes the proof of
Theorem 6.7. 0
L~ LC"[vnl(X~-Pnr']
1"1=1
b
b- 1 / 2 L {L C" (U~i - Pn Vcr'}
i=1 1<>1=1
b
b- 1 / 2 LYl~' say,
i=1
where C" = D" H(Pn)/a!, a E Zi. Hence, Var*(L~) = Var*(Ytl). This
suggests that an estimator of the conditional variance Var *(Ytl) is given
by the "sample variance" of the iid random variables Ytl' ... ,Ytb. Hence,
with Y"tb = b- 1 I:~=1 Yl~' we define
b
*2 =
Tn
b-1 "(y*
~ 1z
_ y*)2
Ib , (6.77)
i=1
as an "estimator" of Var*(Ytl) and define the bootstrap version of the
studentized statistic W3n as
172 6. Second-Order Properties
Gotze and Kiinsch (1996) suggested setting the MBB block length £ to be
equal to the smoothing parameter £ in the definition of the studentizing
factor f~ (cf. (6.66)). However, as they pointed out, second-order correct-
ness of the MBB approximation holds for other choices of the block length £
satisfying (6.69). See the last paragraph on page 1217 or Gotze and Kiinsch
(1996). For notational simplicity, we suppose that the block size parame-
ter £ and the lag-window parameter £ in (6.66) are equal. With this, we
now define the first-order Edgeworth expansion y~l3 of W3'n in terms of its
Fourier transformation (cf. (6.68)) .
lt
n,3 (t)
€[3 J exp( d ' x )dY ~,l3 (x)
M3
[1 + -'_- n { 1 3 I}] exp( -t
- -(Lt) - -(d) 2 /2)
ylnT~ 3 2
(6.78)
~~~ Ip* (W;n :::; x) -P(W3n :::; x)1 = Op (n-1+2/P£+n-l/2g-1 + IT~ -Trnl)
Proof: See Theorems 4.1 and 4.2 of Gotze and Kiinsch (1996). D
in the definition of the studentizing factor f~) important, but also is the
choice of the weight function w(·). Lahiri (1996a) considers the case where
the weight function w(·) == 1 and employs a different definition of the
bootstrap studentized statistic to establish second-order correctness of the
MBB for M-estimators in a multiple linear regression model. Relative merits
of the two approaches are not clear at this stage.
Second-order correctness of the NBB and the CBB, which are also based
on independent resampling of blocks of a nonrandom length, can be es-
tablished using arguments similar to those used in the proofs of Theorems
6.7 and 6.8. See Hall, Horowitz and Jing (1995) and Politis and Romano
(1992a) for a proof in the normalized case for the NBB and the CBB,
respectively. As for the SB, Lahiri (1999c) developed some iterated condi-
tioning argument to deal with the random block lengths in the SB method
and established second-order correctness of the SB method for studentized
statistics. For second and higher order investigations into the properties
of bootstrap methods for some popular classes of estimators in Economet-
rics (e.g., the "Generalized Method of Moments" estimators), see Hall and
Horowitz (1996), Inoue and Shintani (2001), Andrews (2002), , and the
references therein.
7
Empirical Choice of the Block Size
7 .1 Introduction
As we have seen in the earlier chapters, performance of block bootstrap
methods critically depends on the block size. In this chapter, we describe
the theoretical optimal block lengths for the estimation of various level-2
parameters and discuss the problem of choosing the optimal block sizes
empirically. For definiteness, we restrict attention to the MBB method.
Analogs of the block size estimation methods presented here can be de-
fined for other block bootstrap methods. In Section 7.2, we describe the
forms of the MSE-optimal block lengths for estimating the variance and the
distribution function. In Section 7.3, we present a data-based method for
choosing the optimal block length based on the subsampling method. This
is based on the work of Hall, Horowitz and Jing (1995). A second method
based on the Jackknife-After-Bootstrap (JAB) method is presented in Sec-
tion 7.4. Numerical results on finite sample performance of these optimal
block length selection rules are also given in the respective sections.
(7.1)
(7.4)
for some small E > 0. It will follow from the arguments and results below
that the theoretical optimal block length C~n is of the order n 1 / 3 for the
bias and the variance functionals (with k = 1,2), while the order of C~n
for the one- and the two-sided distribution functions, with k = 3 and
k = 4, are of the orders n 1/4 and n 1/5, respectively. Thus, the ranges
[mE, c 1 n 1/ 2 - E ] and [mE, c 1 n 1 / 3 - E ] of block lengths C in (7.6) and (7.7),
respectively, contain the optimal block lengths C~n for all k = 1,2,3,4.
Indeed, it can be shown that under some additional regularity conditions,
the theoretical optimal block lengths C~n have the same order even when
the ranges of C values in (7.6) and (7.7) are replaced by the larger interval
[m', c 1n 1 -,] for an arbitrarily small E E (0,1). However, we will restrict
7.2 Theoretical Optimal Block Lengths 177
e
attention to the range of values specified by (7.6) and (7.7) and will not
pursue such generalizations here.
For deriving expansions for the MSEs of the block bootstrap estimators
CPkn(e)'S, k = 1,2,3,4, we shall suppose that the level-1 parameter () and
its estimator On satisfy the requirements of the Smooth Function Model
(cf. Section 4.2). Thus, there exists a function H : lR d --t lR such that
(7.8)
and the function H is "smooth" in a neighborhood of j.L, where j.L = EX1
and Xn = n- 1 2:~=1 Xi. Recall that we write Co = DO H(j.L)ja!, DO for the
.a:. aal +.,,+ad
d1llerentml operator ax"l ... ax"d and a.
'TId,
= i=l ai· for a = (0.1,·.·, ad )' E
1 d
Zi·
7.2.1 Optimal Block Lengths for Bias and Variance
Estimation
Expansions of the MSEs of the MBB estimators of the bias and the variance
of the estimator On under the Smooth Function Model (7.8) was given in
Chapter 5. Here, we recast the relevant results in a slightly different form
by expressing relevant population quantities in the time domain. Let Zoo
be a d-dimensional Gaussian random vector with mean zero and covariance
matrix :Eoo = 2:;:-00 E{(X1 - j.L)(X1+j - j.L)'}.
(a) Suppose that Conditions (5.D r ) and (5.Mr ) of Section 5.4 hold with
r = 3 and r = 3 + ao, respectively, where ao is as specified by (5.D r ).
Then
A1 =- L L
101=11J31=1
Co+J3 [ f
j=-oo
IjIE(X1 - j.L)O(X1+j - j.L)J3] .
(b) Suppose that Conditions (5.D r ) and (5.Mr ) of Section 5.4 hold with
r = 2 and r = 4 + 2ao, respectively, where ao is as specified by Con-
r)
dition (5.D r ). Then,
where
A2 =- L L CaCf3 [
lal=II,BI=1
f
j=-oo
IjIE(XI - J1.)a(X1+j - J1.)f3] .
Proof: Follows from the proofs of Theorems 5.1 and 5.2 for the case
'j = l' (corresponding to the MBB estimators). 0
Note that under the regularity conditions of Theorem 7.1, both the bias
and the variance of the estimator On are of the order O(n- I ). Hence, we
state the MSEs of the scaled bootstrap bias estimator n· (PIn(l) and of
the scaled bootstrap variance estimator n . CP2n(l), in Theorem 7.1. Al-
ternatively, we may think of the scaled bootstrap estimators n . CPkn(l)
as estimators of the limiting level-2 parameters 'Pk,oo == limn---+oo n . 'Pkn,
k = 1,2, given by
'PI,oo L L
lal=II,BI=1
Ca+,B [ f
j=-oo
E(XI - J1.)a(X1+j - J1.),B]
and
'P2,oo = L L
lal=II,BI=1
cac,B [ f
j=-oo
E(XI - J1.)a(X1+j - J1.),B] .
Theorem 7.1 immediately yields expressions for the leading terms of the
theoretical optimal block lengths for bias and variance estimation. We note
these down in the following corollary.
Corollary 7.1 Suppose that the respective set of conditions of Theorem 7.1
hold for the bias functional (k = 1) and the variance functional (k = 2),
and that the constants Al and A2 are nonzero. Then, for k = 1,2,
for a given value Xo E JR. Hall, Horowitz and Jing (1995) consider both
the NBB and the MBB estimators of 'P3n and derive expansions for the
MSEs in the case of the sample mean, i.e., in the case where en = Xn
and () = EX1. An expansion for the MSE of the MBB estimator (f!Jn(f)
(say) of 'P3n is obtained by Lahiri (1996d) under the Smooth Function
Model (7.8). Here we follow the exposition of Lahiri (1996d) and describe
an expansion for MSE (<P3n(f)) under the framework of G6tze and Hipp
(1983), introduced in Chapter 6. Suppose that {XihEZ is defined on a
probability space (O,F,P), {XihEZ is stationary, and that {VihEZ is a
given sequence of sub-a-fields of F. For -00 ::; a ::; b ::; 00, let V~ denote
the smallest a-field containing {Vi: i E [a, b] nil}. For easy reference, we
now restate some of the conditions from Section 6.3, under the stationarity
assumption on the process {XihEZ,
(C.1) There exists 8 E (0,1) such that for all n, m = 1,2, ... with m > 8-1,
there exists a V~~:-measurable random vector X~,m satisfying
(C.2) There exists 8 E (0,1) such that for all i E Il, mEN, A E V~oo, and
BE V'tt-m'
(C.3) There exists 8 E (0,1) such that for all m, n, k = 1,2, ... , and A E
n +k
V n-k
(C.4) There exists 8 E (0,1) such that for all m, n = 1,2, ... with 8- 1 <
m < n, and for all t E JRd with Iltll ~ 8,
(7.12)
Then, there exist constants V31,V32 E (0,00) and B31,B32 E; lR such that
for Ixol ¥= 1,
r
MSE( <P3n (xo; £) )
where K3i (£) == K 3in (£), i = 1, 2 are smooth functions of certain bootstrap
moments. For Ixol ¥= 1, the leading term of the variance of <P3n(XO; £) comes
from the variance of the dominant term n-l/2(x~ -1)K32(£), which is of the
order (n- 1/ 2)2.n -l£2. In contrast, for Ixol = 1, the term n-l/2(x~-1)K31(£)
is zero and in this case, the leading term in the variance of <P3n(XO; £) is
given by the variance of n- 1/ 2K 31 (£), which is of the order (n- 1/ 2)2 . n- 1£.
7.2 Theoretical Optimal Block Lengths 181
On the other hand, the contribution to the bias of tP3n (xo; £) comes from
both !C 31 (£) and !C 32 (£), each having a bias of the order £-1. This explains
the sources of the various terms in the expansions for MSE( tP3n (xo; £))
in (7.13) and (7.14). The exact forms of the population quantities V31,
V32, B 31 , and B32 are very complicated, and hence are not presented here.
Interested readers are referred to Lahiri (1996d) for explicit expressions for
these parameters. Interestingly, neither of the two empirical methods, that
we describe in Sections 7.3 and 7.4 below for data-based selection of the
optimal block sizes, requires explicit definitions of these parameters.
Theorem 7.2 readily yields the following asymptotic expressions for the
optimal block lengths for estimating 'P3n(XO)'
Corollary 7.2 Assume that the conditions of Theorem 7.2 hold. Then, for
Ixol =1= 1,
Thus, the optimal block length for estimating the distribution function
of the normalized version of en is of the order n 1/4 at any given point
Xo E lR, Ixol =1= 1. For Ixol = 1, the optimal order is n 1/3 , the same as that
for estimating the bias and variance parameters 'P1n and 'P2n (cf. (7.11)).
Relations (7.15) and (7.16) give optional block lengths for local estimation of
the distribution function of the pivotal quantity fo( en -0) / Tn. The optimal
block length for global estimation of the distribution function 'P3n(-) ==
P( fo( en - 0) / Tn S .) can be obtained by minimizing an expansion for the
(weighted) mean integrated squared error (MISE) of tP3n (. ). An integration
of the expansions (7.13) and (7.14) yields
(7.18)
182 7. Empirical Choice of the Block Size
o _ 1/4 [2 2] 1/4
1'3n ,global - n B33/V33 + o(n 1/4 ). (7.19)
CPn(£) denotes the MBB estimator of the level-2 parameter 'Pn, based on
blocks of length £, where n is the sample size. Furthermore, suppose that
the MSE of CPn (£) admits an expansion of the form
for some constants C 1,C2 E (0,00), r E N, and for some sequence {an}n~1
of positive real numbers, over a suitable set In C N of block sizes. We shall
1 +
assume that the set In contains the set [n r+2 -€, n r+2 €] for some small
1
Note that by (7.21) and (7.22), the optimal block length £~ is of the order
1
n r + 2 • To define the Hall, Horowitz and Jing (1995) estimator of the theo-
retical optimal block length £~, we proceed as follows. Let m == mn be a
sequence of real numbers satisfying
(7.23)
(7.25)
where we employ the set Jm (not In) to define i~. Then, i~ is an estimator
of the theoretical optimal block length when the sample size is m. We need
to rescale this initial estimator to get an estimator of £~ of (7.22). Since
the optimal block length £~ in (7.22) is of the order W.!-2, the right scaling
factor here is [n/m] r.!-2. The Hall, Horowitz and Jing (1995) estimator of
£~ is given by
i~ = (i~) . [n/m] r.!-2 . (7.26)
Note that the Hall, Horowitz and Jing (1995) estimation method is ap-
plicable quite generally, requiring only that the MSE of the bootstrap es-
timator has (an expansion of) the form (7.21) for some r ~ 1 and that the
184 7. Empirical Choice of the Block Size
True values of 'P2n and 'P3n were found by 20,000 simulation runs. These
are given by 'P2n = 3.984 and 'P3n = .5226.
To find the theoretical optimal block lengths for 'P2n and 'P3n , we applied
the MBB method to generate block bootstrap estimators of the level-2
parameters 'P2n and 'P3n with several values of the block length .e. Table 7.1
below gives the expected value (Mean), the bias, the standard deviation
(SD) and the MSE's of the MBB estimators based on 1000 simulation runs.
From the table, it is evident that the optimal block lengths for estimating
'P2n and 'P3n are respectively given by .e~n = 3 and .egn = 2. Next the
7.3 A Method Based on Subsampling 185
TABLE 7.1. Determination of the true optimal block sizes for MBB estimation
of the level-2 parameters 'P2n and 'P3n of (7.28) and (7.29) for model (7.27). The
results are based on 1000 simulation runs. An asterisk(*) denotes the minimun
MSE value for a functional.
Table 7.2 gives the frequency distribution of the optimal block size esti-
mator R~n for 'Pkn, computed by formula (7.26) using 500 simulation runs.
As in Hall, Horowitz and Jing (1995), in this simulation study also, the
optimal block size estimators converged after a couple of iterations in a
majority of the cases. However, in some instances, there was a circular
behavior of the estimated optimal block size in successive iterations (e.g.,
the initial value 5 led to 8 which led to 3 and then, 3 led back to 5). The
frequency of such cases is given under the value -1. This problem ap-
peared to be more prevalent for distribution function estimation (i.e., for
'P3n of (7.29)) than for variance estimation (i.e., for 'P2n of (7.28)). In such
a situation, one may pick a value of R~n (from the set of all optimal block
lengths in different iterations) that corresponds to the minimum estimated
MsE m (1!).
Parts (a) and (b) of Table 7.2 show that for both level-2 parameters
'P2n and 'P3n, the estimated optimal block sizes have a pronounced mode
at the true optional block sizes, i.e., at £gn = 3 for 'P2n and at £~n = 2
for 'P3n. Furthermore, the distribution of the estimated optimal block size
for variance estimation has a longer right tail compared to that for the
distribution function estimation. However, the performance of this method
improves as the sample size n increases. See Hall, Horowitz and Jing (1995)
for further numerical examples and discussions.
TABLE 7.2. Frequency distribution of the optimal block sizes selected by the
Hall, Horowitz and Jing (1995) method for model (7.27) with n = 125, m = 30,
and initial block size Cion = 5, k = 1,2. Results are based on 500 simulation runs.
The value -1 of i~n' k = 1,2, corresponds to the cases where the iterations of
the method failed to converge.
7.4.1 Motivation
Let 'Pn be a level-2 parameter of interest and let <Pn(e) be a block bootstrap
estimator of 'Pn based on blocks of length £. From the discussion of Section
7.2, it follows that under suitable regularity conditions, the variance of
<Pn (£) and the bias of <Pn (£) admit expansions of the form
(7.30)
and
(7.31)
for some population parameters B E JR, v E (0, (0) and for some known
constants a E (0,00), r E N. For example, for the bias and variance func-
tionals 'Pn = 'PIn, 'P2n, r = 1, and a = 1, while for the distribution function
(at a given point xo) 'Pn = 'P3n(XO) with Ixol =1= 1, r = 2 and a = 1/2. In
this case, the MSE-optimal block size £~ == £~('Pn) is given by
o (2B2 )
£n = rv
rt2 1
nr+ 2 1
(
+ 0(1) )
. (7.32)
Like any other plug-in method, the nonparametric plug-in method focuses
2 _1_ 1
on the leading term (2;; ) r+2 n r+2 but estimates the level-3 parameters B
and v nonparametrically, as follows. Note that from (7.30) and (7.31), we
have
lim (n- l £r)-ln 2a Var(<pn(£)) = V (7.33)
n--->oo
and
lim £. n a Bias(<pn(£))
n--->oo
=B . (7.34)
This suggests that consistent estimators of v and B may be derived if we can
estimate Var(<pn(£)) and Bias(<pn(£)) consistently. Let YARn and BiASn be
nonparametric estimators of Yare <Pn (£)) and Bias( <Pn (£)), respectively, that
are consistent in the following sense:
and
(7.36)
and
(7.38)
(7.39)
(7.40)
7.4 A Nonparametric Plug-in Method 189
(7.42)
1
Indeed, if the optimal order of the block length for estimating 'Pn is n r+2 (cf.
(7.32)), then by Cauchy-Schwarz inequality, it follows that for any sequence
{e1} = {{lln}n>l satisfying the requirement
1
1 «C 1 «n r + 2 as n --7 ()() , (7.43)
(7.44)
Here m denotes the number of observations (or the size of the block) to
be deleted for defining the MBJ point values. For i = 1, ... ,n - m + 1, let
Xn,i = Xn \ {Xi, ... ,Xi+m-d denote the set of observations after the block
{Xi, ... , Xi+m- d of size m has been deleted from X n . Then, the ith MEJ
point value i~il is defined as
1 n 2
AR J"in
V (' ) --n(n-1)L....
'" (-Ci
"in l -"in
') (7.4 7)
"=1
For properties of the jackknife method for independent data, see Miller
(1974), Efron (1982), Wu (1990), Liu and Singh (1992), Efron and Tib-
shirani (1993), Shao and Th (1995), Davison and Hinkley (1997), and the
references therein.
Next we describe the JAB method for dependent data. Let rpn == rpn(R.) be
the MBB estimator of a level-2 parameter i.pn based on (overlapping) blocks
of size R. from Xn = {Xl, ... , X n }. Let Hi = {Xi, ... , XiH-l}, i = 1, ... , N
(with N = n-R.+ 1) denote the collection of all overlapping blocks contained
in Xn that are used for defining the MBB estimator rpn. Also, let m be
an integer such that (7.44) holds. Note that the MBB estimator rpn(R.)
is defined in terms of the "basic building blocks" Hi'S. Hence, instead of
deleting blocks of original observations {Xi, ... , Xi+m-l}, as done in the
MBJ method described above, the JAB method of Lahiri (2002a) defines
7.4 A Nonparametric Plug-in Method 191
the jackknife point-values by deleting blocks of 13i 's. Later in this section,
we will discuss how this simple modification plays an important role in
ensuring computational efficacy of the JAB method.
Since there are N observed blocks of length g, we can define M == N -
m + 1 many JAB point-values corresponding to the bootstrap estimator
CPn, by deleting the overlapping "blocks of blocks" {13i , ... , 13i+m-l} of size
m for i = l, ... ,M. Let If = {l, ... ,N}\{i, ... ,i+m-1}, i = l, ... ,M.
To define the ith JAB point-value cp~) == cp~) (g), we need to resample b =
Ln/gJ blocks randomly and with replacement from the reduced collection
{13j : j E If} and construct the MBB estimator of V'n using these resampled
blocks. More precisely, suppose that Tn = tn{A.'n; B) be a random variable
with probability distribution G n and let V'n = V'(G n ) for some functional V'.
Let J i1 , ... , Jib be a collection of b random variables such that, conditional
on X n , these are iid with common distribution
(7.48)
Then, the resampled blocks to be used for defining the JAB point-value
cp~) are given by
{13j*(i) -= 13J ; j '.'J -- 1, ... , b} . (7.49)
(7.50)
(7.51)
where B~(i) = H(X~(i») and where we set On,i H({ln,i) with {In,i =
E* X n ,Z = 1, ... ,M.
-*(i) .
192 7. Empirical Choice of the Block Size
Next we return to the general case of Tn == tn(Xn; 8) and define the JAB
variance estimator of r{;n as (cf. (7.46))
(7.53)
P( J 1 = jl, J b = jb I Pi = 1)
00 • ,
To appreciate the relevance of this result, suppose that hE;', ... , kEb},
k = 1, ... , K denote the set of blocks drawn randomly, with replacement
from the collection {B 1, ... , B N} for the Monte-Carlo evaluation of the
given block bootstrap estimator r{;n. Let hJ1 , ... , kJb} denote the random
indices corresponding to hB;',oO',kBb}' i.e., kBj = kEkJJ' 1 S j S b,
k = 1, ... , K. Then for any k, if all b indices kJ1,.'" kJb lie in If, by
Proposition 7.2, we may consider (kJ1,"" kJb) as a random sample of size
b from the reduced index set If = {I, ... ,N} \ {i, ... ,i + m - I}. Let
denote the index set of all such random vectors (kJ1,"" kJb). Then,
{(kJ1, ... , kJb) : k E It} gives us an iid collection of random vectors (of
possibly different sizes for different i E {I, ... , M}), each having the same
7.4 A Nonparametric Plug-in Method 193
distribution as (Ji1 , ... , Jib) of the Proposition. Thus, the res am pIes for
computing the ith JAB point-value I{!~) may be obtained by extracting
the subcollection {(kBr, ... , kBi;) : k E I;} from the original resamples
{(kBr, ... 'kBi;) : 1 :::; k :::; K}, and no additional res amp ling is needed.
The Monte-Carlo approximations generated by this method are close to
the true values of I{!~) 's, provided K is large.
As an illustration, consider the random variable Tn of (7.51) and suppose
that the level-2 parameter of interest is rpn = rp( G n ) for some functional
rp where G n is the sampling distribution of Tn. Figures 7.1 and 7.2 give
a schematic description of the main steps involved in the computations of
the MBB estimator I{!n and its JAB point-values r{!~), i = 1, ... , M. For
computing I{!n, we generate K iid sets of b many blocks {kBr,···, kBi;}
for k = 1, ... , K, compute the bootstrap sample mean kX~ and the boot-
strap version kT~ = Vn(ke~ - en) for each set with ke~ = H(kX~) and
en = H(fln). Then, the Monte-Carlo approximation to I{!n is given by rp( G~)
where G~ denotes the empirical distribution of the bootstrap replicates
hT~ : k = 1, ... , K}. For computing I{!~), we scan the K sets of resam-
pled blocks hBr, ... ,kBb}, k = 1, ... ,K and extract the ke~-values corre-
sponding to the block-sets hBr, . .. 'kBb} that do not contain any of the
blocks B i , ... ,Bi+m-l. Next, the bootstrap version of T~(i) are computed
by employing these ke~'S in the formula kT~(i) = yfnl(ke~ - en,i) where
en,i == H(fln,i). Note that fln,i is given by the average of block-averages
in the reduced collection {B j : j E IP} and can be computed without any
resampling. The copies kT~(i) 's are now combined to generate the Monte-
Carlo approximation to I{!~), just in the same way the kT~ 's are used for
computing the original bootstrap estimate r{!n.
(7.54)
C~=
2B2) r~2 nr+2(1+o(1))
(-:;:;; 1
(7.55)
194 7. Empirical Choice of the Block Size
1
G~ = empirical distribution of 1 e~ ,2 e~, ... ,K e~
(7.56)
COn =
2B;'] r!2 n _1
[-_- r + 2 , (7.57)
'rVn
7.4 A Nonparametric Plug-in Method 195
yes no
k~K
I ke~ : kElt I
~
r-::(i)l
~
FIGURE 7.2. Computation of the ith JAB point value 1f~) starting with
the resampled blocks {IB~, ... , IB;;}, ... , hBi, ... , kB;;} generated for the
Monte-Carlo computation of the block bootstrap estimator 1fn of Figure 7.1.
'PIn (7.58)
196 7. Empirical Choice of the Block Size
(7.59)
For k = 1,2, let £~n denote the optional block length for estimating 'Pkn,
defined by (7.6). Then, we have the following result.
Theorem 7.3 Suppose that Condition (5.D r ) of Section 5.4 holds with
r = 4,
(7.60)
and
(7.61 )
Also, suppose that Condition (5.Mr ) of Section 5.4 holds with r = 2 + k(2 +
ao) where ao is as in the statement of Condition (5.D r ). Then
(7.62)
for k = 1,2.
Proof: See Lahiri, Furukawa and Lee (2003). D
for some constant C4 • Numerical results of Lahiri, Furukawa and Lee (2003)
show that the choice C 3 = 1 in (7.63) for the initial block size i\ yields
good results for both the variance and the distribution function estimation
problems, while the corresponding values for C4 in (7.64) are given by
C4 = 1.0 for the variance functional and C4 = 0.1 for the distribution
function. Below we report the results from a small simulation study with
the above choices of C3 and C4 . For more simulation results, see Lahiri,
Furukawa and Lee (2003).
We consider the moving average model of Section 7.3, given by (cf. (7.27))
Xi = (Ei + Ei-l) /../2, i E Il, where {EdiE:&: is a sequence of iid random
variables having the centered Chi-squared distribution with one degree of
freedom. As in Section 7.3, we also set the level-l parameter to be () = EX 1 ,
the estimator On to be the sample mean Xn , and the level-2 parameters as
'P2n = n.Var(Xn) and 'P3n = P(y'n(On - ())/Tn ::; 0). The true value of
() is zero. Also, we take the sample size n to be 125. As stated in Section
7.3, the true values of 'P2n and 'P3n are 'P2n = 3.984 and 'P3n = 0.5226.
Furthermore, the theoretical optimal block sizes for estimating 'P2n and
'P3n by the MBB are .e~n = 3 and .egn = 2, as shown in Table 7.1.
Next we applied the nonparametric plug-in method to estimate the tar-
get values .e~n and .egn . Table 7.3 gives the frequency distribution of the
estimated optimal block sizes based on 500 simulation runs. The block
boostrap estimators in each case were evaluated using 1000 Monte-Carlo
replicates. Table 7.3 shows that more than 80% of the mass of the esti-
mated block size i~n for variance estimation lies in the interval [2,5] (the
true value being .e~n = 3). The method also produces very good results for
distribution function estimation, with a pronounced mode at the true value
.egn = 2, and a small support set {I, 2, 3}.
TABLE 7.3. Frequency distribution of the optimal block sizes selected by the
nonparametric plug-in method for model (7.27) with n = 125.
8.1 Introduction
In this chapter, we consider bootstrap methods for some popular time series
models, such as the autoregressive processes, that are driven by iid random
variables through a structural equation. As indicated in Chapter 2, for such
models, it is often possible to adapt the basic ideas behind bootstrapping
a linear regression model with iid error variables (cf. Freedman (1981)).
In Section 8.2, we consider stationary autoregressive processes of a general
order and describe a version of the autoregressive bootstrap (ARB) method.
Like Efron's (1979) IID resampling scheme, the ARB also resamples a single
value at a time. We describe theoretical and empirical properties of the
ARB for the stationary case in Section 8.2. In Section 8.3, we consider the
explosive autoregressive processes. In the explosive case, the initial variables
defining the model have nontrivial effects on the limit distributions of the
least squares estimators ofthe autoregression (AR) parameters. As a result,
the validity of the ARB critically depends on the initial values. In Section
8.3, we describe the relevant issues and provide conditions for the validity
of the ARB method in the explosive case.
The unstable autoregressive processes are considered in Section 8.4. Here,
the ARB with the natural choice of the resample size fails. A remedy to
this problem is given by the "m out of n" ARB, where the resample size
m grows to infinity at a rate slower than the sample size n. In the unstable
case, we describe the theoretical and numerical aspects of the ARB for
the first-order AR-models only. In Section 8.5, we present some results on
200 8. Model-Based Bootstrap
(8.1)
where pEN, (31, ... ,(3p are the autoregression parameters and {EdiEZ is
a sequence of zero mean iid random variables with a common distribution
F. In the sequel, we shall often assume that the autoregression parameters
(31, ... ,(3p are such that
p
It is well known (cf. Brockwell and Davis (1991), Chapter 3) that under
(8.2), the AR(p) process {Xi}iEZ of (8.1) admits an infinite-order moving-
average representation
00
Xi = L bjEi-j , (8.3)
j=O
Although the random variables Xi's under the AR(p) model (8.1) are de-
pendent, here we can use the model structure to generate valid bootstrap
approximations without any block resampling. As described in Chapter 2,
the basic idea is to consider the "residuals" from the fitted model, which
turn out to be "approximately independent," and then res ample the resid-
uals (with a suitable centering adjustment) to define the bootstrap ob-
servations through an estimated version of the structural equation (8.1).
Suppose that a finite segment Xl, ... ,Xn of the process {XdiEZ is ob-
served. Let fj1n, ... , fjpn denote the least squares estimators of (31, ... , (3p
based on Xl' ... ' X n . Thus, fj1n, ... , fjpn are given by the relation
8.2 Bootstrapping Stationary Autoregressive Processes 201
where Vn is a (n-p) xp matrix with ith row (Xi+P-1, ... , Xi), i = 1, ... , n-
p. Let, Ei = Xi - fhnXi-1 - ... - /JpnXi-p, i = P + 1, ... ,n denote the
residuals. Note that by using (8.1), we may express the residuals as
P
Ei = Ei - I::(/JjP - {lj)Xi- j , P + 1 ~ i ~ n .
j=l
As a consequence, when {ljn ----+p {lj for j = 1, ... ,p, the second term
is small for large values of n and thus, the residuals are approximately
independent. This suggests that we may resample the residuals, a single
value at a time as in Efron's (1979) lID bootstrap, to define the bootstrap
version of a random variable Tn == t n (X 1, ... , Xn; {l1, ... , (lp, F). However,
to generate a valid approximation, we need to center the residuals Ei'S first
and resample from the collection of the centered residuals, defined by
where En = (n - p)-l L~=P+1 Ei. Next, generate the bootstrap error vari-
ables <, i E /£ by sampling randomly with replacement from {Ep+1, ... , En}.
Thus, the random variables Ei, i E /£ are conditionally iid (given
Xl, ... ,Xn ) with common distribution
and let {Xi hEZ be a stationary solution of (8.7). If /Jjn ----+p (lj as n --+ 00
for j = 1, ... ,p, then such a solution exists on a set of Xi's that has prob-
ability close to one for n large. In practice, one makes use of the recursion
relation (8.7) for i ~ p + 1 to generate the bootstrap "observations" by
setting the initial p variables (arbitrarily) equal to Xl, ... ,Xp or equal to
zeros. When the polynomial /J(z) == 1- L;=l /Jjnzj does not vanish in the
region {izi ~ 1} (d. (8.2)), the coefficients of the p initial values die out
geometrically fast and, therefore, have a negligible effect in the long run. As
a consequence, one may generate a long chain using the recursion relation
(8.7) until stationarity is reached (Le., the effect ofthe initial values become
inappreciable) and may take the next m-values as the desired "resample" of
size m. The autoregressive bootstrap (ARB) version of a random variable
Tn = tn(Xl, ... , Xn; {l1, ... , (lp, F) based on a resample of size m > p is
given by
(8.8)
202 8. Model-Based Bootstrap
/31, ... ,/3p, given by (8.4). Let /3~ = (/3;n, ... ,/3;n)' denote the bootstrap
version of (In, obtained by replacing {Xl, . .. , Xn} in the definition of (In
by {X{, ... ,X~}. For a nonnegative definite matrix A of order p, let A 1/2
denote a p x p symmetric matrix satisfying A = A 1/2 . A 1/2. Also, recall
that t = A. Then, we have the following theorem.
(8.9)
o 20 40 60 80 100
FIGURE 8.1. A data set of size n = 100 simulated from the AR(l) model
Xi = 0.5Xi - 1 + Ei, i E Z where Ei'S are iid N(O, 1) variables.
True
AABOOT
I
'"
.,;
'"
.,;
..
.,;
"l
o
o
° i
·3 ·2 ·1 o 2 ·2 ·1 o
FIGURE 8.2. The ARB estimate of the sampling distribution of Tn for the data
set of Figure 8.1 is given as a histogram on the left. The corresponding cdf
(denoted by the dotted curve) and the true cdf of Tn (denoted by the solid
curve) are given on the right .
MBB method with block sizes C = 5, 10. Here, in each case, B = 500 boot-
strap replicates were used. It appears that the MBB distribution function
estimates are both skewed, with C = 5 leading to a higher level of skewness
compared to C = 10. The overall errors of the resulting bootstrap approx-
imations are effectively captured by the plots of the MBB cdf estimates
against the true sampling distribution of Tn, as given by the right panels
of Figure 8.3.
Next consider bootstrap CIs for (31 at nominal coverage levels 80% and
90%. Using the bootstrap quantiles from the above computations, we de-
rived two-sided equal tailed CIs for (31 based on the ARB method and
based on the MBB method with block lengths C = 2,5,10,20. The upper
and the lower end points of the resulting CIs are given in Table 8.1. CIs for
8.3 Bootstrapping Explosive Autoregressive Processes 205
~
8
'" <Xl
0
True
MBB (10.5)
~
'"0
~ ~ 0'"
§ '"0 j
0
0
0 ... -. ---.----
-4 ·3 ·2 ·1 0 -4 ·3 ·2 ·1 0 2
8 ~
'" <Xl True
0 MBB (1 ~ 1O)
~ <D
0
~ ~ 0" ,
~ '"0
0
0
0 ----._-
·4 ·3 ·2 ·1 0 2 -4 ·3 ·2 ·1 0
FIGURE 8.3. MBB estimates of the sampling distribution of the normalized least
square estimator Tn = (L~:ll X;)1/2(t31n - fh) with block lengths £ = 5 (top
row) and £ = 10 (bottom row) for the data set of Figure 8.1. The left panels give
the MBB estimates as histograms, while the right panels give the corresponding
MBB cdfs (denoted by dotted curves) against the true cdf of Tn (denoted by solid
curves).
/31 based on the true distribution of Tn are also included for comparison.
Except for the 80% MBB CI with £ = 20, all CIs contain the target value
/31 = 0.5. Note that the CIs given by the ARB have end points that are
closer to those of the exact CIs, than the end points of the MBB CIs.
The ARB approximation tends to be more accurate than the MBB ap-
proximation because it explicitly makes use of the structure of the model
(8.1). The quality of ARB approximation becomes poor when the model
assumptions are violated. In particular, if the order of the autoregressive
process is misspecified, the ARB method may be invalid, but the MBB
would still give a valid approximation. Also, the standard ARB method is
not robust against the values of the autoregressive parameters, particularly
when some of these parameters lie on the boundary, e.g., when /31 E {-1, 1}
for the AR(l) case (cf. Section 8.4).
TABLE 8.1. Two-sided equal tailed CIs for f31 for the data set of Figure 8.1 at
80% and 90% nominal levels, obtained using the true distribution of Tn, the ARB
method, and the MBB method with block sizes £. = 2,5,10,20.
90% CI 80% CI
Lower Upper Lower Upper
TRUE 0.253 0.559 0.287 0.525
ARB 0.255 0.551 0.288 0.520
MBB (£=2) 0.464 0.752 0.494 0.715
MBB (£ = 5) 0.340 0.629 0.368 0.594
MBB (£ = 10) 0.300 0.576 0.325 0.546
MBB (£ = 20) 0.262 0.524 0.292 0.495
where {Xl, ... , Xp} is a given initial set of random variables, {Eih?:p+l is
a sequence of iid random variables that are independent of {Xl"'" Xp},
and the autoregression parameters (31, ... ,(3p are such that the roots of
the characteristic polynomial \[I p (z) = zP - (31 zp-l - ... - (3p all lie in the
region {z E CC : Izl > I} ofthe complex plane. The model given by (8.10) is
known as an explosive autoregressive model of order (p). Note that unlike
the stationary case, the error variables E/S in (8.10) are not required to
have zero mean. This is because, in the following, we allow the Ei'S to be
heavy-tailed so that the expectation of El may not even exist. Another
notable difference of the explosive AR(p) model with the stationary case
is that the initial random variables {Xl, ... , Xp} have nontrivial effects
on the subsequent Xi'S. To gain some insight, consider the case p = l.
Then, the ith random variable Xi generated by the recursion relation (8.10)
involves the term (3i- l Xl. For the stationary case, 1(311 < 1 and the effect
of the initial random variable Xl on Xi becomes small at a geometric rate
as i ---; 00. In contrast, for the explosive case, 1(311 > 1 and hence, the
contribution from this term "explodes" in the long run, unless Xl = 0 a.s.
Furthermore, as we shall shortly see, the initial set of random variables have
a nontrivial effect on the limit distribution of the least squares estimators.
Suppose that {Xl, ... ,Xn } denote the observations from model (8.10).
Define the least square estimator /3n = (/3ln, ... ,/3pn)' of the autoregression
parameters (3 = ((31, ... ,(3p)' by relation (8.4) and let
(3'
A-
-
[
J[p-l
(8.11)
be a p x p matrix with its first row equal to (3' ((31, ... , (3p), where 0
denotes the (p -1) x 1 vector of zeros and where, recall that, for kEN, J[k
denotes the identity matrix of order k. Also, let
ex:>
U = LA-jEJ+p (8.12)
j=l
8.3 Bootstrapping Explosive Autoregressive Processes 207
and write U(1) for the first column of U. The following result, due to Datta
(1995), gives the limit distribution of the least square estimator vector i3n
in the explosive case.
Theorem 8.2 Suppose that the error variables {Eih2P+l are iid nonde-
generate random variables with Elog (1 + IEp +1l) < 00. Then,
(8.13)
Note that the limit distribution of the normalized least square estimator
Tn is nonnormal and depends on the initial variables (Xl, ... , Xp)' through
W. As a result, in the explosive case, any bootstrap method must use a
consistent estimator of the joint distribution of (Xl, ... ,Xp)' to produce
a valid approximation to the sampling distribution of Tn. This requires
one to impose further restrictions on the joint distribution of Xl, ... ,Xp ,
e.g., Xl' ... ' Xp are degenerate or (Xl' ... ' Xp)' follows a "known" ~
dimensional distribution. Alternatively, one may consider the conditional
distribution of Tn given (X!, ... , Xp)'. In view of the independence as-
sumption on (Xl' ... ' Xp)' and the sequence {Eih2P+l of error variables,
the conditional (limit) distribution of Tn is determined by the joint distri-
bution of {Eih2P+l, with (Xl, ... ,Xp)' held fixed at its observed value. The
bootstrap method described here follows this latter approach and generates
the bootstrap observation using the "bootstrap" recursion relation
(8.14)
by setting (Xi, ... , X;)' =: (Xl' ... ' Xp)'. Here, the bootstrap error vari-
ables Ei's are generated by random, with replacement sampling of the resid-
uals {Ei =: Xi - 2::~=1 i3jnXi-j : p+ 1::::: i ::::: n}. Unlike the stationary case,
because the expectation of the Ei'S may not be finite, centering of the resid-
uals is not carried out in the explosive case. However, in case EIEp+ll < 00
and E€p+l = 0 in (8.10), one may center the residuals Ei and resample
from {Ei =: Ei - En : i = p + 1, ... , n}, where En = (n - p) -1 2::~=P+1 Ei. The
resulting bootstrap approximation also yields consistent estimators of the
sampling distribution of Tn (conditional on X!, ... , Xp) (cf. Theorem 3.1,
Datta (1995)).
Let f3~ denote the bootstrap version of i3n, obtained by replacing the
Xi'S in (8.4) with X;*'s of (8.14). Also, define An and A~ by replacing {3
in (8.11) by i3n and {3~, respectively. Then, a studentized version of i3n is
given by
208 8. Model-Based Bootstrap
The ARB versions of Tn and TIn are respectively given by T;; = (A~)n(,6~
iJn) and T!n = ([A~]')n(,6~ - iJn).
To state the main results on the ARB method in the explosive case,
write Y = (X I, ... , Xp)'. Also, for any y E ]R.P and for a random vector
R = r(Ep +I,E p +2, ... ;Y), depending on {Eih?:p+1 and Y, let Py(R E')
denote the conditional distribution of R given Y at Y = y. Then, we have
the following result.
Theorem 8.3 Suppose that the conditions of Theorem 8.2 hold and that
the ARB samples Xi, ... ,X~, n 2: p + 1 are generated by (8.14) with the
initial values xt = Xi for i = 1, ... ,po Then, for any y E ]R.P,
(8.16)
under f3 = ±1, where W(·) denotes the standard Brownian motion on [0,1]
(see, for example, Fuller (1996)). In particular, the limit distribution of
Tn is nonnormal. In comparison, for the stationary case (viz., 1f311 < 1),
/31n - f31 = Op(n- 1/ 2 ) and Tn ---;d N(O, ( 2 ) as n ----> 00, while for the
explosive case (viz., 1f311 > 1), /31n - f31 = Op(f31n ) and the limit distribution
of Tn is also nonnormal and is given by (8.13) with p = 1.
For bootstrapping the unstable AR(l) process, we combine the recipes
for the stationary and the explosive cases as follows. Define the centered
residuals Ei = Ei - n- 1 2::7=1 Ei, 1 S; i S; n, where Ei = Xi - /31nXi-1,
1 S; i S; n. Starting with Xo = 0, generate the ARB sample Xi, ... , X,';-"
m 2: 1, using the bootstrap version of the relation (8.15)
(8.17)
where <'s are obtained by simple random sampling from the collection of
centered residuals {Ei : 1 S; i S; n}, with replacement. Unlike the stationary
case treated in Section 8.2, here the stationarity of {Xi, ... , X,';-,} is not of
paramount interest, as the AR(l) process (8.15) is itself nonstationary in
the unstable case.
210 8. Model-Based Bootstrap
L
(JR,B(JR)) by
Goo(A) = G(dx; -r), A E B(JR) , (8.19)
----+ Goo,
8.4 Bootstrapping Unstable Autoregressive Processes 211
Thus, Theorem 8.4 shows that for any x E JR, if n is large, the ARB
estimator Gn (( -00, xl) == P* (T;:,n ::; x) of the target probability P(Tn ::; x)
behaves like the random variable Goo (( -00, xl), which has a nondegenerate
distribution on JR. As a result, there exists an "lo, 0 < "lo < 1, such that
as n ----+ 00, where Too denotes the random variable appearing on the right
side of ----+d in (8.16). Thus, (8.20) shows that with a positive probability,
the ARB estimator P* (T;: n ::::: x) takes values that are at least "lo-distance
away from the target P(Tn ::::: x) for large n. In practical applications,
this means that for a nontrivial part of the sample space, the bootstrap
estimator P* (T;:,n ::; x) will fail to come to within "lo-distance of the true
value even for an arbitrarily large sample size.
In the literature, similar inconsistency of bootstrap estimators have been
noted in other problems. For sums of heavy-tailed random variables, in-
consistency of the IID bootstrap of Efron (1979) has been established by
Athreya (1987) under independence. A similar result for the MBB has been
proved by Lahiri (1995) in the weakly dependent case (cf. Chapter 11). See
also Fukuchi (1994) and Bretagnolle (1983) for other examples. The main
reason for the failure of the ARB method in the unstable case seems to be
different from the failure of the bootstrap methods in the other situations
mentioned above. The ARB method fails here apparently because of the
fact that the least square estimator fhn of (31, which we have used here
to define the residuals for ARB resampling, does not converge at a "fast
enough" rate when 1(311 = 1. Datta and Sriram (1997) propose a modified
ARB where they replace the least square estimator fhn in the resampling
stage by a shrinkage estimator of (31 that converges at a faster rate for
1,611 = 1. With this, they show that the modified ARB method produces a
valid approximation to the normalized statistics Tn for all possible values
Of,61 E R
A second modification that is known to have worked in the other ex-
amples mentioned earlier, including the heavy-tail case and the sample
extremes, is to use a resample size m that grows to infinity at a rate slower
than the sample size n. On some occasions, this has been called the "m out
of n" bootstrap (cf. Bickel et al. (1997)) in the literature. We shall refer to
the ARB method based on a smaller resample size m as the "m out of n"
ARB method. Validity of the "m out of n" ARB method for the unstable
case (as well as for the other two cases) has been independently established
by Datta (1996) and Heimann and Kreiss (1996).
212 8. Model-Based Bootstrap
Theorem 8.5 Suppose that Elfll2+8 < 00 for some 8 > 0, that the AR
parameter (31 E JR, and that m i 00 as n ---+ 00. Also, suppose that T;" n is
as defined in (8. 18}. '
(a) If min --+ 0 as n --+ 00, then
Don == sup Ip(Tn :::; x) - P*(T;",n :::; x)l---t p 0 as n --+ 00 .
xEIR
(b) Ifm(loglogn)2/n --+ 0 as n ---+ 00, then Don = 0(1) an n ---+ 00, a.s.
Proof: See Theorem 2.1, Datta (1996). o
Theorem 8.5 shows that, for a wide range of choices of the resample
size m, the "m out of n" ARB approximation adapts itself to the different
shapes of the sampling distribution .c(Tn) of Tn in all three cases, viz., in
the stationary case (1(311 < 1), to .c(Tn) that has a normal limit, and in
the explosive (1(311 > 1) and the unstable (1(311 = 1) cases, where .c(Tn)
has distinct nonnormal limits. An optimal choice of m seems to be un-
known at this stage and it is expected to depend on the value of (31. In a
related problem Datta and McCormick (1995) have used a version of the
Jackknife-After-Bootstrap method of Efron (1992) to choose m empirically.
The Jackknife-After-Bootstrap method seems to be a reasonable approach
for data-based choice of m in the present set up as well. Also, see Sakov
and Bickel (1999) for a related work on the choice of m.
An important implication of Theorem 8.5 is that the "m out of n" ARB
can be effectively used to construct valid CIs for the AR parameter (31
under all three cases. Indeed, as the scaling factor o=~:; Xf)1/2 in the
definition of Tn is the same in all three cases, this provides a unified way
of constructing CIs for (31 that attain the nominal coverage probability
asymptotically for all (31 E JR. For a E (0,1), let im n(a) denote the ath
quantile of T;",n, defined by im,n(a) = inf{t E JR : 'P*(T;",n :::; t) ~ a}.
Then, for 0 < a < 1/2, a 100(1- 2a)% equal tailed "m out of n" bootstrap
CI for (31 is given by
(8.22)
8.4 Bootstrapping Unstable Autoregressive Processes 213
for all (31 E JR, where P~l denotes the joint distribution of {Xih2:1 under a
given value (31. Thus, the Cl Im,n(a) enjoys a "robustness" property over
the values of the parameter (31 in the sense that it gives an asymptotically
valid Cl for all (31 E JR. However, the price paid for this remarkable property
is that in the stationary case, the "m out of n" Cl Im,n(a) has a larger
coverage error than the usual Cl In,n(a) where the resample size m equals
n. Thus, if there is enough evidence in the data to suggest that (31 E (-1, 1),
then m = n is a better choice.
We now describe a numerical example to illustrate finite sample prop-
erties of the ARB in the unstable case. We considered model (8.15) with
Ei ,....., N(O, 1) and (31 = 1, and compared the accuracy of the usual "m = n"
and the "m out of n" ARB approximations to the distribution function of
the normalized statistic Tn when the sample of size n = 100. The choice
of m in the "m out of n" bootstrap was taken as m = 30, which was close
to the choice m = n 3 / 4 , considered in Datta (1996). Figure 8.4 shows the
usual ARB distribution function estimators with m = n = 100 and the "m
out of n" ARB distribution function estimators with m = 30 for four data
sets of size n = 100, generated from the AR(l) model (8.15) with the above
specifications. In each case, B = 500 bootstrap replicates have been used
to compute the bootstrap estimator P* (T~,n ::; .). The true distribution of
Tn, found by 10,000 simulation runs is shown by a solid curve, while the
"m = n" and the "m = o(n)" ARB distribution function estimators are
denoted by dotted and dashed curves, respectively. Notice that for all four
data sets, the modified ARB produced a better fit to the true distribution
function of Tn. A more quantitative comparison is carried out in Table 8.2,
which gives the values of the Kolmogorov-Smirnov goodness-of-fit statistic
for the four data sets. For all four data sets, the distance of the "m = n"
ARB from the true distribution function of Tn is at least 34% larger than
that of the "m out of n" ARB, as measured by the Kolmogorov-Smirnov
statistic.
C! 0
~,~
'"ci '"ci
/"/.-.
/~ ..
'"ci '"ci
..,.
ci ci
,/
'"ci '"ci
0 0
ci ci
·3 ·2 ·1 0 2 3 -3 -2 -1 2 3
C! C!
'"ci '"ci
'"ci
..
"l
0
..,.
ci ci
'"ci '"ci
0 0
ci ci
-3 -2 -1 0 3 -3 -2 -1 0 2 3
FIGURE 8.4. Bootstrap distribution function estimates and the sampling distri-
bution of the normalized least square estimator Tn = [I:~:11 Xl]l /2 (~1 n - fh) for
four data sets of size n = 100 from model (8.15) with fh = 1, Ei ~ N(O, 1). The
solid line is for the true distribution function, while the dashed and the dotted
lines respectively denote the usual "m = n" approximation and the "m out of n"
ARB approximation with m = 30.
{
(3(z) -=f. 0 for all Izl::; 1 (8.24)
a(z) -=f. 0 for all Izl::; 1
and a(z) and (3(z) have no common zero. Furthermore, we suppose that
a q -=f. 0, and (3p -=f. O. Then there exists a TJo > 1 (depending on the values
of (31, ... ,(3p and aI, ... , a q ) such that in the disc Izl ::; TJo, we have the
power series expansions
00
J j ,
"L.... b·z
j=O
"a
00
L.... ·zj
J , and
j=O
Xi l:PjEi-j, iEZ
j=O
l: 1 X
00
Ei j i- j , i E Z, and
j=O
l: aj(Xi - j -
00
and, for all k 2': 1, equating the coefficients of zk in the product on the left
side to zero, we have
(8.27)
216 8. Model-Based Bootstrap
L aj (Xi- j -
00
i
~ aj-1
(P ) q-1
- t; f3k X i+1-j-k + ~ €-s
(S~ ai+1+s-k a k
)
(8.28)
Note that by (8.25), ai = O('T/o i ) as i --+ 00. Hence, for large i's, the
contribution ofthe second term in (8.28) is small. Thus, we may concentrate
on the first term only and define an "approximation" to €i by estimating
the coefficients aj-1 's and 13k's above. This observation forms the basis for
defining a residual-based resampling method for a stationary ARMA (p, q)
process, which we describe next.
Suppose that a finite segment X n +p = {X 1- p, ... , Xn} of the ARMA
(p,q) process {XihEZ of (8.23) is observed. Let (ihn, ... ,/3pn)' and
(a1n, ... , a qn )' respectively denote some estimators of the parameter vec-
tors (131, ... , f3p)' and (a1, ... , a q)' based on X n +p such that
p q
L ajn zj ,
00
fin = tj=l
aj-1,n ( - t
k=O
/3knXi+1-j-k) , i = 1, ... , n , (8.31 )
where /3on = -1. Note that for a purely AR(p) process, if we set the moving
average parameters a1, ... ,aq equal to zero and also take ajn = 0, 1 :<:::; j :<:::;
8.5 Bootstrapping a Stationary ARMA Process 217
q, then it follows from (8.30) that aOn = 1 and ajn = 0 for all j 2': 1. In
this case, the "residual" fin of (8.31) reduces to fin = Xi - L:~=1 ~knXi-k'
which corresponds to the residuals defined in Section 8.2 for the ARB
method with ~kn = ~kn, the least square estimator of (3k, 1 ::; k ::; p.
The remaining steps of the bootstrap procedure for the stationary ARMA
process (we will call it the ARMA bootstrap or the ARMAB, in short)
parallel the steps in the ARB. Starting with fin, 1 ::; i ::; n, we form the
centered residuals Ein = fin - En, 1 ::; i ::; n, where En = n- 1L:~=1 fin.
Next, we generate iid bootstrap error variables C;:, i ;::: 1 - max{p, q} by
sampling at random, with replacement from {fin: 1 ::; i ::; n}. Then, we
define the bootstrap observations by using the recursion relation
p q
for i 2': 1 - max{p, q}, where, for i ::; - max{p, q}, we set Xt = 0
and Ei = 0 . The ARMA-bootstrap version of a random variable Tn
tn(Xn+p ; (31, ... , (3p, a1,···, a q ; F) is now defined by
(8.33)
where, with 0 = (0 1, ... , Op+q)', the variables Ej(O) and Z(j - 1; 0) are de-
fined as Ej(O) == 2:{:~[ak(O)l (Xj- k - 2:f=l OiXj-k-i) and Z(j - 1; 0) =
2:t:~[ak(O)l (Xj- k- 1, ... ,Xj-k-p;Ej-k-1(0), ... ,Ej-k-q(0))', 1 ::; j ::;
n. Here, the factors ak (O)'s are formally defined by the relation (cf.
(8.25),(8.30))
(8.36)
(8.37)
(8.39)
8.5 Bootstrapping a Stationary ARMA Process 219
(8.41)
(a) P* (there exists a solution B~ of (8.40) such that IB~ - enl <
Cn- 1 / 2 logn) 2: 1 - C(10gn)-2 a.s. ;
Proof: Part (a) follows by using arguments similar to the proof of Theorem
4.2, Chapter 4. Allen and Datta (1999) gives a proof of part (b) assuming
vin((}~ - On) = OP. (1), a.s. Essentially, the same proof works in this
case. We leave the details of the modification of their proof to the reader. 0
9.1 Introduction
In this chapter, we describe a special type of transformation based-
bootstrap, known as the frequency domain bootstrap (FDB). Given a finite
stretch of observations from a stationary time series, here we consider the
discrete Fourier transforms (DFTs) of the data and use the transformed
values in the frequency domain to derive bootstrap approximations (hence,
the name FDB). In Section 9.2, we describe the FDB for a class of estima-
tors, called the ratio statistics. Dahlhaus and Janas's (1996) results show
that under suitable regularity conditions, the FDB is second-order accurate
for approximating the sampling distributions of ratio statistics. In Section
9.3, we describe the FDB method and its properties in the context of spec-
tral density estimation. Material covered in Section 9.3 is based on the
work of Franke and HardIe (1992). In Section 9.4, we describe a modified
version of the FDB due to Kreiss and Paparoditis (2003) that, under suit-
able regularity conditions, removes some of the limitations of the standard
FDB and yields valid approximations to the distributions of a larger class
of statistics than the class of ratio statistics. It is worth pointing out that
the results presented in this chapter on the FDB are valid only for linear
processes.
222 9. Frequency Domain Bootstrap
where'; = (6, ... ,';p)' and where, for i = 1, ... ,p'';i : [0,71"] ----> lR. is a
function of bounded variation. The parameter A(';, f) in (9.2) is called a
spectral mean. A canonical estimator of A(';, f) is given by
The following are examples of some common spectral means and their
canonical estimators.
£:
211r(cOSkW)In(W)dW
In(w)exp(-~kw)dw
(271"n)-1 LL
n n j1r
XtXj exp( -~wt) exp(~wj) exp( -~kw)dw
t=l j=l -1r
n-k
n- l L XtXHk ,
t=l
as I::1r exp(~mw)dw = 0 for any nonzero integer m. By similar arguments,
(9.4)
and
(9.5)
where ~4 is the fourth cumulant of (1. Dahlhaus and Janas (1996) showed
that under some regularity conditions, the FDB version of Rn (to be de-
e
scribed below) converges in distribution to N(O, 2n f P), with probabil-
ity 1. As a consequence, the FDB yields a valid approximation if either
~4 = 0 or f (I = O. The first condition is restrictive, as it specifies the
fourth cumulant of the innovations exactly and it holds, for example, if
(1 is Gaussian. In comparison, the second condition is less restrictive on
the distribution of the innovations but it limits the collection of ((.) func-
tions. Dahlhaus and Janas (1996) identified a large class of spectral mean
estimators, called the ratio statistics, for which the ((.) functions satisfy
the second condition f (I = 0 for any given spectral density f. We now
describe the ratio statistics.
Let
g(w) = I(w)/F(n), wE [O,n] (9.7)
denote the normalized spectral density of the process {XihEZ, where F
is the spectral distribution function, given by (9.4). Then, A((; g) == f (g
is a normalized spectral mean parameter with kernel ( : [0, n] ____ lRP • The
224 9. Frequency Domain Bootstrap
(9.8)
where In(w) = In (w)/ Fn (1f) is the normalized periodogram and Fn (·) is the
spectral distribution estimator of (9.5). Note that the normalized spectral
mean estimator A(~; I n ) can be written as the ratio of two spectral mean
estimators as
(9.9)
(9.10)
(9.11)
is given by
T~ = vn(B(~; J~) - B(~; gn)) , (9.12)
where
and
226 9. Frequency Domain Bootstrap
The summations in B(~; J~) and B(~; fin) above are approximations to the
corresponding integrals over the interval [0,7f], where the approximating
step functions are constant over subintervals of length 27f I n. This results
in the factor 27fln, which is comparable to the factor (7flno) appearing in
Dahlhaus and Janas (1996). However, the effect of this scalar multiplier
vanishes for ratio statistics, as the constants from the numerator and the
denominator cancel out each other.
The rescaling in Step 2 plays a role similar to the centering of the es-
timating equations in the context of bootstrapping the M-estimators (cf.
Section 4.3). Without the rescaling, the FDB approximation may fail to
be consistent. Although the given variables Xi's are dependent, the stu-
dentized periodogram variables tjn'S are approximately iid and, hence, the
resampling scheme in Step 3 above resamples a single value at a time as
in Efron's (1979) iid resampling scheme (cf. Section 2.2). An alternative
version of the FDB can be defined by replacing the bootstrap variables Ej'S
by iid standard exponentially distributed variables EJ's (say) in Step 3
and using Ij~ == fjn . Ej, 1 :::; j :::; no as the bootstrap periodogram values,
instead of Ijn's of Step 4. Both versions of the FDB are known to have a
similar accuracy up to the second-order (cf. Remark 6, Dahlhaus and Janas
(1996)).
Conditions:
(C.1) {XdiEZ is a linear process of the form
Xi = Laj(i-j, i E Z, (9.13)
jEZ
where ai's are real numbers satisfying ai = O(exp( -Clil)) as Iii ----> 00
for some C E (0,00) and {(diEZ is a collection of iid random variables
with E(l = 0, E(f. = 1, Ea = 0 and Ea < 00.
(C.2) (i) The spectral density f( w) == (27f)-1[ I:jEz aj exp(~wj) [2, Iwl :::;
7f satisfies
inf f(w) > 0 . (9.14)
wE[O,n]
(C.3) The function ~ = (6, ... , ~p) : [0, 'if] ----+ ]RP is of bounded variation
(component wise) and
for some C E (0,00) where ~t (k) = 2 foK ~(w) cos kwdw is the Fourier
coefficient of ~ (extended as a symmetric function over [-'if,'if]).
(C.4) (i) ((1, (r)' satisfies Cramer's condition, i.e.,
for {j1, ... ,j8} C {I, ... ,no -I} or the (d+ 1) dimensional spec-
tral mean estimator f(e,l)'In. Then, I; = limn-tCXJ Cov(Wn )
exists and is nonsingular in each case. Further, L;l
f((, 1)'((, 1)P is nonsingular.
Next, we briefly comment on the conditions. The exponential decay of the
coefficients {aj}jEZ in (9.13) is required for establishing valid Edgeworth
expansions for the normalized ratio estimator Tn, based on the work of
GCitze and Hipp (1983) and Janas (1994). It can be replaced by a suitable
polynomial decay condition if one is to have only consistency of the FDB for
Tn. The condition on the first two moments the innovations (i 's is standard.
°
However, the requirement that E(r = is very stringent. Dahlhaus and
Janas (1996) point out (in their Remark 5, page 1942) that the rate ofFDB
approximation is only O(n- 1 / 2 ), i.e., the FDB is only first-order accurate
when this condition fails. Condition (C.2)(i) ensures a nondegenerate limit
distribution of the periodogram In(w) at each w E [0, 'if]. The uniform
strong consistency of in
is known when in
is a kernel spectral density
estimator of f. See, for example, Theorem AI, Franke and HardIe (1992).
Exponential decay of the Fourier coefficients e(k) in (9.16) of Condition
(C.3) is again required for establishing a valid Edgeworth expansion for
the ratio statistic Tn (cf. Janas (1994)). This condition does not hold for
the ~-function of Example 9.2, corresponding to the spectral distribution
function estimator. The technical conditions of Condition (C.4) are needed
to establish valid Edgeworth expansions for Tn. As mentioned in Chapter
6, (9.17) holds if (1 has an absolutely continuous component with respect
to the Lebesgue measure on R
Theorem 9.1 Suppose that Conditions (C.1)-(o.4) hold. Then,
where C is the collection of all convex measurable sets in JRP, and An and
An are symmetric p x p matrices satisfying
respectively.
Thus, under the conditions of Theorem 9.1, the FDB provides a better
approximation to the distribution of the normalized ratio statistic than the
normal approximation, which has an error of the order O(n- l / 2 ). Dahlhaus
and Janas (1996) prove the second-order correctness of the FDB for nor-
malized ratio statistics in the more general case where the periodogram is
defined using a data-taper. Furthermore, they also establish the superiority
of the FDB for the normalized Whittle estimator over normal approxima-
tion.
The results on the FDB for ratio statistics are valid under the assumption
that EX l = 0. If the mean of the stationary process {XdiEZ is indeed
unknown and estimated explicitly, say, by using Xi - Xn in place of Xi
for the calculation of the periodogram In (cf. (9.1)), the FD B has an error
of the order O(n-l/2) for ratio statistics. As a result, in this case, the
FDB no longer possesses the superiority over the normal approximation
(cf. Remark 4, Dahlhaus and Janas (1996)). Furthermore, as pointed out
earlier, the superiority of the FDB is also lost if the third moment of the
innovation variables does not vanish, i.e., if E(? -=I- O. Hence, it appears
that the superiority of the FDB approximation for ratio statistics is rather
sensitive to violations of the model assumptions.
I:
RMSE(w; h) E[in(w; h) - f(w)f / f(W)2
+ o([nh]-l + h- 4 ) (9.22)
as n ----+ 00, h ----+ 0 such that nh ----+ 00. Thus, the optimal h that minimizes
the RMSE in (9.22) is asymptotically equivalent to Co . n- 1 / 5 for some
suitable constant Co E (0,00). In the sequel, we suppose that bandwidth
h for the spectral density estimator inC h) lies in an interval of the form
[c5n- 1 / 5 , c5- 1 n- 1 / 5 ] for some arbitrarily small c5 > O. In the next section, we
describe the FDB for inC h) under this restriction, although the bootstrap
algorithm itself may be stated almost without any changes for other values
of h.
has a bias that is of the same order as its standard deviation. As a re-
sult, for a valid approximation, the bootstrap algorithm must implicitly
correct for the effect of the bias. A similar situation arises in the con-
text of density estimation (cf. Romano (1988), Faraway and Jhun (1990),
Hall (1992), Hall, Lahiri and Truong (1995)) and regression function esti-
mation (cf. HardIe and Bowman (1988), Hall, Lahiri and Polzehl (1995))
with both independent and dependent data. Since hnP'jn)! f(Ajn)'S are
approximately independent, this leads to the "approximate" multiplicative
regression model
(9.23)
with "approximately" independent error variables Ejn'S and with the "re-
gression function" fe). The FDB for the spectral density estimation makes
use of two bandwidths hln and h2n of different orders as in bootstrapping a
nonparametric regression model with independent errors (cf. Hall (1992)).
The main steps of the bootstrap procedure are as follows:
h - -l,\",no
were E· n = no Dj=l
A
Ejn·
Step 3: Draw a sample Ein' ... ,E~on of size no, randomly, with replacement
from {Ejn : j = 1, ... , no}.
(9.24)
fn*( w; h) = (
nh)- 1 0~ K (W-Ajn)
h *
I1n(Ajn), wE [-7f,7f] .
j=-no
(9.25)
As in the FDB for ratio statistics, the rescaling of the "residuals" in Step
2 ensures that Ein's have mean 1, and this avoids an additional bias at the
resampling stage. Indeed, the FDB may fail without the rescaling. As the
regularity conditions below show, the two bandwidths h 1n and h2n used in
the FDB above are required to satisfy different decay conditions. The initial
9.3 Bootstrapping Spectral Density Estimators 231
Conditions:
(C.S) (i) {h n }n>l is a sequence of positive real numbers such that there
exists "5 E (0,1) such that h n E [In- 1/ 5 , J-1 n -1/5] for all n 2
J-1.
(ii) hln ---+ 0 and (nhfn)-l = 0(1) as n ---+ 00.
(9.26)
(9.27)
Note that we use in(w; h 2n ) to center and scale the FDB version f~(w; h n )
in (9.27). That this is the appropriate quantity for normalizing f~(w; h n )
follows from (9.24) in Step 4 of the FDB. Since the bootstrap periodogram
values were generated with in(>'jn; h 2n ), by comparing the relations be-
tween the pairs of equations (9.21) and (9.23) in the unbootstrapped case
with their bootstrap analogs (9.24) and (9.25), we see that in(-; h 2n ) plays
the role of the true density fO for the bootstrap spectral density estimator
f~(w; h n ).
The following result shows that the FDB provides a valid approximation
to the distribution of the normalized spectral density estimator in(w; h n )
for any given w E [-7f,7f] and any h n satisfying (C.8)(i).
Theorem 9.2 Suppose that Conditions (C.5}-(C.8) hold. Then, for any
wE [-7f,7fJ,
where Rn(·;·) and R~(-;·) are as given by (9.26) and (9.27), respectively.
Proof: Theorem 9.2 is a version of Theorem 1 of Franke and HardIe
(1992), where their Condition (C.4) has been dropped and where the dis-
tance between the probability distributions of Rn and R~ in Mallow's met-
ric has been replaced by the sup-norm distance. Note that if (27f)-IKt(.)
denotes the characteristic function corresponding to the probability density
(27f)-1 K(·), then
JX{
t--->O
-1['. (9.29)
This shows that Condition (C.4) of Franke and HardIe (1992) follows from
Condition (C.7) above, which is a restatement of their Condition (C.3).
Hence, Theorem 9.2 follows from Theorem 1 of Franke and HardIe (1992),
in view of (9.29) and in view of the fact that convergence in Mallow's
metric implies weak convergence. D
Note that, in view of (9.22), the optimal bandwidth h~ satisfies the relation
h~ n- 1 / 5 [(21[')-1 J CXJ
(9.32)
:RMsE(w; h)
j=-no
J=-no
(9.34)
Thus, one may find the FDB estimator of the optimal bandwidth h~ by
equivalently minimizing the explicit expression (9.34). The following result
shows that h~ is consistent for h~. Furthermore, the estimated criterion
function at h~ attains the optimal theoretical RMSE level over the set H n ,
asymptotically, in probability.
Theorem 9.3 Assume that the conditions of Theorem 9.2 hold and that
f"(w) #- 0 and 0 < Co < 0- 1 . Then, for h~ and h~, respectively defined by
(9.30) and (9.33),
(i) n1/5(h~ - h~) ---+p 0 as n ---+ 00 ,
9.4 A Modified FDB 235
9.4 A Modified FD B
In this section, we describe a modified version of the FDB based on the
work of Kreiss and Paparoditis (2003). The modified FDB removes some
of the limitations of the FDB and provides valid approximations to the
distributions of a larger class spectral mean estimators than the class of ra-
tio statistics (cf. Section 9.2). Furthermore, the modified FDB continues to
provide a valid approximation in the spectral density estimation problems
considered above. Let {XdiEZ be a causal linear process, given by
=UL
00
Xi aj(i-j, i E Z , (9.35)
j=O
random variables with E(t < 00. Also, let InO denote the periodogram of
Xl"'" X n , defined by (9.1), i.e.,
n 2
In(w) = (21fn)-11 LXtexP(-LWt)1 ' w E ~-1f,1fl , (9.36)
t=l
and let fO denote the spectral density of {XihEZ, It is known (cf. Priestley
(1981), Chapter 6) that at the discrete frequencies Ajn == 21fj/n, 1 ::; j ::;
no,
(9.37)
and
if j=l=k
(9.38)
for all 1 ::; j, k ::; no, where no = Ln/2J and where ~4 = (E(t - 3) de-
notes the fourth cumulant of the innovation (1. Thus, if ~4 =1= 0, then the
periodogram at distinct ordinates Ajn and Akn have a nonzero correlation
and are dependent. Although the dependence of the periodogram values
In(Ajn) and In(Akn) vanishes asymptotically, the aggregated effect of this
dependence on the limit distribution of a spectral mean estimator may
not be negligible. Indeed, as noted in Section 9.2 (cf. (9.6)), for the spec-
J
tral mean A(~; f) == o71: U and its canonical estimator A(~; In) == o71: ~In J
236 9. Frequency Domain Bootstrap
(9.39)
9.4.1 Motivation
We now describe the intuitive reasoning behind the formulation of the
modified FDB. Let {Yi};EZ be a stationary autoregressive process of order
p, fitted to {Xi}iEZ by minimizing the distance E(Xi - ~~=l (3jXi - j )2 over
(31, ... ,(3p. Write '"'((k) = CoV(Xl,X Hk ), k E Z. Then, the {Yi};Ez-process
is given by
p
- - -, -1 -2 -, - - .
where (3 ((31, ... ,(3p) = fp '"'(p, up = '"'((0) -(3fp(3 and {(diEZ IS a
sequence of iid random variables with E(l = 0 and E(f = 1. Here, f p
is the p x p matrix with (i,j)th element '"'((i - j), 1 :s; i, j :s; p and
'"'(p = ('"'((1), ... ,'"'((p))'. As '"'((k) -+ 0 as Ikl -+ 00, by Proposition 5.1.1
of Brockwell and Davis (1991), for every pEN, f;l exists. For the rest of
this section, suppose that E(t < 00. Let
P 2
fAR(W) = 0-;/1 1 - L;3j exp(-~jw)1 ' wE [-1T,1T] (9.41 )
j=l
(9.42)
9.4 A Modified FDB 237
Note that the periodogram I:;R of the fitted autoregressive process satisfies
relations (9.37) and (9.38) with f replaced by fAR and K:4 replaced by
k4 = (E(t - 3), the fourth cumulant of (1. As a result, by (9.42) and
(9.43), it follows that the variables Wn(Ajn)'S satisfy
and
for all 1 :::; j, k :::; no. Thus, the covariance structure of Wn (Ajn), 1 :::; j :::; no
closely mimics that of the periodogram variables In (Ajn), 1 :::; j :::; no,
provided k4 is close to K:4. The modified version of the FDB, proposed by
Kreiss and Paparoditis (2003), fits an autoregressive process empirically
and replaces the multiplicative factor qU by a data-based version. In the
next section, we describe the details of this modified FDB method, known
as the autoregressive-aided FDB (or the ARFDB) method.
where A(~; In) and A(~; f) are as in relation (9.39), i.e., A(~; In) = fo7r Un
and A(~; f) = fo7r U for a given function ~ : [0, 'if] --+ lR of bounded varia-
tion. Extensions to the vector-valued case is straightforward and is left out
in the discussion below.
The basic steps in the ARFDB are as follows:
Step (I): Given Xl' ... ' X n , fit an autoregressive process {YihEZ ==
2
{Yin}iEZ of order P (== Pn). Let ((3ln, ... , (3pn) and (In denote the esti-
A A
(9.44)
Step (II): Generate the bootstrap variables Xi, ... ,X~ from the autore-
gression model
P
Xi = L OjnXi_ j + an . C, i E fZ , (9.46)
j=l
(9.49)
The ARFDB version of the centered and scaled spectral mean estimator
Tn = Vn(A(~; In) - A(~; f)) is given by
(9.50)
9.4 A Modified FDB 239
where B(~; I~) = J07l" ~(w)I~(w)dw and B(~; in) = J07l" ~(W)in(w)dw, with
in(w) == iin(w)in,AR(W), W E [-1f,1f]. As an alternative, the integrals in
the definitions of B(~; I~) and B(~; in) may be replaced by a sum over
the frequencies Pjn : 1 :::; j :::; no} as in the case of the FDB of Section
9.2. The conditional distribution of T~ given X!, ... , Xn now gives the
ARFDB estimator of the distribution of Tn.
Remark 9.1 One may use alternative methods for estimating the
parameters /31,"" /3p and 0'2 in Step (I) of the ARFDB. However, an
advantage of using the Yule-Walker equations to estimate the parameters
/31, ... ,/3p and 0'2 of the fitted autoregression model in Step (I) is that
all the roots of the polynomial 1 - E~=l /3jnzj lie outside the unit circle
{z E C : Izl :::; I} and hence, the spectral density function in,AR(-) of Step
(IV) is well defined.
Remark 9.2 In practice, one generates the variables Xi, ... , X~ from the
"estimated" autoregression model (9.46) by using the recursion relation
(9.46) with some initial values Xi-p, ... ,Xo and running the chain for a
long-time until stationarity is reached (cf. Chapter 8). Kreiss and Paparodi-
tis (2003) also point out that the order p of the fitted model may be chosen
using some suitable data-based criteria, such as, the Akaike's Information
Criterion.
For establishing the validity of the ARFDB, we shall make use of the
following conditions, as required by Kreiss and Paparoditis (2003).
Conditions:
(C.9) The linear process {XihEZ of (9.35) is invertible and has an infinite
order autoregressive process representation
00
Xi = L/3jXi - j + O'(i' i E Z
j=l
where E~l j1/21/3jl < 00 and 1 - E~l /3jzj =j:. 0 for all complex z
with Izl :::; 1.
(C.lO) {(ihEZ is a sequence of iid random variables with E(l = 0, E(f = 1,
and Ea < 00. Further, 0' E (0,00).
(C.11) The spectral density f of {XihEZ is Lipschitz continuous and satisfies
inf f(w»O.
wE [0,71"]
(C.12) (i) The characteristic function Kt(-) of the kernel K(·), given by
Kt(u) == J::'oo exp(wx)K(x)dx, is a nonnegative even function
with Kt(u) = 0 for lui> 1.
240 9. Frequency Domain Bootstrap
Theorem 9.4 shows that under suitable regularity conditions, the modi-
fied version of the FDB provides a valid approximation to a wider class of
spectral mean estimators than the standard version of the FDB, which is
applicable only to the class of ratio statistics. However, the validity of the
ARFDB crucially depends on the additional requirement of invertibility (cf.
Condition (C.1)), which narrows the class of linear processes {XihEZ to
some extent. Kreiss and Paparoditis (2003) point out that this restriction
may be dispensed with, if one modifies the FDB by fitting a finite-order
moving average model to the data instead of fitting an autoregressive pro-
cess and then by using a suitable version of the correction factor qn (-) in
Step (IV) for the moving average case. Because of these additional tuning-
up-steps involved in the autoregressive-aided or the moving-average-aided
versions, the modified FDB is expected to have a better finite sample perfor-
mance than the usual FDB, even when such modifications are not needed
for its asymptotic validity, i.e., when the methods are applied to ratio-
statistics. A similar remark applies on the finite sample performance of the
ARFDB in the spectral density estimation problems considered in Section
9.3. We refer the interested reader to Kreiss and Paparoditis (2003) for
a discussion of these issues, for guidance on the choice of the smoothing
parameters P and h, and for numerical results on finite sample performance
of the ARFDB.
10
Long-Range Dependence
10.1 Introduction
The models considered so far in this book dealt with the case where the
data can be modeled as realizations of a weakly dependent process. In this
chapter, we consider a class of random processes that exhibit long-range
dependence. The condition of long-range dependence in the data may be
described in more than one way (cf. Beran (1994), Hall (1997)). For this
book, an operational definition of long-range dependence for a second-order
stationary process is that the sum of the (lag) autocovariances of process
diverges. In particular, this implies that the variance of the sample mean
based on a sample of size n from a long-range dependent process decays
at a rate slower than O(n-1) as n ----+ 00. As a result, the scaling factor for
the centered sample mean under long-range dependence is of smaller order
than the usual scaling factor n 1/2 used in the independent or weakly depen-
dent cases. Furthermore, the limit distribution of the normalized sample
mean can be nonnormal. In Section 10.2, we describe the basic framework
and review some relevant properties of the sample mean under long-range
dependence. In Section 10.3, we investigate properties of the MBB approx-
imation. Here the MBB provides a valid approximation if and only if the
limit law of the normalized sample mean is normal. In Section 10.4, we con-
sider properties of the subsampling method under long-range dependence.
We show that unlike the MBB, the subsampling method provides valid ap-
proximations to the distributions of normalized and studentized versions
of the sample mean for both normal and nonnormallimit cases. In Section
242 10. Long-Range Dependence
10.5, we report the results from a small simulation study on finite sample
performance of the subsampling method.
We shall suppose that the auto covariance function r(·) can be represented
as
r(k) = k- a L(k), k 2 1 (10.1)
for some 0 < a < 1 and for some function L : (0,00) ---+ ffi. that is slowly
varying at 00, i.e.,
(10.3)
In spite of its simple form, this formulation is quite general. It allows the
one-dimensional marginal distribution of Xl to be any given distribution on
ffi. with a finite second moment. To appreciate why, let P be a distribution
function on ffi. with J x 2 dP(x) < 00. Set G l = p- l 0 <l> in (10.3), where
<l> denotes the distribution function of N(O, 1) and p- l is the quantile
transform of P, given by
Then,
denotes the kth order Hermite polynomial. Then, the Hermite rank q of
GO is defined as
q = inf {k EN: E( HdZl)G(Zr)) # o} . (10.5)
~
Aq/2
Jexp{L(xl + ... + x q )}
L(XI + ... + Xq)
- 1 IT IXkl(a-l)/2
k=l
dW(Xl)'" dW(xq) . (10.6)
When q = 1, Wq has a normal distribution with mean zero and variance
2c~j{(1- a)(2 - a)}, but for q ~ 2, the distribution of Wq is nonnormal
(Taqqu (1975)). For details of the representation of Wq in (10.6), and the
concept of a multiple Wiener-Ito integral with respect to the random spec-
tral measure of a stationary process, see Dobrushin and Major (1979) and
Dobrushin (1979), respectively. The complicated form of the limit distri-
bution in (10.6) makes it difficult to use the traditional approach where
large sample inference about the level-1 parameter fJ is based on the limit
distribution. In the next section, we consider the MBB method of Kiinsch
(1989) and Liu and Singh (1992) and investigate its consistency properties
for approximating the distribution of the normalized sample mean.
244 10. Long-Range Dependence
(10.8)
the definition of the bootstrap variable T~ tends to zero rather fast and
thus forces T~ to converge to a degenerate limit. Intuitively, this may be
explained by noting that by averaging independent bootstrap blocks to de-
fine the bootstrap sample mean, we destroy the strong dependence of the
underlying observations Xl, ... , Xn in the bootstrap samples. As a result,
the variance of the bootstrap sample sum nX~ has a substantially slower
growth rate (viz., bd~) compared to the growth rate d~ for Var(nXn). When
the unbootstrapped mean Xn is asymptotically normal, one can suitably
redefine the scaling constant in the bootstrap case to recover the limit law.
However, for nonnormal limit distributions of X n , the MBB fails rather
drastically; the bootstrap sample mean is asymptotically normal irrespec-
tive of the nonnormallimit law of normalized Xn . For a rigorous statement
of the result, define the modified MBB version of Tn as
Theorem 10.3 Assume that the conditions of Theorem 10.2 hold. Let O'~
be as in (10.8). Then,
if and only if q = 1.
Thus, Theorem 10.3 shows that with the modified scaling constants, the
MBB provides a valid approximation to the distribution of the normalized
sample mean Tn only in the case where Tn is asymptotically normal. The
independent resampling of blocks under the MBB scheme fails to reproduce
the dependence structure of the Xi'S for transformations G with Hermite
rank q 2 2. As a consequence, the modified MBB version T~ of Tn fails
to emulate the large sample behavior of Tn in the nonnormallimit case.
A similar behavior is expected if, in place of the MBB, other variants of
the block bootstrap method based on independent resampling (e.g., the
NBB or the eBB, are employed. Theorems 10.2 and 10.3 are due to Lahiri
(1993b). Lahiri (1993b) also shows that using a resample size other than the
sample size n also does not fix the inconsistency problem in the nonnormal
limit case, as long as the number of resampled blocks tend to infinity. As
a result, the "m out of n" bootstrap is not effective in this problem if the
number of resampled blocks is allowed to go to infinity with n. However, if
repeated independent resampling in the MBB method is dropped and only
246 10. Long-Range Dependence
10.3.2 Proofs
Define G(y) = G(y) - cqHq(y), y E lEt For i,j E Z, let Oij = 1 or 0
according as i = j or i i= j. Also, recall that x V y = max{ x, y}, x, y E JR,
N = n - £ + 1, and Ut = £-1 2.:;:(i-l)£+1 X;, 1:::; i:::; b.
Lemma 10.1 Suppose that r(-), L(·), a, and q satisfy the requirements
of Theorem 10.1. Assume that £ = O(n 1 - E ) for some 0 < E < 1, and that
£-1 = 0(1). Then,
< C(cq)(nde)-Z£Z
O(£2+ E n- Z) = 0(1) .
Therefore, by stationarity,
e-l
(Ndp)-1 2:)j - £)(Xj + Xn-j+d = op(l) .
j=1
Similarly, £ = O(n 1 - E) implies that
(Ndp)-ZE(n£Xn)2 = O(n-2£2d"izd;,) = 0(1) .
Hence, Lemma 10.1 follows. o
10.3 Properties of the MBB Method 247
a~=u~+op(l).
Proof: Define
and
Var(ain)
l(q!)4 T 2q (2 q!)-1 L
1
IT
k=l
r(mk - jk) - (q!)2r(i 1 - i2)qr(i3 - i 4 )ql '
(10.13)
where L:l extends over all ml, jl; ... ; m2q, j2q E {iI, ... , i4} such that
(a) mk -I- jk
for all k = 1, ... ,2q, and
(b) there are exactly q indices among {mk' j k : 1 ::; k ::; 2q} that are
equal to it for each t = 1,2,3,4 . (10.14)
248 10. Long-Range Dependence
Next write 2::1 = 2::11 + 2::12' where 2::11 extends over all indices {mk,jk:
1::::: k ::::: 2q} under 2::1 for which Imk - jkl = IiI -i21 for exactly q pairs and
Imk - jkl = li3 - i41 for the remaining q pairs, and where 2::12 extends over
the rest of the indices under 2::1' Clearly, for any {mk' jk : 1 ::::: k ::::: 2q}
appearing under 2::11' I1~~1 r(mk - jk) = r(il - i2)qr(i3 - i 4)q. We claim
that the number of such indices is precisely (2q!)22q(q!)-2. Hence, assuming
the claim, one gets
l(q!)4 T 2q(2q!)-1 L
1
IT
k=1
r(mk - jk) - (q!)2r(il - i2)qr(i3 - i4)ql
2q
::::: C(q) LIT Ir(mk - jk)1 . (10.15)
12 k=1
To prove the claim, note that, for any {ml' jl; ... ; m2q, j2q} under 2:: 1, if
Imk - jkl = IiI - i21 for some k 1, ... , kq E {I, ... , 2q}, then, by (10.14),
(a) Imk - jkl = li3 - i41 for all k E {I, ... , 2q}\{k1, ... , kq}, and
(b) exactly q of {mkll jk 1 ; • • • ; mk q , jk q } are ik, k = 1,2 and exactly q of
the remaining 2q integers are ik, k = 3,4.
Using this, one can check that the set of all indices {ml' jl; ... ; m2q, j2q}
under 2::11 can be obtained by first selecting a subset {k 1 , ... , k q }
of size q from {1, ... ,2q}, and then setting (mk,jk) = (i 1,i 2) or
(i2' h) for k E {k1, ... , kq} and (mk,jk) = (i3, i 4) or (i4' i3) for k E
e
{I, ... , 2q} \ {kl' ... , kq}. Hence, it follows that the number of terms un-
der 2::11 is qq ) . 2q • 2Q , proving the claim.
Next define No = ,Cnol, where <5 is any real number satisfying 0 < <5 <
e(5 - 2qa)-1 and where e is as in the statement of Theorem 10.2. Let
r(j) = Ijl-"'(1 + IL(ljl)l), j E Z, and Mn = max{l + IL(j)1 : 1 ::::: j ::::: n}.
Then, it is easy to check that for n large,
C(C q , a, q, E) (n 8a NdiJ- 1 Mn
N-1 £ l £ £
X L L LLL f(i1 - i 2)qf(i3 - i4)q
h
were I 1n = ""No
L..j=O ""l L..i2=1 ""j+£-l
L..il=l ""l L..is=j ""j+£-l
L..i4=j ""
2q
L..12 k=l 1r (mk - Jk n
. )1 . It
is now easy to see that for any i1,i 2 ,i3,i4, by (10.14)(b),
2q *
L II Ir(mk - jk)1 < C(q) L f(i1 - i2) q1 f(i1 - i3) q2 f(it - i4)q3 x
12 k=l
j+£-lj+£-l '- l
L L L L f(i1 - i2) q1 f(i1 - i3) q2 f(it - i4)q3 x
j+£-lj+£-l '-
< L L L f(i2 - i 3)q4 f(i 2 - i 4)q5 f(i3 - i4)q6 X
j+£-lj+£-l
< C(a,q)M~ L L f(i3-i4)q6((t'-i3)Vi3)aq2((t'-i4)Vi4)aq3
250 10. Long-Range Dependence
L
j+£-l
< 0(0:, q)M~q(j + €)d(ql+q3+q5) {((€ - i3) V i3)a(q2+q4)
i3=j
implying that
Var(ain) = 0(1) .
Since Earn --) (J~, (10.12) follows. This completes the proof of the lemma D
Lemma 10.3 Let WI, W 2, ... , Wn be n iid random variables with EW1 =
0, and EWr = 1. Then, for any rJ > 0, and every n E N satisfying n- 1 +
on(l) < 1,
sup Ip(W l + ... + Wn ::s; y'nx) - <I> (x) I
xElR
::s; (1 + ,8n)On(l) + 22[rJ + on(rJ)l,8~ ,
where On (a) == EWrn(IW11 > ay'n), a> 0, and,8n == 11-n-l-on(I)I-1/2.
Proof: Define Wi = Win(IWil ::s; Vn), and Vi = Wi - EWi , 1 ::s; i ::s; n.
Then, it is easy to check that
< sup
xElR
Ip(V + ... + Vn ::s; y'nx(EV12)1/2) - <I> (x) I
l
Proof of Theorem 10.2: Let G-~ == g2 E. (Ui -/In)2 / d;. Then, by Lemma
10.3 with TJ = b- 1 / \
Now, using arguments similar to those used in the proof of Lemma 10.2, one
can show that E(2::;=l Hq(Zi))4 = O(di). Hence, with Sf = 2::;=1 CqH(Zi),
0(1) .
Proof of Theorem 10.3: (i) follows from (10.7). As for (ii), the "if"
part follows from (i) and Theorem 10.1. To prove the converse, suppose
q #- 1. Then, by Theorem 10.1, n(Xn - fL)/d n converges in distribution to
a nonnormallimit while by (i), t~ is asymptotically normal, implying the
"only if" part. 0
for some constant C(a) E (0, (0). Conversely, if (10.20) holds for a slowly
varying function L(·), then rC) admits the representation (10.1) with (with-
out loss of generality) the same function L(·). Thus, the requirement (10.20)
on the spectral density functiong(·) and the condition (10.1) on the auto co-
variance function r(·) are equivalent. Because g(.) is a symmetric function,
the Fourier series of log g(.) is a pure cosine series. Replacing each cosine
function by the corresponding sine function, we obtain the harmonic conju-
gate of logg(·). Let 10ggC) denote the ha~onic conjugate of logg(·). The
key regularity condition on g(.) is that logg(·) is continuous on [-7r,7r].
10.4 Properties of the Subsampling Method 253
While log 9 ( .), being unbounded in every neighbor hood of the origin, is
not continuous on [-n, n J, an appropriately chosen branch of log 9 can be
continuous on [-n, nJ. The following result on On is due to Hall, Jing and
Lahiri (1998).
Theorem 10.4 Suppose that {XihEZ is generated by relation (10.3) and
that the function G has Hermite rank q E N. Also, suppose that
(i) g(x) = Ixl a - 1 L 1 (lxl), ° °
< Ixl :::; n where < a < q-1 and where
L1 (-) is slowly varying at 0 and of bounded variation on every closed
subinterval of [0, nJ;
°
Then,
sup IOn(X) - Qn(X)1 ----+p as n --+ 00 . (10.21)
xElR.
and let
(10.23)
where Sim == (Xi + ... + Xi+m-l)/m, m ~ 1, i ~ 1. Under the condition
of Theorem 10.5 below, e;'/[Var(nXn )] ---+ 1 in probability as n ---+ 00. We
use n-len for scaling the sample mean Xn and define the "studentized"
sample mean
(10.24)
Let Qln(X) = P(Tln ~ x), x E lR denote the distribution function of T ln . To
define the subsampling estimator of Qln(-) based on blocks of length f, let
eu denote the subsample version of en, obtained by replacing {Xl,"" Xn}
and n in the definition of en by {Xi,"" XiH-l} and e, respectively. In
particular, this requires replacing mk == mkn by mkf, k = 1,2. Let TU,i =
(Su - eXn)/eu, 1 SiS N denote the subsample copies of T ln . Then, the
subsampling estimator of Qln is given by
N
Qln(X) == Qln(Xjf) = N- l L n(TU,i ~ x), x E lR . (10.25)
i=l
Theorem 10.5 Assume that the conditions of Theorem 10.1 hold for some
q E N, that ml and m2 satisfy (10.22), that n€f-l + n-(l-€)f = 0(1) for
some E E (0,1) and that
(10.26)
Then,
Theorem 10.5 shows that the empirical device yields a consistent esti-
mator of the variance of the sample sum nXn = L~l Xi for all values of
the Hermite rank q ~ 1. Furthermore, the subsampling method provides
a valid approximation to the distribution of the studentized sample mean
T ln for both normal and nonnormallimits of T ln . Consequently, we may
use Theorem 10.5 to set approximate confidence intervals for the level-1
parameter f..t that attain the nominal coverage levels asymptotically, for all
q ~ 1. An advantage of this approach over the traditional large sample
theory is that the subsampling confidence intervals may be constructed
without making explicit adjustments for the Hermite rank q and without
estimating the covariance parameter Q. For 'Y E (0,1), let q"( denote the
10.4 Properties of the Subsampling Method 255
LN,),J-th order statistic of the subsample copies TU,i, 1:S i:S N. Then, an
approximate (1 - ')')100% two-sided equal-tailed confidence interval for jJ,
is given by
In(')') = Xn - q1-! . --;:' Xn - q! . --;: .
A ( - A en - A en ) (10.27)
10.4·3 Proofs
Proof of Theorem 10.4: By Theorem 5.2.24 of Zygmund (1968), r(k) '"
k-OI.L 1 (1/k) as k ----> 00, and hence, by Theorem 10.1,
(10.28)
(10.29)
In view of (10.28) and (10.29), this implies that .e(Xn - jJ,)/dt = op(l).
Hence, it is enough to show that
where, for simplicity of notation, we have set f.l = 0 in the last line. Now
by Theorem 5.5.7 of Ibragimov and Rozanov (1978), the second term on
the right side of (10.31) tends to zero. Hence, Theorem lOA is proved. D
Lemma 10.4 Suppose that the function G (.) has Hermite rank q EN, that
0<0: < q-1 and that n cg- 1 + n-(l-c)g = 0(1) for some E E (0,1). Then,
for any <5 E (0,1), a, bEN,
as n ----+ 00
Proof of Lemma 10.4: The proof of Lemma lOA is somewhat long and
hence, is omitted. We refer the interested reader to the proof of relation
(4.8), page 1201-1202 of Hall, Jing and Lahiri (1998) for details. D
Ele;. - e;.1
L m[E(Xn -
AI 1/2
< 4M- 1 f.l)2{ E(Sim - mf.l)2 + m 2 E(Xn - f.l)2}]
i=l
Ie;;, - e;;, I
f/'
E
of (10.24). For this, as in Hall, Jing and Lahiri (1998), we generated station-
ary increments of a self-similar process with self-similarity parameter (or
Hurst constant) H = !(2 - a), and took a suitable linear transformation
of these data to produce a realization of a long-range dependent process
with Hermite rank q. The details of the relevant steps are as follows:
where a = 2 - 2H E (0,1). Note that the r(k) 's are the autoco-
variances of the stationary increments of a self-similar process with
self-similarity parameter H (cf. Beran (1994), p. 50).
258 10. Long-Range Dependence
Step 4: Define Xi = Hq(Zi), for i = 1, ... ,n, where Hq is the qth Her-
mite polynomial. Then Xn = {Xl"'" Xn} is a long-range dependent
series with Hermite rank q.
with e = 0.8.
Results from the simulation study are summarized in Table 10.1. In the
table, the headings "Lower" and "Upper" represent coverage probabilities
of the lower and the upper 90% one-sided confidence intervals, respectively,
while q denotes the Hermite rank. The formula for the 100(1 - a)%, 0 <
a < 1, lower and upper confidence limits are, respectively, given by
(10.34)
and
(10.35)
where t{3 is the {3-quantile of the subsampling estimator Qln, 0 < {3 < 1, and
SIn = I:~=l Xi· It appears that for a = 0.5,0.9, the choice e = 2y'n leads to
more accurate results while for a = 0.1, e = 0.5n l / 2 works better. For each
value of a = 0.1,0.5,0.9 and for each value of q = 1,2,3, coverage accuracy
with these "optimal" choices of the subsampling parameter e increases with
the sample size. Interestingly, the Hermit rank q E {I, 2, 3} seems to have
little effect on accuracy of the subsampling method. We also repeated the
e
whole simulation study with = 0.9 in the definitions of ml = Ln(l+O)/2J
and m2 = LnoJ, as considered by Hall, Jing and Lahiri (1998). The choice
e = 0.8 had slightly better performance than e = 0.9 for the combinations
of the factors q, a, nand e considered here.
10.5 Numerical Results 259
TABLE 10.1. Coverage probabilities of 90% lower and upper confidence limits,
given by (10.34) and (10.35) respectively, based on K = 1000 simulation runs.
Here n denotes the sample size, .e is the length of subsamples, q denotes the
Hermite rank of {XdiEZ and a is as in (10.33).
(a): £ = ~ . n 1 / 2
a =0.1 a = 0.5 a = 0.9
q Lower Upper Lower Upper Lower Upper
1 87.0 86.1 95.0 93.1 97.0 94.2
n = 100 2 91.9 94.2 99.2 95.9 99.6 93.9
3 96.3 93.3 98.0 97.0 97.5 95.7
1 89.0 88.8 94.9 95.4 95.7 95.8
n = 400 2 90.0 96.6 98.0 95.5 99.1 93.5
3 95.1 93.5 96.9 96.7 97.0 95.8
1 91.7 92.0 96.8 95.8 97.6 96.6
n = 900 2 91.5 96.5 99.2 97.5 98.9 95.1
3 97.6 97.2 97.5 97.5 98.1 97.1
(b): £ = n 1 / 2
a = 0.1 a= 0.5 a =0.9
q Lower Upper Lower Upper Lower Upper
1 82.0 81.2 91.1 89.7 93.1 90.9
n = 100 2 81.6 91.9 95.7 93.9 96.6 91.4
3 90.6 87.4 93.7 92.8 92.8 91.4
1 87.6 86.6 93.4 93.2 94.2 94.0
n = 400 2 84.1 95.4 95.7 94.8 97.1 92.3
3 92.1 90.0 94.6 94.9 95.3 94.7
1 87.5 88.2 93.9 93.3 94.2 93.3
n = 900 2 84.9 94.4 96.9 94.9 97.0 92.7
3 92.7 93.5 95.2 95.3 94.5 94.3
(c): £ = 2n 1 / 2
a =0.1 a= 0.5 a = 0.9
q Lower Upper Lower Upper Lower Upper
1 78.7 76.0 89.4 85.1 90.8 86.9
n = 100 2 76.2 88.4 90.0 91.0 92.2 88.7
3 86.2 81.5 90.6 88.3 89.5 87.8
1 82.9 81.4 90.5 89.2 92.0 91.7
n = 400 2 77.8 92.9 91.4 92.2 93.2 89.4
3 87.5 85.6 92.3 90.2 91.8 90.3
1 82.6 82.2 89.9 90.3 91.3 92.0
n = 900 2 79.3 91.7 93.5 92.2 93.8 91.1
3 88.7 89.0 93.0 91.7 91.8 90.6
11
Bootstrapping Heavy-Tailed Data and
Extremes
11.1 Introduction
In this chapter, we consider two topics, viz., bootstrapping heavy-tailed
time series data and bootstrapping the extremes (i.e., the maxima and the
minima) of stationary processes. We call a random variable heavy-tailed if
its variance is infinite. For iid random variables with such heavy tails, it
is well known (cf. Feller (1971b), Chapter 17) that under some regularity
conditions on the tails of the underlying distribution, the normalized sample
mean converges to a stable distribution. Similar results are also known for
the sample mean under weak dependence. In Section 11.2, we introduce
some relevant definitions and review some known results in this area. In
Sections 11.3 and 11.4, we present some results on the performance of the
MBB for heavy-tailed data under dependence. Like the iid case, here the
MBB works if the res ample size is of a smaller order than the original
sample size. Consistency properties of the MBB are presented in Section
11.3, while its invalidity for a resample size equal to the sample size is
considered in Section 11.4.
In Sections 11. 5-11. 7, we consider the extremes of stationary processes.
This is another classic example where the "fewer than n" resampling works
better. In Section 11.5, we review some relevant definitions and results on
extremes of dependent data. Results on bootstrapping the extremes are
presented in Sections 11.6 and 11.7 respectively for the cases where the
normalizing constants are known and where they are estimated.
262 11. Bootstrapping Heavy-Tailed Data and Extremes
(11.1)
Ma(A) = Co [p r
J(O,oo)nA
x1-adx + q 1
(-oo,O)nA
IxI1-adx] , A E 8(JR) ,
°
for some constants Co E (0, (0), p?: 0, q ?: with p + q = 1.
Next, let {Xn}n>l be a sequence of iid random variables with common
(11.2)
(11.3)
(11.11)
where p(.) denotes the p-mixing coefficient of the process {Xn }n2:1. Recall
that for n 2: 1, we define
The main result of this section says that bootstrap approximations to the
distribution of Tn are valid along every subsequence ni for which Tni --+d
W o , provided k --7 00 and the resample size m = o(n) as n --7 00.
Theorem 11.2 Suppose that 1 ::::: C « n and that k --7 00 such that m ==
kC = o( n) as n --7 00. Also, suppose that the conditions of Theorem 11.1
hold for some subsequence {ndi>l. If the subsequence {mni : i 2': I} is
contained in {ni : i 2': I} and k;;!/2(mninil)[an)amn] --7 0 as n --7 00,
then '
Q(r m ni ,ni , r ni) --+p 0 as i ----7 00 ,
r
where m,n(X) = P*(T';',n ::::: x) and r n(x) = P(Tn ::::: x), x E lR, n 2': 1, and
Qis a metric that metricizes the topology of weak convergence of probability
measures on (lR, B(lR)).
Thus, Theorem 11.2 asserts the validity of the MBB approximation along
every subsequence for which the limit distribution of the normalized sum
266 11. Bootstrapping Heavy-Tailed Data and Extremes
When the scaling constants {an}n>l in (11.13) are unknown, we may in-
stead consider a "studentized" statistic of the form Tin == an(Sn - nIL),
where an is an estimator of an satisfying
(11.16)
where the same data-based scaling sequence {an}n>l that appears in the
definition of Tin is also used to define the bootstrap version of Tin. Write
Om.n and Om,n for the conditional distributions of T{.m,n and T{,m.n re-
spectively. Also, let Gn(x) = P(Tln ~ x), x E lR. Then, using Lemma
4.1, it is easy to show that if the conditions of Theorem 11.2, (11.13), and
(11.16) hold, then
(11.17)
provided, for every E > 0,
On the other hand, consistency of the "hybrid" estimator Om,n holds with-
out the additional condition (11.18). Indeed, under the conditions of The-
orem 11.2, (11.13), and (11.16), it is easy to show that
(11.19)
Both (11.17) and (11.19) can be used to construct bootstrap confidence in-
tervals for IL when the scaling constants {an}n;;'l are not completely known.
268 11. Bootstrapping Heavy-Tailed Data and Extremes
Let tm(rr) and tm(rr) denote the 'Y-quantile (0 < l' < 1) of Gm,n and Gm,n,
respectively. Then, a (1 - 1') equal-tailed two-sided bootstrap confidence
interval for p, is given by
where Wa has distribution Fa. It is well known (cf. Feller (1971b), Chapter
17) that in this case, the tails of F must satisfy the growth conditions:
where Wa has characteristic function (11.1) with c = +00 and with the
canonical measure MaO of (11.2) with Co = a, i.e., Wa has the charac-
teristic function
(11.25)
with
(11.26)
(ii) for every disjoint collection of sets AI"'" Ak E B(JR.), 2 ::::; k <
00, the random variables N(Al)"'" N(Ak) are independent Poisson
random variables with respective means Aa(Al)"'" Aa(Ak), i.e., for
anYXl,X2, ... ,Xk E {0,1,2, ... },
(11.27)
Note that as a consequence of the "inversion formula" (cf. Chow and Teicher
(1997)), €O uniquely determines the probability measure t.
With this, we
have the following result.
Theorem 11.3 shows that, with the resample size m = n, the MBB es-
timator t n,n(x) converges in distribution to a random variable t(x) for
every X E JR. and, hence, is an inconsistent estimator of the nonrandom
level-2 parameter r n(x). Indeed, for any real number x, if n is large, the
bootstrap probability t n,n(X) behaves like the random variable t(x), hav-
ing a nondegenerate distribution on the interval [0,1], rather than com-
ing close to the desired target r n(x) or to the nonrandom limiting value
r a(x) == P(Wa ::::; x) = lim r n(x). From a practical point of view, this
n-+oo
11.5 Extremes of Stationary Random Variables 271
Theorem 11.4 Suppose that there exist constants an > 0 and bn E JR such
that
(11.28)
for some nondegenerate random variable V. Also suppose that Condition
D(u n ) is satisfied for Un = anx + bn , n ~ 1, for each x E JR. Then, the
distribution function of V is of the type of one of the following distribution
functions:
(I) A(x) = exp( -e- X ), x E JR ;
- { 0 if x:::; 0
(II) <I>a(x) = exp( -x-a) if x> 0
for some a> 0;
(11.29)
the distribution function of V is G. In the iid case, a set of possible choices
of the constants {a n }n2'l and {b n }n2'l for the three extremal classes are
given by (cf. Gnedenko (1943), de Haan (1970)):
lim
n->oo
n(1 - F(Un(T))) = T for all T E (0,00) , (11.31 )
11.5 Extremes of Stationary Random Variables 273
When the extremal index () > 0, both Xn:n and its iid counterpart Xn:n
have extremal limit distributions ofthe same type. However, for () = 0, Xn:n
and Xn:n .may have different asymptotic behaviors. Here, we shall restrict
our attention only to the case () > 0, covered by the following result.
Theorem 11.5 Suppose that the sequence {Xn}n~l has extremal index
() > °
and that F E V(G) for some extreme value distribution G. Let
{an}n~l and {bn}n~l be the sequences of constants specified by (11.30) for
the class containing G. Then
°
tions of the normalized maxima are of the same type but not necessarily
identical. When < () < 1, the two limit laws are related by a nontrivial
linear transformation in the sense that if a;;: 1 (Xn:n - bn ) -+d V, then
a;;:l(Xn:n - bn ) --,>d [aV + b] for some (a, b) #- (1,0). Furthermore, the
°
values of (a, b) depend on (). Thus, for < () < 1, the limit distribution
in the dependent case is different from that in the iid case, and the effect
of the dependence of {Xn }n>l shows up in the limit through the extremal
index (). In contrast, when () = 1, both Xn:n and Xn:n have the same limit
distribution. In this case, the effect of the dependence of {Xn}n~l vanishes
asymptotically. This observation has an important implication regarding
validity of the bootstrap methods for dependent random variables. In the
next section, we shall show that with a proper choice of the resampling
274 11. Bootstrapping Heavy-Tailed Data and Extremes
size, the MBB provides a valid approximation for all () E (0,1], while the
IID-bootstrap method of Efron (1979) is effective only in the case () = 1.
Because of the special role played by the case () = 1, we now briefly
describe a general regularity condition on the sequence {Xn}n;:::l that leads
to the extremal index () = 1.
lim limsup n [
k---+-oo n~oo
L
2$j$n/k
P(XI > Un, Xj > un)] = °.
To get some idea about the class of processes for which Condition D' (un)
holds, suppose that {Xn}n;:::l are iid and that nP(X I > un) = 0(1). Then
it is easy to check that Condition D'(u n ) holds. However, condition D'(u n )
need not hold for a sequence {Un}n;:::l with nP(X I > un) = 0(1), even
when {Xn}n>l are m-dependent with m ~ 1. The following result shows
that Xn:n and Xn:n have the same limit law when Condition D'(u n ) holds.
Theorem 11.6 Suppose that a;;: I (Xn:n - bn ) ---+d V for some constants
an > 0, bn E JR, n ~ 1 where V is a non degenerate random variable. Also,
suppose that Conditions D(u n ) and D'(u n ) hold for all Un == anx + bn ,
n ~ 1, X E JR. Then,
In the next section, we describe properties of the MBB and the IID
bootstrap of Efron (1979) for stationary random variables under Conditions
like D(u n ) and D'(u n ).
(11.34)
11.6 Results on Bootstrapping Extremes 275
It is clear that for all n 2:: 1, a(n) ::; a(n), the strong-mixing coefficient
of the sequence {Xn }n> 1. Here we shall require that a( n) decreases at a
polynomial rate as n ---t 00. The following result proves the validity of the
MBB approximation.
Theorem 11. 7 Suppose that {Xn }n> 1 is a stationary process with ex-
tremal index () E (0,1] (as defined in Definition 11.4) and that a(r) ::; r- Tl ,
r 2:: 1, for some 'r/ > 0. Further suppose that (11.28) holds and the MBB
block size variable e and the number of resampled blocks k satisfye = ln E J,
°
k = lnoJ for some < E < 1, 0< Ij < min{E, I;E}. Then,
Proof: Follows from Theorem 3.4 and Corollary 3.1 of Fukuchi (1994),
by noting that his mixing coefficient aj (u) is bounded above by the
coefficient a(j) of (11.35) for all j ~ 1, u E IR. 0
SUp{IP(XjEIj , jEAUB)-
P(Xj E I j , j E A) . P(Xj E Ij , j E B)I :
11.7 Bootstrapping Extremes With Estimated Constants 277
Then, for m = n,
P*(V';:n ~ x) - td exp( -r x)
for every x E JR., where rx is a Poisson random variable with the mean
-log [P(V ~ x)].
Thus, under the conditions of Theorem 11.9, the bootstrap distribution
function at any given x E JR., being a random variable with values in the
interval [0, 1], converges in distribution to a nondegenerate random variable
exp( - r x). As a consequence, when the resample size m equals n, the re-
sulting bootstrap estimator of the target probability P(Vn ~ x) fluctuates
around the true value even for arbitrarily large sample sizes. Like the heavy-
tail case, a similar behavior is expected of the MBB even when the block
length £ ---+ 00, if the resample size m grows at the rate n, i.e., if m '" n.
However, a formal proof of this fact is not available in the literature.
We conclude the discussion of the asymptotic properties of the IID boot-
strap of Efron (1979) by considering the case where {Xn}n?:l has an ex-
tremal index () E (0,1). In this case, Fukuchi (1994) (cf. p. 47) shows
that under regularity conditions similar to those of Theorem 11.8, for
m = o(nl/2),
while
P(Vn ~ x) -t exp ( - ()')'(x))
for each x E JR., where ')'(x) == lim n[1 - P(XI ~ anx + bn )]. Thus, even
n--->oo
with a resample size m that grows at a slower rate than the sample size
n, the IID resampling scheme of Efron (1979) fails. As explained earlier,
the reason behind this is that the value of () E (0,1) is determined by the
joint distribution of the Xi'S, but when a single observation is resampled
at a time, this information is totally lost. As a consequence, the limit dis-
tribution of the IID bootstrap version Vr:;"n of Vn coincides with the limit
distribution of the normalized sample maximum Vn of the associated iid
sequence {Xn}n?:l.
(11.39)
and
(11.40)
for some constants ao E (0,00) and bo E R Here, we do allow the possibility
that bn or an be a function of a population parameter and be nonrandom,
so that the bootstrap approximation may be used to construct inference
procedures like tests and confidence intervals for the parameter involved.
For example, we may be interested in setting a confidence interval for the
upper endpoint Mp == sup{x : F(x) < 1} of the distribution function F
of Xl when F E D(~a) (cf. Theorem 11.4). In this case we would set
bn = Mp and replace the corresponding scaling constant an = Mp -
F- l (l-l/n) of (11.30) by a random scaling constant an that is a suitable
function of Mp and the empirical quantile function F;;l. Then, we may
apply the MBB to the pivotal quantity Vn == a;;: 1 (Xn:n - MF) and construct
bootstrap confidence intervals for the parameter M p . In general, consider
the normalized sample maximum with "estimated" constants
(11.41)
Let a;", b;" be some suitable functions of the MBB sample {Xi, ... , X;;'},
based on blocks of length £, and of the data Xl' ... ' X n , such that for every
E > 0,
The following result shows that both Vi:m,n and Ve7m,n provide a valid
approximation to the distribution of Vn .
Theorem 11.10 Suppose that the conditions of Theorem 11.7 hold and
that relations (11.39) and (11.40) hold. Then,
sup Ip*(Ve7m,n ::; x) - P(Vn ::; x)1 -----+p 0 as n --+ 00 . (11.45)
x
Proof: We consider (11.46) first. Note that by (11.39) and (11.40) and
Slutsky's theorem,
d 1
Vn a O (V - bo ) as n (11.47)
A
-----+ -+ 00
p(1 :: -
m
ao 11 > E I Xoo) + p(1 bma~ b;" + ao1bol > E I Xoo)
m
--+ 0 in probability as n --+ 00 . (11.49)
Hence, by Lemma 4.1, (11.46) follows from (11.47)-(11.49) and Theorem
11.7.
Next consider (11.45). Because am and bm are Xoo-measurable, with
a;" = am and b;" = bm in (11.42), for any E > 0, we get
the left side of (11.42)
= n(la~lam - aol > E) + n(la~l(bm - bm) - bol > E) ,
which goes to zero in L1 and, hence, in probability as n --+ 00, by (11.39)
and (11.40). Hence, (11.45) follows from (11.46). D
A similar result may be proved for the lID bootstrap of Efron (1979)
under the regularity conditions of Theorem 11.8. Theorem 11.10 and its
analog for the lID bootstrap in the "unknown normalizing constant" case
may be used for statistical inference for dependent random variables. For
results along this line for independent data, see Athreya and Fukuchi (1997)
who apply the lID bootstrap of Efron (1979) to construct CIs for the end-
points of the distribution function F of Xl, when the random variables
Xn'S are iid. For results on bootstrapping the joint distribution of the sum
and the maximum of a stationary sequence, see Mathew and McCormick
(1998).
12
Resampling Methods for Spatial Data
12.1 Introduction
In this chapter, we describe bootstrap methods for spatial processes ob-
served at finitely many locations in a sampling region in ]Rd. Depending
on the spatial sampling mechanism that generates the locations of these
data-sites, one gets quite different behaviors of estimators and test statis-
tics. As a result, formulation of resampling methods and their properties
depend on the underlying spatial sampling mechanism. In Section 12.2,
we describe some common frameworks that are often used for studying
asymptotic properties of estimators based on spatial data. In Section 12.3,
we consider the case where the sampling sites (also referred to as data-sites
in this book) lie on the integer grid and describe a block bootstrap method
that may be thought of as a direct extension of the MBB method to spatial
data. Here, some care is needed to handle sampling regions that are not
rectangular. We establish consistency of the bootstrap method and give
some numerical examples to illustrate the use of the method. Section 12.4
gives a special application of the block resampling methods. Here, we make
use of the resampling methods to formulate an asymptotically efficient least
squares method of estimating spatial covariance parameters, and discuss its
advantages over the existing estimation methods. In Section 12.5, we con-
sider irregularly spaced spatial data, generated by a stochastic sampling
design. Here, we present a block bootstrap method and show that it pro-
vides a valid approximation under nonuniform concentration of sampling
sites even in presence of infill sampling. It may be noted that infill sam-
282 12. Resampling Methods for Spatial Data
for some prediction problems treated in Section 12.6.2, the sampling region
R == Rn in all other sections becomes unbounded as n increases to infinity.
We conclude this section with a description of the structure of the sam-
pling regions R n , n 2: 1. Let R c (-!, !]d be an open connected set con-
taining the origin and let Ro be a prototype set for the sampling regions
such that R C Ro c cl.(R), where cl.(R) denotes the closure of the set R.
Also, let {An}n~l C [1,00) be a sequence of real numbers such that An loo
as n ~ 00. We shall suppose that the sampling region Rn is obtained by
"inflating" the prototype set Ro by the scaling constant An as
(12.1)
Because the origin is assumed to lie in R o, relation (12.1) shows that the
shape of the sampling region remains unchanged for different values of n.
Furthermore, this formulation allows the sampling region Rn to have a
wide range of (possibly irregular) shapes. Some examples of such regions
are spheres, ellipsoids, polyhedrons, and star-shaped regions. Here we
call a set A C ]Rd containing the origin star-shaped if for any x E A, the
line joining x and the origin lies in A. As a result, star-shaped regions
can be nonconvex. To avoid pathological cases, we shall suppose that the
prototype set Ro satisfies the following boundary condition:
(12.2)
284 12. Resampling Methods for Spatial Data
N rv vol.(Ro) . A~ , (12.3)
where, recall that, for any Borel set A C ]Rd, vol.(A) denotes the volume
(i.e., the Lebesgue measure) of A and where for any two sequences {Tn}n~l
and {tn}n~l of positive real numbers, we write Tn rv tn if Tn/tn ----- 1 as
n ----- 00. Let
Tn = tn(Zn; 0)
be a random variable of interest, where Zn = {Z(Sl), ... , Z(SN)} denotes
the collection of observations and where 0 is a parameter. For example,
we may have Tn = VN(Zn - /L) with Zn = N- 1 2:!1 Z(Si) denoting the
sample mean and /L denoting the population mean. Our goal is to define
block bootstrap estimators of the sampling distribution of Tn.
Different variants of spatial subsampling and spatial block bootstrap
methods have been proposed in the literature. See Hall (1985), Possolo
(1991), Politis and Romano (1993, 1994a), Sherman and Carlstein (1994),
Sherman (1996), Politis, Paparoditis and Romano (1998, 1999), Politis,
Romano and Wolf (1999), and the references therein. Here we shall follow a
version of the block bootstrap method, suggested by Biihlmann and Kiinsch
(1999b) and Zhu and Lahiri (2001), that is applicable to sampling regions
of general shapes, given by (12.1).
Thus, ,Bn goes to infinity but at a rate slower than the scaling factor An
for the sampling region Rn (cf. (12.1)). Here, ,Bn gives the scaling factor
for the blocks or subregions for the spatial block bootstrap method. Let
U = [0, l)d denote the unit cube in ]Rd. As a first step, we partition the
sampling region Rn using cubes of volume ,B~. Let Kn = {k E Zd : ,Bn (k +
12.3 Block Bootstrap for Spatial Data on a Regular Grid 285
U) n R", =f. 0} denote the index set of all cubes of the form f3n(k + U) that
have nonempty intersections with the sampling region Rn. We will define
a bootstrap version of the process Z (.) over Rn by defining its version on
each of the subregions
(12.5)
For this, we consider one R",(k) at a time and for a given Rn(k), resample
from a suitable collection of subregions of Rn (called subregions of "type
k") to define the bootstrap version of Z (.) over Rn (k). Let In = {i E 7i} :
i + f3nU C R",} denote the index set of all cubes of volume f3~ in R"" with
"starting points" i E Zd. Then, {i + f3nU : i E In} gives us a collection of
cubic subregions or blocks that are overlapping and are contained in Rn.
Furthermore, for each i E In, the subsample of observations {Z(s) : s E
Zd n [i + f3n U]} is complete in the sense that the Z(·)-process is observed
at every point of the integer grid in the subregion i + f3nU.
For any set A C ]Rd, let Zn(A) = {Z(s): s E AnSn } denote the set of
observations lying in the set A, where, recall that Sn == {Sl' ... , S N} is the
set of all sampling sites in Rn. Thus, in this notation, Zn(R",) is the entire
sample Zn = {Z(Sl), ... , Z(SN)} and Zn(Rn(k)) denotes the subsample
lying in the subregion Rn(k), k E Kn. For the overlapping version of the
spatial block bootstrap method, for each k E K n , we resample one block
at random from the collection {i + f3nU : i E In}, independently of the
other resampled blocks, and define a version of the observed process on the
subregion Rn(k) using the observations from the resampled subregion. To
that end, let K == Kn denote the size of Kn and let {h : k E Kn} be a
collection of K iid random variables having common distribution
1
P(h = i) = lIn I' i E In· (12.6)
(12.7)
Note that the set [R",(k) - kf3n + IkJ is obtained by an integer translation of
the subregion Rn(k) that maps the starting point kf3n of the set (k+U)f3n
to the starting point Ik of the resampled block (h + f3nU). As a result,
Rn(k) and (h + f3nU) n [Rn(k) - kf3n + IkJ have the same shape, and the
resampled observations retain the same spatial dependence structure as the
original process Zn(Rn(k)) over the subregion R",(k). Furthermore, because
of translation by integer vectors, the number of resampled observations in
Z~(Rn(k)) is the same as that in Zn(Rn(k)), for every k E Kn.
To gain further insight into the structure of the resampled blocks of
observations Z~(Rn(k))'s in (12.7), let K ln = {k E Kn : (k + U)f3n eRn}
286 12. Resampling Methods for Spatial Data
as in the time series case (cf. Chapter 2). The overlapping block bootstrap
version Z~(Rn) of Zn(Rn) is now given by concatenating the resampled
blocks of observations {Z~(Rn(k)) : k E Kn}. Note that by our construc-
tion, the res ample size equals the sample size. Hence, the bootstrap version
of a random variable Tn == tn(Zn; fJ) is given by
(12.8)
where the same function t n (·; .), appearing in the definition of Tn, is also
used to define its bootstrap version. Here, On is an estimator of fJ, defined
by mimicking the relation between the joint distribution of Zn and fJ. For
an example, consider Tn = IN(Zn - J.L) with Zn = n- 1 2:;:'1 Z(Si). Then,
the overlapping block bootstrap version of Tn is given by
a boundary block
r-- 11
/ ~ I'--
/ if
< II
1\ ~
\ - ~
a complete block
FIGURE 12.1. The blocking mechanism for the overlapping spatial block boot-
strap method. (a) Partition of a pentagonal sampling region Rn by the subregions
Rn(k), k E lCn of (12.5); (b) a set of overlapping "complete" blocks; (c) a set
of overlapping copies of the "boundary" block shown in (a). Bootstrap versions
of the spatial process Z(·) over the shaded "complete" and the shaded "bound-
ary" blocks in (a) are, respectively, obtained by resampling from the observed
"complete" blocks in (b) and the observed "boundary" blocks in (c).
Rn} and generate K iid random variables {Jk : k E JC n } with common dis-
tribution
1
P( J 1 = j) = l.Jn I ' j E .In , (12.9)
T*(2)
n = t n (Z*(2)
n D). en )
(.LLn, , (12.10)
s E Rn,p n Zd for some pEN, where h l , ... , hp E Zd are given lag vectors
and
Rn,p == {s E]Rd : s + hI, ... , s + hp ERn} .
For example, consider the centered and scaled estimator
Tn = INn (h)ll/2(On - (}) ,
where () = Cov(Z(O), Z(h)) denotes the autocovariance of the spatial pro-
cess at a given lag h E Zd \ {O}, On = INn(h)l- l LSENn(h) Z(s)Z(s + h) -
(INn(h)l- l LSENn(h) Z(S))2 is a version of the sample auto covariance esti-
mator, and Nn(h) = {s E Zd : s, s + hE Rn}. Here, recall that, IAI denotes
the size of a set A. Then, Tn is a function of the bivariate spatial process
Y(s) = (Z(s), Z(s + h))', s E Rn,2, where the set R n,2 is given by
R n,2 = {s E]Rd : s, s + hE Rn} = Rn n (Rn - h) .
As in the time series case, the bootstrap version of such variables may be
defined by using the vectorized process {Y(s) : s E R n ,2 n Zd}.
Next we return to the case of a general p-dimensional vectorized process
Y(·). Let Tn,p = tn(Yn; (}) be a random variable of interest, where Yn =
{y(s) : s E Rn,p} and () is a parameter. To define the overlapping bootstrap
version of Tn,p, we introduce the partition {Rn,p(k) : k E Kn,p} of Rn,p by
cubes of the form (k + U)(3n, k E Zd as before, where Kn,p = {k E Zd :
(k + U)(3n n Rn,p # 0}. Next, we resample IKn,pl-many indices randomly
and with replacement from the collection In,p == {i E Zd : i+U(3n c Rn,p},
define a version of the Y-process on each subregion Rn,p(k), k E Kn,p as
before, and then concatenate the resampled blocks of Y-values to define a
version Y~ of Yn over the region Rn,p. The "blocks of blocks" version of Tn
is now given by
T~,p = tn(Y~;On) (12.11)
where On is a suitable estimator of ().
o
N
(0.2.1)
(1.1.1)
"'o
q
o ~ ________- ,________- ,________- .________ -.~
o 2 4 8
FIGURE 12.2. Plots of the isotropic variogram 2,(h; 0) of (12.12) against Ilhll for
o= (0,2,1)' (shown in solid line) and for 0 = (1,1,1)' (shown in dot-and-dash
line).
: ....
....... . ...
..........
...... .....
, ....... . ..........
.......
............. ..... ,
·:··1
o
FIGURE 12.3. Realizations of a zero mean unit variance Gaussian random field '"
with variogram (12 .12) over a 20 x 30 region on the planar integer grid for
() = (0,2,1)' (with no nugget effect).
FIGURE 12.4. Realizations of a zero mean unit variance Gaussian random field
with variogram (12.12) over a 20 x 30 region on the planar integer grid for
() = (1,1 , 1)' with nugget effect ()1 = 1.
and use all observations from 4 of these for the 4 "complete" blocks of size
8 x 8 and use suitable parts of the remaining 12 blocks for the 12 boundary
regions. For example, for the 8 x 7 region [0,8) x [8,15), we would use only
the observations lying in [i~, i~ + 8) x rig, ig + 7) if the selected block is given
by [i~, i~ +8) x rig, ig +8). Similarly, for the 2 x 8 region [-10, -8) x [-8,0),
we would use the observations lying in [i~ +6, i~ +8) x rig, ig +8) only, when
the selected block is given by [i~, i~ + 8) x rig, ig + 8). When U{(k +U)(3n :
k E Kn} #- R n , a simpler and valid alternative (not described in Section
12.3.1) is to use the complete sets of observations in all K (= 16 in the
example, for (3n = 8) resampled blocks and define the bootstrap version of
a random variable Tn = t(N; {Z(SI), ... , Z(SN)}, e) as
where {Z* (s 1), ... , Z* (s M)} is the collection of all observations in the K-
many res amp led complete blocks, and where en is an estimator of based e
on {Z(st}, ... ,Z(SN)}. However, for the rest of this section, we continue
to work with the original version of the block bootstrap method described
in Section 12.3.l.
First we consider the problem of variance estimation by the overlapping
block bootstrap method. Suppose that the level-2 parameter of interest is
given by a;;' = Var(TIn ), the variance of the centered and scaled sample
mean
_ d/2 -
TIn = An (Zn - f.L)
(note that here d = 2 and f.L = 0). To find the block bootstrap estimator
0-;;' ((3n) of the parameter a;;', note that by the linearity of the sample mean
in the observations, we can write down an exact formula for 0-;;' ((3n), as
in the time series case. For later reference, we state the formula for the
general case of a ~d-valued random field {Z(s) : S E ~d}. Let Sn(i; k)
denote the sum of all observations in the ith block of "type k," Bn(i; k) ==
[Rn(k) - k(3n +i] n [i +U(3n], for i E In == {j E 7l,d : j +U(3n eRn}, k E Kn.
Then, the spatial bootstrap estimator of a;;' is given by
e = (0,2,1)' e = (1,1,1)'
5 5.950 4.469
8 7.811 5.590
·5
"'!
true
0 bel,,-n- 5
be t,,-naS
<D
.,;
....
.,;
,,
<'!
0
0:
0
-5 0 5
- N
Zn = N- 1 L:i=l Z(Si) denote the sample mean and let ()n = H(Zn) be an
A -
co true
0 bet n_5
beta_ na B
"l
0
"':
0
C!
0
~ -2 o 2 4
rameter !3n. Then, the bootstrap version of en is given by ()~ = H(Z~) , and
the bootstrap estimator of the level-2 parameter a~ == A~Var(en) is given
by
(12.14)
12.3 Block Bootstrap for Spatial Data on a Regular Grid 295
For establishing consistency of a;, we shall assume that the random field
{Z (s) : s E ]R.d} satisfies certain weak dependence condition. The weak-
dependence condition will be specified through a spatial version of the
strong-mixing condition. For 8 1,82 C ]R.d, let
(12.16)
for a > 0, b ~ 1, ~here R(b) is the collection of all sets in ]R.d that have
a volume of b or less and that can be represented as unions of up to fb1
many cubes.
Many variants of the strong-mixing coefficient have been proposed and
used in the literature, where the suprema in (12.16) are taken over various
classes of sets 81, 8 2 . In (12.16), we restrict attention to sets 81 and 8 2
that are finite unions of d-dimensional cubes and have a finite volume. As
a result, the sets 8 1 and 8 2 are bounded subsets of ]R.d. This restriction is
important in dimensions d ~ 2. Some important results of Bradley (1989,
1993) show that a random field in ]R.d, d ~ 2 with a strong mixing coefficient
satisfying
lim a(a; 00) = 0 (12.17)
a-HX)
is also p-mixing. Thus, if one allows unbounded sets 8 1,82 in (12.16), then
random fields satisfying (12.17) necessarily belong to the smaller class of
p-mixing random fields. For more discussion of various mixing coefficients
for random fields, see Doukhan (1994).
The following result proves consistency of the bootstrap variance estima-
tors.
Theorem 12.1 Suppose that the random field {Z(i) : i E Zd} is station-
ary with EIIZ(0)11 6 +8 < 00 and with the strong mixing coefficient a(a, b)
satisfying
(12.18)
for some 8 > 0, T1 > 5d(6 + 8)/8, and 0 ::; T2 ::; TI/d. Also, suppose
that H is continuously differentiable and the partial derivatives DO. H(·),
lal = 1, satisfy a Holder's condition of order 'TJ E (0,1]. If, in addition,
f3;;1 + >..;;lf3n = 0(1), then
(12.19)
296 12. Resampling Methods for Spatial Data
For proving the result, we need the following moment bound on partial
sums of a possibly nonstationary random field that is a special case of a
result stated in Doukhan (1994).
(12.21 )
(}~ H(Z~)
H(fi,n) + L D V H(fi,n)(Z~ - fi,nt + Qin
11'1=1
H(fi,n) + L D V H(J.l)(Z~ - fi,nt + Q;n , (12.22)
Ivl=l
for the number of data sites in Rn(k), k E Kn. Then, using (12.22) and the
independence of the resampled blocks, we get
Var*(B~) = Var*(LD"H(fL)(Z~-Pn))+Q3n
1,,1=1
N- 2 Var* ( L S~(k)) + Q3n
kElC n
Note that by (12.3) and the fact that IKlnl rv vol.(Ro) . (A n lf3n)d, we
have
A~E[N-2 L E*S~(k)2]
kEIC 1n
A~IKlnl· N- 2 . E[ L W(i)f
iEi3n Unzd
A~E[N-2 L E*S~(k)2]
kEIC 2n
Next, define S~(k) and Sn(k;i) by replacing the W(i)'s in the definition
of S~(k) and Sn(k; i) by Z(i)'s. Note that Pn can be expressed as Pn =
N- 1 2::;:=1 WrnZ(sr) for some nonrandom weights Wrn E [0,1]. Then, by
Lemma 12.1, (5.11), and arguments similar to (12.26) and (12.27),
E{E*IIZ~-PnIl4}
< C(d)N- 4E{ L E*IIS~(k) - LkPnl1 4
kEiCn
+( L E*IIS~(k) - LkPnI1 2)2}
kEiCn
< C(d)N-4IKnI2 max {EIISn(k; 0) - LkJLI14
+ LkEllPn - JLI14 : k E Kn }
< C(d)N-4IKnI2,B~d
O(-\;;-2d) . (12.28)
EIQ3nl :S E{E*(Q2n)2}
+2[E{E*(Q2n)2}]1/2[N- 2E{ L E*S~(k)2}r/2
kEiCn
(12.29)
I n is the index set of a partition of Rn by cubes of sides 2f3n and for each
hE {O, 1 }d, In(h) is the subset of I n consisting of integral vectors of "type
h". For example, with h = 0, In(O) is the set of all vectors i in I n such that
all d coordinates of i are even integers. Similarly, with h = (1,0, ... ,0)"
every i E I n ((l, 0, ... ,0),) has an odd integer in its first coordinate and
even integers in the remaining (d - 1) coordinates. For j E I n , let Vn(j)
denote the sum over all [Sn(i; 0)2 - ESn(i; 0)2] such that i E [j +U]2f3n. Set
Vn(j) = 0 if In n [j +U]2f3n = 0. Note that the .(iI-distance between the re-
grouped blocks U{Bn(i;O): i E [j+U]2f3n} and U{Bn(i;O): i E [k+U]2f3n}
of "type h" for any two distinct indices j i= k E In(h) is at least Ij - kl· f3n.
Hence, using Holder's inequality and Lemma 12.1, we have
r=O
00
r=O
An inspection of the proofs of Theorem 12.1 and Theorem 3.1 (on consis-
tency of the MBB variance estimator for time series data) shows that the
consistency of the spatial block bootstrap variance estimator may be estab-
lished under reduced moment conditions by using suitable truncations of
the variables Sn(i; k)'s in the proof of (12.30) and elsewhere. However, we
avoid the truncation step here in order to keep the proof simple. It follows
that the spatial block bootstrap variance estimator is consistent whenever
the block-size parameter f3n satisfies 13;;1 + >.;;1 f3n = 0(1) as n ---+ 00. Going
through the proof of Theorem 12.1, we also see that the leading term in the
variance part of the bootstrap variance estimator, o-;(f3n) == >'~Var*(B~),
is determined by Var(>'~N-2 LkEKl n E*S~(k)2), where S~(k) is the sum
300 12. Resampling Methods for Spatial Data
of the variables 2::1"1=1 D" H(/-l)(Z(Si) - /-l)" over Si in the resampled block
Bn(h; k). As in the time series case, this term increases as the block size
parameter f3n increases. On the other hand, the leading term in the bias
part of a-~ (f3n) is determined by the difference
N-2A~E[ L E*S~(k)2] - a~
kEKn
N-2A~E[ L E*S~(k)2 + L E*S~(k)2] - a~ .
kEK 'n kEK2n
As (12.27) shows, the contribution from the boundary subregions to the
bootstrap variance estimator, viz., B2n == N-2A~ 2::kEK 2n E{E*S~(k)2}
vanishes asymptotically, at the rate O(f3n/ An) as n ----t 00. However, the
exact rate at which B 2n goes to zero heavily depends on the geometry
of the boundary of Ro and is difficult to determine without additional
restrictions on the prototype set Ro when d ~ 2. To appreciate why, note
that in the one-dimensional case, the number of boundary blocks is at most
two (according to our formulation here) and hence, is bounded. However, in
dimensions d ~ 2, it grows to infinity at a rate O([A n /f3n]d-l). As a result,
the contribution from the "incomplete" boundary blocks playa nontrivial
role in higher dimensions. In contrast, the behavior of the first term arising
from the interior blocks, viz., BIn == N-2A~E{2::kEKln E*S~(k)2}, can
be determined for a general prototype set R o, solely under the boundary
condition, Condition B.
The discussion of the previous paragraph suggests that we may settle
for an alternative bootstrap variance estimator of a!, that is based on
the "bootstrap observations" over the interior blocks {Rn(k) : k E KIn}
only. Let Nl == N 1n = IK 1n 1f3~ denote the total number of data-values
in the resampled "complete" blocks Bn(h; k), k E KIn and let Z~* be the
average of these Nl resampled values. Then, we define the bootstrap version
ofen based on the complete blocks as ()~* = H(Z~*) and the corresponding
variance estimator of a!, as
(12.31 )
In the context of applying the MBB to a time series data set of size n, this
definition corresponds to the case where we resample b = Ln/ CJ "complete"
blocks of length C and define the bootstrap variance estimator in terms of a
resample of size nl = bC only, ignoring the last few boundary values (if any)
in the bootstrap reconstruction of the chain. For the modified estimator
a-rn(f3n), we can refine the error bounds in the proof of Theorem 12.1 to
obtain an expansion for its MSE. Indeed, applying the results of Nordman
and Lahiri (2003a, 2003b) to the leading term in the variance of a-rn(f3n),
we get
12.3 Block Bootstrap for Spatial Data on a Regular Grid 301
{3d 2 d 2174
,\~ [(3) . (vol.(~O))3] (1 + 0(1))
~1· ')'f (1 + 0(1)), say. (12.32)
n
Next, using arguments as in (12.26), we see that the bias part of afn({3n)
is given by
1
- (3n vol. (Ro) L lilaw(i) + o ({3;;-1 )
2EZ d
(12.34)
Now, minimizing the leading terms in the expansion above, we get the
first-order optimal block size for estimating a~ (or a~) as
(12.35)
Note that for d = 1 and Ro = (-~,~], the constants 'l'f and ')'2 in
(12.32) and (12.33) are respectively given by 'l'f = ~ . [2a~l and ')'2 =
-22::: 1 iaw(i) and hence, the formula for the MSE-optimal block length
coincides with that given in Chapter 5. In particular, the optimal block
length (3~ for variance estimation grows at the rate O(NI/3) for d = l.
For d = 2, the optimal rate of the volume of the blocks (viz., (f3~)d) is
O(N 1/2), while for d = 3 it is O(N3/5), where N is the sample size. As
d!2 is an increasing function of d, (12.35) shows that one must employ
blocks of larger volumes in higher dimensions to achieve the best possible
performance of the bootstrap variance estimators.
In the next two sections, we consider validity of approximations generated
by the spatial bootstrap method for estimating the sampling distributions
of some common estimators.
are given by Z(Si) = (Zl(Si), .. " Zm(Si))', i = 1, ... , N, where the data
locations {Sl,' .. ,SN} lie on the integer grid 7l,d inside the sampling region
Rn (cf. (12.2)). Let F~m)O denote the empirical distribution function of
Z(sI), ... , Z(SN), defined by
N
F~m)(z) = N- l L l1(Z(si) :::; z), z E]Rm , (12.36)
i=l
where, recall that for two vectors x = (Xl"'" Xm)' E ]Rrn and Y =
(Yl,"" Ym)' E ]Rrn, we write X :::; Y if Xi :::; Yi for all 1 :::; i :::; m. Let
G(m)(z) = P(Z(O) :::; z), z E]Rm denote the marginal distribution function
of the process Z(·) under stationarity. Define the empirical process
(12.37)
Because the sample size N grows at the rate [vol.(Ro) . A~] (cf. (12.3)), an
alternative scaling sequence for the difference F~m) (-) - G(m) (.) is given by
the more familiar choice ViV. However, in the context of spatial asymp-
toties, A~/2 happens to be the correct scaling sequence even in presence
of partial infilling (cf. Zhu and Lahiri (2001)), while the scaling N l /2 is
inappropriate in presence of infilling. As a result, we shall use A~/2 as the
scaling sequence here.
Next, we define the bootstrap version of ~~m). Let Z~(Rn)
{Z* (Sl), ... , Z* (s N )} denote the block bootstrap version of the process
{Z(s) : S E Rn n 7l,d}, based on a block size parameter f3n. Let F~m)*(z) =
N- l 2:!ll1(Z*(Si) :::; z), z E ]Rrn, be the empirical distribution function
of {Z*(Sl),"" Z*(SN n. Then, the block bootstrap version of ~~m) is given
by
(12.38)
(12.39)
[vol.(Ro)r1 L {P(Z(O) :::; Zl, Z(i) :::; Z2) - G(m) (zI)G(m) (Z2) } ,
iEZ d
and
~~m)* ----+d w(m) as n ---7 00, a.s.
Proof: This is a special case of Theorem 3.3 of Zhu and Lahiri (2001),
who establish the theorem under a polynomial strong-mixing condition.
Here, we used the exponential mixing condition only to simplify the
statement of Theorem 12.2. See Zhu and Lahiri (2001) for details. D
(12.41 )
sup
jE'K.O ,go+anjEIIJ)O
IIY(gO + an!) - Y(go) - an y(l)(gO; !)II
= o(a n ) as n --+ 00 , (12.45)
where 11·11 denotes the usual Euclidean norm on lRP . In comparison, Frechet
differentiability of Y at go requires (12.45) to be valid for all bounded sets
OCO C lI»m. As a result, Frechet differentiability of a functional is a stronger
condition than Hadamard differentiability.
Hadamard differentiability of M-estimators and other important statisti-
cal functionals have been investigated by many authors; see Reeds (1976),
Fernholz (1983), Ren and Sen (1991, 1995), van der Vaart and Wellner
(1996), and the references therein. The following result proves the validity
of the spatial bootstrap for Hadamard differentiable functionals. Here, we
shall always assume that the domain lI»0 of definition of the functional Y
is large enough such that G(m), F~m)(.), F~m)*(.), E*F~m)*(.) E lI»0 (with
probability one). This ensures that the estimators Y(F~m)), Y(E*F~m)*)
of the parameter Y(G(m)) and the bootstrap version Y(F~m)*) of Y(F~m))
are well defined.
Theorem 12.3 Suppose that the conditions of Theorem 12.2 hold. Let
Y : lI»0 __ lRP be Hadamard differentiable at G(m) tangentially to C~ with
derivative y(l)(G(m);.) for some lI»0 C lI»m.
306 12. Resampling Methods for Spatial Data
(a) Then,
(b) Suppose that Y and y(l)(c(m);.) satisfy the following stronger ver-
sion of (12.44): For any an -+ 0, fn -+ f E JI]l+ and gn -+ c(m) with
gn, gn + anfn E JI]lO for all n 2: 1,
(12.47)
(12.48)
Proof: Part (a) follows from Theorem 3.9.4 of van der Vaart and Well-
ner (1996). Next consider part (b). Using Lemma 12.1, the Borel-Cantelli
Lemma, and the arguments in the proof of the Glivenko-Cantelli Theorem,
it can be shown that
f E JI]lm· Then, by (12.47) and (12.49), there exists a set A with P(A) = 1
such that on A,
and
Var(Z(i) - Z(i + h)) = Var(Z(O) - Z(h)) (12.51 )
for all i, h E Zd. The function 2"((h) == Var(Z(O) - Z(h)) is called the
variogram of the process Z(·). Note that if the process Z(.) is second-order
stationary with auto covariance function u(h) = Cov(Z(O), Z(h)), hE Zd,
then (12.50) holds and, for any i, h E Zd,
m rn
for some sequence {an}n>l of positive real numbers and for some positive
definite matrix E(O), the~ for all () E e,
(i)
an(On,v - 0) ---+d N(O, Dv(O)) as n - t 00 (12.57)
where Dv(O) = A(O)'V(O)E(O)V(O)A(O) ;
(ii)
(12.58)
is nonnegative definite for any V(O) ;
(12.60)
The subregions for the spatial subsampling method are obtained by con-
sidering suitable translates of the set {3nRO. More specifically, we consider
d-dimensional cubes of the form i +Uo{3n, i E 7l,d that are contained in the
sampling region Rn, where Uo = (-~, ~ld is the unit cube in ]Rd, centered
at the origin. Let ~ = {i E 7l,d : i + Uo{3n eRn} denote the index set of
such cubes. Then, we define the subregions {R~) : i E ~} by inscribing
for each i E ~, a translate of (3nRo inside the cube i + Uo{3n such that the
origin is mapped onto i, i.e., we define
R n(i) _.
- ~ + {3n R 0, .
~ E Ln .
.-Ml
(12.61)
Next we apply the subs amp ling method to obtain an estimator of the
covariance matrix of the (scaled) variogram estimator at lags hI' ... ' h K .
Thus, the random vector Tn here is now given by
(12.63)
Let 21'(i) (h) denote the lag-h variogram estimator obtained by replacing Zn
and n in the definition of 2in(h) by the subsample Z(i) and the subs ample
size e, respectively. Also, let 2i'n(h) == I~I-l L:iEIo 21'(i) (h). Then, the
subsample version of Tn is given by n
Proof: Let g(e) = (21'(h 1;eO) - 21'(h 1;B), ... ,21'(h K ;Bo) - 21'(h K ;B))',
Q(B) = g(B)'[~(eo)]-lg(B), Qn(B) = gn(B)'[i: n ]-l gn (B), and Qn(B)
gn(B)'[~(Bo)tlgn(B). Then,
°
Now, if possible, suppose that en -H Bo in probability as n ---+ 00. Then,
(by Proposition A.l, Appendix A), there exist an E > and a subsequence
{mn}n>l such that Ilemn - Boll 2: E for all n 2: 1. Now, by (12.66), there
is a fu;ther subsequence {ml n }n2':l of {m n }n2':l such that Li m1n = 0(1)
almost surely. Also note that under the hypotheses of Theorem 12.4, Q( B)
is strictly positive on 8\{Bo}, and Q(Bo) = 0. Thus, Q(B) has a unique
minimum at Bo. However, with probability 1, Qmln(em1J - QmlJBO) 2:
Q(e m1n ) - Q(Bo) - 2Li m1n 2: inf{Q(B) : liB - Boll> E} - 2Li m1n > for °
all n 2: no, for some no 2: 1. This contradicts the definition of em1n as the
minimizer of Qml n (B) for all n 2: no, proving part (a) of Theorem 12.4.
To prove the second part, let Wqr denote the (q, r) component of [i:n]-l,
1 :::; q, r :::; K. Also, let gnq(B) denote the qth component of gn(B) and
let rq('; B) = 0r(-; B)/oBq, 1 :::; q :::; p. Since en minimizes the function
gn(B)'[i:n]-lgn(B), it satisfies the equations
1 :::; m :::; p. Next, let {el == (I,O, ... ,O)', ... ,ep == (O, ... ,O,I)'} denote
the standard basis of ffi.p. Hence, by a one-term Taylor series expansion of
gnq(e n ) and gnr(en ) around Bo, we obtain,
KK Wqr {P[l
~?; ~ 1
-2ra(hr; uBo + (1- u){}n)du
]
KK
+ ~?; Wqr
{P[l
~ 1
-2ra(h q; uBo + (1 - u)en)du
]
1 :s; m :s; p. Then, it is easy to see that the set of p equations in (12.67)
can be rewritten as
(12.68)
where f~ = J;
f(ue o + (1 - u)en)du. Because en is a consistent estimator
of e and the matrix-valued function r(e) is continuous in e, the result
follows from (12.68), Condition (C.2), and Slutsky's Theorem. 0
Here, Nn(h) == {(Si' Sj) : Si - Sj = h, Si, Sj ERn} and, recall that, for any
finite set A, IAI denotes its size.
Theorem 12.5 Suppose that (12.4) and Condition (G.2) of Theorem 12.4
hold. Also, suppose that there exists a 'f) > 0 such that max{ EIZ(h j ) -
Z(0)112+21J : 1 :::; j :::; K} < 00 and the strong mixing coefficient a(a, b) of
Z(·) (cf. (12.16), Section 12.3) satisfies the condition .
for some C E (0, (0),71 > 5d(6+'f))/'f), and 0 < 72 :::; (71 -d)/d. Then, parts
(a) and (b) of Theorem 12.4 hold with en,RGLS = en,SGLS and 2i'nO =
2inO. The asymptotic covariance matrix DE-l(O) in part (b) is given by
DE-l (0) = (r(o)'~(O)-l r(O))' where the (q, r)-th element of ~(O) is
Proof: Follows from Theorem 5.1 and Remark 5.1 of Lee and Lahiri (2002).
D
1.769
10 12
x-lag
· 10 10 16
The results of the simulation study based on 3000 simulation runs are
summarized in Table 12.2. The leading numbers in columns 4- 5, respec-
tively, denote the means for the estimators of fh and fh, while the numbers
within parentheses represent N times the MSE, where N denotes the sam-
ple size. The first and the third columns of Table 12.2 specify the sizes of
the two sides of the rectangular sampling and subsampling regions, respec-
tively.
From the table, it appears that the SGLS method performed better than
the OLS and CWLS methods in most cases, and produced MSE values
that fell between those of the CWLS and TGLS methods. Furthermore, for
the nonsquare sampling region of size (10,30), the rectangular subregions
of size (4,6) yielded slightly better results than the square subregions of
size (4,4). See Lee and Lahiri (2002) for more simulation results under a
different variogram modeL The SGLS method has a similar performance
under the variogram model treated therein.
Although the SGLS method has the same asymptotic optimality as the
GLS method, its finite-sample statistical accuracy (as measured by the
MSE) may not be as good as the GLS (or the idealized TGLS) method,
particularly for small sample sizes. In the simulation studies carried out in
Lee and Lahiri (2002), the SGLS estimators typically provided improve-
ments over the OLS and the WLS estimators for small sample sizes and
became competitive with the GLS estimators for moderately large sample
sizes. A negative feature of the SGLS method is that the block size param-
eter !3n must be chosen by the user . A working rule of thumb is to use a !3n
that is comparable to A~·f2 in magnitude. On the other hand, as explained
in the previous paragraph, the computational complexity associated with
318 12. Resampling Methods for Spatial Data
TABLE 12.2. Mean and scaled mean squared error (within parentheses) of various
least squares estimators of 81 and 82 under variogram model (12.72). Here Rn
denotes the size of the rectangular sampling region, BS denotes the size of the
subsampling regions.
the GLS method can be much higher than that associated with the SGLS
method. Table 12.3 gives a comparison of the time required for computing
the SGLS and the GLS estimators, using an Alpha workstation. Here, I
denotes the number of times iterations in the optimization routine for the
GLS method are carried out. The reported times are obtained by aver-
aging 100 repetitions. It follows from Table 12.3 that the SGLS method
is considerably faster than the GLS method. However, the most impor-
tant advantage of the SGLS and other RGLS methods is that they provide
asymptotically efficient estimates of the covariance parameters even when
the form of the asymptotic covariance matrix of the generic variogram esti-
mator is unknown, in which case the GLS method is no longer applicable.
12.5 Bootstrap for Irregularly Spaced Spatial Data 319
(12.73)
(12.78)
for any d ~ 2 and for d = 1, (12.78) holds with T2 = o. As before, let G(m)
denote the marginal distribution of Z(O), Le., G(m)(A) = P(Z(O) E A),
A E 8(JRm). Also, recall that for a positive-definite matrix E of order kEN,
<p(.; E) denotes the probability measure corresponding to the k-dimensional
Gaussian distribution with mean 0 and variance matrix E.
Next, suppose that {Xn}n~l and {Z(s) : s E jRd} are defined on a com-
mon probability space (0, F, P). Let Px denote the joint probability distri-
bution of the sequence {Xn}n~l and let PIX and E.lx, respectively, denote
322 12. Resampling Methods for Spatial Data
the conditional distribution and expectation, given Xoo == 0'( {Xn : n 2 I}).
We now state the main result of this section that asserts consistency and
asymptotic normality of the multivariate M-estimator en conditional on
the sequence of iid random vectors Xl, X 2, ....
Theorem 12.6 Suppose that (12.75) has a unique solution, that (12.76)
holds, and that the following conditions hold:
(C.5) For some 7]0 E (0,00), llJ(z; t) has continuous second-order partial
derivatives with respect to t on the set {lit - 811 ::; 7]0} == 8 0 for
almost all z (G(m)).
Then, for both the pure- and the mixed-asymptotic structures (i.e., for '~ E
(0,00) , and for '~ = +00', respectively),
When the solution to equation (12.75) is unique and the other condi-
tions of Theorem 12.6 hold, (12.79) shows that en is consistent for 8 for
almost all realizations of the random vectors Xl, X 2, ... , i.e., On ---+ 8 in
Pix-probability, a.s. (Px ). (See Definition 12.3, (12.89), and (12.90) in
Section 12.5.4 below for a precise definition of this notion of convergence
and its connection with the usual notion of convergence in probability.)
When the uniqueness condition on the solution to (12.75) does not hold,
Lahiri (2003d) shows that a consistent sequence of solutions of (12.75) ex-
ists. For the nonunique case, conclusions of Theorem 12.6 remain valid for
this sequence of solutions.
12.5 Bootstrap for Irregularly Spaced Spatial Data 323
Although all the blocks Bn(i; k), i E In of "type k" have the same shape as
the subregion Rn(k) in the stochastic design case, each of them may contain
a different number of sampling sites, as the sampling sites Sl, . .. ,Sn are
randomly distributed over the sampling region Rn. Since the bootstrap
version of the process over Rn(k) is defined by randomly selecting one of
the "type k" blocks, the number of the resampled observations over Rn(k)
is typically different from the number of observations in Rn(k) itself. Let
L'k == L'k n denote the size of the resample Z~(Rn(k)) over the subregion
Rn(k), k' E IC n . Also, let n* = 2:kEKn L'k denote the total number of the
resampled values over the sampling region R n , i.e., n* is the size of Z~(Rn).
Although the resample size n * is typically different from the original sample
size n, it can be shown that
E.lx(n*)
--'----'-...:... ~ 1 as n ~ 00, a.s. (P:x). (12.81)
n
We define the bootstrap version of a statistic in = tn(Zn(Rn)) by
(12.82)
In particular, the bootstrap version of the sample mean Zn
n- 1 2:~=1 Z(Si) is given by
Z~ = L S~(k)/n* , (12.83)
kEKn
where S~(k) is the sum of the L'k-many resampled values Z~(Rn(k)) over
the subregion Rn(k), k E IC n . For later reference, we also define the boot-
strap version of the normalized M-estimator
_ d/2
Tn - An (On - 0) ,
A
12.5 Bootstrap for Irregularly Spaced Spatial Data 325
where, in this section, P* and E*, respectively, denote the conditional prob-
ability and the conditional expectation given g == a( {Xn : n :::: I} U {Z( s) :
S E ]Rd}). The bootstrap version of Tn is now given by
It can be shown (cf. Lahiri (2003d)) that a~,LJ. is not only the asymptotic
variance of A~/2 (Zn - /L), but it is also the exact limit of A~ . Var.lx(Zn), a.s.
(Px ). This shows that both the spatial sampling density f and the type of
the asymptotic structure (viz., pure increasing domain and mixed increas-
ing domain asymptotic structures) have nontrivial effects on the variance
- d/2 -
of Zn. Under both asymptotic structures, the variance of An Zn takes the
minimum value when the design density f is uniform over Ro. On the other
hand, as noted earlier, the infill component of the mixed increasing domain
asymptotic structure (with "b. = 00") leads to a reduction in the variance
of the scaled sample mean A~/2 Zn. Inspite of the variations in the form of
the asymptotic variance due to these factors, the block bootstrap method
provides a "consistent" estimator of a~ LJ. in all cases, as shown by the
following result. '
Theorem 12.7 Suppose that Conditions (0.4) and (0. 7)r of Theorem
12.6 hold with p = 1 = m, r = 3 and 1]!(x; t) = x - t. Also, suppose
that there exists <5 E (0, 1) such that
f3;;1 + A;;(l-O) f3n = 0(1) as n ---+ 00 (12.88)
and that (12.76) holds. Then, for almost all realizations of Xl, X 2 , ... under
Px,
A~.Var*(Z~) ---+ a~,LJ. in P.lx-probability, a.s. (Px)
where Z~ is as defined by (12.83).
Proof: See Lahiri (2003d) D
The reason for stating Theorem 12.7 using the nonstandard notion of con-
vergence is that it allows us to interpret consistency of the bootstrap vari-
ance estimator for almost all realizations of the stochastic design vectors
Xl, X 2 , . ..• Thus, once the values of Xl, X 2 , ..• are given, i.e., once the
locations of the sampling sites are given, we may concern ourselves only
with the randomness arising from the random field Z(·) and the bootstrap
variables {h : k E Kn} n> 1, by treating the locations {Sl' ... , Sn} as non-
random. However, the usual notion of "convergence-in-probability" (viz.,
(12.90)) does not allow such an interpretation in the stochastic design case.
Next we consider properties of the bootstrap distribution function es-
timators. As in Section 12.5.2, here we suppose that the random field
{Z(s) : s E ~d} is stationary and m-dimensional for some mEN. Let
en denote the M-estimator of the p-dimensional parameter e based on
d/2
Z(sd, ... , Z(sn), as defined by (12.75). Let Tn = An (en - e) be the nor-
A
malized version of en and let T~, defined in (12.85), be its bootstrap version.
Then, we have the following result.
Theorem 12.8 Suppose that e~ is a unique solution of (12. 84}. Also, sup-
pose that Conditions (C.4), (C.5), (C.6), and (C.7}r of Theorem 12.6 hold
with r = 3 and that (12.76) and {12.88} hold. Then,
of Chapter 10, here the block bootstrap method remains valid even in pres-
ence of a particular form of "strong" dependence in the data, engendered
by the mixed increasing domain asymptotic structure.
Although we use the same symbol Sn to denote the collection of all sampling
sites in Sections 12.3, 12.5, and in here, the size of the set Sn is different in
each case, depending on the spatial design. For the rest of Section 12.6.1,
we shall use N 2n to denote the size of Sn. Then, under Condition B on the
boundary of the prototype set Ro, the sample size N2n satisfies the relation
°
Since 'fJn 1 as n ---+ 00, this implies that the sample size N2n grows at a
faster rate than the volume of the sampling region Hr.. Thus, the resulting
asymptotic structure is of the "mixed increasing domain" type, with a
nontrivial infill component. A predictor of 1:::..00 based on the observations
{Z(s): s E Sn} is given by
°
Under mild conditions on the process Z(·) and the function g(.), I:::.. n is L2_
consistent for 1:::..00 in the sense that E(l:::.. n - 1:::..(0)2 ---+ as n ---+ 00. The rate
at which the mean squared prediction error (MSPE) E(l:::.. n - 1:::..(0)2 goes
to zero depends on both the increasing domain scaling parameter {An}n>l
and the infill scaling parameter {'fJn}n::::l.
Lahiri (1999b) considers the spatial cumulative distribution function
(SCDF)
Foo(zo) = f n(Z(s)::; zo)ds, Zo (12.96)
lRn
E]R ,
Fn(zo)
A
= N2n1 ~
' " n(Z(s) ::; zo), Zo E ]R , (12.97)
sESn
(12.98)
(12.100)
and
(12.101)
Here ,8n will be used to construct the blocks or subregions of R n , while "Yn
will be used to construct a subsample version of the Z (. )- process on the
subregions at a lower level of resolution. As in Sections 12.3-12.5, the re-
quirement (12.101) says that the volume of the subregions grow to infinity,
but not as fast as the volume ofthe original sampling region Rn. Similarly,
the conditions on bn}n~l given by (12.100) say that "Yn tends to zero but
at a slower rate than the original rate 'rJn of infilling. Thus, the scaled grid
"Yn7l,d is a subgrid of'rJn7l,d for any n 2: 1 and, therefore, has a lower level of
resolution. For a given subregion Rn,i (say), we use the observations in Rn,i
on the finer grid 'rJn7l,d to define the subsample copy of the unobservable pre-
dictand ADO and the observations in Rn,i on the coarser grid "Yn7l,d to define
the subsample copy of the predictor An. Here we only consider overlapping
subregions Rn,i'S; a nonoverlapping version of the subsampling method
can be defined analogously by restricting attention to the sub collection of
nonoverlapping subregions only. Let U o = (-~, ~ld denote the unit cube in
!Rd , with its center at the origin. Also, let IOn = {i E 7l,d : 'rJni+Uo,8n eRn}
be the index set of all cubes of volume ,8~ that are centered at 'rJni E 'rJn7l,d
and are contained in Rn. Then, the subregion Rn,i is defined by inscribing
a scaled down copy of the sampling region Rn inside i + Uo,8n such that
the origin is mapped onto i (cf. Section 12.4.3). Specifically, we let
Note that Rn,i has the same shape as the original sampling region R n , but
a smaller volume, ,8~vol.(Ro), than the volume A~vol.(Ro) of Rn. Next, we
define the subsample versions of ADO and An for each i E Ion. To that end,
note that Rn,i's are congruent to ,8nRo == Rn,o and that the numbers of
sampling sites in Rn,i over the finer grid 'rJn7l,d and over the coarser grid
"Yn7l,d are respectively the same for all i. Let Ln == Land Rn == R denote
the sizes of the sets ,8nRo n 'rJn7l,d and ,8nRo n "Yn7l,d, respectively. For each
i E Ion, we think ofthe L observations {Z(s) : s E Rn,in'rJnZd} on the finer
grid as the analog of {Z(s) : s ERn} and the R observations {Z(s) : s E
Rn,i n "Yn7l,d} as the analog of the original sample {Z(s) : s E Rn n'rJn7l,d},
at level of the subsamples. Hence, we define the subs ample versions of ADO
12.6 Resampling Methods for Spatial Prediction 331
£-1
I:
sE,n'ildnRn,i
g(Z(S)) , (12.102)
i E Ton. Then, for a random variable of interests Tn = tN 2n (.6.. n ; .6.. CXJ ), its
subsample version on the subregion Rn,i is defined as
(12.103)
Note that we use tp in the definition of T:' i' as £ is the analogous quantity
to the sample size N 2n at the level of subs~mples. The subsample estimator
of G n (-) == P(Tn :::; .) is now given by
F~,i(ZO) L- 1
I:
SE7)n 'il d nRn "
n(Z(s) :::; zo) (12.105)
F:',i(ZO) C- 1 I:
sE,n'ildnRn,i
n(Z(s) :::; zo) , (12.106)
Zo E JR, i E Ton. Let w : JR ---+ [0, (0) be a measurable function and let
(12.107)
where for any function h : JR ---+ JR, we write IlhllCXJ,w = sup{lh(x)lw(x) :
x E JR}. Then, the subsampling estimator of the sampling distribution
GIn (-) == P(Tln :::; .) is given by
Zl, Z2 E JR, s E Rd. Also, define the p-mixing coefficient of the random field
Z(·) by
(C.S) There exist positive constants C, Tl, T2 satisfying T1 > 3d and T2 <
Td d such that
(a) Then, there exists a zero mean Gaussian process W such that
(12.109)
where w(x) == 1 for all x E JR. and ]PI is the collection of all probability
distribution functions on R Then, under the conditions of Theorem 12.9,
this prediction band attains the nominal coverage probability as n ----+ 00.
334 12. Resampling Methods for Spatial Data
g(Z(s)) , (12.113)
(12.114)
(12.116)
n X n matrix with (i,j)-th entry CJ(Si - Sj), 1:S i,j:S n and let In denote
the n x 1 vector with i-th entry CJ(so - Si), 1 :S i :S n. Then, the BLUP
Zn(SO) of Z(so) is given by
T~ E( Zn(SO) - Z(so) f
CJ(0)2 _ y'n~-ly + (I' ~-ly _ 1)2(1' ~-11 )-1
nn nnn nnn'
(12.118)
(12.119)
where Z", is the upper a critical point of the standard normal distribution,
i.e., <I>(z",) = 1 - a, with <1>(.) denoting the distribution function of the
standard normal distribution N(O, 1). Note that in(a) attains the nominal
coverage probability (1 - 2a) exactly, i.e.,
(12.120)
(12.122)
In general, ao also depends on the true parameter value eo and may not be
known. Sjostedt-DeLuna and Young (2003) suggested using a parametric
bootstrap to calibrate the plug-in interval. Let
(12.124)
where GnU == 7fn ('; eo, On), eo denotes the unknown true value of the pa-
rameter e, On is the estimator used to define the plug-in interval inCa), and
ao is the unknown calibration level that depends on the distribution of On
and on eo. Because z"! is a decreasing function of "Y, it is easy to verify that
Gnh) is a decreasing function of "Y. Hence, ao can be found by inverting
relation (12.124).
Next we generate an estimator of GnU, using the parametric bootstrap
method. Let Z* (so), Z*(Sl), ... ,Z*(sn) be a collection of Gaussian random
variables with mean 0 and covariances
(12.125)
where Cov* denotes the conditional covariance given Z(sd, ... , Z(sn). Let
e~ be defined by replacing Z(sd, ... , Z(Sn) in the definition of On with
Z*(sd, ... , Z*(sn). Similarly, let Z~(so) and T~ be respectively defined by
replacing Z(sd, ... , Z(sn) and On in the definitions of Zn(SO) == Zn(SO; On)
and Tn(On) by Z*(Sl),"" Z*(sn) and e~. Then, for 0 < "Y < 1/2, the
bootstrap estimator G n h) of G n h) is given by
(12.128)
For easy reference, here we collect some standard results and definitions
from Probability Theory for independent and dependent random variables
that have been used in this monograph. For proofs and further discussions,
see the indicated references. The first set of definitions deal with the basic
convergence notions.
(ii) Suppose that {Xn }n> 1 and X are defined on the same probability
space (O,F,P). Then, {Xn}n>l is said to converge to X in prob-
ability, denoted by Xn ----';p X, if (Xn - X) ----';p 0, i.e., for any
E > 0,
for any A E B(R d) with P(X E 8A) = 0, where 8A denotes the boundary
of A.
For more details and discussion on this topic, see Parthasarathi (1967),
Billingsley (1968), Huber (1981), and the references therein.
The next set of definitions and results relate to the notion of stopping
times and moments of randomly stopped sums, which play an important
role in the analysis of the SB method in Chapters 3-5.
The next result is a Strong Law of Large Numbers (SLLN) for indepen-
dent random variables.
Theorem A.3 (SLLN): Let $\{X_n\}_{n \ge 1}$ be a sequence of iid random variables with $E|X_1| < \infty$. Then,
$$n^{-1} \sum_{i=1}^{n} X_i \to E X_1 \quad \text{as } n \to \infty, \ \text{a.s.}$$
A refinement of Theorem A.3 is given by the following result. For a proof,
see Theorem 5.2.2, Chow and Teicher (1997).
Theorem A.4 (Marcinkiewicz-Zygmund SLLN): Let $\{X_n\}_{n \ge 1}$ be a sequence of iid random variables and $p \in (0, 2)$. If $E|X_1|^p < \infty$, then
$$n^{-1/p} \sum_{i=1}^{n} (X_i - c) \to 0 \quad \text{as } n \to \infty, \ \text{a.s.} , \qquad (A.5)$$
for any $c \in \mathbb{R}$ if $p \in (0,1)$ and for $c = E X_1$ if $p \in [1,2)$. Conversely, if (A.5) holds for some $c \in \mathbb{R}$, then $E|X_1|^p < \infty$.
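A quick simulation makes the role of $p$ in Theorem A.4 concrete. The sketch below uses iid Exp(1) variables (so $c = EX_1 = 1$ for $p \ge 1$); the distribution and sample sizes are illustrative choices only, and convergence is visibly slow as $p$ approaches 2.

```python
import numpy as np

# Marcinkiewicz-Zygmund rates: n^{-1/p} * sum(X_i - EX_1) should drift
# to 0 a.s. for p in [1, 2) when all moments are finite.
rng = np.random.default_rng(42)
x = rng.exponential(1.0, size=10**6)
csum = np.cumsum(x - 1.0)                 # partial sums of X_i - EX_1
ns = np.array([10**3, 10**4, 10**5, 10**6])
for p in (1.0, 1.5, 1.9):
    print(p, csum[ns - 1] / ns ** (1.0 / p))   # rows shrink toward 0
```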
The next result is a Central Limit Theorem (CLT) for sums of independent random vectors with values in $\mathbb{R}^d$. For a proof in the one-dimensional (i.e., $d = 1$) case, see Theorem 9.1.1 of Chow and Teicher (1997). For $d \ge 2$, it follows from the one-dimensional case and the Cramér-Wold device (cf. Theorem A.1).
Theorem A.5 (Lindeberg's CLT): Let $\{X_{nj} : 1 \le j \le r_n\}_{n \ge 1}$ be a triangular array where, for each $n \ge 1$, $\{X_{nj} : 1 \le j \le r_n\}$ is a finite collection of independent $\mathbb{R}^d$-valued ($d \in \mathbb{N}$) random vectors with $E X_{nj} = 0$ for all $1 \le j \le r_n$ and $\sum_{j=1}^{r_n} E X_{nj} X_{nj}' = \mathbb{I}_d$. Suppose that $\{X_{nj} : 1 \le j \le r_n\}_{n \ge 1}$ satisfies Lindeberg's condition: for every $\epsilon > 0$,
$$\lim_{n \to \infty} \sum_{j=1}^{r_n} E \|X_{nj}\|^2 \mathbb{1}\big(\|X_{nj}\| > \epsilon\big) = 0 . \qquad (A.6)$$
Then,
$$\sum_{j=1}^{r_n} X_{nj} \to_d N(0, \mathbb{I}_d) \quad \text{as } n \to \infty .$$
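The Lindeberg condition (A.6) is easy to probe numerically for a concrete array. The sketch below uses the iid row $X_{nj} = (Y_j - EY_1)/\sqrt{n\,\mathrm{Var}(Y_1)}$ with $Y_j \sim$ Exp(1), an assumption made purely for illustration; the truncated second-moment sum is estimated by Monte Carlo and visibly vanishes as $n$ grows.

```python
import numpy as np

# Monte Carlo check of Lindeberg's condition (A.6) for the iid array
# X_nj = (Y_j - 1)/sqrt(n), 1 <= j <= r_n = n, with Y_j ~ Exp(1),
# for which EY_1 = Var(Y_1) = 1.
rng = np.random.default_rng(0)
eps = 0.1
for n in (10**2, 10**3, 10**4):
    y = rng.exponential(1.0, size=(500, n))   # 500 Monte Carlo replicates
    xnj = (y - 1.0) / np.sqrt(n)
    mass = (xnj ** 2 * (np.abs(xnj) > eps)).sum(axis=1).mean()
    print(n, mass)                            # decreases toward 0
```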
The corresponding Berry-Esseen bound quantifies the rate of convergence in the CLT:
$$\sup_{x \in \mathbb{R}} \bigg| P\bigg( \frac{1}{\sqrt{n}\, a_n} \sum_{j=1}^{n} X_j \le x \bigg) - \Phi(x) \bigg| \le \frac{2.75}{n^{3/2}} \sum_{j=1}^{n} \big( E|X_j|^3 / a_n^3 \big) , \qquad (A.7)$$
where $\Phi(x)$ denotes the distribution function of the standard normal distribution on $\mathbb{R}$.
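The bound in (A.7) can be compared with the exact Kolmogorov distance when the centered summands come from iid Exp(1) variables, since the standardized sum then has a Gamma distribution. The sketch below does this; the example and the convention that the $X_j$ in (A.7) are the centered, unit-variance variables $Y_j - 1$ are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, gamma

rng = np.random.default_rng(0)
# third absolute moment E|Y - 1|^3 for Y ~ Exp(1), by Monte Carlo (~ 12/e - 2)
rho3 = np.mean(np.abs(rng.exponential(1.0, 10**6) - 1.0) ** 3)
x = np.linspace(-5, 5, 2001)
for n in (10, 100, 1000):
    # exact CDF of (S_n - n)/sqrt(n) for S_n ~ Gamma(n, 1)
    exact = gamma.cdf(n + x * np.sqrt(n), a=n)
    dist = np.abs(exact - norm.cdf(x)).max()   # Kolmogorov distance
    bound = 2.75 * rho3 / np.sqrt(n)           # (A.7) in the iid case, a_n = 1
    print(n, round(dist, 4), round(bound, 4))  # distance stays below the bound
```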
Next we consider dependent random variables.
Definition A.4 A sequence of random vectors $\{X_i\}_{i \in \mathbb{Z}}$ is called stationary if for every $i_1 < i_2 < \cdots < i_k$, $k \in \mathbb{N}$, and for every $m \in \mathbb{Z}$, the distributions of $(X_{i_1}, \ldots, X_{i_k})'$ and $(X_{i_1+m}, \ldots, X_{i_k+m})'$ are the same.
Definition A.5 A sequence of random vectors $\{X_i\}_{i \in \mathbb{Z}}$ is called $m$-dependent for some integer $m \ge 0$ if $\sigma(\{X_j : j \le k\})$ and $\sigma(\{X_j : j \ge k + m + 1\})$ are independent for all $k \in \mathbb{Z}$. For example, if $\{\epsilon_i\}_{i \in \mathbb{Z}}$ are iid, the moving average $X_i = \epsilon_i + \theta \epsilon_{i-1}$ is 1-dependent.
Definition A.6 Let $\{X_i\}_{i \in \mathbb{Z}}$ be a sequence of random vectors. Then the strong mixing or $\alpha$-mixing coefficient of $\{X_i\}_{i \in \mathbb{Z}}$ is defined as
$$\alpha(n) = \sup\big\{ |P(A \cap B) - P(A)P(B)| : A \in \sigma(\{X_j : j \le k\}),\ B \in \sigma(\{X_j : j \ge k + n\}),\ k \in \mathbb{Z} \big\} , \quad n \ge 1 .$$
The next result is a CLT for stationary, strongly mixing sequences of random variables $\{X_i\}_{i \in \mathbb{Z}}$.
(i) Suppose that $P(|X_1| \le c) = 1$ for some $c \in (0, \infty)$ and that $\sum_{n=1}^{\infty} \alpha(n) < \infty$. Then
$$0 \le \sigma_\infty^2 \equiv \mathrm{Var}(X_1) + 2 \sum_{n=1}^{\infty} \mathrm{Cov}(X_1, X_{1+n}) < \infty \qquad (A.8)$$
and, if $\sigma_\infty^2 > 0$,
$$n^{-1/2} \sum_{i=1}^{n} (X_i - E X_1) \to_d N(0, \sigma_\infty^2) \quad \text{as } n \to \infty . \qquad (A.9)$$
(ii) Suppose that $E|X_1|^{2+\delta} < \infty$ and $\sum_{n=1}^{\infty} [\alpha(n)]^{\delta/(2+\delta)} < \infty$ for some $\delta > 0$. Then (A.8) holds. If, in addition, $\sigma_\infty^2 > 0$, then (A.9) holds.
Appendix B
$$\big| E f(S_n) - \textstyle\int f \, d\Psi_{n,s} \big|$$
where $f_n(x) = f\big(x - n^{-1/2} \sum_{j=1}^{n} E Z_{jn}\big)$, $x \in \mathbb{R}^d$, and where $\Psi_{1,n,s}$ is obtained from $\Psi_{n,s}$ by replacing the cumulants of the $X_j$'s with those of the $Z_{jn}$'s. Since $S_{1n}$ may not have a density, it is customary to add to $S_{1n}$ a suitably small random vector that has a density and that is independent of $S_{1n}$. The additional noise introduced by this operation is then assessed using a smoothing inequality. Applying Corollary 11.2 (a smoothing inequality) and Lemma 11.6 (the inversion formula) of Bhattacharya and Rao (1986), as in their proof of Theorem 20.1, for any $0 < \epsilon < 1$, we get
$$\le M_s(f) \cdot C(s,d) \sum_{|\alpha|=0}^{s+d+1} \int \big| D^{\alpha} \hat{H}_n(t) \big| \exp\big(-\epsilon \|t\|^{1/2}\big)\, dt$$
with $\|\Xi\| \le \kappa$ and $\|\Xi^{-1}\| \le \kappa$. Let $\Xi_1 = \Xi^{-1/2}$ and $H_{1n}(t) = \hat{H}_n(\Xi_1 t)$, $t \in \mathbb{R}^d$. Then, it is easy to check that
$$n^{-1} \sum_{j=1}^{n} E\|Z_{jn}\|^{s+d+1} \le C(s,d)\, n^{(d+1)/2} \min\{\bar{\rho}_{n,s}, \tilde{\rho}_{n,s}\} ,$$
Next consider the case where $\|t\| > a_1 n$. Note that for $a > 0$,
(B.5)
and
(B.6)
$k \in \mathbb{N}$. Now using (B.5), (B.6), the definition of the polynomials $P_r(\cdot\,;\cdot)$ (cf. (6.17)), and Lemma 9.5 of Bhattacharya and Rao (1986), we get
$$\sum_{|\alpha|=0}^{s+d+1} \int_{\|t\| > a_1 n} \Big| D^{\alpha} \int \exp(\iota t' x)\, d\Psi_{1,n,s+d+1}(x) \Big|\, dt$$
$$\le C(s,d) \int_{\|t\| > a_1 n} \big[ 1 + \|t\|^{3(s+d-2)+(s+d+1)} \big] \Big[ \sum_{r=0}^{s+d+1} n^{-r/2} (1 + \rho_{n,s})\, n^{(r-s)_+/2} \Big] e^{-3\|t\|^2/8}\, dt$$
$$\le C(s,d) \Big[ \int_{a_1 n \le \|t\| \le a_2 n} \big(1 + \|t\|^{s+d+1}\big) \exp\big(-5\|t\|^2/24\big)\, dt \cdot \sum_{r=0}^{s} n^{-r/2} \prod_i E|Z_{j_i n}| \;+\; 1 \cdot \int_{\|t\| > c_4} \exp\big(-\epsilon \|t\|^{1/2}\big)\, dt \Big]$$
and
(B.10)
Theorem 6.1 now follows from the bounds (B.1), (B.2), (B.4), and (B.7)-(B.10).
Proof of Theorem 6.2: Note that by (6.26) and (6.27),
$$\rho_{n,s} = n^{-3/2} \sum_{j=1}^{n} E\|X_{n,j}\|^{s+1} \mathbb{1}\big(\|X_{n,j}\| \le n^{1/2}\big)
\le n^{-1} \sum_{j=1}^{n} E\|X_{n,j}\|^{s} \mathbb{1}\big(\|X_{n,j}\| > n^{1/2-\kappa}\big)
+ n^{-1/2} \cdot n^{-1} \sum_{j=1}^{n} E\|X_{n,j}\|^{s+1} \mathbb{1}\big(\|X_{n,j}\| \le n^{1/2-\kappa}\big) .$$
Hence, setting $\epsilon = \eta_n$ and applying (6.28) and Theorem 6.1, we get
$$\sup_{B \in \mathcal{B}} \big| P(S_n \in B) - \Psi_{n,s}(B) \big| = o\big(n^{-(s-2)/2}\big) + o\Big( \sup_{B \in \mathcal{B}} \omega(2\eta_n; \partial B, \Phi) \Big) .$$
Theorem 6.2 now follows by noting that $\omega(\epsilon; \partial B, \Phi) = \Phi\big((\partial B)^{\epsilon}\big)$ for all $\epsilon > 0$ (cf. Corollary 2.6, Bhattacharya and Rao (1986)).
References
Arcones, M. and Giné, E. (1989), 'The bootstrap of the mean with arbitrary bootstrap sample size', Annales de l'Institut Henri Poincaré 25, 457-481.
Arcones, M. and Yu, B. (1994), 'Central limit theorems for empirical and
U-processes of stationary mixing sequences', Journal of Theoretical
Probability 7, 47-70.
Athreya, K. B., Lahiri, S. N. and Wei, W. (1998), 'Inference for heavy tailed distributions', Journal of Statistical Planning and Inference 66, 61-75.
Beran, J. (1994), Statistics for Long Memory Processes, Chapman and Hall,
London.
Bickel, P., Götze, F. and van Zwet, W. (1997), 'Resampling fewer than n observations: Gains, losses, and remedies for losses', Statistica Sinica 7, 1-31.
Bretagnolle, J. (1983), 'Limit laws for the bootstrap of some special functionals (French)', Annales de l'Institut Henri Poincaré, Section B, Calcul des Probabilités et Statistique 19, 281-296.
Carlstein, E. (1986), 'The use of subseries methods for estimating the vari-
ance of a general statistic from a stationary time series', The Annals
of Statistics 14, 1171-1179.
Carlstein, E., Do, K.-A., Hall, P., Hesterberg, T. and Künsch, H. R. (1998), 'Matched-block bootstrap for dependent data', Bernoulli 4, 305-328.
Chernick, M. R. (1981a), 'A limit theorem for the maximum of autore-
gressive processes with uniform marginal distributions', The Annals
of Probability 9, 145-149.
Chernick, M. R. (1981b), 'On strong mixing and Leadbetter's D condition', Journal of Applied Probability 18, 764-769.
Chernick, M. R. (1999), Bootstrap Methods: A Practitioner's Guide, Wiley,
New York.
Chibisov, D. M. (1972), 'An asymptotic expansion for the distribution of a statistic admitting an asymptotic expansion', Theory of Probability and Its Applications 17, 620-630.
Choi, E. and Hall, P. (2000), 'Bootstrap confidence regions computed from autoregressions of arbitrary order', Journal of the Royal Statistical Society, Series B 62, 461-477.
Chow, Y. and Teicher, H. (1997), Probability Theory: Independence, Interchangeability, Martingales, 3rd edn, Springer-Verlag, New York.
Cressie, N. (1985), 'Fitting variogram models by weighted least squares',
Journal of the International Association for Mathematical Geology
17, 693-702.
Cressie, N. (1993), Statistics for Spatial Data, 2nd edn, Wiley, New York.
Dahlhaus, R. (1983), 'Spectral analysis with tapered data', Journal of Time Series Analysis 4, 163-175.
Dahlhaus, R. (1985), 'Asymptotic normality of spectral estimates', Journal of Multivariate Analysis 16, 412-431.
Dahlhaus, R. and Janas, D. (1996), 'A frequency domain bootstrap for ratio
statistics in time series analysis', The Annals of Statistics 24, 1934-
1963.
Datta, S. (1995), 'Limit theory and bootstrap for explosive and partially
explosive autoregression', Stochastic Processes and their Applications
57, 285-304.
Datta, S. (1996), 'On asymptotic properties of bootstrap for AR(1) processes', Journal of Statistical Planning and Inference 53, 361-374.
Giné, E. and Zinn, J. (1989), 'Necessary conditions for the bootstrap of the mean', The Annals of Statistics 17, 684-691.
Künsch, H. R. (1989), 'The jackknife and the bootstrap for general stationary observations', The Annals of Statistics 17, 1217-1261.
Lahiri, S. N. (1993b), 'On the moving block bootstrap under long range
dependence', Statistics and Probability Letters 18, 405-413.
Lahiri, S. N. (2002a), 'On the jackknife after bootstrap method for dependent data and its consistency properties', Econometric Theory 18, 79-98.
Politis, D. N. and Romano, J. P. (1992b), A circular block resampling procedure for stationary data, in R. LePage and L. Billard, eds, 'Exploring the Limits of Bootstrap', Wiley, New York, pp. 263-270.