
Springer Series in Statistics

Advisors:
P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg,
I. Olkin, N. Wermuth, S. Zeger

Springer Science+Business Media, LLC


Springer Series in Statistics
Andersen/Borgan/Gill/Keiding: Statistical Models Based on Counting Processes.
Atkinson/Riani: Robust Diagnostic Regression Analysis.
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Borg/Groenen: Modern Multidimensional Scaling: Theory and Applications
Brockwell/Davis: Time Series: Theory and Methods, 2nd edition.
Chan/Tong: Chaos: A Statistical Perspective.
Chen/Shao/Ibrahim: Monte Carlo Methods in Bayesian Computation.
David/Edwards: Annotated Readings in the History of Statistics.
Devroye/Lugosi: Combinatorial Methods in Density Estimation.
Efromovich: Nonparametric Curve Estimation: Methods, Theory, and Applications.
Eggermont/LaRiccia: Maximum Penalized Likelihood Estimation, Volume I:
Density Estimation.
Fahrmeir/Tutz: Multivariate Statistical Modelling Based on Generalized Linear
Models, 2nd edition.
Fan/Yao: Nonlinear Time Series: Nonparametric and Parametric Methods.
Farebrother: Fitting Linear Relationships: A History of the Calculus of Observations
1750-1900.
Federer: Statistical Design and Analysis for Intercropping Experiments, Volume I:
Two Crops.
Federer: Statistical Design and Analysis for Intercropping Experiments, Volume II:
Three or More Crops.
Ghosh/Ramamoorthi: Bayesian Nonparametrics.
Glaz/Naus/Wallenstein: Scan Statistics.
Good: Permutation Tests: A Practical Guide to Resampling Methods for Testing
Hypotheses, 2nd edition.
Gourieroux: ARCH Models and Financial Applications.
Gu: Smoothing Spline ANOVA Models.
Györfi/Kohler/Krzyzak/Walk: A Distribution-Free Theory of Nonparametric
Regression.
Haberman: Advanced Statistics, Volume I: Description of Populations.
Hall: The Bootstrap and Edgeworth Expansion.
Härdle: Smoothing Techniques: With Implementation in S.
Harrell: Regression Modeling Strategies: With Applications to Linear Models,
Logistic Regression, and Survival Analysis
Hart: Nonparametric Smoothing and Lack-of-Fit Tests.
Hastie/Tibshirani/Friedman: The Elements of Statistical Learning: Data Mining,
Inference, and Prediction
Hedayat/Sloane/Stufken: Orthogonal Arrays: Theory and Applications.
Heyde: Quasi-Likelihood and its Application: A General Approach to Optimal
Parameter Estimation.
Huet/Bouvier/Gruet/Jolivet: Statistical Tools for Nonlinear Regression: A Practical
Guide with S-PLUS Examples.
Ibrahim/Chen/Sinha: Bayesian Survival Analysis.
Jolliffe: Principal Component Analysis.

(continued after index)


S.N. Lahiri

Resampling Methods
for Dependent Data

With 25 Illustrations

Springer
S.N. Lahiri
Department of Statistics
Iowa State University
Ames, IA 50011-1212
USA

Library of Congress Cataloging-in-Publication Data


Lahiri, S.N.
Resampling methods for dependent data / S.N. Lahiri.
p. cm. - (Springer series in statistics)
Includes bibliographical references and index.
ISBN 978-1-4419-1848-2 ISBN 978-1-4757-3803-2 (eBook)
DOI 10.1007/978-1-4757-3803-2
1. Resampling (Statistics). I. Title. II. Series.
QA278.8.L344 2003
519.5'2-dc21 2003045455

ISBN 978-1-4419-1848-2 Printed on acid-free paper.

© 2003 Springer Science+Business Media New York


Originally published by Springer-Verlag New York, Inc. in 2003
Softcover reprint of the hardcover 1st edition 2003
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher Springer Science+Business Media, LLC,
except for brief excerpts in connection with reviews or scholarly analysis. Use
in connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.

9 8 7 6 5 4 3 2 1   SPIN 10922705

Typesetting: Pages created by the author using a Springer TEX macro package.

www.springer-ny.com
To my parents
Preface

This is a book on bootstrap and related resampling methods for temporal
and spatial data exhibiting various forms of dependence. Like the resam-
pling methods for independent data, these methods provide tools for sta-
tistical analysis of dependent data without requiring stringent structural
assumptions. This is an important aspect of the resampling methods in the
dependent case, as the problem of model misspecification is more preva-
lent under dependence and traditional statistical methods are often very
sensitive to deviations from model assumptions. Following the tremendous
success of Efron's (1979) bootstrap to provide answers to many complex
problems involving independent data and following Singh's (1981) example
on the inadequacy of the method under dependence, there have been several
attempts in the literature to extend the bootstrap method to the dependent
case. A breakthrough was achieved when resampling of single observations
was replaced with block resampling, an idea that was put forward by Hall
(1985), Carlstein (1986), Künsch (1989), Liu and Singh (1992), and others
in various forms and in different inference problems. There has been a vig-
orous development in the area of resampling methods for dependent data
since then and it is still an area of active research. This book describes
various aspects of the theory and methodology of resampling methods for
dependent data developed over the last two decades.
There are mainly two target audiences for the book, with the level of
exposition of the relevant parts tailored to each audience. The first five
chapters of the book are written in a pedantic way, giving full details of
the proofs of the theoretical results and step-by-step instructions for im-
plementation of the methodology. This part of the book, together with

selected material from the later chapters, can be used as a text for a grad-
uate level course. For the first part, familiarity with only basic concepts of
theoretical Statistics is assumed. In particular, no prior exposure to Time
Series is needed. The second part of the book (Chapters 6-12) is written
in the form of a research monograph, with frequent reference to the litera-
ture for the proofs and for further ramification of the topics covered. This
part is primarily intended for researchers in Statistics and Econometrics,
who are interested in learning about the recent advances in this area, or
interested in applying the methodology in their own research. A third po-
tential audience is the practitioners, who may go over the descriptions of
the resampling methods and the worked out numerical examples, but skip
the proofs and other technical discussions. Many of the results presented in
the book are from preprints of papers and are yet to appear in a published
medium. Furthermore, some (potential) open problems have been pointed
out.
Chapter 1 gives a brief description of the "bootstrap principle" and ad-
vocates resampling methods, at a heuristic level, as general methods for
estimating what are called "level-2" (and "higher-level") parameters in the
book. Chapter 2 sketches the historical development of bootstrap methods
since Efron's (1979) seminal work and describes various types of bootstrap
methods that have been proposed in the context of dependent (temporal)
data. Chapter 3 establishes consistency of various block bootstrap meth-
ods for estimating the variance and the distribution function of the sample
mean. Chapter 4 extends these results to general classes of statistics, in-
cluding M-estimators and differentiable statistical functionals, and gives
a number of numerical examples. Chapter 5 starts with a numerical com-
parison of different block bootstrap methods and follows it up with some
theoretical results. Chapter 6 deals with Edgeworth expansions and second-
order properties of block bootstrap methods for normalized and studentized
statistics under dependence. Chapter 7 addresses the important problem
of selecting the optimal block size empirically. Chapter 8 treats bootstrap
based on independent and identically distributed innovations in popular
time series models, such as the autoregressive processes. Chapter 9 deals
with the frequency domain bootstrap. Chapter 10 describes properties of
block bootstrap and subsampling methods for a class of long-range depen-
dent processes. Chapter 11 treats two special topics - viz., extremums of
dependent random variables and sums of heavy-tailed dependent random
variables. As in the independent case, here the block bootstrap fails if the
resample size equals the sample size. A description of the random limit is
given in these problems, but the proofs are omitted. Chapter 12 consid-
ers resampling methods for spatial data under different spatial sampling
designs. It also treats the problem of spatial prediction using resampling
methods. A list of important definitions and technical results is given in
Appendix A, which a reader may consult to refresh his or her memory.

I am grateful to my colleagues, coauthors, and teachers, A. Bose, K.B.
Athreya, G.J. Babu, N. Cressie, A. C. Davison, P. Hall, J. Horowitz, D.
Isaacson, B. Y. Jing, H. Koul, D. Politis, and A. Young for their interest,
encouragement, and constructive suggestions at various stages of writing
the book. Special thanks are due to K. Furukawa for help with the nu-
merical examples and to D. Nordman for carefully going over parts of the
manuscript. I also thank J. Fukuchi, Y. D. Lee, S. Sun, and J. Zhu who
have enriched my research on the topic as students at various time points.
I thank my wife for her moral support and understanding. Many thanks go
to Sharon Shepard for converting my scribblings into a typed manuscript
with extraordinary accuracy and consistency. I also thank Springer's Ed-
itor, John Kimmel, for his patience and good humor over the long time
period of this project. I gratefully acknowledge the continuous support of
the National Science Foundation for my research work in this area.
Contents

1 Scope of Resampling Methods for Dependent Data 1


1.1 The Bootstrap Principle 1
1.2 Examples . . . . 7
1.3 Concluding Remarks 12
1.4 Notation..... 13

2 Bootstrap Methods 17
2.1 Introduction...................... 17
2.2 IID Bootstrap.. . . . . . . . . . . . . . . . . . . 17
2.3 Inadequacy of IID Bootstrap for Dependent Data. 21
2.4 Bootstrap Based on IID Innovations 23
2.5 Moving Block Bootstrap . . . . . 25
2.6 Nonoverlapping Block Bootstrap 30
2.7 Generalized Block Bootstrap .. 31
2.7.1 Circular Block Bootstrap 33
2.7.2 Stationary Block Bootstrap 34
2.8 Subsampling . . . . . . . . . . . 37
2.9 Transformation-Based Bootstrap 40
2.10 Sieve Bootstrap. . . . . . . . . 41

3 Properties of Block Bootstrap Methods for the Sample


Mean 45
3.1 Introduction..................... 45
3.2 Consistency of MBB, NBB, CBB: Sample Mean. 47

3.2.1 Consistency of Bootstrap Variance Estimators 48


3.2.2 Consistency of Distribution Function Estimators 54
3.3 Consistency of the SB: Sample Mean . . . . . . . . . . . 57
3.3.1 Consistency of SB Variance Estimators . . . . . 57
3.3.2 Consistency of SB Distribution Function Estimators 63

4 Extensions and Examples 73


4.1 Introduction . . . . . . . 73
4.2 Smooth Functions of Means 73
4.3 M-Estimators . . . . . . . . 81
4.4 Differentiable Functionals . 90
4.4.1 Bootstrapping the Empirical Process. 92
4.4.2 Consistency of the MBB for Differentiable
Statistical Functionals 94
4.5 Examples . . . . . . . . . . . . . . . . . . 99

5 Comparison of Block Bootstrap Methods 115


5.1 Introduction . . . . . . . . . 115
5.2 Empirical Comparisons. . . 116
5.3 The Theoretical Framework 118
5.4 Expansions for the MSEs . 120
5.5 Theoretical Comparisons. . 123
5.5.1 Asymptotic Efficiency 123
5.5.2 Comparison at Optimal Block Lengths. 124
5.6 Concluding Remarks . . . . . . . . . . . . . . . 126
5.7 Proofs . . . . . . . . . . . . . . . . . . . . . . . 127
5.7.1 Proofs of Theorems 5.1-5.2 for the MBB, the NBB,
and the CBB . . . . . . . . . . . . . . 128
5.7.2 Proofs of Theorems 5.1-5.2 for the SB 135

6 Second-Order Properties 145


6.1 Introduction . . . . . . . 145
6.2 Edgeworth Expansions for the Mean Under Independence 147
6.3 Edgeworth Expansions for the Mean Under Dependence 154
6.4 Expansions for Functions of Sample Means . . . . . . 160
6.4.1 Expansions Under the Smooth Function Model
Under Independence . . . . . . . . . . . . . 160
6.4.2 Expansions for Normalized and Studentized
Statistics Under Independence . . . . . . . . . . . . 163
6.4.3 Expansions for Normalized Statistics Under
Dependence . . . . . . . . . . . . . . . . . . 164
6.4.4 Expansions for Studentized Statistics Under
Dependence . . . . . . . . . . . . . . . . . . . 166
6.5 Second-Order Properties of Block Bootstrap Methods 168

7 Empirical Choice of the Block Size 175


7.1 Introduction........................ 175
7.2 Theoretical Optimal Block Lengths. . . . . . . . . . . 175
7.2.1 Optimal Block Lengths for Bias and Variance
Estimation . . . . . . . . . . . . . . . . . . . . 177
7.2.2 Optimal Block Lengths for Distribution Function
Estimation . . . . . . . . . 179
7.3 A Method Based on Subsampling . 182
7.4 A Nonparametric Plug-in Method. 186
7.4.1 Motivation . . . . . . . . . 187
7.4.2 The Bias Estimator . . . . 188
7.4.3 The JAB Variance Estimator 189
7.4.4 The Optimal Block Length Estimator 193

8 Model-Based Bootstrap 199


8.1 Introduction . . . . . . 199
8.2 Bootstrapping Stationary Autoregressive Processes 200
8.3 Bootstrapping Explosive Autoregressive Processes 205
8.4 Bootstrapping Unstable Autoregressive Processes 209
8.5 Bootstrapping a Stationary ARMA Process 214

9 Frequency Domain Bootstrap 221


9.1 Introduction.................. 221
9.2 Bootstrapping Ratio Statistics . . . . . . . 222
9.2.1 Spectral Means and Ratio Statistics 222
9.2.2 Frequency Domain Bootstrap for Ratio Statistics 224
9.2.3 Second-Order Correctness of the FDB . . . . . . 226
9.3 Bootstrapping Spectral Density Estimators . . . . . . . 228
9.3.1 Frequency Domain Bootstrap for Spectral Density
Estimation . . . . . . . . . . . . . . . . . . . . 229
9.3.2 Consistency of the FDB Distribution Function
Estimator . . . . . . 231
9.3.3 Bandwidth Selection 233
9.4 A Modified FDB . . . . . . 235
9.4.1 Motivation . . . . . 236
9.4.2 The Autoregressive-Aided FDB . 237

10 Long-Range Dependence 241


10.1 Introduction . . . . . . . . . . . . . . . . . . . 241
10.2 A Class of Long-Range Dependent Processes 242
10.3 Properties of the MBB Method 244
10.3.1 Main Results . . . . . . . . . . . 244
10.3.2 Proofs . . . . . . . . . . . . . . . 246
10.4 Properties of the Subsampling Method. 251
10.4.1 Results on the Normalized Sample Mean. 252

10.4.2 Results on the Studentized Sample Mean 253


10.4.3 Proofs . . . 255
10.5 Numerical Results . . . . . . . . . . . . . . . . . 257

11 Bootstrapping Heavy-Tailed Data and Extremes 261


11.1 Introduction. . . . . . . . . 261
11.2 Heavy-Tailed Distributions 262
11.3 Consistency of the MBB . . 265
11.4 Invalidity of the MBB . . . 268
11.5 Extremes of Stationary Random Variables 271
11.6 Results on Bootstrapping Extremes. . . . 274
11.7 Bootstrapping Extremes With Estimated Constants 277

12 Resampling Methods for Spatial Data 281


12.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . 281
12.2 Spatial Asymptotic Frameworks. . . . . . . . . . . . 282
12.3 Block Bootstrap for Spatial Data on a Regular Grid 283
12.3.1 Description of the Block Bootstrap Method . 284
12.3.2 Numerical Examples . . . . . . . . . . . . . . 288
12.3.3 Consistency of Bootstrap Variance Estimators 292
12.3.4 Results on the Empirical Distribution Function 301
12.3.5 Differentiable Functionals . . . . . . . 304
12.4 Estimation of Spatial Covariance Parameters 307
12.4.1 The Variogram . . . . . . . . . . . . 307
12.4.2 Least Squares Variogram Estimation 308
12.4.3 The RGLS Method. . . . . . . . . . 310
12.4.4 Properties of the RGLS Estimators. 312
12.4.5 Numerical Examples . . . . . . . . . 315
12.5 Bootstrap for Irregularly Spaced Spatial Data 319
12.5.1 A Class of Spatial Stochastic Designs 319
12.5.2 Asymptotic Distribution of M-Estimators 320
12.5.3 A Spatial Block Bootstrap Method . . . . 323
12.5.4 Properties of the Spatial Bootstrap Method 325
12.6 Resampling Methods for Spatial Prediction 328
12.6.1 Prediction of Integrals . . . 328
12.6.2 Prediction of Point Values . . . . . . 335

A 339

B 345

References 349

Author Index 367

Subject Index 371


1
Scope of Resampling Methods for
Dependent Data

1.1 The Bootstrap Principle


The bootstrap is a computer-intensive method that provides answers to
a large class of statistical inference problems without stringent structural
assumptions on the underlying random process generating the data. Since
its introduction by Efron (1979), the bootstrap has found its application
to a number of statistical problems, including many standard ones, where
it has outperformed the existing methodology as well as to many com-
plex problems where conventional approaches failed to provide satisfactory
answers. However, it is not a panacea for every problem of statistical infer-
ence, nor does it apply equally effectively to every type of random process
in its simplest form. In this monograph, we shall consider certain classes
of dependent processes and point out situations where different types of
bootstrap methods can be applied effectively, and also look at situations
where these methods run into problems, and point out possible remedies, if
any are known.
The bootstrap and other resampling methods typically apply to statistical inference problems involving what we call level-2 (and higher-level) parameters of the underlying random process. Let X_1, X_2, ... be a sequence of random variables with joint distribution P. Suppose that the data at hand can be modeled as a realization of the first n random variables {X_1, ..., X_n} ≡ 𝒳_n. Also suppose that θ ≡ θ(P) is a real-valued (say) parameter of interest, which depends on the unknown joint distribution of the sequence X_1, X_2, ... .
A common problem of statistical inference is to define a (point) estimator of θ based on the observations 𝒳_n. Many standard and general methods for finding estimators of θ are typically available, such as those based on likelihood theory (maximum likelihood, quasi-likelihood), estimating equations (M-estimators), and nonparametric smoothing (kernel estimators), depending on the form of the parameter θ. Suppose that θ̂_n is an estimator of θ based on 𝒳_n. Having chosen θ̂_n as an estimator, the statistician needs to answer further questions regarding the accuracy of the estimator θ̂_n or about the quality of inference based on θ̂_n. Let G_n denote the sampling distribution of the centered estimator θ̂_n − θ. Because the joint distribution of X_1, ..., X_n is unknown, G_n also typically remains unknown. Thus, quantities like the mean squared error of θ̂_n, viz. MSE(θ̂_n) = ∫ x² dG_n(x), and the quantiles of θ̂_n, are unknown population quantities based on the sampling distribution G_n. We call parameters like θ level-1 parameters, and parameters like MSE(θ̂_n), which relate to the sampling distribution of an estimator of a level-1 parameter, level-2 parameters. Bootstrap and other resampling methods can be regarded as general methods for finding estimators of level-2 parameters. In the same vein, functionals related to the sampling distribution of an estimator of a level-2 parameter are level-3 parameters, and so on. For estimating such higher-level parameters, one may use a suitable number of iterations of the bootstrap or may successively apply a combination of more than one resampling method, e.g., the Jackknife-After-Bootstrap method of Efron (1992) (see Example 1.3 below).
The basic principle underlying the bootstrap method in various settings and in all its different forms is a simple one; it attempts to recreate the relation between the "population" and the "sample" by considering the sample as an epitome of the underlying population and, by resampling from it (suitably), to generate the "bootstrap sample", which serves as an analog of the given sample. If the resampling mechanism is chosen appropriately, then the "resample," together with the sample at hand, is expected to reflect the original relation between the population and the sample. The advantage derived from this exercise is that the statistician can now avoid the problem of having to deal with the unknown "population" directly, and instead, use the "sample" and the "resamples," which are either known or have known distributions, to address questions of statistical inference regarding the unknown population quantities. This (bootstrap) principle is most transparent in the case where X_1, ..., X_n are independent and identically distributed (iid) random variables. First we describe the principle for iid random variables, and then describe it for dependent variables.
Suppose, for now, that X_1, ..., X_n are iid random variables with common distribution F. Then, the joint distribution of X_1, ..., X_n is given by P_n = F^n, the n-fold product of F. The level-1 parameter θ is now completely specified by a functional of the underlying marginal distribution F. Hence, suppose that θ = θ(F).
Let θ̂_n = t(X_1, ..., X_n) be an estimator of θ. Suppose that we are interested in estimating some population characteristic, such as the mean squared error (MSE) of θ̂_n. It is clear that the sampling distribution of the centered estimator θ̂_n − θ and, hence, the MSE of θ̂_n, depend on the population distribution function F, which is itself unknown. Note that in the present context, we would know F if we could observe all (potential) members of the underlying population from which the sample X_1, ..., X_n was drawn. For example, if X_i denoted the hexamine content of the ith pallet produced by a palletizing machine under identical production conditions, then we would know F and the distribution of random variables like θ̂_n − θ if all possible pallets were produced using the machine. This may not be possible in a given span of time or may not even be realistically achievable since the long-run performance of the machine is subject to physical laws of deterioration, violating the "identical"-assumption on the resulting observations.
The bootstrap principle addresses this problem without requiring the full knowledge of the population. The first step involves constructing an estimator F̂_n, say, of F from the available observations X_1, ..., X_n, which presumably provides a representative picture of the population and plays the role of F. The next step involves generating iid random variables X*_1, ..., X*_n from the estimator F̂_n (conditional on the observations 𝒳_n), which serve the role of the "sample" for the bootstrap version of the original problem. Thus, the "bootstrap version" of the estimator θ̂_n based on the original sample X_1, ..., X_n is given by θ̂*_n, obtained by replacing X_1, ..., X_n with X*_1, ..., X*_n, and the "bootstrap version" of the level-1 parameter θ = θ(F) based on the population distribution function F is given by θ(F̂_n). Note that the bootstrap versions of both the population parameter θ and the sample-based estimator θ̂_n can be defined using the knowledge of the sample X_1, ..., X_n only. For a reasonable choice of F̂_n, the bootstrap version accurately mimics those characteristics of the population and the sample that determine the sampling distribution of variables like θ̂_n − θ. As a result, the bootstrap principle serves as a general method for estimating level-2 parameters related to the unknown distribution of θ̂_n − θ.
Specifically, the bootstrap estimator of the unknown sampling distribution G_n of the random variable θ̂_n − θ is given by the conditional distribution Ĝ_n, say, of its bootstrap version θ̂*_n − θ(F̂_n). And the bootstrap estimator of the level-2 parameter φ_n ≡ φ(G_n), derived through a functional φ(·) of G_n, is simply given by the "plug-in" estimator φ̂_n ≡ φ(Ĝ_n). For example, the bootstrap estimators of the bias and MSE of θ̂_n (i.e., the first two moments of θ̂_n − θ), are respectively given by

    Bias*_n ≡ E_*(θ̂*_n) − θ(F̂_n)   and   MSE*_n ≡ E_*{θ̂*_n − θ(F̂_n)}²,

where E_* denotes the conditional expectation given X_1, ..., X_n. Note that these formulas are valid for any estimator θ̂_n, and do not presuppose any specific form.
The most common choice of F̂_n is the empirical distribution function F̂_n(·) ≡ n⁻¹ Σ_{i=1}^n 1(X_i ≤ ·), where 1(A) denotes the indicator function of a statement A, and it takes the values 0 or 1 according as the statement A is false or true. In this case, the bootstrap random variables X*_1, ..., X*_n represent a simple random sample drawn with replacement from the observed sample X_1, ..., X_n, and as a result, can assume only the n data values observed, thereby justifying the name "resample." There are other situations, e.g., the parametric bootstrap, where X*_1, ..., X*_n are generated by a different estimated distribution and may take values other than the observed values X_1, ..., X_n. However, the basic characteristics of the population are captured through the estimated distribution in both cases and are reflected in the respective bootstrap observations X*_1, ..., X*_n. Thus, in spite of such variations due to different choices of F̂_n, all these bootstrap methods fall under the general bootstrap principle described above.
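To make the plug-in recipe concrete, the following is a minimal sketch (not from the book) of how the bootstrap bias and MSE estimators can be approximated by Monte-Carlo resampling from the empirical distribution F̂_n; the choice of the sample median as θ̂_n and all function names are illustrative.

```python
import numpy as np

def iid_bootstrap_bias_mse(x, estimator, B=2000, rng=None):
    """Monte-Carlo approximation to the bootstrap bias and MSE estimators.

    theta(F_hat_n) is taken to be estimator(x), i.e., the plug-in value of the
    level-1 parameter under the empirical distribution F_hat_n.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    theta_plug = estimator(x)                              # theta(F_hat_n)
    reps = np.empty(B)
    for j in range(B):
        resample = rng.choice(x, size=n, replace=True)     # X_1*, ..., X_n* iid from F_hat_n
        reps[j] = estimator(resample)                      # bootstrap version theta_n*
    bias_hat = reps.mean() - theta_plug                    # E_*(theta_n*) - theta(F_hat_n)
    mse_hat = np.mean((reps - theta_plug) ** 2)            # E_*{theta_n* - theta(F_hat_n)}^2
    return bias_hat, mse_hat

# Example: bootstrap bias and MSE of the sample median for an exponential sample
x = np.random.default_rng(1).exponential(size=50)
print(iid_bootstrap_bias_mse(x, np.median))
```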
The same can be said about the bootstrap methods that have been proposed in the context of dependent data. Here, the situation is slightly more complicated because the "population" is not characterized entirely by the one-dimensional marginal distribution F alone, but requires the knowledge of the joint distribution of the whole sequence X_1, X_2, ... . Nonetheless, the basic principle still applies. Here, we consider the block bootstrap methods that are most commonly used in the context of general time series data, and show that these methods fall within the ambit of the bootstrap principle. For simplicity, we restrict the discussion to the case of the nonoverlapping block bootstrap (NBB) method (cf. Carlstein (1986)). The principle behind other block bootstrap methods presented in Chapter 2 can be explained by straightforward modification of our discussion below.
Suppose that X_1, X_2, ... is a sequence of stationary and weakly dependent random variables such that the series of the autocovariances of the X_i's converges absolutely. To fix ideas, first consider the case where the level-1 parameter θ of interest is the population mean, i.e., θ = E(X_1). If we use θ̂_n ≡ X̄_n, the sample mean, as an estimator of θ, then the distribution of θ̂_n − θ depends not only on the marginal distribution of X_1, but is a functional of the joint distribution of X_1, ..., X_n. For example, Var(θ̂_n) = n⁻¹[Var(X_1) + 2 Σ_{i=1}^{n−1} (1 − i/n) Cov(X_1, X_{1+i})] depends on the bivariate distribution of X_1 and X_i for all 1 ≤ i ≤ n. Note that since the process {X_n}_{n≥1} is assumed to be weakly dependent, the main contribution to Var(θ̂_n) comes from the lower-order lag autocovariances and the total contribution from higher-order lags is negligible. More specifically, if ℓ is a large positive integer (but smaller than n), then the total contribution to Var(θ̂_n) from lags of order ℓ or more, viz., Σ_{i=ℓ}^{n−1} (1 − i/n) Cov(X_1, X_{1+i}), is bounded in absolute value by Σ_{i≥ℓ} |Cov(X_1, X_{1+i})|, which tends to zero
as ℓ goes to infinity. As a consequence, accurate approximations for the level-2 parameter Var(θ̂_n) can be generated from the knowledge of the lag covariances Cov(X_1, X_{1+i}), 0 ≤ i < ℓ, which depend on the joint distribution of the shorter series {X_1, ..., X_ℓ} of the given sequence of observations {X_1, ..., X_n}.
The block bootstrap methods exploit this fact to recreate the relation between the "population" and the "sample," in a way similar to the iid case. Suppose that we are interested in approximating the sampling distribution of a random variable of the form T_n = t_n(𝒳_n; θ), where θ ≡ θ(P) is a level-1 parameter based on the joint distribution P of X_1, X_2, ..., and t_n(·; θ) is invariant under permutations of X_1, ..., X_n. For example, we may have T_n = n^{1/2}(X̄_n − μ), with θ = μ ≡ EX_1. Next, suppose that ℓ is an integer such that both ℓ and n/ℓ are large, e.g., ℓ = ⌊n^δ⌋ for some 0 < δ < 1, where for any real number x, ⌊x⌋ denotes the largest integer not exceeding x. Also, for simplicity, suppose that b ≡ n/ℓ is an integer. Then, for the NBB method, the given sequence of observations {X_1, ..., X_n} is partitioned into b subseries or "blocks" {X_1, ..., X_ℓ}, {X_{ℓ+1}, ..., X_{2ℓ}}, ..., {X_{(b−1)ℓ+1}, ..., X_n} of length ℓ, and a set of b blocks are resampled from these observed blocks to generate the bootstrap sample X*_1, ..., X*_n. The NBB version of T_n is defined as T*_n ≡ t_n(𝒳*_n; θ̂_n), where 𝒳*_n ≡ {X*_1, ..., X*_n} and θ̂_n is an estimator of θ based on the conditional distribution of X*_1, ..., X*_n.
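The NBB resampling step just described can be sketched in a few lines of code (an illustrative sketch, not the author's implementation); as in the text, it assumes that n is a multiple of the block length ℓ.

```python
import numpy as np

def nbb_resample(x, block_length, rng=None):
    """One nonoverlapping-block-bootstrap (NBB) resample of the series x.

    Assumes n is a multiple of block_length, so that b = n / block_length
    blocks of equal length partition the series, as in the text.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    b = len(x) // block_length
    blocks = x[: b * block_length].reshape(b, block_length)   # Y_1, ..., Y_b
    idx = rng.integers(0, b, size=b)                          # b blocks drawn with replacement
    return blocks[idx].ravel()                                # X_1*, ..., X_n*
```

Drawing B such resamples and evaluating t_n(·; θ̂_n) on each yields the Monte-Carlo approximation used in (1.2) below.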
Again, the bootstrap principle underlying the NBB attempts to recreate the relation between the population and the sample, although in a slightly different way. Let P_k denote the joint distribution of (X_1, ..., X_k), k ≥ 1, and let Y_1, ..., Y_b denote the b blocks under the NBB, defined by Y_1 = {X_1, ..., X_ℓ}, ..., Y_b = {X_{(b−1)ℓ+1}, ..., X_n}. Note that because of stationarity, each block has the same (ℓ-dimensional joint) distribution P_ℓ. Furthermore, because of the weak dependence of the original sequence {X_n}_{n≥1}, these blocks are approximately independent for large values of ℓ. Thus, Y_1, ..., Y_b gives us a collection of "approximately independent" and "identically distributed" random vectors with common distribution P_ℓ. By resampling from the collection Y_1, ..., Y_b randomly, the "block" bootstrap method described above actually reproduces the relation between the sample {X_1, ..., X_n} and the "approximate" population distribution P̃_n ≡ P_ℓ ⊗ ··· ⊗ P_ℓ (b factors), which is "close" to the exact population distribution P_n because of the weak-dependence assumption on {X_n}_{n≥1}. Indeed, if P̂_ℓ denotes the empirical distribution of Y_1, ..., Y_b, then the joint distribution of the bootstrap observations X*_1, ..., X*_n under the NBB is given by P̂_ℓ^b ≡ P̂_ℓ ⊗ ··· ⊗ P̂_ℓ. Thus, the "resampling population" distribution P̂_ℓ^b is close to P̃_n, which is in turn close to the true underlying population P_n. Hence, the relation between {X_1, ..., X_n} and P_n in the original problem is reproduced (approximately) by the relation between {X*_1, ..., X*_n} and P̂_ℓ^b under the NBB. Let ℒ(W; Q) denote the probability distribution of a random quantity W under a probability measure Q.
For the random quantity T_n = t_n(𝒳_n; θ_n(P_n)) of interest, the approximations involved in application of the bootstrap principle may be summarized by the following description:

    ℒ(T_n; P_n) = P_n(t_n(𝒳_n; θ_n(P_n)) ∈ ·)
                ≈ P̃_n(t_n(𝒳_n; θ_n(P̃_n)) ∈ ·)
                ≈ P̂_ℓ^b(t_n(𝒳*_n; θ_n(P̂_ℓ^b)) ∈ ·)
                = ℒ(T*_n; P̂_ℓ^b).     (1.1)

The justification of the first approximation follows from the weak-dependence assumption on the original process {X_n}_{n≥1}. The second approximation rests on the bootstrap principle, which says that the relation 𝒳_n : P_n is reproduced in the relation 𝒳*_n : P̂_ℓ^b, so that for nice functions t_n(·; ·) and θ_n(·), the distribution of T_n under P_n is "close" to the distribution of T*_n under P̂_ℓ^b. However, it should be remembered that this is just a heuristic argument, which requires further qualification of the quality and validity of the above approximations in each specific problem. In the subsequent chapters of this book, we look at different situations where this simple principle provides a correct solution, often outperforming alternative traditional inference methods. We also consider certain pathological cases where the simple principle, if applied naively, fails drastically.
Before we look at some illustrative examples, it is worthwhile to point out the important role played by the computer in the implementation of the principle. Although the bootstrap principle in (1.1) prescribes an estimator (i.e., a function of the data) of the unknown population quantity ℒ(T_n; P_n) under a very general framework, it does not spell out any general method for evaluating this estimator. An exact computation of the bootstrap estimator is impractical, if not impossible, in most situations. This is due to the fact that for a given set 𝒳_n of n data values, the number of possible values of the NBB observations 𝒳*_n is b^b, which grows to infinity very fast. For example, if ℓ = ⌊n^δ⌋ for some 0 < δ < 1, then this number grows as fast as n^{(1−δ)n^{1−δ}} as n goes to infinity. For δ = 1/3, this is comparable to 1.8 × 10^{19} for n = 64 and to 10^{200} for n = 1000. This problem is tackled by using a further Monte-Carlo approximation to the last step in (1.1). It is precisely in this step where the computer plays an indispensable role. Let 𝒳*_n(j), j = 1, ..., B, be iid copies of 𝒳*_n having the common distribution P̂_ℓ^b, and let T*_n(j) = t_n(𝒳*_n(j); θ_n(P̂_ℓ^b)), j = 1, ..., B, denote the "bootstrap replicates" of T*_n. Then, by the strong law of large numbers (or the Glivenko-Cantelli Theorem),

    B⁻¹ Σ_{j=1}^B 1(T*_n(j) ∈ ·) ≈ P̂_ℓ^b(T*_n ∈ ·),     (1.2)

if B is large. Therefore, the computer can be used to obtain approximations for P̂_ℓ^b(T*_n ∈ ·) to any degree of accuracy by generating a sufficiently large number of 𝒳*_n(j)'s by block-resampling from the given set 𝒳_n of observations and by computing the bootstrap replicates T*_n(j)'s.
In the next section, we look at some illustrative examples involving de-
pendent data where the bootstrap principle can be applied effectively.

1.2 Examples
The first example illustrates some of the basic usage of the bootstrap
method with a simulated data set.

Example 1.1: Suppose that the observations X_1, ..., X_n are generated by a stationary ARMA(1,1) process:

(1.3)

where |α₁| < 1, |β₁| < 1 are constants and {ε_i}_{i∈ℤ} is a sequence of iid random variables with Eε₁ = 0, Eε₁² = 1. Figure 1.1 shows a simulated data set of size n = 100 from the ARMA(1,1) process in (1.3) with α₁ = 0.2, β₁ = 0.3 and with Gaussian variables ε_i. Suppose we are interested in estimating the variance and certain other population characteristics of the sample mean X̄_n = n⁻¹ Σ_{i=1}^n X_i (which, according to our terminology, are level-2 parameters). For the sake of illustration, suppose that we decided to use the NBB method with block length ℓ = 5, say. Then we first form the blocks B_1 = (X_1, ..., X_5), B_2 = (X_6, ..., X_10), ..., B_20 = (X_96, ..., X_100), and then resample b = 20 blocks B*_1, ..., B*_20 with replacement from {B_1, ..., B_20} to generate the block bootstrap observations X*_1, ..., X*_100.

FIGURE 1.1. A simulated data set of size n = 100 from model (1.3) with β₁ = 0.2 and α₁ = 0.3.

Example 1.1.1: Variance Estimation

First, we consider estimating the level-2 parameter φ_n = Var(X̄_n). As described before, the bootstrap estimator of Var(X̄_n) is given by

    V̂ar(X̄_n) ≡ Var_*(X̄*_n) = E_*{X̄*_n − E_*(X̄*_n)}²,     (1.4)
where X̄*_n = n⁻¹ Σ_{i=1}^n X*_i and n = 100. Unlike most problems, this is a special case where the bootstrap estimator V̂ar(X̄_n) can be evaluated directly, without any recourse to Monte-Carlo simulation. It will be shown in Chapter 3 that we can express the bootstrap estimator as

    V̂ar(X̄_n) = b⁻² Σ_{j=1}^b (Ū_j − X̄_n)²,     (1.5)

where Ū_j denotes the mean of the jth block. Thus, for our example, Ū_1 = (X_1 + ··· + X_5)/5, Ū_2 = (X_6 + ··· + X_10)/5, ..., etc. Applying (1.5) to the data set of Figure 1.1, we obtain V̂ar(X̄_n) = 8.77 × 10⁻³ as an estimate of Var(X̄_n). This should be compared to the true value of Var(X̄_n), which is 11.74 × 10⁻³ under the assumed specification of model (1.3).
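The following sketch evaluates this simulation-free estimator directly from the block means; it is illustrative code based on the reconstruction of (1.5) given above, with all names hypothetical.

```python
import numpy as np

def nbb_variance_estimator(x, block_length):
    """Simulation-free NBB estimate of Var(Xbar_n) via the block means U_1, ..., U_b."""
    x = np.asarray(x)
    b = len(x) // block_length
    block_means = x[: b * block_length].reshape(b, block_length).mean(axis=1)
    return np.sum((block_means - x.mean()) ** 2) / b**2

# In the setting of Example 1.1 one would call nbb_variance_estimator(x, 5)
# on the simulated series x of length 100.
```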

Example 1.1.2: Distribution Function Estimation


Next we consider estimating the sampling distribution of the studentized sample mean T_n = √n(X̄_n − μ)/σ̂_n, where μ = EX_1 (which is zero under the model (1.3)) and σ̂_n² is an estimator of the (asymptotic) variance of √n X̄_n. For definiteness, suppose that σ̂_n² = n V̂ar(X̄_n), where V̂ar(X̄_n) is given by (1.5). In principle, to obtain the block bootstrap estimator of G_n(x) ≡ P(T_n ≤ x), x ∈ ℝ, we have to form the bootstrap version T*_n = √n(X̄*_n − E_*X̄*_n)/σ̂*_n of T_n and estimate G_n(x) by the conditional distribution function Ĝ_n(x) ≡ P_*(T*_n ≤ x), x ∈ ℝ. Here, the bootstrap variable σ̂*_n² is defined by replacing X_1, ..., X_n in σ̂_n² with X*_1, ..., X*_n. However, unlike the block bootstrap estimator of Var(X̄_n) in (1.5), an explicit, simple expression for Ĝ_n(x) is not available. For evaluating the block bootstrap estimator Ĝ_n(x) for the given data set, we used Monte-Carlo simulation as follows. From the "observed" blocks B_1, ..., B_20, formed using the data set in Figure 1.1, we repeated the resampling step (cf. (1.2)) B = 2000 times to generate 2000 sets of bootstrap observations {X*_1^(1), ..., X*_100^(1)}, ..., {X*_1^(2000), ..., X*_100^(2000)}. Here, the value B = 2000 is chosen arbitrarily for the purpose of illustration. A different value may be used, if necessary, depending on the desired level of accuracy. Next, we computed T*_n for each set of 100 bootstrap observations. The histogram of these 2000 values of T*_n is shown in Figure 1.2 below. The block bootstrap estimator of P(T_n ≤ x) is now given by the proportion of T*_n-values that are less than or equal to x. For example, for x = 1.2, this yields 0.8755 as an estimate of P(T_n ≤ 1.2). Similarly, we obtain 0.3170 and 0.4965 as estimates of P(T_n ≤ −0.5) and P(T_n ≤ 0), respectively, from the histogram of T*_n-values.
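A sketch of the Monte-Carlo scheme just described (illustrative code, not the author's): it generates B NBB resamples, studentizes each resampled mean with the bootstrap analogue of (1.5), and returns the replicates whose empirical distribution estimates G_n.

```python
import numpy as np

def nbb_studentized_mean_replicates(x, block_length, B=2000, rng=None):
    """B bootstrap replicates T_n* of the studentized sample mean under the NBB."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    b = n // block_length
    blocks = x[: b * block_length].reshape(b, block_length)

    def var_hat(series):
        # the NBB variance estimator of (1.5), applied to a series of length n = b*l
        bm = series.reshape(b, block_length).mean(axis=1)
        return np.sum((bm - series.mean()) ** 2) / b**2

    reps = np.empty(B)
    for j in range(B):
        xs = blocks[rng.integers(0, b, size=b)].ravel()     # NBB resample X_1*, ..., X_n*
        sigma_star = np.sqrt(n * var_hat(xs))               # bootstrap version of sigma_hat_n
        reps[j] = np.sqrt(n) * (xs.mean() - x.mean()) / sigma_star
    return reps

# The estimate of P(T_n <= 1.2) is then np.mean(reps <= 1.2).
```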

Example 1.1.3: Estimation of Critical Values


FIGURE 1.2. Histogram of T*_n-values based on B = 2000 block bootstrap replicates.

If, instead of the distribution function of T_n, it is of interest to find certain quantiles of T_n, bootstrap estimates of these quantiles can also be found from the same Monte-Carlo simulation. For 0 < α < 1, let t_α denote the (smallest) α-quantile of T_n, defined by

    t_α = inf{x : P(T_n ≤ x) ≥ α}.     (1.6)

Then, by the "plug-in principle," the bootstrap estimator of the level-2 parameter t_α = G_n⁻¹(α) is given by t̂_α = Ĝ_n⁻¹(α), the α-quantile of the conditional distribution Ĝ_n(·) of T*_n, given X_1, ..., X_n. The Monte-Carlo approximation to this is simply the ⌊Bα⌋th order statistic of the B bootstrap replicates {T*_n(1), ..., T*_n(B)} of T*_n. Thus, for the data set of Figure 1.1, an estimate of the 0.9-quantile of T_n is the ⌊(2000)(0.9)⌋ = 1800th order statistic of the B = 2000 bootstrap replicates of T*_n summarized in the histogram in Figure 1.2, and is given by 1.3514.
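In code, the Monte-Carlo quantile estimate is simply an order statistic of the replicates; the sketch below (illustrative) assumes reps is the array of B bootstrap replicates produced by the earlier sketch.

```python
import numpy as np

def bootstrap_quantile(reps, alpha):
    """Monte-Carlo estimate of t_alpha: the floor(B*alpha)-th order statistic."""
    order_stats = np.sort(np.asarray(reps))
    k = max(int(np.floor(len(reps) * alpha)), 1)   # 1-based rank, as in the text
    return order_stats[k - 1]                       # convert to 0-based indexing
```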

Example 1.1.4: Bootstrap Confidence Intervals


Next we consider the problem of finding a confidence interval (CI) for μ with confidence level, say, 80%. If the quantiles t_α of T_n were known, an equal-tailed 80% CI for μ would be

    (X̄_n − n^{−1/2} σ̂_n t_{0.9}, X̄_n − n^{−1/2} σ̂_n t_{0.1}).     (1.7)

However, since the sampling distribution of T_n is unknown, the quantiles of T_n also remain unknown in practice. Here again, one may use the bootstrap to find an approximate CI for μ. An 80% bootstrap CI for μ is now obtained by replacing the quantiles t_{0.9} and t_{0.1} of T_n by the corresponding quantiles of the conditional distribution of T*_n as

    (X̄_n − n^{−1/2} σ̂_n t̂_{0.9}, X̄_n − n^{−1/2} σ̂_n t̂_{0.1}).     (1.8)

For the data set of Figure 1.1, the (Monte-Carlo) values of t̂_{0.9} and t̂_{0.1} are respectively given by 1.3514 and −1.3675, the 1800th and the 200th order statistics of the 2000 bootstrap replicates of T*_n. The resulting bootstrap CI for μ is thus given by

    (−0.1258, 0.1287).     (1.9)
Note that the true value of the parameter μ in this case is zero. It may be of some interest to compare this CI with the traditional large-sample CI for μ based on asymptotic normality of T_n. Indeed, for the given sample, an 80% approximate normal CI for μ is given by

    (X̄_n − 1.28 σ̂_n/√n, X̄_n + 1.28 σ̂_n/√n) = (−0.1191, 0.1206),

which happens to be shorter than the bootstrap confidence interval in (1.9). How do the accuracies of the bootstrap CI in (1.8) and of its one-sided version compare with those of the corresponding large-sample normal CIs? Answers to these questions depend on second- and higher-order properties of the bootstrap approximation P_*(T*_n ≤ ·) to the sampling distribution P(T_n ≤ ·) of T_n. We shall address some of these issues in more detail in Chapter 6. □
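A sketch of the equal-tailed bootstrap CI in (1.8), reusing the hypothetical bootstrap_quantile helper from the previous sketch; here sigma_hat stands for σ̂_n computed from the data, and all names are illustrative.

```python
import numpy as np

def equal_tailed_ci(x, sigma_hat, reps, level=0.80):
    """Equal-tailed bootstrap CI for mu, following the form of (1.8)."""
    alpha = (1.0 - level) / 2.0
    t_lo = bootstrap_quantile(reps, alpha)         # bootstrap estimate of t_{0.1}
    t_hi = bootstrap_quantile(reps, 1.0 - alpha)   # bootstrap estimate of t_{0.9}
    half = sigma_hat / np.sqrt(len(x))
    xbar = np.mean(x)
    return (xbar - t_hi * half, xbar - t_lo * half)
```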

The next three examples present instances of various inference problems involving level-2 and higher-level parameters, where the bootstrap and other resampling methods can be applied effectively.

Example 1.2: Smoothing parameter selection. Consider the model

    Y_i = m(x_i) + ε_i,   i = 1, ..., n,

where m : [0, 1] → ℝ is an unknown function, x_1, ..., x_n ∈ [0, 1] are nonrandom design points, and {ε_n}_{n≥1} is a sequence of unobservable zero mean stationary random variables. Here, the function m(·) is a level-1 population parameter that can be estimated from the observations Y_1, ..., Y_n by one of the "smoothing" methods. For the discussion here, suppose we use the Nadaraya-Watson kernel estimator (cf. Nadaraya (1964), Watson (1964)):

    m̂_h(x) = Σ_{i=1}^n Y_i K((x − x_i)/h) / Σ_{i=1}^n K((x − x_i)/h),

to estimate m(·). Here K(·) is a probability density function on ℝ with a compact support and vanishing first moment, and h > 0 is a bandwidth. Performance of m̂_h(·) as an estimator of m(·) critically depends on the bandwidth or smoothing parameter h. A standard measure of global accuracy of m̂_h(·) is the mean integrated squared error (MISE), defined as

    MISE(h) = E ∫_J {m̂_h(x) − m(x)}² dx,

where J ⊂ [0, 1]. For optimum performance, we need to use the estimator m̂_h(·) with bandwidth h = h*, where h* minimizes the risk function MISE(h). Note that the function MISE(h) and, hence, h* depend on the sampling distribution of the estimator m̂_h(·) of a level-1 parameter m(·),
and thus, are level-2 parameters. In this case, one can apply the bootstrap principle to obtain estimators of the risk function MISE(h) and the optimum bandwidth h*. For independent error variables {ε_n}_{n≥1}, estimation of these second level parameters by resampling methods has been initiated by Taylor (1989), Faraway and Jhun (1990) (although in the context of density estimation) and has been treated in detail in Hall (1992) and Shao and Tu (1995). For the case of dependent errors, estimation of MISE(h) and h* becomes somewhat more involved due to the effect of serial correlation of the observations. But the bootstrap principle still works. Indeed, block bootstrap methods can be used to estimate the level-2 parameters MISE(h) and h* consistently for a wide class of dependent processes including a class of long range dependent processes (cf. Hall, Lahiri and Polzehl (1995)). □
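For readers who want to experiment, a minimal sketch of the Nadaraya-Watson estimator described above (illustrative code; the Epanechnikov kernel and all names are choices made here, not taken from the book).

```python
import numpy as np

def nadaraya_watson(x_grid, x, y, h):
    """Nadaraya-Watson kernel estimate m_hat_h evaluated at the points x_grid.

    Uses the Epanechnikov kernel K(u) = 0.75*(1 - u^2) on [-1, 1], which is
    compactly supported with vanishing first moment, as required in the text.
    """
    x_grid = np.asarray(x_grid, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (x_grid[:, None] - x[None, :]) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)   # kernel weights
    den = k.sum(axis=1)
    num = k @ y
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), np.nan)

# Example: estimate m on a grid from (x_i, Y_i) pairs with bandwidth h = 0.1
xg = np.linspace(0, 1, 101)
xi = np.linspace(0, 1, 100)
yi = np.sin(2 * np.pi * xi) + 0.3 * np.random.default_rng(0).normal(size=100)
m_hat = nadaraya_watson(xg, xi, yi, h=0.1)
```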

Example 1.3: Estimation of variogram parameters. Let {Z(s) : s ∈ ℝ^d} be a random field. Suppose that Z(·) is intrinsically stationary, i.e.,

    E(Z(s + h) − Z(s)) = E(Z(h) − Z(0)),
    Var(Z(s + h) − Z(s)) = Var(Z(h) − Z(0)),     (1.10)

for all s, h ∈ ℝ^d. Intrinsic stationarity of a random field is similar to the concept of second-order stationarity (cf. Cressie (1993), Chapter 2) and is commonly used in geostatistical applications. Like the autocovariance function, the second moment structure of an intrinsic random field may be described in terms of its variogram 2γ(·), defined by

    2γ(h) = Var(Z(h) − Z(0)), h ∈ ℝ^d.     (1.11)

Estimation of the variogram is an important problem in Geostatistics. When the true variogram lies in a parametric model {2γ(·; θ) : θ ∈ Θ} of valid variograms, a popular approach is to estimate the variogram parameter θ by minimizing a least-squares criterion of the form

    Q_n(θ) ≡ Σ_{i=1}^K Σ_{j=1}^K v_{ij}(θ)[2γ̂_n(h_i) − 2γ(h_i; θ)][2γ̂_n(h_j) − 2γ(h_j; θ)],

where V(θ) ≡ ((v_{ij}(θ)))_{K×K} is a K × K positive-definite weight matrix and 2γ̂_n(h_i) is a nonparametric estimator of the variogram at lag h_i, 1 ≤ i ≤ K. Statistical efficiency of the resulting least-squares estimator, say, θ̂_{n,V}, depends on the choice of the weight matrix V. An optimal choice of V depends on the covariance matrix Σ (say) of the vector of variogram estimators (2γ̂_n(h_1), ..., 2γ̂_n(h_K))′. This again presents an example of a problem where an "optimal" estimator of the level-1 parameter θ requires the knowledge of the level-2 parameter Σ. Thus, one may apply the bootstrap or other resampling methods for spatial data to estimate the level-2 parameter first and minimize the estimated criterion function to obtain an estimator of the level-1 parameter. In Chapter 12, we show that the estimator derived using this approach is asymptotically optimal. □
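As an illustration of the least-squares criterion Q_n(θ) above, the sketch below evaluates and minimizes it by grid search for a hypothetical one-parameter exponential variogram model with the identity weight matrix; all numbers and names are made up for illustration only.

```python
import numpy as np

def q_n(theta, gamma_hat2, lags, model2, V=None):
    """Least-squares criterion Q_n(theta) from the display above.

    gamma_hat2[i] holds the nonparametric estimate 2*gamma_hat_n(h_i);
    model2(h, theta) returns the model value 2*gamma(h; theta);
    V is the K x K weight matrix (identity if omitted).
    """
    r = np.asarray(gamma_hat2) - np.array([model2(h, theta) for h in lags])
    V = np.eye(len(lags)) if V is None else np.asarray(V)
    return float(r @ V @ r)

# Hypothetical model 2*gamma(h; theta) = 2*(1 - exp(-h/theta)), unit sill
model2 = lambda h, theta: 2.0 * (1.0 - np.exp(-h / theta))
lags = np.array([1.0, 2.0, 3.0, 4.0])
gamma_hat2 = np.array([1.2, 1.7, 1.9, 2.0])      # made-up pilot variogram estimates
grid = np.linspace(0.1, 5.0, 500)
theta_ls = grid[np.argmin([q_n(t, gamma_hat2, lags, model2) for t in grid])]
```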

Example 1.4: Selection of optimal block size. Performance of a block bootstrap method critically depends on the particular block length employed in finding the bootstrap estimator. Suppose that θ̂_n is an estimator of a level-1 parameter of interest, θ, based on observations X_1, ..., X_n, and that we want to estimate some characteristic φ_n of the sampling distribution of (θ̂_n − θ). Thus, φ_n is a level-2 parameter. When the observations are correlated, we may apply a block bootstrap method, based on blocks of length ℓ ∈ (1, n), to estimate φ_n. Let φ̂_n(ℓ) denote the block bootstrap estimator of φ_n. Typically, φ̂_n(ℓ) is a consistent estimator of φ_n for a wide range of values of the block length ℓ. If one considers the bias and the variance of φ̂_n(ℓ) as an estimator of φ_n, in many examples the bias of φ̂_n(ℓ) decreases as the block length ℓ increases, while the variance of φ̂_n(ℓ) increases with ℓ. Thus, there is an optimal value, say ℓ⁰, of the block length that balances the trade-off between the bias and the variance, and thus minimizes the MSE. Therefore, if we define ℓ⁰ as ℓ⁰ ≡ argmin{MSE(φ̂_n(ℓ)) : 1 < ℓ < n}, it is clear that ℓ⁰ is an unknown parameter that depends on the sampling distributions of the estimators φ̂_n(ℓ), 1 < ℓ < n, of a level-2 parameter φ_n and hence, ℓ⁰ is a level-3 parameter. It may certainly be possible to use a plug-in estimator of ℓ⁰ for a given φ_n, using analytical calculations. However, a general method that would be applicable more widely and without new analytical calculations for each case may be developed by applying two rounds of resampling methods iteratively. In Chapter 7, we present two such methods that yield nonparametric estimators of the optimal block length ℓ⁰. □

1.3 Concluding Remarks


In this chapter, we described the basic principle underlying the bootstrap
methods for dependent and independent data. The key point that we try to
convey is that the bootstrap and other resampling methods can be viewed
as "general" methods for inferring about level-2 and higher-level param-
eters, just as the Maximum Likelihood and similar methods may be con-
sidered as general methods for estimating level-1 parameters. The basic
idea underlying different versions of the bootstrap is the same, which is
to recreate the original "population versus sample" relation for a level-
2 inference problem through a suitable res amp ling mechanism. Section 1.1
attempts to describe this for the (nonoverlapping) block bootstrap method.
The arguments apply to the other bootstrap methods described in Chapter
2 with minor modifications. In Section 1.2, we presented examples of some
inference problems for dependent data where resampling methods were naturally applicable. In Example 1.1, we illustrated the use of the (nonoverlapping) block bootstrap method with a given data set and showed how it
might be used in practice to address various inference issues. Examples 1.2,
1.3, and 1.4 were described in less specific terms. They served as examples
of inference problems involving level-2 and higher-level parameters where
the bootstrap and other resampling methods might be used effectively, and,
thus, illustrated the "scope" of bootstrap methods for dependent data. In
subsequent chapters, we describe various forms of bootstrap methods and
investigate their properties (success and failure) in similar (level-2+) infer-
ence problems under different dependence structures of the data.

1.4 Notation
For reference later on, we collect some common notation to be used in the rest of the book. Let ℕ and ℤ, respectively, denote the set of all positive integers and the set of all integers. Also, let ℤ₊ = {0} ∪ ℕ be the set of all nonnegative integers. Let ℝ denote the set of all real numbers and ℂ the set of all complex numbers. The extended real line is denoted by ℝ̄ = ℝ ∪ {−∞, ∞}. The Borel σ-field on a metric space 𝕊 is denoted by ℬ(𝕊) and, for a set A ⊂ 𝕊, let cl.(A) and ∂A respectively denote the closure and the boundary of A. For a real number x, let ⌊x⌋ denote the largest integer not exceeding x, let ⌈x⌉ denote the smallest integer not less than x, and let x₊ = max{x, 0}. For x, y ∈ ℝ, let x ∧ y = min{x, y} and x ∨ y = max{x, y}. Let 1(S) denote the indicator function of a statement S, with 1(S) = 1 if S is true and 1(S) = 0 otherwise. For a subset A of a nonempty set Ω, we also write 1_A for the function 1_A(ω) ≡ 1(ω ∈ A), ω ∈ Ω. For a finite set A, we write |A| to denote the size, i.e., the number of elements of A. Also, for a set A ∈ ℬ(ℝ^k), k ∈ ℕ, we write vol.(A) to denote the volume or the Lebesgue measure of A. For k ∈ ℕ, let I_k denote the identity matrix of order k. Let Γ′ denote the transpose of a matrix Γ. We write an m × n matrix Γ with (i,j)-th element γ_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, as Γ = ((γ_{ij})) or as Γ = ((γ_{ij}))_{m×n}. Let det(Γ) denote the determinant of a square matrix Γ.

As a convention, (random) vectors in ℝ^k, k ∈ ℕ, are regarded as column vectors in this book. For x = (x_1, ..., x_k)′, y = (y_1, ..., y_k)′ ∈ ℝ^k, and α = (α_1, ..., α_k)′ ∈ ℤ₊^k (k ∈ ℕ), write x^α = Π_{i=1}^k x_i^{α_i}, |x| = |x_1| + ··· + |x_k|, α! = Π_{i=1}^k α_i!, and ‖x‖ = (x_1² + ··· + x_k²)^{1/2}, and write x ≤ y if x_i ≤ y_i for all 1 ≤ i ≤ k. For an m × n matrix A, write ‖A‖ = sup{‖Ax‖ : x ∈ ℝ^n, ‖x‖ = 1} for the spectral norm of A. For a smooth function h : ℝ^k → ℝ, let D_j h denote the partial derivative of h with respect to the j-th coordinate, i.e., D_j h = ∂h/∂x_j. For α ∈ ℤ₊^k, let D^α denote the differential operator D^α ≡ D_1^{α_1} ··· D_k^{α_k} = ∂^{α_1+···+α_k}/(∂x_1^{α_1} ··· ∂x_k^{α_k}) on ℝ^k. Write ι = √(−1).
For a real-valued function f defined on a nonempty set A, we write "argmin{f(x) : x ∈ A}" to denote a value of the argument x ∈ A at which f attains its minimum over the set A (assuming that the minimum is attainable on A).

Let C denote a generic constant in (0, ∞) that may assume different values at each appearance and that does not depend on the limit variables like the sample size n. For example, C ≤ C/2 is a valid inequality under this convention, which should be interpreted as C_1 ≤ C_2/2 for some constants C_1 and C_2. In some instances, we also use C(·) as a notation for generic constants that may depend on the arguments specified within the parentheses.

For sequences {a_n}_{n≥1} ⊂ ℝ and {b_n}_{n≥1} ⊂ (0, ∞), we write

    a_n = O(b_n) as n → ∞ if limsup_{n→∞} |a_n|/b_n < ∞,
    a_n = o(b_n) as n → ∞ if lim_{n→∞} a_n/b_n = 0,

and

    b_n ≫ a_n as n → ∞ if a_n = o(b_n) as n → ∞.

For {a_n}_{n≥1}, {b_n}_{n≥1} ⊂ (0, ∞), we write

    a_n ∼ b_n as n → ∞ if lim_{n→∞} a_n/b_n = 1

and

Similarly, for a sequence of random variables {X_n}_{n≥1} and a sequence {b_n}_{n≥1} ⊂ (0, ∞), we write

    X_n = o_p(b_n) as n → ∞ if X_n/b_n → 0 in probability as n → ∞

and

    X_n = O_p(b_n) as n → ∞ if {X_n/b_n}_{n≥1} is bounded in probability,

i.e., for every ε > 0, there exists M ∈ (0, ∞) such that

    sup_{n≥1} P(|X_n/b_n| > M) < ε.

Unless otherwise specified, the limits in order symbols are taken letting the variable "n" tend to infinity. Thus, "a_n = o(b_n)" is the same as "a_n = o(b_n) as n → ∞".

Convergence in distribution and convergence in probability of random entities are respectively denoted by →_d and →_p. Almost sure convergence with respect to a measure ν is written as a.s. (ν) or simply, a.s.,
if the relevant measure ν is clear from the context. In the latter case, we also use "a.s." as an abbreviation for "almost sure" or "almost surely," as appropriate.

For k-dimensional random vectors X and Y with E‖X‖² + E‖Y‖² < ∞, we define the covariance matrix of X and Y and the variance matrix of X as

    Cov(X, Y) = E{(X − EX)(Y − EY)′}   and   Var(X) = Cov(X, X),

respectively. For a random variable X and for p ∈ [1, ∞], we define the L^p-norm of X by

    ‖X‖_p = (E|X|^p)^{1/p} if p ∈ [1, ∞),   and   ‖X‖_p = ess. sup{X} if p = ∞.

For a collection of σ-fields {F_i : i ∈ I} on a nonempty set Ω, we write ∨_{i∈I} F_i to denote the smallest σ-field containing all F_i, i ∈ I. Furthermore, for a collection of random vectors {X_i : i ∈ I} on a probability space (Ω, F, P), we write σ({X_i : i ∈ I}) to denote the sub σ-field of F generated by {X_i : i ∈ I}. For a random vector X and a σ-field 𝒢, we write ℒ(X) and ℒ(X|𝒢) to denote the probability distribution of X and the conditional probability distribution of X given 𝒢, respectively. For two random vectors X and Y, we write X =_d Y if ℒ(X) = ℒ(Y). For a distribution G and for a random vector X, we write X ∼ G if ℒ(X) = G. For a nonempty finite set A, we say that a random variable X has the Discrete Uniform distribution on A if

    P(X = a) = 1/|A| for all a ∈ A.

For a k × k positive definite matrix Σ, let Φ_Σ(·) and Φ(·; Σ) both denote the Gaussian distribution N(0, Σ) on ℝ^k with mean zero and covariance matrix Σ. Let φ_Σ and φ(·; Σ) both denote the density of Φ_Σ with respect to the Lebesgue measure on ℝ^k, given by

    φ_Σ(x) = φ(x; Σ) = (2π)^{−k/2}[det(Σ)]^{−1/2} exp(−x′Σ^{−1}x/2), x ∈ ℝ^k.

Furthermore, we use Φ_Σ and/or Φ(·; Σ) also to denote the distribution function of the N(0, Σ) distribution. Thus, Φ_Σ and/or Φ(·; Σ) stands for either of the two functions

    Φ(x; Σ) = Φ_Σ(x) ≡ ∫_{(−∞, x]} φ(y; Σ) dy, x ∈ ℝ^k,

and

    Φ(A; Σ) = Φ_Σ(A) ≡ ∫_A φ(y; Σ) dy, A ∈ ℬ(ℝ^k).

When Σ = I_k, we abbreviate Φ_Σ and φ_Σ as Φ and φ, respectively. The dependence of Φ and φ on the dimension k is suppressed in the notation and will be clear from the context.
As a convention, notation for random and nonrandom entities are "local" to a section where they appear, i.e., the same symbol may have different meanings in two different sections. Similarly, the numbering of conditions is "local" to a chapter. Unless otherwise mentioned, the symbols for random and nonrandom entities and the condition labels refer to their local definitions. For referring to a condition introduced in another chapter, we add the chapter number as a prefix. For example, an occurrence of Condition 5.D_r in Chapter 6 refers to Condition D_r of Chapter 5, etc. We use the abbreviations cdf (cumulative distribution function), CI (confidence interval), iid (independent and identically distributed), and MSE (mean squared error), as convenient. We also use a box □ to denote the end of a proof or of an example.
2
Bootstrap Methods

2.1 Introduction
In this chapter, we describe various commonly used bootstrap methods
that have been proposed in the literature. Section 2.2 begins with a brief
description of Efron's (1979) bootstrap method based on simple random
sampling of the data, which forms the basis for almost all other bootstrap
methods. In Section 2.3, we describe the famous example of Singh (1981),
which points out the limitation of this res amp ling scheme for dependent
variables. In Section 2.4, we present bootstrap methods for time-series mod-
els driven by iid variables, such as the autoregression model. In Sections 2.5,
2.6, and 2.7, we describe various block bootstrap methods. A description of
the subs amp ling method is given in Section 2.8. Bootstrap methods based
on the discrete Fourier transform of the data are described in Section 2.9,
while those based on the method of sieves are presented in Section 2.10.

2.2 IID Bootstrap


In this book, we refer to the nonparametric resampling scheme of Efron
(1979), introduced in the context of "iid data," as the IID bootstrap.
There are a few alternative terms used in the literature for Efron's (1979)
bootstrap, such as "naive" bootstrap, "ordinary" bootstrap, etc. These
terms may have a different meaning in this book, since (for example) using

the IID bootstrap may not be the "naive" thing to do for data with a
dependence structure.
We begin with the formulation of the IID bootstrap method of Efron (1979). For the discussion in this section, assume that X_1, X_2, ... is a sequence of iid random variables with common distribution F. Suppose that 𝒳_n = {X_1, ..., X_n} denotes the data at hand and let T_n = t_n(𝒳_n; F), n ≥ 1, be a random variable of interest. Note that T_n depends on the data as well as on the underlying unknown distribution F. Typical examples of T_n include the normalized sample mean T_n ≡ n^{1/2}(X̄_n − μ)/σ and the studentized sample mean T_n ≡ n^{1/2}(X̄_n − μ)/s_n, where X̄_n = n⁻¹ Σ_{i=1}^n X_i, s_n² = n⁻¹ Σ_{i=1}^n (X_i − X̄_n)², μ = E(X_1), and σ² = Var(X_1). Let G_n denote the sampling distribution of T_n. The goal is to find an accurate approximation to the unknown distribution of T_n or to some population characteristics, e.g., the standard error, of T_n. The bootstrap method of Efron (1979) provides an effective way of addressing these problems without any model assumptions on F.
Given X_n, we draw a simple random sample X*_m = {X_1^*, ..., X_m^*} of size
m with replacement from X_n. Thus, conditional on X_n, {X_1^*, ..., X_m^*} are
iid random variables with

P_*(X_1^* = X_i) = n^{-1}, i = 1, ..., n,

where P_* denotes the conditional probability given X_n. Hence, the common
distribution of the X_i^*'s is given by the empirical distribution

F_n = n^{-1} Σ_{i=1}^n δ_{X_i},

where δ_y denotes the probability measure putting unit mass at y. Usually,
one chooses the resample size m = n. However, there are several known ex-
amples where a different choice of m is desirable. See, for example, Athreya
(1987), Arcones and Giné (1989, 1991), Bickel, Götze and van Zwet (1997),
Fukuchi (1994), and the references therein.
Next define the bootstrap version T*_{m,n} of T_n by replacing X_n with X*_m
and F with F_n as

T*_{m,n} = t_m(X*_m; F_n).

Also, let Ĝ_{m,n} denote the conditional distribution of T*_{m,n}, given X_n. Then
the bootstrap principle advocates Ĝ_{m,n} as an estimator of the unknown
sampling distribution G_n of T_n. If, instead of G_n, one is interested in esti-
mating only a certain functional φ(G_n) of the sampling distribution of T_n,
then the corresponding bootstrap estimator is given by plugging in Ĝ_{m,n}
for G_n, i.e., the bootstrap estimator of φ(G_n) is given by φ(Ĝ_{m,n}). For
example, if φ(G_n) = Var(T_n) = ∫ x² dG_n(x) − (∫ x dG_n(x))², the boot-
strap estimator of Var(T_n) is given by φ(Ĝ_{m,n}) = Var(T*_{m,n} | X_n) =
∫ x² dĜ_{m,n}(x) − (∫ x dĜ_{m,n}(x))². Once the variables X_n have been observed,
the common distribution F_n of the X_i^*'s becomes known, and, hence, it is possi-
ble (at least theoretically) to find the conditional distribution Ĝ_{m,n} and the
bootstrap estimator φ(Ĝ_{m,n}) from the knowledge of the data. In practice,
however, finding Ĝ_{m,n} exactly may be a daunting task, even in moder-
ate samples. This is because the number of possible distinct values of X*_m
grows very rapidly, at the rate O(n^m) as n → ∞, m → ∞ under the IID
bootstrap. Consequently, the conditional distribution of T*_{m,n} is further
approximated by Monte-Carlo simulations as described in Chapter 1.
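In practice the Monte-Carlo step takes only a few lines of code. The following is a minimal sketch in Python (the function and variable names are ours, not the book's) of the IID bootstrap for the studentized sample mean: it draws B resamples of size m = n from F_n and uses the empirical distribution of the replicates as an approximation to Ĝ_{n,n}.

```python
import numpy as np

def iid_bootstrap(data, stat, B=2000, seed=0):
    """Monte-Carlo approximation of the IID bootstrap distribution.

    data : 1-d array of observations X_1, ..., X_n
    stat : function mapping (resample, original data) to a bootstrap replicate
    B    : number of bootstrap replicates
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = np.empty(B)
    for b in range(B):
        resample = rng.choice(data, size=n, replace=True)  # draw from F_n
        reps[b] = stat(resample, data)
    return reps

# bootstrap version (2.2) of the studentized sample mean
def t_star(resample, data):
    n = len(data)
    s_n = data.std()               # sqrt of Var_*(X_1^*) = s_n
    return np.sqrt(n) * (resample.mean() - data.mean()) / s_n

if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=100)
    reps = iid_bootstrap(x, t_star)
    print("bootstrap estimate of P(T_n <= 1.0):", np.mean(reps <= 1.0))
    print("bootstrap variance estimate of T_n :", reps.var())
```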
To illustrate the main ideas, again consider the simplest example where
T_n = √n(X̄_n − μ)/σ, the centered and scaled sample mean. Here μ = EX_1
is the level-1 parameter we want to infer about. Following the description
given above, the bootstrap version T*_{m,n} of T_n based on a bootstrap sample
of size m is given by

T*_{m,n} = √m(X̄*_m − E_*X_1^*)/(Var_*(X_1^*))^{1/2},

where X̄*_m = m^{-1} Σ_{i=1}^m X_i^* denotes the bootstrap sample mean based on
X_1^*, ..., X_m^*, and E_* and Var_* respectively denote the conditional expec-
tation and conditional variance, given X_n. It is clear that for any k ≥ 1,

E_*(X_1^*)^k = ∫ x^k dF_n(x) = n^{-1} Σ_{i=1}^n X_i^k.    (2.1)

In particular, this implies E_*(X_1^*) = X̄_n, and Var_*(X_1^*) = s_n² ≡
n^{-1} Σ_{i=1}^n (X_i − X̄_n)². Hence, we define T*_{m,n} by replacing X̄_n with X̄*_m
and μ and σ² by E_*(X_1^*) and Var_*(X_1^*), respectively. Thus, the bootstrap
version of T_n is given by

T*_{m,n} = √m(X̄*_m − X̄_n)/s_n.    (2.2)
If, for example, we are interested in estimating φ_α(G_n) = the αth quan-
tile of T_n for some α ∈ (0, 1), then the bootstrap estimator of φ_α(G_n) is
φ_α(Ĝ_{m,n}), the αth quantile of the conditional distribution of T*_{m,n}.
As mentioned above, determining Ĝ_{m,n} exactly is not very easy even
in this simple case. However, when EX_1² < ∞ and m = n, we have the
following result. Recall that we use the abbreviation a.s. for almost sure or
almost surely, as appropriate, and we write Φ(·) to denote the distribution
function of the standard normal distribution on ℝ.
Theorem 2.1 If X_1, X_2, ... are iid with σ² = Var(X_1) ∈ (0, ∞), then

sup_x |P_*(T*_{n,n} ≤ x) − Φ(x)| = o(1) as n → ∞, a.s.    (2.3)

Proof: Since X_1^*, ..., X_n^* are iid, by the Berry–Esseen Theorem (see The-
orem A.6, Appendix A),

sup_x |P_*(T*_{n,n} ≤ x) − Φ(x)| ≤ (2.75) Δ̂_n,    (2.4)

where s_n² = E_*(X_1^* − X̄_n)² and Δ̂_n = E_*|X_1^* − X̄_n|³/(s_n³ √n). Clearly, by
the Strong Law of Large Numbers (SLLN) (see Theorem A.3, Appendix
A),

s_n² = n^{-1} Σ_{i=1}^n X_i² − (X̄_n)² → σ² a.s.,

and by the Marcinkiewicz–Zygmund SLLN (see Theorem A.4, Appendix
A),

n^{-3/2} Σ_{i=1}^n |X_i|³ → 0 a.s.

Hence, Δ̂_n → 0 a.s. as n → ∞, and Theorem 2.1 follows. □

Actually, Theorem 2.1 holds for any resample size m_n that goes to infinity
at a rate faster than log log n, but the proof requires a different argument.
See Arcones and Giné (1989, 1991) for details.
Note that by the Central Limit Theorem (CLT), T_n also converges in
distribution to the N(0, 1) distribution. Hence, it follows that

Δ_n ≡ sup_x |Ĝ_{n,n}(x) − G_n(x)|
    = sup_x |P_*(T*_{n,n} ≤ x) − P(T_n ≤ x)| = o(1) as n → ∞, a.s.,    (2.5)

i.e., the conditional distribution Ĝ_{n,n} of T*_{n,n} generated by the IID boot-
strap method provides a valid approximation for the sampling distribution
G_n of T_n. Under some additional conditions, Singh (1981) showed that

Δ_n = O(n^{-1}(log log n)^{1/2}) as n → ∞, a.s.

Therefore, the bootstrap approximation for P(T_n ≤ ·) is far more accu-
rate than the classical normal approximation, which has an error of or-
der O(n^{-1/2}). Similar optimality properties of the bootstrap approximation
have been established in many important problems. The literature on
bootstrap methods for independent data is quite extensive. By now, there
exist some excellent sources that give comprehensive accounts of the the-
ory and applications of the bootstrap methods for independent data. We
refer the reader to the monographs by Efron (1982), Hall (1992), Mammen
(1992), Efron and Tibshirani (1993), Barbe and Bertail (1995), Shao and
Tu (1995), Davison and Hinkley (1997), and Chernick (1999) for the boot-
strap methodology for independent data. Here, we have described Efron's
(1979) bootstrap for iid data mainly as a prelude to the bootstrap methods
for dependent data considered in later sections, as the basic principles in
both cases are the same. Furthermore, it provides a historical account of
the developments that culminated in the formulation of the bootstrap methods
for dependent data.

2.3 Inadequacy of IID Bootstrap for Dependent Data
The IID bootstrap method of Efron (1979), being very simple and gen-
eral, has found application to a host of statistical problems. However,
the general perception that the bootstrap is an "omnibus" method, giv-
ing accurate results in all problems automatically, is misleading. A prime
example of this appears in the seminal paper by Singh (1981), which, in
addition to providing the first theoretical confirmation of the superiority of
the IID bootstrap, also pointed out its inadequacy for dependent data.
In this section we consider the aforementioned example of Singh (1981).
Suppose X_1, X_2, ... is a sequence of m-dependent random variables with
EX_1 = μ and EX_1² < ∞. Recall that {X_n}_{n≥1} is called m-dependent for
some integer m ≥ 0 if {X_1, ..., X_k} and {X_{k+m+1}, ...} are independent for
all k ≥ 1. Thus, an iid sequence of random variables {ε_n}_{n≥1} is 0-dependent,
and if we define X_n = ε_n + 0.5 ε_{n+1}, n ≥ 1, with this iid sequence {ε_n}_{n≥1},
then {X_n}_{n≥1} is 1-dependent.
Next, let σ_∞² = Var(X_1) + 2 Σ_{i=1}^m Cov(X_1, X_{1+i}) and X̄_n =
n^{-1} Σ_{i=1}^n X_i. If σ_∞² ∈ (0, ∞), then by the CLT for m-dependent variables
(cf. Theorem A.7, Appendix A),

√n(X̄_n − μ) →_d N(0, σ_∞²),    (2.6)

where →_d denotes convergence in distribution. Now, suppose that we want
to estimate the sampling distribution of the random variable T_n = √n(X̄_n −
μ) using the IID bootstrap. For simplicity, assume that the resample size
equals the sample size, i.e., from X_n = (X_1, ..., X_n), an equal number of
bootstrap variables X_1^*, ..., X_n^* are generated. Then, the bootstrap version
T*_{n,n} of T_n is given by

T*_{n,n} = √n(X̄*_n − X̄_n),

where X̄*_n = n^{-1} Σ_{i=1}^n X_i^*. The conditional distribution of T*_{n,n} under the
IID bootstrap method still converges to a normal distribution, but with a
"wrong" variance, as shown below.

Theorem 2.2 Suppose {X_n}_{n≥1} is a sequence of stationary m-dependent
random variables with EX_1 = μ and σ² = Var(X_1) ∈ (0, ∞). Then

sup_x |P_*(T*_{n,n} ≤ x) − Φ(x/σ)| = o(1) as n → ∞, a.s.    (2.7)

Proof: Note that conditional on X_n, X_1^*, ..., X_n^* are iid random variables.
As in the proof of Theorem 2.1, by the Berry–Esseen Theorem, it is enough
to show that

s_n² → σ² as n → ∞, a.s.,

and

n^{-3/2} Σ_{i=1}^n |X_i|³ → 0 as n → ∞, a.s.

These follow easily from the following lemma. Hence Theorem 2.2 is proved.
□

Lemma 2.1 Let {X_n}_{n≥1} be a sequence of stationary m-dependent ran-
dom variables. Suppose that f : ℝ → ℝ is a Borel measurable function with
E|f(X_1)|^p < ∞ for some p ∈ (0, ∞), and that Ef(X_1) = 0 if p ≥ 1. Then,

n^{-1/p} Σ_{i=1}^n f(X_i) → 0 as n → ∞, a.s.

Proof: This is most easily proved by splitting the given m-dependent
sequence {X_n}_{n≥1} into m + 1 iid subsequences {Y_{ji}}_{i≥1}, j = 1, ..., m + 1,
defined by Y_{ji} = X_{j+(i−1)(m+1)}, and then applying the standard results for
iid random variables to the {Y_{ji}}_{i≥1}'s (cf. Liu and Singh (1992)). For 1 ≤ j ≤
m + 1, let I_j ≡ I_{jn} = {1 ≤ i ≤ n : j + (i − 1)(m + 1) ≤ n} and let N_j ≡ N_{jn}
denote the size of the set I_j. Note that N_j/n → (m + 1)^{-1} as n → ∞ for all
1 ≤ j ≤ m + 1. Then, by the Marcinkiewicz–Zygmund SLLN (cf. Theorem
A.4, Appendix A) applied to each of the sequences of iid random variables
{Y_{ji}}_{i≥1}, j = 1, ..., m + 1, we get

n^{-1/p} Σ_{i=1}^n f(X_i) = Σ_{j=1}^{m+1} [N_j^{-1/p} Σ_{i∈I_j} f(Y_{ji})] · (N_j/n)^{1/p} → 0 as n → ∞, a.s.

This completes the proof of Lemma 2.1. □

Corollary 2.1 Under the conditions of Theorem 2.2, if
Σ_{i=1}^m Cov(X_1, X_{1+i}) ≠ 0 and σ_∞² ≠ 0, then for any x ≠ 0,

lim_{n→∞} [P_*(T*_{n,n} ≤ x) − P(T_n ≤ x)] = [Φ(x/σ) − Φ(x/σ_∞)] ≠ 0 a.s.

Proof: Follows from Theorem 2.2 and (2.6). □

Thus, for all x ≠ 0, the IID bootstrap estimator P_*(T*_{n,n} ≤ x) of the level-
2 parameter P(T_n ≤ x) has a mean squared error that tends to a nonzero
number in the limit, and the bootstrap estimator of P(T_n ≤ x) is not con-
sistent. Therefore, the IID bootstrap method fails drastically for dependent
data. It follows from the proof of Theorem 2.2 that resampling individual
X_i's from the data X_n ignores the dependence structure of the sequence
{X_n}_{n≥1} completely, and thus, fails to account for the lag-covariance terms
(viz., Cov(X_1, X_{1+i}), 1 ≤ i ≤ m) in the asymptotic variance.
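A small simulation makes this failure concrete. The sketch below (Python; an illustration under the 1-dependent model X_i = ε_i + 0.5 ε_{i+1} with standard normal ε_i's, not code from the text) shows that the IID bootstrap variance of √n(X̄_n − μ) settles near σ² = Var(X_1) = 1.25 rather than near the correct long-run variance σ_∞² = 2.25.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
eps = rng.normal(size=n + 1)
x = eps[:-1] + 0.5 * eps[1:]          # 1-dependent: X_i = eps_i + 0.5 eps_{i+1}

# population quantities for this model (eps ~ N(0, 1)):
sigma2 = 1 + 0.25                      # Var(X_1) = 1.25
sigma2_inf = sigma2 + 2 * 0.5          # Var(X_1) + 2 Cov(X_1, X_2) = 2.25

# IID bootstrap estimate of Var(sqrt(n) * Xbar_n)
B = 2000
boot = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    boot[b] = np.sqrt(n) * (xb.mean() - x.mean())
print("IID bootstrap variance :", boot.var())     # close to sigma2 (the wrong limit)
print("target sigma_infty^2   :", sigma2_inf)     # 2.25
print("sample variance s_n^2  :", x.var())        # what the IID bootstrap reproduces
```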
Following this result, there have been several attempts in the literature to
extend the IID bootstrap method to the dependent case. In the next section,
we first look at extensions of this method to certain dependent models gen-
erated by iid random variables. More general resampling schemes (such as
the block bootstrap and the frequency domain bootstrap methods), which
are applicable without any parametric model assumptions, have been put
forward in the literature much later. These are presented in Sections 2.5-
2.10.

2.4 Bootstrap Based on IID Innovations


Suppose {X_n}_{n≥1} is a sequence of random variables satisfying the equation

X_n = h(X_{n−1}, ..., X_{n−p}; β) + ε_n,    (2.8)

n > p, where β is a q × 1 vector of parameters, h : ℝ^{p+q} → ℝ is a known
Borel measurable function, and {ε_n}_{n>p} is a sequence of iid random vari-
ables with common distribution F that are independent of the random
variables X_1, ..., X_p. For identifiability of the model (2.8), assume that
Eε_1 = 0. A commonly used example of model (2.8) is the autoregressive
process of order p (cf. (2.9) below). Noting that the process {X_n}_{n≥1} is
driven by the innovations ε_i's that are iid, the IID bootstrap method can
be easily extended to the dependent model (2.8).
As before, suppose that X_n = {X_1, ..., X_n} denotes the sample and that
we want to approximate the sampling distribution of a random variable
T_n = t_n(X_n; F, β). Let β̂_n be an estimator, e.g., the least squares estimator,
of β based on X_n. Define the residuals

ε̃_i = X_i − h(X_{i−1}, ..., X_{i−p}; β̂_n), p < i ≤ n.

Note that, in general,

ε̄_n ≡ (n − p)^{-1} Σ_{i=1}^{n−p} ε̃_{i+p} ≠ 0.

Hence, we center the "raw" residuals ε̃_i's and define the "centered" residuals

ε̂_i = ε̃_i − ε̄_n, p < i ≤ n.

Without such a centering, the resulting bootstrap approximation often has
a random bias that does not vanish in the limit and renders the approx-
imation useless. (See, for example, Freedman (1981), Shorack (1982), and
Lahiri (1992b) that treat a similar bias phenomenon in regression prob-
lems.)
Next draw a simple random sample ε*_{p+1}, ..., ε*_m of size (m − p) from {ε̂_i :
p < i ≤ n} with replacement and define the bootstrap pseudo-observations,
using the model structure (2.8), as:

X_i^* = X_i for i = 1, ..., p, and
X_i^* = h(X*_{i−1}, ..., X*_{i−p}; β̂_n) + ε_i^*, p < i ≤ m.

Note that by construction ε_i^*, p < i ≤ m are iid and E_*ε_i^* = 0. The
bootstrap version of the random variable T_n = t_n(X_n; F, β) is defined as

T*_{m,n} = t_m(X*_m; F̂_n, β̂_n),

where X*_m = {X_1^*, ..., X_m^*} and F̂_n denotes the empirical distribution of
the centered residuals ε̂_i, p < i ≤ n. The sampling distribution of T_n
is approximated by the conditional distribution of T*_{m,n} given X_n. For cer-
tain time-series models satisfying (2.8), different versions of this resampling
scheme have been proposed by Freedman (1984), Efron and Tibshirani
(1986), Swanepoel and van Wyk (1986), and Kreiss and Franke (1992).
The IID-innovation-bootstrap method can be applied with some simple
modifications to popular parametric models for spatial data as well (e.g.,
the spatial autoregression model); see Chapter 7, Cressie (1993).
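As a concrete illustration of this resampling scheme, the following Python sketch implements the IID-innovation bootstrap for the AR(1) special case h(x; β) = βx, with β estimated by least squares; the function name and the choice m = n are ours, not the book's.

```python
import numpy as np

def ar1_innovation_bootstrap(x, m=None, seed=0):
    """IID-innovation bootstrap for an AR(1) model X_i = beta * X_{i-1} + eps_i.

    Returns one bootstrap pseudo-series of length m (default: len(x))."""
    rng = np.random.default_rng(seed)
    n = len(x)
    m = n if m is None else m
    # least squares estimator of beta
    beta_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    # raw residuals, then centered residuals
    resid = x[1:] - beta_hat * x[:-1]
    resid = resid - resid.mean()
    # resample innovations and rebuild the series through the fitted recursion
    eps_star = rng.choice(resid, size=m, replace=True)
    x_star = np.empty(m)
    x_star[0] = x[0]                      # keep the initial observation
    for i in range(1, m):
        x_star[i] = beta_hat * x_star[i - 1] + eps_star[i]
    return x_star, beta_hat

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    e = rng.normal(size=500)
    x = np.empty(500)
    x[0] = e[0]
    for i in range(1, 500):
        x[i] = 0.6 * x[i - 1] + e[i]      # true beta = 0.6
    x_star, beta_hat = ar1_innovation_bootstrap(x)
    print("beta_hat =", beta_hat, " bootstrap series length =", len(x_star))
```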
A special case of model (2.8) is the autoregression model of order p
(AR(p)), given by

X_n = β_1 X_{n−1} + ⋯ + β_p X_{n−p} + ε_n, n > p,    (2.9)

where β = (β_1, ..., β_p)' is the vector of autoregressive parameters, and
{ε_n}_{n>p} is an iid sequence satisfying the requirements of model (2.8). For
AR(p) models, validity and the rate of approximation of the IID-innovation
bootstrap have been well studied in the literature. When the sequence
{X_n}_{n≥1} is stationary, Bose (1988) shows that under suitable regularity
conditions, a version of the IID-innovation bootstrap approximation to the
sampling distribution of the standardized least squares estimator is more
accurate than the normal approximation. For nonstationary cases, perfor-
mance of this method has been studied by Basawa, Mallik, McCormick
and Taylor (1989), Basawa, Mallik, McCormick, Reeves and Taylor (1991),
Datta (1995, 1996), Datta and Sriram (1997), and Heimann and Kreiss
(1996), among others. It follows from their work that the IID-innovation
bootstrap method is very sensitive to the values of the autoregression pa-
rameter vector β. Indeed, if the value of β is such that the roots of the
characteristic equation z^p + β_1 z^{p−1} + ⋯ + β_p = 0 lie on the unit circle,
then the IID-innovation bootstrap fails. Because of its dependence on the
validity of the model (2.9), and the drastic change in its performance with
a small change in the parameter value, one needs to be particularly care-
ful when applying the IID-innovation bootstrap method. Properties of the
IID-innovation bootstrap and related model-based bootstrap methods are
described in Chapter 8.

2.5 Moving Block Bootstrap


Bootstrap methods described in the previous sections are applicable either
under the hypothesis of independence or under specific model assumptions
for dependent data. The main idea in the latter case is to use the ap-
proximate independence of the residuals, and then apply the resampling
scheme of the IID-bootstrap method to get the right approximation. In a
problem where the statistician does not have enough prior knowledge to
specify such models, these methods are not very useful. In a significant
breakthrough, Künsch (1989) and Liu and Singh (1992) independently for-
mulated a substantially new resampling scheme, called the moving block
bootstrap (MBB), that is applicable to dependent data without any para-
metric model assumptions. In contrast to resampling a single observation
at a time, as has been commonly done under the earlier formulations of
the bootstrap, the MBB resamples blocks of (consecutive) observations at
a time. As a result, the dependence structure of the original observations
is preserved within each block. Furthermore, the common length of the
blocks increases with the sample size. As a result, when the data are gen-
erated by a weakly dependent process, the MBB reproduces the underlying
dependence structure of the process asymptotically. Essentially the same
principle was put forward by Hall (1985) in the context of bootstrapping
spatial data and by Carlstein (1986) for estimating the variance of a statis-
tic based on time series data. A description of Carlstein's method will be
given in the next section. We now turn to a description of the MBB.
Let X_1, X_2, ... be a sequence of stationary random variables, and let
X_n = {X_1, ..., X_n} denote the observations. We shall define the MBB ver-
sion of estimators of the form θ̂_n = T(F_n), where F_n denotes the empirical
distribution function of X_1, ..., X_n, and where T(·) is a (real-valued) func-
tional of F_n. Suppose ℓ ≡ ℓ_n ∈ [1, n] is an integer. For dependent data, we
typically require that

ℓ → ∞ and n^{-1}ℓ → 0 as n → ∞.

However, a description of the MBB can be given without this restriction.
Let B_i = (X_i, ..., X_{i+ℓ−1}) denote the block of length ℓ starting with X_i,
1 ≤ i ≤ N, where N = n − ℓ + 1. (See Figure 2.1 below.) To obtain
the MBB samples, we randomly select a suitable number of blocks from
the collection {B_1, ..., B_N}. Accordingly, let B_1^*, ..., B_k^* denote a simple
random sample drawn with replacement from {B_1, ..., B_N}. Note that each
of the selected blocks contains ℓ elements. Denote the elements of B_i^* by
(X*_{(i−1)ℓ+1}, ..., X*_{iℓ}), i = 1, ..., k. Then, X_1^*, ..., X_m^* constitute the MBB
sample of size m ≡ kℓ. The MBB version θ̂*_{m,n} of θ̂_n is defined as

θ̂*_{m,n} = T(F*_{m,n}),

where F*_{m,n} denotes the empirical distribution of (X_1^*, ..., X_m^*).

FIGURE 2.1. The collection {B_1, ..., B_N} of overlapping blocks under the MBB.

An alternative formulation of the MBB can be given as follows. Note
that selecting the blocks B_i^*'s randomly from {B_1, ..., B_N} is equiv-
alent to selecting k indices at random from the set {1, ..., N}. Ac-
cordingly, let I_1, ..., I_k be iid random variables with the discrete uni-
form distribution on {1, ..., N}. If we set B_i^* = B_{I_i} for i = 1, ..., k,
then B_1^*, ..., B_k^* represent a simple random sample drawn with re-
placement from {B_1, ..., B_N}. The bootstrap sample X_1^*, ..., X_m^* can
be defined using the resampled blocks B_1^*, ..., B_k^* as before. Note
that conditional on the data X_n, the resampled blocks of observa-
tions (X_1^*, ..., X_ℓ^*)', (X*_{ℓ+1}, ..., X*_{2ℓ})', ..., (X*_{(k−1)ℓ+1}, ..., X*_{kℓ})' are iid ℓ-
dimensional random vectors with

P_*((X_1^*, ..., X_ℓ^*)' = (X_j, ..., X_{j+ℓ−1})') = P_*(I_1 = j) = N^{-1}, for 1 ≤ j ≤ N,    (2.10)

where P_* denotes the conditional probability given X_n. In the special case
when each block consists of a single element (i.e., ℓ = 1), by (2.10),
X_1^*, ..., X_m^* are iid with the common distribution F_n, and hence, the MBB
reduces to the IID bootstrap method of Efron (1979) described in Section
2.2. For ℓ > 1, the ℓ-dimensional joint distribution of the underlying process
{X_n}_{n≥1} is preserved within the resampled blocks. Since ℓ tends to infinity
with n, any finite-dimensional joint distribution of the {X_n}_{n≥1}-process at a
given number of finite lag distances can be eventually recovered from the
resampled values. As a result, the MBB can effectively capture those char-
acteristics of the underlying process {X_n}_{n≥1} that are determined by the
dependence structure of the observations at short lags.
As in the case of the IID bootstrap, the MBB sample size is typically
chosen to be of the same order as the original sample size. If b_1 denotes
the smallest integer such that b_1 ℓ ≥ n, then one may select k = b_1 blocks
to generate the MBB samples, and use only the first n values to define the
bootstrap version of T_n. However, there are some inference problems where
a smaller sample size works better (cf. Chapter 11).
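The MBB resampling step itself is easy to implement. The sketch below (Python; illustrative code, not from the text) forms the N = n − ℓ + 1 overlapping blocks, draws k = b_1 of them with replacement, and keeps the first n resampled values, as described above.

```python
import numpy as np

def mbb_sample(x, block_len, seed=None):
    """Draw one moving block bootstrap resample of the same length as x."""
    rng = np.random.default_rng(seed)
    n = len(x)
    N = n - block_len + 1                          # number of overlapping blocks
    k = int(np.ceil(n / block_len))                # smallest k with k * block_len >= n
    starts = rng.integers(0, N, size=k)            # iid uniform block indices I_1, ..., I_k
    blocks = [x[s:s + block_len] for s in starts]  # B_{I_1}, ..., B_{I_k}
    return np.concatenate(blocks)[:n]              # keep the first n values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.normal(size=201)
    x = eps[:-1] + 0.5 * eps[1:]                   # the 1-dependent series from Section 2.3
    ell = int(round(len(x) ** (1 / 3)))            # a block length of order n^{1/3}
    boot_means = [mbb_sample(x, ell, seed=b).mean() for b in range(1000)]
    print("MBB estimate of n*Var(Xbar_n):", len(x) * np.var(boot_means))
```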

Though estimators of the form θ̂_n = T(F_n) considered above include
many commonly used estimators, e.g., the sample mean, M-estimators of
location and scale, von Mises functionals, etc., they are not sufficiently rich
for applications in the time series context. This is primarily because θ̂_n
above depends only on the one-dimensional marginal empirical distribu-
tion F_n, and hence does not cover standard statistics like the sample lag
correlations, or the spectral density estimators. We shall now consider a
more general version of the MBB that covers such statistics.
Given the observations X_n, let F_{p,n} denote the p-dimensional empirical
measure

F_{p,n} = (n − p + 1)^{-1} Σ_{j=1}^{n−p+1} δ_{Y_j},

where Y_j = (X_j, ..., X_{j+p−1})' and where for any y ∈ ℝ^p, δ_y denotes the
probability measure on ℝ^p putting unit mass on y. The general version of
the MBB concerns estimators of the form

θ̂_n = T(F_{p,n}),    (2.11)

where T(·) is now a functional defined on a (rich) subset of the set of all
probability measures on ℝ^p. Here, p ≥ 1 may be a fixed integer, or it may
tend to infinity with n suitably. Some important examples of (2.11) are
given below.

Example 2.1: A version of the sample lag covariance of order k ≥ 0 is
given by

γ̂_n(k) = (n − k)^{-1} Σ_{j=1}^{n−k} (X_{j+k} − X̄_{n,k})(X_j − X̄_{n,k}),

where X̄_{n,k} = (n − k)^{-1} Σ_{j=1}^{n−k} X_j. Then, γ̂_n(k) is of the form (2.11) with
p = k + 1. □

Example 2.2: Let ψ be a function from ℝ^p × ℝ^k into ℝ^k such that

Eψ(X_1, ..., X_p; θ) = 0.

Here, θ is a functional of the p-dimensional joint distribution of
(X_1, ..., X_p), implicitly defined by the equation above. A generalized M-
estimator of the parameter θ ∈ ℝ^k is defined (cf. Bustos (1982)) as a
solution of the equation

Σ_{j=1}^{n−p+1} ψ(X_j, ..., X_{j+p−1}; T_n) = 0.

The generalized M-estimators can also be expressed in the form (2.11). □

Example 2.3: Let f(·) denote the spectral density of the process {X_n}_{n≥1}.
Then, a lag-window estimator of the spectral density (cf. Chapter 6, Priest-
ley (1981)) is given by

f̂_n(λ) = Σ_{k=−(n−1)}^{(n−1)} w(k/P) γ̂_n(k) cos(kλ), λ ∈ [0, π],

where P ≡ P_n tends to infinity at a rate slower than n and where w
is a weight function such that w(0) = (2π)^{-1} and w vanishes outside
the interval (−1, 1). For different choices of w, one gets various com-
monly used estimators of the spectral density, such as the truncated
periodogram estimator, the Bartlett estimator, etc. Since f̂_n is a func-
tion of γ̂_n(0), ..., γ̂_n(P), from Example 2.1, it follows that we can express
it in the form (2.11). Note that in this example, p tends to infinity with n. □

To define the MBB version of θ̂_n in (2.11), fix a block size ℓ, 1 < ℓ <
n − p + 1, and define the blocks in terms of the Y_i's as

B_j = (Y_j, ..., Y_{j+ℓ−1}), 1 ≤ j ≤ n − p − ℓ + 2.

For k ≥ 1, select k blocks randomly from the collection {B_i : 1 ≤ i ≤ n − p −
ℓ + 2} to generate the MBB observations Y_1^*, ..., Y_ℓ^*; Y*_{ℓ+1}, ..., Y*_{2ℓ}; ...; Y_m^*,
where m = kℓ. The MBB version of (2.11) is now defined as

θ̂*_{m,n} = T(F̃*_{m,n}),    (2.12)

where F̃*_{m,n} = m^{-1} Σ_{j=1}^m δ_{Y_j^*} denotes the empirical distribution of
Y_1^*, ..., Y_m^*. Thus, for estimators of the form (2.11), the MBB version is
defined by resampling from blocks of Y-values instead of blocks of X-values
themselves. This formulation of the MBB was initially given by Künsch
(1989) and was further explored by Politis and Romano (1992a). Clearly,
the definition (2.12) applies to both the case where p is fixed and the case where
p tends to infinity with n. In the latter case, Politis and Romano (1992a)
called the modified blocking mechanism the "blocks of blocks" boot-
strap, and gave a more general formulation that allows one to control the
amount of overlap between the successive blocks of Y-values. We refer the
reader to Politis and Romano (1992a) for the other versions of the "blocks
of blocks" bootstrap method.
Note that for the more general class of statistics θ̂_n given by (2.11) for
some p ≥ 2, there is an alternative way of defining the bootstrap version
of θ̂_n. Since the estimator θ̂_n can always be expressed as a function of the
given observations X_1, ..., X_n, one may define the bootstrap version of
θ̂_n by resampling from X_1, ..., X_n directly. Specifically, suppose that the

block bootstrap observations X_1^*, ..., X_m^* are generated by resampling from
the blocks B_i = (X_i, ..., X_{i+ℓ−1}), i = 1, ..., N of X-values. Then, define
bootstrap "analogs" of the p-dimensional variable Y_i ≡ (X_i, ..., X_{i+p−1})'
in terms of X_1^*, ..., X_m^* as Y_i^{**} ≡ (X_i^*, ..., X*_{i+p−1})', i = 1, ..., m − p + 1.
Then, the bootstrap version of θ̂_n under this alternative approach is defined
as

θ̂**_{m,n} = T(F**_{m,n}),

where F**_{m,n} = (m − p + 1)^{-1} Σ_{i=1}^{m−p+1} δ_{Y_i^{**}}. We call this approach of
defining the moving block bootstrap version of θ̂_n the "naive" approach,
and the other approach leading to θ̂*_{m,n} in (2.12) the "ordinary" approach of the MBB.
We shall also use the terms "naive" and "ordinary" in the context of boot-
strapping estimators of the form (2.11) using other block bootstrap methods
described later in this chapter.
For a comparison of the two approaches, suppose that {X_n}_{n≥1} is a
sequence of stationary random variables. Then, for each i, the random
vector Y_i = (X_i, ..., X_{i+p−1})' has the same distribution as (X_1, ..., X_p)',
and hence, the resampled vectors Y_i^* under the "ordinary" approach al-
ways retain the dependence structure of (X_1, ..., X_p)'. However, when the
bootstrap blocks are selected by the "naive" approach, the bootstrap ob-
servations X_i^*'s, that are at lags less than p and that lie near the boundary
of two adjacent resampled blocks B_i^* and B*_{i+1}, are independent. Thus the
components of Y_i^{**} under the "naive" approach do not retain the depen-
dence structure of (X_1, ..., X_p)'. As a result, the naive approach introduces
additional bias in the bootstrap version θ̂**_{m,n} of θ̂_n. We shall, therefore, al-
ways use the "ordinary" form of a block bootstrap method while defining
the bootstrap version of estimators θ̂_n given by (2.11). For a numerical
example comparing the naive and the ordinary versions of the MBB and
certain other block bootstrap methods, see Section 4.5.
We conclude this section with two remarks. First, it is easy to see that the
above description of the MBB and the "blocks of blocks" bootstrap applies
almost verbatim if, to begin with, the observations X_1, ..., X_n were ran-
dom vectors instead of random variables. Second, the performance of a MBB
estimator critically depends on the block size ℓ. Since the sampling dis-
tribution of a given estimator typically depends on the joint distribution
of X_1, ..., X_n, the block size ℓ must grow to infinity with the sample size
n to capture the dependence structure of the series {X_n}_{n≥1}, eventually.
Typical choices of ℓ are of the form ℓ = Cn^δ for some constants C > 0,
δ ∈ (0, 1/2). For more on properties of MBB estimators and effects of block
lengths on their performance, see Chapters 3-7.

2.6 Nonoverlapping Block Bootstrap


In this section, we consider the blocking rule due to Carlstein (1986). For
simplicity, here we shall consider estimators given by (2.11) with p = 1 only.
Extension to the case of a general p ≥ 1 is straightforward. The key feature
of Carlstein's blocking rule is to use nonoverlapping segments of the data to
define the blocks. The corresponding block bootstrap method will be called
the nonoverlapping block bootstrap (NBB). Suppose that ℓ ≡ ℓ_n ∈ [1, n] is
an integer and b ≥ 1 is the largest integer satisfying ℓb ≤ n. Then, define
the blocks

B_i^{(2)} = (X_{(i−1)ℓ+1}, ..., X_{iℓ}), i = 1, ..., b.

(Here we use the index "2" in the superscript to denote the blocks for the
NBB resampling scheme. We reserve the index 1 for the MBB and we shall
use the indices 3, 4, etc. for the other block bootstrap methods described
later.) Note that while the blocks in the MBB overlap, the blocks B_i^{(2)}'s
under the NBB do not. See Figure 2.2. As a result, the collection of blocks
from which the bootstrap blocks are selected is smaller than the collection
for the MBB.

FIGURE 2.2. The collection {B_1^{(2)}, ..., B_b^{(2)}} of nonoverlapping blocks under Carlstein's (1986) rule.

The next step in implementing the NBB is exactly the same as that for
the MBB. We select a simple random sample of blocks B_1^{*(2)}, ..., B_k^{*(2)} with
replacement from {B_1^{(2)}, ..., B_b^{(2)}} for some suitable integer k ≥ 1. With
m = kℓ, let F*(2)_{m,n} denote the empirical distribution of the bootstrap sam-
ple (X*_{2,1}, ..., X*_{2,ℓ}; ...; X*_{2,(k−1)ℓ+1}, ..., X*_{2,m}), obtained by writing the
elements of B_1^{*(2)}, ..., B_k^{*(2)} in a sequence. Then, the bootstrap version of
an estimator θ̂_n = T(F_n) is given by

θ̂*(2)_{m,n} = T(F*(2)_{m,n}).    (2.13)

Even though the definitions of the bootstrapped estimators are very sim-
ilar for the MBB and for the NBB, the resulting bootstrap versions θ̂*_{m,n}
and θ̂*(2)_{m,n} have different distributional properties. We illustrate the point
with the simplest case, where θ̂_n = n^{-1} Σ_{j=1}^n X_j is the sample mean. The

bootstrap versions of θ̂_n under the two methods are respectively given by

θ̂*_{m,n} ≡ m^{-1} Σ_{j=1}^m X_j^*   and   θ̂*(2)_{m,n} = m^{-1} Σ_{j=1}^m X*_{2,j}.

From (2.10), we get

E_*(θ̂*_{m,n}) = N^{-1} Σ_{j=1}^N ( ℓ^{-1} Σ_{i=1}^ℓ X_{j+i−1} )
           = N^{-1} { n X̄_n − ℓ^{-1} Σ_{j=1}^{ℓ−1} (ℓ − j)(X_j + X_{n−j+1}) }.    (2.14)

To obtain a similar expression for E_*(θ̂*(2)_{m,n}), note that under the NBB,
the bootstrap variables (X*_{2,1}, ..., X*_{2,ℓ}), ..., (X*_{2,(m−ℓ+1)}, ..., X*_{2,m}) are iid,
with common distribution

P_*((X*_{2,1}, ..., X*_{2,ℓ})' = (X_{(j−1)ℓ+1}, ..., X_{jℓ})') = b^{-1}    (2.15)

for j = 1, ..., b. Hence,

E_*(θ̂*(2)_{m,n}) = b^{-1} Σ_{j=1}^b ( ℓ^{-1} Σ_{i=1}^ℓ X_{(j−1)ℓ+i} ),    (2.16)

which equals X̄_n if n is a multiple of ℓ. Thus, the bootstrapped estimators
have different (conditional) means under the two methods. However, note
that if the process {X_n}_{n≥1} satisfies some standard moment and mixing
conditions, then E{E_*(θ̂*_{m,n}) − E_*(θ̂*(2)_{m,n})}² = O(ℓ/n²). Hence the difference
between the two is negligible for large sample sizes.
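The two conditional means are easy to compute exactly from the data, since each is just an average of block means. The following Python sketch (illustrative; not from the text) evaluates both and checks the closed-form expression (2.14) numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ell = 103, 10                     # n deliberately not a multiple of ell
x = rng.normal(size=n)

# MBB: average of the N = n - ell + 1 overlapping block means, cf. (2.14)
N = n - ell + 1
mbb_mean = np.mean([x[i:i + ell].mean() for i in range(N)])

# NBB: average of the b = floor(n / ell) nonoverlapping block means, cf. (2.16)
b = n // ell
nbb_mean = np.mean([x[j * ell:(j + 1) * ell].mean() for j in range(b)])

print("sample mean        :", x.mean())
print("E_* under the MBB  :", mbb_mean)   # differs from Xbar_n because of edge effects
print("E_* under the NBB  :", nbb_mean)   # equals Xbar_n only if ell divides n

# closed-form expression (2.14) for the MBB conditional mean
edge = sum((ell - j) * (x[j - 1] + x[n - j]) for j in range(1, ell)) / ell
print("formula (2.14)     :", (n * x.mean() - edge) / N)
```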

2.7 Generalized Block Bootstrap


As follows from its description (cf. Section 2.5), the MBB resampling
scheme suffers from an undesirable boundary effect as it assigns lesser
weights to the observations toward the beginning and the end of the data
set than to the middle part. Indeed, for ℓ ≤ j ≤ n − ℓ, the jth obser-
vation X_j appears in exactly ℓ of the blocks {B_1, ..., B_N}, whereas for
1 ≤ j ≤ ℓ − 1, X_j and X_{n−j+1} appear only in j blocks. Since there is no
observation beyond X_n (or prior to X_1), we cannot define new blocks to get
rid of this boundary effect. A similar problem also exists under the NBB
with the observations near the end of the data sequence when n is not a
multiple of ℓ. Politis and Romano (1992b) suggested a simple way out of
this boundary problem. Their idea is to wrap the data around a circle and
form additional blocks using the "circularly defined" observations. Politis
and Romano (1992b, 1994b) put forward two resampling schemes based on
circular blocks, called the "circular block bootstrap" (CBB) and the "sta-
tionary bootstrap" (SB). Here we describe a generalization of their idea and
formulate the generalized block bootstrap method, which provides a unified
framework for describing different block bootstrap methods, including the
CBB and the SB.
Given the variables X_n = {X_1, ..., X_n}, first we define a new time series
Y_{n,i}, i ≥ 1 by periodic extension. Note that for any i ≥ 1, there are integers
k_i ≥ 0 and j_i ∈ [1, n] such that i = k_i n + j_i. Then, i = j_i (modulo n). We
define the variables Y_{n,i}, i ≥ 1 by the relation Y_{n,i} = X_{j_i}. Note that this
is equivalent to writing the variables X_1, ..., X_n repeatedly on a line and
labeling them serially as Y_{n,i}, i ≥ 1. See Figure 2.3.

FIGURE 2.3. The periodically extended time series Y_{n,i}, i ≥ 1.

Next define the blocks

B(i, j) = (Y_{n,i}, ..., Y_{n,(i+j−1)})

for i ≥ 1, j ≥ 1. Let Γ_n be a transition probability function on the set
ℝ^n × ⊗_{t=1}^∞ ({1, ..., n} × ℕ), i.e., for each x ∈ ℝ^n, Γ_n(x; ·) is a probability
measure on ⊗_{t=1}^∞ ({1, ..., n} × ℕ) ≡ {{(i_t, ℓ_t)}_{t=1}^∞ : 1 ≤ i_t ≤ n, 1 ≤ ℓ_t <
∞ for all t ≥ 1}, and for any set A ⊂ ⊗_{t=1}^∞ ({1, ..., n} × ℕ), Γ_n(·; A)
is a Borel measurable function from ℝ^n into [0, 1]. Then, the generalized
block bootstrap (GBB) resamples blocks from the collection {B(i, j) : i ≥
1, j ≥ 1} according to the transition probability function Γ_n as follows. Let
(I_1, J_1), (I_2, J_2), ... be a sequence of random vectors with conditional joint
distribution Γ_n(X_n; ·), given X_n. Then, the blocks selected by the GBB
are given by B(I_1, J_1), B(I_2, J_2), ... (which may not be independent). Let
X^∘_1, X^∘_2, ... denote the elements of these resampled blocks. Then, the
bootstrap version of an estimator θ̂_n = T(F_n) under the GBB is defined as
θ̂^∘_{m,n} = T(F^∘_{m,n}) for a suitable choice of m ≥ 1, where F^∘_{m,n} denotes the
empirical distribution of X^∘_1, ..., X^∘_m.
Almost all block bootstrap methods proposed in the literature can be
shown to be special cases of the GBB. For example, for the MBB based on
a block length ℓ, 1 ≤ ℓ ≤ n, the transition probability function Γ_n is given
by

Γ_n(x; ·) = ⊗_{i=1}^∞ ( (N^{-1} Σ_{j=1}^N δ_j) × δ_ℓ ), x ∈ ℝ^n,

where N = n − ℓ + 1 and δ_y is the probability measure putting mass one at
y. In this case, Γ_n(x; ·) does not depend on x ∈ ℝ^n, and the random indices
(I_1, J_1), (I_2, J_2), ... are conditionally iid random vectors with conditional
distribution

P_*((I_1, J_1) = (j, k)) = N^{-1} if 1 ≤ j ≤ N and k = ℓ, and = 0 otherwise.

As a consequence, the resampled blocks B(I_1, J_1), B(I_2, J_2), ..., come from
the subcollection {B(i, j) : 1 ≤ i ≤ N, j = ℓ}, which is the same as the
collection of overlapping blocks {B_1, ..., B_N} defined in Section 2.5. Simi-
larly, the NBB method can also be shown to be a special case of the GBB.
Here, we consider a few other examples.

2.7.1 Circular Block Bootstrap


The Circular Block Bootstrap (CBB) method, proposed by Politis and
Romano (1992b), resamples overlapping and periodically extended blocks
of a given length ℓ (say) satisfying 1 ≪ ℓ ≪ n from the subcollection
{B(1, ℓ), ..., B(n, ℓ)}. The transition function Γ_n for the CBB is given by

Γ_n(x; ·) = ⊗_{i=1}^∞ ( (n^{-1} Σ_{j=1}^n δ_j) × δ_ℓ ), x ∈ ℝ^n.    (2.17)

Denote the resampling block indices for the CBB (i.e., the variables I_i's
in the collection (I_1, J_1), (I_2, J_2), ... whose joint distribution is specified
by the Γ_n(·; ·) of (2.17)) by I_{3,1}, I_{3,2}, .... Then, (2.17) implies that the
variables I_{3,1}, I_{3,2}, ... are conditionally iid with P_*(I_{3,1} = i) = n^{-1} for all
i = 1, ..., n, and P_*(J_i = ℓ) = 1. Since each X_i appears exactly ℓ times in
the collection of blocks {B(1, ℓ), ..., B(n, ℓ)}, and since the CBB resamples
the blocks from this collection with equal probability, each of the original
observations X_1, ..., X_n receives equal weight under the CBB. This prop-
erty distinguishes the CBB from its predecessors, viz., the MBB and the

NBB, which suffer from edge effects. This is also evident from the following
observation. Let X*_{3,1}, X*_{3,2}, ... denote the CBB observations obtained by
arranging the elements of the resampled blocks {B(I_{3,i}, ℓ) : i ≥ 1} and let
X̄*(3)_m denote the CBB sample mean based on m bootstrap observations,
where m = kℓ for some integer k ≥ 1. Then, by (2.17),

E_*[ m^{-1} Σ_{i=1}^m X*_{3,i} ] = ℓ^{-1} E_*[ Σ_{i=1}^ℓ X*_{3,i} ]
  = ℓ^{-1} [ n^{-1} Σ_{j=1}^n { Σ_{i=1}^ℓ Y_{n,(j+i−1)} } ]
  = ℓ^{-1} [ ℓ X̄_n ]
  = X̄_n.    (2.18)

Thus, the conditional expectation of the bootstrap sample mean under the
CBB equals the sample mean of the data X_n, a property not shared by the
MBB or the NBB. As noted by Politis and Romano (1992b), this makes it
easier to define the bootstrap version of a pivotal quantity of the form T_n =
t_n(X̄_n; μ), where μ = EX_1. Under the CBB, T*(3)_{m,n} = t_m(X̄*(3)_m; X̄_n) gives
the appropriate bootstrap version of T_n. However, replacing the population
parameter μ simply by X̄_n to define the bootstrap version of T_n under the
MBB or the NBB introduces some extra bias and hence, it is no longer the
right thing to do (cf. Lahiri (1992a)). We shall look at properties of the
CBB method in Chapters 3, 4, and 5.
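The wrap-around construction is equally simple to code. The sketch below (Python; illustrative, with our own function name) draws CBB blocks from the periodically extended series and verifies numerically that the average of the block means over all n starting points equals X̄_n, in line with (2.18).

```python
import numpy as np

def cbb_sample(x, block_len, k, rng):
    """Draw k circular blocks of length block_len from the periodic extension of x."""
    n = len(x)
    x_ext = np.concatenate([x, x[:block_len - 1]])    # Y_{n,1}, ..., Y_{n, n+block_len-1}
    starts = rng.integers(0, n, size=k)               # I_{3,1}, ..., I_{3,k} uniform on {1, ..., n}
    return np.concatenate([x_ext[s:s + block_len] for s in starts])

rng = np.random.default_rng(0)
x = rng.normal(size=150)
ell, k = 12, 1000
n = len(x)
x_ext = np.concatenate([x, x[:ell - 1]])
# conditional mean of the CBB sample mean: average over all n starting points
cond_mean = np.mean([x_ext[s:s + ell].mean() for s in range(n)])
print("sample mean       :", x.mean())
print("E_* under the CBB :", cond_mean)               # equal, cf. (2.18)
print("one CBB resample  :", cbb_sample(x, ell, k, rng).mean())
```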

2.7.2 Stationary Block Bootstrap


The stationary bootstrap (SB) of Politis and Romano (1994b) differs from
the earlier block bootstrap methods (i.e., from the MBB, NBB, and CBB) in
that it uses blocks of random lengths rather than blocks of a fixed length
ℓ. Let p ≡ p_n ∈ (0, 1) be such that p → 0 and np → ∞ as n → ∞. Then
the SB resamples the blocks B(I_{4,1}, J_{4,1}), B(I_{4,2}, J_{4,2}), ..., where the index
vectors (I_{4,1}, J_{4,1}), (I_{4,2}, J_{4,2}), ... are conditionally iid with I_{4,1} having the
discrete uniform distribution on {1, ..., n}, and J_{4,1} having the geometric
distribution V_n with parameter p, i.e.,

P_*(J_{4,1} = j) = p(1 − p)^{j−1}, j = 1, 2, ....    (2.19)

Furthermore, I_{4,1} and J_{4,1} are independent. Thus, the SB corresponds to
the GBB method with the transition function Γ_n(·; ·) given by

Γ_n(x; ·) = ⊗_{i=1}^∞ ( (n^{-1} Σ_{j=1}^n δ_j) × V_n ), x ∈ ℝ^n.

Note that here also, Γ_n(x; ·) does not depend on x ∈ ℝ^n.
The SB method can be described through an alternative formulation,
also given by Politis and Romano (1994b). Suppose X*_{4,1}, X*_{4,2}, ... denote
the SB observations, obtained by arranging the elements of the resampled
blocks B(I_{4,1}, J_{4,1}), B(I_{4,2}, J_{4,2}), ... in a sequence. The sequence {X*_{4,i}}_{i∈ℕ}
may also be generated by the following resampling mechanism. Let X*_{4,1}
be picked at random from {X_1, ..., X_n}, i.e., let X*_{4,1} = Y_{n,I_{4,1}} where I_{4,1}
is as above. To select the next observation X*_{4,2}, we further randomize
and perform a binary experiment with probability of "Success" equal to
p. If the binary experiment results in a "Success," then we select X*_{4,2}
again at random from {X_1, ..., X_n}. Otherwise, we set X*_{4,2} = Y_{n,(I_{4,1}+1)},
the observation next to X*_{4,1} ≡ Y_{n,I_{4,1}} in the periodically extended series
{Y_{n,i}}_{i≥1}. In general, given that X*_{4,i} has been chosen and is given by Y_{n,i_0}
for some i_0 ≥ 1, the next SB observation X*_{4,(i+1)} is chosen as Y_{n,(i_0+1)}
with probability (1 − p) and is drawn at random from the original data set
X_n with probability p.
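This sequential description translates directly into code. The following Python sketch (illustrative; the function name and the choice of p are ours) generates one SB pseudo-series: at each step it either continues along the periodically extended series with probability 1 − p or restarts a block at a uniformly chosen index with probability p.

```python
import numpy as np

def stationary_bootstrap(x, p, m=None, seed=None):
    """One stationary bootstrap resample; the expected block length is 1/p."""
    rng = np.random.default_rng(seed)
    n = len(x)
    m = n if m is None else m
    out = np.empty(m)
    idx = rng.integers(0, n)                 # X*_{4,1} picked at random
    out[0] = x[idx]
    for i in range(1, m):
        if rng.random() < p:                 # "Success": start a new block
            idx = rng.integers(0, n)
        else:                                # "Failure": next observation, wrapping around
            idx = (idx + 1) % n
        out[i] = x[idx]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=300)
    xb = stationary_bootstrap(x, p=0.1, seed=1)   # mean block length 10
    print(len(xb), xb.mean())
```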
To see that these two formulations are equivalent, let W_i denote the
variable associated with the binary experiment for selecting X*_{4,i}, i ≥ 2.
Then, conditional on X_n, W_i, i ≥ 2 are iid random variables with P_*(W_i =
1) = p = 1 − P_*(W_i = 0), and {W_i : i ≥ 2} is independent of {I_{4,i} : i ≥ 1}.
Next define the variables M_j, j ≥ 0, by

M_0 = 1,
M_j = inf{ i ≥ M_{j−1} + 1 : W_i = 1 }, j ≥ 1.

Thus, M_j, j ∈ ℕ denotes the trial number in the sequence of trials
{W_i : i ≥ 2} at which the binary experiment resulted in the jth "Suc-
cess" and has a negative binomial distribution with parameters j and p
(up to a translation). Note that the corresponding SB observation, viz.,
X*_{4,M_j}, is then selected at random from {X_1, ..., X_n} as X*_{4,M_j} = Y_{n,I_{4,(j+1)}},
j ≥ 1. On the other hand, for any i between M_{j−1} + 1 and M_j − 1,
the binary experiment resulted in a block of "Failures," and the corre-
sponding SB observations are selected by picking (M_j − M_{j−1} − 1) vari-
ables following Y_{n,I_{4,j}} in the sequence {Y_{n,i}}_{i∈ℕ}. Thus, the binary tri-
als {W_i : i = M_{j−1}, ..., M_j − 1} lead to the "SB block" of observa-
tions {X*_{4,M_{j−1}}, ..., X*_{4,(M_j−1)}} = {Y_{n,I_{4,j}}, ..., Y_{n,(I_{4,j}+M_j−M_{j−1}−1)}}, j ≥ 1.
Now, defining J_{4,j} = M_j − M_{j−1}, j ≥ 1 and using the properties of the
negative binomial distribution (cf. Section XI.2, Feller (1971a)), we may
conclude that J_{4,1}, J_{4,2}, ... are (conditionally) iid and follow the geometric
distribution with parameter p. Hence, the two formulations of the SB are
equivalent.
An important property of the SB method is that conditional on X_n, the
bootstrap observations {X*_{4,i}}_{i∈ℕ} are stationary (which is why it is called
the "stationary" bootstrap). A simple proof of this fact can be derived
using the second formulation of the SB as follows. Let {Z_i}_{i∈ℕ} be a Markov
chain on {1, ..., n} such that conditional on X_n, the initial distribution of
the chain is π ≡ (n^{-1}, ..., n^{-1})', and the stationary transition probability
matrix of {Z_i}_{i∈ℕ} is Q ≡ ((q_{ij})), where

q_{ij} = (1 − p) + n^{-1}p   if 1 ≤ i < n, j = i + 1,
     = n^{-1}p             if 1 ≤ i < n, j ≠ i + 1,    (2.20)
     = n^{-1}p             if i = n, 2 ≤ j ≤ n,
     = (1 − p) + n^{-1}p   if i = n, j = 1.

Thus, Z_1 takes the values 1, ..., n with probability n^{-1} each. Also, for
any k ≥ 1, given that Z_k = i, 1 ≤ i ≤ n, the next index Z_{k+1} takes
the value i + 1 (modulo n) with probability (1 − p) + n^{-1}p and it takes
each of the remaining (n − 1) values with probability n^{-1}p. Thus,
from the second formulation of the SB described earlier, it follows that the
SB observations {X*_{4,i}}_{i∈ℕ} may also be generated by the index variables
{Z_i}_{i∈ℕ} as

X*_{4,i} = X_{Z_i}, i ≥ 1.    (2.21)

To see that {X*_{4,i}}_{i∈ℕ} is stationary, note that by definition, the transition
matrix Q is doubly stochastic and that it satisfies the relation π'Q = π'.
Therefore, π is the stationary distribution of {Z_i}_{i∈ℕ} and {Z_i}_{i∈ℕ} is a
stationary Markov chain. Thus, we have proved the following theorem.

Theorem 2.3 Let F_{i,n} denote the σ-field generated by Z_i and X_n, i ≥ 1.
Then, conditional on X_n, {X*_{4,i}, F_{i,n}}_{i∈ℕ} is a stationary Markov chain for
each n ≥ 1, i.e.,

P_*(X*_{4,(i+1)} ∈ B | F_{i,n}) = P_*(X*_{4,(i+1)} ∈ B | X*_{4,i}) for all B ∈ B(ℝ), i ≥ 1,

and

P_*((X*_{4,(i+1)}, ..., X*_{4,(i+j)}) ∈ A) = P_*((X*_{4,1}, ..., X*_{4,j}) ∈ A) for all A ∈ B(ℝ^j), i, j ≥ 1.
In particular, Theorem 2.3 implies that conditional on X_n, {X*_{4,i}}_{i≥1} is
stationary. Furthermore, by (2.20) and (2.21), for a given resample size m,
the conditional expectation of the SB sample mean X̄*(4)_m ≡ m^{-1} Σ_{i=1}^m X*_{4,i}
is given by

E_*(X̄*(4)_m) = X̄_n.    (2.22)

We shall consider other properties of the SB method in Chapters 3-5.

2.8 Subsampling
Use of different subsets of the data to approximate the bias and variance
of an estimator is a common practice, particularly in the context of iid
observations. For example, the Jackknife bias and variance estimators are
computed using subsets of size n-1 from the full sample Xn = (Xl"'" Xn)
(cf. Efron (1982)). However, as noted recently (see Carlstein (1986), Politis
and Romano (1994a), Hall and Jing (1996), Bickel et al. (1997), and the
references therein), subseries of dependent observations can also be used to
produce valid estimators of the bias, the variances, and more generally, of
the sampling distribution of a statistic under very weak assumptions.
To describe the subsampling method, suppose that θ̂_n = t_n(X_n) is an
estimator of a parameter θ, such that for some normalizing constant a_n > 0,
the probability distribution Q_n(x) = P(a_n(θ̂_n − θ) ≤ x) of the centered and
scaled estimator θ̂_n converges weakly to a limit distribution Q(x), i.e.,

Q_n(x) → Q(x) as n → ∞    (2.23)

for all continuity points x of Q. Furthermore, assume that a_n → ∞ as
n → ∞ and that Q is not degenerate at zero, i.e., Q({0}) < 1. Let 1 ≤ ℓ ≤ n
be a given integer and let

B_i = (X_i, ..., X_{i+ℓ−1}),

1 ≤ i ≤ N, denote the overlapping blocks of length ℓ, where N = n − ℓ + 1.
Note that the blocks B_i's are the same as those defined in Section 2.5 for the
MBB. Then, the subsampling estimator of Q_n, based on the overlapping
version of the subsampling method, is given by

Q̂_n(x) = N^{-1} Σ_{i=1}^N 𝟙( a_ℓ(θ̂_{i,ℓ} − θ̂_n) ≤ x ), x ∈ ℝ,    (2.24)

where θ̂_{i,ℓ} is a "copy" of the estimator θ̂_n on the block B_i, defined by
θ̂_{i,ℓ} = t_ℓ(B_i), i = 1, ..., N. Note that we used t_ℓ(·) (in place of t_n(·))
to define the subsample copy θ̂_{i,ℓ}, as the ith block B_i contains only ℓ
observations. That is also the reason behind using the scaling constant a_ℓ
instead of a_n. From the above description, it follows that the overlapping
version of the subsampling method is a special case of the MBB where a
single block is resampled.
The estimator Q̂_n of the distribution function Q_n(x) can be used to
obtain subsampling estimators of the bias and the variance of θ̂_n. Note
that the bias of θ̂_n is given by

Bias(θ̂_n) = E(θ̂_n − θ) = a_n^{-1} ∫ x dQ_n(x).

The subsampling estimator of Bias(θ̂_n) is then obtained by replacing Q_n(·)
by Q̂_n(·), viz.,

B̂ias_n(θ̂_n) = a_n^{-1} ∫ x dQ̂_n(x) = (a_ℓ/a_n) N^{-1} Σ_{i=1}^N (θ̂_{i,ℓ} − θ̂_n).    (2.25)

Similarly, the subsampling estimator of the variance of θ̂_n is given by

V̂ar_n(θ̂_n) = a_n^{-2} { ∫ x² dQ̂_n(x) − (∫ x dQ̂_n(x))² }
          = a_ℓ² a_n^{-2} { N^{-1} Σ_{i=1}^N θ̂_{i,ℓ}² − (N^{-1} Σ_{i=1}^N θ̂_{i,ℓ})² },    (2.26)

which is the sample variance of the θ̂_{i,ℓ}'s multiplied by the scaling factor a_ℓ² a_n^{-2}.
In (2.25) and (2.26), we need to use the correction factors (a_ℓ/a_n) and
(a_ℓ/a_n)² to scale up from the level of the θ̂_{i,ℓ}'s, which are defined using ℓ
observations, to the level of θ̂_n, which is defined using n observations. In
applying a bootstrap method, one typically uses a resample size that is com-
parable to the original sample size, and therefore, such explicit corrections
of the bootstrap bias and variance estimators are usually unnecessary.
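For the sample mean, where a_n = √n, the estimators (2.24)–(2.26) reduce to a few lines of code. The Python sketch below (illustrative; not from the text) computes the overlapping-block subsampling estimator (2.26) of Var(X̄_n) for a 1-dependent series.

```python
import numpy as np

def subsampling_var_mean(x, ell):
    """Overlapping subsampling estimator (2.26) of Var(Xbar_n) with a_n = sqrt(n)."""
    n = len(x)
    N = n - ell + 1
    # theta_hat_{i,ell}: the sample mean recomputed on each block B_i
    block_means = np.array([x[i:i + ell].mean() for i in range(N)])
    # scaling factor (a_ell / a_n)^2 = ell / n times the sample variance of the copies
    return (ell / n) * block_means.var()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.normal(size=2001)
    x = eps[:-1] + 0.5 * eps[1:]                  # 1-dependent series with sigma_infty^2 = 2.25
    print("subsampling estimate of n*Var(Xbar_n):",
          len(x) * subsampling_var_mean(x, ell=20))
```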
In analogy to the bootstrap methods, one may attempt to apply the
subsampling method to a centered variable of the form T_{1n} ≡ (θ̂_n − θ).
However, this may not be the right thing to do. Indeed, if instead of using
the subsampling method for the scaled random variable a_n(θ̂_n − θ), we
consider only the centered variable T_{1n} = (θ̂_n − θ), then the subsampling
estimator of the distribution Q_{1n}, say, of T_{1n} would be given by

Q̂_{1n}(x) ≡ N^{-1} Σ_{i=1}^N 𝟙( (θ̂_{i,ℓ} − θ̂_n) ≤ x ), x ∈ ℝ.

Since Bias(θ̂_n) = ET_{1n} = ∫ x dQ_{1n}(x), using Q̂_{1n}(x), we would get

B̂ias_{1n}(θ̂_n) = ∫ x dQ̂_{1n}(x)

as an estimator of Bias(θ̂_n), and similarly, we would get

V̂ar_{1n}(θ̂_n) = ∫ x² dQ̂_{1n}(x) − ( ∫ x dQ̂_{1n}(x) )²

as an estimator of Var(θ̂_n). However, these subsampling estimators of the
bias and the variance of θ̂_n, defined using Q̂_{1n}(x), are very "poor" esti-
mators of the corresponding population parameters. To appreciate why,
consider the case where θ̂_n = X̄_n and θ = EX_1 and n^{1/2}(θ̂_n − θ) →_d
N(0, σ_∞²) as n → ∞ with σ_∞² = Σ_{i=−∞}^∞ Cov(X_1, X_{1+i}) ≠ 0. Write

X̄_{i,ℓ} for the average of the ℓ observations in B_i, i = 1, ..., N. Then,
V̂ar_{1n}(θ̂_n) = N^{-1} Σ_{i=1}^N X̄²_{i,ℓ} − μ̂_n², where μ̂_n = N^{-1} Σ_{i=1}^N X̄_{i,ℓ} is the av-
erage of the N block averages. Then, it is not difficult to show that under
some standard moment and weak dependence conditions on the process
{X_i}_{i∈ℤ} and under the assumption that ℓ^{-1} + n^{-1}ℓ = o(1) as n → ∞,

V̂ar_{1n}(θ̂_n) = Var(X̄_ℓ) + N^{-1} Σ_{i=1}^N { [X̄_{i,ℓ} − θ]² − Var(X̄_ℓ) } − [μ̂_n − θ]²
            = Var(X̄_ℓ) + o_p(ℓ^{-1}),    (2.27)

whereas Var(X̄_n) = n^{-1}σ_∞² + O(n^{-2}) as n → ∞. Since Var(X̄_ℓ) = ℓ^{-1}σ_∞² +
O(ℓ^{-2}), V̂ar_{1n}(θ̂_n) indeed overestimates the variance of θ̂_n by a scaling
factor of n/ℓ, which blows up to infinity with n. It is easy to see that the
other estimator, viz., V̂ar_n(θ̂_n), is equal to ℓ/n times V̂ar_{1n}(θ̂_n) in this
case and thus, provides a sensible estimator of Var(X̄_n). The reason that
the subsampling estimator based on T_{1n} does not work in this case is that
the limit distribution of T_{1n} is degenerate at zero, and does not satisfy the
nondegeneracy requirement stated above.
Formulas (2.24), (2.25), and (2.26) illustrate a very desirable property of
the subsampling method that holds true generally. Computations of Q̂_n(·)
and of estimates of other population quantities based on Q̂_n do not involve
any resampling and hence, are less demanding. Typically, a simple, closed-
form expression can be written down for a subsampling estimator of a level-
2 parameter, and it needs computation of the subsampling version θ̂_{i,ℓ} of the
estimator θ̂_n only N times, as compared to a much larger number of times
for the resampling methods like the MBB. However, the price paid is the
lack of "automatic" second-order correctness of the subsampling method
compared to the MBB and other block bootstrap methods.
We conclude this section with an observation. As noted previously, the
subsampling method is a special case of the MBB where the number of
resampled blocks is identically equal to 1. Exploiting this fact, we may sim-
ilarly define other versions of the subsampling method based on nonover-
lapping blocks or circular blocks. More generally, it is possible to extend
the subsampling method in the spirit of the GBB method. We define the
"generalized subsampling" method as the GBB method with a single sam-
ple (I_1, J_1) of the indices. Thus, the generalized subsampling estimator of
Q_n(x) (cf. (2.23)) is given by

Q̃_n(x) = P_*( a_{J_1}( θ̂_{J_1,n} − θ̂_n ) ≤ x ), x ∈ ℝ,

where θ̂_{J_1,n} = t_{J_1}(B(I_1, J_1)) is a copy of θ̂_n based on the GBB samples
from a single block B(I_1, J_1) of length J_1.

2.9 Transformation-Based Bootstrap


As described in Chapter 1, the basic idea behind the bootstrap method is
to recreate the relation between the population and the sample using the
sample itself. For dependent data, the most common approach to this prob-
lem is to resample "blocks" of observations instead of single observations,
which preserves the dependence structure of the underlying process within
the resampled blocks and is able to reproduce the effect of dependence
at short lags. A quite different approach to the problem was suggested
by Hurvich and Zeger (1987). In their seminal work, Hurvich and Zeger
(1987) considered the discrete Fourier transform (DFT) of the data and,
rather than resampling the data values directly, they applied the IID boot-
strap method of Efron (1979) to the DFT values. The transformation-based
bootstrap (TBB) described here is a generalization of Hurvich and Zeger's
(1987) idea.
To describe it, let θ ≡ θ(P) be a parameter of interest, which depends on
the underlying probability measure P generating the sequence {X_i}_{i∈ℤ},
and let T_n ≡ t_n(X_n) be an estimator of θ based on the observations
X_n = (X_1, ..., X_n). Our goal is to approximate the sampling distribution
of a normalized or studentized statistic R_n = r_n(X_n; θ). Let Y_n = h_n(X_n)
be a (one-to-one) transformation of X_n such that the components of Y_n,
say, {Y_i : i ∈ I_n}, are "approximately independent." Also suppose that
the variable R_n can be expressed (at least to a close approximation) in
terms of Y_n as R_n = r_{1n}(Y_n; θ) for some reasonable function r_{1n}. Then,
to approximate the distribution of R_n by the TBB, we resample from a
suitable subcollection {Y_i : i ∈ Ĩ_n} of {Y_i : i ∈ I_n} to get the bootstrap
observations Y*_n ≡ {Y_i^* : i ∈ Ĩ_n}, either by selecting a single Y-value at
a time as in the IID-bootstrap method of Efron (1979) or by selecting a
block of Y-values from {Y_i : i ∈ Ĩ_n} as in the MBB, depending on the
dependence structure of {Y_i : i ∈ I_n}. The TBB estimator of the distribu-
tion of R_n is then given by the conditional distribution of R*_n ≡ r_{1n}(Y*_n; θ̂_n)
given the data X_n, where θ̂_n is an estimator of θ based on X_n. Thus, as a
principle, the TBB method suggests an additional transformation step to
reduce the dependence in the data to an iid structure or to a weaker form
of dependence.
An important example of the TBB method is the Frequency Domain
Bootstrap (FDB), which uses the Fourier transform of the data to generate
the Y-variables of the TBB. Suppose that {X_i}_{i∈ℤ} is a sequence of sta-
tionary, weakly dependent random variables. The Fourier transform of the
observations X_n is defined as

Y_n(w) = n^{-1/2} Σ_{j=1}^n X_j exp(−ιwj), w ∈ (−π, π],

where recall that ι = √−1. Though the X_i's are dependent, a well-known
result in time series (cf. Brockwell and Davis (1991, Chapter 10);
Lahiri (2003a)) states that for any set of distinct ordinates −π < λ_1, ..., λ_k ≤ π,
the Fourier transforms Y_n(λ_1), ..., Y_n(λ_k) are asymptotically independent.
Furthermore, the original observations X_n admit a representation in terms
of the transformed values Y_n = {Y_n(w_j) : j ∈ I_n} as (cf. Brockwell and
Davis (1991, Chapter 10))

X_t = n^{-1/2} Σ_{j∈I_n} Y_n(w_j) exp(ι t w_j), t = 1, ..., n,    (2.28)

where w_j = 2πj/n and I_n = {−⌊(n−1)/2⌋, ..., ⌊n/2⌋}. Thus, using the
inversion formula (2.28), we can express a given variable R_n = r_n(X_n; θ)
also in terms of the transformed values Y_n. Since the variables in Y_n are
approximately independent, we may (suitably) resample these Y-values to
define the FDB version of R_n. Here, however, some care must be taken since
the (asymptotic) variances of the Y-variables are not necessarily identical.
A more complete description of the FDB method and its properties are
given in Chapter 9.
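The transform and the inversion formula (2.28) are easy to check numerically. The short Python sketch below (illustrative only) computes Y_n(w_j) at the ordinates w_j = 2πj/n, j ∈ I_n, and recovers X_1, ..., X_n from them up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.normal(size=n)

j_idx = np.arange(-((n - 1) // 2), n // 2 + 1)          # the index set I_n
w = 2 * np.pi * j_idx / n
t = np.arange(1, n + 1)

# Y_n(w_j) = n^{-1/2} * sum_t X_t exp(-i t w_j)
Y = np.array([x @ np.exp(-1j * t * wj) for wj in w]) / np.sqrt(n)

# inversion (2.28): X_t = n^{-1/2} * sum_j Y_n(w_j) exp(i t w_j)
x_rec = np.array([(Y * np.exp(1j * ti * w)).sum() for ti in t]).real / np.sqrt(n)
print("max reconstruction error:", np.max(np.abs(x - x_rec)))   # essentially zero
```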

2.10 Sieve Bootstrap


Let {X_i}_{i∈ℤ} be a stationary time series and let T_n = t_n(X_1, ..., X_n) be an
estimator of a level-1 parameter of interest θ = θ(P), where P denotes the
(unknown) joint distribution of {X_i}_{i∈ℤ}. Then, the sampling distribution
of T_n is given by

G_n(B) = P ∘ t_n^{-1}(B)    (2.29)

for Borel sets B in ℝ, where P ∘ t_n^{-1} denotes the probability distribution on
ℝ induced by the transformation t_n(·) under P. As described in Chapter 1,
the bootstrap and other resampling methods are general estimation meth-
ods for estimating level-2 parameters like G_n(B), Var(T_n), etc. When
the X_i's are iid with a common distribution F, we may write P = F^∞
and an estimator of G_n(B) in (2.29) may be generated by replacing P with
P̂_n = F̂_n^∞ in (2.29), where F̂_n is an estimator of F. However, when the
X_i's are dependent, such a factorization of P does not hold. In this case,
estimation of the level-2 parameter G_n(B) can be thought of as a two-step
procedure where, in the first step, P is approximated by a "simpler" prob-
ability distribution P̃_n and in the next step, P̃_n is estimated using the data
{X_1, ..., X_n}. The idea of the sieve bootstrap is to choose {P̃_n}_{n≥1} to be
a sieve approximation to P, i.e., {P̃_n}_{n≥1} is a sequence of probability mea-
sures on (ℝ^∞, B(ℝ^∞)) such that for each n, P̃_{n+1} is a finer approximation
to P than P̃_n, and P̃_n converges to P (in some suitable sense) as n → ∞.

For the block bootstrap methods like the NBB or the MBB, the first step
approximation P̃_n is taken to be P_ℓ ⊗ P_ℓ ⊗ ⋯, where P_ℓ denotes the joint
distribution of the block (X_1, ..., X_ℓ) of length ℓ. In the second step, P_ℓ
is estimated by the empirical distribution of all overlapping (under MBB)
or nonoverlapping (under NBB) blocks of length ℓ contained in the data.
For a large class of stationary processes, Bühlmann (1997) presents a sieve
bootstrap method based on a sieve of autoregressive processes of increas-
ing order, which we shall briefly describe here. However, other choices of
{P̃_n}_{n≥1} are possible. See Bühlmann (2002) for another interesting proposal
based on variable length Markov chains for finite state space categorical
time series. In general, there is a trade-off between the accuracy and the
range of validity of a given sieve bootstrap method. Typically, one may
choose a sieve to obtain a more accurate bootstrap estimator, but only at
the expense of restricting the applicability to a smaller class of processes
(cf. Lahiri (2002b)).
Let {X_i}_{i∈ℤ} be a stationary process with EX_1 = μ such that it admits
the one-sided moving average representation

X_i − μ = Σ_{j=0}^∞ a_j ε_{i−j}, i ∈ ℤ,    (2.30)

where {ε_i}_{i∈ℤ} is a sequence of zero mean uncorrelated random variables and
where a_0 = 1, Σ_{j=1}^∞ a_j² < ∞. Suppose that {X_i}_{i∈ℤ} satisfies the standard
invertibility conditions for a linear process (cf. Theorem 7.6.9, Anderson
(1971)). Then, we can represent {X_i − μ}_{i∈ℤ} as a one-sided infinite order
autoregressive process

(X_i − μ) = Σ_{j=1}^∞ β_j (X_{i−j} − μ) + ε_i, i ∈ ℤ,    (2.31)

with Σ_{j=1}^∞ β_j² < ∞. The representation (2.31) suggests that autoregressive
processes of finite orders p_n, n ≥ 1, may be used to define a sieve ap-
proximation for the joint distribution P of {X_i}_{i∈ℤ}. To describe the sieve
bootstrap based on autoregression, let X_n = {X_1, ..., X_n} denote the ob-
servations from the process {X_i}_{i∈ℤ}. Let {p_n}_{n≥1} be a sequence of positive
integers such that p_n ↑ ∞ as n → ∞, but n^{-1} p_n → 0 as n → ∞. The sieve
approximation P̃_n to P is determined by the autoregressive process

X_i − μ = Σ_{j=1}^{p_n} β_j (X_{i−j} − μ) + ε_i, i ∈ ℤ.    (2.32)

Next, we fit the AR(p_n) model (2.32) to the data X_n to obtain estimators
β̂_{1n}, ..., β̂_{p_n n} of the autoregression parameters (for example, by the least
squares method). This yields the residuals


ε̂_{in} = (X_i − X̄_n) − Σ_{j=1}^{p_n} β̂_{jn} (X_{i−j} − X̄_n), p_n + 1 ≤ i ≤ n,

where X̄_n = n^{-1} Σ_{i=1}^n X_i. As in Section 2.4, we center the residuals at
ε̄_n = (n − p_n)^{-1} Σ_{i=p_n+1}^n ε̂_{in} and resample from the centered residuals
{ε̂_{in} − ε̄_n : p_n + 1 ≤ i ≤ n} to generate the sieve bootstrap error variables
ε_i^*, i ≥ p_n + 1. Then, the sieve bootstrap observations are generated by the
recursion relation

(X_i^* − X̄_n) = Σ_{j=1}^{p_n} β̂_{jn} (X*_{i−j} − X̄_n) + ε_i^*, i ≥ p_n + 1,

by setting the initial p_n variables X_1^*, ..., X*_{p_n} equal to X̄_n. The autore-
gressive sieve bootstrap version of the estimator T_n = t_n(X_1, ..., X_n) is
now given by

T*_{m,n} = t_m(X_1^*, ..., X_m^*), m > p_n.
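A minimal implementation of the autoregressive sieve bootstrap for, say, the sample mean follows the same pattern as Section 2.4. The Python sketch below is illustrative only: the least squares fit, the ad hoc choice p_n ≈ n^{1/4}, and the function name are ours, not prescriptions from the text.

```python
import numpy as np

def ar_sieve_bootstrap(x, p, m=None, seed=None):
    """One autoregressive-sieve bootstrap series based on a fitted AR(p) model."""
    rng = np.random.default_rng(seed)
    n = len(x)
    m = n if m is None else m
    xc = x - x.mean()
    # least squares fit of the AR(p) coefficients beta_1, ..., beta_p
    Z = np.column_stack([xc[p - j:n - j] for j in range(1, p + 1)])
    beta = np.linalg.lstsq(Z, xc[p:], rcond=None)[0]
    resid = xc[p:] - Z @ beta
    resid = resid - resid.mean()                    # centered residuals
    # generate the bootstrap series by the AR(p) recursion, started at Xbar_n
    xs = np.full(m, x.mean())
    eps_star = rng.choice(resid, size=m, replace=True)
    for i in range(p, m):
        lagged = xs[i - p:i][::-1]                  # X*_{i-1}, ..., X*_{i-p}
        xs[i] = x.mean() + beta @ (lagged - x.mean()) + eps_star[i]
    return xs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.normal(size=600)
    x = np.empty(600)
    x[0] = e[0]
    for i in range(1, 600):
        x[i] = 0.5 * x[i - 1] + e[i]
    p_n = int(round(600 ** 0.25))                   # an ad hoc, slowly growing order
    xs = ar_sieve_bootstrap(x, p_n, seed=1)
    print("sieve bootstrap mean:", xs.mean(), " p_n =", p_n)
```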
Under some regularity conditions on the variables $\{\epsilon_i\}_{i \in \mathbb{Z}}$ of (2.30) and the sieve parameter $p_n$, Bühlmann (1997) establishes consistency of the autoregressive sieve bootstrap. It follows from his results that the autoregressive sieve bootstrap provides a more accurate variance estimator for the class of estimators given by (2.11) than the MBB and the NBB. However, consistency of the autoregressive sieve bootstrap variance estimators holds for a more restricted class of processes than the block bootstrap methods. See Bühlmann (1997), Choi and Hall (2000), and the references therein for more about the properties of the autoregressive sieve bootstrap.
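As a concrete illustration, the following sketch implements the autoregressive sieve bootstrap described above in Python/NumPy. The least squares fit, the default order choice $p_n \approx n^{1/4}$, and the function names are illustrative assumptions, not prescriptions of the text; the statistic $t_n$ is supplied by the user.

```python
import numpy as np

def ar_sieve_bootstrap(x, t_n, p=None, n_boot=500, rng=None):
    """A minimal sketch of the autoregressive sieve bootstrap.

    x      : 1-d array of observations X_1, ..., X_n
    t_n    : callable mapping a series to the statistic of interest
    p      : AR sieve order p_n (here an illustrative default ~ n**(1/4))
    n_boot : number of bootstrap replicates
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = x.size
    if p is None:
        p = max(1, int(round(n ** 0.25)))          # illustrative choice of p_n
    xbar = x.mean()
    z = x - xbar

    # Fit AR(p) by least squares: regress z_i on (z_{i-1}, ..., z_{i-p}).
    Y = z[p:]
    Z = np.column_stack([z[p - j:n - j] for j in range(1, p + 1)])
    beta = np.linalg.lstsq(Z, Y, rcond=None)[0]

    # Residuals, centered as in the text.
    resid = Y - Z @ beta
    resid = resid - resid.mean()

    stats = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice(resid, size=n + p, replace=True)
        zstar = np.zeros(n + p)                    # initial p values = 0 (i.e., X* = Xbar_n)
        for i in range(p, n + p):
            zstar[i] = beta @ zstar[i - p:i][::-1] + eps[i]
        xstar = zstar[p:] + xbar                   # sieve bootstrap series of length n
        stats[b] = t_n(xstar)
    return stats

# Example: bootstrap replicates of the sample mean of a simulated AR(1) series.
rng = np.random.default_rng(1)
x = np.empty(200); x[0] = 0.0
for i in range(1, 200):
    x[i] = 0.6 * x[i - 1] + rng.standard_normal()
boot = ar_sieve_bootstrap(x, np.mean, n_boot=200, rng=2)
print(boot.std())
```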
3
Properties of Block Bootstrap
Methods for the Sample Mean

3.1 Introduction
In this chapter, we study the first-order properties of the MBB, the NBB,
the CBB, and the SB for the sample mean. Note that for the first three
block bootstrap methods, the block length is nonrandom. In Section 3.2,
we establish consistency of these block bootstrap methods for variance and
distribution function estimations for the sample mean. The SB method uses
a random block length and hence, requires a somewhat different treatment.
We study consistency properties of the SB method for the sample mean in
Section 3.3.
For later reference, we introduce some standard measures of weak dependence for time series. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathcal{A}$ and $\mathcal{B}$ be two sub-$\sigma$-fields of $\mathcal{F}$. When $\mathcal{A}$ and $\mathcal{B}$ are independent, for any $A \in \mathcal{A}$ and any $B \in \mathcal{B}$, we have the relations $\Delta_1 \equiv [P(A \cap B) - P(A) \cdot P(B)] = 0$ and $\Delta_2 \equiv [P(B \mid A) - P(B)] = 0$, provided $P(A) \neq 0$. When $\mathcal{A}$ and $\mathcal{B}$ are not independent, we may quantify the degree of dependence of $\mathcal{A}$ and $\mathcal{B}$ by looking at the maximal values of $\Delta_1$ or $\Delta_2$ or of some other similar quantities. This leads to the following coefficients of dependence:

Strong mixing or $\alpha$-mixing:
$$\alpha(\mathcal{A}, \mathcal{B}) = \sup\big\{|P(A \cap B) - P(A) \cdot P(B)| : A \in \mathcal{A}, B \in \mathcal{B}\big\}. \qquad (3.1)$$


$\phi$-mixing:
$$\phi(\mathcal{A}, \mathcal{B}) = \sup\Big\{\Big|\frac{P(A \cap B)}{P(A)} - P(B)\Big| : A \in \mathcal{A}, P(A) \neq 0, B \in \mathcal{B}\Big\}. \qquad (3.2)$$

$\psi$-mixing:
$$\psi(\mathcal{A}, \mathcal{B}) = \sup\Big\{\Big|\frac{P(A \cap B)}{P(A)P(B)} - 1\Big| : A \in \mathcal{A}, B \in \mathcal{B}, P(A) \neq 0, P(B) \neq 0\Big\}. \qquad (3.3)$$

$\rho$-mixing:
$$\rho(\mathcal{A}, \mathcal{B}) = \sup\Big\{\frac{|\mathrm{Cov}(X, Y)|}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} : X \in L^2(\mathcal{A}), Y \in L^2(\mathcal{B})\Big\}, \qquad (3.4)$$

where $\mathrm{Cov}(X, Y) = EXY - EXEY$, $\mathrm{Var}(X) = \mathrm{Cov}(X, X)$, and $L^2(\mathcal{A}) = \{X : X$ is a random variable on $(\Omega, \mathcal{A}, P)$ with $EX^2 < \infty\}$. In general,
$$4\,\alpha(\mathcal{A}, \mathcal{B}) \leq \rho(\mathcal{A}, \mathcal{B}) \leq 2\,\phi^{1/2}(\mathcal{A}, \mathcal{B})\cdot\phi^{1/2}(\mathcal{B}, \mathcal{A}) \quad \text{and} \quad \rho(\mathcal{A}, \mathcal{B}) \leq \psi(\mathcal{A}, \mathcal{B}). \qquad (3.5)$$

See Chapter 1 of Doukhan (1994) for the properties of these mixing coef-
ficients. For an index set I c Z, I # 0, the mixing coefficients of a time
series {XihEI at lag m ~ 1 are defined by considering the maximal val-
ues of these coefficients over the <T-fields A = <T({Xi : i ~ k, i E I}) and
B = <T({Xi : i ~ k + m, i E I}) for all k E I. Specifically, we have the
following definition for the a-mixing and the p-mixing cases.

Definition 3.1 Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of random variables on $(\Omega, \mathcal{F}, P)$. Let $\mathcal{F}_a^b = \sigma(\{X_i : a \leq i \leq b\})$, $1 \leq a \leq b \leq \infty$.

(1) The strong mixing (or $\alpha$-mixing) coefficient of $\{X_i\}_{i \geq 1}$ is defined by
$$\alpha(m) = \sup\big\{\alpha\big(\mathcal{F}_1^k, \mathcal{F}_{k+m}^\infty\big) : k \in \mathbb{N}\big\}, \quad m \geq 1, \qquad (3.6)$$
where $\alpha(\cdot, \cdot)$ is as defined in (3.1). The process $\{X_i\}_{i \geq 1}$ is called strongly mixing if $\alpha(m) \to 0$ as $m \to \infty$.

(2) The $\rho$-mixing coefficient of $\{X_i\}_{i \geq 1}$ is defined by
$$\rho(m) = \sup\big\{\rho\big(\mathcal{F}_1^k, \mathcal{F}_{k+m}^\infty\big) : k \in \mathbb{N}\big\}, \quad m \geq 1, \qquad (3.7)$$
where $\rho(\cdot, \cdot)$ is as defined in (3.4). The process $\{X_i\}_{i \geq 1}$ is called $\rho$-mixing if $\rho(m) \to 0$ as $m \to \infty$.

For a doubly-infinite time series $\{X_i\}_{i \in \mathbb{Z}}$, the coefficients $\alpha(m)$ and $\rho(m)$, $m \geq 1$, are defined by replacing the $\sigma$-field $\mathcal{F}_1^k$ by $\mathcal{F}_{-\infty}^k \equiv \sigma(\{X_i : i \leq k\})$ and the set $\mathbb{N}$ by $\mathbb{Z}$ in (3.6) and (3.7), respectively. The $\phi$-mixing and the $\psi$-mixing coefficients of $\{X_i\}_{i \in \mathbb{Z}}$ are defined similarly. Each of the four mixing conditions says that the dependence of the process $\{X_i\}_{i \in \mathbb{Z}}$ decreases as the distance $m$ between the two segments $\{X_i : i \leq k\}$ and $\{X_i : i \geq k + m\}$ increases. Note that if a process is $m_0$-dependent for some $m_0 \geq 0$, then all these mixing coefficients are zero for all $m > m_0$. Thus, the notion of $m$-dependence is the most stringent of the four measures of weak dependence described above, and by (3.5), the notion of strong mixing is the least stringent.
The following is a useful inequality for the covariance of mixing random variables. Recall that for a random variable $W$ and $p \in [1, \infty]$, we define the $p$-norm of $W$ by
$$\|W\|_p = \begin{cases} (E|W|^p)^{1/p}, & p \in [1, \infty) \\ \inf\{x : P(|W| > x) = 0\}, & p = \infty. \end{cases} \qquad (3.8)$$
Proposition 3.1 Let $X$ and $Y$ be two random variables on a probability space $(\Omega, \mathcal{F}, P)$.

(a) If $P(|X| \leq a_1) = 1$ and $P(|Y| \leq a_2) = 1$ for some $a_1, a_2 \in (0, \infty)$, then
$$|\mathrm{Cov}(X, Y)| \leq 4 a_1 a_2\,\alpha\big(\sigma(\{X\}), \sigma(\{Y\})\big).$$

(b) Let $p, q, r \in (1, \infty)$ be any real numbers satisfying $p^{-1} + q^{-1} + r^{-1} = 1$. Then,
$$|\mathrm{Cov}(X, Y)| \leq 8\big[\alpha\big(\sigma(\{X\}), \sigma(\{Y\})\big)\big]^{1/r}\,\|X\|_p\|Y\|_q.$$

(c)
$$|\mathrm{Cov}(X, Y)| \leq \rho\big(\sigma(\{X\}), \sigma(\{Y\})\big)\,\|X\|_2\|Y\|_2.$$

Proof: See Section 1.2.2 of Doukhan (1994). $\Box$

In the next section, we establish consistency of the MBB, the NBB, and the CBB methods for the sample mean.

3.2 Consistency of MBB, NBB, CBB: Sample Mean
Let $\{X_i\}_{i \in \mathbb{Z}}$ be a sequence of stationary random vectors taking values in $\mathbb{R}^d$. Let $T_n$ denote the centered and scaled sample mean
$$T_n = \sqrt{n}\big(\bar{X}_n - \mu\big),$$
where $\mu = E(X_1)$ and $\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i$. In this section, we establish consistency of the MBB, the NBB, and the CBB estimators of the (asymptotic) covariance matrix
$$\mathrm{Var}(T_n) \equiv E T_n T_n'$$
of $T_n$, and also of the sampling distribution
$$G_n(x) \equiv P(T_n \leq x), \quad x \in \mathbb{R}^d,$$

of Tn. For simplicity, we suppose that for each of the block bootstrap meth-
ods, b == ln/£J blocks are resampled and thus, the resample size is n1 = b£.
- *(1) - *(2) - *(3)
Write Xn ,Xn ,and Xn for the sample means of the n1 bootstrap
observations based on the MBB, the NBB, and the CBB, respectively. The
bootstrap versions of Tn are then given by

T~(j) == ..;nl (x~(j) - E*X~(j)) , j = 1,2,3 .

The bootstrap estimators of Var(Tn) are given by Var*(T~(j)), j = 1,2,3,


where, as before, E* and Var* respectively denote the conditional expec-
tation and the conditional variance, given X n . Similarly, the bootstrap es-
timators of GnO are given by the conditional distribution of T~(j), given
X n , j = 1,2,3. First, we consider properties of the variance estimators.
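Before turning to the variance estimators, the following minimal sketch (Python/NumPy; the function name and interface are ours) shows how a single MBB, NBB, or CBB resample of size $n_1 = b\ell$ may be generated from a univariate series, following the block descriptions of Chapter 2.

```python
import numpy as np

def block_bootstrap_sample(x, ell, method="MBB", rng=None):
    """Draw one block bootstrap resample of size n1 = b*ell from the series x.

    method: "MBB" (overlapping), "NBB" (nonoverlapping), or "CBB" (circular).
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = x.size
    b = n // ell                                   # number of resampled blocks
    if method == "MBB":
        starts = np.arange(n - ell + 1)            # N = n - ell + 1 overlapping blocks
        blocks = np.stack([x[s:s + ell] for s in starts])
    elif method == "NBB":
        starts = np.arange(0, b * ell, ell)        # nonoverlapping blocks
        blocks = np.stack([x[s:s + ell] for s in starts])
    elif method == "CBB":
        xx = np.concatenate([x, x[:ell - 1]])      # periodic extension Y_{n,i}
        starts = np.arange(n)                      # n circular blocks
        blocks = np.stack([xx[s:s + ell] for s in starts])
    else:
        raise ValueError("unknown method")
    idx = rng.integers(0, len(blocks), size=b)     # resample b blocks with replacement
    return blocks[idx].ravel()                     # bootstrap series of length n1 = b*ell

# Example: one MBB resample and its bootstrap sample mean.
rng = np.random.default_rng(0)
x = rng.standard_normal(120)
xstar = block_bootstrap_sample(x, ell=8, method="MBB", rng=1)
print(xstar.size, xstar.mean())
```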

3.2.1 Consistency of Bootstrap Variance Estimators


The bootstrap variance estimators $\mathrm{Var}_*\big(T_n^{*(j)}\big)$, $j = 1, 2, 3$, have a very desirable property, namely, that they can be expressed by simple, closed-form formulas involving the observations $\mathcal{X}_n$, and thus may be computed directly without any Monte-Carlo simulations. This is possible because of the linearity of the bootstrap sample mean in the resampled observations. Let $U_i = (X_i + \cdots + X_{i+\ell-1})/\ell$ denote the average of the block $(X_i, \ldots, X_{i+\ell-1})$, $i \geq 1$; let $U_i^{(2)} \equiv U_{(i-1)\ell+1} = (X_{(i-1)\ell+1} + \cdots + X_{i\ell})/\ell$, $i \geq 1$, be the averages of the nonoverlapping blocks; and similarly, let $U_i^{(3)} = (Y_{n,i} + \cdots + Y_{n,(i+\ell-1)})/\ell$, $i \geq 1$, be the block averages for the periodically extended series $\{Y_{n,i}\}_{i \geq 1}$. Then, using the independence of the resampled blocks, we get
$$\mathrm{Var}_*\big(T_n^{*(1)}\big) = \ell\Big[N^{-1}\sum_{i=1}^{N} U_i U_i' - \hat{\mu}_n\hat{\mu}_n'\Big], \qquad \mathrm{Var}_*\big(T_n^{*(2)}\big) = \ell\Big[b^{-1}\sum_{i=1}^{b} U_i^{(2)} U_i^{(2)\prime} - \hat{\mu}_{n,2}\hat{\mu}_{n,2}'\Big],$$
and
$$\mathrm{Var}_*\big(T_n^{*(3)}\big) = \ell\Big[n^{-1}\sum_{i=1}^{n} U_i^{(3)} U_i^{(3)\prime} - \bar{X}_n\bar{X}_n'\Big], \qquad (3.9)$$
where $N = n - \ell + 1$, $\hat{\mu}_n = N^{-1}\sum_{i=1}^{N} U_i$, and $\hat{\mu}_{n,2} = b^{-1}\sum_{i=1}^{b} U_i^{(2)}$.


When the process {XihEZ satisfies certain standard moment and strong
mixing conditions (such as those of Theorem 3.1 below), the asymptotic
covariance matrix of Tn is given by the infinite (matrix) series
00

Eoo == lim Var(Tn)


n----i'OO
= "~ EZ1Z~+i ,
i=-oo

where Zi = Xi - p" i E Z. Thus, the bootstrap estimators Var*(T~(j»), j =


1,2,3 may be viewed also as estimators of the population parameter Eoo.
The following result proves consistency of the bootstrap estimators for the
level-2 parameter Var(Tn) or, equivalently, for Eoo.
Theorem 3.1 Suppose that there exists a $\delta > 0$ such that $E\|X_1\|^{2+\delta} < \infty$ and that $\sum_{n=1}^{\infty}\alpha(n)^{\delta/(2+\delta)} < \infty$. If, in addition, $\ell^{-1} + n^{-1}\ell = o(1)$ as $n \to \infty$, then for $j = 1, 2, 3$,
$$\mathrm{Var}_*\big(T_n^{*(j)}\big) \longrightarrow_p \Sigma_\infty \quad \text{as } n \to \infty. \qquad (3.10)$$
Theorem 3.1 shows that under mild moment and strong mixing conditions on the process $\{X_i\}_{i \in \mathbb{Z}}$, the bootstrap variance estimators $\mathrm{Var}_*\big(T_n^{*(j)}\big)$, $j = 1, 2, 3$, are consistent for a wide range of bootstrap block sizes $\ell$, so long as $\ell$ tends to infinity with $n$ but at a rate slower than $n$. Thus, block sizes given by $\ell = \log\log n$ or $\ell = n^{1-\epsilon}$, $0 < \epsilon < 1$, are all admissible block lengths for the consistency of $\mathrm{Var}_*\big(T_n^{*(j)}\big)$, $j = 1, 2, 3$. We shall show later (cf. Chapters 5 and 7) that an optimal choice of $\ell$, one that asymptotically minimizes the mean squared error of the block bootstrap variance estimator $\mathrm{Var}_*\big(T_n^{*(j)}\big)$, is of the form $\ell = C_j n^{1/3}(1 + o(1))$ as $n \to \infty$, where $C_j > 0$ is a suitable constant that depends on certain population parameters.
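The closed-form expressions discussed above make the three variance estimators directly computable. The following sketch evaluates them for a univariate series, specializing the formulas in (3.9) to $d = 1$; the AR(1) example and the block size of order $n^{1/3}$ are illustrative choices, not prescriptions of the text.

```python
import numpy as np

def block_boot_variance(x, ell, method="MBB"):
    """Closed-form block bootstrap estimator of the asymptotic variance of
    sqrt(n) * (sample mean) for a univariate series; no resampling needed."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if method == "MBB":
        # overlapping block means U_1, ..., U_N with N = n - ell + 1
        U = np.convolve(x, np.ones(ell) / ell, mode="valid")
        center = U.mean()
    elif method == "NBB":
        b = n // ell
        U = x[:b * ell].reshape(b, ell).mean(axis=1)
        center = U.mean()
    elif method == "CBB":
        xx = np.concatenate([x, x[:ell - 1]])      # periodic extension
        U = np.array([xx[i:i + ell].mean() for i in range(n)])
        center = x.mean()                          # E_* Xbar* = Xbar_n for the CBB
    else:
        raise ValueError("unknown method")
    return ell * (np.mean(U ** 2) - center ** 2)

# Example: estimates for an AR(1) series, block size of order n**(1/3).
rng = np.random.default_rng(3)
n = 500
x = np.empty(n); x[0] = 0.0
for i in range(1, n):
    x[i] = 0.5 * x[i - 1] + rng.standard_normal()
ell = int(round(n ** (1 / 3)))
for m in ("MBB", "NBB", "CBB"):
    print(m, round(block_boot_variance(x, ell, m), 3))
# For this AR(1) with unit innovation variance, Sigma_infty = 1/(1-0.5)**2 = 4.
```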
For proving the theorem, we need the following lemma. Let $U_{1i} = \sqrt{\ell}\,U_i = (X_i + \cdots + X_{i+\ell-1})/\sqrt{\ell}$, and $U_{1i}^{(2)} = \sqrt{\ell}\,U_i^{(2)}$, $i \geq 1$.

Lemma 3.1 Let $f : \mathbb{R}^d \to \mathbb{R}$ be a Borel measurable function and let $\{X_i\}_{i \in \mathbb{Z}}$ be a (possibly nonstationary) sequence of random vectors with strong mixing coefficient $\alpha(\cdot)$. Define $\|f\|_\infty = \sup\{|f(x)| : x \in \mathbb{R}^d\}$ and $\zeta_{2+\delta,n} = \max\big\{\big(E|f(U_{1i})|^{2+\delta}\big)^{1/(2+\delta)} : 1 \leq i \leq N\big\}$, $\delta > 0$. Let $\{w_{in} : i \geq 1, n \geq 1\} \subset [-1, 1]$ be a collection of real numbers. Then, there exist numerical constants $C_1$, $C_2$, and constants $C_3(\delta)$ and $C_4(\delta)$ (none depending on $f(\cdot)$, $\ell$, $n$, and the $w_{in}$'s), such that for any $1 < \ell < n/2$ and any $n > 2$,

(a)
$$\mathrm{Var}\Big(\sum_{i=1}^{N} w_{in} f(U_{1i})\Big) \leq \min\Big\{ C_1\|f\|_\infty^2\, n\ell\Big[1 + \sum_{1 \leq k \leq n/\ell}\alpha(k\ell)\Big],\ C_3(\delta)\,\zeta_{2+\delta,n}^2\, n\ell\Big[1 + \sum_{1 \leq k \leq n/\ell}\alpha(k\ell)^{\delta/(2+\delta)}\Big]\Big\};$$

(b)
$$\mathrm{Var}\Big(\sum_{i=1}^{b} w_{in} f\big(U_{1i}^{(2)}\big)\Big) \leq \min\Big\{ C_2\|f\|_\infty^2\, b\Big[1 + \sum_{1 \leq k \leq n/\ell}\alpha(k\ell)\Big],\ C_4(\delta)\,\zeta_{2+\delta,n}^2\, b\Big[1 + \sum_{1 \leq k \leq n/\ell}\alpha(k\ell)^{\delta/(2+\delta)}\Big]\Big\}.$$
Proof: First we consider part (a). We group the summands into "blocks of blocks" such that the sums over alternate "blocks of blocks" are approximately independent. Define the variables
$$R(j) = \sum_{i=2(j-1)\ell+1}^{2j\ell} w_{in} f(U_{1i}), \quad 1 \leq j \leq J, \qquad (3.11)$$
where $J = \lfloor N/2\ell \rfloor$, and set $R(J+1) = \sum_{i=1}^{N} w_{in} f(U_{1i}) - \sum_{j=1}^{J} R(j)$. Also, let $\sum^{(1)}$ and $\sum^{(2)}$ respectively denote summation over even and odd $j \in \{1, \ldots, J+1\}$. Note that for any $1 \leq j, j+k \leq J+1$, $k \geq 2$, the random variables $R(j)$ and $R(j+k)$ depend on disjoint sets of $X_i$'s that are separated by $(k-1)2\ell - \ell$ observations in between. Hence, noting that $|R(j)| \leq 2\ell\|f\|_\infty$ for all $1 \leq j \leq J+1$, by Proposition 3.1 we get $|\mathrm{Cov}(R(j), R(j+k))| \leq 4\,\alpha\big((k-1)2\ell - \ell\big)\,(2\ell\|f\|_\infty)^2$ for all $k \geq 2$, $j \geq 1$. Therefore, using the inequalities $(a+b)^2 \leq 2(a^2 + b^2)$ and $\mathrm{Var}(Z_1 + \cdots + Z_m) \leq \sum_{i=1}^{m} EZ_i^2 + m\sum_{k=1}^{m-1}\max\{|\mathrm{Cov}(Z_i, Z_{i+k})| : 1 \leq i, i+k \leq m\}$ for any set $\{Z_1, \ldots, Z_m\}$ of bounded random variables, we have
$$\mathrm{Var}\Big(\sum_{i=1}^{N} w_{in} f(U_{1i})\Big) = \mathrm{Var}\Big(\sum\nolimits^{(1)} R(j) + \sum\nolimits^{(2)} R(j)\Big) \leq 2\Big[\mathrm{Var}\Big(\sum\nolimits^{(1)} R(j)\Big) + \mathrm{Var}\Big(\sum\nolimits^{(2)} R(j)\Big)\Big]$$
$$\leq 2\Big[\sum_{j=1}^{J+1} ER(j)^2 + (J+1)\sum_{1 \leq k \leq J/2}\alpha\big((2k-1)2\ell - \ell\big)\cdot 4\,(2\ell\|f\|_\infty)^2\Big]$$
$$\leq 2\Big[\Big(\frac{N}{2\ell} + 1\Big)4\ell^2\|f\|_\infty^2 + 64\,(n\ell)\|f\|_\infty^2\sum_{k \geq 1}\alpha(k\ell)\Big] \leq C_1\|f\|_\infty^2\cdot\Big[n\ell + (n\ell)\sum_{k \geq 1}\alpha(k\ell)\Big].$$
This yields the first term in the upper bound in part (a). The second term in the bound is obtained similarly by using the inequalities $|\mathrm{Cov}(R(j), R(j+k))| \leq C(\delta)\big(ER(j)^{2+\delta}\big)^{1/(2+\delta)}\big(ER(j+k)^{2+\delta}\big)^{1/(2+\delta)}\alpha\big((k-1)2\ell - \ell\big)^{\delta/(2+\delta)}$ and $\big(ER(j)^{2+\delta}\big)^{1/(2+\delta)} \leq 2\ell\,\zeta_{2+\delta,n}$, and retracing the steps above.

For proving part (b), splitting the sum over odd and even indices, we get
$$\mathrm{Var}\Big(\sum_{i=1}^{b} w_{in} f\big(U_{1i}^{(2)}\big)\Big) \leq 2\Big[\mathrm{Var}\Big(\sum_{\text{even } i} w_{in} f\big(U_{1i}^{(2)}\big)\Big) + \mathrm{Var}\Big(\sum_{\text{odd } i} w_{in} f\big(U_{1i}^{(2)}\big)\Big)\Big]$$
$$\leq 2\Big[\sum_{i} Ef\big(U_{1i}^{(2)}\big)^2 + b\sum_{1 \leq k \leq b/2}\alpha\big((2k-1)\ell\big)\,(2\|f\|_\infty)^2\Big] \leq 2\Big[b\|f\|_\infty^2 + 4b\|f\|_\infty^2\sum_{1 \leq k \leq b}\alpha(k\ell)\Big].$$
The other term in the bound may be obtained as in part (a). Hence, the proof of the lemma is complete. $\Box$

Next we prove the theorem.

Proof of Theorem 3.1: Without loss of generality, let $\mu = 0$. We prove the theorem for $j = 1$, i.e., for the MBB estimator, first. Let $U_i^* = \big(X_{(i-1)\ell+1}^* + \cdots + X_{i\ell}^*\big)/\ell$ denote the average of the $i$th resampled block under the MBB, $1 \leq i \leq b$. Then, from (2.10), it follows that, conditional on $\mathcal{X}_n$, $U_1^*, \ldots, U_b^*$ are iid and
$$P_*\big(U_1^* = U_i\big) = 1/N, \quad 1 \leq i \leq N,$$
where, recall that, $N = n - \ell + 1$ and $U_i = (X_i + \cdots + X_{i+\ell-1})/\ell$, $i \geq 1$. Also, note that $\bar{X}_n^{*(1)} = b^{-1}\sum_{i=1}^{b} U_i^*$. Hence, by the conditional independence of $U_1^*, \ldots, U_b^*$,
$$\mathrm{Var}_*\big(T_n^{*(1)}\big) = n_1\,\mathrm{Var}_*\Big(b^{-1}\sum_{i=1}^{b} U_i^*\Big) = n_1 b^{-1}\mathrm{Var}_*\big(U_1^*\big) = \ell\Big[N^{-1}\sum_{i=1}^{N} U_i U_i' - \hat{\mu}_n\hat{\mu}_n'\Big], \qquad (3.12)$$
where $\hat{\mu}_n \equiv E_*\bar{X}_n^{*(1)} = E_* U_1^* = N^{-1}\sum_{i=1}^{N} U_i$. Next, note that, by Proposi-


tion 3.1, in the d = 1 case,
£-1
L (£ -liI)IEX1XHil
i=I-£
£-1 2/(2+8)]
< £[ EX; + 16 ~ a(i)8/(2+8) (EIXII2+8)

0(£) as n ~ 00

for any constants WIn, ... , Win E [-1,1]. For d > 1, using this bound
component-wise and using the stationarity of the Xi'S, from (2.14), we get

nEIIPn - Xnl1 2 ::; nE{ 11- ~ IllXnl1


+ (Nf)-'II t ( l - i)(X; + X n - H ,) I }'
< 2{ (£/N)2nEIIXnI12 + 2nN- 2 [Ell t(i/£)XiI12

+ Ell t(i/£)X£-iln}
0([£/n]2) + O([£/n])
O(£/n) as n ~ 00 ,

as EIIv'nXnI12 = 0(1). This implies


E{£IIPnP~II} ::; C(d) . £EIIPnI1 2
::; C(d)· £. {2EIIXI1 2+ 2EIIXn - Pn11 2}
= O(£/n). (3.13)
Hence, by (3.12) and (3.13), it remains to show that
N
£N- 1 L UiU: --+p ~oo .
i=1
Let Yin = U1iU{i n(llUlill < (n/£)I/B) and Win = U1i U{i - Yin, 1::; i ::; N.
Then, applying Lemma 3.1 component-wise, for large n we get,
N 2
EIIN- 1 L (Yin - EYin) II
i=1

< C(d)(n/C)1/2 [nC + nCI:1$k<n/£a(kC)] /N2


< C(d)(n/C)-1/2[1+I:k~la(k)]
0(1) .

Next, note that by definition, Un = ;ZX£, and that under the conditions
of Theorem 3.1 (cf. Appendix A), ynXn ~d N(O, ~oo). Hence, by the
(extended) dominated convergence theorem,

nl~EIIUnI12n(llUnll > (n/C)l/B) =0. (3.14)

Therefore, IIEVin - ~ooll ::::: C(d)EllUnI1 2n (llUnI1 8 > n/C) + IIEUnU{l -


~ooll = 0(1). Hence, for any f > 0, by Markov's inequality,

nl~~ p(llcN- 1 8 uiu: - ~ooll


N

> 3f)

< nl~~ p(IIN- 1 t,(Vin - EVin) II + IIEVin - ~ooll


N

+ IIN- 1 8Winll > 3f)

< nl~~ p(IIN- 1 t,(Vin - EVin)11 > f)


+ nl~~ p(IIN- 1 t,winll > E)
N 2
< lim f- 2EIIN- 1 "'(Vin
n~~ ~
- EVin) II + n~oo
lim f-1EIIWnll
i=l
< 0 + .!~ E- 1C(d)EllUnI12n (11Unll > (n/C)l/B)
o. (3.15)

This proves Theorem 3.1 for the MBB. Next, consider the NBB. Write
Ut(2) for the ith resampled block average under the NBB. Then, by (2.15),

n1 b-1Var* (U*(2»)
1

[b- 1 2: U(i-l)Hl U(i-l)H1 - X


b
C n, X~,] . (3.16)
2=1

Since yriXn ~d N(O, ~oo), it follows that CIIXnlX~,11 = Op(C/n). Now,


using Lemma 3.1(b) and (3.14), and retracing the steps in (3.15), we get

Var*(T,;'(2)) ------7p I;oo as n ---+ 00. Finally, for the eBB, note that E*X~(3) =
Xn for any block length C E [1, n]. Hence,

Var*(T~(3)) = n1b- 1 [Var*(U;(3))]

C[
n
-1 ~ U(3)u(3)'
~""
- X X']
n n
(3.17)
i=l

where, recall that, UP) == (Yn,i + ... + Yn,(iH-1))/C, Noting that for 1 :::;
i :::; N, UP) = Ui and that under the conditions of Theorem 3.1, EIIX1 +
... + Xml1 2 :::; C(d)m for all m 2: 1, we get

Ell (CN- 1 t,Ui U:) - (cn- t1


uP)uP)') II

< E[(N- 1 - n-1)11 t, U1iU{ill + n-1C i=~l IIUP) 112]


N n
< C(Nn)-lC(d) L El1U1il1 2 + 2n-1g-1 L {EIIXi + ... + Xnl1 2
i=l i=N+1
+ EIIX1 + ... + Xe-(n-H1) 112}
< C(d)(n- 1C)EIIv'£XeI1 2
+ 2n- 1C- 1 [C{ 2 max{EIIX1 + ... + Xil1 2 : 1 :::; i :::; C}]
O(n-1C) . (3.18)

Hence, by (3.12), (3.13), (3.15), (3.17) and (3.18), it follows that


Var*(T,;'(3)) - Var*(T,;') = op(l), proving the theorem for the eBB.

3.2.2 Consistency of Distribution Function Estimators


In this section, we establish consistency of the bootstrap methods for distribution function estimation. Recall that $G_n$ denotes the sampling distribution of $T_n$. Let $\hat{G}_n^{(j)}$, $j = 1, 2, 3$, denote the conditional distribution of $T_n^{*(j)}$ under the MBB, the NBB, and the CBB methods, respectively. We may think of $\{G_n\}_{n \geq 1}$ as a sequence of points in the space of all probability measures on $\mathbb{R}^d$. When $G_n$ converges weakly to a limit distribution $G_\infty$, say, the classical approach is to approximate $G_n$ using the limit distribution $G_\infty$. In contrast, the bootstrap methods attempt to approximate $G_n$ by generating random probability measures $\hat{G}_n^{(j)}$'s that change with $n$. The next theorem establishes consistency of these random measures $\hat{G}_n^{(j)}$'s when $T_n$ is asymptotically normal. For stating the result, recall the convention that for $x = (x_1, \ldots, x_d)'$, $y = (y_1, \ldots, y_d)' \in \mathbb{R}^d$, $x \leq y$ if $x_i \leq y_i$ for all $i = 1, \ldots, d$.

Theorem 3.2 Suppose that there exists a $\delta > 0$ such that $E\|X_1\|^{2+\delta} < \infty$ and $\sum_{n=1}^{\infty}\alpha(n)^{\delta/(2+\delta)} < \infty$. Also, suppose that $\Sigma_\infty = \sum_{i \in \mathbb{Z}}\mathrm{Cov}(X_1, X_{1+i})$ is nonsingular and that $\ell^{-1} + n^{-1}\ell = o(1)$ as $n \to \infty$. Then, for $j = 1, 2, 3$,
$$\sup_{x \in \mathbb{R}^d}\big|P_*\big(T_n^{*(j)} \leq x\big) - P(T_n \leq x)\big| \longrightarrow_p 0 \quad \text{as } n \to \infty.$$

Theorem 3.2 shows that, like the bootstrap variance estimators, the distribution function estimators $\hat{G}_n^{(j)}$'s are consistent estimators of $G_n$ for a wide range of values of the block length parameter $\ell$. Indeed, the conditions on $\ell$ presented in both Theorem 3.1 and Theorem 3.2 are also necessary for consistency of these bootstrap estimators. If $\ell$ remains bounded, then the block bootstrap methods fail to capture the dependence structure of the original data sequence and converge to a wrong normal limit, as in the example of Singh (1981) (cf. Section 2.3). On the other hand, if $\ell$ goes to infinity at a rate comparable to the sample size $n$ (violating the condition $n^{-1}\ell = o(1)$ as $n \to \infty$), then there are not enough distinct blocks to recreate a representative image of the "infinite population". It can be shown (cf. Lahiri (2001)) that, in this case, the estimators $\hat{G}_n^{(j)}$ converge to certain random probability measures.

Consistency of the bootstrap estimators $\hat{G}_n^{(j)}$'s remains valid over a much larger class of sets than asserted in the statement of Theorem 3.2. Let $\varrho$ be a metric on the set of all probability measures on $\mathbb{R}^d$, metricizing the topology of weak convergence of probability measures (see (A.2) of Appendix A or Parthasarathy (1967)). The proof actually shows that under the conditions of Theorem 3.2,
$$\varrho\big(\mathcal{L}\big(T_n^{*(j)} \mid \mathcal{X}_n\big),\, N(0, \Sigma_\infty)\big) \longrightarrow_p 0 \quad \text{as } n \to \infty$$
for $j = 1, 2, 3$, where, recall that, $\mathcal{L}\big(T_n^{*(j)} \mid \mathcal{X}_n\big)$ denotes the conditional distribution of $T_n^{*(j)}$ given $\mathcal{X}_n$. Since $N(0, \Sigma_\infty)$ is an absolutely continuous distribution on $\mathbb{R}^d$, a result of Rao (1962) on uniformity classes (see Section 1.2 of Bhattacharya and Rao (1986)) implies that the convergence of $P_*\big(T_n^{*(j)} \in \cdot\big)$ and of $P(T_n \in \cdot)$ to $\Phi(\cdot\,; \Sigma_\infty)$ is uniform over the collection $\mathcal{C}$ of all Borel-measurable convex subsets of $\mathbb{R}^d$. Hence, it follows that under the conditions of Theorem 3.2,
$$\sup_{B \in \mathcal{C}}\big|P_*\big(T_n^{*(j)} \in B\big) - P(T_n \in B)\big| \longrightarrow_p 0 \quad \text{as } n \to \infty$$
for $j = 1, 2, 3$.

Consistency of the block bootstrap estimators of $\mathcal{L}(T_n)$ continues to hold when a resample size of an order different from the sample size is chosen (cf. Lahiri (2001)). Furthermore, under some additional conditions, the convergence of bootstrap estimators of the variance and the sampling distribution of $T_n$ may be strengthened to almost sure convergence. For such extensions, see Künsch (1989), Peligrad and Shao (1995), Radulovic (1996), and the references therein.
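In practice, the conditional distribution of $T_n^{*(1)}$ is approximated by Monte Carlo. The following sketch (univariate case; the function name, the replication number, and the interval construction are illustrative assumptions) estimates the MBB distribution of $T_n$ and uses its quantiles to form a confidence interval for the mean.

```python
import numpy as np

def mbb_distribution(x, ell, n_boot=1000, rng=None):
    """Monte Carlo approximation of the MBB estimate of the law of
    T_n = sqrt(n)(Xbar_n - mu), via the conditional law of T_n^{*(1)}."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = x.size
    b, n1 = n // ell, (n // ell) * ell
    blocks = np.stack([x[s:s + ell] for s in range(n - ell + 1)])
    mu_hat = blocks.mean()            # E_* Xbar*^(1): mean of all overlapping block means
    tstar = np.empty(n_boot)
    for r in range(n_boot):
        idx = rng.integers(0, len(blocks), size=b)
        xstar = blocks[idx].ravel()
        tstar[r] = np.sqrt(n1) * (xstar.mean() - mu_hat)
    return tstar                      # empirical quantiles of tstar estimate G_n

# Example: a symmetric 90% bootstrap interval for the mean of an AR(1) series.
rng = np.random.default_rng(7)
n = 300
x = np.empty(n); x[0] = 0.0
for i in range(1, n):
    x[i] = 0.4 * x[i - 1] + rng.standard_normal()
tstar = mbb_distribution(x, ell=10, rng=11)
lo, hi = np.quantile(tstar, [0.05, 0.95])
print(x.mean() - hi / np.sqrt(n), x.mean() - lo / np.sqrt(n))
```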
The rest of this section is devoted to the proof of Theorem 3.2. Recall that $\Phi(\cdot\,; \Sigma_\infty)$ denotes the $N(0, \Sigma_\infty)$ probability measure on the Borel $\sigma$-field $\mathcal{B}(\mathbb{R}^d)$ on $\mathbb{R}^d$. For simplicity of notation, we also write $\Phi(x; \Sigma_\infty)$, $x \in \mathbb{R}^d$, for the distribution function of $N(0, \Sigma_\infty)$ on $\mathbb{R}^d$.

Proof: We begin with the case j = 1. Since Tn converges in distribution


to N(O, Eoo) and N(O, Eoo) is a continuous distribution, by a multivariate
version of Polya's Theorem (cf. Section 1.2, Bhattacharya and Rao (1986)),

sup iP(Tn ::; x) - <I>(x; Eoo)i-+ 0 as n -+ 00 .


xEIR d

Hence, it is enough to show that

sup ip*(T~(l)::; x) - <I> (x; Eoo)i ~p 0 as n -+ 00. (3.19)


xEIR d

Let An(a) = Cb-lL~=lE*llUt -PnI12n(v'£IIUt -Pnll > 2a), a> O.


Note that conditional on X n , U{, . .. ,U; are iid random vectors, and that
for any two random variables X and Y and any 'T/ > 0,

EIX + YI 2n(IX + YI > 'T/)


< 4E(IXI 2V 1Y12)n(2IXI V IYI > 'T/)
::; 4[ElxI 2n(IXI > 'T//2) + EIYI 2n(1Y1 > 'T// 2)] ,
where recall that x Vy = max{x, y}, x, y E R Hence, by (3.13) and (3.14)
and the inequality above, for any E > 0,

P(An((n/C)1/4) > E) ::; E-IEAn((n/C)1/4)


E- 1E{ CE*"U~ - PI1 2n (v'£IIU~ - Pnll > 2(n/£)1/4) }
E- 1EllUll - v'cPn112n (I lUll - v'CPnll > 2(n/C)1/4)
< 4E- 1[EllUllI1 2n(llull l > (n/C)1/4) +CEIIPnI1 2]
-+0 as n-+oo. (3.20)

Thus,
(3.21 )

Next, note that (3.19) would follow if for any subsequence {ni}, there is
a further subsequence {nk} C {nil such that

(3.22)

Fix a subsequence {nil. Then, by (3.21) and Theorem 3.1, there exists a
subsequence {nd of {ni} such that as k -+ 00

Var * (T*(l))
nk
-+ I: 00 a.s. (3.23)

Note that T~(l) = L~=l (ut - flnhlijb is a sum of conditionally iid ran-
dom vectors (Ui - fln)Jl7b, ... , (Ub - fln)v'lfb, which, by (3.23), satisfy
Lindeberg's condition along the subsequence nk, almost surely. Hence, by
the CLT for independent random vectors (cf. Theorem A.5, Appendix A),
the conditional distribution C(T~~l) IXnk ) of T~~l) converges to N(O, I: oo )
as k -+ 00, almost surely. Hence, by a multivariate version of Polya's Theo-
rem, (3.22) follows. This proves Theorem 3.2 for the case j = 1. The proof
is similar for j = 2,3. The reader is invited to supply the details. 0

3.3 Consistency of the SB: Sample Mean


In this section, we consider consistency of the SB method for estimating the variance and the distribution function of the sample mean. As before, let $\{X_i\}_{i \in \mathbb{Z}}$ be a sequence of stationary $\mathbb{R}^d$-valued random vectors with mean $\mu$. Also, for $n \geq 1$, let $\{Y_{n,i}\}_{i \geq 1}$ denote the periodically extended time series, defined by $Y_{n,i} = X_j$ if $i \equiv j$ (mod $n$) (cf. Section 2.7). First we consider the SB estimator of the asymptotic covariance matrix $\Sigma_\infty$ of the sample mean. From Section 2.7, note that the SB resamples blocks of random lengths to generate the bootstrap sample. Let $\bar{X}_n^{*(4)}$ denote the mean of the first $n$ bootstrap values under the SB. As noted in Section 2.7, $E_*\bar{X}_n^{*(4)} = \bar{X}_n$ and, hence, the SB version of $T_n = \sqrt{n}(\bar{X}_n - \mu)$ is given by $T_n^{*(4)} = \sqrt{n}\big(\bar{X}_n^{*(4)} - \bar{X}_n\big)$.
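For concreteness, the following sketch draws one SB resample of size $n$ in Python/NumPy, using uniformly chosen starting points in the periodically extended series and iid Geometric($p$) block lengths; truncating the last block at $n$ corresponds to retaining only the first $n$ bootstrap values. The function name and interface are ours.

```python
import numpy as np

def stationary_bootstrap_sample(x, p, rng=None):
    """Draw one stationary bootstrap (SB) resample of size n.

    Blocks start at uniformly chosen positions of the periodically extended
    series and have iid Geometric(p) lengths (expected block length 1/p)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = x.size
    out = np.empty(n)
    filled = 0
    while filled < n:
        start = rng.integers(0, n)                 # block starting index
        length = rng.geometric(p)                  # L_i ~ Geometric(p), support {1, 2, ...}
        take = min(length, n - filled)
        idx = (start + np.arange(take)) % n        # periodic extension Y_{n,i}
        out[filled:filled + take] = x[idx]
        filled += take
    return out

# Example: SB resample with expected block length 1/p = 10.
rng = np.random.default_rng(5)
x = rng.standard_normal(200)
xstar = stationary_bootstrap_sample(x, p=0.1, rng=6)
print(xstar.shape, xstar.mean())
```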

3.3.1 Consistency of SB Variance Estimators


For the centered and scaled sample mean Tn, the SB variance estimator
admits a closed form expression and hence, it can be calculated without
recourse to any resampling of the data. We note this in the following propo-
sition.

Proposition 3.2 Let f n(k) = n- 1 L~:lk XiXI H - XnX~, 0 :::; k < n,


q = 1- p, qnO = 1/2, and qnk = (1- n-1k)qk + (n-1k)qn-k, 1:::; k < n. If
58 3. Properties of Block Bootstrap Methods for the Sample Mean

0< p < 1, then


n-1
Var*(T~(4)) = Lqnk(tn(k)+tn(k)')
k=O
Proof: Note that conditional on X n , L1 has the Geometric distribu-
tion with parameter p. Also, under the SB resampling scheme, Xi and
XiH' k ~ 1, lie in the same resampled block if and only if 1 + k :::; L 1 .
Hence, writing Tn = O"(Xn' L 1, ... , L n ), the O"-field generated by Xn and
L 1 , ... , L n , for any 1 :::; k :::; n - 1, we get

E ( Xi Xi~k I Xn) = E { E ( Xi Xi~k I Tn) I Xn }

E{ (n- 1 ~ Yn'iY~'iH) . n(L1 ~ 1 + k) I Xn}


+E{ (n- 1 ~Yn'i) (n- 1 ~Y~'i)n(L1:::; k) I Xn}
(n- 1 ~Yn'iY~'i+k )P(L1 > k I Xn)
+ XnX~ P(L 1 :::; k I Xn)

n- 1{
t-1
0
n-k
XiX{+k +_ L
n

t-n-k+1
}
XiX{H_n qk

+ XnX~(l _ qk)
{f n(k) + t n(n - k)' }qk + XnX~ . (3.24)

Next, noting that the bootstrap samples under the SB form a stationary
sequence, we have

[{ E*XiXi' - XnX~} +
~(1- n- 1k){ E*XiXi~k + E*Xi' Xi+k - 2XnX~}].
k=l
(3.25)
Hence, the proposition follows from (3.24) and (3.25). D
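Proposition 3.2 makes the SB variance estimator computable without resampling. The following univariate sketch evaluates it using the weights $q_{nk}$; the AR(1) example and the choice of expected block length $1/p$ are illustrative, not prescribed by the text.

```python
import numpy as np

def sb_variance(x, p):
    """Closed-form SB estimator of Var(sqrt(n) Xbar_n) for a univariate series,
    using the weights q_{nk} of Proposition 3.2 (no resampling needed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    q = 1.0 - p
    # Gamma_hat_n(k) = n^{-1} sum_{i=1}^{n-k} X_i X_{i+k} - Xbar_n^2
    gamma = np.array([np.dot(x[:n - k], x[k:]) / n - xbar ** 2 for k in range(n)])
    k = np.arange(1, n)
    qnk = (1 - k / n) * q ** k + (k / n) * q ** (n - k)
    # In one dimension, q_{n0}(Gamma(0) + Gamma(0)') = Gamma_hat_n(0).
    return gamma[0] + 2.0 * np.sum(qnk * gamma[1:])

# Example, with expected block length 1/p of order n**(1/3).
rng = np.random.default_rng(9)
n = 400
x = np.empty(n); x[0] = 0.0
for i in range(1, n):
    x[i] = 0.5 * x[i - 1] + rng.standard_normal()
print(round(sb_variance(x, p=1.0 / round(n ** (1 / 3))), 3))
```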

Next we prove consistency of the SB variance estimator. For this, we assume a stronger set of conditions than those in Theorem 3.1.

Theorem 3.3 Assume that $E\|X_1\|^{4+\delta} < \infty$ and $\sum_{n=1}^{\infty} n^3\alpha(n)^{\delta/(4+\delta)} < \infty$ for some $\delta > 0$. Also, assume that $p + (n^{1/2}p)^{-1} \to 0$ as $n \to \infty$. Then,
$$\mathrm{Var}_*\big(T_n^{*(4)}\big) \longrightarrow_p \Sigma_\infty \quad \text{as } n \to \infty.$$



For proving the theorem, we need two auxiliary results. The first one is a standard bound on the cumulants of strongly mixing random vectors. For later reference, we state it in a slightly more general form than the setup of Theorem 3.3, allowing nonstationarity of the random vectors $\{X_i\}_{i \in \mathbb{Z}}$. For any random variables $Z_1, \ldots, Z_r$ ($r \geq 1$), we define the $r$th-order cumulant $K_r(Z_1, \ldots, Z_r)$ by
$$K_r(Z_1, \ldots, Z_r) = \iota^{-r}\,\frac{\partial^r}{\partial t_1 \cdots \partial t_r}\log E\exp\big(\iota[t_1 Z_1 + \cdots + t_r Z_r]\big)\Big|_{t_1 = \cdots = t_r = 0}, \qquad (3.26)$$
where $\iota = \sqrt{-1}$. Also, for a random vector $W = (W_1, \ldots, W_d)'$ and an integer vector $\nu = (\nu_1, \ldots, \nu_d)' \in \mathbb{Z}_+^d$, we set $K_\nu(W) = K_{|\nu|}(W_1, \ldots, W_1; \cdots; W_d, \ldots, W_d)$, where the $j$th component $W_j$ of $W$ is repeated $\nu_j$ times, $1 \leq j \leq d$. We may express the cumulant $K_r(Z_1, \ldots, Z_r)$ in terms of the moments of the $Z_i$'s by the formula
$$K_r(Z_1, \ldots, Z_r) = \sum_{j=1}^{r}\sum\nolimits^{(*j)} c(I_1, \ldots, I_j)\prod_{i=1}^{j} E\prod_{s \in I_i} Z_s, \qquad (3.27)$$
where $\sum^{(*j)}$ extends over all partitions $\{I_1, \ldots, I_j\}$ of $\{1, \ldots, r\}$ and where the $c(I_1, \ldots, I_j)$'s are combinatorial coefficients (cf. Zhurbenko (1972)). It is easy to check that cumulants are multilinear forms, i.e.,
$$K_r(Z_1, \ldots, Z_{1i} + Z_{2i}, \ldots, Z_r) = K_r(Z_1, \ldots, Z_{1i}, \ldots, Z_r) + K_r(Z_1, \ldots, Z_{2i}, \ldots, Z_r)$$
for every $1 \leq i \leq r$. Note that if $\{Z_1, \ldots, Z_r\}$ and $\{W_1, \ldots, W_s\}$ are independent, then
$$K_{r+s}(Z_1, \ldots, Z_r; W_1, \ldots, W_s) = \iota^{-(r+s)}\frac{\partial^{r+s}}{\partial t_1 \cdots \partial t_{r+s}}\Big[\log E\exp\big(\iota[t_1 Z_1 + \cdots + t_r Z_r]\big) + \log E\exp\big(\iota[t_{r+1} W_1 + \cdots + t_{r+s} W_s]\big)\Big]\Big|_{t_1 = \cdots = t_{r+s} = 0} = 0 \qquad (3.28)$$
for any $r \geq 1$, $s \geq 1$. This identity plays an important role in the proof of the lemma below.

Lemma 3.2 Let $\{X_i\}_{i \in \mathbb{Z}}$ be a sequence of (possibly nonstationary) $\mathbb{R}^d$-valued random vectors with $\sup\big\{\big(E\|X_i\|^{2r+\delta}\big)^{\frac{1}{2r+\delta}} : i \in \mathbb{Z}\big\} \equiv \zeta_{2r+\delta} < \infty$ and $\Delta(r, \delta) \equiv 1 + \sum_{i=1}^{\infty} i^{2r-1}[\alpha(i)]^{\delta/(2r+\delta)} < \infty$ for some integer $r \geq 1$ and $\delta > 0$. Also, let $a_1, \ldots, a_m$ be any $m$ unit vectors in $\mathbb{R}^d$ (i.e., $\|a_i\| = 1$, $1 \leq i \leq m$) for some $2 \leq m \leq 2r$. Then,
$$\big|K_m\big(a_1'S_n, \ldots, a_m'S_n\big)\big| \leq C(d, r)\,\Delta(r; \delta)\,\zeta_{2r+\delta}^{m}\,n$$
for all $n \geq 1$, where $S_n = X_1 + \cdots + X_n$ and where $C(d, r)$ is a constant that depends only on $d$ and $r$ but not on $n$. Furthermore, for any $\nu \in \mathbb{Z}_+^d$ with $|\nu| \leq 2r$,
$$E\big|S_n^\nu\big| \leq C(d, r)\,\Delta(r; \delta)\,\zeta_{2r+\delta}^{|\nu|}\,n^{|\nu|/2}$$
for all $n \geq 1$.

Proof: Using the multilinearity property of cumulants, we get

IKm(a~Sn' ... ' a;"'Sn) I


< l: IKm(a~Xj1l'''' a;"'Xj=) I (3.29)
l:'Oj, ,... ,j=:'On
Next, for any set of indices 1 :::; j1 :::; ... :::; jm :::; n, consider the maximal
gap in the sequence j1,'" ,jm' Suppose that jk+1 - jk = max{ji+1 - ji :
1 :::; i < m}. Let J 1 = {jl, ... , jk} and J 2 = {jk+1,"" jm}. Also, let
{.,t : i E h} be an independent copy of {Xi: i E J 2 }. Then, by (3.28),
Km(a~Xj" ... ,a~Xjk' a~+1Xjk+1, ... ,a;",Xj,J = O. Hence, by (3.27), the
strong mixing condition and Proposition 3.1,

IKm(a~Xj,,"" a;"'Xj=) I

IKm(a~Xj,,"" a;"'Xj=)

- Km(a~Xj1"'" a~Xjk' a~+1Xjk+1"'" a;"'Xj=) I


I j
II E II a~Xjs
m (*j)
< l:l:c(h, ... ,Ij )
J=l 2=1 sEli

-IT {(E
2=1
II a~Xjs) (E II a~Xjs)} I
sElinJ, sElinh

< C(m)C~+<5,n [a(jk+ 1 - jk)t/ m+<5

where (s,n == sup {(EIIXj II S)l/S : 1 :::; j :::; n} for all s > 0, n~ 1,
and C(m) is a constant that depends only on m, not on n. Next, note
that for any 0 :::; t :::; n - 1, there are at most n . t m - 1 sets of indices
{j1,"" jm} C {I, ... , n} that has maximal gap t. Hence,

l: IKm(a~Xj,,"" a;"'Xj"JI
l:'Oj" ... ,j=:'On

n-1
< 2)ntm - 1). C(m)· (:+8,n[a(t)],,/m+8
t=o
< C(m)~(r, O)(~+8 .n . (3.30)

This, together with (3.29), completes the proof of the first inequality.
The bound on EIS~I readily follows by using cumulant expansions
for moments (cf. Section 6, Bhattacharya and Rao (1986)) and the first
inequality. Hence, the lemma is proved. D

The next lemma gives an expression for the covariance of the "uncentered" sample cross-covariance estimators
$$\hat{\sigma}_n(j; \alpha, \beta) \equiv n^{-1}\sum_{i=1}^{n-j} X_i^\alpha X_{i+j}^\beta,$$
where $0 \leq j \leq n-1$ and $\alpha, \beta \in \mathbb{Z}_+^d$, $|\alpha| = |\beta| = 1$.

Lemma 3.3 Suppose that the conditions of Theorem 3.3 hold and that $EX_1 = 0$. Then, for any $\alpha, \beta, \gamma, \nu \in \mathbb{Z}_+^d$ with $|\alpha| = |\beta| = |\gamma| = |\nu| = 1$, and any $0 \leq j, k \leq n-1$,
$$\mathrm{Cov}\big(\hat{\sigma}_n(j; \alpha, \beta),\, \hat{\sigma}_n(k; \gamma, \nu)\big) = n^{-1}\sum_{m=-(n-j)+1}^{(n-j-v-1)}\big\{1 - n^{-1}\big(\eta_{jv}(m) + k\big)\big\}\cdot\big\{\big(EX_1^\alpha X_{1+m}^\gamma\big)\big(EX_1^\beta X_{1+m+v}^\nu\big) + \big(EX_1^\alpha X_{1+m+k}^\nu\big)\big(EX_1^\beta X_{1+m-j}^\gamma\big)\big\} + R_n(j, k; \alpha, \beta, \gamma, \nu),$$
where $v = k - j$, $\eta_{jv}(m) = m\,\mathbb{1}(m > 0) - (m+v)\,\mathbb{1}\big({-(n-j)+1} \leq m < -v\big)$, and the remainder terms $R_n(j, k; \alpha, \beta, \gamma, \nu)$'s satisfy the inequality
$$\sum_{j=0}^{n-1}\sum_{k=0}^{n-1} q_{nj} q_{nk}\big|R_n(j, k; \alpha, \beta, \gamma, \nu)\big| \leq C n^{-2}\sum_{j=0}^{n-1}\sum_{k=0}^{n-1}\sum_{s=1}^{n-j}\sum_{t=1}^{n-k}\sup\big\{\big|K_4\big(z_1'X_0, z_2'X_{t-s}, z_3'X_j, z_4'X_{k-j}\big)\big| : \|z_i\| \leq 1,\ i = 1, 2, 3, 4\big\}. \qquad (3.31)$$

Proof: This is a variant of Bartlett's (1946) formula for the covariance of sample autocovariance estimators. See, for example, Section 5.3 of Priestley (1981) for a derivation in the one-dimensional case. Inequality (3.31) is obtained from Bartlett's (1946) bound on the $R_n(j, k; \alpha, \beta, \gamma, \nu)$'s upon using the fact that $q_{nj} \leq 1$ for all $0 \leq j < n$. $\Box$

r
Proof of Theorem 3.3: Let n(k) = n- 1 L.~:lk XiXI+k' 0:::; k < nand
r r
En = n(O) + l:~:i qnk(r n(k) + n(kY). Then, by Proposition 3.2,

Since L.~:iqnk :::; 2L.~oqk = 2p-l and p- 1 11Xn 11 2 = Op((np)-l)


op(I), it is enough to show that

(3.32)

We prove this by showing that the bias and the variance of each element
of the matrix En go to zero as n -+ 00. For this, we label the elements of a
d x d matrix A by the d-dimensional unit vectors a, {3 E Zi. For example,
if a = (1,0, ... ,0)' and {3 = (0,1,0, ... ,0)', then A( a, (3) would denote the
(1,2)-th element of the matrix A. With this notation, for any a, {3 E Zi
with lal = 1{31 = 1,

EEn (n, (3) - Eoo (a, (3)

E{ an(O; n, (3) + ~ qnk (an(k; a, (3) + an(k; (3, a)) }


k=l
- { EXf+ f3 + f
k=l
(EXf xf+k + Exf Xf+k) }
n-l

L {qnk(l- n-1k) - I} (EXf xf+k + Exf Xf+k)


k=l
00

-L (EXfXf+k+EXfXf+k). (3.33)
k=n

Note that Iqnk(l-n- 1 k)-11 = l{qk-n-lk(qk_qn-k+qnk)-11 :::; 11-qk l+


3n- 1 k:::; kp+3n- 1 k for alII:::; k 2 :::; p-l. Since L.~l IkIIIEX1Xf+kll < 00,
from (3.33), we get

IEEn(a,{3) - Eoo (a,{3)1


< 2 L (kp + 3n- 1k)IIEX1 Xf+k1l + 8 L IIEX1Xf+k1l
lS:;k 2 S:;p-l

-+0 as n-+oo. (3.34)



Hence, the bias part goes to zero. Next, we consider the variance part. Note
that by Lemma 3.3,

ICov(un(j; a, (3), un(k; 1, v)) I


< 6n- (m~oo IIEXIX~+mll)EIIXII12
1

+ IRn(j,k;a,,6,1,v)1 (3.35)

for all lal = 1,61 = 111 = Ivl = 1 and for all 0 ~ j, k ~ n


- 1. Hence, using
(3.31), (3.35) and the arguments in the proof of Lemma 3.2 (cf. (3.30)), for
any a,,6 E Zd, lal = 1,61 = 1, we get

Var(tn(a, (3))

var(~ qnk(un(k; a,,6) + un(k;,6, a)) )


< 6n- 1 ~~qnkqnj(
3=0 k=O
f
m=-oo
IIEXIX~+mll) .EIIX Il 2 l

n-ln-l
+ LLqnjqnk( max IRn(j,k;al,a2,a3,a4)1)
j=O k=O a;E{a,.6}

< cn-l(~qnkr
n-ln-ln-jn-k
+ C(d)n-2 L L L L sup {IK4(Z~XO'Z~Xt-s'Z~Xj,Z~Xk-j)1 :
j=O k=O 8=1 t=l

IIZili ~ 1, i= 1,2,3,4}

< Cn- 1 p-2 + C(d) . n- 2 . n( ~ i 3 a(i)8/(4+6)) (EIIXI114+6) 4/(4+6)

-+0 as n-+oo. (3.36)

This completes the proof of Theorem 3.3. o

3.3.2 Consistency of SB Distribution Function Estimators


Next we prove consistency of the SB estimator for estimating the distribution function of the sample mean. For simplicity of exposition, we consider a variant of the bootstrapped statistic $T_n^{*(4)}$ where all of the SB samples from the last resampled block are retained. Let $K \equiv \inf\{i \geq 1 : L_1 + \cdots + L_i \geq n\}$ denote the minimum number of blocks necessary to generate $n$ SB samples, and let $N_1 = L_1 + \cdots + L_K$ denote the total number of SB observations in the first $K$ resampled blocks. Define the SB version of the centered and scaled sample mean $T_n$ based on a resample of size $N_1$ by
$$\tilde{T}_n^{*(4)} = N_1^{-1/2}\sum_{i=1}^{N_1}\big(X_i^{*(4)} - \bar{X}_n\big).$$
Thus, $\tilde{T}_n^{*(4)}$ differs from $T_n^{*(4)}$ only by the inclusion of the additional bootstrap observations, if any, in the last block that lie beyond the first $n$ resampled values. It is easy to see that $N_1 - L_K \leq n \leq N_1$, so that the difference between $n$ and $N_1$ is at most $L_K$. Since we assume that the expected value of the block lengths is negligible compared to $n$, the difference between these two versions can be shown to be negligible under the conditions of Theorem 3.4. Theorem 3.4 establishes consistency of the SB method for estimating the distribution function of $T_n$.
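The quantities $K$, $N_1$, and $L_K$ are easy to simulate. The following brief check (illustrative code, with parameter values chosen arbitrarily) verifies empirically that $N_1 - n \leq L_K$ and that $K$ is of the order $np$, in line with Lemma 3.4 below.

```python
import numpy as np

def sb_block_counts(n, p, rng=None):
    """Simulate SB block lengths L_1, L_2, ... and return (K, N_1, L_K), where
    K = inf{i : L_1 + ... + L_i >= n} and N_1 = L_1 + ... + L_K."""
    rng = np.random.default_rng(rng)
    total, k, last = 0, 0, 0
    while total < n:
        last = rng.geometric(p)
        total += last
        k += 1
    return k, total, last

rng = np.random.default_rng(13)
n, p = 1000, 0.05
draws = [sb_block_counts(n, p, rng.integers(1 << 30)) for _ in range(2000)]
K = np.array([d[0] for d in draws])
N1 = np.array([d[1] for d in draws])
LK = np.array([d[2] for d in draws])
print(K.mean(), n * p)                         # K is of the order np
print(np.all(N1 - n <= LK), np.all(N1 >= n))   # N_1 - L_K <= n <= N_1
```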

Theorem 3.4 Suppose that $E\|X_1\|^{6+\delta} < \infty$, $\Sigma_\infty$ is nonsingular, and $\sum_{n=1}^{\infty} n^5\alpha(n)^{\delta/(6+\delta)} < \infty$ for some $\delta > 0$. Also, assume that $p + (n^{1/2}p)^{-1} \to 0$ as $n \to \infty$. Then,
$$\sup_{x \in \mathbb{R}^d}\big|P_*\big(\tilde{T}_n^{*(4)} \leq x\big) - P(T_n \leq x)\big| \longrightarrow_p 0 \quad \text{as } n \to \infty. \qquad (3.37)$$

From the discussion following Theorem 3.2, it follows that (3.37) is equivalent to
$$\varrho\big(\mathcal{L}\big(\tilde{T}_n^{*(4)} \mid \mathcal{X}_n\big),\, N(0, \Sigma_\infty)\big) \longrightarrow_p 0 \quad \text{as } n \to \infty,$$
where $\varrho$ is a metric metricizing weak convergence of probability measures on $\mathbb{R}^d$.

Proof: We prove the result only for the one-dimensional case to


keep the proof simple. Write a 00 for Eoo under d = 1 and set
8(i; e) == L~~~-l Yn,j, i ;:: 1, e ;:: 1. Note that conditional on Tn ==
a(Xn' L1, ... , L n ), 8(14 ,1; L 1), ... , 8(14,K; L K ) are independent, but not
necessarily identically distributed random variables with

P(8(14,j;Lj)=8(i;Lj)ITn)=~ for i=1, ... ,n, 1~j~n.


Hence, E(8(14 ,j; L j ) I Tn) = n- 1 2:~=1 8(i; L j ) = LjXn for all 1 ~ j ~
n. By the Berry-Esseen Theorem for independent random variables (cf.
Theorem A.6, Appendix A),

sup Ip*(T~(4) ~ x) - if!(x/aoo ) I


xEIR

sup IE{P(
xEIR V
~ t (8(14,j; Lj ) -
N1 j=1
LjXn) ::; x I Tn) I Xn}

- <I>(x/O"oo) I

< E[{ sup Ip(


xEIR V
~ t (8(14,j; Lj ) -
N1 j=1
LjXn) ::; x I Tn)

- <I>(x/an,p) I+ 1<I>(x/an,p) - <I>(x/O"oo) I}ll(An) I Xn]


+ 2P(A~ I Xn)
< C· E[ {~E(IS(I4J;L;) - L;.\'nI N;-'/' I Tn) / a~,,}
3

X ll(An) I Xn]

+ E{ ~~~ I<I>(x/an,p) - <I>(x/O"oo)lll(An) I Xn}


+ 2P(A~ I Xn) , (3.38)

where a;,p = 2:.f=1 Var((8(14j; Lj ) I Tn)/N1, An = {Ian,p - 0"001 ::;


anO"oo}, and an = max{p1/2, (n 1/ 2p)-1/2}. Next, note that N1 ~ nand
that K is a stopping time with respect to 0"(L 1, ... , Lj ), 1 ::; j ::; n (cf.
Definition A.3, Appendix A). Hence, with P3(£) = E (1 8 (14,1; £)1 3 I Tn),
£ ~ 1, by (3.38) and Wald's Lemmas (cf. Theorem A.2, Appendix A), we
get

sup Ip*(T~(4) ::; x) - <I>(x/O"oo) I


xEIR
K
< c.n-3/20"~3E{~P3(Lj) I Xn}
+ {~~~ Ix¢(x)l} . E{ [ian,p - 0"001(an,pO"oo)-11l(An )ll Xn}
+ 2P(A~ I Xn)
< C(0"00)n- 3/ 2E(K I Xn)E(P3(Ld I Xn) + C(0"00) . an + 2P(A~ I Xn)
Op(n- 3/ 2 . (np) . (p-3/2)) + O(a n )
+ Op(p1/2 + n-1/4(logn)3)
---. 0 as n ---. 00 ,

by the following Lemma. Hence, Theorem 3.4 is proved. o



To complete the proof of Theorem 3.4 and for later reference, here we
establish some basic properties of the stopping time K and of the (random
length) block sums for the SB method in the following result.

Lemma 3.4 Assume that $p + (np)^{-1} \to 0$ as $n \to \infty$. Let $\{t_n\}_{n \geq 1}$ be a sequence of positive numbers such that $t_n \to \infty$ as $n \to \infty$. Also, let $r \geq 1$ be a given integer. Then,

(a) (i) $P\big(L_1 > t_n p^{-1}\big) = O\big(\exp(-t_n)\big)$;

(ii) $P\big(|K - np| > (np)^{1/2}(\log n)\big) = O\big(\exp(-C(\log n)^2)\big)$;

(iii) $E(L_K)^r = O\big(p^{-(r+1)}\big)$;

(iv) $E(K)^r = O\big((np)^r\big)$;

(v) $E\big(K^{r-1}\sum_{j=1}^{K} L_j^r\big) = O(n^r)$.

(b) Suppose that the conditions of Theorem 3.4 hold. Let $\hat{\tau}^2(\ell) = \mathrm{Var}\big(S(I_{4,1}; \ell) \mid \mathcal{T}_n\big)$, $\ell \geq 1$. Then,

(i) $E\big\{E_*\big|S(I_{4,1}; L_1)\big|^4\big\} = O(p^{-2})$;

(ii) $E\big\{p E_*\hat{\tau}^2(L_1) - \sigma_\infty^2\big\}^2 = O\big(p^2 + (np)^{-1}(\log n)^6\big)$;

(iii) $P\big(|\hat{\sigma}_{n,p} - \sigma_\infty| > u_n\sigma_\infty \mid \mathcal{X}_n\big) = O_p\big(u_n^{-1}\big[p + (np)^{-1/2}(\log n)^3\big]\big)$

for any $\{u_n\}_{n \geq 1}$ satisfying $u_n + u_n^{-1}(np)^{-1/2}(\log n)^3 = o(1)$ as $n \to \infty$.

Proof of Lemma 3.4: Let q = 1 - p. Since L1 has the Geometric distri-


bution with parameter p, we have

< q(t n / p )-l


exp([p-1tn - l]logq)
O( exp(-tn )) ,

proving part a(i). Next consider a(ii). Let ko == k on = lnp - (np)1/2lognJ.


Then, by the definition of K,

P(K ~ ko) P(L 1 +


... + Lko ;::: n)
P(exp(t(Ll + ... + Lko)) > exp(tn))
< e-tn(pet /[1 _ qet])ko

for all 0 < t < - log q.



Next, let f(t) = log{e- tn (pe t (l- qet)-l)ko},O < t < -logq. It is easy
to see that f(t) attains its minimum at to == log[(n - ko)/n] - log q E
(0, -logq). Now using Taylor's expansion, after some algebra,we get
P(K ~ k o) ~ exp(f(to))

= exp ( - ~np. r? + O(n(pry)2 + npry3)) ,

where ry == (np)-1/2Iogn. By similar arguments,


P(K > np + (np)1/2Iogn) = O( exp ( - C(logn)2)).

This completes the proof of a(ii).


Next we consider a(iii). Using the definition of K, we have for m 2:: 1,
n
P(L K = m) = LP(LK = m,K = k)
k=1
n k-1 k
LP(Lk = m, LLi < n ~ LLi)
k=1 i=l i=1
n k-1
P(L 1 = m,L1 2:: n) +P(L1 = m) LP(n - m ~ LLi < n)
k=2 i=1

< p(L1=m)[1+tp(k~K<k+m)]
k=2

peL, ~ m) [1+ t, .~-' P(K ~ j)1


< (m+1)P(L 1 =m).
Hence, E(LK Y ~ 2::=1 mr(m + 1)P(L 1 = m) ~ C(r)p-(r+1). For a(iv),
noting that 1 ~ K ~ n, by part a(ii), we have,

E(KY = E(Kyn(IK - npl ~ ..foP logn) + O(n r exp (-C(logn)2))


= O((npy).
To prove part a(v), note that K is a stopping time with respect to the a-
fields {a(Ll' ... ,Lk) : 1 ~ k ~ n}. Hence, using part a(iv), Wald's Lemmas
(cf. Theorem A.2, Appendix A), and Holder's inequality, we get,
r

< C(r)(npf-1 [E{ t,(Lr - ELI) + (EK2)(ELI)2f/2


< C(r)(npf-1 [E(K) . ELi r + (EK2)(ELir)F/2
< C(r)(npfp-r.
Next consider part (b). Part b(i) follows from a more general result (cf.
Lemma 5.3) proved in Chapter 5. To prove b(ii), first note that by Lemmas
3.1 and 3.2, for any 1 ::::: f < n/2,

Var (t SCi; f)2)

< 2 [£2var(n~1 SCi; £)2 If) + E(=E+2 SCi; £)2) 2]


< C£2(n£) (EIS(1;f)Iv'e1 6 f/ 3 . (1 + L a(k£)1/3)
l<5.k<5.n/C

+ 2£2 max { E S (i; £) 4 : n - £ + 2 ::::: i ::::: n }

< Cnf· [E(S(1;£)rr/ 3


+ 24. £2 max { E(X1 + ... + X k )4
+ E(X1 + ... + X C_ k )4 : 1 ::::: k < £}
6 ] 2/3 3
< C(J) [ (6+<5~(3; J) n£. (3.39)

Next write T2(k) = E(X1 + ... + X k )2 and Wkn = pqk-1 = P(L 1

t
k), k ~ 1. Then, by (3.39) and Cauchy-Schwarz inequality, we have,

E{ E* (n- 1 [SCi; L1)2 _ T2(Ld]) } 2

~ (n- 1 t
t
E{ [SCi; k)2 _ T2(k)J) Wkn} 2

< ~Wkn. E( n- 1 [SCi; k)2 _ T2(k)J) 2

< L
k<5.p-l (log n)2
Wkn Var (n- 1 t
i=l
SCi; k)2)

+2 L
k>p-l(logn)2
Wkn{E(n- 1 t
i=l
S (i;k)4)+T(k)4}

< (I:Wkn) ·n- 2 .C(8)[(~+c5~(3;8)r/3 .n(p-1(IOgn)2)3


k~l

+C
k>p-l (log n)2

O(n- 1p - 3(logn)6) +o( (I:k8Wknf/2( I: Wknf/2)


k~l k>p-l (log n)2

O(n- 1p- 3(logn)6) + o(p-4 exp ( - ~(IOgn)2)) . (3.40)

Also, with 'Y(i) == EX1 X1+i, i ~ 1, we have

Ip, E*r 2(L 1) - u!, I


IpE* (r 2(Ld - L 1u!,) I
IpE.{ J=:, (L, -1·lh(i) - J~ Ln(')I}
< 2p f
i=l
Iii 1'Y(i) I + 2PE*{ L1 I: b(i)l}
i~Ll

< C· P + 2p· [P(L1 ::; p-1/2) . p-1/2 . (~b(i)l)

+ {E(Ld}' I: b(i)l]
i~p-l/2

< CP+Cp.(p1/2)p-1/2+ C p .(p-1)p.( I: i 21 'Y(i)l)


i~p-l/2

< Cp, (3.41)


since P(L 1 ::; t) = 1 - qt ::; Ctp for alII::; t ::; p-1/2. Hence, by (3.40) and
(3.41), it follows that

r
E{pE*f2(Ld _ u!,} 2

E{PE* (n- 1 t S(i; L1)2 - LiX~) - u!,

< 4 [p2 E{ E*(LiX~) r r


t
+ E{pE*r 2(L 1) - u!,

+ p2 E{ E* (n- 1 [S(i; L1)2 _ r 2(L1)]) } 2]


O(p2p-4n -2) + O(p2) + O((np)-1(logn)6) .

This proves b(ii).


Next we consider b(iii). Using a(i) and arguments similar to (3.39) and
(3.40), we get

E[E*{ n- 1 tS(i;L1)4}]

< max {n- 1 t ES(i; k)4 : 1 :s; k :s; p-1(logn)2 }

+ (EIIXd 4 ) . L k 4w kn

(3.42)

and by a(iv), Wald's lemmas and the fact that N1 ~ n,


E*I N i 1K - pi = EI N i 1K - pi
< {EIKp-1 - N11 } (p/n)

n- 1p. E/ £=(Lj - ELj ) /

rr
J=l

< (n-'p). [E{ t,(L; -EL;)


(n -lp) [(EK)Var(L1)]1/2
< C. (n- 1p) . [np. p-1P/2
O(n- 1 / 2 p) . (3.43)
Next, note that f2(L j ) = n-12:7=lS(i,Lj)2 - L;X;" 1 :s; j:S; n are
conditionally iid given Xn , and a;,p = Nil 2:7=1 f2(L j ). Now, using b(ii),
(3.42), and (3.43), we have

p( lan,p - O"CXJI > UnO"CXJ I Xn)


< p( la;,p - O"~I > UnO"~ I Xn)

< p(/ t, {f2(Lj) - E(f2(Lj) I Xn)} / > ~ . nUnO"~ I Xn)


+P(/Ni 1KE (f 2(L1) I Xn) -O"~/ > ~ .UnO"~ I Xn)
< C· (nUnO"~)-2{ E*(K)}{ Var* (f2(Ld)}
+ 2(UnO"~)-lE*INi1KE*(f2(L1)) - O"~I

< C(a(X))· (nu n )-2(np)E* [n- 1 tS(i;Lt)4]

+ C(a(X)) . (un)-l{ (E*IN1 1K - pi) (E*f2 (L 1 ))


+ IpE*f2 (L 1 ) - a~l}
Op( n-lpu:;;2 [p-l(1ogn)2 f)
+ Op (U:;;l{ (n- 1/ 2p)p-l + [p + (np)-1/2(lOgn)3]}).
This completes the proof of part b(iii), and hence of the lemma. 0
4
Extensions and Examples

4.1 Introduction
In this chapter, we establish consistency of different block bootstrap meth-
ods for some general classes of estimators and consider some specific exam-
ples illustrating the theoretical results. Section 4.2 establishes consistency
of estimators that may be represented as smooth functions of sample means.
Section 4.3 deals with (generalized) M-estimators, including the maximum
likelihood estimators of parameters, which are defined through estimat-
ing equations. Some special considerations are required while defining the
bootstrap versions of such estimators. We describe the relevant issues in
detail in Section 4.3. Section 4.4 gives results on the bootstrapped empiri-
cal process, and establishes consistency of bootstrap estimators for certain
differentiable statistical functionals. Section 4.5 contains three numerical
examples, illustrating the theoretical results of Sections 4.2-4.4.

4.2 Smooth Functions of Means


Results of Sections 3.2 and 3.3 allow us to establish consistency of the MBB, the NBB, the CBB, and the SB methods for some general classes of estimators. In this section, we consider the class of estimators that fall under the purview of the Smooth Function Model (cf. Bhattacharya and Ghosh (1978); Hall (1992)). Suppose that $\{X_{0i}\}_{i \in \mathbb{Z}}$ is an $\mathbb{R}^{d_0}$-valued stationary process. Let $f : \mathbb{R}^{d_0} \to \mathbb{R}^d$ be a Borel measurable function, and let $H : \mathbb{R}^d \to \mathbb{R}$ be a smooth function. Suppose that the level-1 parameter of interest is given by $\theta = H(Ef(X_{01}))$. A natural estimator of $\theta$ is given by $\hat{\theta}_n = H\big(n^{-1}\sum_{i=1}^{n} f(X_{0i})\big)$. Thus, the parameter $\theta$ and its estimator $\hat{\theta}_n$ are both smooth functions, respectively, of the population and the sample means of the transformed sequence $\{f(X_{0i})\}_{i \in \mathbb{Z}}$. Many level-1 parameters and their estimators may be expressed as smooth functions of means as above. Some common examples of estimators satisfying this 'Smooth Function Model' formulation are given below.

Example 4.1: Let $\{X_{0i}\}_{i \in \mathbb{Z}}$ be a stationary real-valued time series with autocovariance function $\gamma(k) = \mathrm{Cov}(X_{0i}, X_{0(i+k)})$, $i, k \in \mathbb{Z}$. An estimator of $\gamma(k)$ based on a sample $X_{01}, \ldots, X_{0n}$ of size $n$ is given by
$$\hat{\gamma}_n(k) = (n-k)^{-1}\sum_{i=1}^{n-k} X_{0i} X_{0(i+k)} - \bar{X}_{0(n-k)}^2, \qquad (4.1)$$
where $\bar{X}_{0(n-k)} = (n-k)^{-1}\sum_{i=1}^{n-k} X_{0i}$. Note that $\hat{\gamma}_n(k)$ is a version of the sample autocovariance at lag $k$. We now show that the estimator $\hat{\gamma}_n(k)$ and the parameter $\gamma(k)$ admit the representation specified by the Smooth Function Model. Define a new sequence of bivariate random vectors
$$X_i = \big(X_{0i},\, X_{0i} X_{0(i+k)}\big)', \quad i \in \mathbb{Z}.$$
Then, the level-1 parameter of interest $\theta \equiv \gamma(k)$ is given by
$$\theta = E X_{01} X_{0(1+k)} - (E X_{01})^2 = H(EX_1),$$
where $H : \mathbb{R}^2 \to \mathbb{R}$ is given by $H((x, y)') = y - x^2$. Similarly, its estimator $\hat{\theta}_n \equiv \hat{\gamma}_n(k)$ is given by
$$\hat{\theta}_n = H\big(\bar{X}_{n-k}\big),$$
where $\bar{X}_{n-k} = (n-k)^{-1}\sum_{i=1}^{n-k} X_i$. Thus, this is an example that falls under the purview of the Smooth Function Model. $\Box$
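The representation in Example 4.1 can be verified numerically. In the following sketch (an MA(1) series and the sample size are arbitrary choices), the lag-$k$ sample autocovariance computed directly agrees with $H$ applied to the mean of the bivariate lag-product vectors.

```python
import numpy as np

# Numerical check of the representation in Example 4.1:
# gamma_hat_n(k) = H(mean of the lag-product vectors), where H((x, y)') = y - x**2.
rng = np.random.default_rng(17)
n, k = 400, 2
e = rng.standard_normal(n + 1)
x0 = e[1:] + 0.5 * e[:-1]                                 # an MA(1) series of length n

m = n - k
pairs = np.column_stack([x0[:m], x0[:m] * x0[k:k + m]])   # X_i = (X_{0i}, X_{0i} X_{0(i+k)})'
xbar = pairs.mean(axis=0)
gamma_via_H = xbar[1] - xbar[0] ** 2                      # H((x, y)') = y - x**2

gamma_direct = (x0[:m] * x0[k:k + m]).mean() - x0[:m].mean() ** 2
print(np.isclose(gamma_via_H, gamma_direct))              # True
```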

Example 4.2: Let $\{X_{0i}\}_{i \in \mathbb{Z}}$ be a stationary time series with $\mathrm{Var}(X_{01}) \in (0, \infty)$, and suppose that the level-1 parameter of interest is the lag-$k$ autocorrelation coefficient
$$r(k) = \mathrm{Cov}(X_{01}, X_{0(1+k)})/\mathrm{Var}(X_{01}),$$
for some fixed integer $k \geq 0$. As an estimator of $r(k)$, we consider the following version of the sample autocorrelation coefficient
$$\hat{r}_n(k) = \Big[(n-k)^{-1}\sum_{i=1}^{n-k} X_{0i} X_{0(i+k)} - \bar{X}_{0(n-k)}^2\Big]\Big/\Big[(n-k)^{-1}\sum_{i=1}^{n-k} X_{0i}^2 - \bar{X}_{0(n-k)}^2\Big],$$
where $\bar{X}_{0(n-k)} = (n-k)^{-1}\sum_{i=1}^{n-k} X_{0i}$. Then, as in Example 4.1, $r(k)$ and $\hat{r}_n(k)$ may be expressed as smooth functions of sample means of certain lag-product variables. Define the function $H : \mathbb{R}^3 \to \mathbb{R}$ by
$$H\big((x, y, z)'\big) = (z - x^2)/(y - x^2),$$
and set $Y_i = \big(X_{0i}, X_{0i}^2, X_{0i} X_{0(i+k)}\big)'$, $i \in \mathbb{Z}$. Then, it is easy to see that the function $H(\cdot)$ is smooth in a neighborhood of $EY_1$ and that $r(k)$ and $\hat{r}_n(k)$ can be expressed as
$$r(k) = H(EY_1) \quad \text{and} \quad \hat{r}_n(k) = H\big(\bar{Y}_{n-k}\big),$$
where $\bar{Y}_m = m^{-1}\sum_{i=1}^{m} Y_i$, $m \geq 1$. $\Box$

Example 4.3: Let $\{X_{0i}\}_{i \in \mathbb{Z}}$ be a zero-mean stationary autoregressive process of order $p \in \mathbb{N}$ (AR($p$), in short), satisfying
$$X_{0i} = \sum_{j=1}^{p}\beta_j X_{0(i-j)} + \epsilon_i, \quad i \in \mathbb{Z}, \qquad (4.2)$$
where $\{\epsilon_i\}_{i \in \mathbb{Z}}$ is an uncorrelated stationary process with $E\epsilon_1 = 0$ and $E\epsilon_1^2 = \sigma^2 \in (0, \infty)$, and $\beta_1, \ldots, \beta_p \in \mathbb{R}$ are the autoregressive parameters. Suppose that $(\beta_1, \ldots, \beta_p)$ are such that the polynomial
$$\beta(z) = 1 - \beta_1 z - \cdots - \beta_p z^p \qquad (4.3)$$
has no zero on the closed unit disc $\{|z| \leq 1\}$. Then, the AR($p$) process $\{X_{0i}\}_{i \in \mathbb{Z}}$ admits a representation of the form
$$X_{0i} = \sum_{j=0}^{\infty} a_j \epsilon_{i-j}, \quad i \in \mathbb{Z}, \qquad (4.4)$$
where the sequence of constants $\{a_j\}_{j \geq 0}$ is determined by $\sum_{j=0}^{\infty} a_j z^j = 1/\beta(z)$, $|z| \leq 1$ (see Theorem 3.1.1, Brockwell and Davis (1991)). This property of $\{X_{0i}\}_{i \in \mathbb{Z}}$ is referred to as "causality," and it yields the Yule-Walker estimators of the parameters $\beta_1, \ldots, \beta_p$ and $\sigma^2$.

Let $\gamma(k)$ and $\hat{\gamma}_n(k)$, $1 \leq k \leq p$, be as in Example 4.1 with $(n-k)$ replaced by $(n-p)$. Also, let $\hat{\gamma}_{p,n} \equiv \big(\hat{\gamma}_n(1), \ldots, \hat{\gamma}_n(p)\big)'$ and let $\hat{\Gamma}_{p,n}$ be the $p \times p$ matrix with $(i,j)$-th element $\hat{\gamma}_n(i-j)$, $1 \leq i, j \leq p$. When $\hat{\Gamma}_{p,n}$ is nonsingular, a version of the Yule-Walker estimators $\hat{\beta}_{1n}, \ldots, \hat{\beta}_{pn}, \hat{\sigma}_n^2$ of $\beta_1, \ldots, \beta_p, \sigma^2$ is given by
$$\big(\hat{\beta}_{1n}, \ldots, \hat{\beta}_{pn}\big)' = \hat{\Gamma}_{p,n}^{-1}\,\hat{\gamma}_{p,n}, \qquad (4.5)$$
$$\hat{\sigma}_n^2 = \hat{\gamma}_n(0) - \big(\hat{\beta}_{1n}, \ldots, \hat{\beta}_{pn}\big)\,\hat{\gamma}_{p,n}. \qquad (4.6)$$



We claim that the parameter vector $\theta = (\beta_1, \ldots, \beta_p; \sigma^2)'$ and the estimator $\hat{\theta}_n = (\hat{\beta}_{1n}, \ldots, \hat{\beta}_{pn}; \hat{\sigma}_n^2)'$ satisfy the requirements of the Smooth Function Model. To see this, define a new $\mathbb{R}^{p+2}$-valued process $\{X_i\}_{i \in \mathbb{Z}}$ by
$$X_i = \big(X_{0i},\, X_{0i}^2,\, X_{0i} X_{0(i+1)}, \ldots, X_{0i} X_{0(i+p)}\big)', \quad i \in \mathbb{Z}.$$
Let $h_1(x) = (x_2 - x_1^2)$ and $h_2(x) = (x_3 - x_1^2, \ldots, x_{p+2} - x_1^2)'$, $x = (x_1, \ldots, x_{p+2})' \in \mathbb{R}^{p+2}$. Then, writing $\bar{X}_m = m^{-1}\sum_{i=1}^{m} X_i$, $m \geq 1$, we have $\hat{\gamma}_n(0) = h_1(\bar{X}_{n-p})$ and $\hat{\gamma}_{p,n} = h_2(\bar{X}_{n-p})$.
Next, let $S_p^*$ (and $S_p^{*+}$, respectively) denote the collection of all symmetric (and symmetric nonsingular) matrices of order $p$, and let $g_1 : S_p^* \to S_p^*$ be defined by
$$g_1(A) = \begin{cases} A^{-1} & \text{if } A \in S_p^{*+} \\ I_p & \text{otherwise.} \end{cases}$$
Since, for $A \in S_p^{*+}$, the elements of $A^{-1}$ are given by the ratios of the cofactors of $A$ and the determinant of $A$, and the determinant of a matrix is a polynomial in its elements, the components of the function $g_1(\cdot)$ are rational functions (and, hence, infinitely differentiable functions) of the elements of its argument at any $A \in S_p^{*+}$. Also, let $g_2$ denote the map that arranges the sample autocovariances $\hat{\gamma}_n(0), \hat{\gamma}_n(1), \ldots, \hat{\gamma}_n(p)$ into the matrix $\hat{\Gamma}_{p,n} \in S_p^*$. Then the estimator $\hat{\theta}_n = (\hat{\beta}_{1n}, \ldots, \hat{\beta}_{pn}; \hat{\sigma}_n^2)'$ can be expressed as
$$\hat{\theta}_n = H\big(\bar{X}_{n-p}\big) \equiv \big(H^{(1)}(\bar{X}_{n-p})',\, H^{(2)}(\bar{X}_{n-p})\big)',$$
where
$$\big(\hat{\beta}_{1n}, \ldots, \hat{\beta}_{pn}\big)' = g_1\big(\hat{\Gamma}_{p,n}\big)\,h_2\big(\bar{X}_{n-p}\big) \equiv H^{(1)}\big(\bar{X}_{n-p}\big)$$
and
$$\hat{\sigma}_n^2 = h_1\big(\bar{X}_{n-p}\big) - H^{(1)}\big(\bar{X}_{n-p}\big)'\,h_2\big(\bar{X}_{n-p}\big) \equiv H^{(2)}\big(\bar{X}_{n-p}\big).$$
The corresponding representation for the parameter $\theta$ as $\theta = H(EX_1)$ holds since, by (4.4) (cf. p. 239, Brockwell and Davis (1991)),
$$(\beta_1, \ldots, \beta_p)' = \Gamma_p^{-1}\gamma_p \quad \text{and} \quad \sigma^2 = \gamma(0) - (\beta_1, \ldots, \beta_p)\,\gamma_p,$$
where $\gamma_p = (\gamma(1), \ldots, \gamma(p))'$ and $\Gamma_p$ is the $p \times p$ matrix with $(i,j)$-th element $\gamma(i-j)$, $1 \leq i, j \leq p$. Thus, the Yule-Walker estimators also fall under the purview of the Smooth Function Model. $\Box$
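The following sketch computes the Yule-Walker-type estimators (4.5)-(4.6) for a simulated AR(2) series, with all sample autocovariances based on $(n-p)$ terms as in Example 4.3; the simulation settings and the function name are illustrative assumptions.

```python
import numpy as np

def yule_walker(x0, p):
    """Yule-Walker-type estimators of (beta_1, ..., beta_p, sigma^2) as in
    Example 4.3, with all sample autocovariances based on (n - p) terms."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    m = n - p
    xbar = x0[:m].mean()
    # gamma_hat(k), 0 <= k <= p, each using the first (n - p) lag products
    gam = np.array([(x0[:m] * x0[k:k + m]).mean() - xbar ** 2 for k in range(p + 1)])
    # Gamma_hat_{p,n}: (i, j)-th element gamma_hat(i - j), using symmetry for i < j
    Gamma = np.array([[gam[abs(i - j)] for j in range(p)] for i in range(p)])
    beta = np.linalg.solve(Gamma, gam[1:])
    sigma2 = gam[0] - beta @ gam[1:]
    return beta, sigma2

# Example: AR(2) series with beta = (0.5, -0.3) and unit innovation variance.
rng = np.random.default_rng(23)
n = 2000
x = np.zeros(n)
for i in range(2, n):
    x[i] = 0.5 * x[i - 1] - 0.3 * x[i - 2] + rng.standard_normal()
beta_hat, s2_hat = yule_walker(x, p=2)
print(np.round(beta_hat, 2), round(s2_hat, 2))   # roughly (0.5, -0.3) and 1
```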

Now we consider the general framework of the Smooth Function Model described in the first paragraph of this section and establish consistency of different block bootstrap estimators of the sampling distribution of the centered and scaled estimator $\hat{\theta}_n$, given by
$$T_{1n} = \sqrt{n}\big(\hat{\theta}_n - \theta\big),$$
with $\theta = H(Ef(X_{01}))$ and $\hat{\theta}_n = H\big(n^{-1}\sum_{i=1}^{n} f(X_{0i})\big)$. Block bootstrap versions of $T_{1n}$ can be defined following the descriptions given in Chapter 2 and Sections 3.2 and 3.3. Let $X_i = f(X_{0i})$, $i \in \mathbb{Z}$. Note that in terms of the transformed variables $X_i$'s, we may rewrite $\theta$ and $\hat{\theta}_n$, respectively, as $\theta = H(EX_1)$ and $\hat{\theta}_n = H(\bar{X}_n)$. Next, let $\mathcal{X}_n^{*(j)}$, $j = 1, 2, 3$, denote the set of $n_1$ bootstrap samples based on $b$ blocks of length $\ell$ from the transformed variables $\mathcal{X}_n = \{X_1, \ldots, X_n\}$ under the MBB, the NBB, and the CBB, respectively, and let $\mathcal{X}_n^{*(4)}$ denote the set of $N_1$ bootstrap observations under the SB, with expected block length $p^{-1} \in (1, n)$, where $b = \lfloor n/\ell \rfloor$, $n_1 = b\ell$, and $N_1 = L_1 + \cdots + L_K$ are as in Chapter 3. Also, with a slight abuse of notation, let $\bar{X}_n^{*(j)}$, $j = 1, 2, 3, 4$, denote the means of the corresponding bootstrap samples. Then, the block bootstrap versions of $T_{1n}$ are given by
$$T_{1n}^{*(j)} = \sqrt{n_1}\big(\hat{\theta}_n^{*(j)} - \tilde{\theta}_{n,j}\big), \quad j = 1, 2, 3,$$
and
$$T_{1n}^{*(j)} = \sqrt{N_1}\big(\hat{\theta}_n^{*(j)} - \tilde{\theta}_{n,j}\big), \quad j = 4,$$
where $\hat{\theta}_n^{*(j)} = H\big(\bar{X}_n^{*(j)}\big)$ and $\tilde{\theta}_{n,j} = H\big(E_*\bar{X}_n^{*(j)}\big)$, $1 \leq j \leq 4$.
Then, we have the following result.
Theorem 4.1 Suppose that the function $H$ is differentiable in a neighborhood $N_H \equiv \{x \in \mathbb{R}^d : \|x - EX_1\| < 2\eta\}$ of $EX_1$ for some $\eta > 0$, $\sum_{|\alpha|=1}|D^\alpha H(EX_1)| \neq 0$, and that the first-order partial derivatives of $H$ satisfy a Lipschitz condition of order $\kappa > 0$ on $N_H$. Assume that the conditions of Theorem 3.2 hold for $j = 1, 2, 3$ and that the conditions of Theorem 3.4 hold for $j = 4$ (with the transformed sequence $\{X_i\}_{i \in \mathbb{Z}}$). Then,
$$\sup_{x \in \mathbb{R}}\big|P_*\big(T_{1n}^{*(j)} \leq x\big) - P(T_{1n} \leq x)\big| \longrightarrow_p 0 \quad \text{as } n \to \infty$$
for $j = 1, 2, 3, 4$.
For proving Theorem 4.1, we shall use a suitable version of the well-known Slutsky's theorem for conditional distributions.

Lemma 4.1 (A CONDITIONAL SLUTSKY'S THEOREM). For $n \in \mathbb{N}$, let $b_n^*$ and $T_n^*$ be $r$-dimensional ($r \in \mathbb{N}$) and $s$-dimensional ($s \in \mathbb{N}$) random vectors, and let $A_n^*$ be an $r \times s$ random matrix, all defined on a common probability space $(\Omega, \mathcal{F}, P)$. Suppose that $\mathcal{X}_\infty$ is a sub-$\sigma$-field of $\mathcal{F}$ and that there exist $\mathcal{X}_\infty$-measurable variables $A$ and $b$ such that
$$P\big(\|A_n^* - A\| > \epsilon \mid \mathcal{X}_\infty\big) + P\big(\|b_n^* - b\| > \epsilon \mid \mathcal{X}_\infty\big) \longrightarrow_p 0 \quad \text{as } n \to \infty \qquad (4.7)$$
for every $\epsilon > 0$. Also, suppose that there is a probability distribution $\nu$ on $\mathbb{R}^s$ such that $\mathcal{L}(T_n^* \mid \mathcal{X}_\infty) \to^{d} \nu$ in probability as $n \to \infty$, i.e., $\varrho_s\big(\mathcal{L}(T_n^* \mid \mathcal{X}_\infty), \nu\big) \to_p 0$ as $n \to \infty$, where $\varrho_k(\cdot)$ metricizes convergence in distribution of random vectors on $\mathbb{R}^k$, $k \in \mathbb{N}$. Then,
$$\mathcal{L}\big(A_n^* T_n^* + b_n^* \mid \mathcal{X}_\infty\big) \to^{d} \nu \circ g^{-1} \quad \text{in probability,}$$
i.e.,
$$\varrho_r\big(\mathcal{L}(A_n^* T_n^* + b_n^* \mid \mathcal{X}_\infty),\, \nu \circ g^{-1}\big) \longrightarrow_p 0 \quad \text{as } n \to \infty,$$
where $g : \mathbb{R}^s \to \mathbb{R}^r$ is the mapping $g(t) = At + b$, $t \in \mathbb{R}^s$, and where $\nu \circ g^{-1}$ denotes the probability distribution induced by the mapping $g$ on $\mathbb{R}^r$ under $\nu$, i.e., $\nu \circ g^{-1}(B) = \nu(g^{-1}(B))$, $B \in \mathcal{B}(\mathbb{R}^r)$.

Note that if $T$ is a random vector with distribution $\nu$, then $\nu \circ g^{-1}$ is the distribution of the transformed random vector $g(T)$. Hence, the lemma can be restated in a less formal way as follows:

Lemma 4.1' If $\mathcal{L}(T_n^* \mid \mathcal{X}_\infty) \to^{d} \mathcal{L}(T)$ in probability and (4.7) holds, then $\mathcal{L}(A_n^* T_n^* + b_n^* \mid \mathcal{X}_\infty) \to^{d} \mathcal{L}(AT + b)$ in probability.

For proving the lemma, we shall use the following equivalent form of
condition (4.7):
There exists a sequence {E n }n21 of positive real numbers such that En 10
as n -+ 00 and

----tp 0 as n -+ 00 . (4.8)

It is clear that (4.8) implies (4.7). To prove the converse, note that by (4.7),
for each k ~ 2, there exists a positive integer mk > mk-l such that

where

and ml = 1. Define En = k- l for mk ~ n < mk+l, k ~ 1. Then, En 10 as


n -+ 00 and P(an(En) > En) ::; En for all n ~ 1, which implies (4.8).

Proof of Lemma 4.1: It is enough to show that given any subsequence


{nil, there is a further subsequence {nk} C {nil such that

(!r(.c(A~kT~k +b~k I Xoo),vog- l ) -+ 0 as k -+ 00 a.s. (4.9)



Fix a subsequence {nd. Let (in(E) be as defined above. Then, by (4.8) and
the conditions of the lemma, there exists a subsequence {nd c {ni} such
that
(ink (E nk ) ---+ 0 as k ---+ 00, a.s. (P) (4.10)
and
eS(.c(T~k I X=), v) ---+ 0 as k ---+ 00, a.s. (P) . (4.11)
We shall show that (4.9) holds for this choice of the subsequence {nd. For
any vector x E lR T , write

x'[A~T~ + b~l = (x' AT~ + x'b) + R~(x) , (4.12)

where R~(x) = x'(A~ - A)T;:: + x'(b~ - b).


Note that by (4.11) and the continuous mapping theorem (applied point-
wise on a set of P-probability one that does not depend on x E lR T ),

(4.13)

where Vx = vog;;;1 and gx : lR s ---+ lR is defined by gx(t) = x'(At+b), t E lR S.


Also, by (4.10) and (4.11),

p( IR~k(X)1 > 2E;:':211xlll X=)


< P(IIT~J > E:;;;/2 I X=) + a nk (E nk )
---+ 0 as k ---+ 00, a.s. (F). (4.14)

Hence, from (4.12), (4.13), and (4.14), it follows that

e(.c(x'[A~kT~k + b~kll X=), vx) ---+ 0 as n", ---+ 00, a.s. (P)

for all x E lR T • Thus, by the Cramer-Wold device, (4.9) holds and the
lemma is proved. D

Proof of Theorem 4.1: Write T~(j) = an(j) (X~(j) - E*X~(j)), j =

1,2,3,4 and Tn = vn(Xn - EX 1 ), where an(j)2 = n1 for j = 1,2,3


and a n (4)2 = N 1 . Note that for j = 4, T~(j) was denoted by T~(4) in
Chapter 3. For notational simplicity, we shall use the updated definitions
of T;::(4) and X~(j) 's in this chapter. Also, set 2:= = limn->= n . Var(Xn)
and .1'= = a({Xi : i 2': I}). Then, by Theorems 3.2 and 3.4, it follows that
for j = 1,2,3,4,

(4.15)

Also, note that

T*(j)
In = AT*(j)
n + R*(j)
n' J. = 1 , 2, 3 , 4 ,

where A is the 1 x d dimensional matrix (row vector) with elements


D1H(EXd,···, DdH(EX1), where DiH(X) = a~i H(x), and where the
remainder term R~(j) is defined by subtraction. We now show that

(4.16)

for any E > 0 and for all j = 1,2,3,4.


Let tn(j) = IIE*X~(j) - EX1!I, j = 1,2,3,4. Then, on the set {tn(j) :::;
1]} n {IIX~(j) - E*X~(j) I :::; 1]}, using a one-term Taylor's expansion of the
function H around EX l , we get

IR~(j)1 lan(j)(H(X~(j)) -H(E*X~(j))) -AT~(j)1


< C· (1IX~(j) - E*X~(j) 11K + tn(j)K) . IIT~(j) I .

Hence, it follows that

P (IR~(j)1 > 2E I Xoo)


< P* (IR~(j)1 > 2E, IIX~(j) - E*X~(j)11 :::; 1]) . n(tn(j) :::; 1])
+ n(tn(j) > 1]) + P* (1IX~(j) - E*X~(j) II > 1])
< P* (CIIT~(j) I 1+ Kan(j)-K > E) + P* (tn(j)KIIT~(j) I > E)
+P* (1IX~(j) - E*X~(j)11 > 1]) + n(tn(j) > 1])
< [3P* (1IT~(j) II > C( E, /'l" 1]) . log n) + 2n (tn(j) > 1] (log n)-l)]
op(l)
by (4.15) and the fact that En(tn(j) > 1](logn)-l) = p(tn(j) >
1](logn)-l):::; (1]- l logn)2Etn (j)2 = O(n-l(logn)2) as n -+ 00. Hence, by
Lemma 4.1, the theorem now follows from (4.15) and (4.16). D

Remark 4.1 In many applications, we may be interested in an estimator of the parameter $\theta = H(EX_1)$ of the form
$$\tilde{\theta}_n = H\Big((n-k_1)^{-1}\sum_{i=1}^{n-k_1} f_1(X_{0i}), \ldots, (n-k_d)^{-1}\sum_{i=1}^{n-k_d} f_d(X_{0i})\Big)$$
for some fixed integers $k_1, \ldots, k_d$, not depending on $n$, where $f_1, \ldots, f_d$ denote the components of the function $f : \mathbb{R}^{d_0} \to \mathbb{R}^d$. It can be easily shown that the conclusions of Theorem 4.1 continue to hold for $\tilde{\theta}_n$ if we replace $\hat{\theta}_n$ by $\tilde{\theta}_n$ and $\hat{\theta}_n^{*(j)}$ by the corresponding block bootstrap versions. In the same vein, it is easy to check that consistency of the block bootstrap distribution function estimators continues to hold if the function $H$ is vector valued and each component of $H$ satisfies the conditions of Theorem 4.1.

Remark 4.2 As in Theorems 3.1 and 3.3, the block bootstrap methods also provide consistent estimators of the asymptotic variance of the statistic $\hat{\theta}_n$ considered in Theorem 4.1. However, we need to assume somewhat stronger moment and mixing conditions than those of Theorem 4.1 to establish the consistency of the bootstrap variance estimators. A set of sufficient conditions that guarantee (mean squared error, or $L^2$-) consistency of these bootstrap estimators of the asymptotic variance of $\hat{\theta}_n$ will be given in Chapter 5.

4.3 M-Estimators
Suppose that $\{X_i\}_{i \in \mathbb{Z}}$ is a stationary process taking values in $\mathbb{R}^d$. Also, suppose that the parameter of interest $\theta$ is defined implicitly as a solution to the equation
$$E\Psi(X_1, \ldots, X_m; \theta) = 0 \qquad (4.17)$$
for some function $\Psi : \mathbb{R}^{dm+s} \to \mathbb{R}^s$, $m, s \in \mathbb{N}$. An M-estimator $\hat{\theta}_n$ of $\theta$ is defined as a solution of the 'estimating equation'
$$(n-m+1)^{-1}\sum_{i=1}^{n-m+1}\Psi\big(X_i, \ldots, X_{i+m-1}; \hat{\theta}_n\big) = 0. \qquad (4.18)$$
Estimators defined by an estimating equation of the form (4.18) are called generalized M-estimators (cf. Bustos (1982)). This class of estimators contains the maximum likelihood estimators and certain robust estimators of parameters in many popular time series models, including the autoregressive moving average models of order $p, q$ (ARMA($p, q$)), $p, q \in \mathbb{Z}_+$. See Bustos (1982), Martin and Yohai (1986) and the references therein.

To define the bootstrap version of $\hat{\theta}_n$, let $Y_i = (X_i', \ldots, X_{i+m-1}')'$, $1 \leq i \leq n_0$, denote the $(m-1)$-th order lag vectors, where $n_0 = n - m + 1$. Next suppose that $Y_1^{*(j)}, \ldots, Y_{n_0}^{*(j)}$ denote the "ordinary" block bootstrap sample of size $n_0$ drawn from the observed $Y$-vectors under the $j$th method, $j = 1, 2, 3, 4$. Because of the structural restriction (4.17), there appears to be more than one way of defining the bootstrap version of the generalized M-estimator $\hat{\theta}_n$ and its centered and scaled version
$$T_{2n} \equiv \sqrt{n}\big(\hat{\theta}_n - \theta\big).$$
Following the description of the block bootstrap methods in Chapter 2, we may define the bootstrap version $\hat{\theta}_n^{*(j)}$ of $\hat{\theta}_n$ based on the $j$th method, $j = 1, 2, 3, 4$, as a solution of the equation
$$n_0^{-1}\sum_{i=1}^{n_0}\Psi\big(Y_i^{*(j)}; \hat{\theta}_n^{*(j)}\big) = 0. \qquad (4.19)$$

The bootstrap version of $T_{2n}$ is then given by
$$T_{2n}^{*(j)} = \sqrt{n_0}\,\big(\hat{\theta}_n^{*(j)} - \tilde{\theta}_n^{(j)}\big)\;, \qquad (4.20)$$
where the centering value $\tilde{\theta}_n^{(j)}$ is defined as a solution of
$$n_0^{-1}\sum_{i=1}^{n_0} E_*\Psi\big(Y_i^{*(j)};\tilde{\theta}_n^{(j)}\big) = 0 \qquad (4.21)$$
to ensure the bootstrap analog of (4.17) at the centering value $\tilde{\theta}_n^{(j)}$ in the definition of $T_{2n}^{*(j)}$. Note that for the CBB or the SB applied to the series $\{Y_1,\ldots,Y_{n_0}\}$, equation (4.21) reduces to (4.18) and, hence, $\tilde{\theta}_n^{(j)} = \hat{\theta}_n$ for $j=3,4$. Thus, the original estimator $\hat{\theta}_n$ itself may be employed for centering its bootstrap version $\hat{\theta}_n^{*(j)}$ for the CBB and the SB. However, for the MBB and the NBB, $\tilde{\theta}_n^{(j)}$ need not be equal to $\hat{\theta}_n$ and, hence, computation of the bootstrap version $T_{2n}^{*(j)}$ in (4.20) requires solving an additional set of equations for the "right" centering constant $\tilde{\theta}_n^{(j)}$. It may be tempting to replace $\tilde{\theta}_n^{(j)}$ with $\hat{\theta}_n$ and define
$$\tilde{T}_{2n}^{*(j)} = \sqrt{n_0}\,\big(\hat{\theta}_n^{*(j)} - \hat{\theta}_n\big) \qquad (4.22)$$
as a bootstrap version of $T_{2n}$ for $j=1,2$. However, for the MBB and the NBB, centering $\hat{\theta}_n^{*(j)}$ at $\hat{\theta}_n$ introduces some extra bias, which typically leads to a worse rate of approximation of $\mathcal{L}(T_{2n})$ by $\mathcal{L}(\tilde{T}_{2n}^{*(j)}\mid\mathcal{X}_n)$ compared to the classical normal approximation (cf. Lahiri (1992a)). Indeed, this "naive centering" can render the bootstrap approximation totally invalid for M-estimators in linear regression models, as noted by several authors in the independent case (cf. Freedman (1981), Shorack (1982), and Lahiri (1992b)).
An altogether different approach to defining the bootstrap version of $T_{2n}$ is to reproduce the structural relation between equations (4.17) and (4.18) in the definition of the bootstrap version of the M-estimator itself. Note that if we replaced $\hat{\theta}_n$ in (4.18) by $\theta$, then the expected value of the left side of (4.18) would be zero. As a result, the estimating function defining $\hat{\theta}_n$ is unbiased at the centering value $\theta$. However, in the definition of the bootstrapped M-estimator in (4.19), this unbiasedness property of the estimating function does not always hold. A simple solution to this problem has been suggested by Shorack (1982) in the context of bootstrapping M-estimators in a linear regression model with iid errors. Following his approach, here we define an alternative bootstrap version $\hat{\theta}_n^{**(j)}$ of $\hat{\theta}_n$ as a solution to the modified equation
$$n_0^{-1}\sum_{i=1}^{n_0}\big[\Psi\big(Y_i^{*(j)};\hat{\theta}_n^{**(j)}\big) - \hat{\phi}_j\big] = 0\;, \qquad (4.23)$$
where $\hat{\phi}_j = n_0^{-1}E_*\big\{\sum_{i=1}^{n_0}\Psi(Y_i^{*(j)};\hat{\theta}_n)\big\}$. Note that for all $j=1,2,3,4$, the (conditional) expectation of the estimating function $\sum_{i=1}^{n_0}\big[\Psi(Y_i^{*(j)};t) - \hat{\phi}_j\big]$ is zero at $t = \hat{\theta}_n$. Thus, $\hat{\phi}_j$ is the appropriate constant that makes the estimating function in (4.23) unbiased if we are to center the bootstrapped M-estimator at $\hat{\theta}_n$. The bootstrap version of $T_{2n}$ under this alternative approach is given by
$$T_{2n}^{**(j)} = \sqrt{n_0}\,\big(\hat{\theta}_n^{**(j)} - \hat{\theta}_n\big)\;. \qquad (4.24)$$
An advantage of using (4.24) over (4.20) is that for finding the bootstrap approximation under the MBB or the NBB, we need to solve only one set of equations (viz., (4.23)), as compared to solving two sets of equations (viz., (4.19) and (4.21)) under the first approach. Since $\hat{\phi}_j = n_0^{-1}\sum_{i=1}^{n_0}\Psi(Y_i;\hat{\theta}_n) = 0$ for $j=3,4$, the centering is automatic for the CBB and the SB. As a consequence, both approaches lead to the same bootstrap version of $T_{2n}$ under the CBB and the SB.
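The centering constant $\hat{\phi}_j$ can be computed without any resampling, because the conditional expectation $E_*$ is an explicit weighted average of $\Psi(Y_i;\hat{\theta}_n)$ over the observed vectors. The following sketch is not from the book: the Huber-type choice of $\Psi$, the toy data-generating step, and all function names are ours, and the one-dimensional estimating equations are solved by simple bisection. It illustrates the modified equation (4.23) and the statistic (4.24) for the MBB with $m=1$.

```python
import numpy as np

def huber(u, c=1.345):
    # stand-in estimating function: Psi(x; t) = huber(x - t)
    return np.clip(u, -c, c)

def solve_psi(x, target, lo, hi, tol=1e-8):
    """Solve mean(huber(x - t)) = target for t by bisection (the mean is nonincreasing in t)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.mean(huber(x - mid)) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=200)) * 0.1 + rng.normal(size=200)   # toy dependent series
n, ell = len(x), 10
theta_hat = solve_psi(x, 0.0, x.min() - 1, x.max() + 1)            # solves (4.18) with m = 1

# MBB ingredients: overlapping blocks and the weights that give E_* without resampling
N = n - ell + 1
blocks = np.array([x[i:i + ell] for i in range(N)])
b = int(np.ceil(n / ell)); n1 = b * ell
w = np.zeros(n)
for i in range(N):
    w[i:i + ell] += 1.0
w /= w.sum()                                   # w_i = (# blocks containing X_i) / (N * ell)
phi_hat = np.sum(w * huber(x - theta_hat))     # phi_hat_1 of (4.23) for the MBB

T_star = []
for _ in range(500):
    starts = rng.integers(0, N, size=b)
    xstar = blocks[starts].ravel()[:n1]
    # theta_n^{**(1)} solves n1^{-1} * sum_i [Psi(X_i^*; t) - phi_hat] = 0
    th_star = solve_psi(xstar, phi_hat, x.min() - 2, x.max() + 2)
    T_star.append(np.sqrt(n1) * (th_star - theta_hat))             # T_{2n}^{**(1)} of (4.24)

print("MBB standard deviation of sqrt(n)(theta_hat - theta):", np.std(T_star))
```

For the CBB and the SB the weights are exactly uniform, so `phi_hat` vanishes and the centering reduces to $\hat{\theta}_n$, as noted above.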
The following result establishes validity of the block bootstrap approximations for the two versions $T_{2n}^{*(j)}$ and $T_{2n}^{**(j)}$, $j=1,2,3,4$. Write $\Sigma_\Psi = \lim_{n\to\infty}\mathrm{Var}\big(n^{-1/2}\sum_{i=1}^{n_0}\Psi(Y_i;\theta)\big)$ and let $D_\Psi$ be the $s\times s$ matrix with $(i,j)$-th element $E[\partial\Psi_i(Y_1;\theta)/\partial\theta_j]$, $1\le i,j\le s$. Also, assume that the solutions to the estimating equations are measurable and unique.

Theorem 4.2 Assume that

(i) $\Psi(y;t)$ is differentiable with respect to $t$ for almost all $y$ (under $F_Y$) and the first-order partial derivatives of $\Psi$ (in $t$) satisfy a Lipschitz condition of order $\kappa\in(0,1]$, a.s. $(F_Y)$, where $F_Y$ denotes the probability distribution of $Y_1$.

(ii) $E\Psi(Y_1;\theta) = 0$, and $\Sigma_\Psi$ and $D_\Psi$ are nonsingular.

(iii) There exists a $\delta > 0$ such that $E\|D^\alpha\Psi(Y_1;\theta)\|^{2r_j+\delta} < \infty$ for all $\alpha\in\mathbb{Z}_+^s$ with $|\alpha| = 0,1$, and $\Delta(r_j;\delta) < \infty$, where $r_j = 1$ for $j=1,2,3$ and $r_j = 3$ for $j=4$.

(iv) $\ell^{-1} + n^{-1/2}\ell = o(1)$ and $p + (n^2p)^{-1} = o(1)$ as $n\to\infty$.

Then,

(a) $\{\hat{\theta}_n\}_{n\ge 1}$ is consistent for $\theta$ and $\sqrt{n}\,(\hat{\theta}_n - \theta)\longrightarrow^d N\big(0,\,D_\Psi^{-1}\Sigma_\Psi(D_\Psi^{-1})'\big)$ as $n\to\infty$.

(b) For $j=1,2,3,4$,
$$\sup_{x\in\mathbb{R}^s}\big|P_*\big(T_{2n}^{**(j)}\le x\big) - P(T_{2n}\le x)\big| \longrightarrow_p 0 \quad\text{as } n\to\infty\;.$$

(c) Part (b) remains valid if we replace $T_{2n}^{**(j)}$ by $T_{2n}^{*(j)}$ of (4.20).


To prove Theorem 4.2, we shall follow an approach used in Bhattacharya and Ghosh (1978). The following result is useful for establishing the consistency of $\{\hat{\theta}_n\}_{n\ge 1}$.

Proposition 4.1 (Brouwer's Fixed Point Theorem). Let $\mathcal{U} = \{x\in\mathbb{R}^s : \|x\|\le 1\}$ denote the unit ball in $\mathbb{R}^s$ and let $f:\mathcal{U}\to\mathcal{U}$ be a continuous function. Then, $f$ has a fixed point in $\mathcal{U}$, i.e., there exists $x_0\in\mathcal{U}$ such that $f(x_0) = x_0$.

Proof: See page 14, Milnor (1965). $\Box$
Lemma 4.2 Suppose that $A$ and $B$ are two $d\times d$ matrices for some $d\in\mathbb{N}$ and $A$ is nonsingular.

(a) If $\|A - B\| < \delta/\|A^{-1}\|$ for some $\delta\in(0,1)$, then $B$ is nonsingular and
$$\|B^{-1}\| \le \|A^{-1}\|/(1-\delta)\;.$$

(b) If $B$ is also nonsingular, then
$$\|A^{-1} - B^{-1}\| \le \|A^{-1}\|\,\|B^{-1}\|\,\|A - B\|\;.$$

Proof:
(a) Let $\Gamma^0 = I_d$ for any $d\times d$ matrix $\Gamma$. Since $\sum_{k=0}^\infty\|I_d - A^{-1}B\|^k \le \sum_{k=0}^\infty(\|A^{-1}\|\,\|A-B\|)^k < \infty$, (each of the $d^2$ components of) the matrix-valued series $\sum_{k=0}^\infty(I_d - A^{-1}B)^k$ is absolutely convergent. Write $Q = \sum_{k=0}^\infty(I_d - A^{-1}B)^k$. Then,
$$Q\,(A^{-1}B) = Q - \sum_{k=1}^\infty\big(I_d - A^{-1}B\big)^k = I_d\;,$$
and similarly, $(A^{-1}B)Q = I_d$, so that $A^{-1}B$ is nonsingular and $Q = (A^{-1}B)^{-1}$. Now, premultiplication by $A$ and postmultiplication by $A^{-1}$ of the identity $(A^{-1}B)Q = I_d$ implies that $BQA^{-1} = I_d$. Hence $(QA^{-1})B = Q(A^{-1}B) = I_d = B(QA^{-1})$. This proves that $B$ is nonsingular, with $B^{-1} = QA^{-1}$, and
$$\|B^{-1}\| \le \|A^{-1}\|\,\|Q\| \le \|A^{-1}\|\sum_{k=0}^\infty\|I_d - A^{-1}B\|^k \le \|A^{-1}\|\sum_{k=0}^\infty\delta^k = \|A^{-1}\|(1-\delta)^{-1}\;.$$

(b) Follows from the identity
$$A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}\;. \qquad (4.25)$$

Hence, the proof of Lemma 4.2 is complete. $\Box$

Proof of Theorem 4.2: For notational simplicity, without loss of generality, we suppose that $m = 1$ and $EX_1 = 0$. Then, $n_0 = n$, $Y_i = X_i$ and $Y_i^{*(j)} = X_i^{*(j)}$ for all $i$ and for all $j = 1,2,3,4$.

(a) By (4.18) and Taylor's expansion,
$$0 = n^{-1}\sum_{i=1}^n\Big[\Psi(X_i;\theta) + \sum_{|\alpha|=1}D^\alpha\Psi(X_i;\theta)\,(\hat{\theta}_n - \theta)^\alpha\Big] + R_{1n}\;, \qquad (4.26)$$
where, by condition (i),
$$\|R_{1n}\| \le n^{-1}\sum_{i=1}^n\sum_{|\alpha|=1}\Big\{\|\hat{\theta}_n - \theta\|\times\Big\|\int_0^1 D^\alpha\Psi\big(X_i;\theta + u(\hat{\theta}_n - \theta)\big)\,du - D^\alpha\Psi(X_i;\theta)\Big\|\Big\} \le C\,\|\hat{\theta}_n - \theta\|^{1+\kappa}\;.$$
Note that, by Markov's inequality,
$$P\Big(\Big\|n^{-1}\sum_{i=1}^n\Psi(X_i;\theta)\Big\| > n^{-1/2}\log n\Big) \le E\Big\|n^{-1}\sum_{i=1}^n\Psi(X_i;\theta)\Big\|^2\big(n(\log n)^{-2}\big) = O\big((\log n)^{-2}\big) \qquad (4.27)$$
and, similarly, for all $|\alpha| = 1$,
$$P\Big(\Big\|n^{-1}\sum_{i=1}^n\big\{D^\alpha\Psi(X_i;\theta) - ED^\alpha\Psi(X_1;\theta)\big\}\Big\| > n^{-1/2}\log n\Big) = O\big((\log n)^{-2}\big)\;. \qquad (4.28)$$
Next, define $A_{1n} = \big\{\|n^{-1}\sum_{i=1}^n\Psi(X_i;\theta)\| < n^{-1/2}\log n$ and $\|n^{-1}\sum_{i=1}^n(D^\alpha\Psi(X_i;\theta) - ED^\alpha\Psi(X_i;\theta))\| \le n^{-1/2}\log n$ for all $|\alpha| = 1\big\}$. Then, by (4.27) and (4.28), $P(A_{1n})\to 1$ as $n\to\infty$. Let $D_{\Psi,n}$ be the $s\times s$ matrix with $(i,j)$-th element $n^{-1}\sum_{t=1}^n\big(\partial\Psi_i(X_t;\theta)/\partial\theta_j\big)$, $1\le i,j\le s$. By Lemma 4.2 and condition (ii), $D_{\Psi,n}$ is nonsingular on the set $A_{1n}$, for $n$ large. Hence, for $n$ large, on the set $A_{1n}$, we can write (4.26) as
$$(\hat{\theta}_n - \theta) = -\,[D_{\Psi,n}]^{-1}\Big[n^{-1}\sum_{i=1}^n\Psi(X_i;\theta) + R_{1n}\Big]\;. \qquad (4.29)$$
Note that the right side of (4.29) is a continuous function of $(\hat{\theta}_n - \theta)$; call it $g(\hat{\theta}_n - \theta)$. Now, using (4.27), (4.29), and the bound on $R_{1n}$, we see that there exists a $C_1\in(1,\infty)$ such that $\|g(\hat{\theta}_n - \theta)\| \le C_1 n^{-1/2}(\log n)$ for all $\|\hat{\theta}_n - \theta\| \le C_1 n^{-1/2}(\log n)$. Thus, setting $f(x) = [C_1 n^{-1/2}\log n]^{-1}\,g\big([C_1 n^{-1/2}\log n]\,x\big)$, $x\in\mathcal{U}$, we have a continuous function $f:\mathcal{U}\to\mathcal{U}$. Hence, by Proposition 4.1, there exists an $x_0\in\mathcal{U}$ such that $f(x_0) = x_0$, or equivalently, $g\big([C_1 n^{-1/2}\log n]\,x_0\big) = [C_1 n^{-1/2}\log n]\,x_0$. Since, by assumption, $(\hat{\theta}_n - \theta)$ is the unique solution to (4.29), we must have $\hat{\theta}_n - \theta = [C_1 n^{-1/2}\log n]\,x_0$. Therefore, $\|\hat{\theta}_n - \theta\| \le C_1 n^{-1/2}(\log n)$ on the set $A_{1n}$, for $n$ large. Since $P(A_{1n})\to 1$ as $n\to\infty$, this implies that
$$\|\hat{\theta}_n - \theta\| = O_p\big(n^{-1/2}\log n\big)\;. \qquad (4.30)$$
In particular, $\{\hat{\theta}_n\}_{n\ge 1}$ is consistent for $\theta$.

Next, multiplying both sides of (4.29) by $\sqrt{n}$ and using (4.28), the bound on $R_{1n}$, and Slutsky's Theorem, we get
$$\sqrt{n}\,(\hat{\theta}_n - \theta) = -\,\big(D_\Psi + o_p(1)\big)^{-1}\Big[n^{-1/2}\sum_{i=1}^n\Psi(X_i;\theta) + O_p\big(n^{-\kappa/2}(\log n)^{1+\kappa}\big)\Big] \longrightarrow^d N\big(0,\,D_\Psi^{-1}\Sigma_\Psi(D_\Psi^{-1})'\big) \quad\text{as } n\to\infty\;.$$
This proves part (a).


Next we prove the results on the bootstrapped M-estimators. Note that for $m = 1$, $Y_i^{*(j)} = X_i^{*(j)}$ and $n_0 = n$. To simplify the proofs, we shall consider the case where resamples are based on "complete" blocks; the proof for the resample size $n$ is similar, but is somewhat more involved due to the effect of the "incomplete" segment from the last resampled block. Accordingly, suppose that the resample size is $n_1 \equiv b\ell$ for $j = 1,2,3$, with $b = \lfloor n/\ell\rfloor$ for some integer $\ell\in(1,n)$, and that it is $N_1 \equiv L_1 + \cdots + L_K$, with $K \equiv \inf\{k\ge 1 : L_1 + \cdots + L_k \ge n\}$, for $j = 4$. The bootstrap equations (4.19)-(4.21) and (4.23)-(4.24) are now redefined by replacing $n_0$ with $n_1$ for $j = 1,2,3$ and by replacing $n_0$ with $N_1$ for $j = 4$.

For proving the results, we now introduce some notation. Let $Z_{\alpha i}^{*(j)} = D^\alpha\Psi(X_i^{*(j)};\theta)$ and $Z_{\alpha i} = D^\alpha\Psi(X_i;\theta)$, $\alpha\in\mathbb{Z}_+^s$, $i\ge 1$, $j = 1,2,3,4$. Also, define
$$A_n^{(j)} = \Big\{\|\hat{\theta}_n - \theta\| \le Cn^{-1/2}\log n,\ \ \sum_{|\alpha|=1}\Big\|E_*\sum_{i=1}^{n_1}\big(Z_{\alpha i}^{*(j)} - EZ_{\alpha 1}\big)\Big\| \le n^{1-\kappa/4},\ \ \text{and}\ \ \sum_{|\alpha|=0}^1 E_*\Big\|\sum_{i=1}^{n_1}\big(Z_{\alpha i}^{*(j)} - E_*Z_{\alpha i}^{*(j)}\big)\Big\|^2 \le C(s)\,n\Big\}\;,\quad 1\le j\le 3\;.$$
And for $j = 4$, let $A_n^{(4)}$ be defined as $A_n^{(1)}$ above, but with $n_1$ replaced by $N_1$ and $j = 1$ replaced by $j = 4$.
First we consider the case $j\in\{1,2,3\}$. Expanding both $\Psi(\cdot;\hat{\theta}_n^{**(j)})$ and $\Psi(\cdot;\hat{\theta}_n)$ on the left side of equation (4.23) in Taylor's series around $\theta$ and using condition (i) on the set $A_n^{(j)}$, we get
$$0 = n_1^{-1}\sum_{i=1}^{n_1}\Big[\Big\{Z_{0i}^{*(j)} + \sum_{|\alpha|=1}\big(\hat{\theta}_n^{**(j)} - \theta\big)^\alpha Z_{\alpha i}^{*(j)}\Big\} - E_*\Big\{Z_{0i}^{*(j)} + \sum_{|\alpha|=1}\big(\hat{\theta}_n - \theta\big)^\alpha Z_{\alpha i}^{*(j)}\Big\}\Big] + R_{1n}^{**(j)}\;,$$
where $\|R_{1n}^{**(j)}\| \le C\big[\|\hat{\theta}_n^{**(j)} - \theta\|^{1+\kappa} + \|\hat{\theta}_n - \theta\|^{1+\kappa}\big]$. We can rewrite this as
$$\big(\hat{\theta}_n^{**(j)} - \hat{\theta}_n\big) = \big[D_{\Psi,n}^{**(j)}\big]^{-1}\Big[n_1^{-1}\sum_{i=1}^{n_1}\big(Z_{0i}^{*(j)} - E_*Z_{0i}^{*(j)}\big) + R_{2n}^{**(j)}\Big]\;, \qquad (4.31)$$
provided $D_{\Psi,n}^{**(j)}$ is nonsingular, where $D_{\Psi,n}^{**(j)}$ is the $s\times s$ matrix with $(k,r)$-th element $n_1^{-1}\sum_{i=1}^{n_1}\partial\Psi_k(X_i^{*(j)};\theta)/\partial\theta_r$, $1\le k,r\le s$, and where the remainder term $R_{2n}^{**(j)}$ admits the bound
$$\|R_{2n}^{**(j)}\| \le \|\hat{\theta}_n - \theta\|\sum_{|\alpha|=1}\Big\|n_1^{-1}\sum_{i=1}^{n_1}\big(Z_{\alpha i}^{*(j)} - E_*Z_{\alpha i}^{*(j)}\big)\Big\| + \|R_{1n}^{**(j)}\|\;. \qquad (4.32)$$
Next define the set $A_n^{**(j)}$, $j\in\{1,2,3\}$, by
$$A_n^{**(j)} = \Big\{\sum_{|\alpha|=0}^1\Big\|n_1^{-1}\sum_{i=1}^{n_1}\big(Z_{\alpha i}^{*(j)} - E_*Z_{\alpha i}^{*(j)}\big)\Big\| \le n^{-1/2}\log n\Big\}\;.$$
Note that by condition (ii) and Lemma 4.2, $D_{\Psi,n}^{**(j)}$ is nonsingular on $A_n^{**(j)}\cap A_n^{(j)}$ for large $n$. Hence, as in the proof of part (a), by (4.31), (4.32), and Proposition 4.1, on the set $A_n^{(j)}$, there exists a constant $C > 0$ such that
$$P_*\big(\|\hat{\theta}_n^{**(j)} - \hat{\theta}_n\| \le Cn^{-1/2}\log n\big) \ge P_*\big(A_n^{**(j)}\big) \ge 1 - Cn(\log n)^{-2}\sum_{|\alpha|=0}^1 E_*\Big\|n_1^{-1}\sum_{i=1}^{n_1}\big(Z_{\alpha i}^{*(j)} - E_*Z_{\alpha i}^{*(j)}\big)\Big\|^2 \ge 1 - C(\log n)^{-2}\;, \qquad (4.33)$$
for $n$ large.
Next write $\bar{Z}_{\alpha n}^{*(j)} = n_1^{-1}\sum_{i=1}^{n_1}Z_{\alpha i}^{*(j)}$ and $\bar{Z}_{\alpha k} = k^{-1}\sum_{i=1}^k Z_{\alpha i}$, $k\in\mathbb{N}$, $|\alpha| = 0,1$. Note that $E_*\bar{Z}_{\alpha n}^{*(2)} = \bar{Z}_{\alpha n_1}$, $E_*\bar{Z}_{\alpha n}^{*(3)} = \bar{Z}_{\alpha n}$, and, as in (3.13),
$$E\big\|E_*\bar{Z}_{\alpha n}^{*(1)} - EZ_{\alpha 1}\big\|^2 = O(n^{-1})\;.$$
Hence, noting that $\|x\|^2 \le C(d)\|xx'\|$ for any $x\in\mathbb{R}^d$ and using (4.30), Lemma 3.2, and Theorem 3.1, we get
$$P\big((A_n^{(j)})^c\big) \le P\big(\|\hat{\theta}_n - \theta\| > Cn^{-1/2}\log n\big) + Cn^{\kappa/2}\sum_{|\alpha|=1}E\big\|E_*\big(\bar{Z}_{\alpha n}^{*(j)}\big) - EZ_{\alpha 1}\big\|^2 + \sum_{|\alpha|=0}^1 P\big(\big\|\mathrm{Var}_*\big(\sqrt{n_1}\,\bar{Z}_{\alpha n}^{*(j)}\big) - \mathrm{Var}\big(\sqrt{n}\,\bar{Z}_{\alpha n}\big)\big\| > C\big) = o(1) + O\big(n^{\kappa/2}n^{-1}\big) + o(1) \to 0 \quad\text{as } n\to\infty\;. \qquad (4.34)$$
Hence, using the definitions of the sets $A_n^{**(j)}$ and $A_n^{(j)}$, it follows from (4.32)-(4.34) that
$$P_*\big(\big\|D_{\Psi,n}^{**(j)} - D_\Psi\big\| > Cn^{-\kappa/4}\big) \longrightarrow_p 0 \quad\text{as } n\to\infty \qquad (4.35)$$
and
$$P_*\big(\sqrt{n_1}\,\big\|R_{2n}^{**(j)}\big\| > \epsilon\big) \longrightarrow_p 0 \quad\text{as } n\to\infty, \ \text{for every } \epsilon > 0\;. \qquad (4.36)$$
Hence, for $j = 1,2,3$, part (b) of the theorem now follows from Theorem 3.2, (4.31), (4.35), (4.36), and Lemma 4.1.
Next consider $j = 4$. In this case, by (4.18) and Wald's lemmas (cf. Appendix A),
$$\hat{\phi}_j \equiv N_1^{-1}E_*\Big(\sum_{i=1}^{N_1}\Psi\big(X_i^{*(j)};\hat{\theta}_n\big)\Big) = N_1^{-1}(E_*K)\,E_*\Big(\sum_{i=1}^{L_1}\Psi\big(X_i^{*(j)};\hat{\theta}_n\big)\Big) = N_1^{-1}(E_*K)(E_*L_1)\Big(n^{-1}\sum_{i=1}^n\Psi\big(X_i;\hat{\theta}_n\big)\Big) = 0\;.$$
Hence, the bootstrapped M-estimator $\hat{\theta}_n^{**(j)}$ is a solution of
$$\sum_{i=1}^{N_1}\Psi\big(X_i^{*(j)};\hat{\theta}_n^{**(j)}\big) = 0\;. \qquad (4.37)$$

Now, as in (4.31), using Taylor's expansion of the function $\Psi(x;\cdot)$ in (4.37) around $\theta$, we get
$$\big(\hat{\theta}_n^{**(j)} - \hat{\theta}_n\big) = \big[D_{\Psi,n}^{**(j)}\big]^{-1}\Big[N_1^{-1}\sum_{i=1}^{N_1}\big(Z_{0i}^{*(j)} - E_*Z_{0i}^{*(j)}\big) + R_{2n}^{**(j)}\Big]\;, \qquad (4.38)$$
where, with $j = 4$, $\|R_{2n}^{**(j)}\| \le C\big[\|\hat{\theta}_n^{**(j)} - \theta\|^{1+\kappa} + \|\hat{\theta}_n - \theta\|^{1+\kappa}\big] + \|\hat{\theta}_n - \theta\|\sum_{|\alpha|=1}\big\|N_1^{-1}\sum_{i=1}^{N_1}\big(Z_{\alpha i}^{*(j)} - E_*Z_{\alpha i}^{*(j)}\big)\big\|$, and where $D_{\Psi,n}^{**(4)}$ has $(k,r)$-th element $N_1^{-1}\sum_{i=1}^{N_1}\partial\Psi_k(X_i^{*(4)};\theta)/\partial\theta_r$, $1\le k,r\le s$. Next, define the set $A_n^{**(4)}$ by replacing $n_1$ with $N_1$ in the definition of $A_n^{**(1)}$. Since $N_1\ge n$, it follows that on the set $A_n^{(4)}$,
$$1 - P_*\big(A_n^{**(4)}\big) \le Cn(\log n)^{-2}\sum_{|\alpha|=0}^1 E_*\Big\{N_1^{-2}\Big\|\sum_{i=1}^{N_1}\big(Z_{\alpha i}^{*(4)} - E_*Z_{\alpha i}^{*(4)}\big)\Big\|^2\Big\} \le Cn(\log n)^{-2}n^{-2}\sum_{|\alpha|=0}^1 E_*\Big\|\sum_{i=1}^{N_1}\big(Z_{\alpha i}^{*(4)} - E_*Z_{\alpha i}^{*(4)}\big)\Big\|^2 \le C(s)(\log n)^{-2}\;. \qquad (4.39)$$
Let $S_{\alpha k}^{*(4)}$ denote the $k$th block sum of the $Z_{\alpha i}^{*(4)}$'s, $\alpha\in\mathbb{Z}_+^s$, $|\alpha| = 0,1$. Then, using an iterated conditioning argument similar to the one employed in the proof of Theorem 3.4 (cf. (3.38)), and using Wald's lemmas (cf. Appendix A), we have
$$E_*\sum_{i=1}^{N_1}\big(Z_{\alpha i}^{*(4)} - EZ_{\alpha 1}\big) = E\Big[\sum_{k=1}^K E\big\{\big(S_{\alpha k}^{*(4)} - L_k EZ_{\alpha 1}\big)\,\big|\,\mathcal{T}_n\big\}\,\Big|\,\mathcal{X}_n\Big] = E\Big\{\sum_{k=1}^K L_k\big(\bar{Z}_{\alpha n} - EZ_{\alpha 1}\big)\,\Big|\,\mathcal{X}_n\Big\} = \big\{(E_*K)\,p^{-1}\big\}\big(\bar{Z}_{\alpha n} - EZ_{\alpha 1}\big)$$
and
$$E_*\Big\|\sum_{i=1}^{N_1}\big(Z_{\alpha i}^{*(4)} - E_*Z_{\alpha i}^{*(4)}\big)\Big\|^2 = E\Big\{\sum_{k=1}^K E\big(\big\|S_{\alpha k}^{*(4)} - L_k\bar{Z}_{\alpha n}\big\|^2\,\big|\,\mathcal{T}_n\big)\,\Big|\,\mathcal{X}_n\Big\} = (E_*K)\,E_*\big\|S_{\alpha 1}^{*(4)} - L_1\bar{Z}_{\alpha n}\big\|^2$$
for all $\alpha\in\mathbb{Z}_+^s$, $|\alpha| = 0,1$. Let $\hat{\sigma}^2(\ell;r,\alpha)$ denote the conditional variance of the $r$th component of $S_{\alpha 1}^{*(4)}$ given $\mathcal{T}_n$, $|\alpha| = 0,1$, $r = 1,\ldots,s$. Then, noting that $E\big(S_{\alpha 1}^{*(4)}\,\big|\,\mathcal{T}_n\big) = L_1\bar{Z}_{\alpha n}$, we have
$$E_*\big\|S_{\alpha 1}^{*(4)} - L_1\bar{Z}_{\alpha n}\big\|^2 = E\Big(\big\{E\big\|S_{\alpha 1}^{*(4)} - L_1\bar{Z}_{\alpha n}\big\|^2\,\big|\,\mathcal{T}_n\big\}\,\Big|\,\mathcal{X}_n\Big) = \sum_{r=1}^s E_*\hat{\sigma}^2(L_1;r,\alpha)\;.$$
Hence, using Lemma 3.4 (parts a(iv) and b(ii)) and the above identities, we get
$$1 - P\big(A_n^{(4)}\big) \le P\big(\|\hat{\theta}_n - \theta\| > n^{-1/2}\log n\big) + P\Big(\sum_{|\alpha|=1}\big\|\bar{Z}_{\alpha n} - EZ_{\alpha 1}\big\| > Cn^{-\kappa/4}\Big) + P\Big(\sum_{|\alpha|=0}^1 E_*\big\|S_{\alpha 1}^{*(4)} - L_1\bar{Z}_{\alpha n}\big\|^2 > C(s)\,p^{-1}\Big) \to 0 \quad\text{as } n\to\infty\;. \qquad (4.40)$$
Now part (b), $j = 4$, follows from (4.39), (4.40), Theorem 3.4, and Lemma 4.1.

Part (c) of the theorem can be proved using similar arguments. $\Box$

4.4 Differentiable Functionals


In this section, we consider consistency of the MBB approximation for estimators that are smooth functions of the empirical process. Let $\{X_i\}_{i\in\mathbb{Z}}$ be a sequence of stationary $\mathbb{R}^d$-valued random vectors. Define the empirical distribution of the $m$-dimensional subseries of the data $X_1,\ldots,X_n$ as
$$F_n^{(m)} = (n-m+1)^{-1}\sum_{i=1}^{n-m+1}\delta_{Y_i}\;,$$
where $Y_i \equiv (X_i',\ldots,X_{i+m-1}')'$, $i\ge 1$, and where $\delta_y$ denotes the probability measure putting mass one at $y$. The probability distribution $F_n^{(m)}$ serves as an estimator of the marginal distribution $F^{(m)}$ of the $m$-dimensional subseries $(X_1,\ldots,X_m)$.

Suppose that the level-1 parameter of interest $\theta$ is an $s$-dimensional functional of $F^{(m)}$, given by
$$\theta = T\big(F^{(m)}\big)\;. \qquad (4.41)$$
Then, a natural estimator of $\theta$ is given by
$$\hat{\theta}_n = T\big(F_n^{(m)}\big)\;. \qquad (4.42)$$
Many commonly used estimators, including the generalized M-estimators of Section 4.3, may be expressed as (4.42). For the generalized M-estimator, the relevant functional $T(\cdot)$ is defined implicitly by the relation
$$\int\Psi\big(y;T(G)\big)\,dG(y) = 0\;,\quad G\in\mathcal{G}^{(m)}\;,$$
for a suitable family $\mathcal{G}^{(m)}$ of probability measures on $\mathbb{R}^{dm}$, depending on the function $\Psi$. Below we describe another important class of robust estimators that can be expressed in this form.

Example 4.4: Suppose that the process $\{X_i\}_{i\in\mathbb{Z}}$ is real-valued. Then, for any $0 < \alpha < 1/2$, the $\alpha$-trimmed mean is given by
$$\hat{\theta}_n = \frac{1}{n(1-2\alpha)}\sum_{i=\lfloor n\alpha\rfloor + 1}^{\lfloor n(1-\alpha)\rfloor}X_{i:n}\;, \qquad (4.43)$$
where $X_{1:n}\le\cdots\le X_{n:n}$ denote the order statistics corresponding to $X_1,\ldots,X_n$. Write $F_n^{(1)} = F_n$ and $F^{(1)} = F$ for notational simplicity. Then, the estimator $\hat{\theta}_n$ in (4.43) is asymptotically equivalent to a functional $T(\cdot)$ of the one-dimensional empirical distribution $F_n$, given by
$$T(F_n) = \int_\alpha^{1-\alpha}F_n^{-1}(u)\,du\,\big/\,(1-2\alpha)\;,$$
where, for any probability distribution $G$ on $\mathbb{R}$, $G^{-1}(u) = \inf\{x\in\mathbb{R} : G((-\infty,x])\ge u\}$, $0 < u < 1$, denotes the quantile transform of $G$. The estimator $\hat{\theta}_n$, or $T(F_n)$, is used for estimating the parameter
$$\theta = T(F) \equiv \int_\alpha^{1-\alpha}F^{-1}(u)\,du\,\big/\,(1-2\alpha)\;.$$
The $\alpha$-trimmed mean $\hat{\theta}_n$ is a robust estimator of $\theta$ that guards against the influence of extreme values and outliers. Note that as $\alpha\to\tfrac{1}{2}-$, the limiting form of the parameter $\theta$ is the population median $F^{-1}(1/2)$ (provided $F^{-1}(u)$ is continuous at $u = 1/2$), while for $\alpha = 0$, we get $\theta = EX_1$, the population mean. Thus, for different values of $\alpha$, $\theta$ provides some of the most commonly used measures of central tendency.

Somewhat more generally, the class of L-estimators of location parameters can be expressed in the form (4.42). Let $J : (0,1)\to\mathbb{R}$ be a Borel measurable function. Then, define the L-estimator $\hat{\theta}_n$ with weight function $J(\cdot)$ as
$$\hat{\theta}_n = \int_0^1 J(u)\,F_n^{-1}(u)\,du\;.$$
In this case, the functional $T(\cdot)$ is given by $T(G) = \int_0^1 J(u)\,G^{-1}(u)\,du$. $\Box$
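To make the L-functional representation concrete, the following sketch is not from the book: the function names, the grid-based quantile integration, and the simulated data are ours. It computes the $\alpha$-trimmed mean of (4.43) directly and through the weight function $J(u) = (1-2\alpha)^{-1}$ on $[\alpha,1-\alpha]$; the two values agree up to the discretization of the integral.

```python
import numpy as np

def trimmed_mean(x, alpha):
    """The alpha-trimmed mean of (4.43)."""
    xs = np.sort(x)
    n = len(xs)
    lo, hi = int(np.floor(n * alpha)), int(np.floor(n * (1 - alpha)))
    return xs[lo:hi].sum() / (n * (1 - 2 * alpha))

def L_functional(x, J, grid=20_000):
    """Riemann-sum approximation of T(F_n) = int_0^1 J(u) F_n^{-1}(u) du."""
    xs = np.sort(x)
    n = len(xs)
    u = (np.arange(grid) + 0.5) / grid
    Fn_inv = xs[np.minimum(np.ceil(n * u).astype(int) - 1, n - 1)]   # empirical quantile transform
    return np.mean(J(u) * Fn_inv)

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=500)
alpha = 0.1
J = lambda u: ((u >= alpha) & (u <= 1 - alpha)) / (1 - 2 * alpha)
print(trimmed_mean(x, alpha), L_functional(x, J))   # the two values nearly coincide
```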

Now we return to the discussion of a general estimator (4.42) that is a functional of $F_n^{(m)}$. If the functional $T(\cdot)$ is smooth, then the asymptotic distribution of $\hat{\theta}_n$ can be derived from the asymptotic distribution of the empirical process $\{\sqrt{n}\,(F_n^{(m)}(\cdot) - F^{(m)}(\cdot))\}$ on a suitable metric space, using a version of the Delta method. To prove consistency of the block bootstrap methods for the distribution of such estimators, we need to know the asymptotic behavior of the bootstrapped empirical process. In Section 4.4.1, we describe some results on the bootstrapped empirical process, and establish consistency of the bootstrap for estimators of the form (4.42) in Section 4.4.2.

4.4.1 Bootstrapping the Empirical Process


Results on the asymptotic distribution of empirical processes for dependent random variables have been obtained by several authors; see, for example, Billingsley (1968), Deo (1973), Sen (1974), Yoshihara (1975), Arcones and Yu (1994), and the references therein. Similar results for the bootstrapped empirical process are known only for the MBB and the CBB. For ease of exposition, in this section, we shall suppose that $m = 1$. Indeed, if we set $Y_i = (X_i',\ldots,X_{i+m-1}')'$, $i\in\mathbb{Z}$, and $n_0 = n-m+1$, then, in terms of the $Y$-variates,
$$F_n^{(m)} = n_0^{-1}\sum_{i=1}^{n_0}\delta_{Y_i}$$
and this essentially reduces the general problem to the case $m = 1$ for the $dm$-dimensional random vectors $Y_1,\ldots,Y_{n_0}$. Hence, without loss of generality, we set $m = 1$. Also, for notational simplicity, write $F_n^{(1)} = F_n$ and $F^{(1)} = F$.

Let $\mathbb{D}^d$ denote the space of all real-valued functions on $[-\infty,\infty]^d$ that are continuous from above and have limits from below. We equip $\mathbb{D}^d$ with the (extended) Skorohod metric (cf. Bickel and Wichura (1971)). Write $F_n(x)$ and $F(x)$ for the distribution functions corresponding to the probability measures $F_n$ and $F$, respectively. Thus,
$$F_n(x) = n^{-1}\sum_{i=1}^n\mathbb{1}(X_i\le x) \qquad (4.44)$$

and
$$F(x) = P(X_1\le x)\;, \qquad (4.45)$$
$x\in[-\infty,\infty]^d$. Recall that for any two vectors $x\equiv(x_1,\ldots,x_d)'$ and $y\equiv(y_1,\ldots,y_d)'$, $x\le y$ means $x_i\le y_i$ for all $1\le i\le d$. Define the empirical process
$$W_n(x) = \sqrt{n}\,\big(F_n(x) - F(x)\big)\;,\quad x\in[-\infty,\infty]^d\;.$$
Then, under some regularity conditions (cf. Yoshihara (1975)), the empirical process $W_n$ converges weakly (as $\mathbb{D}^d$-valued random elements) to a Gaussian process $W$ on $[-\infty,\infty]^d$ satisfying
$$EW(x) = 0\;,\qquad EW(x)W(y) = \sum_{k=-\infty}^{\infty}\mathrm{Cov}\big(\mathbb{1}(X_1\le x),\,\mathbb{1}(X_{1+k}\le y)\big)\;, \qquad (4.46)$$
and
$$P\big(W\in\tilde{C}_d\big) = 1\;,$$
for all $x,y\in[-\infty,\infty]^d$, where $\tilde{C}_d$ is the collection of continuous functions on $[-\infty,\infty]^d$ that vanish at $(-\infty,\ldots,-\infty)'$ and $(\infty,\ldots,\infty)'$. The next theorem shows that a similar result holds for the bootstrapped empirical process. Let $F_n^*(x)$ denote the empirical distribution function of the $n_1$ MBB samples based on blocks of length $\ell$, and let $W_n^*(x) = \sqrt{n_1}\,\big(F_n^*(x) - E_*F_n^*(x)\big)$, $x\in\mathbb{R}^d$.

Theorem 4.3 Suppose that $\{X_n\}_{n\ge 1}$ is a sequence of stationary strong-mixing $\mathbb{R}^d$-valued random vectors with $\sum_{i=1}^\infty i^{8d+7}[\alpha(i)]^{1/2} < \infty$, and that $X_1$ has a continuous distribution. Let $\ell\to\infty$ and $\ell = O(n^{1/2-\epsilon})$ as $n\to\infty$ for some $0 < \epsilon < 1/2$. Then,
$$W_n^* \longrightarrow^d W \quad\text{as } n\to\infty, \text{ almost surely}\;.$$

Proof: See Bühlmann (1994).

Note that $\mathbb{D}^d$ is a complete separable metric space and, hence, there is a metric, say, $\varrho$, that metricizes the topology of weak convergence on $\mathbb{D}^d$ (cf. Parthasarathi (1967), Billingsley (1968)). Thus, Theorem 4.3 implies that
$$\varrho\big(\mathcal{L}(W_n^*\mid\mathcal{X}_n),\,\mathcal{L}(W)\big) \to 0 \quad\text{as } n\to\infty \quad\text{a.s.} \qquad (4.47)$$
Results in Yoshihara (1975) yield weak convergence of $W_n$ to $W$ on $\mathbb{D}^d$ under the conditions of Theorem 4.3. Hence, combining the two, we have the following corollary.

Corollary 4.1 Under the conditions of Theorem 4.3,
$$\varrho\big(\mathcal{L}(W_n^*\mid\mathcal{X}_n),\,\mathcal{L}(W_n)\big) \to 0 \quad\text{as } n\to\infty \quad\text{a.s.}$$



A variant of Theorem 4.3 in the one-dimensional case (i.e., $d = 1$) has been proved by Naik-Nimbalkar and Rajarshi (1994), also for the MBB. In the $d = 1$ case, Peligrad (1998) proves a version of Theorem 4.3 for the CBB under a significantly weaker condition on the strong mixing coefficient, but assuming a restrictive condition on the block length. A rigorous proof of weak convergence of empirical processes for the SB does not seem to exist in the literature at this point. Note that by Prohorov's Theorem (cf. Billingsley (1968)), proving this would involve showing (i) weak convergence of finite-dimensional distributions and (ii) tightness of the bootstrapped empirical processes. Theorem 3.4 shows that the finite-dimensional distributions under the SB method have the appropriate limits. Thus, the main technical problem that needs to be resolved here is the tightness of the bootstrapped empirical process.
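As an illustration of the objects appearing in Theorem 4.3, the following sketch (not from the book; the toy data-generating step and the function names are ours) computes one realization of the MBB bootstrapped empirical process $W_n^*$ for real-valued data, using the fact that $E_*F_n^*$ is a weighted empirical distribution of the observations, with weights proportional to the number of overlapping blocks containing each $X_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, ell = 200, 12
x = np.cumsum(rng.normal(size=n)) * 0.05 + rng.normal(size=n)    # toy dependent series

N = n - ell + 1
blocks = np.array([x[i:i + ell] for i in range(N)])
b = n // ell
n1 = b * ell

# E_* F_n^*(t) puts weight w_i on x_i, where w_i is proportional to the number
# of overlapping blocks that contain x_i (cf. the weights in Example 4.8)
w = np.zeros(n)
for i in range(N):
    w[i:i + ell] += 1.0
w /= w.sum()

def ecdf(sample, grid, weights=None):
    weights = np.full(len(sample), 1.0 / len(sample)) if weights is None else weights
    return np.array([np.sum(weights[sample <= t]) for t in grid])

grid = np.linspace(x.min(), x.max(), 101)
xstar = blocks[rng.integers(0, N, size=b)].ravel()                # one MBB resample of size n1
W_star = np.sqrt(n1) * (ecdf(xstar, grid) - ecdf(x, grid, w))     # one path of W_n^*
print("sup_x |W_n^*(x)| for this resample:", np.abs(W_star).max())
```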

4.4.2 Consistency of the MBB for Differentiable Statistical Functionals

Let $\hat{\theta}_n$ be an estimator that can be represented as a smooth functional of the $m$-dimensional empirical distribution $F_n^{(m)}$, as in (4.42), for some $m\ge 1$. A general approach to deriving asymptotic distributions of such statistical functionals using differentiability was initiated by von Mises (1947), and further developed by Filippova and Brunswick (1962) and Reeds (1976), among others. For a systematic treatment of the topic and some applications, see Boos (1979), Serfling (1980), Huber (1981), and Fernholz (1983). Here we consider statistical functionals that are Frechet differentiable. Let $\mathbb{P}_k$ denote the set of all probability measures on $\mathbb{R}^k$ and let $\mathbb{S}_k$ denote the set of all finite signed measures on $\mathbb{R}^k$, $k\ge 1$. Let $\|\cdot\|_{(k)}$ be a norm on $\mathbb{S}_k$ and let $\|\cdot\|$ denote the Euclidean norm on $\mathbb{R}^k$. Then, we say that a functional $T : \mathbb{P}_k\to\mathbb{R}^s$ is Frechet differentiable at $F\in\mathbb{P}_k$ under $\|\cdot\|_{(k)}$ if there exists a function $T^{(1)}(F;\cdot) : \mathbb{S}_k\to\mathbb{R}^s$ such that

(i) $T^{(1)}(F;\cdot)$ is linear, i.e.,
$$T^{(1)}\big(F;a\mu_1 + b\mu_2\big) = a\,T^{(1)}(F;\mu_1) + b\,T^{(1)}(F;\mu_2)$$
for all $a,b\in\mathbb{R}$, and $\mu_1,\mu_2\in\mathbb{S}_k$;

and

(ii) for $G\in\mathbb{P}_k$,
$$\frac{\big\|T(G) - T(F) - T^{(1)}(F;G-F)\big\|}{\|G-F\|_{(k)}} \to 0 \quad\text{as } \|G-F\|_{(k)}\to 0\;.$$

The linear functional $T^{(1)}(F;\cdot)$ is called the Frechet derivative of $T$ at $F$. This definition of Frechet differentiability is slightly different from the standard Functional Analysis definition (cf. Dieudonne (1960)), where the functional $T$ is assumed to be defined on a normed vector space, like $\mathbb{S}_k$, rather than just on $\mathbb{P}_k$. Thus, for the standard definition, one has to extend the definition of a functional $T$ from $\mathbb{P}_k$ to $\mathbb{S}_k$. However, the given definition is adequate for our purpose, since we are only interested in the values of the functional at probability measures, not at signed measures. A similar definition of Frechet differentiability has been used by Huber (1981).
Now suppose that the parameter $\theta$ and its estimator $\hat{\theta}_n$ are given by (4.41) and (4.42), i.e., $\theta = T(F^{(m)})$ and $\hat{\theta}_n = T(F_n^{(m)})$ for some functional $T : \mathbb{P}_{dm}\to\mathbb{R}^s$, for some $m,s\in\mathbb{N}$. If $T$ is Frechet differentiable at $F^{(m)}$ with Frechet derivative $T^{(1)}(F^{(m)};\cdot)$, then by the linearity property of $T^{(1)}(F^{(m)};\cdot)$,
$$T\big(F_n^{(m)}\big) - T\big(F^{(m)}\big) = T^{(1)}\big(F^{(m)};F_n^{(m)} - F^{(m)}\big) + R_n = n_0^{-1}\sum_{i=1}^{n_0}T^{(1)}\big(F^{(m)};\delta_{Y_i} - F^{(m)}\big) + R_n = n_0^{-1}\sum_{i=1}^{n_0}h(Y_i) + R_n\;,\ \text{say}\;, \qquad (4.48)$$
where $n_0 = (n-m+1)$, $h(y) = T^{(1)}(F^{(m)};\delta_y - F^{(m)})$, $y\in\mathbb{R}^{dm}$, and $R_n$ is the remainder term satisfying $\|R_n\| = o\big(\|F_n^{(m)} - F^{(m)}\|_{(dm)}\big)$ as $\|F_n^{(m)} - F^{(m)}\|_{(dm)}\to 0$. Therefore, we may obtain the asymptotic distribution of $\sqrt{n}\,(\hat{\theta}_n - \theta)$ from (4.48), provided that $R_n = o_p(n^{-1/2})$. The latter condition holds if $\sqrt{n}\,\|F_n^{(m)} - F^{(m)}\|_{(dm)}$ is stochastically bounded under the norm $\|\cdot\|_{(dm)}$. Here, we shall take $\|\cdot\|_{(dm)}$ to be Kolmogorov's half-space norm defined by
$$\|\nu\|_\infty = \sup\big\{|\nu((-\infty,x])| : x\in\mathbb{R}^{dm}\big\}\;,\quad \nu\in\mathbb{S}_{dm}\;. \qquad (4.49)$$
Then, we have the following result.

Theorem 4.4 Assume that $T(\cdot)$ is Frechet differentiable at $F^{(m)}$ in the $\|\cdot\|_\infty$-norm and that $E\|h(Y_1)\|^3 < \infty$, $Eh(Y_1) = 0$ and $\Sigma_h \equiv \lim_{n\to\infty}\mathrm{Var}\big(n^{-1/2}\sum_{i=1}^n h(Y_i)\big)$ is nonsingular. Also, assume that the conditions of Theorem 4.3 hold. Then,

(a) $\sqrt{n}\,(\hat{\theta}_n - \theta)\longrightarrow^d N(0,\Sigma_h)$ as $n\to\infty$.

(b) Let $\hat{\theta}_n^* = T\big(F_n^{(m)*}\big)$ and $\tilde{\theta}_n = T\big(E_*F_n^{(m)*}\big)$, where $F_n^{(m)*}$ is the empirical distribution corresponding to a MBB sample based on $b_0 \equiv \lfloor n_0/\ell\rfloor$ resampled blocks of length $\ell$. Then,
$$\sup_{x\in\mathbb{R}^s}\Big|P_*\big(\sqrt{n_0}\,(\hat{\theta}_n^* - \tilde{\theta}_n)\le x\big) - P\big(\sqrt{n}\,(\hat{\theta}_n - \theta)\le x\big)\Big| \longrightarrow_p 0 \quad\text{as } n\to\infty\;.$$

Thus, the MBB approximation to the sampling distribution of $\sqrt{n}\,(\hat{\theta}_n - \theta)$ is consistent under the conditions of Theorem 4.4. Here we remark that Theorem 4.4 remains valid under a weaker moment condition on $h(Y_1)$ than what we have assumed above. Indeed, the conclusions of Theorem 4.4 hold under the moment condition "$E\|h(Y_1)\|^{2+\delta} < \infty$ for some $\delta > 0$", provided $\sum_{i=1}^\infty\alpha(i)^{\delta/(2+\delta)} < \infty$ and the remaining conditions of Theorem 4.4 hold. We have used the stronger moment condition on $h(Y_1)$ only to simplify the proof.

Proof: For a function $f : [-\infty,\infty]^k\to\mathbb{R}$, let $\|f\|_\infty = \sup\{|f(x)| : x\in[-\infty,\infty]^k\}$, $k\ge 1$. Note that this definition is consistent with (4.49), in the sense that $\|\nu\|_\infty$ can be interpreted either as the Kolmogorov norm for the signed measure $\nu$ or as the sup norm for the corresponding distribution function.

Under the conditions of Theorem 4.3, $W_n^{(m)}(x) \equiv \sqrt{n}\,\big(F_n^{(m)}((-\infty,x]) - F^{(m)}((-\infty,x])\big)$, $x\in[-\infty,\infty]^{dm}$, converges weakly (as random elements of $\mathbb{D}^{dm}$ equipped with the extended Skorohod topology) to a Gaussian process $W^{(m)}(x)$, where $W^{(m)}(x)$ has continuous sample paths with probability one. Note that for a continuous function $f$ on $[-\infty,\infty]^{dm}$, if $f_n\in\mathbb{D}^{dm}$ converges to $f$ in the Skorohod metric, then $\sup\{|f_n(x) - f(x)| : x\in[-\infty,\infty]^{dm}\}\to 0$ as $n\to\infty$. Thus, the mapping $g : \mathbb{D}^{dm}\to\mathbb{R}$ defined by $g(f) = \|f\|_\infty$ is continuous under the Skorohod metric at $f = f_0$ if $f_0$ is continuous on $[-\infty,\infty]^{dm}$. Hence, by the continuous mapping theorem (cf. Theorem 5.1, Billingsley (1968)),
$$\big\|\sqrt{n}\,\big(F_n^{(m)} - F^{(m)}\big)\big\|_\infty = O_p(1)\;.$$
Now part (a) of the theorem follows from (4.48) and the Central Limit Theorem for dependent random vectors (cf. Theorem A.8, Appendix A).

To prove the second part, for notational simplicity, we assume that $n_0 = b_0\ell$. Also, let $Z$ be a random vector having the $N(0,\Sigma_h)$ distribution on $\mathbb{R}^s$. Then, we need to show that
$$\Delta_n \equiv \sup_{x\in\mathbb{R}^s}\Big|P_*\big(\sqrt{n_0}\,(\hat{\theta}_n^* - \tilde{\theta}_n)\le x\big) - P(Z\le x)\Big| \longrightarrow_p 0 \quad\text{as } n\to\infty\;.$$
Note that by the Frechet differentiability of $T(\cdot)$ at $F^{(m)}$ under $\|\cdot\|_\infty$, we can write
$$T\big(G^{(m)}\big) - T\big(F^{(m)}\big) = T^{(1)}\big(F^{(m)};G^{(m)} - F^{(m)}\big) + R\big(G^{(m)}\big)\;, \qquad (4.50)$$
where, for any $\epsilon > 0$, there exists a $\delta > 0$ such that
$$\big\|R\big(G^{(m)}\big)\big\| \le \epsilon\,\big\|G^{(m)} - F^{(m)}\big\|_\infty \qquad (4.51)$$
whenever $\|G^{(m)} - F^{(m)}\|_\infty < \delta$. Also, by the linearity of $T^{(1)}(F^{(m)};\cdot)$,
$$T^{(1)}\big(F^{(m)};E_*F_n^{(m)*} - F^{(m)}\big) = T^{(1)}\Big(F^{(m)};\big[(n_0-\ell+1)\ell\big]^{-1}\sum_{j=1}^{n_0-\ell+1}\sum_{i=j}^{j+\ell-1}\big(\delta_{Y_i} - F^{(m)}\big)\Big) = \big[(n_0-\ell+1)\ell\big]^{-1}\sum_{j=1}^{n_0-\ell+1}\sum_{i=j}^{j+\ell-1}h(Y_i)\;.$$
Hence, by (4.48) and (4.50), we get
$$\sqrt{n_0}\,(\hat{\theta}_n^* - \tilde{\theta}_n) = \sqrt{n_0}\,\Big[\big(T(F_n^{(m)*}) - T(F^{(m)})\big) - \big(T(E_*F_n^{(m)*}) - T(F^{(m)})\big)\Big] = \frac{1}{\sqrt{n_0}}\sum_{i=1}^{n_0}\big[h(Y_i^*) - E_*h(Y_i^*)\big] + R_n^* \equiv T_n^* + R_n^*\;,\ \text{say}\;, \qquad (4.52)$$
where the remainder satisfies $\|n_0^{-1/2}R_n^*\| \le \|R(F_n^{(m)*})\| + \|R([E_*F_n^{(m)*}])\|$.

Let $A(x,\epsilon)$ be the $\epsilon$-neighborhood of the boundary of $(-\infty,x]$, defined by $A(x,\epsilon) = (-\infty,x+\epsilon 1]\setminus(-\infty,x-\epsilon 1]$, $\epsilon > 0$, $x\in\mathbb{R}^s$, where $1 = (1,\ldots,1)'\in\mathbb{R}^s$. Then, for any $\epsilon > 0$,
$$\Delta_n \le \big\|P_*(T_n^*\le x) - P(Z\le x)\big\|_\infty + P_*\big(\|R_n^*\| > \epsilon\big) + \sup_x P\big(Z\in A(x,\epsilon)\big) \equiv \Delta_{1n} + \Delta_{2n}(\epsilon) + \sup_x P\big(Z\in A(x,\epsilon)\big)\;. \qquad (4.53)$$
Since $Z$ has a normal distribution, there exist $C_0 > 1$ and $\epsilon_0 > 0$ such that for all $0 < \epsilon < \epsilon_0$,
$$\sup_x P\big(Z\in A(x,\epsilon)\big) \le C_0\,\epsilon\;. \qquad (4.54)$$

Next, note that $\|W_n^{(m)}\|_\infty = O_p(1)$ and that by (an extension of) Theorem 4.3 and the continuous mapping theorem (cf. Theorem 5.1, Billingsley (1968)), $\varrho\big(\mathcal{L}(\|W_n^{*(m)}\|_\infty\mid\mathcal{X}_n),\,\mathcal{L}(\|W^{(m)}\|_\infty)\big)\longrightarrow_p 0$ as $n\to\infty$. Hence, given $\eta\in(0,1)$, there exists $M > 1$ such that for all $n\ge M$,
$$P\big(\|W^{(m)}\|_\infty > M\big) < \eta/12\;,\qquad P\big(\|W_n^{(m)}\|_\infty > M\big) < \eta/6\;,\qquad P\big(P_*\big(\|W_n^{*(m)}\|_\infty > M\big) > \eta/3\big) < \eta/6\;. \qquad (4.55)$$
Now, fix $\eta\in(0,\epsilon_0)$. Let $\epsilon_1 = \eta/(3C_0)$. Then, by (4.51) (with $\epsilon = \epsilon_1/6M$), there exists $M_1\ge M$ such that for all $n\ge M_1$,
$$\big\|R\big(F_n^{(m)*}\big)\big\| \le \epsilon_1/(2\sqrt{n_0})$$
on $\{\|F_n^{(m)*} - F^{(m)}\|_\infty \le 3M/\sqrt{n_0}\}$ and
$$\big\|R\big(E_*F_n^{(m)*}\big)\big\| \le \epsilon_1/(2\sqrt{n_0})$$
on $A_n \equiv \{\|E_*F_n^{(m)*} - F^{(m)}\|_\infty \le 2M/\sqrt{n}\}$. Hence, using (4.52) and (4.55), and the arguments leading to (3.13), for $n\ge M_1$, from (4.53), we get
$$P\big(\Delta_{2n}(\epsilon_1) > \eta/3\big) \le P\big(\{P_*(\|R_n^*\| > \epsilon_1) > \eta/3\}\cap A_n\cap\{\|R(E_*F_n^{(m)*})\| \le \epsilon_1/(2\sqrt{n_0})\}\big) + P\big(\|E_*F_n^{(m)*} - F^{(m)}\|_\infty > 2M/\sqrt{n}\big)$$
$$\le P\big(\{P_*(\|R(F_n^{(m)*})\| > \epsilon_1/(2\sqrt{n_0})) > \eta/3\}\cap A_n\big) + P\big(\|E_*F_n^{(m)*} - F_n^{(m)}\|_\infty > M/\sqrt{n}\big) + P\big(\|W_n^{(m)}\|_\infty > M\big)$$
$$\le P\big(\{P_*(\|F_n^{(m)*} - F^{(m)}\|_\infty > 3M/\sqrt{n_0}) > \eta/3\}\cap A_n\big) + M^{-1}n^{1/2}E\big(\|E_*F_n^{(m)*} - F_n^{(m)}\|_\infty\big) + \eta/6$$
$$\le P\big(P_*(\|W_n^{*(m)}\|_\infty > M) > \eta/3\big) + M^{-1}n^{1/2}(2\ell/n) + \eta/6 \le \eta/3 + 2M^{-1}n^{-1/2}\ell\;. \qquad (4.56)$$
Also, note that by Theorem 3.2, (4.52), and (4.53), $\Delta_{1n}\longrightarrow_p 0$. Hence, for any $0 < \eta < \epsilon_0$, by (4.54) and (4.56), for sufficiently large $n$,
$$P(\Delta_n > \eta) \le P(\Delta_{1n} > \eta/3) + P\big(\Delta_{2n}(\epsilon_1) > \eta/3\big) \le \eta/3 + \big(\eta/3 + 2\ell/(Mn^{1/2})\big) < \eta\;.$$
This completes the proof of the theorem. $\Box$

The proof of Theorem 4.4 can be simplified significantly if, instead of Frechet differentiability, we assume a stronger version of it, known as strong Frechet differentiability (cf. Liu and Singh (1992)). A functional $T$ is called strongly Frechet differentiable at $F\in\mathbb{P}_k$ under $\|\cdot\|_{(k)}$ if there exists a linear function $T^{(1)}(F;\cdot) : \mathbb{S}_k\to\mathbb{R}^s$ such that
$$\big\|T(G) - T(H) - T^{(1)}(F;G-H)\big\|\,\big/\,\|G-H\|_{(k)} \to 0$$
as $\|G-F\|_{(k)}\to 0$ and $\|H-F\|_{(k)}\to 0$. While Frechet differentiability of many robust estimators is known, the notion of strong Frechet differentiability for statistical functionals is not very well studied in the literature. Hence, we have established validity of the bootstrap approximation assuming regular Frechet differentiability only, so that Theorem 4.4 can be readily applied in such known cases. For results under a further weaker notion of differentiability, viz., Hadamard differentiability, see Chapter 12.

4.5 Examples
Example 4.5: Let $\{X_{0i}\}_{i\in\mathbb{Z}}$ be a stationary real-valued time series with autocovariance function $\gamma(k) = \mathrm{Cov}(X_{0i},X_{0(i+k)})$, $i,k\in\mathbb{Z}$. For $0\le k < n$, let $\hat{\gamma}_n(k) = (n-k)^{-1}\sum_{i=1}^{n-k}X_{0i}X_{0(i+k)} - \bar{X}_{0(n-k)}^2$ be the estimator of $\gamma(k)$ introduced in Example 4.1. Then $\hat{\theta}_n = \hat{\gamma}_n(k)$ and $\theta = \gamma(k)$ admit a representation satisfying the requirements of the "Smooth Function Model". Since the function $H(\cdot)$ is infinitely many times differentiable, the conclusions of Theorem 4.1 hold for $\hat{\gamma}_n(k)$ and $\gamma(k)$, provided the time series $\{X_{0i}\}_{i\in\mathbb{Z}}$ satisfies the relevant moment and strong mixing conditions.

For the purpose of illustration, now suppose that $\{X_{0i}\}_{i\in\mathbb{Z}}$ is an ARMA(3,4) process specified by
$$X_{0i} - 0.4X_{0(i-1)} - 0.2X_{0(i-2)} - 0.1X_{0(i-3)} = \epsilon_i + 0.2\epsilon_{i-1} + 0.3\epsilon_{i-2} + 0.2\epsilon_{i-3} + 0.1\epsilon_{i-4}\;, \qquad (4.57)$$
where $\{\epsilon_i\}_{i\in\mathbb{Z}}$ is a sequence of iid $N(0,1)$ random variables. Then, $\{X_{0i}\}_{i\in\mathbb{Z}}$ is strongly mixing, with an exponentially decaying mixing coefficient (cf. Doukhan (1994)), and it has finite moments of all orders. Thus, the conditions of Theorem 4.1 hold for this ARMA(3,4) process. By Theorem 4.1 and Remark 4.2, all four block bootstrap methods provide consistent estimators of the sampling distribution and the second moment of
$$T_{1n} \equiv \sqrt{n-k}\,\big(\hat{\gamma}_n(k) - \gamma(k)\big)\;.$$
Here, we look at the finite-sample performance of different block bootstrap methods for estimating the mean squared error (MSE) of $\hat{\gamma}_n(k)$ when $k = 2$ and the sample size $n = 102$. Thus, the level-2 parameter of interest here is given by
$$\varphi_n \equiv ET_{1n}^2 = (n-2)\cdot\mathrm{MSE}\big(\hat{\gamma}_n(2)\big)\;.$$

For the process (4.57), the value of $\varphi_n$, found by 10,000 simulation runs, is given by 1.058, and the value of the level-1 parameter $\theta$ is given by $-0.0131$. Figure 4.1 below presents a realization of a sample of size $n = 102$ from the ARMA process (4.57). We now apply the MBB, the NBB, the CBB, and the SB to this data set.

'"
o

~ ~------~------------------------------------~
o 20 40 60 80 100

FIGURE 4.1. A simulated data set of size n = 102 from the ARMA process
(4.57).

First consider the MBB. From the $n = 102$ original observations $\{X_{0i} : i = 1,\ldots,102\}$, we define the vectors $X_i$ by the relation $X_i = (X_{0i},\,X_{0i}X_{0(i+2)})'$ for $i = 1,\ldots,100$. For the purpose of illustration, we suppose that the block length $\ell$ is equal to 8. Then, we define the overlapping blocks in terms of the $X_i$'s as $\mathcal{B}_i = \{X_i,\ldots,X_{i+7}\}$, $i = 1,\ldots,93$, and draw a simple random sample of $k_0$ blocks from $\{\mathcal{B}_1,\ldots,\mathcal{B}_{93}\}$, where $k_0$ is the smallest integer $\ge 100/\ell$. Thus, for $\ell = 8$, $k_0 = 13$. Let $\mathcal{B}_1^*,\ldots,\mathcal{B}_{k_0}^*$ denote the resampled blocks. Then, writing down the $k_0\ell$ elements of $\mathcal{B}_1^*,\ldots,\mathcal{B}_{k_0}^*$ in a series, we get a MBB sample of size $k_0\ell = 13\times 8 = 104$. We use the first 100 of these values to define the (ordinary) bootstrap version of $T_{1n}$ under the MBB as
$$T_{1n}^{*(1)} = \sqrt{100}\,\big(H(\bar{X}_{100}^{*(1)}) - H(\hat{\mu}_{n,1})\big)\;, \qquad (4.58)$$
where $\bar{X}_{100}^{*(1)}$ is the sample mean of the first 100 MBB samples and where $\hat{\mu}_{n,1} \equiv \hat{\mu}_{n,1}(\ell) = E_*\bar{X}_{100}^{*(1)}$. The centering variable $\hat{\mu}_{n,1}$ may be evaluated without any resampling by using the formula
$$\hat{\mu}_{n,1} = 100^{-1}\Big[(k_0-1)(93)^{-1}\sum_{i=1}^{93}V_i(\ell) + (93)^{-1}\sum_{i=1}^{93}V_i(a)\Big]\;, \qquad (4.59)$$
where $a \equiv 100 - (k_0-1)\ell = 4$, $V_i(0) = 0$, and $V_i(m) = X_i + \cdots + X_{i+m-1}$ is the sum of the first $m$ elements of the block $\mathcal{B}_i$, $1\le m\le\ell$. This easily follows by noting that $X_1^*,\ldots,X_{100}^*$ consist of $(k_0-1) = 12$ complete blocks of length 8 and the first $a = 4$ elements from the $k_0$th resampled block. The MBB estimator of the level-2 parameter $\varphi_n$ based on blocks of size $\ell$ is given by
$$\hat{\varphi}_n(1;\ell) = E_*\big[T_{1n}^{*(1)}\big]^2\;.$$

A closed-form expression for $\hat{\varphi}_n(1;\ell)$ is intractable because of the nonlinearity of the estimator $\hat{\theta}_n = H(\bar{X}_{100})$. Therefore, we evaluate $\hat{\varphi}_n(1;\ell)$ by Monte-Carlo simulation as follows. Let $B$ be a large positive integer, denoting the number of bootstrap replicates. For each $r = 1,\ldots,B$, generate a set of $k_0 = 13$ iid random variables $\{{}_rI_1,\ldots,{}_rI_{k_0}\}$ with the Discrete Uniform distribution on $\{1,\ldots,93\}$, the index set of all overlapping blocks of size $\ell = 8$. Then, for each $r$, $\{\mathcal{B}_i : i = {}_rI_1,\ldots,{}_rI_{k_0}\}$ represents a random sample of size $k_0$ from $\{\mathcal{B}_1,\ldots,\mathcal{B}_{100-\ell+1}\}$, also called a replicate of $\mathcal{B}_1^*,\ldots,\mathcal{B}_{k_0}^*$. Let ${}_r\bar{X}_{100}^{*(1)}$ denote the sample mean of the first 100 values in the resampled blocks $\{\mathcal{B}_i : i = {}_rI_1,\ldots,{}_rI_{k_0}\}$. Then, for $r = 1,\ldots,B$, the $r$th replicate of $T_{1n}^{*(1)}$ based on the resampled MBB blocks $\{\mathcal{B}_i : i = {}_rI_1,\ldots,{}_rI_{k_0}\}$ is given by
$${}_rT_{1n}^{*(1)} = \sqrt{100}\,\big(H({}_r\bar{X}_{100}^{*(1)}) - H(\hat{\mu}_{n,1})\big)\;, \qquad (4.60)$$
where $\hat{\mu}_{n,1}$ is computed (only once) using formula (4.59). The Monte-Carlo approximation to the MBB estimator $\hat{\varphi}_n(1;\ell)$ is now given by
$$\hat{\varphi}_n(1;\ell)^{MC} = B^{-1}\sum_{r=1}^B\big[{}_rT_{1n}^{*(1)}\big]^2\;. \qquad (4.61)$$
Note that as $B\to\infty$, the average of the $[{}_rT_{1n}^{*(1)}]^2$-values tends to the corresponding expected value $E_*[T_{1n}^{*(1)}]^2 \equiv \hat{\varphi}_n(1;\ell)$. Thus, by choosing $B$ appropriately large, one can get an approximation to the MBB estimator $\hat{\varphi}_n(1;\ell)$ to any given degree of accuracy. In Table 4.1 below, we report the MBB estimators (along with the other block bootstrap estimators) of $\varphi_n$ for the data set of Figure 4.1 for different block sizes, including $\ell = 8$. As mentioned earlier, the "true" value of the target parameter is given by $\varphi_n = 1.058$. The number of bootstrap replicates used here is $B = 800$. (This value of $B$ is chosen only for the purpose of illustration. In practice, a much larger value of $B$ may be desirable, depending on the parameter $\varphi_n$.)

TABLE 4.1. Block bootstrap estimates of the level-2 parameter $\varphi_n = ET_{1n}^2$ based on different (expected) block sizes, for the data set of Figure 4.1. The true value of $\varphi_n$ is 1.058.

Block Size 4 6 8 10 15 20
MBB 1.159 1.085 0.881 0.820 1.078 0.884
NBB 1.299 0.904 1.093 0.763 0.879 1.030
CBB 1.020 1.106 0.951 0.812 0.968 0.808
SB 0.935 0.941 0.898 0.810 0.746 0.642
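The complete MBB recipe of this example, from the "blocks of blocks" vectors through the resampling-free centering (4.59) to the Monte-Carlo approximation (4.61), fits in a few lines of code. The following sketch is not from the book: the simulation of model (4.57), the function names, and the particular seed are ours, so the printed estimate will not reproduce the entries of Table 4.1, only the procedure.

```python
import numpy as np

rng = np.random.default_rng(4)

def arma_sample(n, burn=200):
    """Simulate the ARMA(3,4) model (4.57) with iid N(0,1) innovations (our stand-in data)."""
    e = rng.normal(size=n + burn + 4)
    x = np.zeros(n + burn)
    for i in range(3, n + burn):
        x[i] = (0.4 * x[i-1] + 0.2 * x[i-2] + 0.1 * x[i-3]
                + e[i+4] + 0.2 * e[i+3] + 0.3 * e[i+2] + 0.2 * e[i+1] + 0.1 * e[i])
    return x[burn:]

def H(v):
    # smooth-function form: H(x, y) = y - x^2, applied to the mean of (X_0i, X_0i X_0(i+2))
    return v[1] - v[0] ** 2

x0 = arma_sample(102)
k = 2
X = np.column_stack([x0[:-k], x0[:-k] * x0[k:]])          # bivariate vectors X_i, shape (100, 2)
n, ell, B = len(X), 8, 800
N = n - ell + 1                                            # 93 overlapping blocks
k0 = int(np.ceil(n / ell))                                 # 13 resampled blocks
a = n - (k0 - 1) * ell                                     # 4 = length of the incomplete piece

# exact centering (4.59): averages of full-block and partial-block sums
V_full = np.array([X[i:i + ell].sum(axis=0) for i in range(N)])
V_part = np.array([X[i:i + a].sum(axis=0) for i in range(N)])
mu_n1 = ((k0 - 1) * V_full.mean(axis=0) + V_part.mean(axis=0)) / n

T2 = np.empty(B)
for r in range(B):
    starts = rng.integers(0, N, size=k0)
    xstar = np.concatenate([X[s:s + ell] for s in starts])[:n]
    T2[r] = (np.sqrt(n) * (H(xstar.mean(axis=0)) - H(mu_n1))) ** 2   # [T_{1n}^{*(1)}]^2

print("MBB estimate of phi_n = E T_{1n}^2:", T2.mean())    # the analogue of (4.61)
```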

The steps involved in the implementation of the other block bootstrap methods are similar. For the NBB, we consider the nonoverlapping blocks $\{\mathcal{B}_i^{(2)} : i = 1,\ldots,b\}$, where $\mathcal{B}_i^{(2)} = \mathcal{B}_{(i-1)\ell+1}$ and $b = \lfloor 100/\ell\rfloor$. Next, we generate the NBB samples by resampling $k_0$ blocks from this collection. For $\ell = 8$, this amounts to resampling $k_0 = 13$ blocks from the collection of $b = 12$ disjoint blocks $\{\mathcal{B}_1^{(2)},\ldots,\mathcal{B}_{12}^{(2)}\} = \{\{X_1,\ldots,X_8\},\ldots,\{X_{89},\ldots,X_{96}\}\}$. The NBB estimator of the parameter $\varphi_n$ is given by
$$\hat{\varphi}_n(2;\ell) = E_*\big[T_{1n}^{*(2)}\big]^2\;,$$
where $T_{1n}^{*(2)} = \sqrt{100}\,\big(H(\bar{X}_{100}^{*(2)}) - H(\hat{\mu}_{n,2})\big)$, $\bar{X}_{100}^{*(2)}$ is the NBB sample mean of the first 100 resampled data values, and $\hat{\mu}_{n,2} = E_*\bar{X}_{100}^{*(2)} = \frac{1}{96}\sum_{i=1}^{96}X_i$. Note that for this choice of $\ell$, the last 4 $X_i$-values never appear in the definition of the NBB estimator $\hat{\varphi}_n(2;\ell)$. For a Monte-Carlo approximation to the NBB estimator $\hat{\varphi}_n(2;\ell)$, generate $B$ sets of iid random variables $\{{}_rI_{2,1},\ldots,{}_rI_{2,k_0}\}$ with the Discrete Uniform distribution on $\{1,\ldots,b\}$ and define the replicates of $T_{1n}^{*(2)}$ as
$${}_rT_{1n}^{*(2)} = \sqrt{100}\,\big(H({}_r\bar{X}_{100}^{*(2)}) - H(\hat{\mu}_{n,2})\big)$$
for $r = 1,\ldots,B$. The Monte-Carlo approximation to the NBB estimator $\hat{\varphi}_n(2;\ell)$ is now given by
$$\hat{\varphi}_n(2;\ell)^{MC} = B^{-1}\sum_{r=1}^B\big[{}_rT_{1n}^{*(2)}\big]^2\;. \qquad (4.62)$$
The NBB estimates of $\varphi_n$ for different block lengths $\ell$ for the data set of Figure 4.1 are given in the second row of Table 4.1 above.
For the CBB and the SB, we need to consider blocks defined in terms of the periodically extended time series $\{Y_{n,i}\}_{i\in\mathbb{Z}}$, where we define $Y_{n,(kn+i)} = X_i$ for all $k\in\mathbb{Z}$ and all $1\le i\le 100$. As described in Chapter 2, for a given block length $\ell$, the CBB resamples $k_0$ blocks from the collection $\{\mathcal{B}(i;\ell) : i = 1,\ldots,100\}$, where, as before, $k_0$ is the smallest integer not less than $100/\ell$ and where
$$\mathcal{B}(i;k) = \{Y_{n,i},\ldots,Y_{n,(i+k-1)}\}\;,\quad i\ge 1,\ k\ge 1\;.$$
The CBB version of $T_{1n}$ based on these circular blocks of length $\ell$ is given by
$$T_{1n}^{*(3)} = \sqrt{100}\,\big(H(\bar{X}_{100}^{*(3)}) - H(\bar{X}_{100})\big)\;,$$
where $\bar{X}_{100}^{*(3)}$ is the sample mean of the first 100 elements of the $k_0$ resampled blocks. Note that $E_*\bar{X}_{100}^{*(3)} = \bar{X}_{100}$, the sample mean of $\{X_1,\ldots,X_{100}\}$, for any choice of $\ell$ and hence, we do not need an additional formula like (4.59) to find $E_*\bar{X}_{100}^{*(3)}$. The CBB estimator of $\varphi_n$ is now given by
$$\hat{\varphi}_n(3;\ell) \equiv E_*\big[T_{1n}^{*(3)}\big]^2\;.$$

For Monte-Carlo evaluation of $\hat{\varphi}_n(3;\ell)$, with $\ell = 8$, we generate $B = 800$ sets of iid random variables $\{{}_rI_{3,1},\ldots,{}_rI_{3,k_0}\}$ with the Discrete Uniform distribution on $\{1,\ldots,100\}$, setting $k_0$ equal to 13 as before. Let ${}_rT_{1n}^{*(3)}$ be the replicate of $T_{1n}^{*(3)}$ based on the resampled blocks $\{\mathcal{B}({}_rI_{3,1};8),\ldots,\mathcal{B}({}_rI_{3,k_0};8)\}$ with $k_0 = 13$. Then, the Monte-Carlo value of the CBB estimator is given by $\hat{\varphi}_n(3;\ell)^{MC} = B^{-1}\sum_{r=1}^B\big[{}_rT_{1n}^{*(3)}\big]^2$. The CBB estimates of $\varphi_n$, computed using the data set of Figure 4.1, are given in the third row of Table 4.1 for $\ell = 8$, and also for other block lengths.
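The only new ingredient for the CBB is the wrap-around indexing of the periodically extended series. A minimal sketch (not from the book; the stand-in data and the names are ours) is:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))            # stands in for the vectors X_1, ..., X_100
n, ell = len(X), 8
k0 = int(np.ceil(n / ell))               # 13 resampled blocks

starts = rng.integers(0, n, size=k0)     # uniform over all n circular blocks
idx = (starts[:, None] + np.arange(ell)[None, :]) % n    # indices wrap around the end
Xstar = X[idx.ravel()][:n]               # first n = 100 CBB observations
print(Xstar.mean(axis=0), X.mean(axis=0))  # the bootstrap mean fluctuates around X_bar_100
```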
Finally, we consider the SB. For notational consistency, we write $\ell$ for the expected block length under the SB. Thus, following the description of the SB given in Chapter 2, we resample $K \equiv \inf\{1\le k\le 100 : L_1 + \cdots + L_k \ge 100\}$ circular blocks $\{\mathcal{B}(I_{4,i},L_i) : i = 1,\ldots,K\}$, where the $L_i$'s are iid random variables having the Geometric($p$) distribution with $p = \ell^{-1}$ and where the $I_{4,i}$'s are iid random variables having the Discrete Uniform distribution on $\{1,\ldots,100\}$. Furthermore, the $I_{4,i}$'s and the $L_i$'s are independent. The SB version of $T_{1n}$ is given by
$$T_{1n}^{*(4)} = \sqrt{100}\,\big(H(\bar{X}_{100}^{*(4)}) - H(\bar{X}_{100})\big)\;.$$
Note that, like the CBB, $E_*\bar{X}_{100}^{*(4)} = \bar{X}_{100}$ for any choice of $\ell$ and hence, centering the SB version $H(\bar{X}_{100}^{*(4)})$ of the estimator $\hat{\theta}_n$ is simpler compared to centering the MBB and the NBB versions. The SB estimator of $\varphi_n$ is given by
$$\hat{\varphi}_n(4;\ell) \equiv E_*\big[T_{1n}^{*(4)}\big]^2\;.$$
For Monte-Carlo evaluation of $\hat{\varphi}_n(4;\ell)$, say, again with $\ell = 8$, for each $r = 1,\ldots,B$, first we generate iid Geometric($1/\ell$) random variables $\{{}_rL_1,\ldots,{}_rL_{{}_rK}\}$, where ${}_rK = \inf\{1\le k\le 100 : {}_rL_1 + \cdots + {}_rL_k \ge 100\}$. Note that ${}_rK$, the number of resampled blocks under the SB method at the $r$th replication, is random and, unlike the first three block bootstrap methods, the ${}_rK$'s take on different values for different $r$'s. Next, having generated the ${}_rL_i$'s, we independently generate iid Discrete Uniform $\{1,\ldots,100\}$ random variables ${}_rI_{4,i}$, $i = 1,\ldots,{}_rK$, for each $r = 1,\ldots,B$. This yields $B$ sets of SB resampled blocks $\{\mathcal{B}({}_rI_{4,i};{}_rL_i) : i = 1,\ldots,{}_rK\}$, for $r = 1,\ldots,B$, where each set of resampled blocks contains at least 100 SB observations. We compute a replicate ${}_rT_{1n}^{*(4)}$ of $T_{1n}^{*(4)}$ using the first 100 SB observations from the $r$th set, and combine these to get the Monte-Carlo value of the SB estimate as $\hat{\varphi}_n(4;\ell)^{MC} = B^{-1}\sum_{r=1}^B\big[{}_rT_{1n}^{*(4)}\big]^2$. The estimates for the SB method for the data set of Figure 4.1 are given in row 4 of Table 4.1. $\Box$
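Generating one SB resample only requires pairing random circular starting points with Geometric block lengths. A small sketch (not from the book; the data and names are stand-ins) follows.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))        # stands in for the vectors X_1, ..., X_100
n, ell = len(X), 8
p = 1.0 / ell                        # expected block length ell

pieces, total = [], 0
while total < n:
    start = rng.integers(0, n)                  # I_{4,i} ~ Uniform{1,...,n}
    length = rng.geometric(p)                   # L_i ~ Geometric(p), values 1, 2, ...
    idx = (start + np.arange(length)) % n       # circular (periodically extended) block
    pieces.append(X[idx])
    total += length

Xstar = np.concatenate(pieces)[:n]              # keep the first n SB observations
print(Xstar.shape)
```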

Example 4.6: (Comparison of "naive" and "ordinary" versions of block bootstraps). We continue with the setup of the last example. Suppose that $\{X_{0i}\}_{i\in\mathbb{Z}}$ is a stationary process and we are interested in estimating population characteristics of $T_{1n} = \sqrt{n-k}\,(\hat{\gamma}_n(k) - \gamma(k))$, where $\gamma(k) = \mathrm{Cov}(X_{01},X_{0(k+1)})$ and $\hat{\gamma}_n(k)$ is as defined in Example 4.5. For estimators like $\hat{\gamma}_n(k)$, that are defined in terms of $k$th order lag-vectors of the original observations $\{X_{0i}\}_{i\in\mathbb{Z}}$, an alternative approach, called the "naive approach," of defining block bootstrap estimators was described in Section 2.5. Here we compare the performance of the "naive" approach and the "blocks of blocks" (i.e., the "ordinary") approach in the context of Example 4.5.

Suppose that $\{X_{0i}\}_{i\in\mathbb{Z}}$ is the ARMA(3,4) process given by (4.57) and the level-2 parameter of interest is given by $\varphi_n = ET_{1n}^2$, with $n = 102$ and $k = 2$. Under the "naive approach," one applies the block bootstrap methods to the observations $X_{01},\ldots,X_{0(102)}$ directly. For the purpose of illustration, here we describe the "naive" MBB estimator of $\varphi_n$ based on blocks of length $\ell = 8$. Define the MBB blocks $\mathcal{B}_{0i} = \{X_{0i},\ldots,X_{0(i+7)}\}$, $i = 1,\ldots,95$, based on the $X_{0i}$'s (but not defined in terms of the vectors $X_i$). Let $b_1$ be the smallest integer $\ge 102/\ell$. Thus, for our example, $b_1 = 13$. Then, we resample $b_1$ blocks at random from the collection $\{\mathcal{B}_{0i} : i = 1,\ldots,95\}$ to generate $b_1\ell = 13\times 8 = 104$ MBB samples $\{X_{0i}^* : i = 1,\ldots,104\}$, and use the first $n = 102$ of these variables to define a "naive" version of $T_{1n}$ as
$$T_{1n}^{*(1)nv} = \sqrt{n-2}\,\big(\hat{\gamma}_n^{*(1)}(2) - \hat{\gamma}_n(2)\big)\;, \qquad (4.63)$$
where $\hat{\gamma}_n^{*(1)}(2) = (100)^{-1}\sum_{i=1}^{100}X_{0i}^*X_{0(i+2)}^* - \big[\bar{X}_{n-2}^{*(1)}\big]^2$ and $\bar{X}_{n-2}^{*(1)} = (100)^{-1}\sum_{i=1}^{100}X_{0i}^*$. The "naive" MBB estimator of $\varphi_n$ is given by
$$\hat{\varphi}_n(1;\ell)^{nv} \equiv E_*\big[T_{1n}^{*(1)nv}\big]^2\;.$$
Similarly, we can define the naive versions of the other three block bootstrap estimators. In Table 4.2, we report the "naive" block bootstrap estimators of $\varphi_n = 1.058$ based on (expected) block sizes $\ell = 4,6,8,10,15,20$.

TABLE 4.2. Block bootstrap estimates of $\varphi_n = ET_{1n}^2$ for the data set of Figure 4.1 based on the "naive" approach. Here, the true value of $\varphi_n$ is 1.058.

Block Size 4 6 8 10 15 20
MBB 1.768 1.242 1.230 1.094 1.297 0.986
NBB 1.275 1.113 0.672 1.578 0.905 0.852
CBB 1.402 1.349 1.077 1.187 0.888 0.937
SB 1.571 1.183 1.126 1.151 0.902 0.760

As explained in Section 2.5, the "naive" bootstrap estimators tend to have larger biases compared to the "ordinary" or the "blocks of blocks" versions. For this example, contributions to the biases of the "naive" bootstrap estimators result from the "within-block-independence" of the components of $(X_{0i}^*,\,X_{0i}^*X_{0(i+2)}^*)$ near the boundary of the resampled blocks. Further, this bias effect is more pronounced when the (expected) block size $\ell$ is small. The result of Table 4.2 appears to lend some support to this observation for the data set of Figure 4.1. From the table, we find that the "naive" estimators tend to overestimate the level-2 parameter $\varphi_n = 1.058$ for values of $\ell\le 10$ for the MBB, the CBB, and the SB methods, while the estimates based on the NBB method fluctuate around the true value $\varphi_n = 1.058$. The discrepancy for each of the four methods is indeed larger for smaller values of $\ell$. $\Box$
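The difference between the two approaches is only in what gets blocked. The following sketch (not from the book; the data and the helper function are stand-ins) contrasts a naive resample of the raw series with an ordinary resample of the lag-2 product vectors.

```python
import numpy as np

rng = np.random.default_rng(6)
x0 = rng.normal(size=102)            # stands in for the observed series X_{01}, ..., X_{0,102}
ell = 8

def mbb_resample(series, ell, out_len):
    N = len(series) - ell + 1
    k0 = int(np.ceil(out_len / ell))
    starts = rng.integers(0, N, size=k0)
    return np.concatenate([series[s:s + ell] for s in starts])[:out_len]

# naive: resample the X_{0i} themselves, then form the lag-2 products afterwards;
# products straddling block boundaries mix independent blocks ("within-block independence")
x_naive = mbb_resample(x0, ell, 102)
gamma_naive = np.mean(x_naive[:-2] * x_naive[2:]) - np.mean(x_naive[:-2]) ** 2

# ordinary: first form the bivariate vectors X_i = (X_{0i}, X_{0i} X_{0(i+2)}), then block them
X = np.column_stack([x0[:-2], x0[:-2] * x0[2:]])
X_star = mbb_resample(X, ell, 100)
gamma_ordinary = np.mean(X_star[:, 1]) - np.mean(X_star[:, 0]) ** 2

print(gamma_naive, gamma_ordinary)
```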

Example 4.7: (Estimation of Distribution Function). Next we look at block bootstrap estimators of the distribution function. Let $\{X_{0i}\}_{i\in\mathbb{Z}}$ be a stationary time series with $\mathrm{Var}(X_{01})\in(0,\infty)$, and suppose the level-1 parameter of interest is the lag-$k$ autocorrelation coefficient
$$r(k) = \mathrm{Cov}(X_{01},X_{0(1+k)})\,/\,\mathrm{Var}(X_{01})\;,$$
for some given integer $k\ge 0$. As an estimator of $r(k)$, we consider the following version of the sample autocorrelation coefficient,
$$\hat{r}_n(k) = \frac{(n-k)^{-1}\sum_{i=1}^{n-k}X_{0i}X_{0(i+k)} - \big(n^{-1}\sum_{i=1}^n X_{0i}\big)^2}{n^{-1}\sum_{i=1}^n X_{0i}^2 - \big(n^{-1}\sum_{i=1}^n X_{0i}\big)^2}\;,$$
which is slightly different from the estimator considered in Example 4.2. With
$$H(x,y,z) = \big\{(z-x^2)/(y-x^2)\big\}\,\mathbb{1}(y > x^2)\;,\quad (x,y,z)'\in\mathbb{R}^3\;,$$
and $Y_i \equiv (Y_{1i},Y_{2i},Y_{3i})' = (X_{0i},\,X_{0i}^2,\,X_{0i}X_{0(i+k)})'$, $i\in\mathbb{Z}$, it is easy to see that $r(k)$ and $\hat{r}_n(k)$ can be expressed as
$$r(k) = H(EY_1)$$
and
$$\hat{r}_n(k) = H\big(\bar{Y}_{1n},\,\bar{Y}_{2n},\,\bar{Y}_{3(n-k)}\big)\;,$$
where $\bar{Y}_{jm} = m^{-1}\sum_{i=1}^m Y_{ji}$, $m\ge 1$, $1\le j\le 3$. Note that in this case, the estimator $\hat{r}_n(k)$ does not directly fall in the framework of the Smooth Function Model treated in Section 4.2, since it is a function of averages of different numbers of $X$-variables in the first, the second, and the third co-ordinates. However, by Remark 4.1 of Section 4.2, the block bootstrap approximations for the sampling distribution of
$$T_{1n} \equiv \sqrt{n-k}\,\big(\hat{r}_n(k) - r(k)\big)$$
remain valid whenever the regularity conditions of Theorem 4.1 on $H(\cdot)$, $\alpha(\cdot)$, and $\{X_{0i}\}_{i\in\mathbb{Z}}$ are satisfied. Since $\sigma^2 = \mathrm{Var}(X_{01})\in(0,\infty)$, the function $H$, being a rational function with a nonvanishing denominator at $EY_1$, is infinitely differentiable in a neighborhood of $EY_1$. Hence, if the sequence $\{X_{0i}\}_{i\in\mathbb{Z}}$ satisfies the moment and mixing conditions of Theorem 4.1, then for $j = 1,2,3,4$,
$$\sup_x\Big|P_*\big(\sqrt{n-k}\,\big(\hat{r}_n^{*(j)}(k) - \tilde{r}_n^{(j)}(k)\big)\le x\big) - P(T_{1n}\le x)\Big| \longrightarrow_p 0 \quad\text{as } n\to\infty\;,$$
where, as usual, we write $j = 1$ for the MBB, $j = 2$ for the NBB, $j = 3$ for the CBB, and $j = 4$ for the SB, and define the variables $\hat{r}_n^{*(j)}$'s and $\tilde{r}_n^{(j)}$'s as $\hat{r}_n^{*(j)}(k) = H\big(\bar{Y}_{1n_2}^{*(j)},\,\bar{Y}_{2n_2}^{*(j)},\,\bar{Y}_{3n_2}^{*(j)}\big)$ and $\tilde{r}_n^{(j)}(k) = H\big(E_*\bar{Y}_{1n_2}^{*(j)},\,E_*\bar{Y}_{2n_2}^{*(j)},\,E_*\bar{Y}_{3n_2}^{*(j)}\big)$ for $j = 1,2,3$, and $\hat{r}_n^{*(4)}$ and $\tilde{r}_n^{(4)}$ similarly, with $n_2$ replaced by $N_2 = L_1 + \cdots + L_{K_2}$. Here $n_2$ and $K_2$ are defined by replacing $n$ with $n-k$, e.g., $n_2 = \lfloor(n-k)/\ell\rfloor\,\ell$ for a given block size $\ell\in(1,n-k)$ for the $Y$-variables, and $K_2 = \inf\{j\in\mathbb{N} : L_1 + \cdots + L_j \ge n-k\}$.

Again for the purpose of illustration, suppose that $\{X_{0i}\}_{i\in\mathbb{Z}}$ is the ARMA(3,4) process given by (4.57). We now consider the block bootstrap estimators of the distribution function of
$$T_{1n} = \sqrt{n-2}\,\big(\hat{r}_n(2) - r(2)\big)$$
for the data set of Figure 4.1 with $k = 2$ and $n = 102$. As in Example 4.5, we define the blocks in terms of the transformed vectors $Y_1,\ldots,Y_{100}$ for each of the block bootstrap methods. Suppose, for example, that the (expected) block size $\ell$ is chosen to be 6. Thus, for the MBB, the blocks are $\{\mathcal{B}_i \equiv (Y_i,\ldots,Y_{i+5}) : i = 1,\ldots,95\}$, for the NBB the blocks are $\{(Y_1,\ldots,Y_6),(Y_7,\ldots,Y_{12}),\ldots,(Y_{91},\ldots,Y_{96})\}$, and for the CBB and the SB, the blocks are defined using the periodic extension of $Y_1,\ldots,Y_{100}$. To generate the bootstrap samples, we resample $k_0$ blocks for the first three methods, where $k_0$ is the smallest integer not less than $100/\ell$. For $\ell = 6$, $k_0 = 17$. Similarly, for the SB, we resample a random number of blocks of lengths $L_1,\ldots,L_K$, where $L_1,L_2,\ldots$ are iid Geometric($1/\ell$) variables and $K = \inf\{k\ge 1 : L_1 + \cdots + L_k \ge 100\}$. Let $Y_1^{*(j)},\ldots,Y_{100}^{*(j)}$ denote the first 100 bootstrap samples under the $j$th method, $j = 1,2,3,4$. Although in Theorem 4.1 we have proved validity of the four block bootstrap methods for resample sizes $n_1$ for $j = 1,2,3$ and $N_1$ for $j = 4$ mainly to simplify proofs, consistency of the bootstrap approximations continues to hold if the resample size is set equal to $n$. Hence, in practice, a resample size equal to the size of the observed $Y$-vectors may be employed for the "ordinary" block bootstrap estimators. Accordingly, in practice, we define the block bootstrap versions of $T_{1n}$ as
$$T_{1n}^{*(j)} = \sqrt{100}\,\big(\hat{r}_n^{*(j)}(2) - \tilde{r}_n^{(j)}(2)\big)\;,\quad j = 1,2,3,4\;,$$
where $\hat{r}_n^{*(j)}(2) = H\big(\bar{Y}_n^{*(j)}\big)$ and $\tilde{r}_n^{(j)}(2) = H\big(E_*(\bar{Y}_n^{*(j)})\big)$ with $\bar{Y}_n^{*(j)} \equiv (100)^{-1}\sum_{i=1}^{100}Y_i^{*(j)}$. Strictly speaking, this definition of the bootstrapped statistic does not reflect the difference in the number of variables averaged in different components of the $Y_i$'s in the definition of $\hat{r}_n^{*(j)}(2)$, but the effect is negligible. The bootstrap estimators of the distribution function $G_{1n}(x) \equiv P(T_{1n}\le x)$, $x\in\mathbb{R}$, are given by
$$\hat{G}_{1n}^{(j)}(x) = P_*\big(T_{1n}^{*(j)}\le x\big)\;,\quad x\in\mathbb{R}\;. \qquad (4.64)$$

For Monte-Carlo evaluation of the $\hat{G}_{1n}^{(j)}$'s, as in Example 4.5, we generate $B$ sets of block bootstrap samples and compute the replicates ${}_rT_{1n}^{*(j)}$ for $r = 1,\ldots,B$. Then, the Monte-Carlo approximation to the $j$th block bootstrap estimator is given by
$$\hat{G}_{1n}^{(j)MC}(x) = B^{-1}\sum_{r=1}^B\mathbb{1}\big({}_rT_{1n}^{*(j)}\le x\big)\;,\quad x\in\mathbb{R},\ j = 1,2,3,4\;. \qquad (4.65)$$
As an example, we computed the block bootstrap distribution function estimates of (4.65) for the data set of Figure 4.1 with $B = 800$. (In practice, a higher value of $B$ should be used for distribution function estimation.) Figure 4.2 below gives the corresponding histograms. As follows from the discussion of Section 4.2, the variable $T_{1n}$ is asymptotically normal. The block bootstrap estimates also show a similar shape, with slightly higher masses to the left of the origin.
The bootstrap estimator $\hat{G}_{1n}^{(j)}(x)$ of the sampling distribution of $T_{1n}$ can also be used to obtain estimators of the quantiles of $T_{1n}$. For $\alpha\in(0,1)$, define
$$q_{1n}(\alpha) = G_{1n}^{-1}(\alpha)\;,$$
where, recall that, for any distribution function $G$ on $\mathbb{R}$, $G^{-1}(\alpha) = \inf\{x\in\mathbb{R} : G(x)\ge\alpha\}$. Thus, $q_{1n}(\alpha)$ is the $\alpha$th quantile of $G_{1n}$. The block bootstrap estimators of the level-2 parameter $q_{1n}(\alpha)$ are given by "plugging in" $\hat{G}_{1n}^{(j)}$ in place of $G_{1n}$ as
$$\hat{q}_{1n}^{(j)}(\alpha) = \big[\hat{G}_{1n}^{(j)}\big]^{-1}(\alpha)\;. \qquad (4.66)$$
Monte-Carlo evaluation of $\hat{q}_{1n}^{(j)}(\alpha)$ is rather simple, once the bootstrap replicates ${}_rT_{1n}^{*(j)}$, $r = 1,\ldots,B$, have been computed. Arrange the replicates ${}_rT_{1n}^{*(j)}$, $r = 1,\ldots,B$, in increasing order to get the corresponding order statistics ${}_{(1)}T_{1n}^{*(j)}\le\cdots\le{}_{(B)}T_{1n}^{*(j)}$. Then, the Monte-Carlo approximation to $\hat{q}_{1n}^{(j)}(\alpha)$ is given by the $\lfloor B\alpha\rfloor$th order statistic, i.e., by
$$\hat{q}_{1n}^{(j)MC}(\alpha) = {}_{(\lfloor B\alpha\rfloor)}T_{1n}^{*(j)}\;,\quad j = 1,2,3,4\;.$$
FIGURE 4.2. Histograms of block bootstrap distribution function estimates $\hat{G}_{1n}^{(j)}$, $j = 1,2,3,4$, of (4.64), based on (expected) block length 6 and B = 800 bootstrap replicates.

As an example, consider the computation of $\hat{q}_{1n}^{(j)MC}(\alpha)$ for $\alpha = 0.05$, based on $B = 800$ block bootstrap replicates. Then, the Monte-Carlo values of the block bootstrap estimates of $q_{1n}(\alpha)$ are given by the $\lfloor(800)(0.05)\rfloor = 40$th order statistic of the bootstrap replicates ${}_1T_{1n}^{*(j)},\ldots,{}_{800}T_{1n}^{*(j)}$. For the data set of Figure 4.1, the block bootstrap estimates are, respectively, given by $-1.635$ (MBB), $-1.535$ (NBB), $-1.583$ (CBB), and $-1.624$ (SB).

Confidence intervals (CIs) for the level-1 parameter $r(2)$ may also be constructed using the bootstrap methods. Note that if the quantiles of $T_{1n}$ were known, an equal-tailed two-sided $(1-\alpha)$ CI for $r(2)$ would be given by
$$\hat{I}_\alpha \equiv \Big(\hat{r}_n(2) - \tfrac{1}{\sqrt{100}}\,q_{1n}\big(1-\tfrac{\alpha}{2}\big),\ \ \hat{r}_n(2) - \tfrac{1}{\sqrt{100}}\,q_{1n}\big(\tfrac{\alpha}{2}\big)\Big)\;.$$
A percentile block bootstrap CI for $r(2)$ is obtained by replacing the "true" quantiles $q_{1n}(\alpha/2)$ and $q_{1n}(1-\tfrac{\alpha}{2})$ by their bootstrap estimators, viz.,
$$\hat{I}_{\alpha,\mathrm{percentile}}^{(j)} = \Big(\hat{r}_n(2) - \tfrac{1}{\sqrt{100}}\,\hat{q}_{1n}^{(j)}\big(1-\tfrac{\alpha}{2}\big),\ \ \hat{r}_n(2) - \tfrac{1}{\sqrt{100}}\,\hat{q}_{1n}^{(j)}\big(\tfrac{\alpha}{2}\big)\Big)\;. \qquad (4.67)$$
For computing the interval estimates of $r(2)$ using formula (4.67), the bootstrap quantiles $\hat{q}_{1n}^{(j)}(\cdot)$'s in (4.67) are further replaced with their Monte-Carlo approximations. As an example, we construct 90% equal-tailed percentile bootstrap CIs for $r(2) = \mathrm{Corr}(X_{01},X_{03})$ under model (4.57). By (4.67), these are given by $\big(\hat{r}_n(2) - \hat{q}_{1n}^{(j)}(0.95)/10,\ \hat{r}_n(2) - \hat{q}_{1n}^{(j)}(0.05)/10\big)$, $j = 1,2,3,4$. The block bootstrap quantiles $\hat{q}_{1n}^{(j)}(0.05)$ and $\hat{q}_{1n}^{(j)}(0.95)$, $j = 1,2,3,4$, may be found as described in the previous paragraph. They are, respectively, given by the 40th and the $\lfloor(800)(0.95)\rfloor = 760$th order statistics of the replicates ${}_1T_{1n}^{*(j)},\ldots,{}_{800}T_{1n}^{*(j)}$, $j = 1,2,3,4$. For the data set of Figure 4.1, 90% percentile block bootstrap CIs for $r(2)$ based on (expected) block size $\ell = 6$ are given by

(-1.059, 1.740) (MBB)
(-1.079, 1.633) (NBB)
(-1.085, 1.687) (CBB)
(-1.093, 1.728) (SB)

Note that for constructing these bootstrap interval estimates, we do not have to estimate the standard error of $\hat{r}_n(2)$ explicitly. All four block bootstrap methods provide consistent estimators of the standard error implicitly through the bootstrap quantiles $\hat{q}_{1n}^{(j)}(\cdot)$'s. In comparison, the user must separately derive and compute an estimate of the standard error of $\hat{r}_n(2)$ for constructing a large sample interval estimate for $r(2)$ using the traditional normal approximation. $\Box$
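The mechanics of (4.66)-(4.67) reduce to sorting the bootstrap replicates and reading off two order statistics. The following sketch is not from the book: `T_star` and `r_hat` are placeholder inputs rather than the replicates and estimate computed from the data of Figure 4.1.

```python
import numpy as np

rng = np.random.default_rng(7)
B = 800
T_star = np.sort(rng.normal(size=B))        # stand-in for the ordered replicates (1)T*, ..., (B)T*
r_hat = 0.1                                 # stand-in for the point estimate r_hat_n(2)
scale = np.sqrt(100)                        # sqrt(n - k) with n = 102, k = 2

def q_boot(alpha):
    # (4.66): the floor(B * alpha)-th order statistic of the replicates
    return T_star[int(np.floor(B * alpha)) - 1]

alpha = 0.10                                # a 90% equal-tailed interval
ci = (r_hat - q_boot(1 - alpha / 2) / scale,
      r_hat - q_boot(alpha / 2) / scale)    # the percentile interval (4.67)
print(ci)
```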

Example 4.8: (Differentiable Statistical Functionals). Suppose that $\{X_i\}_{i\in\mathbb{Z}}$ is a sequence of stationary random variables with (one-dimensional) marginal distribution function $F$. Suppose that we are interested in estimating the parameter $\theta = \int J(u)F^{-1}(u)\,du$ using the L-estimator
$$\hat{\theta}_n = \int_0^1 J(u)\,F_n^{-1}(u)\,du$$
for a given function $J : (0,1)\to\mathbb{R}$, where $F_n$ is the empirical distribution function of $X_1,\ldots,X_n$, and $F^{-1}$ and $F_n^{-1}$ are as defined in Section 4.4. As discussed there, $\hat{\theta}_n$ may be represented as a statistical functional, say $\hat{\theta}_n = T(F_n)$. Frechet differentiability of $T$ at $F$ depends on the joint behavior of the functions $J(\cdot)$ and $F(\cdot)$ and may be guaranteed through different sets of conditions on $J(\cdot)$ and $F(\cdot)$. Here, we state one set of sufficient conditions. For variations of these conditions, see Serfling (1980) and Fernholz (1983), and the references therein.

Note that the function $F^{-1}$ is nondecreasing and left continuous. Hence, $F^{-1}$ corresponds to a measure on $\mathbb{R}$. We assume that

(i) $J$ is bounded and continuous almost everywhere with respect to $F^{-1}$ and with respect to the Lebesgue measure, and

(ii) there exists $0 < a < b < 1$ such that $J(u) = 0$ for all $u\notin[a,b]$.

By Boos (1979), under assumptions (i) and (ii), the functional $T(\cdot)$ is Frechet differentiable at $F$ under the sup-norm $\|\cdot\|_\infty$. Hence, if we write
$$T_{2n} = \sqrt{n}\,(\hat{\theta}_n - \theta)$$
and $T_{2n}^*$ for the MBB version of $T_{2n}$ based on blocks of size $\ell$, then under the conditions of Theorem 4.4 and under assumptions (i) and (ii) above,
$$\sup_{x\in\mathbb{R}}\big|P_*(T_{2n}^*\le x) - P(T_{2n}\le x)\big| \longrightarrow_p 0 \quad\text{as } n\to\infty\;.$$
Note that for the $\alpha$-trimmed mean ($0 < \alpha < 1/2$), the level-1 parameter of interest is given by (cf. Example 4.4)
$$\theta = \int_\alpha^{1-\alpha}F^{-1}(u)\,du\,\big/\,(1-2\alpha)\;, \qquad (4.68)$$
which corresponds to the weight function $J(u) = (1-2\alpha)^{-1}\cdot\mathbb{1}_{[\alpha,1-\alpha]}(u)$, $u\in(0,1)$. Clearly, $J(\cdot)$ satisfies assumptions (i) and (ii) if the two-point set $\{\alpha,1-\alpha\}$ has $F^{-1}$-measure zero. It is easy to check that this holds, provided the function $F$ is strictly increasing in some neighborhoods of the quantiles $F^{-1}(\alpha)$ and $F^{-1}(1-\alpha)$.
As an example, we now consider the stationary process $\{X_i\}_{i\in\mathbb{Z}}$ given by (4.69), where the sgn($\cdot$) function is defined as $\mathrm{sgn}(x) = \mathbb{1}(x\ge 0) - \mathbb{1}(x\le 0)$, $x\in\mathbb{R}$, and $\{X_{1,i}\}_{i\in\mathbb{Z}}$ is an ARMA(3,3) process satisfying
$$X_{1,i} - 0.4X_{1,i-1} - 0.2X_{1,i-2} - 0.1X_{1,i-3} = \epsilon_i + 0.2\epsilon_{i-1} + 0.3\epsilon_{i-2} + 0.2\epsilon_{i-3}\;,\quad i\in\mathbb{Z}\;,$$
where the $\epsilon_i$'s are iid $N(0,1)$ variables. Note that the marginal distribution $F$ of $X_i$ is symmetric (about the origin), continuous, and strictly increasing over $\mathbb{R}$. Furthermore, $X_i$ has finite moments of all orders and $\{X_i\}_{i\in\mathbb{Z}}$ is strongly mixing with an exponentially decaying mixing coefficient. Thus, the conditions and conclusions of Theorem 4.4 hold for the centered and scaled $\alpha$-trimmed mean $T_{2n} = \sqrt{n}\,(\hat{\theta}_n - \theta)$, where $\theta$ is given by (4.68), $0 < \alpha < \tfrac{1}{2}$.
Now we consider the performance of the MBB in a finite sample situation. Figure 4.3 below gives a realization of $X_i$, $i = 1,\ldots,250$, from the process (4.69). We apply the MBB with block size $\ell = 10$ to estimate the distribution of $T_{2n}$ for different values of $\alpha$. As in the previous examples, we resample $b_1 \equiv \lceil n/\ell\rceil = 25$ blocks from the collection of overlapping blocks $\mathcal{B}_i = (X_i,\ldots,X_{i+9})$, $i = 1,\ldots,241$, to generate the MBB observations $X_1^*,\ldots,X_{10}^*;\ \ldots;\ X_{241}^*,\ldots,X_{250}^*$. Let $X_{(1)}^*\le\cdots\le X_{(250)}^*$ denote the order statistics corresponding to $X_1^*,\ldots,X_{250}^*$. Then, define the MBB version of $T_{2n}$ as
$$T_{2n}^* = \sqrt{250}\,\big(\hat{\theta}_n^* - \tilde{\theta}_n\big)\;,$$
where $\hat{\theta}_n^* = \sum_{n\alpha\le i\le n(1-\alpha)}X_{(i)}^*\,/\,[n(1-2\alpha)]$ and where $\tilde{\theta}_n = (1-2\alpha)^{-1}\int_\alpha^{1-\alpha}\tilde{F}_n^{-1}(u)\,du$ and $\tilde{F}_n(x) = E_*\big[n^{-1}\sum_{i=1}^n\mathbb{1}(X_i^*\le x)\big]$, $x\in\mathbb{R}$. Using arguments similar to (4.59), we can express $\tilde{F}_n(\cdot)$ as
$$\tilde{F}_n(x) = \sum_{i=1}^n w_{in}\,\mathbb{1}(X_i\le x)\;,\quad x\in\mathbb{R}\;,$$
where, with $N = n-\ell+1$,
$$w_{in} = \begin{cases} N^{-1} & \text{if } \ell\le i\le N\;,\\ i/(N\ell) & \text{if } 1\le i\le\ell-1\;,\\ (n-i+1)/(N\ell) & \text{if } N+1\le i\le n\;.\end{cases}$$
With the help of this formula, we may further simplify the definition of $\tilde{\theta}_n$ and write down an explicit expression for $\tilde{\theta}_n$ that may be evaluated without any resampling. Let $X_{(1)}\le\cdots\le X_{(n)}$ denote the order statistics of $X_i$, $i = 1,\ldots,n$. Also, let $w_{(i)}$ denote the weight associated with the order statistic $X_{(i)}$. For example, if $X_{(1)} = X_{10}$ and $X_{(2)} = X_3$, then $w_{(1)} = w_{10\,n}$ and $w_{(2)} = w_{3n}$. Then, the centering variable $\tilde{\theta}_n$ may be taken as the corresponding weighted average of the order statistics $X_{(i)}$ with $L_\alpha < i\le U_\alpha$, normalized by $(1-2\alpha)$, where $L_\alpha = \max\{k : 1\le k\le n,\ \sum_{i=1}^k w_{(i)} < \alpha\}$ and $U_\alpha = \min\{k : 1\le k\le n,\ \sum_{i=1}^k w_{(i)} \ge 1-\alpha\}$.
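A short numerical sketch (not from the book; the simulated data, the helper names, and in particular the crude endpoint handling inside `weighted_trimmed_mean` are our own choices rather than the book's exact truncation rule) shows how the weights $w_{in}$, the resampling-free centering, and one replicate of $T_{2n}^*$ can be computed.

```python
import numpy as np

rng = np.random.default_rng(8)
n, ell, alpha = 250, 10, 0.2
x = np.cumsum(rng.normal(size=n)) * 0.05 + rng.standard_t(3, size=n)  # toy dependent data
N = n - ell + 1

# weights w_{in}: proportional to the number of the N overlapping blocks containing X_i
w = np.minimum(np.minimum(np.arange(1, n + 1), n - np.arange(n)), np.minimum(ell, N)) / (N * ell)

def weighted_trimmed_mean(values, weights, alpha):
    order = np.argsort(values)
    v, p = values[order], weights[order]
    cum = np.cumsum(p)
    keep = (cum > alpha) & (cum <= 1 - alpha + 1e-12)   # crude endpoint handling (our assumption)
    return np.sum(p[keep] * v[keep]) / (1 - 2 * alpha)

theta_tilde = weighted_trimmed_mean(x, w, alpha)        # centering, no resampling needed

b1 = int(np.ceil(n / ell))
starts = rng.integers(0, N, size=b1)
xstar = np.concatenate([x[s:s + ell] for s in starts])[:n]
theta_star = weighted_trimmed_mean(xstar, np.full(n, 1.0 / n), alpha)
T2_star = np.sqrt(n) * (theta_star - theta_tilde)       # one replicate of T_{2n}^*
print(T2_star)
```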

FIGURE 4.3. A simulated data set of n = 250 X_i-values from model (4.69).

Figure 4.4 below gives the histograms of the MBB estimates of the distribution function $G_{2n}(x) \equiv P(T_{2n}\le x)$, $x\in\mathbb{R}$, based on $B = 800$ bootstrap replicates for $\alpha = 0,\ 0.08,\ 0.2,\ 0.4$, and $0.5$. Note that $\alpha = 0$ represents the case where $\hat{\theta}_n$ is the sample mean and $\alpha = 0.5$ represents the case where $\hat{\theta}_n$ is the sample median. Although we have verified the conditions of Theorem 4.4 only for $0 < \alpha < \tfrac{1}{2}$, here we include these limiting $\alpha$-values to obtain a more complete picture of how the MBB performs under varying degrees of trimming. It follows from Figure 4.4 that the bootstrap estimates of the sampling distribution are more skewed for larger values of $\alpha$. Although $T_{2n}$ is asymptotically normal for all these $\alpha$-values, the "exact" distribution of $T_{2n}$ is not symmetric for finite sample sizes. The limiting normal distribution fails to reveal this feature of the true sampling distribution. But the bootstrap estimates of the sampling distribution functions of $T_{2n}$ for different $\alpha$-values provide useful information on the skewness of the true distributions of $T_{2n}$.


FIGURE 4.4. Histograms of the MBB distribution function estimates of the centered and scaled $\alpha$-trimmed mean $T_{2n}$ for $\alpha = 0.0,\ 0.08,\ 0.2,\ 0.4,\ 0.5$.

We may also use the bootstrap replicates of $T_{2n}^*$ to construct percentile
MBB CIs for the level-1 parameter $\theta \equiv (1 - 2\alpha)^{-1} \int_{\alpha}^{1-\alpha} F^{-1}(u)\, du$ as in
the last example. Since the marginal distribution $F$ of $X_i$ is symmetric
about the origin, the true value of $\theta$ is equal to 0 for all $\alpha$. An equal-tailed
two-sided 80% MBB percentile CI for $\theta$ is constructed from the quantiles $\hat{q}_{2n}(0.1)$ and $\hat{q}_{2n}(0.9)$,
where $\hat{q}_{2n}(\beta)$, $0 < \beta < 1$, is the $\beta$th quantile of the conditional distribution
of $T_{2n}^*$. For the data set of Figure 4.3, 80% (equal-tailed) CIs based on the
MBB with block size $\ell = 10$ are given by

(-2.215, 2.615)  for $\alpha = 0.00$
(-1.468, 1.715)  for $\alpha = 0.08$
(-0.936, 1.310)  for $\alpha = 0.2$
(-0.423, 0.575)  for $\alpha = 0.4$
( 0.022, 1.021)  for $\alpha = 0.5$

Note that all the interval estimates except the one for $\alpha = 0.5$ (the median)
contain the true value $\theta = 0$.
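The intervals listed above can be computed from the bootstrap replicates of $T_{2n}^*$ in a few lines. The following Python sketch uses one standard convention for turning bootstrap quantiles into an interval for $\theta$ (the exact display above is not reproduced here, so this should be read as an illustration rather than the book's precise recipe):

```python
import numpy as np

def percentile_ci(theta_hat, t_star, n, level=0.80):
    """Equal-tailed two-sided percentile-type CI for theta from replicates
    t_star of T*_2n = sqrt(n)(theta*_n - theta_tilde_n).  One common
    convention: (theta_hat - q(1-beta)/sqrt(n), theta_hat - q(beta)/sqrt(n))."""
    beta = (1.0 - level) / 2.0
    q_lo, q_hi = np.quantile(np.asarray(t_star), [beta, 1.0 - beta])
    return theta_hat - q_hi / np.sqrt(n), theta_hat - q_lo / np.sqrt(n)
```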
5
Comparison of Block Bootstrap Methods

5.1 Introduction
In this chapter, we compare the performance of the MBB, the NBB, the
CBB, and the SB methods considered in Chapters 3 and 4. In Section 5.2,
we present a simulated data example and illustrate the behavior of the
block bootstrap methods under some simple time series models. Although
the example treats the simple case of the sample mean, it provides a rep-
resentative picture of the properties of the four methods in more general
problems. In the subsequent sections, the empirical findings of Section 5.2
are substantiated through theoretical results that provide a comparison of
the methods in terms of the (asymptotic) MSEs of the bootstrap estimators.
In Section 5.3, we describe the framework for the theoretical comparison.
In Section 5.4, we obtain expansions for the MSEs of the relevant boot-
strap estimators as a function of the block size (expected block size, for
the SB). These expansions provide the basis for the theoretical comparison
of the sampling properties of the bootstrap methods. In Section 5.5, the
main theoretical findings are presented. Here, we compare the bootstrap
methods using the leading terms in the expansions of the MSEs derived
in the previous section. In Section 5.5, we also derive theoretical optimal
(expected) block lengths for each of the block bootstrap estimators and
compare the methods at the corresponding optimal block lengths. Some
conclusions and implications of the theoretical and finite sample simula-
tion results are discussed in Section 5.6. Proofs of two key results from
Section 5.4 are separated out into Section 5.7.

5.2 Empirical Comparisons


First we consider the behavior of the block bootstrap methods across different block lengths for a fixed set of observations. In Section 4.5, we considered two numerical examples where the four block bootstrap methods
were applied to the data set of Figure 4.1 for variance estimation and
also for distribution function estimation. As Table 4.1 shows, the block
bootstrap estimates of the level-2 parameter $\varphi_n = E T_n^2$ (cf. Example 4.5)
produced by the various methods exhibit different patterns of variation
across the (expected) block lengths considered. The SB method produced
variance estimates that (nearly monotonically) decreased in value as the expected block length increased, and resulted in the largest "underestimation"
of the target parameter $\varphi_n = 1.058$ among the four methods. The MBB and
the CBB tended to have a similar pattern across the block lengths and produced estimates of comparable value. The NBB estimates fluctuated around the true value
$\varphi_n = 1.058$, having both over- and underestimates at different block sizes.
Similar comments apply to the bootstrap distribution function estimates
as well.
The observations noted above on the behavior of the block bootstrap
methods are based on a single data set only, and as such do not say much
about the properties of these methods across different realizations of the
variables $X_1, \ldots, X_n$, i.e., about their sampling properties. To get some
idea about the sampling properties of these methods, we need to compare
suitable population measures of accuracy (e.g., the MSE) of the resulting
estimators. More precisely, let $\varphi_n$ be a level-2 parameter of interest, which
is to be estimated by the various block bootstrap methods. For $j = 1, 2, 3, 4$
and $\ell \in (1, n)$, write $\hat{\varphi}_n(j;\ell)$ for the bootstrap estimator of $\varphi_n$ obtained
by using the $j$th block bootstrap method with (expected) block length $\ell$.
Then, from the statistical decision-theoretic point of view, one effective
way of comparing the performance of the block bootstrap methods is to
compare the values of $\mathrm{MSE}(\hat{\varphi}_n(j;\ell))$. For the sake of illustration, we now
suppose that $\varphi_n = n \mathrm{Var}(\bar{X}_n)$, where $\bar{X}_n$ denotes the sample mean of the
first $n$ observations from a stationary time series $\{X_i\}_{i \in \mathbb{Z}}$. We compare
the performance of the four block bootstrap methods under the following
models for $\{X_i\}_{i \in \mathbb{Z}}$:

ARMA(1,1) model:  $X_i - 0.3\, X_{i-1} = \epsilon_i + 0.4\, \epsilon_{i-1}$,  $i \in \mathbb{Z}$,     (5.1)

AR(1) model:      $X_i = 0.3\, X_{i-1} + \epsilon_i$,  $i \in \mathbb{Z}$,                 (5.2)

MA(1) model:      $X_i = \epsilon_i + 0.4\, \epsilon_{i-1}$,  $i \in \mathbb{Z}$,           (5.3)

where, in each of the three models, the innovations $\epsilon_i$'s are iid $N(0,1)$
random variables.
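For readers who wish to reproduce experiments of this kind, the following Python sketch generates a stretch of data from models (5.1)-(5.3); the function name and the burn-in length are our own choices:

```python
import numpy as np

def simulate_model(n, model, rng, burn=200):
    """Generate a length-n stretch from one of the models (5.1)-(5.3),
    with iid N(0,1) innovations and a burn-in to reach stationarity."""
    e = rng.standard_normal(n + burn + 1)
    x = np.zeros(n + burn + 1)
    for i in range(1, n + burn + 1):
        if model == "ARMA(1,1)":            # (5.1)
            x[i] = 0.3 * x[i - 1] + e[i] + 0.4 * e[i - 1]
        elif model == "AR(1)":              # (5.2)
            x[i] = 0.3 * x[i - 1] + e[i]
        else:                               # (5.3)  MA(1)
            x[i] = e[i] + 0.4 * e[i - 1]
    return x[-n:]

rng = np.random.default_rng(1)
x = simulate_model(100, "ARMA(1,1)", rng)
```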
Figure 5.1 below shows a plot of the MSEs of the block bootstrap estimators of $\varphi_n$, produced by the MBB, the NBB, and the SB under each
of the models (5.1)-(5.3) and for a sample of size $n = 100$. The MSEs are

computed using $K = 2000$ simulation runs. In each simulation run, the
block bootstrap estimators at a given (expected) block length are computed using $B = 500$ bootstrap replicates. From the plots, it appears that
the SB estimators have larger MSEs than the MSEs of the MBB and the
NBB estimators under all three models and at all levels of the block length
parameter $\ell$ considered, starting from $\ell = 2$. In the plots, the MSE curves
for the CBB estimators have been left out because of the almost identical
performance of the CBB estimators compared to the MBB estimators over
the range of values of $\ell$ considered. Note that when considered as a function
of the block length, the MSEs of the NBB estimators lie between the corresponding MSE curves of the MBB and the SB. Indeed, a similar pattern
continues to hold for larger sample sizes. Figure 5.2 gives the MSE curves
for the three methods for a sample size $n = 400$ for models (5.1)-(5.3),
using the same values of the simulation parameters $K$ and $B$ as above.

[Figure 5.1 appears here: three panels of MSE versus (expected) block length, one each for the ARMA(1,1) model (ar = .3, ma = .4), the AR(1) model (ar = .3), and the MA(1) model (ma = .4), with n = 100 and curves for the MBB, the NBB, and the SB.]

FIGURE 5.1. Mean square errors of the block bootstrap estimators of the level-2
parameter $\varphi_n = n \mathrm{Var}(\bar{X}_n)$ at $n = 100$ under models (5.1)-(5.3). MSEs are
computed using $K = 2000$ simulation runs, with $B = 500$ bootstrap iterations at
each (expected) block length under each simulation run.
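A rough outline of how such MSE curves can be computed is sketched below (Python; the function names, the reduced values of $K$ and $B$, and the restriction to the MBB and the NBB are our own simplifications -- the CBB would wrap blocks around the circle and the SB would use random block lengths). It assumes `simulate_model` from the sketch above is available:

```python
import numpy as np

def block_boot_var(x, ell, method="MBB", B=500, rng=None):
    """Monte Carlo block bootstrap estimate of phi_n = n Var(X_bar_n):
    resample b = n // ell blocks, form the bootstrap sample mean, and
    return n1 * Var_*(X_bar*_n) estimated from B replicates."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x)
    n = len(x)
    b = n // ell
    n1 = b * ell
    if method == "MBB":
        starts_pool = np.arange(n - ell + 1)        # overlapping blocks
    else:                                           # "NBB": disjoint blocks
        starts_pool = np.arange(0, n - ell + 1, ell)
    means = np.empty(B)
    for r in range(B):
        starts = rng.choice(starts_pool, size=b, replace=True)
        means[r] = np.concatenate([x[s:s + ell] for s in starts]).mean()
    return n1 * means.var()

def mse_of_estimator(model, n, ell, method, phi_true, K=200, B=200, seed=0):
    """Crude MSE over K simulated series (Figure 5.1 uses K = 2000, B = 500)."""
    rng = np.random.default_rng(seed)
    est = np.array([block_boot_var(simulate_model(n, model, rng), ell, method, B, rng)
                    for _ in range(K)])
    return np.mean((est - phi_true) ** 2)
```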

The numerical example considered above naturally leads us to the following questions. Is it true that the MBB outperforms the NBB and the
SB in terms of the MSE criterion for more general processes and for more
general level-2 parameters? If so, can we quantify the relative efficiency of
the MBB with respect to the other block bootstrap methods, at least for

[Figure 5.2 appears here: the same three panels as Figure 5.1 (ARMA(1,1) with ar = .3, ma = .4; AR(1) with ar = .3; MA(1) with ma = .4), now with n = 400, showing MSE curves for the MBB, the NBB, and the SB against (expected) block length.]

FIGURE 5.2. Mean square errors of the block bootstrap estimators of
$\varphi_n = n \mathrm{Var}(\bar{X}_n)$ at $n = 400$ under models (5.1)-(5.3). The same values of $K$
and $B$ as in Figure 5.1 are used.

large sample sizes? In the same vein, if the SB has the largest MSE, how
large can the relative magnitude of the MSE of a SB estimator compared
to a MBB estimator be? Also, as the performance of each block bootstrap
method depends on the blocking parameter and there is a different "opti-
mal" block length in each case, how do the MSEs of the "best" estimators
from each of the four methods compare against one another? In the next
few sections, we describe some theoretical results that provide answers to
some of these questions. As we will see, the empirical results of the numer-
ical examples above are in close agreement with our theoretical findings
described below.

5.3 The Theoretical Framework


Let $\{X_i\}_{i \in \mathbb{Z}}$ be an $\mathbb{R}^d$-valued stationary process with $EX_1 = \mu$. For the theoretical development, we shall work under the "Smooth Function Model"
introduced in Section 4.2. Without loss of generality, we may then suppose
that the level-1 parameter of interest $\theta$ and its estimator $\hat{\theta}_n$ can be expressed as a smooth function of the population and the sample means, $\mu$
and $\bar{X}_n \equiv n^{-1} \sum_{i=1}^{n} X_i$, respectively. As noted in Chapter 4, by considering
suitable transformations of the original observations, this formulation al-

lows us to consider a wide class of estimators under the present framework.
In particular, this includes the sample lag cross-covariance estimators, the
sample autocorrelation estimators, and the Yule-Walker estimators for autoregressive processes. For the rest of this chapter, we suppose that there
exists a smooth function $H : \mathbb{R}^d \to \mathbb{R}$ such that
$$\theta = H(\mu) \quad \text{and} \quad \hat{\theta}_n = H(\bar{X}_n) .$$
Let $\varphi_n$ denote the level-2 parameter of interest that is a functional of
the distribution of $\hat{\theta}_n$. Also, let $\hat{\varphi}_n(j;\ell)$ be the bootstrap estimator of $\varphi_n$
obtained by using the $j$th block bootstrap method with (expected) block
length $\ell$, $j = 1, 2, 3, 4$. As it turns out, the accuracy of $\hat{\varphi}_n(j;\ell)$ depends
on the particular functional $\varphi_n$ of the sampling distribution of $\hat{\theta}_n$ being
estimated. For concreteness, we shall restrict attention to the following two
level-2 parameters:
$$\varphi_{1n} = \mathrm{Bias}(\hat{\theta}_n) \eqno(5.4)$$
and
$$\varphi_{2n} = \mathrm{Var}(\hat{\theta}_n) \eqno(5.5)$$

and compare the performance of the block bootstrap methods for estimating these, using the MSE criterion. Similar results can be proved for the
bootstrap estimators of the distribution function and for certain other functionals (e.g., the quantiles) of the sampling distribution of $\hat{\theta}_n$, although a
different set of regularity conditions and arguments would be needed.
For the sake of completeness, we now briefly describe the specific versions
of the bootstrap estimators of $\varphi_{1n}$ and $\varphi_{2n}$ considered here. Recall that
we index the methods MBB, NBB, CBB, and SB as method number 1, 2, 3,
and 4, respectively, and denote the bootstrap samples generated by the
$j$th method as $X_{j,1}^*, X_{j,2}^*, \ldots$, or as $X_1^{*(j)}, X_2^{*(j)}, \ldots$, as convenient, where
$j = 1, 2, 3, 4$. Let $\ell$ denote the block length for the first three methods
and the expected block length for the SB method. For a given value of $\ell$,
we suppose that $b = \lfloor n/\ell \rfloor$ blocks are resampled for the MBB, the NBB,
and the CBB, resulting in $n_1 \equiv b\ell$ bootstrap observations. Denote the
corresponding sample mean by $\bar{X}_{n,\ell}^{*(j)}$, $j = 1, 2, 3$. Thus, the bootstrap
version of the centered variable $T_n = \hat{\theta}_n - \theta$ under the MBB, the NBB, and
the CBB is given by

(5.6)

where $\bar{X}_{n,\ell}^{*(j)} = n_1^{-1} \sum_{i=1}^{n_1} X_{j,i}^*$.
Next consider the SB method. Since we denote the expected block length
by $\ell$, the block length variables $L_1, L_2, \ldots$ of the SB method are now conditionally iid Geometric random variables with parameter $p = \ell^{-1}$. As a

result, for the comparison here, we consider the SB estimators only corresponding to a finite set of values of the parameter $p \in (0,1)$, which are
reciprocals of an integer $\ell$ in the interval $(1, n)$. Since the typical choice of
$p$ is such that $p \to 0$ as $n \to \infty$, it is possible to find an asymptotically
equivalent choice, $p \sim \ell^{-1}$, for a suitable sequence $\ell \equiv \ell_n \to \infty$, and thus,
this unified framework does not impose a serious restriction.
For a given value of $\ell$, we suppose that under the SB method $K =
\inf\{1 \le k \le n : L_1 + \cdots + L_k \ge n\}$ blocks are resampled, resulting in $N_1 =
L_1 + \cdots + L_K$ bootstrap observations. Let $\bar{X}_{n,\ell}^{*(4)} \equiv n^{-1} \sum_{i=1}^{n} X_{4,i}^*$ denote
the average of the first $n$ SB observations. As noted earlier, $E_* \bar{X}_{n,\ell}^{*(4)} = \bar{X}_n$
for all $\ell$. Hence, we define the bootstrap version of the centered variable
$T_n = \hat{\theta}_n - \theta$ under the SB method by

(5.7)

Note that the level-2 parameters of interest given in (5.4) and (5.5) are
the first two moments of $T_n$, viz., $\varphi_{1n} = \mathrm{Bias}(\hat{\theta}_n) = ET_n$ and $\varphi_{2n} =
\mathrm{Var}(\hat{\theta}_n) = \mathrm{Var}(T_n)$. Hence, the bootstrap estimators of $\varphi_{1n}$ and $\varphi_{2n}$ are
respectively defined as
$$\widehat{\mathrm{BIAS}}_j(\ell) = E_*\big(T_n^{*(j)}\big) \quad \text{and} \quad \widehat{\mathrm{VAR}}_j(\ell) = \mathrm{Var}_*\big(T_n^{*(j)}\big), \quad j = 1, 2, 3, 4.$$
In the next section, we obtain expansions for the MSEs of the block
bootstrap estimators $\widehat{\mathrm{BIAS}}_j(\ell)$ and $\widehat{\mathrm{VAR}}_j(\ell)$, $j = 1, 2, 3, 4$.
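To make the SB estimators concrete, here is a minimal Monte Carlo sketch of the resampling scheme and of $\widehat{\mathrm{BIAS}}_4(\ell)$ and $\widehat{\mathrm{VAR}}_4(\ell)$ for the simplest case $\hat{\theta}_n = \bar{X}_n$ (Python; function names are ours, and for the plain sample mean the bias functional is degenerate since $E_*\bar{X}^{*(4)}_{n,\ell} = \bar{X}_n$, so the sketch is mainly of interest for the variance):

```python
import numpy as np

def sb_sample(x, ell, rng):
    """One Stationary Bootstrap pseudo-series of length n: iid Geometric(p)
    block lengths with p = 1/ell, uniform starting points, circular extension."""
    x = np.asarray(x)
    n = len(x)
    p = 1.0 / ell
    out = np.empty(n)
    pos = 0
    while pos < n:
        start = rng.integers(n)
        length = min(int(rng.geometric(p)), n - pos)
        idx = (start + np.arange(length)) % n       # wrap around the circle
        out[pos:pos + length] = x[idx]
        pos += length
    return out

def sb_bias_var(x, ell, B=500, rng=None):
    """Monte Carlo approximations to BIAS_4(ell) and VAR_4(ell) for the
    sample-mean case, where T_n^{*(4)} = X_bar*_n - X_bar_n."""
    rng = rng or np.random.default_rng()
    xbar = np.mean(x)
    t_star = np.array([sb_sample(x, ell, rng).mean() - xbar for _ in range(B)])
    return t_star.mean(), t_star.var()
```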

5.4 Expansions for the MSEs


For deriving expansions for the MSEs of the bootstrap estimators, note
that for any random variable Y, MSE(Y) = [Bias(YW + Var(Y), and,
hence, an expansion of the MSE can be obtained from expansions for the
bias part and the variance part. Consequently, we look at the bias and
the variance of the bootstrap estimators separately and combine these
to get a single measure of performance in terms of the MSE. Recall the
notation ~(r; 5) = 1 + 2::~=1 n 2r - 1a(n),,/(2r+8) , where a(-) denotes the
strong mixing coefficient of the process {XihEZ. Also, recall that for
a = (a1, ... ,ad)' E zt, we write lal = a1 + ... + ad, a! = rr~=l aj! and
DC< = ax,,~.I.~~x"d . We will use the following conditions for deriving the
1 d
expansions in this section. Here r is a positive integer and its values are
specified in the statements of the results below.

Condition $D_r$  The function $H : \mathbb{R}^d \to \mathbb{R}$ is $r$-times continuously differentiable and $\max\{|D^{\nu} H(x)| : |\nu| = r\} \le C(1 + \|x\|^{a_0})$, $x \in \mathbb{R}^d$, for
some integer $a_0 \ge 1$.

Condition $M_r$  $E\|X_1\|^{2r+\delta} < \infty$ and $\Delta(r;\delta) < \infty$ for some $\delta > 0$.

Then, we have the following result on the bias part of the bootstrap
estimators $\hat{\varphi}_{1n}(j;\ell)$ and $\hat{\varphi}_{2n}(j;\ell)$, $j = 1, 2, 3, 4$.

Theorem 5.1  Assume that $\ell$ is such that $\ell^{-1} + n^{-1/2}\ell = o(1)$ as $n \to \infty$.

(a) Suppose that Condition $D_r$ holds with $r = 3$ and that Condition $M_r$
holds with $r = 3 + a_0$, where $a_0$ is as specified by Condition $D_r$. Then,
$$\mathrm{Bias}\big(\widehat{\mathrm{BIAS}}_j(\ell)\big) = n^{-1}\ell^{-1} A_1 + o(n^{-1}\ell^{-1}) \quad \text{for } j = 1, 2, 3, 4,$$
where $A_1 = -\sum_{|\alpha|=1} \sum_{|\beta|=1} c_{\alpha+\beta} \big( \sum_{j=-\infty}^{\infty} |j|\, E X_1^{\alpha} X_{1+j}^{\beta} \big)$ and $c_{\alpha} =
D^{\alpha} H(\mu)/\alpha!$, $\alpha \in (\mathbb{Z}_+)^d$.

(b) Suppose that Condition $D_r$ holds with $r = 2$ and that Condition $M_r$
holds with $r = 4 + 2a_0$, where $a_0$ is as specified by Condition $D_r$.
Then,
$$\mathrm{Bias}\big(\widehat{\mathrm{VAR}}_j(\ell)\big) = n^{-1}\ell^{-1} A_2 + o(n^{-1}\ell^{-1}) \quad \text{for } j = 1, 2, 3, 4,$$
where $A_2 = -\sum_{j=-\infty}^{\infty} |j|\, E Z_1 Z_{1+j}$, $Z_i = \sum_{|\alpha|=1} c_{\alpha} (X_i - \mu)^{\alpha}$, $i \ge 1$,
and $c_{\alpha}$ is as defined in (a) above.

Proof: See Section 5.7. o

Thus, it follows from Theorem 5.1 that the biases of the bootstrap estimators of $\varphi_{1n}$ and $\varphi_{2n}$ are identical up to the first-order terms for all
four block bootstrap methods considered here. In particular, contrary to
the common belief, the stationarity of the SB observations $X_{4,1}^*, X_{4,2}^*, \ldots$
does not contribute significantly toward reducing the bias of the resulting
bootstrap estimators. Also, the use of either overlapping or nonoverlapping
blocks results in the same amount of bias asymptotically. Since the bias of
a block bootstrap estimator essentially results from replacing the original
data sequence $X_1, \ldots, X_n$ by independent copies of smaller subsequences,
all the methods perform similarly as long as the (expected) lengths $\ell$ of
these subsequences are asymptotically equivalent.
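As a concrete illustration of Theorem 5.1(b): for the sample mean ($H$ the identity, so $Z_i = X_i - \mu$), the constant $A_2$ reduces to $-\sum_j |j|\gamma(j)$, where $\gamma(\cdot)$ is the autocovariance function. The short Python check below evaluates it for an AR(1) process (a sketch; the standard AR(1) autocovariance formula is assumed):

```python
import numpy as np

def bias_constant_A2_ar1(rho, sigma2=1.0, jmax=500):
    """A_2 = -sum_j |j| E Z_1 Z_{1+j} for the sample mean of an AR(1),
    using gamma(j) = sigma2 * rho^|j| / (1 - rho^2)."""
    j = np.arange(1, jmax + 1)
    gamma = sigma2 * rho ** j / (1.0 - rho ** 2)
    return -2.0 * np.sum(j * gamma)

# leading bias of any of the four block bootstrap variance estimators
# is then approximately A_2 / (n * ell), by Theorem 5.1(b)
n, ell, rho = 100, 5, 0.3
print(bias_constant_A2_ar1(rho) / (n * ell))
```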
Next we compare the variances of the block bootstrap estimators of 'PIn
and 'P2n.

Theorem 5.2  Assume that the conditions of Theorem 5.1 on the block
length parameter $\ell$ and on the index $r$ in Conditions $D_r$ and $M_r$ for the
respective parts hold. Then, there exist symmetric, nonnegative real-valued

functions $g_1, g_2$ such that

(a)
$$\mathrm{Var}\big(\widehat{\mathrm{BIAS}}_j(\ell)\big) = \{4\pi^2 g_1(0)/3\}\, n^{-3}\ell + o(n^{-3}\ell), \quad j = 1, 3 ;$$
$$\mathrm{Var}\big(\widehat{\mathrm{BIAS}}_j(\ell)\big) = \{2\pi^2 g_1(0)\}\, n^{-3}\ell + o(n^{-3}\ell), \quad j = 2 ;$$
$$\mathrm{Var}\big(\widehat{\mathrm{BIAS}}_j(\ell)\big) = (2\pi)\Big[2\pi g_1(0) + \int_{-\pi}^{\pi}(1 + e^{\iota w})\, g_1(w)\, dw\Big]\,(n^{-3}\ell) + o(n^{-3}\ell), \quad j = 4 ;$$

(b)
$$\mathrm{Var}\big(\widehat{\mathrm{VAR}}_j(\ell)\big) = \{(2\pi)^2 g_2(0)/3\}\, n^{-3}\ell + o(n^{-3}\ell), \quad j = 1, 3 ;$$
$$\mathrm{Var}\big(\widehat{\mathrm{VAR}}_j(\ell)\big) = \{(2\pi)^2 g_2(0)/2\}\, n^{-3}\ell + o(n^{-3}\ell), \quad j = 2 ;$$
$$\mathrm{Var}\big(\widehat{\mathrm{VAR}}_j(\ell)\big) = (2\pi)\Big[2\pi g_2(0) + \int_{-\pi}^{\pi}(1 + e^{\iota w})\, g_2(w)\, dw\Big]\,(n^{-3}\ell) + o(n^{-3}\ell), \quad j = 4 .$$

Proof: See Section 5.7. o

The definitions of the functions $g_k(\cdot)$, $k = 1, 2$, are somewhat complicated
and are given in Section 5.7 (cf. (5.8)). However, even without their exact
definitions, we can compare the relative magnitudes of the variance parts
of different block bootstrap estimators using Theorem 5.2. From parts (a)
and (b) of Theorem 5.2, we see that the MBB and the CBB estimators
(corresponding to $j = 1, 3$) of $\varphi_{1n} = \mathrm{Bias}(\hat{\theta}_n)$ and $\varphi_{2n} = \mathrm{Var}(\hat{\theta}_n)$ have
variances that are $2/3$ times those of the corresponding NBB estimators. Since the
blocks in the MBB and the CBB are allowed to overlap, the amount of variability among the resampled blocks is smaller, leading to a smaller variance
for these estimators. This advantage of the MBB over the NBB was first
noted in Künsch (1989) (see Remark 3.3 of Künsch (1989)). It is interesting
to note that in spite of all the differences in their resampling mechanisms,
all four block bootstrap methods have the same order of magnitude for the
variances of the resulting estimators. This is particularly surprising in the
case of the SB method, since it introduces additional randomness in the
resampled blocks. The effect of this additional randomness shows up in the
constant of the leading term in the expansion for the variances of the SB
estimators. Since $\Delta_k \equiv \int_{-\pi}^{\pi}(1 + e^{\iota w})\, g_k(w)\, dw \ge 0$ for $k = 1, 2$, it follows
that the SB estimators have asymptotically the largest variances among all
four block bootstrap methods for a given value of $\ell$.

5.5 Theoretical Comparisons


5.5.1 Asymptotic Efficiency
In the following result, we summarize the implications of Theorems 5.1
and 5.2 for the asymptotic relative efficiency of different block bootstrap
estimators in terms of their MSEs. For any two sequences of estimators
$\{\hat{\theta}_{1n}\}_{n \ge 1}$ and $\{\hat{\theta}_{2n}\}_{n \ge 1}$, we define the asymptotic relative efficiency of the
sequence $\{\hat{\theta}_{1n}\}_{n \ge 1}$ with respect to $\{\hat{\theta}_{2n}\}_{n \ge 1}$ as
$$\mathrm{ARE}\big(\hat{\theta}_{1n}; \hat{\theta}_{2n}\big) = \lim_{n \to \infty} \mathrm{MSE}(\hat{\theta}_{2n}) \big/ \mathrm{MSE}(\hat{\theta}_{1n}) .$$
Thus, if $\mathrm{ARE}(\hat{\theta}_{1n}; \hat{\theta}_{2n}) < 1$, then the sequence of estimators $\{\hat{\theta}_{1n}\}_{n \ge 1}$
is less efficient than $\{\hat{\theta}_{2n}\}_{n \ge 1}$ in the sense that the $\hat{\theta}_{1n}$'s have larger MSEs
than the MSEs of the estimators $\hat{\theta}_{2n}$, for large $n$.

Theorem 5.3  Assume that the conditions of Theorems 5.1 and 5.2 hold
and that $A_k \ne 0$, $g_k(0) \ne 0$, $k = 1, 2$.

(i) For $\ell^{-1} + n^{-1/3}\ell = o(1)$, for any $i, j \in \{1, 2, 3, 4\}$, $k = 1, 2$,
$$\mathrm{ARE}\big(\hat{\varphi}_{kn}(i;\ell); \hat{\varphi}_{kn}(j;\ell)\big) = 1 .$$

(ii) For $\ell^{-1} n^{1/3} = o(1)$, and for $k = 1, 2$,
$$\mathrm{ARE}\big(\hat{\varphi}_{kn}(2;\ell); \hat{\varphi}_{kn}(j;\ell)\big) = 2/3 \quad \text{for } j = 1, 3 ;$$
$$\mathrm{ARE}\big(\hat{\varphi}_{kn}(4;\ell); \hat{\varphi}_{kn}(2;\ell)\big)
= \Big[2 + \pi^{-1}\int_{-\pi}^{\pi}(1 + e^{\iota w})\big(g_k(w)/g_k(0)\big)\, dw\Big]^{-1} \in (0, 1/2) .$$

(iii) For $\ell = C n^{1/3}(1 + o(1))$, $C \in (0, \infty)$, and for $k = 1, 2$,
$$\mathrm{ARE}\big(\hat{\varphi}_{kn}(2;\ell); \hat{\varphi}_{kn}(j;\ell)\big)
= \frac{3 + 4\pi^2 C^3 A_k^{-2} g_k(0)}{3 + 6\pi^2 C^3 A_k^{-2} g_k(0)} \in (2/3, 1), \quad j = 1, 3 ;$$
$$\mathrm{ARE}\big(\hat{\varphi}_{kn}(4;\ell); \hat{\varphi}_{kn}(2;\ell)\big)
= \frac{1 + 2\pi^2 C^3 A_k^{-2} g_k(0)}{1 + 2\pi C^3 A_k^{-2}\big[2\pi g_k(0) + \int_{-\pi}^{\pi}(1 + e^{\iota w})\, g_k(w)\, dw\big]} .$$

Proof: A direct consequence of Theorems 5.1 and 5.2. $\Box$

Theorems 5.1-5.3 are due to Lahiri (1999a). Note that the asymptotic relative efficiency of the SB estimators with respect to the MBB and the
CBB estimators in parts (ii) and (iii) of Theorem 5.3 can be found
by the identity $\mathrm{ARE}\big(\hat{\varphi}_{kn}(4;\ell); \hat{\varphi}_{kn}(j;\ell)\big) = \mathrm{ARE}\big(\hat{\varphi}_{kn}(4;\ell); \hat{\varphi}_{kn}(2;\ell)\big) \cdot
\mathrm{ARE}\big(\hat{\varphi}_{kn}(2;\ell); \hat{\varphi}_{kn}(j;\ell)\big)$, $j = 1, 3$, and, hence, these values are not stated separately.
Also note that parts (i) and (ii) of Theorem 5.3 correspond to the cases
where the leading terms of the MSEs of the block bootstrap estimators
are determined solely by their biases and their variances, respectively. It
follows that for smaller values of the block length parameter $\ell$ (i.e., under case (i)), all methods have an ARE of 1 with respect to one another.
For large values of $\ell$ (i.e., under case (ii)), the ARE of the SB is less than
1/2 compared to the other block bootstrap methods based on nonrandom
block lengths. In the intermediate case (i.e., under case (iii)), the MSE has
nontrivial contributions from both the bias part and the variance part. In
this case, the ARE of the NBB or the SB with respect to the MBB and
the CBB lies between 1 and the corresponding limit under case (ii). In
particular, the ARE of the SB estimator $\hat{\varphi}_{kn}(4;\ell)$ with respect to the MBB
estimator $\hat{\varphi}_{kn}(1;\ell)$ under case (iii) lies in the interval $(0,1)$, depending on
the value of the constant $C$ and the function $g_k$.
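The NBB-versus-MBB formula in part (iii) is easy to evaluate numerically. The following Python sketch (with illustrative values of $A_k$ and $g_k(0)$, which in practice would have to be estimated) shows how this ARE moves from 1 toward 2/3 as $C$ grows:

```python
import numpy as np

def are_nbb_vs_mbb(C, A, g0):
    """ARE of the NBB relative to the MBB/CBB when ell = C n^{1/3},
    from Theorem 5.3(iii): a value in (2/3, 1)."""
    t = np.pi ** 2 * C ** 3 * g0 / A ** 2
    return (3.0 + 4.0 * t) / (3.0 + 6.0 * t)

for C in (0.5, 1.0, 2.0, 5.0):
    print(C, are_nbb_vs_mbb(C, A=1.0, g0=0.25))   # illustrative constants
```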

5.5.2 Comparison at Optimal Block Lengths


From Theorems 5.1 and 5.2, we see that for each of the block bootstrap
methods considered here, as the (expected) block length C increases, the
bias of a block bootstrap estimator decreases while its variance increases.
As a result, for each block bootstrap method, there is a critical value of
the block length parameter C that minimizes the MSE. We call the value
of C minimizing the leading terms in the expansion of the MSE as the
(first-order) MSE-optimal block length. Let

C~j = argmin{MSE(SlASj(C)) : n E :s; C :s; n(1-E)/2}

and
19j = argmin{MSE(VARj(C)) : n E :s; C :s; n(I-E)/2} ,

1 :s; j :s; 4 denote the MSE-optimal block lengths for estimating the bias and
the variance of en, i)
where E E (0, is a given number. The following result
gives the optimal block lengths C~j' k = 1,2, j = 1,2,3,4 for estimating
'PIn, 'P2n for the four block bootstrap methods considered in this chapter.

Theorem 5.4 Suppose that the conditions of Theorem 5.3 hold. Then, for
k = 1,2,

$$\ell_{kj}^{0} \sim \big(3 A_k^2 \big/ [2\pi^2 g_k(0)]\big)^{1/3} \cdot n^{1/3}, \quad j = 1, 3 ;$$

$$\ell_{kj}^{0} \sim \big(A_k^2 \big/ [\pi^2 g_k(0)]\big)^{1/3} \cdot n^{1/3}, \quad j = 2 ;$$

$$\ell_{kj}^{0} \sim \Big(A_k^2 \Big/ \Big[2\pi^2 g_k(0) + \pi \int_{-\pi}^{\pi}(1 + e^{\iota w})\, g_k(w)\, dw\Big]\Big)^{1/3} \cdot n^{1/3}, \quad j = 4 .$$

Proof: Follows from Theorems 5.1 and 5.2 and the fact that the function
$h(x) = c_1 x + c_2 x^{-2}$, $x > 0$, with coefficients $c_1 > 0$, $c_2 > 0$, is minimized at
$x_* = (2c_2/c_1)^{1/3}$, with $h(x_*) = 3\,(c_1^2 c_2/4)^{1/3}$. $\Box$
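The minimization used in this proof can be packaged as a one-line rule for the optimal block length. In the sketch below, $A$ stands for $A_k$ and $v$ for the variance constant of the chosen method (e.g., $4\pi^2 g_k(0)/3$ for the MBB/CBB), both of which would have to be estimated in practice:

```python
def optimal_block_length(A, v, n):
    """Minimize MSE(ell) ~ A^2 n^{-2} ell^{-2} + v n^{-3} ell over ell,
    i.e. h(x) = c1 x + c2 x^{-2} with c1 = v n^{-3}, c2 = A^2 n^{-2};
    the minimizer is (2 c2 / c1)^{1/3} = (2 A^2 / v)^{1/3} n^{1/3}."""
    return (2.0 * A ** 2 / v) ** (1.0 / 3.0) * n ** (1.0 / 3.0)
```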

The formulas in Theorem 5.4 for the MBB and the NBB were noted
by Hall, Horowitz and Jing (1995). Note that the optimal block size for
the MBB is larger than that of the NBB by a factor of $(3/2)^{1/3}$. For the
SB variance estimator of the sample mean, Politis and Romano (1994b)
show that the order of the MSE-optimal expected block length is $n^{1/3}$. The
explicit formulas for the optimal block sizes for the SB bias and variance
estimators under the Smooth Function Model are due to Lahiri (1999a).
It is clear from the definitions of $\ell_{kj}^{0}$ that each block bootstrap method
provides the most accurate estimator of the parameter $\varphi_{kn}$ when it is used
with the corresponding optimal block length. In the next result, we compare
the block bootstrap methods at their best possible performances, i.e., when
each method is used to estimate a given parameter with its MSE-optimal
block length.
Theorem 5.5  Suppose that the conditions of Theorem 5.3 hold.

(a) Then, for $k = 1, 2$,
$$\mathrm{MSE}\big(\hat{\varphi}_{kn}(j;\ell_{kj}^{0})\big) = 3^{1/3}\big[2\pi^2 g_k(0) A_k\big]^{2/3} n^{-8/3} + o(n^{-8/3}), \quad j = 1, 3 ;$$

$$\mathrm{MSE}\big(\hat{\varphi}_{kn}(j;\ell_{kj}^{0})\big) = 3\big[\pi^2 g_k(0) A_k\big]^{2/3} n^{-8/3} + o(n^{-8/3}), \quad j = 2 ;$$

$$\mathrm{MSE}\big(\hat{\varphi}_{kn}(j;\ell_{kj}^{0})\big) = 3\Big[\Big\{2\pi^2 g_k(0) + \pi \int_{-\pi}^{\pi}(1 + e^{\iota w})\, g_k(w)\, dw\Big\} A_k\Big]^{2/3}
n^{-8/3} + o(n^{-8/3}), \quad j = 4 .$$

(b) For $k = 1, 2$,
$$\mathrm{ARE}\big(\hat{\varphi}_{kn}(2;\ell_{k2}^{0}); \hat{\varphi}_{kn}(j;\ell_{kj}^{0})\big) = (2/3)^{2/3}, \quad j = 1, 3 ;$$
$$\mathrm{ARE}\big(\hat{\varphi}_{kn}(4;\ell_{k4}^{0}); \hat{\varphi}_{kn}(2;\ell_{k2}^{0})\big)
= \Big[\pi g_k(0) \Big/ \Big\{2\pi g_k(0) + \int_{-\pi}^{\pi}(1 + e^{\iota w})\, g_k(w)\, dw\Big\}\Big]^{2/3} .$$

Proof of Theorem 5.5: Follows from Theorems 5.1 and 5.2, and the
proof of Theorem 5.4. $\Box$

Theorem 5.5 shows that when each method is used with the corresponding MSE-optimal value of $\ell$, the MBB and the CBB have an optimal MSE
that is $(2/3)^{2/3}$ times the optimal MSE for the NBB, and the
MSE of the optimal NBB estimator is, in turn, at most $2^{-2/3}$ times
that of the optimal SB estimator.
The following result shows that the ARE of the SB with respect to the
NBB at the optimal block length admits a lower bound.

Theorem 5.6  Assume that the conditions of Theorem 5.3 hold. Then, for
$k = 1, 2$,

Proof of Theorem 5.6: By the Cauchy-Schwarz inequality, the maximum
value of the integral $\int_{-\pi}^{\pi}(1 + \cos w)\, g_k(w)\, dw$ is attained if and only if $g_k(w) =
c_0 \cdot (1 + \cos w)$ for almost all $w \in (-\pi, \pi)$ (with respect to the Lebesgue
measure) for some $c_0 \in \mathbb{R}$. Since $g_k(\cdot)$ is continuous, letting $w \to 0$, we get
$c_0 = 1/2$. Hence, from Theorem 5.5(b), for $k = 1, 2$,

Theorem 5.5 is due to Lahiri (1999a). The lower bound result in Theorem
5.6 is due to Politis and White (2003).

5.6 Concluding Remarks


The results of Sections 5.3-5.5 show that in terms of their MSEs, the MBB
and the CBB estimators outperform the NBB estimators, which in turn
outperform the SB estimators. This conclusion is valid as long as the corresponding (expected) block lengths grow at a rate not slower than the
optimal block length. For estimating the bias and the variance functionals
$\varphi_{1n}$ and $\varphi_{2n}$, this optimal rate is $\mathrm{const.}\, n^{1/3}$, where $n$ denotes the sample size. When the respective block lengths grow at a slower rate than
the optimal rate, the main contribution to the MSE comes from the bias

part. In this case, the performance of all four methods is comparable, with
all the AREs being equal to 1. This is a simple consequence of the fact
that, asymptotically, the biases of the bootstrap estimators derived from
all four methods have the same leading term. The finite sample simulation example of Section 5.2 also supports this observation. When the block
bootstrap methods are used with block lengths close to the corresponding optimal block lengths, the MBB and the CBB give the most accurate
bootstrap estimators.
Going beyond the bias and the variance of $\hat{\theta}_n$, it is possible to carry out a
comparison of the block bootstrap methods for more complicated functionals (e.g., quantiles) of the sampling distribution of (a suitably studentized
version of) $\hat{\theta}_n$. For estimating the distribution function and quantiles of a
studentized version of $\hat{\theta}_n$, the optimal block length is of the form $\mathrm{const.}\, n^{1/4}$
for all four block bootstrap methods (cf. Hall, Horowitz and Jing (1995),
Lahiri (1999c, 2003c)). In this case, the AREs of the block bootstrap distribution function estimators have an ordering that is exactly the same as that
for the bias and the variance functionals. Indeed, for block lengths growing
at a rate not slower than $\mathrm{const.}\, n^{1/4}$, the MBB and the CBB are the most
accurate among the four block bootstrap methods.
The results above show optimality of the MBB and the CBB only
among the four methods considered above. Carlstein, Do, Hall, Hesterberg
and Künsch (1998) have proposed a block bootstrap method, called the
Matched Block Bootstrap (MaBB), where the bootstrap blocks are resampled using a Markov chain. Thus, unlike the four block bootstrap methods
covered here, the resampled blocks under the MaBB are dependent. Under some structural assumptions on the underlying process (e.g., AR(p)
or Markov), Carlstein et al. (1998) show that the MaBB estimator of the
variance of the sample mean has a variance that is of a comparable order
to the variance of the NBB estimator and has a bias that is of smaller order. Thus, the minimum MSE of the MaBB is of a smaller order than the
minimum MSEs for the four methods considered here. Consequently, for
processes with a Markovian structure, the MaBB outperforms the above
methods at the respective optimal block sizes. For more general time series that may not necessarily have a Markovian structure, Paparoditis and
Politis (2001, 2002) recently proposed a method, called the Tapered Block
Bootstrap (TaBB) method, and showed that the TaBB method yields a
more accurate estimator of the variance-type level-2 parameters than do
the MBB and the CBB methods.

5.7 Proofs
Let $S(i;k) = \sum_{j=i}^{i+k-1} Y_{n,j}$, $i, k \in \mathbb{N}$, denote the partial sums of the periodically extended time series $\{Y_{n,i}\}_{i \ge 1}$. Recall that $\zeta_s = (E\|X_1\|^s)^{1/s}$, $s > 0$,

and for $r \in \mathbb{Z}_+$ and $\delta \in (0, \infty)$, let $\Delta(r;\delta) = 1 + \sum_{n=1}^{\infty} n^{2r-1}[\alpha(n)]^{\delta/(2r+\delta)}$. Let
$\hat{\mu}(j;\ell) = E_* \bar{X}_{n,\ell}^{*(j)}$, $\hat{c}_{\alpha,j} = D^{\alpha} H(\hat{\mu}(j;\ell))$, $1 \le j \le 4$, $\tilde{c}_{\alpha} = D^{\alpha} H(\bar{X}_n)$, $\alpha \in
\mathbb{Z}_+^d$, and $\Sigma_{\infty} = \lim_{n \to \infty} \mathrm{Cov}(\sqrt{n}\, \bar{X}_n)$.
Under Condition $M_r$ for any $r \in \mathbb{N}$, $\{X_i\}_{i \in \mathbb{Z}}$ has a (continuous) spectral
density matrix $f(\cdot)$, defined by the relation
$$\mathrm{Cov}(X_1, X_{1+k}) = \int_{-\pi}^{\pi} e^{\iota k w} f(w)\, dw , \quad k \in \mathbb{Z} .$$
We index the components of the $d \times d$ matrix-valued function $f(x)$ by unit
vectors $\alpha, \beta \in \mathbb{Z}_+^d$ as $f(x;\alpha,\beta)$, $|\alpha| = 1 = |\beta|$. Next define
$$g_1(w) = \sum_{|\alpha|=1} \sum_{|\beta|=1} \sum_{|\gamma|=1} \sum_{|\nu|=1} c_{\alpha+\beta}\, c_{\gamma+\nu}\, \big\{ f(w;\alpha,\gamma)\bar{f}(w;\beta,\nu)
+ f(w;\alpha,\nu)\bar{f}(w;\beta,\gamma)
+ f(w;\beta,\gamma)\bar{f}(w;\alpha,\nu)
+ f(w;\beta,\nu)\bar{f}(w;\alpha,\gamma) \big\} , \eqno(5.8)$$
$-\pi \le w \le \pi$, where for any complex number $z = u + \iota v$, $u, v \in \mathbb{R}$,
$\bar{z} \equiv u - \iota v$ denotes its complex conjugate. Next let $g_2(w)$ be the function
obtained by replacing $c_{\alpha+\beta}$, $c_{\gamma+\nu}$ in the definition of $g_1(w)$ by $c_{\alpha} c_{\beta}$, $c_{\gamma} c_{\nu}$,
respectively. Note that, since $f(w;\alpha,\beta) = \bar{f}(w;\beta,\alpha) = f(-w;\beta,\alpha)$ for
all $\alpha, \beta$, the functions $g_1(w)$ and $g_2(w)$ are real valued and are symmetric
about zero.
For clarity of exposition, we separately present the proofs of the main
results for the block bootstrap methods based on nonrandom block lengths
in Section 5.7.1 and for the SB in Section 5.7.2.

5.7.1 Proofs of Theorems 5.1-5.2 for the MBB, the NBB, and the CBB
Lemma 5.1  Assume that $\ell = O(n^{1-\epsilon})$ for some $0 < \epsilon < 1$, $E\|X_1\|^{2r+\delta} <
\infty$ and $\Delta(r;\delta) < \infty$, for some positive integer $r$ and for some $\delta > 0$. Then,

(i) $E\{E_*\|S(I_{j,1};\ell)\|^{2r}\} \le C(r,d)\, \zeta_{2r+\delta}^{2r}\, \Delta(r;\delta) \cdot \ell^{r}$ for $j = 1, 2, 3$.

(ii) $E\|\hat{\mu}(j;\ell)\|^{2r} \le C(r,d)\, \zeta_{2r+\delta}^{2r}\, \Delta(r;\delta) \cdot n^{-r}$ for $j = 1, 2, 3$.

(iii) $E\{E_*\|\bar{X}_{n,\ell}^{*(j)}\|^{2r}\} \le C(r,d)\, \zeta_{2r+\delta}^{2r}\, \Delta(r;\delta) \cdot n^{-r}$, $j = 1, 2, 3$.

Proof: We prove part (i) first. Note that by Lemma 3.2, for $j = 1$, we get
$$E\{E_*\|S(I_{1,1};\ell)\|^{2r}\} = E\Big\{ N^{-1} \sum_{i=1}^{N} \|S(i;\ell)\|^{2r} \Big\} = E\|S(1;\ell)\|^{2r}
\le C(d,r)\, \zeta_{2r+\delta}^{2r}\, \Delta(r;\delta)\, \ell^{r}. \eqno(5.9)$$



The proof is similar for $j = 2$. For $j = 3$, note that for any $\nu \in \mathbb{Z}_+^d$ and any
$1 < m < n/2$,
$$\big|E_*\big(S(I_{3,1}; m)\big)^{\nu} - E_*\big(S(I_{1,1}; m)\big)^{\nu}\big|
\le C(\nu)\Big[ n^{-1} \sum_{i=1}^{m} \big\{ \|S(n - i + 1; i)\|^{|\nu|} + \|S(1; m - i)\|^{|\nu|} \big\}
+ n^{-2} m \sum_{i=n-m+1}^{n} \|S(i; m)\|^{|\nu|} \Big] . \eqno(5.10)$$
Hence, the bound for $j = 3$ follows from (5.9), (5.10), and Lemma 3.2. This
proves part (i).
As for part (ii), note that for all $j \in \{1, 2, 3\}$, $\hat{\mu}(j;\ell) = \sum_{i=1}^{n} w_{ijn} X_i$ for
some nonrandom weights $w_{ijn}$ with $|w_{ijn}| \le 1$ for all $i, j, n$. Hence, using
cumulant expansions for moments and Lemma 3.2, we get (ii).
Part (iii) is a consequence of parts (i) and (ii) and the following result
(cf. Lemma III.3.1 of Ibragimov and Hasminskii (1980)): For zero mean
independent random variables $W_1, \ldots, W_m$ and for any integer $r \ge 1$,
$$E\Big\| \sum_{i=1}^{m} W_i \Big\|^{2r} \le C(r)\Big[ \sum_{i=1}^{m} E\|W_i\|^{2r} + \Big( \sum_{i=1}^{m} E\|W_i\|^{2} \Big)^{r} \Big] . \eqno(5.11)$$
This completes the proof of Lemma 5.1. $\Box$

Proof of Theorem 5.1 for $j = 1, 2, 3$: We prove the theorem only for the
bias estimators $\hat{\varphi}_{1n}(j;\ell)$, $j = 1, 2, 3$. The proof for the variance estimators $\hat{\varphi}_{2n}(j;\ell)$, $j = 1, 2, 3$, is similar and, hence, is omitted. Without loss of
generality, suppose that $\mu = 0$. (Otherwise, replace the $X_i$'s with $(X_i - \mu)$'s
in every step below.) Note that by Taylor's expansion of $H(\bar{X}_{n,\ell}^{*(j)})$ around
$\hat{\mu}(j;\ell)$, we have

1.{!1n (j; f)

L Ca,j { E* (x~~P - fl(j; f) ) a}


lal=2
+3 L (a!)-IE*(X~~P -fl(j;flf
lal=3

X 1\1 - U)2 D a H(fl(j; f) + uX~~P)du . (5.12)

For j E {I, 2, 3}, conditional on X n , X*Ct)


n,<
is the average of b iid random
variables and, hence, we may further simplify (5.12) as

1.{!1n(j;f) = b- I g- 2 L caE*(S(Ij,l;f)t + RIn(j;f) , (5.13)


lal=2

where, after some lengthy and routine algebra, the remainder term R1n (j; £)
can be shown to satisfy the inequality
IR 1n (j; £)1
< C((2, ao, d){ b- 1 11fl(j; £)11 2
+ (1 + IIfl(j;£)llao) ·IIP(j;£)II(b- 1E*IIS(Ij,1;£)/£11 2 )
+ E* (1 + IIP(j; £)llaO + IIX~:P lIao) IIX~:P 11 3 } . (5.14)

By Lemma 5.1 and Holder's inequality


E(R 1n (j; £))2
< C( (2, ao, d) [b- 2 EIIP(j; £) 114
+ £-4b- 2 (Ellfl(j; £) 114) 1/2 (E(E* II S(Ij1; £) 11 8 )) 1/2
(2+ 2ao)
J) (6+ 2a
4

+g-4b-2(EIIP(j;£)116+2ao) (6+ 2a O) (E[E*IIS(Ij1;£)1I6+2a 0 O)

+ E { E. IIX~:P 11 6 + E. IIX~:P 116+ 2ao}


~ _6_

+ (EIIjl(j;£)116+2ao) (6+ 2a O) (E( E.IIX~:PII6+2ao) (6+


2ao ))]

< C((2, aQ, d) [n- 3 + n- 4 £2] . (5.15)


Hence, Theorem 5.1 follows from (5.13) and (5.15) for $j \in \{1, 2, 3\}$, by
noting that $\mathrm{Bias}(\hat{\theta}_n) \equiv E\hat{\theta}_n - \theta = n^{-1} \sum_{|\alpha|=2} c_{\alpha}\, E Z_{\infty}^{\alpha} + O(n^{-2})$. $\Box$

For proving Theorem 5.2 for $j = 1, 2, 3$, we need a lemma.


Lemma 5.2 Suppose that Condition Mr holds jor some r 2 2 and that
f1 = 0. Then, jor any integers p 2 1, q 2 1 with p + q < 2r and jor any
t1, ... ,tp, Sl,···,Sq ER. d with Iitill:::: 1, Iisill:::: 1,

(g (g
£~~ g-1 t,E{ t~Ull) S~U1(Hj)) }
= L L {b(III + IJI;p + q)EZoo(IC)EZoo(JC)EZoo(I)Zoo(J)} ,
I ]

where b(k; m) = fo1 x k/ 2 (I-x)(m-k)/2dx, m 2 k 2 0, Zoo(I) = TIiEI t~Zoo,


Zoo(J) = TI jE ] sjZoo jor I C {I, ... ,p}, J C {I, ... , q}, Zoo '" N(O, I: oo ),
and where the summations LI and LJ respectively extend over all subsets
I C {I, ... ,p} and J C {I, ... , q}.
Proof: Let S(k; m) = X k + ... + X m , and for a set A, let

V(k, m; A) = II t~S(k; m) ;
iEA

W(k, mj A) := II s~S(kj m) ,
iEA

m 2::: k 2::: 1. Then, for £1/2 :<: ; j :<: ; £ - £1/2, setting m := L£1/4 J, we have

~j,£ := E{ (g t~Ul1 (11 S~U1(Hj»)


) }

r(p+q)/2 E [{gt~(S(ljj - m) + S(j - m + 1jj)

+ S(j + 1j£)) }{ 11 s~(S(j + 1j£) + S(£ + 1j£ + m)

+ S(£ + m + 1j £ + j)) }]

£-(p+q)/2 L LE{ V(l,j - mj IC)V(j + 1,£j I)


I J

X W(j + 1,£j J)W(£ + m + 1,£ + jj JC)}


+ Qn,j
where, by Holder's inequality and Lemma 3.2,

max {IQn,j I : £1/2 :<: ; j :<::; £ - £1/2}

< C(p, q) . £-E.:fL max { ( EIIS(l, k) IIp+q) p+q ( EIIS(l, i) IIp+q) 1- p+q :

1 :<: ; k :<: ; m, 1 :<: ; i :<: ; £, 1:<::; a :<: ; p+ q}


P+ q
< C(r, 8)(2r+8~(rj 8) . £- .tl9.
2 max { ( m .tl9.
2
) --!L (
p+q £E±!l)
2
1- --!L
p+q :

l:<::;a:<::;p+q}

O([m/£p/2) = 0(1) as £ ----) 00 . (5.16)

Note that the variables V(l,j - m;IC), V(j + 1,£jI)W(j + 1,£j1), and
W(£+m+1, £+jj JC) are functions of disjoint sets of Xi-variables, which are
separated by m many Xi-variables from one another. Hence, by Proposition
3.1, Lemma 3.2, and Holder's inequality, with "y = 2r - (p + q) > 0, we get

I~j,£ - [rE.:fL L L EV(l,j - m;IC)


I J

X E{V(j + 1,£;I)W(j + 1,£jI)}EW(1,j - mj JC)] I

< C(p, q)rE.:fL LL I J


a(m -l)p+~+'Y 1lllS(ljj - m)IIII WI
p+q+,

x 1111 8 (1; £ - j)IIIIIII+IJIIII18(1;j - m)11111J"1 + Qn,j


p+q+~ p+q+~

< C(r,o)(~:-:(d~(r,o)]. a(m _lp/2r + Qn,j (5.17)


uniformly in £1/2 ::; j ::; £ _ £1/2.
Next note that by the Central Limit Theorem for strong mixing random
variables (cf. Theorem A.8, Appendix A), n- 1/ 28(1; n) ---,>d Zoo as n --t 00
and by Lemma 3.2, [n- 1 / 2 8(1, n)]V is uniformly integrable for any v E ;l4
with Ivl < 2r. Hence, it follows that

lim E [C 1 / 2 8(1,
....... 00
i)] v = EZ~ (5.18)

for any v E Z~ with Ivl < 2r. Hence, the lemma follows from (5.16)-(5.18)
by observing that for any J, J,

£-(p+q+1) L EV(l,j - m; JC)E{ V(j + 1, £; J)W(j + 1, £; J)}


RJ/2 5,j 5,i-i 1/ 2
x EW(l,j - m; JC)

{-J-m}-2 {{.-J}-2
. IICI+IJCI 0 • III+IJI
£-1 £- -£- U(j_m)V(£_j)
£1/25,j5,£-£1/2

[ior1 x (II" 1+iJ"1l


2 (1 - x)
.L!.l±J..:!J.]
2 dx uoovoo(1 + 0(1)) as £--t00,

where
Uj EVj(JC)EWj(JC)/j (IICltIJCll ,

Vj E{Vj(I)Wj(I)}/jC!lltIJll, j 2: 1 ,

and Uoo and Voo denote the limits of the sequences {Un}n~1 and {Vn}n~1
(cf. (5.18)), respectively. 0

Proof of Theorem 5.2 for $j = 1, 2, 3$: As in the proof of Theorem 5.1,
we shall restrict attention to the bootstrap bias estimator $\hat{\varphi}_{1n}(j;\ell)$. Write
$T_{n,j} = b^{-1}\ell^{-2} \sum_{|\alpha|=2} c_{\alpha}\, E_*\big(S(I_{j,1};\ell)\big)^{\alpha}$, $1 \le j \le 3$. Note that by (5.13)
and the Cauchy-Schwarz inequality,
$$\big|\mathrm{Var}\big(\hat{\varphi}_{1n}(j;\ell)\big) - \mathrm{Var}(T_{n,j})\big|
\le \big\|R_{1n}(j;\ell)\big\|_2^2 + 2\,\big[\mathrm{Var}(T_{n,j})\big]^{1/2}\, \big\|R_{1n}(j;\ell)\big\|_2 .$$

Hence, in view of (5.15), it is enough to show that
$$\mathrm{Var}(T_{n,j}) = \Big[\frac{4\pi^2}{3}\, g_1(0)\Big]\, (n^{-3}\ell)\,(1 + o(1)), \quad j = 1, 3 , \eqno(5.19)$$



$$\mathrm{Var}(T_{n,j}) = \big[2\pi^2 g_1(0)\big]\, (n^{-3}\ell)\,(1 + o(1)), \quad j = 2 . \eqno(5.20)$$

First consider (5.19) with j = 1. With Uli == S(i;g)/V£, i E Z, by Lemma


5.2, we get
N
b- 2g- 2 var( L C<>[N- l LUliJ)
1<>1=2 i=l
£
N- l b- 2g- 2 L L C<>CfJ[Lcov(u1l ,Ufi+ l )]
1<>1=2IfJl=2 i=-£
+Ql1n

2b- 2g- 1 N- 1 L L c<>CfJ[b(4;4)EZ~+fJ


1<>1=2 IfJl=2
+ {2· b(2, 4) + b(O, 4) - l}EZ~EZ!] (1 + 0(1)) + Qlln
2
"3 b- 2g- l N- 1 L L c<>cfJCov(Z~, Z!)[l + 0(1)]
1<>1=2IfJl=2
+ Qlln , (5.21)

where the remainder term Ql1n is defined by subtraction and by Proposi-


tion 3.1, it satisfies the inequality
N-l 2

IQl1n I ::; Cb- 2g-2 N- l [ L a( i _ g)(a o+l)/(ao+3) ( E11U1l116+2ao ) ao+3


i=£+1

+ N- 1 g2 EllUl1 114]
0(n- 3g) .

Now (5.19) follows from (5.21) for the case j = 1.


Next consider (5.19) with j = 3. By (5.10) and Lemma 3.2, with DH =
max{ID<> H(JL)I : 0::; lal ::; 3}, we have

r
Elb- l g- 2 L c<>(E*(S(I3,l;g)t -E*(S(h,l;g)t)1

t,
1<>1=2

< C(D H ,d)n- 2g- 2 [n- 2E{ IIS(1;i)11 2



o(n- r [n- .e + n- .e N
2 2 2 4 4 2 2 .e2 ])

O(n- 4 .e2 ) . (5.22)

The expansion for Var('Pln(j; .e)), j = 3 now follows from (5.22) and the
result for the case j = l.
To prove (5.20), note that for j = 2, with U};) = S((i - 1).e + 1; .e)/V£
and V2i = LI"'I=2 C", [U};)]"', i E Z,

Var(Tn,j) = b- 2.e- 2var( L C",{ b- 1


1"'1=2
t
.=1
[U};)]'" })

b- 2.e- 2 b- 1Var(V2d + Q21n


b- 3 .e- 2 L L C",cf3Cov(Z~, Z!)(1 + 0(1)) + Q21n
1"'1=21f31=2
where, by Proposition 3.1, with a = ao,
b-l

IQ21nl < cb- 3 r 2 L ICOV(V21, V2 (Hl»)1


i=1
< Cb- 3 .e- 2 [ICOV(V21, V22 )1

+ t; a(i.e - .e)Wa
b-l
(EIIUl1 ll6+2a)
2
a+3
]

< Cb- 3 .e- 2 L L IC",Cf3I·ICov(Ufl,uf(Hl»)1 +0(b- 3 .e- 2 )


1"'1=21f31=2
0(n- 3.e) ,

provided we show that ICov( Ufl, U~Hl») I = 0(1) for all lal = 2 = 1.81·
This can be done using arguments similar to those used in the proof
of Lemma 5.2. More precisely, writing U1 (Hl) as the sum U1 (Hl) =
.e- 1/ 2 [8(.e + 1,.e + m) + 8(.e + m + 1,2.e)] with m = l.e 1/ 4J, we have for
any lal = 1.81 = 2,

ICov( Ufl' U~Hl») I


< ICov(Ufl,.e- 1 8(.e+m+l;2£)f3)1

+ 4rl (Ell Ul1 114) 1/2 (Ell 8(1; m) 114) 1/4 (Ell 8(1;.e _ m) 114) 1/4

+ 2£-1 (EllUttll) 1/2 (EI18(1; m)11 4) 1/2



C(EII Uu I1 +<5) 4-FO (EII.e- 1/ 2S(1;.e _ m)11 +<5)


2 2

< 4 4 4+6 [a(m)]4!"


+ O(m1/2.e- 1/ 2) + O(m.e- 1)
0(1) .
This completes the proof of (5.20), and hence, of Theorem 5.2. 0

5.7.2 Proofs of Theorems 5.1-5.2 for the SB


Lemma 5.3  Assume that $\ell = O(n^{1-\epsilon})$ for some $0 < \epsilon < 1$, $E\|X_1\|^{2r+\delta} <
\infty$ and $\Delta(r;\delta) < \infty$, for some positive integer $r$ and for some $\delta > 0$. Then,

(i) $E\{E_*\|S(I_{4,1}; L_1)\|^{2r}\} \le C(r,d)\, \zeta_{2r+\delta}^{2r}\, \Delta(r;\delta) \cdot \ell^{r}$ .

(ii) $E\{E_*\|\bar{X}_{n,\ell}^{*(4)}\|^{2r}\} \le C(r,d)\, \zeta_{2r+\delta}^{2r}\, \Delta(r;\delta) \cdot n^{-r}\,\big(1 + \ell\,(np)^{-2r}\big)$ .


Proof: Note that the (conditional) distributions of h,l and 14 ,1 are the
same and that 14 ,1 and L1 are independent. Hence, by part (i) of Lemma
5.2 with j = 3, by Lemma 3.4, and by the stationarity of the Xi's, we get

E(E*118(1 ,1; L 1)11 2r)


4

E [f1 {E*118(14 ,1; m)11 2r . p(l- p)m-1}]

2r£ logn
< L E{E*118(I3,1;m)11 2rp(1-p)m-1}
m=l

+ C(r) . max {EI18(1; k)11 2r : 1 ::; k ::; n} exp( -2r logn)


2rl! logn ]
< C(r, d)~(r; d)(i;+<5 [ ~ mrp(l - p)m-1 + n r exp( -2r logn)

::; C(r, d)~(r; d)(i;+<5£T ,


proving (i).
To prove (ii), write Rj,n = L~l X},i - nX~~), for the sum of the excess
(N1 - n) many bootstrap data-values beyond Xl, 1, ... , X},n, j = 4. Note
that
LK
R 4,n = L W;Yn,J(i) ,
i=l

where J(i) = 14 ,K + i - I and w;


= 0 if L1 + ... + LK-1 + i ::; nand = 1,
otherwise. Without loss of generality, assume that the bootstrap variables
14 ,1, ... ,14 ,n and L 1 , ... , Ln are also defined on the same probability space
(0, F, P), supporting the sequence {XihEZ. Let en = (7(L 1 , ••. , L n ), Xn =

O'(XI , ... , Xn) and Tn = .en V Xn = the smallest O'-field containing both
.en and Xn , n ~ 1, as in Chapter 3. Then, R 4,n may be considered as a
random vector on (n, F, P) and

E{ E(IIR~,nI12rIXn)} = E(IIR~,nI12r)
E[E{E(IIR~,nI12rl7;,)I.en}] .
Note that the random variables {I4,I,oo.,I4,n}, {LI,oo.,L n }, and
{Xl, ... ,Xn } are all independent. Hence, it follows that conditional on
.en, LK and wi's may be treated as nonrandom quantities. Consequently,
by Lemma 3.2,

E{ E(IIR~,nI12rITn) l.en }

E ( n- l ~ II ~ w;Yn ,j+i_ 1 1 2r l.e n)

< C(r) max { Ell ~aixil12r : 1:::; m:::; LK /\n, ai E {D, I}}
< C(r)(~;+lifl(r; J)LK .
Therefore, by Lemma 3.4,

Note that L~l Li :::; n+LK and that conditional on 7;" S(I4,i; Li)-LiXn,
1 :::; i :::; K are zero mean independent (but not necessarily identically
distributed) random vectors. By Lemma 3.4, (5.11), and the inequality
above (5.23),

E ( E* IIX~::) 112r)

n- 2r E{ E* II t, S(I4,i; Li ) + R~,n 112r}


< C(r)n-2r [E(E*IIR~,nI12r) + E{ (E*I t,LiI2r) .IIXnI12r}
+ E{ E* II t,(S(I4,i; Li) - LiXn)112r}]

< C(r)n-2r(~;+lifl(r; J){ E(L K ) + (n 2r + EL~)n-r}

+ C(r)n-2r E( E{ E(II t,(S(I4,i; Li ) - Li X n)11 2r ITn) IXn})



< C(r)(~;+oLl(r; 8)n- r [1 + n- 2r E(LK )2r]


+ C(r)n-2r E( E{ K r- 1 t, E(IIS(I4 ,i; Li)11 2r ITn)l;t'n })

< C(r)(~;+oLl(r; 8)n- r [1 + n-2rf2r+1]


K
+ C(r)n-2r E{ K r- 1 ~ E(IIS(I4 ,i; Li)112rl.cn)}

< C(r)(~;+oLl(r; 8)n- r [1 + n- 2r f2r+1]


+ C(r)n-2r E [K r- 1 t, {(EIIS(l; Li)112rl.cn)1l(Li ::; 4rf logn)

+ max{EIIS(l; m)11 2 r : 1 ::; m ::; n}ll(Li > 4rf IOgn)}]


K
< C(r)(~;+oLl(r; 8) . n- r [1 + f(np)-2r + n- r { E( K r - 1 ~ L~)

+ E( n r - 1 t {n r . ll(Li ~ 4rf logn)}) }]


< C(r)d;+oLl(r; 8)n- r [1 + f(np)-2r] .
This completes the proof of Lemma 5.3. D

Lemma 5.4 Let 9 : (-7f, 7f] ---t [0,00) be a continuous function that is

symmetric about zero. Then, with p = f- 1 , q = 1 - p,

I: [T1
(i) nl!...~l: [e'W/(l- qe'W)]g(w)dw

+ cosw + (cos(w/2))2]g(w)dw .

I:
= 7fg(O)

nl!...~p.
I: I:
(ii) [e'w /(1- qe'W)] [e'2W /(1- q2e'2W)]g(w)dw

+ 4y2)-2dy - (1 + 4 y2 )-1 dY] .

I:
= g(O) [2 (1

(iii) [e'2W /(1- q2e'2W)]g(w)dw = 0(1) as n ---t 00 .

Proof: (i) Since 9 is real and symmetric, for any M > 1, we have
i:

(1 + q2 - 2qcosw)-1 [(1- qcosw) cosw + q(sin w)2]g(w)dw


( Jlwl~Mp r
r + JMp<lwl<M-l + r ) (p2+4qSin2(w/2»)-1
JM-l<lwl<7I'
. [(p + 2q sin 2 (w/2») cos w + q(sin w)2] g( w)dw
h(M) + h(M) + h(M), say. (5.24)

Using the change of variable y = w/2p and the Bounded Convergence


Theorem (cf. Theorem 16.5, Billingsley (1995», one can show that for any
M> I, as n -7 00,

()
h M = 2 1lyl<M/2
(1+2qp-1sin2yp)cos2py+qp-1(sin22yp)
. 2
1 + 4qy2[(py)-2 sm py]

g(2py)dy

= 2g (0)·1 (1 + 4y2)-1dy + 0(1) . (5.25)


IYI<M/2

Since x/3 < sinx for all x E (0, 7r /2]' for any M > I, we have

h(M) < C JMP~W<M-l (p ::2) g(w)dw

< C[ r
JM<y«Mp)-'
y-2g(py)dy +
JO<W<M-l
r
g(W)dW]

< Cmax{g(w): 0 < w < M- 1} . M- 1 . (5.26)

Finally, by the bounded convergence Theorem, for any M > 1,

lim h(M)
n->oo
r
JM-l<lwl<7I'
(4sin 2 ~ )-1 [(2 sin 2 ~) cosw + (sin W)2] g(w)dw
2 2
.

(5.27)

Part(i) follows from (5.24)-(5.27), by letting M -7 00 and by noting that


I~oo(l + 4y2)-1dy = 7r/2.
Next, consider part (ii). Let

h ( ) = p2(2-p)-2p(2-p)sin 2(w/2)+2(sin 2 w)(q-2cosw) ( )


n W - [p2(1+q)2+4q2 sin2 w][p2+4qsin2(w/2)] g W ,

WE(-7r,7r).

Then, using the symmetry of g(w), it can be shown that the integral on the
left side of (ii) equals D7I' hn(w)dw. As before, we split this integral into

three parts, now ranging over the sets [-Mp,Mp], {w: Mp < Iwl < 7f/2},
and {w : 7f /2 ::::: Iwl < 7f}, where M > 1. Arguments similar to (5.25) and
(5.26) yield,

lim p. r
Jlwl~Mp
hn(w)dw

(1
n-+=

2 2 - 8Y2 dy ) g(O) (5.28)


IYI~M/2 [4 + 16y2][1 + 4y2]
and

< Cq-3 [1 Mp<lwl<7I'/2


[P2 +4w 2 ] g(w)dw ]
w

< Cq-3 p -1 [iM= u- 4du + iM= U- 2dU] . Ilgll= , (5.29)

for any M > 1.


For the third region, note that for n large,

171'/2<lwl<7I'
Ihn(w)ldw

< C 171' (p + sin 2 w) . g(w)dw


71'/2 [P2(1 + q)2 + 4q2 sin 2 w] . w 2
< C r/ 2 g(7r - w)(p + sin 2 w) dw
Jo [P2 + 4 sin2 w]
< C, (5.30)
by arguments similar to (5.25) and (5.27). Part (ii) now follows from (5.28)-
(5.30).
Part (iii) also follows by similar arguments. We omit the details. D

Proof of Theorem 5.1, $j = 4$: We prove the theorem only for $\varphi_{1n}$. Without loss of generality, let $\mu = 0$. Note that the SB observations $\{X_{4,i}^*\}_{i \ge 1}$
form a stationary dependent sequence. As in Section 5.7.1, using Taylor's
expansion, for $j = 4$, we have (cf. (5.13)),
$$\hat{\varphi}_{1n}(j;\ell) = \sum_{|\alpha|=2} \tilde{c}_{\alpha}\, E_*\big(\bar{X}_{n,\ell}^{*(j)} - \bar{X}_n\big)^{\alpha} + R_{1n}(j;\ell) , \eqno(5.31)$$
where the remainder term $R_{1n}(j;\ell)$ now admits the bound
$$|R_{1n}(j;\ell)| \le C(a_0, d, \zeta_2)\Big\{ \|\bar{X}_n\|\,(1 + \|\bar{X}_n\|^{a_0})\, E_*\big\|\bar{X}_{n,\ell}^{*(j)}\big\|^{2}
+ \big(1 + \|\bar{X}_n\|^{a_0} + \big\|\bar{X}_{n,\ell}^{*(j)}\big\|^{a_0}\big)\, E_*\big\|\bar{X}_{n,\ell}^{*(j)}\big\|^{3} \Big\} .$$

Hence, using Holder's inequality, Proposition 3.1, and Lemma 5.3 as in the
derivation of (5.15), for the case j = 4 we have,

E(R1n(j; £)) 2

< C(ao, d, (2) [{ EIIXnI14} 1/2 {E(E* IIX~:P 11 8) } 1/2


(2+ 2aO) __
4_

+ (EIIXnll6+2ao) (6+ 2a O) {E(E* IIX~:P 116+ 2a O) } (B+


2a
o)

+ E{E*(IIX*(j)
n,.c 1 6 + IIX*(j) o
n,£ 116+2a )}]

(5.32)

Next for 0 :::; j :::; n - 1 and 0'., (3 E Zi


with 10'.1 = 1 = 1(31, write
a(j; 0'., (3) = n- L.~:! Xi
1 xf+
j · Then, using (3.24) from Chapter 3 and
the stationarity of the X4',i's, we have

~ '"' - *(4) - a+f3


~ ~ ca+f3 E * (Xn,c - Xn)
lal""l If3I=l
n- 2 L L C a+f3 [n{ E*(X4',l)a+ f3 - (xn)a+ f3 }
lal=llf31=l
n-1
+ L(n - j){ (E*(X4',l)a(X4',(]+l))f3 - (xn)a+ f3 )
j=1
+ (E*(X4',l)f3(X4',(]+l))a - (xn)a+ f3 )}]

n- 1 L L a+f3 [{ a(O; 0'., (3) - (xn)a+f3}


C

lal=11f31=1
n-1
+ L(1- n- 1j)qj {(a(j; 0'., (3) + a(j; (3, 0'.))
j=l
+ (a(n - j; 0'., (3) + a(n - j; (3, 0'.)) - 2(xn)a+f3}]

n- 1 L L a+f3
C [~qnj (a(j; 0'., (3) + a(j; (3, 0'.))
lal=llf3l=l j=O

-{1+2~(1-n-1j)qj}(Xn)a+f3], (5.33)

where qnj = (1 - n- 1j)qj + (n- 1j)q(n- j ), 1 :s; j :s; n - 1; and qnO = 1/2.
Therefore, by (5.31), (5.32), and (5.33), it follows that

Bias( tP1n (4; £))


= EtP1n(4;£) - Bias(On)

n- 1 L L C a+f3 [~(qnj - 1)(1 - n- 1 j)


lal=llf3l=1 j=1

x {EXf Xj+1 + Exf Xj+! } ]

+O(n-l(~qj)EIIXnI12) +O(n- 3 / 2). (5.34)

Note that by Taylor's expansion, 11 - qj - jpl :S Pp2/2 for all j ;::: 1 and
all 0 < p < 1, and that
p-l(1 - qnj)(1 - n- 1 j) -+ j as n -+ 00 for all j;::: 1 .

Also, by the mixing and moment condition, L~1 j21EXf xf+ j I < 00.
Hence, using the Dominated Convergence Theorem (DCT), from (5.34),
we get

Bias (<PIn (4; fI))


00

_n- 1 p L L ca+f3 Lj(EXf xf+ j + Exf Xf+j )(1 + 0(1))


lal=1 1131=1 j=1
+ O(n- 2f1) + O(n- 3/ 2 ) .
This completes the proof of Theorem 5.1 for j = 4. o

Proof of Theorem 5.2 for $j = 4$: We restrict attention to the bootstrap estimator $\hat{\varphi}_{1n}(4;\ell)$ and, without loss of generality, set $\mu = 0$. Since
$E\big\{ n^{-1}\big(1 + 2\sum_{j=1}^{n-1}(1 - n^{-1}j)\,q^{j}\big)\, \|\bar{X}_n\|^{2} \big\}^{2} = O(n^{-4}\ell^{2})$, in view of (5.31),
(5.32), and (5.33), it is enough to show that

ca+f3[~qnj(u(j;a,jJ)+u(j;jJ,a))])
1:
var(n- 1 L L
lal=llf3l=1 j=O

= (27r) [27r 91 (0) + (1 + eiw )gl (w )dW] (n -3 fI) + o( n -3 fI) G5.35)


To this end, we make use of Lemma 3.3 of Chapter 3. Note that
under the conditions of Theorem 5.2 for j = 4, the remainder terms
Rn(j,k; a,jJ,,,!, v)'s satisfy the bound (cf. (3.36) of Chapter 3)
n-ln-1
max LLqnjqnkIRn(j,k;a,jJ,,,!,v)1 =O(n- 1 ). (5.36)
lal=lf3l=hl=l vl=1 j=O k=O

Let s = s n = lfl(10gn)2 + n 1 / 3 J.Write, a(k' a ,~(.I) = EX 1a Xf3l+k' k E Z ,


lal = 1 = IjJl and i]jvm = [1Jjv(m) + j + v]. Then,
max{ qnj : s :S j :S n - s} :S max{ qj V qn- j : s :S j :S n - s}

< qS = o( exp( -s/Z))


O( exp( -(logn)2)) ; (5.37)

Iqnjqn(j+v) - qj qj+v I <


_ 2n -1 (.]q j + (]. + v )qj+V) , 1 <.
_ ], v <
_ 2s; (5.38)

n-2 n-l-j (n-j)-v-l


n- 3 L L qnjqn(j+v) L
j=n-s v=l m=-(n-j)+l
1(1- n-1ijjvm)a(m; a, ')')a(v + m; {3, v)1
n-2 s s-v-l
< n- 3 L L L
j=n-s v=l m=-s+l
{n-1(ln - jl + Iml + v)}la(m; a,')')lla(v + m;{3, v)1
< 3n- 4 s 2 ( f:
m=-oo
la(m;a,')')I) ( f
U=-<X)
la(u;{3,v)l) , (5.39)

and by similar arguments,


s (n-j)-l n-j-v-l
n- 3 L L qnjqn(j+v) L
j=O v=n-2s m=-(n-j)+l

{l- ijj~m}la(m;a,')')lla(v+m;{3,v)1
O(n- 4 s 2 ) . (5.40)
Next note that the functions '¢m(x) == (27f)-1/2exp(~mx), x E (-7f,7f],
mE Z form an orthonormal basis of the Hilbert space L2( -7f, 7f] with re-
spect to the inner product (iI, iz) = J::1I: /1 (X)/2(x)dx, iI, iz E L 2(-7f, 7f],
and, hence, 2:.':.=-00 (iI, '¢m)(iz, '¢m) = (iI,iz) for any iI,iz E L 2(-7f,7f].
Now using (5.37)-(5.40) and Condition M r , it can be shown that for any
unit vectors a,{3,,),,v E Zi,
n-2 n-l-j (n-j)-v-l_
n- 3 ~ ~ qnjqn(j+v) { L . (1 _ rlj~m)
J-O v-I m=-(n-J)+l

x a(m; a, ')')a(v + m; (3, V)}

n-3(tTlqv + t,t q2 j+V)

(n-j)-v-l
x{ L. m=-(n-J)+l
a(m;a,')')a(v+m;{3,V)}

+0 (n- 4 (~jqj) (U'%;oo (1 + luJ)IIEXIX~+ull)

x (m~oo IIEXIX~+mll))
+O(n- 4 s2 +n- 2 exp (-(logn)2))

n- 3 (TI ~ qV + ~ ~ q2J+V) [m~oo (I: e'mw f(w; a,ry)dw)

x (I: e-,(v+m)wf(w;(3,V)dW)]

+ O(n- 4 s 2 )
00

n-3(TI + (1- q2)-lq2) Lqv


v=1

X (27r l:f(w;a,ry)e- ivw f(w;(3,v)dw)

+ O(n- 4 s 2 )

I:
n-3(TI +q2/(1_ q2))

x 27r qe iW (l_ qe iw )-If(-w;a,,)f(-w;(3,v)dw

+ O(n- 4 s 2 ) . (5.41)

By similar arguments, it follows that for any a, (3, ry, v E Z~ with lal =
1 = 1(31 = Iryl = lvi,

n- 3 ~n~j qnjqn(j+v) { (n-~V-I


}-O v-I m=-(n-})+I

I:
(1 - n-lijjvm)u(m + j + V; a, v)u(m - j; (3, ry)}

n- 3 27r {TI + (qe'W)2(1 - (qe'w)2)-1 }(1 - qe,w)-I


qe'W f(w; (3, ry)f(w; a, v)dw
+ O( n -4 s2) (5.42)

and
n-I (n-j)-I
n- 3 L q;j L (1 - n-I(lml + j)) x
j=O m=-(n-j)+1

{CJ(m; a, ,)CJ(m; (3, v)

i: + CJ(m + j; a, v)CJ(m - j; (3,,)}

i:
n- 327r[4- 1 + q2(1 - q2)-1] f(w; a, ,)/(w; {3, v)dw

+ n- 327r [4- 1 + (qe'w)2 (1 _ (qe'W)2) -1]


x f(-w;{3,,)/(-w;a,v)dw
(5.43)

Let &In (j) = Ll et l=l LI,6I=l cet +,6 (&(j; a, (3) + &(j; (3, a)), 0 :::; j :::; n- l.
Then, by (5.36), (5.41)-(5.43) and Lemmas 3.3 and 5.4, we have

var( n- 1 L L cet +,6 [~qnj (&(j; a, (3) + &(j; (3, a)) ])


1<>1=11,61=1 j=O

n- 2 [~q~j Var (&In (j))


+ 2 ~n~j qnj qn()+v)COV(&ln(j),&ln(j + v))]

27rn- 3 [I: {T1 + q2(1 _ q2)-1

i:
(1 _(qe'W)2) -1}91(W)dw
+(qe'w)2

+ 2{ [2- + q2(1 - q2)-1] (1 - qe,w)-l qe,w gl (w)dw


i:
1

+ [T1 + (qe'W)2( 1- (qe'W)2) -1] (1- qe'W)-l qe'W g1 (W)dW}]

+ O(n- 4 s 2 )
27rn- 3 e[i: gl(w)dw + 27rg1(O) + i: cos Wg1 (W)dW]

+ o(n- 3 l) .

This completes the proof of Theorem 5.2 for j = 4. D


6
Second-Order Properties

6.1 Introduction
In this chapter, we consider second-order properties of block bootstrap estimators for estimating the sampling distribution of a statistic of interest. The basic tool for studying second-order properties of block bootstrap distribution function estimators is the theory of Edgeworth expansions. Let $\hat{\theta}_n$ be an estimator of a level-1 parameter $\theta$ and
$T_n = \sqrt{n}(\hat{\theta}_n - \theta)/s_n$ be a scaled version of $\hat{\theta}_n$ such that $T_n \to^{d} N(0,1)$.
If we set $s_n$ to be the (asymptotic) standard deviation of $\sqrt{n}(\hat{\theta}_n - \theta)$, then
$T_n$ is called a normalized or standardized version of $\hat{\theta}_n$. If $s_n$ is an estimator
of the asymptotic standard deviation of $\sqrt{n}(\hat{\theta}_n - \theta)$, then $T_n$ is called a
studentized version of $\hat{\theta}_n$. In many instances, it is possible to expand the
distribution function of $T_n$ in a series of the form
$$P(T_n \le x) = \Phi(x) + n^{-1/2}\, p_1(x;\gamma)\,\phi(x) + o(n^{-1/2}) \eqno(6.1)$$
uniformly in $x \in \mathbb{R}$, where $\Phi$ and $\phi$, respectively, denote the distribution
function and the density (with respect to the Lebesgue measure) of the
standard normal distribution on $\mathbb{R}$, and where $p_1(\cdot;\gamma)$ is a polynomial such
that its coefficients are (smooth) functions of some population parameters
$\gamma$. The right side of (6.1) is called a first-order Edgeworth expansion for
the distribution function of $T_n$. Next, let $T_n^*$ denote the bootstrap version
of $T_n$ based on one of the several block bootstrap methods presented in
Chapter 2. Under suitable regularity conditions on $T_n$, on the resampling
mechanism, and on the underlying time series, we can often expand the

conditional distribution function of $T_n^*$ in an Edgeworth expansion of the
form
$$P_*(T_n^* \le x) = \Phi(x) + n^{-1/2}\, p_1(x;\hat{\gamma}_n)\,\phi(x) + o_p(n^{-1/2}) \eqno(6.2)$$
uniformly in $x \in \mathbb{R}$, where $p_1(\cdot;\cdot)$ is the same function as in (6.1) and
where $\hat{\gamma}_n$ is a data-based version of the population parameter $\gamma$, generated
by the particular resampling method in use. Relations (6.1) and (6.2) may
be readily combined to assess the rate of approximation of the bootstrap
distribution function estimator $P_*(T_n^* \le x)$. Indeed, by (6.1) and (6.2), it
follows that
$$\sup_{x \in \mathbb{R}} \big|P_*(T_n^* \le x) - P(T_n \le x)\big|
= n^{-1/2} \sup_{x \in \mathbb{R}} \big| p_1(x;\hat{\gamma}_n)\phi(x) - p_1(x;\gamma)\phi(x) \big| + o_p(n^{-1/2})
= o_p(n^{-1/2}) , \eqno(6.3)$$
provided $\hat{\gamma}_n$ is a consistent estimator of $\gamma$ and the coefficients of the polynomial $p_1(\cdot;t)$ are continuous in the second argument $t$. In this case, the
bootstrap approximation $P_*(T_n^* \le x)$ to $P(T_n \le x)$ has a smaller order of
error than the normal approximation to $P(T_n \le x)$, whose error is of the
order $O(n^{-1/2})$ (cf. (6.1)). This property is referred to as the second-order
correctness of the bootstrap approximation, as it captures the second-order
term (i.e., the term of order $n^{-1/2}$) asymptotically. This line of argument
for studying second-order properties of bootstrap methods was pioneered
by Singh (1981), who established second-order correctness of the IID bootstrap method of Efron (1979) for the normalized sample mean of iid random
variables and provided the first theoretical confirmation of the superiority
of the bootstrap approximation over the classical normal approximation.
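As a concrete (if crude) illustration of the estimator $P_*(T_n^* \le x)$ appearing in (6.2)-(6.3), the following Python sketch computes an MBB estimate of the distribution function of the normalized sample mean, which can then be compared with $\Phi(x)$. The normalization by the bootstrap standard deviation is one convenient choice for the sketch, not necessarily the exact version analyzed later in this chapter:

```python
import numpy as np

def mbb_cdf_estimate(x, ell, xgrid, B=1000, rng=None):
    """MBB estimate of P(T_n <= x) for the normalized sample mean, using
    the bootstrap analogue T*_n = sqrt(n1)(X_bar*_n - E_* X_bar*_n), scaled
    by its own bootstrap standard deviation (a sketch)."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x)
    n = len(x)
    b, n1 = n // ell, (n // ell) * ell
    block_means = np.array([x[s:s + ell].mean() for s in range(n - ell + 1)])
    centre = block_means.mean()                  # = E_* X_bar*_{n,ell}
    reps = np.empty(B)
    for r in range(B):
        chosen = rng.choice(block_means, size=b, replace=True)
        reps[r] = np.sqrt(n1) * (chosen.mean() - centre)
    reps /= reps.std()                           # studentize by bootstrap sd
    return np.array([(reps <= t).mean() for t in xgrid])
```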
In this chapter, we consider second-order properties of different block
bootstrap methods for normalized and studentized versions of the sample
mean and of (more general) estimators that satisfy the requirements of the
Smooth Function Model. In Section 6.2, we introduce the basic theory of
Edgeworth expansion for the sample mean under independence. Since the
bootstrap blocks are drawn by independent resampling, Edgeworth expan-
sions for the conditional distribution of the bootstrap sample mean can be
derived using these techniques. In Section 6.3, we describe the framework
and the fundamental results of Götze and Hipp (1983) on Edgeworth ex-
pansions for the sample mean of weakly dependent random vectors. We
next discuss extensions of the Edgeworth expansion theory for the sample
mean to the Smooth Function Model in Section 6.4. In Section 6.5, we
describe the results on second-order properties of various block bootstrap
methods based on independent resampling, including the MBB and the
SB, for normalized and studentized statistics under the Smooth Function
Model.

6.2 Edgeworth Expansions for the Mean Under Independence
Let $X_1, \ldots, X_n$ be independent but not necessarily identically distributed
$\mathbb{R}^d$-valued random vectors with $EX_i = 0$ and $E\|X_i\|^{s} < \infty$ for all $1 \le i \le
n$, where $s \ge 3$ is an integer. An excellent account of rigorous Edgeworth
expansion theory for the scaled sample mean $S_n \equiv \sqrt{n}\,\bar{X}_n = n^{-1/2} \sum_{i=1}^{n} X_i$
is given in Bhattacharya and Rao (1986) for the multivariate case ($d \ge 1$),
and in Petrov (1975) for the univariate case ($d = 1$). In this section, we
recast some of these results for the multivariate case, which serve as a basis
for deriving Edgeworth expansions for bootstrapped statistics.
For easy reference later on, here we recall some relevant notation and
definitions. For a smooth function $h : \mathbb{R}^d \to \mathbb{R}$, we write $D_j h$ to denote the
partial derivative of $h(x)$ with respect to the $j$th coordinate of $x$, $1 \le j \le d$.
For $d \times 1$ vectors $\nu = (\nu_1, \ldots, \nu_d)' \in \mathbb{Z}_+^d$ and $x = (x_1, \ldots, x_d)' \in \mathbb{R}^d$, let $|\nu| =
\nu_1 + \cdots + \nu_d$, $\nu! = \nu_1! \cdots \nu_d!$, $x^{\nu} = \prod_{i=1}^{d} (x_i)^{\nu_i}$, and $\|x\| = (x_1^2 + \cdots + x_d^2)^{1/2}$.
Also, let $D^{\nu}$ denote the differential operator $D_1^{\nu_1} \cdots D_d^{\nu_d}$. For a nonzero
complex number $z = r\exp(\iota w)$, $r > 0$, $-\pi < w \le \pi$, we define
$$\log z = (\log r) + \iota w ,$$
where $\iota^2 = -1$. Then, $\log z$ is the so-called principal branch of the logarithm, and it is analytic in the domain $\mathbb{C} \setminus (-\infty, 0]$.
Let $X$ be an $\mathbb{R}^d$-valued random vector and let $\xi(t) = E\exp(\iota t'X)$, $t \in
\mathbb{R}^d$, denote the characteristic function of $X$. Note that under the moment
condition $E\|X\|^{s} < \infty$, $\log \xi(t)$ is $s$-times differentiable in a neighborhood
of zero. For $\nu \in \mathbb{Z}_+^d$ with $1 \le |\nu| \le s$, let $\chi_{\nu}$ denote the $\nu$th cumulant of $X$,
defined by
$$\chi_{\nu} = (-\iota)^{|\nu|}\, D^{\nu} \log \xi(t)\big|_{t=0} . \eqno(6.4)$$
Let $\mu_{\nu}$ denote the $\nu$th moment of $X$, i.e., $\mu_{\nu} = E(X^{\nu})$, $1 \le |\nu| \le s$. Then,
it is possible to express the cumulants of $X$ in terms of the moments of
$X$, and vice versa. Note that the exponential and the logarithm functions
admit the power series expansions
$$e^{z} = \sum_{k=0}^{\infty} \frac{z^{k}}{k!}, \quad z \in \mathbb{C},$$
and
$$\log(1 + z) = \sum_{k=1}^{\infty} (-1)^{k-1} \frac{z^{k}}{k}, \quad z \in \mathbb{C},\ |z| < 1 .$$
Using these expansions, we get the formal identity
$$\sum_{|\nu| \ge 1} (\iota t)^{\nu} \chi_{\nu}/\nu! = \log\big(\xi(t)\big) = \log\Big(1 + \sum_{|\nu| \ge 1} (\iota t)^{\nu} \mu_{\nu}/\nu!\Big)
= \sum_{k=1}^{\infty} (-1)^{k-1} \Big( \sum_{|\nu| \ge 1} (\iota t)^{\nu} \mu_{\nu}/\nu! \Big)^{k} \Big/ k , \eqno(6.5)$$

$t \in \mathbb{R}^d$. Equating the coefficients of $t^{\nu}$ in (6.5), we can obtain an expression
for the $\chi_{\nu}$'s in terms of the $\mu_{\nu}$'s. In the multi-dimensional case, writing down an
exact expression for $\chi_{\nu}$ is rather cumbersome. For most applications, a
working formula is given by (see page 46, Bhattacharya and Rao (1986))
$$\chi_{\nu} = \sum_{k=1}^{|\nu|} \sum_{(k)} C(\nu_1, \ldots, \nu_k; i_1, \ldots, i_k)\, \mu_{\nu_1}^{i_1} \cdots \mu_{\nu_k}^{i_k} , \eqno(6.6)$$
where the $C(\nu_1, \ldots, \nu_k; i_1, \ldots, i_k)$ are combinatorial constants, depending only
on their arguments, and where the summation $\sum_{(k)}$ extends over all
$\nu_1, \ldots, \nu_k \in \mathbb{Z}_+^d$ and $i_1, \ldots, i_k \in \mathbb{N}$ satisfying $\sum_{m=1}^{k} i_m \nu_m = \nu$. In the
one-dimensional case, we have the following relations (cf. page 46, Bhattacharya and Rao (1986)):
$$\chi_1 = \mu_1 ,$$
$$\chi_2 = \mu_2 - \mu_1^2 ,$$
$$\chi_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3 ,$$
$$\chi_4 = \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 - 6\mu_1^4 ,$$
$$\chi_5 = \mu_5 - 5\mu_4\mu_1 - 10\mu_3\mu_2 + 20\mu_3\mu_1^2 + 30\mu_2^2\mu_1
- 60\mu_2\mu_1^3 + 24\mu_1^5 . \eqno(6.7)$$
For expressions for higher-order cumulants, see Petrov (1975) and Kendall
and Stuart (1977).
Cumulants play an important role in the development of Edgeworth ex-
pansions for sums of independent random vectors. To gain some insight
into the derivation of the Edgeworth expansions for sums of independent
random vectors, first we consider the simpler situation involving iid ran-
dom vectors, at a heuristic level. Suppose that X is a JR.d-valued random
vector with EX = 0 and EllXlls < 00 for some integer s ~ 3. Then, in a
neighborhood of t = 0, by Taylor's expansion, we have

E exp(d' X) = exp ( 2: (dtXv/v! + R(t)) , (6.8)


1:5lvl:5s

where R(t) = o(lltII S ) as Iltll ---- o. Let Tn denote the sum of n independent
copies of X. Then, noting that Xv = EX V = 0 for allivi = 1, by (6.8), for
any given t E IR d , we have

Eexp (d'(n- I / 2 Tn ))
[E exp(d' X/vn)]n
6.2 Edgeworth Expansions for the Mean Under Independence 149

exp(n[ L (d/vn)"Xv/v!+R(t/vn)])
2:'OIvl:'Os

exp(-t'~t/2)exp (~n-rj2[ L (d)"Xv/V!] +nRn(t/vn))


r=l Ivl=r+2

exp( -t'~t/2) [1 +
f
m=l
(~n-rj2{ L (~t)"xv/v!}+o(n-(S-2)j2))m/m!]
r=l Ivl=r+2
(6.9)

as n - 4 00, where ~ = EXX' is the covariance matrix of X. If the terms


of order n- rj2 on the last line of (6.9) are grouped together, then we may
formally expand the characteristic function of the scaled sum Tn as

(6.10)
for each fixed t E JRd, where PrO's are polynomials. The Edgeworth expan-
sions for n- lj2 T n (or, equivalently, for the scaled sample mean) is obtained
by inverting the expansion for the characteristic function in (6.10). Theoret-
ical justification for this inversion step and for the formal approximations in
(6.9) and (6.10) are quite involved, and will not be presented here. We refer
the interested reader to the monograph of Bhattacharya and Rao (1986)
for details.
Using the heuristic arguments above, we now describe the Edgeworth ex-
pansion theory for the normalized sum of independent (but not necessarily
identically distributed) random vectors. Let Xl, ... ,Xn be a collection of
independent JRd-valued random vectors with EXj = 0 for 1 :::; j :::; n. Let
Xv,j denote the vth cumulant of Xj, 1 :::; j :::; n and let Xv = n- l ~7=1 Xv,j'
Then, define the polynomials Pj (z) == Pj (z; .) in Z E Cd by the formal iden-
tity in u E JR (d. (6.9) and (6.10))

L UjPj(z; {Xv})
00

1+
j=l

It can be shown (d. Lemma 7.1, Bhattacharya and Rao (1986)) that for
each j ~ 1, Pj (z; {Xv}) is a polynomial of degree 3j in Z and its coefficients
are smooth functions of the cumulants Xv of order Ivl :::; j + 2. The density
7/Jn,s of the (s - 2)-th order Edgeworth expansion of the scaled sample mean
Sn == n- lj2 ~7=1 Xj is defined via its Fourier transform

Jexp(d'x)7/Jn,s(x)dx
150 6. Second-Order Properties

(6.12)

t E JRd, where f; = n- 1 I:~=l EXjXj. Note that '1fJn,s depends on the


cumulants Xv for Ivl :s: s.
The density '1fJn,s can be recovered from its Fourier transform in (6.12)
by using the inversion formula:

'1fJn,s(x) = (27r)-d r
J~d
'1fJ~,s(t)exp(-d'x)dt, x E JRd. (6.13)

Next we evaluate the integral on the right side of (6.13). Note that the
function
(6.14)
is the Fourier transform of the function

(6.15)

where Pj (- D; {Xv}) is a differential operator obtained by formally substi-


tuting -D == (-Dl, ... ,-Dd)' in place of Z = (Zl, ... ,Zd)' in Pj(z;{Xv})
and where CPt:, denotes the density of the N(O, f;) distribution on JRd. For
example, if Pj (z; {Xv}) = I:~~o I:lvl=k avz v for some constants av's de-
pending on {Xv}, then fJ(x) = I:~~oI:lvl=kav(-l)lvIDvcpdx), x E JRd.
Since the vth order partial derivative of cpd x) is of the form Pv (x )cpd x) for
a polynomial pAx) (with coefficients depending on f;), from (6.12)-(6.15),
it follows that

(6.16)

where Pj (x; {Xv}) are polynomials determined by the relation

(6.17)

for j = 1, ... , s - 2. The exact forms of the polynomials Pj (-; .) are diffi-
cult to write down explicitly for a general d 2:: 1. See Bhattacharya and
Rao (1986), Chapter 7, for an illustrative example. Here we list the first
two polynomials for d = 1. For simplicity, suppose that the Xi's are iid
with mean zero and variance 1. Then, f; = 1. Furthermore, in the one-
dimensional case, the derivatives of the standard normal density function
cp(x) = (27r)-1/2 exp( -x 2 /2), x E JR, may be expressed in terms of the
Hermite polynomials Hk(X)'S, defined by the relation

dk
Hk(X)CP(X) = (-l)k-k (cp(x)), x E JR , (6.18)
dx
6.2 Edgeworth Expansions for the Mean Under Independence 151

k ~ O. The first few Hermite polynomials are given by

Ho(x) == 1, H 1(x) = X, H 2(x) = x 2 -1, H3(X) = x 3 - 3x,


H4(X) = x4 - 6x2 + 3, H5(X) = x5 - lOx3 + 15x, etc. (6.19)

See, for example, Hall (1992), page 44 and Petrov (1975), page 137. The
polynomials Pj(-; .), j = 1,2 are given by
1
P1(Xj {Xv}) = "6H3(x)J-L3, x E ~ , (6.20)

_ 1 2 1
P2(Xj {Xv}) = 72H6(x)J-L3 + 24H4(x)[J-L4 - 3], x E JR, (6.21)

where J-Lk = EXf, k ~ 1. For explicit expressions of the "distribution


function" Wn,s(( -00, xl) == J~oo 'l/Jn,s(y)dy, x E JR for the first few s values
in the case when Xi's are independent (and not necessarily identically
distributed) random variables, see Petrov (1975), page 138.
Next, we return to the general set up of JRd-valued independent random
vectors. Let 'lin,s be a signed measure on (JRd,B(~d)) with density 'l/Jn,s
(with respect to the Lebesgue measure) where 'l/Jn,s is as defined in (6.16).
The main result of this section says that under suitable regularity condi-
tions, asymptotic approximations for the probabilities P(Sn E·), or, more
generally, for the expected values Ef(Sn) for Borel-measurable functions
f : JRd -+ JR, with an error of the order 0(n-(s-2)/2) are given by J fdwn,s'
For E > 0 and sEN, let an(s, E) = n- 1 2:;=1 EIIXjIISn(IIXjll > Ey'n) and
let an(s) = an(s, 1). Also, for a function f : JRd -+JR and a measure J-L on
(JRd, B(JRd)), define the integrated modulus of continuity of f with respect
to J-L as

W(Ej f,J-L) = Jsup{lf(x) - f(y)1 : Ilx - yll ::; E}J-L(dy) , (6.22)

E > O. With {3(s) = 2Ls/2J, let

Ms(J) = sup { (1 + Ilxll,8(S)) -1 If (x)1 : x E JRd} . (6.23)

Then, we have the following result.

Theorem 6.1 Let X!, ... , Xn (n E N) be a collection of JRd-valued inde-


pendent random vectors with EXj = 0 for 1 ::; j ::; n, n- 1 2:;=1 EXjXj =
lId, and Pn,s == n- 1 2:~=1 EIIXjlls < 00 for some integer s ~ 3. Let
f : JRd -+ JR be a Borel-measurable function with Ms(J) < 00. Then, for
any E E (0,1), there exists a constant C = C(d, s) E (0, (0) (not depending
on f and n) such that, with Un == n-(s-2)/2,
152 6. Second-Order Properties

< c· Ms(f) [UnPn,s + (1 + Pn,s){ (an(s) + un)un


+ V-2(s+d+1)
n
+ n(S+2d)/2 Vn-(S+d+1)(d+2S)}
+ {'Yn(E)E- 2d + n s +d+1 E-8d exp( _E- 1 )}]
+ C· (1 + Pn,s) . W(2E; f, <I» (6.24)

whenever
(6.25)
Here v n = (p-n,s Un )-l/(s+d+1) , p-n,s = n- 3/ 2 ",n 1 n.(IIX·11 <
L....J=l EIIX·lls+
J J -
n 1/ 2), and 'Yn(E) L1:<=;jl, ... ,jS+d+l:<=;n sup{IINJi, ... ,j8+d+lI Bn,j(t)1
(16pn,3)-1 ::; Iltll ::; c 4 }, with Bn,j(t) = IEexp(d' Xj)1 + 2P(IIXj ll > v'n),
1 ::; j ::; n, t E ]Rd.

Proof: See Appendix B. D

Theorem 6.1 gives an upper bound on the difference between Ef(Sn)


and the expansion J fdWn,s for any given collection of random vectors
Xl' ... ' Xn (n ~ 1) for which (6.25) and the other conditions of Theorem
6.1 hold. This form of the Edgeworth expansion is most useful for estab-
lishing validity of Edgeworth expansions for the conditional expectation
E*f(S~) in the (block) bootstrap context, where S~ corresponds to a sum
of independent random vectors from a triangular array. Standard results
available in the literature are often stated for a sequence of random vectors
and, hence, they are not directly applicable in the bootstrap case. Theo-
rem 6.1 can be proved by careful modification of the main steps associated
with the proof in the "sequence" case. An outline of the proof of Theorem
6.1 is given in Appendix B. Note that the error of approximation in (6.24)
depends on the quantities 'Yn(E) and W(2E; f, <I». For 'Yn(E) to be small,
the distribution of Xj's must satisfy some smoothness condition (such as
Cramer's condition; see (6.28) and (6.31) below), while for W(2E; f, <I» to
be small, the function f cannot have too much oscillation. In most appli-
cations, we are interested in expansions for P(Sn E B) for B ranging over
some classes of sets in B(]Rd). In this case, f = n.B and the last term is
small under additional smoothness conditions on the boundary, 8B, of the
set B.
We now state a version of Theorem 6.1 for probabilities of events in-
volving the scaled row sum of a triangular array of independent random
vectors. In the statement of Theorem 6.2, for E E (0,00) and B C ]Rd, let
8B denote the boundary of B and let
(8B)' = {x E]Rd : Ilx - yll < E for some y E 8B} .
Theorem 6.2 Let {Xn,j : 1 ::; j ::; n}n~l be a triangular array of row
wise independent ]Rd-valued random vectors Xn,l, ... ,Xn,n with EXn,j =
6.2 Edgeworth Expansions for the Mean Under Independence 153

0, 1 :s; j :s; nand n- l 'L7=1 EXn,jX~,j = lId for each n :::: 1. Suppose that
for some integer s :::: 3 and some 15 E (0,1/2),

J~ n- l L E IIXn,jIISll(IIXn,jll > n~-") = 0 , (6.26)


j=l

n
lim sup n- l
n->oo
L EIIXn,jIIS < 00 , (6.27)
j=l

and for some sequence {17n}n>l C (0, (0) with 17n = 0(n-(s-2)/2),

limsup sup {IEexP(d'Xn,j)1 : (16pn,3)-1:s; lit II <17:;:4,I:S;j :s;n} < 1,


n->oo
(6.28)

sup IP(Sn E B) - 'lin,s (B) I = 0(n-(S-2)/2) (6.29)


BEB

for any collection B of Borel sets in ]Rd satisfying

sup <I> ((8B)') = O(E) as E 10. (6.30)


BEB

Proof: See Appendix B for an outline of the proof. o

Let C denote the collection of all measurable convex subsets of ]Rd. Then,
(6.30) holds with B = C. For d = 1, if we set B = {( -00, xl : x E ]R}, then
also (6.30) holds and Theorem 6.2 yields a (s - 2)-th order Edgeworth
expansion for the distribution function of Sn.
Next we consider the important special case where the triangular array
{Xn,j : 1 :s; j :s; n }nEN derives from a sequence of iid random vectors
{Xn}n;:::l, i.e., Xn,j = Xj for all 1 :s; j :s; n, n:::: 1. Then, (6.26) and (6.27)
holds if and only if EllXI!Is < 00. And condition (6.28) holds if and only if

lim sup IEexp(~t'Xdl < 1 . (6.31)


Iltll->oo

Inequality (6.31) is a smoothness condition on the distribution of Xl and


is known as Cramer's condition. A sufficient condition for (6.31) is that the
probability distribution of Xl has an absolutely continuous component with
respect to the Lebesgue measure on ]Rd. This is an immediate consequence
of the Riemann-Lebesgue Theorem (cf. Theorem 26.1, Billingsley (1995)).
In general, (6.31) does not hold when Xl has a purely discrete distribution.
154 6. Second-Order Properties

6.3 Edgeworth Expansions for the Mean Under


Dependence
Let {XdiEZ be a sequence of ffi.d-valued random vectors with EX i = 0
for all i E Z. The process {XdiEZ need not be stationary. In this sec-
tion, we state an Edgeworth expansion result for the scaled sample mean
Sn = ynXn = n- 1 / 2 I:~=1 Xi, when the Xi's are weakly dependent.
Derivation of Edgeworth expansions for dependent random vectors is tech-
nically difficult primarily due to the fact that unlike the independent case,
the characteristic function of the scaled sum Sn does not factorize into the
product of marginal characteristic functions. Extensions of the Edgeworth
expansion theory to dependent variables arising from a Markov chain have
been established by Statulevicius (1969a, 1969b, 1970), Hipp (1985), Mali-
novskii (1986), and Jensen (1989). For weakly dependent processes {XihEZ
that do not necessarily have a Markovian structure, Edgeworth expansions
for the scaled sum Sn under a very general framework have been obtained
by Gotze and Hipp (1983). In this section, we state some basic Edgeworth
expansion results for Sn under the Gotze and Hipp (1983) framework. Sup-
pose that the process {XdiEZ is defined on a probability space (0" F, P)
and that fDdiEZ is a collection of sub-u-fields of F. A key feature of the
Gotze and Hipp (1983) framework is the introduction of the auxiliary set of
u-fields {VihEZ that allows one to treat various classes of weakly dependent
processes under a common framework, by suitable choices of the sequence
{VihEZ. In the following, we first state and discuss the regularity condi-
tions that specify the role played by the Vi'S, and then give some examples
of processes {XihEZ and the corresponding choices of the u-fields {VihEZ
to illustrate the generality of the framework. For -00 :::; a :::; b :::; 00, write
V~ = U({Vi : i E Z, a :::; i :::; b}). We will make use of the following
conditions:
(C.1) For some integer 8 :::: 3 and a real number 0'(8) > 82,

sup { EllXj 11 8 [ log(l + IIXj II)] 0:(8) : j :::: 1} < 00 .

(C.2) (i) EXj = 0 for all j :::: 1 and

E = n~~ n- 1 var(f x
J=l
j) (6.32)

exists and is nonsingular.


(ii) There exists 8 E (0,1) such that for all n > 8-\ m > 8-1,

inf {t/var(.I: Xi)t: lit II = 1} > 8m.


t=n+1
6.3 Edgeworth Expansions for the Mean Under Dependence 155

(C.3) There exists 5 E (0,1) such that for all n, m = 1,2, ... with m > 5-1,
there exists a V~~:-measurable random vector X~,m satisfying

(C.4) There exists 5 E (0,1) such that for all i E Z, mEN, A E V~OO, and
BE V'ttm'

Ip(A n B) - P(A)P(B)I ::; 5- 1 exp( -5m) .

(C.5) There exists 5 E (0,1) such that for all m, n, k = 1,2, ... , and A E
n +k
V n-k

Elp(A I Vj : j # n) - P(A I Vj : °< Ij - nl ::; m+ k)1


::; 5- 1 exp( -5m).

(C.6) There exists 5 E (0,1) such that for all m, n = 1,2, ... with 5- 1 <
m < n and for all t E ffi.d with Iltll ~ 5,

EIE{ exp(d'[X n- m + ... + X n +m ]) I Vj : j # n} I ::; exp( -5) .

Now we briefly discuss the Conditions (C.1)-(C.6) stated above. Con-


dition (C.1) is a moment condition used by Lahiri (1993a) to derive an
(8 - 2)-th order Edgeworth expansion for the normalized sample mean.
It is slightly weaker than the corresponding moment condition imposed by
Gotze and Hipp (1983), which requires existence ofthe (8 + l)-th order mo-
ments of the Xl's. When the sequence {XdiEZ is m-dependent for some
mEN, Lahiri (1993a) also shows that an (8 - 2)-th order expansion for
the distribution of Sn remains valid under the following reduced moment
condition:
sup {EIIXj lis: j E Z} < 00 , (6.33)
as in the case of independent random vectors. The nonsingularity of ~ in
Condition (C.2)(i) is required for a nondegenerate normal limit distribu-
tion of the scaled mean Sn. When the process {XdiEZ is second-order sta-
tionary, Condition (C.2)(ii) automatically follows from (C.2)(i). Condition
(C.4) is a strong-mixing condition on the underlying auxiliary sequence of
u-fields Vl's. Condition (C.4) requires the u-fields Vl's to be strongly mix-
ing at an exponential rate. For Edgeworth expansions for the normalized
sample mean under polynomial mixing rates, see Lahiri (1996b). Condi-
tion (C.3) connects the strong mixing condition on the u-fields Vl's to the
weak-dependence structure of the random vectors Xj's. If, for all j E Z,
we set Vj = u(Xj), the u-field generated by Xj, then Condition (C.3) is
trivially satisfied with X~,m = Xn for all m. However, this choice of Vj is
156 6. Second-Order Properties

not always the most useful one for the verification of the rest of the con-
ditions. See the examples given below, illustrating various choices of the
a-fields Vj's in different problems.
Condition (C.5) is an approximate Markov-type property, which says
that the conditional probability of an event A E V~~Z, given the larger
a-field V{Vj : j =I=- n}, can be approximated with increasing accuracy when
the conditioning a-field V{Vj : 0 < Ij - nl ::; m + k} grows with m. This
condition trivially holds if Xj is Vj-measurable and {XdiEZ is itself a
Markov chain of a finite order. Finally, we consider (C.6). It is a version of
the Cramer condition in the weakly dependent case. Note that if Xj's are
iid and the a-fields Vi's are chosen as Vj = a(Xj), j E Z, then Condition
(C.6) is equivalent to requiring that for some J E (0,1),

1> e- o > EIE{ exp(d'Xn ) I Xj: j =I=- n}1


IE exp(d' Xd I for all Iltll ~ J .

It is easy to check that this is equivalent to the standard Cramer condition


(cf. (6.31))
limsuplEexp(d'Xdl < 1.
Iltll--+CX)
However, for weakly dependent stationary Xi's, the standard Cramer con-
dition on the marginal distribution of X I is not enough to ensure a "regular"
Edgeworth expansion for the normalized sample mean, as shown by Gotze
and Hipp (1983). Here, by a regular Edgeworth expansion, we mean an
Edgeworth expansion with a density of the form

8-2
~n,8(X) = [1 +L n- rj2 pr(X)]¢V(X), x E IRd
r=l

for some polynomials PI (.), ... , Pr (-) and for some positive definite matrix
V, where ¢v is the density of the N(O, V) distribution on IRd. The sequence
{XihEZ in the example of Gotze and Hipp (1983) is stationary and m-
dependent with m = 1. Furthermore, Xl has finite moments of all orders
and it satisfies the standard Cramer condition (6.31). However, a "regular"
Edgeworth expansion for the sum of the Xj's does not hold.
Next, we give examples of some important classes of weakly dependent
processes that fit into the above framework and we indicate the choices of
the a-fields V/s and the variables X~,m's for the verification of Conditions
(C.3)-(C.6).

Example 6.1: Suppose that {XihEZ is a linear process, given by

Xi = LajEi-j, i E Z , (6.34)
jEZ
6.3 Edgeworth Expansions for the Mean Under Dependence 157

where {adiEZ is a sequence of real numbers and {EdiEZ is a sequence of


iid random variables with EEl = 0, EEi = 1. Furthermore, suppose that
LiEz ai i= 0 and for some 8 E (0,1),

lajl = O( exp(-8Ijl)) as 111---) 00. (6.35)

If, in addition, E1 satisfies the standard Cramer condition,

lim sup IE exp(ttEd I < 1 , (6.36)


Itl-->oo

then Conditions (C.3)-(C.6) hold with Vj = U(Ej), j E Z. In this case, we


may take X~,m = Lljl:<:::= ajEn_j·
A special case of (6.34) is the ARMA(p, q)-model

(6.37)

where a1, ... , a p, {31, ... ,{3q (p EN, q E N) are real numbers and {EdiEZ is
a sequence of iid random variables as in (6.34). We also suppose that the
polynomials a(z) == 1- (a1z+··· +apz P), and (3(z) == 1 + {31Z+··· + (3qzq,
z E C have no common zeros in C and a(z) i= 0 for all z in the closed unit
disc {z E C: Izi ::; I}. Then, it can be shown that there exists a sequence
of constants {adiEz c JR satisfying (6.35) such that representation (6.34)
holds (see, for example, Chapter 3, Brockwell and Davis (1991)). If, in
addition, (3(l)ja(l) i= 0, then LiEz ai i= O. Thus, Conditions (C.3)-(C.6)
hold for the ARMA(p, q) model (6.37), provided E1 satisfies the standard
Cramer's condition (6.36), and the polynomials a(z) and (3(z) satisfy the
regularity conditions pointed out above. D

Example 6.2: Let {EdiEZ be a sequence of iid random variables and let

(6.38)

for some continuously differentiable function h : JRmo -+ JR, where mo E N.


The sequence {XihEZ of (6.38) is known as an mo-dependent shift. Note
that according to our definition (cf. Section 2.3), {Xi}iEZ is an m-dependent
sequence with m = mo -1. In this case, we set Vj = U(Ej), j E Z and take
X~ = = Xn for all m, n E N, with m ;:::: mo. Then, it is easy to see that
Co~ditions (C.3)-(C.5) hold with these choices of Vj and X~,=. A set of
sufficient conditions for (C.6) with Vj = U(Ej), j E Z is that E1 has a
density 9 with respect to the Lebesgue measure on JR, and, that there exist
Y1, ... , Y2=o -1 E JR and an open subset U containing Y1, ... , Y2mo -1 such
that 9 is (everywhere) positive on U and

(6.39)
158 6. Second-Order Properties

See G6tze and Hipp (1983), pages 218-219, for the details of the verifica-
tion of (C.6). As mentioned earlier, in this case Condition (C.1) may be
replaced by the weaker moment condition (6.33) for a valid (8 - 2)-th order
Edgeworth expansion for the probability distribution of Sn. See Theorem
2.2, Lahiri (1993a). 0

Example 6.3: Let {XihEZ be a stationary homogeneous Markov chain


with transition kernel q(x; A), x E ~d, A E B(~d). Suppose that Xl satisfies
the standard Cramer condition (6.31) and that

sup {lq(x,A) - q(y,A)1 : X,y E ~d, A E B(~d)} < 1 . (6.40)

Then, {XihEZ satisfies Conditions (C.3)-(C.6) with Vi = a(Xi ), i E Z


and xlm = Xn for n, mEN. See G6tze and Hipp (1983), page 219 for

more d~tails. 0

Example 6.4: Let {YihEZ be a stationary Gaussian process with a posi-


tive analytic density and let

Xi = f(Yi), i E Z

for some continuously differentiable nonconstant function f : ~ ---+ R


Then, Conditions (C.3)-(C.6) hold with Vi = a(Yi), i E Z, and xl m = Xn
for m, n E N. See pages 219-220, G6tze and Hipp (1983). ' 0

For verification of Conditions (C.3)-(C.6) in other problems, see Bose


(1988), Janas (1993), and G6tze and Hipp (1994).

Next, we describe the form of the Edgeworth expansion for Sn in the


dependent case. Like Section 6.2, we define a set of polynomials Qr,n(t), 1 ::;
r ::; 8 - 2, appearing in the Fourier transform of the Edgeworth expansion,
by the identity (in u E ~),

00

= 1+ LUrQr,n(t) , (6.41 )
r=l

where Xr,n(t) is the rth cumulant of the random variable t'Sn, t E ~d.
For 1 ::; r ::; 8 - 2, this definition of Qr,n(t) is essentially equivalent to the
definition of the polynomials Pr(-; {Xv}) given in (6.11) in the independent
case. To appreciate why, note that we may replace the sum on the left side
of (6.41) by 2:::3 and formally define the polynomials Qr,nO using the
resulting identity. But this modification does not affect the first (8 - 2)
6.3 Edgeworth Expansions for the Mean Under Dependence 159

polynomials, because the cumulants of order r 2: s + 1 do not appear in


these polynomials. Hence, both identities yield the same polynomials ij,.,n's
for 1 ~ r ~ s - 2.
It can be shown that under Conditions (C.1)-(C.6), for any given t E
JRd, X,.,n(t) = O(n-(,.-2)/2) as n -+ 00, and, hence, the coefficients of the
polynomials ij,.,n(t) are bounded sequences in n. However,·for a sequence of
nonstationary random vectors {XdiEZ' the coefficients of ij,.,n (-) typically
depend on n. If {XdiEZ is stationary, n(,.-2)/2 X ,.,n(t) , 2 ~ r ~ s may be
expanded further into a sum of the form

+ n- 1/ 2X,.,2(t) + ...
X,.,l(t)
+ n-(S-2)/2 X ,.,s_1 + o(n-(S-2)/2) (6.42)

(for t E JR d fixed) for some polynomials X,., 1(t), ... , X,., k (t), not depending
on n. As a result, for a stationary sequence {XdiEZ' the Edgeworth ex-
pansion for Sn may be written in terms of a set of polynomials that do not
depend on n. See Remark 2.12, Gotze and Hipp (1983) for more details.
Next, with ij,.,n(t) given by (6.41), we define density ~n,s ofthe Edgeworth
expansion for Sn in terms of its Fourier transform, by the relation

JeLt'x~n,s(x)dx
exp( -X2,n(t)/2) [1 + ~ n-,./2ij,.,n(d)] , t E JRd . (6.43)

The (8 - 2)-th order Edgeworth expansion Y n,s for Sn is defined as the


signed measure on (JR d, B(JRd)) having density ~n,s with respect to the
Lebesgue measure on JRd. As in Section 6.2, let f3( s) = 2l s /2 J and for a
Borel measurable function f : JRd -+ JR, define W(E; f, <PE) by (6.22), i.e., by
J
the relation w( E; f, <PE) = sup{lf(x + y) - f(x) I : llyll ~ E }<pE(dx), E > 0,
where 2: is as in (C.2). Then, we have the following result on asymptotic
expansion for Ef(Sn) in the dependent case.

Theorem 6.3 Suppose that Conditions (C.1)-(C.6) hold. Let f : JRd -+ JR


be a Borel measurable function satisfying sup{(l + Ilxll,6(s»)-llf(x)1 : x E
JRd} == Ms(f) < 00. Then, for any real number a E (0, (0), there exists a
constant C = C(a) E (0, (0) such that

IEf(Sn) - J fdYn,sl

< C· w(n- a ; f, <PE) + o(n-(s-2)/2) as n -+ 00 . (6.44)

Further, the term o(n-(s-2)/2) in (6.44) depends on f only through the


constant Ms(f).
160 6. Second-Order Properties

Proof: See Theorem 2.1, Lahiri (1993a). D

Theorem 6.3 readily yields an (8 - 2)-th order Edgeworth expansion


for the probability distribution of Sn uniformly over classes of Borel sets
satisfying an analog of the boundary condition (6.30). We note this in the
following result.

Theorem 6.4 Suppose Conditions (C.1)-(C.6) hold. Then,

sup IP(Sn E B) - Yn,s(B)1 = o(n-(s-2)/2) (6.45)


BEB

for any class B of Borel sets in ~d satisfying

sup <PI; ((BB)') = O(E) as E! 0, (6.46)


BEB

where (BB)' = {x E ~d : Ilx - yll < E for some y EBB}, E > 0 and BB
denotes the boundary of B.

6.4 Expansions for Functions of Sample Means


6.4.1 Expansions Under the Smooth Function Model
Under Independence
Edgeworth expansion theory presented in the previous two sections deal
with the sample mean. We now discuss some extensions of the theory to
statistics that can be represented as smooth functions of sample means.
First, we consider the case where {XdiEZ is a sequence of iid ~d-valued
random vectors with a finite mean vector EX l = /-L E ~d. Suppose that
the statistic of interest en e
and its target parameter obey the Smooth
Function Model of Chapter 4, i.e.,

(6.47)

for some (smooth) function H : ~d ----; ~, where Xn = n- l L:~=l Xi.


Let Wln = fo(e n - e). If EIIXIi!2 < 00 and H is differentiable at /-L
and the vector of first-order partial derivatives of H at /-L is nonzero, then
a first-order Taylor's expansion of H around /-L shows that

Wln = L D a H(/-L) [v/n(X n - /-L)]a + op(l) ,


lal=l

and, hence,
(6.48)
6.4 Expansions for Functions of Sample Means 161

where T2 = 2: 1.81=12: 1",1=1 D'" H(J.L)D.8 H(J.L)Cov(Xf, Xf). This is often re-
ferred to as the Delta method. Edgeworth expansions for Wn may be derived
by considering higher-order Taylor's expansions of the function H around
J.L. Suppose that for some integer s ~ 3, His s-times continuously differen-
tiable in a neighborhood of J.L. Then, we may express Wn as
s-l
WIn L n-(I"'I-1)/2 D'" H(J.L) [In(Xn - J.L)]'" la! + Rn,s
1"'1=1
Vn,s + Rn,s, say, (6.49)
where Rn,s is a remainder term that, under the moment condition
EIIX1 11 s < 00, satisfies
(6.50)
for some sequence on,s = o(n-(s-2)/2). Here the random variable Vn,s is
called a (s - 2)-th order stochastic approximation to WIn' Under (6.50),
the (s - 2)-th order Edgeworth expansions for WIn and Vn,s coincide. It is
customary to describe the (s - 2)-th order Edgeworth expansion for WIn
using that for Vn,s' Supposing (for the time being) that Xl has sufficiently
many finite moments, the rth cumulant Xr(Vn,s) of Vn,s can be expressed
as
Xr (TT) -
Vn,s = Xr,n,s + 0 (n -(s-2)/2) (6.51)
for 1 ~ r ~ s, where

_ { 2:;:~ n- j / 2Xr,j if 1 ~ r ~ s, r =f. 2


Xr n s - (6.52)
, ,
T
2 + ""s-2
L..Jj=l n -j/2 X2,j
- l'f r =2
for some constants Xr,j, not depending on n. It can be shown that the
constants Xr,j, 1 ~ j ~ s - 2, 1 ~ r ~ s depend only on the moments
EXr for 1 ~ Ivl ~ s and on the partial derivatives D V H(J.L) for Ivl ~ s-1.
The Xr,n,s's in (6.51) are called the approximate cumulants of Vn,s' Thus,
when EIIX 1 11 s < 00, we may formally expand Xr(Vn,s) (pretending that
all moments of Xl are finite) and then extract the approximate cumulants
Xr,n,s for 1 ~ r ~ s, which involves only moments of order s or less. The
Fourier transform of (the density of) the Edgeworth expansion for Vn,s
(and, hence, for WIn) is given by

'ljJ~,lst(t) = exp(-eT 2/2) [1 + ~n-r/2p~ll(tt)] , (6.53)

t E ~, where p~ll (.), ... ,p~1~2(') are polynomials defined by the identity (in
u E~)

1 + ~1
00 [ s ( s-2
~(r!)-1 ~ UjXr,j (ttr
) ] m/ m!
162 6. Second-Order Properties

00

= 1 + :~::>jp1l1(Lt) , (6.54)
j=l

for t ERAs in Section 6.2, the Edgeworth expansion \]!~,ls for Vn,s is the
signed measure having the density (with respect to the Lebesgue measure
on ~)

(27r)-1 J exp( -Ltx)'¢~,ll(t)dt

[1+ ~n-r/2p~11( - d~)]¢r2(X)'


(1 + ~n-r/2p~11(X))¢r2(X)' x E~, (6.55)

say, where p~ll(_ d~) is defined by replacing (Lt)j in the definition of the
polynomial p}ll(it) with the differential operator (-1)j d~j' j ;::: 1, and
where ¢r2(X) = (27r7 2)-1/2exp(-x2/27 2), x E R The following result
of Bhattacharya and Ghosh (1978) shows that \]!~:s is a valid (s - 2)-th
order expansion for WIn, i.e., the error of approximating the probability
distribution of WIn by the signed measure \]!~,ls is of the order o(n-(s-2)/2)
uniformly over classes of sets satisfying an analog of (6.46).

Theorem 6.5 Suppose that {XihEZ is a sequence of iid ~d-valued random


vectors with EIIXlii s < 00 and that H is s-times continuously differentiable
in a neighborhood of J.l = EX l , where s ;::: 3 is an integer. If, in addition,
Xl satisfies the standard Cramer condition (6.31),then
sup Ip(Wln E B) - \]!~:s(B)1 = o(n-(S-2)/2)
BEB

for any collection B of Borel subsets ofIR satisfying (6.46) with d = 1 and
E =7 2 .

Proof: See Theorem 2(b) of Bhattacharya and Ghosh (1978). D

In the literature, the expansion \]!~,ls, defined in terms of the "approxi-


mate cumulants" Xr,n,s of (6.51) and (6.52), is often referred to as the for-
mal Edgeworth expansion of WIn. The seminal work of Bhattacharya and
Ghosh (1978) established validity of this approach of deriving an Edgeworth
expansion for WIn, settling a conjecture of Wallace (1958). They developed
a transformation technique that yielded an alternative valid expansion for
WIn and then showed that the formal expansion coincided with the alter-
native expansion up to terms of order O(n-(s-2)/2). As a result, \]!~,ls gives
a valid (s - 2)-th order Edgeworth expansion for WIn' For related work on
6.4 Expansions for Functions of Sample Means 163

this problem, see Chibishov (1972), Skovgaard (1981), Bhattacharya and


Ghosh (1988), Bai and Rao (1991), and the references therein.

6.4.2 Expansions for Normalized and Studentized


Statistics Under Independence
Note that Theorem 6.5 readily yields an (8 - 2)-th order Edgeworth expan-
sion for the distribution of the normalized (or standardized) version of en,
defined by
(6.56)
Indeed, P(W2n ~ x) = P(Wln ~ TX) = w~,ls((-OO,TX]) +o(n-(s-2)/2)
uniformly in x E R Hence, a valid (8 - 2)-th order Edgeworth expansion
for the distribution function of W 2n is given by w~,ls, with

W~,l8 (( -00, TX])

{Xoo (1 + ~n-r/2p~2l(y)}P(Y)dY, x E lR,

for polynomials p~2l, ... ,P~~2' where by (6.55) and a change of variables, it
easily follows that p~2l (x) = p~ll (TX), x E R
Next consider the case of studentized statistics. It turns out that in the
independent case, we can also apply Theorem 6.5 with a "suitable H" to
obtain an Edgeworth expansion for the studentized version of en,
given by

(6.57)

where f~ == LI<>I=l LII3I=1 D<> H(Xn)D{3 H(Xn)[n- 1 L~=1 (Xi -Xn)<>(Xi-


X n ){3] is an estimator of the asymptotic variance T2 of v'n(e n - B) (cf.
(6.48)). To appreciate why, note that we may express W3n as a smooth
function of the sample mean of the d + d( d + 1) /2-dimensional iid random
vectors Yi, i = 1, ... , n, where the first d components of Yi are given by Xi
and the last d( d+ 1) /2 components are given by the diagonal and the above-
the-diagonal elements of the d x d matrix XiX;' If the function H (defining
en) in (6.47) is 8-times continuously differentiable in a neighborhood of fL =
EX 1 , if EI1Y111s < 00, and if Y1 satisfies the standard Cramer's condition,
then by Theorem 6.4, W 3n has an (8 - 2)-th order Edgeworth expansion of
the form

BEB
sup IP(W3n E B) - r [1 + ~
iB j=l
n-j/2P13l(X)] ¢(X)dxl = o(n-(s-2)/2)

(6.58)
for any collection 13 of Borel subsets of lR satisfying (6.30) with d = 1, where
2
pFl, ... ,p13 2 are polynomials and where, ¢(x) = (27r)-1/2 exp( _x2 /2),
164 6. Second-Order Properties

x E JR is the density of a standard normal random variable. Without addi-


tional parametric distributional assumptions on the Xi'S, the polynomials
. 1 d' iX ' 1s PI[2] , ... , Ps[2]-2 t h at
PI[3] , ... , Ps[3]-2 are typIC a ly 1uerent f
rom thel
po ynomm
appear in the expansion for the normalized version W 2n of en. For an ex-
ample, consider the case when en = X n , the sample mean of a set of n iid
random variables (with d = 1). Then, T2 = (J'2 = Var(Xd, and by (6.16)
and (6.20), a first-order Edgeworth expansion (with s = 3) for W 2n is given
by

P(W2n ::; x) = <l>(x) - 3 (x 2 - l)¢(x)


1;;::: f.-L 3 + o(n- 1 / 2) (6.59)
6 y n(J'

uniformly in x E JR, where f.-L3 = E(X1 - f.-L)3, (J'2 = Var(Xd, and


<l>(x) = f'.oo ¢(y)dy, x E JR is the distribution function of a standard nor-
mal random variable. The corresponding first-order Edgeworth expansion
for the studentized version W 3n of (6.57) for en = Xn is given by (cf. Hall
(1992), page 71-72),

P(W3n ::; x) = <l>(x) + 1;;::: f.-L 3 (2x2 + l)¢(x) + o(n- 1/ 2) , (6.60)


6 y n(J'3
uniformly in x E R Of course, the regularity conditions required for the
validity of the two expansions are different, with the studentized case re-
quiring stronger moment and/or distributional smoothness conditions. The
key observation here is that in the independent case, Edgeworth expansions
for the studentized statistics can be obtained using the same techniques as
those employed for the normalized statistics under the Smooth Function
Model. However, the same is no longer true in the dependent case, as ex-
plained below. For various alternative approaches to deriving expansions for
studentized estimators under independence, see Hall (1987), Gotze (1987),
Helmers (1991), Lahiri (1994), Hall and Wang (2003), and the references
therein.

6·4.3 Expansions for Normalized Statistics Under


Dependence
Next we turn our attention to the case of dependent random vectors.
Let {XdiEZ be a sequence of stationary JRd-valued random vectors with
EX1 = f.-L and let en be an estimator of a parameter of interest () based
on Xl"'" X n , where () and en satisfy the Smooth Function Model formu-
lation (6.47). If the function H is continuously differentiable at f.-L and Xn
satisfies the Central Limit Theorem (cf. Theorem A.8, Appendix A), then

(6.61)

where T! = L:1"1=1 L: 1,61=1 C"c,6I: oo (a, (3), c" = D" H(f.-L)/a! and for lal =
1(31 = 1, a, (3 E Z~, I: 00 (a, (3) == I:(a, (3) = limn--+oo E[v'n(Xn - f.-L)]"+,6 =
6.4 Expansions for Functions of Sample Means 165

LjEZ E(X1 - f-L)a (X Hj - f-L)f3. In the dependent case, a valid (s-2)-th order
Edgeworth expansion (s 2': 3) can be derived for the normalized version

(6.62)

of the estimator en by applying the transformation technique of Bhat-


tacharya and Ghosh (1978) to the (s - 2)-th order Edgeworth expansion
for the centered and scaled mean Sn == fo(Xn - f-L). Indeed, if the con-
ditions of Theorem 6.3 hold and the function H is s-times continuously
differentiable in a neighborhood of f-L, then there exist polynomials q}2J,
r = 1, ... , s - 2 such that

sup iP(W2n :S x) - y~,Js((-OO,x])i = o(n-(s-2)/2) , (6.63)


xElFt

where y~,Js is the signed measure with the Lebesgue density

n-2
~~Js(x) = 1>(x) +L n-r/2q~2J(x)¢(x), x E JR .
r=l
As mentioned in Section 6.3, under the stationarity of the process {XdiEZ'
the vth cumulant Xv,n of Sn for v E zt,
2 :S Ivl :S s, may be expressed in
the form (cf. (6.42))

-
Xv,l,oo + n -1/2 Xv,2,oo
- + . . . + n -(s-2)/2-Xv,s-l,oo
+ o(n-(S-2)/2) as n ----> 00 (6.64)

for some Xv,j,oo E R The coefficients of the polynomials q~2J, ... ,ql~2 are
smooth functions of the partial derivatives D V H(f-L) , Ivl :S s -1, and of the
constants Xv,j,oo, 1 :S j :S s - 1,2 :S Ivl :S s, appearing in (6.64).
Although under the stationarity assumption on the process {XdiEZ' it
is possible to describe the Edgeworth expansion of W2n in terms of the
polynomials q}2J that do not depend on n, in practice one may group some
of these terms together to describe the Edgeworth expansion in terms of
the moments (or cumulants) of the centered and scaled sample mean Sn
directly. For example, a first-order Edgeworth expansion for P(W2n :S x)
(with s = 3) is given by

(6.65)

x E JR, where the constants K31 and K32 are given by K31 == K31n =
Llal=2 caES~ /Tn and K32 == K32n = [foE(Llal=l caS~)3 - 3T~K31 +
3E{ (Llal=l CaS~)2(Llal=2 CaS~)}l/(6T~). Here, T~ = Var(Llal=l caS~)
and Ca = DaH(f-L)/o;!, 0; E zt.
166 6. Second-Order Properties

The expansion y~l3 of (6.65) may be further simplified and rewritten


in the form (6.63). We also point out that Y~,l3 also gives the first-order
Edgeworth expansion of the alternative normalized version of en,

where the limiting standard deviation 7= is replaced by 7 n. This follows


by noting that, under the condition of Theorem 6.3 with s = 3, 7! - 7~ =
O(n- 1 ), and hence, the effect ofreplacing 7= by 7 n is only O(n- 1 ), which
is negligible for a first-order Edgeworth expansion.

6.4.4 Expansions for Studentized Statistics Under


Dependence
Next, we consider the studentized case. Under weak dependence, the
asymptotic variance of ,jTi(en - B) is given by (cf. (6.61))

7! = L COV(Y1, 1j+d ,
jEZ

where we write Yj = LI<>I=l co:(Xj-p)O:, j E Z. Since 7! is an infinite sum


of lag covariances, a studentizing factor must estimate an unbounded num-
ber of lag-covariances, as the sample size n increases. A class of estimators
of 7! (cf. Gotze and Kiinsch (1996)) is given by

(£-1)
f~ = L Wkn[h(Xn)'fn(k)h(Xn )] (6.66)
k=O
n-£ - -.
where r n(k) = n- 1 Lj=l (Xj - Xn)(Xj+k - Xn)', h IS the d x 1 vector of
A

first-order partial derivatives of H, and Wkn'S are lag weights, with WOn = 1
and Wkn = 2w( k / C), 1 :::; k :::; C-1 for some continuous function W : [0, 1) --+
[0, 1] with W (0) = 1. If C --+ CXJ and n / C --+ CXJ as n --+ CXJ, then f~ is consistent
for 7!. We define the studentized version of en as

(6.67)

which has a standard normal distribution, asymptotically. In contrast to


the case of studentized statistics under independence, Edgeworth expan-
sions for W3n cannot be directly obtained from the Edgeworth expan-
sion theory described above. This is because W3n is a (smooth) func-
tion of an unbounded number of sample means, while the classical the-
ory deals mainly with sample means of a fixed finite dimension. Re-
cently, first-order Edgeworth expansions for studentized statistics of the
form W3n have been independently derived by Gotze and Kiinsch (1996)
6.4 Expansions for Functions of Sample Means 167

and Lahiri (1996a). While Gotze and Kiinsch (1996) considered studen-
tized statistics under the Smooth Function Model (6.47), Lahiri (1996a)
considered studentized versions of M-estimators of the regression param-
eters in a multiple linear regression model. Here we follow Gotze and
Kiinsch (1996) to describe the Edgeworth expansion result for W3n . Re-
call the notation Yn = In
L7=1 1j = LIc>I=l cc>S~, Sn = 'L7=1 (Xj -In
fL), and T~ = n-1Var(L~=1 Yi). Let Tfn = 'L~:~ wknEY1Yl+k, 1Tn =
n- 1 L~=l L7=1 L~:~ Wkn E (Yi1j1j+k), and fL3,n = n 2E(Yn)3. Also, let
3 n denote the variance matrix of the (d + 1) x 1 dimensional vector
W4n == (vnYn; S~)' and let a'Y's be constants defined by the identity

(2Tn )-1 L DC> H(fL)S~


1"1=2
- T;;-3 { y'nYn }S~ [D2 H(fL)~ooh(fL)] ,

where D2 H(fL) is the d x d matrix of second-order partial derivatives of H


at fL, ~oo == ~ = limn-->oo Var(Sn) (cf. Condition (C.2)), and h(fL) is the
d x 1 vector of first-order partial derivatives of H at fL. Note that in the
left-hand side of the identity, the index "( E Z~+l, while on the right-hand
side, the index a E Z~. With this, we define the first-order Edgeworth
expansion Y ~,13 of W3n in terms of its Fourier transform

~[31t (t)
n,3 Jexp(d'x)dY~,13(x)
1 + -1 . -1 [(fL3n
- - -1Tn) (d) 3 - (d)1T n ]
vn T~ 6 2
-
2
- exp(-t 2 /2)

+ ~(d)
vn t
'Y
a'Y( -l)hID'Y exp( -w' 3 n w/2) I _
w-(t,o, ... ,O)

(6.68)

Then, we have the following result due to Gotze and Kiinsch (1996) on
Edgeworth expansion for the studentized statistic W3n under dependence.

Theorem 6.6 Suppose that Condition (5.Dr) of Section 5.4 on the func-
tion H holds with r = 3, Llal=l!D"H(fL)! i= 0, and that E!!X1!!P+" < 00
for some J > 0 and p 2: 8, pEN. Furthermore, suppose that

logn« I!:S; n 1 / 3 (6.69)

and that Conditions (C.2)-(C.6) of Section 6.3 hold. Then,

~~~IP(W3n :s;x)-y~,13((-00'x])1 =O(l!n-1+[2/pl+IT~-T?nl)· (6.70)


168 6. Second-Order Properties

Proof: See relations (6) and (7) and Theorem 4.1 of G6tze and Kiinsch
(1996). 0

Note that under the conditions of Theorem 6.6, the second term IT; -Tfn I
on the right side of (6.70) is o(n- 1 / 2 ) if the weight function w(x) == 1 for
all x E [0,1). A drawback of this choice of the weight function is that it
does not guarantee that the estimator f; of the asymptotic variance T! is
always nonnegative. However, under the regularity conditions of Theorem
6.6, the event {r; ::; O} has a negligible probability and it does not affect the
rate of approximation 0(£n-1+ 2 / p ) of the first-order Edgeworth expansion
T~,13((-oo,x]) to P(W3n ::; x). Another class of popular weights are given
by functions w(·) that satisfy w(x) = 1 + 0(x 2) as x --+ 0+. For such
weights, IT; - Tfnl = 0(£-2) and thus, in such cases, £ must grow at a
faster rate than n 1 / 4 to yield an error of o(n- 1 / 2 ) in (6.70).

6.5 Second-Order Properties of Block Bootstrap


Methods
In this section, we establish second-order correctness of block boot-
strap methods under the Smooth Function Model (6.47). Accordingly, let
{Xj}jEZ be a sequence of JRd-valued stationary random vectors and let
() and On be as given by () = H(J.L) , On = H(Xn ), where J.L = EX!,
Xn = n- 1 L~=l Xi, and H : JRd --+ JR is a smooth function. Also, let
W 2n be the normalized version of On and W3n be the studentized version
of On, given by (6.62) and (6.67), respectively. Then, W2n and W3n are
asymptotically pivotal quantities for the parameter (), in the sense that the
limit distributions of W 2n and W 3n are free of parameters. Block bootstrap
methods applied to these pivotal quantities are second-order correct. The
bootstrap estimators of the distribution functions of W2n and W 3n not only
capture the limiting standard normal distribution function, but also capture
the next smaller order terms (viz., terms of order n- 1 / 2 ) in the Edgeworth
expansions of W2n and W3n . As a result, for such pivotal quantities, the
bootstrap distribution function estimators outperform the normal approx-
imation and are second-order correct. As indicated in Section 6.1, this can
be easily shown by comparing the Edgeworth expansions of Wkn 's and their
bootstrap versions Wk'n' k = 2,3. First we consider the normalized statistic
W2n and the bootstrap approximation generated by the MBB method. Let
X~ denote the MBB sample mean based on a random sample of b = lnj£J
blocks from the collection of overlapping blocks {Bi : 1 ::; i ::; N} of length
£, where, recall that, Bi = (Xi, . .. ,XHl- 1 ), 1 ::; i ::; N, and N = n - £ + l.
Then, the MBB version of W2n is given by

(6.71)
6.5 Second-Order Properties of Block Bootstrap Methods 169

where n1 = b.e, ()~ = H(X~), and, with iln == E*(X~), On = H(iln) and T~ =
n1·Var*(LI",I=1 D'" H(iln)(X~)"'). Note that conditional on Xl' ... ' X n, X~
is the average of a collection of b iid random vectors. Hence, an expansion
for W2'n may be derived using the Edgeworth expansion theory of Sections
6.2 and 6.4 for independent random vectors. The exact form of the first-
order Edgeworth expansion for W2'n is given by

where, with C'" = D"'H(iln)/a!, a E Zi, and S~ = yInl(X~ - iln), the


coefficients JC 31 and JC 32 are defined as

JC31 == JC31n (.e) = L c",E*(S~)"'/Tn,


1"'1=2

and

[y'nE* ( L c",(S~)"') 3
1"'1=1

+3E*{( L c",(S~)"')\ L c",(S~)"')}


1"'1=1 1"'1=2
- 3TnJC31] / (6T~) .

The following result establishes second-order correctness of the MBB for


the normalized statistic W2n .

Theorem 6.7 Suppose that {XihEZ is stationary, Conditions (C.2)-(C.6)


hold and EIIXl 11 35 +O < 00 for some 8 > O. Furthermore, suppose that
Condition (5. Dr ) of Section 5.4 on the function H holds with r = 4 and
that the block length .e satisfies

(6.73)

for all n ~ c 1 , for some f E (0,1). Then,


(a) as n --+ 00,

(b) as n --+ 00,

sup Ip*(W2'n ::; x) - P(W2n ::;


xEIR
x)1 = Op(n- 1.e + n- 1/ 2.e- 1) . (6.74)
170 6. Second-Order Properties

Proof: Part (a) is an easy consequence of Lemma 5.6 of Lahiri (1996d),


who also obtains a bound on the MSE of the MBB distribution function
estimator P* (W;n ~ .). As for part (b), note that under the regularity
conditions of Theorem 6.7,

sup Ip(W2n
xER
~ x) - Y~,J3(( -00, xl) 1= 0(n-1) .
Hence, by part (a) and (6.65),

sup Ip*(W;n ~ x) - P(W2n ~ x)1


xER
sup It~J3(( -00, xl) - y~J3 (( -00, xl) I + Op(n-1e)
xER ' ,
(6.75)
To complete the proof of part (b), without loss of generality, we set t-t = o.
Then, it is easy to check that JC 31 - J(31 is a smooth function of the centered
bootstrap moments {(E*(Ui1Y - E(UuY) : JvJ = 1,2} and (JC 32 - JC 32 )
is a smooth function of {Vl(E*(Ui1Y - E(U11 )) : JvJ = 3} U {(E*(Ui1)V-
E(U11 )) : JvJ = 1,2}, where Uu = (Xl + ... + Xe)/Vl and Ui1 = (Xi +
... + X;)/Vl. The rate of error in (6.75) is determined by the first set of
terms {Vl(E*(Ui1r - EU11 ) : JvJ = 3}, whose root-mean-squared-error is
bounded by

max {eEIE*(Ui1t - E(U11 )1 2 }1/2 + max IVCE(U11 ) - VnE(S~)I·


Ivl=3 Ivl=3

The first term is of the order 0(n- 1/ 2 e), by Lemma 3.1. It is easy to check
that the second term is of the order 0(e- 1 ). This completes the proof of
Theorem 6.7. 0

Theorem 6.7 shows that the MBB approximation to the distribution of


the normalized statistic W2n is more accurate than the normal approxi-
mation, which has an error of 0(n- 1 / 2 ). Thus, like the IID bootstrap for
independent data, the MBB also outperforms the normal approximation
under dependence.
A proof of this fact, with the right side of (6.74) replaced by "o(n- 1 / 2 )
a.s.," was first given in Lahiri (1991, 1992a). The second-order analysis of
Lahiri (1991, 1992a) also show that for the MBB, the correct centering
for the bootstrapped estimator ()~ = H(X~) is On = H(fln), not the more
naive choice On = H(Xn). Indeed, if ()~ is centered at On and we define the
bootstrap version of W2n as
(6.76)
then, the error of approximation, sUPx JP*(W;';: ~ x) - P(W2n ~ x)J goes
to zero precisely at the rate n- 1 / 2 e1 / 2 , in probability. As a result, centering
6.5 Second-Order Properties of Block Bootstrap Methods 171

()~ at en yields an approximation that is worse than the normal approxi-


mation. This problem does not occur with the IID bootstrap method for
independent data as the conditional expected value of X~ is Xn-
A second and more important difference of the MBB with the IID boot-
strap is that the rate of MBB approximation depends on the block length
and is typically worse than Op(n- 1 ). Indeed, compared to the IID bootstrap
of Efron (1979) for independent data, where the error of approximation is
of the order Op(n- 1 ) (cf. Section 2.2), the best possible rate of MBB ap-
proximation for distribution function estimation is only O(n- 3 / 4 ), which is
attained by blocks of length £ of the order n 1/4.
Next, we consider the MBB approximation to the distribution of the
studentized statistic W3n . Here, we follow Gotze and Kiinsch (1996) to
define the bootstrap version of W3n, although other alternative definitions
of the bootstrap version of W3n are possible (cf. Lahiri (1996a)). Recall that
Uri = (X Ci - 1 )Hl + .. .+Xte)/VR denotes the sum of the ith resampled MBB
block scaled by £-1/2, i = 1, ... ,b and that UrI' ... , Urb are conditionally
iid with the common distribution

where U1i = (X i +·· .+XiH-d/VR and b = In/£J. To define the bootstrap


version of the studentizing factor for yInl(()~ - en), note that by Taylor's
approximation, the linear part of yInl(()~ - en) is

L~ LC"[vnl(X~-Pnr']
1"1=1
b
b- 1 / 2 L {L C" (U~i - Pn Vcr'}
i=1 1<>1=1
b
b- 1 / 2 LYl~' say,
i=1
where C" = D" H(Pn)/a!, a E Zi. Hence, Var*(L~) = Var*(Ytl). This
suggests that an estimator of the conditional variance Var *(Ytl) is given
by the "sample variance" of the iid random variables Ytl' ... ,Ytb. Hence,
with Y"tb = b- 1 I:~=1 Yl~' we define
b
*2 =
Tn
b-1 "(y*
~ 1z
_ y*)2
Ib , (6.77)
i=1
as an "estimator" of Var*(Ytl) and define the bootstrap version of the
studentized statistic W3n as
172 6. Second-Order Properties

Gotze and Kiinsch (1996) suggested setting the MBB block length £ to be
equal to the smoothing parameter £ in the definition of the studentizing
factor f~ (cf. (6.66)). However, as they pointed out, second-order correct-
ness of the MBB approximation holds for other choices of the block length £
satisfying (6.69). See the last paragraph on page 1217 or Gotze and Kiinsch
(1996). For notational simplicity, we suppose that the block size parame-
ter £ and the lag-window parameter £ in (6.66) are equal. With this, we
now define the first-order Edgeworth expansion y~l3 of W3'n in terms of its
Fourier transformation (cf. (6.68)) .

lt
n,3 (t)
€[3 J exp( d ' x )dY ~,l3 (x)

M3
[1 + -'_- n { 1 3 I}] exp( -t
- -(Lt) - -(d) 2 /2)
ylnT~ 3 2

+ ~(d) ta,(-l)hID' exp(-w' Bn w/2) I _ '


yin , w-(t,o, ... ,O)

(6.78)

where M3,n = £1/2E*(Ytl)3, T~ = E*(Ytl)2, Bn = Var*((Ytl,Ui~)'), and


aa's are defined in analogy to the aa's of (6.68), with J.L replaced by Mn.
The following result establishes second-order correctness of the MBB in
the studentized case.

Theorem 6.8 Suppose that {XihEZ is stationary, Conditions (C.2)-(C.6)


hold, and Ellx11lqpH < 00 for some r5 > 0, and for some integers q 2: 3,
p 2: 8. Also, suppose that Condition (5. Dr ) of Section 5.4 holds with r = 3
and that £ satisfies (6.73). Then,
(a) as n ----+ 00,

(b) as n ----+ 00,

~~~ Ip* (W;n :::; x) -P(W3n :::; x)1 = Op (n-1+2/P£+n-l/2g-1 + IT~ -Trnl)
Proof: See Theorems 4.1 and 4.2 of Gotze and Kiinsch (1996). D

As in the case of the normalized statistic W2n , under additional moment


conditions, the rate of approximation in part (a) of Theorem 6.8 can be
shown to be Op(n- 1 £+n- 1/ 2£-1) (cf. Lahiri (2003c)). In particular, the rate
of MBB approximation in the studentized case also depends on the block
length. For second-order correctness, not only is the choice of £ (which now
represents the block length and, also, the smoothing parameter appearing
6.5 Second-Order Properties of Block Bootstrap Methods 173

in the definition of the studentizing factor f~) important, but also is the
choice of the weight function w(·). Lahiri (1996a) considers the case where
the weight function w(·) == 1 and employs a different definition of the
bootstrap studentized statistic to establish second-order correctness of the
MBB for M-estimators in a multiple linear regression model. Relative merits
of the two approaches are not clear at this stage.
Second-order correctness of the NBB and the CBB, which are also based
on independent resampling of blocks of a nonrandom length, can be es-
tablished using arguments similar to those used in the proofs of Theorems
6.7 and 6.8. See Hall, Horowitz and Jing (1995) and Politis and Romano
(1992a) for a proof in the normalized case for the NBB and the CBB,
respectively. As for the SB, Lahiri (1999c) developed some iterated condi-
tioning argument to deal with the random block lengths in the SB method
and established second-order correctness of the SB method for studentized
statistics. For second and higher order investigations into the properties
of bootstrap methods for some popular classes of estimators in Economet-
rics (e.g., the "Generalized Method of Moments" estimators), see Hall and
Horowitz (1996), Inoue and Shintani (2001), Andrews (2002), , and the
references therein.
7
Empirical Choice of the Block Size

7 .1 Introduction
As we have seen in the earlier chapters, performance of block bootstrap
methods critically depends on the block size. In this chapter, we describe
the theoretical optimal block lengths for the estimation of various level-2
parameters and discuss the problem of choosing the optimal block sizes
empirically. For definiteness, we restrict attention to the MBB method.
Analogs of the block size estimation methods presented here can be de-
fined for other block bootstrap methods. In Section 7.2, we describe the
forms of the MSE-optimal block lengths for estimating the variance and the
distribution function. In Section 7.3, we present a data-based method for
choosing the optimal block length based on the subsampling method. This
is based on the work of Hall, Horowitz and Jing (1995). A second method
based on the Jackknife-After-Bootstrap (JAB) method is presented in Sec-
tion 7.4. Numerical results on finite sample performance of these optimal
block length selection rules are also given in the respective sections.

7.2 Theoretical Optimal Block Lengths


Let (Xl"'" Xn) = Xn denote a finite stretch of random variables, observed
from a stationary weakly dependent process {XihEZ in ]Rd. Let On be an
estimator of a level-1 parameter of interest () E ]R, based on X n . In this
section, we obtain expansions for the MSEs of block bootstrap estimators
176 7. Empirical Choice of the Block Size

for various characteristics of the distribution of On. Let G n denote the


distribution of the centered estimator (On - 0), i.e.,

(7.1)

The level-2 parameters of interest here are given by

'PIn = Bias(On) = / xdGn(x) (7.2)

'P2n = Var(On) = / x 2dGn(x) - ( / xdGn(x)r (7.3)

(7.4)

fo(On-O)1 ) (YOTn) (-YOTn )


'P4n = 'P4n(YO) == P ( l Tn ::; Yo = G n fo - Gn fo '
(7.5)
where Xo E JB; and Yo E (0,00) are given real numbers and where T~ is the
asymptotic variance of fo(On - 0). Here, 'PIn and 'P2n are, respectively,
the bias and the variance of the estimator On, 'P3n denotes the (one-sided)
distribution function of fo(On -0) at a given point Xo E JB;, and 'P4n denotes
the two-sided distribution function of fo(On -0) at Yo E (0, (0). The latter
is useful for constructing symmetric confidence intervals for 0 (cf. Hall
(1992)). Next, for k = 1,2,3,4, let tPkn (C) denote the MBB estimators ofthe
level-2 parameter 'Pkn based on blocks of length C. We define the theoretical
optimal block length C~n as the minimizer of the MSE of tPkn(C) over a set
of values of the block size C, depending on k = 1,2,3,4. Specifically, we
define

C~n = argmin{ MSE(tPkn(C)) : mE < C < E- 1 n 1 / 2- E } , k = 1,2 (7.6)

C~n = argmin{MSE(tPkn(C)) : mE::; C::; E- 1 n 1 / 3- E }, k = 3,4 (7.7)

for some small E > 0. It will follow from the arguments and results below
that the theoretical optimal block length C~n is of the order n 1 / 3 for the
bias and the variance functionals (with k = 1,2), while the order of C~n
for the one- and the two-sided distribution functions, with k = 3 and
k = 4, are of the orders n 1/4 and n 1/5, respectively. Thus, the ranges
[mE, c 1 n 1/ 2 - E ] and [mE, c 1 n 1 / 3 - E ] of block lengths C in (7.6) and (7.7),
respectively, contain the optimal block lengths C~n for all k = 1,2,3,4.
Indeed, it can be shown that under some additional regularity conditions,
the theoretical optimal block lengths C~n have the same order even when
the ranges of C values in (7.6) and (7.7) are replaced by the larger interval
[m', c 1n 1 -,] for an arbitrarily small E E (0,1). However, we will restrict
7.2 Theoretical Optimal Block Lengths 177

e
attention to the range of values specified by (7.6) and (7.7) and will not
pursue such generalizations here.
For deriving expansions for the MSEs of the block bootstrap estimators
CPkn(e)'S, k = 1,2,3,4, we shall suppose that the level-1 parameter () and
its estimator On satisfy the requirements of the Smooth Function Model
(cf. Section 4.2). Thus, there exists a function H : lR d --t lR such that
(7.8)
and the function H is "smooth" in a neighborhood of j.L, where j.L = EX1
and Xn = n- 1 2:~=1 Xi. Recall that we write Co = DO H(j.L)ja!, DO for the
.a:. aal +.,,+ad
d1llerentml operator ax"l ... ax"d and a.
'TId,
= i=l ai· for a = (0.1,·.·, ad )' E
1 d

Zi·
7.2.1 Optimal Block Lengths for Bias and Variance
Estimation
Expansions of the MSEs of the MBB estimators of the bias and the variance
of the estimator On under the Smooth Function Model (7.8) was given in
Chapter 5. Here, we recast the relevant results in a slightly different form
by expressing relevant population quantities in the time domain. Let Zoo
be a d-dimensional Gaussian random vector with mean zero and covariance
matrix :Eoo = 2:;:-00 E{(X1 - j.L)(X1+j - j.L)'}.

Theorem 7.1 Suppose that e- 1 + n- 1 / 2 e= 0(1) as n --t 00.

(a) Suppose that Conditions (5.D r ) and (5.Mr ) of Section 5.4 hold with
r = 3 and r = 3 + ao, respectively, where ao is as specified by (5.D r ).
Then

[(n-1e)~Var( L coZ~ ) + e- 2 Af]


101=2
+ 0(n- 1e+ g-2) , (7.9)
where

A1 =- L L
101=11J31=1
Co+J3 [ f
j=-oo
IjIE(X1 - j.L)O(X1+j - j.L)J3] .

(b) Suppose that Conditions (5.D r ) and (5.Mr ) of Section 5.4 hold with
r = 2 and r = 4 + 2ao, respectively, where ao is as specified by Con-

r)
dition (5.D r ). Then,

[(n-1e)~Var( ( L coZ~ +e-2A~l


101=1
+ 0(n- 1e+ g-2) , (7.10)
178 7. Empirical Choice of the Block Size

where

A2 =- L L CaCf3 [
lal=II,BI=1
f
j=-oo
IjIE(XI - J1.)a(X1+j - J1.)f3] .

Proof: Follows from the proofs of Theorems 5.1 and 5.2 for the case
'j = l' (corresponding to the MBB estimators). 0

Note that under the regularity conditions of Theorem 7.1, both the bias
and the variance of the estimator On are of the order O(n- I ). Hence, we
state the MSEs of the scaled bootstrap bias estimator n· (PIn(l) and of
the scaled bootstrap variance estimator n . CP2n(l), in Theorem 7.1. Al-
ternatively, we may think of the scaled bootstrap estimators n . CPkn(l)
as estimators of the limiting level-2 parameters 'Pk,oo == limn---+oo n . 'Pkn,
k = 1,2, given by

'PI,oo L L
lal=II,BI=1
Ca+,B [ f
j=-oo
E(XI - J1.)a(X1+j - J1.),B]

and

'P2,oo = L L
lal=II,BI=1
cac,B [ f
j=-oo
E(XI - J1.)a(X1+j - J1.),B] .

Theorem 7.1 immediately yields expressions for the leading terms of the
theoretical optimal block lengths for bias and variance estimation. We note
these down in the following corollary.
Corollary 7.1 Suppose that the respective set of conditions of Theorem 7.1
hold for the bias functional (k = 1) and the variance functional (k = 2),
and that the constants Al and A2 are nonzero. Then, for k = 1,2,

19n = nl/3(2AVv~)1/3 + o(n l / 3) , (7.11)

where v? = ~Var(I:lal=2 caZ~J and v~ = ~Var([I:lal=1 caz~J2).


Kiinsch (1989) derived the leading term of the theoretical optional block
length for the variance functional while Hall, Horowitz and Jing (1995)
derived the leading terms for both the bias and the variance functionals
'PIn and 'P2n· The conclusions of Corollary 7.1 can be strengthened to
some extent. A more detailed analysis of the remainder term in the proof
of Theorem 7.1 can be used to show that under some additional smoothness
and moment conditions, the o(n l / 3 ) term on the right side (7.11) is indeed
0(1) as n ~ 00, for both k = 1 and k = 2. Thus, the fluctuations of
the true optimal block length from its leading term is bounded for both
bias and variance functionals. In the next section, we consider theoretical
optimal block lengths for the estimation of distribution functions.
7.2 Theoretical Optimal Block Lengths 179

7.2.2 Optimal Block Lengths for Distribution Function


Estimation
First we consider the one-sided distribution function 'P3n of (7.4), given by

for a given value Xo E JR. Hall, Horowitz and Jing (1995) consider both
the NBB and the MBB estimators of 'P3n and derive expansions for the
MSEs in the case of the sample mean, i.e., in the case where en = Xn
and () = EX1. An expansion for the MSE of the MBB estimator (f!Jn(f)
(say) of 'P3n is obtained by Lahiri (1996d) under the Smooth Function
Model (7.8). Here we follow the exposition of Lahiri (1996d) and describe
an expansion for MSE (<P3n(f)) under the framework of G6tze and Hipp
(1983), introduced in Chapter 6. Suppose that {XihEZ is defined on a
probability space (O,F,P), {XihEZ is stationary, and that {VihEZ is a
given sequence of sub-a-fields of F. For -00 ::; a ::; b ::; 00, let V~ denote
the smallest a-field containing {Vi: i E [a, b] nil}. For easy reference, we
now restate some of the conditions from Section 6.3, under the stationarity
assumption on the process {XihEZ,

(C.1) There exists 8 E (0,1) such that for all n, m = 1,2, ... with m > 8-1,
there exists a V~~:-measurable random vector X~,m satisfying

(C.2) There exists 8 E (0,1) such that for all i E Il, mEN, A E V~oo, and
BE V'tt-m'

Ip(A n B) - P(A)P(B)I ::; 8- 1 exp( -8m) .

(C.3) There exists 8 E (0,1) such that for all m, n, k = 1,2, ... , and A E
n +k
V n-k

Elp(A IVj : j of n) - P(A IVj : °< Ij - nl


::; m + k)1 ::; 8- 1 exp( -8m) .

(C.4) There exists 8 E (0,1) such that for all m, n = 1,2, ... with 8- 1 <
m < n, and for all t E JRd with Iltll ~ 8,

EIE{ exp(it'[Xn - m + ... + X n +m ]) IVj : j of n} I ::; 8- 1 exp( -8m) .

(C.5) EIIXI!I35+8 < 00 for some 8 E (0,1).


180 7. Empirical Choice of the Block Size

Conditions (C.1)-(C.4) are restatements of Conditions (6.C.3)-(6.C.6) from


Chapter 6, respectively. For a discussion of these conditions, see Chapter
6. We do not state Condition (6.C.2) separately here, as it follows from the
conditional Cramer Condition (C.4) and the stationarity of {XihEZ, The
moment Condition (C.5) is rather stringent. Lahiri (1996b) used this condi-
tion to prove negligibility of the remainder terms in the second-order Edge-
worth expansion of the bootstrap distribution function estimator <P3n U~) in
the L2-norm.
The following result gives an expansion for the MSE of <P3n (£) ==
<P3n(XO; £) for a given Xo E R
Theorem 7.2 Assume that Conditions {C.l}-{C.5} hold and that the
smoothness Condition {5.D r } of Section 5.4 on the function H holds with
r = 4. Also, suppose that for some E E (0,3),

(7.12)

Then, there exist constants V31,V32 E (0,00) and B31,B32 E; lR such that
for Ixol ¥= 1,

[(x~ - l)¢(xo)] 2 V~l . n- 2£2

r
MSE( <P3n (xo; £) )

+ [¢(XO){B31 + B32(X~ -1)} n- 1.e- 2

+ o(n- 2£2 + n- 1£-2) , (7.13)

and for Ixol = 1,

MSE( <P3n (xo; £) )


(7.14)

Proof: See Lahiri (1996d). o

From the Edgeworth expansion results of Chapter 6 (cf. Theorem 6.7),


it follows that

<P3n(XO;£) = lJ>(xo) - n- 1/ 2{ K31(£) + (x~ -1)K32(£) }¢(xo)


+ Op(n-l) ,

where K3i (£) == K 3in (£), i = 1, 2 are smooth functions of certain bootstrap
moments. For Ixol ¥= 1, the leading term of the variance of <P3n(XO; £) comes
from the variance of the dominant term n-l/2(x~ -1)K32(£), which is of the
order (n- 1/ 2)2.n -l£2. In contrast, for Ixol = 1, the term n-l/2(x~-1)K31(£)
is zero and in this case, the leading term in the variance of <P3n(XO; £) is
given by the variance of n- 1/ 2K 31 (£), which is of the order (n- 1/ 2)2 . n- 1£.
7.2 Theoretical Optimal Block Lengths 181

On the other hand, the contribution to the bias of tP3n (xo; £) comes from
both !C 31 (£) and !C 32 (£), each having a bias of the order £-1. This explains
the sources of the various terms in the expansions for MSE( tP3n (xo; £))
in (7.13) and (7.14). The exact forms of the population quantities V31,
V32, B 31 , and B32 are very complicated, and hence are not presented here.
Interested readers are referred to Lahiri (1996d) for explicit expressions for
these parameters. Interestingly, neither of the two empirical methods, that
we describe in Sections 7.3 and 7.4 below for data-based selection of the
optimal block sizes, requires explicit definitions of these parameters.
Theorem 7.2 readily yields the following asymptotic expressions for the
optimal block lengths for estimating 'P3n(XO)'

Corollary 7.2 Assume that the conditions of Theorem 7.2 hold. Then, for
Ixol =1= 1,

n1/4 [{ B31 + (X6 _ 1)B32 } 2/ {(X6 _ 1)V31 } 2] 1/4

+ o(n 1/4 ) (7.15)

and for Ixol = 1,


1/3
£03n =- £03n (x 0 ) = n1/3 [2B231 /v 32
2] + o(n1/3) . (7.16)

Thus, the optimal block length for estimating the distribution function
of the normalized version of en is of the order n 1/4 at any given point
Xo E lR, Ixol =1= 1. For Ixol = 1, the optimal order is n 1/3 , the same as that
for estimating the bias and variance parameters 'P1n and 'P2n (cf. (7.11)).
Relations (7.15) and (7.16) give optional block lengths for local estimation of
the distribution function of the pivotal quantity fo( en -0) / Tn. The optimal
block length for global estimation of the distribution function 'P3n(-) ==
P( fo( en - 0) / Tn S .) can be obtained by minimizing an expansion for the
(weighted) mean integrated squared error (MISE) of tP3n (. ). An integration
of the expansions (7.13) and (7.14) yields

MISE (tP3n('; £)) E J [tP3n(X;£) - 'P3n(X;£)f wo(x)dx


v~3n-2p + B~3n-1r2 + o(n- 2£2 + n- 1£-2) ,
(7.17)

where woO : lR -+ (0,00) is a nonnegative weight function with


J wo(x)dx E (0, (0) and where V53 = V51 J(x 2 - 1)2¢>(x)2wo(x)dx and
B53 = J ¢>(x)2 [B31 + B32(X 2 -l)j2wo(x)dx. Hence, the global optimal block
length, defined as

(7.18)
182 7. Empirical Choice of the Block Size

for a given E E (0, ~), is given by

o _ 1/4 [2 2] 1/4
1'3n ,global - n B33/V33 + o(n 1/4 ). (7.19)

Next, consider the two-sided distribution function 'P4n(XO) == P(lvn(e n -


O)/Tnl S xo), Xo E (0,00). Hall, Horowitz and Jing(1995) give an expansion
of the MSE of the MBB estimator tP4n (xo; 1') for the case where en = Xn
and 0 = EX1 . In this case, they show that the optimal block length for
estimating the level-2 parameter 'P4n(XO) is of the form
- 1'04n (x 0 ) = n 1/ 5c,0 (x 0 ) + o(n+ 1/ 5)
1'04n = (7.20)
for some constant Co(xo) E (0,00). We refer the interested reader to Hall,
Horowitz and Jing (1995) for further details in the two-sided case. Thus, one
needs to use blocks of a smaller order (viz., n 1 / 5 ) for optimal estimation of
probabilities assigned to symmetric intervals than those in the asymmetric
case.
As pointed out in Chapter 1, the MSE and the optimal block length 1'0
are population-parameters that are determined by the sampling distribu-
tions of the bootstrap estimators of a level-2 parameter, and therefore, may
be regarded as level-3 parameters. Thus, a general approach to the estima-
tion of MSE(lf'n(1')) and 1'0 is to apply two rounds of resampling methods
iteratively. In Sections 7.3 and 7.4, we describe two such general meth-
ods, proposed by Hall, Horowitz and Jing (1995) and Lahiri, Furukawa and
Lee (2003), respectively. The method proposed by Hall, Horowitz and Jing
(1995) uses a combination of subs amp ling and bootstrapping the data. The
other method, proposed by Lahiri, Furukawa and Lee (2003), is based on the
Jackknife-After-Bootstrap method and it uses a combination of jackknifing
and bootstrapping the data. In the same vein, one may use two rounds of
block bootstrapping to estimate the level-3 parameters MSE( If'n (1')) and
1'0, although properties of this third alternative remain unexplored at this
time. Estimation methods tailored to estimate the optimal block size for a
specific functional are also known. For the case of the variance functional,
Biihlmann and Kiinsch (1999b) propose some novel plug-in estimators of
the optional block length for block bootstrap variance estimation and es-
tablish their convergence rates. For a more direct plug-in method in the
variance functional case, see Politis and White (2003). They employ the
"flat-top" kernel method of Politis and Romano (1995) to estimate the rel-
evant population parameters in the leading term of the optimal block size
given by Corollary 7.1 above.

7.3 A Method Based on Subsampling


In this section, we describe the Hall, Horowitz and Jing (1995) method for
choosing the theoretical optimal block size. For concreteness, suppose that
7.3 A Method Based on Subsampling 183

CPn(£) denotes the MBB estimator of the level-2 parameter 'Pn, based on
blocks of length £, where n is the sample size. Furthermore, suppose that
the MSE of CPn (£) admits an expansion of the form

MSE(CPn(£)) = an [Ctn - 1£r + C2£-2] (1 + 0(1)) as n --+ 00 (7.21)

for some constants C 1,C2 E (0,00), r E N, and for some sequence {an}n~1
of positive real numbers, over a suitable set In C N of block sizes. We shall
1 +
assume that the set In contains the set [n r+2 -€, n r+2 €] for some small
1

E E (0,1). Next, define the optimal block length £~ by

£~ == argmin{ MSE(CPn(£)) :l E In} . (7.22)

Note that by (7.21) and (7.22), the optimal block length £~ is of the order
1
n r + 2 • To define the Hall, Horowitz and Jing (1995) estimator of the theo-
retical optimal block length £~, we proceed as follows. Let m == mn be a
sequence of real numbers satisfying

(7.23)

Consider the subsamples Xi,m == (Xi"'" X Hm - 1), i = 1, ... , n - m + 1


of length m. Let 'Pm denote the level-2 parameter 'Pn at n = m. For each
i = 1, ... , n - m + 1, let CPm,i(£) be the MBB estimator of 'Pm obtained by
resampling blocks of length £ from the m observations Xi,m' Next define
the subsampling estimator of MSE(CPm(£)), the mean squared error of the
MBB estimator of the level-2 parameter 'Pm based on a sample of size m,
as
____ n-m+1 2
MSEm(£) = (n - m + 1)-1 L [CPm,i(£) - CPn(£~)] , (7.24)
i=1

where £~ is a plausible pilot block size. Let

(7.25)

where we employ the set Jm (not In) to define i~. Then, i~ is an estimator
of the theoretical optimal block length when the sample size is m. We need
to rescale this initial estimator to get an estimator of £~ of (7.22). Since
the optimal block length £~ in (7.22) is of the order W.!-2, the right scaling
factor here is [n/m] r.!-2. The Hall, Horowitz and Jing (1995) estimator of
£~ is given by
i~ = (i~) . [n/m] r.!-2 . (7.26)
Note that the Hall, Horowitz and Jing (1995) estimation method is ap-
plicable quite generally, requiring only that the MSE of the bootstrap es-
timator has (an expansion of) the form (7.21) for some r ~ 1 and that the
184 7. Empirical Choice of the Block Size

subsampling estimator MSEmU!) of MSE(CPmU!)) converges in some suit-


able sense, say, in probability. In particular, the method is applicable even
without an explicit expression for the constants C 1 and C2 in (7.21). Sim-
ilarly, the method can be applied when a block bootstrap method other
than the MBB is employed. A set of sufficient conditions for the consis-
tency of the subsampling estimator MSEm(.e) are that the series {XihEZ
is stationary and has an absolutely summable strong mixing coefficient.
From the description of the method, it is clear that accuracy of the Hall,
Horowitz and Jing (1995) method depends on the choices of the subsam-
pling parameter m and the pilot block size .e~. The optimal order of m is
unknown at this stage. However, for the other smoothing parameter, viz.,
the pilot block size .e~, Hall, Horowitz and Jing (1995) suggest a way to
reduce the effect of .e~ on the optimal block length estimator .e~ of (7.26).
To reduce the effect of .e~, they suggest iterating the main steps of the
algorithm, by replacing the pilot block size .e~ with the estimated value
i~ for the second iteration, and repeating this process until convergence.
However, convergence of this iterative scheme is not guaranteed (see the
numerical example below).
We now describe the results of a small simulation study on finite sample
properties of the Hall, Horowitz and Jing (1995) method. We consider the
time series model
(7.27)
where {EihEZ is a sequence of iid random variables with common distri-
bution (X 2 (1) - 1), the centered Chi-squared distribution with one degree
of freedom. Thus, EEl = 0 and EE~ = 2. We took the level-l parameter ()
as EX 1 , and the estimator On as On = Xn , the sample mean with sample
size n = 125. The level-2 parameters of interest are given by (cf. (7.3) and
(7.4) )
(7.28)
and

'P3n P ( y'n(~: - ()) ::; 0)


P( On ::; ()) . (7.29)

True values of 'P2n and 'P3n were found by 20,000 simulation runs. These
are given by 'P2n = 3.984 and 'P3n = .5226.
To find the theoretical optimal block lengths for 'P2n and 'P3n , we applied
the MBB method to generate block bootstrap estimators of the level-2
parameters 'P2n and 'P3n with several values of the block length .e. Table 7.1
below gives the expected value (Mean), the bias, the standard deviation
(SD) and the MSE's of the MBB estimators based on 1000 simulation runs.
From the table, it is evident that the optimal block lengths for estimating
'P2n and 'P3n are respectively given by .e~n = 3 and .egn = 2. Next the
7.3 A Method Based on Subsampling 185

subsampling-based method of Hall, Horowitz and Jing (1995) was applied


to select an optional block size empirically. We chose the subs ample size
m = 30, and the pilot block size parameter t'kn = 5 for both 'P2n and 'P3n.
Thus, for k = 1,2, the MSE estimator MsEm(t') of (7.24) for the level-2
parameter 'Pkn is now evaluated by resampling overlapping blocks of size t'
from each of the 96 (= 125 - 30 + 1) overlapping subsamples of size m = 30
and then computing the MBB estimators tpkm,i(t') (say) of 'Pkn for the ith
subsample, for i = 1, ... ,96. The centering value tpkn(t'kn) in MsEm(t') is
computed using the full sample of size n = 125, with t'kn = 5 for both k.
All bootstrap estimates (including those related to Table 7.2 below) were
evaluated using 800 Monte-Carlo replicates.

TABLE 7.1. Determination of the true optimal block sizes for MBB estimation
of the level-2 parameters 'P2n and 'P3n of (7.28) and (7.29) for model (7.27). The
results are based on 1000 simulation runs. An asterisk(*) denotes the minimun
MSE value for a functional.

(a) Variance Estimation


L Mean Bias SD MSE
1 1.947 -2.037 0.705 4.645
2 2.902 -1.082 1.089 2.358
3 3.204 -0.780 1.244 2.157*
4 3.320 -0.664 1.334 2.221
5 3.394 -0.590 1.412 2.341
6 3.437 -0.547 1.482 2.497
7 3.452 -0.532 1.542 2.660
8 3.460 -0.524 1.594 2.814
9 3.460 -0.524 1.648 2.990
10 3.469 -0.515 1.713 3.198

(b) Distribution Function Estimation


L E.phi Bias SD MSE
1 0.5099 -0.0126 0.0136 0.000345
2 0.5132 -0.0094 0.0132 0.000262*
3 0.5127 -0.0099 0.0142 0.000299
4 0.5136 -0.0089 0.0139 0.000272
5 0.5123 -0.0103 0.0144 0.000313
6 0.5125 -0.0100 0.0149 0.000322
7 0.5125 -0.0100 0.0150 0.000324
8 0.5121 -0.0105 0.0154 0.000347
9 0.5123 -0.0103 0.0157 0.000352
10 0.5103 -0.0122 0.0164 0.000419
186 7. Empirical Choice of the Block Size

Table 7.2 gives the frequency distribution of the optimal block size esti-
mator R~n for 'Pkn, computed by formula (7.26) using 500 simulation runs.
As in Hall, Horowitz and Jing (1995), in this simulation study also, the
optimal block size estimators converged after a couple of iterations in a
majority of the cases. However, in some instances, there was a circular
behavior of the estimated optimal block size in successive iterations (e.g.,
the initial value 5 led to 8 which led to 3 and then, 3 led back to 5). The
frequency of such cases is given under the value -1. This problem ap-
peared to be more prevalent for distribution function estimation (i.e., for
'P3n of (7.29)) than for variance estimation (i.e., for 'P2n of (7.28)). In such
a situation, one may pick a value of R~n (from the set of all optimal block
lengths in different iterations) that corresponds to the minimum estimated
MsE m (1!).
Parts (a) and (b) of Table 7.2 show that for both level-2 parameters
'P2n and 'P3n, the estimated optimal block sizes have a pronounced mode
at the true optional block sizes, i.e., at £gn = 3 for 'P2n and at £~n = 2
for 'P3n. Furthermore, the distribution of the estimated optimal block size
for variance estimation has a longer right tail compared to that for the
distribution function estimation. However, the performance of this method
improves as the sample size n increases. See Hall, Horowitz and Jing (1995)
for further numerical examples and discussions.

TABLE 7.2. Frequency distribution of the optimal block sizes selected by the
Hall, Horowitz and Jing (1995) method for model (7.27) with n = 125, m = 30,
and initial block size Cion = 5, k = 1,2. Results are based on 500 simulation runs.
The value -1 of i~n' k = 1,2, corresponds to the cases where the iterations of
the method failed to converge.

(a) Variance Estimation


C
'0
2n -1 2 3 5 7 9 10 12 14 15 17 19 21
Freq. 35 137 200 63 18 12 13 8 6 2 3 1 2

(b) Distribution Function Estimation


C'03n -1 2 3 4 6 7 9 10 12 13
Freq. 137 276 50 5 7 13 8 1 2 1

7.4 A Nonparametric Plug-in Method


In this section, we describe a plug-in method for selecting the optimal
block length based on a recent work of Lahiri, Furukawa and Lee (2003).
The plug-in method estimates the leading term in the first-order expan-
sion of the optimal block length using a res amp ling method, and does
not require an explicit expression for the level-3 population parameters.
7.4 A Nonparametric Plug-in Method 187

In Section 7.4.1, we describe the motivation and basic construction of the


plug-in estimator and in Section 7.4.2, we describe estimation of the level-3
parameter associated with the bias part of the block bootstrap estima-
tors. Estimation of the level-3 parameter associated with the variance part
employ the Jackknife-After-Bootstrap (JAB) method of Efron (1992) and
Lahiri (2002a). In Section 7.4.3, we describe the JAB method for dependent
data. The nonparametric plug-in estimators of the optimal block lengths
are presented in Section 7.4.4. Some finite sample results are given in Sec-
tion 7.4.5. In all of Section 7.4, we restrict attention to the optimal block
lengths for the MBB method.

7.4.1 Motivation
Let 'Pn be a level-2 parameter of interest and let <Pn(e) be a block bootstrap
estimator of 'Pn based on blocks of length £. From the discussion of Section
7.2, it follows that under suitable regularity conditions, the variance of
<Pn (£) and the bias of <Pn (£) admit expansions of the form
(7.30)
and
(7.31)
for some population parameters B E JR, v E (0, (0) and for some known
constants a E (0,00), r E N. For example, for the bias and variance func-
tionals 'Pn = 'PIn, 'P2n, r = 1, and a = 1, while for the distribution function
(at a given point xo) 'Pn = 'P3n(XO) with Ixol =1= 1, r = 2 and a = 1/2. In
this case, the MSE-optimal block size £~ == £~('Pn) is given by

o (2B2 )
£n = rv
rt2 1
nr+ 2 1
(
+ 0(1) )
. (7.32)

Like any other plug-in method, the nonparametric plug-in method focuses
2 _1_ 1
on the leading term (2;; ) r+2 n r+2 but estimates the level-3 parameters B
and v nonparametrically, as follows. Note that from (7.30) and (7.31), we
have
lim (n- l £r)-ln 2a Var(<pn(£)) = V (7.33)
n--->oo

and
lim £. n a Bias(<pn(£))
n--->oo
=B . (7.34)
This suggests that consistent estimators of v and B may be derived if we can
estimate Var(<pn(£)) and Bias(<pn(£)) consistently. Let YARn and BiASn be
nonparametric estimators of Yare <Pn (£)) and Bias( <Pn (£)), respectively, that
are consistent in the following sense:

YARn ~ 1 as n~oo (7.35)


Var(<pn(£I)) P
188 7. Empirical Choice of the Block Size

and

(7.36)

along some suitable sequence {l\} == {.e1n}n;?:1.


Then, using (7.33) and (7.34), we define estimators of the parameters v
and Bas
(7.37)

and
(7.38)

The nonparametric plug-in estimator i~ of the optimal block length .e~ is


then given by replacing the level-3 parameters v and B in the leading term
in (7.32) by the above estimators, i.e., by

(7.39)

It is clear that the performance of the estimator i~ depends on the se-


quence {.e1n}n>1, on the level-2 parameter <Pn, and on the basic estimators
YARn and mAs n employed in the construction of vn and En in (7.37) and
(7.38), respectively. In the next section we describe the plug-in method of
Lahiri, Furukawa and Lee (2003) who used the JAB method for estimating
Var(<'on(.e)) and constructed an estimator of Bias(<,On(.e)) by combining two
block bootstrap estimators suitably. The use of these basic estimators were
prompted by considerations regarding computational efficacy and accuracy
of the proposed plug-in method. As explained below, the JAB variance
estimator has some computational advantage over other common resam-
pIing methods in that the JAB variance estimator can be computed by
reusing the block bootstrap resamples used in the Monte-Carlo evaluation
of <'on (.e 1 ), and thus, do not involve iterated levels of resampling. Similarly,
the bias estimator proposed in Lahiri, Furukawa and Lee (2003) also in-
volves a single level of resampling. In the section below, we describe further
details of the construction.

7.4.2 The Bias Estimator


For constructing the bias estimator, we begin with relation (7.31), which
gives an asymptotic representation for the bias part of the bootstrap esti-
mator <,On (£) and may be rewritten as

(7.40)
7.4 A Nonparametric Plug-in Method 189

If (7.40) holds for the sequences {C 1} == {C1n }n>1


-
and {2Cd == {2C1n }n>1,
-
then we may combine the corresponding expansions to conclude that

E[<pn(C1) - <pn (2C d]


[{ 'Pn + n~l + o(n- a C11)} - {'Pn + 2~C1 + o(n- a {lll)}]
B
- - + o(n- a r1 1)
2n a C
as n --7 ()() • (7.41)
1

This suggests that a consistent estimator of Bias(cpn({ld) satisfying (7.36)


may be constructed as

(7.42)
1
Indeed, if the optimal order of the block length for estimating 'Pn is n r+2 (cf.
(7.32)), then by Cauchy-Schwarz inequality, it follows that for any sequence
{e1} = {{lln}n>l satisfying the requirement
1
1 «C 1 «n r + 2 as n --7 ()() , (7.43)

BIASn is consistent. A specific choice of {{lln}n>l will be suggested in


Section 7.4.4 for the plug-in estimator £~ of (7.39). Note that, as pointed
out earlier, the estimator BIASn is based on only two block bootstrap
estimator of 'Pn and may be computed using only one level of resampling.
In the next section, we describe the JAB method for dependent data.
Readers familiar with the method may skip this section and proceed to
Section 7.4.4.

7.4.3 The JAB Variance Estimator


The JAB method was proposed by Efron (1992) for assessing accuracy of
bootstrap estimators based on the IID bootstrap method for independent
data. A modified version of the method for block bootstrap estimators
in the case of dependent data was proposed by Lahiri (2002a). The JAB
method for dependent data applies a version of the block jackknife method
to a block bootstrap estimator. For the sake of completeness, first we briefly
describe the block jackknife method.
Let Xn = {Xl, ... , Xn} be the observations and let "Yn == tn(Xn) be an
estimator of a level-1 parameter of interest 'I. The block jackknife method
systematically deletes blocks of consecutive observations to define the jack-
knife copies (called the block jackknife point values) of "Yn and combines
these to produce estimators of the bias and the variance of "Yn. Like the
block bootstrap methods, different versions of the block jackknife method,
such as, overlapping, nonoverlapping, and weighted block jackknife meth-
ods have been proposed in the literature (cf. Kiinsch (1989), Liu and Singh
190 7. Empirical Choice of the Block Size

(1992)). Here we describe the overlapping version of the block jackknife or


the moving blocks jackknife (MBJ) of Kiinsch (1989) and Liu and Singh
(1992). (Like the term "MBB," the term MBJ was also introduced by Liu
and Singh (1992)). Let m == mn be a sequence of integers such that m goes
to infinity but at a rate slower than n, i.e.,

(7.44)

Here m denotes the number of observations (or the size of the block) to
be deleted for defining the MBJ point values. For i = 1, ... ,n - m + 1, let
Xn,i = Xn \ {Xi, ... ,Xi+m-d denote the set of observations after the block
{Xi, ... , Xi+m- d of size m has been deleted from X n . Then, the ith MEJ
point value i~il is defined as

i~il = tn-m(Xn,i), i = 1, ... , n - m +1. (7.45)

The MBJ estimator of the variance of in is given by


n-m+l 2
ARMBJ (')
V _ m 1
"in - (n-m)' n-m+1
'~
" (-Ci l ')
"in -"in (7.46)

where i~il == m -1 (nin - (n - m )i~i)) is the ith MEJ pseudo-value corre-


sponding to in. For consistency and finite sample properties of the MBJ
and its other variants, we refer the reader to Kiinsch (1989), Liu and Singh
(1992), Shao and Tu (1995), Davison and Hinkley (1997), and the references
therein. Note that, if we set m == 1, i.e., if we delete a single observation at
a time, then the MBJ variance estimator in (7.46) reduces to the classical
delete-1 jackknife variance estimator for independent data

1 n 2
AR J"in
V (' ) --n(n-1)L....
'" (-Ci
"in l -"in
') (7.4 7)
"=1
For properties of the jackknife method for independent data, see Miller
(1974), Efron (1982), Wu (1990), Liu and Singh (1992), Efron and Tib-
shirani (1993), Shao and Th (1995), Davison and Hinkley (1997), and the
references therein.
Next we describe the JAB method for dependent data. Let rpn == rpn(R.) be
the MBB estimator of a level-2 parameter i.pn based on (overlapping) blocks
of size R. from Xn = {Xl, ... , X n }. Let Hi = {Xi, ... , XiH-l}, i = 1, ... , N
(with N = n-R.+ 1) denote the collection of all overlapping blocks contained
in Xn that are used for defining the MBB estimator rpn. Also, let m be
an integer such that (7.44) holds. Note that the MBB estimator rpn(R.)
is defined in terms of the "basic building blocks" Hi'S. Hence, instead of
deleting blocks of original observations {Xi, ... , Xi+m-l}, as done in the
MBJ method described above, the JAB method of Lahiri (2002a) defines
7.4 A Nonparametric Plug-in Method 191

the jackknife point-values by deleting blocks of 13i 's. Later in this section,
we will discuss how this simple modification plays an important role in
ensuring computational efficacy of the JAB method.
Since there are N observed blocks of length g, we can define M == N -
m + 1 many JAB point-values corresponding to the bootstrap estimator
CPn, by deleting the overlapping "blocks of blocks" {13i , ... , 13i+m-l} of size
m for i = l, ... ,M. Let If = {l, ... ,N}\{i, ... ,i+m-1}, i = l, ... ,M.
To define the ith JAB point-value cp~) == cp~) (g), we need to resample b =
Ln/gJ blocks randomly and with replacement from the reduced collection
{13j : j E If} and construct the MBB estimator of V'n using these resampled
blocks. More precisely, suppose that Tn = tn{A.'n; B) be a random variable
with probability distribution G n and let V'n = V'(G n ) for some functional V'.
Let J i1 , ... , Jib be a collection of b random variables such that, conditional
on X n , these are iid with common distribution

(7.48)

Then, the resampled blocks to be used for defining the JAB point-value
cp~) are given by
{13j*(i) -= 13J ; j '.'J -- 1, ... , b} . (7.49)

Let X;:(i) denote the resampled data obtained by concatenating {13;(i) , j =


1, ... , b}. Also, let T;:(i) == tnl (X;: (i) ; On,i) be the MBB version of Tn, defined
using the resampled data X;:(i) and using a suitable estimator On,i of B.
Then, the JAB point-value cp~) is given by applying the functional V' to
the conditional distribution an,i (say) of T;:(i) as

(7.50)

For an example illustrating the definition of T;:(i), suppose that

(7.51)

with On = H(Xn) and B = H(J.L) for some (smooth) function H : JRd -+


JR, where {XihEZ is a stationary sequence of JRd-valued random vectors,
Xn = n- 1 L:~=l Xi and J.L = EX1 . Let X~(i) denote the MBB sample mean
based on the nl = b.g resampled values in { 13 t) ,
j = 1, ... , b}. Then, the
MBB version T;:(i) for the ith JAB point-value is defined as

where B~(i) = H(X~(i») and where we set On,i H({ln,i) with {In,i =
E* X n ,Z = 1, ... ,M.
-*(i) .
192 7. Empirical Choice of the Block Size

Next we return to the general case of Tn == tn(Xn; 8) and define the JAB
variance estimator of r{;n as (cf. (7.46))

(7.53)

where rp~) = m- 1(Nr{;n - (N - m)r{;~)) denotes the ith JAB pseudo-value


corresponding to r{;n and where r{;~) 's are defined by (7.50). As with the
given MBB estimator r{;n, computation of the point-values r{;~) 's and hence,
of the pseudo-values rp~) are typically done using the Monte-Carlo method.
A simple representation result, initially noted by Efron (1992) in the con-
text of lID bootstrap, makes it possible to compute the JAB variance
estimator by reusing the resampled blocks used in the computation of the
given bootstrap estimator r{;n. We now give a statement of this result below.
Proposition 7.1 Let J 1 , ... ,Jb be iid random variables with the Discrete
Uniform Distribution on {I, ... ,N} and let J i1 , ... ,Jib be iid random vari-
ables with the Discrete Uniform Distribution on If, 1 S i S M. Let
Pi = b- 1 L~=l R(Jj E If), 1 SiS M. Then, for any i = 1, ... , M,
the conditional distribution of (J1, ... , J b) given Pi = 1 is the same as the
unconditional distribution of (Ji1 , ... , Jib).
Proof: For any j1, ... ,jb E If,

P( J 1 = jl, J b = jb I Pi = 1)
00 • ,

P(J1 =j1,oO·,Jb Ejb)/P(Pi = 1)


[N-b]/[(N - m)/N]b
(N - m)-b = P(Ji1 = j1,"" Jib = jb) .

This completes the proof of the proposition. D

To appreciate the relevance of this result, suppose that hE;', ... , kEb},
k = 1, ... , K denote the set of blocks drawn randomly, with replacement
from the collection {B 1, ... , B N} for the Monte-Carlo evaluation of the
given block bootstrap estimator r{;n. Let hJ1 , ... , kJb} denote the random
indices corresponding to hB;',oO',kBb}' i.e., kBj = kEkJJ' 1 S j S b,
k = 1, ... , K. Then for any k, if all b indices kJ1,.'" kJb lie in If, by
Proposition 7.2, we may consider (kJ1,"" kJb) as a random sample of size
b from the reduced index set If = {I, ... ,N} \ {i, ... ,i + m - I}. Let

It={k:1SkSK, djEIf forall j=l, ... ,b}

denote the index set of all such random vectors (kJ1,"" kJb). Then,
{(kJ1, ... , kJb) : k E It} gives us an iid collection of random vectors (of
possibly different sizes for different i E {I, ... , M}), each having the same
7.4 A Nonparametric Plug-in Method 193

distribution as (Ji1 , ... , Jib) of the Proposition. Thus, the res am pIes for
computing the ith JAB point-value I{!~) may be obtained by extracting
the subcollection {(kBr, ... , kBi;) : k E I;} from the original resamples
{(kBr, ... 'kBi;) : 1 :::; k :::; K}, and no additional res amp ling is needed.
The Monte-Carlo approximations generated by this method are close to
the true values of I{!~) 's, provided K is large.
As an illustration, consider the random variable Tn of (7.51) and suppose
that the level-2 parameter of interest is rpn = rp( G n ) for some functional
rp where G n is the sampling distribution of Tn. Figures 7.1 and 7.2 give
a schematic description of the main steps involved in the computations of
the MBB estimator I{!n and its JAB point-values r{!~), i = 1, ... , M. For
computing I{!n, we generate K iid sets of b many blocks {kBr,···, kBi;}
for k = 1, ... , K, compute the bootstrap sample mean kX~ and the boot-
strap version kT~ = Vn(ke~ - en) for each set with ke~ = H(kX~) and
en = H(fln). Then, the Monte-Carlo approximation to I{!n is given by rp( G~)
where G~ denotes the empirical distribution of the bootstrap replicates
hT~ : k = 1, ... , K}. For computing I{!~), we scan the K sets of resam-
pled blocks hBr, ... ,kBb}, k = 1, ... ,K and extract the ke~-values corre-
sponding to the block-sets hBr, . .. 'kBb} that do not contain any of the
blocks B i , ... ,Bi+m-l. Next, the bootstrap version of T~(i) are computed
by employing these ke~'S in the formula kT~(i) = yfnl(ke~ - en,i) where
en,i == H(fln,i). Note that fln,i is given by the average of block-averages
in the reduced collection {B j : j E IP} and can be computed without any
resampling. The copies kT~(i) 's are now combined to generate the Monte-
Carlo approximation to I{!~), just in the same way the kT~ 's are used for
computing the original bootstrap estimate r{!n.

7.4.4 The Optimal Block Length Estimator


We now return to the problem of choosing the optimal block length for block
bootstrap methods using the nonparametric plug-in method. Let I{!n ==
I{!n (C) be an MBB estimator of a level-2 parameter rpn with an MSE of the
form (d. (7.30),(7.31))

(7.54)

where v E (0, (0), B E JR, B =I- °


are unknown parameters and where
r E N, a E (0, (0). Then, the theoretical optimal block length C~ is given
by (d. (7.32))

C~=
2B2) r~2 nr+2(1+o(1))
(-:;:;; 1
(7.55)
194 7. Empirical Choice of the Block Size

1
G~ = empirical distribution of 1 e~ ,2 e~, ... ,K e~

FIGURE 7.1. Monte-Carlo computation of the MBB estimator of 'Pn = 'P(Gn )


where Gn is the probability distribution of Tn of (7.51).

The nonparametric plug-in method, described in Section 7.4.1, suggests (cf.


(7.39))

(7.56)

as an estimator of the optional block length, where Bn = n a £1 BIAS n and


£n-
vn = [n- 1 1n 2a YARn are estimators of the level-3 parameters B and v,
and BIASn == BIASn(£ 1) and YARn == YARn (£ 1) are some consistent esti-
mators of the bias and the variance parts of the block bootstrap estimator
c,On(£d based on some suitable initial block length £1 (cf. (7.35), (7.36)).
Lahiri, Furukawa and Lee (2003) suggest using the bias estimator BIASn of
(7.42) to define Bn and using the JAB variance estimator YAR JAB (c,On (£ 1) )
for defining vn . With these choices, the plug-in estimator of the optimal
block length £~ is given by

COn =
2B;'] r!2 n _1
[-_- r + 2 , (7.57)
'rVn
7.4 A Nonparametric Plug-in Method 195

hBi, ···,k B;;} n {B i , ... , Bi+m-l


=0

yes no
k~K

I ke~ : kElt I
~
r-::(i)l
~
FIGURE 7.2. Computation of the ith JAB point value 1f~) starting with
the resampled blocks {IB~, ... , IB;;}, ... , hBi, ... , kB;;} generated for the
Monte-Carlo computation of the block bootstrap estimator 1fn of Figure 7.1.

where En = 2£1[Y?n(£1) - y?n(2£1)] and vn = (nti"") . VARJAB(Y?n(£I)),


and VARJAB(Y?n(£I)) is defined by (7.53) with £ = £1. The n a and n 2a
factors in the definitions of En and fin are left out as they cancel from the
numerator and the denominator of (7.56).
We now show that this naive construction yields consistent estimators of
£~ for various functionals 'Pn without explicit form of the constants Band
v in (7.54). For this, we suppose that {XdiEZ is a sequence of stationary
random vectors with values in lR d and the level-l parameter e and its
estimator en satisfy the requirements of the Smooth Function Model (7.8),
i.e., en = H(Xn) and e = H(J1) for some smooth function H : lR d ---> lR,
Xn = n- l 2:~=1 Xi and J1 = EX I . First, we consider the bias and the
variance functionals (d. (7.2),(7.3))

'PIn (7.58)
196 7. Empirical Choice of the Block Size

(7.59)

For k = 1,2, let £~n denote the optional block length for estimating 'Pkn,
defined by (7.6). Then, we have the following result.
Theorem 7.3 Suppose that Condition (5.D r ) of Section 5.4 holds with
r = 4,
(7.60)
and
(7.61 )
Also, suppose that Condition (5.Mr ) of Section 5.4 holds with r = 2 + k(2 +
ao) where ao is as in the statement of Condition (5.D r ). Then
(7.62)

for k = 1,2.
Proof: See Lahiri, Furukawa and Lee (2003). D

Under suitable regularity conditions, Lahiri, Furukawa and Lee (2003)


also prove consistency of the plug-in estimator l~ for bootstrap distribu-
tion function estimation and for bootstrap quantile estimation for certain
studentized versions of en. For these functionals, expansion (7.54) for the
corresponding MSEs hold with r = 2 and a = 1/2, as in the case of the
distribution function 'P3n(XO) of the normalized version of en for Ixol =f. l.
Thus, the optimal block sizes for these functionals in the studentized case
are of the order n 1 / 4 and the corresponding plug-in estimators l~ are de-
fined with r = 2 in such cases. For the normalized version of en, consistency
of l~ for the distribution function estimator 'P3n of (7.4) also holds (cf.
Lahiri (1996d)), provided we set r = 2 for Ixol =f. 1, and r = 1 for Ixol = l.
Thus, the plug-in estimator provides a consistent and computationally ef-
ficacious method for estimating the optimal block length for a variety of
level-2 parameters.
Although the nonparametric plug-in method produces a valid (i.e., con-
sistent) estimator of the optimal block length, finite sample performance
of the estimator depends on the choice of the smoothing parameter £1, and
on the JAB "blocks of blocks" deletion parameter m. It turns out that a
reasonable choice of £1 in (7.57) depends on the functional 'Pn. A careful
analysis of the MSE of Bn shows that the optimal choice of £1 is of the
form
(7.63)
where r is as in (7.54), and C3 is a population parameter. As for the other
smoothing parameter, an heuristic argument in Lahiri (2002a) suggests
that a reasonable choice of the JAB parameter m is given by
1/3fl2/3
m = C4n t-1 (7.64)
7.4 A Nonparametric Plug-in Method 197

for some constant C4 • Numerical results of Lahiri, Furukawa and Lee (2003)
show that the choice C 3 = 1 in (7.63) for the initial block size i\ yields
good results for both the variance and the distribution function estimation
problems, while the corresponding values for C4 in (7.64) are given by
C4 = 1.0 for the variance functional and C4 = 0.1 for the distribution
function. Below we report the results from a small simulation study with
the above choices of C3 and C4 . For more simulation results, see Lahiri,
Furukawa and Lee (2003).
We consider the moving average model of Section 7.3, given by (cf. (7.27))
Xi = (Ei + Ei-l) /../2, i E Il, where {EdiE:&: is a sequence of iid random
variables having the centered Chi-squared distribution with one degree of
freedom. As in Section 7.3, we also set the level-l parameter to be () = EX 1 ,
the estimator On to be the sample mean Xn , and the level-2 parameters as
'P2n = n.Var(Xn) and 'P3n = P(y'n(On - ())/Tn ::; 0). The true value of
() is zero. Also, we take the sample size n to be 125. As stated in Section
7.3, the true values of 'P2n and 'P3n are 'P2n = 3.984 and 'P3n = 0.5226.
Furthermore, the theoretical optimal block sizes for estimating 'P2n and
'P3n by the MBB are .e~n = 3 and .egn = 2, as shown in Table 7.1.
Next we applied the nonparametric plug-in method to estimate the tar-
get values .e~n and .egn . Table 7.3 gives the frequency distribution of the
estimated optimal block sizes based on 500 simulation runs. The block
boostrap estimators in each case were evaluated using 1000 Monte-Carlo
replicates. Table 7.3 shows that more than 80% of the mass of the esti-
mated block size i~n for variance estimation lies in the interval [2,5] (the
true value being .e~n = 3). The method also produces very good results for
distribution function estimation, with a pronounced mode at the true value
.egn = 2, and a small support set {I, 2, 3}.

TABLE 7.3. Frequency distribution of the optimal block sizes selected by the
nonparametric plug-in method for model (7.27) with n = 125.

(a) Variance Estimation


1 2 3 4 5 6 7 8 9 10
Frequency 50 114 125 94 71 29 10 3 2 2

(b) Distribution Function Estimation


123
Frequency 172 268 60
8
Model-Based Bootstrap

8.1 Introduction
In this chapter, we consider bootstrap methods for some popular time series
models, such as the autoregressive processes, that are driven by iid random
variables through a structural equation. As indicated in Chapter 2, for such
models, it is often possible to adapt the basic ideas behind bootstrapping
a linear regression model with iid error variables (cf. Freedman (1981)).
In Section 8.2, we consider stationary autoregressive processes of a general
order and describe a version of the autoregressive bootstrap (ARB) method.
Like Efron's (1979) IID resampling scheme, the ARB also resamples a single
value at a time. We describe theoretical and empirical properties of the
ARB for the stationary case in Section 8.2. In Section 8.3, we consider the
explosive autoregressive processes. In the explosive case, the initial variables
defining the model have nontrivial effects on the limit distributions of the
least squares estimators ofthe autoregression (AR) parameters. As a result,
the validity of the ARB critically depends on the initial values. In Section
8.3, we describe the relevant issues and provide conditions for the validity
of the ARB method in the explosive case.
The unstable autoregressive processes are considered in Section 8.4. Here,
the ARB with the natural choice of the resample size fails. A remedy to
this problem is given by the "m out of n" ARB, where the resample size
m grows to infinity at a rate slower than the sample size n. In the unstable
case, we describe the theoretical and numerical aspects of the ARB for
the first-order AR-models only. In Section 8.5, we present some results on
200 8. Model-Based Bootstrap

bootstrapping autoregressive and moving average (ARMA) processes of a


general (finite) order, in the stationary case.

8.2 Bootstrapping Stationary Autoregressive


Processes
Let {XihEZ be a stationary autoregressive process of order p (AR(p)),
satisfying the linear difference equation

(8.1)

where pEN, (31, ... ,(3p are the autoregression parameters and {EdiEZ is
a sequence of zero mean iid random variables with a common distribution
F. In the sequel, we shall often assume that the autoregression parameters
(31, ... ,(3p are such that
p

(3(z) == 1- L(3jzj =1= 0 for all z E <C with Izl::; 1 . (8.2)


j=l

It is well known (cf. Brockwell and Davis (1991), Chapter 3) that under
(8.2), the AR(p) process {Xi}iEZ of (8.1) admits an infinite-order moving-
average representation
00

Xi = L bjEi-j , (8.3)
j=O

where {b j }~o are constants, determined by the power series expansion of


the function b(z) == [(3(z)]-l, given by
00

b(z) = L bjz j , Izl::; 1 .


j=O

Although the random variables Xi's under the AR(p) model (8.1) are de-
pendent, here we can use the model structure to generate valid bootstrap
approximations without any block resampling. As described in Chapter 2,
the basic idea is to consider the "residuals" from the fitted model, which
turn out to be "approximately independent," and then res ample the resid-
uals (with a suitable centering adjustment) to define the bootstrap ob-
servations through an estimated version of the structural equation (8.1).
Suppose that a finite segment Xl, ... ,Xn of the process {XdiEZ is ob-
served. Let fj1n, ... , fjpn denote the least squares estimators of (31, ... , (3p
based on Xl' ... ' X n . Thus, fj1n, ... , fjpn are given by the relation
8.2 Bootstrapping Stationary Autoregressive Processes 201

where Vn is a (n-p) xp matrix with ith row (Xi+P-1, ... , Xi), i = 1, ... , n-
p. Let, Ei = Xi - fhnXi-1 - ... - /JpnXi-p, i = P + 1, ... ,n denote the
residuals. Note that by using (8.1), we may express the residuals as
P
Ei = Ei - I::(/JjP - {lj)Xi- j , P + 1 ~ i ~ n .
j=l
As a consequence, when {ljn ----+p {lj for j = 1, ... ,p, the second term
is small for large values of n and thus, the residuals are approximately
independent. This suggests that we may resample the residuals, a single
value at a time as in Efron's (1979) lID bootstrap, to define the bootstrap
version of a random variable Tn == t n (X 1, ... , Xn; {l1, ... , (lp, F). However,
to generate a valid approximation, we need to center the residuals Ei'S first
and resample from the collection of the centered residuals, defined by

Ei = Ei - En, i = P + 1, ... ,n , (8.5)

where En = (n - p)-l L~=P+1 Ei. Next, generate the bootstrap error vari-
ables <, i E /£ by sampling randomly with replacement from {Ep+1, ... , En}.
Thus, the random variables Ei, i E /£ are conditionally iid (given
Xl, ... ,Xn ) with common distribution

P*(E;' = Ei) = _1_, P + 1 ~i ~n . (8.6)


n-p

Note that by (8.5) and (8.6), E*Ei = (n - p)-l L~=P+1 Ei = o. Thus,


the bootstrap error variables <'s satisfy an analog of the model condi-
tion EEl = 0 at the bootstrap level. Now, define the bootstrap version of
equation (8.1) by

and let {Xi hEZ be a stationary solution of (8.7). If /Jjn ----+p (lj as n --+ 00
for j = 1, ... ,p, then such a solution exists on a set of Xi's that has prob-
ability close to one for n large. In practice, one makes use of the recursion
relation (8.7) for i ~ p + 1 to generate the bootstrap "observations" by
setting the initial p variables (arbitrarily) equal to Xl, ... ,Xp or equal to
zeros. When the polynomial /J(z) == 1- L;=l /Jjnzj does not vanish in the
region {izi ~ 1} (d. (8.2)), the coefficients of the p initial values die out
geometrically fast and, therefore, have a negligible effect in the long run. As
a consequence, one may generate a long chain using the recursion relation
(8.7) until stationarity is reached (Le., the effect ofthe initial values become
inappreciable) and may take the next m-values as the desired "resample" of
size m. The autoregressive bootstrap (ARB) version of a random variable
Tn = tn(Xl, ... , Xn; {l1, ... , (lp, F) based on a resample of size m > p is
given by
(8.8)
202 8. Model-Based Bootstrap

where Fn is the empirical distribution of the centered residuals Ei, p + 1 ::;


i ::; n. Typically, the res ample size m is chosen equal to the original sample
size n. However, in some cases, a smaller value of m may be desirable;
see, for example, Section 8.4. Furthermore, depending on the values of
the parameters /31, ... , /3p, it may be desirable to use two different sets of
estimators of the parameters /31, ... , /3p, in the formulation of the ARB
method, employing one set of estimators to define the residuals Ei'S and
the other set to define the bootstrap version (8.7) of the AR model; see
Datta and Sriram (1997) for such an alternative approach.
Properties of the ARB have been investigated by many authors. The
idea of exploiting the structural equation (8.1) to adapt Efron's (1979) lID
resampling scheme to the AR-processes has been noted and formalized in
different problems by Freedman and Peters (1984), Efron and Tibshirani
(1986), and Swanepoel and van Wyk (1986), among others. The first paper
applied the bootstrap to estimate the mean square forecasting error for a
class of autoregressive-type time series models and obtained some empirical
results. Theoretical analysis of their method has been subsequently carried
out by Findley (1986). Efron and Tibshirani (1986) developed a bootstrap
estimate of the standard errors of autoregression parameter estimators.
Properties of certain bootstrap confidence bands for the autoregression
spectral density have been studied by Swanepoel and van Wyk (1986).
Bose (1988) investigated higher-order properties of the ARB for the nor-
malized least squares estimators of the autoregression parameters. Here
we state a result of Bose (1988) that shows that the ARB approxima-
tion is indeed second-order correct. Suppose that {XthEZ is the station-
ary solution to (8.7). Let ~ and I;n be the p x p matrices with (i,j)-th
elements CoV(X 1 ,X1+1i-jl) and Cov*(X{,X;+li_jl)' respectively, 1 ::; i,
j ::; p, where Cov* denotes conditional covariance given {XihEZ. Also,
write /3n = (/31n, ... , /3pn) for the vector of least squares estimators of
A A A '

/31, ... ,/3p, given by (8.4). Let /3~ = (/3;n, ... ,/3;n)' denote the bootstrap
version of (In, obtained by replacing {Xl, . .. , Xn} in the definition of (In
by {X{, ... ,X~}. For a nonnegative definite matrix A of order p, let A 1/2
denote a p x p symmetric matrix satisfying A = A 1/2 . A 1/2. Also, recall
that t = A. Then, we have the following theorem.

Theorem 8.1 Assume that {EdiEZ is a sequence of iid random variables


such that

(i) EEl = 0, EEi = 1, EE~ < 00 and

(ii) (E1' Ei) satisfies Cramer's condition, i. e.,

lim sup IE exp(t( E1, Ei)t) I < 1


Iltll-->oo
8.2 Bootstrapping Stationary Autoregressive Processes 203

Also, suppose that all roots of the characteristic polynomial

(8.9)

lie inside the unit circle. Then,

:~fp Ip* (nl/2t;;2(f3~ - (In) ::; x) - p( vn~1/2({Jn - (3) ::; x) I


= 0(n- 1/ 2) a.s.

Proof: See Bose (1988). D

Theorem 8.1 shows that the ARB approximation is second-order correct


and, thus, it outperforms the normal approximation, which has an error of
order O(n- 1 / 2 ). Under additional regularity conditions, the o(n- 1 / 2 ) bound
in Theorem 8.1 can be sharpened further and the ARB can attain a level of
accuracy similar to that of the IID bootstrap under independence (cf. Choi
and Hall (2000)). As a result, the accuracy of the ARB approximation is
higher than the MBB approximation.
It is important to note that Theorem 8.1 asserts second-order correctness
of the ARB method for the normalized least squares estimator when the
observations Xi'S (are known to) have zero mean. For stationary autore-
gressive processes with an unknown mean, the least squares estimators are
defined through (8.4) by centering the Xi'S at the sample mean Xn . Second-
order accuracy of the ARB continues to hold in this case (cf. Remark 2,
page 1710, Bose (1988)).
Next we briefly consider finite sample performance of the ARB method
and compare it with the MBB method. Indeed, the superiority of the ARB
method over the MBB for stationary AR(p) processes shows up in numerical
studies even in samples of moderate sizes. As an example, we consider model
(8.1) with f31 = 0.5, p = 1, and El rv N(O, 1). Figure 8.1 shows a data set
of size n = 100 from this model. We applied the ARB method described
above to approximate the sampling distribution of

The histogram of the ARB version T; of Tn based on B = 500 bootstrap


replicates is given in the left panel of Figure 8.2, which looks reasonably
symmetric. We also obtained the true distribution of Tn using 10,000 sim-
ulation runs. A plot of the cdf of T; against the true cdf of Tn is given
in the right panel of Figure 8.2. Even for a sample size of n = 100, the
approximation appears to be very good.
In the absence of the knowledge of the model (8.1), we may apply a block
bootstrap method to derive alternative estimates of the sampling distribu-
tion of Tn. Left panels of Figure 8.3 show the histograms generated by the
204 8. Model-Based Bootstrap

o 20 40 60 80 100

FIGURE 8.1. A data set of size n = 100 simulated from the AR(l) model
Xi = 0.5Xi - 1 + Ei, i E Z where Ei'S are iid N(O, 1) variables.

True
AABOOT
I
'"
.,;

'"
.,;

..
.,;

"l
o

o
° i
·3 ·2 ·1 o 2 ·2 ·1 o

FIGURE 8.2. The ARB estimate of the sampling distribution of Tn for the data
set of Figure 8.1 is given as a histogram on the left. The corresponding cdf
(denoted by the dotted curve) and the true cdf of Tn (denoted by the solid
curve) are given on the right .

MBB method with block sizes C = 5, 10. Here, in each case, B = 500 boot-
strap replicates were used. It appears that the MBB distribution function
estimates are both skewed, with C = 5 leading to a higher level of skewness
compared to C = 10. The overall errors of the resulting bootstrap approx-
imations are effectively captured by the plots of the MBB cdf estimates
against the true sampling distribution of Tn, as given by the right panels
of Figure 8.3.
Next consider bootstrap CIs for (31 at nominal coverage levels 80% and
90%. Using the bootstrap quantiles from the above computations, we de-
rived two-sided equal tailed CIs for (31 based on the ARB method and
based on the MBB method with block lengths C = 2,5,10,20. The upper
and the lower end points of the resulting CIs are given in Table 8.1. CIs for
8.3 Bootstrapping Explosive Autoregressive Processes 205

~
8
'" <Xl
0
True
MBB (10.5)
~
'"0
~ ~ 0'"
§ '"0 j
0
0
0 ... -. ---.----
-4 ·3 ·2 ·1 0 -4 ·3 ·2 ·1 0 2

8 ~
'" <Xl True
0 MBB (1 ~ 1O)
~ <D
0
~ ~ 0" ,
~ '"0
0
0
0 ----._-
·4 ·3 ·2 ·1 0 2 -4 ·3 ·2 ·1 0

FIGURE 8.3. MBB estimates of the sampling distribution of the normalized least
square estimator Tn = (L~:ll X;)1/2(t31n - fh) with block lengths £ = 5 (top
row) and £ = 10 (bottom row) for the data set of Figure 8.1. The left panels give
the MBB estimates as histograms, while the right panels give the corresponding
MBB cdfs (denoted by dotted curves) against the true cdf of Tn (denoted by solid
curves).

/31 based on the true distribution of Tn are also included for comparison.
Except for the 80% MBB CI with £ = 20, all CIs contain the target value
/31 = 0.5. Note that the CIs given by the ARB have end points that are
closer to those of the exact CIs, than the end points of the MBB CIs.
The ARB approximation tends to be more accurate than the MBB ap-
proximation because it explicitly makes use of the structure of the model
(8.1). The quality of ARB approximation becomes poor when the model
assumptions are violated. In particular, if the order of the autoregressive
process is misspecified, the ARB method may be invalid, but the MBB
would still give a valid approximation. Also, the standard ARB method is
not robust against the values of the autoregressive parameters, particularly
when some of these parameters lie on the boundary, e.g., when /31 E {-1, 1}
for the AR(l) case (cf. Section 8.4).

8.3 Bootstrapping Explosive Autoregressive


Processes
Let {X;}i>1 be an autoregressive process defined by the recursion relation
(8.10)
206 8. Model-Based Bootstrap

TABLE 8.1. Two-sided equal tailed CIs for f31 for the data set of Figure 8.1 at
80% and 90% nominal levels, obtained using the true distribution of Tn, the ARB
method, and the MBB method with block sizes £. = 2,5,10,20.

90% CI 80% CI
Lower Upper Lower Upper
TRUE 0.253 0.559 0.287 0.525
ARB 0.255 0.551 0.288 0.520
MBB (£=2) 0.464 0.752 0.494 0.715
MBB (£ = 5) 0.340 0.629 0.368 0.594
MBB (£ = 10) 0.300 0.576 0.325 0.546
MBB (£ = 20) 0.262 0.524 0.292 0.495

where {Xl, ... , Xp} is a given initial set of random variables, {Eih?:p+l is
a sequence of iid random variables that are independent of {Xl"'" Xp},
and the autoregression parameters (31, ... ,(3p are such that the roots of
the characteristic polynomial \[I p (z) = zP - (31 zp-l - ... - (3p all lie in the
region {z E CC : Izl > I} ofthe complex plane. The model given by (8.10) is
known as an explosive autoregressive model of order (p). Note that unlike
the stationary case, the error variables E/S in (8.10) are not required to
have zero mean. This is because, in the following, we allow the Ei'S to be
heavy-tailed so that the expectation of El may not even exist. Another
notable difference of the explosive AR(p) model with the stationary case
is that the initial random variables {Xl, ... , Xp} have nontrivial effects
on the subsequent Xi'S. To gain some insight, consider the case p = l.
Then, the ith random variable Xi generated by the recursion relation (8.10)
involves the term (3i- l Xl. For the stationary case, 1(311 < 1 and the effect
of the initial random variable Xl on Xi becomes small at a geometric rate
as i ---; 00. In contrast, for the explosive case, 1(311 > 1 and hence, the
contribution from this term "explodes" in the long run, unless Xl = 0 a.s.
Furthermore, as we shall shortly see, the initial set of random variables have
a nontrivial effect on the limit distribution of the least squares estimators.
Suppose that {Xl, ... ,Xn } denote the observations from model (8.10).
Define the least square estimator /3n = (/3ln, ... ,/3pn)' of the autoregression
parameters (3 = ((31, ... ,(3p)' by relation (8.4) and let
(3'
A-
-
[
J[p-l
(8.11)

be a p x p matrix with its first row equal to (3' ((31, ... , (3p), where 0
denotes the (p -1) x 1 vector of zeros and where, recall that, for kEN, J[k
denotes the identity matrix of order k. Also, let
ex:>

U = LA-jEJ+p (8.12)
j=l
8.3 Bootstrapping Explosive Autoregressive Processes 207

and write U(1) for the first column of U. The following result, due to Datta
(1995), gives the limit distribution of the least square estimator vector i3n
in the explosive case.

Theorem 8.2 Suppose that the error variables {Eih2P+l are iid nonde-
generate random variables with Elog (1 + IEp +1l) < 00. Then,

(8.13)

whereQ = 2:::p+lA-i(WW')(A-i)', W = (X l , ... ,Xp )'+U(1), and


where A is independent of Wand A has the same distribution as A-PU.
Proof: See Theorem 2.2 of Datta (1995). D

Note that the limit distribution of the normalized least square estimator
Tn is nonnormal and depends on the initial variables (Xl, ... , Xp)' through
W. As a result, in the explosive case, any bootstrap method must use a
consistent estimator of the joint distribution of (Xl, ... ,Xp)' to produce
a valid approximation to the sampling distribution of Tn. This requires
one to impose further restrictions on the joint distribution of Xl, ... ,Xp ,
e.g., Xl' ... ' Xp are degenerate or (Xl' ... ' Xp)' follows a "known" ~
dimensional distribution. Alternatively, one may consider the conditional
distribution of Tn given (X!, ... , Xp)'. In view of the independence as-
sumption on (Xl' ... ' Xp)' and the sequence {Eih2P+l of error variables,
the conditional (limit) distribution of Tn is determined by the joint distri-
bution of {Eih2P+l, with (Xl, ... ,Xp)' held fixed at its observed value. The
bootstrap method described here follows this latter approach and generates
the bootstrap observation using the "bootstrap" recursion relation

(8.14)

by setting (Xi, ... , X;)' =: (Xl' ... ' Xp)'. Here, the bootstrap error vari-
ables Ei's are generated by random, with replacement sampling of the resid-
uals {Ei =: Xi - 2::~=1 i3jnXi-j : p+ 1::::: i ::::: n}. Unlike the stationary case,
because the expectation of the Ei'S may not be finite, centering of the resid-
uals is not carried out in the explosive case. However, in case EIEp+ll < 00
and E€p+l = 0 in (8.10), one may center the residuals Ei and resample
from {Ei =: Ei - En : i = p + 1, ... , n}, where En = (n - p) -1 2::~=P+1 Ei. The
resulting bootstrap approximation also yields consistent estimators of the
sampling distribution of Tn (conditional on X!, ... , Xp) (cf. Theorem 3.1,
Datta (1995)).
Let f3~ denote the bootstrap version of i3n, obtained by replacing the
Xi'S in (8.4) with X;*'s of (8.14). Also, define An and A~ by replacing {3
in (8.11) by i3n and {3~, respectively. Then, a studentized version of i3n is
given by
208 8. Model-Based Bootstrap

The ARB versions of Tn and TIn are respectively given by T;; = (A~)n(,6~­
iJn) and T!n = ([A~]')n(,6~ - iJn).
To state the main results on the ARB method in the explosive case,
write Y = (X I, ... , Xp)'. Also, for any y E ]R.P and for a random vector
R = r(Ep +I,E p +2, ... ;Y), depending on {Eih?:p+1 and Y, let Py(R E')
denote the conditional distribution of R given Y at Y = y. Then, we have
the following result.

Theorem 8.3 Suppose that the conditions of Theorem 8.2 hold and that
the ARB samples Xi, ... ,X~, n 2: p + 1 are generated by (8.14) with the
initial values xt = Xi for i = 1, ... ,po Then, for any y E ]R.P,

(a) sup IPy(Tn ::::: x) - P*(T;; :::::


xElRP
x)1 = op(l) as n -+ 00 ,

(b) sup Ipy(Tln ::::: x) - P*(Ttn :::::


xElRP
x)1 = op(l) .

Proof: See Theorem 3.1 and Theorem 4.1 of Datta (1995). D

Thus, the ARB approximations to the (conditional) distributions of the


normalized as well as the studentized least square estimator of ,6 are consis-
tent under the very mild moment condition E log(1 + IEp+ll) < 00, which,
in particular, allows the error variables Ei'S to be heavy tailed. We shall
show in Chapter 11 that for iid heavy-tailed random variables and also for
stationary heavy-tailed processes satisfying certain weak dependence con-
ditions, the lID bootstrap of Efron (1979) and the MBB of Kiinsch (1989)
fail when the bootstrap resample sizes are equal to the sample size. In view
of these results, the consistency of the ARB under the same choice of the
res ample size is surprising and may be attributed to the specific nonstation-
ary structure of the explosive AR(p) process. For the resample size equal to
the sample size, Datta (1995) also proves almost sure validity of the ARB
in the explosive case, requiring EjEp+11 < 00.
Theorem 8.3 also shows that the ARB provides a valid approximation
if the random variables {Xih?:p+1 in the explosive case were generated by
(8.10), starting with any initial set of p (nonrandom) real numbers. In this
case, the bootstrap observations must also be generated starting with the
same initial values. On the other hand, to get a valid approximation to the
unconditional distribution of Tn or TIn, the initial p random variables for
the bootstrap recursion should be generated from the (joint) distribution
of (Xl, ... ,Xp )', which therefore, must be estimable consistently.
8.4 Bootstrapping Unstable Autoregressive Processes 209

8.4 Bootstrapping Unstable Autoregressive


Processes
In this section, we consider properties of the ARB method for the unstable
autoregressive processes. We call an autoregressive process of order p un-
stable if one or more ofthe roots of the characteristic polynomial \II p (z) (cf.
(8.9)) lie on the unit circle {z E C : Izl = I}. The standard ARB method
is known to perform poorly for such processes. To describe the major is-
sues involved in this case, we consider the first-order autoregressive process
{Xih~l given by
(8.15)
with Xo = 0 and f31 E {-I, I}, where {Ei}i>l is a sequence of iid random
variables with EEl = 0 and EEt = a 2 E (0,00). The AR(l) process in
(8.15) is unstable because the root of the polynomial \Il 1(z) == z - f31 lies
on the unit circle. The least square estimator /31n, defined by (8.4), is still
a consistent estimator of f31, but has a different convergence rate and a
different limit law from the stationary and the explosive cases. Indeed, for
1f311 = 1, f31n - f31 = Op(n -1 ) and
A

Tn (~ xl y/2 (/31n - f31 )

---;d ±~ (W2(1) _ 1) / [10 1 W 2(t)dt] 1/2 as n---->oo

(8.16)

under f3 = ±1, where W(·) denotes the standard Brownian motion on [0,1]
(see, for example, Fuller (1996)). In particular, the limit distribution of
Tn is nonnormal. In comparison, for the stationary case (viz., 1f311 < 1),
/31n - f31 = Op(n- 1/ 2 ) and Tn ---;d N(O, ( 2 ) as n ----> 00, while for the
explosive case (viz., 1f311 > 1), /31n - f31 = Op(f31n ) and the limit distribution
of Tn is also nonnormal and is given by (8.13) with p = 1.
For bootstrapping the unstable AR(l) process, we combine the recipes
for the stationary and the explosive cases as follows. Define the centered
residuals Ei = Ei - n- 1 2::7=1 Ei, 1 S; i S; n, where Ei = Xi - /31nXi-1,
1 S; i S; n. Starting with Xo = 0, generate the ARB sample Xi, ... , X,';-"
m 2: 1, using the bootstrap version of the relation (8.15)

(8.17)

where <'s are obtained by simple random sampling from the collection of
centered residuals {Ei : 1 S; i S; n}, with replacement. Unlike the stationary
case treated in Section 8.2, here the stationarity of {Xi, ... , X,';-,} is not of
paramount interest, as the AR(l) process (8.15) is itself nonstationary in
the unstable case.
210 8. Model-Based Bootstrap

The ARB version of Tn based on a res ample of size m is given by


m-l ) 1/2
T::',n =
(
~ xt (/3~m - (Jln) , (8.18)

where /3~m = [I::~1 Xt2] -1 ( I::~1 Xt Xt+l) is obtained by replacing


the Xi'S in the definition of {JIm by Xts, 1 :::; i :::; m (cf. (8.4)).
An important result of Datta (1996) shows that the ARB method fails
in the unstable case for the natural choice of the res ample size m = n.
Indeed, the bootstrap distribution of T:' n has a random limit for 1/311 = 1
and thus, it does not converge to the (no~random) limit distribution of Tn,
given by (8.16). A precursor of this result was obtained by Basawa et al.
(1991). They considered the model (8.15) with N(O, 1) error variables and
used a parametric variant of the ARB method where the bootstrap error
variables Ei's were generated from the standard normal distribution. See
Inoue and Kilian (2002) for some recent advances on the problem.
To describe Datta's (1996) result, let G(·; ,) denote the probability dis-
tribution of the random variable

where W(·) is the standard Brownian motion on [0,1]. Also, let r =


(W2(1) -1)j[2Jo W2(t)dt]. Define a random probability measure on
1

L
(JR,B(JR)) by
Goo(A) = G(dx; -r), A E B(JR) , (8.19)

where the randomness in Goo is engendered through the random variable r.


This Goo (-) turns out to be the limit of the bootstrap distribution function
estimator Gn (-) == P* (T:' n E .) in a suitable sense. Let lP' denote the collec-
tion of all probability m~asures on (JR, B(JR)), equipped with the topology
of weak convergence. Then, lP' is metricizable (cf. Parthasarathi (1967))
and Gn and Goo can be viewed as lP'-valued (Borel-measurable) random
elements. The following result asserts convergence in distribution of Gn to
Goo as lP'-valued random elements for the case /31 = 1. (See the discus-
sion following Theorem A.l in Appendix A or see Chapter 1 of Billingsley
(1968).) An analogous result holds also for the case /31 = -1.
Theorem 8.4 Suppose that /31 = 1 and that EIEI12+8 < 00 for some 15 > 0.
Let Gn(A) = P*(T:' n E A) denote the bootstrap distribution of T:' n with
resample size m = ";,, where T:',n is as defined in (8.18) with m = n.' Then,
d
Gn
A A

----+ Goo,
8.4 Bootstrapping Unstable Autoregressive Processes 211

where ----+d denotes convergence in distribution on the metric space IP'.


Proof: See Datta (1996). D

Thus, Theorem 8.4 shows that for any x E JR, if n is large, the ARB
estimator Gn (( -00, xl) == P* (T;:,n ::; x) of the target probability P(Tn ::; x)
behaves like the random variable Goo (( -00, xl), which has a nondegenerate
distribution on JR. As a result, there exists an "lo, 0 < "lo < 1, such that

P(lp*(T::,n ::; x) - P(Tn ::; x)1 > "lo)


----+ p(IGoo (( -00, xl) - P(Too ::; x)1 > "l0) > "lo (8.20)

as n ----+ 00, where Too denotes the random variable appearing on the right
side of ----+d in (8.16). Thus, (8.20) shows that with a positive probability,
the ARB estimator P* (T;: n ::::: x) takes values that are at least "lo-distance
away from the target P(Tn ::::: x) for large n. In practical applications,
this means that for a nontrivial part of the sample space, the bootstrap
estimator P* (T;:,n ::; x) will fail to come to within "lo-distance of the true
value even for an arbitrarily large sample size.
In the literature, similar inconsistency of bootstrap estimators have been
noted in other problems. For sums of heavy-tailed random variables, in-
consistency of the IID bootstrap of Efron (1979) has been established by
Athreya (1987) under independence. A similar result for the MBB has been
proved by Lahiri (1995) in the weakly dependent case (cf. Chapter 11). See
also Fukuchi (1994) and Bretagnolle (1983) for other examples. The main
reason for the failure of the ARB method in the unstable case seems to be
different from the failure of the bootstrap methods in the other situations
mentioned above. The ARB method fails here apparently because of the
fact that the least square estimator fhn of (31, which we have used here
to define the residuals for ARB resampling, does not converge at a "fast
enough" rate when 1(311 = 1. Datta and Sriram (1997) propose a modified
ARB where they replace the least square estimator fhn in the resampling
stage by a shrinkage estimator of (31 that converges at a faster rate for
1,611 = 1. With this, they show that the modified ARB method produces a
valid approximation to the normalized statistics Tn for all possible values
Of,61 E R
A second modification that is known to have worked in the other ex-
amples mentioned earlier, including the heavy-tail case and the sample
extremes, is to use a resample size m that grows to infinity at a rate slower
than the sample size n. On some occasions, this has been called the "m out
of n" bootstrap (cf. Bickel et al. (1997)) in the literature. We shall refer to
the ARB method based on a smaller resample size m as the "m out of n"
ARB method. Validity of the "m out of n" ARB method for the unstable
case (as well as for the other two cases) has been independently established
by Datta (1996) and Heimann and Kreiss (1996).
212 8. Model-Based Bootstrap

The following result of Datta (1996) provides conditions on the resample


size m for the "m out of n" ARB that ensure validity of the bootstrap
approximation to the distribution of Tn "almost surely" and also "in prob-
ability". A version of the "in probability" convergence result was proved
by Heimann and Kreiss (1996) under slightly weaker conditions, assuming
finiteness of the second moment of f1 only.

Theorem 8.5 Suppose that Elfll2+8 < 00 for some 8 > 0, that the AR
parameter (31 E JR, and that m i 00 as n ---+ 00. Also, suppose that T;" n is
as defined in (8. 18}. '
(a) If min --+ 0 as n --+ 00, then
Don == sup Ip(Tn :::; x) - P*(T;",n :::; x)l---t p 0 as n --+ 00 .
xEIR

(b) Ifm(loglogn)2/n --+ 0 as n ---+ 00, then Don = 0(1) an n ---+ 00, a.s.
Proof: See Theorem 2.1, Datta (1996). o

Theorem 8.5 shows that, for a wide range of choices of the resample
size m, the "m out of n" ARB approximation adapts itself to the different
shapes of the sampling distribution .c(Tn) of Tn in all three cases, viz., in
the stationary case (1(311 < 1), to .c(Tn) that has a normal limit, and in
the explosive (1(311 > 1) and the unstable (1(311 = 1) cases, where .c(Tn)
has distinct nonnormal limits. An optimal choice of m seems to be un-
known at this stage and it is expected to depend on the value of (31. In a
related problem Datta and McCormick (1995) have used a version of the
Jackknife-After-Bootstrap method of Efron (1992) to choose m empirically.
The Jackknife-After-Bootstrap method seems to be a reasonable approach
for data-based choice of m in the present set up as well. Also, see Sakov
and Bickel (1999) for a related work on the choice of m.
An important implication of Theorem 8.5 is that the "m out of n" ARB
can be effectively used to construct valid CIs for the AR parameter (31
under all three cases. Indeed, as the scaling factor o=~:; Xf)1/2 in the
definition of Tn is the same in all three cases, this provides a unified way
of constructing CIs for (31 that attain the nominal coverage probability
asymptotically for all (31 E JR. For a E (0,1), let im n(a) denote the ath
quantile of T;",n, defined by im,n(a) = inf{t E JR : 'P*(T;",n :::; t) ~ a}.
Then, for 0 < a < 1/2, a 100(1- 2a)% equal tailed "m out of n" bootstrap
CI for (31 is given by

Im,n(a) = (Sln - im,n(1- a) ·8;;:1, Sln - im,n(a) . S;;:l) , (8.21)

where s;, = (L~:; xl), n ~ 2. By Theorem 8.5, if m = o(n), then

(8.22)
8.4 Bootstrapping Unstable Autoregressive Processes 213

for all (31 E JR, where P~l denotes the joint distribution of {Xih2:1 under a
given value (31. Thus, the Cl Im,n(a) enjoys a "robustness" property over
the values of the parameter (31 in the sense that it gives an asymptotically
valid Cl for all (31 E JR. However, the price paid for this remarkable property
is that in the stationary case, the "m out of n" Cl Im,n(a) has a larger
coverage error than the usual Cl In,n(a) where the resample size m equals
n. Thus, if there is enough evidence in the data to suggest that (31 E (-1, 1),
then m = n is a better choice.
We now describe a numerical example to illustrate finite sample prop-
erties of the ARB in the unstable case. We considered model (8.15) with
Ei ,....., N(O, 1) and (31 = 1, and compared the accuracy of the usual "m = n"
and the "m out of n" ARB approximations to the distribution function of
the normalized statistic Tn when the sample of size n = 100. The choice
of m in the "m out of n" bootstrap was taken as m = 30, which was close
to the choice m = n 3 / 4 , considered in Datta (1996). Figure 8.4 shows the
usual ARB distribution function estimators with m = n = 100 and the "m
out of n" ARB distribution function estimators with m = 30 for four data
sets of size n = 100, generated from the AR(l) model (8.15) with the above
specifications. In each case, B = 500 bootstrap replicates have been used
to compute the bootstrap estimator P* (T~,n ::; .). The true distribution of
Tn, found by 10,000 simulation runs is shown by a solid curve, while the
"m = n" and the "m = o(n)" ARB distribution function estimators are
denoted by dotted and dashed curves, respectively. Notice that for all four
data sets, the modified ARB produced a better fit to the true distribution
function of Tn. A more quantitative comparison is carried out in Table 8.2,
which gives the values of the Kolmogorov-Smirnov goodness-of-fit statistic
for the four data sets. For all four data sets, the distance of the "m = n"
ARB from the true distribution function of Tn is at least 34% larger than
that of the "m out of n" ARB, as measured by the Kolmogorov-Smirnov
statistic.

TABLE 8.2. Values of the Kolmogorov-Smirnov goodness-of-fit statistic compar-


ing the usual "m = n" ARB (column 2) and the "m out of n" ARB (column 3)
distribution function estimators for four data sets of size n = 100 from model
(8.15) in the unstable case ({31 = 1). Column 4 is the ratio column 2/column 3.

Data Set m= 100 m=30 Relative Discrepancy


1 0.159 0.119 1.34
2 0.137 0.077 1.78
3 0.165 0.123 1.34
4 0.075 0.02 3.75
214 8. Model-Based Bootstrap

C! 0
~,~

'"ci '"ci
/"/.-.

/~ ..
'"ci '"ci
..,.
ci ci
,/
'"ci '"ci
0 0
ci ci
·3 ·2 ·1 0 2 3 -3 -2 -1 2 3

C! C!

'"ci '"ci
'"ci
..
"l
0
..,.
ci ci

'"ci '"ci
0 0
ci ci
-3 -2 -1 0 3 -3 -2 -1 0 2 3

FIGURE 8.4. Bootstrap distribution function estimates and the sampling distri-
bution of the normalized least square estimator Tn = [I:~:11 Xl]l /2 (~1 n - fh) for
four data sets of size n = 100 from model (8.15) with fh = 1, Ei ~ N(O, 1). The
solid line is for the true distribution function, while the dashed and the dotted
lines respectively denote the usual "m = n" approximation and the "m out of n"
ARB approximation with m = 30.

8.5 Bootstrapping a Stationary ARMA Process


The idea of bootstrapping a stationary AR(p) model can be easily adapted
to the more general class of stationary autoregressive and moving average
(ARMA) processes. The key observation here is that a stationary ARMA
process may be expressed both as an infinite order autoregressive process
as well as an infinite-order moving average process, when the invertibility
conditions hold. The autoregressive representation allows one to identify
the "residuals" in terms of the observable random variables, which are
then resampled to generate the "bootstrap error variables" . These, in turn,
are used to generate the bootstrap observations by employing the ARMA
recursive relation. The details of the method are notationally awkward.
As a result, we will first look at some auxiliary properties of the ARMA
process itself that will help us understand the main steps of the ARMA
bootstrap method better.
Let {XdiEZ be a stationary ARMA (p, q) process satisfying the difference
equation
p q

Xi = L!3iXi-j + LOojEi-j + Ei, i E Z , (8.23)


j=l j=l
8.5 Bootstrapping a Stationary ARMA Process 215

where p, q E Z+ with p+q E N, {EdiEZ is a sequence of iid random variables


with EEl = 0 and where (31,' .. ,(3p, a1,.'" a q E lR are parameters. Let
(3(z) = 1- L.~=l(3jZj and a(z) = 1 + L.]=lajZj, z E <C denote the
characteristic polynomials associated with the autoregressive part and the
moving average part, respectively. We shall suppose that the parameters
(31, ... ,(3p, aI, ... ,a q satisfy the causality and invertibility conditions that

{
(3(z) -=f. 0 for all Izl::; 1 (8.24)
a(z) -=f. 0 for all Izl::; 1
and a(z) and (3(z) have no common zero. Furthermore, we suppose that
a q -=f. 0, and (3p -=f. O. Then there exists a TJo > 1 (depending on the values
of (31, ... ,(3p and aI, ... , a q ) such that in the disc Izl ::; TJo, we have the
power series expansions
00

J j ,
"L.... b·z
j=O

"a
00

L.... ·zj
J , and
j=O

[(3(z)] -1 a(z) ~Pjzj = (~1jzj)-1 (8.25)

As a consequence of this, we may express the Xi'S as an infinite order


AR process and also as an infinite-order moving average (MA) process (cf.
Chapter 3, Brockwell and Davis (1991)). Indeed, the following are true:
00

Xi l:PjEi-j, iEZ
j=O

l: 1 X
00

Ei j i- j , i E Z, and
j=O

l: aj(Xi - j -
00

Ei (31 X i-j-1 - ... - (3pXi - j- p), i E Z , (8.26)


j=O
where the constants pi's, 1i's, and ai's are given by (8.25). From (8.25),
it follows that bo = ao = Po = 10 = 1. We adopt the convention that for
i < 0, bi = ai = Pi = 1i = O. Then, using the identity

[a(z)] [t, ajz j ] =1 for all Izl < TJo ,

and, for all k 2': 1, equating the coefficients of zk in the product on the left
side to zero, we have
(8.27)
216 8. Model-Based Bootstrap

Now, setting ao = 1 and 130 = -1, interchanging the summations three


times and using (8.23), (8.25)-(8.27), we have, for all i ?: 1 - q (cf. (2.3),
Kreiss (1987)),

L aj (Xi- j -
00

€i f31 X i-j-1 - ... - f3P X i- j - P )


j=O

1; aj-1 ( - ~ f3k X i+1-j-k ) + j'f:1 aj-1 (~ak€i+1-j-k )

1; aj~l ( - ~ f3k X i+1- j -k) + ~ ~ ak a i+1+s-k c s

i
~ aj-1
(P ) q-1
- t; f3k X i+1-j-k + ~ €-s
(S~ ai+1+s-k a k
)
(8.28)
Note that by (8.25), ai = O('T/o i ) as i --+ 00. Hence, for large i's, the
contribution ofthe second term in (8.28) is small. Thus, we may concentrate
on the first term only and define an "approximation" to €i by estimating
the coefficients aj-1 's and 13k's above. This observation forms the basis for
defining a residual-based resampling method for a stationary ARMA (p, q)
process, which we describe next.
Suppose that a finite segment X n +p = {X 1- p, ... , Xn} of the ARMA
(p,q) process {XihEZ of (8.23) is observed. Let (ihn, ... ,/3pn)' and
(a1n, ... , a qn )' respectively denote some estimators of the parameter vec-
tors (131, ... , f3p)' and (a1, ... , a q)' based on X n +p such that
p q

L l/3jn - f3j I + L lajn - aj I ----tp 0 as n --+ 00 . (8.29)


j=l j=l
Then, there exists 1 < 'T/1 < 'T/o such that, with high probability, the re-
ciprocal of the function a(z) == 1 + 2::)=1 ajnzj admits the power series
expansion

L ajn zj ,
00

[a(z)r 1 = Izl:<:::; 'T/1 (8.30)


j=O
for large values of n (cf. Lemma 2.2, Kreiss and Franke (1992)). ThEm, using
(8.28)-(8.30), we define the "residuals" fin'S by the relation

fin = tj=l
aj-1,n ( - t
k=O
/3knXi+1-j-k) , i = 1, ... , n , (8.31 )

where /3on = -1. Note that for a purely AR(p) process, if we set the moving
average parameters a1, ... ,aq equal to zero and also take ajn = 0, 1 :<:::; j :<:::;
8.5 Bootstrapping a Stationary ARMA Process 217

q, then it follows from (8.30) that aOn = 1 and ajn = 0 for all j 2': 1. In
this case, the "residual" fin of (8.31) reduces to fin = Xi - L:~=1 ~knXi-k'
which corresponds to the residuals defined in Section 8.2 for the ARB
method with ~kn = ~kn, the least square estimator of (3k, 1 ::; k ::; p.
The remaining steps of the bootstrap procedure for the stationary ARMA
process (we will call it the ARMA bootstrap or the ARMAB, in short)
parallel the steps in the ARB. Starting with fin, 1 ::; i ::; n, we form the
centered residuals Ein = fin - En, 1 ::; i ::; n, where En = n- 1L:~=1 fin.
Next, we generate iid bootstrap error variables C;:, i ;::: 1 - max{p, q} by
sampling at random, with replacement from {fin: 1 ::; i ::; n}. Then, we
define the bootstrap observations by using the recursion relation
p q

xt = L ~jnXt_j + L ajnE:_j + < (8.32)


j=l j=l

for i 2': 1 - max{p, q}, where, for i ::; - max{p, q}, we set Xt = 0
and Ei = 0 . The ARMA-bootstrap version of a random variable Tn
tn(Xn+p ; (31, ... , (3p, a1,···, a q ; F) is now defined by
(8.33)

where X;::+p = (Xr_ p, ... , X~), F is the unknown distribution function of E1


and Fn(x) = n- 1L:~=1 D.(Ein ::; x), x E 1R. denotes the distribution function
of Ei. One may, if desired, use a resample size m + p at the bootstrap stage
instead of the original sample size n + p; it is a relatively simple task to
modify the definition of the bootstrap variable T;:: in this case.
As in the case of the ARB, the variables Xt's generated by (8.32) (start-
ing with Xr = 0 = c;: for i ::; - max{p, q}) are not stationary. However, as
noted earlier, the effect of the initial values under (8.29) dies out exponen-
tially fast with high probability as n ---+ 00. As a result, the nonstationarity
of the Xt's typically does not have an effect on the limit. Alternatively,
one may proceed as in the ARB by generating a long enough chain and
discarding a set of beginning values to obtain the bootstrap observations.
Bose (1990) considers bootstrapping the pure moving average model
ARMA(O,l) and establishes second-order correctness of a version of the
ARMAB for the least square estimator of the MA parameter. The general
version of the ARMA bootstrap method presented here is due to Kreiss
and Franke (1992), who establish the validity of the ARMAB for a class
of M-estimators. Allen and Datta (1999) propose a modification to Kreiss
and Franke's (1992) proposal for bootstrapping M-estimators and show
superiority of the modified version using some simulation results.
In the following, we describe the results of Kreiss and Franke (1992)
and Allen and Datta (1999) in more details. Let F denote the distribution
function of the error variable E1 and let 7/J : 1R. ---+ 1R. be a function such that
(8.34)
218 8. Model-Based Bootstrap

Then, an M-estimator en of the parameter 00 = Uh, ... ,/3p,a1, ... ,aq)'


based on the function 1jJ is given by a measurable solution to the equation
(in 0 E ]Rp+q)
n
n- 1 L 1jJ(Ej(O))Z(j - 1; 0) =0, (8.35)
j=l

where, with 0 = (0 1, ... , Op+q)', the variables Ej(O) and Z(j - 1; 0) are de-
fined as Ej(O) == 2:{:~[ak(O)l (Xj- k - 2:f=l OiXj-k-i) and Z(j - 1; 0) =
2:t:~[ak(O)l (Xj- k- 1, ... ,Xj-k-p;Ej-k-1(0), ... ,Ej-k-q(0))', 1 ::; j ::;
n. Here, the factors ak (O)'s are formally defined by the relation (cf.
(8.25),(8.30))

(8.36)

Furthermore, in the definition of Z(j - 1; O)'s, we set dO) = 0 if i ::; o.


Note that at the true parameter value, 0 = 00 , Op+j = aj, 1 ::; j ::; q and
hence, by (8.25) and (8.36), ak(O) = ak, k 2: O. Thus, the variables Ej(O)'S,
1 ::; j ::; n play the role of "residuals" when 00 is "estimated" by O. In
particular, from (8.31) and (8.36), for 0 = (;31n, ... , ;3pn; ii 1n ,···, ii qn )', we
get Ej(O) = Ejn, 1::; j ::; n.
Next we describe a way of studentizing the multivariate M-estimator en.
Under some regularity conditions on 1jJ and F, it can be shown (cf. Kreiss
and Franke (1992)) that

(8.37)

where r n = n- 1 2:7=1 1jJ'(Ej)Z(j-l)Z(j-l)', \lin = n- 1/ 2 2:7=1 1jJ(Ej)Z(j-


1) and Z(j - 1) = Z(j - 1; ( 0 ), 1 ::; j ::; n. In the definition of r nand
elsewhere in this section, 1jJ' and 1jJ", respectively, denote the first and the
second derivatives of 1jJ. From (8.37), now it follows that we may studentize
en using the variance matrix estimator
A A_1A A_I
~n = r n An r n , (8.38)

where, with Zn(j-l) = Z(j-l; en), f n = n- 1 2:7=1 1jJ'(Ejn)Zn(j-l)Zn(j-


1)', and An = n- 2:7=1 1jJ2(Ejn)Zn(j - I)Zn(j - I)'. For definiteness, here
1
and in the rest of this section, we assume that the generic estimator en ==
(;31n, ... , ;3pn, chn, ... , ii qn )' of the ARM A model parameters is chosen to
be the M-estimator en = (e 1n , ... , ep+q,n)'. In particular, the residuals Ein'S
in (8.31) are defined with en = en. With this, the studentized version of
the M-estimator is given by

(8.39)
8.5 Bootstrapping a Stationary ARMA Process 219

where I;;;-1/2 is a (p + q) x (p + q) matrix (not necessarily symmetric) such


'-1/2)('_1/2)'
that ( ~n ~n ,
= ~;;-1. For example, we may use the Cholesky
decomposition of I;n to find I;;;-1/2.
Next, we define the bootstrap version of the studentized M-estimator
Tn of (8.39). With the ARMAB "observations" Xt's generated by (8.32),
define the bootstrapped M-estimator B~ as a solution of the equation (cf.
(8.35))
n
n- 1 2: 1jJ1 (Ej(B))Z*(j -1; B) = 0 , (8.40)
j=1
where E;(B) and Z*(j -1; B) are defined by replacing Xi's in the definitions
of Ej(B) and Z(j - 1; B), respectively, by Xt's, and where 1jJ1 (x) == 1jJ(x) -
E*1jJ(Ei). As in Section 4.3, we center the function 1jJ in the definition of
the bootstrapped M-estimator B~ to ensure

the bootstrap analog of the structural equation (8.34). It is worth not-


ing that for the lID bootstrap of Efron (1979), such an explicit center-
ing is important in linear regression models (cf. Freedman (1981), Shorack
(1982), Lahiri (1992b)), but not so for identically distributed observations
(cf. Lahiri (1992c, 1994)), as the centering factor is automatically ncgligible,
by the definition of the M-estimator en itself. For the ARMA bootstrap,
the centering has a negligible effect asymptotically (cf. Allen and Datta
(1999), p. 368), but it improves finite sample accuracy of the ARMAB,
particularly for models with a nontrivial MA-component.
Next, define ~~ by replacing the Xi's and en in the definition of I;n with
Xt's and B~, respectively. Then, the bootstrap version of Tn is given by

(8.41)

where [~~1-1/2 is a matrix satisfying ([~~1-1/2)([Z:;~1-1/2)' = [~~1-1. The


following result asserts the validity of the ARMAB for Tn.

Theorem 8.6 Suppose that 1jJ is twice continuously differentiable with


bounded derivatives 1jJ' and 1jJ" such that E1jJ(EI) = 0, E(1jJ(E1))2 E (0,00)
and E'lj/(E1) =I- O. Also, suppose that {e n }n>1 is a sequence of measur-
able solutions of (8.35) that is strongly consistent for' Bo. If, in addition,
Eh 13 < 00, then

(a) P* (there exists a solution B~ of (8.40) such that IB~ - enl <
Cn- 1 / 2 logn) 2: 1 - C(10gn)-2 a.s. ;

(b) for the sequence of solutions {B~}n>l of part (a),


220 8. Model-Based Bootstrap

sup Ip*(T~ ~ x) - P(Tn ~ X)I----+ 0 as n ----+ 00, a.s.


XElRp + q

Proof: Part (a) follows by using arguments similar to the proof of Theorem
4.2, Chapter 4. Allen and Datta (1999) gives a proof of part (b) assuming
vin((}~ - On) = OP. (1), a.s. Essentially, the same proof works in this
case. We leave the details of the modification of their proof to the reader. 0

Theorem 8.6 shows that the ARMAB provides a valid approximation to


the distribution of the studentized M-estimator Tn for a large class of score
functions 'IjJ. Asymptotically valid approximations to the distribution of Tn
may also be obtained by applying the ARMAB method to the studentized
version of the "linear approximation" r;;-l\jJn to vin(On - (}o), given by

Indeed, Kreiss and Franke (1992) establish validity of the ARMAB to


the distribution of vin(On - (}o) by applying the bootstrap to vinr;;-l\jJn.
As the effect of ignoring the higher-order terms in the stochastic expansion
of vin(On - (}o) would show up only in the second- and higher-order terms
in the corresponding Edgeworth expansions, a first-order analysis fails to
distinguish between the two versions. A finite sample simulation study in
Allen and Datta (1999) points out the superiority of applying the ARMAB
to Tn. A study of the second-order properties of the ARMAB and theoret-
ical analysis of the two approaches seem to be nonexistent in the literature
at this point.
9
Frequency Domain Bootstrap

9.1 Introduction
In this chapter, we describe a special type of transformation based-
bootstrap, known as the frequency domain bootstrap (FDB). Given a finite
stretch of observations from a stationary time series, here we consider the
discrete Fourier transforms (DFTs) of the data and use the transformed
values in the frequency domain to derive bootstrap approximations (hence,
the name FDB). In Section 9.2, we describe the FDB for a class of estima-
tors, called the ratio statistics. Dahlhaus and Janas's (1996) results show
that under suitable regularity conditions, the FDB is second-order accurate
for approximating the sampling distributions of ratio statistics. In Section
9.3, we describe the FDB method and its properties in the context of spec-
tral density estimation. Material covered in Section 9.3 is based on the
work of Franke and HardIe (1992). In Section 9.4, we describe a modified
version of the FDB due to Kreiss and Paparoditis (2003) that, under suit-
able regularity conditions, removes some of the limitations of the standard
FDB and yields valid approximations to the distributions of a larger class
of statistics than the class of ratio statistics. It is worth pointing out that
the results presented in this chapter on the FDB are valid only for linear
processes.
222 9. Frequency Domain Bootstrap

9.2 Bootstrapping Ratio Statistics


9.2.1 Spectral Means and Ratio Statistics
Let {XihEZ be a stationary time series with EXo = 0 and with spectral
density f. Let IT = [-71",71"]. Let dn(w) = L~=l X t exp( -~wt), w E II denote
the finite Fourier transform of Xl' ... ' Xn and let
(9.1)
denote the periodogram, where ~ = v'-I. Statistical analysis of {XihEZ
in the frequency domain is carried out in terms of the transformed val-
ues dn(w)'s. A class of level-1 parameters of interest that are commonly
considered in frequency domain analysis of the time series are given by

A(';; f) = Ju = (11r 6(w)f(w)dw, ... , i1r ';p(w)f(w)dw) I, (9.2)

where'; = (6, ... ,';p)' and where, for i = 1, ... ,p'';i : [0,71"] ----> lR. is a
function of bounded variation. The parameter A(';, f) in (9.2) is called a
spectral mean. A canonical estimator of A(';, f) is given by

The following are examples of some common spectral means and their
canonical estimators.

Example 9.1: (Autocovariance estimator). Let ';(w) = 2 cos kw, w E [0,71"]


for some integer k. Then,

£:
211r(cOSkW)In(W)dW

In(w)exp(-~kw)dw
(271"n)-1 LL
n n j1r
XtXj exp( -~wt) exp(~wj) exp( -~kw)dw
t=l j=l -1r
n-k
n- l L XtXHk ,
t=l
as I::1r exp(~mw)dw = 0 for any nonzero integer m. By similar arguments,

Thus, A(';; In) is the usual moment estimator of the autocovariance


EXIX Hk , for any k E Z.
9.2 Bootstrapping Ratio Statistics 223

Example 9.2: (Spectral distribution function estimator). Let ((w) =


D.[O,)"](w), wE [O,n] for some O:=:; A:=:; n. Then,

(9.4)

and
(9.5)

are, respectively, the spectral distribution function and its periodogram


based estimator. D

Suppose that {XihEZ is a linear process, i.e., {XihEZ has a representa-


tion of the form
Xi = Laj(i-j, i E Z
jEZ

where {aj}jEz C lR are constants satisfying l:jEZj2lajl < 00 and where


{(j }jEZ is a collection of iid random variables with E(l = 0 and 0'2 ==
E(r = 1. Then, the results of Dahlhaus (1983, 1985) imply that under
some suitable regularity conditions, for a real-valued ( (for simplicity of
discussion) ,

vin(A((; In) - A((; I))

~d N(O, [2n / e12 + ;: (/ (1)2]) , (9.6)

where ~4 is the fourth cumulant of (1. Dahlhaus and Janas (1996) showed
that under some regularity conditions, the FDB version of Rn (to be de-
e
scribed below) converges in distribution to N(O, 2n f P), with probabil-
ity 1. As a consequence, the FDB yields a valid approximation if either
~4 = 0 or f (I = O. The first condition is restrictive, as it specifies the
fourth cumulant of the innovations exactly and it holds, for example, if
(1 is Gaussian. In comparison, the second condition is less restrictive on
the distribution of the innovations but it limits the collection of ((.) func-
tions. Dahlhaus and Janas (1996) identified a large class of spectral mean
estimators, called the ratio statistics, for which the ((.) functions satisfy
the second condition f (I = 0 for any given spectral density f. We now
describe the ratio statistics.
Let
g(w) = I(w)/F(n), wE [O,n] (9.7)
denote the normalized spectral density of the process {XihEZ, where F
is the spectral distribution function, given by (9.4). Then, A((; g) == f (g
is a normalized spectral mean parameter with kernel ( : [0, n] ____ lRP • The
224 9. Frequency Domain Bootstrap

corresponding canonical estimator is defined as

(9.8)

where In(w) = In (w)/ Fn (1f) is the normalized periodogram and Fn (·) is the
spectral distribution estimator of (9.5). Note that the normalized spectral
mean estimator A(~; I n ) can be written as the ratio of two spectral mean
estimators as

(9.9)

and, hence, is called a ratio statistic or a ratio estimator. For ~(w) =


2coskw, w E [O,1fj of Example 9.1, we get A(~;g) = EXIXHk/E(XIX1),
the lag k autocorrelation of the process {XdiEZ and the corresponding
ratio estimator A(~; I n ) = L~~lk XiXHk/ L~=l xl is the lag k sample
autocorrelation. Similarly, ~ = ]1.[0,).] of Example 9.2 yields the normalized
spectral distribution function and its canonical estimator. Although the
class of ratio statistics is more restricted than the class of spectral mean
estimators, ratio statistics play an important role in many inference prob-
lems in the frequency domain. For example, the Yule-Walker estimators
of autoregressive parameters are based on estimators of autocorrelation,
Bartlett's Up-statistic for a goodness-of-fit test is based on the normalized
distribution function estimator, etc. See Dahlhaus and Janas (1996) for
further discussion on ratio statistics.
Next we show that for the ratio statistics, the second term in the asymp-
totic variance vanishes (cf. (9.6)). Indeed, for a given kernel ~ : [O,1fj----]R.P,
we have

Vn(A(~; I n ) - A(~; g))

(9.10)

where 'ljJ(w) == ~(w) J f - J u,


w E [O,1fj. Because J'ljJf = 0, 'ljJ satisfies
the second requirement for the validity of the FDB. In Section 9.2.3, we
shall show that under suitable conditions, the FDB outperforms the normal
approximation when applied to the ratio statistics.

9.2.2 Frequency Domain Bootstrap for Ratio Statistics


It is well known (cf. Chapters 4 and 5, Brillinger (1981), Lahiri (2003a)) that
under some mild conditions on the process {XihEZ, the finite Fourier trans-
forms dn(wd, ... , dn(Wk) are asymptotically independent for any given set
9.2 Bootstrapping Ratio Statistics 225

of distinct frequencies WI, ... , Wk E [0,11"]. In an unpublished article, Hur-


vich and Zeger (1987) proposed the FDB based on this observation. Since
the DFTs at distinct frequencies are approximately independent, they sug-
gested resampling suitably studentized DFT values one at a time as in
Efron's (1979) IID bootstrap. Since its inception, properties of the FDB
have been studied by many authors. Nordgaard (1992) considered the FDB
for Gaussian processes, while Franke and HardIe (1992) applied the FDB
to the problem of spectral density estimation (see Section 9.3 for more de-
tails). In this section, we describe the FDB for the ratio statistics of Section
9.2.1, based on the work of Dahlhaus and Janas (1996).
Let I jn = In (27rj In), j = 1, ... , no denote the periodogram at discrete
ordinates 211"j In, where no = ln/2 J. Note that for any given set of frequen-
cies 0 < WI < ... < Wk < 11", the scaled periodogram values In (Wj) I f (Wj),
j = 1, ... , k are asymptotically pivotal in the sense that they are asymp-
totically distributed as iid Exponential (1) random variables (cf. Theorem
5.2.6, Brillinger (1981)) and, hence, have limit distributions free of un-
known parameters. This suggests that we may use an estimator in of f
(say, a kernel density estimator) to studentize Ijn's. The main steps of the
FDB for the ratio statistics are as follows:
Step 1: Form the studentized periodogram ordinates Ejn = Ijnl ijn, j =
1,2, ... , no, where ijn = in(>'jn) and Ajn == 211"j In.
Step 2: Define the rescaled variables Ejn = Ejn/E.n, 1 ::; j ::; no, where
-1 ",no
E·n = nO L...i=1 Ein·
A A

Step 3: Generate the bootstrap variables Ejn' j = 1, ... , no by sampling


randomly with replacement from the collection {Ejn : j = 1, ... ,no}·

Step 4: Define the bootstrap periodogram values by Ijn = ijn . Ejn'


1 ::; j ::; no.

Define the bootstrap versions of In(w) and g(w) at Ajn by Jjn =


W =
Ijn/[2; L~!1 Itn] and gjn = ijn/[2; L~!1 iin], j = 1, ... , no. Then, the
FDB version of the centered and scaled ratio estimator A(~, I n ), i.e., of

(9.11)

is given by
T~ = vn(B(~; J~) - B(~; gn)) , (9.12)
where

and
226 9. Frequency Domain Bootstrap

The summations in B(~; J~) and B(~; fin) above are approximations to the
corresponding integrals over the interval [0,7f], where the approximating
step functions are constant over subintervals of length 27f I n. This results
in the factor 27fln, which is comparable to the factor (7flno) appearing in
Dahlhaus and Janas (1996). However, the effect of this scalar multiplier
vanishes for ratio statistics, as the constants from the numerator and the
denominator cancel out each other.
The rescaling in Step 2 plays a role similar to the centering of the es-
timating equations in the context of bootstrapping the M-estimators (cf.
Section 4.3). Without the rescaling, the FDB approximation may fail to
be consistent. Although the given variables Xi's are dependent, the stu-
dentized periodogram variables tjn'S are approximately iid and, hence, the
resampling scheme in Step 3 above resamples a single value at a time as
in Efron's (1979) iid resampling scheme (cf. Section 2.2). An alternative
version of the FDB can be defined by replacing the bootstrap variables Ej'S
by iid standard exponentially distributed variables EJ's (say) in Step 3
and using Ij~ == fjn . Ej, 1 :::; j :::; no as the bootstrap periodogram values,
instead of Ijn's of Step 4. Both versions of the FDB are known to have a
similar accuracy up to the second-order (cf. Remark 6, Dahlhaus and Janas
(1996)).

9.2.3 Second-Order Correctness of the FDB


In this section, we consider second-order properties of the FDB approxi-
mation to the distribution of Tn. The following conditions will be used.

Conditions:
(C.1) {XdiEZ is a linear process of the form

Xi = Laj(i-j, i E Z, (9.13)
jEZ
where ai's are real numbers satisfying ai = O(exp( -Clil)) as Iii ----> 00
for some C E (0,00) and {(diEZ is a collection of iid random variables
with E(l = 0, E(f. = 1, Ea = 0 and Ea < 00.

(C.2) (i) The spectral density f( w) == (27f)-1[ I:jEz aj exp(~wj) [2, Iwl :::;
7f satisfies
inf f(w) > 0 . (9.14)
wE[O,n]

(ii) The estimator fn of f used in the FDB method is uniformly


strongly consistent, i.e.,

sup [fn(w) - f(w)[----> 0 as n ----> 00, a.s. (9.15)


wE[O,n]
9.2 Bootstrapping Ratio Statistics 227

(C.3) The function ~ = (6, ... , ~p) : [0, 'if] ----+ ]RP is of bounded variation
(component wise) and

II~t(k)11 = O( exp(-Clkl)) as Ikl----+ 00 (9.16)

for some C E (0,00) where ~t (k) = 2 foK ~(w) cos kwdw is the Fourier
coefficient of ~ (extended as a symmetric function over [-'if,'if]).
(C.4) (i) ((1, (r)' satisfies Cramer's condition, i.e.,

lim IEexp (((l,(~)t)1 < 1. (9.17)


II t II-tCXJ

(ii) Let Wn denote the eight-dimensional finite Fourier transforms

for {j1, ... ,j8} C {I, ... ,no -I} or the (d+ 1) dimensional spec-
tral mean estimator f(e,l)'In. Then, I; = limn-tCXJ Cov(Wn )
exists and is nonsingular in each case. Further, L;l
f((, 1)'((, 1)P is nonsingular.
Next, we briefly comment on the conditions. The exponential decay of the
coefficients {aj}jEZ in (9.13) is required for establishing valid Edgeworth
expansions for the normalized ratio estimator Tn, based on the work of
GCitze and Hipp (1983) and Janas (1994). It can be replaced by a suitable
polynomial decay condition if one is to have only consistency of the FDB for
Tn. The condition on the first two moments the innovations (i 's is standard.
°
However, the requirement that E(r = is very stringent. Dahlhaus and
Janas (1996) point out (in their Remark 5, page 1942) that the rate ofFDB
approximation is only O(n- 1 / 2 ), i.e., the FDB is only first-order accurate
when this condition fails. Condition (C.2)(i) ensures a nondegenerate limit
distribution of the periodogram In(w) at each w E [0, 'if]. The uniform
strong consistency of in
is known when in
is a kernel spectral density
estimator of f. See, for example, Theorem AI, Franke and HardIe (1992).
Exponential decay of the Fourier coefficients e(k) in (9.16) of Condition
(C.3) is again required for establishing a valid Edgeworth expansion for
the ratio statistic Tn (cf. Janas (1994)). This condition does not hold for
the ~-function of Example 9.2, corresponding to the spectral distribution
function estimator. The technical conditions of Condition (C.4) are needed
to establish valid Edgeworth expansions for Tn. As mentioned in Chapter
6, (9.17) holds if (1 has an absolutely continuous component with respect
to the Lebesgue measure on R
Theorem 9.1 Suppose that Conditions (C.1)-(o.4) hold. Then,

sup IP(AnTn E A) - P*(AnT~ E A)I = o(n- 1 / 2 ) as n ----+ 00, a.s. ,


AEC
228 9. Frequency Domain Bootstrap

where C is the collection of all convex measurable sets in JRP, and An and
An are symmetric p x p matrices satisfying

respectively.

Proof: This is a special case of Theorem 1 of Dahlhaus and Janas (1996),


when no data-tapers are used. See Dahlhaus and Janas (1996) for a proof. D

Thus, under the conditions of Theorem 9.1, the FDB provides a better
approximation to the distribution of the normalized ratio statistic than the
normal approximation, which has an error of the order O(n- l / 2 ). Dahlhaus
and Janas (1996) prove the second-order correctness of the FDB for nor-
malized ratio statistics in the more general case where the periodogram is
defined using a data-taper. Furthermore, they also establish the superiority
of the FDB for the normalized Whittle estimator over normal approxima-
tion.
The results on the FDB for ratio statistics are valid under the assumption
that EX l = 0. If the mean of the stationary process {XdiEZ is indeed
unknown and estimated explicitly, say, by using Xi - Xn in place of Xi
for the calculation of the periodogram In (cf. (9.1)), the FD B has an error
of the order O(n-l/2) for ratio statistics. As a result, in this case, the
FDB no longer possesses the superiority over the normal approximation
(cf. Remark 4, Dahlhaus and Janas (1996)). Furthermore, as pointed out
earlier, the superiority of the FDB is also lost if the third moment of the
innovation variables does not vanish, i.e., if E(? -=I- O. Hence, it appears
that the superiority of the FDB approximation for ratio statistics is rather
sensitive to violations of the model assumptions.

9.3 Bootstrapping Spectral Density Estimators


In one of the early works on the FDB, Franke and HardIe (1992) studied the
FDB for spectral density estimation. In this section, we describe the FDB
method for kernel estimators of the spectral density fO of a stationary
process {XihEZ' and consider its consistency properties. We continue to
suppose that {XdiEZ is a linear process, given by (9.13) for some constants
ai E JR, i E Z with LiEz liai I < 00, and for some iid random variables
{(diEz with E(l = 0, ECf = 1. In particular, we suppose that EXl = 0
and the spectral density fO of {XdiEz is given by

f(w) = (21f)-ll LajexP (L j w)1 2, Iwl::::; 1f. (9.18)


jEZ
9.3 Bootstrapping Spectral Density Estimators 229

Define the raw periodogram of Xl, ... ,Xn by


n 2
hn(w) = n- 1 1LXjexp(twj)1 ' Iwl:::; 7r. (9.19)
j=l

Thus, hn(') is related to the periodogram I n (·) of Xl"'" X n , defined in


(9.1), by
(9.20)
Let K : lli. ----+ (0,00) be a symmetric function satisfying J~oo K(x)dx = 27r.
Then, a kernel estimator of f with bandwidth h == h n > 0 is given by
A
L
no
in(w; h) = (nh)-l . K(W -h jn )hn(Ajn), wE [-7r,7r] , (9.21 )
J=-no

where Ajn == 27rj/n, -no:::; j :::; no and no = In/2J. Performance of in(-;')


as an estimator of f(·) crucially depends on the bandwidth h. Under some
suitable assumption on the process {XdiEZ and on the kernel K(-), the
relative mean square error (RMSE) of in(';') at a point w E [7r,7r] admits
the following expansion (cf. Franke and HardIe (1992), p. 122):

I:
RMSE(w; h) E[in(w; h) - f(w)f / f(W)2

(h2 f"(w)/[2f(w)]) 2 + (27r)-1 K2(X)dx· (nh)-l

+ o([nh]-l + h- 4 ) (9.22)

as n ----+ 00, h ----+ 0 such that nh ----+ 00. Thus, the optimal h that minimizes
the RMSE in (9.22) is asymptotically equivalent to Co . n- 1 / 5 for some
suitable constant Co E (0,00). In the sequel, we suppose that bandwidth
h for the spectral density estimator inC h) lies in an interval of the form
[c5n- 1 / 5 , c5- 1 n- 1 / 5 ] for some arbitrarily small c5 > O. In the next section, we
describe the FDB for inC h) under this restriction, although the bootstrap
algorithm itself may be stated almost without any changes for other values
of h.

9.3.1 Frequency Domain Bootstrap for Spectral Density


Estimation
In the problems encountered so far, the level-1 parameters were finite di-
mensional and their estimators converge at the rate Op(n- 1 / 2). In con-
trast, here the level-1 parameter of interest is a function, namely, f('), and
the estimator inC') has a slower rate of convergence than the standard
Op(n- 1 / 2 ) rate. For a bandwidth h n rv Cn- 1 / 5 , the estimator inC h n )
230 9. Frequency Domain Bootstrap

has a bias that is of the same order as its standard deviation. As a re-
sult, for a valid approximation, the bootstrap algorithm must implicitly
correct for the effect of the bias. A similar situation arises in the con-
text of density estimation (cf. Romano (1988), Faraway and Jhun (1990),
Hall (1992), Hall, Lahiri and Truong (1995)) and regression function esti-
mation (cf. HardIe and Bowman (1988), Hall, Lahiri and Polzehl (1995))
with both independent and dependent data. Since hnP'jn)! f(Ajn)'S are
approximately independent, this leads to the "approximate" multiplicative
regression model
(9.23)
with "approximately" independent error variables Ejn'S and with the "re-
gression function" fe). The FDB for the spectral density estimation makes
use of two bandwidths hln and h2n of different orders as in bootstrapping a
nonparametric regression model with independent errors (cf. Hall (1992)).
The main steps of the bootstrap procedure are as follows:

Step 1: Define the residuals Ejn = hn(Ajn)! in(Ajn; h 1n ), j = 1, ... , no


with an initial bandwidth hln > 0, where inC;·) is as defined in
(9.21 ).

Step 2: Rescale the residuals Ejn'S to get

h - -l,\",no
were E· n = no Dj=l
A

Ejn·

Step 3: Draw a sample Ein' ... ,E~on of size no, randomly, with replacement
from {Ejn : j = 1, ... , no}.

Step 4: Define the bootstrap periodogram values

(9.24)

using a second bandwidth h2n > 0 (typically, different from hln).


Then, the FDB version of the estimator in(w; h) is given by (cf.
(9.21) )

fn*( w; h) = (
nh)- 1 0~ K (W-Ajn)
h *
I1n(Ajn), wE [-7f,7f] .
j=-no
(9.25)

As in the FDB for ratio statistics, the rescaling of the "residuals" in Step
2 ensures that Ein's have mean 1, and this avoids an additional bias at the
resampling stage. Indeed, the FDB may fail without the rescaling. As the
regularity conditions below show, the two bandwidths h 1n and h2n used in
the FDB above are required to satisfy different decay conditions. The initial
9.3 Bootstrapping Spectral Density Estimators 231

bandwidth h ln may be of the same order as the given bandwidth h = hn'


which is assumed to have the RMSE-optimal order, n- l / 5 . However, the
second bandwidth h2n' employed in Step 4 to define the bootstrap raw
periodogram finO, is required to go to zero at a rate slower than n- l / 5 .
Thus, the estimator inC; h 2n ) is smoother than the given estimator in('; h).
The over-smoothing at the bootstrap level is needed to ensure consistent
estimation of the bias of the estimator inC h), and is a standard device
in bootstrapping nonparametric regression models (cf. Hall (1992)). In the
next section, we show that the above version of the FDB provides a valid
approximation to the distribution of centered and scaled spectral density
estimator inC; h) for any sequence {h} == {h n }n>l that decreases to zero
at the optimal rate n- l / 5 . -

9.3.2 Consistency of the FDB Distribution Function


Estimator
We shall make use of the following conditions for proving the results of
this section.

Conditions:

(C.5) {XdiEZ is a linear process, given by (9.13) where the constants


{aihEZ satisfy 2::;:-=
Ijllajl < 00 and where the iid random vari-
ables {(ihEZ satisfy E(l = 0, E(? = 1, and EI(115 < 00. Further-
more, sup{IE exp(d(dI : It I 2 J} < 1 for all J > O.

(C.6) The spectral density fO of {XdiEZ is nonzero and twice continuously


differentiable on [-1f, 1f].

(C.7) K(·) is a symmetric, nonnegative kernel on (-00,00) satisfying


J~= K(x)dx = 21f, J~= x 2K(x)dx = 21f. Furthermore, K has a com-
pact support and K (.) is Lipschitz.

(C.S) (i) {h n }n>l is a sequence of positive real numbers such that there
exists "5 E (0,1) such that h n E [In- 1/ 5 , J-1 n -1/5] for all n 2
J-1.
(ii) hln ---+ 0 and (nhfn)-l = 0(1) as n ---+ 00.

(iii) h2n ---+ 0 and hn/h2n ---+ 0 as n ---+ 00.

Conditions (C.5) and (C.6) are, respectively, similar to Conditions (C.1)


and (C.2)(i), used in the case of ratio statistics. However, here the constants
ai's in (C.5) are allowed to go to zero at a polynomial rate. Condition
(C.7) on the kernel K requires that K (.) /21f be a symmetric probability
density function (with respect to the Lebesgue measure) with mean zero
and variance equal to one. Finally, Condition (C.S) requires the initial
232 9. Frequency Domain Bootstrap

bandwidth h In to go to zero at a rate not faster than n- I /4, i.e., h In can


be taken as C . n- IJ for some 0 < () :::; 1/4. In particular, h In can be of the
same order (viz. n- I / 5 ) as the given bandwidth h n . The second bandwidth
h2n must go to zero at a rate slower than h n . Thus, a set of permissible
values of h2n is given by h2n = C· n- f3 for C E (0,00) and 0 < fJ < 1/5.
Next, define the centered and scaled version of the spectral density
in(w; h n ) as

(9.26)

The bootstrap version of Rn (.; .) is given by

(9.27)

Note that we use in(w; h 2n ) to center and scale the FDB version f~(w; h n )
in (9.27). That this is the appropriate quantity for normalizing f~(w; h n )
follows from (9.24) in Step 4 of the FDB. Since the bootstrap periodogram
values were generated with in(>'jn; h 2n ), by comparing the relations be-
tween the pairs of equations (9.21) and (9.23) in the unbootstrapped case
with their bootstrap analogs (9.24) and (9.25), we see that in(-; h 2n ) plays
the role of the true density fO for the bootstrap spectral density estimator
f~(w; h n ).
The following result shows that the FDB provides a valid approximation
to the distribution of the normalized spectral density estimator in(w; h n )
for any given w E [-7f,7f] and any h n satisfying (C.8)(i).

Theorem 9.2 Suppose that Conditions (C.5}-(C.8) hold. Then, for any
wE [-7f,7fJ,

sup Ip(Rn(w; h n ) :::; x) - P*(R~(w; h n ) :::; x)1


xEIR

---+ 0 in probability as n -t 00 , (9.28)

where Rn(·;·) and R~(-;·) are as given by (9.26) and (9.27), respectively.
Proof: Theorem 9.2 is a version of Theorem 1 of Franke and HardIe
(1992), where their Condition (C.4) has been dropped and where the dis-
tance between the probability distributions of Rn and R~ in Mallow's met-
ric has been replaced by the sup-norm distance. Note that if (27f)-IKt(.)
denotes the characteristic function corresponding to the probability density
(27f)-1 K(·), then

lim Kt (t) - Kt(O)


t2

JX{
t--->O

= (27f) E~ [(27f)-1 Kt(t) -1- it (27f)-IK(x) }dx] /t 2


9.3 Bootstrapping Spectral Density Estimators 233

-1['. (9.29)

This shows that Condition (C.4) of Franke and HardIe (1992) follows from
Condition (C.7) above, which is a restatement of their Condition (C.3).
Hence, Theorem 9.2 follows from Theorem 1 of Franke and HardIe (1992),
in view of (9.29) and in view of the fact that convergence in Mallow's
metric implies weak convergence. D

As in the case of the ratio statistics, consistent estimators of the sampling


distribution of Rn (w; h n ) can be generated by replacing the variables Ej'S
in Steps 1-4 above with iid exponentially distributed random variables
Ej*'S. See Theorem 1 of Franke and HardIe (1992) for the validity of this
variant of the FDB, which also holds under Conditions (C.5)-(C.8). Thus,
both variants of the FDB can be used for setting confidence intervals for
the unknown spectral density f('), where the quantiles of Rn(w; h n ) are
replaced by the corresponding bootstrap quantiles. Accuracy of these CIs
and relative merits of the two versions of the FDB CIs for f (w) are unknown
at this time.

9.3.3 Bandwidth Selection


Franke and HardIe (1992) also consider an important application of the
FDB to the problem of choosing optimal bandwidths for spectral density
estimation. To describe their results, we suppose that the optimality of a
spectral density estimator in (-; .)
is measured by the relative mean-square
error (RMSE) of Section 9.3.2 (cf. (9.22)):

RMSE(w; h) == E(Jn(w; h) - f(w)r / f2(w) .


Furthermore, following Rice's (1984) approach, we restrict attention to an
interval Hn == [8n- 1 / 5 , 8- 1 n- 1 / 5 ] (for a suitably small 8 E (0,1)) of possible
bandwidths that go to zero at the optimal rate n -1/5. Then, the theoretical
RMSE optimal bandwidth h~ == h~ (w) for estimating the spectral density
f(w) at w is defined by
RMSE(w; h~) = inf RMSE(w; h) . (9.30)
hEJi n

Note that, in view of (9.22), the optimal bandwidth h~ satisfies the relation

h~ n- 1 / 5 [(21[')-1 J CXJ

-CXJ K2(x)dx· {f(w)/ !"(w)}2]


1/5
(1 + 0(1))
as n----;oo,

n- 1 / 5co(1 + 0(1)), say, (9.31 )


234 9. Frequency Domain Bootstrap

provided f"(w) #- 0, and 0 E (0,1) is small enough to satisfy 0 < Co <


0- 1 . Thus, the optimal bandwidth h~ depends on the unknown spectral
density f(·) and its second derivative. For a data-based choice of the optimal
bandwidth, we first define an estimated version of the RMSE criterion
function using the FDB, and minimize the resulting function to obtain
the FDB estimator of the level-2 parameter h~. Let f~(w; h) be the FDB
version of in(w; h), given by (9.25). Then, the estimated criterion function
is given by

(9.32)

h E H n , where E* denotes the conditional expectation given {XdiEZ,


The bootstrap estimator of h~ is given by a bandwidth h~ that minimizes
:RMsE(w; h), i.e.,
0 _ .-------
h n = argmm{RMSE(w; h) : h E Hn} . (9.33)
A

An important feature of the FDB-based estimated criterion function


:RMsE(-; .) is that no Monte-Carlo computation is necessary for the eval-
uation of RMSE. An explicit formula for :R:MsE(-; .) can be written down
using the linearity of the bootstrapped estimator f~ ('; .) and the indepen-
dence of the resampled variables Ej'S, similar to the MBB estimator of
the variance of the sample mean, given by (3.9). Indeed, straightforward
algebra yields

:RMsE(w; h)

j=-no

J=-no

(9.34)

Thus, one may find the FDB estimator of the optimal bandwidth h~ by
equivalently minimizing the explicit expression (9.34). The following result
shows that h~ is consistent for h~. Furthermore, the estimated criterion
function at h~ attains the optimal theoretical RMSE level over the set H n ,
asymptotically, in probability.

Theorem 9.3 Assume that the conditions of Theorem 9.2 hold and that
f"(w) #- 0 and 0 < Co < 0- 1 . Then, for h~ and h~, respectively defined by
(9.30) and (9.33),
(i) n1/5(h~ - h~) ---+p 0 as n ---+ 00 ,
9.4 A Modified FDB 235

( ii) RMSE(w;h~) 1 as n ---+ 00 .


RMSE(w;h~J -----+p

Proof: See Theorem 3, Franke and HardIe (1992). o

9.4 A Modified FD B
In this section, we describe a modified version of the FDB based on the
work of Kreiss and Paparoditis (2003). The modified FDB removes some
of the limitations of the FDB and provides valid approximations to the
distributions of a larger class spectral mean estimators than the class of ra-
tio statistics (cf. Section 9.2). Furthermore, the modified FDB continues to
provide a valid approximation in the spectral density estimation problems
considered above. Let {XdiEZ be a causal linear process, given by

=UL
00

Xi aj(i-j, i E Z , (9.35)
j=O

where ao = 1 and {adi>l is a sequence of real numbers satisfying


2::1 i 2 a il < 00 and {(ihE~ is a sequence of iid zero mean, unit variance
1

random variables with E(t < 00. Also, let InO denote the periodogram of
Xl"'" X n , defined by (9.1), i.e.,

n 2
In(w) = (21fn)-11 LXtexP(-LWt)1 ' w E ~-1f,1fl , (9.36)
t=l

and let fO denote the spectral density of {XihEZ, It is known (cf. Priestley
(1981), Chapter 6) that at the discrete frequencies Ajn == 21fj/n, 1 ::; j ::;
no,
(9.37)
and

if j=l=k
(9.38)
for all 1 ::; j, k ::; no, where no = Ln/2J and where ~4 = (E(t - 3) de-
notes the fourth cumulant of the innovation (1. Thus, if ~4 =1= 0, then the
periodogram at distinct ordinates Ajn and Akn have a nonzero correlation
and are dependent. Although the dependence of the periodogram values
In(Ajn) and In(Akn) vanishes asymptotically, the aggregated effect of this
dependence on the limit distribution of a spectral mean estimator may
not be negligible. Indeed, as noted in Section 9.2 (cf. (9.6)), for the spec-
J
tral mean A(~; f) == o71: U and its canonical estimator A(~; In) == o71: ~In J
236 9. Frequency Domain Bootstrap

corresponding to a function ~ : [0,1T] -+ ffi. of bounded variation, we have

(9.39)

The second term (i.e., K4(J f;,j)2) in the asymptotic variance of


Vn(A(~; In) - A(~; f)) results from the combined effect of the nonzero
correlations among the In p.'jn) 's. The standard version of the FDB fails
in such cases due to the fact that the bootstrap periodogram values Ijn's
generated by the FDB algorithm (cf. Step 4, Section 9.2.2) are independent
and, hence, do not have the same correlation structure as the periodogram
variables {In(>ljn) : 1 :s; j :s; no}. The modified version of the FDB, pro-
posed by Kreiss and Paparoditis (2003) gets around this problem by fitting
an autoregressive process to the variables {Xl, ... ,Xn } first and then scal-
ing the periodogram values of the fitted autoregressive process to mimic
the covariance structure of the InP.'jn)'s. As a result, the modified FDB
captures the dependence structure of In(Ajn)'s adequately and provides
a valid approximation to the distribution of Vn(A(~; In) - A(~; f)) even
when the term K4(J f;,j)2 in (9.39) is nonzero.

9.4.1 Motivation
We now describe the intuitive reasoning behind the formulation of the
modified FDB. Let {Yi};EZ be a stationary autoregressive process of order
p, fitted to {Xi}iEZ by minimizing the distance E(Xi - ~~=l (3jXi - j )2 over
(31, ... ,(3p. Write '"'((k) = CoV(Xl,X Hk ), k E Z. Then, the {Yi};Ez-process
is given by
p

Yi = L ;3jYi-j + o-p(i, i EZ , (9.40)


j=1

- - -, -1 -2 -, - - .
where (3 ((31, ... ,(3p) = fp '"'(p, up = '"'((0) -(3fp(3 and {(diEZ IS a
sequence of iid random variables with E(l = 0 and E(f = 1. Here, f p
is the p x p matrix with (i,j)th element '"'((i - j), 1 :s; i, j :s; p and
'"'(p = ('"'((1), ... ,'"'((p))'. As '"'((k) -+ 0 as Ikl -+ 00, by Proposition 5.1.1
of Brockwell and Davis (1991), for every pEN, f;l exists. For the rest of
this section, suppose that E(t < 00. Let

P 2
fAR(W) = 0-;/1 1 - L;3j exp(-~jw)1 ' wE [-1T,1T] (9.41 )
j=l

denote the spectral density of the fitted autoregressive process {Yi}iEZ.


Next, define the variables Wn(Ajn), 1 :s; j :s; no by

(9.42)
9.4 A Modified FDB 237

where I:;R(w) == (2'ifn)-11 I:~=l Y'texp(-LwtW, w E [-'if, 'if] is the peri-


odogram of Yl , ... ,Yn and where the multiplicative factor q(.) is defined
as
q(w) = f(w) / fAR(W), wE [-'if, 'if] . (9.43)

Note that the periodogram I:;R of the fitted autoregressive process satisfies
relations (9.37) and (9.38) with f replaced by fAR and K:4 replaced by
k4 = (E(t - 3), the fourth cumulant of (1. As a result, by (9.42) and
(9.43), it follows that the variables Wn(Ajn)'S satisfy

and

for all 1 :::; j, k :::; no. Thus, the covariance structure of Wn (Ajn), 1 :::; j :::; no
closely mimics that of the periodogram variables In (Ajn), 1 :::; j :::; no,
provided k4 is close to K:4. The modified version of the FDB, proposed by
Kreiss and Paparoditis (2003), fits an autoregressive process empirically
and replaces the multiplicative factor qU by a data-based version. In the
next section, we describe the details of this modified FDB method, known
as the autoregressive-aided FDB (or the ARFDB) method.

9.4.2 The Autoregressive-Aided FDB


Suppose that a finite stretch Xl, ... ,Xn of the series {XihEZ is observed
and that we want to approximate the distribution of the centered and scaled
spectral mean estimator

where A(~; In) and A(~; f) are as in relation (9.39), i.e., A(~; In) = fo7r Un
and A(~; f) = fo7r U for a given function ~ : [0, 'if] --+ lR of bounded varia-
tion. Extensions to the vector-valued case is straightforward and is left out
in the discussion below.
The basic steps in the ARFDB are as follows:

Step (I): Given Xl' ... ' X n , fit an autoregressive process {YihEZ ==
2
{Yin}iEZ of order P (== Pn). Let ((3ln, ... , (3pn) and (In denote the esti-
A A

mated parameter values, obtained by using the Yule-Walker equations


(d. Chapter 8, Brockwell and Davis (1991)). Let
p

(tn = X t - 2.:= {JjnXt-j, t = P + 1, ... , n


j=l
238 9. Frequency Domain Bootstrap

denote the residuals and let

(9.44)

be the standardized residuals, where (n = (n - p)-l L:~=p+l (tn and


8; = (n - p)-2 L:~=P+l ((tn - (n)2. Write Fn for the empirical distri-
bution function of {(tn : t = P + 1, ... ,n}, i.e. ,
n
Fn(x) = (n - p)-l L ll((tn:S: x), x E lE. . (9.45)
t=p+l

Step (II): Generate the bootstrap variables Xi, ... ,X~ from the autore-
gression model
P
Xi = L OjnXi_ j + an . C, i E fZ , (9.46)
j=l

where {(ihE~ is a sequence of iid random variables with common


distribution Fn.

Step (III): Compute the periodogram of Xi, ... ,X~ as


n 2
I:;R*(w) = (27rn)-11 LX; exp( -Lwt)1 ' wE [-7r,7r] . (9.47)
t=l

Step (IV): Let in,AR(W) = ~11- L:~=l Ojne-LWjl-2, W E


[-7r,7r] denote
the spectral density of {XihEZ. Also, let K: [-7r,7r]-+ [0,00] be a
probability density function. Define the nonparametric estimator of
the function q(.) by

wE [-7r,7rJ, where hn > 0 is a bandwidth.


Step (V): Finally, define the bootstrap version of the periodogram In (w)
by rescaling the periodogram I;;R*(.) of the Xi's by fin(·), as

(9.49)

The ARFDB version of the centered and scaled spectral mean estimator
Tn = Vn(A(~; In) - A(~; f)) is given by

(9.50)
9.4 A Modified FDB 239

where B(~; I~) = J07l" ~(w)I~(w)dw and B(~; in) = J07l" ~(W)in(w)dw, with
in(w) == iin(w)in,AR(W), W E [-1f,1f]. As an alternative, the integrals in
the definitions of B(~; I~) and B(~; in) may be replaced by a sum over
the frequencies Pjn : 1 :::; j :::; no} as in the case of the FDB of Section
9.2. The conditional distribution of T~ given X!, ... , Xn now gives the
ARFDB estimator of the distribution of Tn.

Remark 9.1 One may use alternative methods for estimating the
parameters /31,"" /3p and 0'2 in Step (I) of the ARFDB. However, an
advantage of using the Yule-Walker equations to estimate the parameters
/31, ... ,/3p and 0'2 of the fitted autoregression model in Step (I) is that
all the roots of the polynomial 1 - E~=l /3jnzj lie outside the unit circle
{z E C : Izl :::; I} and hence, the spectral density function in,AR(-) of Step
(IV) is well defined.

Remark 9.2 In practice, one generates the variables Xi, ... , X~ from the
"estimated" autoregression model (9.46) by using the recursion relation
(9.46) with some initial values Xi-p, ... ,Xo and running the chain for a
long-time until stationarity is reached (cf. Chapter 8). Kreiss and Paparodi-
tis (2003) also point out that the order p of the fitted model may be chosen
using some suitable data-based criteria, such as, the Akaike's Information
Criterion.
For establishing the validity of the ARFDB, we shall make use of the
following conditions, as required by Kreiss and Paparoditis (2003).

Conditions:
(C.9) The linear process {XihEZ of (9.35) is invertible and has an infinite
order autoregressive process representation
00

Xi = L/3jXi - j + O'(i' i E Z
j=l

where E~l j1/21/3jl < 00 and 1 - E~l /3jzj =j:. 0 for all complex z
with Izl :::; 1.
(C.lO) {(ihEZ is a sequence of iid random variables with E(l = 0, E(f = 1,
and Ea < 00. Further, 0' E (0,00).
(C.11) The spectral density f of {XihEZ is Lipschitz continuous and satisfies
inf f(w»O.
wE [0,71"]

(C.12) (i) The characteristic function Kt(-) of the kernel K(·), given by
Kt(u) == J::'oo exp(wx)K(x)dx, is a nonnegative even function
with Kt(u) = 0 for lui> 1.
240 9. Frequency Domain Bootstrap

(ii) The bandwidth sequence {hn}n>1 satisfies


hn + (nhn)-I ---+ 0 as n ---+ (Xl •

(C.13) The function ~ : [0,7l'] ---+ lR is a function of bounded variation.


(C.14) There exist two sequences of real numbers {PIn}n:;::1 and {P2n}n:;::1
such that Pl'; = 0(1) and P2n = O([n/ log n]I/5) as n ---+ (Xl and the
order P of the fitted autoregression model satisfies P = Pn E [PIn, P2n]
for all n 2: l.
Under Conditions (C.9)-(C.14), the ARFDB provides a valid approxi-
mation to the distribution of Tn, as shown by the next result.
Theorem 9.4 Suppose that Conditions (C.9)-(C.14) hold. Then, with
(E({ - 3),
""4 =

T~ ---+d N (0, [27l' LTC e12 + ,,"4(/ U)2]) , in probability

and, hence, by (9.39),

sup Ip*(T~ ::; x) - P(Tn ::; x)l---+p 0 as n ---+ (Xl ,


xElR

where Tn = fo(A(~; In) - A(~; 1)) and T* is as defined in (9.50).


Proof: See Theorem 3.1, Kreiss and Paparoditis (2003). D

Theorem 9.4 shows that under suitable regularity conditions, the modi-
fied version of the FDB provides a valid approximation to a wider class of
spectral mean estimators than the standard version of the FDB, which is
applicable only to the class of ratio statistics. However, the validity of the
ARFDB crucially depends on the additional requirement of invertibility (cf.
Condition (C.1)), which narrows the class of linear processes {XihEZ to
some extent. Kreiss and Paparoditis (2003) point out that this restriction
may be dispensed with, if one modifies the FDB by fitting a finite-order
moving average model to the data instead of fitting an autoregressive pro-
cess and then by using a suitable version of the correction factor qn (-) in
Step (IV) for the moving average case. Because of these additional tuning-
up-steps involved in the autoregressive-aided or the moving-average-aided
versions, the modified FDB is expected to have a better finite sample perfor-
mance than the usual FDB, even when such modifications are not needed
for its asymptotic validity, i.e., when the methods are applied to ratio-
statistics. A similar remark applies on the finite sample performance of the
ARFDB in the spectral density estimation problems considered in Section
9.3. We refer the interested reader to Kreiss and Paparoditis (2003) for
a discussion of these issues, for guidance on the choice of the smoothing
parameters P and h, and for numerical results on finite sample performance
of the ARFDB.
10
Long-Range Dependence

10.1 Introduction
The models considered so far in this book dealt with the case where the
data can be modeled as realizations of a weakly dependent process. In this
chapter, we consider a class of random processes that exhibit long-range
dependence. The condition of long-range dependence in the data may be
described in more than one way (cf. Beran (1994), Hall (1997)). For this
book, an operational definition of long-range dependence for a second-order
stationary process is that the sum of the (lag) autocovariances of process
diverges. In particular, this implies that the variance of the sample mean
based on a sample of size n from a long-range dependent process decays
at a rate slower than O(n-1) as n ----+ 00. As a result, the scaling factor for
the centered sample mean under long-range dependence is of smaller order
than the usual scaling factor n 1/2 used in the independent or weakly depen-
dent cases. Furthermore, the limit distribution of the normalized sample
mean can be nonnormal. In Section 10.2, we describe the basic framework
and review some relevant properties of the sample mean under long-range
dependence. In Section 10.3, we investigate properties of the MBB approx-
imation. Here the MBB provides a valid approximation if and only if the
limit law of the normalized sample mean is normal. In Section 10.4, we con-
sider properties of the subsampling method under long-range dependence.
We show that unlike the MBB, the subsampling method provides valid ap-
proximations to the distributions of normalized and studentized versions
of the sample mean for both normal and nonnormallimit cases. In Section
242 10. Long-Range Dependence

10.5, we report the results from a small simulation study on finite sample
performance of the subsampling method.

10.2 A Class of Long-Range Dependent Processes


Let {Z;}iEZ be a stationary Gaussian process with EZ l = 0, EZ? = 1 and
auto covariance function

We shall suppose that the auto covariance function r(·) can be represented
as
r(k) = k- a L(k), k 2 1 (10.1)
for some 0 < a < 1 and for some function L : (0,00) ---+ ffi. that is slowly
varying at 00, i.e.,

lim L(at) = 1 for all a E (0,00) . (10.2)


t-+oo L(t)
Note that under (10.1), 2::~1 r(k) diverges and, hence, the process {ZihEZ
exhibits long-range dependence. Here we consider stationary processes that
are generated by instantaneous transformations of the Gaussian process
{Z;}iEZ, including many nonlinear transformations of {ZihEZ, Let G l :
ffi. ---+ ffi. be a Borel measurable function satisfying EG l (Zd 2 < 00. We
suppose that the observations are modeled as realizations of the random
variables {X;}iEZ that are generated by the relation

(10.3)

In spite of its simple form, this formulation is quite general. It allows the
one-dimensional marginal distribution of Xl to be any given distribution on
ffi. with a finite second moment. To appreciate why, let P be a distribution
function on ffi. with J x 2 dP(x) < 00. Set G l = p- l 0 <l> in (10.3), where
<l> denotes the distribution function of N(O, 1) and p- l is the quantile
transform of P, given by

p-l(U) = inf{x E ffi.: P(x) 2 u}, u E (0,1) .

Then,

Furthermore, it readily follows that Xl = P-l(<l>(Zd) has distribution


P. Thus, relation (10.3) yields a stationary process {XihEZ with one-
dimensional marginal distribution F. The dependence structure of {X;}iEZ
10.2 A Class of Long-Range Dependent Processes 243

is determined by the function G l and by the auto covariance function rO


of the underlying Gaussian process. In a series of important papers, Taqqu
(1975, 1979) and Dobrushin and Major (1979) investigated limit distribu-
tions of the normalized sample mean of the Xi'S under model (10.3). Let
fJ = EX I be the level-1 parameter of interest and let Xn = n- l L~=l Xi
denote the sample mean based on Xl"'" X n . Even under long-range de-
pendence, Xn is a consistent estimator ofthe level-1 parameter fJ. However,
the rate of convergence of Xn to fJ may no longer be Op(n- l / 2 ) and the
asymptotic distribution of (Xn - fJ), when it exists, may be nonnormal.
The limit behavior of (Xn - fJ) heavily depends on the Hermite Rank of
the function
(10.4)
Recall that for k E Z+,
dk
Hk(X) = (_l)k exp(x2 /2) dxk [ exp( _X2 /2)], x E lR

denotes the kth order Hermite polynomial. Then, the Hermite rank q of
GO is defined as
q = inf {k EN: E( HdZl)G(Zr)) # o} . (10.5)

Let A = 2f(a) cos(mr/2) and cq = E(Hq(Zr)G(Zl))' Also, for n E N, let


d n = [n 2- qa Lq(n)p/2. The following result gives the asymptotic distribu-
tion of the sample mean X n .
Theorem 10.1 (Taqqu (1975, 1979), Dobrushin and Major (1979)). As-
sume that G has Hermite rank q, and that r(·) admits the representation
at (10.1) with 0 < a < q-l. Then, n(Xn - fJ)/d n ----+d Wq in distribution,
where Wq is defined in terms of a multiple Wiener-ItO integral with respect
to the random spectral measure W of the Gaussian white-noise process as

~
Aq/2
Jexp{L(xl + ... + x q )}
L(XI + ... + Xq)
- 1 IT IXkl(a-l)/2
k=l
dW(Xl)'" dW(xq) . (10.6)
When q = 1, Wq has a normal distribution with mean zero and variance
2c~j{(1- a)(2 - a)}, but for q ~ 2, the distribution of Wq is nonnormal
(Taqqu (1975)). For details of the representation of Wq in (10.6), and the
concept of a multiple Wiener-Ito integral with respect to the random spec-
tral measure of a stationary process, see Dobrushin and Major (1979) and
Dobrushin (1979), respectively. The complicated form of the limit distri-
bution in (10.6) makes it difficult to use the traditional approach where
large sample inference about the level-1 parameter fJ is based on the limit
distribution. In the next section, we consider the MBB method of Kiinsch
(1989) and Liu and Singh (1992) and investigate its consistency properties
for approximating the distribution of the normalized sample mean.
244 10. Long-Range Dependence

10.3 Properties of the MBB Method


10.3.1 Main Results
Let X; , ... , X~ denote the MBB sample based on b == n / Cresampled blocks
of size C (cf. Section 2.5), where, for simplicity of exposition, we suppose
that C divides n. Define

where X~ = n- 1 L~=l Xt denotes the bootstrap sample mean and fln =


E*X~. Then, T~ gives the bootstrap version of the normalized sample mean
Tn = n(Xn - f.1)/d n . Although under the conditions of Theorem 10.1, Tn
converges in distribution to a nondegenerate distribution for all values of
q :::: I, it turns out that the conditional distribution of T~ has a degen-
erate limit distribution. The following result characterizes the asymptotic
behavior of P*(T~ :::; x) for x E R
Theorem 10.2 Assume that the conditions of Theorem 10.1 hold for some
q:::: 1
and that n'C- 1 +Cn f - 1 = 0(1) as n - 7 00, for some E E (0,1). Then,

sup Ip*(T~ :::; x) - if> (d n (bdD- 1 / 2 X/U q ) I = op(l) as n -7 00, (10.7)


xEIR

where b = n / C and where

(10.8)

Proof: See Section 10.3.2 below. o

Note that by the definition of dn and by the "slowly-varying" property


of the function L(·) (cf. relation (9.9), Chapter 8, Feller (1971b)),

bd~ = o(d~) . (10.9)

Hence, from (10.7), it follows that

P* (T~ :::; x) -7 0, T 1 or 1 in probability

according as x < 0, x = 0, or x > O. Thus, the conditional distribution


of T~ converges weakly to 60, the probability measure degenerate at zero,
in probability. This shows that the MBB procedure fails to provide a valid
approximation to the distribution of the normalized sample mean under
long-range dependence.
In contrast to certain other applications, where the naive bootstrap ap-
proximations suffer from inadequate or wrong centerings (e.g., bootstrap-
ping M-estimators, see Section 4.3), here the failure of the MBB is primarily
due to wrong scaling. The natural choice of the scaling factor d:;;l used in
10.3 Properties of the MBB Method 245

the definition of the bootstrap variable T~ tends to zero rather fast and
thus forces T~ to converge to a degenerate limit. Intuitively, this may be
explained by noting that by averaging independent bootstrap blocks to de-
fine the bootstrap sample mean, we destroy the strong dependence of the
underlying observations Xl, ... , Xn in the bootstrap samples. As a result,
the variance of the bootstrap sample sum nX~ has a substantially slower
growth rate (viz., bd~) compared to the growth rate d~ for Var(nXn). When
the unbootstrapped mean Xn is asymptotically normal, one can suitably
redefine the scaling constant in the bootstrap case to recover the limit law.
However, for nonnormal limit distributions of X n , the MBB fails rather
drastically; the bootstrap sample mean is asymptotically normal irrespec-
tive of the nonnormallimit law of normalized Xn . For a rigorous statement
of the result, define the modified MBB version of Tn as

Then, we have the following result on T~.

Theorem 10.3 Assume that the conditions of Theorem 10.2 hold. Let O'~
be as in (10.8). Then,

(i) sup \P*(T~ ~ x) - ifJ(X/O'q) \ = op(l) ;


xEIR

(ii) sup \P*(T~ ~ x) - P(Tn ~ x)\ = op(l) as n -* 00


xEIR

if and only if q = 1.

Proof: See Section 10.3.2 below. o

Thus, Theorem 10.3 shows that with the modified scaling constants, the
MBB provides a valid approximation to the distribution of the normalized
sample mean Tn only in the case where Tn is asymptotically normal. The
independent resampling of blocks under the MBB scheme fails to reproduce
the dependence structure of the Xi'S for transformations G with Hermite
rank q 2 2. As a consequence, the modified MBB version T~ of Tn fails
to emulate the large sample behavior of Tn in the nonnormallimit case.
A similar behavior is expected if, in place of the MBB, other variants of
the block bootstrap method based on independent resampling (e.g., the
NBB or the eBB, are employed. Theorems 10.2 and 10.3 are due to Lahiri
(1993b). Lahiri (1993b) also shows that using a resample size other than the
sample size n also does not fix the inconsistency problem in the nonnormal
limit case, as long as the number of resampled blocks tend to infinity. As
a result, the "m out of n" bootstrap is not effective in this problem if the
number of resampled blocks is allowed to go to infinity with n. However, if
repeated independent resampling in the MBB method is dropped and only
246 10. Long-Range Dependence

a single block is resampled, i.e., if in place of the MBB, the subsampling


method is used, then consistent approximations to the distribution of Tn
can be generated (cf. Section 10.4). For some numerical results on the
MaBB method of Carlstein et al. (1998), see Hesterberg (1997). We now
give a proof of Theorems 10.2 and 10.3 in Section 10.3.2 below.

10.3.2 Proofs
Define G(y) = G(y) - cqHq(y), y E lEt For i,j E Z, let Oij = 1 or 0
according as i = j or i i= j. Also, recall that x V y = max{ x, y}, x, y E JR,
N = n - £ + 1, and Ut = £-1 2.:;:(i-l)£+1 X;, 1:::; i:::; b.
Lemma 10.1 Suppose that r(-), L(·), a, and q satisfy the requirements
of Theorem 10.1. Assume that £ = O(n 1 - E ) for some 0 < E < 1, and that
£-1 = 0(1). Then,

Proof: Without loss of generality, assume that JJ = o. Then, by (2.14),

R(M. - M)/d, ~ (Nd,)-' [niX. + ~(j - f)(X} + X.- jH )].

Note that EHk(Zi)Hm(Zj) = [r(i - j)Jkok,m for all i,j E Z, k, m E No


Hence, by Corollary 3.1 of Taqqu (1977), it follows that for any real numbers
aI, a2,···, a£ E [0, £J,

(N d£)-2 E (t, ajxj ) 2

< (Nd£)-2 HI t, t, aiaj[r(i - jWI + £2 t, t, IEG(Zi)G(Zj)l]

[t; f; Ir(i - j)l q] (1 + 0(1))


f. p

< C(cq)(nde)-Z£Z

O(£2+ E n- Z) = 0(1) .
Therefore, by stationarity,
e-l
(Ndp)-1 2:)j - £)(Xj + Xn-j+d = op(l) .
j=1
Similarly, £ = O(n 1 - E) implies that
(Ndp)-ZE(n£Xn)2 = O(n-2£2d"izd;,) = 0(1) .
Hence, Lemma 10.1 follows. o
10.3 Properties of the MBB Method 247

Lemma 10.2 Let a~ = {2 E*(Ui - Pn)2 /d~ and let u~ be as defined in


(10.S). Assume that the conditions of Theorem 10.2 hold. Then,

a~=u~+op(l).
Proof: Define

and

Then, by Cauchy-Schwarz inequality,

la~ - ainl : ; a~n + 21 a 2n a lnl . (10.10)


By Corollary 3.1 of Taqqu (1977),

Ea~n =di2 E(t,C(Zi)r =0(1). (10.11)

Hence, to prove the lemma, by (10.10) and (10.11), it is now enough to


show that
(10.12)
Note that by Lemma 3.2 of Taqqu (1977) and the stationarity of the Z/s,

Var(ain)

< Cc!(Ndj}-' };(N - j)ICOV([t,H,(Z;}]',


[t,Hq(Zi+j-l)f) I

l(q!)4 T 2q (2 q!)-1 L
1
IT
k=l
r(mk - jk) - (q!)2r(i 1 - i2)qr(i3 - i 4 )ql '
(10.13)
where L:l extends over all ml, jl; ... ; m2q, j2q E {iI, ... , i4} such that
(a) mk -I- jk
for all k = 1, ... ,2q, and
(b) there are exactly q indices among {mk' j k : 1 ::; k ::; 2q} that are
equal to it for each t = 1,2,3,4 . (10.14)
248 10. Long-Range Dependence

Next write 2::1 = 2::11 + 2::12' where 2::11 extends over all indices {mk,jk:
1::::: k ::::: 2q} under 2::1 for which Imk - jkl = IiI -i21 for exactly q pairs and
Imk - jkl = li3 - i41 for the remaining q pairs, and where 2::12 extends over
the rest of the indices under 2::1' Clearly, for any {mk' jk : 1 ::::: k ::::: 2q}
appearing under 2::11' I1~~1 r(mk - jk) = r(il - i2)qr(i3 - i 4)q. We claim
that the number of such indices is precisely (2q!)22q(q!)-2. Hence, assuming
the claim, one gets

l(q!)4 T 2q(2q!)-1 L
1
IT
k=1
r(mk - jk) - (q!)2r(il - i2)qr(i3 - i4)ql

2q
::::: C(q) LIT Ir(mk - jk)1 . (10.15)
12 k=1
To prove the claim, note that, for any {ml' jl; ... ; m2q, j2q} under 2:: 1, if
Imk - jkl = IiI - i21 for some k 1, ... , kq E {I, ... , 2q}, then, by (10.14),
(a) Imk - jkl = li3 - i41 for all k E {I, ... , 2q}\{k1, ... , kq}, and
(b) exactly q of {mkll jk 1 ; • • • ; mk q , jk q } are ik, k = 1,2 and exactly q of
the remaining 2q integers are ik, k = 3,4.

Using this, one can check that the set of all indices {ml' jl; ... ; m2q, j2q}
under 2::11 can be obtained by first selecting a subset {k 1 , ... , k q }
of size q from {1, ... ,2q}, and then setting (mk,jk) = (i 1,i 2) or
(i2' h) for k E {k1, ... , kq} and (mk,jk) = (i3, i 4) or (i4' i3) for k E

e
{I, ... , 2q} \ {kl' ... , kq}. Hence, it follows that the number of terms un-
der 2::11 is qq ) . 2q • 2Q , proving the claim.
Next define No = ,Cnol, where <5 is any real number satisfying 0 < <5 <
e(5 - 2qa)-1 and where e is as in the statement of Theorem 10.2. Let
r(j) = Ijl-"'(1 + IL(ljl)l), j E Z, and Mn = max{l + IL(j)1 : 1 ::::: j ::::: n}.
Then, it is easy to check that for n large,

uniformly in 1 ::::: j ::::: c.


Note that given any i 1,i 2,i 3,i4, for every multi-index {mk,jk: 1::::: k:::::
2q} under 2::12' Imk-jkl ::::: min{lil-i31, IiI -i41, li2-i31, li2-i41} ::::: (j-C)
for at least one k E {I, ... , 2q}. Hence, by (10.13), (10.15), and (10.16), it
follows that

N-l £ £ j+£-1 j+£-1 2q


< C(cq ,q)(Ndi)-1 L L L L L LIT Ir(mk - jk)1
10.3 Properties of the MBB Method 249

C(C q , a, q, E) (n 8a NdiJ- 1 Mn
N-1 £ l £ £
X L L LLL f(i1 - i 2)qf(i3 - i4)q

h
were I 1n = ""No
L..j=O ""l L..i2=1 ""j+£-l
L..il=l ""l L..is=j ""j+£-l
L..i4=j ""
2q
L..12 k=l 1r (mk - Jk n
. )1 . It
is now easy to see that for any i1,i 2 ,i3,i4, by (10.14)(b),

2q *
L II Ir(mk - jk)1 < C(q) L f(i1 - i2) q1 f(i1 - i3) q2 f(it - i4)q3 x
12 k=l

where I:* extends over all nonnegative integers q1, . .. , q6 satisfying q1 +


... +q6 = 2q and
q1 + q2 +q3 = q
q1 + q4 + q5 = q
q2 + q4 + q6 = q
q3 + q5 + q6 = q.
The set of four equations are determined by the frequencies of the indices
i 1, i2, i3, and i4 on the right side of the inequality above. Next, write a =
(1 - qa)jq and d = (2 - qa)jq. Now, using Holder's inequality and the
conditions on q1, ... , q6, one can show that for any 0 ~ j ~ No,

j+£-lj+£-l '- l
L L L L f(i1 - i2) q1 f(i1 - i3) q2 f(it - i4)q3 x

j+£-lj+£-l '-
< L L L f(i2 - i 3)q4 f(i 2 - i 4)q5 f(i3 - i4)q6 X

j+£-lj+£-l
< C(a,q)M~ L L f(i3-i4)q6((t'-i3)Vi3)aq2((t'-i4)Vi4)aq3
250 10. Long-Range Dependence

L
j+£-l
< 0(0:, q)M~q(j + €)d(ql+q3+q5) {((€ - i3) V i3)a(q2+q4)
i3=j

since ql + ... + q6 = 2q and q2 + q4 + q6 = q implies ql + q3 + q5 = q. Hence,

implying that
Var(ain) = 0(1) .
Since Earn --) (J~, (10.12) follows. This completes the proof of the lemma D

Lemma 10.3 Let WI, W 2, ... , Wn be n iid random variables with EW1 =
0, and EWr = 1. Then, for any rJ > 0, and every n E N satisfying n- 1 +
on(l) < 1,
sup Ip(W l + ... + Wn ::s; y'nx) - <I> (x) I
xElR
::s; (1 + ,8n)On(l) + 22[rJ + on(rJ)l,8~ ,
where On (a) == EWrn(IW11 > ay'n), a> 0, and,8n == 11-n-l-on(I)I-1/2.
Proof: Define Wi = Win(IWil ::s; Vn), and Vi = Wi - EWi , 1 ::s; i ::s; n.
Then, it is easy to check that

and for any rJ> 0,


EIWl l3 ::s; (rJ + On(rJ))y'n .
By the Berry-Esseen Theorem (cf. Theorem A.6, Appendix A),

sup Ip(Wl + ... + Wn ::s; y'nx) - <I> (x) I


xElR

< sup
xElR
Ip(V + ... + Vn ::s; y'nx(EV12)1/2) - <I> (x) I
l

+ sup I<I> (x) - <I>(x - y'nEWl (EVn-1/2)1


xElR
+nP(IW11 > y'n)
< (2.75)EIV11 3(EVl)-3/2(y'n)-1 + (1 + (27rEV12)-1/2)On(l) .
This proves Lemma 10.3. D
10.4 Properties of the Subsampling Method 251

Proof of Theorem 10.2: Let G-~ == g2 E. (Ui -/In)2 / d;. Then, by Lemma
10.3 with TJ = b- 1 / \

~~~ Ip. (t(£ut - £j1n) :::; VbdcG-nX) - <p(X) \


< C[8n (1 + 11- b- 8nl- 2) + (b- + bn)11- b-
1 - 1/ 1/ 4 1 - Jnl-3/2] ,
(10.17)

where 8n == (d CG-n )-2 E.(£Ui - £j1n)2n(I£Ui - £j1nl > 2b 1 / 4 dcG- n ). We shall


show that 8n -+ 0 in probability. Without loss of generality, let fL = O.
Then, by Lemmas 10.1 and 10.2, it follows that

d""i 1 £j1n = op(1) and G-~ = CT~ + op(1) .


Hence, with S{ == £Ui, we get

Now, using arguments similar to those used in the proof of Lemma 10.2, one
can show that E(2::;=l Hq(Zi))4 = O(di). Hence, with Sf = 2::;=1 CqH(Zi),

E [d""i 2 E*s;2n(ls; 1 > b1 / 4 dcCTq)1


d""i 2 E[£Xe]21l(I£Xcl > b1 / 4 dejCTq)

< 4d""i2 [Es;n(ISel > b1/ 4 dej2CTq) + E( t, rJ


G(Zi)

0(1) .

Consequently, bn = op(1), and the theorem follows from (10.17) and


Lemmas 10.1 and 10.2. 0

Proof of Theorem 10.3: (i) follows from (10.7). As for (ii), the "if"
part follows from (i) and Theorem 10.1. To prove the converse, suppose
q #- 1. Then, by Theorem 10.1, n(Xn - fL)/d n converges in distribution to
a nonnormallimit while by (i), t~ is asymptotically normal, implying the
"only if" part. 0

10.4 Properties of the Subsampling Method


In Section 10.3, we have shown that the MBB fails to approximate the dis-
tribution of the normalized sample mean, when the latter has a nonnormal
limit distribution. In this section we show that even in such situations, valid
approximations can be generated if, instead of the MBB, we employ the
252 10. Long-Range Dependence

subsampling method. In Section 10.4.1 we present the results on subsam-


pIing approximations to the distribution of the normalized sample mean.
In Section 10.4.2, we describe a method of studentizing the sample mean
under long-range dependence, following the work of Hall, Jing and Lahiri
(1998). The main results of Section 10.4.2 assert the validity of the subsam-
pling method for the studentized sample mean. An outline of the proofs
of the results of Sections 10.4.1 and 10.4.2 are given in Section 10.4.3. As
mentioned earlier, numerical examples on the finite sample performance
of the subsampling method are given in Section 10.5. All through Section
10.4, we work under the basic framework of Section 10.2.

10.4.1 Results on the Normalized Sample Mean


Let Tn = n(Xn - /L)/d n denote the normalized sample mean based on n
observations from the stationary process {XihEZ, where d~ = n 2 - qa. Lq(n)
is as in Section 10.2. Let Hi = (Xi"'" X i H-1), 1 :::; i :::; N denote the
collection of overlapping blocks of length £ for some given integer £ = £n E
(1, n) and let BCi = L~~~-l Xj denote the sum of the elements in the block
Hi, 1 :::; i :::; N, where N = n - £ + 1. Then, a "subs ample copy" of Tn over
Hi is given by
(10.18)
The subsampling estimator of Qn(x) P(Tn :::; x), x E lR, based on
subsamples of length £, is given by
N

On(X) == On(X;£) = N- 1 L l1(Tci :::; x), x E lR . (10.19)


i=l

Next we state the conditions needed for establishing consistency of On.


For clarity of exposition, here we state the key regularity condition on the
spectral density gC) of the Gaussian process {ZihEZ, rather than on its
autocovariance function r(·). By using Abelian-Tanberian Theorems (cf.
Bingham, Goldie and Teugels (1987)), one can show that if (10.1) holds,
then
g(X)
xa.-1 L(l/x) --+ C(a) as x --+ 0 (10.20)

for some constant C(a) E (0, (0). Conversely, if (10.20) holds for a slowly
varying function L(·), then rC) admits the representation (10.1) with (with-
out loss of generality) the same function L(·). Thus, the requirement (10.20)
on the spectral density functiong(·) and the condition (10.1) on the auto co-
variance function r(·) are equivalent. Because g(.) is a symmetric function,
the Fourier series of log g(.) is a pure cosine series. Replacing each cosine
function by the corresponding sine function, we obtain the harmonic conju-
gate of logg(·). Let 10ggC) denote the ha~onic conjugate of logg(·). The
key regularity condition on g(.) is that logg(·) is continuous on [-7r,7r].
10.4 Properties of the Subsampling Method 253

While log 9 ( .), being unbounded in every neighbor hood of the origin, is
not continuous on [-n, n J, an appropriately chosen branch of log 9 can be
continuous on [-n, nJ. The following result on On is due to Hall, Jing and
Lahiri (1998).
Theorem 10.4 Suppose that {XihEZ is generated by relation (10.3) and
that the function G has Hermite rank q E N. Also, suppose that
(i) g(x) = Ixl a - 1 L 1 (lxl), ° °
< Ixl :::; n where < a < q-1 and where
L1 (-) is slowly varying at 0 and of bounded variation on every closed
subinterval of [0, nJ;

(ii) a branch oflogg(·) is continuous on [-n,nJ; and


(iii) n€£-l + n-(l-€)£ = 0(1) for some E E (0,1).

°
Then,
sup IOn(X) - Qn(X)1 ----+p as n --+ 00 . (10.21)
xElR.

Proof: See Section 10.4.3 below. o


Thus, it follows from Theorem 10.4 that under appropriate regularity
conditions, the subsampling estimator of the distribution of the normal-
ized sample mean Tn is consistent for both normal and nonnormal limits
of Tn. In particular, by avoiding repeated independent resampling of the
blocks, the subsampling method overcomes the inconsistency problem as-
sociated with the MBB method in the nonnormal limit case. In the next
section, we show that the subsampling method continues to provide valid
approximations, when the scaling constants dn's are replaced by certain
data-based scaling factors.

10.4.2 Results on the Studentized Sample Mean


Note that the scaling constant dn = (n 2- qa Lq(n))l/2 that yields a proper
limit distribution for the centered sample mean (Xn - J.L) under long-range
dependence depends on the unknown population quantities q, a, and L(·).
As a consequence, one needs to estimate these quantities consistently to be
able to construct large sample confidence intervals for the level-1 parameter
J.L. Here, we describe an empirical device for producing suitable scaling
factors for (Xn - J.L) for all possible values of the Hermite rank q. Let
ml == mln and m2 == m2n E [1, nJ be integers such that for some E E (0,1),

for k = 1,2. For example, we may take ml = In(HO)/2J, m2 = lnoJ for


some () E (0,1). Next define e;' = (n - m + 1)-l2:::-lm+l (Sim - mXn)2
254 10. Long-Range Dependence

and let
(10.23)
where Sim == (Xi + ... + Xi+m-l)/m, m ~ 1, i ~ 1. Under the condition
of Theorem 10.5 below, e;'/[Var(nXn )] ---+ 1 in probability as n ---+ 00. We
use n-len for scaling the sample mean Xn and define the "studentized"
sample mean
(10.24)
Let Qln(X) = P(Tln ~ x), x E lR denote the distribution function of T ln . To
define the subsampling estimator of Qln(-) based on blocks of length f, let
eu denote the subsample version of en, obtained by replacing {Xl,"" Xn}
and n in the definition of en by {Xi,"" XiH-l} and e, respectively. In
particular, this requires replacing mk == mkn by mkf, k = 1,2. Let TU,i =
(Su - eXn)/eu, 1 SiS N denote the subsample copies of T ln . Then, the
subsampling estimator of Qln is given by
N
Qln(X) == Qln(Xjf) = N- l L n(TU,i ~ x), x E lR . (10.25)
i=l
Theorem 10.5 Assume that the conditions of Theorem 10.1 hold for some
q E N, that ml and m2 satisfy (10.22), that n€f-l + n-(l-€)f = 0(1) for
some E E (0,1) and that

(10.26)

Then,

(a) e;'/Var(nXn ) --tp 1 as n ---+ 00;


(b) T ln --td Wq/uq as n ---+ 00, where u~ is as defined in (10.8);

(c) sup iQln(X) - Qln(X)i --tp 0 as n ---+ 00 .


xEIR

Proof: See Section 10.4.3 below. 0

Theorem 10.5 shows that the empirical device yields a consistent esti-
mator of the variance of the sample sum nXn = L~l Xi for all values of
the Hermite rank q ~ 1. Furthermore, the subsampling method provides
a valid approximation to the distribution of the studentized sample mean
T ln for both normal and nonnormallimits of T ln . Consequently, we may
use Theorem 10.5 to set approximate confidence intervals for the level-1
parameter f..t that attain the nominal coverage levels asymptotically, for all
q ~ 1. An advantage of this approach over the traditional large sample
theory is that the subsampling confidence intervals may be constructed
without making explicit adjustments for the Hermite rank q and without
estimating the covariance parameter Q. For 'Y E (0,1), let q"( denote the
10.4 Properties of the Subsampling Method 255

LN,),J-th order statistic of the subsample copies TU,i, 1:S i:S N. Then, an
approximate (1 - ')')100% two-sided equal-tailed confidence interval for jJ,
is given by
In(')') = Xn - q1-! . --;:' Xn - q! . --;: .
A ( - A en - A en ) (10.27)

Then, under the conditions of Theorem 10.5,

p(jJ, E in (')')) ----> 1 - ')' as n ----> 00 .

Although the subsampling confidence interval inb) attains the desired


coverage level (1 - ')') in the limit, its finite sample accuracy depends on
various factors, notably on the block size £ and ori the integers ml, m2. If
we take m1 and m2 to be of the form m1 = Ln(1+IJ)/2J and m2 = LnoJ for
some () E (0,1), a value of () close to 1 is suggested by Hall, Jing and Lahiri
(1998) to ensure better bias properties of the estimator e;'k' k = 1,2. We
consider numerical properties of the subsampling method in Section 10.5,
following the proofs of Theorems 10.4 and 10.5 given below.

10.4·3 Proofs
Proof of Theorem 10.4: By Theorem 5.2.24 of Zygmund (1968), r(k) '"
k-OI.L 1 (1/k) as k ----> 00, and hence, by Theorem 10.1,

(10.28)

where, for n E N, the normalizing constant dn is now defined by d~ =


n 2 - qOl. L1(1/n)q. Consequently, by the slow variation of L 1(-),

(10.29)

In view of (10.28) and (10.29), this implies that .e(Xn - jJ,)/dt = op(l).
Hence, it is enough to show that

sup IQn(x) - Qn(x)1 = op(l) , (10.30)


xEIR

where Qn(X) = N- 1 L:1:-:;i:-:;N n{(Si£ - £jJ,)/d£:S x}. Since the distribution


of Wq is continuous, (10.30) holds provided Qn(X) -Qn(X) = op(l) for each
x E lR.
Note that

E{Qn(X) - Qn(x)}2 :S (2£ + 1)N- 1


N-1
+~ L
Ip{Su/d£:S x, S(i+1)dd£ :S x} - {Q(x)}21 '
i=£+1
(10.31)
256 10. Long-Range Dependence

where, for simplicity of notation, we have set f.l = 0 in the last line. Now
by Theorem 5.5.7 of Ibragimov and Rozanov (1978), the second term on
the right side of (10.31) tends to zero. Hence, Theorem lOA is proved. D

Lemma 10.4 Suppose that the function G (.) has Hermite rank q EN, that
0<0: < q-1 and that n cg- 1 + n-(l-c)g = 0(1) for some E E (0,1). Then,
for any <5 E (0,1), a, bEN,

as n ----+ 00

where Sim = 2:;~:n-1 cqHq(Zj).

Proof of Lemma 10.4: The proof of Lemma lOA is somewhat long and
hence, is omitted. We refer the interested reader to the proof of relation
(4.8), page 1201-1202 of Hall, Jing and Lahiri (1998) for details. D

Proof of Theorem 10.5: Let en = dnuq, M = n - m + 1, and e;' =


M- 1 2:~1 (Sim - mf.l)2. In view of (10.29), if m denotes either m1 or m2,
then

Ele;. - e;.1
L m[E(Xn -
AI 1/2
< 4M- 1 f.l)2{ E(Sim - mf.l)2 + m 2 E(Xn - f.l)2}]
i=l

4m(n-1en)[e;;, + m 2n- 2e;,] 1/2


o(d;,) . (10.32)

Next write e;,


= M- 1 2:~1 S'fm. By Corollary 3.1 of Taqqu (1975) and
Cauchy-Schwarz inequality, for m = m1, m2

Ie;;, - e;;, I
f/'
E

< E[ {M 't(S,m - mM + S,m)'

x {M- 1 t(Sim _ mf.l- Sim)2 } 1/2]

< {2E(Slm - mf.l)2 + 2ESrm} 1/2 {E(Slm _ mf.l- Slm)2} 1/2


o(d;;') .

By Lemma lOA, for m = m1 or m2,


10.5 Numerical Results 257

(n- j~l j~llcov{


M M

o 2 (Sjl m)2, (Shm)2} I)


o( d'!.) .

Because Ee;"k = e;"k {I + o(l)} for k = 1,2, it follows that en/en - t 1 in


probability. This proves part (a) of the theorem. Part (b) now follows by
applying part (a) and Theorem 10.4. This completes the proof of Theorem
10.5. D

10.5 Numerical Results


In this section, we consider finite sample performance of the subsampling
method for the renormalized sample mean

of (10.24). For this, as in Hall, Jing and Lahiri (1998), we generated station-
ary increments of a self-similar process with self-similarity parameter (or
Hurst constant) H = !(2 - a), and took a suitable linear transformation
of these data to produce a realization of a long-range dependent process
with Hermite rank q. The details of the relevant steps are as follows:

Step 1: Generate a random sample ZnO = (ZlO,"" ZnO)' of size n from


the standard normal distribution.

Step 2: Let R = ((rij)) denote the correlation matrix defined by

for k = Ii - il and! < H < 1. Express R as R = U'U by Cholesky


factorization.

Step 3: Define Zn == (Zl,"" Zn)' = U'ZnO. Then Zn may be considered


as a finite segment of a stationary Gaussian process with zero mean,
unit variance, and auto covariance function

r(k) ~{(k + 1)2H + (k _1)2H - 2k2H}


Ck-a as k - t 00 , (10.33)

where a = 2 - 2H E (0,1). Note that the r(k) 's are the autoco-
variances of the stationary increments of a self-similar process with
self-similarity parameter H (cf. Beran (1994), p. 50).
258 10. Long-Range Dependence

Step 4: Define Xi = Hq(Zi), for i = 1, ... ,n, where Hq is the qth Her-
mite polynomial. Then Xn = {Xl"'" Xn} is a long-range dependent
series with Hermite rank q.

We now report finite sample coverage accuracy of the subsampling


method for confidence intervals for IL, produced using the renormalized
statistic TIn from Section 10.4.2. We consider the long-range dependent
time series {Xl, ... ,Xn } of Hermite rank q E {I, 2, 3}, as generated above,
for n = 100,400,900. Throughout, we take the nominal confidence level to
be 0.90. The empirical approximations to coverage probabilities reported
here were derived by averaging over K = 1000 independent simulations.
For the block size e, we used e = cn l / 2 for c = .5,1,2. Although the best
choice of e is unknown, this choice is based on the intuition (cf. Hall, Jing
and Lahiri (1998), page 1155) that the optimal size should be greater than
!
that for the weakly dependent case, where e '" cn{3 for {3 ::; is generally
appropriate (d. Hall, Horowitz and Jing (1995) and Hall and Jing (1996)).
In choosing ml and m2 for the renormalization procedure mentioned in
Section 10.4.2, we took

with e = 0.8.
Results from the simulation study are summarized in Table 10.1. In the
table, the headings "Lower" and "Upper" represent coverage probabilities
of the lower and the upper 90% one-sided confidence intervals, respectively,
while q denotes the Hermite rank. The formula for the 100(1 - a)%, 0 <
a < 1, lower and upper confidence limits are, respectively, given by

(10.34)

and
(10.35)
where t{3 is the {3-quantile of the subsampling estimator Qln, 0 < {3 < 1, and
SIn = I:~=l Xi· It appears that for a = 0.5,0.9, the choice e = 2y'n leads to
more accurate results while for a = 0.1, e = 0.5n l / 2 works better. For each
value of a = 0.1,0.5,0.9 and for each value of q = 1,2,3, coverage accuracy
with these "optimal" choices of the subsampling parameter e increases with
the sample size. Interestingly, the Hermit rank q E {I, 2, 3} seems to have
little effect on accuracy of the subsampling method. We also repeated the
e
whole simulation study with = 0.9 in the definitions of ml = Ln(l+O)/2J
and m2 = LnoJ, as considered by Hall, Jing and Lahiri (1998). The choice
e = 0.8 had slightly better performance than e = 0.9 for the combinations
of the factors q, a, nand e considered here.
10.5 Numerical Results 259

TABLE 10.1. Coverage probabilities of 90% lower and upper confidence limits,
given by (10.34) and (10.35) respectively, based on K = 1000 simulation runs.
Here n denotes the sample size, .e is the length of subsamples, q denotes the
Hermite rank of {XdiEZ and a is as in (10.33).

(a): £ = ~ . n 1 / 2
a =0.1 a = 0.5 a = 0.9
q Lower Upper Lower Upper Lower Upper
1 87.0 86.1 95.0 93.1 97.0 94.2
n = 100 2 91.9 94.2 99.2 95.9 99.6 93.9
3 96.3 93.3 98.0 97.0 97.5 95.7
1 89.0 88.8 94.9 95.4 95.7 95.8
n = 400 2 90.0 96.6 98.0 95.5 99.1 93.5
3 95.1 93.5 96.9 96.7 97.0 95.8
1 91.7 92.0 96.8 95.8 97.6 96.6
n = 900 2 91.5 96.5 99.2 97.5 98.9 95.1
3 97.6 97.2 97.5 97.5 98.1 97.1

(b): £ = n 1 / 2
a = 0.1 a= 0.5 a =0.9
q Lower Upper Lower Upper Lower Upper
1 82.0 81.2 91.1 89.7 93.1 90.9
n = 100 2 81.6 91.9 95.7 93.9 96.6 91.4
3 90.6 87.4 93.7 92.8 92.8 91.4
1 87.6 86.6 93.4 93.2 94.2 94.0
n = 400 2 84.1 95.4 95.7 94.8 97.1 92.3
3 92.1 90.0 94.6 94.9 95.3 94.7
1 87.5 88.2 93.9 93.3 94.2 93.3
n = 900 2 84.9 94.4 96.9 94.9 97.0 92.7
3 92.7 93.5 95.2 95.3 94.5 94.3

(c): £ = 2n 1 / 2
a =0.1 a= 0.5 a = 0.9
q Lower Upper Lower Upper Lower Upper
1 78.7 76.0 89.4 85.1 90.8 86.9
n = 100 2 76.2 88.4 90.0 91.0 92.2 88.7
3 86.2 81.5 90.6 88.3 89.5 87.8
1 82.9 81.4 90.5 89.2 92.0 91.7
n = 400 2 77.8 92.9 91.4 92.2 93.2 89.4
3 87.5 85.6 92.3 90.2 91.8 90.3
1 82.6 82.2 89.9 90.3 91.3 92.0
n = 900 2 79.3 91.7 93.5 92.2 93.8 91.1
3 88.7 89.0 93.0 91.7 91.8 90.6
11
Bootstrapping Heavy-Tailed Data and
Extremes

11.1 Introduction
In this chapter, we consider two topics, viz., bootstrapping heavy-tailed
time series data and bootstrapping the extremes (i.e., the maxima and the
minima) of stationary processes. We call a random variable heavy-tailed if
its variance is infinite. For iid random variables with such heavy tails, it
is well known (cf. Feller (1971b), Chapter 17) that under some regularity
conditions on the tails of the underlying distribution, the normalized sample
mean converges to a stable distribution. Similar results are also known for
the sample mean under weak dependence. In Section 11.2, we introduce
some relevant definitions and review some known results in this area. In
Sections 11.3 and 11.4, we present some results on the performance of the
MBB for heavy-tailed data under dependence. Like the iid case, here the
MBB works if the res ample size is of a smaller order than the original
sample size. Consistency properties of the MBB are presented in Section
11.3, while its invalidity for a resample size equal to the sample size is
considered in Section 11.4.
In Sections 11. 5-11. 7, we consider the extremes of stationary processes.
This is another classic example where the "fewer than n" resampling works
better. In Section 11.5, we review some relevant definitions and results on
extremes of dependent data. Results on bootstrapping the extremes are
presented in Sections 11.6 and 11.7 respectively for the cases where the
normalizing constants are known and where they are estimated.
262 11. Bootstrapping Heavy-Tailed Data and Extremes

11.2 Heavy-Tailed Distributions


Let {Xn}n~l be a sequence of stationary random variables defined on a
probability space (0, F, P). Let F be the marginal distribution function of
Xl. Also, let Sn = Xl + ... + X n , n 2: 1 denote the partial sums of the
process {Xn }n2:l. Under weak dependence, it is known (cf. Ibragimovand
Linnik (1971)) that the possible limit distributions of the normalized sum
process {a;;:l(Sn -bn)}n~l for constants an> 0, bn E lR are infinitely divis-
ible. In particular, for stationary heavy-tailed random variables heaving an
infinite second moment, the normalized sum may converge to a nonnormal
limit. For independent random variables, necessary and sufficient conditions
for convergence of the partial sum to a given infinitely divisible distribution
are known (cf. Feller (1971 b), Chapter 17). However, for dependent random
variables, in addition to these tail conditions on the marginal distribution
function F, some additional conditions on the dependence structure of the
process {Xn }n2:l are imposed to guarantee convergence to an infinitely di-
visible law. Here, we collect some relevant definitions and results on weak
convergence to infinitely divisible distributions and consider certain depen-
dence structures that are suitable for studying theoretical properties of
the MBB approximation in the heavy-tail case. For more details on the
properties of the sum of dependent heavy-tailed random variables, we re-
fer the interested readers to the papers by Davis (1983), Samur (1984),
Jakubowski and Kobus (1989), Denker and Jakubowski (1989), and the
references therein.

Definition 11.1 A random variable W (or its distribution function) is


said to be infinitely divisible if its characteristic function is given by

(11.1)

for some 0::; c::; 00 and some measure M on (lR,B(lR)), where i = A,


Tc(X) is the function Tc(X) == xn(lxl ::; c), x E lR, and where M is a
canonical measure on lR, i.e., M is a measure on (lR, B(lR)) , M(I) < 00
for any bounded interval Ie lR and M+(x) == .h( x,= ) y-2 M(dy) < 00 and
M-(x) == J(-=,-xj y-2 M(dy) < 00 for all x> 0.

At x = 0, the expression [e otx - 1 - dTc(x)]X-2 in (11.1) is replaced by


its limit (as x -+ 0), i.e., by -t 2 /2. Some common examples of infinitely
divisible distributions include the normal distribution with mean zero and

distribution with mean>. E (0, (0), (with c = °


variance a 2 E (0, (0) (with M(A) = a 2 nA(0), A E B(lR)), the Poisson
and M(A) = >.nA(I),
A E B(lR)), and the nonnormal stable laws of order a E (0,2), where for a
given a E (0,2), the canonical measure M = Ma associated with the stable
11.2 Heavy-Tailed Distributions 263

law of order a is given by

Ma(A) = Co [p r
J(O,oo)nA
x1-adx + q 1
(-oo,O)nA
IxI1-adx] , A E 8(JR) ,

°
for some constants Co E (0, (0), p?: 0, q ?: with p + q = 1.
Next, let {Xn}n>l be a sequence of iid random variables with common
(11.2)

distribution function F, where F is the common marginal distribution


function of the given stationary sequence {Xn}n>l. Then, the sequence
{Xn}n>l will be referred to as the associated iid sequence to the given
sequence {Xn}n>l. We shall establish validity of the bootstrap approxi-
mation for sums of Xn's in a general set up where the sequence of partial
sums, suitably centered and scaled, may have different limits along different
subsequences, as made precise in the following definition.
Definition 11.2 Let {Xn }n2:1 be a sequence of stationary random vari-
ables with one-dimensional marginal distribution F and let W be an in-
finitely divisible random variable with distribution Fo. Then, we say that F
belongs to the domain of partial attraction of Fo if there exists a subsequence
{ndi2:1 and constants ani> 0, bn, E JR, i?: 1 such that

(11.3)

where {Xn}n>l is the associated iid sequence to the given sequence


{Xn}n>l.
For the associated iid sequence {Xn }n2:1, convergence of the normalized
sum a;;}(Xl + ... + X ni - bnJ to the infinitely divisible distribution of
(11.1) holds solely under some regularity conditions on the marginal distri-
bution function F. Let C(M) denote the set of all continuity points of (the
distribution function of a) measure M, i.e., C(M) = {x E JR : M ({x}) = o}.
Let W be a random variable with the characteristic function ~(.) of (11.1)
and let c and M of (11.1) be such that c E C(M). Then, a set of necessary
and sufficient conditions for (11.3) is that (cf. Feller (1971b), Chapter 17),
as i --+ 00,
ni(l - F(anix)) ---+ M+(x), for all x E (0,00) n C(M) , (11.4)
niF(anix) ---+ M-(x), for all x E (-00,0) nC(M) , (11.5)

niVar(Tc(a~/Xd) ---+ M([-c,c]) , (11.6)

a~ilbni - niETc(XI/anJ ---+ °. (11. 7)


When (11.7) holds, we may replace bni with niETc(XI/anJ and get the
convergence of L7~1 [Xj - ETC (XI/anJJ to W. In general, it is not possible
to further replace ETc(XI/anJ with EXI/a ni . However, if
lim limsupnia;:;}EIX1In(IX11 > ),anJ = 0, (11.8)
..\-+00 i--+CXJ
264 11. Bootstrapping Heavy-Tailed Data and Extremes

and (11.4)-(11.7) hold, then it can be shown that


1 -
a:;;, (Xl + ... + X- n , - nilL) ----+
d
Wo , (11.9)

where J.L = EX I and Wo is a random variable having the characteristic


function (11.1) with c = +00. Note that under (11.8), it is now meaningful
to consider statistical inference regarding the population mean J.L on the
basis of the variables i\, ... , Xn .
Next we turn our attention to conditions that ensure a weak conver-
gence result similar to (11.9) for the given dependent sequence {Xn}n>l.
In addition to the above assumptions on the tails of F, an additional set of
weak-dependence conditions is typically assumed to prove such a weak con-
vergence result. As the required set of weak-dependence conditions depend
on the form of the canonical measure M of the limiting infinitely divis-
ible distribution, for simplicity, we shall restrict attention to the "purely
nonnormal" case where M( {O}) = O. (See the references cited above for
conditions when M({O}) #- 0.) With this restriction on M, we shall as-
sume the following regularity conditions on the dependence structure of
the process {Xn}n;:,l:

P(XI > x, X n + l > x) <


\)1* == lim sup sup 00 (11.10)
x-->oo n>l [P(XI > x)J2
and
00

(11.11)

where p(.) denotes the p-mixing coefficient of the process {Xn }n2:1. Recall
that for n 2: 1, we define

p(n) = {IEJgl/(EJ 2Eg2)1/2 : J E .c~(J:.~+1), 9 E .c~(Fk+n+l)' k 2: I} ,


(11.12)
where F! = the (I-field generated by {Xk : kEN, i S k < j}, 1 SiS
j S 00 and .c~(F!) = {J : 0 -+ lR I J PdP < 00, J JdP = 0 and J is
F! -measurable}.
Condition (11.11) is quite common for proving Central Limit Theorems for ρ-mixing random variables (cf. Peligrad (1982)). The quantity Ψ* in (11.10) is closely related to the well known ψ-mixing coefficient, defined by

    ψ(n) = sup { |P(A ∩ B) - P(A)P(B)| / P(A)P(B) : A ∈ F_1^{k+1}, B ∈ F_{k+n+1}^∞, k ≥ 1 } ,   n ≥ 1 .

Together, conditions (11.10) and (11.11) specify the dependence structure of the sequence {X_n}_{n≥1} that yields the following analog of (11.9) for the given sequence {X_n}_{n≥1}.
Theorem 11.1 Assume that (11.4)-(11.6), (11.8), (11.10), and (11.11) hold for some subsequence {n_i}_{i≥1}. Then

    T_{n_i} ≡ a_{n_i}^{-1}(S_{n_i} - n_i μ) →^d W_0 ,

where S_n = X_1 + ... + X_n, n ∈ ℕ, μ = E X_1, and W_0 has characteristic function ξ(t) given by (11.1) with c = +∞ and M({0}) = 0.

Proof: See Lemma 3.5, Lahiri (1995).  □

In the next section, we consider properties of the bootstrap approximation to the normalized sum T_n.

11.3 Consistency of the MBB

Let 𝒳_n = {X_1, ..., X_n} denote the sample at hand and let {X_1^*, ..., X_m^*} denote the MBB resample of size m ≡ m_n based on blocks of size ℓ. For simplicity, we suppose that m = kℓ for some integer k, so that the bootstrap approximation is generated using k "complete" blocks. Furthermore, we suppose that the block length variable ℓ satisfies the requirement that ℓ = o(n), but ℓ may or may not tend to infinity. Thus, for ℓ ≡ 1, this also covers the IID resampling scheme of Efron (1979), where single data-values, rather than blocks of them, are resampled at a time. Let S_{m,n}^* = X_1^* + ... + X_m^* and μ̂_n = E_*[S_{m,n}^*/m]. Then, the bootstrap version of T_n ≡ a_n^{-1}(S_n - nμ) is given by

    T_{m,n}^* ≡ a_m^{-1}(S_{m,n}^* - m μ̂_n) .
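As a concrete illustration, the following sketch (Python with numpy; the function names and the Monte Carlo loop are our own conventions, not part of the text) generates MBB resamples of size m = kℓ and the corresponding replicates of T_{m,n}^*, with μ̂_n computed as the average of all overlapping block means; the scaling a_m must be supplied by the user (for instance a_m = m^{1/α} in the stable case discussed below).

```python
import numpy as np

def mbb_resample(x, block_len, k, rng):
    """Draw k overlapping blocks of length block_len (with replacement)
    and concatenate them into an MBB resample of size m = k * block_len."""
    n = len(x)
    starts = rng.integers(0, n - block_len + 1, size=k)
    return np.concatenate([x[s:s + block_len] for s in starts])

def bootstrap_T(x, block_len, k, a_m, n_boot=999, rng=None):
    """Monte Carlo replicates of T*_{m,n} = a_m^{-1}(S*_{m,n} - m * mu_hat_n),
    where mu_hat_n = E_*[S*_{m,n}/m] is the average of all overlapping block means."""
    rng = rng or np.random.default_rng()
    n, m = len(x), k * block_len
    block_sums = np.convolve(x, np.ones(block_len), mode="valid")  # all n - l + 1 block sums
    mu_hat = block_sums.mean() / block_len
    reps = np.empty(n_boot)
    for b in range(n_boot):
        xs = mbb_resample(x, block_len, k, rng)
        reps[b] = (xs.sum() - m * mu_hat) / a_m
    return reps
```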

The main result of this section says that bootstrap approximations to the distribution of T_n are valid along every subsequence n_i for which T_{n_i} →^d W_0, provided k → ∞ and the resample size m = o(n) as n → ∞.

Theorem 11.2 Suppose that 1 ≤ ℓ ≪ n and that k → ∞ such that m ≡ kℓ = o(n) as n → ∞. Also, suppose that the conditions of Theorem 11.1 hold for some subsequence {n_i}_{i≥1}. If the subsequence {m_{n_i} : i ≥ 1} is contained in {n_i : i ≥ 1} and k_{n_i}^{-1/2}(m_{n_i} n_i^{-1})[a_{n_i}/a_{m_{n_i}}] → 0 as n → ∞, then

    ϱ(Γ̂_{m_{n_i}, n_i}, Γ_{n_i}) →_p 0   as i → ∞ ,

where Γ̂_{m,n}(x) = P_*(T_{m,n}^* ≤ x) and Γ_n(x) = P(T_n ≤ x), x ∈ ℝ, n ≥ 1, and ϱ is a metric that metrizes the topology of weak convergence of probability measures on (ℝ, ℬ(ℝ)).

Proof: See Theorem 2.1 of Lahiri (1995).  □

Thus, Theorem 11.2 asserts the validity of the MBB approximation along every subsequence for which the limit distribution of the normalized sum is a nonnormal infinitely divisible distribution (which may be different for different subsequences). Theorem 11.2 shows that the MBB adapts itself to the form of the true distribution of the normalized sum so well that it captures all subsequential limits of S_n and provides a valid approximation along every convergent subsequence, provided the resample size m grows slowly compared to the sample size n. See Lahiri (1995) for more details. For independent random variables, a similar result has been proved by Arcones and Giné (1989) for the IID bootstrap method of Efron (1979).
Next we comment on the block size ℓ that leads to a valid bootstrap approximation. For simplicity of exposition, here and in the rest of this section, we suppose that the subsequential limits of the normalized S_n are the same. Then, there exist an α ∈ (1, 2) and scaling constants a_n > 0 such that

    a_n^{-1}(S_n - nμ) →^d W_α ,          (11.13)

where W_α has a (nonnormal) stable law of order α, having characteristic function ξ(t) of (11.1) with canonical measure M_α of (11.2). In this case, the variance of X_1 is infinite and Theorem 11.2 shows that the MBB approximation works for the normalized sum T_n with a nonnormal limit, provided the resample size m is of smaller order than the sample size. It is interesting to note that the block length parameter ℓ here need not go to infinity in order to provide a valid approximation even though the random variables {X_n}_{n≥1} are dependent. Thus, Efron's (1979) IID resampling scheme, which uses no blocking and resamples a single observation at a time (i.e., ℓ ≡ 1 for all n), also provides a valid approximation to the distribution of the normalized sum of heavy-tailed random variables under dependence. Validity of the IID bootstrap here is in sharp contrast with the finite variance dependent case, where the example of Singh (1981) (cf. Section 2.3) shows that the IID bootstrap fails to capture the effect of dependence on the distribution of the sum converging to a normal limit. An intuitive justification for this fact may be given by noting that under the conditions of Theorem 11.1, the limit distribution of T_n in the heavy-tail case depends only on the marginal distribution F of the sequence {X_n}_{n≥1}. As a consequence, resampling of single data-values captures adequate information about the distribution of the X_n's to produce a valid approximation to the distribution of the normalized sum.
If the constants {a_n}_{n≥1} in (11.13) are known, then the MBB can be used to construct asymptotically valid confidence intervals for the parameter μ. Let q̂_m(γ) denote the γ-quantile of Γ̂_{m,n}, 0 < γ < 1. Then, an equal-tailed (1 - γ) bootstrap confidence interval for the parameter μ is given by

    Î_{m,n}(1 - γ) = [ n^{-1}{S_n - a_n q̂_m(1 - γ/2)} , n^{-1}{S_n - a_n q̂_m(γ/2)} ] .          (11.14)

Under (11.13) and the regularity conditions of Theorem 11.2,

    P(μ ∈ Î_{m,n}(1 - γ)) → 1 - γ   as n → ∞ .          (11.15)
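Reusing the bootstrap_T sketch given earlier in this section, the interval Î_{m,n}(1 - γ) of (11.14) can be computed from Monte Carlo quantiles of T_{m,n}^* roughly as follows; this is a sketch under the assumption that the scaling constants a_n and a_m are known.

```python
import numpy as np

def mbb_ci_mu(x, block_len, k, a_n, a_m, gamma=0.10, n_boot=999, rng=None):
    """Equal-tailed (1 - gamma) MBB confidence interval for mu as in (11.14),
    assuming the scaling constants a_n, a_m are known."""
    rng = rng or np.random.default_rng()
    reps = bootstrap_T(x, block_len, k, a_m, n_boot=n_boot, rng=rng)  # replicates of T*_{m,n}
    q_lo, q_hi = np.quantile(reps, [gamma / 2, 1 - gamma / 2])
    n, s_n = len(x), x.sum()
    return ((s_n - a_n * q_hi) / n, (s_n - a_n * q_lo) / n)
```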


When the scaling constants {a_n}_{n≥1} in (11.13) are unknown, we may instead consider a "studentized" statistic of the form T̂_{1n} ≡ â_n^{-1}(S_n - nμ), where â_n is an estimator of a_n satisfying

    â_n / a_n →_p 1   as n → ∞ .          (11.16)

For example, when X_1 has a stable distribution of order α ∈ (1, 2), the scaling constants {a_n}_{n≥1} are given by a_n = n^{1/α}, n ≥ 1. In this case, we may take â_n = n^{1/α̂_n}, where {α̂_n}_{n≥1} is a sequence of estimators of the tail index α that satisfies α̂_n - α = o_p((log n)^{-1}) as n → ∞. See Hsing (1991), Resnick and Stărică (1998), and the references therein. For an iid sequence {X_n}_{n≥1}, when F lies in the domain of attraction of a stable law of order α ∈ (1, 2), Athreya, Lahiri and Wei (1998) developed a self-normalization technique for the sum S_n. A similar approach may be applied in the dependent case. See also Datta and McCormick (1998) for related work on data-based normalization of the sum when {X_n}_{n≥1} is a linear process.
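As a rough illustration of such a data-based scaling, the sketch below builds â_n = n^{1/α̂_n} from a Hill-type tail-index estimate computed on the largest absolute order statistics. The Hill estimator and the choice of the number of upper order statistics are illustrative assumptions only; they are not guaranteed to satisfy the o_p((log n)^{-1}) requirement stated above.

```python
import numpy as np

def hill_tail_index(x, k_frac=0.1):
    """Hill-type estimate of the tail index alpha from the k largest |x| values.
    Used here only as one illustrative choice of alpha_hat_n."""
    a = np.sort(np.abs(np.asarray(x)))[::-1]   # decreasing order statistics of |X_i|
    k = max(2, int(k_frac * len(a)))
    logs = np.log(a[:k]) - np.log(a[k])
    return 1.0 / logs.mean()

def scale_estimate(x, k_frac=0.1):
    """Data-based scaling a_hat_n = n**(1/alpha_hat_n) for the stable case a_n = n**(1/alpha)."""
    n = len(x)
    return n ** (1.0 / hill_tail_index(x, k_frac))
```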
Returning to our discussion of the studentized statistic T̂_{1n} with a given scaling sequence {â_n}_{n≥1}, the MBB version of T̂_{1n} based on a resample of size m is given by

    T̂_{1,m,n}^* ≡ (â_{m,n}^*)^{-1}(S_{m,n}^* - m μ̂_n) ,

where â_{m,n}^* is obtained by replacing X_1, ..., X_m in the definition of â_m by the MBB samples X_1^*, ..., X_m^*. And a "hybrid" MBB version of T̂_{1n} may be defined as

    T̃_{1,m,n}^* ≡ â_m^{-1}(S_{m,n}^* - m μ̂_n) ,

where the same data-based scaling sequence {â_n}_{n≥1} that appears in the definition of T̂_{1n} is also used to define the bootstrap version of T̂_{1n}. Write Ĝ_{m,n} and G̃_{m,n} for the conditional distributions of T̂_{1,m,n}^* and T̃_{1,m,n}^*, respectively. Also, let G_n(x) = P(T̂_{1n} ≤ x), x ∈ ℝ. Then, using Lemma 4.1, it is easy to show that if the conditions of Theorem 11.2, (11.13), and (11.16) hold, then

    ϱ(Ĝ_{m,n}, G_n) →_p 0   as n → ∞ ,          (11.17)

provided, for every ε > 0,

    P(|â_{m,n}^*/â_m - 1| > ε | 𝒳_∞) → 0   in probability as n → ∞ .          (11.18)

On the other hand, consistency of the "hybrid" estimator G̃_{m,n} holds without the additional condition (11.18). Indeed, under the conditions of Theorem 11.2, (11.13), and (11.16), it is easy to show that

    ϱ(G̃_{m,n}, G_n) →_p 0   as n → ∞ .          (11.19)

Both (11.17) and (11.19) can be used to construct bootstrap confidence intervals for μ when the scaling constants {a_n}_{n≥1} are not completely known.
Let t̂_m(γ) and t̃_m(γ) denote the γ-quantiles (0 < γ < 1) of Ĝ_{m,n} and G̃_{m,n}, respectively. Then, a (1 - γ) equal-tailed two-sided bootstrap confidence interval for μ is given by

    Ĵ_m(1 - γ) = [ n^{-1}{S_n - â_n t̂_m(1 - γ/2)} , n^{-1}{S_n - â_n t̂_m(γ/2)} ] .          (11.20)

Similarly, a (1 - γ) equal-tailed two-sided bootstrap confidence interval J̃_m(1 - γ) (say), for μ, based on the "hybrid" version of T̂_{1n}, is obtained by replacing the t̂_m(·)'s in (11.20) by the t̃_m(·)'s. Both Ĵ_m(1 - γ) and J̃_m(1 - γ) attain the nominal coverage probability (1 - γ) in the limit. However, the magnitudes of the errors in the coverage probabilities of all three bootstrap confidence intervals Î_{m,n}(1 - γ), Ĵ_m(1 - γ), and J̃_m(1 - γ) are unknown at this stage. Note that the rates of approximation in (11.15), (11.17), and (11.19) depend on the resample size m as well as on the block length ℓ. The optimal choices of the resampling parameters ℓ and m are not known. In some similar problems involving independent data, the choice of m has been addressed empirically using the Jackknife-After-Bootstrap (JAB) method of Efron (1992); see Datta and McCormick (1995) and Athreya and Fukuchi (1997). A similar approach may be applied here as well. However, no theoretical result on the properties of the JAB in these problems seems to be available even under independence. See also Sakov and Bickel (2000) for the effects of m on the accuracy of the "m out of n" bootstrap for the median.

11.4 Invalidity of the MBB


In the last section, we proved consistency of the MBB approximation under the restriction that the resample size m be of smaller order than the sample size n. It is natural to ask the question: What happens when this condition is violated? For independent random variables, Athreya (1987) showed that the IID bootstrap method of Efron (1979) fails drastically in the heavy-tail case if one chooses m = n. In this section, we show that in the dependent case, a similar result holds for the MBB and the IID bootstrap method of Efron (1979). Thus, we cannot expect the bootstrap approximation to work for heavy-tailed data if the condition "m = o(n) as n → ∞" is violated. Further ramifications of this phenomenon have been studied, among others, by Arcones and Giné (1989, 1991) and Giné and Zinn (1989, 1990) in the independent case and by Lahiri (1995) in the dependent case.
Let {X_n}_{n≥1} be a sequence of stationary random variables with common marginal distribution F and let {X̃_n}_{n≥1} be the associated iid sequence. For simplicity, we describe the asymptotic behavior of the MBB for heavy-tailed dependent random variables when the resample size m equals the sample size n and the normalized sum has a nonnormal stable limit law as in (11.13). Suppose that F lies in the domain of attraction of a stable law
F_α (say) of order α ∈ (1, 2), i.e., there exist constants a_n > 0, b_n ∈ ℝ such that

    a_n^{-1}(X̃_1 + ... + X̃_n - b_n) →^d W_α ,

where W_α has distribution F_α. It is well known (cf. Feller (1971b), Chapter 17) that in this case, the tails of F must satisfy the growth conditions:

    F(-x) ~ p x^{-α} L(x)   as x → ∞ ,          (11.21)

    1 - F(x) ~ q x^{-α} L(x)   as x → ∞ ,          (11.22)

for some p ≥ 0, q ≥ 0 with p + q = 1 and for some slowly varying function L(·). Recall that a function L(·) is called slowly varying (at infinity) if

    lim_{x→∞} L(ax)/L(x) = 1   for all a > 0 .          (11.23)

Because α ∈ (1, 2), E|X_1| < ∞. Let {a_n}_{n≥1} be a sequence of constants satisfying

    n L(a_n)/a_n^α → 1   as n → ∞ .          (11.24)

Then, under the dependence conditions of Theorem 11.1,

    a_n^{-1}(S_n - nμ) →^d W_α ,

where W_α has characteristic function (11.1) with c = +∞ and with the canonical measure M_α(·) of (11.2), i.e., W_α has the characteristic function given by (11.25), in which the measure λ_α(A), defined for any Borel subset A of ℝ, is given by (11.26).


Next, define the bootstrap version of Tn = a:;;I(Sn - nJL) based on a
MBB res ample of size m = n and block length C as before, by T~,n =
a:;;I(S~,n - n/1n). Also, let t n,n(x) = P*(T~,n :::; x), x E R We shall show
that, unlike the m = o(n) case treated in Theorem 11.2, the bootstrap
t
estimator n,n converges in distribution to a random limit distribution t,
say, and therefore, fails to provide an approximation to the nonrandom,
exact distribution r n of Tn. The random limit distribution is defined t
in terms of a Poisson random measure NO on (JR, B(JR)) having mean
measure Aa of (11.26). Recall that NO is called a Poisson random measure
on (JR, B(JR)) with mean measure Aa (cf. Kallenberg (1976) if
270 11. Bootstrapping Heavy-Tailed Data and Extremes

(i) {N(A) : A E B(JR.)} is a collection of random variables defined on


some probability space (n,:F,F), such that for each iiJ E 0" N(·)(iiJ)
is a measure on (JR., B(JR.)), and

(ii) for every disjoint collection of sets AI"'" Ak E B(JR.), 2 ::::; k <
00, the random variables N(Al)"'" N(Ak) are independent Poisson
random variables with respective means Aa(Al)"'" Aa(Ak), i.e., for
anYXl,X2, ... ,Xk E {0,1,2, ... },

For simplicity of exposition, here we describe the random probability mea-


sure t in terms of the corresponding (random) characteristic function
€(t) == J exp(dx)t(dx), t E R The characteristic function € of the ran-
t
dom limit is given by

(11.27)

Note that as a consequence of the "inversion formula" (cf. Chow and Teicher
(1997)), €O uniquely determines the probability measure t.
With this, we
have the following result.

Theorem 11.3 Suppose that (11.10), (11.11), (11.21), (11.22), and (11.24) hold. Also, suppose that ℓ is such that n/ℓ is an integer, ℓ^{-1} + n^{-1/2}ℓ = o(1) as n → ∞, and n α(ℓ)/ℓ = o(1) as n → ∞, where α(·) denotes the strong mixing coefficient of {X_n}_{n≥1}. Then, for any x_1, ..., x_k ∈ ℝ, 1 ≤ k < ∞,

    (Γ̂_{n,n}(x_1), ..., Γ̂_{n,n}(x_k)) →^d (Γ̂_∞(x_1), ..., Γ̂_∞(x_k))

as n → ∞, where Γ̂_∞ is defined via its characteristic function ξ̂ given by (11.27).

Proof: See Theorem 2.2, Lahiri (1995).  □

Theorem 11.3 shows that, with the resample size m = n, the MBB estimator Γ̂_{n,n}(x) converges in distribution to a random variable Γ̂_∞(x) for every x ∈ ℝ and, hence, is an inconsistent estimator of the nonrandom level-2 parameter Γ_n(x). Indeed, for any real number x, if n is large, the bootstrap probability Γ̂_{n,n}(x) behaves like the random variable Γ̂_∞(x), having a nondegenerate distribution on the interval [0, 1], rather than coming close to the desired target Γ_n(x) or to the nonrandom limiting value Γ_α(x) ≡ P(W_α ≤ x) = lim_{n→∞} Γ_n(x). From a practical point of view, this implies that the bootstrap approximations generated with m = n would have a nontrivial variability even for arbitrarily large sample sizes, and hence, would not be a reliable estimate of the target probability even for large n. We point out that the conclusions of Theorem 11.3 remain true in a slightly more general setting, where n/ℓ is not necessarily an integer and the MBB is applied with the standard choice of the resample size, viz., m = n_1 ≡ ℓ⌊n/ℓ⌋.
In the next two sections, we describe some results on bootstrapping the extremes of a stationary process.

11.5 Extremes of Stationary Random Variables


Let {X_n}_{n≥1} be a sequence of stationary random variables with one-dimensional marginal distribution function F, and let {X̃_n}_{n≥1} be the associated iid sequence, i.e., {X̃_i}_{i≥1} is a sequence of iid random variables with common distribution function F. For each n ≥ 1, let X_{1:n} ≤ ... ≤ X_{n:n} denote the order statistics corresponding to X_1, ..., X_n. Define X̃_{1:n} ≤ ... ≤ X̃_{n:n} similarly. In this section, we review some standard results on the maximum order statistic X_{n:n} under dependence. By considering the sequence Y_n = -X_n, n ≥ 1, and using the relation X_{1:n} = -max_{1≤i≤n} Y_i, one can carry out a parallel development for the minimum X_{1:n}.
In this and in the next sections, we shall assume that the process {X_n}_{n≥1} satisfies a strong-mixing type condition (known as Condition D(u_n)), introduced by Leadbetter (1974).

Definition 11.3 Let {u_n}_{n≥1} be a sequence of positive real numbers. Then {X_n}_{n≥1} is said to satisfy Condition D(u_n) if there is a sequence r_n = o(n) such that

    sup { |P(X_j ≤ u_n for j ∈ A ∪ B) - P(X_j ≤ u_n for j ∈ A) · P(X_j ≤ u_n for j ∈ B)| :
          A ⊂ {1, ..., k}, B ⊂ {k + r_n, ..., n}, 1 ≤ k ≤ n - r_n }
    → 0   as n → ∞ .

It is clear that if the sequence {X_n}_{n≥1} is strongly mixing, then it satisfies Condition D(u_n) for any sequence of real numbers {u_n}_{n≥1}. A result of Chernick (1981b) shows that if {X_n}_{n≥1} is a stationary Markov chain, then it satisfies D(u_n) for any sequence {u_n}_{n≥1} with lim_{n→∞} F(u_n) = 1. The following result, due to Leadbetter (1974), specifies possible types of limit distributions of the normalized sample maximum under Condition D(u_n). Here and elsewhere, we say that the distribution function of a random variable V is of the type of a given distribution function G on ℝ if there exist constants a > 0, b ∈ ℝ such that P(V ≤ x) = G(a^{-1}(x - b)), x ∈ ℝ.
Theorem 11.4 Suppose that there exist constants a_n > 0 and b_n ∈ ℝ such that

    a_n^{-1}(X_{n:n} - b_n) →^d V          (11.28)

for some nondegenerate random variable V. Also suppose that Condition D(u_n) is satisfied for u_n = a_n x + b_n, n ≥ 1, for each x ∈ ℝ. Then, the distribution function of V is of the type of one of the following distribution functions:

(I)   Λ(x) = exp(-e^{-x}),  x ∈ ℝ ;

(II)  Φ_α(x) = 0 if x ≤ 0, and Φ_α(x) = exp(-x^{-α}) if x > 0, for some α > 0 ;

(III) Ψ_α(x) = exp(-|x|^α) if x ≤ 0, and Ψ_α(x) = 1 if x > 0, for some α > 0 .

The classes of distributions (I), (II), and (III) above are known as the extreme-value distributions. The distribution function of the limiting random variable V in Theorem 11.4 is necessarily one of these extreme-value distributions up to a suitable translation and scaling. In the iid case, i.e., for the sequence {X̃_n}_{n≥1}, Gnedenko (1943) gives necessary and sufficient conditions on the tails of F for weak convergence of the normalized maximum a_n^{-1}(X̃_{n:n} - b_n) to a given extreme-value distribution. In line with the iid case, we say that F belongs to the extremal domain of attraction of an extreme value distribution G, and write F ∈ D(G), if there exist constants a_n > 0 and b_n ∈ ℝ such that

    a_n^{-1}(X̃_{n:n} - b_n) →^d V ,          (11.29)

where the distribution function of V is G. In the iid case, a set of possible choices of the constants {a_n}_{n≥1} and {b_n}_{n≥1} for the three extremal classes is given by (cf. Gnedenko (1943), de Haan (1970)):

    (i)   For F ∈ D(Λ),    a_n = F^{-1}(1 - [en]^{-1}) - c_n,  b_n = c_n ,
    (ii)  For F ∈ D(Φ_α),  a_n = c_n,  b_n = 0 ,                                    (11.30)
    (iii) For F ∈ D(Ψ_α),  a_n = M_F - c_n,  b_n = M_F ,

where F^{-1}(u) = inf{x ∈ ℝ : F(x) ≥ u}, u ∈ (0, 1), e = Σ_{k=0}^{∞} 1/k!, c_n = F^{-1}(1 - n^{-1}), and M_F = sup{x : F(x) < 1} is the upper endpoint of F. Under suitable conditions on the dependence structure of the sequence {X_n}_{n≥1}, we may employ the normalizing constants {a_n}_{n≥1} and {b_n}_{n≥1}, specified by (11.30), in the dependent case as well.
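The choices in (11.30) are easy to compute once the quantile function F^{-1} is available; the following sketch is illustrative only (the function name and the exponential example, which lies in D(Λ) with a_n = 1 and b_n = log n, are our own) and assumes nothing beyond (11.30).

```python
import math

def norming_constants(F_inv, n, domain, M_F=None):
    """Norming constants (a_n, b_n) per (11.30), given the quantile function
    F_inv(u) = inf{x : F(x) >= u}.  `domain` is one of "Lambda" (Gumbel),
    "Phi" (Frechet), "Psi" (Weibull); M_F (the upper endpoint) is needed for "Psi"."""
    c_n = F_inv(1.0 - 1.0 / n)
    if domain == "Lambda":
        return F_inv(1.0 - 1.0 / (math.e * n)) - c_n, c_n
    if domain == "Phi":
        return c_n, 0.0
    if domain == "Psi":
        return M_F - c_n, M_F
    raise ValueError("unknown extremal class")

# Example: standard exponential, F(x) = 1 - exp(-x), F_inv(u) = -log(1 - u),
# so c_n = log(n) and a_n = log(e*n) - log(n) = 1.
a_n, b_n = norming_constants(lambda u: -math.log(1.0 - u), n=1000, domain="Lambda")
```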
An important result of Chernick (1981a) (see also Loynes (1965)) says that if for each τ > 0, there is a sequence u_n ≡ u_n(τ), n ≥ 1, such that

    lim_{n→∞} n(1 - F(u_n(τ))) = τ   for all τ ∈ (0, ∞) ,          (11.31)

    Condition D(u_n(τ)) holds for all τ ∈ (0, ∞) ,          (11.32)

and lim_{n→∞} P(X_{n:n} ≤ u_n(τ_0)) exists for some τ_0 > 0, then there exists a constant θ ∈ [0, 1] such that

    lim_{n→∞} P(X_{n:n} ≤ u_n(τ)) = e^{-θτ}   for all τ ∈ (0, ∞) .          (11.33)

This result leads to the following definition.

Definition 11.4 A stationary process {X_n}_{n≥1} is said to have extremal index θ if conditions (11.31)-(11.33) hold.

When the extremal index θ > 0, both X_{n:n} and its iid counterpart X̃_{n:n} have extremal limit distributions of the same type. However, for θ = 0, X_{n:n} and X̃_{n:n} may have different asymptotic behaviors. Here, we shall restrict our attention only to the case θ > 0, covered by the following result.

Theorem 11.5 Suppose that the sequence {X_n}_{n≥1} has extremal index θ > 0 and that F ∈ D(G) for some extreme value distribution G. Let {a_n}_{n≥1} and {b_n}_{n≥1} be the sequences of constants specified by (11.30) for the class containing G. Then

    a_n^{-1}(X_{n:n} - b_n) →^d V ,

where the distribution function of V is of the type G, i.e., P(V ≤ x) = G((x - b)/a), x ∈ ℝ, for some a > 0, b ∈ ℝ.

Proof: Follows from Corollary 3.7.3 of Leadbetter, Lindgren and Rootzén (1983) and the discussion above.  □

The extremal index θ is a parameter whose value is determined by the joint distribution of the sequence {X_n}_{n≥1}. Theorem 11.5 shows that for θ > 0, both X_{n:n} and its iid counterpart X̃_{n:n} may be normalized by the same sequences of constants {a_n}_{n≥1} and {b_n}_{n≥1}, and the limit distributions of the normalized maxima are of the same type but not necessarily identical. When 0 < θ < 1, the two limit laws are related by a nontrivial linear transformation in the sense that if a_n^{-1}(X̃_{n:n} - b_n) →^d Ṽ, then a_n^{-1}(X_{n:n} - b_n) →^d [aṼ + b] for some (a, b) ≠ (1, 0). Furthermore, the values of (a, b) depend on θ. Thus, for 0 < θ < 1, the limit distribution in the dependent case is different from that in the iid case, and the effect of the dependence of {X_n}_{n≥1} shows up in the limit through the extremal index θ. In contrast, when θ = 1, both X_{n:n} and X̃_{n:n} have the same limit distribution. In this case, the effect of the dependence of {X_n}_{n≥1} vanishes asymptotically. This observation has an important implication regarding the validity of the bootstrap methods for dependent random variables. In the next section, we shall show that with a proper choice of the resampling size, the MBB provides a valid approximation for all θ ∈ (0, 1], while the IID bootstrap method of Efron (1979) is effective only in the case θ = 1.
Because of the special role played by the case θ = 1, we now briefly describe a general regularity condition on the sequence {X_n}_{n≥1} that leads to the extremal index θ = 1.

Definition 11.5 Let {u_n}_{n≥1} be a sequence of real numbers. Then, {X_n}_{n≥1} is said to satisfy Condition D'(u_n) if

    lim_{k→∞} limsup_{n→∞}  n Σ_{2 ≤ j ≤ n/k} P(X_1 > u_n, X_j > u_n) = 0 .

To get some idea about the class of processes for which Condition D'(u_n) holds, suppose that {X_n}_{n≥1} are iid and that nP(X_1 > u_n) = O(1). Then it is easy to check that Condition D'(u_n) holds. However, Condition D'(u_n) need not hold for a sequence {u_n}_{n≥1} with nP(X_1 > u_n) = O(1), even when {X_n}_{n≥1} are m-dependent with m ≥ 1. The following result shows that X_{n:n} and X̃_{n:n} have the same limit law when Condition D'(u_n) holds.

Theorem 11.6 Suppose that a_n^{-1}(X_{n:n} - b_n) →^d V for some constants a_n > 0, b_n ∈ ℝ, n ≥ 1, where V is a nondegenerate random variable. Also, suppose that Conditions D(u_n) and D'(u_n) hold for all u_n ≡ a_n x + b_n, n ≥ 1, x ∈ ℝ. Then,

    a_n^{-1}(X̃_{n:n} - b_n) →^d V .

Proof: See Theorem 3.5.2, Leadbetter, Lindgren and Rootzén (1983).  □

In the next section, we describe properties of the MBB and the IID bootstrap of Efron (1979) for stationary random variables under conditions like D(u_n) and D'(u_n).

11.6 Results on Bootstrapping Extremes


First we consider consistency properties of the MBB approximation to the distribution of a normalized maximum. Suppose that {X_n}_{n≥1} is stationary and has an extremal index θ ∈ (0, 1]. Let 𝒳_n = {X_1, ..., X_n} denote the observations and let 𝒳_{m,n}^* = {X_1^*, ..., X_m^*} denote the MBB resample of size m based on k resampled blocks of length ℓ. Thus, here m = kℓ. Let X_{1:m}^* ≤ ... ≤ X_{m:m}^* denote the corresponding bootstrap order statistics. To define the MBB version of the normalized maximum V_n ≡ a_n^{-1}(X_{n:n} - b_n), here we suppose that the constants {a_n}_{n≥1} and {b_n}_{n≥1} are known. Then, the bootstrap version of V_n is given by

    V_{ℓ,m,n}^* ≡ a_m^{-1}(X_{m:m}^* - b_m) .          (11.34)
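A minimal Monte Carlo sketch of the bootstrap version (11.34), assuming the norming constants a_m, b_m are known, is given below; the function name and the estimation of P_*(V_{ℓ,m,n}^* ≤ x) by an empirical proportion are our own conventions, not part of the text.

```python
import numpy as np

def mbb_max_bootstrap(x, block_len, k, a_m, b_m, n_boot=999, rng=None):
    """Monte Carlo replicates of V*_{l,m,n} = a_m^{-1}(X*_{m:m} - b_m) in (11.34),
    assuming the norming constants a_m, b_m are known."""
    rng = rng or np.random.default_rng()
    n = len(x)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=k)
        resample_max = max(x[s:s + block_len].max() for s in starts)
        reps[b] = (resample_max - b_m) / a_m
    return reps

# P_*(V*_{l,m,n} <= x) can then be estimated by (reps <= x).mean().
```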
For proving consistency of the MBB approximation, in addition to Condition D(u_n), we shall make use of the following weaker version of the strong mixing condition.

Definition 11.6 For n ≥ 1, let

    ᾱ(n) = sup { |P(X_j ∈ I_u, j ∈ A ∪ B) - P(X_j ∈ I_u, j ∈ A) · P(X_j ∈ I_u, j ∈ B)| :
                 A ⊂ {1, ..., k}, B ⊂ {k + n, ...}, k ≥ 1, I_u ∈ {(-∞, u], (u, ∞)}, u ∈ ℝ } .
                                                                                          (11.35)

It is clear that for all n ≥ 1, ᾱ(n) ≤ α(n), the strong-mixing coefficient of the sequence {X_n}_{n≥1}. Here we shall require that ᾱ(n) decreases at a polynomial rate as n → ∞. The following result proves the validity of the MBB approximation.

Theorem 11.7 Suppose that {X_n}_{n≥1} is a stationary process with extremal index θ ∈ (0, 1] (as defined in Definition 11.4) and that ᾱ(r) ≤ r^{-η}, r ≥ 1, for some η > 0. Further suppose that (11.28) holds and the MBB block size variable ℓ and the number of resampled blocks k satisfy ℓ = ⌊n^ε⌋, k = ⌊n^δ⌋ for some 0 < ε < 1, 0 < δ < min{ε, 1 - ε}. Then,

    sup_{x∈ℝ} |P_*(V_{ℓ,m,n}^* ≤ x) - P(V_n ≤ x)| →_p 0   as n → ∞ ,          (11.36)

where V_{ℓ,m,n}^* is defined by (11.34).

Proof: Follows from Theorem 3.8 and Corollary 3.2 of Fukuchi (1994).  □

Thus, under the conditions of Theorem 11.7, the MBB approximation to the distribution of V_n is consistent for all values of the extremal index θ in the interval (0, 1]. In addition to the regularity of the upper tail of F (cf. (11.31)), this requires the dependence structure of the process {X_n}_{n≥1} to satisfy Condition D(u_n(τ)) in (11.32) and the weak mixing condition on ᾱ(·). Furthermore, the conditions on the block length variable ℓ and the resample size m require that ℓ → ∞ and m = o(n) as n → ∞. Both of these conditions are necessary for the validity of the MBB method. When the extremal index θ lies in the interval (0, 1), it is produced by the dependence among all the X_n's, and as a consequence, the block length ℓ must grow to infinity with the sample size in order to capture this effect of the dependence on the limit distribution of V_n. For θ = 1, the limit distribution of the normalized maximum is essentially determined by the one-dimensional marginal distribution function F of X_1. As a result, one may have consistency even when ℓ does not go to infinity (see the discussion on the IID resampling scheme of Efron (1979) below). On the other hand, the condition "m = o(n)" on the resample size m is needed to ensure that the conditional distribution of V_{ℓ,m,n}^* (given the X_n's) converges to the correct nonrandom limit. For the extremes, bootstrap approximations tend to behave in a way similar to the case of bootstrapping the normalized sample sum of heavy-tailed data. Indeed, when the resample size m = n, a result of Fukuchi (1994) shows that the bootstrap approximation generated by the IID resampling scheme of Efron (1979) has a random limit (see Theorem 11.9 below).
Next we briefly state the results on the IID resampling scheme of Efron (1979) as alluded to in the above paragraph. Let V_{m,n}^* ≡ V_{1,m,n}^* denote the IID-bootstrap version of V_n as defined by (11.34) with ℓ ≡ 1 and k = m. The first result gives conditions for the consistency of ℒ(V_{m,n}^* | 𝒳_n), where 𝒳_n = {X_1, X_2, ..., X_n}.

Theorem 11.8 Let {X_n}_{n≥1} be a stationary process such that (11.28) holds. Suppose that Conditions D(u_n), D'(u_n) hold with u_n = u_n(x) = a_n x + b_n, n ≥ 1, for all x ∈ ℝ and that ᾱ(n) = O(n^{-δ}) as n → ∞ for some δ ≥ 2. If, in addition, m = o(n) as n → ∞, then

    sup_{x} |P_*(V_{m,n}^* ≤ x) - P(V_n ≤ x)| →_p 0   as n → ∞ .

Proof: Follows from Theorem 3.4 and Corollary 3.1 of Fukuchi (1994), by noting that his mixing coefficient α_j(u) is bounded above by the coefficient ᾱ(j) of (11.35) for all j ≥ 1, u ∈ ℝ.  □

As pointed out in Section 11.5 (cf. Theorem 11.6), under Condition D'(u_n) of Theorem 11.8, the extremal index θ = 1, and V_n has the same limit distribution as the normalized maximum Ṽ_n of the associated iid sequence. As a result, under the conditions of Theorem 11.8, the dependence of the X_n's does not have any effect on the limit law, and the IID bootstrap method provides a valid approximation to the distribution of the normalized sample maximum even for such dependent random variables, provided that the resample size m = o(n). However, if the resample size m = n, the consistency is no longer guaranteed, as shown by the next result.

Theorem 11.9 Let {X_n}_{n≥1} be a stationary sequence such that (11.28) holds for some a_n > 0, b_n ∈ ℝ, and for some nondegenerate random variable V. Suppose that Condition D'(u_n) holds for u_n = a_n x + b_n, n ≥ 1, for all x ∈ ℝ and that

    α̃_n(x_r) → 0   as n → ∞          (11.37)

for some sequence {p_n}_{n≥1} of positive integers satisfying p_n = o(n), for every x_r = (x_1, ..., x_r)′ ∈ ℝ^r, r ≥ 1, where

    α̃_n(x_r) = sup { |P(X_j ∈ I_j, j ∈ A ∪ B) - P(X_j ∈ I_j, j ∈ A) · P(X_j ∈ I_j, j ∈ B)| :
                     I_j ∈ {(-∞, a_n x_i + b_n] : 1 ≤ i ≤ r} for j ∈ A ∪ B ,
                     A ⊂ {1, ..., k}, B ⊂ {k + p_n, ..., n}, 1 ≤ k ≤ n - p_n } .
                                                                                          (11.38)

Then, for m = n,

    P_*(V_{n,n}^* ≤ x) →^d exp(-Γ_x)

for every x ∈ ℝ, where Γ_x is a Poisson random variable with mean -log[P(V ≤ x)].
Thus, under the conditions of Theorem 11.9, the bootstrap distribution function at any given x ∈ ℝ, being a random variable with values in the interval [0, 1], converges in distribution to a nondegenerate random variable exp(-Γ_x). As a consequence, when the resample size m equals n, the resulting bootstrap estimator of the target probability P(V_n ≤ x) fluctuates around the true value even for arbitrarily large sample sizes. As in the heavy-tail case, a similar behavior is expected of the MBB even when the block length ℓ → ∞, if the resample size m grows at the rate n, i.e., if m ~ n. However, a formal proof of this fact is not available in the literature.
We conclude the discussion of the asymptotic properties of the IID bootstrap of Efron (1979) by considering the case where {X_n}_{n≥1} has an extremal index θ ∈ (0, 1). In this case, Fukuchi (1994) (cf. p. 47) shows that under regularity conditions similar to those of Theorem 11.8, for m = o(n^{1/2}),

    P_*(V_{m,n}^* ≤ x) →_p exp(-γ(x))   for each x ∈ ℝ ,

while

    P(V_n ≤ x) → exp(-θγ(x))

for each x ∈ ℝ, where γ(x) ≡ lim_{n→∞} n[1 - P(X_1 ≤ a_n x + b_n)]. Thus, even with a resample size m that grows at a slower rate than the sample size n, the IID resampling scheme of Efron (1979) fails. As explained earlier, the reason behind this is that the value of θ ∈ (0, 1) is determined by the joint distribution of the X_i's, but when a single observation is resampled at a time, this information is totally lost. As a consequence, the limit distribution of the IID bootstrap version V_{m,n}^* of V_n coincides with the limit distribution of the normalized sample maximum Ṽ_n of the associated iid sequence {X̃_n}_{n≥1}.

11.7 Bootstrapping Extremes With Estimated Constants

For many applications, the assumption that the normalizing constants {a_n}_{n≥1}, {b_n}_{n≥1} are known is very restrictive. In such situations, we may be interested in bootstrapping the sample maximum where some random normalizing factors may be used in place of {a_n}_{n≥1} and {b_n}_{n≥1} to yield a nondegenerate limit distribution. Accordingly, let {â_n}_{n≥1} and {b̂_n}_{n≥1} be random variables with â_n ∈ (0, ∞) and b̂_n ∈ ℝ for all n ≥ 1 such that

    â_n / a_n →_p a_0   as n → ∞          (11.39)

and

    a_n^{-1}(b̂_n - b_n) →_p b_0   as n → ∞          (11.40)

for some constants a_0 ∈ (0, ∞) and b_0 ∈ ℝ. Here, we do allow the possibility that b̂_n or â_n be a function of a population parameter and be nonrandom, so that the bootstrap approximation may be used to construct inference procedures like tests and confidence intervals for the parameter involved. For example, we may be interested in setting a confidence interval for the upper endpoint M_F ≡ sup{x : F(x) < 1} of the distribution function F of X_1 when F ∈ D(Ψ_α) (cf. Theorem 11.4). In this case we would set b̂_n = M_F and replace the corresponding scaling constant a_n = M_F - F^{-1}(1 - 1/n) of (11.30) by a random scaling constant â_n that is a suitable function of M_F and the empirical quantile function F_n^{-1}. Then, we may apply the MBB to the pivotal quantity V̂_n ≡ â_n^{-1}(X_{n:n} - M_F) and construct bootstrap confidence intervals for the parameter M_F. In general, consider the normalized sample maximum with "estimated" constants

    V̂_n ≡ â_n^{-1}(X_{n:n} - b̂_n) .          (11.41)
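As a rough sketch of this endpoint application, the code below builds an MBB interval for M_F from the pivot in (11.41). The plug-in scaling â_n = X_{n:n} - F_n^{-1}(1 - 1/n) and the centering of the bootstrap maxima at the sample maximum (as a surrogate for M_F) are illustrative assumptions of ours, not choices prescribed by the text; the resample size m = kℓ should be kept of smaller order than n.

```python
import numpy as np

def endpoint_mbb_ci(x, block_len, k, gamma=0.10, n_boot=999, rng=None):
    """Sketch of an MBB confidence interval for the upper endpoint M_F based on the
    pivot V_hat_n = a_hat_n^{-1}(X_{n:n} - M_F) of (11.41).  Plug-in choices
    (a_hat from the empirical quantile function; bootstrap maxima centered at the
    sample maximum) are illustrative assumptions."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x)
    n, m = len(x), k * block_len
    x_max = x.max()
    a_hat_n = x_max - np.quantile(x, 1.0 - 1.0 / n)   # estimates a_n = M_F - F^{-1}(1 - 1/n)
    a_hat_m = x_max - np.quantile(x, 1.0 - 1.0 / m)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=k)
        res_max = max(x[s:s + block_len].max() for s in starts)
        reps[b] = (res_max - x_max) / a_hat_m         # hybrid-type bootstrap pivot
    q_lo, q_hi = np.quantile(reps, [gamma / 2, 1 - gamma / 2])
    # Invert the pivot: M_F = X_{n:n} - a_hat_n * V_hat_n.
    return (x_max - a_hat_n * q_hi, x_max - a_hat_n * q_lo)
```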
Let a;", b;" be some suitable functions of the MBB sample {Xi, ... , X;;'},
based on blocks of length £, and of the data Xl' ... ' X n , such that for every
E > 0,

P(la;;,la;" - aol > E I ,1'(0) + P(la;.;;l(b;" - bm ) - bol > E I ,1'(0)


-+ 0 in probability as n -+ 00 , (11.42)

where ao E (0, (0) and bo E ~ are as in (11.39) and (11.40), respectively,


and where ,1'00
= a({Xl' X 2 , .. . }). Then, a bootstrap version of Vn is given
by
(11.43)
As in the definition of Vn (cf. (11.41)), here we do allow the possibility that
a;" or b;" in (11.42) and (11.43) be just a function of Xl, ... ,Xn and do
not involve the Xi's. A prime example of this is the "hybrid" MBB version
of Vn , given by
Ve:m,n = a;;,l (X;';,:m - bm) , (11.44)
which corresponds to (11.43) with
11. 7 Bootstrapping Extremes With Estimated Constants 279

The following result shows that both Vi:m,n and Ve7m,n provide a valid
approximation to the distribution of Vn .
Theorem 11.10 Suppose that the conditions of Theorem 11.7 hold and that relations (11.39) and (11.40) hold. Then,

    sup_{x} |P_*(Ṽ_{ℓ,m,n}^* ≤ x) - P(V̂_n ≤ x)| →_p 0   as n → ∞ .          (11.45)

If, in addition, relation (11.42) holds, then

    sup_{x} |P_*(V̂_{ℓ,m,n}^* ≤ x) - P(V̂_n ≤ x)| →_p 0   as n → ∞ .          (11.46)

Proof: We consider (11.46) first. Note that by (11.39) and (11.40) and Slutsky's theorem,

    V̂_n →^d a_0^{-1}(V - b_0)   as n → ∞ ,          (11.47)

where V is as in Theorem 11.4 (cf. (11.28)). With V_{ℓ,m,n}^* given by (11.34), we may write

    V̂_{ℓ,m,n}^* = [a_m/â_m^*] V_{ℓ,m,n}^* + (â_m^*)^{-1}(b_m - b̂_m^*) .          (11.48)

From (11.42), it easily follows that for each ε > 0,

    P(|a_m/â_m^* - a_0^{-1}| > ε | 𝒳_∞) + P(|(b_m - b̂_m^*)/â_m^* + a_0^{-1} b_0| > ε | 𝒳_∞)
        → 0   in probability as n → ∞ .          (11.49)

Hence, by Lemma 4.1, (11.46) follows from (11.47)-(11.49) and Theorem 11.7.
Next consider (11.45). Because â_m and b̂_m are 𝒳_∞-measurable, with â_m^* = â_m and b̂_m^* = b̂_m in (11.42), for any ε > 0, we get

    the left side of (11.42) = 1(|a_m^{-1} â_m - a_0| > ε) + 1(|a_m^{-1}(b̂_m - b_m) - b_0| > ε) ,

which goes to zero in L^1 and, hence, in probability as n → ∞, by (11.39) and (11.40). Hence, (11.45) follows from (11.46).  □

A similar result may be proved for the IID bootstrap of Efron (1979) under the regularity conditions of Theorem 11.8. Theorem 11.10 and its analog for the IID bootstrap in the "unknown normalizing constant" case may be used for statistical inference for dependent random variables. For results along this line for independent data, see Athreya and Fukuchi (1997), who apply the IID bootstrap of Efron (1979) to construct CIs for the endpoints of the distribution function F of X_1, when the random variables X_n's are iid. For results on bootstrapping the joint distribution of the sum and the maximum of a stationary sequence, see Mathew and McCormick (1998).
12
Resampling Methods for Spatial Data

12.1 Introduction
In this chapter, we describe bootstrap methods for spatial processes ob-
served at finitely many locations in a sampling region in ℝ^d. Depending
on the spatial sampling mechanism that generates the locations of these
data-sites, one gets quite different behaviors of estimators and test statis-
tics. As a result, formulation of resampling methods and their properties
depend on the underlying spatial sampling mechanism. In Section 12.2,
we describe some common frameworks that are often used for studying
asymptotic properties of estimators based on spatial data. In Section 12.3,
we consider the case where the sampling sites (also referred to as data-sites
in this book) lie on the integer grid and describe a block bootstrap method
that may be thought of as a direct extension of the MBB method to spatial
data. Here, some care is needed to handle sampling regions that are not
rectangular. We establish consistency of the bootstrap method and give
some numerical examples to illustrate the use of the method. Section 12.4
gives a special application of the block resampling methods. Here, we make
use of the resampling methods to formulate an asymptotically efficient least
squares method of estimating spatial covariance parameters, and discuss its
advantages over the existing estimation methods. In Section 12.5, we con-
sider irregularly spaced spatial data, generated by a stochastic sampling
design. Here, we present a block bootstrap method and show that it pro-
vides a valid approximation under nonuniform concentration of sampling sites even in the presence of infill sampling. It may be noted that infill sampling leads to conditions of long-range dependence in the data, and thus,


the block bootstrap method presented here provides a valid approximation
under this form of long-range dependence. Resampling methods for spatial
prediction are presented in Section 12.6.

12.2 Spatial Asymptotic Frameworks


In this section, we describe some spatial asymptotic frameworks that are
commonly used for studying large sample properties of inference proce-
dures. For time series data, observations are typically taken at a regular
interval of time and the limiting procedure describes the long-run behavior
of a system as the time approaches "infinity". Because of the unidirec-
tional flow of time, the concept of "infinity" is unambiguously defined. For
random processes observed over space (and possibly, also over time), this
uniqueness of limiting procedures is lost. In this case, there are several
ways of approaching the "ultimate state" or the "infinity," giving rise to
different asymptotic frameworks for studying large sample properties of
inference procedures, including the bootstrap. It turns out that these dif-
ferent asymptotic structures arise from two basic paradigms, known as the
increasing domain asymptotics and the infill asymptotics (cf. Chapter 5,
Cressie (1993)). When all sampling sites are separated by a fixed positive
distance and the sampling region becomes unbounded as the sample size
increases, the resulting structure leads to increasing domain asymptotics.
This is the most common framework used for asymptotics for spatial data
and often leads to conclusions similar to those obtained in the time series
case. Processes observed over increasing and nested rectangles on the integer grid ℤ^d in the d-dimensional space provide examples of such an asymptotic structure. On the other hand, if an increasing number of samples are collected at spatial sampling sites from within a fixed bounded region of ℝ^d,
the resulting structure leads to infill asymptotics. In this case, the mini-
mum distance among the sampling sites tends to zero as the sample size
increases, and typically results in very strong forms of dependence in the
data. Such a structure is suitable for Mining and other Geostatistical appli-
cations where a given resource is sampled increasingly over a given region.
It is well known that under infill asymptotics many standard inference pro-
cedures have drastically different large sample behaviors compared to those
under increasing domain asymptotics. See, for example, Morris and Ebey
(1984), Stein (1987, 1989), Cressie (1993), Lahiri (1996c), and the refer-
ences therein. In some cases, a combination of the two basic asymptotic
frameworks is also employed. In Sections 12.5 and 12.6, we shall consider
one such structure (which we refer to as a mixed increasing domain asymp-
totic structure), where the sampling region grows to infinity and at the same
time, the distance between neighboring sampling sites goes to zero. Except

for some prediction problems treated in Section 12.6.2, the sampling region
R == Rn in all other sections becomes unbounded as n increases to infinity.
We conclude this section with a description of the structure of the sam-
pling regions R n , n 2: 1. Let R c (-!, !]d be an open connected set con-
taining the origin and let Ro be a prototype set for the sampling regions
such that R C Ro c cl.(R), where cl.(R) denotes the closure of the set R.
Also, let {An}n~l C [1,00) be a sequence of real numbers such that An loo
as n ~ 00. We shall suppose that the sampling region Rn is obtained by
"inflating" the prototype set Ro by the scaling constant An as

(12.1)

Because the origin is assumed to lie in R o, relation (12.1) shows that the
shape of the sampling region remains unchanged for different values of n.
Furthermore, this formulation allows the sampling region Rn to have a
wide range of (possibly irregular) shapes. Some examples of such regions
are spheres, ellipsoids, polyhedrons, and star-shaped regions. Here we
call a set A C ]Rd containing the origin star-shaped if for any x E A, the
line joining x and the origin lies in A. As a result, star-shaped regions
can be nonconvex. To avoid pathological cases, we shall suppose that the
prototype set Ro satisfies the following boundary condition:

Condition B For every sequence of positive real numbers {an}n~l with


an ~ 0 as n ~ 00, the number of cubes of the form an(i + [0, l)d), i E Zd
that intersects both Ro and R8 is of the order O([a n ]-(d-1)) as n ~ 00.

This condition is satisfied by most regions of practical interest. For ex-


ample, Condition B is satisfied in the plane (i.e., d = 2) if the boundary
oRo of Ro is delineated by a simple rectifiable curve of finite length. When
the sampling sites lie on the integer grid Zd, an important implication of
Condition B is that the effect of the data points lying near the boundary
of Rn is negligible compared to the totality of data points.

12.3 Block Bootstrap for Spatial Data on a Regular Grid

In this section, we consider bootstrapping a spatial process indexed by the integer grid ℤ^d. Let R_n denote the sampling region, given by (12.1) for some prototype set R_0 satisfying the boundary Condition B. Suppose that {Z(s) : s ∈ ℤ^d} is a stationary spatial process that is observed at finitely many locations 𝒮_n ≡ {s_1, ..., s_{N_n}}, given by the part of the integer grid ℤ^d that lies inside R_n, i.e.,

    𝒮_n = R_n ∩ ℤ^d ,          (12.2)
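For instance, the sampling sites (12.2) can be generated as follows for any prototype set R_0 given through an indicator function; the function name and the circular R_0 in the example are arbitrary illustrative choices of ours.

```python
import numpy as np

def sampling_sites(lambda_n, in_R0, d=2):
    """Integer-grid sampling sites S_n = R_n ∩ Z^d for R_n = lambda_n * R_0 (cf. (12.1)-(12.2)).
    `in_R0` is an indicator function of the prototype set R_0 ⊂ (-1/2, 1/2]^d."""
    half = int(np.ceil(lambda_n / 2.0)) + 1
    axes = [np.arange(-half, half + 1)] * d
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    keep = np.array([in_R0(s / lambda_n) for s in grid])   # s in R_n iff s/lambda_n in R_0
    return grid[keep]

# Example: R_0 = open disc of radius 0.45 centered at the origin.
sites = sampling_sites(lambda_n=30.0, in_R0=lambda u: np.dot(u, u) < 0.45 ** 2)
```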
n ≥ 1. Note that the number N_n of elements of the set R_n ∩ ℤ^d is determined by the scaling constant λ_n and the shape of the prototype set R_0. As a result, in this case, the collection {N_n : n ≥ 1} of all possible sample sizes may not equal ℕ, the set of all positive integers. For spatial data observed on a regular grid, this is the primary reason for using N_n to denote the sample size at stage n, instead of using the standard symbol n, which runs over ℕ. For notational simplicity, we shall set N_n = N for the rest of this section. This N should not be confused with the N used in Chapters 2-11 to denote the number of overlapping blocks of length ℓ in a sample of size n from a time series.
It is easy to see that the sample size N and the volume of the sampling region R_n satisfy the relation

    N ~ vol.(R_0) · λ_n^d ,          (12.3)

where, recall that, for any Borel set A ⊂ ℝ^d, vol.(A) denotes the volume (i.e., the Lebesgue measure) of A, and where for any two sequences {r_n}_{n≥1} and {t_n}_{n≥1} of positive real numbers, we write r_n ~ t_n if r_n/t_n → 1 as n → ∞. Let

    T_n = t_n(𝒵_n; θ)

be a random variable of interest, where 𝒵_n = {Z(s_1), ..., Z(s_N)} denotes the collection of observations and where θ is a parameter. For example, we may have T_n = √N (Z̄_n - μ) with Z̄_n = N^{-1} Σ_{i=1}^{N} Z(s_i) denoting the sample mean and μ denoting the population mean. Our goal is to define block bootstrap estimators of the sampling distribution of T_n.
Different variants of spatial subsampling and spatial block bootstrap methods have been proposed in the literature. See Hall (1985), Possolo (1991), Politis and Romano (1993, 1994a), Sherman and Carlstein (1994), Sherman (1996), Politis, Paparoditis and Romano (1998, 1999), Politis, Romano and Wolf (1999), and the references therein. Here we shall follow a version of the block bootstrap method, suggested by Bühlmann and Künsch (1999b) and Zhu and Lahiri (2001), that is applicable to sampling regions of general shapes, given by (12.1).

12.3.1 Description of the Block Bootstrap Method


Let {β_n}_{n≥1} be a sequence of positive integers such that

    β_n^{-1} + β_n/λ_n = o(1)   as n → ∞ .          (12.4)

Thus, β_n goes to infinity but at a rate slower than the scaling factor λ_n for the sampling region R_n (cf. (12.1)). Here, β_n gives the scaling factor for the blocks or subregions used by the spatial block bootstrap method. Let 𝒰 = [0, 1)^d denote the unit cube in ℝ^d. As a first step, we partition the sampling region R_n using cubes of volume β_n^d. Let 𝒦_n = {k ∈ ℤ^d : β_n(k + 𝒰) ∩ R_n ≠ ∅} denote the index set of all cubes of the form β_n(k + 𝒰) that have nonempty intersections with the sampling region R_n. We will define a bootstrap version of the process Z(·) over R_n by defining its version on each of the subregions

    R_n(k) ≡ R_n ∩ β_n(k + 𝒰) ,   k ∈ 𝒦_n .          (12.5)

For this, we consider one R_n(k) at a time and, for a given R_n(k), resample from a suitable collection of subregions of R_n (called subregions of "type k") to define the bootstrap version of Z(·) over R_n(k). Let ℐ_n = {i ∈ ℤ^d : i + β_n𝒰 ⊂ R_n} denote the index set of all cubes of volume β_n^d in R_n with "starting points" i ∈ ℤ^d. Then, {i + β_n𝒰 : i ∈ ℐ_n} gives us a collection of cubic subregions or blocks that are overlapping and are contained in R_n. Furthermore, for each i ∈ ℐ_n, the subsample of observations {Z(s) : s ∈ ℤ^d ∩ [i + β_n𝒰]} is complete in the sense that the Z(·)-process is observed at every point of the integer grid in the subregion i + β_n𝒰.
For any set A ⊂ ℝ^d, let 𝒵_n(A) = {Z(s) : s ∈ A ∩ 𝒮_n} denote the set of observations lying in the set A, where, recall that, 𝒮_n ≡ {s_1, ..., s_N} is the set of all sampling sites in R_n. Thus, in this notation, 𝒵_n(R_n) is the entire sample 𝒵_n = {Z(s_1), ..., Z(s_N)} and 𝒵_n(R_n(k)) denotes the subsample lying in the subregion R_n(k), k ∈ 𝒦_n. For the overlapping version of the spatial block bootstrap method, for each k ∈ 𝒦_n, we resample one block at random from the collection {i + β_n𝒰 : i ∈ ℐ_n}, independently of the other resampled blocks, and define a version of the observed process on the subregion R_n(k) using the observations from the resampled subregion. To that end, let K ≡ K_n denote the size of 𝒦_n and let {I_k : k ∈ 𝒦_n} be a collection of K iid random variables having common distribution

    P(I_1 = i) = 1/|ℐ_n| ,   i ∈ ℐ_n .          (12.6)

For k ∈ 𝒦_n, we define the overlapping block bootstrap version 𝒵_n^*(R_n(k)) of 𝒵_n(R_n(k)) by using a part of the resampled block 𝒵_n(I_k + β_n𝒰) that is congruent to the subregion R_n(k). More precisely, we define 𝒵_n^*(R_n(k)) by

    𝒵_n^*(R_n(k)) = 𝒵_n((I_k + β_n𝒰) ∩ [R_n(k) - kβ_n + I_k]) .          (12.7)

Note that the set [R_n(k) - kβ_n + I_k] is obtained by an integer translation of the subregion R_n(k) that maps the starting point kβ_n of the set (k + 𝒰)β_n to the starting point I_k of the resampled block (I_k + β_n𝒰). As a result, R_n(k) and (I_k + β_n𝒰) ∩ [R_n(k) - kβ_n + I_k] have the same shape, and the resampled observations retain the same spatial dependence structure as the original process 𝒵_n(R_n(k)) over the subregion R_n(k). Furthermore, because of the translation by integer vectors, the number of resampled observations in 𝒵_n^*(R_n(k)) is the same as that in 𝒵_n(R_n(k)), for every k ∈ 𝒦_n.
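The following simplified sketch implements the overlapping scheme (12.5)-(12.7) for a fully observed rectangular region in d = 2, so that each cell R_n(k) is a rectangle anchored at a multiple of β_n; handling a general region R_n would additionally require tracking which lattice points of each cell fall inside R_n. The function name and array conventions are our own.

```python
import numpy as np

def spatial_block_bootstrap(z, beta, rng=None):
    """Overlapping spatial block bootstrap on a fully observed rectangular grid
    (a simplified sketch of (12.5)-(12.7))."""
    rng = rng or np.random.default_rng()
    n1, n2 = z.shape
    # index set I_n: starting points of complete beta x beta blocks inside the region
    starts = [(i, j) for i in range(n1 - beta + 1) for j in range(n2 - beta + 1)]
    z_star = np.empty_like(z)
    # cells R_n(k): partition of the region into beta x beta pieces (boundary cells smaller)
    for k1 in range(0, n1, beta):
        for k2 in range(0, n2, beta):
            h1 = min(beta, n1 - k1)                      # actual cell height
            h2 = min(beta, n2 - k2)                      # actual cell width
            i1, i2 = starts[rng.integers(len(starts))]   # resampled block I_k + beta*U
            z_star[k1:k1 + h1, k2:k2 + h2] = z[i1:i1 + h1, i2:i2 + h2]
    return z_star
```

A bootstrap replicate of T_n in the sense of (12.8) below is then obtained by applying t_n(·; θ̂_n) to the returned array, for example √N (mean(z_star) - μ̂_n) for the sample mean.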
To gain further insight into the structure of the resampled blocks of observations 𝒵_n^*(R_n(k)) in (12.7), let 𝒦_{1n} = {k ∈ 𝒦_n : (k + 𝒰)β_n ⊂ R_n} and 𝒦_{2n} = {k ∈ 𝒦_n : (k + 𝒰)β_n ∩ R_n^c ≠ ∅}, respectively, denote the index set of all interior cubes contained in R_n and that of all boundary cubes that intersect both R_n and R_n^c. See Figure 12.1. Note that for all k ∈ 𝒦_{1n}, R_n(k) = (k + 𝒰)β_n and, hence, it is a cubic subregion of R_n. However, for k ∈ 𝒦_{2n}, R_n(k) is a proper subset of (k + 𝒰)β_n and the shape of R_n(k) depends on the shape of the boundary of R_n. In particular, for k ∈ 𝒦_{2n}, R_n(k) need not be a cubic region. As a result, for k ∈ 𝒦_{1n}, 𝒵_n^*(R_n(k)) contains all the observations from the resampled cubic subregion I_k + β_n𝒰 ⊂ R_n. In contrast, for k ∈ 𝒦_{2n}, 𝒵_n^*(R_n(k)) contains only a subset of the observed values in I_k + β_n𝒰, lying in a subregion of I_k + β_n𝒰 that is congruent to R_n(k). Note that for k ∈ 𝒦_{1n}, the number ℓ of observations in the resampled block 𝒵_n^*(R_n(k)) is precisely β_n^d. Hence, by (12.3) and (12.4), the typical block size ℓ and the original sample size N satisfy the relation

    ℓ^{-1} + N^{-1}ℓ = o(1)   as n → ∞ ,

as in the time series case (cf. Chapter 2). The overlapping block bootstrap version 𝒵_n^*(R_n) of 𝒵_n(R_n) is now given by concatenating the resampled blocks of observations {𝒵_n^*(R_n(k)) : k ∈ 𝒦_n}. Note that by our construction, the resample size equals the sample size. Hence, the bootstrap version of a random variable T_n ≡ t_n(𝒵_n; θ) is given by

    T_n^* = t_n(𝒵_n^*; θ̂_n) ,          (12.8)

where the same function t_n(·; ·), appearing in the definition of T_n, is also used to define its bootstrap version. Here, θ̂_n is an estimator of θ, defined by mimicking the relation between the joint distribution of 𝒵_n and θ. For an example, consider T_n = √N (Z̄_n - μ) with Z̄_n = N^{-1} Σ_{i=1}^{N} Z(s_i). Then, the overlapping block bootstrap version of T_n is given by

    T_n^* = √N (Z̄_n^* - μ̂_n) ,

where Z̄_n^* is the average of the N resampled observations 𝒵_n^*(R_n), μ̂_n = E_* Z̄_n^*, and E_* denotes the conditional expectation given {Z(s) : s ∈ ℤ^d}. Similarly, if T_n = √N (H(Z̄_n) - H(μ)) for some function H, then we may define T_n^* as T_n^* = √N (H(Z̄_n^*) - H(μ̂_n)). Note that the block bootstrap method described above can also be applied to vector-valued spatial processes with obvious notational changes. We shall make use of the block bootstrap for the vector case later in the section where we consider M-estimators of parameters of a vector-valued spatial process Z(·).
Next we briefly describe the nonoverlapping version of the block bootstrap method. Let R_n(k), k ∈ 𝒦_n, denote the partition of the sampling region R_n given by (12.5). For the nonoverlapping version, we restrict attention to the collection of nonoverlapping cubes 𝒥_n ≡ {j ∈ ℤ^d : [j + 𝒰]β_n ⊂ R_n}
FIGURE 12.1. The blocking mechanism for the overlapping spatial block bootstrap method. (a) Partition of a pentagonal sampling region R_n by the subregions R_n(k), k ∈ 𝒦_n, of (12.5); (b) a set of overlapping "complete" blocks; (c) a set of overlapping copies of the "boundary" block shown in (a). Bootstrap versions of the spatial process Z(·) over the shaded "complete" and the shaded "boundary" blocks in (a) are, respectively, obtained by resampling from the observed "complete" blocks in (b) and the observed "boundary" blocks in (c).

and generate K iid random variables {J_k : k ∈ 𝒦_n} with common distribution

    P(J_1 = j) = 1/|𝒥_n| ,   j ∈ 𝒥_n ,          (12.9)

where K = K_n is the size of 𝒦_n. Then, the nonoverlapping bootstrap version of the spatial process 𝒵_n(R_n(k)) over the subregion R_n(k) is given by

    𝒵_n^{*(2)}(R_n(k)) = 𝒵_n([(J_k + 𝒰)β_n] ∩ [R_n(k) - kβ_n + J_kβ_n]) ,   k ∈ 𝒦_n .

This is equivalent to selecting a random sample of size K with replacement from the collection of all nonoverlapping cubes {(j + 𝒰)β_n : j ∈ 𝒥_n} and defining a version of the Z(·)-process on each subregion R_n(k) by considering all data-values that lie on a congruent part of the resampled cube. The nonoverlapping block bootstrap version of T_n = t_n(𝒵_n; θ) is now given by

    T_n^{*(2)} = t_n(𝒵_n^{*(2)}(R_n); θ̂_n) ,          (12.10)

where 𝒵_n^{*(2)}(R_n) is obtained by concatenating the resampled blocks 𝒵_n^{*(2)}(R_n(k)), k ∈ 𝒦_n, and θ̂_n is a suitable estimator of θ that is defined by mimicking the relation between θ and 𝒵_n(R_n), as before.
For both versions of the spatial block bootstrap, we may define a "blocks of blocks" version for random variables that are (symmetric) functions of p-dimensional vectors of the form Y(s) = (Z(s + h_1), ..., Z(s + h_p))′, s ∈ R_{n,p} ∩ ℤ^d, for some p ∈ ℕ, where h_1, ..., h_p ∈ ℤ^d are given lag vectors and

    R_{n,p} ≡ {s ∈ ℝ^d : s + h_1, ..., s + h_p ∈ R_n} .

For example, consider the centered and scaled estimator

    T_n = |N_n(h)|^{1/2}(θ̂_n - θ) ,

where θ = Cov(Z(0), Z(h)) denotes the autocovariance of the spatial process at a given lag h ∈ ℤ^d \ {0}, θ̂_n = |N_n(h)|^{-1} Σ_{s∈N_n(h)} Z(s)Z(s + h) - (|N_n(h)|^{-1} Σ_{s∈N_n(h)} Z(s))^2 is a version of the sample autocovariance estimator, and N_n(h) = {s ∈ ℤ^d : s, s + h ∈ R_n}. Here, recall that, |A| denotes the size of a set A. Then, T_n is a function of the bivariate spatial process Y(s) = (Z(s), Z(s + h))′, s ∈ R_{n,2}, where the set R_{n,2} is given by

    R_{n,2} = {s ∈ ℝ^d : s, s + h ∈ R_n} = R_n ∩ (R_n - h) .

As in the time series case, the bootstrap version of such variables may be defined by using the vectorized process {Y(s) : s ∈ R_{n,2} ∩ ℤ^d}.
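For the lag-h autocovariance example above, θ̂_n can be computed directly on a fully observed rectangular grid as in the following sketch (d = 2; the slicing simply enumerates the pairs (s, s + h) with both sites in R_n). The function name is ours.

```python
import numpy as np

def sample_autocov(z, h):
    """Sample autocovariance theta_hat_n at lag h = (h1, h2) over N_n(h),
    for a fully observed rectangular grid z."""
    h1, h2 = h
    n1, n2 = z.shape
    # Z(s) and Z(s + h) for all s with both sites inside the region
    a = z[max(0, -h1):n1 - max(0, h1), max(0, -h2):n2 - max(0, h2)]
    b = z[max(0, h1):n1 + min(0, h1), max(0, h2):n2 + min(0, h2)]
    return (a * b).mean() - a.mean() ** 2
```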
Next we return to the case of a general p-dimensional vectorized process Y(·). Let T_{n,p} = t_n(𝒴_n; θ) be a random variable of interest, where 𝒴_n = {Y(s) : s ∈ R_{n,p} ∩ ℤ^d} and θ is a parameter. To define the overlapping bootstrap version of T_{n,p}, we introduce the partition {R_{n,p}(k) : k ∈ 𝒦_{n,p}} of R_{n,p} by cubes of the form (k + 𝒰)β_n, k ∈ ℤ^d, as before, where 𝒦_{n,p} = {k ∈ ℤ^d : (k + 𝒰)β_n ∩ R_{n,p} ≠ ∅}. Next, we resample |𝒦_{n,p}|-many indices randomly and with replacement from the collection ℐ_{n,p} ≡ {i ∈ ℤ^d : i + 𝒰β_n ⊂ R_{n,p}}, define a version of the Y-process on each subregion R_{n,p}(k), k ∈ 𝒦_{n,p}, as before, and then concatenate the resampled blocks of Y-values to define a version 𝒴_n^* of 𝒴_n over the region R_{n,p}. The "blocks of blocks" version of T_{n,p} is now given by

    T_{n,p}^* = t_n(𝒴_n^*; θ̂_n) ,          (12.11)

where θ̂_n is a suitable estimator of θ.

12.3.2 Numerical Examples


In this section, we illustrate the implementation of the spatial block bootstrap method with a numerical example. Let {Z(s) : s ∈ ℤ^2} be a zero mean stationary Gaussian process with the isotropic exponential variogram:

    2γ(h; θ) ≡ E(Z(h) - Z(0))^2 = θ_1 + θ_2 (1 - exp(-θ_3 ‖h‖))   for h ≠ 0 ,
    2γ(0; θ) = 0 ,   h ∈ ℤ^2 ,          (12.12)

where θ = (θ_1, θ_2, θ_3)′ ∈ [0, ∞) × (0, ∞) × (0, ∞) ≡ Θ. The variogram provides a description of the covariance structure of the spatial process Z(·). The parameter θ_1 is called the "nugget" effect, which often results from an additive white noise component of Z(·). The "isotropy" condition on the random field means that the variogram at lag h ∈ ℤ^2 \ {0} depends only on the distance ‖h‖ between the spatial indices of the variables Z(0) and Z(h), but not on the direction vector h/‖h‖. For more details on the variogram and its use in spatial statistics, see Cressie (1993) and the discussion in Section 12.4 below. Plots of the variogram (12.12) for the parameter values θ = (0, 2, 1)′ (with no nugget effect) and θ = (1, 1, 1)′ are given in Figure 12.2. Realizations of the Gaussian random field Z(·) were generated over a rectangular region of size 20 × 30 using these parameter values. The corresponding data sets are shown in Figures 12.3 and 12.4, respectively. Note that the surface corresponding to the "no-nugget" effect case (viz., θ_1 = 0) has less "small-scale variation" than the surface with a nonzero nugget effect (viz., θ_1 = 1).
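For reference, the variogram (12.12) is straightforward to code; the sketch below is only a helper for reproducing curves such as those in Figure 12.2 and assumes nothing beyond (12.12). The function name is ours.

```python
import numpy as np

def variogram(h, theta):
    """Isotropic exponential variogram 2*gamma(h; theta) of (12.12);
    theta = (theta1, theta2, theta3) with theta1 the nugget effect."""
    t1, t2, t3 = theta
    h = np.asarray(h, dtype=float)
    dist = np.linalg.norm(h, axis=-1)
    return np.where(dist == 0.0, 0.0, t1 + t2 * (1.0 - np.exp(-t3 * dist)))

# The two parameter settings used in the example:
# variogram([1, 0], (0.0, 2.0, 1.0)) and variogram([1, 0], (1.0, 1.0, 1.0))
```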

FIGURE 12.2. Plots of the isotropic variogram 2γ(h; θ) of (12.12) against ‖h‖ for θ = (0, 2, 1)′ (shown in solid line) and for θ = (1, 1, 1)′ (shown in dot-and-dash line).

To apply the spatial block bootstrap method, we identify R_n with the 20 × 30 rectangular region [-10, 10) × [-15, 15), and fix the scaling constant λ_n and the prototype set R_0 ⊂ (-1/2, 1/2]^2 as λ_n = 30 and R_0 = [-1/3, 1/3) × [-1/2, 1/2). Here R_0 is chosen to be a maximal set in (-1/2, 1/2]^2 that corresponds to the given rectangular region [-10, 10) × [-15, 15) up to a scaling constant. This, in turn, determines λ_n uniquely. We applied the block bootstrap method to each of the above data sets with two choices of β_n, given by β_n = 5 and 8. In the first case, 5 divides both 20 and 30, so that the partitioning subregions R_n(k) of (12.5) are all squares (and hence, are complete). Thus, there are 24 subregions in the partition (12.5), given by R_n(k) = [5k_1, 5k_1 + 5) × [5k_2, 5k_2 + 5), k = (k_1, k_2)′ ∈ ℤ^2,
FIGURE 12.3. Realizations of a zero mean unit variance Gaussian random field with variogram (12.12) over a 20 × 30 region on the planar integer grid for θ = (0, 2, 1)′ (with no nugget effect).

FIGURE 12.4. Realizations of a zero mean unit variance Gaussian random field with variogram (12.12) over a 20 × 30 region on the planar integer grid for θ = (1, 1, 1)′ with nugget effect θ_1 = 1.

-2 ≤ k_1 < 2, -3 ≤ k_2 < 3. To define the overlapping block bootstrap version of the Z(·)-process over R_n, we resample 24 times randomly, with replacement, from the collection of all observed complete blocks {i + [0, 5)^2 : i ∈ ℤ^2, i + [0, 5)^2 ⊂ R_n}.
For β_n = 8, there are 4 interior blocks of size 8 × 8, while there are 12 rectangular boundary blocks: 4 of size 8 × 7, 4 of size 2 × 8, and 4 of size 2 × 7. To define the bootstrap version of the Z(·)-process over these 16 subregions, we resample 16 blocks randomly with replacement from the collection of all observed complete blocks {i + [0, 8)^2 : i ∈ ℤ^2, i + [0, 8)^2 ⊂ R_n}, and use all observations from 4 of these for the 4 "complete" blocks of size 8 × 8 and use suitable parts of the remaining 12 blocks for the 12 boundary regions. For example, for the 8 × 7 region [0, 8) × [8, 15), we would use only the observations lying in [i_1, i_1 + 8) × [i_2, i_2 + 7) if the selected block is given by [i_1, i_1 + 8) × [i_2, i_2 + 8). Similarly, for the 2 × 8 region [-10, -8) × [-8, 0), we would use only the observations lying in [i_1 + 6, i_1 + 8) × [i_2, i_2 + 8), when the selected block is given by [i_1, i_1 + 8) × [i_2, i_2 + 8). When ∪{(k + 𝒰)β_n : k ∈ 𝒦_n} ≠ R_n, a simpler and valid alternative (not described in Section 12.3.1) is to use the complete sets of observations in all K (= 16 in the example, for β_n = 8) resampled blocks and define the bootstrap version of a random variable T_n = t(N; {Z(s_1), ..., Z(s_N)}, θ) as

    T_n^{**} = t(M; {Z^*(s_1), ..., Z^*(s_M)}, θ̂_n) ,

where {Z^*(s_1), ..., Z^*(s_M)} is the collection of all observations in the K-many resampled complete blocks, and where θ̂_n is an estimator of θ based on {Z(s_1), ..., Z(s_N)}. However, for the rest of this section, we continue to work with the original version of the block bootstrap method described in Section 12.3.1.
in Section 12.3.l.
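The following is a minimal sketch (not the book's code) of the overlapping spatial block bootstrap of Section 12.3.1 in Python/NumPy, for a real-valued field stored as a two-dimensional array. The function name and the array-based representation are illustrative assumptions: the partition of the grid is anchored at the array origin, so incomplete cells occur only along the far edges, and each cell is filled with the matching (upper-left) part of an independently resampled complete β × β block.

import numpy as np

def spatial_block_bootstrap(z, beta, rng=None):
    """Return one bootstrap copy of the gridded field z (2-D NumPy array)."""
    rng = np.random.default_rng(rng)
    n1, n2 = z.shape
    # upper-left corners of all complete (overlapping) beta x beta blocks
    corners = [(i, j) for i in range(n1 - beta + 1)
                      for j in range(n2 - beta + 1)]
    z_star = np.empty_like(z)
    # partition the grid into beta x beta cells anchored at (0, 0);
    # cells along the far edges may be incomplete ("boundary" cells)
    for i0 in range(0, n1, beta):
        for j0 in range(0, n2, beta):
            h = min(beta, n1 - i0)   # actual cell height
            w = min(beta, n2 - j0)   # actual cell width
            a, b = corners[rng.integers(len(corners))]
            # fill the cell with the matching part of a resampled block
            z_star[i0:i0 + h, j0:j0 + w] = z[a:a + h, b:b + w]
    return z_star

For the 20 × 30 example above, spatial_block_bootstrap(z, beta=5) draws 24 complete blocks, and beta=8 draws 16 blocks, using only the relevant sub-block for the incomplete cells.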
First we consider the problem of variance estimation by the overlapping block bootstrap method. Suppose that the level-2 parameter of interest is given by σ_n² = Var(T₁ₙ), the variance of the centered and scaled sample mean

T₁ₙ = λ_n^{d/2} (Z̄_n − μ)

(note that here d = 2 and μ = 0). To find the block bootstrap estimator σ̂_n²(β_n) of the parameter σ_n², note that by the linearity of the sample mean in the observations, we can write down an exact formula for σ̂_n²(β_n), as in the time series case. For later reference, we state the formula for the general case of a real-valued random field {Z(s) : s ∈ ℝ^d}. Let S_n(i; k) denote the sum of all observations in the ith block of "type k," B_n(i; k) ≡ [R_n(k) − kβ_n + i] ∩ [i + Uβ_n], for i ∈ I_n ≡ {j ∈ Z^d : j + Uβ_n ⊂ R_n}, k ∈ K_n. Then, the spatial bootstrap estimator of σ_n² is given by

σ̂_n²(β_n) = N⁻² λ_n^d [ |K₁ₙ| { |I_n|⁻¹ Σ_{i∈I_n} S_n(i; 0)² } + Σ_{k∈K₂ₙ} { |I_n|⁻¹ Σ_{i∈I_n} S_n(i; k)² } − N² μ̂_n² ],   (12.13)
where μ̂_n = N⁻¹ |I_n|⁻¹ { |K₁ₙ| Σ_{i∈I_n} S_n(i; 0) + Σ_{k∈K₂ₙ} Σ_{i∈I_n} S_n(i; k) }, and where K₁ₙ and K₂ₙ denote the sets of all interior and all boundary blocks, respectively. For the block size parameter β_n = 5, |K₁ₙ| = 24 and |K₂ₙ| = 0, while for β_n = 8, |K₁ₙ| = 4 and |K₂ₙ| = 12 in our example. The corresponding block bootstrap estimates are reported in Table 12.1. The true values of the level-2 parameter σ_n² ≡ λ_n^d Var(Z̄_n) and its limit σ_∞² are given by 8.833 and 9.761 under the variogram model (12.12) with θ = (0,2,1)'. The corresponding values of σ_n² and σ_∞² under θ = (1,1,1)' are given by 5.167 and 5.630, respectively.

TABLE 12.1. Bootstrap estimates σ̂_n²(β_n) of the level-2 parameter σ_n² = λ_n^d Var(Z̄_n) for the data sets of Figures 12.3 and 12.4 with block size parameter β_n = 5, 8.

   β_n    θ = (0,2,1)'    θ = (1,1,1)'
   5      5.950           4.469
   8      7.811           5.590

Next we apply the bootstrap method to estimate the distribution function of T₁ₙ. Note that under both θ-values, the true distribution of T₁ₙ is given by N(0, σ_n²), where σ_n² = 8.833 for θ = (0,2,1)' and σ_n² = 5.167 for θ = (1,1,1)'. Unlike the variance estimation case, the bootstrap estimators of P(T₁ₙ ≤ ·) do not admit a closed-form formula like (12.13) and have to be evaluated by the Monte-Carlo method, as in the time series case (cf. Section 4.5). Histograms corresponding to the block bootstrap distribution function estimators with block size parameter β_n = 5, 8 for the data set of Figure 12.3 are shown in the upper panel of Figure 12.5. The corresponding distribution functions are shown in the lower panel of Figure 12.5. Figure 12.6 gives the histograms and the distribution functions of the bootstrap estimates of P(T₁ₙ ≤ ·) for the data set of Figure 12.4. In both cases, we used B = 1000 bootstrap replicates to generate the Monte-Carlo approximation to P*(T₁ₙ* ≤ ·).
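A hedged sketch of this Monte-Carlo step is given below, reusing the spatial_block_bootstrap function sketched earlier in this section (any resampler with the same call signature would do). The helper name and the choice to center the replicates at their Monte-Carlo average (an approximation of E*Z̄_n*) are assumptions made for illustration only; the exact bootstrap expectation μ̂_n of (12.13) could be used instead.

import numpy as np

def bootstrap_distribution(z, beta, lam, B=1000, d=2, resampler=None, rng=None):
    """Return B replicates of T_1n* = lambda_n^{d/2}(Zbar_n* - E_* Zbar_n*)."""
    rng = np.random.default_rng(rng)
    scale = lam ** (d / 2.0)
    means = np.empty(B)
    for b in range(B):
        means[b] = resampler(z, beta, rng).mean()
    # center at the Monte-Carlo estimate of the bootstrap expectation
    return scale * (means - means.mean())

# usage (illustrative):
# t_star = bootstrap_distribution(z, beta=5, lam=30.0, B=1000,
#                                 resampler=spatial_block_bootstrap)
# t_star.var() is then a Monte-Carlo counterpart of the closed-form
# estimator (12.13), and a histogram of t_star approximates P_*(T_1n* <= .).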
In the next three sections, we study some theoretical properties of the
variance and the distribution function estimators, generated by the spatial
block bootstrap method.

12.3.3 Consistency of Bootstrap Variance Estimators


In this section, we show that the spatial block bootstrap method can be
used to derive consistent estimators of the variance of the sample mean,
and more generally, of statistics that are smooth functions of sample means.
Suppose that the random field {Z(i) : i ∈ Z^d} is m-dimensional.
FIGURE 12.5. Histograms and cumulative distribution functions of the block bootstrap estimates of P(λ_n(Z̄_n − μ) ≤ ·) for the data set of Figure 12.3. The left histogram corresponds to β_n = 5 and the right one to β_n = 8. The true distribution of λ_n(Z̄_n − μ) is given by N(0, σ_n²) with σ_n² = 8.833, and is denoted by the solid line in the lower panel.

Let Z̄_n = N⁻¹ Σ_{i=1}^N Z(s_i) denote the sample mean and let θ̂_n = H(Z̄_n) be an estimator of a level-1 parameter of interest θ = H(μ), where μ = EZ(0) and H : ℝ^m → ℝ is a smooth function. As in the time series case, by considering suitable transformations of the observations, one can express many common estimators under this spatial version of the Smooth Function Model.
FIGURE 12.6. Histograms and cumulative distribution functions of the block bootstrap estimates of P(T₁ₙ ≤ ·) for the data set of Figure 12.4. The left histogram corresponds to β_n = 5 and the right one to β_n = 8. The true distribution is given by N(0, σ_n²) with σ_n² = 5.167, and is denoted by the solid line in the lower panel.

Let Z̄_n* denote the block bootstrap sample mean based on a block-size parameter β_n. Then, the bootstrap version of θ̂_n is given by θ̂_n* = H(Z̄_n*), and the bootstrap estimator of the level-2 parameter σ_n² ≡ λ_n^d Var(θ̂_n) is given by

σ̂_n²(β_n) = λ_n^d Var*(θ̂_n*).   (12.14)
For establishing consistency of σ̂_n²(β_n), we shall assume that the random field {Z(s) : s ∈ ℝ^d} satisfies a certain weak-dependence condition. The weak-dependence condition will be specified through a spatial version of the strong-mixing condition. For S₁, S₂ ⊂ ℝ^d, let

α₁(S₁, S₂) = sup { |P(A ∩ B) − P(A)P(B)| : A ∈ F_Z(S₁), B ∈ F_Z(S₂) },   (12.15)

where F_Z(S) is the σ-algebra generated by the random variables {Z(s) : s ∈ S ∩ Z^d}, S ⊂ ℝ^d. Let dist(S₁, S₂) = inf{|x − y| : x ∈ S₁, y ∈ S₂} denote the distance between the sets S₁ and S₂ in the ℓ¹-norm on ℝ^d, defined by |x| = |x₁| + ··· + |x_d| for x = (x₁, ..., x_d)' ∈ ℝ^d. The mixing coefficient of Z(·) is defined as

α(a; b) = sup { α₁(S₁, S₂) : dist(S₁, S₂) ≥ a, S₁, S₂ ∈ R(b) }   (12.16)

for a > 0, b ≥ 1, where R(b) is the collection of all sets in ℝ^d that have a volume of b or less and that can be represented as unions of up to ⌈b⌉ many cubes.
Many variants of the strong-mixing coefficient have been proposed and used in the literature, where the suprema in (12.16) are taken over various classes of sets S₁, S₂. In (12.16), we restrict attention to sets S₁ and S₂ that are finite unions of d-dimensional cubes and have a finite volume. As a result, the sets S₁ and S₂ are bounded subsets of ℝ^d. This restriction is important in dimensions d ≥ 2. Some important results of Bradley (1989, 1993) show that a random field in ℝ^d, d ≥ 2, with a strong-mixing coefficient satisfying

lim_{a→∞} α(a; ∞) = 0   (12.17)

is also ρ-mixing. Thus, if one allows unbounded sets S₁, S₂ in (12.16), then random fields satisfying (12.17) necessarily belong to the smaller class of ρ-mixing random fields. For more discussion of various mixing coefficients for random fields, see Doukhan (1994).
The following result proves consistency of the bootstrap variance estima-
tors.

Theorem 12.1 Suppose that the random field {Z(i) : i ∈ Z^d} is stationary with E||Z(0)||^{6+δ} < ∞ and with the strong mixing coefficient α(a, b) satisfying

α(a, b) ≤ C a^{−τ₁} b^{τ₂},  a > 0, b ≥ 1,   (12.18)

for some δ > 0, τ₁ > 5d(6+δ)/δ, and 0 ≤ τ₂ ≤ τ₁/d. Also, suppose that H is continuously differentiable and the partial derivatives D^α H(·), |α| = 1, satisfy a Hölder condition of order η ∈ (0,1]. If, in addition, β_n⁻¹ + λ_n⁻¹β_n = o(1), then

σ̂_n²(β_n) →_p σ_∞²  as n → ∞,   (12.19)
where σ̂_n²(β_n) is as defined in (12.14), σ_∞² ≡ lim_{n→∞} λ_n^d Var(θ̂_n) = [vol(R₀)]⁻¹ Σ_{i∈Z^d} EW(0)W(i), and where W(i) = Σ_{|α|=1} D^α H(μ)(Z(i) − μ)^α.
For proving the result, we need the following moment bound on partial sums of a possibly nonstationary random field, which is a special case of a result stated in Doukhan (1994).

Lemma 12.1 Let {Y(i), i ∈ Z^d} be a random field with EY(i) = 0 and E|Y(i)|^{2q+δ} < ∞ for all i ∈ Z^d, for some q ∈ N and δ ∈ (0, ∞). Let α_Y(a; b) denote the strong-mixing coefficient of Y(·), defined by (12.15) and (12.16) with the Z(i)'s replaced by the Y(i)'s. Suppose that

Σ_{r=0}^∞ (r + 1)^{d(2q−1)−1} [α_Y(r; 2q)]^{δ/(2q+δ)} < ∞.   (12.20)

Then, for any subset A of Z^d,

E | Σ_{i∈A} Y(i) |^{2q}   (12.21)
   ≤ C(q, δ) max { [ Σ_{i∈A} (E|Y(i)|^{2+δ})^{2/(2+δ)} ]^q , Σ_{i∈A} (E|Y(i)|^{2q+δ})^{2q/(2q+δ)} },

where C(q, δ) is a constant that depends only on q, δ, d, and the α-mixing coefficient α_Y(·; ·), but not on the subset A.
Proof: Follows from Theorem 1.4.1.1, Doukhan (1994). □

Proof of Theorem 12.1: Let μ̃_n = E*Z̄_n*. By Taylor's expansion, we get

θ̂_n* = H(Z̄_n*)
     = H(μ̃_n) + Σ_{|ν|=1} D^ν H(μ̃_n)(Z̄_n* − μ̃_n)^ν + Q₁ₙ
     = H(μ̃_n) + Σ_{|ν|=1} D^ν H(μ)(Z̄_n* − μ̃_n)^ν + Q₂ₙ,   (12.22)

where, by the Hölder condition on the partial derivatives of H,

|Q₁ₙ| ≤ C ||Z̄_n* − μ̃_n||^{1+η},
|Q₂ₙ| ≤ |Q₁ₙ| + C ||μ̃_n − μ||^{1+η}   (12.23)

for some nonrandom C = C(η, d) ∈ (0, ∞).
Next, for i ∈ I_n, k ∈ K_n, let B_n(i; k) denote the ith block of "type k" defined by B_n(i; k) = R_n(k) − kβ_n + i. Also, let S_n(i; k) denote the sum over all W(j) with j ∈ B_n(i; k), i ∈ I_n, k ∈ K_n, and let S_n*(k) denote the sum of the W(j)'s in the kth resampled block, k ∈ K_n. Write L_k
for the number of data sites in R_n(k), k ∈ K_n. Then, using (12.22) and the independence of the resampled blocks, we get

Var*(θ̂_n*) = Var*( Σ_{|ν|=1} D^ν H(μ)(Z̄_n* − μ̃_n)^ν ) + Q₃ₙ
           = N⁻² Var*( Σ_{k∈K_n} S_n*(k) ) + Q₃ₙ
           = N⁻² Σ_{k∈K_n} Var*( S_n*(k) ) + Q₃ₙ
           = N⁻² [ Σ_{k∈K_n} E*S_n*(k)² ] + Q₄ₙ,   (12.24)

where, by the Cauchy-Schwarz inequality, the remainder terms Q₃ₙ and Q₄ₙ satisfy the inequalities

|Q₃ₙ| ≤ E*Q₂ₙ² + 2 {E*Q₂ₙ²}^{1/2} { N⁻² Σ_{k∈K_n} E*S_n*(k)² }^{1/2},
|Q₄ₙ| ≤ |Q₃ₙ| + C(d) · [ N⁻² Σ_{k∈K_n} L_k² ] · ||μ̃_n − μ||².   (12.25)

Note that by (12.3) and the fact that |K₁ₙ| ~ vol(R₀) · (λ_n/β_n)^d, we have

λ_n^d E[ N⁻² Σ_{k∈K₁ₙ} E*S_n*(k)² ]
   = λ_n^d N⁻² · |K₁ₙ| · E[ |I_n|⁻¹ Σ_{i∈I_n} S_n(i; 0)² ]
   = λ_n^d |K₁ₙ| · N⁻² · E[ Σ_{i∈β_nU∩Z^d} W(i) ]²
   → σ_∞²  as n → ∞.   (12.26)

Also, by the boundary condition on R₀, |K₂ₙ| = O([λ_n/β_n]^{d−1}) as n → ∞. Hence, by Lemma 12.1,

λ_n^d E[ N⁻² Σ_{k∈K₂ₙ} E*S_n*(k)² ]
   = λ_n^d N⁻² Σ_{k∈K₂ₙ} E[ |I_n|⁻¹ Σ_{i∈I_n} S_n(i; k)² ]
   = O( λ_n^d N⁻² · |K₂ₙ| · β_n^d )
   = O( β_n/λ_n )  as n → ∞.   (12.27)
Next, define S̃_n*(k) and S̃_n(i; k) by replacing the W(i)'s in the definitions of S_n*(k) and S_n(i; k) by the Z(i)'s. Note that μ̃_n can be expressed as μ̃_n = N⁻¹ Σ_{r=1}^N w_{rn} Z(s_r) for some nonrandom weights w_{rn} ∈ [0,1]. Then, by Lemma 12.1, (5.11), and arguments similar to (12.26) and (12.27),

E{ E*||Z̄_n* − μ̃_n||⁴ }
   ≤ C(d) N⁻⁴ E{ Σ_{k∈K_n} E*||S̃_n*(k) − L_k μ̃_n||⁴ + ( Σ_{k∈K_n} E*||S̃_n*(k) − L_k μ̃_n||² )² }
   ≤ C(d) N⁻⁴ |K_n|² max { E||S̃_n(i; k) − L_k μ||⁴ + L_k E||μ̃_n − μ||⁴ : i ∈ I_n, k ∈ K_n }
   ≤ C(d) N⁻⁴ |K_n|² β_n^{2d}
   = O( λ_n^{−2d} ).   (12.28)

Hence, E[E*Q₂ₙ²] ≤ C[ (E{E*||Z̄_n* − μ̃_n||⁴})^{(1+η)/2} + (E||μ̃_n − μ||⁴)^{(1+η)/2} ] = O(λ_n^{−d(1+η)}) as n → ∞. Also, note that Σ_{k∈K_n} L_k² ≤ max{L_k : k ∈ K_n} × [ Σ_{k∈K_n} L_k ] ≤ β_n^d N. Hence, by (12.23), (12.26), (12.27), and (12.28), it follows that

E|Q₃ₙ| ≤ E{E*Q₂ₙ²} + 2[ E{E*Q₂ₙ²} ]^{1/2} [ N⁻² E{ Σ_{k∈K_n} E*S_n*(k)² } ]^{1/2},

which in turn implies that

λ_n^d E|Q₄ₙ| = o(1)  as n → ∞.   (12.29)

As a result, by (12.24), (12.26), (12.27), and (12.29), it remains to show that

N⁻¹ Σ_{k∈K₁ₙ} [ E*S_n*(k)² − E{E*S_n*(k)²} ] →_p 0  as n → ∞.   (12.30)

As in the time series case, we do this by regrouping the squared block sums S_n(i; k)², i ∈ I_n, k ∈ K₁ₙ. Note that E*S_n*(k)² = E*S_n*(0)² = |I_n|⁻¹ Σ_{i∈I_n} S_n(i; 0)² for all k ∈ K₁ₙ. Let J_n = {j ∈ Z^d : [j + U]2β_n ∩ R_n ≠ ∅} and, for h ∈ {0,1}^d, define J_n(h) = {j ∈ J_n : (j − h)/2 ∈ Z^d}. Thus,
J_n is the index set of a partition of R_n by cubes of sides 2β_n and, for each h ∈ {0,1}^d, J_n(h) is the subset of J_n consisting of integral vectors of "type h". For example, with h = 0, J_n(0) is the set of all vectors j in J_n such that all d coordinates of j are even integers. Similarly, with h = (1,0,...,0)', every j ∈ J_n((1,0,...,0)') has an odd integer in its first coordinate and even integers in the remaining (d−1) coordinates. For j ∈ J_n, let V_n(j) denote the sum over all [S_n(i; 0)² − ES_n(i; 0)²] such that i ∈ [j + U]2β_n. Set V_n(j) = 0 if I_n ∩ [j + U]2β_n = ∅. Note that the ℓ¹-distance between the regrouped blocks ∪{B_n(i; 0) : i ∈ [j + U]2β_n} and ∪{B_n(i; 0) : i ∈ [k + U]2β_n} of "type h" for any two distinct indices j ≠ k ∈ J_n(h) is at least |j − k| · β_n. Hence, using Hölder's inequality and Lemma 12.1, we have

E[ N⁻¹ Σ_{k∈K₁ₙ} { E*S_n*(k)² − E(E*S_n*(k)²) } ]²
   = N⁻² |K₁ₙ|² |I_n|⁻² E{ Σ_{h∈{0,1}^d} Σ_{j∈J_n(h)} V_n(j) }²
   ≤ C(d) N⁻² |K₁ₙ|² |I_n|⁻² Σ_{h∈{0,1}^d} E{ Σ_{j∈J_n(h)} V_n(j) }²
   ≤ C(d) N⁻² |K₁ₙ|² |I_n|⁻² |J_n| max { (E|V_n(j)|³)^{2/3} : j ∈ J_n }
   = O( N⁻² · [λ_n^d/β_n^d]² · (λ_n^d)⁻² · [λ_n^d/(2β_n)^d] · (2β_n^d)² · {β_n^{3d}}^{2/3} )
   = O( λ_n^{−d} β_n^d ).

This proves (12.30). Hence, the proof of Theorem 12.1 is completed. □

An inspection of the proofs of Theorem 12.1 and Theorem 3.1 (on consistency of the MBB variance estimator for time series data) shows that the consistency of the spatial block bootstrap variance estimator may be established under reduced moment conditions by using suitable truncations of the variables S_n(i; k) in the proof of (12.30) and elsewhere. However, we avoid the truncation step here in order to keep the proof simple. It follows that the spatial block bootstrap variance estimator is consistent whenever the block-size parameter β_n satisfies β_n⁻¹ + λ_n⁻¹β_n = o(1) as n → ∞. Going through the proof of Theorem 12.1, we also see that the leading term in the variance part of the bootstrap variance estimator, σ̂_n²(β_n) ≡ λ_n^d Var*(θ̂_n*), is determined by Var(λ_n^d N⁻² Σ_{k∈K₁ₙ} E*S_n*(k)²), where S_n*(k) is the sum
of the variables Σ_{|ν|=1} D^ν H(μ)(Z(s_i) − μ)^ν over s_i in the kth resampled block. As in the time series case, this term increases as the block size parameter β_n increases. On the other hand, the leading term in the bias part of σ̂_n²(β_n) is determined by the difference

N⁻² λ_n^d E[ Σ_{k∈K_n} E*S_n*(k)² ] − σ_∞²
   = N⁻² λ_n^d E[ Σ_{k∈K₁ₙ} E*S_n*(k)² + Σ_{k∈K₂ₙ} E*S_n*(k)² ] − σ_∞².

As (12.27) shows, the contribution from the boundary subregions to the bootstrap variance estimator, viz., B₂ₙ ≡ N⁻² λ_n^d Σ_{k∈K₂ₙ} E{E*S_n*(k)²}, vanishes asymptotically, at the rate O(β_n/λ_n) as n → ∞. However, the exact rate at which B₂ₙ goes to zero depends heavily on the geometry of the boundary of R₀ and is difficult to determine without additional restrictions on the prototype set R₀ when d ≥ 2. To appreciate why, note that in the one-dimensional case, the number of boundary blocks is at most two (according to our formulation here) and, hence, is bounded. However, in dimensions d ≥ 2, it grows to infinity at the rate O([λ_n/β_n]^{d−1}). As a result, the contributions from the "incomplete" boundary blocks play a nontrivial role in higher dimensions. In contrast, the behavior of the first term, arising from the interior blocks, viz., B₁ₙ ≡ N⁻² λ_n^d E{ Σ_{k∈K₁ₙ} E*S_n*(k)² }, can be determined for a general prototype set R₀ solely under the boundary condition, Condition B.
The discussion of the previous paragraph suggests that we may settle for an alternative bootstrap variance estimator of σ_∞² that is based on the "bootstrap observations" over the interior blocks {R_n(k) : k ∈ K₁ₙ} only. Let N₁ ≡ N₁ₙ = |K₁ₙ| β_n^d denote the total number of data-values in the resampled "complete" blocks, k ∈ K₁ₙ, and let Z̄_n** be the average of these N₁ resampled values. Then, we define the bootstrap version of θ̂_n based on the complete blocks as θ̂_n** = H(Z̄_n**) and the corresponding variance estimator of σ_∞² as

σ̂²₁ₙ(β_n) ≡ λ_n^d Var*(θ̂_n**).   (12.31)
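For the sample-mean case (H(x) = x, d = 2), the estimator based on the complete blocks has a simple closed form, since the resampled complete blocks are drawn independently and uniformly from the observed complete block sums. The short sketch below (an illustration, not the book's code; the function name and the array representation are assumptions) computes it for gridded data.

import numpy as np

def sigma2_complete_blocks(z, beta, lam):
    """Closed-form sigma^2_1n(beta) of (12.31) for the sample mean, d = 2."""
    n1, n2 = z.shape
    # sums over all complete overlapping beta x beta ("type 0") blocks
    s = np.array([z[i:i + beta, j:j + beta].sum()
                  for i in range(n1 - beta + 1)
                  for j in range(n2 - beta + 1)])
    var_s = s.var()                      # Var_* of one resampled block sum
    k1 = (n1 // beta) * (n2 // beta)     # number of complete partition cells
    n_1 = k1 * beta ** 2                 # N_1 = |K_1n| * beta^d
    var_mean = k1 * var_s / n_1 ** 2     # Var_* of the bootstrap mean Zbar**
    return lam ** 2 * var_mean           # multiply by lambda_n^d with d = 2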
In the context of applying the MBB to a time series data set of size n, this definition corresponds to the case where we resample b = ⌊n/ℓ⌋ "complete" blocks of length ℓ and define the bootstrap variance estimator in terms of a resample of size n₁ = bℓ only, ignoring the last few boundary values (if any) in the bootstrap reconstruction of the chain. For the modified estimator σ̂²₁ₙ(β_n), we can refine the error bounds in the proof of Theorem 12.1 to obtain an expansion for its MSE. Indeed, applying the results of Nordman and Lahiri (2003a, 2003b) to the leading term in the variance of σ̂²₁ₙ(β_n), we get
Var( σ̂²₁ₙ(β_n) ) = (β_n^d/λ_n^d) · [ (2/3)^d · 2σ_∞⁴ / vol(R₀) ] · (1 + o(1))
                ≡ (β_n^d/λ_n^d) · γ₁² · (1 + o(1)), say.   (12.32)

Next, using arguments as in (12.26), we see that the bias part of σ̂²₁ₙ(β_n) is given by

−[β_n vol(R₀)]⁻¹ Σ_{i∈Z^d} |i| σ_W(i) + o(β_n⁻¹) ≡ β_n⁻¹ γ₂ + o(β_n⁻¹), say,   (12.33)

where σ_W(i) = Cov(W(0), W(i)), i ∈ Z^d, and |i| = |i₁| + ··· + |i_d| for i = (i₁, ..., i_d)' ∈ Z^d. Combining these, we have

MSE( σ̂²₁ₙ(β_n) ) = γ₁² (β_n^d/λ_n^d) + γ₂² β_n⁻² + o( β_n^d λ_n^{−d} + β_n⁻² ).   (12.34)

Now, minimizing the leading terms in the expansion above, we get the first-order optimal block size for estimating σ_n² (or σ_∞²) as

β_n⁰ = [ 2γ₂² / (d γ₁²) ]^{1/(d+2)} λ_n^{d/(d+2)} (1 + o(1)).   (12.35)

Note that for d = 1 and R₀ = (−1/2, 1/2], the constants γ₁² and γ₂ in (12.32) and (12.33) are respectively given by γ₁² = (1/3) · [2σ_∞²]² and γ₂ = −2 Σ_{i=1}^∞ i σ_W(i) and, hence, the formula for the MSE-optimal block length coincides with that given in Chapter 5. In particular, the optimal block length β_n⁰ for variance estimation grows at the rate O(N^{1/3}) for d = 1. For d = 2, the optimal rate of the volume of the blocks (viz., (β_n⁰)^d) is O(N^{1/2}), while for d = 3 it is O(N^{3/5}), where N is the sample size. As d/(d+2) is an increasing function of d, (12.35) shows that one must employ blocks of larger volumes in higher dimensions to achieve the best possible performance of the bootstrap variance estimators.
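A small helper, assuming the expansion (12.34) as reconstructed above, evaluates the first-order optimal block size (12.35) from given (or pilot-estimated) constants γ₁² and γ₂²; the function name and interface are illustrative.

def optimal_block_size(gamma1_sq, gamma2_sq, lam, d):
    """First-order MSE-optimal beta_n^0 of (12.35)."""
    return (2.0 * gamma2_sq / (d * gamma1_sq)) ** (1.0 / (d + 2)) * lam ** (d / (d + 2))

# For d = 1 this grows like lam**(1/3) (i.e., like N**(1/3)); for d = 2 the
# block volume beta**2 grows like N**(1/2), matching the rates quoted above.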
In the next two sections, we consider validity of approximations generated
by the spatial bootstrap method for estimating the sampling distributions
of some common estimators.

12.3.4 Results on the Empirical Distribution Function


We now discuss consistency properties of the spatial block bootstrap
method for the empirical distribution function of the data. As in the case of
time series data, many common estimators used in the analysis of spatial
data may be expressed as smooth functionals of the empirical distribu-
tion of certain multidimensional spatial processes. As a result, here we
suppose that the spatial process Z(·) is an m-dimensional (m ∈ N) stationary process with components Z₁(·), ..., Z_m(·). Thus, the observations
are given by Z(s_i) = (Z₁(s_i), ..., Z_m(s_i))', i = 1, ..., N, where the data locations {s₁, ..., s_N} lie on the integer grid Z^d inside the sampling region R_n (cf. (12.2)). Let F_n^{(m)}(·) denote the empirical distribution function of Z(s₁), ..., Z(s_N), defined by

F_n^{(m)}(z) = N⁻¹ Σ_{i=1}^N 1(Z(s_i) ≤ z),  z ∈ ℝ^m,   (12.36)

where, recall that for two vectors x = (x₁, ..., x_m)' ∈ ℝ^m and y = (y₁, ..., y_m)' ∈ ℝ^m, we write x ≤ y if x_i ≤ y_i for all 1 ≤ i ≤ m. Let G^{(m)}(z) = P(Z(0) ≤ z), z ∈ ℝ^m, denote the marginal distribution function of the process Z(·) under stationarity. Define the empirical process

ξ_n^{(m)}(z) = λ_n^{d/2} ( F_n^{(m)}(z) − G^{(m)}(z) ),  z ∈ ℝ^m.   (12.37)

Because the sample size N grows at the rate [vol(R₀) · λ_n^d] (cf. (12.3)), an alternative scaling sequence for the difference F_n^{(m)}(·) − G^{(m)}(·) is given by the more familiar choice √N. However, in the context of spatial asymptotics, λ_n^{d/2} happens to be the correct scaling sequence even in the presence of partial infilling (cf. Zhu and Lahiri (2001)), while the scaling N^{1/2} is inappropriate in the presence of infilling. As a result, we shall use λ_n^{d/2} as the scaling sequence here.
Next, we define the bootstrap version of ξ_n^{(m)}. Let Z_n*(R_n) = {Z*(s₁), ..., Z*(s_N)} denote the block bootstrap version of the process {Z(s) : s ∈ R_n ∩ Z^d}, based on a block size parameter β_n. Let F_n^{(m)*}(z) = N⁻¹ Σ_{i=1}^N 1(Z*(s_i) ≤ z), z ∈ ℝ^m, be the empirical distribution function of {Z*(s₁), ..., Z*(s_N)}. Then, the block bootstrap version of ξ_n^{(m)} is given by

ξ_n^{(m)*}(z) = λ_n^{d/2} ( F_n^{(m)*}(z) − E*F_n^{(m)*}(z) ),  z ∈ ℝ^m.   (12.38)

To establish the weak convergence of the processes ξ_n^{(m)} and ξ_n^{(m)*}, we consider the space D_m of real-valued functions on [−∞, ∞]^m that are continuous from above and have finite limits from below. We equip D_m with the extended Skorohod J₁-topology (cf. Bickel and Wichura (1971)). Then, both ξ_n^{(m)} and ξ_n^{(m)*} are D_m-valued random variables. The following result asserts that under some regularity conditions, the sequence {ξ_n^{(m)}}_{n≥1} converges in distribution to a nondegenerate Gaussian process as D_m-valued random variables and that the bootstrapped empirical process ξ_n^{(m)*} also has the same limit, almost surely. Let →_d denote convergence in distribution of D_m-valued random variables under the given extended Skorohod J₁-topology and let C_m denote the collection of all continuous functions from [−∞, ∞]^m to ℝ that vanish at (−∞, ..., −∞)' and (∞, ..., ∞)'. Also, let α(a; b) denote the strong mixing coefficient of the vector random field Z(·), as defined in (12.15) and (12.16).
Theorem 12.2 Suppose that {Z(s) : s ∈ Z^d} is a stationary vector-valued random field with components Z₁(s), ..., Z_m(s), s ∈ Z^d, such that
(i) G_i(a) ≡ P(Z_i(0) ≤ a), a ∈ ℝ, is continuous on ℝ, i = 1, ..., m, and
(ii) α(a, b) ≤ C₁ exp(−C₂a) · b^{τ₂} for all a ≥ 1, b ≥ 1, for some constants C₁, C₂ ∈ (0, ∞) and 0 ≤ τ₂ ≤ 2.
Also, suppose that Condition B holds and that β_n satisfies

   (12.39)

for some ε ∈ (0,1). Let W^{(m)} be a zero mean Gaussian process on [−∞, ∞]^m with P(W^{(m)} ∈ C_m) = 1 and with covariance function

Cov( W^{(m)}(z₁), W^{(m)}(z₂) )   (12.40)
   = [vol(R₀)]⁻¹ Σ_{i∈Z^d} { P(Z(0) ≤ z₁, Z(i) ≤ z₂) − G^{(m)}(z₁) G^{(m)}(z₂) },

z₁, z₂ ∈ ℝ^m. Then,

ξ_n^{(m)} →_d W^{(m)}  as n → ∞,

and

ξ_n^{(m)*} →_d W^{(m)}  as n → ∞, a.s.

Proof: This is a special case of Theorem 3.3 of Zhu and Lahiri (2001), who establish the theorem under a polynomial strong-mixing condition. Here, we use the exponential mixing condition only to simplify the statement of Theorem 12.2. See Zhu and Lahiri (2001) for details. □

An immediate consequence of Theorem 12.2 is that for any Borel subset A of D_m with P(W^{(m)} ∈ ∂A) = 0, where ∂A denotes the boundary of A, we have

| P( ξ_n^{(m)} ∈ A ) − P*( ξ_n^{(m)*} ∈ A ) | → 0  as n → ∞, a.s.   (12.41)

Thus, we may approximate the probability P(ξ_n^{(m)} ∈ A) by its bootstrap estimator P*(ξ_n^{(m)*} ∈ A) for almost all realizations of the process {Z(i) : i ∈ Z^d}, without having to explicitly estimate the covariance function of the limiting process W^{(m)}, given by (12.40). In particular, if Y : D_m → S is a Borel-measurable function from D_m to some complete and separable metric space S that is continuous over C_m, then by Theorem 12.2 and the continuous mapping theorem (cf. Billingsley (1968), Theorem 5.1),

| P( Y(ξ_n^{(m)}) ∈ B ) − P*( Y(ξ_n^{(m)*}) ∈ B ) | → 0  as n → ∞, a.s.   (12.42)


for any Borel subset B of S with P(Y(W^{(m)}) ∈ ∂B) = 0. Because the exact distribution of Y(ξ_n^{(m)}) or of Y(W^{(m)}) may have a complicated form for certain functionals Y(·), (12.42) provides an effective way of approximating the large sample distribution of Y(ξ_n^{(m)}), without further analytical considerations. As an example, suppose that m = 1 and that we want to set a simultaneous confidence band for the unknown marginal distribution function G(z) ≡ G^{(1)}(z) = P(Z(0) ≤ z), z ∈ ℝ, of the process Z(·). Then, we take Y(g) = ||g||_∞, g ∈ D₁, where for any function g : [−∞, ∞] → ℝ, we write ||g||_∞ = sup{|g(x)| : x ∈ [−∞, ∞]}. It is easy to check that this Y(·) is continuous on C₁ and, hence, (12.42) holds. For 0 < α < 1, let q̂_{n,α} denote the α-th quantile of the bootstrap distribution function estimator P*(||ξ_n^{(1)*}||_∞ ≤ ·), i.e.,

q̂_{n,α} = inf { a ∈ ℝ : P*( ||ξ_n^{(1)*}||_∞ ≤ a ) ≥ α }.

Then, a 100(1−α)% large sample confidence region for G(·) is given by

I_n(α) = { F : F is a distribution function on ℝ and ||F_n^{(1)} − F||_∞ ≤ λ_n^{−d/2} q̂_{n,1−α} },   (12.43)

which, by (12.42), attains the desired confidence level (1−α) asymptotically. Note that in this case, the traditional large sample confidence region for G(·) uses the (1−α)-th quantile of the distribution of ||W^{(1)}||_∞, for which no closed-form expression seems to be available. In the special case where d = 1, G is the uniform distribution on [0,1], and the Z(i)'s are independent, W^{(1)} reduces to W, the Brownian Bridge on [0,1]. Although an explicit form of the distribution of ||W||_∞ is known (cf. Chapter 11, Billingsley (1968)), it has a very complicated structure that makes computation of the quantiles an arduous task. In comparison, the block bootstrap confidence region I_n(α) may be found for any d ≥ 1, even in the presence of spatial dependence, without the analytical consideration required for deriving an explicit expression for the (1−α)-th quantile of the distribution of ||W||_∞ and without having to estimate the unknown population parameters that appear in this expression.
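A hedged sketch of the band (12.43) for m = 1 and gridded data is given below. It reuses a resampler with the signature of the spatial_block_bootstrap sketch above; the sup-norm is evaluated on the grid of observed values, and the bootstrap empirical processes are centered at the Monte-Carlo approximation of E*F_n^{(1)*}. All names and these simplifications are assumptions made for illustration.

import numpy as np

def bootstrap_band(z, beta, lam, alpha=0.05, B=1000, d=2, resampler=None, rng=None):
    """Half-width of the simultaneous band (12.43) for G, plus the EDF F_n^(1)."""
    rng = np.random.default_rng(rng)
    grid = np.sort(z.ravel())
    n = grid.size
    scale = lam ** (d / 2.0)
    F_hat = np.searchsorted(grid, grid, side="right") / n        # F_n^(1) on the grid
    F_star = np.empty((B, n))
    for b in range(B):
        zs = np.sort(resampler(z, beta, rng).ravel())
        F_star[b] = np.searchsorted(zs, grid, side="right") / n
    center = F_star.mean(axis=0)                                  # approx. E_* F_n^(1)*
    sup_stat = scale * np.abs(F_star - center).max(axis=1)        # ||xi_n^(1)*||_inf
    q = np.quantile(sup_stat, 1 - alpha)
    return F_hat, q / scale, grid     # band: |F_n^(1)(z) - F(z)| <= q/scale for all z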
In the next section, we consider properties of the spatial bootstrap method for estimators that are smooth functionals of the m-dimensional empirical distribution F_n^{(m)}.

12.3.5 Differentiable Functionals


As in Section 4.4, validity of the spatial bootstrap method for the m-dimensional empirical process readily allows us to establish its validity for approximating the distributions of estimators that may be represented as smooth functionals of the empirical distribution. Here we will consider a form of differentiability condition, known as Hadamard differentiability, which is a weaker condition than the Fréchet differentiability condition of Section 4.4. Specialized to d = 1, i.e., the time series case, these results also imply the validity of the MBB for Hadamard differentiable functionals. Here we follow van der Vaart and Wellner (1996) to define Hadamard differentiability.

Definition 12.1 Let D⁰ be a subset of D_m and D⁺ be a subspace of D_m. Then, a mapping Y : D⁰ → ℝ^p, p ∈ N, is called Hadamard differentiable tangentially to D⁺ at g₀ ∈ D⁰ if there exists a continuous linear mapping Y^{(1)}(g₀; ·) : D⁺ → ℝ^p such that

[ Y(g₀ + a_n f_n) − Y(g₀) ] / a_n → Y^{(1)}(g₀; f)   (12.44)

for all converging sequences a_n → 0 and f_n → f with f ∈ D⁺ and g₀ + a_n f_n ∈ D⁰ for all n ≥ 1. When D⁺ = D_m, Y is simply called Hadamard differentiable at g₀. The linear function Y^{(1)}(g₀; ·) is called the Hadamard derivative of Y at g₀.
When D⁺ = D_m, i.e., when the derivative Y^{(1)}(g₀; ·) is defined on all of D_m, (12.44) is equivalent to requiring that for any a_n → 0 and any compact set K⁰ of D_m,

sup_{f ∈ K⁰, g₀ + a_n f ∈ D⁰} || Y(g₀ + a_n f) − Y(g₀) − a_n Y^{(1)}(g₀; f) || = o(a_n)  as n → ∞,   (12.45)

where ||·|| denotes the usual Euclidean norm on ℝ^p. In comparison, Fréchet differentiability of Y at g₀ requires (12.45) to be valid for all bounded sets K⁰ ⊂ D_m. As a result, Fréchet differentiability of a functional is a stronger condition than Hadamard differentiability.
Hadamard differentiability of M-estimators and other important statistical functionals has been investigated by many authors; see Reeds (1976), Fernholz (1983), Ren and Sen (1991, 1995), van der Vaart and Wellner (1996), and the references therein. The following result proves the validity of the spatial bootstrap for Hadamard differentiable functionals. Here, we shall always assume that the domain D⁰ of definition of the functional Y is large enough that G^{(m)}, F_n^{(m)}(·), F_n^{(m)*}(·), E*F_n^{(m)*}(·) ∈ D⁰ (with probability one). This ensures that the estimators Y(F_n^{(m)}), Y(E*F_n^{(m)*}) of the parameter Y(G^{(m)}) and the bootstrap version Y(F_n^{(m)*}) of Y(F_n^{(m)}) are well defined.
Theorem 12.3 Suppose that the conditions of Theorem 12.2 hold. Let Y : D⁰ → ℝ^p be Hadamard differentiable at G^{(m)} tangentially to C_m with derivative Y^{(1)}(G^{(m)}; ·) for some D⁰ ⊂ D_m.
(a) Then,

λ_n^{d/2} ( Y(F_n^{(m)}) − Y(G^{(m)}) ) →_d Y^{(1)}(G^{(m)}; W^{(m)})  as n → ∞.   (12.46)

(b) Suppose that Y and Y^{(1)}(G^{(m)}; ·) satisfy the following stronger version of (12.44): For any a_n → 0, f_n → f ∈ D⁺, and g_n → G^{(m)} with g_n, g_n + a_n f_n ∈ D⁰ for all n ≥ 1,

[ Y(g_n + a_n f_n) − Y(g_n) ] / a_n → Y^{(1)}(G^{(m)}; f).   (12.47)

Then, with probability 1,

λ_n^{d/2} ( Y(F_n^{(m)*}) − Y(E*F_n^{(m)*}) ) →_d Y^{(1)}(G^{(m)}; W^{(m)})  as n → ∞.   (12.48)

Proof: Part (a) follows from Theorem 3.9.4 of van der Vaart and Wellner (1996). Next consider part (b). Using Lemma 12.1, the Borel-Cantelli Lemma, and the arguments in the proof of the Glivenko-Cantelli Theorem, it can be shown that

sup_{x ∈ [−∞,∞]^m} | F̃_n^{(m)}(x) − G^{(m)}(x) | = o(1)  a.s. (P),   (12.49)

where F̃_n^{(m)}(·) ≡ E*F_n^{(m)*}(·). Define

H_n(f) ≡ λ_n^{d/2} ( Y(F̃_n^{(m)} + λ_n^{−d/2} f) − Y(F̃_n^{(m)}) ) · 1( F̃_n^{(m)} + λ_n^{−d/2} f ∈ D⁰ ),

f ∈ D_m. Then, by (12.47) and (12.49), there exists a set A with P(A) = 1 such that on A,

H_n(f_n) → Y^{(1)}(G^{(m)}; f)

for any f_n → f ∈ C_m with λ_n^{−d/2} f_n + F̃_n^{(m)} ∈ D⁰ for all n ≥ 1. Hence, by Theorem 12.2 and the extended continuous mapping theorem (cf. Theorem 5.5, Billingsley (1968); Theorem 1.11.1, van der Vaart and Wellner (1996)), applied pointwise on a set of probability 1, we have

λ_n^{d/2} ( Y(F_n^{(m)*}) − Y(F̃_n^{(m)}) ) = H_n( ξ_n^{(m)*} ) →_d Y^{(1)}(G^{(m)}; W^{(m)})  as n → ∞, a.s. (P).

This proves part (b). □


12.4 Estimation of Spatial Covariance Parameters


12.4.1 The Variogram
In this section, we describe a method for fitting variogram models to spatial data using spatial resampling methods. Suppose that {Z(i) : i ∈ Z^d}, d ∈ N, is an intrinsically stationary random field, i.e., {Z(i) : i ∈ Z^d} is a collection of random variables defined on a common probability space such that

E( Z(i) − Z(i + h) ) = 0   (12.50)

and

Var( Z(i) − Z(i + h) ) = Var( Z(0) − Z(h) )   (12.51)

for all i, h ∈ Z^d. The function 2γ(h) ≡ Var(Z(0) − Z(h)) is called the variogram of the process Z(·). Note that if the process Z(·) is second-order stationary with autocovariance function σ(h) = Cov(Z(0), Z(h)), h ∈ Z^d, then (12.50) holds and, for any i, h ∈ Z^d,

Var( Z(i) − Z(i + h) ) = Var(Z(i)) + Var(Z(i + h)) − 2Cov(Z(i), Z(i + h)) = 2σ(0) − 2σ(h),

which implies (12.51) with

γ(h) = σ(0) − σ(h),  h ∈ Z^d.   (12.52)

Thus, second-order stationarity implies intrinsic stationarity. Note that if the process Z(·) is regular in the sense that σ(h) → 0 as ||h|| → ∞, then, from (12.52),

σ(0) = lim_{||h||→∞} γ(h).   (12.53)

Hence, by (12.52) and (12.53), the function σ(·) can be recovered from knowledge of the variogram 2γ(·). Thus, under some mild conditions, the variogram 2γ(·) provides an equivalent description of the covariance structure of the process Z(·) as does the autocovariance function σ(·). In spatial statistics, it is customary to describe the spatial-dependence structure of a spatial process by its variogram rather than by the autocovariance function (also called the covariogram) σ(·). Like the nonnegative definiteness property of the autocovariance function σ(·), the variogram must satisfy the following conditional negative definiteness property (cf. Chapter 2, Cressie (1993)): For any spatial locations s₁, ..., s_m ∈ Z^d, m ∈ N, and any real numbers a₁, ..., a_m with Σ_{i=1}^m a_i = 0,

Σ_{i=1}^m Σ_{j=1}^m a_i a_j γ(s_i − s_j) ≤ 0.   (12.54)
Thus, for any estimator of a spatial variogram to be valid, it must satisfy this conditional negative definiteness property.
In the next section, we describe a general estimation method that produces conditionally negative definite variogram estimators, as illustrated numerically below.
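The quadratic form in (12.54) is easy to check numerically for a candidate variogram. The snippet below is a quick illustration (not from the book) using the valid isotropic exponential semivariogram γ(h) = 1 − exp(−||h||), chosen purely as an example.

import numpy as np

rng = np.random.default_rng(0)
sites = rng.uniform(0, 10, size=(20, 2))      # arbitrary spatial locations
a = rng.normal(size=20)
a -= a.mean()                                  # weights summing to zero
dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
gamma = 1.0 - np.exp(-dist)                    # gamma(s_i - s_j)
print(a @ gamma @ a <= 1e-10)                  # True for a valid variogram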

12.4.2 Least Squares Variogram Estimation


A popular approach to estimating the variogram is the method of least squares variogram model fitting. Initially proposed in the geostatistical literature (cf. David (1977), Journel and Huijbregts (1978)) and then further modified and studied by Cressie (1985), Zhang, van Eijkeren and Heemink (1995), Genton (1997), Barry, Crowder and Diggle (1997), Lee and Lahiri (2002), and others, this method fits a parametric variogram model by minimizing a certain quadratic distance function between a generic nonparametric variogram estimator and the parametric model using various least squares methods. Specifically, suppose that the true variogram of the spatial process Z(·) lies in a parametric family {2γ(·; θ) : θ ∈ Θ} of valid variograms, where Θ is a subset of ℝ^p. Our objective here is to estimate the variogram parameter vector θ on the basis of the sample {Z(s_i) : 1 ≤ i ≤ N}, where the sampling sites s₁, ..., s_N lie on the part of the integer grid Z^d inside the sampling region R_n, as specified by (12.2). Let 2γ̂_n(h) be a nonparametric estimator of the variogram 2γ(h) at lag h. Also, let h₁, ..., h_K ∈ ℝ^d, 2 ≤ K < ∞, be a given set of lag vectors and let V(θ) be a K × K positive-definite weight matrix, which possibly depends on the covariance parameter θ. Then, a least squares estimator (LSE) of θ corresponding to the weight matrix V(θ) is defined as

θ̂_{n,V} = argmin{ Q_n(θ; V) : θ ∈ Θ },   (12.55)

where Q_n(θ; V) = g_n(θ)'V(θ)g_n(θ) and g_n(θ) is the K × 1 vector with ith element (2γ̂_n(h_i) − 2γ(h_i; θ)), i = 1, ..., K. For V(θ) = I_K, the identity matrix of order K, θ̂_{n,V} is the ordinary least squares (OLS) estimator of θ. Choosing V(θ) = Σ(θ)⁻¹, the inverse of the asymptotic covariance matrix of g_n(θ), we get the generalized least squares (GLS) estimator of θ. In the same vein, choosing V(θ) to be a diagonal matrix with suitable diagonal entries, we can get the various weighted least squares (WLS) estimators proposed by Cressie (1985), Zhang, van Eijkeren and Heemink (1995), and Genton (1997). In addition to guaranteeing the conditional negative definiteness property (12.54) of the resulting variogram estimator, this method has a visual appeal similar to that of fitting a regression function to a scatter plot. This makes the least squares methods of variogram model fitting popular among practitioners.
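A hedged sketch of the fitting step (12.55) is given below: the parametric model 2γ(h; θ) is fit to the nonparametric estimates 2γ̂_n(h_i) at the chosen lags by minimizing Q_n(θ; V) with a user-supplied weight matrix. The anisotropic exponential model shown is only a placeholder for whatever valid parametric family is assumed; the function names are illustrative.

import numpy as np
from scipy.optimize import minimize

def fit_variogram_ls(lags, vario_hat, model, theta0, V=None):
    """lags: (K, d) lag vectors; vario_hat: 2*gammahat_n(h_i);
    model(h, theta): parametric 2*gamma(h; theta); V: K x K weight matrix."""
    K = len(vario_hat)
    V = np.eye(K) if V is None else V       # V = I_K gives the OLS estimator
    def Q(theta):
        g = vario_hat - np.array([model(h, theta) for h in lags])
        return g @ V @ g                    # Q_n(theta; V) of (12.55)
    return minimize(Q, theta0, method="Nelder-Mead").x

# placeholder exponential model (an assumption, not the book's (12.72)):
def exp_model(h, theta):
    return 2.0 * (1.0 - np.exp(-theta[0] * abs(h[0]) - theta[1] * abs(h[1])))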
Statistical properties of the LSEs heavily depend on the choice of the weighting matrix V(θ) employed in the definition of the LSE θ̂_{n,V}. Let Γ(θ) denote the K × p matrix with (i, j)th element −∂[2γ(h_i; θ)]/∂θ_j and
let A(θ) = Γ(θ)[Γ(θ)'V(θ)Γ(θ)]⁻¹. Theorem 3.2 of Lahiri, Lee and Cressie (2002) shows that under some regularity conditions, if

a_n g_n(θ) →_d N(0, Σ(θ))  for all θ ∈ Θ,   (12.56)

for some sequence {a_n}_{n≥1} of positive real numbers and for some positive definite matrix Σ(θ), then for all θ ∈ Θ,

(i) a_n(θ̂_{n,V} − θ) →_d N(0, D_V(θ))  as n → ∞,   (12.57)
    where D_V(θ) = A(θ)'V(θ)Σ(θ)V(θ)A(θ);

(ii) D_V(θ) − D_{Σ⁻¹}(θ)   (12.58)
    is nonnegative definite for any V(θ);

(iii) for the GLS method with V(θ) = Σ(θ)⁻¹,
    D_{Σ⁻¹}(θ) = ( Γ(θ)'Σ(θ)⁻¹Γ(θ) )⁻¹.   (12.59)

Hence, it follows from (12.57)-(12.59) that the LSE θ̂_{n,V} of θ is asymptotically multivariate normal and, hence, one may compare different LSEs in terms of their limiting covariance matrices. This leads to the following definition of asymptotically efficient LSEs of θ.

Definition 12.2 A sequence {θ̂_{n,V₀}} of LSEs of θ corresponding to a weighting matrix V₀(θ) is said to be asymptotically efficient if, for any other weighting matrix V(θ), the difference D_V(θ) − D_{V₀}(θ) is nonnegative definite for all θ ∈ Θ.
This definition of asymptotic efficiency is equivalent to the requirement that for every x ∈ ℝ^p, the estimator x'θ̂_{n,V₀} of the linear parametric function x'θ has the minimum asymptotic variance among the class of all LSEs, for all θ ∈ Θ. From (12.58) and (12.59), it follows that the optimal covariance matrix of the limiting normal distribution is given by D_{Σ⁻¹}(θ) and that the GLS estimator of θ is asymptotically efficient among all LSEs.
Although it is an optimal estimator from the statistical point of view, computation of the GLS estimator can be difficult in practice. To appreciate why, note that the GLS estimator θ̂_{n,GLS} ≡ θ̂_{n,Σ⁻¹} is defined as (cf. (12.55)) θ̂_{n,GLS} = argmin{Q_n(θ; Σ⁻¹) : θ ∈ Θ}, which involves minimization of the nonlinear criterion function Q_n(θ; Σ⁻¹) over the parameter space Θ ⊂ ℝ^p. Because of the computational complexity associated with a general optimization method for minimizing such nonlinear functions, whether iterative or grid based (cf. Dennis and Schnabel (1983)), the GLS method is computationally demanding. A second undesirable feature of the GLS
method is that it requires the knowledge of the asymptotic covariance ma-


trix of the generic variogram estimator, which must be found analytically
and, therefore, may be intractable for certain non-Gaussian processes. In
practice, these factors often prompt one to use other statistically inefficient
LSEs, such as the OLS and WLS estimators.
Following the work of Lee and Lahiri (2002), we now describe a least
squares method based on spatial resampling methods that is also asymp-
totically efficient within the class of all least squares methods. Furthermore,
it is computationally much simpler than the standard GLS method and
does not require any additional analytical consideration. The main idea
behind the new method is to replace the asymptotic covariance matrix of
the generic variogram estimator in the GLS criterion function by a con-
sistent, nonparametric estimator of the covariance matrix based on spatial
resampling, which can be evaluated without knowing the exact form of the
covariance matrix.

12.4.3 The RGLS Method

We now describe the resampling-based least squares method (or the RGLS method, in short). Let Σ̂_n be an estimator of the asymptotic covariance matrix Σ(θ) (cf. (12.56)) of the normalized generic variogram estimator, based on a suitable resampling method. Then, we replace the matrix [Σ(θ)]⁻¹ in the GLS criterion function by Σ̂_n⁻¹ and define the resampling method based GLS (RGLS) estimator of θ as

θ̂_{n,RGLS} = argmin{ Q_n(θ; Σ̂_n⁻¹) : θ ∈ Θ }.   (12.60)

Since Σ̂_n itself does not involve the parameter θ, computation of the RGLS estimator requires inversion of the estimated covariance matrix Σ̂_n only once. In contrast, for the GLS estimator, the inverse of the matrix Σ(θ) needs to be computed a large number of times in finding the "minimizer" of the GLS criterion function. As a result, the RGLS estimator is computationally much simpler than the GLS estimator. And, as we shall show below, the RGLS estimator is also asymptotically efficient, making it "as good as" the GLS estimator from a statistical point of view.
Lee and Lahiri (2002) suggest using a spatial subsampling method to derive the estimator Σ̂_n of the asymptotic covariance matrix Σ and call the resulting method the "subsampling based GLS method" or the "SGLS" method. A second possibility is to employ the spatial block bootstrap method of the previous section to form the nonparametric estimator Σ̂_n, leading to what one may refer to as the BGLS method. However, an advantage of the subsampling method over spatial bootstrap methods is that the computation of the estimator Σ̂_n does not require any resampling and may be done using an explicit formula given below (cf. (12.62), (12.64)).
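In code, the RGLS step reduces to a single matrix inversion followed by an ordinary nonlinear least squares fit. The sketch below assumes the fit_variogram_ls function sketched above and some resampling-based estimate Sigma_hat (for instance, the subsampling estimator described in the next subsection); both names are illustrative assumptions.

import numpy as np

def rgls_estimate(lags, vario_hat, model, theta0, Sigma_hat):
    """RGLS estimator (12.60): Sigma_hat is inverted once and then held fixed."""
    V = np.linalg.inv(Sigma_hat)     # single inversion; V does not depend on theta
    return fit_variogram_ls(lags, vario_hat, model, theta0, V=V)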
We now briefly describe the spatial subsampling method associated with the SGLS method of Lee and Lahiri (2002). Let R_n = λ_nR₀ be the sampling region (cf. Section 12.2) and let {Z(s₁), ..., Z(s_N)} = {Z(i) : i ∈ Z^d ∩ R_n} be the observations. As in the spatial bootstrap method, let β_n ∈ N be an integer satisfying (12.4), i.e.,

β_n⁻¹ + λ_n⁻¹β_n = o(1)  as n → ∞.

The subregions for the spatial subsampling method are obtained by considering suitable translates of the set β_nR₀. More specifically, we consider d-dimensional cubes of the form i + U₀β_n, i ∈ Z^d, that are contained in the sampling region R_n, where U₀ = (−1/2, 1/2]^d is the unit cube in ℝ^d centered at the origin. Let I_n⁰ = {i ∈ Z^d : i + U₀β_n ⊂ R_n} denote the index set of such cubes. Then, we define the subregions {R_n^{(i)} : i ∈ I_n⁰} by inscribing, for each i ∈ I_n⁰, a translate of β_nR₀ inside the cube i + U₀β_n such that the origin is mapped onto i, i.e., we define

R_n^{(i)} = i + β_nR₀,  i ∈ I_n⁰.   (12.61)

Then, {R_n^{(i)} : i ∈ I_n⁰} is a collection of overlapping subregions of R_n that are of the same shape as the original sampling region R_n, but of smaller volume. Moreover, the number (say, ℓ) of observations in each subregion is the same, and it grows at the rate [vol(R₀)β_n^d], as in the block bootstrap case.
The observations from the subregions can be used to define the subsampling estimator of the covariance matrix (and, more generally, of the probability distribution) of a given K-dimensional (K ∈ N) random vector of the form T_n = t_n(Z_n; κ), where Z_n = {Z(s₁), ..., Z(s_N)} and κ is a population parameter. For this, on each subregion R_n^{(i)}, we define a version T^{(i)} of T_n by replacing the observed values Z_n with the subsample Z^{(i)} ≡ {Z(s) : s ∈ R_n^{(i)} ∩ Z^d} from the subregion R_n^{(i)}, and by replacing the parameter κ by an estimator κ̂_n based on Z_n. Thus, we define

T^{(i)} ≡ t_ℓ(Z^{(i)}; κ̂_n),  i ∈ I_n⁰.

Note that T^{(i)} is defined using the function t_ℓ(·; ·), not t_n(·; ·), since the subsample Z^{(i)} has only ℓ observations. For example, if t_n(Z_n; κ) = √n(Z̄_n − μ) with Z̄_n = n⁻¹ Σ_{i=1}^n Z(s_i) and μ = EZ(0), then T^{(i)} = √ℓ(Z̄^{(i)} − μ̃_n), where Z̄^{(i)} is the average of the ℓ observations in the subsample Z^{(i)} and μ̃_n = E*Z̄_n* is an estimator of μ based on Z_n.
The subsampling estimator of the sampling distribution of T_n is now defined as the empirical distribution function of the subsample copies {T^{(i)} : i ∈ I_n⁰}. By the "plug-in" principle, the subsampling estimator of the (asymptotic) covariance matrix of T_n is given by

Σ̂_n ≡ |I_n⁰|⁻¹ Σ_{i∈I_n⁰} T^{(i)} T^{(i)'}.   (12.62)
Next we apply the subsampling method to obtain an estimator of the covariance matrix of the (scaled) variogram estimator at lags h₁, ..., h_K. Thus, the random vector T_n here is given by

T_n = √N ( 2γ̂_n(h₁) − 2γ(h₁); ...; 2γ̂_n(h_K) − 2γ(h_K) )'.   (12.63)

Let 2γ̂^{(i)}(h) denote the lag-h variogram estimator obtained by replacing Z_n and n in the definition of 2γ̂_n(h) by the subsample Z^{(i)} and the subsample size ℓ, respectively. Also, let 2γ̄_n(h) ≡ |I_n⁰|⁻¹ Σ_{i∈I_n⁰} 2γ̂^{(i)}(h). Then, the subsample version of T_n is given by

T^{(i)} ≡ √ℓ ( 2γ̂^{(i)}(h₁) − 2γ̄_n(h₁); ...; 2γ̂^{(i)}(h_K) − 2γ̄_n(h_K) )',  i ∈ I_n⁰.   (12.64)

The SGLS estimator of θ is now given by (12.60) with Σ̂_n defined by relations (12.62) and (12.64).
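The following hedged sketch combines (12.62) and (12.64) for square sampling regions on the integer grid (R₀ the unit square, d = 2), so that the subregions are simply all β × β subsquares of the data array. The generic variogram estimator is passed in as a function (e.g., the Matheron estimator sketched in the next subsection); the names and this square-region simplification are assumptions.

import numpy as np

def subsample_cov(z, beta, lags, vario_est):
    """Subsampling estimate Sigma_hat of the covariance matrix of (12.63)."""
    n1, n2 = z.shape
    ell = beta * beta                                      # subsample size
    reps = []
    for i in range(n1 - beta + 1):                         # all beta x beta subregions
        for j in range(n2 - beta + 1):
            reps.append(vario_est(z[i:i + beta, j:j + beta], lags))
    reps = np.asarray(reps)                                # rows: 2*gammahat^(i)(h_k)
    centred = np.sqrt(ell) * (reps - reps.mean(axis=0))    # T^(i) of (12.64)
    return centred.T @ centred / len(centred)              # Sigma_hat of (12.62)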

12.4.4 Properties of the RGLS Estimators


Next we prove consistency and asymptotic efficiency of the RGLS estimator
θ̂_{n,RGLS} of (12.60), based on a general resampling method.
Theorem 12.4 Suppose that the following conditions hold:
(C.1) √n g_n(θ₀) →_d N(0, Σ(θ₀)) under θ₀, and Σ(θ₀) is positive definite.
(C.2) (i) For any ε > 0, there exists a δ > 0 such that
      inf{ Σ_{i=1}^K (2γ(h_i; θ₁) − 2γ(h_i; θ₂))² : ||θ₁ − θ₂|| ≥ ε, θ₁, θ₂ ∈ Θ } > δ.
      (ii) sup{ γ(h_i; θ) : θ ∈ Θ } < ∞ for i = 1, ..., K.
      (iii) γ(h_i; θ) has continuous partial derivatives with respect to θ for i = 1, ..., K.
(C.3) Σ̂_n →_p Σ(θ₀) as n → ∞.
Then,
(a) θ̂_{n,RGLS} →_p θ₀ as n → ∞,
(b) √n(θ̂_{n,RGLS} − θ₀) →_d N(0, D_{Σ⁻¹}(θ₀)) as n → ∞.

Proof: Let g(θ) = (2γ(h₁; θ₀) − 2γ(h₁; θ), ..., 2γ(h_K; θ₀) − 2γ(h_K; θ))', Q(θ) = g(θ)'[Σ(θ₀)]⁻¹g(θ), Q̂_n(θ) = g_n(θ)'[Σ̂_n]⁻¹g_n(θ), and Q̃_n(θ) = g_n(θ)'[Σ(θ₀)]⁻¹g_n(θ). Then,

|Q̂_n(θ) − Q(θ)| ≤ |Q̂_n(θ) − Q̃_n(θ)| + |Q(θ) − Q̃_n(θ)|
   ≤ ||g_n(θ)||² ||[Σ̂_n]⁻¹ − Σ(θ₀)⁻¹|| + ||g_n(θ₀)|| ||Σ(θ₀)⁻¹|| { ||g_n(θ)|| + ||g(θ)|| }.   (12.65)

Note that by Condition (C.1), g_n(θ₀) = o_p(1). Hence, by (12.65) and Conditions (C.2) and (C.3),

Δ_n ≡ sup{ |Q̂_n(θ) − Q(θ)| : θ ∈ Θ } → 0  in probability as n → ∞.   (12.66)
Now, if possible, suppose that θ̂_n := θ̂_{n,RGLS} does not converge to θ₀ in probability as n → ∞. Then (by Proposition A.1, Appendix A), there exist an ε > 0 and a subsequence {m_n}_{n≥1} such that ||θ̂_{m_n} − θ₀|| ≥ ε for all n ≥ 1. Now, by (12.66), there is a further subsequence {m₁ₙ}_{n≥1} of {m_n}_{n≥1} such that Δ_{m₁ₙ} = o(1) almost surely. Also note that under the hypotheses of Theorem 12.4, Q(θ) is strictly positive on Θ \ {θ₀}, and Q(θ₀) = 0. Thus, Q(θ) has a unique minimum at θ₀. However, with probability 1, Q̂_{m₁ₙ}(θ̂_{m₁ₙ}) − Q̂_{m₁ₙ}(θ₀) ≥ Q(θ̂_{m₁ₙ}) − Q(θ₀) − 2Δ_{m₁ₙ} ≥ inf{Q(θ) : ||θ − θ₀|| ≥ ε} − 2Δ_{m₁ₙ} > 0 for all n ≥ n₀, for some n₀ ≥ 1. This contradicts the definition of θ̂_{m₁ₙ} as the minimizer of Q̂_{m₁ₙ}(θ) for all n ≥ n₀, proving part (a) of Theorem 12.4.
To prove the second part, let w_{qr} denote the (q, r)th component of [Σ̂_n]⁻¹, 1 ≤ q, r ≤ K. Also, let g_{nq}(θ) denote the qth component of g_n(θ) and let γ_q(·; θ) = ∂γ(·; θ)/∂θ_q, 1 ≤ q ≤ p. Since θ̂_n minimizes the function g_n(θ)'[Σ̂_n]⁻¹g_n(θ), it satisfies the equations

0 = ∂/∂θ_m ( g_n(θ)'[Σ̂_n]⁻¹g_n(θ) ) |_{θ = θ̂_n}
  = Σ_{q=1}^K Σ_{r=1}^K w_{qr} g_{nr}(θ̂_n)( −2γ_m(h_q; θ̂_n) )
    + Σ_{q=1}^K Σ_{r=1}^K w_{qr} g_{nq}(θ̂_n)( −2γ_m(h_r; θ̂_n) ),

1 ≤ m ≤ p. Next, let {e₁ ≡ (1,0,...,0)', ..., e_p ≡ (0,...,0,1)'} denote the standard basis of ℝ^p. Hence, by a one-term Taylor series expansion of g_{nq}(θ̂_n) and g_{nr}(θ̂_n) around θ₀, we obtain

Σ_{q=1}^K Σ_{r=1}^K w_{qr} { Σ_{a=1}^p [ ∫₀¹ −2γ_a(h_r; uθ₀ + (1−u)θ̂_n) du ] (θ̂_n − θ₀)'e_a } ( −2γ_m(h_q; θ̂_n) )
+ Σ_{q=1}^K Σ_{r=1}^K w_{qr} { Σ_{a=1}^p [ ∫₀¹ −2γ_a(h_q; uθ₀ + (1−u)θ̂_n) du ] (θ̂_n − θ₀)'e_a } ( −2γ_m(h_r; θ̂_n) )
= − Σ_{q=1}^K Σ_{r=1}^K w_{qr} g_{nr}(θ₀)( −2γ_m(h_q; θ̂_n) )
  − Σ_{q=1}^K Σ_{r=1}^K w_{qr} g_{nq}(θ₀)( −2γ_m(h_r; θ̂_n) ),   (12.67)
1 ≤ m ≤ p. Then, it is easy to see that the set of p equations in (12.67) can be rewritten as

Γ(θ̂_n)'[Σ̂_n]⁻¹ Γ̄_n (θ̂_n − θ₀) = −Γ(θ̂_n)'[Σ̂_n]⁻¹ g_n(θ₀),   (12.68)

where Γ̄_n = ∫₀¹ Γ( uθ₀ + (1 − u)θ̂_n ) du. Because θ̂_n is a consistent estimator of θ₀ and the matrix-valued function Γ(θ) is continuous in θ, the result follows from (12.68), Condition (C.2), and Slutsky's Theorem. □

Thus, if Conditions (C.1)-(C.3) in the statement of Theorem 12.4 hold for all θ₀ ∈ Θ, then the RGLS estimator θ̂_{n,RGLS} has the same asymptotic covariance matrix as the GLS estimator and, hence, according to Definition 12.2, θ̂_{n,RGLS} is asymptotically efficient.
Next we comment on the conditions required for the validity of Theorem 12.4. Condition (C.1) of Theorem 12.4 assumes asymptotic normality of the generic variogram estimator 2γ̂_n(·), and can be verified for a given variogram estimator under suitable moment and mixing conditions on the process Z(·). Condition (C.2)(i) essentially requires that the choice of the lag vectors h₁, ..., h_K be such that the model variogram 2γ(·; θ) can be distinguished at distinct parameter values θ₁, θ₂ ∈ Θ by its values (2γ(h₁; θ_i), ..., 2γ(h_K; θ_i))', i = 1, 2, at h₁, ..., h_K. Condition (C.2)(ii) is stringent, as it requires the model variogram to be bounded over the parameter space at h₁, ..., h_K. If the variables Z(s) are normalized to have unit variance, then γ(·; ·) is bounded by 2 and this condition holds. For a spatial process Z(·) with an unbounded variance function (over Θ), one may apply the RGLS methodology to estimate the parameters of the scaled variogram or the correlogram, defined by

2ρ(h; θ) ≡ 2γ(h; θ)/Var_θ(Z(0)),  h ∈ ℝ^d,   (12.69)

θ ∈ Θ, using the generic estimator 2ρ̂_n(h) ≡ 2γ̂_n(h)/s_n², where s_n² = N⁻¹ Σ_{i=1}^N (Z(s_i) − Z̄_n)² is the sample variance of {Z(s₁), ..., Z(s_N)}. Then, Condition (C.2)(ii) holds for ρ(h; ·), and the conclusions of Theorem 12.4 remain valid under conditions analogous to (C.1)-(C.3) where the function γ(·; ·) is replaced by ρ(·; ·). This modified approach yields estimators of those covariance parameters that determine the shape of the variogram.
Next consider the remaining conditions. Condition (C.2)(iii) is a smoothness condition that may be directly verified for a given variogram model. Condition (C.3) requires consistency of the covariance matrix estimator generated by the resampling method under consideration. Under mild moment and mixing conditions, (C.3) typically holds for the spatial block bootstrap method of Section 12.3 and the spatial subsampling method described above. As an illustration, we now give a simple set of sufficient conditions on the process Z(·) under which the conclusions of Theorem 12.4 hold for the case where the subsampling method is employed to generate the covariance matrix estimator Σ̂_n and where the generic variogram estimator
2γ̂_n(·) is given by Matheron's (1962) method of moments estimator:

2γ̂_n(h) = |N_n(h)|⁻¹ Σ_{(s_i, s_j) ∈ N_n(h)} ( Z(s_i) − Z(s_j) )².   (12.70)

Here, N_n(h) ≡ {(s_i, s_j) : s_i − s_j = h, s_i, s_j ∈ R_n} and, recall that, for any finite set A, |A| denotes its size.
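For data on a regular grid, the pairs in N_n(h) for an integer lag h are exactly the entries of two shifted sub-arrays, so (12.70) can be computed without forming pairs explicitly. The sketch below (an illustration; the function name is an assumption) handles nonnegative integer lags such as (1,0), (0,1), and (1,1).

import numpy as np

def matheron(z, lags):
    """Matheron estimator 2*gammahat_n(h) of (12.70) for nonnegative integer lags."""
    n1, n2 = z.shape
    out = []
    for (h1, h2) in lags:
        diff = z[h1:, h2:] - z[:n1 - h1, :n2 - h2]   # Z(s_i) - Z(s_j) with s_i - s_j = h
        out.append(np.mean(diff ** 2))               # average over the |N_n(h)| pairs
    return np.array(out)

This estimator can be passed directly as the vario_est argument of the subsample_cov sketch above.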

Theorem 12.5 Suppose that (12.4) and Condition (C.2) of Theorem 12.4 hold. Also, suppose that there exists an η > 0 such that max{ E|Z(h_j) − Z(0)|^{12+2η} : 1 ≤ j ≤ K } < ∞ and the strong mixing coefficient α(a, b) of Z(·) (cf. (12.16), Section 12.3) satisfies the condition

α(a, b) ≤ C a^{−τ₁} b^{τ₂},  a > 0, b ≥ 1,

for some C ∈ (0, ∞), τ₁ > 5d(6 + η)/η, and 0 < τ₂ ≤ (τ₁ − d)/d. Then, parts (a) and (b) of Theorem 12.4 hold with θ̂_{n,RGLS} = θ̂_{n,SGLS} and with the generic variogram estimator 2γ̂_n(·) given by Matheron's estimator (12.70). The asymptotic covariance matrix D_{Σ⁻¹}(θ) in part (b) is given by D_{Σ⁻¹}(θ) = ( Γ(θ)'Σ(θ)⁻¹Γ(θ) )⁻¹, where the (q, r)th element of Σ(θ) is

σ_{qr}(θ) ≡ Σ_{i∈Z^d} Cov_θ( [Z(i + h_r) − Z(i)]², [Z(h_q) − Z(0)]² ),  1 ≤ q, r ≤ K.   (12.71)

Proof: Follows from Theorem 5.1 and Remark 5.1 of Lee and Lahiri (2002). □

12.4.5 Numerical Examples


We now present the results of a small simulation study on the performance of the SGLS method. We consider a stationary two-dimensional Gaussian process {Z(i) : i ∈ Z²} with zero mean and an "exponential" variogram 2γ(h; θ), given by (12.72), where θ = (θ₁, θ₂)' ∈ (0, ∞)² ≡ Θ. The model variogram 2γ(·; θ) and its contour plot for (θ₁, θ₂) = (0.10, 0.08) are given in Figure 12.7. Under the same θ-parameter values, a realization of the process over a 15 × 15 rectangular region is shown in Figure 12.8. The realization of the Gaussian process was generated by the spectral method of Shinozuka (1971) and Mejia and Rodriguez-Iturbe (1974).
For the simulation study, we considered three square-shaped sampling regions given by (−3,3] × (−3,3], (−5,5] × (−5,5], and (−15,15] × (−15,15]. The prototype set R₀ for all three square regions was taken as (−1/2, 1/2] × (−1/2, 1/2], with the scaling factor λ_n being equal to 6, 10, and 30 for the three regions, respectively.
FIGURE 12.7. A plot of the "exponential" variogram 2γ(h; θ) of (12.72) (right) and its contour plot (left) for θ = (0.1, 0.08)'.

The subregions were formed by considering translates of the set β_nR₀ = (−β_n/2, β_n/2] × (−β_n/2, β_n/2] for different choices of the subsampling block size β_n. We also considered a nonsquare rectangular sampling region, given by (−5,5] × (−15,15], with R₀ = (−1/6, 1/6] × (−1/2, 1/2] and λ_n = 30. Following the work of Sherman (1996) and Nordman and Lahiri (2003a) on the optimal choice of the subsampling scaling factor β_n, we worked with β_n = Cλ_n^{1/2} for different values of the constant C > 0.
We took the generic variogram estimator 2γ̂_n(h) to be Matheron's method of moments estimator, given by (12.70), and the lag vectors h₁, ..., h_K as h₁ = (1,0)', h₂ = (0,1)', and h₃ = (1,1)', with K = 3. For each sampling region, we considered the OLS estimator and Cressie's (1985) weighted least squares estimator (CWLS) of (θ₁, θ₂)'. The latter is defined by (12.55) with V(θ) = [diag(σ₁₁(θ), ..., σ_{KK}(θ))]⁻¹, where σ_{rr}(θ) is as in (12.71), 1 ≤ r ≤ K. Because of the long computation time and instability of the GLS estimators in these examples, a variation of the GLS estimator (denoted by TGLS) was used, where the matrix-valued function Σ(θ) was substituted by the true matrix Σ(θ₀) for all θ. Thus, the TGLS estimators of θ are defined by minimizing the criterion function Q_n(θ; V) with V(θ) ≡ Σ(θ₀)⁻¹ for all θ. It can be shown that the TGLS estimator has the same asymptotic covariance matrix as the GLS estimator at θ = θ₀. The TGLS estimators are available only in a simulation study, and not in practice, because the true values of the parameters are unknown in practice. Note that the CWLS, TGLS, and SGLS methods require only a nonlinear minimization routine of nonlinear regression type, such as a modified Gauss-Newton algorithm, which is faster and more stable than the general nonlinear minimization routines required for computing the GLS estimator.
FIGURE 12.8. (Right panel) A realization of a zero mean unit variance Gaussian process with variogram 2γ(·; θ) of Figure 12.7 over the region (−15,15) × (−15,15). (Left panel) Contour plot of the same realization.

The results of the simulation study, based on 3000 simulation runs, are summarized in Table 12.2. The leading numbers in columns 4-5, respectively, denote the means of the estimators of θ₁ and θ₂, while the numbers within parentheses represent N times the MSE, where N denotes the sample size. The first and the third columns of Table 12.2 specify the sizes of the two sides of the rectangular sampling and subsampling regions, respectively.
From the table, it appears that the SGLS method performed better than the OLS and CWLS methods in most cases, and produced MSE values that fell between those of the CWLS and TGLS methods. Furthermore, for the nonsquare sampling region of size (10,30), the rectangular subregions of size (4,6) yielded slightly better results than the square subregions of size (4,4). See Lee and Lahiri (2002) for more simulation results under a different variogram model. The SGLS method has a similar performance under the variogram model treated therein.
Although the SGLS method has the same asymptotic optimality as the GLS method, its finite-sample statistical accuracy (as measured by the MSE) may not be as good as that of the GLS (or the idealized TGLS) method, particularly for small sample sizes. In the simulation studies carried out in Lee and Lahiri (2002), the SGLS estimators typically provided improvements over the OLS and the WLS estimators for small sample sizes and became competitive with the GLS estimators for moderately large sample sizes. A negative feature of the SGLS method is that the block size parameter β_n must be chosen by the user. A working rule of thumb is to use a β_n that is comparable to λ_n^{1/2} in magnitude. On the other hand, as explained in the previous paragraph, the computational complexity associated with
TABLE 12.2. Mean and scaled mean squared error (within parentheses) of various least squares estimators of θ₁ and θ₂ under variogram model (12.72). Here R_n denotes the size of the rectangular sampling region and BS denotes the size of the subsampling regions.

   R_n       LSEs    BS      (θ₁ = 0.1)     (θ₂ = 0.08)
   6 × 6     OLS             0.10 (0.14)    0.082 (0.09)
             CWLS            0.10 (0.14)    0.083 (0.08)
             TGLS            0.10 (0.12)    0.082 (0.08)
             SGLS    2 × 2   0.10 (0.12)    0.080 (0.08)
                     3 × 3   0.09 (0.12)    0.075 (0.07)
   10 × 10   OLS             0.10 (0.17)    0.082 (0.11)
             CWLS            0.10 (0.17)    0.082 (0.11)
             TGLS            0.10 (0.13)    0.081 (0.10)
             SGLS    3 × 3   0.10 (0.16)    0.080 (0.10)
                     4 × 4   0.10 (0.15)    0.079 (0.09)
   10 × 30   OLS             0.10 (0.27)    0.081 (0.10)
             CWLS            0.10 (0.27)    0.081 (0.09)
             TGLS            0.10 (0.22)    0.081 (0.09)
             SGLS    4 × 4   0.10 (0.25)    0.080 (0.08)
                     4 × 6   0.10 (0.24)    0.080 (0.09)
   30 × 30   OLS             0.10 (0.26)    0.080 (0.14)
             CWLS            0.10 (0.25)    0.080 (0.13)
             TGLS            0.10 (0.20)    0.080 (0.13)
             SGLS    4 × 4   0.10 (0.23)    0.080 (0.13)
                     5 × 5   0.10 (0.24)    0.080 (0.13)
                     6 × 6   0.10 (0.23)    0.080 (0.13)

the GLS method can be much higher than that associated with the SGLS method. Table 12.3 gives a comparison of the times required for computing the SGLS and the GLS estimators, using an Alpha workstation. Here, I denotes the number of iterations of the optimization routine carried out for the GLS method. The reported times are obtained by averaging over 100 repetitions. It follows from Table 12.3 that the SGLS method is considerably faster than the GLS method. However, the most important advantage of the SGLS and other RGLS methods is that they provide asymptotically efficient estimates of the covariance parameters even when the form of the asymptotic covariance matrix of the generic variogram estimator is unknown, in which case the GLS method is no longer applicable.
TABLE 12.3. A comparison of computation times (time in seconds).

   Sample Size   Block Size   SGLS        GLS (I = 30)
   10 × 10       3 × 3        0.019/100   1.151/100
   30 × 30       4 × 4        0.088/100   93.892/100
   60 × 60       5 × 5        0.258/100   1542.893/100
   90 × 90       6 × 6        0.589/100   8073.810/100

12.5 Bootstrap for Irregularly Spaced Spatial Data


Let {Z(s) : s ∈ ℝ^d}, d ∈ N, be an m-dimensional (m ∈ N) random field with a continuous spatial index. In this section, we describe a bootstrap method that is applicable to irregularly spaced spatial data generated by a class of stochastic designs. We introduce the spatial sampling design in Section 12.5.1. Some relevant results on the asymptotic distribution of a class of M-estimators are presented in Section 12.5.2. The spatial block bootstrap method and its properties are described in Sections 12.5.3 and 12.5.4, respectively. Unlike in the regular grid case, in this section we use n to denote the sample size.

12.5.1 A Class of Spatial Stochastic Designs


Suppose that the process Z (.) is observed at finitely many locations Sn ==
{81' ... ,8 n } that lie in the sampling region Rn. We continue to use the
framework of Section 12.2 for the sampling region Rn and suppose that
Rn is obtained by inflating a prototype set Ro by a scaling constant An,
as specified by (12.1). When the sampling sites 81,"" 8 n are irregularly
spaced, a standard approach in the literature is to model them using a
homogeneous Poisson point process. However, here, we adopt a slightly
different approach and consider sampling designs driven by a collection of
independent random vectors with values in the prototype set Ro. More
precisely, let f (x) be a probability density function (with respect to the
Lebesgue measure) on Ro and let {X n }n>l be a sequence of iid random
vectors with density f(x) such that {Xn}n~1 are independent of {Z(8) :
8 E JRd}. We suppose that the sampling sites 81, ... , 8 n are obtained from
a realization Xl, ... ,Xn of the random vectors Xl, ... ,Xn as

(12.73)

where An is the scaling constant associated with Rn (cf. (12.1)). We further


suppose that
nO /An = 0(1) as n -+ 00
320 12. Resampling Methods for Spatial Data

for some b > O. This condition is imposed for proving consistency of


bootstrap approximations for almost all realizations of the random vec-
tors Xl, X2, ....
In view of (12.73), a more precise notation for the sampling sites should
be Sln, ... ,Snn, but we drop the subscript n for notational simplicity. Note
that as Xl, ... , Xn takes values in R a, the sampling sites Sl, ... , Sn poten-
tially take values over the entire sampling region Rn = AnRa. Furthermore,
by the Strong Law of Large Numbers (cf. Theorem A.3, Appendix A), the
expected number of sampling sites lying over any subregion A C Rn is
f:.
given by nP(AnXl E A) = n· -1A f(x)dx, which may be different from
n· VOl.(A~l A) for a nonconstant design density f(x). As a result, this for-
mulation allows us to model irregularly spaced spatial data that may have
different degrees of concentration over different parts of the sampling re-
gion. A second important feature of the stochastic sampling design is that it
allows the sample size n and the volume of the sampling region Rn to grow
at different rates. For a positive design density f(x), when the sample size
n grows at a rate faster than the volume of R n , the ratio of the expected
number of sampling sites in any given subregion A of Rn to the volume
of A tends to infinity. Under the stochastic design framework, this corre-
sponds to "infill" sampling of subregions of Rn (cf. Section 12.2). Thus, the
stochastic sampling design presented here provides a unified framework for
handling irregularly spaced spatial data with a nonuniform concentration
across Rn and with a varying rate of sampling.
In the next section, we describe some results on the large sample distri-
bution of a class of M-estimators under the stochastic sampling design.

12.5.2 Asymptotic Distribution of M-Estimators


Suppose that {Z(s) : S E ]R.d} is an m-variate (m E N) stationary random
field that is independent of the random vectors Xl, X 2 , ... , generating the
sampling sites. In applications, the components of the multivariate ran-
dom field Z(·) could be defined in terms of suitable functions of a given
univariate (or a lower-dimensional) random field. Suppose that the Z(·)
process is observed at locations {S1, ... ,sn} == Sn, generated by the spatial
stochastic design of Section 12.5.1, and that we are interested in estimat-
ing a p-dimensional (p E N) level-1 parameter () based on the observations
{Z(Si) : 1 :::; i :::; n}. Let \]I : ]R.p+m --+ ]R.P be a Borel-measurable function
satisfying
E\]I(Z(O);()) = o. (12.74)
Then, an M-estimator en of () corresponding to the score function \]I is
defined as a measurable solution to the estimating equation (in t E ]R.P)
n
L \]I(Z(Si); t) = 0 . (12.75)
i=l
12.5 Bootstrap for Irregularly Spaced Spatial Data 321

This class of M-estimators covers many common estimators, such as the


sample moments, the maximum likelihood estimators of parameters of
Gaussian random fields, and the pseudo-likelihood estimators in certain
conditionally specified spatial models, like the Markov Random field mod-
els (cf. Cressie (1993); Guyon (1995)). The asymptotic distribution of On
depends, among other factors, on the spatial sampling density f(x) and
on the relative growth rates of the sample size n and the volume of the
sampling region Rn, given by vol.(Rn) = A~ . vol.(Ro). Here, we suppose
that
n/ A~ ---t ~ for some ~ E (0,00] . (12.76)
When ~ E (0,00), the sample size n and the volume of the sampling region
Rn grow at the same rate. In analogy to the fixed design case (cf. (12.3)),
we classify the resulting asymptotic structure as the pure increasing domain
asymptotic structure under the stochastic design. On the other hand, when
~ = 00, the sample size n grows at a faster rate than the volume of Rn
and, therefore, any given subregion of Rn of unit volume may contain an
unbounded number of sampling sites as n ---t 00. Thus, for ~ = 00, the
sampling region Rn is subjected to infill sampling, thereby resulting in a
mixed increasing domain asymptotic structure in the stochastic design case.
As we will shortly see in Theorem 12.6 below, these spatial asymptotic
structures have nontrivial effects on the asymptotic distribution of the M-
estimator On.
To state the results, we now introduce some notation. Let W1, ... , wp
denote the components of wand let D.., be the p x p matrix with (q, r )th el-
ement E[at Wq(Y(O); 0)] where 1 ~ r, q ~ p. For a E Z~, write D"'W(Z; 0)
for the a-th order partial derivative in the O-coordinates. Let

E..,,~ = r..,(O)· ~-1 + r r..,(s)ds. iRar f2(X)dx,


i~d
(12.77)

~ E (0,00], where r..,(s) = Ew(Z(O))W(Z(s))', s E jRd, and where we set


~ -1 = 0 for ~ = 00. Let a(a; b) denote the strong mixing coefficient of
the multivariate spatial process {Z(s) : s E jRd}, defined by (12.15) and
(12.16), with Fz(S) == a({Z(s) : s E S}), S C jRd. We shall suppose that
there exist constants C, T1 E (0,00), and T2 E [0,00) such that

(12.78)

for any d ~ 2 and for d = 1, (12.78) holds with T2 = o. As before, let G(m)
denote the marginal distribution of Z(O), Le., G(m)(A) = P(Z(O) E A),
A E 8(JRm). Also, recall that for a positive-definite matrix E of order kEN,
<p(.; E) denotes the probability measure corresponding to the k-dimensional
Gaussian distribution with mean 0 and variance matrix E.
Next, suppose that {Xn}n~l and {Z(s) : s E jRd} are defined on a com-
mon probability space (0, F, P). Let Px denote the joint probability distri-
bution of the sequence {Xn}n~l and let PIX and E.lx, respectively, denote
322 12. Resampling Methods for Spatial Data

the conditional distribution and expectation, given Xoo == 0'( {Xn : n 2 I}).
We now state the main result of this section that asserts consistency and
asymptotic normality of the multivariate M-estimator en conditional on
the sequence of iid random vectors Xl, X 2, ....

Theorem 12.6 Suppose that (12.75) has a unique solution, that (12.76)
holds, and that the following conditions hold:

(C.4) The density f(x) of Xl is continuous on cl.(Ro).

(C.5) For some 7]0 E (0,00), llJ(z; t) has continuous second-order partial
derivatives with respect to t on the set {lit - 811 ::; 7]0} == 8 0 for
almost all z (G(m)).

(C.6) The p x p matrices D'J! and Jf'J!(s)ds are nonsingular.

Also suppose that with r = 1, the following conditions holds:

(C.7)r For some 0 E (0,00),

(a) EIIlJ(Z(O); 8)1 2 rH < 00 ,


(b) max EID"'IlJ(Z(O); 8)1 2 rH < 00 ,
1"'1=1
(c) Esup{ID"'IlJ(Z(O) : t)1 2rH : t E 8 0, 10:1 = 2} < 00 ,
(d) Tl > (2r - 1)(2r + o)d/o, and
(e) for d 22,0::; T2 < (Tl - d)/4d .

Then, for both the pure- and the mixed-asymptotic structures (i.e., for '~ E
(0,00) , and for '~ = +00', respectively),

~~~ IPIX(A~/2(en-8) E A) -q?(A;Dq;lI:'J!,~(D'J!)-l)/)1 = 0(1) a.s. (Px )


(12.79)
where C is the collection of all measurable convex sets in IRP.

Proof: See Lahiri (2003d). o

When the solution to equation (12.75) is unique and the other condi-
tions of Theorem 12.6 hold, (12.79) shows that en is consistent for 8 for
almost all realizations of the random vectors Xl, X 2, ... , i.e., On ---+ 8 in
Pix-probability, a.s. (Px ). (See Definition 12.3, (12.89), and (12.90) in
Section 12.5.4 below for a precise definition of this notion of convergence
and its connection with the usual notion of convergence in probability.)
When the uniqueness condition on the solution to (12.75) does not hold,
Lahiri (2003d) shows that a consistent sequence of solutions of (12.75) ex-
ists. For the nonunique case, conclusions of Theorem 12.6 remain valid for
this sequence of solutions.
12.5 Bootstrap for Irregularly Spaced Spatial Data 323

A notable feature of Theorem 12.6 is that under both types of asymptotic


structures, the rate of convergence of the M-estimator is Op(,X;:;-d/2). Note
that this rate is comparable to the rate Op(n- 1 / 2 ) for the "pure increasing
domain" case (i.e., for Ll E (0,00)), but it is slower than Op(n- 1 / 2 ) under
the "mixed increasing domain" case (i.e., for Ll = +00). The slowness of
the convergence rate in the "Ll = +00" case compared to the standard
Op(n- 1 / 2 ) rat~ is due to the "infill" sampling component, which leads to
conditions of long-range dependence in the sampled data values. Also note
that the asymptotic variance matrices under the two asymptotic structures
are different. The infill component in the "mixed increasing domain" case
leads to a reduction in the asymptotic variance matrix of On as the positive-
definite matrix r(O) drops out for Ll = 00. Thus, the asymptotic variance
of the estimator t~On of the linear parametric function t~(), to E 1l~.P, to i=
0, becomes smaller in the mixed increasing domain case by the positive
additive factor t~r(O)to.
Although block bootstrap methods do not always work under strong
dependence (cf. Chapter 10), we now present a block bootstrap method
that remains valid, not only in the "pure increasing domain" case, but
also in presence of such strong dependence under the "mixed increasing
domain" case.

12.5.3 A Spatial Block Bootstrap Method


Let Rn = 'xnRO be the sampling region. Suppose that the process Z(.)
is observed at locations {S1,"" sn} == Sn, where Si'S are generated by
the stochastic design of Section 12.5.1. The block bootstrap method for
the irregularly spaced sampling sites attempts to recreate a version of the
process Z (.) over Rn using a mechanism similar to that used for the grided
data. Let {,Bn}n~1 be a sequence of positive real numbers satisfying (12.4),
i.e., satisfying
,13;;1 + ,X;:;-1,Bn = 0(1) as n -+ 00 .
Also, let Rn = UkEKn Rn(k) be the partition of the sampling region Rn
by the subregions Rn(k), k E Kn of (12.5), where, recall that Rn(k) ==
(k+U),BnnRn, Kn == {k E Zd : (k+U),BnnRn i= 0} and U = [0, l)d. We now
define the bootstrap version of the Z (.)- process over each of the subregions
Rn (k), k E K n , using the collection {i + U,Bn : i E In} of overlapping cubes
of volume ,B~, where as in Section 12.3, In == {j E Zd : j +U,Bn eRn}. For
k E K n , define Bn(i; k) = [i+U,Bn] n [Rn(k) - k,Bn +i], i E In. Then, for any
given k E K n , Bn(i; k) has the same shape as the subregion Rn(k) for all
i E In. We call {Bn(i; k) : i E In} the collection of observed blocks of "type
k". To define the bootstrap version of the Z(·)-process over Rn(k), we now
select a block of "type k" by resampling at random (and independently
of the other resampled blocks) from the collection {Bn(i; k) : i E In}.
Formally, let {h : k E Kn} be (conditionally) iid random variable with
324 12. Resampling Methods for Spatial Data

common distribution (12.6), i.e.,


. 1
P(Ik =~) = lIn I' i E In ,

for k E IC n . Then the collection of resampled blocks is given by {B(h; k) :


k E IC n }. Without loss of generality, we suppose that the variables {h : k E
IC n }n>l are defined on the same probability space (n, Y, P) supporting the
variables {Z(s) : s E IRd} and {Xn : n 2 I}. Next, write Zn(A) = {Z(s) :
s E Sn n A} to denote the collection of observations over a set A C JRd.
Then, the bootstrap version of the Z(·)-process over the k-th subregion is
given by
(12.80)
The bootstrap version Z~(Rn) of the random field Z(·) over the entire sam-
pling region Rn is now obtained by pasting together the copies Z~(Rn(k)),
k E IC n :

Although all the blocks Bn(i; k), i E In of "type k" have the same shape as
the subregion Rn(k) in the stochastic design case, each of them may contain
a different number of sampling sites, as the sampling sites Sl, . .. ,Sn are
randomly distributed over the sampling region Rn. Since the bootstrap
version of the process over Rn(k) is defined by randomly selecting one of
the "type k" blocks, the number of the resampled observations over Rn(k)
is typically different from the number of observations in Rn(k) itself. Let
L'k == L'k n denote the size of the resample Z~(Rn(k)) over the subregion
Rn(k), k' E IC n . Also, let n* = 2:kEKn L'k denote the total number of the
resampled values over the sampling region R n , i.e., n* is the size of Z~(Rn).
Although the resample size n * is typically different from the original sample
size n, it can be shown that
E.lx(n*)
--'----'-...:... ~ 1 as n ~ 00, a.s. (P:x). (12.81)
n
We define the bootstrap version of a statistic in = tn(Zn(Rn)) by
(12.82)
In particular, the bootstrap version of the sample mean Zn
n- 1 2:~=1 Z(Si) is given by

Z~ = L S~(k)/n* , (12.83)
kEKn

where S~(k) is the sum of the L'k-many resampled values Z~(Rn(k)) over
the subregion Rn(k), k E IC n . For later reference, we also define the boot-
strap version of the normalized M-estimator
_ d/2
Tn - An (On - 0) ,
A
12.5 Bootstrap for Irregularly Spaced Spatial Data 325

where () and On are respectively given by (12.74) and (12.75). As explained


in Section 4.3, there is more than one way of defining the bootstrap version
of Tn. Here, we follow the approach of Shorack (1982) that centers the
score function to define the bootstrap version of the M-estimator On. For
k E Kn and t E ]RP, let S~(k; t) denote the sum of all W(Z(Si); t)-variables
corresponding to the Z(Si)'S in the resample Z~(Rn(k)), i.e., S~(k; t) =
2:~=1 W(Z(Si); t)D.(Si E Bn(h; k)) (cf. (12.80)). Then, the bootstrap version
()~ of On is defined as a measurable solution (in t) of the equation

L [S~(k;t)-E*S~(k;On)] =0, (12.84)


kEiC n

where, in this section, P* and E*, respectively, denote the conditional prob-
ability and the conditional expectation given g == a( {Xn : n :::: I} U {Z( s) :
S E ]Rd}). The bootstrap version of Tn is now given by

T~ = >.~/2(()~ - On) . (12.85)

Note that centering ()~ at On in the definition of T;; is permissible as the


(conditional) expected value ofthe left side of (12.84) at the true "bootstrap
parameter" value t = On is zero. An alternative boot:strap version of Tn is
given by T;;* = >.:e(()~ - On). In view of (12.81), both bootstrap versions
of Tn have the same limit distribution. In this book, we shall consider T;;
only.
In the next section, we describe the properties of the spatial block boot-
strap method under the stochastic design of Section 12.5.2 in variance and
distribution function estimation problems. For a closely related version of
the spatial block bootstrap method and its properties under a homoge-
neous marked Poisson process set up, see Politis, Paparoditis and Romano
(1998, 1999).

12.5.4 Properties of the Spatial Bootstrap Method


First we consider properties of the block bootstrap variance estimator for
the sample mean. Suppose that {Z(s) : S E ]Rd} is a univariate (m = 1)
stationary random field. Note that in this case, the sample mean Zn =
n- 1 2:~=1 Z(Si) is the M-estimator of the population mean p, = EZ(O)
corresponding to the score function w(x; t) = x - t, x, t E ]R, with p = 1 =
m. If the conditions of Theorem 12.6 are satisfied for this choice of '11('; .),
then it follows from Theorem 12.6 that for any A E (0,00],
>.~/2(Zn - p,) --+d N(O; a~,~) a.s. (Px), (12.86)
where, with a(s) = EZ(O)Z(s), S E ]Rd, the asymptotic variance a~ ~ is
~~as '
a~,~ = a~) + r a(s)ds iRa
i~d
r f2(x)dx. (12.87)
326 12. Resampling Methods for Spatial Data

It can be shown (cf. Lahiri (2003d)) that a~,LJ. is not only the asymptotic
variance of A~/2 (Zn - /L), but it is also the exact limit of A~ . Var.lx(Zn), a.s.
(Px ). This shows that both the spatial sampling density f and the type of
the asymptotic structure (viz., pure increasing domain and mixed increas-
ing domain asymptotic structures) have nontrivial effects on the variance
- d/2 -
of Zn. Under both asymptotic structures, the variance of An Zn takes the
minimum value when the design density f is uniform over Ro. On the other
hand, as noted earlier, the infill component of the mixed increasing domain
asymptotic structure (with "b. = 00") leads to a reduction in the variance
of the scaled sample mean A~/2 Zn. Inspite of the variations in the form of
the asymptotic variance due to these factors, the block bootstrap method
provides a "consistent" estimator of a~ LJ. in all cases, as shown by the
following result. '
Theorem 12.7 Suppose that Conditions (0.4) and (0. 7)r of Theorem
12.6 hold with p = 1 = m, r = 3 and 1]!(x; t) = x - t. Also, suppose
that there exists <5 E (0, 1) such that
f3;;1 + A;;(l-O) f3n = 0(1) as n ---+ 00 (12.88)
and that (12.76) holds. Then, for almost all realizations of Xl, X 2 , ... under
Px,
A~.Var*(Z~) ---+ a~,LJ. in P.lx-probability, a.s. (Px)
where Z~ is as defined by (12.83).
Proof: See Lahiri (2003d) D

Thus, it follows that the block bootstrap variance estimator provides a


valid approximation to the variance of the sample mean under both pure-
and mixed-increasing domain asymptotic structures for a large class of
design densities f that can be nonuniform over Ro. Because the variance
estimator can be computed without the knowledge of the sampling density
f, it enjoys considerable advantage over traditional "plug-in" estimators
of a~ LJ. that require one to estimate each component of the population
paran::eter a~ LJ. explicitly.
Next note that the convergence of the bootstrap variance estimator in
Theorem 12.7 is asserted "in Pix-probability, a.s. (Px )," which is some-
what nonstandard. A precise definition of this notion of convergence is
given as follows:
Definition 12.3 Let {t n (Z, X)}n2:l be a sequence of random variables on
(n, F, P) such that tn(Z, X) is measurable with respect to the a-field 9 ==
a({Z(s) : s E jRd} U {Xn : n 2: I}), and let a E R Then, we say that
tn(Z, X) ---+ a in Pix-probability, almost surely (Px), if

Px Cl.!.~ PIX (Jtn(Z, X) - aJ > E) = 0 for all E> 0) = 1. (12.89)


12.5 Bootstrap for Irregularly Spaced Spatial Data 327

It is easy to show that the event on the left side of (12.89) is in :F as


we may (without loss of generality) restrict attention to a countable set
of E > O. It is easy to see that (12.89) implies the more familiar notion of
"convergence in probability," i.e., (12.89) implies that

lim p(ltn(Z, X)I >


n->oo
E) = 0 for all E> 0 . (12.90)

The reason for stating Theorem 12.7 using the nonstandard notion of con-
vergence is that it allows us to interpret consistency of the bootstrap vari-
ance estimator for almost all realizations of the stochastic design vectors
Xl, X 2 , . ..• Thus, once the values of Xl, X 2 , ..• are given, i.e., once the
locations of the sampling sites are given, we may concern ourselves only
with the randomness arising from the random field Z(·) and the bootstrap
variables {h : k E Kn} n> 1, by treating the locations {Sl' ... , Sn} as non-
random. However, the usual notion of "convergence-in-probability" (viz.,
(12.90)) does not allow such an interpretation in the stochastic design case.
Next we consider properties of the bootstrap distribution function es-
timators. As in Section 12.5.2, here we suppose that the random field
{Z(s) : s E ~d} is stationary and m-dimensional for some mEN. Let
en denote the M-estimator of the p-dimensional parameter e based on
d/2
Z(sd, ... , Z(sn), as defined by (12.75). Let Tn = An (en - e) be the nor-
A

malized version of en and let T~, defined in (12.85), be its bootstrap version.
Then, we have the following result.

Theorem 12.8 Suppose that e~ is a unique solution of (12. 84}. Also, sup-
pose that Conditions (C.4), (C.5), (C.6), and (C.7}r of Theorem 12.6 hold
with r = 3 and that (12.76) and {12.88} hold. Then,

sup Ip*(T~ E A) - Plx(Tn E A)I---+ 0 in Pix-probability, a.s. (Px )


AEC
(12.91 )
under both pure- and mixed-increasing domain asymptotic structures (i. e.,
for ~ E (0, 00) and for ~ = oo}, where C is the collection of all measurable
convex subsets of ~p.

Consistency of the bootstrap distribution function estimator continues


to hold even when the bootstrap estimating equation (12.84) has multi-
ple solutions. In this case, there exists a sequence of solutions {e~}n>l of
(12.84) that approximates {en}n>l with OP. (A;:;-d/2(10gnn accurac~ for
some c> 0, and (12.91) holds for this sequence of solutions, as in Section
4.3.
Like the variance estimation problem, the spatial bootstrap provides a
valid approximation to the distribution of M-estimators under the stochas-
tic design, allowing nonuniform concentration of sampling sites and infill
sampling of subregions in Rn. We point out that in contrast to the results
328 12. Resampling Methods for Spatial Data

of Chapter 10, here the block bootstrap method remains valid even in pres-
ence of a particular form of "strong" dependence in the data, engendered
by the mixed increasing domain asymptotic structure.

12.6 Resampling Methods for Spatial Prediction


In this section, we consider two types of prediction problems. For the first
type, we suppose that {Z(s) : s E ]R.d} is a random field with a continuous
spatial index and the objective is to predict fRn g(Z(s))ds, the integral of a
function 9 of the process Z (.) over the sampling region R n , on the basis of a
finite sample. We describe resampling methods for this type of problems in
Section 12.6.1. The other type of prediction problem we consider here is in
the context of best linear unbiased prediction (or Kriging) of a "new" value
on the basis of a finite set of observations. This second type of problem is
addressed in Section 12.6.2.

12.6.1 Prediction of Integrals


Suppose that g: ]R. -> ]R. is a bounded Borel-measurable function and that
we wish to predict

Aoo == Aoo,n = r g(Z(s))ds


lRn
(12.92)

on the basis of finitely many observations lying in the sampling region R n ,


where Rn = AnRo is as described in Section 12.2 (cf. (12.1)). Here we use
the hat n in Aoo to indicate that it is a random quantity, while we use
the subscript 00 in Aoo to indicate that it is a functional of the totality
{Z(s) : s ERn} of random variables in Rn and is unobservable. In order
to predict Aoo consistently, we adopt a sampling framework that fills in
any subregion of Rn with an increasing number of sampling sites. More
precisely, let {'I]n}n>l be a sequence of positive real numbers such that
'l]n 1 0 as n -> 00. We suppose that the process Z(·) is observed at each
point of the scaled down integer grid 'f/n . Zd that lies in the sampling region
Rn. Thus, the sampling sites are given by
(12.93)

Although we use the same symbol Sn to denote the collection of all sampling
sites in Sections 12.3, 12.5, and in here, the size of the set Sn is different in
each case, depending on the spatial design. For the rest of Section 12.6.1,
we shall use N 2n to denote the size of Sn. Then, under Condition B on the
boundary of the prototype set Ro, the sample size N2n satisfies the relation

N 2n = vol.(Ro)· 'f/.;;-d)..~(1 + 0(1)) as n -> 00 . (12.94)


12.6 Resampling Methods for Spatial Prediction 329

°
Since 'fJn 1 as n ---+ 00, this implies that the sample size N2n grows at a
faster rate than the volume of the sampling region Hr.. Thus, the resulting
asymptotic structure is of the "mixed increasing domain" type, with a
nontrivial infill component. A predictor of 1:::..00 based on the observations
{Z(s): s E Sn} is given by

I:::.. n = Ni} L g(Z(s)) . (12.95)


sESn

°
Under mild conditions on the process Z(·) and the function g(.), I:::.. n is L2_
consistent for 1:::..00 in the sense that E(l:::.. n - 1:::..(0)2 ---+ as n ---+ 00. The rate
at which the mean squared prediction error (MSPE) E(l:::.. n - 1:::..(0)2 goes
to zero depends on both the increasing domain scaling parameter {An}n>l
and the infill scaling parameter {'fJn}n::::l.
Lahiri (1999b) considers the spatial cumulative distribution function
(SCDF)
Foo(zo) = f n(Z(s)::; zo)ds, Zo (12.96)
lRn
E]R ,

corresponding to g(.) = n(· ::; zo), Zo E ]R in (12.92). The corresponding


predictor, given by (12.95), is

Fn(zo)
A
= N2n1 ~
' " n(Z(s) ::; zo), Zo E ]R , (12.97)
sESn

the empirical cumulative distribution function (ECDF). A result of Lahiri


(1999b) shows that under some regularity conditions on the process Z(·),

for some constant c(zo) E (0,00). Furthermore, the scaled process

(12.98)

converges in distribution to a zero mean Gaussian process W as random ele-


ments of the Skorohod space 11))1 of right continuous functions from [-00,00]
to ]R with left hand limits. The covariance function of W(·) is given by

Cov(W(zt}, W(Z2)) = [vol.(RO)]-l L a(~)


a.
. ( D a G 2(zl, Z2; s)ds ,
lJRd
lal=2
(12.99)
Zl, Z2 E ]R, where a(a) = fu fu {(x - s)a - (x - Id)a - (ld - s)a}dsdx,
U = [0,1)d, Id = (1, ... ,1)' E ]Rd, and where G 2(Zl, Z2; s) == P(Z(O) ::;
Zl, Z(s) ::; Z2), Zl, Z2 E ]R denotes the bivariate joint distribution function
of (Z(O), Z(s))', s E ]Rd.
330 12. Resampling Methods for Spatial Data

Lahiri (1999b, 1999d) describes subsampling and bootstrap methods for


such integral-based prediction problems. First we describe the subsampling
method, which allows us to describe the main ideas more transparently. Let
{,8n}n~l and bn}n~l be two sequences of positive real numbers such that
"Yn is a multiple of'rJn (i.e., 'rJ;/"Yn EN for all n 2: 1) and

(12.100)

and
(12.101)

Here ,8n will be used to construct the blocks or subregions of R n , while "Yn
will be used to construct a subsample version of the Z (. )- process on the
subregions at a lower level of resolution. As in Sections 12.3-12.5, the re-
quirement (12.101) says that the volume of the subregions grow to infinity,
but not as fast as the volume ofthe original sampling region Rn. Similarly,
the conditions on bn}n~l given by (12.100) say that "Yn tends to zero but
at a slower rate than the original rate 'rJn of infilling. Thus, the scaled grid
"Yn7l,d is a subgrid of'rJn7l,d for any n 2: 1 and, therefore, has a lower level of
resolution. For a given subregion Rn,i (say), we use the observations in Rn,i
on the finer grid 'rJn7l,d to define the subsample copy of the unobservable pre-
dictand ADO and the observations in Rn,i on the coarser grid "Yn7l,d to define
the subsample copy of the predictor An. Here we only consider overlapping
subregions Rn,i'S; a nonoverlapping version of the subsampling method
can be defined analogously by restricting attention to the sub collection of
nonoverlapping subregions only. Let U o = (-~, ~ld denote the unit cube in
!Rd , with its center at the origin. Also, let IOn = {i E 7l,d : 'rJni+Uo,8n eRn}
be the index set of all cubes of volume ,8~ that are centered at 'rJni E 'rJn7l,d
and are contained in Rn. Then, the subregion Rn,i is defined by inscribing
a scaled down copy of the sampling region Rn inside i + Uo,8n such that
the origin is mapped onto i (cf. Section 12.4.3). Specifically, we let

Rn,i = i + ,8nRo, i E IOn.

Note that Rn,i has the same shape as the original sampling region R n , but
a smaller volume, ,8~vol.(Ro), than the volume A~vol.(Ro) of Rn. Next, we
define the subsample versions of ADO and An for each i E Ion. To that end,
note that Rn,i's are congruent to ,8nRo == Rn,o and that the numbers of
sampling sites in Rn,i over the finer grid 'rJn7l,d and over the coarser grid
"Yn7l,d are respectively the same for all i. Let Ln == Land Rn == R denote
the sizes of the sets ,8nRo n 'rJn7l,d and ,8nRo n "Yn7l,d, respectively. For each
i E Ion, we think ofthe L observations {Z(s) : s E Rn,in'rJnZd} on the finer
grid as the analog of {Z(s) : s ERn} and the R observations {Z(s) : s E
Rn,i n "Yn7l,d} as the analog of the original sample {Z(s) : s E Rn n'rJn7l,d},
at level of the subsamples. Hence, we define the subs ample versions of ADO
12.6 Resampling Methods for Spatial Prediction 331

and .6.. n on Rn,i as


L- 1
I:
sE7)n 'il d nRn "
g(Z(s))

£-1
I:
sE,n'ildnRn,i
g(Z(S)) , (12.102)

i E Ton. Then, for a random variable of interests Tn = tN 2n (.6.. n ; .6.. CXJ ), its
subsample version on the subregion Rn,i is defined as
(12.103)
Note that we use tp in the definition of T:' i' as £ is the analogous quantity
to the sample size N 2n at the level of subs~mples. The subsample estimator
of G n (-) == P(Tn :::; .) is now given by

G~(x) = ITon l- 1 I: n(T:',i :::; x), x E JR . (12.104)


iEIOn

The subs ample estimator of a functional <pO of ~n is given by the "plug-


in" rule <p(G~). For example, with Tn = an(lln -llCXJ), where {a n }n>l is a
sequence of scaling constants, the subsample estimator of the scaled MSPE
A 2_ 2 2' .
of lln' ETn = anE(lln - llCXJ) ,IS gIven by
A A

MSPEn == ITon l- 1 I: {ap(ll~,i -ll~,i)}2 .


iEIOn

The subsampling method applies not only to prediction problems involv-


ing finite dimensional predictands like .6.. CXJ ' but it also applies to infinite
dimensional predictands. Lahiri (1999b) proves the validity of the subs am-
pIing method for predicting the SCDF FCXJ . For each Zo E JR, define the
subsampling versions of FCXJ of (12.96) and of Fn of (12.97) as

F~,i(ZO) L- 1
I:
SE7)n 'il d nRn "
n(Z(s) :::; zo) (12.105)

F:',i(ZO) C- 1 I:
sE,n'ildnRn,i
n(Z(s) :::; zo) , (12.106)

Zo E JR, i E Ton. Let w : JR ---+ [0, (0) be a measurable function and let
(12.107)
where for any function h : JR ---+ JR, we write IlhllCXJ,w = sup{lh(x)lw(x) :
x E JR}. Then, the subsampling estimator of the sampling distribution
GIn (-) == P(Tln :::; .) is given by

Gtn(a) = ITonl- 1 I: n(b~2jJ~)1/21IF:"i - F~,iIlCXJ:::; a), a E JR.


iEIOn
(12.108)
332 12. Resampling Methods for Spatial Data

We now show that Gin is a consistent estimator of G 1n . Recall (cf. (12.99))


that G 2(Zl,Z2jS) == p(Z(O) :::; Z1, Z(s) :::; Z2), Z1,Z2 E JR denotes the
bivariate distribution function of (Z(O), Z(s))'. Define its two-sided variant

Zl, Z2 E JR, s E Rd. Also, define the p-mixing coefficient of the random field
Z(·) by

p(kjm) sup {P(YZ(Sl), YZ(S2)) : S1, S2 E B(JRd ),

vOl.(Sl) :::; m, vol.(S2) :::; m, dist.(Sl, S2) ~ k} ,


k,m E (0,00), where dist.(SI,S2) denotes the distance between the sets A
and B in the £1- norm, Yz(T) = O"({Z(s) : sET}) for any T C JRd and
where the p-mixing coefficient between two O"-fields Y 1 and Y 2 are defined
as usual as in Chapter 3 (cf. (3.4)):

Then, we have the following result:

Theorem 12.9 Suppose that the following conditions hold:

(C.S) There exist positive constants C, Tl, T2 satisfying T1 > 3d and T2 <
Td d such that

p(kj m) :::; C· k- T1 m T2 for all m, k E [1,00) .

(C. 9) {Z (s) : s E JR d} is stationary and the marginal distribution function


G(·) of Z(O) is continuous on R

(C.l0) For each Z1, Z2 E JR, G 2(Zl, Z2j·) has

(i) bounded and integrable partial derivatives of order 2 on JRd (with


respect to the Lebesgue measure), and
(ii) for lal = 2, there exist nonnegative integrable functions
H,,(Zl, Z2j·) such that for all s, t E JRd with IItll :::; 1,

ID"G2(ZI' Z2j S + t) - D a G 2(ZI, Z2j s)1 :::; IItll'; Ha(ZI' Z2; s)


for some 8 E (0,00), (not depending on Zl. Z2).
12.6 Resampling Methods for Spatial Prediction 333

(C.ll) There exist constants C > 0, 'I E (~, 1] such that

L ID a C2(ZI, Z2; s)1 ::; CIG(Z2) - G(zdl'Y


lal=2

(C.12) [77~2'Y+I)2A~rl + [77nAn/IOgAnrl ----+ 0 as n ----+ 00 where 'I is as


in Condition (C.ll).

(a) Then, there exists a zero mean Gaussian process W such that

(12.109)

as ][))I -valued random variables under the Skorohod metric, where


the Gaussian process W has continuous sample paths with prob-
ability one, W( +(0) = W( -(0) = 0 a.s., and where the covari-
ance function ofW is given by (12.99).
(b) Suppose that, in addition to the above conditions, (12.100) and
(12.101) hold and that Condition (C.12) holds with An replaced
by f3n. If IIWlloo,w has a continuous distribution on JR., then

sup IGin(a) - GIn(a)l----'>p 0 as n ----+ 00 . (12.110)


aEiFt

Proof: See Lahiri (1999b). o

For discussion of the Conditions (C.8)-(C.12) and their verification for


instantaneous transformations of a Gaussian random field, we refer the
reader to Lahiri (1999b). Theorem 12.9 can be used to construct valid
simultaneous prediction intervals for the unobservable SCDF F00 over any
given interval (a, b) by choosing the weight function w(·) to be w(·) ==
ll(a,b)(')' In particular, to construct a prediction band over the whole real
line, we set w(x) == 1 for all x E R Let q~ n denote the a-th quantile
of Gin' 0 < a < 1. Note that, if Ttl:! ::; .. '. ::; Ttl:l denote the order
statistic corresponding to the subsample version T{n i' i E Ion and I ==
IIonl, then q~,n is given by the lIaJth order statist'ic TtllaJ:I' A large
sample 100(1 - a)% simultaneous prediction interval for Foo is given by

where w(x) == 1 for all x E JR. and ]PI is the collection of all probability
distribution functions on R Then, under the conditions of Theorem 12.9,
this prediction band attains the nominal coverage probability as n ----+ 00.
334 12. Resampling Methods for Spatial Data

The subsampling method presented here can be extended to grided spa-


tial data with a similar mixed asymptotic structure, where the grid is not
necessarily rectangular. See Lahiri, Kaiser, Cressie and Hsu (1999) for an
extension of the subsampling method to a hexagonal grid structure in ]R2.
Finite sample properties of the subsampling method for the square grid is
studied by Kaiser, Hsu, Cressie and Lahiri (1997).
For the sake of completeness, we next briefly describe a spatial boot-
strap method for the prediction problem. Let bn}n~l and {,Bn}n~l be
two sequences of positive real numbers satisfying (12.100) and (12.101),
respectively. Also, suppose that ,Bn, 1'n'T/;;1 E N for all n ?: 1. Let
K~ = {k E Zd : (k + UO),Bn n Rn =1= 0} and let R~(k) == (k + UO),Bn n Rn,
k E K~. Then,
Rn = R~(k)U (12.111)
kEIC~

is a partition of the sampling region Rn based On cubes having volumes f3~


and having centers at k,Bn E Zd. This differs from the partition (12.5)
in that the cubes (k + UO),Bn in (12.111) are centered at k,Bn E Zd,
while the cubes (k +U),Bn in (12.5) are centered at k + (~, ... , ~)',Bn. Let
K~n == {k E K~ : (k + UO),Bn C Rn} denote the index set of all interior
cubes. For bootstrapping in the context of the prediction problem, we de-
fine a version of the random field Z(·) On the finer scale 'T/nZd over each
of the subregion R~(k), k E KIn. Bootstrap versions over the boundary
subregions R~(k), k E K~\K~n are ignored as it is difficult to assess the
order of the discrepancy between the bootstrap version of the predictor
and the predictand, owing to possible irregular form of the boundary. Un-
der Condition B On the boundary of Ro, the effect of this modification is
negligible, asymptotically. In the context of time series data, this is equiv-
alent to defining the bootstrap version of a chain Xl' ... ' Xn of size n by
resampling complete blocks of length l, say, and generating a bootstrap
chain Xi, ... ,X~l of length nl == bl.
As in the case of the subsampling method, let 'Ion = {i E Zd : i'T/n +
Uo,Bn C Rn} denote the index set of all translates of the cube ,BnUo of
volume ,B~ that are centered On the finer grid 'T/nZd and that are contained
in the sampling region Rn. The bootstrap version of the process Z (.) over
each R~(k), k E K~n is defined using a randomly selected subregion from
the collection {i'T/n + Uo,Bn : i E 'Ion}. More precisely, let {I2 :
k E K~n} be
a collection of iid random variables with commOn distribution

P(IP = i) = 1'I~n I' i E 'Ion .

Then, for each k E K~n' the resampled subregion is given by R~(k)


+ ,BnUO· We define the bootstrap version ~~,n of .:i(X),n by using
12'T/n
the observations from all the subregions R~(k), k E K~n' corresponding to
data-sites located On the finer grid 'T/nZd, and similarly, define the bootstrap
12.6 Resampling Methods for Spatial Prediction 335

version of An using the observation from the same collection of resampled


subregions, but only corresponding to those data-sites that are located on
the coarser grid "(n7l,d. Thus,

~~,n = L L g(Z(s)) / L i1JnZd n R~(k)i, (12.112)


kEK?n SET/n ZdnR ;:, (k) kEKYn

g(Z(s)) , (12.113)

where N3n = 2:kEKoin i"(n7l,d n R~(k)i denotes the number of observations


on the coarser grid over UkEKoin R~(k). With this, the bootstrap version of
a random variable of interest Tn == tN2n (An' Aoo) is given by

(12.114)

Bootstrap estimators of the sampling distribution Gn(x) = P(Tn :::; x),


x E ~ may be obtained as usual by considering the conditional distribution
Gn(x) = P*(T;; :::; x), x E jR of T;; given {Z(s) : s E ~d}.

12.6.2 Prediction of Point Values


In this section, we consider a stationary Gaussian process {Z (s) : s E
jRd} with constant mean J1 = EZ(O) and autocovariance function (or the
covariogram)
a(h) = Cov(Z(O), Z(h)), hE]Rd . (12.115)
Given a set of observations {Z(Sl), ... , Z(sn)} on the spatial process at a
finite set of locations Sl, ... ,Sn C jRd, suppose that we are interested in
predicting the value of the spatial process at a new (unobserved) location
So E jRd\ {S1, ... , sn}. (Unlike the earlier sections of this chapter, in this
section we do not assume that Sl, . .. ,Sn are necessarily specified by one
of the spatial sampling designs considered so far.) A widely used method
for optimal linear prediction of Z(so) based on {Z(Sl), ... , Z(sn)} is the
method of kriging, named after a South African mining engineer, D. G.
Krige (cf. Chapter 3, Cressie (1993)). Given the variables Z(sd, ... , Z(sn),
it seeks to find an optimal predictor from the class

(12.116)

of linear unbiased predictors of Z (so), by minimizing the mean squared


prediction error E(Z(so) - 2:~=1 WinZ(Si))2. An explicit formula for the
best linear unbiased predictor (BLUP) .2'n(SO) of Z(so) can be worked out
using standard optimization techniques from Calculus. Let L: n denote the
336 12. Resampling Methods for Spatial Data

n X n matrix with (i,j)-th entry CJ(Si - Sj), 1:S i,j:S n and let In denote
the n x 1 vector with i-th entry CJ(so - Si), 1 :S i :S n. Then, the BLUP
Zn(SO) of Z(so) is given by

Zn(SO) = (Z(Sl), .. " Z(Sn))'~~l (In + In(l - 1~~~11n)[I~~~11nl-1),


(12.117)
where In = (1, ... ,1), E lR. n is the n x 1 vector of l's. The associated
prediction error is

T~ E( Zn(SO) - Z(so) f
CJ(0)2 _ y'n~-ly + (I' ~-ly _ 1)2(1' ~-11 )-1
nn nnn nnn'

(12.118)

If the covariogram CJ(.) is known, then Zn (so) can be used as a predictor


of Z(so) and a 100(1 - 2a)% (0 < a < 1/2) prediction interval for Z(so)
can be constructed as

(12.119)

where Z", is the upper a critical point of the standard normal distribution,
i.e., <I>(z",) = 1 - a, with <1>(.) denoting the distribution function of the
standard normal distribution N(O, 1). Note that in(a) attains the nominal
coverage probability (1 - 2a) exactly, i.e.,

(12.120)

However, in most applications, the function CJ(') is unknown and has to be


estimated from the data. Here, we shall suppose that CJ(') lies in a paramet-
T;
ric family {CJ(.; 8) : 8 E e} of valid covariograms. Let Zn (so; 8) and (8) be
defined by replacing CJ(') by CJ(·;8) in (12.117) and (12.118), respectively.
Also, let en denote an estimator of 8 based on Z(Sl), ... , Z(sn), e.g., the
maximum likelihood estimator of 8. When CJ(') is unknown, one often uses

the "plug-in" predictor Zn(SO; en)

as an "approximate" or "estimated" BLUP of Z(so) and T;(e n ) as an esti-


mator of the mean squared prediction error (MSPE). Substituting these in
(12.119) we get the "plug-in" prediction interval of nominal coverage level
(1 - 2a)
(12.121 )

where Zn(So) == Zn(SO; en) and f;


= T;(e n ). Because of the additional ran-
domness introduced in the prediction interval in (a) through the estimator
en, the actual coverage probability of the plug-in prediction interval in (a)
12.6 Resampling Methods for Spatial Prediction 337

is typically different from the nominal level 1 - 2a. In many situations,


a more accurate interval may be constructed by calibrating the plug-in
interval i n (·), where a different value ao is used to ensure

(12.122)

In general, ao also depends on the true parameter value eo and may not be
known. Sjostedt-DeLuna and Young (2003) suggested using a parametric
bootstrap to calibrate the plug-in interval. Let

7f n (a;(h,e 2 ) == POl (Z(so) E i n(a;e 2 )), 0 < a < 1/2, e 1 ,e2 E e,


(12.123)
where in(a; (h) = [Zn(So; e2 ) - Tn (e 2 )Za, Zn(SO; e2 ) + Tn (e 2 )Za]' Thus, in
this notation, in(a; On) = inCa), and relation (12.122) can be rewritten as

(12.124)

where GnU == 7fn ('; eo, On), eo denotes the unknown true value of the pa-
rameter e, On is the estimator used to define the plug-in interval inCa), and
ao is the unknown calibration level that depends on the distribution of On
and on eo. Because z"! is a decreasing function of "Y, it is easy to verify that
Gnh) is a decreasing function of "Y. Hence, ao can be found by inverting
relation (12.124).
Next we generate an estimator of GnU, using the parametric bootstrap
method. Let Z* (so), Z*(Sl), ... ,Z*(sn) be a collection of Gaussian random
variables with mean 0 and covariances

(12.125)

where Cov* denotes the conditional covariance given Z(sd, ... , Z(sn). Let
e~ be defined by replacing Z(sd, ... , Z(Sn) in the definition of On with
Z*(sd, ... , Z*(sn). Similarly, let Z~(so) and T~ be respectively defined by
replacing Z(sd, ... , Z(sn) and On in the definitions of Zn(SO) == Zn(SO; On)
and Tn(On) by Z*(Sl),"" Z*(sn) and e~. Then, for 0 < "Y < 1/2, the
bootstrap estimator G n h) of G n h) is given by

7fn h; On, e~)


Pen ( Z*(so) E [Z~(so) - T~Z,,!, Z~(so) + T~Z,,!])
(12.126)

The parametric bootstrap estimator an of ao of (12.124) is now given by


inverting the relation
(12.127)
338 12. Resampling Methods for Spatial Data

The bootstrap calibrated prediction interval i:;: (0:) of nominal coverage


level (1 - 20:) is In(a n ; Bn ), i.e.,

(12.128)

In practice, one may find an in (12.127) by Monte-Carlo simulation, by


generating B iid collections of Gaussian variables {Z*i(sO),"" Z*i(sn)} of
size (n + 1) with the covariances given by (12.125) for i = 1, ... , B, then
computing e~i based on {Z~i(Sl)"'" Z~i(sn)} for each i, and then solving
(12.127) by replacing GnU with its Monte-Carlo approximation
B

G~h) = B- 1 L::n(z*i(so) E [2;'i(SO) ±T~iZ'Y])' 0 < 'I < 1/2.


i=l

Sjostedt-DeLuna and Young (2003) established superior coverage accuracy


of the bootstrap calibrated prediction interval i:;:
(0:) under some general
conditions on 0'(-) and Bn. They also extend the methodology to the case of
universal kriging, where the underlying spatial process has a nonconstant
mean structure. We refer the interested reader to Sjostedt-DeLuna and
Young (2003) for further discussion and numerical examples.
Appendix A

For easy reference, here we collect some standard results and definitions
from Probability Theory for independent and dependent random variables
that have been used in this monograph. For proofs and further discussions,
see the indicated references. The first set of definitions deal with the basic
convergence notions.

Definition A.I Let X, {Xn }n>l be random variables and a E lR. be a


constant.

(i) {Xn}n>l is said to converge in probability to a, written as Xn ----';p a,


if for any E > 0,
lim P(lXn - al > E) = 0 .
n ..... oo

(ii) Suppose that {Xn }n> 1 and X are defined on the same probability
space (O,F,P). Then, {Xn}n>l is said to converge to X in prob-
ability, denoted by Xn ----';p X, if (Xn - X) ----';p 0, i.e., for any
E > 0,

lim P(lXn -XI> E) = o.


n ..... oo

(iii) Suppose that {Xn}n>l and X are defined on a common probabil-


ity space (0, F, P). Then, {Xn}n~l is said to converge to X almost
surely (with respect to P) if there exists a set A E F such that
P(A) = 0 and

lim Xn(w) = X(w) for all wE A C •


n ..... oo
340 Appendix A.

In this case, we write Xn -+ X as n -+ 00, a.s. (P) or simply,


Xn -+ X a.s. if the probability measure P is clear from the context.

In general, if Xn -+ X a.s., then Xn ------tp X but the converse is false.


A useful characterization of the convergence in probability is the following
(cf. Section 3.3, Chow and Teicher (1997)).

Proposition A.I Let {Xn}n>l, X be random variables defined on a prob-


ability space (0" F, P). Then Xn ------tp X if and only if given any sub-
sequence {nd, there exists a further subsequence {nk} C {nd such that
X nk -+ X as k -+ 00, a.s.

Definition A.2 Let {Xn}n~l' X be a collection ofJRd-valued random vec-


tors. Then, {Xn}n~l is said to converge in distribution to X, written as
Xn ------t d X if
lim P(Xn E A) = P(X E A) (A.l)
n->oo

for any A E B(R d) with P(X E 8A) = 0, where 8A denotes the boundary
of A.

For d = 1, i.e., for random variables, a more familiar definition of "con-


vergence in distribution" is given in terms of the distribution functions
of the random variables. Suppose that X, {Xn}n>l are one dimensional.
Then, Xn ------t d X if and only if -

lim P(Xn :::; x) = P(X :::; x)


n->oo

for all x E JR with P(X = x) = 0, i.e., the distribution function of Xn


converges to that of X at all continuity points of the distribution function
of X. Convergence in distribution ofrandom vectors can be reduced to the
one-dimensional case by considering the set of all linear combinations of
the given vectors, which are one dimensional. More precisely, one has the
following result.

Theorem A.I (Cramer-Wold Device): Let X, {Xn}n>l, be JRd-valued


random vectors. Then, Xn ------t d X if and only if for all t -E JRd, t' Xn ------t d
t'X.
For a proof of this result, see Theorem 29.4 of Billingsley (1995).

The definition of convergence in distribution can be extended to more


general random functions than the random vectors. Let (§, d*) be a Polish
space, i.e., § is a complete and separable metric space with metric d* (cf.
Rudin (1987)) and let S denote the Borel a-field on §, i.e., S = a({G :
G is an open subset of §}). If (0" F, P) is a probability space and X :
0, -+ § is (F, S)-measurable, then X is called an §-valued random variable.
Appendix A. 341

The probability distribution £(X) of X is the induced measure on (§, S),


defined by

£(X)(A) = Po X-l(A) = P(X- l A), A E S .

A sequence {Xn}n~l of§..valued random variables converges in distribution


to an §-valued random variable X, also written as Xn ---'td X, if (A.l)
holds for all A E S with P(X E 8A) = 0, where 8A denotes the boundary
of A. In this case, we also say that £(Xn) converges weakly to £(X). Let
IP's denote the set of all probability measures on §. Then, {£(XnHn~l is
a sequence of elements of the set IP's. Weak convergence of {£(XnHn~l to
£(X) is the same as convergence of the sequence {£(XnHn>l to £(X) in
the following metric on IP's:

e(f-L, v) = inf{ 8> 0 : f-L(A) ~ v(AO) +8 for all A E S} , (A.2)

f-L,v E IP's, where for 8 > 0, AO = {x E §: d*(x,y) ~ 8} is the 8-


neighborhood of the set A in S under the metric d*. The metric e(·;·)
in (A.2) is called the Prohorov metric, which metricizes the topology of
weak convergence in IP's. In particular, for §-valued random variables X,
{Xn}n~l' Xn ---'td X if and only if

e(£(Xn ), £(X)) --t 0 as n --t 00 . (A.3)

For more details and discussion on this topic, see Parthasarathi (1967),
Billingsley (1968), Huber (1981), and the references therein.
The next set of definitions and results relate to the notion of stopping
times and moments of randomly stopped sums, which play an important
role in the analysis of the SB method in Chapters 3-5.

Definition A.3 Let (0, F, P) be a probability space and let {Fn}nEN be a


collection of sub-a-fields of F satisfying Fn C Fn+1 for all n E N. Then
a N U {oo} -valued random variable T on 0 is called a stopping time with
respect to {Fn}nEN if

{T=n}EFn forall nEN. (A.4)


T is called a proper stopping time with respect to {Fn}n~l, if T satisfies
(A.4) and P(T < 00) = 1.
We repeatedly make use of the following result on randomly stopped sums
of iid random variables. For a proof, see Chapter 1, Woodroofe (1982).

Theorem A.2 (Wald's Lemmas): Let {Xn}nEN be a sequence of iid ran-


dom variables and let T be a proper stopping time with respect to an increas-
ing sequence of a-fields {Fn}nEN such that Fn is independent of {Xk; k ;:::
n+l} for each n E N. Suppose that ET < 00. Let Sn = Xl +-. +Xn' n E N,
and define the randomly stopped sum ST by ST = 2:nEN Snll(T = n).
342 Appendix A.

(a) IfEIX 11 <00, then


E(ST) = (ET)(EX1) .

(b) If EX? < 00, then


E(ST - T . EX1? = T· Var(Xd .

The next result is a Strong Law of Large Numbers (SLLN) for indepen-
dent random variables.
Theorem A.3 (SLLN): Let {Xn }n>l be a sequence of iid random vari-
ables with EIX1 1 < 00. Then, -
n
n- 1 EXi -+ EX 1 as n -+ 00, a.s.
i=l
A refinement of Theorem A.3 is given by the following result. For a proof,
see Theorem 5.2.2, Chow and Teicher (1997).
Theorem A.4 (Marcinkiewicz-Zygmund SLLN): Let {Xn}n~l be a se-
quence of iid random variables and p E (0,00). If EIX1 1P < 00, then
n
n- 1/ p E(Xi - c) -+ 0 as n -+ 00, a.s. , (A.5)
i=l
for any c E JR if p E (0,1) and for c = EX1 if P E [1,00). Conversely, if
(A.5) holds for some c E JR, then EIX1 1P < 00.
The next result is a Central Limit Theorem (CLT) for sums of indepen-
dent random vectors with values in JRd. For a proof in the one-dimensional
(Le., d = 1) case, see Theorem 9.1.1 of Chow and Teicher (1997). For d 2: 2
it follows from the one-dimensional case and the Cramer-Wold Device (cf.
Theorem A.1).
Theorem A.5 (Lindeberg's CLT): Let {Xnj : 1 :::; j :::; rn}n~l be a
triangular array where, for each n 2: 1, {Xnj : 1 :::; j :::; rn} is a finite
collection of independent JRd-valued (d E N) random vectors with EXnj = 0
2:;:1
for all 1 :::; j :::; rn and EXnjX~j = lId. Suppose that {Xnj : 1 :::; j :::;
rn}n~l satisfies the Lindeberg's condition:
for every E > 0,
rn
lim "EIIXnjIl2n(IIXnjll > E) = 0 .
n-+oo L..J
(A.6)
j=l
Then,
E X nj ~d N(O, lId)
rn
as n -+ 00 .
j=l
Appendix A. 343

The next result is a version of the Berry-Esseen Theorem for independent


random variables. For a proof, see Theorem 12.4 of Bhattacharya and Rao
(1986), who also give a multivariate version of this result in their Corollary
17.2 where the supremum on the left side of (A. 7) is taken over all Borel-
measurable convex subsets of JR.d, and where the constant (2.75) is replaced
by a different constant C(d) E (0,00).
Theorem A.6 {Berry-Esseen Theorem}: Let X!, ... , Xn be a collection
of n (n E N) independent {but not necessarily identically distributed}
random variables with EXj = 0 and EIXjl3 < 00 for 1 ~ j ~ n. If
a; = n- 1 E]=1 EX] > 0, then

sup
xEIR
Ip(; tX ~ x) - <J>(x) I
V nan j=1
j

< (2.75)
n
~2 t
j=1
(EIXj I3/a!) , (A.7)

where <J>(x) denotes the distribution function of the standard normal distri-
bution on JR..
Next we consider the dependent random variables.
Definition A.4 A sequence of random vectors {XihEZ is called stationary
if for every i 1 < i2 < . . . < i k, kEN, and for every m E Z, the distributions
of(Xi1, ... ,Xik )' and (Xil+m, ... ,Xik+m)' are the same.
Definition A.5 A sequence of random vectors {XihEZ is called m-
dependent for some integer m ~ 0 if a({Xj : j ~ k}) and a({Xj : j ~
k + m + 1}) are independent for all k E Z.
Definition A.6 Let {XihEZ be a sequence of random vectors. Then the
strong mixing or a-mixing coefficient of {XihEZ is defined as

a(n) = sup{IP(A n B) - P(A)P(B)I : A E a({Xj : j ~ k}),


BEa({Xj:j~k+n+1}, kEZ}, nEN.

The sequence {XdiEZ is called strongly mixing {or a-mixing} ifg(n) -+ 0


as n -+ 00.
The next two results are Central Limit Theorems (CLTs) for m-
dependent random variables and strongly mixing random variables. CLTs
for random vectors can be deduced from these results by using the Cramer-
Wold device. For a proof of these results and references to related work,
see Ibragimov and Linnik (1971), Doukhan (1994), and Lahiri (2003b).
Theorem A.7 {CLT for m-Dependent Sequences}: Let {XihEZ be a
sequence of stationary m-dependent random variables for some integer
344 Appendix A.

m :::: 0. If EX? < 00 and O"~ == Var(Xd + 22::::=1 COV(X1' X Hk ) > 0,


then

Theorem A.8 (CLT for Strongly Mixing Sequences): Let {XdiEZ be a


sequence of stationary random variables with strong mixing coefficient a(·).

(i) Suppose that P(IX1 1 :::; c) = 1 for some c E (0, (0) and that
2:::~=1 a(n) < 00. Then

°: :; O"~ ==
00

L COV(X1' X Hk ) < 00. (A.8)


k=-oo

If, in addition, O"~ > 0, then

(A.9)

(ii) Suppose that for some r5 E (0,00), EIX l 2H < 00 and 1

2:::~=1[a(n)l6/2H < 00. Then (A.8) holds. If, in addition, O"~ > 0,
then (A . g) holds.
Appendix B

Proof of Theorem 6.1: Theorem 6.1 is a version of Theorem 20.1 of


Bhattacharya and Rao (1986) for triangular arrays. As a result, here we
give an outline of the proof of Theorem 6.1, highlighting only the necessary
s
modifications. Let 1jn = Xj n(IIXj I1 2 n) and Zjn = 1jn - E1jn, 1 S j S
n. Set SIn = n- I / 2 'L.7=1 Zjn. Then, under (6.25), Lemmas 14.6 and 14.8
of Bhattacharya and Rao (1986) imply that

IEf(Sn) - J fdWn,sl

< IEfn(Sln) - J fndWI,n,sl + C(s, d)unan(s) , (B.1)

where fn(x) = f(x - n- I / 2 'L.7=1 E1jn), x E ~d, and where WI,n,s is ob-
tained from Wn,s by replacing the cumulants of Xl's with those of Zjn's.
Since SIn may not have a density, it is customary to add to SIn a suitably
small random vector that has a density and that is independent of SIn.
The additional noise introduced by this operation is then assessed using
a smoothing inequality. Applying Corollary 11.2 (a smoothing inequality)
and Lemma 11.6 (the inversion formula) of Bhattacharya and Rao (1986),
as in their proof of Theorem 20.1, for any 0 < E < 1, we get

S Ms(f)· C(s,d)
s+d+1
L
1<>1=0
JID<>H~(t)1 exp(-Elltll l / 2 )dt
346 Appendix B.

+ W(2E; in, l\}Il,n,sl) + I J in d (\}Il,n,s - \}Il,n,s+d+l) I


Irn + hn + hn' say, (B.2)
where HJJt) = Eexp(~t'Sln) - J exp(d'x)d\}ll,n,s+d+l(X), t E ]Rd, and
where l\}Il,n,sl denotes the total variation measure corresponding to \}Il,n,s.
Note that here E is the amount of noise introduced through smoothing and
would typically depend on n. Like \}In,s, the signed measure \}Il,n,j has
density with respect to the Lebesgue measure on ]Rd (cf. 6.16)

x E ]Rd, j E N, j ~ 3, where Xe is the v-th cumulant of SIn and 3 =


COV(Sln). Although moments of Sn of order 8 + 1 and higher may not
exist, all moments of SIn exist, as the variables Zjn'S are bounded for all
1 S j S n. This makes 'l/Jl,n,j well defined for j ~ 8 + l.
First we consider Irn. As is customary, we divide the range of integration
into two regions {t : t E JRd,lltll S aln} and {t : t E ]Rd, lit II > aln} for
some suitable constant aln > 0 (to be specified later). For small values of
Iltll (i.e., for Iltll s aln), we use Theorem 9.11 of Bhattacharya and Rao
(1986). By (6.25) and their Corollary 14.2, 3 is nonsingular and

~ S 11311 s ~, 113- 1 11 s ~ .
Let 31 = 3- 1/ 2 and Htn(t) = H),(3 1 t), t E JRd. Then, it is easy to check
that
n
n- 1 '"""' EIIZ·In Ils+d+l <
~ _ C(8
, d)n(d+l)/2 min{p-n,s, p- n,s } ,
j=1

and, uniformly in t E JRd,


max {IDa H~ (t) I : 0 S lad s 8 + d + 1}
S C(8, d) max {1(D a Htn)(3 11 t)1 : 0 sial s 8 + d + I} .

Hence, by Theorem 9.11 of Bhattacharya and Rao (1986), there exists a


constant C 1 (8,d) such that with aln = C 1 (8,d)(u n Pn,s)-I/s+d+l,
s+d+l
L
lal=O
1IItll::;a' n
IDa Hn(t) I exp( -ElltI1 1 / 2 )dt S C(8, d)UnPn,s . (B.4)

Next consider the case where $\|t\| > a_{1n}$. Note that for $a > 0$,

(B.5)

and

(B.6)

$k \in \mathbb{N}$. Now using (B.5), (B.6), the definition of the polynomials $\tilde{P}_r(\cdot\,;\cdot)$ (cf. (6.17)), and Lemma 9.5 of Bhattacharya and Rao (1986), we get

$$\sum_{|\alpha|=0}^{s+d+1} \int_{\|t\| > a_{1n}} \Bigl| D^{\alpha} \int \exp(\iota t'x)\,d\Psi_{1,n,s+d+1}(x)\Bigr|\,dt \le C(s,d) \int_{\|t\| > a_{1n}} \bigl[1 + \|t\|^{3(s+d-2)+(s+d+1)}\bigr] \Bigl[\sum_{r=0}^{s+d+1} n^{-r/2}(1 + \rho_{n,s})\, n^{(r-s)_{+}/2}\Bigr] e^{-3\|t\|^2/8}\,dt \le C(s,d)(1 + \rho_{n,s})\bigl[a_{1n}^{-1} + a_{1n}^{4(s+2d)}\bigr] \exp(-3 a_{1n}^{2}/8). \qquad (B.7)$$


Next, let $a_{2n} = (16\rho_{n,3})^{-1} n^{1/2}$. Then, using Lemma 14.3 of Bhattacharya and Rao (1986) for $a_{1n} \le \|t\| \le a_{2n}$, and using the inequality

$$|E\exp(\iota t' Z_{jn})| \le \theta_{n,j}(t), \quad t \in \mathbb{R}^d,\ 1 \le j \le n,$$

for $\|t\| > a_{2n}$, we get

$$\sum_{|\alpha|=0}^{s+d+1} \int_{\|t\| > a_{1n}} |D^{\alpha} E\exp(\iota t' S_{1n})| \exp(-\epsilon\|t\|^{1/2})\,dt \le C(s,d) \int_{a_{1n} \le \|t\| \le a_{2n}} (1 + \|t\|^{s+d+1}) \exp(-5\|t\|^{2}/24)\,dt + C(s,d) \sum_{r=0}^{s+d+1} n^{-r/2} \sum_{j_1,\dots,j_r} \prod_{i=1}^{r} E\|Z_{j_i n}\| \times \Bigl[\sup\Bigl\{\prod_{j \ne j_1,\dots,j_r} |E\exp(\iota t' Z_{jn}/\sqrt{n})| : a_{2n} \le \|t\| \le \epsilon^{-4}\Bigr\} \int_{a_{2n} \le \|t\| \le \epsilon^{-4}} \exp(-\epsilon\|t\|^{1/2})\,dt + 1\cdot \int_{\|t\| > \epsilon^{-4}} \exp(-\epsilon\|t\|^{1/2})\,dt\Bigr] \le C(s,d)\, a_{2n}^{s+2d} \exp(-5 a_{1n}^{2}/24) + C(s,d)\bigl[\epsilon^{-2d}\gamma_n(\epsilon) + n^{s+d+1}\epsilon^{-8d}\exp(-\epsilon^{-1})\bigr]. \qquad (B.8)$$

Repeating the arguments in the proof of Theorem 20.1 of Bhattacharya and Rao (1986) in a similar fashion, it can be shown that

$$I_{2n} \le C(d,s) M_s(f)\Bigl[\rho_{n,s} u_n + (1 + \rho_{n,s})\bigl\{a_n(s) u_n + n^{[3(s-2)+s]/6}\exp(-n^{1/3}/6)\bigr\}\Bigr] + C(s,d)(1 + \rho_{n,s})\,\bar{\omega}(2\epsilon; f, \Phi) \qquad (B.9)$$

and

(B.10)

Theorem 6.1 now follows from the bounds (B.1), (B.2), (B.4), and (B.7)-(B.10).
Proof of Theorem 6.2: Note that by (6.26) and (6.27),

$$\rho_{n,s} = n^{-3/2}\sum_{j=1}^{n} E\|X_{n,j}\|^{s+1}\mathbb{1}(\|X_{n,j}\| \le n^{1/2}) \le n^{-1}\sum_{j=1}^{n} E\|X_{n,j}\|^{s}\mathbb{1}(\|X_{n,j}\| > n^{1/2-\kappa}) + n^{-1/2}\cdot n^{-1}\sum_{j=1}^{n} E\|X_{n,j}\|^{s+1}\mathbb{1}(\|X_{n,j}\| \le n^{1/2-\kappa})$$

and, for $n$ large,

$$a_n\bigl(s, \tfrac{2}{3}\bigr) \le n^{-1}\sum_{j=1}^{n} E\|X_{n,j}\|^{s}\mathbb{1}(\|X_{n,j}\| > n^{1/2-\kappa}) = o(1).$$

Hence, setting $\epsilon = \eta_n$ and applying (6.28) and Theorem 6.1, we get

$$\sup_{B \in \mathcal{B}} |P(S_n \in B) - \Psi_{n,s}(B)| = o\bigl(n^{-(s-2)/2}\bigr) + o\Bigl(\sup_{B \in \mathcal{B}} \bar{\omega}(2\eta_n; \mathbb{1}_B, \Phi)\Bigr).$$

Theorem 6.2 now follows by noting that $\bar{\omega}(\epsilon; \mathbb{1}_B, \Phi) = \Phi\bigl((\partial B)^{\epsilon}\bigr)$ for all $\epsilon > 0$ (cf. Corollary 2.6, Bhattacharya and Rao (1986)).
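The practical effect of the correction terms covered by such expansions is easy to see numerically. The short simulation sketched below (the Exponential(1) population, the sample size n = 20, and all variable names are merely illustrative choices, not taken from the text) compares the Monte Carlo distribution of the standardized sample mean with the normal approximation and with the classical one-term Edgeworth approximation $\Phi(x) - \phi(x)\gamma(x^2-1)/(6\sqrt{n})$, where $\gamma$ denotes the population skewness.

import numpy as np
from math import erf, exp, pi, sqrt

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    # standard normal density
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

rng = np.random.default_rng(1)
n, reps = 20, 200000
gamma = 2.0  # skewness of the Exponential(1) distribution

# standardized sample means: Exp(1) has mean 1 and variance 1
t = sqrt(n) * (rng.exponential(1.0, size=(reps, n)).mean(axis=1) - 1.0)

for x in (-1.0, 0.0, 1.0):
    mc = (t <= x).mean()          # Monte Carlo estimate of P(T_n <= x)
    normal = Phi(x)               # CLT approximation
    edgeworth = Phi(x) - phi(x) * gamma * (x * x - 1.0) / (6.0 * sqrt(n))
    print("x = %+.1f  MC = %.4f  normal = %.4f  Edgeworth = %.4f"
          % (x, mc, normal, edgeworth))

With n = 20 the Edgeworth-corrected values typically lie noticeably closer to the Monte Carlo probabilities than the plain normal values, which is precisely the kind of refinement that the expansions $\Psi_{n,s}$ quantify.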
References

Allen, M. and Datta, S. (1999), 'A note on bootstrapping M-estimators in


ARMA models', Journal of Time Series Analysis 20,365-379.

Anderson, T. W. (1971), The Statistical Analysis of Time Series, Wiley,


New York.

Andrews, D. (2002), 'Higher-order improvements of a computationally


attractive k-step bootstrap for extremum estimators', Econometrica
70(1), 119-162.

Arcones, M. and Gine, E. (1989), 'The bootstrap of the mean with arbitrary
bootstrap sample', Annales de l'Institut Henri Poincare 25,457-481.

Arcones, M. and Gine, E. (1991), 'Additions and corrections to "The boot-


strap of the mean with arbitrary sample size"', Annales de l'Institut
Henri Poincare 27, 583-595.

Arcones, M. and Yu, B. (1994), 'Central limit theorems for empirical and
U-processes of stationary mixing sequences', Journal of Theoretical
Probability 7, 47-70.

Athreya, K. B. (1987), 'Bootstrap of the mean in the infinite variance case',


The Annals of Statistics 15, 724-731.

Athreya, K. B. and Fukuchi, J. I. (1997), 'Confidence intervals for endpoints


of a c.d.f. via bootstrap', Journal of Statistical Planning and Inference
58, 299-320.

Athreya, K. B., Lahiri, S. N. and Wei, W. (1998), 'Inference for heavy tailed
distributions', Journal of Statistical Planning and Inference 66,61-75.

Bai, Z. D. and Rao, C. R. (1991), 'Edgeworth expansion of a function of


sample means', The Annals of Statistics 19, 1295-1315.

Barbe, P. and Bertail, P. (1995), The Weighted Bootstrap, Vol. 98 of Lecture


Notes in Statistics, Springer-Verlag, New York.

Barry, J., Crowder, M. and Diggle, P. (1997), Parametric estimation of


the variogram, Preprint, Department of Mathematics and Statistics,
Lancaster University, United Kingdom.

Bartlett, M. S. (1946), 'On the theoretical specification of sampling prop-


erties of autocorrelated time series' , Journal of the Royal Statistical
Society, Supplement 8, 27-41.

Basawa, I. V., Mallik, A. K., McCormick, W. P., Reeves, J. H. and Tay-
lor, R. L. (1991), 'Bootstrapping unstable first-order autoregressive
processes', The Annals of Statistics 19, 1098-1101.

Basawa, I. V., Mallik, A. K., McCormick, W. P. and Taylor, R. L. (1989),


'Bootstrapping explosive autoregressive processes', The Annals of
Statistics 17, 1479-1486.

Beran, J. (1994), Statistics for Long Memory Processes, Chapman and Hall,
London.

Bhattacharya, R. N. and Ghosh, J. K. (1978), 'On the validity of the formal


Edgeworth expansion', The Annals of Statistics 7, 434-451.

Bhattacharya, R. N. and Ghosh, J. K. (1988), 'On moment conditions for


valid formal Edgeworth expansions', Journal of Multivariate Analysis
27,68-79.

Bhattacharya, R. N. and Rao, R. R. (1986), Normal Approximations and


Asymptotic Expansions, R. E. Krieger Publishing Company, Malabar,
FL.

Bickel, P., Götze, F. and van Zwet, W. (1997), 'Resampling fewer than n
observations: Gains, losses, and remedies for losses', Statistica Sinica
7, 1-31.

Bickel, P. J. and Wichura, M. J. (1971), 'Convergence criteria for multi-


parameter stochastic processes and some applications', The Annals of
Mathematical Statistics 42, 1656-1670.

Billingsley, P. (1968), Convergence of Probability Measures, Wiley, New


York.

Billingsley, P. (1995), Probability and Measure, Wiley, New York.

Bingham, N. H., Goldie, C. M. and Teugels, J. L. (1987), Regular Variation,


Cambridge University Press, Cambridge, U.K. .

Boos, D. D. (1979), 'A differential for L-statistics', The Annals of Statistics


7,955-959.

Bose, A. (1988), 'Edgeworth correction by bootstrap in autoregressions',


The Annals of Statistics 16, 1709-1722.

Bose, A. (1990), 'Bootstrap in moving average models', Annals of the In-


stitute of Statistical Mathematics 42, 753-768.

Bradley, R. C. (1989), 'A caution on mixing conditions for random fields',


Statistics and Probability Letters 8, 489-491.

Bradley, R. C. (1993), 'Equivalent mixing conditions for random fields',


The Annals of Probability 21, 1921-1926.

Bretagnolle, J. (1983), 'Limit laws for the bootstrap of some special func-
tionals (french)', Annales de l'Institut Henri Poincare, Section B, Cal-
cul des Probabilities et Statistique 19, 281-296.

Brillinger, D. R. (1981), Time Series: Data Analysis and Theory, Holden-


Day Inc., San Francisco.

Brockwell, P. J. and Davis, R. A. (1991), Time Series: Theory and Methods,


2nd edn, Springer-Verlag, New York.

Bühlmann, P. (1994), 'Blockwise bootstrapped empirical process for sta-


tionary sequences', The Annals of Statistics 22,995-1012.

Bühlmann, P. (1997), 'Sieve bootstrap for time series', Bernoulli 3, 123-


148.

Bühlmann, P. (2002), 'Sieve bootstrap with variable-length Markov chains


for stationary categorical time series', Journal of the American Statis-
tical Association 97, 443-471.

Bühlmann, P. and Künsch, H. R. (1999b), 'Comments on "Prediction of


spatial cumulative distribution functions using subsampling"', Journal
of the American Statistical Association 94, 97-99.

Bustos, O. H. (1982), 'General M-estimates for contaminated p-th or-


der autoregressive processes; consistency and asymptotic normal-
ity', Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete
59,491-504.

Carlstein, E. (1986), 'The use of subseries methods for estimating the vari-
ance of a general statistic from a stationary time series', The Annals
of Statistics 14, 1171-1179.
Carlstein, E., Do, K.-A., Hall, P., Hesterberg, T. and Künsch, H. R. (1998),
'Matched-block bootstrap for dependent data', Bernoulli 4,305-328.
Chernick, M. R. (1981a), 'A limit theorem for the maximum of autore-
gressive processes with uniform marginal distributions', The Annals
of Probability 9, 145-149.
Chernick, M. R. (1981b), 'On strong mixing and Leadbetter's D condition',
Journal of Applied Probability 18, 764-769.
Chernick, M. R. (1999), Bootstrap Methods: A Practitioner's Guide, Wiley,
New York.
Chibishov, D. M. (1972), 'An asymptotic expansion for the distribution of
a statistic admitting an asymptotic expansion', Theory of Probability
and Its Applications 17, 620-630.
Choi, E. and Hall, P. (2000), 'Bootstrap confidence regions computed from
autoregressions of arbitrary order', Journal of the Royal Statistical
Society, Series B 62,461-477.
Chow, Y. and Teicher, H. (1997), Probability Theory: Independence, In-
terchangeability, Martingales, 2nd edn, Springer-Verlag, Berlin.
Cressie, N. (1985), 'Fitting variogram models by weighted least squares',
Journal of the International Association for Mathematical Geology
17, 693-702.
Cressie, N. (1993), Statistics for Spatial Data, 2nd edn, Wiley, New York.
Dahlhaus, R. (1983), 'Spectral analysis with tapered data', Journal of
Times Series Analysis 4, 163-175.
Dahlhaus, R. (1985), 'Asymptotic normality of spectral estimates', Journal
of Multivariate Analysis 16, 412-431.
Dahlhaus, R. and Janas, D. (1996), 'A frequency domain bootstrap for ratio
statistics in time series analysis', The Annals of Statistics 24, 1934-
1963.

Datta, S. (1995), 'Limit theory and bootstrap for explosive and partially
explosive autoregression', Stochastic Processes and their Applications
57, 285-304.
Datta, S. (1996), 'On asymptotic properties of bootstrap for AR(l) pro-
cesses', Journal of Statistical Planning and Inference 53,361-374.

Datta, S. and McCormick, W. P. (1995), 'Bootstrap inference for a first-


order autoregression with positive innovations', Journal of the Amer-
ican Statistical Association 90, 1289-1300.

Datta, S. and McCormick, W. P. (1998), 'Inference for the tail parameters


of a linear process with heavy tail innovations', Annals of the Institute
of Statistical Mathematics 50, 337-359.

Datta, S. and Sriram, T. N. (1997), 'A modified bootstrap for autoregres-


sion without stationarity', Journal of Statistical Planning and Infer-
ence 59, 19-30.

David, M. (1977), Geostatistical Ore Reserve Estimation, Elsevier, Ams-


terdam.

Davis, R. A. (1983), 'Stable limits for partial sums of dependent random


variables', The Annals of Probability 11, 262-269.

Davison, A. C. and Hinkley, D. V. (1997), Bootstrap Methods and Their


Application, Cambridge University Press, Cambridge, UK.

de Haan, L. (1970), On Regular Variation and Its Application to the Weak


Convergence of Sample Extremes, Vol. 32 of Mathematical Centre
Tracts, Mathematical Centre, Amsterdam.

Denker, M. and Jakubowski, A. (1989), 'Stable limit distributions for


strongly mixing sequences', Statistics and Probability Letters 8, 477-
483.

Dennis, J. E. and Schnabel, R. B. (1983), Numerical Methods for Uncon-


strained Optimization and Nonlinear Equations, Prentice-Hall, Engle-
wood Cliffs, NJ.

Deo, C. (1973), 'A note on empirical processes of strong-mixing sequences',


The Annals of Probability 1, 870-875.

Dieudonne, J. (1960), Foundations of Modern Analysis, Academic Press,


New York.

Dobrushin, R. L. (1979), 'Gaussian and their subordinated self-similar ran-


dom fields', The Annals of Probability 7, 1-28.

Dobrushin, R. L. and Major, P. (1979), 'Non-central limit theorems


for non-linear functionals of Gaussian random fields', Zeitschrift für
Wahrscheinlichkeitstheorie und Verwandte Gebiete 50, 27-52.

Doukhan, P. (1994), Mixing: Properties and Examples, Vol. 85 of Lecture


Notes in Statistics, Springer-Verlag, New York.

Efron, B. (1979), 'Bootstrap methods: Another look at the jackknife', The


Annals of Statistics 7, 1-26.

Efron, B. (1982), The Jackknife, the Bootstrap and Other Resampling


Plans, SIAM, Philadelphia.

Efron, B. (1992), 'Jackknife-after-bootstrap standard errors and influence


functions (with discussion)', Journal of Royal Statistical Society, Se-
ries B 54, 83-111.

Efron, B. and Tibshirani, R. (1986), 'Bootstrap methods for standard er-


rors, confidence intervals, and other measures of statistical accuracy',
Statistical Science 1, 54-77.

Efron, B. and Tibshirani, R. (1993), An Introduction to the Bootstrap,


Chapman and Hall, London.

Faraway, J. J. and Jhun, M. (1990), 'Bootstrap choice of bandwidth for


density estimation', Journal of the American Statistical Association
85, 1119-1122.

Feller, W. (1971a), An Introduction to Probability Theory and Its Applica-


tions, Vol. I, Wiley, New York.

Feller, W. (1971b), An Introduction to Probability Theory and Its Applica-


tions, Vol. II, Wiley, New York.

Fernholz, L. T. (1983), Von Mises Calculus for Statistical Functionals,


Vol. 19 of Lecture Notes in Statistics, Springer-Verlag, New York.

Filippova, A. A. and Brunswick, N. A. (1962), 'Mises' theorem on the


asymptotic behavior of functionals of empirical distribution functions
and its statistical applications', Theory of Probability and Its Applica-
tions 7, 24-57.

Findley, D. F. (1986), On bootstrap estimates of forecast mean square


errors for autoregressive processes, in D. M. Allen, ed., 'Computer
Science and Statistics: The Interface', Elsevier, Amsterdam.

Franke, J. and Härdle, W. (1992), 'On bootstrapping kernel spectral esti-


mates', The Annals of Statistics 20, 121-145.

Freedman, D. A. (1981), 'Bootstrapping regression models', The Annals of


Statistics 9, 1218-1228.

Freedman, D. A. (1984), 'On bootstrapping two-stage least-squares esti-


mates in stationary linear models', The Annals of Statistics 12, 827-
842.

Freedman, D. A. and Peters, S. F. (1984), 'Bootstrapping an economic


model: Some empirical results', Journal of Business and Economic
Statistics 2, 150-158.
Fukuchi, J. 1. (1994), Bootstrapping extremes of random variables, PhD
Dissertation, Iowa State University, Department of Statistics, Ames,
IA.

Fuller, W. A. (1996), Introduction to Statistical Time Series, 2nd edn, Wi-


ley, New York.

Genton, M. G. (1997), Variogram fitting by generalized least squares us-


ing an explicit formula for the covariance structure, Preprint, MIT,
Cambridge, MA.

Gine, E. and Zinn, J. (1989), 'Necessary conditions for the bootstrap of the
mean', The Annals of Statistics 17,684-691.

Gine, E. and Zinn, J. (1990), 'Bootstrapping general empirical measures',


The Annals of Probability 18, 851-869.

Gnedenko, B. V. (1943), 'Sur la distribution limite du terme maximum


d'une série aléatoire', Annals of Mathematics 44, 423-453.

Götze, F. (1987), 'Approximations for multivariate U-statistics', Journal


of Multivariate Analysis 22, 212-229.
Götze, F. and Hipp, C. (1983), 'Asymptotic expansions for sums of weakly
dependent random vectors', Zeitschrift für Wahrscheinlichkeitstheorie
und Verwandte Gebiete 64, 211-239.

Götze, F. and Hipp, C. (1994), 'Asymptotic distribution of statistics in


time series', The Annals of Statistics 22, 2062-2088.

Götze, F. and Künsch, H. R. (1996), 'Second-order correctness of the block-


wise bootstrap for stationary observations', The Annals of Statistics
24, 1914-1933.

Guyon, X. (1995), Random Fields on a Network: Modeling, Statistics, and


Applications, Springer-Verlag, Berlin; New York.

Hall, P. (1985), 'Resampling a coverage pattern', Stochastic Processes and


Their Applications 20, 231-246.

Hall, P. (1987), 'Edgeworth expansion for Student's t statistic under mini-


mal moment conditions', The Annals of Probability 15, 920-931.

Hall, P. (1992), The Bootstrap and Edgeworth Expansion, Springer-Verlag,


New York.

Hall, P. (1997), Defining and measuring long-range dependence, in C. D.


Cutler and D. T. Kaplan, eds, 'Nonlinear Dynamics and Time Series',
Vol. 11, Fields Institute Communications, pp. 153-160.
Hall, P. and Horowitz, J. L. (1996), 'Bootstrap critical values for tests
based on generalized-method-of-moments estimators', Econometrica
64, 891-916.
Hall, P., Horowitz, J. L. and Jing, B.-Y. (1995), 'On blocking rules for the
bootstrap with dependent data', Biometrika 82, 561-574.
Hall, P. and Jing, B.-Y. (1996), 'On sample re-use methods for dependent
data', Journal of the Royal Statistical Society, Series B 58,727-738.
Hall, P., Jing, B.-Y. and Lahiri, S. N. (1998), 'On the sampling window
method under long range dependence', Statistica Sinica 8, 1189-1204.
Hall, P., Lahiri, S. N. and Polzehl, J. (1995), 'On bandwidth choice in
nonparametric regression with both short- and long-range dependent
errors', The Annals of Statistics 23, 2241-2263.
Hall, P., Lahiri, S. N. and Truong, Y. K. (1995), 'On bandwidth choice
for density estimation with dependent data', The Annals of Statistics
23, 2241-2263.
Hall, P. and Wang, Q. (2003), 'Exact convergence rate and leading term
in Central Limit Theorem for Student's t-statistic', The Annals of
Probability. (To appear).
Härdle, W. and Bowman, A. (1988), 'Bootstrapping in nonparametric re-
gression: Local adaptive smoothing and confidence bands', Journal of
the American Statistical Association 83, 100-110.
Heimann, G. and Kreiss, J.-P. (1996), 'Bootstrapping general first order
autoregression', Statistics and Probability Letters 30, 87-98.
Helmers, R. (1991), 'On the Edgeworth expansion and the bootstrap ap-
proximation for a studentized U-statistic', The Annals of Statistics
19, 470-484.
Hesterberg, T. C. (1997), Matched-block bootstrap for long memory pro-
cesses, Technical Report 66, Research Department, MathSoft, Inc,
Seattle, WA.
Hipp, C. (1985), 'Asymptotic expansions in the central limit theorem for
compound and Markov processes', Zeitschrift für Wahrscheinlichkeit-
stheorie und Verwandte Gebiete 69, 361-385.
Hsing, T. (1991), 'On tail estimation using dependent data', The Annals
of Statistics 19,1547-1569.

Huber, P. (1981), Robust Statistics, Wiley, New York.

Hurvich, C. M. and Zeger, S. (1987), Frequency domain bootstrap methods


for time series, Statistics and Operations Research Working Paper,
New York University, New York.

Ibragimov, I. A. and Hasminskii, R. Z. (1980), 'On nonparametric esti-


mation of regression', Soviet Mathematics (Doklady Akademii Nauk)
21, 810-814.

Ibragimov, I. A. and Linnik, Y. V. (1971), Independent and Stationary


Sequences of Random Variables, Wolters-Noordhoff, Groningen.

Ibragimov, I. A. and Rozanov, Y. A. (1978), Gaussian Random Processes,


Springer-Verlag, New York.

Inoue, A. and Kilian, L. (2002), 'Bootstrapping autoregressive processes with


possible unit roots', Econometrica 70(1),377-391.

Inoue, A. and Shintani, M. (2001), Bootstrapping GMM estimators for


time series, Working paper, Department of Agricultural and Resource
Economics, North Carolina State University, Raleigh, NC.

Jakubowski, A. and Kobus, M. (1989), 'α-stable limit theorems for sums of


dependent random vectors', Journal of Multivariate Analysis 29,219-
251.

Janas, D. (1993), 'A smoothed bootstrap estimator for a studentized sample


quantile', Annals of the Institute of Statistical Mathematics 45, 317-
329.

Janas, D. (1994), 'Edgeworth expansions for spectral mean estimates with


applications to Whittle estimates', Annals of the Institute of Statistical
Mathematics 46, 667-682.

Jensen, J. L. (1989), 'Asymptotic expansions for strongly mixing Harris


recurrent Markov chains', Scandinavian Journal of Statistics 16, 47-
63.

Journel, A. G. and Huijbregts, C. J. (1978), Mining Geostatistics, Academic


Press, London.

Kaiser, M. S., Hsu, N.-J., Cressie, N. and Lahiri, S. N. (1997), 'Inference


for spatial processes using subsampling: A simulation study', Environ-
metrics 8, 485-502.

Kallenberg, O. (1976), Random Measures, Akademie Verlag, Berlin.



Kendall, M. and Stuart, A. (1977), The Advanced Theory of Statistics.


Vol. 1: Distribution Theory (4th edn) , Charles Griffin & Co, High
Wycombe, UK.

Kreiss, J. P. (1987), 'On adaptive estimation in stationary ARMA pro-


cesses', The Annals of Statistics 15, 112-133.

Kreiss, J. P. and Franke, J. (1992), 'Bootstrapping stationary autoregressive


moving-average models', Journal of Time Series Analysis 13, 297-317.

Kreiss, J. P. and Paparoditis, E. (2003), 'Autoregressive aided periodogram


bootstrap for time series', Annals of Statistics 31. (In press).

Künsch, H. R. (1989), 'The jackknife and the bootstrap for general station-
ary observations', The Annals of Statistics 17, 1217-1261.

Lahiri, S. N. (1991), 'Second order optimality of stationary bootstrap',


Statistics and Probability Letters 11, 335-341.
Lahiri, S. N. (1992a), Edgeworth correction by 'moving block' bootstrap for
stationary and nonstationary data, in R. Lepage and L. Billard, eds,
'Exploring the Limits of Bootstrap', Wiley, New York, pp. 183-214.
Lahiri, S. N. (1992b), 'Bootstrapping M-estimators of a multiple linear
regression parameter', The Annals of Statistics 20, 1548-1570.

Lahiri, S. N. (1992c), 'On bootstrapping M-estimators', Sankhyā, Series A,


Indian Journal of Statistics 54, 157-170.
Lahiri, S. N. (1993a), 'Refinements in the asymptotic expansions for sums of
weakly dependent random vectors', The Annals of Probability 21,791-
799.

Lahiri, S. N. (1993b), 'On the moving block bootstrap under long range
dependence', Statistics and Probability Letters 18, 405-413.

Lahiri, S. N. (1994), 'Two term Edgeworth expansion and bootstrap ap-


proximation for multivariate studentized M-estimators', Sankhyā, Se-
ries A 56, 201-226.
Lahiri, S. N. (1995), 'On the asymptotic behaviour of the moving block
bootstrap for normalized sums of heavy-tail random variables', The
Annals of Statistics 23, 1331-1349.
Lahiri, S. N. (1996a), 'Asymptotic expansions for sums of random vectors
under polynomial mixing rates', Sankhyā, Series A 58, 206-225.

Lahiri, S. N. (1996b), 'On Edgeworth expansion and moving block boot-
strap for studentized M-estimators in multiple linear regression mod-
els', Journal of Multivariate Analysis 56, 42-59.

Lahiri, S. N. (1996c), 'On inconsistency of estimators based on spatial data


under infill asymptotics', Sankhyā, Series A 58, 403-417.

Lahiri, S. N. (1996d), Empirical choice of the optimal block length for


block bootstrap methods, Preprint, Department of Statistics, Iowa
State University, Ames, IA.

Lahiri, S. N. (1999a), 'Theoretical comparisons of block bootstrap meth-


ods', The Annals of Statistics 27, 386-404.

Lahiri, S. N. (1999b), 'Asymptotic distribution of the empirical spatial cu-


mulative distribution function predictor and prediction bands based
on a subsampling method', Probability Theory and Related Fields
114,55-84.

Lahiri, S. N. (1999c), On second order properties of the stationary boot-


strap method for studentized statistics, in S. Ghosh, ed., 'Asymp-
totics, Nonparametrics, and Time Series', Marcel Dekker, New York,
pp. 683-712.

Lahiri, S. N. (1999d), Resampling methods for spatial prediction, in 'Com-


puting Science and Statistics. Models, Predictions, and Computing',
Vol. 31 of Proceedings of the 31st Symposium on the Interface, pp. 462-
466.

Lahiri, S. N. (2001), 'Effects of block lengths on the validity of block re-


sampling methods', Probability Theory and Related Fields 121,73-97.

Lahiri, S. N. (2002a), 'On the jackknife after bootstrap method for depen-
dent data and its consistency properties', Econometric Theory 18,79-
98.

Lahiri, S. N. (2002b), 'Comments on "Sieve bootstrap with variable length


Markov chains for stationary categorical time series'" , Journal of the
American Statistical Association 97, 460-462.

Lahiri, S. N. (2003a), 'A necessary and sufficient condition for asymptotic


independence of discrete Fourier transforms under short and long range
dependence', Annals of Statistics 31,613-641.

Lahiri, S. N. (2003b), 'Central limit theorems for weighted sums under


some stochastic and fixed spatial sampling designs', Sankhyā, Series
A . (In press).

Lahiri, S. N. (2003c), 'Consistency of the jackknife-after-bootstrap variance


estimator for the bootstrap quantiles of a studentized statistic', Annals
of Statistics. (To appear).

Lahiri, S. N. (2003d), Validity of a block bootstrap method for irregularly


spaced spatial data under nonuniform stochastic designs, Preprint,
Department of Statistics, Iowa State University, Ames, IA.
Lahiri, S. N., Furukawa, K. and Lee, Y.-D. (2003), A nonparametric plug-in
rule for selecting the optimal block length for block bootstrap methods,
Preprint, Department of Statistics, Iowa State University, Ames, IA.
Lahiri, S. N., Kaiser, M. S., Cressie, N. and Hsu, N.-J. (1999), 'Prediction
of spatial cumulative distribution functions using subsampling (with
discussion)', Journal of the American Statistical Association 94, 86-
110.
Lahiri, S. N., Lee, Y.-D. and Cressie, N. (2002), 'Efficiency of least squares
estimators of spatial variogram parameters', Journal of Statistical
Planning and Inference 3, 65-85.
Leadbetter, M. R. (1974), 'On extreme values in stationary sequences',
Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete
28, 289-303.
Leadbetter, M. R., Lindgren, G. and Rootzen, H. (1983), Extremes and Re-
lated Properties of Random Sequences and Processes, Springer-Verlag,
Berlin.
Lee, Y.-D. and Lahiri, S. N. (2002), 'Least squares variogram fitting by
spatial subsampling', Journal of the Royal Statistical Society, Series
B 64, 837-854.
Liu, R. Y. and Singh, K. (1992), Moving blocks jackknife and bootstrap
capture weak dependence, in R. Lepage and L. Billard, eds, 'Exploring
the Limits of the Bootstrap', Wiley, New York, pp. 225-248.
Loynes, R. M. (1965), 'Extreme values in uniformly mixing stationary
stochastic processes', The Annals of Mathematical Statistics 36, 993-
999.
Malinovskii, V. K. (1986), 'Limit theorems for Harris-Markov chains, I',
Theory of Probability and Its Applications 31, 269-285.
Mammen, E. (1992), When Does Bootstrap Work? Asymptotic Results and
Simulations, Vol. 77 of Lecture Notes in Statistics, Springer-Verlag,
New York.
Martin, R. D. and Yohai, V. J. (1986), 'Influence functionals for time series
(with discussion)', The Annals of Statistics 14, 781-855.
Matheron, G. (1962), Traite de geostatistique appliquee, Tome I, Vol. 14 of
Memoires du Bureau de Recherches Geologiques et Minieres, Editions
Technip, Paris.

Mathew, G. and McCormick, W. P. (1998), 'A bootstrap approximation to


the joint distribution of sum and maximum of a stationary sequence',
Journal of Statistical Planning and Inference 70, 287-299.

Meijia, J. M. and Rodriguez-Iturbe, I. (1974), 'On the synthesis of random


field sampling from the spectrum: An application to the generation of
hydrologic spatial processes', Water Resources Research 10,705-711.

Miller, R. G. (1974), 'The jackknife - A review', Biometrika 61, 1-15.

Milnor, J. W. (1965), Topology From the Differentiable Viewpoint, Univer-


sity Press of Virginia, Charlottesville.

Morris, M. D. and Ebey, S. F. (1984), 'An interesting property of the sample


mean under a first-order autoregressive model', The American Statis-
tician 38, 127-129.

Nadaraya, E. A. (1964), 'On estimating regression', Theory of Probability


and Its Applications 9, 141-142.

Naik-Nimbalkar, U. V. and Rajarshi, M. B. (1994), 'Validity of blockwise


bootstrap for empirical processes with stationary observations', The
Annals of Statistics 22, 980-994.

Nordgaard, A. (1992), Resampling a stochastic process using a bootstrap


approach, in K.-H. Jöckel, G. Rothe and W. Sendler, eds, 'Bootstrap-
ping and Related Techniques', Lecture Notes in Economics and Math-
ematical Systems, Springer-Verlag, Berlin, p. 376.

Nordman, D. and Lahiri, S. N. (2003a), 'On optimal spatial subsample size


for variance estimation', Annals of Statistics. (To appear).

Nordman, D. and Lahiri, S. N. (2003b), On optimal block size for a spa-


tial block bootstrap method, Preprint, Department of Statistics, Iowa
State University, Ames, IA.

Paparoditis, E. and Politis, D. N. (2001), 'Tapered block bootstrap',


Biometrika 88(4), 1105-1119.

Paparoditis, E. and Politis, D. N. (2002), 'The tapered block bootstrap


for general statistics from stationary sequences', The Econometrics
Journal 5(1), 131-148.

Parthasarathi, K. R. (1967), Probability Measures on Metric Spaces, Aca-


demic Press, San Diego, CA.

Peligrad, M. (1982), 'Invariance principles for mixing sequences of random


variables', The Annals of Probability 12, 968-981.

Peligrad, M. (1998), 'On the blockwise bootstrap for empirical processes


for stationary sequences', The Annals of Probability 26, 877-901.

Peligrad, M. and Shao, Q.-M. (1995), 'Estimation of the variance of partial


sums for ρ-mixing random variables', Journal of Multivariate Analysis
52, 140-157.

Petrov, V. V. (1975), Sums of Independent Random Variables, Springer-


Verlag, New York.

Politis, D. N., Paparoditis, E. and Romano, J. P. (1998), 'Large sample


inference for irregularly spaced dependent observations based on sub-
sampling', Sankhya, Series A, Indian Journal of Statistics 60, 274-
292.

Politis, D. N., Paparoditis, E. and Romano, J. P. (1999), Resampling


marked point processes, in S. Ghosh, ed., 'Multivariate Analysis, De-
sign of Experiments, and Survey Sampling: A Tribute to J. N. Srivas-
tava', Mercel Dekker, New York, pp. 163-185.

Politis, D. N. and Romano, J. P. (1992a), 'A general resampling scheme


for triangular arrays of α-mixing random variables with application to
the problem of spectral density estimation', The Annals of Statistics
20, 1985-2007.

Politis, D. N. and Romano, J. P. (1992b), A circular block resampling pro-
cedure for stationary data, in R. Lepage and L. Billard, eds, 'Exploring
the Limits of Bootstrap', Wiley, New York, pp. 263-270.

Politis, D. N. and Romano, J. P. (1993), 'Nonparametric resampling for


homogeneous strong mixing random fields', Journal of Multivariate
Analysis 47, 301-328.

Politis, D. N. and Romano, J. P. (1994a), 'Large sample confidence re-


gions based on subsamples under minimal assumptions', The Annals
of Statistics 22, 2031-2050.

Politis, D. N. and Romano, J. P. (1994b), 'The stationary bootstrap', Jour-


nal of the American Statistical Association 89, 1303-1313.

Politis, D. N. and Romano, J. P. (1995), 'Bias-corrected nonparametric


spectral estimation', Journal of Time Series Analysis 16, 67-103.

Politis, D. N., Romano, J. P. and Wolf, M. (1999), Subsampling, Springer-


Verlag, Berlin; New York.

Politis, D. N. and White, H. (2003), Automatic block-length selection for


the dependent bootstrap, Preprint, Department of Mathematics, Uni-
versity of California at San Diego, LaJolla, CA.

Possolo, A. (1991), Subsampling a random field, in A. Possolo, ed., 'Spa-


tial Statistics and Imaging', IMS Lecture Notes Monograph Series, 20,
Institute of Mathematical Statistics, Hayward, CA, pp. 286-294.

Priestley, M. B. (1981), Spectral Analysis and Time Series, Vol. I, Academic


Press, London.
Radulovic, D. (1996), 'The bootstrap of the mean for strong mixing se-
quences under minimal conditions', Statistics and Probability Letters
28,65-72.
Rao, R. R. (1962), 'Relations between weak and uniform convergence of
measures with applications', The Annals of Mathematical Statistics
33, 659-680.
Reeds, J. A. (1976), On the definition of von Mises functionals, PhD thesis,
Harvard University, Cambridge, MA.

Ren, J.-J. and Sen, P. K. (1991), 'On Hadamard differentiability of extended
statistical functionals', Journal of Multivariate Analysis 39, 30-43.

Ren, J.-J. and Sen, P. K. (1995), 'Hadamard differentiability on D[0, 1]^p',


Journal of Multivariate Analysis 55, 14-28.
Resnick, S. and Starcia, C. (1998), 'Tail index estimation for dependent
data', The Annals of Applied Probability 8, 1156-1183.
Rice, J. (1984), 'Bandwidth choice for nonparametric regression', The An-
nals of Statistics 12, 1215-1230.
Romano, J. P. (1988), 'Bootstrapping the mode', Annals of the Institute of
Statistical Mathematics 40, 565-586.
Rudin, W. (1987), Real and Complex Analysis, McGraw Hill, New York.

Sakov, A. and Bickel, P. J. (1999), Choosing m in the m out of n bootstrap,


in 'ASA Proceedings of the Section on Bayesian Statistical Science',
pp. 125-128.

Sakov, A. and Bickel, P. J. (2000), 'An Edgeworth expansion for the


m out of n bootstrapped median', Statistics and Probability Letters
49(3), 217-223.
Samur, J. D. (1984), 'Convergence of sums of mixing triangular arrays
for random vectors with stationary laws', The Annals of Probability
12, 390-426.

Sen, P. K. (1974), 'Weak convergence of multidimensional empirical pro-


cesses for stationary ¢-mixing processes', The Annals of Probability
2, 147-154.

Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics,


Wiley, New York.

Shao, J. and Tu, D. (1995), The Jackknife and Bootstrap, Springer-Verlag,


New York.

Sherman, M. (1996), 'Variance estimation for statistics computed from spa-


tial lattice data', Journal of the Royal Statistical Society, Series B
58, 509-523.

Sherman, M. and Carlstein, E. (1994), 'Nonparametric estimation of the


moments of a general statistic computed from spatial data', Journal
of the American Statistical Association 89, 496-500.

Shinozuka, M. (1971), 'Simulation of multivariate and multidimensional


random processes', Journal of the Acoustical Society of America
49, 357-367.

Shorack, G. R. (1982), 'Bootstrapping robust regression', Communications


in Statistics, Part A - Theory and Methods 11, 961-972.
Singh, K. (1981), 'On the asymptotic accuracy of Efron's bootstrap',
The Annals of Statistics 9, 1187-1195.

Sjostedt-DeLuna, S. and Young, S. (2003), 'The bootstrap and Kriging


prediction intervals', Scandinavian Journal of Statistics 30, 175-192.

Skovgaard, I. M. (1981), 'Transformation of an Edgeworth expansion by


a sequence of smooth functions', Scandinavian Journal of Statistics
8,207-217.

Statulevicius, V. (1969a), 'Limit theorems for sums of random variables


that are connected in a Markov chain, 1', Litovsk Mat Sb 9, 345-362.

Statulevicius, V. (1969b), 'Limit theorems for sums of random variables


that are connected in a Markov chain, II', Litovsk Mat Sb 9,635-672.

Statulevicius, V. (1970), 'Limit theorems for sums of random variables that


are connected in a Markov chain, III', Litovsk Mat Sb 10, 161-169.

Stein, M. L. (1987), 'Minimum norm quadratic estimation of spatial vari-


ograms', Journal of the American Statistical Association 82,765-772.

Stein, M. L. (1989), 'Asymptotic distributions of minimum norm quadratic


estimators of the covariance function of a Gaussian random field', The
Annals of Statistics 17, 980-1000.

Swanepoel, J. and van Wyk, J. (1986), 'The bootstrap applied to power


spectral density function estimation', Biometrika 73, 135-141.

Taqqu, M. S. (1975), 'Weak convergence to fractional Brownian motion and


to the Rosenblatt process', Zeitschrift für Wahrscheinlichkeitstheorie
und Verwandte Gebiete 31, 287-302.
Taqqu, M. S. (1977), 'Law of the iterated logarithm for sums of non-
linear functions of Gaussian variables that exhibit a long range de-
pendence', Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte
Gebiete 40, 203-238.
Taqqu, M. S. (1979), 'Convergence of integrated processes of arbitrary Her-
mite rank', Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte
Gebiete 50, 53-83.
Taylor, C. C. (1989), 'Bootstrap choice of the smoothing parameter in
kernel density estimation', Biometrika 76, 705-712.
van der Vaart, A. W. and Wellner, J. A. (1996), Weak Convergence and
Empirical Processes: With Applications to Statistics, Springer-Verlag
Inc, Berlin; New York.

von Mises, R. (1947), 'On the asymptotic distribution of differentiable sta-


tistical functions', The Annals of Mathematical Statistics 18, 309-348.
Wallace, D. L. (1958), 'Asymptotic approximations to distributions', The
Annals of Mathematical Statistics 29, 635-654.
Watson, G. S. (1964), 'Smooth regression analysis', Sankhyā, Series A
26, 359-372.
Woodroofe, M. B. (1982), Nonlinear Renewal Theory in Sequential Analy-
sis, SIAM, Philadelphia.
Wu, C. F. J. (1990), 'On the asymptotic properties of the jackknife his-
togram', The Annals of Statistics 18,1438-1452.
Yoshihara, K. (1975), 'Weak convergence of multidimensional empirical
processes for strong mixing sequences of stochastic vectors', Zeitschrift
für Wahrscheinlichkeitstheorie und Verwandte Gebiete 33, 133-137.
Zhang, X., van Eijkeren, J. and Heemink, A. (1995), 'On the weighted
least-squares method for fitting a semivariogram model', Computers
and Geosciences 21, 605-608.
Zhu, J. and Lahiri, S. N. (2001), Weak convergence of blockwise boot-
strapped Empirical processes for stationary random fields with sta-
tistical applications, Preprint, Department of Statistics, Iowa State
University, Ames, IA.
Zhurbenko, I. G. (1972), 'On strong estimates of mixed semiinvariants of
random processes', Siberian Mathematical Journal 13, 202-213.

Zygmund, A. (1968), Trigonometric Series, Vol. I, II, Cambridge University


Press, Cambridge, U.K.
Author Index

Allen, M., 217, 219-220 Bretagnolle, J., 211


Anderson, T. W., 42 Brillinger, D. R, 224
Andrews, D.W.K., 173 Brockwell, P. J., 75-76, 157, 200,
Arcones, M., 18, 20, 92, 266, 268 215, 236-237
Athreya, K. B., 18, 211, 267-268, Brunswick, N. A., 94
279 Biihlmann, P., 42, 93, 182, 284
Bustos, O. H., 81
Bai, Z. D., 163
Barbe, Ph., 20 Carlstein, E., 4, 25, 30, 37, 127,
Barry, J., 308 246, 284
Bartlett, M. S., 61 Chernick, M. R, 20, 271-272
Basawa, 1. V., 24, 210 Chibishov, D. M., 163
Beran, J., 241, 257 Choi, E., 203
Bertail, P., 20 Chow, Y., 270
Bhattacharya, R N., 55-56, 73, Cressie, N., 24, 282, 289, 307-309,
147-150, 162 316, 321, 334-335
Bickel, P. J., 18, 37, 92, 211-212, Crowder, M., 308
268
Billingsley, P., 92-93, 138, 153, Dahlhaus, R, 221, 223, 225-227
210, 303-304, 306 Datta, S., 24, 202, 207-208, 210-
Bingham, N. H., 252 212,217,219-220,267-
Boos, D. D., 94, 110 268
Bose, A., 24, 158, 202-203, 217 David, M., 308
Bowman, A., 230 Davis, R. A., 75-76, 157, 200,
Bradley, R C., 295 215, 236-237, 262

Davison, A. C., 20, 190 Härdle, W., 221, 225, 227-230,


de Haan, L., 272 233, 235
Denker, M., 262 Hasminskii, R Z., 129
Dennis, J. E., 309 Heemink, A., 308
Deo, C., 92 Heimann, G., 24, 211
Dieudonne, J., 95 Helmers, R, 164
Diggle, P., 308 Hesterberg, T. C., 127, 246
Do, K-A., 127, 246 Hinkley, D. V., 20, 190
Dobrushin, R L., 243 Hipp, C., 146, 154, 156, 158, 179
Doukhan, P., 46, 99, 295-296 Horowitz, J. L., 125, 127, 173,
178,182-184,186,258
Ebey, S. F., 282 Hsing, T., 267
Efron, B., 1-2, 17, 20-21, 24,37, Hsu, N.-J., 334
40, 146, 171, 187, 190, Huber, P., 94-95
192, 202, 208, 211-212, Huijbregts, C. J., 308
219, 225-226, 265-266, Hurvich, C. M., 40,225
268, 274, 276
Ibragimov, I. A., 129, 256, 262
Faraway, J. J., 230
Inoue, A., 173, 210
Feller, W., 35, 244, 261-263, 269
Fernholz, L. T., 94, 109, 305
Jakubowski, A., 262
Filippova, A. A., 94
Janas, D., 158,221,223,225-227
Findley, D. F., 202
Jensen, J. L., 154
Franke, J., 24, 217, 220-221,225,
Jhun, M., 230
227-229, 233, 235
Jing, B.- Y., 37, 125, 127, 173,
Freedman, D. A., 23-24, 82, 199,
178, 182-184, 186, 253,
202
255, 258
Fukuchi, J. 1., 18, 211, 268, 275-
Journel, A. G., 308
276,279
Fuller, W. A., 209
Kaiser, M. S., 334
Furukawa, K, 182, 186, 194
Kallenberg, 0., 269
Genton, M. G., 308 Kilian, L., 210
Ghosh, J. K, 73, 162 Kobus, M., 262
Gine, E., 18, 20, 266, 268 Kreiss, J. P., 24, 211, 216-217,
Gnedenko, B. V.,272 220-221, 235, 239-240
Goldie, C. M., 252 Künsch, H. R., 25, 28, 56, 122,
Götze, F., 18, 37, 146, 154, 156, 127, 166, 171-172, 178,
158, 164, 166, 171-172, 182, 190, 208, 243, 246,
179,211 284
Guyon, X., 321
Lahiri, S. N., 23, 34, 41-42, 55,
Hall, P., 20, 25, 37, 73, 125, 127, 82, 124-127, 155, 158,
151, 164, 173, 178, 182- 160, 164, 167, 170, 173,
184, 186, 203, 230, 241, 179, 182, 186-187, 194,
246, 253, 255, 258, 284 211, 219, 224, 230, 245,

253, 255, 258, 265, 267- Reeves, J. H., 24, 210


268, 270, 282, 284, 300, Ren, J.-J., 305
303, 308-310, 315-317, Resnick, S., 267
322, 326, 329-330, 333- Rice, J., 233
334 Rodriguez-Iturbe, 1., 315
Leadbetter, M. R, 271, 273-274 Romano, J. P., 28, 32, 34, 37, 125,
Lee, Y.-D., 182, 186, 194, 308- 173,230,284
310,315,317 Rootzen, H., 273-274
Lindgren, G., 273-274 Rozanov, Y. A., 256
Linnik, Y. V., 262
Liu, R Y., 22, 25, 99, 190, 243 Sakov, A., 212, 268
Loynes, R M., 272 Samur, J. D., 262
Schnabel, R B., 309
Major, P., 243 Sen, P. K., 92, 305
Malinovskii, V. K., 154 Serfling, R. J., 94, 109
Mallik, A. K., 24, 210 Shao, J., 20, 190
Mammen, E., 20 Shao, Q.-M., 56
Mathew, G., 279 Sherman, M., 284, 316
McCormick, W. P., 24, 210, 212, Shinozuka, M., 315
267-268, 279 Shintani, M., 173
Meijia, J. M., 315 Shorack, G. R, 23, 82, 325
Miller, R G., 190 Singh, K., 21-22, 25,55, 99, 146,
Morris, M. D., 282 190, 243, 266
Sjostedt-DeLuna, S., 337-338
Naik-Nimbalkar, U. V., 94 Skovgaard, I. M., 163
Nordgaard, A., 225 Sriram, T. N., 24, 202
Nordman, D., 300, 316 Statulevicius, V., 154
Stein, M. L., 282
Paparoditis, E., 127, 221, 235, Starcia, C., 267
239-240, 284 Swanepoel, J.W.H., 24, 202
Parthasarathi, K. R, 55, 93, 210
Taqqu, 11. S., 243, 246-247, 256
Peligrad, M., 56, 94, 264
Taylor, R. L., 24, 210
Peters, S. F., 202
Teicher, H., 270
Petrov, V. V., 147, 151
Teugels, J. L., 252
Politis, D. N., 28,32, 34, 37, 125-
Tibshirani, R, 20, 24, 190, 202
127, 173,284
Truong, Y. K., 230
Polzehl, J., 230
Th, D., 20, 190
Possolo, A., 284
Priestley, M. B., 61, 235 van der Vaart, A. W., 305-306
van Eijkeren, J., 308
Radulovic, D., 56 van Wyk, J.W.y', 24, 202
Rajarshi, M. B., 94 van Zwet, W., 18, 37, 211
Rao, C. R, 163 von Mises, R, 94
Rao, R Ranga, 55-56, 147-150
Reeds, J. A., 94, 305 Wang, Q., 164

Wei, W., 267


Wellner, J. A., 305-306
White, H., 126
Wichura, M. J., 92
Wolf, M., 284
Wu, C.F.J., 190

Yoshihara, K., 92
Young, S., 337-338
Yu, B., 92

Zeger, S., 40, 225


Zhang, X., 308
Zhu, J., 284,303
Zhurbenko, I. G., 59
Zinn, J., 268
Zygmund, A., 255
Subject Index

ARMA bootstrap, 217, 220 calibration, 338


ARMA process, 7, 81, 99, 104, circular (See Circular boot-
110, 116, 157, 200, 214, strap)
216 confidence intervals, 9, 108,
Associated iid sequence, 263, 271, 112, 204, 212, 266, 268
276 distribution function estima-
Asymptotic efficiency, 309, 312, tion, 8, 54, 105, 112,
318 127, 179, 181, 185-186,
Asymptotic relative efficiency, 204, 207, 292, 327, 335
123-124, 126-127 Frequency domain (See Fre-
Autoregressive bootstrap, 199, quency domain boot-
203, 208-209, 211 strap)
Autoregressive process, 24, 42, generalized block, 32, 39
75, 116, 127, 200, 205, IID, 18, 171, 192, 208, 225,
209, 215, 236 265-266, 268, 274, 276-
277
Best linear unbiased predictor, m out of n, 211, 213, 245, 266
335-336 matched block, 127
Bias, 37, 119, 121, 230, 300 naive, 29, 104
estimation, 38, 120-122,127, nonoverlapping
178, 188 (See Nonoverlapping
Block Jackknife, 189 block bootstrap)
Bootstrap ordinary, 29, 81, 103
blocks of blocks, 28, 105, overlapping (See Moving
287-288 block bootstrap)

prediction interval, 338 Extreme value distributions, 272-


principle, 2 273
quantiles, 8,19, 107, 127, 304
sieve (See Sieve bootstrap) Finite Fourier transform, 222, 227
stationary (See Stationary Frequency domain bootstrap, 40,
bootstrap) 225, 227, 230, 232
tapered block, 127
transformation based, 40 Gaussian process, 158, 225, 242,
252, 288, 302, 315, 329,
Characteristic function, 147, 232, 335
262, 265, 269 Generalized method of moments,
Circular bootstrap, 32, 33, 48, 54, 173
77, 102, 105, 116, 122,
124, 126, 128, 245 Heavy tail, 261, 268
Conditional Slutsky's Theorem, Hermite polynomials, 151, 243
77
Consistency, 49, 55, 58, 63, 96, Inconsistency, 210-211, 244, 270,
189, 196, 207, 253-254, 276
275, 312, 326 Increasing domain asymptotics,
Cramer's condition, 152-153, 282, 321, 323, 326
156-157, 163, 202, 227 Infill asymptotics, 282, 320-321,
Cumulants, 59, 129, 147-148, 323, 326
159, 161, 165 Infinitely divisible distributions,
Delta method, 161 262, 264
Differentiable functionals, 109 Intrinsic stationary, 307
Frechet, 94, 99, 109, 305 Isotropy, 289
Hadamard, 305
Discrete Fourier transform, 40 Jackknife-After-Bootstrap, 175,
Domain of attraction, 268, 272 187, 190, 212, 268

Edgeworth expansion, 145-146, Kriging, 335, 338


152, 154, 156, 160-163,
165, 167, 169, 172, 180, L-estimator, 109
227 Least squares
Empirical distribution function, BGLS, 310
4, 25, 202, 301 RGLS, 310, 312, 314
Empirical measure, 27, 92 SGLS, 310, 315-316
Empirical process, 302 estimator, 200, 206, 211, 217,
Estimators 308, 310, 316
lag-window, 28 generalized, 308, 310, 314
ratio, 224-225, 240 weighted 308, 310
spectral means, 222, 235 Linear process, 156, 223, 226
Whittle, 228 Linear regression, 167
Yule-Walker, 75, 119, 224, Long range dependence, 241, 252,
239 282, 323

MA, 116,215 Parameters


m-dependence, 156-157 level-I, 2, 74, 100, 105, 175
M-estimators, 27, 81, 91, 167, level-2, 1-2, 100, 107, 119,
218, 286, 319-320, 325, 183-184
327 level-3, 12, 187
Markov chain, 36, 42, 127, 154, Periodogram, 222, 225, 228-230,
156, 158, 271 235
Mean integrated squared error, Plug-in
181 estimator, 3, 18, 107, 187,
Mean squared error, 3, 99, 120, 189, 194, 311, 331
124,126,175,300 predictor, 336
Mean square prediction error, Poisson random measure, 269
329, 331, 336 Prediction, 282, 328
Mixed increasing domain asymp- interval, 333, 336
totic structure, 282,
321, 323, 326, 329 Sample autocovariance, 27, 61,
Mixing 74, 99, 166, 288
strong, 45, 49, 59, 120, 155, Second-order correctness, 39,
271, 295, 302 168, 226, 228
φ, 46 Slowly varying, 242, 244, 252, 269
ψ, 46, 264 Smooth function model, 73, 99,
ρ, 46, 295, 332 118, 125, 146, 160, 164,
Model based bootstrap, 23 118, 125, 146, 160, 164,
Modulus of continuity, 151, 159 167-168,177,179,293
Moment bounds, 128-129, 296 Spatial cumulative distribution
Monte-Carlo simulation, 8, 101, function, 329, 333
103, 107, 185, 192, 197, Spatial sampling design
234, 292, 338 based on Poisson point pro-
Moving block bootstrap, 25, 42, cess, 319
48, 54, 77, 93, 100, 104, stochastic, 319-320
110, 116, 122, 124, 126, Spectral density, 28, 128, 222,
128, 171, 178, 183-184, 226, 233, 239
190, 203, 208, 244-245, Stable distributions, 269
261, 267-268, 274-275, Star shaped region, 283
299 Stationary bootstrap, 32, 34, 57,
63, 77, 102, 105, 116,
Nonoverlapping block bootstrap, 122, 124, 126, 135
4,30,42,48,54,77,101, Stochastic approximation, 161
105, 116, 122, 124, 126, Subsampling, 37, 183, 252-255,
128, 245 284, 310-311, 330, 334
Nugget effect, 288 generalized, 39

Optimal block size, 12, 124, 126- Trimmed mean, 91, 110
127, 176, 178, 181, 183-
184, 186, 194 Variance, 119, 122, 299

Variance estimation, 7, 38, 48, 57,


120-122, 127, 178, 185-
186, 192, 292, 295, 299,
326
Variogram, 11,288,307-308
Variogram model fitting, 11, 308