

Empirical Processes
with Applications
to Statistics
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out
of print by their original publishers, though they are of continued importance and interest to the
mathematical community. SIAM publishes this series to ensure that the information presented in these
texts is not lost to today's students and researchers.

Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
John Boyd, University of Michigan
Leah Edelstein-Keshet, University of British Columbia
William G. Faris, University of Arizona
Nicholas J. Higham, University of Manchester
Peter Hoff, University of Washington
Mark Kot, University of Washington
Peter Olver, University of Minnesota
Philip Protter, Cornell University
Gerhard Wanner, L'Université de Genève

Classics in Applied Mathematics


C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and
Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained
Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One,
Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for
Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems
in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear
Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Başar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
*First time in print.

Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis
and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New
Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point
Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology
of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and
Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Alexander Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems
I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials
Galen R. Shorack and Jon A. Wellner, Empirical Processes with Applications to Statistics
Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone, The Linear Complementarity Problem
Rabi N. Bhattacharya and Edward C. Waymire, Stochastic Processes with Applications
Robert J. Adler, The Geometry of Random Fields
Mordecai Avriel, Walter E. Diewert, Siegfried Schaible, and Israel Zang, Generalized Concavity
Rabi N. Bhattacharya and R. Ranga Rao, Normal Approximation and Asymptotic Expansions
Empirical Processes
with Applications

to Statistics

Galen R. Shorack
Jon A. Wellner
University of Washington
Seattle, Washington

Society for Industrial and Applied Mathematics
Philadelphia
Copyright © 2009 by the Society for Industrial and Applied Mathematics
This SIAM edition is an unabridged republication of the work first published by John Wiley
& Sons, Inc., 1986.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the
publisher. For information, write to the Society for Industrial and Applied Mathematics,
3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
Research supported in part by NSF grant DMS-0804587 and NIAID grant 2R01 AI291968-04.
Tables 3.8.1, 3.8.2, and 9.2.1 used with permission from Pearson Education.
Tables 3.8.1, 3.8.4, 3.8.7, 5.3.4, 5.6.1, 24.2.1, and 24.2.2 used with permission from the Institute
of Mathematical Statistics.
Tables 3.8.5, 5.6.1, and 9.2.1 and Figures 5.7.1(a) and 5.7.1(b) used with permission from the
American Statistical Association.
Tables 3.8.6, 5.6.1, and 5.7.1 and Figures 5.7.1, 5.7.2, and 18.3.1 used with permission from
Oxford University Press.
Tables 5.3.1, 5.3.2, 5.3.3, and 5.6.1 used with permission from Wiley-Blackwell.
Tables 9.3.2 and 9.3.3 copyrighted by Marcel Dekker, Inc.

Library of Congress Cataloging-in-Publication Data

Shorack, Galen R., 1939-


Empirical processes with applications to statistics / Galen R. Shorack, Jon A. Wellner.
p. cm. -- (Classics in applied mathematics ; 59)
Originally published: New York : Wiley, c1986.
Includes bibliographical references and indexes.
ISBN 978-0-898716-84-9
1. Mathematical statistics. 2. Distribution (Probability theory) 3. Random variables. I.
Wellner, Jon A., 1945- II. Title.
QA276.S483 2009
519.5--dc22
2009025143

SIAM is a registered trademark.


TO MY SONS
GR, BART, AND MATT
YOUR YOUTH WAS THE JOY OF MY LIFE.
-GRS

To VERA,
WITH THANKS FOR YOUR
LOVE AND SUPPORT.
-JAW
Short Table of Contents

1. Introduction and Survey of Results 1


2. Foundations, Special Spaces and Special Processes 23
3. Convergence and Distributions of Empirical Processes 85
4. Alternatives and Processes of Residuals 151
5. Integral Tests of Fit and Estimated Empirical Process 201
6. Martingale Methods 258
7. Censored Data and the Product-Limit Estimator 293
8. Poisson and Exponential Representations 334
9. Some Exact Distributions 343
10. Linear and Nearly Linear Bounds on the Empirical Distribution
Function G_n 404
11. Exponential Inequalities and ‖·/q‖-Metric Convergence of U_n
and V_n 438
12. The Hungarian Constructions of K_n, U_n, and V_n 491
13. Laws of the Iterated Logarithm Associated with U_n and V_n 504
14. Oscillations of the Empirical Process 531
15. The Uniform Empirical Difference Process D_n = U_n + V_n 584
16. The Normalized Uniform Empirical Process Z_n and the
Normalized Uniform Quantile Process 597
17. The Uniform Empirical Process Indexed by Intervals and
Functions 621
18. The Standardized Quantile Process Q_n 637
19. L-Statistics 660
20. Rank Statistics 695
21. Spacings 720
22. Symmetry 743
23. Further Applications 763
24. Large Deviations 781
25. Independent but not Identically Distributed Random Variables 796
26. Empirical Measures and Processes for General Spaces 826
A. Appendix A: Inequalities and Miscellaneous 842
B. Appendix B: Martingales and Counting Processes 884
References 901
Author Index 923
Subject Index 927
Contents

Preface to the Classics Edition xxxi


Preface xxxv
List of Tables xxix
List of Special Symbols xxxix

1. Introduction and Survey of Results 1


1. Definition of the Empirical Process and the Inverse
Transformation, 1
2. Survey of Results for ‖U_n‖, 10
3. Results for the Random Functions G_n and U_n on [0, 1], 13
4. Convergence of U_n in Other Metrics, 17
5. Survey of Other Results, 19

2. Foundations, Special Spaces and Special Processes 23


0. Introduction, 23
1. Random Elements, Processes, and Special Spaces, 24
Projection mapping; Finite-dimensional subsets; Measurable
function space; Random elements; Equivalent processes;
Change of variable theorem; Borel and ball σ-fields

2. Brownian Motion S, Brownian Bridge U, the Uhlenbeck
Process, the Kiefer Process K, the Brillinger Process, 29
Definitions; Relationships between the various processes;
Boundary crossing probabilities for S and U; Reflection
principles; Integrals of normal processes
3. Weak Convergence, 43
Definitions of weak convergence and weak compactness;
Weak convergence criteria on (D, 𝒟); Weak convergence
criteria on more general spaces; The Skorokhod-Dudley-
Wichura theorem; Weak convergence of functionals; The
key equivalence; ‖·/q‖ convergence; On verifying the
hypotheses of Theorem 1; The fluctuation inequality; Addi-
tional weak convergence criteria on (D, 𝒟); Conditions for
a process to exist on (C, 𝒞)

4. Weak Convergence of the Partial-Sum Process S_n, 52
Definition of S_n; Donsker's theorem that S_n ⇒ S; The
Skorokhod construction form of Donsker's theorem;
O'Reilly's theorem; Skorokhod's embedded partial-sum
process; Hungarian construction of the partial-sum process;
The partial-sum process of the future observations
5. The Skorokhod Embedding of Partial Sums, 56
The strong Markov property; Gambler's ruin problem;
Choosing T so that S(T) ≅ X; The embedding version of S;
Strassen's theorem on rates of embedding; Extensions and
limitations; Breiman's embedding in which S is fixed
6. Wasserstein Distance, 62
Definition of Wasserstein distance d_2; Mallows' theorem;
Minimizing the distance between rv's with given marginals;
Variations
7. The Hungarian Construction of Partial Sums, 66
The result; Limitations; Best possible rates; Other rates
8. Relative Compactness ⇝, 69
Definition of ⇝; LIL for iid N(0, 1) rv's; LIL for Brownian
motion; Hartman-Wintner LIL and converse; Multivariate
LIL in ⇝ form; T_n approximation and T_n linearization;
Criteria for establishing ⇝; Mapping theorem

9. Relative Compactness of S(nI)/√n b_n, 79
Definition of Strassen's limit class 𝒦; Properties of 𝒦;
Strassen's theorem that S(nI)/√n b_n ⇝ 𝒦; Definition of
Finkelstein's limit class; The analog for the Brillinger
process B(n, ·)
10. Weak Convergence of the Maximum of Normalized
Brownian Motion and Partial Sums, 82
Extreme value dfs; Darling and Erdös theorem with
generalizations
11. The LLN for iid rv's, 83
Kolmogorov's SLLN; Feller's theorem; Theorems of Erdös,
Hsu and Robbins, and Katz; Necessary and sufficient
conditions for the WLLN

3. Convergence and Distributions of Empirical Processes 85


1. Uniform Processes and Their Special Construction, 85
Uniform empirical df G_n, empirical process U_n, and quantile
process V_n; Smoothed versions G̃_n, Ũ_n, and Ṽ_n; Identities;
Weighted uniform empirical process W_n; Covariances
and correlation ρ_n = ρ_n(c, 1); Finite sampling process (or
empirical rank process) R_n, with identities; ℒ_2; ‖[·]‖, ( , ),
and BVI(0, 1); Applications: Kolmogorov-Smirnov,
Cramer-von Mises, stochastic integrals, and simple linear
rank statistics; The special construction of U_n, V_n, W_n, R_n
and Brownian bridges U, W; The special construction of
∫_0^1 h dW_n ⇒ ∫_0^1 h dW; Glivenko-Cantelli theorem in the
uniform case; Generalizations to U_n(A); Uniform order
statistics: the key relation, densities, moments
2. Definition of Some Basic Processes under General
Alternatives, 98
The empirical df F_n, the average df F̄_n, and the empirical
process √n(F_n − F̄_n); The quantile process; Reduction to
[0, 1] in the case of continuous dfs; X_n, Y_n, Z_n, R_n and
identities; Reduction to [0, 1] in the general case: associated
array of continuous rv's; Extended Glivenko-Cantelli
theorem; Some change of variable results
3. Weak Convergence of the General Weighted Empirical
Process, 108
Definition and moments; The function v_n; Weak convergence
(⇒) of Z_n and its modulus of continuity; The special con-
struction of Z_n; Moment inequalities for Z_n
4. The Special Construction for a Fixed Nearly Null Array, 119
Notation for the reduced processes X_n and Z_n; Nearly null
arrays; The special construction of Z_n; The special
construction for ∫_0 h dZ_n
5. The Sequential Uniform Empirical Process K_n, 131
The definition of K_n and the Kiefer process K; The Bickel-
Wichura theorem that K_n ⇒ K
6. Martingales Associated with U_n, V_n, W_n, R_n, 132
Martingales for U_n, V_n, W_n, R_n divided by 1 − I; The Pyke-
Shorack inequality, with analogs; Reverse martingales for
U_n, V_n, W_n, R_n divided by I; Submartingales for
‖n(G_n − I)^# ψ‖; Reverse submartingales for ‖(G_n − I)^# ψ‖;
Sen's inequality; Vanderzanden's martingales
7. A Simple Result on Convergence in ‖·/q‖ Metrics, 140
8. Limiting Distributions under the Null Hypothesis, 142
Kolmogorov-Smirnov and Kuiper statistics; Renyi's statis-
tics; Cramer-von Mises, Watson, Anderson-Darling
statistics
4. Alternatives and Processes of Residuals 151
0. Introduction, 151
1. Contiguity, 152
The key contiguity condition; Convergence of the centering
function; Convergence of the weighted empirical process E_n
on (−∞, ∞) and the empirical rank process R_n; Le Cam's
representation of the log likelihood ratio L_n under contiguity;
Uniform integrability, →_{L_1}, and →_d of the rv's exp(L_n); Le
Cam's third lemma; The Radon-Nikodym derivative of
U + Δ measure wrt U measure; Miscellaneous contiguity
results
2. Limiting Distributions under Local Alternatives, 167
Chibisov's theorem; An expansion of the asymptotic power
of the ‖√n(G_n − I)‖ test
3. Asymptotic Optimality of F_n, 171
Beran's theorem on the asymptotic optimality of F_n; State-
ment of other optimality results
4. Limiting Distributions under Fixed Alternatives, 177
Raghavachari's theorem for supremum functionals;
Analogous result for integral functionals
5. Convergence of Empirical and Rank Processes under Con-
tiguous Location, Scale, and Regression Alternatives, 181
Fisher information for location and scale; The contiguous
simple regression model, and its associated special construc-
tion; The contiguous linear model with known scale; The
contiguous scale model; The contiguous linear model with
unknown scale; The main result
6. Empirical and Rank Processes of Residuals, 194
The weighted empirical process of standardized residuals E_n;
The empirical rank process of standardized residuals R_n;
Convergence of E_n and R_n; Classical and robust residuals,
the Pierce and Kopecky idea; The estimated empirical process
Û_n; Testing the adequacy of a model

5. Integral Tests of Fit and Estimated Empirical Process 201


0. Introduction, 201
1. Motivation of Principal Component Decomposition, 203
Statement of a problem; Principal component decomposition
of a random vector; Principal component decomposition of
a process: heuristic treatment


2. Orthogonal Decomposition of Processes, 206


Properties of kernels; Complete orthonormal basis for ℒ_2;
Mercer's theorem; Representations of covariance functions
via Mercer's theorem; Orthogonal decomposition of X à la
Kac and Siegert; Distribution of ∫X² via decomposition
3. Principal Component Decomposition of U_n, U and Other
Related Processes, 213
Kac and Siegert decomposition of U; Durbin and Knott
decomposition of U_n; Decompositions of W_n² and W²; Distri-
bution of the components of U; Computing formula for W_n²;
Testing natural Fourier parameters; Power of W_n², A_n², and
other tests; Decomposition of ψU for ψ continuous on [0, 1]
4. Principal Component Decomposition of the Anderson and
Darling Statistic A_n², 224
Limiting distribution of A_n²; Anderson and Darling decompo-
sition of U and A²; Computing formula for A_n²
5. Tests of Fit with Parameters Estimated, 228
Darling's theorem; An estimated empirical process Û_n;
Specialization to efficient estimates of location and scale
6. The Distribution of W², W_n², Â², Â_n², and Other Related
Statistics, 235
The Darling-Sukhatme theorem for K̂(s, t) = K(s, t) −
Σ_j φ_j(s)φ_j(t); Tables of distributions; Normal, exponential,
extreme value, and censored exponential cases; Normalized
principal components of Ŵ; A proof for K̂ = K − Σ_j φ_jφ_j
7. Confidence Bands, Acceptance Bands, and QQ, PP, and SP
Plots, 247
8. More on Components, 250
Asymptotic efficiency of tests based on components; Choosing
the best components; We come full circle
9. The Minimum Cramer-von Mises Estimate of Location, 254
The Blackman estimator of location

6. Martingale Methods 258


0. Introduction, 258
1. The Basic Martingale M_n for U_n, 264
The cumulative hazard function Λ; Definition of the basic
martingale M_n; The key identity; The variance function V;
Convergence of M_n to M = S(V) for a Brownian motion S;
M = Z(F) for continuous F and a particular Brownian motion
Z; The predictable variation process ⟨M_n⟩; Discussion of
Rebolledo's CLT; The exponential identity for √n(F_n − F);
Extension to the weighted case of W_n
2. Processes of the Form ψM_n, ψU_n(F), and ψW_n(F), 273
Convergence in ‖·/q‖ metrics; F⁻¹ and F₊⁻¹ are q-functions

3. Processes of the Form ∫ h dM_n, 276
K_n ≡ ∫ h dM_n is a martingale; Evaluation of the predict-
able variation of K_n; Existence of K ≡ ∫ h dM; Conver-
gence of K_n to K in ‖·/q‖ metrics

4. Processes of the Form ∫ h dU_n(F) and ∫ h dW_n(F), 282
Reduction of ∫ h dW_n(F) to the form ∫ h dM_n;
Existence of ∫ h dU(F) and ∫ h dW(F); Convergence
in ‖·/q‖ metrics; Some covariance relationships among the
processes; Replacing h by h_n
5. Processes of the Form ∫ F_n dh, ∫ U_n(F) dh, and
∫ W_n(F) dh, 289
Convergence of these processes in ‖·/q‖ metrics
6. Reductions When F is Uniform, 291

7. Censored Data and the Product-Limit Estimator 293


0. Introduction, 293
The random censorship model; The product-limit estimator
F̂_n; The cumulative hazard function Λ and its estimator Λ̂_n;
The processes B_n ≡ √n(Λ̂_n − Λ) and X_n ≡ √n(F̂_n − F); The
basic martingale M_n

1. Convergence of the Basic Martingale M_n, 296
The covariance function V; Convergence of M_n to M = S(V)
2. Identities Based on Integration by Parts, 300
Representation of B_n and X_n/(1 − F) as integrals ∫ · dM_n;
The exponential formula

3. Consistency of Λ̂_n and F̂_n, 304


4. Preliminary Weak Convergence ⇒ of B_n and X_n, 306
The Breslow-Crowley theorem
5. Martingale Representations, 310
The predictable variation ⟨M_n⟩ of the basic martingale M_n;
The predictable variation of B_n and X_n/(1 − F)

6. Inequalities, 316
The Gill-Wellner inequality; Lenglart's inequality for locally
square integrable martingales; Gill's inequality (product-
limit version of Daniels and Chang inequalities)
7. Weak Convergence ⇒ of B_n and X_n in ‖·/q‖ Metrics, 318
Application of Rebolledo's CLT and Lenglart's inequality;
Confidence bands
8. Extension to General Censoring Times, 325
Convergence of B_n and X_n; The product-limit estimator is
the MLE

8. Poisson and Exponential Representations 334


0. Introduction, 334
1. The Poisson Process N, 334
One-dimensional; Two-dimensional
2. Representations of Uniform Order Statistics, 335
As partial sums of exponentials; As waiting times of a
conditioned Poisson process; Normalized exponential
spacings; Lack of memory property
3. Representations of Uniform Quantile Processes, 337
4. Poisson Representations of U_n, 338
Conditional, Chibisov, and Kac representations
5. Poisson Embeddings, 340
The Poisson bridge; Representation of the sequential uniform
empirical process

9. Some Exact Distributions 343


0. Introduction, 343
1. Evaluating the Probability that G_n Crosses a General
Line, 344
Dempster's formula; Daniels' theorem; Chang's theorem
2. The Exact Distribution of ‖U_n⁺‖ and the DKW
Inequality, 349
The Birnbaum and Tingey formula; Asymptotic expansions;
Harter's approximation; Pyke's result for the smoothed
empirical df G̃_n; The Dvoretzky-Kiefer-Wolfowitz (DKW)
inequality

3. Recursions for P(g ≤ G_n ≤ h), 357
Recursions of Noé, Bolshev, and Steck; Steck's formula;
Ruben's recursive approach; The exact distribution of ‖U_n‖
using Ruben's table; Tables for ‖U_n/√(I(1 − I))‖ and
‖U_n/√(G_n(1 − G_n))‖ from Kotel'nikova and Chmaladze
4. Some Combinatorial Lemmas, 376
Andersen's lemma; Takács' lemma; Tusnády's lemma;
Another proof of Dempster's formula; Csáki and Tusnády's
lemma
5. The Number of Intersections of G_n with a General Line, 380
The exact distribution of the number of intersections; Limit-
ing distributions in various special cases
6. On the Location of the Maximum of Ũ_n and Ṽ_n, 384
The smoothed uniform empirical and quantile processes;
Theorems of Birnbaum and Pyke, Gnedenko and Mihalevic,
Kac, and Wellner
7. Dwass's Approach to G_n Based on Poisson Processes, 388
Dwass's theorem with applications; Zeros and ladder points
of U_n; Crossings on a grid
8. Local Time of U_n, 398
Definition of local time; The key representation; Some open
questions about the limit
9. The Two-Sample Problem, 401
The Gnedenko and Korolyuk distribution; Application to the
limiting distribution of ‖U_n‖

10. Linear and Nearly Linear Bounds on the Empirical Distribution
Function G_n 404
0. Summary, 404
1. Almost Sure Behavior of ξ_{n:k} with k Fixed, 407
Kiefer's characterization of P(ξ_{n:k} ≤ a_n i.o.); Robbins-Sieg-
mund characterization of P(ξ_{n:k} ≥ a_n i.o.)
2. A Glivenko-Cantelli-Type Theorem for ‖(G_n − I)ψ‖, 410
Lai's SLLN for ‖(G_n − I)ψ‖
3. Inequalities for the Distributions of ‖G_n/I‖ and
‖I/G_n‖ on [ξ_{n:1}, 1], 412
Shorack and Wellner bound on P(‖I/G_n‖_{ξ_{n:1}}^1 ≥ λ);
Wellner's exponential bounds

4. In-Probability Linear Bounds on G_n, 418
5. Characterization of Upper-Class Sequences for ‖G_n/I‖ and
‖I/G_n‖, 420
Shorack and Wellner's characterization of upper-class
sequences for ‖G_n/I‖ and ‖I/G_n‖; Chang's WLLN-type
result; Wellner's restriction to intervals [a_n, 1] with a_n → 0;
Mason's upper-class sequences for ‖n^ν(G_n − I)/[I(1 − I)]^{1−ν}‖
with 0 ≤ ν ≤ 1/2
6. Almost Sure Nearly Linear Bounds on G_n and G_n⁻¹, 426
Bounding G_n and G_n⁻¹ between nearly linear functions;
James' boundary weight ψ = 1/log₂(e^e/I) for ‖I/G_n‖;
Nearly linear bounds with logarithmic terms
7. Bounds on Functions of Order Statistics, 428
Mason's upper-class result for max{g(ξ_{n:i})/a_{n:i} : 1 ≤ i ≤ k_n}
with g ↘ and a_n ↗; Strength of bundles of fibers; Determina-
tion of the a.s. lim sup of ‖G_n g‖/a_n via Eg(ξ)
8. Almost Sure Behavior of U_n(a_n)/b_n as a_n ↘ 0, 432
Kiefer's theorem
9. Almost Sure Behavior of Normalized Quantiles as a_n ↘ 0, 435

11. Exponential Inequalities and ‖·/q‖-Metric Convergence of U_n
and V_n 438
0. Introduction, 438
1. Universal Exponential Bounds for Binomial rv's, 439
The exponential bounds for binomial tail probabilities of
Bennett, Bernstein, Hoeffding, and Wellner; Behavior of the
functions ψ and h; Constants related to ψ and h; Extensions
of the binomial exponential bounds to suprema of U_n over
neighborhoods of zero
2. Bounds on the Magnitude of ‖U_n^#/q‖_0^a, 445
Inequalities for P(‖U_n/q‖_0^a ≥ λ); Corollary for q = √I; Cor-
responding inequalities for P(‖U_n/q‖_0^a ≤ λ)
3. Exponential Bounds for Uniform Order Statistics, 453
Exponential bounds for order statistic tail probabilities;
Behavior of the functions ψ and h; Bounds for absolute
central moments of uniform order statistics; Extensions of
the bounds to suprema of V_n over neighborhoods of 0
4. Bounds for the Magnitude of ‖V_n/q‖_0^a, 460
Inequalities for P(‖V_n/q‖_0^a ≥ λ); Corollary for q = √I

5. Weak Convergence of U_n and V_n in ‖·/q‖ Metrics, 461
Chibisov's theorem; O'Reilly's theorem on convergence of
V_n with respect to ‖·/q‖; Other versions of the quantile
process
6. Convergence of U_n, W_n, V_n, and R_n in Weighted ℒ_2
Metrics, 470
7. Moments of Functions of Order Statistics, 474
Existence of moments of rv's; Existence of moments of order
statistics; Anderson's moment expansions; Mason's bounds
on moments
8. Additional Binomial Results, 480
Unimodality of the binomial distribution and related
inequalities; Feller's inequalities; Large deviations, the
Bahadur and Rao theorem
9. Exponential Bounds for Poisson, Gamma, and Beta
RV's, 484
Unimodality of the Poisson distribution; Large deviations;
Exponential bounds for Poisson probabilities related to the
Binomial bounds; Inequalities of Bohman and of Anderson
and Samuels; Analogous results for the Gamma distribution
12. The Hungarian Constructions of K_n, U_n, and V_n 491
0. Introduction, 491
Skorokhod constructions of U_n and V_n again; The sequential
uniform empirical process K_n
1. The Hungarian Construction of K_n, 493
The Hungarian construction of K_n at rate (log n)²/√n; The
other Hungarian construction of U_n at rate (log n)/√n
2. The Hungarian Renewal Construction of V_n, 496
The Brillinger process; The Hungarian renewal construction
of V_n using partial sums of exponentials
3. A Refined Construction of U_n and V_n, 499
4. Rate of Convergence of the Distribution of Functionals, 502
Rate (log n)/√n is possible for Lipschitz functionals with
bounded density

13. Laws of the Iterated Logarithm Associated with U_n and V_n 504


0. Introduction, 504
1. A LIL for ‖U_n‖, 504

Smirnov's LIL for ‖U_n‖; Chung's characterization of upper-
class sequences; Boundary crossing probabilities; A remark
on an Erdös theorem
2. A Maximal Inequality for ‖U_n/q‖_0^a, 510
James' inequality for general q; Shorack's refinement for
q = √I
3. Relative Compactness of U_n and V_n, 512
Definition of ⇝; Finkelstein's theorem that U_n/b_n ⇝;
Cassels' theorem; Rescaled Kiefer process K(n, ·)/√n b_n ⇝;
An alternative proof of Finkelstein's theorem based on the
Hungarian construction
4. Relative Compactness of U_n in ‖·/q‖ Metrics, 517
James' theorem
5. The Other LIL for ‖U_n‖, 526
Mogulskii's theorem
6. Extension to General F, 530

14. Oscillations of the Empirical Process 531


0. Introduction, 531
1. The Oscillation Moduli ω, ω̄, and ω̃ of U and S, 533
The modulus of continuity ω and Levy's theorem; The
modulus ω̄; An exponential bound for ω(a); The Bickel and
Freedman bound on Eω(a); The order of ω̃(a); An exponen-
tial bound for ω̄(a)
2. The Oscillation Moduli of U_n, 542
The modulus of continuity ω_n; The Lipschitz-1/2 modulus ω̄_n;
Stute's theorem on the order of ω_n(a_n) for "regular" a_n;
Behavior of ω_n(a_n) on the boundary sequences; The same
theorems hold for ω̄_n(a_n) and ω̃_n(a_n); The Mason, Wellner,
Shorack exponential bound on ω_n; The associated maximal
inequality of Stute; The martingale {ω_n(a), n ≥ 1};
A proof using the conditional Poisson representation
3. A Modulus of Continuity for the Kiefer Process B_n, 558
The modulus of continuity ω_n of B_n = K(n, ·)/√n; Behavior
of ω_n(a_n) on the upper boundary sequences; A maximal
inequality; The Lipschitz-1/2 modulus ω̄_n of B_n

4. The Modulus of Continuity Again, via the Hungarian
Construction, 567

A partial theorem for "regular" sequences a_n; A full theorem
for a_n on the upper boundary
5. Exponential Inequalities for Poisson Processes, 569
Inequalities for centered Poisson processes; The moduli of
continuity of the centered Poisson process and the Poisson
bridge; Inequalities for the Poisson bridge; Another proof of
Chibisov's theorem; Convergence of rescaled G_n to a Poisson
process
6. The Modulus of Continuity Again, via Poisson
Embedding, 578
The centered Poisson process, the Poisson bridge, and the
final process; Theorems of Section 2 reproved via Poisson
embedding; Summary; A proof using Poisson embedding
7. The Modulus of Continuity of V_n, 581

15. The Uniform Empirical Difference Process D_n = U_n + V_n 584


0. Introduction, 584
1. The Uniform Empirical Difference Process D_n, 584
Definition of D_n; The key picture; Kiefer's theorems; A
simple proof establishing the correct order of magnitude; The
order of D_n/q

2. The Integrated Empirical Difference Process, 594
Vervaat's theorem; The parameters-estimated version

16. The Normalized Uniform Empirical Process Z_n and the Normalized
Uniform Quantile Process 597
0. Introduction, 597
1. Weak Convergence of Z_n, 597
Definition of Z_n and its natural limit Z; Relationships
between Z, S, and the Uhlenbeck process X; Darling and
Erdös' limit theorem for ‖S/√I‖; Jaeschke's analogous
limit theorem for ‖Z_n‖; Eicker's theorem for the quantile
version; Representation of ∫_0^1 Z_n(t) dt as a sum of 0-mean
iid rv's
2. The a.s. Rate of Divergence of ‖Z_n‖, 603
Csáki's theorems; Shorack's theorem
3. Almost Sure Behavior of ‖Z_n‖_{a_n}^{1/2} with a_n ↘ 0, 609
Csörgö and Revesz theorem; Proof of Singh's theorem
4. The a.s. Divergence of the Normalized Quantile Process, 615

17. The Uniform Empirical Process Indexed by Intervals and
Functions 621

0. Introduction, 621
1. Bounds on the Magnitude of ‖U_n/q‖_{T(a,b)}, 621
2. Weak Convergence of U_n in ‖·/q‖ Metrics, 625
3. Indexing by Continuous Functions via Chaining, 630

18. The Standardized Quantile Process Q_n 637


0. Introduction, 637
Summary; Recollection of earlier results for V_n
1. Weak Convergence of the Standardized Quantile Process
Q_n, 638
Convergence in distribution of sample quantiles; The
Hájek-Bickel theorem on weak convergence of Q_n on
[a, b] ⊂ (0, 1); Shorack's theorem on convergence of
Q_n to V; The Csörgö and Revesz condition on f
2. Approximation of Q_n by V_n with Applications, 645
Csörgö and Revesz determination of the rate at which ‖Q_n −
V_n‖ goes to 0; Extension of Kiefer-Bahadur and Finkelstein
theorems to Q_n; Miscellaneous applications; Parzen's
observation on the Csörgö and Revesz condition on f;
Mason's SLLN
3. Asymptotic Theory of the Q-Q Plot, 652
Limiting distribution of Doksum's process; Confidence bands
for G⁻¹ ∘ F − I

4. Weak Convergence ⇒ of the Product-Limit Quantile
Process, 657

19. L-Statistics 660


0. Introduction, 660
1. Statement of the Theorems, 660
Basic idea of the proof; The assumptions; CLT, LIL, SLLN;
Functional CLT and LIL for past and future; Simplifying
the mean; Better rates of embedding for a specific score
function
2. Some Examples of L-statistics, 670
Mean, median, and sign test; The likelihood ratio statistic;
Pitman efficiency; Nonstandard examples

3. Randomly Trimmed and Winsorized Means, 678


Ordinary trimmed and Winsorized means; The metrically
symmetrized and Winsorized mean; Other examples
4. Proofs, 688

20. Rank Statistics 695


0. Linear Rank Statistics, 695
1. The Basic Martingale M_n, 695
Definition of M_n; Convergence of M_n
2. Processes of the Form T_n = ∫ h_n dR_n in the Null Case, 699
Definition of T_n; Convergence of T_n
3. Contiguous Alternatives, 704
Efficiency, asymptotic linearity, and rank estimators
4. The Chernoff and Savage Theorem, 715
5. Some Exercises for Order Statistics and Spacings, 717

21. Spacings 720


0. Introduction, 720
1. Definitions and Distributions of Uniform Spacings, 720
2. Limiting Distributions of Ordered Uniform Spacings, 725
3. Renewal Spacings Processes, 727
Normalized, ordered, and weighted renewal spacings proces-
ses; Convergence to limiting processes
4. Uniform Spacings Processes, 731
Normalized, ordered, and weighted uniform spacings proces-
ses; Convergence in ‖·/q‖ metrics
5. Testing Uniformity with Functions of Spacings, 733
Testing an iid sample; Testing a renewal process for exponen-
tiality; Testing for exponentiality
6. Iterated Logarithms for Spacings, 741

22. Symmetry 743


1. The Empirical Symmetry Process S_n and the Empirical Rank
Symmetry Process R_n, 743
Definitions of the absolute empirical process, the empirical
symmetry process S_n, and the empirical rank symmetry pro-
cess R_n; Identities relating these processes to U_n; Conver-
gence theorems
2. Testing Goodness of Fit for a Symmetric DF, 746
The symmetric estimator F_n** of F satisfies ‖√n(F_n** − F)‖ ⇒
‖U‖/2; Supremum, integral, and components tests
3. The Processes under Contiguity, 751
4. Signed Rank Statistics under Symmetry, 753
Asymptotic normality; Representation of the limiting rv;
Asymptotic normality under contiguous alternatives
5. Estimating an Unknown Point of Symmetry, 757
Estimators based on signed rank statistics; Estimators based
on variants of the Cramer-von Mises statistic
6. Estimating the DF of a Symmetric Distribution with
Unknown Point of Symmetry, 759
Schuster's results; Boos' statistic

23. Further Applications 763


1. Bootstrapping the Empirical Process, 763
2. Smooth Estimates of F, 764
3. The Shorth, 767
4. Convergence of U-Statistic Empirical Processes, 771
5. Reliability and Econometric Functions, 775

24. Large Deviations 781


0. Introduction, 781
1. Bahadur Efficiency, 781
Exact slope; Bahadur's theorem
2. Large Deviations for Supremum Tests of Fit, 783
Large deviations of binomial rv's; The key function g₁²(t) =
−log(t(1 − t)); A version of Abrahamson's theorem
3. The Kullback-Leibler Information Number, 789
Elementary properties
4. The Sanov Problem, 792
Sanov's conclusion; Hoadley's theorem; The Groeneboom,
Oosterhoff, and Ruymgaart extension; Reformulation of Sec-
tion 2 in the spirit of Sanov's conclusion


25. Independent but not Identically Distributed Random Variables 796


0. Introduction, 796
1. Extensions of the DKW Inequality, 796
Bretagnolle's inequality and exponential bound
2. The Generalized Binomial Distribution, 804
Hoeffding's inequalities for the probability distribution of a
sum of independent Bernoulli rv's; Feller's variance
inequality
3. Bounds on F_n, 807
van Zuijlen's inequalities for P(‖F_n/F̄_n‖ ≥ λ) and
P(‖F̄_n/F_n‖ ≥ λ)
4. Convergence of X_n, V_n, and Z_n with respect to ‖·/q‖
Metrics, 809
⇒ of Z_n in ‖·/q‖ metrics; ⇒ of X_n and V_n in ‖·/q‖;
Comparison inequalities; Another natural reduction of the
empirical process; An inequality of Marcus and Zinn; The
Marcus-Zinn exponential bound for the weighted empirical
process Z_n

5. More on L-Statistics, 821
The CLT for L-statistics of independent but not identically
distributed rv's; Stigler's variance comparisons

26. Empirical Measures and Processes for General Spaces 826


0. Introduction, 826
1. Glivenko-Cantelli Theorems via the Vapnik-Chervonenkis
Idea, 827
Vapnik-Chervonenkis classes of sets; The Vapnik-Cher-
vonenkis exponential bound; Bounds on the growth function
m(r); Examples of VC classes of sets; Equivalence of →_p
and →_{a.s.} for D_n
2. Glivenko-Cantelli Theorems via Metric Entropy, 835
Entropy conditions; Pollard's entropy bound for functions;
The Blum-DeHardt Glivenko-Cantelli theorem; The Pol-
lard-Dudley Glivenko-Cantelli theorem
3. Weak and Strong Approximations to the Empirical Process
Z_n, 837
The Gaussian limit process; Functional Donsker and strong-
invariance classes of functions; A general theorem of Dudley
and Philipp; Dudley's CLT; Pollard's CLT

A. Appendix A: Inequalities and Miscellaneous 842

0. Introduction, 842
1. Simple Moment Inequalities, 842
Basic, Markov, Chebyshev, Jensen, Liapunov, C_r, Cauchy-
Schwarz, Minkowski; estimating E|X|
2. Maximal Inequalities for Sums and a Minimal
Inequality, 843
Kolmogorov, Monotone, Hájek-Renyi, Levy, Skorokhod,
Menchoff; Maximal inequality for the Poisson process; Weak
symmetrization, Levy; Mogulskii's minimal inequality; The
continuous monotone inequality of Gill-Wellner
3. Berry-Esseen Inequalities, 848
Berry-Esseen, with generalizations; Cramér's expansion;
Esseen's lemma; Stein's CLT, a special case
4. Exponential Inequalities and Large Deviations, 850
Mill's ratio for normal rv's; Variations on P(S_n/s_n ≥ λ) ≤
exp(−λ²/2); Bennett, Hoeffding, and Bernstein inequalities;
Kolmogorov's exponential bounds; Large deviation theorems
of Chernoff and others; The Poisson example; Properties of
moment-generating functions
5. Moments of Sums, 857
von Bahr inequality; rth mean convergence equivalence;
Burkholder's inequality; Marcinkiewicz and Zygmund
equivalences, with variations; Hornich's inequality
6. Borel-Cantelli Lemmas, 859
Borel-Cantelli, Renyi, and other variations
7. Miscellaneous Inequalities, 860
Events lemma; Bonferroni inequality; Anderson's inequality
8. Miscellaneous Probabilistic Results, 862
Moment convergence; →_p is equivalent to →_{a.s.} on subsequen-
ces; Cramer-Wold device; Formulas for means and covari-
ances; Tail behavior of F when moments exist; Vitali's
theorem; Scheffe's theorem with applications
9. Miscellaneous Deterministic Results, 864
Stirling's formula; Euler's constant; Some implications of
convergence of series and integrals; A discussion of the sub-
sequence n_j = ⟨exp(aj/log j)⟩ used in upper-class proofs;
Integration by parts


10. Martingale Inequalities, 869


Functions of martingales; Doob's inequality with variations;
The Birnbaum-Marshall inequalities; Submartingale conver-
gence theorem
11. Inequalities for Reversed Martingales, 874
Inequalities; Reverse submartingale convergence theorem
12. Inequalities in Higher Dimensions, 876
Wichura's inequality; The Shorack and Smythe inequality
13. Finite-Sampling Inequalities, 878
Hoeffding's bound
14. Inequalities for Processes, 878

B. Appendix B: Martingales and Counting Processes 884


1. Basic Terminology and Definitions, 884
2. Counting Processes and Martingales, 886
Examples; Compensators; Doob-Meyer decomposition for
a counting process N = M + A; The predictable variation
process ⟨M⟩ = ∫(1 − ΔA) dA; Formulas for the com-
pensator A; Continuity of A and quasi-left-continuity of N
3. Stochastic Integrals for Counting Processes, 890
Martingale transform theorems; The predictable variation
process of a martingale transform; Martingale representation
theorem
4. Martingale Inequalities, 892
Lenglart's inequality; Burkholder-Davis-Gundy inequality
5. Rebolledo's Martingale Central Limit Theorem, 894
The ARJ conditions; Relationships among the ARJ condi-
tions; A central limit theorem for local martingales; A central
limit theorem for locally square integrable martingales
6. A Change of Variable Formula and Exponential Semimar-
tingales, 896
The Itô, Doléans-Dade, Meyer formula; The exponential of
a semimartingale; Examples; A useful exponential supermar-
tingale
Errata 901
References 919
Author Index 940
Subject Index 944
List of Tables

CHAPTER 3

3.8.1 Limiting Distribution of the Kolmogorov-Smirnov Statistic 143


3.8.2 Limiting Distribution of the Kuiper Statistic 144
3.8.3 Limiting Distribution of the Renyi Statistic 146
3.8.4 Limiting Distribution of the Cramer-von Mises Statistic 147
3.8.5 Limiting Distribution of the Anderson-Darling Statistic 148
3.8.6 Modification of the Asymptotic Distributions for Finite Sample
Sizes 149
3.8.7 Limiting Distribution of Mallows Statistic 149

CHAPTER 5

5.3.1 Upper Percentage Points for Normalized Components z_j* 218


5.3.2 Percentage Points of B_P = W² − Σ_{j=1}^P (z_j*)²/(j²π²) 219
5.3.3 Asymptotic Powers of Components and W_n², A_n², and U_n² against
Shifts in Normal Mean and Variance in the One-Sided Situation 220
5.3.4 Distribution of the Cramer-von Mises Statistic for n = 1, 2, 3 223
5.6.1 Percentage Points for W², A², U², D⁻, D, and V with Estimated
Parameters 239
5.6.2 Percentage Points of A² and ₚW² 241
5.6.3 Coefficients a_ij for Calculating Components Z_j* of W_n² in Tests
for the Normal Distribution 242
5.6.4 Coefficients b_ij for Calculating Components Z_j* of W_n² in Tests
for Exponentiality 243
5.7.1 Upper Percentage Points for D_SP 249

CHAPTER 9

9.2.1 Finite Sample Distribution of the Kolmogorov Statistic 350


9.3.1 Percentage Points of the df of the Studentized Smirnov
Statistic 359
9.3.2 Percentage Points of the df of the Standardized Smirnov Statistic 361
9.3.3 Coefficients in Ruben's Formula 370
9.3.4 Coefficients of P(D_n ≤ a) for a in the Various Subintervals of
[1/n, 1 − 1/n], n = 3(1)10 371

CHAPTER 22

22.2.1 Limiting Distribution of the Symmetry Statistic T_n 748

CHAPTER 24

24.2.1 f(a, t) = (a + t) log((a + t)/t) + (1 − a − t) log((1 − a − t)/(1 − t)) 783
24.2.2 g₁(a)/S₁(a) and g₂(a)/S₂(a) 784
Preface to the Classics Edition

This edition of our book is the same as the 1986 Wiley edition, but with a long list
of typographical and mathematical errors appended. A somewhat shorter version
of this list has been posted on Wellner's University of Washington Web site for
many years; the current list has resulted from adding all the additional errors of
which we are currently aware.
We owe thanks to N. H. Bingham, Miklos Csörgö, Sandor Csörgö, Kjell Dok-
sum, Peter Gaenssler, Richard Gill, Paul Janssen, Keith Knight, Ivan Mizera, D.
W. Müller, David Pollard, Peter Sasieni, and Ben Winter for pointing out errors,
difficulties, and shortcomings.
Although our book focuses almost entirely on empirical processes of real-
valued random variables, and much progress on general empirical process theory
has been made in the two decades since then, it seems that the book is still valuable
as a reference for many of the one-dimensional results and for the useful collection
of inequalities. We are very pleased to see it reprinted in the SIAM Classics Series.
During the 1970s and 1980s the general theory of empirical processes for
variables with values in an arbitrary sample space X began developing strongly,
thanks to the pioneering work of Dick Dudley, Evarist Gine, Vladimir Koltchin-
skii, Mike Marcus, David Pollard, and Joel Zinn, and this has continued since with
notable contributions from Michel Ledoux, Michel Talagrand, Pascal Massart, Ken
Alexander, and many others. This more general theory is covered by several books
and monographs including van der Vaart and Wellner (1996), de la Pena and Gine
(1999), Dudley (1999), Ledoux and Talagrand (1991), van de Geer (2000), and
Pollard (1984, 1990).
Our book gave statements of 19 "Open Questions." To the best of our knowl-
edge, 11 of these have been solved. We give a list of the problems and the solutions
of which we are aware at the end of the Errata list.

GALEN R. SHORACK JON A. WELLNER


DEPARTMENT OF STATISTICS, Box 354322 DEPARTMENT OF STATISTICS, Box 354322
UNIVERSITY OF WASHINGTON UNIVERSITY OF WASHINGTON
SEATTLE, WA 98195-4322 SEATTLE, WA 98195-4322
GALEN@STAT.WASHINGTON.EDU JAW@STAT.WASHINGTON.EDU

The Uniform Song
There are continuous distributions,
discrete ones too.
Some are heavy tailed,
and some are skew.
There are logistics and chi squares,
but these we will scorn,
'Cause the loveliest of them all
is the Uniform.

Preface

The study of the empirical process and the empirical distribution function is
one of the major continuing themes in the historical development of mathemati-
cal statistics. The applications are manifold, especially since many statistical
procedures can be viewed as functionals on the empirical process and the
behavior of such procedures can be inferred from that of the empirical process
itself. We consider the empirical process per se, as well as applications to tests
of fit, bootstrapping, linear combinations of order statistics, rank tests, spacings,
censored data, and so on.
Many of the classical results for sums of iid rv's have analogs for empirical
processes, and many of these analogs are now available in best possible form.
Thus we concern ourselves with empirical process versions of laws of large
numbers (LLN), central limit theorems (CLT), laws of the iterated logarithm
(LIL), upper-class characterizations, large deviations, exponential bounds,
rates of convergence and orthogonal decompositions with techniques based
on martingales, special constructions of random processes, conditional Poisson
processes, and combinatorial methods.
Good inequalities are a key to strong theorems. In Appendix A we review
many of the classic inequalities of probability theory. Great care has been
taken in the development of inequalities for the empirical process throughout
the text; these are regarded as highly interesting in their own right. Exponential
bounds and maximal inequalities appear at several points.
Because of strong parallels between the empirical process and the partial
sum process, many results for partial sums are also included. Chapter 2 contains
most of these.
Our main concern is with the empirical process for iid rv's, though we also
consider the weighted empirical process of independent rv's in some detail.
We ignore the large literature on mixing rv's, and confine our presentation for
k dimensions and general spaces to an introduction in the final chapter.
We emphasize the special Skorokhod construction of various processes, as
opposed to classic weak convergence, wherever possible. We feel this makes
for simpler and more intuitive proofs. The Hungarian construction is also
considered. It is usually more cumbersome for weak convergence results, since
there is no single limiting Brownian bridge. However, it can be used to provide
strong limit theorems even though the Skorokhod construction cannot.

The book is intended for graduate students and research workers in statistics
and probability. The prerequisite is a standard graduate course in probability
and some exposure to nonparametric statistics. A reasonable number of exer-
cises are included. In some cases we have listed as exercises results from
themes we have not pursued.
The following convention is used for cross-referencing theorems,
inequalities, remarks, and so on: Theorem 1 in Section 2 of Chapter 3, for
example, will be referred to as simply Theorem 1 only within Section 2, but
will be referred to as Theorem 3.2.1 everywhere else.
We have had fruitful discussions with many individuals concerning topics
in this book. It gives us great pleasure to thank Peter Gaenssler, Richard Gill,
Jack Hall, David Mason, David Pollard, and Winfried Stute in particular.
Reactions from students and colleagues who took a Fall 1983 course by
Shorack were also helpful; comments from Sue Leurgans and Barbara
McKnight led to improvement of parts of Chapters 3 and 4.
We owe special thanks to Tina Victa, Linda Chapel, and DeAnne Carr for
their expert typing of the manuscript and its several revisions, and to Ilze
Shubert and Anne Sheahan for preparation of the tables and figures.
Finally, much of the research for writing of this book has been supported
by grants from the National Science Foundation.

GALEN R. SHORACK
JON A. WELLNER

Seattle, Washington
October 1985
Acknowledgments and
Sources of Tables

We are grateful to the following publishers and organizations for permission


to reproduce, in part or whole, the tables included in the text as indicated
below.
Addison-Wesley Publishing Company for Tables 3.8.1, 3.8.2, and 9.2.1
reproduced from Donald B. Owen (1962), Handbook of Statistical Tables,
Addison-Wesley, Reading, Massachusetts, pages 440, 442, 444, 424-425, and
427-430. Reprinted with permission.
Table 9.3.1 is reprinted from H. Ruben (1976), On the evaluation of Steck's
determinant for the rectangle probabilities of uniform order statistics, Commun.
Statist. A, 1, 535-543, by courtesy of Marcel Dekker, Inc.
The American Statistical Association for Table 3.8.5 from T. W. Anderson
and D. A. Darling (1954), A test of goodness-of-fit, J. Am. Statist. Assoc., 49,
765-769; Table 5.6.1 from M. A. Stephens (1974), EDF statistics for goodness
of fit and some comparisons, J. Am. Statist. Assoc., 69, 730-737; Figures 5.7.1(a)
and 5.7.1(b) from R. Iman (1982), Graphs for use with the Lilliefors test for
normal and exponential distributions, Am. Statist., 36, 109-112; and Table
9.2.1 from Z. W. Birnbaum (1952), Numerical tabulation of the distribution
of Kolmogorov's statistic for finite sample size, J. Am. Statist. Assoc., 47,
425-441.
The Institute of Mathematical Statistics for Table 3.8.1 from N. Smirnov
(1948), Table for estimating the goodness of fit of empirical distributions, Ann.
Math. Statist., 19, 279-281; Table 3.8.4 from T. W. Anderson and D. A. Darling
(1952), Asymptotic theory of certain "goodness-of-fit" criteria, Ann. Math.
Statist., 23, 193-212; Table 3.8.7 from B. McK. Johnson and T. Killeen (1983),
An explicit formula for the L₁-norm of the Brownian bridge, Ann. Prob., 11,
807-808; Table 5.3.4 from A. W. Marshall (1958), The small sample distribution
of nω_n², Ann. Math. Statist., 29, 307-309; Table 5.6.1 from M. A. Stephens
(1976), Asymptotic results for goodness-of-fit statistics with unknown
parameters, Ann. Statist., 4, 357-369; and Tables 24.2.1 and 24.2.2 from
P. Groeneboom and G. R. Shorack (1981), Large deviations of goodness
of fit statistics and linear combinations of order statistics, Ann. Prob., 9,
971-987.

The Biometrika Trustees for permission to reprint portions of Table 54 on


page 359 from Biometrika Tables for Statisticians, Vol. 2, Third Edition (1966),
as our Table 3.8.6, as part of our Table 5.6.1, and as Table 3.8.1; Table 5.6.1
from M. A. Stephens (1977), Goodness of fit for the extreme value distribution,
Biometrika, 64, 583-588; Table 5.6.2 from A. Pettit (1977), Tests for the
exponential distribution with censored data using Cramer-von Mises statistics,
Biometrika, 64, 629-632; Table 5.7.1 and Figure 5.7.2 from J. Michael (1983),
The stabilized probability plot, Biometrika, 70, 11-17; Figure 18.3.1 from K. A.
Doksum and G. L. Sievers (1976), Plotting with confidence: Graphical com-
parisons of two populations, Biometrika, 63, 421-434.
The Royal Statistical Society for Tables 5.3.1, 5.3.2, and 5.3.3 from J. Durbin
and M. Knott (1972), Components of Cramer-von Mises statistics, I., J. Roy.
Statist. Soc. Ser. B, 34, 290-307, and Table 5.6.1 from J. Durbin, M. Knott,
and C. Taylor (1975), Components of Cramer-von Mises statistics, II, J. Roy.
Statist. Soc. Ser. B, 37, 216-237.
The Society for Industrial and Applied Mathematics for Tables 9.3.3 and
9.3.4 which are reprinted with permission from V. F. Kotel'nikova and E. V.
Chmaladze (1983), On computing the probability of an empirical process not
crossing a curvilinear boundary, Theory Prob. Appl. (English transl.), 27 (3),
640-648; copyright 1983 by Society for Industrial and Applied Mathematics.
All rights reserved. Table 22.2.1 which is reprinted with permission from
A. Orlov (1972), On testing the symmetry of distributions, Theory Prob. Appl.
(English transl.), 17 (2), 357-361; copyright 1972 by Society for Industrial and
Applied Mathematics, all rights reserved.

G.R.S.
J.A.W.
List of Special Symbols

1_A The indicator function of the set A
I The identity function on [0, 1], or on the line
X ≅ Y Means X and Y have the same distribution
X ≅ (µ, σ²) Means X has mean µ and variance σ²
X ≅ F(µ, σ²) Means X has df F, mean µ, and variance σ²
(Ω, 𝒜, P) The underlying probability space
ω A typical point in Ω
[ ] Events A ∈ 𝒜 are described between such brackets
⟨ ⟩ The greatest integer function
( , ) Inner-product notation

‖ ‖ The supremum norm
‖[ ]‖ The ℒ_2 norm
f^# Denotes, simultaneously, any one of f⁺, f⁻, or |f|
(C, 𝒞) See Section 2.1
(D, 𝒟) See Section 2.1
⇒ Weak convergence; see Section 2.3 for definition
⇝ Relative compactness; see Section 2.8 for definition
ξ₁, ξ₂, ... Independent Uniform (0, 1) rv's
The special independent Uniform (0, 1) rv's of Theorem 3.1.1
ξ_{n:1}, ..., ξ_{n:n} The ordered values of either ξ₁, ..., ξ_n or the special rv's
X_n = Y_n ⊕_p Means X_n − Y_n →_p 0 as n → ∞

a = b ⊕ c Means |a − b| ≤ c


→_f.d. Convergence of the finite-dimensional distributions


U_n(s, t] The increment U_n(t) − U_n(s) of the process U_n
ν(s, t) The increment ν(t) − ν(s) of the measure ν
u.a.n. Uniformly asymptotically negligible: constants c_{ni} satisfy
    max{c_{ni}²/Σ_{j=1}^n c_{nj}² : 1 ≤ i ≤ n} → 0 as n → ∞
i.o. Infinitely often
Q, Q* See Section 11.1
ψ Usually, ψ(λ) = 2h(1 + λ)/λ² with h(λ) = λ(log λ − 1) + 1;
    see Section 11.1
CLT Central limit theorem
WLLN, SLLN Weak and strong laws of large numbers
LIL Law of the iterated logarithm

INTERDEPENDENCE TABLE

[A flowchart here shows how Sections 2.1-2.4 and 2.5-2.11, the Appendix, and
Chapters 3-26 depend on one another; the chart itself is not recoverable from
this copy.]
CHAPTER 1

Introduction and
Survey of Results

1. DEFINITION OF THE EMPIRICAL PROCESS AND THE


INVERSE TRANSFORMATION

Let X_1, X_2, ... be independent random variables (rv's) with distribution func-
tion (df) F. The random df

(1)    F_n(x) ≡ (1/n) Σ_{i=1}^n 1_{(−∞,x]}(X_i)    for −∞ < x < ∞

which assigns mass 1/n to each data value X_i is called the empirical df of
X_1, ..., X_n; here 1_A denotes the indicator function of the set A. We will see
in Section 3.1 that even though F may be unknown, it can be accurately
estimated in that

(2)    sup_{−∞<x<∞} |F_n(x) − F(x)| →_{a.s.} 0    as n → ∞.

This fact has been called† "the existence theorem for statistics as a branch of
applied mathematics" and also‡ "the fundamental theorem of statistics." It
implies that the whole unknown probabilistic structure of the sequence can,
with certainty, be discovered from the data. Thus the study of the empirical
df F_n has been an important and continuing theme in statistical literature.
The natural normalization of F_n leads to the empirical process defined by

(3)    √n[F_n(x) − F(x)]    for −∞ < x < ∞.

† Pitman, E. (1979).
‡ Loève, M. (1955).
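To make (1) and (2) concrete, here is a small simulation sketch in Python (ours,
not the book's; NumPy and SciPy are assumed available) that computes the
Kolmogorov-Smirnov distance sup_x |F_n(x) − F(x)| for standard normal samples
and watches it shrink as n grows.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    def ks_distance(x, cdf):
        """sup_x |F_n(x) - F(x)|; it suffices to check the jump points of F_n."""
        x = np.sort(x)
        n = len(x)
        F = cdf(x)
        # F_n jumps from (i-1)/n to i/n at the i-th order statistic, so the
        # supremum is attained at (or just before) one of these jumps.
        return max(np.max(np.arange(1, n + 1) / n - F),
                   np.max(F - np.arange(0, n) / n))

    for n in [10, 100, 1000, 10000]:
        d = ks_distance(rng.standard_normal(n), norm.cdf)
        print(f"n = {n:5d}   sup|F_n - F| = {d:.4f}")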


Example 1. (Classic sums of iid rv's) Let us briefly review a bit of standard
classical probability and statistics. Let X, X_1, X_2, ... be independent and
identically distributed (iid) where

X ≅ (µ, σ²);

that is, the distribution of X is taken to have mean µ and variance σ². We let

X̄_n ≡ (1/n) Σ_{i=1}^n X_i.

Then the strong law of large numbers (SLLN) states that

(4)    X̄_n →_{a.s.} µ    as n → ∞

(this actually holds with no assumption about the variance of X). Thus the
mean can be accurately estimated from the data with increasing certainty as
n → ∞.
An assessment of the accuracy of this approximation is provided by the
central limit theorem (CLT) which states that

(5)    √n(X̄_n − µ) →_d N(0, σ²)    or    √n(X̄_n − µ)/σ →_d N(0, 1)    as n → ∞.

The convergence in (5) is known to be rather rapid when E|X|³ < ∞, in that
we then have the Berry-Esseen result

(6)    sup_x |P(√n(X̄_n − µ)/σ ≤ x) − Φ(x)| ≤ c E|X − µ|³/(σ³√n)

for a universal constant c, where Φ denotes the N(0, 1) df.


One of the most useful results of probability theory is the Mann-Wald
theorem which, when applied to (5), states that

(7)    g(√n(X̄_n − µ)/σ) →_d g(N(0, 1))    as n → ∞

for any continuous function g. Thus, for a trivial example, n(X̄_n − µ)²/σ² →_d χ_1²
as n → ∞.
There are other theorems that also assess the rate of convergence of X̄_n − µ
to zero. For example, the law of the iterated logarithm (LIL) states that

(8)    √n(X̄_n − µ)/b_n ⇝ [−σ, σ]    a.s. as n → ∞

for b_n ≡ √(2 log₂ n); ⇝ means that for almost every ω in the set Ω of the basic
probability space (Ω, 𝒜, P) on which the rv's are defined, the sequence of real
numbers √n(X̄_n(ω) − µ)/b_n has as its set of limit points exactly the interval
[−σ, σ]. The "other LIL" gives

(9)    lim inf_{n→∞} max_{1≤k≤n} |(1/√n) Σ_{i=1}^k (X_i − µ)| b_n = σπ/2    a.s.;

we refer the reader to Jain and Pruitt (1975) for an updating of this theme
begun with Chung (1948).
Another result on the convergence of X n to 0 is provided by Chernoff's
(1952) large deviation result, for which we refer the reader to Bahadur (1971).
If X 1 , X2 ,... are arbitrary lid rv's, then

(10) n 'log P(Xn ?0)-* logg


- as n-^o0

where

p = inf M(t) with M(t) = E exp (tX) = moment generating function.

Moreover, if X satisfies the "standard conditions" that

(11) 0<t*=sup{t: M(t)<oo}moo, M'(0+)<0,


and M'(t) > 0 for some 0< t < t*,

then we have the stronger result that

P n bn
(12) P(Xn >_ 0) = — ^ where 0 < lim b n : lim bn < oo.

This ends Example 1. U

The study of the empirical df will amplify on the themes of Example 1,


and is greatly simplified if we take advantage of the following fact.

Theorem 1. (The inverse transformation) Let = Uniform (0, 1). For a fixed
df F, define its left continuous inverse by

(13) F '(t)=inf{x: F(x)>_t}


- for 0<t<I.

Then the ry X = F '(f) has df F; that is,


-

(14) X=F-'(6)=—F.

4 INTRODUCTION AND SURVEY OF RESULTS

In fact,

(15) [X^x]=[esF(x)]

or

(15') 1[X X]= 1[£_F(X)] for —oo<x<oo.

Also, for a.e. o we have

(15 ") 1[X <x] =1[r<F(X_)] for all —oo < x <oo.

Proof. Now 6< F(x) implies X = F '(6) <x by (13). If X = F '(6) <_ x, then
- -

F(x + e) >_ e for all e > 0; so that right continuity of F implies F(x) - 6 (see
Häjek and Sidäk, 1967). We have thus shown that (15) holds; that is, the
events are equal. [We only need P(ec (0, 1]) = 1 for this proof, so that the
result can be generalized to other than a Uniform (0, 1) ry f. ]
If e= t where F(x) = t is satisfied by no x's or exactly one x, then (15 ")
holds. If = t where F(x) = t is satisfied by at least two distinct x's, then (15 ")
fails. O

We now indicate how Theorem 1 can simplify our investigation of the


empirical df F. and the empirical process ./n [F„ — F].
Let , ... , f„ denote independent Uniform (0, 1) rv's, and let G„(t) and
U,,( t) for 0< t <_ I denote the empirical df and empirical process of these rv's.
Thus

(16) G„(t) =— t l t and


€ U„(t)=.W[G„(t) —t] for 0 <_t^1.
n

Theorem 2. The sequences of random functions F,, and G„(F) on ( —m, oo)
have identical probabilistic behavior; we denote this by writing

(17) F„ =G„(F) jointly in n.

Likewise,

(18) ✓n [F„ — F] = U„(F) jointly in n.

[ We will improve on this in (3.2.41) where = as. will replace = .]


Proof. For any fixed k and x, < • • ' < x k , the rv's

(19) (Fn(x1), ... , fn(xk)) = (G,(F(x1)), ... , Gn(F(xk));

that is, these have identical joint distributions. This is clear since nF „(x ; ) and
DEFINITION OF THE EMPIRICAL PROCESS 5

nG„(F(x; )) both have Binomial (n, F(x ; )) distributions; the vectors in (19)
have identical "cumulative multinomial" distributions. Clearly, the joint
behavior in n is likewise identical. ❑

The fundamental simplification of our investigation is provided by the above


theorem. It says that we can reduce our probabilistic investigation to the uniform
random functions G,, and U„. The probabilistic behavior of the general random
functions follows by inserting the function F in a deterministic fashion into
the argument of these random functions. In fact, (15) allows us to claim the
following result which is basic to what follows.

Theorem 3. If we begin with independent Uniform (0, 1) rv's f, , ... , C. and


define X, , ... , X. via X, = F'(),
), then

(20) F,=G„(F) and ^(0=„—F)=UJ„(F) when X; =F '(e; ).


-

holds simultaneously for all n.

This completes the main thrust of this section. However, we will now list
several results in the spirit of Theorem 1 that will have some usefulness later.

Notes on Transformed rv's

Note from (15) that for any df F and any 0< t <1

(21) F(x) >_ t if and only if F - '(t) ^ x,

(22) F(x) <1 if and only ifF(t)>x,


(23) F(x,) < t <_ F(x 2 ) if and only if x, < F(t) <x 2 .

Proposition 1. For any df F we have


(24) FoF '(t)>_t
- forall0<_t<-1;

with equality failing if and only if t is not in the range of F on [—oz, cc]. Thus
F - F ' is the identity function for continuous df's F. See Figure 1 c.
-

Proof. Let x = F(t) in (21). Just consider separately is in and not in the
range of F. ❑

Proposition 2. (Probability integral transformation) If X has df F, then


(25) P(F(X)<t)s t for all0^ t 1,

with equality failing if and only if t is not in the closure of the range of F.
Thus if F is continuous, then e= F(X) is Uniform (0, 1). See Figure 2c.
6 INTRODUCTION AND SURVEY OF RESULTS

F -1 -F
z
(a)

F -1oF (x) = x fails if


and only if F (x) = F (x -e)
for some t > 0 .

-
F '°F(X) = ar. X

(b)

FoF -1(t) = t holds if


and only if t is in
the range of F on
F, F -'

os - -t
0
(0)

• Figure 1. F, F - ' F, and F o F - '

Exercise 1. By considering the two kinds of is separately, prove


Proposition 2.

Proposition 3. For any df F we have

(26) F - ' o F(x) <_ x for all —oo < x < m;


with equality failing if and only if F(x — E) = F(x) for some e > 0. Thus

(27) P(F 'oF(X)34X) =0


-

where X denotes any ry with df F. See Figure lb.


DEFINITION OF THE EMPIRICAL PROCESS 7

1 •x
(a)

The set of x 's where [X 5 x] ^ [F(X) S F(x)] is shown below.


Note that [X s x] c [F(X) s F(x)], while
P(jc: if F(X(cj)) 5 F(x) then X(c^) S xj) = 1.

0.— S S

(b)

P(F(X) 5 t)

5-

0 V
0 1
(e)
Figure 2.

Proof. Let t = F(x) in (21). Just consider the two kinds of x's separately.

Exercise 2. Show that F ' o F o F"' = F ', F o F ' o F = F, and (F ') ' = F.
- - - - -

Also show that H ° F ' o F = H if the measure of the df H is absolutely


-

continuous with respect to the measure of F.


Proposition 4. If Fis a continuous df and F(X) = Uniform (0, 1), then X = F.
Proof. Now for f ° F(X) we have
P(X _ x) < P(F(X) 5 F(x)) = P(e <_ F(x)) = F(x) = P(f < F(x))
= P(F '(^) <x)
- by Theorem 1.1.1
= P(F ' ° F(X) x)
-

(a) = P(X <_ x) unless F(x — e) = F(x) for some 6>0



8 INTRODUCTION AND SURVEY OF RESULTS

since F - ' o F(X) = X unless F(X — e) = F(X) for some e >0 by Proposition
3. Thus (a) implies

(b) P(X <_ x) = F(x) unless F(x — e) = F(x) for some e > 0.

Since F is continuous, (b) implies X = F.

Proposition 5. (i) F is continuous if and only if F ' is strictly increasing and


-

(ii) F is strictly increasing if and only if F ' is continuous.


-

Proof. Easy. ❑

0 --

—> x
(a) F`' -F

F ;I -F(x) = x fails
if and only if F(x) = F(x—)
for some s > 0 .

(b)

ii

of
0 (o) 1 t

Figure 3. F, F +'o F, and F' F'.


DEFINITION OF THE EMPIRICAL PROCESS 9
Proposition 6. If F has a positive continuous density in a neighborhood of
F '(t) where 0< t< 1, then (d /dt)F '(t) exists and equals 1 /f(F '(t)).
- - -

Proof. For sufficiently small h we have

F '(t+h)
- — F '(t)
- k
h F(x +k) —F(x)

where x = F ' (t) and where h-0 implies k -' 0.


- ❑

Several of the theorems of this section are recorded in Häjek and Sidäk
(1967, pp. 33-34, 56). See also Withers (1976) and Parzen (1980) among what
must be a multitude of sources.

Exercise 3. (Parzen, 1980) If g is monotone and left continuous, then


Fg(, ) =g(FX') for g>' and Fg(X) =g(FX (1—•)) for g\ and F continuous.

Exercise 4. Let F+'(t) = sup {x: F(x) <_ t }. Show that F (f) = F for a Uni-
form (0, 1) ry 6. Determine when F o F +' (t) = t fails. See Figure 3. Show that
for a.e. w we have

[e <F(x—)]=[F+'(e)<x]= [F '(e) <x] - for all x.

The Elementary Form of Skorokhod's Theorem

The Skorokhod-Wichura-Dudley theorem (Theorem 2.3.4) is basic to much


of our approach. We now illustrate its simplest special case; in this case a
simple constructive proof is possible.
Let F1 , F2 ,... and F0 denote df's such that

(28) F„ -* d Fo as n - oc.

Define rv's X by *,
(29) Xn*= F„'(4) forn?0

where 6 is a fixed Uniform (0, 1) rv; then

(30) X = F„

by Theorem 1. Moreover, and this is the fundamental result:

Theorem 4. (Elementary Skorokhod theorem)

(31) X* -'as. Xö as n oo.


10 INTRODUCTION AND SURVEY OF RESULTS

Proof. Fix 0< t < 1. Let s > 0.be given. Choose x such that

(a) F-'(t) -e <x <F"'(t)

with F continuous at x. The second inequality of (a) gives F(x) < t. Thus
F,, (x) < t for all n exceeding some N, and thus F„' (t) > x for all n ? N. Hence
lim F(t) ? x> F '(t) - e by the first inequality of (a), so that
-

(b) lim F„'(t)>_F '(t)


- forall0 <t<1.
w
-CO

Now let t'> t and choose y so that

(c) F-' (t') < y < F ' (t') + e


-

with F continuous at y. Thus t < t' < Fe F'(t') < F(y) by the first half of (c),
where Proposition 1 was used for the first :. Thus t <_ Fn (y) for all n exceeding
some other N; hence F„'(t) <_ y for all n > N. Thus lim F-,' (t)<y < F ' (t') + e
-

by the second half of (c), implying lim F(t) - F ' (t'). Thus
-

(d) lim F„'(t)<_F '(t)


-

n- co

provided F ' is continuous at t. Combining (b) and (d) shows that


-

(32) lim F 1 ( t) = F '(t)


- at all continuity points t of F '.
-

Since F ' is 7, it is continuous a.s. Thus (32) is just a statement that (31)
-

holds [recall (29)]. This proof can also be found in Billingsley (1979). ❑

Exercise 5. (Parzen, 1980) We say that X. converges in quantile to X if


F„'(t) -+ F '(t) as n -+ oo for each continuity point t of F ' in (0, 1). Show
- -

that convergence in quantile is equivalent to convergence in distribution.

Exercise 6. (Mann-Wald theorem) Suppose X. 4 d X0 as n - ao and +1 is

continuous except on a measurable set 0 for which P(X0 E A) = 0. Then


.fi(Xn ) -* d cu (X0) as n -> oo. Hint: Let X * = F „'(e) for all n >0 where X. = F.
Show that çtr(X*) a.s. 4V(Xö).

2. SURVEY OF RESULTS FOR ICU II

In the remainder of this chapter we will survey some of the main results in
this book. This should give the flavor of what is to follow.

SURVEY OF RESULTS FOR JtU„iI 11

We begin by recalling that nG,,(t) is a sum of n lid Bernoulli (t) rv's. Thus
Example 1.1.1 implies

(1) G(t) a.s . t as n-^oo by (1.1.4),


(2) U (t) -^ d N(0, t(1—t)) asn -oo for0<t<1 by (1.1.5),

(3) IP[U (t)/ t(1 — t) <_x]—«(x)I <_ 24/V


forallxifl /n <t1 -1/n by(1.1.6),
(4) U(t)/bnv.► [— t(1—t) , t(1—t)] a.s. wrtl

as n -* oo by (1.1.8 ),

(5) n -' log P(G n (t)—t?a)-+—g(a, t) as n->cc by(1.1.10),

where

a +t+(1 l—a —t
(a+t)log —a — t)log if0:a^1 —t
(6) g(a,t)= t 1—t
[ cc ifa>1 —t.

Exercise 1. Verify that (3) follows from (1.1.6) [compute EU (t)].

Exercise 2. Verify that (5) follows from (1.1.10).

From Chapter 3

We now turn to the idea of uniformity in t, which we first measure by II II. A


celebrated result is the Glivenko (1933) and Cantelli (1933) SLLN which
establishes that

(7) {IGn — 1 1{ a.s. 0 as n -* co.

From Chapters 3 and 9

A fundamental result useful in establishing the order of this convergence was


obtained by Smirnov (1944) and Birnbaum and Tingey (1951) ; they showed that

(nit—A)) n ) ( ,+ i t
( 8 ) P(II(G —I)*II>A)=Eo A ( —A--/
i n/ \1
for0<A <1.

From such an equation Smirnov showed that

(9) P(^IU^JI>A)- exp( -2,1 2 ) asn-*ooforallA>0.


12 INTRODUCTION AND SURVEY OF RESULTS

Also, Kolmogorov (1933) showed that

(10) P(I!v „lIzA)-+2 Y exp(-2k 2 A 2 ) asn-*ooforallA>0.


k=1

More will be said of exact distributions below.

From Chapters 9 and 13

Smirnov was able to use his formula to establish the LIL

(11) lim jjv„II/b„=2 a.s. asn-oo.

Independently, Chung (1949) used other methods to characterize the upper-


class sequences by showing that if A„ I' then
°° z <m
(12) P(II v„ II ? A„ i.o.) = j according as n exp (-2A „) _

The key to a clean modern proof of such strong results is an exponential


bound. This was obtained by Dvoretzky, Kiefer, and Wolfowitz (1956), who
manipulated (8) to obtain the result

(13) P( aJ„I ?,l)<_21'(llv „II >_d)s58 exp (-2A 2 ) for allA>0.

In view of (9), the factor 2A 2 in the exponent of (13) is the best possible rate.
Mogulskii (1980) obtained the other LIL by showing that

(14) lim b „jIU „Il = it /2 a.s.


n-00

The exact probability (8) converges to the limiting value (9). Is a more
accurate approximation possible? Asymptotic expansions by Chan Li-Tsian
recorded in Gnedenko et al. (1961) show

P(IIU 11aA)=exp(-2A2) 1+3 ✓ n+3n (1-23 )


(15) + 943 Z (5- 1952
+234 +0\n2

for 0<A < 0(n't 6 ). Also, if 2A „/ log e n >' oo, then

P(A) = P(IIU m 1I? A m for some m n)


(16) =exp (-2A2,[1 +0(1)]) if A„ = 0(n"6).
RESULTS FOR THE RANDOM FUNCTIONS G. AND U. ON [0, 11 13
From Chapter 24

A large deviation result for IIG„ — I jI was provided by Abrahamson (1967),


who showed that

(17) n - ' log P(II(C — I) # Ilia) -+ — g(a)= — infg(a,t)


I

with g(a, t) defined in (6). Moreover,

(18) g(a)/2a 2 - 1 as a J.O.

From Chapters 12 and 18

All of the results for IIQJ„II in this section carry over to the uniform quantile
process N„ ° n [G V I] because of the fact, obvious from Figure 3.1.1, that
(19) IIV H = lIU„ II•
The quantile process is dealt with systematically in Chapters 12 and 18.

3. RESULTS FOR THE RANDOM FUNCTIONS G.


AND U. ON [0, 11

From Chapter 9

We begin by considering the exact probability that G. lies between two >'
curves g h having g(0) 0 <_ h(0) and g(1) < 1 <_ h(1). Steck (1971) estab-
lished that

_G„(t) <_h(t) for all0 t-1) = P(a <_ ,<_b for l<_isn)
P ( g ( t )< ; ;

(1) =n!det[(b;—a;)+ 1 /(j — i+1)!],

where

a,=inf {t: h(t)?i /n }, b sup {t: g(t) <_(i— 1)/n },

(x) + x v 0, and where it is understood that all elements of the matrix having
subscripts i > j + 1 are zero. Even more useful than (1) are the recursion
relations found in Section 9.3.
For various special cases, (1) can be evaluated exactly. An especially
interesting result due to Daniels (1945) is

(2) P(IIG„/III>_a) =1 /A for all A> 1 and all n ? 1.


14 INTRODUCTION AND SURVEY OF RESULTS

From Chapter 3

We now turn to a generalization of the Mann-Wald theorem. Letting


mean that the finite-dimensional distributions of the process on the left con-
verge to those of the process on the right, it is a minor exercise to show that

(3) Un -'fd. U as n - x

for a Brownian bridge U (see Section 2.2 for the definition of U). However,
this mode of convergence is not strong enough to yield the Mann-Wald
theorem; that is, it does not follow from (3) that h(U) -> d h(U) for ^1 ^^-
continuous functions h. The concept of weak convergence, =., was designed
to fill this need (we leave the precise definition of = until Chapter 2). In a
landmark paper, Doob (1949) suggested heuristically that

(4) U = U„ as n -^ 00,

in a sense that carried with it the implication that

(5) h(U) h(U) as n -> oo for all h that are II 11- continuous a.s. U.

Doob's conjecture had been guided by his observation, based on earlier work
of Bachelier, that

(6) P(lIU}lI?A)=exp( -21 2 ) for all A>0

and

(7) P(IIUII ? A) = 2 E exp (-2k 2 ,i 2 ) for all A> 0,


k=1

as (2.2.9), (2.2.10), and (4) would imply.

From Chapter 5

Once (4) was established, it was trivial to show results such as

(8)
0 f
U ,(t) dt -> d U Z (t) dt
o ,
as n -+ ao;

just note that h(f) = J'o f2 ( t) dt is Il- continuous. The trick is to determine
the distribution of h(U); the solution of this problem for the h in (8) leads to
some particularly fruitful methodology. This is explored in the next few
paragraphs (see Kac and Siegert, 1947).
RESULTS FOR THE RANDOM FUNCTIONS 6,, AND U. ON [0, 1] 15

The covariance function Ku(s, t) = s A t — St of U can be decomposed into


the form

(9) Ku(s, t) = S A t — st = Y_ A;f; (s)fj (t) for 0: s, t 1,


=1

where

(10) A; = (jTr) -2 and f1 (t) = sin (jTrt) forj = 1, 2, .. .

are eigenvalues and (orthonormal) eigenfunctions of K v defined by the


relationship

(11)E f(s)Ku (s,t)ds=Af(t) for0:t^1

and where the series (9) converges uniformly and absolutely. (This is done
via Mercer's theorem, which is the natural analog of the principal axis theorem
for covariance matrices.) Then the principal component rv's

(12) Z; = J f(t)U(t) dt are such that Z* = Z; / are iid N(0, 1).

Moreover, we have the representation


00
(13)
!= 1

Given the representation (13), it is now clear that


C l oo Cl

(14)
J o
QJ Z (t) dt =
i =t
Z; J f,?(t) dt =
o

X; / (j1r) Z where X; , x , .. . are iid chi-square (1) rv's.


=1

Durbin and Knott (1972) applied the same approach to U, to obtain the
representation
m
(15) Y_ Zj(t)-[v„(t)+U8(t—)]/2 asm-^co for each0^t<_1,

for each w where

(16) {Z„ : j 1} are uncorrelated identically distributed (0, 1) rv's


16 INTRODUCTION AND SURVEY OF RESULTS

that are "nearly normal" for n of about 20. They further indicated how these
variance normed principal components Z„ can be used in a fashion analogous
to tests of fit. Further,

( 17.) JU (t)dl= j=1 A1ZRj.


0

From Chapters 2, 3, and 12

In many ways the concept of weak convergence is a rather inconvenient


one to work with. Technical manipulations became easier to deal with after
Skorokhod (1956) and Komlös, Major, and Tusnady (1975) introduced their
constructions. Thus Skorokhod effectively constructed a triangular array
{ „;, 1< i<_ n, n ? 1} of row-independent Uniform (0, 1) rv's and a Brownian
bridge U, all on a common probability space, that satisfy

(18) 11U n —U a.s. 0 fora special construction;


here U is the empirical process of fn I f„„. Since it is trivial from (18)
, ... ,

that h(Skorokhod's U) -> a. h(U) for any 11 1I- continuous functional h, and
,.

since h(Skorokhod's U) = h(any U„), one obtains immediately from (18) the
result (5) that h(any U„) > d h(U). So far, (18) has only provided an alternative
proof of (5). In what way is it really superior to (4)? First, it can be understood
and taught more easily than (4). Second, it is often possible to show that
h(Skorokhod's U„), or even h(Skorokhod's v„), a . s . h(U) and to thereby
establish the necessary 11 fl- continuity of h in a fashion difficult or impossible
to discover from (4). (Examples will be seen in the chapters on linear combina-
tions of order statistics and rank statistics.) Given that Skorokhod's construc-
tion is based on a triangular array, we know absolutely nothing about the joint
distribution of Skorokhod's (U 1 , U2,...). Thus his construction can be used
to infer d or -' of h(any U „), but it is helpless and worthless for showing a.s_
The Hungarian construction (begun in Csörgö and Revesz, 1975a) and
fundamentally strengthened by Komlös et al. 1975), improves Skorokhod's
construction in that it only uses a single sequence of Uniform (0, 1) rv's and
a Kiefer process K (see Section 2.2 for the definition of the Kiefer process)
on a common probability space that satisfy

.fi n
(19) —B,, <x a.s. for the Hungarian construction;
li (log n) 2 JJU„

here U,, is the empirical process of 6 1 , ... , „ and

(20) B„ = K(n, • )/.Ii is a Brownian bridge

as in (2.2.11). Since h(I3„) = h(U), this construction also yields (5). It is also
CONVERGENCE OF U,, IN OTHER METRICS 17

capable of yielding -* 8.5. for the original sequence, though the subscript n on
B. may make the problem difficult. Its real value is in the rate it establishes.

From Chapter 13

We now turn to LIL-type results. Finkelstein (1971) showed that (see Section
2.8 for the definition of relative compactness M ►)

(21) U„/ b„ a.s. wrt II II as n oc,

where b„ = I2 log e n and Ye= {h: h is absolutely continuous on [0, 1] with


h(0) = h(1) = 0 and I [h'(t)] 2 dt <_ 1}. It is a minor consequence of this that
a.s. wrt ( 1

(22)
J o U(t) dt/b
{J
h 2 (t) dt: h E gel =[0, 1/]
as n - oo;

thus establishing a LIL for the Cramer-von Mises statistics J U (t) dt; of
course, the upper bound in (22) was provided by solving a calculus-of-
variations problem. Cassels (1951) showed that for E >0 and a.e. m there exists
an NE such that for all n > NE.w the increments of U,, satisfy

(23) Iv„(s,t]I/b „^ (t—s)[1—(t—s)]+e for all 0:s<t<_1.

Our intuition is improved if we note, on the one hand, that nG„(s, t]


Binomial (n, t — s) and if we know, on the other hand, that

(24) [h(t)—h(s)] 2 <_(t—s)[1—(t—s)] for all0<_s<t<_ 1 andall hE

4. CONVERGENCE OF U,, IN OTHER METRICS

From Chapters 3 and 11

Treatment of goodness-of-fit statistics often reduces to consideration of

U„(t)+P(t) dt,
(1) J0

say. Based on the construction (1.3.18) it is tempting to write

ci dt — J va/r dt
(2)
fo v ,
n
0
- II(vn — U)/9Il f q^G dt.
0
18 INTRODUCTION AND SURVEY OF RESULTS

Then if 1,giIrdt<oo and if (1.3.18) can be strengthened to U a.s.11 in the


II / q II -metric in the sense that II (v„ — v)/ q (I -a.s. 0 for special processes as in
(1.3.18), then we can conclude that I' U 4i dt d J'o U 4i dt. Proving II 1q11 -
convergence can lead naturally to either a type of Häjek-Renyi inequality or
a type of upper-class integral test; the former are simpler and we give only
one of these here as an example. Let q ? 0 be / and continuous. Then

(3) P(IIU„/qlI ^A) E [q(t)] - 2 dt/A Z for allA>0


e

where 0< 0 <_ Z ; moreover, we may replace U. by U. This is a slight improve-


ment on an inequality of Pyke and Shorack (1968). The "right” inequality of
Shorack and Wellner (1982) is found in Chapter 11.

From Chapters 3, 10, 11, 13, and 24

Analogs of (1.2.7), (1.2.17), (1.3.5), and (1.3.21) will be found in various


chapters. That is, we will give regularity conditions on ali that allow the
conclusions

(4) II(G —1)4GII'a.5. 0,


(5) h(U) -mo d h(U) for all h that are (I /gjI - continuous a.s. U,

(6) h E'} a.s. wrt II II as n -> oo,

b„ = 2 log e n,

(7) n log p(II4i(Gn


- 1 ) # II^a)- - g(a)= — infg(a/iIi(t),t),

as well as other conclusions.

From Chapter 3

Extensions of = to the weighted empirical process


II
Cnt
(8) W(r) = [1[€ißt]—t] for 0<tC1
r=l „— 2
^j -1 C nJ

for known uniformly asymptotically negligible (u.a.n.) constants c„ ; are


developed. Local alternatives are also considered. A natural empirical rank
process t8„ is also considered.

SURVEY OF OTHER RESULTS 19
From Chapter 17

Convergence of the increments

(9) sup {jU (s, t]—U(s, t]l/q(t—s): It—sI> en - ' log n} -* p 0 as n-+oo

is established for the Skorokhod construction of U,, and U. This generalizes


(1.3.18) from functions f = i [o ,, ] to functions f = l (s , ] . More general functions
f are also considered in a study of j* f d v,,.

5. SURVEY OF OTHER RESULTS

Chapter 15

Define the uniform empirical difference process D. on [0, 1] by

(1) ®„ = U„+4!„ on [0, 1].

We write f' when a result is true if we replace f” by any of f + , f - , or Jfl.


Kiefer (1972) proved

(2) lim n"4llB„ ll/ b„ log n =1/f a.s. where b,, = 2 log t n,
n-.m

establishing the order of this difference. In view of Smirnov's result (1.2.11),


it is interesting that Kiefer (1970) also gives us

(3) nW4jIDlj/JlIUnII log n 'a.5. 1


-
as n - oo.

Of course, (3) has the immediate consequence

(4) nl/4IIDnll/ logn^a II"-'ll asn-oo.

From Chapter 14

In a related vein, a modulus of continuity for the U, process will be established


in Chapter 14. This is the analog for U,, of the Levy result for Brownian motion.

From Chapter 16

The normalized uniform empirical process Z. defined by

(5) Zn =-U„/ 1(1—I) on (0, 1)


20 INTRODUCTION AND SURVEY OF RESULTS

satisfies Z(t) = (0, 1) for all t. Let S denote Brownian motion on [0, co) and
define

(6) X(r) = e ' S(e 2 r)


- for oo < r «o;

the process X is known as the Uhlenbeck process. Then

—t)s(1! t )
(7) Z(t)— (l t) t — t(1—t) (1 =x\21og (i 7))'

so that the Z process can be represented in terms of the X process. Now


Darling and Erdös (1956) showed that

(8) b(r)IIXIIöe" —
c(r) - d E4 as r -4,

where

(9) b(r)= 21og2 r, c(r)=2log, r+2 - ' log 3 r-2 ' /2 log4ir,
-

and E„ denotes the extreme value df

(10) E„(r)=exp (—exp ( —r)) for —oc<r<co.

Using this, Jaeschke (1979) showed that

(11) b(n)IIgn!I — c(n) _* d Ev and b(n)II Z II — c(n) -*d E 2

as n - co, thus obtaining the limiting null distribution of the weighted


Kolmogorov-Smirnov statistics with the normalizing weight function +y(t) =
1/ t(1— t). From the point of view of statistical practice, it is unfortunate that
this natural statistic has its distribution essentially controlled by „,,, as will
become apparent in Section 16.1.
We next consider the a.s. behavior of Z. . For this we must consider two
cases because for 0< t s 2 the normalized Binomial (n, t) distribution of 71 n (t)
has a well-behaved left tail and a long, troublesome right tail, while the situation
reverses for 2 ^ 1<1. Csäki (1975) shows that if A„ 1', then

(12) 0 according as Y 1 = j < 00


lim III n II0 2 / A =a.s. { 00
n-oo n=1 nAn l co;

and thus
+ j/

(13) kmm
I I nllo _
log s a.s.
n +ao
SURVEY OF OTHER RESULTS 21

For the better-behaved left tail


ö2

(14) li m 6^^ =v2 a.s.
n

Csörgö and Revesz (1975b) posed the problem of considering IIZ 2/bn as
a n . 0. Shorack (1977, 1980) showed that if a n = (c n log, n )/ n defines c,,, then

^^Z <00 lim c,,>0


(15) lim Ilg" b^^.s.
° =a according as n '^
n_'°° n
= 00 lim cn = 0.
n -.'x

A more complete solution to (15), in which the lim is evaluated for various
a n , was obtained by Csäki (1977).

From Chapters 8, 9, and 14

Treatment of the empirical process via Poisson process methods are motivated
in Chapter 8 and applied in parts of Chapters 9 and 14. Many first proofs of
results presented in other chapters also used these methods.

From Chapters 4 and 5

If the true df FB is actually indexed by an unknown parameter 0, and if B„ is


an estimator of 0, then '/i(F n - Fa,) is called the estimated empirical process.
It is studied in Chapters 4 and 5. Chapter 4 also studies empirical and rank
processes of residuals.

From Chapters 19-23

Applications to L-statistics, rank statistics, spacings statistics, and tests of


symmetry are considered in Chapters 19-22. Chapter 23 is a survey of examples.

From Chapters 3 and 25

Extensions to the case of independent but not identically distributed rv's are
considered in Chapter 25. Local alternatives of this sort are considered in
Chapter 3.

From Chapter 7

Extensions to censored data are considered. Of particular interest are the


product-limit estimator IF,, of Kaplan and Meier (1958) and the corresponding
22 INTRODUCTION AND SURVEY OF RESULTS

cumulative hazard function and quantile function estimators. Here martingale


methods and inequalities, some of which are included in Appendix B, are very
important.

From Chapter 26

Much of the research in the theory of empirical processes during the past 10
years has aimed to generalize the results surveyed in this chapter to higher-
dimensional observations and more abstract sample spaces. Some of these
results are summarized in our final chapter.
CHAPTER 2

Foundations, Special Spaces,


and Special Processes

0. INTRODUCTION

For the case of a sequence of lid rv's, major theorems are the central limit
theorem (CLT) and the law of the iterated logarithm (LIL). The natural analogs
of these results yield weak convergence (^) and relative compactness (-.)
of partial-sum processes and of empirical processes.
In Section 1 we present the basic spaces (C, 16) and (D, ) on which our
processes will be defined and their convergence studied. The most important
of the limiting processes are introduced in Section 2; their interrelationships
are discussed and some interesting results involving them are recorded.
Section 3 defines =, illustrates its utility, and presents criteria for its
verification. Donsker's theorem on the weak convergence of the partial-sum
processes S. to Brownian motion S is established in Section 4. Special
constructions of S„ that converge a.s. are summarized briefly at the end of
Section 4; then Skorokhod embedding is presented in Section 5 and the
Hungarian construction is presented in Section 7, after the Wasserstein distance
it uses is discussed in Section 6.
The general definition of M► is given in Section 8. Since the simplest versions
of this concept are less familiar, some examples are presented, and the relation-
ship to the classic LIL is carefully discussed. Strassen's theorems on the relative
compactness M► of scaled Brownian motion S(nI)/ and of the partial-sum
process S. are presented in Section 9.
In this chapter our aim is to learn how to use the major theorems of =
and M►. Thus the Skorokhod-Dudley-Wichura theorem (Theorems 2.3.4), the
weak convergence criteria of Theorem 2.3.6, Strassen's theorems (Theorems
2.5.2 and 2.9.1), and the Hungarian construction of Section 7 are presented,
discussed, and used, but will not be proved. Enough is proved, however, so
that the reader should be able to understand most of the fine detail of literature
that uses these major results. Roughly, we have used partial sums for illustration
23
24 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

in the present chapter, while the corresponding results for empirical processes
will be considered in greater detail in later chapters.
There are a few additional results about partial sums that need to be recorded
for use in later chapters. Section 10 considers the supremum of normalized
Brownian motion S(t)/ . Section 11 is devoted to various forms of the LLN.

1. RANDOM ELEMENTS, PROCESSES, AND SPECIAL SPACES

General Concepts

Let M denote a collection of functions x that associate with each t of some


set T a real number x(t). T is usually a Euclidean set such as [0, 1] or R, but
occasionally we will make use of more general index sets T For each integer
k ? I and all t,, ... , t,. in T let IT = a, _.. , be the projection mapping of M
into k-dimensional space R k defined by

ir (x) ....,,k( x)= (x(t1), ..., x(tk))•


,

Then for any Borel subset in the class 913 k of Borel subsets of R k , the set
r^^' fk (B) is called a finite-dimensional subset of M.

Exercise 1. Show that the collection of all finite-dimensional subsets of


M is necessarily a field.

We let At denote the o -field generated by the field Ab of finite-dimensional


subsets. For such a set M, we will call the measurable space (M, .41) a
measurable function space over T. A measurable mapping from a probability
space (fl, .s^, P) to a measurable function space (or, for that matter, to an
arbitrary measurable space) will be called a random element on (i.e., taking
values in) the measurable space. Random elements on measurable function
spaces will also be called stochastic processes, or simply processes.

Exercise 2. Let 1i = -98, denote the Borel subsets of the real line R. Then X
is a process on (M, .41) if and only if X(-, w) e M for all w E f and X(y)
is a ry on (R, ) for each t E J. [X is called a random variable (rv) on (R, A)
if and only if X(B) e .si for all Ba 90.]

Exercise3. Let X denote a process defined on the probability space (0, 4, P)


taking values in the measurable function space (M, At). For any finite-
dimensional subset F of M define

Px (F) = P(X- '(F)).


RANDOM ELEMENTS, PROCESSES, AND SPECIAL SPACES 25

(i) Show that Px is a probability measure on (M,.A( c ).


(ii) Use the Caretheodory extension theorem to show that this measure
extends uniquely to a measure (which we shall also denote by PX ) on
(M,).

Two processes X and Z are said to have the same finite-dimensional


distributions if PX and P. agree on At a ; in this case, the previous exercise
shows that Px and PZ necessarily agree on .i« also. When X and Z have the
same finite-dimensional distributions, we call them equivalent processes and
write X = Z.
When X denotes a random element from the probability space (11, .i, P)
to an arbitrary measurable space (M, .v), the natural PX (F) = P(X '(F)), for -

all F E ,M, defines a probability on (M, Al) called the induced probability. If
two transformations X and Z induce the same probabilities, then we call them
equivalent random elements and again write X =Z.
If all of the finite-dimensional distributions of a process X [i.e., the distribu-
tions induced on (R k , k ) by Tr,,, k (X)] on a measurable function space
(M, At) are multivariate normal, then X is called a normal process.
Suppose X, X,, X2 ,.... denote processes on (M, .tit ). If the convergence in
distribution

7'r y,...,t k (Xn) = (XX(tl), ... , XX(tk))

^d (X(t1),, ... 9 X(tk)) _ cri...... (X) as n -* co

holds for all k >_ 1 and all t,, ... , t k , then we write Xn - f.d _ X as n -* co; and
we say that the finite-dimensional distributions of X. converge to those of X.

Exercise 4. (Change of variable theorem) (i) Let X denote a measurable


transformation from the o-finite measure space (ß, .si, s) to any measurable
space (M, .fit) inducing the measure v(A) = µ (X '(A)) for A E X. If g is a
-

measurable real-valued function on M, then for all A e 41

J X
G(X(tw)) dµ(w) =
'(A) f A
g(x) dv(x)

in the sense that if either integral exists, so does the other and they are equal.
Hint: Prove it for indicators, then simple functions, nonnegative functions,
and integral functions. See Lehmann (1959, p. 39).
(ii) Suppose rv's X = F and Y= G are related by G(K)=F and X=
K '(Y), where K is ‚' and right continuous on the real line with right
-

continuous inverse K'. Then set g, X, s, v, A in (i) equal to


g, K ', G, F, (—oo, x] to conclude that
-

g(K-) dG= t gdF


f^ ­O,K(x)] w'x]
26 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

since (K ') '((—oo, x]) = {y: K 1 (y) - x} = (—oo, K(x)]. Setting G = I, K = F,


- -

and Y = 6 gives

g(F - ') d1= I g dF


f o.F(x)] J (—^o,x]

Metric Spaces

Let (M, 5) denote an arbitrary metric space and let .tit s denote its Borel field
(i.e., the a-field generated by the collection of all 5 -open subsets of M). Let
4ts denote the a-field generated by the collection of all open balls, where a
ball is a subset of M of the form {y: S(y, x) < r} for some x E M and some r> 0.

Exercise 5. ‚U ä c ,u s , while

s = As if (M, &) is a separable metric space.

The Special Spaces (C,') and (D, :1)

For functions x, y on [0, 1] define the uniform metric (or supremum metric) by

(1) IIx —yii= sup Ix(t)—y(t)I•


or1

Let C denote the set of all continuous functions on [0, 1]. Then

(2) (C, 1! (1) is a complete, separable metric space.

Now `III II denotes the o -field of Borel subsets of C, denotes the o -field
of subsets of C generated by the open balls, and 16 denotes the Q -field
generated by the finite-dimensional subsets of C. It holds that

(3) III II = III II —


Let D denote the set of all functions of [0, 1] that are right continuous and
possess left-hand limits at each point. (In some applications below, it will be
noted that D is also used to denote the set of all left continuous functions on
[0, 1] that have right-hand limits at each point. This point will receive no
further mention). [In some cases we will only admit to D, and to C, functions
X having X(0) = 0. This too, will receive little if any, mention.] Then

(4) (D, 11 ) is a complete metric space that is not separable.

Now III II denotes the Borel o -field of subsets of D, III II denotes the cr-field
of subsets of D generated by the open balls, and 2 denotes the o -field
RANDOM ELEMENTS, PROCESSES, AND SPECIAL SPACES 27

generated by the finite-dimensional subsets of D. It holds that

(5) o
and moreover

(6) CES and f = C n ^.

We now digress briefly. The proper set inclusion of (5) was the source of
difficulties in the historical development of this subject. To circumvent these
difficulties, various authors showed that it is possible to define a metric d on
D (see Exercise 8 below) such that

(7) (D, d) is a complete, separable metric space

whose Borel o-field 9'd satisfies

(8) 2d = 2.

Moreover, for all x, x„ in D the metric d satisfies

(9) llx„—xli-*0 implies d(x„,x)-*0,

while

(10) d(x,,,x)->0 withxEC implies l(x„—xIH0.

The metric d will not be important to us. We are able to replace d by II 11 in


our theorems; however, we include some information on d as an aid to the
reader who wishes to consult the original literature.

Exercise 6. Verify (2) and (3).

Exercise 7. (i) Verify (4). [For each 0s t 1 define a function x, in D by


letting x, (s) equal 0 or I according as 0!5 s < t or t < s <_ 1 ].
(ii) Verify (5). (Consider u{0,: 0: t<_ 1}, where 0, is the open ball of
radius 3 centered at x,.) (See also Dudley, 1976, lecture 23.)
(iii) Verify (6).

Exercise 8. Consult Billingsley (1968, Chapter 3) to verify (7)-(10) for

(11) d(x,Y)=inf{IIx — y° All vIIA —


Ill:AEA},

where A consists of all T continuous maps of [0, 1] onto itself.


28 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Exercise 9. Verify that

(12) 16 is both I 11-separable and d- separable as a subset of 2.

We will require the II II -separability below.

Let q > 0 denote a function on [0, 1] that is positive on (0, 1). For functions
x,yon[0,1]we

(13) call II (x - y)/ q II the / q 11 -distance between x and y

when this is well defined (i.e., when lim jx(t)-y(t)j/q(t) is finite for t
approaching 0 and 1).

Independent Increments and Stationarity

If T is an interval in (-cc, oo), then we will write

(14) X(s, t] =X(t)—X(s) for any s, t in T

and we will refer to this as an increment of X. If X ( to ), X (to , t, ), ... , X(tk-, , 1k ]


are independent rv's for all k >_ 1 and all t o < • • • < t k in T, then we say that
X has independent increments. If X(s, t] = X(s +h, t+ h] for all s, t, s + h, t + h
in T with h ? 0, then X is said to have stationary increments. If (X (t 1 + h), ...
X(tk +h)) -(X(t,),..., X(t k )) for all k?1, h >_0, and all time points in T,
Then X is said to be a stationary process.

Other Special Spaces

Let T = [0, co) x [0, cc) with t = (t,, t2 ) a typical point. Each t determines four
quadrants by intersecting the half-spaces s, < t l and s, >_ t, with the half-spaces
S2 < t2 and s 2 ? t2 . We label these quadrants Q( <, < ), Q(<, >_), Q( >_, <),
Q(?, ?). D T denotes the collection of all real-valued functions x on T for
which for each t E T

X0 (t) = tim X(s) exists for each of the 2 2 quadrants Q;


s-f
5EQ
(15)
X(t)=XQ(^'>)( t),
X(t) = 0 whenever any coordinate of t equals 0.

Let 9 T denote the o--field generated by the finite-dimensional subsets of D T.


Then

(16) (D T, - T ) is a measurable function space


BROWNIAN MOTION AND VARIATIONS 29

we will find useful. We write

(17) s1 ifs i <_t; for 1<_i2;

and if s ^ I we let

(18) X(s, t] =X(1 1 , tZ ) — X(s,, t 2 ) — X(s 2 , t,)+X(6,,5 2 )

denote an increment of x. Stationary and independent increments are defined


as before.
We make completely analogous definitions when T= T, x T2 and each T;
is any one of [0, 1], R + = [0, oo), R = (—co, oo), and so on.
For any of the above sets T we let C T denote the continuous functions in
D T and we let (6 T denote the o'-field generated by the finite-dimensional
subsets of C T. Then

(19) (C T, '6 T ) is a measurable function space

that we will also find useful.


Obviously, all of this generalizes to an arbitrary number of dimensions. In
all cases we let I I denote the ordinary Euclidean distance between two points,
while 11 denotes the sup norm over all of T.
We have followed Wichura (1973a).

Exercise 10. Show that (C T, `6 T, 11 11) is a complete separable metric space


when T is a two-dimensional closed rectangle.

2. BROWNIAN MOTIONS S, BROWNIAN BRIDGE U, THE


UHLENBECK PROCESS, THE KIEFER PROCESS K, AND THE
BRILLINGER PROCESS

Existence of and Relationships between the Normal Processes

We take as well known the existence of a normal process S on (C, `6) having
mean-value functions ES(t) and covariance function Coy [S(s), 5(t)] given by

(1) ES(t)=0 and Coy [S(s),S(t)]=SAt for alls,t.

We call S Brownian motion on [0, 1]. (See Exercise 15 below for its existence.)
Note that if 0= t o < 1 1 <• • •<t k+ ,=1, then S(t,) — S(to)....,S(tk+1) — S(tk)
are independent rv's distributed as N(0, t, — t o ), ... , N(0, t k+ , — t k ) for all k ? 1.
Also S(0) —0.
30 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Brownian bridge is a normal process U on (C, 'e) having

(2) EU(t) =0 and Cov[U(s),U(t)]=sAt — stfor0^s,ts1.

The existence of Brownian bridge follows from Exercise 1 or 15 below.


Brownian motion S on R + = [0, co) is a normal process on (C R -, R ') that
satisfies (1). Its existence follows from Exercise 2 below.
Exercise 9 below establishes the existence of a normal process X on (CR, R)
with R = (—co, oo) satisfying

(3) EX(t) = 0 and Coy [ X ( s ),X(t)]=exp(—jt—sI) for all s,tER.

We will call X the Uhienbeck process. Note that Var [X(t)J = 1 for all In R.
We also take for granted the existence of a normal process S on
(CR'XR * , CR * xR * ) satisfying

(4) ES(s,, t,) =0 and Coy [S(s,, t,), S(s 2i t 2 )] = (S i A s 2 )(t 1 A 12)

for all s l , s2 , t I , t, n R + . We call S Brownian motion on R + x R.


The Kiefer process K is a normal process on (CR+xto,,J, `fR'x[o,,]) having

(5) EK(s,, t,) = 0 and Coy [l((s,, t,), K(s 2 , t 2 )]


= (s1AS2)(t,At2)(t,At2 — t, tz)

for all s,, s2 >_ 0 and 0 <_ 1, 12 <_ 1. Its existence follows from Exercise 12 below.
Relationships between these processes are shown in the following exercises.
Because of Exercises 2.1.2 and 2.1.3, it is clear that all the transformations
below lead to normal processes on the appropriate (CT, 16 T ). It is trivial to
verify that the resulting mean-value functions are zero so these exercises merely
consist of verifying that the resulting covariance functions are correct.

Exercise 1. Define a random element V on (C, 1C) by


V(t) = S(t) — tS(1) for 0 t < 1.

Then V is a Brownian bridge on (C, 16). So is U —V.

Exercise 2. If U is Brownian bridge on [0, 1], then

S(t) = (I + t)U(t/(1 + t)) for t?0


is Brownian motion on R. This is called Doob's transformation.

Exercise 3. If S is Brownian motion on R + and r> 0, then t/rS( /r) is also


Brownian motion. So is S(•+r) —S(r).
BROWNIAN MOTION AND VARIATIONS 31
1."0

0. 5

0. 0

—0.

—1. 0

5
0.0 0.2 0.4 0.6 0.8 1. 0
(a)
1.5

1.0
N

0.5

0.0

—0.5

—1.0

—1.5
0 0 0.2 0.4 0.6 0.8 1. 0
(b)

Figure 1. (a) S (b) U

Exercise 4. Let § denote Brownian motion on R + . Then

U(t)=(1— t)S(t/(1—t)) _t<_1


for0<

is Brownian bridge on (C, 16). [It is true that S(r)/r-+ a.5 .0 as r-> x ; you may
use this fact. This fact is a consequence of Theorem 2.8.1.]
32 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Exercise 5. Define a random element Z on (C, 16) by

Z(t)=[U(t)+UJ(1— t)] / ✓2 for0<_ts1,

where U is Brownian bridge on (C, 16). Then Z is a Brownian motion on [0, 2],
while Z(t)= 1'(1—t) fort <_1. Also

U(t /2) — U(1— t/2) for 0:s: t <_ 1

is Brownian bridge.

Exercise 6. If U and V are independent Brownian bridges on (C, ) and if


0< a -1, then ✓ 1 — aU —'Ja0/ is also Brownian bridge on (C, 16).

Exercise 7. Let U be a Brownian bridge on (C, 16) and let Z denote a N(0, 1)
ry independent of U. Then U + IZ is Brownian motion on (C, `6). [Recall that
I(t) = t is the identity function on [0, 1].]

Exercise 8. Let S denote Brownian motion on [0, ax). Then the time-reversal
process tS(1 /t) for t >0 is again Brownian motion. [It is true that S(r) /r -* a , s .0
as r-' oo; you may use this fact. This fact is a consequence of Theorem 2.8.1.]

Exercise 9. Let S denote Brownian motion on R. Show that

X(t) = e - 'S(e 21 ) for —co< t <oo

is a Uhlenbeck process. Show that X is a stationary process, but that X does


not have independent increments.

Exercise 10. Let X denote the Uhlenbeck process on R. Show that

t(^X(Ilogi
t t ) for0<t<1

v(t) _
0 for t = 0 or 1

is a Brownian bridge on [0, 1].


Exercise 11. Let U denote a Brownian bridge on (C, 16). Define
B(t)=[U(at)—tU(a)] /.Ja for0<ts 1
for a fixed 0< a <_ 1. Then B is a Brownian bridge on (C, 16) and is independent
of {U(t): a <_ t^ 1 }.
Exercise 12. Let S denote two-dimensional Brownian motion. Define
K(s, t) =S(s, t) —tS(s, 1) fors>Oand0st<_1.

Then K is a Kiefer process.



BROWNIAN MOTION AND VARIATIONS 33
Exercise 13. Let 1< denote a Kiefer process. Then for each fixed s > 0 the
process U(t) = K(s, t)/'Js for 0:t<_1 is a Brownian bridge on (C, 16). Also

B(s, t)_[S(st)—tS(s)]/.ls for 0<t<_1

is a Brownian bridge for each s>0. Let B(0, t) = 0 for 0 s t < 1. We call B the
Brillinger process.

Exercise 14. Let § denote Brownian motion on (C, t). Then

(t)— J r S(r)r"' dr for0<_ t<— 1,

(t)—
J U(r)r ' dr
- for0^ t<1,

U(t)+ J U(r)(1—r) ' dr - for0t<-1

are all Brownian motions on (C, 16). [Note (3.4.47), (3.4.48), and (6.1.24).]

Exercise 15. Let Z0 , Z,, Z2 ,... be iid N(0, 1). Define

f sin (kart)
U(t) = Zk for0<t<1.
k-, kir

Show that U is a well-defined random element on (C, 16) that satisfies our
definition of Brownian bridge. Likewise (or use Exercise 7),

'i2sin (kTrt)
S(t)=Zo t+ Zk for0<—ts1
k=1 kir

in Brownian motion on (C, W). This establishes the existence of Brownian


motion on (C, 14).

Boundary-Crossing Probabilities for S and U

We now record various probabilities associated with Brownian motion:

(6) P( sup S(s)> b j =2P(N(0, t)> b)


0_s_t III

=
f 1 b exp
o ^27rS3
(_b

Z f ds
2s/
for all b> 0

= F (t) F

where r inf {s: S(s) = b};


34 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

(7) P( sup IS(t)I>b f =4 Y P((4k —3)b < N(0, 1) <(4k -1)b)


0-e-1 / k=1

=1--
7r k=o 2k+ 1
exp /— (2k 8b^ ^2 2 f for all b > 0;

(8) P sup > 1 =exp(_2ab) for all a ? 0, b > 0;


)

( ,mo o at +b
§(t)

(9) Pt
at +b
I§(t)I , 1 ^

=2 ( -1) k + ' exp (-2k 2 ab) for all a ? 0, b>0;


k =I

(10) P(µ{t: S ( t )>0,0<_t^1}<_b)= (2 /ir)aresin(..Jb) for0<_b<_1

where denotes Lebesgue measure.


Related results for Brownian bridge are, for all b > 0,
(11) P(IIU + II > b) = exp (-2b 2 ),
-1)k +I exp (-2k 2 b 2 )
(12) P(IIUII 5 b) = 1-2Y, (
k =I

= b I exp ( —(2k-1) 22 /(8b 2 )),

(13) p({t: UJ(t)>0})=Uniform (0, 1).

Proof. We will give informal justification for some of results (6)-(13) based
on "reflection principles."
Consider (6). Corresponding to every sample path having 5(t)> b, there
are exactly two "equally likely" sample paths (see Figure 2) for which IS>
b. Since S(t) = N(0, t), Equation (6) follows. The theorem that validates our
key step is the "strong Markov property" (see Theorem 2.5.1); we paraphrase
it by saying that if one randomly stops Brownian motion at a random time r
that depends only on what the path has done so far, then what happens after
time 'r as measured by {S(-r+ t) — S(r), t >_ 0} has exactly the same distribution
as does the Brownian motion {S(t): t?0}. In the argument above, r was the
first time that S touches the line y = b. Change variables to get the second
formula.

y
b

0
t Figure 2.

BROWNIAN MOTION AND VARIATIONS 35

A second reflection
3b
2b
ç:::ection %.

or path
—b

Figure 3.

Equation (7) follows from a more complicated application of another


reflection principle. Let A + = [ II§ + II > b] = [S exceeds b somewhere on [0, 1]J
and A_ [ II S II > b] = [S falls below —b somewhere on [0, 1]J. Though
[ II S II > b] = A + u A-, we have P( II S II > b) < P(A + ) + P(A_), since we counted
paths that go above b and then below —b (or vice versa) twice. By making
the first reflection in Figure 3, we see that the probability of the former event
equals that of A + _ = [ I)S + I) > 3b], while that of the latter equals that of A_ + =
[II§ - I > 3 b]. But subtracting out these probabilities from P(A + )+P(A-) sub-
tracts out too much, since the path may then have recrossed the other boundary;
we compensate by adding back in the probabilities of A+-+= [II § + II > 5b] and
A- + - = 11S11 > 5b] which a second reflection shows to be equal to the
[

appropriate probability. But we must continue this process ad infinitum. Thus

(a) P(IISIIö> b)
= P(A + ) — P(A+-)+ P(A+-+) —.. .
+ P(A) — P(A-+) + P(A-+-) — .. .
(b) =2[P(A+) —P(A+-)+P(A+-+)—• • •] by symmetry

=2 (-1) k 2P(N(0, 1) > ( 2k —1)b) by (6)


W

(c) =4 S P((4k-3)b <N(0, 1) <(4k-1)b).


k=I

=1— (-1)kP((2k-1)b<N(0, 1)<(2k+1)b).


k=— ao

See Chung (1974, p. 223) for the second formula in (7).


Consider (8). We follow Doob (1949). Let

(d) ¢(a, b)° P(S(t)-at+b for some t?0) where a>_0, b>0.
Now note that

(e) .0(a, b, + b2) _ 0(a, b1)0(a, b2)


36 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

b, + b 2

b,

0
t

Figure 4.

because at the instant T when S first hits the line at+ b,, we then have the
same problem all over again where the equation of the original line at + (b, + b2 )
relative to the point (T, S(-r)) has become at +b2 (see Figure 4). This is again
rigorized by the strong Markov property. Also, 4(a, b) ? P(S(l) ? a + b) > 0
and 4(a, b) is \ in b. The only solution in b of the functional equation (e)
having these properties is

(f) ¢(a, b) = exp (—(a)b) for some constant Ii(a) depending on a.

In order to solve for ifr(a) we consider Figure 5. Let r =inf {t: S(t) = b }. Now

P(Ts)=P( sup S(r)>_bf =2P(N(O, ․ ) -b) by(6),


\0—rs

so that r has density function


z
(14) fT(s)= ^ b 3/2
expI -2s s>0.
) for
^ s

Note from Figure 5 that once S intersects y = b at time r, the event that is
then required has probability 4(a, aT) by the strong Markov property. We
thus have
exp (—iIr(a)b) = 4(a, b) by (f)
(g) = E [¢(a, ar)]
T by the strong Markov property

J
(h) = o" exp (-4i(a)as) —b exp( - ---- 'I ds
ds by (f) and (14)
o 1s 3 /Z J
^^r

2
f o exp $ — a ^2y 2 b2 +y 2 )
tting
dy le y2= 2s —

= exp ( —b 2) by elementary integration,

so that iIi(a) =2a.



BROWNIAN MOTION AND VARIATIONS 37

y = a t + b Relative to this point, the


sloping line has equation
y = at + (a1)
b p y=b

Figure 5.

Consider (11). Now

P(IIU + II>b)=P1 sup (1-t)S(t/(1-t))>b1 by Exercise 4

(i) =P(supS(r)/(l+r)>b f letting r=t/(1-t)


r>O

= exp (-2b 2 ) by (8)

establishing (11). Likewise, (12) follows from (9) via

G) P(IIUII> b) = P(sup I§(r)I/(l+r)> b}.


r>O

In Section 5.9 we will establish (12) as a limiting result for a two-sample


statistic; then 0) can be looked on as a proof of (9) in the special case that
a = b. For the general proof of (9) see Doob (1949). See Durbin (1973a) for
the second part of (12).
The arcsin law (10) has a long and rich history. However, we will not use
it or take time to prove it, though we list it because of its tradition. See
Billingsley (1968, p. 80).
The result (13) will be established in Section 5.6 as a consequence of some
combinatorial identities. ❑

The results contained in Exercises 16-19 will be important in the sequel.


Exercise 16. Use (8) and (9) as in the proof of (11) to show that for all a, b> 0

(15) P(U(t)<_a(1-t)+btfor0<t<_1)=1 -exp (-tab)

and
P(-a(1-t)-bt<_UJ(t)<a(1-t)+bt for 0^t<1)

(16) =1-2 (-1) k exp (-2k 2 ab).


k=t

See Häjek and Sidäk (1967, p. 182).


38 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Exercise 17. Show that for all a, b>0

P ( U ( t )^a( 1—t) +bt for c<t<—dIU(c)=x and U(d) =y)


(17) = 1 —exp (- 2 [a( 1 — c) +bc—x][a(1— d) +bd —y]/(d—c)).

Hint: The conditional distribution of U on [c, d] satisfies

(18) UJI[U(c) = x and U(d) = y]


_ t —c d—t t—c
+ d —c y forc<_t<_d
U d — c)+d —cx

See Häjek and Sidäk (1967) and Malmquist (1954).

Exercise 18. (i) (Doob, 1949) Show that for a, c >_ 0 and a, ß ? 0

P(—(at +/3) <_ §(t) <_ at+ b for all t ? 0)

(19) = 1— [exp (-2A k )+exp (-2Bk )—exp (-2Ck )


k=1

—exp (-2Dk )],

where

A k = k 2 ab +(k -1) 2 aß+k(k-1)(aß+ba),


Bk =(k- 1) 2 ab+k 2 aß+k(k-1)(aß+ba),
Ck = k 2 (ab +aß) +k(k-1)aß+k(k+1)ba,

Dk = k 2 (ab +aß) +k(k+1)aß+k(k-1)ba.

(ii) (Häjek and Sidäk, 1967) Now show that for all a, b, a, ß >0

P(—a(1—t) —ßt<_U(t)<— a(1—t) +hi)


(20) = P( —(at +ß) <—S(t)<at +b).
(iii) (Häjek and Sidäk, 1967, P. 199 and Malmquist, 1954) Suppose JxJ
a(1—c) +ßc and IyJ<a(1— d) +ßd. Then

P( — a(1 —t)— 5t <U(t) < a(1 — t) +btIU(c) = x and U(d) =y)


(21) [ ex p 2Ak +ex 2Bk ex 2Ck
k=2 ( d—c) p( d—c) p(d—c

— exp —
C d2Dk
—c /J
BROWNIAN MOTION AND VARIATIONS 39
where

Ak = {( 2k - 1)[a(1 - c)+ßc] - x}{(2k - 1)[a(1 - d)+ bd ] - y},

Bk = {( 2k - 1)[a(1 - c)+ bc]+x}{(2k - 1)[a(1 - d)+ bd]+y},


Ck ={2k[a(1-c)+bc]-x}{2k[a(1-d)+bd]+y}+xy,
Dk = { 2k[a(1-c)+bd]+x}{2k[a(1-d)+bd] -y}+xy.

(iv) Extend (iii) to the asymmetric case when -a(1- t) - bt is replaced


by -a(1- t)-ßt.
(v) Use (iii) to show that for all a, b> 0

_ Y_
(22) P(1IU 11 < a and IIU II <_ b)
-

k=—ao
exp (- 2k 2 (a+b) 2 )-

(vi) (Kuiper, 1960) Show that for all b>0


F,
k=—w
exp (- 2[b+k(a+b)] 2 )

(23) P(11OJ 1 + U I >_b)=2Y_ (4k 2 b2 -1) exp (-2k 2 b 2 ).


k=1

(A nice proof is in Dudley, 1976, p. 22.6.)

Exercise 19. (Renyi, 1953) Show that for A >0 and 0<c<1

(24) P(U(t)!_-At for all c <_t<_1)=24D(A ) -

and

(25) P(IJU/I IIC A)=4 kEI [4k_1AJii)

-OP((4k-3)A /iii)]
1-c

_ 4 °° (-1) k _(2k +1)2ir2(1 -c)


(26)
Irk o 2k+1 eXp ( 8A2c )

Here, 1 denotes the N(0, 1) df. The extension of these results from [c, 1] to
[c, d] also appears in Renyi (1953).

The results contained in the remaining exercises constitute a slight


digression.
40 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Exercise 20. (Billingsley, 1968, p. 79) Let a, b> 0 with —a < u < v < b. Then

P(II§ IIö^aandII§ + IIö- b and u<S( 1 )sv)

(27) = P(u+2k(a +b) <N(0, 1) <2k(a +b))


k = -m

P(2b — v+2k(a + b)< N(0, 1) <2b— u +2k(a +b)),


k = -w

giving

P(IIS IIö^ b and u:5 §( 1 ) <_ v)


(28) =P(u<N(0, 1) <v)—P(2b— v <N(0, 1) <2b —u)

and

(29) P(IIS II ^ a and IIS II < b)


-

00

_ (-1)kP(—a+k(a +b) <N(0, 1) <b+k(a +b)).


k = -w

Use (27) to give another proof of (22).

Exercise 21. Consider the boundaries shown in Figure 6 where a 1 > 0, a 2 > 0,
b, >— b 2 (but b, = b2 = 0 is disallowed), and 0< T s cx. Let Pi( T) denote the
probability that Brownian motion S on [0, co) hits the upper line and does so
prior to time T and prior to hitting the lower line. Then if b, ? 0 we have
0"
(30) p1(oo)= Z {exp(- 2 [k 2 a, b , — (k - 1) 2 a2b2 — k(k - 1)(a,b2 — a2b1)])
k=1

—exp(-2[ k 2 (a, b,—a 2 b2 )—k(k-1)a,b 2 +k(k+l)a 2 b,])}.

a,

-a 2

Figure 6.
BROWNIAN MOTION AND VARIATIONS 41

If b, = b 2 -O, then

exp (2a 2 b,) —1


(31)
P'(°O)=exp (2(a1+a2)b,)—I

See Anderson (1960) for formulas for p(co) when b, <0, for p 1 (T) in general,
and for this same probability conditional on the values of S(T). Other
generalizations are also given by Anderson.

Exercise 22. Use a reflection principle to show that

sup S(s)aa and S(t)<_b!=P(S(t)>_2a—b)


P( o_s_r

fora>0, —co<b<a and

P(S(t)=0 for some 0<t o <t<t,)=(2/vr)arccos ✓ to/t,.

See Karlin and Taylor (1975, p. 348).

Exercise 23. (i) Show that Z = max {s: S(s) = 0 and s <_ t} has

P(Z < z) = (2/ ir) arcsin for t ? z ? 0.

See Ito and McKean (1974, p. 28).


(ii) Let V=µ({t:0<t<Z and U(t)>0}) and S=S(1). Then (Z, V,S)
has density
I IZ

2^r[z(1 z)I3/2exp(-2( ls z) l for0<v<z<1and-00<s<.
s

for 0<v<z< 1 and —co<S<oo.

See Billingsley (1968, P. 82).

Exercise 24. Show that

P(IiS 11 aIS(1)=b)=exp(-2a(a—b)) for a,b>0

and

P(iiS /Iii6 >a)=2P(N(0, 1)>a') for a, b>0.

Exercise 25. Let a,b>0 and 0<c<1. Compute P(U(t) a(1—t)+bt for
c<—t<1).
42 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Exercise 26. Show that for 0< a <b < 1

P(v(t)00fora<t <b) = 2 arccos (b — a) /b(1 —a)


V

2a(1—b)
_ — aresin
it b(1 —a)

See Renyi (1953) and Karlin and Taylor (1975, p. 387).

Other interesting results are found in Anderson (1960), Billingsley (1968),


Brieman (1968), Csörgö (1967), Doob (1949), Dudley (1976), Durbin (1973a),
Freedman (1971), Karlin and Taylor (1975), and Malmquist (1954).

Integrals of Normal Processes

Let X be a normal process on (C, W) with 0 means and covariance function

(32) K(s, t) = Coy [X(s), X(t)] for 0 s, t 1.

Suppose

(33) H = H, — H2 where each H is >' and right (or left) continuous.


;

Suppose

(34) a.2
fo, fo, K(s, t) dH(s) dH(t) exists.
Proposition 1. Suppose almost all sample paths of the normal process X on
(C, '') are such that

(35)
fo, X(s) dH(s) exists a.s.
Under (32)-(34) we have that

(36)
fo, X(s) dH(s) = N(0, Q2 ).

Proof. We know from (35) that

(a)
9
1-6

X dH -> J
0
X dH.

WEAK CONVERGENCE 43
Moreover,
1 B
(b)

(c)
f o

X dH= n-•

= lim N(0, arm)


(n(1-0))
um Y_ X(i/n)[H(i/n)–H((i-1)/n)] a.s.
(no)

n-W

with
(n(1-6)) (n(1-e))
K i –, H –1 – H —
1—1
(d) ^n =j =(ne)
E i=(ne) n n n n

x[H(n)–H(L )

nl J
1. 1-0 i B

(e) ßo-2(0)= K(s, 1) dH(s) dH(t) as n- co


B JO

(f) Q2 as 0 -o 0.

Clearly, N(0, Q„)-4 d N(0, Q Z (6)) as n - an, and N(0, o 2 (8))-* d N(0, 0- 2 ) as
0-+0. ❑

Exercise 27. Let H denote a left continuous function on (0, 1) that is of


bounded variation on each closed subinterval of (0, 1); we say H is of bounded
variation inside (0, 1) and write H E BVI(0, 1). Define Y = H(^) – H(a) for
any fixed a in [0, 1] at which H(a) is finite (define H by continuity at the
endpoints). Then

Var [ Y] = Var [H()] = f H Z (t) dt – I J , H(t) dt^2


o, L o

provided H €12 . Now show that

Var[Y]= J 1
0 f
o ,
[sAt–st]dH(s)dH(t)

also holds.
Hint: In the special case when H(1) = 0 we can write Var [ Y] = Coy [ Y, Y] _
t in Cl
j0 [l [o , t) (r)–s] dH(s) J o [l [o ,, ) (r) – t] dH(t) dr and use Fubini's theorem.
Note that l [o , r) (^) = 1 (£,, t (r) and Coy [1 [o , ․ ) (t ), 1 to ,, ) (e)] = S A t – st.

3. WEAK CONVERGENCE

Let (M, S) denote an arbitrary metric space. Let X n , n ? 0, denote random


elements from some probability space (fl, sd, P) to the measurable space
44 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

(M, .'Gls ), and let P. denote the probability measure induced on (M, At:) by
X„. We want criteria which will imply that

(1) i/i(X„) 41(X0) as n- 00 for "nice” real-valued functionals +'.

To this end it would suffice to show that f, = 1 ( _, t] o ii satisfies P(tk(X„) < t) _


Ef,(X„) --> Ef (X o ) = P(r(i(Xo ) s t) as n- for enough real t. Since indicator
functions of intervals can be closely approximated by "smooth” functions,
the following definition should seem plausible.

Definition 1. For random elements X, n >O, on (M, .tit ä) as above, we say


that X„ converges weakly to X0 (or P. converges weakly to Po ) provided
SM ! dPn = Ef(X„)-* Ef(X0 ) = J M fdPo as n -+oo for all real-valued functions
f on M that are bounded, 3 -uniformly continuous, and .tit: -measurable. We
denote this by writing

(2) X„=,'Xo on(M,AS,8) asn-oo,

or

(2') P„ =Po on(M,,0 ,S) asn-goo.

Definition 2. A collection of distributions OP on (M, A1) is said to be weakly


compact if each sequence of distributions of 9 contains a subsequence that
converges weakly to a distribution on (M, At) (which we do not require to
be in 1').

Our primary interest is in processes on (D, ). Thus we will prove one of


the main results on (D, :9) from scratch. However, our interest is in learning
to use the tool of weak convergence, not to rederive the basic theorems. We
thus content ourselves with a survey of some additional useful results for
which the reader can find proofs elsewhere.

Remark 1. The uniform empirical process to be considered below will provide


a fundamental example of a process X„ that induces a measure P. on (D, II II)
but is undefined on (D, III II). This is the reason At must be used in the
general treatment of Definition 1.

Exercise 1. Show that on (R k , P11,, I j) we have X. X0 as n - m is


equivalent to X. -)'dXo as n - oo.

For each m >_1 we let T,„_ {t,„ o = 0 < t,,, I <• • •<t k . =1 }, and define
mesh (Tm ) = max {t; — t,_ 1 : 1 <_ i ^ Q. We call the function A,„ (x) in D that
equals x(t,„ j ) at each t,„ , 0 - i _< km , and, is constant on each [ t„,, _,, tm; ),
; ;

1 1 ^ k„„ the T,„- approximation of x.


WEAK CONVERGENCE 45

Theorem 1. (' criteria) Let X,,, n >_0, be processes on (D, 9) having


Xn (0) = 0. Suppose mesh (T,„) -' 0 as m -> oo. If

(3) Xn->`.d.Xoasn-*oo where P(X0 EC)=1

and for all e >0

(4) Iim P(IIXn —A,,,°Xn II>E)sdEm where dd ,,,->Oas m-oo,

then

(5) Xn =Xo on (D, 2, II II) as n-*oo.

Consider the modulus of continuity of X. defined by

(6) wn(a)= sup {IXn (s, tu: 0<_t—s<_a).


Note that for Tm ={0, 1 / m, ... , (m —1)/ m, 1} we have

(7) IIXn—Am °Xn(I C Wn( 1 / m )^ 3 II Xn — Am ° Xn II ,

using the triangle inequality via Figure 1 for the second inequality in (7).
Hence (4) is really a statement that the sample paths of the processes are "not
too wiggly." We could thus replace (4) by

(8) firn P(w n (1/m)>_e)-*0 as m-'oo foralle>0.


n-.ao

Proof of Theorem 1. Let f denote a real-valued, 2-measurable function that


is bounded (by K, say) and II II-uniformly continuous. Then

EJ (Xn) — EJ (X0))

^I E{f(Xn) f(Am
— ° Xn )]I+IEf(A m ° Xn) — EJ(Am ° X0))

+IE[f(A m ° X0) J (X0)]I —

<_ [2KP(II Xn — Am ° X,j 2 e E)+Sf (e)]


+IE[f(Am ° Xn) — f(Am ° X0)]I

(a) + [2KP(IIAm ° Xo —Xo lI ? E)+Sf (e)]

where Sf(s)=sup{If(x) — f(y)I: IIx —


yII <e}. For a fixed 0>0 we can choose

0 1 Figure 1.
46 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

e >0 so small and then m so large that

(b) I Ef(X„) — Ef(Xo )I < 0+ 0+ 0 for all n sufficiently large

by applying II II-uniform continuity off and the uniform continuity of the


paths of X0 to the third term in (a), II TI-uniform continuity off and (4) to
the first term, and the argument of the next paragraph to the second term.
Since (b) implies Ef(X„)-* Ef(Xo ), we will have then verified Definition 1.
We let h,,, : Rk,,, + , -> D by letting h m ((ro , r,, ... , rk,,, )) denote the function
that equals r; at t,,, i for 0 <_ i ^ k,„ and is constant on each [t,,,, i _ 1 , t,„ ; ), 1- i < km .
Clearly, h,„ is 2 — ?1k + ,-measurable and uniformly continuous. Thus f o h„,
is uniformly continuous also. Also h,„ ° ITT, = A. for the projection mapping
7r,,, implies Ef(A,,,°Xn)=E(f - h„n)°ITT(X„) *E(f°h,,,)°ITT (Xo)=
Ef(A m o Xo ) for any II II-uniformly continuous f from D to R; use Exercise 1•
We have merely replaced £Y by = 2 IB II in a theorem of Wichura (1971).
The theorem is equally true for 2 II II; but the difficulty with applying the 2^ i 1
version of the theorem is that empirical processes induce well-defined measures
on 2 but they do not induce well-defined measures on 2 II Ia . ❑

In fact, even more than Theorem 1 is true.

Theorem 2. (Weak-compactness criteria) Suppose we omit hypothesis (3)


from Theorem 1. Then we can still claim that

(9) {X„: n >_ 1} is weakly compact on (D, , II II).

Moreover, if X„.^X on (D, 2, II Ij) on some subsequence n', then


(10) P(X E C) =1.

Exercise 2. Prove Theorem 2. Consult Billingsley (1968, p. 127).

One of the most common criteria for weak convergence on general spaces
is phrased in terms of tightness. A sequence X., n> 1, of (M, 3)- valued random
elements is tight if the corresponding induced measures are tight: for every
e> 0 there exists a compact set K c M such that
(11) P„(K)=P(X„EK)?l —e forall n?l

where P„, n -1, denote the induced probability measures on (M, X"). [See
Dudley (1966; 1976, Chapter 23) for a generalization of the following theorem
suitable for the nonseparable case when .4lä C 41S]

Theorem 3. (= criteria) Suppose that (M, S) is separable. If

(12) Xn ^r.d.XO
- as n- +X
WEAK CONVERGENCE 47

and

(13) Xn , n>_ 1, is tight,

then

(14) Xn = X0 on (M,.M b, S) as n-.co.

Exercise 3. Show that if we replace the phrase "S-uniformly continuous" in


Definition 1 by "S-continuous," then the resulting definition is equivalent to
Definition 1.

Exercise 4. (Portmanteau theorem) The following are equivalent for


measures P,,, n >_ 0, on the Borel a--field M s of a metric space (M, 5):
(i) P = PO on (M, 4t, S),

(ii) lim Pn (F): P0 (F) for all closed F,

(iii) lim PP (G)? P0 (G) for all open G,


n-"X'

(iv) lim PP (B) = P0 (B) for all sets B whose boundary dB has P0 (aB) = 0.
n-.00

A Special Construction

What follows is an extremely elegant result that is of importance both motiva-


tionally and technically. It allows us to replace processes that converge weakly
(this implies absolutely nothing about convergence of sample paths) by
equivalent processes whose sample paths converge a.s. The idea is to then
establish additional results for the equivalent processes using the a.s. conver-
gence and then claim these results (if possible) for the original processes.

Theorem 4. (Skorokhod; Dudley; Wichura) Suppose

(15) Xn = Xo on (M,.i1ls, S) as n-^oo

and

(16) P0 (MS ) = 1
for some .eft -measurable subset M. of M that is 5-separable

(i.e., M3 has a countable subset that is 5-dense in M s ). Then there exists a


probability space ((l*, .QQ*, P*) and .i1 b — A*- measurable transformations
X, n >_ 0, from ft* to M inducing the probability measures P. on (M, .tit ä )
(i.e., X. and X* are equivalent processes) such that S(X*, Xö) is a.s. equal
48 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

to a ry and satisfies

(17) 3 (X*, X0) a as 0 as n -oo.

Proof. See Dudley (1976) for a very nice writeup. Skorokhod's (1956) funda-
mental paper required (M, S) to be complete and separable; in this case an
even more accessible proof is in Billingsley (1971). ❑

This theorem allows an easy proof of the fact that weak convergence does
indeed imply (1) for a wide class of functionals i/i, as we now show.
Let 0 ‚ = {x E M: si is not 8 -continuous at x }. If there exists a set ‚ E tits
having 0,^ c Q' and P(Xo E 0 *) = 0, then we say that 4i is a.s. 8-continuous
with respect to the Xo process.

Theorem 5. Suppose XX =X0 on (M, .pits , S) as n -* oo where X0 satisfies (16).


Let +1 denote a real-valued, 41: -measurable functional on M that is a.e.
8 -continuous with respect to X0 . Then the real-valued rv's tfi(X„) satisfy

(18) 41(X.,) ->d +G(X0) as n-*oo.

Proof. Let X* denote the equivalent processes of Theorem 4. Then the


are rv's and te(X*)-+ tr(X o*) as n -oo for all (o . A, u A 2 where A,
is the P* -null set of Theorem 4 and where A 2 = Xö -t () is also P* -null.
Since * a . s. with respect to P* implies mo d , we have that :ji(X * )-+ d 1G(Xö) as
n -'oo. But .'(X„) - s4i(X*) since X„=X *, and thus Ji(X„)-- d +1i(Xo ) as n - oo
also. ❑

Corollary 1. Suppose Xo satisfies (16). Then as n - oo, the following three


statements are equivalent: t

X,z X0 on (M, 41 , S),


(19) 6(X, X ä ) a a _s .0 for equivalent processes X n = X. on (M, offs ),
S(X ^ *, Xö *) ^ p 0 for equivalent processes X ** = X„ on (M, 1 ).

Proof. Starting with X„ = Xo , use a trivial application of Theorem 4. Starting


with S(X„*, X o**)-* 0, obtain ^r(X**)-^ p ti(X o**) for bounded 8- uniformly
continuous 4r by using the subsequence argument of Exercise.A.8.2, and then
apply the dominated convergence theorem to the bounded function ir(X**)
to obtain EIG(X„)=E:ji(X**) +Etji(Xa*)=Ety(X 0). ❑

t The conclusion of the previous theorem would still hold even if the special process Xö * was
replaced by a sequence of special processes Xö . that are equivalent and that satisfy
S(X**, Xö*„)-► 0 0 as n - oc. This could arise when we consider Hungarian constructions.
WEAK CONVERGENCE 49

Remark 2. Thus on (D, 1) we have the following result. Let each X„, n >0,
be a process on (D, ) with P(XO E C) = 1. Suppose X= X0 on (D, -9, II)
as n -> Qo. Then each X. may be replaced by an equivalent process X * (i.e.,
X n - X„) for which JI X „* — X O* 11 , 5. 0 as n -* oo. If II X„ — XoII - 0 as n - . is
given, then X„zX0 on (D, 2, II II) as n-*oo can be concluded; in this latter
case we will write either X„ Xo on (D, _9, II II) or ll X„ — Xö 1) -> p 0 as n -* m.

Remark 3. Suppose now that q ? 0 denotes a specific function on [0, 1] that


is positive on (0, 1). In a number of important cases processes X*, satisfying
11 X — X IH p 0 can be shown to satisfy

(20) 11(X*—Xo)/q11-*v0 as n->x.

Now if /i denotes a s-measurable, real-valued function that is (1 • /q-


continuous except on a set A y having P(Xo E A,) = 0, then (20) implies (as in
Theorem 5) that

(21) +G(Xn) -+d 1 (Xo) as n-*oo.

For this reason, the conclusion (20) will also be denoted by writing

(22) XXo on (D, 2j J /q^J) as n->x.

There are a number of such functionals that are highly interesting.

Verification of (4)

We now take up the verification of hypothesis (4) of theorem 1. Let Z be a


process on (D, -9) with increments Z(s, t]. Suppose v is a finite continuous
measure on [0, 1] and let v(s, t] denote its increments. The following lemma
will be used to verify condition (4) of Theorem I in our later treatment of
empirical processes and weighted empirical processes.

Lemma 1. Suppose A. is as in Theorem 1 with t im = i/ m. Suppose

(23) EZ(r, s] 2 Z(s, t] 2 < v(r, t] 2 for all 0<— r<_ s <— t <_ 1.

Then for some universal constant K we have

P(IIZ —
AmZII ^ P(wz(1/m)-E)

( 24 ) I m EZ(k-1, k1 4 + Kv(0, 1]I max Pk-1


L

E k=1 \ m mJ E L1mksm \ m m k11

for all m >_ 1 and all s > 0.


50 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Proof. Note first that for 0 r ^ s <_ r+ ô<_ 1 we have

sup I Z(r, s]i <_ jZ(r, r+ 8]) + sup {IZ(r, s]) A ^Z(s, r+ S])}
rss,^r +s rss^r +s

(a) =IZ(r, r+3]I+L.

Thus

pP( sup IZ( r, s])>_2e)<_P(IZ(r,r+S]J- e) +P(L>_e)


r<-sir +S

(b) <_EIZ(r, r +8]l 4/e 4 +P(L>E) by Markov's inequality.

The Fluctuation Inequality (Billingsley (1971)) below applies to L since

P()Z(r, sl) A Z(s, t]( >_ e) <_ E {Z(r, s] 2 Z(s, t] 2 } /e 4

(c) <_ v(r, t] 2/s 4 by hypothesis,

and thus from the fluctuation inequality conclude that

(d) P(L? e) <_ Kv(r, r+ 5] 2/E ° .

Plugging (d) into (b) gives

C E)Z(r, r +6] ) 4 Kv(r, r+ & ] 2


(e) p — , + 4 for each e > 0.
E E

Now consider any 0 ^ r s ^ 1 such that 0 <_ s — r <_ 1/ m. Then for some
0: i <_ j <m -1 with 0<_j—i<1

i i+1 j j+1
(f) —ter<_— and <_ss—;
m m m m

so that

)Z(r, s]) )Z(i /r, r]I +IZ(jl m, s]) +)Z(i/m,j /m]

(g) <_3 max sup Z(k /m, k/m+s])


0sksm -1 Osss1 /m

for all 01 s — r <_ 1/ m.


WEAK CONVERGENCE = 51

Thus
m-1
P(wz (1/m)?6e)^ E P( sup IZ(k/m, k/m+s]I>_2e)
k=0 Ots^l/m

k —1 , k l 4 k-1
14 j EZ( +Kv( , k ' l by(e)
e k_1 t. \ \ m m
(h)'5i jEZ,
M
k1 ' k
+Kv(0, 1] max v (
{

E k=1 m
m E 1krn m
m J

valid for all m >— 1 and all s > 0. This proof is a special case of Vanderzaanden
(1980) D

Fluctuation Inequality 1. Let T be a Borel subset of [0, 1] and let {X,: t E T}


be a right-continuous process on T. Suppose µ is a finite measure on T such
that
P(IXs—XrIAIX,—Xst ^ A)- µ 2 (Tn(r, t])
A4
for all r <_ s t in T and A > 0. Then

P(L=sup{IXs —Xr jAIX,—Xj: res <_t in T}>_,t)—<Kµ 2 (T) /d a

for all A > 0, where K is a universal constant.

Alternative Criteria for on (D, , II II)

The next theorem is one of the more useful weak-convergence results found
in the literature. It is essentially from Billingsley (1968, p. 128). Generalizations
to processes in higher dimensions appear in Bickel and Wichura (1971).
Basically, it allows for discontinuous limit processes.

Theorem 6. (= criteria) Let X,,, n ? 1, denote processes on (D, 0). Suppose


that for some a> z and b>0 we have
(25) E[IX,,(r,s]X,,(s, 1 ]1] b _-[µ,,(r,s]A.(s,t]] ° for all 0<
_r<_s<t<_1.

[ If X. = X o A. for the A. preceding Theorem 1 and if mesh (T,,) -> 0 as n -r oo,


X

then we only require (25) to hold for all r, s, t in T,,.] Suppose that is a
continuous measure on [0, 1] and either
(26) µ„(s,t]p(s,t] for all 0<_sSt<
_I and for all n>I
or
52 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

( 26 ') µn /.n([O,11) -'a tL/iL([ 0 , 1 ]) as n -+cm.

Then for the metric d of Exercise 2.1.8 we have

(27) {Xn : n? 1} is weakly compact on (D, 2, d).

If, further, X. - *fa. X as n -+ oo with P(X E C) =1, then

(28) X, X on (D, 9, 11 11) as n-+ x.

Exercise 5. Let X be a process on (D, ). Suppose, for continuous,

(29) EIX(s, t]j b <_ µ(s, t] ° for all 0 <_ s <_ t <_ I

for some b >0 and a> 1. Then P(X E C) = 1.

Exercise 6. Show that (9) and (10) hold if for a finite continuous we have

(30) ►
1 m n EXn (s, t] 4 ^ µ(s, t] 2 for all 0 <— s <_ t ^ 1.

4. WEAK CONVERGENCE OF THE PARTIAL-SUM PROCESS S.

We will let

(1) X, X1 , X2 , ... denote iid (0, 1) rv's

with partial sums

(2) S, X, +• • • +X; for i1

and So 0. Define a random function S on [0, oo) by

(3) S(t)=S(,> for t?0.

The nth partial-sum process S. on (D, 9) is defined by

S(nt)
S(t)= ^_ for0^t<_1
n

Si i i+l
(4) _ for—<_t<—and0<_isn;
n n

see Figure 1. Recall that S denotes Brownian motion.



WEAK CONVERGENCE OF THE PARTIAL-SUM PROCESS S. 53

•tom
X 2/.r'
'
i- ^ i

1 2 3 4 ••• n-1 1 t
n n is is is

Figure 1. The partial-sum process S.

Theorem 1. (Donsker) For iid (0, 1) rv's X,, X 2 ,..., we have

(5) § n ='Son(D,2,II II) as n-oo.

Proof. It is left to the routine Exercise 1 below to show that S n fd § as


n - oo. It now suffices to verify (2.3.4). Now for n sufficiently large

mn=P(II§° — Am°Sn )
(a) <_ mP(IISn110 m> E)
(b) <— 2 mP(IS(n/m>I/''— E/ 2 )
by Levy and Skorokhod inequality (Inequality A.2.4)
= 2 mP(IS(n/m)I/ ..In/m > e.lm/2)
(c) -^2mP(N(0, 1)>e./m/2) as n-oc by the CLT
(d) -0 as m-+oo for all E>0.

Thus (2.3.4) holds, and § n =.S on (D, 2, II I1).


There is an alternative to the verification of (d). We could instead appeal
to our weak-convergence criteria theorem (Theorem 2.3.6) with a = 1, b = 2,
Tn = {0, 1/n, 2/n, ... , (n —1)/n, 1} and p, denoting Lebesgue measure on
[0, 1]. By independence, for r_< s <_ t in T,,, we have

ES(r, s]S(s, t] = ES„(r, s]E§(s, t]


(6) =(s—r)(t—s)=µn((r, s])/-.((s, t]).

Thus Theorem 2.3.6 applies, and yields S=S on (D, .9, II II )• The original
paper is Donsker (1951). El
54 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Exercise 1. Verify that S. -> f. d .5 as n - oo.

Our next theorem spells out what the conclusion (5) means when reexpressed
in a most convenient form.

Theorem 2. (Skorokhod's construction of S „) There exists a construction


of a triangular array of row-independent rv's X,, 1 ,.. . , X,,,,, n >_ 1, all distributed
as X and of a Brownian motion S on (C, 16) for which the partial-sum process
S. of X „,, ... , X „„ satisfies

(7) II S n — S 8 5 . .0 as n - oo for this Skorokhod construction.

Proof. Applying Corollary 2.3.1 to our proof of Theorem I gives processes


S„ = S„ and S ° S on (D, ) for which

(a) II §n — °
§ 1Ha.g.0 as n -+oo.

We obtain the required rv's X 1 ,.. , X,,, by defining

(b) 1)/n)] for 1 <i<_n.

Now let S denote the partial-sum process of X 1 ,.. . , X*„ and S = S ° .


Let .9° denote the collection of all functions in D that are constant on
[(i —1)/ n, i/ n ] for 0 <_ i ^ n. Now since S„ = S,,,

P(S„ E 3) = lira P(S (j /n K +i ) = S^(i /n)


K- co

for in K <j <(i+1)n K and 0:5 i s n)


= lim P(§,,(j/n K )=S.(i /n)
K-.00

for in K <j<(i+1)n K and 0 <_ i <_ n)

=P(S„EY)
(c) =1.

Thus our construction of S satisfies

(d) S=S a.s.

Of course (d) implies S = 5,,, and (d) and (a) combine to give

(e) II5n*-5öII_*a.5.0 as n -oo.

Thus the theorem holds, where we suppress the *'s in its statement. ❑
WEAK CONVERGENCE OF THE PARTIAL-SUM PROCESS S, 55

Exercise 2. (O'Reilly, 1974) Let

(8) q?0 be J and continuous on [0, 1]

and define

(9) T T(q, A) J o
t - ' exp (—Ag 2 (t)/t) dt.

Then we have

(10) II(S„— S) /qII -+ P 0 as n-> 'i for Skorokhod's construction

if and only if

(11) T(q, A) < ao for every A > 0.

(A simple proof of this theorem for the special case of square integrable 1/q
is given in Section B.2.)

Other Special Constructions of S.

The Skorokhod construction of Theorem 2 is deficient in that we arrived at the


conclusion (7) for a triangular array of row-independent rv's. However, we
know absolutely nothing about the joint distribution of any two rows. This
means that results involving the a.s. convergence of functionals of the original
sequence (1) cannot be established via a proof based on (7). Thus other
constructions were sought.
An important result was the Skorokhod embedding of S. into Brownian
motion. (This is discussed carefully in Section 5; though superseded, it is still
heavily referred to in the literature.) The upshot in Strassen's (1964) result
that it is possible to make a special construction of a Brownian motion S and
a single sequence of iid (0, 1) rv's X 1 , X2 ,... with df F whose partial-sum
process S satisfies

(12) IIS* S(nI)/✓ nII/b„ -a.s.0


— as n-oo

for b„ =J2 log 2 n. Note that S(nI) /f is also a Brownian motion, and then
compare (12) and (7).
In Section 7 we will discuss a different type of construction in which a best
rate of convergence is also established. The best result is given by the Hungarian
construction of a Brownian motion S and a single sequence of iid (0, 1) rv's
X 1 , X2 ,... with df F whose partial-sum process S„ satisfies

( 13 ) IIS* §( nI)/.JnII = 0((log n)/'Ji) a.s.



56 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

provided the df F has a moment generating function that is finite in a


neighborhood of the origin. A careful treatment of the Hungarian construction
requires the Wasserstein distance of Section 6.

The Partial-Sum Process of the Future Observations

For the iid (0, 1) rv's X,, X2 ,... of (1) we define

n Sk n n
(14) „(t)= k S for and k >_n
k+1<t<_k

with partial sums Sk = X, + • • • + Xk and with 9(0) = 0. Then ^„ is a random


element on (D, ) ; see Figure 2.

:n
n=4

-- 0 S.

-4

~ n -
n+3 '-- n+1
n+2 - A-- 1
—4

Figure 2. The partial-sum process 9„ of future observations.

Theorem 3. For iid (0, 1) rv's X,, X2 ,..., we have

(15) &„ S on(D,^,I ) asn-*co.

Exercise 3. Prove Theorem 3.

Remarks

Functional analogs, for partial sums, of the Berry- Esseen theorem are con-
tained in Nagaev (1970) and Aleskjavicene (1977).

5. THE SKOROKHOD EMBEDDING OF PARTIAL SUMS

Let X denote a ry having mean 0. Let S denote a Brownian motion on [0, co).
Our first objective is to sketch out how to define a ry r? 0 having the property
that

(1) S(r)X.

This is the basic building block of Skorokhod embedding.


THE SKOROKHOD EMBEDDING OF PARTIAL SUMS 57

The Strong Markov Property

Let X denote a process on the measurable function space (DR+, R*).

Proposition 1. If 0 <_ r < oo is a rv, then so is X(T).

Let {si,: t >— 0} be a collection of sub-Q-fields associated with the underlying


probability space (11, .^ P) such that .. c si, for all 0<_ s < t. A ry 7? 0 is
called a stopping time with respect to the .I,'s if [-r <_ t] E , for all t ? 0. We let

d,={AES/I: An[r<t]Ed, for all t>0}.

If the process X on (D R S, 1 R +) is such that the o,-field o{X(s): s <_ t] generated


by the rv's X(s) with 0 s s <_ t satisfies o [X(s): s <— t] c .s/I, for all t ? 0, then X
,

is said to be adapted to the s4,'s.

Proposition 2. If X is adapted with respect to the sd,'s and T is a stopping


time with respect to the .^ 's, then .fi r is a a -field and both the stopping time
-

r and X(r) are sa? z measurable.

Theorem 1. (Strong Markov property) Let X on (DR *, 2 R -) be adapted to


{si,: t >0} where sad, is 2' in t. Suppose X has stationary and independent
increments and that X(t, t + h) is independent of s/I, for all h >0. Let r be a
stopping time with respect to the s?,'s, and suppose P(r < oo) > 0. For t ? 0
we define

RC(t+T)—X(T) on [T<oo]
(2) ^'(t)
t o on [T=«t].

Then Y is a process on (DR *, 'R) and

(3) P(V E Bl[r <oc]) = P(X E B) for all B E g R +.

Moreover, for all B E 2 R + and all Ac s^ T we have

(4) P([V E B] r Aj[T<oo])= P(4'E BI[z <oo])P(AI[T<oo]).

Thus if P(T < oo) = 1, then V = X and V is independent of .s/I T .

We refer the reader to Breiman (1968) for these results.

Exercise 1. (Blumenthal's 0-1 law) Let A be in the o,-field s:Qo =(1 E ^ o s:^ E
,

where d, = o-[S(s): 0 s ^ t] and S is Brownian motion. Show that P(A)


equals 0 or 1.
58 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Gambler's Ruin Problem

It is an easy exercise to verify that

(5) S(t), t?0, is a martingale,


(6) S2(t) — t, t >_ 0, is a martingale,
(7) exp (cS(t) — c 2 t 2/2), t>_0, is a martingale for any —oo < c < oo.

Now the ry

(8) ra.b= inf {t: S(t)>_b or S(t)s—a} for fixed a, b >0

is a stopping time with respect to the a-fields sf, = Q[S(s): s < t]. We take as
known the fact that P(ra, b <00) =1 and that if Z denotes any of the three
martingales above, then the optional sampling theorem of martingale theory
gives EZ(ra, b ) = EZ(0). If we let Pa = P(S(Ta.b ) = a), then application of
optional sampling to (5) yields 0 = ES(0) = ES(ra, b ) = — apa + b(1— pa ); this
implies

(9) P(S(TQ,b) = a) = b /(a + b).

Applying optional sampling to (6) gives 0 = E(S 2 (0) —0) = E(S Z (ra, b ) — ra, b )
(— a) 2 b /(a +b) —b 2 a/(a +b) —Era, b = ab —Era, b , so that

(10) ETa, b = ab.

Exercise 2. Verify (5)-(7).

Exercise 3. Show that

Era, b <_ Cr ab(a +b) 2 ' -2 for all r >0

where 4rI'(r) works for C, when r -1.

Construction of 'r Having S(r) = X

We now give a formal statement of (1) in the next proposition; Exercise 4


sketches its proof.

Proposition 3. If X is a ry with mean 0 and df F, then there is a stopping


time r such that S(r) = X. Moreover,

(11) Er= Var[X];


THE SKOROKHOD EMBEDDING OF PARTIAL SUMS 59

and for r? 0 we have


2r
(12) ETr^C,EIXI

for some constant C r .

Exercise 4. Let (A, B) be independent S with joint df H having

(13) dH(a, b) = (a+b) dF(a) dF(b)/EX + for a, b?0.

Observe (A, B) = (a, b) according to H, and then observe Ta , b ; call the result
of this two-stage procedure r. Show that T is a stopping time, and then use it
to prove Proposition 3. (This T was taken from a W. J. Hall seminar in 1967.
There are many other ways to specify a suitable r; see Root, 1969 for a nice
one. See also Breiman, 1968.)

Embedding the Partial-Sum Process in Brownian Motion

Let F denote a df having mean 0. Let a Brownian motion S be given; note


that S is adapted to the collection of si, = v[S(s): 0 s <_ t]. Now let T 1 denote
the stopping time described in Exercise 4. Then

(14) X, = S(r) has df F and is .sir,-measurable

by Propositions 2 and 3. Note that ET, = Var [X,] by (11). Also T, is a.s. finite
(since Ta , b is), so that the strong Markov property of Theorem I shows that

(15) §'(•)=S(•+T l )—S(T,) is a Brownian motion independent of X.

Because of this independence, we can now repeat the above procedure obtain-
ing a ry T2 such that

(16) XZ - S'(TZ ) has df F and is independent of X,.

Note that

(17) X,+X2=[5(T2+r1)—S(T1)]+S(T1)=S(T1+T2).

Continuing on in this fashion we obtain:

Proposition 4 Let F be a df having mean 0 and variance Q Z E [0, ac]. Then


there exists a sequence T,, r 2 ,... of iid nonnegative rv's for which

(18) X S(To+T,+• . •+T1) — S(T0+TI+• • •+ri-1 ) for i =1,2,...


60 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

(with To ° 0) are lid F. Note that

(19) ET =Var[X,] =(T 2 .


;

Let us now form the partial-sum process S of the rv's of Proposition 4.


Note that
X, ^ ... ..
+X( nt) = S(r0 +T1+
.. .
„t)) +T(
(20) „(t) J for 0<—t <_1.
n n
We say that we have embedded the partial-sum process of the iid (0, 02 ) rv's
with df F in Brownian motion.
We now suppose that o2= 1, so that

(21) Er = Var [X] = 1.

The LLN suggests that T, + • • • + T = nt. Thus we see reason to hope that
(

the partial-sum process S of (20) is well approximated by the Brownian


motion §(nt) /V for 0 <_ t -1; that is, we suspect that Skorokhod's embedded
partial-sum process S of lid (0, 1) rv's

§(O +Tt+ ...


+T (
„ r ))
(22) S. _ (with iid mean 1 ry's T,, T Z ,...)
✓n
satisfies

(23) S* =S(nI)/. n where S(nI) /s=S.

Theorem 2. (Strassen) Let F be a (0, 1) df. Then Skorokhod's embedded


partial-sum process S* of (22) satisfies

(24) IIS —5(nI)/'1 nII /b a.s. 0 as n -cc,

where b„ = I 2 log 2 n. If further J .0 x 4 dF(x) <cc, then

II§*-S(nI)/clI
(25) Thu
[n , (log 2 n) log 2 n]II,<oo a.s.

Exercise 5. (Jain et al., 1975) Show that

(26) IIS ' —S(nI)/-JnII = o(b;, s ) a.s.

provided

x 2 [Iog 2 (3 v Ix)] s dF(x) <oo with 3 _> 0.


THE SKOROKHOD EMBEDDING OF PARTIAL SUMS 61

Exercise 6. (Breiman, 1967) Show that

IS(n)—S(n)I+
(27) a.s. 0 if EIXI'<ao for 2<r<4.
n h/ r Ilog n

Use this to establish a rate between the rates of (24) (when r = 2) and (25)
(when r'= 4). Breiman also proved (26) for S = 2.

Exercise 7. (Breiman, 1967) There exists a (0, 1) distribution such that

IS(n)—S(n)I
lim >0 a.s.
n-.=

Thus it is impossible in general to omit the b n from (24).

Exercise 8. (Keifer, 1969) Follow Kiefer to evaluate the lim sup in (25). Its
value is a function of the particular stopping time used.

An Alternative Construction

Suppose now that X = (0, 1) with df F, and we now specify T via Proposition
3 so that

(28) S(T)=X/./i with ET=Var[X/Vi]=1/n.

Then repeating the Skorokhod embedding scheme leads to a triangular array


Tnn of row-independent identically distributed rv's such that

(29) Xni/ " =— [S(r0+ • • •+ri)—S(Tno+• . • +Tn1-1)] for i =1,2,...

(with T,,, = 0) are row independent with df F. The partial-sum process S* of


Xnn satisfies

X.I +' ' '+X.n


(30) S (t)= =S(0+•rn,+ •+rn(n , > ) for 0-t^1,

so that we now hope that


(31) S*=S.

We now make the degree of approximation in (31) precise.

Theorem 3. (Breiman) Let F be a (0, 1) df. Then Skorokhod's embedded


partial-sum process S* of (23) satisfies

(32) II§n—SII- O as
62 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Note that the triangular-array character of Theorem 3 is the same as that


of Theorem 2.4.2. However, the Brownian motion S in (32) is simpler than
the Brownian motion S(nI) /V in (24), (25), and so on.
Proof. Note that

(a) IIS„ —SIl = sup JS( 0 +T„1+ ... +T„(„r>) — §(t)I+


O^fsI

and since each sample path of S is uniformly continuous, it suffices to show


that

(b) An= sup 1 0+T,,+ ... +T„(„ f ) — tI ' p 0.


O5t51

We will establish (b) circuitously.


To this end note that Enr„, = 1, and now let T,, Tz , ... be iid rv's such that

(c) T, = nT„,.

We thus note that


+T( „,> I
0+T, +... _
(d) O* = sup t = D „.
I ostsl n

Thus (b) will follow if we show that tln - p 0; we will in fact show that

(e) O*n ^a.s. 0 as n- 00.


But (e) is immediate since for each fixed t E [0, 1] the SLLN gives

(f) (T, +...+T(„n)l n a.s.t,


4

where the functions on both sides of the arrow in (f) are A. [Note the proof
of the Glivenko-Cantelli theorem (Theorem 3.1.3), which uses this idea also.]
This is from Breiman (1968, p. 279). ❑

Exercise 9. Establish moment conditions on F that guarantee -* a .,.0 in (32).


Can you establish a rate of convergence?

6. WASSERSTEIN DISTANCE

We wish to define a metric on

(1) S2= j F: F is a df such that J . x 2 dF(x)<oo }.


l ^ JJJ
WASSERSTEIN DISTANCE 63
First, recall that if X = F, then X = F - '(f) by the inverse transformation; thus
J [F - '(t)] 2 dt = EX 2 =f -0 x 2 dF(x). If X = F and Y G, then it would seem
that X ° — F'() and Y° = G - ' (same ) are as close together as any two such
rv's can be. We therefore define

(2) d2(F, G)=


i

0
[F-(t)—G-(t)]2 dt
I 1/z

and note that the above remarks show this to be finite for F, G E 2 • It is clear
that d2 (F, G) >_ 0 with equality if and only if F = G, that d2 (F, G) = d2 (G, F),
and d2 (F, H)<d2 (F, G) + d2 (G, H) by Minkowski's inequality; thus

(3) ( Z , d2 ) is a metric space.

Theorem 1. (Mallows) For F,, F2 ,... in 2 we have

(4) d2(F,, F)-->0

if and only if both F. -* d F and J x 2 dF„ (x) x 2 dF(x).


JT
Corollary 1. If X 1 ,.. . , X. are iid F in . 2 , then

(5) d2(ß, F)-*0 a.s. as n -*oo.

Proof. Suppose J x 2 dF„ (x) -^ J x 2 dF(x) and F. -* d F. Let r >0 be given.


Choose S = S e so small that

(a) J [F-'(t)]2dt<e.

Now (1.1.31) allows application of the dominated convergence theorem to


conclude both

(b)J [F'(t)—F'(t)]2dt<r

and

(c) ^^ s [F„'(t) 2 —F - '(t) 2 ] dt ( <E


s

for all n >_ some n. Convergence of second moments implies that for n ? some
n °_ s we have

(d) ^ J[F(t) 2 — F'(t) 2 ] dt I <e.


0
64 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Combining (a), (c), and (d) gives

[Fn'(t)]z dt <3e
(e) [s,`
J1 -7

for n ? n,, s = n', s v n. Thus (a), (b), (e), and Minkowski's inequality give

(f) di(F,F)= J 1
0
[Fn'(t)— F - '(t)J 2 dt<e +(v +^) 2 <9e

for n >_ n; that is, d2 (F, F) -* 0 as n-* oo.


Now suppose d2 (F, F) -*0. Applying Minkowski's inequality twice gives
1/2 > 1/2 2

(g) dz(F,,, F)^{[ (t) z dt]


J 0 F _ J0 [ F -1(t)z dt ] }
/z l z '

J'
—{[J co X2dFn(x)]1 /2—[J x2dF(x)

so that J x z dF(x) - j x z dF(x). Assume F. -' d F does not occur. Then at some
continuity point t o of F we have F„•(to ) -* a 0 F(to ) on some subsequence
n'. This is enough to imply d2 (F, F) -A 0, which is a contradiction. Thus
F„ d F. O

Exercise 1. (p2 , d2 ) is a complete metric space (see Dobrushin, 1970).

Theorem 2. (Mallows, 1972) Show that d2 (F, G) = inf E(X — Y) 2 when the
infimum is taken over all jointly distributed X and Y having marginal df's F
and G.

Exercise 2. Prove Theorem 2.

Exercise 3. (Dobrushin, 1970) Define

(6) d,(F, G)=


Jo
i jF -1 (t) — G - '(t)1 dt

for F, G in = {F: F is a df such that ,( . lxi dF(x) <oo}. Show that d,)
is a complete metric space. Note from geometrical considerations that

(7) d1(F, G) =
J m
IF(x) — G(x)l dx.

Show that d,(F, G) = inf EIX — Yl when the infimum is taken over all jointly
WASSERSTEIN DISTANCE 65

distributed X and Y having marginal df's F and G. Show that for F,, F2 ....
in

(8) d,(F", F)-0

if and only if F" -* d F and J ixi dF" -> J ixi dF as n -^ oo.

Exercise 4. (Major, 1978) Show that if 0 is convex, then

(9) inf E(X—Y)=J1 #i(F - '(t)—G - '(t))dt


0

when the infimum is taken over all jointly distributed X and Y having marginal
df's F and G.

Exercise5. (Bickel and Freedman, 1981) Consider a separable Banach space


with norm II II • For fixed p> 1, let 5 denote all probability distributions P
on the Borel o-field for which j jx dP(x) <ao. Let dd (P, Q) denote the
infimum of (EIIX — Yii) ° over all Borel-measurable rv's X and Y having
X = P and Y = Q. Show that dp (P" , P) -> 0 as n --3- co is equivalent to each of

(10) PP =P and • iixJJ°dP"(x)4 J JJxij°dP;

(11) P=P and x1i° is uniformly P. integrable;

f f dP^ --), f dP
(12) f
for all continuous f having f(x) = O(iixiI ° ) at infinity.

Exercise 6. (Mallows, 1972) Suppose X 1 ,.. . , X. are independent with df


)
F having mean 0. Let F"' denote the df of (X, + • • • + X")/f. Likewise, G'" )
denotes the df of (Y, + • • • + Y" )/V for an independent sample from df G
having mean 0. Then

(13) d2(Ft " >, G( " ) ) <_ d2(F, G).

Bickel and Freedman discuss this carefully, on Hilbert spaces.

Exercise 7. (Bickel and Freedman, 1981) Show that 1IF— G il <S implies
that {t: IF - '(t)—G - '(t)i>'} has Lebesgue measure less than v.

Exercise 8. The Prohorov distance between F and G is d (F, G)


inf { e > 0: F(B) < e + G(BE) for all Borel sets B} where Be = { y: x — yJ e for
some xE B}. Show that d(F, G)^[d,(F, G)]" 2 (see Dobrushin, 1970).
66 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Exercise 9. Let X, X2 ,... be iid F where Fe Jw,. Show that

(14) d, (C=, F) -+ 0 a.s. as n -> x .

7. THE HUNGARIAN CONSTRUCTION OF PARTIAL SUMS

Using the Skorokhod embedding method, the rate of convergence of (2.5.25)


cannot be improved on. However, Csörgö and Resesz (1975a) introduced a
fundamentally different method of constructing the partial sums from
Brownian motion. This method was considerably refined and honed by Komlös
et al. (1975, 1976). Fundamental to this approach is the concept of defining
random variables to be as close together as possible; recall (2.6.2). We now
state their results using the notation of Section 2.4. Thus S„ = X, + • • • + X„,
S(t)=S< , > and S.(t)=S(nt)/Vi. Recall that

(1) S(nI)/✓n=S.

Let X denote a (0, 1) rv. The Hungarians succeeded in defining iid X rv's
X,, X 2 , ... and a Brownian motion S on a common probability space so that
the following results hold: The partial-sum process S of X,, ... , X. satisfies

(2) II§*—S(nI)/,finII- p O as n-^o0

while

(3) II§„ — §(nI)/1nII /b,, _>a.5.0 as n- co

with b„ = 2 log e n. However, for any A. -+ m there exists a (0, 1) df F for which

(4) Im A „
I .^n — S(nI) /v /!I —
b = oo a.s.
n

whenever X,, X2 ,... are iid F; thus (3) is indeed the most that can be
established under the (0, 1) assumption. See Major (1976a, b) for all results
of this paragraph. Theorem 2.6.2 provides some of the basic motivation; consult
also Major (1976b).
In the next paragraph we shall see that under additional moment assump-
tions, more can be claimed about the uniform closeness of the special S„ to
S(nI)/'.
The best possible rates are obtained when we assume that

(5) X =(0, 1) and has a finite moment generating a function in a


neighborhood of 0.
THE HUNGARIAN CONSTRUCTION OF PARTIAL SUMS 67

We may then suppose that

* log n
(6) lim (^^ n —^(nI)/^^^ <some M <ao a.s. when (5) holds.
n- w

In fact, for all n >_ 1 and all x we have

(7) P( max S*(k)—S(k)i>c 1 log n+x)<c 2 exp(—c 3 x),


1<ksn

where c 1 , c2i C3 > 0 depend only on the df F of X and where c 3 may be taken
arbitrarily large by taking c, large enough. Moreover, this construction is the
best possible to the extent that whenever F is not a N(0, 1) df, then no matter
what construction is used we have

^S*(n)—S(n)^
(8) tim ^ some c> 0 a.s.,
n-.ao log n

where c depends on F. Thus the rate in (6) is the best possible. Moreover, the
hypothesis (5) is essential in that whenever it fails, the construction on which
(6) is based satisfies

(9) Tl IS*( = a.s.


log n (n)I
See Komlös et al. (1975, 1976) for all results of this paragraph, and for (11)
below with r> 3.
Lesser rates rates are available when we assume only

(10) X m (0, 1) and EIXV <oo for some r>2.

We may then suppose that

(11) JAS „ —S(nI)/,/n11=o(n -1 / 2 + 1 /r ) a.s. when (10) holds,

while we necessarily have

S*(n)—S(n)
(12) iii — =00 a.s. if EIXI'=oo with r>2.
n -. n

Moreover, for all n'^' — x <_ n log n we have

(13) P( max IS*(k)—S(k)i>x)=o(n)/x'


1^ksn

if EjXl'<ao with 2<rs3.


68 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Major (1976a, b) established (11) and (13) for 2< r ^ 3, as well as results based
on E(X 2 g(X)]<oo. See Breiman (1967) for (12).

Partial Sums on 10, ao)

Exercise 1 (i) Suppose the df F has mean 0, variance 1, and a moment


generating function that is finite in some neighborhood of the origin. Then
the Hungarian construction of iid rv's X 1 , X2 ,. . . with df F yields a partial-sum
process S n that satisfies

—S(nt)/./iI / log n
(14) lim max § "(t)
n -i00 {t:t= k /n,kz0} I v logt Vn

(ii) Show that (14) implies

(15) IISn — S(nI)/ ✓n)/g110 -^ P 0 as n-*co,

where q(t) = [(e 2 v t)1og 2 e 2 v t]' /2 for t > 0. [In fact, Müller, 1968 showed that
(15) holds for the Skorokhod construction of the partial-sum process of iid
(0, 1) rv's.]

Example 1. If X,, X2 ,. .. are iid (0, 1), then

(16) P(' max Sk / k > x) -^ 2P(N(0, 1) > x)


kn

as n-+ oo for all x>0.

Proof. Let i(f)= sup,y l f(t) /t, and note that both +/j(S) and çfi(S(nI)/./)
are rv's that are a.s. finite. Also

(a) (§n)—+G(S(nI)/Vi)I ll[§" — S(nI) / ✓n]/( 1 v I)II' -r 0

by (15). Thus iIi(§n)—iI(§(nI) / ✓n) ->p0. Thus G( §n) * d +y(S). Now +'(Sn)=
v n maXk zn Sk /k. Finally,

P(ç!i(S)> x) = P(sup S(t)/t> x)


tit

(b) = P( sup sS(1/s)> x)


o<s^I

(c) = P( sup S(t)>x) since IS(• /I) _ S by Exercise 2.2.8


o <I 1

(d) =2P(N(0 ,1)>x) by (2.1.5).


RELATIVE COMPACTNESS - 69

The key to this example was step (c), which rests on the fact that IS(• / I) is
the natural time-inversion dual process of S. 0

See Müller (1968) and Wellner (1978a) for additional examples in a similar
vein.

8. RELATIVE COMPACTNESS M ►

Let X,, X 2 ,... denote processes on a probability space (11, d, P) that have
trajectories belonging to the set M. Let ' denote a subset of M, and let 6
denote a metric on M. Suppose there exists Ac d having P(A) = I such that
for each w E A we have:

(i) Every subsequence n' has a further subsequence n" for which X,,.•(w)
8-converges (or is S-Cauchy).
(ii) All 6-limit points of X(o) are in
(iii) For each h E' there exists a subsequence n' = n,,, w such that
S(X,, (m), h)-0 as n'-*co.
,

When these conditions hold we say that X. is a.s. relatively compact with
respect to S on M with limit set Z, and we write

(1) X„ — X a.s. wrt 8 on M.

We can summarize this definition by saying that a.s. x„ is 6-relatively compact


[condition (i)] with limit set C [conditions (ii) and (iii)].

Remark 1. The prototype example of N• ► is a slight strengthening of the classic


LIL. If X 1 , X2 ,... are iid (0, 1) rv's, then the classic LIL gives

(2) li m —- -= 1 a.s. and lim ^b = —1 a.s.


n n-+ao „

for S„ = X, + + X,, and b = 2 log e n. In fact, this can be strengthened to

(3) Ibn ^^► [-1, 1] a.s. wrt l on R.

This latter result has a natural functional analog for the partial-sum process
S„, as defined in (2.4.4). Specifically, Strassen's (1964) landmark paper showed
that

(4) §„ M► ^l• a.s. wrt II II on D


fib„
70 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

where
(5) 11—{k: k is absolutely continuous on [0, 1] with k(0)=O and

J0
" [k'(t)] 2 dt

Our aim in this section is to become comfortable with LIL-type of proofs


and the concept of M►, and to set up criteria that will allow us to establish M►
of the empirical process later on. To this end we will first give a detailed proof
of the classic LIL that is based on Skorokhod's embedding; the virtue of this
approach is that the details of the calculations are particularly clear and
transparent. We will then extend the LIL to (3), and its multivariate generaliz-
ation. We will then derive criteria for establishing +. These are particularly
tailored to empirical processes, and are not very convenient for the partial-sum
process S n . For this reason we use the exercises to present alternative criteria,
and to then verify (4).

The Classic LIL for the Normal Processes S and U, and for Sums of lid rv's

The next proposition puts on record the proof of the classic LIL in a very
simple case where details do not cloud essentials. (See Brieman, 1968 for the
approach of this subsection.)

Proposition 1. Let Z,, ... , Zn be iid N(0, 1) rv's. Let Sn = Z, + • • • + Zn and


bn = 2 log e n. Then
Sn
hrn ^ b =1 a.s.
n

Proof. Let e > 0. We need the exponential bound

(a) exp(— (1 + E ) A2/2) <_P(Sn /v >_A)<_exp(— (1 -6)A2/2)


for all A >_ some A (see Mill's Ratio A.4.1), and the maximal inequality

(b) P( max Sk >_A)<2P(Sn ?A)


lsksn

for all A > 0 (see Levy's Inequality, A.2.8).


Let n k = (a k ) for a> 1; a sufficiently small a will be specified below. Now

(c) Ak=[ max S,„ >-(1 +2£)b m


nk -i mint ]

max S,,,?(1 +2e) k n )>


nq_, m5nk nk_'
nk ^ b k _'
since fn is 1' and b n is 2',
RELATIVE COMPACTNESS 71
so that for k sufficiently large

n - ' ✓n k b„ k _ I )
P(A k )<_ 2P(S„ k ?(1+2e) k by (b)
nk

s2exp(- 2(i— e)(1+2e) ZI a E 2log k) by (a)

-5 2 exp (—(1 + e) log k) for a sufficiently small


(d) =2/k' +E

= a convergent series.

Thus P(A k i.o.) =0 by Borel-Cantelli. Since e >0 is arbitrary, we thus have

(e) 1TVnb
i 1 a.s.
n

We also note from (e) that

(f) P(Ak i.o.) = 0 for any (large) positive a.

We must now show the iii in (e) is also ?1 a.s. We will still use n k = ( a k );
but a will be specified sufficiently large below. We write

(g) S, = Snk-f + ( Snk — Snk _ I ).

Now the events

(h) Bk = [S„ k — S„ k _, (1— 2e)/b„ A ] are independent

and

(i) P(Bk) >_exp( -2 (l+e)(1-2e )2


nk nnk b2 by (a)

exp —2(1 +E)(1- 28) z( Q+el a 2logk 1


exp (( —(1 — e) log k) for a sufficiently large
(j) =1/k
= a series with infinite sum,

so that P(Bk i.o.) = 1 by the other Borel-Cantelli lemma. But P(A k i.o.) = 0
72 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

and P(Bk i.o.) = I means


(k) P(Ak n Bk i.o.) = 1.
Moreover, on Ak n Bk we have [using (g), (h), and (c) with symmetry]
1+3E
(1) "bnk >_— +(1 -2E)?(i -3e)

for a specified sufficiently large. Thus


Sn k
a.s.
(m) ky= k nk '-1
^b

Combining (e) and (m) gives the proposition. ❑


Of course, the LIL for Brownian motion follows almost immediately from
Proposition 1.

Theorem 1. (LIL for Brownian processes)

(6) lim S( t) =1 a.s.


t-^= 2t loge t
S(t)
(7) lim =1 a.s.
to 2t 1og 2 1 /t
U(t)
(8) lim =1 a.s.
do 2t log 2 1/t
Proof. We first observe that
S((t))
= a.s.
(a) cl> 2(t)1og2 (t) —1
by Proposition 1, and that
(b) 2(t) log 2 (t) 2t log 2 t as t - x.
Also, for a sufficiently large,
(c) P( sup S(t)—S(n)^>_ 2t log 2 t)
wstsn+l

:4P(N(0, 1)>e'/2n log2 n) by Exercise 2.2.3 and (2.2.6)


s 4 exp (—e 2 n log e n) by Mill's Ratio A.4.1
= 4 (log n) - ` Zn
(d) = a convergent series,
RELATIVE COMPACTNESS 73
so that the event A n of (c) has P(A n i.o.) = 0 by Borel-Cantelli. Thus (a),
(b), and P(A,, i.o.) = 0 show

(e) 5(1) — / 2(t)log 2 (t) §((t)) S(t) §((t))


2t log 2 t V 21 log 2 t 2(t) log 2 (t) + I 2t log2 t

has lim sup equal to v'T 1 + 0 =1 a.s. as t -* oo. Thus (6) holds.
For (7) we use the time reversal Exercise 2.8. Thus

t§(1/t)
I as lim by time reversal and (6)
-.= 2-t log t t
S(r)/r
(f) =lim letting r= 1/i.
r-•o (2/ r)1og 2 1/ r

That is, (7) holds.


Finally, (8) follows immediately from (7) and the representation of U(t)
as S(t)—tS(1) given in Exercise 2.2.1. ❑

We can now use Skorokhod embedding (2.4.12) to extend the LIL for
Brownian motion to the general LIL for an lid sequence.

Theorem 2. (i) (Hartman-Wintner LIL) Let X 1 , X2 ,. . . be iid (0, 1). Then

S
(9) lim _b =1 a.s.,
n

where Sn = X, + • • +X,, and bn 2 log 2 n.


(ii) (Strassen's converse) If X 1 , X2 . . are iid, then
,.

(10) P(lim Sn /(vbn )< oo)>0 implies EX 2 <oo and EX =0.

Proof of (i) of Theorem 2. Let X, Xi',... and S denote the specially con-
structed random quantities of (2.4.12). Note that

Z S(n)—S(n-1) are iid N(0,1) rv's for n-1.


Thus our proof of Proposition I gives

(a) lim S(n) / ✓nbn = lim (Z,+• • +Z)/.b n = 1 a.s.


n- W n—oo

Applying (a) and (2.4.12) gives


(b) lim S (1) /bn = 1 a.s.;
74 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

that is (9) holds for the special rv's X, X Z , .... But (9) is a property
determined solely by the finite-dimensional distributions of the X; 's, and these
agree with those of the X*'s. Thus (9) holds. [The other half of (2) follows
by symmetry.] The original proof was given in Hartman and Wintner (1941).

Exercise 1. Prove (10). See Strassen (1966).

In the next subsection we will improve on the LIL (2) by establishing the
M► of (3).

An Example of Relative Compactness

Theorem 3. (i) (LIL) Let X,, X2 ,... be iid (0, 1) rv's and let Sn
X, + + X. Let b n = 2 log e n. Then (9) can be strengthened to

(11) S„/(fb„) M► [ -1,1] a.s. wrtI on R.


(ii) (Multivariate LIL) Let X,, X2 be iid m- dimensional random vectors
with zero mean vector and identity covariance matrix. Let S. = X, + • • • + X,,.
Then

(12) S„ /(fib„) N•► Bm a.s. wrt (I on R.;

here I I denotes ordinary Euclidean distance on R,„ and B.


f x E R.: jxj = f ;” x; < 1} is the unit ball.
Proof. (See Finkelstein, 1971) Let Z. = S„ / -b„ and let "a" denote a vector
in R. with IaI = 1. Now a'X,, a'X2 , ... are iid (0, al 2 ), so that

(a) 1 =dal = tim a'Zn a.s. by the classical LIL


lim Ia! !Z„I by Cauchy-Schwarz
=film lZ„i.
Thus lim JZ, >_ I a.s. Suppose lim IZ „i =1 + e. Recalling that a'Z„ is the projec-
tion of Z. onto a and noting that um a'Z„ is a.s. constant by the Kolmogorov
0-1 law, we thus see that there exists a vector a o with ga o l = 1 for which
Iim ao Z„ =1 + e' with 0< e' r; but (a) then implies e' = 0. Thus

(b) lim JZ„1 = 1 a.s.

Let C,„ ={x€ R.: IxI =1} denote the surface of B.. Let an C„, be arbitrary
but fixed; recall (a). Let 0< e <_ 1 be arbitrary. Now a.s. for infinitely many n
the vector Z„ satisfies a'Z„ ? 1- E by (a) and 1 4 1 ^ 1 + e by (b), so that

IZ„ -a^ 2 = iZn I 2 +lal 2 -2a'Z„ <5e for infinitely many n


RELATIVE COMPACTNESS 75

obtains a.s. Thus a.s.

(c) the set of limit points of Z„ contains C m .

Now let IT project R m+ , onto R m by IT(x,, ... , x m+ ,) _ (x,, ... , x m ). Let


Y,, Y2 ,... be iid (0, 1) and independent of the X„'s. Let X „* =(X, Y„) and
Z*=(Zr ,(Y1 +• • •+Y„)/V'ib n ), and note that ir(X*)=X„ and 7T(Z*)=Z„.
Since (c) holds for all m, a.s. the set of limit points of Z* contains C m+ ,. Since
Ir is a continuous mapping, we thus have that a.s. the set of limit points of
Zn = IT(Z*)( contains B m = Ir(Cm+ ,). But (b) implies that a.s. the set of limit
points of Z„ is contained in B.. Thus a.s. the set of limit points of Z„ equals
B m . That is (12) [and hence (11) also] holds. D

Exercise 2. (Finkelstein, 1971) Let Xk = (XIk, ... , X m ,j', k >_ 1 be iid m-


dimensional vectors having zero mean vector and covariance matrix = 1^ a ;; 11
with o; ; =(1/m)(1-1/m) and s;; =-1/m z for i^j. Let S,,=X,+•••+X„and
b„ = J2 log e n. Then
m m
(13) Sn /(Vib„) M► {(x,,...,x m ):Zx; =0andm x;-1} a.s.
I 1

wrt I Ion R m .

Sketch: The range of the X k is the hyperplane H in R m defined by H =


{x€ R m : Y_ x ; =0}. Now there exist iid random vectors Y,, Y2 ,... in R m _,
and a linear transformation T from R m _, to H such that Xk = TYk . Note that
Tx = x/ m for all x E H, so that T({ y E R m _,: y'y <_ 1}) equals {x E R m : I;” xi =0
and m — I x; ^ 1}. Now apply the multivariate LIL of Theorem 3 to the Y's.

Exercise 3. Suppose X 1 , X2 ,... are iid (0, T) random m X I vectors with


ITS 0 0. Show that S. = X, + • • • + X. satisfies
(14) lim S„/(Vb,,) N-► {x: xE R m has x'Zx-1}.

(See Sheu, 1974 and Berning, 1979 for non-iid extensions.)

Exercise 4. Show that (2) trivially implies

(15) i
Tim
ö = 1 a.s.
.^ 2t log e t

Criteria for Relative Compactness on (D,

We need a workable criterion for establishing — on (D, I1 ). The next theorem


is a natural analog of Theorem 2.3.3 for ', in that the limit can be identified
from the limit of the finite-dimensional projections.
76 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Let Tm = { tmo, ... , tmk }, where 0 = tmo <_ tm ' ' ' tmk ,, 1 denotes a finite
subset of [0, 1]. Recall that ar (x) (x(t mo ), ... , x(tr k- )) denotes the projec-
tion mapping of x. We will call the function x m (the function z m ) that equals
x(tmj ) at each t mj and is constant (linear) on each subinterval closed on the
left and open on the right (closed on both the left and right) between adjacent
t,nj 's the Tm -approximation of x (the Tm -linearization of x).
When dealing with —Ye a.s. wrt II II on D, we will make the following
assumptions about Ye:

(16) JT is a jI II-compact subset of D.

For every h E , the Tm -approximation h m and the T-linearization hm satisfy


both

(17) Ilh,n – hll -40 as moo

and

(18) hm E g' and Il h m hm II 0 as m -' oo

provided Tm becomes I I-dense as m - oo (i.e., provided max {tm; – tm; _,: 1 <_
i^km }-+0 as m -cc).

Theorem 4. (M► criteria) Let X, denote random elements on (D, ). Suppose


that for each fixed m the Tm -approximation X n ,n of X. satisfies

(19) lim IIX nm — X n II <_ a m a.s. where a m -O0 as m - X,


n-.00

and for some X satisfying (16)-(18) we have for each fixed m that

(20) 'n m (Xe) ITT,,, (9C) a.s. wrt I I on R m

for ordinary Euclidean distance I j on R m . Then

(21) Xn — W a.s. wrt II II on D.


Proof. Let A,,,, (let A2.) denote a null set of cw's for which (17) fails [(18)
fails]. Then A= f1° (A, m n A lm ) has P(A) =1 and both (17) and (18) hold
for all m and all w E A. Moreover, for each w E A conditions (i), (ii), and (iii)
of the definition of » hold (as we will now show). Let to E A be fixed.
We first verify (iii). Let h E Now for all m we have

(a) IIx – hOI IIX,, XnmII+IIXnm – hmII+IIhm – hII


for the Tm -approximations X nm and h n, of X. and h. Let Ek ,[0 be given. We

RELATIVE COMPACTNESS ^» 77
now construct a subsequence n k. ,Tco as follows. Let m,. be so large that

(b) II h mk — h II < E k and a m , < E k /2 by (17).

Because of (19) we know that there exists an Nk, 0, such that

(c) IIX,,—Xnmgllca,k+Ek/2<Ek for all n ^ N .

From (20) we have that

(d) IIXnmk — h mk II < 6k infinitely often.

Combining (b), (c), and (d) into (a), we can pick an n, (it must exceed both
n k _ 1. and Nkr,) for which

(e) IIXn— hfl <3ek for n=n k .

Thus (iii) holds.


We next verify (ii). Let f denote a limit point of X n along some subsequence
n'= n',. For each m the Tm -approximation fm is the limit point of X nm along
the same subsequence n'. First, fix m so large that the a m of (19) is less than
E/2. Then for all n' sufficiently large we have

(f) ilfm —fhl ^ IIJm —Xn'm II + fiXn'm —Xn'fl + ijX' —fhl


E+IIX. n . m X. n 'II+E

(g) :3E.

We know that fm is the Tm -approximation of some hf in Let fm denote the


Tn,-linearization of hf. Then

(h) IhIm—f1I II.fm — f,II+II.fm .fII^e+3E=4E


— for all large m

by (18) and (g). Thus f is the II II-limit point of the sequence fm , where fm E yC
by (18). But C is II II-compact by (16). Thus f E Ye, and (ii) is verified.
It remains to prove (i). This will hold if we show that every subsequence
n' has a further subsequence n" that is Cauchy. Fix m so large that a m < E/2.
Since

(I) IIXn—XNII IIxn—XnmII+IIXnm—XNmII+IIXNm—XNII,


and since X n ' m contains a Cauchy subsequence X,.• m by (20), two applications
of (19) complete the proof of (i).
A similar theorem that is a more exact analog of Theorem 2.3.1 is contained
in Oodaira (1975). The present theorem was motivated by Finkelstein (1971).
It is similar in form to the = criterion of Wichura (1971). ❑
78 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Exercise 5. (Wichura, 1973a) Show that (19) and (20) may be replaced by
(22) P( lim inf ffX„ — h ^f = 0) =1
n-.ao hek

(23) P(sup tim JJX, — h jf = 0) =1.


hE,lC n-

Exercise 6. (Oodaira, 1975) Show that (19) may be replaced by

(24) X,, is 11 11- relatively compact a.s.

Exercise 7. Prove Strassen's theorem (4) [consult Corollary 2 to Theorem


(2.9.1)].

Mapping Theorem

The following theorem is to M► as Theorem 2.3.5 is to . However, the proof


of this theorem is trivial.

Theorem 5. (- mapping theorem) Suppose

(25) X „M► 2' a.s. wrt S on M,


where ' is a subset of the metric space (M, S). Let {çii„: n - 0} be mappings
from (M, 6) to the metric space (M, 6) such that for a.e. w we have

(26) S(a(^„•(^Y „•(w)) çlio (h)) -+0 as the subsequence n'-*co

whenever h E' and S(X,,,(m), h) -* 0 as n'-^ co. Then

(27) i#n(Xn —qio( &T) a.s. wrt S on M

Proof. Let A, be the set having P(A,) =1 that is associated with (25). Let
A denote the exceptional set on which (26) fails. Let A = A, n A 2 , so that
P(A) = 1. Let co E A. We will show that for every co E A conditions (i), (ii),
and (iii) of definition (1) of M► hold. Let n' be an arbitrary subsequence.
Condition (25) guarantees the existence of a further subsequence n” and of
an h E ' for which S(X„•(u,), h) --> 0 as n"-x. Hence (26) yields
fi(^i„.(X„••(cu)), rjro (h))-* 0 as n "-> oo. Thus (i) holds. Let h E i/i o (); then h=
gio (h) for some h E Condition (25) guarantees the existence of a subsequence
n' for which S(X„•(w), h)--0 as n' -+oo. Thus S(a/i„•(X„•(w)), h)-+0 as n' -goo.
Thus (iii) holds. Suppose S(tji„.(X„•(w)), k) - 0 as n'-^ co for some subsequence
n' and some k€ M. As in our proof of (i), there exists a further subsequence
n” and an h E ge for which S(i/i„••(X„••(w)), «io (h)) ' 0 as n "moo. But
S(rJi„.(X„••(w)), k) ^ 0 as n" -.cc by our assumption. Thus k = +J0 (h) E s<r o ( 'T).
Thus (ii) holds. See Wichura (1974a). ❑

RELATIVE COMPACTNESS OF SS(nt) / /i 79

9. RELATIVE COMPACTNESS OF §(nI)/.%n b„

Discussion of the Particular Limit Sets air' and .*

We recall from (2.8.5) that

(1) X = j k: k is absolutely continuous on [0, 1] with k(0)=0 and

J0
[k'(t)] 2 dt<_ 1I.

Since I k(t) — k(s)I = Js k'(r) drl <_ { Js dr Js [k'(r)] Z dt}" 2 <_ t — s, we have

(2) Ik(t)—k(s)I< t—s for all 0<_s<_t 1 and all keg.

Proposition 1. k E yC if and only if k(0) =0 and for every partition 0 = t o < t, <
< tm = 1 we have

Z
(3) i -1 t[k(t
i—ti ' )— k(ti-1)]
I <_1.

Moreover,
(4) satisfies (2.8.16)-(2.8.18).

Proof. Suppose k E . Then, as in the proof of (2), we have (3) since


m m Ct

(a) Y_[k(t,)—k(t,-i)]2/(t1—ti ,)^ [k'(r)] 2 dr= [k'(r)] 2 dr^ 1.


I I c ; , o
Suppose (3) holds. Then k is absolutely continuous, since
m m k(ti )—k(t i-1 ) 2 m 1/2
Elk(t,)-k(t,-,)l E t^-t. .Z}
t, — t-^ ,

I m l 1/2

so k' exists a.s. Also


km(t) _ k(i/m) i /(mi-1)/m) for*Iml<_t<m and lei<_m

_ i /m—t k(i/m)—k(t) t— (i -1)/m k(t)—k[(i-1)/m]


1/m i/m—t + 1/m t— (i -1)/m
(b) - k'(t) as m-*x.
80 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

Moreover, (3) implies

/m)— k(( i - 1) /m)]2


(c)
fo ,
[km(t)]Zdt=. [k(i -1 for all m.

Thus, using Fatou's lemma,

(d) J0
[kl(t)]2dt= J
0
lim[km(f)]2dtlim J0
[km(t)]2dt<_1.

Thus kEX.
We now turn to (2.8.16 - 2.8.18). That km E 3C follows from (3) by the
argument written out twice so far. Both 11 k,„ — k 11-+ 0 and k,„ — k,„ I -+0 are
trivial by uniform continuity of k. For (2.8.16), suppose I k — k u1-+ 0 for any
;

k E 9l: Each k satisfies (3); passing to the limit in (3) for k , we find k satisfies
; ; ;

(3). Thus ke^C, as shown above. ❑

Theorem 1. (Strassen) For b„ = 2 log, n we have

S(nI)
(5) ^- " - ^L a.s. wrt ll lion D as n oo.

Corollary 1. For all e > 0 there exists an NE, w such that the increments of S
satisfy

jS(ns, nt]l
(6) t — s + e for all 0 <_ s <_ t <_ 1 whenever n >_ N ,. EB

fib„
Corollary 2. (Strassen) The partial-sum process S. of iid (0, 1) rv's satisfies
(2.7.4). That is,

(7) Sbn ^^► ^f' a.s. wrt II II on D as n - oo.

Proof. The reader is referred to Strassen (1964) for (5); note also Exercise
2.8.7. Then note that (5) combined with Skorokhod's embedding (2.5.24) gives
a trivial proof of (7). Also (6) follows trivially from (2) and (5). (Note the
proof of Cassells' theorem (Theorem 13.3.2).) ❑

We now define Finkelstein's (1971) class

_[ h: h is absolutely continuous on [0, 1] with


(8) — h (0) = h(1) = 0 and J, [h'(t)] 2 dt _< 1

RELATIVE COMPACTNESS OF S(nl)/f 81
We will first show that

(9) 1h(s+t)—h(s)I_ t(1—t) for all 0<s<s+t<1 and all hem

If one rearranges h by defining g(r) to equal h (r+ s) — h(s), h (r — t) + h(s — t) —


h(s), h(r) according as 0<—r<—t, t<r^s+t, s+t<r<1, then g e X also.
Applying Proposition 1 to g gives [h(s+ I)— h(s)] Z = [g(t)] Z t(1— t).

Exercise 1. (i) Recall from Exercise 2.2.13 that

S(n)
I)—IS(n
(10) B(n, •) _ = U = Brownian bridge,

where we call B the Brillinger process.


(ii) Use Strassen's theorem (Theorem 1) to show that

B(n, )
(11) b ^^► &Y a.s. wrt ^^ ^^ on D as n -> oo.
n

Exercise 2. Show that

(12) 'satisfies (2.8.16)-(2.8.18).

Show (or note) from the previous exercise that

(13) h E if and only if h = k — Ik(1) for some k E.Jl.

See Strassen (1964) for some very interesting applications of Theorem 1.

Exercise 3. The relative compactness of the partial-sum process on [0, co) of


any iid (0, 1) sequence is handled by the following result from Wichura (1974b).
Define

k: k is absolutely continuous on [0, cc)]


(14) ^f^= I with k(0) =0 and Jö [k'(t)] 2 dt - 1

Now let X 1 , X2 ,... be lid (0, 1) rv's. Let b n ° 2 log e n. Then

(15) S n /b n N ► ^ a.s. wit II /gliö .

where q(t)=[(e 2 v t) log t (e 2 v t)]" 2 for t>0 (as in Exercise 2.7.1).


82 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

10. WEAK CONVERGENCE OF THE MAXIMUM OF


NORMALIZED BROWNIAN MOTION AND PARTIAL SUMS

We call S(t) /f normalized Brownian motion since it has variance 1 for all t.
We define

(1) m(l)= sup § ) and M(t) = sup I§(s)t


1<s =t

We define normalizing functions b and c by

(2) b(t) = 2 log 2 t

and

(3) c(t) =21og2 t+2 - ' log 3 t-2 ' log (4r)
-

for t> e e. Let E„ denote the extreme value df defined by

(4) E0(t) = exp (—exp ( —t)) for —x< t <oo.

Note that E t, has a long right-hand tail, and the kth power E. is the df of the
maximum of k independent rv's having df E,,.

Theorem 1. (Darling and Erdös, 1956) We have

(5) b(t)m(t)—c(t)-'dE„ as t-^cc

while

(6) b(t)m(t)—c(t)-'dEt, as t-*.

Recall from Exercise 2.2.9 that

(7) X(t) = e -t S(e 2t ) for t > 0 is the Uhlenbeck process.

Now note that

(8) m(t)= sup X(s) and M(t)= sup jx(s)I.


Osss(1/2)1ogt 0ss =(1 /2)log:

Darling and Erdös (1956) proved Theorem 1 via the representation (8). They
also proved an analogous result for the partial-sum process of independent
(0, 1) rv's with uniformly bounded third moments. However, the stronger result
presented as Theorem 2 is due to Oodaira (1976) and Shorack (1979b);
examples involving dependence are given there.
THE LLN FOR lID RV'S 83

Theorem 2. Let S denote any process on [0, oo) for which

(9) D(t) S(t)—§(t)j/f=O(t ") a.s. - as t-400 for some S>0;

recall (2.2.6). Suppose also that

^S(S))
(10) Y sup
1--s(logn)` ' S

is essentially equal to —r in that for all large K > 0 and small e > 0

(11) P(b(n)Y , —c(n)?—K)g(s)


f E where g(E)-+0 as e-0.

Then
^S^)^
(12) m =
it sup S") and Mn =^
sup
n
I sssn

satisfy

(13) b(n)m n —c(n)- d E„ as n-*oo

and

(14) b(n)Mn —c(n)-+ d E" as n-^oo.

11. THE LLN FOR lid rv's

Theorem 1. (Kolmogorov's SLLN) Let X 1 , X2 , ... be lid. Then

(1) Sn/n->a.s_EX as n-*ao provided EIXI<oo,

while

(2) lim ISn j/n=oo a.s. provided EIXI=oo.


n-0

The following is a strengthening of the divergence half of this classic result;


see Feller (1946).

Theorem 2. (Feller) Let X,, ... , X. be iid with E[XI =. Let a n ?0 satisfy
a n /n/. Then

HsnI 1 0 °° ( <00
(3) lim = S according as E P(IX ? a n ) = S —
n - an 100 n=I l co.
84 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES

In another direction, we analyze the magnitudes of the probabilities


P(IS I/n ? e) under moment conditions.

Theorem 3. Let X 1 , X2 ,... be iid.


(i) (Spitzer)

(4) EX =0 if and only if En - 'P(IS I?nE)<oo for all e >0.


S
n=t

(ii) (Katz) If a> 2 and ra > 1, then


0"

(5) EIXV<co if and only if nra -2 P(ISn I >n °e) <00 for all e >0.
n=1

See Chow and Lai (1978) for citations and additional results and references.

A necessary and sufficient condition for the WLLN is given in the next result.

Theorem 4. Let X 1 ... , Xn be iid F. In order that there exist constants µ n


,

such that S./ n µ n -> 0 as n + oo, it is necessary and sufficient that


(6) x[1 —F(x) +F( —x)]->0 as x-m.

In this case µ n = J n x dF(x) works.


CHAPTER 3

Convergence and Distributions


of Empirical Processes

1. UNIFORM PROCESSES AND THEIR SPECIAL


CONSTRUCTION

The Processes

We let 1,, ... , e„ denote independent Uniform (0, 1) rv's whose df is the
identity function I on [0, 1]. We define the uniform empirical df C„ by

t „
(1) Gn(t)=— i 1 [€c,^ for0<_t^1.
n i.,

Note that

(2) nC„(t) = Binomial (n, t)

so that

(3) E6„(t) = t and n Coy [G„(s), G,(t)] = s A t —st.

The inverse uniform empirical df 6„' is the left-continuous function

G(t) = inf {x: G„(x)>_ t}

1
(4) _.„,; for—<ts—and 1-5i-5n
n n

with 4„'(0) = 0; here

( 5) 0 — fn:0 6n:l &:n ^ Sn:n+1 1


85
86 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

is our notation for the uniform order statistics. We will also have use for the
smoothed uniform empirical df L. defined by

(6) (t) equals i/(n + 1) at each en:; and is linear on each [fi n:; _,, fin;; ].

We agree that

(7) U, = Jn [G n — I] is the uniform empirical process,


(8) 0/n = ✓n {6' —I] is the uniform quantile process,
(9) Un = In [t,, — I] is the smoothed uniform empirical process,
(10) 4/„ = .In [ — I ] is the smoothed uniform quantile process.

Note the basic identities (see Figure 1).

(11) JIG '—III=116n—III, Ilan— III= — III


(12) N n = —U n (6 n ')+J[G n ° Qv n ' — I]
(13) Un = 1n(Gn) +1i n [ 3 n ' ° 6 n — I].

We can write U n as

n
(14) Un(t) Jn 1IfIE4i <1 — t} for0t1.

It is clear, as in (3), that

(15) EU n (t) =0 and Coy [U n (s),UJ n (t)]=snt —st for 0<—s,t<_1.

We agree that

(16) U will denote a Brownian bridge,

since, matching (15),

(17) EU(t) =0 and Coy [U(s),U(t)]=snt —st for0 s,I:1.

The ordinary multivariate CLT clearly implies that

(18) Un —'f.d. U as n -^ Q0.

In light of (12), we make the definition

(19) V = —U, which is also a Brownian bridge.


w~~~-~.
1 IG _ 1

O o
"
n:1 cflfl E-:i ^^^ n :n
(a) (b)

1 ^/^^ 1 1
/ ^-°

^ U
1rx-1
^^. 1n'-1
__ ~"" _-__. 1
n TI. Ti m
(c) (d)

1 1

^
" "
Emu ^^^ En:p ^^^
E~:i Ennp 1

(e) -'
Figure I .

87

88 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Lo

Nn
0.5

0.0

-0.5

In
0.0 0.2 0.4 0.6 0.8 1.0

Figure 2.

Exercise 1. Verify (18).

Suppose now that {c,,,, ... c; n ? 1) is a triangular array of known con-


stants. We usually assume the u.a.n. condition
2 2
(20) max "' c = max cn 'C 2-+ 0 as n - co
1 <_isn C C
S isn ^n
1

where c cn = (ci1 , ... , cnn )'. The weighted uniform empirical process W n is
defined by

(21) W(t)= 1 cni[l[ ]


✓c c
'
, —t]
i_1
for0 t 1.

Note that we trivially have

(22) EW n (t) =0 and Coy [W n (s),W„(t)]=sAt —st for 0<s,t<I


and

(23) Coy [U .(s),W„(t)]=p n (snt —st) for 0<_s,t^1,

where
1'c c„
( 24 ) Pn=Pn(C, 1 )=
✓ 1'lc'c ✓c
UNIFORM PROCESSES AND THEIR SPECIAL CONSTRUCTION 89

n3 —7 .........
Cri 1 [^ni $ t]
^► ♦ ^—iw ^ i =1
C ann ' I

C nDn2 Zi y
^_
Cni t
~^ ti=1
C nD.t -i+I
t
0 En;i En:2 En: 3 "' En:n 1

Figure 3. The difference between the step function and the straight line is

with 1'=(1,...,1) and

(25)
1 „
c„ _ —
ni _, ;
2=1 „ 2
c„ and c„ — — c i
n ,_,
„.
This holds since

sn t —st ifi=j
(26) Cov[1—s, 1[^,]—t]=
0 ifi5&j.

In light of (22) we agree that

(27) W will denote a Brownian bridge,

so that

(28) EW(t)=0 and Coy [W(s),W(t)]=SAt —st for 0<_s,t<_1.


The reader is asked to show in Exercise 2 below that

(29) Wn —* f.d. W as n -* oo.

We agree further, as is suggested by (23), that

(30) whenp„=C„/'-*0as n-cC,


we construct W and U to be independent.

Note that

(31) W. reduces to U,, when all c„ = 1. ;


90 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Suppose now that

(32) R„,, ... , R„„ denote the ranks of , ... ,„

and that

(33) D 1 ,. .. , D„„ denote the antiranks defined by


RD, i or X0 X„:;•

Note that

(34) all n! permutations of 1, ... , n


are equally likely values for D„,, ... , D„ n

(or R 1 ,. , R„ ). It is thus appropriate to give the name finite sampling process


to Q8„ defined by (let cno = c„,„ + , = 0)

1 ( ( „ +I)r)
(35) l(t)= — E c ; for0s t <_ 1.
,Jc c i =,

We will also use the name empirical rank process. Note the fundamental identify

(36) R „ =W „(fin') provided c„ =0.

The above discussion and identities suggest that

(37) UJ n , N n , W „, R „, G n , k-' "converges jointly” to U, V, W, W, I, I,

where we require c„ = 0 to suggest that 68„ converges to W.

t --- ---^

Cant _ ^zt 0,^! CflD.w

—1 — k-- 1
n+1 ♦— n+1

Figure 4. c'cR„ ; note that c„ = 0.


UNIFORM PROCESSES AND THEIR SPECIAL CONSTRUCTION 91

Let Y2 denote the space of all square integrable functions on [0, 1], and let

J
1/2

J
('

(38) h= h(t) dt, h (t)dt} ,


2 u [h]1 2— ham .
0 { 0

We let ( , ) denote the inner product on Y 2 ; thus

(39) (h, h)= f 0


h(t)^(t) dt and weleto- h ,, (h—h,h—h)

for all h,hE5 2 .


We say that h is of bounded variation inside (0, 1), which we denote by
h E BVI(0, 1), if h is of bounded varation on [e, I — s] for all e > 0.

Exercise 2. (i) Use the Lindeberg-Feller theorem and (20) to show that if
I[h]1 2 <co, then

(40) 1 Y_ cn;[h(f1) — i] — dN( 0,on) asn- x.


✓ c'c ;_,

Then use this to show that

(41) W n —* f. d .W asn-*co.

(ii) Show that if J[h]I 2 <o0, then (20) and Lindeberg-Feller give

1
(42) 1 cn; h() --" d N(0, [h]j 2 ) as n -> oo provided J. = 0.
✓ c'c ;_,

Statistical Applications

We now turn to some statistical applications based on the presumed conver-


gence of the above processes. Let X; = F - '() so that X 1 ,.. . , X„ are lid with
df F; let F n denote their empirical df.
We note that the Kolmogorov-Smirnov statistics satisfy

(43) K„ _ ✓ n t(0F n — F) # II = ^JUn (F)II (recall that # denotes +, —, or


(44) = UI) if F is continuous,

so that we expect

(45) K„d K#= 11UII as n-^oo, when F is continuous.

The distribution of K' is taken up in Section 3.8.


92 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Also, the Cramer- von Mises statistic

(46) W„ = J n[F.(x) - F(x)] 2 dF(x)

U 2(F) dF
(47) =f

(48) =
E v„(t) dt if F is continuous

by the change-of-variable Exercise 2.1.4. Thus we expect that

(49) W„ -> d W Z = J U 2 (t) dt as n -), oo, when F is continuous.


0

The distribution of the limiting ry W 2 will be taken up in Section 5.3.


Consider the statistic

/ i /
(50) S,, -7=Y_ Cni[h(&i) — h]
✓c c

(51) = I h dW n .
0

If W„ "converges” to W, we would expect that

(52) S„ -mo d hdW


0

(53) = N(0, o h,)

for an appropriate stochastic integral J h dW. Exercise 2 shows that the ry of


(52) must satisfy (53), and, indeed, shows that just --* d in (52) is nothing but
unusual notation for a simple Lindeberg-Feller result.
Consider the simple linear rank statistic

(54) TT= -L=^^^ h (


-) cm _ ✓ l Y,

Cn Doi
(- h in1 )

(55) _ 0
h dJI8„
(56) _ -

f I8„ dh
o
if h e BVI(0, l) is left continuous.
UNIFORM PROCESSES AND THEIR SPECIAL CONSTRUCTION 93
If Fl converges to W when c„ = 0, then we would expect [from (55) and (36)]
that

(57) T„ — +d E 0
hdW=N(O,o1.) asn^eo ifc„=0.

It is important to note that when (45), (49), (52), and (57) are made precise
via a special construction, we will not only have —> d in these equations, but we
will also have a representation of the limiting rv's.
This should serve as sufficient motivation for what is to come. In later
chapters we wish to consider statistics (such as S„ and T„) under both null
and very general alternative distributions. This section has allowed us to set
up notion for the null situation. The following theorems are special cases of
results to be proved in Sections 3 and 4; Theorem 1 rigorizes (45) and (49)
while Theorem 2 rigorizes (52) [we will rigorize (57) later].

The Special Construction

Theorem 1. (The special construction) Suppose that {cn1 , ... , C; n ? 1}


satisfy
z
(58) max "
C i 0 and p„ =- -gyp as n - oo.
1 gincc
, c„

Then there exists a triangular array of row-independent Uniform (0, 1) rv's


n >_ 1} and Brownian bridges U and W having

(59) Coy [U(s), W( t)] = p[s A t — st] for 0< s, t ^ I

that are all defined on a common probability space (Cl, .s', P) for which

(60) 11U„—U1I- 0 asn-moo


(61) IIN„ —VII 0 as n -g oo with V —U
(62) DDW„—WlH0 as n-*oo
(63) IJR„—W -*0 asn -goo if c„=0
for everyt u, e Cl.
and for convenience,

(64) 0n:o< n:t<


... < n:n<n:n+t =1

and

(65) U and W are continuous functions on [0, 1].

t It is customary to claim only "a.s.” in a special construction, but we chose "for every w” instead
at the small price of redefining things on a null set.
94 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Corollary 1. If we drop the assumption that p„ - p as n from Theorem


1, then we still claim (60) and (61), or (62); but we connot claim that these
happen simultaneously on the same (11, .sit, P).

Remark 1. To make our notation more precise, we agree that on occasion

(66) we will denote W n by W „, where c = (c n 1 , ... , Cnn )'-

We thus note that

(67) it is appropriate to denote U. by W ,, with 1 = (1, ... , 1)'.

We will also

(68) introduce a third process W „, where a = (a,,,, . .. , a,,,)'.

Recall that

(69) Pn = P,(c, 1) = I'cj 1'lc'c and introduce


p n (c, a) = a'c/✓a'ac'c, and so on.
If

(70) Pn(c,1)- Pcl, pn(c,a) - Pca, Pn(a, 1 ) ' Pal asn - goo,

then Theorem 1 can be extended by adding the conclusion that

(71) IW —W 1-*0 as n >a0, for all w,

where w a is a continuous Brownian bridge with

(72) Coy [Wa(s), W`(t)] = pca [s A t — st] and


Coy [W a (s), U(t)] = p a1 [s A t — St]

for 0 <_ s, t t. (It is sometimes appropriate to let "a” denote the "truth” and
let "c" denote what we think is true.)

Theorem 2. Let h, h e 2'2 . Under the hypotheses of Theorem I we can claim


the existence of rv's on (11, sd', P), to be denoted by $ö h dW and h du, for jä
whicht

(73)
t

0
h dUn =a f
0
hdv =N(0,o )

t Recall that X„ = o Y„ means X. — Y„ p 0. Also, it is trivial that


—.

h du,, --> d N(0, o) in (73).

What is of interest here is that we have a limiting ry on (fa, s4, P), denoted h du, for which

h dUJ„ —J, h dU as

UNIFORM PROCESSES AND THEIR SPECIAL CONSTRUCTION 95

and

74) j^ hdW„= Q j ' hdW=N(0,oh).


0 0
Moreover, U, W, f ö h d U, Jo h d U, Jö h dW, and j7 h d W are jointly normal with

> 1j' > l( '


I (75) Cov U(t),
o
J hdU I =Cov W(t), J h dW J = J h(s) ds - th
(

= Ql^^ . ,^,he
0 o

('
(76) Cov I J
L o
1 (' '
h d UJ, J h dU I=Cov Jh d W,
o
1
J
IC
o
1
f o l hd W = uh,n,

(77) Cov I U(t),


LL f o
,
hdW =Cov W(t),
J 0
j hdU I[^,],,,,
=pv,

(78) Coy J h d u, I h dW I =
0 Jo
As is suggested by these covariance formulas,

(79)
fo ,1
[o,,^dW=W(t)=—J " WdI to,, 1
o

and

(80) 1,,,,] dU = U(t) = —U d1 1011 .


Jo
Jo o

Corollary 2. If we drop the assumption that p„ - p as n oo from Theorem


2, we cannot claim that results for v, U and W., W hold simultaneously on
the same ((1, si, P).

The Glivenko—Cantelli Theorem

Theorem 3. If 6,, 62 , ... are independent Uniform (0, 1) rv's, then

(81) IIGn— Ill =JI3„'— 'II --ßa.5.0 as n - oo

and

(82) llt1in —Ill =Ili'— [11 ßa.5.0 asn -*oo.


96 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Proof. Let M be a large integer. By applying the SLLN to each of the finite
number of binomial rv's nG n (i/M), 0 <_ i -5 M, we conclude that

(a) yn = o ma x IG ,, (i/ M) - i/ MJ -*a.s. 0 as n -> oo.

Also, for (i- 1)/ M < t<_ i/ M we have by monotonicity that

[ G(
I -1 1 1
) M ] M^Gn(t) ts G n ( )_. .
i i l +M.
1
(83) -
-

Combining (a) and (83) yields

(b) lim IjG„ - I jj <_ lim i Yn +


n- 0 n-+oo \
M j ='
and since M >0 is arbitrary, the result follows. We note that IIGn - III =
IG'-III from Figure 3.1.1. ❑

The monotonicity technique used in the above proof at step (83) is worth
remembering.

Generalizations

Much current research deals with the following type of generalization of these
ideas. Consider

( 84 ) IIvn ILS sup {Ivn(A)I: A E sB}

for some large class .si of measurable sets; here

(85) vn(A) Jn [Gn(A) - IAI]

for the empirical measure G n and Lebesgue measure I of A. Of course, for


each fixed A we have as n oo that

(86) Gn(A) - IAI_-a.5. 0 and


U(A) some ry U(A) = N(0, IAI(1 - IAI)).

When only a finite number of sets A are involved, the latter is no more general
than Q.J . ^f.d. U . When . ' ={{O, t]: 0 <_ t_< 1}, (86) holding uniformly reduces
to (81) and (60). Further generalizations to larger classes of sets can lead to
difficult problems. Since UJ,(A) = jö l 4 dUJ,, a further generalization would be
to consider

(87) U,(h) = J 1 h dU n for functions h in Y2 ;


0
UNIFORM PROCESSES AND THEIR SPECIAL CONSTRUCTION 97

under these hypotheses 0J„(h) (0, vh) and is asymptotically normal. One
could then ask for uniform convergence and study the limiting process indexed
by h. See Chapters 17 and 26 for some results of this type.

Uniform Order Statistics

Since e„ :; < t if and only if at least i of the n observations are less than or
equal to t, we have the key relation

(88) [ „ i Ct] = [nG „(t)>


: —i]= [Binomial (n,t)>_i] for any 0 t<1.

Differentiating this probability with respect to t shows that

(89) „:iisaBeta(i, n —i +1)ry

with density
I
(90) —t)" - ' for0<— t<_ 1.
(i -1)!(n—i)!t'-'(1

This is also "proved” by noting that n! /((i —1)!(n — i)!) distinct groupings of
the n observations have i —1 observations less than t, one equals to t, and
n — i exceeding t; each such grouping "has probability density” t'(1—t)"'.
Similar reasoning shows that for 1 i <j < n the joint density function of
„ :i and ; is

I
1
(t—s) , -
I(1—t)" " for0<s<_t 1;
(91) (i -1)!(j—n-1)!(n—j)!S'

and from this, or from Proposition 8.2.1, it follows easily that

(92) „:j— „, i is a Beta (j — i,n—j +i+ 1) rv.

Proposition 1. For 1: i ^ j < n we have

i 1 i j
(93) E "` i= COV[6„:,,f„:;]=
n+1' n+2n+1 1 n+l

and

1 j —i ( 1— j —i
(94)
n+l
Ti )

Proof. This follows from elementary computations based on the densities.


El
98 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Exercise 3. Give rigorous proofs of (89) and (93) based on (88) and its
trinomial analog.

The joint density function of (n:1,.. , n:n ) equals

(95) n! on the region 0<t<.. < to < 1.

To see this, note that the inverse image of any (t ,, ... , tn ) having 0< t, < • • • <
to < 1 is the set consisting of the n! permutations of (t,, ... , t o ). The density
value, namely 1, at each of these n! points is mapped onto the same image
point to yield a density value there of n!.
The uniform spacings 5 n; are defined by

(96) - for 1 <_ i n + 1


6 n; = fin:; — :iI

2. DEFINITION OF SOME BASIC PROCESSES UNDER GENERAL


ALTERNATIVES

Let Xn ,, ... , Xnn , n > 1, be a triangular array of row-independent rv's having


df's F„,, ... , Fnn, n ? 1. It is natural to define the empirical df

in
(1) F,(x)°— I 1[Xx] for —ao<x<oo,
n ;_,

the average df

n
(2) F.(x)=— j F,, 1 (x) for—co<x<co,
n ;_,

and the empirical process

(3) -Jn[F,(x)—F.(x)] for—co<x<ax.

Let Xn: , _
< . < Xn:n denote the order statistics. Let

(4) N/n[C= „`(t) — F„'(t)] for 0<1<1

denote the quantile process.


Let Cni , ... , Cnn , n >_ 1, be a triangular array of known constants. We usually
assume the u.a.n. condition
i

(5) max —-*0


C -.0 as n -^ oo.
i^i n C'C
DEFINITION OF SOME BASIC PROCESSES 99

The general weighted empirical process is defined to be (for any vector c)

1 ^
(6) l(x)= c^,[1[x,,<XJ—F„j(x)] for —co<x<oo.
vc c

Reduction to (0,11 in the Case of Continuous F„ ; 's

Suppose for the moment that

(7) FF,, ... , F„„ are all continuous df's.

Define

(8) a,, = F^(X"1), G^r = F,,; o Fn', and e"r = G"r(a",)•

Then (the following facts are proved later in this section)

a„ ; has absolutely continuous df G n; on [0, 1],


(9)
„; = Uniform (0, 1),

and

(10) — E G„ (t)=t
; for0^i<1.

In fact, it is a.s. true that

[a";ct]=[`<_G„;(t)] forall0<t<_1
(11)
[X„ jux] =[a^;:SF"(x)]=[e"1^G^,(F^(x))] for all —oc<x<x;

note (44) and (45) below. The reduced empirical df

_1 _ I
C"(t) — for0<t<_1
(12)
n;=1 1 [n..i<<] —
n ;_, l[..^G..,(t)]

of a,,, ... , a„„ is used to form the reduced empirical process X„ on (D, 2)
defined by

1 ”
(13) X,(t)-`I [C„(t)—t]= _ {ltf.,.^G (,)] — G";(t)} for0 t<1.
.fin

Clearly,

(14) EX„(t) =0 for0<— t<_ 1


100 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

and
K„(s, t) = Coy {X,,(s), X„(t)]
(15)
=snt - 1n G
;= i
1 (s)G(t) for0:s,t<1.

The reduced quantile process V. on (D, 2) is defined by

(16) V„(t) ='Jn[GR'(t) -t] for0<_t_1.

Expressions for the exact mean and covariance function of V. would be


difficult, but we shall find that the asymptotic behavior of V. is simply related
to that of X,,. This relationship comes from the identities

(17) lY„=-RC„(G„')+n[G„-Q^n'-I]

and

(18) X„ = -V(G) +,In [G -n G„ - I].

We will also consider smoothed versions lk,1^', >K,,, l' n of these processes.
They are random elements on (C, W). [Recall (3.1.9)-(3.1.10).]
It will suffice to concentrate our study on the reduced processes because
the fundamental identity,

(19) for all n, holdsa.s.

This is true because nG„(F„(x)) equals the number of indices i for which
= F„(X„ ; ) <_ F„(x); and (11) shows that this is a.s. equal to the number of
indices i for which X,, ; <_ x. We need only union (11)'s null sets over a finite
number of indices i for each of a countable number of n's; such a union is
again a null set. Note also that the quantile process is such that

(20) J[f„' - F„'] =.In [F„'(G ) - Fn'] for all n, holds a.s.

We define the weighted empirical process 7L„ of the a „ 's on (D, ) by


;

Z»(t)= , cn;[ltntt -
G,,;(t)] for0^ t <_ 1
(21)
_✓ in
— c;[ 1 ( .Qc,»-Gnr(t)]
c'c; =1
for0<_t<1

for known constants c 1 , . . . , c„ „, n >_ 1. We usually assume the u.a.n. condition


z
(22) c i - 0
max ° as n - (x.
lstsn c'C
DEFINITION OF SOME BASIC PROCESSES 101

Note that

(23) EZ(t)0 for0<t1

and

(24) Coy [Zr(S), 7 ( t)]= i L Cni[Gni(S At) — Gnf ( s)Gni(t)]


CCi_,

for 0^s, t-1.

Moreover,
n
(25) Coy [Xn(s), Z (t)] _—
'In i .,
c=
= [Gni(s A t) - Gni(s)Gni(t)]
'Ic'c

for O s,t<_1.

We now let

(26) R„,, ... , R„ n denote the ranks of an,, ... , ann

and

(27) D 1 ,. .. , D„„ denote the antiranks defined by


R„ D =i or afD = an:i•

Since all F„ are continuous, there is a.s. no ambiguity in these definitions. We


;

define the empirical rank process R. on (D, 9) by

1 «n+ut>
(28) R(t)=-=I C,, , 1 for0^t<_1.
J c'c ;-,
Note the fundamental identity

(29) R„ =Z( )+ cc

Consider the linear rank statistic

(30) Tn - h ( l cni =
ni
\ n 1I
h ( ) CnD^j= fl hdR^.
.lc'c i =, \ n+1 o

This differs from (3.1.36) and (3.1.54)-(3.1.56) in that we are now considering R„
and T. under completely general continuous alternatives F 1 ,..., F„„.
102 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Reduction to 10, 1J in the General Case

We now extend these results to the case of completely arbitrary df's. We agree
that

(31) d„ t , d,, 2 , , . denote the points of discontinuity of F.

and

(32) p„,, P n2 , .. . denote the corresponding magnitudes of discontinuity.

We now define an associated array of continuous rv's X.,, ... , X„„, n > 1, by
letting

(33) X„I =X r +Y P „j l [d,y<X,. ]+EP „;e„ J 1 [X


„ , d]

for 1 <_ i ^ n; here all f„ are independent Uniform (0, 1) rv's that are also
;;

independent of X„ 's. Note that


;

(34) [X„1 ^ x] = [X„ ^ x+E P 1 [d] = [X„; ^ D„(x)]


;

for all n, i, x (see Figure 1); here

(35) D„(x) = x +gy p„; 1 [d sx] for all x.

We thus see from (34) that X„,, ... , X„ „, n ? 1, are independent with con-
tinuous df's H,, 1 ,.. . , H,,,,, n > 1, that satisfy

(36) F„ = H„ ; (D„)
; forl_<i<n and F„= H„(D „).

Likewise, the empirical df H n of X„,, ... , X„„ satisfies

(37) IF„= l- l„(D „)•

d2 d 4 d11 din

M M M M

Pn2 Pn4 PnI Pn3


Figure 1. Whatever mass F 1 assigns to the point d„ j is distributed uniformly across the gap of
length p, by H,,.
;
DEFINITION OF SOME BASIC PROCESSES 103

Now let


(38) «„; = H„(X„ ; ) and G„ ; = H„ ; ° H„' where H„ = n' H„ ; .
i =1

Then (9)-(11) give a„ ; has absolutely continuous df G„ ; on [0, 1] and


(39) n '
- G 1 (t)=t for0<—ts1.

Let G. denote the empirical df of a„,, ... , a.


Thus we see that the empirical process /[F„ — F„] of X,,,, , X„„ satisfies

✓ n[lF„—F„]='Jn[H„(D„)—H„(D„)] by (36) and (37),

=^,Y„(Hn (D„)) a.s. by(19),

=X(P) by (36),

where

(40) I-0„ is the empirical df of the continuous X„ I , ... , X„„

(thus X. is the empirical process of the continuous a„,, ... , a„„). This funda-
mental equation,

(41) ✓ n [F„ — F„ ] = %„(F„) on (—oo, ao) for all n, holds a.s.

is the discontinuous case version of (19) ; note that in the case of continuous
df's the present X„ is the same as the X. of (19). We can think of (41) as
saying that the empirical process Jn[F„ — F„] only "looks in on” the reduced
empirical process X,, of the associated array of continuous rv's at points in
the range of F„. All we really need to learn from what is above in this subsection
is formulas (40) and (41) and this interpretation of (41).
In this general case we agree that

(42) Z. denotes the weighted empirical process (21) of the


continuous a,, 1 ,... , a„„

and we note that the process E. of (6) satisfies

(42') I=„ =7L„(F„) an (—co, co) for all n a.s.

Likewise the ranks R„,, ... , R„„, the antiranks D„,, ... , D„„, and the rank
sampling process 08„ are also defined in terms of the continuous a n ,, ... , a„„.
104 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

This is equivalent to agreeing that

(43) the ranks R 1 ,. . . , R„ n of X„ 1 , ... , X„„ are to be assigned by


breaking ties at random.

For these randomly broken ties, expression (30) for T„ with the R„ ; 's defined
in terms of the continuous a „ ; 's is an accurate representation of simple linear
rank statistics formed from arbitrary F„,, ... , F.
The key paper is van Zuijlen (1978).
A different reduction is treated at the end of Section 3.3.

Verification of Reduction

Proof of (9) and (10). Now G„ = F, ° Fn' = I by Proposition 1.1.1, so that


G„ is absolutely continuous. Also
;

(a) eni=FFi°°Fn(Xni)=Fni(Xn,) a.s. F,

(since Fn' ° F„ = I a.s. P. by Proposition 1.1.3, and hence a.s. F since F is ; ;

absolutely continuous with respect to F„.) Thus Proposition 1.1.2 and (a) imply
that

(b) i„; = Uniform (0, 1).

Since f„ = G(a) is Uniform (0, 1) with G continuous, Proposition 1.1.4


; ;

gives

(c) ant = G,

as claimed. ❑S

Proof of (11). Now

[a n , ^ t] c [G(a 1) ^ G,;(t)] _ [uni Gn;(t)]

_ [Gni l (Sni) - t] by (1.1.21)

_ [G ni1 (Gi(ai)) t]
_ [a <— t]
n; for all 0 ^ t s 1, a.s., by Proposition 1.1.3
since G,^; (G (a„ )) = a„ a.s. follows from a„ = G„
; ; ; ;

_[Fn(Xni)-t]

C [Fn ` ° F,,(Xni) Fn ' (t)] = [E(a1) Fn'(t)]


= o Fn (Xni) ^ F.^(t)]
_ [X , Fn'(t)]
X for all 0 < t <_ 1, a.s., by Proposition 1.1.3

DEFINITION OF SOME BASIC PROCESSES 105
[E(X 1 ) F„ ° F„'(t) =:] by Proposition 1.1.1
= [a m < t ]

_ [ '„, ^ G(t)] for all 0<_ t < 1, a.s., by the proof so far
c [ F.,. (en1) ^ F. o Gnn (t) ]

_ [F,,,'(nr) Fn,' ° Fn'(t )]

c [F 11 ( 1 ) F„'(t)] by Proposition 1.1.3


c [^„; ^ G(t)] by Proposition 1.1.1.

Thus we a.s. have that for all 0 <_ t < 1

(44) [a„; t]=[fn;^Gn1(t)]=[Xnr^Fn'(t)]=[F(fn1)^Fn'(t)]

= [Gn,'(4n1) ^ t] = [Fn'(an,): Fn'(t)]•

We also note that (8) implies that a.s, for all —oo < x < oo we have

[Xni C x] c[an,<Fn(x)]c[fn;^Gn,° Fn(x)]

=[4L FF;°F.'oF.(x)]
[ „; < F„,(x)] by Proposition 1.1.3

= [ F„ ( ) <_ x} by the inverse transformation Theorem 1.1.1

=[F ° FF; ° F.,' ° F,,(X 1) x]


= [Fn-; o F(X ) <x] 1 by Proposition 1.1.3
= [X„ <_ x]
; for all x, a.s., by Proposition 1.1.3
= [a„ „(x)]
; for all x, a.s., by the proof so far
_ [(a„ ) ^ x] by the inverse transformation Theorem 1.1.1.
;

Thus we a.s. have for all —oo < x < oo that

[X, x] _ [a, E(x)] ={ ° Fn(x)]


(45)
=[F. (nr) x]=[F.,`(an;) { x].

This completes our extension of (11). 0

Extended Glivenko—Cantelli Theorem

Consider an arbitrary array of row-independent rv's defined on a common


probability space. Thus X,,.. . , X„„ will denote independent rv's with
arbitrary df's F„,, ... , F„„ and empirical df F„. Let P. _ (1/n) Y_,"_, F„;.
106 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Theorem 1. We have

(46) IIF„ — Fn 1I —^a.s. 0 as n - oo,

at a rate that is independent of the F 's. ;

Proof. Now by (36) thru (41) we have

(47) IIFI—FnII!IlR—H^II'IIG — III,


where H, ... H,,,, is the associated array of continuous df's, and G. is the
reduced empirical df of the continuous array. Now, using (3.3.23) for a
fourth-moment bound,

(a) E6„(t)=t and E[6„(t)—t] 4 --4/n 2

gives E P(IC,,(t)—tl E) <.


n=1

Thus C„(t)-> t a.s. by Borel-Cantelli. Thus

(b) max II6n(i/M)— i /MII—,.a.s.0 asn-moo,


7,, 0si<_M

and complete the proof as in (3.1.83). See Shorack (1979a) and Koul
(1970). ❑

Some Change-of-Variable Results

Let X have an arbitrary df F. Let X denote the associated continuous ry


[recall (33)]

(48) X(w)=X(w)+YPil[d<x(w)]+Y-Pisil[xc,^)=e;i,
i i

where 4 , ... are additional iid Uniform (0, 1) rv's. Let F denote the df of
X. Then with

(49) D(x)-x+Y_ pi 1 [d ^ X] for—co<x<cc


i

[recall (35)] and with the left-continuous inverse

(50) D '(y) = inf {x: D(x)>_y},


-

we have from Figure 2 [recall also (36)] that

(51) F=F(D)
DEFINITION OF SOME BASIC PROCESSES 107
d2+P2 dz+d,+P2 +Pl d2+d1+d3+p2 +pl +p3

d 2 d2+d,+P2 d2+d1+d3+P2+P1


ID 1D - '

P2
Pt * P3

d 2 d, d3

Figure 2.

and

(52) X = D '(X ).-

Moreover, each empirical df F, of the observation X of an iid sample


; ;

X,, ... , X. satisfies

(53) F, =I, (D)


; ; for 1 <_ i- n,

where ; is the empirical df of the associated continuous ry X [recall (37)]. ;

Also, from (53) we have

(54) %I n [F„ — F] = U,,(F) on (—co, cc) a.s.,

where U, is the uniform empirical process of the rv's f = F(X,). We note that ;

(55) X = F `(f ) = D' o F `(;)


; - ; for all i a.s.

One other identification worth spelling out is

(56) (D ') '((—oo, x]) = {y: D '(y) ^ x} = (—cc, D(x)].


- - -

Now note by careful inspection of Figure 2 that for an arbitrary df F we have

J J
d,sx
x h(F) dF= D(x) h(F(y)) dF(y)

E
h(F(y)) dF(y)+
-(dj) d^sx
Y_ h(F(d,))p .;

Thus, by the change-of-variable Exercise 2.1.4, we have

f
(57) h(FF) dF= h(t) dt+
co-'
J
(' F(x)


dj^xfFF((dj)

_d,)
[h(F ±(d;) —h(s)] ds.
108 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

In particular,

f
x (X) X

(58) h(F_) dF<_ h(t) dtv h(F+) dF if h is 7;


-^ . F -^

the inequalities are reversed for h>,.

3. WEAK CONVERGENCE OF THE GENERAL WEIGHTED


EMPIRICAL PROCESS

We must temporarily set our statistical problems aside and concentrate on the
probabilistic behavior of the underlying processes. We maintain part of the
notation of Section 2. Thus

i n
(1) 7L(t)= — for 0 <_t^ 1
J CC;_^ cni[1[f,.G,,(,)]—Gn1(t)]
,

where, now

(2) Gn 1 , ... , G"" are arbitrary df's on [0, 1]

and where the constants c, 1 , ... , c" n satisfy the u.a.n. condition
2

(3) max--+0 as n - c0.


1i n C'C

Thus

(4) E7L (t) = 0 for all 0 <_ t <_ 1

and

(5) Coy [Z n (s), Z.(t)] = Z c[G(s A t) — G" ; (s)Gn ,(t)]


C C i1

for0:s,t<_1.

A key role will be played by the df

(6) vo(t)=— c„ ; Gm (t) for0<t<1


Cc ; =,

and its increment function

(7) Pn(s, t]= vn (t)— vn (s) for0<_s^ t<1.


WEAK CONVERGENCE OF THE EMPIRICAL PROCESS 109
In the special case of the reduced weighted empirical process of the a „ 's ;

of (3.2.42), it will be appropriate to entitle the next theorem ( ' of Z „, of the


a„^ S .
, )

Theorem 1. ( of 7L „) Suppose (2) and (3) hold. If

(8) a, = lim max v„ t -o as m -^ Qo,


n+m tsk^m mm

then

(9) Z. is weakly compact on (D, , II 1).

Moreover,

(10) 74 = someZ on (D,-9, jI II) asn-+ao

if and only if

K(s, t) = Coy [71„(s),Z n (t)]-+some K(s, t)


(11)
as n xofor0<_s, t 1;

and in this case Z is necessarily a normal process with mean 0, covariance


function K, and P(7LE C) = 1.

Corollary 1. Suppose (2) and (3) hold. If

(12) max 11G„ —III-+0


; as n -oo
151sn

then (8) holds and (11) holds with K(s, t) = s A t —st. Thus

(13) Z,, = UJ = Brownian bridge on (D, ^, I II) as n - oo.

Corollary 2. If (2), (3), and (8) hold, then

(14) lim lim P(w g „(1 /m)?e) =0 forallE>0


m- W n +co

for the modulus of continuity w $ , of Z. defined by

(15) w (S) = sup ^Z.(t) — Z (S)l.

Corollary 3. The reduced empirical process X. and the reduced quantile


process V,, see (3.2.13) and (3.2.16), of an arbitrary array of row-independent
110 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

continuous rv's necessarily satisfy

(16) X is weakly compact on (D, -9,

and
(17) 'V is weakly compact on (D, .9, II l).

To treat the general empirical process, recall (3.2.41).

The weak convergence of U,. to U is an historically significant result. It was


conjectured and applied, based on U. —^ rd U, in a landmark paper of Doob
(1949), and was then proved by Donsker (1952). For this reason we now
highlight it.

Theorem 2. (Doob; Donsker) U. U on (D, , ^1 (1) as n-^x.

The original papers dealing with 74, and R,, seem to be Koul (1970) and
Koul and Staudte (1972). See Shorack (1979) for reference to other authors.

Time Out for Proofs

Proposition 1. Let A, B, C denote a partition of the probability space and


let a = P(A), b = P(B), c P(C). Then

(18 r
) 1,, jIa [a(1—a) —ab
11 1, J bl —ab b(1—b)

Moreover,

(19) E([1 A —a] 2 [1 B —b] 2 )=a(1—a) 2 b 2 +a 2 b(1—b) 2 +a 2 b 2 c<_3ab

and

(20) E[1,,—a] 4 =a(1—a) 4 +a 4 (1—a)^a.

Proof. This is trivial. ❑

Let

(21) Z„(s, t]-=Z„(t)—Z„(s) for0<s^t<1

denote the increments of the 1„ process.

Inequality 1. For all 0 < r <_ s -5 t < 1 we have

(22) EZ,(r, s] 21 „(s, t] 2 --3v„(r, sIv,,(s, t],


WEAK CONVERGENCE OF THE EMPIRICAL PROCESS 111

(23) E7L n (s, t] 4 <3P.(s, 11 2 +1 max "'c CZ


i -n C
v„(s, t]

(24) EZ„(s, t] 6 — ( some M) <oo for all n.

Exercise 1. Show that

(25) EOJn(t) =31 1 —n It Z (1— t) Z +nt(1 —t) for 0<t<1.

Proof of Inequality 1. We essentially follow Billingsley (1968). Writing C for ;

c„ ; /.ic'c we have

Z (r,s]=ICiyf and Z (s,t]= L.Cisi,


I

where we let

y — Gni(r, S]
and

s+ — 1 [G„ (s)<f G„ (r)l — Gni(S, t]•

Now the y,'s are independent, the S ; 's are independent, and y is independent ;

of Sj if i 0 j. Thus
2}
c) 2 (
)

E {(

n
_ C'E[y?6 ]+ CZC;Ey?E6;+2y-y C?C;E[y 1 6 1 ]E[yj 6; ]

( i0j i*j

n
3 C4G (r, s]G„ (s, t]+3 Y-Y C i
„1 ; j Gn; (r, s]G„j (s, t]
( isj

= 3v „(r, s]v„(s, t]

proving (22). Also


In 4


E j C; 5 ; _ C °ES°+ C?C; ES?ES; +2
I i;j
as CZC; ES; ES;
12
=3 E C,E6j +Y C'E6° -3 C4(E3;) 2
( I I

3v „(s, t] Z +[max C1 C;E6° +0

s3v „(s, t] Z +[max C;]v „(s, t]

proving (23). ❑
112 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Exercise 2. Prove (24); note that we only ask for a crude bound.

Proof of Theorem 1 and Corollary 2. According to Lemma 2.3.1 (the


hypotheses are valid by Inequality 1) at step (a) and Inequality 1 at step (b):

c 4 P(WZ^(I/m) ^ E)

(a) EZ ,, k-1 , k14+3Kv„(0,1]


i max vn(
k–l ' _^]J
k=, t ` m mJ J
k —i k1 2 +v ^k-1 k 1
(b) [3v ( I max c"'^J
<
k=, " m' m J ” Jl <<+<n c c
m' m '

/ k-1 k l
+3Kv„(0,1] max v„{ ,— }
1 1^5k!5m ` m m 111

J(c)
k 1 k (
^(3+3K)v„(0,1] max v„^, +v„(0,1]j max
lz5kicm m m
c;
l lsisn C C '
-

'

so that applying (3) and (8) to (c) shows that giving Corollary 2

(d) lim P(w z (1/m)?e)->0 as m-.

Then {7L„: n ? 1} is weakly compact with all possible limiting processes on


(C, 16) by Theorem 2.3.2.
Suppose K. -^ some K. We first show that for any linear combination
/
(e) T„= > a; 7L„(t; ) we have
k - -^ T=N 0 k k aaK t tj J
j - I 1 i

If Var [ T] = 0, (e) follows from Chebyshev's inequality. If Var [ T] > 0, (e) is


a simple application of Liaponov's CLT using (3). Letting Z K denote a
zero-mean normal process with covariance function K, we have just shown that

(f) ?n >r.d. Zx
– when K„ -> K.

Combining (d) and (f) into Theorem 2.3.1 shows that

(g) Z„ = 7L K on (D,.9, 11 11) whenK„->K.

Suppose now that Z„some 1 on (D, -9, II II). Then Z. .fd1 by using
projection mappings for the f of Theorem 2.3.5. From -^ d and the uniform
bound of (23) on fourth moments, we conclude

h) Kn(s, t) = Coy [Z„(s), Z.(t)]- ► Cov [i(s), Z(t)] = K(s, t)

and EZ(t) = lim EZ „(t) = lim 0 = 0. ❑


WEAK CONVERGENCE OF THE EMPIRICAL PROCESS 113
Proof of Corollary 1. Now

v" cZ G (k-1 kl
l J 'mJ
-

m m c' c ^ ` m
(a)
r max G„,k-1 k 1
1 „
c„ <_ 2 ma x ^I Gni — j Il + l / ms
L m 'm
t_in C'C
;

so that
/ k-1
lim max v„ I , — ^ lim 2 max jI G -I II + 1/ m
n-+co ]k^m \ m m n- I^in

(b) = 1/m under (12)


(c) -*0 asm -o0.

Thus Theorem 1 applies. A completely analogous calculation shows that

(d) K„(s,t) -*sAt -st,

so the limiting process is Brownian bridge W. ❑

Proof of Theorem 3.1.1

Proof of (3.1.62) in Theorem 3.1.1. Replace c„ by d„ 1 where the d„ satisfy ; ;

d'd = c'c, max {nId„ i - c ; I/.lc'c: 1 <i<n }-*0 as n-*oo and the d, take on n
values, no subset of which sum to zero. Note that

(a) i'' (t) =^(d„ / ✓ d'd) {1 - t}


;

thus satisfies

(b) 1IMU„ -W „j)-*0 asn ->oo for every WE 11.

Since Wn trivially satisfies (8) and (11) with v„(t)=t and K(s, t) =sA t-st,
we have from Theorem 1 that W,, CNN on (D, 2, I) II) ; thus by Skorokhod's
Theorem 2.3.4 there exists processes W W d. and W* = W for which

(c) ^^OM^” -4AV*^^ -'a.5.0


as n -.> oo.

Now dN„' = dN^ implies

(d) Hn= dn ,1 [f ^. t =H*n = d'd91Ndn"+ndt.


=1
114 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Because of right continuity, the sample paths of H and H are both completely
determined by their values on the diadic rationals; and for each m ? 1, the
joint distributions of{H„( j/2 m ): 0j^2'} and {1Vü (jj2"'j: 0 j<2'Jarethe
same. Let a'„ denote the set of all possible sample paths of 6.0„. Then

p (Hn € X„)
= lim P(H *„ (j/2 m ) with 0 j <_ 2"' takes on n +1 distinct values)
m-W

= lim P(H,, (j/2 m ) with 0j - 2' takes on n + 1 distinct values)

(e) = P(H„ E Yr„) = 1.

Let A = n„ t [Hl n* E Ye„ ], so that P(A) = 1. For WE s' define 0< f„ , < • • • <
< 1 to be the ordered points of discontinuity of Moreover, for all
0=to<t'< ... <t^<t,,—=I we have

(f) P( :; <_ t ; for l s i n)


=P((H^ (to),...,Hn(tn+l))EB„) forsomet B„

=PO(to),...,Ha(t,,+l))EB„) sincel-O n=Q-1n


(g) =P(1st1for1^i<_n);

and thus

, t* \L
(h) (n:t, =(fn:t, ... , fl:n)-

Now let fin, ... , f *„ denote a random permutation of ^*;,, ... , 6* : ,,, where
each of the n! permutations is equally likely. Note that W. is the weighted
empirical process of , ^n„. Observe that [as in (b)]

(I) IIWWnI -^0 a.s.,

where W* is the weighted empirical process of 6*,, ... , f * :n with weights c „ i.


Combining (c) and (i) gives

Ü) JI W *n — W * IH —a.s. 0 as n -> oo.

Modification on a null set to remove the a.s. restriction in 6) completes the


proof.

t Step (f) is probably clarified by the following special cases. If all c,, ; = 1, we would rewrite (f)
asP(f„, ; <_t ; forlsi<_n)=P(nG,(t ; )?iforlsisn).Also,ifn=3then[i=,,. ; st ; forIsi31
is the union of disjoint events of the type [/:, s t,, f, ^ t„ t, < 2 s t z ]; each event of this latter
type is easily seen to satisfy (f) for appropriate B„ (using our conditions on the d,, 1 s).
WEAK CONVERGENCE OF THE EMPIRICAL PROCESS 115
(Note that we had to modify the original c n; 's slightly in order to guarantee
that H had exactly n jump points. If one c n; was equal to 0, say, then H.
would have had at most n —1 jump points.) ❑

Exercise 3. Assuming p n -' p, extend the previous proof to show that (IW, —
W11-> 0 and II U n — U II -' 0 for all w, as stated in Theorem 3.1.1. [Note that the
individual statement (3.1.60) is an immediate corollary to (3.1.62)].

Proof of (3.1.61) and (3.1.63) in Theorem 3.1.1. Now (3.1.61) holds since
(3.1.12) gives

Il V,—VII =II—vn(G ') +U +Jn[G. oG„'—I]ll


Ilv( 3 )—v(G')lI+lIU(G)—Ul °—III
IIvn— UlI+IIU(G ')—Ull+li.n
-'0

by (3.1.60), I JG' —III -^ 0, and the continuity of the sample paths of U.


Also (3.1.63) holds since (3.1.36) gives

Ills—WII = JIW n (^ n ')—aw(^ n ')+W(^ n ')_WII


IW—WII+IIW(') —W
-0

by (3.1.62), III n' —III -' 0, and the continuity of the sample paths of W. ❑

Exercise 4. Show that for all disjoint intervals (say, [0, r], (r, s] and (s, t])
we have

IE 7Ln(r, s] 37Ln(s, t]I^ 3 vn(r, s]vn(s, I]

and
(E7L n (0, r] 2 Z,,(r, s]7L n (s, t]J ^ 3v (0, r] {vn (r, s] A vn (s, t] }.
,,

Theorem 3.1.2 is proved in the next section.


Proof of Corollary 3. Since Y_; Gn; / n = I by (3.2.10), the function vn of (6)
satisfies

(a) V. = I

Thus (8) holds, and Theorem 1 gives (16).


116 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

On any subsequence n' where SS„.(some X 0 ), a Skorokhod construction


of the type detailed in the proof of Theorem 3.1.1 earlier in this section shows
that IIX —X I1 >a.s. 0 for processes X*,I=— C„•. Thus the identity (3.2.17) shows

that

511x„,—x0*11+Ilx`Ö(G ')—X +1/'.Jn'


(b) a.s. 0,
since Theorem 1 gives P(Xo E C)-1. Thus V=—X' on (D, 2, () ). Thus
V„ is weakly compact. 0

Remark 1. We have proved above that — W 11— ,, 0 for the special construc-
tion of Theorem 1.1. Thus the modulus of continuity w R , of R. satisfies

wR„( 1 /m)= 0se—ss1/m


sup JR,,(S, t]I

(26) ^ww(l/m)+211R„—WII
=w w (1/m)+Op (1) as n-goo,

where w w (1/ m) --+ as , 0 as m -' o3. Thus, in analogy with (3.3.14),

(27) lim lim P(w R ,(1/m)>s)=0 forall e>0


m—w„-oo

under the hypotheses of Theorem 3.1.1. 0

An Alternative Reduction of the Weighted Empirical Process that Always


Works

Suppose that X,, 1 ,. .. , X„„ are independent with df's F 1 ,. .. , F„„. Let
X,, 1 ,. . . , X„„ with continuous df's H„,, ... , H„„ be the associated array of
continuous rv's of Section 3.2. Thus, as in (3.2.52),

(28) X„i = Dn'(X„r)

where, as in (3.2.35),

(29) D„(x) = x+Y p„J l [dX] for —m <x <co


i

with p„,, p„ 2 , ... denoting any enumeration of the discontinuities of F. and


with d„,, d 2 ,... denoting the corresponding magnitudes of discontinuity. We
recall from (3.2.34) that

(30) [XX1 <_ x] = [X„ ; D„(x)] for all n, i, and x.


WEAK CONVERGENCE OF THE EMPIRICAL PROCESS 117
Shorack and Beirlant modified Section 3.2 and defined

(31) ßn; = Hn(Xni) = Gn; = H„ ° H^' ; where fin = cniHn;l c'c,


i=1
n
(32) „; = G 1 (ß) = I where G„ = Y cn i G„ ; /c'c = I,

(33) [ßn;mot]= ^Gn;(t)] for 0^t I a.s.,

(34) [Xn, ma x] = [ßn; Hn(x)] = [ ni ^ Gnr(Hn(x))] for —< x < oo a.s.

Combining this with (30) and


n

(35) 11„(Dn (x)) = En (x) where F„ _ Y c nni Fni /c'c

gives

(36) [Xn, x] = [ßn, Fn(x)] = [fn; ^ F,,(x)] for —oc < x <x a.s.

Note that the f n; 's of the present ß -construction, call them „; = f ni , and the
's of the earlier a-construction of (3.2.42), call them ^„°;, are a.s. identical in
that

(37) fin;°6ni=fni=Hn;(Xn1) for 1 ^ i< n, n >_ 1.

We now let Z. denote the weighted empirical process of the ß„ ; 's; thus
n
7Ln(t) E
, Cni{1[/3,,i_t] — Gni(t)}
J c c;_^
(38)
— VCi 'c i=1
n
C ni{ 1 [fn^G,dM)J — Gnilt)}

for 0: t< 1. Moreover, (36) shows that

i
(39) c E ' c {1 [x .. —F„ ; }
n
En= Jc ni 1

i
( 40 )Y_ ./c'
c
n
Cnr{ 1 [.F(.)] — F„;} a.s.

i n
(41) _ Cn; {1 ] —Gn,(Fn)} a.s.
✓c c

(42) =Zn(Fn)•
118 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Now

(43) Ell"(t) =0 for0< t<_ 1

and, using O. = I from (32),

Coy [Z"(s), 74(t)] = — Y_ cn;[Gr(s A t) — G(s)G",(t)]


c c ;_,
1 " / /
(44) = SA t — —
, c ni Gn1(s)Gni(t)•
CCai

Note that the identity

(45) If8„ =7L"(4^n`)+ c";G";(( R


✓c c

of (3.2.29) still holds provided we understand that

i n
(46) G„(t)=— 1 tß <,^ is now the empirical df of the ß,'s
n

and & is linear between the distinct

Theorem 3. (= of Z,,, of the ß's) Suppose

(47) max c„ Jc'c-.0


; as n-* X.
i

Then we necessarily have

(48) 7L„ is weakly compact on (D, ,

Moreover,

(49) 7L” some l on (D, 2, I fit) as n -> x

if and only if

(50) K"(s, t)=Cov[Z"(s),Z"(t)]-some K(s, t) as n-cofor0s, t 1;

and in this case Z is necessarily a normal process with mean 0, covariance


function K, and P(Z E C) = 1.

Proof. We consider once again the proof of Inequality 3.3.1; however, we


apply it to the process Z. of (38), not the processes Z,, of (1) or (3.2.21). That
SPECIAL CONSTRUCTION FOR A FIXED NEARLY NULL ARRAY 119
proof applies verbatim, but the conclusion simplifies since vn (s, t] = t – s. Thus
in the present case we have

(51) EZL n (r,s] 2 Z (s, tj 2 S3(s–r)(t –s) for0r^s<_t<_1,

(52) EZ n (s,t]'^3(t–s) Z +(t –s)[ max c /c'c


1 ^ i^ n 1 for0<s_<t^I

(53) EZ,,(s,t] 6 --(some M) <X for all O<s<_t<_1.

Now consider the proof of Theorem 1. It applies verbatim; but since we now
have vn (s, t] = t – s we do not require condition (8). This completes the present
proof. We single out for display the fact that this proof has established that

(54) P( (1/m) >_ e) <_ s ° K j (1/m) +[ max c^, ; /c'C


111 1`i <n JJJ

for some universal constant K. Thus

(55) lim lim P(w Z „(1 /m)?e) =0

also holds. a

4. THE SPECIAL CONSTRUCTION FOR A FIXED NEARLY NULL


ARRAY

In this section we will consider a triangular array of row-independent rv's

(1) a,,,

1
(1')
X; n > 1} having continuous df's {F 1 ,... , F; n > 1 }. As in Sec-
tions 3.2 and 3.3

F n (X n i) = G,,1 F,,1 o F n

13ni — Fn (Xni) = G'1 = Fn i ° Fn


1

1
where F. = –
in
n ; =1


where Fn =— Y c 2n; Fn; ,
Y_
C C i=1
n
Fni,

1 n
(2) G,;(an;) = I and –
n ;
E
=1
G n; = I,

/^ 1 n
(2') 6ni = G ni(F' ni
' ) a.s. and -- c 1GPi = I,
C C 1

(3) [a n 1 ^t]= n; ^G, 1 (t)] for 0<_t<_1 holds a.s.,

and

(3') [ßn; <_ t] = [an; < G„ ; (t)] for 0-5 t 1 holds a.s.
120 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

We consider the processes

I n
(4) X (t)= = [ 1 [f,a.,(r)] Gi(t)] for 0<_ t 1,
✓n
i n (
(5) 7n(t)= C ni [i [f....G,.t( , )] — Gni(t)] for 0 tC 1,
✓ CCi_^

_ 1 ^`
(5 ' ) ZR(t)— L Cni[ 1 [&. GpJ , )) — Goni ( t)] for 0<_t<_1 ,

✓ C C i =1

and W n and Ih n of Section 3.2. We assume that the continuous df's


{Fn ,, ... , F,,,,; n ? 1 } form a nearly null array in that

(6) max F,,1 — Fn;jH 0 as n-*co.


Isi,jsn

Since this maximum approaches zero, the weighted average (on j) approaches
zero; thus (6) and the definition of G, i and G„ ; in (1) and (1') imply
m ax II G ni — I (H 0 and max ^I Gp ; — I II -> 0 as n -* oo.
(7) te isn 1si<_n

From Theorem 3.3.1 it is natural to think that if {c,, 1 , ... , Cnn ; n >_ 1} satisfy
the u.a.n. (3.1.58), then (recall Corollary 3.3.1 and Corollary 3.3.2)

X n U and 1,, = W in some joint sense,

where

(8) U and W are Brownian bridges

having covariance function (3.1.59). We now phrase this conclusion in the


spirit of Theorem 3.1.1.

Theorem 1. Suppose the continuous df's {Fn ,, ... , F,,,,; n >_ 1} are nearly null
as in (6) and the constants {C,,, ... , c nn ; n > 1} satisfy the u.a.n. condition
2

(9) c i -^ 0 and p n -
max n p as n -> oo.
I i -n C , C Cn

Consider the special construction if,,, ... , i,,,,; n ? 1}, U, W of Theorem 3.1.1
that satisfies

(10) Coy [U(s),W(t)]=p[sAt —st] for0<s, t<1.


SPECIAL CONSTRUCTION FOR A FIXED NEARLY NULL ARRAY 121
In addition to the conclusions of Section 3.1, these same random elements
also satisfy

(11) l^RC n -OJ^1^0 asn-> c0

(12) IIZ n -WII -4 4 0 as n-*o0

(12') ^I7L„-WI1-0 asn-*x

provided these same in; are used in defining X,, and Z. in (4) and (5). Moreover,

(13) an, =G.,;( ni)=Gni forI!si^n

(13 ' ) ßni = (G 1 )'( 1 ) = GR ; for 1 i <_ n

(14) Xn; =F„'(fn; )=Fn; for 1<i<_n

may be assumed, for convenience, to satisfy

(15) O =Crn;p<IXn;l<• •<an:n <an:n +1 —I foreverywEC1

and

(16) Xn; , < • • '<X,,,, for every w E d2

(provided at most a countable number of different triangular arrays of con-


tinuous df's are considered at any one time). If the assumption p n -* p as n - o0
is dropped, we can still claim either (11) or (12), but we cannot claim that
they happen simultaneously on the same (11, .si, P).
Proof. Let U. and W,, be as in (3.1.7) and (3.1.21). Let

O n (t)=W n (t)-1 n (t) forO t_1


(a)
in
C ni{['If] — t — [ l l..Q,( , )l —G nf (t )]} .
J C C ; -1
]

Now EA.(t) =0 and

(n^

Var [A „(t)] =—
, L
cc ;_, Cni{I G,, (t) — tl[ 1— IGnf(t) — tl]}

max ^I Gni —I ll
=isn

(b) -0 as n -* m.
122 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Further, for any £ >0 and for some fixed large m = m F ,

P(II ^3E)

(c) P( (i/ ►n)?e)+P(w (1/m)?e)+P(w(1/ ►n)^e)

(d) ^ 3E for all n ? some n E

by applying Chebyshev's inequality and (b) to the first term in (c) and by
applying (3.3.14) to the second and third terms in (c). Since e > 0, we have
shown

(e) &I --te n 0 as n -* oo.

Thus, using Theorem 3.1.1 and (e) we have

(f) DZ -WHä„11+SINN„- Wal ---► p 0,


which is (12). Statement (11) is a special case of (12). We achieve (15) and
(16) by making modifications on null sets. Replace (3.3.14) by (3.3.55) for
7L„. ❑

Stochastic Integrals

Recall our definition of .2 , f [h]j, (h, h),

hdt, Q;,=I[h-h]I2=Jh2dt-ham,
hJ 0 0
(17) (' 1
Q h,(h-h,h-h)= J hhdt-h h -
from Section 1. Now (with X,, Z„, and Z as in Theorem 1)
I n
(18) h dX n =— Y1 [h(a;) - Eh(an;)]
0

and
^ n

(19)
J o
hdZL =- 1 Z cn;[h(«n;) - Eh(an;)],
✓c,c +=I

(19') h d7° =.^ c„r[h(ß;) - Eh(ßnr)]


would seem to converge jointly to rv's, all of which are N(0, o ). It is now
-
SPECIAL CONSTRUCTION FOR A FIXED NEARLY NULL ARRAY 123
our purpose to define N(0, 0.h) rv's, to be denoted by Jä h dU and jö h dW,
on the probability space ((1, .s^P, P) of the special construction of Theorem 1 in
such a fashion that both

hdX„= J dU and hdZ,, =J hd7L„= hdW


J o ° o h o o ° o

asn ->co

and the joint distribution of U, W, J h du, and ,(ö h dW is known. To obtain


only -mo d, and not --> P with a representation for the limiting rv, would be
nothing new. Some mild restrictions will be needed.
One part of the next theorem requires that we assume either

(20a) F„,=. • • =F F „, n >_1

or

nC2
(20b) cc some K < oo for all 1 < i <_ n, n ? 1

or

(20c) max G„ ; (s,t]<_K(t —s) forall0<_s<t1 and some K>0


I<l n

or

(20d) v„(s,t]^K(t —s) for all 0^s<_t I and some K>0

or

(20e) h and h are continuous functions on [0, 1] having bounded variation

or

(20f) condition (27) below holds.

Theorem 2. Suppose the continuous df's {F,,..... F„„ ; n >_ 1} are nearly null
as in (6) and the constants {c,,, ... , c; n >_ 1} are as in (9). Consider the
special construction { , ... , i,,,,; n >_ 1 }, U and W of Theorem 3.1.1. Let
h, h E Y2 • Then there exist rv's

(21) I
f o ,
hdU=N(0,ah) and I hdW=N(O,o )
Jo
124 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

for whicht

(22)
f. '
hdX„ =
a0
hdU and Jhd7L° = hdW.
o a

Further, if one of the versions of (20) holds, then we also have


1 1

(22') I h d Z„ = I !dW.
0 ° 0

Moreover, U, W, f ,, h du, Jö h d U, Jö h dW, and Jä !d W are jointly normal


with the covariances stated in Theorem 3.1.2.

Remark 1. Our proof will define J h dU and Jo h dW so that (21) holds and
the covariance formulas hold true for any appropriately correlated Brownian
bridges U and W. We insist on using the U and W of Theorem 1 in Theorem
2 so that (22) also holds true in the sense of ---* , instead of just — d. Thus we
will get representations of our limiting rv's.

Proof. Consider first the case when

h(t)=1 [o , r] (t) for0^t<_1.

Then clearly

hdi„ = ^, Lcni[l[o,r](ani)—Gni(r)]=—J ZL A.
0

This extends immediately to step functions of the form


k
(23) h(t)= a; l ( , 1 , ] for 0=t o <t,<• • •<t k =1
i =I

to give

(a) Yn= hdi„=— Z„dh


0 0

n
(b) /^
(=1 ^ E cni[h(ni) — Eh(eni))if all G„
'IC C
; =I

of the integration by parts relationship in (a), we are motivated to

t We write A. —B,, if A„ — B„—^ 0 as n -. oo.


a
SPECIAL CONSTRUCTION FOR A FIXED NEARLY NULL ARRAY 125

the following definition:

(24)
f o ,
h dW= — W dh
Jo
for step functions h as in (23),

and we let Y = 0 h d W for ease of notation. Theorem 1 implies

(c) IY„— Yl=I — J(4— W)dh <_ 11 4—W11


0 f , d^hj^0
o P

for total variation measure dlhI. Thus (22) holds for the step functions (23),
and (21) follows by Exercise 3.1.2.
Since the step functions (23) are I[ ]) dense in £2 , given any h E Yz we can
choose a step function h E having

(d) Uh, <_ J[h — hi]! 2 = (h — h ) 2 dt< e 3 /K.


e

Then with Y = — f 0 W dh. we have


E

Yn —Y,,,= hd7L n — hd7L,„


0 0

(e) = J 0
(h — h,)d7L„+ J
0
h d7L„
E — Y
f

—J0 (h —h )dZL — J
'
E o h r,dl m —Yr ,
1
so that for m A n >_ some N E Chebyshev and (c) at step (f) give
! (' I

P(I Y„ — Y„,I >4e) s P1 J (h — h E ) dZ„ >_ e


` /

+
P( J h,d Z , — Y I ? e J + (two analogous terms).
o
Y

(f) sVar[ J (h —h e )dZ„] /e 2 +e+E

) dZ m /e 2 .
+Var
[J1 (h — h
E ]

0
126 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Seeking to bound the variance in (f) we note that

Var1 f (h-h,)dZ I

L 1 " J
1 1 2
J (h-h )
('

_—
e c i_)
c„ ; e 2 dG” -(; (h-h e ) 2 dGn;/
0 0
1

J 0 (h-h,) d
(

1 ^ci (h-h f ) 2 dG n; = 2 jc ,'^G"i]


cc1 0 ,cc

(g) = E(h - h,) 2 dv"

where, as in (3.7), for all 0 <_ s <- t <_ 1

v lss t] i Y_ Gni(s, t]
"
(
— C
1 "
-
1
c
/

(t - s) if (20a) holds
K (t - s) if(20b)holds
K(t-s) if(20c)holds
(h) <_ K(t-s) [i.e., (20a)-(20c) are all three special
cases of (20d)].

Thus (g) and (h) give, independent of all F" 's satisfying (h) with a fixed K, that
;

(i) Van l
Lfo,
_Kf (h-h,) 2 dt<e 3 .
(h-h E )dZ" <
J o,

We thus obtain from (f) and (i) that

(j) P(IY,,-Ym, 4E)<2E+2e'/s 2 =4s.

Since statement (j) says that the Y. are a Cauchy sequence in probability,
there exists a ry Y for which

(k) Y"->PY asn - co and Y=N(0,o )

since the Lindeberg-Feller theorem of Exercise 3.1.2 establishes for the sum
representation of Y. in (b) that Y. * d N(0, oh). We then define Jö h dW to
be Y, and rewrite (k) as

(1) f . 'h dZ" - J


0
h dW=- N(0, 0,h) as n -* oc.
SPECIAL CONSTRUCTION FOR A FIXED NEARLY NULL ARRAY 127

Joint normality of any finite number of W(t ; )'s and Y = (ö h; M's likewise
follows by applying the Lindeberg-Feller theorem to an arbitrary linear combi-
nation of the same W n (ti )'s and Y;n = J h, dW 's. Since coupled with this - d
we already had - ,' to a limiting vector, it is clear that the covariances among
the W(t ; )'s and Y; 's equal those among the W n (t1 )'s and Y,,'s. These are

11
1 n

ov W n (t) , hdW = -- CnilEl[0,^]l Sni)hl ni )— tEh( ni))


0 CC i=1

1 n 1
c^,, 1 o , 1 (s)h(s) ds—th}
c'ci -1 tj0

(25) =j
t E h(s) ds — th} = Qn,1 1 ,., 1

and
j' 1 1 ('

Coy I h dW , l h dW n 1=—;— Cn 1 Cov [h( ni ), h( n; )]


.l0 Jo J

(26) ={ f
h(s)h(s) ds—k} =cT n ,n.

If we set cni = 1 for all i, then 74, reduces to X,, and the very same proof
works for X,,. (The ordinary CLT could replace the Lindeberg- Feller version.)
The covariances for X„ and Z n are analogous to (25) and (26). If all c ni = 1,
then (20b) is automatically satisfied; thus any version of (20) may be omitted
from the formal hypothesis if only X n is considered.
We note that the essential part of inequality (i) is that

for all r >0 we can choose a function h F for which (a) and (c) hold
(27)

and lira (h—h,) 2 di',, <r 3 .


n-x Jo

The theorem also holds under this condition. We note that both (a) and (c)
hold for functions h satisfying (20e), so that (27) is not needed in this case.
To prove the Z result we merely note that for the process Z of (5') the
corresponding P. function is v„(s, t] = t—s; thus (h) necessarily holds. ❑

Definition of J0 h dS and Its Relation to J h dU

Suppose for a Brownian motion § we define

(28) 1
0
h d S = h (1)S(1) — § dh
1 J 0
for the step functions h of type (23)
128 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

and then extend our definition of Jö h dS to all h E.2 , as in the proof of


Theorem 2.

Exercise 1. Suppose W is the Brownian bridge of Theorem 2 and that Z is


a N(0,1) ry independent of W. Let S(t)=4W(t)+tZ for 0<—t<1, so that §
is a Brownian motion (as in Exercise 2.2.1). Note that Z = S(1). Show that

11 Cl
(29) ^ 1 hdW= I hdS—Z I h(r)dr forallhE.'2 ,
0 Jo Jo

that ,(ä h dS is normal with mean 0, and

(30) Coy J h dS, f hdS =Jhhdi for all h,hE.T2 .


0 o, I o 0

Now define

(31) fhdsJ
= h1 [o ,, ] dS and J hdW= h1 10,, ] dW.
o o 0 0

It follows from Exercise 1 that

Cov h d§, J o h d§1 =


f
,,I AI
hh dr
(32)
L f.
s

for all 0 75 s, t< 1 and h, h E 'Z .

Also Jö h dS, for 07 t <— 1, has independent increments; thus

(33) ^ t h dS, for 0 t <_ 1, is a martingale for each h E Y 2 •


0

Also note that

(34)
f
Cov I S(s), I h d = I
o, J Jo
h(r) dr

for all 0 s, t _< I and each h E 22 .

It also follows from Exercise 1 that

(35) I hdW=
0
fI 'h
odS—S(1)
Jo
h(r)dr for0<—t<landallhE2 2
SPECIAL CONSTRUCTION FOR A FIXED NEARLY NULL ARRAY 129
and

36) Cov I J o h dW, J o li dW I = fs ^' hh dr— J os h dr J o h dr

and all h, h E 22 . Also


('
for all 0<_s, t: 1

lf
f
1 SAI

(37) Coy W(s), Jh dW = h dr — s h dr


0 I .,

for all 0 ^ s, t < 1 and each h E 22 . Similarly, (35), (32), and (34) give
(J s ft l f ',1nt t
(38) Coy d h S,
00
h dW = h
hhdr—
J fos
dr I h dr
o

(i.e., Cov [S(1), Jö h dW] = Cov [Z, Jä h dW] =0 for all t).
Since ,(ö h d S is a martingale on [0, 1] for h E 22 , the Birnbaum and Marshall
inequality (Inequality A.10.4) and (32) imply that if

(39) q is / on [0, z], \ on [Z, 1], and J 1 (h/q) 2 dt <oo,


0

then for all Os O<_ 2 we have

P \II\Jo hds)/qll o^E)<e


-Z f 0 [q(t)] -2d{E ( f }
hdS)2r

e 1
= s -z [q(t)] "2d h 2 (s) ds
0 0

(40) =e 2 [h(:)/q(t)] 2 dt for q as in (39),


J
with a symmetric result on [1— 0, 1]. Since IJö h(r) drl <_ (J, h 2 (r) dr)" 2 and
since IS(1)I(J h 2 (r) dr)" 2 is a submartingale for 0 <_ t <1, the Birnbaum and
Marshall inequality also gives

,S(1) h(r)
P (IIL fo, dr]/gJIlo> E)
rt /2
h2(r)dr)1
cP ( 1I§(1)I\Jo /q110>E/
e
E -2 [q(t)] -2 d 2 (r) dr}
I E(S2(1) fo h ,

41) = e -2 [h(t)/q(t)} 2 dt for q as in (39).


J0

130 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Combining (40) and (41) into (35) gives for all 0<_ 0< 2 that

(42) P( J
( o
h dW
)/ g ll oß_28)
2r [h(t)l q(t)] 2 dt

for q as in (39),

with a symmetric result on [1-0, 1 ].


We now consider a sequence of processes J h„ d W. Applying both (42)
and its symmetric version on [1-0, 1 ], with 0 =.‚ gives

P (ll( J h„ dw
— ohJ dW)/qll
^2e)

n — h) dW)/qfl ^ 2e)
= P (11 Jo (h

(43) ^ 2e-2 [(h n — h)/q] 2 dt for h., h EY2 and q as in (39).


fo,
The same result holds if dW and 2e is replaced throughout by dS and E.
Finally, note that although we assumed q symmetric for ease of presentation,
h and q could be "balanced" differently in the two tails.

Exercise 2. Provide conditions under which

(44) KJO h„dS


n J
— o
hdS)/qjj—'°0
for a special construction of the partial-sum process §„. Note that

(45) S. is a martingale on [0, 1].

Note also that


<=
h„ d§„=— h n X)(46 ,, for0<t1.
;

o V r=t i
n (

Exercise 3. If AA is Brownian motion on [0, 1], then

` t— r
(47) U(t) ° A^II(t) — 0 1—r dM(r) is Brownian bridge for 0< t <_ 1.

Then show that we can get Nl back via

(48) M(t)=U(t)+ fo U(r) dr for0t<_1.


' 1—r
THE SEQUENTIAL UNIFORM EMPIRICAL PROCESS K, 131

Finally, show that for h E'2 , we have

(49) J 0
hdU=Jhdi1_Jh(r)
0 0
Jj -

0 —s
-dR1I(s)dr

(50) f
= o I h (s) — 1
,
s I SS h(r) dr] du(s).

5. THE SEQUENTIAL UNIFORM EMPIRICAL PROCESS K.

For independent Uniform (0, 1) rv's f,, 2••• we define


1 (">
(1) ✓ ni
K„(s,t)=[1ro,7( ,)—t] for
1

to be the sequential uniform empirical process. Note that when s = m/ n the


process EK (m/n, t) equals I7n times the uniform empirical process U. of
4„,. We recall that

(2) K will denote the Kiefer process

of Section 2.2; thus K is a normal process with continuous sample paths and
covariance function

(3) Coy [K(s,, t,), K(s2, t2)] _ (s, A s2)[t 1 A tZ — tItz].

Proposition 1.

K f.d. K as n -'

Proof. This is a simple application of the multivariate CLT that parallels


Exercise 3.1.1. ❑

Theorem 1. (Bickel and Wichura) Let T=[0, 1] 2 . Then

(4) Kn =IIIon (Dr,^T, II 11) asn-*cc.

Proof. We will appeal to the natural generalization of Theorem 2.3.6, found


in Bickel and Wichura (1971). Let denote ✓3 times Lebesgue measure. We
will verify the key step that

(a) E{K (B,)K,(B2)}<_ p.(Bi)pi(B2)

for all neighboring "blocks" B, and B2 . There are two types of neighboring
blocks.
132 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Suppose first that

(b) B,=(a,b]x(r,s] and B 2 =(b,c]x(r,s].

Then independence of the f; 's gives

(c) E{K(BI)K„(B2)} = Elö„(B,)El(„(B 2 ),

while
nb — na E E, b,^a+I [I[ ,s](e ) — (s — r)] 2
,
Ellö n(t)
B =
,

n nb —na
_(b—a)EUJ m (r s] where m nb — na
=(b—a)(s—r)

(d) µ(BS)

with the analogous statement for B 2 .


The other case to be considered is

(e) B,=(a,b]x(r,s] and B 2 =(a,b]x(s,t].

Then with m = nb — na, we have

E{K„(B 1 )K n (B2 )} = (b — a) Z E{U ,(r, s]U ,,(s, t]}


<_3(b—a) 2 (s—r)(t—s) by (3.3.22) with v„(t)=t

(f) =µ(B1)FA(B2)•

Now apply the Bickel and Wichura (1971) theorem, using Proposition 1.

This process is discussed further in a later chapter.

Exercise 1. Verify that Ui,, = lü on (DT, T, V II) if

1 (s)
K,(s,t)= cflI{1( . Gcr»—Gn,(t)] for0<—t<_1
✓ cc

and (3.3.2), (3.3.3), (3.3.8), and (3.3.11) hold with the K(s, t) of (3.3.11) being
SAt — St.

6. MARTINGALES ASSOCIATED WITH U., V,,, W., AND R.

In this section we enumerate a number of martingales associated with the


processes of Section 1.
MARTINGALES ASSOCIATED WITH UJ,,, V,,, W,,, AND R. 133

Proposition 1. Fix n ? 1. Let q,=— o [ 1 [6,,, t : 0 <_ r <_ s, 1: i < n ]. Then

(1) {v"(t)/(1 — t): 0 s t< 1} is a martingale with respect to a„


(2) {MV"(t)/(1— t): 0z t< 1} is a martingale with respect to a„

and if p,, ; = i/(n + 1), then

(3) {V( p1)/(1 —p 1 ): 1 ^ i <— n} is a martingale

and

(4) {Rn(p";) /(1—i /n):O^i<_n -1} is a martingale.

Proof. Now for 0 <_ s < t we have for each 1 <_ i<_ n that
t—s
( 5 ) E(1[e.:JI05)= 1 [f; =.s]+ 1 -5 1 [SGf;]•

Thus [check the values of (5) and (a) in the two cases e; < s and fi > s
separately]

(a) (1— s)[E(1[<,]Io-s)— t]=(1—t)[1[ < ) —s].

Thus (a) gives

E I X5 1
[ ^V " (t)
^_ ►

_ _
1—t J 1—t

s Cni 1ts] — Wn(s)


(b)
i=1 ✓ c'c 1—s 1—s

and (2) holds. Let all c, ; = 1 in (2) to obtain (1).


<k<j]. Then for j <i
Now let crf = 0"[4/"(p„ k ): 1_

E( V "(P"') I
Q) = ✓ n(E(f":iI ^i) — P";)l( 1 — Pni)
Pni
—j i n+1
=./n " :;+ (1— " :;) — n+1 n —iA -1
n — j+l
(c) (n+1)f":;—J / "(P"i)
=
n[ n—j+1 J 1—p ",

which establishes (3); see Braun (1976).


The proof of (4) will be left to Exercise 3(ii) below. ❑
134 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Exercise 1. Verify for each n that

(6) {(l+t)U (t/(l+t)): t >_0} is a martingale.

Exercise 2. Verify that for Brownian bridge U

(7) {U(t)/(1 — t): 0^ t < 1} is a martingale

and recall from Exercise 2.2.2 that

(8) {(1+t)U(t/(1+t)): t >_ 0} is Brownian motion

(and hence a martingale).

Inequality 1. (Pyke, Shorack) Let q denote a positive, increasing, right-


continuous function on (0, 1). Let 0< B <_ 1 be fixed. Then for all n >_ 1 we have
"

'[q(t)]-Zdt/Al2 for allA>0.


(9) f
We may replace U, by W. or U in (9).

Proof. Now U, (t )/ (1 — 1) is a martingale with second-moment function i'( t) _


t/(1— t). Thus by the Birnbaum and Marshall inequality (Inequality A.10.4)
with g(t) = Aq(t)/(1 — t), we have

(a) P(IJU /gjjö>_A)<_! e [g(t)] Z dv(t)


0

(b) = [g(t)]
J0 B -2 (l f
— t) -2 dt = oe [q(t)] -2 dt/A 2 .

See Pyke and Shorack (1968) for the first version of this inequality. Note also
Häjek (1970); he decreased the constant on the right-hand side to one.
Now U(t)/(1 — t) is also a martingale with second-moment function v(t)
t/(1— t). So is W (t)/(1— t). Thus the same proof works for them. 0

Inequality 2. Let q, : • •:q„ and let 0 < 0 1 be fixed. Then for all n>-1
and all A >0 we have

1 ( nO > 1
(10) Ps" max IV^(P^1)) Al j--
2•
/
1 i ne 4. A n 9.

and
IRn(i/(n+1))I A - 1 ( no> 1
(11) P max

Imsi!sno q; — AZ(n-1) I q,
MARTINGALES ASSOCIATED WITH U,,, U,,, W,,, AND Q. 135
Proof. Let p; =i/(n+ 1). We will use Proposition 1 and the Birnbaum and
Marshall inequality (Inequality A.10.3) for our inequality below, and then use
Proposition 3.1.1 to evaluate variances. Thus

IV. (P"i)/( 1 — Pi)I ^A


P max
. e q./( 1— pi)
c ( n o)
( 1— pi) 2 I EV"(P) _ EVn((P"_ -0
(a) A2gz
L (1 — pi) 2 (1—p11)2 J
< > (1 — pi
n Pi Pi- i
(b) =
n+2 , A q; )
n "B 1n+1—i 1
n+2 A 2 q; ^n +2—i n+1
1 ("9) 1
(c) 92>
A2 n

as was claimed. We leave (11) to Exercise 3(ii) below. El

Exercise 3. (Variance correction for finite population sampling)

(i) Suppose an urn contains n identical balls labeled c„,, ... , c„, and m balls
are randomly sampled without replacement. Let Y; denote the value of c„ ;

sampled on the ith draw. Then

0 =Var[ Y; ]= nVar[Y,]+n(n - 1) Coy [Y,, Y2 ]


I JJJ
so that

(12) Coy [ Y , Y] ; _—1 o ,, if i 0 i'


n-1

and

”' 1 (^ m1 1
Y/m = m t 1 n -1 for1 msn,
(13) Var
Li 1 J
where
1 /
14 mac,,, — ( Cni — C„) 2 .
n ;_,
( )
136 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Also, for constants d, we have

(15) Cov [
n 1 i = I n
dnil'i, 1 i dnjY; J

1 _n ;-^
c,n
mm'
z I z 1 1
d n i — — ni d — dnj )].
n n i -, n;_,

Verify these easy calculations.

(ii) Now verify (4) and (11).

Remark!. Theorem 3.1.1 showed that if cn = 0 and max {c/ c'c: 1 < i n} - 0
as n -> oo, then IIR n —WII —* a . s . 0 for a special construction; this implies
ff (t) --> d N(0, 1(1-1)) for each 0<t<1. In the finite sampling notation of
the last problem, we note that
I m
(16) R n (t)= c Y ^ Y; with m=((n+1)t).

Proposition 2. For each n

(17) {t3(t)/t: 0< t< 1} is a reverse martingale


and
(18) { fin :i / i: I<_ i <— n} is a reverse martingale.

Proof. Let o, = Q[G n (r): r -- t]. Then for s < t

(19) E(G n (s)/sIo,)= E (Binomial (nG n (t), s/t))/ns=G n (t)/t,

so that the first claim holds. Now let O i = o [en:k : k >_ j]. Then for i < j, fin , ; is
distributed as the ith largest of j —1 rv's uniform on [0, f n .; ]; so

I
'. ') t (i -1)+1 j

so that the second claim holds. LI

Exercise 4. Fix n ? 1. Let QS = Q[ 1,^ f _^ I ^: s i r <_ 1, 1 <_ i ^ n 1. Then

(20) {U(i)/t: 0< t <_ 1} is a reverse martingale,


(21) {(W n (t)/t, o ,): 0< t<_ 1} is a reverse martingale,
-

(22) {V n (i/n) /(i /n): 1 ^ i n} is a reverse martingale.


MARTINGALES ASSOCIATED WITH U.,, V,,, W,,, AND R,, 137

and

(23) {R (i/(n+1))/(i/n); 1 i< n} is a reverse martingale.

Also,

(24) { „:;/ (i —1): 2 i<_ n} is a reverse submartingale.

Remark 2. For each a> 0 the process {G (t)/ t: a ^ t < 1 } is a reverse mar-
R

tingale. Thus Inequality A.11.1 gives P( IIG„/ I II', ? A) < A -' E (G„(a )/ a) = 1/A.
Passing to the limit on "a” gives

(25) P(IIGn/I)II0 A)<1/A for all A>0.

In later chapters we will show that equality actually holds. In the latter chapter
on linear bounds we will also use martingale methods to establish the com-
panion inequality

(26) P(II I/G n II 6',,:, ^ A) s el exp (—A) for all A > 0.

This inequality requires more technical detail than we care to display at this
time; however, we will use it before we prove it. Thus for any e > 0 we can
use (25) and (26) to choose a large A e > 0 such that

(27) P(G„ < A,t for 0 s is!1 and G„(t) ^ t /A, for fn: , < t <_ 1) > 1— E.

That is, with high probability G„ is bounded between linear functions.

Exercise 5. Show that for any fixed Os is 1 we have

(28) G (t) — t is a reversed martingale.

Use this to provide an alternative proof of Proposition 4 below.

Exercise 6. Show that

(29) {U n (^n:; ) /(1— fin , ; ): 0s i ^ n} is a martingale.

Additional Results for Various Suprema

The martingale results that follow will not be used until much later in this
monograph.
Recall that for any function f we let f + and f - denote the positive and
negative parts respectively, so that f =f + —f. Thus (since the upper extreme
is achieved)
IIU II =ö ax U(t) and IIU II = — o inf^ U„(t).
138 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Proposition 3. (Csdki) We note that

(30) II n(G, — I ) + 11, II n(G n — I) I , and II n(G, — I)II are submartingales

with respect to o- [e,, ... , en ].

Proof. (See Csäki, 1968). Let S n = n(G n — I). Let Tn denote a value of t at
which IIS'.II achieves its maximum, and note that

(a) II S n+III — II Sn II ^ S,,±1 (r,,) — II Sn II = Ilo.r„]( n+l) — Tn•

[It is the inequality in (a) that breaks down when we try this with Sn =—./c'c W ;.]
We thus have that

E(IISn+lII—IISnIIIel,..., en, Tn)^E( 1 10. 1(Cn+l) — TnI Tn)

= P(en+l Tn)(1 —Tn)+P(Sn+1> r,,)(0 — T1,)

= Tn (1 — Tn )+(1 — Tn )(0 — Tn )=0;

and hence, taking conditional expectations we get

E(IIS,,+III—iIS+nIIIe...... en)?0.
Since IISn II n, we have EII Sn II <oo. Thus IISn II is a submartingale with respect
to the o-field o [C,, ... , CJ generated by e,, ... , 6 n .
The proof for IISnII is completely analogous.
If a n and 13n are both submartingales with respect to an increasing sequence
.n of o-fields, then so is a n v A n since

E(an+l V Yn+l I 9°n) ^ E(an+l I Yn) v E(Yn+l I «9"n) ^ an V Fan•

Thus 1I Sn II = II S v 11 S„ 1I is a submartingale. ❑

Remark 3. The result of the previous proposition maybe generalized consider-


ably. We may replace II II by (I II ä for any 0 < a < b <_ 1 and we may replace
n(G, — I) + , and so on by n(G, — I) + /f, and so on for any positive function f
and still maintain the property of increasing conditional expectations. Thus
II n (6 n — I) / f lI ä will be a martingale with respect to a[ C,, ... , a,,] provided f
}

is such that Ell n (G n —1)^/ f II < oo. The same holds for Proposition 4 below.

Proposition 4. (Sen) We note that

(31) II(Gn — I) + II, II(Gn — I)11, and II6 n — III are reverse submartingales

with respect to Sen = o,[Cf: ,, ... , en:n, Sn+l, Cn+2, • • J.


MARTINGALES ASSOCIATED WITH U, V,,, W,,, AND d8„ 139
Proof. (See Sen, 1973b) Let r,, denote a value of I at which II(6„ — I)+II
achieves its maximum. Now

E (II( G . — I ) + II I ` fi n +1) ^: E(Gn(Tn +l) — Tn +l l.n +l) v 0


I "
-- Y, [P(fi ^ Tn +l ( 9 'n+l) — Tn +l] V 0

— [P(41 Tn +11'99,,+1) — Tn+l]V 0

= [P(fl C Tn +1 16n+l:l, • • ' S n+l:n +l) — Tn +l] V 0


— [G n +1(Tn+,) — r,,+1] V 0

= II(G,+l — I) + II

establishes the first result. The rest is similar to Csäki's Proposition 3. ❑

Exercise 7. If wn (a) denotes the modules of continuity of U n and if W„


v[U m (t):0<t<1, 1 <m<n], then

(32) ( ✓ n w „(a), %„), n ? 1, is a submartingale


for any fixed a > 0.
Inequality 3. (Sen) If q is 7 on [0, 2], symmetric about t = Z, and
J [q(t)] 2 di < co, then for some 0< M < o0

(33) P max I) Uk jj ?A ^ <- for all A > 0


(] --k^ n

and

(34) P (max k1qI for all A > 0.


A
Proof. Now Proposition 4, Remark 3, and the reverse martingale inequality
(Inequality A.11.1) give

P(max ✓ n/k IlUk /g112 A) = P(max II(Gk — I)/qII'A/'f)


k?n k?n

(a) ^^ IE JJUn/9II

( 2 /A)EJIvn/qlI/ 2

(2 /A)EZ =(2 /A) ^P(Z>x) dx

/x } dx]
c(2 /A) 1 J {[J q2(i )
1+
1
112
dt
]
2

Jq
('

c2 1+ -2 (t) dt /A
0

(b) = M/A
140 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

establishing (34). Also Proposition 3, Remark 3, and the martingale inequality


(Inequality A.10.1) give

P( max max k
= ( 1^k^n I II > A.)
1—k<_n n k 1^ ^ A)
IIq q

(c) cA-'EIIvn/qII
(d) : M/A

using the proof of (b) from (a) in going from (c) to (d). See also Sen (1973b).

For q = 1, the DKW inequality 9.5.1 below allows considerable


strengthening of inequality 3.

Nonidentically Distributed Observations

Exercise 8. (Vanderzanden, 1980) Suppose X, 1 ,.. . , X. are independent


with df's F 1 ,. .. , Fnn on (—cc, oo). Consider any real numbers c, 1 , ... , cn „.
Let —cc <_ a < oo be fixed. Then

Cn.
[1[a<xFni(a,x]]
(35) foraSx<ol7
I — Fnr(a, x]
is a martingale adapted to the o-fields

(36) ax=a'[1[a<x,;SV]: ay<_x and 1<—inn].

Likewise for —cc< a < cc fixed, we have that

(37)
n
cn;
1[x<xn;sa] —Fn; x, a ll for—oo<x<a
1— Fn1(x,a]

is a reverse martingale adapted to the a-fields

(38) vx=a'[1[ r <x,.=a]: xy^a and 1< i<_ n]. —

[Note that we must actually restrict (35), and (37), to the interval of x's for
which its denominator is nonzero.]

7. A SIMPLE RESULT ON CONVERGENCE IN '1q11 METRICS

Theorem 1. Suppose that

(1) q is / on [0, 2], symmetric about Z, and f [q(t)] -2 dt <oo.


o,
A SIMPLE RESULT ON CONVERGENCE IN jJ•/qJj METRICS 141
Then the special construction of Section 3.1 satisfies

(2) II(U — v) /q11 --*P 0,


(3) II( —")/qII —>P 0,
(4) II(W^ —W)/ q — 0,
(5) IRI — W) /qJI --*P 0
as n _* m. Of course, p„ (l, c)-p
p,,,
1 ,, as n --, oo is required for (2)-(3) and (4)-(5)
simultaneously.

Proof. We pattern this proof after Pyke and Shorack (1968). We treat only
W. (of which U. is a special case). Now

(a) II( W „— W )/g41fl^IIW,,/q{lo+4lW/gllö+{!(W„— )/glls 2


+ (symmetric terms on [2, 1]).

For given s > 0 we choose 0 = 0 so small thatE

e
P(I^Wn/gllö E) [q(t)] -2 dt/e 2 by Inequality 3.6.1
0

(b) E3/EZ= E.

Likewise,

(c) P(IIW/q11 E)^ [9(t)] -2 dt /e 2 _ E3 /E2= E.


J e

Finally, we note that

(d) II(W,, —W)/9118 2 — IIW^—WII/q(e)-->D 0,


so that

(e) P(II(NN„— W) /gIIB2>e)<_e for aD n ;2t some n,.

Thus from (a) -(c) and (e) we obtain

(f) P(II (W n — W)/ q II ^ 6e) <_ 6e for all n ? n F ,

giving (4). See also Chibisov (1964-65). ❑

Exercise 1. Use Inequality 3.6.2 to prove (3) and (5).


142 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

Exercise 2. Suppose q is 7 on [0, 1] with Jö [q(t)] 2 dt <ao.

Show that the Skorokhod construction of the partial-sum process S. of


Theorem 2.4.2 satisfies

(6) II(§n–S)/qII—c0 asn- oo.

This special case of O'Reilly's (2.4.10) has an easy proof.

8. LIMITING DISTRIBUTIONS UNDER THE NULL HYPOTHESIS

In this section we wish to consider some of the standard statistics that have
rich histories.

Example 1. (i) (Kolmogorov-Smirnov) Now


as
(1) Dn=IIU II_ -*a jIv"II

[see (2.2.11) and (2.2.12)]

(2) P( IIU'II >x) =exp(-2x2) for all x?0

and

(3) P(IIUII>x)=1–L(x)=2 (-1) k+ 'exp(-2k 2 x 2 ) for all x>_0.


k=1

(ii) (Kuiper) We also have

(4) K o=s=r_t IUn(t)—vn(S)I=IIvnII+IIu II—dK=IIU+II+IIU

as n -' oo, where [see (2.2.23)]

(5) P(IIU+II+IIU II>x)=1–P(x)=2Y_ (4k 2 x 2 -1)exp(-2k 2 x 2 )


k=l
for all x > 0.

The functions L and P are given in Tables I and 2.

Example 2. (Renyi) As an alternative to Kolmogorov's statistics, Renyi


(1953) proposed

(6) sup {✓ n [ i (x) – F(x)]"/ F(x): a ^ F(x) <_ b}


IIUJ„ /I II ä when F is continuous on [a, b].
Table 1. limiting Distribution of the Kolmogorov-Smirnov Statistic
(from Smirnov (1948))

x L(x) x L(x) x L(x) x L(x)


0.28 0.000001 0.73 0.339113 1.18 0.876548 1.76 0.995922
0.29 0.000004 0.74 0.355981 1.19 0.882258 1.78 0.996460
0.30 0.000009 0.75 0.372833 1.20 0.887750 1.80 0.996932
0.31 0.000021 0.76 0.389640 1.21 0.893030 1.82 0.997346
0.32 0.000046 0.77 0.406372 1.22 0.898104 1.84 0.997707

0.33 0.000091 0.78 0.423002 1.23 0.902972 1.86 0.998023


0.34 0.000171 0.79 0.439505 1.24 0.907648 1.88 0.998297
0.35 0.000303 0.80 0.455857 1.25 0.912132 1.90 0.998536
0.36 0.000511 0.81 0.472041 1.26 0.916432 1.92 0.998744
0.37 0.000826 0.82 0.488030 1.27 0.920556 1.94 0.998924

0.38 0.001285 0.83 0.503808 1.28 0.924505 1.96 0.999079


0.39 0.001929 0.84 0.519366 1.29 0.928288 1.98 0.999213
0.40 0.002808 0.85 0.534682 1.30 0.931908 2.00 0.999329
0.41 0.003972 0.86 0.549744 1.31 0.935370 2.02 0.999428
0.42 0.005476 0.87 0.564546 1.32 0.938682 2.04 0.999516

0.43 0.007377 0.88 0.579070 1.33 0.941848 2.06 0.999588


0.44 0.009730 0.89 0.593316 1.34 0.944872 2.08 0.999650
0.45 0.012590 0.90 0.607270 1.35 0.947756 2.10 0.999705
0.46 0.016005 0.91 0.620928 1.36 0.950512 2.12 0.999750
0.47 0.020022 0.92 0.634286 1.37 0.953142 2.14 0.999790

0.48 0.024682 0.93 0.647338 1.38 0.955650 2.16 0.999822


0.49 0.030017 0.94 0.660082 1.39 0.958040 2.18 0.999852
0.50 0.036055 0.95 0.672516 1.40 0.960318 2.20 0.999874
0.51 0.042814 0.96 0.684636 1.41 0.962486 2.22 0.999896
0.52 0.050306 0.97 0.696444 1.42 0.964552 2.24 0.999912

0.53 0.058534 0.98 0.707940 1.43 0.966516 2.26 0.999926


0.54 0.067497 0.99 0.719126 1.44 0.968382 2.28 0.999940
0.55 0.077183 1.00 0.730000 1.45 0.970158 2.30 0.999949
0,56 0.087577 1.01 0.740566 1.46 0.971846 2.32 0.999958
0.57 0.098656 1.02 0.750826 1.47 0.973448 2.34 0.999965

0.58 0.110395 1.03 0.760780 1.48 0.974970 2.36 0.999970


0.58 0.122760 1.04 0.770434 1.49 0.976412 2.38 0.999976
0.60 0.135718 1.05 0.779794 1.50 0.977782 2.40 0.999980
0.61 0.149229 1.06 0.788860 1.52 0.980310 2.42 0.999984
0.62 0.163225 1.07 0.797636 1.54 0.982578 2.44 0.999987

0.63 0.177753 1.08 0.806128 1.56 0.984610 2.46 0.999989


0.64 0.192677 1.09 0.814342 1.58 0.986426 2.48 0.999991
0.65 0.207987 1.10 0.822282 1.60 0.988048 2.58 0.999 9925
0.66 0.223637 1.11 0.829950 1.62 0.989492 2.55 0.999 9956
0.67 0.239582 1.12 0.837356 1.64 0.990777 2.60 0.999 9974

0.68 0.255780 1.13 0.844502 1.66 0.991917 2.65 0.999 9984


0.69 0.272189 1.14 0.851394 1.68 0.992928 2.70 0.999 9990
0.70 0.288765 1.15 0.858038 1.70 0.993823 2.80 0.999 9997
0.71 0.305471 1.16 0.864442 1.72 0.994612 2.90 0.99999990
0.72 0.322265 1.17 0.870612 1.74 0.995309 3.00 0.99999997

143
Table 2. Limiting Distribution of the Kuiper Statistic
(from Owen (1962))

x P(x) x P(x) x P(x) x P(x)


0.50 0.000001 1.05 0.243174 1.50 0.822255 1.95 0.985848
0.52 0.000003 1.06 0.257083 1.51 0.830121 1.96 0.986769
0.54 0.000007 1.07 0.271223 1.52 0.837724 1.97 0.987635
0.56 0.000021 1.08 0.285570 1.53 0.845067 1.98 0.988450
0.58 0.000054 1.09 0.300099 1.54 0.852155 1.99 0.989216

0.60 0.000128 1.10 0.314786 1.55 0.858991 2.00 0.989936


0.62 0.000276 1.11 0.329607 1.56 0.865580 2.01 0.990612
0.64 0.000553 1.12 0.344538 1.57 0.871927 2.02 0.991247
0.66 0.001035 1.13 0.359554 1.58 0.878036 2.03 0.991843
0.68 0.001824 1.14 0.374632 1.59 0.883913 2.04 0.992402

0.70 0.003050 1.15 0.389749 1.60 0.889563 2.05 0.992925


0.71 0.003874 1.16 0.404883 1.61 0.894991 2.06 0.993416
0.72 0.004867 1.17 0.420012 1.62 0.900203 2.07 0.993875
0.73 0.006050 1.18 0.435114 1.63 0.905203 2.08 0.994305
0.74 0.007447 1.19 0.450170 1.64 0.909998 2.09 0.994707

0.75 0.009082 1.20 0.465160 1.65 0.914593 2.10 0.995083


0.76 0.010978 1.21 0.480064 1.66 0.918994 2.12 0.995762
0.77 0.013159 1.22 0.494865 1.67 0.923206 2.14 0.996355
0.78 0.015650 1.23 0.509546 1.68 0.927235 2.16 0.996870
0.79 0.018472 1.24 0.524090 1.69 0.931087 2.18 0.997317
0.80 0.021649 1.25 0.538483 1.70 0.934766 2.20 0.997704
0.81 0.025202 1.26 0.552710 1.71 0.938280 2.22 0.998039
0.82 0.029149 1.27 0.566758 1.72 0.941633 2.24 0.998328
0.83 0.033510 1.28 0.580614 1.73 0.944830 2.26 0.998577
0,84 0.038300 1.29 0.594266 1.74 0.947878 2.28 0.998791

0.85 0.043534 1.30 0.607703 1.75 0.950781 2.30 0.998975


0.86 0.049223 1.31 0.620917 1.76 0.953546 2.32 0.999132
0.87 0.055378 1.32 0.633898 1.77 0.956175 2.34 0.999267
0.88 0.062006 1.33 0.646638 1.78 0.958676 2.36 0.999382
0.89 0.069112 1.34 0.659129 1.79 0.961053 2.40 0.999562

0.90 0.076699 1.35 0.671366 1.80 0.963311 2.44 0.999692


0.91 0.084767 1.36 0.683343 1.81 0.965455 2.48 0.999785
0.92 0.093313 1.37 0.695055 1.82 0.967488 2.52 0.999851
0.93 0.102333 1.38 0.706498 1.83 0.969417 2.56 0.999898
0.94 0.111821 1.39 0.717669 1.84 0.971245 2.60 0.999930

0.95 0.121767 1.40 0.728565 1.85 0.972976 2.64 0.999953


0.96 0.132161 1.41 0.739183 1.86 0.974615 2.68 0.999968
0.97 0.142989 1.42 0.749524 1.87 0.976166 2.72 0.999979
0.98 0.154236 1.43 0.759585 1.88 0.977633 2.76 0.999986
0.99 0.165887 1.44 0.769367 1.89 0.979020 2.80 0.999991
1.00 0.177924 1.45 0.778871 1.90 0.980329 2.84 0.999994
1.01 0.190326 1.46 0.788097 1.91 0.981566 2.88 0.999996
1.02 0.203075 1.47 0.797046 1.92 0.982733 2.92 0.999997
1.03 0.216147 1.48 0.805720 1.93 0.983833 2.96 0.999998
1.04 0.229521 1.49 0.814122 1.94 0.984871 3.04 1.000000

144
LIMITING DISTRIBUTIONS UNDER THE NULL HYPOTHESIS 145

Renyi was motivated by the desire to measure relative error rather than absolute
error. Now

(7) IIUn/IIIa—>a IIv'^/IIIb=II§#Illl-b)%b asn ->oo

for any 0< a <b < 1. See Exercise 2.2.19 for the limiting distribution in case
b=1; thus

(8) P(IIU /IIIä<—x)=24(x ✓a/(1 — a))-1 for all x>_0


[or ✓ a/(1—a) 11U /III'-> II§II ö]
and
k+Iexp
(-1) (2k+ ir 2 (1—a)
(_1)
(9) P(IIU/IIIäsx)=4 o
ark=o 2k+ 1 8x a )
a
=—R ( x 1—a)

for all x>0. The function R(• ✓a/(1— a)) is given in Table 3 for various a.

Proof. Since II /IIIa is a II II-continuous function, the —' d part of (7) is


immediate. We work in reverse order for the rest. Thus from Exercise 2.2.2,

(10) I'(IIS#II-x)
=P((1+t)U # (t/(l+ t)) <_x for (1—b)/b<—t<—(1—a)/a)
=P(U'D (s)^x(1—s) for(1—b)^s<_1—a)
=P(U # (r)<—xr forbr<a)

using symmetry about s = 2 in the last step. This proof is from Csörgö (1967).
E

Exercise 1. (Csörgö, 1967) Derive expressions for the limiting distributions


in case b<1 in (7).

Exercise 2. Establish some results along the lines of (7), but use as a method
of proof Renyi's (1953) representation of Uniform (0, 1) order statistics in
terms of independent Exponential (1) rv's. See Proposition 8.2.1.

Example 3. (Cramer; von Mises) Now Theorem 2.3.5 shows


'
(11) W2,= J U„(t)dt— ,. d W 2 = U 2 (t)dt asn-*oo
0 0
Table 3
Limiting dietribuUion of the Renyi Static (from Renyi. (1953))

y a 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.06 0.09 0.1 02 0.3 0.4 0.5
.1 0. 000 0.000
0.5 0.0000 0.0000 0.0008 0.0092
1.0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0092 0.0716 0.2001 0.3708
1.5 0.0000 0.0000 0.0000 0.0002 0.0009 0.0023 0.0050 0.0092 0.1420 0.3543 0.5591 0.7328
2.0 0.0000 0.0001 0.0008 0.0038 0.0101 0.0212 0.0367 0.0563 0.0791 0.3708 0.6193 0.7951 0.90821
2.5 0.0001 0.0022 0.0112 0.0299 0.0578 0.0925 0.1320 0.1730 0.2155 0.5778 0.7966 0.9714 0.9751
9.0 0.0000 0.0015 0.0157 0.0474 0.0941 0.1487 0.2081 0.2832 0.3184 0.3780 0.7328 0.9009 0.9915 0.9954'
3.5 0.0001 0.0092 0.0491 0.1135 0.1879 0.2629 0.3341 0.3994 0.4598 0.5140 0.8398 0.9561 0.9978 0.9991
4.0 0.0006 0.0291 0.1052 0.2001 0.2942 0.3804 0.4570 0.5244 0.5635 0.6353 0.9082 0.9823 0.9995 0.9999
4.5 0.0031 0.0643 0.1778 0.2950 0.4001 0.4902 0.5685 0.8311 0.6880 0.7328 0.9511 0.9936 0.9999 1.0000
5.0 0.0096 0.1135 0.2562 0.3895 0.4985 0.5873 0.6594 0.7193 0,7683 0.8088 0.9751 0.9979 1.0000
5.5 0.0225 0.1726 0.3511 0.4784 0.5863 0.8723 0.7374 0.7903 0.8326 0.8665 0.9887 0.9994
6.0 0.0428 0.2375 0.4204 0.5591 0.6627 0.7409 0.8006 0.6463 0.8817 0.9081 0.9954 0.9999
8.5 0.0707 0.3045 0.4952 0.6310 0.7282 0.7989 0.8509 0.8895 0.9181 0.9395 0.9977 1.0000
7.0 0.1053 0.3708 0.5639 0.6939 0.7834 0.8461 0.8904 0.9220 0.9448 0.9607 0.9991
7.5 0.1452 0.4347 0.6193 0.7484 0.8294 0.8839 0.9207 0.9460 0.9633 0.9752 0.9996
8.0 0.1889 0.4959 0.8811 0.7951 0.8871 0.9135 0.9438 0.9634 0.9783 0.9847 0.9999
8.5 0.2348 0.5513 0.7301 0.8345 0.8977 0.9385 0.9806 0.9756 0.9850 0.9908 1.0000
9.0 0.2819 0.8032 0.7731 0.8896 0.9221 0.9540 0.9729 0.9849 0.9907 0.9948
9.5 0.3290 0.6510 0.8104 0.8950 0.9410 0.9713 0.9617 0.9898 0.9944 0.9989
10.0 0.3754 0.8938 0.8427 0.9175 0.9564 0.9770 0.9876 0.9936 0.9967 0.9983
10.5 0.4205 0.7328 0.8704 0.9358 0.9680 0.9840 0.9921 0.9961 0.9981 0.9991
11.0 0.4640 0.7678 0.8939 0.9505 0.9788 0.9891 0.9949 0.9976 0.9989 0.9995
11.5 0.5055 0.7992 0.9137 0.9622 0.9833 0.9927 0,9968 0.9988 0.9994 0.9997
12.0 0.5450 0.8271 0.9303 0.9713 0.9882 0.9951 0.9980 0.9992 0.9997 0.9999
12.5 0.5824 0.8517 0.9441 0.9784 0.9917 0.9968 0.9988 0.9995 0.9998 0.9999
13.0 0.8174 0.8734 0.9555 0.9841 0.9943 0.9980 0.9993 0.9997 0.9999 1.0000
13.5 0.6509 0.8924 0.9648 0.9883 0.9961 0.9987 0.9996 0.9999 1.0000
14.0 0.6812 0.9090 0.9724 0.9915 0.9973 0.9992 0.9998 0.9999
14.5 0.7099 0.9234 0.9780 0.9938 0.9982 0.9995 0.9999 1.0000
15.0 0.7367 0.9358 0.9833 0.9958 0.9988 0.9997 0.9999
15.5 0.7615 0.9464 0.9872 0.9969 0.9992 0.9998 1.0000
18.0 0.7844 0.9555 0.9902 0.9978 0.9995 0.9999
18.5 0.8055 0.9631 0.9927 0.9985 0.9997 0.9999
17.0 0.8249 0.9697 0.9944 0.9990 0.9998 1.0000
17.5 0.8428 0.9752 0.9958 0.9993 0.9999
18.0 0.8591 0.9797 0.9989 0.9995 0.9999
18.5 0.8740 0.9838 0.9977 0.9997 1.0000
19.0 0.8878 0.9867 0.9983 0.9998
19.5 0.9000 0.9893 0.9988 0.9999
20.0 0.9112 0.9915 0.9991 0.9999
20.5 0.9213 0.9932 0.9994 0.9999
21.0 0.9304 0.9948 0.9996 1.0000
21.5 0.9388 0.9957 0.9997
22.0 0.9480 0.9967 0.9998
22.5 0.9528 0.9974 0.9998
23.0 0.9590 0.9980 0.9999
23.5 0.9838 0.9984 0.9999
24.0 0.9696 0.9988 1.0000
24.5 0.9724 0.9991
25.0 0.9760 0.9993
28 0.9821 0.9996
27 0.9887 0.9998
28 0.9902 0.9999
29 0.9929 0.9999
30 0.9949 1.0000
35 0.9991
40 0.9999
43 1.0000

146
LIMITING DISTRIBUTIONS UNDER THE NULL HYPOTHESIS 147
since j[ ]1 is a I- continuous functional on C. In Chapter 5 we will show that
i
(12) Wz= I o U (t) dt
2 z zZ;
l
J=I.J

for iid N(0, 1) rv's Z . The distribution of W 2 is given in Table 4.


;

Table 4. Limiting Distribution of the Cramer-von Mises Statistic W 2


(from Anderson and Darling (1952))

x P(WZSx) x P(W2sx) x P(WZS_x)


.02480 .01 .08562 .34 .17159 .67
.02878 .02 .08744 .35 .17568 .68
.03177 .03 .08928 .36 .17992 .69
.03430 .04 .09115 .37 .18433 .70
.03656 .05 .09306 .38 .18892 .71
.03865 .06 .09499 .39 .19371 .72
.04061 .07 .09696 .40 .19870 .73
.04247 .08 .09896 .41 .20392 .74
.04427 .09 .10100 .42 .20939 .75
.04601 .10 .10308 .43 .21512 .76
.04772 .11 .10520 .44 .22114 .77
.04939 .12 .10736 .45 .22748 .78
.05103 .13 .10956 .46 .23417 .79
.05265 .14 .11182 .47 .24124 .80
.05426 .15 .11412 .48 .24874 .81
.05586 .16 .11647 .49 .25670 .82
.05746 .17 .11888 .50 .26520 .83
.05904 .18 .12134 .51 .27429 .84
.06063 .19 .12387 .52 .28406 .85
.06222 .20 .12646 .53 .29460 .86
.06381 .21 .12911 .54 .30603 .87
.06541 .22 .13183 .55 .31849 .88
.06702 .23 .13463 .56 .33217 .89
.06863 .24 .13751 .57 .34730 .90
.07025 .25 .14046 .58 .36421 .91
.07189 .26 .14350 .59 .38331 .92
,07354 .27 .14663 .60 .40520 .93
.07521 .28 .14986 .61 .43077 .94
.07690 .29 .15319 .62 .46136 .95
.07860 .30 .15663 .63 .49929 .96
.08032 .31 .16018 .64 .54885 .97
.08206 .32 .16385 .65 .61981 .98
.08383 .33 .16765 .66 .74346 .99
1.16786 .999

Exercise 3. (Watson) Show that the statistic

(13) U„ =2J' f [ Un(t) - Un(s)] z dsdt =f


0
,'

.,
UJ„(t)dt -(J ,v„(t)dt)z]
0

- *d U2 =[ J 0 '
U 2 (t)dt- (J "

0
U(t)dt z asn--*oo.
)

]
148 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

The distribution of U 2 is (see Exercise 5.3.4)

2 / ,TZ
UZ= E Z2jZE;=11v11
7r
i =1

for iid Exponential (1) rv's E. Thus

P(U>x)=P(IIUII>7r ✓x)=1—L(-rr✓x) for Lasin(3)

and can be obtained from Table 1.

Example 4. (Anderson and Darling) We note that

(14) A„ — f _! (
dt^dAZ= fol UZ(t) dt as n-*oo
o t(1—t) t(1-1)

since Theorem 3.7.1 shows

aJ Z _U 2 ' i
IA„—A
z 21^ [I(1—I)]'”
11. fo [t(1—t)] 314di
n

(a) ^II(Un—U)/gjj{IjU„—U +IlUll} 16 with q(t)=[t(1—t)]' /4


(b) = oD (1){oP (1) + OP (1)} = o p (1)•

It will be shown in Chapter 5 that

1 Z,
(15) AZ=
=lj(j+l)

for iid N(0, 1) rv's 4 The distribution of A 2 is given in Table 5.

Table 5. Limiting Distribution of the


Anderson-Darling Statistic A 2
(from Anderson and Darling (1954)
and Pearson and Hartley (1972))

x P(A2>x)
1.610 .150
1.933 .100
2.492 .050
3.070 .025
3.85 .010

Remark 1. Table 6 gives, in compact form, a good approach to obtaining the


upper 15, 10, 5, 2.5, and 1 percentage points of the distributions of Ilv„II,
LIMITING DISTRIBUTIONS UNDER THE NULL HYPOTHESIS 149

Table 6. Modifications of the Asymptotic distributions for finite


sample sizes. The tabled percent points are those of the asymptotic
distributions, and agree with tables 1-5. The claim is that
T(D„) D, T(W, W2- etc.

for the modified form T (statistic) given in column 1 of the statistics


in column 1.
(from Hiometrika Tables for Statisticians, Volume 2, (1972))

Upper percentage points for modified T


Modified forms
Statistic T(D'), T(D), T(V), etc. 15.0 10.0 5.0 2.5 1.0

D'' (D )- D * (v'+0.12+0.11/✓n) 0.973 1.073 1.224 1.358 1.518


D D (v+0.12 + 0.11/v) 1.138 1.224 1.358 1.480 1.628
V V (' + 0.155 + 0.24/V) 1.537 1.620 1.747 1.862 2.001
Wt (W 2-0.4/n+0.6/nz)(1.0+ 1.0/n) 0.284 0.347 0.461 0.581 0.743
U2 (0-0.1/n+0.1/n2)(1.0+0.8/n) 0.131 0.152 0.187 0.221 0.267
A For all n ? 5: 1.61 1.933 2.492 3.020 3.857

IIVJ„II, +^^v„^^, W„, U n , and A from the asymptotic distributions of


IU 1, IJUJI, IIUJ + 11+^^U ^^, W 2 , U 2 , and A 2 . The work is due to Stephens (1970,
1974, 1976, 1977).

Example 5. (Shepp; Rice; Johnson and Killeen) C. Mallows suggested the


statistic

(16) "
Mn = J IUJ„(t)I dt -* d M m II IU(t)I dt as n oc.
0

Shepp (1982) used calculations involving Kac's formula to characterize the


Laplace transform E e - s' of M. Rice (1982) used Shepp's result to give an
explicit formula for E e -5 'N. Then Johnson and Killeen (1983) inverted

Table 7. Limiting Distribution of the Mallows Statistic M


(from Johnson and Killeen (1983))

x P(Mx) x P(Mx) x P(Mx)


1531 .05 .2818 .50 .5123 .91
.1721 .10 .2975 .55 .5267 .92
.1875 .15 .3147 .60 .5427 .93
.2011 .20 .3338 .65 .5610 .94
.2142 .25 .3553 .70 .5821 .95
.2270 .30 .3804 .75 .6074 .96
.2395 .35 .4103 .80 .6391 .97
.2533 .40 .4480 .85 .6822 .98
.2671 .45 .4993 .90 .7518 .99
150 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES

s 'Ee s" to obtain the following formula for the df of M:


- -

(17) P(M<—x)=J ir/2


gi siz l/,
(x / 6j 2 ),

where

+L(t) 3z^3
= expt ( 2/27 t 2 )
- Ai((3t) -413)

Ai is the Airy function (see Johnson and Killeen for references), S; = —a; /2' /3 ,
and a; is the jth zero of Ai'. The df in (17) is given numerically in Table 7.

Theorem 1. (Linear Rank Statistics, Häjek and Sidäk) Let ,„„ and
W be as in Theorems 3.1.1 and 3.1.2. Suppose
(18) h = h, — h 2 with each h ; 1' and in YZ .
Then the linear rank statistic T. of (3.1.54) satisfies
(19) T„ _ c „I I R 1=f R „^
I hdl^ n * p T= J 1 hdW.
=t C C J Jo
Proof. This is a special construction version of Häjek and Sidäk (1968, p.
163). We can trivially replace Z. by II,, in (a) -(f) and then (i) (which is applied
to (f)) of this iid case of Theorem 3.4.2. It thus suffices for > 0 to exhibit an
h E of the form (3.4.23) for which Var [ j'o (h — h E ) d 0] < E 3 . But (3.6.15) with
m = n shows this holds if
2 <

( +l
)]

(a) n 1 1
h [ L)n h`(—

n+1
E3 for all n>—some n 6 .

The easy Exercise 4 completes it; see Häjek and Sidäk (1968, p. 164). ❑

Since S„ = f ä hdW„ -^ p J hdW by Theorem 3.1.2, we have

(20)T —S„= n c„;


c c
[ h( R„ '1 ) —h(^„ ) 1J
; - > 0 as n *co, under (18).
-

Let h„ be constant on each ((i-1)/n, i/n]; then,


(21) we may replace h by h„ and -*, by = a in (19) if fth — h„ ]I -* 0.

Exercise 4. Establish line (a) above. Write out details for (21).
CHAPTER 4

Alternatives and
Processes of Residuals

0. INTRODUCTION

The weighted empirical process (in this chapter we center at F rather than at
as in the previous chapter)

(1) E(x) =i= c,,.{ 1 [x x] — F ( x )} for —oo<x<oo


c c ;_,

of independent rv's X, , ... , X„„ whose true df's F, ... , F„„ satisfy
max,^,s„ i F ; — FIt -*0 is asymptotically equal (according to Theorem 3.4.1) to

(2) W(F)+ 1 ^Y^ cn1[FF, — F].

The mild "nearly-null-type” condition is enough to ensure that the properly


centered random part behaves as W(F). Thus the key to the behavior of E. is
the deterministic term in (2). Equations (4.1.4)-(4.1.6) specify a contiguity
condition under which the behavior of this deterministic term is particularly
simple, and results in a representation for the limiting process on the probability
space of the special construction. Analogous results are shown to hold for the
empirical rank process R,,. Some of the standard contiguity results are also
presented, with representations for the limiting rv's.
More general local alternatives are considered in Section 2. Asymptotic
optimality of F. is considered in Section 3. The behavior of various statistics
under fixed alternatives is considered in Section 4.
Section 5 extends the results of Section 1 to a uniform convergence strong
enough to allow processes of residuals to be similarly treated in Section 6.
151
152 ALTERNATIVES AND PROCESSES OF RESIDUALS

1. CONTIGUITY

Let µ denote a o -finite measure on (R, P'). Let Xn; denote the identity map
on R for 1 ^ i <_ n, n ? 1. Consider testing the null hypothesis

(1) Pn: Xn' ,...,Xnn areiidf

against the alternative hypothesis

(2) Q: X, 1 ,. . , X. are independent with densities fn ,, ... ‚f,,,,;

all densities are with respect to µ. We agree that in this section

(3) j[h]I= ✓$ h2dµ and h= J hdµ forall hcE2 (µ),

where the space of all square integrable functions h is denoted by 22 (µ ). Let


p n and q„ denote the densities of P. and Q n with respect to µ x • • • x µ.
We assume the existence of constants a., , ... , a nn , n >_ 1, satisfying
z 2
a,,,
(4) max = max 2 -*0 as n - oo where a' = a„ _ (a,, 1 , ... , ann ).
a s 1 i n ani

We also assume the existence of a function

(5) 5 in Y2 (µ)

for which the densities satisfy our key contiguity condition

(6)
ant

1 2

C=, a'a i
a2 . v
a,, 1 ! a'a
Vf -
- J 1b
2
- 0 as n^oo.

Theorem 1. (Convergence of the centering function) If (4)-(6) hold, then


the Fn; 's are nearly null and

(7) 7f = J Sv'7 dµ =0.


If c,, 1 ,.. . , cnn , n> 1, are constants for which

(8) c = max nc n z -*0


max "' ' as n -> oo, where c' = c', ° (cn ,, ... , cnn ),
c t<_isn :i=, cni

CONTIGUITY 153
then

( 9 )
1 n
(
^` n
)— Li=1 ac J
^.1
ni ni
28vr dg - >0 as
CCi=,
r
f
Cn, Fn1 — F
0
a'Q C'C

It is natural in (9) to make use of the definition

a c
'

( 10 ) pn(a, c) = r r
as cc

[Note also (40) below.]

Consider the weighted empirical process

(11) E n (x)= I cn; {1 XX} —F(x)} for —oo<x<c

and the empirical rank process


1 ((n+1)t)
l8 n (t)= r Y.
(12)
C C i_,C , for0<_ t<_ 1,
nDn

where D 1 ,. .. , D the antiranks of the X,11 , .. .‚ X nn of Q. in (2) and F


is given by P. in (1). The following theorem combines Theorem 3.4.1 (which
treats the "appropriately centered" part of E n and UB n , and requires only nearly
null alternatives) with Theorem 1 [which treats the resulting centering function
and requires our contiguity condition (6)] to obtain results for E n and R,,
themselves.

Theorem 2. Suppose (4)-(6) hold. Then

1 n
(13) 11 Fn —FII=O(n " 2 ) asn-goo, where,,-- E Fn; ,
n ;= 1

c) J_ 23f d/LI -0 as n-*,


(14) I Il

(15) R„—jW+pn(a,c) J 2Sfdµrl _*P0 asn-00,

when cn = 0,

for the Brownian bridge W and the specially constructed Xn; = F„'(fn; ) of
Theorem 3.1.1 on the special (11, si, P).
154 ALTERNATIVES AND PROCESSES OF RESIDUALS

Remark 1. All manner of tests of the hypotheses P. vs. Q. are possible. Many
of these can have their limiting power evaluated, for a contiguous sequence
of alternatives satisfying (4)-(6), by means of Theorem 2. Thus, for example

^I(E n II, l(t) di, and I h dG n for continuous h of


Jo J
bounded variation

are all easily treated via Theorem 2. For example,


F- '
(16) liE n I1 and IIRn1I are both = a to VW+pn(a, c) J 23f dµl

for the special construction X n; = F(4 1 ) of the alternatives Q n .

The likelihood ratio statistic A n for testing Pn vs. Q. is

fni(Xn))
(17) L n =log An= log
i=1 J ( x.)

where log (fn; /f) is defined to equal log(fn; (x)/f(x)), 0, oo according as


f(x)>0, f(x)=f(x)=0, f(x)= 0 <fn1 (x). We note that Ln is <cO a.s. P,,,
while L. > —oo a.s. Q. Thus L. is an extended real-valued rv.
We define

(18) Zn _ „ —„ Zn'
; L
an,

,
28 (Xn1)
a'af^
.

We note of the summands of Zn that under (4)-(6)

(19) 26(X 1 )/ f , 1 <_ i <_ n, are iid (0, 41[6]1 2 ),

using (7). Thus an easy application of the Lindeberg- Feller theorem of Exercise
3.1.2 shows that

(20) Zn d N(0, 41[S]l 2 ) under (4)-(6).

The ry Zn is far simpler to deal with than L. because of this simple form. One
importance of conditions (4)-(6) is that they are applicable in many situations
and that they allow the following replacement of L. by Z.

Theorem 3. (Le Cam) Suppose (1)-(6) hold. Then (20) holds and

(21) Ln — {Zn -21[S]I 2 } + P,. 0 asn- cc.


CONTIGUITY 155

Moreover,

(22) Sf = 0.

Proofs of the theorems are grouped together in a subsection at the end of


this section.

Corollary 1. Suppose (1)-(6) hold. For Xn; =F '(6n; ) we have-

(23) exp (L„) -^ y, exp (Z —2I[S]I 2 ) as n -X

on the probability space (Cl, si, P) of the special construction of Section 3.1.
Here

(24) Z= J S o dw ° with S o =2S(F ')/ f(F '). - -

Proof. We will appeal to Vitali's theorem (Exercise A.8.6). We can rewrite


(21), using Theorem 3.1.2 on Z„, as

(25) exp (L„) -*„ exp (Z-2[6]1 2 ) as n-+oo.

We note that

(a) Epexp(L„)=f
r =i
J f„ 1 (x)dx=1,

and that

(b) Ep exp (Z — 21[6]1 2 ) = exp (i Varp [Z]-21[3]12)= 1

since Exercise 2.1.4 shows

(26) I[So]f 2 = 4 l[S]1 2 .

We rewrite the combination of (a) and (b) as

(c) Ep exp (L„)-. Ep exp (Z-21[S]^ 2 ) as n-+,

where both integrands in (c) are >O. Using (25) and (c) in Vitali's theorem
gives (23). We also remark that Vitali's theorem implies that since (25) holds,
the three statements (23), (c), and

(27) the rv's exp (L„) are uniformly integrable wrt P

are equivalent. U
156 ALTERNATIVES AND PROCESSES OF RESIDUALS

We next consider several applications of Corollary 1 and Vitali's theorem.


See also Section 3 below.
We now let S. = S„(X„,, ... , X„„) denote a random vector of interest to
us. The following theorem tells us that if we can determine the joint asymptotic
distribution of (S„, L„) under P„, then we automatically know the asymptotic
behavior of S,, under Qn . Moreover, according to Theorem 1, instead of
studying (S,,, L„) under Pn , we can actually. study the far simpler ry (S„, Z„)
under P„ provided Q„ satisfies (4)-(6).

Theorem 4. (Le Cam's third lemma) (i) If S„ is a ry for which

1/1 [ 0, v° 1\
(28) F µ 2 4
S„' -* d N(l 7 } i as n -*oounderP„,
[

J
then

(29) S„ -> d N(µ+v, T) as n - counder Q„.

(ii) Thus (29) holds provided (4)-(6) hold and provided

(30) [ ] -+dN [ ° ]. [°
[ as n-00 under P.
]]

with o. 2 =41[S]I2.

Remark 2. Suppose (1)-(6) hold. Let S„ = S„ (Xn , , . .. , X„„) denote a random


element on either (Rk. k ), (R, P), or (D, ), say. Let B denote a set in
the image space. Suppose that S. -^ P S as n - oo for a random element S on
the image space in such a way that

(31) 1 B (S„) 3 p 1 B (S) asn-+oounderP

(this need not hold for all B). We note that the rv's

(32) 1 B (S„) exp (L„) ^ exp (L„) are uniformly integrable wrt P

by (27). Thus Vitali's theorem (Exercise A.8.6) with (25), (31), and (32) shows

Q„(S„ E B) = EQ^ 1 B (S„) = Ep[1 B (S„) exp (L „)]

-> Ep[1 B (S) exp (Z -2I[S]I 2 )) by Vitali

1 B (S) exp (Z -2 1 [S]1 2 ) dP


(33) =
f
n
if (31) holds.
CONTIGUITY 157

Exercise 1. Show that for any S E YZ having 5=0, there exists a density f„
on [0, 1] such that

^[✓n(^—f)—S]1 Z ^O asn-goo

where f denotes the Uniform (0, 1) density. (Hint: Let v=


[ 1- 1[ 6 ]1 2 / n]' i2 f+S/,ln.)

Theorem 5. (Häjek; Shepp) Suppose

(34) S o E . 2 has S a = 0 and 0 = ^ 6 0 (s) ds.


0

Let P and Po denote the distributions on (C,') of U and U + A, respectively.


Then PA « P and

o du-2 o 6ödI).
(35) dP^=exp(0 S
We now present some of the standard results that are usually associated
with the concept of contiguity.

Definition 1. If for any sequence of events A„ c s^„

(36) P„(A„)-.0asn-cc implies Q(A)-0 as n-.cc,

then we say that {Q„} is contiguous to {P„}.

Exercise 2. (Le Cam's first lemma) If L„ -'> d N(—Q 2 /2, 0.2 ) as n - oo, then
{Q„} is contiguous to {P„}. Thus (according to Theorem 3)

(37) {Q„} is contiguous to {PP } if (4)-(6) hold.

[See Häjek and Sidäk (1967, p. 204).]

Exercise 3. Contiguity is equivalent to the condition that any sequence of


rv's on (fl,,, .sf1„) converging to zero in P,,-probability converges to zero in
Q„-probability.

Lemma 1. (Jureckovä) Suppose {Q„} is contiguous to {P„}. Then for every


> 0 there exists a S = S E >0 such that all sequences of events {A„} that satisfy

P„(A„)<S for all but a finite number of indices n


also satisfy
Q„(A„)<e for all but a finite number of indices n.
158 ALTERNATIVES AND PROCESSES OF RESIDUALS

Proof. Assume the lemma to be false. Then for some e 0 > 0 and all values
of 2 -k, k> 1 (values of fi), we can find a sequence {A k,„: n? 1} such that

(a) P„(Ak,„)<2 -k for all n >_some n k

but

(b) Q„(Ak,)? ea for infinitely many indices n.

We now use (b) to choose indices nk satisfying nk> n k , nk + ,> nk, and
especially

(c) Q„t(Ak,,,t) > e o for all k >


-1.

Now define a new sequence of events B. by

(d) _ A k. ,, t if n = n,*^ for some k = 1, 2, .. .


(e) B ” 0 otherwise.

Since n> n k , (a) and (d) together or else (e) alone imply that

(f) P„ (B „) -^ 0 as n -^ co.

However, (c) shows that

(g) Q„(B )>_ e o infinitely often.

Thus {Q„} is not contiguous to {P„} and this is a contradiction. See Jureckovä
(1969). ❑

Exercise 4. (Hall and Loynes, 1977) {Q„} is contiguous to {PP } if and only
if {L„} is uniformly integrable with respect to {P„} and Q„(p„ = 0) -> 0 as n - Co.

Definition 2. The Hellinger distance between densities f and g of probability


measures with respect to a o -finite measure µ on the measurable space (11, sad)
is defined by
( l I /z

(38) H(.f, g) = I [f —v' ] = S [ 7— J ] 2 d,


r
J
1/2
/z ,
(39) = j 2(1—
111 g dµ) 1 _ { 2 (1 P(f, g))}'

where

( 4 0) P(fg)= f ✓fgdp.
CONTIGUITY 159

is called the affinity between f and g. We call

(41) D(.1, g) = sup I Pf (A) — Pg (A) I ° sup IfA (f — g) dls


AEd/I AESQ

the total variation distance between f and g. If f « g, we define the Kullback-


Leibler information number to be

(42) K(f, g) =
J flog (f/g) dtL
otherwise, we set K (f, g) = oo.

Exercise 5. Show that

P 2 (f g)}" __{K(fg)} ► " 2


2
(43) 0<1 — p(f g) D(fg) {1 —

and that

(44) D(f g)=-1 J If — gI dp.

(Runnenberg and Vervaat, 1969 consider an interesting example involving


uniform spacings.)

Proofs

Proof of Theorem 1. (Our proof will generalize to densities on R k .) We


write the lhs of (9) as I1A n II where

/ ( n f x ( { ^
On(x) ^^ cni J) — a ,a dµ
(a)
J _. (fni — aniSV /

C
n
1r 11 ^ ^
x
„—'f )(Vfni+Vj) ara ani 3 V7 d/2

^
1
(b) — C S C , L l Cni f _^ V,/n7 — VJ ara a n i/Sj2 -%/ d^A.

1 n j x ^ ^ ^

Cni [ f — vf vrf,+v/ —2VfId1


-

il

(c) = A(x) + B(x).

Now using the Cauchy-Schwarz inequality for sums at (d) and then for
160 ALTERNATIVES AND PROCESSES OF RESIDUALS

integrals at (e) gives

ant Cni z v3 — /J i
IA(x)I_ t^1 Q d^A.
' a C' C anil Qra—U^2Vf
m

(d) ^ \/ j
n Cni n Qni
r
CC i=1 QQ Qni/ Q Q
If
nyfi —V./ l 2 1 1/2

— S 2fdµ }I
11
2 1/2
C n a ni ani/ — l
—g]
(e) — E
=I aa a ni / as dµ4Jfdµ1
1 2 1/2

I =2
n a 2 Vfni-
^I ara
ani/a'atj J dµ^
(f) -+0 by (6).

Using the c,-inequality at (g) gives


n Cnil 2
IB(x)I < ma „ c , c ] I —
fi I
[

r Icnil
n 2

2
= I max —S +
L1 i n an;
cc i=1 a'Q _'_a
[a,j/V a J

(g) —] max
I<isn CC
^cnil n2,,
21 f
i=I a r a La1/

arQ
—S 2 + S2
]

l
J
0 by (5), (6), and (8).
Thus (9) holds. Note that max a „ i /a'a -+0 was not used, but is typically required
to verify (6).
In fact,
(h) if densities fn; replace f in (I) and (6), then fn; can replace f in (9).

Now one choice for c ni that satisfies (8) is to let c ni = a n1 . Then

(i) pn(a, c) = p (a, a) = 1,

Thus we can improve (9) to

1) I) Q , Q i y, ani(Fni F) — — J 26v7 dµ l -+ 0 as n oo.

Note that Cauchy-Schwarz gives

f J fdM) =IL8II<00.
I/2

J
('

(k) Sv7dµ — ( 5 2 dµ

CONTIGUITY 161
When n is fixed so large that the 1) 11 in 0) is <s, we can let x- in the
functions in 6) to conclude that IS SV7 dp.I < s ; here e >0 is arbitrary. Thus
(7) holds.
If densities fn; replace f in the previous paragraph, then 0) becomes
n n 2 ('

(l) II l r I ani(Fni - Fni) - Y a


n ' J 2S.JLdµ _ asn-co
*0

where Cauchy-Schwarz gives

(m) I J SJ7d I[S]I< 3D for all n.

The argument of the previous paragraph yields


n 2
(n) a °r 2S ✓fn; dµ - 0
Y— as n -> X.
a a -.

Combining (h) and (n) gives

(45) if densities fn; replace f in (1) and (6), then fn;


can replace f in (9) and (n) also holds.

This may be of interest.


We now show that the F, ; 's are nearly null. Now

IIF,,i —Fli
li f *W (f
— J)dFzll

J (v_7)(v+vJ) du

- f ] I [v+ f ]
<I by Cauchy-Schwarz
(o) ^ 2I [' -f ] I since densities integrate to 1,
so that
ands+ anf
(1 Fni —F ^I 2--4 I[ f ni — Vf ]I 2=4 I [VJni —VJ — a'a a'aSj IZ

8
{ I [v'_v7— l7nQ
S] I Z+ R R I[S]I2}
by the c,-inequality

Vr' -f - R^ +81 max anal S[s]^2


<_ 8 1=1
EI[ a a SJ I z Li^iSn a a.J

(p) -p0 by (6) and (4), uniformly in i.


162 ALTERNATIVES AND PROCESSES OF RESIDUALS

Taking square roots in (p) gives

(q) max IIFn; —FiI-*0 as n -*cc,


l^isn

so that the F ; are nearly null. ❑

A version of this theorem for iid observations is in Beran (1977b). This


theorem (citing Wellner) is in Shorack (1985). See Häjek and S1däk (1967,
p. 211). See Bickel (1973) and Koul (1977) for earlier treatments of nearly the
same problem, but with different emphases.

Proof of Theorem 2. Now for 1,, as in (3.4.5) we have by (3.2.11) that

(a) En=71n(Fn)+ c , c Y^ cni(Fni —


F) a.s.

Thus by Theorem 3.4.1 we have

II m n( Fn) — W(F)II ^ Il 7 n(Fn) — W(Fn)+W(Fn) — W(F)II

-5 IiZn — 'II + II W ( Fn)'( F )II


(b) -*‚0

using ^^ Fn — Fit -* 0 and the continuity of W in the second term of (b). Also

(c) E cni(FF; — F) — pn(a,c) J 2Sfdµ -+0

by Theorem 1. Combining (b) and (c) in (a) gives the E n result. [Note that
(c) with c = 1 implies Ii Fn — Fit = O(n 'I 2 ).] -

As in (3.2.29) we have

i n
(d) R n =ln(&)+ c , c i l l cn;Gn;(^„')

Since i&—III -* a . s . 0 by Theorem 3.2.1, we have [as in (b)]

(e) ji7n(&n I) —Wil -* P 0.



CONTIGUITY 163
For the second term in (d) we note that when c n = 0,
1 n F ^(t)

CniGni(Gn')—Pn(a, c) 2öv d/.t


n
CII 1 L cni[Fn1(E ()) — F( Fn l (ö n ' ))]
(f)

— P(a, c)
J F - ' (G,,')
28fdL

+IPn(a, c)I11fF-' 2Sf d1i


so(1)+1 • I
E -
1
2Sf d1i

F-(G;')1/2
by Theorem 1

(g) ßo(1)+2I[S]Ij fdµ' byCauchy-Schwarz


l f F _'
(

(h) ! o(1)+2I[8]I{IIF(Fn'(^R'))-1111 1 / 2 ,
where

IIF(Fn'(^n'))—III IIF(Fn'(.n'))—'n'+^n'—III
IIF(F')—III+II—III
(i) =IIF(')—',(')II+o(1) a.s.
IIF—F„II+o(1)
(j) -*0 a.s. as n -' o0.

Since the rhs of (h) goes to 0, and since (e) holds, the identity (d) gives the
P. result. Li
Proof of Theorem 3. All probability computations are made under P,,. Let
n

(a) Wn =T 1 where T,,,= 2[ fn. (Xne )/f(Xni) — 1 J = 2[ f 1].

Since Var [rv] <_ E(rv) 2 ,

Var[Wn — (Zn — I[ 8 JI 2 )J= 4 Var


r ^L5 -1— an,s 1
i =1 L a'a^J
n

^ 4 Y_ I [ — — a n; S/ a'a ] I
i =1

(b) -+0 by (6).


164 ALTERNATIVES AND PROCESSES OF RESIDUALS

Since (just square out the final term)

[([/j]2
(c) EWn = 2 Y_ J [ f — .i] dµ = —^ f d z,
we also have, since EZn = 0,

(d) {E[Wn — (Zn —I[6]I 2 )] } 2

l y, {I[anis/ a / a]I Z— I[VJni_ ." ]I 2 } } 2 by (c)

— IY- (I[0i]I Z— I[]I 2 )


l 2
I
_ IY, I[oi - 4'ill I[oi + 'Yi]I}

since I I[0]I 2— I[+4]I 2 I 1[0 — 'I']I I[b+O]I

[ ¢ i — fir. ] V } j j j [ 0 i + 4i; ] I Z } by Cauchy-Schwarz

={o(i)}{Y_I[ ,Li — Oi+ 2 0i]I Z } by(6)

=o(1)2j y- I[0 i — gri ]I 2 +4Z I[46 ; ]I 2 } by c,- inequality

= o(1){o(1) +4I[S]I 2 } by (6)


(e) -0 by (5).

Thus (b) and (e) give

(f) E {Wn— (Zn —I[8]I2)}- 0 asn-> x.

Now using the Lindeberg-Feller theorem and (4), we conclude that

(g) Zn d N(0, 4 1[ 8 ]I z ) as n - .

We thus have

(h) Wn ^dN( — I[s]1 2 , 4 1{S]1 2 ) as n - oo.

To establish (21) it remains to show that

(i) Ln—(Wn—I[S]IZ)^P 0 asn- x.



CONTIGUITY 165
Now LeCam's second lemma (Häjek and Sidäk, 1967, P. 205) is exactly a
statement that (h) implies (i) provided the u.a.n. condition

f` -1 >_e
(j) lim max P,,
n- .0 l^isn
n
1 =0 for all e >0;

the proof is a rather long truncation argument. Now

ePP(Ifr_1I>—E)<_EI L.

=E
{I lIl j+}

(k) <1[•/fn; — '?]I{2j[J7 —v7]{ 2 +21[vf]IZ},


where

I[^Jni — ^J — ^Jni — ^J —ani a u a ]I+I[(5]I Innil/ aua

(1) -> 0, uniformly in 1 <— i n, as n - oo by (4) and (6).

Thus 0) holds, and (h) and (i) give (21). [We proved (22) in Theorem 1.] This
completes the proof.
Note that

(46) if densities fni replace f in (1) and (6), then im can replace f
in (18), (20), and (21) provided Ep.(Z,)->0 as n-^ x,

providing a generalization. U

In the special case when all Fni = F. on [0, 1] and all c,, = 1, these results
are found in Beran (1977b).
Proof of Theorem 4. We prove only (ii). The proof of (i) can be found in
Häjek and Siddk (1967). By considering all possible linear combinations, it is
enough to prove Theorem 4 when S,, is a one-dimensional rv.
By making a new Skorokhod construction if necessary, we may suppose that

(a) (Zn, Sn ) -* p (Z, S) as n - oo

for some ry S on (Cl, .i, P); the distribution of (Z, S) is given by the rhs of
(30). Now the rv's

(b) lexp (itS,,) exp (L,)I = exp (L n ) are uniformly integrable wrt P
166 ALTERNATIVES AND PROCESSES OF RESIDUALS

by (27). Combining (a) and (b) in Vitali's theorem (Exercise A.8.6) gives
EQ,[exp (itS„)] = Ep [exp (itS„) exp (L„)]
(47) - Ep[exp (itS+Z-2I[8]1 2 )] by Vitali
1212 Var y [S]+2it Cov e [S, Z]+Varp [Z]
=ex itE +
p ( 2

—2 1[&]1 2 /
t2
=exp (it{EPS+Cov e [S, Z]}— 2 Varp [S]

+ (2 Varp [Z] —21 [&]1 2 )


)

(c) =exp (it{E,,S+Coup [S, Z]}— t Var y [S]

We note that (c) is the characteristic function of the rhs of (29). ❑


Proof of Theorem 5. Let & o ° 28, and consider the densities f and f, of Exercise
1; thus (6) holds with all a ; = 1 where .'2 (µ) = Y2 • In the present case, if we
set all c„ ; = 1, then the I„ of (11) is just U. From Theorem 3.1.1 we have the
convergence

(a) IIQJ„.•—U *a.s.O as n"-co.

Thus for any Borel set B in some R k whose boundary aB has Lebesgue measure
0, the finite-dimensional set A= H T`(B) = II ^'... , k (B) E'6 satisfies

(b) J11A(U„--)-1A(U)I) ßa.5. 0 as n"-ct;


this is true since if a path of HT(U) is in B°= (interior of B) [is in (B`)°] then
follows from (a) with 1 A (v) equal to 1 (equal to 0), while P(H T (U) E aB) _
0. Thus from (33) we have

Q„(v„E A)- n
J 1A(U) exp (Jo' 6
dU-21[60]12) dP

J0 6odU - 2 I[5o]I 2 ) dP,


(c) L (

where P is the measure induced on (C, W) by U when P is true. Now

(d) Q„(0J„ e A) = P(U„(F„)+f (F„ — F) E A) by construction


(e) - P(U+0 E A) by the same type of argument that gave (b)
(f) = Po(A),
LIMITING DISTRIBUTIONS UNDER LOCAL ALTERNATIVES 167

where (e) uses the implication of (14) of Theorem 2 that

(g) ßn0 as n-"X .

Combining (c) and (f) gives

I
(h) PP(A)=fAexpl o dP with A=11T'(B)
(

for all T and for all Bore! sets B whose boundary has Lebesgue measure 0.
Now the sets A of (h) generate W. Thus (h) implies (35). ❑

2. LIMITING DISTRIBUTIONS UNDER LOCAL ALTERNATIVES

Expressions for Asymptotic Power

Suppose that Q: X„ , , ... , X„„ are iid with dl F„, and consider tests of
Ho : F = F vs. H I : F 0 F based on some functional of the process
F F

(1) fn(F„—F).

In this section we want to briefly consider the power of such tests under local
alternatives F. satisfying

(2) II &„ — O lI -+0 as n -> m, where /(F — F) = A„(F) defines 0„,

for some function 0. The key identity is

, n(F„—F)=^(F„—F„)+^(Fn —F)

(3)

if (2) holds we naturally expect the latter to converge to U(F)+A(F)

Theorem 1. (Chibisov). If (2) holds, then

(4) IIU„(F„)+O&(F) —[U(F)+L (F)] I - a.s. 0

for the Skorokhod construction.


168 ALTERNATIVES AND PROCESSES OF RESIDUALS

Proof. See Chibisov (1965) for a version of this theorem. The proof is simple
since
IJU (F„)+fl „(F) —[U(F)+ii(F)]jl

Ilv „(F„) - U ( F„)Il +lIU(F„)-U(F)II+IIo „(F) -o(F)II


Ilv„-Ull+JU(F„)-U(F)II+IIo„ -oll
^as 0
by Theorem 3.1.1, uniform continuity of U on [0, 1], and (2). ❑

Corollary 1. For F„ satisfying (2) with continuous Fo , the statistics of Section


3.6 satisfy, as n -* oo,

(5) ./n D„ II(U +o) #Ii,


(6) K„ -*d II(U +n) +Ii+ll(U +o) -
II,
(7) Wn

for example.
J 0
(U+A) 2 dl,

Hence, for F. satisfying (2),

(8) power of the D„ test at F. = P(Vi D*n a >_ d„ a (F„ )


-$ P(II(U +0) # II ? da ) as n -+oo
where d„, « 4 da with P( IIU # II ? da ) = a, and
(9) power of the W„ test at F„ = P(Wn -- w„ a IF„)
! i \
PI (U +0) 2 dl>_w ^l asn -'oo
111 0 111
where w „,. -* wa with P(Jö U 2 dl >_ wem) = a. Additional examples come from
the E. and P. of Theorem 4.1.2.
See Remark 5.3.3 for further information about the distribution of Jö (U +
A) 2 dl.
Calculation of the asymptotic power on the right-hand side of (8) involves
(the usually difficult) computation of general boundary-crossing probabilities
for U: for example, for # denoting +,

(10) P(II(U+A)+II>—di,)=P(U(t)>—di,. - 0(t) for some 0—<t<_1).

Quade (1965) and Durbin (1971, 1973a) have given various bounds and numeri-
cal methods for computing these general boundary-crossing probabilities.
Finite-sample power calculations for D. can be found in Section 9.3.
LIMITING DISTRIBUTIONS UNDER LOCAL ALTERNATIVES 169
Remark 1. We note that Theorem 14.1.1 shows that

(11) I (F„—F)— J_ 2BV7dx1J^O asn-*


I
provided that

(12) I[Vi('17—f)-8]H0 as n->oo

for some S E 22 (µ) = 22 (Reals, Bore!, Lebesgue). Thus (2) holds with
)
f
f (13) (t)=—
F '(` ` 26 0 F '(s) _
2Sf dx= lo f F (s) ds J('

o 6 0(s) ds

by Exercise 2.1.4. Moreover, 6 0 E 22 and A(0) = A (1) = 0, since j [ S o ] 1 2 = 41[S]1 2


by (4.1.26) and since z(1)=6 0 =0 by (4.1.7). Thus (12) not only implies (11)
and (2), but allows us to apply (4.1.33) and (4.1.35) to conclude that as n -> o0

Qn (U EA)-P(U+0EA)=P4 (A)= (dP,/dP)dP J A

(14)
j
J
/
= exp I
1
So d v --
1
fo 6
l
\
I I dP, under (12),
A \ 0 2

r any set AE satisfying 1A(v„) -'p I A (U) as n -^ oo. Equations (8)-(10)


indicate the type of set A to which we would like to apply this result.

An Expansion of the Asymptotic Power of the D ; Test

Häjek and Sidäk (1967, p. 230) use Corollary 1 and Theorem 4.1.2 together
to give an interesting expansion of the local asymptotic power of the Dn test
for location alternatives. In fact their calculations are valid for arbitrary local
alternatives for which (12) holds, as we will now show. [They also apply to
tests based on the processes E. and R„ of Theorem 4.1.2 under (12), _provided
we now suppose p„ (a, c) -+ p as n --> oo and let S o = 2 p S a F - ' f - F - '.]
ac 0C

Suppose that

(15) Ind„—bbl->0 with b1(t)= J r bäd1 and 1[8]1 2 =1.


0

Then, letting A = {x E C: JIx + II > d ,} where P(A) = a,


0

power of the Dn test at F. ° P(I1Drt > d,,,^I F„)


B(a, t, b) = P(IIU+ bA) + II ^ d0) by (5)

t (16) = A exp(b SdUJ-2b 2 )dP by Theorem 4.1.2.


170 ALTERNATIVES AND PROCESSES OF RESIDUALS

Now ([exp(b jöSdU — zb z ) -1]/bl<exp(jöSdU) +2 which has finite


expectation since f ö SU = N(0, 1), and hence we can differentiate under the
integral sign to obtain

SdU)dP.
ab B(a, 0, b)1b =0= fA (Joy
a

We ask the reader to show in Exercise I below that

(17) fA
I
\
J 0
Sdv) dP= J ^ S(t) d I
0
` J A
v(t) dPI.

Thus it remains only to compute J A U(t) dP.


Now, EU(t) =0, so

J A
U(t) dP=— J OJ(t)
A
dP
= —EU(t)I A '(U)
= —EE{U(t)l A '(U)IU(t)}
(18) = — EU(t)P(A`I U(t))•
Hence, upon noting that given U(t) the processes {U(s): 0 <_ s <_ t} and
{U(s): t <_ s <_ 1} are conditionally independent so that for x <_ d
zdc d- x ) /t
P(A`I U(t) = x) = (1 — e -
-Za(d- x)1(1-t))
)(1— e

f,(x,d)

by Exercise 2.2.11 and (2.2.22), (18) yields

f A
U(t)dP —J ^xf(xd)d^
t(1—t)/

(19) _ x(1 — e -Zeca -X >/' )


—J
(1— e -zaca-Xvc1 -t)) 1 e -xZi2t('-') dx.
2irt(1— t)

Thus, by straightforward calculation

Av(r)dP= -2 «dam I
(20) at J 2 ^\ d^ (1 lt)/ —1 J
—2ad„c/i(a, t).
ASYMPTOTIC OPTIMALITY OF F„ 171

Plugging (20) into (17) yields

b B(a, A, b)Ib=o= —2ada


f S(t)a,(a, t) dt.

Hence

(21) B(a, A, b)—a ---2ad« b I (t)^i(a, t) dt as b \ 0


0

provides a first-order approximation to the power of a supremum test based


on 11U11 (or IIE„II or IIlR II). (Location alternatives are a special case with

(t)=—f'(F (t))/(foF (t)v'i) for the location case,

where I denotes the Fisher information off)

Exercise 1. Use Fubini's theorem to establish (17) when 5 denotes the


indicator of an open interval. Extend this to general 8.

Exercise 2. Carry out the computation leading to (20).

Exercise 3. For the family of alternatives F(x) = x °, 0 <_ x <_ 1, with 0< 0 < oo,
use (21) to approximate the power of the Dn test. (Take a = 0.05 and n = 64,
81, 100.)

Exercise 4. Let A = {x E D: II x II <_ t} with 0 < t < oo. Verify that A satisfies the
(4.1.31)-type condition 1 A (UJ„) . P 1(v) as n -4 oo.

3. ASYMPTOTIC OPTIMALITY OF F.

There is a nice connection between the preceding section concerning the local
asymptotic power of tests of fit and the asymptotic optimality of F. as an
estimator of F. To describe this connection, let F be an absolutely continuous
df with density f, let

( 1 ) c(f; 6) {{f,,}: I[v "(f ”-fI/2)-811- 0}

and

^(f) = V {^(f; S): 8E. ' 2 i 8±f " }

Suppose that {f„} is some sequence of continuous estimators of F. = c f„ dI


based on observation of X 1 ,,,. . . , X„„ iid F,, at the nth stage. We say that F,,
172 ALTERNATIVES AND PROCESSES OF RESIDUALS

is regulart atf if (under Pf the special construction of the X„ 's satisfies


) ;

(2) jj.I(,, —F,,) —Z(F)II - P 0 asn-*ooforall {f }e '(f),


where 1 is a process in (C, 16). Thus the law of the process 1 on C does not
depend on S.
For the usual empirical df estimator I=„ we have

(3) (F„— F„)= U„(F„) = U(F)

since, as in the proof of Theorem 3.4.1,

l!U „(F„)-U(F) v „(F„)-U(F„)j +IIU(F)- U(F)II


-> a , s. 0 as n -* oo for the Skorokhod construction

for any F,, with (1 F„ — Fly -*0, and this follows from (4.1.13) for any { f„} E (f).
Thus F,, is regular at every f in the sense that (2) holds for each of the
continuous versions of F,, uniformly within 1/n of F,,, for example, F„ [see
(6.6.1)].
The following theorem gives a representation for the limiting process 1
corresponding to any sequence of regular estimators {9 „}.

Theorem 1. (Beran) If {f „} is a sequence of estimators of F which is regular


at f with limit process 1 on C, then

(4) 7 = U(F) +W

where the Brownian bridge U and W are independent.

Thus the limiting process Z for any sequence of regular estimators {f „} of


F is at least as "dispersed” as the process U(F); in view of (3), the empirical
df F„ is optimal in the sense that it achieves this "minimal dispersion” with
W = 0.

Exercise 1. (Beran, 1977a) Suppose w: C -* R is convex, symmetric, and


continuous. Show that Theorem I implies that

(5) lim Ef w('


J ,,— F„))>_ Ew(Z) > Ew(U(F))
n
-oo

for every regular estimator It„. [Hint: Since w is convex and symmetric,

t When applying asymptotic theory one wants a certain stability; that is, the asymptotic theory
should not change severely when F is only perturbed slightly. In particular, Beran (1977a)
shows how this condition rules out certain "superefficient” estimators of the df.
ASYMPTOTIC OPTIMALITY OF F„ 173

w(U(F)) < 2 w(U(F)+W)+2 w(U(F) —W) = 2 w(U(F)+4x91)+2 w(—U(F)+W)


with —U = U.)

A much stronger version of (5) was established by Dvoretzky et al. (1956).


To state their asymptotic minimax theorem we let .' denote the class of all
df's, and let F denote the smallest Bore! o - -field on . containing every element
of 9 and all the finite-dimensional sets (see Section 2.1). Let D. denote the
class of all randomized decision rules: thus for be D", b(., X) is a probability
measure on (Jw, #'), where X = (X,, ... , X") iid Fe .t ; for each fixed AE.,
b(A, •) is measurable on R. Suppose that w: R + -^ R + is nondecreasing and
satisfies

(6)f
"

w(x)x e -2 i 2 dx < ao.


,o

Theorem 2. (Dvoretzky, Kiefer, Wolfowitz). Under the above assumptions


on w it follows that

SUPFEEFw(IIVi(F"—F)II)
(7) lim I =1.
"-.- nfbEO„
supFE^ EF[j w((^"
Ilv — F)1I)b(d^", X ))

Thus F. is "asymptotically minimax" for a "supremum type" of loss function.

Similar results hold for other loss functions, for example, nice functions of
J ,, [Iii(F„(x) — F(x))) 2 dF(x); see Dvoretzky et al. (1956) and Millar (1979).
The proof of Theorem 2 will not be given here; we refer the interested reader
to Dvoretzky et al. (1956), Levit (1978), and Millar (1979).
Millar (1979), using the methods of Le Cam (1972, 1979), also gives a
"geometric” sufficient condition for the empirical df F,, to be asymptotically
minimax in smaller nonparametric classes of df's, such as the classes of all
convex df's or all df's with increasing failure rate. Kiefer and Wolfowitz (1976)
give a detailed study of the former case.
We now briefly summarize without proof some small-sample optimality
properties of F". Let Fe F, = the class of all continuous df's on R', let 4i be
a positive continuous function defined on (0, 1), and set

(8) w(F, i ") = J (f "(x) — F(x)) 2 4i(F(x)) dF(x)

and

(9) R(F,l")= EFw(F,1").

Aggarwal (1955) and Ferguson (1967) show that this estimation problem is
invariant under the group of monotone transformations and that the minimax
174 ALTERNATIVES AND PROCESSES OF RESIDUALS

invariant estimator is given by

(10) n(x)= u l[x,:..x,., 1)(X) + 1 [x,:, ..)(x)


, ,

with

(11) u = 1,..., n-1.


J' (t)t'(1— t) "-'dt' i

In particular, when ifi(t) =1, u i =(i+ 1)/(n+2), i=1,...,n-1, so Iv" is the


estimator which gives weight 2/(n+2) to the largest and smallest observa-
tions and weights 1/(n+2) to all others is minimax invariant. When i(t)=
[t(1—t)]' so that

(12) w(F.,fn)=
f (in(x)—F(x)) Z
F(x)(1— F(x)) dF(x),

it is easily verified that u ; = i/ n for i = 1, ... , n so that the empirical df F n


is the minimax invariant estimator of F. On the other hand, Read (1972) shows
that F,, is "asymptotically inadmissible" for the "integral-type" loss function
(12) in the class of all estimators (being dominated by Pyke's noninvariant
estimator). Phadia (1973) shows that F. is minimax for the loss function

(13) w(F, F,1) =


f ( En(x) — F(x)) Z
F(x)(1— F(x))
dG(x),

where G is any df on
It is also known that F. is the "nonparametric maximum likelihood
estimator" of F; see Kiefer and Wolfowitz (1956) and Scholz (1980) for
definitions. The reader is asked to verify a heuristic statement of this in Exercise
2 below. Also, from the theory of U- statistics, F,,(x) is a UMVU estimator of
F(x) for each fixed x; see Serfling (1980).

Exercise2. Let X,,...,Xn beiidFE3 andletp = F{X; }=F(X; )—F(X; —) ;

so that 0 p; - <1 and the "likelihood" of X 1 ,..., X. is given by L(F; X) =


II"= 1 p; . Show that L(F; X) is maximized as a function of F when Y-; p = 1 ;

and p; = 1/n, and hence that F,, is the "nonparametric maximum likelihood
estimate" of F.

Proof of Theorem 1. (See Beran, 1977a) Let v be a function of bounded


variation on (—cc, x), let { fn E W(f; 8), and let Z. = Ji(F n — F. ). Then, by
}

regularity of C n it follows that the characteristic functional of Z n (under Pf,)


,
ASYMPTOTIC OPTIMALITY OF F„ 175

converges to the characteristic functional of Z, that is

(a) EJ
l
J
exp S i ^ 7L„ dv } -> E exp j i J - Z dv }
1 1 l ^ ))
as n -* oo.

By regularity of tI„ and Theorem 4.1.3, the marginal laws of (Z, L„)-
(^(F„ - F), L„) converge under Pf to proper laws. Hence the joint laws are
relatively compact, and there exists a subsequence for which the joint laws
converge to some joint subprobability law on the product space; since the
marginal laws are proper, this joint law must also be proper. In the following
we restrict attention, if necessary, only to the subsequence for which the joint
laws of (Z, L„) converge weakly.
Since (7„,L„)= (Z,2I[5]IY-2I[S]I 2 ) as n-oo, by (4.1.3) where Y=
N(0, 1), the Skorokhod-Dudley-Wichura theorem (Theorem 2.3.4) guarantees
the existence of probabilistically equivalent versions for which (R„, L„) - *e.s.
(Z, 2I[S]I Y-2I[S]I 2 ) as n -> oo. Thus it also follows, by regularity of F„, (1.9),
and (4.1.25)-(4.1.27) and by the preceding construction and Vitali's theorem
(Exercise A.8.6), that

Ef, exp {i Jz dv}

=Ef expji J ^%
I n — F)dv-i^^ n(F„-F)dv+L„r
111 ^ ^ 1

= E exp ji
t JT z dy -i J 2(3. 1(_.,Jf2)dV(t)+2I[8JI Y

-2 1[b]1 2 }+o( 1 )

(b) E exp i Jz dv+21[S]I Y}


I
xexp -i JT2(o, 1 c-ro.,]f 2 ) dv(t) -2 I[s]1 2
{

as n -, oo for any S E y 2 , S1fU 2 .


We now choose, for h E R,

(c) 5(t)= - (v(t)_J vidµ)f1 ' 2 (t),


where

J _^vfdµ ) fdµ=Varf(v(X)).
('

(d) QZ=Q2(v)= ^^v - —


176 ALTERNATIVES AND PROCESSES OF RESIDUALS

Note that

(e) S E y2i Slf' /2 , ([28] J = h,

and

(f) f T 2( 6, 1 ( co.,lf 2) dv(t) _


- J 2v3f' 12 dµ = h J ( 23 ) 2 dµ
-

= -o h.

[If H = {S €2 2 : Slf / Z } and T: H-> C0 (R) is defined by

(g) (T5)(t) = (s, I(_« ,rlf


, Z ),

then T *: C* (= dual of C0 (R) = BV( -oo, oo)) -^ H defined by

(h) (T *v)(t)=-(v(t)- J vfdµ)f'U 2 (t)

is the adjoint of r satisfying

(i) (T6, V)CO (R) = (S9 T * v),

where the left-hand side of (i) equals, by definition, the left-hand side of (f).
This paragraph is not actually used in the proof.]
For this choice of S the deterministic part in the exponent on the right-hand
side of (b) becomes -iho - 2 h 2 . Hence, letting

+Ji(v, t) =Eexp ji J . Zdv+itYt


1

denote the joint characteristic function of (7, Y), it follows from (a), (b), and
(f) that for the choice of S given in (c),

(j) ii(v,0)=Eexpi
IJ z exp -iho -2h 2

The right-hand side is analytic in h, constant for all real h, hence constant for
all complex h. Choosing h = it in (j) yields

*(v, 0) = a/i(v, t) exp (-tQ +2 t 2 )


(k) _ iG(v, t) exp ((t-o ) 2 ) exp (-2u2)
LIMITING DISTRIBUTIONS UNDER FIXED ALTERNATIVES 177
Choosing t = o, = a(v) in (k) in turn yields

(1) tp(v,0)=+f(v,u) exp (_ r 2 ),


which implies the statement of the theorem upon noting that the second factor
on the right-hand side of (1) is the characteristic functional of U(F), that is,
for any function of bounded variation v,

(m) E exp j i U(F) dv) = exp {—z r 2 (v)}. ❑


l JT
Exercise 3. Verify that (e) and (f) hold for S given by (c).

Exercise 4. Verify (i).

Exercise 5. Verify (m).

Exercise 6. By setting v= v(t,, ... , ;` t; l [ x, in (1), verify that (1)


X )

implies (4).

4. LIMITING DISTRIBUTIONS UNDER FIXED ALTERNATIVES

Let X 1 ,.. . , X. be iid F. Let F0 denote a fixed, hypothesized df. Then

(1) Dn = II(F„ — Fo)''II a.s. A * = II(F — Fo) # I as n-4co

by the Glivenko-Cantelli theorem (Theorem 3.1.1); here # denotes any of +,


—, or 1 1. Likewise,

(2) Kn= II(Fn— Fo) +II+II(F F0)11 a.s.A + +A - asn-+oo.

If F and F0 are both continuous, then

(3) L#= {tE(— oo,00): (F(t)—Fo (t)) # =A # }

is well defined.

Theorem 1. (Raghavachari) If F and Fo are both continuous, then

(4) i(D„ —,t + ) -'* d Z + = sup U(F(t)),


[EI.

v(D„ —A - ) -d Z - = sup ( — U(F(t)),


IE L.

(5) D„ — A) - d Z + v Z - and
v'i(K„ —(,1 + +A )) d Z + +Z
as n - co.
178 ALTERNATIVES AND PROCESSES OF RESIDUALS

Exercise 1. Generalize Theorem 1 by considering the weighted statistics


ll(F" — Fo) # +'(F0)1).

Raghavachari (1973) generalizes Theorem I to the case of two-sample


statistics.
Both Theorem 1 and Corollary 4.2.1 can be used to approximate the power
of the D„ tests for specified alternatives and finite-sample size n. We do not
know the relative merits of these two approximations. The exact finite-sample
formulas for boundary crossings of 6" due to Steck and others can also be
used to calculate the power of these tests; see Section 9.3. Also see Durbin
(1973a, p. 25) for a brief discussion and further references.
Now consider the Cramer-von Mises statistic

(6) W2 = f n(F n — Fo ) 2 dFo .

Theorem 2. If F and Fo are both continuous, then

(7)
Wz j

n" — ^ J
°

(F —Fo) 2 dFo
1 c°

l .
d Z = 2 f (F—Fo)U(F)dFo

= N(0, 0.2 )

with . 2 obtained from Proposition 2.2.1.

Exercise 2. Generalize Theorem 2 by considering the weighted statistic

f (F" —F0 ) 2 (F0 ) dFo .

Exercise 3. For the family of alternatives F(x) = x ° , 0 <_ x <_ 1, with 0< 0< co,
use Theorem 1 to approximate the power of the D„ test. Repeat this for the
W„ test using Theorem 2. (Take a = 0.05 and n = 64, 81, 100.)

Exercise 4. Find a lower bound for the power of the D„ tests based on the
binomial distribution. Approximate this bound for large n using the central
limit theorem and compare with the approximation resulting from Theorem
I in the same case as in Exercise 3. (Massey, 1950; Birnbaum, 1953). [Hint:
IIU„ II2:IU,(t)*I for any fixed 0 <t<1.]

Proof of Theorem 1. We give the proof only for D; the rest is an exercise.
Since (see Exercise 5 below)

(a) JJ F — Fo b) = IF o Fo' — I II = () G — I II for G = F o F^' continuous,


LIMITING DISTRIBUTIONS UNDER FIXED ALTERNATIVES 179

we shall henceforth replace F„, F, F0 , (—cc, cc) by F„, G, I, [0, 1]. Let

(b) Z„ = sup /i(F„(t) — G(t)) = sup UJ„(G(t)),


tEL` tEL'

and write

(c) I-(D„—A+)=Z„+[V(D„—A+)— Zn] =Z„+R,


where

R„ _^{ sup [F„(t)—t]—A + —sup [F„(t)—G(t)]}


0i1 tE L +

?V{Sup [F(t)— t] —,t + —sup [IF„(t) — G(t)]}


IEL^ tEL'

=V{sup [IF„(t)—t—G(t)+t]—sup [F„(t)—G(t)]}


IEL` lEL'

(d) =0

and, clearly,

(e) Z d . s. Z+ as n - cc for the special construction.

It remains to show that

(f) R„-^P0 asn-+oo.

Since L [0, 1] is compact by the continuity of G, for every k>_ I there


exist t, ... , t y E L + such that

r
(g) L+ c M ° U S,,

where

S1 ={tE[0, 1]: IG(t)—G(t 1 )^<1/k}.

On M` we have

G(t)—t<_A + —e forsome e>0,

since L + is a compact subset of the open set M and G(t)—t is continuous,


and, therefore,

(h) /i( sup [F„(t) — t] — A + ) --)'a.s. —cc as n -g oo.


t€ml
180 ALTERNATIVES AND PROCESSES OF RESIDUALS

This yields, with M = closure of M,


R„ <_ /{sup [F„(t) — I]— A + } — Z„ for n > some N. by (e) and (h)
tE M
<_ max i {sup [F„(t) —t] —A }—Z^
lsisy tES,

<— max {^(sup_ [F,(t) — t] — ,1 + ) — ^[F.(ti) — G(ti)])


I i5y tE sj

^ max sup {f[F„(t)—G(t)]—Vi[F „(t )— G(t )]} ; ;

t515y tE$y

since — t =—G(t)+G(t)— t <_—G(t)+A +


(i) wn(1 /k)

for the modulus of continuity w„ of U. defined by

w„(a) =sup {Iv „(t) — U(s)I:It si S a}


= sup {IU,(G(t)) — v„(G(s))I: IG(t) — G(s)I<_a}.

But, by Inequality 14.2.1 with S = Z, for any c > 0,


i

P("(k)>E)<_160kexp(_16 2 k2) forVi>_ek

since ir(t) in (14.2.19) exceeds z for t <— 1


=160k exp (—r 2 k /64)
(j) s for k > some k, and n sufficiently large.

Combining (i) and (j) proves (f), and (4) then follows from (c), (e), and (f).
Note that the key idea of an heuristic argument is contained in step (h).

Exercise 5. Demonstrate the truth of (a) in the proof of Theorem 1.


Proof of Theorem 2. We note that, as in the proof in Theorem 1, the transfor-
mation G = F - F reduces the problem to

fo' (IF„ — I) 2 dI— J f (G — I) 2 dl


^[ 0 ]

(a) =
I [(F„ +G)U (G) - 2MU (G)] dl

(=2
b) (G —I)U(G)dI+ o(1) a.s. for the special construction
fo,
( c) = Z+o(1).

Proposition 2.2.1 shows that Z has the claimed N(0, 0.2 ) distribution. ❑
CONVERGENCE OF EMPIRICAL AND RANK PROCESSES 181

5. CONVERGENCE OF EMPIRICAL AND RANK PROCESSES


UNDER CONTIGUOUS LOCATION, SCALE, AND REGRESSION
ALTERNATIVES

In Section 4.1 we showed that the weighted empirical process E,, and the
empirical rank process R. converged under a fixed sequence of contiguous
alternatives. In this section we specialize to location and scale alternatives,
and we show that in such cases the convergence exhibits a type of uniformity
in the parameters. The key result is Theorem 8, though we find it convenient
to let our presentation proceed slowly through several special cases before
treating the most interesting situation.
We will use these results in Section 6 to treat empirical and rank processes
of residuals. These Section 6 results in turn can be looked upon as an improve-
ment of Section 5.5 in the special case of linear regression.

Fisher Information for Location and Scale

Definition 1. If the density f is absolutely continuous on (—oo, aD) and

(1) Io(.f)- r
J
f/(x) f(x) dx<oo,
f(x)

then f is said to have finite Fisher information for location 10 (f). Otherwise,
we define I0 (f) = m.
When I0 (f) < oo the function f' is well defined a.s., and we define

(2) 0o= 0o(.If)=— f/(F_,) on (1).


.i(F)

Proposition 1. If I0 (f) <oo, then f(F) is absolutely continuous on [0, 1]


with value 0 at both endpoints, and

(3) =0 and ([40]1 2 = 10(f)•

Proof. Now by the change of variable of Exercise 2.1.4

je
—ßo= . (—f'lf)fdx= f .f'dx= tim J A f'dx

= im
f(B) — li m f(—A)
(a) =0 since . f dx =1

with f absolutely continuous as claimed. Just change variables for l[4 o ]1 2 . For
182 ALTERNATIVES AND PROCESSES OF RESIDUALS

the absolute continuity, note that the same change of variable gives

f
F '(r)
f o F (t)=
—' _ f'(x) dx = j [.f'(F '(s))/f(F
- - '(s))] ds
o

f0
^o(s) ds;

hence, the result. We follow Häjek and Sidäk both here and in Proposition 2.

Exercise 1. Verify Propositions I and 2 for the standard normal distribution


with mean 0 and variance 1.

Exercise 2. (Häjek, 1972) Show that

(4) if Io (f) <oo, then v7 is absolutely continuous on —B, B] [

for any 0!5 B < oo.

(The appendix of this paper is of general interest.)

Definition 2. If in some open neighborhood containing 0 = 1, the function


x, and if
f(x /6)/0 is absolutely continuous in 0 for every

f (x)
1, (f)= J ^ [ 1 +x(
)]

(5) f(x) dx<eo,


^ \f(x)

then f is said to have finite Fisher information for scale 1 1 (f). Otherwise, we
define I, (f) = co.
When I, (f) < oo, we define

f((F 11) 1
(6) 11+F -1 on(0,1).
^j1(.,f)=-
.i(F ) J
Proposition 2. If 1 1 (f) <oo, then xf(x) is absolutely continuous on (—co, o0),
F f(F is absolutely continuous on [0, 1] with value 0 at both endpoints, and
- ' ')

(7) *=0 and 1[01]I 2 =I(.f)•

Proof. By the change of variable of Exercise 2.1.4,

f,
(a)
— J
l+x(
— J [f +xf']dx =0,

after using integration by parts on [A, B] with B -+ oo to show xf(x) - 0 as


CONVERGENCE OF EMPIRICAL AND RANK PROCESSES 183

x oo. Thus the absolute continuity of xf(x) follows easily from Definition 2.
Just change variables for 1[,k 1 ]I 2 . Finally, F 'f(F ') =J 4 1 (s) ds.
- - ❑

The Contiguous Simple Regression Model

Suppose a„, , ... , a„„ satisfy the u.a.n. condition


z
(8) max "t -* 0 as p- •
I^i^n a a

Suppose X 1 ,.. . , X,,,, are independent with drs F„,, ... , F„„, and consider
the problem of testing the hypothesis

(9) Pn:Fni=...=Fn„=F,n?1

versus the alternative

(10) Q„:Fni =F(•—ba n; / a'a) for1<_i<n,n?1,with—oo<b<ao;

we will assume here that

(11) lo(.f) = f dx < oo.


J . (f'/.f ) 2

We will call these contiguous simple regression alternatives. [We shall prove
presently that they satisfy (4.1.6).]

Theorem 1. For testing P vs. Qn under (8)-(11), the log likelihood ratio
statistic L„= "=1 logf, 1 (X,)/f(X) satisfies

(12) L„—[bZ„—zbzlo(1)]^r0 asn-,co


uniformly in I bI <— B for any 0 < B < oo
with

(13) Zn _—‚ am f'(X 1) _E am


0o(6 )
i

i= 1 a'a .i(Xni) i =1 a'a

We note that S = —b(v7)', 25Ij= —bf', and J_.2S1jdx = —bf in (4.1.6) and
(4.1.9),
(14) the rv's -O o (e ; ) are iid (0, I (f)) under P.,
0

and

(15) {Q„} is contiguous to {Pn}.


184 ALTERNATIVES AND PROCESSES OF RESIDUALS

Moreover,
(16) Zn - ► d N(0, Io (F)), under P., as n - oo.
Finally, with p n (a, c) a'c/ a'ac'c, we have

II
(17) 1 n c,, {F(• —ba n; / aa) —F } +bp „(a,c)fII 0
;

as n-* oo uniformly in I bI s B for any O<— B < o0


for any c'= c' = (c,,, , ... , C nn )' satisfying
z
(18) max °`
c 0 as n - X.
I^i n c'c

Proof. Letting f, = f(• — ba n; / a'a ), we will verify (4.1.6) uniformly in IbI s


;

B. Let t„ denote the quantity in (4.1.6); thus we must show that

(a) in_ b2a, f' 1 ✓f(x—ba n ,/ a'a)— ✓f(x)


+h'(x) ] 2 dx- 0
t a'a ban;/ a'a
uniformly in I bI < B as n -> oo, where h = f.

That is, we are verifying (4.1.6) with

(b) 8 = —bh' = —bf' /(2f) and J 28 s[fdX=_b J f'dX=_bf(X).

We follow Häjek and Sidäk (1967, p. 211). Since max Ia„ I/ a'a - 0, in order ;

to establish (a) it suffices to show

h(x —r)— h(x)


(c) T, =J +h'(x) 2 dx -*0 asr -- 'Owithhv7.
r

Since I0 (f) < oo, we know from Exercise 2 that h = f is absolutely continuous.
Thus the integrand in T, converges a.s. to 0 as r -oc. It thus suffices to show
that

(d) lim
f h(x— r) —h(x) Z l f
dx< `° [h'(x)j 2 dx = Io(i)l 4 .
.-. J -^ r J J

To this end we note by the fundamental theorem of calculus that

)[ hx_r_hx
( ( ]2 [ 1 )

Jr h'(x —y) dy] 2 ^ r [h'(x —y)] 2 dy /r


r r Jo J0
(e) = H(x),
CONVERGENCE OF EMPIRICAL AND RANK PROCESSES 185

and H is a suitable dominating function since

J H(x) dx<_ J J[hF(x_y)}2 dydx/r

=JJ o - W
[h'(x — y)] 2 dx dy/ r by Fubini

(f) = J 0
fI0 (f )/4} dy/ r = Io (f )/4 <00.

Thus (c) holds by Exercise A.8.8.


We have thus verified that (4.1.6) holds uniformly in bbl <_ B. Now check
and note that the proofs of Theorem 4.1.1 [and Theorem 4.1.3] are such that
this uniformity of (4.1.6) in bI <_ B implies the uniformity of our present
conclusion (17) [conclusion (12)] in Ibi <_ B. The claimed uniformity of the
proof of Theorem 4.1.1 is easy. We note that the claimed uniformity of the
proof of Theorem 4.1.3 must be worked back through our reference to Häjek
and Sidäk and through their reference to Loeve; we also note that we will not
use the uniformity of (12) in J b1 s B. 0

An Associated Special Construction for Contiguous Simple Regression

Suppose now that c,, ... , c,, n >_ 1, satisfy


i

(19) max c, - 0 as n -> oo


tin cc

and introduce from Theorem 3.1.1

(20) W(t) = 1 c„ß[1[t,,<,^—t]


, for 0_ 1
C t_1
C ,
CCi=j

for which

(21) TINA„—WlH^0 as n-oo for all co

with W = Brownian bridge. We now define

(22) ej =—e, ; =F '(f ; )


- forl<i<n

for the F of (11) and [note (10)] we define

(23) 1'^^ = ba,"i + E for 1 <_ i <_ n.


;

This is our special construction of the contiguous simple regression model (10).
186 ALTERNATIVES AND PROCESSES OF RESIDUALS

We define

(24) Z (Y)= C C ;_,


E ,
cni[1[r,;,^y1—Fn1(Y)] for —ao<y<m

n L
1 '` Y —V ani
(25) '— r L Cni 1 [r,1 y—ba.,1/ ✓ a'a)—FI
Q^Q

Theorem 2. For the special construction of (19)-(23) we have


(26) II7Ln —W(F)II -h, 0 uniformly in Ibi <_ B as n - o0
for any 0 <_ B <cc. [Note also (17).]

Proof. We let Y_ ++, E +-, Y_ -+, E --, denote the summation over those indices
i for which (c are (?0, ?O), (>0, <0), (<0, ?0), (<0, <0), respectively.
„ ; , a n ,)
We suppose initially [see Loynes, 1980 for the key idea we introduce in (a)] that

(a) b <_ b <_ 6 for some "short” interval [b, 6] c —B, B]. [

Then for all b E [b, 6] we have

1 ban; '

(b) Z b
Z(y) c

(^
C , C L Cni 1[Eni^v na n / ✓a a] — F^ a , a f
- -

1 —ba y-6a „;
+ c , c cni [ F( '”
) —( a , a )

+ {analogous terms for , , and }


1 1 + -

i n
(c) Y- Cni[ 1 E.,^Y-d,J
I — F(Y — dni)]

+1 1 „ Cni [F(Y — dn;) -- F(Y)]

1 n
i cni[F(Y — eni) — F(Y)]

(d) =Zn(Y)+{Rn(Y)},

CONVERGENCE OF EMPIRICAL AND RANK PROCESSES 187
where

ba„;/ a'a 6a„;/ a'a ^,

ba/ a'a ba„;/ a'a Y-,


(e) dni = and for iE
band a'a e„; = banil a'a

ba„ / a'a
; ba„;/ a'a .

From (17) we have

i n
(f) Rn+ cn;(dni—eni)ff,0 as n- o;
C C;_^

however, If I is bounded when I(f)<c and

(g) I r„rc(dnr—eni)J (b—b)+1 ( ^",^ I Q;R <(b—b)

so that

(h) n- 0
i:

Also, Theorem 3.4.1 shows that

(i) 11Z —W(F)II-v0 asn-4oc.

Corresponding to the upper bound of (b) is a completely analogous lower


bound, that leads to its own versions of equations (h) and (i). Combining
these two (h) and two (i) equations gives, for any 0> 0,

(j) lim sup IIln—W(F)II <_ 0 +(b—b)II.f1I


n-. he[b,bj a.s.

(k) <-0 provided 6—b<e o = 0 /[IIfI]•

If we now cover [—B, B] by a finite number of intervals [b, b] of length less


than 00 , then we can conclude from (k) that

(1) tim sup IIZn — W(F)II- 0 ;


n-.00 he[-B,B]

and since 0 > 0 is arbitrary, the theorem follows. ❑


188 ALTERNATIVES AND PROCESSES OF RESIDUALS

The Contiguous Linear Model with Known Scale

We wish to rewrite the usual linear model Y = Xb + E (special case Y, = a,b + a,)
in a form suitable for consideration of local alternatives (special case Y =
ba i / a'a + e 1 ). To this end we writet

[
bx
1 11
+...+ bpx p
(27) Q n: Y= 2
/
+Ei, 1:i_n,

where e i , ... , s n are iid F with

(28) 10(f) f (.f'lf) 2.fdA<m.

Let F, denote the df of Y in (27). The null hypothesis is

(29) PP : Y=e,, 1 <in.


For applicability of Lindeberg-Feller we will assume that

1
I
n
(30) max max x / X2•j I -* 0 as n -^ oo;
1 < j`p 1n i'=1

this will also make {Q „} contiguous to {P}.

Theorem 3. For testing P. vs. Q„ under (27)-(30), the log likelihood ratio
Theorem
statistic L"n satisfies

L „ —Z „ — ZVar[Z„]J p.0 asn -+x


(31) uniformly in [ max I bj ] s B for any 0 <_ B < cc,
t—Jsp

where
n y \
(32) Zb = _ n 61x;1 ^.. • -^ b xip lI f (Ei )
n ^^ n 2 ^` n 2 J
i=1 Li' =t xi'1 Li' =1 X f(e,)

We note that S = —(f )', 2S/f = —f'/f and ,( 2 8/j' dx = —f in (4.1.6) and
(4.1.9),

(33) the rv's — ^) are iid (0, 10 (f)),


f(Ei)

t Actually, x x j is indexed by a suppressed n and Y1 = Y ; is indexed by n also.


CONVERGENCE OF EMPIRICAL AND RANK PROCESSES 189

and

(34) Q„ is contiguous to P.

Moreover, provided the variance does not go to zero,

(35) Zn /.iVar [Z" ] -' N(0, 1)


n as n - oo.

Finally,

v
— F) +.i , cn^ blx^l ^0
1 ' Y. cni(Fni
2
(36) (I

as n - 0o uniformly in [ max I b; ] <_ B for any 0< B < oo.


1 ^J-P

Proof. Only minor changes are needed in the proof of Theorem 1 [to verify
that the obvious analog of (4.1.6) holds] and in Theorems 4.1.1 and 4.1.3 [to
verify that these proofs carry over under the new version of (4.1.6)]. Our
present theorem is an immediate corollary to the new versions of these last
two theorems. ❑

We now agree that i n; , W,, and W are as in Theorem 3.1.1 [or (20) and
(21)] and

(37) e ; =e n; =F - '(en; ) for 1 Simon.

This is our special construction of the contiguous linear model. Let

(38) Z^(Y) Y_ cn1{1[y,,,.Y1—FF1(Y)} for—co<y<oo.


CC

Theorem 4. If (27)-(30) and (37) hold, then

(39) IIZn—W(F)II-> P 0 uniformly in [max IbI] B asn -)-oo


1 ^l^P

for any 0: B <ca. We also rewrite (36) as

1 " P
(40) , Y_ c„ ; (Fn; —F)+f Y_ b3 p n (X„ c) as n-*cc
c c 1 ;=e

niformly in [max I b; 1 ] < B for any 0 ^ B < oo,


<l^P

where X; = (x, ; , ... , xn1 )' is the jth column of X.


190 ALTERNATIVES AND PROCESSES OF RESIDUALS

Proof. This is just a minor variation on Theorem 2; only the notational


complexity has increased. ❑

The Contiguous Scale Model

Let X. , ... , X„„ be independent rv's with continuous df's F. , ... , F„„. Let
e1,..., enn be an array of known constants satisfying condition (8) above.
Let d E (—co, oo) and consider the hypothesis testing problem

(41) P„: F„ ,=... = F „ =F n >1,


F ,

Vs.

(42) Q„:F„ ; = F(•(1 —de„ ; / e'e)) forl<—isn, n >1,

where we assume finite Fisher scale information

(43) I,(f)= J [l +xf' /f ] 2fdx<oo.

We call these contiguous scale regression alternatives; if all e„ = 1, the main


;

case of interest, we call them simply contiguous scale alternatives. Note that
(42) corresponds to the model

(44) Q: Y= e /(1— de ; / e'e ), 1 ^ i:5 n, where r , ... , E. are iid F.


;

°
Theorem 5. For testing P„ vs. Q „ under (41)-(43) the log likelihood ratio
statistic L„ satisfies

Ln°—[dZn—zd2II(f)] > P„ 0 asn-* o


(45) uniformly in I d I <_ D for any 0< D < oo

with

(46) Z„ Y- e„;e —1—


Xf`(Xni)) nt
_ E^ e , e 0.(6)•

We note that 3 =d(f/2 +x(Vj)'), 2Sv7=d(f+xf'), ,J 2Svfdx=


—dxf in (4.1.6),

(47) the rv's /,(f ) are iid (0, I,(f)),


;

and

(48) Qn is contiguous to P.
CONVERGENCE OF EMPIRICAL AND RANK PROCESSES 191

Moreover,

(49) Z„ -* d N(0, I(f)) as n-aoo.

Finally,

(50) I) ^^c;SFI •I1— de II—F}+dp„(e,c)xfjl-^0


," asn-oc
cc;_ t \ \ eelll/
uniformly in d I < D for any O<_ D < oo

for any constants c„, , ... , c„„ satisfying (19).

Exercise 3. (i) (Häjek and Siddk, 1967, P. 214) Verify that (4.1.6) is
satisfied with a 1 / s/ replaced by e m /V? and with the d defined in Theorem
5.
(ii) Show that (50) is now a minor corollary of Theorem 4.1.1.

An Associated Special Construction for Contiguous Scale

From 6„ ; , W„, and W as in Theorem 3.1.1 [or in (20)-(21)] we let

(51) E =s„ =F '(1'„ )


; ; - ; forl<_i^n

for the F of (41) and [note (44) with all e„ ; = 1]

(52) forlsisn.

This is our special construction of the contiguous scale model (44).


We define

(53) Z^(Y)= 1 Y- c„,[ 1 [y,, , y] — F;(Y)] for—co< y <co

n
(54) = 1 , Y- c„ r[ 1 [E.,i^Y(i-dIVn» — F(Y( 1— d/ ✓ n))]•

Theorem 6. For the special construction of (43) and (51)-(52) we have

(55) 111 — W(F)JI - 0 for all w, uniformly in I d J <_ D, as n -x

for any 0<_ D<oc. [Note also (50).]


Proof. Note that

(56) 7L„(y)=W (y(1—d// )) for —oo<y<co.


192 ALTERNATIVES AND PROCESSES OF RESIDUALS

Thus

IIZn W(F)II II'n(F(• (1—d /.


— n))) —'(F)II
IIVn(F(• (1 — d / ./ )))—W(F(• (1 — d /Vi)))l
+ (1— d /Vi)))—W(F)II
(a) <lIWn—Wlj +o(1) uniformly in IdI<D
(b) -*0 asn-oo,forallw,by(21)

using II F( • (1 — d / .,/ )) — F^I -*0 and uniform continuity of the sample paths
of W. ❑

Exercise 4. Prove a version of Theorem 6 for general e n; .

The Contiguous Linear Model with Unknown Scale

Let Yn ,, ... , Ynn be independent rv's with continuous df's F n ,, ... , Fnn . Con-
sider the hypothesis testing problem

(57) P: Y ; = e i , 1 i n, with e, , ... , s n iid F

vs.

b1xil +...+
(58) Q"n .d : n — n 2
b x,,
n 2
e' =Fni,
(1 —d /^)

1 < i <_ n, where we assume that F satisfies

(59) Io(f) <co and I1(f) <co

and the constants satisfy

x
(60) max ax n 2 -*0 as n -> oo.
1_j^p ki^n I x.,.

The special construction now takes the form

(61) e; =eni= F - '(enr), I: i- n,

for e 's, W,, and W as in Theorem 3.1.1 [or in (20) and (21)]. Let
;

n
1b for —oo<y<oo
(62) ' d (Y)= cn 1{1[ ,,,y] —Fni(Y)}
CONVERGENCE OF EMPIRICAL AND RANK PROCESSES 193
for c„ ; 's that satisfy
2
(63) max 0 as n - 'x.
Ie, n c'c

Theorem 7. If (57)-(61) and (63) hold, then for the special construction (61)
we have

(64) II7L" d —W(F)jI _ P 0 uniformly in {Id l v [max jb; I ]} <_ B as n -*oo


1^l 2 SP

for any 0 <_ B <. Moreover,

in p

c'c;Y, cn,(Fn; — F')+f^Y b;Pn(X;,


^ c)+YfdPn( 1 , c)II -4 0 as n-oo
(65) (I
uniformly in fId I v [ max I b; I ]} <_ B for any 0 <_ B < oo,
► ^J P

where X; = (x, ; , . .. , x,,; )' is the jth column of X.

Now consider the weighted empirical process

n
(66) En(Y) = 1 Y- c.^► {1fr.;cv1—F(Y)} for —oo<y<oo

and the rank sampling process

I ((n+ ► )n
(67) R (t)= c c ;= ► c„on; for0--t^1.

Theorem 8. If (57)-(61) and (63) hold, then as n->co we have both

(68) II E —jW(F)—.f Y_
i =l
b1Pn(X,c)—yfdp (I,c)
JII -P0
and

(69)
V (

In — j W—f o F ' -
P
bj p n (X; , c)1 II -> P 0 provided c„ = 0,

uniformly in (I d^ v [max, j , p ^ b; I] } ^ B for any 0-5; B < oo.

Recall from the definition (1) and Propositions 1 and 2 that both

(70) f and yf are absolutely continuous on (—co, oo), in (68),


194 ALTERNATIVES AND PROCESSES OF RESIDUALS

and

(71) foF - ' and F 'f o F


- - ' are absolutely continuous on [0, 1] in (69).
Proof of Theorems 7 and 8. We can essentially obtain (65) directly from (36)
and (50) by writing [let 13 denote the term inside [ ] in (58)]
;

i n
c , c Z c (F,, - F)
iI
1 ^
c [F((' - ß)(1 - d/ ✓ n)) - F( ( 1 -
dlv'i))]

1
(a) + Y^c^ 1 [F(•(1-d / ,/ )) -F].

The second term in (a) gives the second term in (65) by applying (50). Applying
(36) to the first term of (a) gives the first term of (65) —but with f( -(1 -
d/Vi))(i - d/Vi) instead of just f; however, the I II-distance between these
two functions goes to 0 since f is absolutely continuous and f converges to 0
as
Now consider (64). Theorem 2 gives

(b) III „ d —W(F(• (1— d /./)))II .- 0


uniformly in b and d as claimed. Then

(c) IIW(F(•(1- d /Vi)))-W(F)II- 0 for all (o

uniformly in I d I < B. Thus application of the triangle inequality to -


W(F)II via (b) and (c) gives (64).
Equation (68) follows immediately from (64) and (65). Equation (69) is
just two applications of steps (f)-(j) in the proof of Theorem 4.1.2; check
Theorems 1 and 5 for the appropriate 8's to use in the two pieces.
See Häjek and Sidäk (1967, p. 225) for this theorem with p = 1, known
scale, and b fixed. ❑

6. EMPIRICAL AND RANK PROCESSES OF RESIDUALS

Consider the linear model

(1) PP: Y =Xß+oe wheree 1 ,...,e areiid F

and the df F has

(2) Io(f) <oo and I,(f) <x'.


EMPIRICAL AND RANK PROCESSES OF RESIDUALS 195
Here X = X (") is an n x p vector of known constant satisfying
n 1
(3) max max x / x 0 as n -^ co,

/3 is a p x 1 vector of unknown parameters, and a> 0 is unknown.
We suppose that

(4) ß and o denote estimators of (3 and o,

that satisfy

(5) x (ß; —(3; )=Op (1) fort jp

and

(6) V-n (Q/ o —1) = O o (1).

The standardized residuals are defined by

(7) E.=(Y^—X;/.)/ ,

where

(8) X;=(x,...,xj,) denotes the ithrowofX.

Recall that

(9) X; = (x, , ... , x" )' denotes the jth column of X.


; ;

The standardized ith difference d between the fitted value X;/3 and the theoretical
;

value X;ß is

(10) ã1 X(j—ß)/ô

t i11 n = x`' (
"
^//^ 2 P'—ß')=—
v 2 [„ %^ x`' 2 bl
.,. xj
i n
f =i =1 X^l +_I (7 1=1 V 1^=1 X^)

[recall (4.5.58)].
We introduce constants c about which we assume
;

(12) max --*0


, as n -> oo.
1 inncc
196 ALTERNATIVES AND PROCESSES OF RESIDUALS

We now define the weighted empirical process of standardized residuals

(13) L(y) = , E cn1{ 1 [E,^ vl — F(Y)} for —oc<y<00.

We also define the empirical rank process of standardized residuals

1 un+nn
(14) A„(t)= c, c E c,,,, for0<t^1,
1=I

where R,,, , ... , R n „ are the ranks of ei,., i„ and where

(15) D,,, , ... , D„„ denote the antiranks defined by R n ,% = i.

The special construction of these processes supposes

(16) e ; =s nj = F '(en ,)
-

for the f,, ; 's, W n , and the Brownian bridges W and U of Theorem 3.1.1.

Theorem 1. Suppose (1)-(6) and (12) hold. Then as n +oo both

n—{W(F) +f
c'X(ß —ß)
+YfP n(c,1)•
l
(17) (l
C,C^
i( —i ) - 0

and

(18)
I^
U8„ — 4M+Ja F 'C'X (ß—ß)
c c )
,

^II
^ P 0 provided c„ =0

for the special construction of (16). We note that

(19) _° c'X(ß ß)
c'co

Y- P,(X , c) Xi Xi 'Q
i= ►
(ß A)—

when X, denotes the jth column of X.

As in (4.5.70) and (4.5.71), we remind the reader that

(20) f and yf are absolutely continuous on (—cc, cc) in (17)

and

(21) foF - ' and F 'f o F


- - ' are absolutely continuous on [0, 1] in (18).
EMPIRICAL AND RANK PROCESSES OF RESIDUALS 197

Proof. We write

E.,(Y)= 1 , Y_ c,,; 1 [faur a)] — F( (Y—d))


}

(a) + c , c ^^ {cnjF(^(Y — d;) I — F(Y)}

(b) =7L„(y)+M„(y).

Applying (5) and (6) to (11) we see that the b ; 's of (11) and

(c) d=— ✓n(Q/a-1)

are such that for all E > 0, there exists B E such that

(d) P({Idlv[max (b; I]}<BF )>1—e for all n 2t some n,.


I -ci^n

Thus from (4.5.64) we conclude that

(e) II 2 n — W(F)11- v 0 as n-'.

A similar application of (4.5.65) to the M. of (b) gives

(f) 111„+.f I b;P^(A' c)+Yfdpn( 1 , c) 0.

Applying (e) and (f) to (b) gives the approximation

(g) W(F)+f Xi%{i (ß,—ß^) Pn(X,,c)+yf✓n(^—I)P.,(l,c)

X cp-
i =1 Q Q

c P) +Yfp.,(c, 1)v (
(h) W(F)+f — 1)

for L„(F). We can replace v and 0-in the middle term of (h) since its coefficient
is Op (i). Once this is done, the resulting version of (h) is (18).
This is an improved and extended version of the location-scale case of a
theorem of Darling (1955), Durbin (1973b), Loynes (1980), and others. It can
be found in Shorack (1985). ❑

Classical and Robust Residuals

It is often the case that an estimator ß of ß has the property that

c'X(ß—(3) — c'X(X'X) X'e -

(22)
c'ca a c'c '
198 ALTERNATIVES AND PROCESSES OF RESIDUALS

where for some function +l, we have

(23) ;
c(E^ 2
e = E^ '( 1 E
=ir o (e ; ) are (0, v 2 ) with v 2 =^ E (E) 2^ .

Indeed in the classical least-squares case we have ß = (X'X) X'Y, so that -

(22) and (23) hold with 4i(y) =y the identity function. For the classical
maximum likelihood estimate (MLE) we often have (22) and (23) with 4r =
—f'/f and E+i'(s)=Io ( f). In robust regression i41 is usually a bounded con-
tinuous and odd weight function. It is also usually appropriate to suppose that

(24) c is in the vector space spanned by the columns of X.

It is also often the case that

(25) ✓n^^ -1 J a
=
a 2
1 •/n
Q
2 1
= ^1(e1)
a T r=j

for some function ei l .

Theorem 2. Suppose (1)-(6), (12), and (22)-(25) hold, and also

(26) p n (c,1)->somep cI as n- oo.

(Choosing c„ so that p c , =0 seems useful.) Thent

(27)II9n — {w(F) +.f f iAo dW(F)+yfpci


J . c ' dU(F) } (I ° 0
and

(28) IIIB^ — {W+f ° F- ' dW} II -^° 0 provided c„ = 0.


J +✓io(F - `)

Of course, the special case c = 1 of (27) gives

(i( —F)— j U(F) +.f J 410 dUJ(F)+yf


(29) t J dU(F)1II ° 0

for the empirical df „ of the e 's. ;

f We define J ,
41dW(F)= jö cr(F `) dW.
-
EMPIRICAL AND RANK PROCESSES OF RESIDUALS 199
Proof. The basic observation is that

(30) c'X (X'X ) X'e = c'e


- under (24),

since X (X'X) X' is the matrix of the projection mapping onto the column
-

space of X; it is due to Pierce and Kopecky (1979). Thus we first alter the
middle terms of (17) and (18) by writing

c'X(ß—ß) — c'X(X'X)-X'e
by(22)
c eor Q c'cQ

_ c'e
by (30)
c'c0,

(a) = +Go(F ') d W„


J -

(b) = J t]i o (F ') dW.


- by Theorem 3.1.2, since gio (F ') E YZ
-

qi0 dW(F) by notational convention.

We likewise alter the last term of (17) and (18) by writing

(c) ✓n(Q/o, - 1)= J . qi,(F ')dUJ„


-
by(25)

(d) a/i(F) dU by Theorem 3.1.2.


a ao

Finally, we can replace p„(c, 1) by its limit p. in (17) and (18) since its
coefficient has 11 11 that is Op (1). We remind the reader from Remark 3.1.1 that
in order to talk simultaneously of W and U, we must have p„(c, 1) converging.

Remark 1. In the usual regression situation it is appropriate to suppose that

(31) the first column of X is 1.

Then we define

(32) Qv„ _ I„ o F ' to be the empirical df of the ; = „; = F(s ; ) = F(8„ ; )


-

and

(33) UJ, = ^[(^„ — I] on [0, 1].


200 ALTERNATIVES AND PROCESSES OF RESIDUALS

Tests of the hypothesis that F is the correct df in (1) based on functionals of


the estimated empirical process .[F„ — F] can most conveniently have null
hypothesis distributions tabled when we rewrite (29) [recall (20) and (21)] as

(34) 116" — UJ iI -O as n -4 co,

where

-^fo F-i qt
(35) Ü = U — .f'° F -1 ^o(F ') dUJ —F
- 1 (F -1 ) du.
0 0

The asymptotic power of such tests would come from a version of Theorem
2 for contiguous alternatives of a non-location-scale type, or from (5.5.15) to
(5.5.17) below.

Remark 2. Does the matrix X (which we still assume to have first column
1) contain all the regressors that are needed, or should our model (1) be
extended to

(36) Q,,: Y=(X Z)j P i +s,

where the columns X; of X are orthogonal to the columns Z1 ,. . . , Zq of Z?


Let

(37) c denote any vector spanned by the columns of Z

(thus c„ = 0 = p c ,) and form the processes

(38) L„ and Ü8c. [formulas (13) and (14) with the present c].

(In fact, we could form q asymptotically independent processes.) Then these


processes could be used to test the P. of (1) versus the Q. of (36). The
asymptotic power of such tests comes from an easy version of Theorem 2
under contiguous alternatives of a location type.

Exercise 1. Derive a general formula for the covariance function of the 1J of


(35).

Remark 3. Tests asymptotically equivalent to those of Remarks 1 and 2 can


be based on appropriate "quantile" processes.
CHAPTER 5

Integral Tests of
Fit and the Estimated
Empirical Process

0. INTRODUCTION

The basic problem that motivates this chapter is to determine the distribution
of the Cramer-von Mises goodness-of-fit statistic

W„ _
ao
n[1 „(x) — F(x)] z dF(x) = f l U(t) dt.
o

Just as an n X n covariance matrix can be represented as _ A yy; ;

where the A are eigenvalues and the y, are orthonormal eigenvectors of T, so


;

too the covariance function K of many processes can be represented as

(1) K(s, t) = E kj (s)fj(t)

for functions f orthonormal with respect to the 2 metric. Let Z*, Z*.... be
'

iid N(O, 1) rv's. Just as E Z*y is a N(O, ) rv, so too the process X
;

defined by

(2) X(t)= Y_ 4Z*f(t)


i =1

is a normal process with mean-value function 0 and covariance function K.


Integrating (2) we see that
Cl
(3) RCZ(t) dt = _ A Z* 2
;

o j1
201
202 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

is distributed as a weighted infinite sum of independent chi-square (1) rv's.


Section 1 goes into this heuristic treatment at greater length. Section 2 lists
the basic properties of 22 kernels K, and then uses them to give a rigorous
version of (2) and (3) for a fairly general process X.
In Section 3 this general theory is applied to obtain the Kac and Siegert
decomposition (2) of U and the Durbin and Knott decomposition of v „. Just
as (3) follows from (2), the distributions of the Cramer-von Mises statistic
Wz and its limiting form W2 now follow. Appropriate tables and power
,,

computations are also presented in Section 3.


Since the weighted Anderson-Darling statistic A , = Jö U (t) J,(t) dt, with
,

weight function t'(t) = 1/[t(1— t)], can be written as An = Jö X Z (t) dt where


X(t) =1J„(t)v has covariance function Kx (s, t) = Ku (s, t) iJi(s)ç(i(t), the
decomposition of such processes X and kernels Kx with general weight
functions i'/i is also discussed at the end of Section 3. The specialization to the
particular weight function a/i above that yields A n is summarized in Section 4.
Sections 1 through 4 provide the theory for using integral tests of fit like
W„ to test whether or not a sample comes from a population with completely
specified continuous df F. In Section 5, we allow F to depend on a parameter
9 whose value is unknown. The natural extension of W„ is then W ,
J n[F„(x) — FB(x)] 2 dF;(x) for some estimator B of 0. After a change of
.,

variable we find W =J U(t) dt for an appropriate estimated empirical pro-


, _
cess U,,. We first follow Darling and determine that the natural limit of the U.
process is a process of the form (when 0 is one dimensional)

(4) OJ =U +Zg,

where g is a known function and the ry Z arises as the limit of rv's Z. of the
v
type f ö h d v„. The covariance of turns out to be of the form (the details are
simplest if 0 is an efficient estimator of 6)

(5) Kv(s, t) = K u (s, t) — 0(s) (A(t)

for an appropriate 22 function Q,. Results on convergence of v„ to VJ are


summarized for location, scale, and location-scale cases; proofs with complete
details are avoided since they would lead to "dirty” theorems. Section 6
considers the distribution of the natural limit W2 = J, U 2 (t) dt of W. For this
a "clean" theorem is available, and various tables of percentage points based
on it are presented. It is much more difficult to obtain the components Z;
of (2) for Kv than it is for Ku , but work of Durbin et al. is summarized in
Section 6.
Different statistics have led in a natural fashion to different sets of orthogonal
functions f in (3). Later authors turned the question around and began with
a convenient family of f3 's and then investigated how well the family worked.
This approach is considered in Section 7. Asymptotic efficiency of such tests
is examined. If we push this point of view to its logical conclusion, we find
MOTIVATION OF PRINCIPAL COMPONENT DECOMPOSITION 203

that with proper choice of f, (and an 12,13,... that do not enter in) we can
have an asymptotically efficient test. We do not seriously suggest this, but it
seems instructionally useful.
In Sections 5 and 7 we estimated 0 by 9 and then tested whether or not
FB was sufficiently close to F by means of the statistic W. In Section 9 we
turn this around. We assume that F(• — 0) is the correct df for some value
of 9. We then estimate 0 by the value 0 that minimizes W„ _
J °n[0 „(x) — F(x — 9)] 2 dF(x — 0). Blackman's theorem on the distribution of
this 9 is presented.
The monograph of Durbin (1973a) has a fairly large intersection with this
chapter; both pieces of the symmetric difference are nonnegligible.

1. MOTIVATION OF PRINCIPAL COMPONENT DECOMPOSITION

Statement of a Problem

We have seen earlier that under the null hypothesis, the Cramer-von Mises
statistic reduces to the form

(1) W = j^ [L„(t)] 2 dt
0

(2) --id W 2 =f [U(t)] 2 dt as n->co,


0
where —► d follows from Example 3.8.3. The determination of the distribution
of rv's such as W 2 began the application to statistics of the type of methodology
to be presented in this chapter. _
Let Y denote an m xl vector whose ith component is U(i/(m+1))//m.
Then Y' Y-+ W 2 a.s. as m -* oo. In the next subsection we decompose an
arbitrary (0, T) random vector Y into its principal components, and then
express Y' Y in terms of these principal components. Following that, we give
a heuristic treatment (to be rigorized later) for such a principal component
decomposition of a process X on [0, 1] and then express 1 1 X 2 (t) dt in terms
of these principal components. Thus everything we do for processes, covariance
functions, principal component decomposition, and the distribution of the
integral of the process squared is but a natural extension of what is familiar
in terms of vectors, covariance matrices, principal components, and the
distribution of sums of squares.

Principal Component Decomposition of a Random Vector

Consider an n X I random vector

(3) Y=(0,4).
204 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

We know by the principal axes theorem that there exists an orthogonal matrix
r (with rows y ;, ... , y„ that form an orthonormal basis for n- space) for which
(4) Z =rY is =(0,A) with A =r4r',

and A is a diagonal matrix with diagonal entries ,1, >_ • • > A n that are > 0
(>0 for nonsingular T). The coordinates

(5) Z; =y;Y =(y, Y) forlj <_n

are called the principal components of Y; note that the A 's and y 's are ; ;

obtainable as solutions of the equation

Y_
(6) y' = Ay'.

Of course, we have
n
(7) Z=F'AF= AJ[yy;]•
l= 1

Note that [ yy;]y = y; (y;y) = (y, y;)y is the projection of y onto the unit vector
;

in the y direction; thus each of the n matrices


;

(8) [ yy;] defines the projection onto y . ;

Also,
n n
(9) Y= (Y, yy)y; _ Z;y;

(10) =E IA Z7 -y j 3 where Z* = Z / ; are uncorrelated (0, 1) rv's


i=1

represents Y in terms of its principal components Z in (9) and in terms of ;

its normalized principal components Z* in (10). From (9) or (4) we have

(11) Y'Y= (yi,


n
y) 2 =Y _E
n
Z;
n
AZ *2•
Finally, we note that

(12) Zr,. . . , Z n* are independent N(0, 1) if Y = N(0, T).

In this case Y' Y is distributed as a weighted sum of independent chi-square


(1) rv's, where the weights A are the eigenvalues of the covariance matrix
;

and the Z*'s are the projections of Y onto the associated eigenvectors.
MOTIVATION OF PRINCIPAL COMPONENT DECOMPOSITION 205

Principal Component Decomposition of a Process—Heuristic

Processes with any sort of smoothness property typically have their trajectories
completely specified if we know the trajectories at all points in a countable
dense set. Thus we believe that a countable number of rv's should be sufficient
to define a process like the Brownian bridge, say; see Exercise 2.2.15. Thus it
seems reasonable that the harmonic decomposition of a general zero-mean
process {X(t): 4<_ t< I} might take the form [see also (9)]

(13)X = Y_ Zif,
where Z; 's are uncorrelated zero-mean rv's and f's are orthonormal functions
on [0, 1]. We will use the notation

viz
(14) (g1, 8z)= S1(t)Sz(t) dt and ![g]I = (g, g)"2=
Lfo, g z(t)di

If the representation (13) is correct, then purely formal calculations suggest

(i 5 ) (X,fk)= Y- Zi1j,Jk j = Zi(Jj,fk)=Zk

and also

K(s,t)= Coy [X(s),X(t)]=E^ Zijj(S) £ Zkfk(t)


k=I J
0o m m
_ Y, Z Cov [ZJ, Zk]f3(s)Jk(t) = Y_ Var [ZJ]Jj(s)Jj(t)
i =I k=I j=I

(16) _ E A f (s)f(t)
; ; where AVar[Z ] ;

1= 1

and Cov [Z;, Zk ] = 0 if j 96 k. It is Eq. (16) that suggests how the f's may be
determined; we have

Jk(s)K(s, t) ds = f fk(S) L A j(s). (t) ds


0 o, i =1
W

AJ (t)(Ik,ff)

= AkJk(t)-
206 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

We thus seek all solutions (eigenvalues A and eigenfunctions f) of the integral


equation

f(s)K(s, t) ds = Af(t) for 0 t <_ 1.


(17)J 0

Note that (16) carries within it the implication that

(18) Cov [Z , Zk ] = E
; X(s)f (s) ds X(t)fk (t) dt
0 0

= I 01 ^o f (s)K(s, t) dsI fk(t) dt


L
f
= ol Ajf; (t)fk (t) dt = A; [Kronecker's 8jk ].

The rv's Zk of (15) are called the principal components of X, a n d (13) is called
the principal component decomposition of X. We call Z* = Z; / JA the normalized
principal component since Z* =(0, 1); note that Eq. (13) can be reexpressed as

(19) X Y, Z* f.
j=1

Finally, from (19) we have that

(20)
f o,
X 2 (t)
i=l
dt= A;Z' 2 for uncorrelated (0, 1) rv's Z*.

Thus Jö X 2 dt is distributed as an infinite weighted sum of independent chi-


square (1) rv's in case X is a normal process; the weights A ; are the eigenvalues
of the covariance kernel and the Z; are the projections of X onto the associated
eigenfunctions.

2. ORTHOGONAL DECOMPOSITION OF PROCESSES

A List of Background Results on Kernels

Let Y2 denote the collection of all real measurable functions g on [0, 1] having
j^ g 2 ( t) dt < oo; here dt denotes Lebesgue measure. We let (g 1 , g 2 ) =
J o g 1 (t)g 2 (t) dt denote the usual inner product on Y2 , and we let I[g]I = (g, g) 1/2
denote the usual norm on 2'2. Recall that '2 is a complete normed linear
space, or Hilbert space.

ORTHOGONAL DECOMPOSITION OF PROCESSES 207
By a kernel we shall mean a non-null, symmetric, square integrable function
K (s, t) defined for 0 ^ s, i:5 1; that is,

(1) K$O, K(s,t)=K(t, ․ ), and J ^I


0 o
K 2 (s,t)dsdt<oo.

The kernel is called positive definite if

(2) I I g(s)K(s, t)g(t)dsdt>0 forallge2'2 withg00;


0 Jo

it is called positive semidefinite if we replace > by >_ in (2).


If for the real number A there exists an f in ' 2 such that

(3) J i
0
f(s)K(s, t) ds=Af(t),

then A is called an eigenvalue of the kernel K and f is called an associated


eigenfunction.

Remark 1. (Properties of kernels) See Kanwal (1971, pp. 139-150) for


justification of the following list of results and of Mercer's theorem.

(i) The eigenvalues of a kernel are at most countable in number.


(ii) The eigenfunctions corresponding to distinct eigenvalues are orthogonal
(i.e., (f,, f2) = 0 for eigenfunctions corresponding to eigenvalues A, 0
A 2 ).
(iii) Corresponding to any nonzero eigenvalue there are at most a finite
number of linearly independent eigenfunctions; the maximal such
number is called the multiplicity of the eigenvalue.
(iv) If we let A,, A 2 ,. .. be an enumeration of the nonzero eigenvalues with
each one appearing as many times as its multiplicity, then the corre-
sponding eigenfunctions f , f, ... may be assumed to be orthonormal
[i.e., (f,, f) = 0 while l[ f ]J = 1 for all i, j].
(v) We necessarily have Y_ ; A; - Jo Jo K Z (s, t) ds dt, which implies that if
there are an infinite number of eigenvalues A ;, then A; -*0 and 0 is the
only limit point of the A ; 's.
(vi) The nonzero eigenvalue A with largest absolute value satisfies JAS=
sup {J, jo g(s)K(s, t)g(t) ds dt; g E '2 has l[g]j =1}.
(vii) A kernel K is positive semidefinite if and only if all the nonzero
eigenvalues A ; are >0.
(viii) A kernel K is positive definite if and only if all eigenvalues A ; are >0
and the corresponding eigenfunctions f form a complete orthonormal
system.
208 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Proposition 1. An orthonormal set f , f , ... in Y2 is complete means that any


of the following equivalent conditions holds:

(i) f, f, ... is a maximal orthonormal set.


(ii) The set of all finite linear combinations of fj 's is I[ ]I-dense in '2 .
(iii)
2
For all g E 12 we have 1[g]f 2 = Y -j° (g, f) .
(iv) For all g, h E 22 we have (g, h) = E I (g, f; )(h, f ).

We remark that for each fixed m the choice a = (g, f) minimizes ^[Y_^ , a f —
; ;

g]l 2 ; thus (ii) implies that Y-7 , a f converges to g in l[ ]I as m -* co.


; ;

Theorem 1. (Mercer) If K is continuous for 0 <_ s, t ^ 1 and is a positive


semidefinite kernel [satisfying (1)], then
W
(4) K(s,t)= E Aj f(s)f(t) for0<_s,t<1,
i=1

where the series converges both uniformly and absolutely. Also, ^^ , A, < a0.
[Here ,l; and f denote the solutions of (3) with the f being orthonormal.]

Existence of Principal Component Decompositions

Having reviewed this background material, we now begin to apply it to


processes on [0, 1]. An early reference is Kac and Siegert (1947).

Proposition 2. Let K be the covariance function of a nontrivial zero-mean


process {X(t); 0 <_ t <_ 1} whose trajectories are a.s. Z2 functions.
If K satisfies

f o,
K(t, t) dt<oo,

then K is a positive semidefinite kernel. If A,, f, and A 2 , f2 are two solutions


of (3) for which f and f2 are orthonormal, then

J
I

Z; - X(t)f (t) dt forj = 1, 2 are uncorrelated (0, A ) rv's.


;

Proof. By Cauchy-Schwarz, we can for a.e. w define

(a) Y- J 0
g(t)X(t) dt for each fixed g E Yz .

Now
j I I I

(b) EY= E { g(t)X(t) dt= J g(t)EX(r) dt = g(t)0 dt =0,


J o 0 0
ORTHOGONAL DECOMPOSITION OF PROCESSES 209

where use of Fubini's theorem is justified by


I o

J Elg(t)X(t)I dtv j [Eg 2 (t)X 2 (t)]' /2 dt=


0

'j f Ig(t)I[K(t, t)] /


r('Jo,
1 2 dt

f ^/z ^ l r/2

[ J 01
g 2 (t) dt K
ol (t, t) dt =1[g]II
] L o
K(t, t) di 00.
<

Thus we have

0: Var [ Y] = EY 2 = E J ' g(s)X(s) ds f g(t)X( t) dt


L 0 o, I
=
J 0 o
J g(s)E[X(s)X(t)]g(t) ds dt

(c) = I , f g(s)K
1 (s, t)g(t) dsdt,
0 o

where use of Fubini's theorem is now justified by

f o, Jo
Elg(s)X(s)X(t)g(t)I dsdt

f 1 Jg(s)I[EX 2 (s)EX 2 (t)] 2 Ig(t)I dsdt


f 0

= J 1J Ig(s)[K(s, s)K(t, t)]'/ 2 g(t)I ds dt


0

= f J g (s)J[K(s, S)]'/2 ds f Ig(t)I[K(t, t)]'/z dt


0

_ JIg(t)I[K(t,
0
t)]'/ 2 dt]

Thus K is positive semidefinite. Clearly, K is nonnull and symmetric; and


<I[g]IZ f
o , K(t, t) dt«o.

the rest of (1) holds since

fJ o, 0
K 2 (s, t) ds dt = J f'
0 .
Cove [X(s), X(t)] ds dt

f f (Var
, [X(s)] Var [X(t)]) ds dt
0 o

=
J 0 0
J ' [K(s,
s)K(t, t)] ds dt

= [ o J K(t, t) dt] Z .

Thus K is a kernel.
210 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Now Z; has mean 0 by (b). For the covariances of the Z; 's we continue (c)
by writing

Cov [Z , Z .] = f , ' f (s)K(s, t)f -(t) ds dt


; ; as in (c)
0 0

= fo A(t)f(t) dt
' by (3)

(d) = A [Kronecker's S-].


;

This completes the proof. ❑5

ASSUMPTION 1. We now assume that K is .the covariance function of a


nontrivial zero-mean process {X(t):0<t<_1} whose trajectories are a.s. in a
subspace L of '2 functions, and we suppose that K satisfies

(5) ^ K(t, t) dt<ao.


0

We also suppose that A I >_ ,t 2 >_ • • • >0 comprises the entire spectrum of eigen -
values of K, that the associated orthonormal eigenfunctions f,, f2 , ... form a
complete set for the subspace L, and that we have the Kac and Siegers
decomposition

(6) K(s, z) = E ,kjfj (s)fj (t) is valid for 0 < s, t<1.

We note from the previous proposition that

(7) Z; = J X(t)f1 (t) dt = (0, A; ) for j ? I are uncorrelated rv's.

Remark 1. We note that if we assume the K of Proposition 2 is continuous,


then Mercer's theorem guarantees that (6) holds; in fact, the convergence in
(6) is both uniform and absolute in this case, and Y ° d; <oo. However, there
are interesting examples where (6) holds but K is not continuous; see Anderson
and Darling's theorem (Theorem 5.4.1).

Theorem 2. (Kac and Siegert) Suppose X and K satisfy Assumption 1. Then

(8) A; <°o,

m
(9) E Zf) -^ I[ ]l X as m - oo a.s.
J=1
ORTHOGONAL DECOMPOSITION OF PROCESSES 211

where

(10) Z; = (X, f) = (0, A ; ) are uncorrelated,

X 2 (t) dt = Z; = kjZ* z where Z* = Z; /f = (0, 1),


(11) fo, as
i =1=1

EIX(t)—^ Z;f(t) Z -*0 for eachtasm-*oo


(12)
LLL i =l J
(13) X E Z* f with Z* = Z; Jf uncorrelated (0, 1) rv's.
i =1

Proof. Now oo> j'o K(t, t) dt = , ,tf,(t) dt =E , A, with Fubini's


theorem justified by (5). Also, (9) follows from (ii) of Proposition 1 since the
choice a; = Z; minimizes I[Y j , a; f; — X]I among all possible choices Y-' a; f.
Also, (11) is just (iii) of Proposition 1. That (10) holds is just (7). Finally, we
note that

X(i)— E Z;f(t)I 2 =K(t, t)-2 E- f(t)EZ; X(t)+ I ,1lfJ(t)


E[ i =I i =I i =I
m
(a) = K(t, t)— I Af (t)

since application of Fubini's theorem in the calculation

EZ X(t) = E I
; I f (s)X(s) ds X(t)^ =
LLL o E f(s)E[X(s)X(t)] ds

(b) = I f(s)K(s, t) ds=A f(t) ;

is justified by

IL(s)IEIX(s)X(t)I ds
fo,

: f Jfj (s)J [EX2(S)EX2(t)]'/2 ds


I
2
= f Jfj (s)J[K(s, S)]h/2 ds K(t, t)"
1/2
fol fi(s) ds J K(s, s) ds K(t,t)" 2

1
0

I
1/z

= [ J 0
K(s, s) ds K(t, t) <oo.
212 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Now (a) converges to 0 by (6), which gives (12). Then (12) implies (13).

Distribution of Jö X (t) dt Via Decomposition


2

We suppose first that

(14) T X 2 (t) dt = A;Z* Z for iid N(0, 1) rv's Z*

(this would be the case if we added the assumption of normality to Theorem


2). In the next two sections of this chapter we will show that W2 , A 2 , and U 2
can all be represented as in (14). How do we use this representation to determine
the distribution?
METHOD 1. The ry T of (14) has characteristic function

(15) cb(A)=Ee ür —
H (1- 2iA AY W2 .
j
i=1

It may be possible to invert this characteristic function (or the moment


generating function, etc.). Good examples of this technique are found in
Anderson and Darling's (1952) treatment of W2 and A 2 and Watson's (1961)
treatment of U 2 .

METHOD 2. If one truncates off the infinite series in (14), then the weighted
sum of independent chi-square (1) rv's that remains can be treated by the
numerical methods of Imhof or Slepian. A good example of this technique is
found in Durbin and Knott (1972) —a description and references can be found
there.
METHOD 3. It is possible to compute the first few moments of T, and then
match these with a Pearson curve, and so on. A good example is Stephens
(1976).
METHOD 4. Use Monte Carlo methods. See Stephens (1976).

We now turn to a computation of the moments of T; see Anderson and


Darling (1952). We agree that
Km denotes the mth semi-invariant of T,
which is given by the coefficient of (U)'/m! in the power-series expansion
of the log (A) of (15). Thus

(16) K,„ =2'" '(m-1)!


- ,l;",
7=1
PRINCIPAL COMPONENT DECOMPOSITION OF U,,, U 213

so that

(17) ET= A j and Var[T]=2 Y ,1.


=t j=t

This expression requires knowledge of all .k's. Alternatively, we note that


when the expansion (7) holds with K(t, t) dt <oo we have fö
(18) K(t, t) dt= E kjfjj (t) 2 dt = Y_ .1j = ET
0 0 j=1 j I

and

J
('tt

2 K 2(s, t) dsdt
0 Jo
m
2
f.,
Y, Z AA f(s)f (s)f(t)f ( t)dsdt
0 j =t j' =t
, ,

(19) =2 ,t; = Var [ T].


j=t


(where K(t, t) di <n and Fubini's theorem allows the interchange). More
generally,

(20) x m =2"' '(m-1)! - I0


K m (t, t) dt,

where

Km (s, t)= fK .' m _,(s, r)K(r, t) dr =Y Aj fj (s)f (t).


t

Exercise 1. Verify (20).

3. PRINCIPAL COMPONENT DECOMPOSITION OF U,,, U AND


OTHER RELATED PROCESSES

We now rigorize the program of Section 1 for the empirical process and
Brownian bridge.

Proposition 1. The covariance function

K„(s,t)=sAt —st for0:s,t<_1


214 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

can be decomposed as

(1) Ku(s,t) =I Ajf(s)f ( t) ; for0<s,t1,


j =a

where

(2) A; = (j1r) -2 and f ( t) =,sin


; (jirt) for 0 <— t <— 1

with orthonormal f . The series (1) converges uniformly and absolutely on [0, 1 ].
Proof. Now Eq. (5.1.17) takes the form

(a) Af(t) = f(s)K u (s, t) ds = s(i — t)f(s) ds+ t(l —s)f(s) ds.
0 o r

Because f in (a) is represented as an integral, it is absolutely continuous; thus


it may be differentiated to give

Af'(t) =(1— t)tf(t)—j sf(s) ds— t(1— t)f(t)+


o
J (1— s)f(s) ds

= J r
f(s) ds— J sf(s) ds.
o

We differentiate again to obtain

(b) Af "(t)= —f(t)•

We note also that any solution to (a) must satisfy

(c) f(0) = f(1)=0;

just plug 0 or 1 into (a).


We also require the norming condition

f2(t)d(1 .
(d) J0

The solutions of (b), (c), and (d) are given by (2), and the convergence is as
stated by Mercer's theorem. ❑

Exercise 1. Complete the proof of Proposition I by verifying that A = (J ;


-2
is required, and that the f are orthonormal.
PRINCIPAL COMPONENT DECOMPOSITION OF UJ,,, U 2155

Having decomposed K v (which equals K u .), we are ready to carry through


our program for U, W 2 , U n , and W.

Theorem 1. We have

(3) Z' 'f on (C, ),


j=1 Jir

where f(t) =..I2 sin (jzrt) as in Proposition 1, and where

(4) Z', Zr,... are iid N(0, 1).

Moreover,
*2
(5) W2 [U(t)]2 dt= ( Z )2 .
J O jY)

Proof. As in Theorem 5.2.2, Z'—Z3 /,ß,1j where Zj =(U,j)=J f; (t)U(t)dt


are uncorrelated (0, ,1j ) rv's. Now the Zj are (jointly) normal since the integral
defining the Z's are limits of partial sums that are exactly normal; see
Proposition 2.2.1. Thus the Z* are iid N(0, 1). Equation 5 is just (5.2.11).
Equation (5.2.13) implies that the two processes in (3) have the same finite-
dimensional distributions. [We do not claim equality in (3); Fourier series of
such "wiggly" functions need not converge pointwise.] ❑

Table 3.8.4 gives the distribution of W 2 , computed by Anderson and Darling


(1952) by inverting the characteristic function.

Exercise 2. The characteristics function of W 2 is d (2iA ) - ' /2 for

d(A) = (sin .J,1)/'.

This was inverted by Smirnov to give

P(W Z >x)=—
1=
E (-1)j+'
J (2i)Z2
—^)
e 1 ___
^y
%^ exp(—
x \\\
y 1 dy.
1T 1=i (2j-l)2 2 J sinV y 2/

See Durbin (1973a, p. 32) for some useful discussion and references. See also
Darling (1955, p. 15). Another inversion was used by Anderson and Darling
(1952) to derive Table 3.8.4.

Theorem 2. (Durbin and Knott) The normalized principal components of


U,, are
/ n
(6) Zn = f(t)U(t)
(t) dt/..Äj=n -'/2 V2 cos (J^'Sk)
0 k=1
216 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

with A i and f as in (2). They satisfy

(7) {Z: j? 1} are identically distributed (0, 1) rv's that are uncorrelated

since

(8) fcos(jlr^'k)= ✓2cos(ir4k) =(0,1) forallj>_1.

[Moreover, the CLT would suggest that the Z„ are nearly N(0, 1).] Finally,
this decomposition represents U, to the extent that

(9) 1 ✓A; Z*n f(t)->[UJ (t) +U,,(t -)]/2 asm-+ooforeach0<_t<_1,


J= 1
00 m Z n' f
(10) vn= L ✓A;Zn%Jj - L
l= 1 i fir
cc

(1 1 ) Wn= f 1 un(t) dt = / Znzy•


0 j =, (jir)

Proof. We first verify (6)-(8). Integration by parts gives

(a) Zn;= f fj (t) U,, (t) dt = ^2 J U (t)sin(jirt)dt


0 0

(b) =( ✓ 2 /jir) cos (Jilt) dU n (t)


J0
(c) =(✓2n /jir)I J 1 cos (fin) dGn(t)-
f, cos (fin) dt

✓2n Y_
o o

(d) =( /njir) cos(jar6k )


k=1

as claimed in (6). Thus

(e) Zn =..f 7; E. cos (j?t^i),


i=l

which we now show are each distributed as the sum of n uncorrelated (0, 1)
rv's. Note that

(f) Var [. cos (fire)] =


Jo 2 cos
1 t (fin) dt = 1.
PRINCIPAL COMPONENT DECOMPOSITION OF U,,, U 217

Moreover, the Z*'s are uncorrelated since

(g) Coy [J cos (jire; ), ✓2 cos (kIre; )] =


fo
2 cos (jirt) cos (klrt) dt = 0

for all j ^ k and all i. All rv's cos (jire) are identically distributed since cos (jure)
is the projection onto the horizontal axis of a unit vector where the angle
between the vector and the axis is chosen uniformly on [0, jzr]. Thus (7) and
(8) are established. Equation (9) is a standard result from Fourier series, valid
because v" is piecewise linear. As in Theorem 1, (5.2.13) implies (10). Equation
(11) is just (5.2.11). ❑

Exercise 3. (i) (Computing formula for W„) Show that

Wn= I U (t)dt=
o ;_, n J 12n

(ii) Show that

E W^ = V6, ar [ W^„] = 45' and K” = 2 2 " 1 E z


=ij "IT

Remark 1. (The null distribution of the Z„) Following Durbin and Knott
(1972), we note that the characteristic function of cos (,ire) is

E exp {it cos ( rrs)} ds =


JO
, cos (t cos (irs)) ds = Jo (t)

(where J0 is a Bessel function). Thus Z*", has

characteristic function = Jo ( t)",

density function = (1/ Ir) J Jo (^ t)" cos (zt) dt,


(12) o

distribution function = 0.5 + (1/ Ir) JJ0 (.J7 t)"t -' sin (zt) dt.
0

Durbin and Knott were able to obtain the following table 1 of percentage
points of the normalized orthogonal components Z„ from the distribution
formula of (12). They advocate that in addition to whatever other graphing
or testing is done, the Z^ 's should be "computed and considered."

Remark 2. As observed by Durbin and Knott (1972), if ßt,. ..,ß" are


independent rv's on the unit interval with continuous df G, then G(t) — t
218 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Table 1. Upper Percentage Points for Normalized Components z,41


(from Durbin and Knott (1972))

n " 10% 5% 2.5% 1%

4 1.3048 1.6501 1.9486 2.2459


5 1.3085 1.6507 1.9338 2.2591
6 1.3000 1.6526 1.9424 2.2613
7 1.2981 1.6494 1.9460 2.2730
8 1.2959 1.6492 1.9467 2.2809
9 1.2941 1.6487 1.9484 2.2856
10 1.2929 1.6482 1.9496 2.2899
15 1.2890 1.6470 1.9530 2.3023
20 1.2871 1.6464 1.9548 2.3085
25 1.2860 1.6461 1.9558 2.3121
50 1.2837 1.6455 1.9579 2.3193
00 1.2816 1.6449 1.9600 2.3263

vanishes at 0 and 1 and may hence be expressed in a Fourier sine series as

(13) G(t) -t= Y_ bj 2sin(jirt) with b1 = J [G(t)-t]J sin (jirt) dt.


i=1 o

They note that a natural estimate of b; is

,
(14) b
m [G(t)-t]^sin(jari) dt= J
(iZvn)

where 6„ is the empirical df of ß,, ... , ß,,. Thus the Z „* can also be interpreted
;

as estimators of natural Fourier parameters in the nonnull case. Percentage


points of
Z*2 Z*2
(15) B W
j =1 z
(j^) Z i=p+i (jir)

are also tabled by Durbin and Knott (1972); see Table 2. These can be used
to test (asymptotically) if p terms in the natural Fourier estimate of the true
df G account for the entire deviation from the Uniform (0, 1) df.

Remark 3. (On the power of W„, Zr,, and !IU .JI tests) Suppose X 1 , X2 , .. .
are iid F8, and we form statistics from the continuous hypothesized df F. Then

Jn[F„-Fep ]= /[F„- FB1 ]+J[FB,-FBO ]


(16) =0J„(FB)+S„(F^),

where, under mild regularity on F (see Propositions 4.5.1 and 4.5.2, for
PRINCIPAL COMPONENT DECOMPOSITION OF U,,, U 219

Table 2. Percentage Points of Bp = W 2 - t' (zj )Z/(j2n2)


J1
(from Durbin and Knott (1972))

Probability (= P(BP indicated value))


0.500 0.600 0.900 0.950 0.990 0.999

B (= W 2 ) 0.11888 0.24124 0.34730 0.46136 0.74346 1.16786

Bj 0.05439 0.08947 0.11678 0.14522 0.21501 0.32051

B2 0.03532 0.05295 0.06583 0.07884 0.11000 0.15662


B3 0.02618 0.03713 0.04482 0.05242 0.07023 0.09647
B4 0.02080 0.02842 0.03362 0.03868 0.05032 0.06720
B f0 0.00935 0.01152 0.01288 0.01414 0.01688 0.02062

example), one has for 0, = 0 1 „ = 0 0 + 'y//n that

S„(t) -J[F„ 1 - F8^ (t)- t]

yfo(F '(t)) - when 0 0 =0 and F9 =F(--0)


-yF '(t)f - (F '(t)) when 0o = 1 and FB = F(-/6)
- -

(17) = s(t).
Chibisov's theorem (Theorem 4.2.1) shows that Kolmogorov's test and the
Cramer-von Mises test satisfy

(18) II/ ' [F„ - Foo l II -pd U+5D as n-*co

and

n[F,(x)-F%(x)]2 dF^(x) d
f [U+S] 2 dt

Z*Z
(19) = ±j where Z* are independent N (jir(S, f ), 1),
-

i =t (jir)

since the Durbin and Knott components satisfy

I[F„(x) - F^(x)] sin (jirF^(x)) dF,,(x)

(20) "dN((s,f),
220 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Table 3. Asymptotic powers of components and Wn, and U19


against shifts in normal mean and variance in the one-sided situation

(from Durbin and Knott (1972))

Mean shift Variance shift

One-sided test Power of best test Power of best test


based on 0.50 0.95 0.50 0.95

xn 1t 0.466 0.930 0.05 0.05


zn e 0.05 0.05 0.336 0.787
z,= 3t 0.106 0.197 0.05 0.05
Z4 0.05 0.05 0.162 0.371
0.342 0.877 0.072 0.205
EIn 0.354 0.890 0.108 0.423
Un 0.155 0.521 0.185 0.622

*In fact, (— 1)1 z, the test statistic, if we always use the upper tail as the critical region

Thus from (19) and (20), we can compute the asymptotic power of the W„ -test
and the Durbin and Knott test based on the jth component Z, respectively.
Similar expressions for the power of the A. and U„ tests can be derived from
results given below. Table 3, from Durbin and Knott (1972), gives asymptotic
powers of the W„ -test, the A2n -test, and U„ -test and tests based on the first
four normalized principal components of U,,. The combination of Z*, and
Z* 2 and the statistic A seem most noteworthy. This seems to confirm the
suspicions of Durbin and Knott that (11) led to higher-order components
being damped out too quickly by weights proportional to 1/j 2 . Stephens (1974)
also found A„, and then W„, to outperform IIU„jI.

Exercise 4. (Watson, 1961) (i) Watson's statistic


i z
U „= UJt)— U„(s) ds di

(21)
_f
where
0

{X (t)12 dt,
0

(22) X (t) = U„(t)— f


0
U,,(s) ds

has covariance function

(23) K„(s, t) = s A t—(s+1)/2+(s— t)2/2+,'—Z.


PRINCIPAL COMPONENT DECOMPOSITION OF U,,, U 221

(ii) Show that a decomposition of type (1) holds with A 2 _ 1 = A 2; =


1/(4ir 2j 2 ), j2j _ I (')=f sin (j7rt), and f,(t)=f cos (jirt) for j>1.
(iii) Show that

2 z °° 1
(24) U" -3.a U = y 2j 2Ei^
i =I 47r

where the E; 's are iid Exponential (1).


Jiv) Show that U 2 has moment generating function M u 2(t) =
,%t/2/sin ✓ t/2, and that

(25) P(U 2 > x) = 2 (-1) k+ ' exp (-2k 2 ar 2 x) for x> 0.


k=1

Then note that

(26) ir 2 U 2 =11U11 2 ,

which seems a very curious fact; thus the tables for 1lUJI can also be used for
W .

(v) Show that 4 U 2 has the same distribution as does the sum of two
independent copies of W 2 .
(vi) Show that

2 r i -1/2 l lZ 1
U=L i if"'^— n
— ^ + 2J + 12n

i —n/2
(2 7 ) =^i-2'^ ' ":^+3(-2)2

where

Exercise 5. (Durbin and Knott 1972) Fix n. Let

i
for1isn
n+l

=V n (i/(n+1)).

Then the covariance matrix = ^1 II of V = (V,, ... , 'I")' has

__
n i j i j
^' n+2 n+1 n+l n+1 n+l
_
222 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Show that has eigenvalues

J^
A = n
; 4(n+1)(n+2) sine (

with associated eigenvectors

+l(sin( j J
f' n n+l) 'sIn\n+l)'...,sin(n+1Jj

for 1 sf <_ n. The normalized principal components

Z* = f;V /f d = (0, 1) are uncorrelated for 1 sf < n.


;

Show, however, that the Z are not iid. For this reason, Durbin and Knott lost
;

interest in the statistic

n+l " r r i 1 2_ n +i
^` n+i "
n . tn
+l n n ;_,

even though (as you are to show)

M ( -6', 4',5 ) for all n

and
z ^

M„= (n 1) [G„(t) —t]zdt-12n'

where „ is a modified empirical df that equals (i+2)/(n+ 1 ) at„ . [Durbin :;

and Knott note an observation of Feller that (n + 1)p — Z is a better choice than
the mean np in the normal approximation to P (Binomial (n, p) <_ k).]

Exercise 6. (Marshall, 1958) Obtain the exact distribution function F(x)


P(nW z <_ x) for n=1 and 2. (Marshall tables F3 also.) Compare these with
F (x) = P(W 2 <_ x), and note how quickly convergence takes place. See
F

Table 4.

Exercise 7. (i) For suitably regular weight functions 0, show that

2 [Yz(„:J) -Jnz `^I(^":;)I


fo U (t)
,
(t)dt =
i =I JJ

+n J (1— t) 2 t(t)dt,
o
PRINCIPAL COMPONENT DECOMPOSITION OF IL,,, u 223

Table 4. Distribution of the Crarnbr-von Mises Statistic for n = 1,2,3


(from Marshall (1958))

z FI(z) F2(z) F3(z) F.(z)


.11888 .37708 .46692 .47343 .50000
.14663 .50318 .57614 .57683 .60000
.16385 .56751 .63384 .63009 .65000
.18433 .63560 .68842 .68521 .70000
.20939 .71009 .73974 .74191 .75000
.24124 .79475 .79126 .79924 .80000
.28406 .89605 .84515 .85481 .85000
.34730 1.00000 .90296 .90617 .90000
.40520 1.00000 .94007 .93661 .93000
.46136 1.00000 .96554 .95723 .95000
.54885 1.00000 .98968 .97793 .97000
.74346 1.00000 1.00000 .99680 .99000
1.16786 1.00000 1.00000 1.00000 .99900

where

'V1(z)° J ' 0(s)ds


0
and ^Y 2 (t)=
E s^(s ds.
)

(ii) Consider a covariance kernel K of the form

K(s, t)=[s/r(s),(t)]' 12 [sAi-St] for 0<_s, t<_1

with si continuous on [0, 1]. Then, as in Proposition 1, the eigenvalues A ; and


eigenfunctions f that lead to a uniformly and absolutely convergent expansion
of the type (1) are given by the solutions to the equation

Af"(t)=-s/i(t)f(t) subjectto f(0)=f(1)=0.

We state for emphasis the obvious fact that the f are continuous on [0, 1].
(iii) Compute the mean and variance of ,(ö v„+p dt.

Remark 4. Götze (1979) established that

1
-
(28) P( W < x) - P( W Z x)I <_ const for all x,
n

a result which had been conjectured by S. Csörgo (1976), who established the
rate Iog/vc using the Hungarian construction; see Section 12.4. This and many
other results for higher-dimensional versions of W„ are given by Cotterill and
M. Csörgo (1982).

Exercise 8. (Watson, 1967) Consider a sample X,, ... , X„ from a distribu-


tion on the circumference of a circle of circumference 1. Is the distribution
224 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

uniform? Arbitrarily establish an origin, and measure distance clockwise. Let

(29) N(x) = the number of observations in the semicircle (x, x+Z)

and define

(30) Tn = n - '^ 2 IIN- n/2 fl and G„ = n - ' " [N(x) - n/2] 2 dx.
J0
(i) Determine the limiting distribution of the process

(31) n ' [N(x)


- /2 -n/2] for0^x^ 1.

(ii) Since N(x) -n/2 has period 1, write

N(x) -n/2= Y_ ao +
0"

k=1
{a k cos (2lrkx)+b k sin (2irkx)}

and verify that a k = bk =0 for k even and

2 n 2 n
sin (2zrkX ), ; bk =— Y, cos (2vrkX ) ; for k odd.
k-rr i =, kir ,_,

Show that
2
(32) G= 2 Ek for iid Exponential (1) Ek 's.
k -, (2k-1)

(iii) Verify that G" - d G where

4(-1) k '1T Z (2k -1 ) z


(33) P(G>A)= 2 A) for all A>0.
k1ir(2k-1)exp/-

A Monte Carlo experiment by Watson shows that the approach to the limiting
distribution is rapid.

4. PRINCIPAL COMPONENT DECOMPOSITION OF THE


ANDERSON AND DARLING STATISTIC A.

In Example 3.8.4 we determined the asymptotic distribution of the Anderson


and Darling statistic

(1) A= J 1 t(1U(z)t) dt= J Z^n(t) dt where 11 "(t) =


U(t)
o — o ✓ t(1 — t)
PRINCIPAL COMPONENT DECOMPOSITION OF A„ 225
denotes the normalized empirical process. We summarize that result in
Proposition 1.

Proposition 1. (Anderson and Darling)

- d A 2 - J 0 1—:
(2) A
o « 1—: )dt
= J Z (t) dt
o
2 with Z(t)— U(t)
.'./t(1— t)

The covariance function of the 74, process is (for all n)

snt —st
(3) Kz(s, t) = for0<_s, ts1.
[s(1 —s)t(1—t)]',Z

Unfortunately, the theory of the previous sections via Mercer's theorem does
not apply since K z is not continuous on the entire unit square (note that it is
uniformly bounded by 1). Thus the method of Proposition 5.3.1 does not apply.
However, the general approach of Section 1 is still applicable. Note its analogy
to Exercise 5.3.7 with '/'(t) = [t(l — t)] - ' 1z .

Theorem 1. (Anderson and Darling) The covariance function K z of (3) can


be decomposed in the usual form as

(4) Kz(s, t)= Z A;.;(s).fj(t) OAS, t1,


=1

where

2(j+1
(5) A;=[j(j +l)] -' and f(t) =2 t(1 —t)p;(2t -1)
^(^ +1 )

with p(x) being the Legendre polynomial of degreej [defined as the coefficient
of h in the expansion in powers of h of (1 — 2xh + h 2 )_" 2 ). This series converges
uniformly in every domain interior to the unit square. Moreover,

(6) A2= J Z (t) dt =


0
Z
1=l
Z* Z/[j(j +l)],

where the Z*'s are iid N(0, 1).

Proof. See Anderson and Darling (1952) and Durbin and Knott (1972). ❑
226 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

The first few Legendre polynomials on [-1, 1] are:

PO(x) = 1
p 1 (x)=x p(x)=l
(7) p2(x) = Z(3x 2 —1) with p(x)=3x
P3(x) =1(5x 3 -3x) P3(x) = 3(5x 2- 1)
p 4 (x) ='-8 (35x 4 — 30x 2 + 3) p' (x) = 5(7x 3 — 3x).

The orthonormal versions are ✓(2j+1)/2pj (x) on [-1, 1]. Thus the first
orthonormal eigenfunctions in (5) are

(8) f,(t)= ✓6t(1—t) and f2 (t)= ✓30t(1—t)(2t-1).

Note that they put more weight on the tails than do the Cramer-von Mises
eigenfunctions.
Note that the jth normalized principal component Z * of Z,, is

Zni =^o Z(t)2 t(1—t)pj(2t-1)dt/


.1(J+1)

=
2 fo
2^+1 n
l U n (t) dp j(2t-1)= — ✓2j+1 I pj (2t-1) dv n (t)
o

pj(2f; —1).
✓ n ;_,
These are not iid (in j) rv's. Nevertheless,

(10) Z n* 1 , ... , Z*K are asymptotically independent N(0, 1)

and
K
(11) as n-*;
j=t

useful tests could be based on Z*,, Zn


2i Tn2i or Tn4i say. R. Jones, personal

communication, found tests of this type to be roughly comparable in power


to the A„-test, though higher components typically had less power than lower
components for noncontrived alternatives. Thus, some damping of the
components seems appropriate, and suggests the statistics

(12) T„ K = Y, L
Z --> d E
j=1 J
—i—
i =1 3
for i.i.d. N(0, 1) rv's Z',

for example.
PRINCIPAL COMPONENT DECOMPOSITION OF A„ 227
Exercise 1. Show that the Anderson and Darling statistic satisfies

An = 7L„(t) dt= 1 E (2j-1) log


E n j= 1 en:j(1—Sn:n -j+l)

Exercise 2. (Anderson and Darling, 1954)


(i) Show that

EA 2 =1 and Var [A 2 ] = 2 (-r. 2 –9) = 0.57974.

(ii) Show that

aA2 – –2iriA
O(A)= Ee
cos ((ir /2) / ✓ 1 +8iA)'

thus

°° (-1) j r(j+2)(4j +1) (4j +1)2ir2


P(A Z <_ x) =— exp –
8x 1

x exp 8(w dw,


Jo \ 2x +1) – 8x
(4j +1)2ir2w2 /

where this is an alternating series with decreasing terms.

Remark 1. Borovskih (1980) establishes that for all x>0 and all n sufficiently
large

IP(An – x) – P(A 2 – x)I ^ h(x)(log n) i2/ ,


3 J

where h is a positive, bounded function that is O(1/x) as x-*co. He also


establishes his result for more general weight functions and for a two-sample
version of the statistic. His proof is based on the Hungarian construction of
Chapter 12.

Exercise 3. Carry out the program of this section for the statistic
1 2
f
'

U(t) log dt.


0 t(1– t)]

This corresponds to weight function +ji(t) = log 1/(t(1–t)) in Exercise 5.3.5.


(We do not know the decomposition for this kernel.)
228 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

5. TESTS OF FIT WITH PARAMETERS ESTIMATED

Let X 1 , ... , X. be iid with unknown df F. We would like to test the hypothesis
that F belongs to a particular class of df's FB with 0 in some index set. Let
0, denote some natural estimate of 0 within the parameterized class of F e 's.
Then it would be natural to base tests of the hypothesis on goodness-of-fit
statistics formed from the process '.In [F,, — Fe,]. As far as the null hypothesis
situation is concerned, we now have enough notation to attack the problem.
However, we will also want to consider the power of our tests; we will find
it most convenient to do this in the content of a somewhat larger parametric
family.
To this end, we suppose that X 1 ,. . . , X. are iid with df F0, for some pair
(0, y) = (0 .... , 0,, Y,, ... , 7 K ) contained in R, +K. We will now test the
hypothesis H: y = 0 that the family parameter vector is 0, and we also desire
the power of this test under local alternatives of the form y/..In. We let IF„
denote the empirical df of X1 ,. . . , X,,; we define

(1) F, = FB,, o for some estimate 8„ of 0.

We will be interested in the estimated empirical process

(2) ✓n [EF„(x) — F„(x)] for —00 < x < oo.

Example 1. Let X, X,, ... , X. be iid. We wish to test the hypothesis that X
has some normal distribution. Given the data values of X i ,. . . , X„, we let P.
denote the normal df having mean X„ X; / n and variance 52
I; (X; —X„) 2 (n — 1). We will then form goodness-of-fit statistics based on
Now let (0, y) _ (0,, 0 2 , y,, 7 2 ) and let FB. , be specified
the process.✓ n [F, — Now
by saying that

e_32/2{1+ 6
(3) X e e' — 1 ' H3(x)+24H4(x)} ,
i

where H3 and H4 are the usual Hermite polynomials in this Edgeworth


expansion. The null distribution of f [F„ — F,,] corresponds to assuming
H: y, = y2 = 0, while the power of tests of normality based on the process
Vi[F„ — F„] can be evaluated under the sequence of local alternatives y,/Vi,
y2 /V''. If we were only interested in the null hypothesis, y, and 7 2 would be
unnecessary. ❑

Treatment of the estimated empirical process when F 9, 71 ,j„ is the true df


centers on the identity

[IF„ — F„]='[F„ — F0.Y1^] — / [F„ — F0, ,,, ✓n]


(4 ) _ U ,, (Fer/f) —In [Fe,,o —
Fe,71 j ]
,
TESTS OF FIT WITH PARAMETERS ESTIMATED 229
It is transparent from (4) that this estimated empirical process will have the
same asymptotic behavior as does

(5) Ü(Fo,o) = U(FB, o ) - [.in (O - O)]--F0,7 l


aOj (0,0)
+ Yk
k
a
ayk
F0 ,7
( 9. 0 )

provided we have sufficient regularity to make the partial derivatives of F 0, 7


behave nicely. Replacing (4) by (5) will be the key to our approach.
We now state our regularity conditions. We say that the family F0, 7 is regular
if a first-order Taylor-series approximation

II (F ,7—
0 F0,0)—), e; )
aOj
F0 ,7 1 +Y, Yk
(8,0) k
F9.YIII
avk (9,0)

(6) = O (8; — 8 ) 2 +y yk 1 in a neighborhood of (8, 0)


;

holds with all partial derivatives being uniformly bounded in x. We say that
the sequence of estimators B„ of 0 is regular if for each coordinate j we have

(7) ^Y h ( ' ; ; ) +o,(1) withh;(e) =(0,a ).

[We remark that if e is efficient, then we typically have (7) with the vector
of h; 's being the usual product of score vector times the inverse of the
information matrix.]

Theorem 1. (Darling) If F 0, 7i ,r is the true df, if the family of possible df's


is regular in the sense of (6), and if the estimator 0, of 0 is regular in the
sense of (7), then the special construction (see Section 3.1) of the estimated
empirical process satisfies

(8) 11 ✓ n (IF „—F‚)—H(F0o)II -o


v0 as n -o .

Here

(9) 01(Fe,o)=U(F,,o)—Y, Z;F;+Y_ YkFk,


j k

where

( 10 ) F; = a Fo., 1
ae; 010)
and Fk = a
ayk
Fe.,,
1 (8.0)

and the vector of Z; 's and the Brownian bridge U are jointly normal with 0
230 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

means and

(11) Cov [Z;, 4J = J , hj (s)hj ,(s) ds and Cov [Zj, U(t)] =


0
J
0
hj (s) ds

for hj 's defined in (7).


Proof. Comparing (4) and (5) we note that

II U„ l F 8,rj ✓n) — U(F0,0)


-1w^ —v1I + 11U(Fe1 r) —U(F00 )I1
(a) -4p 0 using only that 11 F BJ ✓„ — FB,o 0 by (6).

Now, with the op (1) term being uniform in x, we have


F0,ri ✓„]= ✓ n(j — Oj)F, E—YkFk+o,(1)
k

by (6) and (7)


1
1 ^^ hj(e„r) Y- YkFk+0,(1) by (7)
i J — k

(b) _ Y,j FjZj Y_k


— YkFk + 0 ( 1)

by Theorem 3.1.2 with Zj = hj d U,


0

where (11) holds by Theorem 3.1.2 also. We attribute this theorem to Darling
(1955), though Sukhatme (1972) and Durbin (1973b) certainly improved it
considerably toward the form we have devised here. ❑

Remark 1. When authors such as Durbin (1973a, b), Sukhatme (1972), Pierce
and Kopecky (1979) and Loynes (1980) have treated the process of (2), they
have used some additional notation. Thus we let

(12) 4„, = F„(X,) = Fe.,,o(X1)


and let

(13) 1J„(t)=.ln[i„(t)—t] for 0: t<_ 1

where 9„ denotes the empirical df of the „ ; 's. Note the simple identity

(14) ✓n (F„ — F„) = Ü,,( P,,) (under regularity)


TESTS OF FIT WITH PARAMETERS ESTIMATED 231
which requires some sort of continuity assumption (it suffices if the F0 . 7 's
have mutually absolutely continuous densities). We will call U,, an estimated
empirical process also.
These authors were interested in concluding

(15) iiC„ -611 -->,, 0 as n-o0

[which is true under regularity, see (17)], where [with Fj and Fk as in (10)]
(16) v = U —Y. Zjgj +Y- ykgk with gj = Fj (Fe,ö), and so on.
j k

If (14) holds, then the requirement for concluding (15) is to be able to plug
FR' into the left-hand side of (9); thus
uniformly continuous F's and Fk 's and mutually absolutely
(17) 1 continuous densities that are uniformly bounded

suffices for (15) to hold. [Of course, convergence of U. in modes other than
would suffice to conclude jö VJ„(t) dt -> d Jö 0 (t) dt, say.] [1

Remark 2. Consider the case of a location parameter when F0, 0 = F( • — 0)


for some "suitably nice” df F. Suppose we estimate 0 by its maximum likeli-
hood estimator (MLE) 0„. Then the function F, of (10) is typically —f where
f denotes the density of F. Likewise, the function h of (11) is typically defined
by h(t) = —f'(F '(t)) /[If(F '(t))] with I = I %; (f'/f) 2 dF denoting Fisher's
- -

information. Thus Darling's theorem (Theorem 1) and Remark 1 would show


that

(18) Jn(IF„
. — F.,) =UJ „(F(• -9„))
satisfies

(19) IIIJ„—vii -- 0 asn-400 where 6(t) =U(t)+Zf(F '(t)) -

with Z and U being jointly normal and Z having mean 0,

(20) Var [Z] = J h 2 (t) dt = l - ' with I = J ^ (f'/f) 2 dF

and

Coy [Z, U(t)] =


f o, h
j (s) ds =
0
— J f'(F
t - (s)) /f(F - (s)) ds/I
^(t)
f F
(x) dx/
.f'
m
I= — f
- 1, di(Y) I
( 21 ) =—f(F- '(t)) /I
232 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

In this case the covariance function of ID reduces to

(22) Kv(s,t)=[sAt— st] —f(F '(s))I 'f(F '(t))


- - - for 0 s,t<_1
(23) =[s A t— st] —O(s).O(t),

where

(24) 4(t)= —.i(F - ► (t))1VI.

We call f(F ') the density quantile function. (In this discussion, we have
-

concentrated on the null hypothesis, not local alternatives.)

Exercise 1. (Sukhatme, 1972) Consider the location and scale parameter


case when F90 = F((. — 01)102) for some "suitably nice" df F. Suppose we
estimate 0 by its maximum likelihood estimator 8,,. Show that Darling's theorem
(Theorem 1) would give that

(25) J(F, — F^)=Ü (F((• — B1„)/e2n))

satisfies

(26) 11 Cl —UJII ->,0 asn-+nowhere


0(t) = U(t)+Z1f(F (t))+Z2F '(t)f(F '(t))
-,
- -

with Z = (Z,, Z2 ) and U being jointly normal and Z, and Z2 having 0 means and

(27) T z ° fl o-y 11 = I '


- for Fisher information matrix I

and

(28) Cov [Z, U(t)]= — (f(F - '(t)), F - '(t)f(F - '(t)))I - '= — g(t)'I - '•

In this case the covariance function of v reduces to

(29) Kc,(s,t)=[snt— st] —g(s)'1 - 'g(t) for 0<s,t<_1

(30) _ [s A t — St] — 4,(5)c,(t) — 4 2 (S)¢ 2 (t),

where

(31) 1(t)=— Q11 — a 2/o22g1(t)

and

(32) 42(t) = (u12/o22)g1(t) — o22g2(t)•


TESTS OF FIT WITH PARAMETERS ESTIMATED 233
(Just set Z, =0 and g 1 = 0 in the above to obtain the result appropriate when
only a scale parameter is involved.) (See also Durbin et al., 1975, p. 218.)

Exercise 2. Show that Remark 2 and Exercise 1 can be rigorized when F


denotes:

(a) a N(µ, 0.2 ) df with and ar e unknown.


(b) a N(µ, 1) df with µ unknown.
(c) a N(0, Q 2 ) df with Q Z unknown.
(d) an Exponential (0) df with 0 unknown.
(e) an Extreme value (µ, o) df with (or Q 2 , or both) unknown.

Compute the various quantities of Remark 2 and Exercise 1 in these cases.


[In case (a) we have v„ =1, Q 12 = 0, NZZ = z, ^, _ - 4)(I '), and 4)2=
-1 - ' 4)(m - ') /f, where c denotes the N(0, 1) df with density ¢. See Sukhatme,
1972 or Kac et al., 1955.] [In case (d) we have o=1 and 4)(t)=-(1-t)
log (1 - t). See Durbin et al., 1972.]

Exercise 3. This exercise concerns the limiting distribution of Ü„ under local


alternatives (0, y/.Jn). Let I denote the J+ K by J+ K information matrix
partitioned into the four obvious submatrices. Then for "suitably nice" df's
F we can expect 11v„- (U +S)ll -* P O as n- co, where

gi gI

j( s=(Y► ,..., YK)


9K
- 121111'
gJ

with g = F; (F6,ö) and gk = Fk(F - )


;

What sort of statistic does one use to test the hypothesis that the true df is
some F with 0 E O? In the spirit of this chapter, a natural statistic is

(33) t,, J n( 0= „ - F„) z +f(F„) dP„;

we use the labels 1, A,,, and so on for 4i(t) =1, r/i(t) = [t(1- t)] ', and so
on. Of course, under regularity conditions on F we will have

T, a 7 =
(34)
J ^ [O(t)] 4)(t) dt
0
2

for a process 6 of the type (23), (30), and so on.


We will not rigorously prove much in the spirit of Remarks 1 and 2 and
Exercise 1 in this section; to do so necessarily leads to "dirty theorems” (see
234 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Sukhatme's, 1972 proof of Exercise 1). We feel that it is better for the moment
for the reader to check the steps sketched in Remark 2 and Exercise 1 in
specific cases. Interesting special cases have been considered in the literature—
see Stephens (1976) for normal or exponential F; see Stephens (1977) for the
extreme value distribution; Pettitt (1978) for the gamma distribution; and
Pettitt (1976) and (1977) for censored normal and exponential cases,
respectively.
(In Sections 4.5 and 4.6, however, we treated rigorously something very
similar to this when we considered empirical processes of residuals in the
linear model.)
In all these cases the problem reduces to the following simple form. Let
{X(t): 0<t<_ 1} denote a N(0, K) process on (C, (6) when K is a bounded
covariance function of the form
(35) K(s, t) = K(s, t)–çb,(s)çb,(t)-4 2 (s)4' z (t)

for some well-investigated covariance function K and some square integrable


functions ¢, and 0 2 . Then
(36) what is the distribution of I[RC(t)] 2 dt?

This is a "clean" question, and it will be investigated carefully in the next


section. [In some sense, Eqs. (18), (26), and (30) are the key equations of this
section. Theorem I merely indicates that all this can be rigorized for most nice
df's.]
Exercise 4. (Darling, 1955) (i) Define

X(t)=U(t)+0(t)I 0"(s)U(s) ds
0

for a "suitable nice" function 0 satisfying

f0
[,0'(t)] 2 dt=1.

Verify ,(ö ¢"(t)K v (s, t) dt = –4(s), and then compute K$ ; compare it with
(23). (ii) In the context of Remark 2, consider the choice
0(t) = –I(F - '(t))1I

Exercise 5. (Rao, 1972) For "suitably nice" F we have

–F.,(x)]+2n-1/2JY^ logt (X;)]'I-'[e logfe(X;)


[do
UJ(F0).

(Note the upper limit on the sum of n/2.)


THE DISTRIBUTION OF WZ , W2, A2 , Ä2, 235
Exercise 6. (Durbin, 1973b) For "suitably nice" F we have

,/n [1 „(x) — F B ,(x)] v(Fe),

where 9*n denotes the maximum likelihood estimator based on a randomly


chosen subsample of n/2 observations.

Exercise 7. (Darling, 1955) Let F denote the Cauchy df with density [Ir(1 +
x 2 )] -' •

(i) For the location case of Remark 2, show that Remark 2 can be rigorized
with I= and ¢(t) _ (— ✓2 /zr) sin 2 (rrt). Show that the limiting ry WZ has
mean '' —1/(4ir 2 ). Compute the characteristic function of W2 .
(ii) For the scale parameter case at the end of Exercise 1, show that Exercise
1 can be rigorized with I= and 4(t) = (1 /) sin (2irt). Show that the
limiting ry Wz has mean'6- - 1/(41r 2 ) again. Show that the appropriate Ä; are
.l; = 1/( j 2 7r 2 ) for j = 1, 3, 4, 5,... (but not j = 2). (Thus, "one degree of free-
dom" is lost.)

Theorem 2. If 1U,, —U 11 -' 0 as n - oo for some process 0 on (C, W), then

asn-*x

for Ö/„ _ ✓ n (Qi„ — I) and Ö/ = —UJ.

Proof. Just reuse the proof of Theorem 3.1.1 given at the end of Section 3.3,
replacing a.s , by -te n . ❑

6. THE DISTRIBUTION OF W Z , W^, A, Ä„, AND OTHER


RELATED STATISTICS

In the previous section we have seen that under the null hypothesis

(1) W J ^ n[F „(x) — F„(x)] 2 dF„(x)

(2) ->d W 2 =
J0
U 2 (t) di under suitable regularity conditions,

where v is a normal process on (C, 'W) having zero-mean value function and
covariance function K(s, t) of the form

[s n t — st] — 0,(s) (A,(t) for 0 s, i<_1

for certain functions 0,, ... , O„,.


236 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Characterizing the Distribution

In the present section we are concerned with a generalization of this problem.


Suppose that

(3) X = N(0, K) on (C, ctf),

where K is a bounded covariance kernel expressible as

(4) K(s,t)=Y AA.i;(s)f(t) for0<_s,t<_1,


=1
where the A 1 > A 2 > • >0 are the eigenvalues of K and f,, fZ , ... are the
corresponding complete set of orthonormaI eigenfunctions. We have seen in
(5.2.11) that
ft
(5) T=JX2(t)dt= A,Z,2 for iidN(0,1)rv'sZ 1 ,Z2 ,...,
o i=i

so that T has characteristic function d(1/(2iA)) - " 2 where

(6) d(A)= fl (1—A').


i =1 A

Now suppose that

(7) X = N(0, K) on (C, 16),

where K is expressible as

(8) K(s, t)=K(s, t)-4,(s)O I (t)—/ 2 (s)0 2 (t) for 0<_s, t<_1

for some fixed square integrable functions 0, and * 2 . Our problem is to


determine the distribution of

(9) T= J 0
' x 2 (t) dt.

We would like to show that

(10) f = J^;Z; for iid N(0, 1) rv's Z,, Z2 , .. .

and give a recipe for finding the . To this end we let

(11) a,; = fl o 1 (t)f; (t)dt forj=1,2,...andi=1,2,

so that a, denotes the coordinates of (p ; relative to the orthonormal basis


THE DISTRIBUTION OF W Z , W2 , ÄZ, Ä„ 237
f,, f , ...; we then let
Z

(12) S ,(,1) = (Kronecker's S;; -) +


11 for 1 s i, i <_ 2.
We define A(A) to be the determinant

(13) 0(A) _ IS21( A


) S2z(A)I =S11(A)S22(A)—S^2(A);
s2^(a) szz(a)

we make the conventions that 0, is not the zero function, that 0 2 is not a
nonzero multiple of ¢,, and that

(13') 0(A) = S, 1 (A) in case 10 2 is the zero function.

Theorem 1. (Darling; Sukhatme) Under the assumption (3), (4), (7), and
(8) we have that (10) holds (i.e., T = E^ ä ; Z; for iid N(0, 1) rv's Z,, Z2 , ...)
where

(14) the ,1; 's are the solutions of d(A)A(A)=O.

If all a ;; are nonzero, then

(15) the ä ; 's are the solutions of A(A) = 0.

Finally (the analog of the loss of degrees of freedom for parameters estimated),

(16) ÄJ; A; forj= 1,2,....

Of course, the most interesting special cases are the 1,2 of (2) and

(17) Ä 2 = J 62(t) dr.


o t(1— t)

[Also treated in the literature is

(18) Üz= f^ [0(t)— Ü] dt, 2


0

where Ü= jö 0(t) dt as in Exercise 3.6.3. Theorem 1 is not sufficient to handle


U 2 since Exercise 5.3.4 shows that the accompanying covariance function has
multiple roots; rather than complicate our theorem for this lesser statistic, we
refer the reader to Stephens, 1976.]
This theorem generalizes to any number of 4); just increase the dimension
of A(1).
238 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Related work is found in Kac et al. (1955); though later authors have
seemed to find Darling's (1955) approach more tractable.

Tables of Distributions

Consider the problem of testing one of the following:

(K) F is a completely known df.


(Ni) F is some N(µ, 1) df with p. unknown.
(N2) F is some N(0, o 2 ) df with a unknown.
(N) F is some N(µ, 0. 2 ) df with µ and o, unknown.
(E) F is some Exponential (0) df with 0 unknown.
(EV) F is some Extreme value (µ, v) df with µ and a unknown.

In these situations, the results of the previous section can be rigorously


established as indicated in those exercises; thus it becomes a practically useful
problem to numerically obtain the ,1 's of Theorem I in these cases, and to
;

then use these to numerically obtain the asymptotic percentage points of tests
based on W n , Ä„, and so on. This program was carried out by Durbin et al.
(1975) and Stephens (1974), (1976), (1977), with related interesting work
by Pettitt (1977), (1978). The asymptotic percentage points in Table 1 are
compiled from these references. When possible, Stephens also computed
these values by matching the first four cumulants and using Pearson and other
curves as approximations. The two methods showed good agreement.
(Note that Table 1 also includes percentage points and modifications for
Kolmogorov's D„ = IIv„ !I and Kuiper's V„ OJ„ II + Iv„jI.)
How does one use these tables with finite n? Stephens also ran extensive
Monte Carlo sampling experiments to estimate those percentage points for
finite n. After extensive plotting and smoothing he has come up with the
following recommendation. Compute the statistic W n , A, and so on. For the
sample size n actually used, compute the modification of your statistic that is
indicated in the "modification” column of Table 1. Then the percentage point
of the modified statistic for finite n is to be approximated by the asymptotic
percentage point of the table.
Some comments on the power of these tests seem appropriate; see Stephens
(1974). The integral-type statistics seem to outperform the supremum statistics
in case 0, and, by a lesser margin, in the other cases. Overall, A^, or A 2 and
then W„ or W;, seem the best choices. From Durbin et al. (1975), it seems
appropriate to also consider the normalized principal components Z, for ;

1 <j <_ 4 to be defined below.

Exercise 1. (Pettitt, 1977) Consider now two variations on case (E):


(E1) Only the first r-order statistics X„ , < • • • X are observed.
;

(E2) Observation is stopped at some predetermined time x0.


Table 1 (Part a)
Percentage Points of T
Statistic T Case Modified Statistic T„ .85 .90 .95 .975 .99

n,z x (W3-.4/n+.6/n 2)(1 +1/n) .284 .347 .461 .581 .743


Ni not considered .118 .135 .165 .198 .237
N2 not considered .285 .329 .443 .562 .723
N W2(1 +.5/n) .091 .104 .126 .148 .178
E W 2 (1 +.16/n) .149 .177 .224 .273 .337
E W2(1 +. 16/n) .14801 .17451 .2216r .27081 .33761
EV Wz(1 +.2/v%n-) .102 .124 .146 .175

Äe K A5 itself, with n k 5 1.610 1.933 2.492 3.070 3.857


NI not considered .784 .897 1.088 1.281 1.541
N2 not considered 1.443 1.761 2.315 2.690 3.882
N A z(1 + 4/n - 25/n ) .560 .632 .751 .870 1.029
E A B(1 + . 6/n) .918 1.070 1.326 1.567 1.943
E A z(1 +. 6/n) • 9151 1.0611 1.3211 1.5901 1.9471
EV A z(1 + . 2/✓n) .837 .757 .877 1.038

Üz K (U1 - .l/n+.1 /nz)(1 +.8/n) .131 .162 .187 .221 .267


NI not considered .111 .128 .157 .187 .227
N2 not considered .106 .123 .152 .182 .221
N U2(1 +,5/n) .085 .096 .118 .136 .163
E U2(1 +. 16/n) .112 .130 .160 .191 .230
EV U2(1+.2/fn) .097 .117 .138 .185

D K D-(1 + . 12/v +. 11/n) .973 1.073 1.224 1.358 1.518

D K D(1 +.12/1 +.11 /n) 1.138 1.224 1,358 1.480 1.628


N D (1 - . 01 /v+ . 85/n) .775 .819 .895 .955 1.035
E (D - . 2 /v) (1 + . 26/v+ . 5/n) .926 .990 1.094 1.190 1.308

K V(1 +. 155/v+ .24/n) 1.537 1.820 1.747 1.862 2.001


N V(1 +. 05 /v+. 82/n) 1.320 1.386 1.489 1.585 1.693
E (V - . 2 /v)
(1 + . 24/v + . 35/n) 1.445 1.527 1.655 1.774 1.910

Table 1 (Part b)
Percentage Case Ni Case N2 Case N Case E
Points jyz A-z 02 ü Ä z Üz ßrz I Üz F,2 Uz

0.01 0.01865 0.15248 0.01794 0.02191 0.17045 0.01741 0.01651 0.12944 0.01589 0.02005 0.01807
0.05 0.02565 0.20230 0.02458 0.03166 0.23468 0.02378 0.02228 0.16743 0.02135 0.02804 0.02478
0.10 0.03074 0.23779 0.02939 0.03944 0.28336 0.02834 0.02638 0.19375 0.02520 0.03403 0.02964
0.20 0.03882 0.29221 0.03698 0.05282 0.36322 0.03552 0.03269 0.23317 0.03110 0.04372 0.03733
0.50 0.06269 0.44659 0.05943 0.10171 0.63004 0.05875 0.05087 0.34043 0.04797 0.07381 0.06007
0.80 0.10370 0.70242 0.09815 0.22003 1.21818 0.09356 0.08114 0.50908 0.07580 0.12970 0.09923
0.90 0.13439 0.89323 0.12742 0.32699 1.74270 0.12173 0.10354 0.83058 0.09818 0.17448 0.12877
0.95 0.16529 1.08696 0.15716 0.44180 2.30775 0.15065 0.12802 0.75157 0.11653 0.22157 0.15873
0.99 0.23804 1.55062 0.22807 0.72457 3.70201 0.22072 0.17878 1.03482 0.16382 0.33760 0.22990

t These specially marked values are from Pettitt (1977). All other values in part a are taken from the most
recent of Stephens (1970. 1972. 1974, 1976, 1977) in which they appear. All values in part b are taken from Dur-
bin. Knott and Taylor (1975). It is noted that there are slight discrepencies between the three authors.

239
240 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

In case (El) the natural estimate (the MLE) of 0 is

r l
= Xn:7+(n—r)Xn:r /re
]

while in case (E2) we would use

R
B Xn:r+(n—R)xo]/R,

where R denotes the random number of observations not exceeding x 0 . In


case (El) it is natural to consider the statistics r W2„ and ,Än of the form

rTn _ n (X
J o
^:. [Fn(x) — Fe(x)] 2 t'(F6(x)) dF9(x)

with weight functions ii(t) = 1 and iIr(t) = [t(1— t)] - ', respectively; the only
change needed in case (E2) is to change the upper limit of integration from
Xn : r to x0 . Let 0 <p < 1. In case (EI) we suppose r/ n --* p as n -* oo. In case
(E2) we let p denote the value to which 1—exp (—x 0/6) converges.

(i) Show that in case (E1)

r Wn-> d p V 2 = x (t)dt if r/n->pE(0,1];


0

here N(0, K) on (C, 16) with

K(s, t) = [s A t— st] —[(1—s) log (1—s)][(1— t) log (I — t)]/p.

(ii) Show that in case (El)

An -* d ^ 2 ° f P [^ 2 (t)/(t(1—t))]dt if r/n->p E(0,1].


0

(iii) Extend both results to case (E2).

This exercise indicates the asymptotic theory for natural tests of the
hypotheses (E1) and (E2). Using methods analogous to those of Stephens,
Pettitt (1977) obtained the ,. ; 's of Theorem 1 and the percentage points of the
limiting rv's „W 2 and PÄ 2 . These are given in Table 2. Pettitt (1976) gives
analogous tables for the case when only the first r of n order statistics from
a N(µ, r 2 ) distribution are observed.

THE DISTRIBUTION OF W2 , W2 , Ä 2 , Ä^ 241

Table 2. Percentage Pointe of Z2 and p W z


(From Pettitt (1977))

p
point 0.5 0.6 0.7 0.8 0.9 1.0

(a) A^
50 0.201 0.244 0.293 0.345 0.407 0.496
85 0.411 0.494 0.583 0.675 0.781 0.915
90 0.487 0.584 0.687 0.793 0.914 1.061
95 0.622 0.746 0.870 1.003 1.149 1.321
97.5 0.763 0.914 1.062 1.222 1.394 1.590
99 0.975 1.145 1.324 1.521 1.729 1.947

(b),# 2

50 0.0247 0.0341 0.0442 0.0548 0.0652 0.0738


85 0.0531 0.0720 0.0921 0.1126 0.1321 0.1480
90 0.0635 0.0857 0.1093 0.1333 0.1561 0.1745
95 0.0821 0.1103 0.1401 0.1702 0.1986 0.2216
97.5 0.1015 0.1359 0.1721 0.2087 0.2433 0.2706
99 0.1279 0.1710 0.2160 0.2613 0.3033 0.3376

Normalized Principal Components of W2.

In equation (5.3.11) we saw that


f i Z*2
(19) n[F „(x) - F(x)] 2 dF(x) =f VJ(t) dt =^Y' (-"^2
'0. o,
for normalized principal components

(20) Zn = n -112 ./L COS (1Trek)


k=1

that were uncorrelated (0, 1) rv's that converged rapidly to normality. Indeed,
the first four Z's seemed to be useful statistics in their own right.
In the present subsection we would like to obtain an analogous
decomposition

(21) W„ = n [0=(x) - F„(x)] 2 dF„(x) = Ü (t) _ Ä Z*„,?.


;

_. o

We recall that under regularity conditions on F we have

( 22 ) d W2 = J 02( t ) d t= X;Z *2
242 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Table 3
(from Durbin, Knott and Taylor (1977))
Coefficients a4j for calculating components Z,;; of Wn in tests for the normal distribution

io y ) [n^ j \

Z^S _ E -V c,,, where STf = ( ) (J cos V r?-)

jfori=2m-1: 1 3 5 7 9
jfori=2m: 2 4 6 8 10
i

Case (iii): 1 1.65729 -0.54696 -0.05297 -0.01599 -0.00688


Observations 2 -1.23924 0.27922 0.05706 0.02195 0.01085
are N(82, 02), 3 -1.62519 -0.74524 0.65877 0.08975 0.03249
01, By 4 -0.68592 -0.98503 0.38191 0.09137 0.03928
unspecified 5 -1.71616 -0.54727 -0.60960 0.70771 0.11416
6 0.61709 0.46174 0.89721 -0.43932 -0.11430
7 -1.83603 -0.52771 -0.38946 -0.57132 0.74832
8 0.60432 0.38818 0.39170 0.86806 -0.48515
9 3.78169 1.03851 0.67391 0.64340 1.08365
10 -0.75201 -0.45294 -0.38673 -0.44677 -1.06560

Case (ii):
0, j 0 2m.-1
Observations a31-1j = 1, j = 2m - 1

are N(0, 0Z),


02 unspecified aa,,,j is taken from the table above for case (iii)

Case (i):
0, j # 2m
Observations any„-1j =
1, j = 2m,
are N(61. 1)
B l unspecified aa„a = a a„- 1 , f from the table above for case (iii)

for lid N(0, 1) rv's . '. The reader is referred to Durbin et al. (1975) for what
we now summarize briefly (however, Tables 3 and 4 are their corrected versions
of their tables.)
Define

(23) fin; =Fj,(X^) fort i

and

(24) Jnj = n - ' 12 ./2 cos (j7r n k ) for 1 s j s 10.


k=1

THE DISTRIBUTION OF * , WZ„ A , Ä2,


2 2 243

Table 4
(from Durbin, Knott and Taylor (1977))
Coefficients bj for calculating components Zn; of in tests for exponentiality Wn
10 2 n
Zni = where cos (j 7r)
j-1 n r-I

i J 1 2 3 4 5 6 7 8 9 10

1 1.26286 0.42939 -0.08152 0.02641 -0.01392 0.00725 -0.00477 0.00299 -0.00218 0.00152
2 -0.90795 0.89184 0.43537 -0.08917 0.04128 -0.02027 0.01292 -0.00794 0.00573 -0.00394
3 0.81487 -0.42303 0.81963 0.52393 -0.13055 0.05392 -0.08171 0.01865 .0.01308 0.00883
4 .0.79381 0.36115 -0.41482 0.77280 0.52054 -0.12745 0.06432 -0.03518 0.02368 -0.01557
5 0.77881 .0.33196 0.32235 -0.31993 0-74527 0.58137 -0.15376 0.07052 -0.04365 0.02738
8 0.77533 -0.82035 0.29053 -024859 0.33098 -0.73309 -0.55782 0.14841 -0.07819 0.04538
7 0.77325 -0.31301 0.27241 -0.21355 0.24085 -0.28686 0.71952 0.68475 -0.16805 0.08174
8 0.77854 -0.31126 0.26453 -0.19890 0.20767 -0.21002 0.30039 -0.72119 -0.58531 0.16334
9 0.79770 -0.31603 0.26409 -0.19307 0.19240 -0.17908 0.21487 -0.27897 0.72921 0.81923
10 -1.07747 0.42423 -0.35054 0.25170 -0.24387 0.21890 -0.24017 0.26415 -0.39111 0.98388

Compute
10
(25) Z n = Y_ ajj ^,j ' , for 1 <_j ^ 10,
J'= 1

where in case N [in case (E)] the a - are given in Table 3 (in Table 4). Treat
;;

the Z„ as independent N(0, 1) rv's. Of course, the first two Z's, and then
the next two, are of greatest interest.

Heuristic Proof of the Darling-Sukhatme Theorem

In analogy with Section 5.1, the Darling-Sukhatme theorem (Theorem 1) is


just the natural extension of a result for matrices. We could list various
background results for integral equations, and then proceed with a proof of
Theorem 1. Rather, we will rigorously prove the matrix analog for one 0
function, and let the interested reader consult Darling (1955), Sukhatme (1972),
and Stephens (1976) for the result itself.
Suppose T is positive definite, and write

Al 0 i
(26) Z=r'AI'=(Y1 ... Y,,) ••
O a )(
YY',,
244 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

for eigenvalues A 1 > A 2 >' • • > A n >0 and eigenvectors y,, ... , y. For ¢ 54 0 set
a, 'y^
(27)
an ^yn

We are interested in the eigenvalues of

(28) j = T 00'.
-

Proposition 1. The eigenvalues ä, 7 --!,>O of T are exactly the solution


of

(29) d(A)S(A)=0,

where

a2 ‚ A
(30) S(A) =1 + = 1 +a'(AI—A) -'a and d(A)= fI 1— 'J.
i=1A — A^ - A/

If all a; 0 0, then the ,1; 's are exactly the solutions of S(A) = 0. Finally, 5,
for all j.

Proof. The eigenvalues A ; of T are solutions of


n

(a) 0 =jAI—^j=IArr—r'Arj=IAI—AI=fl (A—a )=And(A). ;

An eigenpair , g of t must satisfy

ig = g = (T — 00 4 ,

(b) =—cO+T,g forc =g'4.

Multiplying (b) by F gives

—ca=r(,lI—T )g fora=rçb
= r[1r'r — r'Ar]g
(c) = (I I — A)Fg.

For Ä 96 any A3 we may solve (c) for g as

(d) g = —cr'(ÄI —A)-'a,


THE DISTRIBUTION OF W 2 , W2, A 2 , A 245

which when multiplied by ¢' gives

c = —c,'r'(ÄI —A) - 'a = —ca'(ÄI —A) 'a -

n -
j=, Ä — As

We will think of (e) as one linear equation in one unknown; and we will write
it in the form

(f) S(Ä)cg=0 force =g'¢.

If cg = 0, then (b) implies that ,1, g are also an eigenpair for T; and thus cg = 0
implies that ,l equals some )1 ;, which implies d(,l) = 0. If cg 0 0, then 1 can
satisfy (f) only if S() = 0. Thus in all cases we see that any eigenvalue Ä of
^ must be a solution of

(g) d(.1)S(,1)=0.

In the next paragraph we will show that the converse also holds.
Suppose now that A* solves (g). Then either case 1: A* 0 any A ; or case 2:
A * = some A. In case 1 we define

(h) g*= — ^ a1 Z

(or g* = —r`(A*1— A) 'a since this inverse exists);


-

and we note that S(A*) = 0 by case 1. Thus

Tg* = (T — 44')g* = r'Arg* — mçj'g*


= —PA(A*I—A) 'a+0.0'r'(A*I—A) 'a
- -

=—r'A(A*I—A) 'a+¢(a'(.1*I—A) 'a)


- -

=—r'A(A*I—A) 'a+(r'a)(-1)
- using S(A*)=0
= —r'{A(A*I—A) '+ I}a -

= —r '{A(A*I —A) '+(A*I—A)(A*I—A) ' }a


- -

= —r'a*I(A*I—A) - 'a = —A*r'(A*I — A) - 'a

thus A" is an eigenvalue of i with normalized eigenvector g*/Y', , g* 2 . Since


is symmetric and (i) shows A* to be an eigenvalue of , we know that A*
is real. Thus S'(d*)_—a;/(A*—A,) 2 <0, so that x* is a simple root of S(A).
246 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

We have thus exhibited the proper number of eigenvectors in case 1. Now


consider case 2, where A* = A,. Then we must have a, = 0, else the S(A *) of
(30) does not satisfy (g). We must also have either case 2a: S(A*) 0 or case 76
2b: S(,1 *) = 0. In case 2a, A * = A, is a simple root of (g) ; and we note that A1,
y, is an eigenpair since a, = 0 yields

G)
In case 2b, A * = A, is a double root of (g) ; we note that A, has two perpendicular
eigenvectors, namely A, and the g* of (h). Thus each root of (g) has eigenvectors
of the proper multiplicity.
We leave the proof Ä <_ A for all j to the exercises.
; ;

This proof of Proposition I is virtually a copy of Darling's (1955) proof of


Theorem I for the case ¢ 2 = 0. ❑

Exercise 2. Show that A, A, for all j holds true in Proposition 1.

Exercise 3. Generalize Proposition 1 by supposing Z=Z - 14 - 4'2


Define 0(A) in the spirit of Theorem 1.
(i) Show that any eigenvalue 1 of T must be a solution of d(A)A(A) = 0.
(This is virtually a recopy of Proposition 1.)
(ii) Show the converse. (Use of Proposition I is likely appropriate. See
Sukhatme, 1972.)

An Exercise

Exercise 4. (Kac et al., 1955) Let X = N(0, K) on (C, 16) where K is con-
tinuous and has a Kac and Seigert decomposition K(s, t)= 1, A f (s)f (t)
;

for 0< s, t <_ 1. [Note Mercer's theorem (Theorem 5.2.1). Let Z* = f ö Xf dt/ .
Let ¢ E 2'2 be such that

(31) K(s, t) =K(s, t) —¢(s)O(t)


is a covariance function for 0 s s, t < 1.

K is a positive definite kernel if and only if

Y_
(i)
az
(32) b2 = —' < 1 where a; = cb(t)f; (t) dt.
=1 Al

(ii) Suppose K is positive definite. Then

(33) I[Y_ I
a,f, -4 -*0 as m-,
CONFIDENCE BANDS, ACCEPTANCE BANDS, AND PLOTS 247

(34) a' Z* -^ some W (both in mean and a.s.) as m -' oo,


^—

(35) 4 is necessarily continuous, and

i —JIb 2
(36) R(=RC— 62 W4= N(0, K) on (C, W).

7. CONFIDENCE BANDS, ACCEPTANCE BANDS, AND QQ, PP


AND SP PLOTS

Confidence Bands

If P(fI v„ 11 <—k)=1— a, then the function F„ ± k;,°` 1 /Jn provides a (1—a)


100% confidence band for the unknown true df F. Other bands follow from
considering IIviI.

Acceptance Bands

Recall from Section 5 tha t the estimated empirical process -.in [C;„ — F„] for
normal F satisfies Jn 111 Ä —F„ =Jn IIW„(( —X„)/S„)— 1. Thus, if
P(✓ n —4 (( —)/S)I <_ k) = 1—a, then the functions IF„((• —
t k/Jn contain the N(0, 1) df (D with probability 1— a if normality

1.0

100
.9
50
30
8 20
>- 15
0
Z 10
w
w .5
w
U- .6
w

<5
Ui

> .4
5 5
.3
10
U 15
2 20
30
50
.1

OL
—3 —2 —1 0 1 2 3
STANDARDIZED SAMPLE VALUE

Figure 1. (a) Percent Lilliefors bounds for normal samples.


248 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

is correct. Thus, normality is accepted if 4) t k „/'Jn contains l n (( • —


that is, 4) t k /..Jn form acceptance bands for the test of normality. For
1—a = 0.95 these bands are shown in Figure la from Iman (1982); see also
Lilliefors (1967). Contemplating the bandwidths in such figures is revealing.
In like fashion E t l' /.fin are acceptance bands for F„(• /X„) in a test for
)

exponentiality; here, E is the Exponential (1) df and ' /.Jn satisfies


P(Jn CIF„ — E( /X„) jj <_ /")) = 1— a. For 1— a = 0.95 these bands are shown
in Figure lb from Iman (1982).

1.0

.9

.8

.7

.6

.5

.4

.3

.2

.1

0
0
STANDARDIZED SAMPLE VALUE

Figure 1. (b) Percent Lilliefors bounds for exponential samples.

Plots

Exercise 1. (Michael, 1983) If S— (2/ it) arcsin (f ), then S has density


(IT/2) sin (irs) for 0 - s ^ 1. Let S l , ... , S. be i.i.d. as S. Then if k/ n -> p E (0,
1) as n-*x, we have Var[nS„ ;k ,]-► 1 /ir e for all p.

For an i.i.d. sample Y,, ... , Y„ from Fo ((y — fs)/o ), the QQ-plot (quantile)
-

graphs ordinate Y, versus abscissa x; = F^ t ((i — Z)/ n ). The PP-plot (probabil-


;

ity) graphs ordinate i„, ; = Fo ((Yn , ; —p/o ) versus abscissa p,, = (i — Z)/ n. The
- :;

SP-plot (standardized probability) graphs ordinate S„:;


(2 /ir) aresin (F0 ( Y„, ; —µ) /o-)) versus abscissa r„ :; = (2/ ir) arcsin
( (i - 2)/n). Table 1 below gives Michael's (1983) percentage points of the
CONFIDENCE BANDS, ACCEPTANCE BANDS, AND PLOTS 249

Table 1. Upper Percentage Points for Dgp (from Michael (1983))

(Exact) (Monte Carloed)

Simple test of uniformity Composite test of normality

a a

n 0.50 0.25 0.10 0.05 0.01 0.50 0.25 0.10 0.05 0.01

1 0.167 0.270 0.356 0.399 0.455


2 0.198 0.273 0.353 0.406 0.495
3 0.195 0.260 0.333 0.377 0.461 0.143 0.212 0.249 0.261 0.271
4 0.188 0.248 0.311 0.351 0.430 0.164 0.208 0.242 0.272 0.316
5 0.180 0,235 0.292 0.329 0.403 0.104 0.129 0.154 0.168 0.198
6 0.173 0.224 0.277 0.311 0.380 0.102 0.126 0.149 0.163 0.193
7 0.167 0.215 0.264 0.296 0.361 0.099 0.122 0.144 0.158 0.188
8 0.161 0.206 0.252 0.283 0.344 0.097 0.119 0.140 0.154 0.182
9 0.156 0.198 0.242 0.271 0.330 0.095 0.116 0.136 0.149 0.177
10 0.152 0.192 0.233 0.261 0.317 0.093 0.113 0.132 0.145 0.172
12 0.143 0.180 0.218 0.244 10.296 0.090 0.108 0.126 0.138 0.163
14 0.137 0.170 0.206 0.230 0.278 0.086 0.104 0.120 0.132 0.156
16 0.131 0.162 0.196 0.218 0.263 0.083 0.100 0.116 0.127 0.149
18 0.125 0.155 0.187 0.208 0.251 0.081 0.096 0.112 0.122 0.144
20 0.121 0.149 0.179 0.199 0.240 0.079 0.093 0.108 0.118 0.139
22 0.117 0.144 0.172 0.191 0.230 0.077 0.091 0.105 0.115 0.135
24 0.113 0.139 0.166 0.185 0.222 0.075 0.088 0.102 0.112 0.131
30 0.104 0.127 0.152 0.168 0.201 0.070 0.082 0.095 0.104 0.122
40 0.093 0.113 0.134 0.148 0.177 0.064 0.075 0.086 0.094 0.110
50 0.085 0.103 0.122 0.134 0.160 0.059 0.069 0.080 0.087 0.102
60 0.079 0.095 0.113 0.124 0.148 0.055 0.065 0.075 0.082 0.096
70 0.074 0.089 0.105 0.116 0.137 0.052 0.062 0.071 0.077 0.091
80 0.070 0.084 0.099 0.109 0.129 0.050 0.059 0.068 0.074 0.087
90 0.067 0.080 0.097 0.103 0.122 0.048 0.056 0.065 0.070 0.084
100 0.064 0.076 0.090 0.098 0.116 0.046 0.054 0.062 0.068 0.081

distribution of
SP _

Note that
1
(2) D, = IIG. - I = max i Ifn:, -P.,:jI +2n,

They were obtained from Noe's recursion; see Section 9.3.

Exercise 2. (i) Develop formulas for the acceptance region of the D,,-test
of Ho: (Fo, µo, ob). Do this for three cases: when the acceptance region is to
be shown on a QQ-plot, on a PP-plot, and on an SP-plot.
250 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

(ii) Repeat this exercise for the D „'-test. In particular, on the SP-plot,
the D' -test yields the acceptance curves

(3) rn:1 t (the D„ P percentage point) for S„ :;.

Table I also shows Michael's simulated percentage points for the statistics
Dn P obtained by replacing F0 , µ, o by ' o , X„, S. in D„ P above. Michael's
Figure 2 shows the six acceptance bands of Exercise 2 (but with I, X, S)
for a data set having n = 20. Note that because of the correspondence between
the Dn' plots, the same point causes rejection of a normality hypothesis on
each plot. However, the visual properties of the plots are different. Note that
D„ P leads to straight lines on the SP-plot.

(a) QQ Plot a (b) PP Plot


8 C I.V
DSP
7 y D
D SP E 8
6 ö
T
D s
5 c 6
a 4
3 a
3 .4

c
2

0
-3 2 -1 0 1 2 3 0 2 4 .6 .8 10
Standard normal quantiles (x) .... l niform quantiles (p)

(c) Stabilized probability plot


10

Dsr
8
I)

0 6
v
E
. 4
c
2

0
0 .2 .4 .6 .8 1.0
Sine quantiles (r)

Figure 2. Probability plots with 95% acceptance regions based on D and D5 (a) QQ plot, (b)
PP plot, and (c) stabilized probability plot.

8. MORE ON COMPONENTS

We wish to consider once more the Durbin and Knott decomposition of


Theorems 5.3.2. Thus for j ? 1 we set

(1) f;(t) = ✓2sin(jvrt) for0<_t<land A^°(j7r) -2


MORE ON COMPONENTS 251
It is now convenient to introduce (as in Schoenfeld, 1980)

(2) d;(t) = ./2cos (jTrt) for0<_t<1

since the dj 's are also orthonormal and

(3) d; = —f /.JA .
;

Thus the normalized principle components Z of Durbin and Knott satisfy

Zn ° f"' l(t)Un(t) dt/ ✓ A ;

(4) _— U dd,
0

(5) = J' djdU=n"2 Y Ldj(e,)


i_I
— Edj(f)l

n
(6) =n d () ; since Ed3(6) = 0 for these d ,
;

r =,

using integration by parts with bounded d; 's.


Let us choose constants b,, ... , b m and define

(7) Tm. = ^ b;Z* = —f Un [ b; d;].


i =I o, i =I

This statistic could arise as a statistic used to test for the Uniform (0, 1)
distribution. The ordinary CLT gives

(8) Tm„ * d N(0, E b;) as n -* oo,


i =.

for any fixed m.

Exercise 1. Alternatively, use (4) to show that

(9) Tmn -*aTm= — J U d^^=N^0,


('

o
m

iI
m

i=1
\
b; asn-oo.

Now consider a sequence of local alternatives to the null hypothesis. We


will follow the pattern of Remark 5.3.3. Thus X,, ... , X„ are i.i.d. F9 where
252 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

9n = B o + y/f. Consider the statistic [recall (5)]

T mit = mb
m
L a F0 ) d [
(

n
0 ( f —B°)]

(10) _ Y, bn -tj2 [d; (FB°(X; ))—Ed; (e)] if Fo, is continuous


i =t r =i

m n
(11) =E b n - " 2 d; (F^(X; )) for the d; of (2).

Under this mild regularity our next assumption will be true; we assume that

(12) Tmn +ä N(7 ba b;) asn^oo.

for some constants a 1 ,. , a m .

Remark 1. Note that under regularity we have in (10) that

J[E d;(FB°(X;))— E da(e)] =^ J [d; (FB° ) — d;(FB,)] dFe.

=—y 1 d (F9)—d (F%) dFB ,


J '

0n-00
'

; (FB )) dF^+o(1)
y J — --(d

(13) y J d ' (F) (_ .F, ) dF00+o(1)

^aeoFB) ° F^°' dd; +o(1)


a
1 1
f o d'ar (-- _F
B I ° Feo J dt+o(1)
F@ ) - F eo ]
[! F dt+o(1)
= y Jo d[(_ aBa ]

=y
f 0
d'[\aeßfB) ° Feö ][i /fe° ° Feö] dt+o(1)

(14) =y (Fo )[---o logfe] dF00 +0(1)


J j

and
Var [d; (F^(X; ))] = 1 + o(1),
MORE ON COMPONENTS 253
so that we expect (12) to hold with [use (13) for less regularity]
Fo .
(15) a; = J d;(Feo)[—0 logfe] dF

Combining and (8) (12)


we see that the power of the level a Tm „-test against
the sequence of alternatives 0„ = Bo + y/vn converges to

(16) P(N(0, 1)?z^°^— ba; / b;)).


(

Remark 2. The only properties of the d 's used above were ;

(17) d (),. .. , d ( ) are uncorrelated rv's with variance 1


! m i

for which Eq. (12) can be justified (typically, by an argument of the type
outlined in Remark 1). Which orthogonal functions d and what constants b ; ;

should be used in defining T „? According to (16),


m

I/2
(18) maximizing e (a,b)= Y-
m ab b;) ; j maximizes power.
i -IIG I

Equation (18) can be looked at several ways:


(i) For fixed m and d,, ... , d we would like to choose b = a , since Y_; a;
m ; ;

is now fixed and since the choice b = a thus maximizes e (a, b).
; ; m

(ii) For fixed m we would like to choose d to maximize Y_, a; and then
choose b = a .
;

; ;
(iii) Since m = I would be nice, let us choose d; (= dm = d,) so that

z
(19) d,(F,,) eo logfo]/ I( 0 o) where I% = E®„ [ a8o log.!e] ;
-

then a, = JI (00 ), and the choice b, = a, gives


,

(20) e,(a,, b,) = a, = I.

Note also from the definition of Tm „ that in this case

(21) 7- m„ = n vz E a logfe(X1)•
254 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Since the Neyman-Pearson test for this hypothesis rejects for large values of

f0 ,(X1 ) r logfe^(X1) logf9


— 0 (X1) 1
An=log (en—eo)
(22) i=1 feo (Xi) ii L en — 00 J
n
Y - E a log.f0(X1) = YT1 ,
Jn e=, (38p

we see that (under regularity)

(23) this T1n -test will be asymptotically efficient.

In some sense, we have come full circle with the idea of goodness-of-fit via
components. This does, however, suggest that a reasonable choice for a good
omnibus d, is one that "approximately satisfies (19) for the alternatives against
which we seek protection."
In the spirit of the robustness literature, in testing for a location parameter
0 we could choose for d 1 (F00 ) the normalized Huber influence function. (A
more interesting situation would be to estimate location and scale and test for
shape.) (See also Parr and Schucany, 1982.)

9. THE MINIMUM CRAMER—VON MISES ESTIMATE OF


LOCATION

Suppose X 1 , ... , Xn are i.i.d. Fd = F( — 0) for some known continuous df F


and some unknown parameter 0. We define the estimate 8 n of 0 as any value
in [—oo, x] for which

(1) 9n minimizes f T n[F (x+B)—F(x)] 2 dF(x).

Since (1) is a continuous function of 0 on [—QO,00] that has a limit of n/3 as


0-* ±‚ the estimate is well defined.

Theorem 1. (Blackman) If F has a density that is uniformly continuous on


a finite number of (finite or infinite) intervals that make up its support, then

(2) 'Jn(9n-0)-dN(0,o,F) asn-->cc,

where
2 __ j J ' f2 (x)f2 (y)[F(x) n F(y) — F(x)F(y)] dx dy
3 (x) dx] 2
(3) F [I T f
is necessarily finite.
THE MINIMUM CRAMER-VON MISES ESTIMATE OF LOCATION 255
Generalizations of this theorem are found in Koul and DeWet (1983).
For further results in this vein see Bolthausen (1977) and Pollard (1980).

Exercise 1. Show that we can reexpress the numerator of (3) as

I 1(y) dY— CJ . E f2(x) dxf(y) dy


Iz
(4) J H-Wfz(x) dx

under the hypothesis of Theorem 1.

Proof. This result improves Blackman (1955). Our proof is from Pyke (1970).
We define

(a) M(b) = I n[F „(x +bn - ' 1z +0)— F(x)] 2 dF(x)


d

= J [U „(F(x+bn - ` iz )) +8„(x)] 2 dF(x)

where

(b) Sn(x) = f [F(x+ bn - ' /z ) — F(x)].

Now, working with the special construction, for any fixed B


"2
JJU „(F(• +bn - )) — U(F)II
IIv„—UjI+IIU(F(•+bn - ' / 2 )) —U(F)
(c) a.s. 0 uniformly in IbI _< B as n --> o0

by the special construction of Theorem 3.1.1, II F( • + bn ' /2 ) — FII -> 0, and the
-

fact that the sample paths of U are uniformly continuous.


Suppose first that f is uniformly continuous on all of ( —«D, a). Then
(x)= bf(y) where yx.b ,„ is a point between x and x +bn - ' /z given by the
mean-value theorem. Hence for any fixed B we have

s„(x) — bf(x)I = I bf(YX,n) — bf(x)I

(d) -4 0 uniformly in I bI B and —oo < x < co as n -^ 00.

Let w E (1 be fixed. Let B. be a fixed large constant to be specified below.


Then

(e) M„(b) ->M(b)=


E [UJ(F)+bf] z dF

as n -* oo; this follows from applying (c) and (d) to (a).


uniformly inlbi B^,
256 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS

Supposef is uniformly continuous on a finite interval only (as would happen


if F was a uniform df, say). The argument of paragraph two is still true except
on an interval of length not exceeding Bu,n -1J2 ; since F assigns mass O(n 112 )
to such an interval, (e) holds in this case also. Clearly, the type of F hypothe-
sized can be handled by the same argument. Thus we may suppose (e) holds
for the general F.
Now for fixed a,

(f) M(b) b2 J f2 dF+2b fu(F) dF+ U 2 (F) dF

is a quadratic in b; its minimum occurs at

(g) b=—JfU(F)dF/ fzdF J


(h)
= N(0, o-) by Proposition 2.2.1.

By Minkowski's inequality we have for fixed w that


(' 1/2

[M„(b)]'^ z >_[a,,(b)]'^ 2— J U^(F(x+bn - '^ 2 )) dF(x)


J

(i) _> [a,(b)]'' 2 —(some constant K s,)

where

(j) a„(b)= 52, dF.

The existence of a finite positive constant K. is guaranteed since for fixed w


we have 11U —U-O, so that 11 U„ II is uniformly bounded in n. But for b> B^,

(k) an(b)>_a„(B.)-^a(B.)=B^ J^ f2 dF

as n -^ co by the dominated convergence theorem. For b < —B. we likewise have

(1) a,,(b)>_ a.(—B^,)-^ a(—B^,) = a(B.)

as n oo. For our fixed w, we now specify B. so large that

(m) [a(Bw )]' L2 >[M(&)+1]' /2 +Kw +1 and lbI<Be,.

Thus

(n) I&,1 < B^,


THE MINIMUM CRAMER-VON MISES ESTIMATE OF LOCATION 257

B^,

Figure 1.

and applying (m) to (i) gives

(o) M„(b)>M(1,) +i for all Ibj>Bam,

provided n a- some n w .
The combination of (e) and (n) added to (o) shows that M„ is "sufficiently
u- shaped" to imply that the location of the minimum of M. corresponds to
b.„ for any E >0 and all n >_ some n r. ,,,[the graph of M(b) must lie entirely
within the marked-off area of Figure 1]. ❑
CHAPTER 6

Martingale Methods

0. INTRODUCTION

In Section 3.7 we saw how useful martingale methods can be in establishing


convergence of various processes in /q metrics. In this chapter we begin
to exploit a certain body of martingale theory associated with Aalen (1976),
Aalen and Johansen (1978), Gill (1980, 1983), Rebolledo (1980), and
Khmaladze (1981), and hinted at in the later part of Section 3.4. In this chapter
we consider only the case of a single iid sample; censoring will not be
introduced until the next chapter. Our goal in this chapter is to gain some
familiarity with these techniques and to extend the results of Section 3.7 for
U, and W„ to related processes. (In a later chapter we will similarly extend
the results for ll.) We will work on the line in this chapter, instead of reducing
to [0, 1] in all cases. Most of the Heuristic Discussion that follows is from
Gill (1984).

Heuristic Discussion

Suppose now that (M(x), Jwx ), XE R, is a martingale. Then for any


increment M(x+h) —M(x) we have E[M(x +h)—M(x)I9x ]= 0. Operating
heuristically, this suggests that

(1) E[dM(x) Sr_] = 0 for any martingale (M(x), f(x)), x E R,

where fix _ is the o-field generated by everything up to, but not including, time
x. With this background, we now turn to our problem.
Suppose now that

(2) N(x) is a counting process;

a counting process is (informally) an 7 process that can only take jumps of


258
INTRODUCTION 259
size 1. Suppose it is true that

(3) E[dN(x) j ,Vx _] = dA(x), where dA(x) is 9X _- measurable.

It then holds that

(4) M(x) = N(x) — A(x), x E R, is a martingale;

we call A the compensator of M. Note that A is an 1' process.


We define the predictable variation process (M) of the martingale M by

(5) d(M)(x) = E([dM(x)] 2 1 _) = Var [dM(x) I i X _]


= (the part of the variance that can be computed
given the past)
= E([dN(x) —dA(x)] 2 1 . x _)
= E([dN(x)] 2 - [dA(x)] [dN(x)]+[dA(x)] 2 1 gx _)
= E([dN(x)] 2 1.^^x _) -2 dA(X)E(dN(x)1 ^x _)
+ E([dA(X)] 2 I . Y_)
= E([ dN(x)] 1 9 _) — 2[dA(x)] 2 + [dA(X)] 2
X

by (3) and [dN(x)] 2 = dN(x)


= dA(x) — [dA(x)] 2 by (3)
(6) =[I—DA(x)] dA(x)
where DA(x) = A(x) — A_(x) ° A(x) — A(x — ).

We note that [dN(x)] 2 = dN(x) since dN(x) only takes on the values 0 and
1. We also note that 0 <— dA(x) < 1. We thus suggest that

(7) (M)(x) = J x [1 -6A(Y)] dA(y).

Moreover,

(8) E(dM2(x) = E(M_(x) dM(x)+M(x) dM(x)I 9x _)


by integration by parts
= E(2M_(x) dM(x)+[dM(x)] 2 j fix _)
= 2M_(x)E(dM(x)1 ,Fx _)+ E([dM(x)] 2
=2M_(x) • 0 +d(M)(x) by(I)and(5)
(9) = d(M)(x), where this is an Jvx _-measurable function.

260 MARTINGALE METHODS

Thus the process

M 2 (x) —(M)(x) has E(d[M 2 (x)—(M)(x)]^.^x _)=0,

which suggests that

(10) (M 2 (x) —(M)(x), 9x ), x E R, is a martingale provided EM 2 (x) < oo;

that is,

(11) (M) is the / process in the Doob-Meyer decomposition of


the submartingale (M 2 (x), 9x ), XE R.

Suppose now that

I (12) Y(x)= H(z) dM(z) where H(x) is 9X _-measurable for all x.


x.

Then E(dY(x) l 3X _) = E(H(x) dM(x) . x _) = H(x)E(dM(x) I gx _) = 0 by


(1), so that

(13) (Y(x), 3WX ), XE R, is a martingale provided E I Y(x)I <oo.

Moreover,

d(Y)(x) = E([dY(x)] 2 I fix _) by definition (5)

= E([H(x) dM(x)] 2 1 fi_)

= H 2 (x)E([dM(x)] 2 1 Fx ) since H(x) is 3wx _-measurable

= H 2 (x)d(M)(x) by (5),

so that

(14) (Y)(x) = J H2d(M).


-00

This also suggests that

(15) (Y 2 (x)—(Y)(x)), XER, is a martingale provided EY 2 (x) <oo.

Processes H(x) that are SwX _-measurable have the property that H(x) =
E(H(x) 1 9x _) or that H(x) can be determined by averaging H over the past;
such an H is thus called predictable. The statement (13) can be summarized as
INTRODUCTION 261

(16)J x [predictable] d [martingale] = [martingale]

provided moments exist.

Suppose now we have a sequence of martingales M. whose incre-


ments satisfy a type of Lindeberg condition; this suggests that any limit-
ing process M should be a normal process. From the martingale condition
we hope that Coy [M(y) — M(x), M(x)] = lim E {[Mn (y) — M„(x)]M (x)} = M

lim E {M (x)E{M (y)—M„(x)I3wx }}= tim E {M (x) • 0} =0; and for a normal
M M M

process M uncorrelated increments means independent increments. The vari-


ance process of M should be EM Z (x) = lim EM(x) = lim E(M)(x) by (10),
so that it seems reasonable to hope that

(17) Mn 4M_S(V) on (DR , 2 R jI II) as n->cc

for a Brownian motion S provided

(18) the increments of M. satisfy a type of Lindeberg condition

and provided [note (7)]

(19) (M )(x) --p, [some V(x)]


M as n -oo, for each x E R,

where

(20) V is 7 and right continuous with V( —oo) = 0.

As noted above

(21) V(x) = Jim E(M,)(x) = E(M)(x) = tim EM(x) = EM 2 (x)


is often true.

Of course, the original martingales Mn need to be square integrable. This


"quasitheorem” is roughly Rebolledo's CLT of Appendix B.
One other bit of heuristics seems in order. Suppose now that we have several
counting processes N (x) and that we perform the above calculations
;

and determine martingales M, ( x)=N (x)—A (x) with (M, )(x)=


; ; ; ;

5 ,[1 —iA ] dA . Now for constants c ,


X ; ; ;

(22) (x) _ c M, (x) is also a martingale.


; ;

262 MARTINGALE METHODS

We note from (5) that

d(?v1Jn)(x) = E([dRn(x)] Z I'^z-)


n
_ Y_ c+E([dMi1(x)] 2 1 X-)

+Y_Z c1 c; E([dM, 1 (x)][dM 1; (x)]I_)


imj
n
(23) _ E ci(Mi,)(x)

provided that the

(24) M1(y) - M, ; (x -) and M 1 (y) - M,; (x -) are independent given 9.

In fact, conditions under which all of the previous heuristics are actually true
are given in Appendix B. Even without Appendix B, we can use these heuristics
as the first step in a guess-and-verify approach.

Example 1. Suppose now that

(25) N1 (x) = l (x ,- x] , x€ R,

for X1 ,. . . , Xn iid F. Then Ni is a counting process with


E(dN; (x) j fix _) = P(dN; (x) =11 N; (x -) = 0) 1 [x^.Cl
= dA(x) = 1 tx, ^ Y ^ dA(x) where dA(x) _ [1 - F_(x)] ' dF(x),
so that

(26) M11(x)=N1(x) - A1(x)=N,(x) - f z l[xY]dA(y), xER,

satisfies

(27) (AUI, ; (x), ), XE R, is a martingale where


^ n =v[1
X
_ x5 :y x,l i^n].
The predictable variation process is

(MI^)(x) = f [1 X - AAj(y)] dA1(y)

= J x [1- 1 (x,^,jAA(y)] 1 [x ^r] dA(y)


;

(28) = f x 1 [)(y] {1-A(y)] dA(y).



INTRODUCTION 263

Thus
Coy [Mli(x), Mli(Y)] = EM (x A Y)
= E(Mi:)(x A y)

J
xny
= E 1(x[1 —AA(z)] dA(z)
xny
=E
J
xny
E(1[x ] )[1— AA(z)] dA(z)

[1—DA] dF
=j
(29) = V(x A y),

where

(30) V(x)= j x [1 —AA] dF forxE R.

Since the sum of martingales is also a martingale, we have


in
(31) fy0 „(x) =—'- M, ; (x) is a martingale on R with respect to the ^x

=^F „(x)— J x fn[1—F n -] dA

(32) = Un(F(x))+ f x U(F-) dA.

Moreover, (28) tells us that

(M)(x) =1 x [ 1 — F „ -][ 1 — AA] dA

—>, x J [1 — F_][1 —AA][1 —F-] - ' dF

= J x [I—AA] dF

(33) = V(x).

Our heuristic Rebolledo CLT suggests that

(34) M
y taw = §(V) on (DR, 2R, 11 11)

for some Brownian motion S. ❑


264 MARTINGALE METHODS

1. THE BASIC MARTINGALE MO„ FOR U.

We are interested in learning to exploit counting process martingale methods.


The example treated in this chapter is rather straightforward, so we will offer
direct proofs of most of the results—rather than appealing to the general
theory. As the general theory is rather complicated, we feel this approach will
serve the reader well. The more general methods are essential in the next
chapter where censoring is considered. We will learn just enough here to be
ready for that chapter.
Suppose X„,, ... , X,,,, are iid with arbitrary df F. The mass function of F is

(1) AF(x)=F(x)–F(x–)=F(x)–F_(x) for –oo<x<oo.

When it is appropriate, we let

(2) h+ and h_ denote the right- and left-continuous versions


of a function h.

The cumulative hazard function associated with F is


fx
(3) A(x)-= 1 dF for–oo<x<oo.
1–F_

Note that 0= A(–oo) <_ A(x) <_ A(y) A(r) = A(xD) ^ 00 for all x:5 y ^ -r, where

(4) r = F -1 (1) = inf {x: F(x)=l}.

The mass function of A is

0F 1–F
(5) DAA–A 1–F satisfying

The processes we consider will all be adapted to the o-fields

(6) 3z=cr{1, , l : 1_5i<_n.–oo<y<_x} for –oo<x<oo.

Whenever we say a process Z(x) is a martingale on some interval, the implied


statement is that (Z„(x),,FX) is a martingale on that interval. Finally,

(7)J n = J is our convention.


(ab]

In fact, we will typically phrase our limit theorems in terms of the special
construction of Theorem 3.1.1. Thus, suppose f„, , ... , f„„, U,,, W,,, U, W are
as in the special construction of Theorem 3.1.1. Now let F be an arbitrary df,
THE BASIC MARTINGALE M,, FOR U,, 265

and define X,, = F ' ( ;) for 1 <_ i <_ n so that X,,, , ... , X,, are iid F, as claimed.
; -

We work with this special construction in this chapter.


We define the basic martingale M. by (note (6.0.26))

(8) M,(x)=,fn 0=„(x)— dF for—cc<x<oo

(9) =UJ„(F(x))+ U (F-) dF


"

(10) =U„(F(x))+^T U(F-) dA.

Using (3.2.57) on (9) we see that

(11) M (x) = 7L „(F(x)) if F is continuous,

where

(12) "s
7L„(t) — U„(t)+ Jo U ( ) ds for0<_t<_1.

Note that if 1F, denotes the empirical df of the one observation X„ , then
; ;

(13) ^n(x)
1
E^ ^1;(x) =
✓» ;
E l [^1^(x) — J x^ (1 - 1F1 _) dA
1

J
[Jtxj^xj—
f ltXr] dA(Y) .
(14) ;Y-1 J
That ! „ is indeed a martingale will be verified below. The natural limiting
process to associate with NA,, is

(15)
I
M(x)=UJ(F(x))+ UJ(F_) dA
x.
for—oo<x<oo.

Proposition 1. For all —oo < x, y < oo and 1 ^ i n, we have

(16) Coy [M,(x), M1 ,(Y)] = COV [v11 (x), (Y)] = V(x Ay)

and

(17) Cov [M(x), F t(Y)] = V(x n y),



266 MARTINGALE METHODS

where

(18) V(x)= VV (x)= J x (1—AA) dF for—co<x<oo.

Note that 0< V(x) F(x) _< 1 with V 7. Also !ill has uncorrelated increments,
and is thus a martingale.
Proof. Note that the iid R4 have for x <y, by Fubini's theorem,

Coy [MIA(x), A (y)]

(a) = E( 1 [x,^x] 1 [xYt)+ J x dA(u) dA(v)


— J E(1 [x . X] I [x, }) dA(v) — Jx E( 1 [x,=Y] 1 [xut) dA(u)

(b) =F(x)+j JJ 1 [1—F_(uv v)]dA(u)dA(v)

+ J ao f [1—F_(v)] dA(v) dA(u) J


x }

u,x] —f -1
XY

dF(v) dA(u)— +
f m [u,x}
('Y
J dF(v)
x
dA(u)

(c) = F(x) — E [1— F_(u)] [AA(u)] 2 (see below)


u^x

(d) = V(x).

If we replace one of the two symbols J[u.x] in (b) by J( „ x] , then (because of


symmetry about the diagonal u = v) all of the terms in (b) except F(x) cancel
each other out. The remaining contribution from the diagonal is

—i x J
^o {u}
dF(v)dA(u)=— J x [1—F_(u)]AA(u)dA(u)

_— Y- [1—F_(u)][AA(u)] ,
2
(e)
usx
giving (d).
Now for any fixed x we have from (14) and the c,-inequality that

(19)
2 2k-I
{1+E

C 22k- I (1 + 22k) c 24k


f x
- 1[x ;^Y]dA(y) ]

for all x.
2k

}
THE BASIC MARTINGALE M„ FOR U, 267

This holds since


x 2k
E l[x_y] dA(Y)]

a 0 2k

E[l^ x ^,^^ 1[x^,,]] fl dA(y; ) by Fubini


co o J =I

^z 2k

1 2k 2
(f) < fl E 1 [x^) fi dA(y )
-^ j=1 j_I

2k
}
(g)
_ F() dF(Y)
I Jl — _y
( t 1 l 2k
(1 — t) ' " 2 dt t by (3.2.58)
(h) < j
`fo , J
-

= 2 2k .

In the proof of the next theorem we will show that

(i) (M^(x), 4 n(Y)) a.s. (f(x), h(Y)) for all x, y.

Now -mo d plus uniformly bounded moments of all orders implies convergence
of all moments; see von Bahr's inequality (A.5.1) for the uniform boundedness
of the moments. Thus (17) holds. In fact, using Cauchy-Schwarz to bound
product moments,

(20) all product moments of M. converge to those of M

no matter what df F is true. 11

Theorem 1. Let F be arbitrary. We have

(21) 11U n — U 11 0 -) a . s . 0 for the special construction.

Moreover, recall V(x) = J" a, (1 —AA) dF from (18),

(22) M = §( V) for an appropriate Brownian motion 5,

where S is defined on the range of V.


268 MARTINGALE METHODS

Proof. Using (10) and (15) it is clear that

I& —Ii IIU,(F)—U(F)II+JIIU


(i F (F-)II
)
dF

(a)–UII+II

= 0(1) a.s.
(1 I)(1–F_)
U 1)1/4 - 00 f
1 314dF

by Theorem 3.7.1 and the first half of (3.2.58).


From the change-of-variable formula (3.2.57) and (10) we see that, letting
d denote the points of discontinuity of F,
;

(23)

where
AO(x) =Z(F(x))+
dj x f F(d y(F-(dj)) y(t)
1 _ F (d
F-(d,) ;) 1–dt,
t —

(24) Z(t) = UJ(t) + r U(s) ds for 0 t<_ I is a Brownian motion


0 1–s

(recall Exercises 2.2.14 and 3.4.3). From (23) we see that ABI = Z(F) if F is
continuous. Finally, we define S via the equation F = S(V); note (17). ❑

Corollary 1. We have

(25) fy0 = 1(F) if F is continuous,

where Z is the Brownian motion defined in (24). Note (11) and (12).

[Note that for he Y 2 , the process J, h dZ on [0, 1] and the process Jo h dS


on [0, 11—see (3.4.28)—have the same distribution, even though they are
different processes.]
This theme is continued in Section 6.6 below.
Exercise 1. Suppose S is Brownian motion on [0, 00). Show how to define a
process Z(t) so that the process that equals 5(t), Z(t), S(t) for 0<t<a,
a <_ t <_ b, b ^ t < co is again a Brownian motion on [0, oo) having continuous
sample paths. Hint: Add in an independent rescaled Brownian bridge, plus a
random linear term to make the endpoints match up.

We now define a process (M n ) by

(26) (M,,)(x)= J x (1–F„_)(1–AA)dA for –oo<x<oo,


and we call (M„) the predictable variation process of M,,.
THE BASIC MARTINGALE M. FOR U , 269
Theorem 2. Now

(27) fy0„ is a martingale with M„(x) (0, V(x)).


Also,

(28) M,—() is a 0 mean martingale.

Exercise 2. Prove directly the elementary (28). (Prove the result for n = 1 first
and then add up the results for the M0, 's.) This result is a special case of a
1

theorem in the chapter on censoring.

Note from (28) that

(29) EM 2,(x) = E(M,)(x)

= j x E[1 — F„_](1—DA) dA by (26) and Fubini

= f (I—AA) dF= V(x)


X

as it must if (28) is to be true.


It is elementary from (22) that

(30) l4l 2 — V = S 2 (V) — V is a 0 mean martingale on (—oo, cc),

We will thus call V the predictable variation of M.


Eventually we will consider processes of the form

fL aG dM„, M,,O and J M, dqi

and establish their limiting distributions. This goal is inspired by Gill (1983).
Csörgö et al. (1983) deal with a similar topic, but not overtly in the spirit of
counting process martingales. We owe a debt to both in our development.

An Exponential Identity

Proposition 2. We have

(31)
^[lr „(x) —F(x)] _ 1
—F(y)dA0„(y) for all x <F'(1).
1 —F(x) _.1

The process in (31) is a martingale.


270 MARTINGALE METHODS

Note that (31) can be rewritten, using 1—F=(1—F4(1—A), as

1 —
F„(x)
1—F(x)

=1— 1 d(M„(Y)/ )
.! . (1—I:„(Y—))(1—AA(Y))

(31') = 1— J X Y(y—) dX„(Y)

with

X (x) = 1 d (U„ (Y)/v )


_ ( 1— F„(Y — ))(1 — DA(Y))

a (local) martingale. If dX„ in (31') were replaced by l (o^ ) (y) dy, the solution
of (31') with Y„(0) = 1 would be Y(x) = e”. Thus V. is in some sense the
"exponential of X„.” This is made precise by the theorem of Doleans-Dade;
see theorem B.6.2.
Proof. See (3.6.1) for the fact that the process is a martingale. Now

1 X 1 dM„(Y)
T _^ 1— F(Y)
fx

= d(1—F)
^1 1 F,dF„+ J-.(1 — F)(1„— F-)
dl„—I^(1—F„-)dl 1F
(a) —J-^1

d(1 — ^„ )
(b) —J 1 1 F dF„ 1—FL. .1 1 F
1—IF(x) F„(x)—F(x)
1—F(x) +1 = 1—F(x) .

Step (a) uses the fact, see (A.9.6), that

(32) d(1/ U) = —(UU_) - ' dU for right-continuous functions U

of bounded variation. Step (b) uses integration by parts; see (A.9.13). ❑

This proposition shows that Vi[IF„ — FJ/(1— F) is of the form


J1 (predictable)d(martingale). Thus, according to our earlier remarks, it
should turn out to be a martingale. As noted above, this is correct.

THE BASIC MARTINGALE M,, FOR U„ 271
Remark 1. We have just verified that

(33) % ✓n (F n — F)/(1— F) is a 0 mean martingale on (—cc, F - '(1)).

Our earlier Remark 1 leads us to suspect from (31) that

(34) %„ — (X) is a 0 mean martingale for x < F - '(1),

where

(35) (X) J _ (1—F) -2 d(NN„)= J_ (1—F) -2 (1 — F„4(1 —AA)dA.

and

ERCn(x)=E(X„)(x)= J x (1 — F) 2 (1—F_)(1— ^A)(1 —F4 ' dF

= J [(1—F)(1—F_)] - ' dF= {F/(1 — F) }I by (A.9.18)

(36) = F(x) /[1— F(x)].

All of the above is true. [The unusual thing in this example is that EX (x) is
easier to evaluate than E(X)(x).] Additionally,

(37) T = X,,, is a stopping time with respect to the 9r.

Thus we suspect that

(38) X( AT) and X( A T)—(X)( A T)


are 0 mean martingales on (—co, co)

provided F is such that F puts no mass at F - ' (1) [i.e., T< F 1 (1) a.s.]. Again,
this is true.

Exercise 3. After you gain some familiarity with the methods of Appendix
B, come back and verify all statements made in the Heuristic Discussion of
Section 0. They are a special case of results in the next chapter.

Remark 2. When F is discontinuous, nl n is not a counting process (it can


have jumps exceeding 1) and the results of Appendix B do not apply directly
to A,,. Instead one must apply the results of Appendix B to each of the
; ;
processes F, and M, separately, and then add them up. This is one reason
the M, ; 's were introduced and is one of the points of Exercise 2. Another
reason is that processes such as the weighted empirical process W. can be
272 MARTINGALE METHODS

studied by adding up results for F, and M h using weights c,/ c'c. Note that

(39) is equivalent to M. with n = 1,

so all our previous results apply to the truly basic martingales ABO, , 1 <_ i<_ n. ;

The Weighted Case

We will consider the weighted case where the basic martingale is

(40)
n
fy0„(x) - I 1 ^,c 1[xxl_ [ J x
1 [x; > r] dA(y)] for —oo <x <

(41) =W„(F(x))+ J W (F-) dA.

We note from earlier text that

(42) EM(x)=0 and Cov [M„(x), AA„(y)] = V(x A y) for all x, y,

where V(x) = J”, (1—DA) dF. Also note that

(43) jII^A„ —^ 11'-w -^ a.s. 0 as n - oo

by the proof of Theorem 1, where we now let

(44)M=—W(F)+ W(F_) dA=S(V)


J
for the Brownian bridge W of Theorem 3.1.1 [recall also (19) with W replacing
U]. Thus

(45) E(x) = 0 and Coy [fy0(x), M(y)] = V(x A y) for all x, y.

Also

(46) b0„ —(NA„) is a 0 mean martingale,

where, letting l„ _" (c ; /c'c)1, ; as in (3.4.1'),


rx
n C2. x
(47) (Ml)(x) -
(1-1`,1-)(1—AA) dA = (I —G~„-)(1—DA) dA
i 1 c c -= -=
and

(48) Var [ty9„(x)] = E(x) = E(M„)(x) = V(x) = J (1—DA) dF.


PROCESSES OF THE FORM 0M , U, (F), AND hW „(F) 273
Finally,

(49) W n( F, ) j _ dn forall x <T=F '(1). -

2. PROCESSES OF THE FORM qkM., i/r0J„(F), AND *pW „(F)

We will consider the special construction of the weighted case where

(1)
1 U(x) = Y- 1 - [ l(Xcx]_J' '(X^YJ dA (Y )

with the u.a.n. condition

(2) max {c. ; /c'c: 1 < i <_ n} -. 0 as n -* co.

Since the case 4iM. is very routine and does not seem to hold much interest,
we will concentrate instead on the important cases 4W,(F) and iW „(F).
Our theorem can be looked on as a rederivation of part of Theorem 3.7.1.
However, we will now work on (—co, oo) in terms of F, and we will obtain a
particularly interesting q- function as a corollary.
We will call t r u -shaped if

(3) is >0, \ on (—cc, 0], and 1' on [0, oo) for some 0

(or if it is bounded above by such a function). We will write ci E.2 (V) if


J 41 2 dV <oo. Note that J 41 2 dV s j /, 2 dF, so that i/i E -2 ( V) is implied
00

by the simpler condition 41 E £2 (F).

Theorem 1. If the u.a.n. condition (2) holds and ci E '2 ( V) is u- shaped, then

(4) jI[v„(F)—U(F)]4i11 °.0 -4,0 as n-*oo

and

(5) ll[W^(F)—W(F)IipII .. -•'PO asn-* o0

for the special construction.


274 MARTINGALE METHODS

Proof. Now since W (F)/(1-F) is a martingale,

P(IIW(F')iGII! e)= P(II[W(F)/(1—F)](1—F)pII Bc'a)


B EWz(F)
(a) : E -2 (1- F) 2 ^r 2 d 2 by Birnbaum-Marshall
_^ (1-F)

(b) =e-2 J 8 ( 1- F) 2 q,2 d[F/( 1- F)]

(c) _ E -2 J B (1- F) 22 dF
(1-F)(1-F_)

= e -2 J B qi 2 dV by (A.9.18)

(d) <a for 0 sufficiently small.

The same argument shows that for 0 : some 0 we have

(e) P(IIW(F) 'IIB^? E) <a under the hypothesis of Theorem 1.

Now [8', oo) is symmetric to (-cc, 0], and [8, 0'] is trivial. Thus

(f) P(II[W"(F)-W(F)]iIiII CC>5e)<5e forn>-


somen e ;

that is, (5) holds. Then (4) is a special case of (5). We wish to summarize for
later reference two statements proved above: for any y E. 2 ( V) that is right
(or left) continuous and u-shaped,

(6) P(IIW"(F)gII a)-5E z ( B


- g2dV
J^
and

(7) P(IIW(F)q/IIB%?e) E -z (e ,p2dV.

These are both applications of the Birnbaum and Marshall inequality


(Inequality A.10.4.) 0

Corollary 1. If EX 2 <cc, we can use iy(x)=x 2 in Theorem 1.

Corollary 2. (Csörgö, Csörgö, Horvath, and Mason) Suppose

(8) EX2<00.
PROCESSES OF THE FORM aM,,, çliU„(F), AND tI'W „(F) 275

Then the special construction of Theorem 3.1.1 satisfies

(9) 11[U„—U]F+'Iiö-PO and Il[U,--U]F-'II -*P0 asn-*oo.

Proof. Replace c/i and F in Theorem I by F +' and I, noting that

(a) J J
c/i2dF= (F ) 2 dI=EX 2 <0c.

Thus the first statement of (9) holds. Now F - ' has only a countable number
of discontinuities, and the chance some a,,, assumes one of these values is 0.
Thus, the two suprema in (9) are a.s. equal. This holds for W. also. ❑

Exercise 1. Show that if (2) holds and 0r E Y2 (V) is \, then

(10) II[Mn— M]c/i j - 0 as n -oo

for the special construction of Theorem 3.1.1. Verify also that

-2
(11) P01MAI 11 Bcow E) <E iAZ d`/,

and that feil may replace M. in (11).

Remark 1. Suppose F= I, and let q be as in (3.7.1). Then

P(U, /q ^ 2 E) = P(IIU, /( 1— I) ]/ [9/( 1 — I)]ll —2e)

^P(IIJ ((1 —I)/9)d[vn /(1


-1)]IIB^
o E)

by Inequality A.2.11

—P \I )J q 'dM"Ilo>E/ by (6.1.31)

(12) =P(IIKn1iö --E)


for the martingale K,, = L q - ' dM, [note (6.3.2)]

=P(IjKn o— e 2 )
<e EK (0)
-2 by Doob's inequality (Inequality A.10.1)

(13) =e_2Var[Y] =e-z J [9(t)] -2dt



276 MARTINGALE METHODS

since

(14) K(B)=

where
_ 1 _ f 1 1
with qe = l[o,e]/q
(15) Y^ ge(fj) o (1— t)ge(t) dt

and
e
(16) EY=0, Var (Y)=q -z dt and EIYI z+s ^CS Jq - c z+s ^ dt.
f 0

This argument is essentially that of Wellner (1977a). Inequality (12) is worthy


of note. Applying Brown's inequality (Inequality A.10.2) to P( —s) still
has the potential for an exponential bound, which (13) does not.

Exercise 2. Verify (14)-(16).

Exercise3. Use Brown's inequality (A.10.2) in combination with (12) to show


that for any 0< O<_ 2 and e >0

(17) 1'($IUn/qj^ö 4 e) E 'EQ^„( 8 )^lU^„cell^E])

Exercise4. Suppose that r>2 and q(t)=[t(1—t)]' 1 ' for 0 t<_ 1. Use (17)
and inequality (A.5.6) to show that for any 0 ^ p < r we have

ÖP p
_ CPEI YV' if p >_ 2

(18) E [( I N J c p 321 z ifp<2


2+ EY
2—p
where CP is the upper constant in the Marcinkiewiez-Zygmund inequality.

3. PROCESSES OF THE FORM J_ m h dM.

For precise results on processes more general than j_ h d i1 , the reader


should consult Theorems B.3.1 and B.5.2. These were summarized above in
the Heuristic Discussion of Section 0.
We will still deal with the weighted case in this section where

(1) U.(x)=^
c^;c rl^X^X]—l[X.]dA].
PROCESSES OF THE FORM J__ h dM„ 277
Theorem 1. Suppose
X
(2) K„(x)= hdM,, for—co<x<oo,

where h is a deterministic function satisfying

(3) L h 2 dV <oo [i.e., h E 22 (V)].

[Note that ß2 (F) c Y2 ( V).] Then


(4) K,, is a 0 mean martingale
and

(5) K,—(K,,) is a 0 mean martingale,


where, letting F„_ -; (cn ; /c'c)1F, ; as in (3.4.1'),
(6) (OS „)(x)= J X h z d(NN„)= J X h 2 [1— F„ - ][1—DA] dA.
Note also that

(7) VV(x) = Var [K (x)] = EK,,,(x) = E(Oä„)(x)

= J h 2 [1—AA]dF= f h 2 dV.

Exercise 1. Verify Theorem 1 directly in the case n = 1. (We give a proof


below for n = 1 based on the beautifully simple results of Appendix B. The
proof below also extends n = I to general n.)
Proof. It's now time to gain familiarity with the methods of Appendix B.
Compare (2) with (B.3.1), identifying Y, H, M with K, , h, Iy1 11 . We prepare
;

to apply c of Theorem B.3.1. We thus verify that

(a) E J . h 2 d(NN, ; )= E I X h 2 (1— F, _)(1 —AA) dA


; by (6.1.26)
J

(b) =
Jm h 2 E[1 —F 11 _](1 —zA) dA by Fubini

(c) =J . h 2 (1—AA) dF= J^ h 2 dV

(d) < 00.



278 MARTINGALE METHODS

We now apply Theorem 8.3.1 to claim that Oä, ; is a 0 mean martingale with

(e) (l„)(x)= f x h 2 d(M;,)= f x h 2 [1–F, j ](1–AA)dA.

That is, K–(l(, ; ) is a 0 mean martingale, and

(f) Var [OÖ, ; (x)] = E(U6,;)(x)


=J
x h 2 (1– AA) dF = J x h 2 dV

[recall calculations in (c)]. (Compare this simplicity with Exercise 1.)


It remains to extend our result from the counting processes AA, to which ;

Theorem B.3.1 applies to the process l&,, (which is not a counting process in
the U (F) case, if F has even one discontinuity). We will "add up" our
previous results for the M, . Thus the sum of martingales is a martingale, and
;

E{Kn(.v) – c--, (K,,,)(Y)I^x}


-, CC

(g) =E j E cn; [K ir(Y) – (K^i)(Y)]+Y-E c"''O(;(Y)KI.(Y)I Srf}


1i , cC i #j CC

_
(h) cn; [O(ia(x) – (k(;)(x)]+^^ S"—`D,r(x)lii(x) by (f), (k)
=,cc i #j CC

_ n Cni x (

(I) —K n( x ) — = K (x) — f- h 2 [1 —
r (K1)( x) „_](1 — AA) dA
W

as claimed. For (h) we write 1( 11 (y) =11( 1; (x)+ jY hdM, ; and get
v \

Y
E(f1r(Y)Kij(Y)i^X)=D(x)ll(,;(x)+E(


f
^ Y
hdlg,;
x
D'
hdUj xi
/

+K, (x)E
; h dM, fx) +K, (x)E
; ; h dMU, I . Viz)
;

x x

(k) =K,i(x)US,;(x)+0+0+0.

Since lK is a martingale, get two 0's in (k). For the last 0 note that

(1)

while
.I.T,
(XYI O'^ J
h dM,; h dAA, ; =0 on the event [X,> x and Xz > x]`,

(m) and ASU, are conditionally independent on [X, > x and X2 > x].
;
PROCESSES OF THE FORM f h dM,, 279
Theorem 2. Suppose h E.T2 ( V) as in (3) and the c,'s satisfy the u.a.n.
condition (6.2.2). Then there exist rv's, to be denoted by K(x) = l x s hdM,
satisfying

(8) K(x) N(0, VV (x)) with VK (x)= f Y h 2 dV=


x
J r
h 2 [1 — AA] dF

and K, _ I h dM satisfies
(9) IK (x) -,, N(x) for each —oc<x<oo, for the special construction.
Moreover,

(10) K = S( VK ) for an appropriate Brownian motion §


where 5 _ §,,, F depends on both h and F.

Theorem 3. Suppose the c„ ; 's satisfy the u.a.n. condition (6.2.2), Jih E f,( V)
and i/i is >0 and \ on (—co, +co], then

(11) (K,, —K)iA . -*„ 0 for the special construction

where K = S( V < ).

In the process of proving Theorem 2, the following additional results become


clear. A special case of (15) below is

(12) f x
S
l( .Y) d vfl = J Y

Y
dM = M(x).
If h, 1c .2( V), then

(13) Covl J ^hdM, L dM =Cov[ [ ^ hdl,


L
I " f
I ^hdMO „ ]

J
YA1 Y^1

hh[1 — AA]dF= V.
(14) = _q

Proof of Theorem 2. For the existence of a ry K(x) on the special probability


space satisfying (9) we will virtually recopy much of the proof of Theorem
3.4.2. With (A.9.13) in mind,

(15) I h dM = K(x) = h(x)M(x) — I _ M dh


Y Y

d x .1 x

for step functions h of the form


k
(16) h(x)= a; 1 for —oo=x o <x,< <xk _,<xk =x.
J=1
280 MARTINGALE METHODS

Then a step function h of the form (16) is chosen so that


E

(a) J (h — h e ) 2 dV<E 3 .

All that remains is to verify an analog of Eq. (i) in the proof of Theorem 3.4.2.
Thus, as (7) shows,

(b) Var I
L
J x (h—h e ) dNA„] = J x (h — h ) 2 dV <E 3 ,
t

as was needed to complete this part of that proof. We have thus established that

(c) K,,(x) = J x h d(x) -* P some K(x) whenever h E 2 2 (V).

Now, as in the proof of Theorem 3.4.2

(17) K(x) _ cn, Y,, - *d N(0, VK(x))


;, c c

by the Lindeberg-Feller theorem of Exercise 3.1.2 since

J
X xnX^

(18) Y ; = hdM,=h(X,)1[x.,^x]— hdA=(0, VK (x))

by (7) (or by Exercise 2 below). Thus

(d) x h dM = K(x) m N(O, V K (x))

by (c) and (17). The covariance claimed above comes from applying the same
argument, but using a bivariate version of (17) and (18). Also, (4)-(6) imply
Coy [K(x), K(y)] = VK (x A y). Hence K = S( Vs ). ❑

Second Proof of Theorem 2. In as much as

(a) (NN„)(x) - VK (x) for all x,

a simple time change from [—cc, cc) to [0, oo) in Rebolledo's central limit
theorem (Theorem B.5.2) suggests that K = S( VK ). In the proof of Theorem
7.8.2 we verify Rebolledo's ARJ(2) condition in a setting more general than
the present one. ❑
PROCESSES OF THE FORM J_ m h dM„ 281

Proof of Theorem 3. Since K,, and K are both martingales having covariance
function V,( :

J
9

(19) ^'(llK^^II Bam? e) <_ e -z +p2 dVoc = e z0


2h2 dV,

and

(20) p(IIK41II Bam? e) <_ e -2 r[r2 dVK = e-Z


J B 41 2 h z dV.

Both statements are immediate from the Birnbaum and Marshall inequality
(Inequality A.10.4).
In order to treat the middle [ 0, , 02 ], we choose a step function h e of the
type (3.4.23) so that

(a) J BZ
6,
(h — he) z dV<e 3 /[g 2 ( 0 )vq, z (ez)]

[as in line (d) of the proof of Theorem 3.4.2]. Then applying the Birnbaum
and Marshall inequalities to the martingales — h ) and 1' (h — J ,
E

h ) dM gives
E

z f :(h—h)zzdV<e
(hhe)dM^e)`e
(b) P ^J e, J a,
and

(c) P ^ e, li
J(h —hM0Z
e)d?ef se - zJe2(h—hE)2 2 dV<E.
e, / e,

Integration by parts and then (6.1.21) show (with 0, and 0 2 continuity points)
92

h £ dM t
I h,d^O ^ —
Jo t Jo, 11
III^ —MIIä;IIhEi }ä;+IM^(01) —M(0^)I Ih,(01)1

+IIM^—
III j BZ
8,
dlhEI

(d) ->a.s.0.

For the tail [02 , co) we note that +I' is irrelevant, and the method of (19) gives
W

(e) P(IIK ^(82, e Z


h2dV<E

for 0 = 0 sufficiently small;


282 MARTINGALE METHODS

with an analogous result for 1K. Combining (19), (20), (b), (c), (d), and (e) gives

(f) P(Il(Kö -K)1/ilj ?7e)<7s for n >_ some n E .

That is, I(K„-K)^J'i! - p 0. ❑

Note from the proof of Theorem 2 that

ifOÖ„(x)= J x hd MO„

(21) -=h(x) (x)- J AA „ dh - p h(x)b0(x)- J


X x M dh,

then 6((x) = ( h dtbli= h(x)U(x)


x
-f&I1'
X dh.
-.

The first line of (21) is an hypothesis on h. The result is true since Z. >„Z -

and Zn ->' Z* implies Z = Z* a.s.

Exercise 2. (i) Verify that the ry Y„ ; of (18) has variance V k (x).


(ii) Verify (14) directly. Also verify that (14) is suggested by (B.1.12).

4. PROCESSES. OF THE FORM j_ h dUJ„(F) AND J_„ h dW.(F)

Suppose

(1) h E .2 (F).

According to the identity (6.1.31) and then (A.9.14) we have

J x
hdU„(F)=
J
h(y) dF[1-F(y)] 1-F(z) dM„(z) 1
I h(y){[ 1- F(y)) 1 dA^„(y)
1- F(y)

+,1- 1-1 dM„(z)d[1-F(y)]}


cmay)

= J h(y) dM„(y)
X

+ _X= 1
1 -F(z) f
x (y)d[1 - F(y)l dY„(z)•
h
PROCESSES OF THE FORM J a , h dU „(F) AND J_ q, h dW„(F) 283

We summarize this for display as

h dU „(F) = aö„(x) =
(2) J
x
J x hx df^0„

for

(3) hx (Y) = h(Y) +[1 — F(Y)]-' xh d[1— F]


ly
From the left-hand side of (2) we have

Cov {K (x), K (Y)] = Coo l J x h dU1,, f y h dU 1;

(4) = Coy [h(X„,) 1 [x max} , h(X„ I) 1 [X,,, , Y]]


xny y
(5) = ^ h 2 dF—
Ix. hdF^ . hdF

for all x, y and all h E 22 (F).

From the right-hand side of (2) and from (6.3.14) we have


zny

(6) Cov [l((x), K (y)] = h*h *[l —DA] dF.

Equating (5) and (6) provides the interesting identity


xny
(7) h*h*[1 —DA] dF
f^ X Ay ('
dF— J . h dF J
z
h dF
h
=J 2 for the hx of (3).

Moreover, it shows that

(8) h E -W2 (F) implies all hx E Y2 (V) for the h* of (3) and V of (6.1.18).
Theorem 1. Suppose h, !E2 2 (F) and the c„ ; 's satisfy the u.a.n. condition
(6.2.2). Then there exist rv's, to be denoted by J h dU(F) = J% hX dM and
1%h dv(F)=$', h* dF , such that

(9) y
hdv „(F)
fx .
hdv(F)

J h dv „(F)
' -
J h d1J(F)
(' y

[( 0 ) (Vh,h(XX) Vh,In(x, Y)/


,

0 N vh,a(x,
Y) Vh,K(Y, Y)

284 MARTINGALE METHODS

where

(10) Vh h(x, y) =
J
xny

hh dF—
I x.
h dF
y

h dF for all x, y.

Analogous results hold for J h dW„(F) and f''. h dW„(F).

Proof. Just apply Theorem 6.3.2 to K (x). ❑

Theorem 2. Suppose the cm's satisfy the u.a.n. condition (6.2.2). Suppose 4r, h,
and 4/ih are all in ^z (F) and 4i is >0 and \ on (—co, +oo]. Then

(11) hdU„(F)— f hdU(F) l


for the special construction.

An analogous result holds for f h dW„(F).


Also

(12) J 1 1 ( _. o . x] dUJ(F)= dU(F) = U(F(x)),


Ix
.

and likewise for W(F).

Inequality 1. Suppose 4r is \ on (—co, 0]. For all e>0 we have

(V ' 1_^hdU„(F)II e . >-3e )



P

(13) e h t/, dF+ 2


<-62{J J h z dF JB i z dF
- zz {1—F(0)} 2

We can replace U(F) by W(F).


Proof. Now [using (A.9.14) in (a), (6.1.31) in (b), and (8) and (A.9.13) in (c)]

1 -x.
hdU,,(F)= J
=
hd (1 —
L F) 1„(F)]

(a)
f
= x h[1—F]d 1 U F +J h_ d[1 — F]

X X Q)
„ (F4 d ^hd[1—F]
(b) _ .hdt&+
1—F_
PROCESSES OF THE FORM J_,, h dU, (F) AND 1' _ h dW„(F) - 285

(c) J _, hdM„+ U„(F(X)) I hd[l —F]

f hd[l —F] d U«(F)

Thus

(14)

(d)
Ix h dU „(F)
.

f x hdM„—

= B1(x) —

Note from (14) that


in F x) f
(

Bn2(x)+ Bn3(X)•
hdF+ _^[
( ) . hdFdM „

0 3
h dUJn(F)II ^ >
_ 36 ) <_ Y l P(II 4GBn , Ile^^
(e) P ( 1144
and we will bound each term on the right of (e). Now
e
(f) P(IIY'BnIIIB^? E) <E -2 1^l2h2 dF

by (6.3.19). Likewise, (6.3.19) gives

P(IIcGB n3 I1 e^^ E) ^ E 2 J
[1 F(z)] 2 J h(r) dF(r) J h(s) dF(s) dV(z)
=e 2 -

J . J . h(r)h(s)
J rv s
(1 - F)2 dVdF(r) dF(s)

(g) <_ E -2 J 8 h 2 dF J t(i' dF /[1— F(8)] 2 .

Finally, the Birnbaum and Marshall inequality (Inequality A.10.4) applied to


U (F)/(1— F) gives
e 1
dF vn(F) /(1
P(jl Bn2j13w— )`P
\IIL 0 J ^Ih^ —F)II-^> J )

<E 2 J q, (
2
f Ihi dF) 2 d[F/( 1 — F)]

(h) <_ e -2 J h 2 dF F e 1p 2 dF /[1— F(9)] 2 using (A.9.18).

Combining (f), (g), and (h) into (e) gives the claimed inequality. ❑
286 MARTINGALE METHODS

Inequality 2. Suppose i is \ on (—oo, 0]. For all e >0 we have

\Il f
0

^i hdU(F)IIB.>2ef <2s -2 J 4, 2 h 2 dF
(15) P

We can replace U(F) by W(F). We can replace U(F), 2e and 2e -2 by 5(F),


E and e ` z .
Proof. Proceed in complete analogy with (3.4.42), which is the special case
of this result when F = I and t = 1/q. ❑
Proof of Theorem 2. Choose 0, = 0 so small that

(a) P(Ii J hdUJ„(F)II B —3r ) <_3e by Inequality 1


II 00

and

(b) PI IIiP J _ hdv(F)II B1 >_2e)^2s byInequality2.

It is clear from (14) and Theorem 6.3.3 that for some n e we have

(c)P( l^slrl J hdU„(F)— J hdU(F)]^jeZ>_s J<_e forn>_n F


1 L e, e, IIII e,III

for any fixed 0 2 . We now suppose 0 2 chosen so large that (4i is irrelevant on
[ 02 , 0o) since ill is \) a symmetric argument gives

(d) P( J hdU(F)II >_2E) <_2e 2 f ^ h


.
"2
2 dF<E

as in Inequality 2; and a similar result with U. replacing U follows from


Inequality 1. Combining these gives

i[J ^hdU„(F)—+ hdv(F)]lj>lle)<11E forn>_n e ,


(e) P(

implying the theorem. ❑

Some Covariance Relationships Among the Processes

In the spirit of Remark 3.1.1, we let NN;, denote the basic martingale of (6.1.10)
associated with W„ = U. while the basic martingale of (6.1.40) associated with
W W. is denoted by M`„. As in (3.1.24) we let p„ = p„(c, 1) = c'1/ c'el'l;
PROCESSES OF THE FORM ,(_-_ h dU„(F) AND f'_ h dw„(F) 287

and we assume that p„ - p as n -> x for some number p = p c,, . Then (6.1.16) gives

(16) Cov [M n' (x), M'n(Y)) = p.V(x AY)

for V(x) = J". (1 —AA) dF. From (6.1.49) and then (6.3.14) we obtain

Cov [W.,(F(x)), vn(Y)]

=Cov I (1 —F(x)) f x (1—F ' dM 'n, J 1d


J
xny

= (1— F(x)) (1— F)1 1 — DA) dF

(17) = (1— F(x))A(xAy)

for the cumulative hazard function A of (6.1.3). Likewise,

(18) Coy [ U n(F(x)),f'n(Y))= p,.(1 — F(x))A(xAy).

In similar fashion

Cov [W n (F(x)), J y h dM'


J
=Covi (1—F(x))
LL
J x (1 — F) - 'dM', J y hdnllo„)

xAY

(19) = (1 —F(x)) hdA


J
and

(20) Cov I U,(F(x)),


l
J h dMc.
Y I = p ^(1— F(x)) 1 . h dA.
As in (12)

1
Cov W„(F(x)), f h dNN„(F)1

(21)
= Cov L

= VI.h(X,Y)
i 1 dW„(F),
f y.
h dW „(F)
1
288 MARTINGALE METHODS

for the Vl , h of (10), and

(22)
I
Coy U,(F(x)), J y
m
h dW„(F)I = Pn Vd.h(x, Y)•

For the limiting process: Just replace p„ by p in (16), (18), (20), and (22),
while (17), (19), and (21) remain unchanged.

Replacing h by h„

Consider now

(23)
1 h„dW„(F),K„=
.0 1 hdW„(F),
.0

h.dW(F) and K= hdW(F).


J o

The condition we require is that

J
(24) (h n _h) 2 1! 2 dF_) o as n--+oo

as a measure of the goodness of approximation of h„ to h.

Proposition 1. Let at>0 be ” on (—an, +ooJ. Suppose (24) holds. Then

(25) 110[x„ —K]1 - 0 under (24)


and

(26) II'i I f _ h„ dlU4r„(F) —pö„ -^ P 0 under (24) and 4i e . 2 (F)


.JIB
for the special construction.

Proof. Inequality 2 gives

(a) f'(jj4[IÖ„—K71^ W>2e)<_2s -Z I . (h„—h) 2 1p 2 dF^O


1

under the hypothesis on 4i of (24). Of course, this implies (25). InegL.ality I


gives, thinking of 0 as a large number,

P \II 0 LJ-W h°dW (F)—lfl]r0^3e)


PROCESSES OF THE FORM f__ M„ dh, I__ U , (F) dh, I' --
W„(F) dh 289

^P( c& J (h „—h)dW„(F)II >_ 3a)

(b) E -2 J B (h„ - h) 22 dF+ 2


l W [I-F(B)]Z

x J B (h „ - h) 2 dF J B ti 2 dF}
111
(c) <.a for n ? some n . e

On [ 0, co) the function ci is irrelevant and may be replaced by si = 1; then a


symmetric argument in Inequality I gives

^3e)
P \II -^ h „dW „(F)—K„ Il e
P(II I - ^ (h„—h) dW n (F)11 >3e)

(d) <a for n ? n . F

Combining (c) and (d), we have (26). ❑

Theorem 3. Suppose the c ; 's satisfy the u.a.n. condition (6.2.2). Suppose +s, h
and Oh are all in '2 (F) and Ir is >0 and \ on (—cc, +co]. Suppose also that
h„ approximates h in the sense of (24). Then

(27) q h „dW „(F)—


-^
f-^ hdW(F) lJII-^
J

for the special construction.

Proof. This is immediate from Theorem 2 and (26). ❑

5. PROCESSES OF THE FORM J_„ (^9„ dh, ,(_„ U (F) dh, AND
JW (F) dh

Basically, these processes are treated simply via integration by parts. We call
h of bounded variation inside (—cc, co), denoted h is BVI (—cc, co), if h is of
bounded variation on ( -0, 0) for all real 0.

Theorem 1. Suppose the c„ ; 's satisfy the u.a.n. condition (6.2.2). Suppose h
is BVI (—cc, cc). Suppose >0 is \ on (—oc, +oo], jcih + I is bounded by a
290 MARTINGALE METHODS

u-shaped function. Suppose 1i, h + , h_, /rh + and tih_ are f2 (F). Then

(1) III J -^ W (F) dh —W(F)h + + J h_ dW(F) :Iill


J
(2) IIL J -^ U„(F) dh —QJ(F)h + J + _^ h_ dv(F)J
o 1I -, 0,
and

(3) III J -00 li✓0„ dh —bOh + + J ^ h_ dMj q/11 -


_ p 0

for the special construction.


Note that
n
- I iI
where

(5) Kir =f ' [l[x,,^r] — F(Y)]dh(Y)

=(0, Var[{h(x+)—h(X,, ; —)}1 [x,,.. ]]).

The special case x = co gives (using h + and h_ in Y2 (F))

(6) Y,,,=Ynr.= f v31(F)dh =(0,Var[h-(X,j)])

where U 11 (F) = f (F, ; — F) is the empirical process of the one observation X,, ; .

Proof. Integration by parts gives

(7) 41(x) 1 x W (F) dh = W,(F(x))*(x)h+(x) — qi(x) L h- dW(F)

(8) -gypW(F(x))fr(x)h+(x)—(x) f x h- dW(F) in 11 11

by Theorem 6.2.1 and 6.4.2 provided the c ; 's satisfy the u.a.n. condition (6.2.2),
provided (for Theorem 6.2.1) ./ih + is u-shaped and i/ih + E '2 ( V) and provided
(for Theorem 6.4.2) h_, ./rE' 2 (F), fr is %, and r/ih_ E.2(F).
REDUCTIONS WHEN F IS UNIFORM 291

The process U" (F) is a special case of W (F). The proof for M. is identical,
except that Exercise 6.2.1 replaces Theorem 6.2.1 and Theorems 6.3.2 and 6.3.3
replace Theorem 6.4.2. ❑
Exercise 1. Show that, under the appropriate hypothesis from above,

(9) Coy I
L
J x W (F) dh, f
1
-
h dW (F)n I
L Li . hdF
-F(z)
L J
1
i dF dh(z).

Also, W may replace W" in this formula.

6. REDUCTIONS WHEN F IS UNIFORM

Suppose F is the Uniform (0, 1) df L Then (6.1.11) gives

(1) ASU"(t)=Z (t)=U"(t)+


f.,
(s) ds
"

1—s
U
for0<_ t^1

for the basic martingale. Setting h =1 in (6.4.14) reduces it to the same identity
as (6.1.31), namely

1 d for0<_t^1.
(2) U" (t) =
J 01 — s
(s)

Also, (6.4.2) with h = 1 reduces to

(3) U"(t)=M"(t)— dfW "(s)=(1—t) dtv1 "(s)


f.
' t S ^^o 11s
for0<t<_1.
While the rhs of (2) is a martingale, the rhs of (3) is not. Note that the
relationship between M. and U. is invertible.
Passing to the limit in the above equations gives

(4) M(t)=Z(t)=U(t)+
f , U(s)
o1—s
ds for0<—t <_1,
(5) U(tt — f 0
1 1 s dM(s) for 0s i 1,

(6) UJ(t)= '1(t)— dM(s) =(1—t) for df (s) for0<t^1.


—sf t -s 1—s


292 MARTINGALE METHODS

More generally, (1) leads to

(7) o h dM
f .,
hdU+
fo ,
h(s) (s ds ^s for 0 t <_ 1, h c Y 2 .

Likewise (6.6.3) and (6.6.2) lead to


^ (' r 1 1
(8) o hdU= J o h(s)+1 I
f,
"
h(r)dr d_
] 0
- i — t, ^ h(r)dr
for 0<_t<_1,hEY2i

where the rhs of (8) is rich in martingale structure. Finally, (6.4.2) and (6.4.3)
lead to

(9)f o
,
hdU=
fo ,
Ih(s) —j1

for 0<_t<_1,hEY Z .
-'-
f^ '
h(r) dr dM(s)

Of course

(10) W, W, M. and M may replace U,,, U, M,, and M above


for the Z,, = f ,, and Z = M of (6.1.40)-(6.1.49).
CHAPTER 7

Censored Data and


the Product-Limit Estimator

0. INTRODUCTION

Let X 1 ,. . . , X. be iid nonnegative rv's with arbitrary df F on [0, co) (which


we always assume below to be nondegenerate) and let Y,, ... , Yn be lid rv's
with arbitrary df G independent of the X's. We will refer to the X's as survival
times and to the Y's as censoring times. Suppose that we are only able to
observe the smaller of X; and Y; and an indicator of which variable was smaller:

(1) Z; =X;AY, 5 ; =1 [XY] , fori= 1,...,n.

This random censorship model is a useful model for a variety of problems in


biostatistics and life testing: see Kalbfleisch and Prentice (1980) and Gill
(1980) for more general censoring schemes and other problems involving
censored data. Note that if G is degenerate, then this model reduces to the
fixed censorship model. We also allow the possibility G = 0, corresponding to
no censoring.
Our main concern in this chapter is the Kaplan-Meier (1958) product-limit
estimator C. of F defined by

^S„:— sn:i
(2) 1—^n(t)= ^[ ^1— 1 ^1—
Z „,15, \ n—i +I z„ :5, n—i+l

<t<co,
for 0_

where 0 <_ ZZ ,, • • • <_ Zn ,„ < oo and S„ :; are the corresponding S's. [If there are
ties in the Z's, the uncensored Z's (S = 1) are ranked ahead of the censored
Z's (S = 0).] Thus 1 —F n is a right-continuous, \ step function on [0, oo) with
constant value 0 or fj"= ,(1-1/(n — i+1)) b„ ' on [Z,,.,,, co) depending on
whether S n: „ is equal to 1 or 0, respectively. We have thus chosen the convention
293
294 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

that 1— F„ equals 1— lF„ (Z,,,,,) for t >_ Z,,,„ when 5,, ,, = 0; other conventions set
:

it equal to zero or leave it undefined in this case: see Meier (1975), Peterson
(1977), and Gill (1980). We focus on 9„ since it will be shown in Section 8 to
be the maximum likelihood estimate of F based on the (Z , 3 )'s. ; ;

The corresponding empirical process X„ is

(3) X,, = ✓ n(F„ — F) on [0, co).

Our object in this chapter will be to present results for I„ and X. which parallel
results given for F. and U,,(F) in the uncensored case of previous chapters.
A major difference in our approach here will be that in the later half of the
chapter we will rely heavily on the martingale theory of counting processes,
as recently developed by Aalen (1978a, b), Bremaud and Jacod (1977), Gill
(1980), Meyer (1976), and Rebolledo (1978). Some of the key results of that
theory are summarized in Appendix B. A nice elementary treatment is found
in Miller (1981).
Before describing the contents of the remainder of this chapter, we need
the following further notation and definitions. Note that the pairs (Z , S ), ; ;

i = 1, ... , n, are iid with

1—If(t) = P(Z> t)=(1—F(t))(1-G(t)),

(4) H'(t)=P(Z<_t,8=1)= J (1—G_)dF,


[o,]

H 0 (t)=P(Zt,8=0)= I (1—F)dG,
.1 [0.]

for 0<_ t < oo, so H= H+ H ° . For any df F let TF = F '(1) = inf { t: F(t)= 1 },
-

and note that r = TH = TF v TG . Define empirical (sub-) distribution functions


corresponding to H, H', and H ° by

in

h
(5) H,(t)=— 1[o,t](Z;)s;, for0<t<oo

n
^n(t)= — E 1 [01](Z;)(1 — S;).
n

If A is a right-continuous function with left-limits, we write A_ for the


left-continuous version of A: A_(t) = A(t —) ° lim. T , A(s). We also write

(6) AA=A—A_ and Ac(t)=A(t)— E AA(s).

We adopt the convention throughout this chapter that 0/0=0.


INTRODUCTION 295

The cumulative hazard function A corresponding to F is defined by

(7) A(1)= 1 dF
fro") l-F_

=
f ro , t) (1-F_)(1-G -)
1
(1- G_) dF

(8) = J [0 , 1 (1-H_)
1 dH' for 0 <_ t < TG

[If F is absolutely continuous with density f and hazard rate A = f/(I - F),
then A(t)=JA(s) ds and I - F(t) = exp (-A(t)).] In view of (8), a natural
empirical cumulative hazard function IA,, is defined by

(9) An(t)= dHn


fJ ro,, 1 —^ n
for0^t<oo.

We choose to define A. in analogy with (8) rather than (7) since it is obvious
that IIH„ - H tI - a.s _0 and IIH1„ - H' II a.s. 0, while supo<,<T CIF, -Fit a.s.0 is a
much deeper result that is to be shown in Section 3 [by first showing that the
ÖA,, of (9) satisfies IIOA n - A ii ' a . s _ 0 for any 0 < r = F ' (1)]. In Section 2 we
, -

will ask the reader to verify that

(10) A „(t)
=fro t] , I —
I, di„ for0t<oo,
1F n-

and that

1- 1
(11) (1 - ? n )(1 -^ „)°1-H„ for 1 -I „(t)= f
z-i\
n-i+l) n.,

Now any df F on [0, oo) obviously satisfies

(12) F(t)= f (1 -F_)dA.


ro r) ,

We also have from (10) that

(13) G`„(t)=
fco,t]
(I -9n-) dA

The natural empirical cumulative hazard process is

(14) B „=Vi(A„-A) on [0,cO).


296 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

We also consider

(15) X"/(1— F) =./n(g" — F)/(1— F) on [0, co).

The study of X,, our main interest, is facilitated by introducing these other
two processes.
In Section 2 we will establish identities that represent B. and X,,/(1— F)
as stochastic integrals with respect to the basic martingale (see Theorem 7.2.1)

(16) v,,(t)=fl H(t)—


L
J [0,1]
(1—H"4dA
I for 0<t<oo.

These identities will make possible the application of Rebolledo's powerful


martingale central limit theorem (Theorem B.5.1) to establish the convergence
of the processes B. and X"/(1— F). However, in Sections 1 through 4 we will
concentrate on some more elementary approaches to the theory. In Section 8
we extend our results to more general censoring schemes.
From now on in this chapter we will use the conventions

(17) J=
I
r]
S.
for s> 0, while
o f
ro ' tj

1. CONVERGENCE OF THE BASIC MARTINGALE M.

Consider the basic (martingale, see Section 7.5)

(1) Mn(z)= / [H(t)_J r (1 — H n _) dAI


0

=[H„(t)-H'(t)] +,i[ J (1—G-) dF—


0 E (1—H"_) dA
I
= ✓n[ Uff'„(t)—H'(t))+V [ J (1—H_) dA— J o (1 — H" ) dA
-

J
= ✓n[H(t)—H'(t)]+ J t
0
v(H"--H_) dA

Ij

=^n[H;,(t)—H1(t)]+Jo /(H” —H_) dF


(1—H_)

H)
(2) = ✓ n[Hn(t)—H'(t)l+ f0 1 (H
v—H_ —
" dH'.
CONVERGENCE OF THE BASIC MARTINGALE M„ 297

Note from (1) that

( l
IYn(t)=—
. ^^_
1
n
n
[ 1 [Z1 I) 6 I
'
— dJ 1 1z; > u1
o
A(u)
J

(3) =T ^ 1 M,^(t),

where (eil,,, ... , M i . are iid processes. It is easy to compute that

(4) Coy [A^(s),vn(t)l= Coy [M 1 ,(s),A11(t)] for o<


—s <_t

=Cov 1 1 [z I5 5 ‚S— fo^ I(z^ui A(u), 1 [Zt] 6— f 1[Z v] dA(v)I


"

LL o

= E(1 [z s] 61 [z .,, ․ )+ f5 J0 0
E[1 [z ^ U] I [z 0] ] dA(u) dA(v)

rs

Jo E(l[z,S]31[z] dA(v)— Jo E(1[z ,]&1[z ^ u]) dA(u)


)

=
J 0
S(1—G_)dF

+{
f
l , .
^s [1 —H(u v v —)] dA(u) dA(v)
0

+
f J [1
o" 5
' —H(v—)]dA(v) dA(u)}
1

5 [1 — G(v—)] dF(v) dA(u)


o [u,s]

—II [1 —G(v—)] dF(v) dA(u)


o [n t]

=
J p
5 (1— G_) dF—

since AA(u) = AF(u) /[1— F(u—)]


u• s
[1— H(u—)][AA(u)] 2

(5) =V(s)= s(1 — G_)(1— AA) dF=


0 f ' s
(1— H_)(1 —AA)dA

for general F and G

(5') = J 0
(1 — G) dF = H'(s) for continuous F and general G.
298 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

We note from the above calculations that

(6) bon has uncorrelated increments.

Given (3) and (5), a weak convergence result will be straightforward.

Theorem 1. Suppose F and G are arbitrary df's on [0, oo). There exists a
special construction of both the rv's X,, Y,,,, ... , Xn ,,, Y,,,, n >-1, and the
Brownian motion § for which

(7) IIUn—S(V)IIO 4 a.s. 0 as n-4oc;

here M. =v[O;,-!o (1-l-1 _) dt1} is the basic martingale of Xn ,, Yn ,, ... ,


Xnn , Yn „ from (1) and V(t) =J (1- G_)(1-LA) dF as in (5). Also

(8) b9 = S(V) on [0, oo).

Proof. By the ordinary CLT based on (3), and by (5), we easily have

(9) Mn->f.d.§(V) as n-oo.

Also note that

(10) En(t)= ✓ [^(t) - H'(t)}==[z<rtt - H'(t)}

for 0<_t<oo,

where the 1 [z 8 i are iid Bernoulli (H'(t)) rv's. Thus a minor variation on
our ordinary theory of empirical processes shows that

(11) IIE,', - V(H')II a.5. 0 as n - oo

with Brownian bridge V, for a special construction; recall that H' is a sub df.
Also

(12) 1En=^[4 p n -H)=U.J n (H)


-

satisfies

(13) -.i [( i -IM n -)-(1-H_)}=Vn[H„--H-]= =U (H-)

for an appropriate uniform empirical process U,, so that we may also assume
of our special construction that

( 14 ) IIEn_ - U(H-) a.s. 0 as n-oo


CONVERGENCE OF THE BASIC MARTINGALE M . 299
for a Brownian bridge U. We also note that

Cov [V(H'(s)), U(H(t))] = Cov [E (s), E. (t)]


=E {[1 (z . s] S—H'(s)][1 (z . , l — H(t)]}

= H'(s A t) — H'(s)H(t) — H'(s)H(t)+H'(s)H(t)

(15) = H'(s A t)— H'(s)H(t) for 0 <s, 1< X.

Since (2) can be rewritten as

(16) „ = CE 'n+ (E„ - /(1— H_)) dH.


fo
,

we define

(17) 111 = E'+ J (E - /(1—H -))dH'

with

(18) E' V(H') and E= U(H).

Finally, we note that for a.e. w we have

JJM„ —Il —IE'II+IIJ [(E — E_)/( 1— H_)] dH' II


0

_ 1IE — E'II+II {[U„(H -) — U(H-)]/( 1— H_)} dH'


J

(a) JIIR— E 'll+1I[v „(H -)—U(H- )]/(1 — H-)'"11

x J^(1 —H_) - 3 ' 4 dH'

(b) ^lIE E +jI(U „—U)/(1— I)'/4 1 1 t (1— H_) -314 dH


since H = H' + H °

= 0(1)+0(1) J ^(1— H_)


0
' dH
-3 4 by Theorem 3.7.1
(c) =o(1)
300 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

since

(d) (1— H_) -314 dHs J (1—t) -3 / 4 dt <oc


0 0

by (3.2.58).
Note that we did not have to use (15), (17), and (18) to determine the
covariance function of M. Rather, (5) sufficed. 0

Remark 1. In the spirit of Example 6.0.1 we note that

(19) E[d(l[,5,] 8 ,)I.%,_]=dA1(t)=1[z .,]dA(t). ;

This leads directly to the guess that

(20) h 1 1(t)=1 [z .^ l] S ; —A 1 (t)=1 [z; .s t ,s ; — J1 [ Z.U ] dA(u)


0

is a martingale. Similar heuristics give

(21) "
(hIr)(t)= J 1[1—DA,]dA,= J 1 [z . z , ] [1—AA]dA,
0 0

so that

Coy [h11(s), R,r(t)] = EMl 1(s A t) = E(t u)(s A t)


sA1

=J [1— H_][1—AA] dA = V(s A t)


(22) 0

for the V of (5).

2. IDENTITIES BASED ON INTEGRATION BY PARTS

Throughout this section

(1) F and G are arbitrary drs on [0, cc);

recall from the introduction that this means F is not degenerate, while G
could even be degenerate at +co.
We begin by stating the key identities; the first is elementary. Let

(2) T=Z.,:,, and J(t) = 11 0 1 (t)= 1[o,T](t).


IDENTITIES BASED ON INTEGRATION BY PARTS 301
Theorem 1. For all n ^ 1 we have

— (r J — ^eA
T 1
(3) Bn(t) J J. dl^ll„ — dlyU„ for 0 <_ t <_ T
p 1-l _ p 1-H_
and
(4) SS „(t) (` 1 —^„_ J„

dvt for0<_t<TFand0^t<_T.
1 —F(t) Jp I —F 1—U-0 „-

Of course (3) implies

(3') (t) =B „(tA T)=


J ^ J” dMN
p 1 -6-0 n -
—t <oo,
for 0<

and (4) implies

Zn(t)— RC n (tAT) —f IJn- J„


dMA for0st<oo
1 —F(tAT) o' 1—F 1 —Oil„_

(4) provided F and G are such that T < TF a.s.

Proposition 1. (Aalen and Johansen; Gill) For any df F on [0, oo), and i n , A,
and /A„ defined by (7.0.2), (7.0.8), and (7.0.9), respectively, we have, for
0 <t<oo,

(5) 1— F(t)=exp(—A`(t)) fj (1 —AA(s)),


s<<

and

(6) 1 —G`n(t) fl (1 —A \n(s))•


SSt

Also, for 0:5 t < TF, we have

14^ (t) — ` 1 Fn- ( —

-1— Jp
(7) 1— F(t) 1—F d(A " —A)

or

../ [^n(t)—F(t)] It 1 —lF"- „.


(7) 1 —F(t) Jp 1—F d0$

As in (6.1.31'), (1 —lF „(t))/(1— F(t)) is the "exponential” of the (local)


martingale Jö (1/(1 —AA)) d(/A,, —A), a terminology which is justified by the
theorem of Doleans-Dade, B.6.2, or by the proof which follows.
Y_
302 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

Lemma 1. (Gill) Let A and B be right continuous 7 functions on [0, co)


with A(t)=B(t)=0 for t <0 and DA <_ 1, OB < I on {O, cc) where

(8) zA(t)=A(t)—A(t—) and A`(t)=A(t)— DA(s).


s <,

Let TB = inf { t: B(t) = cc}. Then the unique locally (i.e., on each [0, t]) bounded
solution Z of
Z(s—)
(9) Z(t) =1—
f.' d[A(s) — B(s)]
1 —AB(s)

on [0, TB ) is given by

(10) Z(t) = exp ( A`(t)) — fl,., (1 DA(s))


exp ( — B`(t)) fist, (1 —AB(s))

Proofs

Proof of Lemma 1. We follow Gill (1980), who adapted the proof of Lemma
18.8 in Liptser and Shiryayev (1978).
We now show that (10) does define a solution to (9). The right-hand side
in (10) is clearly locally bounded on [0, TB ). Let
U(t) — 1—AA(s)
o<s<< 1—AB(s)
and

V(t) = exp (—A`(t) + B`(t))•

Then using
I
(a) AU(s) = U(s) — U(s—) = U(s—) U(s) —1
U(s—) J
= U(s—)
1 1—AA(s) 1
1—OB(s)

in the second term of (b) and seeing the paragraph below for (c), we obtain

1—Z(t)=1— U(t)V(t)

fo,
= 1— U(0)V(0)_J U(s—) dV(s)—
0
J , V(s) dU(s)

by (A.9.13)

1 —DA(s)
U_Vd(—A`+B`)— Y, V(s)U(s—)
(b) _—
fo, o =_s <, 1—AB(s)
IDENTITIES BASED ON INTEGRATION BY PARTS 303

(c) = ^ t
z_
d(A` —B`)+ E Z(s—) [AA(s) AB(s)] —

JJo 1—OB 1 —AB(s)

(d) = J f
0
Z
1—OB
d(A —B),

where (1 —AB) - ' could be introduced into the integrand of the first term of
(c) for free because A` and B` are continuous. Thus(10) satisfies (9) by (d).
To show uniqueness, suppose that Z' is another locally bounded solution
of (9) on [0, TB ). Let Z=Z—Z', L(t) 1111(, and a(t)=
jö(1—AB) 'd(A +B) <oo for 0 <—t<-r e . Then, for any s <
- — t, differencing in
(9) gives

(e) 12(s)I <_


f" ' 12(u—)I da(u)< L(t)a(s).

Substituting the outer inequality of (e) back into the first inequality of (e) gives

(f) Z(s)I L(t)a(u—) da(u)L(t)a(s) 2


fs

using the left-hand side of (A.9.17) with r = 2 in the second step. Repeating
this up to any r yields

(g) I2(s)I: r^ L(t)a(s)' -*0 as r - oo

since exp (a(s)) =Y (1 /r!)a(s)'<oo implies a(s)'7r!-^0 as r -oo. Thus


Z(s) = 0, and the Z given by (10) is the unique locally bounded solution of
(9) on [0, TB ). ❑

Proof of Proposition 1. Now (5) follows from (10) by Lemma 1 with A= A,


B = 0 and Z = 1—F. Then (6) is a special case of (5). To prove (7), take A = D\„
and B=A in Lemma 1; specifically, use (5) and (6) in (10) to see that
Z = (1— F„)/ (1— F) works in (9) for this choice of A and B. Finally, note that
(1 — '_)(1 —AA) = 1— F. Thus (7) holds. ❑
Proof of Theorem 1. From the formula for/\. in (7.0.9) we have for 0<_ t <_ T that

13,,(t) V[&(t)—A(t)J d00 ”


V LJo 1 — C-0 „_ ,) o dA J

(a) = ✓n
'

0 f d H'„— (1—H „_)dA =


1 — Il I_ L
-
., J J0 1—H,,_
d A^O„.

Thus (3) holds. Then (4) follows from (7') and (3). ❑
304 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

Exercise 1. Show that the product-limit estimator F. reduces to the usual


empirical distribution function F. when all observations are uncensored, 6, = 1,
i =1,...,n.

Exercise 2. Verify (7.0.10) and (7.0.11).

Exercise 3. Show that (4) and Efron's (1967) identity

(11) 1—tF„(t)=1—H„(t)+[1—^ n (t)] f—^dHn


0 1—^„-

are equivalent.
Exercise 4. (Breslow and Crowley, (1974) Show that

ol
(12) 0<—log(1— „(t))—A„(t)< 1 f 1 dH'
n (1 — Hn)( 1 — Hn—)
I l-0„(t)
(13)
<n 1—H„(tY
[Hint: 0<—log(1-1/(x+l))-1/(x+l)<1/x(x+1) for0<x<oo.]

3. CONSISTENCY OF ÖA„ AND f„


In this section we use the representations of Theorem 7.2.1 to establish strong
consistency of l\,, and F„.
Theorem 1. Suppose F and G are arbitrary dfs on [0, oo) in the sense of
(7.2.1). Recall T =TH H '(1) where I—H=(1—F)(1— G). Then
-

(1) sup j1F„(t)—F(t)I a . s .0 as n-- 00.


O 1<

[Note (3) below for 9„(T).] Fix 0 < T=- H '(1). Then -

(2) Il1n-1IIO ^a.s.0 as n -+cO•

Proof. Now IH n — H II -ß a . s .0 by Glivenko-Cantelli, so that (IH„_ H4 I -> a . s .0


also. Thus for any fixed t <_ 0 we have a.s. that

IA,(t) — A(t)I J(1 —H,,' —(1— HY'J dH„


a

+
J 0
(1—H_)-'d(H'.—H')

(a) = o(1)+ 10
(1— H-) 'd(H — H') (
- by Glivenko-Cantelli

(b) =0(1) by the SLLN.


CONSISTENCY OF A, AND F„ 305
Since A,, and A are 7, the standard argument of (3.1.83) improves (b) to (2);
we must truncate off at some 0 since A(T) = oo unless AA(T) > 0.
From (7.2.7') we note that for t <_ 0 < T TF we have a.s. that for n sufficiently
large
1—(F „_ d(A„—A)
(c) jt „(t) —F(t)j[1—F(t)]i
t 1—F
1
5

Jo ' 1—F_ 1 —AA


d—A)

dK„ I where K,,— —AA d(A„ —A)


(d) f 0 I — F_ f, 1
'

=
J (left continuous)d(right continuous)

={ —l 9„(t)
1 —F(t) } K„(t) — Jo K„d ( 1—F)
by (A.9.13)

^ 1 } IK„(t)l+^^K„1I0 fo
{I — F(B) d
(e) ^ l^K„IjöO(1) a.s.

using (A.9.14) and (A.9.16) for the 0(l) term. However,

K„(t)I = I Jo d(Ö1„ —A)+ Jo d(A„ —A)


1—F
(f) 01 „(t) —A(t)I+) E AF(s)
[AA „(s) —AA(s)]I.
sue, 1 —F(s)
AF(s)>0
We thus have

ItiA„ — AIIö+ 2 IIA„ — AIl F(0)/[ 1— F( 0 )]


(g) 4a.s.0.

Together (e) and (g) give t„ (t) -* a.s. F(t). Since i„ and F are >', the standard
argument of (3.1.83) improves pointwise convergence to uniform convergence
on [0, 0] for any 0< T. This extends to uniform convergence on [0, T), since
F is bounded. This completes the proof.

Several possibilities exist at t = T:


(i) Suppose T = oo: Then (1) already covers all possible t's.
(ii) Suppose T < and G(T—) = 1: Now t„ (T —) -> a _ s. F(T—) by (1). Since all
Z; 's are a.s. less than T, F„ (T) = a.s ,F „ (T —) -te a s. F(T—). This limit is wrong if
F(r)> 0.
306 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

(iii) Suppose T < oo, G(r-) < 1, and AF(T) = 0: Obviously,


Fn(T) = a.s.Fn(T — ) a.s. F(T—)= F(T).
(iv) Suppose T < oo, G(T-) < 1, zF(T) > 0 and F(r)=1: Then all observa-
tions on [T, co) are uncensored observations at T, and so L n (T) = a s 1 for
n , some n^, as it should.
(v) Suppose r < oo, G(r-) < 1, AF(r)>O, and F(r)<1: Then proportion
AF(T)/[1- F(T-)] of the remaining approximately n[1- H(T-)] observa-
tions are uncensored observations at T. Hence F n (r)-* a. ,.F(T-)+
[1- H(T-)]AF(T)/C1- F(T-)] = F(T-)+[1- G(r-)]F(r) ^ F(T).

The five cases of the previous paragraph include all possibilities. We can
summarize them as

F(T-) = F(T-)+[1- G(T-)]AF(T)


if T < 00, G(T-) =1, AF(T) > 0
(3) ^n(T)^a.s. F(T )+[1
— — G(T )]AF(T)

if r<oo, G(T-)<1,AF(r)>0, and F(T)<1


F(T) in all other cases.

To thus answer an open question of Gill (1983),

(4) llt„ - F1I -*a.s.0 as n - co can fail;

but it can only fail if T < oo, G(r-) < 1, iF(T) > 0, and F(r)<1 [hence
G(r) = 1]. This paragraph is from Shorack and Wellner (1985). ❑

4. PRELIMINARY WEAK CONVERGENCE = OF B N AND X,,

The results of this section, but with F assumed continuous, can be found
in Breslow and Crowley (1974). The outline of the present proofs are somewhat
different, however, being based on the identities of Section 7.2.
Suppose F and G are arbitrary df's on [0, co) in the sense of (7.2.1). We
define

(1) C(t)
J
o
t (1-H) -2 dV= J (1-H_)'(1-
0

for

(2) V(s)=j (1-G_)(1-AA)dF= Jt (1-H_)(1-AA)dA


0 0
PRELIMINARY WEAK CONVERGENCE='OF n N AND X, 307

as in (7.1.5). Recall that

(3) AA= 1 A F, and 1— DA = 11— F_

Note that when F is continuous we have

(1') C(t) = J 0
" (1 — H) - 2 dH' = J 0
r (1—H) ' dA
- if F is continuous,

since

V(t) = (1— G) dF= H'(t) if F is continuous.


(2')
J 0

The natural limiting process to associate with the B. process of (7.0.14) or


(7.2.3) is

(4) B(t)= J ' (1—H_) ' dM -

[1— H(t)]-'I^II(t)—
f o,
Md[1 —H_] - ' for t^0,

where M =S(V) as in (7.1.8). (Note (3.4.28).)

Exercise 1. Use repeated integrations by parts to show that

(5) Cov[3(s),B(t)]=C(snt) for 0<s,t<co.

We thus see that

(6) 8 —S(C)
for Brownian motion S.

Theorem 1. Let F and G be arbitrary df's on [0, co) in the sense of (7.2.1).
Fix 8 <T= H '(1). Then -

(7) IIBn - 8Ilo-+a.s.0 as n - oo

for the special construction of Theorem 7.1.1.

We now turn to the X,, = ^(f n — F) process. Noting (7.2.4), the natural
limiting process to associate with X. is

1
(8) X(t)= [1 —F(t)] fo dM
' 1—F 1—H_
308 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

where [recall (4)]


i t 1—F_
(9) f 1
,' 1—F 1—H_
df fH_
j0o 1—
1
df^N+ s^‚
1 OF(s)
1—H(s—) 1—F(s)
AM(s)

and ßi0 = S(V) as in (7.1.8).

Exercise 2. Show that in general

(10) Coy [X(s),X(t)]=[1—F(s)][1—F(t)]D(sAt) for 0<s,t<oo

where

(11) D(t)_ l—F_ 1 ^2dV= f0t 1 1 dH'


f,'r11—F 1—H_ (1—H_)z 1—DA
_1
ol f 1
1—H_ 1—AA
dA.

We thus have that

(12) X = (1— F)Z(D) for a Brownian motion Z.

Note also that

(13) X=(1—F)(1+D)U(l D) 1 U(K°),


1 KD
where U is the Brownian bridge defined by l (t) = (1 + t)U(t/(1 + t)) and where

(14) KD (t)=D(t)/[1+D(t)] for t>_0.

Finally, we note that

(15) D— C and X = (1— F)B when F is continuous.

Theorem 2. Let F and G be arbitrary df's on [0, co) in the sense of (7.2.1).
Fix 0<T= H '(1). Then
-

(16) jIX„ — XII0-a.5. 0 as n-+oo


for the special construction of Theorem 7.1.1.
Proof of Theorem 1. Now (7.2.3) and integration by parts gives that for t < 0
we a.s. have that for n sufficiently large

(17) B„ (t)=[1 —
Hn(t)] `R„(t)
-
— J tb0„d[1—B]-'•
0
PRELIMINARY WEAK CONVERGENCE=OF B N AND K „ 309
Using I IH,, - HII a .,.0, IIM,,-* a . s .0 by (7.1.7) and lim T> 0 a.s., we clearly
have for the first term of (17) that

(a) [1 - H „( t )]-'NA „(t) -*a.s.[1-H(t)]-'Ai0(t) uniformly on [0, 0].

For the second term of (17) we have

(b) ION„d[1- ü-1„ -] ' _E(A. - Iv1)d[1- H„ -] - + MOd[1


0 0

where

Iiini „- lv) d [ 1 - H„_] -'lle-limI'n-^IV[1-H(0)]=0.


(c) °IIJ (N^0
0 0

To treat the final term in (b) we write


Md[1- IH„ -] ' = f , M,d[1- H„ -] + J iMd[1- H,, -] '
J 0
r -

o0
- , -

(d) =
J '
0
MI d[1 - H„ -]-'- f
o,
M-d[1

+ EAId[ 1 —Hn-]-''

The sum of the first two terms in (b) converges to

(e)f o,
fM+d[1-
0
H_] - - J ft d[1 - H_] -

J
= M c d[1 -H-] '
0
-

uniformly in 0 <_ 1< 0; convergence for each follows from the Helly-Bray t
theorem, while monotonicity gives the uniformity via the standard argument
of (3.1.83).
Finally, in (d),

J o
r
.s.5
H
01vlld[1- ,,-] ' = I DM(s)0[1-Uil„_] -
-

t
'(s)
(f) ->a.s. Y_ DA^O(s)A[1- H_] - '(s) uniformly for 0< t<_ 0
(g) = J ,

0
AMd[1- H_] '; -

to establish (f)
we just treat a finite number of the biggest jumps (of F)
individually using lH„_ - H_I) f and then lump the rest of the jumps
310 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

together with a bound of "Constant" holding uniformly on JAM(s)l and with


the remaining contribution to E « , A[I —H_] '(s) being less than c/Constant
-

[note that M(s)>0 can only occur if AF(s)>0]. Adding the terms in (a),
(e), and (g) gives the theorem. ❑
Proof of Theorem 2. Now by (7.2.4), for t < 0 we a.s. have that for n sufficiently
large
1—F„_ 1
(a) X„(t)= f dML
Jo

(b) =
f 0
' 1—!F„_1
df^N„ + J dfyU„.
('' 1—t„_AF 1
1—F_ 1—H-0 _ o f—F_ 1—•F 1—H- _

It's easy to treat the first term of (b) using the same technique applied in the
proof of Theorem 1; but since d(1—F„_)/((1— F_)(1—H„_)) is not d (a
monotone function), we need to use (A.9.14) and (A.9.16) to break this up
into the difference of two d (a monotone function) terms. It remains only to
treat the second term of (b). As in the proof of Theorem 1, the technique for
the second term in (b) is to pull out a finite number of the biggest jumps (on
which the convergence is trivial) and note that the contribution of the other
terms is negligible [provided enough terms were pulled out so that the contribu-
tion of the remaining terms to ^^, AF(s) is sufficiently small]. ❑

Exercise 3. Determine the covariance functions Cov [B(s), M(t)], Cov [X(s),
ß1(t)], and Cov[B(s), X(t)].

Exercise 4. If F is continuous, then C = D and K c = K D = K where K c


Cl (1 + C ). Especially, F K <_ H.

5. MARTINGALE REPRESENTATIONS

Throughout this section we consider processes {X1 : t >_ 0} adapted to 3„


{.3 : 0 <_ t < oo} with

(1) ;=Q{Itx<sl1[r^s7llx<s}s„l[y^s]S;:1 i <_n,0<_s<_t};

that is, each X, is g'-measurable. Intuitively, a process is predictable if knowing


its value on [0, t) determines its value at t. Thus any left-continuous process
is predictable. Likewise, processes like J fo ,, 1 (1—H„_) dA of (7.1.1) are predict-
able, even though right continuous. However, H;,, 0-U„, and F„ are not predict-
able. Formally, a process is predictable if it is measurable with respect to the
a--field on [0, cc) x Cl generated by the collection of all left continuous, p„-
adapted processes. A process X is called integrable if sup o ^, < , E^X,I <cc, and
square integrable if X 2 is integrable. If such a property holds for the process
MARTINGALE REPRESENTATIONS 311

{X, ATA : 0 < t < oo} for each k? 1 where Tk is a sequence of stopping times
satisfying Tk -> oo a.s. as k -> oo, then the property will be said to hold locally.
Recall that Z; = X; A Y; , S i =1 (X; < Y) ,

in in
(2) Hn'(t)=— 1 [o ,, t (Zi )5, and H„ =— 1 [o ,, } (Z; ) for 0<_t<oo
n ; _ I n ;_ I

where X 1 ,. . . , X. are iid F and Y,, ... , Y„ are iid G. Consider the process
H,. Now the predictable process

(3) A „(t)= 1 (1 —H _)dA for 0<_t<oo


0

is quite clearly an / process that is adapted to .^ ,,. We will show presently that

(4) M.(t)=^[6-U;,(t)— J (1—IH„ -)dA]


' for 0<_t<ao
0

(5) _ ^'n[Hn(t) — An(t)]

is a martingale. Thus U U„ = n ' /2 M,, + A„ is the so-called Doob-Meyer decompo-


-

sition of the submartingale H. into the sum of a martingale and an 1' process;
nA„ is called the compensator of nH ,. Since M. is a martingale, U. is a
submartingale. Thus it also has a Doob-Meyer decomposition of the form

(6) h'1lR( 1 )=('n(t)—(M„)(t))+(tAln)(t)

into a martingale plus an / process (M.), with both processes „-adapted.


The process (u „) is called the predictable variation process of M,,.
More general versions of these definitions and theorems appear in Appendix
B. We have repeated just enough of that appendix here to make the discussion
and statements of theorems of this chapter readable. Before the reader can
understand many of the later proofs of this chapter, he will have to become
familiar with Appendix B. We feel it worthwhile for the reader to master the
present section before continuing this chapter.

Theorem 1. Let F denote an arbitrary df on [0, cc) in the sense of (7.2.1). For
each fixed n >_ 1, the process

(7) {(^9„ (t), ): 0: t < oo}

is a square integrable martingale with mean 0.

Moreover, the predictable variation process of N„ is

(8) (AA„)(t) = J (1 —H n -)(1 —AA) dA.


0
312 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

The approach to weak convergence taken in this chapter is keyed on the


identities (7.2.3') and (7.2.4') that represent B,, and X,,/(1 — F) as stochastic
integrals with respect to the martingale R,,. Recall that

(9) T-Z,,.,, and J„(t)°1to,z:l(t)=1to,T](t)

Theorem 2. Let F be an arbitrary df on [0, oo) in the sense of (7.2.1). Then

(10) IB„ ( A T) is a square integrable mean 0 martingale

on [0, oo) adapted to 5„ with predictable variation process

(11) (B)= J” (1—AA)dA for0<t<cm.


J 1- Hn-

Also, provided F and G are such that T < TF a.s.

(12) ?LX„(•AT)/(1—F(•nT))

is a locally square integrable mean 0 martingale

on [0, oo) adapted to S„ with predictable variation processes

(13) (7L,T )(t)=^ r j 1 Z } J" (1—LA)dA a^


for0 <t<.
do 111 1 "F 111 1 ^4"_
— —

Our proof of (11) and a trivial bound on (1—„_)/(1—F) imply that


1—F„(tnT)
(14) , 0 <_ t <_ 6< TF, is a square integrable mean 1 martingale
1—F(tAT)
since IA T<_ 0 a.s., note Remark 2.
First proof of Theorem 1. Since M. is the sum of n iid processes, it suffices
to prove (7) for n = 1. That is, we must show that

(15) M I (t)=1 jz ^, t S—A,(t), with A1,(t)= o l z s dF(u),


tu^ 1—F(u—)
is a martingale with mean 0. Now

(a) El i (t)= J ot (1—G_) 1o ( —H_) 1 1 F, dF=0.


dF—^

Also
(1— G_) dF
(b) E{1tz<,JSjg'S} = S • I tz < st + J itz>st
ss)
1—H(
MARTINGALE REPRESENTATIONS 313
and

E {AI(t)I gis} = dF(u)j c }


E{ Jo I[z ^ u] 1— F(u—)

j
= E t\f o +Jst/Ilz^Utl—F(u—) dF(u)jJW


(c) —A,(s)+ l — F(u—)
1 dF(u)
=A'(s)+ t 0 I(Zc`l + l
—H(u—)
(d) Itz>st I dF(u)
1 —H(s) } 1— F(u—)
(e) = A1(s)+f (1—G_)
dF I[z>s].
,'1 —H(s)
Subtracting (e) from (b) establishes that the process of (15) satisfies the
martingale condition; and since (7.1.5) gives

(f) E[ltz<,13—A1(t)]2= V(t)-1,

the process in (15) is square integrable. Thus M. is square integrable. We have


verified (7).
We ask the reader in Exercise 1 below to verify (8) by direct calculation.
Our purpose in asking this is to give the reader an appreciation for the power
of the theorems of Appendix B, on which the proofs that appear in the rest
of this section are based. In particular, our second proof of Theorem 1 uses
them. ❑

Exercise 1. Verify (8) by direct calculation.

Remark 1. We again need the processes of (6.1.13) based on one observation;


to distinguish them from the present processes, we need new symbols. We
thus let F, ; denote the empirical df of the one observation X; , and then set

(16) M, (t)=F, (t)


; ; — J (1—F, _) dA=F, (t)—A, (t)
0
; ; ; for 0^ t<oo,

where

(17) A, ; = f (1—F, _) dA ; with DA, =(1 —F 11 _)A.


;

Then (6.1.26) shows that

(18) (M, )(t)= ; J (1 — F, _)(1—DA) dA


0
; for 0 <_ t <oo
314 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

(19) _ ^ (1—©A 11 ) dA, ; for 0<_ t <oo.


0

Now note that the A'O, ; of (15) can be represented in terms of the M, ; of (16)
as

(20) 9(t)=
E (1-6, _)dM, for0-t<oo,
; ;

with G, ; the empirical df of the one observation Y. Now 1-6_ is left


continuous and 3-adapted, and hence predictable. Thus (20) seems to rep-
resent lilrß, ; as a sum (integral) of terms of the form of a predictable coefficient
(integand) depending only on the past times a martingale difference (differen-
tial). This leads us to suspect that RN, ; is a martingale (which, in fact, we
already know to be true). This approach will be made firm in our second proof
of Theorem 1, during which the reader should note that Theorem B.3.1 is a
general result of just this type.
Second proof of Theorem 1. Compare (20) with (B.3.1) identifying Y, H, M
with RA I , (1 —G I _), M,. Note that

(21) AA,<_OF/(1—Fs F/OF=1.

We prepare to apply (c) of Theorem B.3.1. Note that

J0
(1 —G 1 _) 2 d(M 1 ) =
J (1 — G,-) 2 [ 1— DA1]

1
dA, <_
f 'c' (1
o
- 6,_)2dA,

=Jo [1—Q^,(u—)]21[X"]
1—F(u—) dF(u)
—r°° 1
Jo 11Y '"^ l ^ X '"^ 1—F(u—) dF(u)

(22) ]
— ,^o '[Z '" 1—F(u—) dF(u)

has

(1—Q's, )2d(M,)` f dF(u)


E f.^ ow E(1[z . }) 1—F(u—)
I.

(1—G_) dF=H'(oo)
= J o (1—H_) 1 1 F dF= J
, o

(23) -- 1.
MARTINGALE REPRESENTATIONS 315

Thus (c) of Theorem B.3.1 implies that NN, is a square integrable martingale
on [0, oo) with predictable variation process

(M^)(t)= (1 - G, 2 d(M,)=J(1- G) 2 [1 -A, ] dA,


Jo)
= J 0
t 1y s ]1[X^.s ] dA — J 1
[Y
t
o
> s] 1 [X ^., ] AA(s)1 [X ^ s} dA(s)

=
j l tz ,,, dA(s)- J 0
1 [Z as]AA(s) dA(s)

(24) =
Cr
j (1-Oil,_)(1-AA)dA=
0
f0
t(I -AA I ) dA l .

Since M. is just n - ' 12 times the sum of n iid processes of type fvll , it is also
a square integrable martingale on [0, oo). The proof will thus be completed if
we show that ftvl t) - (M, ; )( t) /n is a martingale. Now (24) gives

I „ I „
(25) - Y- (i„)(t)= - II [1 - H„-](1 -AA) dA
n ; =1 n r=i JO

=
J 0
t[1 -H„ - ](1 -AA)dA.

The proof that fill„ - (1/ n) Y , (111, ; ) is a martingale is virtually identical to


(g)-(i) in the proof of Theorem 6.3.1. ❑

When F is discontinuous, H„ is not a counting process (its associated


martingale is MO„) since all its jumps need not be of size one. Thus Theorem
B.3.1 does not directly apply. However, H'li—= 1 [z . ] S ; is a counting process
(its associated martingale is Our proof of Theorem 2 below will apply
Theorem B.3.1 for each 1 <_ i - n, and then show that we can add up the results.

Proof of Theorem 2. To obtain the predictable variation (B n) of (11), we will


apply Theorem B.3.lc with Y, H, M
replaced by B. J(1 -üU„_) ', and fMl „ ; 's
in (B.3.1) since (6.2.3') gives

(26)B „(t)=
f 1-B0„
o' J
” dM0„ for t >_0.
316 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

We note that Y_; (ßr4„ ) = (M„) and (recall (g) -(i) of Theorem 6.3.1)
;

co
E J /
(1 -
J.
Hl„-)
2
d(A)=E J^
0 (l
( J
-

H”-)
(1-AA)dA

c J„
(a) dA
-E o (1-a-0-)
=n Y 1 /n 1
f k kl
o k=l
(1-H_)H"_ k (1-F_) - ' dF<oo

since the finiteness of the term for k = 1 is most difficult, and for it

(b) n f n (I - H-) H"-- (1 - F-) - 'dF<n 2 J(i -G_)dF<_n 2 <Co.


o

Thus (c) of Theorem B.3.1 does apply, and implies that (3„ is a square integrable
martingale having
' J 2 J
(c) (Bn)(t)= f d(M" fo (1-AA)dA.
„ 1)= '

0 1-60„_ (1-I-0„_)

To prove (12) and (13), note that


1-o„_ \ J„ J„ (1-AA)dA
fo'(
` 1-F ) (1-D- _)Z
d(f„)=

fo'( 1-F) (1 -H -)
' 1 J„
dA
J 0 (1- F) 2 (1 - H„_)
(d) <oo for 0<_ t<oc, on [T<TF ],

as required by Theorem 3.3.1d.


The means of both martingales are zero since we clearly have EB (0) = 0 =
EZn(0). O

Remark 2. Let T* = Z„ : „_,, T** = Z„ : „- Z . Define Z, Z n ` and 1T ** in analogy


with (7.2.4'). Minor changes in the above proof show

(27) Z n is a square integrable martingale on [0, co) if r< TF ,

(28) Z„' is always a uniformly integrable martingale on [0, co),


(29) 1 n ” is a square integrable martingale on [0, xD) if
Jo [1-AA] - ' dF<oo.

6. INEQUALITIES

To extend the convergence of B. and 7L„ from [0, 0] to [0, T], we will require
the following three inequalities. The first is the Gill-Wellner inequality
INEQUALITIES 317

(Inequality A.2.11). Roughly, it allows us to replace the requirement of a


Häjek-Renyi inequality for one process with the requirement of a Kolmogorov-
Doob inequality for a different one through a statement of the form 11HZII <_
2IIlo HdZII. The key to this last problem is our second inequality, which is
Example B.4.1 to Lenglart's inequality (Inequality B.4.1). The key to its utility
is that the Kolmogorov-Doob expectation need not be computed; rather, a
probability must be made small. Its importance is such that we restate it here.

Inequality 1. (Lenglart) Suppose that {M(t): t >_0} is a locally square


integrable martingale with predictable variation process (M). Then for all
A > 0, n >0 and all stopping times T

(1) 1'(IIMII0 ^A)=P(IIM Z II0 ^A 2 )^A2 +P((M)(T)^n)•

Our third inequality provides analogs of Daniels's theorem (Theorem 9.1.2)


and Inequality 10.3.1.

Inequality 2. (Gill, 1980) Let T— Z . For all n >_ I and A > I

(2) P \II1—Fllo>^/c^
and

(3) P^II1 F II 0 , a 2 :5 +eAexp(—A)<^.

Proof. By (7.5.14) the process

( 1 ^ n \SA T) =1 _
— 1
Z s)= 1 —i—
—F(sA T) o,s„T] 1—F 1—H dam°

is a martingale on [0, t] for every t < TF. Hence, by Doob's inequality


(Inequality A.10.1)

(a) P( sup Z(s)>_,t)<_ 1 EZ(t)= 1 EZ(0)=— ;


Oct A A A

and this yields, for every t < TF ,

(b) P\II1—Fl^o^r-A/ c A

If r < TF, we are done. So suppose T = TF. Letting t TF shows that the
318 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

supremum in (b) may be taken over [0, T] — {TF }. If AF(TF ) = 0, then F„ (rF ) _
F„ (TF-) a.s. If F(T_) < F(T) = 1, then P(T = TF and F. (rF ) = 1) = P(T = TF ).
In either case, (2) holds.
To prove (3), note that 1— H =(1 — F)(1— G) and

(c) I—H„=(I—)(1—Qv„) on [0,°o),

see Exercise 7.2.2. Also note that

1—H T
(d) P( <_elexp(—A) for A>_1
1-0-0 o

by inequality (3.6.26), and

(e) for,1?1
P(II11 G Ilo^A)<
by (2) and symmetry of the problem in F and G. Hence

P \II1 — ^„ Ia ,'k P\
1—H 1—Qv„ T^Az
1 —@U_ 1—G Ilo
l by (c)

(II 1'H IIo-^ +P(Ii 6


l 1G Ilo ^^)
(f) :5eA exp (—A)+1/A by (d) and (e)
3/A since eA e. cp (—A) < 2/.1

as was claimed. C

7. WEAK CONVERGENCE OF B. AND X. IN II / q ^^ ö -METRICS

We already know from Section 4 that B. and X. converge on [0, 0]. In this
section we extend this to [0, T] and we introduce II /qII metrics.
We recall that

(1) 1—AA=(1—F)/(1—F_),

(2) V(t)= J (1_G_)(1_A)dF=


0
J 0
(I — H_)(1 — AA)dA,

(3) C(t) = — H_) (1 AA) dA,


— — K c -(t)=C(t)/[1+C(t)],
fo, (1
(4) D(t) = J 0
1 1
1—H_ 1-0A
dA, KD(t)=D(t)/[1+D(t)].
WEAK CONVERGENCE OF B. AND X. IN 11 /gII0-METRICS 319

Note from (7.1.8), (7.4.6), and (7.4.12) that

(5) (M)= V, (8)=C, (X/(1 — F))=D.

Warning: Recall that the Brownian motions in (7.1.8), (7.4.6), and (7.4.12)
are all different.

Theorem 1. Let F and G be arbitrary df's on [0, X) such that

(6) AF(rr) = 0 (which implies T < TF a.s.).

Then the special construction of Theorem 7.1.1 satisfies both

(7) II(3 — B)/[(1 +C)q(Kc)]I10 -A P O as n- c'3

provided

(8) q is 7 on [0, 1] with


E [q(t)] - 2 dt <oo,

and
(9) (I x „ —X 1 —KD —Ko) /(i —F))X —U(KD) IIT^p0
1—F q(KD)IIT
o II((1 q(KD) ho
as n-00
provided
(10) q is 7 and q(t)/.fit is ", on [0 2],q is symmetric about t =2
i i

and
fo , q(t) -2 dt <oo.

The most important case of (7) and (9) is when q = 1.

Remark 1. All proofs still carry through even if G = 0 (recall that this
possibility was specified in the Introduction). In this case D(t)=
Jo[(1 —F)(1—F_)] ' dF =(1— F(t)) ' -1= F(t) /(1—F(1)), so that
- -

(11) K D = F if G = 0 (i.e., if no censorship).

Moreover, (9) reduces [after symmetry about Z has been used to replace T by
00, and after we note that q(t)/.Jt\ was not used in half the proof] to

(12) JI[7^„—U(F)]/q(F)IIö APO as n-->co when G=0


320 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

for the special construction of Theorem 7.1.1 provided

(13) q is ,' on [0, 2], symmetric about '=1, and J[q(t)] 2 dt < co.
0

This is fully as strong as Theorem 3.7.1, and only slightly weaker than Chibisov's
theorem (Theorem 11.5.1).
Proof of (7). Let K = K. Consider [0, S] with 3 small. Now C is 1' and
right continuous. Thus inequality A.2.11 gives

(a) JI(Bn - B)ig(C)II ö `-210 [9(C)] - ' d(^n


-B)I!o

c2IIJo [q(C)]-' dB"IIo+jjB/q(C)j .


(b)

Then using (d) of Theorem B.3.1 in (c), we have


s

P( f [q(C)] ' dBl 'EI


-

0 0

(c) < -z + P
!3
( \^ o
q(C)
d > e3 l
/
by Inequality 7.6.1

(d) = e + PI J s [q(C)] - z d(Bf)>_ e 3 ` by (d) of Theorem B.3.1


\ o

z I J" (1-AA) dA?e 3 1 by (7.5.11)


(e) \JJJo q (C) 1 -ß-0,-

(f) =e+P( J z I J" (1-H_) dC ? e 3 ) by (3)


s0 9 (C) 1 -H,
1- H_ s Ccs> 1
J
('

(g) <_E+P II z dt > s' using (3.2.58)


o 0 9 (t)
(h) <_2E for S some S and n ? some n e , using (3.6.26).
E

The Birnbaum-Marshall inequality gives

(i) P(IIB/q(C)110^ e)^P(E [q(C)] 2 d(ß)>E

J) = P( [q(C)] — z dC>_ e') < 2e - for 88,


0

since B = S(C) has (13) = C by (5). Combining (h) and 0) with Theorem 7.4.1
WEAK CONVERGENCE OF D^„ AND X,, IN II /q-METRICS 321

gives, for any 0< T, that

(14) II(Bn — l3 ) /q(C)IIö- n O as n- X.

Now
(k) q(K)=q(C /(1+C)) >q(C /2) for t <some fi>0,

where q( /2) behaves near 0 like a function satisfying (8). Thus (14) extends
to

(15) II(Bn — B) /q(K)110 -+ n 0 as n - c0

for any 0< r.


Now consider (8, T] with K(0) >2. [Actually, there need not exist a 0 for
which C(8)? I and K(0) ? . If not, the proof is even simpler.] Let

(1) R =Tv0.

First, note that

(m) (1 +C)q(K) is 7.

Then, as in (a) and (b) at line (n),

II(Bn — B)/[(1 +C)q(K)]IIÄ


II[(^n —^n(B))—(lB— x(8))]/[(1 +C)q(K)]Il ä
+ I BT ( 0 ) —B(0)1/[(l + C( 0 ))q(K (8))]

C2 d[Bn
1 Bn(o)]
f. (1 +C)q(K)
'

(I„
(n) + 1 lB—!B(0)
R_o (1).
II(1+C)q(K)Iie n

As in (c) and (d)

P d[a — ^T(B)] R
[llL ( 1 +C)q(K) e
E l
sE+Pj
f R [(
a
+C)q(K)] z d(B n — ln(
3

(0) =E+P1 J
\\
R
e (1
1 (1 —AA) dA>_E 3 1
+C) 2 g 2 (K) 1—H_
1 Jn
(p) =E+P R (1 —H_)dC>_E'
( e (1 +C) 2 g 2 (K) 1 —U-0 n _ )
322 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

using

(16) (—(0))(.) =(B)(.)—(8(0))= JJ(i —H„-) '(1—DA)dA

in step (o). As in (f), (g), and (h), it suffices to note that

(17) II(1—H_)/(1-9-0„_)IIö_<some M <oo E

with probability exceeding 1— e,

which holds by (3.6.26), and then choose a continuity point O r of K so large


that

(q)f ^"
[( 1 +C)q(K)] -2 dC <e 4/M

However, (A.9.18) shows that

(18) dK=[(1+C)(1+C_)] ' dC; -

and thus (3.2.58) gives

J 0(l+C)q2(K)

K(r)
dCJ (l+C)(l±C)q2(K) dJ0q2(K) dK

(r) <_ q-2(r) dt


K(9,)

(s) K(T)
as 9 E ->r
9 (K(T))
(t) =0 since OF(r)=0.

Thus it is possible to choose 6 E so close to r that (q) holds provided AF(T) = 0.


Using (17) and (q) in (p) shows that

(u) P ( f
\II Jo
1
( 1 +C)q(K)
d[^^ — Bn (e)] R
lie.
l
e <_ 2E

provided 0. is sufficiently close to r and provided n exceeds some n e . It is


again a trivial application to the Birnbaum-Marshall inequality (Inequality
A.10.4) to show that the second term in (n) is negligible. We have thus shown
that

(v) Il(1 — B)/{( 1 +C)q(K)]j^0 -ß p 0 as n-*cc


WEAK CONVERGENCE OF B„ AND X,, IN /q^Iö-METRICS 323
provided OF(T) = 0. Combining (15) and (v) gives (7), using (1 + C) -1 in (15).

Proof of (10). Consider [0, 8]. Using the method of the previous proof, the
integral in line (e) of that proof is now replaced by [compare (7.5.13) to (7.5.11)]

fo
11
q 2 (D) { I — F }
JI B
Z J„
1—H (1—DA) dA

1 1 1
(a) 1—F_ o1 Z lll — H„_Ilo Jos q 2 (D) 1—H_ 1 —AA dA.

Hence the proof on [0, S] can be completed as in the previous proof [using
(4) and (3.2.58)]. Note that q(t)/J:" is not used in this half of the proof,
and is thus not required in Remark 1.
Now consider (0, T] with K (0) > 2 where K = K o . Again, let R = Tv 0. Note
that

q(K) _ q(1 —K) I


(b) is 1' for K > 2.
1—K 1—K i—K

The proof now proceeds as for B. [again using (3.2.58)], where at step (q) in
that proof we must now deal with the integral

r 7
r (1 - K )z
(c) J,, q 2 (K) dD— eF (1 +D)2g2(K) dD

1
dD
o. (1+D)(1 +D_)g2(K)
EC_(r) 1

(d) fJ T o 1
q 2 (K) dK (e) 9Z(t) dt.

The remainder of the proof is as before. ❑n


Confidence Bands

To use Theorem 1 to form confidence bands for S = 1— F, we first need


consistent estimates of the covariance function. We will suppose in this sub-
section that F is continuous. Recall that

C(i)= J (1—H)
0
- ' dA= J (1—H) -2 dH';
0
"
324 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

thus C is consistently estimated by

(19) C(t) = f r (i—H n ) - '(1—H n -) ' dH n

= 1 —( 1)(1 -1-1 ) ' 1(Z,:,t]5ni,


\ n

and K = C/(1 + C) is consistently estimated by

_ Cn
.
(20) ^" 1+C n

Exercise 1. For 0< T so H(9) < 1, show that

(21) IlCn— CIIö 4a.s.0 and IlOn —K IIg --a.s. 0.

Exercise 2. Show that if all the observations are uncensored, S ; = I for i =


1, ... , n, then

(22) Kn =1= n=Hn=Fn,

where F n is the empirical df of the X; 's; recall Exercise 7.2.1.

To state the corollary for confidence bands, write S I — F, n = 1— Je n , and


K n = 1 —K n .

Corollary 1. If 0< r so H(6) < 1, F and G are continuous, and ,(ö +/1 2 ( u) du <
cc, then

(23) P(S(x) E 9 n (x)± can - ' /2 §n (x)/[#n (x)iP(ln (x))] for all 0<_x < 0)

(24) - P( sup IU(t)Ii/i(t)^c e,)


OstSK(0)
(25) >P(Ilv41ll-ca)
as n-^X.

When i/i(u)- 1 the probability in (25) is given by (2.2.12), and the bands
in (23) which generalize the classical Kolmogorov bands, were introduced by
Hall and Wellner (1980). In view of Exercise 2, these bands reduce to the
Kolmogorov bands when all the observations are uncensored, S i = 1, i =
1 ,...,n.
Confidence bands based on other functions /i are possible as long as the
probability in (25) can be calculated or adequately approximated. Nair (1981)
EXTENSION TO GENERAL CENSORING TIMES 325
advocates bands similar to these but with ,(1(u) = [u(1 — 2 for a <_ a. u)] "
- u 1—
The choice +/i(u) = (1— u) - ' for 0< u b,
is a choice related to the classical
Renyi-statistic, which gives more emphasis to the right tail; in this case the
probability (25)
is given in Example 3.8.2.
Exercise 3. (Gill, 1983) If F is continuous, then Kc- = K n = K from Exer-
cise 7.4.4. Supposing that G is continuous, show that
(1 — K)/(1—F)=1+ J CdF, 0

where C(t) = ^ö (1 — F) - '(1— G) -2 dG = Jö (1 — H) -2 dH ° . Hence (1— K)/(1 —


F) is?. Note that C+C = $0(1—H) 2 dH= H/(1—H). -

8. EXTENSION TO GENERAL CENSORING TIMES

Suppose now that we relax our assumptions on the censoring times and assume
throughout this section that

(1) X1,... , X„ are iid with nondegenerate df F on [0, co),


Y; has df G ; on [0, oo) (possibly G ; = 0), and
all X 's and Y 's are independent.
; ;

We now let Z; = X; A Y; , S ; = 1 [x as before and redefine

(2) H J
r

o
H1 = 1 — (1 — F)(1—G ; ),
1 n

n ^_^
(3) H„ = 1 H; = [1 — G„
-] dF,
n i o
n
n ,

(4) II-l(t) 1[z


1 1<^^s,,
n ;_, Hn(t)= ; Y_ 1 [z
n ;_,
, ] , and

0\„(t)= J 1 dH„.
As in (7.5.15),

(5) M,(t) = 1 [z, <,]5i — A1j(t), with

A1^(t)= ° 1[z;^u] 1
1 —F(u—) dF(u),
326 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

is such that

(6) 11(t), t > 0, is a mean 0 square integrable martingale

with [note (7.5.18) and (7.5.19)] predictable variation process

(7) (i11)(t) = j l[1—AA(u)] dA(u) = f [1-0A,,] dA11.

As in (7.5.24) and (7.5.25),

( 8 ) Mn (t)=JY![Hn(t)—An(t)]= `"`[Hn(t)— fo ( 1— Ha - ) d A
'

i n
(9)=-= t ^ l M"(t)

satisfies

(10) IMI n (t), t >_ 0, is a mean 0 square integrable martingale

with predictable variation process

(11) ( M n )(t)= 1 Y- (^N,,)(t)=


n ;_I f.,
(1-6-O n -)(1-AA) dA for ta0.

As in (7.5.27) and (7.5.28)

(12) (t) [N"(t)-A(t)]=


f
o ,
^ dM n for t>0
"

1 n
—^—

satisfies

(13) 3„, t?0, is a mean 0 locally square integrable martingale

with predictable variation process

(14) (1-DA) dA
(B„)(t)= J rJ ” for t>_0.
1 — ^n-

The old proof of (7.2.7) carries over verbatim to give the exponential formula

1-F n (t) - 1-F n


for0<_t<TF;
(15) 1-F(t)-1-,^0 1-F d(A"-A)
EXTENSION TO GENERAL CENSORING TIMES 327
thus, as before, (15) and (4) give
X(t) v'W[l "(t)— F(t)] 1— F"- ✓"
(`

(16) 1 —F(t) 1 —F(t) Jo 1—F 1 —H,_ dam "


for0<t<-rF and OstsT

for any non-degenerate df F on [0, oo). As in the proof of Theorem 7.5.1 we


have that X n = X" (• A T) satisfies

(17) X „(t)/(1— F(tnT)), t >_0,


is a mean 0 locally square integrable martingale

with predictable variation process


T _ 2

(1 —AA)dA
(18) —F(.AT)! (t)—f,' I1 1 -F} 1 -4^"-
for 0 I<oo
provided F and G, , ... , G. are such that T < TF a.s.

Theorem 1. Suppose F is an arbitrary df on [0, oo) and

(19) lim G"(B) <1.

Then both

(20) IIf" — FIIö-a.5. 0 as n -*oo

and

(21) JJA4 — Allo"-a.,. 0 as n-^oo.

Proof. The only change needed in the proof of Theorem 7.3.1 is to note that
the SLLN holds for arbitrary independent rv's with uniformly bounded fourth
moments as in Theorem 3.2.1. ❑

Note that we still have [as in (7.1.1)]

—H" ]
dii
(22) "(t)=vT [O l^(t)
- — H^(t)]+ f0

and [as
[as in (7.1.5)]

(23) Cov [A (s),AA "(t)]—V"(s) for0<sst,


328 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

where

(24) V„(t)= I t(1—G„_)(1—AA) dF= J (1-H„_)(1—AA)dA.


1 0 0

We could generalize Theorem 7.1.1 to the present situation. However, we


will leave that to Exercise 1 below, since it is time to learn to use Rebolledo's
CLT.
According to Theorem 3.2.1, and a minor variation for l-4; , we necessarily
have

(25) IIH„ — Hn1Iö ßd.5. 0 and I JH —H, jjö -"a.s. 0 as n-oo.

Suppose we now add the hypothesis

(26) II G„ — G110 -4 0 as n -* oo for some sub df G.

Since H„ =1—(1—G„)(1—F) and H„ =j (1—G„_) dF, (26) implies

(27) —HI(-+O and I1Hn—H'llß -0 under (26),

where

(28) H=1—(1—G)(1—F) and H 1 = J (1 — G4dF.

Thus (25) and (27) show that

(29) JJHO — H lI ö -*a.s 0 as n -* oo under (26) and (28).

Thus (14) and (29) show that as n -3 00

(30) (B„)(t)_> a.s .C(t)= Jt 1 (1—AA) dA under (26) and (28);


o l —H_

likewise,

(31) X„(•n T)
1—F(•AT)
^(t) ^as.D(t)

= 0' 1 1
J dA under (26), (28), and G(t) < 1.
0 —H i —DA
1_

Thus Rebolledo's CLT (Theorem B.5.2) suggests the following result.


EXTENSION TO GENERAL CENSORING TIMES 329
Theorem 2. Suppose (1), (26), and (28) hold. Let 0< r—=H '(I). Then a , -

special construction of X,, ... , Xn , Y,, ... , Yn and Brownian motions S and
7L exist for which

(32) JJB„ —BIIä-> a. ,.0 for 0<r, provided F is continuous,

and

(33) for 0 < rr, for arbitrary F on {O, cc),

as n-*cc; here

(34) B =S(C) and X = (1 —F)Z(D)

for the C and D of (30) and (31). If F is continuous, then C = D and S =7L.
Proof. The proofs will be complete if we just verify the ARJ(2) condition
(B.5.8) of Rebolledo's theorem (Theorem B.5.2). We assume initially that

(a) F is a continuous df.

Consider the process B. Let

(b) B(t)=Bn(t)= J0
fd& = J "

0
fdV [Hn' —An],

where

(c) f = Jn /(1 —H n _) is >0 and predictable.

Then

(d) AB(t)=fvHn

since (a) implies A n is continuous. Thus the term A'[B](t) of (B.5.3) is given
by

(e) A [Bj(t) _ 1d6ß;,1[n[rl;[ by (B.5.3)

I fAHnl[IJID(nH')^fr]
c <r

(35) _ f^ (.f//) 1 [tfJ^ ✓;Ftd(nHn) with f= J,,/(I — l-1, -).


0
330 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

In the context of (B.3.1) we identify

(f) N = n6-„, which is a counting process under (a),

(g) A=nAn, M=N —A=v'Tz n,

(h) H = (f/'./n) 1 1U^ > rEi,

(i) Y(t)= J ' HdM=A£[B](t)—


0
J r
0
HdA.

Since

U) J " H 2 d(M)<m
0
for 0<t<ao holds a.s.,

the statement that Y = AE [B]— J H dA E 2I o c [ , P] of (d) of Theorem B.3.1


shows that AE[B] has compensator

(k) A'[B](t)= J HdA= f (f/ ) 1 [1n,.,rE]d(nl^n) for t>0.


J o 0

Thus, as in (B.5.4),

B e (t)=A e [B](t)—AE[B](t)= ^ HdM


J0

( 36 ) _ ^ r.f1[u^^,^Et dW9 n with f - Jn/(1 — Hn-)


0

has, by (d) of Theorem B.3.1, for continuous F

( 1 ) (B`)(t) = J H 2 d(M)
0

(m) —
f.
' (1— H " _)
Jn 2 1 [J,/^l — trU „If ](1 — H n _) dA

(n) -*a.s.0 as n -^ oo for each t-0

using the dominated convergence theorem and 0< T. Thus

(37) BB—B= J ./
1[^< ✓ nE] dN/Of
EXTENSION TO GENERAL CENSORING TIMES 331
satisfies (use Theorem B.3.1 again)

4(B , B )(I)=(B +B )(t) (B


e E E E - E- B )(t)
£ by (B.1.12)

=(f fd^a 0
n )(t) —(f 0
f{ 1 [Ui< ✓nE] -1 [[fI F J } d>(t)

r
— fd(n)— f2 {1[1<nc]2'0"f'1[i^ ✓ ne]}d(n)
0 0

(38) =0.

Thus (n) and (38) show that (B.5.8) holds. Thus Rebolledo's CLT (Theorem
B.5.2) implies = of B. in case (a) holds. Application of Skorokhod's theorem
(Theorem 2.3.4) gives Z•'s, S ; 's, and §, from which X 's and Y; 's can be ;

constructed.
The proof for X. under (a) is virtually identical; just redefine the f of (c)
to be

(c') f= J,,(1 - fin-) /[( 1- F)(1 -H n

and note that the application of the dominated convergence theorem in (n)
uses the conclusion IlF n - FIJö->a.s.0 of Theorem 1.
Suppose now that we drop assumption (a) and suppose

(o) F is an arbitrary nondegenerate df on [0, co).

As in (3.2.33), we define associated continuous rv's

(39) X1 i + Pi1[d,<X]+ZPi6iiI[Xi =d1},

where di 's denote the discontinuities of F, pi = F(di ) - F(di -) denotes the


magnitude of the discontinuity at di , and the 6y 's are independent Uniform
(0, 1) rv's. We define

(40) d(r) =t +Epi l [a . ,} for t >0


i

as in (3.2.49), and

(41) Y; =O(Y)= Y; +Y_Pi1[a3]•


J

Thus while the mass of F at di is uniformly distributed across an interval of


length p , any mass G which happens to have a di is moved to the right-hand
; ;

end of the corresponding interval. Thus [X s Y ] = [X < Y ]. As in (3.2.51) ; ; ;


332 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR

and (3.2.52), the df's F of X and G; of Y; satisfy

(42) F=F(0), G=O;(), X= 0- '()), Yr=O-'(),

where A ' is our usual right-continuous inverse of (3.2.50). We also note that
-

(43) tv„ _ (F„ (A), and so X = (A).

Finally, recognizing an application of (A.9.16) in (31) and then integrating by


parts shows

(44) D = D(0), and so X = (1— F())S(D()).

Combining (42)-(44), we note that

IIxn -xIIO= IIRn(o)-(1-F(o))§(D(o))IIö


(45) <—II^'Cn—(1—F)s(1D)IIr
(p) -0 a.s.

by the continuous version of (33) applied to F This completes the proof.


With regard to our failure to extend (32) to the discontinuous case, we note
that

(46) A ^ A(A) and C ^ C(0).

It is instructive to compute these quantities when X is a Bernoulli (2) rv.


n
Exercise 1. Establish Theorem 1 (or a special case of it) by combining the
methods of Section 7.4 with a transformation to van Zuijlen's associated
continuous rv's.

Remark 1. One can now extend the above to the case of F,,,, ... , F„„ con-
tiguous to F,. .. , F via the methods of Section 4.1. Note that if G„,, ... , G,,,,
are kept the same in both situations, they will cancel out of the likelihood
ratio statistic.

The Maximum Likelihood Estimator of F

Exercise 2. (9„ is the maximum likelihood estimate of F) Let F denote a


possible candidate for the MLE of F. Suppose, as seems intuitive, that we
know that F must assign all its mass to the points Z,, ; , <_ • • • s Zn: n and the
interval (Z,, : ,,, cb). Let p; = Of(Z : j ) denote the mass of F at Z„, ; , and p„ + ,
1— F(Z„ : ,,). If there are no ties, then the "likelihood” of Z 1 ,.. . , Z„, S,, ... , S„
EXTENSION TO GENERAL CENSORING TIMES 333
for F is equal to

n I n+l
47) n t P S,,:, pl
rt i1+1 )}

times a term that depends only on G.


(i) Show that (47) is maximized by the choice

&:j sn:; for 1 i <_ n p.


( 48 ) P. = { ñ 11- ; P+^ 1-
L n-j+l n-i+1 J

(ii) Show that this choice of p; is just another representation of the product-
limit estimator I n .
(iii) Extend all of the above to allow ties.
Hint: The quantity to be maximized can be rewritten as j j;= , as^ (1 -
a-)n-;+, -s,,.;
where a ; =p ;/Y^ .' p; . See Scholz (1980) for a broad enough
definition of maximum likelihood to allow a rigorous proof that F. is the MLE
of F.
CHAPTER 8

Poisson and Exponential


Representations

0. INTRODUCTION
The Uniform (0, 1) order statistics ,; have the same distribution as does
(a 1 +. • • +a 1 )/(a 1 +• • • +a 1 ) where the a ; are iid Exponential (1). Since
the a ; can be viewed as interarrival times of a Poisson process, the link between
empirical, quantile, and Poisson processes is strong. This link was discovered
early, and the literature exploiting it is rich in quantity and variety.
In this chapter we carefully define the link. Our exploitation will come in
later chapters, primarily in treating the convergence of the quantile process
V. and the oscillations of the empirical process U.
While many first proofs in the literature use Poisson representations, many
results were later redone via other methods. So too, we will only use Poisson
methods in a few of the possible situations.

1. THE POISSON PROCESS t\l


Let a 1 , a 2 ,... be a sequence of independent Exponential (1) rv's. For t>0
and n > 0 define RJ(t) equal to n if a o + a, + • • • + a„ t and a o + a, + • • H-
t (here a o = 0). Then {r J( t): t>0} is called a Poisson process. We take
as well known that fJ can also be defined by requiring that for all k >_ I and
all 0 <_ t, <_ • • • <_ t k that N(t, ), t\!( t 2 ) —N1(t, ), ... , IN(t k ) — I^!(tk _ 1 ) are independent
Poisson rv's with means t, , t Z — t, , ... , tk — tk-I• We accept that exists as a
process on (D R *, S R '). Thus J has stationary increments and
(1) N(t,,t2] Poisson (t 2 —t,) for all 0<_ t,<t 2 .
Exercise 1. The waiting time i, to the kth success is defined by '1 k
inf {t: N(t)> k}= a,+• • •+a k . Show that ,7 k has density t k- 'e - '/(k-1)! for
t > 0. The a ; 's are called interarrival times.
334
REPRESENTATIONS OF UNIFORM ORDER STATISTICS 335

Note that N(t) = k for r,,. < t <1 k+ , for k>0 (here -q o = 0). Also, in analogy
with (3.1.88),

(2) [N( t) >_ k] = [llk ^ t] for all k>_0 and t>_0.

We say that events occur at the times m, 77 2 , ... , and N(t) counts the number
of events through time t.
We specify a two-dimensional Poisson process N on R+ x R + by requiring N
to have stationary, independent increments whose distributions are Poisson
with mean equal to the area of the increments. We take as known that N exists
on (D T , 2 T ) with T= R + x R. Our distributional assumption on the incre-
ments can be expressed as

(3) N(s2, 1 2) — N(S2, tl) — N(s1, 1 2) +N(s1, t1)


Poisson (02—sl)(12— t,))

for all 0 < S, ^ s 2 and 0 < t 1 <_ 1.

2. REPRESENTATIONS OF UNIFORM ORDER STATISTICS

It is often useful that uniform order statistics may be represented in terms of


exponential rv's.

Proposition 1. Let a, , ... , a n+ , be iid Exponential (1) rv's. For 1 <_ i n+1
define i7 i = a, + • • • +a, and define

( 1 ) n:i = hti/h1n+I

Then 0 <_ en:1 C . . . C en :n <_ 1 are distributed as the order statistics of a sample
of size n from the Uniform (0, 1) distribution.

( ) -' '_ .
Proof. The joint density function of the i ; 's is
a e e -( - 'i 1 . e - ( i"+i - "0 = e - ",-i

for Since the marginal density of _7 n + , is


0 <_ t, ^ ... <_ to +l.
(t" + , )" conditional density of fl, , ... , 77" given n" + , = t o+ ,
exp (— t " + , )/ n !, the
is n !(t + )" for 0 <_ t, < • . • <I ^ t o+ , . Thus the conditional density of Cn: ,
m/ m± for 1:5 i n given rt" + , = t" + , is n! for 0:5 ":, <_ ... < f ‚„<_I; and
since this is independent of t,, it is also the unconditional density. This
establishes the proposition; see Eq. (3.1.95). See Breiman (1968) for this proof.
We have in fact also proved that

( 2 ) (711/17n+1 , • • • , 77n/ fin +1) is independent of 7) " + , . ❑


336 POISSON AND EXPONENTIAL REPRESENTATIONS

There is a second representation of uniform order statistics that is also quite


useful.

Proposition 2. Let 17, , ... , rt„, ... denote successive waiting times associated
with a Poisson process N. Then the conditional distribution of 0 < m < ... <
n„ <_ I given fß(1) = n is the same as the distribution of the order statistics
0 e„ ; , < <_ I of a sample of size n from the Uniform (0, 1) distri-
bution.
Proof. Let 0<t< • ..<„< 1 be fixed, and let h, , ... , h„ be so small that
0 < t, < t, + h, < t 2 < t z + h 2 < < t„ < t„ + h„ < 1. Since N has independent
increments,

P (an event occurs in (t i , t; + h i ] for 1 ^ is n and N(1) = n)


n

(a) _ II h e
^
i =1

while

(b) P(N(1) = n) = 1/(en!)

Dividing the ratio of (a) over (b) by h, • • • h„ and taking the limit as
h, v • • • v h„ - 0 yields n!; this limit also represents the density function of
rl,,....17„ at 0<t,<• • •<t„<1. 0

We also present a third representation; see (4) below.

Exercise 1. Let 1;, , ... , e„ be independent Uniform (0, 1). Then clearly
= —log n:i for 1 <_ i n are distributed as the order statistics in a
sample of size n from the Exponential (1) distribution.
(i) (Sukhatme, 1937) Prove that the normalized exponential spacings

(3) y„i =(n —i +1)[ an:i—an:i-1 ] for t<_i<_n

are independent Exponential (1) rv's (here a„ o = 0).


(ii) (Renyi, 1953) Show that
n —i +1
(4) = exp — Z y„' for 1 <_ i <_ n.
;_, n—j+l

(iii) (Lehmann, 1947 and Malmquist, 1950) Note that

(5) [ n:i/^n:i +i]' = exp (—yn,n-;+s) for 1 <_ i < n;

thus the rv's of (5) are independent Uniform (0, 1).


REPRESENTATIONS OF UNIFORM QUANTILE PROCESSES 337

Exercise 2. (i) The minimum of n independent Exponential (0) rv's is


Exponential (0/n).
(ii) An exponential ry a has the lack-of-memory property that
(6) P(a>s+tIa>s)=P(a>t) for all s,t>0.
The lack-of-memory property of Exercise 2 extends to random times as
well. We shall only discuss this informally. Let a, ... , a" be independent
Exponential (0) rv's with ordered values 0= a o <_ a ". 1 <_ • • • <a" : ". Now a" : ,
is Exponential (0/n) by (i) of Exercise 2. By the extension of (ii) of Exercise
2 to random times, we see that starting at time a",, we are waiting for the time
to failure a,,. 2 — a n: , of the first of n —1 independent Exponential (0) rv's; thus
a n:2 — a n ,, is independent of a,,. 1 , and by (i) of Exercise 2 it has an Exponential
(0/(n —1)) distribution. Continuing in this fashion yields that the normalized
exponential spacings y n; = (n — i + 1)[a,,, 1 — a n:; _, ] for 1 <_ i < n are independent
Exponential (0) rv's.

Exercise 3. Show that n,, -o d Exponential (1) as n - oo, while for any fixed
integer k we have

(7) nen:k - d Gamma (k) as n -* oo.

The spacings of Uniform (0, 1) rv's are treated in Section 1 of the spacings
chapter (Chapter 21), which could profitably be read at this point.

3. REPRESENTATIONS OF UNIFORM QUANTILE PROCESSES

We recall the smoothed uniform quantile process 4/ n defined by

Öl n (t)=Vi[ (t)—t] for0<_ t^ 1

_
(1) '1i h0^i<_n +l
fort=+wit
( " n+l) n+1
with /„ linear on each [(i-1)/(n+ 1), i/(n + 1)].

Let a 1 ,. . . , a" + , be iid Exponential (1) rv's. Let Sn+, denote the partial-sum
process of the (0, 1) rv's a, —1,. . . , a n+ , —1, and let S n+ , denote the smoothed
partial-sum process that equals S "+, at each i/(n + 1) and is linear on each of
the closed intervals between these points. Let rt ; = a, + • • • +a 1 for i ? 1. Then

n+l [
§n +l(t) —t§n+111)]
^- n +1 1 n + ,

i i
fort with0^i<_n+I
(2) 5n+l] =n + 1
338 POISSON AND EXPONENTIAL REPRESENTATIONS

and is linear on the closed intervals in between. Proposition 8.2.1 thus gives that

^n+l
(3) [§„+l – I§n+,(1)] for each fixed n.
n + I 71n+l

This representation was exploited for studying empirical and quantile processes
by Breiman (1968). Note that the SLLN, the CLT, and the LIL imply that as
n-3. c30, with b„ = 2 log t n,

n
(4) nn -*as. 1, Vi( n„- 1) -3.d N(O, 1 ), Tim — - –1 =1 a.s.
n-.w b„ n

Recall now from Exercise 2.2.1 that

N(t)=S(t)–tS(1) for0<t<_ 1
(5)
Brownian bridge

Applying (4) and (5) to (3) leads us to expect that 4/„ converges weakly to
Brownian bridge; indeed we already know this result from Theorem 3.1.1. In
a later chapter we will obtain a stronger form of this result.
Additional identities for other variations on the uniform quantile process
are found in Section 11.5.

4. POISSON REPRESENTATIONS OF U.

In Proposition 8.2.1 we saw that Uniform (0, 1) order statistics f„, ; can be
represented as

(1) „:; = m7 ; / n„+l jointly in 1 <_ i _< n +1, for each fixed n

where n ; ° a, + • • • + a is a sum of independent Exponential (1) rv's a ; . These


;

ri can also be viewed as the waiting times associated with a Poisson process
;

N. This suggests that many properties possessed by the empirical process U.


might be properties it inherits from N.
We now present the three different representations (Chibisov, conditional,
and Kac) of U. in terms of Poisson variables. The representations of U,, in
terms of Poisson processes have the proper distributions for each fixed n, but
incorrect distributions when viewed jointly in n. Thus they are suitable for
proving weak limit theorems involving -> d and -, type results, but are un-
suitable for proving strong limit theorems involving a.s. type results.
This difficulty is then overcome by use of a two-dimensional Poisson process
to provide a representation of U. that has the correct distributions when viewed
jointly in t and n. It is useful for proving strong limit theorems. The representa-
POISSON REPRESENTATIONS OF U, 339

tion, necessarily, involves quite a bit of dependence. It is possible, however,


to study a simpler Poisson bridge, and then look in on this process only at
appropriate random times. This is discussed in the next section.
Let denote a Poisson process on [0, cc), and let rt,, 1 2i ... denote the
arrival times associated with it. We define

(2) a„(t) =Vi[NI(nt) /n—t] fort>0

and

(3) 93„(t) = v „(tr)„+, /n) +.fi


n(-7 „+, /n-1)t for0<t<1,

with (ß„ (1) = 0 as continuity at 1 requires.


The conditional representation of U. is an immediate consequence of Propo-
sition 8.2.2: the conditional distribution of the process Ii„ on [0, 1], conditional
on the fact that 1(n) = n, agrees with the distribution of U,. That is,

(4) v„ I LN(n) = n ] = U„ for each fixed n.

The representation of U. based on Sukhatme's Proposition 8.2.1 was stated


explicitly by Chivisov (1964). We thus refer to the fact that

(5) ff3„ = OJ„ for each fixed n

as Chibisov's representation.
A somewhat different idea is contained in the following Kac representation
of the empirical process U „. Let N„ = Poisson (n), and define

(6) W„(t) =vnj1 1 [o.']( ^) — t} for0 .5 t^1,


l n ;.,

where, , 2 , ... are iid Uniform (0, 1) rv's independent of the N„'s.

Exercise 1. Show that W. has stationary and independent increments and


W„(t) is [Poisson (nt) — nt]/^. Clearly, we have conditionally that

(7) W „l[N„ =n]=v„

Exercise 2. (i) Show that W„^S on (D, ^, II II) as n -+co.


(ii) Verify the identity

(8 ) ONE„= UN„+I ✓n( n-1).

(iii) Show that U,= U on (D, -9, I1 II) as n->oo can be inferred from this.
340 POISSON AND EXPONENTIAL REPRESENTATIONS

5. POISSON EMBEDDINGS

We let N(s, t) denote a two-dimensional Poisson process with

(1) EP (s,t) =st.

We now define

(2) N5(t)= t\!(s,,1)


for0t1

for each s ? 0. Note that the conditional distribution of N, given the event
[N(s, 1) = n] has occurred is the same as that of U; that is,

(3) N,I[N(s, 1) = n] = U.

In fact, for

(4) Sk=inf{s:^l(s, 1)=k} fork-0,

we have

(5) (Ns.,Ns, ..•)=(v1,Uz,...),

where U 1 , U 2 ,... are empirical processes of a single sequence of iid Uniform


(0, 1) rv's 6,, ez ,... Note also that

(6) Yk=Sk —Sk_I,for k>-1, are iid Exponential (1) rv's.

It is easier to work with the Poisson bridge

(7) Ms(t) = N(s ' t) - A(s, 1) for 0 <_ t<_ 1

for each s > 0 [let IOI (t) = 0 for all 0< t <_ 1}; reasons for this name will become
clear as we go along. Note first that for each fixed s we have

(8) EM,(t) =0 for all O<_ t< 1

and

Coy [fv1l(t 1 ), Rv (t z )] = Coy [N(s, t 1 ) — i 1 N(s, 1), IN(s, t 2 ) — t zl%l(s, 1)]/s


_ [(st,) — t,(stz) — t2(St1) + (t, t2)5]/s
(9) = t, — t, t zf
or 0 < t 1 t z 1.
POISSON EMBEDDINGS 341
We note that for each fixed s >0 the process

(10) L,(t)=[N(s, t)—st]/f for t-0

is a "centered" one-dimensional Poisson process with

(11) EL,(t) =0 and Cov [L (t,),IL,(t z )]=t,At z

and possessing independent increments. These are all properties possessed by


Brownian motion also. Recalling Exercise 2.2.1 in which Brownian bridge was
defined by "tying down" Brownian motion to equal 0 at t = 1, we similarly
tie down the centered Poisson process 1,, by defining the Poisson bridge Iv1 to
be

(12) M (t)=L (t)—tL,(1).


5 L

We note that this agrees with (7).


Note that the empirical process U. has the property that

(13) !U„(1 — I)= — U,,;

that is, if we run it backward in time t, then effectively all we do to the


distribution is change the sign. Moreover, the increments of U. satisfy the
distributional relationship

(14) {U n (b —t, b]: b>_ t? a}=—{—U„(t): 0 s t b —a}.

The M, process shares these properties since

(15) f^1+0 I[(^(s,1) =n]= n/sU,,


5

implies that it shares them conditionally on each value of N(s, 1). Note from
(13) that s shares them also. Thus

(16) (1—I)= —M, and N,(1 —I)= —BI S

and
(17) {ßA (b — t, b]: b >_ t >_ a}
L

{—rlA (t):0 <_t<b—a} and likewise for hi,.

Since U„ has stationary increments, the conditional argument based on (15)


and (3) used above shows that

(18) M s and N. both have stationary increments.


342 POISSON AND EXPONENTIAL REPRESENTATIONS

Taking both s and t into account we determine, using the independent


increments of N, that the process

(19) M(s, t)=N(s, t)-t1 J(s, 1)=ffy11(t)

for s>0 and 0< t<_ I has 0 means and

(20) Coy [vfl(s1 i t,), l,'fl(s2, 12)] _ (s, A s2)( II A 12 — 1 1 1 2)

for s 1 , s 2 ? 0 and 0 < 1 1 , 1 2 < 1. These are the same moments as K possessed.
They are also the same moments as possessed by the sequential uniform
empirical process K,,. We also note that with S k as defined in (4),

(21) K„(s,•)= f^Us,= JM s, for-_s< - l,k>_0


n n

k k+l
(22) vk <_
for-s< ,k>_0,
n n n
(23) = (as a version of the sequential uniform empirical process).

This representation was introduced by Pyke (1968).

The Kac Version

Suppose {l^U(s): s >_ 0} is a one-dimensional Poisson process that is independent


of the sequence t",, b2 ,... of independent Uniform (0, 1) rv's. Then

1 rats)
(24) ^(s) ^^ [1[o,,(;) - t]= 1 l(t) jointly ins ?0 and 0-_t^1,

so that the lhs of (24) is another representation of the process ß! 5 (t) of (2).
CHAPTER 9

Some Exact
Distributions

0. INTRODUCTION

In this chapter we determine formulas for the probability that G. crosses an


arbitrary line (Section 1) and the exact distribution of the number of intersec-
tions with that line (Section 5). Of course, from the former, one can obtain
the exact distribution of II(C„ — I)'^^; we do this (Section 2), and also manipu-
late the formula to obtain the powerful Dvoretzky, Kiefer, and Wolfowitz
(DKW) inequality. More generally, we evaluate P(g <G„ <_ h on [0, 1]) for
general functions g and h (Section 3), and use this to obtain the exact
distribution of !IG —111. The DKW inequality and Noe's recursions are the
key results of this chapter.
We also wish to stress the methods used in this chapter. We will call them
the analytical, the combinatorial, and the Poisson representation methods.
Using the analytic method based on exact binomial and uniform calculations,
we derive Dempster's key formula (Section 1) for the probability that 6„
crosses a line for the first time at height i/ n. We use this method throughout
Sections 1, 2, and 3. In Section 4 we introduce the combinatorial method and
rederive Dempster's key formula with it. We give some additional applications
of this method in Sections 5 and 6 (the latter deals with the fact that the
location of the point at which the maximum of jIG, — If is achieved is nearly
a Uniform (0, 1) rv, and can be slightly modified to make the result exact). In
Section 7 we introduce the Poisson process approach and illustrate it with
some old and some additional examples. In Section 8 we apply previous results
in a brief consideration of the local time process L. of U „. In Section 9 we
treat the two-sample analog of JjU;'jj by combinatorial methods, and derive
limiting results for J IU , from our formulas.
A good starting point for additional results and references would be found
in Niederhausen (1981a) or Csäki (1981).
343
344 SOME EXACT DISTRIBUTIONS

y 1-a
1
i
i

L' 'I
i^ b
slope = b/(1—a)
1-b

0 1 4 I Figure 1.
0 a 1

1. EVALUATING THE PROBABILITY THAT G. CROSSES A


GENERAL LINE

We consider the probability that the empirical df G„ of a sample of Uniform


(0, 1) rv's , ... , „ crosses the line L defined by y = b(t — a)/(1 —a) which
intersects the t-axis at 0< a < I and the line t = I at 0< b [let c = b/(1 — a)
denote the slope]; see Figure 1. Let L' denote the related dashed line shown
in Figure! in cases when 0<b<_1.

Theorem 1. For0<a<1, 0<b, c=b/(1—a), and n?1,

(1) P(G„ intersects L on [a, 4„ „)) ;

= P(G„ intersects L' on [^„:,, 1 — a))


m n) „-
(2) —
i =o
a
(

i
[ ^J-'
a+-
c
1—a—
cn
,

where m is the greatest integer strictly less than (nb) A n.

The theorem follows immediately from the following result, which is a key
lemma in many papers dealing with exact distribution theory.

Proposition 1. (Dempster's formula) Let a>0, b ? 0, and a + ib < 1. Let

the graph of G„ intersects the line y=(t—a)/(bn)


B;
[ at height i /n, but not below this height
(3) =[Cn:j<_a+(j-1)b for !^j<i but „, ;+ ,>a+ib].

Then for any n >_ 1,

(4) P„1(a,b)=P(B1)=1 o f a(a+ib)' '(1—a—ib)" '. - -

Lines L in Figure 1 having /a = 0 and b >_ I can be reparameterized as L' in


Figure 1 with 1—a and 1—b replaced by 1/b and 0. We then obtain the
PROBABILITY THAT G,, CROSSES A GENERAL LINE 345

n/n

2/n

1/n
0 t

0 a a +b a +2b ... a+ib

Figure 2.

following simplification of Theorem 1. Since a complicated identity is involved


in simplifying the resulting expression, we will give an independent proof of
the result.

Theorem 2. (Daniels) For a = 0, b >_ 1, and n >_ 1

P(G n (t)<_bt for 0 < - 1 )= P(IIGn/III :b)


_t <
= P(G n (t) < bt for 0< t <_ 1) = 1— P(G„ intersects L on (0, 1])
(5) =1-1/b.

We now consider lines L in Figure 1 having a = 0 and b < 1.

Theorem 3. (Chang) For a=0 and 0<b<1 and n>1

P(6(:)? bi for fn:l^t 1 )=P(III/GfhIc.:I^ 1 /b)


= P(6 n intersects L on [fin; ,, 1])
= P( max (n. 1+1 / i) ^ 1/b)
Isi^n -1

1 n ( nb ) n (i-1)
i 1 i ) n-i
{b} nb
=1 bn/ + i l ^I (nb)'

(Se also Section 10.3).

Exercise 1. (Renyi, 1973) For all A > 0, show that


n C
n:i
\ co Ake— (k +11A
( k+l k
)
(7) P max —> ^l J
1 i^n i / k=O k!
by finding an exact expression for the left-hand side and letting n - x.

Exercise 2. (Takäcs, 1967) Use Dempster's Proposition 1 to determine the


exact df for (I (fi n — ')/'II for a fixed 0< a < 1.
346 SOME EXACT DISTRIBUTIONS

Proof of Proposition 1. Let p i = p 1 (a, b). Let

9=b(i—j)/(1—a—bj).
Now

(n) (a + bi)'(1 — a — bi )n-i = P(nG. (a + ib) = i)

i -I

_ Y- P(B; )P(nG"(a+ib)—nG"(a+jb)= i —jIB; )+P(B 1 )


l= 0
(n—j
j-'(1—B)" '+Pi
i =a i j )0

(a) = q+pi.
Also

n (a+ bi )` - '(1—a— bi) " - ' + '=P(nG"(a+ib) =i -1)


i —1
i -1
_ Y_ P(B; )P(nG"(a+ib)—nG"(a+jb) =i —j —i B; )

n —1
—J ei -J-ß(1 +l
° A — e)n -i
i =o i —j-1
i —j
j P'ti—j 1 0 n —i +1

(b)
_ 1—a —bi
n
— {b( — i +1)
j q

since the two terms in brackets are equal. From (a) and then (b) we obtain

A=(n)(a+bi)'(1—a— bi) " -i —q

= 11^ (a+bi)'(1—a — bi)" - '

b(n — i +l n ;-, )" -i +,


) -1)(a +bi ) (1 — a — bi
1—a —bi i

=(')(a+bi)i-'(1—a— bi) "-'[a +bi —b(n —i +l) n


—i +1 ]
n
=( i+bi)`
(a -'(1—a— bi) "-'a
)

PROBABILITY THAT G. CROSSES A GENERAL LINE 347


as was claimed. This proof is from Dempster (1959). This is also proved in
Vincze (1970), Csäki and Tusnady (1972), Renyi (1973), and Durbin (1973a).
Vincze's proof appears in Section 4. See also Pyke (1959). El
Proof of Theorem 1. The equivalence of the two probabilities follows from
the fact that I — f,, ... , I — „ is also a Uniform (0, 1) sample. Now

P(G„ intersects L)
m
_ £ P(G intersects L for the first time at height i/ n )
i=o

_ 1—a
(a) 'Eo P. a, b n using definition (4).

Plugging (4) into (a) gives (2).


We mention Chapman (1958), Pyke (1959), and Dempster (1959) in regard
to this theorem. ❑
Proof of Theorem 2. The result is from Daniels (1945); the proof we give is
from Robbins (1954). The probability in (5) equals

c f
n ! •..
J (n—I)C/n J c/n
d^n:l ... den:n-, d n:n

with c = 1/b, and can be evaluated by elementary integration.


Alternatively, consider P(G„ intersects L' on (0, 1]) in Theorem 1 with a
and b replaced by I — 1/b and 1. It is difficult to recognize that the resulting
sum actually does reduce to 1/b; paragraph one of this proof shows that it
does. See also Dempster (1959) for the resulting identity proved somewhat
differently. ❑
Proof of Theorem 3. This proof is also based on Dempster's Proposition 1.
It combines ideas from Chang (1955) and Renyi (1973). Let A = 1/b. Now,
by conditioning on

PA =P(I^i<n—I
max (n n:1+l /i)^A)
= P( n:1 ^ A /n)
A/n
+ P( max nf„ :;+ ,>iAi1=„ : ,=s)n(1 —s) " -I ds
o I^i^n—I

A 1 A/" ("/A) {j-1 }A


=(I- -f + max ,;< and
n/ J o 1=1 P( 2^j^i n
(a)
iA
„,;+,> n en ,, =s) n(1—s) n— 1 ds,
I

where the conditional distribution of {e„, j /(1— s): 2 s j s n} is that of n —1


Uniform (0, 1) order statistics.
348 SOME EXACT DISTRIBUTIONS

n n

Figure 3.

Now recall from Dempster's proposition (Proposition 1) that

P-1.1_1(a,b)=P(6n-,:j a+(j-1)b for 1sj<_i-1 and


fn _, :i >a +(i -1)b)

-1)b) "-i.
(b) =(n-1)a(a +(i- 1)b)i-2(1—a— (i

Thus

^ a/n—s , A 1
PA= (1—^ n + (.LI,) A/" Pn-1 . ;—I
^
J
n ; _,fo 1—s n(1—s)

x n(1— s)" ' ds by (a)

(c)
(
( 1 n ) + ii
n ^n/A) 0 A f _

Pn 1. 1 (n(I — sln)' n(1 - s/n)/


-

\ s\
1 ds
"-'

x^l—n

,l " J A
( n_1 \ 1

x[iA—s]` -Z f 1— ^^
I " ds by(b)
I. n

A"a n i il
) a( )

+(.ls)(iA—s)i2ds
n/ ;_^ i n nj o
i-1 iA^n i

(d )
:1""I
=^ 1-- +
n
1 ;_, i
([-1)
i
nA L 1--
L

as claimed. ❑n
Exercise 3. The following proof of Theorem 2 is due to Renyi (1973). Show
that G„(A)= P(I,G”/ III <A), satisfies

_ / n-1
Gn (A) Gn_i ^ns" - 'ds
l /A'
\ Ans
EXACT DISTRIBUTION OF ICU II AND DKW INEQUALITY 349
by conditioning on sn ,„. Then use induction to show that G(A) = 1-1 /A for
A>_1 and all n >_1.

2. THE EXACT DISTRIBUTION OF IIUJRII


AND THE DKW INEQUALITY

The null distribution of the one-sided Kolmogorov test of fit is that of IIU I.
Also, let D 1 (6 n - 1) # ^^.

Theorem 1. (Smirnov; Birnbaum and Tingey) Now ICU„ II. Also 0<
IIU II<v'i• Mainly, for ø<A <1, P(IIU„Il>v'A)= P(D>A), and

(n(l-A)) n i i 1 n-i

(1_A--)

n )( i 1r
in—A I I 1+,k
(2) =^ )+I^(

Proof. Note that if 6 n intersects the line L of Theorem 9.1.1 with a = 1 — b = A,


then 11(6„ —1) II > A by the right continuity of G „. Now, just apply Theorem
9.1.1.
This result was proved independently by Smirnov (1944) and Birnbaum
and Tingey (1951). ❑

Exercise 1. (Limiting distribution of IJv„II) Derive the limiting distribution


ofIIU^II of Example 3.8.1 by appealing to (1) and Stirling's formula.
The distribution of D„ for .1 = c/ n, with integral c, is given in Table 1 from
Birnbaum (1952).

Asymptotic Expansions

Asymptotic expansions of the distribution of IIU„ II are contained in Lauwerier


(1963), Penkov (1976), and the work of Chang Li-Chien reported in Gnedenko
et al. (1961). We quote only that Chang Li-Chien gives

t 2 2A 2A2 1 2A2\
(3) P(IJU„^^^A)=exp( -2d )3 n 1— —i-
3
[1+inT12+ -

n3 2 (5 1 1 52 234 (
1 )]

for 0 <A <O(n1J6)


Table I
Finite Sample Distribution of the Kolmogorov Statistic (from Birnbaum (1952))
P(D„ s c/n) = tabled value

n
c 1 2 3 4 5 6 7 8

1 1.00000 .50000 .22222 .09375 .03840 .01543 .00612 .00240


2 1.00000 .92593 .81250 .69120 .57656 .47446 .38659
3 1.00000 .99219 .96992 .93441 .86937 .83842
4 1.00000 .99936 .99623 .98911 .97741
5 1.00000 .99996 .99960 .99849
6 1.00000 1.00000 .99996
7 1.00000

c 9 10 11 12 13 14 15 16

1 .00094 .00036 .00014 .00005 .00002 .00001 .00000 .00000


2 .31261 .25128 .20100 .16014 .12715 .10066 .07950 .06265
3 .78442 .72946 .67502 .62209 .57136 .52323 .47795 .43564
4 .96121 .94101 .91747 .89126 .86304 .83337 .80275 .77158
5 .99615 .99222 .98648 .97885 .96935 .95807 .94517 .93081
6 .99982 .99943 .99865 .99732 .99530 .99250 .98882 .98425
7 1.00000 .99998 .99993 .99979 .99953 .99908 .99837 .99736
8 1.00000 1.00000 .99999 .99997 .99993 .99984 .99968
9 1.00000 1.00000 1.00000 .99999 .99997
10 1.00000 1.00000

c 17 18 19 20 21 22 23 24

1 .00000 .00000 .00000 ,00000 .00000 .00000 .00000 .00000


2 .04927 .03869 .03033 .02374 .01857 .01450 .01132 .00882
3 .39630 .35991 .32636 .29553 .26729 .24147 .21793 .19650
4 .74019 .70887 .67784 .64728 .61733 .58811 .55970 .53216
5 .91517 .89844 .88079 .86237 .84335 .82386 .80401 .78392
6 .97875 .97235 .96506 .95693 .94802 .93837 .92805 .91712
7 .99598 .99419 .99195 .98924 .98605 .98236 .97817 .97349
8 .99944 .99907 .99856 .99788 .99700 .99590 .99456 .99296
9 .99994 .99989 .99980 .99968 .99949 .99924 .99890 .99846
10 1.00000 .99999 .99998 .99996 .99993 .99989 .99982 .99973
11 1.00000 1.00000 1.00000 .99999 .99999 .99998 .99996
12 1.00000 1.00000 1.00000 1.00000

e 25 26 27 28 29 30 31 32

1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000


2 .00687 .00535 .00416 .00323 .00251 .00195 .00151 .00117
3 .17702 .15935 .14334 .12885 .11575 .10392 .09325 .08363
4 .50554 .47987 .45517 .43145 .40870 .38693 .36612 .34624
5 .76368 .74338 .72309 .70288 .68280 .66290 .64323 .62382
6 .90565 .89368 .88128 .86851 .85541 .84203 .82843 .81463
7 .96832 .96269 .95661 .95010 .94318 .93588 .92822 .92022
6 .99110 .98895 .98651 .98378 .98076 .97745 .97384 .96995
9 .99792 .99725 .99645 .99551 .99441 .99315 .99172 .99012
10 .99960 .99943 .99921 .99894 .99861 .99821 .99773 .99717
11 .99994 .99990 .99985 .99979 .99971 .99960 .99946 .99930
12 .99999 .99999 .99998 .99997 .99995 .99992 .99989 .99985
13 1.00000 1.00000 1.00000 1.00000 .99999 .99999 .99998 .99997
14 1.00000 1.00000 1.00000 1.00000

350
Table 1 (cont)

c 33 34 35 36 37 38 39 40

1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000


2 .00091 .00070 .00054 .00042 .00033 .00025 .00020 .00015
3 .07497 .06717 .06016 .05386 .04820 .04312 .03856 .03448
4 .32729 .30923 .29205 .27570 .26018 .24544 .23145 .21819
5 .60470 .58590 .56744 .54934 .53161 .51427 .49733 .48078
6 .80069 .78663 .77250 .75831 .74410 .72990 .71572 .70159
7 .91192 .90332 .89447 .88538 .87608 .86658 .85690 .84707
8 .96578 .96134 .95664 .95168 .94648 .94104 .93539 .92952
9 .98834 .98638 .98423 .98191 .97939 .97670 .97382 .97077
10 .99652 .99578 .99494 .99399 .99294 .99178 .99050 .98910
11 .99910 .99886 .99857 .99824 .99785 .99741 .99692 .99636
12 .99980 .99973 .99965 .99954 .99942 .99928 .99911 .99891
13 .99996 .99994 .99992 .99990 .99986 .99982 .99977 .99971
14 1.00000 .99999 .99999 .99998 .99997 .99996 .99995 .99993
15 1.00000 1.00000 1.00000 .99999 .99999 .99999 .99999

c 41 42 43 44 45 46 47 48

1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000


2 .00012 .00009 .00007 .00005 .00004 .00003 .00002 .00002
3 .03081 .02753 .02459 .02196 .01960 .01750 .01561 .01393
4 .20562 .19373 .18247 .17181 .16174 .15222 .14323 .13474
5 .46464 .44891 .43359 .41868 .40418 .39008 .37639 .36310
6 .68752 .67354 .65965 .64588 .63223 .61872 .60536 .59215
7 .83711 .82702 .81684 .80657 .79623 .78583 .77539 .76492
8 .92345 .91719 .91075 .90415 .89739 .89048 .88344 .87628
9 .96754 .96413 .96056 .95682 .95293 .94888 .94467 .94033
10 .98759 .98596 .98421 .98233 .98033 .97822 .97598 .97363
11 .99573 .99504 .99428 .99344 .99253 .99154 .99047 .98933
12 .99868 .99842 .99813 .99779 .99742 .99701 .99655 .99605
13 .99963 .99955 .99945 .99933 .99919 .99904 .99886 .99866
14 .99991 .99988 .99985 .99982 .99977 .99972 .99966 .99959
15 .99998 .99997 .99996 .99995 .99994 .99993 .99991 .99988

c 49 50 51 52 53 54 55 56

1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000


2 .00001 .00001 .00001 .00001 .00001 .00000 .00000 .00000
3 .01242 .01108 .00988 .00880 .00785 .00699 .00623 .00555
4 .12672 .11916 .11203 .10530 .09896 .09298 .08735 .08205
5 .35020 .33769 .32556 .31381 .30242 .29140 .28073 .27041
6 .57911 .56623 .55353 .54101 .52868 .51654 .50459 .49283
7 .75442 .74392 .73342 .72294 .71247 .70203 .69162 .68126
8 .86899 .86160 .85412 .84654 .83889 .83116 .82337 .81552
9 .93584 .93122 .92648 .92161 .91662 .91152 .90632 .90102
10 .97115 .96856 .96586 .96304 .96011 .95708 .95393 .95069
11 .98810 .98679 .98540 .98392 .98237 .98073 .97900 .97720
12 .99550 .99490 .99425 .99356 .99280 .99200 .99113 .99022
13 .99844 .99820 .99792 .99762 .99729 .99693 .99654 .99611
14 .99951 .99941 .99931 .99919 .99906 .99891 .99875 .99857
15 .99986 .99983 .99979 .99975 .99970 .99964 .99958 .99951

cuntinued

351
Table 1 (cont.)

c 57 59 59 60 61 62 63 64

1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000


2 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
3 .00494 .00,140 .00392 .00349 .00310 .00276 .00246 .00219
4 .07706 .07236 .06793 .06377 .05986 .05617 .05271 .04946
5 .26042 .25077 .24144 .23242 .22371 .21529 .20717 .19933
6 .48128 .46092 .45876 .44780 .43705 .42649 .41614 .40599
7 .67094 .66068 .65049 .64035 .63029 .62030 .61040 .60057
8 .80762 .79068 .79171 .78370 .77567 .76761 .75955 .75148
9 .89562 .89013 .88455 .87889 .87316 .86736 .86150 .85557
10 .94734 .94390 .94036 .93674 .93302 .92921 .92533 .92136
11 .97531 .97334 .97129 .96916 .96695 .96466 .96230 .95986
12 .98924 .9811.21 .98712 .98598 .98477 .98351 .98218 .98080
13 .99565 .99515 .99462 .99406 .99345 .99281 .99212 .99140
14 .99837 .99V15 .99791 .99765 .99737 .99707 .99674 .99639
15 .99943 .99934 .99925 .99914 .99902 .99889 .99874 .99858

c 65 61. 67 68 69 70 71 72

1 .00000 .00CCO .00000 .00000 .00000 .00000 .00000 .00000


2 .00000 .00CCO .00000 .00000 .00000 .00000 .00000 .00000
3 .00195 .00173 .00154 .00137 .00122 .00108 .00096 .00086
4 .04640 .04352 .04082 .03828 .03589 .03365 .03155 .02958
5 .19176 .18445 .17741 .17061 .16406 .15774 .15165 .14578
6 .39603 .38528 .37672 .36736 .35819 .34921 .34043 .33183
7 .59083 .58:19 .57163 .56217 .55280 .54354 .53437 .52531
8 .74340 .73523 .72726 .71919 .71115 .70311 .69510 .68712
9 .84958 .84355 .83746 .83133 .82516 .81895 .81271 .80644
10 .91731 .91320 .90901 .90475 .90042 .89604 .89159 .88709
11 .95735 .95476 .95211 .94938 .94659 .94373 .94080 .93781
12 .97936 .97786 .97630 .97469 .97301 .97128 .96950 .96765
13 .99063 .98583 .98898 .98809 .98716 .98619 .98518 .98412
14 .99602 .99562 .99519 .99474 .99425 .99374 .99321 .99264
15 .99841 .99523 .99603 .99781 .99758 .99733 .99707 .99678

c 73 74 75 76 77 78 79 80

1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000


2 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
3 .00076 .00068 .00060 .00053 .00047 .00042 .00037 .00033
4 .02772 .02598 .02435 .02282 .02138 .02003 .01877 .01758
5 .14013 .13468 .12943 .12438 .11951 .11482 .11031 .10597
6 .32342 .31519 .30714 .29928 .29159 .28407 .27672 .26955
7 .51635 .50750 .49875 .49011 .48158 .47316 .46485 .45664
8 .67916 .67123 .66333 .65546 .64764 .63985 .63211 .62441
9 .80014 .79382 .78748 .78112 .77475 .76836 .76197 .75557
10 .88253 .87792 .87326 .86856 .86381 .85902 .85419 .84932
11 .93476 .93165 .92848 .92525 .92197 .91864 .91525 .91182
12 .96576 .96380 .96180 .95974 .95762 .95546 .95324 .95098
13 .98302 .98187 .98069 .97946 .97819 .97687 .97552 .97412
14 .99204 .99142 .99076 .99008 .98936 .98861 .98783 .98702
15 .99648 .99616 .99582 .99546 .99508 .99468 .99426 .99382

continued

352
EXACT DISTRIBUTION OF IIU„II AND DKW INEQUALITY 353
Table 1 (cont)

c 81 82 83 84 85 86 87 88
1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
2 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
3 .00030 .00026 .00023 .00021 .00018 .00016 .00015 .00013
4 .01647 .01542 .01444 .01353 .01267 .01186 .01110 .01040
5 .10178 .09776 .09389 .09017 .08659 .08314 .07983 .07664
6 .26253 .25569 .24900 .24247 .23609 .22986 .22379 .21786
7 .44855 .44056 .43269 .42493 .41727 .40973 .40229 .39497
8 .61675 .60914 .60159 .59408 .58662 .57922 .57188 .56459
9 .74917 .74276 .73636 .72996 .72356 .71717 .71079 .70442
10 .84442 .83949 .83452 .82953 .82451 .81947 .81440 .80932
11 .90833 .90480 .90123 .89761 .89395 .89025 .88651 .88273
12 .94867 .94630 .94390 .94144 .93894 .93640 .93381 .93118
13 .97268 .97119 .96967 .96811 .96650 .96486 .96317 .96145
14 .98618 .98531 .98440 .98346 .98249 .98149 .98046 .97939
15 .99336 .99287 .99237 .99184 .99129 .99071 .99011 .98949

c 89 90 91 92 94 96 98 100
1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
2 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
3 .00011 .00010 .00009 .00008 .00006 .00005 .00004 .00003
4 .00973 .00911 .00853 .00798 .00699 .00612 .00536 .00469
5 .07357 .07063 .06779 .06507 .05994 .05520 .05082 .04678
6 .21207 .20643 .20092 .19555 .18520 .17536 .16600 .15712
7 .38775 .38064 .37364 .36674 .35327 .34021 .32757 .31533
8 .55735 .55018 .54306 .53600 .52207 .50839 .49496 .48178
9 .69806 .69172 .68539 .67908 .66651 .65403 .64165 .62937
10 .80421 .79909 .79395 .78880 .77847 .76810 .75771 .74731
11 .87892 .87507 .87119 .86728 .85937 .85136 .84324 .83504
12 .92851 .92580 .92305 .92026 .91457 .90874 .90278 .89670
13 .95969 .95789 .95605 .95418 .95032 .94632 .94218 .93791
14 .97830 .97717 .97601 .97482 .97234 .96974 .96702 .96417
15 .98884 .98818 .98748 .98677 .98526 .98366 .98196 .98016

Open Question 1. Can an asymptotic expansion for the distribution of U.


itself be made on a space such as (D, -9, II II) that would allow results such
as (3) to be claimed as corollaries?

An Approximation to the Distribution of IIUJIII

Remark I. (Harter) Let 17 ° denote the upper a percent point of the ry lull;
( )

see (1.3.8). Then (1.3.6) suggests that the upper a percent point i» of IIG„ -III
be approximated by -q ' /Vi. However, Harter's (1980) numerical investigation
(

suggests that

n+4
( 4 ) 17n - r7^°^/ n^
3.5

is far more accurate in small samples. For a values of 0.20, 0.10, 0.05, 0.02,
0.01, approximation (4) seems to have an error of less than 0.5% for n>6
and of less than 0.25% for n >_ 11.
354 SOME EXACT DISTRIBUTIONS

Similar results are reported for each of D„, D, K„, W„, and U„ by Stephens
(1970); both upper and lower tails are considered. See Chapter 4 for these
results.

A Slight Modification

Exercise 2. (Pyke, 1959) (i) Let 0-c<_ 1. Then for 0<nc—A <1

P(max [ci „ :; ]d)=P( 'n , ; >_ci—A for 1_<i<_n)


ISin
(5)
(A/C) n
=(1+A—nc) (Jc—.l)'(1—jc+,l)n-j+^
j= 0 3

This probability equals 0 or 1 as ne — A is ? 1 or <0.


(ii) Use this to evaluate the distribution of 11(6„ — I)'Il, when 6„ is a smoothed
empirical df of (3.1.6) that equals i/ (n + 1) at each „,; with 0:5 i <_ n + 1 and
is linear in between. See Penkov (1976) for asymptotic expansions.
(iii) Rederive (1) using (5).

The DKW Inequality

We now use the exact distribution obtained in Theorem 9.1.2 to obtain an


excellent bound on the tail probability P(IIU„ II >_ A ). Such a bound is used in
establishing a.s. limit theorems for IIU„Il.

Inequality 1. (Dvoretzky, Kiefer, Wolfowitz)

(6) P(IIv„Il>_A)/2 <— P(IwnO>_A)<_cexp(-2A 2 ) for all A>_0.

The choice c=29 works. Recall ^^%/„ ^^ = II OJ„ II.

Proof. The first inequality follows trivially from (1IJlI _ IIUJ„11 v IIURII, where
u I I — J) U n II
From (5.2.2) we have for 0<A < / that

(a) P(IIUn11>A)=Av Y- Qn(j,A),


j=(AJn)+I

where

Qn (j, A) — il ✓^)'(n—j+A^)” j /nn


— \/ (J

EXACT DISTRIBUTION OF IIU.'fl AND DKW INEQUALITY 355


Now for AV'i< j s n, we have

—,1n2
d
da
log Q,, (j, A) _ (j —Avfn)(n —j +A^) n—j +A/
—4A 16A
-4A —
( n_ j+.^l ^ l 2

< (1 — 4 /n2)(n /2—j +A ) 2< nz 2 /

(b) = -4A-16A(---- A I z
\2

We will divide the sum (a) into the two parts E' and .":

I— j /nI<a forjel' and 1Z — i/nj >a forjEE",

where we will specify a later.


We consider ' first. Integrating (b) yields
( z z
(c) Qn(j,^)<Q^(j,0)expj- 2,12 -8dZ12
n + 3,n) 9n}

for A >_ 0. From Stirling's formula

^-;-' nn< e_
e
Q„(je0)=\j17i(n— j)
I(n J) n l J
n
< (0.4337)(2 — a) 312 (+ a) "2 / n 312

(0.4337)b/n 3 " z with b=b,,=(z —a) 312 (2 +a) ' " z


for je 1'. Thus from (c) we have

n
AfriE'Q„(j,A)< A exp ( - 2k 2)^' exp l —8^1 z 12— n +3
^ Iz l .n
( 8A 2J2 \
< 2^ 6 exp( -2,1) exp Z
n z =o — ni--

<2Ab exp ( 2 A z ) I /n+ J exp ( 8,1 z s 2 ) ds


- -

L o J
/ 1 2 l
<_ 26 ` ^+ 8 I exp (-2A 2 )

(d) <(1.782)6exp(-2A2) for n?3.


356 SOME EXACT DISTRIBUTIONS

If j E ", we integrate (b) again to obtain


z z
)
(e) Qn(J, A)<crQn(j, T) exp 1 —2,k2- 8,12(2—n+3A
(

9n )

for A ? T where
\ z \
CT=exp (2T2 +8T 2 ^2 + 3
2 I +9 4 I.

If 2A/(3'Ji) s ad where d = f/(1 +f) = 0.738, then the second term in the
exponent of (e) does not exceed —(Aad) 2 . Thus the sum of the last two terms
in the exponent of (e) never exceeds —A 2 a 2 d 2 . Thus for j E Y." and A >_ T we
have from (e) that

Qn (j, A) < crQ, (j, T) exp ( -2A z ) exp ( — a z d z A z )

2e da Q"(j, T) exp (-2a z )


<

since A exp (—K,1 2 ) takes on a minimum of I/ 2eK at A = I/ 2K. Hence


from (a), for A ? T = 0.27 we have
CT
A ✓ nY."Qh(j,k) < exp (-2A z ) ✓ n^"Qn (j, T)
2 e da

C T Z 5.625 2
exp ( -2,1 ) = exp (-2^1 )
<v adT 2e ad
(f) = (3.266/a) exp (-2A 2 )

Plugging (d) and (f) into (a) with a = 0.221 yields the theorem.
The choices d = 0.738, T = 0.27, and a = 0.221 each resulted from a minimiz-
ation problem. However, the constant 29 is still excessive. What is the minimum
constant possible? Hu (1985) shows that C=2\/ works.
This proof is taken from Dvoretzky et al. (1956). ❑

Open Question 2. What is the minimum value of c that works in (6)? Does
c= 1 work as conjectured in Birnbaum and McCarty (1958)?

We will demonstrate here that the DKW inequality is quite good. The best
we will get for a Binomial (n, z) ry is Hoeff ding's inequality (11.1.6), which gives

P(IUJ (z)I >_ A)<_2 exp ( -2d z ) for all A >0.

The DKW inequality extends this from t = 2 to all t in [0, 1] and maintains
RECURSIONS FOR P(gsG, sh) 357

the same exponential rate; only the constant is increased. Also, as we compare
the DKW inequality with the limiting distributions of Example 3.8.1, we see
that the constant 2 in the exponent cannot be improved on.

Exercise 3. Integrate by parts to show that

E I!U n 11' <_ 2cr1'(r/2)/2 (2)+ ' for all real r>0

and

E exp (rIIU„ II) <_ 1 + f c/r exp (r 2 /8) for all r> 0

for the constant c of the DKW inequality.

Exercise 4. (Beekman, 1974) If N = Poisson (0) and is independent of the


Uniform (0, 1) sample f,, ... , i„, then

or1
[ 1
K,,= sup t- l[ar,()]
0 ;=t

satisfies

(Ox +i) -'


`

P(KB>x)=x8 i exp(-9x —i)

for all 0< x <_ 1. Hint: Compute the probability conditionally on the value of
N, using the Birnbaum and Tingey theorem (Theorem 9.2.1) on each term.

3. RECURSIONS FOR P(g - G„ ^ h)

Let X 1 ,. . . , X„ be iid with completely arbitrary df F. We are interested in


computing

(1) Pn=P(a;<Xn:i^b; for 1:5i<n).

In case a _ —m (in case b. = +co) for 1 ^ i < n, we will label the probability
;

in (1) as P. (as P„).

Example 1. (Exact distribution of ICU;; II.) Now

[IIU^II^A]_[IIG„- Ill <a/ ✓n]


(2) <6n: <_ -
; j 1 + <_
for 1i<_n
a.s. n ✓n n ]
358 SOME EXACT DISTRIBUTIONS

Thus in this case we define

i A i-1 A
(3) a;= n -- v0 and b = + — nl ; for 1:i_
<n.

Note that
P„ P(,, :i b i for 1!5isn)=P(IIU II sA),

(4) Pn=P(a1<e . for 1 - i<


_n)=P(IIlnII«),
Pn=P(ai<enf^b i for 1!5in)=P(lIU : 5 A).

[With regard to (3), we note that on the interval [e _,, en: ,) the event I -G <_
;

can be rewritten as e„ <- (i - 1)/n +A/^ .]


:; ❑

Example 2. (Exact distribution of ljU0/qi(G„)II.) Now

(5) [IIU /0(G )11^A


- Abi 1 1+A (inl)
n v_ forl<i<_n .
Ln \n // / ]

Functions fr of interest include iji(t) = ut(1- t) and i/i(t) = -log (t(1 - t)).
Table 1 gives the percentage points of the distribution of J1U +n/ 6 „(1 -G„)Ij,
as computed by Kotel'nikova and Chmaladze (1983). ❑

Example 3. (Exact distribution of IIUJ „/011.) Now

(6) I+A+1i// on (0, 1)]


(7) = P(g <_ G n h on (0, 1))
for

(8) g = I -,14i/'./ and h = I+Api//i.

It seems reasonable to suppose g and h are of the rough shape shown in


Figure 1. Thus we assume that

g <h, g(0)^0h(0), g(1) <h(1), g is J' on some [a, 1] having


(9) 0 <_ a ^ 1 with g(t) <0 on (0, a), h is >' on some on [b, 1] having
0 <_ b <_ 1 with h(t)> 1 on (b, 1), and g and h are continuous.

We now define

(10) a- h - '(i/n) inf {u: h(u)> i/n} A 1

and

(11) b1 = g - '((i - 1)/n)= sup {v: g(v)<_(i- 1)/n}v0


Table 1
Percentage points of the d.f. of the Studentized Smirnov Statistic
(from Kotel'nikova and Chmaladze (1983))

zc a 0.90 0.95 0.975 0.99 0.995

2 1.26907 1.34260 1.37863 1.40003 1.40713


3 1.73008 1.95218 2.10302 2.23308 2.29737
4 1.96420 2.31584 2.56750 2.81348 2.95194
6 2.18772 2.55976 2.88688 3.23074 3.43742
6 2.33076 2.75127 3.12181 3.54284 3.80781
7 2.44399 2.89834 3.30801 3.78493 4.09840
8 2.53331 3.01575 3.45755 3.98053 4.33239
9 2.60703 3.11196 3.57987 4.14204 4.52574
10 2.86849 3.19219 3.88212 4.27733 4.68842
11 2.72073 3.26018 3.76875 4.39231 4.82708
12 2.76574 3.31861 3.84318 4.49124 4.94660
13 2.80501 3.38935 3.90780 4.57728 5.05076
14 2.83954 3.41385 3.96447 4.65279 5.14227
16 2.87023 3.45327 4.01458 4.71958 5.22334
18 2.89768 3.48840 4.05920 4.77908 5.29564
17 2.92237 3.51990 4.09919 4.83244 5.36053
18 2.94477 3.59837 4.13526 4.88081 5.41908
19 2.96518 3.57417 4.18794 4.92425 5.47221
20 2.98385 3.59774 4.19773 4.96398 5.52063
21 3.00099 3.61932 4.22495 5.00032 5.56490
22 3.01693 3.63915 4.24999 5.03388 5.60556
23 3.03149 3.65744 4.27304 5.06444 5.84306
24 3.04511 3.67438 4.29435 5.09283 5.67772
25 3,05780 3.69015 4.31410 5.11915 5.70985
26 3.06966 3.70479 4.33248 5.14369 5.73976
27 3.08075 3.71847 4.34960 5.18848 5.76763
28 3.09119 3.73130 4.38585 5.18784 5.79367
29 3.10099 3.74328 4.38067 5.20783 5.81808
30 3.11024 3.75458 4.39476 5.22659 5.84097
31 3.11901 3.76528 4.40800 5.24422 5.88247
32 3.12728 3.77527 4.42049 5.26082 5.88273
33 3.13610 3.78481 4.43227 5.27850 5.90191
34 3.14259 3.79379 4.44348 5.29130 5.91999
35 3.14966 3.80232 4.45404 5.30530 5.93714
36 3.15642 3.81042 4.46406 5.31864 5.95336
37 3.16285 3.81809 4.47357 5.33124 5.98877
38 3.16900 3.82545 4.48264 5.34325 5,98349
39 3.17489 3.83243 4.49128 5.35471 5.99745
40 3.18058 3.83913 4.49948 5.36556 6.01073
41 3.18594 3.84552 4.50735 5.37600 6.02342
42 3.19112 3.85163 4.51487 5.38588 6.03551
43 3.19610 3.85748 4.52201 5.39539 8.04714
44 3.20087 3.86307 4.52892 5.40451 6.05821
45 3.20552 3.86849 4.53548 5.41319 8.06885
48 3.20994 3.87368 4.54186 5.42152 6.07907
47 3.21421 3.87859 4.54790 5.42954 6.08885
48 3.21829 3.88340 4.55370 5.43725 6.09820
49 3.22231 3.88801 4.55931 5.44487 8.10722
50 3.22810 3.89241 4.58473 5.45179 6.11591
continued

359
Table 1 (tont.)

n a 0.90 0.95 0.975 0.99 0.995

52 3.23346 3.90085 4.57494 5.48522 8.13239


54 3.24031 3.90665 4.58449 5.47770 6.14768
56 3.24671 3.91602 4.59333 5.48939 6.16184
58 3.25277 3.92288 4.60160 5.50026 6.17510
60 3.25850 3.92930 4.60939 5.51048 8.18756
62 3.26386 3.93543 4.81889 5.51999 8.19921
64 3.26896 3.94112 4.62352 5.52904 6.21016
86 3.27382 3.94850 4.63005 5.53751 6.22049
68 3.27845 3.95167 4.83810 5.54549 8.23026
70 3.28280 3.95648 4.64187 5.55300 6.23948
72 3.28703 3.96110 4.84739 5.56015 6.24822
74 3.29099 3.96541 4.85257 5.58691 6.25647
76 3.29480 3.96965 4.65748 5.57338 6.28426
78 3.29840 3.97359 4.66220 5.57946 6.27179
80 3.30196 3.97734 4.68687 5.58534 6.27888
82 3.30531 3.98093 4.67095 5.59089 8.28581
84 3.30853 3.98446 4.67505 6.59615 6.29206
86 3.91161 3.98775 4.67891 5.60117 6.29818
88 3.31453 3.99089 4.68263 5.60608 6.30411
90 3.31745 3.99398 4.68621 5.81070 6.30969
92 3.32015 3.99885 4.88965 5.61506 6.31515
94 3.32289 3.99972 4.69284 5.81933 6.32031
96 3.32543 4.00237 4.69606 5.62344 8.32527
98 3.32791 4.00494 4.89909 5.62727 6.33003
100 3.33030 4.00750 4.70200 5.63110 8.33460

UI ... U8 U9 U10
1

IC VI V2 V3 •••
V10

Figure 1.

360

RECURSIONS FOR P(gsG„sh) 361
for 1<_i<_n. Then

(12) P(IIUn/cII^A)=P(a;< n:r^bi for 1<i<-n)


(13) =P(G„(a,)<i/n<_G.(b,) for 1<_i<_n)

is also of the form (1). Table 2 gives percentage points of the distribution of
IIv„/ I(1- I)^I as computed by Kotel'nikova and Chmaladze (1983). ❑

Table 2.
Percentage Points of the d.f. of the Standardized Srnirnov Statistic
(from Kotel'nikova and Chmaladze (1983))

n a 0.75 0.80 0.85 0.90 0.95 0.975 0.99 0.995

1 1.73205 2.00000 2.38048 3.00000 4.35890 8.24500 9.94987 14.10674


2 1.92006 2.18871 2.56185 3.18855 4.49780 6.35217 10.02177 14.15864
3 2.01739 2.28097 2.65033 3.24861 4.55606 6.39382 10.04749 14.17661
4 2.08062 2.34152 2.70598 3.29406 4.58938 6.41598 10.06078 14.18574
5 2.12893 2.38506 2.74540 3:32882 4.81126 6.43006 10.06888 14.19127
8 2.16268 2.41855 2.77537 3.35120 4.62887 8.43976 10.07436 14.19496
7 2.19181 2.44549 2.79920 3.37028 4.63866 6.44888 10.07830 14.19763
8 2.21573 2.46783 2.81884 3.38571 4.64790 8.45232 10.08128 14.19964
9 2.23629 2.48682 2.83539 3.39858 4.85537 6.45663 10.08360 14.20119
10 2.25417 2.50323 2.64962 3.40950 4.86155 8.48012 10.08548 14.20242
11 2.26993 2.51765 2.86205 3.41891 4.68672 6.48304 10.08702 14.20345
12 2.28395 2.53046 2.87302 3.42717 4.67120 6.48547 10.08830 14.20431
13 2.29659 2.54195 2.88285 3.43447 4.67503 6.48753 10.08938 14.20504
14 2.30804 2.55237 2.89170 3.44102 4.67842 6.46933 10.09031 14.20565
15 2.31852 2.56188 2.89974 3.44892 4.88139 6.47090 10.09114 14.20818
18 2.32812 2.57058 2.90708 3.45221 4.88408 6.47228 10.09184 14.20864
17 2.33702 2.57859 2.91384 3.45710 4.68645 6.47348 10.09248 14.20707
18 2.34529 2.58602 2.92006 3.48161 4.88883 6.47457 10.09303 14.20746
19 2.35296 2.59293 2.92587 3.46572 4.89057 6.47558 10.09355 14.20779
20 2.38017 2.59943 2.93126 3.48957 4.69234 6.47648 10.09397 14.20807
21 2.36895 2.60547 2.93833 3.47313 4.69396 8.47724 10.09441 14.20832
22 2.37330 2.61120 2.94108 3.47844 4.69548 6.47798 10.09478 14.20858
23 2.37936 2.61861 2.94555 3.47957 4.69689 6.47869 10.09508 14.20880
24 2.38502 2.62169 2.94977 3.48249 4.69817 6.47929 10.09543 14.20900
25 2.39045 2.82650 2.95375 3.48525 4.69940 6.47985 10.09570 14.20920
26 2.39557 2.63109 2.95753 3.48783 4.70053 6.48039 10.09596 14.20939
27 2.40047 2.83549 2.96113 3.49031 4.70158 6.48090 10.09623 14.20955
28 2.40515 2.63981 2.96458 3.49286 4.70256 6.48135 10.09640 14.20970
29 2.40985 2.64363 2.96782 3.49486 4.70348 8.48175 10.09685 14.20983
30 2.41392 2.64742 2.97096 3.49899 4.70433 6.48213 10.09683 14.20995
32 2.42196 2.85459 2.97681 3.50091 4.70594 8.48287 10.09720 14.21019
34 2.42941 2.86119 2.98218 3.50452 4.70733 8.48355 10.09752 14.21038
36 2.43830 2.86730 2.98716 3.50784 4.70862 6.48408 10.09782 14.21058
38 2.44277 2.67301 2.99178 3.51088 4.70980 6.48459 10.09805 14.21076
40 2.44880 2.67839 2.99613 3.51373 4.71091 6.48507 10.09829 14.21090
42 2.45452 2.88342 3.00019 3.51839 4.71182 8.48547 10.09848 14.21103
44 2.45981 2.68812 3.00400 3.51887 4.71272 8.48588 10.09866 14.21114
46 2.46490 2.89259 3.00762 3.52118 4.71358 6.48621 10.09882 14.21129
48 2.48970 2.69680 3.01100 3.52341 4.71437 8.48853 10.09903 14.21141
50 2.47424 2.70079 3.01418 3.52542 4.71508 6.48886 10.09918 14.21150
80 2.49397 2.71821 3.02813 3.53433 4.71799 8.48802 10.09973 14.21183
70 2.51006 2.73226 3.03933 3.54133 4.72018 6.48888 10.10018 14.21209
80 2.52354 2.74401 3.04888 3.54703 4.72188 8.48952 10.10050 14.21236
90 2.53507 2.75412 3.05666 3.55197 4.72321 6.49004 10.10073 14.21251
100 2.54506 2.76290 3.08360 3.55810 4.72440 8.49040 10.10090 14.21260
362 SOME EXACT DISTRIBUTIONS

The Recursions

We now turn back to the basic problem stated in (1) ; we wish to compute

(14) P„ =P(a ; <X,, :; b; for lei<n),

as well as the one-sided versions P„ and P,,. To avoid trivialities we assume

(15) a 1 <a, b,

and

(16) a; <b1 for i <_f<_n

Noe's Recursion 1. We now let c, <_ • • <_ c z „ denote the 2n boundaries


a, , ... , a„, b, , ... , b„ arranged in any 7 order. We let a,, = b o = co = —oo and
a 1= bn +1 = c2„+1= + 0c) ; but we will not refer to these as boundaries. Now let

(17) g(m) = (the number of a-boundaries among c o , c l , ... , c m ),


0sm_s2n

and let

(18) h(m) —1= (the number of b- boundaries among co , c,, ... , cm _,),

1 <_m<2n +1.

Thus g(0) = 0, g(2n) =n, h(1)-1=0, and h(n +1)-1=n. The probabilities
of the various intervals (c,„_,, cm ] are

(19) pm =F(cm )—F(cm _,) for 1 sm <_2n +1.

The probability P. can be calculated by means of the following recursions. Let

(20) Q0(0) j•
For 1:m <_2n,

(21) Q1(m)= ± ()Qkm —lP'


k =h(m) -1 k

for h(m +1) -1^isg(m -1).


Let
( 22 ) Qg(m- 1)+1(m)=0.

RECURSIONS FOR P(g<_G ., 5h) 363
Then the probability we seek is

(23) PP = Q,,(2n).

See Noe (1972) for the above. The following one-sided version is from Noe
and Vandewiele (1968).
In case there is only a lower boundary (b = Co for 1 <_ j <_ n), then
;

(24) pr —F(a m )—F(a,„_,) for 1 <_m<_n +l.

We compute P. by the following recursion. Let

(25) 1 Q0 (0)=1.
For 1^mn +1,

(26) Q,(m)= ( t )Qk(m -1 )P;„ k for0i<_m -1.


k=o k

Let

(27) I Qm(m) =0 .

Then the probability we seek is

(28) P„ =Q„(n +1).

In case there is only an upper boundary,

(29) replace a m by 1— bn _ m+ , for 1:5 m <_ n + 1

and the recursion (25)-(27) will give

(30) P„ = Q„(n+1)

for the solution.


Noe (1972) and Niederhausen (1981b) make computations of various prob-
abilities of the type P. and P,,, and give extensive tables. Note that these
recursions would lend themselves to symbolic programming.

Example 4. Suppose a,, ... , a, o , b,, ... , b, o are ordered from smallest to
largest, with ties allowed in any combination, as in column 1 of Figure 2. The
quantities g(m) and h(m) —1 of (17) and (18) appear in columns 3 and 4. As
m ranges from 1 to 2n = 20, the indices i for which the Q (m) of formula (21)
;

must be computed are given in columns 5 and 6; that is, indices h(m + 1)— 1 _<
i <_ g(m —1) are specified. Thus, in this example with n=20, Q (m) must be
;

evaluated using (21) a total of 52 times. ❑



364 SOME EXACT DISTRIBUTIONS

in g(m) h(m)-1 h(m+1)— 1 g(m — 1)

0 0
a, 1 1 0 0 0
a2 2 2 0 0 1
b^ 3 2 0 1 2
a3 4 3 1 1 2
a4 5 4 1 1 3
ab 6 5 1 1 4
as 7 6 1 1 5
bg 8 6 1 2 6
b9 9 6 2 3 6
b4 10 6 3 4 6
a7 11 7 4 4 6
be 12 7 4 5 7
be 13 7 5 6 7
as 14 6 6 6 7
b7 15 8 6 7 8
as 16 9 7 7 8
be 17 9 7 8 9
al o 18 10 8 8 9
be 19 10 8 9 10
b io 20 10 9 10 10
21 10 •

bi -••

I ii II 11111

1 i n
n+1 n+1 n+l
Figure 2.

Example 5. (Exact power of tests of fit.) Note that minor changes allow us
to use these same methods to compute the exact power of tests of fit. Stephens
(1974) makes various power computations based on formulas of the type
discussed in this section. O

Proof of Noe's recursion (20)—(22). We define

(a) Q;(m)=P(a;<Xj.j^bj and Xj.j:Cm for 1^jsi).


RECURSIONS FOR P(g<G„sh) 365

Note that

(b) Qo(0) = 1,

and also

(c) Q0(1) =1 with h(1+1) - 1 = 0 = g(1-1) and


Q 1 (1)=0 with g(1 - 1)+1=1

since we will interpret an event placing no restrictions as the sure event. Thus
the recursions (20)-(22) hold for m = 0 and m = 1.
Now for the inductive step. Fix 1 <m <_ 2n. We will assume that

(d) k(m -1) is known for h(m)- 1 <_k<g(m -1-1)


twith Qk (m -1)=0 for k >_g(m- 1-1)+1.

The recursion will be completed if we can show that (provided m <_ 2n)

the recursion formula (21) is necessarily satisfied for Q ; (m)


(e) with Qk (m) =0 for k >_g(m -1)+1.

Now conditioning on which k specific observations do not exceed c m _,,


we have

(f) Q(m)= ± ( ) P(aj <Xk .j sbj and Xk :j <_c m -, for 1 <_jk)


k=o k
xP(ak +j <Xi-k:j bk +j and Cm _ 1 <Xi _ k :j <_Cm for 1 <_j<_i — k)
i I
(g) = Qk (m —1) P(a k +j < Xt-k:j _ bk +j and
k=o k

Cm -1 < Xi-k:j C c m for 1 <_j <_ i — k).

Now if k < h(m) —1, then bk+ , <— b h(m) _, <_ Cm -,, so that

(h) the P( ) in (g) is equal to 0 if k < h(m) —1.

Also,ifh( m ) - 1 <_k^i^g(m - 1), thena i <ag (m- ,)qcm- l<cm^bh(m)^bk +l,


so that

(i) the P( ) in (g) is equal to p;„ k if h(m) —1 C k i s g(m —1).

Recall from (d) that

0) Qk(m -1)=0 for g(m- 1 - 1)+1 <_k.


366 SOME EXACT DISTRIBUTIONS

Plugging (h), (i), and 0) into (g) gives

(i)Q(I)i_k
(k) Q(m)= ±
k=h(m)-I k
if h(m+1)-1<isg(m-1).

Moreover, if k >_ g(m —1) + 1, then cm <_ a g( ,„_, )+ , ^ a k , so that

(1) Qk(m)=0 for k>_g(m-1)+1.

Combining (k) and (1), we see that (e) holds. ❑

Exercise 1. Prove Noe's recursion (25)-(27).

For comparison, we indicate some other approaches that appear in the


literature. We introduce the notation

(31) P.(a,b)=P(ai 5 n:,^b; for 1:5 i^n)

with

P.(a)=P(ai 5 ,:r for 1:5i<—n)


(32)
Pn(b)=P(Cn:; b for 1<_i<_n).
;

Bolshev's Recursion 2. If b satisfies 0 <— b, <_ • .. <_ b n <_ 1, then

(n)(1—b„
(33) P„(b,.....bn)=1 — m+i) m P„ m(b,,...,bn m)
m=1 m

with P0 = 1. Note that when a ; = 1 — b„_ ;+ , as in Figure 3, then

P(gG n )=P(en:; sb. for 1<—inn)


(34) =P(a;<_i,,:j for 1<_i<_n) when a ; =I—b„- ;+ ,

= P(G„ h) with h as in Figure 3.

Thus the recursion (33) can be rewritten as

n
(35) P.^(a......an)=1 — a: n m(am+i...an)
-I m

with Po = 1. Kotel'nikova and Chmaladze (1983) used (35) to table the exact
distribution of IIU / I(1—I)II and IIv/ G(1 —G„)II for sample sizes n up
RECURSIONS FOR P(gs0„sh) 367
1 1 h
• g •

0 0
0 b, b 2 b3 ••. b„ 1 0 a, a2 a 3 ... a" 1

u ; = 1 - v„_ ; +I

Figure 3.

to 100; see Tables 1 and 2. All this raises the interesting prospect of finding
the distribution of limit functionals such as IU «/log (1(1 — I ))-' II through
finite n calculations.

Steck's Recursion 3. If b satisfies 0 <_ b, ^ • • <_ b" <— 1, then


" -Z n
(36) Pn(b1,....bn)=bn — E (. (bn — b;+l) "
-;
P;(b,.....b;)
j=0 3

with Po = 1. See Steck (1968), though our proof is found in Breth (1976). Steck
indicates this performs well at n = 100. See also Epanechnikov (1968).
Proof of Bolshev's recursion. Now
P"(b,,....bn )=P(nG n (bk )>_k for 1:k <_n)

=1— Y_ P([nGf(bk) =k —
l]nf [nG,,(bJ)?j])
k = 1 i =I
fl/ a
=1— (1 —bk )n-k+lbk -1
k =, k- 1
(

xPQ^ [nGn(b;)^j]I[nGn (bk) =k —


I]
-i

k+1
—bk)" Pk - ,(b......bk -1)•
=1—
k_, ( k— 1 )(1
Letting m = n — k+ 1 in (a) gives

(b) Pn(bl,..., ' n )=1 — , Q (1 —bn- m +1)mPn-m(bl.....bn -m)


m=1

as claimed in (12). ❑

Proof of Steck's recursion. Now


n-1
(a) A=[en:n^bn1 =U A;,
368 SOME EXACT DISTRIBUTIONS

where we have the partition A 0 , A,, ... , A"_, of A defined by

Ao=[b1<e":t]nA,
(b) A;=[ ":j^bi for 1< —i <_j and bj +i <6":j+i]nA

for 1<_jn-2,
A n _,=[fin:; <b; for 1in]r'A;

note how A is partitioned on the basis of the smallest index where ";; <— b ;
fails. Now, using (a),
n-2 n-2
(c) P" = P(A"-,) = P(A) — P(A)=b— E P(A ). ;

j=0 j0

Now A; , for j< n-1, occurs if and only if j of f,, ... , e" fall in [0, b; ] and
satisfy en , ; b. for i j, while the remaining n —j points fall in (b;+ ,, b"]; thus,
using conditional probability,

(b,/b; , ..., b /b )
; ;

(d) P(A')= \/ b(b"—b1j

Plug (d) into (c) to obtain the formula claimed. ❑

Remark 1. It may be of some interest to compare the first few terms of the
recursions (33) and (36) for P"(b). For (36) we obtain

Po =1, Pi(b,)=big

Pz(b,, bz) = b2 — (b2 — b 1 ) 2 P0 =2b,b 2 —


bi,
P3(b1, bz, b3)=b3 — (b3 — b,)Po - 3(b3 — bz) 2 Pi(b1)•

For (33) we obtain

Po= 1, P1(b,) = 1 —(1 —


b1)Po= b1,
P2(b1, bz)= 1-2(1 —b2)P1(b1)—(1 —b 1 ) 2 Po =2b 1 b 2 —bi.
P3( 6 1, bz, b3) = 1-3(1— b3)P2(bi, bz) — 3(1— bz) 2 P1(b1) — (1— b,) 3 po .

We note that the recursions really proceed differently. Note that both lend
themselves very well to symbolic programming. ❑

Exercise 2. Write out Noe's recursion (25)-(27) to obtain formulas for P,(b,)
and P2 (b 1 , bz).
RECURSIONS FOR P(g <_ G, ^ h) 369
Steck's Formula 4. If a and b satisfy, (15) and (16), then

P„ =P„(a,b)=P(a „ -b, for I^isn)


; :;

(37) =n!det[(b1—a;)+'+' /(j—i+1)!]

where (x) + = x v 0 and where it is understood that all elements of the matrix
having subscripts i >j + I are zero. We prove this below. Steck indicates that
problems of numerical accuracy set in with (37) prior to n = 50. Steck's formula
also provides the basis for the following result of Ruben (1976), which we
state without reproducing its lengthy elementary proof by induction.

Ruben's Recursion 5. If a and b satisfy (15) and (16), then

(38) P„ (a, b) n!k Y n f^ c„ (k) ,

where the 2 " ' n -tuples k = (k,, .. . , k„) in K. and the corresponding constants
-

c„ (k) are generated recursively by means of

(39) K, = {(1)},K 3 = Li {(l ,k 1 ,....k„),(k, +1,0,k 2 ,...,k„)}


kE K„

and

(40) c,(( 1 )) = 1, c”+1((1, k 3 , .... k")) = c„(k),

c„+ 1 ((k 1 +1,0, k2i ..., k„))=— (k,+1)c„(k).

These can also be generated recursively by means of binary trees:

k c(k)

(1,k,,...,k„) c(k) \ —(k,+I)c,,(k)

For n =1, ... , 6 the values of K. and c„ (k) appear in Table 3.

Exercise 3. Use Table 3 to verify that

3 !P3(a, b) = (b, — al)+( b , — a2) +(b3 — a3)+ — z(bl — a2) +(b3 — a3)+

— z(b1 — a1) +(b 2 — a3) ++b(b1 — a3)+

for all (a,, a 2 , a 3 ) and (b,, b 2 , b 3 ) satisfying (15) and (16).



370 SOME EXACT DISTRIBUTIONS

Table 3
Coefficients in Ruben's Formula
(from Ruben (1976))

K1 K3 K4 K5
1 (1) 111 (1) 1111 (1) 11111 (1) 11120 (-2)
201 (-2) 2011 (-2) 20111 (-2) 20120 (4)
120 (-2) 1201 (-2) 12011 (-2) 12020 (4)
300 (6) 3001 (6) 30011 (6) 30020 (-12)
K2 1120 (-2) 11201 (-2) 11300 (6)
2020 (4) 20201 (4) 20300 (-12)
11(1) 1300 (6) 13001 (6) 14000 (-24)
20 (-2) 4000 (-24) 40001 (-24) 50000 (120)

KB
111111 (1) 111201 (-2) 111120 (-2) 111300 (6)
201111 (-2) 201201 (4) 201120 (4) 201300 (-12)
120111 (-2) 120201 (4) 120120 (4) 120300 (-12)
300111 (6) 300201 (-12) 300120 (-12) 300300 (36)
112011 (-2) 113001 (6) 112020 (4) 114000 (-24)
202011 (4) 203001 (-12) 202020 (-8) 204000 (48)
130011 (6) 140001 (-24) 130020 (-12) 150000 (120)
400011 (-24) 500001 (120) 400020 (46) 600000 (-720)

The Exact Distribution of

That the null distribution of Kolmogorov's statistic is given by

(41) P(IIG" —1 75 a) = P. (a, b) with

a i (—a+ )vO and b(a+ n


I 1 Al
)

was shown above. Using Ruben's formula, Ruben and Gambino (1982) com-
puted Table 4, giving the coefficients of the polynomials in the exact distribution
of (IG —Ill for 3 ^ n ^ 10 on the range 1/ n ^ t <_ I— I/ n. For example, when
n=4 and 2<_t<—;

P(II6"—fJJ t)=-1+(8)t+(I)t 2 —IOt 3 +6t ° .

They did not give coefficients for 0 <_ t < 1/n and 1-1/n < t < 1 since
0 if 0<_ t< 1/(2n)
(42) P(JjG"—I t)= n!(2t-1/n)" if 1/(2n) ts1/n
if 1-1/n<t^ 1
is true for all n.
Table 4. Coefficients of P(D„ S a) for a In the various subintervals of
[1/n,1-1/n], n = 3(1)10
(from Ruben and Gambino (1982))

n=3 [1/3,1/2] [1/2,2/3] n=4 [1/4,3/8] [3/8,1/2] [1/2,3/4]

1 0 -1 1 3/8 3/8 -1
a -8/3 10/3 a -9/2 -63/8 37/8
a2 14 2 aE 21/2 75/2 3/2
a3 -12 -4 a3 24 -48 -10
a4 -48 16 6

n=5 [1/5,3/10] [3/10,2/5] [2/5,1/2] [1/2,3/5] [3/5,4/5]

1 96/625 336/625 0 -1 -1
a 672/125 - 188/25 -728/125 522/125 738/125
ae -1464/25 12 224/5 24/5 12/25
a3 240 424/5 -458/5 -56/5 -92/5
a• -288 -240 74 -6 22
ab 0 160 -20 12 -8

n=6 [1/8,1/4] [1/4,1/3] [1/3,5/12] [5/12,1/2] [1/2,2/3] [2/3,5/6]

1 -5/81 -35/1296 5/16 5/16 -1 -1


a 10/27 145/27 -565/81 -7645/648 3371/648 4651/648
as 235/9 -785/9 1525/54 775/9 175/36 -8280/7778
a3 - 1280/3 4240/9 515/9 - 1985/9 -185/9 -265/9
a4 2360 -2800/3 - 1115/3 295 0 160/3
ag -4800 320 560 -240 32 -38
a8 2880 320 -280 104 -20 10

n=7 [1/7,3/14] [3/14,2/7] [2/7,5/14] [5/14,3/7] [3/7,1/2] [1/2,4/7]

1 8640/78 -31860/78 54510/78 54540/76 0 -1


a -8640/74 140400/75 - 120240/75 - 157740/7 5 - 153852/7 5 81448/75
a2 20880/73 - 178760/7 4 36240/74 73740/74 31512/73 2700/73
a3 - 128160/73 15600/73 45040/7 60040/73 - 103000/73 -8960/73
a4 -2880/7 99840/49 - 15950/49 -51950/49 3770/7 -150/7
a5 92160/7 -57600/7 -3540/7 15860/7 -4380/7 324/7
ae -43200 11200 2120 -2296 468 20
a? 40320 -4480 -1680 1008 - 168 -40

[4/7,5/7] [5/7,6/7]

1 -1 -1
a 104486/75 141986/75
a2 11220/74 -7530/74
a3 - 11120/75 - 14870/73
a4 710/49 5210/49
ab 414/7 -766/7
aB -80 58
alp 30 -12

continued next page


371
Table 4. (cont.)

n=8 [1/8,3/16] [3/18,1/4] [1/4,5/16] [5/18,3/8] [3/8,7/16] [7/16,1/2]

1 -1260/131072 -78860/87 -12480/87 1168790/87 140/85 140/85


a 8815/84 -77490/85 25795/84 -2286620/8 6 -2480534/6 8 -4127620/8 8
aZ -277830/84 3877940/85 -559265/84 434630/85 1720964/8 5 628892/84
a3 80325/64 -804930/83 523005/83 951790/84 409108/84 -2414488/8 4
a4 -737100/64 600390/64 -211190/84 -302820/83 -79590/64 11060/8
a5 49140 -22785 8020 -11802/8 240940/84 -143220/64
as -55440 10360 -9485 7931 -46900/8 18956/8
a7 -161280 35840 14580 -12096 4928 -1344
ae 322560 -35840 -11200 6720 -1792 256

[1/2.5/8] [5/8,3/4] [3/4,7/8]

1 -1 -1 -1
a 1500284/88 1894034/88 2547218/88
as 33740/84 138670/85 -187922/85
a3 -131480/84 -192710/84 -274142/84
a4 -140/8 20790/83 98390/83
as 6188/64 5838/64 -16842/64
ae -140/8 -1686/8 1610/8
a° -110 158 -82
a5 70 -42 14

n=9 [1/9,1/81 [1/6,2/9] [2/9,5/18] [5/18.1/3] [1/3,7/18] [7/18,4/9]

1 -89600/97 500080/97 -9682980/9 8 3267040/9 5 17812480/9 8 17812480/99


a 555520/98 -3365600/98 46340000/9 7 3360000/9 8 -4929792/9 6 -57544816/97
aE -703360/95 7289920/95 -62515040/9 6 -92895040/9 8 16411472/9 8 29588180/96
as -2535680/9" -4827200/94 14007840/9 5 88599840/9 5 13801760/9 5 17568528/95
a4 9623040/93 -29097600/94 27233380/9 4 -37843540/94 -7209524/9 4 -19041652/9 4
as -13332480/81 56743680/9 3 -21514080/93 9812712/9 3 -174776/9 3 7508424/93
a8 9067520/9 -30365440/81 804160/9 -2242912/81 849296/81 -1697136/81
a? -2938880 806400 -1420160/9 56000 -244384/9 232288/9
as 3225600 -752840 179200 -78608 30484 -17864
a9 0 215040 -107520 45696 -13440 4992

[4/9.1/2] [1/2.5/9] [5/9.2/3] [2/3,7/9] [7/9,8/9]

1 0 -1 -1 -1 -1
a -80189328/97 25924114/9 7 31524114/9? 39382322/97 52539010/97
az 9157696/95 654840/95 4491760/9 6 1879024/96 -4709320/9 6
as -41500928/9 5 -1820000/9 5 -2730000/9 5 -3818640/9 5 -4759832/9 5
a4 1434944/99 -34720/93 -42980/9' 537628/94 2016644/9 4
as -2859920/93 79408/93 13412/81 90468/93 -389732/9 3
as 49224/9 840/9 -9632/81 -35840/81 43736/81
a7 -43232/9 -1760/9 -1704/9 4512/9 -2938/9
as 2234 -70 294 -226 110
as -372 140 -112 56 -16

continued next page

372

RECURSIONS FOR P(g<6 h) 373
Table 4. (cont.)

n=10 [1/10,3/20] [3/20,1/5] [1/5,1/4] [1/4,3/10] [3/10,7/20] [7/20,2/5]

1 .01016064 .04281984 - .11313792 - .15369417 .504851382 .504851382


a - 1.2828224 - .421848 1.3578768 10.0991268 - 9.13959144 - 10.91844432
a2 59.185728 -69.709248 82.82232 - 150.30918 18.787578 28.8265952
a3 - 1349.9138 2361.38112 - 2113.96752 425.05848 296.303992 388.658424
a4 14595.0336 -31227.6384 19088.8076 5786.4656 - 715.81608 - 1748.43816
a5 -25256.448 190366.848 -78189.552 -51541.8624 -3310.9272 - 1651.356
ad - 1075576.32 -406990.08 138876.32 198156.672 13338.192 29670.48
a? 11496038.4 - 896718.8 35481.6 -467389.44 1061.78 - 95278.16
a8 - 49351880 6096384 - 594720 754387.2 -74592 153964.8
as 92897280 - 10752000 1102080 - 801024 134400 - 129792
a1 -58060800 6451200 -806400 419328 -77952 44928

[2/5,9/20] [9/20,1/2] [1/2,3/5] [3/5,7/10] [7/10,4/5] [4/5,9/10]

1 .24809375 .24609375 -1 -1 -1 -1
a - 11.95911504 - 19.70752482 8.19872518 7.45283848 9.23169134 12.25159022
a2 82.9458144 237.91401 11.648385 8.5131018 2.5835922 - 12.5159022
as 152.373432 - 1225.12164 -44.62164 -62.910792 -85.4994 - 104.373768
a4 - 3033.98844 4108.5788 - 48.7964 13.44168 142.51944 472.82088
a5 13862.5956 -9945.9612 197.0388 255.0996 151.3764 -984.0348
as -35923.104 16984.8 48.3 - 320.828 -831.012 1233.372
a7 59053.68 - 19328.4 -440.4 -232.08 1273.2 -984.72
as - 60672.6 13977 128 781.2 - 1004.4 493.2
as 35232 -6240 392 -618 418 -142
a 10 -8712 1528 -252 168 -72 18

Exercise 4. Prove (42).

Additional Results

Many other finite-sample size formulas for probabilities of interest can be


found in Csäki (1981).

A Related Result

Exercise 5. (Steck, 1971; Pitman, 1972).


(i) Show that if

(43) Dmfl = Ii(^m -^^)#II


for iid samples from a continuous df [which we might as well assume to

374 SOME EXACT DISTRIBUTIONS

be the Uniform (0, 1) df I], then

(44) mnDm„= max [(m+n)i—mR 1 ],


lsi<_m

(45) mnD= max [mR —(m+n)i+n],


;

IS ism

(46) mnD m „=n+ max I mR ; —(m+n)i+21,


2 ,^;< m

where R 1 , ... , R,„ denote the ranks of the observations X 1 ,. . . , Xm in the first
sample when ranked among the combined population X,, ... , Xm , Y,, ... , Y,,.
(ii) Show that

P(mnDm„ < r and mnDm n < s)


+n)_r
—P(i(m
(47) +<R,<i(m+n)—n+s for 1<i<_m
m m 1

[and that all four symbols < in (47) may be replaced by ].


(iii) Let b, ^ • • • <_ b„ and c, <_ • • • <_ c„ be integers having bi <c. The number
of sets of integers {R,, ... , R„} such that R, < • • • <R and b ; < R i < c ; for
1<— i <_ n equals the nth-order determinant det (d i; ), where d;; =0 if i > j + 1 or
if c b ^ 1 and where dj =
, — ( `i _6; + +'-'-'
, ) otherwise. Niederhausen (1981b) gives
extensive tables for such two sample cases also.

Proofs

Lemma 1. Let A 1 ,... , A, and B,, ... , B. denote events such that for any
integer k the sets

(48) {A„ Bs : r < k, s <_ k} and {A„ Bs : r> k, s> k} are independent.

Then

(49) P(B,B2 ... B„A,A 2 • • • A„_,)=d„=det(d;; )°det(D),

where d, for 1 < i, j ^ n, are defined by

0 ifi>j+1
_ 1 if i =j+1
(50) d” P(B,) if i =j
P(BiB1+l . .. BJAiA;+I ... Ai-i) if i <j,

and A; is the complement of A,.


RECURSIONS FOR P(g<_G, sh) 375
Proof. Note that the conditions on the events imply

(a) B,, ... , B. are independent.

The events A,, A 2 ,... are 1-dependent. Put

(b) Ar = A I A 2 ... A,, B, = B, B2 ... Br.

Assume the lemma true for n = N. Then

(c) 'IN = P(BNAN -1)•

Consider O N+ ,. The elements of the last row of D , are all zero except

(d) dN +1,N = I and dN +1,N+1 = P(BN+I)•

Therefore,
(e) AN +l = P(BN +1)' 1 N — zN,

where D N differs from D N only in having P(B N ) in the last column on D N


replaced by B N B N+ ,AN in every element of the last column (this latter event
satisfies the same conditions relative to the other events appearing in 0 N as
does BN). Therefore

lf) ON— P \BNBN+1ANAN -i)— P\BN+IAN-IAN)

by induction and (b)


and

(g) P(BN+I)ON = P(BN+I)PIBNAN -1) = P(BN+IAN -1)


by induction and (a).

Thus plugging (f) and (g) into (e) gives

(h) LN +l = P(BN+IAN -1) — P(BN+I AN -1 AN)

by elementary set theory


= P(BN+1AN-IAN) = P(BN+IAN) by (b) .

Therefore the lemma is true for n = N+1.


For n = 2 we have, using (a),

(i) ^ 2= I P(B 1
) P(B,BzA,) =P(B1B2)—P(B,B2Ai)=P(BIB2AI)
1 P(B2)
as desired. ❑
376 SOME EXACT DISTRIBUTIONS

Proof of Steck's formula. Now

(a) P,,=n!P(a,^ ,< bi, 1 :i^n; e,^ez^


... ^^„)•

Let B; = [a ; ^ f; ^ b ; ]. Let A 1 =[ +1 ]. The events A ; , B. satisfy the condi-


tions of Lemma 1. Hence

(b) P(a,^e,^b;, 1si<_n, and e1 e2


=P(B 1 B 2 • • • B„A,A 2 A,)=det(d).

By (50) we have

(c) dij=0 if i >j+1, d=1 if i =j+1,

and

(d) d„=P(B;)=b;—a;.

If i < j, then

d^ij = P(B1B1+, ... BBA7Ai+1 ... Ai-,)


=P(ar< , < _j, and fi>fi+,> ... >j)
_b„ irr <
(e) =P(b,^ >e,+,> ... >ej>a^)
=(b; — a;)+' + ► /(j — i +1)!.

The theorem then follows from applying (c)-(e) to (b). ❑

4. SOME COMBINATORIAL LEMMAS

Combinatorial methods have proved fruitful in analyzing both one and two
sample empirical processes; see Takäcs (1967). We present a brief sampling
here; it is in the spirit of Vincze (1970). The only application made in this
section is a rederivation of Dempster's proposition (Proposition 9.1.1).
However, in the next section we will use them to derive some new results; see
also Takäcs (1967).

Lemma 1. (Andersen) Let x,, ... , x, denote any n + 1 real numbers that
satisfy: (i) the sum of all n + 1 is zero and (ii) no proper subset sums to zero.
Let X 1 ,. .. ‚ denote a random permutation of the x i 's and let S;
X 1 + •+X1 for1^i^n+l with S o =0. Let

(1) L (the smallest i for which Si = max (0, S,, ... , Sn ))


SOME COMBINATORIAL LEMMAS 377
and

(2) N = (the number of positive terms in S 1 , . . . , S„).

Then we have

(3) P(L= i) =P(N= i) = 1/(n+1) for 0 <_i^n.

Proof. Given any particular realization of X 1 ,.. . , X„ + ,, we now graph


So , ... , S. and place a second graph of S0 ,. . . , Sn just to the right of this.
Note that each Si is at a distinct height. Draw horizontal lines connecting So
to So , S, to S, ... , S. to S; these represent graphs of the partial sums for
each of the n + 1 cyclic permutations of the realized X ; 's. For the situation
pictured in Figure 1 L takes on the successive values 3, 2, 1, 0, 8, 7, 6, 5, 4

Sn

Sn
So 1 \ / so S' \

Figure 1.

and N takes on the successive values 6, 4, 3, 0, 1, 5, 8, 7, 2. Typically, the


successive values of L are just a cyclic permutation of 0, 1, . .. , n keyed by
the location of the maximum Si . Typically, the successive values of N are just
N+ 1 —(rank Si ), which is a permutation of 0, 1, ... , n. The result follows
since each of the n! "cycles" is equally likely. This is from Andersen (1953).

Corollary 1. Let X 1 ,. . . , X„ + , be symmetrically dependent rv's for which


S,, i = 0. Then (3) holds provided

(4) P(S; = 0) = 0 for all 1 < i s n.

Lemma 2. (Takäcs) Let k 1 ,. .. , k„ be nonnegative integers with k, + • • • +


k = k <_ n. Among the n cyclic permutations of (k,, ... , k„) there are exactly
n — k for which the sum of the first i elements is less than i for all 1 <_ i <_ n.
Proof. We need only consider the plot of the 2n points {(i, k, +• • • +k; ): I
i <2n} where kn +; = k; for 1 <_ i s n. The following two plots completely illus-
trate the situation. In Figure 2a the n — k = 8-5 = 3 "successful” cyclic permu-
tations begin with k6 , k8 , and k, respectively; in Figure 2b they begin with
k4 , k 5 , and k6 . (This proof is from Vincze, 1970.) ❑
378 SOME EXACT DISTRIBUTIONS

i Ik i
1 0 1 2

10
2
3
4
0
0
2
31
2 2

//
5 2
6 0 60

5
7
8
1
0
70
80•

d #'.
t^
8 16 1 8 16

(a) (b)
Figure 2.

Lemma 3. (Tusnädy) Let A 0 , A,, ... , A. denote a partition of a probability


space for which P(A)=O for 1 <— j <_ n and P(A 0 ) = 1— nO. For 1 s i <_ n let v;
denote the number of occurrences of A i in n independent trials of this
experiment. Then

(5) P(v,+• • •+v i <i for 1^i<_n)=1—n8.

Proof. Let K denote an integer-valued iv that equals i, provided v> v1 for


all 1 <—j < n; in case of ties, the choice of rc is to be made according to an
independent random mechanism. Let 1r = (z , v, +1 , ... , v,,, v,, ... , r K _,) and
let B = [ v, + • • • + v i < i for 1 s i ^ n]. Given ir, the n cyclic permutations are
equally likely. Thus

P(B)=E(E(P(BI ir)I v,+• • •+v„))


=E(E(1—(v,+• • •+v„)/nly,+• • •+v„))

=E(1—(v,+• • •+v„)/n)=1—(n8+• • •+n8)/n=1—n8

where P(B a) = 1— (v, + • • • + v,)/ n by Takäc's Lemma 2. (This lemma is


stated in Vincze, 1970.) ❑

We think it useful at this point to illustrate the use of Lemma 3 by giving


a second proof of Proposition 9.1.1.
Proof of Proposition 9.1.1. Now B; = C; n Di where

C; _ [exactly n — i observations fall in (a + ib, 1]]


and
— [the interval a + (i —j)b, a + ib contains less than il
D; observations for all I < j - i J
SOME COMBINATORIAL LEMMAS 379
Now

P(C) =1

We will now use Tusnädy's lemma (Lemma 3) to evaluate P(D,I C ). For this
;

purpose, let a 1 ,... , a ; be Uniform (0, a + ib) rv's, and let A 0 = [0, a), A i =
[a +(i -j)b, a +(i - j+1)b) for 1 < -j i. Then

P(D IC )= 1 -ib/(a+ib)=a/(a+ib).
; ;

Thus

P(B1 ) = P(D,I C1)P(Ci)

a
(a+ib)'(1-a-ib)"-'
a+ib (n)

as was claimed. This proof is from Vincze (1970). ❑

Lemma 4. (Csäki and Tusnädy, 1972) If, in the context of Lemma 3, M = M"
denotes the number of indices i, 1 <_ i r, such that v, + • • • + v; = i, then

n
(6) P(M?m) =m! B'° for0<_msn
m

and

(7) P(M =m) =m!I m I(1-(n- m)0)Bm -m <_n.


for 0<

Proof. When n = 1, the assertion is trivially true. We now proceed by induction


on n. Suppose that the assertion holds for n < N where N> 1. If vo > 0 then,
by the induction hypothesis,

1
(a) P(M^mfvo)=m!(N—vo) on [v0 >0]
m N'"

since the conditional probability of any A ,1 <_ i < N, given Aö is 1/N. Establish-
;

ing (a) in the case vo = 0 requires separate treatment.


Let P"(m; 0) =P(M?m) and let p"(m; 6) -P(M= m). We claim that

(b) P"(m; 1/n) = P"_,(m-1; 1/n).

For m = 1 (b) is obvious, since both probabilities equal 1. For m> 1, letting
380 SOME EXACT DISTRIBUTIONS

j denote the (m — 1)st index for which v, + - • • + v; = i holds, both


(n)i(l—n)n ; pi
(c) Pfl(m; 1/n)=^^^ \ll (m-1; 1 /I)

and
^-' n —1 7 ' j \ ^-^-;
(d) 1°n-1(m-1; 1/n)= ' 1 n I P;(m-1, 1/j).
\

But these sums are termwise equal since (;)(1—f/n) _ ("; `). Thus (b) holds.
Hence by (b) and the induction hypothesis,

P(M?mI v0 =0)=PN { m;
NJ

=PN-^ m-1; N
1` by (b)

N-11 1 1
=(m-1)! by induction
(m-1 ) N
=
m!^ N )

m N'°
1
so that (a) holds on [ v o = 0] too.
Now it follows easily from (a) and the fact that vo = Binomial (N, 1— NO),
that
P(M ? m) = EP(M >_ m v0)

ty-k
M! E[( N m vo )) N m o (Nm k k)( 1—N0 ) k (N0 )
l\
N—m)(1—NB)k(N9)N-k m N9
=m!\ m k-o k J N"'

=m! (hem
as was claimed. Verify (7) by easy subtraction. ❑

Note that setting m =0 in (7) gives an alternative proof of Tusnädy's result


(5).

5. THE NUMBER OF INTERSECTIONS OF G,, WITH A


GENERAL LINE

In Section 1 we asked if G. intersected a general line L. We now ask for the


number of intersections.
NUMBER OF INTERSECTIONS OF 6,, WITH GENERAL LINE 381
For 0<d<1 and c >d let

(1) N„(c, d) =the number of times G„(t) =c(t –d), 0 <_ts1


= the number of times U. (t) = n' / Z [(c – 1) t – cd l, 0 <_ t <_ 1

= the number of times G. intersects L


= the number of times G. intersects L'
= the number of times G„(t)= I+c(t -1+d)
(2) = 1V„(c, d).

Theorem 1. (Takäcs; Csäki and Tusnädy) For 0 <_ d < 1, c> 0 and all n ? 1

P(N„(c, d)? i+1)

(3) =i!(.) nc)'(


d+
fl Cj k d» (k–i)
k '(
x( d+—) ( i–d– —) n -k

for i=0, 1,...,(nc(1 –d)).

Upon noting that

(4) [IIU II<n 2 A]=[Nn( 1 ,A) = 0 ],

the Birnbaum and Tingey theorem (Theorem 9.2.1) follows from the case c = 1,
d = A, i =0 as a corollary. Similarly, upon noting that

I [N(A. =i
(5) LI^1 II^
„1 <^ – 1 a/ ]

(the intersection at t = 1 always counts 1 in this case), the special case c = A ->1,
d = 1-1/1t, and i =1 yields Daniel's theorem (Theorem 9.1.2) as a corollary.
Note that II(1–G„) /(1– I) 1 I = IIG/IU.
Moreover, noting that N. (A, 1-1/ A) equals the number of times that
I – G„ (t) = A (1– t) for 0: 1:5 1, which in turn is distributed as the number of
times G. (t) = At for 0 < t -- 1,

(6) PIN„ A,1 –– i fori= 0,...,n

( ) 1
7 ^ as n -^ oo.
A'
382 SOME EXACT DISTRIBUTIONS

Exercise 1. Obtain Chang's theorem (Theorem 9.1.3) from Theorem 1.


Proof. First, note that G. (t) = c(t – d) can hold only for t = d + k/ nc for
some nonnegative integer k. Thus if

^" k k
(8) Ek = I G n (d +—) _–
L nc n

for k = 0,. . . , (nc(1– d)), then clearly

( n k k\" -k
i
-_ )" –d–nc l .
(

(a) P(Ek)=(k)(d+

Now we need only to find the conditional probability of the occurrence of


exactly i of the events E0 ,. .. , Ek _ 1 given that Ek occurred.
Let v; , 1 <_ j k, denote the number of 6 1 's in the interval (d + (k – j)/ nc,
d+(k–j+1)/nc) and let vo denote the number of ei 's in (0, d). Then E k _ m =
[Y_j , vJ = m] n Ek for m = 1, . .. , k; and by Lemma 9.4.4, the conditional
probability that there are exactly i such indices given Ek is

P(exactly i occurrences among E0 ,..., E k _ I I Ek )

=i\k,l I c )1(d + k) '` 1–(k–i)


J d+k/nc)

(9) k 1 l(d+ncj
–tlj i, (nc)' (d+—

k/n

0
0 d 1

Figure 1.

Hence,

P(N"(c, d)>_ i +1)


(nc(1—d))
_ E P([exactly i+ 1 occurrences among E0 , ... , Ek _,] n Ek )
k =i

(nc(1—d))
(b) _ P(exactly i occurrences among E o , ... , Ek -, I Ek )P(Ek ).
k =i
NUMBER OF INTERSECTIONS OF G,, WITH GENERAL LINE 383
Thus the assertion of the theorem follows upon substitution of (a) and (9)
into (b), coupled with a regrouping of the factorial terms. ❑

Exercise 2. (Takäcs, 1971; Csäki and Tusnädy, 1972) Show that if c < 1 is
fixed and d=u/n, u >_ 0, then

u 1
lim P N„ c,— >i +1 I
n-. n —

-i -' e-u-k /c
(10) u+i/c CO (u+k / C )k
£ i=0,1,2,....
C k-i (k—i)!

This generalizes Chang's theorem (Theorem 9.1.3).

Exercise 3. (Csäki and Tusnädy, 1972) Show that if A>0 is fixed and
c = ,1(1—v/n), then

(11) limP(N„(A(1 —n),1— ) ?i +1) = d(1 —f)- ;

for i= 0, 1, 2, ... where


= (m —v) m
d = m
m ov(v) m!/^

This generalizes Daniel's theorem (Theorem 9.1.2).

Exercise 4. (Csäki and Tusnädy, 1972) Show that if u ? 0 and u — v ? 0,


then for all z ? 0

lim P(1N„(1+n - '/ 2 v, n - ' / Z U) >_ n'/ 2 z)


n-m

(12) =exp(- 2(u +z)(u +z—v)).

In particular, with N. (1, n - ' / 2 u) = (the number of times U„ (t) = —u, 0< t 1),

(13) P(ZN,( 1,n - ' /2 u)?n'' 2 z)->exp( -2(u+z) 2 )


(14) = P(II(U — u) + II >— z).
2 1
Hint: exp I\— t(1 — e ) dt =
) xp (-2b(a +b))
Z= o t t(1— t)

for b>0, a + b ? 0. Darling (1960) expands (13) to obtain the 1/v' term.
Csäki and Tusnädy (1972) obtain similar results for the empirical df of a
Poisson sum of independent Uniform (0, 1) rv's.
384 SOME EXACT DISTRIBUTIONS

Smirnov (1939, 1944) proposed N„(1, 0) as a goodness-of-fit statistic [reject


for small values of N= N,,(1, 0)] and proved the v=0 case of Exercise 3.

6. ON THE LOCATION OF THE MAXIMUM OF U.+ AND 0

In this section we will see that the point at which U, is maximized is uniformly
distributed over [0, 1]. Another kind of uniformity obtains for the slightly
different O n process defined below.
Let the smoothed empirical df 4„ be defined by

(1) 6„(e,,:;)= i /(n+1) for 0<_i<_n+l

with (D„ linear on each of the intervals [(i -1)/(n+ 1), i /(n+l)] for 1<i<
n + 1. The smoothed uniform empirical process is

(2) On(t)=v'[(9n(t)— t] for 0<— t<_ 1.

Note that On is a random element on (C, T) with U. a random element on


(D,- 9 ).
Note that

(3) „' (i /(n+1))= fin:; for0^i<_n+1

with Vin' linear in between. We call

(4) 4/„(t)=^[Q3n'(t)—t] for0<t1

the smoothed uniform quantile process. Note that EÖ/„ (i/ (n + 1)) = 0 for all
1<_i<n.
Note that

(5) I1U+nII= sup v„(t)=V max [i /n in:;], —

0st1 0<-inn

so that U„ (t) is maximized at one of the points f„ :o , „,,, ... , „,„ ; and that,
except on a set of measure zero, the maximizing point is unique on [0, 1). We
let

(6) p; = p„ = P[v„ (t) is maximized at


; for 0 i <_ n,

and

(7) K = K. will denote the index of the maximizing en;;.


ON THE LOCATION OF THE MAXIMUM OF U4. AND U: 385
Thus p; = P(K = i). Also, we have that

(8) en:.. denotes the point at which U(t) is maximized.

Note that

(9) IIU II=.^n[K /n —


en:..)•

we also let

(10) 1-a„ = µ( {t: Q_1„(t)> 0 })

where is denotes Lebesgue measure.

Theorem 1. (i) (Birnbaum and Pyke)

(11) Cn: c = Uniform (0, 1) for all n.

Even though 0 < p, < • .. <p,, < 1,

(12) K/n -+ d Uniform (0, 1) as n- X.

(ii) (Gnedenko and Mihalevic)

(13) µ„ =Uniform (0, 1).

We now define

(14) P„ K, C,,:,,, lä, L

as in (6)-(9) above (except we define them in terms of 0 instead of U) and


by letting L denote the number of indices i for which U (4 . 1 ) is >0.

Theorem 2. (Pyke) . and L are uniformly distributed over 0, 1, ... , n. Thus

(15) p =P(K= i) =P(L= i) = 1/(n+1)


; for 0 : in.

Also

(16) Cn :K ^ d Uniform (0, 1) as n -x.

Remark 1. We will follow Dwass (1959) for Theorem 1 and Pyke (1959) for
Theorem 2. We will not, however, prove the claim 0 < p, <.. < pK < 1. See
386 SOME EXACT DISTRIBUTIONS

Birnbaum and Pyke (1958) for this claim, as well as

n-1
Pi = Y 1 n)ji( n - j)„-;-I/nn,
(

(17)
j_-n -i j+ 1 j
i !-1
(18) np -
; exp (-j) — oo,
as n-,
i =I ji

(19)
i -I
np„_ tee-Y exp(-j) (j
;

j=o
+ l)I as

1 n ! n -1 n i
(20) EK/n = 1+ n n+l E=o I
2

and the joint df of K and From (9), (11), and (20) it follows that
n ! n -I n i
(21) EIIvnII-
2n n+1 Y ^-•

The result (13) is due to Gnedenko and Mihalevic (1952). Cheng (1962) shows
that the number J ° J„ of indices i for which U,, ( „:;) >_ 0 satisfies

1 i 1 n \ i 1 / i \ ”-'
(22) P(J=j)=- Y_- I- 11- - I for 1j<_ n.

Vincze (1970) shows that the number M = Mn of horizontal intersections of


the graph of G. with the line y = t/(n9), where n8 <_ 1, satisfies

(23) P(M?m)=m!I IO m , 0<-m<_n.


\m
n/

We will have more to say on this in Section 7. Wellner (1977c) considers

(24) min"", max


isisn I 1sin-I
„ "
1
^'
+ , and max°".
Iin 1

He presents formulas (some known previously) for the joint df of (K, „ :K ) in


all three cases; here K denotes the index at which the extremum is achieved.
Dwass (1959) extends (11) and (13) to arbitrary positive convex combina-
tions of uniform empirical processes. Dwass (1959) also obtains the joint
distribution of the value of the order statistic and the index of the order statistic
at which the maximum VG,,/III is achieved; his methods extend to other types
of supremums.
Proof of Theorems 1 and 2. We will now establish (11) and (13) by appeal
to Anderson's lemma (Lemma 9.4.1). Thus for m much larger than n we define
ON THE LOCATION OF THE MAXIMUM OF U; AND 0 387

i-1 1
(a) X;=G „(( for 1^i<m +1,
i—i i l ) m+l

and note that X 1 ,.. X,„ + are symmetrically dependent. Let L= L. and
. , ,

NE N. be as in Anderson's lemma; and note that

LI I 2n
and k
(b)n m+1 cm +1 m+1 c m +1

for all m ? some m , (note Figure 1). We will verify below that the S,
0

X, + • • • + Xi satisfy

(c) P(S; =0)=0 for 1_5i^m.

O n —I n jumps
m + 1 subdivisions of [0,11

0 1

Figure 1.

Thus Anderson's corollary (Corollary 9.4.1) yields that L and N are both
uniformly distributed on 0, 1, ... , m. Thus

(d) L/(m+1) -* d Uniform (0, 1) and N/(m+1) Uniform (0, 1)

as m - . Combining (b) and (d) yields

(e) ,,:^ = Uniform (0, 1) and µ„ = Uniform (0, 1).

Thus (11) and (13) hold, subject to proving (c) in the next paragraph.
We will establish (c) only when m + 1 is a prime; and thus we note that
(d) holds when m + 1 oo through the primes. We now establish (c). Assume
some S; =0. Letting J =nG„(i /(m+1)), this means J/n = i/(m+1) or (m+
1)J = ni. But m+1 is prime, son divides J. Thus J = n, and hence m + 1 = i.
But 0 <_ i <_ m. Thus we have a contradiction of the assumption that some S i =0.
Thus (c) holds.
The result (12) follows from (11) and the fact that

(f) IK /n— 116n- 111 a.s.o

by the Glivenko-Cantelli theorem (Theorem 3.1.2).


388 SOME EXACT DISTRIBUTIONS

We now turn to Theorem 2. Now the negative of the spacings 5;, ; =


(1/(n + 1))—(en:1 —en . ; _ 1 ) for 1si^n+1 are symmetrically dependent rv's
that sum to zero and satisfy (9.4.4) and

IIv+nII= max [i /(n+1)—en:i ]


I^in+1
(g) = max
Isin+1 j=1

Another application of Anderson's corollary (Corollary 9.4.1) yields P(K = i) _


P(L= i) =1/(n+1) for 0<_i Thusn.
(h) k/(n+1)-*dUniform (0, 1) as n--x.

Finally, Glivenko-Cantelli again yields

I . /(n + 1 ) — en:.I — I I`
<IIG —III+1/(n+1)
n

(i) _*a.s.0-

Equations (h) and (i) yield e„ : ,Z -* d Uniform (0, 1). ❑

Exercise 1. Prove (17)-(20). Finally, show 0 <p l < • • • <pn < 1.

Exercise 2. (Kac, 1949) If µ denotes Lebesgue measure and U denotes


Brownian bridge, then

(25) µ({t: U(t)>0})-Uniform (0, 1).

[Use Doob's theorem (Theorem 3.3.1).]

Exercise 3. Prove (23) using Lemma 9.4.4.

7. DWASS'S APPROACH TO G, BASED


ON POISSON PROCESSES

We let {N(t): t>_0} denote a Poisson process with

(1) EN(t)=At for some 0 < A < 1

(contrary to our usual canonical choice of A = 1). We define

(2) T= sup {t: 101(1) = t}


DWASS'S APPROACH TO G, BASED ON POISSON PROCESSES 389
to be the last time at which N(t) = t. Note that T takes on the values 0, 1, ... , 00.
That

(3) P(T<oo) =1

follows immediately from the SLLN and the representation of the Poisson
process in terms of exponential interarrival times. Following Lemma 1 below,
we show that

(4) P(N(t)<t for all t >0)= P(T =0)= 1 —A.

Now for [ T = n ] to occur we must have [N(n) = n] and [!N(t + n) < t + n for
all 1>0], conditional on [N(n) = n], J(• + n) — n =— N; thus, using independent
increments,

P(T= n)= P(N(n)=n)P(T =0)


(5) = (1— A)(An)" e -A " /n! for n =0, 1, ... .

Moreover, as in Proposition 8.2.2, given that [T= n] has occurred, the n


events of the Poisson process are uniformly distributed on the interval [0, n].
Thus N(tn )/ n for 0 ^ t <_ 1 is such that its conditional distribution given [ T = n]
satisfies

(6) r(- n )/ n I [ T = n] = (the Uniform (0, 1) empirical df G").

Suppose now that f is a function of I%! such that

(7) the value of f is completely determined by NI(t) for 0 <_ t ^ T

Then, for an appropriate function j" of G ", we have

(8) E(.f(N) I [T'= n]) = E(f"(G"))•

Conversely, given a function f" on G. for all n >_ 1 and making an appropriate
definition of fo , we can determine a function f on NJ satisfying (8).

Theorem 1. (Dwass) If

(9) Ef(!N) =(1 Y_


—A)
" =o
a"(Ae "^)",
-

then

(10) Ef"(G ")=a"n!n " - for n=0, 1,....


390 SOME EXACT DISTRIBUTIONS

Proof. Now, using (9) and (5) for (a) and (b),

(1—A) ') n
(a) an(Ae -
n=0

(11) = Ef(rJ) = Y, E(f(N) I [T = n])P(T = n)


n=o

= L EJn(G )P(T = n)
n=0

(b) =(1-A) I Efn(Gn)


n =o
n ( ,1e - ')".
n.
l

Equating the coefficients of the power series (a) and (b) gives (10). ❑

The power of this approach is quite phenomenal. Of course, we must be


able to compute Ef(J). The approach and all but one of the examples that
follow are from the excellent paper of Dwass (1974).

Example 1. (Zeros of U) Let N denote the number of times that !(t) = t


for t>0; clearly N satisfies (7). We let N. denote the number of times that
G n (t) = t for 0< t<_ 1, with N0 — 1. By repeatedly applying (4) we see that N
is a Geometric (A) ry (note Fig. 1) for which

(12) P(N?k)=A k for k=0, 1.....

Equation (11) can thus take the form


n n Ak
^ o P(Nn >_k) nl (,1e - ^)"(1—,1)=P(N>k)=A k = (1—A)
(1—a)
n n—k
(a) —o (n — k)! (A e ") n (1— A) by Lemma 2 below.

Figure 1. Arrows mark the renewal points leading to the geometric distribution of N.
DWASS'S APPROACH TO G. BASED ON POISSON PROCESSES 391
As in (10), we conclude that
n n_k ! I k
(n—k
(b) P(N"^k) (n—k)! nn )! \n/

(13)
=k!\k/ \n) k,

which agrees with the fact that N. = Nn (1, 0) in (9.5.3). It is an easy application
of Stirling's formula that

(14) P(Nn / /?x) -exp( —x 2 /2) as n -^oo

for all x >_ 0. ❑

Example 2. (Ladder points of U ") Let L denote the number of ladder points
of IN( t) — t for t> 0; that is, L denotes the number of times that N( t) — t achieves
positive maxima which exceed all preceding maxima (note Figure 2). Since

t
Figure 2. Arrows mark the ladder points, while J„ J2 ,.. . indicate the newly achieved excesses
over previous maxima.

the ladder points are also "renewal points", it is again clear from (4) that L
is a Geometric (A) ry satisfying

(15) P(L^k)=,k k for k = 0,1,....

Thus the calculations of Example 1 show that the number L n of ladder points
of the empirical df G. satisfies
n 1 k
(16) P(L">k)=k!()(—)
k for k^0

while
(17) P(L n /Vi> x)-+exp( —x 2 /2) as n -*co

for all x ? 0. ❑


392 SOME EXACT DISTRIBUTIONS

Exercise 1. Verify (14) and (17).

Proposition 1. For all x> 0

(18) P(sup[t\1(t)—t]?x)= Y n —x (Ae -A ) „ e" x (1—,1).


t>0 „>x n.

Exercise 2. Prove (18). Then use (18) and Lemma 2 below to rederive the
Birnbaum and Tingey formula (9.2.1).

We define the excess J of i by

th value of FJ(t) — t at the first instant N(t) > t


(19)
— 10, if the above event fails to occur.

Note that J = I, as shown in Figure 1; we will call 1,, 2 the excesses of


' , ... ,

N. Clearly, I,, ... , I,j [N ? k] are iid because of the strong Markov property
of the Poisson process.

Proposition 2. The conditional distribution of J given that [T> 0] is Uniform


(0, 1) ; symbolically,

(20) JI[T >0]=Uniform (0, 1).

Proof. Let 0< x < 1. Let f, denote the conditional density of J given [T> 0].
Let f„ denote the density of the nth waiting time i „. We must show

col n—I
(a) fi(x)= i I1— If„(n—x) /P(T>0).
niL n—xJ

Now al„ = n —x happens with "probability” f„ (n — x) and guarantees that


N(n—x)—(n—x)=n—(n—x)=x; moreover, Il(n—x)—(n—x)=x is the
value of J provided [N(t) < t for 0 <_ t <_ n — x], and by Sukhatme's proposition
(Proposition 8.2.1) and Daniels's theorem (Theorem 9.1.2) this latter event

(n—x)/(n-1)
n—x n—x
slope —1
slope 1 — n — 1
7" transforms to

0 n—x 0 1

Poisson picture G._, picture

Figure 3.
DWASS'S APPROACH TO G. BASED ON POISSON PROCESSES 393
has probability [l — (n - 1)/(n—x)]. Thus

(b) .fi(x)=nZt x)" -de-"("-'n/P(T>0)


n—x (nX nl)!(n—
(n - 1+1 —x) n -2 (Ae-")"-`Ae-"('-")/P(T>0)
= (1—x)
n=1 (n-1)!
= A e" c ' - " ) e -A(I- " ) /P(T> 0) by Lemma 2 below
(c) = A /,1 =1= (the Uniform (0, 1) density)

as claimed. ❑
^

We define the ladder excesses J,, J2 ,... as in Figure 2; thus J; is the amount
by which the excess at the ith ladder point exceeds the excess at the (i —1 )st
ladder point. Again note that J = J, in Figure 2. It is again clear that
J,, ... , Jk I [ L> k] are iid rv's.
We now summarize these distributional results we have established for the
excesses I,, IZ , ... and the ladder excesses J,, J2 , ... .

Proposition 3. Let N and L be as in Examples 1 and 2.

(i) Given that N >_ k> 0, the rv's I,, ... , Ik are conditionally lid Uniform
(0, 1).
(ii) Given that L> k> 0 the rv's J,, ... , Jk are conditionally iid Uniform
(0, 1).

Exercise 3. It follows from above that

(21) M =sup[^I(t)— t]=6, +• • • +CL


t >o

for independent Uniform (0, 1) rv's f',, 62 ,..., independent of L. Verify that
n n n
P ,max/ = 1)'(x — i)/n!
1 r=o()(-
i

Use these two results to give an alternative proof of Proposition 1.

Example 3. (Excesses and ladder excesses of n(G" — I)) Let I, In2i ... be
the excesses and 1n1 , J" Z , ... be the ladder excesses of n (G" — I) [just replace
rkJ(t) and t by nG "(t) and nt in Figures 1 and 2, respectively]. Let N,, and L"
be as in Examples 1 and 2, respectively, and let

(22) Mn= nuIG, — III =Jn1+ ...


+J"c. and Sn=lnj+...+InN^.
394 SOME EXACT DISTRIBUTIONS

Then for the conditional distribution of the ladder excesses we have

(23) f,, 1 ,... , J" k I [L" _


> k] are independent Uniform (0,1),

so that

(24) M"=1+ ...


+Cc,,.

This yields

(25) P(M"/f>t)-> exp(-2t2) as n+oo for all t>_0.

Likewise

(26) In ,, ... , Ink [N" ? k] are independent Uniform (0, 1),

so that

(27) Sn = l+. SN,

and
(28) P(S"/V > t)-+exp (-2t2) as n-^oc for all t-0. ❑

Proof. Now for a Bore! subset B of Rk

1— A)
n l (Ae -'k )"(
Y- P([(J" i ,...,J"k)EB]r[L"?k]) n"

(a) =P([JI, ...,Jk)EB]n[L^k])


by Dwass's theorem (Theorem 1)
(b) = P((e I , ... , ek ) E B)P(L>_ k) by Proposition 3

(c) =P((e1,...,fk)EB) £ P(L">_k)n^(Ae -z )"(1—A)


"= o n.
by Theorem 1,

so that equating coefficients gives

(d) P([(J"1, ... , J"k) E B] n [L" ? k]) = P((e1, .. . , 6k) E B)P(L" ?k).

This proves (23), from which (24) is immediate. Finally,


M" 1.}....
Y
+6L L.
by ( 24 )
V L" T

= 12+oP (1) by WLLN;


J
so that (17) completes
L the proof of (25). ❑
DWASS'S APPROACH TO G,, BASED ON POISSON PROCESSES 395
Exercise 4. Verify the analogous results (26)-(28) for S„ and the 1„ k 's.

Dwass also considers

(29) N(c) = the number of times that *l(t) = ct for t>0

and

(30) L(c) = the number of ladder points of rl(t) — ct for t > 0

for c >A. Clearly, N(c)= Geometric (A /c). Thus

(31) N„ (c) = the number of times that G„ (t) = ct for t>0

satisfies
/ n 1 k
(32) P(N„(c)>_k) =k!1 for k=0, 1,... and c> 1.
k,(nc)

This is an alternative proof for Daniels's theorem (Theorem 9.1.2) and a special
case of Csäki and Tusnädy's theorem (Theorem 9.5.1). Dwass also considers

(33) M(c)= sup[(t)—ct] and M„(c)°nhIG„— cIlI


>ot

for c> 1, as well as excesses and ladder excesses for c> 1. He also makes
some limited remarks in the two-sided cases.

Exercise 5. Show that

P([ — r «1(t)—t <s for0is T]u[T =0])


(34) =P(M<r)P(M<s)/P(M<r +s).

Example 4. (Crossings on a grid) Bergman (1977) considers the smoothed


uniform empirical df G„ that equals i/ (n + 1) at each 1; and is linear in
;

between. A crossing from below is said to occur in the interval (i/ (n + 1),
(i+1)/(n+1)] for i = 1,...,n if

(35) „:;<i /(n+1) and 1;„ ;+ ,>_(i +1)/(n+1).

We let K. denote the number of such crossings from below. A crossing from
above occurs in (i/(n+1), (i+1)/(n+l)] for 1Ci <_n -1 if

(36) „:;>i /(n+1) and f„ + ,<— ( i+ 1 )/(n+1).


:;

A crossing is defined to be crossing from below in (1/(n+1), n/(n+1)] or a


396 SOME EXACT DISTRIBUTIONS

crossing from above. For technical reasons, a crossing from below on (n/(n +
1), 1] is not called a crossing. Let M. denote the number of crossings.

Exercise 6. (Bergman, 1977) Derive the exact distributions of K„ and M.


Then show that

(37) P(K„/V7i? t)-> exp (—e 2 t 2 /2) as n-^ x for all t?0

and
(38) P(M„/V >_ t) -> exp (—e 2 t 2 /8) as n-x for all t ? 0.

Some additional results, from a different point of view, are stated in Darling
(1960).

Lemmas

Lemma 1. Let X,, X 2 ,. . . be a stationary sequence of random variables each


of which assumes the values —1,0, 1..... Define

(39) M(X I , X2 , . ..) = max (X I , X,+ X 2 , ...) and M + = max (0, M).

Suppose that EM <oo. Then

(40) P(M<0)=—EX,.

Proof. Now M(X,, X2 ,. . .) = X 1 + M + (X2 , X3 , ...), so

(a) EM=EX,+EM+ and EM + =E(M + 1 [Mzo] )

Thus
(b) EM=E(M+1[m ot )—P(M<0),

since M <0 implies M = —1. Hence,

(c) E(M+1LM^o])—P(M<0)=E(X1)+E(M+l{M .o])

which completes the proof. ❑

Assertion (4) holds since

P(N(t)<t for all t>0)= P(IN(l)-1<0,N(2)-2<0,...)


=—E(N(1)-1)=1—A.

follows from Lemma 1.


DWASS'S APPROACH TO G. BASED ON POISSON PROCESSES 397
Lemma 2. We have

(41) (n+u) " - k


(Ae-^)"=.1ke /(1 —A)
n=k (n—k)!

and

(42) (k+u) w +( "n (n—k)!


u)
-k-1 (A e -a )n =AkeAu
n=k

for all k=0,1,2,...,0<—A<1, —<u<.


Proof. First suppose that u >_ 0. Let l be the Poisson process. For k =
Proof.
0,1,2,...,

P(IN(1) = k) = A k e -A /k!
_ Y_ P(t\l(1) = k, t — (u + 1) is last crossed at height n)
n^k

(a) _ Y, P(NI( 1 )=klN(n +u+1) =n)P(101(n +u+1) = n)(1 —A)


nzk

by (4). Under the condition N(n + u + 1) = n, 1N(1) is binomially distributed


with parameter p = 1/(n + u + 1) and sample size n. Hence (a) equals

(n) (n+u)" -k [A(n +u+1) "] e -acn +u+')(1


(b) Y_ —A).
n ^k k (n+u+1)" n!

After rearranging, this gives


(c) AkeAu= E ( n+u) "-k
(Ae-A)n
1 —A nzk (n—k)!

which proves (41) in the special case that u>_0. Now write

(d) (n+u) "- k = (n+u)" - k - '(n—k+k+ u).

Hence, from (41),


-k-1
A k e Au ( n +u ) n -k - ' (n +u)n
(Ae ')" +(k
(e) 1—,1 n +1 (n—k-1)t +u)n^k (n—k)! (Ae-")".
The first expression on the right-hand side can be evaluated by (41) to equal
-A)Ak eacu+ll /(1 —A).
(f) (A e
398 SOME EXACT DISTRIBUTIONS

Hence considering the second expression on the right-hand side,

) n k -
(n+u 1
-a n
(k+u) Y_
"aA, (n — k)! (^ e)
=(Akean—(A e-A)Akeacu+'))/(I—A)

=Ak e al.
(g)

which proves (42) also in the case that u ? 0. It follows that the left-hand sides
of (41) and (42) have absolutely convergent power-series expansions in u for
all u. Hence from considerations of analytic functions, both sides of (41) and
(42) are equal for all u. ❑

8. LOCAL TIME OF UJ n

Let

(1) L(x)-1n-1/2 {number of times U(t) =x,0<t^1

Thus L. serves as local time or "occupation density" for the empirical process
U,,.

Proposition 1. If µ denotes Lebesgue measure on [0, 1] and AG R is any


Borel set, then

(2) µ{0-t<_1:v„(t)EA}=2 J D_„(x)dx.


A

In fact, for any Borel function f,

(3) f
f(Un(t)) dt=2
=
J f(x)^-n(x) dx.

Proof. For an arbitrary set A and Lebesgue measure we have

{0<_t<_1:IJ n (t)EA}= J dt

n
dt
i =^ {C+n:i i
751 C :i +l.vn{!}EA}
5

_ dt
i =0 {f;ter<^;+Jn(i/n—t)EA}

LOCAL TIME OF U, 399

J f
n i
= — n2ds letting s=^^- -t)
i= 0 {f ,:j-5i/n—s/v/n<f.:j.j,srA) n

= n ' /2 — ds
=0 {Jn(i /n —fn: + ^)<ssJn(i /n -2^ : ),seA}

—1/2 n
= (Jn(i /n -f). ✓ n(i /n -))(s) dS
i=0 A

(a)
—1j2
A { i=0
— f n ds.

(b) = Jn A
- ' /z {the number of times U(t) equals s} ds

(c) =2J L(s)ds.


A

Note that (b) holds since the graph of U n (t) is a series of n + 1 line segments
of slope —Vi, and (a) simply counts the number of these segments that contains
each fixed s in A. ❑

Exercise 1. Prove (3).

Exercise 2. Recalling Section 9.5, show that

Zn —'/2 AV (1, n —' /2 x)


f /2
in ' Nn(1 n -'/2IXI) for x<_0.
for x?0

Example 1. With f(x) = x 2 , we have

[Un(t)] 2 dt = 2 x2Ln(x)dx,
(4) f

so that the classical Cramer-von Mises statistic may also be viewed as the
second moment of the local time (density) 2Q_ n . [Set f= 1 in (3) to see that
the word density is appropriate.] ❑

Exercise 2. Show that

/2
2L n (x) = n I(nil2(+/n—f^:^+^)
— '

.n ij2 (i/n—^'^:^))(X)
i=0

i=o
400 SOME EXACT DISTRIBUTIONS

Exercise 3.Use the result of Exercise 2 together with the joint density of two
uniform order statistics to show that, for x ? 0,
n n i i „ -i
(6) E(21-„ (x)) = n E i (-- n-'^zx) \ 1--+n
n
^izx
= (1i +I () n ) /

and hence that

(7) xZ du
E(20 „(x})^ f 11— u) exp ^— 2u(1—u)
o 21ru
'

exp( — Y 2/ 2 ) dy
2x

(thanks to M. Gutjahr and E. Haeusler) and use the DKW inequality for

(8) E (21L„(x)) < 1 29 exp (-2x 2 ) for all n>_1.


x

Exercise 4. Show that for any real x and for all y > 0

(9) P(L„ (x) > y) -> exp (-2(x + y ) 2 ) as n -> oo

(10) =P(II(U—x)+11>Y)=P(ßw+I!>x+y).

Use Exercise 2 and Section 9.5.

Open Question 1. Prove that

Q-„ = Q_

where IL denotes the local time of Brownian bridge U, satisfying for Borel
sets A,

µ{0<—t<_1:U(t)EA}=2J L(x) dx.


a

Show that Exercise 4 yields convergence of the one-dimensional laws of L„.

Open Question 2. Find the exact and limiting distributions of L„ =


sup_ <x<oo L. (x).
Open Question 3. Establish (functional) laws of the iterated logarithm for
and the process IL,,; the latter will probably be related to the results of
Donsker and Varadahan (1977).
THE TWO-SAMPLE PROBLEM 401
9. THE TWO-SAMPLE PROBLEM

We do not wish to be heavily concerned with two-sample problems in this


book. Yet, the limiting distribution of the one-sample statistic D is easily
handled by the methods of this section; equations (4) and (9) provide an
interesting alternative to Example 3.8.1, and Exercise 9.2.1.
Let X 1 ,. . . , X,,, and Y,, ... , Yn denote independent random samples from
continuous df's F and G, respectively. Let F. and G n denote the empirical
drs. We would like to test the hypothesis that F = G. We will thus seek the
null hypothesis distributions of

(1) Dmn = ltFm — Gn lI, Dmn — II(IF m —Gn)+II, and D mn = II(^m — G) It .

Theorem I. (Gnedenko and Korolyuk) For integral values of nd we have

2n 2n
(2) P(D nn^ d)=
(n +nd) (n)
and

(3) P(Dnn?d)
If 2n I 2n 2n
2 n +nd n+2nd ) n+3nd )

2n 2n
+(- 1)J+1
(n+Jnd)ll( n '
where J = (1/d).

Theorem 2. (Smirnov) For all x> 0 we have

(4) P( n/2D„n ? x) -> exp ( -2x 2 ) as n -' oo

and

(5) n
P(^2D nn >_x)->2 Y ( -1) k + ' exp (-2(kx) 2 ) as n -+oo.
k=1

These two rather elementary theorems will be proved at the end of this
section.
Note the identity

(6)J (Fm v,„ ( F)— / v(G)+,JN F_


( G)

where N = m + n

402 SOME EXACT DISTRIBUTIONS

for independent uniform empirical processes U,,, and N„. Thus

(7) D„ 1 JU ,” 0/„ 1 * I) for F=G continuous,

where for the Skorokhod versions of these processes we have

4/„ -1)/ as mnn goo


(8) II[ v aJ„,— J —vt JII^°0
using the triangle inequality. Since

n m
(9) N U— Jv=u for all m, n

by Exercise 2.2.6, we thus have


Theorem 3. (Doob)

li as m n n-->oo.
(10) m+n D mn -'d IIU #
Combining this last result with (4) and (5) gives the limiting distribution
of as already recorded in Example 3.8.1.
Proof of Theorems 1 and 2. Now the number of distinct orderings of n X's
and n Y's is (n„ ), and these are equally likely. Thus following the graph of
n[F„ (t) —G„ (t)] from t=0 to 1=1 is the same as taking 2n steps from left to
right starting at the origin, where at n of these steps we also go one step up
and at the other n we also go one step down. (See the dark graph in Fig. 1.)
Note that there are („n ) such "random walks” or "paths,” and they inherit an
equally likely distribution. Thus

(a) P(Dnn'— d) = Nnn/( nn),

where N'.„ denotes the number of such paths that reach a height of at least
nd. But any path that reaches height nd does so for a first time, and when the

2nd

nd _ W

first time height nd was reached

Figure 1.

THE TWO-SAMPLE PROBLEM 403

path from this first time on is reflected about height nd (see the dotted part
of the graph in Fig. 1) it will eventually end up at height 2nd. We thus note
that there is a one-to-one correspondence between all paths from (0, 0) to
(2n, 0) that ever reach height nd and all paths from (0, 0) to (2n, 2nd). But
the latter set of paths correspond to n + nd steps up and n — nd steps down;
there are (nd) such paths. Thus

_ 2n
(b) Nnn
n + nd .
Plugging (b) into (a) gives (2).
Now when m = n, formula (2) and Stirling's formula (Formula A.9.1) give

P( mit/(m+n)Dm n >x)=P( n/2D^n >—x)


=P(D n -x 27n)
n

_2n 2n )
with equality when x 2n is an integer
n+ 2nx)/( n
n! n!
(n+ 2nx)! (n— 2nx)!
n n+l/2 e -n
(+ 2nx)n+ßx+1/2 Q -(n+ 2nx)
n

l nn +1/2 e -n
(n — 2nx)n- 2nx+1/2 e -(n-fix)
x

_ 1 1
n+ 2nx+1/2 ( rx\^x+l/2
n-
1 N
+ V / 1+VnL

✓2x

C 1+ Vn^( 1 V1l) J I(1+^) f^-

/ V GX \ v ^x

XL t 1—^
I
n( 1+ ✓ ]-fxr /i- \mal ✓ x
)
(1-2n2 (
J L` 1
(e-2xz)-,(efx)-fx(e-"x)
L ✓2x

(c) = e -2 x 2
.

This establishes (4). The proof of (3) and (5) is outlined in the next exercise.

Exercise 1. Use Figure 1 and a reflection principle [see Fig. 2.2.3 and equation
(b) of the proof of (2.2.7)] to establish (3). Then use (3) to prove (5).
CHAPTER 10

Linear and Nearly


Linear Bounds on the
Empirical Distribution Function G,

0. SUMMARY

Consider the problem of bounding G. between a pair of straight lines through


the origin as in Figure 1. Since G. (t) = 0 for 0:5 t < en: , , we must be content
with a lower bound over the interval [e,,,, 1].
In Section 9.1, we derived Daniels's (1945) result that

(1) P(1FG„/I^(> A) = I/A for all A ? I and all n

as well as Chang's (1955) exact expression for P( II I /G„ II, >_ A). These, when
coupled with a bound on the latter (Inequality 1 in Section 3), provide upper
and lower bounds on G „, respectively; in particular they imply that for any
e >0 there exists a constant M E such that both

(2a) G (t) < M t


E for all t

and

(2b) G„(t)?t/M for all : - e.1


E

occur with probability exceeding 1— E. This last pair of results, very useful
technically in establishing --* d of many statistics [as in Pyke and Shorack
(1968)], is recorded separately for greater visibility in Section 4. We refer to
(2a) and (2b) as establishing the existence of "in probability linear bounds.”
In Section 1, we examine from an a.s. point of view just how small and
how large en;k (with k fixed) can be; see Robbins and Siegmund (1972) and
Kiefer (1972), respectively. This seems an appropriate chapter for these results
404

SUMMARY 405
at
1
(n-1)/n

••

3/n
2/n t/X
1/n
0 9
^na Ec:2 (n:3 •N En:n-1 Erin 1

Figure 1.

because as one corollary we learn that

(3a) lim IIG,/111 at lira 1/(nf, ,) =co a.s.


:
n-.00 n—m

and

(3b) li n 11I/Gn 11 f ^ li m nen:z = oo a.s.;

thus

(3) a.s. linear bounds on G n do not exist.

There are three ways around this.


The first is to allow the slopes of the lines to depend on n; this problem is
solved in Section 5 where the upper-class sequences for 116,/ I11 and III/6.11 f:
are characterized. See also Exercise 10.7.1.
The second approach is to use for the upper (lower) bound a "nearly linear"
function that has infinite (zero) slope at t = 0. Thus we find no matter how
small a fixed positive 8 and e are, we can a.s. find an n s, £ such that

(4a) G, (t) < (1 + e) t' -s for all t and all n ? EE,s, .

and

(4b) G,(t)> (1—s)t' +b forall t>_^ n ,, andall n? nE,s,,,.

That is, 6,, is a.s. squeezed between a pair of nearly linear functions; see
Wellner (1977b) and Fears and Mehra (1974) for applications. Functions
differing only logarithmically from linearity [see (5a, b)] are possible, but the
406 BOUND :
THE EMPIRICAL DISTRIBUTION FUNCTION G.

bounds in (4) are more convenient in most technical applications. Such


applications include the establishment of a SLLN and a LIL for linear combina-
tions of order statistics n ' cn; X„ . Because of their technical usefulness,
- :j

the results similar to (4) are recorded in a separate Section 6 for greater visibility.
The best upper and lower a.s. bounds we obtain are

(5a) if 4i \ and Eç!i(f) < oo, then


a.s. G. <( 1 I4,11 + 8 )/ r, on (0, 1) for n ? n,,^,
and

(5b) a.s. 6„> (1 -e)t/1og 2 (e e /t) on [„,,, 1] for n> n,.

Of course, these immediately translate into lower and upper a.s. bounds on
G'. In particular, (5b) and (5a) yield

(6a) a.s.G'^(I+e)tlog 2 (e e /t) on [1/n, 1]forn?n,

and

(6b) if t0 \ and Efi(e)<oo, then a.s.G„'?^i/(^^I+i^l +e)


on (0, 1) for n ? n,, 0,.

Note that E/r(e) < is equivalent to f /i '(t) dt <oo, and this may be helpful
-

in choosing a suitable +(i - directly.


The third approach is to truncate off near zero: thus we consider MI6„/ 111 ä ,
III/6„ U ä, , JIG/ I fl ä , and III/G Ij ö as aJ1 0. The key range is a=
(c log t n )/ n with c E (0, oo). Over this interval of c c (0, co), the a.s. tim sup
of the four rv's above all decrease from oo to 1. Some of the detailed results
of Weisner (1978b) are recorded in Section 5. Exponential bounds for the four
rv's as well as identities of the type [ ti I /G„' >_ ] _ [ lIO„/ I /A >_ A] for
A ? 1 are found in Inequality 10.3.2.
We now wish to highlight as a seemingly new theme a result that is actually
so tied to the previous results that its proof uses the results of Section 1 and
it in turn is used to establish the upper bound (5a) in Section 6. Thus we ask
for which functions i' does the extended Glivenko-Cantelli conclusion I(6„ -
I)Vrll ->a.s. 0 hold? This question is answered in Section 2 where we show that
if

(7) t' is \ on (0, 2] and symmetric about t = 2,

then we have the characterization, see Lai (1974),


( 7 a) lye II(G„ -I)q/1I = {^ a.s. according as Egi(e) _ ^^ +y(t dt = j <^
)

(7b) o l
ALMOST SURE BEHAVIOR OF e k WITH k FIXED 407

This yields either II Iqr Ij or oc as the a.s. lim sup of IIG„iIi 1I for 4r \. In Section
6, we establish the parallel results that

(8a) lim sup G°1(t)—t = 1 a.s.,


„yam v(„+l)<„/(n+n t(1— t) log t [1/t(1— t)]

while

(8b) lim sup ? lim o f„ , = oo a.s.


:

n-+ao 1/(n+l)^t^n/(n+l) t( 1 — t) n-+n

Wellner's results for truncation to II II ä” ^” carry over to this problem also; see
Section 5. Also found in Section 5 is Chang's (1955) result that

and )II^°0 asn-oo


(9) II Il —I)Ila"a"APO 11(1 1 I
for a„ [ 0 and na n -* oo.
These then are the main results; however, other points arise along the way.
Upper-class sequences for II n ° (G n — 1)/[1(1— I )]'-" ^^ are characterized a la
Mason (1981b) in Section 5; v=0 corresponds to the basic upper bound
considered above. In Section 1 a.s. upper (lower) bounds for 4n:k with k fixed
were established (stated); in Sections 8 and 9 we state analogous results for
G(a) and en:k” for various ranges of a,[0 and kn . These are from Kiefer
(1972).

1. ALMOST SURE BEHAVIOR OF ,r n , k WITH k FIXED

In a number of our theorems a key role is played by the first few order statistics.
In this section we will characterize both how small (see Theorem 1) and how
large (see Theorem 2) the sequence e n:k may be.

Theorem 1. (Kiefer) If k >_ 1 is a fixed integer, if na„ -* 0, and either a n


or na n \‚ then

(1) P(„,k <_ a n i.o.) = j according as Y (nan)k


— <oo
t n=1 n o0

The convergence part of (1) does not require a n or na„ to be \,. Recall that
[ n G n (an) ^ k] = [ Snk ^ a.1.

Exercise 1. (Kiefer) Let k> 1 be a fixed integer. Define


1
a* =-- {log n log e n.. log J _ 1 n(log1 n)1+E}-1/k
408 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.

for any fixed integer i ? 1. Use Theorem 1 to show that

0 0
(2) P( fn :k _< a„* i.o.) = i according as e
= {
<
> .
Set i =1 to conclude that

log (1 /nfnk ) I
(3) hrn °o =k a.s.;
ng2 to n k

and this is to be interpreted as a statement about how small e n :k can be.

For arbitrary F we find it handier to phrase our result in terms of maximums,


rather than minimums.

Corollary 1. Let X 1 ,.. . , X„ be iid F. Let k >_ 1 be a fixed integer and let
nP(X > c)-* 0 with c,,/. Then (the convergence part does not require c„ to
be A)


(4) P(Xn:n —k+l > Cn i.o.) = 01 as
n_lJE —
n k- 'P(X > c„)k
0. =

Note (1.1.15) for the proof, and note (10.7.9) for contrast.

Theorem 2. (Robbins and Siegmund, when k = 1) Let k >_ 1 be a fixed


integer. If a„ \ and if either na„ A or lim„.., na „/log 2 n >_ 1, then
)k
(5) P(fnk> a n i.o.) = j according as Y (na exp (—na n ) _ {<oo
t „_, n ”

Oo
Note that [nG„(a„) < k] _ [^„ k > a„].

Exercise 2. (i) Use Theorem 2 to show that

nen:k
(6) h rn log en = 1 a.s. for each fixed integer k;

and this is to be interpreted as a statement about how big n :k can be.


(ii) Give a direct proof of this result. (See Theorem 2 of Kiefer, 1972).

Exercise 3. (Robbins and Siegmund) For i 3 define

-1
na „*=1og 2 n + k log 3 n+ Y_ logj n+(1+e) log ; n.
j=3
ALMOST SURE BEHAVIOR OF f„, k WITH k FIXED 409
(The sum is interpreted as 0 unless i > j.) Then

0 >0
(7) P(en:k ? a* i.o.) = 1 according as e = <0”

Exercise 4. Rephrase Theorem 2 for X n:k (as Corollary I rephrased


Theorem 1).
Proof of Theorem 1. This theorem is from Kiefer (1972). Suppose
E ° (na n ) k / n < oo. Since na„ -> 0, Feller's inequality 11.8.1 implies that for n
sufficiently large

(a) P((n-1)Gn- ► (an)>-k-1)^2(k-1)an-'(1-an)n-k--2(na„)k-I

Now observe (Exercise 6 below asks the reader to verify the trivial details)
that since a n -*0

(b) [nGn(an)^ki.o.]c[(n-1)G„-,(an)-ak-1 and „<_a n i.o.].

But, using independence and then (a),


Co
Z P((n-1)G„_,(a„)?k- lande„<_a„)
n=I
m
a n P((n-1)G n _,(a n )>_k-1)
n=1
k
(nan)
s2 `L,
E-. a(na
n n )k-1-2^
L+
n=I n=I n
< 00;

thus the Borel-Cantelli lemma implies that the rhs of (b) has probability 0.
Hence the Ihs of (b) has probability 0.
Suppose E; (na„ ) k / n = oo. Then Y_w (2'a 2i ) k = oo by Proposition A.9.3 if
na n \‚ and by a minor modification of the proof of Proposition A.9.3 if a„ \.
Define

A; = [at least k of ^Z^+1, ... , 62i +i do not exceed a 2 1 ].

Now for j sufficiently large

P(A) = P(Binomial (2', a 2')? k) ^ P(Binomial (2', a 2i) = k)


1
2 j) k using na„ ” , 0,
kß (2'a
410 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.

thus the independent events A, satisfy P(A3 ) = oo, so Borel-Cantelli implies


P(A ; i.o.) = 1. But [A; i.o.] c [nG n (a n ) ? k i.o.]. ❑

Exercise 5. Show that if a n ", 0, then [nG n (a,,)>_ k i.o.]c [(n —1)G n _ 1 (a n )?
k —1 and en c a n i.o.] for any fixed integer k 1. ?

"Proof' of Theorem 2. The proof of this is very long. See Robbins and
Siegmund for k = 1. That proof has been modified for k> 1; though the proof
is given in Shorack and Wellner (1977), the result is just stated in Shorack
and Wellner (1978). See Frankel (1976) for an alternative method. Galambos
(1978, p. 214) gives nearly the same result for k = 1. Our proof of Chung's
theorem (Theorem 13.1.2) contains much of the same flavor. ❑
Proof of Corollary 1. Now by the inverse transformation (1.1.15)

P(Xn:n -k +l> Cn 1.0.)=P(tn:n -k +1> F(Cn ) l.o.)

= P(;tn:k <_ 1— F(c) i.o.) by symmetry of 4 about z

=
1 0 as <co

by Kiefer's theorem (Theorem 1), using n(1— F(cn )) \ 0. ❑

2. A GLIVENKO—CANTELLI-TYPE THEOREM FOR 11(G, — I)'/II

We consider positive functions

(1) /i \ on (0, 2] and symmetric about t = Z,

and we seek to characterize those cur for which II(G n — I)ryII 0 as n -oo.

Theorem 1. (Lai) Let ii satisfy (1). Then

0 a.s•
(2) lim ^^(G n —I)cufl = accordingas J 1 a[i(t) dt= j <^.
n -= 'X a.s. o l
Corollary 1. If cli "z,, then

( 3 ) li W IIGnII = 100
I
^ accordingas
a.s. g o1 (t) dt = <0.

Proof. By symmetry we may suppose cli \ on (0, 1). Suppose first that
j r (t) dt <co. (We will follow Wellner, 1977a.) Note that

(a) II( G n—I)41I^IIGnc1'11ö+11hP11ö+4,(8)116n—Ill.


A GLIVENKO-CANTELLI-TYPE THEOREM FOR II(G„—lh"lI 411

Now tIi(t) <_ f ä i/r(s) ds for all t implies


(b) 11I(P11äß q(t)dt<E for8=some S . E

Also

IIGiPIIö = sup n - ' E 1 (0,,j(6,) 4G(t)


o<t^s i =I

n
: n 11
(o,sl(^r)4(eI)

s
^a.s. (t) dt as n -+ oo by the SLLN
0

<e by(b);

thus we have

(c) lim IIG„r[rII0 -- s a.s.

Also

(d) ii(S)IIGn-III -* a.s. 0 by the Glivenko-Cantelli theorem.

Plugging (b)-(d) into (a) gives

(e) TIm IRG„- I)çlijI +E+0=2E a.s. for all E>0;


h-W

that is, II (G„ - I) i/1 II a s 0 as was to be shown.


Suppose now that 1ö t/i(t) dt =oo. (We follow a suggestion of P. Gänßler.)
Now

II(Gn — I)iPII '(n:i){Gn(en:t) (mn(en:i —0)J/ 2

Since for all M>0

-JP() -Mn) byA.1.10


M M n=o
(g) _ p ( 4 (en) > Ml 1

-o \(2n) 21 '
412 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G,,

the other Borel-Cantelli lemma gives

(h) lim 0( 6^) = 00 a.s.


n-.00 n
Thus (f) implies

(i) lim II(G ^ -I)4iII=co a.s.

The first part of the corollary follows immediately since

U) I Il G n'Y II — IW II II ( 6 n — ')' I '

the second part of the corollary follows from

(k) II6n^II>— +G(Sn:1)


n

and the argument given in (g)-(i). U7


Exercise 1. (Lai, 1974) Let T. = sup {n: II (G, - I)o 11 > E} for (,(t)
[t(1 - t)J -S with 0< S <1 fixed. [Theorem 1 implies that P(T <oo) = 1 for all E

s >0.] Show that ETE <oo for 0 <r<S ' -1. -

Exercise 2. (Lai, 1974) Derive an analog of (2) for him,, -o 51 (6,, - I) 2 1dI.

3. INEQUALITIES FOR THE DISTRIBUTIONS OF II6^/III AND


III/6nIIr,. ,
:

The distributions of IIG n / I l1 and II I/G n I! L., were obtained in Theorems 9.1.2
and 9.1.3, respectively. Now we examine the limit distribution corresponding
to Theorem 9.1.3 and give some inequalities.

Theorem 1. (Chang) For any n >_ I and A> 1

max n n: +,/ i ^ A)
P( l I/G Il ,,, ^ A) = P(1^i5n -1

(1) - (1 n I+ i l I i I ni A 1-n]
e + ^° 1 — 1) -' A e -iA
(2) -* (

il
1=1

(3)
(3) = e - " + 1- e -A* where 0< A * s 1 satisfies A * e - " * = A e -'k .

For A ? 2, these probabilities do not exceed their limiting values.


INEQUALITIES FOR THE DISTRIBUTIONS OF JIG„//II AND III/G,JI4 413
Proof. The formula (1) was established in Theorem 9.1.3. It remains only to
prove (2), (3), and the final assertion of the theorem.
Now
PA = P( max nen:e+l /i A)
Is n-1

_(_)n + )^”>
(a)
(': n n

a" . ;

r=o
Note that

a no -+ e -A and A' e - '" for each i>_1 as n -4 oo.


i.

Hence, the dominated convergence theorem will yield


W

lim Z a n; _ Y- lim a",


n-.ao i _ 0 ;=0 n-.m

and complete our proof, provided we can exhibit a sequence bi for which
W
(b) a"; ^ b ; for all n >_ 0 where b; <x.
r=o

Clearly, a 0 <_ bo = exp (—A) for all n >_ 1. For 1 <_ i <_ (n/,1) we have
( iA ) ' [ ,__ J
an,— (i ) (n) n !'
I I
by Lemma I below, and el e - 'A <_ I
1 —i /n i
1 1
sinceis(n/a)<n/a.
-1/A i3-2
b.
;

For i > (n/A) we have a,=0. Thus (b) holds.


We will now show that for A ? 2 the probability P A does not exceed its
limiting value. But this holds since a" o <_

( n) 1 '— 1
^ andforAz2wehave 1 1— fi I"
i n I! fl J

by Lemma 2 below. We leave (3) as an exercise. ❑


414 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.

Lemma 1. Let n>1 and A ? 1. Then we have

( n Ai ' .ki " - ' exp (1 /l2n) (e,1e - ")`

for0^i^n/a.
)i (} 1 n] 2ar i(1—i/n)

Proof. As in Chang (1955), Stirling's formula yields


n-i
(n)( A ! '[1 ntl

< (^^ .
nJ -
n +i/2
n
e e i /12n
2Tr A i [n —^li] n-i
i.+i/2 e
-` 27I (n — i) "2 e - ^" ,2 n n "-'
ei /12n
1 n —Ai
'

2ir i(1 — i /n)^ n—i


ei /i2n
1 i(,1 —1) " -i
2ir i(1 — i /n) A1 1 n.—i
e i /i2n
< 1
(e,1 e ")'
2-ir i(1 —i/n)

as was claimed. ❑

Lemma 2. Let n ? 1 and A ? 2. Then we have


Ai n

1 -- <_ e -ai forO<-1nn/it.


n

Proof. Let a = 1 / A so that 0< a <_ 2. Then


n

^1 e'/Q = [(l
na ^ ex/Qlx i/n
\ \ a

= [exp (ga(il n))] n,

where

(a) g"(x) = (x /a) +(1 —x)log(1 —x) for0x<_a.


a

Now

1 (1—x) — g ( x l —_— x(1 —a) ( g xl


(b) g ä(x)—a a(1—x /a) to ( l—a ) a(a —x) to `1 —

aJ
INEQUALITIES FOR THE DISTRIBUTIONS OF 1I6„/Ill AND III/G,,Il ,, 415

and

(c) gä(x) = (2a —1— x)/(a — x) 2 .

We must show that g. (x) ^ 0 for all 0 <_ x <_ a and 0< a

a x=a y
1 ^1 x=(2a-1)
g%>0
ga=g'a= 0 —^

ga=0
7
1^^
g^ a <0 ga=—m

0 V ► x
0

Figure 1.

Now for all 0<a<1 we have g a (0) =gä(0) = 0; while gä(0) = (2a —1)/ a 2
is <_0 for 0<a<_. Since gä(x)<0 for all O<x<a when 0<a<_2 i we have

(d) gn(x)<ga(0)=0 forall0<x<awhen0<as :

This establishes the claim. ❑

Exercise 1. (Chang, 1955) Show that (3) is true. [Hint: Use Lagrange's
formula; see Whittaker and Watson, 1969, p. 133.]

Inequalities

It is also useful to have a simpler bound on the probability distributions of


the previous theorem.

Inequality 1. (Shorack and Wellner) For all A >— 1 we have

(4) P( III/Gn jj 6‚:, ^ A) = P( max n n:r+l/ i >_ ,t) <_ eA e - ".


1sisn-1

Inequality 1 will follow as a corollary to the following inequality giving


probability bounds for the suprema of ratios taken over intervals [a, 1] with
0<_a-1.

Inequality 2. (Wellner) For all A >_ 1, 0 <_ a - 1, and 0<_ b < 1 we have

(5) P(JIGn/Ijj .=A) <exp (—nah(A)),


416 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.

(6) P(i!I/Gi A) < exp (—nah(1/A)),

(7) _exp ( nbh(A)),


P( JIG/IDJe^A) < —

(8) P(III/Qs 'I^'v^A)<exp( — nbh(1/A)),

where
(9) h(x)=x(logx -1)+1 for x>0 and
h(x)=xh(1 /x) = x +log(1 /x)-1 for x >0.

The functions h and h are shown in Figure 2.

10

12

10

-2 L 1 2 3 4 5 6 7 8

Figure2. h(A)=A(logA -1)+ 1, h ( 1 /A)=—(IogA +1) /A+1,h(A)=A -1 —log A, andh'(I/A)=


I/A—I+IogA.
INEQUALITIES FOR THE DISTRIBUTIONS OF JIG./1(t AND Ill/G„tIf , 417

Proof of Inequality 2. We first prove (5). Since {G n (t)/ t: 0 ^ t <_ 1} is a reverse


martingale by Proposition 3.6.2, and hence {exp (r6(t)/t): 0< t <_ 1} is a
reverse submartingale for every r> 0,

P( IIG /IIIä? A) = P( Ilexp (rG,,/I)I)ä?exp (rA))


s e - 'A E exp (f r/na)nG,,(a))
by Doob's inequality (Inequality A.10.1).
=e - 'A {1—a(1—e' /n n)}n

<_ e - rA exp (—na(1 — e'/n°)) since 1—x e - X


= exp ( — nah(A)) by choosing r = na log A.

This is statement (5).


To prove (6), note that {—G(t)/t: 0< t<_ 1} is also a reverse martingale;
hence {exp (—rG n (t)/ t): 0< t <_ 1} is a reverse submartingale for every r> 0.
Thus

[II I/G.IIa A]
=[t?AG n (t) for some a t<_1]
_ [—(G n (t)/ t) ? —1/,1 for some a <_ t <_ 1]

(a) _ [ sup (—G(:)/:) ? —1/A]


a5r^i

satisfies

P( III/Gnl)ä ^ A) = P( sup (—G(t)/t) >_ —1 /A)


a<<<I

= P( exp (—rG n / I) II 4 ? exp (—r/A))


eE exp ((r/na)( —nG,(a)))
by Doob's inequality (Inequality A.10.1)
/na) }n
= e '/A {1 —a(1 —e—r
e exp (—na(1 — e - n°)) since I —x <_ e

= e -n° ”/ (1 / A) by choosing r= —na log (1/A).

This is statement (6).


Now (8) and (7) follow from (5) and (6), respectively, in view of two easy
event identities: for all A >— 1 and 0<_ b - 1

(10) [III/Gn`II,^A]=[IIGn/IIU,/AAA]
418 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.

and
(11) [IIGn`/IIIb? A] = [ III/ 6 nhIb^A]• ❑
Exercise 2. Prove the event identities (10) and (11).

Exercise 3. Show that (5) can be improved to


a
P(II6,,/IIIä^A)<exp —n
l—a h(A) .
[Hint: An argument paralleling the proof of Inequality 11.1.1 in the next
chapter works.]

Proof of Inequality 1. Taking b =1/n in (7) and noting that

(12) [IIG /III'/h?A]=[III/GnIIf,.:,'A]


yields
P( III/ 6 ,IIi,:, A) =P( IIGn'/III 1i ,,^ A)
<_ exp (-1(A))= e,1 e - '^.

This is statement (4). ❑

Exercise 4. Show that the event identity (12) holds.

We shall soon see that Inequality I is sufficiently strong to permit characteriz-


ing the upper-class sequences of 111/6,, II ‚

Exercise 5. Now {nerv; + , /i: 1 is n — 1} is a reverse submartingale. The best


we could get from this was

^ A) <_ inf E( e .:z)/ e ra < 14,1 2 e -A


P( II I1 6 ,E II r >O

for all A >_ 1. Verify these claims. (It should be noted below that this inequality
is not strong enough to permit characterizing the upper-class sequences of
III/ 6, II
Exercise 6. Generalize (10) and (11) by replacing I by a monotone
function g.

4. IN-PROBABILITY LINEAR BOUNDS ON G.

The Glivenko-Cantelli theorem (Theorem 3.1.3) says that as n gets large, the
empirical df becomes nearly linear in the sense that I 1 G„ — III 0 as n -* oo.
IN-PROBABILITY LINEAR BOUNDS ON G,, 419
I

t/M

I Mt
0 Figure 1.
0 I

This result is not nearly as strong as what is actually true. In this section we
show that we can choose the slopes on the lines in Figure 1 so that the narrow
central strip with pointed ends will include G. with very high probability.

Inequality 1. Given e >0 there exists 0< M = M < 1 and a subset A„ of 12


M E

having P(A ) > 1- E on which


ne

(1) 1 -( 1-t)/MSG„(t) t/M and Mt<G'(t)<1-M(l-t)


for O<t<1,
(2) Mt<G„(t) for all tsuchthat0<6„(t)
and G„'(t) <_ t/M for t z 1/n,
(3) Cn(t)< 1-M(1-t) forall tsuchthat6„(t)<1
and 1-(1-t)/M<G'(t)fort^1-1/n.
If n exceeds some n we may also require
E

(4) IIGn — I II _ G ` — I II < E.


Proof. This is an immediate consequence of Theorem 9.1.2 and Inequality
10.3.1. The final statement comes from the Glivenko-Cantelli theorem
(Theorem 3.1.3). This was stated and used by Pyke and Shorack (1968). ❑

Having established the in-probability linear bounds of Inequality 1, we note


that a.s. linear bounds do not exist. Indeed, (1.3) and (1.6) imply, respectively,
that
(5) lim II6 n /III>_ tim (1/n '„ : ,)=oo a.s.

and
(6) lim 1 I/G„ 1 ? iii neni2 = oo a.s.
420 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G,

5. CHARACTERIZATION OF UPPER-CLASS SEQUENCES FOR


llG./Ip AND III/GnII :,

In the previous section we obtained in-probability linear bounds for G. and


we saw that a.s. linear bounds do not exist. In this section, the rates at which
JIG„/III and III /G„ jI C , blow up are determined. Theorems 1 and 2 are from
Shorack and Wellner (1978). See also Mason's exercise (Exercise 10.7.1).
Corollary 1 is from James (1971).

Theorem 1. Let nA„ 1'. Then

(1) n n = f <^
P(IIG,/III ^ An i.o.) = j 0 according as Rj l ^

Theorem 2. Let ,l„/n \ and suppose either A. 2' or lim„_ o A„/Iog 2 n >_ 1.
Then
0
(2) P(I) I/C n II ^n „ A n i.o.) = j 1 according as nj I n -00

Corollary 1.
log 16,, I II =1
tim a.s.
n-W
gz
Corollary 2.

lim I1 I/fin 11 el -‚ =1 a.s.


n^0 log 2 n

Remark 1. Comparing Theorem I with Kiefer's theorem (Theorem 10.1.1)


shows that for nA„ 1' cc we have

(3) P(1 /(nen:1)? A n i.o.) = P( IIGn/III ^ A n i.o.).


Also,

(4) IIGn/III=1k-
maxn kl(nn:k),

and the limit in (16.1.3) is \ in k. These two facts indicate that it is the small
values of,,, that control the largeness of IIG„/III. Note, however, that (n,, : ,) - '
and IICn/III have different limiting distributions.

Remark 2. We note that

(5) III/Gn 11 f‚:, =1i'k n nSn:k+I/ k,

and (10.1.6) being constant in k suggests that the k=2 term controls the
UPPER-CLASS SEQUENCES FOR 16 n /III AND III/G„jI% , 421
maximum. In fact, Theorems 10.1.2 and 2 show that for A n /n \ and A n 1'

(6) P(n n:2^Ani.o.)= P( II I/G, II A,, i.o.).

Again, the limiting distributions are different. Note (7) below.


Proof of Theorem 1. Suppose l 1/ nA n <X. Let n k = (a k ) where a> 1 is fixed.
As in proposition A.9.3, we know that
=1
(a) — < 0c.
k=1 An k

Now define
Ak=[nk <n
max5 nk+l
IIG /I II^an];

and note that monotonicity of nG n and nit,, implies

P(A k )=P( max IlnGn/III>_nAn)


nk <n nk+l

P( Ilnk +lenk +I/III ^ nkAnk)

= P( II G nk+l/ I II ^ nkitnk/nk+l)

nk+l
= nkltnk

(b) a
l ,
n nk

where the last equality follows from Daniels' theorem (Theorem 9.1.2). Com-
D
bining (a) and (b), we have Y_ P(A k ) <oo; so that
(c) P(Akl.o.)=0
by the Borel-Cantelli lemma. But (c) implies that P( IIGn/ III ? A. i.o.) = 0.
Now suppose 11/nA n = m. Let

An = { IG,,! I II ^ A, ]

[0"1( r) ,
= s u p n. 1 nil n
o<1sI I

1[0,^^(n)
su p > ndn
0<,I I J
ii
=E nit,,

(d) = Dn.
422 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.
Since the events D n are independent and

Y_
[' P
n=I D —
l /n) Y
^`

n1
nAn1 = ao

the Borel-Cantelli lemma implies P(Dn i.o.) = 1. But since D. c A n , we thus


have P(A n i.o.) =1. O
Proof of Theorem 2. Suppose
z
(a) E1 -n
n
e <oo.

We will use the subsequence

n; _ (exp (aj/log j)) forj ? 2

with a > 0. We define

An=LMn—An ,

where

(b) Mn =III/GnIIfn:i= max


1_isn
n^n:i +l/i;
and we let

B; —[M,,. ? (n;/n; +1)An ;+ ,].


Note that

(7) Mn/n = max fin , ;+ , /i is a \ function of n.


I^isn

To see this, suppose „ +t falls between fn;k and 6n:k +1 for O^ k <_ n; then

0 Sn:I Sn:2 ' ' , Sn:k T Sn:k +l ' * * Sn:n

Sn +1

bn+l;i +l Sn:i +l
= forl<i^k -1,
f i
yn+I;k +1 _ bn+] C en ;k+1 and
k k k

Sn +l ;i+l _ ^n:i Sn;i


—<_— fork+1-5fsn,
i i i -I
UPPER-CLASS SEQUENCES FOR SIG ^ /III AND III/GH,, 423

so that (7) is established. The monotonicity of (7) and the fact that ,1 n /n is
\ yield
, I.I I ,+l
I'
n.0 Acn U [Mn /n^A 1 /nj +lJ = BJ.
n=nj n=nJ

Thus P(A„ i.o.) =0 will follow from the Borel-Cantelli lemma once we show
that

(c) P(BB)<cc.
It thus suffices to prove (c).
Let do = A n A 21og 2 n. Then

(d) do/n ",, d o - oo, and Y d" e


n., n
-°^ <o0
since d,, exp (–d) <_ A 2, exp (–A„)+(21og 2 n) 2 exp (-21og 2 n). Since

BBc D;=[Mn;?(n;ln;+1)dn,],

it suffices to show that


Co
(e) Y_ P(D;)<°°•
J=I

Now by Inequality 10.3.1 of Shorack and Wellner, we have

Y_ P(D) i5 Y_ e n—n n
j =2
d ,
j=2
;
j+l
;
n;+ exp (–dn ,,) exp \(1
n +1} dn;.1 /

: (some M.) d„ by (A.9.9)


j 2

<oo by combining (d) and (A.9.10).

This completes the convergence half of the proof.


Note that

III/Gnll ;^:i

and from Theorem 10.1.2 we have that P(ne :2 A n i.o.) =1 in case (a) fails.
El

Exercise 1. (Maximal inequalities) Show that for any q positive on [a, b]

nIIGn/gl is "inn.
424 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION C;

If q is positive and >' on [a, b], then


^Iq /G "Ilä /n is \, in n.

These are general versions of observations made in the proofs of Theorems 1


and 2.

Behavior on [ a" , 11

We consider briefly limit theorems for IlG n / I II Q . , Ij(G — 1)/ I II ä n , and associ-
ated rv's as a" J,0. The first claim of (8) below is from Chang (1955); the other
results are from Weilner (1978b).
If a" J 0 and na" -- 00, then
i > —
I
(8) I -D 0 and I -^ 0 as n co.
a^ a.,

We now turn to a.s. convergence. We assume throughout


(9) a . and a n = (c" log t n)/n defines c".

If c" -+ oo, then

(10) III" /lila,, III /4 "Ilän, IIGn'IIIIa^, III/GjI ., all 1 a.s.


If c. Too, then we also have
6 G
(11) li m 2a.s. and li i2 a.s.
v( " I I)*II a,, m^^l( " J I I)*I^ a^ =
The lim sup of in the second expression of (11) corrects a typographical
error in Wellner (1978b).
Let c" = c s (0, x). Then

(12) m IIG /III ä n = some ßC a.s. and


li

li m III /6 "Ila n =some 1 /ßc a.s.,

where ß'c \ from oo to I as c 2' from 0 to co and 1 /ßC equals co on (0, 1] and
\A from oo to 1 as c >' from 1 to co. Also

(13) m IIGn'/Illä n =some 1/y, a.s. and


li

li m 11I /G"' II ä, = some a.s.,

where 1 /y and y both \ from co to 1 as c>' from 0 to co. From (12) and
(13) the behavior of II (4D" — ')/' II,, and II (G „' — I)/I II ä . can be determined.
UPPER-CLASS SEQUENCES FOR JIG„//II AND I^1/G„ 14' , 425
If c,-+0, then

(14) IIG./III.,, III/G.11.„, IIGn'/ItI4,, III/G 'IIä.alllimt000a.s.

The rate at which these go to oo is examined by Wellner (1978b). Note the


figures in Sections 10.8 and 10.9.

Exercise 2. Prove (8).

Exercise 3. Use Inequality 10.3.2 to prove the results stated in (10)-(14).

Another Extension

We report here results from Mason (1981b). He shows that for a„>-0 and 7
we have for each 0 v 2 that

(15) P ( [I(1-I)]'-"
n" (G„-I) „i.o. l
II ,a /

= according as ,1, _"^


111 n=, na„ 1 -00

For v = , this is a result due to Csäki (1975) that will be considered in detail
in Theorem 16.2.3. For v = 0, this is equivalent to Theorem 10.5.1 of Shorack
and Wellner. For 0< v<, this is due to Mason (1981); he uses the method
of proof of Shorack and Wellner, coupled with his following extension of
Daniels's theorem (Theorem 9.1.2).
For each 0 <_ v < Z there exists a constant c" such that for all A> 1 we have
Il a _
(16) Pi n - 2(1+v)A}
" 05-: and
l o J
" C - I "

where a on °1 and a=1An '(vA)'^" for 0<v<.

Exercise4. Establish (15) and (16). [See Mason, 1981b and Marcus and Zinn,
1983 for the two parts of (16).]

Exercise S. Show that for all 0 ^ vs,

(17 ) n -W
f
lim log " 1” - I
g Ln II [I(1- I)]'-"
l = 1-v a.s.

See Mason (1981).


426 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.

6. ALMOST SURE NEARLY LINEAR BOUNDS ON G. AND G-,'

We now improve the result of Section 10.4 to an a.s. bound by using bounding
functions having slopes 0 (lower bound) and co (upper bound) at t = 0.

Theorem 1. (Wellner) Fix 0< 8< 1. Let e >0 be given. Then a.s. for n
exceeding some we have

(1) (1– )t' <(1 + e)t' -s for &:, < t

and

(2) (1– s)t' +s <Gn-'(t)<(1+e)t' -b for I. t.


n

Proof. Equation (1) follows immediately from Corollary 10.2.1 with i(t)=
t 1 Theorem 2 below; then (2) follows from the trivial identities of
Exercise 10.3.6. This is proven via different methods in Wellner (1977b).

We stated Theorem 1 in terms of powers, because of their convenience;


but more nearly linear bounds using logarithmic functions are easily obtainable
from Corollary 10.2.1 and Theorem 2.

Theorem 2. (James; Wellner) Let g(t) log t (e e /t). Then

1
(3) lim inf = lim — =1 a.s.
n_'°° fn:t5ts1 Jig n-.00 ^n f:i

and

G n'
(4) lim = 1 a.s.
Ig vn

Note that (10.2.3) and (3) provide rather tight a.s. nearly linear bounds on
G.
Proof. See James (1971) for a preliminary version of this result; this is from
Wellner (1978b).

Since jjI/(gG n )I_,? 1 /[g(1)G n (1)]= 1, we only need an upper bound on


(3). Define a n = n ' log n. Then (let 11 C = 0 if c> a)
-

(a) ^^I/(S^n)^^ ^:^ ^ III/(gGn)jjf,-:, v III/(gGn)1Ia,,.


ALMOST SURE NEARLY LINEAR BOUNDS ON G. AND G-.' 427

Now

li m III/(gGn)II'.-:, ^ li m III/G I"::,/g(an) since g is \

li [ II I/Gn 11e.:‚/log2 n][(log2 n)/g(an)]


= lim [ ICI/G II£n: ,/log 2 n] since (log e n)/g(a) 1

(b) =1 by Corollary 10.5.2,

and

i llI/(gGn)II I 1i DI/G II ä„ since g>_1

(c) =1 a.s. by (10.5.10).

Plugging (b) and (c) into (a) gives the upper bound. Thus (3) holds.
We note that (10.5.10) implies

(5) Ti III/(gGn)II4n =0 a.s. and li m JIG-'/(Ig)Ilä^ =0 a.s.

provided

na„
(6) lira >0 and lim d„ = 0.
n-.001og2 n n^ 0

Note that (3) implies i/i = I/g is an a.s. lower bound for G„ on [fi n ,, 1].
Thus i/i' is an a.s. upper bound for G' on [1/n, 1]. We claim that

(7) i, '(t)<_h(t)=tg(t)=tlog 2 (e e /t) for 0<—t-1.

Now (7) is equivalent to t<_ Or(tg(t)) = tg(t)/log e (e e /tg(t)), which is


equivalent to log 2 (e e /tg(t))g(t), which is equivalent to 1/(tg(t))<1/t,
which is equivalent to g(t)> 1; and this last is clearly true. But (7) implies h
is also an a.s. upper bound on Qi;„' on [1/n, 1]; that is, (4) holds. ❑

Exercise 1. Show that (10.0.5b) and (10.0.6a) follow easily from Theorem 2.

Exercise2. Show that Ei/i(e)=J /i(t) dt<ooif and only ifJ [/i - '(x)] dx<oo.

Corollary 1. We have

a.s.
(8) him t/(n+t)supn/(n+l) t(1 — t) logt [1/t(1 — t)] 1
428 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G,

and
^n'(t) —t
(9) lim sup , lim nfn: , = oo a.s.
n- 1/(n +1)< «n/(n+l) t(1 — t) n-.00

Proof. The lower bound in both (8) and (9) follow from (10.1.6) by replacing
sup by t = 1/n. The upper bound in (8) is mainly just (4); but breaking [1/n, 1]
into [1/n, e] [to be treated via (4)] and [e, 1] (to be treated via Glivenko-
Cantelli) is necessary because of the (1 — t) factors in (8). ❑

Open Question 1. Determine the a.s. behavior of G „' —1 more carefully than
in (6) and (7).

7. BOUNDS ON FUNCTIONS OF ORDER STATISTICS

Theorem 9.1.2 of Daniels concerned

(1) IIG /III =max {i /(nfn:i): I C i:5 n}.

Kiefer's theorem (Theorem 10.1.1) was concerned with large values of

(2) 1 /(ne,,:t).

The almost sure behavior of (1) and (2) is similar, as noted in Remark 10.5.1.
We now consider the more general quantity

(3)
1
max Ig( f" : ' )
Isk„ na n

for g \ on (0, 1) and a„,'.

Theorem 1. (Mason) Let a„ >_ 0 be >' and let g>0 be \ on (0, 1). Then
the following three statements are equivalent:
W
(4) E P(g(f)>na„)<oo.
n=1

For every e >0 there exists 6>0 such that

tim max
ig(e.^:0 < e a.s.
(5)
n-. I^i^(nö na„

For some (or for every) k„ having k„/n 40,

(6) um max ig(e.:0 = 0 a.s.


n -ao ► Sisk, na n
BOUNDS ON FUNCTIONS OF ORDER STATISTICS 429

On the other hand, if


co
(7) E P(g(f)>nan)=cc
n=1

then even

(8) lim 8(f"' =oo a.s.


)
n-= nan

Proof. We follow Mason (1982). Clearly (5) implies (6) for all kn >_ 1 having
kn /n+0.
If (6) holds for any single kn , then it holds for k" = 1; that is,

(a) lira Xn , n /(na n ) = 0 where X g( ).


n-W

Thus by Corollary 10.1.1 to Kiefer's theorem

(b) oo> Y_ P(X>nan)= Y- P(8(6)>na")= Y_ Pn= Y (nPn)


n=1 n=1 n=1 n=d n

where p" ° P(g(e)> na n ) N 0 satisfies np n -+0 by (ii) of Proposition A.9. 1. Thus


(4) holds.
We now suppose (4) holds and verify (5). We consider two cases.

Case 1: Eg(f)<cc.

Since g N and a7 we have

lg ) 1
( ( " s)
(c) max "^ — a,n a_1 g n:f -
<r<(ns) na n •

Thus we have

ig( ) 1
1 (ns)
(d) lim max
n-.ao l^;s(n8> na n j <—

um —ng( )
i_I
a^ n^ao

1_ __ 1
<_—him — I g(^1)110,2s1(,) since n:(nS) ^a.s. s
a l n- = n f=1

1
^— by the SLLN
a, E[g(6)l[o,2,](6)]
(e) <s for small enough 6

establishing (5) in case 1.


430 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.

Case 2: Eg(6)=oo.

Since g,,,

(f) max
ig(fn:i)
1sr (ns) na n
C— -j=1

]
g(Sn:i)
na,,
_ L.i =1 g ( Si)
'

na,,
'

and, using (4), this

(g) - 0 a.s.

since iid Xi 's having E IX I = oo satisfy

1X
1 +. .+x"I
(9) tim
n^m bn
<oo,
= 0 a.s. according as Z P(IXn ^> bn ) = j
loo n=1 l

where b n = na" >_ 0 satisfies b n /n 1 [see Feller's theorem (Theorem 2.11.2)].


Thus (5) holds in case 2 also. [Note that ? can replace > in (9).]
Suppose now that (7) holds. From (9) we see that this implies

(h) Y_ P(g(e) ? Mna n ) = oo for all M > 0.


n=1

Thus g(C, :I )?g(en ) and YP(g(tn )>_Mna,,)=oo so that Borel-Cantelli gives

P(g(^n:1 )/(na„)>Mi.o.)=I for all M >0;

that is, (8) holds. ❑

Corollary 1. If a n 7 oo under the hypothesis of Theorem 1, then (5) may be


replaced by

(10) tim max lg ( "" ) =0 a.s.


n-" 1^isn na”

Proof. Replace (d) by

1 g
i(n:t) [ 1" ]

(d') lim max <_ l im — lim — E g(:


ni )
a)
n- l^isn n i=1
na" n-.o' a,, n- -

= 0 • Eg(f) = 0 a.s. by the SLLN.

This takes care of the case Eg(f) <oo. The old proof works in case Eg(f) = co.
BOUNDS ON FUNCTIONS OF ORDER STATISTICS 431
Corollary 2. Let a n >_ 0 be ‚"co and let g>0 be \ on (0, 1). Then

(11) lim max i9(fn ) =0 " ifEg(^)<oo


n-. D isisn na,

and

Jg\Sn:i )
(12) lim max = oo if and only if Eg(^) = oo.
n-.ao 1<isn f

Proof. The result (11) and half of (12) are established in our work above.
Though we do not need it, we remind the reader that for any ry X >— 0 [and
in particular for X = g(e)]

n o0
(13) P(X>n)<EX< P(X?n).
n=^ n=o

The other half of (12) follows from (f) (with S = 1) in the proof of Theorem
1, (9), and (13). ❑

Exercise 1. Show that if a n >_ 0 and na„ / oo, then

(14) tim llGn /jll = j 0 according as Y 1n = j <oo


n -.W a loo n_, na l=00

Also (Mason, 1982)

(15) • lim IR(Gn—I)/III`/1092n=e a.s.

Open Question 1. What is the limiting distribution of

max{ig( n , ; )/n:l<_isn} and max{h(i/n)g( 1 ):1<_i<_n}

for various g and h? Note Steck's result in Section 9.3.

Exercise 2. (Strength of fiber bundles) Let X 1 , ... , Xn be iid F on [0, cc)


with ordered values Xn:, <_ • .. < Xn:n. If X 1 ,. . . , X,, represent the strengths
of fibers in a bundle, then the bundle strength can be represented as

Z = Isisn
max (n— i +l)Xn:; .
Z

Suppose 0 = sup o< , < , (1— t)F '(t) is finite. Show that Zn / n -4 0 a.s. as n -> o0
-

if and only if EX < oo.


432 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.

Exercise 3. (Sen, 1973b) Let X 1 ... , X„ be lid Fon [0, oo) with 0< EX 2 <
,

oo. Let > 0 be given. Then there exists 0< C < oo, 0 < p e < 1, and n < co such
F

that

P( sup xjF „(x) — F(x)l > e) s Cp for all n ? n E


„ .

0<x<oo

Moreover,

P(IZ„-9l> E) <Cpf forall n >_n E

for Z. and 0 as in Exercise 2.

8. ALMOST SURE BEHAVIOR OF 7L„(a„) /b„ AS a„ J.0

Let b„ = 2 log e n as usual and let

(1) 7L„(t) = U„(t)/ t(1—t) for 0<t<1.

Let a„ 0 throughout this section, and suppose

(2) a„ =(c 1og 2 n)/ n defines c„ where

In this section we will present without proof results concerning the almost
sure behavior of 7L „(a„)/b„. The corresponding results for the quantile process
will also be presented in Section 9. All material is taken from Kiefer (1972).
Recall that the ordinary LIL yields

(3) lim „(a )/ b„ = 1 a.s. for each fixed 0 < a < 1.


n— co

In case na „T we state that

(4) lim 71n(a „)/b„ = 1 a.s. provided lim c„ =m;

thus a lim sup of I is maintained provided a„ does not decrease too fast.
The most important special case occurs when lim„ c„ = c. To describe the
results we must introduce the function (which arises in exponential bounds
for binomial rv's)

(5) h(A)=A(logA -1)+1

graphed in Figures 10.3.2 and 11.1.1. For each c>0

(6) let ß. > 1 solve h(ßC) = 1/c,


ALMOST SURE BEHAVIOR OF Z, (a„)/b„ AS a„ ,1. 0 433
and for each c> 1

(7) let P. < 1 solve h(ß)= 1/c.

Then
(8) lim 7L-(a n )/b n = c72(ßC —1) provided lim cn =c>0.

Also,
(9) lira Z(a)/bn =V 7(1—ß) provided lim c n = c> 1.

These limiting functions are graphed in Figures 1 and 2. (The graphs of cßC

9
goes to

goes to I
0 goes to 1

0 1 2 3 4 5

Figure 1.

7
goes to

1 goes to 1
goes to I

C
0 1 10
Figure 2.
434 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION 6„

and cf3 that are also shown are geared to the statements of Theorems 1 and
2.)
It is trivially true that

(10) force <1.

We can combine (4), (9), and (10) to yield Figure 3; the solid line gives an
a.s. value of lim sup„ Zn(a n )/bn , while the dotted line provides an a.s. upper
bound on this lim sup.

✓ /2(1 -PC) 1

0
loge n clogen cnlogzn
n n n 2
with c>1
with c,, -+ o

Figure 3. Graph of lim„ Z „(a„) /b,,.

The behavior of 7L n as c,, - 0 is more complicated than that of 7L n . Suppose


an 1 0 9

(11) cn -^0 and log(1/cn )/log2 n—somedn ,10 asn-*oo.

Then

log (1 /cn)
(12) li m7Ln(a n) cn log2 n 1 a.s.
to gz n =

Figure 4 below should provide a quick visual summary of key special cases
of (4), (8), and (12); values beyond oo on the vertical axis are used to indicate
the rate at which Z(a)/b n converges to oo •

Exercise 1. Use Section 6.1 to establish that lim sup,, Z (a n )/b n =0 a.s. when
na n = (log n) with e > 0.

Equations (8), (9), and (12) above are special cases of the following
theorems.
ALMOST SURE BEHAVIOR OF NORMALIZED QUANTILES AS a„ 1 0 435

log n
21og 2 n

log t n
2

log t n
2c

0 '-•
0 1 1 c C C 10 92 11 c„log 2 n 1
n(logn)'*` nlogn nlogn n n n 2

Figure 4. Graph of lim„ Z„(a„)/b„.

Theorem 1. (Kiefer) Suppose a„ 10 and 1 < lim„ c„ <_ lim„ c„ < 00; and sup-
pose there exists kn t such that kn = [1 + o(1)]c„ßc, 1og2 n. Then
nG„(a n )
(13) lim= 1 a.s.
„—w cn /3cn 1og 2 n

Theorem 2. (Kiefer) Suppose a„ J.0, lim a c,, <o0, and lim a log (1/c„)/
log t n = 0; and suppose there exists k„ T such that k,, = [1 + o(1)]c„ß + log 2 n.
Then

(14) lim n G ,(a ) — 1 a.s.


"

n-.w c„ß+C, log e n

9. ALMOST SURE BEHAVIOR OF NORMALIZED QUANTILES


AS a„ ]. 0

We now state the results corresponding to those of Section 8 for the quantile
process. We have

(1) lim V " (a) =1 a.s. for each fixed 0<a<1.


n .o a(1—a)b„

Once again, this lim sup is maintained if a„ does not decrease too fast. Thus
we state

(2) lim a = 1 a.s. provided c„Too.


n^m (i (aä ) )b n
436 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION 6,,

In case lim e c„ = c in (0, oo), we will need to introduce some notation. For
each c > 0,

(3) let yC > I solve h(y.) = yC /c [or equivalently h'(1/ yC) = 1/c]

and

(4) let y'C < 1 solve h (y') = y' / c [or equivalently 1 ' (1/y) = 1/c].

7 ^s to

1 Des to 1
Des to 1
0 C
01 in
Figure 1.

10

EI
I
N
0 1 2 3 4 5

Figure 2.
ALMOST SURE BEHAVIOR OF NORMALIZED QUANTILES AS a„ 10 437

Now /7(1/A) = (1/A) —1 + log A, and it is apparent from Figure 10.3.2 that y-,
and y are well defined. Now

(5) lim V/ n(a„) _


n_'°° an\1—an)bn 2 \) c V ' i -1) a.s.
+ provided c n -^c>0.

Also,

(6)
V(an) _I 1— yc_) a.s. provided c„ c>0.
lim an(1—a)6„ 1\

See Figures 1 and 2 opposite. (The other functions graphed are the a.s. lim inf
and lim sup of nG-n'(a n )/log e n, which is the form in which Kiefer (1972)
states the theorem).
If a„ _> I/ n, then

(7) lim —
n- Va„ (1 — an

) bn 2c„ = 1 a.s. provided c n 0.

The result for V„ is less satisfactory when c„ -*0. We have that

(8) Lim c„ log (nG;'(a„)/log e n) = —1 provided c 0 and na

this still yields

Vn(an)
(9) li m a„ provided c„ J, 0 and na n T o0.
( a„) b„ = 0 a.s.
CHAPTER 11

Exponential Inequalities
and 11 - / q 11 - Metric Convergence
of U n and 4/ n

0. INTRODUCTION

In Section 3.7 it was shown that if a symmetric q I on [0, 2] satisfies

q(t)2dt <co,
(1) J0

then for the special construction of U. it follows that both

(2) II(U —U) /qll -+‚' and II('' —V) /qjl - pv 0 as n -goo.

Our main goal in this chapter is to prove sharpened versions of this type of
theorem for both the uniform empirical process U,, and the uniform quantile
process 0/„. The theorem for U,, was originally proved by Chibisov (1964),
and was reexamined by O'Reilly (1974). For symmetric functions q which
satisfy both q? on (0, 2] and t -112 q(t) u on (0, Z], the Chibisov-O'Reilly
theorem refines "(1) implies (2)” to the assertion that (2) holds for the special
construction if and only if

(3) limq(t)/ tlog 2 (1 /t) =oo.

If a symmetric q satisfies only qT on (0, Z], then (2) holds if and only if
1/2

(4) T(q,A)= t-`exp(—Ag2(t)/t)dt<oo for all A>0.

438
UNIVERSAL EXPONENTIAL BOUNDS FOR BINOMIAL rv's 439
Our approach to proving these 11 • / q 11 convergence results for U,, and V„
will be to first develop good exponential bounds for binomial rv's and for
uniform order statistics. The exponential bounds are then extended to neigh-
borhoods of 0 for the processes U. and V,,, and these inequalities yield, in
turn, the probability inequalities for U' / q and IV n / q (I ä, respectively, upon
which our proofs will be based.
In Section 6, we establish the convergence of U. and other processes in
weighted', metrics.
In the last three sections of this chapter we collect facts concerning moments
of order statistics, the binomial distribution, and exponential inequalities for
gamma, beta, and Poisson rv's related to those of Sections 1 and 3.

1. UNIVERSAL EXPONENTIAL BOUNDS FOR BINOMIAL rv's

A careful analysis of the detailed behavior of the empirical process will require
tight exponential bounds on the tail behavior of the normalized Binomial (n, t)
ry U (t) ; we suppose for conveience that 0< t<_. These can be derived from
,,

the inequalities and methods of Section A.4. For the short, well-behaved
left-hand tail we obtain Hoeffding's bound (6) which is strong and clean. For
the long and troublesome right tail we have the necessarily much weaker bound
(3) of Bennett.
We will extend these results to intervals of is for the process U,,. The analog
for U will also be considered. These results are of full strength only for very
short intervals. In the next section we will piece strong results on short intervals
together to get strong results on any interval.


12 8 -1 0 1 2 5

Figure 1. The functions h and ii'.


440 EXPONENTIAL INEQUALITIES AND 0•/qII-METRIC CONVERGENCE

The basic ry we will deal with is the normalized binomial

(1) X (Binomial (n, p) —np) /✓ n, with q i —p.

We will seek to bound P(±X ? A). Our bounds will be expressed in terms of
the important function

(2) ii(A)=2h(1+,A) /,A2 with h(A)=,1(logA -1)+1.

We will study 4i carefully in Proposition 1 below, immediately following our


main inequality. Note Figure 1.

Inequality 1. (Exponential bounds for the binomial ry X) Let 0 :5 P:5 21 -


(i) For all ,1 > 0 we have

P(X ?A)<inf e -•A Ee'x


r >O

(3) <— eXp . 2 Pq 0 \pes/


np ^1
= exp — q h 1 +p^ (Bennett)

Az
(4) <exp ( — ) (Bernstein),
✓n
2[Pq+gA/(3)]
and for all A >0 we have

P(—X ? A) ^ inf e - " E e - 'x


r >o

/ A2 / \\
(5) exp I —2 41 - -= )I I
-
(Wellner)
\\ p \ p n///
^z
(6) exp I — p
2 q ) : [expression (3)] (Hoeffding).
(

(ii) Since AiIi(A) is 1' by (9) below, we have

P(IBinomial(n,P)—nfl>A)^2 exp (
A2 ^il A ))
✓n J 2p \ ✓ n p

for all0<p<—p^2

and all A > 0.


UNIVERSAL EXPONENTIAL BOUNDS FOR BINOMIAL rv's 441
Proposition 1. The most important properties of +1 are

(7) +y(A) is j for A ? —1 with i(0)= 1,

(8) t'(A) — (2 log A )/A as A -* X,

(9) Ai'(A)isTforA>-1,

(10) /i(A) ^ 1/(1+A/3) for A - —1

11-6 if0<A<38and0<-6<1
(11)
6(l-6)/A ifA>38and0^6<1,
(12) 01—^i(A)<A/3 if0^A3,

(13) 0:/J(A) - 1<—IA A if-1^A 0,

(14) +/"( 0 )= — , 0(-1)=2, +i'( -1)=—^,


2 J _ n

^(A)=1-3+ A10+...+ (n( forlAJ<1,


(15)
+2)(n+l)+...

(16) A+Ji(A) equals 0 at 0 and —2 at —1, and has derivative 1 at 0.

Proof of Proposition 1. Note that

(a) h'(A) = log A.

For (7) we note that g(A)=+/(A-1)/2=h(A)/(A-1) 2 has g'(A)=


[(A+1)/(A-1) 3 ]f(A) where f(A)=(2(A-1)/(A+1))—log A. Now f(1)=0
and f(A)=[4A—(1+A) 2 ]/[A(1+A) 2 ]=—(I—A) 2 /[A(1+A) 2 ]<0. Thus f(A)
is>0(is<0)forA<1 (forA>1). Thus g'(A)<O for allA?0, except g'(1)=0.
Thus Ii(A) is J, for A ? —1. That 111(0) = I is trivial.
The result (8) is trivial.
For (9) we note that g(A)=(A—I)Ii(A-1)/2=h(A)/(A-1) has g'(A)=
[(A — 1)—log A]/(A — 1)2. Thus g'(1) = z'- and g'(A) > 0 for the other A's exclud-
ing —1. Thus Agi(A) is T for A >_ —1.
For (10) we note that h(I+A) and g(A)=(A 2 /2)(1/(1+A/3))=
3A 2 /[2(3+A)] satisfy h(1+0)=g(0)=0, h'(1+0)=g'(0)=0, and h"(1+A)=
1/(1+A)_>27(3+A) 3 =g"(A) for A?-1. Thus (10) holds. From (10) we
trivially get (11). Verifying (15), (14), and (16) is easy. We obtain (12) easily
from 11'(0) = 1 and «i'(0) = —3, while (13) comes from connecting the points
(-1, 2) and (0, l) on the graph of 0 by a straight line. ❑
Proof of Inequality 1. Statement (3) is just a direct consequence of Bennett's
inequality (Inequality A.4.3), and then (4) follows trivially from (10). Inequality
(6) is immediate from Hoeffding's inequality (Inequality A.4.4). Thus we need
only prove (5) here.
442 EXPONENTIAL INEQUALITIES AND 11 •/qjl-METRIC CONVERGENCE
First note that —X/ Jn _p with probability 1, so the probability in (5) is
zero for A > Jp and we may assume A s f p without loss. Now

P(—X >A)<_P(e r" >_e'A ) -

(a) e -"AEe-rx= exp (r(.Ip—A)+n log (q+Pe '


I" )).

Differe /n t iate the exponent of (a) to find that it is minimized by the choice
rm i n =V n [log (1 +A/q.Jn)—log (1—,1/p'). Plugging this back into (a) gives

(b) P(X >_ —A) = exp ( — ng(A/')),

where

g (A)= (p —A)log(1—A /p)+(q +A)log(1+A /q)

A 2 A3
3+...
= —(P —A) )
AP z 1 + (3p
L\P +2

...

+(q +A) L\q 2q z / + 3q'_ /J

/ A3
= Z +— lz— lz)_(P_A)(A3 3 +...) +(q +A) 3 —_ ..^
2pq 2 p q 3P 3q

A z k3 1 A 2 (1 1)
—(P —A) 3p3
'2Pq +...J+ 2 `P z_ 9 2 J

A2
2 pq
+(p—,t) log 1 —A
L \ P/ p 2P
f ( l
+ A + ,1 z + A2 ( 1
2 9/
z — lz l
2 2 2 3
A2 1 !
= A +ph(1— - +- Az + z — zl
2pq p 2 P 2P 2 P q

I — q
2p \ Pl + 2Pq( — qP^

az —al + p(1 -2p) 1


[ q 4i (

2Pq P) q J

since A<_p implies p 1— (


? p (1-2 p )
q q

and establishes (5). This is actually a slight improvement over Wellner (1978b),
UNIVERSAL EXPONENTIAL BOUNDS FOR BINOMIAL rv's 443

and we summarize it as

(17) P( — X>—A)<_exp( — A2
[q4l(_^

n
p )+
P(1 9 2P)
1/

<eXp \ 2pq (11 p ) [ P+q ^\ ^p/^/

for all.A >0.

The following constants will arise repeatedly. Let ß> 1 denote the solution
of

(18) 1/c = h(ß) = (ßC — 1) 2 (ßc —1)/2.

In Figure 10.8.1 we showed graphs of /3, cpC, and ✓ c/2(ßC —1).


We state as an exercise a list of other facts about 4i that are much less
important.

Exercise 1. We have

(19) h'(A)=log A,

(20) h(1+A) = f "log (1+x) dx is T ford >0


o

2 3 n n

= 2 — 6 +. . + n(n ) i) +...
forJAI<1.

Applying (11) to (3) gives

(21) P(X>A)<—exp(—(1-8) A(3PS ) I ifA>_3pS^.


2pq

Exercise 2. Let X be Poisson (0). Then


/ i 1\
(22) P(t(X-0)/-,/0?A)<—exp — a/i^-f f for all, >0.

Solve this exercise by passing to the limit in Inequality 1. (Note Example A.4.1).

Exercise 3. (Slud, 1977) If 0 ^ p <; and np <_ k <_ n (or if np <— k < nq), then

(23) P(Binomial (n, p) ? k)? P(N(0, 1)>_(k—np)/.lnpq).


444 EXPONENTIAL INEQUALITIES AND Il•/qil-METRIC CONVERGENCE

Exercise 4. Combine Exercise 11.9.2 and (11.9.23) below to show

P([Binomial (n, p) — np]/Jn> A)

a AZ A
<^1+ p
/^A eXp \ 2p o \Vnp
for all A ? 0 and

P(— [Binomial (n, p) — np]/.In? ,1)

A
ctr
— \ Tp/-1^A exp \ 2p \ ^p//

for all Jp > A> i /I.

Binomial Bounds Extended to Neighborhoods of the Origin

The following extension of Inequality 1 to intervals is based on the important


observation of Proposition 3.6.1 that

(24) U(t)/(1 — t), for 0 <_ t<1, is a martingale.

Inequality 2.
(i) (James) Let 0 < p <_ 2, n>_ 1, and A > 0. Then

( 25 ) P(IIU /( 1— I)IIö?A /( 1— p)) ^ exp( — ^


2 +p\
A //
2pq p ✓n

(ii) (Shorack) Let 0 <p <_ 2, n>


- 1, and 0 <,1 <_.. Then

exp . 2p
\ /)
(26) P(IIU /( 1 — I)jI A /(1 — P)) `— 2
exp (— 2Pq

Proof. Since {v„(t) /(1— t): 0 t:5 p} is a martingale we have that {exp (±
rU „(t)/ (1 — t)): 0: t < p} are both submartingales by proposition A.10.1, where
BOUNDS ON THE MAGNITUDE OF IlU/qH{ °4
45

r denotes any real number. Thus for any 0 < p < 1 we have

y^(t) — 1
(IJU:/(1—I)Il ö--a/(I — P))=P sup t
( 0-t -5p 1—t 1—p.
-

=P( sup exp(t rU ' (t) ^?expl 21 I^ for all r>0


0!5tf=p 1—t \1—p/

<— ro fo exp 1— 1 rA ' E (exp I f r 1 " (P) )) by Inequality A.10.1


P \ P
= inf exp (—rA)E(exp (±rX )) with X = U(p);
r>O

and apply Inequality 1. N


Section 14.5 contains the natural analogs of this inequality for the Poisson
process.

2. BOUNDS ON THE MAGNITUDE OF II U . /q Ij

In this section we seek to bound the probability that the empirical process
ever exceeds boundaries such as those shown in Figure 1. Our inequalities
will be built up out of the binomial bounds of the previous section. We will
need these bounds to prove Chibisov's theorem in Section 5.

—q
Figure 1.

To express our bounds we shall again require the function

(1) +/'(A)=2h(1+A)/,12 forA>0 where h(A)=A(log A —1)+1.

Recall that properties of 4i are described in Proposition 11.1.1.


Recall that Q and Q* denotes the classes of all continuous nonnegative
functions on [0, 1] that are symmetric about t = and satisfy

(2) q(t)7 and q(t)/f\ for0<t<_2, for Q and


q(t)y' for0^ t<_Z, for Q*.
446 EXPONENTIAL INEQUALITIES AND II'/gII METRIC CONVERGENCE
-

Inequality!. (Shorack and Wellner) Let 0<_ a (1— 6) b < b s 8 < and A>
0. Then for q e Q

(3) P(IIU /gllä?A) <_ S


6
a
!

t exp t — (1 -8) Z y^
2 2
q t t) ^ dt,

where

(4) y = 1 >_ '(,1) always works,

and

always works,

I-1v 1 .
ifa?g 2
n n

Moreover, for q E Q*
f b
q2(t)
(6) 1'(II^n/9IIa-^) ` g s, t
exp (— (1 —S) b y^(t) 2z I dt,
a

where

(7) y(t)1

and

(8)

[When a = 0, both probability bounds (3) and (6) exceed 1 in the "+" case.]
If b>,0 it may be an advantage to use a fixed small 6 in (3) or (6) rather
than 3 = b; the exponent typically will not matter either way, but the 3 term
in front will stay bounded. For q E Q* it is difficult to bound y + (t) below by
a constant. Hence in applying (6), we will use (11.1.11) to bound the right-hand
side of (6) by the sum of two terms, a device used by both Chibisov (1964)
and O'Reilly (1974).

Corollary 1. When q( r) = f, 0 ^ a <— (1— 6)b < b <— 8 s2, and A >0 we have

(b/a)exp( A2
(9) P(IIl±/VniIlä=a) 3 log )
BOUNDS ON THE MAGNITUDE OF IlU„/qIIä 447

where (4) and (5) hold and (crudely)


(10)
(11) y
(1-S)
> 3S na(1-S)/A
j ifA:538-,/na
ifA?3S na.

Proofs. Let
(a)
Define
(b) 0=1-5

and integers 0:5 J < K (note that J>_2 since O>_ Z and b <- 2) by

(c) 0'<a<_8 and O <b<B J- ' (weletK=mifa=0).

[From here on, 0' denotes 0' for J <_ i < K, but 0' denotes a and B J- ' denotes
b. Note that (new 0' - ') <_ (new 0' )/ 0 is true for all J <- i < K.] Since q is >'
we have

An ti Lmax sup U°(t)


Jsi2K o'stso' q(t)

c `rL max sup._


U(t)
(d) ?a
JsisK 0 1 stso t q(0 i )

so that
K
(1-6')
(e) P(An)s^ l.
p
\II1^ Iflo ''^q(B') {1-e' ')J -

We first consider A. Now (11.1.26) gives

1 -A2 9 K 2 g2(6')(1- B'-1)


P{An) ^ exp
i i- `(1- 8' - ') 2\

(f) < K p
ex -- 2 q2(e`)
, 8y since 1- B' ' >_1-b _ 0
; .J 2 8

1 - exp - B2 y dt
=J+I1-0 t
K-'1 J BB1
_, 1 \ 2 q2(t)
A2 t
A 2 g 2 (eb) 82y _1
+exp 1 -- q2(a) B e y I +exp I
2 a III \\\ 2 Ob J
b 2 2
3
(g) a -ex
t (-(1- S) 2 q ` t) Y - ) dt
448 EXPONENTIAL INEQUALITIES AND II'/gII-METRIC CONVERGENCE

by (b), (c), and the fact that l bb t >_ (log (1/0))/8 >_ 1 follows from
a s (1 — S)b and S <_ Z. This completes the proof of (3) in the " —" case.
We now consider A. From (e), (f), and (11.1.25) we have

K ( A2 g 2 ( 0 1 ) ( Ag(0`)(1- 0i -')
(h) P(An) 5 ^ exp — 2 ®^-^ B 4` B; -, f
1
11
But a= 9 '<_6' for i <_K and q(t) /t is \, so

(i) q(01)(1- 8` -I ) < q(8') q(a)


a

-
Thus, as in case A , we have

(1) P (A ) 8 fa t exp j —
2 q tt) 82 (^a.^ )> dt

completing the proof of (3) in the "+" case. We need only remark that for
(5) we use

(k) q(a) q(a) 1 q( 1 1 n) 1 <1


aV v v7n na —

We now prove (6). Up to step (f) the proof is the same as that for (3).
Again we first consider A-,, but now qE Q *. By (11.1.16) and (f),

(A„)<_ Y_ exp
2

i
K

\— 2

1
0)

c.' p ( — A2 q (e'.)
ex 1 +ex p
B3/ q(a) 92)J
_j 2 0 0 (_• 2 a
1 I (t)03)
1 1 exp 1
—A2 dt+exp (—^2 qz(a) 0 2 )
S B y e t \ 2 t 2 a

z t z e
since q ^ <—) 8(+, ) for 8 ' +' < t < 0 '

b
(l) s
f ae
i exp ( — 2z q 2 t) (1— S) 3 ) dt+exp 2 g2aa) . 2

using 8 K <a < 0 K - '


/ \
1 dt,
6 2 2
exp ( —(1— 8)32 q ( t)
(m) <_ 2 bt
BOUNDS ON THE MAGNITUDE OF IIU^/qIIö 449

since Oa < a <_ b, implies


b / \

f
2 2 a 2 2
I \
ea exp (-8 3 q (t) dt ^ s Jea r exp C—B 3 2 q t t) 1 dt
(n) >_ C1 J a t dt 1 exp (-83 2
ea q 9a )) '—exp ( —BZ 2Z g2aa) )

completing the proof of (6) in the "—" case.


We now consider A„ and q E Q*. From (e) and (11.1.25) expressed in terms
of h [recall (11.1.3)1, by using h/ and :fr", we have

K n 8'-'
exp ( -101_1h
q0 ll
t (
e \
(o) <_ Y t 1
-j S f 0 i o 1t
exp —nth 1+

K-1
'kq(t)
✓nt //
03 dt

n9 K q( K ) ^ (1
+exp(_ + '^ — o K- ')
1-9 -ih(
,

^8 //

3
J I
b C—nt
ex
t 2nt2
aB p
A2g2(t) 8 e C O3Aq(t)
../itt i,
dt

K -I
+expC-1n eOK_, VnOK-, (1—BK ')//
hC1+
2 fb1 (0 6A2 q2(t) , ( o3A(t) y
(cSaB t exp dt
2 t vt

since aO < a <_ b implies that

1
3aot
l
b
—exp (—O6A 2 q2 (t)
2 t
(0
3 Aq(t) 1 1 dt
-, /it //

( a 1 eXp (_ 0 6A2 q z (t) 4( e3_q(t) 11 dt


J aet 2 t ` %nt JJ
—1 J a 1x e p C—nthC1+
s aø t
11 dt
JJ
e3Aq(t)
✓n t

>_(!
1
Eidt^ exp 1 —nah(l+ '19(a) ®3
t \ 'aO '/

(q) n 0K-I)/
? exp C — 1 nBB K _ 1 I—
K I A q(a) l
.

We can replace B 3 by 1 in (p) since /i is \. ❑


450 EXPONENTIAL INEQUALITIES AND 11 /qiI - METRIC CONVERGENCE
-

Proposition 1. (i) Let q E Q *. Define g by

(12) q 2 (t)=g 2 (t)tlog e (1 /t) for0<t< .

Then

(13)
f . ,/
21 exp(— Lg t (t) f dt<co for allE>0,

provided

(14) g(t)-+oo as t10.

If (13) holds, then lim sup, o g(t) = +oo necessarily, but lim inf, o g(t) <oo is
possible. (ii) If q E Q, then (13) implies (14).
Proof. (i) Merely note that

/21 exp —
t g2(t)) '/2 -f
(a)
f o,
t dt— o
t[log(1/t)]`g2(t)
dt,

where

1/2
(b) [tlog(1 /t)] -" dt <oo when,>1.
0

(ii) We follow Csörgö et al. (1983). Now

(a) I 1 _e 2 (t)
J‚ s exp t — q-Ls/
(s) I ds? exp ( )
J s

(b) =2exp(—[Egt( t) -1og2(1/t)Jl

where

h(t)= e(g 2 (t) /t) — log2 (1/t)


(c) -moo as t j,0

since the lhs of (a) converges to 0 as t.[0. Thus

q 2 (t) /t =[h(t)+log 2 (1/t)] /E


(d) >_ e - ' log 2 (1/ t) for all 0< t!5 some t E

for any E > 0. Thus (14) holds. ❑


BOUNDS ON THE MAGNITUDE OF IUn f gllä 451

Exercise 1. Let 0 < a <_ b < 1 and A > 0. Then

P(IIvn/v'iTIIa>A)
{any Inequality 11.1.1 bound on P(I v„(b)I //E>— A a b)}.

(Hint: Use Inequality 11.1.2.)

Bounds on the Magnitude of U/q

We now develop analogous results for Brownian bridge U. Note first that

(15) U(t)/(1 —1) is a martingale for 0<_ t < 1;

see Exercise 3.6.2 and Section 6.6. The following is now easy.

Inequality 2. Let qEQ*, 0<_a<_(1—S)b<bs5<_2 and A>0. Then i

(16) P(IIU/( 1— I)IJO-A/(1 — b))<2 exp ( ^ 2 )


2b(I —b)

while

(17) s fa exp(—(1-6) 3 2 g Z t t) ) dt.

Proof. Since {U(t)/(1— t): 0< t< b} is a martingale, {exp (rv(t)/(1— t)): 0:
t <_ b} is a submartingale. Thus

P(III/( 1— I)IIö> A/(1 — b))<_2P(IIU + /( 1— I)IIö> A/(1—b))


U(:) ) 1rA
_.52P1 ö upb exp( >exp(
)
rA b )
- 5 2infexp(_ 1 El exp
b
(r ()))

=2 inffexp \1 rb 2b(1—b)
+ 2(lr2 b) /

=2 exp ( — setting r=A/b


2b(1? b))

using the fact that a N(0, r 2 ) moment generating function is exp (0. 2 1 2 /2).
Now let A [ II U/ q II ä ^ A ] and just recopy the proof for An out of
Inequality 1. ❑

452 EXPONENTIAL INEQUALITIES AND II'/gII-METRIC CONVERGENCE

We now summarize some stronger results in this vein [the reader is referred
to Section 1.8 of Ito and McKean (1974)]. If q denotes a continuous function
that is positive on (0, 1) with q(0) = 0, then

P(S(t) <q(t), tJ,0)= P([w: S(t) <q(t) for all 0< t <some Ste,])
(18) =0or1.

(In Exercise 2.5.1 the reader was asked to prove Blumenthal's 0-1 law; (18)
is a trivial consequence of that Iaw.) The function q is called upper class for
S if the probability in (18) equals 1; otherwise, it is called lower class for S.

Exercise 2. Show that for q E. Q and 0< a < 1

(19) P(S(t)?q(t)forsome0<r-a)^2 J o q( t) 3 exp(— q2 (t ) ) dt.


0 2;rt 2t /

Exercise 3. (Feller, 1943; Erdös, 1942; Kolmogorov; Petrovski, 1935


criterion) If q E Q, then

1
(20) P(§(t) < 9(t), t^0)_^O

as
q(t) q2(t)

o+
2^rt3 exp (—
2t J
dt = I <m

The reader is asked to prove the upper-class part. [Generalizations to the


partial-sum process are considered in Strassen, 1967 and Jain et al., 1975.]

Exercise 4. We may replace § by U in (18) and (20), while the analog of


(19), for q E Q, is

(21) P(U(t)>—q(t) for some 0<t<a)

2irt
o
q(t) exp I—
f
3 (1 —t) 3 2 1 (1 —t) /
dt.
q2(t)

(Hint: Use Exercise 2.2.4.)

Exercise 5. Trivial modifications in Inequality 2 give for all A >0 that

(22) P(I^§IJö A)<_ 2 exp( — A 2/( 26 ))

and, provided 0 a<_(1—S)b<b<_S<1,

( 2 3) P(II§/gllä_A)<_S J Q texp^— (i—S)2 gZl t) )dt.


EXPONENTIAL BOUNDS FOR UNIFORM ORDER STATISTICS 453
3. EXPONENTIAL BOUNDS FOR UNIFORM ORDER STATISTICS

Our object in this section is to develop good exponential bounds, similar to


Inequalities 11.1.1 and 11.1.2, for

P(t'(fn:i —
pi) > P9 A),

where

and for

P \II Vlllo1 A6).

Recall that

Vn(pi)=fn( n:i — p i) fort<i<_n.

Our bounds will be expressed in terms of the function

(2) i (A)=21(1+,1)/A 2 =(2/A 2 )[,ß-log (1+A)] with h(A)=Ah(1/A),

where h(x) = x(log x —1) + I as in 11.1.2. Properties of the function will be


given in Proposition 1 below. We agree that .(A) _ —oo for A <_ —1. Note
Figure 1.

Exponential Bounds for Order Statistics

Inequality 1. (Exponential bounds for N n (b)) For all A >0 and 0< b <1
we have

(3) P(+Vn(b)^A)^exp — ,1 2 fA l
( 2b (bam /)
i

n in the + case
b ^r( b l_
=exp^-2
(4) 2

xp — A in the — case.

Corollary 1. For all i = 1,,. . . , (n/2) and all A > I we have the crude bounds

(5) 9 A)<exp(—A/10)
P(+Vi(n,i—pi)? P
454 EXPONENTIAL INEQUALITIES AND II'/9II-METRIC CONVERGENCE

Figure 1. The functions h and .

and

(6) P( — ./n(6.,:; —pi)^ p A)^exp(— A 2 /4).


Moreover, for all i = 1, ... , n and all A ? 1

(7) P('Jf,:1—p11 'A)


EXPONENTIAL BOUNDS FOR UNIFORM ORDER STATISTICS 455
Proof of Corollary 1. The bound (6) follows immediately from (4) using
1—p >— 2 for 1-5 i <_ (n/2). To prove (5), note that (3) yields
;

P(/i(e,,:i — p1)? P1q A)=P(V (P1)' P'q A)


>/z
2
15exp ( — A (1 — p,)l 7=c )
I Pr p` //
/ z
(a) <— exp I — 4 i(f ^1) since 1—p; ?

since it is I and np >_ n/ (n + l) ? 2 for n ? 1 and all i >_ 1. Now


;

4z ^(v A)=4
' (v A)iWA)

A
for A >_ 1 since ,taji(A) ? by Proposition 1

=4^G('),

where äir('/)=0.13321 ...>0.1, and thus (a) implies (5). The inequality (7)
follows from (5) and (6) by symmetry about Z. ❑

Exercise 1. Corollary 1 can be modified to

( 8 ) P('Ien:j—P,I^ P9 A)<_2exp ( — k2' for A>0.


2(1+A))

The following proposition records useful properties of the function Vii.

Proposition 1. The most important properties of are

(9)^i(A)= 1 1— I log(1+A)
)=I+A 4`( I+A) ,

(10) fora >—1 with. (0)=1,soi (,1)?lfor-1<A^0,

(11) ç(,l)^ ^asA-> ^(A)-2log(1/(I+A))as A- —1,

(12) c(A) ? 1/(1 +2A/3) for A > —1,

(1-8)for0<—A^ZS and Os8<-1


(13) ,^(,^)^ f33(1
—S)/2AforA> 6 and 0^S<_1,
456 EXPONENTIAL INEQUALITIES AND It/Nil-METRIC CONVERGENCE

2 2 2 32
-1 „
(14) ^ (A)= 1-3A +4,^ — A + ..+ A+ .. forIAJ<1,
5 n+2

(15) c'( 0 )- - ,

(16) Al (A) =2(1 —I log(1 +,1)) is ‚" forA> -1

and has derivative equal to 0(A)/(1 + A) > 0.

Exercise 2. Prove Proposition 1.

Proof of Inequality 1. It follows immediately from (10.3.7) and (10.3.8) of


Inequality 10.3.2 that

(a) P(G '(b)/b? 71) <exp ( nbh(rl))


and

(b) P(b /G(b) j rl) <—exp ( nbh(1/rl)).


Rewriting (a) and (b) yields

(c) P(V(G „'(b) — b) >(ri-1)bVi)^exp (—nb!))

and

(d) PI — (G^'(b)—b)? ( 1— )bVin) exp(—nbh(1 /n)).

Letting first A = (77 —1)bVi in (c), and then A = (1 - 1 /77)bVi in (d), then (c)
and (d) give both the + and — parts of (2) by use of (1). The inequality (4)
follows immediately from (10). ❑

Corollary I yields the following useful moment bound.

Corollary 2. (Wellner) (Bound on absolute rth central moments of uniform


order statistics). For all real r> 0 and 1: i <_ n

(1 7 ) Elen:, — Pil r ^ C.(P1g1l n)" Z — Cr(i/n 2 ) 2,

where Cr = 1 + 2 • 10' • I'(r+ 1).


Proof. Let h =. The exponential bound (8) implies that

1 —F(s)= P(Ien:i — pSI^s)<2exp (—hs/so)


EXPONENTIAL BOUNDS FOR UNIFORM ORDER STATISTICS 457

for s>_ so = (p ; q ; /n)' /2 . Hence

EIS & —p; I'=


0
t'dF(t) = r J (1—F(s))s'
0
- 'ds

0
— r
<
f so

S r—
' ds+2r
50
exp ( hs/so )s' ' ds
— —

(1+2rF(r)h ')s^. -

Note that use of an exponential bound of the type (7), but with constant ;o
improved, only changes Cr . ❑

Exponential Bounds for Order Statistics Extended to Neighborhoods


of the Origin

We now extend Inequality 1 to intervals near zero by use of inequality 11.1.2.

Inequality 2. (Exponential bounds for suprema of N„) Let A >0, 0 <p <‚
and n ? 1. Then for p <— 2,

(18) (k: (p,^ n))


and for p+,1n - '' 2 -2,

(19) \II^n lllo ' l^p)^ exp( — 2p ß \ p"


(20) < exp ( — Az ).
2p

Our proof of Inequality 2 will rely upon the following simple events identity.

Lemma 1. For all 0 < p < 1, 0 < A < 1, and r2!1 we have

(21) LII ( ^1 '— I )+ I O >A J LII(^1 — I ) IIO +A9> 1 AJ


and

(22)
1
[1 ( ^1 '— I ) 110 'A ^^I(G1—I)+^Iov av)
°>
1+A 1

458 EXPONENTIAL INEQUALITIES AND II'/9II - METRIC CONVERGENCE

G a (t), s
r+ (1 —r)t
1

(1 +r)t — r
r+(1 —r)a

r
(1 +r)a —r
t, 1(S)
0
0 _!_a
1+r
1
rDR

Figure 2.

Proof. First, we prove (22). By Figure 2

i1 (G 1 — I ) IL >r ]
_[G„(t) - t+r(1 —t)forsome0<_ta]

forsomer-s<r +(1 —r)a


=[fvn'(s) -.5
^n'(s) —s

r J
for somer^s<_r +(1 —r)a]
1—s 1—r
'—I)
(a) =Eil 1 —I Il ^1
or+('—r)a r—rJ

=
Thus, letting p r+ (1 — r)a and A = r/(1 — r) so that (p r)/(1 r) a= — — _
p(1 + A) — A = p — Ag, (a) yields (22). Similarly, from Figure 2 we also have

E I (G1 — I ) Ilo >r J


_ [t —G„(t)> r(1 — t) for some 0< t< a]
=[G„(t)<(I+r)t—r for some 0<_t<—a]

(s)> forsome0^s< (1+r)a — r]


[ f' - 1±r
r >
G( s) —s r
- (1 +r)a —r
forsome0^s--
1—s 1+r

(b) —' >iT]


EXPONENTIAL BOUNDS FOR UNIFORM ORDER STATISTICS 459
Hence, letting p=(1+r)a—r and A=-r/(l+r) so that a=p(1—A)+A=
p+Aq, (b) yields (21). ❑

Proof of Inequality 2. From (22) and (i) of Inequality 11.1.2 (written in terms
of the function h rather than the function #A),

(a) P((G1'-1) Ilo > 1 ^P/

=1'
(V 1— It A> 1+A/9)

h [ l+1
ceXp \ —n l (p — A) + ,t/9P - A))

(b)
=expt —nq+A (1+1+Algp-A)/'

note that the probability in (a) is zero for A > p, so we have assumed A <— p.
Thus q + A <_ 1, and hence

(c) q+A ^p(1—p /

and

A/q q+A 1
(d) 1+1+A/q p—A 1—A/p i

Combining (c) and (d) with (b) and using h(x) ‚' for x>— 1 yields

II exp (—np(l—h(l/
(e) II 1' 1) fo > 1 ^ P/ ^ p^—AP^^

exp(—nph(l— p =expl —np2


pZA 2/p2 h(1—p)
)/ I

=exp \
—n 2p^\ p//

Replacing A by A/' in (e) yields (19).


460 EXPONENTIAL INEQUALITIES AND /q I-METRIC CONVERGENCE

Similarly, from (21) and (ii) of Inequality 11.1.2 [written in terms of h


rather than /i using (6)]:

P
(f) Ii AP/

=P
\!I (G I —I) 1Io +A ^ 1 - A /q/

= p
(II (G1 — I ) ( + ^ > 1 — (P+A)/
\
r
<exp —n(p+A)h 1— A/q q —A
(1 —A /q) p +A ).
Note that

_ A/q q A __ 1 —

(g) 1 1 —A /q p +A 1 +A /p'

Hence (f) and (g) yield

P (II (G1''I )+ Ilo > 1 ^P 1


^exp( — np( l+ p )h( 1 +/p )
=exp(_nph(i +p))

2
= exp -np_ 2/ 2 h 1 +^
\ P /P P/
r i
}
(h)
=expC—n2P^ \P //

Replacing A by A/Vi in (h) yields (18), provided p +A/Vi<_z. ❑

4. BOUNDS ON THE MAGNITUDE OF IIVR /qH

Now the inequalities of the preceding section yield probability bounds for
^IV /qII ä in the same way that inequalities of Section 11.1 were used in Section
11.2. Since the proofs parallel those of Section 11.2, we leave them as an
exercise.
Recall the definitions of the classes Q c Q* used in Section 2.
WEAK CONVERGENCE OF U, AND V. IN jj•/q^) METRICS 461

Inequality 1. Let q E Q, n ? 1, 0 <— a <(1 — 5) b < b <_ S <— Z, and A >0 satisfy
S+n 2 A<_2. Then
Ill t bb 2 2

a>Alcsfa ^exp^—(1-6)'y^2 q ^ t) ^dt,


(1) P\I qn
where

(2) y = 1 always works,

(3) Y+=+^^ä(f

? i(A) / if a ? q 2 (1/n) v l/n,


and

(4) ßi(.1)=,iI (1+A), I (A)=Ah(1/,1).

Moreover, for q E Q*,

(5) P\ q Il
/IlV #IIb
Ii — II
a
) z_
-A /

b2

s Ja(1-s) t
exp(—(1—S)
by^(t) 2
2

t \
t) 1 dt,
1
where

(6) Y (t)=I?y+(t)=0^ _
v'— )

Exercise 1. Use Inequality 11.3.2 to give a proof of Inequality I along the


lines of the proof of Inequality 11.2.1.

Corollary 1. When q(t) = f

llb>A) 31og(b/a)
<exp —(1—S) 3 y } 2 I,
(7) P \I1v
^ a 11

where y - = 1 and y'='r(,1/ na).

5. WEAK CONVERGENCE OF U. AND V. IN • /q 11 METRICS

In this section we suppose that q e Q * ; that is,

(1) q>_0 is ,* and continuous on [0,] and symmetric about t = Z.


462 EXPONENTIAL INEQUALITIES AND 11 /qII-METRIC CONVERGENCE
We will establish necessary and sufficient conditions for the special construc-
tion of Theorem 3.3.1 to satisfy 11(U — U)/ q 1I -0 as n - x. One of our
conditions will be phrased in terms of the integral
1/2
(2) T — T(q, A)= t ' exp ( Ag 2 (t) /t) dt.
- —

Theorem 1. (Chibisov) Let q E Q *; see (1). Then

(3) T(q, A) < oo for every A > 0

is necessary and sufficient for

(4) 11(v„ — U)/ q 11 - p 0 as n -' oo for the special construction.

For q E Q [i.e., we also have q(t) /f \ for 0< t <_ 2]

(5) g(t)= q(t)/.lt log 2 (1/t) -*oo as t-*0

is necessary and sufficient for (4).


Proof. Suppose q E Q* satisfies (3). Then for 0< A < 1, q E Q* implies that
s?At and q(s)<_q(t) in the exponent below, and gives

(a) s -' exp (—Ag 2 (s) /s) ds? (—log A) exp (—q 2 (t) /t).
At

Since the left-hand side in (a) converges to 0 as t -+ 0 by (3), we have t -2 q(t) -^


0; and hence
0o as t-0;

(b) h(8) =inf {t -'/2q(t): 0 <t<_ 0)/co as 0+0.

Now for 0< 0<_ 2 we have

(c) 1!(v U) /quo 2 U /qj +llU/gIlö+JJU. — UJI /q(0)


and, with a„ = e/n, s > 0,

(d) JJUn/gIlO S llUn /q II O n+ J U n/qlla. -


Note that if A„ _ [ n: , s a] then for any e >0

(e) P(A„)=P(n^ n;,^E) -*1 —exp(—e)<E;


WEAK CONVERGENCE OF U. AND V. IN II•/qII METRICS 463
and hence on A'n, with P(A) ? 1— 2E for n > some N E , we have

(f) IIv„/qII^'lA = n' / ZIII/gllö „ S E 2 /h(a„) < r

for n > some NE.


Now we use (11.2.6) to control the second term on the right-hand side of
(d). Let B„=[I^OJ„/q^Iä>_e]; by (11.2.6) we have, for 8>B (e.g., 6=2) and
any E > 0

t-' exp( _ (1_S)6-


(g) P(B„)^4 f
a ^1- r (t)E2g2 (t)) dt.

By (11.1.11) and (11.2.8)

1-6 ifeq(t)/tVi<_36,
(h) y (t) - 138(l — 3) tN/—
n/ Eq(t) ifvq(t)/tfn>_3S.

Hence by adding the two resulting terms and using na„ = s, it follows from
(g) and (h) that

P(B„)s4 f 0t
'exp(_ (I_6 dt
)2t2g2(t))

+4
S
f ao
(I-s)
t ' exp (—z6(1-6)'sq(t)V) dt

4 foo t_'eX
P
( (16)
2t2g2(t)dt

+4
s
f a

,(I s)
-
t ' exp(—z&(1—S) 7 eh(B) nt) dt

E 8 '^
<--+— s-' exp (-2S(1—S)'sh(B)s) ds, set s = nt
2 3 E(I-s)
(i) <e for B <— 0E

since (3) guarantees that the first integral can be made small by choice of 0
sufficiently small, and the second interval can be made arbitrarily small for
small 0 in view of (b). Combining (d)-(f) and (i) yields

Cl) forn^someN..
Combining (j), (11.2.17) of Inequality 11.2.2, and Theorem 3.3.1 with (c) yields
(4) (in view of symmetry of the process about t = Z).
464 EXPONENTIAL INEQUALITIES AND II'/gDI METRIC CONVERGENCE -

Suppose now that 1I(U — U)/q1) -, 0. Fix 0 <E < 1. Now

II(U — U) /q ^ sup {[U(t) —U(t)]/q(t): 0< 1< 6 2 (e„., n 1/n)}


?sup {[— ✓ t —v(t)]/q(t):0<t<e 2 ( . 1 n 1 /n)}

(k) ?sup {[—ef —U(t)]/q(t): 0<1< C 2 (en:I A 1/n) }.

Now our given condition (4) implies P(JI(v„— v) /q1I e) 1, and thus (k)
implies

(1) P( —U(t)<e^ t(1 +q(t) /f) for 0<t<6 2 (e„ : ,n1/n)) -*1
as n -> oo. Letting

(m) h(t) =f(1 +q(t)/ ✓ t) and B„=(en: ,n1 /n),

we see from (1) (using U = —U) that for any fixed small number ß ;

(n) P(IIU/h1I ö B">E)<_ß for


2 j all n >_some n = nß .
;

Thus there exists a subsequence n (we now insist Y_ ° ß <oo) on which


; ;

(o) P(IIU/hIIö B ? e i.o.) =0,

which implies for this e > 0 that

(p) P(U(t) < sh(t) for all 0<t<some S^,) =1

(i.e., eh is upper class for U for all e > 0).


By Exercise 2 below

(q) T(h, A) <oo,

and hence (a) implies that h(t)/fit= 1 +q(t)/ ✓ t --oo as t-+0. Hence

(r) q(t)/f-*co as t--0.

Thus
oo> T(h, A /4)
=
1
exp I —4 (f +9(t))Z l dt
f 1/2
(some 8) 1 eX
/—A [q(t)+q(t)]2)
J c t p I\ 4 J
dt

fo I exp (—A q - t) I dt. (

(s) '
WEAK CONVERGENCE OF U. AND V. IN II• /qil METRICS 465

Thus

(t) T(q,A)<oo,

establishing (3).
That (5) is equivalent to (3) for q in Q was shown in Proposition 11.2.1.

Corollary 1. Let q E Q * • If (3) holds then

(6) max f - 1 n') -+ P 0 asn -goo.


in

Exercise 1. Show that if q E Q* satisfies jö [q(t)] -z dt <oo, then (3) holds.


[Hint: x exp (—Ax) < 1/.1e.]

Exercise 2. Suppose (1) holds. Show that (3) is equivalent to

(7) eq is upper class for U for all e > 0.

[Hint: See Inequality (11.2.2) to get (3) implies (7) and the second paragraph
only of O'Reilly, 1974, p. 644 gives (7) implies (3).]

Exercise 3. (Csörgö et al., 1983) (i) Define a function q E Q* as follows:


set a k = exp ( — e k2 ), bk ° ( 1 /ak) log log ( 1 /ak) = exp (eke)/k2, and m k ° bk +1 - 1
for k = 1, 2, .... Then set

(ak — t)'""
(bk+l- bk)
+bk forak+lit<_akandk=1,2,...
( 8 ) l/q2(t)= (ak—ak +l) k

1/q 2 (a,) for a, t <_ 2.


Show that q E Q* satisfies Jö [q(t)] -Z dt <ao so that (3) holds by Exercise 1,
but that

z
lira g 2 (t) = Lim q (t) = 1 <oo.
1-.o ,_o tloglog(I/t)

Combining this with Exercise 1, it follows that (3) does not imply (5). [Shorack,
1979c and Shorack and Wellner, 1982 claimed it did. Condition (5) was
introduced by Shorack, 1979c.] (ii) In fact, if h (t) -* oo as t j.0, then there exists
a q in Q* having 1ö [q(t)] -2 dt <c'. for which lim,., o [q(t) /f h(t)] = 0.

We remark that the investigation of (4) by Chibisov (1964) used (3) while
O'Reilly (1974) introduced (7). Chibisov (1964) proved slightly less than the
466 EXPONENTIAL INEQUALITIES AND II' /gII-METRIC CONVERGENCE
equivalence of (4) and (7). He used the representation 8.4.4 of U,, in terms of
a Poisson process. Using this same Poisson representation, O'Reilly (1974)
established the equivalence of (3), (4), and (7). The new inequality 11.2.6 is
used here in their proofs.
Chibisov (1964) established a result analogous to Theorem 1 for the process
vn of (8.4.2). O'Reilly (1974) established a result analogous to Theorem 1 for
a smoothed quantile process V; we develop this in Theorem 2.

Convergence of V. in 11 • / qjl metrics


We will now use Inequality 11.4.1 to prove O'Reilly's (1974) theorem charac-
terizing weak convergence of V. and Q. in II / q (I metrics. Recall the definition
of the class Q* introduced in Section 11.2. Our condition will be phrased in
terms of the integral
1/2
(9) T- T(q, A)= t -' exp (— Ag 2 (t) /t) dt.
0

Note that f 1[a b] with 0<— a < b s 1 denotes the function on [0, 1] that equals
f on [a, b] and is 0 elsewhere. Because of minor difficulties [V n (0+) equals
v'i 'n ,, and not 0, etc.], our theorem will be stated in terms of0/ n 1 [1 /(n+1),n /(n+1))
[We use 1/(n + 1) instead of 1/n so that Cn; , is included in the conclusion, etc.]

Theorem 2. (O'Reilly) Let q E Q*, thus q >_ 0 is / and continuous on [0, 2]


and symmetric about t = 2. Then the processes V. and V of Theorem 3.1.1 satisfy

(10) !I(Nnl[1 /(n+1).n/(n+l)) — V)/9II - ' 0 as n -o0

if and only if

(11) T(q,A) <oo for everyA>0.

Let qE Q; thus we also have q(t)/ ✓ t is \ on [0, 2]. Then (10) holds if and
only if

(12) g(t)- q(t)/..fit log e (1/t) -*oo as tJ1 0.

In terms of the construction which will be introduced in Chapter 12, (10)


could be phrased as

(13) II(Vn1t1 /(n +l),n /(n +l)j — Bn)/9l) aas.0 as n -*oo

for the processes V n and H. of Theorem 12.2.2.


WEAK CONVERGENCE OF U,, AND V. IN 11•/qII METRICS 467
Proof. Suppose q E Q* satisfies (11). Then, for 0< A < 1, monotonicity allows
us to replace s by At and q(s) by q(t) in the exponent to obtain

s-'exp (—Ag 2 (s)/s) ds> (—log A) exp (—q 2 (t)/t).


(a) 1"t
Since the left-hand side in (a) converges to 0 as t -* 0 by (11), we have
t - '/ 2 q(t) -> xo as t - 0; and hence

(b) g(0) = inf { t - ' /2 q(t): 0<1 <_ O}- 00 as 9-o 0.

Now for 0< 0 <_ 2 we have

( C ) II(Wn fI 1/(n+l),n/(n+])] — V)/gllö/2

< IiVn/9II e/(n+t)+ IiV/9IIö+ lien —VII/ q (0).


We use (11.4.5) to control the first term on the right-hand side of (c): Let
An = [ II V, / q II i / (n+ 1) ^ e}; by (11.4.5) we have, for 8>— 0 (e.g., S =)‚ any e>0,
and n large enough that S + n - '/ 2 e <_ 2 that i

(d) 4 J (
S 1/(2n+2)
8 21
(1—S)6y+(t)E2g2(t ))
By Proposition 11.3.1 we have
1—S ifEq(t)/tom<_2S
(e) Y (t)— Z5(1—S)t.in/eq(t) ifEq(t)/tV7i? 5.

Hence by adding the two resulting terms, it follows from (d) and (e) that

P „ <4
f8 't ex ( (1—S)7e2 g2(t) ^ dt
A ) 1/(2n+2) P — 2t
e
+ t-' exp (—Q8(1— 5) 7 q(t)Vi) dt
5

(1-5)
cSJot-'exP( — 2t2g2(t))dt

4
+— t'exp (—(1—S)7q(0) t) dt
S1/(2n+2)
o
I.

E+ g s exp (—äS(1— S) 7 eg(0)s) ds


2 5 1/2

(f) ^ E for 0:5 some BE,


468 EXPONENTIAL INEQUALITIES AND I1- /giI-METRIC CONVERGENCE
since (11) guarantees that the first integral can be made small by choice of 0
sufficiently small, and the second integral can be made arbitrarily small for
small 0 in view of (b). Combining (f), (11.2.17) of Inequality 11.2.2, and
Theorem 3.1.1 with (c) yields (10) (in view of symmetry of the processes about
t=D.
Suppose now that (10) holds, and fix 0< e < 1. Now

JI (V„ lni(n+1).ni(n+I)t — V)/q ll


sup {(V l t 1 /( n+1),„ /(" +,)7 — V(t))/q(t): 0< 1 <s 2/(n+ 1 )}
(g) = sup {—V(t)/q(t):0<t<r 2 /(n+1) },

Since (10) holds, (g) implies

(h) P (—V(t)<eq(t)for0<t<s2 /(n+1))-*1

as n -> co. Now (h) implies that q is upper class for every e > 0, and hence by
Exercise 2, (11) holds.
The equivalence of (11) and (12) for q€ Q is found in Proposition
11.2.1. ❑

Corollary 2. Let q E Q* satisfy (3). Then the spacings S ni = 6n:i — e„ :; _, satisfy

sni
(14) 1
q(i /(n+1)) ^ °0
max1 asn- x.

Proof. Because V/q has continuous paths a.s., (10) implies that the maximum
discontinuity of 4/ „1 ( , /( „ + , ) „ /( „ + , )] /q must converge to zero in probability; but
that is what was claimed. ❑

A Comment on Other Versions of V.

In addition to V. and J,,, the literature contains discussion of a modified


uniform quantile process 0/„ on (D, ') defined by

(15) 0/„(t) =^[ „ :; —t] for i /(n +1) I<(i +1)/(n+ 1),0 <_i^n.

The corresponding functions G', t', and (G„' are shown in Figure 1.
Paralleling (8.2.3) we have the representations (valid for each fixed n)

+l
V,, n r^„+^(n +l) tS,,(1)+ n+1(n+1
(16) n+l ^l„ +1 L —tIJ

for (i-1)/n <t^i /n, 1:i <_n


WEAK CONVERGENCE OF U, AND V. IN jj-1gj( METRICS 469

n=4

n+1=5

0 .25 .5 .75 1.0

Figure 1.

and

(17) Qn= +l ^n+t — I^n+^(1)+1


l
^ n+ l ^
( l l —I
/J

Theorem 3. Let q E Q*• Then the triangular array of row-independent Uniform


(0, 1) rv's „, ... , f„„ and the Brownian bridge V of Theorem 3.1.1 satisfy both

(18) 11(4/„—N)/qII -' P 0 asn->oo

and

( 19 ) jj(Vnl[ro.ni(n+1)] — V)/qII w 0 as n-+x

if and only if (11) holds. The results (18) and (19) hold for each q E Q if and
only if (12) holds.
Proof. Suppose q E Q* satisfies (11). Then since

(a) II(Vnl[1/(n+1).n/(n+l)7—Vn 1 [o,n/(n+I)])/q II

as n-oo
max 9(i/(n+l)) -+ 0

and
v' sni
(b) 1I(Qnl[o,n1(n+»]—'n)/qfl^ max -p,, 0 as n - oo
‚i n+' q(i/(n + 1))
470 EXPONENTIAL INEQUALITIES AND I•/q^I-METRIC CONVERGENCE

by Corollary 2, the "if' part of the assertion follows from Theorem 1 and the
triangle inequality. The only "if" part of the assertion can be routinely proved
along the lines of Theorem 1. ❑

Exercise 4. Extend Chibisov (1964) to show that

(20) i„_ ✓n(NI(nI) /n —I), n >_1,

of (8.4.2) can be replaced by equivalent processes defined in terms of a


fixed Poisson N* for which

(21) 11(V* §*)/gjjö


— 2 -' 0 as n--oo

for some S — S(nI )/'/i = S if and only if (11) holds.

6. CONVERGENCE OF v, W,,, 8/., AND I8„ IN


WEIGHTED £, METRICS

In Section 1 we considered various functionals of processes. Some of these


functionals were integral functionals, and for them an 2, metric could be
appropriate.

Theorem 1. Let 0 <r^2. Suppose qr(t) is nonnegative on (0, 1), is bounded


on [0, 1-0] for all 0<0 : ‚ and satisfies

” (t)[t(1— t)]'/2 dt <co.


(1) J0

Then the special construction of Section 3.3 satisfies

(2)

and
f o
, gijW„ — W^' dI p 0 as n -> oo for the special construction

t +i„ dI + P 0 -
as n oo for the special construction.
(3) J0
Note that we can think of (2) as implying either

(2') i[' /'(W„ —W)]i. - p 0 as n -* oo for the special construction

for the usual norm I [I]I r = (Jo jJY dl ) " on 2„ or we can think of (2) as implying

(2 ") J[W„ —W]1 , -p,, 0 as n -> oo for the special construction


CONVERGENCE IN WEIGHTED 9, METRICS 471

for the weighted metric [ f]I (Io 'I fi dI )'/' on 2r (Ir) = { f: J,'/'f E t,}. We
note that (2") implies

(4) g(W) - p g(W) as n -^ oo for the special construction

provided g is continuous in the 1[ ]I-metric.

Remark 1. Shepp (1966) shows that J /i(t)S 2 (t) dt either converges a.s. or
diverges a.s. according as Jö ti(t) dt is finite or infinite. Thus the converse of
Theorem 1 (and Exercise 1) holds in case r = 2.
Proof. Consider W. first. Now for any 0< 0 <_ 2

E f
0
gi(t)IW„(t)I'dt

= f B gi(t)ElW„(t)I'dt byFubini'stheorem
0

(a) s p(t)E[w„(t)]'' 2 dt by Liapounov's inequality

(Inequality A.1.5)

= J B tp(t)[t(1 — t)] r ' 2 dt

(b) <82 for 0 = 0E now specified small enough.

Thus Markov's inequality (Inequality A.1.2) gives

(c) P( J B
0
l/i^MU„^'dI?e)<_E fWdI/e<E 2 /E=E.

The same argument gives


e
(d) P /rIWI' dI>_s f <_e.
o

Also, the interval [1 — 0, 1) is symmetric to (0, 0]. Finally, since

fe
B 4Iw^-WI'dl- IIIke 8 IIW -wil'

(e) P 0 as n -+ oo by Theorem 3.1.1, for our fixed 0.


472 EXPONENTIAL INEQUALITIES AND II'/gII METRIC CONVERGENCE -

Thus the decomposition [using the Cr inequality (Inequality A.1.6)]

Jo
1

+uIm
n
E 0

—NNE' dI< 1W„ — WI' dI

+cr j
l
i/4W„I'dI +
o f
o "
J4Wr'dI

1 IIIW„I' dI + 1 l/4wl' dI}


+J 1-9 f'-0 J

combined with (c)-(e) gives

(g) P Q rydW„ —Wi (' dI > 5sc, j < 5E for all n ? some n E .
o /

That is, (2) holds. Chibisov (1965) proved (2) for r = 2 with W. replaced by U,,.
For the Y,, process the above argument still holds, since we will now show
[allowing %/„ to replace W. in (a)] that

(5) EINn(t)I ' <_ C, [t(1— t)]'/ 2 for all 0 < t <_ 1.

Now Corollary 11.3.2 shows that (5) holds if we replace t by p 1 = i/(n+1)


with 0 <_ i <_ n + 1. Suppose now that p 75 t < p + , with p s t <_ Z. Then by the
; ; ;

linearity of Q. on [ p , p + , ] we have
; ;

(h) N „( t )=aN „(p; ) +(1 —a)\U„(p,+,) for some 0<_a<_1.


Thus Minkowski's inequality (Inequality A.1.9) gives
{E^41^(t)^`}'^'s a{ESN„( P1 )jr}' +(1 —
:5 a pq +(1 —a) p^ since (5) holds fort =p ;

(i) :5•a pg +(1—a)2 pq usingi>1


: 2 Pq
6) <_2 t(1—t) for0<p,<p i <t< .
As for verifying (5) for 0 <_ t <_ p,, we note that on this interval
(k) N„ (t) =aN „(p,) for some 0 a 1.
Thus
{EIN„(t)r}'/` =
apq since (5) holds for t = p,
= ✓ä ap,( 1— aR) q1/ 1— aR
(1) <2 t(1—t) for0<t^p1.
CONVERGENCE IN WEIGHTED Y, METRICS 473
Combining 6) and (1) completes the proof of (5). Thus step (a) in the proof
for W. also carries over to V. All other steps in the proof for W. carry over
trivially. ❑

Exercise 1. Show that under the hypotheses of Theorem 1 that

" iIiIi „ —W dI -.> p 0 as n - oo for the special construction


(6) J0
if If (i /(n+1))=R n (i/(n+l)) for 0<_i^n+I with A n linear on each [(i-
1)/(n+l), i /(n+l)].

Exercise 2. (Serfling, 1980, p. 283) Show that if X 1 ,.. . , X. are iid F where

[F(x)(1—F(x))]1/2dx<m,
(7) foo .

then

(8) J dx = O p (n - ` /2 ).

Remark 2. Note that

1
(9) q'(t)
[t(1— t)] )+r/2[1og (1/t(1— t))(log2 (1/t(1— t)))]'

does satisfy (1) if E > 0, does not satisfy (1) if e <_ 0.

Exercise 3. Show that theorem 11.5.1 easily implies that (2), with r = 2, holds
provided jö iJig 2 dl < for some function q satisfying (11.5.5); basically, these
conditions require that q 2 (t)/[t(1— t) log e (1/t(1— t))]-) , 00 as t-*0 or 1. Thus
note that Theorem 11.5.1 does not quite imply Theorem 1.

Exercise 4. Establish the correctness of Example 3.8.3 by applying Theorem


1 rather than earlier theorems.

Exercise 5. Let A. denote the Anderson-Darling statistics of Example 3.8.4.


Use Theorem 1 to show that

fo
(10) A 2, -A2 l U (t)2 dt as n-*o .
t(1— t)
474 EXPONENTIAL INEQUALITIES AND II'/9II METRIC CONVERGENCE
-

7. MOMENTS OF FUNCTIONS OF ORDER STATISTICS

Let X 1 , ... , X. be iid F. We are interested in necessary and sufficient conditions


for
- F-'(p))- Eg(N( 0 , p( 1- p)/f2 (F - '(p)))

when k„ = np. We follow Anderson (1982) in our first three subsections. We


would also like bounds on the moments and on the rate of convergence; we
follow Mason (1984) in the fourth subsection. Historically, Blom (1958) was
an important work.

Existence of Moments

Theorem 1. (Anderson, 1982) Let X = F. Suppose g >0 is J' and g(x) -+ o0


as xTF '(1). Define
-

-log (1 - F(x))
(1) a- lim
x -.F '(l) log g(x)
Then
<00 >1
(2) Eg(X) = either according as a = =1.
= co <i

Example 1. (Moments) Let XF and let r>0. Define

-log (1 - F(x))-log F(-x)


(3) a+= lim 9 a_= li m
X -^ log g x log x
Then
<00 < (a + v a_)
(4) E IX j r = either according as r = = (a + v a_). ❑
=00 (>(a +va

Example 2. (Moment generating function) Let X = F. Define

-log (1 -F(x)) -log F(-x)


(5) a2= lim and a, = lim
X-.00 X x-.w X

Then
<00 tE(a1,a2)

(6) E e` X = either according as t = a, or t = a 2 . ❑


0o t <aIort>a2
MOMENTS OF FUNCTIONS OF ORDER STATISTICS 475

Existence of Moments of Functions of Order Statistics

Theorem 2. Let X 1 ,. . . , X. be iid F. Let r> 0. Then

I
<ccr/a_<i<n+1—r/a +
(7) EIX„,;I'_ as
i <r/a_ or i >n+l—r/a +

for a + and a_ as in (3).

Theorem 3. Let X 1 , ... , X. be iid F where F ' (1) = co. Let g>_0 be / co.
-

Suppose
—log (1— F(x)) = 0.
(8) a = lim
X-0 log g(x)

Then for all possible centering constants a E (—cc, oo) and all possible scale
factors b> 1 (typically, b = V) we have

(9) Eg(b(X„:;—a))=oo for all 1 <_ i <_ n and n? 1.

Convergence of Moments of Functions of Order Statistics

For convergence, we will require some assumptions. Suppose

(10) f(xx )>0whereF(xx )=pE(0,1)andf=F' exists at; =F '(p), -

(11) k.1n = p+O(1/n) for integral k„,


—log (1— F(x))
(12) a =hm >0,
x-. = log g(x)
(13) 0:!:-:
_ g < co is continuous and 7 on (—cc, cc),

and either g is bounded or

(14) there exists M, x*> 0 such that log g(tx) < Mt log g(x) for t>1 and
x> x*.

Exercise 1. Show that g(x) = exp (tx) with t>0 and g(x) = x'l [0 ) (x) with
r>0 satisfy (14).

Theorem 4. Suppose (10)-(14) hold. Then

(15) E9(./n(X n:k. — Xp )^Eg(W),

where
(16) W=N(O,p(1—p)/f2(xx))•
476 EXPONENTIAL INEQUALITIES AND II'/9II-METRIC CONVERGENCE

Exercise 2. Suppose (10) and (11) hold, and let r >0. Then

(17) En r12 1X,, :k, —xp l'-* EI wir as n - ► oo

if and only if

(18) (a + v a_) >0 [see (3) for a + and a-]

if and only if

(19) EIXI s forsome S >0.

Exercise 3. Suppose (10) and (11) hold. Then

(20) E exp (tVi(X„, k, —xp ))-* E exp (1W)


for all (>0 (or, for all t<0)

if and only if

(21) a 2 >0 (or, a, <0)

if and only if

(22) E exp SX < co for some S > 0


(or, E exp (—SX) < 00 for some S > 0).

Remark 1. Anderson (1982) extends these results by giving conditions when


various "robust functionals” T satisfy

E(T(F„) — T(F))' = E T;(F F,, — F)/J!) ^+o(n- (r +k_1)I2


Gi

where T (F, •) is the ` jth Frechet differential of T at F.”

Bounds on Moments

Throughout this subsection we will follow the approach of Mason (1984b).


See also Blom (1958), and see Anderson (1982) for further references.
We suppcce that the measurable function h satisfies

(23) 0 <_ h(t)<_ Mt -d 1(1 —t) -d2 for 0< t< I

for some real numbers d, and d2 and positive constant M.


MOMENTS OF FUNCTIONS OF ORDER STATISTICS 477

Inequality 1. (Mason) Let r>_ 1. Then for p i = 1– q = i /(n + 1) we have ;

lI \ ^^Z j ^n:^
(24) —I E h(t)dtI sCrpT'a gT rd 2
,

for all i such that

(25) r(d,-1)<i<n+I–r(d2 -1).

Suppose X 1 ,. . , X. is an iid sample from F where

(26) F '(t) =
- J f h(s) ds for some h.

Then

(27)
f p" -
h(s) ds=X n;; –F '(p ; ) -

and Inequality 1 gives

rn
(28) E!Xn:i – F '(P^) c1P, d`9. dz

ford,-1<i<(n+l)–(d2 -1).

Application of the C, inequality thus yields immediately the following result.

Inequality 2. Let r 1. If (26) and (23) hold for (d, n d2 )> 1, then
n r/2
2r—' ( C,+C I)P1 rd 9i r^2
(29) ( piq )n:i — EXn:ii r ^
for all i as in (25).

Of course, in this case (24) takes the form

'^2
(30) ( ! )
E(X :; –Xp – C,qT`az for all i as in (25).
Pig1

"Proof" of Inequality 1. We offer only the following intuition for this result.
Now

Cn
(a) J n h(t)dt= n (fn:i – P;)h(p )•
J ' Pr9.
478 EXPONENTIAL INEQUALITIES AND II'/giI-METRIC CONVERGENCE

The rth moment of this should be bounded by

()
n r/2

E! :i — piI rh(Pf) r
(b) P1g1
(c) <—Constantp -i rd iq ^ rd2 by (11.3.17) and (1)

provided the rth moment exists. Now existence of the rth moment of the
integral in (a) should be controlled [use (1)] by the integral of

r (diI)(1
(d) x (density Of ,, j )

over (0, 1) ; that is, by


1

(e) t -r(d, -1)(1— t) - r (d2_ 1) t' '(1— t)n - ' dt.


-

The integral (e) will be finite precisely when (3) holds. Details of this proof
are found in Mason (1984b). ❑

Exercise 4. Prove Inequality 1.

Some Proofs of At.derson's Results

We follow Anderson (1982) closely.


Proof of Theorem 1. Special case: g(x)=xl (o,. ) (x) and F '(1) is arbitrary. -

Suppose a=1. Now F(x) = 1-1/x for x>1 has a=1 with EX = oo.
However, F(x) = I — x - ' exp (— log x) for x ? 1 has a = 1 and EX = 3< oo.
Suppose a> 1. Then there exists 1 <c < a and x* such that —log (1 —
F(x)) > c log x for x ? x *. Thus 1 — F(x) < 1/x` for x > x *, which implies
Eg(X) < oo.
uppose a<1. We will show t=1 P(g(X) > n) = oo, which implies
Eg(X) = 00. Since a < I there exists a <c < 1 and a subsequence x 1 <x 2 <
having I — F(x k ) > xk ` for all k with X k -) co as k -> co. Let y k = (Xk ) for k>_1
with yo =0. Then

P(g(X)>n)= Y_ E P(X>n) ? Y, Yk—Yk i > lim y^


n=1 k=1 n =Yk -t+l k=1 Xk k -oo Xk

(a)

implying Eg(X) = oo. CN7

Exercise 5. Prove the general case of Theorem I by appealing to the special


case above. (Try the case when g is T and continuous first to get the idea.)
MOMENTS OF FUNCTIONS OF ORDER STATISTICS 479

Proof of Theorem 2. Suppose first that n - i+ 1 < r/ a + . Now

( r ] ( 1/r
PlI Xn:il > x) — PlXn:i > x )

>_ P(X 1 , . .. , X„ - i +l all exceed x”)

(a) _ [1- F(x' 1 r)In -i +l ,

and so
-,+l
lim -log P(I Xn: ,I r > x) lim -log (1- F(x'^r))n
X_W log Xx_m log X

=n -i +1 lim -log(1-F(y))=n -i +1 a+
r ,,- W log y r
(b) <1.

It follows from (b) and Theorem 1 that EjX„ :; I' = oo. The case i < r/ a_ is
analogous.
Suppose now that r/ a_ < i and r/ a + < n - i + 1. Now

-log P(I Xn:,I r> x)


(c) lim
X^ O log x

P(Xn:i < -X' l r) P(XX:i > X 'ir)


>-min lim -log lim -log
' x^C log x ).

Since

n
(d) P(Xn:; > X' /r ) i P(X,, ... , X„_ ;+ , all exceed x' /r )

i +,^
= (n^[1 -
F(x'^r)^n

the inequality in (b) also switches in this case to >, and gives

-log P(Xn:i > X 1 ' r )


(e) him >1.
zy^ logg

Similarly,

(f) lim -log P(Xn:i <-x'/'r)


> 1.
x-•oo log X
480 EXPONENTIAL INEQUALITIES AND II'/qiI-METRIC CONVERGENCE

Combining (e) and (f) into (c) gives

P(IX":il' >x)
(g) uni > 1.
X-0 log X

Then (g) and Theorem 1 give EIXn;; V <oo. ❑


Proof of Theorem 3. Since b> I and g / oo, for all large x we have

P(g(b(X":, - a)) > x) ^ P(g(X :1) > x)


(a) >_ [ 1- F(g ' (x + ))]"-'+'
- as in (a) of Theorem 2

using the left-continuous inverse g '(y) = inf {y: g(x) ? y }. Thus


-

-log P(g(b(X" :+ - a)) > x)


lire
X_. log x
-log [1-F(g 1 (x +))]
-

<_(n -i+l) tim


X-C log x
-log (1- F(x))
(b) = (n - i + 1) lim as in Exercise 5
log g(x)
=0 forall1^i<nandn^1,sincea =0
(c) < 1.

Now (c) and Theorem 1 establish our claim. ❑

We refer the reader to Anderson (1982) for Theorem 4.

8. ADDITIONAL BINOMIAL RESULTS

Throughout this section we suppose that 0 <p < I and

(1) X Binomial (n, p)

and we define q = 1-p and

)pkqn_k.
(2) b(n, p; k) = P(X = k) = (

Now the mean and variance of X are expressed by

(3) X = (np, npq)•


ADDITIONAL BINOMIAL RESULTS 481

We define m to be that unique integer for which

(4) (n + l )p —1 < m <_ (n + l )p defines m.

Proposition 1. We note that

(5) m is a mode of the Binomial (n, p) distribution,

in that b(n, p; k) has a unique maximum at in if m ^ (n + 1)p, while it achieves


its maximum at both in and m —1 if m = (n + 1)p. Moreover,

(6) b(n, p; k) is strictly monotone in k on either side of the mode(s).


Proof. (See Feller, 1950, p. 140.) Now

_ n k n-k
( n k- In-k+1=
(a)
())
p q (k-1)P q k q
(n+l)p—k
=1+
kq
so that
< <
(b) k= =(n+l)p implies b(n,p;k-1)= =b(n,p;k). ❑
> >
We can also reexpress (4) and (5) as

(^) Binomial (n, p) has a mode at the unique m = np+S


with —q < S ^ p; if S = p, then np — q is also a mode.

One of the most celebrated results of all probability theory is the limit
theorem that
X—np
(8) =(0 1)

is asymptotically normal. This is but a special case of the ordinary CLT.

Theorem 1. (DeMoivre, Laplace) Z. - d N(0, 1) as n -* oo; or,


A
(9) P(Z„ A) -. P(N(0,1)^A)='(A)= f O(x) dx
A
Ziz dx
_ Zar e "

as n-,
oo, for all real A.
482 EXPONENTIAL INEQUALITIES AND II'/gfI-METRIC CONVERGENCE

Bounding the Binomial Tail by the Tail Term Nearest the Center

We will compare individual probabilities to tail probabilities, and show that


the probability in the tail of a binomial distribution is of the same order as
the tail term nearest to the mean.

Inequality 1. (Feller) We have that

(10) k ? np implies that

P(Binomial (n, p)> k) b(n, p; k)


1—(n+ 1)p/(k+1)
and
(11) k <_ np implies that

P (Binomial (n, p)s k)^ +l)b(n, p; k).


1 — (n +l)q/(n —k

Proof. If k ? np, then k >_ m by (7). Let n ? i >_ k + 1 be arbitrary. Also as in


the proof of Proposition 1,

<n form<_k<i -1,


(a) n(n,p
b(, p; i; i)1 ) — ( i - 1)p k+l q
q -<p
so that

b(n,p;k +i) n —k p ' _ ;

fork ? np and i ? 0,
(b) 6 ( n , p; k) <( k +l q) —B

where

(c) 0 =0n•k,p has 101 <1.


=k+1 q

Thus

(d) P (Binomial (n, p)?k)<b(n, p; k)[1+0+B 2 • • ]


^ b(n, P; k)/(1-9)

establishes (10). Interchange the roles of p and q to obtain (11) from (10). ❑

Note the crude inequalities that for all n >_ 1, 1 <_ i <_ n, and 0 <_ t <_ 1,
— )"_i+1
(12) P( 'n:, t) ^ P(61, ... , C„ -;+, all exceed t) = (1
ADDITIONAL BINOMIAL RESULTS 483
and
„ , +I
(n)P(6I,. ..,e„-1+I all exceed t)=1 n 1(1 - t)
-
(13) P(hn:i>t) 5

Large Deviations of Binomial rv's

The next result is a special case of a theorem of Bahadur and Rao (1960).

Theorem 2. (Bahadur and Rao) Let 0 < p < r < 1, and let X.
Binomial (n, p). Then

(14) P(Xnl n ^ r) =

where

(15) p = (P/r) r (( 1- p)/(l - r))' - '

and

(1-p) 1
(16) 1 - lim d lim d„ <
2Tr n- W n-.0 (1 p/r) 2vr *

Proof. Let k denote the smallest integer >_ nr. Then

Ph = P(X„l n >_ r)? P(X„ = k) = (k)p k ( 1- P)” -k


n n+l/2 p"r(1 —p )n(I-r)
„r +l / 2
(a) (nr) (n(1-r))r )+ '/' 2 27r

by Stirling's formula (Formula A.9.1)


(b) = n -vz p°/ 2,7r

so that we have a lower bound of the proper order. An upper bound is provided
by Feller's inequality (Inequality 1); thus

P"`- -(n+11)pl(k+1)P(X°=k)- 11
i plrP(X"=k)
1-p n -I/2pfl
(c) ^1-p/r 2^r

Thus

(d) Iim d„ >_ 1/ ✓27r and lim d„ (1 -p)/[(1 -p/r)'J2a]. ❑


484 EXPONENTIAL INEQUALITIES AND 1 •/q ff -METRIC CONVERGENCE

Note that (15) implies

(17) 1 logP (X„/n >_r) -+ rlogp +(1— r)logl — p asn-+ 00.

9.EXPONENTIAL BOUNDS FOR POISSON, GAMMA,


AND BETA rv's

Poisson rv's

Throughout this subsection we suppose that r> 0 and

(1) X =Xr= Poisson (r),

and we define

(2) p(k) = p(r; k) = P(X = k) for k=0, 1, ... .

The mean, variance, and moment generating functions of X satisfy

(3) X (r, r) and Ee'x =exp(r(e` -1)).

By observing that

(4) p(k- 1) /p(k) =k/r forall k-1,

we conclude that

Poisson (r) has a unique mode at (r) if r 0 (r) (there


(5) 1j s a second mode at r-1 if r=(r)), and p(k) is
strictly monotone on either side of the mode(s).

It is classic that the normalized ry

(6) Z =Zr =(X, —r)/^

satisfies
rA

(7) P(Zr<_,t)-*P(N(0, 1)<_a)= 4)(A) = I q(x)dx


A
1
e " l' dx
-

2^r

as r -> ao, for all real A.


EXPONENTIAL BOUNDS FOR POISSON, GAMMA, AND BETA rv's 485

As in the binomial case, the tail probabilities are of the same order as the
tail term nearest the center. Thus, by (4),

p(X>k)= p^=p k 1+ r( + r r +... ^pk r J


;.k \ k+1 k+l k+2 } ;_o^k +i

1 forallk>_r,
(8) 5Pk/[1 k+l]
while
k
k-1 +«') pk ^k\'
P(X <_k)= >2 Pk=Pk(l+—+—
j =O r r r J ;=o r/ }

kl
(9) <_ P k / 1-- for all k < r.
r J
Exercise 1. (Large deviations) (i) Using the same technique as in Theorem
11.8.2 of Bahadur and Rao, show that

(10) P(Xr /r> A) = r 2 [e"/(eA)]'dr fork > 1,

where

(11) 1 <lim d,- lim d,<_ 1


2irA .^^ •^^ A —1 27rA
Thus

(12)
! log P(Xr /r A)- (A — 1)—log A asr->oo, for each A>1.

Now show that for 0< A < 1, Eqs. (10) and (12) still hold, but A (A —1) on the
rhs of (11) must be replaced by 1/(1—A) [since (9) is now used instead of (8)].
(ii) Obtain (12) again, this time as a corollary to Chernoff's theorem
(Theorem A.4.1).

We now obtain strong universal exponential bounds on Poisson tail prob


-abiltes.

Inequality 1. For each A > 0 in the "+" case (each 0< A <_ 1 in the "- - case)
the ry X, = Poisson (r) satisfies

(13) P(t(X,—r)/f^A)<_inffexp(—IA)E(exp(tt(X,—r)/^))

2
(14) cexp \ 2 '^^^
486 EXPONENTIAL INEQUALITIES AND 1I• /qJI-METRIC CONVERGENCE
Proof. Now
(a) P(± (X. — r) >_ A) = P(exp (±t(Xr — r)) > exp (tA)) for all t>0
^infexp(—to+r(e:' -1) F tr)
r >o

= f exp (A — (r+ A) log ((r +A) /r)) in the "+" case


l exp(—A +(r —A) log (r/(r —A))) inthe"—"case
(b) =exp(—(A2/(2r))+G(±A /r));

the minimum is simply obtained by differentiating the exponent and solving.


Now replace A by of in (a) and (b) to get (6). In Exercise 11.1.2 the reader
was asked to establish (14); however, the intermediate bound (13) will also
be required in Chapter 14. ❑

Exercise 2. (i) (Bohman, 1963) For all k we have

(15) P (Poisson (r) < k) s P(N(O, 1) <_ (k+ 1— r)/v)


and
(16) P (Poisson (r)^k)?P (Gamma (r +1) >k),

where Gamma (r+ 1) has density xre X/r! for x> 0. [Recall that r! I'(r+ 1).]
-

(ii) (Anderson and Samuels, 1967) Show that

(17) P (Poisson (r)^ k)> P (Binomial (n, r/n)<k) if k_


<r -1
and
(18) P (Poisson (r) <_ k) <P (Binomial (n, r/n) <_ k) if k >_ r.

Thus, the Poisson approximation to the binomial typically overestimates the


tail probabilities.

We now turn to a discussion of the order of magnitudes of Poisson prob-


abilities and Poisson tail probabilities. Suppose first that A is chosen so that
one of k = r t f A is an integer. Then for that integral r ± f ,1 we have [with
$

a( ) as in Stirling's formula (Formula A.9.1)]

P(Xr=rtfA)=P((Xr— r) /'=A) =e rrrt ✓:a/(rtfA)!


e —a(kf) e —rrr:L rA

_ a(k*) e —("r

= e 1
exp (— TI1± Ilog^i± A I -
2ar r± / A \ \ ^/ ‚j:/ fJJ/
a(kt) 2
±-
(19) =e 2^rr 1 i exp\—^
2 +G ✓r
EXPONENTIAL BOUNDS FOR POISSON, GAMMA, AND BETA rv's 487

Recall now our convention of writing

c=ab ifIc—aj<_b.

Since 11— ir(A )I <_ A for JA 1 by Proposition 11.1.1, we have

(20)
_ —a(kt)
P(Xr=rt.lr.l)= 2irr 1O+Sexp(— 2O+2I
I A2 8 \ if
I'kI V IA13
y sS.

Recalling (8) and (9) we note that

(21) (1 r+fA+1) '\11+A/f) `^(1+,1/f)f/A

<_(1+J /A ifl,1J/f<s
and

(22) ( l r_VA y 1
=f/a.

Thus (8), (21), and (19) and (8), (22), and (19) give, respectively,

±1/2 A2 A
(23) P(t(X,—r)/I>,1)<
_(1±.%-) Vh eXp ( 2 (f))

in case r t f A is an integer; this is typically, but not always, stronger than (14).

Exercise 3. Show that


(24) P(±(X, — r)l^>—A,( 1 +c.)) -40
P(±r(X, — r)1 ^ 2 A,)

under appropriate conditions on A, -+ and c,A, -4 a0.

Exercise 4. (Moderate deviations) If ,1, - Qo in such a way that A, = o(r 1 / 6 ),


then

(25) P(t (X, — r)/Jr> A) ^ 1— q3(A) cb(A,) /(A ,)/A, as r -' .

Then using aji(±A/f) = 1 F A/(3f )+ O(,l 2 / r), you can show that

(26) P(t(X,—r)/f>,l,)--[1— 1(A,)]exp(RA 3 /(6f)) asr-*oo

provided A, -, co in such a way that A,= o(r" 4 ). Analogously, note the


488 EXPONENTIAL INEQUALITIES AND II'/9U-METRIC CONVERGENCE

expansion of ii in Proposition 11.1.1,

(27) P(±(X, —r)/f > A r )


k
^ [1 4)(Ar)] exp (A 2 E ( -1)+2' as r^a^
2 )
- -

I ^' ( 1)(j+2)W

provided A, Qo in such a way that A r = o(/(k+l)/(2(k +3))) (i.e , Ak +3 _ o ( r(k +l)/2


.
))

Gamma rv's

Throughout this subsection we suppose r> 0 and

(28) X = X, = Gamma (r+ 1) with density x re -X / r! for x>0.

Recall that X, = (r+ 1, r+ 1). Easy differentiation shows that

(29) the mode of Xr is at r.

We thus manipulate

y)re- (r +TY) dy
P((X, —r) /I ?A)=^ r(r +f /r!
JA
Q -a(r) m
= e -r[(I +y /Jr) -i -log (t+y/ ../r)) dy

2 7i A

e -a(r) m 2
(30)
VJ
I
= 2 A exp(— 2 ^(5)) dy forA>0

and
e -a(r) f 2
(31) P( —(Xr—r)/f>
—,l)=
exp. 2 j \ fl/ dy

for0<A <f

where

(32) (A)- 2 (l +A) forA> -1 and

h(A)=(A- 1) —logA fora>0.

The function rli is described in Proposition 11.3.1. [See Devroye (1981) for an
alternative.]
EXPONENTIAL BOUNDS FOR POISSON, GAMMA, AND BETA rv's 499

Exercises. (i) Use that A^ (A) is Tin the exponent of (30) and (31) to show

1 exp
(33) P(±(X—r)/sIr?A)< 2
tAlf) A ( —« ))

Use that +j(A) is J, in the exponent of (30) and (31) to show


_ a(r)

(34) P(±(X—r)/ ✓ r^A)> e P(N(0,1)?A ( );


p(ta/J)

the rhs of (34) can be further bounded below using Mill's ratio to obtain an
expression looking more like (33).
(ii) Obtain also the bounds based on the moment generating functions,
as in (13) and (14).

Exercise 6. (Moderate deviations) Show that if A, -+ oo, then

P(t(X, — r)/Jr? Ar)

[1-4)(,1,)] provided A, = o(r 6)

1— 1 (Ar)] exp ( A:/(3')) provided A, = o(r'^ 4 )

[1 — 4'(A,)] exp ( FA 2
i =I (J+ 1 )r'^ z

provided A, = o(r(k+,)i(z(k+a)))

Exercise 7. (Large deviations) Obtain the analog of Exercise I for the


present case of gamma rv's.

Beta rv's

Throughout this subsection we suppose a, b> 0 and

(35) X = XQ , b = Betaa + 1, b + 1)
(a+b+1)!
with density x°(1—x)' on [0, 1].
a!b!

Recall that

a+1 (a+1)(b+1)
(36) Xa,b
a+b+2'(a+b+2)z(a+b+3)^
490 EXPONENTIAL INEQUALITIES AND 1•/qfl-METRIC CONVERGENCE

Easy differentiation shows that

(37) the mode of X a , b is at a/(a + b).

In this subsection we will use ä to denote the function in Stirling's formula.

Exercise 8. Show that

P (( X a +b/l (a+b) 3 ^ A )
_ a + b+ 1 e _0)-d)

(38) a +b 27r
ab(a +b) /a yz b by
x JA exp^—
2 [a+b^( ab(a +b))

a Y
dy.
+a+b^`(

Derive an analogous expression for the other tail; you get the same expression
with +i replaced by i/i(— ), and different limits on the integral. Now derive
analogs of Exercises 6-8.
CHAPTER 12

The Hungarian Constructions


of LL, U,, and V,,

0. INTRODUCTION

We are already familiar with special constructions from the construction of


Theorem 3.1.1. We summarize that theorem again here.
There exists a triangular array of row-independent Uniform (0, 1) rv's
{e,,, , ... , ,,„: n ? 1} and a Brownian bridge U defined on a common probability
space for which

(1) tUn — U 11 '4 a . s. 0 as n -* oo for the special construction

and

(2) IIV„ —VII 0 as n -> oo for the special construction;

here U„ is the empirical process of „, , ... , i„,,, V„ is the quantile process of


these same rv's, and V = —U is also a Brownian bridge.
This is a beautiful theorem. It suffices for establishing many deep weak
limit theorems concerning -^ d and - P , CLT- and WLLN-type results. It is
useless however for establishing strong a , s. limit theorems, as in SLLN- or
LIL-type results; it gives absolutely no information concerning the joint distri-
butions involving more than one row.
The proper form for the joint distributions to have is well known to us. In
Section 3.5, we defined the sequential uniform empirical process K,, of a single
sequence of independent Uniform (0, 1) rv's , h2 ,... by

1 (fls)

(3) K,,(s,1)== Y [1 [o ,, ] (,)—t) for s>_0 and 0<—t--1.

491
492 THE HUNGARIAN CONSTRUCTIONS OF I. 1J„ AND 9/„

We noted there that for 1 <_ m s n

(4) K(m /n, •) = m/n U,,, for the empirical process U. of , ... ,

We also showed that

(5) K„ ^ e.d. K = the Kiefer process.

In Sections 2.5 and 2.7 we saw two other approaches for special construc-
tions, and we applied them to the partial-sum process S „. The two additional
techniques are Skorokhod embedding and the Hungarian construction. Both
provide rates of convergence; that is, both would allow us to improve a
CLT-type result whose proof rested on (1) to a Berry- Esseen-type result.
However, only the Hungarian construction provides a single sequence of iid
61, e2,... useful in proving strong limit theorems.
In Section 1 we briefly summarize the Hungarian construction of K „. We
do not give a proof establishing the basic construction. It is long and difficult!
Our purpose is to learn how to use it. The basic fact is that a construction
satisfying IIK,, —KII = O((log n) 2 /Vi) a.s. is possible. The key paper is Komlös,
Major, and Tusnädy (1975). [An alternative that has rate (log n)/V is also
presented, but its incorrect joint distribution theory limits its use.]
In Section 2 we see how a sequence of uniform quantile processes N„ that
converge at a rate can be constructed from the partial-sum process of indepen-
dent exponential rv's. The basic ideas here are due to Breiman (1968) and
Brillinger (1969). Although use of Skorokhod embedding gives an a.s. rate of
nearly O(n ' 4 it is also possible to use a Hungarian construction to obtain
- " )

an a.s. rate of O((log n) /v'i). Again we stress, these results can yield Berry-
Esseen-type theorems, but not strong limit theorems. The Hungarian construc-
tion here is different from that used in Section 1.
Note that the approach of Section 1 yielded an a.s. rate of O((log n) z /Vi)
and the potential for strong limit theorems. It is known that the rate cannot
be improved past O((log n)/'Ii); it is still an open question whether this rate
can be achieved. If so, applications await it (see Section 14.4 for one example).
This is another good reason to omit the proof of what is currently available,
but perhaps not best possible.
The constructions and approximation rates discussed in Sections 1 and 2
all concern the supremum metric II • II. In Section 3 we summarize yet another
construction of 0/„ and U, which pays close attention to the behavior of these
processes near 0 and 1; it is due to Csörgö, Csörgö, Horvath, and Mason
(1984a). This refined construction is based on the same partial-sum approxima-
tion as in Section 2, but particular care is taken in treating the approximation
error near 0 and 1. The result is a construction which is suited to the stronger
metrics II • /qII where q(t)=[t(1 _ t)] 12 ° with 0: v <2. In Section 4 we prove
a Berry- Esseen-type theorem for functionals of U,,, N „, and N„.
THE HUNGARIAN CONSTRUCTION OF K, 493
Even though the proof of the powerful Theorem 12.1.1 is not presented,
one of the basic elementary tools is quite important. The Wasserstein distance
was discussed in Section 2.6.
To extend the empirical process results of this chapter to the general
empirical process — F) on the real line, one simply plugs F into both
K. and K and notes that

(6) K.(s, F(x))—K(s, F(x)) is easy to treat

using the results developed below (even if F is discontinuous); we will not


mention this again in the chapter. Results for the general quantile process
(F n'—F ') are taken up in Chapter 18.
-

1. THE HUNGARIAN CONSTRUCTION OF K.

Let K. denote the sequential uniform empirical process of Section 3.5 defined
by
1 <hs>
(1) IK„(s, t)= J- Y ^ LI to , o (^,)—z] for s?0 and 0<—t<1

for a sequence f,, ' Z , ... of independent Uniform (0, 1) rv's. Suppose

(2) U. denotes the empirical process of C, , ... ,

We recall that

(3) K,,(s, t)= (ns)/nU ( „ s) (t) for 0 s, t<_ 1.

We let K denote a Kiefer process. Thus K is a normal process with 0 means


and

(4) Coy [K(s, , t,), K(s2 i 12)] = (s, n s2)[( t, A 12) — t, 12]

for s, s 2 ? 0 and 0 <_ t, , t Z <_ 1. Recall that

(5) 3„ = K(n, • )/f - U for all n.

In our previous encounter with K. in Section 3.5 we showed

(6) K„='Kon (DT,2T,1^ ) asn-*oowith T=[0, 1] 2 .

We now improve this by constructing special versions of K. and K whose


sample paths are very close together. This powerful theorem is due to Komlös
et al. (1975). It has many applications.

494 THE HUNGARIAN CONSTRUCTIONS OF K,,, U, AND V.

Theorem 1. (The Hungarian construction) There exists a version of


,, 2,... and K as above for which
i
7) li jOJ„— D$ „^^e(lo^ ) ^someM<m a.s.

In fact, we have

lim max III„


k '.)_ K(k,-) j ]

/
g(lo n)2
M <oo a.s.
(8)
„-,m L l-5ksn n v n Vn
^

In fact these results are but simple consequences of a more powerful


theorem.

Theorem 2. (The Hungarian construction) There exists a version of,,


and K as above for which

>((c,logn)+x)lognf
(9) P(^mk xIf /iKn \ n' ' / —K(k,') II
c2 exp ( c3x)

for all x and all n > 1; here 0< C 1 , c2 , c 3 <x.

Corollary 1. We have
g 2
(10) k-
E( max IlK (k /n,')— K (
<
) c )
I_k_n J
for some 0< c < oo.

Exercise 1. Establish that Theorem 1 is an easy consequence of Theorem 2.


Proof of Corollary 1. Let M„ denote the maximum ry in (10). Then

a) EM„= J 0
P(M„>_x)dx

<c '
log2n
+ J o P(M„?x +(c,logn) l ^dx
to to n \ to
=c l g2
✓n
n
+
o
n 1
^P^M„>_(y+c,log ) g I dy g n
✓ n / .T
log n
letting x = y
fn
THE HUNGARIAN CONSTRUCTION OF Iä, 495

log e log n
(b) <_ c, n+JQ
Ic2exp ( — c3y)dy by Theorem 2.

(c) <— (some c) (log e n)/'Ii

as claimed. ❑

Open question 1. It is known (see Komlös et al., 1975) that the conclusions
in (7) and (8) cannot be improved past (log n) /V. Is it possible to remove
one log n term from (7)-(10)?

The proof of Theorem 1 is very long and very difficult, and it would seem
that it might not give the best possible result either. We omit it.
It is possible to get the best possible rate if you are willing to give up
something, namely, the Kiefer-process structure. See Csörgö and Revesz (1981)
for a readable proof.

Theorem 3. (The other Hungarian construction) There exists a triangular


array of row-independent Uniform (0, 1) rv's, , ... ,„„, n >_ 1, and a sequence
of Brownian bridges B„, n >_ 1, for which

(11) P( IIU„ — B„ II > (( c, log n)+x) /V) c2 exp (—c 3 x)

for all x and all n > 1; here 0< C 1 , C2 i c3 < oo and U,; is the empirical process
of „1,...,e„„•

Remark 1. Combining Theorem 1 and Theorem 15.1.2 allows us to conclude


immediately that for the iid Uniform (0, 1) rv's f, , f2i ... and the Kiefer process
6S of Theorem 1, we have simultaneously both (7) and

(12)
(12)
„yam V
lim 0/„ +
1K(n, ) IlIog 2 n log e n
n
"
1 4
<_ some M < oo a.s.
II i L

Contrast (12) and (12.2.6). (See Csörgö and Revesz, 1974.)

Open question 2. What are the best possible rates for simultaneous results of
the type (7) and (12).

Open question 3. Give a result analogous to (7) for the W. process of Section
3.1.

Exercise 2. Use (7) to prove Chibisov's theorem (Theorem 11.5.1). Break


[0, 2] into (0, a„] and [a,,, 2] with a„ = (log n)4/n.
496 THE HUNGARIAN CONSTRUCTIONS OF K,,, U. AND 9/„

Remark 2. Note that the modulus of continuity ri a of U. satisfies

Ew„(a)^E sup IU J„( t )—B„(t)+B„(t)-3„(s)+B„(s)—U,,(s)I


Ost -ssa

^2EIlU,,—B,ll +Ew6,,(a) =2EIw ,,—


I3,,II +Ewu(a)

(13) <_2cn-"2(log n) 2 +10 2a log (1 /a) by (10) and (14.1.20).

Thus, using Chebyshev's inequality,

(14) lim mP(cw„(a)^e)


li =0 for alle >0.

Note that this is an alternate way to establish that U. satisfies (2.3.4) or (2.3.8).

2. THE HUNGARIAN RENEWAL CONSTRUCTION OF Ö/„

Let X,, X2 ,... be independent and suppose

(1) the moment generating function of the X is finite in some neighborhood


;

of the origin.

For each n > 1 we define

(2) en:i= (X, +...+X.)/(X,+... +X ,) for1<_i<_n.

If all X; = Exponential (0), then recall from Proposition 8.2.1 that 0'
• • • n:n - I are distributed as Uniform (0, 1) order statistics. In any case let
U. and V. denote their empirical and quantile processes, and let N. denote
the smoothed quantile process.
Let us define the Brillinger process

(3) I (s, t) _ [ §(st) — tS(s)] /f for s ? 0 and Ostt <_ 1.

From Exercise 2.2.13 we know that

B(s, •) = U for each fixed s > 0.


We single out for special attention the Brownian bridge

(4) I$„=I$(n +1, •) =U foreach n?1.

Exercise 1. Show that B(s, t) and K(s, t) /Ii have different joint distributions
even though B(s, • ) = U - K(s, • )/f for each fixed s>0.
THE HUNGARIAN RENEWAL CONSTRUCTION OF ®/ „ 497
Theorem 1. (The Hungarian renewal construction) Suppose (1) holds, then
there exists a version of iid X1 , X2 ,... and B as above for which

(5) P( JJV„ I^ II > ((c, log n)+x)/ ✓n) <_ c2 exp (—c 3 x)

for all x and all n > 1; here 0< c 3 , c2 , c 3 < 00. We may replace V. by N„ in (5).

Note that this is an improvement over Theorem 12.1.1 in that the extra log n
factor has been removed, as in Theorem 12.1.3. In fact, the best possible rate
has been achieved. However, it is deficient in that the joint distribution of
these N. and V. are different from what they would be if only a single sequence
of iid Uniform (0, 1) rv's were involved. Thus the best rate has been achieved,
but at a price. However, this is the appropriate process to consider when
testing whether or not the interarrival times of a renewal process are
exponential.

Exercise 2. (Csörgö and Revesz, 1978a) Use (2.7.7) to prove Theorem 1.

We will prove here [based on (2.7.6)] the following corollary to Theorem


1. Brillinger's (1969) name is attached since the main idea of using Breiman's
(1968) representation to obtain a rate is due to him. Since the Hungarian
construction was not available to him then, he obtained a poorer rate.

Theorem 2. (Brillinger) Suppose (1) holds. Then there exists a version of


iid X,, X 2 ,... and B„ as above for which
log n
(6) li I^N n — B„ II < some M < oo a.s.

Note that V. may replace c'/„ in (6).


Proof. Let X 1 ,.. . , X,... and S denote the random elements of the
Hungarian construction of (2.7.6). Let 5„ denote the smoothed partial sum
process of X 1 , ... , X,,. Then

II& -S(nI)l ✓n II ^ II§„ -§„lI+1I§„ -S(nI)/ ✓n II


= O((log n)/f)+[ max X ]/V ; by (2.7.6)
(a) = O((log n)/'/i)
since
P([ max X ]/v> r(log n)/.fin) < P(X >_ r log n)
; ;

= nP(exp (tX > exp (tr log n))


<_ nn "E e"`
- for some positive t "near" the origin
n -2 for r sufficiently large
(b) _ (the nth term of a convergent infinite series)
498 THE HUNGARIAN CONSTRUCTIONS OF K,,, U, AND V„

yields (a) via the Borel-Cantelli lemma. Now define c/ n in terms of these
via (8.3.1), and let 4" denote a random permutation of
;

(c) fin;;= n - ' / 2 Ö/ n (i/(n+1))+i /(n +1) for 1 <i<_ n.

We also define B ", as in (4) with (2) in mind, by

_ S((n +1)I) §(n + 1) Brownian bridge.


(d) t3 n + 1 — t n + 1

We then have from the triangle inequality that

n+1
Ilc n —8 n (Ij C q%n +l If
yn+l
_ S((n +1)I)
n- + 1 1
+ )F n
.Ln - 1 11211 §((n+1)I )I
= [1 +o(1)][O((log n) /',[ti)] +O(b"/Vi)O(b" + ,)
a.s.

[by SLLN of (4)] [by (a)]

+[by LIL of (4)] [by LIL of (2.8.6)]

(e) = O((log n) /v) a.s. with the constant independent of rv.

This is a simpler proof, but along the rough lines of Csörgö and Revesz (1978a).
The treatment of V. is even simpler [we really do not need (b)]. ❑

Exercise 3. Show that

(7) E IIV, — I3 , 11 c(log n) /'


for some 0 < c < a . We may replace V. by Ö1" in (7).

Exercise 4. Use Exercise 2.9.1 and (2) to show that

(8) o/ n /b""»Ye a.s. wrt 11 11 on D

for the class of functions Ye in (2.9.8). We may replace V. by Ö/" in (8). [Note
that (8) is an assertion concerning the V. process constructed via (2), and not
about the V. resulting from e,,
62i ... iid Uniform (0, 1); other methods are
needed to make a similar assertion for the latter process.]
A REFINED CONSTRUCTION OF U. AND %/„ 499

Exercise 5. Show that the best rate that could be achieved in (6) with the
Skorokhod embedding of (2.5.25) is (log n) 1j2 (1og 2 n)" 4 /n" 4 . [This is Brillin-
ger's, 1969 rate. The rate in the theorem is from Csörgö and Revesz, 1978a.]

3. A REFINED CONSTRUCTION OF U,, AND V.

The constructions of K,, U,, N", and V,, in Sections 1 and 2 work well for
proving results requiring only convergence of the processes with respect to
the supremum metric II - (I, but do not say anything if convergence with respect
to a stronger weighted 11- /q 11 metric is needed. A refined construction of V.
and v", which emphasizes the behavior of the processes near 0 and 1 and
gives a weak (in-probability) approximation rather than the strong (almost
sure) approximations of the preceding sections, has been given by Csörgö, et
al. (1984a).
This remarkable construction proceeds by first improving the inequality of
Theorem 12.2.1 near 0 and 1. The basic idea is that (12.2.5) involves approxima-
tion errors of partial sums of exponential rv's that are accumulated over the
whole interval [0, 1] which results in the c, log n term inside the probability;
this order of approximation is possible when all n of the partial sums are
required to be close to a Brownian motion. If attention is focused on a
neighborhood of zero, however, for example, a neighborhood corresponding
to a fixed number of summands, then many fewer terms will have contributed
to the sum, and the corresponding approximation error will actually be much
smaller. In order to obtain the same order of approximation near 1, it is
necessary to begin a separate partial-sum approximation at 1 and "run it
backwards" over the interval. Of course, it then becomes necessary to "patch
the approximations together" at t = Z . This program can be carried out, and
the result is the following refinement of Theorem 12.2.1.

Theorem 1. (Csörgö, Csörgö, Horvath, and Mason) There exists a sequence


of independent Uniform (0, 1) rv's , 2,•• and a sequence of Brownian-
bridge processes B" on a common probability space such that

(1) P(11V"—B"11ö^">_(c, log d +x)/Vi) ^ c 2 exp (—c3 x)

and

(2) P(IIV"—B"II1-d/n? (Cl log d+x)/.in)<_c2 exp(—c 3 x)


for all n 0 <_ d <_ n, 0<_x <_ d" 2 , and n >_ 1. Here 0< c 1 , c2 , C3 , n o < oo are positive
constants.

Note that Theorem I reduces to Theorem 12.2.1 when d = n. If d is fixed,


however, then Theorem 12.2.1 has been improved near 0 and I by a factor of
500 THE HUNGARIAN CONSTRUCTIONS OF K,,, U. AND V,

log n. Theorem 1 can be used to control the error of approximation when the
difference between V. and B. is divided by a function which is small near 0
and 1. The resulting weak approximation theorem follows.

Theorem 2. (Csörgö, Csörgö, Horvath, and Mason) On the probability space


of Theorem lit follows that (12.2.5) holds and

n
(3)
—^n
ei "

[I(1 —I)]^ /z ^ c/(n+n


11 -
,—C/ ("+ I) — °
O (1)

for every 0 < v <' and 0 < c < oo as n -* oo.

The most important special case is that of v = 0: thus for any constant c> 0
we have

(4) , [I(1
y—an
—I)]I/211 c /(n+l)
} 1 — c/(n+t)
—O°
(1)

As will be shown in Chapter 16, for b n ° 2 log t n we have both

l
(5) [j(l—I]'
c/(n+t)
/2

n
1 c/(n+i)
= O(b)

and
I— c/(n+n
B
n

(6)

[I(1-1)]'/2
_
1 cl(n+1)
= O (bn), °

so that they both go (slowly) to infinity (for any versions of the processes),
but nevertheless, the special construction of V,, and 3. satisfies (4).
When Theorem 2 is converted to a theorem for the corresponding version
of U,,, the following partial analogue of Theorem 2 results.

Theorem 3. (Csörgö, Csörgö, Horvath, and Mason) On the probability space


of Theorem 1 it follows that

A(log n) 2 (1og 2 n)1'/4 =2-1/4


a.s.
(7) 1- = Ilun+lBnll n J

and

(8) n' /° n = O 10 n ).
A REFINED CONSTRUCTION OF U. AND V. 501

Moreover, for every 0 ^ v < ä

(9) nV [1(1—I)]1/2-^ ^,:^ —O°(1)

Again, the most important special case of (a) is that of v = 0.


Theorems 2 and 3 have many corollaries and consequences, as shown by
Csörgö et al. (1984a). The following corollary, which extends the Chibisov-
O'Reilly theorem in a certain sense, is an important and illuminating con-
sequence of Theorems 2 and 3.
Let g(s) be any positive, real-valued function on (0, 1) which satisfies

(10) lim g(s) = lim g(s) = 00.


s+o s-1

Corollary 1. If g satisfies (10), then on the probability space of Theorem 1


it follows that

(11) ,/2 -PO asn-+coo

and
V
(12) l
3 JJ
1/2
n1(n+1)

g 1/(n+I)
"pp0 as

Exercise 1. Use Theorems 2 and 3 to prove Corollary 1.

Exercise 2. Use Theorem 3 to show that

a yn(t)
P sup <_ x) -- P( sup S(t) <_ x)

if 0< a„ < some fixed (3 < 1 for all n sufficiently large, and na" -* x where S
denotes a Brownian motion process. [If a„ = a E (0, 1) is fixed, then this
becomes (3.8.8).]

As we were in the galley proof stage, the following major theorem of Mason
and van Zwet (1985), that gives a different construction for U,, completely
paralleling results for V,,, was announced.

Theorem 4. (Mason and van Zwet) There exists a sequence of independent


Uniform (0, 1) rv's 6,, e2 ,... and a sequence of Brownian bridge processes
B. on a common probability space such that

(13) P(IIU„ —B„Ilö/"? (c 1 log d+x)//) s c 2 exp (—c3x)


502 THE HUNGARIAN CONSTRUCTIONS OF K,,, U„ AND V„

for all 1:d <n and all —oo<x<oo. (We may replace II IIö /n by II II I -d/n in
(13).) Thus

log n
(14) lIM IIUn—Boll <someM<ooa.s.

(paralleling (12.2.6)). Also

ur n- 13 nl 1/n,1-1/n]

(15)nYll [I(1 — I)]1/


2 = p(1)
v

for every 0<_ '< 1/2 (paralleling (3)). Finally,


n+Bn n/(n+1)
l./

(16) nV [I(1 —I)]' l2 ° 1/(n+n


°
—O (1)

for every 0 <— v < 1/4 (paralleling (9)).

4. RATE OF CONVERGENCE OF THE DISTRIBUTION OF


FUNCTIONALS

If two processes are close, then the distribution of a smooth functional of


them should be close also.

Theorem 1. (Komlös, Major, Tusnädy) Let 4r denote a functional satisfying


the Lipschitz condition

(1) I+'(f)—+1(g)I<MII.i-8II for some 0<M<oo,

and suppose the ry ii(U) has a bounded density. Then

loan )
(2) sup I P(cf(vn) - x) — P(^ i(U) <_ x)I - o (
1
We may replace U. in (2) by V. or 4/ n .

This is a best possible rate for a proof based on Theorems 12.1.3 or 12.3.1.
Unfortunately, the rate 1/f is typically possible.
Proof. Let D denote the bound on the density of ##(U.). Recall from Theorem
12.3.1 (use Theorem 12.1.3 for U,,) that

(a) P( IINn— B nII>n-112(cl log n+A))<c 2 exp(—c3 A) for all A


RATE OF CONVERGENCE OF THE DISTRIBUTION OF FUNCTIONALS 503

and absolute constants c,, c2 , c3 . Let

(b) C3 J f?

Then for any real number x

P(+b(V„) x) P(i6(I„)max+E„)+P(I ii(V„) — ^ E„)


^P((B„)max)+De„+P(IIV„-I„IIjE„/M)
(c) =P(+/i(UJ)<_x)+De„+1/n2 using (b) in (a)
(d) =P(O(U)sx)+O(e„).

The lower bound is analogous. '

Exercise 1. (S. Csörgö, 1976) For W f ö U(t) dt as in Chapter 3, use the


Hungarian construction to show that

sup I P( W„ < x) — P( WZ <_ x)I = O((log n)/ ✓n ),


X

where W 2 = Jö U 2 (t) dt. (Recall from Remark 5.3.4 that the rate 1/n is possible.)
CHAPTER 13

Laws of the Iterated Logarithm


Associated with U,, and V,,

0. INTRODUCTION

In Section 1 we establish Smirnov's LIL that lim IIU !I / b n = Z a.s. and then
strengthen it to Chung's characterization of upper-class sequences. We also
estimate the order of magnitude of the probability that 11 U,, 11 ever crosses an
1' barrier. These are based on maximal inequalities for IjU n /q11 contained in
Section 2, and our earlier exponential bounds.
Section 3 contains Finkelstein's theorem that U n /b n M► Y' a.s., and Cassels's
variation on this. The same results hold for V. also. We use these results, via
the Hungarian construction, to establish K(n, •)/(sb n ) ' a.s. Section 4
extends these results to (I /q (I metrics by giving James's characterization of
those functions q for which U,,/(qb n ) v* YC = {h/q: h E X} a.s. The necessary
Q

exponential bound in proving these results is the Shorack and Wellner


inequality (Inequality 11.2.1).
Section 5 contains Mogulskii's other LIL that lim b n IJU n 11 _ r /2 a.s.
It is trivial to extend these results to general F on R; we do so in
Section 6.

1. A LIL FOR UUUMjf

We begin by establishing a LIL for IIU, II.

Theorem 1. (Smirnov) Let b,, = 2 log e n. Then

(1) lim IIUn


n —= b
I) _ lim f ^ n(G/° - I)#jl = 1 a.s.
-

n n.= V nb n 2

Recall that II@/ II = !IU„ HH, so that N;; may replace U' in this result.
504
A LIL FOR IIU Il5
05

We immediately strengthen this theorem by determining more precisely the


rate at which the drift of IIU" II to takes place.

Theorem 2. (Chung). Let A. /'. Then

(2) P(IIUJ" II >_ A" i.o.) = j a according as E ^" exp (-2A) = j <^
l n=, n l o0

Recall that IIV " II = IIVJ" II, so that V. may replace U,, in this result.

See Theorem 3 below for an interesting corollary to our proof of the


convergence half of (2).

Remark 1. We note that we should be able to guess Smirnov's theorem. Now

P( IIU II^A)^P(IIU"Ii^A)^ 2 P(IIU Ill A),


since II1l n II = IjQJ ; Ij v II UJ" II with the two terms in the maximum having the same
distribution. In Example 3.8.1 we saw that

(a) P(Ilvnll ^A)-.>exp (-2A 2 ) as n -oo.

Reverting to the subsequence n k = (a") with a> 1 is a standard trick in LIL-type


proofs (see Proposition 2.8.1); replacing A by (1+e)6" k /2 in the right-hand
side of (a) leads to convergence if e > 0 and divergence if e <_ 0.S
The proof of these theorems requires a maximal inequality in conjunction
with the DKW exponential bound of Inequality 9.2.1. The maximal inequality
is placed in a separate Section 2 for greater visibility.
We define A = (A, , A 2 , ...) and

(3) P"(A)= P(IJIJ,,,II>_A,,, for some m>n);

thus P"(A) is the probability that a plot of 11U,„ II will ever intersect the shaded
region in Figure 1.

S •
> 2 •
I 0

s. rn
1 2 3 •••
n n+1 000

Figure 1.
506 LAWS OF THE ITERATED LOGARITHM

Theorem 3. (i) A n >' and 2,1 /log 2 n s 1 i.o., then

(4) P„ (A) =1 for all n.

(ii) Suppose 2A n /log 2 n / X. Then (log P„(A))/(2A) \‚ and

(5) logt (A)


Jim —1.

(iii) Mainly, if 2A /log e n / oo but An = O(n" 6 ), then

(6) log P(A)


as n - aD.
2Ä n

Recall from (9.2.3) that under the hypotheses of Theorem 3(iii)

(7) log P(IIUnJI ^'tn) ^ —1 as n -->oo;


2Ä n

thus (6) follows from (5) and (7).


Proof of Theorems 1 and 2. We will prove only theorem 1 and the convergence
half of Theorem 2. The divergence half of Theorem 2 will be omitted.
Since G„(2) is the average of n independent Bernoulli (2) rv's with variance
(T 4, the classical LIL implies

(a) lim II U II/ b„ ? lim v;; (z)/ b„ = Q = 2 a.s.

To complete the proof of Smirnov's theorem, we must establish

(b) a.s.
If we set An = (1 +E)b„/2 for any s> 0, then the series in Chung's theorem
converges; thus P( IIU„ II/ b„ > (1 + E)/2 i.o.) = 0 for any E > 0. Thus (b) follows
from the convergence half of Chung's theorem (to the proof of which we now
turn).
The proof given below of the convergence half of Theorem 2 follows Shorack
(1980); the theorem is due to Chung (1949).
We first note that we may assume without loss of generality that we have
the rather coarse bound

(c) A„ <_ log 2 n for all n.

To see this, suppose An / and

(d) Y_ (A n /n)exp(- 2,l„) <oo.


n=1
A LIL FOR IIU.II 507

Then Ä n = A n n log 2 n is / and using 1t n in (d) still yields a finite sum. Thus
would imply (RU,, 11 > a„ i.o.) = 0; and since Ä n ^ a n , it trivially follows that
P( DU,, I1 >A i.o.) = 0. Thus we may assume (c) in what follows. It is also true
that

(e) An > (log 2 n)/2 for all n ? some n o .

We now justify this. Since x exp (—x) is \ for all sufficiently large x, there
exists an n o such that n ? n o implies
A2
^"'exp(-2^tm)>_Anexp(-2,1n)ß
n o m na In

(f) >_ A n exp (-2a „)(log n —some constant);

and note that (f) converges to ca on any subsequence n' for which A,,<
(log 2 n')/2. Thus A n > (log 2 n)/2 for all but a finite number of n, which
yields (e). This paragraph is essentially contained in Robbins and Siegmund
(1972).
Let n k be a subsequence >' to ca. Let
Ak[ max IIUnII>An k ].
nk^n^nk+l

Now [ IIIJ n 11 > A n i.o.] c [A k i.o.] since A n 1'; thus Borel-Cantelli will yield
P(A k i.o.) =0 and complete the proof once we show that
co
(g) Y_ P(A k ) <m.
k=2

We now specify our choice for n k as (note the discussion in Section A.9.1,
which is not used here)

(h) nk = (exp (k/log k)) for k> 2

Note that for all real d


d
k k+l
1— nk _I—exp d( _ lØg(k+ 1)))
nk+1 lk
! log(1+1/k)
(i) = 1—exp d 1 —log +(k+ l)
k (log k)(log (k+ 1))) )
0) —d/logk ask-+ca.

Combining (e) and (h) we have

(k) lim 2A „ k /log k _> 1.


n—^

508 LAWS OF THE ITERATED LOGARITHM

We now apply the maximal Inequality 13.2.1 with q = 1 and

'Pi k 7
and a = nk+' _
I
(1) c= 1— 7
( 8 log k) n k ( 8 log k)
It is appropriate [see (13.2.2)] to apply this maximal inequality since (j) implies

(m) 1—c --15/(161ogk) and a -1^-1 /(8logk) ask- x,

so that for k sufficiently large Eqs. (k) and (m) yield


(n) 2(a —1) 2(1/8) <
1.
(1 —c) 2 .1 nk (15/16) 2 (1/2)

Thus for k sufficiently large

P(A k ) 2P( Ilvnk +, Il > (nk/nk+l )An k ) by Inequality 13.2.1


<— 116 exp (-2(n k /n k + ,) 2 ,l„ k ) by DKW Inequality 9.2.1
(o) 116 e 5 exp ( -2d„ k )

since (c) and 0) imply

(p) — 2 (nk/nk+ 1 ) 2 .1 „ k =- 2[1 — (1 —(nk/nk+l)2)]A2k-- 2.l„k+5.

Thus

(q) P(Ak) <_ 116 e 5 exp (—c nk ) where cn = 2.1 n is A.

Now
m nk+I
oo> I Y- (cn /n) exp (—cn ) by (d)
k=2 n =nk +l

°O C
/ 1^

Y- (nk +l — nk ) nk exp (—Cnk +l ) since C. I'


k=2 nk +l

(r) ? 1 exp ( — cnk +l) by (j) and (k);


k22

so that (q) and (r) yield (g). This completes the proof.
The term e 5 in (o) shows why the sequence n k of (h) was required. ❑
Proof of Theorem 3. According to Chung's Theorem 2 we have log P(A) _

(a) Y_
log I = 0 for all n unless

n=1
(cn /n) exp (—c) <m where c,, 2A;
A LIL FOR DW„II 509

and thus the present theorem is uninteresting unless (a) holds. Let n k and A k
be as defined in the proof of Chung's theorem, and recall from that proof that
(a) implies

(b) c„ > log 2 n for all n sufficiently large

[which proves part (i)] and that for each K > 1 we have

P„ K (A) P(Ak)
k=K

(c) s 10 6 exp (—c„ k ) by (q) of the proof of Chung's theorem.


k=K

We now set d„ = c„/log 2 n, so that d„ / co by our hypothesis. Also note that

cn k = d„ k logt nk
(d) >_ d„, log (k/log k) — d„ k allowing coarsely for ( )

Thus for any e >0 and for K sufficiently large

k - e lo k\ d ^k
P„ K (A)^106 k Y_ exp (d„k—d„k log(io
g k))
=106
k 1
10 6 k d nk^' - e ) < 10 6 ^^ x -a„ k (I- ` ) ax
k=K K_1

106 k (1-r)+I,
(K — 1 )-e„
d(1—e)-1

and hence

log P„ K (A)
C„ K
er log 106 —log [d„ K (1—)-1]—[d„ K (1—e)-1] log (K —1)
C„ K

logK
=o(1)— "
d K (1—s)logK+
c„ K c„K

log K
(e) ^ —1+2e+ +o(1) by (d)
C„ K
(f) =-1+2e+o(1) by hypotheses.
510 LAWS OF THE ITERATED LOGARITHM

Since cn 1' implies

(g) log I(A)/cn \,

we see from (f), since e > 0 is arbitrary, that

log P(A)
(h) lim
n -.co cn

This completes the proof of (ii). [Note that P P (A) becomes smaller if we
replace IlU.11 by either IIU II.] [Note that this proof fails going from (e) to
(f) if 2A /log 2 n 71+6 for any 0 < 3 < co. In this case we obtain, in place of
(h), that

(8) lim log P.(A) /(2,1„) <_ — 8/(1+8) when 2a. /log 2 n / 1+5.

Thus an exponential bound is still possible.] ❑

Remark 2. (Erdös, 1942) Let X,, X2 ,... be iid Bernoulli (1). Let S.
X 1 +•••+Xn —n/2. LetA n /'.Then

0 °D Ä n Z <00
(9) P(Sn > VA i.o.) = 1 according as exp (
n=1 n =-2A^.
n) _

Erdös bases his proof on the inequality

(10) c exp (- 2.1 2 ) <_ P(Sn > TijA) <_ d A exp (-2A 2 )

for some c, d > 0. He introduced the subsequence n k = (exp (k /log k)). Exercise
1 below asks the reader to make minor changes in our proof of Chung's
theorem (Theorem 2) [note the minor differences between (10) and the DKW
inequality (Inequality 9.2.1)] in order to establish the convergence half of (9).
The divergence half of (9) in Erdös (1942) is long and delicate, but the same
type of minor changes in it will produce a proof of the divergence half of (2).

Exercise 1. Prove the convergence half of (9).

Exercise 2. Prove the divergence half of (9).

2. A MAXIMAL INEQUALITY FOR IU/4I

We saw in the proof of Theorem 13.1.2 how delicate LIL-type results require
us to relate probabilities dealing with a whole block of subscripts to a probabil-

A MAXIMAL INEQUALITY FOR j^U^/qfl 511
ity that just involves the last subscript of the block. We now develop such
inequalities for rv's of the type U'n / q jj . They are due to James (1975) and
Shorack (1980).

Inequality 1. (James) Let q be a positive function on some (0, 0] such that


q(t)/f is \ on (0, 0] and satisfies q(t)/f-co as t-*0. Let A>0, 0<c<1,
and a> 1 be given. Then there exists 0< b,,, C, a <_ 0 such that for all integers
n' <– n" having n"/ n' <_ a and for all 0:5 a < b <_ b,A , c, a we have

(1) P(n"mkXn"'ii q iia i c2P (IN(f) ) Ha > T^

Proof. Let {r ; : i ? 1} be an ordering of the rationals in (a, b); then the


probability p on the lhs of (1) satisfies

p <_ P max sup >A ✓n for tSk = fQJ k .


kn" ; zt l q
\ \n ' (i 1 )
±Sk(r^) I /

For n' <– k<_ n" and i? I we define

A ;k

— max max q(r)
[ n'smsk <i
±S'(r))
<_a^ and±S (r)
q r >a^ J ^ ^
1

and
B;k =[t(SS ..(r; )–Sk (ri ))<(1–c)q(r 1 )A n'] where 0<c<1.

i • •
i -1

i
—►
n k n"

Figure 1.

Now for fixed k, the A ik are disjoint in i; also A i * k * and B ;k are independent
provided k* <– k. Thus by Inequality A.7.1

±U n .(1) ca \ / ±Sn„(r;) \ n„
P sup >—)?P( sup >c n' I by ; <_a
<<<b q(t) // . i^a q(r,) / n

P ULAikBik) ^[inf P(Bik)]P(UZAik)


k i ik k i

>_ [inf P(B;k)]P,


512 LAWS OF THE ITERATED LOGARITHM

where for all i, k (provided b,, C, a was chosen small enough) we have

(a) I'(Bk)^ Var[Sn„(rr)—Sk(r,)) (n „— n' )r,(l r,) —

(1 — c) 2 A 2 n'g 2 (r,) (1 — c) 2 A 2 n'g 2 (r;)


a -1 su t(1 — t) ^1 ❑
(1 —c) 2 ,1 2 a<<pn q 2 (t) 2'

We will now inquire into how small b,,, C,. must be. By (a), it suffices to have

(2) S(6A.^.a)< 2( —1) where g(t)=


(

q(t )

In case q(t) = 1 for all t, it will suffice if

(3) b,,c,a <— (1— c) z ,A 2 /(2(a —1)) in case q = 1.

The present inequality is a strengthened version of the inequality appearing


in James (1975); the original did not keep cA/.f arbitrarily close to A, and
thus did not have the possibility of a LIL. (Evidently, James's thesis did.) Nor
was it noted that the following very useful corollary is possible; see Shorack
(1980). We will need this extra strength in Chapter 16.

Inequality 2. (Shorack) Let a> 1 be given, and let n' s n” be any pair of
integers satisfying n "/ n' <— a. Let 0: a <_ b <— 1 be arbitrary. Then for any 0< c <
I and all A ? A^,^ ° 2(a — 1)/(1 — c) we have
ji11 b
ax a>A)^2P( "
(4) P \nmk n "II I(I k I II I(1 n— I)IIa > J

Proof. The only change needed in the previous proof is in Eq. (a); this now
becomes

a -1 t(1—t) a -1
P(Bik) C sup b
(1 —c) 2 A 2 a
t(1 i) (1 —c) 2 A 2
(a) for A1>!,k, . ❑

3. RELATIVE COMPACTNESS M► OF U. AND V.

The main contents of this section are functional LIL's for U. and V „. Properties
of the functions in the limit set J' are put forth in Proposition 2.9.1. The
theorem is one of the central results presented; it is a natural analog of Strassen
(1964).
RELATIVE COMPACTNESS OF U„ AND V n 513

The empirical process U,, has trajectories in the metric space (D, II II). Let
^ h: h is absolutely continuous on [0, 1] with
(1)
h(0)=h(1)=0and j [h'(t)] 2 dt- 1
0

Theorem 1. (Finkelstein) Let b„ = 2 log e n. Then

(2) aJ„/b„M► YC a.s. wrt II II on D.


Also
(3) N„/ b„ M► Ye a.s. wrt II II on D.

Theorem 2. (Cassels) Let b„ = ./Jg2 n. Let e > 0. For a.e. w there exists
an NE ,,, such that for all n >_ N, we have

(4) IU, (a'b]I < (b—a)(1—(b—a))+e for all 0 a<b<1.

Thus, for a.e. w we have


b] —
(5) lim t b Ab—a)(1—(b—a))
— fora110a<b<_1.
<—
n-.

The same results hold with U,, replaced by N„.

Recall from (2.9.9) that

(6) h(b)—h(a) (b aa)(1—(b—a))


and all hEX
forall0<_a<b< land

Note that y = f t(1 — t) for all 0 t <_ 1 describes a circle of radius Z centered
at (‚0).

lim
n-.eo
f
Exercise 1. (Finkelstein, 1971)
ol
[yn(t)] 2 dt = 1
bn —
it
a.s.

[Hint: This result is easy once the calculus of variations is used to show that
sup {föh 2 (t) dt: hE }=1/ßr 2 .]
Proof of Theorem 1. Cassels's (1951) theorem was first established using very
delicate approximations to binomial-type expressions. Finkelstein's (1971)
theorem was inspired by Strassen (1964). The results for V. are trivial in light
of the results for U,,.

514 LAWS OF THE ITERATED LOGARITHM

We will prove this theorem by verifying (2.8.19) and (2.8.20) of the —


criterion theorem (Theorem 2.8.4) where for each m > 1 we let Tm
{0, 1/m, 2/m, ... , (m —1)/m, 1 }.

Consider first (2.8.20). Let a k = (a k ,, ... , a km )' where

(a) aki- 1((i - 1)/m,, /m](Sk) ;

and let S„= (5,,,,...,S)'= a,+"'+a, so that

(b) Ibn =II m


^b,-,)(m^ — ^ T lb„/l ' m l )'

where II ,,(U „ /b„) is the 11 T„ projection of v„/b „. Since a,, a 2 ,... are iid
(0, T) with o; ; = (1/m)(1-1/m) and Q = —1/m 2 for i #j, the multivariate LIL
;;

of Exercise 2.8.2 implies that

(c) S„ /'/ibnM► Im a.s. wrt I on Rm,

where Jm {(t, , ... , t m ): Y_, t.=0 and m ” , t . — 1 }. Let R,, +1 denote all
x = (xo , x, , ... , xm+ ,)' in Rm +l having x0 =0, and let lIi : R o„, + , -* R m by
(d) W(X)=(X,—Xo,X2—X,,...,Xm—Xm-,).

We may rephrase (c) as


(e) Ir(II m (U n /bn ))M► Jm a.s. wrt I Ion R m .

But +/i is trivially a one-to-one bicontinuous mapping; thus

(f) nr(Un /b,,)-4 (Jm) a.s. wrt I Ion Rm+,

by a trivial special case of Wichura's M► mapping theorem (Theorem 2.8.2).


We will have thus established (2.8.20) as soon as we show that

(g) -'(Jm) = 1 Tm (')

for ' as in (1).


Let hE YE. Then Y_;° [h(i /m) —h((i- 1)/m)]= h(l)— h(0)=0. Also
mrt /m 12
rY_[h(i /m) —h((i- 1)/m)] 2 =m i h'(t)dt
> > c+-n/m

f
m 1 i/m

<_ m — [h'(t)]Z dt by Cauchy-Schwarz


I m o-l)/M

= J 0
[h'(t)] 2 dt s 1.
RELATIVE COMPACTNESS -» OF U. AND V, 515
Thus 1(i(II m (h))EJ, and II m ( )c1/i '(Jm ). Now let t=(t i ,...,tm )'EJm .
-

Then x = (xo , x 1 , ... , ç,)= i/r - ' (t) is defined by letting xo = 0 and xi = E; t; for
1 s i < m. Now define h by letting h(i/ m) = x i for O<— i<— m, with h continuous
and linear on each [(i— 1)/ m, i/ m]. Absolute continuity of this h follows from
its piecewise linearity. Clearly, h(0) = h(1) = 0. Also Jö [h'(t)] 2 dt =
m '(mt 1 ) 2 --1 by definition of im. Thus h E Ye; h has II ,,(h) = rG - '(t) for
t E Jm above; and Jm) c 11 TT (W). Thus (g) holds.
Now let us turn to (2.8.19). Note that the II m approximation (U,, /b„) of
U,,/ b„ satisfies

(h) Il b”—\bn/Il^^mam0,„
ma II O
Un((ib
nl)/m)Il^i-1>/m

Let P. denote the number of times 0 <— e k <— I/ m for 1 <_ k <_ n. Now let e; , 2 , .. .
be independent Uniform (0, 1) rv's that are independent of all other random
quantities introduced so far, including v„. Let U, denote the empirical process
based on f*, ... , f*, . Now note that 0, ... , A m „ are identically distributed
(and dependent) and the infinite sequence of each D i„ (with 1 <_ <_ m fixed) i
satisfies (= here means jointly in all values of n = 1, 2,...)
1 n 1/m

1Xinln — 11 1[O,,](ek)—nt lb^


V n i =1 0
1/m

V ►! L
1 ( 1 k) —nt ll / bn
[o r1
M o

1/m
r *\ _
= 1 ^ 1 [O,mt] (Sk/ n mt
= 1 m ll0 / bn

% rt i l 1 [O, s]\Sk )— m S
s

ll '
l bn

= v” 1 1 [O,S^( k)
^ Yn S -i- n m fs n
n V lin iL' [ ( Y n^ S n 11/b

• n II U* + \ Y" m/ V Yn II/bn

.< V n IIUv„II/bn+IYn—mI/ (V ^ 16n)

(•)
t
Y„ b,,, III*,II IYn—n/mI
=
n b„ b, + Vb„

Considering separately the terms in (i), we note that since lim sup k IIUk II/bk = i
516 LAWS OF THE ITERATED LOGARITHM

a.s. by Smirnov's theorem (Theorem 13.1.1), and since v„/n -_. 1/m by the
SLLN, we thus have an a.s. lim sup of 1/ 2m for the first term in (i). Applying
the classic LIL to independent Bernoulli (1/m) trials show that the second
term in (i) has an a.s. limsup of (1/m)(1-1/m). Thus (2.8.19) holds with
a. =1/ 2m +'./(1 /m)(1-1/m). ❑
Proof of Theorem 2. Now U/ b„ M► a.s. wrt I) II on D. Thus by Proposition
2.9.1 we have for n >— some N , that
E m

(a) IIOJ„/ b„ — h„ II < E/2 for some h„ = h,,, ,^, .


E

Since h„ E Ye we have by (6) that

(b) h„(b) — h „(a)<_ (b — a)(1 — (b —a)) forall0<—a^b1.

Applying (a) and (b) yields

IU.,(b)—U,(a)l Un( b) —

h(b)I +I U b a)— h(a)I +Ih(b) — h(a)I

<e /2+e/2 +J(b— a)(1—(b —a))

for all O< a < b <— 1 provided n > N . This establishes (4).
EW

That the lim sups of (5) are no bigger than claimed follows from (4). That
they are at least as big as claimed follows from applying the ordinary LIL for
each of the countable set of pairs a, b having both members rational; recall
that U, is right continuous.
The proof for N. is completely analogous. However, a bit of epsilonics is
needed for the second result. The key elements are the corresponding result
for U. and the identity (3.1.12). We leave it as an exercise. ❑

Exercise 2. Complete the proof of Theorem 2 for V,,.

Proposition 1. If {aö(s, t): s >_0, 0 <_ t <_ 1} denotes the Kiefer process and
b„ = J2 log e n, then

^ö(n, • )
(7) V—b n a.s. wrt II lion D.

Proof. Just combine Finkelstein's theorem with the Hungarian construction


of (12.1.7). ❑

Exercise 3. Use (7) and the Hungarian construction of Theorem 12.1.1 to


give an alternative proof of Finkelstein's theorem. (Given all this power, it is
now trivial.)
RELATIVE COMPACTNESS OF U, IN II /gII-METRICS 517

4. RELATIVE COMPACTNESS OF U. IN 11 /qIt-METRICS

Let X denote the collection of functions defined in (13.3.1) that appear in


Finkelstein's theorem. Let b" = 2 log e n as usual. We would expect that
v"/( q bn ) ' q = {h/q: h E C} a.s. wrt I) II on D provided things do not blow
up near 0. We seek necessary and sufficient conditions on q for this.
We suppose that q is a continuous, nonnegative function on [0, 1] that is
symmetric about t = Z, and that

(1) q is 7 and q(t)/f is " on [0,J.

Theorem 1. (James) Let q satisfy (1) ; that is, let q E Q.

(i) If

(2) f 1
. g 2 (t)log2[ 1 /t( 1— t)]
,
dt<oo,

then

QJ"
(3) qb" M► q = {h/ q: h € C} a.s. wrt I) II on D.

(ii) Conversely, if

(4)
f z
, q (t)log 2 [I/t(I—t)]
"
1 dz = oo for some 0>0,

then µ► fails, since for all a > 0 we have

(5) lim II
+ 1 >_

^"
n —w q b" o
al" JIM 1 (
n-- q(^n:,)bn
'') _ oo a.s.

while

(6) h(^n:,)/q(n,,)-^0 a.s. asn-+ooforallhEX

[A look at the proof shows that we can decrease a/n in (5).]

Suppose now that we define g in some neighborhood of zero by

q2(t) logt (1/ 1 ) = . 0 2 (t) logt ( 1 /t) if çb(t)


(7) g2(z) =
t log(1/t) log(1/t) f
518 LAWS OF THE ITERATED LOGARITHM

Proposition A.9.2 shows that (2) implies

(8) g(t) -*x as t1.0.

Our proof of James's theorem (Theorem 1) will show that (8) is sufficient to
imply that for all e > 0 there exists 0< a < z for which
E

(9) lim IIU„/(qb„)IIQ„<e


fora„=En^2(I/n)b„;

the assumption q(t) /f \ is not needed for this.


The behavior of U and v„ is quite different; we note that in the proof
below of James's theorem, it is U that blows up when (2) fails. For 4J„ we
have the following stronger result. See Shorack (1980).

Theorem 2. Let q satisfy (1). Then

(10)
v„
qb„ kq = {h - /q: h E J'} a.s. wrt II II on D

if and only if

(11) 4,(t)= q(t) /f / co as t \ 0.

The proof of these results rests on Finkelstein's theorem (Theorem 13.3.1),


the maximal inequalities of Section 13.2 and the Shorack and Wellner inequality
(Inequality 11.2.1). The proof of Theorem I is a greatly improved version of
James's proof, the improvement resting on inequality 11.2.1.
Proof of (i) in theorem 1. For ¢ as defined in (7), the elementary exercise
A.9.1 shows that for any M> 0

(a) a„ <oo where a„ = M /[n4, 2 (1/n)b„].

Thus the Borel-Cantelli lemma implies that a.s. we have 6„ > a„ for all n ^ some
N , ; and since a„ \ 0, this implies that a.s. f„ : , > a„ for all n >_ some N., [or,
a

see Kiefer's theorem (Theorem 10.1.1)]. Thus a.s. we have for n>_ N', that

II U„ II'' nt na„
qb„ o c os, p 0 (t)b„ c 4,(a„)b„ 4,(1 /n)4,(a„)bn
(b) -*0 as n -*co.

Our proof of (b) is the only place where we require the full strength of (2).
In the remainder of this proof we will assume only (8); in particular, the
requirement q(i)//i\ will not be required any further.

RELATIVE COMPACTNESS OF U„ IN II /glj - METRICS 519

Now that we have dispensed with the integral condition and just have
g(t)-*co, we want to show that we can replace g by a smaller function g
satisfying g( t) --> oo, but that -oo in a nonerratic fashion. (Replacing g by a
smaller smooth g still satisfying the integral condition would be much harder.)
Suppose g(t) -* oo as t }. 0. Define §<g by

(c) g(t)= min (inf {g(s):0<s<t}, log ((1/t)) which /0oas tj 0.

Now let f(t)=g(1/t) so f(t) /co as tToo. Let


(d) ä„ = M/[nq 2 (1/n)b n,] where $(t) = log (1/t)/log e (1/t)g(t)

for some fixed )arge M. Then 1/ ä„ <_ n log e n. It is thus possible to replace f
by a function f <_ f that / to co so slowly that g(t) f (1/ t), which 7 co as
‚1, 0, satisfies

g ) as n -> oc where än = M
n z (1/n)b„'
(e)
^(t) = 8(t) log (1/t)
loge (1/t)

We now define

(f) q(t)=fg(t) log (1/t)/log2 (1/t).

Since q ^ q, it suffices to prove our theorem for q. To simplify our notation,


we now re-label q, 0, g, S. as q, ¢, g, a„ for our proof below. Note that our
newly redefined functions satisfy

(g) 0(t)=g(t) log(1/t)/log 2 (1/t)=q(t)/, T oo as

g(t)7 ast1,0 with g 2 (t):log2 (1/t),and


(i) g(1/n)/g(a„) - las n - oo, where
(j) a„ = M/[n¢Y(1/n)b 2,,] with 4 2 (1/n) <_ log n

for some large M to be specified below.


In our treatment below, we will consider each of the intervals

n [ log n (log n) 2 1
(k) [0 , a ] Ia I1 1! ,2L 1,
. ,

n nn n n
[ (lon) 2 1 ^
>a J >[a>z]
n
520 LAWS OF THE ITERATED LOGARITHM

(for some small a to be specified below) separately. Of course, the interval


[Z, 1] is symmetric to [0, 2].
Consider a general interval

(1) [cn, dn ] where cn \ and do \.

Let

(m) nk ((1 +e)k)


where 0 = 0. will be specified small enough below,
and let

(n) A.n=[jjU,/(Rbn)^^am's].

To establish that

(0) lim ^^QJn/(9bn)II,'„ -E,


n -. x

it will suffice to show that


co
(p) E P(Dk) <°°
k1 '
where

(q) nV A m C Dk =
m =nk
[
max
nkk.^ I q I Cnk +l'

By James's maximal inequality 13.2.1 we can choose a = aB so small that

(r) P(Dk) ^ 2P( IIUnk +1 /gjj,k+,' (1— 0) 2 eb nk )

provided we can choose k so large that the 0 of (g) satisfies


_ z
2 b2n k and dnk ^a e .
(s) 1/0(dn6)`(I
0) 6

[We are verifying (13.2.3) in (t) with c= 0 and a =1 +20.] Since 0 is \‚


condition (s) trivially holds provided the a in (k) is chosen small enough.
Thus, using (g), the Shorack and Wellner inequality (Inequality 11.2.1) with
S = 0, applied to (r) gives

(t) P(Dk)- e f ^k I exp ( —(1— 8) e (Iog 6 2 2 nk) p 2 (t)yk) dt,


k+
RELATIVE COMPACTNESS OF U„ IN II /9II-METRICS 521

where we can use (recall that it is N)

i(cflk*l) \)
^
7^ _ ( eb,, (c, k+l ) )
(12)/k—(ebn
+1 Cllk+l — nk+lCnk+, nkCnk+l /.

Thus
I CI-B)6FZY p^ Z Id^ k )
6 fc.
P(Dk) — du
c I (lo gl n k
(13) ^ M6[log (1/c»k+,)]/(log nk)(l-e^6E=,k02(d„k)

We now apply this to intervals in (k).


We first consider the interval [a,,, 1/n]. Setting

(u) c„ = a„ and d„ = 1/n in(1)and(13)

for a„ as in (j) we get

(v) P(Dk) ^ M(log nk)/(log nk)(l-e)°EZ,y+46Ztlink).

Now (p) will follow provided we show that

(w) yk cp 2 (1/ n k ) > (any large M we choose) for all k ? some k91 ..

Now using 1/i \ we have

6n 4(an +,)
(X) ek=yk^2(1/nk)=ßz(1/nk)^ by
nk+lank,

= 0 2 ( 1 /nk) 4G(bn,4^(a" k+I )O( 1 /nk+l)b„ k+S IR) by 6)

(y) = 4 2 ( 1 ! nk)tJI(fk/M),

where

fk = b nkT (a nk+l )bnk+I T (1 / n k+l )


bflko(a „ k+I) b,,k+,45(1/nk+l)
(log nk+l) by (J)
log (1/a„ k+1 ) log nk+,

(z) g(a.,k+1)g(1/nk+,)(log nk+l) by (2) and (j)


(aa) -goo ask-cc by(h)

with

(bb) 10 9fk — log2 nk+1 by (h)•



522 LAWS OF THE ITERATED LOGARITHM

Since ip(A) -r 2(log A )/ A as A -* o , the ek of (y) satisfies [using (aa)]


ek
M<6 2 (l/n k ) 2 1og 2 nk +1 2Mg2(1/nk)
by (2)
g(an k + 1 )g( 1 /nk+1)(log nk+l)g(an
~ k+ ,)g(l/nk +l)

2 111 by (i)

(cc) > M for all k >_ some k0,,a

We now specify M = ME to be so large that when k >_ k,,,M = ke, £ , the exponent
in (v) exceeds 3. Thus
(dd) P(Dk ) <_ (some MB)/k 2 for k?
and thus (p) holds. Hence, from (o), since £ >0 is arbitrary
(ee) lim ^lQ. 1 "/(9 6")IIä^" =0.

We next consider the interval [1/n, (log n)/n]. In this case (13) becomes
e 7 +Q'2((1o8
(ff) P(Dk) s M(log n k )/(log nk )(I_o) "k+l) / "k+1 >>

where [using ii(A) —(2 log A)/A as A-+oo]

ek yk p 2 ((log nk +l) /nk +l) = 0(b 4( 1 /nk+l)) 2 ((log nk+I )/nk +l)
logt (1 /nk +1) 2 ( ( log
nk+1)/nk+} using (g) and (h)
)

Vnkc ( 1 / nk +l )

(gg)

Thus (p) holds, and we conclude


(hh) li m JIU"/(9b")1jiIn " )/ " =0 a.s.

We next consider the interval [(log n)/n, (log n) 2 /n]. In this case (13)
becomes
k'(a)
(I-9)6£=Ym
VJ) P(Dk)^ Mallog nk)/(lognk)

where
z
(log n k +l)
(kk) ek = Yk 02 (
nk+, )

b"kO((log nk+I)/nk+I) )2( (lonk+I)2


g ).
= +fr/
(log nk+I) ` nk+I

p( 0 )0 2 ((log nk +1) 2 /nk +l)


(Il)
RELATIVE COMPACTNESS OF U, IN U /9II-METRICS 523

Thus (p) holds, and we conclude

(mm) li m =0 a.s.

We next consider the interval [(log n) 2 /n, a] where a is specified sufficiently


small below. In this case (13) becomes

(nn) P(Dk) <_ M(log nk)/(log nk)^1-e)bEZYkmZco)


where

yk = f'f
+ (
bn
k
4((log nk+1 )2/nk+l)
(log nk+l)
l/ W(fk)

(oo) — 4r(0) = 1
since

(pp) fk - 0 ask - oo by (g).

If we thus choose a = a so small in (nn) that the exponent exceeds 3, then


E

(p) holds; and hence

(qq) li m IIUn/(gb n)II (log n) 2 /n ^ E a.s.

We summarize what we have accomplished so far. Thus for any e > 0, there
exists 0< a < 2 such that (9) holds. Also
E

(rr) lim IIvn/(R 6 n)IIö`<e.


n-.0

The rest is trivial. First, note that

sup IIh/9IIö` ^ III/9IIö by (2.9.9)


hEW

= II 1 /T II O E = 1 /( a )
(ss) <_ a if a is chosen sufficiently small.
e E

Combining (rr) and (ss) give

(tt) li a.s.
s up II \ bn — h / l q ll o e - 2s
Coupling (tt) with

,
(uu) II \bn —h // 9 lla/ ZC q(a) II 6n —hllo 2
524 LAWS OF THE ITERATED LOGARITHM

Finkelstein's theorem (Theorem 13.3.1) shows that for any h E Ye, any w in the
probability space, and any subsequence n' we have

(vv) IIU n ./bn .— h) /q^J -+0 if and only if IIU, , /bn .—hII -+0.

Thus (3) holds. ❑


Proof of (ii) in Theorem 1. Suppose (4) holds. Then we may suppose that

(a) [g2(t)log2(1 /t)] -' dt=oo for some 0<0<1 /e.


Jo
We let K >0 and define
t 1 _
(b) 4'(t) _ a" = n logt n ' d Kn02(a .) logt n .

We will show that if 0<0< 1, f> 0, and f7 on (0, 0], then

[tf(t)] ' dt=oo implies Y


= 1 nf(1 /(n1log
z n)) = 0°.
-

(c) foo n

Let n 0 = inf {n: a n <— 0 }. Then


1
n — an +l) n+l
nn e a^ + \ ) dt nY_ (a
tJ{/ t C a flan +l)
— (n +1)log2(n +l) —nlog2n 1
n=na n log 2 n f(an +1)

Constant 1
n =0 n+1 f(an +l)

Thus (c) holds.


Setting f(t) = (q z (t) log e (1/t))/t = 4 2 ( t ) log t (1/t), (a) and (c) yield
°° 1 _
—cb
(d) n l n^ z (an)logt(1/an)

But log e (1/a)— log 2 n, so that E ° do = Co. Then P(f',, . 1 <d i.o.) ? P(fn <d
i.o.), and n P(e < d
o i.o.) =1 by Borel-Cantelli. Also, Y_," a n <Co implies by

Borel-Cantelli that a.s. n > a n is true for all n >_ some N ; and since a n \ 0,
this implies that a.s. fn , I > a n for all n > some N,. We have thus shown that
[or see Kiefer's theorem (Theorem 10.1.1)]

(e) P(an < fin: , <d i.o.) = 1.


RELATIVE COMPACTNESS OF U" IN II /gII - METRICS 525

Now on the a.s. event of (e) we have for n >_ some N' that i.o.

Ilvn/(gbn)IIc"^U (6n:1)/[q( n: )bnJ


(1/n— „ ,)(1—nd„)/'/i
: 1—nd„
bn'J,4(an) bn ndn4(an)
_ (1 — nd„) K/2 by definition of d„
>fK/2

(note that nd„ - 0 as n - oo). Since K >0 is arbitrary, we have thus shown

(f) Tim u” ? lim u " (en:1) = CO a.s.


II q b" o n-" q(en:1)bn

Thus (5) is established. In fact, a/n in (5) may be replaced by d„.


That Ih(„,,)I/q(f n:1 ) <-1/4 (e„ :1 )-0 a.s. follows trivially from (2.9.9) and
0 a.s. ❑
Proof of Theorem 2. Note that for all w we have

IlUn/(gbn)II0" / " _ II I/(gbn)I^ö" II 1 /4llö" / " = 1/çb(bn/n)


(a) -0 asn-ooby(11).

The analog of (13) when U. replaces U. is [see (11.1.26)]

(14) P(D) <— M log (1/c„k.,)/(log nk )(I-0)6£202(4"k).

We now apply this with c„ and d„ defined by

(b) c„=b„/n and d=a —a e

for some a small enough to be specified below. Thus

(c) P(Dk) ^ M(log nk)/(log nk)" 1)

= (a summable series)

provided a is chosen small enough for the exponent in (c) to exceed 2. Thus,
as in the proof of Theorem 1, we have

(d) lim ^Iv„/(qb„)^^ n^ /n E a.s.


n- W

Also, uniformly for h E we have from (2.9.9) that

(e) (Ih /gllö<IIh /v7IIö/0(a)<1/4(a)<_8


526 LAWS OF THE ITERATED LOGARITHM

provided a = a E was chosen small enough. Combining (a), (d), and (e) gives

^%n
\ a

(f) li
m 1 6 — h I/qIl <_e a.s.
(

- E uniformly for h E
\ n / 0

The final trivial details are as in the proof of (i) of theorem 1.


Suppose (10) fails. It will be shown in Theorem 16.2.1 that

(g) lim Un a.s.


n-.m 11— I)b n ll 0
But

sup( -1.
(h) 1— I)ll

Thus we cannot have Un/ I(1 —I)b n ^»J ,(,_,) a.s. wrt1) on D; but if
lim,., 0 0(t) < oo, this M► is required. Hence lim,. o 0(t) = oo. ❑

The key steps (12)-(14) will be referred back to in Chapter 11 when we


evaluate lira IIv n /(gb n )110- for various sequences a n \, 0 in the special case
q = I(1 —I).

Exercise 1. (James) If f is positive and / on [0, 0] where j0 [tf(t)] ' dt < oo,
then there exists M> 0, a 0 > 0, and no such that for any 0< a < ao and any
n >_ n o we have

2 U2 g(a2/2n)> Mn' /4 for 0<_ i 25 n-1

where g(t)= f(t)/log e (1/:).

Open Question 1. Formulate and prove the analogous results for V..

5. THE OTHER LIL FOR IIU,II

The successive maxima of partial sums satisfy the "other LIL" given in (1.1.9).
Similar behavior is exhibited by 11U,, this is the content of the following
theorem of Mogulskii (1980).

Theorem 1. (Mogulskii) Let b n = s12 log e n. Then

(1) lim b n IIUJ,Il _ it /2 a.s.


n-.m

Recall = 1!U,,11.
THE OTHER LIL FOR 1JU„II 527
The key ingredients in the proof of this theorem are the exponential bound

( 2 ) P(IIUII^A)— 2 1T exp( Br2 ) as,l->0

from (2.2.12), and the minimal inequality of Section A.2.


Proof. We will first establish that

(a) Lim 6„ jjv„ II > a/2 a.s.

Let E, 0>0 and a>1 and let

(b) nk - (a ka-B ) for some small 0>0.

Note that

(c) Am = [IJU,nli ^ (1 -2 e)(ir/ 2 )/b,nl


satisfies
n

(d) U AmCDk= min IIS,nII<_(1-2e)ß-lot ]'


"k_,5^n_nk
M=„p_l [

where
m

(e) S,„= X (t)


; withX;(t)=1[o,,](f;)—tfor0<—t<1.

Thus (a) will follow once we show that


W

(f) Y_ P(Dk)<oo•
k=t

We define

(g) AI = (1-2e)V ir/ 8 log e n k and X1 2 = M n k — n k _,

and note that

(h) A1+A2_A1 ask-icc

by the mean-value theorem since

f(x)(1— 0)(log a)
(i) f(x) = ax' ' has f'(x) — .

xe
528 LAWS OF THE ITERATED LOGARITHM

Since P( I U,„ II A) -^ P( II U II A ), we can specify M to be so large that


P( IIU m II < M) > Z for all m > 1. Now by the corollary to Mogulskii's inequality
(Inequality A.2.9), we have

P(Dk)<_ 2 P( IISn,II 25 Al +A2)

—2P ( np IIC A'_^-2'

(1 +E)A^ 1
(^) <2P l IIUJ nk II <_ J for all large k

by (h), using the fact that our choice of M above implies


P(IISn — S;II A2) = P(IIS,, —S;II M nk—nk-I)
k

^P(IISn k - S;II^M nk - i)= P(IIUnk-1 M)

(k) ? 2 for all nk _ ; 5 i <- n k .

Applying the KMT theorem (Theorem 12.1.3) to (j) we have

(1) P(Dk) P ( I ^ II (1 +e)A,1 (Constant)(log n k )


2 ) nk

Since the second term on the rhs of (1) is clearly convergent, (f) follows from

1 (
C1-e)1r
P(IIUII<0+e),l P IIvII — 81o g2 n k

( ) ✓- log 2 n k exp j - l 6)2 1og 2 n k } by (2.2.12)


1 - 4e ar l 111

(Constant) (log k)
log nk )Ii(I_e)2
(

(Constant)
V for all large k
`

(m) = (a convergent series).

We have established (a).


We will next prove that

lim b n IIU n (I <_ 2 a.s.


lT
(n)
n-W
THE OTHER LIL FOR IIU J 529
We now let

( 0 ) nk = k k

and we define

(P) Rk - IIsnk—Snk -III

for S. as in (e). The R k 's are independent and for all large k

(l+e)^(7r/2) _ < (1+e) 2 ir


<_
P(Rk 2log, nk — P ^^^"k-nk-.II — 81og 2 nk
(1+s) 2 or + (Constant)(log nk)
^'(IILII < 8 log n k ) as in (1)
e nk

4
2 loge nk exp 41og2 nk
^(1+s) — ((l+e) )]

+ (convergent) by (2.2.12)
(Constant)(log k)
(k log k)' + ( convergent)
g

(q) _ (a divergent series).

Thus the other Borel-Cantelli lemma [(ii) of Lemma A.6.1] gives

b„ R k ^r
(r) ki ^ 2'

Thus a.s. we have

nk (H SnkI II + Rk)
lim b% li U nk 11— lim bnk II Ik H lim b
k-.00 v /1k Y Ilk

b
slim "k Rk +lIm ( 6flkII`Snk-II1 ) nk-1/nk

(s) 2+lim ( IIL^k-VII I lim ( nk-I/nk bn ) k


nk )

(t) 2
530 LAWS OF THE ITERATED LOGARITHM

using Smirnov's theorem (Theorem 13.1.1) on the first lim in (s) and using (o)
on the second. We note that (t) implies (n). ❑

Open Question 1. Is the following strong form of Mogulskii's theorem true?

(3) P(liu„11^^ni.o.)
( 3 2
_ { according as ^" exp ^—
n l _, n 8 ^1 2,^ _ <(
=co

Note that 1 /A„ = (1 t e),7r/ 8 log 2 n are the "crude" dividing sequences.

Exercise 1. (Mogulskii, 1980) Show that

i
(4) lim b„ VJ „(t) dt ='4 a.s.

Other interesting references are Jain et al. (1975) and Csäki (1978).

6. EXTENSION TO GENERAL F

Using Theorem 1.1.2, it is clear that for a sample X,, X2 ,... from F we have
the following results.
Smirnov's theorem becomes

(1) Il m b F) # Il sup F(x)[1—F(x)] a.s.


n —co<x<.0

Finkelstein's theorem becomes

(2) he X} a.s. wrt II II on D.

The extensions of the other results to arbitrary F are also easy, but we do
not know an extension of Theorem 13.5.1.

Open Question 1. Is it true that

(3) lim b Vi IIF„ — Fly = rr sup F(x)[1— F(x)] a.s?


CHAPTER 14

Oscillations of the
Empirical Process

0. INTRODUCTION

In Section 1 we consider the modulus of continuity

(1) w(a) = sup JU(C)I= sup IU(t)—U(s)e


ICI<a ot - s<a

of U and establish Levy's classic result that


w(a)
(2)
li ö 2a log (1/a) —
= i a.s.

We also show (the constant out front is crude) that

Eo,(a)<7 2alog(1/a) for0<a<_2.

In Section 2 we consider the analog of this result for the empirical process
U,,. For sequences a„ satisfying

(3) (i) a„ \O and (ii) na„ 7 co (smoothness),


(4) log (1/ a„)/log e n -^ oo (a„ is not too big),

and

(5) log (1/a„) /na„ -.0 (a,, is not too small),

we show that
W,,(an) _
(6) l
im 2a„ log (1/a„)— I a.s.
531
532 OSCILLATIONS OF THE EMPIRICAL PROCESS

for the modulus of continuity w„ of U,,. The condition (4) is such that

(7) a„ = 1 /(log n' with c„ -> oo satisfies (4),


but c„ = cc (0, oo) fails.

The condition (5) is such that

(8) a„ =(c,, log n)/n with c,, -* co satisfies (5),


but c„ = cc (0, co) fails.

The a.s. limiting behavior of wn (a„) /[2a„ log (1/a„)]" 2 on the boundary
sequences a„ of (7) and (8) with c„ = c is also obtained. Our proofs are based
on the inequality that for 0< a <_ S <_ Z,

A2
(9) P(w„(a)>A./a)<_ä3exp^— (1—S)° 2 forallA>0
) na)
ti(

for the qi function of Inequality 11.1.1; we can improve the 4i term to I in the
case of wn.
All of these results apply also to the Lipschitz z modulus

(10) (5(a) = sup IU(C)l


a^IcJ=I

of U and the corresponding Lipschitz z modulus ä,„ of U,,. Thus

(a)
(11) lilm 1 a.s.
2 log (1 /a)

and

wn(ars)
(12) lim =1 a.s. provided the a„ satisfy (3)-(5).
n- 2 log (I/a„)

Moreover, the results for w „(a„) /v and w„(a„) on the boundary sequences
of (7) and (8) with c„ = c are identical.
The theorems of Section 14.2 are proven via classic methods involving
exponential bounds. There are two alternative approaches of importance to
these same theorems. These are based on the Hungarian construction and
Poisson embedding.
In light of the Hungarian construction of Chapter 12 we might expect that
(6) could be established via consideration of the sequence of Brownian bridges
B„ - K(n, •)/'Ji. The analogs of (2) for B. are proven in Section 14.3. In
Section 14.4 we apply these results to U,, via the key result of the Hungarian
THE OSCILLATION MODULI w, ui, AND rü OF U AND S 533
construction that for a version of v„ we have

(13) um (lag n)z/_<_ some M <co a.s.

It is suspected, but has never been proven, that log n may replace log t n in
(13). The improved version of (13) would allow us to prove (6) and treat the
case (7) with c„ = c via these methods [the case (8) with c„ = c cannot be
handled by these methods since "Poissonization" sets in]; as it is, (13) only
allows us to treat the case (7) with c„ = c and to prove (6) under a condition
far more restrictive than (5). All this is carefully discussed in Section 14.4.
Many results for empirical processes have been proven via some form of
Poisson embedding. This approach is taken up in Sections 14.5 and 14.6, the
necessary inequalities having been developed in Section 11.9. We find that the
proofs based on this approach are very similar to earlier ones; however, the
inequalities developed earlier apply directly to U,,, whereas this Poisson
approach uses the analogous inequalities for the Poisson bridge FYI,. The key
idea is that for a two-dimensional Poisson process rkl,

(14) NI (t)=[N(s, t)—tN(s, 1)]/ t\l(s, 1)


5 for0<_t<_1

observed at the random times S 1 , S2 , ... , where

(15) Sk =inf{s>_0:N1(s,1)=k} fork>0,

has the same probabilistic behavior as does U 1 , U2 .... Our inequalities from
,

Section 5 are for the process M, that replaces the denominator N(s, 1) of N s
in (14) by I; and results can then extend to N. via the SLLN result that
N(s, 1)/s -*1 a.s. as s - ao. We illustrate this on Chibisov's theorem (Theorem
11.5.1) in Section 14.5.
We feel that all three methods described above are of such importance as
to deserve the careful treatment we have given them.

1. THE OSCILLATION MODULI w, w, AND w OF U AND S

Let X denote a process on (D, 9). Recall that C denotes an interval (s, t] in
[0, 1], that ICI = t — s, and that X(C) = X(t)—X(s) denotes the increment. The
modulus of continuity cw x of the X process is defined by

w% (a)= sup IX(C)I = sup IX(s, 1]! = sup IX(t)—X(s)


CIsa Ost -spa 0s1-ssa

(1) = sup sup IX(t+h)—X(t)I.


Oshsa0^tst-h
534 OSCILLATIONS OF THE EMPIRICAL PROCESS

Since the big intervals typically control the behavior of this modulus, we also
define

(2) w(a)= sup IX(C)I= sup IX(t+a)—X(t)I.


ICI=a oar=I - a

Slightly different in spirit, but of related behavior, is the Lipschitz z modulus

IX(C')I
(3) wx(a)- sup
as^cjsl V IC'!

In all three cases we will consider one-sided moduli; thus w (a)=


sup {±X(C): 0 < ICI < a}, and recall that we use w in the statement of a result
if that result is true for all three of w x and tw.
Our special interest is in Brownian bridge U, so we define

(4) w = wv , w = mv , and c^ = w v .

Since Brownian motion S has independent increments, it will be convenient


to study the moduli of S and then make use of the fact that

(5) U S —IS(1).

The Modulus of Continuity

Since a.e. sample path of U is continuous, we know that w(a) - 0 a.s, as a -^ 0;


Levy's (1937) classic theorem determines the rate of convergence to 0.

Theorem 1. (Levy)

w(a) _
(6) li 1 a.s.
m 2a log (1 /a) —

We will, in fact, prove that Brownian motion S satisfies

(7) lim sup


IS(C)I _
< 1 a.s.
a to jcQ I2ICI log (1/ Cj)

and
§ +(C)
(8) lim sup > 1 a.s.
a O IcI=a 2a log (1 /a)

These two together imply that

(9) lim ws(a) =1 a.s.


a10 2a log (1 /a)
THE OSCILLATION MODULI w, Ca, AND w OF U AND § 535
[It is (9) that is usually known as Levy's result; of course, the transition from
to (6) via (5) is trivial. We find (6) handier because U is the natural limit
of v„.] We will, in fact, establish a result that is stronger than (8):

S (C)
(10) lim sup > 1 a.s. for each 0< c o < d0 <_ 1.
Flo icl=a 2a log (1/a)
C=[co d]

In addition to (9), (7) and (8) clearly imply that

(11) lim sup I§(C)I = I a.s.


ato ich =a ✓ 2 C log (1/IC!)
-

and

(12) lira sup IS(C)I _— 1 a.s.


ato jcl^a 2 C log (1/

It is also clear from (8) and the symmetry of S that

(13) ±S may replace ISS in (9), (11), and (12).

From the representation (5) we thus note that

(14) U can replace § in any of (7)-(13).

In particular, the two strongest statements

(15) lim sup IU(C')I 1 a.s.


a1o^C^tco 2 C log(1/ C )
and
U (C )
(16) lim sup >1 a.s. for any 0: c o < d0 <_ 1
Mio IcI=Q 2-a log (1/a)
C^[co,do]

will be of use below. With regard to ii we thus claim:

Theorem 2.

(17)
'Io 2log (1/a) — 1 a.s.

The proof of Theorem 1 requires a good exponential bound, such as that


in Inequality 1 below; and it uses monotonicity of w(a) in the role of a
maximal inequality.
536 OSCILLATIONS OF THE EMPIRICAL PROCESS

Inequality 1. Let 0< a < 1 be given. Then for any 0< 6 < 1 we have
p (—
64 1 .12
(18) P(W5(a)?a,J)<aS (1 —S) Z 2 ) for all ,t >0.
A ex
Proof of Inequality 1. (This seems to be a new and simpler proof of this type
of inequality.) Let M be the smallest integer satisfying
(a) 1 a3 2 2 1 a62
M <— 4' sothat >
M^M-1 4 .
Then partition [0, 1] into M subintervals of length 1/M. Let (s, t] have length

o ••• S ^ t ••• 1
M
Figure 1.

t — s <_ a. Let j/ M denote the smallest point in the partition


{0, 1/ M,2 /M,...,(M -1)/M, 1} that is >_s. Then
s (a)= sup I §(t) — S(s)I
Opt-spa

: max f sup 1,S( j , 2 + r I + 2


sup r,
(19)
M llO r a MM o^rsI /M h
M m Ij^ J
where the 2 enters on the second factor in case no j/ M point falls between s
and t. Thus the stationary increments of S and then (2.2.6) give
M
1
P(cw s (a)>A / )^ { P(lsIA/
Y= I 1 +5
_-)
11
P(II§II M?A s
2(l +s)J
/ \
s M 4P N(0,
(
a) > A1+S
+4P N(0, 1/M)
(
ASS
2(1 +S) ^)
4M+8) A2
(l
27r J A exp ( 2(1 + 5)z
+ 2(1+S) exp A z as Mz Z
by Mill's ratio A.4.1
S aMA 2 4(1+S)
1 4M(1 + 6 ) ( _A
1+ 2 exp Z 1 Z by (a)
^A 2Tr S aM} 2 (1+S) )

<_A4M {l+l }exp(-2Z(1 by(a)


+8)z)
z
(b) 64 exp (
—(1— S) 2 ) by (a). ❑
aS A 2
THE OSCILLATION MODULI w, Co, AND OF U AND S 537

Proof of Theorem 1. In fact, we will first verify that

(a) lim sup I§(C)I :1+2£ a.s.


alo lcl=a 2a log (1/a)
Let

(b) ak = 0" for some 0< 0<1 (close to 1) to be specified below.

Let S = 1— 0, and set

L§( C)I >(I+£)


(c) Ak=[ sup
CI<ak
^^

va,,
2log(1/a k )
J.
Then for sufficiently large k we have from Inequality 1 that

P(Ak)sak.2exp{—(1-6)2(1+£)2log (ak)l

— ^2 ak for S = 1-0 specified small enough


= S 2(0s)k

(d) _ (a convergent series).

Thus P(A k i.o.) =0; that is,

(e) k-.wklog(1/ak)^(l+£)
^ a.s.

Since for a k+I <a < a k we have


W(a) w(ak) 1 w (ak)
a log (1/a) sJ2a,, 1 log (I/ak+1) 2a,. log (1/a k )
+2£ W(ak)
if 0= 0 is close enough to 1.
(f) 1+ £ Za k log ()/ak)

Applying (e) to (f) gives (a). We restate (a) as: For almost every sample point
w there exists an a 0 (£, w) for which

(g) wheneverIC1:5a0(£,w).

That is, we have just proved (7) (since E > Cr is arbitrary).


538 OSCILLATIONS OF THE EMPIRICAL PROCESS

We now prove (8). Let

(h) q(a)= 2a log (1 /a).

Now for any 0>0 and n >_ some n e we have

P(A„)=P( max sI n
i 1 ,n
J i (1-0)q(1/n)I

_[1— P(N(0,1)>(1
\ -0) /
2log n)]” by independence

(i) [1 —exp (—(1-0) log n)]" by Mill's ratio A.4.1

\
= 1— n 1 _ 0 ) <_ exp ( —n e ) since l —x <_ exp ( —x).

(j) <_ K !/ n BK /// for K so large that 0K> 1.

Thus ° P(A„) <oo; and so Borel-Cantelli gives

§((i-1)/n ,

(k) tim max a.s.


n^w 1^,^n q(1/n)

Hence

(1) lim sup S(C) > 1 a.s.


1cI='in q( 1 /n)

<1 /n. Then


Now let n be chosen so that 1/(n+1)-a_

S(C) S
H (C) q(1/n)
(m) Icup
q( a) ICIuPn q( 1 /n) q(a)

IS(C)I q(1 /n(n +1))


— sup
IC^^I/n(n+» q(1/n(n+ 1)) q(a)

Since q(1/n) /q(a) -1, the lim inf of the first term on the rhs of (m) is ?1 by
(1). Since q(1 /n(n +1)) /q(a) -.0, the limsup of the second term on the rhs
of (m) is 0 by (7). Thus (8) holds.
The extension from (8) to (10) is easy. Just note that if the exponent n in
(i) is replaced by ((do —co )n), the series in (j) is still convergent. ❑

Theorem 3. (Bickel and Freedman) We have

(20) Eü(a)<7 2alog(1/a) for0<a^ .


THE OSCILLATION MODULI w, Co, AND ui OF U AND S 539
Proof. Since US—IS(1) we have

(a) w(a) = wv(a) s w§(a)+a) §(1)I

t
K

0 S t

Figure 2.

It thus suffices to establish (20) for Ew § (a). Now with K =(1/a)+1, we have

( k
w§(a) <_ 3 ocmax l otsup K ^I K, K+ t I= 3 oc maK 1 Mk,
k 1
(b)
]

where the M k are iid with

(c) Mk = II§IIö K = 1/K II§IIö= 1/K II§II


We thus have

Ew s (a)<_3E[ max Mk]^3 max Mk ?A) dl


I 0 P(OsksK-1
Oak<-K-I

s3
J 0
[1—P(max Mk <A)J dA

=3 J {1_[i_ P(Mk >_A)]K}dA

=3 J{1—[1_P(IISjI^A)] K }dA

Jj
N' v(a)
(d) <_3K P(IISII^AvK)da+3 dA
v(a) 0

using 1—(1—p) K <Kp,

where

(e) 9(a)= 2a log (1/a).


540 OSCILLATIONS OF THE EMPIRICAL PROCESS

Thus (2.2.6) gives

(f) J
Ew5(a)^3q(a) +3K 4P(N(0, 1) >A ✓K) dl
q (a)

A
(g) `3q(a)+ fq ( a) A 1 2 K exp (— 2K^ dA by Mill's ratio A.4.1
'"

6✓K °° 1
=3q(a)+ j — e -''dy
(log 1 /a)Ka Y
lettingy=KA 2 /2
m
e ydy
(h) <-3q(a)+
2ir(1 2)Ka (log 1 /a)Ka

<_3q(a)+3.5V ^6q(a)

as is required to go from (h) to (20) via (a). U


The Lipschitz 2 Modulus

Theorem 4

(21) lim ^ (a) = 1 a.s.


aIo 21og (1 /a)

Proof of Theorem 4. Suppose f is a continuous function (consider a sample


path of S) for which

f(t) —f(s)
(a) li mdsup —sh(a)>M,withh(a)= 21og(1/a)^,.
t

Then there exists a k J, 0 and 0 <_ s k tk 1 with tk — sk >_ a k for which

(b) IIf(tk)—f(Sk)I} ^M{h(ak )}{ tk — sk }?Mh(t k — sk ) tk — sk.

Consider the three terms enclosed in curly parentheses in (b); the first is
bounded, the second converges to oo, and hence the third converges to 0. We
now redefine a k (and go to a subsequence if need be to replace -- 0 by 10) so
that

(c) ak = tk — Sk J 0 and (b) holds.

Thus (b) gives

If(tk)—f(Sk)I
(d) lira sup Ii(t)—AS)I lim M.
Skh(tk — Sk)
a10 r —ssa vah(a) k-.cn tk —

THE OSCILLATION MODULI io, Co, AND m OF U AND S 541
Thus

wr(a) Wr(a)
(e)
^—
for any bounded f.
al ö 2a log (1/a) ö 2 log (1/a)

To reverse the argument, we note that if

(f)
I f(t) —f(s)I _
^h (a) —M for0^t—s<_a, then
11(t) —f(s)I >M.
t—sh(t—s)

Thus

t)—f(s)I
lim sup If( <Iim sup f(t)—f(s)I
^

(g) %lah(a) alo t—sh(a)'


alo r-ssa t-spa

so that

(h) a foranyf.
0 2a log (1/a)^a 0 2log(1/a)

Combining (e) and (h) with Levy's theorem (Theorem 1) gives (21). This proof
is due to R. Pyke. Shorack's (1982) original proof was based on the inequality
of Exercise 1 below. ❑

The proof of Theorem 4 shows that

(22) lim tof (a) =lim C f(a) =lim wl (a)


'

ato 2a log (1/a) a10 2a log (1/a) alo /2 log (1/a)

for continuous f

Exercise 1. Suppose q satisfies

(23) q is / and q(t)f is \ on [0, 1].

Suppose a and b satisfy

(24) 0<_a(1—S)b<b^5<_1

for some S. Then for all A >0


.
S(C)I 100f b 1 4A2 q 2 (t)
(25) P sup >_ <_ 3 2 exp —(1— S) — dt.
a-5ICJ^b 9(ICI) S t 2 t
542 OSCILLATIONS OF THE EMPIRICAL PROCESS

2. THE OSCILLATION MODULI OF U.

Our primary concern will be with the modulus of continuity w n of the empirical
process U „. In this section we let

(1) wn(a)=wv,(a)= sup U (C)^= sup IU(s, t]


jcIsa Or-,a

= sup sup (v n (t +h) —U n (t)I.


O - ha 0 <1^ 1—h

In analogy with Levy's theorem (Theorem 14.1.1), we would like to show that

w (an)
lim 1 a.s. for a n \ 0.
n,m 2 a„ ( 1/a n )=
l0 g

However, the rate at which a n \ 0 is sensitive; it can affect both the value of
the a.s. limit and the form of the appropriate normalizing constant.
We will assume that a n satisfies the regularity conditions

(2) (i) a n \ 0 and (ii) na„ 1' (smoothness),


(3) log (1 /a n )/log e n - oo (a n is not too big),

and

(4) log (1 /a„)/na n -+0 (a„ is not too small)

Theorem 1. (Stute) Define w n by (1) to be the modulus of continuity of U.


Let a n satisfy (2)-(4). Then

(5) lim w” an) = 1 a.s.


(

n-a^ 2a„ log (1 /a„)

Remark 1. Condition (2) is a smoothness condition. Condition (4) requires


that a„ not be too small; if

C„ log n
(6)
l an =
n

then (4) holds if c„ -^ oo and (4) fails if c„ = cc (0, cc). Condition (3) requires
that a n not be too big; if
1
(7)
an - (log n)`^

then (3) holds if c„ -* oo and (3) fails if cn = CC (0, cc). Roughly speaking then,
THE OSCILLATION MODULI OF U,, 543

Theorem I handles all "smooth" cases ranging from an values as small as


a„ _ (c„ log n)/n with c„ - oo to a„ values as large as a. = 1/(log n)` with
cn -*ao.

We now state two theorems that deal with the boundary cases a„ _
(c log n)/n and a„ = 1/(log n)`, respectively.
For the case a n = (c log n)/n we will have need of the functions ßC and
ß defined earlier. Recall that for c>0 and cl we let ß' > 1 and ß1
denote the solutions of

(8) h(f3)=- and h(ßc)=I whereh(x)=x(logx-1)+1.

Recall also that

(9)
f ß+ I from m to l as c T from 0 to oo,
P c T from 0 to las c T from 1 to X.
-

Roughly speaking, for a n sequences as small as (c log n)/n the phenomenon


of "Poissonization" sets in.

Theorem 2. (Komlös et al.; Mason et al.) Let

(10) a„ = (c„ log n)/n where c„ -- cc (0, oo).

Then we have

(11) c/2(ßi —1) a.s.


li m 2a„ loga(1/a„)

(the limit function is graphed in Figure 10.8.1) and

w (an)
(12) um c/2(1—ß^) a.s.
log (1/a„)—

(the limit function is graphed in Figure 10.8.2).

The case an = 1/(log n)` is one that will arise in connection with the Kiefer
process in Section 14.3. [Theorems I and 2 above are phrased in the spirit of
conclusion (14.3.6) of Theorem 3.1 below, while the upcoming Theorem 3 is
phrased in terms of conclusion (14.3.7).]

Theorem 3. (Mason et al.) Let

(13) a n = 1/(log n)`« where C. -. c E [0, 00).


544 OSCILLATIONS OF THE EMPIRICAL PROCESS

Then we have

(14) f = liIn < lim w " (a " ) = JTi c a.s.,


log 2 n 2a„ Iog 2 n
while
w(a)
asn oo.
(15) 2a „log2n '

Note that for c E (0, oo) an alternative expression to (14) is

w (an) w (an) 1 +C
(14') 1 = lim < lim = a.s.
2a „log (1 /a„) n.= 2a „log (1 /a„) V c

This is the format used in Theorems 1 and 2; we used the format (14) for
Theorem 3 since the rhs of (14) with c = 0 is stronger than (14') with c = 0.

Open question 1. If c„ -4 C E [0, ix)), is it true that

Wn(an)
+C] a.s. wrt I I?
2a„ log Z n I

An examination of our proofs of Theorems 1-3 will show that they, in fact,
establish even more than is claimed. The additional results are easily sum-
marized.

Theorem 4. We may replace w (a„) in Theorems 1-3 by

sup {#UJ „(C): Cj = a„ and Cc [co , do ]}

for any fixed 0<_ c o < d0 <_ 1. For co = 0 and do = 1, since we denote this quantity
by w"(a„), we obtain the special case that

(16) w;; (a„) = sup #U(C) may replace (o,* ( a„) in Theorems 1-3.
ICI=a,

We will also extend the theorems of this section in an important fashion


to all intervals whose length exceeds a„.

Theorem 5. We note that

w(a)
in Theorems 1 -3,
(17) 2log((I/an) may replace 2aw1
THE OSCILLATION MODULI OF U„ 545

where

(18) w;'^(a„)= sup #QJ„(C)


aIcItl vT
We could reasonably be interested in going beyond the lower-boundary
case and considering the oscillation w„(a„) for intervals having a,, =
(c„ log n)/n with c„ -p0. The next remark from Shorack and Wellner (1982)
will cover this case provided c„ does not go to zero too fast. In particular, it
will cover a„ = 1/n(log n) M for M>-1; but it will not cover a„ =(log n)/n' +s
with 5>0, unless S = 8,, \ 0.

Remark 2. Suppose

a„ =(c,, log n)/n where c,,-'0 and log (1/c„)/log n-0.

Then we have

✓n log (1/c„)
lira w„(a„)<2 a.s.
„- w log n

Open question 2. Obtain carefully formulated refinements of the rather crude


Remark 2.

We now turn to the proofs. As is usual, such theorems require a good


exponential bound and a maximal inequality.

Inequality 1. (Mason, Shorack, Wellner) Let 0< a <_ S <_ Z. Then for all A >0

(19)P(w„(a)?.t.^)—<as exp^—(1—S) ° 2 ^G(


3
na/1

Moreover,

(20) P(w(a) > — S3 exp(—(1-8) 5 2 ).


—A ✓a) <

The next inequality follows along the lines of Stute (1982), with care taken
to extend the generality.

Inequality 2. (Stute) Let r>0. Let a, and A. satisfy

(21) (i) a_ \ and A. i', (ii) ma,„ 1',


(iii) a,/ ma,„ <_ some d, (iv) A,,, ? 2 log (1/a,„).
546 OSCILLATIONS OF THE EMPIRICAL PROCESS

Let r >0 be given. For n k = ((1 + 0)") we can choose 0 = O., d so small that for
k ? some k£, d we have
w l+e) Z a) r+E
(22) Pl max ^/m
— (am) ^ r+2e 1 ^2P1 rs ( `
n,_I mink / A m \ Vank(l +e) 1+0 ink ^^

We remark that at step (25) below, the conditional representation (8.4.4)


of U. in terms of a Poisson process is used. This is an important technique
in many settings, particularly in some of the earliest proofs.
Exercise 1. (Stute, 1982) Let 0< a, S < 1, and A >0. Show that if (i) a <5/4,
(ii) 8<(A5) 2 , and (iii) A s (some xs )/ na, then
(23) P (w n (a)?A.)<-2P(IU n (a)I>_(1 - 8)A,ia).

[Stute then got a bound similar to, but weaker than, (20) by using moment
generating function bounds on the rhs of (23).] (Bolthausen, 1977b also used
a version of this inequality.)
Exercise 2. Attempt to prove a version of Inequality 2 using Proposition 1
below.
Proposition 1. Let I✓n = o[U m (t): 0<_ t<_ I and I <_ m n]. Then
(Viw n (a), 19n ), n >_ 1, is a submartingale for any fixed a > 0.
Proof. Now for any fixed set C we note that (v'iv n (C), fin ), n? 1, is a
martingale, since its just an iid sum. Thus

E(wn+l(a) I `van ) = E( sup wn+,(C)I I


lcl^a

sup E(IUn+,(C)I(&n)? sup IE(U, ,(C)I`6n)I


ICI^a (C( a

= sup n/(n +1)Iv n (C)l


ICI^a

(a) = n/(n +l)m n (a)

establishes the claim. ❑


Proof of Inequality 1. Let

(a) An =[w(a)>_ate].

Let M denote an integer, and note that for t — s <_ a

U (s, t] = V/l[G n (t) — G a (s) — (t — s)]

(b) :sv'i[G (t) — Gn(J/M) — (t I /M)] +/ /M



THE OSCILLATION MODULI OF U,, 547
S t

j_ j +_i_ •00 k k+1


M M M M Figure 1.

if JIM is the largest grid point not exceeding s. Thus

(c) wn(a)< max


0sj<-M-1 Osr<a+l/M I
sup v„(-, -+r(+—},
JJJ

and the stationary increments of UJ, give

(d) P(wn(a)? s MP( 11Un1jö +l/M >A,/i– ✓n/M).

The w(a) term is analogous; thus

— U n (S, 1 ] = %I nRt — s) — (Gn(t) — Gn(S))]

k+1 k `

(e) ^^ ( M –s)–(Q^„(Mvsf–G n (s))

leads to

(f) P(w-(a)>_A/a)^ MP( U - 1Iö?AI – ✓n/M).

Thus James's inequality (Inequality 11.1.2) in the ”+" case gives

2
(A a n/M) Z (1–a-1/M)
P(A„)_<M exp( – 2(a+1/M)(1–a–I/M)

(Ala– ✓n/M)(1–a-1/M)
X (a+1/M).^ /^

A 2 a( 1–a-1/M)
! 2 A
(g)
<–Mexp1 – 2 (a+I/M) ( AM✓ä)' P ( an))

\
sin e Ji is j. c
We now require M to be the smallest integer for which

(h) 1/M <a5 3 , which entails 2/M ? 1/(M-1)>_ aS 3 .

We also note that if A ? 82,J, then (h) implies

(i) A >_ S Z na?Vn/(M ✓ctS).


548 OSCILLATIONS OF THE EMPIRICAL PROCESS

Using (h) and (i) we have

a(1 —a -1/M)
J a)(1 -6)?(1 —S) 2 and
( ) (a +l/M) ^(1—

—S),
(1 AM a)1

so that (h) gives

(k) P(wn(a)>_A,/)5 — exp — (1—S) ° Z-


\ ( %I an )/

provided A ? S Z na. In case of w - , we can replace 0 in (g) by 1 using Shorack's


inequality (Inequality 11.1.2); this leads to

(1) a 3 exp \— (1—S)` 2


P(w^(a)?A )^ 4
/

provided A >— 8 2 /na. Now (1) is (20) for A ? 6 2 v'na, and combined with (k) it
establishes (19) in case A > 6 2 /.
Before turning to the proof of (19) and (20) in case A < S Z na, we will
improve on (1) for later use. Now using Shorack's inequality (Inequality 11.1.2)

P(An)- 5 MP(IIU.,JIAv —M)b


y(f)

(A,I — ./n /M)2(1—a )i


<_ M exp (—
2a

x r (A./ a^ )(1 —a)


//

23 p
s ex
Kä 2
—4
z
1 —MA✓n✓--)l z(1_a)2

xa/iI an ` 1—M,lä)(1—a)//
\\
(24) —^3 exp —(1_ S)4 2 #A _i — S) 2 I I
( ( for A >_ S z na,

provided0<a^S<_2.

THE OSCILLATION MODULI OF U,, 549

We now turn to the proof of (19) and (20) in case A < 3V. We now note
that for 0: t — s <a we also have

max sup I0J„ 1 M , M +r


1<_j^M O^r_a ]

(m)
+2 max sup I UJ„ I j — r,
nj M Osrr IfM `M M

the factor 2 on the second term is needed in case no JIM point lies between

S t Figure 2.

s and t. We now suppose M is the smallest integer such that

1 a3 2 4 5
(n) M <— 4 , which entails M < 5a2 + 1 <_52
a

Now (m) and the stationary increments of U. imply

j
M

_I 1 1
P
1 1
( QJ„ 1%
—I 0
1—a 1
1—a 1+S

„ II1/M ^ / 1-1M
/ 1S
+ j=1
M p(I U
1—I 0 1-1/M2(1+8))
(o) =b+d.

Now by James's inequality (Inequality 11.1.2) we have

Ala (1—a)2 ( Av^a- (1—a)


b<2M exp 1(
2a(1—a) (1+S) 2* a 7 (1S)))
10 / z z
4 _)
(P) c aSz exp 1 —(1— 5)3 2 ( an)) a.z exp \ —(1_3)

using (n), and then and 4i(t)> 1— S for :< _6 2 by (11.1.11). James's
inequality (Inequality 11.1.2) also gives [note also (n) and (11.1.12)]

a 2 a (1— 1/M) 2 3 2 AV (1-1/M)S


d<_2Mexp ( -
2(1/M)(1-1/M) 4(1+5) z " 1\(1 /M)'/2(1+5)))
/ z \
c aSz exp l —(1— 5) 3 (1— 5) I if <_35.
(q) 2 2V
550 OSCILLATIONS OF THE EMPIRICAL PROCESS

Thus

(r) P(A) 32 exp (\1—S)


—( ° 2 if A -^M

But the stipulation on A in (r) holds for ,1 <_ 52,/na since

5^ 5
(s) < S Z na < {M since M < (n).
a42 by

Thus (19) holds in the case A s 3 2 '/i. ❑


Proof of Inequality 2. Let E >0 be given. Let

(a) n = nk-I, n = nk, li = nk +l

and for n <_ m <_ n define

(b) A_ = [ sup ±U _(s, t] >_ (r +2s)A m d].

Suppose

(c) M = n — m and V m denotes the empirical process of M+I , ... , en.

Now define

(d) Bm=[ sup IVM(s, t]I<K\A,„ä]

for a very large constant K = K E, d and a very small constant 0 = 0,, d to be


specified below. Note that

(e) Uo(s, t]= m /nv m (s, 11 + M /nV M (s, t].

We will now show that we can choose K = KE , d so large that for all n < m ^ n,

160 p ( — K Z BA_, ( K_1,„ 11 g


P(B„,)<_ a ex usin S =' in (19)
32 ^l Ma Jl

160 exp — K 2 0 log (I) 0 ( f KAm


a 16 \ am / /
by (21)(iv) and since /i is J
THE OSCILLATION MODULI OF U„ 551
and M=m(n/m-1)>_m(n/n-1)>_m0/2 for k>some k A
K20
160 exp 6 (—K log (1/a m ),i(J Kd)) by (21)(iii)
am
^ 160 exp (
d
8 ^o (l g K) log (1/a m ) I by (11.1.8)
am
160
^--exp
KO \
(log K) log (1/a m ) I for K large enough
am 16d

1 0 exp(—G log
6 =160am '^160a 0, ' see(g)

(f) <— 2 if G is chosen large enough,

where
z
64Gd
(g) K=KE,d=2, B=BF,d=4D, D=DE,d=eexp( ).

We note that for k - some ke we have

(h) am (1+0) 2 (n/n)a m <_(1+0) 2 (ma m /n)ß(1+0) 2 a„


for all n<_m<n

using ma,,, ‚' . Thus for n s m <_ n, on the event A m n B m we have

sup un ( S' t] > sup


^^
(S' t]
vri^ by (h) and a m \
r-S(I+9)0„ Ya, 1-s^a., v"m

y(S' t] _ _ ' M sup 1Vn-r(S' t]1


sup by (e)
n «a , m ^j n I-s-a.,, Va m

>_ YY Am
(r+2e)Am
—V Y
- i K1Am
+2e)—./(1+0) 2 -1K ✓ 8] as k^oo)
( — (]+0) [(r
(i) >_Am(r+E) by examining KO in (g)
(j) ?A,,(r+e) since A m /.

Thus

(k)
n
U (Amr)Bm)-Dn= sup Un\S^ t]
^^
1
?(r+)A J,
552 OSCILLATIONS OF THE EMPIRICAL PROCESS

Hence

P ((A m n Br)\ U l(Ak r) Bk))


m=n k=n

'— I P( (AmnBm)\UAk
T
m=n k=n /

U Ak)n
n=n P ^^ Am\ kn Bm^

(1) _ P( A \ U A ) P(B )
m k m by independence
m=n k=n

n
>_[ inf
nksn
P(B )]P U A m )
m
m=n

(m) ^2 P( U Am

as claimed. ❑
Proof of Theorems 1, 2, and 3 upper bounds. This proof is from Mason et al.
(1983). Let E > 0 be given. Define

(a) A
r
wm(am)
(r+2E) 2 log (am/
L .^ J'

where we specify r later. We seek to show .^ P(A m ) <co; but this is a


consequence of the maximal Inequality 2 provided we show that

(b) I P(Dk)<°o,
k=1

where
r+e
(c) Dk = I a„ ,I((l+d)Zank) /((l +d))> 2log \
+ LLL l d ank1 _,/ J

with

(d) nk = ((1 + d ) k ) and d = d a sufficiently small number.


E

Now by (19) we have

20r+E i 1
I'(Dk G s3 ank ( —(1 —S) 4 7k' (

1+ d)2exp
)—
I +d log ank
-1
()

s >4(r+021k/(l+a)2l -1
(e) < Ca;,('-
k-1
using ( 2 ),
THE OSCILLATION MODULI OF U, 553
where

(f) Yk_^ (r+e)/(1+d)2


2lo
g ( a.' _,^ /

Suppose now that the hypotheses of Theorem I hold. Set r = 1 in the


definition of A. made in (a). Note from (f), (2), and (4) that

(g) Yk -* 1 ask -* o0

since the argument of ill converges to 0. Thus from (e) we have for S = S E
sufficiently small and K = K E , s sufficiently large that

(h) P(Dk+I) Ca. k fork>_ K.

The worst situation for the convergence of the rhs of (h) is when the a m 's are
as large as possible; but from (3) we know that for k sufficiently large

1
< for any constant cc (0, co)
a "` (log n k )`

--(k log (l+d)) - `

(i) --(klog(I+d))-2R by now specifying c=2/E.

Thus (h) and (i) show that

P(D k ) <_ Constant/ k 2 .

Thus (b) holds in that P(Dk ) <oo. Thus

(j) li m "( a") < 1 a.s. under Theorem I hypotheses.


Za n log (1/a)

Suppose now the hypotheses of Theorem 2 holds. We consider (11) first.


We rewrite Eqs. (e) and (f) as

+ ck log (I+d) {t^>4t.+F>ZYk,t/^I+a)2]-I


(k) P(Dk+I)--CI (I+d)
k )

where

/ ^(r+e) \
(I) yk ? y + = ill 1 r I for k sufficiently large.
554 OSCILLATIONS OF THE EMPIRICAL PROCESS

The series on the rhs of (k) will be convergent for any small E >0 and sufficiently
small choice of S = 5. provided r is at least as large as the solution R of the
equation [note the exponent of (k)]

1 =R2y+=R2+fr( ^cR)

vc \ Z v R\I [recall (11.1.2)]


= R 2 2(^ R J h^l+

(m) =ch(1+ - I.

Thus, from (8)

VR
1+ =13

or

(n) R= J(PC -1 ).

We have shown that [note (11)]

w(a)
<
2 (ß^
(o) um c —1) a.s.
„ oz 2a„ log (1/a n )
under Theorem 2 hypotheses.

We now turn to the upper bound in (12). We note that (24) applies since
the correspondence

r+e
A ---- 2(oga, and na-^ n k + ,(1 +d) z a „ k
l+d

'—1 2klog(1 +d) - ✓ (l +d)'ck log (1+d)


(p) +d

satisfies the requirement A > S Z na of (25) for all sufficiently small 5 since
((r+E)/(1 +d) 512 ) 27 c >_ 5 2 . Thus applying (24) to the Dk of (c) gives

(1 —S)°yk(l log
P (Dk )CS3a „k (1 +d)2exp(— +d) \al_,/)
= ca(kl-'ii)4(r+El2yq/(l+d)2] -I9
(q)
THE OSCILLATION MODULI OF U„ 555

where

(1—S)z(r+E)/(l+d)z
Y k =—IP 21og1 1
nk+Ian ; \a„k-i ,

f (1- S) (rs 2 )2 fork sufficiently large


2

(1+d) c

r+e
(r) ? y- \— 2 c for d and 3 sufficiently small.

Thus the series in (q) will be convergent for any small s > 0 and sufficiently
small choice of S = S£ provided r is at least as large as the solution R of the
equation [note the exponent of (q)]

^R IR

Thus

(t) l--=-=ß or R=I(1—ß^).

Theorem 3 will be proven in its entirety in Section 4. Nevertheless, the


following paragraph may be of interest.
Suppose now the hypotheses of Theorem 3 hold. Then from (e) and (f) we
have for k sufficiently large

(u)
P(Dk+l)^C{[klog(1+d)]`}

since when (13) holds we have

(v) yk -+ 1 ask-* co.

The series on the rhs of (u) will be convergent for any small 8 > 0 and
sufficiently small choice of S = S provided r is at least as large as the solution
E

R of the equation [note the exponent of k in (u)]

(w) c(r2-1)=1.

Thus

l+c
(x) R =
c
556 OSCILLATIONS OF THE EMPIRICAL PROCESS

We have thus shown that

wn(an) 1+ c
(y) lim a.s. under Theorem 3 hypotheses.
n-•^ 2a„ log (1 /a„) c

[The trivial modifications needed in definition (a) for the case c =0 are easily
made.]
Comment: We cannot proceed along these same lines in a proof of Remark
2, since in that case the hypotheses of our maximal inequality fail. The upper
bound of Remark 2 is the cheap one that results when the whole sequence
1 ° P(A m ) [not the subsequence Z ° P(Dk )] is made convergent by choosing
,1 large enough in the exponential bound. ❑

Exercise 3. Prove Remark 2 showing that the second condition can be


weakened considerably if the bound on the lim sup is allowed to increase.

Proof of the Theorem 1 (and its Theorem 4 version) lower bound. This particular
proof is here for a purpose. It is based on the conditional Poisson representation
of U. It is a crude proof in that the factor 3 ✓n enters below at line (f). Because
of this crudeness, it is not possible to establish the lower bound in Theorem
2 by this method; it will be established in Section 6 via consideration of the
Poisson bridge. This proof is presented here so that the shortcomings of the
conditional Poisson representation are made explicit.
Let

1
(a) M=Mn=
Qn

and define

ty ((i-1)a is ] 1
(b) Bam= max "' — (1 —e)r 2log
n ! YQ„
l^i_M - an

As in (8.4.2), we define

(c)v„(t)= ✓n m N(n t) —t fort>_0


l
for a Poisson process N, and we note that [see (8.4.4)]

(d) dnI[N(n) =n]-=U_


THE OSCILLATION MODULI OF U„ 557

Thus

t[N(n(i-1)a", nia"]— na]


max
P(B„)=P
<_i--M nan

(1—E)r 2log^ IN(n)=n 1

P(E)
(25) =P(E^F)--
P(F)
_ t[N(na")—na„] ^ M
—(1—s)r 2 1 og( 1 /a")) /P(N(n)=n)
(e) —P( na"

since has stationary independent increments.


Now by Stirling's formula (Formula A.9.1)

(f) P(r(n)=n)—(n"1 e -" — 1 1


n ! 2 Mrn 3.%n

From the exponential bound on Poisson probabilities of (11.9.14) we thus


have for n sufficiently large that

P(B)<_3V4 1 _exp ! —(1— E) 2 r2 log (1/a")

xß ! (1— e)r 2 Iog (1/a") ))1"ß


(\ na" J
^3f [1—exp [—(1—e) 2 r 2 log (1/a„)]] M by (4) and (11.1.11)

= 3v' [ 1— a „^ -E)2.2 ] M -- 3Vi exp ( — Ma,'- ' )


£)z Z

(g) 3fn exp (—a„h e)2,2-I ) by (a)

3'/
setting r = 1
xp (a-,`)
_ 3^
with c” -^ co by (3)
exp ((log n)"-)
(h) _ (a convergent series).

Thus Borel-Cantelli implies P(B" i.o.) = 0; that is,

U,,((i —1)a n , ia n ] 1
(i) lim max > 1 a.s. with M = I
-.=,^;^M 2a„ log(1/a") a"
558 OSCILLATIONS OF THE EMPIRICAL PROCESS

under Theorem 1 hypotheses. Thus


±U(C)
(26) lim sup ) > I a.s. under Theorem 1.
n-^oo ICI =n„ 2an l og (1 /an

This completes the proof. Note that our proof can be trivially modified to
show that under Theorem 1 hypotheses

±y0(C)
(27) lim sup -1 a.s.
> for each 0 <_ co < d0 s 1;
n -, ICI =Q.. 2a n log (1 /an)
C [co.do]

just redefine M=((da —c o )/a n ). ❑

We will prove Theorem 3 (and its Theorem 4 version) in its entirety in


Section 14.4 by appeal to the Hungarian construction. In fact, much more
general results will be discussed.
We will prove the lower bound of Theorems 2 (and its Theorem 4 version)
in Section 14.6. In fact, a whole set of Theorems for U. will be developed via
consideration of the Poisson bridge.
We refer the reader to Mason et al. (1983) for all of Theorem 5. However,
the reader may note Corollary 14.3.2 below.

3. A MODULUS OF CONTINUITY FOR THE KIEFER


PROCESS K n

Let K denote a Kiefer process, so that

^ (n^ ) =v foralln1.

In this section, we use w n to denote the modulus

(2) wn(an)= w ®„(an)° sup Jin(C)1= sup IBn(s, t]I


IcI^a„ O f -sca„

for various sequences a n satisfying the mild smoothness conditions

(3) an \ 0 and na,,/.

Let us define c n by

_ log(1/a„)
(4) cn = to ° or a„ _ (log n) `^,
gz n
MODULUS OF CONTINUITY FOR KIEFER PROCESS US„ 559

and we suppose that

(5) c„ . C E [0, cc].

The following theorem is due to Chan (1977) for c = cc. The Jim sup result
is due to Csörgö and Revesz (1978b). The lim inf result seems to be due to
Mason et al. (1983), though the same conclusion for S(n• )/v'i had been
established by Book and Shore (1978). Note that for CE (0, cc), the statements
(6) and (7) agree.

Theorem 1. Define w„ by (2). Suppose (3) and (5) hold:

(i) If c E (0, oo], then

"(a") "(a„} —
(6) 1 = lim <— lim 1+c a.s.
n-.00 2a„log (1/a n ) n -.00 2a„log (1/a n ) V c

(ii) If CE [0, oo), then

(7) =tim
n -.
`”'" "(a ) <lim
2a„ log, n „^^
2" a ")
a ( = l+c a.s.
„Og g n

(iii) The theorem continues to hold if n is regarded as a continuous variable


on (0, co).

Note that Levy's theorem (Theorem 14.1.1) immediately implies that

(8) w(a)
_________ -4 1 asn- cc
2a„ log (1/a„)

provided that a„ - 0. That is, the -> p limit agrees with the lim inf in (6) and (7).
From Exercise 2.2.12 we know that

(9) K(n, t)/.,/ [S(n, t)—tS(n, 1)]/ ✓n

where the LILof theorem 2.8.3 implies that S(n, 1)/(fib„) [-1, 1] a.s. wrt
where b„ = J2 log e n. Thus, note (6),

(a) a,S(n, 1)/'/ fib„ §(n, 1)


^0 a.s.
2a„log (1/a„) 21og (1/a„) Vib„
as n->cc when c=oo;
560 OSCILLATIONS OF THE EMPIRICAL PROCESS

also, note (7),

(b) anS( n, 1) / / n _ S( n, 1)/v n


-

-0 a.s.
2a" log 2 n – a " b"
as n -* oo when 0:5 c < oo.

Thus it suffices to redefine

(10) Wn(an)=G1g(n,.)/ ✓ n(an)

and prove Theorem I for the redefined w".

Corollary 1. Theorem 1 also holds for w n defined by (10).

Our proof will require the exponential bound of Inequality 14.1.1 and the
following maximal inequality.

Inequality 1. Let 0 <a < I be given. Then for any 0< S < I

§(m, \ / 2 \
(11) P max sup -?A'%ä
C)I I<as3 expl – (1–S)' —
( ^M!5" ICIsa / \

for all A > 0. [This continues to hold if n is regarded as continuous on (0, oo).]
Proof. Let Wv n =Q[S(m, t): 1 msn,0<_ts1]. Then

E( sup IS(n +1, C)IIW")>_ sup E(IS(n +1, C)II '§n )


ICI^a IcI^a

(a) >_ sup { §(n, C)i= Mn ,


IC

so that [and this is also valid for n continuous on (0, oo)]

(12) (Mn , W"), n >_ 1, is a submartingale.

Thus for any real r> 0 we have

P( max Mm ?a an) =P( max exp(rM„,/(an))>_exp(rA 2 ))


lmn tsm^n

Z ,
s exp (–rl )E exp ( rM'
an

by Doob's inequality (Inequality A.10.1)

exp (–rA 2 )E exp (rZ2)


MODULUS OF CONTINUITY FOR KIEFER PROCESS K, 561

= exp ( -rk') J P(erz= ^ x) dx by (A.4.1)

=exp(-rk 2 ) J P(Z> (logx)/r)dx


1

=e2

1+

by Inequality 14.1.1
f
64f e - cl -s>Z(log z)/(2r> dx
e aS 2 (log x)
]

(b) e-.a2 [ 1+ 64v/ r_ JW x -(I-s) 2 /(2 , ) I


dx
aS e log x ' .

Now for any 0< 6 < Z the quantity in square brackets in (b) does not exceed
(64/aS 3 ) provided r= rs = (1 - S) 3 /2. ❑

Proof of Corollary 1 for c = ao. Let

(a) nk = ((1 + 6) k ) for some sufficiently small 0 = 0,

to be specified below. Let

IS(m, C)! 1
(b) A,„° sup ^- > (r+)
e 2a„, log ^ — J
ICHu,, v m , am/

for r to be specified below, and note that

IS(m, C)(
( c c ) nU A,„ ^ Dk = max sup
m — nk nk^mcnk+1 ICI<pnk "K

?(r+e)\j 2 an k + l log (
k)]
'

where Inequality 1 gives

ank+,
(d) < KB exp (-(1- 0) "k
P(Dk) _ (r+ s} Z Iog 1/
ank \ nk+t a nk ank

Since for k sufficiently large


z
nk a nk+l — nk nk+ta n k+,
(e)
/ nk+l ank nk+lJ nkank
) 2 '
nk / 1 (1-B}3
/nk+1
>^ (1+0)3'
562 OSCILLATIONS OF THE EMPIRICAL PROCESS

the estimate (d) gives

e> °c. +e>z


(f) P(Dk)

< Ke an, if r = 1 and 0 = 0 is chosen sufficiently small


E

<_ K B (log n k ) - "^k where c„ -^ oo by (4)


<— KB /(log n k ) 2 for k sufficiently large

+0)) z k b
(log(1K81 2
y (a)

(g) _ (a convergent series).

Thus Borel-Cantelli gives P(D k i.o.) = 0, and hence (c) implies P(A,„ i.o.) = 0.
That is, since r > 0 is arbitrary,

S(n, C)J
(h) lim w" a " = lim sup < 1 a.s.
n -.^ n log (1 /an) n -=^cl^a, /2na, log (1 /a")

when c = oo.

We now establish the lower bound. Let

1
(i) M=M„=
^an)
and define

(j) B„ max
wn(( t-1)a n, ia„J 1
--(1—B)r 2log J
1
1si-M an

for any Brownian motion W. (for Corollary 1, the appropriate choice is


W„ _ S(n, • )/Vi). Now by independent increments

1M
P(B)= P(N(0, 1)<
— (1 _O)ritj 2log 1
a„ J
1M
=11 —PI N(0,1)>(1 -0)r 21og 1
L \ a / J
1 ( _(I — "^
< 1 —exp B) 3 r 2 log 1
an J
for all large n, by Mill's ratio A.4.1
= [1— a „1 - e > 3 r 2 IM

ex (I-° '2 I
P ( —Ma„
)

exp (a^' rt B 3'


)
-')
MODULUS OF CONTINUITY FOR KIEFER PROCESS K,, 563
1
= r I) using ( 4)
(k) exp ((log n)'n(I _U_o)
1
ec1-B^ if r=1, where c n 30(1—O)^oo
exp ((log n) ^ )
(1) =(a convergent series), for any 0<0<1.

Thus, since the lim inf of the supremum of a larger set of values is larger,

W(C )

lim sup >1 a.s. when c = cc in (5)


(13) n- IcI=a^ 'J2a log ( 1 /a")

for any sequence of Brownian motions W.

Note that (13) still holds if the sup is restricted to cc [c o , do ] for some
0: Co < d 0 1.
Thus Corollary 1 holds in case c = cc. Theorem 1, in the case c = «0, follows
from (9) through (10). [Since (6) and (7) are different versions of the same
statement in case cc (0, cc), a proof of (7) will complete the proof of Corollary
11 ❑
Proof of Corollary 1 for CE [0, oo). We will consider the lim sup in (7) first.
The proof that

W(a)
(a) li
m 2a" log2 n — 1 + c a.s.

is similar to the proof of (h) in the proof of the c = co case. Just replace
log (1/a.) by log t m in line (b) of the c = oo proof, and obtain at line (f) of
that proof that

(b) P(Dk) KB
a"k[log nk](1-e)'(r+,)2
KB K
c k(^-a^=c.
[log nk](1 a) (r+e)
(c)

If we now let r = VIT, then (c) is a convergent series for any e > 0 provided
0 = 0 is chosen small enough. Thus (a) holds.
We must next show that

(d) lim "(a") ? l+c a.s.


n-^' 2a" log e n

Our proof will be modeled after the proof of Proposition 2.8.1, which could
be consulted at this time for motivation.
564 OSCILLATIONS OF THE EMPIRICAL PROCESS

Let M—(1/a) and n k = ((1+8) k ) as before and let

(e) Zk = max
I M "k+t

[S(nk +l , Lank +t ) — S(n k+l, ( i — 1) an k+ )] — [S(nk, lank +t ) — S(nk, ( 1 — 1) a nk +t )]

2 (nk +l — nk la nk +t 1°g 2 nk +1

Then the rv's Zk are independent and

P(Zk?(1 — E) I +c)=1 — P(Zk<(1 — E) I +C)

= 1 — [1 —P(N(0,1) -(1—e) l+c 2 log2nk +l)]"',

M = Mnk+,
?1 — [1 —exp ( — (1 — E)(1 +c)logznk+l] M
by Mill's ratio A.4.1, for all large k
- cl -E>cl+r)]M
= 1—[1 —(log nk +l)
-Ed(l+C> )
> 1 —exp (—M(log nk +l) - (l
(1 - e ) c l +c>) by (4) and (5)
1 —exp (—(log n k +1 )` -
1 —exp (—(log nk +l) ) - -1 —exp (—K B k) for some KB
2
KB 1 K0 ) ]
KB K
>_1— 1 —
-E + 2 (V`
ki = kl- E - 2 kz-zE

_ (a series with infinite sum) — (a series with finite sum)


(f) _ (a series with infinite sum).

Thus the second Borel-Cantelli lemma implies, since E > 0 is arbitrary,

(g) lim Zk ? 1 + c a.s.


k-.

Keep in mind that n k = ((1 + O) k ) where 0 will be specified below to have a


very large value. Now a quantity we are really interested in is related to Z k by

S(nk+I, lank + t) — S (nk +l, ( i- 1 ) a n k +t)


= max
l , M^k +t 2nk+1ank +t logt nk +l

(h) nk +l — nk
Zk
nk +l

—F S(nk,
max
1,i M k+t

lank +t) —S(nk,

2nkank +t logt
( i— 1)ank +t)

nk +l

MODULUS OF CONTINUITY FOR KIEFER PROCESS K,, 565


Thus (g) and (a) applied to (h) give us

l+c— l+c a.s.


(i) lYk
in — F^—
+0 0
^ I+B

for arbitrary large 0. Thus

(j) lim Yk >_ l+c a.s.


k-+

and this in turn implies (d).


We now turn to the lim inf in (7). Let

(k) nk = ((1 + 0) k ) for sufficiently small 0— 0 E .

We will show that

(14) lim sup W "k (C _/ a.s.


) when cE [0, oo) in (5)
k-.x' ICI=a,, 2 ank 10 82 nk
for any sequence of Brownian motions W".

Just replace log (1/a") by log 2 n in the definition of B. at line 0) of the proof
in the c = co case, and obtain at line (k) of that c = oo proof that for any
0<0<1 and CE(0,00)

P(B"
) 1
exp ((log nk)`^k-U-e) . )
1
if r–/
exp ([k log (l+
exp (–d'k') for some d, d'>0
(1) _ (a convergent series, by the integral test).

Thus for c E [0, cc) we have (the case c = 0 is trivial).

(m) the lim inf in (14) is >J a.s.

We also note from Levy's theorem (Theorem 14.1.1) [or (8)] that

(C) S( C)
(15) sup W = sup -.p when c E [0, oo]
SCI' a^ 2a" log 2 n IcI=,, 2a" log (1/a")
for any sequence of Brownian motions W".

Combining (m) and (15) gives (14). Minor changes in (m) and (15) give (14)
566 OSCILLATIONS OF THE EMPIRICAL PROCESS

with J Cl = a replaced by ICl^ a. Also, this latter version of (14) implies that

(n) the tim inf in (7) is <_ f a.s.

We will now show, completing the proof, that

§ + (n, C)
(16) lim sup ?^ a.s.
n-.00 ICI =a„ 2na log 2 n

To go from (14) to (16) we must "fill in the gaps." Defining k by n k <_ n < nk+,,

original term

term 2

term 3 ^,
nk

t t +ant
+a.R,

term 1
Figure 1.

we note from Figure 1 that

t +a n )—S(n, t)]
iim sup [S(n,
n^w075[^I-a^ 2nan 1og 2 n

[S(nk, t +ank) -S ( ak, t)]


lim sup
k-.W O -rs1-a„ k 2nk+lank logt nk +l

I S(nk, t + a.) - S(nk, t + a nk )I


— lim sup
n-.m Os [s l -a„ VLnkank logt nk

^S(n, t +a n )—S(n, t)—S(n k , t +an)+S(nk, 1)1


—Ti sup
n-.ao 0^[<1- a 0 2nk an, log2 nk

—^ l+c a.s.;
(o) 1 +9^ l+e l+c
MODULUS OF CONTINUITY VIA HUNGARIAN CONSTRUCTION 567

the first term in (o) comes from (14), the second term comes from (a) [note
that the length of the intervals in the supremum is a" — a" k and that

0<
a,—a" ^1— na"
(P) nksl—nk^l— k
a" A nka", n n nk+I 1+0

so that (a) can be applied on the subsequence n k with interval length of order
(0/(1+9))a" k ], and the third term in (o) is similarly implied by the proof of
(a). Since 0>0 is arbitrarily small, (o) implies (16). ❑

Corollary 2. Show that if the c„ of (4) satisfies c" -* c E (0, co), then

B
(17) lim sup "(C) =f a.s.
^ck"„ J2lCI log 2 n

Proof. As with Theorem 1, we need only prove (17) with Ia n replaced by


S(n, • )/v and then apply (9). That the lim inf is a.s. ? f follows trivially
from (16). That the lim inf is a.s. =V then follows trivially from (15). ❑

4. THE MODULUS OF CONTINUITY AGAIN, VIA THE


HUNGARIAN CONSTRUCTION

In Chapter 12 we saw that for the Hungarian construction, the empirical


process U,, of the first n independent Uniform (0, 1) rv's of a single sequence
6, , f2 , ... could be well approximated by a Kiefer process aö to the extent that

satisfies

/ (log n) 2
(2) Ti m !;U" —B„ some M <oo a.s.

In Theorem 14.3.1 we learned that

(3) lim WB " (a ) = 1 a.s.


"

2a" tog I/a„)

provided a" \‚ na,,!', and log (1/ a")/log 2 n- oo as n -* oo. Combining these
should tell us something about a,"(a"). It does yield Theorem 14.2.3 but only
part of Stute's theorem (Theorem 14.2.1) (see Proposition I below). These
results are from Mason et al. (1983).

568 OSCILLATIONS OF THE EMPIRICAL PROCESS

Proposition 1. Let a" > 0 satisfy

(4) (i) a" \ and (ii) na/

and

(5) (i) log(1/a") - oo and (ii) (log n) 2


log e n • 2na" log (1 /a")

as n - oo. Then

^,(an)
(6) lim = 1 a.s.
2a" log (1 /a")

Condition (5)(i) requires a" to be small. Condition (5)(ii) prevents it from


being too small; it is more severe in this regard than is (14.2.4). Note that
a" = (log n)'/n satisfies (14.2.3) and (14.2.4), but not 5(ii).
Proof of Proposition 1. We note that

4 W 1(a ) _ jun(C)I
2a„ log (1 /a") IcIP„ f2a" log 1/a")

(a) scup„ 2a" log (1 /a„)


! log en by (
+Ot a.s. b (2)
2nalog(1/a"))

(b) = sup I^ "(C +o(1) a.s.


)) by (5)(ii).
jcjt"„ 2a” log (1 /a")

Applying Theorem 14.3.1 to (b) gives (6). ❑


Proof of Theorem 14.2.3. As in the previous proof, we write

w(a) IB"( g)l +0 loge n


(7) = su
a" log 2 n Icj p„ 2a" l0 2 n (vf2na l,oge n

JBn(C)j +o(1)
(c) = SU p
^cI_o„ v2a " Iog 2 n

and apply Theorem 14.3.1. ❑

Remark 1. We would like to replace the second condition in (5) by


(log n)/ 2na" log (1 /a") - 0 as n - m; indeed, if we could replace (log n)2/v'
EXPONENTIAL INEQUALITIES FOR POISSON PROCESSES 569
by the known best possibility (log n)/V1 in (2), then this would be possible.
With the present condition (5), the sequences

(8) an-n-s with0<8<1 and a(2±E) (21og 2 n)/n

satisfy (5). The sequence

(9) an =c„(logn)/n whenc„?co

fails to satisfy the present (5), but would satisfy the improved (5). The sequence

(10) a„ = c„(log n)/n with lim c„ <oo

fails both versions of (5). Let us emphasize

if (2) is ever improved to the extent of replacing (log n) 2 by log n,


(11) then our proof of Proposition 1 provides another proof of Stute's
theorem (Theorem 14.2.1).

[There is another way to look at this: if (2) is ever improved to rate (log n)/v,
then our proof of Stute's theorem (Theorem 14.2.1) implies Theorem 14.3.1
via the improved (2).]

5. EXPONENTIAL INEQUALITIES FOR POISSON PROCESSES

The One-Dimensional Poisson Process

In this section, we present an analog of the James-Shorack inequality


(Inequality 11.1.2) and the Shorack and Wellner inequality (Inequality 11.2.1).
Note also Inequality 11.2.2. These are from Shorack (1982c).

Inequality 1. Let {(t): t ? 0} denote a Poisson process. Then

(1) P(II(r%J—IYIIo/./i?A)exp ‚ ( --.-


47b l)

for A >0 in the ”+" case and for 0< A <_ f in the "—" case.

Inequality 2. Let q / and q(t)/' \, for t ? 0. Let 0 <_ a <_ (1— 8)b < b <_ S < 1.
Then
b 2 2 \
(2) P(II(N—I)'/g)lä^A)^s a rexp(—(1—S)y } q i t)
1 dt,
570 OSCILLATIONS OF THE EMPIRICAL PROCESS

where
‚(
(3) y =1 and y + =‚ A (a) f.
a /

Proofs. Now {exp (±r(I^(1) — t)): 0 <_ t<_ b} are both submartingales. As in
the proof of Inequality 11.1.2

P (II( N — I) * IIo—A)=P(Ilexp(±r(N — I))IIo>—exp(ra)) for all r >0

<_ inf exp (—rA)E(exp (±r(t^(b) — b))) by Doob's A.10.1


r >0

(a) <_ exp ( — (A 2 /( 2 b))+(i(±A /b))

as in the proof of Inequality 11.9.1. Now replace A by A f to obtain (1).


For Inequality 2, just reread the proof of Inequality 11.2.1 noting that since
the factor (1— B' ')/(1— e' - ') is not needed at line (e) of that proof, the factor
of 0 is not needed in line (f). Thus we only need (1-6) in (2) and (3), whereas
Inequality 11.2.1 had a (1— 6) 2 in both places. ❑

Exercise 1. Prove that 'i,, of (8.4.2) converges weakly to Brownian motion S


in !l/II if and only if q is as in Chibisov's theorem (Theorem 11.5.1).

Exercise 2. Show that for each real k> 0,

(4) P( II (N — I) /vT o° " ^ E log e n)-'0 as n -co

for all s > 0. [Hint: Set q(t) = f, b = b„ = logs' n, and a = a„ = (log n) - ' in
Inequality 2 to handle the interval [a., b„]. Then letting A.
[ II (N — I)/ Vi II , > e log t n], we have for the first arrival time q, of that

P(A„)<P(A . n[T1 > an ])+P(''l1sa)

.5P(V?e log e n)+1 —exp(—a„)= 0+1 —exp(—a„)


-'0.

Combining these gives (4).]

The Centered Poisson Process

It is routine to mimick these results completely for the centered Poisson process
(s >0 is fixed) defined by

(5) L(t) =[(s, t) —st]/./i for 0 <_ t _< 1.

Recall N(s, t) is a two-dimensional Poisson process.


EXPONENTIAL INEQUALITIES FOR POISSON PROCESSES 571

Inequality 3. For each fixed s> 0 and each b> 0, we have

(6) P(JILs^^0 A)<_exp\ ^


2 2 \^ bs //

for A >0 in the "+" case and for 0< A <J in the "—" case.

Inequality 4. Let q / and q (t )/ f \ for t > 0. Let 0 <_ a (1— S) b < b <_ S < 1.
Then

(7) P(Ill-5/q^^a-a)^^ a
6

ieXp{ (i — — S)y 2
}
2 2
q I 1
t) }dt,

where
(1)
(8) y 1 and y + = +p t Aä )

Exercise 3. Follow the details of Inequality 1 and Inequality 11.2.1 to prove


Inequalities 3 and 4.

The modulus of continuity w.. of the centered Poisson process 1, is defined


by

(9) w(a) = sup jl_5(C) = sup Il-s(tl, t21!.


ICIsa Qsf2-r1a
0s" 1 s1251

Recall that C denotes an interval (t 1 , t 2 ] with ICI = t 2 —t.

Inequality 5. Let 0< a ^ 5 < 1 be given. Then

(10) P(w (a)?,1.ä)<-S3exp(—(1—S)


2 ui( sa)/

for all A>0.


Proof. We have written out the details of the proof of an analogous inequality
for U,, (see Inequality 14.2.1). Read that proof verbatim, noting only that the
extra (1 — b) again cancels out. ❑

Proposition 1. Let `ii 2 = o [D_,(t): 0 <_ t <_ 1 and 0 <_ r<_ s]. Then

(11) (fw s (a), %), s?0 is a submartingale

for any fixed a > 0.


572 OSCILLATIONS OF THE EMPIRICAL PROCESS

Exercise 4. Prove Proposition 1. Recall Proposition 14.2.1.

Applications of the above exponential bound will often occur in conjunction


with the following maximal inequality.

Inequality 6. Let x> 0. Let sequences 0< a s < 1 and A s > 0 be given that satisfy

(12) (i) a s " 0 and ,1 5 /, (ii) sas /,

(iii) d s / sas - d, (iv) . ,A s > 2 log (1/ a s

for some 0< d <. Let e >0 be given. For S k = (1 + 0) k we can choose 0 = 6r, a
so small that for k >_ some kE d we have

`Ur(ar) ``'sk +,((1 +B)ask) > x+e ASk )


>x +2E) <_2P(
(13) PI sup
sk1r<sk Vu r A r \ a 3k (1 +0) 1+9 J
Proof. We could ask the reader to depend on the proof of the maximal
Inequality 14.2.2 for U,,. Though we are covering the same ground, we feel it
best to write out the details here.
Let e >0 be given. Let

(a) S ° Sk -1, S — Sk, S — Sk +1

and define

(b) T= inf {r? s: w r (a r ) ? (x +2s)V,a r }.

We also define

Br= sup J ^ Ls(C) — Lr(C)I <Kf^L r forssr^s


(c)
1^Cl^a,
^

s —r ✓ar J
(d) _ [w,-r(ar) < KV OA r \ a

for a very large constant K = K f d and a very small constant B =B Ed to be


specified below.
We will now show that we can choose K and 0 so that

(e) inf P(Br )?2.


s5r5s

EXPONENTIAL INEQUALITIES FOR POISSON PROCESSES 573
Now for <- r <_ s we have from (d) and Inequality 5 with 8 = Z that

< 160 p
P(B;) a,
( 1)
(_ !^fOAK',1,
2„

eX 16 s-r JJ
/ z
! exp j - -j--0 log (1/a,)0 (Ka.)) by ( 12 )(iv)
<_ 60

since s-r=r(s/r-1)?r(s/s-1)=r9 and ifr is %


K20
^ 160 exp ( - log (1/a,)ir(Kd )) by (12)(äi)
a,
160 Ko(loK)
g
—exp ( - /a.)) by (11.1.8)
a, 4d log ( 1 /
160 ( Ko(1oK) i )
^ exp - og (1/a.) J
8d
160
=—exp(-G log (1/a,))=160aG - '--160af
a,

(f) <_ 2 if G is chosen large enough

since we now specify

D E2 ( 32GA
(g) K=KE,d= ^ 0 Oe ,a-
D=DE.d-sexpt
,
EJ
4D'
Thus (e) holds.
Now on the event [T = r] n B, we have
ra, sa s
w s ((1+0)a S )>ws (a, ) since a,=—<—=(1+0)a,
r s
1 s(C)
1, -
VD_.(C)j
(h) w.(ar) - sup
s s ^C^^ a , s-r

(i) (x+2e)A, ✓ä,- F KVA,V by (c)

=[(x+2E)—KV (1+8)2-1}Af- since =(1+6) 2


e
[(x+2e) -2KO]A,✓ä,
1+6
6) >_ (x+ e)A, V by (g) if was sufficiently small
(k) ? (x + e )A, V since A, / and a, \ .
574 OSCILLATIONS OF THE EMPIRICAL PROCESS

That is,

(1) [T= r]n B,c Ds =[co s ((1+6)a 5 )>_(x+E)A s V] foralls<_r^s.

We thus have, using (f) in step (m),

^x + 2 e) = 1 P(s<T <
_s) by(b)
2 P \ sup s VA,
)
2

(m) = P(s<_ T^ s)[ inf P(B,)]<_

< P(D)
SSrSs
f s
^
P(B r ) dFT (r)

(n) by ( 1 ).

This is what we sought to show. ❑

Theorem 1. The modulus of continuity cv_ t of L S is such that

(14) c4 , a s , s may replace w,,, a, n

in the upper bounds of Theorems 14.2.1 - 14.2.3.

Proof. Our proof of Theorems 14.2.1 - 14.2.3 depended only on the analogs
of Inequalities 5 and 6. ❑

Corollary 1. Equation (14) continues to hold if w, instead denotes the


modulus of continuity of either fill s or N, ; see Section 8.5.
Proof. The result for 1, and the identity

(15) M,(t)=L,(t)—tl_ s (1),

where L,(t)=[RJ(s, 1) — s]/f=O( 21og 2 s) a.s. by the LIL combine to give


the result for ftv ,. Then

(16) N,(t)=1y0,(t)/ N (s, 1)/s


-

where NJ(s, 1)/s --> a. ,. 1 by the SLLN and the result for M, give the result
for N,. ❑

The Poisson Bridge

In (1) and (2) we presented analogs of the James-Shorack inequality


(Inequality 11.1.2) and the Shorack and Wellner inequality (Inequality 11.2.1)
for the centered Poisson process N — I; we now present these same analogs
for the Poisson bridge I S , though we could get by without them.
EXPONENTIAL INEQUALITIES FOR POISSON PROCESSES 575

Proposition 2. Consider ., for fixed s > 0. Then

(17) {fA . (t)/(1— t): 0<— t < 1} is a martingale.

Proof. SUI = fill s . Let Wr = o [NA(r'): 0 <_ r' <_ r]. Knowing Ml (t) in any neighbor-
hood of t = 0 gives knowledge of tkJ(s, 1) since v(I) is a pure jump process
plus a linear function with slope proportional to N(s, 1). Also, the conditional
distribution of N(s, t) given IN, is (s, r) plus a binomial ry with s, 1) —IN(s, r)
trials and probability (t — r)/(1 — r) of success. Thus for 0<_ r< t < 1

1 )_ {F (s,r)+[f`I(s,1)—N(s, t)](t—r)/(1—r)}—tN!(s, 1)
(a) E ( M(1)
1—t ^' (1—t)/

N(s, r)—r(%(s, 1)
(1—r)

(b) =^(r) ❑
1—r

Inequality 7. Consider any fixed s > 0. Let 0< b < 1 be fixed. Then for all
A >0 we have

2exp ` -
(18) P \Iii — iio > 1 — b)^ 2b(I b)^(^b(1—b)//

Proof. Now both of {exp (±r(^II_,(t)/(1— t): 0<_ :<— b} are submartingales by
Proposition A.10.1 for any real r. Thus, for M = M,,

P( INN t /(1— ')IIö — A/(1— b))


\\
<_ P( sup exp (± rM( t) l >exp ^ M )) for all r>0
0^1^b 1 — t 1— b

ri — b) ))
nf exp (_ j - - )E( exp (t by Inequality A.10.1

(a) = inf exp (—rA)E(exp (±rM(b)).

Now

(b) M(b) = [(1 — b)i0i(s, b) — bt\!(s, (b, 1])]/.ls,

where we have represented (b) as a linear combination of independent


Poisson rv's having means sb and s(1 — b).

(19) The differencing in (b) prevents separate cases for RA and M 5 .


576 OSCILLATIONS OF THE EMPIRICAL PROCESS

Since a Poisson (0) has moment generating function

(c) exp(0(e' -1)),

the log of the moment generating function of M(b) is thus

sb(e(1-e>ri ✓ s_1)+s(1—b) e -b.i ✓ s_I)

^sbr (1—b)r + (1 —b)2rz + (( 1_b)r\

L VJ 2s k=3 VJ /k /k'J
2 2
br + b r
+s(1—b)
[— ,Is- 2s
k
b(1— b)r2 (rl^)
+sb(1 —b) -
2 k=3 kl

(d) =sb(1—b)[e'l ✓s-1_ ].

Letting

g(r) = — Ar +sb(1— b)
(e)
L e' / 7 -1

our bound in line (a) can be expressed as

(f) exp (inf g(r)) = exp (g(rmjn))•

Solving g'(r) = 0 for r,, ;n and plugging into (f) gives (18). [Note that this very
same calculation is performed in going from line (c) to line (d) in the proof
of Bennett's inequality (Inequality A.4.3) if we replace the na g and b of that
proof by the present b(1— b) and 1/f.] ❑

Exercise 5. Verify that the bound (f) in the previous proof reduces to (18).

Note that (18) can be rewritten as

(20)( i' 11—I


I J6
o A)
<_2exp\— (1—b)2
A
z
2b( 1 —b) +,\^b
A //

for all A > 0. On the rhs of (20) the (1— b) 2 does not matter at all (since we
typically work with b's that are less than the generic e > 0), and the rest of
the bound is the exact analog of (11.1.25). Not surprisingly, we can prove the
same theorems that (11.1.25) yields. [In fact, the unwanted (1—b) in the
argument of , in (18) will typically cancel out.]
EXPONENTIAL INEQUALITIES FOR POISSON PROCESSES 577

Let q denote a continuous nonnegative function on [0, 1] that is symmetric


about t = 2 and satisfies

(21) q(t) / and q(t)/ /i\ for0<_ t<_2;

thus q E Q.

Inequality N. Let gE Q. Let O<a<(1-8)b<b<S<Z. Fix s>0. For all A>0,

(22)
b

I'(IIM/gllä?A)<^ a t exp(—(1—S) Z y 2 q t) ) dt, t


2 2

where

(23) ,v° 0 ( ag(a) 1

Proof. Reread the proof of Inequality 11.2.1 in the A„ case verbatim. Note
that at line (i) of that proof the extra factor (1 — b) in the argument of r' in
(18) even cancels out, so that we obtain exactly the same result here as we
did there. ❑

An Application

Given these inequalities, it is easy to use the conditional representation of


(8.4.4) and the SLLN to provide an alternative proof of Chibisov's theorem
(Theorem 11.5.1).

Exercise 6. Provide the alternative proof of Chibisov's theorem described


above.

Weak Convergence of ICJ.' to f r'

Let , ... , „ be independent Uniform (0, 1) with empirical df G,,. Define a


process N. by

(24) >nc
N (x)= {n G,(X/n) if 0n

Exercise 7. Show that the left-continuous inverse N„' of N. satisfies

(25) =N-' on (D r , 2 n (1 (1 ) as n- cC,

where T= [0, co) and II II. is defined appropriately.


578 OSCILLATIONS OF THE EMPIRICAL PROCESS

6. THE MODULUS OF CONTINUITY AGAIN, VIA POISSON


EMBEDDING

Recall from Section 8.5 that N(s, t) denotes a two-dimensional Poisson process,
that for each fixed s > 0

(1) ^S(t)=[N(s,^ —st]


fort?0

is a centered Poisson process having independent increments and

(2) v5(t) = [ ^ (S' t) N(s' 1)] =L 5 (t)—tL s (1) for 0<_ t <1

is a Poisson bridge. We also defined

t) —tN(s, 1)]
(3) N$(t)—[N(s, for 0<_ t <_ 1
(s, 1)

and waiting times

(4) Sk = inf {s ? 0: N(s, 1) = k} fork >_ 0

and we observed that

(5) (^lst,Ns,,...)=(v1,vz,...)

for empirical processes U,, U 2 ,... of a sequence , of independent


Uniform (0, 1) rv's. Finally, we recall that

(6) Yk - Sk — Sk _, fork >_ 1 are iid Exponential (1) rv's.

Lemma 1. The upper bounds in Theorems 14.2.1-14.2.3 hold if w „, a,,, and


n cx are replaced by w5 = O) N ,, a s , ands oo (see Section 14.5).

Over the years many results for U. have been proven by one or another
form of Poisson representation. In this section, we are considering some of
the more difficult results for UJ,,, results dealing with its modulus of continuity.
This is our vehicle for a careful discussion of Poisson embedding, which is
the real purpose of Sections 14.5-14.7. The basic point we wish to make here
is that

(7) I Theorems 14.2.1 - 14.2.3, and many more could just as


well have been proven via Poisson embedding.
MODULUS OF CONTINUITY VIA POISSON EMBEDDING 579

We did not choose to do so. Instead we highlighted the proof of Section 2


based on the exponential bound of the inequality of Mason et al. (Inequality
14.2.1). If we had chosen to follow the approach of (7), then Shorack's
inequality (Inequality 14.5.5) would have been highlighted in a proof of Lemma
1. Suppose for the moment that we had done this. Our next step would be to
say that since S k - co a.s. by (6), we can use Lemma 1 and (5) to immediately
and trivially claim that

(g) (all of the upper bounds given in Theorems 14.2.1-14.2.3


t have just been reestablished via Poisson embedding.

Why did we follow the approach we did? Using our approach, the inequalities
used apply directly to U. ; following (7), the inequalities developed would
apply instead to N. (Since these inequalities are interesting in their own right,
we put them in Section 14.5. We also note that Lemma 1 has independent
interest.) Finally, with regard to the lower bounds in Section 14.2, we observe
that Theorem 14.2.1 was proven via the crude conditional Poisson representa-
tion, Theorem 14.2.2 will be proven later in this section via two-dimensional
embedding, and Theorem 14.2.3 could have been proven via a Poisson rep-
resentation.

Exercise 1. Reprove the lower bound of Theorem 14.2.3 via Poisson rep-
resentation.

There is still one major objective to be accomplished in this section. We


will offer a proof of the lower bound in Theorem 14.2.2. This is another
application of Poisson embedding. (Recall the comment in the first paragraph
of the proof of the Theorem 14.2.1 lower bound.)
Proof of the lower bound in Theorem 14.2.2. We will consider the process D_ 5
of (1). We let

clogs
(a) M=Ms=(1/a,) whereas =
s
Now define

(b)
E
B5= t ma^
±L,((i-1)a,, ia s ] 1
<_,t5=(1—e)r 21og1/a s .
]

Then, using independent increments,

t[Poisson (sa,.) — sa,,] 1 4


(c) P(BS)=P( _, <(1—E)r 21og^S,
sa
( [Poisson(sas)—sa,]
E
^—PI
l \
sa=t(1—e)r
S
2loga
1
J,
l
J
580 OSCILLATIONS OF THE EMPIRICAL PROCESS

1 ± (1-e)r log
2 (1 /a ) J "'
(d) <- [ i -exp ( -(1- e)r2 \log
as/ \ Sas ))J
by (11.9.19)
E ) r 2 'o] M y_dr 2 ko).,. exp ((-
_ay (I -_ E)r 2
ar ))
[1- a (I-
S -exp (-Ma S
_ 1/ (s log
e /C

Thus, on the sequence of integers Sk = ((1 + 0) k ) for some 0>0, we have

(f) P(Bk) = (a convergent series), provided 1- (1- e )r 2 +j<> 0;

and this is a convergent series for any r exceeding the solution R of the
equation [the argument of r1 in (d) is of the order t(1 - ).J/ r]

(g) Rzci(± 2 /cR) = 1.

As in line (n) of the proof of Theorem 14.2.1, the solution of (g) is

(h) R += c/ 2 (ß' -1 ) or R - = c/ 2 ( 1- ß)•

Thus

±L, (C)
(i) lim sup R'.
k- lCl=a rk sJ2aS k log (i /asp)

We now leave it to the reader as an exercise to "fill in the gaps" and show
that (i) can be extended to

±1 (C)
5

ü) lim sup ? R a.s.


s-m ^cI=a, ✓2a log (1 /a 5 )
5

[Just follow the outline of step (o) in the proof of Corollary 14.3.1 for the
case of c E [0, co). Step (o) calls for use of the lim sup result: for this you can
appeal to Lemma 1.] Thus, combined with the upper bound of Lemma 1, we
have

±L,(C) t
(k) lim sup = R a.s.
s-= cI=a, 2a log
5
(1 /a 5 )

Thus, note the justification of Theorem 14.5.1,

±M (C) t
S

(1) lim sup = R a.s.


s-s^ IcI=a, 2a log (1 /a 5 )
THE MODULUS OF CONTINUITY OF v„ 581

Thus i(s, 1)/s a.s. I gives


±N(C) f
(m) lim sup = R a.s.
5-.w IcI=a., 2a 5 log (1 /as )

Since the rv's S k of (4) satisfy S k -* oo a.s., (5) gives


±U k (C) t
(n) lim sup = R a.s.
k-.00 ICI=as k 2 aSk log 1 /ask)
(

But from (14.2.10)


(o) ask k log (k(Sk /k)) -4 1
a.s. by the SLLN,
a k ^Sk logk

and so the above implies


±Uk(C) t
(p) lim sup = R a.s.
k-.w ICI=ak 'J2ak log (1 /a k )

(One has to use again the same argument used to "fill the gaps".) This completes
the proof of (14.2.11). For (14.2.12), you can replace I(i in (d) by 1 so that (g)
is replaced by R 2 = 1.
Our thanks to D. Mason, J. Einmahl, and F. Ruymgaart for pointing out
the earlier erroneous statement of Theorem 14.2.2, which was the result of our
treating the "—" case incorrectly here. ❑

Exercise 2. Write out the details of the proof of step 6) above.

7. THE MODULUS OF CONTINUITY OF V.

We follow Mason (1984a). We consider both the modulus of continuity wo,,


of V. as well as

(1) 6(k) = max max , I en:i +j — Sn:i — n I .


We would expect, correctly, these two to have the same behavior.

Theorem 1. Let 1 <_ k„ n with k„ >' . Let a s = k„/ n.

(i) If

(2) k ' -+0 as n -oo


log n
582 OSCILLATIONS OF THE EMPIRICAL PROCESS

then
ns"(k")
= n-li - log n = 1 a.s.
(3) h^ W log n

(ii) If (in analogy with Theorem 14.2.2)

(4) k" -*cE(0,00) asn-*oo,


log n
then
✓ nwv„(k„/n)nS„(k„) c 1
(5) ti m g
2c lo n
= lim
n-.= lo n

2
c a.s.,
g =J(__1 '

where y+ < 1 is the solution of (see sections 10.5 and 10.9)

111
(6) & — + J=
( with h (,l) =,1 —1—log a.
yC C

(iii) If [these are similar to the conditions of Stute's theorem (Theorem


14.2.1) ]

(7) there exist 1' sequences 0< d„ <_ k„ ^ d„ with d„/ n [ 0, d',,/ n J, 0
and d/d„--* 1,

log (n/k„)
n ,co (k„ is not too big),
(8) lo gz

(9) 1 k n --0 (k„ is not too small),

then

n6(k)
(10) lim mow, ^ (k" /n) = lim =1 a.s.
„ 2k„ log( n/k„) „-- 2k„ log (n/k„)

Boundary sequences for (8) and (9) are

(11) k„ n/(log n)`^ with c„ -- co and k„ = c n Iog n with c„ -* cc.

Exercise 1. Prove Theorem 1. Recall (11.5.16) that

n+l
V [ Sn+l( — t^„+^(1)+ n+1 (n+lt
(12) n+1 „+^
”/J n+l )
THE MODULUS OF CONTINUITY OF V,, 583

for (i — 1)/ n < t <_ i/ n, 1 <_ i <_ n. Thus the modulus of continuity of V. is
asymptotically identical to that of the partial-sum process S. of iid rv's
X, , ... , X„ where X i + 1= Exponential (1). Since an increment of S. of the
form S„(i /n,j /n]= [Gamma (j — i) — (j — i)] /Vi, Eq. (11.9.33) shows why (i
turns out to be the key function. Model your proof after the proofs of Section
2, checking Mason (1984a) for the appropriate maximal inequality.

Exercise 2. It would seem natural to state (7) as k„ / while k„/ n \0.


However, as in a communication due to E. Häusler, no such sequence k„
exists. Show this.
CHAPTER 15

The Uniform Empirical


Difference Process ®„ = U,, + J,,

0. INTRODUCTION

The empirical difference process D,, is defined by

(1)

(2) =U,,—OJ„(QD„')+.^[Q^„-Gn'—I].

Our main result is the Kiefer-Bahadur theorem of Section 1 that establishes


that jjD n JI is a.s. of order ((log t n)" 4 log n/f). More precisely

(3) a
limn i'n 1^.s.
g°II _v
n-.ao v'

This will allow functionals of V. to be treated as functionals of U,,. In a later


chapter, a LIL for a linear combination of order statistics will be established
in this fashion.
In Section 2, Vervaat's integrated empirical difference process W.
2Vi Jo D. (s) ds is considered. Both = and . are established.

1. THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D.

We define the uniform empirical difference process ®„ on [0, 1] by

(1) Dn=U,+Vn;

here V. denotes the uniform quantile process. Figure 1 illustrates the sense
in which D. is a difference, and thus shows why we expect it to be smaller
THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D. 585
1 ii

N(t)/Jn /.-
IDn(t) /✓n

/ n(t) /.r
-

-1V n (t)/^n

0 t c '(t) 1 0 c 1 (t) t 1

Figure 1.

than either U. or N „. We also note from Eq. (1.2.9) that


(2) ®„ =U, —UJ n (G„')+.%n[G„ o6 — I],

where the contribution of the second term is small since

(3) ViIIG.°Gn'—ICI<— .

From Eq. (2) we see that D. is naturally associated with the modulus of
continuity a n of v „. By virtue of Smirnov's theorem (Theorem 13.1.1) we have
JIQ^„' — I^^
1 -

(4) l^m a.s. where b „= 21og 2 n;


b„/ — 2
and thus a study of U,, for intervals C whose length C l is of order a„
(2±E)b„ /V should give information on the D. of (2). N ote also Stute' s
theorem (Theorem 14.2.1). It suggests [2a„ log (1 /a„)]`h12 ..b„(log n) /(2'fi)
as the order of in (2), and this is exactly right.
The study of ® „(t) was introduced by Bahadur (1966). Kiefer (1967) and
(1972) obtained the results for the process D. that we present here. His proofs
of these results are long and difficult; the approach we follow here is due to
Shorack (1982a), and it suffices for Theorem 2 and the upper bound in Theorem
1. We also present an elementary proof that lira in (6) is a.s. bounded by '/;
the fact that this lim is a.s. finite is probably the most useful result of this section.

Theorem 1. (Kiefer) We have

n h /41 B'
(5) a.s_ 1 as n -> o0.
OJ„ log n
Recall that JAN„ 11 = IIv„ 11.
586 THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D,=-U,+N„

Theorem 2. (Kiefer; Bahadur) Let b„ = 2 log e n. Then

(6) lim n1/4II®„ ll —_ 1 a.s.


n-^ 6„ log n f

Corollary 1. (Kiefer) We have

(7) as n -> oo,

where

(8) P(vlji ff > x) = 2 (-1) k+ ' exp (-2k 2 x 4 ) for all x>0.
k=1

Bahadur's (1966) name is listed on Theorem 2 because he initiated the


study of 11 (t), and his proof of a result for fixed t yields an upper bound of
fin (6).
Note that Corollary 1 is an immediate consequence of Theorem I and
Example 3.8.1.
We first extend Theorems I and 2 to somewhat more general smooth df's
F on the unit interval.

Example 1. (Kiefer, 1970) Let X 1 , X2 , ... be iid F on [0, 1] where F is twice


differentiable with inf, F'(x) > 0 and sup, F"(x) < co. Let F. denote the
empirical df of X 1 , .. . , X,, and let f— F' and

(9) R (t)= i[1_,'(t)—F '(t)]f(F '(t))+-./[F„-F '(t)—t]


- - -

for 0-t<_1.

Let e —F(X) and define Q;; OJ 0/ n , ®„ in terms of these f l , ... , „. Thus


;

v„ = /[l=„ o F ' — I]. Then since F' = F - ' (G ), two easy applications of the
-

mean-value theorem give

II®n—RnD = IIV —v [F '(G,')—F ']f(F ') I


- -
-

- -

IIV^II II.f(F '(gn'))—f(F ')II/[inff(x)]


x

for some Q9 n' between Gn' and I


(10) Constant IAN„ II 116;;'— III
using the conditions on f and f' again
= O(n - " 2 log t n) a.s.

by Smirnov's theorem (Theorem 13.1.1). Thus

(11) Theorems 1 and 2 hold for the process R also;


THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D. 587

that is, we may replace Di,,, U,, by l8 n, U,. We will extend this in Theorem 18.2.3
to df's on (—oo, cc). ❑

Exercise 1. (Kiefer 1967) (i) Show that for fixed t E (0, 1)

n' /4 R(t) — [ 2t(1—t) ]" 4


(12) li 3 3 j a.s.
m (log t n) 3 / 4 L
(ii) Show that for fixed 0< t < 1

(13) P(n i/a U8# n (t)<_x) dy asn goo.


t(1— t) o / \ t(1— t)^

Exercise 2. (Duttweiler, 1973). Let 1 <_ i < n, p = i/(n + 1), and q = 1— p.


Show that
n+2 n+2
ED,(P)=2E(P—en:i)1[aI](P—fin:,) -2
\2 ' 1/2

since
E(p — n :i)l[o,,J(P — en:i) Elena pI {E(4n:i — — p) 2 }' /2
Hence, ifpn = i/(n+1) -+pE(0, 1),

ED n,(P) =n ' /Z (2Pq/rr)' /Z +O(n - ') as n-+oo.

Proof of Theorem 2 (See Shorack, 1982a). We will work with the Hungarian
construction of (12.1.7) for which
/ (log n) z
(a) ^ n some M <oo.
lim IIvn—lI
n -.'X

We now note that uniformly in t

n' /4Dn = n 1/4 [ Un — U,(Gn )] + n3 /4 [G. ° Gn — I1


' '

bn logn fb,, log n sJbn log n

Bn — Bn(G ') Un — IBn

/(b n /s) log n /Vn) log n

_ U,(G ')—I3n(G;')
+0(1) by (3)
(b //i) log n n

G '
(14) _ "—"( " ) +o(1) a.s. by (a)
b n (log ^) /✓ n

(b) = 1 -2e " — " ( ^ "') (l+0(1))+0(1) a.s.


22a n log (1/a,,)
588 THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D, U,+V,

for

1 1 1
(c) a„ = (2-2E) b„ Vi with (log ^_ ) —2 log n.

We will return to this presently.


Consider the function

t 0 t<2-2e
(d) hE(t)= 2-2e if 2-2e^t<_Z+2e
1—t 2+2E^t1.

—2s

Figure 2.
0 2-2e Z+2e 1

Let

(e) I{3—E,1+E].

Note that h e is a member of Finkelstein's class X. Chan's theorem (14.3.6)


and (14.3.13) with c = cc yield

[l"(t)-3„(t+h.(t)b„/.%n)]
(f) lim sup = 1 a.s.
, 2a„ log (1/a„)
EJE

Moreover, if g„ is a function on [0, 1] having

(g) 1Ign1I-4 0 as n-+oo,


then
[I,(t)—B,(t+[he(t)+g„(t)]bn/^)]=
(h) lim sup i a.s.
^-'' EJ, 2a„ log (1/a„)

follows from writing the term in (h) as the sum of the term in (f) plus another
term that contributes nothing. Now Finkelstein's theorem (Theorem 13.2.1)
guarantees a.s. the existence of a subsequence m = m 0, on which

Vm
(i) gm b — h E satisfies ^jg m I^ -* 0 as m -* oo.
m
THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D . 589

Note that for this function g m we have

[he(t) +m(t)]bm
G) t+ '(t).

Thus (h) implies

[lBm(t) — Bm(Gm ' (t))]


(k) im sup ? 1 a.s.
m^^ o=<<^ 2am log (1/am)

Applying (k) to (b) yields, since E > 0 is arbitrary,

(1) lim n'14lI®4il/ b„ log n >_ f a.s.

Trivial changes allow D„ to replace D„ in (I).


For the converse, we note from Smirnov's theorem (Theorem 13.1.1) that a.s.

(1+8)b„
(m) 11G ' — III <_ c„ ° 2'ifor all n >_ some n^,. E

for any fixed E > 0. Applying this to (14) gives

lim n1/4 II®n II


um sup I13n(t)—Ian(s)I
n- 1b, log n n- - b„(l og n )/ n

= lim sup (I3 n(t) — B.(s)I


2c„ log (1/cn) VT
(n) (1 + e )/ 2 by Chan's theorem (Theorem 14.3.1).

Combining (1) and (n) completes the proof.


We wish to give a second proof of (n) that does not depend on the Hungarian
construction. Now

hrn n1/4JI®„ 11
;—

n- b„ log n

<lim n1/4IIU " — " (G ') as above


"
n-.^ b„ log n
n 1/ n (JIG„' —111)
lim
„ -.m b„log n
nwn([(l+e)/ 2]bn/ ✓n)
(o) < 1
„ ^^sib, log n
590 THE UNIFORM EMPIRICAL DIFFERENCE PROCESS ®U„+0/„

by Smirnov's theorem (Theorem 13.1.1)


fiTE (1+e) b„
=tim with a -_
m 2a „log (1 /a„) 2 2
1+E
(p) = 2 a.s. by Stute's theorem (Theorem 14.2.1).

But (p) holds true for all E > 0. ❑


Proof of the upper bound in Theorem 1. We define

(a) a” = 1 /(Kb„.!n)

for a fixed large K. Then


4II #II
= L l i i s
JII ince
" ® III log n IIv” 1 1= IIV " 1I ^ ^^

g n 1
^^"II® lo-

(b) lim I^^„-U„(G


”-°°
')II
II -III log n
J„'

using (2), since

n- 1/4
lim n3 /4II6"°Gn'-III ^ lim
„-^ !IG -III l og n „ IIGlog

1 6„ 1 b„
(c) = lim < lim
n- b„ II 0/„ ^I log n lim b„ I4Nn II n -. to g n

2
— • 0 by Mogulskii's theorem (Theorem 13.5.1)

(d) = 0.

Continuing on from (b) we have

t)I
L^ lim sup IU"(s,
„ ,.o It
- -
n
sitllG ' IJI
- - IIG" ' - III log

jU „(s, t]i

1 m sup
”z-sa uiIG — 'ii log n

-.W
+lim sup IU"(s, t)I
a, IIG„` - III log n
THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D . 591
U,S,t I U,, s,t
(e) : tim sup b^ tim
m sup I ( ]I
II -.st a,, lo g n + I^-.^I>a„ 2(t—s) log ( 1 /an)

by Mogulskii's theorem (Theorem 13.5.1), using 7r/2> 1

/1 w,, (a,,) h
Wn(a,,)

(f) 4W 1i+
--v/2a log(I/a )
n n n 2log (1/an)

(g) =/i/K+1 a.s.

by Theorems 14.2.1 and 14.2.5. Since K is arbitrarily large, we have

(h) L < 1 a.s.

This is the upper bound in (5). 0

Self-Contained Proof of an Upper Bound

The previous upper bound proof is short and clear, but it traces back through
many previous deep results. For this reason, we present the following proof,
essentially Bahadur's (1966), of a slightly weaker result (15) below.
Proof. For constants d, and d 2 to be specified later, let

(a) a n — d n-1"2(log2 n) I j bn — n / 4
1 2 1,
,

c,— den-3/4(log n)" 2 (log 2 n) 1/4 ,

there is no loss in assuming that a, b n , c„' are integers. Fix is — tj <_ a n . Then
one of the intervals [0, 2a n ], [a n , 3a n ], [2an, 4a,,], . . . ‚ [1 —3a,,, 1— a n ], [1-
2a n , 1] must contain both s and t; and there are less than a„' such intervals.
Let In = [ t* — a n , t* + a n ] denote the interval containing s and t and let

tnj = t*+fa n /b n for III ^ bn

denote a partition of In . If

S E [t,,, tni+l ] and t E [ tnj, t,,1+1]

then

[6,,(t,,,) — G,, (t,,±) —(i,, — tn,j+l)] —2a,,/b,,

^6 n (s)-6 n (t)—(S—t)

/ n2a
^[G nn,i+l)
( t1 —Gn(tnj) —(tn,i+l—tnj)]+ ba
592 THE UNIFORM EMPIRICAL DIFFERENCE PROCESS ®, =U,,+0/„

Thus

sup {^G „(s)—G„(t) — (s —t)j:0 s, t <_1 and is—tj<_a„}


(b) max {!G (tn1 ) — G (t n1 )—(t ni —(t nj )1: t„ i and t are in

some In and Ii-jI^b n +2 }+ b”


n

M„+ b".
n

Note that each of the at most 4b n /a n terms in the maximum M n of (b) is of


the form

Binomial (n,p„ ij )—np nij l /n where a n <_p„ ij <_a „(1+1 /bn )

for all i, j. Thus inequality (11.3.3) gives

P(Mn ^ Cn ) Z P(I Binomial (n, pnij) — npnij l ^ ncn)

2 ex nc”
2
‚ \n
(
n )
_.. 1 b (11.1.3)
C ^ p 2Pnij(1— p) \Pnij y

/ 8b \ c„

^^ a„ } exp L 2an(1 +1 /bn) ^an^j

since Agl (A) is / and 4r is \; and it is easy to check that

P M
10 Z
lim g 2d2
logg n >cn)1- 1
(c) < -1,

where (c) holds since we now choose

(d) d2> 2v'd.

From (c) we obtain P(Mn ? c„) <co, so that P(M„ >_ c n i.o.) = 0; thus

(e) lim M„/cn <_ 1 a.s.

From (a) we obtain

(f) 2a „ /(b„c„)-*O as n->cc.


THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D . 593

Plugging (e) and (f) into (b) yields

(g) limcn'sup{IG"(s)—G"(t)—(s—t)I:O<s, t<_1 and Is—tI<_a"}


<_1 a.s.

Now let d, = 2 - ' /Z + e, and combine (4) and (g) to claim

U n—Un(Gn')11
(h) um jj --1 a.s.
V fl c n

Since IIG" o« n ' —1 cn -> 0 as n - oo is trivial, we obtain from (2) and (h) that

n" 4 d
(i) tim 1/2 n)'/4 llBnII ^2i/4 a.s.
(log n) (21og 2

Now recall that (i) holds for any d 2 exceeding 2v'd=2(2 - ' /2 +s)'/ 2 for all
e>0. Thus

(15) lim
n "4
b" ^I gnll s'./ a.s.

Our "easy proof" of (15) misses (6) by a factor of 2. ❑

The Order of D/q

Theorem 3. (i) Let a,, a 2 , d,, d2 , M all be nonnegative. Suppose


i
(16) a, + a 2 = 2 with a 2 <_,
(17) a,d,+d 2 ? 1 with d,>0

(18) a, d,>Zd, and c=1 , or a, d, +d z =Z+äd, and c=—ä.

Then

(19) him
n-w
n°2(lo g2 n ) `
(log )d2 I U +V I—M (logn)dl/n
II(1—I)]°1
some MZ <a^ a.s.

(ii) If (16) holds, then

n° lo )
l+° U +N I-9(io g n)/n
(20) lim Z(gz " ° Z <_ some M <co a.s.
log n [1(1—I)] , 9(log2n)/n

Note that replacing the denominator I° , of (19) by its minimum value


M,(log n) d l/n leads to a coefficient of IjU n +N nof n" ,+0 2/(log n)°id , +d2, and
594 THE UNIFORM EMPIRICAL DIFFERENCE PROCESS DU „ +N„

hence a simple proof of (19) based on this replacement and Theorem 2 will
not work. Also the case a, = a 2 = 41 , d 2 = Z, d, = 2 "just fails” to satisfy the first
condition of (18); but it does satisfy the second condition of (18), and is of
the same order as the Kiefer-Bahadur theorem (Theorem 2).
Proof. The proof is deferred until Section 16.4, where a necessary lemma is
established. ❑

2. THE INTEGRATED EMPIRICAL DIFFERENCE PROCESS

Vervaat (1972) considered the process

(1) W,(t)=2/i J r D n (s) ds for 0 t <_ 1.

Theorem 1. (Vervaat, 1972)

(2) W n =U 2 on (D, 2,11 11) as n -* a0.

Proof. From Figure 1 it is clear that

(a)J [G (s) +G'(s)] ds = t 2 + J r„


^'
[G. (s) — t] ds;

the second term on the rhs of (a) corresponds to the shaded areas in Figure
1. Thus

W n (t)=2Vi ® n (s) ds =2n r[


6(s)—t] ds

=2V n [Un (S) +v 11(s — t)] ds


G„'(1)

(3) =2Vn Un( s) ds—Vn(t).


E „'(t)

Introducing the special construction of Theorem 3.1.1 shows that (recall


V=—U) uniformly in t

1UN n (t)=-2V n (t)U(t)—V 2 (t) +o(1)


=- 2V ( t ) U ( t) —V 2 (t) +o(1)= 2U 2 (t)—U 2 (t) +o(1)
(b) =U2(t) +o(1) a.s.

and establishes the theorem. ❑


THE INTEGRATED EMPIRICAL DIFFERENCE PROCESS 595

Exercise 1. (Vervaat, 1972) Show that

(4) b_M► {h 2 : h E } a.s. wrt II I on D

for ° as in Finkelsteiri's theorem (Theorem 13.3.1). [Hint: Replace V, by —v n


in (3) above using the Kiefer and Bahadur theorem (Theorem 14.1.2).]

It follows immediately from (2) that

Tn = W„(t) dt- , j W=
(5)
J 0 0
UJ 2 (t) dt,

where W is the familiar Cramer-von Mises statistic of Chapter 5.

The Parameters Estimated Version

Consider now the estimated empirical process

(6) Ü,,(t)_/j[^n(t)—t] for 0<r<_ 1,

where e n is the empirical df of rv's

(7) enr = Fe„(X,)

as spelled out in Remark 5.5.1. Making the obvious analogy we define

(8) ®n = ü,, +4/ n

to be the estimated empirical difference process with

(9) 4/n(t)=I[Qv„'(t)—t] for 0<_ t<_ 1

the estimated quantile process. Finally, let

(10)MV n (t)=2V { ® n (t) dt for 0<t<1.


.1 0

ASSUMPTION 1. We assume that

(11) Ü,,= Üon(D,2,II II) asn-*oo

for some process 6 on the measurable space (C, `6) of Chapter 2.


596 THE UNIFORM EMPIRICAL DIFFERENCE PROCESS ® U„ +%/„

Establishing (11) was the subject of Section 5.5. It follows immediately


from (11) that

ll
(12) W„= UJn(t) dt^ d W= I U 2 (t) dt as n-> X.
0 Jo

The distribution of W was investigated in Section 5.6. Tables of its distribution


were provided for several different F.

Theorem 2. If (11) holds, then

(13) TT - *f(1)dt-4dW= I U'(t)dt as n-*co


0 Jo

for the ry W of (12) and Chapter 5.


Proof. The proof of Theorem I (with on everything) suffices, because (3)
is still valid and because Theorem 5.5.2 gives 0 = —6. ❑

Open Question 1. How good are tests of fit based on T„ and T„?
CHAPTER 16

The Normalized Uniform


Empirical Process Z,,
and the Normalized Uniform
Quantile Process

0. INTRODUCTION

The process

(1) 7L "(t)= U (t)


" for0<t<1
t(1t)
-

has mean 0 and variance 1 for each fixed value of t; hence it is called the
normalized uniform empirical process.
We note from Chibisov's theorem (Theorem 11.5.1) that = fails for 7L" and
from James's theorem (Theorem 13.4.1) that M ► fails for Z n / b" ; but in both
cases, the function I(1— I) "just missed."
In this chapter we will consider the rate at which Z,, blows up. Section 1
considers -* d, while Sections 2 and 3 consider -* a.s. . Analogous = results for
0/ n / I(1— I) are established in Section 1, while an a.s. result appears in Section
16.4.

1. WEAK CONVERGENCE OF ^17L" 11

In this section we consider the weak convergence of the normalized uniform


empirical process

(1) Z (t)= " (t) for0<t<1;


t(1-t)

597
598 THE NORMALIZED UNIFORM EMPIRICAL PROCESS d„

the name derives from observation that 7L(t)


has mean 0 and variance I for
each fixed t. The natural limiting process to associate with 1,, is

(2)ZL(t)= U(t) for0<t<1.


t(I — t )

It is obvious that

(3) Z„ f.d.Z asn- X.

[Note from the LIL (2.8.7) that 7L is not a random element on the measurable
function space (C, f) of Section 2.1.] Recall from (2.2.3) and Exercise 2.2.9
that the Uhlenbeck process X on (CR , WR ) with R = ( —c, cx)) is a stationary
normal process having

(4) EX(t) = 0 and Coy [X(s), X(t)] = exp ( — It — sI) for all s, t E R,

and that X can be represented in terms of Brownian motion S by letting

(5) X(t) = e - `S(e 21 ) for —oo< t <co.

Thus from Exercise 2.2.4 we can represent Z by

t)=
l(t) = -t)S( It §(^t t )

(I-t) t(1 - t)(1

(6)
— !1
^C`2 log 1 _
t 1
J for 0 < t < 1.

<
We now turn to consideration of the norm of 1. Again, the LIL of (2.8.7)
implies IIZ II = °Ö a.s.; thus we let 0< d„ <e 1 and consider ill (II. We define

=
(7) an
e(1—d
d„(1—e „))

where0<d„<e„<1.

We observe from (6) that


e 2 ' log (e" /(1—e,,))
^I II ä„ = {IXII 2 - ' log (d„ /(1-d „))

II X II 2 '' °g „”
- by stationarity of X

(8) = 11§ /✓ I 11 i” by (5).

Now the limiting distribution of the norm of normalized Brownian motion


^/^ was studied by Darling and Erdös; their results were presented in Section
WEAK CONVERGENCE OF flZ, I 599

2.11 and necessitate the introduction of some notation. We will need the
normalizing functions

(9) b(t) ✓21og2 t and c(t) 21og2 t+2 log3 t-2 log 4ar

with b„ = b(n) and c„ = c(n). We also let E„ denote the df of the extreme value
distribution; thus

(10) E„(x)=exp(–exp(–x)) for –oo<x<cc.

It is trivial that the df of the maximum of k independent rv's having df F is


F k . We also note that the extreme value df satisfies

(11) E„(x–log 2) = E v,(x) for –oo<x <oo.

We now state the Darling and Erdös theorem (Theorem 2.10.1) that

(12) b(t)11 7 11 1 –c(t) - d EU as t-moo

while
II
(13) b(t) —y1–c(t) - d E^, ast-*oo.

[The key to the Darling and Erdös proof was the representation (5) and (8)
of § in terms of X.]
From (8), (12), and (13) we see immediately that

(14) b(a„)^^ll$^^d,,–c(a„) -^ d E„ ifa„-*ac

and

(15) b(an)!lZII – c(a„) _+ d E2, ifa„-*co.

Our consideration below of the asymptotic behavior of 114 11 will allow us


to consider instead (IZ„ II a° with d„ = 1– e„ = (log k n )/ n and k=5. Now for
any choice of k> 1 we have b(a„)/b„- 1 and

b(an) b(an) b(an)


b c„–c(a)= b [c„ – c(an)]+c(an) b(n) –1 ->–log2
n n

via elementary calculations. Combining this with (14), (15), and (11) gives

(16) bn 11 Z' 11 d, „ -* d E v as n -g oo
600 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z.

and

(17) bn II Z II'e^ d ^ — c„ -*E v as n -4 oo,

where d o = (log k n )/ n for any k>1.

Exercise 1. Provide the elementary calculations mentioned in the previous


paragraph.

[Consideration of the Darling-Erdös proof and the use of stationarity in


(8) shows the four rv's b„ lI7 (^'d! 2 — c„ and b„ 1^7L' ^^ i^i ^ — c„ are asymptotically
independent with E„ marginals; thus their maximum should indeed be
asymptotically Eü as in (17).]
We are now in position to consider the main result established by Jaeschke
(1979).

Theorem 1. (Jaeschke)

(18) bnIIZ II — c„ - d E2 as n -oo

and

(19) bnjlZn11—cn -" d E4 as n-.00.

Since cn /bn- 1, this trivially implies (recall that # denotes +, —, or I I)

(20) IIZ'n lI/ b n -+ 1 as n - oo.

One reasonable variation on 11 4 11 that has application to distribution free


statistics is the quantity

U(t)
sup
f^:^ VG t)( 1 — G(t))

considered by Eicker (1979); this is a "Studentized version" of Ill„II. Eicker's


variant
% i
max n+2 n+1
—e„ / n+1( 1 n+1)

exhibits properties of symmetry that make it preferable. A "continuous version”


of this statistic is

II 7)I
7
-
vn
WEAK CONVERGENCE OF 11,11 601

The second version still seems the most preferable. Thus we let

n + 2„ :;

(21) T„ = max ± ( p"^) and T„ = T„ v Tn-,


Iii p

where p,, il (n + 1) = E (^„,;) and q„ ; = 1— p„ ;.

Theorem 2. (Eicker) We have

(22) b„T„—c„ - d Ev as n-^ t

and

(23) b„T„—c,, -.> d E,, asn->x

while

(24) T„ / b„ P 1 as n oo.

Inasmuch as the distributions of the statistics in Theorems 1 and 2 are


controlled by the small-order statistics, one suspects that the practical applica-
tions of Theorems 1 and 2 are nil. Nevertheless, they are probabilistically very
interesting.

Proof of Theorem 1. It suffices to prove this theorem when U. denotes the


empirical process of the Hungarian construction of (12.1.3). Let B„_
K(n, • )/Vi for the Kiefer process K of (12.1.8); thus H. is a Brownian bridge
and

log' n
(a) lim 11U„—B„Ij IJ <—someM<co.
/

Let a„ be defined by (7) where

(b) d„_(logsn)/n and e„=1—d„.

Now

U,,-8,, bM(log2n)/
„^ Mb„
lim b„ ^` li m = li w log n
„-- [TI—I led

(c) = 0 a.s.,
602 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z„

and
e

bn ll^n 11 ti„—Cn=bn
11
1 Ms I)I „ d —en

b„
1111 Bn
M —I) dn
e
— c„ +op (1) by(6)

bnIIZ#II„—cn+op(1)

(d) JET if# =t by(16)


d lEv if # =I I by( 17 )

since a n -* m. Our theorem follows immediately from (d) and

(25) b n II7L n II o 0 g k n )1 n—cn -* p —oo asn-*cc

for any k >_ 1; (25) will be established in the rest of this proof.
We will first show that (25) is implied by

(26) II^bIIö' -p as n-^oo.


n

Since for each fixed x we have (x+c„)/b- 1, it follows that

p(bn II Z n 1I ö..0„fix)=(
IImnIIÖ ] x +zc„)\(

b n b„

(e) -- 0 for each real x, by (26).

But (e) implies (25). Thus it remains only to establish (26).


Now

(f) Ilzn 1{0 " ^ II z n {i OJn +Rn ill /n .

Moreover,

IIZnIIO
bn —
Q-D„
I
.^ l o
I — ^^„
I
1
b„
n

n-1
C 2 [II G n/ I IIO n+1 ]

bn

(g) _+ p 0 by Daniel's theorem (Theorem 9.1.2);


THE a.s. RATE OF DIVERGENCE OF IIZ' 603
and the Shorack and Wellner inequality (Inequality 11.2.1) with S=Z gives
a \ 2
--eb„)<_ 12 (log(nd))exp( — e b "

4G(Eb"))
P \ " II ,i" / \ 8
-^ (12k log t n) exp — 4" log b„
)

(h) -0 as n-+oo,

implying

JjZn1^ i " -+ P 0 oo
asn-,.
(') b”

Combining (g) and (i) into (f) gives (26). 01

Exercise 2. Give a proof of (26) based on the Poisson representation and


Exercise 14.5.2.

Exercise 3. Prove Eicker's Theorem 2. (Hint: See Jaeschke, 1979, p. 113.)


[Eicker (1979) shows that Theorem 2 holds no matter which of the three
versions discussed prior to (21) was used.]

Exercise 4. Note that

0
' ' d
J '

Z,,(t) dt = o v"(t)dt aresin (2t-1) dt

^ n
(27) = arcsin (2t-1) dU„(t) = ^^^ aresin (2f —1) ;

in

Show that Y,, ... , Y. are iid (0, (ßr 2 — 8)/4) = (0, 0.467). This would seem to
be a very useful statistic.

2. THE a.s. RATE OF DIVERGENCE OF II7Ln IW 2

For each fixed t the ry Z n (t) is normalized Binomial (n, t) rv. Recall from
Inequality 11.1.1 that for 0<t< 2 the attainable bounds on the long tail
P(7L"(t) > A) are much poorer than those on the short tail P(Z"(t) <—A). This
difference is responsible for the difference in the behavior of IIZ I} I' 2 and
11ZniWö 2 that we will see exhibited below.
604 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z.
Note also that (t) [or (t)] as t goes from 0 to Z behaves the same as
Z(t) [or 7L„(t)] as t goes from 1 to 2. Thus we concentrate solely on [0, Z] in
most of this section.
We recall for contrast with the results below that (16.1.20) gives

(1) II z111 ^ D l as n- cC,


bn

where b n = 2 log e n and # denotes +, —, or as usual. It also follows from


the classic LIL that

(2) lim 7l„ (t) /bn = 1 a.s. for each fixed t E (0, 1).
n -.

We have also seen in Section 10.8 that the a.s. lim sup of 7L„ (a n )/b n with
a n \, 0 may be different than 1 (it may even be infinite when # denotes +
or) 1).
The question that led to Theorems 1 and 2 below was posed by Csörgö and
Revesz (1974). Theorem 1 was obtained by Csäki (1977). The fact that is
an a.s. upper bound was also obtained by Shorack (1977), which was not
published until Shorack (1980). Theorem 2 is the form due to Shorack; we
simply take this opportunity to state what had been proven.' This result also
follows from Csäki, who also evaluated the lim sup for a few key sequences.
His results are given in Section 3. Shorack used a martingale in n for his
proofs and Csäki used a martingale in t. We use Shorack's method, improved
further via Inequality 11.1.1, for all theorems in this, and the next, section.

Theorem 1. (Csäki)
-

(3) lim
n -.
"
=z ^/2 a.s.
bn

Theorem 2. (Shorack) Given a n \0, let a n = c n (log 2 n)/n define c n . Then


c ^ lim cn > 0
(4 )
lim'
J)z 12 n 111 according as
(5) n-. bn = ao ^ lim c n = 0

(Also, lim lZ„ Ih/ Z /b„ <J2 a.s.)

We now ask at what rate the divergence of J7l„jJö z takes place, and give a
characterization.

Theorem 3. (Csäki) Let A,,?. Then

(6)
lim
)Z ))ö 2 0
= a.s. according as —
1 1<x
(7) n- 00
n=i n,1„ l. =oo
THE a.s. RATE OF DIVERGENCE OF BIZ 2 605
It thus holds that
+1 1 1/2
(8) lim
n-,ao
lo gon
gz
l0 2 a.s.

Il
Remark 1. We may replace II Z„ by II Z d I in Theorems 2 and 3.

Remark 2. Equations (3) and (8) show that I^ZnI^ö z and IIZnIh. z behave
rather differently. Recall also from Corollaries 10.5.1 and 10.5.2 that

(9) n-w
go gz
n
lim lo ll " III = 1 a.s.,

while

(10) lim II I "Ins" = 1 a.s.


n- lo gz

Recall also from (10.1.1) and (10.1.2) that

(11) um log
n-.
in / n"
gz
n k) _ a.s, for all fixed k,

while

(12) ►
1 m n:
nSk
= I a.s. for all fixed k.
log t n

Open Question 1. Evaluate j,. n lJ ö z / b„. Does it equal 1?


Proof of upper bounds in Theorems 1 and 3. Let 0 < 0< 1 be small. Just begin
with equation (m) of the proof of James's theorem (Theorem 13.4.1) with
q(t)=f, and hence ¢(t)=1. From Eqs. (13.4.12)-(13.4.14) you draw the
conclusion (for numbers E + and e - to be defined below)

(a) P(Dk) MB log (1/c„ k+ )


(log nk) ( e> 6 (E=^ rk

with
E+b
(b) Yk = 4' "kyk = 1,
nk+l Cn+l

(c) Dk =
[ t^
^k mnk+t
e
1l
max 1 1LIIIwith n k =((1+9) k ),
C^ k+1
J

and a/ as in Proposition 11.1.1.


i
606 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z.

Consider Theorem 1 first. Now for A n /

1 1 °° 1
(d) P fin: .o. /
f =0 if ^—<o
nA n ] nA n

by Kiefer's theorem (Theorem 10.1.1). So let A,,? so that

(e) cn= 1 /(nAn) \0 withlog(1/cn)-.klog(1 +0)


°° 1
and —<00,
1 nAn

Then, a.s. for n ? some n e,, we have

(f) r° 1)110„= 0. -

Thus we will have

_ t
(g) um II s a.s.
n-, °° bn

provided we demonstrate that


00
(h) P(Dk)<cc•

In the "—" case of (3), simply set

(i) E- = '(1+e) for any small e >0,

and choose 0 <_ some 0, so small that (a) and (e) give (h) via

(j) P(D) < k 8k = (a convergent series).


2

Consider the "+" case of Theorem 3. Recall from Proposition A.9.4 that
there exists a sequence en for which

= I
(k) <oo and e n 1 0 (slowly) as n -* x.
n=1 nAnen

In this "+" case we will replace the c n of (e) by

1
(1) Cn nAne.'
THE a.s. RATE OF DIVERGENCE OF IIZ„UUV Z 607

but we note that (f) still holds for this new c„. We also replace s + by

(m) e+_
=e+_
k =e.JA„ k /b ;, A for anysmalle>0

[so that the definition of Dk is appropriate for (6)]. The exponent of (a) then
satisfies

iI„x ^ 2 ^ E VAnt
EV
(Ek) Z i^k =( b1k
l bn `Y
k b 1/(An, en,) /
2

n;

EA nk 2 log An k
since i/i(A) --- (2 log A )/A and A -> co
b„ k EA nk

s log A n , E•
for large k
,/e„^ log 2 n k V

(n) -*X ask-co.

Plugging (n) into (a) gives (h) in the ”+” case.


Since it is a trivial matter to replace VT on [0, 0] in (g) by 1(1-1), and
since [ 0, 2] is trivially taken care of as in James's theorem (Theorem 13.4.1),
the proof of (6) and the upper bound in (3) is complete. 0
Note that at line (f) we showed that

( 13 ) IIl-mn/ 11 Ilo” -ßd.5. 0 as n goo

provided c„ = 1 with A„ / and 1 <cc.


nA„ , nA„

Proof of (8). Let M >0 be large. Setting A„ = log n in Csäki's theorem


(Theorem 3) we find that for any M>0 we have a.s. that

(a) (M'JIog n )1nag 2 n (11 1+11 I Z)'/log , 1.0.

Letting A n = (log ‚)' + e for any fixed E >0 in Csäki's theorem (Theorem 3) we
find that a.s.

)(1+r)/2] 1/109 2 n
(b) "<
((jln11I/2)I/10g2 (log n
M for all n>some n.

Since taking logarithms shows that

(c) Jim M°g2"= I and tim (log n)'°g2"= e,


n-.co n-.X
608 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z.

Eqs. (a) and (b) combine to give


e(l +e)/2 a.s.
lim (II^±llö 2'1/108
2
(d) e' /ZC
n' 00

for all e > 0. Taking logarithms in (d) yields the result. ❑

The proof of Theorem 2 is contained in the next section.


Proof of the lower bound in Theorem 3. Without loss of generality, assume
.1 n > 1 for all n. Let p. = 1 /(Mn,1 n ) where M is a large constant. Then
W W 1
Y^ P(& <Pn) = Y^ MnA =0 '
n

where the events [in <p n ] are independent. Thus Borel-Cantelli implies P(:n <
pn i.o.) = 1, and hence

(a) P(en:i <Pn i.o.) = 1.

Thus
+ 1/2
)—Pn
^^[nQ^n(Pn)—nPn]
(b) III 1 ( 1„ 1 )IIo an^p((
I1
>_^/M nQs n (pn )—M since A.>_1
J
✓ M 1 — M i.o. with probability one, by (a).

Since ./Ä(1 -1/M) can be arbitrarily large, we have that the T i (b) is
a.s. infinite. Thus (7) holds. ❑

The proof of the lower bound (3) is omitted. Consult Csäki (1977), or
consider a Poisson bridge proof that follows the rough outline of Csäki's proof.

Exercise 1. Show that for any m > 0

(14) Ti
7 n
ö °g"'
")/"
^ 1 a.s.,
n

while

(15)
IIZ (ogm n)/n
-*„O as n- cC.
b „
ALMOST SURE BEHAVIOR OF 11Zjä' 2 WITH a1,0 609

3. ALMOST SURE BEHAVIOR OF jZn I^ 2 WITH a n \ 0

Description of our conclusions requires consideration of the function [see


(11.1.7)]

(1) h(A)=A (logA-1)+1 fort>0.

Recall that this is the function that a rises in bounding the longer tail of a
binomial distribution. As usual b n = J2 log, n.

2
v
1

0
Figure 1.
.236... 1 C

We define c n by

/ c,, log t n
(2) a =
n

We will present three theorems; they concern the cases cn - CE (0, 00), c n - 00,
and c n -' 0. All results are from Csäki (1977).
We consider first the case C. - cc (0, oo). For each c>0

(3) let ß' > I solve h(ß)= 1/c,

and let

(4) Lc=fv (ß-1).

Theorem 1. Let a n \ 0. If c n -^ c E (0, c) as n - o3, then

^n 1/2
(5) lim
n-m bn
I I °" = L, a.s.

As we see from Figure 1, the a.s. lim sup of (7L n fl ' 2/b n varies continuously
from V'2 to co on the class of sequences of Theorem 1. This is more detailed
information than is contained in Theorem 2 of the previous section. The
ordinary LIL at t = Z tells us that the a.s. him sup in question can never be less
610 THE NORMALIZED UNIFORM EMPIRICAL PROCESS 1,,

than 1. Thus the picture will be completed if we give a class of sequences on


which the a.s. lim sup varies continuously from 1 to h We now do this.
.

Theorem 2. Let a„ N 0. If c„ -4 Co and

(6) log2(1/a„)
-c asn-+oowith0 c 1,
log e n

then

(7) lim
n- ao
bn
" _
° a.s.

Example 1. Note that

_ 1
yields c=0,
(8) a" log n
(9) an = exp (— (log n )`) yields cc (0, 1),
(10) a„- n - 'logn yieldsc =1,
(11) a -S with 8<1 yields c =1.

If c„ - 0, then the a.s. lim sup in question is infinite by Theorem 3 of the


previous section. Provided c„ does not converge to 0 at too fast a rate, we are
able to characterize the rate at which the a.s. lim sup becomes infinite. ❑

Theorem 3. If c„ N 0 and

(12) log (1/c) 0,


log e n
then

log(1 /c„)
(13) lim 17+11ii2,/cn 1 a.s.
/logen

Example 2. Possible cases to which Theorem 3 could be applied include

(14) c= 1
log 3 n
and
c,
(15) c„ — (log2 n ) where c,, c, > 0.
ALMOST SURE BEHAVIOR OF JIZ;II,/ 2 WITH a„\,0 611

The special case a n = 1/n (or C. = 1 /log 2 n) yields

(16) li I71„ I(log 3 n/1og 2 n) = I a.s.;


n-W

this should be compared with the result of Baxter (1955) (see Csörgö and
Revesz, 1974), which can be rephrased as

c log, n )_
(17) lim Z n - V 1 a.s.
n-. n log, n

Proof of Theorems 1-3 upper bounds. Let

cn log 2 n
(a) a,,=- where c o -' c e (0, oo)
n

and let

(b) do / c slowly [see (k) below for a definition of "slowly"]

We define

(c) An = [ lIl'n jI 0a ^ > rb*n ] for appropriate r>0, a,, s a %, and b* 7.

We seek to show E ° P(A,) < oo, so that Borel-Cantelli will give

7L
(d) lim II b D °"<r a.s.
n-+o n

We will use the maximal inequality (Inequality 13.2.2) with

(e) nk=((1+9)k), C=I-9, a 1+29, n'=nk, nn^nk+l

and a fixed small 0 = 0, to be specified below; according to this inequality, it


suffices to show

(f) P(Dk)<oo,

where [note that c/'^ (1 - 0) 2 ]

(g) Dk=[llZnk+i+,'(1-9)2rbk].

Now by the Shorack and Wellner inequality (Inequality 11.2.1) with 8= 0 we


612 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z,

have [the VT7 in the denominator of 7l„ accounts for one (1- 0)]

3 *
log (an k /an,+,) 7 r z (bflk) z +
(h) P(Dj) 50
exp (—(1 —8) 2 Yk),

where (since i r is "a) we may take

rb
(i) nk
n k+l a nk+,

[Note the complete analogy with James's theorem (Theorem 13.4.1), Stute's
theorem (Theorem 14.2.1), Theorem 14.2.2, and Csäki's theorems of the pre-
vious section.]
For Theorem 1 we set b nk = b nk and consider three subintervals:

V) [an, da], [dnan, ar], and [an z]

for some small a r - a re . Consider first [a n, ad]. From (h) we have

4(lo0dnk ) $ (_ ))
P(Dk) exp —(1—B) rz (logznk)^G`

_
^lCrJl

(k) ( 4/ 0 )(log dn k )
(l og nk )(I e^B^Z^c,rirr)l

where log n k — k log (1 + 0) and log dnk washes out, since we specified it went
to oo "slowly." Thus for k sufficiently large we have

(some Me )
P(D k )—

(1) = (a convergent series)

provided r exceeds the solution R of

(m) 1 = R 2 1J!( R)

and 0 = 0, is chosen sufficiently close to 0. As in the proof of Theorem 9.2.1


at its steps (m) and (n), the solution is

(n) R=\/(/3_1).
ALMOST SURE BEHAVIOR OF IIg^U.. WITH a„',,0 613
Now consider the interval [da, a,]. As in (h), we have

P(Dk) ^4log (a /(ankdnk))


, exp (
—(1 — 0) 8 r 2 (log2
nk) !J do k / /
5 log n k
(o) 0 (log n

for k sufficiently large since ßy(0) = 1. Since log n k --- k log (1 + 0), this series
converges for any r> v. Combining this with (n) gives

^n a
(p) lim II l I "s(f v R)+s=L,+e a.s. for all r > 0.
n-." bn

From James's theorem (Theorem 13.4.1) we know that any I ä! 2 contributes


at most 1 to the a.s. lim sup. We have thus established the upper bound in
Theorem 1.
It is worth noting that if we had tried to consider the interval [a n , a,] in
one application of Inequality 11.2.1 we could have obtained the bound

(q) R*=

While true, this is not good enough.


For Theorem 2, we set b*nk = b nk and consider the same three subintervals
in (j). Since c n -> oo now, and since

2
(r) i — r) >_ (1— 0) for all sufficiently large k,
( Cnk

we obtain from (1) that

Me
P(Dk) k a

(s) = (a convergent series for all r> 1)

provided 0 = 0, is chosen small enough. Thus the lim sup over this first
subinterval is a.s. bounded above by 1. From (o) we obtain

P(Dk) ' Me log (I/an k )


(log nk) (1_6)9r
M'(log n k )`
by (6)
(log nk)°'2
(t) = (a convergent series for all r> R),
614 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z„

provided 0 = 0, > 0 is chosen small enough, where R is the solution of

(u) R2-c=1 (or R =.J1 + c).

Combining (s) and (u) gives the upper bound in Theorem 2.


For Theorem 3 we note that (12) implies

(v) c"k -1 ask-cc,


Cnk+I

and this is all we will need to establish the upper bound. We let

(w)
/(
b*, = log 2 n/ I ./c„ log t n log
L
Q) 1
Cn
.

In this case we also consider the three intervals in (j). For [a n , ad] we obtain
from (h) that
P(Dk) s 3 log (a *k /a „k.,)
0
( 1 - 0) 7 r 2 loge n k r
xexp (
2 c„ k log 2 (1/c„ k ) \c„ k log (1/c„ k ))
(1 -8) 'r z 2
Me (log d„ k ) exp (-
2 r logt nk )
since i/i(A) -- (2 log A)/ A as A -x'
_ M9 (log d k „ )

(log nk )( 1- B) 7r

(x) = (a convergent series for any r> 1)

if 0= 0, is chosen small enough. For [ad, a,], (h) gives

P(Dk)<_M9 log (an k 1 )


(1- 0) 8 r 2 log e Ilk
x exp / _
2 c„k log (1/c k ) „ \ c lag(1jc
„k r ) 11
„k

1
^-M0 log — exp( -(1-8) $ r log2nk)
a „k

_ MB(log nk)
G
)(t
(l og nk -8 f(1„ k

(y) _ ( a convergent series for any r> 0)


THE a.s. DIVERGENCE OF THE NORMALIZED QUANTILE PROCESS 615

since d„ k - oo. Thus for any r> 1, we have


ZV a
(z) lim I e il a„<_ r a.s.

for some sufficiently small a,. James's theorem (Theorem 13.4.1) ensures that
the lim sup over II II 1 ! 2 does not exceed 1. This gives the upper bound in
Theorem 3. ❑

Exercise 1. (i) (Mason, 1981b). Consider the weight function

Wk [I(1-1)] - ' +k for0<_k<_2.

Then for positive a„ „' we have

0 °° 1 < oa
p(n k IIwk(Gn — I)Il a„ i.o.)= 1 according asnnl _f k>^=
na '
n /c 1- l

(ii) (Mason, 1983). For 0 k <Z we have

P(n k Ilwk(Gn —1 )11 75 x) --[Hk(x)] z for x?0

and

P(n k IIWk(G„)(Gf - 1)11:x)->[Fk(x)] 2 for x^0,

where, with N a Poisson process,

Hk is the df of II (R — I) I +k Ii1i ö

and

Fk is the df of 11((ß — I)Nt-I+k 1I; .

Recall that n, denotes the first jump point of N.

Exercise 2. Verify (16).

4. THE a.s. DIVERGENCE OF THE NORMALIZED QUANTILE


PROCESS

In this section we will settle for a partial analog of Theorem 16.3.1; we present this
because it is required in Chapter 18.
616 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z.

Theorem 1. (Csörgö and Revesz) Let a„ =9(log z n)/n. Then

(1)
„-.Jim oo l V„ p
I (1 — 1) a. n / b„<— 2 a.s.

for b„ = 21og & n.

Open Question 1. Obtain analogs of the theorems of Section 2.


Proof. We follow Csörgö and Revesz (1978a). In order to show that the 1im
in (1) is a.s. less than K (only later will we specify K = 2), we will show that
for all n sufficiently large

(a) x, <G(t)<_y, for all a„<_t<_1—a„,

where

K t(1 —t)b„ K + t(1 —t)b n


(b) x,= t— and y,=t+

By (1.1.23), (a) will follow if we can show that

(c) G„(xt)<tsG „(Y,) for all a„<_t^1—a„

whenever n is sufficiently large. Allowing ourselves slightly more generality,


we now redefine a n as

d(log z n)
(d) a„ -
n

Suppose for the moment that

c(log 2 n)
(e) x,> forallat<_1—a„
n

for some fixed constant c; then by Csäki's theorem (Theorem 16.3.1) we would
have, for each e > 0 and all n ? some n e , that

+e) I—x,)b„
Q^„(x,)5x,+ (I'` byCsäki

[K t(1 — t) —(L^ +e) x,(1 —x,)]b„


=t— by(b)

(f) < t

THE a.s. DIVERGENCE OF THE NORMALIZED QUANTILE PROCESS 617

provided K can be assumed chosen to satisfy [see (16.3.4) for L^]

(g) [x,(1—x,)]/[t(1—t)]<K2/(Lc+E)2 for all a„<t<1—a„.

Since x,/1<_ 1, and since

1—xt
=l+=1+K t b„ by (b)
1—t 1—t 1—tom
(K 1/a„)b„
s l + since t <— 1— a„ by (e)

=1+K ,
d

we see that (g) holds so long as K is large enough to satisfy (recall e > 0 is
arbitrary)

(h) K> Lc(1+K).

We now turn our attention back to (e). We note that

clog e n b„clogtn
x,— n =I—K t(1—t) b

c c1og 2 n 1 c \ b„
= dt — n
J +^1 It—K t(1—t)^

>_0+f^(1—d)^—K ( iftexceedsthea„of(d)

JJ
i j n
>riL(1—d)^—fK

0 [i.e., (e) holds]

provided K is chosen small enough to satisfy

(i) K 1C ) 1 d

for some appropriate d. We rather arbitrarily specify

(j) c = 0.236... and L c = J2,


618 THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z.

a choice which is made possible by Theorem 16.3.1 and Figure 16.3.1. Trial and
error then shows that specifying

(k) d=9 and K=2

leads to the joint satisfaction of (h) and (i), hence (e) and (f). But (f) is just
the left-hand inequality in (a). The right-hand inequality in (a) is verified by
a completely symmetric argument. ❑

Proof of Theorem 15.1.3

Proof of Theorem 15.1.3. We consider only M,(log n) d, /n <_ t <_ z; the rest is
treated symmetrically. Note that

IIUn+Vn11=V'IjGn —I+Gn' — I+Gn oG _G n-Gn'II

(a) { Il u n +U (G)Il +`In II 43 ° Gn 1 — h i,

where Theorem I shows that a.s. for n > some n , we have 0

(b) _I 1-9(log2n)/nC (
2+e)bn
<4(
log2 n ) 1/2
v' , 9(Iog2 n)/n
„/ ; • \ n /)

Let us now subdivide [0, 1] into n 3 intervals of length 1/n 3 .

This: -1/n3- i/n3 t Gn'(t) j/n 3 if t <G „'(t)

or this: j/n3 Gnl (t) t i/n3 ifG (t) <_ t

Suppose {U n (tAGn'(t), tvG (t)] } + >0 and let

(c) i and j be the same as in the figure.

(The case {v n (t AG (t), t v 6(t)] } > 0 is similar, and is omitted.) Thus for
-

all n >— some n^, we have

max IUn(t)—Un(Gn`(t))It-°'
Mj(log n) d '/nsts1 /2

(d) max { max


n 3 M 1 (log n) dt /1 /2n 1 lll U— il/n3 <_4(log2 n /n)1n(i/n3)1/2
\ a

{I un ( 313 ^ + n 22 J(n 3 ) ' l

(e) = 0 +o(n2) a.s.,


THE a.s. DIVERGENCE OF THE NORMALIZED QUANTILE PROCESS 619
where A. is defined by omitting 2/n 512 from line (d). Also, we will specify

(f) K„ — M2(log n)dZ (we will set c = I later).


(log e n )`
Then for each term in the definition of „ in (d) we have

P,;=P
\I „ \n'^n 3 Jl >
((i /n 3 ) a '(K„1 n a2 )) 2
(g) —2 exp
({ 2.4((log 2 n)/n)'i2(i/n3)1/2}

fl
\ X ^ n24((log2
(I/n3)a'(K„/na2) ^^
)/ n)`f2(i/n3)'tz

for all j satisfying the condition in (d), by inequality 11.1.2. Now the bracketed
{ } term in (g) satisfies
(2a,-iiz)^onli2 za2K
(h) { } [M,(log n)d'/n] 2
2 .
8(log 2 n)'i

The argument of ' in (g) is bounded above by

n )1/2-°, K
(1)M,(log n) d 1 4na2(1og2 n)'i 2
_K„
4M; /2-° I(log n) d i (I/2- a ,) (Iog 2 n )'i 2= B„
for all i as specified by (d); and thus the % term of (g) satisfies

(positiveConstant)(log2n) 1
(j)
E
+y — B

n 8 J for some s> 0

[this first bound on the rhs of (j) is appropriate if B„ oo, else the second]
for n sufficiently large. Combining (h) and (j) into (g) gives
Constant (log n)(d ,c2a,-112)ßo)+d,+d,(112-a,)
(k) pij 2 exp — n((2a,-I/2)ßo)+2a2-t/2(log2 n)`-'

Constant (log n) [d,(2a,-1/2) OI+2d 2


A n [(2a,-1/2No] +za - I/2(l og2 n )2C+iI2)
g

n)d,a,+a,)
(1) =2 exp (—Constant (log
since a, ;2! ä, letting c = 1, and so forth
2
(m) <2 exp (-8 log n) < 8
n
620 THE NORMALIZED UNIFORM EMPIRICAL PROCESS 7L

provided da + d2 _> 1 and M2 is sufficiently large. Now the number of sub-


scripts i, j that satisfy (d) is bounded by

(n) (number of subscripts i and j) <_ n 6 .

Combining (m) and (n) shows that the A. of (e) satisfies

6
(o) P 0„ >_ <_ n = = (term of a convergent series).
( -)

Thus

n a26 'n
(p) um K < 1 a.s.
n^ „

Finally, note that the second term in (a) satisfies


o 6 —' _ I—M,(Iog n)"i /n
a n n
(q) Jimn =j Q <_ Constant a.s.
[1(1-1)] I M,(logn)d' /n

Combining (p) and (q) into (a) gives the first result (15.1.19).
To prove the second result (15.1.20), we note that the B. of /i is of order
(log n) d=/(log 2 n)' -° i + `, with d2 = 1. Hence the p;; of (k) is bounded by

g 2„ ' '/2 (log )2


p;; ^2 exp —Constant (1
(log 2 n) 2 (log t n) 2C
X (log 2 n)(log2 n)' -„ ' + `l
(log n) d2 )

(r) <2 exp ( 8(log n)d2(1og2 n)'')


-

for n sufficiently large. Now apply steps (n) through (q) again.
for
The special case having a, = a 2 = 4, d, = 1, d2 =-,, c = 0 (but allowing
m- dependent rv's) can be found in Singh (1979). ❑
CHAPTER 17

The Uniform Empirical


Process Indexed by
Intervals and Functions

0. INTRODUCTION

In this chapter we will consider the uniform empirical process U,, "indexed"
by a collection of functions We write
1 _
(1) U.(f)= iavn=J— ^E^[1( i) i], — forf€
0

In Sections 17.1 and 17.2 we treat the case of (indicators of) intervals, 9=
0 < s < t < 1 }. In this case we write U. (C) for U. (1 ) =
{l c : C = (s, t],
U,, (t)—v n (s) with C = (s, t], and let ICI=It—s^. Since

(2) Var [U.(C)] = ICI( 1— ICI),

our interest will focus on functions q for which "UJ.= U with respect to 1) • /q11"
where both U. and U are considered as processes indexed by intervals. These
two sections parallel Sections 11.2 and 11.5.
Indexing by Swc C[0, 1] is considered (as a special case) in Section 17.3.
The main theorem there implies weak convergence of {U n (f): f c g} when
6= Lip (a)_{fEC[0, 1]: f(t)—f(s)I<_ t—sI°} ifa>2i these results are due
to Strassen and Dudley (1969) and are closely related to the higher-dimensional
results stated in Chapter 26.

1. BOUNDS ON THE MAGNITUDE OF IU n /qjI 11ta , bt

For any interval C = (s, t] with 0 <_ s <_ t ^ 1, let f(C) = f (s, t] =f(t) — f(s ), and
let ICI = 1—s. Let T denote the class of such intervals, let '(a, b)
621
622 THE UNIFORM EMPIRICAL PROCESS INDEXED

{C E le: a ^ ICj ^ b }, and let 16(a) = 19(a, 1). For any subcollection ' let
^IhII w -=sup{^h(C)I: CE '} and if q is a nonnegative function, let 11h/qI
denote sup {Ih(C)I/q(jCI): Cc '}.
Our goal in this section is to develop inequalities for UJ„/gll, (a , b) . The
development here closely parallels that of Section 11.2, making strong use of
the inequalities contained in Section 11.1, and is also related to the inequalities
in Section 14.2.
Our inequalities again involve the function

(1) i'(A)=2h(1 +,t)/,A2 forA>0with h(A)°A(logA +1)+1.

Recall that properties of Ji are summarized in Proposition 11.1.1. We also use


the classes Q and Q* of continuous nonnegative functions on [0, 1/2] that are
symmetric about t = z and satisfy

(2) q(t)/ and q(t) /V\ for 0<t<_2 for geQ

and

(3) q(t)T for0<t^ for qEQ*.

Inequality 1. (Shorack and Wellner) Let 0 < a (1 — 6)b < b <_ S < 2 and
A>0. Then for gEQ

(4) P ( sup 'Qj„(C)l


q(ICI) / s
f
^-,1 l <24 ab Zexp — (1 -3) y
t 1
4
A2q(t)Z
2t /
dt,

where

(5) Y = x(21 /2Aq(a))


sa-1 n
z
/
4 2 'k if a 2 (1
S n ) v 1n

Corollary 1. When q(t)=V we have

(6) sup „
A J
^O(C)I 24s exp(— (i —S)°y -2)
PCIJCJ^bq(ICI) J a ` 2

where
3
21/2 (1-5) if,t< 3 52 an

(7) y— (an) 3S 2 an(1 —S) 3


^, k if A ? 7 S Z an
BOUNDS ON THE MAGNITUDE OF IIU,/qII„ca,h) 623

Proof of Inequality 1. Let

__ y"(C)I
>A
(a) A” Slop q(I CI )

Define

(b) 0=(1-3)
and integers J<— K by (if a = 0, then K = ce and we consider J ^ i< K)

(c) OK<a<_BK-' and B J <b<_9 J- '

(we let 0' denote 0' for J:5 i < K while 6 K denotes a and 6 J- ' denotes b);
and let

(d) W,={C: 0' ICI <O' - '} for J<i_<K.

Since q is T we have

(e) A"c max sup IU°(C)I


i >A
[ J^i^K cEr, q(e )

s t

Figure 1.

Now for any integer M i , Figure 1 shows that

max sup ;
JsisK `B, q ( u
C€%

(f) < max max sup


C Iu"(JIM,,j,M1+t)1
/
e '
1J
JsisK O^jsM^ OstsB' q\ )

+ max max sup [


1Qj.(j1Mi— . tj1Mi11
`-
J<iK 0 `-j^Mj O^i^I/M. J°
q(e )

but we suppose that M is the smallest integer such that


;

(g) M16201
624 THE UNIFORM EMPIRICAL PROCESS INDEXED

and this entails for J <_ i s K that

(h) M; < Z
S e; + 1 < 5 288 ; -, SZe,-, .

Thus using the stationary increments of U,,,


> '
n
P(A ) Y
P
C K M.-1
A9(e )
1 —I o 1-0
i=J j=0
(II ^n III" ' 1—B '
,

1+8

K M.—t u l/M S
1-1 lM'
1
(i) + P ^Aq(g') = B+D.
' =J j=0 1 —I o 1-1/M; 1+S/

Now by Inequality 11.1.2 we have

K _ AZg2( ®`)(1—B ' )z


(J) B <_ M,2 exp ; ' z

f Aq(01)(1 —O)
X ^ 9' Vi 1 /1
1+S

K 4 _ A 2 9
ei- ') s ( OV—n
^ ex 1
, —S by (h)
l

since a = 6 K <_ B' for i <_ K and q(1)/t\ implies for J <— i <_ K that

q(O') (1 -9i-') 9(e`) q(a)


(k) B' -' 1+S 0' a

Thus, using Qi" and q 2 (t)/t\,

4 K ' 1 ( B ' 1 ! A z q(t)


-

tZ exp — 2 t (1_8) 4 y) dt
B < S Z ; ^+^ 1— B ^ J `

+ Sz 0 K exp (—
A2
(1_5)4i)

4 z z
(1) +S b6 exp ( 2 qgb )(1-6)`7)

<_ I^ Ja 6 tz exp / — 2 z 9z( t) (1— S) ° y 1 dt


since 1 be dt >_ (b — bB)/ 0b 2 8 ? 1/ bO and J^ ' e t -z dt/ S = 1/a. Likewise (note that
WEAK CONVERGENCE OF U, IN II•/qII METRICS 625

B' /2 means a' /2 when i = K)


Dc K M p U„ '/ M1 Aq(B')(1-1/M ) SB; /2 1 ;

(II1I B'/2(1+S) 1-1/M;)


KU 1M ,ßq(6') 1 1 \
;/2 (I — S)'/ 2
MAP \ I II

by (g)
K. 1
I — A2
q (01)
(m) 2M ; exp (1—S)3 1
r=r \ 0 M, 2(1/M; )(1-1/M1 )

( Aq(0')(1—S)2 //
`e' /2 (1/M ).%n
;

(n) <_^^ '^2 exp— 2 g 2 (t) (1 —S)°y)dt


f .

using in (m) that i is \ and that (k) and (g) imply

Ag(B') 2 1 Ag(o`) Aq(a)


(o) 01 (1-s) ;
JM;B^<eV1 s`a,I s

Combining (i), (1), and (n) yields (4). ❑

2. WEAK CONVERGENCE OF U. IN I) METRICS

In this section we suppose that q E Q; that is,

(1) q(t)7, t - ' /2 q(t)\, and q is continuous on [0 21 and i

symmetric about t = Z.

We will establish necessary and sufficient conditions on q for the specially


constructed U. of Theorem 3.1.1 to satisfy 1I (U„ — U)/ q jJ ,,(a) +‚O as n oo with
(e log n )/ n, E > 0. Our condition will be phrased in terms of the integral
/2

(2) T(q, A) =
fJ o ,
t-2 exp (—Ag 2 (t)/ t) dt

where A > 0.

Theorem 1. (Shorack and Wellner) Let q e Q. Then

(3) T(q, A) < oo for every A > 0


626 THE UNIFORM EMPIRICAL PROCESS INDEXED

is necessary and sufficient for Skorokhod's U. to satisfy

jj(U,, - v)/g En 4a g n>

(4) -sup IUn(C)-U(C)I : ICI>_en ' - log n -> P O


{ q(ICI)
for every E > 0.

Bounding ICj away from 0 in (4) is essential; if jCj shrinks down to a single
point, e, say, then 1J„(C) /q(^Cj)=vi(n - '- 0)/0 =co since all interesting q's
have q(0)=0.
The condition (3) in this theorem can be stated much more simply; this is
done in the following proposition. The Chung-Erdös-Sirao (1959) criterion
(see also Ito and McKean, 1974, p. 36) implies that for q E Q

(5) P(sup U(C) < q(E), E %0) = ^0


IC =E

/ 2

according as (1) exp


2 !— 3 i q (t) \j dt = -
[ ]

o+ t f - t /
\2 l <00.

If the integral on the right-hand side of (5) is finite (infinite) we say that q is
interval upper class (interval lower class) for U.

Proposition 1. Let q E Q. Then (3) is equivalent to

(6) eq is interval upper class for U for all e > 0

and to
7) g(t)=q2(t) /[t log (1/t)] -+co as t\0.

Thus (7) is an easily verifiable condition under which (4) holds.

Exercise 1. Prove Proposition 1. [Hint: To show that (3) implies (7), first use
q to show

-exp (-A g2(s) I ds>_I -1 exp -(- t) -log


JA^s2 tI//

for 0 <A <1, then let t\0. To show that (7) implies (3), note that
b tb
r -2 exp (-,Ag 2 (t) /t) dt = t - 2 +A g ( ' ) dt.
o 0

To show the equivalence of (3) and (6), argue directly using (2) and (5).]
WEAK CONVERGENCE OF U, IN Ik•/qjl METRICS 627

An Application

As noted in Exercise 3.8.3, Watson's statistic U„ may be written as

o [v„(t)_U.(s)]2dsdt
2 2 o
1'
= I I n[G (t)—G (s)—(t—s)] 2 dsdt.
2fo J o ,

Thus Un can be regarded as an "interval version" of the Cramer-von Mises


statistic W. Since Var[v„(t)—v„(s)]=It—sI(1—It—sl), a natural
"weighted” version of Un, or equivalently an interval version of the Anderson-
Darting statistic of Example 3.8.4, is

I I [U
(t)—U (s)]2
dsdt.
(8) T" Jo JoJo
It—SI(1—It—SI)
It—sI(1 —lt —sI)

T. was introduced by Shorack and Wellner (1982).

Example 1.

(9) Tn-*d T2= [U( t)—U(s)]2


ds dt.
0 Jo It—SI(1—It—SI)

To use Theorem 1 to prove (9), we let c„ = n 'log n, DZ „


{(s,t)E[0,1]x[0,1]: It slic„ }, D 1 ,,=[0,1]x[0,1]nDz ni and choose

q(t) = [t(1 — t)]" 4 . Then

ITn— T 2 I I[y„(t)—y„(s)]2—[U(t)—U(s)]2I
dsdt
D , It—sI(1—It—sI)
[y„(t)-1J„(s)]2+[UJ(t)—UJ(s)]2dsdt
+J DZ It—sI(1—It—sl)

_ DS ,
Ivn(t)—U.(s)—(U(t)—U(s))Ilvn(t)—Uf(s)+v(t)—U(s)I
It—sRi—It—sl)
dsdt

+R„

SAU,,— 1,1 £[11—sl(1-1i—sl)] -112 dsdi+R,


vII^(^r ^ 9^^^ecc^ +II91II

= op (1)0,(1)0(1)+op (1) = o, (1)


628 THE UNIFORM EMPIRICAL PROCESS INDEXED

by Theorem 1 and since, by Inequality A.1.1,

E(R) 2x(area of D en ) 4c„


(R„> s)< _ <_- -*0 as n-* oo.
E E E

The statistic T. would provide a test of uniformity if its asymptotic distribution,


that of T2 , were known. We conjecture that the distribution of T2 on the right
side of (9) is that of a weighted sum of independent chi-square rv's, but we
do not know the weights. ❑

Open Question. What is the distribution of T2 in (9)? What are the power
properties of the test based on T„?

Exercise
as
2. D Let j
l — ^j l for 1 i, <_ n. Show that T n of (8) maybe written

(10) T =n Y_
„ +—
2 n n

n i=i j=1
E {Dlog(D) +(1 —D ;j )log(1— D) }.

Proof of Theorem 1. We first prove necessity: Suppose that (4) holds for
every E > 0. Choose and fix e > 0. Then

J ` (E 2 n - ' 1ogn)

?sup ICI = e 2 n - ' log n


Un(C)—U(C) :
q(ICI)
_^ICIIi2IC1^/2_U(C)
(a) ?sup ICI = r 2 n ' log n
q(Cl)

e( 1 +o(I))[ICI log ( 1 /ICI)]'^ 2— U(C).


=sup
q(ICI)

CI = e 2 n - ' log n}

2 e[ICIlog( 1 /ICI)] ' /2 _U(C)


ICI =e 2n 1log
(

sup j q(ICI)

for n large; and hence for n > N , E

P( —U(C) < 2 s[ICI log ( 1 /ICI)] "2 + Eq(I CI )


for all C with ICI = E 2 n - ' log n) > 1— e.

Thus eq* (t) s[2(t log (1/t))' /2 +q(t)] is interval upper class for every s >0,

WEAK CONVERGENCE OF U, IN li-/gjE ,, METRICS 629

and hence, by the equivalence of (6) and (7),

(b) q*(t) =2+ q(t) 1 o0


[t log (1/t)]112 1(t log ( 1 /t))'' 2 J
as t -.0. But this clearly implies the same is true for q; i.e., -q is interval upper
class for every E > 0.
To prove the sufficiency part of the theorem, replace g by

(c) g(t)= min {inf{g(s):O s<_t}, log (1/t)}Too as 40.

We will use (17.1.6) to handle intervals C with en - ' log n ^ j Cl ^ a n


q 2 (1/n); then (17.1.4) will handle intervals C with a n <_ (Cj < 0 for fixed
(small) 0.
Since t"' /2 q(t) is \, with b n = En - ' log n and M very large,

(d) P(Iwn/gjjW(b,.an)? e)
'/2
IQJn (C)i
P sup ,/2 _8 log
a
{g{an))
1 {C:b,sICIsa n } ICI ( .) /

(e) <_P up I W n( C)I ? M(log n) 1 ^ 2 ) for all large n


{} Cl
200 ( M 2 ( 3M(log n)" 2 " \
exp `— 32 (log n)+G` (E log 1/2
n n) J1
1
by (17.1.5) and (17.1.6) with 6=2

200n (M2 (3M11


2 (log n)ii (j-1j))
E log n exp — 3
200n M2 1
~ E log n
exp ^— M
/c1gz j (log
f or n) large M

(f) < E for M large enough, and NM in (e).

Also
(g) P(IIU/g11w(0,e)? )^E for 0=some BE

in view of Exercise 14.1.1; and

(h) P(Iwn/ qII 19(a n.e) e) ^ e for n >_ some NE

by (17.1.4) and (17.1.5) with 8=0 since y? l/i(2 1 " 2 E/0). Applying (f)-(h) and
630 THE UNIFORM EMPIRICAL PROCESS INDEXED

Theorem 3.1.1 (note that Ilv n —UJI1,,^2IIU n —UII) to

(i) II(Un —U)/gIIw(b,l—O)C Ij U n/qlI (b,a^) + II U n/ glIw(an,o)


+IIU/qll g(,,B)+IIU - UIj (,I-B) /q(8)
shows that the term in (i) does -‚0. Finally, for O<_ s ^ t<_ 1 with t — s >_ 1-0
we have q(1 — s) >_q(l—t)vq(s); so that

^n I supt LU (t)I
: t?1 -01 +sup j
IUn(s)1 :S^ B^
(

i q ^«-e.n 1 q(t) l q(s)

has (for 0= 8 E sufficiently small), by the proof of Theorem 11.5.1,

(j) P^II^°II ?e Ise for all n >_some NE

and likewise
fu >eI .
(k) P \II9^c^ e,n

Hence (4) holds. ❑

3. INDEXING BY CONTINUOUS FUNCTIONS VIA CHAINING

Now we want to consider the uniform empirical process indexed by a class


of functions :9c C[0, 1]. In fact, we will consider a more general process.
Let (a', d) be a compact metric space (e.g., '=[O, 1]', d = Euclidean
distance), and let P be a Borel probability measure on the Borel sets @( ,T).
Let X,, X2 ,... be iid'valued random variables with distribution P, and let

(1) Pn = n '(8 x,+...+S X ^)


-

be the empirical measure where S X denotes the measure with mass 1 at x.


Let C(') denote all real-valued continuous functions on ä. Let I1 1) denote
the sup norm on C('). Let F be a II II-compact subset of C('). We want to
consider the process

(2) Z (f)= ✓n
i r
fd(P n —P) for fE c C()

_ =Z ,
1 n
[f(Xi)-Ef(X)]
n` /
^ {
INDEXING BY CONTINUOUS FUNCTIONS VIA CHAINING 631
Note that f(X1 ) is just a bounded real-valued rv, and thus Ef(X) is well
defined. Since

IY,(f)— a.s.,
the Y; 's and hence also 7L,,, can be viewed as random elements with sample
paths in the separable metric space (C( ), II II) where II II now denotes the
sup norm on the set C(SW) of all real-valued continuous functions on
Easy calculations show that

(3) E7L„(f)=0

and

(4) Cov R. (.f), Zn (g)] = J ( I 1 g — I g dP I dP


f— f f dP ll `€3. /
Cov [f(X),g(X)] for f,g

Let 7 denote a mean-zero Gaussian process with covariance

(5) Cov [7 (f), 7 (g)] = J ( f f dP) (g — J g dP) dP, fgE.


f—

It is true that 7 can be taken as a random element on (C(SP), II II). [Since

E[Z(f)-7(g)] 2 (f—g) 2 dP< IIf—giI 2 ,

this follows from the hypothesis (8) in Theorem I below and Dudley, 1973,
Theorem 1.1.]

Exercise 1. In the case' = [0, 1], P = Uniform (0, 1), show that for the family
of (discontinuous) functions ={1^o,t:0<_t^l}, Z„(l (o,. ) )=UJ , and
Z(1 Eo ,. ] ) = U where U. and U are as in Chapter 3.

From the multivariate CLT we have

(6) In fd.Z as n-*cc.

To use Theorem 2.3.3 to show that Z„^Z, we need to show that {7L„} is tight;
and this will hold if 9 c C(') is not "too big.”
To make "too big" precise, for any metric space (S, p), let
( k
(7) N(e, S, p)=inf j k: Sc U V,diam (V)<2e
l 1=1
632 THE UNIFORM EMPIRICAL PROCESS INDEXED


where the Y are Borel subsets of S. Then H(e, S, p) = log N(E, S, p) is called
the metric entropy of (S, p). In our present context, N(e, 1, II II) is a measure
of the "bigness" of 9. The hypothesis (8) in the following theorem guarantees
that H (and hence N) grows sufficiently slowly as eJ,O.

Theorem 1. (Strassen and Dudley, 1969) If

(8) H(u, x, 11 11) 1 / 2 du <oo,

then

(9) ZZ as n -oc i n (C(), `60),11 11).

Corollary 1. There exist processes Z = 74, , n -1, and Z* = 7L defined on a


common probability space such that

(10) IIZ* — Z * 11= SUplZ„(f) — Z * (f)I a.s, 0 as n->oo.


fE19

The corollary follows immediately from Theorem 1 and Theorem 2.3.4.


Note that Theorem 1 and Corollary 1 hold for any distribution P on Z To
see what Theorem 1 yields in a particular case, consider X= [0, 1] k, k? 1. For
r> 0, write r = m + a where m is a nonnegative integer and 0< a s 1. Let
c C(fe) —C([0, 1] k ) be the set of functions f with Il f ib < 1, all partial
derivatives of order m continuous, and for mth derivatives, g =
a mf/a m `x, ... a m "x,, where Y_; m, = m, for all x, y E
[0, 1] k. Then Fr is compact, and the following bounds on H(e, . „ II II) hold.

Proposition 1. (Kolmogorov; Clements) There exists a constant M, 0< M <


oo, such that

1
(11) M e k ^' <— H(e, ^„ II lI) <_ Mr for all e>0.

We will not prove Proposition 1; see Dudley (1983) for a proof, references,
and discussion.
Since (11) implies that (8) with 1 =., holds if r> k/2, the following
corollary is an immediate consequence of Theorem 1.

Corollary 2. If'= [0, 1] k and . _ 9r with r> k/2, then Z. =7L as n -^ co


in (C(,), II II), and for the special processes of Theorem 2.3.4,

IJZ*n — Z * Il = sup IZ (f) - 7L * (f)Ha.s. 0 as


JE..
INDEXING BY CONTINUOUS FUNCTIONS VIA CHAINING 633
In the further special case of X = [0, 1] and P = Uniform (0, 1), we write
U. (f) and U(f) for Z. (f) and Z(f), respectively. In this case the conditions
of Corollary 2 become simply _ = Lip(r) = { f: f(x) —f(y) <_ Ix — yI',
x, y E [0, l]} with r>. We now summarize this.

Corollary 3. If' = [0, 1], P = Uniform (0, 1), and -9 = Lip (r) with r> Z, then
U,, U as n--*oo in (C(), II II).

Corollary 2 cannot be improved upon greatly, as shown by the following


proposition.

Proposition 2. For = [0, 1] k, P = Uniform ([0, 1] k ), the Gaussian process.


Z is not continuous on k/2 . Hence, weak convergence of Z. to Z in C(Sk , 2 )
necessarily fails.
Proof. Let f E C([0, 1] k ) and suppose that f vanishes on a neighborhood
of the boundary of [0, 1] k, but not identically, that 2f E k/2i and that
J [o I] k f dP = 0. Then c 2 = J [o.I] k f2 dP > 0.
For m = 1, 2,... let C; , j = 1, ... , m k denote the congruent cubes obtained
by dividing each axis into m equal subintervals. Let C, denote the cube
containing the origin, and set

f(mx)/m k/2 xE C1
.%(x)= t
o XE[0, I) k n C;.

Let); denote the translation of f, accomplished by translating C, to C. Then,


for s; = t1, Y-^ I S;f E "° k/2.
For every m, Z(f) = N(0, c 2/ r 2 ) are iid since Cov [Z(f ), Z(fk )] = 0 for
j ^ k and

f 2(mx) dx = Zk fzdP.
E[Z(.%)] 2 = f f dP = I
c, m m f[o,,]k

But

max{I7Lt ^^s'j'
1 I:S =^1}=maxj^ y s 7L( ;):S =f1}
' ' - j . ;

mk
= i =1 IZ(I)1
mk
= m - '` F, jZ; j where Z; are iid N(0, c 2 )
i =l

-+ a.s. c 2/ 1r as m - ► oo,
634 THE UNIFORM EMPIRICAL PROCESS INDEXED

while

as m -> O,
Il; l S' f II 2m'`^ z

and hence 1 is a.s. discontinuous on 9k/2. ❑

Proof of Theorem 1. To show tightness of {7L,, (f): f c }, we will first show


that for every e >0 there exists a S = S E such that for all n >_ 1

(a) P(Sup[l7n(f)—/ (g)i: IIf — gll^ 8 ]>E)<E.

_ y, then
A preparatory exponential bound is the starting point: if AIf —g,41 <

(b) P(I7n(.f) -7n(g)I^A)^2exp — --i) for A>0.

This follows from Hoeffding's inequality (Inequality A.4.6) since I Y,(f)-


^',(g)I^2II.t'—gII a.s.
or m = 1, 2, ... cover by N(2 -m -4 , ^ II II) sets, each of II II-diameter
2 -m-3 , and select one point from each, thereby forming a set m c 9 dense
within 2 -m-3 . For each f E , and m ? 1, choose fm (f) = fm E 3' m such that
2 'n 3 . Let Cm = H(2 m 4 , 3',
IIf—fm II -
-
- 1-
/ 2 /2 m-1
and d = max {5C,n, m-Z }.
II II) m
Thus, by (8), both Lm = , cm < and ^m dm <ao; and we may choose r> 11
so large that Ym = , dm < e/2.
We claim that S = 2 - r -3 will work in (a). To show this, we use a "chaining
argument": write

Z (J) - 4(g)I — Imn( fr)+ L (Z (Jm) n(fm 1))


m=r+l

Zn(gr) — (1,(gm)1n(gm-f))I
m=r+l

(c) ^I7Ln(fr)—Z,(gr)I+ Z IZn(f.)-1n(fm-1)


m=r+l

m=r+l

Then, since dr +2 m = , + , dm <e, and since IIf—gII S implies IIfr — gr II X3 6 <_


2 -r, we have
INDEXING BY CONTINUOUS FUNCTIONS VIA CHAINING 635
P(sup [IZn(f) - 7n(g)I: IIf — gil < 6 J> 6)
-5 P(sup [I Z (fr) — Zn(gr)I: If — g,II ^ 2-r ] > dr)

+PI sup Y- IZn(fm(f)) -7Ln(fm-1(f))I> Y_ dm)


\je3 m=r+1 m=r+1

+P( supI 7Ln(gm(S)) -1 n(gm-t(g))I > L dm )


gei m=r+1 m+1

by (c)
^ N ( 2 -r-a , `fir, II II )2 sup P()Zn (fr) —7tn (gr)I > dr)
l IIf,-g,Il 2 • -

+ N(2-m-a, J II II ) Z P (I^n(fm) —Z n(fm - JI > dm)


m=.+1

+ Y, N(2-m-4,' II II) 2P (I m n(gm) — Zn(gm - 1)I > dm)


m=r +i

II II) 2 exp ( — d / 8 • 2-Zm )


m=r

by (b) with y = 2 - m
2

=6 Y- exp [2H(2 - m -a ,^, !I Ii) -22m d,nl 8 ]


m=r

(d) =2 r exp [log3+2H(2 -m a , ^, II II)-4m d/8].


m=r

Now

(e) 2H(2-m-4, II II) ^ 4r cm c 4 m drin / 25


by definition of dm , and since dm ? m -2 , for all m ? 11

log 3—log d,,, <_ log 3+2 log m <_ 4mm -a/20
(f) :4' dm/20<4' dm, 0-0 .

Hence

log 3+2H(2 - m -a , s, Ii 1I) -4m d/8


:5log 3 — zöo 4m dam, by (e)
^ log dm by (f).

Thus (d) is bounded by 2 ZM =r dm < E, and this completes the proof of (a).
636 THE UNIFORM EMPIRICAL PROCESS INDEXED

To show that {7L „} is tight, let 0< < 1, choose E = e k = 773 -k, 8 k = S ek for
k ? I in (a), and set

A,r = (l {zeC(3): Iif— gII<-3 k implies (Iz(f)—z(g)11:541.


k=1

Thus A C(,9) is equicontinuous and by (a) we have

(g) P(7L„cA,1)>1 -2 for all n >_1.

Now if -9, is a finite set dense within 3, in ^, we can choose M so large that

(h) P(sup[IZ,(f)l:fE^,]>M)^2

for all n since E7L„(f) 2 = EY;(f) < oo. But then, with

B,r = {Z E C(f): IIzJi <_ M+ ii},

the set K = A. n B,1 is compact by Arzela-Ascoli, and by (g) and (h) we have

(i) P(7L„ E K) ? I — 77 for all n ? 1.

Thus the laws of {7L„} on C(-q) are tight, and this together with (6) implies
(9) by Theorem 2.3.3. ❑

Theorem 1 is limited by the requirement that' be compact. More general


results in this direction which do not require compact X, due to Pollard,
Dudley, and Dudley and Philipp, are stated in Chapter 21. Kaufman and
Philipp (1978) and Kuelbs and Philipp (1980) have LIL's and strong invariance
principles in parallel to Corollary 3.
CHAPTER 18

The Standardized
Quantile Process On

0. INTRODUCTION

Let X 1 ,.. . , X. be iid with df F and empirical df F. Suppose F has a density


f that is positive on (c, d), where —oo c < d <_ oo, and zero elsewhere. Let

(1) g-f(F-') on(0,l).

denote its density quantile function. Then

(2) O„(t)=g(t)v[F„'(t)—F-'(t)] for 0<t<1

is called the standardized quantile process. A heuristic Taylor-series expansion,


based on Fn' = F - '(G„') and the fact that F - ' has derivative 11g, gives

(3) O.,(t)=Vn(t)•

Thus in Section 1 we establish the weak convergence of Q. to Brownian bridge


V, in II/ II metrics. In Section 2 we establish that for "smooth” df's F on
(—oo, oc) the difference II0„ —V, goes to zero at a rate that is almost n - ' 12 ;
this is, enough to imply that important theorems of Kiefer, Finkelstein, and
so on extend trivially from V. to Q.
We take this opportunity to recall for the reader the main results established
so far for V.. In Theorem 12.2.2 we saw that

(4) lim (I4/„—B.II/lam <M<oo a.s.


n-^m

(for a Hungarian construction)

for V. represented in terms of iid Exponential (1) rv's a 1 ,.. . , a,, 1 ,...
... and
637
638 THE STANDARDIZED QUANTILE PROCESS 0.

for a sequence of Brownian bridges B. related to a Brillinger process 3 via


B„ = B(n + 1, • ). We also had O'Reilly's theorems (Theorems 11.5.2 and 11.5.3).
Thus if

(t) )
(5) T(q,A)= t -'exp( t dt forgEQ*,

then the smoothed quantile process i/„ satisfies

(6) asn-► co
if and only if T(q, A) <x for all A >0

for the special construction of Theorem 3.1.1. Also, if

(7) g(t)=q(t)( t Jog 2 (1 /t) for q E Q,


-

then

(8) 1I(V„— U) /qjj -^' 0 as n -oo if and only if g(t)-+co as tJ,0.

The Kiefer-Bahadur theorem (Theorem 15.1.2) showed that

(9) li n j4 11Un +V„I1 1 a.s.


M =
b„ log n f

for any version of U. based on an iid sequence, , ... ,„, ... .


Finally, in Section 16.4 we showed that

V. 91o2 g n
(10) I m lb,:52 a.s. fora„=
I(1 —I )°n

1. WEAK CONVERGENCE OF THE STANDARDIZED


QUANTILE PROCESS Q.

Throughout this section

(1) we consider nt , • ... S„n„ Gn, Un, U, V,,, and V — U,

for the special construction of Theorem 3.1.1. We now fix a df F and define

(2) X„,°F -'(^„;),so that X„ ; =F

We let F. denote the empirical df of these Xn , ... , X„ „.


WEAK CONVERGENCE OF THE QUANTILE PROCESS Q . 639

Example 1. (Asymptotic normality of sample quantiles) Let D denote the


differential operator, so that DF - '(p) denotes the derivative of F ' evaluated -

at p. For 0 <p < 1 we let Xn , p denote Xn:k , for any integer kn satisfying

(3) /(kn /n—p)--0 as n-400;

we call such Xn:p a pth quantile, while Xn,(np)+, is called the pth quantile.
Suppose 0 < p, <. • • <p. < I and F ' is differentiable at each p ; . Then
-

(4) (Ii(Xn:p,—F'-'(P1)),...,^(Xn:P.—F-'(p.))) d N(O, )

as n - oo, where the i, jth element of T is

(5) ter;=p,(i—pp)DF-'(P1)DF-'(p) for t<_i,j<_K.

Proof. Suppose K = 1. Then

F -i ( n:k„) — F-'(P)
(a) —F '(P))=
(Xn:p ]V n:k„ — P)
fn:k„ — P
= An^(6n:k n — P)

=A.[Vn(kn/n)+^\nn p/J

(b) = An [V'( )+0(1) by hypotheses;


n

recall G(k/ n) _ fn:k . Now

IVn( kn /n) — V(P) jVn (kn /n) — V(kn/n)I+IV(kn/n) — V(P)I


(c) a.s. 0 as n -* cc

using (3.1.61) on the first term and using the continuous sample paths of V
and kn / n -+p on the second. The difference quotient A n satisfies

(d) An ßa.,.
- DF '(P) -
as n -* oo

since F - ' is differentiable at p and

t _ r
In:k.. PI S Sn: k„ n + n — P
kn kn

IIG'III+I nn— P

(e) -'a.5.0

640 THE STANDARDIZED QUANTILE PROCESS Q„

using the Glivenko-Cantelli theorem (Theorem 3.1.3). Plugging (c) and (d)
into (b) gives, for the special construction,

(6) '[X„:p — F '( p)]


-
a.5. DF '(p)V(p)
-
under (1) and (2).

The limiting ry in (6) clearly has the claimed normal distribution. This com-
pletes the proof for K = 1.
For K> 1 we simply observe that a random vector converges a.s. if and
only if each coordinate does.
We have actually established in (6). This implies that the weaker 4 d
of (3) holds for a pth quantile of this, hence for any, triangular array of row-iid
F rv's (not just for the special construction). ❑

Exercise 1. Show that if F has different right- and left-hand derivatives f


at some F '(p) with 0<p< 1, then I (Xnp F '(p)) converge to a distribu-
- — -

tion centered at 0 with different N(0, p(1— p) /[ f(F '( p))] 2 ) densities to the
-

right and left of 0. See Weiss (1970).

When F is sufficiently regular, we have

(7) DF -'(p) = 1 /g(p) for g = f(F ')


-

where f denotes the derivative of F; then (6) implies that

(8) g(p)✓n [XX,:(,p) 1 — F '( p)] tea.,. V(p)


-
under (1) and (2).

This format is slightly handier when we extend from the vector situation of
(3) to a process.
If F has density f, then the standardized quantile process 0„ is defined by

(9) Q„= gVi[1' —F-']=gs[F-'(G) —F -'] on (0, 1 ),

where g denotes the density quantile function defined by

(10) g = f(F ')


— - on (0, 1).

Proposition 1. If (7) holds for 0 <p<1, then

(11) 0., ' r.d. V


- as n - n3.

Proof. This, of course, is just an elementary restatement of (3). ❑

Theorem 1. (Häjek; Bickel) Let 0 <_ a < b<_ 1. If g —f(F) is positive and
continuous on an open subinterval of [0, 11 containing [a, b], then the special
WEAK CONVERGENCE OF THE QUANTILE PROCESS Q. 641
construction of (1) and (2) satisfies

(12) IIQ„—VIIä-p0 asn-+co.


Proof. Applying the mean-value theorem to (9) gives

(13) Q,(t) = [g(t)/g(t)]N„(t) for some t* between t and G„'(t).

Since t* is uniformly close to t in that


(14) sup It — t i c —'II ßa.5.0,
astsb

we have that
(15) sup (g(t)/g(tn*) -1 1'$.5.0.
asisb

Thus (15), jjV n VII'as.0, and the triangle inequality applied to (13) give

(12). See Bickel (1967). ❑

Example 2. If F has a density f that is positive and continuous on a closed


interval, then Theorem 1 holds with a = 0 and b = 1. For a density positive on
[0, cc), the theorem holds with a=0 and b<1, and so forth. For a density
positive on (—co, oo), the theorem holds with 0< a <b < 1.

Extension to 11 /q fl Metrics

We are now considering the convergence of Q. to V in certain supremum


metrics on all of (0, 1). The key technical difficulty to be overcome is to keep
the t* in (13) from being too much smaller than t. To accomplish this we
recall Inequality 10.4.1. Thus, for all e >0 there exists a number b = b in E

(0, 1) and a set A having r

(16) P(A )?1—s


„f

such that

(17) bt<_Gn'(t) forallwEA „E.

We now make some smoothness assumptions on F and g=f(F - '). Suppose

I
(18) F has a continuous density f that is positive on some
(—oo, oo) having F(c)=0 and F(d)=1.
(c, d) c

(19)
I Iff(c)= 0(iff(d)= 0),thenf is 7 (is \) on
some interval whose left (right) endpoint is c (is d).
642 THE STANDARDIZED QUANTILE PROCESS Q„

The monotonicity of (19) combined with (18) implies that for t sufficiently
small g(t)/g(t) is bounded by g(t)/g(bt) so long as cv E A„ E , where g=
f(F '). Define
-

q(t)g(bt)
for 0< t
g(t)
(20) qb(t) q(t)g(1 — bt)
for4<_t<1
g(1—t)
linearly, and continuous forks t < .

Theorem 2. (Shorack) Suppose F satisfies (18) and (19). Suppose: (i) qb E


Q* satisfies Chibisov's condition of (18.0.6) for each positive b in some
neighborhood of 0; or (ii) qb E Q satisfies Shorack's condition of (18.0.8) for
each positive b in some neighborhood of 0. Then the special construction of
0„ given in Theorem 3.1.1 satisfies

(21) II (G " — -V)


+P II 0 as n -, cx (for the special construction) ;
9
here Q equals G. on [1/(n+1), n/(n+1)] and equals 0 elsewhere.
Proof. Continuing on from (13) we write

(a)
q g(t *n) q l
[ g(t) ] 4l-. -4/ + g(t) —1 ^0/

g(l*n)

Let s >0 be given. Let 1 „ denote the indicator of A n , of Inequality 10.4.1.


E
q

Then for a = a E >0 chosen so small that g is 1' on (0, a) [see (19)], we have

(b) lnE
1
:(t) — V(t) g(t) 1V^(t) — V(t)j + 2 g(t) IV(t)I
for0t<a
q(t) J g(bt) q(t) g(bt) q(t)

as our bound on (a), where b = bE depends on e. Thus

(c) 1,E11(Q *n — V) /g1Iö: jj(V —V) /qb + 2jV/9bjjö


Now our hypotheses on qb and either (18.0.6) or (18.0.8) give

(d) V
(V q ;

IV) 1 0 0.
Also Inequality 11.4.1 implies that a = a E can be supposed to have been chosen
so small that
v
I<_e.
(e) q
WEAK CONVERGENCE OF THE QUANTILE PROCESS Q. 643

Thus for n >_ some NE' we have from applying (d) and (e) to (c) that
)

(f) P(1^EII(On9 V)IIo^3E J<_2E,

implying by (16) that

(g) P( (Q" 9 ^I 3E)<_3e for n>_N` )

l flo

By a symmetric argument we have that

(h)
9
P( (Q "
1 >_3e) <_3e
1 ^a
'
for n>some N.

Applying Theorem 1 gives

P! (fin)
(i) sl<_e for n>_ some NE's.
9 0 /

Combining (g) -(i) gives

(j) PI (Q° ?3E f ^7e for n>_some Ne .


9

This is just (21). See Shorack (1972b) for the first real extension of
Theorem 1 to (0, 1). The same key methods give this theorem [note Shorack
(1979)]. However, O'Reilly's (1974) result was needed to make things work
at full power. ❑

A condition that is often easy to work with was discovered by Csörgö and
Revesz (1978). Given their Lemma 1 below, it will be trivial to check that
Theorem 3 follows from the proof of Theorem 2. Suppose f exists on (c, d)
and satisfies

(22) sup F(x)[1— F(x)] lf ^ I


f2( x) <_ some M <oo.
(x)
c<x<d

As in Theorem 2 we will assume that either

(23) T(q, .1) < oo for all A > 0, with q E Q*,

or [see (18.0.7) for the g of (24)]

(24) g(t) -+oo as t J,0, with qE Q.


644 THE STANDARDIZED QUANTILE PROCESS Q.

Theorem 3. Suppose F satisfies (18) and (22). Then the special construction
of V. given in Theorem 3.1.1 satisfies

(25) I -4P as n-+oo (for the special construction)


9 0

for all q satisfying either (23) or (24). Here Q'n equals Q. on [1/(n+l),
n/(n+1)] and 0 elsewhere.

Example 3. We now show that

(26) normal df's satisfy (18) and (22) with M = 1.

It suffices to consider when F is N(0, 1). Then f'(x) = —xf(x), and Mill's ratio
A.4.1 shows that (22) holds with M = 1. It also holds that

(27) logistic df's satisfy (18) and (22) with M = 1.

Again it suffices to consider F(x)=11(1+e -x ). Since F(1— F) = f and


f'(x)/f(x) = (1 — e x )/(I +e -X ), condition (22) holds with M = 1. It is trivial
-

that

(28) exponential df's satisfy (18) and (22) with M = 1.

Exercise 2. See if (18) and (22) are satisfied for all (i) Uniform and (ii)
Cauchy df's.

Lemma 1. (Csörgö and Revesz) If F satisfies (18) and (22), then

1) ]
(29) for all 0<_s,t^1.
g(s)c
g() snt I— svt ) M (

Proof. Our assumptions give

(a) ^d logg(t)^ t t.
t( M t) =Mdi logi

Thus for s < t we have

(b) log g(t) —log g(s) <_ +[M log (t/(1 — t)) — M log (s/(1— s))]
=M log [t(1 — s)/s(1—t)];

and just change the + in (b) to — in case s> t. ❑


APPROXIMATION OF 0„ BY V, WITH APPLICATIONS 645

Proof of Theorem 3. Now for 0: t <_ 2, on the set A., of Inequality 10.4.1
we have from (29) that
g(t ) tvG-.'(t) 1M. tv t /b ]M
g(t) I tnGn (t)I
Constants C[
(Abt

(a) = Cb -2M
Just replace the bound g(t)/g(bt) in line (b) of the proof of Theorem 2 by
the bound Cb -2M just developed in (a), and note that the rest of that proof
then carries through with q replacing q b . 0

2. APPROXIMATION OF O n BY V. WITH APPLICATIONS

In this section we consider the rate at which h O,, —V,, j converges to 0.


We again assume that the density f of F satisfies

(1) f>0 on its support (c, d)

and f' exists on (c, d) and satisfies

(2) sup F(x)[1 — F(x)] Lf2 (x) ' - some M <oo.


c<x<d f (x)
We make the additional assumption that

(3) f is >' (is \) in some interval with left end c (right end d).

Let X 1 ,. . . , Xn be iid F with standardized quantile process

(4) On = gf [F;' — F ') on (0, 1)


- where g = f(F ').
-

Then the probability integral transformation of Proposition 1.1.2 shows that


f; = F(X) for 1 <_ i <_ n are iid Uniform (0, 1). Let V. denote the quantile
process of these f,'s, and let U. denote their empirical process.

Theorem 1. (Csörgö and Revesz) Suppose F satisfies (1) -(3). Then O n of


(4) satisfies

(5) On — ßn11 = (rn) a.s.,


where

(6a) (log2 n)/'Ji <1


(6b) rn = ( log 2 n) 2/V if M= =1
(6c) (log2 n)(log n) (m- ' )( + e / f
' ) for all e >0 >1
646 THE STANDARDIZED QUANTILE PROCESS Q„

and M is as in (2). (We stress that this theorem holds for any iid sequence.
No mention is made of constructions.)
Proof. We follow Csörgö and Revesz (1978a). Now application of Taylor's
theorem shows

(7) Q n( t )— V ,,( t )= g (t) ✓n[F-'(t +n -"i2V,,(1))— F-'(t)]—V (t)

(8) =------dn(t)g(t)
V(t)g( g2
fn
2v 9 ([ )
(t*)
1 N „([) t(1— t) *It (1—t)
* S (t *) S(t)
(a) =[ - 2^ t(1 —t)JLt *(1—t*)J g(t * )J 8(t * )

for some t* = t *,. such that

(b) It — t * <_ n -1/2IV„(t)l.

We now define
9(log2n)
(c) a„= n
Recall from Theorem 16.4.1 that the first bracketed term in (a) satisfies
1 N2 ' an / 1 092 n
-

(d) lIm ! 2 <4 a.s.


✓n I (i — I) I a„

Consider now the second bracketed term of (a). Now by (10.3.10) from
A >_ 1 we have
A n = [G„`(t) is ever s to t/,t on [a,1]]
=[6(t) is ever :!:At on [a n /A, 1]] = B;
-:

and the Tim of G„ /I on [a n /A, 1] equals ß9/ A, by (10.5.12), with ß9/ , <2 for
A =2 by Figure 10.8.1. Choosing A =2 we thus have
0 = P(B n i.o.) = P(A„ i.o.);
and thus
(e) ImIIt /t *IIä„^2 a.s.

By symmetry and (e) we have


:(1—t)
2 a.s.
(f) l^^IIt*(i — t *)(Ia^' ''
APPROXIMATION OF Q„ BY V. WITH APPLICATIONS 647
The third bracketed term of (a) satisfies

(g) I t*(1—g((*)
t*)S'(t*
an
II'-an )

by assumption (2).
The fourth bracketed term of (a) satisfies
a,\ M
by Lemma 18.1.1
Ilg( t *)Ila, an— \IItA t * 1 — (tV t * )„
( ' Ila
(h) —
< 2 M using (f) and Exercise 1.

Combining (d) and (f)-(h) into (a) gives

n - M2"" 91og 2 n
(9) lim ^^^n ^n ^^ ö„ a" logt
— a.s when a n
n- 0 n

under hypotheses (1)-(3). Thus, since (0, a n ] and [1—a n , 1) can be treated
symmetrically, the theorem will follow if we can show

(i) IIQn—V, '= o(rn ) a.s. for rn as in (6)-(8).

We will first show that

a log 2 n ^
(10) li m +IV 1i 0 18 a.s.

Let 0<—t<_a n . If Cn'(t)<—t, then

9 (logz n)
(j)IVn(t)1 `.ant <—.man <-
✓n

If Gn'(t)>_ t, then for n sufficiently large

IVnit)I N ^lG ' (t)S' nIG ' (a.) — a d+Y Ila


n n n n

<— a n (1— a n ) 3 log e n +ma n by Theorem 16.4.1


9(log 2 n) + 9(log 2 n)
vn
(k) — 18(log 2 n)
✓n

Combining (j) and (k) gives (10).


648 THE STANDARDIZED QUANTILE PROCESS Q„

Equation (i), and thus the theorem, will follow once we show that
(1) li,m IjQ„ II0 - = o(r„) a.s. for r„ as in (6a)-(6c).

Now since J 1/g = F - ',

( 11 ) Q n (t)g(t)J7 g(s)-' J ds.

If t <_e„ :( „, ) , then we note from (11) and (3) that for n large enough

(m) IQ (t)I — t) = I V (t)I;

and the bound (10) applies to (m). If t> f„, then

IQn(t)I I t(1 —s) ”' ds


Vi by (11) and Lemma 18.1.1
<) Is(1—t)J

M M ( 1 \
<_(1—a„) - ,t t ds
SM)
^() (

M
(1 — a„) —

n ifM<1
1— na
(n) Wit)
log \ if M = 1
— (1 —a„
—M M

(1 M n 1 •^ t M-1 ifM>1
S n:(nt)

0( 1og2n )
ifM<1

(o) n ) O( lo g 2 n t ifM =1
< I log(a
e J n:l
^ ^

v n /

Jr an M-1
_j Or lo^nl
iif M> 1
` J
(p) = o(rn)

since (10.1.3) gives

(q) tim log (1/n „ :l) —


—1 a.s.
tog2 n
n-.00
APPROXIMATION OF O„ BY V. WITH APPLICATIONS 6149

with the implications that

log (an/en:1)
(r) lim = 1 a.s.
n-= l ogt n

and

(s) Q
r " = O((log n)) for alle > 0.
Sn:l

This completes the proof. We note that our (s) gave us a slightly better rate
than Csörgö and Revesz obtained in case M> 1. ❑

Exercise I. Use (10.3.10) with A =2 and the ß9 x limit of (10.5.12) and Figure
10.8.2 to show, in analogy with the proof of (e) in the previous theorem, that
lim n —,!It/tII ,,2 a.s.

Exercise Z. Show that if 0<f(c) <o0 [if 0 < f(d) < 00], then monotonicity of
f in a neighborhood of c (of d) can be dropped in Theorem 1. [Begin with
line (11) of the proof.]

Some Easy Applications

The following applications of Theorem 1 are also from Csörgö and Revesz
(1978a) and (1981).

Theorem 2. Suppose F satisfies (1)-(3). Then


ijallu"^'Q"II= 1
(12) liIn n a.s.
n-- bn logn f

for 0 n as in (4). As usual b n = 2 loge n.


Proof. Now

(a) Uri+On=(Un+vn)+(On—Vn)

where the Kiefer-Bahadur theorem (Theorem 15.1.2) establishes the order of


the first term in (a) and Theorem 1 establishes the negligibility of the second
term. Basically,
1/a
n
(b) bn lo g n =o(1)

for all choices of r" in (6a-6c). ❑


650 THE STANDARDIZED QUANTILE PROCESS U.

Theorem 3. Suppose F satisfies (1)-(3). Then

(13) Q,,/b,, ' a.s. wrt II

for the collection 9' of functions in Finkelstein's theorem (Theorem 13.3.1).


Proof. Again,

(a) on—vn+on—vn
b„ b„ b„

where Theorem 13.3.1 applies,to 4/„/b„ and (0 „- 4/„)/b„ is negligible by


Theorem 1. ❑

Theorem 4. Suppose F satisfies (1)-(3). Then an alternative to (12) is

(14) li m io n +0$„ ^ ^
1
log n) 2 (log 2 n) „

J <_ some M < a a.s.,


n -oo /{( n

where

(15) Bn=^(^')=U

is defined in Theorem 12.1.1.


Proof. We write

(a) lion +B,, l ion —v,, + IUn + V ,, + Il IB n —U nII,

where the middle term gives the rate. See Theorem 1, Theorem 15.1.2, and
Theorem 12.1.1 for the rates of the three terms in (a). ❑

When (12) and Exercise 2.9.1 are combined with Finkelstein's exercise
(Exercise 13.3.1) and Mogulskii's exercise (Exercise 13.5.1) they yield

J0(r) dt 1 f' 1 Z
(16) lim bZ = ^2 a.s. and lim b„ J „(t) dt =4 a.s.,
n+0
n o

as well as a myriad of similar results for functionals of Q. other than Jö ( ) 2 dt.


We could also obtain Berry- Esseen -type results (the rate n - ' 12 is missed
by various logarithmic factors) for functionals of Q „.
For extensions of the work of Jaeschke (1979) and Eicker (1979) in Chapter
16 the reader is referred to Csörgö and Revesz, 1981, Chapter 5.
APPROXIMATION OF Q„ BY V. WITH APPLICATIONS 651

Exercise 3. Let F satisfy (1)-(3). If in addition both I f"J is bounded and


(fl is bounded away from 0 on some finite interval (e, d) c (c, d), then
n ilz
(17)
lyrnlogzn IIB"+Q"jjd' some K >0 a.s.
F

Thus Theorem 1 leaves little room for improvement (see Csörgö and Revesz,
1981, p. 1953).

A Comment on the Csörgö—Revesz Condition

A function f is called slowly varying at t =0 if for every c> 0 we have

i(ct) -

(18)
.
1 as t-^0.
f(t)
If

(19) g(t)= t°f(t) with f slowly varying at t=0,

then g is called regularly varying at t = 0 with exponent a. (Nice general


expositions are found in de Haan (1975) and Seneta (1976).)

Exercise 4. (Parzen, 1980) If g = f(F ') and -

'(F-' (t)) g'(t) _T ao at 0


t(1— t) fz (F -i l
(t))^
I ö t(1— t)
(20) lö i
g(t) — a1 at 1,
then

(21) tog(t) and t -° 'g(1— t) are both slowly varying at 0.

Moreover, if a o v a 1 < 0, then all absolute moments are finite; while if a s v a, > 0,
then

(22) EIXI'<co ifr<


a 0 v a,

A SLLN for the Quantile Process

Exercise 5. (Mason, 1982) Suppose F ' is continuous. Then for each r,,
-

rz > 0

II I'ß(1— I)'2(F' — F ')II


-

(0 a.s. if E(X - )' l 't<oo and E(X + )'l'2<oo


(23)
t oo a.s. if either expectation above equals oo.
652 THE STANDARDIZED QUANTILE PROCESS Q.

3. ASYMPTOTIC THEORY OF THE Q-Q PLOT

The Standardized Q-Q Process

Let X 1 ,. . . , Xm be a sample from F with empirical df F m , and let Y, , ... , Y„


be a sample from G with empirical df G „. We suppose F and G are continuous.
Define

(1) A_G-' °F -I and &mnfi'°gymI

Doksum (1974) indicates that if the X 's are controls and the Y's are
;

treatments, then A(x) can sometimes be appropriately regarded as the amount


the treatment adds to a potential control response x. One might then ask if A
is positive for all x (the treatment is always beneficial) or for what x is it
positive (when is the treatment beneficial), or does a confidence band for A
contain a horizontal line (is Y = µ + X) or any line at all (is Y = µ + oX )?
Form the standardized Q-Q process

(2) Dmn= g°G -' °F m+n [G '°f m - G ' °F] - on(-co,co)

(3)
_
- g°G -^ °F
mit
m +n(m„
D) on(-,);

here g denotes the density of G. Note that

mit
(4) ®mn— g °G'°F m "i:m - G -`°IFm +G -' °Fm-G'°F]
+n[Gn
_ , _,
° m - G (^m)]
-g°G °F m +n[G
G -I ( F m )- G '(F) mit -

+ g ° G ' °F [i=m-F]
t=m -F m+n

(5)m+n
On(F)+vm(F);

here 0, denotes the standardized quantile process of Y's defined by

(6) On= g° G 'v[G^'


- - G - ']
(7) =4/„ = [the uniform quantile process of G(Y,), ... , G( Y,,)]

and

(8) Um(F)=i[Fm -F]


ASYMPTOTIC THEORY OF THE Q-Q PLOT 653
denotes the empirical process of the X's. Thus, for independent Brownian
bridges U and V,

(9) ®mn—\ v(F)+ \ OJ(F) ifmnn->oo


(10) =Wmn(F) for Brownian bridges W mn

using Exercise 2.2.6. The following theorem is now routine; see Doksum (1974)
for an early version without a q function on an interval [a, b] F [0, 1], and
Aly (1983) for this version.

Theorem 1. (Doksum) Suppose F is a continuous df and suppose G and


q satisfy the hypotheses of Theorem 18.1.1 or 18.1.2. Suppose also that
0 <A < m/ (m + n) < 1—A <1 for all m, n and some A. Then a special construc-
tion of Xmt , ... , X,„ m , U, Y.,,..., Ynn , V satisfies

mn(F`)Wmn]
II -*„0 as m n n-+oo;
(11) (I[® q

here ®m equals Dmn on [1/N, 1-1/N] and equals 0 elsewhere.


,,

Exercise 1. Write out the details of a proof of Theorem 1.

We will not attempt to form confidence bounds for A based on Theorem


1, inasmuch as the function g°9 - ' in (2) is unknown. The situation is similar
to using the sample median X. Thus f(Xf —i) -^ d N(0, [4f 2 (x)] - '); and
before one can test a hypotheses such as z=0, the quantity f(x) must be
estimated. Instead, one uses the sign test statistic which is N(0, 4 - '), and
whose power depends on f(z). We take up the analog of the sign test in the
next subsection.

Confidence Bands for A

Let

m n
(12) Hmn= m+n Fm + m+n f'n

be the empirical df of the combined sample, and define

(13) T'" n —
mit (Fm(x)—Gn(x)I with q(t) - t(1 — t).
m+n .. up^^n q(^ mn (X))
654 THE STANDARDIZED QUANTILE PROCESS Q„
Now note that (using Section 1.1)

G (0 +I)=G„(G ' ° F) -

i n in
= nl ^ 1 [Y G - '-F( ) ] = n j L t 1[F- '(G(yj)),5.] a.s.
(14) = t„(= the edf of a sample of size n from F).

Let K.„ = K„,,,(1— a) denote the 1— a percentage point of T.„; thus

mit IF.(X)—Gn(x)I
(15) 1— a=PF_c1\
m +n a < up)b[O^ mn (x)(1 — H mn (x))] 1/2 ^ Kmn

Now F, G, and q are such that

(16) Tmn-->pT=IIU/gllä asmnn->a),whenF =G.

Thus

(17) Kmn (1—a)- +K =K(i—a) where P(T<K)= 1 —a

defines K.
Define

m mit K2
N=m+n, A =N, M=--, c=M,

u = F -(x), v=Grt(X),

and now rewrite the event of (15) as

(u — v) 2 <c[du +(1 —A)v] {1 —[Au +(1 — ,1)v]} for a<u<b,

and then as d(v) ^ 0 where

d ( v ) =[1+c(1 —A) 2 ]v 2 —[2u—c(1— A)(2Au -1)]v


+[u 2 — CAU+CA 2 u 2 ].

Since the coefficient of v 2 in d (v) is positive, d( v) < 0 if and only if v is between the
two real roots of d(v) =0. These roots are

u +Zc (1 — A)(1 -2Au)± 1 [c 2 (1— a) 2 +4cu(1— u)]" 2


(18) h u _
( )

1+c(1—A)2
ASYMPTOTIC THEORY OF THE Q-Q PLOT 655

We can thus rewrite (15) as

(19) 1—a= P(h (Fm(x))^On(x)^h + (Fr(x))fora<_Fm(x)<_b).

Continuing on, applying (14), gives

h(F(x)) z5Gn(0(x)+x)c h+(Fm(x))


l—a=PFG
( for aF,n(x)25b
G ' ° h(Fm(x)) — xC 0 (x)^Gn ' ° h+(Fm(X))—x
(20) °PFa (
for aFm(x)<_b

when G„' is the usual left-continuous inverse and Gn'(t) = sup {x: G (x) <_ t}
is the right-continuous inverse. El

Exercise 1. Verify (20).

Of course,

(21) a level (1 — a) distribution free confidence bound for A is given by (20).

This is from Doksum and Sievers (1976); see Switzer (1976) for q= 1.

Let (x) denote our usual greatest integer less than or equal to x, and let
« x » denote the least integer greater than or equal to x. Then the band (20)
can be rewritten as

[Amn(x)r A'mn(x)) — [ Yn:(nh - (i /m)) — x, Yn:« nh + (i /m)»+1 — x)


(22)
for Xn:; < x < X n , ;+ ,

on a <_ F m (x) <_ b (use X n . 0 = —oo and X n:n+ , +(X if need be). The W band
(q(t) = '/t(1 — t)) of Doksum and Sievers (1976) and the S-band (q(t) = 1) of
Switzer (1976) are shown in Figure 1 (from Doksum and Sievers, 1976). They
found that the W band performed well in the tails, and recommend it.
The width of the W band at x, when multiplied by mit (m + n), is equal
to [see (20)]

(23)+ n [G,-' ° h + (Fm(x)) — Gn' ° h (Fm(x))]

2K F(x)[1— F(x)]
(24) ^ p , uniformly on [a, b]
g° G - (F(x))
656 THE STANDARDIZED QUANTILE PROCESS Q.

g S• .

4 S . `'
D
2 _ r'

- 2r w —3

— 2 —1
s ,

0 1 2

Figure 1. Synthetic data: the level 0.95 S band (solid line) and the W band (dashes).

provided

(25) G has density g that is strictly positive on (a, b).

Exercise 2. Prove (24). Hint: The key steps are summarized by

[the width (23)] = + n [G"' 0 h + (Fm) — G - ' 0 h + (Fm)]

0 h (Fm) — `-' - 1 0 h ( F m)]

° h + (Fm) — G ' (Fm)] —

+ rn +n [G'

m+n [G —' ° h (Fm) — G ' (Fm)]


_ N(F)—N(F) 2K F(1—F)
{26) m+n goG - '(F) + g °G-^(F) .

Note that the first term in (26) is zero. The second term in (26) results when
the lead term u in both the h and h of (18) cancel when h + —h - is formed,
and mnJ( m+n) times the t2[ +4cu( 1 — u)]'' 2 term of both h + and h
contributes K F(1— F)/g o G(F); the other terms in h + and h - all wash
out since mit /(m +n)c -'0.

Exercise 3. Derive formulas analogous to (22) and (24) for the S bound. In
particular, show that in case q(t) = I for all t, then

[i(x), llmn(x)) =

[ Yn:(n(i /m —K,,,,,(t—a)/Jmn /(m+n))), Yn:«n(i /m +K m „(1—a)/Jmn /(m+n)) »]

for X, x <X, :j ,,0 :5i^ m,


THE PRODUCT-LIMIT QUANTILE PROCESS Y ,, 657
and

(27) — A-,n] G K {F) II ° 0 as m n n- co.


m+n [tmn


Of course K mn (1— a) and K now refer to percentage points of
mit/ (m + n) F m —6,, j) and IJ U JI, respectively.

4. WEAK CONVERGENCE = OF THE PRODUCT-LIMIT


QUANTILE PROCESS Y.

The quantile function or inverse distribution function corresponding to the


product-limit estimator IF„ is

(1) „'(t)inf{x:„(x)>_t} for0t<F(T).

Note that we cannot hope to estimate F '(t) for t> F(r) since all the observa-
-

tions are <—T. Assuming that F has density f, the standardized product-limit
quantile process is

(2) V n = gf [' — F - '] on ( 0, F(T)),

where g = f(F ') is the density quantile function.


-

The appropriate limit process for V,, is

(3) V _ —X(F) = —(1— I)S(B) on (0, F(T))

where B = D(F '); see (20.4.12) for the definition of X. Thus V is a mean-zero
-

Gaussian process with covariance function

(4) Coy [V(s), V(i)] = (1— s)(1 — t)B(s A t)

with

(5) B(t) _ (1— F) -2 (1— G) - ' dF

J [a,:]
(1— u) -2 (1— G(F '(u))) ' du.
- -

Theorem 1. (Sander) Suppose that g —f(F) is continuous and >0 on


[0, 0] where 0 < F(,r). Then

(6) Il^'n—Vllö ßa.5.0 as n--oo

for the special construction of Theorem 7.1.1.


658 THE STANDARDIZED QUANTILE PROCESS Q.

Corollary 1. For 0 < p < F(T), under the assumptions of Theorem 1,

(7) ✓n[f.'(p)— F-'(p)] *dN 0, ( asn -goo.


\ f(F (P)) /

Proof. By Theorem 7.4.2 it follows that

(a) L(F-')=V'[tn(F-')—I]

satisfies

(b) 1Ivn(F-')—X(F-')IIö-*a.5.0 as n r

for 9 < F(r). By Vervaat's (1972) lemma (see Exercise 1 below), (b) implies that

(c) Vn' [F()—I]

satisfies

(d) IIvn YIIö ^a.s.0


— as n-*cc

where V —X(F - '). But, by (d /dt)F - '(t) = 1 /g(t) and the mean-value
theorem, it follows that

Vn = —
F, ( gYn

= F -'(F(f^'))— F -'(I) gV„

F(i -) — I

(e) 8(I*) V„ where I1In —II1ö- IIF(')—I11ö -4a.s.0


n

by (d). Thus, as in the proof of Theorem 18.1.1,

(f) 1
11'n—Yn11öC g(In) -1 , 11 ly 11 _...0 n 0

by (d) and the continuity and positivity of g. Combining (d) and (f) yields

(g) IIYn—YItö ^e.s. 0 as n-*ao

which implies (6). ❑


THE PRODUCT-LIMIT QUANTILE PROCESS V. 659

Exercise 1. (Vervaat, 1972) Suppose that {x n } is a sequence of nonnegative


/ functions on [0, 1 ], that as n -* oo, and that

(8) Ilan(xn—I) yII -> 0 as n -00,

where y is a continuous function on [0, 1]. If x (t)= inf{u: xn (u) >_ t }, show
that

(9) Ilan(x„' -1)—(—y) Il -*0 as n->oo.


CHAPTER 19

L- Statistics

0. INTRODUCTION

An L- statistic is a linear combination of a function of order statistics of the form

( I /
( 1 ) Tn L Cnih(Xn:i)
n i.,
--

for known constant c„ and a known function h. In Section 1 we establish


;

asymptotic normality of T„, as well as a functional CLT and LIL for both
T,, ... , T. and T„, TT+ The proofs are all contained in Section 4. General
,, .. ..

examples are given in Section 2, and examples built around randomly trimmed
and Winsorized means are considered in Section 3.

1. STATEMENT OF THE THEOREMS

Introduction, Heuristics and the Main Theorem

Let X 1 . . , X„ ... be iid rv's with arbitrary df F and empirical df I n . Let


,.

Y_
c„,, ... , c„„ denote known constants and let h denote a known function of the
form h = h, — h 2 with each h i T and left continuous. Consider

in /
(1) Tn=— Cnih(Xn:i)
n ;_i

which is a linear combination of a function of the order statistics X „,, C


Xn: „, or an L- statistic. Then

(2) T„
f
= o ,
h(Fn')J„
l
0
dt = h(F) d'Y„

660
STATEMENT OF THE THEOREMS 661
where we define

(3) J(i)c1 _i<_n,


for(i-1)/n<t—<i/n, 1<

with J„(0) = c,,,, and

(4) ''Y„(t)= J ' J„(s) ds


1/z
for0<t<_1.

Note that c„ / n=T(i/n)—'Y((i -1)/n) for 1:i<—n. A natural centering con-


;

stant (provided the integral exists) is

(5) µ„
= fo, h(F - ') J„ dt = J 0
, h(F - ') d 1'',,.

Provided J,, -+ J in some sense, we would have greater interest in


s
(6) µ= h(F - ')Jdt= h(F) d'Y'
0 0

where

(7) 'I'(t) = J r J(s) ds for0 <t<1


i/z

(assuming '1' and are well-defined integrals).


We now suppose that the Xi 's are of the form

(8) Xi = F '( fi) -

for iid Uniform (0, 1) rv's . Let 6„ and U. denote the empirical df and
empirical process of f,, ... , i,,. Define

(9) g = h(F - ') on (0, 1).

Then [we will verify step (11) below in the next section]

=J 1

0
g(G )dVn — gd'`P,,
0

(10) = f , gd[`Y„(G,,) — 'Y,,]


0

(11) =—f [%P,,(G„) —4 .] dg


0
1 under Assumption 1 below


662

(12) _—
f ol
[6,, —I ]J dg (note the approximation sign)
L- STATISTICS

1 n i
[I11—
, t]Jdg
n i=i o
n

n
(13)
_,
=1^Yr° „•
i r.

Thus, the key to this approach is to control the size of the error made in using
the approximation of step (12). That is, we need to control the error
n
( 14 ) Y, =Tn E Y, = T
— µn -1n ;_, . — fA. — n
n

In particular, it is clear that demonstrating

(15a) o➢ (1) yields a WLLN if EI YJ <oo


(15b) o(1) a.s. yields a SLLN if E^ Y) <oo
yn =
(15c) op(n-''2) yields a CLT if EY 2 <o0
(15d) o(bn/Vi) a.s. yields a LIL if EY 2 <oo (b„ = 2 log 2 n).

We will also be interested in establishing functional versions of the CLT and


the LIL. Note that Y; in (13) contains the influence function.

The Assumptions

We need to formulate assumptions sufficient to rigorize the argument sketched


in the introduction above. We follow Shorack (1972a).
AssuMrrION 1. (Bounded growth) (i) Suppose

(16) IJiBand all IJ $ Bon (0, 1)

where

(17) B(t)=Mt -b l(l —t) -e= for 0<t<1 with (b 1 vb 2 )<I

is the scores bounding function. (ii) Suppose

(18) h = h, — h 2 with h,, h z 7 and left continuous on (—cc, cc)

and
(19) h1(F -1)I <Don (0, 1) for i = 1, 2
STATEMENT OF THE THEOREMS 663

where

(20) D(t) = Mt -d ß(1— t) -d= for 0<1<1 with any fixed d,, d2

is the tail rate function. Let

(21) a = (b,+d,) v (b 2 +d2 ).

(For a CLT or LIL, we will require a <; while a <1 will yield a SLLN.)

Associated with g=h(F ') is a Lebesgue-Stieltjes signed measure and


-

integral ,( dg; integration with respect to the total variation measure will be
denoted J dlgI.

Remark 1. Suppose g>_0 is \, and has Jö g(t) dt < oo. Then tg(t) ^
1.g(s)ds-+0 as t-->0.Thusforh J', ;

(22) EIh;(X)I'<x

implies

(23) Ih 1 oF - 'Is[I(1—I)] '/'d, (t) where 4(t)-+Oast-^0or1.

Lemma 1. Suppose Assumption 1 holds. Then

(24)J 0
[t(1— t)]'B(t)dlgl(t) <oo ifr>a.

Thus the ry

(25)Y=— J [1^ X 11 —t]J(t) dg(t)


0

of (13) satisfies

(26) EY = 0 if a < 1,

(27) Var[Y]=o 2 o 1 [snt—st]J(s)J(t)dg(s)dg(t)<oo ifa<2,


fo fo
and

(28) EIYI 2+s <oo forall0<_S< )-2ifa<2.


( a1

All proofs for this section are grouped together in Section 19.4.
664 L- STATISTICS
AssuMvrlox 2. (Smoothness) Except on a set of is of Igo-measure 0 we
have both J is continuous at t and J. -* J uniformly in some small neighborhood
oftasn - oo.
ASSUMPTION 2'. (Smoothness) Suppose

(29) J has a continuous derivative on (0, 1)

satisfying

(30) IJ'(t)I<_ B(t) for0<t<1


t(1- t)

for the scores bounding function B of Assumption 1; and suppose

1-
n 1,n ,1 <—i <n
l
(31) c,, ; =J(t) for somete

[The interval in (31) could be widened to width Min. but this should be
sufficient.]
ASSUMPTION 2". (Smoothness) Suppose J is Lipschitz in that

(32) IJ(s)—J(t)I_<MIt—si forall0s, is 1,

and suppose the c,, ; satisfy (31). (This is a special case of Assumption 2.)

The Main Theorems

Theorem 1. (CLT) (i) Suppose Assumptions 1 [recall (23)] and 2 hold


with a<. Then [see (27) for Q Z ]

(33) '(T,, — j ) a — J^JUndg 3 aN( 0 ,i 2 )•


0

(ii) Suppose Assumptions 1 [recall (23)] and 2' hold with a <Z. Then

(3 4 ) Vi(TT — ;i)— J ' JU dg d N( 0 , i 2 )•


0

(iii) Suppose Assumption 1 (with b, = b 2 = 0 and d, = d2 = a <) and


Assumption 2" hold. Then

(35) v (TT — /A) — JJU dg +dN(0,o )


-

0

STATEMENT OF THE THEOREMS 665

and
(36) ../ii (,, — µ) ^ M[EI h,(X )I + EI h2(X )I ]l.%n.

(iv) If J equals 0 on [0, a] and [b, 1] for some 0< a <b < 1, then Assump-
tion 1 may be omitted entirely in (i) -(iii) while Assumption 2, 2', or 2" need
only hold on [a, b].
(v) In all six cases above, the ry f ö JU „ dg satisfies

(37)
o
Jv„ dg = JU dg
a fo'

for the special construction of Theorem 3.1.1.

Theorem 2. (LIL) Let a <2. Let b„ = 2 log e n. Then


(38) f(T„b — n
N)
,
a.5. [—Q, Q] as n -^ o0
n

or
(39) '/i( bn — µ) ms [—u, a]
as. asn-400
n

in all six cases of Theorem 1.

Theorem 3. (SLLN) Let Assumption I hold, but now with a < 1. Then

(40) T„ — µ„ -* a.s. 0 as n -* o0

or
(41) T as.
T N as n -* o0

in all six cases of Theorem 1.

Functional Versions of the Main Theorem

We define the L-statistic process of the past by

k(Tk —µ k ) k-1 k
(42) 7L„(t)= for <t—,1<_k<_n,
fo, n n

with 7L(0) = 0, and the L-statistic process of the future by

1(1)_ T Irk) n
(43) for n <t-- k?n
U k+1 k'
with ß l„(0) = 0.
666 L-STATISTICS

Theorem 4. (Sen) Suppose Assumptions 1 and 2 hold with a <Z.

(i) (CLT) Both

(44) Z Sandi'Son(D,9,^^
= 11) asn-oo

for Brownian motion S.


(ii) (LIL) Both

(45) Z n/ bn a.s. ✓L and 71 nl bn — a.s. •%f as n -* oo

for b„ = 2 log e n and Strassen's class 5 of functions [see (2.8.5)].

In Chapter 2 we showed that both (44) and (45) hold for the partial-sum
processes of the Yk 's of (13) defined by
1 (n!) Yk I
(46) S„ - 1 0' (the past)

with S(0)=O and


_ ("h') Yk _
S S ( „,, > (the future)
(47) (n/I) k=1 0 Q(n/ j)

with S(0) = 0. Thus Theorem 4 is an immediate corollary to the following


Theorem 5, and rewriting (14) in the forms

(48) k(Tk —{A,k )I'1 =Sk /.in+kyk / ✓n for 1sk<n

and

(49) V /r (Tk — /d k ) = ✓ n Sk / k + s yk for k >_ n.

Theorem 5. Suppose Assumptions 1 and 2 hold. Suppose

(50) a=[(b,+d,)v(b 2 +d2 )]<2.

(i) (CLT) We have the functional CLT for the past

(51) max klyk) -> p 0 asn-oo


Irk<n

and the functional CLT for the future

(52) max VIy k I ß p 0 as n-+oo.


STATEMENT OF THE THEOREMS 667

(ii) (LIL) Let b„ = 2 log e n. We have the functional LIL for the past

(53) max kl ykt -^ e s 0 as n - oo,


I ksn sJbn

and the functional LIL for the future

(54) max
ka:n
6J yk as ^
.
0 as n .

(iii) (SLLN) Even if we weaken (37) to a < 1, we still have

(55) yn -^ a . s . 0 as n oo.

(iv) We can replace y„ by y„ in (i) -(iii), when y„ is defined by (59) below.


(v) If J equals 0 on [0, a] and [1-b,!] for some a, b>0, then Assumption
I may be omitted and Assumption 2 need only hold on [a, b].

Of course we are interested by replacing µ„ by µ in the theorems above.


One approach is to consider µ„ - µ directly. Another approach is to consider
T„ - T, where T„ is the special case of (1) described in Remark 2 below.

(56) T„ Y_
Remark 2. Consider the special case of (1) defined by

_—
n
I n

i_ 1 c1 —urn
c„ h(X„ )
; :; where 4 /n = J , „
;
j
J(t) dt

„')J„ dt = f
r
J g(G„) dI'
i
= g(G ol g(G n )Jdt =
-

0 0

(57) = I g d 4r (G ),
0

so that [under Assumption 1, as in (11)]

(58) f,,-p= J gd [T(On) - `p]


0

_ - ["I'(G„) -'Ir] dg under Assumption 1.

In this case we must control y„ defined by

(59) yn= T„ - µ -i Y- Y =T„-µ- °. ;

n ; _ 1 n
668 L-STATISTICS

Thus for the special scores c,, ; of (56) we have µ„ = p, and (33) could be of
direct interest to us in this case.

Theorem 6. (i) Suppose Assumptions 1 and 2' hold and suppose

(60) ;
c =J(t) for sometc t n
-1 i
, n ,1<i<n.

Then

(61) -µ1 :sMn -(112 as n-0o if a<1

and, for any 0 <- S < Z - a,

(62) n(T„-T„)=O(n -s )a.s. asn - n ifa<1.

(iii) If J is Lipschitz, EIh i (X)I <cao for i= 1, 2 and (60) holds, then we
can even claim that

(63) M/ Ii for all n for some M < oa.

Our theorems, as they apply to the conclusion /i ( T„ - A) = a -(ö JU dg,


are due primarily to Shorack (1969, 1972a), where the above hypotheses and
the fundamental integration by parts were developed. Through Mehra and
Rao (1975) and Wellner (1977a, b), in-probability linear bounds were replaced
by a.s. nearly linear bounds making a LLN and a LIL for T. possible. Through
Boos (1979), the endpoints in the integration by parts were handled slightly
smoother. Sen's inequality (Inequality 3.6.3) allow these same proofs to extend
to functional versions. Note that Theorems 1-3 are corollaries to the theorems
of this subsection. Other approaches that yield theorems approximately equal
to Theorem 1 are found in Chernoff et al. (1967) and Stigler (1969).

Remark 3. A careful look at the proofs shows that extending from J^ -> J to
a random function J„ - J can typically be adjusted for in the proof of asymptotic
normality with only minor changes.

Better Rates for Embedding for T, with Extensions to T., Based on the
Hungarian Construction

To establish the following less important result, we require the special scores
(56) used in T, as well as the added smoothness Assumption 2' on J.

Theorem 7. Suppose Assumptions I and 2' hold with a < 2. Then

(64) ' max i kJ k I/b , = 0(1) a.s.


.

STATEMENT OF THE THEOREMS 669
It is also true that

(65) max kIyk I/bk = 0(1) a.s.


k n

Remark 4. Let Y denote a (0, o- 2 ) rv. Komlds et al. (1975) showed that if Y
has a finite moment generating function in a neighborhood of 0, then there
exists iid rv's Y?, Yr, ... , Ye,... distributed as Y and a Brownian motion
S on [0, oo) for which

E;nt ; Y' S(nt) _ log n


(66) lim sup ^ - O a.s.
n-.co O l l QV n N n vn

This is the best possible rate of embedding. If we only have Ej Yj' <x for
some r> 2, then the rhs of (66) is to be replaced by O(n - ( ' /2) + 1) ) a.s. If we
only have EY 2 <OD, the rhs of (66) can be replaced by either o p (1) (for a
functional CLT) or o(b) a.s. (for a functional LIL).

Remark 5. If we plug (64) and (66) into (48), we see that if Assumptions 1
and 2' hold with a <', then

(nt)(T(n,)-p. ) S(nt)
lim sup I
n -+coOstsl ^o, vi

Y* _ S(nt)
I a.s.
,ntt
n(67) -= i ii su p S a.s.
n-.ao
Otss l .'/ j Ø

for the KMT construction of rv's Y*, ... , Y n*, ... of (13); this holds since (64)
implies that max t .5 x < n kIyk l/.fi= O(b /v)= o((log n)//i), which is better
than the best possible rate O((log n)/s) of (66). The appropriate rate for
the rhs in (67) is obtained from the Y; 's from Remark 4. The key point is that
the rate is determined solely by the Y; 's of (13) or (25), and not by the y ; 's.
If we wish to replace Tn in (67) by Tn , then we also use (62) to determine the
appropriate rate that is possible. [Equation (65) is just an unused observation.]

A Brief SLLN

If h is the identity function, then for r, s> 1 having r - ' + s - ' = 1 we have

(68) ITn - /-^nl =J Jn( 0 „' -F- ')dt ifh =l


t t/. t
IJJI'dt] n ' -F 'I dt
f0
S ]"'

<_ [Jo IF
(69) 0 a.s. if h = I
670 L-STATISTICS

prcvided that for the r, s> 1 we have

i n 11
(70) lim -E jcn; j' <e0 a.s. and E(XI s <oo with-+-=l.
n-^ n ;_, r s

Note that this does not require any limiting function J to exist. For the proof,
merely apply the obvious analog of (2.6.4) or (2.6.8).

2. SOME EXAMPLES OF L-STATISTICS

Estimates of Location

Suppose

(1) Pn: Xn,= Eng= F-'(fni)=F for 1si^n.

for the special construction fin ,, ... , U of Theorem 3.1.1. We let

(2) µ = Es = EF - `(e) = J 1 F - '(t) dt


0

(3) µ = F '(2) = (a median of s)


-

(4) U2=Var[E]=Var[F-'(e)]=j [F - '(t)] 2 dt-I

when these are finite.


oi L
( f F - '(t)dt

Example 1. (The sample mean X n ) Note that

(5) .%n(Xn—%6)=.'.(En—ib)= I F -' du n


J 0

J
1

(6) = Tmean = F ' d U, where 0-2 < oo, by Theorem 3.1.2


-

(7) = N(0, u 2 )

since h=F - 'E92 and QI[h]^ 2 -h^_cr 2 . ❑

Example 2. (The sample median X n ) In Example 18.1.1 we learned that if


F - ' has derivative DF - ' (z) at t = Z and if kn is an integer satisfying ✓n (kn / n -
1/2) -^ 0 as n - oo, then (note that any reasonable Xn may replace Xn:k. )

(8) %/n(Xn:k„ N') = N/ (En:k„ I2)


(9) =DF-'(z)Vn(i)
a
=a - DF - '(i)Un(z)
SOME EXAMPLES OF L- STATISTICS 671

(10) =Tmed DF—'( )U(2)


a

U( , ) when F has a positive and continuous


(11) __ 1
f(µ) z derivative f in a neighborhood of µ
I

(12) f
%(N) l/2dU =i(µ) 0 1[1
/2,l]dU

( ❑
(13) N 0'4l2(µ)/.

Example 3. (The sign test) The sign statistic (centered at µ) can be represen-
ted as

"[F ( ) — F(/Z)] = U„(F(j))


=U(F(µ))
a

(14) = T51g „ = U(z) if F is continuous at µ.

Comparing Examples 2 and 3, show why TT;gn is preferable to Tmed in testing


situations; T„ 1ed must be Studentized by a complicated estimate of f(µ,), while
T51g „ appeals only to the binomial distribution. ❑

Example 4. (Joint behavior of X„ and )„) Combining Examples 1 and 2


gives

F
(15)
%n (X„ µ x ) a 1
fo ,
-'dOJ
o2 <oo and F has
derivative f as in (11).
' dU
fl/2

Note that the covariance of the limiting ry is (let h, = F ' with h, = µ, let
-

h 2 — 1 [,/z,1]/f(µ) with h 2 =/f(/L), and compute 0 h,,h 2 = (h, — h,, h 2 —h 2 »

' dt—µ /2 fµxdF(x)—µ /2


(16) Coy = 1'/2F
(.Z) f(ja)
Then
^z J, x dF(x) — µ/2
.f(N-)
(17) Z = [ 1

* 4f2(1Z)

denotes the covariance matrix of the limiting rv's in (15). ❑


672 L-STATISTICS

Suppose F has density f on (—cc, cc) with finite Fisher information for
location 10 (f). As an alternative to the P. of (1) we consider

(18) Qn: Xn; = b/ ✓n + e n; for 1 <_ i <_ n.

Then the likelihood ratio statistic L„ for testing P. vs. Qb is


b/✓n)
(19) L b _ log f(Eni{' —

i 1 f(r1)
and according to Theorem 4.5.1, when the null hypothesis P n is actually true,

(20) I supB ILn—bZn —ib 2 I0 (f)I- PL O as n + cc, provided 10 (f)<oo,

where

(21) Zn _ —1 Y- f (En;) = 1
vI1 i =I JJ
E ß0(m1)
(E V ^! 1=1
with 1O
0

F1
Jo F —

(22) =Zoe= f 1 ßo dU
a

(23) = N(0, Io (f)) when Io (f) <co.

Example 5. (Joint behavior of X n and Xn under Q.) Note that


1 W

Coy [ Tmean, Z1Oe ] = F ' ¢ o dt = —


- xf (x) dx
o —^

=— J O x df(x)=xff + J ^ f(x) dx
(24) =1

and
1 1
Coy[Tmed, Z1oc]_ ,' 4odt=— f(x)dx
f(l-^)f 12 f(L) Jµ
(25) =1.
Thus Theorem 4.1.4 (Le Cam's third lemma) gives

as n -4 oo under Qn
(26) L ✓n Xn — µ I -* d N[ [ ], ]
provided Q 2 < oo and II (f) < o0
for the asymptotic joint behavior. 0

SOME EXAMPLES OF L -STATISTICS 673
Suppose tests of Ho : Pn vs. H,: Qb, b> 0, can be based on either of two
test statistics. How do we define the asymptotic efficiency? Pitman efficiency
is described as the limit of the ratio of the sample sizes that produces equal
asymptotic power for the same sequence of alternatives. Suppose

(27) V(T„ — v) -* d N(bc. T 2 ) under Qn and


"(Tn —i) d N(bc,i 2 ) under Q„.

Then if the T. and T„ tests are compared, equal alternatives requires [see
(18), and note (20)]

b b

while equal asymptotic power requires [see (27)]

be be
T T

Thus the Pitman efficiency of the Tn -test with respect to the Ta -test is

(28) ^TT= limnb) =C / Z T.


Z

Example 5 (cont.) The Pitman efficiency of the median test with respect to
the mean test is thus

1/[ 1 /( 4f2 (w))] z z


(29) 'med,mean—
1/ o 2
—4^T (,^)•

For normal F, the value of this efficiency is 2/ ir. O

Example 6. (The LR test) The log likelihood ratio test statistic of Ho : 0 = 0


vs. H,: 0 0 0 under the model

(30) Qmm: Xm = 0 + E ni for 1: i ^ n

is, when the null hypothesis P. is true,

n J (X ni — 0.) _ n f (Eni — % n (en — 0) / v n )


(31) 2 Y_ log
f(X) —2 YI log 1(Eni )
(32) =2L" where Tn='(Bn -0)

674 L-STATISTICS

(33) =2[T„Zn-21P,I0(f)] provided Tn =0 p (1) and 10 (f)<oo

using (20)
]_
2 2
[

JO z \. l 7 n j ^lJ
1 ^(.n
z z
(34) ^ I Z^ (f) — Iö Z T — Z^ ( f, if T„ (some T) ° T(U).
)

In typical situations, if we use the maximum likelihood estimate (MLE) for


9n, then

(35) Tn =Z/I
n
0 (f), typically, for the MLE.

When (35) holds, then (34) reduces to

(36) Z 0 /I0(f)

However, if we used a X; percentage point, then (34) shows that our test is
asymptotically conservative; it gives us a representation of the correct
asymptotic distribution that assumes only that 10 (f) < oo and that our estimate
9n satisfies vfn— (4, — 0) =,, t —= T(U) under Pn. ❑

As follows from the examples above, as additional statistics are introduced


we need only derive the asymptotic representation and covariance with Z 0
of each of them.

Example 7. (The linearly trimmed mean) Let 0< a < 2 be fixed, and let
a n = (na). For n even (we omit n odd) we define

(
(37)
7'o
'n
1
— L.
‚i/2

n
i— a —1/2 / 1 _.J!
a\ 2
(Xn:i+X,:n —i +l)
(_
n i_ a . +I n 2n

and

(38) µo— '/2 (t—i)[F '(t)+F '(1— t)] dt


- -

f. (i— a )z
as in Crow and Siddiqui (1967). Then

(39) ✓Il(Tn—p') 7O=— 'JaUdg,


J 0
SOME EXAMPLES OF L-STATISTICS 675

Ja
Figure 1. J for linearly trimmed mean.
0 a 1—a I
2

where

(40) JQ(t)= [(i — a) — it 2 2111+ for0^t1.


(2 — a)

Note that since EZ10c = 0,

Coy [T° ,Z1°°]= — f J°(t) Cov[v(t),Zoo°] dg(t)


°
=
J o
J (t) J
Q

r
4o(s) ds dg(t)

= J ja (t)f0F_ t (t) dg(t) if Io(.f) <oo

=E Ja(t)i° F '(t) dF '(t)=


o
- - J ' Ja(t) dt

(41) =1 ifI0(f)<oo.

°
Thus, provided I (f) <oo, we have

(42) v' (Tn—p. ° ) -+ d b+T ° =N(b,Var[T ° ])

as n - co. Thus the Pitman efficiency of T , with respect to X„ is °


(43) WTo,T._ = o z / Var [T ° ]

in the hypotheses testing situation. U

Example 8. (The general L-statistic) If the regularity conditions of Theorem


14.1.1 are satisfied, then

1n Cl 1 1'
(44) TL„ — J(t)F '(t) dt I =TL = — JU dg
-

I n r=1 Jo J Jo

676 L- STATISTICS

where [as in (41)] under regularity we expect

Cov[T,Z10 ]= J ^J(t)dt
0
iflo(f)<co
in
(45) = 1 if we have location equivariance: — c^; = 1.

Thus, under regularity,

(46) TL, ^ = b + TL = b — J 1 JU dg under the alternatives Q;


o

and the Pitman efficiency of TL, ^ with respect to X^ is

v2
(47) STL, m*a"=Var[TL]

in the hypothesis testing situation. Cal

Exercise 1. (The trimean) Consider T. = 2.Y^ +4[F(;)+F (;)]. Derive its


asymptotic form, and covariance with Z, o ,.

Exercise 2. Let J = —H" o H - '/ H' o H ' for some df H. Consider T^


-

(i /n)E ^= ,J(i /n+1))X^:;. Derive the asymptotic form and covariance with
the Z10 of (22). Specialize to (i) logistic H, (ii) Cauchy H, (iii) Student's t,„
df H, and (iv) the extreme value H.

Trimmed and Winsorized means will be considered in the next section.


Note that Winsorizing combines smooth weights with point mass weights.

Other Examples

Example 9. (Gini's mean difference) Let X 1 ,. . . , X^ be iid F. One possible


dispersion parameter and its natural estimator is

(48) @=E1X 1 —X2 1 and T^ =E^ jX; —X; j ( 2J ^.


i<j

We seek the limiting distribution of V( T — 9). Now


n ^-' ^ n
T. _El IX^:;—Xn:iI /(2) _ i E I ^_^ j (X^:; — X^:i) ( 2 )
I<j

i n
- n
(— n+2i -1)X^:i \2)
(

y-1 X^ , ; —+ =1 J: (n —i)X^:i ]1(/


_ E j=Z 1=1 2 = + =1

(49) _— I
in

n c ; X^:; with c^;= 21 2


; =i
i-1 ]
n-1 .
- -


SOME EXAMPLES OF L-STATISTICS 677
+s
Then Theorem 19.1.1(iii) gives (assuming E IX IZ <x for some 5>0)

(50) '(T„-0)-ß d

with
E 2(1-2t)U(t)dF - '(t)=N(0, Q2 )

(51) «2 =f f
1
0
, 4(1- 2s)(1-2t)(sAt- st)dF - (s)dF - (t)
o

=f m [ f by - xj dF(x)] dF(y) _02

since

0
(52) µ= J
h(F - ')Jdt=-2 J 0
t F - '(t)(1-2t) dt

J
2[ o F - '(t)tdt- Jo F - '(s)(1-s)ds
I
('t ('t
J
=2 F(t) dsdt- F-'(s) dtds
0 0
J o s
t t
2
J o s
[F-'(t)-F-'(s)] dtds

s)I dtds
fo , ot IF - '(t)—F - '(

=0.

Exercise 3. Verify equality of the two expressions for Q2 in (51).

Exercise 4. If X 1 ,. . . , X. are iid N(0, 1), then

k
[(p-,(3i-1 [(D - , ] k und (D -t
3n+1), Xn:i-EXk+tl =-(
n t J ° Jo
t
( 53 ) _ Ud[V-t]k +t = N(0. 1 [ EX zk +i_ (EXk+t) 2
]
k+1 o (k +l)2 /

for integers k>_ 1. Show this. [Shorack (1972a) missed the 1 /(k+1) 2 in (53).]

Example 10. Let X t , ... , X„ be iid Bernoulli (0) rv's. Then g(t) = F - '(t)
equals -oo, 0, 1 for t=0, 0<t<_1-0, 1-0<t<_1. Let J(t) equal 0 or! as
0< t< 2 or z <_ t <_ 1, and let c,, = J(i/ n ). Then T„ (1/ n) E t cn;X,, : , equals Z
;

if more than 2 of the X's are positive, while T. equals the proportion of
positive X 's if less than 2 of the X,'s are positive.
;
678 L- STATISTICS
(i) Suppose 0 = 2. Then ln(T" — 2) is asymptotically distributed as a ry
having a N(0, -a' ) density on (—oo, 0) and having point mass 2 at 0. Note
that J is not continuous a.s. Jgj; hence the hypothesis of Theorem 19.1.1
fails.
(ii) Suppose 0<. Then .Ji(T" —2) is asymptotically N(0, 0) by Theorem
19.1.1.
(iii) Suppose 0>. Then f (T" — 0) is asymptotically N(0, 0(1-0)) by
Theorem 19.1.1.

3. RANDOMLY TRIMMED AND WINSORIZED MEANS

Let Xn ,, < • • • < Xn : n denote the order statistics from a random sample
X,, ... , X. of size n from a population having df F. Let 0 <_ a < b <_ 1. We let

B ont>b
(1) g°F-', A=g(a), B =g(b), ga,b= g ona't--b
A ont<a

where we suppose g, a, b are such that A and B are both finite. We let a"
and ß, denote integer-valued rv's depending on X,, ... , X. for which 0 <— a" <
ß" <_ n. The randomly trimmed mean is defined by
1 p^
(2) T" =T"(a",n—l3")= Y_ X":,,
JGn — an

and the randomly Winsorized mean is defined by

R.
(3) Tn = Tn(a", n — l3 ") = 1 a "X":.,+, + X":i + (n — ß")X":p.
n i =a„ +1

We call a"/ n and (n —,6,)/ n the random adjustment percentages. The key to
the asymptotic behavior of T. and T "* is contained in the next lemma. We
follow Shorack (1974) throughout this entire section.

Theorem 1. Suppose

(4) n"=a+Op(n -' t' 2) and R"=b+o (n - ' f2 ).

(i) (The trimmed mean) If

(5) g is continuous at a and b,


RANDOMLY TRIMMED AND WINSORIZED MEANS 679
then

(6) '(T.—µ) rb U"dg [la



— A]%In a

la1Ja ]

-[B-/L],^^ß"-b^„

where

(7) IL(a,b)= 1 gd7= 1


ba fab b—aAA
f
x dF(x).

(ii) (The Winsorized mean) If

(8) g has a derivative at a and b,


then

(9) —{ fbv"dg+ag'(a)[U.(a)_vr --a


° la n

+(1—b)g'(b)[U n (b)—.( ß "_b)


J }'
where

(10) µ * =µ * (a, b)=aA+ J b gdl+(1 — b)B


a

=aA+
E xdF(x) +(i —b)B.

Example 1. (Ordinary trimmed and Winsorized means) In the usual deter-


ministic case a" = (na) and n —ß" = (n(1— b)). However, we will only assume
the weaker condition

(11) n"=a+op (n - '' z ) and ß"=b+op (n - ` 12 )

Then it is immediate from Theorem 1 that under (5) and (11)

(12)
a
^(Ta — u) — 6 1 a U dg=
b

f 1
b—a o
l &,bdU^

(13)
a .b lo f ('

=T(a,b)=— b1 a U dg= b l a I ga, b dU


'

(14) = N(0, o- 2 (a, b))



680 L- STATISTICS

with (note (3.1.73))

(15) Q Z (a, b)= [ J aA2+


a
x2
A
dF(x) +(1 —b)B Z —µ* (b—a) 2 .

Likewise

v„ dg+ag'(a)UJ„(a) +( 1— b)g'(b)U „(b)}


= Jab
a —I

(16) under(8)and (11)

=T *(a, b) =—{ ! U
b dg+ag'(a)D_1(a) +(1— b)g'(b)U(b)
a (Ja JJ
for the special construction
(17) = N(0, o,(a, b)),
where
a
Q* (a, b)= x 2 dF(x)+aA 2 +(1 —b)B 2
A

+a 3 (1— a)[g'(a)] 2 +( 1— b) 3 b[g'(b)] 2


b
,.

+2 f [a 2 (1— t)g'(a) +(1— b) 2 tg'(b)] dg

+2a 2 (1— b) 2 g'(a)g'(b)


a
F is symmetric about 0.
(18) =
_ B x2 dF(x)+2a[B + ag'(a)] if 2

The natural variance estimate of T„ is


(19) V
2 _ ( 1 /n)[a„X„:a„+1+a„ +1 X ,:;+(n — ß )Xn:ß.] — [Tn(a „,ß")] 2
[(ß„ — a")/n] 2
It is clear that

(20) V„ -^ P o 2 (a, b) where a" P a, ß” p b, and (5) all hold.


n n
Thus asymptotically correct confidence intervals for µ are given by

(21) Pµe T„— ° ^ ZV" T„+ . L2 V. 11 -^1—a asn >oo.

(Because of the g' terms, confidence intervals based on T* do not seem


worthwhile.) ❑
RANDOMLY TRIMMED AND WINSORIZED MEANS 681

Proof of Theorem 1. It is clear from (19.4.3), a„ / n -+ p a, and /3„/ n - b that

(22) S„ mI —
1 p”
Xn:; — f ß^/n 1 f
g dI = —IJ„ dg,
b
.

n i =a„+I a„/" j a

since each subsequence n' contains a further subsequence n" on which Ja (r)
I ( ‚ / „ ß ^ / „ t (t) and J(t)=1 (a.b] (t) satisfying J„•-(t)->J(t) for all try {a, b} and
since gI({a, b}) = 0 under (5).
(i) Now write
n r ß,/n b

(a)'Ji(T„—/.l)ßn—anlS„+v' J am„/n gdl—gdI


f a

_ JA In \ßn n an) —(b—a)]}•

Then (6) is clear from (a) and (22) since


ß,/n / ß,/n
dI =fnl ” — b)g(b)+fn [g—g(b)J dI
b \ n b

(b) =Vi (R"— b)g(b) under (4) and (5).

(ii) Write
fl,/n f b
. l
fi(T n—µ*)={ Sn+.in n gdI—
/n
gdl
I
+^n a"Xn:a" +i —a"g(a)+ an g(a) — ag(a)]
En n n
+VRr n nßn nß" g(b)—(1—b)g(b)11
x":Q„— n nß
ßn
(c) ^Sn+n/ [Xn:a n +lg(a)]+ n n [Xn:l3^ — g(b)],

and note that


g(&:ß.)—g(b) ] /-' n:0„—Yn+ßn—ß^
`/ (X,,:p,—g(b))=[ 4n:ß„ b V I!
I
(d) ag'(b)[Vn(ßn/n)—Vn(b)+Vn(b)+ ✓nrPn—b)]

using (8)

(e) ^g'(b)^—UJn(b)+^(Pn—b^J

Plugging (22) and (e) into (c) gives (9). ❑


682 L -STATISTICS

The Metrically Symmetrized Winsorized Mean

Suppose now that X 1 . . , X. are iid FB = F( - 0) where


,.

(23) F is symmetric about 0.

We seek a robust trimmed-mean type of estimate of 0.


We let 0°, denote some preliminary consistent estimate of 0 and we let B„
denote some positive rv. We assume that

(24) Ji(B°, -0)=O (1)


and
(25) Vn (B„ - B) = Oo (1) for some B.

We now let a n and n - ß„ denote the number of observations s to Bn - B„


and ? to 9n+ Bn , respectively. For this a n and ß„ we refer to T* = T„(a„, ß n )
as a metrically symmetrized Winsorized mean.

Theorem 2. Suppose (23)-(25) and

(26) F has a strictly positive and continuous derivativef


in some neighborhood of B.

Let A= -B, a = F(A), and b = F(B) = 1-a. Then

(27) / (T, - 0) =a (1 -2a)T„(a)+2aVi(en- 0),


where
(28) T„(a)= T„((na),n -(na)) =T(a)=T(a,l-a).
a

Proof of Theorem 2. Without loss, we may assume 0 = 0. We let

(a) d =F(Bn -B „) and 6 = F(6°, +B,).

Note that

.../(a„/n -a) =UJ „(ä) +Vi(d -a)


a

(b) - U n( d )+F(Bn- B F( - B) ✓-_{(B„


- 0) -(B„- B)1
„)-

a (8 „ - 0) (B„ -B)
-

( c) =Un(a)+f(-B)-
a
► [( 9n,-0)-(B„-B)]
RANDOMLY TRIMMED AND WINSORIZED MEANS 683

while

n—b)=1J„(6)+,ßn(i — b)
(d) =UJ
a
„(b)+f(B)./n [(en— e)+(B„ — B)J.
Thus (9) gives

, /n(T„ — A*)
-a

(e) =—1 U„dg


a Ja

—ag'(a)[—f(——B n +B)—f(B)V(B^+B„—B)]
(f) II —
f I-aU„ dg+af(B)g'(a)2 n(en — 0) ,

=— I a UJ„dg+2a ✓n(6„-0)
Ja
(g) = (1-2a) T„(a)+2a f (e„— 9)

as claimed.
Note that if assumption (23) is dropped, we can still claim

vn n —µ ;k )=(b—a)Tn ((na),n—(nb))+(a+l—b)^(e„-9)
a

(29) —a^(Bn—B)+(1—b)^(Bn—B).

This will be required below. ❑

Exercise 1. Verify (29).

Exercise 2. Show that the analog of (27) for trimming is

(30) Ii(T—O) agdv„+2Bf2a)/(e„-0).


a E
Example 2. Suppose F is symmetric, as in (23). Suppose 0< a c < is a fixed
preliminary adjustment fraction. Let

(31) 8n= T(a 0 ) = (the 100 • a 0 % trimmed mean).

Let y„ = yn (X,, ... , X„) be an integer-valued ry for which

(32) y„/n=2a+OP (n '' 2 ) for some 0<a<2.


Ö$4 L- STATISTICS

Define B„ to be the smallest value for which

(33) [9 „— B n , 9 „+ B„] contains n — y„ of X 1 ,.. . , X.

Of course, the number of observations in (—co, B'n—B„) (in (B *, +B ",00)) is


called a,, (is called n — ß,); thus a,, + ß„ = y,,. We suppose that

(34) F satisfies (5) at g(a 0 ) and satisfies (26) at B = g(a).

Then Theorem 2 gives

(35) Vi(T — 8) = (1 -2a)T„(a)+2aT„(a o ),


,,

which is a weighted linear combination of trimmed means. In a robustness


setting, we would likely take a o "rather close” to 2. We might then define a
"tail heaviness function” H,, by a formula such as

(36) H„ =V-1 E (X; — e„) 2 1 E jX; — e °, (for example),


n ;=l I n i.,

and then let

(37) y"= A(H,,) where A: [1, co) [0, 2) with At and continuous;
n

thus, lighter-looking tails cause us to trim less the second time. Note that the
interpretation of the weighted combination on the right-hand side of (35) is
thus very reasonable. We note that

(38) 2a = A
(V E(X — 8) 1 2

EjX -01

for the choice of H,, above. This appears to give a very reasonable adaptive
estimator. The natural way to Studentize ✓n (T* —8) is to divide by V* where

, *2 _ 4a 4a2
(39) Vn — SYY+I —2aoSYz +(1 -2ao)2Szz

and Sy, = (1/n) E; , (Y — Y)(Z; —Z), and so forth, where

Xn ,p„ if i> ß„ Xn, „_(na) if i> n — (nao)


(40) Y = X,,, ; otherwise and Z, - X,,. 1 otherwise
Xn :,, +, if i a„ Xn:(„ao) if i ^ (nao). ❑
RANDOMLY TRIMMED AND WINSORIZED MEANS 685

T9(2,2)
sample

H n Hn
— pseudo-sample
r•
xn
Figure 1.

Example 3. Suppose that X,, ... , Xn are iid F(• — 0) where

(41) F is symmetric about 0.

Suppose we use the ordinary trimmed mean Tn (ao ), with 0< a 0 <, as our
preliminary estimator 0 of 0. Then the total number of observations trimmed
is

(42) 'Yn = 2 (nau).

Now choose Bn to be the smallest value for which [ TT (a0 ) — Bn , Tn (a o ) + Bn


contains exactly n — y,, observations. Let a,, (let n — ß n ) denote the number of
observations less than T,, ( a0 ) — Bn (greater than T,, (a o ) + B,,). Replace the order
statistics Xi by pseudo order statistics, see Figures 1 and 2 (with Zn for Xn ),

Xn:a^„ if i < an
(43) Zi = Xn:; ifa,,+,^i<ßn.
X,,: ßry ifi>ßn

Note that the sample mean Zn of the pseudo observations is exactly the
Winsorized mean T,, (a n , n—/3,,). Let

n
^` (

(44)
) V n= L (Zi — Zn) 2 ,
n-1 i =,

T9(2,2)
sample


•• • pseudo-sample

Xn

Figure 2.
686 L- STATISTICS
and note that this is the V„ of (20). It is proposed to use

(45) ✓ n (Z„ — 0)/ V, = t O ^_ «

to obtain confidence intervals for 0. Note that Example 2 shows

(46) ✓n (Z„ — 0)/ V = T(a o )/ o 2 m N(0, 1)


V if (26) holds at a o . ❑
a

Example 4. (Nearly symmetric random means) If (23) and (26) hold, and
if the random adjustment percentages are equal to the extent that

(47)
an
- n
=_o1/2
n — —n Ya p (n ) where n
an
a
some a,

then Theorem 1 immediately implies

(48) .fin (T„ 0)


1 2a fal
—1 a a U d

and

(49) .^(Tn- 0) =—^


^ .
U„dg+a%'(a)[U (a) +v„(1 — a)]
a
fa a 111 '

Note that (47) represents an approach that is fundamentally different from


metrically symmetrizing. ❑

Example 5. Suppose F is symmetric about 0, as in (41). Let 9 denote any


estimator asymptotically equivalent to T (a o ). The value of a o may be deter-
mined by an adaptive estimation procedure, and may hence be unknown; but
we assume that a o can be naturally estimated by some ry ä„ to the extent that

(50) ✓n(ä„ —a o )= OP (1).

This would be true if your choice for 0n was a nearly symmetric randomly
trimmed mean ä la (48) (or if it was a Huber-type estimator). Now define

(S 1 ) y,, =2ä,,,

and repeat Example 3, from line (42) on, verbatim. Nice estimator! ❑

Example 6. Let 0< a, < • • . < a K < 2. Suppose c„ ; = c,, ; (X,, ... , X„) -+ P c, as
n * oo for I i i ^ K where Y_K I c,=1. Suppose F is symmetric as in (23) and
-

satisfies (26) in neighborhoods of a,..... a K . Then the random means of (12),


RANDOMLY TRIMMED AND WINSORIZED MEANS 687

Y_
(28), (45), and Example 5 [use the temporary notation M(a 1 ) for any one of
them] all satisfy

(52)
ß[i =1
cn1MM(a,)
a i =1
- 8] _E c.T ((na,))•

Johns (1971) shows how the c,, ' s can be chosen so that the asymptotic variance
;

of (52) is uniformly close to the Cramer-Rao bound over a large class of rather
smooth F. Moreover, K =2 is shown to produce good results. (Note that
F need not be symmetric if the ordinary random mean of Example 1 is
used.) 0

Remark 1. Suppose F is symmetric as in (23). We have shown above that


many random means are asymptotically equal to

1 'm U
7,,,(a)—
(53)
1-2 a f a
dg

1 1 a

(54) _ U dg for a special construction



1— 2a a

1/z

(55) 1 2a a § dg,

where

U(t)+U(1— t)
(56) S(t) _ — (Brownian motion on [0, 2]).

Shorack (1974) extends this to an analog of the two-sample t-test, for which
F need not be symmetric.

Example 7. Show that if

(57) 6= 0+op (n - ' / 4 ) and B„ = B+op (n '/ 4


- )
replaces (24) and (25), then (27) and (35) still hold provided we strengthen
(26) to

(58) f(A +E) —f(A) +E) —_f(B)


and f(B
E E

are bounded for e in some neighborhood of 0.



688 L- STATISTICS

4. PROOFS

We let M denote a generic constant. To simplify notation, we assume b, = b 2 = b


and d, = d2 = d. We also suppose h 2 =0 and write h for h 1 ; then dg may also
replace dg I since g is then 1'.

Proof of (19.1.11) (and (19.1.45)1. From (19.1.10) we have

T. — z =^' gd[ n(G )— n]


0

= g[n(^n) — n]IO — ^ ' [ 4' ( G) — Tn] dg

_—f 0
` [ 4 rn(Gn) —41 n] dg+

since the interval 0< t < f'n ,, we have under Assumption 19.1.1

(1) jg(t)['1'(G (t)) — '1'(:)]I <_ D(t) Jo ^ B(s) ds <_ Mt -d Jo s-b ds


(2) <Mtl-cb+a)=MtI -"-*0 as t^0,

and since a symmetric argument works for n:n < 1.


1< ❑

Proof of Lemma 19.1.1. Note that

-M f , [t( 1— t)]' -b dg
[t( 1— t)]'Bdlgl
fo, 0
CM[t(1—t)]r-bgI0+MJo I [t(1— t)]' -b ) dt
g dt

f
^0 -0+M o[t(1 —t)] -"I
j [t(1
— t)]' -b f dt using a < r

s M' f
ol
[t(1 — t)]' -b -a- ' dt

(a) < ifa =b+d <r.



PROOFS 689

Thus, by Fubini's theorem,


i

EIYI^E I 1 [, l]—tIIJidlgj
0

= J0` E 1— tflJI dlgl

=f 2[t(1— t)]I JI dlgl


0

(b) <oo if a < 1 by (a).

Thus another application of Fubini's theorem shows

(c) EY= E{1 [1] —t]Jdg= J 0•Jdg=O.


0 0

For higher-order moments, we note that [see (19.1.25)]


i
J dg <_ B dg (recall h is assumed 7)

=IBgjcI+ fjgB'I dt- 5 B(e)D(6)+J DIB'I dt


f
5 (some M)B(f)D(6)
(d) <_ M'[ (1— 6)] a - where a < .

Now note that

(e) J[ra]odt<cc
ifaO<1,

and set 2 + 5 = 1/a for the boundary value. We can thus apply Fubini's theorem
to find

Var[Y]=j
0 f o,
dg(s) dg(t)

=I
0 o
J [s" n t —st]J(s)J(t) dg(s) dg(t) =Q 2

as in (19.1.27). ❑

690 L- STATISTICS

Proof of Theorem 19.1.5 From (19.1.14), (19.1.11), and (19.1.12) we have

— Yn= I [(G) 'Yn — — (G — I)J] dg


0

_f { ' Gn(t)
n
jG() J,,(s)ds— J(t) }[G (t)
— t
— t] dg(t)

(3) _ f {An(t) }[G (t)—t] dg(t).


0

[ We define the ratio to be 0 if G (t) = t.] Consider A. Since IIG, —III -> 0 a.s.
by Glivenko-Cantelli, Assumption 19.1.2 implies that
(4) for a.e. w we have A n (t) -* 0 a.e. Igl as n -> oo.

Next, we seek an a.s. bound on A. Now


ç 6 ^ (t) IJn I ds
IAn(t)I^ Gn(t) —t +IJI,
so that for any tiny 0>0 and for en ,, r < en;n we have

IAn(t)I^[B(G )vB] +B
^ M0[t(1 — t)] -(n+O) a.s. for n ? n e,^,
using Theorem 10.6.1 of Wellner. For 0< t < fn ,, we have G. (t) = 0 so that

A n (t)I< J r
0
B(s) ds /t+B <_Mt I- b /t =Mt -b,

and a symmetric argument applies for n , n < 1.


t< Thus for any small 0>0
we have

(5) IAn(t)I <_ M9 [t(1— t)] -(b+0) on (0, 1) a.s. for n ? some n 0,.

Up to now we have mainly followed Shorack (1972a), with some influence


from Mehra and Rao (1975), Wellner (1977b), and Boos (1979).

Case (ii). (LIL) Recall b n = 2 log t n. In this case for n >_ n 00, we have
from (3) that

lim —m i ,/z -e
n-.ro l n -. ao

xS lim
I n o0 0
J ^IA (t)I[t(1—t)]

n 112 - 0 dg(t)}
11
(6) ^1 •0=0 a.s.
PROOFS 691
(provided 0 was chosen so small that a+20<12) using James's theorem
(Theorem 13.4.1) to obtain the 1 in (4) and using the dominated convergence
theorem, with dominating function

M9 [t(1 — t)] ,' -n-z e

guaranteed as in Lemma 19.1.1, to obtain the 0 in (6). Now (6) trivially implies

max
k?n
b IYkl -^0 a.s.
bk
as n - oo,

and thus (19.1.54). Also, for K E and then n £ chosen large enough, we have

(7) max klYkl ^ max klYkl + max bk ^IYk1 E

1-kin Vbn Isk^K, /ibn K,Sksn V n b n bk

for n >_ some n 6 ; thus (19.1.53) holds.


See Sen (1981) for a version of this result with c n; =J(i/(n+l)) and a
growth condition on J". Govindarajulu and Mason (1980) have a version with
J(i/(n + 1)) or cn , for a continuous J' satisfying a growth condition. Ghosh
(1972) has an early result in this vein. Wellner (1977b) treats Fyn /b n .

Case (iii). (SLLN) In this case, for n > n 0 , we have from (3) that

(8) li 1Yni { IAn(t)I[t(1—t)]1-°dg(t)}


H n—I1 BII^^1^^ f o'
^—I)]
=0.0=0

using Lai's theorem (Theorem 10.2.1) for the first 0, and using (4) and (5) and
the dominated convergence theorem with dominating function

b -2 e
M0 [t(1— t)]' —

guaranteed by Lemma 19.1.1 for the second 0. See Wellner (1977a) for a version;
see also Sen (1981).

Case (i). (CLT) In this case

u
II i2-0
(9) 1— t)
II[I(1 I)]' /2 Bit Jo jAn(t)I[t( dg(t) =Zn• nn

where A n —* as. 0 as n -> oc by the dominated convergence theorem as in (6)



692 L- STATISTICS

For e >0 we specify M , then K6 , and then n to find that


6

max k
l kn
o r-
- max ^Zk A k
lsksn n

(10) ^ maKe _ZkAk+1lmk ^ ^ Zk1 [ a Ak ]


ax

e + M [e/ M ] = 2e
Q with probability exceeding 1- s,
E

since

(11) max Zk = Op (1)


l^k^n ^_
n

by Inequality 3.6.3. Thus (19.1.51) holds. Likewise (19.1.52) holds since

kyn Ii^k l^man kZkAk


max n
(12)
C[a x jj
Zk][m
an nk]
' = O (1)• O(1) =o ( 1 )

since
n
(13) max nZk = Op (1).

Versions of this CLT appear in Sen (1978, 1981) and Govindarajulu (1980),
and a somewhat different version in Mason (1981a).
To improve -* p 0 to as. 0 we would need to expand to a J' term. If we
do, uniformity in g, Jn, J using a common B, D is fairly straightforward. The
proof of Theorem 19.1.7 should serve as a model for both of these remarks.
Proof of Theorem 19.1.6. Now

(14) ✓nIµn AI = ✓n I I 1gd['I' n -'I']I = ✓n f g(J -J) du)


-
n
J o 0
2/n 1 1

f
1-1/n
DB
+
Bdt+)D ^-
I(1 —I)dt
(15) ^^( 1/n
o 1-z/n V /l

using the mean-value theorem and (19.1.30) on the second


term

:M ./ t
f. 2/ n

°dt +M—
j J
^
1 ( 1/2

t ° - 'dt

M
= (Mn'/ 2 /n' -° ) +Mn - '/ 2 n° =nZ °
as claimed.
PROOFS 693
It's easy to show from (14) that

r ' (M1 MEjh(X)j


mlµ„ - µl ^ ✓n J 0 \ n / dg` ✓n
We also have a.s. for n ? some n 0, » , from (19.1.2), the line prior to (19.1.57)
and (19.1.19) and then from (19.1.60),

(16) ✓nIT„—T„I<V'i J 0 D(G ')IJ J„I dt


f
" —

, 2/ 1/2
(17) ^.‚/ t (a+e)Mt bdt+,n^
-
-
t (a+o)n 't (I+n>dt
- - -

i/„

J)
i
+ (analogous terms for

=o(n -& ) forany8<z—a


f) 1/2

as claimed. Using the "in-probability linear bound” of Inequality 10.4.1


(instead of the "a.s. nearly linear bound” of Theorem 10.6.1) gives

(18) — T,, = Op (n - c 2- ^^)

Note that uniformity in g, J„, J using a common B, D is immediate.


Many authors have had to deal with versions of this result. 0

Proof of Theorem 7. Rewrite (19.1.14) as

(19) —2ny„=-2[n(T„—µ)—S„]=2n
t ('I'(G„)—If—(G„—I)J]dg

J r (Gn)U2 dg
ft
0

for some 6*(t) between G(t) and t. Now the integrand of the middle term
of (19) is bounded by

2t(1—t)B(t) for 0<t<C„ : , and e„,„<_t^1,

so that the argument leading to (5) gives

1J'(G*(t)l - Me [t(1— t)] -( '+ b+O) on (0 , 1) for n >_ some ng,^,.


694 L -STATISTICS

Thus, with q = [1(1 — I )] 1-B and 0 sufficiently small,

J
J2ny,I ^ IIU,/g1I 2 ^ IJ'(G )Ig 2 dIgI
0

( 20 ) = IIU /g11 2 O( 1 ) a.s.

Hence (52) and (53) follow from James's theorem (Theorem 13.4.1).
Representations with a lesser rate appear in Govindarajulu and Mason
(1980) and Sen (1981). ❑
CHAPTER 20

Rank Statistics

0. LINEAR RANK STATISTICS

Let X 1 ,. . . , X„ n be lid having a continuous df. Let R„,, ... , R nn denote the
ranks and D 1 ,.. . , D the antiranks of these observations. Let
cn ,, ... , cnn denote known weights and d,, 1 ,. . . , dnn denote known scores. We
wish to consider the linear rank statistics

(1) Tn — C SC ' E 1 Cnid,,R ^ C ^ C i ^ , dniCnD„; = JO h,, dl,,

(2) =—f n dhn,


o,R
where h n is the scores function defined by

1 i
(3) hn(t)=dni for —L t<--and l=imn
'

n n

with h n right continuous at 0. We also consider the associated process T n


whose value at k/n is Tnk = J k/(n+l) h n dR,,; note that Tnk is the contribution
to T. made by the first k order statistics X,,, ... , X,:k-

1. THE BASIC MARTINGALE IOW„

Remark 1. The key result we will try to mimic is (6.1.49). Thus W n denotes
the weighted empirical process. Moreover, W. (t)/(1— t) is a martingale on
[0, 1), and we have the relationship

(1) W,(t)
= f 1 1 s dM,(s)

695
696 RANK STATISTICS

for the basic martingale

(2) Mn(t)°W „(t)+ J 0


W (S)"

1—s
ds for 0 <_ t^ 1

of (6.1.41).

We now turn to ranks. We will make use of the following notation. Let

(3) P"; -'


1
n+1 R^ =UB„(Pn;) = C where C —
i =^ ✓c'c
We then note from (3.7.4) that

ø(p,1) __ R.
(4) Z; = for 0< i _< n —1, is a martingale.
1—/ n
i 1— i/ n'
We will also let

(5) M\i
for an appropriately defined basic martingale M. The analog of Eq. (1) that
will provide us with our definition of M. is

n _ n
(6) OZZ; — Z;_,
=n—i +l aM' n—i+l (M' —Mr-I)
for 1 <_i<_n -1.

We now note that

_ n n n—i
n—i R ' n —in— i+1 R, ' -
(7) AZ'
= n [ n_i +1 C + 1 R ]
—i'-'
n—i +1 n—i n
_ n
(8) n—i+1
provided

(9) OMM_ n —i+l C`+ 1 R,_ 1 =C; + 1 R ; for 1 <_i<_n -1.


n—i n—i n—i

We also define the o-field by

(10) =u[Rn(t):025t^Pn;]=o[C1,...,C;].
THE BASIC MARTINGALE 697

Inasmuch as (9), c„ = 0, and the finite-sampling Exercise 3.6.3 together imply

(11) E(AM I ; ,) _ n n —i n —(iR , -1)


1
+_ __
n —i
!
=0.

we have that

i
(12) fvii,, —i <n (with Mo =O and iM„=0)
=M;= AM; for 0<
n j=1

satisfies
/ t
(13) 1^1A„ 1 n , 0 <_ i <_ n, is a martingale with respect to the p"'s.

We note from (9) and (12) that

(14)( ±) = I C,+
;=, \ n—j R^}=R,+1n;.,± 1—j/n
l„(!„;)
Ri

=R„(p„r) +i
n j =, 1—j/n

We extend AN„ to [0,1] by letting M (t)=NA „(i/n) for i/n<_t<(i+l)/n


for 0 i <— n. The natural limiting process to associate with Rv ,, [note (14)] is

(15) M(t)_=W(t)+J ^ W(s) ds for0<—s<_1;


0 1—s

note that this is exactly the same process as the of (6.1.44) when F = I.
Recall from (6.1.44) or (6.1.24) that

(16) M is a Brownian motion on [0, 1].

Note from (9) and the finite-sampling Exercise 3.6.3 that

Var[OM1 ]=Q2„ (n i)2 [(n — i +1)Z-2(n —i +l)


n-1
i - 1 — i
+(i -1)/1— n-1 )

1 n —i +Z
rn— (i —1)I 1+ 1 I
n (n —i) L \ n-1/ J
_ 1 n —i +l 4
(17) for 1<i<_n-1 and n?2.
n-1 n —i ^n
698 RANK STATISTICS

Thus

(18) sup IVar[IO^D „(t)] —tI -.O as n-+oo.


ost^1

We will consider II /q II metrics where

(19) q is / on [0, 1] and


J 0
” [q(t)] -2 dt <oo.

Theorem 1. Suppose c„ = 0 and max {c n, ; / c'c: 1 < i <_ n } -* 0 as n - oo.If q satis-


fies (19), then
(20) II(fW„ —^rU)JgIIQ-^ a•s. 0 as n co for the special construction

of Theorem 3.1.1.
Proof. Note that
n+1 f P^•^^ 68„ i i —I
(21) 1 n (t) = 68 „(P„,)+ n forty<_t< n, 1 <i<n,
o 1 —I„ dt

where I„ (t) = j/ n for p, <_ t < p + , and 0 <_ j n-1, with the convention that
;

the term Il / (1— I„) in the integrand of (21) equals 0 on [p,,,, 1]. Now
II68 n —W a.s. 0 by Theorem 3.1.1. Apply this to (21), just as was done in the
proof of Theorem 6.1.1, to obtain JIN^III„ — lyll II -a_5.0. Now apply the Birnbaum
and Marshall inequality (Inequality A.10.4) using the proof of Theorem 6.2.1
for a model. ❑

Exercise 1. Verify the identity

pg„( Pni )=1,fl i)_ (i—j) /n


(22) AM n (J) for0^i<n
( n ^_, I — j/ n n

by plugging (14) into the rhs. Show that this, in turn, gives

(23) R„ (PIU)
= Y 1 OfW„ for 0 <_i<n.
1—i/n - 1 1—j/n n

These are the analogs of (6.6.3) and either (6.6.2) or (1), respectively. Use
Theorem 20.2.1 to justify passing to the limit in these two formulas. Equation (23)
gives

(24) W(t) =
1—t
J r
o
1 dM(s)
1—s
forall0<_t<1;

and (22) really reduces to the same thing.

Exercise 2. Prove an analog of Theorem 1 on II /qII convergence of l,,.


PROCESSES OF THE FORM T. = jo h„ dR„ IN THE NULL CASE 699
2. PROCESSES OF THE FORM T. =J0 h" dR„ IN THE NULL CASE

Suppose c" 1 , ... , c,,, are known constants satisfying


i

(1) c„=0 and max c"'->0 as n-^co.


,, i=n c c

We also suppose d„ 1 , ... , d„„ are known constants, and we define a scores
function
i -1 i
(2) hn(t)=d„i for —<ts, 1i<_n,
n n

with h„ right continuous at 0. We suppose (D„,, ... , D) is the vector of


antiranks of iid continuous rv's X,,,. . . , X„„ ; then (D 1 ,..., D„„) takes on
each of the n! permutations of (1, ... , n) with probability 1/n!. We are
interested in the linear rank statistic

I Cl
T. ni c,, E, = O h„ dR n .
(3) .l c,c d

Suppose, additionally, that the data comes sequentially so that only the
first k order statistics are available. This could well be the case if we were
dealing with life testing data. If at a particular moment only the first k order
statistics are available, then (3.1.33) shows that D„,, ... , D„ k are available.
We can thus compute

1 (k' C Pnk
Pk
(4) Tnk = d
CSC ' ' dniCnD ., =fo hn dl n

f
"
Rn
( 5 ) =hn(pnk)R n(Pnk)— dh ne
P

with p„ k = k/ (n + 1). Note from (3.6.15) that

(6) ET„k =0 and


1 knk' 1 k 1 k'
n
Cov[Tnk, Tfk]=
n-1 n n i j= dn
=, d–(– 1 n iI
'/ \– =l

With this in mind, we define a process T n on [0, 1] by

(7) T„(t)= h„dR„ forOst<1


0

so that T n(pnk) = Tnk•


700 RANK STATISTICS

To obtain convergence of T„ to some limiting process T, we will require


that h„ "converges" to some h in 22 . The natural limiting process T is then

(8) T(t) J hdW for 0t1;


0

recall (4) and (3.1.63). We know from Theorem 6.4.1 that ET(t) = 0 and

(9) Coy [T(s), T(t)]


= rs„f

Jo h 2 (r) dr—( J S h(r) dr) (f"h(r) dr ) = Qhl[o.,l.hl[o.,]

for 0<_ s, t ^ 1, and we compare this with (6). See (3.1.39) for the a-notation
used
used in (9).
We consider II /q II metrics where q is ‚" on [0, 1]. We require that the
function q satisfies

(10) h/q and 1/q are in 2 2 .

We now require that the h„'s converge to some limit function h at a rate
sufficiently fast that

(11) I[(h,, — h)/q]I- 0 as n-► oo.

Theorem 1. Suppose (1), (10), and (11) hold. Then the process T„ of (7)
satisfies

(12) 1I (T,, — T)/ q II -- 0 as n - c for the special construction.

An interesting corollary is obtained by setting q = 1 in these hypotheses,


and concluding that

(13) T„=T„(1)-" d T=T(1)=N(O,Q,h,) under (1), (10), and (11)

for the o of (3.1.38). Häjek and Sidäk (1967, p. 163) prove that (13) holds
under these same conditions with q = 1.

Proof. Now using the notation of Section 1,


k k
(a) Tnk=E d;CC=Y_d+ORI
I I

(b) _ — Od ;+l R ; + dk+ , R k summing by parts


1

PROCESSES OF THE FORM T„ = J o h„ d62„ IN THE NULL CASE 701

k i
(c) =—^(1-- Od;+jZi+dk+I Rk
n
k
(d) _ —> OP;+, ZI + dk +l R k with Apr+i = p; +I Pi —

provided

C1 — -
i- 11 1 <i<_n +l.
for 1_
(e) p
n Jd + —(d
n ; l +• • • +d; _,)

Note that

(f) P. = Pn +l = dn•

Summing by parts again gives

^k•
Tnk —pIAZa — pk+iZk+dk+IRk

k
_^ PjnZ; —Zk (d,+...+d k )ln

(d,+• • •+dk)
k n OMA —Zk by (6.1.6)
(g) —^p'n —i+1 n
— k j d +d,+...+ d1_, +...+dk) Z
II, i+i
n—
J
1 AM — (d^
n
k

k (d,^.....+dk) k d 1 +...+d i _,
(14) =^d;^M;—Zk
n +Y n—i+1
ln k/n
(15) _hndM,,—{On(k /n)/[1—k/n]} hndt
f Jo

f,
k/n 1 {ns} /n

+ hnr) drdfJA n (s)


I —{ns }/n j 0

(h) = Bn1(k) — Bn2(k)+ Bn3 (k)•

for {x} the greatest integer less than x. Compare (15) to (6.4.14).
Applying the Häjek and Renyi inequality (Exercise A.10.3) to (15) [as the
Birnbaum and Marshall inequality was applied to (6.4.14)] gives

( 16 ) P(IlTn/qJj nk>3E)
(' k/n j' k/n k/n

f J <4s 2
o
(hn /9) 2 dt +2 J
0
hn dt
0
q-2 dt/(I — k/n) 2
702 RANK STATISTICS

where the factor 4 comes from (20.1.17). Writing h„ = (h„ — h) + h and applying
(11) shows that we can choose 0 = 0, in (16) so small that

(17) P( T „/q{jö?3E)<e for n >_some n,.

Also, Inequality 6.4.2 gives

(18) P(jjT/g11 0 - 2 e)^ 2 E -2 (h/q) 2 dt.


f0O
We now turn to [ 0, 1]. The function 'i is irrelevant on [0, 1], so in the remainder
we set +1= 1. Now
e
( 19 ) IITn(0,.] — T($,']f1e

(h„ —h,n) dR,,


cllfe e B + ^e hmd(RR—W) Ile B
f. 1=6

+II j (h,„ h)dW e =Yh1+Yn2+Yfls

<E
for the function h,„ of Proposition A.8.1. Now for n >_ n, we have P( y„ I ? E)
(by (16) and since

J• 1-0
(j) (hn — hm) 2 dt<_2I[h„ — h]r 2 + 2 1[h,„ — h]I 2 = 0
e

by (11) and Proposition (A.8.1)), P(yr3 >_ E) < E by (6.4.15) and f e (h,,,
h) 2 dt -* 0, and y„ 2 -* p 0 for each m since

(20) Ilj h d(R m ” —W)


JI l B`2IIhmIIB BIIRn—WJI+IIln—WIl f dlhmI
9 o

by integration by parts. Thus SIT„ (9, • ] — T(9, • ] 1^ B B - P 0. The interval [ 1— 0, 1]


contributes negligibly, using the same approach used in (16) and (j). Thus

(k) P(IIT„ —T)/q)— 11e)^ Ile for n?some n e .

Do note from (15) and (20.1.23) that T has the representation

(21) T(t) fo, I h(s)+Io 1(r s dr f o h(r) dr1 1 sJ


— dM(s)

(22) _ f I h(s)— S S 1 (r)dr dAA(s)=


o L ,
J o h*(s) dM(s)

as in (6.6.9) and (6.4.3). EI


PROCESSES OF THE FORM T„ = 1 0 h,, d68, IN THE NULL CASE 703

Exercise 1. Prove an analogous result for Jo h„ dM„.

We considered one type of sequential process in Theorem 1. A second type,


the analog of Theorem 19.1.4, is considered by Sen (1981). Yet another type
is considered by Mason (1981c). Mason works with R,, ... , R' R; ; is
the rank of X among X,, ... , X.
;

Exercise 2. (i) Häjek and Sidäk (1967) consider

(23) ^n - E h()= ^ hdW„ 0 d hdW for h€ t2 .


i_I JCc ' .1 o 0

They also show that


n
(24)
24 T=
n L J - dnR,,; with d„ k = Eh(^„,k)

(25) =E(T„"IRn1, ... , Rnn),

so that
(26) Var[T' > —T;,Z ^]=Var[T;,"]—Var[T Z ^]=o —Var[T;,2 ]

We will thus have

(27) f” hdW for hE2Z


0

provided we establish that

(28) Var[T]->o-

(ii) Lombard (1984) shows that if h = h 1 —h 2 with the h ; i' , then

(29) T >_J T(v) dh(v)

where the process

n
(30) T(v) = E b„(R„;, v) with b„ (k, v) = P( < vI R„ = k)
; ;

+_^ c c

has a covariance function that is bounded by (Z) (u A v — uv) and converges


to (u A v — uv). Thus (28) and (27) hold.
(iii) Verify all claims made above.
704 RANK STATISTICS

3. CONTIGUOUS ALTERNATIVES

The Model

Let denote a a-finite measure on (R, ß). Let X„ ; denote the identity map
on R for I C i <_ n, n ? 1. Consider testing the null hypothesis

(1) P„: X„, ... Xn „ are iid f

against the alternative hypothesis (indexed by b)

(2) Q„: X n 1i ... , X. are independent with densities f^ 1 , ... , f nn;

all densities are with respect to µ. We suppose that f 0 =f and also write X ° ,
for X„ ; . The special construction for this model is X, = (F 1 )( 4j. We agree
that in this section

(3) J[h]I _ J h 2 dµ and h = J h dµ for all h E YZ (µ),

where the space of all square integrable functions h is denoted by 2'2 (µ). Let
pn and q„ denote the densities of P„ and Qnb with respect to µ x • • • X µ.
We assume the existence of constants a„,, ... , a nn , n ? 1, satisfying
z z
(4) max a„ = max „ a „' 2 - 0 as n -+ o0
1'-^i^n a a ‚in i =1 ani

where
a = an = anls...,a„„ .

We let rank (a„ 1 ) denote the rank of a„ ; among a„,, ... , a n „; in case of ties we
will refer to possible rankings of a„,, ... , a„ n . We also assume the existence of
a function
(5) S in 22 (µ)
for which the densities satisfy our key contiguity condition
n
Ia ] 2 0 as n-
(6) Y [T

for each b under consideration. We let

2s-F
(7) 30= on (0, 1).
f ,F-'

CONTIGUOUS ALTERNATIVES 7 05

We wish to consider linear rank statistics T" under the alternatives (2); see
(3.2.30). Thus we introduce constants c",, ... , c,,, satisfying the u.a.n. condition

(8) max,c,c n c zn;


-+0 as n co, subject to ë,, = 0.

We will appeal primarily to Hodges and Lehmann (1963) for direction, and
to Jureckovä (1971) and Theorem 5.9.1 for technique.

Testing

Let ß,,1,. .. , ß n " denote the rv's of the reduction defined in (3.3.31). Now the
R 1 , . . . ‚, Rn" of Xn,, ... , X „” also serve as possible ranks of ß n ,, ... , ß nn .
ranks R„,,
(For now we have one fixed b.) Our concern is with

nb c R
nr h ni
Tn = Tn =
(9)
= I
fo,
h dlf8" under the alternatives Q„.

We let

(10) µn =Jhd^j c G"


"; ; I for Gn; as in (3.3.31),
o =i cc J

" Cni '


(11) S,,—=Y 1 c c h(ß n; )= hdZ n +µ n fort" as in (3.3.38),
,

n
Rnrl
(12) yn =Tn—Sn = ' >^ ^; c h( l_h(ßn^)],

and note that

(13) T,, = S" +y"= I hdZ n +µ n +yn .


Jo

Now yn -* p 0 under P. by (3.8.20), and hence

(14) y" — D 0 under Qb

by (4.1.36) and (4.1.37). By Theorem 3.4.2 we necessarily have

J
(15) 0
hdz np sJhdw
under Q„.
0

706 RANK STATISTICS

Thus

(16) Sn = Q T„ = h dZ n +µ n + yn = a S+ „ under Q.

We examine the deterministic A. by an indirect route in the next paragraph.


By Theorem 3.1.2

(17) S„ ->p hdWSN(0,o) under P.


J
By Theorem 4.1.3 the log likelihood ratio statistic L. for testing Pn vs. Q.
satisfies

(18) Lb, -> p bZ-2b 2 I[S0 ]j 2 where Z J3 dW°


0

with W' as in (3.1.72). Thus Theorem 4.1.4 yields


i

(19) S„ ^ d S+ bp ac h6 o dt under Q,
0

provided p„(c, a) - some p ia ,

since Coy [S, bZ] = bpca I'o h6 o dt under P„. Combining (16) and (19) shows that

(20) l n - bp co hS o dt as n - oo, provided p„ (c, a) -+ p ca •


0

Combining (16) and (20) gives

(21) T„ = a S+bp cQ h6 0 dt under Q„, provided p.(c, a) p ia .


0

In case pn (c, a) does not converge, we can argue as above on subsequences


where it does to conclude that

(22) T„ = Q S+bp„(c, a) J t
0
h6 o dt under Q.

It is important to note that (21) and (22) each give a representation to the
limiting form of T„ that is valid under the contiguous alternatives Qn of
(4.1.1)-(4.1.6).

CONTIGUOUS ALTERNATIVES 707

Asymptotic Power of Tests

Suppose now that we wish to test the hypotheses H: b = b o vs. the alternative
K: b> bo . As our test we will choose to reject H if the statistic
C Rh ob
n
nt ni
(23) 7'n = ^I
c' c n + 1

of (9) is "too big." Now from (22) [or from (3.8.20)] we have

(24)
0
T„0 = o
J
hdW+bopn (c,a)J1 h30 dt
0
under H: b = b o .

Thus, asymptotically, we reject H if TT > z ( °' ) Q,,+b op„(c, a) Jö hS o dt, where


z the upper a-point of the N(0, 1) distribution. The asymptotic power
of this test is given by

(25) Pb (TT > z t ° ) o -h ) = a P(N(0, 1)> zt')— (b —bo)p „(c, a) h8o dt/a-b f.
o

Now 3 o and the a „ ; 's are given by nature, while we choose h and the c„ ; 's.
How can we choose h and the c,'s to maximize the power? Clearly

(26) the asymptotic power depends on the c,'s only through p„(c, a)

and is maximized if p„(c, 1) - I [p„(c, a) = 1 if c,,, = a„ for all i]. As was shown
;

in (4.1.7),

(27)
f o,
S 0 (t)dt =0.

Thus

h 0 dt /ah = ar Corr [h(f), So(^)] = o s p(h, so)


(28) J

and is maximized if Corr [h(), &()] = 1.

Pitman Efficiency of Tests

For k = 1, 2 we let T» denote the statistics of (23) with constants c and


functions ht k) ; we assume p „(ct k ^, a) -+ p(c"' , a). Should we use T(,'P or T;; )
to try to distinguish Q„ from P„? Now the asymptotic power of T. against
708 RANK STATISTICS

alternatives Q. is given by (25) and (28) as


a
(29) Pb( Tnk) > z ( ) g.h( k))
-> P(N(0, 1)> z (°)— (b ( '` )— b0)P(c" ) , a)P(h ck) , 8 a )'.).

As a measure of the efficiency X 2 , 1 of the tests T;,2) with respect to the tests
T' ) we will take the ratio of distances of alternative
hypothesis from null hypothesis that produces equal asymptotic power for the
two sequences of tests. That is, equal power

cz) a)P(h(2 , so ) o = (bcn cn


(30) (b (2) — bo)P(c , — bo)P(c , a)P(hc"), go)oso

in (29) leads to W 2 ,, = (b t ' ) — b 0 )/ (b 2 — b o) satisfying



(2)
P(c (2) , a) P(h , so)
(31) ^2,^ = P(c ( ' ) , a) P(h('), so)

This is the idea of Pitman efficiency.

Asymptotic Linearity

We say that an ry X = F is stochastically larger than an ry Y— G if F ' (t) > -

G - '(t) for all t with > for at least one t (this is equivalent to requiring
F(x) <_ G(x) for all x with strict inequality for at least one x).
We now assume of the df's F„ ; in (2) that there is a single parameter family
of df's F,, for which

(32) F b = Fbo Q , a B
where F is stochastically increasing in B.

Thus our special construction of the rv's in this model is

(33) X bi (Fb,) - '(e„;) = F -''n^^ a a(^„r)•

We let (R 1 ,. .. , R) denote the ranks of (X 1 ,.. . , X ). We assume of the


df's FB that the parameter 0 enters in such a way that

um(RMb1,...'Rn= 6u m(n+1 — Rn1'...,n+1 — Rn„)


(34)
= (a possible ranking of the a's)

(in the location case when Fba ^ = F(• — ba„ ; / a'a), this condition clearly
holds).
CONTIGUOUS ALTERNATIVES 709
Consider now the rank statistic
"
c ni R bni
(35) Tfl b = h ( - j)

where we now assume that

(36) his/

and the a" ; 's and c" ; 's are concordant in that

(37) (an, - an;)(cn, - cn;)>0 for all 1 <_i,j<


_n.

Then it is clear that

(38) Tn is >' in b for each fixed w E f1.

Note that, because of (34), we have for each fixed w E SZ that

n n; h rank (a n; )
(39) tim T b -_ t max = n c
b-+ " " ,_,
1=V'c'c n+1
=(the maximum of all n! possible values of T„)

and

h(n +l -rank( a" ) 1 ;

(40) tim Tb =tm'"=


b-i-' n— n ;_, c'c l n+1 J
=(the minimum of all n! possible values of Tb).

We also implicitly assume that our statistic is nontrivial in that t' < Eb = O T„ <
t max
n

Theorem 1. Suppose (4)-(6), (8), (32)-(37) all hold. Then

(41) sup ITbn- To-bp n (a,c)


Ibn <a
J
o
h5 o dtH p 0 as n->oo

for each 0< B < oc.

Proof. Now for each fixed b the term, call it A, inside the supremum in
(41) converges to 0 in probability by (22). We will assume first that p "(a, c) >_ 0

for all n and hS o dt > 0. Note that in this case both T. - T , and °

710 RANK STATISTICS

bp„(a, c) Jö hS o dt are functions of b. Thus we can divide [—B, B] into subinter-


vals of length at most e/(2 J'o h3 0 dt) by using a finite number of division
points. By taking n E sufficiently large we can also be sure that <_ E/2
for all division points b 1 )> 1— e. The two conditions together imply
P(sup {linI: (bi <_ B} <_ r)> 1— e for all n >_ n e ; that is, (41) holds. The same
proof works in the other cases also, since both functions are still monotone.
El

Estimation

Treatment of an estimate of a fixed parameter typically involves consideration


of a test statistic under related local alternatives. Thus our special construction
is now (e„ ; 's as in Theorem 3.1.1)

(42) Xn ; = Feä^ ; (^n;) = Fea „ ; for 1 <_ i <_ n,

where (4)-(6), (34), (36), and (37) are still assumed for these a n; .
Note that (34) implies that for each fixed w E SZ we have

n
c n; h rank (an,)
(43) lim T B _ t ma"= n
;=, c i c n+1
=(the maximum of all n! possible values of Tn)

and

min _
n
c„; h
( n+I_rank
(a„;) 1
(44) lim T B —
nB' °D — n ;=1 C S C n+1 J
=(the minimum of all n! possible values of T.

We also implicitly assume that our statistic is nontrivial in that t n a” < 0< t";
note that 0= E9 = 0 T.". Thus

(45) 9n =sup{9: T„<0} and B 0 =inf{0: T„>0}

are a.s. well defined. Let

(46) 8„ denote any value in [8„, B„].

We will assume that

(47) hS0 dt > 0


0

CONTIGUOUS ALTERNATIVES 711
and

(48) lim p„(a, c) >0.

Theorem 2. Suppose (4)-(6), (8), (32), (42), (43), (34)-(37), (47), (48) all
hold. Then

(49) ✓a'a(9„ -9)=—


a
p „( jc)J So dt

Proof. Without loss set 0 = 0. We define

(a) b = ✓a'a 9, b„ = a'a B„, and so on.

Let T. be as in (35), and note that

(b) b„ = sup {b: T„ <0} and 6„ = inf {b: T>0}.

Since T° -> P f h dW, we can choose B> 1 so large that P(lTnI>


jö h8 0 dt lim p„ (c, a) B /4) < E/2 for all n > (some n, F ). Now use the asymptotic
F

linearity of Theorem 1 to claim that

(c)P( sup I
Ibl<B.
^h8 o dtl>E^<E
0 2

for all n >_ (some n 2F ). Let A „ denote the indicator of the intersection of the
F

complements of these two sets; thus P(A„ F )> 1— e for all n >_ n E = (n, F v n2E).
Moreover, for w E A n , the graph of T. must lie within vertical distance e of
the line y = T + bp „(c, a) f ö h3 o dt on the interval IbI <_ B F . That is, on the event
A r , the ry b„ is within horizontal distance s/[p„(c, a) ,(„ h1 0 dt] of the solution
point of the equation 0= T„ +bp „(c, a) f ä h6 0 dt. Thus

(d) b„ — b = o —
p„(c,a)J h8 ° d (

Conclusion (49) follows from (d), T„ = a Jö h d W and 6„ — b = ✓ a'a (8 — 0). ❑

Asymptotic Length of Confidence Intervals

The upper and lower endpoints 6„ and e„ of a confidence interval for 0 based
on (5) are asymptotically

e„ f
a'ap„(c,a)10h80dt
712 RANK STATISTICS

Thus
Ja'a(en—e„) (a) — Qh 1
(50) 2 Z a pn(c, a) Jo hS o dt o•50pn(c, a)p(h, s)

Note that (50) implies that if the optimal scoring function S o does not have
much dispersion, then the estimation of 0 inherently results in rather long
intervals.

The Multivariate Case

Suppose now that b is a vector and that [as in (32)]

bx x ,,
(51) Fes = F ; ß with ß = + • • • +
' '' 2 b „° i Z (call this Q")
• =1 x1 ^i• =i x i'p

for u.a.n. x ;; 's [as in (4.5.30)] and suppose (6) holds uniformly in IbI < B for
any 0 < B < co and for all a ; 's as in (4). Then, as in Theorems 4.1.3 and 4.5.3
the log likelihood ratio statistic L„ for testing Qn vs. P. satisfies

(52) Ln — Z„ — 2 Var [Z„] P^ 0 as n -^ oo

where

(53) n
Z = r b1xi1 2 + ... ^ ^— b 'p 2

]o0(en1) = a bb fo' bo dW(i),
L
^x ;•=1 x r'P i=

where

(54) Coy [W ' (s), W' (t)] = p;,; -(s n t — St)


( ) ) if p;; = lim p„(X; , X; -)exists
,

for the jth column X3 of the matrix X of x ;; 's. Also, for U we have

(55) Coy [U(s), W(t)] = p3 (s A t — St) = tim p„(1, X; )(s A t — St)

when the limit exists. For 1 s k s p define

( k)

1' b _ ', Cni Rni

t
(56) kn 1 = 1 J C (k) hk(n+l )

when the R„ ; denote the ranks of the Xb. ; . As in (22) we have


I P
(k) '
(57) QSk+( hkso dt) E bp n(C X;) forl^k^P
CONTIGUOUS ALTERNATIVES 713

for Sk as in (17); that is,

(58) Sk = h k dW k with W k = 4A/` ' (k

[note (3.1.66)]. Let T„ and S denote column vectors with kth entries Tk" and
Sk , let H denote a p x p diagonal matrix with kth entry Jö h,^3 0 dt, and let Mn
denote a p x p matrix with k, jth entry pn (c^ k ^, )C; ). We may summarize (57) as

(59) Tn = a S+HM"b = a T°+HM"b.

To mimic Theorem 1 we require that T„ be monotone in each coordinate 6 ;

separately. We thus assume

(60) each h k is /

and

(61) all vectors c X; , 1 <— j, k <— p, are concordant.

[Jureckovä (1971) contains a weaker condition than (61) that will suffice.] The
following result is then no more complicated than Theorem 1.

Theorem 3. Suppose (4)-(6), (8) for each k, (51), (33), (34), (56), (60), (61)
all hold. Then

(62) max sup ^T„ — T„ — HM"b^ --> P 0 as n - o0


175 l^P IbB

for each 0< B <.

Now modify the model (51) to

(63) F „ ; =Fß with ß = 0 1 x1 , +• • •+9P x;P

and try to estimate 0. The natural analog of (45), (46) is to choose 8” to make
T° (with the obvious definition) come as close as possible to achieving the
value 0. Then (59) shows (as in the proof of theorem 2) that 0. satisfies

(64) HM f 01)
S.
"L^X'PXP(BP"-0,)^ a

Of course, when HM" has an inverse, we can solve for our standardized
estimator and see that it is asymptotically normal with 0 mean vector and
714 RANK STATISTICS

covariance matrix

5 S

(65) Mn'H_, Y_ H_,M„ ' where I has j, kth entry p„(c ( », c (k) )(h; , h k ).
n n

Processes of the Form T. = f h„ dR n

Theorem 4. Suppose the contiguity conditions (4.1.4)-(4.1.6) hold. Suppose


the hypotheses of Theorem 20.2.1 also hold. Then the process T. of (20.2.7)
satisfies

(66) T„—p„ J h(r)S o (r)dr=T on (D,_q,II /q(^) as n-oo


a
0

where T is the process of (2.2.8).

Proof. The analog of equation (3.3.27) and then Lemma 4.1.1 together imply

(67) lim lim Q„(cd 4 ^ fq(1/m)?e)=0 for all s>0.


m oon- o

Under the null hypothesis we know from Theorem 20.2.1 that

(a) T(t) =T(t)


a
JI hdW`,
while the modified log likelihood ratio statistic Z„ of (4.1.18) satisfies

(b) Z„ =Z = S o dW°.
a

Recall Remark 3.1.1 to clarify the meaning of W` and W'. Since any sub-
sequence n' contains a further subsequence n” on which p n .,(a, c) -- some p as
n” -* co, we have

(c) 1
)J ° [TZ)] N L[O] ' LP°-h 1 oo3,So pO hl} tolf}bo ^^

Thus Le Cam's third lemma (Theorem 4.1.4) gives

T„,-(t) _ d po Ito } ,^+ S o dW ° under the alternatives Q.


(d)
J 0
THE CHERNOFF AND SAVAGE THEOREM 715

Since the -^' of (c) trivially extends to vectors (T „- (t,), ... , T„.•(tk )), so does
,

the -4 d of (d). Checking for proper covariance is trivial. Subtract the mean
from the left-hand side so that we do not need to assume p„ converges to
some p. Replace -> d by = a in (d) as done in (22). ❑

4. THE CHERNOFF AND SAVAGE THEOREM

Suppose X 1 ,. . . , X„, are iid with continuous df F and Y,, ... , Y. are iid with
continuous df G. Let N denote (m, n) when it is used as a subscript and let
N=m+n otherwise. Let A N =m/N and HN =A N F +(1 —A N )G. Let F m and
G. denote the empirical df's of the X; 's and Y; 's, respectively, and then
HIN - ANFm +(1 AN )G„ denotes the empirical df of the combined sample of
-

size N. Many classical statistics used to test the hypothesis that F and G are
identical are of the form

1 m
[^

m L
(1) TN- CN1ZNi

when CNI , ... , C NN are known constants and where ZN; equals 1 or 0 according
as the ith largest of the combined sample is an X or a Y. We define a score
function JN by

(2) JN(t)=CN; for (i - 1)/N <ti /N, 1:i <_N,

with JN right continuous at 0. Now note that

(3) TN=J^ JN (H N )dlF m .

If JN "converges" to some J, then the natural centering constant to associate


with TN is

(4) µN = f J(HN) dF.

We will consider a special construction of the X ; 's and Y; 's where

(5) m —F) =0_Jm(F) and f (G „ — G) =V,,(G)

for independent uniform empirical processes U m and V. (notice that this is


different from our usual use of the symbol V,,).
716 RANK STATISTICS

Note that

(6) Y^II(TN AN) = J V^I![JN(HN) - ✓ ( HN)]dF-

+ f J(HN)d^(F,,, - F)

= r ✓N(HN)- ✓ (HN) -
/ [HN-HN] dF.
V^►! + J ✓ (HN) dU -(F)
J HN—HN

(7) = ✓'(HN)[ANUm(F)+ AN(1 -


AN)V (G)] dFm

+ f J(HN) dUm(F)

AN J ✓'(HN)Um(F) dF+ A N (1-A N ) f ✓'(HN)V„(G) dF

U,,,(F)J'(HN)d[ANF +(1 -AN)G]


(8) -J

(9) = 1 -A N f V—
Af ✓ '(HN)Vn(G) dF

- '11 -A N f ✓'(HN)Um(F)dG)

(10) =TN= 1- AN {

-/1 -A J J'(H
N N )UJ(F) dG}

(11) = N(0, QN (F, G)),


where

O'N(F,G)= (1- J(HN(x)) ✓'(HN(y))


.1N){AN ff
x [G(x n y) - G(x)G(y)] dF(x) dF(y)

(12) +(1 -AN) JJ ✓'(HN(x))J'(HN(y))

x [F(x n y) - F(x)F(y)] dG(x) dG(y)


I .
SOME EXERCISES FOR ORDER STATISTICS AND SPACINGS 717

Thus a proof of the asymptotic normality of TN requires firming up the


approximations made in steps (7), (8), and (10).
We assume that for some 0< A 0 < 2 we have

(13) A o A N SI—A O for all N>1.

Suppose that for some function J the scores C N; =JN (i/n) are close to J(i/n)
in the sense that

I N-I
(14) ^cN;—J(i/N)H0 and C NN /J7.T-0 as N-I.00.

Suppose J has a continuous derivative J' and

(15) JI<_[I(1—I)]-1+s and lJ'l<_[I(1—I)] 2+S for some S>0.

Theorem 1. If (13)-(15) hold, then

(16) NI (TN tLN) — TN + P O


- as N-co

for the ry TN of (10).

Exercise 1. Prove Theorem 1. The techniques required can all be found in


the chapter on L-statistics. We note here only the helpful inequality F <_
HN/AN•

Example 1. If C N; = i/ N, then TN is just the Wilcoxon rank sum statistic. The


choice C N; = c '(i/(N+ 1)) gives Van der Waerden's statistic.
-

5. SOME EXERCISES FOR ORDER STATISTICS AND SPACINGS

Exercise 1. Let V. denote the uniform quantile process. Recall that e n: , <_
• • • - 6„,,, denotes the order statistics and S „ i = ; — fin:; _, is the ith spacing.
Let p„ ; = i /(n+1). The fact that

1 —
f,:i
(1) E(,;+1I .,:,)= 1
n+1 I—p,, ;
suggests that
1 1— .,:;
(2) ^
M.,(=fen:i
Pni) —
n +l j =, 1—p„j

= VR (Pn^)_
______
1 '' 1
L wn(Pnj)
n+1;=, 1—Pnj
71 8 RANK STATISTICS

is the appropriate basic martingale for this situation. Solving for V. gives

(3) .n(Pn,)=Mn(Pnt)– E Pn,–Pn, AN^U „(P„;)


i=l 1 – p„1

(4) =(1 – p) 1 DAAn(Pn;)


=l 1–p„ ;

Verify the claims made above.

Consider a slightly different version of the uniform quantile process, namely

(5) 4/ „(t) =^(e„:;–t)for(n+2)ct<n +1,0:sinn+l

so that N„/ V has jumps of size S„ at i/ (n + 2) for 1 ^ i ^ n + 1. Then consider


;

the linear combination of uniform spacings

h( - -)61_/ ] h d4/ „.
(6) T„=•/n[YI = fo,
Exercise 2. Find conditions on h under which

(7) T„ =T ° f t h dV = N(0, I[h11 2 ).


a o

Let us define an integrated uniform quantile process

(8) A (t) —'vf—


n ( n:O+Sn;I+
...
+ n:i) 2, ]
I

for 1 <i<_ n + 1.
n+1 t <—,
n+1

Then

1i I r
(9) Tn=^ o hdAl„.
n;E^h {n+l }fin:;–^o
th(t)dt]=

Exercise 3. Find conditions on h under which

(10) T„ =T= hV dt=N(0,v).


a o
SOME EXERCISES FOR ORDER STATISTICS AND SPACINGS 719
Exercise 4. Show that

(11) J
0
IU„(t)I k dt= J
0
O IN „(t)l k dt for all k>0.
CHAPTER 21

Spacings

0. INTRODUCTION

In Section 1 we consider the exact small-sample distribution theory of uniform


spacings S„ i . In Section 2 the limiting distribution of (S „ ,, 3 „ :n + ,) is presented.
:

Uniform spacings can be considered (for fixed n) to be the interarrival times


of a renewal process of exponential rv's that has been standardized by dividing
the total sum. Empirical, quantile, and weighted empirical processes of such
renewal rv's are studied in Section 3, and are specialized to the exponential
case in Section 4. Statistics used to test for a uniform distribution are considered
in Section 5. They are viewed as functionals on the processes of Section 4,
which process depends on the type of alternatives considered. Section 6 states
some LIL-type results for spacings.

1. DEFINITIONS AND DISTRIBUTIONS OF UNIFORM SPACINGS

The joint density of the order statistics„,, ... ,„:„ of a Uniform (0, 1) sample
is [see (3.1.95)]

(1) n !ontheregion0 :5&:, <. <1.

For 1 <_ i < n + 1 we define

(2) sni ° fn:i — Sn:i -1 ,

where n:O 0 and n:n +l ° 1; these are called uniform spacings. The joint density
of Snt,... 9 Snn is

(3) n! on the region S „ >_ 0 ;

for 1 <
—i n and S „,+• • •+S nn <1;
720

DEFINITIONS AND DISTRIBUTIONS OF UNIFORM SPACINGS 721

just note that the Jacobian of the transformation is one. Note also from
Proposition 8.2.1 that

/ al an+1
() (sn 1 1 • ' • s sn,n+l) — n+l ^ ' ' ' + n+l

where a, , ... , a n+ , are iid Exponential (1) rv's. From this we see that
Sn,n+, are exchangeable rv's; that is, permuting the coordinates of
5 n, n+ ,) does not change the distribution of the vector.

Exercise 1. Verify that each S" ; has density

(5) n(1—u)n-1 for 0<_u<-1,

while for all i ^ j the joint density of S" ; and 8"; is

(6) n(n-1)(1—u—v)n-2 foru?O,v?O,u+v<_1.

Thus

n
(7) E8;= --- and Var[Sn;]=
-

n+l (n+l)z(n+2)'

and for all i 0 j

1 1
(8) Coy [Sn;,S";]=—(n+l)2(n+2) and Corr[S" ; ,8,; ]=—n

Later in the section we will show that

(9) P(snl25x and 5 nzY) P(snl^x)P(snz^Y) forall0_<x,y^1.

We let 0<_ S n1 s <_ S n:n+ , denote the ordered uniform spacings. Note that

an+la an+l:n+l
(10) \Sn:1,•• ‚ 3 n:n+l) — n+l ''''+ n+l )'
a; E, ai

where 0 = an+l:o < an:,:, < • • : an+,:n+, are the order statistics of the sample
a l , ... , a" +1 of Exponential (1) rv's.

Exercise 2. (Sukhatme, 1937) The normalized exponential spacings y" of a ;

sample a, , ... , an of Exponential (1) rv's are defined by

(11) yn;=(n—l+1)[a,.1—a,,:l_, ] for 1 is n


722 SPACINGS

and are iid Exponential (1). Show that


x rc

(12)
( Yni =^ an:i
I I

and

(13) an:i=Yn1+ Yn2 + •+ Yni _ j Yni for 1 i<n,


n n-1 n —i+1 ; L_•, n—j+1

Now show that (S ,, ... , S„ „ + ,) has the same distribution as does the vector
:

whose ith coordinate is


n+ 1
c• Yn+I,i /
(14) L Yn+,,;
'
j., n—j+2

Thus show that

1 1
(15) ES„ = :;

n+1 ; , n —j+2

while for i <_j

(16) Coy [S n :i , S n :j ]

1 1 1 1 i 1
(n+1)(n+2) Lk 1 (n — k-2) z— I (n—k+2) k I n—k+2

Also show that (&‚. . , S n,„ + ,) has the same distribution as does the vector
whose ith coordinate is

(17) (n — i +2)[lin:i—sn:i—I]•

Exercise 3. (Lewis, 1965) Show that


n i n+1
(18) Ln — Yn+l,i1 L Yn+l,i
1 I I I

is distributed as the sum of n iid Uniform (0, 1) rv's. It thus has mean n/2,
variance n/12, and approaches normality very fast as n increases. Show that
n+tn+l r n+l
(19) L„ =2 (n+1)— 2
iOn+I:i/ a ; ] = I (n
+1)— i5n:i],

and note in this expression that the large S „ :i 's are weighted the most heavily.
DEFINITIONS AND DISTRIBUTIONS OF UNIFORM SPACINGS 723

Lewis proposed L. as a suitable statistic to test if an iid sample a 1 ,. .. , a,


was indeed from some Exponential (0) distribution.

We now turn to Renyi's (1953) representation of spacings. Start with


,,..., „ as independent Uniform (0, 1) rv's. By the inverse transformation
of Theorem 1.1.1 we see that

(20) an:n -i +l = —log n:i for 1 <_ i <_ n

are Exponential (1) order statistics. As in Exercise 2 or Exercise 8.2.1, the rv's

(21) Yni=(n — i + 1 )[ an:i — an:i-1 ] fort i<_ n

are independent Exponential (1) rv's (here a„ :o =0). Also

( _ n—i+l Ynj
(22) fn:i = exp ( — an:n -i +l) = exp E
j . 1 n —j+l

so that (with y n,n+l = co)

—i +l
Ynj Yn,n —i +2
8ni = en:i — n:i-1= exp — . 1 1—ex
; i _ n—j+1) p( ( i -1

(23) _^,^ „ I1 — eX p (
i
Yn,n -i +2
i-
11
1 /) J for l<_i<_n+l;

note that (23) represents the spacings as the product of two independent factors.

Exercise 4. (Proschan and Pyke, 1967) For 1 < i <_ n

l i Ynj 1 2 c
( 24 ) E ( n:i—
n+1) — ( 1 n/ 1 n—j+1J n2

for some absolute constant c.

Exercise 5. (Pyke, 1965) Let X 1 ,. . . , X„ be iid F with positive density f


and unique quantiles a = F - ' (u) and b = F ' (v) for 0 < u < v < 1. Suppose
-

i/n-+u and j/n- v as n oo. Then the rv's n(X. :i — Xn:i _,) and n(X—X„.; _,)
are asymptotically distributed as independent exponential rv's with means
1/f(a) and 1/f(b).

A nice reference for these results is Pyke (1965).


724 SPACINGS

Exercise 6. Show that for 2 < i <_ n +1 the rv's

(25) Zi = S„ i S are independent with P(Z > z) = (1— z) " ' ; -

for 0 < z < 1. [Sweeting (1982) uses this fact to test for outliers.]
Proof of (9). We note that

4(u)= P(S n2 C YI 5 n1 = u)

=P &n2 < n1IS„, =u)


( 1—u 1—u

=P (&2_1,1 j A i)

(a) _ (an 1' function of u).

Thus for x, <— x2 we have


P(S n2 sy and 3 nl x,)
P(Sn2 C YIsnl C xl) =
P(Sn1 x,)

X' P( 8 n2 5 yI 6 1 = U)
dFD^i(u)
= P(bn1^Xl)

cJ o x2 p( P(sn1 x2) F
by(a)

(26) =P(6 n2^Y1Ön1^x2)•

Set x, = x and x2 = 1 in (26) to get

P(S n2 25 y and 8nl <x)


=P(S„z^YlSnl^x)
P(5 n1 <_x)

(b) ^ P(sn2<Y) by ( 26 )

as claimed. This is a special case of the next exercise. ❑

Exercise 7. (Beirlant et al., 1982) Show that

^k7
(27) P(ti„ 1: X,,..., 8 nk' 5 xk)^ 11 P(sni C xi)
i=1

and
k
(28) P(lSnl>xl,...9snk>Xk)^ fl P(sni>xi)•
i=1
LIMITING DISTRIBUTIONS OF ORDERED UNIFORM SPACINGS 725

The Length of the Spacing that Contains the Next Observation

Now 6,, ... , e„_, divide [0, 1] into the n subintervals [e„_, ;; _, ,„_,;;] of length
for 1:5 i <_ n. Let S * be the length of the subinterval
containing the next observation e„.

Theorem 1. The density of S * is

(29) n(n —1)t(1— t) „-2 for 0<_ t<_ 1; that is, 5 * = Beta (2, n-1).

Proof. Since all S n _, ;; _— S„_, ; , with density (n — 1)(1 — t) n-2 for 0< t <_ 1, by
(3.1.90), then (29) is immediate from the generalized version of Theorem 1
contained in (30) below.
Let [0, 1] be divided into n disjoint subintervals in such a way that the
length X; of each subinterval is a ry with the same df F for each 1 <_ i n. Let
f be Uniform (0, 1), and independent of the X; 's. Let X, denote the length
of the subinterval that contains the point and let its df be F. Then F * has
a density with respect to F given by

(30) dF (t)=nt for0<—t-1.

Let I denote the index of the subinterval in which 6 falls. Then

F*(t) = P(X* ^ t) = P([I = i] n [X1 ^ t])

i =1 JO
I P(I = i l Xi = s) dF(s)

(a) =n ^ r s dF(s).
0

Thus (30) holds. Note also that

(31)
0
J
EX* = J 1 tdF* (t)= tntdF(t)=nEX1
0

= n{Var [X,]+E 2 X,1= n Var [X,]+EX,

holds in the general case. (This was taken from a Berkeley preliminary.) ❑

2. LIMITING DISTRIBUTIONS OF ORDERED UNIFORM


SPACINGS

We now show that the limiting distribution of the minimal and maximal
uniform spacings are exponential and extreme value, respectively.

726 SPACINGS

Theorem 1. (Darling; Levy)

(1) P((n+1)2Sn :i: t)- 1—e ` as n- - forallt>0.

(2) P((n+1)Sn:n+,—log (n+1) <t)-.exp( —e `) - asn->co for all t.

P((n+1) 2 S n:l > t and (n+1) 3„ ; „ +1 —log (n +1)<—logs)

(3) -* exp (—(s + t)) for all s, 1>0.

Proof. Let Y, ., ... , Y , be independent Exponential (1/(n+1)) rv's. Let


m„= min (Y1,..., Y. +1 ),M„= max (Y 1 ,..., Yn+,), and 1„ = Y, +• • +Yn+,• .

Note that

(a) Y „ -+ P 1 asn-oo [in fact (log (n +1))(- 1) p 0]

since 1„ has mean 1 and variance 1/(n+1). From (21.1.10) we know that

M .

(b) Sn:I =mit and Sn:n+I = •


^n ^n

Now

Exponential (1) = (n + 1) 2 m„ by Exercise 8.2.2

=(n + 1 ) 2 MI I„
n

(c) =(n+1)28„:,[1 +Op(1)] by (b) and (a),

so that (1) holds.


Note that

log((n +l)/e ` )

P((n+l)M„ —log (n+1)st) =P M„s


(n+1)
= log((n +l)/e - `)1n +'
P`Y'^
n+1
e tn+l

= [1 —exp(—log ((n+1)/e - `))]n +'=


1—n+l]

(d) -> exp ( —e`) as n -> oo.


RENEWAL SPACINGS PROCESSES 727

Also
(n+1)M "- log(n+1) I"
(n+1)M„-1og(n+1)=

+n+l)^^n log(n+l) I”+(log(n+l))(I n -1)


J
(e) =[(n+l)S":"+,—log(n+l)][1+o,(1)]+op(1)
by (a) and (b).

Now (d) and (e) combine to give (2).


We leave the joint distribution (3) to an exercise.
The results (1) and (2) are from Levy (1937), while (3) is from Darling (1953).

Exercise 1. Verify (3).

Exercise 2. (The exact distribution of S„ : " + ,) Use (21.1.3) to show that the
joint density function of (8, .... S" ; ) at the point (t,, ... , t ; ) is

(4) n (1—t,—•••—t ; )" '


^ - ont ; >0and t,+ •
(n —

while for such points

(5) P(3-1> t,,..., S„ ; > t ; )=(1—t,—...—t ; )".

Now use the principle of inclusion and exclusion to show

(6) P(n:"+t>t)=(n+1)(1—t)"—(n21 (1-2t)"


)

+...+(_1)i - i^ n+1 1(1—i t )"+..


1 )

where the series continues as long as (1—it)>0. (See David, 1981, p. 100.)
[The exact joint distribution of S" : , and S + , is given by Darling (1953).]

[See Rao and Sobel (1980) for generalizations to ith largest and ith smallest
spacings.]

3. RENEWAL SPACINGS PROCESSES

Let X 1 ,. . . , X. be iid with df F having F(0) = 0, density f, mean > 0, and


variance 0. 2 < cx . We let F" denote the empirical df of X 1 ,. . . , X". We will use

728 SPACINGS

terminology based on the idea that X1 ,. . . , X„ are the interarrival times of a


renewal process. We define
rt

(1) Dn;=nX; Y_ X; =X; /Xrt forl^i^n

to be the normalized renewal spacings. Their empirical df is

1
(2) -n I1 =l[D..ryt=F„(Xny) for0<_y<oo.
1

Moreover, Proposition 8.2.1 tells us that

(3) spacings and the D,, ; of (1) when F in (1) is any exponential df
with mean Fc have the same finite-dimensional distributions.

If we are interested in testing that the Xi 's are exponential against some
alternative F, then

(4) the representations (1) and (2) are correct, jointly in n,


for renewal alternatives.

However, (3) is incorrect, when viewed jointly in n, as a representation of the


spacings of a sequence of independent Uniform (0, 1) rv's.
We now consider the special construction 16n ... , e„„, n >_ 1} and U of
,,

Theorem 3.1.1, and we suppose above that

(5) =F - 1 (
X1 =X,i— 1 ) for 15i<_n.

We define the normalized renewal spacings process to be

(6) Xn(y)=Vi[F,,(X,,y)-F(µy)] for 0<


_y<oo.

We recall from Theorem 3.1.2 that as n -> oo

(7) Zrt-^(X„-A)= J ^ F ' du „ -* o Z=


0
- J F' duJ = N(0,0, ).
0
2

We also have the representation

X (y) _ Vi[Fn(Xrty) - F(Xy)} + ✓n [F(Xrty) - F( µy)}


(8) = UJ, (F(y))+Z,,yf(yµy)

for some µ* between X„ and µ, provided F has a density f that satisfies

(9) f is continuous on [0, oo) and yf(y) -*0 as y - oo.


RENEWAL SPACINGS PROCESSES 729

Theorem 1. If F satisfies (9), then the special construction (5) above satisfies

(10) IIX — v(F(µ'))II-p 0 asn-*x,


where

Z '
(11) U U +— F - 'f o F '
- with Z F ' d u.
-

lu o

It seems appropriate to introduce the notation

(12) X_ J(F(p.))•

Proof. From (8) and (11) we have

Iix -v(F(SI))II
IIUn(F(XnI)) — U(F(Xnf))II +IIu(F(X„I)) — U(F(µ'I))II

(a) +I Z*I suP IYµv.f(Yµy) — Yl^i(Yµ)I + (µ n —ZI sup yf(Yµ)


A Y I
(b) °Ynl+Ynz+yn3+Yna=Yn2+yn3+op(l)^

Now 7n2 -> p 0 also, since U has uniformly continuous sample paths and the
mean-value theorem gives

(X) * * I _ op(1).
(c)
* II F(I) — F(I)II = supY I AY
Yf(Y
I^Y 1i Y )

We can claim yn3 - p 0 if we can show

(d) sup IYµy.f(yp. ) — y f(Yµ)I - p 0.


Y

But (d) is clear since (9) implies that zf(z) is uniformly continuous on some
[0, M], with M chosen so large that both yµf(yp.) and yµ*f(yµ*) are uni-
formly small on [M, co) with probability exceeding 1— E. [Note that (9) is a
bit stronger than we actually need.] See Pyke (1965) for a version of this
theorem. ❑

The natural inverse process to associate with the renewal spacings is

( 1 3) Vn(t)°%ki(F'(t)) ✓i Dn:,— F (t) ^ fortnl<t<-,1<_i<n


L 11

with V(0) = 0; we will call it the ordered renewal spacings process. Note that
730 SPACINGS

it is essentially a quantile process. Now

(14) ^'„=µf(F-') ✓ f1 1
—n J
F- 'f(F-' )
=Xn f(F ')V(F.'— F ']—^(X„— s)
- -

Xnk

(15) = — Xn L —Q„ + µ" F- 'f(F-') J ,


where

(16) Q f(F ')^[F„' —F ']


- - for0 t <1

denotes the standardized quantile process treated in Chapter 18. In Chapter


18 we gave conditions under which 110 n —N„ 11 o p 0, which is the key ingredient
needed to handle (15) as in Shorack (1972b). We thus have the following
theorem.

Theorem 2. If F satisfies (9) and if 110„ —N„ -*' 0 as n -^ oo, then

(17) V„ +611 -> p 0 as n->co

for v = U+(Z/µ)F 'f(F ') as in (11).


- -

We define the weighted normalized renewal spacings process by

cn'
(18) Zf(y)= E [1[x1<X.,r]—F(Ly)] fory>0

(19) =W,,(F(.'ny))+p,,(l, c)'Vi[F(X,,y) — F(µy)],

where W,, is the uniform weighted empirical process of the Uniform (0, 1)
rv's f =F(X ) and p„(1, c)=c'1/ 1'lc'c as in Section 3.1. Thus virtually the
; 1

same proof as we gave for theorem 1 establishes the next theorem.

Theorem 3. If F satisfies (9) and if the u.a.n. condition


z
(20) max ,
c 0 as n - 00
I^__n c'c

holds, then the special construction (5) above satisfies

(21) IJZ — *(F(p'))II -' r 0 asn->X,



UNIFORM SPACINGS PROCESSES 731

where


(22) NN=W+ZF - 'foF- ' with Z = F ' d U. -

We introduce the notation

(23) Z-*(F(p ))•

Remark 1. If we use the notation

(24) 8:1=(X1+ +X,)/(X,+ ..+-X„ +i ) for1:isn

[even though these :j are not Uniform (0, 1) order statistics unless the X ; 's
are exponential], then the quantile process of these „:; was considered in
Section 12.2.

4. UNIFORM SPACINGS PROCESSES

We begin by examining what the processes X. and V„ of Section 3 reduce to


in case the df F of Section 3 is exponential (without loss of generality we may
take the mean to be 1).
Recall that ^_, , ... , _, ,,_, denotes the sample of size n - I from the
Uniform (0, 1) df guaranteed by Theorem 3.1.1. The normalized spacings of
the order statistics 0 „-, :o <_ e„_ 1:1 <... < 4 h _, :n - 1 < en - 1:n = I are

( 1 ) D„;=nS„-1,;=n(en-,:,-fin-,a-,) for l<_i<n.

Their ordered values are denoted by O< D. 1 < . .. < D„ „, :

The normalized uniform spacings process [see (21.3.6)] is


i n
(2) Un(t)=- Y 1 [ 1 [D,i^ -1 og( 1-1 )) — t ] for0:5t<_1.

The ordered uniform spacings process is

(3) 4/n(t)=(1-t)'/W[Dn:,--log (1-t)] 1 1 <t 1 , 1<i<_n


for -
n n
with C ' ° 0. The limiting normal processes are

(4) UJ(t)°U(t)-Z(1-t) log (1-t) for 0:t<_1

where Z = - [log (1 - t)] dU(t)


Jo

732 SPACINGS

and
A

(5) 0 = — v,

where U and Z are jointly normal with

(6) Z = N(0, 1) and Cov [UJ(t), Z] (1— t) log (1—t).

Thus the covariance of the v process is

(7) K ( s, t )=sAt— st — [(1 —s)log(1— s)][(1 —t)log(I —t)]

for0ss,t<_1.

(Note that the 6 process arose in Pettitt's Table 5.6.1 where the distribution
of rv's like J 1 [v„(t)] 2 dt is tabled, see the rows labeled "E ".)
Consider the weighted normalized uniform spacings process

(8) W(t) ° E c", [ 1 [0^;^ ^o 8 c1-,» — t] for 0_ t_1


c c ;,

for constants satisfying the u.a.n. condition


z
(9) max " -^ 0 as n -^ 0o.
1.5n c'c

The limiting normal process is

(10) 1(t) = W(t)— Z(1 —t)log(1 —t) for0<_t<1

whenZ= — J ,

0
[—log(1 —t)] dU(t)

with covariance function

(11) s A t — st+(1 -2p,,)[(l —s) log (1 — s)][(1— t) log (1— t)]

for0<_s,t<_1

provided

c' 1
(12) p„(1,c)= -+somepi, asn -00.
1'lc'c
TESTING UNIFORMITY WITH FUNCTIONS OF SPACINGS 733
As in Chibisov's theorem (Theorem 11.5.1) and as in Theorem 11.5.3 we
suppose that either
t/z
(13) q E Q* and T(q, A) =t - ` exp (—,1g 2 (t)/t) dt <oo for all A >0
0

or

(13') q€Q and g(t)=q(t)J tlog 2 (I/t)-*co as

1. If q satisfies (13) or (13'), then

(14) j(Vn—V)/q)! -*0 as n-+oo

where 4/°, equals 4/„ on [1/(n+l), 1] and 0 elsewhere. If qE Q* and


jö [q(t)] -z dt <co, then

(15) ^j(U n —QJ)J9II -0 as n-co,

If (9) and (12) hold, q E Q* and Jö [q(t)] -2 dt <oo, then

(16) 11(DNS„ —14N)Jgj! - p 0 as n -* 00.

Proof. Consider N„, and note the identity (21.3.15). Apply Theorem 18.1.3
to the O n term and either a Taylor's expansion or l'Höpital's rule to the
multiplier —(1 — t) log (1 — t) of Z. We leave the rest as an exercise. ❑

Exercise 1. Provide the details of a proof of (15) and (16). See Beirlant et
al. (1982).

5. TESTING UNIFORMITY WITH FUNCTIONS OF SPACINGS


Testing an lid Sample

Suppose now that we are testing n — I data values, , ... ,„_, in [0, 1] to see
if they can be regarded as a sample from the Uniform (0, 1) distribution. Thus
we have the

(1) null hypothesis: f,,..., are iid Uniform (0, 1).

Suppose now that the null hypothesis of uniformity is true. If S„_, , ... ,fin-t,n
denote the n spacings between the data values, then under the null hypothesis

(2) Dnj = nsn-t,:

(3) = Y / Y
; for lid Exponential rv's Y,, ... , Y„, under the null.
734 SPACINGS

Throughout this subsection, we take as our

(4) alternative hypothesis: e,, ... , fin _, are lid on [0, 1],
but are not Uniform (0, 1).

Throughout this section it is convenient to suppose

(5) Y- G where G(x) 1 —exp ( —x) denotes the Exponential (1) df.

Let g(x) = G'(x) = e x, and note that

(6) G-`(t) = —log (1— t) for 0: t < 1.

Many statistics designed for testing uniformity against the alternatives (4)
are of the form
n

(7) T'n=-Y [h(D1)—Eh(Y)]


1

(8) _-- ^[4(G(Dn1))


— EO(e)] where -1
t

(9) _ 14 dLn
0

for ( n given [as in (21.4.2)] by

I n
(10) U(t) -7= [i(D^G-1(:] —t] for0<_t<_1.

Note that the df H of the ry h(Y)=h(G `(e)) satisfies


-

(11) H ' =h ° G '


- - where H is the df of h( Y) with Y G and h 7.

We also note that


1 n 1

[h(Dnj)—Eh(Dn,)]—Jo 0dÜn=— o undo,


(12)
where

(13) / [1Eo. G `u» — F,,°G - `(t)] for 0<_tC1


Ü (t) --
Vn i = ►

and

(14) Dn; =Fn withF,,(y)= 1 — (1 — y /n) n- ` for0^y^n.



TESTING UNIFORMITY WITH FUNCTIONS OF SPACINGS 735
Proposition 1. We have

(15) [Eh(D„;) — Eh(Y)] = 0(n -116 ) provided Eh z (Y) <oo.

Proposition 2. We have

(16) lv„—ÜI->p0 and IIv„—ü11-* v 0 asn-.00

for the 6 of (21.4.4).

Theorem 1. Suppose

(17) h = h, — h z where each h k is 7 and left continuous on (0, co)

and

(18)
f o ,
t(1—t)dcb k (t)<oo

A
fork=1,2with¢ k =h k (G

Then the statistics T. and T„ of (7) and (12) satisfy

(19) T„ N(0, Q 2 ) and T„—T„=0(n 6 ) asn - oo,

where

(20) Q 2 =Var[h(Y)]—{E[(Y-1)h(Y)]} 2 .

Exercise 1. Show that Eh k+s( Y) < oo for any ö>0 implies


( t) O k (t)<oo, and this in turn implies Eh( Y)<oo. Show, however,
tad
that neither implication is reversible.
Proof of Proposition 1. Without loss we assume h is >'. We note from (21.1.5)
that the df F. and density f„ of the D„, are

n —1 yIn-2
(21) 1—F„(y)= I- } ny" - '
and f„(y)= n 11—n for0--ysn.
Thus

(a) —log f„(y)? y-2y/n for0^y<_ n,


while
+'
/z
—log ( 1° ih _(n ) [( ) +21 n)+3(^
n) -2
n
23 \
)

(b) ^ n (n1 +2 (n ) +2 (n) ^... =y+2 ( y in) 2

y/n

736 SPACINGS

Together (a) and (b) give

(c) <_g(y)exp(2y/n) for0^y<n


i
(d) „(Y)— >g(Y)
( 1—n)
exp
(_/(i n )) for0sy<n.

Thus for 0!5 y s a n when 1 s a„ < n/2 we have

If(Y) g(Y)I g(Y) I [exp (2nn)-1]


v 1—(1—n) ( _ f/(i
L exp n//J}
2a,2a„ 1 a n a„
^g(Y) n exp
\ n / v \n + 2n^\ 1 nl/

an g(Y)[ 2 e2 v 2]
n
15a„
(22) c for 0<y s a n when 1 <_ a„ <_ n/2.
n g(Y)

We also have [using (a)] that

(23) I.%(Y) — g(Y)I ^f,(Y)+g(Y) ^ (e 2 + 1)g(y) <— 10g(y) for all y? 0.

Since h/ and since we can replace h by h +Constant in (7) or (12) without


changing the value of T„ or T„, we may suppose that for y sufficiently large
h is both >0 and ‚”. We thus note that

^f a
Ih(Y)I
e-Ydy< Ih (a)+a j Ih(Y)I[Ih(Y)I+y4]e-'dy

2 hEh2(Y)evdy z(Y)e ydY


cIh(a.^)I+an I Ja ,

1/2
+2 f Y e 8 -Y dy I I
by Cauchy-Schwarz and then the c, inequality

0(1) ifa„-+co asn-*oo.


(e) Ih(an)I+an
TESTING UNIFORMITY WITH FUNCTIONS OF SPACINGS 737
Thus
ViEh(Dn1) Eh(Y)1 =V J h(h S) dx+ J

' — h(h —g) dx
o a

15a„ 1o ^ lh(y)l e -v d y

+ 10Vi Jh(y)j e '' dy by(22)and(23)


f a,
-

15an (1')I
+l0 lh(a +an o(1) by (e)
(f) = O(n - '/ 6 ) if we now let a„ = n'/ 6

as claimed. ❑
Exercise 2. Minor changes in the proof of Proposition 1 give
(24) ✓n II F„ — G 11 = O(n - '/ 6 ).

Proof of Proposition 2. That 116.411 -- P 0 follows from Theorem 21.3.2 (or


Theorem 21.4.1). This and (24) imply I 0„ —vgl -0. ❑
Proof of Theorem 1. Now
Var [Ü„(t)]
= Var [1[o,,^o-'(,)]l +(n —1) Coy [l [ on ,<a - '(f)] , 1 (D•.2^G - '( , )I]
Var ( )}
[1 [D,, ,, G -' , ] since all Cov<_ 0 by (21.1.9)
= P(D ,,1 ^ G '(t))P(D,,>> G '(t))
- -

(25) ^ Constant • t(1 — t)


since, as in the proof of Proposition 1, we have
(a) I — F„ (y) < e[ 1— G(y)]
and we also have
„(y) = 1 — (1 — y/n) " ' - withy=G-'(t)
n-1
s1-1+ ysy

=—log (1 — t) = t +^ +3 +•
^ t+t 2 +t 3 +•

= t/(1 — t)

(b) <
-2t for0 -t <_z.

738 SPACINGS

Thus
Var U. doII = foO f oO E[vn(s)U„(t)] dcb(s) d(t)
L f."
e z
J {E[v„(t)]
('

z }'/ z do(t) by Cauchy-Schwarz

:[ L
J B {Constant' t(1—t) }'
o
/Z y (25)
do(t) zb
]

(c) <—r for 0:5 some B by hypothesis (18).


E

Hence Chebyshev's inequality gives

0 do I ?s)Var
<_ [J
(d) P\J o
o / oBLd ]/ czcE3,EZ —E '

with an analogous result for ,(1_ e U. do. Since an elementary application of


l'H6pital's rule shows that

(26) Var [cl(t)] <_ Constant • t(1— t),

the same proof used for (d) gives


e
P(
(e) J0 6 d^ > r ) c E'

with an analogous result for f ;_ B Ci do. Also


i o
J f
(' -

(f) I O n dcß — 0 d4 < v n — OJ V J do -, 0.


e ”, e

Combining (d)-(f) gives

(g) T„= 0 n do p T= J Üdo asn- X.


0 0

Now Exercise 2.2.27 shows that T = N(0, Q z ) where

^ t K(s, t)dcp(s)d4(t)
J o Jo

J J [sAt st]do(s)do(t)
=

0 o
(1

—S [ —(1— t)log(1 — t)]d o(t) }


z
0 11
TESTING UNIFORMITY WITH FUNCTIONS OF SPACINGS 739

= Var [4(e)] — j f [—(1— t) log (1— t)] dh(—Iog (1— t)) } z


'. o J

= Var [h ° G '(e)]
- — Ye "dh(y)I
tJ 4
'

e-ydylz
=Var[h(Y)]—^ I (y-1)h(y)
o J
(h) = Var [h(Y)] —{E[(Y-1)h(Y)]} z

as claimed. C

An interesting interpretation of some of these spacings statistics is found


in Kale (1969).

Exercise 3. (Holst and Rao, 1981) Prove a limit theorem for statistics of
the form Y n= , h(Dn1 , (i —z)/n)/n.

Testing a Renewal Process for Exponentiality

Consider the more general family of statistics

(27) 1n ° ^^ 1 ^; c [h(Dni) — Eh(^')]

(28) _ 0d* n with 46=hoG - '

and *, given by (21.4.8). Suppose


z
(29) max ' 0 and p(1, c)-+some p, c as n -*oo.
1sisn C ' C

Let T. and MU,, denote the obvious modifications of T n of (27) and *. in the
spirit of (12). Such a statistic with c ni >' in i might be appropriate for testing
against the

J alternative hypothesis: the XI 's of Section 21.3 are


(30)
stochastically increasing.

Theorem 2. If (17) and (18) hold, then the T n of (27) and its modification
Tn satisfy

(31) Tn -* d N(0,Q 2 ) and T—TO(n '/ 6 ) asn->ao,


740 SPACINGS

where

(32) a. 2 =Var[h(Y)] +(1 - 2p,,) {E[(Y-1)h(Y)] } 2 .

Proof. The proof is virtually the same as that of Theorem 1. ❑

Testing for Exponentiality

In this subsection we test exponentiality against a different type of alternative


from the previous subsection; we consider the

(33)
1 alternative hypothesis: X1 ,. .. , X„ are lid F( / 0) on [0, cc) for some
0>0 where O< f x 2 dF(x) < oo, but F is not Exponential (1).

A class of statistics that has been proposed for this problem is

11 n

T„=^I J(
n ;=, n+I
i X„
X"`J —
J oxJ(F(x)) dF(x) .
(' °O

More generally, we consider


n °D
(34) T„ = N/n — F, ' cni Xry' — f o xJ(F(x)) dF(x)

for constants c 1 ,. .. , c„„ whose J„ function [defined by J (t) = c„ for (i- J ;

1)/n < t <_ i/ n, 1 <_ i s n ] "converges” to some function J. We let T„ denote


the associated linear combination of order statistics
1
f
„ o0

(35) T„ _ — Y- c„ ; X„ — xJ(F(x))
:; dF(x)
n=i o

that were treated in Chapter 19. Note that

T„ — ✓n (.Y„ -1) Jo xJ(F(x)) dF(x)


(36)T„ = X n

(37) i F ' d
T = T— UJ
LJ - F(x
J f ^ xJ( )) dF(x) provided T n ^ T.
0 .

In Chapter 19 we showed that

J
I

(38) T = T=
T JU dF ' - under "certain hypotheses";
a — o

and under such hypotheses we have T= N(0, a 2 ) with Q 2 computed from


Theorem 3.1.2.

ITERATED LOGARITHMS FOR SPACINGS 741

Exercise 4. Compute Q 2 .

Exercise 5. When X 1 ,. . . , X„ are iid G, then

(39) a„ =ED„ = Y_ (n—j+l) '


; :; for1<i^n.

Now define a function a„ on [0, 1] by letting a(t) = a„ ; for (i — 1)/n < t z i/n
and 1 s i <_ n, with a„ (0) = 0. Show that

(40) (1— t)Ia„(t)—G '(t)I<_?


n
- for 0<_t<_1- 1
n

[See Shorack (1972b) for a proof.]

Exercise 6. Jackson (1967) proposed

(41) T„=^I n
LL 1
Xn

a, .
1
]

Show that in the special construction representation we have

(42) T„ -° J ” G - 'U dG - ' — J , G ' d U - under the null hypothesis.


0 0

Exercise 7. Show that in the special construction representation

(_i)2
dt
(43) fo \—(1—t)log(1—t))
under the null hypothesis.

This statistic is suggested by the standard exponential probability plot.

6. ITERATED LOGARITHMS FOR SPACINGS

Devroye (1981) shows that the kth largest spacing (denote it by K„) induced
by , ... , e„_ I satisfies both

nK„ —log n _ 1
(1) li 0 k a.s.
2 10 $z n
and

(2) lim (nK„ — log n + log 3 n) = c a.s. where —log 2: c < 0.


742 SPACINGS

In Devroye (1982) it is shown that S„ :k satisfies

( 0 "0 a„ ( Goo
(3) P(n2Sn :k ^ a n i.o.) = j 1 according as Y =j
l n =^ t1 l =^

provided a n > 0 satisfies a n / n 2 J, . If in addition a n T, then Devroye also showed


that

(4)
1 0
P(n23 n:k ^ a n i.o.) = S 1 according as
°° a„ (

exp (—a n ) = S
<co

t n=i n l o0
CHAPTER 22

Symmetry

1. THE EMPIRICAL SYMMETRY PROCESS S. AND THE


EMPIRICAL RANK SYMMETRY PROCESS R.

Let X 1 . , X,... be iid with df Fand empirical df F„ and suppose we have


,.

the representation X = F '( r= ) for independent Uniform (0, 1) rv's


; - ;

.... As usual, we let G„, U,,, W, and R. denote the empirical df,
the empirical process, the weighted empirical process and the empirical rank
process of , ... , i,,; see Section 3.1. Of course,

(1) U(F) is the empirical process of X 1 ,... , X.

The df of —X 1 ,. . . , — X„ is

(2) F(x) = 1— F(—x —) on (—co, co),

and note from (1.1.15') that the empirical df of —X 1 ,. . . , — X„ is

(3) i(x) = 1—t n (— x —) on (—oo, oo)


(4) = 1— G „(F( —x —) —) on (—oo, oo) a.s.

In fact, (4) holds on (—co, oo) unless some achieves a value t satisfying
F(x,) = F(x 2 ) = t with x, 54 x 2 .

Thus, the empirical process of —X 1 X„ is ,. .. , —

F(x)] = — U„(F(—x —) —) on (—oo, co) a.s.


(5) _ —U„((1— F(x)) —) on (—oo, oo) a.s.

Then

(6) H=F+F-1 on[0,co)isthedfofJXJ


743

744 SYMMETRY

and

(7) H _ F n +F n —1 on [0, co) is the empirical df of IX,I, ... , IXnI

The absolute empirical process of IX,I, ... , IX„I satisfies

(8) '/i[H,(x)—H(x)]=^ n Il nx, l . xt —F[ —x,x]j


n

(9) =f S E I {11F( X_)^ ^F(x) — F[—x, x]} for all x - 0 a.s.

(10) = v „(F(x)) —U ((1 —F(x))—) -0.


for x>

Thus

(11) /i[Hl n —H] U(F)— U(1—F) in II IIö-norm


for the special construction X,, = F_'(fn ,) of Section 3.1
(12) = U(F) — U(1— F) if F is symmetric about 0
(13) = B(2F -1) if F is symmetric about 0,

where (see Exercise 2.1.5)

(14) B(t) =U((1+t)/2)—U((1—t)/2) is Brownian bridge on [0, 1].

The empirical symmetry process S,, is defined by

(15) Sn=Vn[Fn-4n] on( - 0o,0c)

=U (F) +v n ((1— F) —)+V(F —F) on(—cc, cc) a.s.


(16) U(F) +U(1 — F) +/i(F —F) in II Il - norm
for the special construction of Section 3.1
(17) _ [U(F) +U(1— F)] if F is symmetric about 0
(18) = l8(2F —1) if F is symmetric about 0,

where (see Exercise 2.1.5)

(19) fR(ItI) °[U((1+t)/2)+U((1 — t)/2)]


and l« 1 — •) is Brownian motion on [0, 1].
Note that

(20) S n (x) = S n ( —x —) for all x >_ 0.


THE EMPIRICAL SYMMETRY PROCESS S , 745

For a continuous df F we define the empirical rank symmetry process R n by

(21) Rn=sn 0 An'=•In[ln 0 AIn' — ^n 0 An l ] on [ 0 , 1],

where c,, is a continuous absolute empirical df version that equals i/ (n + 1) at


IXI n :i for 0<_i^n (with 0°IXI n:o <IXI n: ,<. • •<IXIn:n<IXIn:n+l=+Xl the
ordered X; I's) and where H,, is linear on I X I n :i _, <_ x <_ IXI n , ; for all 1 s is n.
We let R+n; denote the rank of IX; I among X11,.. , IX,I, and we let Dn ; denote
the antirank defined by

(22) R no , = i or X D J.) = IXIn:i•

We note from (6), and then (19), that

R n (t) U(F 0 H '(t))+UJ(1— F o H '(t))


- -

+/[Fo119„'(t)—FoU-0(t)] in 11 11
(23) =U((1+t)/2)+Q..)((1 — t)/2) = R(t)

if F is a continuous and symmetric df

since for a continuous df F

(24) IIF-An'—F0H 'IIsIIHoAn'—III _*a.s.0


- asn->W

is just a "version” of IId„' — I I1 -4 a.s. 0 expressed in different notation. Note that

(25) when F is a continuous and symmetric df, then R. (1 — t), 0 <— t <_ 1, is just a
"version" of the partial-sum process of n independent rv's that equal ±1
with probability 2,

see Figure 1. Note that S. ° H n '(1— t), 0<_ t <_ 1, is exactly the partial sum of
n independent rv's that equal tl with probability 2; however, our reasons for

0 Plot of X 1 , Xn

3— ]RR(t) The i -th step is up or down


as sign(Xp,) is +1 or -1.
2 .— 0-- 4p-- The initial height is equal to the
number of observations less than 0
1 .-- 0. —. minus the number greater than 0.

t
--2-- • 0• n-1- _n 1 —

n+l n+1 n+1 n+1

Figure 1.
746 SYMMETRY

preferring Il (1- t) with jump points at i/ (n + 1) instead of i/ n will become


apparent in Section 4 on signed rank statistics.

Exercise 1. Compute the covariance functions between the processes B, l,


and U.

The definition of S„ seems to be found first in Butler (1969). The asymptotic


representations of the processes considered here are from Shorack (1970).

2. TESTING GOODNESS OF FIT FOR A SYMMETRIC DF

Suppose X„ 1 , ... , X.,, [with X„ = F ' ( „;) as in Section 3.1] are iid F where
; -

(1) F is symmetric about 0 (i.e. X = -X).

Then a natural estimate of F is

(2) ^,,=lfn+Fn) /2=[F,, +I—Frs( -- )] / 2 = (Hn +1 )/ 2 ,


where F. and F. are the empirical df's of the X,, ; and the -X„ , respectively.
;

Note that [see (22.1.6) and (22.1.7)]

✓n [U-U,(x) - H(x)]/ 2 = U,,(H(x))/2


for xE(0,00)
(3) Vi[F*n(x)-F(x)]= - ,'[F *(- x -)- F(-x -)]
= - ✓n [til„( -x -) -H(-x-)]/2
for xE( -x, 0]

for the empirical df H. of X,,,I, ... , IX,,.J with IX J = H. It is immediate from (3)
and (22.1.13) that

(4) Ik"i[r*- F] - !(2F-1) /2JI°_°m -f,0 as n->oo

for the Brownian bridge B(t)= v((1+t)/2)-U((1-t)/2) for 0 <-t<_1 of


(22.1.14) with B( --t) = -B(t) for O<_ t -1.

Exercise 1. (Schuster, 1973) Suppose F is continuous and symmetric. Then


EF *(x) = F(x) = EF„(x),

(5) Var [F(x)} = F( - Ixj)[ 1- 2 F( - Ixl)] / 2 n


and
(6) Var [F„(x)] - Var [F(x)] = F( -Ixl) /2n >- 0.
TESTING GOODNESS OF FIT FOR A SYMMETRIC OF 747

Also note the asymptotic versions of these formulas by comparing


Var [B(2F(x) — 1)/2] with Var [U(F(x))].

Supremum Tests of Fit

It is also clear from (3) and (22.1.3) that (see Schuster, 1973)

(7) / jj(F„ — F) # 11- ZjlU,,' JI for continuous, symmetric F


and
(8) •I II (IF — F) * JJ ' JJB(2F —1) # II = Zjw # 11 for continuous symmetric F.

These results can be used to test whether the true df is a specified df F. Also,
confidence bands based on ^^0=„ — FiI are exactly twice as wide as those based
on 11än — FII. Likewise, use (22.1.6) again,

(9)JJ(F —F)jr(F)JJ=2IIUJ„a(^( 21 1II for continuous, symmetric F

To test symmetry without specifying the/ true df, it is natural to reject for
large values of the statistic

(10) IJs )o — vi'liFn — ^rslfo


(11) Q f lIB f 1 ä if F is continuous and symmetric,

where 111 is Brownian motion. This limiting distribution is given by (2.2.7).


Butler (1969) gives an expression for the exact distribution. Obviously, weight
functions 4i can again be introduced.

Integral Tests of Fit

A natural integral statistic to use to test for symmetry is

(12) = s 2„ dH n .
T„J

By now, it is clear that for continuous and symmetric F we have


j
8 2 (2F-1)d(2F-1)=
TT -> 0
0 o f.,
R 2 (t) dt

as n - x. Thus

(13) T„ q T J t ß8 2 (t) dt,


0
748 SYMMETRY

where J (1- t), 0<_ t <_ 1, is the Brownian motion of (22.1.19). The percentage
points of T/4 are given in Table I from Orlov (1972), with the two adjustments
indicated by Koziol (1980a).

Table 1
Limiting Distribution of the Symmetry Statistic T,^
(from Orlov (1972))

z S(a) x S(a) z S(z)

0.000 0.0000000 0.500 0.9696850 1.000 0.9980828


.020 .1090357 .520 .9729687 .040 .9984532
.040 .2988227 .540 .9758841 .080 .9987512
.060 .4347773 .560 .9784745 .120 .9989913
.080 .5328109 .580 .9807779 .160 .9991848
0.100 0.6069192 0.600 0.9828274 1.200 0.9993409
.120 .6654505 .620 .9846522 .240 .9994669
.140 .7122236 .640 .9862778 .280 .9995686
.160 .7510634 .660 .9877268 .320 .9996507
.180 .7535899 .680 .9890191 .360 .9997171
0.200 0.8111307 0.700 0.9901722 1.400 0.9997708
.220 .8346457 .720 .9912015 .480 .9998494
.240 .8548546 .740 .9921207 .560 .9999009
.260 .8723127 .760 .9929419 .640 .9999348
.280 .8874587 .780 .9936759 .720 .9999570
0.300 0.9006453 0.800 0.9943321 1.800 0.9999716
.320 .9121604 .820 .9949191 .880 .9999813
.340 .9222416 .840 .9954442 .960 .9999876
.360 .9310872 .860 .9959141 2.040 .9999918
.380 .9388638 .880 .9963348 .120 .9999946
0.400 0.9457124 0.900 0.9967115 2.200 0.9999964
.420 .9517529 .920 .9970490 .400 .9999987
.440 .9570881 .940 .9973513 .600 .9999995
.460 .9618062 .960 .9976222 .800 .9999998
.480 .9659832 .980 .9978650 3.000 .9999999

Exercise 2. (Orlov, 1972) Show that

( 14 ) T= F 2 (t)dt = 41 zZ;
fo
, [(2) - 1)1rj

for iid N(0, 1) rv's Z,, Z2 ,.... (Recall the methods of Chapter 5.)
TESTING GOODNESS OF FIT FOR A SYMMETRIC DF 749

Exercise3. (Srinivasan and Godio, 1974) Let P k (let Nk ) denote the number
of positive (of negative) X ; 's for which jX; 1 <_ IXI n:k. Define S„ by 4S„ _
Y,k_ I (Nk —Pk ) 2 . Show that

(15) Tn =4S„/n 2 = J[S n +l_2F n (0)] 2 dH n

(16) --p T= [R(t)-18(0)] 2 dt=T= J O 2 (t) dt.


o o

These authors table the exact distribution of S. for 10 <_ n < 20. They suggest
that the asymptotic approximation (16) should give at least two-digit accuracy
for n>20.

Exercise 4. (Hill and Rao, 1977) Consider

(17) T„+T„=J {§n+[S„+1-2F n (0)] 2 }dH„.


0

Show that while T. and T„ are both invariant under the transformation x -* —x,
T„ + T„ also has the desirable property of being invariant under the tranforma-
tion x-3- 1/x. Thus

(18) T +T„ - T+ f= J {68 2 (1)+[l (t)-1(8(0)] 2 } dt.


T .

These authors table the upper 10% of the exact distribution of n 2 ( T„ +


for 10 n <— 24, and claim that n = 24 gives a good approximation to the
asymptotic distribution.

Components of S.

Consider the process Z. defined by

„]
(19)

Since ISS- 68(2F-1)II -0 for continuous and symmetric F, it is a minor


exercise that

(20) Z. Z = R( ^ 1) in f i, for continuous and symmetric F,


where Z, on [0, 1], satisfies

(21) Z(t),0<_t<_Z,is Brownian motion and Z(1—t)=Z(t).


750 SYMMETRY

Exercise 5. Verify (20) and (21).

Exercise 6. (Koziol, 1980a) Use the methods of Chapter 5 to show that

(22) 1(t)= Y- AZ;`l(t),


=1

where

(23) A3 =[(2j-1)7r] -2 and f(t) =2 sin ((2j— 1)irt)

with the f's orthonormal on [0, 2], and where Z*, Zz, ... are iid N(0, 1). Now
verify that the series in (22) converges uniformly and absolutely on [0, 1].

Exercise 7. (Koziol, 1980) Define the components

(24) Zn `
=^^ J;(t) 7 1(t) dtv
o

Verify that the Z„,, Z* 2 , ... are uncorrelated with means 0 and variances 1,
and that they are asymptotically independent N(0, 1). Then use integration
by parts to verify that

2 (

(25) Z= — j cos [(2j —1)irF(X)]


n l x^ ; <o

cos[(2j-1)arF(—Xn; )]
— xY o
J .

Exercise 8. (Koziol, 1980) Show that if F is replaced in (25) by the F of


(2), then the resulting statistic for j =1 is

rr(1+Rnr/n) 1
(26) Z„* , = — [sign (X1)] cos 2
fl i

(27) -gy J J*(t) d11(t)


0
as n-*,

where J*(t) equals f cos (or(1— t)) or — cos in according as 0<_ t <_ 2 or
2 <_ t 1. (Use the results of Section 22.4.) Koziol's consideration of Z*, is
based on the fact that the first component usually contributes most of the
power; he finds that this test has good power against location shifts.
THE PROCESSES UNDER CONTIGUITY 751

3. THE PROCESSES UNDER CONTIGUITY

Suppose now that with respect to Lebesgue measure .t on (R, P13)

(1) P„:Xn1i...,X areiddF,


where F is symmetric about 0 with density f,

is to be tested against

(2) Q„: X ,, ... , X„„ are independent with densities f ,, ... 'f,,,,.
As in Section 4.1, suppose

(3) I[h]I = ✓J h 2 dµ and !i= J h dµ for all he 5'2 (µ),


i
(4) max ---O as n -> o0
lsi_n a

(5) 3E.2(µ),
r 2 _ 1
-^-
(6) 1
y- I
L a',a s ] ,aa [a nl / aä -S J I^ 0
as n -* oo.
Theorems 4.1.1 and 4.1.2 show that the natural extension of the null absolute.
empirical process, namely

(7) Vi[D„(x)-H(x)]= 1 j ll [ l x ., 1 -, j -F[-xx] l for—oo<x<oo

(8) B(2F(x)- 1) +P„(a, 1) J X 2Sv7dµ


x
in 11 Il -norm, under (4)-(6),
for the special construction of Section 3.1
(9) I$(2F(x) - 1) if S is an odd function.

The proof follows from

✓n [H„(x) - H(x)] T i Yl {l [x., - x ^- F(x)}


i n
-= {1 [x,.< _ ] -F(-x)}

(10) U(F(x)) +p (a, 1) 28f dµ}


L
-jU(1 - F(x)) +P„(a,1) f_ z 26 / d/ } in II II-norm.
l

752 SYMMETRY

For S, we really just combine the bracketed results of (10) with a plus sign
instead of a minus sign. Thus, Theorems 4.1.1 and 4.1.2 show that the natural
extension of the null S n , namely
i n
(11) S (x)=^ ^ t {1 (x,, . ] -1 [ _ x1 .]} for —oo<x<oo

(12) F1(2F(x)-1)+pn(a,l)( J _^+ f_ x )2S^dµ

in II II-norm, under (4) \-(6),


for the special construction of Section 3.1. Moreover, the natural extension of
the null I,, (i.e., we now rank independent Xn ,, ... , Xnn with df's Fn ,, ... , F)
satisfies

(13) l n (t) i)( J


0
(l+t)/2

J0
(1—r)/z

+ )[28 o F '/ fo F '] ds


- -

in II II-norm
for the special construction of Section 3.1, if p. = Lebesgue.

Exercise 1. Prove (13) for UB n . Hint: Follow the outline of the proof of Theorem
4.1.2. There is only one real difference: Since E00, we cannot put in the
centering term in step (f) for free; however, a canceling term (F is symmetric)
must be put in for F

In the case of a simple location model, Eq. (2) is replaced by


n 1/2
(14) Xni = bani ani + Eni, with e,, = F ' (Sni), 1 < < < n
=1

where we suppose that the a ni 's are u.a.n. as in (4) and that

(15) Io(f)= f (f '/f) 2fdx<w.

Theorem 4.5.7 shows that (8) and (13) become

(16) ✓ n [l1l (x) — H(x)] I3(2F(x) —1)


in II II-norm under (4), (14), and (15)
and

(17) ll n (t) l«t)-2bp n (a, 1)f o F - '((l+t)/2)

in II II norm under (4), (14), (15),


-

To see this, note that 5_^ 2SVj dx = —bf in Theorem 4.5.1.


SIGNED RANK STATISTICS UNDER SYMMETRY 753
Nearly Null Alternatives

We note that contiguity can be relaxed if we center appropriately. Thus,

(18) 'f[H.—H n ] U(F)— U(1—F) in 11 II - norm;


in
hereHn -- Hn; ,
n ;=,
even if the alternatives are only nearly null in the sense that

(19) max 11F,,—F11-40


Il F, as n -> oo fora continuous df F.
Isisn

Likewise,
n

(20) §n =— { 1 t x.] —lt x,a—(Fn;—Fn;)} on(—oo,00)


Vll; =l

(21) U(F) +U(1 —F) in Ill-norm

under (15). We have not assumed F symmetric in this paragraph.

Exercise 2. On the one hand we have proved (12) and (13) under (1)-(6).
Now replace (1) by (4.6.1) and obtain the natural extension of Theorem 4.6.1.
That is, extend (12) and (13) to processes of residuals from a general linear
model.

4. SIGNED RANK STATISTICS UNDER SYMMETRY

The Null Hypothesis

We now consider the signed rank statistics

(1) Tn = i J^ Rn ' sign (X,;) =! J


( -j)) sign (X.D ;),
-

Vn n+l n;_, n+ l
where X,, 1 ,. . . , Xnn are iid with continuous and symmetric df F and J is a
score function on [0, 1] such as J(t) = t or J(t) = cos (ir(1 + t)/2). Of course,
the R „; and the Dn; are the ranks and antiranks, respectively, of the IX m; I.
Note that

(2) Tn=J'J(U-Un) dSn = J I JdS n °1'


0 0

(3) =
fo,
JdR,,
754 SYMMETRY

where we recall from (22.2.25) that

(4) R (1— •) is a partial-sum process of n


iid rv's that equal ±1 equally likely.

Also, recall from (22.1.25) that if X,= F '(e„ ; ), then (in fact)
-

(5) h^R„ —RI1 a. ,. 0 as n oo for the special construction of Section 3.1


where

(6) 1 t I+OJ( 2
R(t)_=U 2 1 t^

and R(1 //— t) is Brownian motion for 0 < t < 1.

Moreover, the empirical processes (F„ — F) and v„ of X„,, ... , X„„ and
„, respectively, satisfy

(7) i(F„—F)=1J„(F) with luv„ —vhf - a.s. 0as n-*x

These last two formulas are of interest since they allow an asymptotic rep-
resentation of T„ on the special probability space of Section 3.1; see (11) below.

Exercise 1. (i) Show that if J„(t)=c„ for (i -1)/n<t<_i/n, 1isn,


;

satisfies

(8) t[1„—J]I2- 0 as n -ooforsome JE?2(µ),


then

(9) T„= c,1 sign (X„o;.)=J J„dR„

(10) q T= J i JdR.

Hint: Mimic the proof of Theorem 3.1.2 given in Section 3.4. Note that (4)
makes calculation of variances easy.
(ii) Now note that the T of (10) satisfies

T=
fo
,
J(t)dR(t)= J
0
. J(t)d[U((1—t)/2)+U((1+t)/2)]

— Jo ^
dv(s)+
'/z J(1-2s)
f^/2
J(2s-1) dv(s)

i
(11) = J* dU
0

SIGNED RANK STATISTICS UNDER SYMMETRY 755

Figure 1.

where, see Figure 1,

(12) J (t)
J
* _ J(1-2t)
J(2t — 1 )
for0<t<2
for2^t<1;

J(t) =J*( 2
1 t ) for0<_t<1.

Having the representation (11) in terms of U makes it particularly easy to


obtain the joint asymptotic distribution of T with other statistics.
From Exercise 2.2.27 and J* =0 we know that

(13) T =N(0,o )
with

(14) Var [ T] = J* 2 (t) dt = J 2 (t) dt = v;.


0 0

Of course, the choice J(t) = t gives a version of the Wilcoxon one-sample


statistic, call it W„. Note from (3), integration by parts, and (5), that

(15) W„ W= ^R(t)dt=N(0i).
JJo
Contiguous Alternatives

Of course, we can obtain the asymptotic distribution of T. under the contiguous


alternatives Q. of (22.3.2) merely by computing Coy [ T, Z] for

(16) Z = f dw° J
W cpo d ° with cpo
o `l /f(F )
' o ✓.%(F-5)
756 SYMMETRY

see (4.1.18) and Theorem 4.1.4 (Le Cam's third lemma). Note that

(17) Cov [T, Z] = p(a, 1) J J*^p o dt provided p„(a, 1)--> some p(a, 1);
0

recall (4.1.22). Thus, go to subsequences if necessary if p„ (a, 1) fails to


converge,

(18) TT — p„(a, 1) J*V o dt -+ d N(0, Q;) under the Q. of (22.3.2)


0

as n -^ x .

Exercise 2. Use the Lindeberg-Feller theorem to obtain asymptotic normality


of T„ for any u.a.n. cm 's. Note, however, that you do not have a representation
of the limiting rv.

Fixed Alternatives

Exercise 3. (i) (Puri and Sen, 1969) If J is absolutely continuous on


[e, l — e ] for all e > 0 with

lJI _ M[I(1—I)] - ► i z+s and ^J'I <_ M[I(1—I)] -3 / 2+s on (0, 1)

for some 6>0 and 0 < M < oo, then

(19) TV[f J(n) dS„— J^ J(H)d(F—F) I


o, o J

(20) J J(H)d[U(F)+v(1—F)]

[U(F)—U(1—F)]J'(H)d(F—F)
+J 0

(ii) An analogous result that also allows discontinuous J is found in


Shorack (1970). Show that if J= I ( ,,for P some 0 <p < 1, then when F is
sufficiently regular, the T,, of (18) satisfies

(21) T„ = l%(2F0 H -1 (p)-1)+(1-2a)3(2Fo H '(p)-1), -

where a denotes the derivative of the necessarily absolutely continuous func-


tion F o H '. -

(iii) Of course, the results of (i) and (ii) can be added together if J has
both types of components.
ESTIMATING AN UNKNOWN POINT OF SYMMETRY 757
5. ESTIMATING AN UNKNOWN POINT OF SYMMETRY

Estimators Based on Signed Rank Statistics

Suppose that

(1) Fis symmetric about 0 and I 0 (f)= I (f' /f) ZJdx <oo.
1 r

f,,;).
Let X,,,, ... , X,, be iid F,, = F(• — 0) with 0 unknown; that is, X„ ; = Fo'(
Suppose also that

(2) J is / on [0, 1).

Define R(b) to be the rank of the IX, — b!, and let

(3) T(b)7 \ = n +b)/sign(X.i —b).

Note that

(4) T„(b) is ,, in b.
Define

(5) 6( 0 +8**)/2 where


B* =inf {b: T(b)>0} and 6* *=sup {b: T„(b) <0}.
Now, as in (22.4.18),

(6) T„(b / ✓n) d —b J *cp o dt+ J* dU


0 0

where the right-hand side of (6) is also \ in b. With /i(B„ — 0) -^ b, Eq. (6)
suggests (see Exercise 1) that

(7) fn(8n -0) J J *dUJ/ J *v


0 0
o dt

for the special construction of Theorem 3.1.1.


Now suppose 0<a<1 and t;,°` = inf {x: P(T„ > x) = a}. Let
)

(8) 0„ = sup {b: T(b) < — t „°^ 2) } and 9„ = inf {b: T(b)> t;,°^ 2 }.
758 SYMMETRY

It is easy to check that

(9) 1—a?P0(Bn^O: ),

so that [B„, B„] is a (1—a) • 100% confidence interval for 0. Then (6) again
suggests (see Exercise 1) that

(10) .fn (9„ —0) Q z (./2) vJ — J o J* dU ]


f—
L Jo
J*cp o dt j
J

and

i i
(I1) In(@-8}
a
— z (a/z) Uj — f J* J*(Gods];
JdU]
o[—J0

so that the asymptotic length of the confidence interval is

(12) 'Ji(9„ — 8) 2z ( a^ 2) OJ1 /[ — JJ*dcp o dt].


0

These estimators and confidence intervals were proposed by Hodges and


Lehmann (1963).

Remark 1. For our statistics T. one can show that T. -+ p (some To ), for
the special construction of Theorem 3.1.1, under the null hypothesis. Then for
the contiguous alternatives of (4.1.4)-(4.1.6) one immediately has

(13) T(b) -* d bp(a, c) J 2Sv7dµ+T as o

The monotonicity of both T(b) and the rhs of (13) immediately yield the
asymptotic linearity

(14) supB I T„(b)—bp(a,c) J 2Sfdµ—T0 ß p 0 as


l

all 0 ^ B < oo. Of course, this extends immediately to vector-valued b as


in (5.5.6)-(5.5.8). Use this approach in Exercise 1.

Exercise 1. Using the technique of Section 20.3, establish (7) and (10)-(14).
Extend this to location type regression situations.
ESTIMATING THE DF OF A SYMMETRIC DISTRIBUTION 759
Estimators Based on Integral Test of Fit

Exercise 2. In the spirit of the Blackman estimator of Section 5.9 we could


define B„ to be any value b that minimizes

(15)J [F,(O +x) +F„(O —x —)-1] 2 d[F (O+x) —F„(O —x)].

Show that under regularity on F we have

(16) N
I—
n(ô —0) J f2 (x)11«2F(x) —1) dx/J f3 (x) dx.

Exercise 3. (Shorack, 1970) Modify Exercise 1 by choosing B„ to minimize

[0=„(6+x) +1= n (9— x —)-1] 2 dx.


(17) E

Show that under regularity on F we have


t ('

(18) Vi(6„ -0) 2 (' R(1)dt/ ^fz(x)ds f2(x)dx


=2W/J
for the Wilcoxon statistic of W of (22.4.15).

6. ESTIMATING THE DF OF A SYMMETRIC DISTRIBUTION


WITH UNKNOWN POINT OF SYMMETRY

Suppose that

(1) F is continuous and symmetric about 0.

Let X. , ... , X„„ be iid FB = F(- — 0) with 0 unknown; that is, X„ ; =


Suppose 6„ is an estimator of 0 for which

(2) Vi(9„- 0) =Op(1).


Estimation of Fe

A natural estimate of F0 that exploits the assumed symmetry is

(3) EFn =2[0 „ +L „]= 2[F,+1—F (26„— • —)],


760 SYMMETRY

where t„(x) =1— F„(29„ —x—) is the empirical df of the 2B„ —X ,'s. By way
of regularity of F, we assume

(4) f— F' is uniformly continuous.

To phase our result we require

(5) [Fn = 1 [L` n +F R ]= 2 [F„+1—F n (20— —)],

where F„ =1-1= „(20 — • —) is the empirical df of the 20 — X„ ' s. Then (see ;

Schuster, 1975)

(6) —9).% }jI -p v 0


under (1) and (4)

as n - oo.

Exercise 1. Prove (6).

Exercise 2. Show that if Var [B„] <oo, then


(7) Cov [e *„ (x), ] = 0 for all x.

Exercise 3. Suppose (1) and (4) hold. Suppose Var [ B„ ] < oo and V(& —
-
0)
0) d N(0, o.2 ). Then

(8) ✓ n[tn(x)—F6(x)] * dN(0,iF(xe)[ 1-2 F(xe)]+o 2f 2 (xe))

as n *x, where x ® = 0— Ix—ol. Also,

Asym Var ['/i(F*(x) — FB (x))] — Asym Var [v(F„(x) — FB (x))]

(9) 2F(xe)-0.2f2(xo)•

Suppose ,/n (9„ — 0) has the asymptotic representation

(10) '(9„-0) = T-T(U)

for a special construction X,, = F - '( ; ; ). Then

(11) v'i(f* —F0 ),B(2F-1) +fO T(U) in ' under (1), (4), and (10)

for the Brownian bridge B(t) = v((1 + t)/2)+v((1— t)/2) of (22.1.14) and
(22.2.8).
ESTIMATING THE OF OF A SYMMETRIC DISTRIBt;TION 761
Exercise 4. Suppose either (i) F is normal and B„ = X„, (ii) F is double
exponential and 0„ is either X„ or the median X„, or (iii) F is Cauchy and 9„
is the MLE. Show that the expression in (9) is -0 for x for all the special
cases described here. Can you show it more generally?

We have followed Schuster (1975) up to now.

Estimation of F

We now turn from the estimation of F,, to the estimation of F. Let

(12) 1='( x )= 0=* (x+9 „)= i{F„(x+8„)-+'0=„((x - 9„)–)}


(13) -*F(x) as n-* x,

and note that U„ is symmetric about 0. Moreover,

/[in(x) – F(x)]
=W [f„(x+8„) +l`„((x – B„)–)–FB (x +0)-1 +F9 (0 –x)]

= Z{f [F„(x+ B„) – FB (x+ 8„)]+^[FB (x+ 9„) – FB (x+ B)]


–. 1 n [ l=„( 6„– x –)–Fe (B„–x)J– ✓n[F9 (8„– x) –F9 (0 –x)]}

(14) =i[U„(F9(x +9„))- 4J „(F9(O„–x)–)J


+2[ .f(F(x+ 8„ –0)– F(x)) –V (F(–x+ 9„ –0)– F(–x))]
-' z[U(F(x)) – aJ( 1 –
F(x))]=i8(2F(x) - 1)

by symmetry of F. Thus

(15) jl./n(1. –F) –B(2F- 1)/21 --> v 0 asn->X under (1) and (4).

Comparing with (22.2.8), we see that we can estimate F equally as well whether 0
is known or unknown. This is not the case when estimating F9, as (11) shows.

An Estimate of 0 and an Alternative Test

Exercise 5. (Boos, 1982) (i) Show that

(16) dn (0)– J [F n (0 +x) +F n (0– x) -1] z dx

X; +X
_– n
z21IX, – X;I+ ni I2

I ^–o.
;,;

762 SYMMETRY

Thus d(0) is minimized by

(17) B„=median{(X +X,)/2: 1<—i,jsn}. ;

which is a variant of the Hodges-Lehmann estimator. (Note Exercise 22.5.2.)


(ii) It is reasonable to reject symmetry based on large values of the estimate

(18) T„=nd(Ö„)l[((n-1)/n)äc] with äG=ly,IX+ — X;^/( 2 )•

Show that for sufficiently regular F we have

TT-*,T=
11' .
[R(2F-1)]2dx

f 2 (x) dx}/Qc
—[ J o' 68(2t-1)dt ]
w

as n-*oo where QG = EJX,—X-^.


(iii) If F is the logistic df F(x) = [1 +exp (— (x/Q))] ', then -

(19) 7 . _ Z2j-1 _ Z
j = 2 1(2 . 1) = 1 (2i+1)(i+1)
for iid N(0, 1) rv's Z 1 , Z2 ,.... Note that for logistic F we have

J [R(2F-1)] 2 dx = J{{U(t) +U(1 — t)] 2 /[t(1— t)]} dt


o

_ z 2 _,
(20) ; 'j(j -1 / 2 )'
while
' i
{21} j I8{2t-1} dt] f2(x) dx=2Z;
JJo

and aQ=2.
CHAPTER 23

Further Applications

1. BOOTSTRAPPING THE EMPIRICAL PROCESS

Let X„ = (X 1 ,. . . , X„) where the X; 's are iid F. Let F. denote the empirical
df of these observations. Suppose now that the distribution of some ry R(X,,, F)
depending on both X„ and F is desired. The bootstrap principle of Efron (1979)
suggests approximating the distribution of R(X„, F) by that of R(X, F„),
where X n = (Xe,. .. , X) with the X *'s being an independent random sample
from the df F „. It is possible to obtain as many independent values of R( ( *, F,)
as desired by Monte Carlo sampling, and thus estimate the distribution of
R(X*, IF„) to whatever accuracy is desired. This distribution is then used as
an approximation to the distribution of R(X„, F).
The fact that this procedure is asymptotically correct in many situations is
well established in the literature. Here we simply show that the empirical
process /i(F„ — F) itself can be bootstrapped; this result is due to Bickel and
Freedman (1981), but the following simple proof is from Shorack (1982b).
Let X 1 , ... , X. be iid F with empirical df IF,,. Let X,. . . , X* N be iid F„
with empirical df F. We will assume that a special construction of the X's
has been used. Thus X^ ; =F( 1 ) where. , C NN are iid Uniform (0, 1)
rv's whose empirical process U *N satisfies I^ aJ N — UJ ll ^ a . s . 0 as N -> m for a
Brownian bridge U. Thus

II ✓N(FN—F n )—U(F)11
=11UN(F„)—U(F)1I
5 IIU (Fn)—U(F,,)II+11U(IFn)—U(F)

IIU *n, — UI)+IiU(F,,) — U(F)Il


(1) -* a . s . 0 as n n N --> oo.

From this we can draw the following conclusion, which does not depend on
a special construction for its validity.
763
764 FURTHER APPLICATIONS

Theorem 1. For a.e. infinite sequence X 1 , X2 ,.. . , X,,, ... , relative to the
conditional distribution given X 1 ,. . . , X„, we have

(2) 1T(F —I=„)= U(F) as n A N--*cc.

Corollary 1. Let T„ = T(./i(F„ — F)), where T is a (I II-continuous functional


of the empirical process. Then for a.e. infinite sequence X 1 ,. . . , X,,. . . , relative
to the conditional distribution given X1 ,. . . , X„ we have

(3) T(/N(FN—F1))-+dT(U(F)) as n A N-->oo.

Of course (3) is the desired conclusion. We know that T(/(F„ —


F))- d T(U(F)) as n->co. However, though the distribution of T(U(F)) may
not be known, the approximating distribution of T(/N((F*N —F„)) = T(U(F,,))
can be Monte Canoed and used to approximate both the distribution of
T(U(F)) and the distribution of T(V(F„ — F)) = T(U,, (F)). The choice N = n
seems obvious, but the theorem is true more generally.

Exercise 1. Extend Corollary 1 to II /9II continuous functionals T for an


-

appropriate class of functions q.

2. SMOOTH ESTIMATES OF F

Let k be a nonnegative function satisfying

(1) Jk(x)dx=1. xk(x) dx=0, and k 2 = x2k(x) dx<co;

and suppose b.10. If X 1 ,. . , X,, are iid F with density f = F', then

(2) .%, (x) = xb, Y/ dF„ (Y)


1 ^^ k \ x b X/ b,, J k ^

is a kernel estimate of f (k is the kernel function), and the corresponding


estimate of F is

(3) „ (x)=
fx .1,(t) dt=
00
I x j;*_) dFn
K(\ _ (Y),

whereF,, istheempirical df of the X's and K (x) S%k(y) dy, C=„isasmoothed


version of F,,.
SMOOTH ESTIMATES OF F 765
Consider the process

(4) SS„ = ✓n[i„ — F] on (—oo, co).

If we let e, , ... , 6„ be iid Uniform (0, 1) rv's and construct the Xi 's as
X = F '(e ) for i ? 1, then U. (F) = v'i[F„ — F] and a natural question arises:
; - ;

How close are the processes X. and OJ „(F)? The following theorem gives an
example of this type of result:

Theorem 1. Suppose that 11 H 00 < 00 and that f has derivative f with f ^^ 00 <
oo. If k satisfies (1), then, with c„ -> oo,

(5) IIxn—U,(F) wn(Iifllbncn)+411Un11 J k(Y)dy+M^b:,


.

where w,,( ) is the oscillation modulus of U,, and M = k2 iI f'iI ^/2.

Proof. First write, using integration by parts,

(a) X,, (x) = fn[rF„ (x) — E1„ (x) + EI,, (x) — F(x)]

= J - ) _
K( Xy dU,(F(Y))+v(F,(x) F(x))
b„ —

—f 00
vn(F(Y)) dK(x—y)
\ bn +
✓ n(FF(x)—F(x))

J
= — . aJ„(F(x—sb„)) dK(s)+^(F„(x)— F(x)),

where

x—yl
(b) F„(x)=Et „(x)= f ' K( dF(Y)•
J oo ` bn J
Thus

(c) X ,,(x) —U,(F(x))

J
_ — [UJ„(F(x—sb„))— v „(F(x))] dK(s)
+ f(F„ (x) — F(x))
= Rln(x) + R2n(x).

766 FURTHER APPLICATIONS

Now with c = c„

II R InII ' [U„(F( •— sbn)) — U (F(•))] dK(s)II-w


J[-c,c^

[U (F(•—sb„))—U„(F(•))] dK(s)
+II1, ^-^^^ - '
ü) , (an)+ 2 IIU II K ([ — c, c] c )

(d) = cü (an)+ 4 1IUn 1I


fW k(y) dy,
where a„ = IIfII'-=b„c„. Furthermore, using integration by parts and a Taylor
expansion,

R2n(x)=.lnl J K( x—yl dF(y)—F(x)J

= J . [F(x—sb„)—F(x)]k(s) ds
_—Vii sb„f(x)k(s) ds+Vii
J f ^ 1 s2 bnf'(zs )k(s) ds
.2

= 2 ib„ s2f'(zs)k(s) ds J
since y dK(y) = 0,

where z is between x and x — sb„. Hence


5

rb,Ilf' lk2
(e) IIR2n11W-co 2

Combining (c)-(e) yields (6). U


Corollary 1. If f and k satisfy the hypotheses of Theorem 1 and b„ - 0 and
c„ - satisfy nbc„/log n - x, then

(6) JjXn—v„(F)II=a.S.O\ bcnlog(


- )

+ log log n J k(y) dy+fib„ f

as n -^ X .
Proof. This follows immediately from Theorem 1, Theorem 14.2.1, and
Theorem 13.1.1. ❑
THE SHORTH 767

Exercise 1. If k is the standard normal kernel k(x) = (2ir) - ' 12 exp (— x 2 /2),
find c„ and b„ which minimize the bound on the right-hand side of (6). (Claim:
the "right" b„ is b. = n —'13 .)

3. THE SHORTH

Suppose X,, ... , X„ are iid FB = F(. — 6) where

(1) F has a density f that is symmetric about 0.

We wish to estimate 0. We consider as our estimate B„ the a- shorth, which is


defined to be the arithmetic mean of the shortest (1 — 2a) - fraction of the sample.
We will require the following regularity on the density f. Suppose

(2) f is a strongly unimodal density having a continuous derivative f.

Theorem 1. (Andrews, Bickel, et al.) If (1) and (2) hold, then

(3) n'/3(0)—,ABT

where

(4) A= ff2(F-'(1 —a)) 2 "3 B=2F - '(1 —a)


—f'(F - '(1 —a)) }
{ 1-2a

and T is the random time defined by

(5) t2 +7L(t) attains its minimum at r,

where Z is two-sided Brownian motion on (—co, co). [The exponent 3 in (3) is


correct.]
Proof. Without loss of generality, set 0 =0. We define

(a) G' 10 F.
G=F ',g-G'= and g'=G "= —( foF
1)3.

Our first task is to determine which (I —2a)- fraction of the data is the
shortest one. To this end we define
(6) 1 ff(t) = n 2/3 LXn:(n(I - 2 F —' (1— a)]
o +Atin't 3 )) — Xnc((a+Atfn' /3
))

the length of the (1 -2a)- fraction


= n z " of the data centered at — 2F - '(1 —a)
((1— a) /2 +At /n'/3)

768 FURTHER APPLICATIONS

(b) = n 2/3 [Xn:(n(1-a+At/„'/ 3 » — F -' ( 1 — a+At/ n 1J3 )]


— n 2/3 [X„:c„ca+Av„" 3 )> — F -' (a + At/ n 1/3 )]

+n 2/3 [F - '(1—a+At/n'/ 3 )—F - '(1— a)]


—[F - '(a+At/n 1 / 3 )—F - '(a)]
(c) =M^„(t)—M2„(t)+Ms„(t)•

We note that for any d >0 the Hungarian construction of the process

(d) (t)=V'[X:(„—F '(t)J=Vi[0 '(t)—F (t)]

satisfies (see Theorems 18.2.1 and 12.2.2, but note that we are concerned here
only with the interval [—d, d])

(e) nl/6IIR„—gB„1Id d^as.0 as n->cc

where B. is the Brownian bridge related to a Brownian motion S on [0, co)


via the equation

(f) 8„(t)=[§((n+1)t)—t§(n +i)]/ n+l

[note (12.2.4)]. It is immediate from (e) that for any 0- K <oo

(g) Ilslup M^„(t)—g At 1/6


1—a+ni/s n D^„ 1—a+ n
n
At

a.s. 0 as n-o0.

Moreover,

I sup I +nt31n'
/6t3„(1—a+n
t

r At )
—g(1—a)n' /6 B„tl—a+ l
n /3

<i sup I—a+ At— g( 1—


I ItI^K n / a)I}

sup n'/ 6 IQ^„(1 — a+ `93) }


xfjlj ^gK

(h) = o(1)O(1) a.s.


(i) =o(1) a.s.
THE SHORTH 769

[The reader is asked to verify the 0(1) term of line (h) in Exercise 1 below.]
Since the M en term is analogous to the M,„ term, we have thus shown that

sup IM1n(t) — Mzn(t) — 2 Ag(1 — a)Zn(t)


ItI^K

— g(1 —a)B „(l —a)+g(a)IB (a)I


(l) a.s.0 as n -^ oo.

where (see Exercise 2)


1/6

(k) Z„(t)= n 2A j J D$„(1 —a +n


/3
\— B„(1 a)^ —

— B„ a+ _ n (a) }
L \ n 3/ J

(I) =two-sided Brownian motion on ItIs n'" 3 a /A.

It is clear that for any Os K <m

sup IM3 „(t)— A 2 t 2 g'(l—a)I


ItIsK

= Sup I M 3(t)'

Iris K

—n2/3 1 [g( 1 — a) — g(a)] +2 zl3[g'(1 a) g'(a)]^ — —


1/3

(m) _+a.s.0 as n -+ o0

by a Taylor expansion of F'.


Combining 6) and (m) into (c) gives

sup IM„ (t) —A 2 t z g'(I—a)—'J2Ag(1—a)Z „(t)


ItI^K

— g(1 —a)[I „(1 —a) +B „(a)]I


as n-* o0

for any O<_ K < oo.


This paragraph is one for which only heuristic arguments are presented. If
T. is defined by

(7) T„ minimizes IXn:(n(l-a +r )) — Xn:(n(a+ n))I,



770 FURTHER APPLICATIONS

then, asymptotically, n' /3 Tn /A plays the role of tin (6). Thus (7) shows that
n' / 3 Tn /A behaves asymptotically like the value T. that minimizes

(n) A2t2g'(1-a)+ ZAg(1-a)Z n (t)


A2g'(1-a)
- 2Ag(1-a) tz+zn(t)J
[-,/-2Ag(l -a)
(o) = 2Ag(1 -a)[t 2 +Z n (t)).

We thus have

7,
(8) n1 /3T/A-d

where r is defined by

(9) r minimizes t 2 +7(t) on (—oo, co).

This completes our first stated task.


The a- shorth O n clearly satisfies

1 -2a
(p) (1 ' 2a)n V3 Bn
-
= n1/3 I 0
Xn:n(a +T +u) du+op (1)
-2a
f
1
n 3 [IF„'(a+Tn+u)-F-'(a+Tn-u)
o
+F - '(a+Tn +u)-F - '(a+u)] du+oo(1)
1 -2a
(q) =oo(1)+n' /3

I -2a
J 0
[F-'(a+Tn+u)-F-'(a+u)] du+o o (1)

= n' / 3 Tn g(a+u) du+op (1)


0

= n1/3TF-1(a+u)I1-2a+o (1)
(r) =n 3 Tn 2F - '(1-a)+o(1).

Thus

n' /3 Bn = n 3 T„B+oo (1)


- , dA• T . B

using (r) and (q). ❑


CONVERGENCE OF U- STATISTIC EMPIRICAL PROCESSES 771
Exercise 1. Show that for any 0 -s K < co and 0 a < 1 we have

sup n' / 6 IB n (a +K/n' /3 )I = 0(1) a.s.


I11^K

Exercise 2. Show that for any 0< a <

(10) Z(t)= 2d {[ U ( 1 —a+dt)— U(1 —a)]—[U(a+dt)—U(a)]}

is a standard two-sided Brownian motion for I il <(1 — 2a )/ d.


Remark 1. Suppose instead of averaging the shortest (1 —2a)- fraction,
we use its median to define an estimate 0n . Then line (p) of the previous proof
becomes
nl /3
/Bn — 0) = n '/3 (Xn:n( n+1/2) — 0)
= n 1/3
[F - '(T' + ! ) — F - '(')] -+- oc (1)
=n 3 Tn /f( 0 )+o (1)
(11) 'dAr /f(0) as n -cC.

Thus we see that the constant A is inherent to this type of estimator, while
the other constant depends on what functional is applied to the shortest
(1 —2a) -fraction of the sample.

Remark 2. Similar techniques can be used to find the asymptotic distribution


of the sample mode. Venter (1967) can serve as the model for such a result.

4. CONVERGENCE OF U- STATISTIC EMPIRICAL PROCESSES

Let X 1 ,.. . , X. be iid on some space and consider

(1)

where h is a real symmetric function (the kernel) of m variables and >, denotes
the summation over all (,„) combinations of m distinct elements {i,, ... , im }
chosen from {1, ... , n }. Tn is called a U- statistic. We suppose that
(2) Ehe° Eh 2 (X,,...,Xm)<aC,

and let

(3) 0 ° Eh(X,, .... Xm).


772 FURTHER APPLICATIONS

We let

(4) v2=Var[E(h(X,,...,X,„)IX1)], and we assume o. 2 >0.

It is well known, see Serfling (1980), for example, that

(5) n(T„-0)-*dN(0, m 2 o 2 ) as n-*

provided (2) and (4) hold. If T„ is a second such statistic with kernel h, then
(T,,, T„) are jointly normal with covariance

(6) Cov [E(h(X,, ... , X,„) I X,), E(h(X,, ... , X„,) X,)]•

These results are immediate once one shows that

(7) E(T„—T)2-+0 as n-^oo,

where

(8) T*=m
✓n E
;_ ►[h,(Xj)-6] with h I (x)=E(h(X,,....X,„)IX,=x).

Note that ov 2 = Var [h,(X; )] and that the covariance of (6) is


Coy [h,(X ), h,(X )]. It is also elementary that
; ;

(9) 0 o 2 ^ Var [h(X,, ... , Xm)] = Eh e 02.

We now consider the U-statistic empirical process

(10) Z,,(x)= ✓[0-ß„(x)—H(x)] for —<x<,


where

(11) H denotes the df of h(X,, ... , X,„)

and H. is the U-statistic empirical df

(12) 9-0„(x)= ^ n` 1 1 [h(X 1 ......X )max] for —<x<.


m )1
m)

Since 6-0„(x) and H,, (y) are U-statistics whose kernels are indicator functions
with means H(x) and H(y) and finite variances, the joint normality above
clearly gives the finite-dimensional convergence

(13) Zn _>f. d . Z as n- oo
CONVERGENCE OF U- STATISTIC EMPIRICAL PROCESSES 773
where 7L is a 0 mean normal process with

(14) Coy [Z(x), Z(y)]


=Cov [P(h(X,,...,Xm)_ xIXI), P(h(X1,...,Xm) yjXt)]-

We note from (9) that

(15) Var [Z(x)] <_ H(x)[1– H(x)].

The following result can be found in Silverman (1983).

Theorem 1. (Silverman, 1983) If Eh e <oo and Q 2 >0, then

(16) Z I ,o as 0 as n-* oo for a special construction

of the rv's h(X11 , ... , X,,) and of the normal process Z above.
Proof. In this first paragraph we present the key idea. Let a = (a (1), .. . , a (n ))
denote an arbitrary permutation of (1, ... , n), and define iid rv's

(17) Y;+1 = h(Xa(im +l) , Xa(jm+i), ... , Xa(Jm+m)),J =0, 1,... ,(n/m)—'1.

Now Y,,..., Y(nim) have true df H, empirical df

I (n/'n>
(18) Hn = (n/m) j t l^r;<•>>

and empirical process

(19) (n/m)[l-E – H] =U(n , m) (H) on (–co, x);


a.s.

here OJ^n1m) is the uniform empirical process of; , ... , f(n , m) where f; = H(Y )
with Y = H being the associated continuous ry of (3.2.48). Note that

(20) Hn = 1 [., Hn-


n! alla

The interesting feature of (20) is the fact that although the H„ are dependent,
each H, is the empirical df of an iid sample of size (n/m). Now note that

(21) Z n = V[H n – H] = Wn (H),

where

(22) ON n =— D1<n ^ m) .
n! ally
774 FURTHER APPLICATIONS

[Recall that the definition of the Y's involves extraneous Uniform (0, 1) rv's.
However, the key equation Z n =W(H) of (21) shows that almost all sample
paths of Z n are totally unaffected by these extraneous rv's. Thus we can
introduce them in the most convenient manner possible. We now specify that
the extraneous rv's introduced for permutation a will be totally independent
of those introduced for a * whenever a 96 a * . Note that this does have an effect
on the sample paths of W,,, even though it does not have an effect on the
sample paths of Z. It is simply that Z n =W(H) does not "look in on" W n
at any points where there is an effect.] Using W Z to denote the modulus of
continuity of the process Z, we have

(23) Eww.(a)<nI ^ Ewä^(a)


a im
(24) = Ew u (a)

since the sup of a sum does not exceed the sum of the sups. Now Chebyshev's
inequality gives

(25) lim lim P(w w ,(a)>_E)<_lim lim e 'Ew, w ,(a) -

a j O n-.00 alp m-.J

(26) <lim lim e - 'Ewv u (a) by (24)


ajo n--o

(27) =0 for all e>0 as in (7.1.13) and (7.1.14).

We also have

(28) Wn "'f.d. W for some 0 mean normal process.

If s and t are in the range of H, then Coy [W(s), W(t)] is of the U-statistic
form (14); while on each interval [H_(d), H(d)] associated with a point of
discontinuity d of H the process W behaves like Brownian motion conditioned
to match up with the other part of W at H_(d) (this is a result of how we
choose our extraneous rv's above). Thus W n =W on (D, 9, II II) by Theorem
2.3.1. A Skorokhod construction as in the proof of Theorem 3.1.1 now finishes
this proof. 0

Exercise 1. (Silverman, 1983) Extend Theorem I to convergence in 11 1q11


metrics. (Silverman obtains a class of q's in between the classes Q and Q* of
Section 11.2.)

Exercise 2. (Serfling, 1984) Let N= (;„), and let Wn: , <_ Wn:N denote
the ordered values of h(X ., ... , X,,,). Consider the
; generalized L-statistic
/ N
( 29 ) Tn — Cni Wn:i
i =1
RELIABILITY AND ECONOMETRIC FUNCTIONS 775

for known u.a.n. constants c„,, ... , c„ N . Establish conditions under which T.
is asymptotically normal.

5. RELIABILITY AND ECONOMETRIC FUNCTIONS

Let X be a positive ry with df F on R + and having mean Fc = E(X) _


,(ö x dF(x) = Jö F(x) dx <ax where F 1— F. The mean residual life function
e, Lorenz curve L, and scaled total time on test function H ' /^s corresponding
-

to F are defined by:

(1) e(x)=E(X—xjX>x)=l^F( y)dy , 0 <x<oo,


F(x)

F '(s ) ds
(2) L(t)- 1 F '(s) ds= lo 0 <t<1
fo,
µ Jo
-

F '(s) ds'
- '

and

i , F1
O F(x) dx ö
(3) µ H (t) =^ F(x) dx= 0s t <_ 1.
F(x) dx fo
50 F(x)dx

Note that e(0) = p, L(0) = H '(0) /µ = 0, and L(1) = H '(I)/µ =1.


- -

If F is a survival function, then e(x) is the mean remaining lifelength of


those individuals who survive beyond x. Similarly, L(t) is the fraction of the
total mean survival time contributed by those individuals with survival times
in the lowest tth fraction of the population, while Proposition I will show that
H '(t)/µ = E(X A F '(t))/µ which is the scaled total time on test of the
- -

initial tth fraction of the population. While e and H '/µ have been used
-

primarily in survival analysis and reliability (where the exponential distribu-


tions play an important role), the Lorenz curve L has been widely used in
econometric applications involving income and inequities of income distribu-
tions. Note that (4) below implies that the scaled total time on test function
is the identity function when F is any exponential distribution, while (5) below
implies that for any "perfectly equitable” income distribution, that is, an
income distribution concentrated at some µ, L is the identity function or the
45 0 line. Thus we record that when F is the exponential having F(x) =
exp ( —x /µ), x >_0,

e(x)=µ for0<x<cb,
(4) L(t)= t+(1— t)log(1 —t) for0^t<_1,
Exponential F
1 H '(t)=t
- for0t<_1,

776 FURTHER APPLICATIONS

and, when F is degenei ate at µ, F(x) = 1 [O ) (X), then


+
e(x)_(p —x) for 0 <_ x < oo,

(5) L(i)=z for 0- t:1,
Degenerate F
for 0<_t-1.

The Lorenz curve L is easily seen to be convex [since L'(t)=(1/µ)F '(t) is -

1'], and hence L(t)st for all 0 <_ t <_ 1. If F is absolutely continuous with
density function f = F', then

t) = fo 1 —t
dt H ^( -'( t ) _ A o F(t)

for almost every 0< t < I where A —f/ F is the hazard rate. Thus H ' is concave -

if A is 1', while H ' is convex if A is %. For more information about e, see


-

Hall and Wellner (1981); for more about H ' see Chandra and Singpurwalla
-

(1978) or Barlow and Campo (1975). Goldie (1977) gives a brief review of
applications of L including distributions of scientific grants and publishing
productivity of scientists.
These three transforms of F are all closely related:

Proposition 1. (Chandra and Singpurwalla, 1978) For any df F

(6) L(F(x))=1--- P(x)[e(x)+x]


I.'

and, if F is continuous,

(7) 1 H '(t)=L(t)4(1— t)F( 1) = L( t) + (1— t)L'( t).


-

µ µ

Proof. To prove (6), note that

(a) F(x)[e(x)+x] = xF(x)+ J F(y) dy


x

=EX1 (x , ) (X)

=p. —
EX1(o,x](X)

=p.— J(O,F(x)]
=µ.(1—L(F(x)).
F-'(t) dt by Theorem 1.1.1

RELIABILITY AND ECONOMETRIC FUNCTIONS 777


To prove (7), note that by Fubini's theorem

(b) H-'(t)=
j F 4n
J F'(x) dx=
f f
oF-1(
)

dF(y) dx
0 (x,-)

= f0
F '(t) n y dF(y)
-

ydF(y)+F-'(t)(1 —t)
(O,F '(t)]

= p.L(t)+ F '(t)(1— t), -

where the last two inequalities follow from continuity of F and Proposition
1.1.1. ❑

If X 1 ,. , X. are iid with df F on R + , natural estimates of e, L, and H '/ p -

are obtained by replacing F and F - ' by F. and F „'. Thus

^n(Y) dY ^ <x<oo,
(8) en(x)=j^ 0_
^n(x)

(9) L (t)°X
Jo F I(s)ds, O^t^l,
I -

j F '(r)
(10) XHn'(t) =X J o F „(x)dx, 0 <_t1.

Note that, with

k) (
k / [k` ll
Hn = 1 (n-1 +1)(Xn:i—Xn:i -1)— 1 X,,:i+(n k)Xn:k
L
1
' — .

n n i =I n 1=1

The corresponding processes are

(11) 18 „=Vi(e„ —e) on [0, co),


(12) Z. = V(L — L) on [0, 1],

(13) /N -n X ü- -1 H -' f on [0, 1].

After a little bit of algebra it is easily seen that the corresponding limit processes
are

(14) R—= j Ju(F) dI+ eU(F) },



778 FURTHER APPLICATIONS

{J
(15) Z—I vdF '—L
-
t
o udF-'}.

and, if F is absolutely continuous [recall (7)],

(16) T=Z+(1—I)Z'
_ ( 1
=z+ (1 µ I) {— f 1 _ I U +L' UdF - ' .
l °F o J

The following theorem gives convergence of the processes R,,, Z,,, and T.
to l, Z, and T respectively. Under slightly more stringent hypotheses the R.
part is due to Wang (1978) and Hall and Wellner (1979), the convergence of
Z. was proved by Goldie (1977), and convergence of T. was established by
Barlow and Proschan (1977). The theory of these processes has been given a
unified treatment by Csörgö et al. (1983).
We suppose that the X's have been constructed as X ; = F - '(f; ) where the
e's are the Uniform (0, 1) rv's of Theorem 3.1.1 having empirical process U.
satisfying(3.1.60), and that the limit processes (14)-(16) are defined in terms
of U of Theorem 3.1.1.

Theorem 1. (Csörgö et al., 1983) Suppose that EF (X 2 ) <oo. Then

(17) JOB„Er„—UBF1^ö> c 0 as n--oo.

If, in addition, F - ' is continuous on (0, 1), then

(18) 11Z"—Z11- ' 0.

If, in addition, f= F' is continuous and >0 on the open support of F and
II (1— I)q/f ° F - 'II <oo for some q in Q* satisfying (11.5.3), then

(19) JJT„—TlI-)- v 0 as n-oo.

Exercise 1. Prove Theorem 1. (Use the results of Chapter 6. Assuming that


EF jX1 2+ E <x for some s >0 provides an easier alternative problem.)

Now define

r 2 (x)=Var[X—xIX>x]

and set

G(x) = P(x)o2(x)
(20) 2(x)
RELIABILITY AND ECONOMETRIC FUNCTIONS 779

Then

(21) G—= 1— G is a df on [O, oo)

and it is not hard to check that the process

(22) R =— SoG on [0,co),

where S is standard Brownian motion. Upon estimation of o, and F by Sn


and F., where S„ = (X; — X ) 2 / n, (22) yields confidence bands for e:

Corollary 1. If A = ,1. is chosen so that P( I S 1 A) = 1— a [recall (2.2.7)]


a nd EX 2 <oo, then

(23) lim P(e(x)Een (x)±n '' Z ,l a Sn /F n (x) for all 0<_x<oo)>1—a


-
n+m

with equality for continuous df's F.

Exercise 2. Show that G in (21) is a df and verify the equivalence in law


claimed in (22).

Exercise 3. Prove Corollary 1.

Exercise 4. Show that P can be recovered from e by the formula

(24) F(x)
e( x)
exp \—
Jo e dI J.
Exercises. If K(x) = $ Fd I/µ, show that K(x) = exp (—J (1 /e) dl) and K
has hazard rate k/ K = lie.

Exercise 6. (Chandra and Singpurwalla, 1978) Let T(F) = Jä H ' dI/ p. be


-

the cumulative total time on test transform, and let G(F) _ ( 2 — jö L dI )/ (2) be
the Gini index. Give a geometric interpretation of G(F). Show that

G(F) = 1
to J Ix —yt dF(x) dF(y) = EjX — Y1,
/a o

T(F)
J
=-- o
(1— t)F '(t) dt,
-

G(F)=1—T(F).
780

FURTHER APPLICATIONS

Exercise 7. Let T. = T(F,,) and G„ = G(F). Show that /(T„ — TF )


—✓n(G„— CF) - d N(0, V2 (F)/µ 2 ), where
_
V2 (F) =
f (2F'(x) —
0
TF)( 2 F(y) — TF)

x (F(x) A F(y) — F(x) F(y)) dx dy.

Exercise 8. Show that when F is an exponential df the limit process T of the


total time on test process satisfies T = U.

Exercise 9. Use (17) to show that JI (tJ — l) Pll o- -*„O for some sequence c„ - o0
as
CHAPTER 24

Large Deviations

0. INTRODUCTION

In Section 1 we introduce the concept of Bahadur efficiency, and present the


key theorem for deriving the exact slope of a test. The fundamental requirement
of this theorem is a large deviation result.
In Section 11.8 we derived a large deviation result for binomial rv's. In
Section 2 we apply this to G„(t). Using monotonicity of D „, we extend this
to a large deviation result for D,,,,, = 1I(G — I) # Ip11
In Section 3 we introduce the Kullback-Leibler information number, and
give some of its basic properties.
The Sanov problem is stated in Section 4. Theorems giving conditions under
which Sanov's conclusion is valid are stated. We then show that the result of
Section 2 for Dy„ is a special case.

1. BAHADUR EFFICIENCY

We follow Bahadur (1971), to which we enthusiastically refer the reader for


additional results.
Suppose X 1 . . , X,... are i.i.d. P0 As a test of Ho : OE O o , we agree to
,. .

reject Ho if T„ = T„(X,, ... , X„) is "too big.” If


(1) p„( t) = Pe (T„ >_ t) for all 0 c O o ,

then
(2) L„=L„(X,,...,X„)= p„(T„)isthep-value
actually obtained. Bahadur defines {T: n ? 1} to have exact slope c(9), when
B is true, if
I 1
(3) —log L„ -+ -2 c(9) a.s. P0 .

781
782 LARGE DEVIATIONS

Note that (3) implies that a plot of (n, —2 log L„) a.s. tends to look, for large
n, like a half-line through the origin having slope c(0). For 0< e <1 Bahadur
defines
(4) N(E)=NE(Xi,X2,...)
min {m: L„(X i .... , X„) < e for all n > m}
0o if no such m exists.

Thus, N(e) is the minimum sample size required for the T„-test to become
and stay significant at level e.

Theorem 1. (Bahadur) If (3) holds with 0<c(0)<o, then

(5) E- o N(e)
2 log (1/e) — c(8)
a.s. Pe .

Suppose T, n and T2 are two different sequences of tests, and suppose they
have finite and positive exact slopes c,(6) and c2 (0) when 0 is true. Then

(6) N2(r) c1(0)


- r12(8) a.s. P0 ;

N1 (e) c2(8)

we call e 12 (0) the Bahadur efficiency of the T1 „- test with respect to the T2 -test
when 0 is true. It is thus of some importance to verify (3).

Theorem 2. (Bahadur) Suppose

(7) TT -> b(9) a.s. PO


for some fixed 0 t O o , where —oo < b(0) < oo. Suppose also that

(8) 1n log P„(t) -' —f(t) as n -> o0

for all t in a neighborhood of b(6), where f is continuous in this neighborhood.


Then

(9)n log Ln . —f(b(o)) a.s. PO

[i.e., c(0) = 2f(b(B)].

We expect (7) to be reasonably straightforward to verify, while (8) is typically


difficult. The conclusion (8) is referred to as a large deviation result.

Exercise 1. Prove Theorem 1 by consulting Bahadur (1971).

Exercise 2. Prove Theorem 2 by consulting Bahadur (1971).


LARGE DEVIATIONS FOR SUPREMUM TESTS OF FIT 783

2. LARGE DEVIATIONS FOR SUPREMUM TESTS OF FIT

We will consider the supremum statistic

(1) D,.^=II(Gn-I)#iG!I
for a weight function qi such that

(2) ii is positive and continuous on (0, 1),


symmetric about t = Z, and lim,. o cr(t) exists in [0, c].

Note that (1) specializes to several of the distribution-free tests considered in


Chapters 3 and 4.
The large deviation result we will obtain for D^,,, is closely linked to the
large deviation result for binomial rv's of Theorem 11.8.2. We now restate this
latter result for the binomial ry nG„ (t). Let 0<_ t<_1 and a>_0. Then

(3) n-1 log P(G„(t)? t+a)-+ f(a, t) frombelow as n-+oo

where

f( ) (a+t)log a t t +(1-a-t)log l l a t t if0^a_<1-t


(4) a, t
00 if a>1-t.

The function f is given in Table 1. We also define

(5) inf f(al i(t), t).


gq,(a)=0<r<I

a - t
Table 1. f(a,t)=(a+t)iog t
a t +(1 - a - t)1og 1
-

(from Groeneboom and Shorack (1981))

.90 .0167 .1054


.80 .0084 .0367 .2231
.70 .0062 .0254 .1163 .357
.60 .0053 .0216 .0915 .226 .511
.50 .0050 .0201 .0823 .193 .368 .693
.40 .0051 .0204 .0811 .184 .335 .551 .916
.30 .0058 .0226 .0872 .192 .339 .534 .794 1.20
.20 .0074 .0282 .1046 .223 .382 .583 .832 1.15 1.61
.10 .0122 .0444 .1537 .311 .511 .751 1.033 1.36 1.76 2.30
.05 .0206 .0702 .1251 .434 .688 .983 1.318 1.70 2.13 2.65 3.00
.01 .0588 .1690 .4611 .815 1.217 1.661 2.144 2.67 3.25 3.89 4.25 4.61
t/a .05 .10 .20 .30 .40 .50 .60 .70 .80 .90 .95 .99

784 LARGE DEVIATIONS

Table 2. 5 1 (a) = 2a 2 ; p2(a) = e Z a 2/8 = . 9236a 2


((rom Groeneboom and Shorack (1981))

a g1(a)/i (a) g 2(a) /92(a )


.00 1.0000 1.0000
.01 1.0000 .9917
.02 1.0001 .9835
.03 1.0002 .9755
.05 1.0006 .9596
.07 1.0011 .9442
.10 1.0022 .9219
.12 1.0032 .9076
.15 1.0051 .8868
.20 1.0091 .8541
.25 1.0144 .8237
.30 1.0210 .7953
.40 1.0390 .7440
.50 1.0646 .6989
.60 1.1009 .6591
.70 1.1540 .6236
.80 1.2394 .5918
.90 1.4211 .5632
.97 1.9342
1.00 .5373
1.50 .4368
2.00 .3679
3.00 .2790

We give g, and g2 in Table 2, where

(6) g; = g y with i' 1 (t) = I and ii 2 (t) = -log (t(1- t)).

Note that

(7) i lm 4i(t) = oo implies

_
I a
sup f (a,t) a log :t <_S anda<
1
-1-+5e E
l

for some 5. >0 depending only on e > 0. To see this, just write

[ alfa -I ía 1 a I
t II +G(t) +r log +rJ -r log r + ^
I ()

L«G(r)J t) le

a l 1_t a/ iP(t)
+ I1 - t - +G(t)] [

log 1-t
-
LARGE DEVIATIONS FOR SUPREMUM TESTS OF FIT 785

It follows from (7), that


log
(8) 1 yo et/ t) = 0 implies g y (a) = 0 for all a ?0.

It also follows from (7) that

(9) f(a/r2(t), t)->a as t J,Ooras tt1.

Theorem 1. Let 1 satisfy (2). Then


1
(10) n log P(Do„? a) - —g, G (a) for each a>_0.

(In Proposition 24.4.1 we reformulate this conclusion in terms of Kullback-


Leibler information.)

Under fairly general conditions we expect that


(11) Tin (mn — Fo) # 41(Fo)II
as. b" = II(F — F0 )'0 ii( under regularity
when X, , ... , X,,,... are i.i.d. F. If F0 is continuous and (10) and (11) hold,
then the exact slope c(F) of the test of Ho : F0 that rejects Ho when T„ is
"too big" is

(12) c*(F)= — go(b*)•

Exercise 1. Give regularity conditions that guarantee (11). (Note Theorem


10.2.1.)

Remark 1. The functions iP s (t) = [t(1— t)] a , with S >0, satisfy both (2) and
(8). Thus their g functions are identically 0. Intuitively, these functions are
too severe in that they put too much weight on the extreme order statistics;
in fact

(13) p,,(t)=P( II(6„ — I ) + '1' U ^t)^ P(CI^(2n - ' n(2nt) - ` /S


so that (1/n) log p,, (1) as n->oo. Thus, p„(t) does not approach zero
exponentially fast. In fact, (8) and (9) show that (up to orders of magnitude)

a(J2(t) _ —log (t(1— t)) is the most extreme weight function Ifs
(14 )
for which the exact slope c"(F) is nonzero.

This suggests the potential value of tests of fit based on T" 2 ,„. (We recall that
the weight function I,, 12 (t) _ [t(1— t)] - `^ 2 that produced the constant variance
786 LARGE DEVIATIONS

process Z„(t) = U„(t)/ t(1— t) was shown to have IIZ. 11 perform poorly in
Chapter 16. Though A„ = Jö Z„(t) dt performed well in Chapter 5, we note
that A„ does not lend itself to confidence bands.)

Exercise 2. Verify (13).

Exercise 3. Show that g, is T' with g, continuous on (0, 1),

(15) g,(a)=2a 2 + O(a 3 ) as a- 0,

while g(a)-cc as a-^ 1.

Exercise 4. Show that

(16) g2 (a) — e 2 a 2 /8 as a 1.0,

while g 2 (a) -> oo as a- 1. Show that the value tQ at which the infimum in (5)
is achieved for g 2 converges to a solution of t(1 — t) = exp ( -2) as a .1, 0.

Exercise 5. Theorem 1 still holds if the assumption of symmetry on cit is


dropped provided we replace g 1, by the appropriate g, where

(17) gi(a) = o inf,f(a/J'(t), t), g^c(a) = o inf,f(a/Ji(t), 1 —


t),

gç°gm^g^•

An interesting statistic, that is, the supremum over t of the p value of


G„ (t) — t, is introduced by Berk and Jones (1979). It behaves asymptotically as

(18) sup {G,,(t) log G °^ t) +(1 — G(t)) log I l eft t) : G„(t)> t },

and has some optimal properties.


In a paper on large deviations for boundary-crossing probabilities of partial
sums, Siegmund (1982) shows that

( 19 ) P(I)(G„ — I) + II > A)
exp(— n[(6, -02 )A +92 +log(1 -92 )])
{A1021 -1 (1 — B2)[1+(IB21/ei)3(1 — e])/(1 — 62)]}1/2

where 02<0<0 1 satisfy 0 1 — 0 2 = log [(1— 0 2 )/(1— 0 1 )] and 9,- ' + 02' _ A -'
This has surprisingly good accuracy.

Exercise 6. Show that (19) is in line with (10).


LARGE DEVIATIONS FOR SUPREMUM TESTS OF FIT 787

Proof of Theorem 1. Consider first D— D y,,, . Fix a ?0. For any fixed t we
have from (3) that

(a) n-' log P(Dy,,, ea)>_n - ' log P(6,(t)?t+ - )-*—f( ,t).

Hence

(b) li
en ' log P(D,y,,, ^ a) ^ oinf g^(a)•
1
if(^(t). t)

It remains only to prove the reverse inequality.


In case g o (a) = 0, however, there is nothing else to prove. Now a = 0 is one
condition that implies g p (a)=0. Another condition that implies g(a)=0 for
all a >_0 is lim, 10 log (1 / t )/ ^i (t) = 0; see (8). Thus, it now suffices to prove the
reverse inequality of (b) under the assumptions a >0 and

(c) l m q(t)log(I >0 whereq= 1.

Note that (c) implies

(d) q(t)-+oo as t-*0.


t

Case 1: q(t) -10 as t-+0. Note that for all 0 1 and 0 2 in [y,, 2], with y f > 0,
and for 0 < t 3. we have

f(aq(t)-0 1 t, 0 2 t)—aq(t) log()

aq(t) e9,t+021
<I [aq(t)—B,t+0211log
L 1 —aq(t) log I +g

^ [aq(t)—B,t+0 2 t] log [aq(t)-0,t+0 2 111

+10,-021log\et/ +Iaq(t) log (e je


z lz t) —aq(t)log ^t) + 8

(e) <2 for0<tS.

provided aq(t) > O1 [which is true by (d), a > 0, and YE > 0]. For each n we
let t, < . • . < t,,,,„ + , = i (with t,,,„, +;+ , = 1— t n, m _ ;+ , for 1 <_ i <_ m) satisfy

(f) tn; =-- for 1-- i -- m with 0 "close" to 1


788 LARGE DEVIATIONS

• .
uuu I I I 1 1 1- 1 l i l t 1 111111.

o tni *N tn,m +1 *** tn,2m+1 1


Figure 1.

and

(g) tni = exp ( –n 2 ).

Solve (I -1/n)' =exp (-2n 2 ) for m = m„ to find that


(h) m = O(n').

Let

(i) iGrn = SUP {J(t): tnr ^ t ^ tn,, +l}


with t* such that qi n; = r'(t) for 1, i ! 2m.
;

Now note that for 1 <_ i <_ 2m we have

pn^ =P( SUP [G(t) – t]fi(t) ^ a) ^ P(G(t 1+1 ) + a/


^^ sfin,i+^ 0ni

a
(j) < exp ( – nf tni – t n.;+1 +—,
( by Theorem 11.8.1

where

(k) t,+1 –tn; =(1- 0)tn;= 0 1 t *; with y E <0 1 ^1

and

(1) t, +, = 02 t*; with 2 ` <_ 02 <_ 2.


-

Thus (c) shows that for tn; sufficiently close to 0 or 1 we have

p n; ^ exp _n[aq(ti)log(
(
1
t ni
-) –
: exp ( n[f(aq(tnt), t 1 )–e])

(m) s exp (–n[g,,(a) – s]) if 0< t n; <_ S E .

That

(n) p,, ^exp ( n[g,,(a)


– – e]) if1–SE^tn.<1
THE KULLBACK-LEIBLER INFORMATION NUMBER 789

follows by an analogous argument. It is clear from (h) that for B sufficiently


close to 1 we have

(o) P„,-exp(-n[g^(a) -s]) ifSE^t „r 1 -S E


simply by continuity of f and Ji. Also, for all large n,

p„ o =P( sup [G „(t)- t]ip(t)?a) and


o<t t

P„,2m +i= P( sup [G„(t)-tj4i(t)^a)


tn.zm+V"r l

satisfy

(P) P„o V P„.2m +1'5 P( 6 „(t„ 1 ) > Ø)<- nP(e, <_ t) <_ n exp (-2n 2 )
5 exp ( -n 2 ).

Thus (m)-(p) and then (h) give

(q) T i --log P(Dy„>_a)slim 1 log[Mn 3 max p„ ; ]<-[g y,(a) -E].


n-.co f n n- -
, O 2m+l

Case 2: lim,_ 0 q(t) exists in (0, oo). The derivation of (q) is just a repeat of
step (o).
Combining (b) and (q) gives

(r) n log P(D+. ^ a) = gq,(a)•


„ -

The result for D^ follows since P(D;„> a) = P(D m > a).


Finally, (r) and

(s) P(D y. „ >_ a)<_ P(D ^ a)<_ P(Dy „>_ a)+P(Dy„>_ a)


.

<2P(Dm „>_a) .

yield the result for


The original statement along these lines, with difficulties in the proof, was
given by Abrahamson (1967). This theorem is from Groeneboom and Shorack
(1981). ❑

3. THE KULLBACK-LEIBLER INFORMATION NUMBER

Let P and Q denote probability measures on the measurable space (X,s:Q). If


Q« P, then we let dQJdP denote the Radon-Nikodymn derivative; note that

790 LARGE DEVIATIONS

we may suppose 0 <_ dQ/ dP < oo for all x E X. We now define the Kullback-
Leibler information number K(Q, P) by

if Q« P
(1) K(Q, P)= J ( log dP) dQ
ao else.

Proposition 1. K is well defined with 0 < K oo. We have

(2) K(Q, P) = 0 if and only if P(A) = Q(A) for all A E si.

Proof. Note that r = dQ/ dP satisfies 0< r(x) <'x a.s. Q. Also log t < t —1 for
0 t < oo with equality if and only if t=1. Thus, for D = [x: 0 < r(x) < co],

(log!) (l- 1) dQ=I—P(D)^::-0


(a) fX (log r) dQ= J — n r dQ>_J v
-

with the first inequality in (a) being an equality only if r =1 a.s. Q. ❑

Let 11= {A 1 , ... , A,,,, ... } denote a measurable partition of Cl. Let

Q(A,,)
(3) Kn(Q, P) Q(Am) log
M P(A,,,)
Proposition 2. For any measurable partition 11 we have

(4) K(Q,P)^K(Q,P) for all P, Q.

Proof. We may assume K(Q, P) <oo, else the result is trivial. Thus 0
dQ/dP < for all w. Also ç(x) ° x log x is convex on [0, co) (with 0 log 0 = 0).
Thus for any measurable set A having P(A) > 0, we can use Jensen's inequality
to write

fA d
log
dp
P i dQ fA \ dP dP
dQlog \ dP P(A)
dQP(A)
_ dQ
J dP
A (dP) P(A) P(A)

dP P(A)) P(A) = I'(A)^P (p(A)) = Q(A) log P(A)


(a) > (fA
If P(A) = 0, then (a) again holds since all terms equal 0. Thus we may
sum (a) over the sets A. of the partition, and in doing so we obtain the
proposition. 11
THE KULLBACK-LEIBLER INFORMATION NUMBER 791

If P and Q have df's F and G, then we also write

(5) K(G, F) = K(Q, P).

Using the partition {(—cc, x], (x, co)}, Proposition 2 gives

(6) K(G, F) ? G(x) log F( x) + (1— G(x)) log (1— F(x))

for any real x.

Proposition 3. Let I denote the Uniform (0, 1) df. Among all df's G that
pass through the point (t, h) with t E (0, 1), the one that minimizes K (G, I)
is the df G, that is linear from (0, 0) to (t, h), and from (t, h) to (1, 1). The
minimum value is

(7) K(G,,I)=h log (h/t)+(1—h) log ((1—h)/(1—t)).

Proof. It is a trivial calculation to verify (7). Then the partition


{(—cc, t], (t, oo)} in (6) implies that K(G, I)? K(G„ I) for all G going through
(t, h). [This is evidently from B. Hoadley's (1965) thesis at U. California at
Berkeley.] ❑

We now suppose that Il, denotes a collection of df's, and we define


inf {K(G, F): GEfl} ifCl#D
(8) K(SZ, F)= 100

A key role in evaluating K(fl, F) is often played by the function


(9) i2(t)=—log(t(1—t)) for0<t<1.

We let

(10) m(F)={G: G is a df having II(G—F)Ii 2 (F)II_m

Proposition 4. For any F and Cl having K(fl, F) <x,

(11) K (St, F)=K(Cl(^m (F), F)

for all m ? mp n sufficiently large.


Proof. Since tiIr2 (t) [since (1—t)rli 2 (t)] is bounded for t<-2 (for t>2), we
have for all large m that

(a) 9m(F)`c^G:(IGipz(F)II }U 11( 1 — G).i2(F)II }.


792 LARGE DEVIATIONS

Also for m sufficiently large, we have

(b) llGIi2(F)1I?implies K(G, F)? 4

[and a similar implication with 11(1 —G)i(i2 (F)11] since

(c) K(G, F)? G(x) log F(x) +(1 —G(x)) log I—F(x))(

for any fixed x by (6). Thus for m' sufficiently large we have

m'
(d ) K(G F)>-->_ K(Sl, F)-- I whenever I'(G— F)t, F)I) >_ m;

but this is just (11). (See Groeneboom and Shorack, 1981.) ❑

4. THE SANOV PROBLEM

Suppose now that T is a functional on the collection 0 of all distribution


functions. For —oo < a <‚ let

(1) Qa={GE2: T(G)>_a}.

Suppose also that our test of a hypothesis H o is to

(2) reject Ho if T„ = T„ (X, , .. .‚X)= T(F„) >_ a.

The conclusion Sanov (1957) established (it requires regularity) is

(3) n1 log P(T(C~„) > -+ —K (S2a, F), under regularity, as n -^ o0


_ a)

for the Kullback-Leibler number of (24.3.8). Suppose also,

(4) T(IF) -)' a.s. b under a fixed alternative

where

(5) K(fa, F) is continuous for a in a neighborhood of b.


THE SANOV PROBLEM 793
Then the exact slope c for this alternative F is, by Theorem 24.1.2,
(6) c = 2K (S2 b , F).
The evaluation of (6) can also be difficult.
We now state without proof some results along these lines. The first is from
Hoadley (1967), with a simpler proof in Stone (1974).

Theorem 1. Let F be a continuous df. Suppose the functional

(7) T is uniformly continuous in the II II topology.


Suppose the function r-* K (Sl r , F) is continuous at a and u„ - 0 as n --loco. Then

1
(8) -log P(T (lF„)? a+u „)--K(SZ a , F) as n-"x.

Groeneboom et al. (1979) introduce the r-topology on .2 (see below) and


simultaneously generalize a number of results, including Theorem 1.
Groeneboom and Shorack (1981) relax their conditions slightly in the above
setting so that -r-continuity is only required on an appropriate subset.
We now define the T-topology. Let H = { B 1 ,. . . , B,,} denote a partition of
118 into Borel measurable sets. Let

Pc(BI)
(9) K1(G, F)= y_ PG(Bi) log
i = 1 1'F(Bi )

where PF (or PG) is the probability distribution corresponding to F (or G).


For a set of df's H let

(10) K n (H, F) = inf { K n (G, F): G e SZ }.

(We use the conventions 0 log oo =0 and a log 0 = —oo for a > 0.) Consider
also the pseudometric do on 2 given by

(11) dr1(G,F) = 1 max PF(B;) — Pc(B;)j.

The topology on -9 generated by all such d will be denoted by -r; thus T is


the smallest topology such that the sets {GE 2i: dn (G, F) < e} are open for
each e > 0, each Fe 9, and each finite partition H. In fact, these sets form a
subbase for the topology.

Exercise 1. The I II-topology is strictly coarser than the r-topology. That is,
all I l -open sets are r-open, but the converse fails. (See Groeneboom et al.,
1979.)
794 LARGE DEVIATIONS

Theorem 2. Suppose F is a continuous df and

(12) T is r- continuous on 9 (F)- {G E 2: II(G—F)c112(F)I!: m}


for each m >0

[recall ii2 (t)=—log(t(1 —t))]. Suppose K(11. a ,F)<oo, the function r-


K (fl, F) is right continuous at a, and u„ - 0 as n -+ oo. Then

1
(13) --log P(T(F„)?a+u n )-*—K(fl , F) as n- o3.

Large deviation results for the Anderson and Darling statistic A n and for
L- statistics are derived in Groeneboom and Shorack (1981).

Reexpression of Theorem 24.2.1

Let 5F denote the set of all df's on the real line. For a fixed positive function
:/r and a fixed df F, we define the function TF by

(14) T(G) = sup [G(x) — F(x)] # i4'(F(x)) = II(G — F) # +G(F)II ;


x

note that for continuous F we have

(15) TF(^n)=D^.n.

For a ? 0 we define

(16) ftä = {G: TF(G)?a}= {G: II(G — F)I(F)II?a}.

Proposition 1. Let F be a continuous df and let i i denote any function positive


on (0, 1). Then for all three values +, —, and I I of #, we have

(17) gq,(a)= K41 a , F) forall a>_0.

(This offers an alternative expression for the limit in (24.2.10), and is in the
format of the Sanov result.)
Proof. Since F o F ' is the identity for continuous df's F, we can define the
-

df G, on R so that the df G, = G, o F ' on the unit interval has the uniform


-

on intervals density

[t +a/iy(t)]
if0<u<t
t
(a) g,(u)=
[(1 — t) —a/t^/(t)]
if t <u<1.
1—t
THE SANOV PROBLEM 795

Note that G, Ell a since TF (G,) = a, and note that

(b) K(G„ F) = f(a/./i(t), t).

Thus

(c) K(fo, F) s inf, K(G„ F) = g y (a).

To obtain the reverse inequality, we note that for > 0 there exists a df G, E fl a
for which

(d) K(Ga, F)<K(fl a , F)+e.

Now G* E fl, implies lO (ta ) — ta l ? a/ r(ta ) for some t o in (0, 1). But by
Proposition 24.3.3 the df Gö that is linear from (0, 0) to (t a , Gä(ta )) to (1, 1)
has the smallest Kuhback- Leibler number, when compared to the Uniform
(0, 1) df, among all dfs that pass through the point (ta , Gq(ta )) and its
information number is -K(G,., F) [note that Gä = G,, when Gä(t q ) — to =
a/i/i(ta )]. Thus for s>0

(e) K(HQ, F)+EkK(Gä, F)^K(G,a, F)=f(44t4), ta)?g„(a)•

Hence K (fl,, F) ? g(,(a ). The minor extension to S2.ä is left to the


reader. ❑

Actually the definition of the T-topology makes sense much more generally;
just imagine that F and G above (or introduce P and Q) are probability
measures on a complete separable metric space. Let int 1 and cl t denote interior
and closure in the r-topology on the set of all measures on the metric space.
The next theorem is from Groeneboom et al. (1979).

Theorem 3. Let P E 2 and let 11 be a subset of 9 satisfying

(18) K(int r (11), P) = K(cl, (ft), P).

Then the empirical measure A. satisfies

1
(19) --log P(1' ,, E fl) -. —K (fl, P) as n - 00.
CHAPTER 25

Independent but Not


Identically Distributed
Random Variables

0. INTRODUCTION

We now return to the situation of Section 3.2 in which X,,,, ... , X,, are
independent with df's Fn ,, ... , Fnn . Good theorems for the empirical process
in this present situation will require an extension of the DKW inequality
(Inequality 9.2.1). This is the subject of Section 1. Just as U n (t) - Binomial (n, t),
in the present situation the marginal of the empirical process at a fixed point
has the generalized binomial distribution. In a very strong sense the generalized
binomial distribution is less dispersed than an appropriately chosen ordinary
binomial distribution. This fact is the subject of Section 2. This result is then
used to obtain in probability "linear bounds" on the empirical df in Section
3. Armed with these inequalities, we then establish the weak convergence of
the empirical, weighted empirical, and quantile processes of independent but
not identically distributed rv's in II /q1I metrics in Section 4. Section 5 extends
our earlier work on L- statistics to this case.

1. EXTENSIONS OF THE DKW INEQUALITY

Let X„,, ... , X. be independent rv's with arbitrary df's F„,, ... , Fnn as in
Section 3.2, and let

( 1 ) Fn(x) =n
i=1

n
(2) F(x) ° F,(x)= n '
— Fni(x),

796
EXTENSIONS OF THE DKW INEQUALITY 797

and

(3) by (3. 2. 4 1),

where X. is the empirical process of the associated rv's a n ,, ... , a s ,, with


continuous df's constructed in Section 3.2.

Inequality 1. (Bretagnolle) Let co be / and convex from R + to R + and let


a be right continuous and bounded from R to R. Set

(4) V=V,= sup (nlF n (x)—a(x)).


—ao<x<XD

Then, for each fixed n, a, (p, and F n

(5) Ec?(V)<E*co(V),

where E denotes expectation under F= (F 1 ,. .. , Fnn ) and E* denotes


expectation under F = (Fn, ... , Fn ). In particular, with a = nF,

(6) E(p(Vnp)-E*(p(VnF)•

Inequality 2. (Bretagnolle) For all n a! 1, A > 0, and F = (F 1 ,. . . , Fnn ) there


exists an absolute constant c such that

(7) PF (./nIIFn—Fnll^a)/2 PF(v'iII(Fn—Fn)+II A)


ec exp (-2A 2 ).
The absolute constant c of the DKW inequality (Inequality 9.2.1) works.
(Hence c = 29 works.)

We first show that Inequality 2 follows from Inequality 1 by way of the


following lemma.

Lemma 1. (Kemperman) Let H and K be ", functions on R + with H(co) _


K(x)=0 and H(0) < K(0) < oo. Suppose further that

(8)
to 0

Jo pd( H)<_ j
pd( K) —
for each function on R + of the form ç (t) = c(t — b )+ with c, b > 0. Finally,
suppose that K (x) - " is convex for some 0< a <1. Then

(9) H(x)<(1—a)-'^'K(x) for all 0<_x<co.


798 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

If K(x)_a is convex for all small a >0 [or equivalently K(x) = exp (—f(x))
with f convex] then

(10) H(x)<_eK(x) forall0x<oo.

Note that if (8) holds for all / convex from R + to R + with cp(0) = 0,
then the hypothesis of the lemma, and hence (9), holds.
Proof of Lemma 1. Let f(t) K(t) - °. Since f is convex and 7, there exist
constants p and q describing a supporting hyperplane such that

(a) f(t)?f(x)+q(t— x)=p +qt for all t;

here p = f(x) — qx and x>0 is fixed. Hence

(b) K(t)=f(t) -'1°'[p +qt]-'I° for all t >_ — = x —lf(x)•


q q

First consider the case qx < af(x). Then, by (a),

H(x) -° ^ H(0) -° '' K( 0 ) -° =f(0)


>_f(x)—qx^(1 — a)f(x) =(1— a)K(x) °

which implies (9). Thus we now assume that

(c) qx> af(x)

which implies that u = x —(a/q)f(x)>x— (1 /q)f(x) satisfies 0 <u<x. Now


apply (8) with p(t)={(t—u)/(x --u)}; then q,(0)=0 and (x)= 1. In par-
ticular (t)>_ 1 ^ (t), so the left-hand side of (8) is ?H(x). The right-hand
( )

side can be integrated by parts, and using (b) and K(oo)9(oc) =0 by (b) gives

(d)
('

J
J x—u
o
1
°°
K(t)dt^
1
K(t)dcp(t)=
x—u
( °° °°
[p +qt]- 'f °dt.

Hence

H(x)<_(x —u) ' 1 as1 (p +qu)' - '^ °


-
q

_ (1— a) 'f(x) 'I°


- - by definition of u and p
(e) _ (1— a)-'1 °K(x).
EXTENSIONS OF THE DKW INEQUALITY 799

Lemma 1, suggested to us by Kemperman, improves slightly on a similar


lemma of Bretagnolle (1980), who proved

H(x) < [1+(1—a) 'i 2 ] z K(x)


1—a
(f) :4K (x) if K (x) - ° is convex for all small a

under the same hypotheses. Kemperman's e cannot be improved: for fixed


x>0 choose K(t)=exp(—ct—d) with c>_1/x and H(t)=eK(x)1 [o.x] (t).

Proof of Inequality 2. Let M" =' I) (f'" — F") + II = n - " 2 V"F, and set, for A >_0,

H" (A) = P(M" ? A) calculated under F = (F",, ... , F,, ),


H(A)= P*(M" ?A) calculated under F = (F, ... , F),

and

K(A) = c exp (-2,1 2 )

where c = 29. For cp satisfying the hypotheses of Lemma I it follows from the
DKW inequality (Inequality 9.2.1) that
fco
^cpd(—H*) ^ j cod(—K)
(a) J o o

after an integration by parts. From Bretagnolle's inequality (Inequality 1)


fx
(b) J o
cod( —
H")^j cod(—H),
0

and combining (a) and (b) yields

. cpd(—H,,)s f cpd(—K).
(c) Jo o

Since K - ° is convex for every a >0 and H"(0) = 1 <c = K(0), (7) follows
from Lemma 1. ❑

To prove Inequality 1, we need two lemmas concerning the effect of "iterated


pairwise averaging" of the components of a vector of numbers or of distribution
functions. Let x = (x,, ... , x") n Ii", and consider the following pairwise averag-
ing process:
800 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S
(i) Choose two coordinates of x "at random," say x i and xj.
(ii) Form the average of the two coordinates chosen, 2(x +xj ). ;

(iii) Replace both x i and xj by their average to obtain a new vector x (1) =
1/ 1/
(x1, ...exi — l,21xi + xj), xi+le... xj—1, Rxi+xj), xj +l,.. ., xn);

(iv) Repeat this process, denoting the vector obtained after m steps by x (m) .

An alternative description of this process of "pairwise random averaging" is


easily given in terms of products of random matrices. Let 9 --- {P;j } denote
the collection of all n x n pairwise permutation matrices, and let J= {T ; }
denote the collection of averaging matrices Tj =(I+ P;j ) for P;j E . Thus
Px=(x
I 1 ,...,x_,,xJ J^x ^+I^ exj_
J l^xnx J+1^ exn )' and Tix
y =(x lr .^ ^xi_1 ) ,
x i+ ,, ... , ,, z(xi+ ,, . .. , x")' for any x e W'. Note that both
P and T contain n(n -1)/2 distinct elements. Let M be a random matrix
uniformly distributed over the n(n —1)/2 Tj 's in .°l: P(M= Tj ) =2 /n(n —1)
for all Tj E 2T Thus Mx represents the result of choosing two coordinates of
x at random, averaging them, and replacing both of the chosen coordinates
by their average. Every Tj is doubly stochastic, and hence M is doubly
stochastic with probability one.
Letting M 1 , M2 ,... be random matrices independent and identically dis-
tributed as M, the pairwise averaging process described above is given after
m steps by just

x (m) _Mm M, x.

For any x E R ", let x = n -1 Y_;_, xi and z = (z, ... , .z) E 18 ".

Lemma 2. For any x E IR"

(11) x (m) _M • • • M,x-+z a.s. as m -* x ;

equivalently,

(12) Mm •••M,-iJ a.s. as m oo


n

where J is the n x n matrix of ones.

Lemma 3. Let F = (Fn1i ... , F,,,) be an n-vector of arbitrary df's, let


Ml , M2 ,... be a sequence of matrices in J such that

S. M. M,- 1 J asm-"
n
EXTENSIONS OF THE DKW INEQUALITY 801

(which exists by Lemma 2). Then

n
llSmF — F1I=,^"
max sup I Z S m (i,j)F"; (x)— 1n Y_ F ; (x)
-oo<x<m _, j

-*0 as m -- CO,

where F= 1/n ^^ F"; and

Exercise 1. Prove Lemma 2. [Hint: Without loss suppose z = 0. Let y =


x (m) = M M x z = x ( rn +1[ and Vm= y" (x^'" ) ) 2 . Show that V V
— YS ) 2 - 0 for some random r, s; and by computing E( Vm+i — Vm I y) show
that E(Vm )=(1—e) m (^; x?) where e=I/(n-1).]

Exercise 2. Prove Lemma 3.


Proof of inequality 1. In view of (3) we have II/i(F" — ") +iI <_ IIX+nH where
X,, is the empirical process of the rv's a 1 ,. .. , a"" with continuous df's
G„ n constructed in Section 3.2. Hence we may assume, without loss
of generality, that F1, 1 ,..., F"" are continuous.
Suppose that n = 2. Let P, E denote probability and expectation in the case
of general F, 0 F2 ; and let P*, E* denote probabilities and expectations in
the identically distributed case (F, F) with F = 2(F, + F2 ). We note that F, =
F—G, F2 =F+G, where G=Z(F2 —F1 ).
Recall that V— V1 = sup, (nf"(x) — a (x)) = sup x (21F 2 (x) — a (x)) when n =
2. Thus, if a = sup x (—a(x)),

(a) as V<a+2 a.s. (PorP*),

and hence

Ew(V)=—f cp(z) dP(V>z)


laa+21

(b) = ^p(a)+
fCaa+2]
V'(z)P(V> z) dz

after an application of Fubini's theorem, and similarly for E*^p(V). It follows


that

(c) E*^p(V)=Ecp(V)= f cp'(z){P*(V>z)—P(V>z) }dz,


ia.a+2]

and to complete the proof when n =2 we need to show that the right-hand
side of (c) is >0.
802 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

Let T, = min {X,, X2 } and T2 = max {X,, X2 } denote the order statistics, and
note that

(d) N(t)=2F2(t)=1[T,,T2)(t)+2.1[T2. )(t)•


Let A(t) = sup s ,, (—a(s)). Thus A>—a, A and are bounded and right
continuous, A is \, and R1 1', and we have N(t)—a(t)<ri(t)+A(t) for all t
so that Vssup, (N(x)+A(x)), while on the other hand, for any fixed t

V = sup (N(x) — a(x))

sup (1\l(x) — a (x))


x?t

> sup (J(t) — a(x)) since N?


x >r

_ (1J(t)+A(t))

so that V >_sup, (N(t)+A(t)). Hence, note Figure 1,

(e) V = sup (r (x) + A(x)).


x

Now let
a,(z)=sup {t: A(t)>z -1},
and

a 2 (z)=sup {t: A(t)> z — 2},

so that a, and a 2 are monotone and a 2 (z+ 1) = a,(z). Thus it follows from (e)
that V = a v (1 +A(T,)) v (2 +A(T2 )) and hence that

[V >z]=[T2 <a 2 (z)] forz>_a +1

IN + A `
i V
II
A V

a fit.

Figure 1. ^i

EXTENSIONS OF THE DKW INEQUALITY 803
while

[V>z]=[T2 <a 2 (z)]u[T,<a,(z)] for z>_a.

From this together with

P(TZ <y) = F,(y)F2(y),

and, for x<_y,

P(T,<xorT2 <y)
= F 1 (y)F2 (y)+F1 (x)(1—F2 (y))+F2 (x)(1 — F,(y)),

it follows that

P*(V>z)—P(V>z)
G 2 °a 2 (z) a+1-z
(f)
G2 ° a 2 (z) —2G ° a,(z)G ° a 2 (z) a <_ z.

Substituting (f) into (c) and using a 2 (z+ 1) = a 1 (z) yields

E*rp(V)—Ecp(V)
=
f cp'(z){G Z ° a 2 (z) —2G ° a,(z)G ° a z (z)} dz
°

+
f

a+2
p'(z)Gz° a Z (z) dz

(g) _ ^p'(z){G ° a 2 (z) — Go a,(z)} 2 dz


a

f
++

(z)G 2° a,(z) dz+ ^'(.Y+1)GZ° a,(.Y) d.Y


.a a

= J ^p'(z){G °
a+^

°
a z (z) — G ° a,(z)} 2 dz

a+,
+
f {ç'(z+l)—^p'(z)}GZ° a,(z) dz
a

>0

since cp is >' and convex implies that So' is >O and 7. This completes the
proof when n=2.
804 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

Now suppose that n >- 3. Choose two variables X and X; call them X, ;

and X2 for convenience. Keeping the notation J(t) = l x1 ,+ I IX2 . 0 , write

nF„(t) — a(t)=^l(t) - 1 a(t)— l ix .. i


i=3 ///
=rJ(t)-ä(t)

where ä = a — =3 l [x,^. ] is right continuous and bounded.


By the preceding calculations for n = 2, it now follows from independence
of the X,'s that

(h) E(c➢ (V)I X3, ... , Xn) _< E * (w(V)I X3, ... , Xn)

where E and E* denote expectations with respect to the (conditional) laws


of X,, X2 (given X3 ,. . . , X„) under the distributions (F,, F2 ) and (Z(F,+
F2 ), 2(F, + F2 )), respectively. Hence, taking expectations in (h)

(i) Ecp(V)<_E,cp(V)

where E denotes expectation under F = (F,, ... , F„) and E, denotes expecta-
tion under F, = (2(F, + F2 ), Z(F, + F2 ), F3 ,.. . , F„), and we have shown that
the "regularization operation" which consists of replacing two coordinates F
and F; of F = (F,,..., F„) by their average 1(F+F) increases Ep(V).
To complete the proof, first note that since a <_ V< a + n a.s. where a
sup, (—a(x)), p( V) is a bounded continuous function of X 1 ,. . . , X„, and
hence the function F =(F,,...,F„)-Ecp(V) =f • . . f cp(V(x)) dF,(x,)
dF„(x„) is continuous with respect to the topology of weak convergence of F.
But it follows from Lemma 3 that S,„F F uniformly, and hence weakly, as
m -> oo. It therefore follows from (i) that

Eco(V)^Es,FSo(V)^ ... <Es m Fc(V) by(i)


(j) /E *cp(V) asm-*oosinceS„F -. d F,

and this completes the proof. ❑

2. THE GENERALIZED BINOMIAL DISTRIBUTION

In this section we let X 1 ,. . . , X. denote independent Bernoulli random vari-


ables with probabilities of success p,, ... , p. Let X = X, + • • • + X„ and let
p = (p, + • • - +p)/n; we call X a generalized binomial random variable. To
form a comparison, we let Y denote a binomial (n, p) rv. Hoeffding's (1956)
key result is that the distribution of Y is more dispersed about its mean rip
than is the distribution of X.
THE GENERALIZED BINOMIAL DISTRIBUTION 805

Inequality 1. (Hoeffding, 1956) (i) If 0:5 a -5 np s b - n then

(1) P(a ^ Y<_ b) P(a <_ X <_ b).

This bound is always attained when a =0 and b = n; but otherwise it is attained


if and onlyifp,_•••=pn =A
(ii) If g is any function for which

(2) g(k+2)-2g(k+l)+g(k)>0 for all 0<_ k<_ n-2,

then we have

(3 ) Eg(X) ^ Eg(Y).

Note that (2) is satisfied by any convex function.


Proof. We only prove (ii). Let p = (p t , . . . ' p,,) denote the vector of parameter
values, let 1= (1, ... , 1), and for a fixed value of p let L— { p: pi' = np}. Let
g denote an arbitrary function on {0, ... , n}. We define f by f(p) = E(g(X) I p_ )
for p E L. Let a denote a value of p in L at which f is maximized. Let

(a) h;(k)=P(X—X;=kla) and h(k)=P(X—X; —X; =kIa).

Now
n
f(g) = Y, g(k)P(X = kl a)
k=0

n
_ Y- g(k)[(1—a )h (k)+a h (k-1)]
; ; ; ;

k =0

n
_ F, g(k) {(i —a )[(1—a )h (k)+a h(k-1)]
; ; ;; ;

k0 =

+a;[(1—aa )h(k-1)+ah(k-2)]}
n-2

g(k)h(k)}
k=0

f l—I

+(a ; +a; ) Y- g(k)[—h(k)+h(k-1)]^


k=0

+a a I Y, g(k)[h(k)-2h(k-1)+h i (k-2)]^
; ;

k-0 11
(b) =B;;+(a;+a;)C;;+a;a;D;;;
806 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

and note that we may also write

n-1
(c) C;;= [g(k +1) —g(k)]h(k)
k=0

and

n-2
D = E [g(k +2) -2g(k +1) +g(k)]h(k)
;;

k=0

using summation by parts.


Suppose now that the maximizing point a in L has a ; <a3 for some i ^ j.
We define a' E L by replacing a , a; by a ; + x and a; — x and leaving the other
;

coordinates unchanged. Since f is a maximum in L at a, we see from (b) that

x(a —a —x)D =f(a')


; ; ;; — .f(a)

:5 0 for all 0 <— x _< some S since a ; < a3


1 <_0 forall0Ixl<some S iffurther0<a,<a3 <1.

We may thus conclude that

1(d)
1=0
D;; :5
0 whenever a ; 0 a;
whenever a ; 0 a; with 0 < a ;, a; < 1.

We now assume (2). We saw from (d) that iff (p) = E(g(X) p ) is maximized
in L at a, then

(e) D;; 0 for any pair i ^ j having a ; ;t a .


;

Applying (e) to the representation for D;; given in (c) shows that h(k) = 0
for k =0,..., n —2 for any pair having a ; 54 a;. But we must always have
h (0) + • • • + h (n — 2) = 1. Thus a ; # a; for some i O j is impossible. Thus the
;; ;;

maximizing value a must have all coordinates equal. Thus a =/5, and (3) is
proved. ❑

Generalizations and related results are given by Samuels (1965), Gleser


(1975), Marshall and Olkin (1979), and Bickel and van Zwet (1980).

Exercise 1. (Feller, 1968) Show directly that for p in L we have Var (X)
-j 1 p; (1 —p; ) =nP- 1 p;--np(1 —P)= Var(Y) with equality if and only if
= pn =. (This is from Feller, 1968, Vol. I, p.231.)
BOUNDS ON F„ 807

3. BOUNDS ON F.

It is often useful to bound F,, above or below by some function of F„. Inequality
1 gives the probability bounds necessary to establish an extension of Inequality
10.4.1 to the case of independent but nonidentically distributed rv's. To simplify
notation, we state these theorems for the left tail of F. and F,, only; by
symmetry, analogous results hold for the right tail.
Let X,, 1 ,.. . , X,,,, be independent rv's with arbitrary df's F,, 1 ,.. . , F„„ as in
Section 1, and let

11 F j F,^ (X)
(1) =sup
F —co<x<oo Fn (x)

and

(2)IIF"II - = sup F,,(x)


F n X,:i X n:1 --x<m Fn(x) •

The following inequalities should be compared with those of Section 10.3 for
IIG,/III and J1I/6„ II L.1.
Inequality 1. (Van Zuijlen) For all n ? 1

P(IIF„II^A) s exp(1-1/A)(1/A)
(3) for A>1
F„ J 1—(1/A) exp (1-1/A)
and

/ F„ °° \ eAe ” -

(4) PI >A) <_ _a fork > 1.


1 — eke

Theorem 1. For an array of arbitrary df's, and any e >0 there exists a number
A =A(e) such that

F„
(5) P(F,, <_ AF„ on (—ox, co) and F„ >_ on [X,, ,, (X)))>_ 1—
:

for all n?1.


Proof. This follows immediately from Inequality 1. ❑

Proof of Inequality 1. By (3.2.36) and (3.2.37) the suprema in (1) and (2) are
bounded by the same suprema defined in terms of H,, and R. of Section 3.2;
hence it suffices to consider the case when all the F„ 's are continuous.
;
808 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

We first prove (3). For A> 1 and n >_ 1 we have

F (I >A) = P(F n (x) >_ AFn (x) for some —oo < x < oo)
P( n

=P —>—Fn (Xn : i ) for some i = 1,...,n^


n

(Xn:i Fn'( ))
i1 P

(a) = L P(Sn - 0,
f=1

where S. = ^^ Z; and 4 are independent Bernoulli (p' ) rv's with p;


Fnj (F (i /nA)) so that p= (1/n) , p; = i/n,1.
Since b = i> i/A = np for A> 1, by Hoeffding's inequality (Inequality 25.2.1)
with Y= Binomial (n, p), we have, for 1!5 i ^ n,

P(Sn >i)<_P(Y>i)

=PI YAI
np"
<_exp (—nph(A)) by Inequality 10.3.2

(b) exp ( —i 1 h(A)) =exp (—ih(1/A)).

Combining (b) with (a) yields

^n ^
Pt >_ A )< exp(—i1(1/A))
!
'

F.
\ _ exp (-1(1/A))
(c) 1 —exp (— h(1 /A))'
and (3) follows upon noting that h(1 /A) = 1/A —1 +log (A)> 0 for A>!.
We now prove (4). For A >_ 1 and n >_ I we have

Fn -A)= P (Fn(Xn:r)?AL'n(Xn:i—)forsomei= 2,...,n)


P( F n X.
X

=P(Fn (Xn:i )?A -1 for some i = 2,..., n)


(n/A)+l

P( Fn(Xn:i)^A n
1
=2

(n/A)+1
(d) _ I P(Sn ?n —i+1)
1=z
CONVERGENCE OF X,,, Y„, AND Z. WITH RESPECT TO /q^J METRICS 809

where S. = ^? Z; and ZJ are independent Bernoulli (p; ) rv's with pJ


1—F(F(A(i-1)/n)) so thatp=l—A(i-1)/n.
Since b = n — i + 1 > n — A (i —1) = np for A>1, by Hoeffding's inequality
(Inequality 25.2.1) with Y=— Binomial (n, p), we have, for 2:5i-5(n/ A) + 1,

P(S„> n —i +1)^ P(Y> n —i +1)


=P(n—Y -i -1)
n(1—p) ^A^
=P(
\ n—Y

<_exp ( — n(1 — p)h() by Inequality 10.3.2


)

(e) =exp1 —(i -1)Ah( =exp( —(i- 1)!(A)).


A ))

Combining (e) with (d) yields

Y exp(— (i -1)1(A))
P \IIFnn llx 1
2
(f) exp ( h(A)) —

—exp (—h(A))'

and (4) follows upon noting that h(A) = A +log (1/A) —1. 11

Open Problem 1. Formulate and prove appropriate analogs, for the case of
independent but not identically distributed random variables, of Theorems
10.2.1, 10.5.1, and 10.6.1.

4. CONVERGENCE OF X, Y,,, AND Z. WITH RESPECT TO


II 1 g11 METRICS

Suppose X 1 ,. .. , X,,,, are independent with

(1) arbitrary df's F,,,, ... , F.

Let c,, ... , c a ,, denote arbitrary constants (Inequality I below is true even if
the c,, ; are rv's such that the (c 1 /V?, X„ ; ) are independent). As in Sections
3.2 and 3.3 the weighted empirical process is

i n
(2) I(x)= CC'y' cni[l(_,x](XI)—F,(x)] for —oo<x<oo,
810 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

and F,, is defined by

in
(3) F„(x)=— c. F (x) for —co<x<oo. ;

Recall also that

(4) lE„ =Zn(F) on (—oo, oo) a.s.

for the reduced empirical process 1„ of the associated array of continuous rv's.

Theorem 1. (=' of Z,, of the ß„ 's in ; 11 /q1I) Suppose that

(5) q E Q (thus q A and q(t) /f \ on [0, 1/2) satisfies J ”

0
q([) -2 dt <oo

and that

(6) ( max cn / c'cl


; 0 as n -+ co.

Then

(i) Z. is weakly compact on (D, , JJ /qjj).


(ii) Any possible limit process Z must satisfy P(Z /q E C)=1.
(iii) The condition that the functions K. of (3.3.50) and (3.3.44) satisfy

(7) Kn(s,t)-someK(s, t) as n- co forall0<_s, t1

is necessary and sufficient for the process Z. to satisfy

(8) Z,,= some Z on (D, 2, 1 /q) as n -g oo.

Moreover, Z is necessarily a normal process with zero means and covariance


functions K.

(iv) Suppose (8) holds and a Skorokhod construction has been performed
so that II4 —7L lJ -* p 0. Let h E.2 . Then

Cl
I
(9)
o
hdn
7L
° f
o ,
hdZ

ä
for some ry 5 h dZ on the Skorokhod probability space.
CONVERGENCE OF X„, V,,, AND Z. WITH RESPECT TO 11 /qff METRICS 811

Convergence of X and V„ [recall (3.2.13)-(3.2.18)] in II /II metrics follows


from Theorem 1.

Theorem 2. (= of X„ and Y„ in II / q (I) Let X„ denote the reduced empirical


process formed from the nth row of any triangular array of row-independent
rv's having continuous df's. Then for all q c Q satisfying J q(t) -2 di <x:

(i) X. is weakly compact on (D, 2, II / q II ).


(ii) Any possible limiting process X must satisfy P(X/q E C) = 1.
(iii) The condition that the functions K. of (3.2.15) [or (3.3.5) with all c,,, = 1
and n ' Z G,, ; (t) = t] satisfy (7) is necessary and sufficient for the process
-

X, to satisfy

(10) x„= someXon('D,-9,II /II) asn->cC.


Moreover, X is necessarily a normal process with zero means and covariance
function K. If (10) holds, then with Y*=Y n 1L,/ n ,_, / „ t

[11) Y=—Xon(D,9, II /qll) asn-oo.

Corollary 1. (Sen, et al.; Rechtschaffen) If X is a limit process in (11), then


for all A>0,

(12) P(IUXII A)^P(IIUII^A)


where U denotes a Brownian-bridge process.

Thus the process X is "smaller” than U. Note that (12) is equivalent to


P( IIXII > A) <_ P( IIUJII > A), which is an asymptotic version of Inequality 25.1.2.
Recall that weakly convergent processes can be replaced by equivalent
processes that converge almost surely; see Theorem 2.3.4. If these processes
are used to define new rv's, we obtain:

Corollary 2. If (7) holds [with all c,, ; = I and n ' E; G,, 1 (t) =z, 0 < t s 1],
-

then there exists a triangular array of row-independent rv's X,, 1 ,. .. , X„„, n ? 1,


having continuous df's F„,, ... , F„„, n >_ 1, whose reduced empirical processes
satisfy

( 13 ) II(X„ — X)/qll ßa5. 0 and II( 11'*+X)/ql a.s.0 asn -goo

for all q E Q satisfying J, q(:) -2 dt <co. Here X is a process equivalent to the


process X in (10).

The nexit corollary is useful for proving limit theorems under local alterna-
tives. Recall that the df's F,,,, ... , F„„, n > 1, are said to form a nearly null
812 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

array if

(14)
Imax
i,;_nHF F; — F II -'0 as

Corollary 3. Let F 1 ,. .. , F , n >_ 1, denote any null array. Then there exist
row-independent rv's X",, ... , X"", n >-1, with these df's such that the reduced
empirical and quantile processes X. and V. of the associated array of con-
tinuous rv's satsify

(15) II(X — U)/qll - 0and JI(Vn +v) /qll - v 0 as n -goo

for all q E Q with Jö q(t) -z dt < oo. Here U denotes a Brownian bridge. Thus
the empirical process f (F n — Fn ) of Xn ,, ... , Xnn , n > 1, satisfy

(16) -P0 asn->cc

for the same collection of q's.

Our proof could be made to rest on the following inequalities, but won't.

Inequality 1. (Marcus and Zinn) Suppose F, .. .‚ F"" are arbitrary df's,


c",, ... , cnn are arbitrary constants, and

(17) ti >_ 0 is ", and right continuous on (0, oo].

For all A, 5>0 for which (18) is a number of (0, oo],

P(JIPEn11>A +3)
k \
^P max I E cni[ 1 (_0,.](Xni) — Fni]+^II > A +
( :^^ki5n
(18) <_ 8P(IT I> A/4)/[1 - 648 -2 ET„]
T

where e,, ... , e" are iid Rademacher rv's independent of X 1 ,. .. , X" n and

./'

(19) n
7' = C,C E,Cnt'Y(Xnt) =^0,
f
.'
W Z dF"^•

Recognition that ifr = 1 [0, 01 (F")/q(F") is the appropriate function to which


to apply this inequality is from Shorack and Beirlant (1985). This led to the
reduction 7L" of the ß";'s.
CONVERGENCE OF X,,, V„, AND Z. WITH RESPECT TO Q /qfl METRICS 813

Inequality 2. Let E > 0. Suppose (1), (2), and

f
/2
(20) q >_ 0 is 1' and right continuous to [0, 2] and [q( t)] -2 dt < oo.
o ,

Then there exists y = y,, q in (—oo, o)) so small that

(21) P \II9(Fn)IIY^ >

Any value y with f(7)<_ 9 where !ö [q(t)] Z dt < r 3/1024 will work; thus y
depends on n, c n ,, ... , cnn , F 1 ,. .. , F" n only through Fn . Of course,
e \
(22) l1
P(I Zq. 1o>e)e
///
for this choice of 0, independent of n, c n ,, ... , C nn, Fn ,, ... , F"".

Proof of Inequality 1. We let X's be independent copies of the X, ; 's. Define


C" ; = Cm / V? and then define

(a) X,,, = cni[ 1 ( - .,',( Xnf) — Fn,] ,


X 1 C,,[1 (_x,,*)(XI) — Fni]11, X,, ni

to be processes on (D R , SR). Let p denote the middle term of (18). Then


k II
(b) p=P(max 7^"; >A+S^
l_k^n ;_,
k
-5PI1max Ili El X ^i
kn ll >'k )/L
1—PI
Oma jI;k,XnjII>S/J
(c) = Pi
[ 1— P2]
by the Skorokhod inequality (Inequality A.14.1). Now

k \
(d) p,=P^1ma jXsjII>A
„I E I by (A.14.7)

k
C 2 P( 1 ma n (^' e i Cnil( - m, • ](Xni)Y'll
} ^/
/ n A
(e) - 4P1 y^ E1Cn1 1 (-m .I(Xni )+P ^ 2) '

by the Levy inequality (Inequality A.14.2).


814 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

Let RC n:; denote the process associated with the order statistic X ; thus ;;

X n:; = X (D); , where D(i) = D, 1 denotes the rank of Xn;. Since cr is \, we have

(f) I n
^ EiCnil(—^']\Xni)WII ^ max c1 (Xn:k)I Y-
i= 1 I--kern

k
^`
k

J=1
- D(J) CnD(j)I

(g) ^ 2 t mk
x L
n E D(j) CnD(j)I( Xn:j)
j1

using the monotone inequality (Inequality A.2.2) for (g). Thus Levy's inequality
(Inequality A.2.8) at step (i) gives

k /
max Z
P1- 5 4P lsksn ED(j)CnD(j)`Y(Xf:j) >A /4
j=1

k
(h) =4E max I Y- £D(j)CnD(j)'tl(Xn:j) > A/ 4 Xnl, ... , Xnn
jp( --5k,^n 1j= 1 1

4El 2P(I Y- ED(j)CnD(j)W(Xn:j)( > A/ 4 I Xn1,... ‚ Xnn)}


l J=1

I) = 8P( Y, ED(j)CnD(j)cu1()'n:j) > A14)


j= 1

.1) =8P ( IY, E,Cni+V(Xn;) >A /4) =8P(ITT I>A /4).


J _1
Also
rP S E max X n; I
Z l
2 -2 by Chebyshev's inequality
i=1 J
k 11 2
k) <_ S 2E max ji (ui X)
[ 1:5k-5n li=I —

by Inequality A.14.3 with cp(x) = x2

(1)
[1 :5k
^n i Ii
k2
= S -2 E max Y- E,(RC n; —RC n,)

H k
by (A.14.7)

2 1
=S -Z E max
[ 1!5k^sn Z
,.^(Xn,)-1(_^*^(X *n;)}
J
j ( k 2
<_ S Z E max 2111 E;Cnr1(-m.•l(Xn,)]
i =1

k1 ] Z II]
r k
<4S -Z EI max
L lsksn ;_1 e;Cn;l(_^.)(Xn;)I
since Z Z*
CONVERGENCE OF X,,, V,,, AND Z. WITH RESPECT TO 11 Jail METRICS 815

j k 2
(m) X83-2EI EiCni1(_.*](Xni)H
i =1

by the Levy inequality (Inequality A.14.2)


k 2
(n)<32S-ZE[ ax I ED(J)CnD(J)q'(Xn:j) ] by (g)
1--s ksn j=1

((( k Z
<_ 323 Z ES E j I max 1-DQ)CnD<j)+G(Xn:j)
l ll lsk<_n j _ 1

( Xn 1, ...

)( n /
(0) ^ 645-ZE {E} ED(j)CnD(j) 1 (Xn:j)] xnl, ... > x}}
l j=1

by Levy's inequality (Inequality A.2.8)


n 2)
645 Z E^ E ED(j)CnD(i)I(Xn:j)]
j= 1
n ( 2

(p) = 3 2E ^ EiCni (Xni )] ) = 64 ET n.


j1

Plugging 0) and (p) into (c) gives the inequality. ❑


Proof of Inequality 2. Let 0< 0 < 2 i i =1 [0 0t („) j q(F„ ), and A = S = E j2 in
Inequality 1. Now E(e ; ^Ji (X„ 1 ))=E{{E ; /i(X„ 1 )IX„ ; }}=E{0}=0, so

ETA = 1 cni Var , (Xni)} = 1 c^ni


[ 4 ^ z dFni
cc;= 1 cci=) 0

_ f^ ^2 di n = f 1[o,o](Fn){q(Fn)] -2 din
0 0

(a) ^ J 0
q -2 dt by (3.2.58).

Thus Chebyshev's inequality gives

(b) P(^TTI> A/4)'x (4/A) Z ET^„<_ (64/a a ) f o q -2 dt.

Let y be such that F„ (y) < 0. Then plugging (a) and (b) into the Marcus and
Zinn inequality (Inequality 1) gives

(23) P(jj 1 Enjq(Fn)jjY^> e)


e (' B
(512/e 2 ) o q -2 dt/[ 1 _(256/2)
J o q-2 dt]
(c) <(E/2)/jl—E/4]<E
816 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

provided 0 is so small that

(d) J B q -2 dt < s 3 / 1024.

We have specified y so that F,,(y)-5 0 for the 0 of (d). ❑


Proof of Theorem 1. Now Theorem 3.3.3 tells us that

(a) Z,, is weakly compact on (D, .9, ^ P 11),

and that

(b) Z„ (some Z) on (D, 2, I1 11) if and only if (7) holds.

Moreover, it guarantees that Z must be a normal process with 0 means,


covariance function K, and P(Z e C) = 1.
Letting Z. -= Z„/q we have from the c,- inequality at (e) that

( a a
(e) EZ„(S, t] 4 2 3 j Z4(t)t] +EZ,,(S)a
\q(s) q(t)) }

(f) g 13(t — s)2+(t —s)[max c /c'c]


< L q 4 (t)

35 2 +s[max c „ /c'c] ; Z 1 1 a
+ s 2 s \q(s) q(t)!

3(t —s) 2 Z 1 _ 1 a
+3s
c 8L q4(t) (q(s) q(t)/
r 2
<48(` q u ( ) -2 du )
J
\s /

(2 4 ) 2517 f ^ q(u) -2 du1

since q on [0, Z] implies

(25) (t — s)lg 2 (t)C j j [q(u)] -2 du for gEQ


CONVERGENCE OF %„, V„, AND Z. WITH RESPECT TO 11 /q METRICS 817

and f/q(t)y, on [0, 1] implies

1 1 2 _ s
[ I_ q(S) ] 2
t _ q(s) 2
s Lq(s) q(t)^ q 2 (s) q(t) q 2 (t) L 1 q(t)J

z(t)J
t I s 2 t—s
q 1 —t
q2(t)

(26) <_ J [q(u)] -2 du for q E Q.

Now (24), Exercise 2.3.6 and (b) show that (i) -(iii) hold. The proof of Theorem
3.3.3 still suffices to imply (9). This completes the proof.
We make one further comment on the continuity of the process Z =1/ q
inasmuch as we appealed to the nonstandard Exercise 2.3.6. On any subse-
quence n' where Zn ' -* rd. Z we have convergence of third moments. Thus

E^Z(s, t]1 3 = lim EIZn •(s, t]1 3

lim (EZn (s, t])' ,


n'-. oo

(g)- { 7 f 5
q(u)du}1

by (24). Thus the familiar Exercise 2.3.5 shows that P(Z E C) = 1; that is
P(7L/qEC)=1.
Our original proof of this theorem used (22) to show that for 0 small enough
we have

(h) r)< e for n' sufficiently large;

this is the key step. We can replace Z„- by 7L in (h) since P(7/q E C) = 1. The
rest is trivial. ❑

Remark 1. Theorem I is from Shorack and Beirlant (1985). The same con-
clusions hold for the processes IL,, of the a n; 's, but the additional hypothesis
max.-, max,^ n nc, /c'c<M is required; see Shorack (1979a).
;

Proof of Theorem 2 and Corollary 2. All but (11) follow immediately from
Theorem 1.
818 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

Now the X. part of Corollary 2 follows from (10) and Theorem 2.3.4. To
prove the 0'„ part of Corollary 2, and hence (11), note that by (3.2.17)

jI(V^ +X) /glIsII[Xn(C„') — X(G „ ')]/gJIi „ ^ n


+ II[X( 6 n') — X]/q "" i-n„n
+'jiII[Gn o G' — I]/glliin /° +IIX/gIlö + jX/9j^i yin n

(a) = R,+R 2 +R 3 +R 4 .

Now van Zuijlen's inequality (Inequality 25.3.1) implies that for every E > 0 there
is a A = A 6 > I such that

(b) P(Gn'(t)^At for I/nit<1)?1 —E.

Hence

I
1 /n q
'/2

Ilq(A )I '/2

0 o
su I/2
q(At)
q(t)^
Vl

(c) v since q ) \ and ,1 > 1,

and by symmetry about 2 (c) yields

R,
—I [Xn(C.,') —

q(G )
X(C„')] q(G ')
q
ll'"-
Ihn

G
q vn

S^11(X„ SC)/q11
-0 — , as n-'ac'

by the X. part of Corollary 2. Using the same idea with (c) and Theorem 3.2.1
shows that R2 -+‚' Now R 3 < 1/[✓n q(1/ n)] -* 0, and R4 - p 0 by continuity
of X/ q with X(0)/ q(0) = 0. Hence the right-hand side of (a) --> p 0 as n - o0
and the V*„ part of (13) holds. But (13) implies (11). ❑
Proof of Corollary 1. From (3.2.15) and n' ^i Gn; (t) = t for 0 s t 1 it
follows that

S t—st
A —K.(s, t) n Y__
=—
;
n

i
G n; (s)Gn; (t) —st
(a)
n ;_,
n
_ — E (Gni(S) — S)(Gni(t) — t).
CONVERGENCE OF X,,, V,,, AND Z. WITH RESPECT TO 1q11 METRICS 819

Hence for any 0 <_ t i <_ • • • ^ tk ^ 1, with

II K(tr, t5)II = lim IIKn(tr, tS)II


n-.co

and

M„ =11 tr n is tr ts II,

Eq. (a) implies that

n ( / / /
(b) 1,—lv= l im I i Gni(tr)—tr)(Gni(ts)—ts)II
(
„-. = n i =1

is positive definite. Therefore, by Anderson's inequality (Inequality A.7.3)

(c) P( max IZ(tj )I<A)-P(max IU(t,)I<A)


1J-k Ist <_k

for all A > 0. Letting k = 2, t. =j/2m for j = 1, ... , 2' and letting m -* 00, (c)
implies (12). ❑

Exercise 1. Let T. be as in (18) when ^y = I [o , oj (Fn ) with 0 < B s 2.

(i) Show that

( z
(27) P(±TT >, 1)<expl — for all A>0,
2B ^AIBcII))

where 11cI) °max {Icn;I/ c'c: 1:5 i s n} and


2aresinh(A) 2( 1+A Z -1)
(28)

(ii) Determine properties of r analogous to those of Proposition 11.1.1.


(iii) Establish an analog of Inequality 11.2.1.

Hint: Compute E(exp (rT,)) as in Bennett's inequality (Inequality A.4.3),


where g(x)+g(—x), with g(x) _ (ex —1 —x 2 )/x 2 , is now the key function.

An Exponential Bound for the Weighted Empirical Process 71„

We now turn to an exponential bound for the process Zn.


820 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

Inequality 3. (Marcus and Zinn) For all n ? 1, c = (c ,, ... , c„„), G =


(Gnt, ... , Gnn), and A >_O,

(29) P(IIZn1I'A)<(1 +2 27ri1) exp(— A 2 /8).

Here Z,, can be either Z. of the a„ i 's or of the ß „ i 's.


Proof. First note that we may assume c'c = 1 without loss of generality. We

now take Y (t) = C„;1[di] and let Y;(t) = c„ i 1 [ &< < ^ where {a;, } is an indepen- ;
dent copy of {a i }. Then, since c'c = 1, we have

(a) Zn= (Y; — EYi)


i=1

and hence by an application of (A.14.3) and (A.14.14) with cp(x) = ems`,

E exp ( r IIZ II)C E


exp (rlli ' EiCniI[] II +r ll ^ EiCnil[a.; `—'III/
EQ E« -EE exp ( rll E^ Ercn,l[a ^ ']
lll +r li i
/
_— E^ Ee exp 12r E i cni I [a ^ s•^I^
<[ J
n 1/2
Ea'[EE exp 2ri L Eicnil[ •t
i=l />
by Cauchy-Schwarz on E.
n
<E.EE ex p (2r ll Eicni l [a * ] )
by {a„ i } {a'} and Cauchy-Schwarz on Ea
n

5 E.E. exp ( 2 r1)


/ j
(b) = E. exp (2r max I ED CnD,, ;
Isj<„ i=1

Since {E i } and {a „ i } are independent, {E D,,} is again a Rademacher sequence,


and by Levy's inequality (Inequality A.2.8)

P (
I` —n i=1
— j`
j

max I Et CnD ( ^ x) ^ ni 2P Y_ x
(
\ i=1
n

Eicni ^ )

(c) <_ 4 exp /_-- )



MORE ON L-STATISTICS 821

by elementary calculations using (e'+e - ')/2exp(r 2 /2) and c'c=1. Using


(c) together with (b) gives, after integration by parts,

(d) E exp (rIJZ„ II) < I +8r


1"^e
-x
2 /2 e2rz
dx < 1 +8 21rr exp (2r 2 ).

Finally, by the basic inequality (Inequality A.1.1) with g(x) = e'.

P(IlZ.^1I ^ A) e 'E exp (rflZ,11)


-

(e) <e-'A(1+8 21rre 2 r z ) by(d).

The choice r=,t/4 in (e) yields (27). 0

Exercise 2. Marcus and Zinn (1984) consider

(30) Z(x) = E c;{1[x;sx]—P(X;^x)}, —ao<x<o3,


i =1

for arbitrary independent rv's X1 , X2 ,... and arbitrary constants c,, c2 ,....
With

(31) a ^ — \ c? / loge ( ci )

they show that

Z
(32) lira sup <oo a.s.

Use Inequality 3 to establish (32) in the case 1:, , c, < oo. (Note the reduction of
Section 3.2.)
Marcus and Zinn in fact consider the more general process

n
(33) {^1,1tx5t—E(RI1tx1^xt)}/9(x), —oo<x <00,

and establish conditions under which the 11 Jj of this process is a.s. O(a) for
appropriate sequences a,, -> oo, here (rl . X ) are independent vectors.
; ;

5. MORE ON L-STATISTICS

In this section we present a central limit theorem for linear combinations of


order statistics of independent but nonidentically distributed random variables,
and explore some of its consequences.
822 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

Suppose that X,, ... , X,,. are independent with continuous df's
F 1 ,.. . , F... Let c.,..., c,,,, denote known constants, the scores, and define
J,, byJ,,(t) =c,, ; for ( i-1/n <t^i/n,1 i<_n. Let h be a fixed function. Then,
if X.,, ... , X,,,, are independent rv's with df's F,, ... , F.,,, the linear combina-
tion T„ we want to consider is defined by

i n
(1) T„ c,,;h(X,,:;)
n =1;

in
n ;_ i

where X,, , <


: < X,, ,, are the order statistics of the X n:; 's. With g„ = h° Fn',
:

we have

(2) Tn =f t g„(G '(t))Jn(t) dt

= f o' gn(G (t)) d n(t),

where, as in (14.1.4)

(3) "Vn(t)Mf J„(s)ds for0<t-1.


l '/2

A natural centering constant is

(4) µ „(J,,)= f l g°(t)J,,(t) dt = f


o
o
l g. d'Y„•

If J„ -' J, it is natural to replace µ„(J„) by

(5) µ „(J) =
f o,
g„(t)J(t) dt = ^ th(F;')J(t) dt.
0

A version of the following theorem for bounded scores was given in Shorack
(1973).

Theorem 1. Suppose that the functions J. and g„ = h °Fes' satisfy assumptions


19.1.1 and 19.1.2 for all n>1 (so that Ih ° F;'i < D on (0, 1) for i=1,2 and
;

all n >_ 1). Further, suppose that the covariance functions K. of the reduced
empirical process X,, satisfy

(6) K„(s,t)-*K(s,t) asn-^ooforall0<_s,t^1.



MORE ON L-STATISTICS 823
Then

(7) T.—An(J))=a- 1 Jx dgn=N(0,ß„) as n - co,

where

(8) On 2 (J, K, gn) = JJ 1 K(s, t)J(s)J(t) dgn(s) dgn(t).


0

In addition, if

(9) r— u. z (J, Kn, gn) _ f^, n(s,


1 K t)J(s)J(t) dgn(s) dgn(t),
0 0

then

(10) o 2,,—T„-+0 as n-*.

Corollary 1. If lim inf ra o, >0 and the hypotheses of Theorem 1 hold, then

/i( n—/In (J))


(T -*N(0,1) as n-.
un .

In view of (10), Qn in (11) may be replaced by rn .

Exercise 1. Give a proof of Theorem 1 following the proof of Theorem 19.1.1,


but using Theorem 25.4.2 and Inequality 25.3.1.

It is of interest to compare the variance T 2, = v 2 (J, Kn , gn ) in (9) with the


corresponding variance (19.1.27) in the iid case with Fn; = Fn for all i ^ i <_ n.
Thus a natural comparison involves letting g = = h - F in (19.1.27). Setting

(12) K0 (s,t)=snt —st

so that

ao- a 2 (J, Ko, gn) = JJ (s A t — st)J(s)J(t) dgn(s) dgn(t),


0 0
824 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S

it follows as in the proof of Corollary 25.4.1 that

o —Tn =
r jo At st
-

J 0
l [S — —
K,,(s, t)]J(S)J(t) dgn(S) dgn(t)

J
1
—(Gni(S)—s)(Gnf(t)— t) ✓ (S) ✓ (t) dgn(s) dgn(t)
0 .n
,
il

I 2

[Gni(t) t}J(t) dgn(t)^


n i=1 I0 —

(13) = 1 j r^ LFni(X) — Fn(x)lJ(Fn(x)) dh(X)1 2 .


ni=I l t

Thus we always have o — T2 > 0.


An interesting special case is the contamination model with (1 —s)n of the
F, ; 's equal to some df G, and sn of the FF; 's equal to some df H. Then
F = (1— --)G+ EH = F, and, with h(x) = x, it follows that gn = F 1 , and hence
F

in this case (13) reduces to

(14) vo— T „ = E(1 —s)I (G(x)—H(x))J(F(x)) dx 4 2 .


L JJ^ J
Note that in this contamination model the covariance function K n of X. is
given by

(15) K„(s, t) = s A -- t Y_ n ;_,


n
G1(s)G(t)

= s A t—(1 —e)Go F - '(s)Go F '(t) —BHo F - '(s)H° F '(t) - -

K(s,t) foralln>_1

so that the hypothesis (6) of Theorem 1 holds trivially.


Equations (13) and (14), the following examples, and a discussion of some
implications for robust estimation were given by Stigler (1976).

Example 1. (The sample mean) When h(x) = x and Jn = 1 for all n >_ 1,
T = X, the sample mean, and (13) yield the well-known formula
T

Oro Tn -- L (t'Lni i ) 2 ,
ni =i

where ii ni = I x dFni (x) and µ = l x dF (x). P

Example 2. (The trimmed mean) When h(x) =x and J(t) =


(ß —a) '1 [ , ß] (t) with 0^a^ß^1, let A= F„'(a), B =Fn'(ß), and define
-

MORE ON L-STATISTICS 825

F T to be the distribution F truncated at A and B:

0 x<A
F T (x)= F(x) A<_x<B.
1 x>B

Then, if E(X I F T ) denotes the expectation of a ry X with distribution F T,

fb„ (J)=[E(XI P T ) — aA — ( 1— ß)B]l(ß — a),


and

o"o—T„=(ß—a) zl Y_ [E(XIF T ) — E(XIF T )] z .


n
For the contamination model this becomes

Qp—Tn= E(1—E)(ß—a) 2 [E(XIHT)—E(XIGT )]z.


Example 3. (The median) Let h(x) = x. If we assume that densities f„ of ;

the F„ ; exist, let f„ = n ' ^;_, fn;, and suppose that f„ ° F-'(1/2)> 0, then,
-

letting at and ßJ,, Example 2 yields

12(J)- 1)

and
n -

aÖ — Tn_* [f ° Fn ' (z)} 21 [Fni°Fn'(z)-1i2


n
the contamination model this becomes

o'p—T Z„-e(I—e)[(I—s)g(0)+eh(0)] 2 [G(0)—H(0)] 2 ,

where (1—e)G(0)+eH(0)=2.

Example 4. (The mean deviation from the median) Let h(x) = x, J(t) _
sign (t — 2), and suppose that 0 = Fn' (z) is unique. Then

14 „(J)= J Ix-0I dF„(x)=E(IX — Oil Fn)

and

OÖ—Tn — L [E(I X — 0II Fni)—E(IX —BII Fn)}Z.


n ;_ 1
For the contamination model this becomes

0'ö—T„=e(1—e)[E(IX- 0IIG)— E(IX- 0II H)]2.


CHAPTER 26

Empirical Measures
and Processes for
General Spaces

0. INTRODUCTION

Let X,, X2 ,... be iid X- valued random elements with induced distribution
(probability measure) PEP on X. Here (3C, si) is a measurable space and P
denotes the set of all probability measures on X. The empirical measure of
X1 ,.. . ‚X is

(1) Pn =n

where 8,, is the measure with mass one at x E X, 5(A) = 1„(x) for all A E Si.
The corresponding empirical process is

(2) Zn =v'i(Pn —P).

For a (measurable) function f on X, let P(f) = If dP for any measure or signed


measure P. Thus

(3) zn (f) =1 f dZ„ =Vi J fd (P n — P)

= vT [P (f) - P(f)]

=n ' / 2 E [.f(XX)-Ef(X1)l.
-

For a fixed function f it follows from the strong law of large numbers, central
limit theorem, and law of the iterated logarithm that: If Ef(X) <oo, then

(4) Pn(f)—P(f)=f(X)—E.f(X) ^as0 as n - co;

826
GLIVENKO-CANTELLI THEOREMS VIA VAPNIK-ÖHERVONENKIS 827
and if Ef2 (X) < oo, then with o- 2 (f) = Var [ f(X )], both

(5) Z„(f) -+dN(0,o'2(.f)) as n-+oo

and

(6) Z,,(f)/b,,—[—o(f),o(f)] a.s. as n-+co.

In this chapter we will view 71„ as a process indexed by f€ where SW is


some class of functions on X. An important special case is that of indicator
functions of some class of sets 19 c s49: _ { 1 c : C E W C .si}. Then we will also
write Z,,,(C)=Z„(l c ) for CE `P. Note that when X=R d and 16=
°
{(-oo,x]:xER }, then F,, (x)=P„( -oo,x] is just the "usual" empirical df,
F(x) = P(-co, x,] is the df of the X's, and Z„ (x) = Z,, (-m, x] is the empirical
process.
Recall that {7L„ (f): f E .° "} was considered in the case of a compact set X
in Section 17.3. The chaining argument used there is a commonly used tool in
proving limit theorems for Z„.
Our main object in this chapter will be to give statements of uniform versions
of (4) -(6) for general sample spaces X, or, in other words, appropriate analogs
of the Glivenko-Cantelli theorem (Theorem 3.1.3), Donsker's central limit
theorem (Theorem 3.3.2), the strong approximation Theorem 12.1.2, and the
law of the iterated logarithm Theorem 13.3.1. In most cases only appropriate
references will be given. The expositions of Dudley (1983), Gänßler (1983),
and Pollard (1984) provide further detail and elaboration on these themes.
The rough idea is that uniform (in % or W) analogs of (4) -(6) hold if the
class of functions F or class of sets W is not "too big.” There are two basic
ways of saying that or 16 is not "too big": One involves a strictly com-
binatorial idea, that of the Vapnik and thervonenkis index; the other involves
counts of numbers of sets needed to cover 9, or the metric entropy of . Both
of these will be described in more detail below.

1. GLIVENKO-CANTELLI THEOREMS VIA THE


VAPNIK-CHERVONENKIS IDEA

If f is some class of sets, or 3F is a class of functions, when does a Glivenko-


Cantelli theorem for W, or for 9, hold? In other words, when does

(1) D„(` ) IIP, - PII,= supIP,(C) - P(C)I


CEW

or

( 1 ') Dn(.9)= I1P,, - sup IP,,(f) - P(.f)I


fE19
828 EMPIRICAL MEASURES AND PROCESSES FOR GENERAL SPACES

converge a.s. to 0? Throughout this section we assume that D. (` ') and D. (Y)
are measurable. A variety of generalizations of the classical Glivenko-Cantelli
theorem are available. Here we consider extensions to rather general classes
based on the following combinatorial ideas.

Vapnik—Chervonenkis Classes of Sets

Let X be the space in which observations X 1 ,. .. , X,, take their values, and
let W denote a class of subsets of 3r. For any finite subset F of X, let

(2) #,e(F)=#{FnC:CEW}

where # denotes cardinality. Thus # ,,(F) denotes the number of different


subsets of F that can be obtained by intersecting F with members of W. If
#,c (F) = 2« F so that IC carves out all subsets of F, then we say that f shatters F.
For r= 0, 1, 2, ... we define

(3) me(r)=max{#^(F): FcX, #(F)=r}

and call it the growth function for . Note that m^ (r) 2'. Let

mf {r: m c (r) <2 r }


(4) v— V—
co if m, (r) = 2' for all r < oo,

and

(5) s =sg=sup {r: m,,(r)=2'}


=ve -1,

so that v denotes the smallest integer for which some subset of size v cannot
be shattered by W. The integer v g is called the Vapnik-Chervonenkis index
number of W If vw <‚ then T is called a Vapnik- ehervonenkis class, or VC
class.

The Main Result

Theorem 1. (Vapnik-Chervonenkis) Suppose that '6 is a VC class of subsets


of 3r, and that Dn (19) = JP,, — PJJ,, is measurable. Then

(6) Dn () -4 a_5. 0 as n - oc uniformly in P E J'.

That is, for every e > 0,

(7) sup Pr[maxD m (W) >s]= sup Pr[max11P m —PII w ?r] -*0
PE 9 min P€P m?n

as n-c .
GLIVENKO-CANTELLI THEOREMS VIA VAPNIK-HERVONENKIS 829

Several inequalities provide the key to the proof. The first is the classical
Vapnik-Chervonenkis (1971) inequality and the second is a variant thereof
due to DeVroye (1982).

Inequality 1. If W is a VC class of subsets of är, then for all A > 0 and n >_ 2
we have

(8) Pr(D„(W)>_A)<4m,,(2n)exp(—nt2/8) for n?2 /FA Z

and

(9) Pr(D„(t)?1t)<_4mw(n2)exp(4A+4A2)exp(-2nA2)
for n>(2vA - ')

for the growth function m e of (3).


Proof of Inequality 1. Let P. and G„ , denote the empirical measures of
X 1 ,. . . , X„ and X +1 ,. .. , X,,,,, respectively. Then for A >0 and 0<0<1 we
have

(10) Pr(IIP„ — PII,6>A) Pr(IIP„ — Q„•II,>( 1- 8)A)/I1 — Z 2 I


l 4EA n/

provided 40 2 A 2 n'> 1. To prove this, let X ( ' ) = X, 3E ; and 3E (2) = X"+; ; and
let P ' and p(2) denote the measures on X' and 3E (2) induced by (X 1 ,. . . , X„)
( )

and (X„ + , ... , X„ + „•), respectively. Then

Pr(IIP„ — O„•Ikk>(1 - 9)A)= j y 1 tiP^ u>( , e)AId{P(,)x p(2))

(a)
f' [f.
(1) r^Z^ 1 [IIP^-Q^ III>( , e)a) dP (2)] dP(,)
using Fubini's theorem

where
(b)
A(n=Lx(`):11P„—P11 >A].

Now if x ( ' ) E A ( ' ) , then there exists a set C0) E T such that IP„ (Cx ww>, x ( ' ) ) —
P(Cx ('))I > A. Thus if x (z) e Az23,, where

(c) A) = [x2: x(2)) — P(Cx ( , ))I < OA],


830 EMPIRICAL MEASURES AND PROCESSES FOR GENERAL SPACES

then IMP „(•,x")—Q „(•,x (2) )l^ W >(1 -8)A. Thus, from (a),

Pr(IJP. — Qn>( 1 -8 )A) > dP


12) ] dP (,3
J [J

__P f ')
A
2 (A9) dP^'^
x

(d) > JA Pr(jBinomial (n', P(C,rmm))— n'P(Cm)l ^ n'OA) dP m ' m

fA [1-40212,]
dP” by Chebyshev's inequality

(e) = r 1 _ 482A2n l P c' (Ac>>)= I 1 - 482 Ä , I Pr(IIPn


2n — P1116> A)

as we sought to show in (10). Note how (10) changes a one-sample problem


into a two-sample problem. This is the first key step of Vapnik and Chervonen-
kis. (DeVroye used the (1 — O)A shown above, whereas Vapnik and Chervonen-
kis used A /2.)
Now for the second key step. Note that since

(f) Pr((jPn—On'it >(1 -8),l)


= E {Pr(II Pn — Qn.II > (1— O)A 1 X,, ... , X ± )},

it will suffice to

(11) find a good exponential bound for


Pr( IIPn—On'11I>(1— O)A IX =x)

where X = (X I , ... , X„ + „-) and x = (x,, ... , x„ + „'). For this it will be handy
to have the notation so that
n+n'
(12) Fn+n' (C)= l c (x ; ) /(n+n') for CE 16

= (the empirical measure of x,, ... , x + ,,.).

We now note that for a fixed C we have


We

P„(C) — On , (C) =P„ (C) — [(n +n')Fn+n'(C) —


nPn(C)]/n'

(13) =n t, n [P n(C) — un+n' (C)]


n' 1 n
;_, ;_,
+ 1 n+n'
_— E 1 C(X1) 1
C(Xi)^. —
n n n+n'
GLIVENKO-CANTELLI THEOREMS VIA VAPNIK-CHERVONENKIS 831

Consider the following urn model for (13). Imagine an urn that contains n + n'
balls of which

(14) k=kc=(n+n')^nn+n•(C)

bear the number 1 while the rest bear the number 0. These are thoroughly
mixed (since X,, ... , Xn+n • are iid). Then n of them are chosen at random
without replacement and designated to comprise the first sample. The number
W of 1's in the first sample has the same distribution as does the conditional
distribution of n P n (C) given X = x. Thus

(15) (nP.(C)I X =x)= W= (a Hypergeometric rv).

We thus have the fundamental conditional rewriting of (13) as

(16) (P n (C)—Q .(C)I(=x)_n+n' _W EW


n' [n n

Applying Hoeflding's corollary (Corollary A.13.1) and then (A.4.9) to the ry


W of (16) gives

(17) Pr(IPn(C)—Qn'(C) I>(1 — O)A IX= x)


(IW—EWI n' )
=Pr` > ,(1—B)A J
n n+n
(g) (Pr
n,Binomial
k/(n + n')— nk/(n + n'))I n' (1
>
C n n+n'
n ' 2
(h) ^2exp(-2n(n+n,(1-8)A)) for any CE T

(18) =2exp(-2n(1—B)2A2(1-1/n)2) setting n'=n 2 —n


(i) <_2 exp (-2nA 2 +40nA 2 +4A 2 )
(19) =2exp(-2nA2+4A+4A2) setting B=1/(nA).

We also note that for n'= n 2 — n, 0 =1 / (nA ), and n>_2, the last term in (10)
satisfies

(j) [1-1/(4O2A2n')]-' 2.

(DeVroye specified n'= n 2 — n as above whereas Vapnik and Chervonenkis


specified n' = n.) How can we use the exponential bound of (19) where a single
set C is considered to obtain a bound on (11) where (1 1k is required?
?
832 EMPIRICAL MEASURES AND PROCESSES FOR GENERAL SPACES


XI

Figure 1. For the set C shown, the set FF of (20) has # FF = 4.

Now for the third key step. Each set C E 16 defines a set (see Fig. 1)

(20) F =Fc =Cn{x,,...,x„ + „•}=Cn{x,,...,x „2}

where the k of (14) satisfies k = # F. Whereas # 16 is typically very large


(even infinite), we note that #{Fc : C E f} is typically much smaller. Note
that #{Fc : CE f} is always bounded by 2" Z , and in fact is bounded by the
Vapnik and Chervonenkis index number m c (n 2 ). The ry W of (15) is in reality
a ry W. But as C varies over all CES, at most #{ Fc : C E €} < m,, (n 2 )
different rv's W are required. Moreover, for each ry W the bound (19) holds.
Thus

(21) Pr(IIP„—Q„•II R >( 1 — B ),l X= x) <_ ►n^(n 2 )2exp(-2na z +4A +4A 2 ).

Since this is independent of x,

(22) Pr(IIP„— OIl, g >( 1 — O),1) <m^(n 2 )2exp(-2nd 2 +4,1 +4A 2 ).

Combining (22) and (j) into (10) gives the inequality of (9).
Most of this was learned from a P. Gänßler seminar, given at the University
of Washington, in which a slightly different inequality was proved. ❑

One reason for the importance of VC classes is that when v < co it is possible
to obtain a bound on the growth function m(r) defined in (3) that is of an
order far less than 2'. This is the fourth key step.

Inequality 2. (Vapnik and Chervonenkis) If s ° s w <oo so that W is a VC-


class, then, with r C,, = Y_-'= o (i), we have

(23) m v (r)<_,C< s <_zrs /s! for r >_s +2.

Proof. See Dudley (1978) or (1983). ❑


GLIVENKO-CANTELLI THEOREMS VIA VAPNIK-C`HERVONENKIS 833
Proof of Theorem 1. Now
00

Pr[supDm (l)?E]<_ Pr[D,„1


min m=n

(24) < 6 exp (4e+4e 2 )(s!) -1 m 2 s exp (-2me 2 )


m

for all PE 9 by applying (8) with the bound (23) on m w (n 2 ). Thus (7)
holds. ❑

Remark 1. Note that for all sets C we have used Hoeffding's crude bound
(A.4.9) on the hypergeometric ry We of (18). For most sets, sets C having
the k c of (18) larger than 1, we can use the far stronger bounds of Inequality
11.1.1. However, the combinatorial method of proof given does not allow us
to claim that with high probability k c is substantially larger than 1. The next
section will incorporate such ideas in a different context.

Exercise 1. Use 0 = z and n' = n in the above proof to show that (8) holds.

Exercise 2. By considering (24), show that under the hypotheses of Theorem


I we have

(25) lim <_ s a.s.


n-.^ log n

[If we had a maximal inequality at our disposal, we could go to subsequences


and replace log n in (25) by log t n. Even so, the constant of the right-hand
side of (25) is excessive.]

Examples of VC Classes

Example 1. If ? = R' and W _ {(—oo, x]: x E R d }, then v w = d + 1. If W =


{(a,b):a ; b ; ,i=1,...,d}, then v,,=2d+1. ❑

Example 2. If X = R d and 16= {all closed balls in R d }, then v e = d +2. (See


Dudley, 1979.) ❑

Example 3. If 3r = R d and IC = {all half-spaces in h} = {x E Rd: (x, u) > c,


u e R d, c E R}, then v c = d+2 (see Dudley, 1979). ❑
Example 4. Let H be an m-dimensional real vector space of functions on a
set 3r, f a real function on X, and Hf = { f + h : h E H}. For any function g e G
on 3r, let pos (g) = {x: g(x) > 0} and pos (G) = {pos (g): g e C}. Then if 1
pos (Hf ), v w = m + 1 (see Dudley, 1978, 1983). ❑
834 EMPIRICAL MEASURES AND PROCESSES FOR GENERAL SPACES

Example 5. If är = Rd and T = {all polyhedra with at most k faces}, then


vcg <03.

°
Example 6. If X = R d and W = {all lower layers in R } = {B: x E B and y_ ^ x
implies yEB}, or if 'W = {all convex subsets of l }, then * = oo. v
Exercise 3. Let X 1 ,. . . , X,, be iid d vectors, let u E S d- ' be a unit vector in
R d, and consider the empirical df of u • X,, ... , u • X":

u)— n ' Y_" l . ,, for x€ R, u€ S '.


0="(x) ° 0="(x, - tu X X] d-

If F(x) = F(x, u) = P(u • X -5 x), use Inequalities 1 and 2 and Example 3 to


show that if din -^ 0 then

D" =sup sup IF (X, u)— F(x, U)P^ a.s. 0 as n - 00.


U z

This is related to "projection pursuit w ; see Diaconis and Freedman (1984).

For strengthened versions of Inequality 1 in the classical case of lower left


orthants, see Kiefer (1961). In the general case, see Alexander (1984).

Equivalence of a .8, and -> p for D(')

Even though s c = oo, it may still be true that # ( {X,, .. .‚X})<2 with
probability converging to 1, and hence the Glivenko-Cantelli theorem may
continue to hold for such classes. The following theorem addresses this
possibility.

Theorem 2. (Vapnik-thervonenkis; Steele) Suppose that D" (W)


SIP" — P11, is measurable. Then the following are equivalent:

(26) jP"—PII W 3 P O as n - oo;

(27) I1 f°" —111 4a.s.0 as n - 03


(28) n 'E # (
- log w {X,,...,X"}) -.0, as n- ' .
Proof. See Steele (1978) or Dudley (1983, Section 11). ❑

Note that (28) holds for any VC class W (uniformly in P). In general,
however, the convergence in (26) and (27) need not be uniform in P.

Exercise 4. Let

&" ° c[(P" (A), P " +1(A), ...): A E 4]



GLIVENKO-CANTELLI THEOREMS VIA METRIC ENTROPY 835
so that 9„ is the o-field generated by the set of observations {X 1 ,. . . , X„}
plus the tail of the sequence X„+1, X,, +2 .... .

(i) Show that (II P„ — P II T , F!,1 ), n > 1, is a reverse martingale.


(ii) Use this to prove the equivalence of (26) and (27) in Theorem 2.

2. GLIVENKO—CANTELLI THEOREMS VIA METRIC ENTROPY

Entropy Conditions

A useful way to say that a class of functions . is not "too big" is to control
the growth of some measure of the number of sets needed to cover JW. Several
different versions of the metric entropy functions have proved useful, and we
now define a few of the more common variants.
Let E > 0, 0< r <— oo and suppose that P is a probability measure on (3r, sq).
Let IIJIIY = If Ifl dP}'I'r. Then define

for some f,, ... , fk E W,(P)


(1) Nr(e, /, P)=min k:
min Ilf—f11 r <e for all fE

The function

(2) Hr(e) = Hr (e, 1, P) = log Nr(E, 9, P)

is called the metric entropy of 9 in 2r (P).


Another type of entropy is defined as follows: for f, g E YO (X, si), the set
of real-valued 4-measurable functions on 3r, let

[f g]={hEYO(X,sa^):f<_h<_g}

where the set is empty unless f s g. Given e > 0, r> 0, and a probability
measure P on (X, .si), define

for some f,,...,fkEFr(P)


(3)
I
NBr(E, F, P) = min k: I'^'7{'^'
CV{[Ji,/]•Ili fhIr^E}
i.l
1JJ

The sets [f, f] are called brackets, and

(4) HBr (e, ST, P) = log NBr (e, °7, P)

is called a metric entropy with bracketing.


836

(5)

and
Dr(E, S, F, 9 ) = min k: 1^k

Dr(E,
xeS
[f(x) — f (x)] r ^ E
for all fc3W

F, %) = sup Dr (e, S, F,,)


S
_
EMPIRICAL MEASURES AND PROCESSES FOR GENERAL SPACES

A third type of (combinatorial) entropy has been introduced by Pollard


(1982). Suppose F is an envelope function for .f: thus F is measurable and
(f j <_ F for all f E 3 . Let S be a finite subset of 3r and let e > 0. Define

for some ft , ... , fk E F


min r Y

xeS
F(x) r

where the supremum is over all finite subsets S of 3 . Then Pollard's entropy
is

(6) Hr(e,F,F)=logD,(E,F,F).

Proposition 1. (Pollard; Dudley) Let F E 2, (P), 1 <_ r < oo, F ? 0, suppose


that W is a VC class of sets, and let {Flom: Cc 16 }. Then for k> s(t)
there is an A <oa so that

(7) Dr(e, F, 3) —<Ae - rk for 0 <e <— 1.

Pollard (1982) calls a class S sparse if

[H(u, F, )] 1/2 du <oo.


(8) J

Note that if FE2'2 (P), then (7) implies that (8) holds with room to spare.
Pollard (1982) shows that when F E $'2 (P) and (8) holds, then Z. indexed by
9 satisfies the central limit theorem uniformly in Je 9; see Section 3.

Example 1. Let ^= maß,,,, = {f: [0, i] d - R such that max i ,,,,,, sup {ID"f(x)I:
xe[0,1] d }+max ipl =„, sup {IDPf(x)— D °f(y)^/^x—yj ' : x^ye {0,1} d }M}
where m =[Ji3, y = ß—[ß], and 4pl =p, +• • •+pd with p =(p 1 ,...,pd ) for
integers p i . Then

H^(e, 9) ^ Ae -d is for 0 < E <_ 1

for some constant A = A(ß, M, d); this is due to Kolmogorov and Tihomirov
(1959). Thus

HH (u, g) 1/2 du <oo if d <2ß. ❑


J 1
0
WEAK AND STRONG APPROXIMATIONS TO THE EMPIRICAL PROCESS Z, 837

Example 2. Let -9= {l c : C is a convex subset of [0, 1 ] d } with d>_2, and


suppose P has a density bounded by M < oo. Then

HB 2 (e, Jw, P):5 Ae

for some constant A = A(d, M). This is due to Bronstein (1976). ❑

Glivenko-Cantelli theorems can also be formulated for the process indexed


by functions. Here are two results along these lines.

Theorem 1. (Blum, DeHardt) Suppose that 3'c Y, (X, .s9, P), that
NB, (E, 9, P) < oo for all e>0, and that 11P.—P11, is measurable. Then

(9) IIPn — PIIs->a.s.0 as n->oo.

Proof. See Dudley (1983, Section 6). I

Theorem 2. (Pollard; Dudley) Suppose that 9v is a class of functions with


envelope function F€ 2, (3C, d, P), that D, (e, F, ) < oo for all e > 0, and that
II P„ — PlL is measurable. Then

(10) IIPn — PII$ 'a.s.0as n-+oo.

Proof. See Pollard (1982) and Dudley (1983, Section 11). ❑

It is not hard to show that (10) holds uniformly in Pe.9, for any collection
.9 1 satisfying


sup fF1[ F ^ AJ dP0 as A-oo.
PER 1 J

For other approaches to Glivenko-Cantelli theorems, see Gaensßler and


Stute (1979) and Gine and Zinn (1984).

3. WEAK AND STRONG APPROXIMATIONS TO THE


EMPIRICAL PROCESS Z.

The approximation theorems of Chapter 12 extend to the general empirical


process Z. indexed by some class of functions ; this extension was carried
out by Dudley and Philipp (1983). In this section we sketch their main results.
In order to state these results, we first introduce appropriate Gaussian limit
processes analogous to Brownian motion S and Brownian bridge U. Let
(X, .s^, P) be a probability space and Y Z = YZ (3r, s^, P) ° { f: f: Y -> R', f is
838 EMPIRICAL MEASURES AND PROCESSES FOR GENERAL SPACES

.s1- measurable, $ fz dP < oo}, and set

1 1,2
(1) er(f,g)= J(f — g) z dPl forf,gEYz.

Let W be the isonormal Gaussian process indexed by '2 (3E, s4, P); i.e., the
rv's {W P (f): fE .z } are jointly Gaussian with mean 0 and covariance

for f, gE^P2.
(2) EWP(.f)WP(g)=P(f9) =
J fg dP
See Dudley (1973) for more information about W. Let Z P be another Gaussian
process indexed by 9z (3r, s', P) with mean 0 and covariance function

(3) E7r(f)7p(g)=P(g)—P(f)P(g)

=f (f — J fdP) (g —JgdP)dP.

Note that Z, can be obtained from W P by

(4) ?L(f)_W(f)—P(f)W(l)

in complete parallel to Exercise 2.2.1.

Exercise 1. Show that if (X, ,i, P) = ([0, 1], , I) where I denotes Lebesgue
measure, then {W 1 (1 [0 ,, ] ): 0_< t_< 1} is Brownian motion, while {l 1 (l [0 ,, ] ): 0<_
t <_ 1} is Brownian bridge.

Now let 11 - 11 0 be the semi-norm on 12 (X, .s', P) defined by

(5) II fII oP (f) = P(f2 ) — P(f) 2 , f E 1(.% ., P)

and let p, denote the corresponding pseudometric:

?0
(6) PP(fg)=OpP(f —g)=
< (f,g)•

Definition 1. A class 9c Y2 (X, s4, P) will be called a Z -BUC class if and


only if the process 7 P (f)(w) can be chosen so that for all w the sample
functions f -> ZL p (f)(w) restricted to f E .w are bounded and pp uniformly
continuous.
WEAK AND STRONG APPROXIMATIONS TO THE EMPIRICAL PROCESS Z . 839
Definition 2. A Zp-BUC class will be called a functional Donsker class for
P if and only if there exist processes V (f), f e 3F where V; = Z p are independent
;

copies of Zp for which . is Z -BUC for each j such that


m
(7) n-"2 max frZ, — E V II - Al,,
^i`^ f=1

where M^ -*p0 as n->oo.

Definition 3. A Z -BUC class 9 will be called a strong invariance class for P


if V; , j >_ 1, can be chosen as above but with (7) replaced by

(8) IIz^—n 1/Zi Y_ v 'O / b^c U,,


=1

where b ^ =/ 1j 2 n and U,, a.s ,0 as n - .

Note that if V 1 . V 2 ,... are iid copies of Zp, then

(9) n-'^2 Y- ^'; =7Lp for every n>1,


i =1

while

(10) E (Y, V;(f)) (^ Vj(g)) = (m n n)(P(jg) — P(f)P(g))•

Theorem 1. (Dudley and Philipp) Suppose that c. 2 (X, ,i, P) is a class


of functions satisfying both

(11) .w is totally bounded in .'2 (X,1, P)

and

(12) for every e >0 there exists a S >0 and n o such that for all n ? n o

Pr*Isup[IJ
(f—g)dz^I:.f,gcF,ep(fg)<3 >et<E.
1
Then .^ is a functional Donsker class for P; that is J

k
(13) n - ' /2 manx VkZ k — VjDI M^ -p0 as n-00.
i I
-

Moreover, M. -+, 0 for any r < 2.


840 EMPIRICAL MEASURES AND PROCESSES FOR GENERAL SPACES

If, in addition, there is an Fe Y Z (X, sad, P) such that I f ^ F for all f e


then M„-* 2 O as
If only j (F 2 / LLF) dP < eo [where LLx ° log (log (x v e) v e)], then 9 is a
strong invariance class for P; that is, (8) holds.

Conditions (11) and (12) . of Theorem 1 can be verified in many different


ways, and hence Theorem 1 has many corollaries. Here are two examples.
For the first one, let
(14) c
-F = {1 : CE ?}

and write

(15) H,(e) = HB,(e, F', P)

with HB,(e, 9, P) as defined in (26.2.4). Note that HB, corresponds to using


the metric dp(C, D) P(CAD), where A denotes the symmetric difference
operation, since
EI1 c —1 D I = P(COD).

Theorem 2. (Dudley and Philipp). If

i2
(H, (u 2 ))' du <oo,
(16).1 0

then the 9 in (14) is a functional Donsker class for P.

Another result in this same vein is the following theorem due to Pollard
(1981b).
To state Pollard's theorem, we introduce the notion of K -measurability:
Regard Z, as an element of the space B(.) of all bounded functions on JV
metrized by uniform convergence. Let C(S) consist of all bounded real
functions on which are uniformly continuous with respect to the.T z (. , .d, P)
norm on 9. Finally, a random element of B(f) is K- measurable if it is
measurable with respect to the o -field generated by all closed balls in B(3W)
centered at points in C(f). If Z. is stochastically separable for all n> 1, then
Z. is K-measurable for all n > 1.

Theorem 3. (Pollard) Let Y2(?, sat, P) and suppose jfj Fe


Y2 (X, .c, P) for all Je Sk If 3F is K-measurable and

(17)
J 0
[H2 (u, F, F)]" 2 du <ce,

then is a functional Donsker class for P.


WEAK AND STRONG APPROXIMATIONS TO THE EMPIRICAL PROCESS Z. 841
In particular, Pollard (1981b) shows that (17) holds for ={1 c F: CE 16}
if W is a VC class of sets and FE 22 (T, .B P).
This type of theorem has already been put to good use in statistics; see,
for example, Pollard (1979), where Z. indexed by sets is used to study
chi-square tests with estimated cell boundaries, and Pollard (1982), where Z.
indexed by functions is used to study clustering methods.
APPENDIX A

Inequalities and
Miscellaneous

0. INTRODUCTION

Recorded here are many of the most useful inequalities in probability theory.
Many of them are used in this book, but not all.
When dealing with sums of independent rv's X i ,. . . , X„ we will let
O'k= Var[Xk ] and s„ -o +• +Q „. Also S„=X, +• • • +X' and X„=S„/n.
Recall that we write

(I) Xk=(0,uk)

to denote that Xk has mean 0 and variance u k, while

( 2 ) Xk = Fk ( 0 , o k)

also specifies that the df of Xk is Fk .

1. SIMPLE MOMENT INEQUALITIES

Basic Inequality 1. Let g ? 0 be an even function that is 7 on [0, cro). Then

(1) P(IXI >A)sEg(X)/g(A) for all ,1 >0.

Markov's Inequality 2. Fix r> 0. Then

(2) P(IXI>A)<_EIXI'/A` for all A>0.

Chebyshev's Inequality 3. If X = (µ, o), then

(3) P(IX —µI>A) Q2 /A 2 for all A>0.

842
MAXIMAL INEQUALITIES FOR SUMS AND A MINIMAL INEQUALITY 843

Jensen's Inequality 4. If g is convex on (a, b) with -m <_ a < b < oo, P(a <X <
b) =1, and EIXI <oo, then
(4) Eg(X)>
_g(EX).

Equality holds for strictly convex g if and only if X = EX a.s.

Liapunov's Inequality 5. We have

(5) [EIXI']'1' is in r for r>_0.

C, Inequality 6. For any rv's X, Y and any r> 0

(6) EIX + Yl'< C,[EIXI'+El YI']

where Cr equals I or 2r - ' as 0<r<1 or 1 <_ r.

Cauchy-Schwarz Inequality 7. We have

(7) EIXYI<_[EX2EY2]'i2

with equality if and only if aX + b y = 0 a.s. for some a, b having a 2 + b 2 > 0.

Hölder's Inequality 8. Let r - ' +s - ' = 1, where r> 1. Then


(8) EIXYI - [EIXI.],ir[EI YIs3I/S.

Minkowski's Inequality 9. Let r? 1. Then

(9) [EX + Y^']'/ ' [EiX l']'I'+[El Yj'] I /r.

See Loeve (1977) for these inequalities.

Inequality 10. If X > 0, then


00
(10) P(X>n)EXs I P(X>_n).
n=1 n=o

2. MAXIMAL INEQUALITIES FOR SUMS AND A MINIMAL


INEQUALITY

Kolmogorov's Inequality 1. If Xk - (0, Qk), 1 k <_ n, are independent, then

(1) P( max Sk I>_,1)<s/A 2 for all A>0.


Isksn
844 INEQUALITIES AND MISCELLANEOUS

Monotone Inequality 2. (Shorack and Smythe) For arbitrary rv's X l , .. . , Xn


and 0<b 1 <_ • • <_ b n we have
k
(2) max ISk I/bk <_2 max
Isksn 1sk^n j=1

If all Xj >_ 0, we may replace 2 by 1.

Häjek-Renyi Inequality 3. If Xk = (0, Qk), 1 <_ k <_ n, are independent and


0 b, < • • • < b n , then

(3) P( max ISk j/bk >_d)< ok/bk /^1 2 for all ,t >0.
1 k n L k =1 ]

Levy and Skorokhod Inequalities 4. (Skorokhod) If Xk , is k n, are


independent, then for all 0< c < 1 we have

(4) P( max Sk >A)<_P(S,>cA)/ min P(Sn -S1 s(I -c),1)


Isksn l^ksn

for all A > 0. Thus if Xk = (0, o k), then we have both


-

(5) P(max Sk ^A)52P(Sn >_cA) for all A>fs n /(1-c)


1^ksn

and (Levy)
(6) P(max Sk >_A)<_2P(S n >A- 2s n ) for allA>0.
1^ksn

Inequality 5. For the Poisson process f

(7) P( sup h(t) - tI^A)s3P(IN(b) - bI >


_ ,1- 2(b -a))
a< ‚b

for all A>'/2(b-a).

Menchoff's Inequality 6. If EXk = 0 and EX; Xk =0 for all 1 <_ j, k -5 n, then

2 log 4n 2 2
(8) E( max S k
1^ksn ) ^ log 2) Sn.

See Loeve (1977) for Kolmogorov's inequality; the monotone inequality is


a special case of A.12.2; Renyi (1970) contains the Häjek-Renyi inequality;
Skorokhod's inequality is in Breiman (1968); Inequality 5 is an exercise; and
Menchoff's inequality is in Stout (1974).
A bivariate maximal inequality is found in Sheu (1974).
MAXIMAL INEQUALITIES FOR SUMS AND A MINIMAL INEQUALITY 845

Symmetrization Inequalities

We call the number m(X) a median of the ry X if P(X>_m(x))?2s


P(X <_ m(x)). For any ry X, let X' be independent of X with the same df as
X; then XS = X — X' is called the symmetrization of X.

Weak Symmetrization Inequality 7. For all A >0

(9) P(X — m(X) ^ A) <2P(X s > A).

Levy's Inequality 8. Let X 1 ,. . . , X. be independent. Then

(10) P( max [Sk —m(Sk —S)]>—A)<-2P(S n >_A)


1 ksn

for all A>0.

Exercise 1. Prove Inequalities 1-8.

Mogulskii's Minimal Inequality 9. Let X 1 ,. . . , X,, be independent, and set


Sk =X I +• • +X, for
for 1<k<n. For 2<m<n and for all A 1 ,A 2 >0 we have

(11) P( min 1S k JA I )^P(ISj<_A I +A 2 )/ min P(jSS -Sk 1<A 2 )


msksn msksn

Corollary 1. Let Xi = X1 (I) = I[o ,, ] () — t for 0 t ^ 1 for independent rv's 6 i


on [0, 1]. Let Sk - X I + • •+Xk for In- k < n. For 2 <_ m<— n and for all A,, A 2 >
-

0 we have

(12) P( min liSkIi_A,)<P(IiS n jk<A I +A z )/ min P(I(S n — Sk Il^A2)-


msksn m^ksn

Proof. The same proof works in both cases. We have


n

P(JSnj^AI+A2)> Y_ P( min 1SiJ>AI,1SkI^AI,ISnI^A1+A2)


k=m mi^k-1
n
Y, P( min ISil>AI,ISkI^AI,ISn—S&5A2)
k=m mask-1
n
_ Y_ P( min 1Sil--A lsl`skl`A1)P(ISn —
Ski`—A2)
k=m msisk-1

? { min P(I S. — Sk I <_ A 2 }P( min Si I <_ A,)


msk^n m_i_n

using a first passage time argument. p


846 INEQUALITIES AND MISCELLANEOUS

Inequality 10. Let X,, ... , X„ be iid (0, 0.2 ). Let 101 < 1 and let 101 < S < 1.
Then for all s >0 we have

X1 +... +Xn
(13) P
C ✓nb.
-0 'SE

—nP(IXI>^)— EJXl [JXJ ^^ 3


[log^n]s

for all n 2t1 and some 0<c 1 , c2 < oo (where bn =v' 2 log t n).

See Loeve (1977) for most of these. See Mogulskii (1980) for Inequality 9
and Goodman et aI. (1981) for Inequality 10.

The Continuous Version of the Monotone Inequality

Inequality 11. (The monotone inequality) (Gill; Wellner) Suppose q >0


for t> a and q is /' and right continuous. Suppose Z is a right-continuous
function of bounded variation with Z(a) = 0. Then for any a < b s c in the
domain of q and Z we have both

(14) JjZ/gjjä`-2
I J ,. [q(s)]_' dZ(s) 11a
[a ]

(we can replace 2 by I if Z is 7 , but 2 is best possible in the general case) and

(15) Iq b q(b)I f(b, ^ dZ l b q

If both q and Z are left continuous, the result still holds. Likewise q \, and
c b < a is allowable provided q and Z are both right (or left) continuous.

The monotone inequality (Inequality 2) was used by Wellner (1977a) to


obtain a version of (14) for JU„/q11. Gill (1983) gave a version for q continuous
and Z a semimartingale. The above version seems to have the right degree of
generality.
Proof. Suppose first that

(a) Z is ,' and right continuous with Z(a) = 0,,

and let Z equal 0 on ( —m, a). Let

(b) V(t) = [q(s)]' dZ(s) for a <_ t<


J a .tl
-
MAXIMAL INEQUALITIES FOR SUMS AND A MINIMAL INEQUALITY 847
with V equal to 0 on (—oo, a). Then by the Radon-Nikodym theorem

(c) Z(t) = J q(s) dV(S).


(a,t)

Note that

(d) V is 7 and right continuous with V(a) = 0.

Thus

Z(t)=f qdV= q+dV


fla,t]
(e) =V+(t)q+(t)—V_(a)q_(a) —J V_dq byA.9.13
[a t]

= V+ ( t)
I[at]dq+ V+(t)q-(a) — V (a)q-(a)— V
fla,t)
q

(f) =I [a.t]
[V+ (t)— V_(s)]dq(s) since wma q_(a)=0

(g) <_ sup [V+ ( t)—V_(s)] dq since V?0 is J'


ass<! fc aj]
(h) = V(t)q + (:).

Hence

(i) Z(t)/q(t)=IZ(t)I/q+(t)- V + (t) = V(t) under (a).

In the general case we have

(j) Z = Z, — Z2 for Z, and Z2 satisfying (a).

Thus subtracting (f) for Z 2 from (f) for Z, gives

(k) Z(t)
f fa,t]
[V+ ( t)—V-(s)]dq(s).

Thus

I Z(t)1 sup I V+(t) — V (s)l q+(t)


aesst

(1) X211 V1Iäq+(t)•


848 INEQUALITIES AND MISCELLANEOUS

The constant 2 in (1) is best possible: If a = 0 and q(2) = E while Z is such


that V_(2) = V+ (1) =1 with VJ, on [0, Z] and T on [2 1], then equality holds
i

in (1) if 2 is replaced by (2—i).


Now to prove (15). First note that for t? b we have

(m) Z(t) = Z(b)+[Z(t) —Z(b)].

Thus

Z
(n)ilgilb^ q +li 1ib
C

' (b)' Z q (b)

1 d[Z—Z(b)]ii^
cIZ(b)I+iif(b,-,
q(b) q b

by applying (14).
Suppose now both q and Z are left continuous. At any point x where q is
continuous, changing the sense of continuity of Z does not change the value
of [IZ+ (x)j v IZ_(x)^]/q(x). At any point where q has a jump, changing the
sense of continuity of both q and Z does not change the value of
[JZ+(x)J /q+(x)] v [JZ- (x))/q-(x)]. ❑

3. BERRY— ESSEEN INEQUALITIES

The basic idea of this sort of theorem is that the df of the normalized sum of
independent rv's will converge uniformly to the N(0, 1) df at a rate provided
slightly more than a second moment exists. Thus it provides a rate of conver-
gence for CLT-type results. In many cases, it can be used as an alternative to
exponential bounds in LIL-type results.
The basic results were provided independently by Berry and Esseen. The
versions we give here are variations on the original theme; we give references,
but not original sources. We also present refinements due to Cramer and Stein.
Let c(x) denote the N(0, 1) df, and let 0 denote its density.

Theorem 1. Let X 1 ,.. . , X„ be independent with X; = (0, o) having


E[X;g(X,)]<ao for 1-5i<_n where

(1) g is even, ?0, g(x) for x >0, and x /g(x)?' for x >0.

Then for some absolute constant c we have

(2) sup IP(S /s, <_x)—(D(x) <—c


(
an=, E[X^g(X1)]
-=<x< S.g(S.)
BERRY- ESSEEN INEQUALITIES 849

The classical result of Berry and Esseen uses g(x) = lxi. The next version
does not require the existence of anything beyond a second moment.

Theorem 2. Let X1 ,.. . , Xn be independent with X; = F; (0, o) for 1 <— i <_ n.


Then for all e >0

SUp I P(Sn/Sn <_ X) -4 )(x)1


—ex:,<x<ao

(3) ^ z x 2 dF;(x)+ a jx)3 dF.(x)


L Sn Z
i =1 J IxI?es,^ Sn i =1 J ^x^<cs,

where c is an absolute constant.

The next theorem produces a bound that improves as lxl 1'; but stronger
assumptions are needed.

Theorem 3. Let X 1 ,.. . , Xn be iid (0, 1) with E I Xi12+s < oo for 0< 3 C 1. Then
for all x

(4) IP(Sn/sn <_ x) — 4(x)i <— +X12+s]


ns/2[1

where cs depends only on S.

Petrov (1975) contains Theorems 1, 2, and 3 (with S = 1), while Michel


(1976) contains the general version of Theorem 3 (due to Nagaev). These are
excellent references from which to begin additional search. Michel's (1976)
paper is an excellent reference for Sections 3-5. Barbour and Hall (1984) give
a different sort of proof based on the approach of Stein (1972).
The following theorem of Cramer (1962) shows that finer approximations
are possible.

Theorem 4. (Cramer) Let X, X 1 , X2 ,... be iid (0, Q 2 ) with

(5) Y1= E(X — µ) 3 /Q 3 and yZ = E(X —µ) 4/Q ° -3

well defined. Suppose

(6) lim jEe""j<1


ItI- W
(which is true if the df F of X possesses an absolutely continuous component).
Then

(7) sup O(n-1),


—=<x <=

850 INEQUALITIES AND MISCELLANEOUS

where

(8) Fh(x)= ((x) - 0(x){6 (x 2-1 ) + 3 -3x)


24n (x
3

+2n (X 5 -10X 3 + 15x) .

The key to the above results is the following lemma of Esseen [see Petrov
(1975)].

Lemma 1. (Esseen) Let G denote a fixed df having mean 0, density g


bounded by M, and characteristic function i/i. Let F denote an arbitrary df
having mean 0 and characteristic function 4,. Then for all a >0

1 fa 4,(t) —p(t) 24M


(9) sup jF(x)—G(x)j^— i dt+ .
-aD<X<OD Tr a t air

The following variation on an unpublished result of C. Stein also follows


from this lemma. See Stein (1972) for a more powerful version.

Theorem 5. (Stein) Let X and Y be random vectors. Let T(X) and S(X, Y)
be rv's. Suppose for some a >0 and 0< E, 8 s 1 and some K 1 , K2 , Ks we have:

(i) T=T+S.
(ii) EIE(SI X) +aT/21<K 1 aE (typically, K1 =0).
(iii) EIE(S 2 I X) —aI -5 Ke as.
(iv) EISIz +s — Ksaes.

Then we have
(10) sup IP(T-x)—P(N(0, 1)^x)I<_(5/S)(K,+K 2 +Ks )E S ^^' +a>
-OD<X<N

If K, = 0, then
2 +s
(11) ET=0,IVar[T]-1I:K 2 e,and E^TI <cO

4. EXPONENTIAL INEQUALITIES AND LARGE DEVIATIONS

We shall let the bound for N(0, 1) rv's serve as a standard for later comparison.

Mill's Ratio 1. For all A >0

(1) 113 1 exp I exp ( A2


( A ,1) 2zr
— ( A 2zr \— 2 )
EXPONENTIAL INEQUALITIES AND LARGE DEVIATIONS 8S1

Thus if A. / oo we have (with S" \ 0)

(2) P(N(0,1)>A")=exp(— (1—S")) as n-->oo.

Exercise 1. Show the Ito and McKean (1974, P. 17) result

2 <_ e k2/2 e_ ,2/2 dt < 2


,1 2 +4+A JA ,12+2+A

Exponential Inequalities Remark 2. Even though a sum of independent rv's


may be approximately normal, there must be restrictions before the exponential
rate (1) or (2) can be approached.
Let Xk = (0, o), 1 <_ k n, be independent. Let A . We would like to
claim that

(3) P(S"/s">a ")=exp(— 2 [1+o(1)]) as

According to Pinsky (1969), conclusion (3) holds provided

(4) A/log (1/r)-*0 with r"=Y_ EIXk l 2 +s/ S n+ 6 .+ 0


I

as n - oo for some S > 0. Michel (1976) contains some refinements and improve-
ments of this in the iid case. According to Kostka (1973) in the iid case, if (3)
holds for a" = (1 + e) 2 log e n with e>0, then EX 2 [ 1 v log X ]e < eo. Con-
versely, if EX 2 [1 v log IXI] 1+3 ` <x, then (3) holds with a" = (1 ± e) 2log e n.
More generally, if EX 2 g(X)<co where g(x)>'oo and x/g(x)/ as xToo and
if exp ((A./2){1 + o(1)})/g(v') - 0 as n moo, then (3) holds. See also Michel
(1976) for refinements when g(x) = Ixl s.

For bounded rv's we can obtain inequalities valid for all A >0 that are
improvements on the above claims. Those due to Bennett and Hoeflding will
be used heavily, but we list others also.

Bennett's Inequality 3. Let X 1 ,.. . , X" be independent with Xk <_ b, EXk =0,
and Var [Xk ] = o for 1: k <_ n. Let Q Z = (o + .. . + „)/ n. Then for all A ? 0

(5) P(/i A)^ exp (-2QZ4(v2


✓ n)

852 INEQUALITIES AND MISCELLANEOUS

Here ifr(A) = (2/A 2 )[(1 + A) log (1 + A) — A] satisfies (note also Proposition


11.1.1)

(6) i(0)= 1, 0 is ]., +J(A)—(2log A)/A as A - co,


(7) AiIi(A) is T

and
(8) 0(A) >1 -0 if0<A<_30, since i/i(A)>_1 /(1 +A /3) for A>_0.

Proof. See Bennett (1962). Now

^X; ?A)=infP (exp (rXi )^ >_exp (rA)I


P( 1 r >o 1

(a) <_ inf exp (—rA)E exp f X . ;

r >o I

Now EX; = 0 and g(x) ° (e x — 1 — x)/x 2 for x 0 0, with g(0) = Z, is nonnegative,


T, and convex for x E (—co, co), and hence it follows that
Ee = E(1 +rX; +r 2 X?g(rX1 ))
<_1+r 2 o g(rb)
p/ 2 e .b
— b2 — r 1
(b) <_ ex a' b
;

This step is from Chow and Teicher (1978, p. 339).


Using (b) and independence in (a) gives

(c) P (E Xi ? A') : inf exp (—rA + nv 2


erb — 2— rb )
i r>o b J

Differentiating the exponent of (c) shows it is minimized by the choice r =


(1/b) log (1+Ab /no 2 ). Plugging this value into (c) gives
n
Ab
1+ ( —
P „I X' ^ A ) ^ eXp (— b log ( 1 + nQ2 ) + 622 (nQ2 — 1Dg l n i,) )
[(I+ no,2 )
=exp^ —b b log 1+ Ab
\ e— 1]
A I /Ab
=exp/ — 2+1)log \l+
LLL no z ) no, J/ no ,2 /
z
(d) =exp ( -
2no, 2 (no, ))
EXPONENTIAL INEQUALITIES AND LARGE DEVIATIONS 853

Formatting this result in terms of the function 4i, instead of in terms of


h (x) = x(log x —1) + 1 as Bennett did, is due to Shorack (1980). ❑

Exercise 2. (Chow and Teicher, 1978) Show that the function g(x)
(e" —1— x)/ x 2 , x ^ 0, defined in the preceding proof, is ? 0, T, and convex
for all real x. (Hint: Use the mean-value theorem.)

H tfding's Inequality 4. Let X 1 ,. . . , X. be independent with 0: X k ^ 1 and


EXk = µ k for l <_ k <_ n. Let µ = (µ, + - - - + µ n )/ n. Then for 0 < ,t < 1— µ

(9) exp(-2A2) for0<µ<_2

(10) exp fort —µ<1.


2 µ( 1— J-^)

Proof. We follow Hoeffding (1963). Let µ denote µ. Since the exponential


function exp (sx) is convex, its graph on 0 < _ x s 1 is bounded above by the
straight line connecting its ordinates at x = 0 and x = 1. Thus

esX<_(1—x)+xes for 0<x<1;

so that the moment generating function M of X satisfies

(a) M(s)^1—EX+EXe'.

Thus by Markov's inequality, (a), and the fact that the geometric mean does
not exceed the arithmetic mean,

P(X —/i A)=P(exp1 sXi I exp(n(IL+A)s) if s>0

<exp (—n(µ+A)s)M E .. X ,(s)


n

^ exp ( n(µ+A)s) fl ( 1—
— +µi e s )
1 ‚i

cexp
n I (1—/ßi+µies) J
(—n(µ+A)s) —

(b) = [exp (-(µ+A)s)(1 — A+µ e s )l n .


But (b) is minimized (just easy differentiation) by the choice

(1—tl)(A+A) >0.
so=log


854 INEQUALITIES AND MISCELLANEOUS

Putting s o into (b) yields


µ+A n
[
(c) P(X µ ^ A)C (µ+A) (1 —µ—A) _ J
(d) = exp (—nA 2 G(A, µ)),

where

AIogµ+A + l—µ2 Alogl—µ—A


(e) G(A,µ)=µ A2µ
A 1 —µ

We will now minimize G(A, µ) with respect to A for 0<A < 1— µ, and the
resulting minimum will be denoted as g(µ). Now

=(1 - 2 1 -µ^ log(1 1


A2 as G(A µ) ^µ/
+ A\
—I 1-2 µA I log11 µ +A I

(f)H(-_
1 A ) —H(_A
)
where 0 <A /(1 —µ)<I and 0 <A/(µ+A)<1 and where for 11slt<1 we have

2
H(s)= 1 -- log(1 —s)
s

= 2 +(3 - 2)S 2 +(4 - 3)s 3 +(5 4)s4^'. .

with all coefficients of this power series positive. Thus H(s) is T' for 0< s < 1.
Hence we see from (f) that (a/eA)G(A, µ)>0 if and only if A /(1 —µ)>
,1 /(µ + A) or equivalently A> 1- 2µ. Hence for 1— 2µ > 0, G(µ, A) achieves
its minimum at A =1- 2j.; while if 1— 2µ <_ 0, then G(A, ) achieves its
minimum at A = 0. Plugging these values into G(A, µ) yields for the minimum

1 log 1 for 0< µ<


µ
(g) g(µ)= 1-2i
fortsµ<1.
2µ(1—µ)

Since g(µ) ? g(Z) = 2 is easy, we have from (d) that

(h) P(X —µ>A) exp (—nA 2 g(µ))<exp (-2nA 2 ).

This gives the inequality. ❑


EXPONENTIAL INEQUALITIES AND LARGE DEVIATIONS 855
Bernstein's Inequality 5. Let X,, ... , X„ be independent with EXk =0 and
EIXk l" vk n!c n-2 /2foralln>_2 where c>0. (If IXk I <_K, then c=K/3 works.)
Then for all A>0

>_a) A2/2
(P(VX
11) exp –
( [Y, t vk /n+cA/vn])
Exercise 3. Prove Bernstein's inequality. Note that for bounded rv's it follows
from Bennett's inequality (5) and (8).

Hoeffding's Inequality 6. Let X 1 ,.. . , X„ be independent with a ; <_ X; <_ b ; for


1 <_ i <_ n. Then for all A>0

(P(
12) i(X –EX)-A)<exp(-2A 2 / ( bi – a ) 2) .

Kolmogorov's Exponential Inequality 7. Let Xk = ( 0, vk) with IXk l < K, 1 < k <_
n, be independent. Then

(13) P(S„/s„>A)<
exp(– 2z(,_AK))

"
forall a <_s„/K
exp (– 4K) for all A >_ sn / K ;

for any E >0 there exists K r and A such that


E

(14)
C
P(S„/s„>A)>exp ---(1 + r))provided A?.l and — K.
•l2 Kn
E

Klass (1976) notes how this upper bound (see Loeve, 1977, p. 254) can be
extended to the event P(max,. k .„ Sk /s„> A); see Klass for the exact details.

Exponential bounds for the special cases of binomial, Poisson, beta, and
gamma rv's are found in Sections 11.1 and 11.9.

Large Deviations

We define

(15) M(t)=Eexp(tX) for –oo< t <oo

to be the moment generating function (mg/) of the ry X. Then 0< M(t) -- o0


for all t e (–co, oo) and M(0) = 1. Define

(16) K =inf{M(t): t>_0}


856 INEQUALITIES AND MISCELLANEOUS

Elementary Exponential Inequality 8. We have

(17) P(X?0)<_K.

The elementary exponential inequality (Inequality 8) is surprisingly sharp


when applied to X In fact, we have the following large-deviations result of
Chernoff (1952).

Theorem 1. (Chernoff) If X, XI , X2 ,... are lid on [—oo, co) and a„ -* a E


(—co, co) as n -^ oo where P(X > a) > 0, then

(18) n - ' log P(J?? a.)-. f(a) as n- co


[and > may replace ? in (18)] where

(19) f(a)=sup{at—logM(t): t >


-0 and M(t)<oo}.

Note that for "smooth enough" M we will have

(20) f(a) = aT—log M(-r) for r defined by a = (d /dt) log M(t)I ,=,.

Example 1. Let X = Poisson (1). Then in Chernoff's theorem (Theorem 1)


we have T = log a and f(a) = h (a) for h (x) = x(log x —1) + 1 as in Chapter 11.
Thus we obtain the result

(21) R - ' log P(t[Poisson (r) — r]/r A)--* — (,1 2 /2)i(±A) as r--oo. 0

A version of this type of theorem that applies to more general rv's was
proved by Sievers (1969) and improved by Plachky and Steinback (1975). We
present it following a listing of some of the properties of mgf's. See Bahadur
(1971) for Theorem 1 and the next three exercises.

Exercise 4. M(t) is convex on (—co, co). Consequently, {t: M(t) <oo} is an


interval containing t = 0.

Exercise 5. Suppose P(X = 0) < 1 and M(to ) <co for some t o > 0. Then M is
strictly convex and continuous on [0, t o ], M has derivatives of all orders on
(0, to ), and M' is continuous and T on (0, to ).

Exercise 6. Suppose M(t)<co for all t >0, —co <_EX <0, and P(X>0)>0.
Then M(t) <oo on some (0, t0 ) with to > 0, M'(0 +) <0, and M'(t) > 0 for some
0 <t<to.
MOMENTS OF SUMS 857

1 ^./
Mit)

K to
Figure 1.

Theorem 2. Let the nontrivial Z. have a finite moment generating function


M„ on [0, t,) and suppose for finite c(t) that

(22) n ' log M„(t)-+ c(t)


- for tE (to , t,) with 0 t o < t,.

Let

(23) a„ - a E A = {c'(t): the function c' exists, is T, and


is right continuous at t E (to , t,)}.

[M„ finite implies M;, is T and continuous on (0, t i ).] Then


n ' log P(Z„/n > a„) -* —sup {at — c(t)}
-

t>o

(24) = — {aT — c(T)} if c'(T) = a.

5. MOMENTS OF SUMS

We begin with an approximation to the moments of normalized sums.

Von Bahr Inequality 1. Let X k = (0, ok) with EIX k r < oo for r>2, 1:5 k ^ n,
be independent. Then

I EI S„/s„I' — EI N(0, 1)1'I (some M)/n [ '^cr z>li ,


- z

where
./z
+1 \
EIN(0, 1)t'=^ I'(r .

If r is an integer>_ 4 (?3 works for lid rv's), then we may replace absolute
moments by moments.
See Michel (1976) for the above result and von Bahr (1965) for refinements.
See Pinsky (1969) for analogs for Eh(S„/s„) with h” continuous. Gut (1975)
provides a good review of rth mean convergence. See also von Bahr and
Esseen (1965).


858 INEQUALITIES AND MISCELLANEOUS

rth Mean Convergence Theorem 2. Let X1 ,. . . , X,,,.. be iid and let 0< r < 2.
The following are equivalent

(i) EIXI' <oo (we suppose EX = 0 in case 1:5 r <2).


r
(ii) S n /n -+ a.S. O as n -^ oo.
(iii) EIS I' = o(n) as n - OO..
(iv) E[ max jSk i']=o(n) asn -moo.
1sk_n

Von Bahr–Esseen Inequality 3. Let X 1 ,.. . , Xn be independent with 0 means.


Then for 1 <_ r 2

EISn I'<2' ' E EIXk Jr.


-

Inequality 4. If X l , .. . , Xn are iid with E I X (' < oo for r ? 1, then

E[ max lSk I'] s 8EISn I'.


1sksn

Hornich's Inequality 5. Let X, X 1 ,... , X. be iid with mean 0 where P(X ^


0)>0. Then for some c>0

E I Sn I >_ cf for all n.

If X has variance o-2 E [0, cx ], then

tim E ' Sn' = 2 lim -/ n = ?


ES - o.
n-. n-.00 V nV ?r

(See Chung, 1974, p. 172.)

We now extend our consideration to the whole infinite sequence. Gut (1978)
summarizes the following results and extends the second to higher-dimensional
indices; Gabriel (1977) extends the first.
For iid rv's we have

(1) E[sup iSn /ni'] <co for fixed r>_1


n

if and only if the 1937 Marcinkiewicz and Zygmund condition

(2) in case r=1, or EJXJ' <oo in case r>1


BOREL-CANTELLI LEMMAS 859

holds, if and only if the 1962 Burkholder condition

(3) E[sup I X /nkr] <oo for fixed r? 1.


n

For iid mean 0 rv's, Siegmund's (1969) and Teicher's (1971) results show
that the following are equivalent:

(4) E[sup IS„/ n log t nI']<co for fixed r>_2,

(5) E[X2 log XI/log i lXi] <n if r = 2, or EJXj' < if r>2,


(6) E[sup iXn / n log e ni']<ao for fixed r?2.
n

Burkholder's Inequality 6. Let r>_ 1. If S„ =X,+X 2 +• • •+X„ where the X; 's


are independent (or where S o = 0, S IT . .. , S. is a martingale, and r> 1), then

cEIS„1' CrE
c,E \ \i l X? / r/2 / \ \i X? / r/z /

for some constants C r and Cr . [See Burkholder (1973).] Also max,. k ^„ Sk i'
may, using Doobs inequality, replace EIS„V.

See Mogyorödi (1975) for additional inequalities. Some involve


E4)(max,^ k .„ 1Ski) for convex 0.

6. BOREL—CANTELLI LEMMAS

Borel—Cantelli Lemma 1.

(i) If Y; P(A) < co, then P(A„ i.o.) = 0.


(ii) If the A„'s are pairwise independent, then Y_° P(A) = oo implies
P(A„ i.o.) = 1.

Renyi's Variation 2. If E ° P(A) = oo and

In n J z
lim E Y_ P(A A )j/
; k E P(A k )j = 1, then P(A„ i.o.) = 1.
n-.m I I I J

Serfling's Variation 3. Let sf1„ _ o-[A,, ...‚A]. If Y,, P(A„) = co and


^° 1P(A,jsi k _,) — P(A k )i < oo, then P(A„ i.o.) = 1.

Kostka's Variation 4. If P(A) = oo and P(A; A k ) 75 cP(A) )P(A k ) for all


j, k ? some N where c < eo, then P(A„ i.o.) > 0.
860 INEQUALITIES AND MISCELLANEOUS

Baum, Katz, Stratton Variation 5. Suppose Y,, P(A n )=oo, P(B„)?a for all
n where 0<a<1 and P(A„B„A k Bk ) = P(A n )P(B„A k Bk ) for all 1 <_ k < n —1
and n ? 2. Then P(A„Bn i.o.) ? a.

Klass Variation 6. If A n is independent of {B n , B n+ ,; ...} for all large n, and


if P(Bn i.o.) = 1 and lim n -. P(A n ) > a, then P(A n n Bn i.o.) >_ a.

See Chung (1974), Renyi (1970), Serfling (1975), Kostka (1974), Baum et
al. (1971), and Klass (1976) for these results.
We also have the following results approximating tail probabilities based
on the principle that rare independent events are almost disjoint. See Wichura
(1973b, p. 446) and Chung (1974, p. 62)

Lemma 7. (Wichura, 1973) Let A k , k >_ 1, be pairwise independent with


Y_ ° P(A k ) <co. Then

P U Ak P(Ak) as n -* oc.
n n

Lemma 8. (Chung) Let Akn , 1 <_ k < n, be independent with


) P(A) -> 0
as n - oo. Then

P(LJ Akn) ) P(Akn^) as n -> cc.

7. MISCELLANEOUS INEQUALITIES

Inequality 1. (Events lemma) (i) Suppose that for every k> I the events
A k , A_ 1 ,. . . , Aö and Bk are independent (where A 0 denotes the empty set).
Then

P1 U AkBk^ ? cP (U A k ^ where c = inf P(Bk ).


k k k

(ii) Suppose that for every k >_ 1 the two classes of events
{A ik (Q J A;•k _,) • • (Y- j A;,,)`: i is arbitrary} and {B ik : any i} are independent.
Then

PI lJ E A ik B;k ) ? cP(U E A k ) where c = inf P(B;k ).


k i k i i,k

MISCELLANEOUS INEQUALITIES 861

Proof. (i) See Loeve (1977). Now

P(uAkBk) = P((A, B,))+ P((A, B,)`(AzB2))


+P((AIB,)`(A2B2)`(A3B3 ))+ . .

>_P(A I B 1 )+P(Ac,A 2 B2 )+P(Ac,Ac2A 3 B 3 )+• • •

= P(A, B,)+ P(Ai A2)P(B2)+ P(Ai A2cA3)P(B3) +.. .


>_c[P(A,)+P(A',A 2 )+P(A,AZA 3 )+• •

= cP(uA k ).

(ii) Now

U Y AikBik)
P( k i /
/
=P1 Ai1B11])+P([EA11B11] IY_Al2B;z])+ ...

! 1 i

^P([: A 11 B11 ])+P([Y A1,] [,A,,Biz])+ ...

_ P(Ai,)+ ( [EAil Ai2B12)+.. .


i J
=Y, P(A„)P(Bi,)+ P(Biz)+.. .
\k A11] Al2)

'— c j^ P(Ai,)+E, P( Ail Aiz) +...^

=cjP([ A})+P([LA] Y_Al2] J+...}


[

= cP(U Aik I • ❑
\k i /

Bonferoni's Inequality 2. P(n", A ; )>_ 1—Y_; P(A).

Anderson's Inequality 3. Suppose that X and Y are normally distributed


random vectors in R d with E(X) = E( Y) = 0 and covariances matrices Y X
and E y, respectively, where E X is positive definite and E y and — I y
are positive semidefinite. Then for any convex set C c R d symmetric about 0,

P(XEC)-P(YeC).

Equality holds only when E =0 ° the null matrix.


862 INEQUALITIES AND MISCELLANEOUS

8. MISCELLANEOUS PROBABILISTIC RESULTS

Exercise 1. (Moment convergence) If X„ -* d X as n --> oo and if EJX„l b <


(some M) <oo for all n with b >0, then EIXI°->ElXl° and EX n EX as - '

n oo for all real a <b and all integers k < b.

Exercise 2. (-> p is equivalent to -* 8 , s . on subsequences) We have X„ -S P X as


n -* m if and only if every subsequence n' has a further subsequence n" on
which X„- - a . s , X as n" -*co.
,

Exercise 3. (Cramer-Wold device) X N(0, 4) if and only if a'X


N(0, a'Ea) for all constant vectors a. Moreover, if a'X„ -+ d N(0, a'Za) for all
constant vectors a, then X. -^ d N(0, T).

Exercise 4. (See Stigler, 1974) Let (X, Y) have joint df F with marginals
G and H. Then

(1) J
EX = o G(x) dx+
00
J '0

0
[1— G(x)] dx

if EX exists, and

(2) Coy [X, Y] =


J [F(x, y) — G(x)H(y)] dx dy

provided EX, EY, and EXY exist.

Exercise 5. If ElXlr <00, then

3) IxI'F(x)[1— F(x)] <_ (some M) <oo on (—cc, cc)

and this same function of x converges to 0 as x -^ too. [Hint: 1%, iyl' dF(y)
lxIrF(x)•]

Exercise 6. (Vitali's theorem) Suppose X. pß X as -


n - oo, where X. E Yr
for all n >_ 1, where 0 ^ r <cm. Then the following are equivalent:

(i) X„ - ,X as n ->oo,
(ii) EIX l' -3,ElXl' as n-*cc,
(iii) the rv's IX„I', n >_ 1, are uniformly integrable.

Exercise 7. (Schefle's theorem) Suppose h n + a . s , h as n -* co on a a-finite


(,f2, .s1, Ec) and lim„_^ J Ih„i dµ I I hl dµ. Then sup,,,,, 11,, h„ dµ — J A h duI -> 0.
MISCELLANEOUS PROBABILISTIC RESULTS 863

Exercise 8. If h. E.2 on a a-finite (Cl, sdl, jt) for all n, if h as h as n - m


for some h E Y2 and if Tim„ y , f h„ dµ < 1 h 2 dµ, then j (h„ — h ) 2 dµ - 0 as n - m.

Proposition 1. Let h E Y 2 ([0, 1], 00, dt). Let


i/m

(4) hm(x)=m h(x)dx fori-1/m<x<i/m and 1-im.


(i -1)/m

Then

(5) I (h—h m ) 2 dt^0 as m.


la

Proof. Now

m / i
0Sf (h-h m ) 2 dt= Y_ f ' h(t)- hm Zdt

_
o,

rn

i =1
i

i m

u-1)/m
z
h dt-- h m


m L
=1 (i -1)/m
2 m i

m;-,

m +—Ehm
^—

m , m
(

1 m
—/J
m

^ ^

(a) =f1 h 2 (t) dt—^l t) dt.


0 0

By Exercise 8 it remains only to show that h m (t)-h(t) a.e. as m-x. Now


for each in > I there exists i = im for which (i —1)/ m < t <_ i/ m. Then

(b) hm(t)=m(m—t){fi/mhdsl —t
(i )}

+m(t—iMI)
{J(i-l)/m h ds/(t—` ►nl)}

(c) -* h(t) a.e.

since each term in { } in (b) converges a.e. to h(t) by the Fundamental Theorem
of Calculus. ❑

Exercise 9. Let h E'2 ([O, 1], , dt). Let

i -1 i
(6) h.(x)-h m+l for m <t<_m and 1<
—i <_m.

864 INEQUALITIES AND MISCELLANEOUS

Use Scheffe's theorem (as in Häjek and Sidäk (1967, p. 164)) to show that

(7) J 0
i
(h_h m ) 2 dtO as m.

9. MISCELLANEOUS DETERMINISTIC RESULTS

Stirling's Formula 1. For all n> 1

where
1 1
<a <—
12n +1 12n

Euler's Constant 2.

n 1
– – log n ,[ y = 0.5772156649... .

Proposition 1. (i) Suppose d„ 7. Then

log n =
1 < ao implies n-.=
lim 0.
do

(ii) Suppose only ndn 7 is known. Then

°° 1
— < oo implies lim d„ = oo.
n-.=

Proof. (i) For each integer K there exists a constant MK J 0 such that

1 1 "' 1
oo>MK _ m
K +, nd„ dm K+1 n
(a) ?[log m –(log K)-1+y)/d,„ for all m, K

where y is Euler's constant. From (a) and dm 7 oo we get

MK ? lim (log m )/ d„, for all k


m-m

Thus
0> um MK _> lim (log m)/d.
K- co m-.00
MISCELLANEOUS DETERMINISTIC RESULTS 865

(ii) Nowco>M=E,ß (1/nd„)?Y (1/kd k )?1/d„,solimd„?M>0.

Assume d. -A oo. Then on some subsequence n k we have d„ k -+ (some M') e


(0, co). Thus

oo>2M>_ (nk – nk-1) — =M' 1–nk =M' Y, ak .


k=1 nk k=1 nk k=1

But if Y ° a k < oo and nk n k _,/ (1– a k ) as above, then we need small a k


for convergence. The smallest possible a k leads to n k = n k _ 1 + 1, or n k = k; but
this case leads to a k = l/ k with ak = oo. This is a contradiction. Thus
d„-co. ❑

Proposition 2. Suppose f and g are positive and \ and S > 0. Then

f 06
f( )t
1
dt<oo implies li ö
log(1/t) =0
f(t)
and
s 1

J :g(t)t log(lit)
dt <ao implies tim
m
log2(1/t)
g(t ) 0.

And so on.
Proof. Let 0 < e < S, and
f:+.
fixt so that 0<t<5–e. Then
"
f' ,
1'

uf(u) du
11log(t+E)—logt
uf(u)
du> –

.f(t)
f udu=
+f
f(t)

Now let t -> 0, which implies f(t) -^ oo, and obtain

f06 uf(u)
logt +e)+log (I/t) - -log (1/t)
du_li m
f(t) = iii f(t)

Letting e - 0 yields the claim. This proof is from James (1975). ❑

Exercise 1. Suppose f is positive and ", with jö[1/tf(t)]dt<oo for some


5>0. Then E10[1/nf(1/n)]<oo.

Exercise 2. Suppose f is positive and \. Then for any integer k we have


Yn k+I f(n) - 1 J (t) dt ^Yn=kf(n)•
0

Exercise 3. Suppose f is / on [0, 1] with Jö [1/f(t)] dt <oo. Then there exists


a function g having g(0) = 0 that is T and continuous such that
Jo [ 1 /(f(t)g(t))] dt <oc.
866 INEQUALITIES AND MISCELLANEOUS

Exercise (0, 1) and f f(t) dt <oo. Then 0 as t-+0.


4. Suppose f is \ on tf(t)->

Proposition 3. Let a > 0 be \. Let n = ((1 + e ) ) for some s>0. Then


n k k

"<oo if and only if Y_ a < nk


n k=1

(Note that other monotonicity conditions such as na n \ could replace a n


Proof. Now for some constant c,
n 1
^^ C k - (''
an = L an c L an 1 (nk - 1 - nk-I) C1 L an,;
n=1 n k=1 n =nk -j n k=1 nk -1 k=1

and likewise for some c2

_
an> Y an (n k —1— nk -1) cZ Y- a n , ❑
n=1 n k=1 nk k=1

Exercise 5. Let a n >0 be \. Then for any integer k

I an k
— ak« <
an
k-1 n =, n n =0 k-1 n., n

Proposition 4. If a n < cc, then there exists a sequence c n T' cc such that
^ I cn a n <cO.
Proof. (J. Fabius) Let s,,, = a 2m with 0< a < 1. Choose n m T so that Y„ a k <
E. for all m. Let ek = i/I for n m <_ k < n m + , and m >_ I with c, _ = c- 1 =
0. Then

ao m nm + 1 -1 m 1 =Q

ckak = L ckak^ k m aa,,,


k=1 m=1 k=n VE
m /— o0
VE m = Y- a m <oo.
m=1 m=i

Now modify the c k 's to be T. 07

Discussion 1. The proof of Robbins and Siegmund (1972) of Theorem 10.1.2


for k = 1, succeeds by first working on a particular subsequence. This is a
useful technique and we now discuss it in more detail.
Let us define the subsequence n; by

(1) n; =(exp (aj/logj)) forj?2.


MISCELLANEOUS DETERMINISTIC RESULTS 867

We would like to claim that E , (c„ /n) exp (—c„) is <oo or =00 according
as Y Z c„ ' exp (—c„.) is <oo or =oo. This is not true, but we now see what
we can do in this direction.
We first claim that
(2) logi n;—logj asj-*cc
and

n —1 asj-goo.
(3) n+1 ~lo
log j

Let c„ be a sequence satisfying

(4) c„/n', and either c„; or lim c„/Iog 2 n >_ 1.

If we now define d„ by

(5) d„ = c„ n 2 log e n, ,

then

(6) dn/ n ^, 0 and either d„ >' or lim d„/log 2 n >-1.


n- .0

Suppose first that for a fixed integer k >_ 1 we have

(7) (cn/n) exp ( — cn) <o°.


n=1

It then holds that

(8) d„ ?A log t n for all n ?some n,A for each 0 < A < 1,

(9)
1
M^ <both (i — n^
{
n; l and ( n;+i _ i ) d.1 <— M«
/ d„
for some 0< MQ < oo and for all j ? some J., and finally
00
(10) Y, d„-' exp (—d„,)<oo.
1=2

If, however,

(11) (c^ /n) exp ( —Cn) =co >


n=1
868 INEQUALITIES AND MISCELLANEOUS

then
00

(12) E d exp ( —d,, ) =oD.


1
j

Roughly speaking, c„'s satisfying (4) can be replaced by the somewhat


smoother d„'s of (5) for which our desired claim holds.

Exercise 6. Establish (3), (8)-(10), and (12). [The key points, for k = 1, are
contained in Robbins and Siegmund (1972).]

Exercise 7. Prove the Robbins and Siegmund theorem (Theorem 10.1.2).

Integration by Parts

If U and V are of bounded variation on the real line, then Hewitt and
Stromberg (1969, p. 419) show that

(13) U+ (b)V+ (b)— U_(a)V_(a)=^ U_ dV+ f V dU


(a b] a , b]

and

(13') U+(b)V+(b)—U+(a)V+(a)=
f
(.,b]
U_dV+f
a b]
V+dU

or, symbolically,

(14) d(UV)= U_ dV +V+ dU.

From this it follows by induction that


k-1 `

(15) dU k = U' 'f dU


U`_- - fork= l,2,....

By applying (14) to 1= U(1/ U) we obtain for positive U's that

(16) d(1/U) =— U U dU. '

If U>_0 is ‚', then (15) yields

(17) rU'` ' dU<_d(U k )<—rU+- ' dU fork= 1,2,...


MARTINGALE INEQUALITIES 869
replace U+ by U_ (with U+ >_ U_) in (15) for the first inequality, and replace
U_ by U+ in (15) for the second.
Note that for an arbitrary df F we have

f hd (1 F) = J h { F-d (1 1 F) + 1 F dF }

J 1
= h{F (1—F)(1—F_)dF+l—FdF}
(^ 1
1

(18)
= J h(1—F)(1—F_)dF

10. MARTINGALE INEQUALITIES

Let S 1 , 52,... be a sequence of rv's and let 9 l c '2 c . • . be an 7 sequence


of Q-fields for which S n is SP,,-measurable; we say that S n is adapted to 5"n .
Then (S,,,9',,), n ? 1, is said to be a martingale if EISn I <oo for all n and if

(1) E(Sn+, ß.9'n)=Sn a.s. for all n.

Also, (S n , fin ), n > 1, is said to be a submartingale of EIS,,I < for all n and
if

(2) E(Sn+iI9'n)?Sn a.s. foralln.

In the most common situation Y. = a [S. , ... , S,], and we omit 'n and refer
only to S. We define

(3) Mn' max Sk •


iss

Proposition 1. Suppose the range of all S n 's is contained in an interval on


which the function h is convex, and suppose Elh(S,)I <oo for all n.

(i) If (Sn , Yn ) is a martingale, then (h(S,), 9'n ) is a submartingale.


(ii) If (Sn , Yn ) is a submartingale and h is also 7, then (h(S,), .fi n ) is a
submartingale.

Proof. (i) Using Jensen's inequality in the first step we have

(a) E(h(Sn+l) 1 9 'n) ? h(E(Sn+, 1 9'n)) = h(Sn ) a.s.

(ii) For increasing h, the equality in (a) becomes an inequality. ❑


870 INEQUALITIES ANIS MISCELLANEOUS

Exercise 1. If (S„, fin ) and (S*n, fin ) are submartingales, then (Sn v S*, ."„)
is a submartingale.

Proposition 2. Let S,, be adapted to gn with EIS„ <x for all n. Then (Sn , El,,)
is a martingale (submartingale) if and only if for every n > m and every A E En
we have

(4) J S dP = S,„ dP.


n
A O fA
Proof. By definition of conditional expectation we have

J Sn dP=f E (Sn S„) dP =


A A (
J S _, dP.
A
n

The rest is a trivial inductive type of argument. ❑

Inequality 1. (Doob) Let (S„, Y.) be a submartingale. Then for all A >0

(5) AP(Mn >_ A) <_ Sn dP <_ ES+n <— EIS n I.


ft M^^A]
Proof. Define r to be the minimum k not exceeding n for which S k >_ A, and
let r equal n + 1 if no such k exists. Clearly S,. is a rv, and

AP(M,,—A)<— J SrdP
[M„ A]

J
= k=1 fwnz-A and r=k]
SkdP= Y J
k=1 [r=k]
SkdP

Sn dP by the previous proposition


k=1 [T=k]

Sn dP. ❑
[M„zA]
The previous inequality of Doob is a generalization of Kolmogorov's
inequality. The next inequality extends this to a type of Skorokhod inequality.

Inequality 2. (Brown) Let (S n , En ) be a submartingale. Then for all 0< c < 1

(6) P(M,?A)< S„dP /[(1 —c)a] for allA>0


f
ES ^ >cAl
MARTINGALE INEQUALITIES 871

Proof. Now using Doob's inequality (Inequality 1)

AP(Mn?A)<—f SndP
CM^^A]

=
f
CM,ztA and S,,>cA]
SndP+
f
t M,aAandS, cA]
SndP

f
C&>CA]
Sn dP+cAP(MM >_A).

One step of simple algebra completes the proof. ❑

Remark 1. If (Sn , .9n ) is a submartingale, then so is (exp (rSn ),.n ) for any
r>0. Applying Doob's inequality (Inequality 1) yields

(7) P(Mn>_,i)<—inf Eers^/e + r" for all A>0.


r>0

This frequently is sharper than Brown's (1971) inequality; but Brown's


inequality does not require the existence of a moment generating function.

Exercise 2. (Doob) If (Se , 9') is a submartingale of rv's satisfying S n >_ 0,


then

(1+E(S.(logSn ) + ))
eel ifp=1
(8) EMn<_ / \p
1 p f ES„ ifp>1.
\p - 1/

(See Doob, 1953, p.317); see also Gut, 1975.)

The following generalizations of the Häjek-Renyi inequality are found in


Birnbaum and Marshall (1961).

Inequality 3. (Birnbaum and Marshall) Let S k be adapted to & k and suppose

(9) E(jS k _ I )>_B k ISk _,I a.s. for l<—k<_n

(where So = 0) with 0 <_ 9 k <_ 1 for all k Let

(10) b,>...>bn?0=bn+l•

Let r? 1. Let

(11) Mn = max b k lSk i.


lsksn
872 INEQUALITIES AND MISCELLANEOUS

Then for all A >0

n /

( 12 ) P(M n ^ A) C (bk — 0 k+l bk +l )EI Skl r/^ r


1

(13) —: bk[EIS k I r —d k _,EISk - l Ir]/Ar•

Proof. Since the given condition implies E(ISkJ'iS'°k_,)>_ BkISk _,I' for any
r ? 1 by Jensen's inequality applied to the convex function x', without loss of
generality we set r = 1 and assume all S k >_ 0. Let

(a) Ak = [max bj Sj <A <_ b k Sk ].


1sj<k

For j > k we have

Sj dP = J E(Sj I &j -,) dP> 0j Sj _, dP


fA k '4 k Ak J
j
(b) ? • • •
j1
i=k+1
o )

Ak
dP.

And since
n j k
(c) Y_ (bj — 0j +1 bj+ ,) I II O i = b k provided fj 0 = 1, ;

j=k i=k+1 I i=k+1

we have

AP(MM >_,i) =,1 P(A k )^ b k Sk dP using (a)


fAk

_ E Y, (bj—Bj+lbj Sk dP by(c)
k=1j=k -n ei /
+l)\ i=k+t f1 k

(bj — 0j +lbj +l) J S1 dP by (b)


k=1j=k Ak

n j
(bj _0j±1 bj±1 ) J S1 dP
j=1 k=1 Ak

(d) _ Y- (bj — ej+1bj +l) Sj dP


j= 1 MJ^AI fr
n n

(e) L (bj -0j+lbj+1)ESj = Y- bj[ESj - 0j_,ESj_,].


j= 1 j =I

MARTINGALE INEQUALITIES 873

This completes the proof. Unfortunately, there seems to be no way to exploit


(d). ❑

Exercise 3. (Häjek and Renyi inequality) Let (IS k I, 9k ), 1 <_ k <_ n be a


submartingale with ESk = 0 for all k Let ck = Var [S k — Sk _,], where So = 0.
Let b m ? • • • >b, for some 1 <_ m <_ n. Show that for all A>0

(14) P(mk
max bkISkJ ^ A) ^ [b m J Qk+ bkQk //i2.
1 m+l ]

State and prove the analog for a reverse martingale (see the next section).

Let S,, t ? 0 be a collection of rv's and let be an increasing collection


of o--fields for which S, is S°,-measurable. Then (S,, .9,), t ^ 0, is called a
martingale (or submartingale) if EIS,I <oo for all t and if
(15) E(S,IS°S )^= A SS a.s. forallst.

Inequality 4. (Birnbaum and Marshall) Let (IS,I, .9',), 0<_ t <_ 0, be a submar-
tingale whose sample paths are right (or left) continuous. Suppose S(0) =0
and v(t) = ES 2 (t) < co on [0, 0]. Let q>0 be an 14 right- (or left-) continuous
function on [0, 0]. Then

(16) P( sup IS(t)Ilq(t)^ 1 )^ [q(t)] -2 dv(t).


0<-t. o
B
J 0

Proof. By the right (left) continuity of the sample paths and S(0) =0

p= P( sup IS(t)I/q(t)^ 1)
0<t O

= P( max IS(Oi/2")I/q(Oi/2") ^ 1 for all n> 1)


o25r^z"

= lim P( max 1)
n-'= 0^is2

llm
L 1— [E(S2(Oi/2")—S2(0(i-1)/2"))/g2(Oi/2n)]]
z ° 1
=1—li q
m z (ei /2) [v( 0ij 2 ) — v( 8 (i -1 )j 2 )]

J
= 1— e [q(t)] -z dv(t),
0

where the inequality comes from Inequality 3 and the convergence comes from
the monotone convergence theorem. In fact, we only need 5 to be separable
and q to be A. ❑
874 INEQUALITIES AND MISCELLANEOUS

Inequality 5. (Doob) Let (S,.9',), a!5 t ^ b be a submartingale whose


sample paths are right (or left) continuous. Then for all A >0

(17) _
AP(IIS + IIa^A) < SbdP^ESa:EISb.
fill s + IItea]

If, further, S, >_ 0, then


P
(18) E{{SP{^ b < p ESb if p> 1.
p-1

Theorem 1. (Submartingale convergence theorem) Suppose (S,,, EI'„), n >_ 1,


is a submartingale.
(i) If sup„ ES„ <oo, then there exists a ry S such that S„ -^ S, a.s. as n - 00.
Further, E I Sc0I <— sup ,, EIS„ I.
(ii) If the Sn's are uniformly integrable, then (S„, El'„), 1 <— n <_ oo in a submar-
tingale when 9m =o[U' , S„]. If further the S„'s are uniformly integrable,
then EIS„ —S .,I -*0 as n -" •

11. INEQUALITIES FOR REVERSED MARTINGALES

Let S, , S2 ,... be a (finite or infinite) sequence of rv's and let 9' D Y2


be a N sequence of v -fields for which S. is 3„- measurable; we say that S,, is
-

reverse adapted to Y,,. Then (S,,, Y„) is said to be a reverse martingale if


EIS„! <oo for all n and if

(1) E(S„ I S„ + ,) = S„ + , a.s. for all n.

Also, (S„, 9',,) is said to be a reverse submartingale if EIS„) <oo for all n and
if

(2) E(S,,j.'„+,)>S„+, a.s. for all n.

Exercise 1. Suppose the range of all S,,'s is contained in an interval on which


the function h is convex, and suppose EIh(S„)I<oo for all n:

(i) If (S„,.„) is a reverse martingale, then (h(S„), 9„) is a reverse sub-


martingale.
(ii) If (5,,, 9'„) is a reverse submartingale and h is also 7, then (h(S„), .9'„)
is a reverse submartingale.

Exercise 2. Let S. be reverse adapted to .9'„ with EIS„I <co for all n. Then
(S„, ,,) is a reverse martingale (submartingale) if and only if for every m < n
INEQUALITIES FOR REVERSED MARTINGALES 875

and every A E.,, we have

(3) A f
S,„ dP = I S,, dP.
^' A

Inequality 1. Let (Sk , .k ), 1 <_ k <_ n, be a reverse submartingale. Let Mn


max i k . n Sk . Then for all A >0

(4) AP(Mn?A)< _ES, ^EIS1I•


S, dP <
ft M,zA)

The version for S S continuous on [a, b] also holds.


Proof. Define Sk = Sn —k+l and &r = 9n-k+1 for 1: k n. Then (S, S") is
a submartingale. Thus

AP(Mn >_A)=AP( max Sk?A)<


Ik`n f
S*n dP= S, dP
fEA4_-A) M,^A]
E

by Doob's inequality (Inequality A.10.1). 0

Inequality 2. (Birnbaum and Marshall) Let (S„ 9's ), a t ^ b, be a reverse


submartingale whose sample paths are right (or left) continuous. Suppose
S(b) =0 and v(t)=ES 2 (t) on [0,b]. Let q >0 be a \ right- (or left-) con-
tinuous function on [0, b]. Then
j"b
(5) P(IIS/glIe 1) [q(t)] - Zd(—v(t))•

Theorem 1. (Reverse submartingale convergence theorem) Suppose


(Sk , 92k), k >_ 1, is a reverse submartingale for which

(6) lim ESn <oo.

Then the Sn 's are uniformly integrable and there exists a ry S ,,, and a a--field
Y., such that (Sk , .9'k), 1 <_ k <— oo, is a martingale. Finally,

(7) S,,-S,,,
S,,, a.s. as n -+ o0
and

(8) EIS,, —S,.31-+0 as n -oo.


876 INEQUALITIES AND MISCELLANEOUS

12. INEQUALITIES IN HIGHER DIMENSIONS

Let N denote the set of r-tuples k = (k 1 ,. . . , k,) of integers k ; > 0. Let Xk , k


inN',berv's.Wewriteknifk i ^n i for1i<_r.Let1=(1,...,1).Wedefine

(1) S= Y_ Sk and M= max Sk i.


isk<n I`k'511

We begin with an analog of Skorokhod's inequality due to Wichura (1969).

Inequality 1. (Wichura) Let X k , 1 <_ k ^ n, be independent (0, o) rv's. Let


Q 2 =Y- I ^ k ^ n Qk. Let 0<c<1. Then for A>V' Q/[c' 1 (1—c)] we have

(2) P(M,,^a)^2'P(iSn!^c'A).

Proof. We give the proof in dimension r = 2 only; the rest is only notation.
With n = (m, n) and S;; = S(i j) , let

A ;k =[max max IS,, <a<_lSikI]


I<hsi IJ<k

and

Bik—[ISmk—Sikl <(I—c)Ä]•

Now

p=mii P(Bik) =1— m k x P(ISmk — Sikl^( 1— c)A)


ik

Var [Smk — Sik] Q2


>_1—max (1—c) ^2
i,k 2d 2 X12

(a) >Z ifa>_'Jcr/(1—c).

Hence

P( max jSmk 1>cA)^P(UY_AikBik)


1- kn k i

[min P(B;; )]P ( U Y_ A l k) by the events lemma


(Inequality A.7.1)

^P( UY-Aik)/ 2 by(a)


k i

= P(Mn A)/2.
INEQUALITIES IN HIGHER DIMENSIONS 877

So

P(Mn A)<_2P( max IS,nk I>c,1) forA>_fa /(1—c)


Isksn

_4P(Sn >C 2 ,1) forA>v o/(c/(1—c))

by Skorokhod's inequality (Inequality A.2.4). ❑

In cases where Wichura's inequality can be applied to the Tk 's on the rhs
of Shorack and Smythe's (1976) inequality (Inequality 2) below, the result is
an analog of the Häjek and Renyi inequality.
Let bk , k E N', be a set of positive constants such that

(3) Obk?0 forallk?1;

here Obk is the usual r- dimensional differencing around the 2 points of N'
neighboring k which are sk. Define

(4) Sn= Y, Xk,


kn
Yk=Xk/bk, T.
kn
Yk•

Inequality 2. (Shorack and Smythe) If b k > 0 satisfy (3), then for arbitrary
Xk we have

( 5 ) max {ISkI/bk] ^ 2' max Tk j.


k n

If all Tk >_ 0, we may replace 2' by 2' `.


Proof. Define b k and Xk to be zero if some ki =
-

0, 1 <_ i < r. Then

Sk = I bj(AT) =
jsk - -
Y_ (b)=
jsk
(ATj)
- i^j isk

where Tik = Y. Note that each Tik is formed by differencing 2' sums
of the type T. Since

Y_ =1
i5k
(clbi )/bk with each Abi >_0,

we note that (since weighted averages do not exceed the maximum)

Y- (Ab;)jT,,I/bk<_mak IT-iI•
'`
— k -
878 INEQUALITIES AND MISCELLANEOUS

Thus
max Sk I/b k <max Y_ (Ab)I7 I/bk ^ max max I T;k I
kin L'' ‚k k<n isk

<_ 2r ksn
max I Tk 1.

Clearly 2r ' works in case Tk ? 0 for all Ic.


- ❑

13. FINITE-SAMPLING INEQUALITIES

Let a finite population consist of the N values c,, ... , C. Let Y,, ... , Yn
denote a random sample without replacement, and let X 1 , ... , X. denote a
random sample with replacement from the population. Let
1N
2_. I N
(I) µ=— E c; and
N1= 1 N1

Then

(2) EY=EX=µ and Var[Y]=(I-N-1) n ^ n =Var[X].

Inequality 1. (Hoeffding) If f is convex and continuous, then

(3) Eh(2)<_Eh(9).

In particular, then we have

P(.jn(Y-µ)>A)<_inf e -"'E e rc ' µ^


r>0

(4) <_ inf e ' E er•^"cz µ>


-
r>0

Thus any bound on P(V(X-µ)?A) derived via (4) is also a bound on


P(Ii ( Y - µ) >_ A). In particular, then, we have the following corollary.

Corollary 1. The bounds of Bennett's inequality (Inequality A.4.3),


Hoeflding's inequality (Inequality A.4.4) and Bernstein's inequality (Inequality
A.4.5) all hold when X is replaced by Y

14. INEQUALITIES FOR PROCESSES

Suppose X 1 ,. .. , X,, are processes on (D, ). Recall that X is a process on


(D, 9) if and only if each path X ; (w, •) is in D and each X ; (•, t) is a rv. Thus
INEQUALITIES FOR PROCESSES 879
all of the partial sums
k

(1) Sk= Y_ X„ 1 sk<n,

are also processes on (D, 2). Let S0 = O. Also, for any process X on (D, 2),
-

we note that IIXII = lim, n -, max o s 1 ^ 2 m IX(i /2'")I is a rv. Thus all rv's and
probabilities are well defined in what follows. All conclusions claimed so far
are also true on (D R , O R ). We now present generalizations of the inequalities
of Skorokhod and Levy.

Inequality 1. (Skorokhod type) Let X 1 ,. .. , X,, be independent processes


on (D, 2) or (DR, -O R ). For all 0 <c<1 we have

(2) P( ma jSkll^A)^P(IISn cA) /[1 — 1 max P(IISn—SkII>(1—c)A)


I kn

for all A>0.

Proof. Just as for one-dimensional rv's we let

(a) A k =[max IjSill<a<IISkII] for s<k<n.


0^/sk

Then
n
aP( max IISkII^A)^ E P(Ak)P(I1Sn—Sk1Is(1—c)A)
1`-k`-n k=1

n
= Y,= P(Akn[IISn—SkII^(1—c)A])
k1

(b) ^ P (II S II ^ CA ),
where

(c) a= I min n P(IISn — SkII (I — c )a) =1 —^max^P(IISn - Sk II>(1 — c)A).

Dividing through by a in (b) gives the inequality. ❑

We call e a Rademacher ry if P(s = 1) = P(e = -1) = z. If X is any process


on (D, .2) and e is a Rademacher ry independent of X, then EX is a process
on (D, 2) that is symmetric in the sense that EX=-eX. This also holds on
(DR, SR).

Inequality 2. (Levy type) Let X l , ... , X n be independent processes on


(D, 2) or (DR . O R ) that are independent of the iid Rademacher rv's e, , ... , 8n.

880 INEQUALITIES AND MISCELLANEOUS

Then

(3) P/'mk "11 k j X 1 II>A)^2P( 1 >A) forall,l>0.

Proof. Let Sk E X;, and let

(a) Ak=[max IISiII<_A<IISkII]


0si<k
for t:5k<_n.

Just as in the one-dimensional case

n
(b) P(IISrII>A)^ I P(Akn[IISnII>A])•
k=1

Before continuing as in a one-dimensional proof, we let Tm


{i /2 m : 0 < is 2 m } and T = Um=, T. = {the diadic rationals} so that T is count-
ably dense in [0, 1]. We also define IIf II m =sup {I f(t)I: t E Tm }, and note that
IIfIIm -^ If II for all f E D. Whereas IIf11 may not equal f(r) for some re [0, 1]
[it could equal some f(r—)], it is the case that If I m equals f(r) for some r
in Tm . We also define

(c) Akm=[ max IIS; II<A<IISrIIm] and tkm = min {i: Sk (i/2m)>A}.
Irk- n

We are now able to proceed. We go back to (b) and write

(b) P(IISnII>A)? P(Ak^[IIS+nII>A])


k=1
n
_ I lim P( Akmn[II Sn lJm>A])
k= , m-^o'

n 2^'
_ E hm P(Akm n[ II `Sn IIm > A]n[Jkm - 1])
k=1 m-.ao j=0

_
n 2`6 / /
lim n [S (j/ 2m ) ^ Sk(j/ 2m )] n [Jkm — I])
k , m+V J .. Q

n 2'" /

_ Y_ lim Y P(Akmn[Jkm=l])P(Sn(J/2m)—Sk(.J/2m)^O)
k= , m-. J _ 0

2'"
1 n ^n`
lim Y_ P(Akm n [Jkm — .I]) = — L lim P Ak m )
(

k=1 m -i 0O j=0 2 k=1 ^^ - ' O D


i n
(c) =— k Y_ P(A ) = I P( 'maxn II Sk Ii > A ).
k ❑
INEQUALITIES FOR PROCESSES 881
We now let X denote an arbitrary process on (D; 2) or (DR , R ), and let
X* denote an independent copy of X. Then RC S = X — X* is a process symmetric
in the sense that

(4) Xs = X — X* = —X s, when X = X* are independent.

Moreover, we have the highly useful property that

(5) e(X —X *) =X X*
for e a Rademacher ry independent of X and X.
(The one-dimensional argument suffices to make (5) clear. Thus

P(EX'(t)<_x)= P(E (t )<_xande= 1) +P(EXS(t)<_xands= -1)


=P(X'(t)^x and s =1)+P(—X (t)sxande= -1)
= 2 P(X'( t) <_ x)+z P( — X(:) 5 x) = ( 1+2)P(X (t) ^ x)
S

= P(x s (t) <_ x),

as claimed.) Thus if

(6) XX*for1:i<
_n whenX l ,...,X ng X*1 ...,X 9

are independent processes on (D, -9)

and if

(7) e,, ... , e„ are iid Rademacher rv's independent of the X i 's and X 's,

then

t - 1 li - 1 JJJ

Now for any process X, we agree that

(9) EX is defined by (EX)(t)=E(X(t)), and we consider only X for


which EX c D.

Now for any function g we have E IIg — XII a EI g(t) —X(t)I > I E[g(t) —X(t)]I ?
Ig(t)—EX(t)I for all t, so that

(10) Ellg—XII^IIg —EXIL foranyg.


882 INEQUALITIES AND MISCELLANEOUS

Likewise, for any processes X 1 ,.. . , X,, we have

E{ max Ilgk—Xkll}^Eilgk—Xkli>—Ilgk`EXkII
lsk^n

for all k, so that

(11) E{ max Ilgk Xk II}>_{ max Ilgk


— — EXk ll} for any gk 's.
Isk^n I ksn

Letz _ (X,, ... , ^;C n ), EZ = (EX„ ... , EX ), g = (g...... gn), and define

(12)IIIgIII=,mk x ll.

Then (11) can be rewritten as

(13) EIIIg — ZIII ^ I))g — EZ II for any g.

Inequality 3. Let 7L = (X 1 ,. , X n ) as above and let Z* = (X?,.. , X*) be an


independent copy. Suppose

(14) cp >— 0 is convex and 7 on its domain [0, c).

Then

(15) Ev(IIIZ—EZIII)^ E^(IIIZ—Z*III)•

Proof. Now

E^(IIIZ—Z*III) = Ez {EZ. {p(IIIZ—Z*Ifl)IZ}}


(a) ^ Ez{^(Ez•{IIIl —1*III) IZ})}
by Jensen's inequality and convexity

(b) ? Ez{^G(IIIZ —EZ *III)} by (12) and cp >'


= E^v(IIIZ — EZIII) since Z* = 7L

as claimed. O

Inequality 4. Let Z and Z* be as in Inequality 3. Then for any A, 6>0

(16) P(IIIZ ^A+s)P(Iih*III S) P(IIIl Z*III A). —



INEQUALITY FOR PROCESSES 883

Proof. By independence, the product of probabilities on the left-hand side


of (16) can be written as the probability of the intersection. That intersection
is trivially a subset of the event on the right-hand side in (16). ❑

We have followed Marcus and Zinn (1984) throughout this section.


Fernandez (1970) seems to be a starting point.
APPENDIX B

Martingales and
Counting Processes

1. BASIC TERMINOLOGY AND DEFINITIONS

In this section we set forth the notation and basic definitions that will be used
throughout the remainder of Appendix B and in Chapter 7. Additional useful
references are Bremaud (1981), Jacod (1979), Meyer (1976), Dellacherie and
Meyer (1978, 1982), Liptser and Shiyayev (1978), and the survey by Shiryayev
(1981).
Suppose that (11, 9, P) is a fixed complete probability space. A family of
v-fields # = {. , c : t E [0, co)} is a filtration if

(i) c S, fors < t.


(1) (ii) { } is right continuous: f1 7 = S
s>,

(iii) {} is complete: S contains all P-null sets of

The collection (El, 1, P, ) where is a filtration is called a stochastic basis.


A stochastic process X = {X(t): t e [0, cc)} is measurable if it is measurable
as a function (t, w) - X(t, w) with respect to the product o -field 9.4 x ST where
is the Borel o -field on [0, Qo). X is adapted to the filtration or -adapted,
if X(t) is S -measurable for all 0 t<oo. A random variable T taking values
in R+= [0, oo] is called a stopping time with respect to , or an F-stopping
time, if [ T <_ tie for all 0 ^ t < oo.
A process X is integrable if sup o ^, < . EIX(t)I <ao, and uniformly integrable
if lim,A sup o < « , EIX(1)I1 [ x( , )r ^, 1 =0. X is square integrable if X 2 is
integrable.
A process X has a property locally if there exists a localizing sequence of
stopping times { Tk : k >_ 1} such that (i) Tk -+ co a.s. as k -* x, and (ii) the process
X( n Tk ) has the desired property for each k? 1.
884
BASIC TERMINOLOGY AND DEFINITIONS 885

An adapted right-continuous process M with EI M(t) < oo for each 0 <_ t < o0
is a martingale [or, to be explicit, an (F, P) -martingale] if

(2) E(M(t)l,)=M(s) a.s. forall0^s<t<oo.

M is a submartingale if ? holds in (2), and a supermartingale if - holds in


(2). If M on [0, oo) is uniformly integrable, then lim,—,. M(t) = M(x) exists
a.s. and, adjoining S to the stochastic basis, M is a martingale on [0, 00]. In
this case, we say that M(co) closes the martingale.
The following notation will be used for various subspaces of martingales:

(3) .'If[F, P] = the collection of uniformly integrable (F, P) -martingales,


(4) X 2 [F, P] = the collection of square-integrable (F, P) -martingales,
(5) tit `[F P] = { M E .,It [F, P]: M has continuous sample paths a.s.},
(6) 'M ° [F, P] {M E .4i[F, P]: MIN for all bounded N E .AI `[F, P] },

where M 1 N in (6) means that EM(T)N(T) =0 for an arbitrary finite stopping


time T, or equivalently that MN E Al [F, P]..At' [F, P] is called the set of
purely discontinuous martingales.
Several classes built up from J' processes will also appear frequently:

(7) .{F, P] = the collection of integrable, 7, i- adapted processes,

(8) [F, P] = sat + [F, P] — se[F, P]


= the collection of F- adapted processes with
integrable variation,

(9) V+[F, P] = the collection of 7, finite, F -adapted processes,


(10) 'V[F, P] = +[F, P] - '+[F, P]
= the collection of *-adapted processes with finite variations.

If X = '[F, P] is a collection of processes, then °`[F, P] will denote


the set of processes locally in '[F, P], and Xo [F, P] will denote the subset
of '[F, P] such that X(0) = 0 a.s. Thus .tita °`[F P] denotes the collection
of locally square-integrable (F, P) -martingales M with M(0) = 0.
We will use the notations

(11) IjXIIö = X * (t) °supo^,. , X(s)l


interchangeably in this Appendix and Chapter 7.
A process X is predictable if it is measurable with respect to the o -field on
(0, oo) x SZ generated by the collection of adapted processes which are left
886 MARTINGALES AND COUNTING PROCESSES

continuous on (0, oo). Equivalently, it can be shown that X is predictable if


and only if it is measurable with respect to the o --fields on (0, co) x12 generated
by rectangles of the form (s, t] x A, A E g,. A stopping time T is a predictable
stopping time if the process 1( o ,,](T)=l[ T < t ] is a predictable process.
Equivalently, it can be shown that T is a predictable stopping time if and only
if there exists an increasing sequence of stopping times {S k : k> l} such that
Sk- T a.s.; the Sk 's are said to announce T. An .-stopping time T is totally
inaccessible if P(T = S < co) = 0 for every predictable stopping time S.
For any process X E sz?' ° `[^', P] there exists a predictable process X such
that X — X E At' ° `[, , P]. Moreover, this decomposition of X is unique. X = X"
is called the dual predictable projection of X; see, e.g., Jacod (1979), p. 18.
Every local martingale Mc .til' ° `[3F, P] can be uniquely decomposed as the
sum of a continuous local martingale M` E X ` . ' ° `[.W, P] and a purely discon-
tinuous local martingale M d E.yl d.Ioc [.P]: M=M`+M'. For every ME
.« z •'°`[,F P] the process M 2 is a submartingale, and hence the Doob-Meyer
'

decomposition theorem implies the existence of a predictable increasing pro-


cess (M), called the predictable variation process of M such that M 2 —(M)e
.til' ° `[^, P]. For M, N c X 210° [^ P] the predictable covariation process
(M, N), of M and N, is defined by

(12) (M, N)='-,((M+N)—(M — N)).

For any M E til[^ P], define the quadratic variation process

(13) [M](t) (M`)(t)+ E (AM(s)) 2 , OM(s) = M(s) — M(s —).


5th

Although [M] is not in general predictable, [M] = (M) (so that [M]--(M)€
P]), and [M]' /2 E s4' ° `[<°^", P]. Note that [M]—(M)=
_ (,1M) 2 —(M d ). The quadratic covariation process [M, N] is defined by

(14) [M, N]=;([M+N] —EM —N])=(M C, N`)+ Y_ OM(s)AN(s).


sue•

2. COUNTING PROCESSES AND MARTINGALES

A counting process N is an adapted 7, right-continuous, integer-valued process


with N(0) = 0 a.s. and with jumps of size +1 only. Let Ti = inf {t: N(t) = i },
i = 1, 2,... denote the jump times of N (with Ti = oo if the set is empty). Thus
the sample paths of a counting process N are as shown in Figure 1.

Example 1. (Poisson process) A Poisson process t\! with parameter A (so


EN(t)=At for all 0:5 t < oo) is a counting process with iid Exponential (A)
interarrival times Ti — Ti - I , i = 1, 2,... with To = 0. D
COUNTING PROCESSES AND MARTINGALES 887

T 1 T2 T3 14 •0•
Figure 1.

Example 2. (Renewal counting process) If X,, X2 ,... are iid positive ran-
dom variables [so P(X; = 0) = 0] and T; = X, + • .+X,, then N(t)
1 [0• , ] (T; ) is a (renewal) counting process. ❑

Example 3. (Empirical df counting process) If X 1 , X2 ,.. . , X are iid non-


negative rv's with continuous df F, then 101„(t) = E,'_, 1 (X,)= nt„(t) is a

counting process with T = X,,, ; for i = 1, ... , n and Ti = oo for i = n + 1.... .


Note that N. is not a counting process as defined here if F is discontinuous,
since jumps of size >1 (ties) may then occur. El

A multivariate counting process (N 1 ,..., N,) is a finite collection of counting


processes Ni , i = 1, ... , r with the additional restriction that no two processes
N, and N; , i ^ j, can jump at the same time.

Example 4. (Censored data subempirical df counting processes) Suppose


that X 1 ,. . . , X„ are iid nonnegative rv's with continuous df F, and that
Y1 ,..., Y. are iid nonnegative rv's independent of the X's with continuous
df G. Let Z; = X; A Y, S ; = l [x y ], and define

n n

N;(t)= 110 • r](Z1)s1, N(t) = 1 [o.,](Zi)( 1— Sj)

for O<_ t < cc. Then (N, N „) is a bivariate counting process. ❑

An important property of counting processes is that they may always be


decomposed as the sum of a local martingale and an increasing predictable
process. This is the content of the following theorem due to Meyer (1976):

Theorem 1. (Meyer) If N is an (^, P)- counting process, then there exists


an ,' right-continuous, predictable process A with A(0) =0 a.s. such that

(1) M = N —AE.tilo°[_F,Pl .
888 MARTINGALES AND COUNTING PROCESSES

The localizing stopping times may be taken to be any sequence of stopping


times {S„}, S. -> oo a.s. such that EN(S) < oo for n = 1, 2.... .
Proof. See Theorem I.9, p. 256 in Meyer (1976), or Theorem 18.1, p. 239,
Liptser and Shiryayev (1978). ❑

The predictable process A in the decomposition of N = M + A is called the


compensator or dual predictable projection of N. Other commonly used notations
for A are N (Meyer, 1976; Rebolledo, 1979, 1980) and N° (Jacod, 1979, 1980).
It is easily seen that the martingale M of (1) is in fact locally square
integrable (take the stopping times S„ = T„, n ? 1), and hence M Z is a (local)
submartingale. Application of the Doob-Meyer decomposition to this submar-
tingale yields the existence of an 7 right-continuous predictable process (M)
with the property that

(2) M2 — (M) E fä c [.ag, P].

The following theorem shows that for the counting process local martingale
of (1), the predictable variation process (M) can be easily computed from the
compensator A.

Theorem 2. (Boel et al.; Elliot) The predictable variation process (M) of


the locally square-integrable process M of (1) is given by

(3) (M)(t)=
f 0j]
(1—AA)dA, 0<_t<oo.

Proof. See Lemma 18.1.2, p. 269, Liptser and Shiryayev (1978). ❑

A variety of methods are available for determining the compensator A of


a counting process N with respect to a given filtration ; see the discussion
on pp. 12-14 of Gill (1980). When the filtration F is the natural or minimal
filtration ^ N = {^ N: 0 < t < oo} with

^N = o,{N(s): s <— t},


then the following theorem can be used to calculate A in terms of the regular
conditional distributions

(4) F3(t)°P(T1^tI T1-1,...,T,), i =1,2,....

Theorem 3. (Chou and Meyer; Jacod) The compensator A N of the (F N, P)


counting process N is given by

(5) A _ r'
Ar,
COUNTING PROCESSES AND MARTINGALES 889
where

(6) Ai(t)=
f(0 , tATjj
( 1 F; -)
-

(ot7
- 'dFi= J 1 t :..^(T1)
1 -
1
F;(s - )
dFF(s)

-t <oo.
for 0<
Proof. See Theorem 18.2, Liptser and Shiryayev (1978). ❑

Using Theorem 3, it is straightforward to calculate the compensators A AN


for Examples 1-3; Example 4 is treated in Chapter 7.

Example 1. (continued) Here A(t) = At for 0 <_ t < oo and {J(t) -At: t > 0}
is a martingale. ❑

Example 2. (continued) Easy calculation using Theorem 3 yields

(7) A(t)= dF
Yi-F_
i =7 fO,t^Tj—rnT-}

where the X's have common df F. Note that Example 1 is a special


case. ❑

Example 3. (continued) When n = 1, Theorem 3 yields

A,(t)=
I (o , t ^ x , 1
1
1- F
dF =
co,r^
1(s..)(X,) 1
1- F(s -)
dF(s).

Noting that N n is a sum of iid counting processes each having the same
distribution as rk, this yields

( 8 ) An(t) = L l[s,ao)(Xj) 1 dF(s)


;_1 Imt) 1 -F(s-)

J (oI'
(n - N,
1-F-
-) 1 dF.

We conclude this section with several results concerning counting processes


with continuous or deterministic compensators. An 1 -adapted process X
is quasi-left-continuous if

lim X(Sn )=X(lim S n )


n+ 0 n-+m

for any sequence of 7 .-stopping times {Sn } n .,. ❑


890 MARTINGALES AND COUNTING PROCESSES

Proposition 1. If N is an (', P)-counting process with compensator A, then


the following are equivalent:

(i) A is a.s. continuous.


(ii) N is quasi-left-continuous.
(iii) All the jump times of N are totally inaccessible.

Proof. Liptser and Shiryayev (1978), pp. 242-243. ❑

Proposition 2. If the counting process N has deterministic compensator A,


then N has independent increments and

(7) E exp (U(N(t) — N(s)))


= exp [(e'"-1)(A`(t)—A`(s))] fl (1+(e'"—l)QA(u)).
8<U^!

Equivalently,

(8) N=N„+JA

where NA is a nonhomogeneous Poisson process with mean function ENA (t) =


A`(t) —Y- SSf DA(s), 0< t <oo, and JA is a process independent of NA with
independent jumps only at the deterministic discontinuity points of A and

(9) P(AJA(t) =1) = DA(t), 0<_ t <oo.

Proof. Liptser and Shiryayev (1978), p. 279. ❑

3. STOCHASTIC INTEGRALS FOR COUNTING PROCESSES

The first theorem of this section asserts that the martingale property is preserved
for stochastic integrals of predictable processes with respect to counting process
martingales.
Let N be an (, P)-counting process.

Theorem 1. Suppose that M = N — A E ill I o c [^, P] and that H is a predictable


process satisfying I H( 1)1 <‚ and H dA < oo a.s. for all 0 <_ t <. Let

(1) Y(t) J (o,r]


HdM for0 -t<cao

where the integral is an ordinary Lebesgue-Stieltjes integral for each fixed


wE1.
STOCHASTIC INTEGRALS FOR COUNTING PROCEESSES 891
(a) If E(10 IHI dA) <oo, then YE.tif o [. , P].
(b) If P( J(o, , ] I HI dA < oo for all 0< t < oo) =1, then Y E .^Uo °[^, P].
(c) E(Jö H 2 d(M)) = E(J' H 2 (1 —AA) dA) <oo if and only if
(i) P(J (0g IHI dA <co for all 0st<oo) =1 and
(ii) YE P] and has predictable variation process Y given by

(2) (Y)(t) =
1 0,I)
H2d(M)=
f
(o,t]
H2(1—AA)dA —t <oo.
for0<

(d) P(j (01] H 2 d(M) <oo for all 0 s t < oo) = 1 if and only if Y E .ti l. oc[F, P]
and has predictable variation process (Y) given by (2).
(e) If P(5 H 2 d(M) < oo) =1, then P( II YII 0 < öx) = 1 .

Proof. See Theorems 18.7 and 18.8 on pp. 268-270 of Liptser and Shiryayev
(1978); Chapter II of Meyer (1976); and Jacod (1979). Also see the survey of
stochastic integration by Dellacherie (1980). ❑

The second theorem of this section asserts that all the local martingales
associated with a counting process N are of the form (1).
Suppose that N is an (.F, P)- counting process, let W N = {3;': 0< t <oo}
denote the natural or minimal a --fields associated with N, that is,

X,N = o {N(s): s <— t },

and let N = M +A be the decomposition of Theorem B.2.1. Thus M is an


( ‚ N, P) -local martingale.

Theorem 2. If Y is an (M N, P) -local martingale with right-continuous sample


paths, Y E .'M 1 oc[jV NI P], then there exists a predictable process H satisfying

(3) P( IHIdA<ooforall0<t<co) =1
(o,t]

and

(4) Y(t) = Y(0) + I HdM for 0 s t < oo.


J (O,t]

Proof. See Theorem 19.1, p. 282, Liptser and Shiryayev (1978). Results of
this type were given by Davis (1974), Boel et al. (1975a), Chou and Meyer
(1974), Dellacherie (1974), Kabanov (1973), and Grigelionis (1975). ❑
892 MARTINGALES AND COUNTING PROCESSES

4. MARTINGALE INEQUALITIES

An / process Y is an adapted process with / right-continuous sample paths


and Y(0) = 0 a.s. If X is an adapted nonnegative process with right-continuous
sample paths, Y is an 1' process, and EX( T) <_ EY(T) for all stopping times
T, we say that X is dominated by Y.
The first inequality of this section gives an important generalization of
Doob's inequality (Inequality A.10.1).

Inequality 1. (Lenglart) Suppose that X is an adapted nonnegative process


with right-continuous sample paths, that Y is an 1' process, and that X is
dominated by Y.

(i) If Y is predictable, then for all A > 0, rl > 0, and all stopping times T

(1) P(IlXllö?A)<_A E(Y(T)Arl)+P(Y(T)>_ ).

(ii) If IID YII <c, then for all A > 0, 77 > 0, and all stopping times T

( 2 ) 1'(IlXllö''.1)^A E(Y(T) A(77+c))+P(Y(T)^ r► ).

Part (i) of this inequality is due to Lenglart (1977); part (ii) was proved by
Rebolledo (1979).
Proof. We first show that

(a) I'(IlXll0 ?A)^! EY(T).

Let S= inf {s <_ TA n: X(s) >_ A}, T n n if the set is empty. Thus S is a stopping
time and S <_ TA n. Hence

E(Y(T))>_ E(Y(S))
>_ E(X(S)) since X is dominated by Y

'— E(X(S)I[IIXIIö".A7)
(b) ?AP(IIXIIo""^A).
Letting n -> oo in (b) yields (a).
To prove (1), we will in fact show that for A > 0, rt > 0, and all predictable
stopping times S

(c) PQXj ?A)<_ —


E(Y(S—)A77)+P(Y(S—)>_rt).
MARTINGALE INEQUALITIES 893

Then (1) follows from (c) applied to the processes X T = X(. A T) and Y T
Y(• A T) with the predictable stopping time S = oo.
Let R ° inf {t: Y(t) > 77 }. Then R >0 by right continuity of Y and is predict-
able since Y is predictable. Thus R A S is predictable and

P(X *s -^A)=P(Ys-< 17, Xs-^A)+P(Ys-^ 71,X *s -^A)

^ P(1[ rs _<,^1X s -'— A)+ P(Ys- '— n)

`P(X(R„s)--A)+P(YS-^ v1)
since 1 [ ,.s _ < ,, l X *s_ _ X(R„s)-
*

^liminfP(Xs,>_A— E) +P(Ys ->i )

where S,, < R A S, Sn - R A S is a sequence of stopping times,

and [X {RAS) _?A]climinf [Xs.?A —s]

< liminfE(Ys,)+P(Ys->77) by(a)


A — e n^oo

E(YYRAS)-)+P(Y5-^7t)
A —E

E(Ys-^Tt)+^'(1's-gin)
(d) ^^l —s

since Y( RAS)- 1's- A n•

Letting E -*0 in (d) yields (c) which in turn implies (1). The argument for (2)
is similar. ❑

Example 1. Suppose that M = {M(t): t - 0} is a square-integrable martingale


with predictable variation process (M). Thus M 2 —(M) is a martingale, and
EM 2 (T) = E(M)(T) for all stopping times T. Thus the nonnegative process
X ° M 2 is dominated by the 7 predictable process Y= (M). Thus by
inequality (1), for all A > 0, 77 > 0, and stopping times T

(3) P(IIMII0 A)=P(IIM 2 V0 ^A 2 )^Az+P((M)(T)?Tl)•

Passing to the limit on k in P(IIMII0 " T" ?A)=P(IIM('A Tk)II0 ^A)^


rtA - ' + P((M)(T A Tk ) ? 71), we see that (3) holds for all stopping times T when
M is just a locally square-integrable martingale. ❑
894 MARTINGALES AND COUNTING PROCESSES

Inequality 2. (Burkholder-Davis-Gundy) If 1 ^p <co then there exist con-


stants c,, and Cp such that for every local martingale M

(4) cpE[M]"iz_ E IIMIIv -_ Cp E[M]riz

Proof. See Meyer (1976), p. 350 or Jacod (1979), p. 38. El

5. REBOLLEDO'S MARTINGALE CENTRAL LIMIT THEOREM

For M E Rö `[^, P] and e >0 set

(1) aE[M](t)= E IAM(S)II[^oM(S)1>E,,


s t

(2) o.e[M](t) ° IIM(s)I z ltloM(S)I>E},


S5t

(3) AE[M](t)= OM(s) 1 [^nA4cs)l>E3.


s

Also let

(4) I=_A[M]_A6[M]

and

(5) MM—Me.

Then IIOME I) <2e always and IIDM` II s e in the quasi-left-continuous case.


Now suppose that (M) 1 E jj X `(^,,, P) is a sequence of local mar-
tingales defined on a common probability space (Cl, , P), but with different
filtrations .. = {: t e R + }.

Definitions. If (M„), is a sequence of local martingales as above:

Then (M„) nal E fl ,,,, ö `[F., P] satisfies the ARJ(1) condition if, as

(6) ([M„]"z+[M„, M^]*)(t) - p 0 for alle > 0, t E R + .

Then (M„) nz1 satisfies the strong ARJ(1) condition if

(7) äe[M„](t) p0 0
-
for all e > 0, t e R + .

Then (M„) n.1 E fl .,, 40c[, P] satisfies the ARJ(2) condition if, as n - 00,

(8) ((M6) +(Mn, M6.)*)(t) -> P 0 for all €>0, t E R+.


REBOLLEDO'S MARTINGALE CENTRAL LIMIT THEOREM 895

Then (M„) „. 1 satisfies the strong ARJ(2) condition if, as n -4 oo,

(9) ?E[M„](t) -*p0 for ally>0,tER + .

Then (M) 1 satisfies the Lindeberg condition if, as n -* co,

(10) E{o [M„](t)}-^0 for all s >0, tc R.

Proposition 1. (Rebolledo)
(i) For (M„) n .1 E fl n,l 0 ö °[,`n, P] the strong ARJ(1) condition implies
the ARJ(1) condition: (7) implies (6).
(ii) For (M„)„^ 1 E .# 2,
1oc [., n , P] the Lindeberg condition implies

the strong ARJ(2) condition which implies both the ARJ(2) and the strong
ARJ(1) conditions. Also, the ARJ(2) condition implies the ARJ(1) condition
[and, of course, (i) continues to hold]:

^ (8) v
(10)=(9) (6).
y (7) ^

(iii) If (M )>, E tit0I°`[F„, P] and each M. is quasi-left-


continuous, then the strong ARJ(2) and ARJ(2) conditions are equivalent and
the ARJ(2) condition implies the strong ARJ(1) condition:

(10)^(9)^(8)^(7)^(6).

Now let S denote standard Brownian motion on [0, oo).

Theorem I. (Rebolledo) Let (M„)„^, Ejj^ 1 .titö`[3F P] and. suppose that

(11) (M„)„ . 1 satisfies the ARJ(1) condition (6)

and
(12) [M„](t) -* A(t) asn-^coforalltER+

where A is 2' and continuous with A(0) = 0. Then

(13) M ='M =SoA on (D[0,00),2[0,00),d) asn->cc.

Theorem 2. (Rebolledo) Let (M„) „ E fi,, fö'I°°[^„, P] and suppose that

(14) (M,,),,, satisfies the ARJ(2) condition (8)

and either

(15) (M„)(t) -p D A(t) for all tE R +


or
(16) [M„](t) -*A(t) for all tE R+
896 MARTINGALES AND COUNTING PROCESSES

where A is >' and continuous with A(0) = 0. Then (15) and (16) are equivalent
and

(17) M„^M=SoA on (D[O, x), 22[0,ac), d) as n-9.

6. A CHANGE OF VARIABLE FORMULA AND EXPONENTIAL


SEMIMARTINGALES

An adapted process X is a semimartingale if it has a decomposition of the form

(1) X =Xo +M+A,

where X0 is Fo -measurable, M e tf o [P, P], and A e 'VO [P, P]. The decompo-
sition (1) is not unique. The collection of (, P)-semimartingales will be
denoted by ‚9 [^, P].
2

Now suppose that X = (X',.. . ‚X') is a vector of semimartingales (or an


R k - valued semimartingale). Let F: R k -> R' be twice continuously differenti-
able with derivatives DF, i = 1, ... , k, and D'D'F, i, j = 1, ... , k. The following
"change-of-variables” formula generalizes the classical Ito formula [e.g., see
McKean (1969)].

Theorem 1. (Ito; Doleans-Dade and Meyer)

(2) F-X(t)=FoX(0)+1 Y, D'FoX(s—)dX(s)


0 , 11 1=1

k
+ 1 E DD'FoX(s—)d(X",X")(s)
2 (o, , ] t,i=1

+ jFoX(s)—FOX(s—)— D'FoX(s—)AX'(s)^
$ I111 i_I

(3) =F0X(0)+ I D'FoX(s—)dX(s)

k
+ 1 D`D'FoX(s—)d[X`,X'](s)
2 j,,1 i,jt

+ jFoX(s) — FoX(s )— — D'F°X(s )AX'(s )


— —

sir I. i t

n
— 1 Z D'D'F o X(s —)LX'(s)AX'(s) .
2 ^,i =1

Proof. See Meyer (1976), Chapters 3 and 4; or Doleans-Dade and Meyer


(1970). For the continuous case, e.g., Kallianpur (1980). ❑
CHANGE OF VARIABLE FORMULA AND EXPONENTIAL SEMIMARTINGALES 897

The Exponential of a Semimartingale

It is a fact of elementary calculus that the unique solution f of

(4) f(t)= 1+f ,f(s)ds


o

is given by the exponential function

(5) f(t) = exp (t).

The following theorem gives a far-reaching generalization of this result.

Theorem 2. (Doleans-Dade) Let X be a semimartingale. Then there exists


a unique semimartingale Z = i'(X), called the exponential of X, satisfying

(6) Z(i)=1+ ^ Z_dX fora110<t<oo.


(o, r]

It is given by

(7) Z(t)=exp (X(t)-2{X(t) a fl ) (1+t1X(s).)exp(—AX(s)).

Proof. See Doleans-Dade (1970) or Meyer (1976), p. 304ff. The proof uses
the change-of-variable formula (2). ❑

The exponential Z of a semimartingale X "inherits" many of the properties


of X: the next theorem gives a partial list.

Theorem 3. Suppose that X ∈ S[F, P] and Z is given by (7). Then Z ∈ S[F, P]. Moreover:

(i) If X ∈ M_0^{loc}[F, P], then Z ∈ M^{loc}[F, P].
(ii) If X ∈ V[F, P], then Z ∈ V[F, P].
(iii) If X ∈ P ∩ V, then Z ∈ P ∩ V.
(iv) If X is deterministic, then Z is deterministic.

Proof. See Jacod (1979), p. 192. ❑

Proposition 1. If X, Y ∈ S[F, P], then

(8) 𝓔(X)𝓔(Y) = 𝓔(X + Y + [X, Y]).



Example 1. Let F be an arbitrary df on R⁺ = [0, ∞) with F(0) = 0, and let the cumulative hazard function Λ associated with F be defined by Λ(t) = ∫_(0,t] (1 − F_)⁻¹ dF for 0 ≤ t < ∞. Then

(9) F(t) = ∫_(0,t] (1 − F_) dΛ, 0 ≤ t < ∞,

or, equivalently,

(10) 1 − F(t) = 1 − ∫_(0,t] (1 − F_) dΛ.

It follows immediately from Theorem 2 with X = −Λ (deterministic) that

(11) 1 − F(t) = exp(−Λ(t)) ∏_{s≤t} (1 − ΔΛ(s)) e^{ΔΛ(s)}
             = exp(−Λ^c(t)) ∏_{s≤t} (1 − ΔΛ(s)). ❑
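
The identity (11) is easy to verify numerically. The following sketch (ours) builds a purely discrete df F (so that Λ^c = 0), forms the hazard increments ΔΛ = ΔF/(1 − F_), and checks that the product integral ∏(1 − ΔΛ) reproduces 1 − F.

# Numerical check (ours) of (11) for a purely discrete df.
import numpy as np

rng = np.random.default_rng(3)
p = rng.dirichlet(np.ones(20))                 # masses of a discrete df
F = np.cumsum(p)
F_minus = np.concatenate(([0.0], F[:-1]))      # F(t-)

dLam = p / (1 - F_minus)                       # Delta Lambda(t_i)
survival_via_product = np.cumprod(1 - dLam)    # prod_{s<=t} (1 - dLam)

print(np.max(np.abs(survival_via_product - (1 - F))))   # ~ machine precision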

Example 2. (Lemma 18.8, Liptser and Shiryayev, 1978) Suppose that A is an ↗ right-continuous function with A(0) = 0 and let a be a measurable function with ∫_(0,t] |a(s)| dA(s) < ∞ for 0 ≤ t < ∞. Then, by Theorem 2 with X(t) = ∫_(0,t] a(s) dA(s) (deterministic), the unique solution Z of

(12) Z(t) = Z(0) + ∫_(0,t] a Z_ dA

is given by

(13) Z(t) = Z(0) ∏_{s≤t} (1 + a(s)ΔA(s)) exp(∫_(0,t] a dA^c),

where A^c(t) = A(t) − Σ_{s≤t} ΔA(s). ❑
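
Here is a small numerical sketch (ours, with an illustrative constant a and an A having both a continuous part and jumps) checking that the closed form (13) agrees with a direct solution of (12).

# Check (ours) of (13): A(t) = t plus fixed jumps, a(s) = a constant.
import numpy as np

a, Z0, t_end = 0.7, 1.0, 2.0
jump_times = np.array([0.5, 1.2, 1.7])
jump_sizes = np.array([0.3, 0.1, 0.4])         # Delta A at the jump times

# Closed form (13) with A^c(t) = t:
Z_closed = Z0 * np.exp(a * t_end) * np.prod(1 + a * jump_sizes)

# Direct solution of (12): exact exponential flow between jumps,
# multiplicative factor (1 + a*Delta A) at each jump.
Z, prev = Z0, 0.0
for tj, dA in zip(jump_times, jump_sizes):
    Z *= np.exp(a * (tj - prev))               # continuous part on (prev, tj]
    Z *= 1 + a * dA                            # jump of A at tj
    prev = tj
Z *= np.exp(a * (t_end - prev))

print(abs(Z - Z_closed))                       # ~ machine precision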


Example 3. Suppose that S is standard Brownian motion on R⁺ = [0, ∞). Then, with X ≡ rS, r ∈ R, the unique solution Z of (6) is

(14) Z(t) = exp(rS(t) − ½r²t), 0 ≤ t < ∞.

By Theorem 3, Z ∈ M^{loc}[F, P].

Example 4. Let N be a counting process with continuous compensator A and let X = cM ≡ c(N − A) for c > −1. Then the unique solution Z of (6) is given by

(15) Z(t) = exp(c(N(t) − A(t))) ∏_{s≤t: ΔN(s)=1} (1 + c) e^{−c}
          = exp(N(t) log(1 + c) − cA(t))
          = exp(rN(t) − (e^r − 1)A(t)),   r = log(1 + c),
          = exp(rM(t) − (e^r − 1 − r)⟨M⟩(t))

since A continuous implies ⟨M⟩ = A by Theorem B.2.2. Since M ∈ M_0^{loc}[F, P], it follows from Theorem 3 that Z ∈ M^{loc}[F, P]. ❑
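
A Monte Carlo sketch (ours) of this example with N a rate-1 Poisson process, so that A(t) = t: each Z(t) = exp(rN(t) − (e^r − 1)t) should have mean one, as befits a martingale started at 1.

# Monte Carlo check (ours) that E Z(t) = 1 in Example 4 with A(t) = t.
import numpy as np

rng = np.random.default_rng(4)
r, reps = 0.8, 200_000
for t in (0.5, 1.0, 3.0):
    N_t = rng.poisson(lam=t, size=reps)
    Z_t = np.exp(r * N_t - (np.exp(r) - 1.0) * t)
    print(f"t={t}: E Z(t) ~ {Z_t.mean():.4f} (target 1)")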

A Useful Exponential Supermartingale

Let c > 0 and define

(16) φ_c(r) = (exp(rc) − 1 − rc)/c² for 0 ≤ r < ∞.

Proposition 2. If M ∈ M_0^{loc}[F, P] satisfies ‖ΔM‖₀^∞ ≤ c, then the process

(17) Z^r(t) = exp(rM(t) − φ_c(r)⟨M⟩(t))

is a positive supermartingale for all r > 0.

Proof. See Lepingle (1978); for the discrete case, see Neveu (1975) or Freedman (1975). ❑

Inequality 1. If M ∈ M_0^{loc}[F, P] is a local martingale with ‖ΔM‖₀^∞ ≤ c, then for all t > 0, λ > 0, and τ > 0 we have

(18) P(‖M⁺‖₀ᵗ ≥ λ, ⟨M⟩(t) ≤ τ) ≤ exp(−(λ²/2τ) ψ(λc/τ))

and

(19) P(‖M‖₀ᵗ ≥ λ, ⟨M⟩(t) ≤ τ) ≤ 2 exp(−(λ²/2τ) ψ(λc/τ)),

where ψ(x) = 2h(1 + x)/x², h(x) = x(log x − 1) + 1 as in Section 11.1. Hence

(20) P(‖M⁺‖₀ᵗ ≥ λ) ≤ exp(−(λ²/2τ) ψ(λc/τ)) + P(⟨M⟩(t) > τ)

and

(21) P(‖M‖₀ᵗ ≥ λ) ≤ 2 exp(−(λ²/2τ) ψ(λc/τ)) + P(⟨M⟩(t) > τ).

For the discrete case see Freedman (1975), and note that the bound in Freedman's (1.6) may be written as exp(−(a²/2b) ψ(a/b)).
Proof. Now EZ^r(0) = 1 for every r > 0 and

P(‖M⁺‖₀ᵗ ≥ λ, ⟨M⟩(t) ≤ τ)
   = P(M(s) ≥ λ for some 0 ≤ s ≤ t, ⟨M⟩(t) ≤ τ)
   = P(rM(s) − φ_c(r)⟨M⟩(s) ≥ rλ − φ_c(r)⟨M⟩(s) for some s ≤ t, ⟨M⟩(t) ≤ τ)
   ≤ P(exp[rM(s) − φ_c(r)⟨M⟩(s)] ≥ exp(rλ − φ_c(r)τ) for some s ≤ t)
   ≤ 1/exp(rλ − φ_c(r)τ)

by the supermartingale inequality and Proposition 2

(a) = exp(−rλ + τφ_c(r)).

Choosing r = (1/c) log(1 + λc/τ) yields (18). Since −M ∈ M_0^{loc}[F, P] also has jumps bounded by c, (18) holds with ‖M⁻‖₀ᵗ in place of ‖M⁺‖₀ᵗ; combining the two inequalities yields (19). ❑
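
For a feel for the bound, the following sketch (ours, with illustrative parameters) codes ψ and h and compares (20) with simulation in the simplest discrete case: partial sums of independent ±c steps, for which ⟨M⟩(k) = kc² is deterministic, so the choice τ = ⟨M⟩(t) makes the last term of (20) vanish.

# Simulation check (ours) of the exponential bound (20).
import numpy as np

def h(x):   return x * (np.log(x) - 1.0) + 1.0
def psi(x): return 2.0 * h(1.0 + x) / x ** 2

rng = np.random.default_rng(5)
c, n_steps, reps, lam = 0.1, 400, 20_000, 2.0
tau = n_steps * c ** 2                       # <M> at the horizon, exactly

steps = rng.choice([-c, c], size=(reps, n_steps))
max_M = np.cumsum(steps, axis=1).max(axis=1)

empirical = np.mean(max_M >= lam)
bound = np.exp(-(lam ** 2 / (2 * tau)) * psi(lam * c / tau))
print(f"P(sup M >= {lam}) ~ {empirical:.4f} <= {bound:.4f} from (20)")
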
Errata
1 Introduction
Since publication of our book Empirical Processes with Applications to Statistics
in 1986, we have become aware of several mathematical errors and a number of
typographical and other minor errors. Although we would now do many things
differently, our purpose here is only to give corrections of the errors of which we
are currently aware.
We encourage readers finding further errors to let us know of them.
We owe thanks to the following friends, colleagues, reviewers, and users of the
book for telling us about errors, difficulties, and shortcomings: N. H. Bingham,
M. Csörgö, S. Csörgö, Kjell Doksum, Peter Gaenssler, Richard Gill, Paul Janssen,
Keith Knight, Ivan Mizera, D. W. Muller, David Pollard, Peter Sasieni, and Ben
Winter.
We owe special thanks to Peter Gaenssler for providing us with a long list of
typographical errors which provided the starting point for section 3 here.
The corrections of Chapters 7, 21, and 23 given in section 2 were aided by
discussions and correspondence with Richard Gill and Ben Winter (in the case of
Chapter 7), I. Bomze and E. Reschenhofer, and W.D. Kaigh (in the case of Chapter
21), and Keith Knight (in the case of section 23.3).

2 Major changes and revisions


Here we give substantial corrections and revisions of sections 7.3 (pages 304-306), 7.7, 19.4, and 23.3 (pages 767-771).

2.1 Revision and correction of section 7.3

The last two lines (page 305, lines -7 and -6) of the proof of (1) of Theorem 1 (page 304) are false. Hence there are also difficulties in the cases (i)-(v) on pages 305-306. The following revision of section 7.3 should replace that entire section. As indicated in the following text, these results are due to Peterson (1977), Gill (1981), and Wang (1987).

We owe thanks to Richard Gill and Ben Winter for pointing out these difficulties and for correspondence concerning their solution.

Section 7.3, pages 304-306, should be replaced by the following:



3. CONSISTENCY OF Λ̂_n AND 𝔽̂_n

In this section we use the representations of Theorem 7.2.1 and continuity of the product integral map 𝓔 which takes Λ to F (see section B.6 and especially Example B.6.1, page 898) to establish weak and strong consistency of Λ̂_n and 𝔽̂_n. Our first result gives strong consistency of both Λ̂_n and 𝔽̂_n on any interval [0, θ] with 0 ≤ θ < τ ≡ τ_H ≡ H⁻¹(1).

Theorem 1. Suppose that F and G are arbitrary df's on [0, ∞). Recall τ ≡ τ_H ≡ H⁻¹(1) where 1 − H ≡ (1 − F)(1 − G). Then for any fixed θ < τ,

(1) ‖𝔽̂_n − F‖₀^θ →_{a.s.} 0 as n → ∞

and

(2) ‖Λ̂_n − Λ‖₀^θ →_{a.s.} 0 as n → ∞.

The following theorems strengthen (1) of Theorem 1 in different directions.

Theorem 2. Suppose F and G are df's on [0, ∞) with τ ≡ τ_H ≡ H⁻¹(1) satisfying either H(τ−) < 1 or F(τ−) = 1 (where τ = ∞ is allowed). Then

(3) sup_{0≤t<τ} |𝔽̂_n(t) − F(t)| = ‖𝔽̂_n − F‖₀^{τ−} →_{a.s.} 0 as n → ∞,

and, with T ≡ Z_{n:n},

(4) sup_{0≤t≤T} |𝔽̂_n(t) − F(t)| = ‖𝔽̂_n − F‖₀^T →_{a.s.} 0 as n → ∞.

The following theorem is more satisfactory since F and G are completely arbitrary; the price is that the consistency is in probability (and the supremum in (5) is just over the interval [0, τ)).

Theorem 3 (Wang). Suppose that F and G are completely arbitrary. Then

(5) sup_{0≤t<τ} |𝔽̂_n(t) − F(t)| →_p 0 as n → ∞,

and, with T ≡ Z_{n:n},

(6) sup_{0≤t≤T} |𝔽̂_n(t) − F(t)| →_p 0 as n → ∞.

Open Question 1. Does Wang's Theorem 3 continue to hold with →_p replaced by →_{a.s.}? (The hard case not covered by Theorem 2 is F(τ−) < 1, G(τ−) = 1.)
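
Theorem 1 can be watched at work in simulation. The sketch below (ours; the exponential choices for F and G are illustrative) computes the Nelson-Aalen estimator Λ̂_n and the product-limit (Kaplan-Meier) estimator 𝔽̂_n from right-censored data and reports the sup-distance to Λ and F on [0, θ].

# Simulation sketch (ours) of Theorem 1 with F = Exp(1), G = Exp(rate 1/2).
import numpy as np

rng = np.random.default_rng(6)

def km_nelson_aalen(Z, delta):
    """Nelson-Aalen Lambda_n and Kaplan-Meier survival at the ordered Z's."""
    order = np.argsort(Z)
    Z, delta = Z[order], delta[order]
    n = len(Z)
    at_risk = n - np.arange(n)          # number at risk just before each Z_(i)
    dLam = delta / at_risk              # Nelson-Aalen increments
    return Z, np.cumsum(dLam), np.cumprod(1 - dLam)

theta = 1.5
for n in (100, 1000, 10000):
    X = rng.exponential(scale=1.0, size=n)    # survival times ~ F
    Y = rng.exponential(scale=2.0, size=n)    # censoring times ~ G
    Z, delta = np.minimum(X, Y), (X <= Y).astype(float)
    t, Lam_n, S_n = km_nelson_aalen(Z, delta)
    keep = t <= theta
    err_F = np.max(np.abs((1 - S_n[keep]) - (1 - np.exp(-t[keep]))))
    err_L = np.max(np.abs(Lam_n[keep] - t[keep]))   # Lambda(t) = t for Exp(1)
    print(f"n={n:6d}: sup|F_n - F| = {err_F:.4f}, sup|Lam_n - Lam| = {err_L:.4f}")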

Recall that for an arbitrary hazard function Λ (of a df F on R⁺), the product integral (or exponential) map 𝓔(−Λ) recovers 1 − F:

1 − F(t) = 𝓔(−Λ)(t) ≡ ∏_{0<s≤t} (1 − dΛ)
         = exp(−Λ^c(t)) ∏_{0<s≤t} (1 − ΔΛ(s));

see section B.6 and Example B.6.1. Our proofs of Theorems 1-3 will use the following basic lemma, which is due to Peterson (1977), Gill (1981), and, in the present form, Wang (1987).

Lemma 1. (Continuity of the product integral map 𝓔) Suppose that {g_n}_{n≥0} is a sequence of nondecreasing functions on A = [0, τ] or [0, τ) satisfying Δg_0 < 1, and set h_n ≡ 𝓔(−g_n), n = 0, 1, .... If

(7) sup_{t∈A} |g_n(t) − g_0(t)| → 0 as n → ∞,

then

(8) sup_{t∈A} |h_n(t) − h_0(t)| → 0 as n → ∞.
Proof of Theorem 1. Now ‖ℍ_n − H‖ →_{a.s.} 0 by Glivenko-Cantelli, so that ‖ℍ_{n−} − H_‖ →_{a.s.} 0 also. Thus for any fixed t ≤ θ we have a.s. that

|Λ̂_n(t) − Λ(t)| ≤ ∫₀ᵗ |(1 − ℍ_{n−})⁻¹ − (1 − H_)⁻¹| dℍ_n¹ + |∫₀ᵗ (1 − H_)⁻¹ d(ℍ_n¹ − H¹)|

(a)             →_{a.s.} 0 + 0 = 0

by the Glivenko-Cantelli theorem and H(t−) ≤ H(θ) < 1 for the first term, and by the SLLN for the second term. Since Λ̂_n and Λ are ↗, the standard argument of (3.1.83) improves (a) to (2).

But then (1) follows from (2) and continuity of the product integral map 𝓔 given in Lemma 1. ❑
Proof of Theorem 2. First suppose H(τ−) < 1. Then, as in (a) of the proof of Theorem 1,

|Λ̂_n(t) − Λ(t)| ≤ ∫₀ᵗ |(1 − ℍ_{n−})⁻¹ − (1 − H_)⁻¹| dℍ_n¹ + |∫₀ᵗ (1 − H_)⁻¹ d(ℍ_n¹ − H¹)|,

where the first term converges to zero uniformly on [0, τ] by the Glivenko-Cantelli theorem since 1 − H(τ−) > 0. Now the second term: for 0 ≤ t < τ, integration by parts gives

|∫₀ᵗ (1 − H_)⁻¹ d(ℍ_n¹ − H¹)|
   ≤ |(ℍ_n¹(t) − H¹(t))/(1 − H(t−)) − ∫₀ᵗ (ℍ_n¹(s) − H¹(s)) d(1/(1 − H(s−)))|
     + |Δℍ_n¹(τ) − ΔH¹(τ)|/(1 − H(τ−))
   ≤ 2‖ℍ_n¹ − H¹‖₀^τ/(1 − H(τ−)) + |Δℍ_n¹(τ) − ΔH¹(τ)|/(1 − H(τ−))
   →_{a.s.} 0 + 0 = 0,

so the second term converges to zero a.s. uniformly in t ∈ [0, τ]. Hence

(a) ‖Λ̂_n − Λ‖₀^{τ−} = sup_{0≤t<τ} |Λ̂_n(t) − Λ(t)| →_{a.s.} 0.

If ΔΛ(τ) < 1, then (3) follows from Lemma 1. If ΔΛ(τ) = 1 (so F(τ) = 1), then Lemma 1 and (a) imply that

sup_{0≤t<τ} |𝔽̂_n(t) − F(t)| →_{a.s.} 0

and

0 ≤ 1 − 𝔽̂_n(τ) ≤ 1 − ΔΛ̂_n(τ) →_{a.s.} 1 − ΔΛ(τ) = 0 = 1 − F(τ),

so again (3) holds.

Now suppose that F(τ−) = 1. Given ε > 0, choose θ < τ such that F(θ) > 1 − ε. For θ ≤ t < τ both

𝔽̂_n(θ) ≤ 𝔽̂_n(t) ≤ 1 and 1 − ε < F(θ) ≤ F(t) ≤ 1.

Hence

(b) ‖𝔽̂_n − F‖_θ^{τ−} ≤ max{ε, 1 − 𝔽̂_n(θ)} →_{a.s.} max{ε, 1 − F(θ)} = ε

by (1). Since ε is arbitrary, (1) and (b) imply (3) in this case (F(τ−) = 1), too. Since T ≡ Z_{n:n} < τ a.s., (4) follows from (3). ❑
Proof of Theorem 3. We first suppose θ < τ with F(θ−) < 1 and show that

(a) sup_{0≤t<θ} |Λ̂_n(t) − Λ(t)| →_p 0 as n → ∞

and

(b) sup_{0≤t<θ} |𝔽̂_n(t) − F(t)| →_p 0 as n → ∞.

Let 𝔻_n ≡ Λ̂_n − Λ. Then, with T ≡ Z_{n:n}, 𝔻_n^T ≡ {𝔻_n(t ∧ T): t ≥ 0} is a square-integrable martingale with predictable variation process

(c) ⟨𝔻_n^T⟩(t) = ∫₀^{t∧T} (1 − ΔΛ(s)) / (n(1 − ℍ_n(s−))) dΛ(s).

Now

(d) ⟨𝔻_n^T⟩(θ−) →_{a.s.} 0.

To see this, let ε > 0 and choose σ < θ so that Λ(θ−) − Λ(σ) < ε, and hence H(σ) < 1 also. Then

⟨𝔻_n^T⟩(θ−) − ⟨𝔻_n^T⟩(σ) = ∫_{(σ,θ)} 1_{[T≥s]} (1 − ΔΛ(s)) / (n(1 − ℍ_n(s−))) dΛ(s)
                        ≤ Λ(θ−) − Λ(σ) < ε

(since n(1 − ℍ_n(s−)) ≥ 1 for s ≤ T), and, by the Glivenko-Cantelli theorem,

n⟨𝔻_n^T⟩(σ) = ∫₀^σ (1 − ΔΛ(s)) / (1 − ℍ_n(s−)) dΛ(s) →_{a.s.} ∫₀^σ (1 − ΔΛ(s)) / (1 − H(s−)) dΛ(s) < ∞.

Therefore

⟨𝔻_n^T⟩(σ) →_{a.s.} 0

and

limsup_{n→∞} ⟨𝔻_n^T⟩(θ−) ≤ ε a.s.

Since ε > 0 is arbitrary, (d) holds.

By Lenglart's inequality B.4.1,

(e) sup_{0≤t<θ} |𝔻_n^T(t)| →_p 0 as n → ∞.

Since we also have (recall that T ≡ Z_{n:n})

{Λ(θ−) − Λ(T)} 1_{[T<θ]} →_{a.s.} 0 as n → ∞,

in view of F(θ−) < 1, (a) holds.

Now (a) implies that for every subsequence {n′} there is a further subsequence {n″} ⊂ {n′} so that

(f) sup_{0≤t<θ} |Λ̂_{n″}(t) − Λ(t)| →_{a.s.} 0 as n″ → ∞.

But by continuity of 𝓔 given by Lemma 1, it follows from (f) that

(g) sup_{0≤t<θ} |𝔽̂_{n″}(t) − F(t)| →_{a.s.} 0,

and hence (b) holds when F(θ−) < 1.

To complete the proof of (5), it remains only to consider the case F(τ−) = 1. But then (5) follows from (3).

To prove (6), consider the two cases H(τ−) = 1 and H(τ−) < 1. If H(τ−) = 1, then T ≡ Z_{n:n} < τ a.s., and hence (6) follows from (5). If H(τ−) < 1, then (6) follows from (4). ❑
Proof of Lemma 1. By (7) and Δg_0 < 1 we may assume that

(a) Δg_n < 1 for n = 1, 2, ....

Since the g_n are nondecreasing and finite and (a) holds, it is easy to verify that h_n > 0, n = 0, 1, 2, .... For t ∈ A and ε > 0, define (note (B.5.3))

(b) g̃_n(t) ≡ g_n^c(t) − Σ_{s≤t} log(1 − Δg_n(s)) 1_{[|Δg_n(s)|≤ε]}

and

(c) ĝ_n(t) ≡ − Σ_{s≤t} log(1 − Δg_n(s)) 1_{[|Δg_n(s)|>ε]},

so that

(d) g̃_n(t) + ĝ_n(t) = − log h_n(t).

Now ĝ_n(t) is the sum of at most a finite number of terms. Thus by (7), for every ε > 0 with

(e) ε ∈ {a < 1/2 : Δg_0(t) ≠ a for all t ∈ A},

it follows that

(f) sup_{t∈A} |Σ_{s≤t} Δg_n(s) 1_{[|Δg_n(s)|>ε]} − Σ_{s≤t} Δg_0(s) 1_{[|Δg_0(s)|>ε]}| → 0

as n → ∞ and

(g) sup_{t∈A} |ĝ_n(t) − ĝ_0(t)| → 0 as n → ∞.

But note that

|g̃_n(t) − g̃_0(t)|
   ≤ |Σ_{s≤t} {log(1 − Δg_n(s)) + Δg_n(s)} 1_{[|Δg_n(s)|≤ε]}|
     + |Σ_{s≤t} Δg_n(s) 1_{[|Δg_n(s)|>ε]} − Σ_{s≤t} Δg_0(s) 1_{[|Δg_0(s)|>ε]}|
     + |g_n(t) − g_0(t)|
     + |Σ_{s≤t} {log(1 − Δg_0(s)) + Δg_0(s)} 1_{[|Δg_0(s)|≤ε]}|
   ≤ ε(g_n(t) + g_0(t))
     + |Σ_{s≤t} Δg_n(s) 1_{[|Δg_n(s)|>ε]} − Σ_{s≤t} Δg_0(s) 1_{[|Δg_0(s)|>ε]}|
     + |g_n(t) − g_0(t)|.

Therefore, for every ε satisfying (e), (7) and (f) yield

limsup_{n→∞} sup_{t∈A} |g̃_n(t) − g̃_0(t)| ≤ 2εg_0(τ),

and hence, by (d) and (g),

(h) limsup_{n→∞} sup_{t∈A} |log h_n(t) − log h_0(t)| ≤ 2εg_0(τ).

Since ε is arbitrary, (h) implies (8). ❑

2.2 Revision and correction of Section 7.7. Weak convergence of Λ̂_n and 𝕏_n in ‖·/q‖-metrics

On page 325 in Exercise 3, the displayed equation should read as follows:

(1 − K)/(1 − F) = (1 + ∫₀ C dF)

and then "Hence (1 − K)/(1 − F) is ↘."

2.3 Revision and correction of Section 19.4

Page 689, lines 9-12, should be replaced by the following:

|∫₀¹ [1_{[ξ≤t]} − t] J(t) dg(t)|
   ≤ ∫₀^θ tB(t) dg(t) + ∫_θ¹ (1 − t)B(t) dg(t)
   ≤ tB(t)D(t)|₀^θ + ∫₀^θ D(t) d(tB(t)) + (1 − t)B(t)D(t)|_θ¹ + ∫_θ¹ D(t) d((1 − t)B(t))
   ≤ M′[θ(1 − θ)]^{−a} where a < 1/2.

2.4 Revision and correction of Section 23.3. The Shorth

There is an error here in the grouping of the n^{1/6} factor leading to (i) on page 768, and Exercise 1 on page 771 is not correct. The following correction is perhaps the simplest. (A different, somewhat longer correction was suggested to us by Keith Knight. Knight's alternative correction changes the "centering" in the definition of M_n in (6) from 2F⁻¹(1 − α) to 𝔽_n⁻¹(1 − α) − 𝔽_n⁻¹(α).)
Beginning on page 768 just after (g):
Moreover, since g′ exists and is continuous,

sup_{|t|≤K} |g(1 − α + λt/n^{1/3}) n^{1/6} 𝔹_n(1 − α + λt/n^{1/3})
             − g(1 − α) n^{1/6} 𝔹_n(1 − α + λt/n^{1/3})|
   ≤ n^{1/6} sup_{|t|≤K} |g(1 − α + λt/n^{1/3}) − g(1 − α)| · sup_{|t|≤K} |𝔹_n(1 − α + λt/n^{1/3})|
   = n^{1/4} sup_{|t|≤K} |g(1 − α + λt/n^{1/3}) − g(1 − α)| · n^{−1/12} sup_{|t|≤K} |𝔹_n(1 − α + λt/n^{1/3})|
   = o(1) O(1) a.s.
   = o(1) a.s.
Continue on page 769, line 1.

Correction of Exercise 23.3.1, page 771. Replace Exercise 1 by the following:

Exercise 1. Show that for any 0 < K < ∞ and 0 < λ < ∞ and 0 < α < 1 we have

n^{−1/12} sup_{|t|≤K} |𝔹_n(α + tλ/n^{1/3})| = O(1) a.s.

Knight's alternative correction for this section involves the following alternative exercise.

Exercise 1′. Show that for any 0 < K < ∞ and 0 < λ < ∞ and 0 < α < 1 we have

sup_{|t|≤K} n^{1/2} |𝔹_n(α + tλ/n^{1/3}) − 𝔹_n(α)| = O(1) a.s.

3 Typographical errors, spelling errors, and minor changes

Page    Line or equation    Error or change

12      (10)                factor of (−1)^{j+1} is missing
14      (7)                 factor of (−1)^{k+1} is missing
15      (14)+1              Σ₁ Xᵢ² → Σᵢ₌₁ Xᵢ²
16      −1                  (2.2.11) → (2.2.13)
25      Exercise 4          replace G on the LHS by g (lower case)
25      −4                  left-continuous inverse K⁻¹
27      (11)                d(x, y) is the d₀(x, y) of Billingsley (1968, pp. 112-115)
28      (15)                X → x (is needed in 5 places)
29      (18)+1              x → X
29      (18)+5              the set of continuous
30      (5)+1               (s₁ ∧ s₂)(t₁ ∧ t₂ − t₁t₂)
37      (j)+1               5.9 → 9.9
37      (15)−4              5.6 → 9.6
47      −12                 replace "to then" by "then to"
47      Theorem 4           referred to on p. 113 as "Skorokhod's theorem"
47      (16)+1              M S-separable implies M_S is M_B-measurable (cf. lemma in Gaenssler)
49      (24)−1              ‖Z − A_m Z‖ → ‖Z − A_{m,0} Z‖
52      (26)+1              insert "with μ_n([0,1]) → some μ ∈ (0, ∞)"
59      4                   change to: ... independent of S
59      5                   dF(a) → dF(−a)
61      Exercise 8          Keifer → Kiefer
61      −2                  (23) → (30)
63      −8                  (1.1.31) → (0.1.31)
65      12                  (E‖X − Y‖)^{1/p} → (E‖X − Y‖^p)^{1/p}
69      (2)-(3)             b → b_n
70      16                  Brieman → Breiman
73      −8                  limST → RM−Is″I
83      −3                  E|X| = . → E|X| = ∞.
88      (21)                Σ → Σᵢ
90      (33)+1              X twice
90      (35)                = → ≥
90      (35)+1              identify → identity
92      (54)                Σ
112     2                   Lemma 2.3.1 → Lemma 1.3.1
124     −6                  1_{(t−1,t]} → 1_{(t_{i−1},t_i]}(t)


126     7                   V_n → L_n
135     (a)                 EV_n → EV_n
138     −5                  martingale → submartingale
140     (37)                x_{ni} → X_{ni}
150     −8, −9              replace "with m = n" by "with m = m′ = n"
151     (2)                 i = 1
153     −2                  X_{ni} = F_n⁻¹(ξ_{ni}) → X_{ni} = F⁻¹(ξ_{ni})
154     (16)+1              X_{ni} → F⁻¹(ξ_{ni}) again
156     6                   Theorem 1 → Theorem 3
160     (d)                 f → f x″
160     (e)+1               (the second occurrence of a′a) → a′a
163     −2                  1_{[·]} → 1_{[·]}²
168     Corollary 1         F₀ → F
168     (5)−1               3.6 → 3.8
169     1                   Theorem 14.1.4 → Theorem 4.1.1
169     −12                 Theorem 4.1.2 → Theorem 4.1.5
169     −1                  P → P; Theorem 4.1.2 → Theorem 4.1.5
193     −6                  X_n → X_i
195     1                   vector → matrix; constant → constants
195     −10                 1 → X′
224     (32)                G → G²
224     (32)+1              change G_n →_d G to G_n →_d G²
224     (33)                change P(G > λ) to P(G² > λ)
262     (25)+3              dA(x) = → dA(x)
264     (6)                 1_{[X_i ≤ y]} → 1_{[X_{ni} ≤ y]}
264     (6)                 1 ≤ i ≤ n. → 1 ≤ i ≤ n₃
265     (14)                X_i → X_{ni} twice
266     7-8, −2             X_i → X_{ni} throughout
270     (32)−1              change (A.9.6) to (A.9.16)
272     (40)                X_i → X_{ni} twice
273     (1)                 X_i → X_{ni} twice
274     −3                  O(x) = x² → O(x) = x
275     (9)                 ‖·‖₀^∞, ‖·‖ → ‖·‖₀, ‖·‖₀^∞
275     −1                  ∫_ε^∞ → ∫₀^∞
276     (1)                 X_i → X_{ni} (twice)
279     (9)                 N → 𝕂 on RHS
282     (21)                delete nonsymbol before =
288     −4                  [𝕂̂_n − K] → [𝕂_n − K]


294      (4)+3              τ ≡ τ_H = τ_F ∨ τ_G → τ ≡ τ_H = τ_F ∧ τ_G
295      −4                 change (10) to (12)
304-306                     see the major revision in Section 2 of this Errata
323      2                  change "Proof of (10)" to "Proof of (9)"
325      Exercise 3         see the major revision in Section 2 of this Errata
339      −1                 U_n → U_N; n → N
369      (38)               change to n! on the right side of this display
419      (4)                change < ε to < M/√n
424      9                  delete a_n ↓ 0
425      −7                 G⁻¹ → G_n⁻¹
425      (15)+3             Mason (1981) → Mason (1981b)
425      −1                 Mason (1981) → Mason (1981b)
429      (a)                = 0 becomes = 0 a.s.
450      10                 g₂(t) → g₂(·) (at the end of the line)
451      (16)+1             see Bretagnolle and Massart (1989) for P(‖𝕌‖₀^b ≥ λ) ≤ exp(−λ²/(2b(1 − b)))
454      (7)
478      Exercise 4. +1     Anderson's
483      (13)               (2) → () twice
492      −1                 Essen → Esseen
545      (18)               #U_n → U_n#
558      section title      𝕂_n → 𝕂
584      (3)−1              (log₂ n)^{1/4} log n/√n → ((log₂ n)(log n)²/n)^{1/4}
604      (2)+10             n → t
604      (2)+11             t → n
661      5                  Ψ → n twice
661      (5)+1              would → might
661      (9)+1              in the next section → in section 4
662      (12)               = → ≡
688      (1)−1              "since the ..." → "since for the ..."
695      (3)+2              n → n + 1
696      (3)+1              (3.7.4) → (3.6.4)
697      (15)               0 < t < 1
698      (21)               ∫₀^{p_{n,i+1}} → ∫_t^{p_{n,i+1}}
699      (7)                f₀ → ∫₀
731      (22)               f∘F⁻¹ → f∘F⁻¹ p_n(1, c)


732      (10)               log(1 − t) → (log(1 − t)) p_n(1, c)
732      (11)               (1 − 2p_{1,c})
740      (32)               (1 − 2p_{1,c})
746      5                  change to: The definition of S_n is found first in Smirnov (1947); see also Butler (1969).
747      (11)+2             change to: Smirnov (1947) and Butler (1969) give an expression for the exact distribution.
771      2                  t missing just before K
778      (16)+3             Wang → Yang
790      (4)                π → Π
802      (d)                2·1_{[T₂,∞)} → 2 − 1_{[T₂,∞)}
804      (j)                F belongs with S_i and S_m as part of the subscript
813      −4                 by (A.14.7) → by (A.14.8)
819      −3                 (e^x − 1 − x²) → (e^x − 1 − x)
820      9                  (A.14.14) → (A.14.15)
821      −2                 nonidentically → not identically
821      −3                 combinations of → combinations of a function of
844      (6)                2s_n → √s_n
850      3                  γᵢ → γ₁
850      −2                 Mill's → Mills'
851      (5)                exp(−a²/(2M²)) → exp(−a²/(2σ²))
852      (a)                E exp(Σᵢ Xᵢ) → E exp(r Σᵢ Xᵢ)
853      7                  0 < λ < 1 − μ → 0 < λ/√n < 1 − μ
855      (12)               exp(−2λ²/Σᵢ(bᵢ − aᵢ)²) → exp(−2nλ²/Σᵢ(bᵢ − aᵢ)²)
856      (21)+2             Steinback → Steinbach
859      −6                 Rényi → Erdős and Rényi
863      4                  i − 1/m → (i − 1)/m
863      −2                 dt). → dt) be monotone
868      −5                 U + U^{r−i+1} → U + U^{k−i+1}
868      −1                 r → k twice
873      −12                [0, θ] → (0, θ]
873      −2                 5 → S
879      13                 max_{0<j<k} → max_{0≤j≤k}
890      (8)+2              replace A^c(t) ≡ Σ_{s≤t} ΔA(s) by A^c(t) = A(t) − Σ_{s≤t} ΔA(s)
896      (2)                dX → dX^i
896      (3)                dX → dX^i
897      (6)                ∫_{(0,t]} → ∫_{[0,t]}
898      3 and (9)          (0, t] → [0, t]
903      −5                 enchantillon → echantillon
904      4                  Burk → Burke
910      20                 tall → tail
915      −6                 Steinbach → Steinebach
916      −2                 theroy → theory
925      −16, right         Wang → Yang
925      −7, left           Steinbach → Steinebach
929      −7, right          877 → 878
936      9, left            Rebelledo → Rebolledo
938      17, left           676 → 677


4 Accent mark revisions

Page     Line         Error or change

xxxiii   3.8.3        Rényi
xvii     −3           Rényi
16       8            Tusnády
19       −4           Lévy
223      −6, −3       Csörgő
559      4            Csörgő
274      −2           Horváth
492      −6           Horváth
903      −5           non équiréparti
904      4            Horváth
905      −6           Horváth
906      1, 3, 7      Horváth
923      23           Horváth
843      −7           Loève
844      −5           Loève
846      6            Loève
855      −10          Loève
861      1            Loève
913      13           Loève
924      24, left     Loève
924      4, left      Komlós
905      24           Csáki
905      24           Tusnády
908      −23, −21     Rejtő
913      1            Poincaré
270      9            Doléans-Dade
897      7, 12        Doléans-Dade
907      1            Doléans-Dade

5 Solutions of "Open Questions"

Problem      Page    Reference for solution

OQ 9.2.1     353     Götze (1985)
OQ 9.2.2     356     Massart (1990)
OQ 9.8.1     400     Khoshnevisan (1992)
OQ 9.8.2     400     Khoshnevisan (1992)
OQ 9.8.3     400     Bass and Khoshnevisan (1995)
OQ 10.6.1    428
OQ 10.7.1    431
OQ 12.1.1    495
OQ 12.1.3    495     Deheuvels (1998, 1997)
OQ 13.4.1    526     Einmahl and Mason (1988)
OQ 13.5.1    530     Lifshits and Shi (2003)
OQ 13.6.1    530
OQ 14.2.1    544     Deheuvels (1991)
OQ 14.2.2    545     Einmahl and Ruymgaart (1987)
OQ 15.2.1    596
OQ 16.2.1    605
OQ 16.4.1    616     Einmahl and Mason (1988)
OQ 17.2.1    628
OQ 25.3.1    809

REFERENCES

BASS, R. F. AND KHOSHNEVISAN, D. (1995). "Laws of the iterated logarithm for local times of the empirical process," Ann. Probab., 23, 388-399.
BRETAGNOLLE, J. AND MASSART, P. (1989). "Hungarian constructions from the nonasymptotic viewpoint," Ann. Probab., 17, 239-256.
CHERNOFF, H. AND SAVAGE, I. R. (1958). "Asymptotic normality and efficiency of certain nonparametric test statistics," Ann. Math. Statist., 29, 972-994.
CSÖRGŐ, M., CSÖRGŐ, S. AND HORVÁTH, L. (1986). An Asymptotic Theory for Empirical Reliability and Concentration Processes, vol. 33 of Lecture Notes in Statistics, Springer-Verlag, Berlin.
DEHEUVELS, P. (1991). "Functional Erdős-Rényi laws," Studia Sci. Math. Hungar., 26, 261-295.
DEHEUVELS, P. (1997). "Strong laws for local quantile processes," Ann. Probab., 25, 2007-2054.
DEHEUVELS, P. (1998). "On the approximation of quantile processes by Kiefer processes," J. Theoret. Probab., 11, 997-1018.
EINMAHL, J. H. J. AND MASON, D. M. (1988). "Strong limit theorems for weighted quantile processes," Ann. Probab., 16, 1623-1643.
EINMAHL, J. H. J. AND RUYMGAART, F. H. (1987). "The almost sure behavior of the oscillation modulus of the multivariate empirical process," Statist. Probab. Lett., 6, 87-96.
GÖTZE, F. (1985). "Asymptotic expansions in functional limit theorems," J. Multivariate Anal., 16, 1-20.
KHOSHNEVISAN, D. (1992). "Level crossings of the empirical process," Stochastic Process. Appl., 43, 331-343.
LIFSHITS, M. A. AND SHI, Z. (2003). "Lower functions of an empirical process and of a Brownian sheet," Teor. Veroyatnost. i Primenen., 48, 321-339.
MARTYNOV, G. V. (1978). Kriterii omega-kvadrat, Nauka, Moscow.
MASSART, P. (1990). "The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality," Ann. Probab., 18, 1269-1283.
SMIRNOV, N. V. (1947). "Sur un critère de symétrie de la loi de distribution d'une variable aléatoire," C. R. (Doklady) Acad. Sci. URSS (N.S.), 56, 11-14.
WANG, J. G. (1987). "A note on the uniform consistency of the Kaplan-Meier estimator," Ann. Statist., 15, 1313-1316.
References

Aalen, O. (1976). "Nonparametric inference in connection with multiple decrement models,"


Scand. J. Statist., 3, 15-27.
Aalen, O. (1978a). "Nonparametric inference for a family of counting processes," Ann. Statist.,
6, 701-726.
Aalen, O. (1978b). "Weak convergence of stochastic integrals related to counting processes," Z.
Wahrsch. verw. Geb., 38, 261-277.
Aalen, O. and Johansen, S. (1978). "An empirical transition matrix for nonhomogeneous Markov
chains based on censored observations," Scand. J. Statist., 5, 141-150.
Abrahamson, I. G. (1967). "The exact Bahadur efficiencies for the Kolmogorov-Smirnov and
Kuiper one- and two-sample statistics," Ann. Math. Statist., 38, 1475-1490.
Aggarwal, Om P. (1955). "Some minimax invariant procedures for estimating a cumulative
distribution function," Ann. Math. Statist., 26, 450-463.
Albers, W., Bickel, P. J., and van Zwet, W. R. (1976). "Asymptotic expansions for the power of
distribution free tests in the one-sample problem," Ann. Statist., 4, 108-156.
Ale"skjavicene, A. (1977). "Approximation of the distribution of the sum of random variables with
mean small absolute value," Soy. Math.—Dokl., 18, 519-523.
Alexander, K. S. (1984). "Probability inequalities for empirical processes and a law of the iterated
logarithm," Ann. Prob., 12, 1041-1067.
Aly, A. A. (1983). "Strong approximations of the Q-Q process," unpublished technical report,
Department of Mathematics and Statistics, Carleton University.
Andersen, E. S. (1953). "On the fluctuations of sums of random variables, I," Math. Scand., 1,
263-285.
Andersen, E. S. (1954). "On the fluctuations of sums of random variables, II," Math. Scand., 2,
195-223.
Anderson, K. M. (1982). "Moment expansions for robust statistics," Technical Report No. 7,
Department of Statistics, Stanford University.
Anderson, T. W. (1960). "A modification of the sequential probability ratio test to reduce the
sample size," Ann. Math. Statist., 31, 165-197.
Anderson, T. W. and Darling, D. A. (1952). "Asymptotic theory of certain 'goodness of fit' criteria
based on stochastic processes," Ann. Math. Statist., 23, 193-212.
Anderson, T. W. and Darling, D. A. (1954). "A test of goodness of fit," J. Am. Statist. Assoc., 49,
765-769.
Anderson, T. W. and Samuels, S. M. (1967). "Some inequalities among binomial and Poisson
probabilities," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and
Probability, Vol. 1, pp. 1 -12, University of California Press, Berkeley, California.


Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., and Tukey, J. W. (1972).
Robust Estimates of Location, Princeton University Press, Princeton, New Jersey.
Bahadur, R. R. (1966). "A note on quantiles in large samples," Ann. Math. Statist., 37, 577-580.
Bahadur, R. R. (1971). "Some limit theorems in statistics," Regional Conference Series on Applied
Mathematics, Vol. 4, SIAM, Philadelphia, Pennsylvania.
Bahadur, R. R. and Rao, R. R. (1960). "On deviations of the sample mean," Ann. Math. Statist.,
31, 1015-1027.
Bahadur, R. R. and Zabell, S. L. (1979). "Large deviations of the sample mean in general vector
spaces," Ann. Prob., 7, 587-621.
Barbour, A. D. and Hall, Peter (1984). "Stein's method and the Berry- Esseen theorem," Aust. J.
Statist., 26, 8-15.
Barlow, R. E. and Campo, R. (1975). "Total time on test processes and applications to failure
data analysis," Reliability and Fault Tree Analysis, pp. 451-481, SIAM, Philadelphia, Pennsyl-
vania.
Barlow, R. E. and Proschan, F. (1977). "Asymptotic theory of total time on test processes with
applications to life testing," in Multivariate Analysis-IV, P. R. Krishnaiah, ed., pp. 227-237,
North-Holland, New York.
Baum, L., Katz, M., and Stratton, H. (1971). "Strong laws for ruled sums," Ann. Math. Statist.,
42, 625-629.
Baxter, G. (1955). "An analogue of the law of the iterated logarithm," Proc. Am. Math. Soc., 6,
177-181.
Beekman, J. A. (1974). Two Stochastic Processes, Almqvist & Wilksells, Stockholm.
Beirlant, J., van der Meulen, E. C., and van Zuijlen, M. C. A. (1982). "On functions bounding
the empirical distribution of uniform spacings," Z. Wahrsch. verw. Geb., 61, 417-430.
Bennett, G. (1962). "Probability inequalities for the sum of independent random variables," J.
Am. Statist. Assoc., 57, 33-45.
Beran, R. (1977a). "Estimating a distribution function," Ann. Statist., 5, 400-404.
Beran, R. (1977b). "Robust location estimates," Ann. Statist., 5, 431-444.
Bergman, B. (1977). "Crossings in the total time on test pilot," Scand. J. Statist., 4, 171-177.
Berk, R. H. and Jones, D. H. (1979). "Goodness-of-fit statistics that dominate the Kolmogorov
statistics," Z. Wahrsch. verw. Geb., 47, 47-59.
Berning, J. A. (1979). "On the multivariate law of the iterated logarithm," Ann. Prob., 7, 980-
988.
Bickel, P. J. (1967). "Some contributions to the theory of order statistics," Proceedings of the Fifth
Berkeley Symposium Mathematical Statistics and Probability, Vol. 1, pp. 575-591, University
of California Press, Berkeley, California.
Bickel, P. J. (1973). "On some analogues to linear combinations of order statistics in the linear
model," Ann. Statist., 1, 597-616.
Bickel, P. J. and Freedman, D. A. (1981). "Some asymptotic theory for the bootstrap," Ann.
Statist., 9, 1196-1217.
Bickel, P. J. and van Zwet, W. R. (1980). "On a theorem of Hoeffding," in Asymptotic Theory of
Statistical Tests and Estimation, I. M. Chakravarti, ed., pp. 307-324, Academic Press, New
York.
Bickel, P. J. and Wichura, M. J. (1971). "Convergence for multiparameter stochastic processes
and some applications," Ann. Math. Statist., 42, 1656-1670.
Billingsley, P. (1968). Convergence of Probability Measures, Wiley, New York.
Billingsley, P. (1971). "Weak convergence of measures: Applications in probability," Regional
Conference Series on Applied Mathematics, Vol. 5, SIAM, Philadelphia, Pennsylvania.
Billingsley, P. (1979). Probability and Measure, Wiley, New York.
Birnbaum, Z. and Marshall, A. (1961). "Some multivariate Chebyshev inequalities with extensions
to continuous parameter processes," Ann. Math. Statist., 32, 687-703.
Birnbaum, Z. W. (1952). "Numerical tabulation of the distribution of Kolmogorov's statistic for
finite sample sizes," J. Am. Statist. Assoc., 47, 425-441.
Birnbaum, Z. W. (1953). "On the power of a one-sided test of fit for continuous probability
functions," Ann. Math. Statist., 24, 484-489.
Birnbaum, Z. W. and McCarty, R. C. (1958). "A distribution-free upper confidence bound for
P( Y< X) based on independent samples of X and Y," Ann. Math. Statist., 29, 558-562.
Birnbaum, Z. W. and Pyke, R. (1958). "On some distributions related to the statistic D_n," Ann.
Math. Statist., 29, 179-187.
Birnbaum, Z. W. and Tingey, F. H. (1951). "One sided confidence contours for probability
distribution functions," Ann. Math. Statist., 22, 592-596.
Blackman, J. (1955). "On the approximation of a distribution function by an empiric distribution,"
Ann. Math. Statist., 26, 256-267.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables, Wiley, New York.
Blum, J. R. (1955). "On the convergence of empiric distributions," Ann. Math. Statist., 26, 527-529.
Boel, R., Varaiya, P., and Wong, E. (1975a). "Martingales on jump processes, I: Representation
results," SIAM J. Control, 13, 999-1022.
Boel, R., Varaiya, P., and Wong, E. (1975b). "Martingales on jump processes, II: Applications,"
SIAM J. Control, 13, 1022-1061.
Bohman, H. (1963). "Two inequalities for Poisson distributions," Skand. Aktuarietidskr., 46, 47-52.
Bolthausen, E. (1977a). "Convergence in distribution of minimum distance estimators," Metrika,
24,215-227.
Bolthausen, E. (1977b). "A nonuniform Glivenko-Cantelli theorem," unpublished technical
report.
Book, S. A. and Shore, T. R. (1978). "On large intervals in the Csörgö-Revesz theorem on
increments of a Wiener process," Z. Wahrsch. verw. Geb., 46, 1-11.
Boos, D. (1979). "A differential for L-statistics," Ann. Statist., 7, 955-959.
Boos, D. (1982). "A test for asymmetry associated with the Hodges-Lehmann estimator," J. Am.
Statist. Assoc., 77, 647-651.
Borovskih, Ju. (1980). "An estimate of the rate of convergence of the Anderson-Darling criterion,"
Theor. Prob. and Math. Statist., 19, 29-33.
Braun, H. I. (1976). "Weak convergence of sequential linear rank statistics," Ann. Statist., 4,
554-575.
Breiman, L. (1967). "On the tail behavior of sums of independent random variables," Z. Wahrsch.
verw. Geb., 9, 20-25.
Breiman, L. (1968). Probability, Addison-Wesley, Reading, Massachusetts.
Bremaud, J. P. (1981). Point Processes and Queues: Martingale Dynamics, Springer- Verlag, New
York.
Bremaud, J. P. and Jacod, J. (1977). "Processus ponctuel et martingales," Adv. Appl. Prob., 9,
362-416.
Breslow, N. and Crowley, J. (1974). "A large sample study of the life table and product limit
estimates under random censorship," Ann. Statist., 2, 437-453.
Bretagnolle, J. (1980). "Statistique de Kolmogorov-Smirnov pour un échantillon non équiréparti,"
Colloq. internat. CNRS, 307, 39-44.
Breth, M. (1976). "On a recurrence of Steck," J. Appl. Prob., 13, 823-825.
Brillinger, D. R. (1969). "The asymptotic representation of the sample distribution function,"
Bull. Am. Math. Soc., 75, 545-547.

Bronstein, E. M. (1976). "Epsilon-entropy of convex sets and functions," Siberian Math. J., 17,
393-398. (Sibirskii Mat. Zh., 17, 508-514, in Russian.)
Brown, B. (1971). "Martingale central limit theorems," Ann. Math. Statist., 42, 59-66.
Burke, M. D., Csörgö, S., and Horvath, L. (1981). "Strong approximations of some biometric
estimates under random censorship," Z. Wahrsch. verw. Geb., 56, 87-112.
Burke, M. D., Csörgö, M., Csörgö, S., and Revesz, P. (1978). "Approximations of the empirical
process when the parameters are estimated," Ann. Prob., 7, 790-810.
Burkholder, D. L. (1973). "Distribution function inequalities for martingales," Ann. Prob., 1, 19-42.
Butler, C. (1969). "A test for symmetry using the sample distribution function," Ann. Math.
Statist., 40, 2209-2210.
Cantelli, F. P. (1933). "Sulla determinazione empirica delle leggi di probabilita," Giorn. Ist. Ital.
Attuari, 4, 421-424.
Cassels, J. W. S. (1951). "An extension of the law of the iterated logarithm," Proc. Cambridge
Phil. Soc., 47, 55 -64.
Chan, A. H. C. (1977). "On the increments of multiparameter Gaussian processes," Ph.D. Thesis,
Carleton University.
Chandra, M. and Singpurwalla, N. D. (1978). "The Gini index, the Lorenz curve, and the total
time on test transforms," unpublished technical report, T-368, George Washington University,
School of Engineering and Applied Science, Institute for Management Science and Engineering.
Chang, Li-Chien (1955). "On the ratio of the empirical distribution to the theoretical distribution
function," Acta Math. Sinica, 5, 347-368. (English Translations in Selected Transl. Math.
Statist. Prob.), 4, 17-38 (1964).
Chapman, D. (1958). "A comparative study of several one-sided goodness of fit tests," Ann. Math.
Statist. 29 655-674.
Cheng, P. (1962). "Nonnegative jump points of an empirical distribution function relative to a
theoretical distribution function," Acta Math. Sinica, 8, 333-347. [English translation in
Selected Transl. Math. Statist. Prob., 3, 205-224 (1962).]
Chernoff, H. (1952). "A measure of asymptotic efficiency for tests of a hypothesis based on the
sum of observations," Ann. Math. Statist., 23, 493-507.
Chernoff, H., Gastwirth, J., and Johns, M. V. (1967). "Asymptotic distribution of linear combina-
tions of order statistics with applications to estimation," Ann. Math. Statist., 38, 52-72.
Chibisov, D. M. (1964). "Some theorems on the limiting behavior of empirical distribution
functions," Selected Trans!. Math. Statist. Prob., 6, 147-156.
Chibisov, D. M. (1965). "An investigation of the asymptotic power of tests of fit," Theor. Prob.
Appl., 10, 421-437.
Chibisov, D. M. (1969), "Transition to the limiting process for deriving asymptotically optimal
tests," Sankhya, A31, 241-258.
Chou, C. S. and Meyer, P. A. (1974). "La représentation des martingales relatives à un processus
ponctuel discret," C. R. Acad. Sci. Paris A, 278, 1561-1563.
Chow, Y. S. and Lai, T. L. (1978). "Paley type inequalities and convergence rates related to the
law of large numbers and extended renewal theory," Z. Wahrsch. verw. Geb. 45, 1-20.
Chow, Y. S. and Teicher, H. (1978). Probability Theory: Independence, Interchangeability, Martin-
gales, Springer- Verlag, Heidelberg.
Chung, K. L. (1948). "On the maximum partial sums of sequences of independent random
variables," Trans. Am. Math. Soc., 64, 205-233.
Chung, K. L. (1949). "An estimate concerning the Kolmogorov limit distributions," Trans. Am.
Math. Soc., 67, 36 -50.
Chung, K. L. (1974). A Course in Probability Theory, 2nd ed., Academic Press, New York.
Chung, K. L., Erdös, P., and Sirao, T. (1959). "On the Lipschitz condition for Brownian motion,"
J. Math. Soc. Jpn, 11, 263-274.
Clements, G. F. (1963). "Entropies of several sets of real valued functions," Pacific J. Math., 13,
1085-1095.
Cotterill, D. S. and Csörgö, M. (1982). "On the limiting distribution of and critical values for the
multivariate Cramer-von Mises statistic," Ann. Statist., 10, 233-244.
Cramer, H. (1928). "On the composition of elementary errors, Second paper: Statistical applica-
tions," Skand. Aktuarietidskr, 11, 141-180.
Cramer, H. (1962). Random Variables and Probability Distributions, Cambridge University Press,
Cambridge.
Crow, E. and Siddiqui, M. (1967). "Robust estimation of location," J. Am. Statist. Assoc., 62,
353-389.
Csäki, E. (1968). "An iterated logarithm law for semimartingales and its application to empirical
distribution function," Studia Sci. Math. Hung., 3, 287-292.
Csäki, E. (1975). "Some notes on the law of the iterated logarithm for empirical distribution
function," in Colloq. Math. Soc. J. Bolyai, Limit Theorems of Probability Theory, P. Revesz,
ed., Vol. 11, pp. 45-58, North-Holland, Amsterdam.
Csäki, E. (1977). "The law of the iterated logarithm for normalized empirical distribution function,"
Z. Wahrsch. verw. Geb., 38, 147-167.
Csäki, E. (1978). "On the lower limits of maxima and minima of a Wiener process and partial
sums," Z. Wahrsch. verw. Geb., 43, 205-222.
Csäki, E. (1981). "Investigations concerning the empirical distribution function," Selected Transl.
in Math. Statist. and Prob., 15, 229-317.
Csaki, E. and Tusnady, G. (1972). "On the number of intersections and the ballot theorem,"
Periodica Math. Hung., 2, 5-13.
Csörgö, M. (1967). "A new proof of some results of Renyi and the asymptotic distribution of the
range of his Kolmogorov-Smirnov-type random variable," Can. J. Math., 19, 550-558.
Csörgö, M. (1983). "Quantile Processes with Statistical Applications," Regional Conference Series
on Applied Mathematics, Vol. 42, SIAM, Philadelphia, Pennsylvania.
Csörgö, M. and Revesz, P. (1974). "Some notes on the empirical distribution function and the
quantile process," in Colloq. Math. Soc. J. Bolyai, Limit Theorems of Probability Theory, P.
Revesz, ed., Vol. 11, pp. 59-71, North-Holland, Amsterdam.
Csörgö, M. and Revesz, P. (1975). "A new method to prove Strassen -type laws of invariance
principle, I and II," Z. Wahrsch. verw. Geb., 31, 255-260, 261-269.
Csörgö, M. and Revesz, P. (1975). "Some notes on the empirical distribution function and the
quantile process," in Colloq. Math. Soc. J. Bolyai, Limit Theorems of Probability Theory, P.
Revesz, ed., Vol. 11, pp. 59-71, North-Holland, Amsterdam.
Csörgö, M. and Revesz, P. (1978a). "Strong approximations of the quantile process," Ann. Statist.,
6, 882-894.
Csörgö, M. and Revesz, P. (1978b). "How big are the increments of a multiparameter Weiner
process?," Z. Wahrsch. verw. Geb., 42, 1 -12.
Csörgö, M. and Revesz, P. (1981). Strong Approximations in Probability and Statistics, Academic
Press, New York.
Csörgö, M., Csörgö, S., Horvath, L., and Mason, D. M. (1983). "An asymptotic theory for empirical
reliability and concentration processes," unpublished manuscript.
Csörgö, S. (1975). "Asymptotic expansions of the Laplace transform of the von Mises ω² test,"
Theor. Prob. Appl., 20, 158-160.
Csörgö, S. (1976). "On an asymptotic expansion for the von Mises ω²-statistic," Acta Sci. Math.
Hung., 38, 45-67.

Csörgö, S. and Horvath, L. (1983). "The rate of strong uniform consistency for the product-limit
estimator," Z. Wahrsch. verw. Geb., 62, pp. 411 -426.
Csörgö, M., Csörgö, S., Horvath, L., and Mason, D. M. (1984a). "A new approximation of uniform
empirical and quantile processes and some of its applications to probability and statistics,"
Technical Report, Vol. 24, Laboratory for Research in Statistics and Probability, Carleton
University.
Csörgö, M., Csörgö, S., Horvath, L., and Mason, D. M. (1984b). "Applications of a new
approximation of uniform empirical and quantile processes to probability and statistics,"
Technical Report, Vol. 25, Laboratory for Research in Statistics and Probability, Carleton
University.
Daniels, H. E. (1945). "The statistical theory of the strength of bundles of thread," Proc. Roy.
Soc., London Ser. A 183, 405-435.
Darling, D. A. (1953). "On a class of problems relating to the random division of an interval,"
Ann. Math. Statist., 24, 239-253.
Darling, D. A. (1955). "The Cramer-Smirnov test in the parametric case," Ann. Math. Statist.,
26, 1-20.
Darling, D. A. (1957). "The Kolmogorov-Smirnov, Cramer-von Mises tests," Ann. Math. Statist.,
28, 823-838.
Darling, D. A. (1960). "On the theorems of Kolmogorov-Smirnov," Theor. Prob. Appl., 5,356-360.
Darling, D. A. and Erdös, P. (1956). "A limit theorem for the maximum of normalized sums of
independent random variables," Duke Math. J., 23, 143-155.
David, H. A. (1981). Order Statistics, 2nd ed., Wiley, New York.
Davis, M. H. A. (1974). "The representation of martingales of jump processes," research report,
pp. 74/78, Imperial College, London.
de Haan, L. (1975). "On regular variation and its application to the weak convergence of sample
extremes," Mathematical Centre Tract, Vol. 32, Mathematisch Centrum, Amsterdam.
DeHardt, J. (1971). "Generalizations of the Glivenko-Cantelli theorem," Ann. Math. Statist., 42,
2050-2055.
Dellacherie, C. (1972). Capacités et Processus Stochastiques, Springer-Verlag, New York.
Dellacherie, C. (1974). "Intégrales stochastiques par rapport aux processus de Wiener et de Poisson,"
Séminaire de Probabilités VIII, Vol. 381, Springer-Verlag, Berlin.
Dellacherie, C. (1980). "Un survol de la théorie de l'intégrale stochastique," Stoch. Proc. Appl.,
10, 115-144.
Dellacherie, C. and Meyer, P. A. (1978). Probabilities and Potential, I, Hermann/North-Holland,
Paris/Amsterdam.
Dellacherie, C. and Meyer, P. A. (1982). Probabilités et Potentiel, II, Hermann, Paris.
Dempster, A. P. (1959). "Generalized D_n statistics," Ann. Math. Statist., 30, 593-597.
Devroye, L. (1981). "Laws of the iterated logarithm for order statistics of uniform spacings,"
Ann. Prob., 9, 860-867.
Devroye, L. (1982). "Upper and lower class sequences for minimal uniform spacings," Z Wahrsch.
verw. Geb., 61, 237-254.
Diaconis, P. and Freedman, D. (1984). "Asymptotics of graphical projection pursuit," Ann. Statist.,
12, 793-815.
Dobrushin, R. L. (1970). "Describing a system of random variables by conditional distributions,"
Theor. Prob. Appl., 15, 458-486.
Doksum, K. (1974). "Empirical probability plots and statistical inference for nonlinear models
in the two-sample case," Ann. Statist., 2, 267-277.
Doksum, Kjell A. and Sievers, Gerald L. (1976). "Plotting with confidence: Graphical comparisons
of two populations," Biometrika, 63, 421-434.
Doléans-Dade, C. (1970). "Quelques applications de la formule de changement de variables pour
les semimartingales," Z. Wahrsch. verw. Geb., 16, 181-194.
Doléans-Dade, C. and Meyer, P. A. (1970). "Intégrales stochastiques par rapport aux martingales
locales," Séminaire de Probabilités IV, Université de Strasbourg, Vol. 124, pp. 77-107, Springer-
Verlag, Berlin.
Donsker, M. D. (1951). "An invariance principle for certain probability limit theorems," Mem.
Am. Math. Soc., 6, 1-12.
Donsker, M. D. (1952). "Justification and extension of Doob's heuristic approach to the
Kolmogorov-Smirnov theorems," Ann. Math. Statist., 23, 277-281.
Donsker, M. D. and Varadhan, S. R. S. (1977). "On laws of the iterated logarithm for local
times," Commun. Pure Appl. Math., 30, 707-753.
Doob, J. L. (1949). "Heuristic approach to the Kolmogorov-Smirnov theorems," Ann. Math.
Statist., 20, 393-403.
Doob, J. L. (1953). Stochastic Processes, Wiley, New York.
Dudley, R. M. (1966). "Weak convergence of probabilities on nonseparable metric spaces and
empirical measures on Euclidean spaces," Ill. J. Math., 10, 109-126.
Dudley, R. M. (1968). "Distances of probability measures and random variables," Ann. Math.
Statist., 39, 1563-1572.
Dudley, R. M. (1973). "Sample functions of the Gaussian process," Ann. Prob., 1, 66-103.
Dudley, R. M. (1976). "Probabilities and metrics: Convergence of laws on metric spaces, with a
view to statistical testing," Lecture Notes Series, Vol. 45, Matematisk Institut, Aarhus Univer-
sitet.
Dudley, R. M. (1978). "Central limit theorems for empirical measures," Ann. Prob., 6, 899-929.
Dudley, R. M. (1979). "Balls in R^k do not cut all subsets of k + 2 points," Adv. Math., 31, 306-308.
Dudley, R. M. (1981). "Some recent results on empirical processes," Probability in Banach Spaces
III. Lecture Notes in Mathematics, Vol. 860, Springer- Verlag, New York.
Dudley, R. M. (1983). "A course on empirical processes," preprint.
Dudley, R. M. and Philipp, W. (1983). "Invariance principles for sums of Banach space valued
random elements and empirical processes," Z. Wahrsch. verw. Geb., 62, 509-552.
Durbin, J. (1971). "Boundary-crossing probabilities for the Brownian motion and Poisson processes
and techniques for computing the power of the Kolmogorov-Smirnov test," J. Appl. Prob.,
8, 431-453.
Durbin, J. (1973a). "Distribution theory for tests based on the sample distribution function,"
Regional Conference Series in Applied Mathematics, Vol. 9, SIAM, Philadelphia, Pennsylvania.
Durbin, J. (1973b). "Weak convergence of the sample distribution function when parameters are
estimated," Ann. Statist., 1, 279-290.
Durbin, J. and Knott, M. (1972). "Components of Cramer-von Mises statistics, I," J. Roy. Statist.
Soc. B, 34, 290-307.
Durbin, J., Knott, M., and Taylor, C. (1975). "Components of Cramer-von Mises statistics, II,"
J. Roy. Statist. Soc. B, 37, 216-237.
Durbin, J., Knott, M., and Taylor, C. (1977). "Corrigenda to: Components of Cramer-von Mises
Statistics, II," J. Roy. Statist. Soc. B, 39, 394.
Duttweiler, D. L. (1973). "The mean-square approximation error of Bahadur's order statistic
approximation," Ann. Statist., 1, 446-453.
Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956). "Asymptotic minimax character of the sample
distribution functions and of the classical multinomial estimator," Ann. Math. Statist., 27,
642-669.
Dwass, M. (1959). "The distribution of a generalized D_n statistic," Ann. Math. Statist., 30,
1024-1028.

Dwass, M. (1974). "Poisson processes and distribution free statistics," Adv. Appl. Prob., 6, 359-
375.
Efron, B. (1967). "The two-sample problem with censored data," Proceedings of the Fifth Berkeley
Symposium on Mathematical Statistics and Probability, Vol. 4, pp. 831-853, University of
California Press, Berkeley, California.
Efron, B. (1979). "Bootstrap methods: another look at the jackknife," Ann. Statist., 7, 1-26.
Eicker, F. (1979). "The asymptotic distribution of the suprema of the standardized empirical
process," Ann. Statist., 7, 116-138.
Elliott, R. J. (1976). "Stochastic integrals for martingales of a jump process with partially accessible
jump times," Z. Wahrsch. verw. Geb., 36, 213-226.
Epanechnikov, V. A. (1968). "The significance level and power of the two-sided Kolmogorov test
in the case of small samples," Theor. Prob. Appl., 13, 686-690.
Erdös, P. (1942). "On the law of the iterated logarithm," Ann. Math. 43, 419-436.
Fears, T. R. and Mehra, K. L. (1974). "Weak convergence of a two-sample empirical process and
a Chernoff-Savage theorem for φ-mixing sequences," Ann. Statist., 2, 586-596.
Feller, W. (1943). "On the general form of the so-called law of the iterated logarithm," Trans.
Am. Math, Soc., 54, 373-402.
Feller, W. (1946). "A limit theorem for random variables with infinite moments," Am. J. Math.,
68, 257-262.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Wiley, New York.
Ferguson, T. S. (1967). Mathematical Statistics, Academic Press, New York.
Fernandez, P. J. (1970). "A weak convergence theorem for random sums of independent random
variables," Ann. Math. Statist., 41, 710-712.
Finkelstein, H. (1971). "The law of the iterated logarithm for empirical distributions," Ann. Math.
Statist., 42, 607-615.
Földes, A. and Rejtő, L. (1981a). "Strong uniform consistency for nonparametric survival curve
estimators from randomly censored data," Ann. Statist., 9, 122-129.
Földes, A. and Rejtő, L. (1981b). "A LIL type result for the product limit estimator on the whole
line," Z. Wahrsch. verw. Geb., 56, 75-86.
Frankel, J. (1976). "A note on downcrossings for extremal processes," Ann. Prob., 4, 151-152.
Freedman, David (1971). Brownian Motion and Diffusion, Holden-Day, San Francisco.
Freedman, David A. (1975). "On tail probabilities for martingales," Ann. Prob., 3, 100-118.
Gabriel, J. (1977). "Martingales with a countable filtering index set," Ann. Prob., 5, 888-898.
Gänßler, P. (1983). Empirical Processes, 3, IMS Lecture Notes-Monograph Series, Institute of
Mathematical Statistics, Hayward, California.
Gänßler, P. and Stute, W. (1979). "Empirical processes: a survey of results for independent and
identically distributed random variables," Ann. Prob., 7, 193-243.
Galambos, J. (1978). The Asymptotic Theory of Extreme Order Statistics, Wiley, New York.
Ghosh, M. (1972). "On the representation of linear functions of order statistics," Sankhya, A34,
349-356.
Gill, R. D. (1980). "Censoring and stochastic integrals," Mathematical Centre Tract, Vol. 124,
Mathematisch Centrum, Amsterdam.
Gill, R. D. (1983). "Large sample behavior of the product-limit estimator on the whole line,"
Ann. Statist., 11, 49-58.
Gill, R. D. (1984). "Understanding Cox's regression model: a martingale approach," J. Am. Statist.
Assoc., 79, 441-447.
Gine, E. and Zinn, J. (1984). "Some limit theorems for empirical processes," Ann. Prob., 12,
929-989.
Gleser, L. (1975). "On the distribution of the number of successes in independent trials," Ann.
Prob., 3, 182-188.
Glivenko, V. (1933). "Sulla determinazione empirica della legge di probabilita," Giorn. Ist. Ital.
Attuari, 4, 92-99.
Gnedenko, B. V., Koroluk, V. S., and Skorokhod, A. V. (1961). "Asymptotic expansions in
probability theory," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics
and Probability, Vol. 2, pp. 153-170, University of California Press, Berkeley, California.
Gnedenko, B. V. and Korolyuk, V. S. (1961). "On the maximum discrepancy between two empirical
distributions," Selected Transl. Math. Statist. Prob., 1, 13-16.
Gnedenko, B. V. and Mihalevic, V. (1952). "Two theorems on the behavior of empirical distribution
functions," Selected Transl. Math. Statist. Prob., 1, 55-57.
Götze, F. (1979). "Asymptotic expansions for bivariate von Mises functionals," Z. Wahrsch. verw.
Geb., 50, 333-355.
Goldie, C. M. (1977). "Convergence theorems for empirical Lorenz curves and their inverses,"
Adv. Appl. Prob., 9, 765-791.
Goodman, V., Kuelbs, J., and Zinn, J. (1981). "Some results on the LIL in Banach space with
applications to weighted empirical processes," Ann. Prob., 9, 713-752.
Govindarajulu, Z. (1980). "Asymptotic normality of linear combinations of functions of order
statistics in one and several samples," Colloq. Math. Soc. J. Bolyai, Nonparametric Statistical
Inference, Vol. 32, B. V. Gnedenko, M. L. Puri, and I. Vincze, eds., North-Holland, Amsterdam.
Govindarajulu, Z. and Mason, D. (1980). "A strong representation for linear combinations of
order statistics with application to fixed width confidence intervals for location and scale
parameters," Technical Report No. 154, Department of Statistics, University of Kentucky.
Grigelionis, B. (1975). "Random point processes and martingales," Litovsk Matem. Sbornik, XV,
4. (In Russian).
Groeneboom, P. and Shorack, G. R. (1981). "Large deviations of goodness of fit statistics and
linear combinations of order statistics," Ann. Prob., 9, 971-987.
Groeneboom, P., Oosterhoof, J., and Ruymgaart, F. H. (1979). "Large deviation theorems for
empirical probability measures," Ann. Prob., 7, 553-586.
Gut, A. (1975). "On a.s. and r-mean convergence of random processes with an application to
passage times," Z. Wahrsch. verw. Geb., 31, 333-342,
Gut, A. A. (1978). "Moments of the maximum of normal partial sums of random variables with
multidimensional indices," Z. Wahrsch. verw. Geb., 46, 205-220.
Hájek, J. (1960). "On a simple linear model in Gaussian processes," Trans. IInd Prague Conf.
Inf. Theory Stat. Dec. Functions, Random Processes, pp. 185-197.
Hájek, J. (1970). "Discussion of Pyke's paper," in Nonparametric Techniques in Statistical
Inference, M. L. Puri, ed., pp. 38-40, Cambridge University Press, Cambridge.
Hájek, J. (1972). "Local asymptotic minimax and admissibility in estimation," Proceedings of the
Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 175-194,
University of California Press, Berkeley, California.
Hájek, Jaroslav and Šidák, Zbyněk (1967). Theory of Rank Tests, Academic Press, New York.
Hall, W. J. and Loynes, R. M. (1977). "On the concept of contiguity," Ann. Prob. 5, 278-282.
Hall, W. J. and Wellner, J. A. (1979). "Estimation of mean residual life," unpublished technical
report, University of Rochester.
Hall, W. J. and Wellner, J. A. (1980). "Confidence bands for a survival curve from censored
data," Biometrika, 67, 133-143.
Hall, W. J. and Wellner, J. A. (1981). "Mean residual life," in Statistics and Related Topics, M.
Csörgö et at., eds., pp. 169-184, North-Holland, Amsterdam.

Harter, H. L. (1980). "Modified asymptotic formulas for critical values of the Kolmogorov test
statistic," Am. Statist., 34, 110-111.
Hartman, P. and Wintner, A. (1941). "On the law of the iterated logarithm," Am. J. Math., 63,
169-176.
Hewitt, E. and Stromberg, K. (1969). Real and Abstract Analysis, Springer- Verlag, New York.
Hill, D. and Rao, P. (1977). "Tests of symmetry based on Cramer-von Mises Statistics," Biometrika,
64, 489-494.
Hoadley, A. B. (1965). "The theory of large deviations with statistical applications," unpublished
dissertation, University of California, Berkeley, California.
Hoadley, A. B. (1967). "On the probability of large deviations of functions of several empirical
cdrs," Ann. Math. Statist., 38, 360-381.
Hodges, J. L. and Lehmann, E. L. (1963). "Estimates of location based on rank tests," Ann. Math.
Statist., 34, 598-611.
Hoeffding, W. (1956). "On the distribution of the number of successes in independent trials,"
Ann. Math. Statist., 27, 713-721.
Hoeffding, W. (1963). "Probability inequalities for sums of bounded random variables," J. Am.
Statist. Assoc., 58, 13-30.
Holst, Lars and Rao, J. S. (1981). "Asymptotic spacings theory with applications to the two-sample
problem," Can. J. Statist., 9, 79-89.
Hu, Inchi (1985). "A uniform bound for the tail probability of Kolmogorov-Smirnov statistics,"
Ann. Statist., 13.
Iman, R. (1982). "Graphs for use with the Lilliefors test for normal and exponential distributions,"
Am. Statist., 36, 109-112.
Ito, K. and McKean, H. P. (1974). Diffusion Processes and Their Sample Paths, Springer- Verlag,
Berlin. Second Printing, Corrected.
Jackson, O. (1967). "An analysis of departures from the exponential distribution," J. Roy. Statist.
Soc. B 29, 540-549.
Jacobsen, Martin (1982). "Statistical analysis of counting processes," Lecture Notes in Statistics,
Vol. 12, Springer- Verlag, New York.
Jacod, J. (1975). "Multivariate point processes: Predictable projection, Radon-Nikodym deriva-
tives, representation of martingales," Z. Wahrsch. verw. Geb., 31, 235-253.
Jacod, J. (1979). "Calcul stochastique et problemes de martingales," Lecture Notes in Mathematics,
Vol. 714, Springer- Verlag, Berlin.
Jaeschke, D. (1979). "The asymptotic distribution of the supremum of the standardized empirical
distribution function on subintervals," Ann. Statist., 7, 108-115.
Jain, N. and Pruitt, W. (1975). "The other law of the iterated logarithm," Ann. Prob., 3, 1046-1049.
Jain, N. C. and Marcus, M. B. (1975). "Central limit theorems for C(S)-valued random variables,"
J. Funct. Anal., 19, 216-231.
Jain, N. C., Jogdeo, K., and Stout, W. F. (1975). "Upper and lower functions for martingales
and mixing processes," Ann. Prob., 3, 119-145.
James, B. R. (1971). "A functional law of the iterated logarithm for weighted empirical distribu-
tions," Ph.D. dissertation, Department of Statistics, University of California at Berkeley,
Berkeley, California.
James, B. R. (1975). "A functional law of the iterated logarithm for weighted empirical distribu-
tions," Ann. Prob., 3, 762-772.
Johnson, B. McK. and Killeen, T. (1983). "An explicit formula for the cdf of the L₁ norm of the
Brownian bridge," Ann. Prob., 11, 807-808.
Johns, M. V. (1974). "Nonparametric estimation of location," J. Am. Statist. Assoc., 69, 453-460.

Jurečková, J. (1969). "Asymptotic linearity of a rank statistic in regression parameter," Ann. Math.
Statist., 40, 1889-1900.
Jurečková, J. (1971). "Nonparametric estimate of regression coefficients," Ann. Math. Statist., 42,
1328-1338.
Kabanov, Yu. M. (1973). "Representation of the functionals of Wiener processes and Poisson
processes as stochastic integrals," Teor. Veroyatnost. i Primenen., XVIII, 376-380.
Kac, M. (1949). "On deviations between theoretical and empirical distributions," Proc. Natl.
Acad. Sci. USA, 35, 252-257.
Kac, M. and Siegert, A. J. F. (1947). "An explicit representation of a stationary Gaussian process,"
Ann. Math. Statist., 18, 438-442.
Kac, M., Kiefer, J., and Wolfowitz, J. (1955). "On tests of normality and other tests of goodness of fit based on distance methods," Ann. Math. Statist., 26, 189-211.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data, Wiley,
New York.
Kale, B. K. (1969). "Unified derivation of tests of goodness of fit based on spacings," Sankhya, A31, 43-48.
Kallianpur, Gopinath (1980). Stochastic Filtering Theory, Springer-Verlag, New York.
Kanwal, R. (1971). Linear Integral Equations: Theory and Technique, Academic Press, New York.
Kaplan, E. L. and Meier, P. (1958). "Nonparametric estimation from incomplete observations,"
J. Am. Statist. Assoc., 53, 457-481.
Karlin, Samuel and Taylor, Howard M. (1975). A First Course in Stochastic Processes, 2nd ed.,
Academic Press, New York.
Kaufman, R. and Philipp, W. (1978). "A uniform law of the iterated logarithm for classes of
functions," Ann. Prob., 6, 930-952.
Khmaladze, E. V. (1981). "Martingale approach in the theory of goodness-of-fit tests," Theor.
Prob. Appl., 26, 240-257.
Kiefer, J. (1961). "On large deviations of the empiric d.f. of vector chance variables and a law of the iterated logarithm," Pacific J. Math., 11, 649-660.
Kiefer, J. (1967). "On Bahadur's representation of sample quantiles," Ann. Math. Statist., 38,
1323-1342.
Kiefer, J. (1969). "On the deviations in the Skorokhod-Strassen approximation scheme," Z. Wahrsch. verw. Geb., 13, 321-332.
Kiefer, J. (1970). "Deviations between the sample quantile process and the sample df," in
Nonparametric Techniques in Statistical Inference, M. L. Puri, ed., Cambridge University Press,
Cambridge.
Kiefer, J. (1972). "Iterated logarithm analogues for sample quantiles when p_n → 0," Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 227-244, University of California Press, Berkeley, California.
Kiefer, J. and Wolfowitz, J. (1956). "Consistency of the maximum likelihood estimator in the
presence of infinitely many nuisance parameters," Ann. Math. Statist., 27, 887-906.
Kiefer, J. and Wolfowitz, J. (1976). "Asymptotically minimax estimation of concave and convex distribution functions," Z. Wahrsch. verw. Geb., 34, 89-95.
Klass, M. (1976). "Toward a universal law of the iterated logarithm, Part I," Z. Wahrsch. verw. Geb., 36, 165-178.
Kolmogorov, A. N. (1933). "Sulla determinazione empirica di una legge di distribuzione," Giorn.
Ist. Ital. Attuari, 4, 83-91.
Kolmogorov, A. N. and Tihomirov, V. M. (1959). "The epsilon-entropy and epsilon-capacity of sets in functional spaces," Usp. Mat. Nauk, 14, no. 2 (86), 3-86. Am. Math. Soc. Transl., 17, 277-364 (1961).
Komlós, J., Major, P., and Tusnády, G. (1975). "An approximation of partial sums of independent rv's and the sample distribution function, I," Z. Wahrsch. verw. Geb., 32, 111-131.
Komlós, J., Major, P., and Tusnády, G. (1976). "An approximation of partial sums of independent rv's and the sample df, II," Z. Wahrsch. verw. Geb., 34, 33-58.
Kostka, D. (1973). "On Khintchine's estimate for large deviations," Ann. Prob., 1, 509-512.
Kostka, D. (1974). "Lower class sequences for the Skorokhod-Strassen approximation scheme," Ann. Prob., 2, 1172-1178.
Kotel'nikova, V. F. and Chmaladze, E. V. (1983). "On computing the probability of an empirical process not crossing a curvilinear boundary," Theor. Prob. Appl., 27, 640-648.
Koul, H. (1970). "Some convergence theorems for ranks and weighted empirical cumulatives,"
Ann. Math. Statist., 41, 1768-1773.
Koul, H. (1977). "Behaviour of robust estimators in the regression model with dependent errors,"
Ann. Statist., 5, 681-699.
Koul, H. and Staudte, R. (1972). "Weak convergence of weighted empirical cumulatives based
on ranks," Ann. Math. Statist., 43, 832-841.
Koul, H. and DeWet, T. (1983). "Minimum distance estimation in a linear regression model,"
Ann. Statist., 11, 921-932.
Koziol, J. (1980a). "On a Cramer-von Mises type statistic for testing symmetry," J. Am. Statist.
Assoc., 75, 161-167.
Koziol, J. A. (1980b). "A note on limiting distributions for spacings statistics," Z. Wahrsch. verw.
Geb., 51, 55-62.
Kuelbs, J. and Dudley, R. M. (1980). "Log log laws for empirical measures," Ann. Prob., 8, 405-418.
Kuelbs, J. and Philipp, W. (1980). "Almost sure invariance principles for partial sums of mixing B-valued random variables," Ann. Prob., 8, 1003-1036.
Kuiper, N. H. (1960). "Tests concerning random points on a circle," Proc. Kon. Akad. Wetensch.,
A63, 38-47.
Kunita, H. and Watanabe, S. (1967). "On square-integrable martingales," Nagoya Math. J., 30,
209-245.
Lai, T. L. (1974). "Convergence rates in the strong law of large numbers for random variables
taking values in Banach spaces," Bull. Inst. Math. Acad. Sinica, 2, 67-85.
Lauwerier, H. A. (1963). "The asymptotic expansion of the statistical distribution of N. V.
Smirnov," Z. Wahrsch. verw. Geb., 2, 61-68.
Le Cam, L. (1958). "Un theoreme sur la division d'un intervalle par des points pris au hasard," Pub. Inst. Statist. Univ. Paris, 7, 7-16.
Le Cam, L. (1969). Theorie Asymptotique de la Decision Statistique, Les Presses de l'Universite de
Montreal.
Le Cam, L. (1972). "Limits of experiments," Proceedings of the Sixth Berkeley Symposium on
Mathematical Statistics and Probability, Vol. 1, pp. 245-261, University of California Press,
Berkeley, California.
Le Cam, L. (1979). "On a theorem of J. Hajek," in Contributions to Statistics: Jaroslav Hajek Memorial Volume, J. Jureckova, ed., pp. 119-135, Reidel, Dordrecht.
Le Cam, L. (1982). "Limit theorems for empirical measures and poissonization," in Statistics and Probability: Essays in Honor of C. R. Rao, P. R. Krishnaiah and J. K. Ghosh, eds., North-Holland, Amsterdam.
Le Cam, L. (1983). "A remark on empirical measures," in A Festschrift for Erich L. Lehmann in Honor of His 65th Birthday, P. Bickel et al., eds., Wadsworth, Belmont, California.
Lehmann, E. L. (1947). "On optimum tests of composite hypotheses with one constraint," Ann.
Math. Statist., 18, 473-494.
Lehmann, E. L. (1959). Testing Statistical Hypotheses, Wiley, New York.
Lenglart, E. (1977). "Relation de domination entre deux processus," Ann. Inst. Henri Poincare,
13, 171-179.
Lepingle, D. (1978). "Sur le comportement asymptotique des martingales locales," Seminaire de Probabilites XII, Lecture Notes in Mathematics, Vol. 649, pp. 148-161, Springer-Verlag, Berlin.
Levit, B. Ya. (1978). "Infinite-dimensional informational lower bounds," Theor. Prob. Appl., 23,
388-394.
Levy, P. (1937). Theorie de l'Addition des Variables Aleatoires, Gauthier-Villars, Paris.
Lewis, P. A. W. (1965). "Some results on tests for Poisson processes," Biometrika, 52, 67-78.
Lilliefors, H. W. (1967). "On the Kolmogorov-Smirnov test for normality with mean and variance
unknown," J. Am. Statist. Assoc., 62, 399-402.
Liptser, R. S. and Shiryayev, A. N. (1978). Statistics of Random Processes II: Applications, Springer-Verlag, New York.
Loeve, M. (1977). Probability Theory, 4th ed., Springer-Verlag, New York.
Lombard, F. (1984). "An elementary proof of asymptotic normality for linear rank statistics,"
unpublished technical report, University of South Africa.
Lorentz, G. G. (1966). "Metric entropy and approximation," Bull. Am. Math. Soc., 72, 903-937.
Loynes, R. M. (1980). "The empirical distribution function of residuals from generalized
regression," Ann. Statist., 8, 285-298.
Major, P. (1976a). "Approximation of partial sums of independent rv's," Z. Wahrsch. verw. Geb.,
35, 213-220.
Major, P. (1976b). "Approximation of partial sums of iid rv's when the summands have only two moments," Z. Wahrsch. verw. Geb., 35, 221-229.
Major, P. (1978). "On the invariance principle for sums of independent identically distributed
random variables," J. Mult. Anal., 8, 487-501.
Mallows, C. L. (1972). "A note on asymptotic joint normality," Ann. Math. Statist., 43, 508-515.
Malmquist, S. (1950). "On a property of order statistics from a rectangular distribution," Skand.
Aktuarietidskr, 33, 214-222.
Malmquist, S. (1954). "On certain confidence contours for distribution functions," Ann. Math.
Statist., 25, 523-533.
Marcus, M. B. and Zinn, J. (1984). "The bounded law of the iterated logarithm for the weighted empirical distribution process in the non-iid case," Ann. Prob., 12, 335-360.
Marshall, A. W. (1958). "The small sample distribution of nω_n^2," Ann. Math. Statist., 29, 307-309.
Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications, Academic Press, New York.
Mason, D. M. (1981a). "Asymptotic normality of linear combinations of order statistics with a
smooth score function," Ann. Statist., 9, 899-908.
Mason, D. M. (1981b). "Bounds for weighted empirical distribution functions," Ann. Prob., 9,
881-884.
Mason, D. M. (1981c). "On the use of a statistic based on sequential ranks to prove limit theorems
for simple linear rank statistics," Ann. Statist., 9, 424-436.
Mason, D. M. (1982b). "Some characterizations of almost sure bounds for weighted multi-
dimensional empirical distributions and a Glivenko-Cantelli theorem for sample quantiles,"
Z. Wahrsch. verw. Geb., 59, 505-513.
Mason, D. M. (1983). "The asymptotic distribution of weighted empirical distribution functions,"
Stoch. Proc. Appl., 15, 99-109.
Mason, D. M. (1984). "Weak convergence of the weighted empirical quantile process in L_2(0, 1)," Ann. Prob., 12, 243-255.
Mason, D. M. (1984a). "A strong limit theorem for the oscillation modulus of the uniform empirical
quantile process," Stoch. Proc. Appl., 17, 127-136.
Mason, D. M., Shorack, G. R., and Wellner, J. A. (1983). "Strong limit theorems for oscillation
moduli of the uniform empirical process," Z. Wahrsch. verw. Geb., 65, 83-97.
Mason, D. and van Zwet, W. (1985). "A refinement of the KMT inequality for the uniform
empirical process," Universitaet Muenchen Mathematisches Institut technical report no. 30,
1-18.
Massey, F. J. (1950). "A note on the power of a nonparametric test," Ann. Math. Statist., 21,
440-443. [Correction: Ann. Math. Statist. 23, 637-638 (1952).]
McKean, H. P. (1969). Stochastic Integrals, Academic Press, New York.
Mehra, K. and Rao, J. (1975). "Weak convergence of generalized empirical processes relative to d_q under strong mixing," Ann. Prob., 3, 979-991.
Meier, P. (1975). "Estimation of a distribution function from incomplete observations," in
Perspectives in Statistics, J. Gani, ed., pp. 67-87, Academic Press, London.
Meyer, P. A. (1976). "Un cours sur les integrales stochastiques," Seminaire de Probabilites X, Lecture Notes in Mathematics, Vol. 511, pp. 246-400, Springer-Verlag, Berlin.
Michael, J. (1983). "The stabilized probability plot," Biometrika, 70, 11-17.
Michel, R. (1976). "Nonuniform central limit bounds with applications to probabilities of large
deviations," Ann. Prob., 4, 102-106.
Millar, P. W. (1979). "Asymptotic minimax theorems for the sample distribution function," Z.
Wahrsch. verw. Geb., 55, 72-89.
Miller, R. (1981). Survival Analysis, Wiley, New York.
Mogulskii, A. A. (1980). "On the law of the iterated logarithm in Chung's form for functional
spaces," Theor. Prob. Appl., 24, 405-413.
Mogulskii, A. A. (1984). "Large deviations of the Wiener process," in Advances in Probability
Theory: Limit Theorems and Related Problems, A. Borovkov, ed., Optimization Software, Inc.,
New York.
Mogyoródi, J. (1975). "Some inequalities for the maximum of partial sums of random variables," Math. Nach., 70, 71-85.
Müller, D. W. (1968). "Verteilungs-Invarianzprinzipien für das starke Gesetz der grossen Zahl," Z. Wahrsch. verw. Geb., 10, 173-192.
Nagaev, S. V. (1970). "On the speed of convergence in a boundary problem, I and II," Theor. Prob. Appl., 15, 163-186, 403-429.
Nair, V. N. (1981). "Plots and tests for goodness of fit with randomly censored data," Biometrika,
68, 99-103.
Nair, V. N. (1984). "Confidence bands for survival functions with censored data: A comparative
study," Technometrics, 26, 265-275.
Neuhaus, G. (1975). "Convergence of the reduced empirical process for non-i.i.d. random vectors,"
Ann. Statist., 3, 528-531.
Neveu, Jacques (1975). Discrete-Parameter Martingales, North-Holland, Amsterdam.
Niederhausen, H. (1981a). "Sheffer polynomials for computing exact Kolmogorov-Smirnov and Renyi type distributions," Ann. Statist., 9, 923-944.
Niederhausen, H. (1981b). "Tables of significance points for the variance-weighted Kolmogorov-
Smirnov statistics," Technical Report No. 298, Department of Statistics, Stanford University,
Stanford, California.
Noe, M. (1972). "The calculation of distributions of two-sided Kolmogorov-Smirnov type statistics," Ann. Math. Statist., 43, 58-64.
Noe, M. and Vandewiele, G. (1968). "The calculation of distributions of Kolmogorov-Smirnov type statistics including a table of significance points for a particular case," Ann. Math. Statist., 39, 233-241.
O'Reilly, N. E. (1974). "On the weak convergence of empirical processes in sup-norm metrics,"
Ann. Prob., 2, 642-651.
Oodaira, H. (1975). "Some functional laws of the iterated logarithm for dependent random
variables," in Colloq. Math. Soc. J. Bolyai, Limit Theorems of Probability Theory, P. Revesz,
ed., pp. 253-272, North-Holland, Amsterdam.
Oodaira, H. (1976). "Some limit theorems for the maximum of normalized sums of weakly dependent random variables," Proceedings of the Third Japan-USSR Symposium on Probability Theory, Vol. 550, pp. 467-474, Springer-Verlag, New York.
Orlov, A. (1972). "On testing the symmetry of distributions," Theor. Prob. Appl., 17, 357-361.
Owen, D. B. (1962). Handbook of Statistical Tables, Addison-Wesley, Reading, Massachusetts.
Parr, W. C. (1981). "On minimum Cramer-von Mises-norm parameter estimation," Commun.
Statist., A10, 1149-1166.
Parr, W. C. and Schucany, W. R. (1982). "Minimum distance estimation and components of
goodness-of-fit statistics," J. Roy. Statist. Soc. B 44, 178-189.
Parthasarathy, K. R. (1967). Probability Measures on Metric Spaces, Academic Press, New York.
Parzen, E. (1980). "Quantile functions, convergence in quantile, and extreme value distribution
theory," Technical Report B-3, Statistical Institute, Texas A & M University, College Station,
Texas.
Pearson, E. and Hartley, H., eds. (1972). Biometrika Tables for Statisticians, 2, Cambridge University Press, Cambridge.
Penkov, B. I. (1976). "Asymptotic distribution of Pyke's statistic," Theor. Prob. Appl., 21, 370-374.
Peterson, A. V. (1977). "Expressing the Kaplan-Meier estimator as a function of empirical
subsurvival functions," J. Am. Statist. Assoc., 72, 854-858.
Petrov, V. V. (1975). Sums of Independent Random Variables, Springer-Verlag, New York.
Petrovski, I. (1935). "Zur ersten Randwertaufgabe der Warmeleitungsgleichung," Compositio
Math., 1, 383-419.
Pettitt, A. (1976). "Cramer-von Mises statistics for testing normality with censored samples," Biometrika, 63, 475-481.
Pettitt, A. (1977). "Tests for the exponential distribution with censored data using Cramer-von Mises statistics," Biometrika, 64, 629-632.
Pettitt, A. (1978). "Generalized Cramer-von Mises statistics for the gamma distribution," Biometrika, 65, 232-235.
Phadia, E. G. (1973). "Minimax estimation of a cumulative distribution function," Ann. Statist.,
1, 1149-1157.
Philipp, W. and Pinzur, L. (1980). "Almost sure approximation theorems for the multivariate empirical process," Z. Wahrsch. verw. Geb., 54, 1-13.
Pierce, D. A. and Kopecky, K. J. (1979). "Testing goodness of fit for the distribution of errors in regression models," Biometrika, 66, 1-6.
Pinsky, M. (1969). "An elementary derivation of Khintchine's estimate for deviations," Proc. Am.
Math. Soc., 22, 288-290.
Pitman, E. (1979). Some Basic Theory for Statistical Inference, Chapman & Hall, London.
Pitman, E. J. G. (1972). "Simple proofs of Steck's determinantal expressions for probabilities in the Kolmogorov and Smirnov tests," Bull. Aust. Math. Soc., 7, 227-232.
Plachky, D. and Steinbach, J. (1975). "A theorem about probabilities of large deviations with an
application to queuing theory," Period. Math. Hungar., 6, 343-345.
Pollard, D. (1979). "General chi-square goodness-of-fit tests with data dependent cells," Z.
Wahrsch. verw. Geb., 50, 317-331.
Pollard, D. (1980). "The minimum distance method of testing," Metrika, 27, 43-70.
Pollard, D. (1981a). "Limit theorems for empirical processes," Z. Wahrsch. verw. Geb., 54, 1-13.
Pollard, D. (1981b). "A central limit theorem for empirical processes," J. Aust. Math. Soc. A 33,
235-248.
Pollard, D. (1982). "A central limit theorem for k-means clustering," Ann. Prob., 10, 919-926.
Pollard, D. (1984). Convergence of Stochastic Processes, Springer-Verlag, New York.
Prohorov, Yu. V. (1956). "Convergence of random processes and limit theorems in probability
theory," Theor. Prob. Appl., 1, 157-214.
Proschan, F. and Pyke, R. (1967). "Tests for monotone failure rate," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 3, pp. 293-312, University of California Press, Berkeley, California.
Puri, M. L. and Sen, P. (1969). "On the asymptotic normality of one sample rank order test statistics," Theor. Prob. Appl., 14, 163-167.
Pyke, R. (1959). "The supremum and infimum of the Poisson process," Ann. Math. Statist., 30, 568-576.
Pyke, R. (1965). "Spacings (with discussion)," J. Roy. Statist. Soc. B 27, 395-449.
Pyke, R. (1968). "The weak convergence of the empirical process with random sample size," Proc.
Cambridge Phil. Soc., 64, 155-160.
Pyke, R. (1970). "Asymptotic results for rank statistics," in Nonparametric Techniques in Statistical
Inference, M. L. Puri, ed., Cambridge University Press, Cambridge.
Pyke, R. (1972). "Spacings revisited," Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 417-427, University of California Press, Berkeley, California.
Pyke, R. and Shorack, G. R. (1968). "Weak convergence of a two-sample empirical process and
a new approach to Chernoff-Savage theorems," Ann. Math. Statist., 39, 755-771.
Quade, D. (1965). "On the asymptotic power of the one-sample Kolmogorov-Smirnov tests,"
Ann. Math. Statist., 36, 1000-1018.
Raghavachari, M. (1973). "Limiting distributions of Kolmogorov-Smirnov statistics under the
alternative," Ann. Statist., 1, 67-73.
Rao, J. S. and Sethuraman, J. (1975). "Weak convergence of empirical distribution functions of
random variables subject to perturbations and scale factors," Ann. Statist., 3, 299-313.
Rao, J. S. and Sobel, M. (1980). "Incomplete Dirichlet integrals with applications to ordered
uniform spacings," J. Multivar. Anal., 10, 603-610.
Rao, K. C. (1972). "The Kolmogoroff, Cramer-von Mises, chi-square statistics for goodness-of-fit
in the parametric case, Abstract 133-6," Bull. Inst. Math. Statist., 1, 87.
Rao, P. V., Schuster, E. F., and Littell, R. C. (1975). "Estimation of shift and center of symmetry
based on Kolmogorov-Smirnov statistics," Ann. Statist., 3, 862-873.
Rao, R. R. (1963). "The law of large numbers for D[0, 1]-valued random variables," Theor. Prob. Appl., 8, 70-74.
Read, R. R. (1972). "The asymptotic inadmissibility of the sample distribution function," Ann.
Math. Statist., 43, 89-95.
Rebolledo, R. (1978). "Sur les applications de la theorie des martingales a l'etude statistique d'une famille de processus ponctuels," Journees de Statistique des Processus Stochastiques, Lecture Notes in Mathematics, Vol. 636, pp. 27-70, Springer-Verlag, Berlin.
Rebolledo, R. (1979). "La methode des martingales appliquee a l'etude de la convergence en loi de processus," Mem. Soc. Math. France, 62, 125.
Rebolledo, R. (1980). "Central limit theorems for local martingales," Z. Wahrsch. verw. Geb., 51, 269-286.
Rechtschaffen, R. (1975). "Weak convergence of the empiric process for independent random
variables," Ann. Statist., 3, 787-792.
Renyi, A. (1953). "On the theory of order statistics," Acta Sci. Math. Hung., 4, 191-227.
Renyi, A. (1970). Probability Theory, North-Holland, Amsterdam.
Renyi, A. (1973). "On a group of problems in the theory of ordered samples," Selected Transl.
Math. Statist. Prob., 13, 289-297.
Rice, S. O. (1982). "The integral of the absolute value of the pinned Wiener process—calculation
of its probability density by numerical integration," Ann. Prob., 10, 240-243.
Robbins, H. (1954). "A one-sided confidence interval for an unknown distribution function,"
Ann. Math. Statist., 25, 409.
Robbins, H. and Siegmund, D. (1972). "On the law of the iterated logarithm for maxima and
minima," Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probabil-
ity, Vol. 3, pp. 51-70, University of California Press, Berkeley, California.
Root, D. (1969). "The existence of certain stopping times on Brownian motion," Ann. Math.
Statist., 40, 715-718.
Ruben, H. (1976). "On the evaluation of Steck's determinant for the rectangle probabilities of
uniform order statistics," Commun. Statist., A(1), 535-543.
Ruben, H. and Gambino, J. (1982). "The exact distribution of Kolmogorov's statistic D_n for n ≤ 10," Ann. Inst. Statist. Math., 34, 167-173.
Runnenberg, J. T. and Vervaat, W. (1969). "Asymptotical independence of the lengths of subintervals of a randomly partitioned interval; a sample from S. Ikeda's work," Statist. Neerlandica, 23, 66-77.
Salvia, A. A. (1981). "Minimum Kolmogorov-Smirnov estimation," unpublished technical report,
Pennsylvania State University.
Samuels, S. M. (1965). "On the number of successes in independent trials," Ann. Math. Statist.,
36, 1272-1278.
Sander, Joan M. (1975). "The weak convergence of quantiles of the product-limit estimator," Technical Report No. 5, Division of Biostatistics, Stanford University, Stanford, California.
Sanov, I. N. (1957). "On the probability of large deviations of random variables," Selected Transl. Math. Statist. Prob., 1, 213-244.
Schoenfeld, D. (1980). "Tests based on linear combinations of the orthogonal components of the
Cramer-von Mises statistics when parameters are estimated," Ann. Statist., 8, 1017-1022.
Scholz, F. W. (1980). "Towards a unified definition of maximum likelihood," Can. J. Statist., 8,
193-203.
Schuster, Eugene F. (1973). "On the goodness-of-fit problem for continuous symmetric distribu-
tions," J. Am. Statist. Assoc., 68, 713-715. [Corrigenda (1974). J. Am. Statist. Assoc. 69, 288.]
Schuster, Eugene F. (1975). "Estimating the distribution function of a symmetric distribution,"
Biometrika, 62, 631-635.
Sen, P. K. (1973a). "An almost sure invariance principle for multivariate Kolmogorov-Smirnov
statistics," Ann. Prob., 1, 488-496.
Sen, P. K. (1973b). "On fixed size confidence bands for the bundle of filaments," Ann. Statist.,
1, 526-537.
Sen, P. K. (1978). "An invariance principle for linear combinations of order statistics," Z. Wahrsch.
verw. Geb., 42, 327-340.
Sen, P. K. (1981). Sequential Nonparametrics, Wiley, New York.
Sen, P. K., Bhattacharyya, B. B., and Suh, M. W. (1973). "Limiting behavior of the extremum of
certain sample functions," Ann. Statist., 1, 297-311.
Seneta, E. (1976). "Regularly varying functions," Lecture Notes in Mathematics, Vol. 508, Springer-
Verlag, Berlin.
Serfling, R. (1975). "A general Poisson approximation theorem," Ann. Prob., 3, 726-731.
Serfling, R. (1980). Approximation Theorems of Mathematical Statistics, Wiley, New York.
Serfling, R. J. (1984). "Generalized L-, M-, and R-statistics," Ann. Statist., 12, 76-86.
Shepp, L. A. (1966). "Radon-Nikodym derivatives of Gaussian measures," Ann. Math. Statist., 37, 321-354.
Shepp, L. A. (1982). "On the integral of the absolute value of the pinned Wiener process," Ann.
Prob., 10, 234-239.
Sheu, Shey Shiung (1974). "Some law of the iterated logarithm results for sums of independent two dimensional random variables," Ann. Prob., 2, 1139-1151.
Shiryayev, A. N. (1981). "Martingales: Recent developments, results, and applications," Int.
Statist. Rev. 49, 199-233.
Shorack, G. R. (1969). "Asymptotic normality of linear combinations of functions of order
statistics," Ann. Math. Statist., 40, 2041-2050.
Shorack, G. (1970). "The one-sample symmetry problem," University of Washington Technical
Report, No. 21, Department of Mathematics.
Shorack, G. R. (1972a). "Functions of order statistics," Ann. Math. Statist., 43, 412-427.
Shorack, G. R. (1972b). "Convergence of quantile and spacings processes with applications,"
Ann. Math. Statist., 43, 1400-1411.
Shorack, G. R. (1973). "Convergence of reduced empirical and quantile processes with application to functions of order statistics in the non-i.i.d. case," Ann. Statist., 1, 146-152.
Shorack, G. R. (1974). "Random means," Ann. Statist., 2, 661-675.
Shorack, G. R. (1976). "Robust studentization of location estimates," Statist. Neerlandica, 30, 119-142.
Shorack, G. R. (1977). "Some LIL type results for the empirical process and for the k-th smallest order statistics." Unpublished technical report announced in: Bull. Inst. Math. Statist., 6, 38.
Shorack, G. R. (1979a). "The weighted empirical process of row independent random variables
with arbitrary d.f.'s," Statist. Neerlandica, 33, 169-189.
Shorack, G. R. (1979b). "Extensions of the Darling and Erdös theorem on the maximum of
normalized sums," Ann. Prob., 7, 1092-1096.
Shorack, G. R. (1979c). "Weak convergence of empirical and quantile processes in sup-norm metrics via KMT constructions," Stoch. Proc. Appl., 9, 95-98.
Shorack, G. R. (1980). "Some law of the iterated logarithm type results for the empirical process,"
Aust. J. Statist., 22, 50-59.
Shorack, G. R. (1982a). "Kiefer's theorem via the Hungarian construction," Z. Wahrsch. verw.
Geb., 61, 369-374.
Shorack, G. R. (1982b). "Bootstrapping robust regression," Commun. Statist., A11, 961-972.
Shorack, G. R. (1982c). "Inequalities for the Poisson bridge," Bull. Inst. Math. Statist., 11,
193.
Shorack, G. R. (1982d). "Weak convergence of the general quantile process in ||·/q||-metrics," Bull. Inst. Math. Statist., 11, 60.
Shorack, G. R. (1982e). "The Lipschitz 1/2 modulus of Brownian motion," Bull. Inst. Math. Statist., 11, 261.
Shorack, G. R. (1985). "Empirical and rank processes of observations and residuals," Can. J.
Statist., 12, 319-332.
Shorack, G. R. and Beirlant, J. (1985). "The appropriate reduction of the weighted empirical
process," preprint.
Shorack, G. R. and Smythe, R. (1976). "Inequalities for max S_k/b_k where k ∈ N^r," Proc. Am. Math. Soc., 54, 331-336.
Shorack, G. R. and Wellner, J. A. (1977). "Bounding the empirical distribution by lines through
the origin: a.s. upper and lower classes for the slopes," Department of Mathematics, University
of Washington, Seattle, Washington.
Shorack, G. R. and Wellner, J. A. (1978). "Linear bounds on the empirical distribution function,"
Ann. Prob., 6, 349-353.
Shorack, G. R. and Wellner, J. A. (1982). "Limit theorems and inequalities for the uniform
empirical process indexed by intervals," Ann. Prob., 10, 639-652.
Shorack, G. and Wellner, J. (1984). "A.s. convergence of the Kaplan-Meier estimator," Bull. Inst. Math. Statist., 13, 365.
Siegmund, D. (1969). "The variance of one-sided stopping rules," Ann. Math. Statist., 40,
1074-1077.
Siegmund, D. O. (1982). "Large deviations for boundary crossing probabilities," Ann. Prob., 10, 581-588.
Sievers, G. (1969). "On the probability of large deviations and exact slopes," Ann. Math. Statist.,
40, 1906-1921.
Silverman, B. W. (1983). "Convergence of a class of empirical distribution functions of dependent
random variables," Ann. Prob., 11, 745-751.
Singh, K. (1979). "Representation of quantile processes with non-uniform bounds," Sankhya, A41, 271-277.
Skorokhod, A. V. (1956). "Limit theorems for stochastic processes," Theor. Prob. Appl., 1, 261-290.
Slud, E. (1977). "Distribution inequalities for the binomial law," Ann. Prob., 5, 404-412.
Slud, E. (1978). "Entropy and maximal spacings for random partitions," Z. Wahrsch. verw. Geb.,
41, 341-352. [Correction: Z. Wahrsch. verw. Geb., 60, 139-141 (1982).]
Smirnov, N. V. (1939). "An estimate of divergence between empirical curves of a distribution in
two independent samples," Bull. MGU, 2, 3-14 (in Russian).
Smirnov, N. V. (1944). "Approximate laws of distribution of random variables from empirical
data," Usp. Mat. Nauk, 10, 179-206 (in Russian).
Smirnov, N. V. (1948). "Table for estimating the goodness of fit of empirical distributions," Ann.
Math. Statist., 19, 279-281.
Srinivasan, R. and Godio, L. (1974). "A Cramer-von Mises type statistic for testing symmetry,"
Biometrika, 61, 196-198.
Steck, G. P. (1968). "The Smirnov two-sample tests as rank tests," Ann. Math. Statist., 40,
1449-1466.
Steck, G. P. (1971). "Rectangle probabilities for uniform order statistics and the probability that
the empirical distribution function lies between two distribution functions," Ann. Math.
Statist., 42, 1-11.
Steele, J. M. (1978). "Empirical discrepancies and subadditive processes," Ann. Prob., 6,
118-127.
Stein, C. (1972). "A bound for the errors in the normal approximation to the distribution of a
sum of dependent random variables," Proceedings of the Sixth Berkeley Symposium on Mathe-
matical Statistics and Probability, Vol. 2, pp. 583-602, University of California Press, Berkeley,
California.
Stephens, M. A. (1970). "Use of the Kolmogorov-Smirnov, Cramer-von Mises and related statistics
without extensive tables," J. Roy. Statist. Soc. B 32, 115-122.
Stephens, M. A. (1974). "EDF statistics for goodness of fit and some comparisons," J. Am. Statist.
Assoc., 69, 730-737.
Stephens, M. A. (1976). "Asymptotic results for goodness of fit statistics with unknown para-
meters," Ann. Statist., 4, 357-369.
Stephens, M. A. (1977). "Goodness of fit for the extreme value distribution," Biometrika, 64,
583-588.
Stigler, S. (1969). "Linear functions of order statistics," Ann. Math. Statist., 40, 770-788.
Stigler, S. (1974). "Linear functions of order statistics with smooth weight functions," Ann. Statist., 2, 676-693.
Stigler, S. M. (1976). "The effect of sample heterogeneity on linear functions of order statistics,
with applications to robust estimation," J. Am. Statist. Assoc., 71, 956-960.
Stone, M. (1974). "Large deviations of empirical probability measures," Ann. Statist., 2, 362-366.
Stout, W. (1974). Almost Sure Convergence, Academic Press, New York.
Strassen, V. (1964). "An invariance principle for the law of the iterated logarithm," Z. Wahrsch.
verw. Geb., 3, 211-226.
Strassen, V. (1966). "A converse to the law of the iterated logarithm," Z. Wahrsch. verw. Geb., 4,
265-268.
Strassen, V. (1967). "Almost sure behaviour of sums of independent random variables and
martingales," Proceedings of the Fifth Berkeley symposium on Mathematical Statistics and
Probability, Vol. 2, pp. 315-343, University of California Press, Berkeley, California.
Strassen, V. and Dudley, R. M. (1969). "The central limit theorem and epsilon-entropy," Lecture
Notes in Mathematics, Vol. 89, pp. 224-231, Springer- Verlag, Berlin.
Stute, W. (1982). "The oscillation behaviour of empirical processes," Ann. Prob., 10, 86-107.
Sukhatme, P. V. (1937). "Tests of significance for samples of the chi-square population with two
degrees of freedom," Ann. Eugen. Lond., 8, 52-56.
Sukhatme, S. (1972). "Fredholm determinant of a positive definite kernel of a special type and
its applications," Ann. Math. Statist., 43, 1914-1926.
Sweeting, T. J. (1982). "Independent scale-free spacings for the exponential and uniform distribu-
tions," Technical Report 21, Dept. of Mathematics, University of Surrey, Surrey, U.K.
Switzer, P. (1976). "Confidence procedures for two-sample problems," Biometrika, 63, 13-26.
Takács, L. (1967). Combinatorial Methods in the Theory of Stochastic Processes, Wiley, New York.
Takács, L. (1971). "On the comparison of a theoretical and an empirical distribution function," J. Appl. Prob., 8, 321-330.
Uspensky, J. V. (1937). Introduction to Mathematical Probability, McGraw-Hill, New York.
Vanderzanden, A. J. (1980). "Some results for the weighted empirical process concerning the law
of the iterated logarithm and weak convergence," thesis, Michigan State University.
van Zuijlen, M. C. A. (1976). "Some properties of the empirical distribution function in the
non-i.i.d. case," Ann. Statist., 4, 406-408.
van Zuijlen, M. C. A. (1978). "Properties of the empirical distribution function for independent
non-identically distributed random variables," Ann. Prob., 6, 250-266.
van Zuijlen, M. C. A. (1982). "Properties of the empirical distribution function for independent
non-identically distributed random vectors," Ann. Prob., 10, 108-123.
Vapnik, V. N. and Chervonenkis, A. Ya. (1971). "On uniform convergence of the frequencies of events to their probabilities," Theor. Prob. Appl., 16, 264-280.
Venter, J. H. (1967). "On estimation of the mode," Ann. Math. Statist., 38, 1446-1455.
Vervaat, W. (1972). "Functional central limit theorems for processes with positive drift and their inverses," Z. Wahrsch. verw. Geb., 23, 245-253.
Vincze, I. (1970). "On Kolmogorov-Smirnov type distribution theorems," in Nonparametric
Techniques in Statistical Inference, M. L. Puri, ed., Cambridge University Press, Cambridge.
von Bahr, B. and Esseen, C. (1965). "Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2," Ann. Math. Statist., 36, 299-303.
von Bahr, Bengt (1965). "On convergence of moments in the central limit theorem," Ann. Math.
Statist., 36, 808-818.
Watson, G. S. (1961). "Goodness-of-fit tests on a circle," Biometrika, 48, 109-114.
Watson, G. S. (1967). "Another test for the uniformity of a circular distribution," Biometrika, 54,
675-676.
Weiss, L. (1970). "Asymptotic distribution of quantiles in some non-standard cases," in Nonparametric Techniques in Statistical Inference, M. L. Puri, ed., pp. 343-348, Cambridge University Press, Cambridge.
Wellner, J. A. (1977a). "A martingale inequality for the empirical process," Ann. Prob., 5, 303-308.
Wellner, J. A. (1977b). "A law of the iterated logarithm for functions of order statistics," Ann. Statist., 5, 481-494.
Wellner, J. A. (1977c). "Distributions related to linear bounds for the empirical distribution
function," Ann. Statist., 5, 1003-1016.
Wellner, J. A. (1977d). "A Glivenko-Cantelli theorem and strong laws of large numbers for functions of order statistics," Ann. Statist., 5, 473-480. [Correction: Ann. Statist., 6, 1394.]
Wellner, J. A. (1978a). "A strong invariance theorem for the strong law of large numbers," Ann.
Prob., 6, 673-679.
Wellner, J. A. (1978b). "Limit theorems for the ratio of the empirical distribution function to the true distribution function," Z. Wahrsch. verw. Geb., 45, 73-88.
Whittaker, E. T. and Watson, G. N. (1969). A Course of Modern Analysis, 4th ed., reprinted
Cambridge University Press, Cambridge.
Wichura, M. (1969). "Inequalities with applications to the weak convergence of random processes
with multidimensional time parameters," Ann. Math. Statist., 40, 681-687.
Wichura, M. J. (1970). "On the construction of almost uniformly convergent random variables
with given weakly convergent image laws," Ann. Math. Statist., 41, 284-291.
Wichura, M. J. (1971). "A note on the weak convergence of stochastic processes," Ann. Math.
Statist., 42, 1769-1772.
Wichura, M. J. (1973a). "Some Strassen-type laws of the iterated logarithm for multiparameter
stochastic processes with independent increments," Ann. Prob., 1, 272-296.
Wichura, M. J. (1973b). "Boundary crossing probabilities associated with Motoo's law of the
iterated logarithm," Ann. Prob., 1, 437-456.
Wichura, M. J. (1974a). "On the functional form of the law of the iterated logarithm for the
partial maximum of independent identically distributed random variables," Ann. Prob., 2,
202-230.
Wichura, M. J. (1974b). "Functional laws of the iterated logarithm for the partial sums of i.i.d.
random variables in the domain of attraction of a completely asymmetric stable law," Ann.
Prob., 2, 1108-1138.
Williamson, Mark A. (1982). "Cramer-von Mises type estimation of the regression parameter: The rank analogue," J. Mult. Anal., 12, 248-255.
Withers, C. S. (1976). "Convergence of rank processes of mixing random variables on R, II,"
Technical Report No. 61, Department of Scientific and Industrial Research, Wellington, New
Zealand.
Yang, G. L. (1978). "Estimation of a biometric function," Ann. Statist., 6, 112-116.
Yoeurp, C. (1976a). "Decomposition des martingales locales et formules exponentielles," Seminaire de Probabilites X, Lecture Notes in Mathematics, Vol. 511, pp. 432-480, Springer-Verlag, Berlin.
Yoeurp, C. (1976b). "Decomposition des martingales locales et formules exponentielles," Seminaire de Probabilites X, Lecture Notes in Mathematics, Vol. 511, pp. 481-500, Springer-Verlag, Berlin.
Yor, M. (1976). "Sur les integrales stochastiques optionnelles et une suite remarquable de formules exponentielles," Seminaire de Probabilites X, Lecture Notes in Mathematics, Vol. 511, pp. 481-500, Springer-Verlag, Berlin.
Author Index

Aalen, O., 258, 294, 301
Abrahamson, I. G., 13, 809
Aggarwal, Om P., 173
Aly, A. A., 653
Alegkjavicene, A., 56
Alexander, K. S., 834
Andersen, E. S., 377
Anderson, K. M., 474, 476, 478, 480
Anderson, T. W., 41, 42, 147, 148, 212, 215, 225, 227
Andrews, D. F., 767
Bahadur, R. R., 3, 483, 585, 586, 591, 781, 782, 856
Barbour, A. D., 849
Barlow, R. E., 776, 778
Baum, L., 860
Baxter, G., 611
Beekman, J. A., 357
Beirlant, J., 724, 733, 812, 817
Bennett, G., 852
Beran, R., 162, 172, 174
Bergman, B., 395, 396
Berk, R. H., 786
Berning, J. A., 75
Bhattacharyya, B. B., 811
Bickel, P. J., 51, 65, 131, 132, 162, 538, 641, 763, 767, 796
Billingsley, P., 10, 27, 37, 40, 41, 42, 46, 48, 51, 111
Birnbaum, Z. W., 11, 178, 349, 350, 356, 386, 871
Blackman, J., 255
Blom, G., 474, 476
Blum, J. R., 837
Boel, R., 888, 891
Bolthausen, E., 255, 546
Book, S. A., 559
Boos, D., 668, 690, 761
Borovskih, Ju., 227
Braun, H. I., 133
Breiman, L., 42, 57, 59, 61, 62, 68, 70, 335, 338, 492, 844
Bremaud, J. P., 294, 884
Breslow, N., 304, 306
Bretagnolle, J., 797, 799
Breth, M., 367
Brillinger, D. R., 492, 497, 499
Bronstein, E. M., 837
Brown, B., 870
Burkholder, D. L., 859
Butler, C., 746, 747
Campo, R., 776
Cantelli, F. P., 11
Cassels, J. W. S., 17, 513
Chan, A. H. C., 559
Chandra, M., 776, 779
Chang, Li-Chien, 347, 404, 407, 412, 414, 415, 424
Chapman, D., 347
Cheng, P., 386
Chernoff, H., 3, 668, 856
Chervonenkis, A. Ya., 829
Chibisov, D. M., 141, 167, 339, 438, 446, 465, 466, 470, 472
Chmaladze, E. V., 358, 361
Chou, C. S., 888, 891
Chow, Y. S., 84, 852, 853
Chung, K. L., 12, 35, 506, 626, 858, 860
Clements, G. F., 632
Cotterill, D. S., 223
Cramer, H., 145, 849
Crowley, J., 304, 306
Csáki, E., 20, 21, 138, 343, 347, 373, 379, 383, 425, 530, 604, 608, 609
Csörgö, M., 16, 21, 66, 223, 269, 274, 450, 465, 492, 495, 497, 498, 499, 559, 604, 611, 616, 643, 646, 649, 650, 778
Csörgö, S., 42, 145, 223, 274, 492, 499, 503


Daniels, H. E., 13, 347, 404
Darling, D. A., 20, 82, 147, 148, 197, 212, 215, 225, 227, 229, 230, 234, 235, 238, 243, 246, 383, 396, 599, 727
David, H. A., 727
Davis, M. H. A., 891
de Haan, L., 651
Dellacherie, C., 884, 891
DeHardt, J., 837
Dempster, A. P., 347
DeVroye, L., 488, 741, 742, 829
DeWet, T., 255
Diaconis, P., 834
Dobrushin, R. L., 64, 65
Doksum, Kjell A., 652, 653, 655
Doleans-Dade, C., 896, 897
Donsker, M. D., 53, 110, 400
Doob, J., 14, 35, 37, 38, 42, 110, 870
Dudley, R. M., 39, 42, 46, 48, 621, 632, 827, 832, 833, 834, 837, 838
Durbin, J., 15, 37, 42, 168, 197, 203, 212, 215, 217, 218, 220, 221, 225, 230, 233, 235, 238, 242, 347
Duttweiler, D. L., 587
Dvoretzky, A., 12, 173, 356
Dwass, M., 385, 386, 390
Efron, B., 304
Eicker, F., 600, 650
Elliot, R. J., 888
Epanechnikov, V. A., 367
Erdös, P., 20, 82, 452, 510, 599, 626
Esseen, C., 857
Fears, T. R., 405
Feller, W., 83, 452, 481, 806
Ferguson, T. S., 173
Fernandez, P. J., 882
Finkelstein, H., 17, 74, 75, 77, 80, 513
Frankel, J., 410
Freedman, D. A., 42, 65, 538, 763, 834, 895, 896
Gabriel, J., 858
Galambos, J., 410
Gambino, J., 370
Gänssler, P., 827, 837
Ghosh, M., 691
Gill, R. D., 258, 269, 293, 294, 301, 302, 306, 317, 325, 846, 888
Gine, E., 837
Gleser, L., 806
Glivenko, V., 11
Gnedenko, B. V., 12, 349, 386, 401
Goldie, C. M., 776, 778
Goodman, V., 846
Godio, L., 749
Götze, F., 223
Govindarajulu, Z., 691, 692, 694
Grigelionis, B., 891
Groeneboom, P., 783, 784, 789, 792, 793, 794
Gut, A., 857, 858, 870
Hájek, J., 4, 9, 37, 38, 134, 157, 162, 165, 169, 182, 194, 700, 703
Hall, P., 849
Hall, W. J., 158, 324, 776, 778
Harter, H. L., 353
Hartley, H., 148
Hartman, P., 74
Hewitt, E., 867
Hill, D., 749
Hoadley, A. B., 791, 793
Hodges, J., 705
Hoeffding, W., 805, 853
Holst, L., 79
Horvath, L., 274, 492, 499
Hu, I., 356
Iman, R., 248
Ito, K., 41, 452, 626, 851
Jacod, J., 294, 884, 888, 891, 894, 897
Jain, N. C., 3, 60, 452, 530
James, B. R., 420, 426, 511, 512
Jaeschke, D., 20, 600, 603, 650
Jogdeo, K., 60, 452
Johansen, S., 258, 301
Johns, M. V., 687
Johnson, B. McK., 149
Jones, D. H., 786
Jurečková, J., 157, 705
Kabanov, Yu. M., 891
Kac, M., 4, 208, 233, 238, 246, 388
Kalbfleisch, J. D., 293
Kale, B. K., 739
Kallianpur, G., 896
Kanwal, R., 207
Kaplan, E. L., 21, 293
Karlin, Samuel, 41, 42
Katz, M., 84, 860
Kaufman, R., 636
Khmaladze, E. V., 258, 358, 361, 366
Kiefer, J., 12, 19, 61, 173, 174, 356, 404, 407, 408, 409, 432, 437, 585, 586, 587, 834
Killeen, T., 149


Klass, M., 855, 860
Knott, M., 15, 212, 217, 218, 220, 221, 225
Kolmogorov, A. N., 12, 142, 452, 632, 836
Komlós, J., 16, 66, 67, 492, 493, 495
Kopecky, Kenneth J., 199, 230
Korolyuk, V. S., 401
Kostka, D., 851, 860
Kotel'nikova, V. F., 358, 361, 366
Koul, H., 106, 110, 162, 255
Koziol, J. A., 748, 750
Kuelbs, J., 636, 846
Kuiper, N. H., 39, 142
Lai, T. L., 84, 406, 412
Lauwerier, H. A., 349
Le Cam, L., 173
Lehmann, E. L., 25, 336, 705
Lenglart, E., 317, 892
Lepingle, D., 895
Levit, B. Ya., 173
Levy, P., 534, 727
Lilliefors, H. W., 248
Liptser, R. S., 302, 884, 888, 889, 890, 891, 898
Loeve, M., 1, 843, 844, 846, 855, 861
Lombard, F., 703
Loynes, R. M., 158, 197, 230
McCarty, R. C., 356
McKean, H. P., 41, 452, 626, 851, 896
Major, P., 16, 65, 66, 68, 492, 493
Mallows, C. L., 64, 65
Malmquist, S., 38, 42, 336
Marcus, M. B., 812, 821, 883
Marshall, A. W., 223, 806, 871
Mason, D. M., 274, 405, 425, 429, 431, 474, 476, 478, 492, 499, 501, 552, 558, 559, 567, 581, 583, 615, 651, 691, 692, 703
Massey, F. J., 178
Mehra, K., 405, 668, 690
Meier, P., 21, 293, 294
Meyer, P. A., 294, 884, 887, 888, 891, 894, 896, 897
Michael, J., 248
Michel, R., 849, 851, 857
Mihalevic, V., 386
Millar, P. W., 173
Miller, R., 294
Mogulskii, A. A., 12, 526, 530, 846
Mogyoródi, J., 859
Müller, D. W., 68, 69
Nagaev, S. V., 56
Nair, V. N., 324
Neveu, J., 895
Niederhausen, H., 343, 363, 374
Noe, M., 363
Olkin, I., 806
Oodaira, H., 77, 78, 82
Oosterhoff, J., 793, 795
O'Reilly, N. E., 55, 438, 446, 465, 466, 643
Orlov, A., 748
Owen, D. B., 144
Parr, W. C., 254
Parzen, E., 9, 10, 651
Pearson, E., 148
Penkov, B. I., 349, 354
Peterson, A. V., 294
Petrov, V. V., 849, 850
Petrovski, I., 452
Pettitt, A., 234, 238, 240, 241
Phadia, E. G., 174
Philipp, W., 636, 837
Pierce, D. A., 199, 230
Pinsky, M., 851, 857
Pitman, E. J. G., 1, 373
Plachky, D., 856
Pollard, D., 255, 827, 836, 837, 840, 841
Prentice, R. L., 293
Proschan, F., 723, 778
Pruitt, W., 3, 530
Puri, M. L., 756
Pyke, R., 18, 134, 141, 255, 342, 347, 354, 385, 386, 404, 419, 723, 729
Quade, D., 168
Raghavachari, M., 177
Rao, J. S., 234, 483, 668, 690, 727, 739
Rao, P., 749
Rao, R. R., 483
Read, R. R., 174
Rebolledo, R., 258, 294, 888, 892, 895
Rechtschaffen, R., 811
Renyi, A., 39, 42, 142, 145, 146, 336, 345, 347, 348, 723, 844, 860
Revesz, P., 16, 21, 66, 495, 497, 498, 499, 559, 604, 611, 616, 643, 646, 649, 650
Rice, S. O., 149
Robbins, H., 404, 408, 507, 865, 867
Root, D., 59
Ruben, H., 369, 370
Ruymgaart, F. H., 159, 793, 795


Samuels, S. M., 806
Sander, J. M., 657
Sanov, I. N., 792
Schoenfeld, D., 251
Scholz, F. W., 174, 333
Schucany, W. R., 254
Schuster, E. F., 746, 761
Sen, P. K., 139, 140, 432, 691, 692, 694, 703, 756, 811
Seneta, E., 651
Serfling, R., 174, 473, 772, 774, 860
Shepp, L. A., 149, 471
Sheu, S. S., 75, 844
Shiryayev, A. N., 302, 884, 888, 889, 890, 891, 898
Shorack, G. R., 18, 21, 82, 106, 110, 134, 141, 162, 197, 404, 410, 415, 419, 420, 465, 506, 511, 512, 518, 541, 545, 569, 585, 587, 604, 627, 643, 662, 668, 677, 678, 687, 730, 741, 746, 753, 756, 759, 783, 784, 789, 792, 793, 794, 812, 817, 822, 853, 877
Shore, T. R., 559
Sidák, Z., 4, 9, 37, 38, 157, 162, 165, 169, 194, 700, 703
Siegert, A. J. F., 14, 208
Siegmund, D. O., 404, 408, 507, 786, 859, 865, 867
Sievers, G. L., 655, 856
Silverman, B. W., 773, 774
Singh, K., 628
Singpurwalla, N. D., 776, 779
Sirao, T., 626
Skorokhod, A. V., 16, 48
Slud, E., 443
Smirnov, N. V., 11, 142, 143, 349, 384, 401
Smythe, R., 877
Sobel, M., 727
Srinivasan, R., 749
Staudte, R., 110
Steck, G. P., 13, 367, 373
Steele, J. M., 834
Stein, C., 849, 850
Steinbach, J., 856
Stephens, M. A., 149, 212, 234, 237, 238, 243, 354, 364
Stigler, S. M., 668, 824, 862
Stone, M., 793
Stout, W., 844
Strassen, V., 55, 60, 69, 74, 80, 81, 452, 512, 513, 621, 632
Stratton, H., 860
Stromberg, K., 867
Stute, W., 542, 545, 546, 837
Suh, M. W., 811
Sukhatme, P. V., 336, 721
Sukhatme, S., 230, 232, 233, 243, 246
Sweeting, T. J., 724
Switzer, P., 655
Takács, L., 345, 376, 383
Taylor, C., 41, 42
Taylor, H. M., 42
Teicher, H., 852, 853, 859
Tikhomirov, V. M., 836
Tingey, F. H., 11, 349
Tusnády, G., 16, 379, 383, 492, 493
van der Meulen, E. C., 724, 733
Vanderzanden, A. J., 51, 140
Vandewiele, G., 363
van Zuijlen, M. C. A., 104, 724, 733, 807
Van Zwet, W., 501, 806
Vapnik, V. N., 829
Varadhan, S. R. S., 40
Varaiya, P., 888, 891
Venter, J. H., 771
Vervaat, W., 159, 594, 595, 658, 659
Vincze, I., 347, 376, 377, 378, 379, 386
von Bahr, B., 857
von Mises, R., 145
Wang, G. L., 768
Watson, G. N., 415
Watson, G. S., 147, 212, 220, 223
Weiss, L., 640
Wellner, J. A., 18, 69, 276, 324, 386, 405, 406, 410, 415, 420, 424, 425, 426, 442, 465, 545, 627, 668, 690, 691, 776, 778, 846
Whittaker, E. T., 415
Wichura, M. J., 29, 47, 51, 77, 78, 81, 131, 132, 860, 876
Wintner, A., 74
Withers, C. S., 9
Wolfowitz, J., 12, 173, 174, 356
Wong, E., 888, 891
Zinn, J., 425, 812, 821, 837, 846, 883
Subject Index

Absolute empirical process, 744
Acceptance bands, 247
Adapted, 868, 884
Affinity, 159
Alternatives:
  contiguous, 152, 157, 183, 672, 704, 706, 707, 751
  fixed, 177, 715, 756
  general, 98
  local, 167, 251, 672
  nearly null, 120
  regression, 183, 190
  scale, 190, 191
  symmetry, 751
Anderson-Darling statistic, 148, 224, 237
  interval version, 627
Anderson's inequality, 819, 861
Antiranks, 90, 101
Approximation:
  Stephens', 149, 212
  T_n-, 44
ARJ conditions, 894
Associated array of continuous rv's, 102, 117, 810
Asymptotic expansion, 12, 349
Asymptotic linearity, 705, 708, 713, 758
Asymptotic minimax, 173
Asymptotic power, 167, 169, 253
  of D_n-test, 169
  of rank tests, 707, 756
  of W, Z, and ||U_n|| tests, 218
Average df, 98, 117
Bahadur efficiency, 782
Bahadur-Kiefer theorem, 586
Banach space, 65
Bands:
  acceptance, 247
  confidence, 247, 323, 653, 747, 779
Basic martingale, 265
  censored case, 296, 311, 313
  rank case, 696
  weighted case, 272
Beran's theorem, 172
Berry-Esseen theorem, 2, 848
Binomial distribution:
  exponential bounds, 439, 440
  Feller's theorem, 481, 482
  generalized, 804
  large deviation, 483
Birnbaum-Tingey formula, 11, 349
Blackman minimum distance estimator, 254, 759
Bootstrap, 763
  empirical process, 764
Borel-Cantelli lemma, 859
Boundary crossing probability, 33
  Brownian bridge, 34, 39
  Brownian motion, 33, 34, 38, 40
  Daniels' formula, 345
  Dempster's formula, 344
  large deviations, 786
  recursions, 357
  Smirnov, Birnbaum and Tingey formula, 349
Bounded variation:
  inside (0, 1), 43, 91
  inside (-∞, ∞), 289
Bounds:
  exponential, 440, 444, 446, 451, 453, 457, 461, 484, 545, 850
  linear on G_n, 419
  nearly linear on G_n, 426
  on central moments of order statistics, 456
  on functions of order statistics, 428
Bracketing, 835
Breslow-Crowley theorem, 307, 308
Brillinger process, 33
Brownian bridge, 30, 86
  q-functions, 452
    for increments, 626

944

SUBJECT INDEX 945

Brownian motion, 29, 30
  boundary crossing probability for, 33
  normalized, 82, 598
  q-functions, 452
    for increments, 625, 626
  Strassen's theorem, 80
  transformations of, 30
  two-sided, 769
(C, 𝒞), 26
Cassels' LIL, 17, 513
Censoring times, 293
  fixed, 293
  general, 325
  random, 293
Central Limit Theorem (CLT), 2
  Lindeberg-Feller, 91
  martingale, 894
  Rebolledo's, 261, 328, 329, 894
  Stein's, 850
Chaining, 630, 634, 827
Change of variable, 25, 106, 896
Chernoff-Savage theorem, 715
Chibisov's theorem, 167, 462
  extension of, 501
Chung's theorem, 3, 12, 505
Clustering, 841
Combinatorial lemmas, 376
Compactness:
  relative, 69, 79, 80, 512
  weak, 44
    criteria for, 46
Comparison:
  inequality, 797, 805, 811
  of variances, independent but not iid with iid, 823, 824
Compensator, 259, 311, 888
Components:
  Anderson-Darling, 226
  Cramer-von Mises, 215
  of U_n, 215
  other, 221, 222
  of symmetry process, 750
Concordant, 709
Confidence bands:
  for F, 247
    with censored data, 323
  for mean residual life e, 779
  for shift function Δ = G^{-1} ∘ F - I, 653
  for symmetric F, 747
Confidence interval, 680, 711
Constructions:
  Hungarian, 55, 494
    refined, 499
  Skorokhod, 55
  special, 93, 94, 120, 141, 150, 186, 189, 191
Contamination model, 824
Contiguity, 152, 157, 183
  key condition, 152
  Le Cam's lemmas, 156, 157, 165
  linear model:
    known scale, 188
    unknown scale, 192
  scale model, 190, 191
  simple regression model, 183, 190
  symmetry, 751
Contiguous alternatives, 152, 157, 183, 672, 704, 706, 707, 751
Continuous:
  a.s. S-, 48
  mapping, 48, 78
Convergence:
  in integral metrics, 470
  martingale, 874, 875
  in ||·/q||-metrics, 140, 319, 462, 466, 469, 700, 733, 811
  of moments, 475, 862
  in quantile, 10
  rate, 494, 495, 497, 502
  relationship between →_d and →_p, 862
  r-th mean, 862
  of series and integrals, 863
  weak, 44, 45
Counting process, 258, 886
  basic martingale, 265
    rank case, 696
    weighted case, 272
    with censored data, 296, 311, 313, 887
  compensator, 259, 888
  empirical df, 887
  heuristic discussion, 258
  multivariate, 887
  predictable variation, 259, 888
  renewal, 887
Cramer expansion, 849
Cramer-von Mises statistic, 14, 17, 92, 145, 168, 178, 201, 216, 503
Cramer-Wold device, 862
Criteria:
  relative compactness, 76
  weak compactness, 46
  weak convergence, 45, 46, 51, 52
Crossings of empirical df, 395
Cumulative hazard function, 264, 295, 898
  CLT on [0, 1], 307
  CLT on [0, 1] in ||·/q||, 319
  consistency, 304, 306


  estimator, 295
  with general random censoring, 325
  martingale representation, 312
  predictable variation, 312
  process, 295
Cumulative total time on test transform, 779
(D, 𝒟), 27
Daniels result, 13, 345, 404
Darling-Sukhatme theorem, 229
Decomposition:
  of A, 225
  Doob-Meyer, 311
  of U, 213, 215
  of U_n, 215
  Kac-Siegert, 210
  principal components, 203, 250
  of W, 216
  of Z, 225
Density estimator, 764
Density quantile function, 232, 637, 640
Difference, uniform empirical-quantile process, 584
Distance:
  Hellinger, 158
  Mallows, 65
  Prohorov, 65
  total variation, 159
  Wasserstein, 56, 62
Distribution function:
  average, 98
  empirical, 1, 85, 98
  uniform empirical, 85
  weighted average, 117
Distributions:
  beta, 489
  binomial, 85, 480
  exponential, 496
  extreme value, 82, 599
  of functionals, 502
  gamma, 488
  generalized binomial, 804
  joint, 497
  Poisson, 484
  uniform, 4, 85
  supremum of Brownian bridge, 34
  supremum of Brownian motion, 34, 143
Doleans-Dade formula, 897
Donsker class (of functions), 839
Donsker theorem, 53
Doob:
  heuristics, 14
  transformation, 30
Doob-Meyer decomposition, 311
Dual predictable projection, 886, 888
Dvoretzky, Kiefer, Wolfowitz:
  inequality, 12, 354
  inequality, extension to non-identically distributed rv's, 797
  minimax theorem, 173
Econometric functions, 775
Efficiency:
  Bahadur, 781, 782
  Pitman, 673
Eigenfunction, 207
Eigenvalue, 207
Embedding:
  empirical process, 494, 495, 499, 500, 501
  L-statistic process, 669
  partial sums, 59, 61
  Poisson, 340, 578
  quantile process, 337
  Skorokhod's partial sum process, 60
Empirical distribution function (df), 1, 85, 98, 265
  as counting process, 262, 887
  intersection with a general line, 344, 380
  inverse smoothed uniform, 86, 87, 337, 384
  inverse uniform, 85, 87
  large deviations, 785, 793
  number of intersections with a line, 380
  reduced, 99, 116
  smoothed uniform, 86, 87, 384
  uniform, 85, 87, 339
  U-statistic, 772
Empirical measure, 630, 795, 826
  large deviations, 795
Empirical process:
  absolute, 744
  bootstrapped, 763
  crossings from above, 395
  Dwass's Poisson process approach, 388
  estimated, 200, 228, 231
  exponential identity, 269
  general, 98
  general weighted, 99
  indexed by functions, 630, 827
  indexed by sets, 621, 625, 827
  inverse smoothed uniform, 86, 337, 384
  inverse uniform, 86
  ladder excess, 393
  LIL, 504, 505, 513, 517
  local time, 398
  location of maximum, 384
  martingale, 444
  modulus of continuity, 542, 544


  normalized, 597
  rank, 90, 101, 153
  rank symmetry, 745
  reduced, 99, 116, 810
  of residuals, 194
  sequential uniform, 131, 491
  smoothed uniform, 86
  studentized, 600
  symmetry, 744
  two-sample, 402, 715
  uniform, 86, 338
  U-statistic, 772
  weighted, 100, 109, 117, 151, 153, 695, 811
    of standardized residuals, 196
    uniform, 19, 88, 695
  zeros, 390
Empirical rank process, 90, 101
  modulus of continuity, 116
  of standardized residuals, 196
Entropy:
  combinatorial, 836
  metric, 632, 835
    with bracketing, 835
  Pollard's, 836
Envelope function, 836
Equivalent:
  processes, 25
  random elements, 25
Esseen's lemma, 850
Estimated empirical process, 200, 228, 231
Estimation:
  of df, 1, 171
  of location, 181, 254, 670, 680, 710, 757, 767, 821
  of symmetric df, 746, 759
    unknown point of symmetry, 757, 761
Estimator:
  based on:
    integral test of fit, 254, 759
    rank statistics, 710
    signed rank statistics, 757
  cumulative hazard function, 295
  of density function, 764
  distribution function, 1, 171
    with random censorship, 293
  Hodges-Lehmann, 758, 762
  minimum distance, 254, 759
  of point of symmetry, 757
  product-limit, 293
  shorth, 767
  smooth estimator of a df, 86, 87, 384, 764
  U-statistic, 771
Euler's constant, 864
Exact distributions, 343
Exact slope, 781
Exponential:
  formula, 269, 301, 326, 897
  identity, 269, 301, 302
  rv's, 334, 335, 496
  of semimartingale, 897
  supermartingale, 897
Exponential bounds:
  Bennett, 440
  Bernstein, 440
  for binomial probabilities, 440
  Bretagnolle, 797
  Dvoretzky, Kiefer, Wolfowitz, 354
  for empirical measure indexed by sets, 828
  empirical process near 0, 444
  Hoeffding, 440
  James, 444
  Kolmogorov's, 855
  for local martingale with bounded jumps, 895
  for oscillation modulus ω_n(a), 545
  for Poisson, Gamma, Beta rv's, 484
  for Poisson process, 569
  quantile process near 0, 457
  Shorack, 444
  Shorack-Wellner, 622
  for uniform empirical process /q, 445
  for uniform order statistics, 453
  for uniform quantile process /q, 460
  Vapnik-Chervonenkis, 828
  for weighted empirical process Z_n, 820
Exponentiality, testing for, 739
Exponential probability plot, 741
Extreme value distribution, 82, 599
Feller's theorem, 83, 482
Fiber bundles, 431
Filtration, 884
Finite dimensional:
  convergence, 25
  distributions, 25
  subset, 24
Finite sampling, 135
  inequalities, 877
  process, 90
Finkelstein's LIL, 513
Fisher information:
  for location, 181
  for scale, 182
Fixed alternatives, 177, 715, 756


Fixed censorship model, 293
Fluctuation inequality, 49, 51
  for Z_n, 110
Formula:
  exponential, 269, 301, 326, 897
  integration by parts, 300, 867
  Ito, Doleans-Dade, Meyer, 896
  for means and covariances, 862
  Stirling's, 863
Function:
  distribution, 1, 85, 98
  econometric, 775
  Lorenz curve, 775
  mean residual life, 775
  quantile, 3, 10, 637
  reliability, 775
  total time on test, 775
Functional Donsker class, 839
Functional limit theorem, for L-statistics, 665
Functionals:
  continuous, 48, 764
  rates of convergence, 502
Fundamental identity, 100
  for R_n, 90, 101
  for U_n and V_n, 86, 100
Gambler's ruin, 58
Gaussian process:
  Brillinger, 33
  Brownian bridge, 30, 86
  Brownian motion, 29
  isonormal, 838
  Kiefer, 16, 30, 131
  limit for reduced empirical process, 109
  limit for weighted empirical process, 109, 118
  Uhlenbeck, 30
Generalized binomial distribution, 804
General spaces, 826
Gini index, 676, 762, 779
Glivenko-Cantelli theorem, 11, 95
  for empirical hazard function, 304
  for empirical measures, 827, 834, 837
  extended, 105
  Lai's variation, 410
  for product-limit estimator, 304
  for uniform empirical df, 11, 95, 410
  uniformity in F or P, 828
  via metric entropy, 834
  via Vapnik-Chervonenkis classes of sets, 827
Goodness of fit, 142
  integral, 145, 146, 149, 178, 201, 747
  supremum, 142, 145, 167, 169, 177, 747
Growth function, 828
Hájek-Renyi inequality, 18
Hájek-Shepp theorem, 157
Hazard function, 264, 295
Hellinger distance, 158
Hermite polynomials, 228
Hoeffding's inequalities:
  for Binomial, 440
  exponential, 853, 855
  for finite sampling, 831
  for generalized Binomial, 805
Hodges-Lehmann estimator, 758, 762
Hungarian construction:
  of empirical process, 16, 494, 499
  of partial sum process, 55, 66
  of quantile process, 496
  of sequential uniform empirical process, 493, 495
Hypergeometric rv, 831
Increment, 28, 533, 621
  independent, 28
  notation, 28, 533
  stationary, 28
Identities, 301
Identity:
  for estimated empirical process, 228
  exponential, 269, 301
  fundamental, 86, 100
  for R_n, 90, 101
Independent but not identically distributed rv's, 98, 102, 108, 119, 325, 796
  L-statistics, 821
Indexing:
  by continuous functions, 630
  by functions, 630, 827
  by intervals, 621
  by sets, 621, 625, 827
Induced probability, 25
Inequalities:
  Anderson, 819, 861
  basic, 842
  Bennett, 440, 851
  Bernstein, 440, 855
  Berry-Esseen, 848, 849
  binomial probabilities, 440
  Birnbaum-Marshall, 871, 873
  Bonferroni, 861
  Bretagnolle, 797
  Brown, 870
  Burkholder, 859
  Burkholder, Davis, Gundy, 894
  Cauchy-Schwarz, 843
  Chebyshev, 842
  convex, 797, 881
  c_r, 843
  Doob, 870, 871, 874
  Dvoretzky, Kiefer, Wolfowitz, 354
  elementary exponential, 856
  events lemma, 860
  exponential, 354, 440, 445, 453, 460, 851, 856
  finite sampling, 877
  fluctuation, 49, 51, 110
  Gill's linear bounds, 317
  Gill-Wellner, 316, 845
  Hájek-Rényi, 844, 873
  Hoeffding, 440, 805, 853, 855
  Hoeffding's finite sampling, 878
  Hölder, 843
  Hornich's, 858
  Jensen, 843
  Kolmogorov, 843
  Kolmogorov's exponential, 855
  Komlós, Major, Tusnády, 494
  Lenglart, 317, 892
  Lévy, 844, 845, 879
  Liapunov, 843
  Marcus-Zinn, 879-883
  Markov, 842
  martingale, 869, 892
  maximal, 511, 512
  Menchoff, 844
  Mill's ratio, 850
  minimal, 527
  Minkowski, 843
  Mogulskii, 845
  monotone, 844, 846
  moment, 843, 858
  for processes, 877
  Pyke-Shorack, 18, 134
  Sen, 42
  Shorack-Smythe, 877
  Shorack-Wellner, 622
  Skorokhod, 844, 879
  symmetrization, 812, 845, 878
  for ‖U_n/q‖, 134
  for ‖U_n/q‖, 621
  von Bahr, 857
  von Bahr-Esseen, 858
  weak symmetrization, 845
  Wellner, 440
  Wichura, 876
Information:
  Fisher, 181, 182
  Kullback-Leibler, 159, 790
Integrable:
  process, 884
  uniformly, 884
  square, 884
Integral metrics, 470
Integral test of fit, 145, 146, 149, 178, 201, 747
Integrals:
  of Gaussian (normal) processes, 42
  stochastic, 92, 122
Integral tests:
  for partial sum processes, 55
  for q functions, 462, 517
  for q functions for intervals, 625, 626
Integrated empirical difference process, 594
Integrated uniform quantile process, 718
Integration by parts, 300, 868
Intervals:
  confidence, 680, 711
  indexing by, 621, 625, 627
Invariance, strong, 839
Inverse transformation, 3, 638
Isonormal Gaussian process, 838
Iterated logarithm laws:
  Cassells', 513
  Chung's, 505
  Chung's "other," 3
  for D_n, 586, 587
  Mogulskii's, 516
  for U_n, 504, 505, 513
  for V_n, 504, 505, 516
Itô, Doléans-Dade, Meyer formula, 896
James LIL, 517
Joint distributions, 497
  of mean and median, 671
Jurečková lemma, 157
Kac-Siegert decomposition, 210
Kernel, 207
  density estimator, 764
Kiefer process, 16, 30, 131
  modulus of continuity, 558
Kolmogorov-Smirnov, 12, 91, 142, 168, 177
Kuiper statistic, 142, 168, 177
Kullback-Leibler information, 159, 790
Lack of memory, 337
Ladder points, 391
Large deviations, 781, 782, 855
  Chernoff's theorem, 3, 856
  for beta rv's, 489
  for binomial rv's, 483
  empirical df, 785, 793
  empirical measure, 795
  for gamma rv's, 489
  integral statistics, 794
  for partial sums, 786
  for Poisson rv's, 485, 856
  supremum tests of fit, 783
Law, zero-one, 57
Law of the Iterated Logarithm (LIL):
  for Brownian bridge, 72
  for Brownian motion, 72, 80
  Chung's, 505
  Chung's other, 3, 526
  classical, 2, 69, 70, 83
  for empirical process, 11, 504
  Hartman-Wintner, 73
  for k-th largest spacing, 741
  limit sets, 17, 79
  for L-statistics, 665
  Mogulskii's, 526
  multivariate, 74
  for normalized empirical process, 432
  for normalized quantiles, 435
  for normal rv's, 70
  for partial sum process, 80
  for small order statistics, 407, 408
  Smirnov's, 504
  for standardized quantile process, 650
  Strassen's converse, 73
Law of large numbers (LLN):
  classical strong (SLLN), 2, 83
  for cumulative hazard function estimator, 304
  for empirical df, 95, 105
  for empirical measures, 827, 834, 837
  for product-limit estimator, 304
  weak (WLLN), 84
Le Cam's lemmas, 157, 165, 672
Legendre polynomials, 226
Lenglart's inequality, 317, 892
Likelihood ratio statistic, 154, 706
Limit distribution, of ordered uniform spacings, 725
Limit sets, 79
Linear rank statistics, 92, 101, 695, 699
Linearization, T_n, 46, 76
Linear bounds for empirical df's:
  Chang's lower bound, 345
  Daniels' upper bound, 345, 404
  in-probability bounds, 418
  Mason's extensions, 425
  maximal inequalities, 423
  for non-identically distributed rv's, 807
  for restricted intervals [a_n, 1], 424
  upper class sequences for, 420
  Shorack and Wellner lower bound, 415
  Van Zuijlen's, 807
  Wellner's upper bound, 415
Linear combination of order statistics, 660, 821
Lipschitz-½ modulus:
  of U, 540
  of U_n, 544
Local alternatives, 167, 252, 672, 704, 706, 751
Local property, 884
Local time, 398
Locally:
  integrable, 884
  square integrable, 884
Lorenz curve, 775
Lower class results, for order statistics, 407, 408
L-statistic, 660
  CLT, 662, 664
  general, 660, 675
  generalized, 774
  Gini mean difference, 676
  of independent but not identically distributed rv's, 821
  LIL, 662, 665
  linearly trimmed mean, 674
  mean, 670
  median, 670
  process of future, 665
  process of past, 665
  randomly trimmed mean, 678
  randomly winsorized mean, 678
  SLLN, 662, 665
  tri-mean, 676
  trimmed mean, 674, 678
  winsorized mean, 679
Mallows statistic, 149
Mann-Wald theorem, 2, 10, 14
Mapping:
  continuous, 48, 78
  projection, 24, 76
  theorem, 48, 78
Martingale, 259, 260, 872, 885
  basic, 264
  censored case, 296
  quantile process, 717
  rank, 695
  weighted case, 272
  for censored data, 296, 31
  central limit theorem, 894
  convergence theorem for, 874, 875
  convergence theorem for reversed, 874
  for the empirical process, 133, 271
  inequalities, 869, 892
  local, 885
  for nonidentically distributed observations, 140
  purely discontinuous, 885
  for the quantile process, 133
  for rank statistics, 695
  Rebolledo's CLT, 261
  representation theorem, 891
  reversed, 136, 138, 874
  semi-, 896
  square-integrable, 884
  sub-, 138, 139, 260, 885
  super-, 885
  transform theorem, 260, 890
  for weighted uniform empirical process, 133, 273
Mass function, 264
Maximum likelihood, 174, 332
Mean:
  deviation, 825
  linearly trimmed, 674
  metrically symmetrized Winsorized, 678
  randomly trimmed or Winsorized, 678
  sample, 670, 824
  tri-, 676
  trimmed, 674, 689, 824
  Winsorized, 679
Mean residual life, 775
Measurable:
  function space, 24
  process, 884
Median, 670, 825
Mercer's theorem, 15, 208
Metric entropy, 835
  with bracketing, 835
Metrics:
  integral, 470
  ‖/q‖ or q-, 140, 273, 319, 446, 460, 462, 466, 511, 517, 641, 774, 810
  supremum, 26
  uniform, 26
  Wasserstein, 56, 62
Metric entropy, 632, 835
Minimum distance estimator, 254
Model:
  linear, 188, 192, 194
    with known scale, 188
    with unknown scale, 192
  simple regression, 183
Moderate deviations:
  for gamma rv's, 489
  for Poisson rv's, 487
Modulus of continuity, 531, 533
  of Brownian bridge, 535
  of Brownian motion, 534
  of centered Poisson process, 571
  of empirical process U_n, 542
  of empirical rank process R_n, 116
  of general weighted empirical process Z_n, 109, 119
  of Kiefer process, 558
  Lipschitz-½, 534, 540
Moment generating function, 855
Moments:
  bounds on, 476
  bounds on central moments of order statistics, 456
  bounds for ‖U_n/q‖, 276
  convergence, 862
  finiteness, 475
  of functions of order statistics, 474
  of order statistics, 474
  of sums, 857
Nearly linear bounds:
  bounds on functions of order statistics, 428
  logarithmic bounds, 426
  powers of t, 426
Nearly null:
  alternatives, 753
  array, 120, 753
  empirical, 120, 811
  weighted empirical, 121
Nonparametric maximum likelihood, 174, 332
Normalized Brownian motion, 82, 598
Normalized empirical process, 597
  LIL, 604, 609
Normalized quantile process, 615
  LIL, 616
Normalized renewal spacings, 728
Normalized spacings, 336
Optional sampling theorem, 58
Ordered renewal spacings process, 729
Ordered uniform spacings process, 731
Order statistics:
  bounds for functions of, 429
  linear combinations, 660, 821
  of non-identically distributed rv's, 821
  moments, 474
  pseudo, 685
  uniform, 86, 97, 335, 717
    distribution and moments, 97
O'Reilly's theorem, 466
  extension of, 501
  partial sums, 55
Orthogonal decomposition, 207, 250
  of Brownian bridge, 33, 213, 215
  of Brownian motion, 33
  of the symmetry process Z, 750
Oscillation modulus:
  of U_n, 542
  of V_n, 581
Ornstein-Uhlenbeck process, 20, 30, 32, 598
Parameters estimated:
  empirical difference process, 595
  empirical process, 228, 595
  quantile process, 595
Partial sum process, 52
  of future observations, 56
  Hungarian construction, 55, 66
  Skorokhod construction of, 54
  Skorokhod embedding, 55, 56, 59, 60, 61
  smoothed, 497
  on [0,∞), 68
Pitman efficiency, 673, 707
Plots:
  exponential probability, 741
  PP-plots, 248, 250
  QQ-plots, 248, 250
  SP-plots, 248, 250
Poisson:
  bridge, 340, 575, 578
  centered, 341, 570, 578
  distribution, 557
  embeddings, 340, 578
  process, 334, 388, 556, 569
  representation of U_n, 339, 556, 578
Polynomials:
  Hermite, 228
  Legendre, 226
Portmanteau theorem, 47
Positive definite, 207
Positive semidefinite, 207
Power, 167, 168, 169, 178, 707
Predictable, 260
  covariation process, 886
  process, 260, 310, 885
  projection, 886, 888
  σ-field, 310, 885
  stopping time, 886
  variation process, 259, 269, 310, 886
Principal component decomposition, 203, 250
  heuristic, for processes, 15, 205
  for matrices, 203, 204
  normalized, 206
  for symmetry process, 750
  for U, 213, 215
  for U_n, 215
  for Z, 225
Probability integral transformation, 5
Process, 24
  absolute empirical, 744
  bootstrapped empirical, 763
  Brillinger, 33
  Brownian bridge, 30, 86
  Brownian motion, 29, 30
  centered Poisson, 341, 570, 578
  counting, 258, 886
  cumulative hazard, 295
  embedded partial sum, 60
  empirical, 1, 98
  empirical difference, 19, 584
  empirical indexed by functions, 630, 827
  empirical indexed by sets, 621, 625, 827
  empirical rank, 90, 115, 153
  empirical rank symmetry, 745
  empirical symmetry, 744
  equivalent, 25
  estimated empirical, 200, 228, 231, 595
  estimated empirical difference, 595
  estimated quantile, 595
  finite sampling, 90
  Gaussian, 29, 30, 86, 109, 118
  increasing, 885, 887
  integrable, 884
  integrated empirical difference, 594
  integrated uniform quantile, 718
  isonormal Gaussian, 838
  Kiefer, 30, 131, 493
  L-statistic, 665
  martingale, 132, 259
  normal, 25
  normalized Brownian motion, 82
  normalized uniform empirical, 19, 597
  normalized uniform quantile, 615
  normalized uniform spacings, 731
  ordered renewal spacings, 729
  ordered uniform spacings, 731
  Ornstein-Uhlenbeck, 20, 30, 32, 598
  partial sum, 52, 60
  Pitman efficiency, 707
  Poisson, 334, 339, 388, 556, 569
  Poisson bridge, 340, 575, 578
  predictable, 260, 310, 885
  predictable variation and covariation, 259, 269, 310, 886
  product-limit quantile, 657
  quadratic variation and covariation, 886
  quantile, 86, 98, 100, 457, 460, 469, 496, 504, 513, 581, 637, 657, 717, 718
  rank statistic, 699, 714
  reduced empirical, 99, 116, 118
  reduced quantile, 100
  renewal, 727, 739
  renewal spacings, 727
  sequential uniform empirical, 131, 491
  smoothed partial sum, 337, 497
  smoothed uniform empirical, 86
  smoothed uniform quantile, 86, 337
  square integrable, 310
  standardized quantile, 627
  standardized Q-Q, 652
  stationary, 28
  stochastic, 24
  studentized empirical, 600
  time reversal, 32, 69
  two-dimensional Poisson, 335
  Uhlenbeck, 20, 30, 32, 598
  uniform empirical, 13, 86, 338
  uniform empirical difference, 19, 584
  uniform quantile, 86, 457, 460, 469, 496
  uniform spacings, 731
  U-statistic empirical, 772
  weighted empirical, 99, 103, 109, 117, 151, 153
  weighted empirical of standardized residuals, 196
  weighted normalized renewal spacings, 730
  weighted normalized uniform spacings, 732
  weighted rank of standardized residuals, 196
  weighted uniform empirical, 18, 88
Product-limit estimator, 293
  consistency, 304
  identities, 301
  as maximum likelihood estimator, 332
Product-limit quantile process, 657
Projection mapping, 24, 76
Projection pursuit, 834
Pseudo order statistics, 685
ψ function, 440, 445, 455, 545, 570, 571
  properties of ψ, 441
  properties of, 453, 455
Quadratic:
  covariation process, 886
  variation process, 886
Quadrats, 28
Quantile:
  asymptotic normality, 639
  for censored data, 657
  convergence in, 10
  function, 3, 10, 637
  p-th sample, 639
Quantile process:
  estimated, 595
  general, 98, 100
  integrated uniform, 718
  normalized, 615
  product-limit, 657
  reduced, 100
  renewal process construction, 496
  SLLN, 651
  smoothed uniform, 86
  standardized, 637, 638
  uniform, 13, 86, 457, 460, 469, 504, 513, 581, 717
Quasi-left-continuous, 889
Q-functions, 55
Q-metric, 140, 273, 319, 446, 460, 462, 466, 511, 517, 641, 774, 810
Q-Q:
  plot, 248
  process, 651
Rademacher rv's, 812, 879
Radon-Nikodym derivative, 157, 789
Raghavachari's theorem, 177
Random:
  element, 24
  trimming or Winsorizing, 678
Random censorship model, 293
  iid censoring, 293
  independent but nonidentically distributed censoring, 325
Randomly trimmed mean, 678
Randomly Winsorized mean, 678
Random variable (rv), 24
  Bernoulli, 11, 804
  generalized binomial, 804
  independent but not identically distributed, 98, 102, 108, 119, 325, 796
  hypergeometric, 831
Ranks, 90, 101
Rank statistic processes, 699, 703, 714
  null hypothesis, 699
  contiguous alternatives, 714
Rank statistics, linear, 90, 101, 151, 695, 699
Rate of convergence:
  of functionals, 502
  of ‖Z_n‖, 604
Rebolledo's CLT, 261, 894
Recursion:
  Bolshev's, 366
  Noe's, 362
  Ruben's, 369
  Steck's, 367
  Steck's one-sample formula, 369
  Steck's two-sample formula, 374
Reduced empirical process, 99, 116, 118, 810, 811
Reduced quantile process, 100, 811
Reductions to [0,1], 99, 100, 116, 118, 291, 810, 811
Reflection principle, 34, 35, 41
Regression model, 183
Regularly varying, 651
Relative compactness, 69, 79
  Brownian motion, 80
  criteria on (D, 𝒟), 75, 76
  extension of classical LIL, 74
  of the integrated empirical difference process, 595
  of U_n and V_n, 512
  of the L-statistic processes, 666
  mapping theorem, 78
  multivariate, 74
  partial sums, 79
  rv's, 74
  of standardized quantile process Q_n, 650
Relatively compact, 69
Reliability functions, 775
Renewal:
  construction, 496
  normalized spacings, 728
  process, 728, 739
  spacings, 727
  spacings processes, 727
Rényi's:
  representation of spacings, 723
  statistic, 142
Representation, 93
  of the uniform empirical process:
    Chibisov's, 339
    conditional, 339, 556
    Kac's, 339
    Poisson, 578
Residuals, 194, 197
  classical, 197
  empirical rank process of standardized, 196
  robust, 197
  standardized, 195
  weighted empirical process of standardized, 196
Sample:
  mean, 670, 824
  median, 670, 825
  mode, 771
Sanov:
  problem, 792
  theorem, 793
Scheffé's theorem, 862
Score(s) function, 695, 699, 717
  bounding function, 662
Semimartingale, 896
Sequential uniform empirical process, 131, 491
Sets, Vapnik-Chervonenkis classes, 828
Shatter, 828
Shift function Δ = G⁻¹ ∘ F − I, 652
Shorth, 767
Sigma-field:
  ball, 26
  Borel, 26
  see also Filtration
Signed rank statistics, 753
  null hypothesis, 753
  contiguous alternatives, 755
Skorokhod:
  construction, 16, 54, 55, 810
  elementary theorem, 9
  embedding, 55, 56, 59, 60, 61
Skorokhod-Wichura-Dudley theorem, 23, 47
Slowly varying, 651
Smirnov, 11, 349
Smoothed:
  estimator of df, 86, 87, 384, 764
  partial sum process, 337
  uniform empirical df, 86
  uniform quantile process, 86, 337
Space:
  Banach, 65
  C, 26
  C_r, 29
  complete, separable, 26, 27
  D, 26
  metric, 26
Spacings, 720
  functions of, 734
  k-th largest, 741
    LIL, 741
  limit distributions of ordered uniform, 725, 726
  moments, 721
  normalized, 336, 728
  normalized exponential, 336, 721
  ordered uniform, 721
  renewal, 727, 728
  Rényi's representation, 723
  statistics, 733
  uniform, 98, 717
Spacings processes:
  normalized renewal, 728
  normalized uniform, 731
  ordered renewal, 729
  ordered uniform, 731
  weighted normalized renewal, 730
  weighted normalized uniform, 732
Sparse class of functions, 94, 150, 836
Special construction, 93, 185, 189, 191, 439, 491, 492, 728
Square integrable, 884
Standardized Q-Q process, 652
  weak convergence, 653
Standardized quantile process, 637
  strong approximation, 645
  relative compactness, 650
  weak convergence, 638
Stationary:
  increment, 28
  process, 28
Statistics:
  Anderson-Darling, 148
  Cramér-von Mises, 14, 17, 92, 145, 168, 178, 201, 503
  integral, for symmetry, 747
  interval version of Anderson-Darling, 627
  Kolmogorov-Smirnov, 91, 142, 168, 177
  Kuiper, 142, 168, 177
  likelihood ratio, 154, 672, 706
  linearly trimmed mean, 674
  linear rank, 92, 101, 695
  L-statistics, 660, 821
  Mallows, 149
  mean, 670, 824
  mean deviation, 825
  median, 670, 825
  metrically Winsorized or trimmed mean, 682
  mode, 771
  order, 86, 97, 660, 717, 821
  rank, 695, 699
  Rényi, 142
  shorth, 767
  sign, 671
  signed rank, 753
  spacings, 720, 733
  supremum, 784
  symmetry, 747
  trimean, 676
  trimmed mean, 674, 679, 824
  U-, 771
  Watson, 147
  Wilcoxon, 717
  Winsorized mean, 679
Steck formula, 367, 369, 374
Stein's CLT, 850
Stephens' approximation, 212
Stirling's formula, 863
Stochastic basis, 884
Stochastic integrals, 92, 122, 890
  for counting processes, 890
  martingale properties, 891
  with respect to Brownian bridge U or W, 92, 125, 127
  with respect to S, 127
Stochastic process, 24
Stopping time, 57, 884
  predictable, 886
  totally inaccessible, 886
Strong approximation:
  of K_n, 494
  of U_n, 494
  of V_n, 497
  of standardized quantile process, 645
Strong invariance class, 839
Strong law of large numbers (SLLN), 2, 83
  for empirical df, 11, 95, 105, 410
  for empirical measures, 827, 834
  for L-statistics, 662, 665, 669
  for quantile process, 651
Strongly unimodal density, 767
Strong Markov, 57
Studentize, 684
Studentized process, 600
Submartingale, 138, 139, 260, 885
Subsequence, 507, 510, 865
Supermartingale, 885, 895
Supremum metric, 26
Supremum statistic, 783
Survival function:
  CLT, 308
    on (0, τ], 308
    on [0, τ] in ‖/q‖, 319
  confidence bands, 323
  exponential formula, 301
  general random censoring, 325
  identities, 301
  LLN, 304
  martingale representation, 312
  predictable variation, 312
  process, 294, 296
  product-limit estimator, 293
Survival times, 293
Symmetric df, 744, 745, 746
Symmetrization inequality, 812
Symmetry, 743
  testing for, 747, 749
Tail rate function, 663
Test:
  chi-square, 841
  of exponentiality, 739
  goodness of fit, 142-150, 627, 746
  likelihood ratio, 673
  of normality, 676
  sign, 671
  symmetry, 746
  of uniformity, 142, 150, 627, 733
Tightness, 46
Time:
  interarrival, 334
  local, 398
  stopping, 57, 884
  survival, 295
  waiting, 334
Time reversal process, 32, 69
T_n:
  approximation, 44, 76
  linearization, 76
Topology:
  supremum, 793
  tau, 793
Total time on test, 775, 779
Transformation:
  Doob, 30
  inverse, 3
  probability integral, 5
Two-sided Brownian motion, 769
u.a.n., 88, 273
Uhlenbeck process, 20, 30, 32, 598
Uniform:
  empirical difference process, 584
  empirical distribution function (df), 85, 87
  empirical process, 86
  metric, 26
  order statistics, 86, 97, 334
  quantile process, 13
  smoothed empirical df, 86, 87
  spacings, 98, 717, 720
  spacings processes, 728, 729, 730, 731
Uniformity, test of, 733
Uniformly integrable, 884
Uniform order statistics:
  representation via:
    exponentials, 335
    Poisson process, 336
Upper and lower class results:
  for ‖U_n‖, 505
  Feller, Erdős, Kolmogorov, Petrovskii, 452
  for order statistics, 407, 408
  special subsequence, 507, 510
Urn model, 831
U-shaped, 273
U-statistic, 771
  empirical process, 771, 772
  estimators, 771
Vapnik-Chervonenkis:
  classes of sets, 828, 833
  index, 827
  theorem, 828
Vervaat's lemma, 658, 659
Vitali's theorem, 862
Wasserstein distance, 56, 62
Weak compactness, 44
  criteria for, 46
Weak convergence, 43
  criteria for, 45, 46, 51
  of cumulative hazard function estimator process, 307, 319, 329
  of general weighted empirical process Z, 109, 118
  of U_n, 93, 110
    indexed by functions, 632, 633
    indexed by intervals, 625
    with respect to ‖/q‖, 140, 462
  of L-statistic process, 666
  of product-limit estimator process, 308, 319, 329
  of standardized Q-Q process, 652
  of standardized quantile process, 638
  of Z_n, with respect to ‖/q‖, 810
Weak Law of Large Numbers (WLLN), 83
  for L-statistics, 662
Weighted empirical process, 99, 103, 109, 117, 151, 153
  general, 99, 108, 809
  modulus of continuity, 109
  reduced, of the α_n's, 100, 109
  reduced, of the β_n's, 117, 810
  Z_n, 109, 810
  uniform, 19, 88, 695
Weight function, 784
Zero-one law, 57