Empirical Processes
with Applications
to Statistics
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out
of print by their original publishers, though they are of continued importance and interest to the
mathematical community. SIAM publishes this series to ensure that the information presented in these
texts is not lost to today's students and researchers.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington
Editorial Board
John Boyd, University of Michigan
Leah Edelstein-Keshet, University of British Columbia
William G. Faris, University of Arizona
Nicholas J. Higham, University of Manchester
Peter Hoff, University of Washington
Mark Kot, University of Washington
Peter Olver, University of Minnesota
Philip Protter, Cornell University
Gerhard Wanner, Université de Genève
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis
and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New
Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point
Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology
of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and
Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Alexander Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems
I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials
Galen R. Shorack and Jon A. Wellner, Empirical Processes with Applications to Statistics
Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone, The Linear Complementarity Problem
Rabi N. Bhattacharya and Edward C. Waymire, Stochastic Processes with Applications
Robert J. Adler, The Geometry of Random Fields
Mordecai Avriel, Walter E. Diewert, Siegfried Schaible, and Israel Zang, Generalized Concavity
Rabi N. Bhattacharya and R. Ranga Rao, Normal Approximation and Asymptotic Expansions
Empirical Processes
with Applications
to Statistics
Galen R. Shorack
Jon A. Wellner
University of Washington
Seattle, Washington
Society for Industrial and Applied Mathematics
Philadelphia
Copyright © 2009 by the Society for Industrial and Applied Mathematics
This SIAM edition is an unabridged republication of the work first published by John Wiley
& Sons, Inc., 1986.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the
publisher. For information, write to the Society for Industrial and Applied Mathematics,
3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
Research supported in part by NSF grant DMS-0804587 and NIAID grant 2R01 AI291968-04.
Tables 3.8.1, 3.8.2, and 9.2.1 used with permission from Pearson Education.
Tables 3.8.1, 3.8.4, 3.8.7, 5.3.4, 5.6.1, 24.2.1, and 24.2.2 used with permission from the Institute
of Mathematical Statistics.
Tables 3.8.5, 5.6.1, and 9.2.1 and Figures 5.7.1(a) and 5.7.1(b) used with permission from the
American Statistical Association.
Tables 3.8.6, 5.6.1, and 5.7.1 and Figures 5.7.1, 5.7.2, and 18.3.1 used with permission from
Oxford University Press.
Tables 5.3.1, 5.3.2, 5.3.3, and 5.6.1 used with permission from Wiley-Blackwell.
Tables 9.3.2 and 9.3.3 copyrighted by Marcel Dekker, Inc.
To VERA,
WITH THANKS FOR YOUR
LOVE AND SUPPORT.
-JAW
Short Table of Contents
Statistics, 235
The Darling–Sukhatme theorem for K̃(s, t) = K(s, t) − Σ_i φ_i(s)φ_i(t); Tables of distributions; Normal, exponential
4. Processes of the Form ∫ h dU_n(F) and ∫ h dW_n(F), 282
Reduction of ∫ h dW_n(F) to the form ∫ h dN_n; Existence of ∫ h dU(F) and ∫ h dW(F); Convergence in metrics; Some covariance relationships among the processes; Replacing h by h_n
5. Processes of the Form ∫ F_n dh and ∫ V_n(F) dh; exponential bounds
CONTENTS xix
4. In-Probability Linear Bounds on G_n, 418
5. Characterization of Upper-Class Sequences for ‖G_n/I‖ and ‖I/G_n‖, 420
Shorack and Wellner's characterization of upper class sequences for ‖G_n/I‖ and ‖I/G_n‖; Chang's WLLN-type result for ‖ ‖ with a_n; Wellner's restriction to ‖ ‖ with a_n → 0; Mason's upper class sequences for ‖n^{1/2}(G_n − I)/[I(1 − I)]^{1−ν}‖ with 0 ≤ ν ≤ 1/2
6. Almost Sure Nearly Linear Bounds on G_n and G_n⁻¹, 426
Bounding G_n and G_n⁻¹ between (1 ± ε)t lines; James' boundary weight ψ = 1/log₂ (e^e/t) for ‖I/G_n‖; Nearly linear bounds with logarithmic terms
7. Bounds on Functions of Order Statistics, 428
Mason's upper class result for max {g(ξ_{n:i})/a_{n:i}: 1 ≤ i ≤ k_n}
0. Introduction, 621
1. Bounds on the Magnitude of ‖U_n/q‖ on (a, b], 621
2. Weak Convergence of U_n in ‖ /q‖ Metrics, 625
3. Indexing by Continuous Functions via Chaining, 630
0. Introduction, 842
1. Simple Moment Inequalities, 842
Basic, Markov, Chebyshev, Jensen, Liapunov, C_r, Cauchy–Schwarz, Minkowski; estimating E|X|
2. Maximal Inequalities for Sums and a Minimal
Inequality, 843
Kolmogorov, Monotone, Hájek–Rényi, Lévy, Skorokhod, Menchoff; Maximal inequality for the Poisson process; Weak symmetrization, Lévy; Mogulskii's minimal inequality; The continuous monotone inequality of Gill–Wellner
3. Berry–Esseen Inequalities, 848
Berry–Esseen, with generalizations; Cramér's expansion; Esseen's lemma; Stein's CLT, a special case
4. Exponential Inequalities and Large Deviations, 850
Mills' ratio for normal rv's; Variations on P(S_n/s_n ≥ λ) ≈ exp (−λ²/2); Bennett, Hoeffding, and Bernstein inequalities; Kolmogorov's exponential bounds; Large deviation theorems of Chernoff and others; The Poisson example; Properties of moment-generating functions
5. Moments of Sums, 857
von Bahr inequality; rth mean convergence equivalence; Burkholder's inequality; Marcinkiewicz and Zygmund equivalences, with variations; Hornich's inequality
6. Borel-Cantelli Lemmas, 859
Borel-Cantelli, Renyi, and other variations
7. Miscellaneous Inequalities, 860
Events lemma; Bonferroni inequality; Anderson's inequality
8. Miscellaneous Probabilistic Results, 862
Moment convergence; →_p is equivalent to →_{a.s.} on subsequences; Cramér–Wold device; Formulas for means and covariances; Tail behavior of F when moments exist; Vitali's theorem; Scheffé's theorem with applications
9. Miscellaneous Deterministic Results, 864
Stirling's formula; Euler's constant; Some implications of convergence of series and integrals; A discussion of the subsequence n_j ≡ ⌊exp (aj/log j)⌋ used in upper class proofs; Integration by parts
Preface to the Classics Edition
This edition of our book is the same as the 1986 Wiley edition, but with a long list
of typographical and mathematical errors appended. A somewhat shorter version
of this list has been posted on Wellner's University of Washington Web site for
many years; the current list has resulted from adding all the additional errors of
which we are currently aware.
We owe thanks to N. H. Bingham, Miklós Csörgő, Sándor Csörgő, Kjell Doksum, Peter Gaenssler, Richard Gill, Paul Janssen, Keith Knight, Ivan Mizera, D. W. Müller, David Pollard, Peter Sasieni, and Ben Winter for pointing out errors, difficulties, and shortcomings.
Although our book focuses almost entirely on empirical processes of real-
valued random variables, and much progress on general empirical process theory
has been made in the two decades since then, it seems that the book is still valuable
as a reference for many of the one-dimensional results and for the useful collection
of inequalities. We are very pleased to see it reprinted in the SIAM Classics Series.
During the 1970s and 1980s the general theory of empirical processes for
variables with values in an arbitrary sample space X began developing strongly,
thanks to the pioneering work of Dick Dudley, Evarist Giné, Vladimir Koltchinskii,
Mike Marcus, David Pollard, and Joel Zinn, and this has continued since with
notable contributions from Michel Ledoux, Michel Talagrand, Pascal Massart, Ken
Alexander, and many others. This more general theory is covered by several books
and monographs, including van der Vaart and Wellner (1996), de la Peña and Giné
(1999), Dudley (1999), Ledoux and Talagrand (1991), van de Geer (2000), and
Pollard (1984, 1990).
Our book gave statements of 19 "Open Questions." To the best of our knowl-
edge, 11 of these have been solved. We give a list of the problems and the solutions
of which we are aware at the end of the Errata list.
The Uniform Song
There are continuous distributions,
discrete ones too.
Some are heavy tailed,
and some are skew.
There are logistics and chi squares,
but these we will scorn,
'Cause the loveliest of them all
is the Uniform.
Preface
The study of the empirical process and the empirical distribution function is
one of the major continuing themes in the historical development of mathemati-
cal statistics. The applications are manifold, especially since many statistical
procedures can be viewed as functionals on the empirical process and the
behavior of such procedures can be inferred from that of the empirical process
itself. We consider the empirical process per se, as well as applications to tests
of fit, bootstrapping, linear combinations of order statistics, rank tests, spacings,
censored data, and so on.
Many of the classical results for sums of iid rv's have analogs for empirical
processes, and many of these analogs are now available in best possible form.
Thus we concern ourselves with empirical process versions of laws of large
numbers (LLN), central limit theorems (CLT), laws of the iterated logarithm
(LIL), upper-class characterizations, large deviations, exponential bounds,
rates of convergence and orthogonal decompositions with techniques based
on martingales, special constructions of random processes, conditional Poisson
processes, and combinatorial methods.
Good inequalities are a key to strong theorems. In Appendix A we review
many of the classic inequalities of probability theory. Great care has been
taken in the development of inequalities for the empirical process throughout
the text; these are regarded as highly interesting in their own right. Exponential
bounds and maximal inequalities appear at several points.
Because of strong parallels between the empirical process and the partial
sum process, many results for partial sums are also included. Chapter 2 contains
most of these.
Our main concern is with the empirical process for iid rv's, though we also
consider the weighted empirical process of independent rv's in some detail.
We ignore the large literature on mixing rv's, and confine our presentation for
k dimensions and general spaces to an introduction in the final chapter.
We emphasize the special Skorokhod construction of various processes, as
opposed to classic weak convergence, wherever possible. We feel this makes
for simpler and more intuitive proofs. The Hungarian construction is also
considered. It is usually more cumbersome for weak convergence results, since
there is no single limiting Brownian bridge. However, it can be used to provide
strong limit theorems even though the Skorokhod construction cannot.
The book is intended for graduate students and research workers in statistics
and probability. The prerequisite is a standard graduate course in probability
and some exposure to nonparametric statistics. A reasonable number of exer-
cises are included. In some cases we have listed as exercises results from
themes we have not pursued.
The following convention is used for cross-referencing theorems,
inequalities, remarks, and so on: Theorem 1 in Section 2 of Chapter 3, for
example, will be referred to simply as Theorem 1 only within Section 2, but
will be referred to as Theorem 3.2.1 everywhere else.
We have had fruitful discussions with many individuals concerning topics
in this book. It gives us great pleasure to thank Peter Gaenssler, Richard Gill,
Jack Hall, David Mason, David Pollard, and Winfried Stute in particular.
Reactions from students and colleagues who took a Fall 1983 course by
Shorack were also helpful; comments from Sue Leurgans and Barbara
McKnight led to improvement of parts of Chapters 3 and 4.
We owe special thanks to Tina Victa, Linda Chapel, and DeAnne Carr for
their expert typing of the manuscript and its several revisions, and to Ilze
Shubert and Anne Sheahan for preparation of the tables and figures.
Finally, much of the research for writing of this book has been supported
by grants from the National Science Foundation.
GALEN R. SHORACK
JON A. WELLNER
Seattle, Washington
October 1985
Acknowledgments and
Sources of Tables
G.R.S.
J.A.W.
List of Special Symbols
INTERDEPENDENCE TABLE

[Chart relating Sections 2.1–2.4 and 2.5–2.11 to the chapters that depend on them (Chapters 4–26 and the Appendix).]
CHAPTER 1
Introduction and
Survey of Results
which assigns mass 1/n to each data value X_i is called the empirical df of
X_1, ..., X_n; here 1_A denotes the indicator function of the set A. We will see
This fact has been called† "the existence theorem for statistics as a branch of
applied mathematics" and also‡ "the fundamental theorem of statistics." It
implies that the whole unknown probabilistic structure of the sequence can,
with certainty, be discovered from the data. Thus the study of the empirical
df F_n has been an important and continuing theme in the statistical literature.
The natural normalization of F_n leads to the empirical process defined by
† Pitman, E. (1979).
‡ Loève, M. (1955).
2 INTRODUCTION AND SURVEY OF RESULTS
Example 1. (Classic sums of iid rv's) Let us briefly review a bit of standard
classical probability and statistics. Let X, X_1, X_2, ... be independent and
identically distributed (iid) with X ≅ (μ, σ²); that is, the distribution of X is
taken to have mean μ and variance σ². We let X̄_n ≡ n⁻¹ Σ_{i=1}^n X_i denote
the sample mean. The strong law of large numbers (SLLN) states that

(4) X̄_n →_{a.s.} μ as n → ∞

(this actually holds with no assumption about the variance of X). Thus the
mean can be accurately estimated from the data with increasing certainty as
n → ∞.
An assessment of the accuracy of this approximation is provided by the
central limit theorem (CLT) which states that

(5) √n (X̄_n − μ) →_d N(0, σ²) as n → ∞.

The convergence in (5) is known to be rather rapid when E|X|³ < ∞, in that
we then have the Berry–Esseen result

(6) sup_x |P(√n (X̄_n − μ)/σ ≤ x) − Φ(x)| ≤ C E|X − μ|³/(σ³ √n)
The law of the iterated logarithm (LIL) states that

(8) √n (X̄_n − μ)/b_n ⤳ [−σ, σ] a.s., for b_n ≡ √(2 log₂ n);

⤳ means that for almost every ω in the set Ω of the basic
probability space (Ω, 𝒜, P) on which the rv's are defined, the sequence of real
numbers √n (X̄_n(ω) − μ)/b_n has as its set of limit points exactly the interval
[−σ, σ]. The "other LIL" gives

(9) lim inf_{n→∞} max_{1≤k≤n} |Σ_{i=1}^k (X_i − μ)| b_n/(σ √n) = π/2 a.s.;

we refer the reader to Jain and Pruitt (1975) for an updating of this theme
begun with Chung (1948).
Another result on the convergence of X̄_n is provided by Chernoff's
(1952) large deviation result, for which we refer the reader to Bahadur (1971).
If X_1, X_2, ... are arbitrary iid rv's, then

(12) P(X̄_n ≥ 0) = ρⁿ b_n, where ρ ≡ inf_t E exp (tX) and 0 < lim inf b_n ≤ lim sup b_n < ∞.
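The SLLN (4) and CLT (5) of Example 1 are easy to watch numerically. The sketch below is ours, not the book's; it uses Exponential(1) rv's (so μ = σ² = 1), chosen only because the moments are simple.

```python
import math
import random

def sample_mean(n, rng):
    # Mean of n iid Exponential(1) rv's: mu = 1, sigma^2 = 1.
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

rng = random.Random(0)

# SLLN (4): the sample mean settles near mu = 1 as n grows.
print([round(sample_mean(n, rng), 3) for n in (10, 1000, 100000)])

# CLT (5): Z_n = sqrt(n)(Xbar_n - mu)/sigma is approximately N(0, 1),
# so P(|Z_n| <= 1.96) should be near 0.95.
n, reps = 200, 2000
hits = sum(abs(math.sqrt(n) * (sample_mean(n, rng) - 1.0)) <= 1.96
           for _ in range(reps))
print(hits / reps)  # close to 0.95
```

For skewed distributions such as this one, the Berry–Esseen bound (6) explains why the normal approximation improves at the rate 1/√n.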
Theorem 1. (The inverse transformation) Let ξ ≅ Uniform (0, 1). For a fixed
df F, define its left continuous inverse by

(13) F⁻¹(t) ≡ inf {x: F(x) ≥ t} for 0 < t < 1.

Then

(14) X ≡ F⁻¹(ξ) ≅ F.
In fact,

(15) [X ≤ x] = [ξ ≤ F(x)] for all −∞ < x < ∞,

or

(15″) 1_{[X < x]} = 1_{[ξ < F(x−)]} for all −∞ < x < ∞.
Proof. Now ξ ≤ F(x) implies X = F⁻¹(ξ) ≤ x by (13). If X = F⁻¹(ξ) ≤ x, then
F(x + ε) ≥ ξ for all ε > 0; so that right continuity of F implies F(x) ≥ ξ (see
Hájek and Šidák, 1967). We have thus shown that (15) holds; that is, the
events are equal. [We only need P(ξ ∈ (0, 1]) = 1 for this proof, so that the
result can be generalized to other than a Uniform (0, 1) rv ξ.]
If ξ = t where F(x) = t is satisfied by no x's or exactly one x, then (15″)
holds. If ξ = t where F(x) = t is satisfied by at least two distinct x's, then (15″)
fails. □
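Theorem 1 is the basis of the "inverse transformation" method of simulation. As a sketch (our example, not the book's), take F(x) = 1 − e^{−x}, whose left continuous inverse is available in closed form:

```python
import math
import random

def F_inv(t, lam=1.0):
    # Left continuous inverse of the Exponential(lam) df
    # F(x) = 1 - exp(-lam*x); here F_inv(t) = inf{x : F(x) >= t}.
    return -math.log(1.0 - t) / lam

rng = random.Random(0)
xs = [F_inv(rng.random()) for _ in range(100000)]

# X = F_inv(xi) has df F (Theorem 1): the empirical fraction at or
# below x should be near F(x) = 1 - exp(-x).
for x in (0.5, 1.0, 2.0):
    frac = sum(x_i <= x for x_i in xs) / len(xs)
    print(round(frac, 3), round(1 - math.exp(-x), 3))
```

When F has jumps or flat spots the same recipe works, since only P(ξ ∈ (0, 1]) = 1 is used in the proof.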
Theorem 2. The sequences of random functions F_n and G_n(F) on (−∞, ∞)
have identical probabilistic behavior; we denote this by writing

(18) F_n ≅ G_n(F) on (−∞, ∞).

Likewise,

(19) (F_n(x_1), ..., F_n(x_k)) ≅ (G_n(F(x_1)), ..., G_n(F(x_k)));

that is, these have identical joint distributions. This is clear since nF_n(x_i) and
nG_n(F(x_i)) both have Binomial (n, F(x_i)) distributions; the vectors in (19)
have identical "cumulative multinomial" distributions. Clearly, the joint
behavior in n is likewise identical. □
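The Binomial fact used in the proof of Theorem 2 can be checked by simulation: nG_n(t) counts how many of n iid Uniform (0, 1) rv's fall at or below t, so it is Binomial (n, t) with mean nt and variance nt(1 − t). A minimal sketch (the values of n and t are ours):

```python
import random
import statistics

rng = random.Random(0)
n, reps, t = 50, 4000, 0.25

# n*G_n(t): number of n iid Uniform(0,1) rv's falling at or below t.
counts = [sum(rng.random() <= t for _ in range(n)) for _ in range(reps)]

# Binomial(n, t) has mean n*t = 12.5 and variance n*t*(1-t) = 9.375.
print(round(statistics.mean(counts), 2))
print(round(statistics.variance(counts), 2))
```

By Theorem 2 the same counts describe nF_n(x) whenever F(x) = t, which is why results proved for the uniform empirical df G_n transfer to a general F.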
This completes the main thrust of this section. However, we will now list
several results in the spirit of Theorem 1 that will have some usefulness later.
Note from (15) that for any df F and any 0 < t < 1

(21) F ∘ F⁻¹(t) ≥ t,

with equality failing if and only if t is not in the range of F on [−∞, ∞]. Thus
F ∘ F⁻¹ is the identity function for continuous df's F. See Figure 1c.
Proof. Let x = F⁻¹(t) in (21). Just consider separately t in and t not in the
range of F. □
with equality failing if and only if t is not in the closure of the range of F.
Thus if F is continuous, then ξ ≡ F(X) is Uniform (0, 1). See Figure 2c.
[Figure 1: panels (a)–(c) graph F⁻¹ ∘ F, illustrating that F⁻¹ ∘ F(X) =_{a.s.} X.]

[Figure 2: panels (a)–(c) graph P(F(X) ≤ t) as a function of t.]
Proof. Let t = F(x) in (21). Just consider the two kinds of x's separately.
Exercise 2. Show that F⁻¹ ∘ F ∘ F⁻¹ = F⁻¹, F ∘ F⁻¹ ∘ F = F, and (F⁻¹)⁻¹ = F.
since F⁻¹ ∘ F(X) = X unless F(X − ε) = F(X) for some ε > 0, by Proposition
3. Thus (a) implies
(b) P(X ≤ x) = F(x) unless F(x − ε) = F(x) for some ε > 0.
Proof. Easy. ❑
[Figure 3: panels (a)–(c) graph F⁻¹ ∘ F; note that F⁻¹ ∘ F(x) = x fails if and only if F(x) = F(x − ε) for some ε > 0. Panel (c) relates the increments F⁻¹(t + h) − F⁻¹(t) and F(x + k) − F(x).]
Several of the theorems of this section are recorded in Hájek and Šidák
(1967, pp. 33-34, 56). See also Withers (1976) and Parzen (1980) among what
must be a multitude of sources.
Exercise 4. Let F⁺(t) ≡ sup {x: F(x) ≤ t}. Show that F⁺(ξ) ≅ F for a Uniform (0, 1) rv ξ. Determine when F ∘ F⁺(t) = t fails. See Figure 3.

Suppose now that the df's satisfy

(28) F_n →_d F_0 as n → ∞.

Define rv's X_n* by

(29) X_n* ≡ F_n⁻¹(ξ) for n ≥ 0,

so that

(30) X_n* ≅ F_n for n ≥ 0. Then for a.e. ω we have

(31) X_n* → X_0* as n → ∞.
Proof. Fix 0 < t < 1. Let ε > 0 be given. Choose x such that

(a) F_0⁻¹(t) − ε < x < F_0⁻¹(t)

with F_0 continuous at x. The second inequality of (a) gives F_0(x) < t. Thus
F_n(x) < t for all n exceeding some N, and thus F_n⁻¹(t) ≥ x for all n ≥ N. Hence
lim inf F_n⁻¹(t) ≥ x > F_0⁻¹(t) − ε by the first inequality of (a), so that

(b) lim inf_n F_n⁻¹(t) ≥ F_0⁻¹(t).

Now fix t′ > t, and choose y such that

(c) F_0⁻¹(t′) < y < F_0⁻¹(t′) + ε

with F_0 continuous at y. Thus t < t′ ≤ F_0 ∘ F_0⁻¹(t′) ≤ F_0(y) by the first half of (c),
where Proposition 1 was used for the first ≤. Thus t ≤ F_n(y) for all n exceeding
some other N; hence F_n⁻¹(t) ≤ y for all n ≥ N. Thus lim sup F_n⁻¹(t) ≤ y ≤ F_0⁻¹(t′) + ε
by the second half of (c), implying lim sup_n F_n⁻¹(t) ≤ F_0⁻¹(t′). Thus

(32) F_n⁻¹(t) → F_0⁻¹(t) as n → ∞ at every continuity point t of F_0⁻¹.

Since F_0⁻¹ is ↗, it is continuous a.s. Thus (32) is just a statement that (31)
holds [recall (29)]. This proof can also be found in Billingsley (1979). □
In the remainder of this chapter we will survey some of the main results in
this book. This should give the flavor of what is to follow.
We begin by recalling that nG_n(t) is a sum of n iid Bernoulli (t) rv's. Thus
Example 1.1.1 implies

(5) P(G_n(t) − t ≥ a) = exp (−n[g(a, t) + o(1)]) as n → ∞ by (1.1.8),

where

(6) g(a, t) = (a + t) log ((a + t)/t) + (1 − a − t) log ((1 − a − t)/(1 − t)) if 0 ≤ a ≤ 1 − t,
and g(a, t) = ∞ if a > 1 − t.
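Since nG_n(t) ≅ Binomial (n, t), the rate function g(a, t) of (6) can be compared with the exact binomial tail. The sketch below (parameter choices are ours) shows −n⁻¹ log P(G_n(t) − t ≥ a) approaching g(a, t) as n grows.

```python
import math

def g(a, t):
    # The large deviation rate g(a, t) of (6) for P(G_n(t) - t >= a).
    if a > 1 - t:
        return math.inf
    if a == 1 - t:
        return -math.log(t)  # only the (a + t) = 1 term survives
    return ((a + t) * math.log((a + t) / t)
            + (1 - a - t) * math.log((1 - a - t) / (1 - t)))

def binom_tail(n, t, a):
    # P(Binomial(n, t)/n - t >= a), computed exactly.
    k0 = math.ceil(n * (a + t))
    return sum(math.comb(n, k) * t**k * (1 - t)**(n - k)
               for k in range(k0, n + 1))

t, a = 0.3, 0.2
for n in (50, 200, 800):
    print(n, round(-math.log(binom_tail(n, t, a)) / n, 4), round(g(a, t), 4))
```

The gap closes only at rate (log n)/n, which is why exponential bounds with explicit constants, rather than the limit alone, matter throughout the book.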
From Chapter 3
(8) P(‖(G_n − I)⁺‖ > λ) = λ Σ_{i=0}^{⌊n(1−λ)⌋} C(n, i) (λ + i/n)^{i−1} (1 − λ − i/n)^{n−i}

for 0 < λ < 1.
In view of (9), the factor 2A 2 in the exponent of (13) is the best possible rate.
Mogulskii (1980) obtained the other LIL by showing that
The exact probability (8) converges to the limiting value (9). Is a more
accurate approximation possible? Asymptotic expansions by Chan Li-Tsian
recorded in Gnedenko et al. (1961) show
All of the results for ‖U_n‖ in this section carry over to the uniform quantile
process V_n ≡ √n [G_n⁻¹ − I] because of the fact, obvious from Figure 3.1.1, that

(19) ‖V_n‖ = ‖U_n‖.
The quantile process is dealt with systematically in Chapters 12 and 18.
From Chapter 9
We begin by considering the exact probability that G_n lies between two ↗
curves g ≤ h having g(0) ≤ 0 ≤ h(0) and g(1) ≤ 1 ≤ h(1). Steck (1971) established that

(1) P(g(t) ≤ G_n(t) ≤ h(t) for all 0 ≤ t ≤ 1) = P(a_i ≤ ξ_{n:i} ≤ b_i for 1 ≤ i ≤ n)
= n! det [(b_j − a_i)_+^{j−i+1}/(j − i + 1)!],

where (x)_+ ≡ x ∨ 0, and where it is understood that all elements of the matrix having
subscripts i > j + 1 are zero. Even more useful than (1) are the recursion
relations found in Section 9.3.
For various special cases, (1) can be evaluated exactly. An especially
interesting result due to Daniels (1945) is

(2) P(‖G_n/I‖ ≥ λ) = 1/λ for all λ ≥ 1 and all n ≥ 1.
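Daniels' (1945) result, that P(‖G_n/I‖ ≥ λ) = 1/λ for λ ≥ 1 with no dependence on n, is easy to probe by Monte Carlo, using the fact that the supremum of G_n(t)/t is attained at an order statistic. A sketch (sample sizes are ours):

```python
import random

def sup_Gn_over_I(n, rng):
    # ||G_n / I|| = max_i (i/n) / xi_(i): the sup over t of G_n(t)/t is
    # attained at the order statistics xi_(1) <= ... <= xi_(n).
    xs = sorted(rng.random() for _ in range(n))
    return max((i + 1) / n / x for i, x in enumerate(xs))

rng = random.Random(0)
n, reps, lam = 20, 20000, 2.0
prob = sum(sup_Gn_over_I(n, rng) > lam for _ in range(reps)) / reps
print(prob)  # near 1/lam = 0.5, whatever the value of n
```

Rerunning with other values of n leaves the estimate essentially unchanged, which is the striking distribution-free feature of Daniels' theorem.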
From Chapter 3
(3) U_n →_{f.d.} U as n → ∞

for a Brownian bridge U (see Section 2.2 for the definition of U). However,
this mode of convergence is not strong enough to yield the Mann–Wald
theorem; that is, it does not follow from (3) that h(U_n) →_d h(U) for ‖ ‖-
continuous functions h. The concept of weak convergence, ⇒, was designed
to fill this need (we leave the precise definition of ⇒ until Chapter 2). In a
landmark paper, Doob (1949) suggested heuristically that

(4) U_n ⇒ U as n → ∞,

and hence that

(5) h(U_n) →_d h(U) as n → ∞ for all h that are ‖ ‖-continuous a.s. U.

Doob's conjecture had been guided by his observation, based on earlier work
of Bachelier, that

(6) P(‖U⁺‖ ≥ λ) = exp (−2λ²) for all λ ≥ 0

and

(7) P(‖U‖ ≥ λ) = 2 Σ_{k=1}^∞ (−1)^{k+1} exp (−2k²λ²) for all λ ≥ 0.
From Chapter 5
(8) ∫₀¹ U_n²(t) dt →_d ∫₀¹ U²(t) dt as n → ∞;

just note that h(f) ≡ ∫₀¹ f²(t) dt is ‖ ‖-continuous. The trick is to determine
the distribution of h(U); the solution of this problem for the h in (8) leads to
some particularly fruitful methodology. This is explored in the next few
paragraphs (see Kac and Siegert, 1947).
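The Cramér–von Mises functional in (8) has a closed-form computing formula, which makes the convergence easy to probe numerically; the limit rv has mean E ∫₀¹ U²(t) dt = ∫₀¹ t(1 − t) dt = 1/6, which equals E ∫₀¹ U_n²(t) dt for every n. A sketch (sample sizes are ours):

```python
import random
import statistics

def cramer_von_mises(n, rng):
    # W_n^2 = integral of U_n(t)^2 dt = n * integral (G_n(t) - t)^2 dt,
    # via the classical computing formula based on order statistics.
    xs = sorted(rng.random() for _ in range(n))
    return 1 / (12 * n) + sum((x - (2 * i + 1) / (2 * n)) ** 2
                              for i, x in enumerate(xs))

rng = random.Random(0)
stats = [cramer_von_mises(50, rng) for _ in range(4000)]
print(round(statistics.mean(stats), 3))  # near 1/6 ~ 0.167
```

The principal-component decomposition described next identifies the full limit law, not just its mean, as that of a weighted sum of independent chi-square rv's.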
where
and where the series (9) converges uniformly and absolutely. (This is done
via Mercer's theorem, which is the natural analog of the principal axis theorem
for covariance matrices.) Then the principal component rv's
(14) ∫₀¹ U²(t) dt = Σ_{j=1}^∞ Z_j² ∫₀¹ f_j²(t) dt = Σ_{j=1}^∞ Z_j²/(j²π²).
Durbin and Knott (1972) applied the same approach to U_n to obtain the
representation

(15) Σ_{j=1}^m Z_{nj} f_j(t) → [U_n(t) + U_n(t−)]/2 as m → ∞ for each 0 ≤ t ≤ 1,
that are "nearly normal" for n of about 20. They further indicated how these
variance-normed principal components Z_{nj} can be used in a fashion analogous
to tests of fit. Further,
that h(Skorokhod's U_n) →_{a.s.} h(U) for any ‖ ‖-continuous functional h, and
since h(Skorokhod's U_n) ≅ h(any U_n), one obtains immediately from (18) the
result (5) that h(any U_n) →_d h(U). So far, (18) has only provided an alternative
proof of (5). In what way is it really superior to (4)? First, it can be understood
and taught more easily than (4). Second, it is often possible to show that
h(Skorokhod's U_n), or even h(Skorokhod's V_n), →_{a.s.} h(U) and to thereby
establish the necessary ‖ ‖-continuity of h in a fashion difficult or impossible
to discover from (4). (Examples will be seen in the chapters on linear combina-
tions of order statistics and rank statistics.) Given that Skorokhod's construc-
tion is based on a triangular array, we know absolutely nothing about the joint
distribution of Skorokhod's (U_1, U_2, ...). Thus his construction can be used
to infer →_d or →_p of h(any U_n), but it is helpless and worthless for showing →_{a.s.}.
The Hungarian construction (begun in Csörgő and Révész, 1975a, and
fundamentally strengthened by Komlós et al., 1975) improves Skorokhod's
construction in that it only uses a single sequence of Uniform (0, 1) rv's and
a Kiefer process K (see Section 2.2 for the definition of the Kiefer process)
on a common probability space that satisfy

(19) lim sup_{n→∞} [√n/(log n)²] ‖U_n − B_n‖ < ∞ a.s. for the Hungarian construction,

where B_n ≡ K(n, ·)/√n is a Brownian bridge for each n,
as in (2.2.11). Since h(B_n) ≅ h(U), this construction also yields (5). It is also
capable of yielding →_{a.s.} for the original sequence, though the subscript n on
B_n may make the problem difficult. Its real value is in the rate it establishes.
From Chapter 13
We now turn to LIL-type results. Finkelstein (1971) showed that (see Section
2.8 for the definition of relative compactness ⤳)

(22) ∫₀¹ U_n²(t) dt / b_n² ⤳ {∫₀¹ h²(t) dt: h ∈ 𝓑} = [0, 1/π²] as n → ∞,

where 𝓑 denotes Finkelstein's class of absolutely continuous h with h(0) = h(1) = 0
and ∫₀¹ [h′(t)]² dt ≤ 1; this establishes a LIL for the Cramér–von Mises statistic
∫₀¹ U_n²(t) dt. Of course, the upper bound in (22) was provided by solving a
calculus-of-variations problem. Cassels (1951) showed that for ε > 0 and a.e. ω
there exists an N_ε such that for all n ≥ N_{ε,ω} the increments of U_n satisfy
(1) ∫₀¹ U_n(t) ψ(t) dt,

(2) |∫₀¹ U_n ψ dt − ∫₀¹ U ψ dt| ≤ ‖(U_n − U)/q‖ ∫₀¹ q |ψ| dt.
where b_n ≡ √(2 log₂ n).
From Chapter 3
(9) sup {|U_n(s, t] − U(s, t]|/q(t − s): |t − s| ≥ ε n⁻¹ log n} →_p 0 as n → ∞
From Chapter 15

(2) lim_{n→∞} n^{1/4} ‖U_n − B_n‖ / √(b_n log n) = 1/√2 a.s., where b_n ≡ √(2 log₂ n),
From Chapter 14
From Chapter 16
satisfies Z(t) ≅ N(0, 1) for all t. Let S denote Brownian motion on [0, ∞) and
define

(7) Z(t) ≡ U(t)/√(t(1 − t)) = (1 − t) S(t/(1 − t))/√(t(1 − t)) ≡ X(2⁻¹ log (t/(1 − t))),

where X is the stationary Ornstein–Uhlenbeck process.
(8) b(r) ‖X‖₀^{e^r} − c(r) →_d E as r → ∞,

where E has the extreme value df P(E ≤ x) = exp (−e^{−x}) and

(9) b(r) ≡ √(2 log₂ r), c(r) ≡ 2 log₂ r + 2⁻¹ log₃ r − 2⁻¹ log (4π),
and thus

(13) lim_{n→∞} ‖Z_n‖_{1/n}^{1−1/n} / b_n = 1 a.s., where Z_n ≡ U_n/√(I(1 − I)).
SURVEY OF OTHER RESULTS
Csörgő and Révész (1975b) posed the problem of considering ‖Z_n‖_{a_n}^{1−a_n} / b_n as
a_n ↘ 0. Shorack (1977, 1980) showed that if a_n ≡ (c_n log₂ n)/n defines c_n, then
the value of the lim depends on the limiting behavior of c_n. A more complete
solution to (15), in which the lim is evaluated for various a_n, was obtained by Csáki (1977).
Treatment of the empirical process via Poisson process methods is motivated
in Chapter 8 and applied in parts of Chapters 9 and 14. Many first proofs of
results presented in other chapters also used these methods.
Extensions to the case of independent but not identically distributed rv's are
considered in Chapter 25. Local alternatives of this sort are considered in
Chapter 3.
From Chapter 7
From Chapter 26
Much of the research in the theory of empirical processes during the past 10
years has aimed to generalize the results surveyed in this chapter to higher-
dimensional observations and more abstract sample spaces. Some of these
results are summarized in our final chapter.
CHAPTER 2
Foundations, Special Spaces,
and Special Processes
0. INTRODUCTION
For the case of a sequence of iid rv's, major theorems are the central limit
theorem (CLT) and the law of the iterated logarithm (LIL). The natural analogs
of these results yield weak convergence (⇒) and relative compactness (⤳)
of partial-sum processes and of empirical processes.
In Section 1 we present the basic spaces (C, 𝒞) and (D, 𝒟) on which our
processes will be defined and their convergence studied. The most important
of the limiting processes are introduced in Section 2; their interrelationships
are discussed and some interesting results involving them are recorded.
Section 3 defines ⇒, illustrates its utility, and presents criteria for its
verification. Donsker's theorem on the weak convergence of the partial-sum
processes S_n to Brownian motion S is established in Section 4. Special
constructions of S_n that converge a.s. are summarized briefly at the end of
Section 4; then Skorokhod embedding is presented in Section 5 and the
Hungarian construction is presented in Section 7, after the Wasserstein distance
it uses is discussed in Section 6.
The general definition of ⤳ is given in Section 8. Since the simplest versions
of this concept are less familiar, some examples are presented, and the relation-
ship to the classic LIL is carefully discussed. Strassen's theorems on the relative
compactness ⤳ of scaled Brownian motion S(nI)/(√n b_n) and of the partial-sum
process S_n are presented in Section 9.
In this chapter our aim is to learn how to use the major theorems of ⇒
and ⤳. Thus the Skorokhod–Dudley–Wichura theorem (Theorem 2.3.4), the
weak convergence criteria of Theorem 2.3.6, Strassen's theorems (Theorems
2.5.2 and 2.9.1), and the Hungarian construction of Section 7 are presented,
discussed, and used, but will not be proved. Enough is proved, however, so
that the reader should be able to understand most of the fine detail of literature
that uses these major results. Roughly, we have used partial sums for illustration
in the present chapter, while the corresponding results for empirical processes
will be considered in greater detail in later chapters.
There are a few additional results about partial sums that need to be recorded
for use in later chapters. Section 10 considers the supremum of normalized
Brownian motion S(t)/√t. Section 11 is devoted to various forms of the LLN.
General Concepts
Then for any Borel set B in the class ℬ_k of Borel subsets of R^k, the set
π_{t_1,...,t_k}⁻¹(B) is called a finite-dimensional subset of M.
Exercise 2. Let ℬ ≡ ℬ_1 denote the Borel subsets of the real line R. Then X
is a process on (M, ℳ) if and only if X(·, ω) ∈ M for all ω ∈ Ω and X(t, ·)
is a rv on (Ω, 𝒜) for each t ∈ I. [X is called a random variable (rv) on (Ω, 𝒜)
if and only if X⁻¹(B) ∈ 𝒜 for all B ∈ ℬ.]
for all F ∈ ℳ, defines a probability on (M, ℳ) called the induced probability. If
two transformations X and Z induce the same probabilities, then we call them
equivalent random elements and again write X ≅ Z.
If all of the finite-dimensional distributions of a process X [i.e., the distribu-
tions induced on (R^k, ℬ_k) by π_{t_1,...,t_k}(X)] on a measurable function space
(M, ℳ) are multivariate normal, then X is called a normal process.
Suppose X, X_1, X_2, ... denote processes on (M, ℳ). If the convergence in
distribution

(X_n(t_1), ..., X_n(t_k)) →_d (X(t_1), ..., X(t_k)) as n → ∞

holds for all k ≥ 1 and all t_1, ..., t_k, then we write X_n →_{f.d.} X as n → ∞; and
we say that the finite-dimensional distributions of X_n converge to those of X.
(i) ∫_{X⁻¹(A)} g(X(ω)) dμ(ω) = ∫_A g(x) dν(x)

in the sense that if either integral exists, so does the other and they are equal.
Hint: Prove it for indicators, then simple functions, nonnegative functions,
and integrable functions. See Lehmann (1959, p. 39).
(ii) Suppose rv's X ≅ F and Y ≅ G are related by G(K) = F and
X = K^{-1}(Y), where K is ↗ and right continuous on the real line. Then
X = K^{-1}(Y) and Y ≅ G gives
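This is the familiar inverse-transformation idea. The following small Python sketch (our illustration, not part of the text; the exponential df used here is only an example) draws from a df K by applying the inverse df of K to Uniform (0, 1) variates.

```python
import math
import random

def inverse_transform_sample(quantile, n, seed=0):
    """Draw n variates with df K by applying the inverse df (quantile
    function) of K to independent Uniform(0, 1) variates."""
    rng = random.Random(seed)
    return [quantile(rng.random()) for _ in range(n)]

# Illustration: K(x) = 1 - exp(-x) has K^{-1}(u) = -log(1 - u),
# so the sample below should behave like Exponential(1) (mean 1).
sample = inverse_transform_sample(lambda u: -math.log(1.0 - u), 100_000)
mean = sum(sample) / len(sample)
```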
Metric Spaces
Let (M, δ) denote an arbitrary metric space and let ℳ_δ denote its Borel σ-field
(i.e., the σ-field generated by the collection of all δ-open subsets of M). Let
ℳ_B denote the σ-field generated by the collection of all open balls, where a
ball is a subset of M of the form {y: δ(y, x) < r} for some x ∈ M and some r > 0.
Exercise 5. ℳ_B ⊂ ℳ_δ, while
For functions x, y on [0, 1] define the uniform metric (or supremum metric) by
Let C denote the set of all continuous functions on [0, 1]. Then
Now 𝒞_‖ ‖ denotes the σ-field of Borel subsets of C, 𝒞_B denotes the σ-field
of subsets of C generated by the open balls, and 𝒞_f denotes the σ-field
generated by the finite-dimensional subsets of C. It holds that
Now 𝒟_‖ ‖ denotes the Borel σ-field of subsets of D, 𝒟_B denotes the σ-field
of subsets of D generated by the open balls, and 𝒟_f denotes the σ-field
RANDOM ELEMENTS, PROCESSES, AND SPECIAL SPACES 27
(5) 𝒟_f = 𝒟_B ⊊ 𝒟_‖ ‖,
and moreover
We now digress briefly. The proper set inclusion of (5) was the source of
difficulties in the historical development of this subject. To circumvent these
difficulties, various authors showed that it is possible to define a metric d on
D (see Exercise 8 below) such that
(8) 𝒟_d = 𝒟,
while
Let q > 0 denote a function on [0, 1] that is positive on (0, 1). For functions
x, y on [0, 1] we define ‖(x − y)/q‖ ≡ sup_{0≤t≤1} |x(t) − y(t)|/q(t)
when this is well defined (i.e., when |x(t) − y(t)|/q(t) has a finite limit as t
approaches 0 and 1).
Let T ≡ [0, ∞) × [0, ∞) with t = (t₁, t₂) a typical point. Each t determines four
quadrants by intersecting the half-spaces s₁ < t₁ and s₁ ≥ t₁ with the half-spaces
s₂ < t₂ and s₂ ≥ t₂. We label these quadrants Q(<, <), Q(<, ≥), Q(≥, <),
Q(≥, ≥). D_T denotes the collection of all real-valued functions x on T for
which for each t ∈ T
and if s ^ I we let
We take as well known the existence of a normal process S on (C, 𝒞) having
mean-value function and covariance function given by ES(t) = 0 and
Cov [S(s), S(t)] = s ∧ t.
We call S Brownian motion on [0, 1]. (See Exercise 15 below for its existence.)
Note that if 0 = t₀ < t₁ < ··· < t_{k+1} = 1, then S(t₁) − S(t₀), ..., S(t_{k+1}) − S(t_k)
are independent rv's distributed as N(0, t₁ − t₀), ..., N(0, t_{k+1} − t_k) for all k ≥ 1.
Also S(0) ≡ 0.
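The independent-increment description above can be checked numerically. The sketch below (ours, not part of the text) builds a discretized path from independent N(0, Δt) increments and verifies that the sample variance of S(t) is near t.

```python
import math
import random

def brownian_path(n_steps, rng):
    """Path of S at t = k/n_steps, built from independent N(0, dt)
    increments, per the independent-increment property above."""
    dt = 1.0 / n_steps
    s, path = 0.0, [0.0]
    for _ in range(n_steps):
        s += rng.gauss(0.0, math.sqrt(dt))
        path.append(s)
    return path

rng = random.Random(1)
paths = [brownian_path(100, rng) for _ in range(10_000)]
var_half = sum(p[50] ** 2 for p in paths) / len(paths)   # theory: Var S(1/2) = 0.5
var_one = sum(p[100] ** 2 for p in paths) / len(paths)   # theory: Var S(1) = 1.0
```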
We will call X the Uhlenbeck process. Note that Var [X(t)] = 1 for all t ∈ R.
We also take for granted the existence of a normal process S on
(C_T, 𝒞_T) satisfying
(4) ES(s₁, t₁) = 0 and Cov [S(s₁, t₁), S(s₂, t₂)] = (s₁ ∧ s₂)(t₁ ∧ t₂)
for all s₁, s₂ ≥ 0 and 0 ≤ t₁, t₂ ≤ 1. Its existence follows from Exercise 12 below.
Relationships between these processes are shown in the following exercises.
Because of Exercises 2.1.2 and 2.1.3, it is clear that all the transformations
below lead to normal processes on the appropriate (C_T, 𝒞_T). It is trivial to
verify that the resulting mean-value functions are zero, so these exercises merely
consist of verifying that the resulting covariance functions are correct.
(Figure: simulated sample paths, panels (a) and (b); vertical scale −1.5 to 1.5, horizontal scale 0.0 to 1.0.)
is Brownian bridge on (C, 𝒞). [It is true that S(r)/r →_{a.s.} 0 as r → ∞; you may
use this fact. It is a consequence of Theorem 2.8.1.]
32 FOUNDATIONS, SPECIAL SPACES, AND SPECIAL PROCESSES
where U is Brownian bridge on (C, 𝒞). Then Z is a Brownian motion on [0, 2],
while Z(t) = U(1 − t) for t ≤ 1. Also
is Brownian bridge.
Exercise 7. Let U be a Brownian bridge on (C, 𝒞) and let Z denote a N(0, 1)
rv independent of U. Then U + IZ is Brownian motion on (C, 𝒞). [Recall that
I(t) = t is the identity function on [0, 1].]
Exercise 8. Let S denote Brownian motion on [0, ∞). Then the time-reversal
process tS(1/t) for t > 0 is again Brownian motion. [It is true that S(r)/r →_{a.s.} 0
as r → ∞; you may use this fact. It is a consequence of Theorem 2.8.1.]
V(t) ≡ √(t(1−t)) X(log (t/(1−t))) for 0 < t < 1, with V(t) ≡ 0 for t = 0 or 1,
is a Brownian bridge for each s > 0. Let B(0, t) ≡ 0 for 0 ≤ t ≤ 1. We call B the
Brillinger process.
U(t) − ∫₀^t U(r) r^{-1} dr for 0 ≤ t ≤ 1,
are all Brownian motions on (C, 𝒞). [Note (3.4.47), (3.4.48), and (6.1.24).]
U(t) = Σ_{k=1}^∞ Z_k √2 sin (kπt)/(kπ) for 0 ≤ t ≤ 1.
Show that U is a well-defined random element on (C, 𝒞) that satisfies our
definition of Brownian bridge. Likewise (or use Exercise 7),
S(t) = Z₀ t + Σ_{k=1}^∞ Z_k √2 sin (kπt)/(kπ) for 0 ≤ t ≤ 1
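The sine-series representation of the Brownian bridge can be illustrated numerically (our sketch, not part of the text; the truncation level K is an arbitrary choice): a truncated series should have Var U(t) close to t(1 − t).

```python
import math
import random

def bridge_series(t, z):
    """Truncated series U(t) = sum_k Z_k sqrt(2) sin(k pi t) / (k pi)."""
    return sum(zk * math.sqrt(2.0) * math.sin(k * math.pi * t) / (k * math.pi)
               for k, zk in enumerate(z, start=1))

rng = random.Random(2)
K, reps, t = 200, 5_000, 0.3
vals = [bridge_series(t, [rng.gauss(0.0, 1.0) for _ in range(K)])
        for _ in range(reps)]
var_hat = sum(v * v for v in vals) / reps   # theory: t(1 - t) = 0.21
```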
= ∫₀¹ [b/√(2πs³)] exp (−b²/(2s)) ds for all b > 0,
(7) P( sup_{0≤t≤1} |S(t)| ≥ b ) = 1 − (4/π) Σ_{k=0}^∞ [(−1)^k/(2k+1)] exp (−(2k+1)²π²/(8b²)) for all b > 0;
(9) P( |S(t)| ≤ at + b for all t ≥ 0 ) = Σ_{k=−∞}^∞ (−1)^k exp (−2abk²).
Proof. We will give informal justification for some of results (6)-(13) based
on "reflection principles."
Consider (6). Corresponding to every sample path having S(t) > b,
there are exactly two "equally likely" sample paths (see Figure 2) for which
sup_{s≤t} S(s) ≥ b. Since S(t) ≅ N(0, t), Equation (6) follows. The theorem that validates our
key step is the "strong Markov property" (see Theorem 2.5.1); we paraphrase
it by saying that if one stops Brownian motion at a random time τ
that depends only on what the path has done so far, then what happens after
time τ, as measured by {S(τ + t) − S(τ): t ≥ 0}, has exactly the same distribution
as does the Brownian motion {S(t): t ≥ 0}. In the argument above, τ was the
first time that S touches the line y = b. Change variables to get the second
formula.
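A Monte Carlo companion to this argument (our illustration, not part of the text): on a discrete grid, the fraction of paths whose running maximum reaches b should be near 2P(N(0, 1) ≥ b); the grid slightly undercounts excursions between grid points, so agreement is only approximate.

```python
import math
import random

def hits_level(b, n_steps, rng):
    """Does a discretized Brownian path on [0, 1] reach level b?
    (The grid misses excursions between grid points, a small downward bias.)"""
    dt = 1.0 / n_steps
    s, m = 0.0, 0.0
    for _ in range(n_steps):
        s += rng.gauss(0.0, math.sqrt(dt))
        m = max(m, s)
    return m >= b

rng = random.Random(3)
b, reps = 1.0, 10_000
p_hat = sum(hits_level(b, 200, rng) for _ in range(reps)) / reps
# reflection principle: P(sup S >= b) = 2 P(S(1) >= b) = 2(1 - Phi(b))
p_theory = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(b / math.sqrt(2.0))))
```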
Figure 2.
BROWNIAN MOTION AND VARIATIONS 35
Figure 3. (A second reflection; levels −b, 2b, 3b.)
(a) P(‖S‖ ≥ b) = P(A₊) − P(A₊₋) + P(A₊₋₊) − ···
+ P(A₋) − P(A₋₊) + P(A₋₊₋) − ···
(b) = 2[P(A₊) − P(A₊₋) + P(A₊₋₊) − ···] by symmetry
(d) φ(a, b) ≡ P(S(t) ≥ at + b for some t ≥ 0), where a ≥ 0, b > 0.
Now note that
Figure 4. (The line at + (b₁ + b₂) and the level b₁.)
because at the instant τ when S first hits the line at + b₁, we then have the
same problem all over again, where the equation of the original line at + (b₁ + b₂)
relative to the point (τ, S(τ)) has become at + b₂ (see Figure 4). This is again
rigorized by the strong Markov property. Also, φ(a, b) ≥ P(S(1) ≥ a + b) > 0
and φ(a, b) is ↘ in b. The only solution in b of the functional equation (e)
having these properties is
In order to solve for ψ(a) we consider Figure 5. Let τ ≡ inf {t: S(t) = b}. Note
from Figure 5 that once S intersects y = b at time τ, the event that is
then required has probability φ(a, aτ) by the strong Markov property. We
thus have
exp (−ψ(a)b) = φ(a, b) by (f)
(g) = E[φ(a, aτ)] by the strong Markov property
(h) = ∫₀^∞ exp (−ψ(a)as) [b/√(2πs³)] exp (−b²/(2s)) ds by (f) and (14)
(i) = (2/√π) ∫₀^∞ exp (−[aψ(a)b²/(2y²) + y²]) dy, letting y² = b²/(2s).
Figure 5.
and
P(−a(1−t) − bt ≤ U(t) ≤ a(1−t) + bt for 0 ≤ t ≤ 1)
Exercise 18. (i) (Doob, 1949) Show that for a, c ≥ 0 and α, β ≥ 0
where
(ii) (Häjek and Sidäk, 1967) Now show that for all a, b, a, ß >0
— exp —
C d2Dk
—c /J
where
(22) P(‖U⁻‖ ≤ a and ‖U⁺‖ ≤ b) = Σ_{k=−∞}^∞ [exp (−2k²(a+b)²) − exp (−2(b + k(a+b))²)]
Exercise 19. (Rényi, 1953) Show that for λ > 0 and 0 < c < 1
and
− Φ((4k−3)λ √(c/(1−c)))]
Here, Φ denotes the N(0, 1) df. The extension of these results from [c, 1] to
[c, d] also appears in Rényi (1953).
Exercise 20. (Billingsley, 1968, p. 79) Let a, b> 0 with —a < u < v < b. Then
giving
and
00
Exercise 21. Consider the boundaries shown in Figure 6, where a₁ > 0, a₂ > 0,
b₁ ≥ b₂ (but b₁ = b₂ = 0 is disallowed), and 0 < T ≤ ∞. Let p₁(T) denote the
probability that Brownian motion S on [0, ∞) hits the upper line and does so
prior to time T and prior to hitting the lower line. Then if b₁ ≥ 0 we have
(30) p₁(∞) = Σ_{k=1}^∞ {exp (−2[k²a₁b₁ − (k−1)²a₂b₂ − k(k−1)(a₁b₂ − a₂b₁)])
Figure 6. (Upper slope a₁, lower slope −a₂.)
If b₁ = b₂ = 0, then
See Anderson (1960) for formulas for p₁(∞) when b₁ < 0, for p₁(T) in general,
and for this same probability conditional on the values of S(T). Other
generalizations are also given by Anderson.
Exercise 23. (i) Show that Z ≡ max {s: S(s) = 0 and s ≤ t} has
and
Exercise 25. Let a, b > 0 and 0 < c < 1. Compute P(U(t) ≤ a(1−t) + bt for
c ≤ t ≤ 1).
= (2/π) arcsin √(a(1−b)/(b(1−a)))
Suppose
Suppose
(34) σ² ≡ ∫₀¹ ∫₀¹ K(s, t) dH(s) dH(t) exists.
Proposition 1. Suppose almost all sample paths of the normal process X on
(C, 𝒞) are such that
(35) ∫₀¹ X(s) dH(s) exists a.s.
Under (32)-(34) we have that
(36) ∫₀¹ X(s) dH(s) ≅ N(0, σ²).
(a) ∫_θ^{1−θ} X dH → ∫₀¹ X dH as θ → 0.
WEAK CONVERGENCE 43
Moreover,
(b)-(c): ∫_θ^{1−θ} X dH is the limit as n → ∞ of Riemann sums distributed as N(0, σ_n²), with
(d) σ_n² ≡ Σ_{i=⌊nθ⌋}^{⌊n(1−θ)⌋} Σ_{j=⌊nθ⌋}^{⌊n(1−θ)⌋} K(i/n, j/n) [H((i+1)/n) − H(i/n)] [H((j+1)/n) − H(j/n)],
(f) σ_n² → σ²(θ) as n → ∞, and σ²(θ) → σ² as θ → 0.
Clearly, N(0, σ_n²) →_d N(0, σ²(θ)) as n → ∞, and N(0, σ²(θ)) →_d N(0, σ²) as
θ → 0. ❑
Var [Y] = ∫₀¹ ∫₀¹ [s ∧ t − st] dH(s) dH(t)
also holds.
Hint: In the special case when H(1) = 0 we can write Var [Y] = Cov [Y, Y] =
∫₀¹ {∫₀¹ [1_{[0,s)}(r) − s] dH(s)} {∫₀¹ [1_{[0,t)}(r) − t] dH(t)} dr and use Fubini's theorem.
Note that 1_{[0,t)}(ξ) = 1_{(ξ,1]}(t) and Cov [1_{[0,s)}(ξ), 1_{[0,t)}(ξ)] = s ∧ t − st.
3. WEAK CONVERGENCE
(M, ℳ_δ), and let P_n denote the probability measure induced on (M, ℳ_δ) by
X_n. We want criteria which will imply that
or
For each m ≥ 1 we let T_m ≡ {t_{m0} = 0 < t_{m1} < ··· < t_{mk_m} = 1}, and define
mesh (T_m) ≡ max {t_{mi} − t_{m,i−1}: 1 ≤ i ≤ k_m}. We call A_m(x) the function in D that
equals x(t_{mi}) at each t_{mi}, 0 ≤ i ≤ k_m, and is constant on each [t_{m,i−1}, t_{mi});
then
using the triangle inequality via Figure 1 for the second inequality in (7).
Hence (4) is really a statement that the sample paths of the processes are "not
too wiggly." We could thus replace (4) by
|Ef (X_n) − Ef (X₀)|
≤ |Ef (X_n) − Ef (A_m ∘ X_n)| + |Ef (A_m ∘ X_n) − Ef (A_m ∘ X₀)|
+ |Ef (A_m ∘ X₀) − Ef (X₀)|
Figure 1.
One of the most common criteria for weak convergence on general spaces
is phrased in terms of tightness. A sequence X_n, n ≥ 1, of (M, δ)-valued random
elements is tight if the corresponding induced measures are tight: for every
ε > 0 there exists a compact set K ⊂ M such that
(11) P_n(K) = P(X_n ∈ K) ≥ 1 − ε for all n ≥ 1,
where P_n, n ≥ 1, denote the induced probability measures on (M, ℳ). [See
Dudley (1966; 1976, Chapter 23) for a generalization of the following theorem
suitable for the nonseparable case when ℳ_B ⊂ ℳ_δ.]
(12) X_n →_{f.d.} X₀ as n → ∞
and
then
(iv) lim_{n→∞} P_n(B) = P₀(B) for all sets B whose boundary ∂B has P₀(∂B) = 0.
A Special Construction
and
(16) P₀(M_δ) = 1
for some ℳ-measurable subset M_δ of M that is δ-separable
to a ry and satisfies
Proof. See Dudley (1976) for a very nice writeup. Skorokhod's (1956) fundamental
paper required (M, δ) to be complete and separable; in this case an
even more accessible proof is in Billingsley (1971). ❑
This theorem allows an easy proof of the fact that weak convergence does
indeed imply (1) for a wide class of functionals i/i, as we now show.
Let Δ_ψ ≡ {x ∈ M: ψ is not δ-continuous at x}. If there exists a set Δ* ∈ ℳ_δ
having Δ_ψ ⊂ Δ* and P(X₀ ∈ Δ*) = 0, then we say that ψ is a.s. δ-continuous
with respect to the X₀ process.
† The conclusion of the previous theorem would still hold even if the special process X₀** were
replaced by a sequence of special processes X₀ₙ** that are equivalent and that satisfy
δ(X_n**, X₀ₙ**) →_p 0 as n → ∞. This could arise when we consider Hungarian constructions.
Remark 2. Thus on (D, 𝒟) we have the following result. Let each X_n, n ≥ 0,
be a process on (D, 𝒟) with P(X₀ ∈ C) = 1. Suppose X_n ⇒ X₀ on (D, 𝒟, ‖ ‖)
as n → ∞. Then each X_n may be replaced by an equivalent process X_n* (i.e.,
X_n* ≅ X_n) for which ‖X_n* − X₀*‖ →_p 0 as n → ∞. If ‖X_n − X₀‖ →_p 0 as n → ∞ is
given, then X_n ⇒ X₀ on (D, 𝒟, ‖ ‖) as n → ∞ can be concluded; in this latter
case we will write either X_n ⇒ X₀ on (D, 𝒟, ‖ ‖) or ‖X_n − X₀‖ →_p 0 as n → ∞.
For this reason, the conclusion (20) will also be denoted by writing
Verification of (4)
(23) E{Z(r, s]² Z(s, t]²} ≤ ν(r, t]² for all 0 ≤ r ≤ s ≤ t ≤ 1.
P(‖Z − A_m Z‖ ≥ ε) ≤ P(ω_Z(1/m) ≥ ε)
sup_{r≤s≤r+δ} |Z(r, s]| ≤ |Z(r, r+δ]| + sup_{r≤s≤r+δ} {|Z(r, s]| ∧ |Z(s, r+δ]|}
Thus
Now consider any 0 ≤ r ≤ s ≤ 1 such that 0 ≤ s − r ≤ 1/m. Then for some
0 ≤ i ≤ j ≤ m − 1 with 0 ≤ j − i ≤ 1,
(f) i/m ≤ r ≤ (i+1)/m and j/m ≤ s ≤ (j+1)/m;
so that
Thus
(g) P(ω_Z(1/m) ≥ 6ε) ≤ Σ_{k=0}^{m−1} P( sup_{0≤s≤1/m} |Z(k/m, k/m + s]| ≥ 2ε )
≤ (1/ε⁴) Σ_{k=1}^m { E Z((k−1)/m, k/m]⁴ + K ν((k−1)/m, k/m]² } by (e)
(h) ≤ (1/ε⁴) [ Σ_{k=1}^m E Z((k−1)/m, k/m]⁴ + K ν(0, 1] max_{1≤k≤m} ν((k−1)/m, k/m] ],
valid for all m ≥ 1 and all ε > 0. This proof is a special case of Vanderzaanden
(1980). ❑
The next theorem is one of the more useful weak-convergence results found
in the literature. It is essentially from Billingsley (1968, p. 128). Generalizations
to processes in higher dimensions appear in Bickel and Wichura (1971).
Basically, it allows for discontinuous limit processes.
then we only require (25) to hold for all r, s, t in T_n.] Suppose that μ is a
continuous measure on [0, 1] and either
(26) μ_n(s, t] ≤ μ(s, t] for all 0 ≤ s ≤ t ≤ 1 and for all n ≥ 1
or
(29) E|X(s, t]|^b ≤ μ(s, t]^a for all 0 ≤ s ≤ t ≤ 1
Exercise 6. Show that (9) and (10) hold if for a finite continuous μ we have
(30) lim‾_n E X_n(s, t]⁴ ≤ μ(s, t]² for all 0 ≤ s ≤ t ≤ 1.
We will let
𝕊_n(t) ≡ S_{⌊nt⌋}/√n for 0 ≤ t ≤ 1; that is,
(4) 𝕊_n(t) = S_i/√n for i/n ≤ t < (i+1)/n and 0 ≤ i ≤ n;
(a) P(‖𝕊_n − A_m ∘ 𝕊_n‖ ≥ ε) ≤ m P(‖𝕊_n‖₀^{1/m} ≥ ε)
(b) ≤ 2m P(|S(n/m)|/√n ≥ ε/2)
by the Lévy and Skorokhod inequality (Inequality A.2.4)
= 2m P(|S(n/m)|/√(n/m) ≥ ε√m/2)
(c) → 2m P(N(0, 1) ≥ ε√m/2) as n → ∞ by the CLT
(d) → 0 as m → ∞ for all ε > 0.
Thus Theorem 2.3.6 applies, and yields 𝕊_n ⇒ S on (D, 𝒟, ‖ ‖). The original
paper is Donsker (1951). ❑
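Donsker's theorem can be seen at work numerically (our sketch, not part of the text; the Rademacher summands and the sample sizes are arbitrary choices): the distribution of sup_t 𝕊_n(t) should be close to that of sup_t S(t), whose tail the reflection principle gives as 2(1 − Φ(b)).

```python
import math
import random

def sup_partial_sum_process(n, rng):
    """sup over t of S_n(t) = S_{[nt]}/sqrt(n), for iid Rademacher
    (mean 0, variance 1) summands."""
    s, m = 0, 0
    for _ in range(n):
        s += 1 if rng.random() < 0.5 else -1
        m = max(m, s)
    return m / math.sqrt(n)

rng = random.Random(4)
n, reps, b = 400, 10_000, 1.0
p_hat = sum(sup_partial_sum_process(n, rng) >= b for _ in range(reps)) / reps
# Brownian limit: P(sup S >= 1) = 2(1 - Phi(1))
p_limit = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(b / math.sqrt(2.0))))
```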
Our next theorem spells out what the conclusion (5) means when reexpressed
in a most convenient form.
(a) ‖𝕊_n − 𝕊‖ →_{a.s.} 0 as n → ∞.
=P(S„EY)
(c) =1.
Of course (d) implies 𝕊 ≅ S, and (d) and (a) combine to give
Thus the theorem holds, where we suppress the *'s in its statement. ❑
WEAK CONVERGENCE OF THE PARTIAL-SUM PROCESS 𝕊_n 55
and define
(9) T ≡ T(q, λ) ≡ ∫₀¹ t^{-1} exp (−λq²(t)/t) dt.
Then we have
if and only if
(A simple proof of this theorem for the special case of square integrable 1/q
is given in Section B.2.)
for b_n ≡ √(2 log₂ n). Note that S(nI)/√n is also a Brownian motion, and then
compare (12) and (7).
In Section 7 we will discuss a different type of construction in which a best
rate of convergence is also established. The best result is given by the Hungarian
construction of a Brownian motion S and a single sequence of iid (0, 1) rv's
X₁, X₂, ... with df F whose partial-sum process 𝕊_n satisfies
(14) 𝕊_n(t) ≡ S_k/√n for n/(k+1) < t ≤ n/k and k ≥ n
Remarks
Functional analogs, for partial sums, of the Berry-Esseen theorem are
contained in Nagaev (1970) and Aleskjavicene (1977).
Let X denote a rv having mean 0. Let S denote a Brownian motion on [0, ∞).
Our first objective is to sketch out how to define a rv τ ≥ 0 having the property
that
(1) S(τ) ≅ X.
(2) S'(t) ≡ S(t + τ) − S(τ) on [τ < ∞],
and S'(t) ≡ 0 on [τ = ∞].
Exercise 1. (Blumenthal's 0-1 law) Let A be in the σ-field 𝒜₀₊ ≡ ∩_{ε>0} 𝒜_ε,
where 𝒜_ε ≡ σ[S(s): s ≤ ε].
Now the rv
is a stopping time with respect to the σ-fields 𝒜_t ≡ σ[S(s): s ≤ t]. We take as
known the fact that P(τ_{a,b} < ∞) = 1 and that if Z denotes any of the three
martingales above, then the optional sampling theorem of martingale theory
gives EZ(τ_{a,b}) = EZ(0). If we let p_a ≡ P(S(τ_{a,b}) = −a), then application of
optional sampling to (5) yields 0 = ES(0) = ES(τ_{a,b}) = −a p_a + b(1 − p_a); this
implies
Applying optional sampling to (6) gives 0 = E(S²(0) − 0) = E(S²(τ_{a,b}) − τ_{a,b}) =
a²b/(a+b) + b²a/(a+b) − Eτ_{a,b} = ab − Eτ_{a,b}, so that
Observe (A, B) ≅ (a, b) according to H, and then observe τ_{A,B}; call the result
of this two-stage procedure τ. Show that τ is a stopping time, and then use it
to prove Proposition 3. (This τ was taken from a W. J. Hall seminar in 1967.
There are many other ways to specify a suitable τ; see Root, 1969 for a nice
one. See also Breiman, 1968.)
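The two facts used above, P(S(τ_{a,b}) = b) = a/(a+b) and Eτ_{a,b} = ab, are exact for the simple ±1 random walk as well (same martingale argument), which makes them easy to check numerically. A sketch of ours, not part of the text:

```python
import random

def exit_time(a, b, rng):
    """Simple +/-1 walk from 0 until it leaves (-a, b); returns
    (exited at b?, number of steps). Discrete analogue of tau_{a,b}."""
    s, steps = 0, 0
    while -a < s < b:
        s += 1 if rng.random() < 0.5 else -1
        steps += 1
    return s == b, steps

rng = random.Random(5)
a, b, reps = 2, 3, 20_000
results = [exit_time(a, b, rng) for _ in range(reps)]
p_hit_b = sum(h for h, _ in results) / reps    # theory: a/(a+b) = 0.4
mean_tau = sum(t for _, t in results) / reps   # theory: ab = 6
```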
by Propositions 2 and 3. Note that Eτ₁ = Var [X₁] by (11). Also τ₁ is a.s. finite
(since τ_{a,b} is), so that the strong Markov property of Theorem 1 shows that
Because of this independence, we can now repeat the above procedure, obtaining
a rv τ₂ such that
Note that
(17) X₁ + X₂ = [S(τ₁ + τ₂) − S(τ₁)] + S(τ₁) = S(τ₁ + τ₂).
The LLN suggests that τ₁ + ··· + τ_{⌊nt⌋} ≅ nt. Thus we see reason to hope that
(25) lim‾_{n→∞} n^{1/4} ‖𝕊_n* − S(nI)/√n‖ / [(log n)^{1/2} (log₂ n)^{1/4}] < ∞ a.s.
provided
(27) |S(n) − 𝕊(n)| / (n^{1/r} √(log n)) →_{a.s.} 0 if E|X|^r < ∞ for 2 < r < 4.
Use this to establish a rate between the rates of (24) (when r = 2) and (25)
(when r = 4). Breiman also proved (26) for r = 2.
lim‾_{n→∞} |S(n) − 𝕊(n)| / [(n log₂ n)^{1/4} (log n)^{1/2}] > 0 a.s.
Exercise 8. (Kiefer, 1969) Follow Kiefer to evaluate the lim sup in (25). Its
value is a function of the particular stopping time used.
An Alternative Construction
Suppose now that X ≅ (0, 1) with df F, and we now specify τ via Proposition
3 so that
(32) ‖𝕊_n − S‖ → 0 as n → ∞.
(c) τ_n = n τ̄_n.
Thus (b) will follow if we show that τ̄_n →_p 0; we will in fact show that
where the functions on both sides of the arrow in (f) are ↗. [Note the proof
of the Glivenko-Cantelli theorem (Theorem 3.1.3), which uses this idea also.]
This is from Breiman (1968, p. 279). ❑
6. WASSERSTEIN DISTANCE
d₂(F, G) ≡ { ∫₀¹ [F⁻¹(t) − G⁻¹(t)]² dt }^{1/2}
and note that the above remarks show this to be finite for F, G ∈ ℱ₂. It is clear
that d₂(F, G) ≥ 0 with equality if and only if F = G, that d₂(F, G) = d₂(G, F),
and d₂(F, H) ≤ d₂(F, G) + d₂(G, H) by Minkowski's inequality; thus
(a) ∫ [F⁻¹(t)]² dt < ε
(b) ∫ [F_n⁻¹(t) − F⁻¹(t)]² dt < ε
and
for all n ≥ some n_ε. Convergence of second moments implies that for n ≥ some
n_ε we have
(e) ∫ [F_n⁻¹(t)]² dt < 3ε
for n ≥ n_ε. Thus (a), (b), (e), and Minkowski's inequality give
(f) d₂²(F_n, F) = ∫₀¹ [F_n⁻¹(t) − F⁻¹(t)]² dt < ε + (√ε + √(3ε))² < 9ε
|[∫ x² dF_n(x)]^{1/2} − [∫ x² dF(x)]^{1/2}|
so that ∫ x² dF_n(x) → ∫ x² dF(x). Assume F_n →_d F does not occur. Then at some
continuity point t₀ of F we have F_{n'}(t₀) ↛ F(t₀) on some subsequence
n'. This is enough to imply d₂(F_{n'}, F) ↛ 0, which is a contradiction. Thus
F_n →_d F. ❑
Theorem 2. (Mallows, 1972) Show that d₂²(F, G) = inf E(X − Y)² when the
infimum is taken over all jointly distributed X and Y having marginal df's F
and G.
for F, G in ℱ₁ ≡ {F: F is a df such that ∫ |x| dF(x) < ∞}. Show that (ℱ₁, d₁)
is a complete metric space. Note from geometrical considerations that
(7) d₁(F, G) = ∫_{−∞}^∞ |F(x) − G(x)| dx.
Show that d,(F, G) = inf EIX — Yl when the infimum is taken over all jointly
WASSERSTEIN DISTANCE 65
distributed X and Y having marginal df's F and G. Show that for F,, F2 ....
in
when the infimum is taken over all jointly distributed X and Y having marginal
df's F and G.
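The two representations of d₁ (horizontal, via quantile functions, and vertical, via (7)) can be compared directly on small empirical dfs. A sketch of ours, not part of the text; for empirical dfs of equal sample size the quantile coupling is just the sorted-sample pairing.

```python
import bisect

def d1_quantile(xs, ys):
    """d1 via quantile functions: for equal sample sizes this is the
    average absolute difference of the sorted samples."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def d1_vertical(xs, ys):
    """The same distance computed as the area between the two
    empirical dfs, as in (7)."""
    sx, sy = sorted(xs), sorted(ys)
    grid = sorted(set(sx) | set(sy))
    total = 0.0
    for lo, hi in zip(grid, grid[1:]):
        f = bisect.bisect_right(sx, lo) / len(sx)
        g = bisect.bisect_right(sy, lo) / len(sy)
        total += abs(f - g) * (hi - lo)
    return total

xs = [0.1, 0.5, 2.0, 3.5]
ys = [0.2, 0.7, 1.0, 4.0]
# both representations give the same value, here 0.45
```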
(12) ∫ f dP_n → ∫ f dP
for all continuous f having f(x) = O(‖x‖^α) at infinity.
Exercise 7. (Bickel and Freedman, 1981) Show that ‖F − G‖ < δ implies
that {t: |F⁻¹(t) − G⁻¹(t)| > ε} has Lebesgue measure less than ν.
(1) S(nI)/√n ≅ S.
Let X denote a (0, 1) rv. The Hungarians succeeded in defining iid X rv's
X₁, X₂, ... and a Brownian motion S on a common probability space so that
the following results hold: The partial-sum process 𝕊_n of X₁, ..., X_n satisfies
while
with b_n ≡ √(2 log₂ n). However, for any λ_n → ∞ there exists a (0, 1) df F for which
(4) lim‾_{n→∞} λ_n ‖𝕊_n − S(nI)/√n‖ / b_n = ∞ a.s.
whenever X₁, X₂, ... are iid F; thus (3) is indeed the most that can be
established under the (0, 1) assumption. See Major (1976a, b) for all results
of this paragraph. Theorem 2.6.2 provides some of the basic motivation; consult
also Major (1976b).
In the next paragraph we shall see that under additional moment assumptions,
more can be claimed about the uniform closeness of the special 𝕊_n to
S(nI)/√n.
The best possible rates are obtained when we assume that
(6) lim‾_{n→∞} (√n / log n) ‖𝕊_n − S(nI)/√n‖ ≤ some M < ∞ a.s. when (5) holds.
where c₁, c₂, c₃ > 0 depend only on the df F of X and where c₃ may be taken
arbitrarily large by taking c₁ large enough. Moreover, this construction is the
best possible to the extent that whenever F is not a N(0, 1) df, then no matter
what construction is used we have
(8) lim‾_{n→∞} |S*(n) − S(n)| / log n ≥ some c > 0 a.s.,
where c depends on F. Thus the rate in (6) is the best possible. Moreover, the
hypothesis (5) is essential in that whenever it fails, the construction on which
(6) is based satisfies
(12) lim‾_{n→∞} |S*(n) − S(n)| / n^{1/r} = ∞ a.s. if E|X|^r = ∞ with r > 2.
Major (1976a, b) established (11) and (13) for 2 < r ≤ 3, as well as results based
on E[X² g(X)] < ∞. See Breiman (1967) for (12).
(14) lim_{n→∞} max_{{t: t = k/n, k ≥ 0}} |𝕊_n(t) − S(nt)/√n| / (log n / √n)
where q(t) ≡ [(e² ∨ t) log₂ (e² ∨ t)]^{1/2} for t > 0. [In fact, Müller, 1968 showed that
(15) holds for the Skorokhod construction of the partial-sum process of iid
(0, 1) rv's.]
Proof. Let ψ(f) ≡ sup_{t≥1} f(t)/t, and note that both ψ(S) and ψ(S(nI)/√n)
are rv's that are a.s. finite. Also
by (15). Thus ψ(𝕊_n) − ψ(S(nI)/√n) →_p 0. Thus ψ(𝕊_n) →_d ψ(S). Now ψ(𝕊_n) =
√n max_{k≥n} S_k/k. Finally,
The key to this example was step (c), which rests on the fact that tS(1/t) is
the natural time-inversion dual process of S. ❑
See Müller (1968) and Wellner (1978a) for additional examples in a similar
vein.
8. RELATIVE COMPACTNESS ↝
Let X₁, X₂, ... denote processes on a probability space (Ω, 𝒜, P) that have
trajectories belonging to the set M. Let 𝒦 denote a subset of M, and let δ
denote a metric on M. Suppose there exists A ∈ 𝒜 having P(A) = 1 such that
for each ω ∈ A we have:
(i) Every subsequence n' has a further subsequence n'' for which X_{n''}(ω)
δ-converges (or is δ-Cauchy).
(ii) All δ-limit points of X_n(ω) are in 𝒦.
(iii) For each h ∈ 𝒦 there exists a subsequence n' = n'_{ω,h} such that
δ(X_{n'}(ω), h) → 0 as n' → ∞.
When these conditions hold we say that X_n is a.s. relatively compact with
respect to δ on M with limit set 𝒦, and we write
This latter result has a natural functional analog for the partial-sum process
𝕊_n, as defined in (2.4.4). Specifically, Strassen's (1964) landmark paper showed
that
where
(5) 𝒦 ≡ {k: k is absolutely continuous on [0, 1] with k(0) = 0 and
∫₀¹ [k'(t)]² dt ≤ 1}.
The Classic LIL for the Normal Processes S and U, and for Sums of iid rv's
The next proposition puts on record the proof of the classic LIL in a very
simple case where details do not cloud essentials. (See Breiman, 1968 for the
approach of this subsection.)
P(A_k) ≤ 2 P( S_{n_k} ≥ (1 + 2ε) √(n_{k−1}) b_{n_{k−1}} ) by (b)
= a convergent series.
(e) lim‾_{n→∞} S_n / (√n b_n) ≤ 1 a.s.
We must now show the lim‾ in (e) is also ≥ 1 a.s. We will still use n_k ≡ ⌊a^k⌋;
but a will be specified sufficiently large below. We write
and
so that P(Bk i.o.) = 1 by the other Borel-Cantelli lemma. But P(A k i.o.) = 0
has lim sup equal to √1 + 0 = 1 a.s. as t → ∞. Thus (6) holds.
For (7) we use the time reversal of Exercise 2.8. Thus
1 =_{a.s.} lim‾_{t→∞} t S(1/t) / √(2t log₂ t) by time reversal and (6)
(f) = lim‾_{r→0} [S(r)/r] / √((2/r) log₂ (1/r)), letting r = 1/t.
We can now use Skorokhod embedding (2.4.12) to extend the LIL for
Brownian motion to the general LIL for an iid sequence.
(9) lim‾_{n→∞} S_n / (√n b_n) = 1 a.s.,
Proof of (i) of Theorem 2. Let X₁*, X₂*, ... and S denote the specially
constructed random quantities of (2.4.12). Note that
that is, (9) holds for the special rv's X₁*, X₂*, .... But (9) is a property
determined solely by the finite-dimensional distributions of the X_i*'s, and these
agree with those of the X_i's. Thus (9) holds. [The other half of (2) follows
by symmetry.] The original proof was given in Hartman and Wintner (1941).
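Numerically the LIL normalization can only be illustrated loosely, since convergence in n is very slow; still, the running maximum of |S_k| / (√k b_k) over a long Gaussian walk stays of rough order 1, while |S_k|/√k alone keeps fluctuating. A sketch of ours, not part of the text (the range of k and the cutoff k ≥ 100 are arbitrary choices):

```python
import math
import random

rng = random.Random(8)
n_max = 200_000
s, ratio = 0.0, 0.0
for k in range(1, n_max + 1):
    s += rng.gauss(0.0, 1.0)
    if k >= 100:  # b_k = sqrt(2 log log k) needs k well above e
        bk = math.sqrt(2.0 * math.log(math.log(k)))
        ratio = max(ratio, abs(s) / (math.sqrt(k) * bk))
# ratio is the running max of |S_k| / (sqrt(k) b_k); the LIL says its
# limiting value (as the range grows) is 1, but convergence is slow.
```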
In the next subsection we will improve on the LIL (2) by establishing the
↝ of (3).
Theorem 3. (i) (LIL) Let X₁, X₂, ... be iid (0, 1) rv's and let S_n ≡
X₁ + ··· + X_n. Let b_n ≡ √(2 log₂ n). Then (9) can be strengthened to
Let C_m ≡ {x ∈ R^m: |x| = 1} denote the surface of B_m. Let a ∈ C_m be arbitrary
but fixed; recall (a). Let 0 < ε ≤ 1 be arbitrary. Now a.s. for infinitely many n
the vector Z_n satisfies a'Z_n ≥ 1 − ε by (a) and |Z_n| ≤ 1 + ε by (b), so that
wrt | | on R^m.
(15) lim‾_{t→∞} ‖S‖₀^t / √(2t log₂ t) = 1 a.s.
Let T_m ≡ {t_{m0}, ..., t_{mk_m}}, where 0 = t_{m0} ≤ t_{m1} ≤ ··· ≤ t_{mk_m} ≤ 1, denote a finite
subset of [0, 1]. Recall that π_{T_m}(x) ≡ (x(t_{m0}), ..., x(t_{mk_m})) denotes the projection
mapping of x. We will call the function x̄_m (the function x̃_m) that equals
x(t_{mj}) at each t_{mj} and is constant (linear) on each subinterval closed on the
left and open on the right (closed on both the left and right) between adjacent
t_{mj}'s the T_m-approximation of x (the T_m-linearization of x).
When dealing with ↝ 𝒦 a.s. wrt ‖ ‖ on D, we will make the following
assumptions about 𝒦:
and
provided T_m becomes dense as m → ∞ (i.e., provided max {t_{mi} − t_{m,i−1}: 1 ≤
i ≤ k_m} → 0 as m → ∞).
and for some 𝒦 satisfying (16)-(18) we have for each fixed m that
Combining (b), (c), and (d) into (a), we can pick an n_k (it must exceed both
n_{k−1} and N_{k,m}) for which
(g) ≤ 3ε.
by (18) and (g). Thus f is the ‖ ‖-limit point of the sequence f_m, where f_m ∈ 𝒦
by (18). But 𝒦 is ‖ ‖-compact by (16). Thus f ∈ 𝒦, and (ii) is verified.
It remains to prove (i). This will hold if we show that every subsequence
n' has a further subsequence n'' that is Cauchy. Fix m so large that a_m < ε/2.
Since
Exercise 5. (Wichura, 1973a) Show that (19) and (20) may be replaced by
(22) P( lim_{n→∞} inf_{h∈𝒦} ‖X_n − h‖ = 0 ) = 1
Mapping Theorem
Proof. Let A₁ be the set having P(A₁) = 1 that is associated with (25). Let
A₂ᶜ denote the exceptional set on which (26) fails. Let A ≡ A₁ ∩ A₂, so that
P(A) = 1. Let ω ∈ A. We will show that for every ω ∈ A conditions (i), (ii),
and (iii) of definition (1) of ↝ hold. Let n' be an arbitrary subsequence.
Condition (25) guarantees the existence of a further subsequence n'' and of
an h ∈ 𝒦 for which δ(X_{n''}(ω), h) → 0 as n'' → ∞. Hence (26) yields
δ̃(ψ_{n''}(X_{n''}(ω)), ψ₀(h)) → 0 as n'' → ∞. Thus (i) holds. Let h̃ ∈ ψ₀(𝒦); then h̃ =
ψ₀(h) for some h ∈ 𝒦. Condition (25) guarantees the existence of a subsequence
n' for which δ(X_{n'}(ω), h) → 0 as n' → ∞. Thus δ̃(ψ_{n'}(X_{n'}(ω)), h̃) → 0 as n' → ∞.
Thus (iii) holds. Suppose δ̃(ψ_{n'}(X_{n'}(ω)), k) → 0 as n' → ∞ for some subsequence
n' and some k ∈ M̃. As in our proof of (i), there exists a further subsequence
n'' and an h ∈ 𝒦 for which δ̃(ψ_{n''}(X_{n''}(ω)), ψ₀(h)) → 0 as n'' → ∞. But
δ̃(ψ_{n''}(X_{n''}(ω)), k) → 0 as n'' → ∞ by our assumption. Thus k = ψ₀(h) ∈ ψ₀(𝒦).
Thus (ii) holds. See Wichura (1974a). ❑
RELATIVE COMPACTNESS OF S(nt)/√n 79
∫₀¹ [k'(t)]² dt ≤ 1}.
Since |k(t) − k(s)| = |∫_s^t k'(r) dr| ≤ {∫_s^t dr ∫_s^t [k'(r)]² dr}^{1/2} ≤ √(t − s), we have
Proposition 1. k ∈ 𝒦 if and only if k(0) = 0 and for every partition 0 = t₀ < t₁ <
··· < t_m = 1 we have
(3) Σ_{i=1}^m [k(t_i) − k(t_{i−1})]² / (t_i − t_{i−1}) ≤ 1.
Moreover,
(4) 𝒦 satisfies (2.8.16)-(2.8.18).
(d) ∫₀¹ [k'(t)]² dt = ∫₀¹ lim_m [k_m'(t)]² dt ≤ lim‾_m ∫₀¹ [k_m'(t)]² dt ≤ 1.
Thus k ∈ 𝒦.
We now turn to (2.8.16)-(2.8.18). That k̄_m ∈ 𝒦 follows from (3) by the
argument written out twice so far. Both ‖k̄_m − k‖ → 0 and ‖k̃_m − k̄_m‖ → 0 are
trivial by uniform continuity of k. For (2.8.16), suppose ‖k_i − k‖ → 0 for
k_i ∈ 𝒦. Each k_i satisfies (3); passing to the limit in (3) for k_i, we find k satisfies
(5) S(nI)/√n ↝ 𝒦 a.s. wrt ‖ ‖ on D as n → ∞.
Corollary 1. For all ε > 0 there exists an N_{ε,ω} such that the increments of S
satisfy
(6) |S(ns, nt]| / (√n b_n) ≤ √(t − s) + ε for all 0 ≤ s ≤ t ≤ 1 whenever n ≥ N_{ε,ω}.
Corollary 2. (Strassen) The partial-sum process 𝕊_n of iid (0, 1) rv's satisfies
(2.7.4). That is,
Proof. The reader is referred to Strassen (1964) for (5); note also Exercise
2.8.7. Then note that (5) combined with Skorokhod's embedding (2.5.24) gives
a trivial proof of (7). Also (6) follows trivially from (2) and (5). (Note the
proof of Cassels' theorem (Theorem 13.3.2).) ❑
(10) 𝔹(n, ·) ≡ [S(nI) − I S(n)] / √n ≅ U ≅ Brownian bridge,
(11) 𝔹(n, ·) / b_n ↝ 𝒦 a.s. wrt ‖ ‖ on D as n → ∞.
We call S(t)/√t normalized Brownian motion, since it has variance 1 for all t.
We define
and
(3) c(t) ≡ 2 log₂ t + 2⁻¹ log₃ t − 2⁻¹ log (4π).
Note that E_v has a long right-hand tail, and the kth power E_v^k is the df of the
maximum of k independent rv's having df E_v.
while
Darling and Erdős (1956) proved Theorem 1 via the representation (8). They
also proved an analogous result for the partial-sum process of independent
(0, 1) rv's with uniformly bounded third moments. However, the stronger result
presented as Theorem 2 is due to Oodaira (1976) and Shorack (1979b);
examples involving dependence are given there.
THE LLN FOR IID RV'S 83
(10) Y_n ≡ sup_{1≤s≤(log n)^K} |S(s)|/√s
is essentially the whole supremum, in that for all large K > 0 and small ε > 0
Then
(12) m_n ≡ sup_{1≤s≤n} S(s)/√s and M_n ≡ sup_{1≤s≤n} |S(s)|/√s
satisfy
and
while
Theorem 2. (Feller) Let X₁, X₂, ... be iid with E|X| = ∞. Let a_n ≥ 0 satisfy
a_n/n ↗. Then
(3) lim‾_{n→∞} |S_n|/a_n = 0 or ∞ according as Σ_{n=1}^∞ P(|X| ≥ a_n) < ∞ or = ∞.
(5) E|X|^r < ∞ if and only if Σ_{n=1}^∞ n^{rα−2} P(|S_n| ≥ n^α ε) < ∞ for all ε > 0.
See Chow and Lai (1978) for citations and additional results and references.
A necessary and sufficient condition for the WLLN is given in the next result.
The Processes
We let ξ₁, ..., ξ_n denote independent Uniform (0, 1) rv's whose df is the
identity function I on [0, 1]. We define the uniform empirical df 𝔾_n by
(1) 𝔾_n(t) ≡ (1/n) Σ_{i=1}^n 1_{[ξ_i ≤ t]} for 0 ≤ t ≤ 1.
Note that
so that
(4) 𝔾_n⁻¹(t) = ξ_{n:i} for (i−1)/n < t ≤ i/n and 1 ≤ i ≤ n,
where ξ_{n:1} ≤ ··· ≤ ξ_{n:n} is our notation for the uniform order statistics. We will also have use for the
smoothed uniform empirical df 𝔾̃_n defined by
(6) 𝔾̃_n(t) equals i/(n+1) at each ξ_{n:i} and is linear on each [ξ_{n:i−1}, ξ_{n:i}].
We agree that
We can write 𝕌_n as
(14) 𝕌_n(t) ≡ (1/√n) Σ_{i=1}^n {1_{[ξ_i ≤ t]} − t} for 0 ≤ t ≤ 1.
We agree that
Figure 1. (Panels (a)-(e): the empirical df and related processes plotted against the order statistics ξ_{n:1}, ..., ξ_{n:n}.)
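The representation (14) makes 𝕌_n easy to compute; a sketch of ours, not part of the text, checks the exact covariance structure Cov [𝕌_n(s), 𝕌_n(t)] = s ∧ t − st implied by (26) below.

```python
import math
import random

def u_n(sample, t):
    """U_n(t) = sqrt(n) (G_n(t) - t) for the uniform empirical df G_n."""
    n = len(sample)
    return math.sqrt(n) * (sum(x <= t for x in sample) / n - t)

rng = random.Random(6)
n, reps, s, t = 100, 10_000, 0.3, 0.6
samples = [[rng.random() for _ in range(n)] for _ in range(reps)]
us = [u_n(x, s) for x in samples]
ut = [u_n(x, t) for x in samples]
var_s = sum(u * u for u in us) / reps                # theory: s(1-s) = 0.21
cov_st = sum(a * b for a, b in zip(us, ut)) / reps   # theory: s ^ t - st = 0.12
```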
88 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES
Figure 2.
where c_n ≡ (c_{n1}, ..., c_{nn})'. The weighted uniform empirical process 𝕎_n is
defined by
where
(24) ρ_n ≡ ρ_n(c, 1) ≡ 1'c_n / (√(1'1) √(c_n'c_n)).
UNIFORM PROCESSES AND THEIR SPECIAL CONSTRUCTION 89
Figure 3. The difference between the step function and the straight line is
(25) c̄_n ≡ (1/n) Σ_{i=1}^n c_{ni} and c̄_n² ≡ (1/n) Σ_{i=1}^n c_{ni}².
This holds since
(26) Cov [1_{[ξ_i ≤ s]} − s, 1_{[ξ_j ≤ t]} − t] = s ∧ t − st if i = j, and = 0 if i ≠ j.
so that
Note that
and that
Note that
(35) ℝ_n(t) ≡ (1/√(c_n'c_n)) Σ_{i=1}^{⌊(n+1)t⌋} c_{nD_{ni}} for 0 ≤ t ≤ 1.
We will also use the name empirical rank process. Note the fundamental identity
Let ℒ₂ denote the space of all square integrable functions on [0, 1], and let
‖h‖₂ ≡ ( ∫₀¹ h²(t) dt )^{1/2}.
Exercise 2. (i) Use the Lindeberg-Feller theorem and (20) to show that if
‖h‖₂ < ∞, then
(41) 𝕎_n →_{f.d.} 𝕎 as n → ∞.
(ii) Show that if ‖h‖₂ < ∞, then (20) and Lindeberg-Feller give
(42) (1/√(c_n'c_n)) Σ_{i=1}^n c_{ni} h(ξ_i) →_d N(0, ‖h‖₂²) as n → ∞ provided h̄ ≡ ∫₀¹ h dI = 0.
Statistical Applications
so that we expect
(47) = ∫ 𝕌_n²(F) dF
(48) = ∫₀¹ 𝕌_n²(t) dt if F is continuous
(50) S_n ≡ (1/√(c_n'c_n)) Σ_{i=1}^n c_{ni} [h(ξ_i) − h̄]
(51) = ∫₀¹ h d𝕎_n.
(55) = ∫₀¹ h dℝ_n
(56) = −∫₀¹ ℝ_n dh
if h ∈ BV(0, 1) is left continuous.
If ℝ_n converges to 𝕎 when c̄_n = 0, then we would expect [from (55) and (36)]
that
(57) T_n →_d ∫₀¹ h d𝕎 ≅ N(0, σ_h²) as n → ∞ if c̄_n = 0.
It is important to note that when (45), (49), (52), and (57) are made precise
via a special construction, we will not only have →_d in these equations, but we
will also have a representation of the limiting rv's.
This should serve as sufficient motivation for what is to come. In later
chapters we wish to consider statistics (such as S_n and T_n) under both null
and very general alternative distributions. This section has allowed us to set
up notation for the null situation. The following theorems are special cases of
results to be proved in Sections 3 and 4; Theorem 1 rigorizes (45) and (49)
while Theorem 2 rigorizes (52) [we will rigorize (57) later].
that are all defined on a common probability space (Ω, 𝒜, P) for which
and
† It is customary to claim only "a.s." in a special construction, but we chose "for every ω" instead
at the small price of redefining things on a null set.
We will also
Recall that
(70)  ρ_n(c, 1) → ρ_{c1},  ρ_n(c, a) → ρ_{ca},  ρ_n(a, 1) → ρ_{a1}  as n → ∞,

for 0 ≤ s, t ≤ 1. (It is sometimes appropriate to let "a" denote the "truth" and
let "c" denote what we think is true.)
(73)  ∫₀¹ h d𝕌_n →_d ∫₀¹ h d𝕌 ≅ N(0, σ²_h)
and
(76)  Cov[∫₀¹ h d𝕌, ∫₀¹ h̃ d𝕌] = Cov[∫₀¹ h d𝕎, ∫₀¹ h̃ d𝕎] = σ_{h,h̃}

and

(78)  Cov[∫₀¹ h d𝕌, ∫₀¹ h̃ d𝕎] = ρ_{c1} σ_{h,h̃}.
As is suggested by these covariance formulas,
(79)  ∫₀¹ 1_{[0,t]} d𝕎 = 𝕎(t) = −∫₀¹ 𝕎 d1_{[0,t]}
and
and
Proof. Let M be a large integer. By applying the SLLN to each of the finite
number of binomial rv's n𝔾_n(i/M), 0 ≤ i ≤ M, we conclude that, for (i−1)/M ≤ t ≤ i/M,

𝔾_n((i−1)/M) ≤ 𝔾_n(t) ≤ 𝔾_n(i/M),  with 𝔾_n(i/M) →_{a.s.} i/M;

thus

(83)  limsup_{n→∞} ‖𝔾_n − I‖ ≤ 1/M  a.s.,

and letting M → ∞ completes the proof.
The monotonicity technique used in the above proof at step (83) is worth
remembering.
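The conclusion ‖𝔾_n − I‖ →_{a.s.} 0 can be watched numerically; a minimal sketch of our own (arbitrary seed and sample sizes):

```python
import numpy as np

rng = np.random.default_rng(2)

def sup_dist(n):
    # ||G_n − I|| for the uniform empirical df; the sup is attained at a jump point
    x = np.sort(rng.uniform(size=n))
    i = np.arange(1, n + 1)
    return max(np.max(i / n - x), np.max(x - (i - 1) / n))

d_small, d_large = sup_dist(100), sup_dist(100_000)
assert d_large < d_small      # typical behavior: the sup distance shrinks with n
assert d_large < 0.02         # O(1/sqrt(n)) in magnitude
```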
Generalizations
Much current research deals with the following type of generalization of these
ideas. Consider
When only a finite number of sets A are involved, the latter is no more general
than 𝕌_n →_{f.d.} 𝕌. When 𝒜 = {[0, t]: 0 ≤ t ≤ 1}, (86) holding uniformly reduces
to (81) and (60). Further generalizations to larger classes of sets can lead to
difficult problems. Since 𝕌_n(A) = ∫₀¹ 1_A d𝕌_n, a further generalization would be
to consider
to consider
under these hypotheses 𝕌_n(h) is asymptotically normal, N(0, σ²_h). One
could then ask for uniform convergence and study the limiting process indexed
by h. See Chapters 17 and 26 for some results of this type.
Since ξ_{n:i} ≤ t if and only if at least i of the n observations are less than or
equal to t, we have the key relation
with density

(90)  [n!/((i−1)!(n−i)!)] t^{i−1}(1−t)^{n−i}  for 0 ≤ t ≤ 1.

This is also "proved" by noting that n!/((i−1)! 1! (n−i)!) distinct orderings of
the n observations have i−1 observations less than t, one equal to t, and
n−i exceeding t; each such ordering "has probability density" t^{i−1}(1−t)^{n−i}.
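Equivalently, (90) says ξ_{n:i} has a Beta(i, n−i+1) density. A Monte Carlo moment check of our own (n = 10 and i = 3 are arbitrary), against the Beta mean i/(n+1) and second moment i(i+1)/((n+1)(n+2)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, i, reps = 10, 3, 100_000

u = np.sort(rng.uniform(size=(reps, n)), axis=1)
x = u[:, i - 1]                       # xi_{n:i}, claimed Beta(i, n−i+1) by (90)

# Beta(i, n−i+1) moments: mean i/(n+1), second moment i(i+1)/((n+1)(n+2))
assert abs(x.mean() - i / (n + 1)) < 0.005
assert abs((x**2).mean() - i * (i + 1) / ((n + 1) * (n + 2))) < 0.005
```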
Similar reasoning shows that for 1 ≤ i < j ≤ n the joint density function of
ξ_{n:i} and ξ_{n:j} is

(91)  [n!/((i−1)!(j−i−1)!(n−j)!)] s^{i−1}(t−s)^{j−i−1}(1−t)^{n−j}  for 0 ≤ s ≤ t ≤ 1;
(93)  E ξ_{n:i} = i/(n+1)  and  Cov[ξ_{n:i}, ξ_{n:j}] = [i/(n+1)][1 − j/(n+1)]/(n+2)  for i ≤ j,

and

(94)  Var[ξ_{n:j} − ξ_{n:i}] = [(j−i)/(n+1)][1 − (j−i)/(n+1)]/(n+2).
Exercise 3. Give rigorous proofs of (89) and (93) based on (88) and its
trinomial analog.
To see this, note that the inverse image of any (t₁, …, t_n) having 0 < t₁ < ⋯ <
t_n < 1 is the set consisting of the n! permutations of (t₁, …, t_n). The density
value, namely 1, at each of these n! points is mapped onto the same image
point to yield a density value there of n!.
The uniform spacings 5 n; are defined by
(1)  𝔽_n(x) ≡ (1/n) Σ_{i=1}^n 1_{[X_{ni} ≤ x]}  for −∞ < x < ∞,

the average df

(2)  F̄_n(x) ≡ (1/n) Σ_{i=1}^n F_{ni}(x)  for −∞ < x < ∞,
Let X_{n:1} ≤ ⋯ ≤ X_{n:n} denote the order statistics. Let

(6)  𝕎_n(x) ≡ (1/√(c'c)) Σ_{i=1}^n c_{ni}[1_{[X_{ni} ≤ x]} − F_{ni}(x)]  for −∞ < x < ∞.
Define
and
(10)  (1/n) Σ_{i=1}^n G_{ni}(t) = t  for 0 ≤ t ≤ 1,

(11)  [α_{ni} ≤ t] = [ξ_{ni} ≤ G_{ni}(t)]  for all 0 ≤ t ≤ 1,
      [X_{ni} ≤ x] = [α_{ni} ≤ F̄_n(x)] = [ξ_{ni} ≤ G_{ni}(F̄_n(x))]  for all −∞ < x < ∞;

(12)  𝔾_n(t) ≡ (1/n) Σ_{i=1}^n 1_{[α_{ni} ≤ t]} = (1/n) Σ_{i=1}^n 1_{[ξ_{ni} ≤ G_{ni}(t)]}  for 0 ≤ t ≤ 1,
of α_{n1}, …, α_{nn} is used to form the reduced empirical process 𝕏_n on (D, 𝒟)
defined by

(13)  𝕏_n(t) ≡ √n[𝔾_n(t) − t] = (1/√n) Σ_{i=1}^n {1_{[ξ_{ni} ≤ G_{ni}(t)]} − G_{ni}(t)}  for 0 ≤ t ≤ 1.
Clearly,
and
K_n(s, t) ≡ Cov[𝕏_n(s), 𝕏_n(t)]
(15)     = s∧t − (1/n) Σ_{i=1}^n G_{ni}(s)G_{ni}(t)  for 0 ≤ s, t ≤ 1.
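The covariance (15) can be checked in the smallest nontrivial case. The sketch below is our own: n = 2 with the arbitrary choices G_{n1}(t) = t² and G_{n2}(t) = 2t − t², which satisfy (10):

```python
import numpy as np

rng = np.random.default_rng(12)
reps, s, t = 200_000, 0.4, 0.8

# alpha_1 has df t^2 (max of two uniforms); alpha_2 has df 2t − t^2 (min of two)
u = rng.uniform(size=(reps, 2, 2))
a1 = u[:, 0].max(axis=1)
a2 = u[:, 1].min(axis=1)

def X(v):                         # X_n(v) = sqrt(n) [G_n(v) − v] with n = 2
    Gn = ((a1 <= v).astype(float) + (a2 <= v)) / 2
    return np.sqrt(2) * (Gn - v)

G1 = lambda v: v**2
G2 = lambda v: 2*v - v**2
K_exact = min(s, t) - (G1(s)*G1(t) + G2(s)*G2(t)) / 2   # formula (15)
K_emp = np.cov(X(s), X(t))[0, 1]
assert abs(K_emp - K_exact) < 0.01
```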
(17)  𝕍_n ≡ −𝕏_n(𝔾_n⁻¹) + √n[𝔾_n∘𝔾_n⁻¹ − I]
and
We will also consider smoothed versions of these processes.
They are random elements on (C, 𝒞). [Recall (3.1.9)–(3.1.10).]
It will suffice to concentrate our study on the reduced processes because
the fundamental identity,
This is true because n𝔾_n(F̄_n(x)) equals the number of indices i for which
α_{ni} = F̄_n(X_{ni}) ≤ F̄_n(x); and (11) shows that this is a.s. equal to the number of
indices i for which X_{ni} ≤ x. We need only union (11)'s null sets over a finite
number of indices i for each of a countable number of n's; such a union is
again a null set. Note also that the quantile process is such that
(20)  √n[𝔽_n⁻¹ − F̄_n⁻¹] = √n[F̄_n⁻¹(𝔾_n⁻¹) − F̄_n⁻¹]  for all n, holds a.s.
(21)  ℤ_n(t) ≡ (1/√(c'c)) Σ_{i=1}^n c_{ni}[1_{[α_{ni} ≤ t]} − G_{ni}(t)]
           = (1/√(c'c)) Σ_{i=1}^n c_{ni}[1_{[ξ_{ni} ≤ G_{ni}(t)]} − G_{ni}(t)]  for 0 ≤ t ≤ 1.
Note that
and
Moreover,
(25)  Cov[𝕏_n(s), ℤ_n(t)] = (1/(√n √(c'c))) Σ_{i=1}^n c_{ni}[G_{ni}(s∧t) − G_{ni}(s)G_{ni}(t)]  for 0 ≤ s, t ≤ 1.
We now let
and
(28)  ℝ_n(t) ≡ (1/√(c'c)) Σ_{i=1}^{⟨(n+1)t⟩} c_{nD_{ni}}  for 0 ≤ t ≤ 1.
Note the fundamental identity
(29)  ℝ_n = ℤ_n(𝔾_n⁻¹) + ⋯
(30)  T_n ≡ (1/√(c'c)) Σ_{i=1}^n c_{ni} h(R_{ni}/(n+1)) = (1/√(c'c)) Σ_{i=1}^n h(i/(n+1)) c_{nD_{ni}} = ∫₀¹ h dℝ_n.
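The two sums in (30) are literally the same numbers reindexed by the antiranks; a small numerical confirmation of our own (arbitrary data, scores, and score function):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
x = rng.uniform(size=n)                  # continuous, so no ties a.s.
c = rng.normal(size=n)                   # regression constants c_{n1..nn}
h = lambda t: t - 0.5                    # a score function in L2
c_norm = np.sqrt(np.sum(c**2))

# Ranks R_{ni} and antiranks D_{ni}: x[D[i-1]] is the i-th smallest observation
D = np.argsort(x)
R = np.empty(n, dtype=int)
R[D] = np.arange(1, n + 1)

T_ranks = np.sum(c * h(R / (n + 1))) / c_norm                        # first sum in (30)
T_anti = np.sum(h(np.arange(1, n + 1) / (n + 1)) * c[D]) / c_norm    # second sum in (30)

assert abs(T_ranks - T_anti) < 1e-12
```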
This differs from (3.1.36) and (3.1.54)–(3.1.56) in that we are now considering ℝ_n
and T_n under completely general continuous alternatives F_{n1}, …, F_{nn}.
We now extend these results to the case of completely arbitrary df's. We agree
that
and
We now define an associated array of continuous rv's X̃_{n1}, …, X̃_{nn}, n ≥ 1, by
letting

for 1 ≤ i ≤ n; here all ξ̃_{ni} are independent Uniform (0, 1) rv's that are also
independent of the X_{ni}'s.
We thus see from (34) that X_{n1}, …, X_{nn}, n ≥ 1, are independent with con-
tinuous df's H_{n1}, …, H_{nn}, n ≥ 1, that satisfy

(36)  F_{ni} = H_{ni}(D_n)  for 1 ≤ i ≤ n  and  F̄_n = H̄_n(D_n).
Now let

(38)  α_{ni} ≡ H̄_n(X_{ni})  and  G_{ni} ≡ H_{ni} ∘ H̄_n⁻¹,  where H̄_n ≡ n⁻¹ Σ_{i=1}^n H_{ni}.

Then

(39)  (1/n) Σ_{i=1}^n G_{ni}(t) = t  for 0 ≤ t ≤ 1.
= 𝕏_n(F̄_n)  by (36),
where
(thus 𝕏_n is the empirical process of the continuous α_{n1}, …, α_{nn}). This funda-
mental equation,
is the discontinuous-case version of (19); note that in the case of continuous
df's the present 𝕏_n is the same as the 𝕏_n of (19). We can think of (41) as
saying that the empirical process √n[𝔽_n − F̄_n] only "looks in on" the reduced
empirical process 𝕏_n of the associated array of continuous rv's at points in
the range of F̄_n. All we really need to learn from what is above in this subsection
is formulas (40) and (41) and this interpretation of (41).
In this general case we agree that
Likewise the ranks R_{n1}, …, R_{nn}, the antiranks D_{n1}, …, D_{nn}, and the rank
sampling process ℝ_n are also defined in terms of the continuous α_{n1}, …, α_{nn}.
For these randomly broken ties, expression (30) for T_n with the R_{ni}'s defined
in terms of the continuous α_{ni}'s is an accurate representation of simple linear
rank statistics formed from arbitrary F_{n1}, …, F_{nn}.
The key paper is van Zuijlen (1978).
A different reduction is treated at the end of Section 3.3.
Verification of Reduction
absolutely continuous with respect to F„.) Thus Proposition 1.1.2 and (a) imply
that
gives
(c)  α_{ni} = G_{ni}(ξ_{ni}),
as claimed. □
= [G_{ni}⁻¹(G_{ni}(α_{ni})) ≤ t]
= [α_{ni} ≤ t]  for all 0 ≤ t ≤ 1, a.s., by Proposition 1.1.3,
since G_{ni}⁻¹(G_{ni}(α_{ni})) = α_{ni} a.s. follows from α_{ni} = G_{ni}(ξ_{ni});
= [F̄_n(X_{ni}) ≤ t]
= [ξ_{ni} ≤ G_{ni}(t)]  for all 0 ≤ t ≤ 1, a.s., by the proof so far
c [ F.,. (en1) ^ F. o Gnn (t) ]
We also note that (8) implies that a.s. for all −∞ < x < ∞ we have
= [ξ_{ni} ≤ F_{ni} ∘ F̄_n⁻¹ ∘ F̄_n(x)]
= [ξ_{ni} ≤ F_{ni}(x)]  by Proposition 1.1.3
Theorem 1. We have
and complete the proof as in (3.1.83). See Shorack (1979a) and Koul
(1970). ❑
(48)  X̃(ω) ≡ X(ω) + Σ_j p_j 1_{[d_j < X(ω)]} + Σ_j p_j ξ_j 1_{[X(ω) = d_j]},

where ξ₁, ξ₂, … are additional iid Uniform (0, 1) rv's. Let F̃ denote the df of
X̃. Then with

(51)  F = F̃(D)
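The jump-spreading construction (48) can be illustrated concretely. The sketch below is our own (two atoms d = (0, 1) with masses p = (0.4, 0.6) are an arbitrary choice); the spread rv X̃ is continuous, as witnessed by its probability integral transform being Uniform(0, 1):

```python
import numpy as np

rng = np.random.default_rng(6)
reps = 100_000

# A purely discrete X with atoms d = (0, 1) and masses p = (0.4, 0.6)
d = np.array([0.0, 1.0])
p = np.array([0.4, 0.6])
j = (rng.uniform(size=reps) < p[1]).astype(int)   # index of the atom drawn
X = d[j]

# (48): spread each atom's mass over an interval using an extra uniform
xi = rng.uniform(size=reps)
offset = np.array([0.0, p[0]])                    # sum of masses at atoms below d_j
X_tilde = X + offset[j] + p[j] * xi

# the df of X_tilde, evaluated at X_tilde, should be Uniform(0, 1)
u = np.where(j == 0, X_tilde, p[0] + (X_tilde - (d[1] + p[0])))
assert abs(u.mean() - 0.5) < 0.005
assert abs(u.var() - 1/12) < 0.005
assert u.min() >= 0 and u.max() <= 1
```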
DEFINITION OF SOME BASIC PROCESSES 107
Figure 2.
and
where 𝕌_n is the uniform empirical process of the rv's ξ_i ≡ F̃(X̃_i). We note that
∫_{−∞}^{x} h(F) dF = ∫_{−∞}^{D(x)} h(F̃(y)) dF̃(y)
  = ∫_{(−∞,x], y∉{d_j}} h(F(y)) dF(y) + Σ_{d_j ≤ x} h(F(d_j)) p_j,

so that

(57)  ∫_{−∞}^{x} h(F) dF = ∫₀^{F(x)} h(t) dt + Σ_{d_j ≤ x} ∫_{F(d_j−)}^{F(d_j)} [h(F(d_j)) − h(s)] ds.
108 CONVERGENCE AND DISTRIBUTIONS OF EMPIRICAL PROCESSES
In particular,
We must temporarily set our statistical problems aside and concentrate on the
probabilistic behavior of the underlying processes. We maintain part of the
notation of Section 2. Thus
(1)  ℤ_n(t) ≡ (1/√(c'c)) Σ_{i=1}^n c_{ni}[1_{[ξ_{ni} ≤ G_{ni}(t)]} − G_{ni}(t)]  for 0 ≤ t ≤ 1,
where, now
and where the constants c_{n1}, …, c_{nn} satisfy the u.a.n. condition

max_{1≤i≤n} c²_{ni}/c'c → 0  as n → ∞.
Thus
and
for 0 ≤ s, t ≤ 1.
then
Moreover,
if and only if
then (8) holds and (11) holds with K(s, t) = s∧t − st. Thus
and
(17)  ℤ is weakly compact on (D, 𝒟, ‖·‖).
The original papers dealing with ℤ_n and ℝ_n seem to be Koul (1970) and
Koul and Staudte (1972). See Shorack (1979) for reference to other authors.
(18)  [ a(1−a)   −ab
        −ab     b(1−b) ]
Moreover,
and
Let
c„ ; /.ic'c we have
where we let

γ_i ≡ 1_{[G_{ni}(r) < ξ_{ni} ≤ G_{ni}(s)]} − G_{ni}(r, s]  and  δ_i ≡ 1_{[G_{ni}(s) < ξ_{ni} ≤ G_{ni}(t)]} − G_{ni}(s, t].
Now the γ_i's are independent, the δ_i's are independent, and γ_i is independent
of δ_j if i ≠ j. Thus
E{(Σ_{i=1}^n C_i γ_i)² (Σ_{j=1}^n C_j δ_j)²}
  = Σ_i C_i⁴ E[γ_i² δ_i²] + ΣΣ_{i≠j} C_i² C_j² Eγ_i² Eδ_j² + 2 ΣΣ_{i≠j} C_i² C_j² E[γ_i δ_i] E[γ_j δ_j]
  ≤ 3 [Σ_i C_i⁴ G_{ni}(r, s] G_{ni}(s, t] + ΣΣ_{i≠j} C_i² C_j² G_{ni}(r, s] G_{nj}(s, t]]
  ≤ 3 ν_n(r, s] ν_n(s, t],

while

E(Σ_j C_j δ_j)⁴ = Σ_i C_i⁴ Eδ_i⁴ + 3 ΣΣ_{i≠j} C_i² C_j² Eδ_i² Eδ_j²
  = 3 (Σ_i C_i² Eδ_i²)² + Σ_i C_i⁴ Eδ_i⁴ − 3 Σ_i C_i⁴ (Eδ_i²)²,

proving (23). □
Exercise 2. Prove (24); note that we only ask for a crude bound.
(c)  ε⁴ P(w_{ℤ_n}(1/m) ≥ ε)
       ≤ (3 + 3K) ν_n(0, 1] max_{1≤k≤m} ν_n((k−1)/m, k/m] + ν_n(0, 1] max_{1≤i≤n} c²_{ni}/c'c,
so that applying (3) and (8) to (c) shows that, giving Corollary 2,

(f)  ℤ_n →_{f.d.} ℤ  when K_n → K.

Suppose now that ℤ_n →_d some ℤ̃ on (D, 𝒟, ‖·‖). Then ℤ_n →_{f.d.} ℤ̃ by using
projection mappings for the f of Theorem 2.3.5. From →_{f.d.} and the uniform
bound of (23) on fourth moments, we conclude
v" cZ G (k-1 kl
l J 'mJ
-
m m c' c ^ ` m
(a)
r max G„,k-1 k 1
1 „
c„ <_ 2 ma x ^I Gni — j Il + l / ms
L m 'm
t_in C'C
;
so that
/ k-1
lim max v„ I , — ^ lim 2 max jI G -I II + 1/ m
n-+co ]k^m \ m m n- I^in
d'd = c'c, max {√n |d_{ni} − c_{ni}|/√(c'c): 1 ≤ i ≤ n} → 0 as n → ∞, and the d_i take on n
values, no subset of which sums to zero. Note that
thus satisfies
Since 𝕎_n trivially satisfies (8) and (11) with ν_n(t) = t and K(s, t) = s∧t − st,
we have from Theorem 1 that 𝕎_n ⇒ 𝕎 on (D, 𝒟, ‖·‖); thus by Skorokhod's
Theorem 2.3.4 there exist processes 𝕎*_n ≅ 𝕎_n and 𝕎* ≅ 𝕎 for which
Because of right continuity, the sample paths of ℍ_n and ℍ*_n are both completely
determined by their values on the dyadic rationals; and for each m ≥ 1, the
joint distributions of {ℍ_n(j/2^m): 0 ≤ j ≤ 2^m} and {ℍ*_n(j/2^m): 0 ≤ j ≤ 2^m} are the
same. Let ℋ_n denote the set of all possible sample paths of ℍ_n. Then

P(ℍ*_n ∈ ℋ_n)
  = lim_{m→∞} P(ℍ*_n(j/2^m) with 0 ≤ j ≤ 2^m takes on n+1 distinct values)

Let A ≡ ∩_{n≥1} [ℍ*_n ∈ ℋ_n], so that P(A) = 1. For ω ∈ A define 0 < ξ*_{n:1} < ⋯ <
ξ*_{n:n} < 1 to be the ordered points of discontinuity of ℍ*_n. Moreover, for all
0 = t₀ < t₁ < ⋯ < t_n < t_{n+1} = 1 we have
and thus
(h)  ξ*_n ≡ (ξ*_{n:1}, …, ξ*_{n:n}).
Now let ξ*_{n1}, …, ξ*_{nn} denote a random permutation of ξ*_{n:1}, …, ξ*_{n:n}, where
each of the n! permutations is equally likely. Note that 𝕎*_n is the weighted
empirical process of ξ*_{n1}, …, ξ*_{nn}. Observe that [as in (b)]
† Step (f) is probably clarified by the following special cases. If all c_{ni} = 1, we would rewrite (f)
as P(ξ_{n:i} ≤ t_i for 1 ≤ i ≤ n) = P(n𝔾_n(t_i) ≥ i for 1 ≤ i ≤ n). Also, if n = 3 then [ξ_{n:i} ≤ t_i for 1 ≤ i ≤ 3]
is the union of disjoint events of the type [ξ₁ ≤ t₁, ξ₂ ≤ t₁, t₁ < ξ₃ ≤ t₂]; each event of this latter
type is easily seen to satisfy (f) for appropriate B_n (using our conditions on the d_{ni}'s).
WEAK CONVERGENCE OF THE EMPIRICAL PROCESS 115
(Note that we had to modify the original c_{ni}'s slightly in order to guarantee
that ℍ_n had exactly n jump points. If one c_{ni} were equal to 0, say, then ℍ_n
would have had at most n−1 jump points.) □
Exercise 3. Assuming ρ_n → ρ, extend the previous proof to show that ‖𝕎_n −
𝕎‖ → 0 and ‖𝕌_n − 𝕌‖ → 0 for all ω, as stated in Theorem 3.1.1. [Note that the
individual statement (3.1.60) is an immediate corollary of (3.1.62).]
Proof of (3.1.61) and (3.1.63) in Theorem 3.1.1. Now (3.1.61) holds since
(3.1.12) gives
by (3.1.62), ‖𝔾_n⁻¹ − I‖ → 0, and the continuity of the sample paths of 𝕎. □
Exercise 4. Show that for all disjoint intervals (say, [0, r], (r, s], and (s, t])
we have

and

E{ℤ_n(0, r]² ℤ_n(r, s] ℤ_n(s, t]} ≤ 3 ν_n(0, r] {ν_n(r, s] ∧ ν_n(s, t]}.
(a) V. = I
that
Remark 1. We have proved above that ‖ℝ_n − 𝕎‖ →_p 0 for the special construc-
tion of Theorem 1.1. Thus the modulus of continuity ω_{ℝ_n} of ℝ_n satisfies

(26)  ω_{ℝ_n}(1/m) ≤ ω_𝕎(1/m) + 2‖ℝ_n − 𝕎‖ = ω_𝕎(1/m) + o_p(1)  as n → ∞,
Suppose that X_{n1}, …, X_{nn} are independent with df's F_{n1}, …, F_{nn}. Let
X̃_{n1}, …, X̃_{nn} with continuous df's H_{n1}, …, H_{nn} be the associated array of
continuous rv's of Section 3.2. Thus, as in (3.2.52),
where, as in (3.2.35),
gives
(36)  [X_{ni} ≤ x] = [β_{ni} ≤ F̄_n(x)] = [ξ_{ni} ≤ G_{ni}(F̄_n(x))]  for −∞ < x < ∞, a.s.

Note that the ξ_{ni}'s of the present β-construction, call them ξ^β_{ni}, and the
ξ_{ni}'s of the earlier α-construction of (3.2.42), call them ξ^α_{ni}, are a.s. identical in
that
We now let ℤ_n denote the weighted empirical process of the β_{ni}'s; thus

(38)  ℤ_n(t) ≡ (1/√(c'c)) Σ_{i=1}^n c_{ni}{1_{[β_{ni} ≤ t]} − G_{ni}(t)}
           = (1/√(c'c)) Σ_{i=1}^n c_{ni}{1_{[ξ_{ni} ≤ G_{ni}(t)]} − G_{ni}(t)},

(39)  𝕎_n = (1/√(c'c)) Σ_{i=1}^n c_{ni}{1_{[X_{ni} ≤ ·]} − F_{ni}}
(40)     = (1/√(c'c)) Σ_{i=1}^n c_{ni}{1_{[β_{ni} ≤ F̄_n(·)]} − F_{ni}}  a.s.
(41)     = (1/√(c'c)) Σ_{i=1}^n c_{ni}{1_{[β_{ni} ≤ F̄_n]} − G_{ni}(F̄_n)}  a.s.
(42)     = ℤ_n(F̄_n).
Now

(46)  𝔾_n(t) ≡ (1/n) Σ_{i=1}^n 1_{[β_{ni} ≤ t]}  is now the empirical df of the β_{ni}'s.
Moreover,
if and only if
Now consider the proof of Theorem 1. It applies verbatim; but since we now
have ν_n(s, t] = t − s we do not require condition (8). This completes the present
proof. We single out for display the fact that this proof has established that
also holds. □
(1)  α_{ni} ≡ F̄_n(X_{ni})  and  G_{ni} ≡ F_{ni} ∘ F̄_n⁻¹,  where F̄_n ≡ (1/n) Σ_{i=1}^n F_{ni},

(1')  β_{ni} ≡ F̃_n(X_{ni})  and  G^β_{ni} ≡ F_{ni} ∘ F̃_n⁻¹,  where F̃_n ≡ (1/c'c) Σ_{i=1}^n c²_{ni} F_{ni},

for the array {X_{ni}; n ≥ 1} having continuous df's {F_{n1}, …, F_{nn}; n ≥ 1}. As in Sec-
tions 3.2 and 3.3,

(2)  G_{ni}(α_{ni}) = ξ_{ni}  a.s.  and  (1/n) Σ_{i=1}^n G_{ni} = I,

(2')  β_{ni} = G^β_{ni}⁻¹(ξ_{ni})  a.s.  and  (1/c'c) Σ_{i=1}^n c²_{ni} G^β_{ni} = I,

and

(3')  [β_{ni} ≤ t] = [ξ_{ni} ≤ G^β_{ni}(t)]  for 0 ≤ t ≤ 1 holds a.s.
(4)  𝕏_n(t) ≡ (1/√n) Σ_{i=1}^n [1_{[ξ_{ni} ≤ G_{ni}(t)]} − G_{ni}(t)]  for 0 ≤ t ≤ 1,

(5)  ℤ_n(t) ≡ (1/√(c'c)) Σ_{i=1}^n c_{ni}[1_{[ξ_{ni} ≤ G_{ni}(t)]} − G_{ni}(t)]  for 0 ≤ t ≤ 1,

(5')  ℤ^β_n(t) ≡ (1/√(c'c)) Σ_{i=1}^n c_{ni}[1_{[ξ_{ni} ≤ G^β_{ni}(t)]} − G^β_{ni}(t)]  for 0 ≤ t ≤ 1,
Since this maximum approaches zero, the weighted average (on j) approaches
zero; thus (6) and the definitions of G_{ni} and G^β_{ni} in (1) and (1') imply

(7)  max_{1≤i≤n} ‖G_{ni} − I‖ → 0  and  max_{1≤i≤n} ‖G^β_{ni} − I‖ → 0  as n → ∞.
From Theorem 3.3.1 it is natural to think that if {c_{n1}, …, c_{nn}; n ≥ 1} satisfy
the u.a.n. (3.1.58), then (recall Corollary 3.3.1 and Corollary 3.3.2)
where
Theorem 1. Suppose the continuous df's {F_{n1}, …, F_{nn}; n ≥ 1} are nearly null
as in (6) and the constants {c_{n1}, …, c_{nn}; n ≥ 1} satisfy the u.a.n. condition

(9)  max_{1≤i≤n} c²_{ni}/c'c → 0  and  ρ_n → ρ  as n → ∞.
Consider the special construction {ξ_{n1}, …, ξ_{nn}; n ≥ 1}, 𝕌, 𝕎 of Theorem 3.1.1
that satisfies
provided these same ξ_{ni} are used in defining 𝕏_n and ℤ_n in (4) and (5). Moreover,
and
Var[𝔸_n(t)] = (1/c'c) Σ_{i=1}^n c²_{ni}{|G_{ni}(t) − t|[1 − |G_{ni}(t) − t|]}
  ≤ max_{1≤i≤n} ‖G_{ni} − I‖
(b)  → 0  as n → ∞.
P(‖𝔸_n‖ ≥ 3ε) → 0,
by applying Chebyshev's inequality and (b) to the first term in (c) and by
applying (3.3.14) to the second and third terms in (c). Since ε > 0 was arbitrary, we have
shown
Stochastic Integrals
h̄ ≡ ∫₀¹ h dt,  σ²_h ≡ |[h − h̄]|² = ∫₀¹ h² dt − h̄²,

(17)  σ_{h,h̃} ≡ (h − h̄, h̃ − h̃̄) = ∫₀¹ h h̃ dt − h̄ h̃̄

from Section 1. Now (with 𝕏_n, ℤ_n, and ℤ^β_n as in Theorem 1)
(18)  ∫₀¹ h d𝕏_n = (1/√n) Σ_{i=1}^n [h(α_{ni}) − Eh(α_{ni})]

and

(19)  ∫₀¹ h dℤ_n = (1/√(c'c)) Σ_{i=1}^n c_{ni}[h(α_{ni}) − Eh(α_{ni})],
would seem to converge jointly to rv's, all of which are N(0, σ²_h). It is now
SPECIAL CONSTRUCTION FOR A FIXED NEARLY NULL ARRAY 123
our purpose to define N(0, σ²_h) rv's, to be denoted by ∫₀¹ h d𝕌 and ∫₀¹ h d𝕎,
on the probability space (Ω, 𝒜, P) of the special construction of Theorem 1 in
such a fashion that both
asn ->co
or
(20b)  n c²_{ni}/c'c ≤ (some K) < ∞  for all 1 ≤ i ≤ n, n ≥ 1
or
or
or
or
Theorem 2. Suppose the continuous df's {F_{n1}, …, F_{nn}; n ≥ 1} are nearly null
as in (6) and the constants {c_{n1}, …, c_{nn}; n ≥ 1} are as in (9). Consider the
special construction {ξ_{n1}, …, ξ_{nn}; n ≥ 1}, 𝕌 and 𝕎 of Theorem 3.1.1. Let
h, h̃ ∈ ℒ₂. Then there exist rv's

(21)  ∫₀¹ h d𝕌 ≅ N(0, σ²_h)  and  ∫₀¹ h d𝕎 ≅ N(0, σ²_h)
Jo
for which†

(22)  ∫₀¹ h d𝕏_n →_p ∫₀¹ h d𝕌  and  ∫₀¹ h dℤ_n →_p ∫₀¹ h d𝕎,

(22')  ∫₀¹ h dℤ^β_n →_p ∫₀¹ h d𝕎.
Remark 1. Our proof will define ∫₀¹ h d𝕌 and ∫₀¹ h d𝕎 so that (21) holds and
the covariance formulas hold true for any appropriately correlated Brownian
bridges 𝕌 and 𝕎. We insist on using the 𝕌 and 𝕎 of Theorem 1 in Theorem
2 so that (22) also holds true in the sense of →_p, instead of just →_d. Thus we
will get representations of our limiting rv's.
Then clearly

∫₀¹ h dℤ_n = (1/√(c'c)) Σ_{i=1}^n c_{ni}[1_{[0,r]}(α_{ni}) − G_{ni}(r)] = −∫₀¹ ℤ_n dh
to give
(b)  = (1/√(c'c)) Σ_{i=1}^n c_{ni}[h(ξ_{ni}) − Eh(ξ_{ni})]  if all G_{ni} = I
(24)  ∫₀¹ h d𝕎 ≡ −∫₀¹ 𝕎 dh  for step functions h as in (23),

for total variation measure d|h|. Thus (22) holds for the step functions (23),
and (21) follows by Exercise 3.1.2.
Since the step functions (23) are ‖·‖-dense in ℒ₂, given any h ∈ ℒ₂ we can
choose a step function h_ε having
(e)  Y_n − Y_m = ∫₀¹ (h − h_ε) dℤ_n + [∫₀¹ h_ε dℤ_n − Y_ε] − ∫₀¹ (h − h_ε) dℤ_m − [∫₀¹ h_ε dℤ_m − Y_ε],

so that for m ∧ n ≥ some N_ε Chebyshev and (c) at step (f) give

P(|∫₀¹ h_ε dℤ_n − Y_ε| ≥ ε) + (two analogous terms)
  + Var[∫₀¹ (h − h_ε) dℤ_m]/ε².
Var[∫₀¹ (h − h_ε) dℤ_n] = (1/c'c) Σ_{i=1}^n c²_{ni}{∫₀¹ (h − h_ε)² dG_{ni} − [∫₀¹ (h − h_ε) dG_{ni}]²}
  ≤ ∫₀¹ (h − h_ε)² dν_n,
(g)  ν_n(s, t] ≡ (1/c'c) Σ_{i=1}^n c²_{ni} G_{ni}(s, t]

(h)  ≤ (t − s)  if (20a) holds,
     ≤ K(t − s)  if (20b) holds,
     ≤ K(t − s)  if (20c) holds,
     ≤ K(t − s)  [i.e., (20a)–(20c) are all three special cases of (20d)].
Thus (g) and (h) give, independent of all F_{ni}'s satisfying (h) with a fixed K, that

(i)  Var[∫₀¹ (h − h_ε) dℤ_n] ≤ K ∫₀¹ (h − h_ε)² dt < ε³.
Since statement (j) says that the Y_n are a Cauchy sequence in probability,
there exists a rv Y for which
since the Lindeberg–Feller theorem of Exercise 3.1.2 establishes for the sum
representation of Y_n in (b) that Y_n →_d N(0, σ²_h). We then define ∫₀¹ h d𝕎 to
be Y, and rewrite (k) as
Joint normality of any finite number of 𝕎(t_i)'s and Y_j ≡ ∫₀¹ h_j d𝕎's likewise
follows by applying the Lindeberg–Feller theorem to an arbitrary linear combi-
nation of the same 𝕎_n(t_i)'s and Y_{jn} ≡ ∫₀¹ h_j dℤ_n's. Since coupled with this →_d
we already had →_p to a limiting vector, it is clear that the covariances among
the 𝕎(t_i)'s and Y_j's equal those among the 𝕎_n(t_i)'s and Y_{jn}'s. These are
(25)  Cov[𝕎_n(t), ∫₀¹ h dℤ_n] = (1/c'c) Σ_{i=1}^n c²_{ni}{∫₀¹ 1_{[0,t]}(s)h(s) ds − t h̄}
         = {∫₀ᵗ h(s) ds − t h̄} = σ_{1_{[0,t]}, h}

and

(26)  Cov[∫₀¹ h dℤ_n, ∫₀¹ h̃ dℤ_n] = {∫₀¹ h(s)h̃(s) ds − h̄ h̃̄} = σ_{h,h̃}.
If we set c_{ni} = 1 for all i, then ℤ_n reduces to 𝕏_n and the very same proof
works for 𝕏_n. (The ordinary CLT could replace the Lindeberg–Feller version.)
The covariances for 𝕏_n and ℤ_n are analogous to (25) and (26). If all c_{ni} = 1,
then (20b) is automatically satisfied; thus any version of (20) may be omitted
from the formal hypothesis if only 𝕏_n is considered.
We note that the essential part of inequality (i) is that

(27)  for all ε > 0 we can choose a function h_ε for which (a) and (c) hold.

The theorem also holds under this condition. We note that both (a) and (c)
hold for functions h satisfying (20e), so that (27) is not needed in this case.
To prove the ℤ^β result we merely note that for the process ℤ^β_n of (5') the
corresponding ν_n function is ν_n(s, t] = t − s; thus (h) necessarily holds. □
(28)  ∫₀¹ h d𝕊 ≡ h(1)𝕊(1) − ∫₀¹ 𝕊 dh  for the step functions h of type (23)
(29)  ∫₀¹ h d𝕎 = ∫₀¹ h d𝕊 − Z ∫₀¹ h(r) dr  for all h ∈ ℒ₂,
Now define

(31)  ∫₀ᵗ h d𝕊 ≡ ∫₀¹ h 1_{[0,t]} d𝕊  and  ∫₀ᵗ h d𝕎 ≡ ∫₀¹ h 1_{[0,t]} d𝕎.
(34)  Cov[𝕊(s), ∫₀ᵗ h d𝕊] = ∫₀^{s∧t} h(r) dr
(35)  ∫₀ᵗ h d𝕎 = ∫₀ᵗ h d𝕊 − 𝕊(1) ∫₀ᵗ h(r) dr  for 0 ≤ t ≤ 1 and all h ∈ ℒ₂
and

(37)  Cov[∫₀ˢ h d𝕎, ∫₀ᵗ h̃ d𝕎] = ∫₀^{s∧t} h h̃ dr − ∫₀ˢ h dr ∫₀ᵗ h̃ dr

for all 0 ≤ s, t ≤ 1 and each h ∈ ℒ₂. Similarly, (35), (32), and (34) give

(38)  Cov[∫₀ˢ h d𝕊, ∫₀ᵗ h̃ d𝕎] = ∫₀^{s∧t} h h̃ dr − ∫₀ˢ h dr ∫₀ᵗ h̃ dr

(i.e., Cov[𝕊(1), ∫₀ᵗ h d𝕎] = Cov[Z, ∫₀ᵗ h d𝕎] = 0 for all t).
Since ∫₀ᵗ h d𝕊 is a martingale on [0, 1] for h ∈ ℒ₂, the Birnbaum and Marshall
inequality (Inequality A.10.4) and (32) imply that, for q as in (39),

(40)  P(‖[∫₀^· h d𝕊]/q‖ > ε) ≤ ε⁻² ∫₀¹ [q(t)]⁻² h²(t) dt,

while

(41)  P(‖[𝕊(1) ∫₀^· h(r) dr]/q‖ > ε) ≤ P(|𝕊(1)| ‖(∫₀^· h(r) dr)/q‖ > ε)
        ≤ ε⁻² E(𝕊²(1)) ‖(∫₀^· h(r) dr)/q‖².
Combining (40) and (41) into (35) gives

(42)  P(‖(∫₀^· h d𝕎)/q‖ ≥ 2ε) ≤ 2ε⁻² ∫₀¹ [h(t)/q(t)]² dt  for q as in (39),
P(‖(∫₀^· h_n d𝕎 − ∫₀^· h d𝕎)/q‖ ≥ 2ε) = P(‖(∫₀^· (h_n − h) d𝕎)/q‖ ≥ 2ε)
(47)  𝕌(t) ≡ 𝕄(t) − ∫₀ᵗ ((t − r)/(1 − r)) d𝕄(r)  is a Brownian bridge for 0 ≤ t ≤ 1.
(49)  ∫₀¹ h d𝕌 = ∫₀¹ h d𝕄 − ∫₀¹ h(r) [∫₀ʳ (1/(1 − s)) d𝕄(s)] dr
(50)     = ∫₀¹ [h(s) − (1/(1 − s)) ∫ₛ¹ h(r) dr] d𝕄(s).
of Section 2.2; thus K is a normal process with continuous sample paths and
covariance function
Proposition 1.  𝕂_n →_{f.d.} 𝕂  as n → ∞
for all neighboring "blocks" B₁ and B₂. There are two types of neighboring
blocks.
while

E𝕂²_n(B) = ((nb − na)/n) E{(1/√(nb − na)) Σ_{i=na+1}^{nb} [1_{(r,s]}(ξ_i) − (s − r)]}²
  = (b − a) E𝕌²_m(r, s]  where m ≡ nb − na
  ≤ (b − a)(s − r)
(d)  = μ(B),

and

(f)  = μ(B₁)μ(B₂).
Now apply the Bickel and Wichura (1971) theorem, using Proposition 1.
𝕂_n(s, t) ≡ (1/√(c'c)) Σ_{i=1}^{⟨ns⟩} c_{ni}[1_{[ξ_{ni} ≤ G_{ni}(t)]} − G_{ni}(t)]  for 0 ≤ s, t ≤ 1,

and (3.3.2), (3.3.3), (3.3.8), and (3.3.11) hold with the K(s, t) of (3.3.11) being
s∧t − st.
and
Proof. Now for 0 ≤ s < t we have for each 1 ≤ i ≤ n that

(5)  E(1_{[ξ_i ≤ t]} | 𝒟_s) = 1_{[ξ_i ≤ s]} + ((t − s)/(1 − s)) 1_{[s < ξ_i]}.

Thus [check the values of (5) and (a) in the two cases ξ_i ≤ s and ξ_i > s
separately]

E[𝕌_n(t)/(1 − t) | 𝒟_s] = 𝕌_n(s)/(1 − s).
E( V "(P"') I
Q) = ✓ n(E(f":iI ^i) — P";)l( 1 — Pni)
Pni
—j i n+1
=./n " :;+ (1— " :;) — n+1 n —iA -1
n — j+l
(c) (n+1)f":;—J / "(P"i)
=
n[ n—j+1 J 1—p ",
(b) = [g(t)]
J0 B -2 (l f
— t) -2 dt = oe [q(t)] -2 dt/A 2 .
See Pyke and Shorack (1968) for the first version of this inequality. Note also
Hájek (1970); he decreased the constant on the right-hand side to one.
Now 𝕌(t)/(1 − t) is also a martingale, with second-moment function ν(t) =
t/(1 − t). So is 𝕎(t)/(1 − t). Thus the same proof works for them. □
Inequality 2. Let q₁ ≤ ⋯ ≤ q_n and let 0 < θ ≤ 1 be fixed. Then for all n ≥ 1
and all λ > 0 we have
1 ( nO > 1
(10) Ps" max IV^(P^1)) Al j--
2•
/
1 i ne 4. A n 9.
and
IRn(i/(n+1))I A - 1 ( no> 1
(11) P max
Imsi!sno q; — AZ(n-1) I q,
MARTINGALES ASSOCIATED WITH 𝕌_n, 𝕍_n, 𝕎_n, AND ℝ_n 135
Proof. Let p_i ≡ i/(n+1). We will use Proposition 1 and the Birnbaum and
Marshall inequality (Inequality A.10.3) for our inequality below, and then use
Proposition 3.1.1 to evaluate variances. Thus
(i) Suppose an urn contains n identical balls labeled c_{n1}, …, c_{nn} and m balls
are randomly sampled without replacement. Let Y_i denote the value of the ith
ball sampled. Then

(13)  Var[Σ_{i=1}^m Y_i/m] = (σ²_c/m)(1 − (m−1)/(n−1))  for 1 ≤ m ≤ n,

where

(14)  σ²_c ≡ (1/n) Σ_{i=1}^n (c_{ni} − c̄_n)².
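Formula (13) is the classical finite-population correction; a vectorized Monte Carlo check of our own (labels 1, …, 10 and m = 4 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, reps = 10, 4, 100_000
c = np.arange(1.0, n + 1)                 # the ball labels c_{n1..nn}
sigma2_c = np.mean((c - c.mean())**2)     # (14): population variance

# reps simple random samples of size m without replacement
perm = rng.random((reps, n)).argsort(axis=1)[:, :m]
means = c[perm].mean(axis=1)

# (13): Var of the sample mean under sampling without replacement
var_exact = (sigma2_c / m) * (1 - (m - 1) / (n - 1))
assert abs(means.var() - var_exact) < 0.02
```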
(15) Cov [
n 1 i = I n
dnil'i, 1 i dnjY; J
1 _n ;-^
c,n
mm'
z I z 1 1
d n i — — ni d — dnj )].
n n i -, n;_,
Remark 1. Theorem 3.1.1 showed that if c̄_n = 0 and max {c²_{ni}/c'c: 1 ≤ i ≤ n} → 0
as n → ∞, then ‖ℝ_n − 𝕎‖ →_{a.s.} 0 for a special construction; this implies
ℝ_n(t) →_d N(0, t(1−t)) for each 0 ≤ t ≤ 1. In the finite sampling notation of
the last problem, we note that

(16)  ℝ_n(t) = (1/√(c'c)) Σ_{i=1}^m Y_i  with m = ⟨(n+1)t⟩,
so that the first claim holds. Now let 𝒪_j ≡ σ[ξ_{n:k}: k ≥ j]. Then for i < j, ξ_{n:i} is
distributed as the ith smallest of j − 1 rv's uniform on [0, ξ_{n:j}]; so

E(ξ_{n:i} | 𝒪_j) = [i/((j−1)+1)] ξ_{n:j} = (i/j) ξ_{n:j},
Also,
Remark 2. For each a > 0 the process {𝔾_n(t)/t: a ≤ t ≤ 1} is a reverse mar-
tingale. Thus Inequality A.11.1 gives P(‖𝔾_n/I‖ₐ¹ ≥ λ) ≤ λ⁻¹ E(𝔾_n(a)/a) = 1/λ.
Passing to the limit on "a" gives
In later chapters we will show that equality actually holds. In the later chapter
on linear bounds we will also use martingale methods to establish the com-
panion inequality.
This inequality requires more technical detail than we care to display at this
time; however, we will use it before we prove it. Thus for any ε > 0 we can
use (25) and (26) to choose a large λ_ε > 0 such that

(27)  P(𝔾_n(t) ≤ λ_ε t for 0 ≤ t ≤ 1 and 𝔾_n(t) ≥ t/λ_ε for ξ_{n:1} ≤ t ≤ 1) ≥ 1 − ε.
The martingale results that follow will not be used until much later in this
monograph.
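The bound of Remark 2, P(sup_t 𝔾_n(t)/t ≥ λ) ≤ 1/λ, is easy to probe by simulation; our own sketch (arbitrary n, λ, and seed; the sup over t is attained at the order statistics):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, lam = 50, 50_000, 2.0

x = np.sort(rng.uniform(size=(reps, n)), axis=1)
i = np.arange(1, n + 1)
sup_ratio = np.max((i / n) / x, axis=1)      # sup_t G_n(t)/t over each sample

p_hat = np.mean(sup_ratio >= lam)
assert p_hat <= 1 / lam + 0.01               # the bound P(sup >= lam) <= 1/lam
```

Consistent with the text's remark that equality actually holds, the empirical probability here tends to sit essentially at 1/λ.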
Recall that for any function f we let f⁺ and f⁻ denote the positive and
negative parts respectively, so that f = f⁺ − f⁻. Thus (since the upper extreme
is achieved)

‖𝕌_n⁺‖ = max_{0≤t≤1} 𝕌_n(t)  and  ‖𝕌_n⁻‖ = −min_{0≤t≤1} 𝕌_n(t).
Proof. (See Csáki, 1968.) Let S_n ≡ n(𝔾_n − I). Let T_n denote a value of t at
which ‖S_n⁺‖ achieves its maximum, and note that
[It is the inequality in (a) that breaks down when we try this with S_n ≡ √(c'c) 𝕎_n.]
We thus have that

E(‖S⁺_{n+1}‖ − ‖S⁺_n‖ | ξ₁, …, ξ_n) ≥ 0.

Since ‖S_n⁺‖ ≤ n, we have E‖S_n⁺‖ < ∞. Thus ‖S_n⁺‖ is a submartingale with respect
to the σ-field σ[ξ₁, …, ξ_n] generated by ξ₁, …, ξ_n.
The proof for ‖S_n⁻‖ is completely analogous.
If α_n and β_n are both submartingales with respect to an increasing sequence
𝒜_n of σ-fields, then so is α_n ∨ β_n since

Thus ‖S_n‖ = ‖S_n⁺‖ ∨ ‖S_n⁻‖ is a submartingale. □
is such that E‖n(𝔾_n − I)⁺/f‖ < ∞. The same holds for Proposition 4 below.

(31)  ‖(𝔾_n − I)⁺‖, ‖(𝔾_n − I)⁻‖, and ‖𝔾_n − I‖ are reverse submartingales

= ‖(𝔾_{n+1} − I)⁺‖
and
(a)  P(‖𝕌_n/q‖ ≥ λ) ≤ (2/λ) E‖𝕍_n/q‖ ≤ c{1 + ∫₀¹ q⁻²(t) dt}/λ
(b)     ≡ M/λ
P(max_{1≤k≤n} ‖𝕌_k/q‖ ≥ λ)
(c)  ≤ λ⁻¹ E‖𝕌_n/q‖
(d)  ≤ M/λ,

using the proof of (b) from (a) in going from (c) to (d). See also Sen (1973b).
(35)  Σ_{i=1}^n c_{ni} [1_{[a < X_{ni} ≤ x]} − F_{ni}(a, x]]/[1 − F_{ni}(a, x]]  for a ≤ x < ∞

is a martingale adapted to the σ-fields

(37)  Σ_{i=1}^n c_{ni} [1_{[x < X_{ni} ≤ a]} − F_{ni}(x, a]]/[1 − F_{ni}(x, a]]  for −∞ < x < a

[Note that we must actually restrict (35), and (37), to the interval of x's for
which the denominator is nonzero.]
Proof. We pattern this proof after Pyke and Shorack (1968). We treat only
𝕎_n (of which 𝕌_n is a special case). Now

P(‖𝕎_n/q‖₀^δ ≥ ε) ≤ ∫₀^δ [q(t)]⁻² dt/ε²  by Inequality 3.6.1
(b)  ≤ ε³/ε² = ε.

Likewise,
In this section we wish to consider some of the standard statistics that have
rich histories.
and
Table 2. Limiting Distribution of the Kuiper Statistic
(from Owen (1962))
LIMITING DISTRIBUTIONS UNDER THE NULL HYPOTHESIS 145
Rényi was motivated by the desire to measure relative error rather than absolute
error. Now
for any 0 < a < b ≤ 1. See Exercise 2.2.19 for the limiting distribution in case
b = 1; thus
for all x > 0. The function R(·√(a/(1−a))) is given in Table 3 for various a.
(10)  P(‖𝕊#‖ ≤ x)
  = P((1+t)𝕌#(t/(1+t)) ≤ x for (1−b)/b ≤ t ≤ (1−a)/a)
  = P(𝕌#(s) ≤ x(1−s) for 1−b ≤ s ≤ 1−a)
  = P(𝕌#(r) ≤ xr for a ≤ r ≤ b)

using symmetry about s = ½ in the last step. This proof is from Csörgő (1967).
Exercise 2. Establish some results along the lines of (7), but use as a method
of proof Rényi's (1953) representation of Uniform (0, 1) order statistics in
terms of independent Exponential (1) rv's. See Proposition 8.2.1.
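Rényi's representation is easy to state in code: with S_k = E_1 + ⋯ + E_k for iid Exponential(1) rv's E_i, the vector (S_1/S_{n+1}, …, S_n/S_{n+1}) has the law of the Uniform(0, 1) order statistics. A quick check of our own (ordering and the means i/(n+1)):

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps = 10, 100_000

E = rng.exponential(size=(reps, n + 1))
S = np.cumsum(E, axis=1)
order_stats = S[:, :n] / S[:, [n]]           # (S_1/S_{n+1}, ..., S_n/S_{n+1})

i = np.arange(1, n + 1)
assert np.all(np.diff(order_stats, axis=1) > 0)              # strictly ordered
assert np.allclose(order_stats.mean(axis=0), i / (n + 1), atol=0.005)
```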
Table 3.
x\a 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.2 0.3 0.4 0.5
.1 0. 000 0.000
0.5 0.0000 0.0000 0.0008 0.0092
1.0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0092 0.0716 0.2001 0.3708
1.5 0.0000 0.0000 0.0000 0.0002 0.0009 0.0023 0.0050 0.0092 0.1420 0.3543 0.5591 0.7328
2.0 0.0000 0.0001 0.0008 0.0038 0.0101 0.0212 0.0367 0.0563 0.0791 0.3708 0.6193 0.7951 0.90821
2.5 0.0001 0.0022 0.0112 0.0299 0.0578 0.0925 0.1320 0.1730 0.2155 0.5778 0.7966 0.9714 0.9751
9.0 0.0000 0.0015 0.0157 0.0474 0.0941 0.1487 0.2081 0.2832 0.3184 0.3780 0.7328 0.9009 0.9915 0.9954'
3.5 0.0001 0.0092 0.0491 0.1135 0.1879 0.2629 0.3341 0.3994 0.4598 0.5140 0.8398 0.9561 0.9978 0.9991
4.0 0.0006 0.0291 0.1052 0.2001 0.2942 0.3804 0.4570 0.5244 0.5635 0.6353 0.9082 0.9823 0.9995 0.9999
4.5 0.0031 0.0643 0.1778 0.2950 0.4001 0.4902 0.5685 0.8311 0.6880 0.7328 0.9511 0.9936 0.9999 1.0000
5.0 0.0096 0.1135 0.2562 0.3895 0.4985 0.5873 0.6594 0.7193 0,7683 0.8088 0.9751 0.9979 1.0000
5.5 0.0225 0.1726 0.3511 0.4784 0.5863 0.8723 0.7374 0.7903 0.8326 0.8665 0.9887 0.9994
6.0 0.0428 0.2375 0.4204 0.5591 0.6627 0.7409 0.8006 0.6463 0.8817 0.9081 0.9954 0.9999
8.5 0.0707 0.3045 0.4952 0.6310 0.7282 0.7989 0.8509 0.8895 0.9181 0.9395 0.9977 1.0000
7.0 0.1053 0.3708 0.5639 0.6939 0.7834 0.8461 0.8904 0.9220 0.9448 0.9607 0.9991
7.5 0.1452 0.4347 0.6193 0.7484 0.8294 0.8839 0.9207 0.9460 0.9633 0.9752 0.9996
8.0 0.1889 0.4959 0.8811 0.7951 0.8871 0.9135 0.9438 0.9634 0.9783 0.9847 0.9999
8.5 0.2348 0.5513 0.7301 0.8345 0.8977 0.9385 0.9806 0.9756 0.9850 0.9908 1.0000
9.0 0.2819 0.8032 0.7731 0.8896 0.9221 0.9540 0.9729 0.9849 0.9907 0.9948
9.5 0.3290 0.6510 0.8104 0.8950 0.9410 0.9713 0.9617 0.9898 0.9944 0.9989
10.0 0.3754 0.8938 0.8427 0.9175 0.9564 0.9770 0.9876 0.9936 0.9967 0.9983
10.5 0.4205 0.7328 0.8704 0.9358 0.9680 0.9840 0.9921 0.9961 0.9981 0.9991
11.0 0.4640 0.7678 0.8939 0.9505 0.9788 0.9891 0.9949 0.9976 0.9989 0.9995
11.5 0.5055 0.7992 0.9137 0.9622 0.9833 0.9927 0,9968 0.9988 0.9994 0.9997
12.0 0.5450 0.8271 0.9303 0.9713 0.9882 0.9951 0.9980 0.9992 0.9997 0.9999
12.5 0.5824 0.8517 0.9441 0.9784 0.9917 0.9968 0.9988 0.9995 0.9998 0.9999
13.0 0.8174 0.8734 0.9555 0.9841 0.9943 0.9980 0.9993 0.9997 0.9999 1.0000
13.5 0.6509 0.8924 0.9648 0.9883 0.9961 0.9987 0.9996 0.9999 1.0000
14.0 0.6812 0.9090 0.9724 0.9915 0.9973 0.9992 0.9998 0.9999
14.5 0.7099 0.9234 0.9780 0.9938 0.9982 0.9995 0.9999 1.0000
15.0 0.7367 0.9358 0.9833 0.9958 0.9988 0.9997 0.9999
15.5 0.7615 0.9464 0.9872 0.9969 0.9992 0.9998 1.0000
18.0 0.7844 0.9555 0.9902 0.9978 0.9995 0.9999
18.5 0.8055 0.9631 0.9927 0.9985 0.9997 0.9999
17.0 0.8249 0.9697 0.9944 0.9990 0.9998 1.0000
17.5 0.8428 0.9752 0.9958 0.9993 0.9999
18.0 0.8591 0.9797 0.9989 0.9995 0.9999
18.5 0.8740 0.9838 0.9977 0.9997 1.0000
19.0 0.8878 0.9867 0.9983 0.9998
19.5 0.9000 0.9893 0.9988 0.9999
20.0 0.9112 0.9915 0.9991 0.9999
20.5 0.9213 0.9932 0.9994 0.9999
21.0 0.9304 0.9948 0.9996 1.0000
21.5 0.9388 0.9957 0.9997
22.0 0.9480 0.9967 0.9998
22.5 0.9528 0.9974 0.9998
23.0 0.9590 0.9980 0.9999
23.5 0.9838 0.9984 0.9999
24.0 0.9696 0.9988 1.0000
24.5 0.9724 0.9991
25.0 0.9760 0.9993
28 0.9821 0.9996
27 0.9887 0.9998
28 0.9902 0.9999
29 0.9929 0.9999
30 0.9949 1.0000
35 0.9991
40 0.9999
43 1.0000
since ∫₀¹ [·]²(t) dt is a ‖·‖-continuous functional on C. In Chapter 5 we will show that
(12)  W²_n ≡ ∫₀¹ 𝕌²_n(t) dt →_d W² ≡ ∫₀¹ 𝕌²(t) dt = Σ_{j=1}^∞ Z²_j/(j²π²)  as n → ∞,

and likewise

U²_n ≡ ∫₀¹ 𝕌²_n(t) dt − (∫₀¹ 𝕌_n(t) dt)² →_d U² ≡ ∫₀¹ 𝕌²(t) dt − (∫₀¹ 𝕌(t) dt)²  as n → ∞.
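For finite n the Cramér–von Mises statistic W²_n has the standard computing formula W²_n = Σ_i [ξ_{n:i} − (2i−1)/(2n)]² + 1/(12n); the following is our own numerical cross-check against a direct Riemann sum for n ∫ (𝔾_n(t) − t)² dt (arbitrary data and seed):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 20
x = np.sort(rng.uniform(size=n))
i = np.arange(1, n + 1)

# Computing formula for W^2_n
W2 = np.sum((x - (2 * i - 1) / (2 * n))**2) + 1 / (12 * n)

# Direct numerical evaluation of n * integral of (G_n(t) − t)^2 on a fine grid
t = np.linspace(0, 1, 200_001)
dt = t[1] - t[0]
Gn = np.searchsorted(x, t, side='right') / n
W2_num = n * np.sum((Gn[:-1] - t[:-1])**2) * dt

assert abs(W2 - W2_num) < 1e-3
```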
U² = Σ_{j=1}^∞ (Z²_{1j} + Z²_{2j})/(4π²j²).
(14)  A²_n ≡ ∫₀¹ (𝕌²_n(t)/(t(1−t))) dt →_d A² ≡ ∫₀¹ (𝕌²(t)/(t(1−t))) dt  as n → ∞,

since

|A²_n − A²| ≤ ‖(𝕌_n − 𝕌)(𝕌_n + 𝕌)/[I(1−I)]^{1/4}‖ ∫₀¹ [t(1−t)]^{−3/4} dt,

and

(15)  A² = Σ_{j=1}^∞ Z²_j/(j(j+1)).
x      P(A² > x)
1.610 .150
1.933 .100
2.492 .050
3.070 .025
3.85 .010
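Similarly, A²_n of (14) has the standard computing formula A²_n = −n − (1/n) Σ_i (2i−1)[log ξ_{n:i} + log(1 − ξ_{n:n+1−i})]; our own numerical cross-check (the grid truncation near 0 and 1 is harmless since the integrand vanishes at the endpoints):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 20
x = np.sort(rng.uniform(size=n))
i = np.arange(1, n + 1)

# Computing formula for A^2_n
A2 = -n - np.mean((2 * i - 1) * (np.log(x) + np.log(1 - x[::-1])))

# Direct numerical evaluation of n * integral of (G_n − t)^2 / (t(1−t))
t = np.linspace(1e-7, 1 - 1e-7, 1_000_001)
dt = t[1] - t[0]
Gn = np.searchsorted(x, t, side='right') / n
A2_num = n * np.sum((Gn - t)**2 / (t * (1 - t))) * dt

assert abs(A2 - A2_num) < 0.01
```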
(16)  M_n ≡ ∫₀¹ |𝕌_n(t)| dt →_d M ≡ ∫₀¹ |𝕌(t)| dt  as n → ∞,
where
+L(t) 3z^3
= expt ( 2/27 t 2 )
- Ai((3t) -413)
Ai is the Airy function (see Johnson and Killeen for references), δ_j = −a_j/2^{1/3},
and a_j is the jth zero of Ai′. The df in (17) is given numerically in Table 7.
Theorem 1. (Linear rank statistics; Hájek and Šidák) Let ξ_{n1}, …, ξ_{nn} and
𝕎 be as in Theorems 3.1.1 and 3.1.2. Suppose

(18)  h = h₁ − h₂ with each h_i ↗ and in ℒ₂.

Then the linear rank statistic T_n of (3.1.54) satisfies

(19)  T_n ≡ (1/√(c'c)) Σ_{i=1}^n c_{ni} h(R_{ni}/(n+1)) = ∫₀¹ h dℝ_n →_p T ≡ ∫₀¹ h d𝕎.
Proof. This is a special construction version of Hájek and Šidák (1968, p.
163). We can trivially replace ℤ_n by ℝ_n in (a)–(f) and then (i) (which is applied
to (f)) of this iid case of Theorem 3.4.2. It thus suffices, for ε > 0, to exhibit an
h_ε of the form (3.4.23) for which Var[∫₀¹ (h − h_ε) dℝ_n] < ε³. But (3.6.15) with
m = n shows this holds if

(a)  (1/n) Σ_{i=1}^n [h(i/(n+1)) − h_ε(i/(n+1))]² < ε³  for all n ≥ some n_ε.

The easy Exercise 4 completes it; see Hájek and Šidák (1968, p. 164). □
Exercise 4. Establish line (a) above. Write out details for (21).
CHAPTER 4
Alternatives and
Processes of Residuals
0. INTRODUCTION
The weighted empirical process (in this chapter we center at F rather than at
as in the previous chapter) of independent rv's X_n1, ..., X_nn whose true df's F_n1, ..., F_nn satisfy max_{1≤i≤n} ‖F_ni − F‖ → 0 is asymptotically equal (according to Theorem 3.4.1) to

1. CONTIGUITY

Let μ denote a σ-finite measure on (R, B). Let X_ni denote the identity map on R for 1 ≤ i ≤ n, n ≥ 1. Consider testing the null hypothesis
(5) δ in L₂(μ)

(6) Σ_{i=1}^n (a²_ni/(a′a)) ∫_{-∞}^∞ [(√(a′a)/a_ni)(√f_ni − √f) − δ]² dμ → 0 as n → ∞.

(9) ‖(1/√(c′c)) Σ_{i=1}^n c_ni (F_ni − F) − ρ_n(a, c) ∫_{-∞}^{(·)} 2δ√f dμ‖ → 0 as n → ∞,

where

(10) ρ_n(a, c) ≡ a′c/(√(a′a) √(c′c)).

(13) ‖F̄_n − F‖ = O(n^{-1/2}) as n → ∞, where F̄_n ≡ (1/n) Σ_{i=1}^n F_ni,

when c̄_n = 0, for the Brownian bridge W and the specially constructed X_ni = F_ni^{-1}(ξ_ni) of Theorem 3.1.1 on the special (Ω, 𝒜, P).
154 ALTERNATIVES AND PROCESSES OF RESIDUALS
Remark 1. All manner of tests of the hypotheses P_n vs. Q_n are possible. Many of these can have their limiting power evaluated, for a contiguous sequence of alternatives satisfying (4)-(6), by means of Theorem 2. Thus, for example,

(17) L_n ≡ log Λ_n = Σ_{i=1}^n log [f_ni(X_ni)/f(X_ni)]
(18) Z_n ≡ Σ_{i=1}^n (a_ni/√(a′a)) · 2δ(X_ni)/√f(X_ni)

using (7). Thus an easy application of the Lindeberg-Feller theorem of Exercise 3.1.2 shows that

The rv Z_n is far simpler to deal with than L_n because of this simple form. One importance of conditions (4)-(6) is that they are applicable in many situations and that they allow the following replacement of L_n by Z_n.
Theorem 3. (Le Cam) Suppose (1)-(6) hold. Then (20) holds and

(21) L_n − (Z_n − 2|[δ]|²) →_p 0 as n → ∞

on the probability space (Ω, 𝒜, P) of the special construction of Section 3.1. Moreover,

(22) ∫ δ√f dμ = 0.

Here

We note that

(a) E_P exp(L_n) = ∫ ··· ∫ ∏_{i=1}^n f_ni(x_i) dμ(x_1) ··· dμ(x_n) = 1,

and that

where both integrands in (c) are ≥ 0. Using (25) and (c) in Vitali's theorem gives (23). We also remark that Vitali's theorem implies that since (25) holds, the three statements (23), (c), and

are equivalent. ❑
(28) (S_n, L_n)′ →_d N((μ, −σ²/2)′, Σ₀) as n → ∞ under P_n,

then

(30) (S_n, L_n)′ →_d N((μ + σ₁₂, σ²/2)′, Σ₀) as n → ∞ under Q_n,

with σ² = 4|[δ]|², where σ₁₂ denotes the covariance entry of Σ₀.
(this need not hold for all B). We note that the rv's

(32) 1_B(S_n) exp(L_n) ≤ exp(L_n) are uniformly integrable wrt P

by (27). Thus Vitali's theorem (Exercise A.8.6) with (25), (31), and (32) shows
Exercise 1. Show that for any δ ∈ L₂ having ∫₀¹ δ(t) dt = 0, there exists a density f_n on [0, 1] such that

|[√n(√f_n − 1) − δ]|² → 0 as n → ∞

and

(35) dQ = exp(∫₀¹ δ dU − (1/2) ∫₀¹ δ² dI) dP.
We now present some of the standard results that are usually associated
with the concept of contiguity.
Exercise 2. (Le Cam's first lemma) If L_n →_d N(−σ²/2, σ²) as n → ∞, then {Q_n} is contiguous to {P_n}. Thus (according to Theorem 3)
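A concrete instance of the lemma (my own illustration, not from the text): take P_n under which X₁, ..., X_n are iid N(0, 1), and Q_n the joint law of iid N(θ/√n, 1). Then L_n = (θ/√n) Σ X_i − θ²/2 is exactly N(−θ²/2, θ²) under P_n for every n, so the hypothesis holds with σ = θ and {Q_n} is contiguous to {P_n}:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, theta = 400, 20_000, 1.5

# log dQ_n/dP_n evaluated at iid N(0,1) data X_1, ..., X_n:
x = rng.standard_normal((reps, n))
Ln = (theta / np.sqrt(n)) * x.sum(axis=1) - theta**2 / 2

print(Ln.mean())          # ≈ -theta^2/2 = -1.125
print(Ln.var())           # ≈ theta^2   = 2.25
print(np.exp(Ln).mean())  # ≈ 1 (E_P exp(L_n) = 1)
```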
Proof. Assume the lemma to be false. Then for some ε₀ > 0 and all values of 2^{-k}, k ≥ 1 (values of β), we can find a sequence {A_{k,n}: n ≥ 1} such that

but

We now use (b) to choose indices n_k satisfying n_k > n_{k-1}, n_{k+1} > n_k, and especially

Since n > n_k, (a) and (d) together or else (e) alone imply that

(f) P_n(B_n) → 0 as n → ∞.

Thus {Q_n} is not contiguous to {P_n} and this is a contradiction. See Jurečková (1969). ❑
Exercise 4. (Hall and Loynes, 1977) {Q_n} is contiguous to {P_n} if and only if {L_n} is uniformly integrable with respect to {P_n} and Q_n(p_n = 0) → 0 as n → ∞.

where

(40) ρ(f, g) = ∫ √(fg) dμ.

CONTIGUITY 159

(42) K(f, g) = ∫ f log (f/g) dμ when this integral is well defined; otherwise, we set K(f, g) = ∞.
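For two specific densities these quantities have simple closed forms; for f the N(0, 1) density and g the N(θ, 1) density, ρ(f, g) = exp(−θ²/8) and K(f, g) = θ²/2. A Riemann-sum check of my own, with μ taken as Lebesgue measure:

```python
import numpy as np

theta = 0.7
x = np.linspace(-12, 12, 200_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)            # N(0,1) density
g = np.exp(-(x - theta)**2 / 2) / np.sqrt(2 * np.pi)  # N(theta,1) density

rho = np.sum(np.sqrt(f * g)) * dx        # (40): int sqrt(fg) dmu
K = np.sum(f * np.log(f / g)) * dx       # (42): int f log(f/g) dmu

print(rho, np.exp(-theta**2 / 8))   # both ≈ 0.9406
print(K, theta**2 / 2)              # both ≈ 0.245
```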
and that
Proofs

Proof of Theorem 1. Now

(a) θ_n(x) ≡ (1/√(c′c)) Σ_{i=1}^n c_ni ∫_{-∞}^x [f_ni − f − (a_ni/√(a′a)) 2δ√f] dμ

(b) = (1/√(c′c)) Σ_{i=1}^n c_ni (a_ni/√(a′a)) ∫_{-∞}^x {[(√(a′a)/a_ni)(√f_ni − √f) − δ](√f_ni + √f) + δ(√f_ni − √f)} dμ.

Now using the Cauchy-Schwarz inequality for sums at (d) and then for integrals at (e), the first term of (b) satisfies

|(1/√(c′c)) Σ_{i=1}^n c_ni (a_ni/√(a′a)) ∫_{-∞}^x [(√(a′a)/a_ni)(√f_ni − √f) − δ](√f_ni + √f) dμ|

(d) ≤ {Σ_{i=1}^n c²_ni/(c′c)}^{1/2} {Σ_{i=1}^n (a²_ni/(a′a)) [∫ |(√(a′a)/a_ni)(√f_ni − √f) − δ| (√f_ni + √f) dμ]²}^{1/2}

(e) ≤ {Σ_{i=1}^n (a²_ni/(a′a)) |[(√(a′a)/a_ni)(√f_ni − √f) − δ]|² · 4 ∫ f dμ}^{1/2}

(f) → 0 by (6).

For the second term of (b), since |[√f_ni − √f]| = (|a_ni|/√(a′a)) |[(√(a′a)/a_ni)(√f_ni − √f)]|,

(g) |(1/√(c′c)) Σ_{i=1}^n c_ni (a_ni/√(a′a)) ∫_{-∞}^x δ(√f_ni − √f) dμ|
≤ max_{1≤i≤n} (|c_ni|/√(c′c)) |[δ]| {2 Σ_{i=1}^n (a²_ni/(a′a)) (|[(√(a′a)/a_ni)(√f_ni − √f) − δ]|² + |[δ]|²)}^{1/2}
→ 0 by (5), (6), and (8).

Thus (9) holds. Note that max_{1≤i≤n} a²_ni/(a′a) → 0 was not used, but is typically required to verify (6).

In fact,

(h) if densities f_ni replace f in (1) and (6), then f_ni can replace f in (9).
Also, by the Cauchy-Schwarz inequality,

(k) |∫_{-∞}^x δ√f dμ| ≤ (∫ δ² dμ)^{1/2} (∫ f dμ)^{1/2} = |[δ]| < ∞.

When n is fixed so large that the ‖·‖ in (j) is < ε, we can let x → ∞ in the functions in (j) to conclude that |∫ δ√f dμ| < ε; here ε > 0 is arbitrary. Thus (7) holds.

If densities f_ni replace f in the previous paragraph, then (j) becomes

‖F_ni − F‖ = ‖∫_{-∞}^{(·)} (√f_ni − √f)(√f_ni + √f) dμ‖

≤ |[√f_ni − √f]| |[√f_ni + √f]| by Cauchy-Schwarz

(o) ≤ 2 |[√f_ni − √f]| since densities integrate to 1,

so that

‖F_ni − F‖² ≤ 4 |[√f_ni − √f]|² = 4 |[√f_ni − √f − (a_ni/√(a′a))δ + (a_ni/√(a′a))δ]|²

≤ 8 {|[√f_ni − √f − (a_ni/√(a′a))δ]|² + (a²_ni/(a′a)) |[δ]|²} by the c_r-inequality

using ‖F̄_n − F‖ → 0 and the continuity of W in the second term of (b). Also

by Theorem 1. Combining (b) and (c) in (a) gives the E_n result. [Note that (c) with c ≡ 1 implies ‖F̄_n − F‖ = O(n^{-1/2}).]
As in (3.2.29) we have

(d) R_n = Z_n(Ḡ_n^{-1}) + (1/√(c′c)) Σ_{i=1}^n c_ni [F_ni(F̄_n^{-1}(Ḡ_n^{-1})) − F(F̄_n^{-1}(Ḡ_n^{-1}))],

and by Theorem 1

(f) ‖(1/√(c′c)) Σ_{i=1}^n c_ni [F_ni(F̄_n^{-1}(Ḡ_n^{-1})) − F(F̄_n^{-1}(Ḡ_n^{-1}))] − ρ_n(a, c) ∫_{-∞}^{F̄_n^{-1}(Ḡ_n^{-1})} 2δ√f dμ‖

(h) ≤ o(1) + 2 |[δ]| ‖F(F̄_n^{-1}(Ḡ_n^{-1})) − I‖^{1/2},

where

‖F(F̄_n^{-1}(Ḡ_n^{-1})) − I‖ ≤ ‖F(F̄_n^{-1}(Ḡ_n^{-1})) − Ḡ_n^{-1}‖ + ‖Ḡ_n^{-1} − I‖

≤ ‖F(F̄_n^{-1}) − I‖ + ‖Ḡ_n^{-1} − I‖

(i) = ‖F(F̄_n^{-1}) − F̄_n(F̄_n^{-1})‖ + o(1) a.s.

≤ ‖F − F̄_n‖ + o(1)

(j) → 0 a.s. as n → ∞.

Since the rhs of (h) goes to 0, and since (e) holds, the identity (d) gives the R_n result. ❑
Proof of Theorem 3. All probability computations are made under P_n. Let

(a) W_n ≡ Σ_{i=1}^n 2[√(f_ni/f)(X_ni) − 1].

Since

(c) EW_n = 2 Σ_{i=1}^n ∫ [√(f_ni f) − f] dμ = −Σ_{i=1}^n ∫ [√f_ni − √f]² dμ,

we also have, since EZ_n = 0,

(g) Z_n →_d N(0, 4|[δ]|²) as n → ∞.

We thus have

(j) lim_{n→∞} max_{1≤i≤n} P_n(|√(f_ni/f)(X_ni) − 1| ≥ ε) = 0 for all ε > 0;

indeed, by Chebyshev's inequality,

ε² P_n(|√(f_ni/f)(X_ni) − 1| ≥ ε) ≤ E[√(f_ni/f)(X_ni) − 1]² = |[√f_ni − √f]|².

Thus (j) holds, and (h) and (i) give (21). [We proved (22) in Theorem 1.] This completes the proof. ❑
Note that

(46) if densities f_ni replace f in (1) and (6), then f_ni can replace f in (18), (20), and (21) provided E_{P_n}(Z_n) → 0 as n → ∞,

providing a generalization. ❑

In the special case when all F_ni = F_n on [0, 1] and all c_ni = 1, these results are found in Beran (1977b).
Proof of Theorem 4. We prove only (ii). The proof of (i) can be found in Hájek and Šidák (1967). By considering all possible linear combinations, it is enough to prove Theorem 4 when S_n is a one-dimensional rv.

By making a new Skorokhod construction if necessary, we may suppose that

for some rv S on (Ω, 𝒜, P); the distribution of (Z, S) is given by the rhs of (30). Now the rv's

(b) |exp(itS_n) exp(L_n)| = exp(L_n) are uniformly integrable wrt P
by (27). Combining (a) and (b) in Vitali's theorem (Exercise A.8.6) gives

E_{Q_n}[exp(itS_n)] = E_P[exp(itS_n) exp(L_n)]

(47) → E_P[exp(itS + Z − 2|[δ]|²)] by Vitali

= exp(it E_P S + [(it)² Var_P[S] + 2it Cov_P[S, Z] + Var_P[Z]]/2 − 2|[δ]|²)

= exp(it{E_P S + Cov_P[S, Z]} − (t²/2) Var_P[S]).

Thus for any Borel set B in some R^k whose boundary ∂B has Lebesgue measure 0, the finite-dimensional set A = π_T^{-1}(B) = π_{t_1,...,t_k}^{-1}(B) ∈ 𝒞 satisfies

Q_n(V_n ∈ A) → ∫ 1_A(U) exp(∫₀¹ δ₀ dU − (1/2)|[δ₀]|²) dP

(h) = ∫_A exp(∫₀¹ δ₀ dU − (1/2)|[δ₀]|²) dP with A = π_T^{-1}(B)

for all T and for all Borel sets B whose boundary has Lebesgue measure 0. Now the sets A of (h) generate 𝒞. Thus (h) implies (35). ❑
Suppose that Q_n: X_n1, ..., X_nn are iid with df F_n, and consider tests of

H₀: F = F₀ vs. H₁: F ≠ F₀

based on some functional of the process

(1) √n(𝔽_n − F₀).

In this section we want to briefly consider the power of such tests under local alternatives F_n satisfying

(3) √n(𝔽_n − F₀) = √n(𝔽_n − F_n) + √n(F_n − F₀).

Proof. See Chibisov (1965) for a version of this theorem. The proof is simple since

‖U_n(F_n) + Δ_n(F_n) − [U(F₀) + Δ(F₀)]‖ → 0

for example

∫₀¹ (U + Δ)² dI.

Quade (1965) and Durbin (1971, 1973a) have given various bounds and numerical methods for computing these general boundary-crossing probabilities. Finite-sample power calculations for D_n can be found in Section 9.3.
LIMITING DISTRIBUTIONS UNDER LOCAL ALTERNATIVES 169
Remark 1. We note that Theorem 14.1.1 shows that

for some δ ∈ L₂(μ) = L₂(Reals, Borel, Lebesgue). Thus (2) holds with

(13) Δ(t) = ∫_{-∞}^{F^{-1}(t)} 2δ√f dx = ∫₀^t (2δ/√f)(F^{-1}(s)) ds = ∫₀^t δ₀(s) ds

and

(14) Q(A) = ∫_A exp(∫₀¹ δ₀ dU − (1/2) ∫₀¹ δ₀² dI) dP, under (12).
Hájek and Šidák (1967, p. 230) use Corollary 1 and Theorem 4.1.2 together to give an interesting expansion of the local asymptotic power of the D_n test for location alternatives. In fact their calculations are valid for arbitrary local alternatives for which (12) holds, as we will now show. [They also apply to tests based on the processes E_n and R_n of Theorem 4.1.2 under (12), provided we now suppose ρ_n(a, c) → ρ as n → ∞ and let δ₀ = 2ρ (δ/√f) ∘ F^{-1}.]
Suppose that

(17) (∂/∂b) B(a, 0, b)|_{b=0} = ∫_A (∫₀¹ δ dU) dP = ∫₀¹ δ(t) d[∫_A U(t) dP].

Now

∫_A U(t) dP = −∫_{A^c} U(t) dP = −E U(t) 1_{A^c}(U) = −E E{U(t) 1_{A^c}(U) | U(t)}

(18) = −E U(t) P(A^c | U(t)).

Hence, upon noting that given U(t) the processes {U(s): 0 ≤ s ≤ t} and {U(s): t ≤ s ≤ 1} are conditionally independent, so that for x ≤ d

(19) P(A^c | U(t) = x) = (1 − e^{−2d(d−x)/t})(1 − e^{−2d(d−x)/(1−t)}) ≡ f_t(x, d),

we have

∫_A U(t) dP = −∫_{-∞}^d x f_t(x, d) dΦ(x/√(t(1 − t)))

(20) ≡ −2αd ψ(α, t).
ASYMPTOTIC OPTIMALITY OF F„ 171
Hence
Exercise 3. For the family of alternatives F(x) = x^θ, 0 ≤ x ≤ 1, with 0 < θ < ∞, use (21) to approximate the power of the D_n test. (Take α = 0.05 and n = 64, 81, 100.)

Exercise 4. Let A = {x ∈ D: ‖x‖ ≤ t} with 0 < t < ∞. Verify that A satisfies the (4.1.31)-type condition 1_A(U_n) →_p 1_A(U) as n → ∞.
3. ASYMPTOTIC OPTIMALITY OF 𝔽_n

There is a nice connection between the preceding section concerning the local asymptotic power of tests of fit and the asymptotic optimality of 𝔽_n as an estimator of F. To describe this connection, let F be an absolutely continuous df with density f, let

and

where Z is a process in (C, 𝒞). Thus the law of the process Z on C does not depend on δ.

For the usual empirical df estimator 𝔽_n we have

for any F_n with ‖F_n − F‖ → 0, and this follows from (4.1.13) for any {f_n} ∈ (f). Thus 𝔽_n is regular at every f in the sense that (2) holds for each of the continuous versions of 𝔽_n uniformly within 1/n of 𝔽_n, for example, F̃_n [see (6.6.1)].

The following theorem gives a representation for the limiting process Z corresponding to any sequence of regular estimators {F̂_n}.

(4) Z = U(F) + W

for every regular estimator F̂_n. [Hint: Since w is convex and symmetric,
† When applying asymptotic theory one wants a certain stability; that is, the asymptotic theory should not change severely when F is only perturbed slightly. In particular, Beran (1977a) shows how this condition rules out certain "superefficient" estimators of the df.
(6) sup_{F∈ℱ} E_F w(‖√n(𝔽_n − F)‖)

and

(7) lim_{n→∞} [sup_{F∈ℱ} E_F w(‖√n(𝔽_n − F)‖)] / [inf_{b∈𝒟_n} sup_{F∈ℱ} E_F ∫ w(‖√n(F̂ − F)‖) b(dF̂, X)] = 1.

Similar results hold for other loss functions, for example, nice functions of ∫_{-∞}^∞ [√n(𝔽_n(x) − F(x))]² dF(x); see Dvoretzky et al. (1956) and Millar (1979). The proof of Theorem 2 will not be given here; we refer the interested reader to Dvoretzky et al. (1956), Levit (1978), and Millar (1979).

Millar (1979), using the methods of Le Cam (1972, 1979), also gives a "geometric" sufficient condition for the empirical df 𝔽_n to be asymptotically minimax in smaller nonparametric classes of df's, such as the classes of all convex df's or all df's with increasing failure rate. Kiefer and Wolfowitz (1976) give a detailed study of the former case.
We now briefly summarize without proof some small-sample optimality properties of 𝔽_n. Let F ∈ ℱ_c = the class of all continuous df's on R¹, let ψ be a positive continuous function defined on (0, 1), and set

and

Aggarwal (1955) and Ferguson (1967) show that this estimation problem is invariant under the group of monotone transformations and that the minimax

with

(12) w(F, F̂_n) = ∫ [F̂_n(x) − F(x)]² / [F(x)(1 − F(x))] dF(x),

where G is any df on

It is also known that 𝔽_n is the "nonparametric maximum likelihood estimator" of F; see Kiefer and Wolfowitz (1956) and Scholz (1980) for definitions. The reader is asked to verify a heuristic statement of this in Exercise 2 below. Also, from the theory of U-statistics, 𝔽_n(x) is a UMVU estimator of F(x) for each fixed x; see Serfling (1980).

and p_i = 1/n, and hence that 𝔽_n is the "nonparametric maximum likelihood estimate" of F.
(a) E exp{i ∫₀¹ Z_n dν} → E exp{i ∫₀¹ Z dν} as n → ∞.

By regularity of F̂_n and Theorem 4.1.3, the marginal laws of (Z_n, L_n) ≡ (√n(F̂_n − F), L_n) converge under P_f to proper laws. Hence the joint laws are relatively compact, and there exists a subsequence for which the joint laws converge to some joint subprobability law on the product space; since the marginal laws are proper, this joint law must also be proper. In the following we restrict attention, if necessary, only to the subsequence for which the joint laws of (Z_n, L_n) converge weakly.

Since (Z_n, L_n) ⇒ (Z, 2|[δ]| Y − 2|[δ]|²) as n → ∞, by (4.1.3) where Y ≅ N(0, 1), the Skorokhod-Dudley-Wichura theorem (Theorem 2.3.4) guarantees the existence of probabilistically equivalent versions for which (Z_n, L_n) →_{a.s.} (Z, 2|[δ]| Y − 2|[δ]|²) as n → ∞. Thus it also follows, by regularity of F̂_n, (1.9), and (4.1.25)-(4.1.27) and by the preceding construction and Vitali's theorem (Exercise A.8.6), that

(b) E_f exp{i ∫₀¹ √n(F̂_n − F) dν − i ∫₀¹ √n(F_n − F) dν + L_n}

= E exp{i ∫₀¹ Z dν − i ∫₀¹ Δ dν + 2|[δ]| Y − 2|[δ]|²} + o(1),

and ∫ (ν − ∫ ν f dμ)² f dμ = Var_f(ν(X)).
('
Note that
and
= -o h.
where the left-hand side of (i) equals, by definition, the left-hand side of (f).
This paragraph is not actually used in the proof.]
For this choice of S the deterministic part in the exponent on the right-hand
side of (b) becomes -iho - 2 h 2 . Hence, letting
denote the joint characteristic function of (7, Y), it follows from (a), (b), and
(f) that for the choice of S given in (c),
(j) ii(v,0)=Eexpi
IJ z exp -iho -2h 2
The right-hand side is analytic in h, constant for all real h, hence constant for
all complex h. Choosing h = it in (j) yields
implies (4).
is well defined.
(5) D„ — A) - d Z + v Z - and
v'i(K„ —(,1 + +A )) d Z + +Z
as n - co.
(7) √n(W²_n − ∫_{-∞}^∞ (F − F₀)² dF₀) →_d Z ≡ 2 ∫_{-∞}^∞ (F − F₀) U(F) dF₀ ≅ N(0, σ²).

Exercise 3. For the family of alternatives F(x) = x^θ, 0 ≤ x ≤ 1, with 0 < θ < ∞, use Theorem 1 to approximate the power of the D_n test. Repeat this for the W²_n test using Theorem 2. (Take α = 0.05 and n = 64, 81, 100.)

Exercise 4. Find a lower bound for the power of the D_n tests based on the binomial distribution. Approximate this bound for large n using the central limit theorem and compare with the approximation resulting from Theorem 1 in the same case as in Exercise 3. (Massey, 1950; Birnbaum, 1953). [Hint: ‖U_n‖ ≥ |U_n(t)| for any fixed 0 < t < 1.]

Proof of Theorem 1. We give the proof only for D_n; the rest is an exercise. Since (see Exercise 5 below)

we shall henceforth replace 𝔽_n, F, F₀, (−∞, ∞) by 𝔾_n, G, I, [0, 1]. Let

and write
(d) =0
and, clearly,
(g) L⁺ ⊂ M^c ∪ S_n,

where

On M^c we have

Combining (i) and (j) proves (f), and (4) then follows from (c), (e), and (f). Note that the key idea of the heuristic argument is contained in step (h).

(a) = ∫₀¹ [(𝔾_n + G) U_n(G) − 2M U_n(G)] dI

(b) = 2 ∫₀¹ (G − I) U(G) dI + o(1) a.s. for the special construction

(c) = Z + o(1).

Proposition 2.2.1 shows that Z has the claimed N(0, σ²) distribution. ❑
CONVERGENCE OF EMPIRICAL AND RANK PROCESSES 181
In Section 4.1 we showed that the weighted empirical process E_n and the empirical rank process R_n converged under a fixed sequence of contiguous alternatives. In this section we specialize to location and scale alternatives, and we show that in such cases the convergence exhibits a type of uniformity in the parameters. The key result is Theorem 8, though we find it convenient to let our presentation proceed slowly through several special cases before treating the most interesting situation.

We will use these results in Section 6 to treat empirical and rank processes of residuals. These Section 6 results in turn can be looked upon as an improvement of Section 5.5 in the special case of linear regression.
(1) I₀(f) ≡ ∫_{-∞}^∞ [f′(x)/f(x)]² f(x) dx < ∞,

then f is said to have finite Fisher information for location I₀(f). Otherwise, we define I₀(f) = ∞.

When I₀(f) < ∞ the function f′ is well defined a.s., and we define

−φ̄₀ = ∫_{-∞}^∞ (−f′/f) f dx = −∫_{-∞}^∞ f′ dx = −lim_{A,B→∞} ∫_{-A}^B f′ dx = −lim_{B→∞} f(B) + lim_{A→∞} f(−A)

(a) = 0 since ∫ f dx = 1

with f absolutely continuous as claimed. Just change variables for |[φ₀]|². For the absolute continuity, note that the same change of variable gives

f ∘ F^{-1}(t) = ∫_{-∞}^{F^{-1}(t)} f′(x) dx = ∫₀^t [f′(F^{-1}(s))/f(F^{-1}(s))] ds = −∫₀^t φ₀(s) ds;

hence the result. We follow Hájek and Šidák both here and in Proposition 2.
(5) I₁(f) ≡ ∫_{-∞}^∞ [1 + x f′(x)/f(x)]² f(x) dx < ∞,

then f is said to have finite Fisher information for scale I₁(f). Otherwise, we define I₁(f) = ∞.

When I₁(f) < ∞, we define

(6) φ₁(t, f) ≡ −[1 + F^{-1}(t) f′(F^{-1}(t))/f(F^{-1}(t))] on (0, 1).

Proposition 2. If I₁(f) < ∞, then xf(x) is absolutely continuous on (−∞, ∞), F^{-1} f(F^{-1}) is absolutely continuous on [0, 1] with value 0 at both endpoints, and

(a) −∫₀¹ φ₁(s) ds = ∫_{-∞}^∞ [1 + x f′(x)/f(x)] f(x) dx = ∫_{-∞}^∞ [f + xf′] dx = 0,

with xf(x) → 0 as |x| → ∞. Thus the absolute continuity of xf(x) follows easily from Definition 2. Just change variables for |[φ₁]|². Finally, F^{-1} f(F^{-1}) = −∫₀^{(·)} φ₁(s) ds. ❑
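Both informations are easy to evaluate numerically. For the standard normal, f′/f = −x, so I₀(f) = EX² = 1 and I₁(f) = E(1 − X²)² = 2; for the logistic density, f′/f = −tanh(x/2) and I₀(f) = 1/3. A Riemann-sum sketch (my own illustration, not part of the text):

```python
import numpy as np

x = np.linspace(-40, 40, 400_001)
dx = x[1] - x[0]

# standard normal: f'(x)/f(x) = -x
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
I0_norm = np.sum(x**2 * f) * dx               # (1): int (f'/f)^2 f dx = 1
I1_norm = np.sum((1 - x**2)**2 * f) * dx      # (5): int [1 + x f'/f]^2 f dx = 2

# logistic: f(x) = e^{-|x|}/(1+e^{-|x|})^2 (overflow-safe form), f'/f = -tanh(x/2)
g = np.exp(-np.abs(x)) / (1 + np.exp(-np.abs(x)))**2
I0_logis = np.sum(np.tanh(x / 2)**2 * g) * dx  # = 1/3

print(I0_norm, I1_norm, I0_logis)   # ≈ 1, 2, 1/3
```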
Suppose X_n1, ..., X_nn are independent with df's F_n1, ..., F_nn, and consider the problem of testing the hypothesis

(9) P_n: F_n1 = ··· = F_nn = F, n ≥ 1.

We will call these contiguous simple regression alternatives. [We shall prove presently that they satisfy (4.1.6).]

Theorem 1. For testing P_n vs. Q_n under (8)-(11), the log likelihood ratio statistic L_n = Σ_{i=1}^n log [f_ni(X_ni)/f(X_ni)] satisfies

We note that δ = −b(√f)′, 2δ√f = −bf′, and ∫_{-∞}^{(·)} 2δ√f dx = −bf in (4.1.6) and (4.1.9),

(14) the rv's −φ₀(ξ_i) are iid (0, I₀(f)) under P_n,

and

Moreover,

(16) Z_n →_d N(0, I₀(f)) under P_n as n → ∞.

Finally, with ρ_n(a, c) ≡ a′c/√(a′a c′c), we have

(17) ‖(1/√(c′c)) Σ_{i=1}^n c_ni {F(· − b a_ni/√(a′a)) − F} + b ρ_n(a, c) f‖ → 0.
We follow Hájek and Šidák (1967, p. 211). Since max_{1≤i≤n} |a_ni|/√(a′a) → 0, in order

Since I₀(f) < ∞, we know from Exercise 2 that h ≡ √f is absolutely continuous. Thus the integrand converges a.s. to 0 as r → 0. It thus suffices to show that

(d) lim_{r→0} ∫_{-∞}^∞ [(h(x − r) − h(x))/r]² dx ≤ ∫_{-∞}^∞ [h′(x)]² dx = I₀(f)/4.

But [h(x − r) − h(x)]² ≤ r ∫₀^r [h′(x − y)]² dy by the Cauchy-Schwarz inequality, so

∫_{-∞}^∞ [(h(x − r) − h(x))/r]² dx ≤ (1/r) ∫_{-∞}^∞ ∫₀^r [h′(x − y)]² dy dx

(e) = (1/r) ∫₀^r ∫_{-∞}^∞ [h′(x − y)]² dx dy by Fubini

(f) = (1/r) ∫₀^r {I₀(f)/4} dy = I₀(f)/4 < ∞.
for which
This is our special construction of the contiguous simple regression model (10).
We define

(25) Z_n^b(y) ≡ (1/√(c′c)) Σ_{i=1}^n c_ni {1_{[ε_ni ≤ y − b a_ni/√(a′a)]} − F(y − b a_ni/√(a′a))}.
Proof. We let Σ⁺⁺, Σ⁺⁻, Σ⁻⁺, Σ⁻⁻ denote the summation over those indices i for which (c_ni, a_ni) are (≥0, ≥0), (≥0, <0), (<0, ≥0), (<0, <0), respectively. We suppose initially [see Loynes, 1980, for the key idea we introduce in (a)] that

(a) b̲ ≤ b ≤ b̄ for some "short" interval [b̲, b̄] ⊂ [−B, B].

Then

(b) Z_n^b(y) ≤ (1/√(c′c)) Σ c_ni {1_{[ε_ni ≤ y − b̲ a_ni/√(a′a)]} − F(y − b̲ a_ni/√(a′a))} + (1/√(c′c)) Σ c_ni {F(y − b̲ a_ni/√(a′a)) − F(y − b̄ a_ni/√(a′a))}

(c) = (1/√(c′c)) Σ_{i=1}^n c_ni [1_{[ε_ni ≤ y − d_ni]} − F(y − d_ni)] + (1/√(c′c)) Σ_{i=1}^n c_ni [F(y − e_ni) − F(y)]

(d) = Z_n(y) + {R_n(y)},

where each d_ni and e_ni lies between b̲ a_ni/√(a′a) and b̄ a_ni/√(a′a), and

(f) R_n ≤ (1/√(c′c)) Σ_{i=1}^n c_ni (d_ni − e_ni) ‖f‖ → 0 as n → ∞;

so that

(h) → 0.
We wish to rewrite the usual linear model Y = Xb + ε (special case Y_i = a_i b + ε_i) in a form suitable for consideration of local alternatives (special case Y_i = b a_i/√(a′a) + ε_i). To this end we write†

(27) Q_n: Y_i = b₁ x_i1/(Σ_{i′=1}^n x²_{i′1})^{1/2} + ··· + b_p x_ip/(Σ_{i′=1}^n x²_{i′p})^{1/2} + ε_i, 1 ≤ i ≤ n,

(30) max_{1≤j≤p} max_{1≤i≤n} x²_ij / Σ_{i′=1}^n x²_{i′j} → 0 as n → ∞;
Theorem 3. For testing P_n vs. Q_n under (27)-(30), the log likelihood ratio statistic L_n satisfies

where

(32) Z_n = −Σ_{i=1}^n [b₁ x_i1/(Σ_{i′=1}^n x²_{i′1})^{1/2} + ··· + b_p x_ip/(Σ_{i′=1}^n x²_{i′p})^{1/2}] f′(ε_i)/f(ε_i).

We note that δ = −(√f)′, 2δ/√f = −f′/f, and ∫_{-∞}^{(·)} 2δ√f dx = −f in (4.1.6) and (4.1.9), and

(34) Q_n is contiguous to P_n.
Finally,

(36) ‖(1/√(c′c)) Σ_{i=1}^n c_ni (F_ni − F) + f Σ_{j=1}^p b_j ρ_n(x_j, c)‖ → 0.
Proof. Only minor changes are needed in the proof of Theorem 1 [to verify
that the obvious analog of (4.1.6) holds] and in Theorems 4.1.1 and 4.1.3 [to
verify that these proofs carry over under the new version of (4.1.6)]. Our
present theorem is an immediate corollary to the new versions of these last
two theorems. ❑
We now agree that ξ_ni, W_n, and W are as in Theorem 3.1.1 [or (20) and (21)] and

(40) ‖(1/√(c′c)) Σ_{i=1}^n c_ni (F_ni − F) + f Σ_{j=1}^p b_j ρ_n(x_j, c)‖ →_p 0 as n → ∞.
Let X_n1, ..., X_nn be independent rv's with continuous df's F_n1, ..., F_nn. Let e_n1, ..., e_nn be an array of known constants satisfying condition (8) above. Let d ∈ (−∞, ∞) and consider the hypothesis testing problem

vs.

case of interest, we call them simply contiguous scale alternatives. Note that (42) corresponds to the model

Theorem 5. For testing P_n vs. Q_n under (41)-(43) the log likelihood ratio statistic L_n satisfies

with

and

(48) Q_n is contiguous to P_n.
Moreover,
Finally,
Exercise 3. (i) (Hájek and Šidák, 1967, p. 214) Verify that (4.1.6) is satisfied with a_ni/√(a′a) replaced by e_ni/√(e′e) and with the δ defined in Theorem 5.
(ii) Show that (50) is now a minor corollary of Theorem 4.1.1.

(52) for 1 ≤ i ≤ n.

(54) = (1/√(c′c)) Σ_{i=1}^n c_ni [1_{[ε_ni ≤ y(1 − d/√n)]} − F(y(1 − d/√n))].

Thus

using ‖F(·(1 − d/√n)) − F‖ → 0 and uniform continuity of the sample paths of W. ❑
Let Y_n1, ..., Y_nn be independent rv's with continuous df's F_n1, ..., F_nn. Consider the hypothesis testing problem

vs.

(58) Q_{n,d}: Y_i = b₁ x_i1/(Σ_{i′=1}^n x²_{i′1})^{1/2} + ··· + b_p x_ip/(Σ_{i′=1}^n x²_{i′p})^{1/2} + ε_i, where the ε_i have df F(·(1 − d/√n)) = F_ni,

(60) max_{1≤j≤p} max_{1≤i≤n} x²_ij / Σ_{i′=1}^n x²_{i′j} → 0 as n → ∞,

for ξ_ni's, W_n, and W as in Theorem 3.1.1 [or in (20) and (21)]. Let

(62) Z_n^d(y) ≡ (1/√(c′c)) Σ_{i=1}^n c_ni {1_{[Y_ni ≤ y]} − F_ni(y)} for −∞ < y < ∞

for c_ni's that satisfy

(63) max_{1≤i≤n} c²_ni/(c′c) → 0 as n → ∞.
Theorem 7. If (57)-(61) and (63) hold, then for the special construction (61) we have

(66) E_n(y) ≡ (1/√(c′c)) Σ_{i=1}^n c_ni {1_{[Y_ni ≤ y]} − F(y)} for −∞ < y < ∞

(67) R_n(t) ≡ (1/√(c′c)) Σ_{i=1}^n c_ni 1_{[R_ni ≤ (n+1)t]} for 0 ≤ t ≤ 1.

(68) ‖E_n − W(F) − f Σ_{j=1}^p b_j ρ_n(x_j, c) − d (If) ρ_n(1, c)‖ →_p 0

and

(69) ‖R_n − W − (f ∘ F^{-1}) Σ_{j=1}^p b_j ρ_n(x_j, c)‖ →_p 0 provided c̄_n = 0.
Recall from the definition (1) and Propositions 1 and 2 that both
and
(1/√(c′c)) Σ_{i=1}^n c_ni (F_ni − F)

= (1/√(c′c)) Σ_{i=1}^n c_ni [F((· − β)(1 − d/√n)) − F(·(1 − d/√n))]

(a) + (1/√(c′c)) Σ_{i=1}^n c_ni [F(·(1 − d/√n)) − F].

The second term in (a) gives the second term in (65) by applying (50). Applying (36) to the first term of (a) gives the first term of (65), but with f(·(1 − d/√n))(1 − d/√n) instead of just f; however, the ‖·‖-distance between these two functions goes to 0 since f is absolutely continuous and f converges to 0 as |x| → ∞.
Now consider (64). Theorem 2 gives
that satisfy
and
(7) ε̂_i = (Y_i − X_i′β̂)/σ̂,

where

Recall that

The standardized ith difference d_i between the fitted value X_i′β̂ and the theoretical value X_i′β is

(10) d_i ≡ X_i′(β̂ − β)/σ̂

(11) = Σ_{j=1}^p [x_ij/(Σ_{i′=1}^n x²_{i′j})^{1/2}] b_j with b_j ≡ (Σ_{i′=1}^n x²_{i′j})^{1/2} (β̂_j − β_j)/σ̂

[recall (4.5.58)].
We introduce constants c_ni about which we assume

(14) A_n(t) ≡ (1/√(c′c)) Σ_{i=1}^{⌊(n+1)t⌋} c_ni for 0 < t ≤ 1,

(16) ε_i = ε_ni ≡ F^{-1}(ξ_ni)

for the ξ_ni's, W_n, and the Brownian bridges W and U of Theorem 3.1.1.
(17) ‖Ê_n − {W(F) + f · c′X(β̂ − β)/(√(c′c) σ) + (If) · ρ_n(c, 1) √n(σ̂/σ − 1)}‖ →_p 0

and

(18) ‖Û_n − {U + (f ∘ F^{-1}) · c′X(β̂ − β)/(√(c′c) σ)}‖ →_p 0 provided c̄_n = 0,

where

(19) c′X(β̂ − β)/(√(c′c) σ) = Σ_{j=1}^p ρ_n(x_j, c) (Σ_{i=1}^n x²_ij)^{1/2} (β̂_j − β_j)/σ

and
Proof. We write

(b) = Z_n(y) + M_n(y).

Applying (5) and (6) to (11) we see that the b_j's of (11) are such that for all ε > 0, there exists B_ε such that

(h) W(F) + f · c′X(β̂ − β)/(√(c′c) σ) + (If) · ρ_n(c, 1) √n(σ̂/σ − 1)

for E_n(F). We can replace σ̂ and σ in the middle term of (h) since its coefficient is O_p(1). Once this is done, the resulting version of (h) is (18).

This is an improved and extended version of the location-scale case of a theorem of Darling (1955), Durbin (1973b), Loynes (1980), and others. It can be found in Shorack (1985). ❑
(22)
c'ca a c'c '
(23) the rv's ψ(ε_i) are iid (0, v²) with v² ≡ E[ψ(ε)²].

(22) and (23) hold with ψ(y) = y the identity function. For the classical maximum likelihood estimate (MLE) we often have (22) and (23) with ψ = −f′/f and Eψ′(ε) = I₀(f). In robust regression ψ is usually a bounded continuous and odd weight function. It is also usually appropriate to suppose that

(25) √n(σ̂/σ − 1) = (1/√n) Σ_{i=1}^n ψ₁(ε_i) + o_p(1).

† We define ∫ ψ dW(F) ≡ ∫₀¹ ψ(F^{-1}) dW.
EMPIRICAL AND RANK PROCESSES OF RESIDUALS 199
Proof. The basic observation is that

since X(X′X)⁻X′ is the matrix of the projection mapping onto the column space of X; it is due to Pierce and Kopecky (1979). Thus we first alter the middle terms of (17) and (18) by writing

c′X(β̂ − β)/(√(c′c) σ) = c′X(X′X)⁻X′ε/(√(c′c) σ) by (22)

= c′ε/(√(c′c) σ) by (30).

Finally, we can replace ρ_n(c, 1) by its limit ρ in (17) and (18) since its coefficient has ‖·‖ that is O_p(1). We remind the reader from Remark 3.1.1 that in order to talk simultaneously of W and U, we must have ρ_n(c, 1) converging.
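The projection fact invoked here is pure linear algebra: P = X(X′X)⁻X′ is idempotent and fixes every vector in the column space of X, so c′X(X′X)⁻X′ε = c′ε whenever c is a linear combination of the columns of X. A small numerical sketch (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])

# P = X (X'X)^{-1} X', the projection onto the column space of X
P = X @ np.linalg.solve(X.T @ X, X.T)

c = X @ np.array([1.0, -2.0, 0.5])   # a c lying in the column space of X
eps = rng.standard_normal(n)

print(np.allclose(P @ P, P))              # True: P is idempotent
print(np.allclose(c @ P @ eps, c @ eps))  # True: c'P eps = c'eps
```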
Then we define

and

where

(35) Û = U − (f ∘ F^{-1}) ∫₀¹ φ₀ dU − (F^{-1} f ∘ F^{-1}) ∫₀¹ φ₁ dU.
0 0
The asymptotic power of such tests would come from a version of Theorem
2 for contiguous alternatives of a non-location-scale type, or from (5.5.15) to
(5.5.17) below.
Remark 2. Does the matrix X (which we still assume to have first column
1) contain all the regressors that are needed, or should our model (1) be
extended to
(38) Ê_n and Û_n [formulas (13) and (14) with the present c].
Integral Tests of
Fit and the Estimated
Empirical Process
0. INTRODUCTION
The basic problem that motivates this chapter is to determine the distribution of the Cramér-von Mises goodness-of-fit statistic

W²_n ≡ ∫_{-∞}^∞ n[𝔽_n(x) − F(x)]² dF(x) = ∫₀¹ U²_n(t) dt.
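For actual computation one uses the standard closed form rather than the integral: with t_(1) ≤ ··· ≤ t_(n) the ordered values of t_i = F(X_i), W²_n = Σ_{i=1}^n [t_(i) − (2i − 1)/(2n)]² + 1/(12n). A sketch of my own checking this identity against a direct Riemann approximation of ∫₀¹ U²_n dt:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 25
t = np.sort(rng.uniform(size=n))   # t_i = F(X_i); Uniform(0,1) under the null

# Closed form: W_n^2 = sum_i (t_(i) - (2i-1)/(2n))^2 + 1/(12n)
i = np.arange(1, n + 1)
w2_closed = np.sum((t - (2 * i - 1) / (2 * n))**2) + 1.0 / (12 * n)

# Direct Riemann approximation of int_0^1 n (G_n(s) - s)^2 ds
s = np.linspace(0.0, 1.0, 200_001)
Gn = np.searchsorted(t, s, side='right') / n   # empirical df of the t_i
w2_direct = np.mean(n * (Gn - s)**2)

print(w2_closed, w2_direct)   # nearly equal (grid error only)
```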
for functions f_j orthonormal with respect to the L₂ metric. Let Z*₁, Z*₂, ... be iid N(0, 1) rv's. Just as Σ_j γ_j Z*_j is a N(0, Σ_j γ²_j) rv, so too the process X defined by
202 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS
(4) Û = U + Zg,

where g is a known function and the rv Z arises as the limit of rv's Z_n of the type ∫₀¹ h dV_n. The covariance of Û turns out to be of the form (the details are simplest if θ̂ is an efficient estimator of θ)

that with proper choice of f₁ (and an f₂, f₃, ... that do not enter in) we can have an asymptotically efficient test. We do not seriously suggest this, but it seems instructionally useful.

In Sections 5 and 7 we estimated θ by θ̂ and then tested whether or not F_θ̂ was sufficiently close to 𝔽_n by means of the statistic W²_n. In Section 9 we turn this around. We assume that F(· − θ) is the correct df for some value of θ. We then estimate θ by the value θ̂ that minimizes W²_n = ∫_{-∞}^∞ n[𝔽_n(x) − F(x − θ)]² dF(x − θ). Blackman's theorem on the distribution of this θ̂ is presented.
The monograph of Durbin (1973a) has a fairly large intersection with this
chapter; both pieces of the symmetric difference are nonnegligible.
Statement of a Problem
We have seen earlier that under the null hypothesis, the Cramér-von Mises statistic reduces to the form

(1) W²_n = ∫₀¹ [U_n(t)]² dt.

(3) Y ≅ (0, Σ).
We know by the principal axes theorem that there exists an orthogonal matrix Γ (with rows γ′₁, ..., γ′_n that form an orthonormal basis for n-space) for which

(4) Z ≡ ΓY is ≅ (0, Λ) with Λ = ΓΣΓ′,

and Λ is a diagonal matrix with diagonal entries λ₁ ≥ ··· ≥ λ_n that are ≥ 0 (> 0 for nonsingular Σ). The coordinates

are called the principal components of Y; note that the λ_j's and γ_j's are solutions of

(6) Σγ = λγ.

Of course, we have

(7) Σ = Γ′ΛΓ = Σ_{j=1}^n λ_j [γ_j γ′_j].

Note that [γ_j γ′_j] y = γ_j (γ′_j y) = (y, γ_j) γ_j is the projection of y onto the unit vector γ_j. Also,

(9) Y = Σ_{j=1}^n (Y, γ_j) γ_j = Σ_{j=1}^n Z_j γ_j

and the Z_j's are the projections of Y onto the associated eigenvectors.
MOTIVATION OF PRINCIPAL COMPONENT DECOMPOSITION 205
Processes with any sort of smoothness property typically have their trajectories completely specified if we know the trajectories at all points in a countable dense set. Thus we believe that a countable number of rv's should be sufficient to define a process like the Brownian bridge, say; see Exercise 2.2.15. Thus it seems reasonable that the harmonic decomposition of a general zero-mean process {X(t): 0 ≤ t ≤ 1} might take the form [see also (9)]

(13) X = Σ_{j=1}^∞ Z_j f_j,

where the Z_j's are uncorrelated zero-mean rv's and the f_j's are orthonormal functions on [0, 1]. We will use the notation

(14) (g₁, g₂) ≡ ∫₀¹ g₁(t) g₂(t) dt and |[g]| ≡ (g, g)^{1/2} = [∫₀¹ g²(t) dt]^{1/2},

and also

(16) K(s, t) = Σ_{j=1}^∞ λ_j f_j(s) f_j(t), where λ_j ≡ Var[Z_j]

and Cov[Z_j, Z_k] = 0 if j ≠ k. It is Eq. (16) that suggests how the f_j's may be determined; we have

∫₀¹ K(s, t) f_k(s) ds = Σ_{j=1}^∞ λ_j f_j(t)(f_k, f_j) = λ_k f_k(t).
(18) Cov[Z_j, Z_k] = E[∫₀¹ X(s) f_j(s) ds ∫₀¹ X(t) f_k(t) dt].

The rv's Z_k of (15) are called the principal components of X, and (13) is called the principal component decomposition of X. We call Z*_j ≡ Z_j/√λ_j the normalized principal component since Z*_j ≅ (0, 1); note that Eq. (13) can be reexpressed as

(19) X = Σ_{j=1}^∞ √λ_j Z*_j f_j.

(20) ∫₀¹ X²(t) dt = Σ_{j=1}^∞ λ_j Z*²_j for uncorrelated (0, 1) rv's Z*_j.

Let L₂ denote the collection of all real measurable functions g on [0, 1] having ∫₀¹ g²(t) dt < ∞; here dt denotes Lebesgue measure. We let (g₁, g₂) ≡ ∫₀¹ g₁(t) g₂(t) dt denote the usual inner product on L₂, and we let |[g]| ≡ (g, g)^{1/2} denote the usual norm on L₂. Recall that L₂ is a complete normed linear space, or Hilbert space.
ORTHOGONAL DECOMPOSITION OF PROCESSES 207
By a kernel we shall mean a non-null, symmetric, square-integrable function K(s, t) defined for 0 ≤ s, t ≤ 1; that is,

(3) ∫₀¹ f(s) K(s, t) ds = λ f(t).

We remark that for each fixed m the choice a_j = (g, f_j) minimizes |[Σ_{j=1}^m a_j f_j − g]|,

where the series converges both uniformly and absolutely. Also, Σ_{j=1}^∞ λ_j < ∞. [Here λ_j and f_j denote the solutions of (3) with the f_j being orthonormal.]

∫₀¹ K(t, t) dt < ∞,

(a) ∫₀¹ g(t) X(t) dt for each fixed g ∈ L₂.
Now

E ∫₀¹ |g(t) X(t)| dt ≤ [∫₀¹ g²(t) dt]^{1/2} [∫₀¹ K(t, t) dt]^{1/2} = |[g]| [∫₀¹ K(t, t) dt]^{1/2} < ∞.

Thus we have

(c) = ∫₀¹ ∫₀¹ g(s) K(s, t) g(t) ds dt,

since

∫₀¹ ∫₀¹ E|g(s) X(s) X(t) g(t)| ds dt ≤ [∫₀¹ |g(t)| [K(t, t)]^{1/2} dt]².

Also,

∫₀¹ ∫₀¹ K²(s, t) ds dt = ∫₀¹ ∫₀¹ Cov²[X(s), X(t)] ds dt

≤ ∫₀¹ ∫₀¹ Var[X(s)] Var[X(t)] ds dt = ∫₀¹ ∫₀¹ K(s, s) K(t, t) ds dt = [∫₀¹ K(t, t) dt]².

Thus K is a kernel.
210 INTEGRAL TESTS OF FIT AND ESTIMATED EMPIRICAL PROCESS
Now Z_j has mean 0 by (b). For the covariances of the Z_j's we continue (c) by writing

= ∫₀¹ λ_k f_k(t) f_j(t) dt by (3).

We also suppose that λ₁ ≥ λ₂ ≥ ··· > 0 comprises the entire spectrum of eigenvalues of K, that the associated orthonormal eigenfunctions f₁, f₂, ... form a complete set for the subspace L, and that we have the Kac and Siegert decomposition

(8) Σ_{j=1}^∞ λ_j < ∞,

(9) |[Σ_{j=1}^m Z_j f_j − X]| → 0 as m → ∞ a.s.
where

E Z_j X(t) = E[∫₀¹ f_j(s) X(s) ds X(t)] = ∫₀¹ f_j(s) E[X(s) X(t)] ds

is justified by

∫₀¹ |f_j(s)| E|X(s) X(t)| ds ≤ ∫₀¹ |f_j(s)| [K(s, s) K(t, t)]^{1/2} ds ≤ [∫₀¹ K(s, s) ds]^{1/2} [K(t, t)]^{1/2} < ∞.
Now (a) converges to 0 by (6), which gives (12). Then (12) implies (13).
(15) φ(λ) ≡ E e^{iλT} = ∏_{j=1}^∞ (1 − 2iλλ_j)^{-1/2}.

METHOD 2. If one truncates the infinite series in (14), then the weighted sum of independent chi-square(1) rv's that remains can be treated by the numerical methods of Imhof or Slepian. A good example of this technique is found in Durbin and Knott (1972); a description and references can be found there.
METHOD 3. It is possible to compute the first few moments of T, and then
match these with a Pearson curve, and so on. A good example is Stephens
(1976).
METHOD 4. Use Monte Carlo methods. See Stephens (1976).
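Methods 2 and 4 can be sketched together in a few lines: truncate the eigenvalue series and simulate the remaining weighted sum of independent chi-square(1) rv's. The sketch below is our own illustration (plain Monte Carlo on the truncated series, not Imhof's actual algorithm); it uses the Cramér-von Mises weights $\lambda_j = 1/(j\pi)^2$, and the truncation point and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cramer-von Mises weights lambda_j = 1/(j*pi)^2, truncated at m terms.
m = 100
lam = 1.0 / (np.arange(1, m + 1) * np.pi) ** 2

# Simulate T_m = sum_j lambda_j Z_j^2 for iid N(0,1) Z_j.
Z = rng.standard_normal((100_000, m))
T = (Z ** 2) @ lam

print(T.mean())              # close to E W^2 = sum_j lambda_j = 1/6
print(np.quantile(T, 0.95))  # approximates the upper 5% point of W^2
```

The simulated mean should be near $\sum_j \lambda_j = 1/6$ (less the small truncation tail), and the simulated quantiles approximate the tabled percentage points of $W^2$.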
so that
$$\int_0^1\!\int_0^1 K^2(s,t)\,ds\,dt = \sum_{j=1}^\infty\sum_{j'=1}^\infty \lambda_j\lambda_{j'}\int_0^1\!\int_0^1 f_j(s)f_{j'}(s)f_j(t)f_{j'}(t)\,ds\,dt = \sum_{j=1}^\infty \lambda_j^2$$
(where $\int_0^1 K(t,t)\,dt < \infty$ and Fubini's theorem allow the interchange). More generally,
We now rigorize the program of Section 1 for the empirical process and
Brownian bridge.
can be decomposed as

(1) $\quad K_U(s,t) = s \wedge t - st = \sum_{j=1}^\infty \lambda_j f_j(s)f_j(t), \qquad \lambda_j = \frac{1}{j^2\pi^2}, \quad f_j(t) = \sqrt{2}\,\sin(j\pi t),$

with orthonormal $f_j$. The series (1) converges uniformly and absolutely on $[0, 1]$.
Proof. Now Eq. (5.1.17) takes the form

(a) $\quad \lambda f(t) = \int_0^1 f(s)K_U(s,t)\,ds = \int_0^t s(1-t)f(s)\,ds + \int_t^1 t(1-s)f(s)\,ds.$

Differentiating once with respect to $t$ gives

(b) $\quad \lambda f'(t) = \int_t^1 f(s)\,ds - \int_0^1 s f(s)\,ds,$

and differentiating again gives

(c) $\quad \lambda f''(t) = -f(t),$

while the eigenfunctions are normalized by

(d) $\quad \int_0^1 f^2(t)\,dt = 1.$

The solutions of (b), (c), and (d) are given by (2), and the convergence is as stated by Mercer's theorem. ❑
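Since $K_U(s,t) = s \wedge t - st$ and its eigenpairs are explicit, Mercer's expansion can be checked numerically; the following is a small sanity-check sketch (our own illustration, not part of the original text):

```python
import numpy as np

def K_U(s, t):
    """Brownian bridge covariance s ^ t - s t."""
    return min(s, t) - s * t

def mercer_partial(s, t, m):
    """Partial sum of K_U(s,t) = sum_j (2/(j pi)^2) sin(j pi s) sin(j pi t)."""
    j = np.arange(1, m + 1)
    return float(np.sum(2.0 * np.sin(j * np.pi * s) * np.sin(j * np.pi * t)
                        / (j * np.pi) ** 2))

print(K_U(0.3, 0.7))                   # 0.09
print(mercer_partial(0.3, 0.7, 5000))  # ~ 0.09
```

The terms decay like $1/j^2$, so a few thousand terms already reproduce the kernel to three or four decimals.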
Theorem 1. We have

(5) $\quad W^2 \equiv \int_0^1 [U(t)]^2\,dt = \sum_{j=1}^\infty \Big(\frac{Z_j^*}{j\pi}\Big)^2.$

Moreover,
$P(W^2 > x) = \frac{1}{\pi}\sum_{j=1}^{\infty}(-1)^{j+1}\int_{(2j-1)^2\pi^2}^{(2j)^2\pi^2}\Big(\frac{-\sqrt{y}}{\sin\sqrt{y}}\Big)^{1/2}\frac{1}{y}\exp\Big(-\frac{xy}{2}\Big)\,dy.$
See Durbin (1973a, p. 32) for some useful discussion and references. See also
Darling (1955, p. 15). Another inversion was used by Anderson and Darling
(1952) to derive Table 3.8.4.
(7) $\{Z_{nj}: j \ge 1\}$ are identically distributed (0, 1) rv's that are uncorrelated.

[Moreover, the CLT would suggest that the $Z_{nj}$ are nearly N(0, 1).] Finally, this decomposition represents $U_n$ to the extent that it yields the statistics which we now show are each distributed as the sum of $n$ uncorrelated (0, 1) rv's. Note that
for all $j \ne k$ and all $i$. All rv's $\cos(j\pi\xi)$ are identically distributed, since $\cos(j\pi\xi)$ is the projection onto the horizontal axis of a unit vector whose angle with the axis is chosen uniformly on $[0, j\pi]$. Thus (7) and (8) are established. Equation (9) is a standard result from Fourier series, valid because $v_n$ is piecewise linear. As in Theorem 1, (5.2.13) implies (10). Equation (11) is just (5.2.11). ❑
$W_n^2 = \int_0^1 U_n^2(t)\,dt = \sum_{i=1}^n \Big(\xi_{n:i} - \frac{2i-1}{2n}\Big)^2 + \frac{1}{12n}.$
Remark 1. (The null distribution of the $Z_{nj}$) Following Durbin and Knott (1972), we note that the characteristic function of $\cos(j\pi\xi)$ is the Bessel function $J_0(t)$, so that $Z_{nj}$ has

(12) distribution function $= 0.5 + (1/\pi)\int_0^\infty [J_0(\sqrt{2/n}\;t)]^n\,t^{-1}\sin(zt)\,dt.$

Durbin and Knott were able to obtain the following table (Table 1) of percentage points of the normalized orthogonal components $Z_{nj}$ from the distribution formula of (12). They advocate that in addition to whatever other graphing or testing is done, the $Z_{nj}$'s should be "computed and considered."
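A minimal sketch of computing these components, assuming (as in the discussion above) that $Z_{nj} = \sqrt{2/n}\,\sum_{i=1}^n \cos(j\pi\xi_i)$ for Uniform(0, 1) data $\xi_i$; under the null each component should be a (0, 1) rv:

```python
import numpy as np

rng = np.random.default_rng(1)

def components(xi, jmax):
    """Z_nj = sqrt(2/n) * sum_i cos(j pi xi_i), j = 1..jmax."""
    n = len(xi)
    j = np.arange(1, jmax + 1)[:, None]
    return np.sqrt(2.0 / n) * np.cos(j * np.pi * xi[None, :]).sum(axis=1)

# Under Uniform(0,1) data, each Z_nj should have mean 0 and variance 1.
reps = np.array([components(rng.uniform(size=100), 4) for _ in range(5000)])
print(reps.mean(axis=0))  # each entry near 0
print(reps.var(axis=0))   # each entry near 1
```

Since $E\cos^2(j\pi\xi) = \tfrac12$ and $E\cos(j\pi\xi) = 0$, the normalization $\sqrt{2/n}$ makes each variance exactly 1.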
(14) $\quad Z^*_{nj} = j\pi\sqrt{2n}\int_0^1 [\mathbb{G}_n(t) - t]\sin(j\pi t)\,dt,$

where $\mathbb{G}_n$ is the empirical df of $\beta_1, \ldots, \beta_n$. Thus the $Z^*_{nj}$ can also be interpreted as normalized Fourier coefficients of the empirical process; their percentage points are also tabled by Durbin and Knott (1972); see Table 2. These can be used to test (asymptotically) if $p$ terms in the natural Fourier estimate of the true df $G$ account for the entire deviation from the Uniform (0, 1) df.
Remark 3. (On the power of $W_n^2$, $Z_{nj}$, and $\|U_n\|$ tests) Suppose $X_1, X_2, \ldots$ are iid $F_\theta$, and we form statistics from the continuous hypothesized df $F$. Then
where, under mild regularity on $F$ (see Propositions 4.5.1 and 4.5.2),

(17) $\quad = s(t).$
Chibisov's theorem (Theorem 4.2.1) shows that Kolmogorov's test and the Cramer-von Mises test satisfy

$n\int [\mathbb{F}_n(x) - F_\theta(x)]^2\,dF_\theta(x) \to_d \int_0^1 [U + s]^2\,dt$

(19) $\quad = \sum_{j=1}^\infty \frac{Z_j^{*2}}{(j\pi)^2}$, where the $Z_j^*$ are independent $N(j\pi(s, f_j), 1)$,

and

(20) $\quad \to_d N((s, f_j), 1).$
*In fact, $(-1)^j Z^*_{nj}$ is the test statistic, if we always use the upper tail as the critical region.
Thus from (19) and (20), we can compute the asymptotic power of the $W_n^2$-test and the Durbin and Knott test based on the $j$th component $Z^*_{nj}$, respectively. Similar expressions for the power of the $A_n^2$ and $U_n^2$ tests can be derived from results given below. Table 3, from Durbin and Knott (1972), gives asymptotic powers of the $W_n^2$-test, the $A_n^2$-test, and $U_n^2$-test and tests based on the first four normalized principal components of $U_n$. The combination of $Z^*_{n1}$ and $Z^*_{n2}$ and the statistic $A_n^2$ seem most noteworthy. This seems to confirm the suspicions of Durbin and Knott that (11) led to higher-order components being damped out too quickly by weights proportional to $1/j^2$. Stephens (1974) also found $A_n^2$, and then $W_n^2$, to outperform $\|U_n\|$.
(21) $\quad \int_0^1 \{X(t)\}^2\,dt,$

where
(24) $\quad U_n^2 \to_d U^2 = \sum_{j=1}^\infty \frac{2E_j}{4\pi^2 j^2}$, where the $E_j$ are iid Exponential(1) rv's.
(26) $\quad \pi^2 U^2 =_d \|U\|^2,$

which seems a very curious fact; thus the tables for $\|U\|$ can also be used for $U^2$.
(v) Show that 4 U 2 has the same distribution as does the sum of two
independent copies of W 2 .
(vi) Show that

(27) $\quad U_n^2 = \sum_{i=1}^n \Big[\xi_{n:i} - \frac{i - 1/2}{n} - \bar\xi_n + \frac{1}{2}\Big]^2 + \frac{1}{12n},$
where for $1 \le i \le n$ the quantities $V_n(i/(n+1))$ have covariances

$\frac{n}{n+2}\Big[\frac{i}{n+1}\wedge\frac{j}{n+1} - \frac{i}{n+1}\cdot\frac{j}{n+1}\Big].$
Show that

$\lambda_{nj} = \frac{n}{4(n+1)(n+2)\sin^2\!\big(\frac{j\pi}{2(n+1)}\big)}$, with eigenvector proportional to $\Big(\sin\frac{j\pi}{n+1}, \sin\frac{2j\pi}{n+1}, \ldots, \sin\frac{nj\pi}{n+1}\Big).$

Show, however, that the $Z_{nj}$ are not iid.
[Durbin and Knott note an observation of Feller that $(n+1)p - \frac{1}{2}$ is a better choice than the mean $np$ in the normal approximation to $P(\mathrm{Binomial}(n, p) \le k)$.]
Table 4.
$+\, n\int_0^1 (1-t)^2\,\psi(t)\,dt,$
PRINCIPAL COMPONENT DECOMPOSITION OF $U_n$, $U$
where
We state for emphasis the obvious fact that the $f_j$ are continuous on $[0, 1]$.
(iii) Compute the mean and variance of $\int_0^1 v_n\,\psi\,dt$.
(28) $\quad |P(W_n^2 \le x) - P(W^2 \le x)| \le \mathrm{const}/n$ for all $x$,

a result which had been conjectured by S. Csörgő (1976), who established the rate $\log n/\sqrt{n}$ using the Hungarian construction; see Section 12.4. This and many other results for higher-dimensional versions of $W_n^2$ are given by Cotterill and M. Csörgő (1982).
and define

(30) $\quad T_n = n^{-1/2}\|N - n/2\|$ and $G_n = n^{-1}\int_0^1 [N(x) - n/2]^2\,dx.$

(i) Determine the limiting distribution of the process

$N(x) - n/2 = \frac{a_0}{2} + \sum_{k=1}^\infty \{a_k\cos(2\pi kx) + b_k\sin(2\pi kx)\},$

where

$a_k = \frac{2}{k\pi}\sum_{i=1}^n \sin(2\pi k X_i), \qquad b_k = -\frac{2}{k\pi}\sum_{i=1}^n \cos(2\pi k X_i)$ for $k$ odd.

Show that

(32) $\quad G =_d \sum_{k=1}^\infty \frac{2E_k}{(2k-1)^2}$ for iid Exponential(1) $E_k$'s.
A Monte Carlo experiment by Watson shows that the approach to the limiting
distribution is rapid.
(2) $\quad A_n^2 \to_d A^2 \equiv \int_0^1 \frac{U^2(t)}{t(1-t)}\,dt = \int_0^1 Z^2(t)\,dt \quad$ with $Z(t) \equiv \frac{U(t)}{\sqrt{t(1-t)}}$,

where

(3) $\quad K_Z(s,t) = \frac{s \wedge t - st}{[s(1-s)t(1-t)]^{1/2}} \quad\text{for } 0 \le s, t \le 1.$
Unfortunately, the theory of the previous sections via Mercer's theorem does not apply, since $K_Z$ is not continuous on the entire unit square (note that it is uniformly bounded by 1). Thus the method of Proposition 5.3.1 does not apply. However, the general approach of Section 1 is still applicable. Note its analogy to Exercise 5.3.7 with $\psi(t) = [t(1-t)]^{-1/2}$.
where

(5) $\quad \lambda_j = [j(j+1)]^{-1}$ and $f_j(t) = 2\sqrt{\frac{2j+1}{j(j+1)}}\,\sqrt{t(1-t)}\;p_j'(2t-1),$

with $p_j(x)$ being the Legendre polynomial of degree $j$ [defined as the coefficient of $h^j$ in the expansion in powers of $h$ of $(1 - 2xh + h^2)^{-1/2}$]. This series converges uniformly in every domain interior to the unit square. Moreover,
Proof. See Anderson and Darling (1952) and Durbin and Knott (1972). ❑
(7)
$p_0(x) = 1$
$p_1(x) = x$ with $p_1'(x) = 1$
$p_2(x) = \frac{1}{2}(3x^2 - 1)$ with $p_2'(x) = 3x$
$p_3(x) = \frac{1}{2}(5x^3 - 3x)$ with $p_3'(x) = \frac{3}{2}(5x^2 - 1)$
$p_4(x) = \frac{1}{8}(35x^4 - 30x^2 + 3)$ with $p_4'(x) = \frac{5}{2}(7x^3 - 3x).$
The orthonormal versions are $\sqrt{(2j+1)/2}\;p_j(x)$ on $[-1, 1]$. Thus the first orthonormal eigenfunctions in (5) are
Note that they put more weight on the tails than do the Cramer-von Mises
eigenfunctions.
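A quick numerical check of this decomposition (our own sketch, with an arbitrary truncation point): since $\lambda_j = 1/(j(j+1))$, the series $\sum_j \lambda_j$ telescopes to 1, so the limiting $A^2$ has mean 1, and its upper percentage points can be approximated by simulating the truncated series.

```python
import numpy as np

rng = np.random.default_rng(2)

m = 100
j = np.arange(1, m + 1)
lam = 1.0 / (j * (j + 1.0))  # Anderson-Darling eigenvalues 1/(j(j+1))

# sum_j 1/(j(j+1)) telescopes: the truncated sum is 1 - 1/(m+1).
print(lam.sum())  # 0.990...

# Simulate the truncated limiting series A^2 ~ sum_j lam_j Z_j^2.
Z = rng.standard_normal((40_000, m))
A2 = (Z ** 2) @ lam
print(A2.mean())              # near E A^2 = 1
print(np.quantile(A2, 0.95))  # near the tabled upper 5% point of A^2
```

The simulated 95% point should land close to the value tabled for $A^2$ in the references cited below, up to truncation and Monte Carlo error.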
Note that the $j$th normalized principal component $Z^*_{nj}$ of $Z_n$ is

$Z^*_{nj} = \sqrt{2j+1}\int_0^1 U_n(t)\,dp_j(2t-1) = -\sqrt{2j+1}\int_0^1 p_j(2t-1)\,dv_n(t) = -\frac{\sqrt{2j+1}}{\sqrt{n}}\sum_{i=1}^n p_j(2\xi_i - 1).$
These are not iid (in $j$) rv's. Nevertheless,

(11) $\quad Z^*_{nj} \to_d Z_j^* \quad$ as $n \to \infty$,

and

(12) $\quad T_{nK} \equiv \sum_{j=1}^K Z^{*2}_{nj} \to_d \sum_{j=1}^K Z_j^{*2}$

for iid N(0, 1) rv's $Z_j^*$, for example.
PRINCIPAL COMPONENT DECOMPOSITION OF $A_n^2$
Exercise 1. Show that the Anderson and Darling statistic satisfies

$\phi(\lambda) = E e^{i\lambda A^2} = \Big[\frac{-2\pi i\lambda}{\cos\big((\pi/2)\sqrt{1 + 8i\lambda}\big)}\Big]^{1/2};$

thus
Remark 1. Borovskih (1980) establishes that for all x>0 and all n sufficiently
large
Exercise 3. Carry out the program of this section for the statistic
Let $X_1, \ldots, X_n$ be iid with unknown df $F$. We would like to test the hypothesis that $F$ belongs to a particular class of df's $F_\theta$ with $\theta$ in some index set. Let $\hat\theta_n$ denote some natural estimate of $\theta$ within the parameterized class of $F_\theta$'s. Then it would be natural to base tests of the hypothesis on goodness-of-fit statistics formed from the process $\sqrt{n}\,[\mathbb{F}_n - F_{\hat\theta_n}]$. As far as the null hypothesis situation is concerned, we now have enough notation to attack the problem. However, we will also want to consider the power of our tests; we will find it most convenient to do this in the context of a somewhat larger parametric family.
To this end, we suppose that $X_1, \ldots, X_n$ are iid with df $F_{\theta,\gamma}$ for some pair $(\theta, \gamma) = (\theta_1, \ldots, \theta_p, \gamma_1, \ldots, \gamma_K)$ contained in $R^{p+K}$. We will now test the hypothesis $H$: $\gamma = 0$ that the family parameter vector is 0, and we also desire the power of this test under local alternatives of the form $\gamma/\sqrt{n}$. We let $\mathbb{F}_n$ denote the empirical df of $X_1, \ldots, X_n$; we define
Example 1. Let $X, X_1, \ldots, X_n$ be iid. We wish to test the hypothesis that $X$ has some normal distribution. Given the data values of $X_1, \ldots, X_n$, we let $\hat F_n$ denote the normal df having mean $\bar X_n = \sum_i X_i/n$ and variance $S_n^2 = \sum_i (X_i - \bar X_n)^2/(n-1)$. We will then form goodness-of-fit statistics based on the process $\sqrt{n}\,[\mathbb{F}_n - \hat F_n]$. Now let $(\theta, \gamma) = (\theta_1, \theta_2, \gamma_1, \gamma_2)$ and let $F_{\theta,\gamma}$ be specified by saying that its density is proportional to

(3) $\quad e^{-x^2/2}\Big\{1 + \frac{\gamma_1}{6}H_3(x) + \frac{\gamma_2}{24}H_4(x)\Big\},$
i
II (F ,7—
0 F0,0)—), e; )
aOj
F0 ,7 1 +Y, Yk
(8,0) k
F9.YIII
avk (9,0)
holds with all partial derivatives being uniformly bounded in x. We say that
the sequence of estimators $\hat\theta_n$ of $\theta$ is regular if for each coordinate $j$ we have (7). [We remark that if $\hat\theta_n$ is efficient, then we typically have (7) with the vector of $h_i$'s being the usual product of the score vector times the inverse of the information matrix.]
Here

(10) $\quad F_j \equiv \frac{\partial F_{\theta,\gamma}}{\partial\theta_j}\Big|_{(\theta_0,0)} \quad\text{and}\quad F_k \equiv \frac{\partial F_{\theta,\gamma}}{\partial\gamma_k}\Big|_{(\theta_0,0)},$
and the vector of $Z_j$'s and the Brownian bridge $U$ are jointly normal with 0 means and

$\sqrt{n}\,[F_{\hat\theta_n} - F_{\theta,\gamma/\sqrt{n}}] = \sum_j \sqrt{n}\,(\hat\theta_j - \theta_j)F_j - \sum_k \gamma_k F_k + o_p(1),$

where (11) holds by Theorem 3.1.2 also. We attribute this theorem to Darling (1955), though Sukhatme (1972) and Durbin (1973b) certainly improved it considerably toward the form we have devised here. ❑
Remark 1. When authors such as Durbin (1973a, b), Sukhatme (1972), Pierce and Kopecky (1979), and Loynes (1980) have treated the process of (2), they have used some additional notation. Thus we let

where $\hat{\mathbb{G}}_n$ denotes the empirical df of the $\xi_{ni}$'s. Note the simple identity
[which is true under regularity, see (17)], where [with $F_j$ and $F_k$ as in (10)]

(16) $\quad v = U - \sum_j Z_j g_j + \sum_k \gamma_k g_k$ with $g_j \equiv F_j(F_{\theta_0,0}^{-1})$, and so on.

If (14) holds, then the requirement for concluding (15) is to be able to plug $F^{-1}$ into the left-hand side of (9); thus

(17) uniformly continuous $F_j$'s and $F_k$'s and mutually absolutely continuous densities that are uniformly bounded

suffices for (15) to hold. [Of course, convergence of $U_n$ in modes other than $\to_d$ would suffice to conclude $\int_0^1 \psi_n(t)\,dt \to_d \int_0^1 \psi(t)\,dt$, say.] ❑
(18) $\quad \sqrt{n}\,(\mathbb{F}_n - F_{\hat\theta_n}) = \mathbb{U}_n(F(\cdot - \hat\theta_n))$
satisfies
and
where
We call $f(F^{-1})$ the density-quantile function. (In this discussion, we have
satisfies
with $Z = (Z_1, Z_2)$ and $U$ being jointly normal and $Z_1$ and $Z_2$ having 0 means and

(28) $\quad \mathrm{Cov}[Z, U(t)] = -\big(f(F^{-1}(t)),\; F^{-1}(t)f(F^{-1}(t))\big)\,I^{-1} = -g(t)'I^{-1},$

where
What sort of statistic does one use to test the hypothesis that the true df is some $F_\theta$ with $\theta \in \Theta$? In the spirit of this chapter, a natural statistic is $T_n$; we use the labels $\hat W_n^2$, $\hat A_n^2$, and so on for $\psi(t) = 1$, $\psi(t) = [t(1-t)]^{-1}$, and so on. Of course, under regularity conditions on $F$ we will have

(34) $\quad T_n \to_d T \equiv \int_0^1 [X(t)]^2\,\psi(t)\,dt.$
Sukhatme's (1972) proof of Exercise 1). We feel that it is better for the moment for the reader to check the steps sketched in Remark 2 and Exercise 1 in specific cases. Interesting special cases have been considered in the literature: see Stephens (1976) for normal or exponential $F$; Stephens (1977) for the extreme value distribution; Pettitt (1978) for the gamma distribution; and Pettitt (1976) and (1977) for censored normal and exponential cases, respectively.
(In Sections 4.5 and 4.6, however, we treated rigorously something very
similar to this when we considered empirical processes of residuals in the
linear model.)
In all these cases the problem reduces to the following simple form. Let $\{X(t): 0 < t \le 1\}$ denote a $N(0, \hat K)$ process on $(C, \mathcal{C})$, where $\hat K$ is a bounded covariance function of the form

(35) $\quad \hat K(s,t) = K(s,t) - \phi_1(s)\phi_1(t) - \phi_2(s)\phi_2(t).$
$X(t) = U(t) + \phi(t)\int_0^1 \phi''(s)U(s)\,ds, \qquad \int_0^1 [\phi'(t)]^2\,dt = 1.$
Verify $\int_0^1 \phi''(t)K_U(s,t)\,dt = -\phi(s)$, and then compute $K_\phi$; compare it with (23). (ii) In the context of Remark 2, consider the choice $\phi(t) = -f(F^{-1}(t))/\sqrt{I}$.
Exercise 7. (Darling, 1955) Let $F$ denote the Cauchy df with density $[\pi(1+x^2)]^{-1}$.

(i) For the location case of Remark 2, show that Remark 2 can be rigorized with $I = \frac{1}{2}$ and $\phi(t) = (-\sqrt{2}/\pi)\sin^2(\pi t)$. Show that the limiting rv $\hat W^2$ has mean $\frac{1}{6} - 3/(4\pi^2)$. Compute the characteristic function of $\hat W^2$.

(ii) For the scale parameter case at the end of Exercise 1, show that Exercise 1 can be rigorized with $I = \frac{1}{2}$ and $\phi(t) = (1/(\sqrt{2}\,\pi))\sin(2\pi t)$. Show that the limiting rv $\hat W^2$ has mean $\frac{1}{6} - 1/(4\pi^2)$. Show that the appropriate $\hat\lambda_j$ are $\hat\lambda_j = 1/(j^2\pi^2)$ for $j = 1, 3, 4, 5, \ldots$ (but not $j = 2$). (Thus, "one degree of freedom" is lost.)
as $n \to \infty$
Proof. Just reuse the proof of Theorem 3.1.1 given at the end of Section 3.3, replacing $\to_{\mathrm{a.s.}}$ by $\to_p$. ❑
In the previous section we have seen that under the null hypothesis
(2) $\quad \to_d W^2 = \int_0^1 U^2(t)\,dt \quad$ under suitable regularity conditions,
where v is a normal process on (C, 'W) having zero-mean value function and
covariance function K(s, t) of the form
where $\hat K$ is expressible as

(8) $\quad \hat K(s,t) = K(s,t) - \phi_1(s)\phi_1(t) - \phi_2(s)\phi_2(t)$ for $0 \le s, t \le 1,$

(9) $\quad T = \int_0^1 X^2(t)\,dt.$

We make the conventions that $\phi_1$ is not the zero function, that $\phi_2$ is not a nonzero multiple of $\phi_1$, and that
Theorem 1. (Darling; Sukhatme) Under the assumptions (3), (4), (7), and (8) we have that (10) holds (i.e., $T = \sum_{j=1}^\infty \hat\lambda_j Z_j^2$ for iid N(0, 1) rv's $Z_1, Z_2, \ldots$), where
Finally (the analog of the loss of degrees of freedom for parameters estimated),

Of course, the most interesting special cases are the $\hat W^2$ and $\hat A^2$ of (2).

Related work is found in Kac et al. (1955), though later authors have seemed to find Darling's (1955) approach more tractable.
Tables of Distributions
then use these to numerically obtain the asymptotic percentage points of tests based on $\hat W_n^2$, $\hat A_n^2$, and so on. This program was carried out by Durbin et al. (1975) and Stephens (1974), (1976), (1977), with related interesting work by Pettitt (1977), (1978). The asymptotic percentage points in Table 1 are compiled from these references. When possible, Stephens also computed these values by matching the first four cumulants and using Pearson and other curves as approximations. The two methods showed good agreement. (Note that Table 1 also includes percentage points and modifications for Kolmogorov's $D_n = \|v_n\|$ and Kuiper's $V_n = \|v_n^+\| + \|v_n^-\|$.)
How does one use these tables with finite $n$? Stephens also ran extensive Monte Carlo sampling experiments to estimate these percentage points for finite $n$. After extensive plotting and smoothing he has come up with the following recommendation. Compute the statistic $\hat W_n^2$, $\hat A_n^2$, and so on. For the sample size $n$ actually used, compute the modification of your statistic that is indicated in the "modification" column of Table 1. Then the percentage point of the modified statistic for finite $n$ is to be approximated by the asymptotic percentage point of the table.
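The first step of this recipe can be sketched in code. The order-statistic computing formulas below for $W_n^2$ and $A_n^2$ are the standard ones for probability-transformed data; the specific finite-$n$ modification factors must still be taken from the "modification" column of Table 1 and are deliberately not reproduced here.

```python
import numpy as np

def cvm_W2(u):
    """W_n^2 from probability-transformed values u_i (order-statistic form)."""
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    i = np.arange(1, n + 1)
    return np.sum((u - (2 * i - 1) / (2.0 * n)) ** 2) + 1.0 / (12.0 * n)

def ad_A2(u):
    """A_n^2 from probability-transformed values (order-statistic form)."""
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1.0 - u[::-1])))

rng = np.random.default_rng(3)
u = rng.uniform(size=50)
print(cvm_W2(u), ad_A2(u))
# The finite-n step (not shown) is to apply the modification from Table 1
# and compare the modified value with the asymptotic percentage point.
```

For a composite hypothesis one would first transform the data by the fitted df $F_{\hat\theta_n}$ before applying these formulas.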
Some comments on the power of these tests seem appropriate; see Stephens (1974). The integral-type statistics seem to outperform the supremum statistics in case 0 and, by a lesser margin, in the other cases. Overall, $\hat A_n^2$ or $A_n^2$, and then $\hat W_n^2$ or $W_n^2$, seem the best choices. From Durbin et al. (1975), it seems appropriate to also consider the normalized principal components $Z_{nj}$.
Table 1 (Part b)
Percentage points | Case N1: $\hat W^2$ $\hat A^2$ $\hat U^2$ | Case N2: $\hat W^2$ $\hat A^2$ $\hat U^2$ | Case N: $\hat W^2$ $\hat A^2$ $\hat U^2$ | Case E: $\hat W^2$ $\hat U^2$
0.01 0.01865 0.15248 0.01794 0.02191 0.17045 0.01741 0.01651 0.12944 0.01589 0.02005 0.01807
0.05 0.02565 0.20230 0.02458 0.03166 0.23468 0.02378 0.02228 0.16743 0.02135 0.02804 0.02478
0.10 0.03074 0.23779 0.02939 0.03944 0.28336 0.02834 0.02638 0.19375 0.02520 0.03403 0.02964
0.20 0.03882 0.29221 0.03698 0.05282 0.36322 0.03552 0.03269 0.23317 0.03110 0.04372 0.03733
0.50 0.06269 0.44659 0.05943 0.10171 0.63004 0.05875 0.05087 0.34043 0.04797 0.07381 0.06007
0.80 0.10370 0.70242 0.09815 0.22003 1.21818 0.09356 0.08114 0.50908 0.07580 0.12970 0.09923
0.90 0.13439 0.89323 0.12742 0.32699 1.74270 0.12173 0.10354 0.83058 0.09818 0.17448 0.12877
0.95 0.16529 1.08696 0.15716 0.44180 2.30775 0.15065 0.12802 0.75157 0.11653 0.22157 0.15873
0.99 0.23804 1.55062 0.22807 0.72457 3.70201 0.22072 0.17878 1.03482 0.16382 0.33760 0.22990
†These specially marked values are from Pettitt (1977). All other values in part a are taken from the most recent of Stephens (1970, 1972, 1974, 1976, 1977) in which they appear. All values in part b are taken from Durbin, Knott, and Taylor (1975). It is noted that there are slight discrepancies between the three authors.
$\hat\theta = \Big[\sum_{i=1}^r X_{n:i} + (n-r)X_{n:r}\Big]\Big/r \qquad\text{and}\qquad \hat\theta = \Big[\sum_{i=1}^R X_{n:i} + (n-R)x_0\Big]\Big/R,$
$T_n = n\int_{-\infty}^{X_{n:r}} [\mathbb{F}_n(x) - F_{\hat\theta}(x)]^2\,\psi(F_{\hat\theta}(x))\,dF_{\hat\theta}(x)$

with weight functions $\psi(t) = 1$ and $\psi(t) = [t(1-t)]^{-1}$, respectively; the only change needed in case (E2) is to change the upper limit of integration from $X_{n:r}$ to $x_0$. Let $0 < p < 1$. In case (E1) we suppose $r/n \to p$ as $n \to \infty$. In case (E2) we let $p$ denote the value to which $1 - \exp(-x_0/\hat\theta)$ converges.
This exercise indicates the asymptotic theory for natural tests of the hypotheses (E1) and (E2). Using methods analogous to those of Stephens, Pettitt (1977) obtained the $\hat\lambda_j$'s of Theorem 1 and the percentage points of the limiting rv's $_p\hat W^2$ and $_p\hat A^2$. These are given in Table 2. Pettitt (1976) gives analogous tables for the case when only the first $r$ of $n$ order statistics from a $N(\mu, \sigma^2)$ distribution are observed.
THE DISTRIBUTION OF $\hat W_n^2$, $\hat W^2$, $\hat A_n^2$, $\hat A^2$
Percentage point | $p$: 0.5 0.6 0.7 0.8 0.9 1.0
(a) $_p\hat A^2$
50 0.201 0.244 0.293 0.345 0.407 0.496
85 0.411 0.494 0.583 0.675 0.781 0.915
90 0.487 0.584 0.687 0.793 0.914 1.061
95 0.622 0.746 0.870 1.003 1.149 1.321
97.5 0.763 0.914 1.062 1.222 1.394 1.590
99 0.975 1.145 1.324 1.521 1.729 1.947
(b) $_p\hat W^2$
that were uncorrelated (0, 1) rv's that converged rapidly to normality. Indeed,
the first four Z's seemed to be useful statistics in their own right.
In the present subsection we would like to obtain an analogous decomposition

(22) $\quad \hat W^2 =_d \int_0^1 X^2(t)\,dt = \sum_{j=1}^\infty \hat\lambda_j Z_j^{*2}.$
Table 3
(from Durbin, Knott and Taylor (1977))
Coefficients $a_{ij}$ for calculating components $Z_{nj}$ of $\hat W_n^2$ in tests for the normal distribution
[$j$ for $i = 2m-1$: 1, 3, 5, 7, 9; $\;j$ for $i = 2m$: 2, 4, 6, 8, 10. Case (i): observations are $N(\theta_1, 1)$ with $\theta_1$ unspecified; $a_{2m-1,j} = 1$ if $j = 2m-1$, and 0 otherwise. Case (ii): $a_{2m,j} = 1$ if $j = 2m$, and 0 otherwise; the $a_{2m-1,j}$ are taken from the table above for case (iii).]
for iid N(0, 1) rv's. The reader is referred to Durbin et al. (1975) for what we now summarize briefly (however, Tables 3 and 4 are corrected versions of their tables).
Define
and
Table 4
(from Durbin, Knott and Taylor (1977))
Coefficients $b_{ij}$ for calculating components $Z_{nj}$ of $\hat W_n^2$ in tests for exponentiality

$Z_{nj} = \sum_{j'=1}^{10} b_{jj'}\hat Z_{nj'}$, where $\hat Z_{nj'} = \sqrt{2/n}\sum_{i=1}^n \cos(j'\pi\hat\xi_i)$
i J 1 2 3 4 5 6 7 8 9 10
1 1.26286 0.42939 -0.08152 0.02641 -0.01392 0.00725 -0.00477 0.00299 -0.00218 0.00152
2 -0.90795 0.89184 0.43537 -0.08917 0.04128 -0.02027 0.01292 -0.00794 0.00573 -0.00394
3 0.81487 -0.42303 0.81963 0.52393 -0.13055 0.05392 -0.08171 0.01865 .0.01308 0.00883
4 .0.79381 0.36115 -0.41482 0.77280 0.52054 -0.12745 0.06432 -0.03518 0.02368 -0.01557
5 0.77881 .0.33196 0.32235 -0.31993 0-74527 0.58137 -0.15376 0.07052 -0.04365 0.02738
8 0.77533 -0.82035 0.29053 -024859 0.33098 -0.73309 -0.55782 0.14841 -0.07819 0.04538
7 0.77325 -0.31301 0.27241 -0.21355 0.24085 -0.28686 0.71952 0.68475 -0.16805 0.08174
8 0.77854 -0.31126 0.26453 -0.19890 0.20767 -0.21002 0.30039 -0.72119 -0.58531 0.16334
9 0.79770 -0.31603 0.26409 -0.19307 0.19240 -0.17908 0.21487 -0.27897 0.72921 0.81923
10 -1.07747 0.42423 -0.35054 0.25170 -0.24387 0.21890 -0.24017 0.26415 -0.39111 0.98388
Compute

(25) $\quad Z_{nj} = \sum_{j'=1}^{10} a_{jj'}\hat Z_{nj'} \quad\text{for } 1 \le j \le 10,$

where in case N [in case (E)] the $a_{jj'}$ are given in Table 3 (in Table 4). Treat the $Z_{nj}$ as independent N(0, 1) rv's. Of course, the first two $Z$'s, and then the next two, are of greatest interest.
(26) $\quad T = \Gamma'\Lambda\Gamma = (\gamma_1 \cdots \gamma_n)\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\,(\gamma_1 \cdots \gamma_n)'$

for eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_n > 0$ and eigenvectors $\gamma_1, \ldots, \gamma_n$. For $\phi \ne 0$ set

(27) $\quad a \equiv \Gamma\phi = (\gamma_1'\phi, \ldots, \gamma_n'\phi)'$

and

(28) $\quad \hat T \equiv T - \phi\phi'.$
-
(29) d(A)S(A)=0,
where
a2 ‚ A
(30) S(A) =1 + = 1 +a'(AI—A) -'a and d(A)= fI 1— 'J.
i=1A — A^ - A/
If all a; 0 0, then the ,1; 's are exactly the solutions of S(A) = 0. Finally, 5,
for all j.
Write $\hat T g = \hat\lambda g$ with $\hat T = (T - \phi\phi')$, and set $c \equiv \phi'g$, so that

(b) $\quad -c\phi = (\hat\lambda I - T)g, \quad$ and hence for $a = \Gamma\phi$

(c) $\quad -ca = \Gamma(\hat\lambda I - T)g = (\hat\lambda I - \Lambda)\Gamma g.$

Taking inner products with $a$ then gives

(e) $\quad c = -c\sum_{j=1}^n \frac{a_j^2}{\hat\lambda - \lambda_j}.$

We will think of (e) as one linear equation in one unknown; and we will write it in the form
If $c = 0$, then (b) implies that $\hat\lambda$, $g$ are also an eigenpair for $T$; and thus $c = 0$ implies that $\hat\lambda$ equals some $\lambda_i$, which implies $d(\hat\lambda) = 0$. If $c \ne 0$, then $\hat\lambda$ can satisfy (f) only if $S(\hat\lambda) = 0$. Thus in all cases we see that any eigenvalue $\hat\lambda$ of $\hat T$ must be a solution of

(g) $\quad d(\lambda)S(\lambda) = 0.$

In the next paragraph we will show that the converse also holds.

Suppose now that $\lambda^*$ solves (g). Then either case 1: $\lambda^* \ne$ any $\lambda_i$, or case 2: $\lambda^* =$ some $\lambda_i$.
In case 1 we define

(h) $\quad g^* \equiv -\Gamma'\{\Lambda(\lambda^* I - \Lambda)^{-1} + I\}a, \quad$ using $S(\lambda^*) = 0$.
In case 2b, $\lambda^* = \lambda_i$ is a double root of (g); we note that $\lambda_i$ has two perpendicular eigenvectors, namely $\gamma_i$ and the $g^*$ of (h). Thus each root of (g) has eigenvectors of the proper multiplicity.

We leave the proof that $\hat\lambda_j \le \lambda_j$ for all $j$ to the exercises.
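The characterization (29)-(30) translates directly into a numerical scheme: when all $a_i \ne 0$, $S$ passes from $+\infty$ to $-\infty$ on each interval between consecutive $\lambda_i$, so each $\hat\lambda_j$ can be bracketed and bisected. The eigenvalues and $a$ below are made-up illustrative values, not taken from the text.

```python
import numpy as np

# Illustrative (made-up) eigenvalues of T and a = Gamma * phi.
lams = np.array([4.0, 2.0, 1.0])
a = np.array([0.5, 0.4, 0.3])

def S(lam):
    """S(lambda) = 1 + sum_i a_i^2 / (lambda - lambda_i), as in (30)."""
    return 1.0 + np.sum(a ** 2 / (lam - lams))

def bisect(f, lo, hi, iters=200):
    """Bisection assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) * flo > 0:
            lo, flo = mid, f(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)

# S runs from +inf down to -inf on each (lambda_{i+1}, lambda_i), and from
# 1 down to -inf below lambda_n, so each hat-lambda_j can be bracketed.
eps = 1e-9
roots = [bisect(S, lams[i + 1] + eps, lams[i] - eps) for i in range(2)]
roots.append(bisect(S, -10.0, lams[2] - eps))

hatT = np.diag(lams) - np.outer(a, a)  # with Gamma = I, T = diag(lams)
print(sorted(roots, reverse=True))
print(sorted(np.linalg.eigvalsh(hatT), reverse=True))  # same values
```

The direct eigenvalue computation on $\hat T = T - \phi\phi'$ confirms the roots of $S$, and each $\hat\lambda_j$ indeed falls below the corresponding $\lambda_j$.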
An Exercise
Exercise 4. (Kac et al., 1955) Let $X = N(0, K)$ on $(C, \mathcal{C})$ where $K$ is continuous and has a Kac and Siegert decomposition $K(s,t) = \sum_j \lambda_j f_j(s)f_j(t)$ for $0 \le s, t \le 1$. [Note Mercer's theorem (Theorem 5.2.1).] Let $Z_j^* = \int_0^1 X f_j\,dt/\sqrt{\lambda_j}$. Let $\phi \in \mathcal{L}_2$ be such that
(i)

(32) $\quad b^2 \equiv \sum_{j=1}^\infty \frac{a_j^2}{\lambda_j} < 1$, where $a_j \equiv \int_0^1 \phi(t)f_j(t)\,dt,$

(33) $\quad \Big\|\sum_{j=1}^m a_j f_j - \phi\Big\| \to 0$ as $m \to \infty.$
CONFIDENCE BANDS, ACCEPTANCE BANDS, AND PLOTS 247
(36) $\quad \hat X = N(0, \hat K)$ on $(C, \mathcal{C}).$
Confidence Bands
Acceptance Bands
Recall from Section 5 that the estimated empirical process $\sqrt{n}\,[\mathbb{F}_n - \hat F_n]$ for normal $\hat F_n$ satisfies $\sqrt{n}\,\|\mathbb{F}_n - \hat F_n\| = \sqrt{n}\,\|\mathbb{F}_n(\bar X_n + S_n\,\cdot) - \Phi\|$. Thus, if $P(\sqrt{n}\,\|\mathbb{F}_n(\bar X_n + S_n\,\cdot) - \Phi\| \le k_\alpha) = 1 - \alpha$, then the functions $\mathbb{F}_n(\bar X_n + S_n\,\cdot) \pm k_\alpha/\sqrt{n}$ contain the N(0, 1) df $\Phi$ with probability $1 - \alpha$ if normality
[Figure: acceptance bands plotted versus standardized sample value.]
Plots
For an i.i.d. sample $Y_1, \ldots, Y_n$ from $F_0((y - \mu)/\sigma)$, the QQ-plot (quantile) and the PP-plot (probability) are the standard devices; the PP-plot graphs ordinate $\tilde p_{n:i} = F_0((Y_{n:i} - \hat\mu)/\hat\sigma)$ versus abscissa $p_{n:i} = (i - \frac{1}{2})/n$. The
distribution of $D_n^{SP}$.

Note that

(2) $\quad D_n = \|\mathbb{G}_n - I\| = \max_{1 \le i \le n} |\xi_{n:i} - p_{n:i}| + \frac{1}{2n},$
Exercise 2. (i) Develop formulas for the acceptance region of the $D_n$-test of $H_0$: $(F_0, \mu_0, \sigma_0)$. Do this for three cases: when the acceptance region is to be shown on a QQ-plot, on a PP-plot, and on an SP-plot.

(ii) Repeat this exercise for the $D_n^{SP}$-test. In particular, on the SP-plot, the $D_n^{SP}$-test yields the acceptance curves
Table 1 also shows Michael's simulated percentage points for the statistics $\hat D_n^P$ obtained by replacing $F_0, \mu, \sigma$ by $\Phi, \bar X_n, S_n$ in $D_n^P$ above. Michael's Figure 2 shows the six acceptance bands of Exercise 2 (but with $\Phi, \bar X_n, S_n$) for a data set having $n = 20$. Note that because of the correspondence between the $D_n^P$ plots, the same point causes rejection of a normality hypothesis on each plot. However, the visual properties of the plots are different. Note that $D_n^{SP}$ leads to straight lines on the SP-plot.
[Figure panels: (a) versus standard normal quantiles ($x$); (b) versus uniform quantiles ($p$); (c) versus sine quantiles ($r$).]
Figure 2. Probability plots with 95% acceptance regions based on $D_n$ and $D_n^{SP}$: (a) QQ-plot, (b) PP-plot, and (c) stabilized probability plot.
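The plot machinery above can be sketched as follows (our own illustration): the PP coordinates follow the definitions just given, the stabilized coordinates use Michael's transformation $r = (2/\pi)\arcsin\sqrt{p}$, and `k_alpha` is a hypothetical placeholder for a critical value that would in practice come from Table 1.

```python
import numpy as np
from math import erf

def Phi(z):
    """Standard normal df."""
    return 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))

def pp_plot_points(y):
    """PP-plot coordinates: ordinate Phi((Y_(i)-mean)/sd), abscissa (i-1/2)/n."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    p = (np.arange(1, n + 1) - 0.5) / n
    p_tilde = Phi((y - y.mean()) / y.std(ddof=1))
    return p, p_tilde

def sp_transform(p):
    """Michael's stabilizing transformation r = (2/pi) arcsin(sqrt(p))."""
    return (2.0 / np.pi) * np.arcsin(np.sqrt(p))

rng = np.random.default_rng(4)
p, p_tilde = pp_plot_points(rng.normal(2.0, 3.0, size=20))

k_alpha = 0.3  # hypothetical band half-width; in practice from Table 1
print(np.abs(p_tilde - p).max() <= k_alpha)  # inside the D_n-type PP band?
```

On the SP-plot one compares `sp_transform(p_tilde)` against `sp_transform(p)`; a $D^{SP}$-type band then has constant width, which is why it appears as straight lines.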
8. MORE ON COMPONENTS
(3) $\quad d_j \equiv -f_j/\sqrt{\lambda_j}.$

(4) $\quad = -\int_0^1 U_n\,d\,d_j$

(6) $\quad = n^{-1/2}\sum_{i=1}^n d_j(\xi_i), \quad$ since $E\,d_j(\xi) = 0$ for these $d_j$,
This statistic could arise as a statistic used to test for the Uniform (0, 1) distribution. The ordinary CLT gives

$\quad T_{mn} \to_d \sum_{j=1}^m b_j Z_j^2 \quad$ as $n \to \infty$.

More generally,

(11) $\quad T_{mn} = \sum_{j=1}^m b_j\Big[n^{-1/2}\sum_{i=1}^n d_j(F_{\theta_0}(X_i))\Big]^2$ for the $d_j$ of (2).
Under this mild regularity our next assumption will be true; we assume that

$\quad \gamma'\int \frac{\partial}{\partial\theta}\big(d_j(F_{\theta_0})\big)\,dF_{\theta_0} + o(1) = \gamma'\int_0^1 d_j'\Big[\Big(\frac{\partial}{\partial\theta}f_{\theta_0}\Big)\circ F_{\theta_0}^{-1}\Big]\Big[1\big/f_{\theta_0}\circ F_{\theta_0}^{-1}\Big]\,dt + o(1)$

and

$\quad \mathrm{Var}[d_j(F_{\hat\theta_n}(X_i))] = 1 + o(1),$
so that we expect (12) to hold with [use (13) for less regularity]

(15) $\quad a_j = \int d_j(F_{\theta_0})\Big[-\frac{\partial}{\partial\theta}\log f_{\theta_0}\Big]\,dF_{\theta_0},$
for which Eq. (12) can be justified (typically, by an argument of the type outlined in Remark 1). Which orthogonal functions $d_j$ and what constants $b_j$
(18) maximizing $e(a, b) = \sum_{j=1}^m a_j b_j\Big/\Big(\sum_{j=1}^m b_j^2\Big)^{1/2}$ maximizes power,

which is now fixed; and since the choice $b_j = a_j$ thus maximizes $e(a, b)$:

(ii) For fixed $m$ we would like to choose the $d_j$ to maximize $\sum_{j=1}^m a_j^2$ and then choose $b_j = a_j$.
(iii) Since $m = 1$ would be nice, let us choose $d_1$ ($= d_m = d_1$) so that

(19) $\quad d_1(F_{\theta_0}) = \Big[-\frac{\partial}{\partial\theta_0}\log f_{\theta_0}\Big]\Big/\sqrt{I(\theta_0)}$, where $I(\theta_0) \equiv E_{\theta_0}\Big[\frac{\partial}{\partial\theta_0}\log f_{\theta_0}\Big]^2;$

(21) $\quad T_{mn} = n^{-1/2}\sum_{i=1}^n \frac{\partial}{\partial\theta}\log f_\theta(X_i).$
Since the Neyman-Pearson test for this hypothesis rejects for large values of essentially this statistic, in some sense we have come full circle with the idea of goodness-of-fit via components. This does, however, suggest that a reasonable choice for a good omnibus $d_1$ is one that "approximately satisfies (19) for the alternatives against which we seek protection."
In the spirit of the robustness literature, in testing for a location parameter $\theta$ we could choose for $d_1(F_{\theta_0})$ the normalized Huber influence function. (A more interesting situation would be to estimate location and scale and test for shape.) (See also Parr and Schucany, 1982.)
where

(3) $\quad \sigma_F^2 = \frac{\int\!\int f^2(x)f^2(y)[F(x \wedge y) - F(x)F(y)]\,dx\,dy}{\big[\int f^3(x)\,dx\big]^2}$

is necessarily finite.
THE MINIMUM CRAMER-VON MISES ESTIMATE OF LOCATION 255
Generalizations of this theorem are found in Koul and DeWet (1983).
For further results in this vein see Bolthausen (1977) and Pollard (1980).
Proof. This result improves Blackman (1955). Our proof is from Pyke (1970).
We define
where
by the special construction of Theorem 3.1.1, $\|F(\cdot + bn^{-1/2}) - F\| \to 0$, and the
where
(k) $\quad a_n(b) \ge a_n(B_\epsilon) \to a(B_\epsilon) = B_\epsilon^2\int f^2\,dF.$

Thus
Figure 1.
provided $n \ge$ some $n_{\epsilon,\omega}$.
The combination of (e) and (n) added to (o) shows that $M_n$ is "sufficiently u-shaped" to imply that the location of the minimum of $M_n$ corresponds to $b_n$ for any $\epsilon > 0$ and all $n \ge$ some $n_{\epsilon,\omega}$ [the graph of $M_n(b)$ must lie entirely within the marked-off area of Figure 1]. ❑
CHAPTER 6
Martingale Methods
0. INTRODUCTION
Heuristic Discussion
where $\mathcal{F}_{x-}$ is the $\sigma$-field generated by everything up to, but not including, time $x$. With this background, we now turn to our problem.
Suppose now that
We note that [dN(x)] 2 = dN(x) since dN(x) only takes on the values 0 and
1. We also note that $0 \le dA(x) \le 1$. We thus suggest that
Moreover,
that is,
Moreover,
$= H^2(x)\,d\langle M\rangle(x)$ by (5),
so that
Processes $H(x)$ that are $\mathcal{F}_{x-}$-measurable have the property that $H(x) = E(H(x)\,|\,\mathcal{F}_{x-})$, so that $H(x)$ can be determined by averaging $H$ over the past; such an $H$ is thus called predictable. The statement (13) can be summarized as
$\lim E\{M(x)E\{M(y) - M(x)\,|\,\mathcal{F}_x\}\} = \lim E\{M(x)\cdot 0\} = 0$; and for a normal
where
As noted above
In fact, conditions under which all of the previous heuristics are actually true
are given in Appendix B. Even without Appendix B, we can use these heuristics
as the first step in a guess-and-verify approach.
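As one guess-and-verify instance of these heuristics: for a continuous df the compensator is the cumulative hazard $A$, and the heuristics predict $EM_i(x) = 0$ and $\mathrm{Var}\,M_i(x) = V(x) = F(x)$. The Exponential(1) case, where $A(x) = x$, is our own convenient choice for checking this by simulation.

```python
import numpy as np

rng = np.random.default_rng(5)

# For the Exponential(1) df the cumulative hazard is A(x) = x, so the
# heuristic martingale is M_i(x) = 1[X_i <= x] - (X_i ^ x).
X = rng.exponential(size=500_000)
x = 1.3
M = (X <= x).astype(float) - np.minimum(X, x)

F = 1.0 - np.exp(-x)
print(M.mean())  # ~ 0, matching E M_i(x) = 0
print(M.var())   # ~ V(x) = F(x), since Delta A = 0 here
print(F)
```

The empirical mean and variance agree with the predicted $0$ and $F(x) \approx 0.727$ up to Monte Carlo error, consistent with the predictable-variation calculation above.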
(25) $\quad N_i(x) = 1_{[X_i \le x]}, \quad x \in \mathbb{R},$
satisfies
Thus
$\mathrm{Cov}[M_{1i}(x), M_{1i}(y)] = E M_{1i}^2(x \wedge y) = E\langle M_{1i}\rangle(x \wedge y)$
$\quad = E\int_{-\infty}^{x \wedge y} 1_{[X_i \ge z]}[1 - \Delta A(z)]\,dA(z) = \int_{-\infty}^{x \wedge y} E(1_{[X_i \ge z]})[1 - \Delta A(z)]\,dA(z)$
$\quad = \int_{-\infty}^{x \wedge y} [1 - \Delta A]\,dF$
(29) $\quad = V(x \wedge y),$

where

(33) $\quad V(x) \equiv \int_{-\infty}^x [1 - \Delta A]\,dF.$
(34) $\quad M \cong \mathbb{S}(V)$ on $(D_{\mathbb{R}}, \mathcal{D}_{\mathbb{R}}, \|\cdot\|).$
Note that $0 = A(-\infty) \le A(x) \le A(y) \le A(\tau) = A(x_0) \le \infty$ for all $x \le y \le \tau$, where

(5) $\quad \Delta A = \frac{\Delta F}{1 - F_-} \le 1$, satisfying
In fact, we will typically phrase our limit theorems in terms of the special construction of Theorem 3.1.1. Thus, suppose $\xi_{n1}, \ldots, \xi_{nn}, U_n, W_n, U, W$ are as in the special construction of Theorem 3.1.1. Now let $F$ be an arbitrary df, and define $X_{ni} \equiv F^{-1}(\xi_{ni})$ for $1 \le i \le n$, so that $X_{n1}, \ldots, X_{nn}$ are iid $F$, as claimed.
where

(12) $\quad \mathbb{Z}_n(t) \equiv U_n(t) + \int_0^t \frac{U_n(s)}{1-s}\,ds \quad\text{for } 0 \le t \le 1.$

Note that if $\mathbb{F}_{ni}$ denotes the empirical df of the one observation $X_{ni}$, then

(13) $\quad \mathbb{M}_n(x) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{M}_{ni}(x) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \Big[\mathbb{F}_{ni}(x) - \int_{-\infty}^x (1 - \mathbb{F}_{ni-})\,dA\Big]$

(14) $\quad = \frac{1}{\sqrt{n}}\sum_{i=1}^n \Big[1_{[X_{ni} \le x]} - \int_{-\infty}^x 1_{[X_{ni} \ge y]}\,dA(y)\Big].$
That $\mathbb{M}_n$ is indeed a martingale will be verified below. The natural limiting process to associate with $\mathbb{M}_n$ is

(15) $\quad \mathbb{M}(x) \equiv U(F(x)) + \int_{-\infty}^x U(F_-)\,dA \quad\text{for } -\infty < x < \infty.$
(16) $\quad \mathrm{Cov}[\mathbb{M}_n(x), \mathbb{M}_n(y)] = \mathrm{Cov}[\mathbb{M}(x), \mathbb{M}(y)] = V(x \wedge y).$

Note that $0 \le V(x) \le F(x) \le 1$ with $V \nearrow$. Also $\mathbb{M}$ has uncorrelated increments, and is thus a martingale.
Proof. Note that the iid $\mathbb{M}_{1i}$ have, for $x \le y$, by Fubini's theorem,

(d) $\quad \cdots = V(x).$
$\quad -\int_{-\infty}^x \int_{\{u\}} dF(v)\,dA(u) = -\int_{-\infty}^x [1 - F_-(u)]\,\Delta A(u)\,dA(u)$

(e) $\quad = -\sum_{u \le x} [1 - F_-(u)][\Delta A(u)]^2,$

giving (d).
Now for any fixed $x$ we have from (14) and the $c_r$-inequality that

(19) $\quad E|\mathbb{M}_n(x)|^{2k} \le 2^{2k-1}\Big\{1 + E\big|\cdots\big|^{2k}\Big\}$ for all $x$.
The $2k$th product moment is bounded termwise in steps (f) and (g), leading to

(h) $\quad \le \Big[\int_0^1 (1 - t)^{-1/2}\,dt\Big]^{2k} = 2^{2k} \quad\text{by (3.2.58).}$
Now $\to_d$ plus uniformly bounded moments of all orders implies convergence of all moments; see von Bahr's inequality (A.5.1) for the uniform boundedness of the moments. Thus (17) holds. In fact, using Cauchy-Schwarz to bound product moments,
(a) $\quad \|U\| + \|\cdots\| = O(1)$ a.s.

(23) $\quad \widehat{\mathbb{M}}_0(x) = Z(F(x)) + \sum_{d_j \le x}\int_{F_-(d_j)}^{F(d_j)} \frac{\cdots}{1 - t}\,dt$

(recall Exercises 2.2.14 and 3.4.3). From (23) we see that $\widehat{\mathbb{M}}_0 = Z(F)$ if $F$ is continuous. Finally, we define $S$ via the equation $F = S(V)$; note (17). ❑
Corollary 1. We have
where Z is the Brownian motion defined in (24). Note (11) and (12).
Exercise 2. Prove directly the elementary (28). (Prove the result for $n = 1$ first and then add up the results for the $\mathbb{M}_{0i}$'s.) This result is a special case of a
and establish their limiting distributions. This goal is inspired by Gill (1983).
Csörgö et al. (1983) deal with a similar topic, but not overtly in the spirit of
counting process martingales. We owe a debt to both in our development.
An Exponential Identity

Proposition 2. We have

(31) $\quad \frac{\sqrt{n}\,[\mathbb{F}_n(x) - F(x)]}{1 - F(x)} = \int_{-\infty}^x \frac{1}{1 - F(y)}\,d\widehat{\mathbb{M}}_{0n}(y) \quad\text{for all } x < F^{-1}(1);$

(31′) $\quad \frac{1 - \mathbb{F}_n(x)}{1 - F(x)} = 1 - \int_{-\infty}^x \frac{d(\mathbb{M}_n(y)/\sqrt{n})}{(1 - \mathbb{F}_n(y-))(1 - \Delta A(y))},$

with
$X_n$ a (local) martingale. If $dX_n$ in (31′) were replaced by $1_{(0,x]}(y)\,dy$, the solution of (31′) with $Y_n(0) = 1$ would be $Y(x) = e^x$. Thus $Y_n$ is in some sense the "exponential of $X_n$." This is made precise by the theorem of Doléans-Dade; see Theorem B.6.2.
Proof. See (3.6.1) for the fact that the process is a martingale. Now expand

$\quad \frac{1}{\sqrt{n}}\int_{-\infty}^x \frac{1}{1 - F(y)}\,d\mathbb{M}_n(y)$

using the definition of $\mathbb{M}_n$, and integrate by parts; collecting terms gives

(b) $\quad -\frac{1 - \mathbb{F}_n(x)}{1 - F(x)} + 1 = \frac{\mathbb{F}_n(x) - F(x)}{1 - F(x)}. \qquad\Box$
where
and
All of the above is true. [The unusual thing in this example is that $E\,X^2(x)$ is
easier to evaluate than $E\langle X\rangle(x)$.] Additionally,
provided $F$ is such that $F$ puts no mass at $F^{-1}(1)$ [i.e., $T < F^{-1}(1)$ a.s.]. Again,
this is true.
Exercise 3. After you gain some familiarity with the methods of Appendix
B, come back and verify all statements made in the Heuristic Discussion of
Section 0. They are a special case of results in the next chapter.
studied by adding up results for the $\mathbb{F}_{ni}$ and $M_{ni}$ using weights $c_{ni}/\sqrt{c'c}$. Note that

(40)  $\mathbb{M}_{0n}(x) = \displaystyle\sum_{i=1}^{n} \frac{c_{ni}}{\sqrt{c'c}}\Bigl[1_{[X_i\le x]} - \int_{-\infty}^{x} 1_{[X_i\ge y]}\,d\Lambda(y)\Bigr]$  for $-\infty < x < \infty$,

so all our previous results apply to the truly basic martingales $M_{0i}$, $1 \le i \le n$. Also

(1)  $\mathbb{M}_n(x) = \displaystyle\sum_{i=1}^{n} \frac{1}{\sqrt{n}}\Bigl[1_{[X_i\le x]} - \int_{-\infty}^{x} 1_{[X_i\ge y]}\,d\Lambda(y)\Bigr]$.
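For continuous $F$ the compensator in (1) simplifies: $\Lambda = -\log(1-F)$ and $\int_{-\infty}^x 1_{[X_i\ge y]}\,d\Lambda(y) = \Lambda(x\wedge X_i)$, so $M_n(x) = \sqrt{n}\,[\mathbb{F}_n(x) - n^{-1}\sum_i \Lambda(x\wedge X_i)]$. A small simulation sketch of this zero-mean structure (the Exponential(1) choice, the seed, and the function name are assumptions made only for illustration):

```python
import math
import random

# Sketch of the basic martingale M_n(x) of (1) for continuous F, where
# Lambda(y) = -log(1 - F(y)) and int_{-inf}^x 1_{[X_i >= y]} dLambda(y)
#           = Lambda(min(x, X_i)).
# Here F is Exponential(1), so Lambda(t) = t.  Illustration only.

def basic_martingale(xs, x):
    n = len(xs)
    empirical = sum(1 for xi in xs if xi <= x) / n    # F_n(x)
    compensator = sum(min(x, xi) for xi in xs) / n    # mean of Lambda(x ^ X_i)
    return math.sqrt(n) * (empirical - compensator)

random.seed(0)
xs = [random.expovariate(1.0) for _ in range(2000)]
# E M_n(x) = 0, and M_n(x) is asymptotically N(0, V(x)); for this sample
# size the realized value should be of moderate size.
m = basic_martingale(xs, 1.0)
assert abs(m) < 5.0
```

The compensator term is what centers the empirical df, making $M_n$ a mean-zero martingale in $x$.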
Since the case $\psi\mathbb{M}_n$ is very routine and does not seem to hold much interest,
we will concentrate instead on the important cases $\psi\mathbb{U}_n(F)$ and $\psi\mathbb{W}_n(F)$.
Our theorem can be looked on as a rederivation of part of Theorem 3.7.1.
However, we will now work on $(-\infty, \infty)$ in terms of $F$, and we will obtain a
particularly interesting $q$-function as a corollary.
We will call $\psi$ u-shaped if

(3)  $\psi$ is $>0$, $\searrow$ on $(-\infty, \theta]$, and $\nearrow$ on $[\theta, \infty)$ for some $\theta$.

Theorem 1. If the u.a.n. condition (2) holds and $\psi \in \mathcal{L}_2(V)$ is u-shaped, then
and
(c)  $\le \varepsilon^{-2}\displaystyle\int_{\theta'}^{\infty} \frac{\psi^2(1-F)^2}{(1-F)(1-F_-)}\,dF = \varepsilon^{-2}\int_{\theta'}^{\infty}\psi^2\,dV$   by (A.9.18).

Now $[\theta', \infty)$ is symmetric to $(-\infty, \theta]$, and $[\theta, \theta']$ is trivial. Thus (5) holds.
Then (4) is a special case of (5). We wish to summarize for
later reference two statements proved above: for any $\psi \in \mathcal{L}_2(V)$ that is right
(or left) continuous and u-shaped,

(8)  $EX^2 < \infty$.
PROCESSES OF THE FORM $\psi M_n$, $\psi U_n(F)$, AND $\psi W_n(F)$ 275
(a)  $\displaystyle\int \psi^2\,dF = \int [\psi(F^{-1})]^2\,dI = EX^2 < \infty$.

Thus the first statement of (9) holds. Now $F^{-1}$ has only a countable number
of discontinuities, and the chance some $\xi_{ni}$ assumes one of these values is 0.
Thus, the two suprema in (9) are a.s. equal. This holds for $\mathbb{W}_n$ also. ❑
(11)  $P(\|\psi\mathbb{M}_n\|_{-\infty}^{\theta} \ge \varepsilon) \le \varepsilon^{-2}\displaystyle\int_{-\infty}^{\theta}\psi^2\,dV$   by Inequality A.2.11;

here

$P(\|\psi\mathbb{M}_n\|_{-\infty}^{\theta} \ge \varepsilon) = P(\|\int \psi\,dM_n\|_{-\infty}^{\theta} > \varepsilon)$   by (6.1.31)
$= P(\|\mathbb{K}_n\|_{-\infty}^{\theta} \ge \varepsilon) \le \varepsilon^{-2} E\langle\mathbb{K}_n\rangle(\theta)$   by Doob's inequality (Inequality A.10.1),

since
since
(14) K(B)=
where
_ 1 _ f 1 1
with $q_\theta \equiv 1_{[0,\theta]}/q$,

(15)  $Y = q_\theta(\xi) - \displaystyle\int_0^1 (1-t)\,q_\theta(t)\,dt$

and

(16)  $EY = 0$,  $\mathrm{Var}[Y] = \displaystyle\int_0^\theta q^{-2}\,dt$,  and  $E|Y|^{2+\delta} \le C_\delta \displaystyle\int_0^\theta q^{-(2+\delta)}\,dt$.

Exercise 4. Suppose that $r > 2$ and $q(t) = [t(1-t)]^{1/r}$ for $0 \le t \le 1$. Use (17)
and inequality (A.5.6) to show that for any $0 \le p < r$ we have

$\le C_p E|Y|^p$  if $p \ge 2$.
(1)  $\mathbb{M}_n(x) = \displaystyle\sum_{i=1}^{n}\frac{c_{ni}}{\sqrt{c'c}}\Bigl[1_{[X_i\le x]} - \int_{-\infty}^{x}1_{[X_i\ge y]}\,d\Lambda\Bigr]$.
PROCESSES OF THE FORM $\int_{-\infty} h\,dM_n$ 277
Theorem 1. Suppose

(2)  $\mathbb{K}_n(x) = \displaystyle\int_{-\infty}^{x} h\,dM_n$  for $-\infty < x < \infty$,

(b)  $= \displaystyle\int_{-\infty}^{x} h^2\,E[1-\mathbb{F}_{n-}]\,(1-\Delta\Lambda)\,d\Lambda$   by Fubini
  $= \displaystyle\int_{-\infty}^{x} h^2[1-\Delta\Lambda]\,dF = \int_{-\infty}^{x} h^2\,dV$.
We now apply Theorem B.3.1 to claim that $M_{0i}$ is a 0-mean martingale with
Theorem B.3.1 applies to the process $\mathbb{M}_n$ (which is not a counting process in
the $\mathbb{U}_n(F)$ case, if $F$ has even one discontinuity). We will "add up" our
previous results for the $M_i$. Thus the sum of martingales is a martingale, and
_
(h) cn; [O(ia(x) – (k(;)(x)]+^^ S"—`D,r(x)lii(x) by (f), (k)
=,cc i #j CC
_ n Cni x (
(I) —K n( x ) — = K (x) — f- h 2 [1 —
r (K1)( x) „_](1 — AA) dA
W
as claimed. For (h) we write $\mathbb{K}_i(y) = \mathbb{K}_i(x) + \int_x^y h\,dM_i$ and get

$E(\mathbb{K}_i(y)\mathbb{K}_j(y) \mid \mathcal{A}_x) = \mathbb{K}_i(x)\mathbb{K}_j(x) + E\Bigl(\displaystyle\int_x^y h\,dM_i \int_x^y h\,dM_j \Bigm| \mathcal{A}_x\Bigr) + \mathbb{K}_i(x)\,E\Bigl(\displaystyle\int_x^y h\,dM_j \Bigm| \mathcal{A}_x\Bigr) + \mathbb{K}_j(x)\,E\Bigl(\displaystyle\int_x^y h\,dM_i \Bigm| \mathcal{A}_x\Bigr)$

(k)  $= \mathbb{K}_i(x)\mathbb{K}_j(x) + 0 + 0 + 0$.

Since $\mathbb{K}_i$ is a martingale, we get two 0's in (k). For the last 0 note that

(l)  $\displaystyle\int_x^y h\,dM_i \int_x^y h\,dM_j = 0$  on the event $[X_i > x$ and $X_j > x]^c$,

while

(m)  $M_i$ and $M_j$ are conditionally independent on $[X_i > x$ and $X_j > x]$.
Theorem 2. Suppose $h \in \mathcal{L}_2(V)$ as in (3) and the $c_{ni}$'s satisfy the u.a.n.
condition (6.2.2). Then there exist rv's, to be denoted by $\mathbb{K}(x) = \int_{-\infty}^{x} h\,dM$,
satisfying
and $\mathbb{K}_n = \int h\,dM_n$ satisfies

(9)  $\mathbb{K}_n(x) \to_p \mathbb{K}(x)$ for each $-\infty < x < \infty$, for the special construction.

Moreover,
Theorem 3. Suppose the $c_{ni}$'s satisfy the u.a.n. condition (6.2.2), $\psi h \in \mathcal{L}_2(V)$,
and $\psi$ is $>0$ and $\searrow$ on $(-\infty, +\infty]$. Then
where $\mathbb{K} = \mathbb{S}(V_{\mathbb{K}})$.
(12)  $\displaystyle\int_{-\infty}^{x} 1\,d M = M(x)$.

If $h, \tilde h \in \mathcal{L}_2(V)$, then

(14)  $\displaystyle\int_{-\infty}^{x\wedge y} h\tilde h\,[1-\Delta\Lambda]\,dF = V_{h\tilde h}(x\wedge y)$.

(a)  $\displaystyle\int (h - h_\varepsilon)^2\,dV < \varepsilon^3$.
All that remains is to verify an analog of Eq. (i) in the proof of Theorem 3.4.2.
Thus, as (7) shows,
(b)  $\mathrm{Var}\Bigl[\displaystyle\int_{-\infty}^{x}(h-h_\varepsilon)\,d\mathbb{M}_n\Bigr] = \int_{-\infty}^{x}(h-h_\varepsilon)^2\,dV < \varepsilon^3$,

as was needed to complete this part of that proof. We have thus established the claim
by (c) and (17). The covariance claimed above comes from applying the same
argument, but using a bivariate version of (17) and (18). Also, (4)-(6) imply
$\mathrm{Cov}[\mathbb{K}(x), \mathbb{K}(y)] = V_{\mathbb{K}}(x\wedge y)$. Hence $\mathbb{K} = \mathbb{S}(V_{\mathbb{K}})$. ❑
a simple time change from $[-\infty, \infty)$ to $[0, \infty)$ in Rebolledo's central limit
theorem (Theorem B.5.2) suggests that $\mathbb{K} = \mathbb{S}(V_{\mathbb{K}})$. In the proof of Theorem
7.8.2 we verify Rebolledo's ARJ(2) condition in a setting more general than
the present one. ❑
Proof of Theorem 3. Since $\mathbb{K}_n$ and $\mathbb{K}$ are both martingales having covariance
function $V_{\mathbb{K}}$, both statements are immediate from the Birnbaum and Marshall inequality
(Inequality A.10.4).
In order to treat the middle $[\theta_1, \theta_2]$, we choose a step function $h_\varepsilon$ of the
type (3.4.23) so that

(a)  $\displaystyle\int_{\theta_1}^{\theta_2} (h-h_\varepsilon)^2\,dV < \varepsilon^3/[\psi^2(\theta_1) \vee \psi^2(\theta_2)]$

[as in line (d) of the proof of Theorem 3.4.2]. Then applying the Birnbaum
and Marshall inequalities to the martingales $\int(h-h_\varepsilon)\,dM_n$ and $\int(h-h_\varepsilon)\,dM$ gives

(b)  $P\Bigl(\Bigl\|\psi\displaystyle\int_{\theta_1}^{\cdot}(h-h_\varepsilon)\,dM_n\Bigr\|_{\theta_1}^{\theta_2} \ge \varepsilon\Bigr) \le \varepsilon^{-2}\int_{\theta_1}^{\theta_2}\psi^2(h-h_\varepsilon)^2\,dV < \varepsilon$

and

(c)  $P\Bigl(\Bigl\|\psi\displaystyle\int_{\theta_1}^{\cdot}(h-h_\varepsilon)\,dM\Bigr\|_{\theta_1}^{\theta_2} \ge \varepsilon\Bigr) \le \varepsilon^{-2}\int_{\theta_1}^{\theta_2}\psi^2(h-h_\varepsilon)^2\,dV < \varepsilon$.
Integration by parts and then (6.1.21) show (with $\theta_1$ and $\theta_2$ continuity points)

$\Bigl|\displaystyle\int_{\theta_1}^{\theta_2} h_\varepsilon\,dM_n - \int_{\theta_1}^{\theta_2} h_\varepsilon\,dM\Bigr| \le \|M_n - M\|_{\theta_1}^{\theta_2}\,|h_\varepsilon(\theta_2)| + |M_n(\theta_1) - M(\theta_1)|\,|h_\varepsilon(\theta_1)| + \|M_n - M\|_{\theta_1}^{\theta_2}\displaystyle\int_{\theta_1}^{\theta_2} d|h_\varepsilon|$

(d)  $\to_{a.s.} 0$.
For the tail $[\theta_2, \infty)$ we note that $\psi$ is irrelevant, and the method of (19) gives the needed bound,
with an analogous result for $\mathbb{K}$. Combining (19), (20), (b), (c), (d), and (e) gives

$\mathbb{K}_{0n}(x) = \displaystyle\int_{-\infty}^{x} h\,dM_{0n}$.

The first line of (21) is a hypothesis on $h$. The result is true since $Z_n \Rightarrow Z$.
Suppose

(1)  $h \in \mathcal{L}_2(F)$.

Then

$\displaystyle\int_{-\infty}^{x} h\,d\mathbb{U}_n(F) = \int_{-\infty}^{x} h(y)\,dM_n(y) + \int_{-\infty}^{x} \frac{1}{1-F(z)}\Bigl[\int_z^x h(y)\,d[1-F(y)]\Bigr]\,dM_n(z)$.
PROCESSES OF THE FORM $\int_{-\infty} h\,dU_n(F)$ AND $\int_{-\infty} h\,dW_n(F)$ 283
(2)  $\displaystyle\int_{-\infty}^{x} h\,d\mathbb{U}_n(F) = \mathbb{K}_{0n}(x) = \int_{-\infty}^{x} h_x^*\,dM_{0n}$,

for which we note that

(8)  $h \in \mathcal{L}_2(F)$ implies all $h_x^* \in \mathcal{L}_2(V)$, for the $h^*$ of (3) and $V$ of (6.1.18).
Theorem 1. Suppose $h, \tilde h \in \mathcal{L}_2(F)$ and the $c_{ni}$'s satisfy the u.a.n. condition
(6.2.2). Then there exist rv's, to be denoted by $\int_{-\infty}^{x} h\,d\mathbb{U}(F) = \int_{-\infty}^{x} h_x^*\,dM$ and
$\int_{-\infty}^{y} \tilde h\,d\mathbb{U}(F) = \int_{-\infty}^{y} \tilde h_y^*\,dM$, such that

(9)  $\Bigl(\displaystyle\int_{-\infty}^{x} h\,d\mathbb{U}_n(F),\ \int_{-\infty}^{y} \tilde h\,d\mathbb{U}_n(F)\Bigr) \to_d \Bigl(\int_{-\infty}^{x} h\,d\mathbb{U}(F),\ \int_{-\infty}^{y} \tilde h\,d\mathbb{U}(F)\Bigr) \cong N_2\Bigl(0, \begin{pmatrix} V_{h,h}(x,x) & V_{h,\tilde h}(x,y) \\ V_{h,\tilde h}(x,y) & V_{\tilde h,\tilde h}(y,y) \end{pmatrix}\Bigr)$

284 MARTINGALE METHODS

where

(10)  $V_{h,\tilde h}(x, y) = \displaystyle\int_{-\infty}^{x\wedge y} h\tilde h\,dF - \int_{-\infty}^{x} h\,dF \int_{-\infty}^{y} \tilde h\,dF$  for all $x, y$.
Theorem 2. Suppose the $c_{ni}$'s satisfy the u.a.n. condition (6.2.2). Suppose $\psi$, $h$,
and $\psi h$ are all in $\mathcal{L}_2(F)$ and $\psi$ is $>0$ and $\searrow$ on $(-\infty, +\infty]$. Then
$\displaystyle\int_{-\infty}^{x} h\,d\mathbb{U}_n(F) = \int_{-\infty}^{x} h\,d\Bigl[(1-F)\,\frac{\mathbb{U}_n(F)}{1-F}\Bigr]$

(a)  $= \displaystyle\int_{-\infty}^{x} h[1-F]\,d\Bigl(\frac{\mathbb{U}_n(F)}{1-F}\Bigr) + \int_{-\infty}^{x} h\,\Bigl(\frac{\mathbb{U}_n(F)}{1-F}\Bigr)_-\,d[1-F]$

(b)  $= \displaystyle\int_{-\infty}^{x} h\,d\mathbb{M}_n + \int_{-\infty}^{x} \frac{\mathbb{U}_{n-}(F)}{1-F_-}\,h\,d[1-F]$.
Thus

(14)  $\displaystyle\int_{-\infty}^{x} h\,d\mathbb{U}_n(F) = \mathbb{B}_{n1}(x) - \mathbb{B}_{n2}(x) + \mathbb{B}_{n3}(x)$,  with  $\mathbb{B}_{n1}(x) = \displaystyle\int_{-\infty}^{x} h\,dM_n$.

Then

(e)  $P\Bigl(\Bigl\|\psi\displaystyle\int_{-\infty}^{\cdot} h\,d\mathbb{U}_n(F)\Bigr\|_{\theta_2}^{\infty} \ge 3\varepsilon\Bigr) \le \sum_{i=1}^{3} P(\|\psi\mathbb{B}_{ni}\|_{\theta_2}^{\infty} \ge \varepsilon)$,

and we will bound each term on the right of (e). Now

(f)  $P(\|\psi\mathbb{B}_{n1}\|_{\theta_2}^{\infty} \ge \varepsilon) \le \varepsilon^{-2}\displaystyle\int_{\theta_2}^{\infty}\psi^2 h^2\,dF$

and

(h)  $P(\|\psi\mathbb{B}_{n3}\|_{\theta_2}^{\infty} \ge \varepsilon) \le \varepsilon^{-2}\displaystyle\int \frac{\psi^2(z)}{[1-F(z)]^2}\int h(r)\,dF(r)\int h(s)\,dF(s)\,dV(z)$
  $= \varepsilon^{-2}\displaystyle\int\!\!\int h(r)h(s)\Bigl[\int_{r\vee s}\frac{\psi^2}{(1-F)^2}\,dV\Bigr]\,dF(r)\,dF(s)$
  $\le \varepsilon^{-2}\displaystyle\int \psi^2\Bigl(\int |h|\,dF\Bigr)^2 d[F/(1-F)]$.

Combining (f), (g), and (h) into (e) gives the claimed inequality. ❑
(15)  $P\Bigl(\Bigl\|\psi\displaystyle\int_{-\infty}^{\cdot} h\,d\mathbb{U}(F)\Bigr\|_{\theta}^{\infty} \ge 2\varepsilon\Bigr) \le 2\varepsilon^{-2}\int_{\theta}^{\infty}\psi^2 h^2\,dF$

and
It is clear from (14) and Theorem 6.3.3 that for some $n_\varepsilon$ we have
for any fixed $\theta_2$. We now suppose $\theta_2$ chosen so large that ($\psi$ is irrelevant on
$[\theta_2, \infty)$ since $\psi$ is $\searrow$) a symmetric argument gives
In the spirit of Remark 3.1.1, we let $\mathbb{M}_n$ denote the basic martingale of (6.1.10)
associated with $\mathbb{U}_n$, while the basic martingale of (6.1.40) associated with
$\mathbb{W}_n$ is denoted by $M_n^c$. As in (3.1.24) we let $\rho_n = \rho_n(c, 1) = c'1/\sqrt{(c'c)(1'1)}$;
and we assume that $\rho_n \to \rho$ as $n \to \infty$ for some number $\rho = \rho_c$. Then (6.1.16) gives
for $V(x) = \int_{-\infty}^{x}(1-\Delta\Lambda)\,dF$. From (6.1.49) and then (6.3.14) we obtain
In similar fashion
(21)  $\mathrm{Cov}\Bigl[\mathbb{W}_n(F(x)),\ \displaystyle\int_{-\infty}^{y} h\,d\mathbb{W}_n(F)\Bigr] = \mathrm{Cov}\Bigl[\int_{-\infty}^{x} 1\,d\mathbb{W}_n(F),\ \int_{-\infty}^{y} h\,d\mathbb{W}_n(F)\Bigr] = V_{1,h}(x, y)$

and

(22)  $\mathrm{Cov}\Bigl[\mathbb{U}_n(F(x)),\ \displaystyle\int_{-\infty}^{y} h\,d\mathbb{W}_n(F)\Bigr] = \rho_n V_{1,h}(x, y)$.
For the limiting process: Just replace p„ by p in (16), (18), (20), and (22),
while (17), (19), and (21) remain unchanged.
Replacing h by h„
Consider now

(23)  $\tilde{\mathbb{K}}_n = \displaystyle\int_{-\infty}^{\cdot} h_n\,d\mathbb{W}_n(F), \qquad \mathbb{K}_n = \int_{-\infty}^{\cdot} h\,d\mathbb{W}_n(F)$,

where

(24)  $\displaystyle\int (h_n - h)^2\,\psi^2\,dF \to 0$  as $n \to \infty$.

Then

$P\Bigl(\Bigl\|\psi\Bigl[\displaystyle\int_{-\infty}^{\cdot} h_n\,d\mathbb{W}_n(F) - \mathbb{K}_n\Bigr]\Bigr\| \ge 3\varepsilon\Bigr) = P\Bigl(\Bigl\|\psi\int_{-\infty}^{\cdot}(h_n - h)\,d\mathbb{W}_n(F)\Bigr\| \ge 3\varepsilon\Bigr)$

(c)  $< \varepsilon$  for $n \ge$ some $n_\varepsilon$.
Theorem 3. Suppose the $c_{ni}$'s satisfy the u.a.n. condition (6.2.2). Suppose $\psi$, $h$,
and $\psi h$ are all in $\mathcal{L}_2(F)$ and $\psi$ is $>0$ and $\searrow$ on $(-\infty, +\infty]$. Suppose also that
$h_n$ approximates $h$ in the sense of (24). Then
5. PROCESSES OF THE FORM $\int_{-\infty} \mathbb{M}_n\,dh$, $\int_{-\infty} \mathbb{U}_n(F)\,dh$, AND
$\int_{-\infty} \mathbb{W}_n(F)\,dh$
Basically, these processes are treated simply via integration by parts. We call
$h$ of bounded variation inside $(-\infty, \infty)$, denoted $h$ is BVI $(-\infty, \infty)$, if $h$ is of
bounded variation on $(-\theta, \theta)$ for all real $\theta$.
Theorem 1. Suppose the $c_{ni}$'s satisfy the u.a.n. condition (6.2.2). Suppose $h$
is BVI $(-\infty, \infty)$. Suppose $\psi > 0$ is $\searrow$ on $(-\infty, +\infty]$ and $|\psi h_+|$ is bounded by a
u-shaped function. Suppose $\psi$, $h_+$, $h_-$, $\psi h_+$, and $\psi h_-$ are in $\mathcal{L}_2(F)$. Then
where $\mathbb{U}_{ni}(F) = \mathbb{F}_{ni} - F$ is the empirical process of the one observation $X_{ni}$,
by Theorems 6.2.1 and 6.4.2, provided the $c_{ni}$'s satisfy the u.a.n. condition (6.2.2),
provided (for Theorem 6.2.1) $\psi h_+$ is u-shaped and $\psi h_+ \in \mathcal{L}_2(V)$, and provided
(for Theorem 6.4.2) $h_-, \psi \in \mathcal{L}_2(F)$, $\psi$ is $\searrow$, and $\psi h_- \in \mathcal{L}_2(F)$.
REDUCTIONS WHEN F IS UNIFORM 291
The process $\mathbb{U}_n(F)$ is a special case of $\mathbb{W}_n(F)$. The proof for $\mathbb{M}_n$ is identical,
except that Exercise 6.2.1 replaces Theorem 6.2.1 and Theorems 6.3.2 and 6.3.3
replace Theorem 6.4.2. ❑
Exercise 1. Show that, under the appropriate hypothesis from above,
(9) Coy I
L
J x W (F) dh, f
1
-
h dW (F)n I
L Li . hdF
-F(z)
L J
1
i dF dh(z).
1—s
$M_n(t) = \mathbb{U}_n(t) + \displaystyle\int_0^t \frac{\mathbb{U}_n(s)}{1-s}\,ds$  for $0 \le t \le 1$

for the basic martingale. Setting $h = 1$ in (6.4.14) reduces it to the same identity
as (6.1.31), namely

(2)  $\mathbb{U}_n(t) = (1-t)\displaystyle\int_0^t \frac{1}{1-s}\,dM_n(s)$  for $0 \le t \le 1$.

(4)  $M(t) = \mathbb{Z}(t) = \mathbb{U}(t) + \displaystyle\int_0^t \frac{\mathbb{U}(s)}{1-s}\,ds$  for $0 \le t \le 1$,

(5)  $\mathbb{U}(t) = (1-t)\displaystyle\int_0^t \frac{1}{1-s}\,dM(s)$  for $0 \le t \le 1$,

(7)  $\displaystyle\int_0^t h\,dM = \int_0^t h\,d\mathbb{U} + \int_0^t h(s)\,\frac{\mathbb{U}(s)}{1-s}\,ds$  for $0 \le t \le 1$, $h \in \mathcal{L}_2$,

where the rhs of (8) is rich in martingale structure. Finally, (6.4.2) and (6.4.3)
lead to

(9)  $\displaystyle\int_0^t h\,d\mathbb{U} = \int_0^t \Bigl[h(s) - \frac{1}{1-s}\int_s^t h(r)\,dr\Bigr]\,dM(s)$  for $0 \le t \le 1$, $h \in \mathcal{L}_2$.
Of course
0. INTRODUCTION
(2)  $1 - \mathbb{F}_n(t) = \displaystyle\prod_{i:\,Z_{n:i}\le t}\Bigl(1 - \frac{1}{n-i+1}\Bigr)^{\delta_{n:i}}$  for $0 \le t < \infty$,

where $0 \le Z_{n:1} \le \cdots \le Z_{n:n} < \infty$ and the $\delta_{n:i}$ are the corresponding $\delta$'s. [If there are
ties in the $Z$'s, the uncensored $Z$'s ($\delta = 1$) are ranked ahead of the censored
$Z$'s ($\delta = 0$).] Thus $1 - \mathbb{F}_n$ is a right-continuous, $\searrow$ step function on $[0, \infty)$ with
constant value 0 or $\prod_{i=1}^{n}(1 - 1/(n-i+1))^{\delta_{n:i}}$ on $[Z_{n:n}, \infty)$ depending on
whether $\delta_{n:n}$ is equal to 1 or 0, respectively. We have thus chosen the convention
294 CENSORED DATA AND THE PRODUCT-LIMIT ESTIMATOR
that $1 - \mathbb{F}_n$ equals $1 - \mathbb{F}_n(Z_{n:n})$ for $t \ge Z_{n:n}$ when $\delta_{n:n} = 0$; other conventions set
it equal to zero or leave it undefined in this case: see Meier (1975), Peterson
(1977), and Gill (1980). We focus on $\mathbb{F}_n$ since it will be shown in Section 8 to
be the maximum likelihood estimate of $F$ based on the $(Z_i, \delta_i)$'s.
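The product over uncensored observations in (2) is easy to compute directly. A minimal sketch of the product-limit estimator with the stated tie-breaking convention (the function name and toy data are illustrative assumptions, not from the text):

```python
# Sketch of the product-limit (Kaplan-Meier) estimator of (2):
#   1 - F_n(t) = prod_{i: Z_{n:i} <= t} (1 - 1/(n - i + 1))^{delta_{n:i}},
# with uncensored observations ranked ahead of censored ones at ties.
# Names and data are illustrative only.

def product_limit_survival(data, t):
    """data: list of (z, delta) with delta = 1 uncensored, 0 censored.
    Returns 1 - F_n(t)."""
    n = len(data)
    # Sort by z; at ties, uncensored (delta = 1) come first.
    ordered = sorted(data, key=lambda zd: (zd[0], -zd[1]))
    surv = 1.0
    for i, (z, delta) in enumerate(ordered, start=1):
        if z <= t and delta == 1:
            surv *= 1.0 - 1.0 / (n - i + 1)
    return surv

data = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 0)]
# Factors only at uncensored times 1, 3, 4:
# (1 - 1/5), then (1 - 1/3), then (1 - 1/2).
assert abs(product_limit_survival(data, 4.5) - (4/5) * (2/3) * (1/2)) < 1e-12
```

Note how a censored observation contributes no factor itself but still shrinks the risk set $n - i + 1$ for later uncensored times.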
Our object in this chapter will be to present results for $\mathbb{F}_n$ and $\mathbb{X}_n$ which parallel
results given for $\mathbb{F}_n$ and $\mathbb{U}_n(F)$ in the uncensored case of previous chapters.
A major difference in our approach here will be that in the latter half of the
chapter we will rely heavily on the martingale theory of counting processes,
as recently developed by Aalen (1978a, b), Brémaud and Jacod (1977), Gill
(1980), Meyer (1976), and Rebolledo (1978). Some of the key results of that
theory are summarized in Appendix B. A nice elementary treatment is found
in Miller (1981).
Before describing the contents of the remainder of this chapter, we need
the following further notation and definitions. Note that the pairs $(Z_i, \delta_i)$,

$H^0(t) = P(Z \le t, \delta = 0) = \displaystyle\int_{[0,t]} (1-F)\,dG$,

for $0 \le t < \infty$, so $H = H^1 + H^0$. For any df $F$ let $\tau_F = F^{-1}(1) = \inf\{t: F(t) = 1\}$,

(5)  $\mathbb{H}_n^1(t) = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} 1_{[0,t]}(Z_i)\,\delta_i$  for $0 \le t < \infty$,

(6)  $\mathbb{H}_n^0(t) = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} 1_{[0,t]}(Z_i)(1-\delta_i)$.

(7)  $\Lambda(t) = \displaystyle\int_{[0,t]} \frac{dF}{1-F_-} = \int_{[0,t]} \frac{(1-G_-)\,dF}{(1-F_-)(1-G_-)}$

(8)  $= \displaystyle\int_{[0,t]} \frac{1}{1-H_-}\,dH^1$  for $0 \le t < \tau_G$.
[If $F$ is absolutely continuous with density $f$ and hazard rate $\lambda = f/(1-F)$,
then $\Lambda(t) = \int_0^t \lambda(s)\,ds$ and $1 - F(t) = \exp(-\Lambda(t))$.] In view of (8), a natural
empirical cumulative hazard function $\hat{\Lambda}_n$ is defined by
We choose to define $\hat{\Lambda}_n$ in analogy with (8) rather than (7) since it is obvious
that $\|\mathbb{H}_n - H\| \to_{a.s.} 0$ and $\|\mathbb{H}_n^1 - H^1\| \to_{a.s.} 0$, while $\sup_{0\le t\le\theta}|\mathbb{F}_n - F| \to_{a.s.} 0$ is a
much deeper result that is to be shown in Section 3 [by first showing that the
$\hat{\Lambda}_n$ of (9) satisfies $\|\hat{\Lambda}_n - \Lambda\|_0^\theta \to_{a.s.} 0$ for any $\theta < \tau = H^{-1}(1)$]. In Section 2 we
(10)  $\hat{\Lambda}_n(t) = \displaystyle\int_{[0,t]} \frac{1}{1-\mathbb{F}_{n-}}\,d\mathbb{F}_n$  for $0 \le t < \infty$,

and that

(11)  $(1 - \mathbb{F}_n)(1 - \mathbb{G}_n) = 1 - \mathbb{H}_n$  for  $1 - \mathbb{G}_n(t) = \displaystyle\prod_{i:\,Z_{n:i}\le t}\Bigl(1 - \frac{1}{n-i+1}\Bigr)^{1-\delta_{n:i}}$.
(13) G`„(t)=
fco,t]
(I -9n-) dA
We also consider
The study of $\mathbb{X}_n$, our main interest, is facilitated by introducing these other
two processes.
In Section 2 we will establish identities that represent $\mathbb{B}_n$ and $\mathbb{X}_n/(1-F)$
as stochastic integrals with respect to the basic martingale (see Theorem 7.2.1)
(17) J=
I
r]
S.
for s> 0, while
$M_n(t) = \sqrt{n}\Bigl[\mathbb{H}_n^1(t) - \displaystyle\int_0^t (1-\mathbb{H}_{n-})\,d\Lambda\Bigr] = \sqrt{n}\,[\mathbb{H}_n^1(t) - H^1(t)] + \int_0^t \sqrt{n}\,(\mathbb{H}_{n-} - H_-)\,d\Lambda$

(2)  $= \sqrt{n}\,[\mathbb{H}_n^1(t) - H^1(t)] + \displaystyle\int_0^t \frac{\sqrt{n}\,(\mathbb{H}_{n-} - H_-)}{1-H_-}\,dH^1$.
CONVERGENCE OF THE BASIC MARTINGALE M„ 297
$M_n(t) = \dfrac{1}{\sqrt{n}}\displaystyle\sum_{i=1}^{n}\Bigl[1_{[Z_i\le t]}\,\delta_i - \int_0^t 1_{[Z_i\ge u]}\,d\Lambda(u)\Bigr]$

(3)  $\equiv \dfrac{1}{\sqrt{n}}\displaystyle\sum_{i=1}^{n} M_i(t)$,
= E(1 [z s] 61 [z .,, ․ )+ f5 J0 0
E[1 [z ^ U] I [z 0] ] dA(u) dA(v)
rs
—
=
J 0
S(1—G_)dF
+{
f
l , .
^s [1 —H(u v v —)] dA(u) dA(v)
0
+
f J [1
o" 5
' —H(v—)]dA(v) dA(u)}
1
=
J p
5 (1— G_) dF—
(5')  $= \displaystyle\int_0^s (1-G)\,dF = H^1(s)$   for continuous $F$ and general $G$.
Theorem 1. Suppose $F$ and $G$ are arbitrary df's on $[0, \infty)$. There exists a
special construction of both the rv's $X_1, Y_1, \ldots, X_n, Y_n$, $n \ge 1$, and the
Brownian motion $\mathbb{S}$ for which
Proof. By the ordinary CLT based on (3), and by (5), we easily have
for $0 \le t < \infty$,
where the $1_{[Z_i\le t]}\delta_i$ are iid Bernoulli $(H^1(t))$ rv's. Thus a minor variation on
our ordinary theory of empirical processes shows that
with Brownian bridge $\mathbb{V}$, for a special construction; recall that $H^1$ is a sub-df.
Also
satisfies
for an appropriate uniform empirical process $\mathbb{U}_n$, so that we may also assume
of our special construction that
we define
with
since $H = H^1 + H^0$,
since
by (3.2.58).
Note that we did not have to use (15), (17), and (18) to determine the
covariance function of $M$. Rather, (5) sufficed. ❑
(21)  $\langle M_i\rangle(t) = \displaystyle\int_0^t [1-\Delta\Lambda_i]\,d\Lambda_i = \int_0^t 1_{[Z_i\ge s]}[1-\Delta\Lambda]\,d\Lambda$,

so that
recall from the introduction that this means F is not degenerate, while G
could even be degenerate at +co.
We begin by stating the key identities; the first is elementary. Let

(3)  $\mathbb{B}_n(t) = \displaystyle\int_{[0,t]}\frac{1}{1-\mathbb{H}_{n-}}\,dM_n$  for $0 \le t \le T$

and

(4)  $\dfrac{\mathbb{X}_n(t)}{1-F(t)} = \displaystyle\int_{[0,t]}\frac{1-\mathbb{F}_{n-}}{(1-F)(1-\mathbb{H}_{n-})}\,dM_n$  for $0 \le t < \tau_F$ and $0 \le t \le T$.
Proposition 1. (Aalen and Johansen; Gill) For any df $F$ on $[0, \infty)$, and $\mathbb{F}_n$, $\Lambda$,
and $\hat{\Lambda}_n$ defined by (7.0.2), (7.0.8), and (7.0.9), respectively, we have, for
$0 \le t < \infty$,
and

(7)  $\dfrac{1-\mathbb{F}_n(t)}{1-F(t)} = 1 - \displaystyle\int_{[0,t]}\frac{1-\mathbb{F}_{n-}}{1-F}\,d(\hat{\Lambda}_n - \Lambda)$

or
Let $\tau_B = \inf\{t: B(t) = \infty\}$. Then the unique locally (i.e., on each $[0, t]$) bounded
solution $Z$ of

(9)  $Z(t) = 1 - \displaystyle\int_0^t \frac{Z(s-)}{1-\Delta B(s)}\,d[A(s) - B(s)]$

on $[0, \tau_B)$ is given by
Proofs
Proof of Lemma 1. We follow Gill (1980), who adapted the proof of Lemma
18.8 in Liptser and Shiryayev (1978).
We now show that (10) does define a solution to (9). The right-hand side
in (10) is clearly locally bounded on [0, TB ). Let
$U(t) = \displaystyle\prod_{0<s\le t}\frac{1-\Delta A(s)}{1-\Delta B(s)}$
and
Then using

(a)  $\Delta U(s) = U(s) - U(s-) = U(s-)\Bigl[\dfrac{U(s)}{U(s-)} - 1\Bigr] = U(s-)\Bigl[\dfrac{1-\Delta A(s)}{1-\Delta B(s)} - 1\Bigr]$

in the second term of (b) and seeing the paragraph below for (c), we obtain

$1 - Z(t) = 1 - U(t)V(t)$
$= 1 - U(0)V(0) - \displaystyle\int_0^t U(s-)\,dV(s) - \int_0^t V(s)\,dU(s)$   by (A.9.13)

(b)  $= -\displaystyle\int_0^t U_- V\,d(-A^c + B^c) - \sum_{0<s\le t} V(s)\,U(s-)\Bigl[\dfrac{1-\Delta A(s)}{1-\Delta B(s)} - 1\Bigr]$
IDENTITIES BASED ON INTEGRATION BY PARTS 303
(c)  $= \displaystyle\int_0^t Z_-\,d(A^c - B^c) + \sum_{0<s\le t} Z(s-)\,\frac{\Delta A(s) - \Delta B(s)}{1-\Delta B(s)}$

(d)  $= \displaystyle\int_0^t \frac{Z_-}{1-\Delta B}\,d(A - B)$,

where $(1-\Delta B)^{-1}$ could be introduced into the integrand of the first term of
(c) for free because $A^c$ and $B^c$ are continuous. Thus (10) satisfies (9) by (d).
To show uniqueness, suppose that $Z'$ is another locally bounded solution
of (9) on $[0, \tau_B)$. Let $\tilde Z = Z - Z'$, $L(t) = \|\tilde Z\|_0^t$, and $a(t) =
\int_0^t (1-\Delta B)^{-1}\,d(A + B) < \infty$ for $0 \le t < \tau_B$. Then, for any $s \le t$, differencing in
(9) gives
Substituting the outer inequality of (e) back into the first inequality of (e) gives
using the left-hand side of (A.9.17) with $r = 2$ in the second step. Repeating
this up to any $r$ yields
(a) = ✓n
'
Thus (3) holds. Then (4) follows from (7') and (3). ❑
are equivalent.
Exercise 4. (Breslow and Crowley, 1974) Show that

(12)  $0 \le -\log(1-\mathbb{F}_n(t)) - \hat{\Lambda}_n(t) \le \dfrac{1}{n}\displaystyle\int_{[0,t]} \frac{1}{(1-\mathbb{H}_n)(1-\mathbb{H}_{n-})}\,d\mathbb{H}_n^1$

(13)  $\le \dfrac{1}{n}\,\dfrac{1-\mathbb{F}_n(t)}{1-\mathbb{H}_n(t)}$.

[Hint: $0 \le -\log(1 - 1/(x+1)) - 1/(x+1) \le 1/[x(x+1)]$ for $0 < x < \infty$.]
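The hint's elementary inequality is easy to verify: with $u = 1/x$ it amounts to $u/(1+u) \le \log(1+u) \le u$, whose gap is at most $u^2/(1+u) = 1/[x(x+1)]$. A minimal numeric check (names are illustrative):

```python
import math

# Check of the hint's elementary inequality:
#   0 <= -log(1 - 1/(x+1)) - 1/(x+1) <= 1/(x(x+1))   for 0 < x < infinity.
# Sketch only.

def check(x):
    middle = -math.log(1.0 - 1.0 / (x + 1.0)) - 1.0 / (x + 1.0)
    return 0.0 <= middle <= 1.0 / (x * (x + 1.0))

assert all(check(x) for x in [0.01, 0.5, 1.0, 3.0, 10.0, 1e4])
```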
[Note (3) below for $\mathbb{F}_n(\tau)$.] Fix $\theta < \tau \equiv H^{-1}(1)$. Then
$+ \displaystyle\int_0^\theta (1-H_-)^{-1}\,d(\mathbb{H}_n^1 - H^1)$

(a)  $= o(1) + \displaystyle\int_0^\theta (1-H_-)^{-1}\,d(\mathbb{H}_n^1 - H^1)$   by Glivenko-Cantelli

$= \displaystyle\int (\text{left continuous})\,d(\text{right continuous})$
$= \Bigl\{\dfrac{1-\mathbb{F}_n(t)}{1-F(t)}\Bigr\}\mathbb{K}_n(t) - \displaystyle\int_0^t \mathbb{K}_n\,d\Bigl(\frac{1-\mathbb{F}_n}{1-F}\Bigr)$   by (A.9.13)

$\le \Bigl\{\dfrac{1}{1-F(\theta)}\Bigr\}|\mathbb{K}_n(t)| + \|\mathbb{K}_n\|_0^\theta \displaystyle\int_0^\theta d\Bigl\{\frac{1}{1-F}\Bigr\}$

(e)  $\le \|\mathbb{K}_n\|_0^\theta\,O(1)$  a.s.

Together (e) and (g) give $\mathbb{F}_n(t) \to_{a.s.} F(t)$. Since $\mathbb{F}_n$ and $F$ are $\nearrow$, the standard
argument of (3.1.83) improves pointwise convergence to uniform convergence
on $[0, \theta]$ for any $\theta < \tau$. This extends to uniform convergence on $[0, \tau)$, since
$F$ is bounded. This completes the proof.
The five cases of the previous paragraph include all possibilities. We can
summarize them as
but it can only fail if $\tau < \infty$, $G(\tau-) < 1$, $\Delta F(\tau) > 0$, and $F(\tau-) < 1$ [hence
$G(\tau) = 1$]. This paragraph is from Shorack and Wellner (1985). ❑
The results of this section, but with $F$ assumed continuous, can be found
in Breslow and Crowley (1974). The outline of the present proofs is somewhat
different, however, being based on the identities of Section 7.2.
Suppose $F$ and $G$ are arbitrary df's on $[0, \infty)$ in the sense of (7.2.1). We define

(1)  $C(t) = \displaystyle\int_0^t (1-H_-)^{-2}\,dV = \int_0^t (1-H_-)^{-1}(1-\Delta\Lambda)\,d\Lambda$

for

(1')  $C(t) = \displaystyle\int_0^t (1-H)^{-2}\,dH^1 = \int_0^t (1-H)^{-1}\,d\Lambda$   if $F$ is continuous,

since
$[1-H(t)]^{-1}M(t) - \displaystyle\int_{[0,t]} M_-\,d[1-H_-]^{-1}$  for $t \ge 0$,

(6)  $\mathbb{B} = \mathbb{S}(C)$

for Brownian motion $\mathbb{S}$.
Theorem 1. Let $F$ and $G$ be arbitrary df's on $[0, \infty)$ in the sense of (7.2.1).
Fix $\theta < \tau = H^{-1}(1)$. Then
We now turn to the $\mathbb{X}_n = \sqrt{n}(\mathbb{F}_n - F)$ process. Noting (7.2.4), the natural
limiting process to associate with $\mathbb{X}_n$ is

(8)  $\mathbb{X}(t) = [1-F(t)]\displaystyle\int_0^t \frac{1}{(1-F)(1-H_-)}\,dM$.
where
Theorem 2. Let $F$ and $G$ be arbitrary df's on $[0, \infty)$ in the sense of (7.2.1).
Fix $\theta < \tau = H^{-1}(1)$. Then

(17)  $\mathbb{B}_n(t) = [1-\mathbb{H}_n(t)]^{-1}M_n(t) - \displaystyle\int_0^t M_{n-}\,d[1-\mathbb{H}_{n-}]^{-1}$.
PRELIMINARY WEAK CONVERGENCE OF $\mathbb{B}_n$ AND $\mathbb{X}_n$ 309
Using $\|\mathbb{H}_n - H\| \to_{a.s.} 0$ and $\|M_n - M\| \to_{a.s.} 0$ by (7.1.7), we clearly
have for the first term of (17) that
o0
- , -
(d) =
J '
0
MI d[1 - H„ -]-'- f
o,
M-d[1
+ EAId[ 1 —Hn-]-''
(e)f o,
fM+d[1-
0
H_] - - J ft d[1 - H_] -
J
= M c d[1 -H-] '
0
-
uniformly in $0 \le t \le \theta$; convergence for each $t$ follows from the Helly-Bray
theorem, while monotonicity gives the uniformity via the standard argument
of (3.1.83).
Finally, in (d),

$\displaystyle\int_0^t \Delta M\,d[1-\mathbb{H}_{n-}]^{-1} = \sum_{s\le t}\Delta M(s)\,\Delta[1-\mathbb{H}_{n-}]^{-1}(s)$

(f)  $\to_{a.s.} \displaystyle\sum_{s\le t}\Delta M(s)\,\Delta[1-H_-]^{-1}(s)$  uniformly for $0 \le t \le \theta$

(g)  $= \displaystyle\int_0^t \Delta M\,d[1-H_-]^{-1}$;

to establish (f) we just treat a finite number of the biggest jumps (of $F$)
individually, using $\|\mathbb{H}_{n-} - H_-\| \to_{a.s.} 0$, and then lump the rest of the jumps
[note that $\Delta M(s) \ne 0$ can only occur if $\Delta F(s) > 0$]. Adding the terms in (a),
(e), and (g) gives the theorem. ❑
Proof of Theorem 2. Now by (7.2.4), for $t \le \theta$ we a.s. have that for $n$ sufficiently
large

(a)  $\dfrac{\mathbb{X}_n(t)}{1-F(t)} = \displaystyle\int_0^t \frac{1-\mathbb{F}_{n-}}{(1-F)(1-\mathbb{H}_{n-})}\,dM_n$

(b)  $= \displaystyle\int_0^t \frac{1-\mathbb{F}_{n-}}{1-F_-}\,\frac{1}{1-\mathbb{H}_{n-}}\,dM_n + \int_0^t \frac{(1-\mathbb{F}_{n-})\,\Delta F}{(1-F_-)(1-F)(1-\mathbb{H}_{n-})}\,dM_n$.

It is easy to treat the first term of (b) using the same technique applied in the
proof of Theorem 1; but since $d[(1-\mathbb{F}_{n-})/((1-F_-)(1-\mathbb{H}_{n-}))]$ is not $d$(a
monotone function), we need to use (A.9.14) and (A.9.16) to break this up
into the difference of two $d$(a monotone function) terms. It remains only to
treat the second term of (b). As in the proof of Theorem 1, the technique for
the second term in (b) is to pull out a finite number of the biggest jumps (on
which the convergence is trivial) and note that the contribution of the other
terms is negligible [provided enough terms were pulled out so that the contribu-
tion of the remaining terms to $\sum_{s\le t}\Delta F(s)$ is sufficiently small]. ❑
Exercise 3. Determine the covariance functions $\mathrm{Cov}[\mathbb{B}(s), M(t)]$, $\mathrm{Cov}[\mathbb{X}(s),
M(t)]$, and $\mathrm{Cov}[\mathbb{B}(s), \mathbb{X}(t)]$.
5. MARTINGALE REPRESENTATIONS
$\{X_{t\wedge T_k}, \mathcal{A}_{t\wedge T_k}: 0 \le t < \infty\}$ for each $k \ge 1$, where $T_k$ is a sequence of stopping times
satisfying $T_k \to \infty$ a.s. as $k \to \infty$, then the property will be said to hold locally.
Recall that $Z_i = X_i \wedge Y_i$, $\delta_i = 1_{[X_i \le Y_i]}$,

(2)  $\mathbb{H}_n^1(t) = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} 1_{[0,t]}(Z_i)\,\delta_i$  and  $\mathbb{H}_n(t) = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} 1_{[0,t]}(Z_i)$  for $0 \le t < \infty$,

where $X_1, \ldots, X_n$ are iid $F$ and $Y_1, \ldots, Y_n$ are iid $G$. Consider the process
$\mathbb{H}_n^1$. Now the predictable process
is quite clearly an / process that is adapted to .^ ,,. We will show presently that
sition of the submartingale H. into the sum of a martingale and an 1' process;
$n\mathbb{A}_n$ is called the compensator of $n\mathbb{H}_n^1$. Since $M_n$ is a martingale, $\mathbb{H}_n$ is a
submartingale. Thus it also has a Doob-Meyer decomposition of the form
Theorem 1. Let $F$ denote an arbitrary df on $[0, \infty)$ in the sense of (7.2.1). For
each fixed $n \ge 1$, the process

(12)  $\mathbb{X}_n(\cdot\wedge T)/(1 - F(\cdot\wedge T))$
Also

(b)  $E\{1_{[Z\le t]}\delta \mid \mathcal{A}_s\} = \delta\,1_{[Z\le s]} + 1_{[Z>s]}\,\dfrac{\int_{(s,t]}(1-G_-)\,dF}{1-H(s)}$

and

(c)  $= \Lambda_i(s) + E\Bigl\{\displaystyle\int_{(s,t]} 1_{[Z\ge u]}\,\frac{1}{1-F(u-)}\,dF(u) \Bigm| \mathcal{A}_s\Bigr\}$

(d)  $= \Lambda_i(s) + 1_{[Z>s]}\displaystyle\int_{(s,t]}\frac{1-H(u-)}{1-H(s)}\,\frac{1}{1-F(u-)}\,dF(u)$

(e)  $= \Lambda_i(s) + \dfrac{1_{[Z>s]}}{1-H(s)}\displaystyle\int_{(s,t]}(1-G_-)\,dF$.

Subtracting (e) from (b) establishes that the process of (15) satisfies the
martingale condition; and since (7.1.5) gives
where
Now note that the $M_{0i}$ of (15) can be represented in terms of the $M_i$ of (16)
as

(20)  $\tilde M(t) = \displaystyle\sum_i \int_0^t (1-\mathbb{G}_{i-})\,dM_i$  for $0 \le t < \infty$,

$\displaystyle\int_0^\infty (1-\mathbb{G}_{i-})^2\,d\langle M_i\rangle = \int_0^\infty (1-\mathbb{G}_{i-})^2[1-\Delta\Lambda_i]\,d\Lambda_i \le \int_0^\infty (1-\mathbb{G}_{i-})^2\,d\Lambda_i$

$= \displaystyle\int_0^\infty [1-\mathbb{G}_i(u-)]^2\,1_{[X_i\ge u]}\,\frac{1}{1-F(u-)}\,dF(u) = \int_0^\infty 1_{[Y_i\ge u]}1_{[X_i\ge u]}\,\frac{1}{1-F(u-)}\,dF(u)$

(22)  $= \displaystyle\int_0^\infty 1_{[Z_i\ge u]}\,\frac{1}{1-F(u-)}\,dF(u)$,

which has expectation

(23)  $\displaystyle\int_0^\infty (1-H_-)\,\frac{1}{1-F_-}\,dF = \int_0^\infty (1-G_-)\,dF = H^1(\infty) \le 1$.
MARTINGALE REPRESENTATIONS 315
Thus (c) of Theorem B.3.1 implies that $M_i$ is a square integrable martingale
on $[0, \infty)$ with predictable variation process

$\langle M_i\rangle(t) = \displaystyle\int_0^t 1_{[Z_i\ge s]}\,d\Lambda(s) - \int_0^t 1_{[Z_i\ge s]}\,\Delta\Lambda(s)\,d\Lambda(s)$

(24)  $= \displaystyle\int_0^t 1_{[Z_i\ge s]}(1-\Delta\Lambda)\,d\Lambda = \int_0^t (1-\Delta\Lambda_i)\,d\Lambda_i$.

Since $M_n$ is just $n^{-1/2}$ times the sum of $n$ iid processes of type $M_i$, it is also
a square integrable martingale on $[0, \infty)$. The proof will thus be completed if
we show that $M_n^2(t) - \sum_i \langle M_i\rangle(t)/n$ is a martingale. Now (24) gives

(25)  $\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\langle M_i\rangle(t) = \dfrac{1}{n}\sum_{i=1}^{n}\int_0^t 1_{[Z_i\ge s]}(1-\Delta\Lambda)\,d\Lambda = \int_0^t [1-\mathbb{H}_{n-}](1-\Delta\Lambda)\,d\Lambda$.

(26)  $\mathbb{B}_n(t) = \displaystyle\int_0^t \frac{J_n}{1-\mathbb{H}_{n-}}\,dM_n$  for $t \ge 0$.
We note that $\frac{1}{n}\sum_i \langle M_i\rangle = \langle M_n\rangle$ and (recall (g)-(i) of Theorem 6.3.1)

$E\displaystyle\int_0^\infty \Bigl(\frac{J_n}{1-\mathbb{H}_{n-}}\Bigr)^2 d\langle M_n\rangle = E\int_0^\infty \Bigl(\frac{J_n}{1-\mathbb{H}_{n-}}\Bigr)^2 (1-\mathbb{H}_{n-})(1-\Delta\Lambda)\,d\Lambda$

(a)  $\le E\displaystyle\int_0^\infty \frac{J_n}{1-\mathbb{H}_{n-}}\,d\Lambda = n\int_0^\infty \sum_{k=1}^{n}\frac{1}{k}\binom{n}{k}(1-H_-)^k H_-^{n-k}\,(1-F_-)^{-1}\,dF < \infty$,

since the finiteness of the term for $k = 1$ is most difficult, and for it
Thus (c) of Theorem B.3.1 does apply, and implies that $\mathbb{B}_n$ is a square integrable
martingale having

(c)  $\langle\mathbb{B}_n\rangle(t) = \displaystyle\int_0^t \frac{J_n}{(1-\mathbb{H}_{n-})^2}\,d\langle M_n\rangle = \int_0^t \frac{J_n}{1-\mathbb{H}_{n-}}(1-\Delta\Lambda)\,d\Lambda$.
6. INEQUALITIES
To extend the convergence of $\mathbb{B}_n$ and $\mathbb{X}_n$ from $[0, \theta]$ to $[0, T]$, we will require
the following three inequalities. The first is the Gill-Wellner inequality
(2) P \II1—Fllo>^/c^
and
( 1 ^ n \SA T) =1 _
— 1
Z s)= 1 —i—
—F(sA T) o,s„T] 1—F 1—H dam°
(b) P\II1—Fl^o^r-A/ c A
If $\tau < \tau_F$, we are done. So suppose $\tau = \tau_F$. Letting $t \uparrow \tau_F$ shows that the
supremum in (b) may be taken over $[0, T] - \{\tau_F\}$. If $\Delta F(\tau_F) = 0$, then $\mathbb{F}_n(\tau_F) =
\mathbb{F}_n(\tau_F-)$ a.s. If $F(\tau_F-) < F(\tau_F) = 1$, then $P(T = \tau_F$ and $\mathbb{F}_n(\tau_F) = 1) = P(T = \tau_F)$.
In either case, (2) holds.
To prove (3), note that $1-H = (1-F)(1-G)$ and

(d)  $P\Bigl(\Bigl\|\dfrac{1-H}{1-\mathbb{H}_n}\Bigr\|_0^T \ge \lambda\Bigr) \le e\exp(-\lambda)$  for $\lambda \ge 1$

(e)  $P\Bigl(\Bigl\|\dfrac{1-G}{1-\mathbb{G}_n}\Bigr\|_0^T \ge \lambda\Bigr) \le$ the same bound, for $\lambda \ge 1$,

by (2) and symmetry of the problem in $F$ and $G$. Hence

$P\Bigl(\Bigl\|\dfrac{1-H}{1-\mathbb{H}_n}\Bigr\|_0^T \ge \lambda^2\Bigr) \le P\Bigl(\Bigl\|\dfrac{1-F}{1-\mathbb{F}_n}\Bigr\|_0^T \ge \lambda\Bigr) + P\Bigl(\Bigl\|\dfrac{1-G}{1-\mathbb{G}_n}\Bigr\|_0^T \ge \lambda\Bigr)$   by (c),

as was claimed. ❑
We already know from Section 4 that $\mathbb{B}_n$ and $\mathbb{X}_n$ converge on $[0, \theta]$. In this
section we extend this to $[0, T]$ and we introduce $\|\cdot/q\|$ metrics.
We recall that

(1)  $1 - \Delta\Lambda = (1-F)/(1-F_-)$.

Warning: Recall that the Brownian motions in (7.1.8), (7.4.6), and (7.4.12)
are all different.
provided
and

(9)  $\Bigl\|\dfrac{\mathbb{X}_n - \mathbb{X}}{(1-F)\,q(K_D)}\Bigr\|_0^T \to_p 0$  as $n \to \infty$

provided

(10)  $q$ is $\nearrow$ and $q(t)/\sqrt{t}$ is $\searrow$ on $[0, \tfrac12]$, $q$ is symmetric about $t = \tfrac12$,
and $\displaystyle\int_0^1 q(t)^{-2}\,dt < \infty$.
Remark 1. All proofs still carry through even if $G \equiv 0$ (recall that this
possibility was specified in the Introduction). In this case $D(t) =
\int_0^t [(1-F)(1-F_-)]^{-1}\,dF = (1-F(t))^{-1} - 1 = F(t)/(1-F(t))$, so that
Moreover, (9) reduces [after symmetry about $\tfrac12$ has been used to replace $T$ by
$\infty$, and after we note that $q(t)/\sqrt{t}\searrow$ was not used in half the proof] to

(13)  $q$ is $\nearrow$ on $[0, \tfrac12]$, symmetric about $t = \tfrac12$, and $\displaystyle\int_0^1 [q(t)]^{-2}\,dt < \infty$.

This is fully as strong as Theorem 3.7.1, and only slightly weaker than Chibisov's
theorem (Theorem 11.5.1).
Proof of (7). Let $K = K_C$. Consider $[0, \delta]$ with $\delta$ small. Now $C$ is $\nearrow$ and
right continuous. Thus Inequality A.2.11 gives

(c)  $\le \dfrac{\varepsilon}{3} + P\Bigl(\displaystyle\int_0^\delta \frac{dC}{q^2(C)} \ge \varepsilon^3\Bigr)$   by Inequality 7.6.1,

since $\mathbb{B} = \mathbb{S}(C)$ has $\langle\mathbb{B}\rangle = C$ by (5). Combining (h) and (j) with Theorem 7.4.1
Now

(k)  $q(K) = q(C/(1+C)) \ge q(C/2)$  for $t \le$ some $\delta > 0$,

where $q(\cdot/2)$ behaves near 0 like a function satisfying (8). Thus (14) extends. Let

(l)  $R \equiv T \vee \theta$

and note that

(m)  $(1+C)\,q(K)$ is $\nearrow$.
$P\Bigl(\Bigl\|\displaystyle\int_\theta^{\cdot}\frac{1}{(1+C)q(K)}\,d[\mathbb{B}_n - \mathbb{B}_n(\theta)]\Bigr\|_\theta^R \ge \varepsilon\Bigr)$

(n)  $\le \varepsilon + P\Bigl(\displaystyle\int_\theta^R [(1+C)q(K)]^{-2}\,d\langle \mathbb{B}_n - \mathbb{B}_n(\theta)\rangle \ge \varepsilon^3\Bigr)$

(o)  $= \varepsilon + P\Bigl(\displaystyle\int_\theta^R \frac{1}{(1+C)^2 q^2(K)}\,\frac{J_n}{1-\mathbb{H}_{n-}}(1-\Delta\Lambda)\,d\Lambda \ge \varepsilon^3\Bigr)$

(p)  $= \varepsilon + P\Bigl(\displaystyle\int_\theta^R \frac{1}{(1+C)^2 q^2(K)}\,\frac{J_n(1-H_-)}{1-\mathbb{H}_{n-}}\,dC \ge \varepsilon^3\Bigr)$,

using

(q)  $\displaystyle\int_\theta^\tau [(1+C)q(K)]^{-2}\,dC < \varepsilon^4/M$.

Now

(r)  $\displaystyle\int_\theta^\tau \frac{dC}{(1+C)^2 q^2(K)} \le \int_\theta^\tau \frac{dC}{(1+C)(1+C_-)\,q^2(K)} = \int_\theta^\tau \frac{dK}{q^2(K)}$

(s)  $= \displaystyle\int_{K(\theta)}^{K(\tau)} \frac{dt}{q^2(t)} \to 0$  as $\theta_\varepsilon \to \tau$,

(t)  since $\Delta F(\tau) = 0$.

Thus

(u)  $P\Bigl(\Bigl\|\displaystyle\int_\theta^{\cdot}\frac{1}{(1+C)q(K)}\,d[\mathbb{B}_n - \mathbb{B}_n(\theta)]\Bigr\|_\theta^R \ge \varepsilon\Bigr) \le 2\varepsilon$.
Proof of (10). Consider $[0, \delta]$. Using the method of the previous proof, the
integral in line (e) of that proof is now replaced by [compare (7.5.13) to (7.5.11)]

$\displaystyle\int_0^\delta \frac{1}{q^2(D)}\Bigl\{\frac{1-\mathbb{F}_{n-}}{1-F}\Bigr\}^2\frac{J_n}{1-\mathbb{H}_{n-}}(1-\Delta\Lambda)\,d\Lambda$

(a)  $\le \Bigl\|\dfrac{1-\mathbb{F}_{n-}}{1-F}\Bigr\|_0^{\delta\,2}\,\Bigl\|\dfrac{1-H_-}{1-\mathbb{H}_{n-}}\Bigr\|_0^\delta \displaystyle\int_0^\delta \frac{1}{q^2(D)}\,\frac{1-\Delta\Lambda}{1-H_-}\,d\Lambda$.

Hence the proof on $[0, \delta]$ can be completed as in the previous proof [using
(4) and (3.2.58)]. Note that $q(t)/\sqrt{t}\searrow$ is not used in this half of the proof,
and is thus not required in Remark 1.
Now consider $(\theta, T]$ with $K(\theta) > \tfrac12$, where $K = K_D$. Again, let $R = T \vee \theta$. Note
that
The proof now proceeds as for $\mathbb{B}_n$ [again using (3.2.58)], where at step (q) in
that proof we must now deal with the integral

(c)  $\displaystyle\int_\theta^\tau \frac{(1-K)^2}{q^2(K)}\,dD = \int_\theta^\tau \frac{dD}{(1+D)^2 q^2(K)} \le \int_\theta^\tau \frac{dD}{(1+D)(1+D_-)\,q^2(K)}$

(d)  $= \displaystyle\int_\theta^\tau \frac{dK}{q^2(K)}$  (e)  $= \displaystyle\int_{K(\theta)}^{K(\tau)} \frac{dt}{q^2(t)}$.
$C(t) = \displaystyle\int_0^t (1-H)^{-1}\,d\Lambda = \int_0^t (1-H)^{-2}\,dH^1$;
(20)  $K_n = \dfrac{C_n}{1+C_n}$.
Corollary 1. If 0< r so H(6) < 1, F and G are continuous, and ,(ö +/1 2 ( u) du <
cc, then
(23) P(S(x) E 9 n (x)± can - ' /2 §n (x)/[#n (x)iP(ln (x))] for all 0<_x < 0)
When $\psi(u) \equiv 1$ the probability in (25) is given by (2.2.12), and the bands
in (23), which generalize the classical Kolmogorov bands, were introduced by
Hall and Wellner (1980). In view of Exercise 2, these bands reduce to the
Kolmogorov bands when all the observations are uncensored, $\delta_i = 1$, $i =
1, \ldots, n$.
Confidence bands based on other functions $\psi$ are possible as long as the
probability in (25) can be calculated or adequately approximated. Nair (1981)
advocates bands similar to these but with $\psi(u) = [u(1-u)]^{-1/2}$ for $a \le u \le 1-a$.
The choice $\psi(u) = (1-u)^{-1}$ for $0 \le u \le b$
is a choice related to the classical
Rényi statistic, which gives more emphasis to the right tail; in this case the
probability (25) is given in Example 3.8.2.
Exercise 3. (Gill, 1983) If F is continuous, then Kc- = K n = K from Exer-
cise 7.4.4. Supposing that G is continuous, show that
(1 — K)/(1—F)=1+ J CdF, 0
Suppose now that we relax our assumptions on the censoring times and assume
throughout this section that

(2)  $H_i = 1 - (1-F)(1-G_i)$,

(3)  $\bar H_n = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} H_i$,  $\bar H_n^1 = \displaystyle\int_0^{\cdot}[1-\bar G_{n-}]\,dF$,

$\Lambda_n(t) = \displaystyle\int_0^t \frac{1}{1-\bar H_{n-}}\,d\bar H_n^1$.

As in (7.5.15),

$\Lambda_i(t) = \displaystyle\int_0^t 1_{[Z_i\ge u]}\,\frac{1}{1-F(u-)}\,dF(u)$,
is such that

(8)  $M_n(t) = \sqrt{n}\,[\mathbb{H}_n^1(t) - \mathbb{A}_n(t)] = \sqrt{n}\Bigl[\mathbb{H}_n^1(t) - \displaystyle\int_0^t (1-\mathbb{H}_{n-})\,d\Lambda\Bigr]$

(9)  $= \dfrac{1}{\sqrt{n}}\displaystyle\sum_{i=1}^{n} M_i(t)$,

and $\mathbb{B}_n$ satisfies

(14)  $\langle\mathbb{B}_n\rangle(t) = \displaystyle\int_0^t \frac{J_n}{1-\mathbb{H}_{n-}}(1-\Delta\Lambda)\,d\Lambda$  for $t \ge 0$.
The old proof of (7.2.7) carries over verbatim to give the exponential formula

(18)  $\dfrac{1-\mathbb{F}_n(\cdot\wedge T)}{1-F(\cdot\wedge T)}(t) = 1 - \displaystyle\int_0^{t\wedge T} \frac{1-\mathbb{F}_{n-}}{1-F}\,\frac{1}{1-\mathbb{H}_{n-}}\,dM_n$  for $0 \le t < \infty$,

provided $F$ and $G_1, \ldots, G_n$ are such that $T < \tau_F$ a.s.
Then both
and
Proof. The only change needed in the proof of Theorem 7.3.1 is to note that
the SLLN holds for arbitrary independent rv's with uniformly bounded fourth
moments as in Theorem 3.2.1. ❑
(22)  $\mathbb{M}_n(t) = \sqrt{n}\,[\mathbb{H}_n^1(t) - \bar H_n^1(t)] + \displaystyle\int_0^t \sqrt{n}\,[\mathbb{H}_{n-} - \bar H_{n-}]\,d\Lambda$

[as in (7.1.5)], where
likewise,

(31)  $\mathbb{X}_n(\cdot\wedge T)/(1 - F(\cdot\wedge T))$

$\mathbb{C}_n(t) \to_{a.s.} D(t) = \displaystyle\int_0^t \frac{1}{1-\bar H_-}\,\frac{1}{1-\Delta\Lambda}\,d\Lambda$  under (26), (28), and $\bar G(t) < 1$.
1_
special construction of X,, ... , Xn , Y,, ... , Yn and Brownian motions S and
7L exist for which
and
as n-*cc; here
for the C and D of (30) and (31). If F is continuous, then C = D and S =7L.
Proof. The proofs will be complete if we just verify the ARJ(2) condition
(B.5.8) of Rebolledo's theorem (Theorem B.5.2). We assume initially that

(b)  $B_n(t) = \displaystyle\int_0^t f\,d\mathbb{M}_n = \int_0^t f\,d\sqrt{n}\,[\mathbb{H}_n^1 - \mathbb{A}_n]$,

where
Then

(d)  $\Delta B_n(t) = f\,\Delta\sqrt{n}\,\mathbb{H}_n^1$

since (a) implies $\mathbb{A}_n$ is continuous. Thus the term $A^\varepsilon[B_n](t)$ of (B.5.3) is given
by

$\displaystyle\sum_{s\le t} f\,\Delta\mathbb{H}_n^1\,1_{[|f\,\Delta(\sqrt{n}\,\mathbb{H}_n^1)| \ge \varepsilon]}$.
Since

(j)  $\displaystyle\int_0^t f^2\,d\langle \mathbb{M}_n\rangle < \infty$  for $0 \le t < \infty$ holds a.s.,

we have, as in (B.5.4),

(l)-(m)  $\langle B_n^\varepsilon\rangle(t) = \displaystyle\int_0^t f^2\,1_{[|f|\ge\sqrt{n}\varepsilon]}\,(1-\mathbb{H}_{n-})\,d\Lambda$.
(37)  $B_n^\varepsilon - B_n = \displaystyle\int_0^{\cdot} f\,1_{[|f|<\sqrt{n}\varepsilon]}\,d\mathbb{M}_n$
satisfies (use Theorem B.3.1 again)
=(f fd^a 0
n )(t) —(f 0
f{ 1 [Ui< ✓nE] -1 [[fI F J } d>(t)
r
— fd(n)— f2 {1[1<nc]2'0"f'1[i^ ✓ ne]}d(n)
0 0
(38) =0.
Thus (n) and (38) show that (B.5.8) holds. Thus Rebolledo's CLT (Theorem
B.5.2) implies $\Rightarrow$ of $B_n$ in case (a) holds. Application of Skorokhod's theorem
(Theorem 2.3.4) gives $Z_i$'s, $\delta_i$'s, and $\mathbb{S}$, from which $X_i$'s and $Y_i$'s can be
constructed.
The proof for $\mathbb{X}_n$ under (a) is virtually identical; just redefine the $f$ of (c)
to be
and note that the application of the dominated convergence theorem in (n)
uses the conclusion $\|\mathbb{F}_n - F\|_0^\theta \to_{a.s.} 0$ of Theorem 1.
Suppose now that we drop assumption (a) and suppose
as in (3.2.49), and
where $\Lambda^{-1}$ is our usual right-continuous inverse of (3.2.50). We also note that
Remark 1. One can now extend the above to the case of $F_{n1}, \ldots, F_{nn}$ con-
tiguous to $F, \ldots, F$ via the methods of Section 4.1. Note that if $G_{n1}, \ldots, G_{nn}$
are kept the same in both situations, they will cancel out of the likelihood
ratio statistic.
n I n+l
47) n t P S,,:, pl
rt i1+1 )}
(ii) Show that this choice of $p_i$ is just another representation of the product-
limit estimator $\mathbb{F}_n$.
(iii) Extend all of the above to allow ties.
Hint: The quantity to be maximized can be rewritten as $\prod_{i=1}^{n} a_i^{\delta_{n:i}}(1 -
a_i)^{n-i+1-\delta_{n:i}}$, where $a_i = p_i/\sum_{j\ge i} p_j$. See Scholz (1980) for a broad enough
definition of maximum likelihood to allow a rigorous proof that $\mathbb{F}_n$ is the MLE
of $F$.
CHAPTER 8

Poisson and Exponential Representations
0. INTRODUCTION
The Uniform (0, 1) order statistics xi_{n:i} have the same distribution as does
(a_1 + ··· + a_i)/(a_1 + ··· + a_{n+1}), where the a_i are iid Exponential (1). Since
the a_i can be viewed as interarrival times of a Poisson process, the link between
empirical, quantile, and Poisson processes is strong. This link was discovered
early, and the literature exploiting it is rich in quantity and variety.
In this chapter we carefully define the link. Our exploitation will come in
later chapters, primarily in treating the convergence of the quantile process
V. and the oscillations of the empirical process U.
While many first proofs in the literature use Poisson representations, many
results were later redone via other methods. So too, we will only use Poisson
methods in a few of the possible situations.
Note that N(t) = k for eta_k ≤ t < eta_{k+1}, for k ≥ 0 (here eta_0 ≡ 0). Also, in analogy
with (3.1.88),
We say that events occur at the times eta_1, eta_2, ..., and N(t) counts the number
of events through time t.
We specify a two-dimensional Poisson process N on R+ × R+ by requiring N
to have stationary, independent increments whose distributions are Poisson
with mean equal to the area of the increments. We take as known that N exists
on (D_T, D_T) with T = R+ × R+. Our distributional assumption on the increments can be expressed as
Proposition 1. Let a_1, ..., a_{n+1} be iid Exponential (1) rv's. For 1 ≤ i ≤ n+1
define eta_i ≡ a_1 + ··· + a_i, and define
(1) xi_{n:i} ≡ eta_i/eta_{n+1} for 1 ≤ i ≤ n.
Then 0 ≤ xi_{n:1} ≤ ··· ≤ xi_{n:n} ≤ 1 are distributed as the order statistics of a sample
of size n from the Uniform (0, 1) distribution.
Proof. The joint density function of the eta_i's is
e^{−t_1} · e^{−(t_2−t_1)} ··· e^{−(t_{n+1}−t_n)} = e^{−t_{n+1}} for 0 < t_1 < ··· < t_{n+1};
the change of variables xi_i = t_i/t_{n+1}, 1 ≤ i ≤ n, now yields the constant density n! on 0 < xi_1 < ··· < xi_n < 1, which is the density of the Uniform (0, 1) order statistics. ❑
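Proposition 1 is easy to check by simulation. The sketch below (illustrative only; names are ours) builds the order statistics from exponential partial sums and compares their sample means with the Beta means i/(n+1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 20000

# Proposition 1: xi_{n:i} = eta_i / eta_{n+1}, where eta_i is a partial sum
# of iid Exponential(1) variables a_1, ..., a_{n+1}.
a = rng.exponential(1.0, size=(reps, n + 1))
eta = np.cumsum(a, axis=1)
xi = eta[:, :n] / eta[:, [n]]          # eta_{n+1} is the full sum

# The i-th Uniform(0,1) order statistic is Beta(i, n-i+1) with mean i/(n+1).
print(np.round(xi.mean(axis=0), 2))    # close to [1/6, 2/6, ..., 5/6]
```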
Proposition 2. Let eta_1, ..., eta_n, ... denote the successive waiting times associated
with a Poisson process N. Then the conditional distribution of 0 ≤ eta_1 ≤ ··· ≤
eta_n ≤ 1 given N(1) = n is the same as the distribution of the order statistics
0 ≤ xi_{n:1} ≤ ··· ≤ xi_{n:n} ≤ 1 of a sample of size n from the Uniform (0, 1) distribution.
Proof. Let 0 < t_1 < ··· < t_n < 1 be fixed, and let h_1, ..., h_n be so small that
0 < t_1 < t_1 + h_1 < t_2 < t_2 + h_2 < ··· < t_n < t_n + h_n < 1. Since N has independent
increments, the probability that exactly one event falls in each (t_i, t_i + h_i] and no events fall elsewhere in [0, 1] is
(a) Prod_{i=1}^n h_i e^{−h_i} · e^{−(1 − Sum h_i)} = h_1 ··· h_n e^{−1},
while
(b) P(N(1) = n) = e^{−1}/n!.
Dividing the ratio of (a) over (b) by h_1 ··· h_n and taking the limit as
h_1 ∨ ··· ∨ h_n → 0 yields n!; this limit also represents the density function of
eta_1, ..., eta_n at 0 < t_1 < ··· < t_n < 1. ❑
Exercise 1. Let xi_1, ..., xi_n be independent Uniform (0, 1). Then clearly the rv's
−log xi_{n:n−i+1}, 1 ≤ i ≤ n, are distributed as the order statistics in a
sample of size n from the Exponential (1) distribution.
(i) (Sukhatme, 1937) Prove that the normalized exponential spacings
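The transformation in Exercise 1 is easy to visualize by simulation. The sketch below (our naming) checks that reversing the order and applying −log turns uniform order statistics into exponential order statistics, whose k-th mean is Sum_{j=0}^{k−1} 1/(n−j) by the Rényi representation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 6, 50000

xi = np.sort(rng.uniform(size=(reps, n)), axis=1)   # uniform order statistics
eta = -np.log(xi[:, ::-1])                          # reverse order, then -log

# eta now holds the ascending order statistics of n Exponential(1) rv's;
# the k-th exponential order statistic has mean sum_{j=0}^{k-1} 1/(n-j).
expected = np.cumsum(1.0 / (n - np.arange(n)))
print(np.round(eta.mean(axis=0), 2))                # close to `expected`
```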
Exercise 3. Show that n xi_{n:1} →_d Exponential (1) as n → ∞, while for any fixed
integer k we have
The spacings of Uniform (0, 1) rv's are treated in Section 1 of the spacings
chapter (Chapter 21), which could profitably be read at this point.
The smoothed uniform quantile function V_n is defined by
(1) V_n(t) ≡ xi_{n:i} for t = i/(n+1), 0 ≤ i ≤ n+1 (where xi_{n:0} ≡ 0 and xi_{n:n+1} ≡ 1),
with V_n linear on each [(i−1)/(n+1), i/(n+1)].
Let a_1, ..., a_{n+1} be iid Exponential (1) rv's. Let S_{n+1} denote the partial-sum
process of the (0, 1) rv's a_1 − 1, ..., a_{n+1} − 1, so that S_{n+1}(i/(n+1)) = (1/sqrt(n+1)) Sum_{j=1}^i (a_j − 1), and let 𝕊_{n+1} denote the smoothed
partial-sum process that equals S_{n+1} at each i/(n+1) and is linear on each of
the closed intervals between these points. Let eta_i ≡ a_1 + ··· + a_i for i ≥ 1. Then
(2) sqrt(n+1)[V_n(t) − t] = ((n+1)/eta_{n+1}) [S_{n+1}(t) − t S_{n+1}(1)] for t = i/(n+1) with 0 ≤ i ≤ n+1,

POISSON AND EXPONENTIAL REPRESENTATIONS

and is linear on the closed intervals in between. Proposition 8.2.1 thus gives that
(3) sqrt(n+1)(V_n − I) ≅ ((n+1)/eta_{n+1}) [𝕊_{n+1} − I 𝕊_{n+1}(1)] for each fixed n.
This representation was exploited for studying empirical and quantile processes
by Breiman (1968). Note that the SLLN, the CLT, and the LIL imply that as
n → ∞, with b_n ≡ sqrt(2 log log n),
(4) eta_n/n →_a.s. 1, sqrt(n)(eta_n/n − 1) →_d N(0, 1), lim sup_{n→∞} (1/b_n)|sqrt(n)(eta_n/n − 1)| = 1 a.s.
Also, if 𝕊 denotes Brownian motion, then
(5) 𝕊(t) − t 𝕊(1) for 0 ≤ t ≤ 1 is a
Brownian bridge.
Applying (4) and (5) to (3) leads us to expect that the smoothed uniform quantile process sqrt(n+1)(V_n − I) converges weakly to
Brownian bridge; indeed, we already know this result from Theorem 3.1.1. In
a later chapter we will obtain a stronger form of this result.
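The representation (2) is an exact algebraic identity for every realization, not merely a distributional statement, so it can be verified to machine precision. The sketch below assumes the partial-sum process carries the usual 1/sqrt(n+1) normalization.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10

a = rng.exponential(1.0, size=n + 1)
eta = np.cumsum(a)                        # eta_1, ..., eta_{n+1}
t = np.arange(1, n + 1) / (n + 1)         # lattice points i/(n+1)
xi = eta[:n] / eta[n]                     # xi_{n:i} = eta_i / eta_{n+1}

# normalized partial sums S_{n+1}(i/(n+1)) = (1/sqrt(n+1)) * sum_{j<=i}(a_j - 1)
S = np.cumsum(a - 1.0) / np.sqrt(n + 1)

lhs = np.sqrt(n + 1) * (xi - t)
rhs = (n + 1) / eta[n] * (S[:n] - t * S[n])
print(np.max(np.abs(lhs - rhs)))          # effectively zero: the identity is exact
```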
Additional identities for other variations on the uniform quantile process
are found in Section 11.5.
4. POISSON REPRESENTATIONS OF U_n

In Proposition 8.2.1 we saw that Uniform (0, 1) order statistics xi_{n:i} can be
represented as
(1) xi_{n:i} = eta_i/eta_{n+1} jointly in 1 ≤ i ≤ n+1, for each fixed n,
where the eta_i can also be viewed as the waiting times associated with a Poisson process,
and
as Chibisov's representation.
A somewhat different idea is contained in the following Kac representation
of the empirical process U_n. Let N_n ≅ Poisson (n), and define
where xi_1, xi_2, ... are iid Uniform (0, 1) rv's independent of the N_n's.
(iii) Show that U_n ⇒ U on (D, D, || ||) as n → ∞ can be inferred from this.
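Since the defining display for the Kac representation is not reproduced above, the sketch below assumes its common form: a Poisson (n) number N_n of iid Uniform (0, 1) points, centered at nt and scaled by sqrt(n). The helper name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def kac_empirical_process(n, t_grid, rng):
    """Assumed form of the Kac process: N_n ~ Poisson(n) iid Uniform(0,1)
    points, with (counts - n*t)/sqrt(n)."""
    N = rng.poisson(n)
    xi = np.sort(rng.uniform(size=N))
    counts = np.searchsorted(xi, t_grid, side="right")
    return (counts - n * t_grid) / np.sqrt(n), N

n = 100
t = np.linspace(0.0, 1.0, 101)
U, N = kac_empirical_process(n, t, rng)
# At t = 1 all N_n points are counted, so U(1) = (N_n - n)/sqrt(n).
print(U[-1] == (N - n) / np.sqrt(n))   # True
```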
5. POISSON EMBEDDINGS
We now define
for each s ? 0. Note that the conditional distribution of N, given the event
[N(s, 1) = n] has occurred is the same as that of U; that is,
(3) N,I[N(s, 1) = n] = U.
In fact, for
we have
for each s > 0 [let U_0(t) ≡ 0 for all 0 ≤ t ≤ 1]; reasons for this name will become
clear as we go along. Note first that for each fixed s we have
and
implies that it shares them conditionally on each value of N(s, 1). Note from
(13) that s shares them also. Thus
and
(17) {ßA (b — t, b]: b >_ t >_ a}
L
for s_1, s_2 ≥ 0 and 0 ≤ t_1, t_2 ≤ 1. These are the same moments as K possessed.
They are also the same moments as possessed by the sequential uniform
empirical process K_n. We also note that, with S_k as defined in (4),
k k+l
(22) vk <_
for-s< ,k>_0,
n n n
(23) ≅ (as a version of the sequential uniform empirical process).
(24) (1/sqrt(n)) Sum_{i=1}^{[ns]} [1_{[0,t]}(xi_i) − t], jointly in s ≥ 0 and 0 ≤ t ≤ 1,
so that the lhs of (24) is another representation of the process of (2).
so that the lhs of (24) is another representation of the process ß! 5 (t) of (2).
CHAPTER 9
Some Exact
Distributions
0. INTRODUCTION
Figure 1. (The line L has slope b/(1 − a).)
The theorem follows immediately from the following result, which is a key
lemma in many papers dealing with exact distribution theory.
Figure 2. (Heights 1/n, 2/n, ..., n/n over the points a, a + b, a + 2b, ..., a + ib.)
Figure 2.
1 n ( nb ) n (i-1)
i 1 i ) n-i
{b} nb
=1 bn/ + i l ^I (nb)'
theta = b(i − j)/(1 − a − bj).
Now
i -I
(a) = q+pi.
Also
n —1
—J ei -J-ß(1 +l
° A — e)n -i
i =o i —j-1
i —j
j P'ti—j 1 0 n —i +1
(b)
_ 1—a —bi
n
— {b( — i +1)
j q
since the two terms in brackets are equal. From (a) and then (b) we obtain
P(G_n intersects L)
= Sum_{i=0}^m P(G_n intersects L for the first time at height i/n)
_ 1—a
(a) 'Eo P. a, b n using definition (4).
c f
n ! •..
J (n—I)C/n J c/n
d^n:l ... den:n-, d n:n
PA =P(I^i<n—I
max (n n:1+l /i)^A)
= P( n:1 ^ A /n)
A/n
+ P( max nf„ :;+ ,>iAi1=„ : ,=s)n(1 —s) " -I ds
o I^i^n—I
Figure 3.
(b) = (n − 1)a(a + (i − 1)b)^{i−2}(1 − a − (i − 1)b)^{n−i}.
Thus
^ a/n—s , A 1
PA= (1—^ n + (.LI,) A/" Pn-1 . ;—I
^
J
n ; _,fo 1—s n(1—s)
(c)
(
( 1 n ) + ii
n ^n/A) 0 A f _
\ s\
1 ds
"-'
x^l—n
,l " J A
( n_1 \ 1
x[iA—s]` -Z f 1— ^^
I " ds by(b)
I. n
A"a n i il
) a( )
+(.ls)(iA—s)i2ds
n/ ;_^ i n nj o
i-1 iA^n i
(d )
:1""I
=^ 1-- +
n
1 ;_, i
([-1)
i
nA L 1--
L
as claimed. ❑
Exercise 3. The following proof of Theorem 2 is due to Rényi (1973). Show
that G_n(lambda) ≡ P(||G_n/I|| ≤ lambda) satisfies
G_n(lambda) = Int_{1/lambda}^{1} G_{n−1}(n lambda s/(n−1)) n s^{n−1} ds
by conditioning on xi_{n:n}. Then use induction to show that G_n(lambda) = 1 − 1/lambda for
lambda ≥ 1 and all n ≥ 1.

EXACT DISTRIBUTION OF ||U_n^+|| AND DKW INEQUALITY
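The recursion in Exercise 3 can be checked numerically for small n. The sketch below assumes the integral form G_n(lambda) = Int_{1/lambda}^1 G_{n−1}(n lambda s/(n−1)) n s^{n−1} ds obtained by conditioning on xi_{n:n}, and confirms that for n = 2 it reproduces 1 − 1/lambda.

```python
import numpy as np

def G1(lam):
    # Exact base case: ||G_1/I|| = 1/xi_1, so P(||G_1/I|| <= lam) = 1 - 1/lam.
    return 1.0 - 1.0 / lam if lam >= 1 else 0.0

def G2(lam, m=4001):
    # Assumed recursion for n = 2, conditioning on xi_{2:2} = s:
    #   G_2(lam) = integral_{1/lam}^{1} G_1(2*lam*s) * 2*s ds
    s = np.linspace(1.0 / lam, 1.0, m)
    vals = np.array([G1(2.0 * lam * si) for si in s]) * 2.0 * s
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(s)) / 2.0)

lam = 2.5
print(G2(lam), 1.0 - 1.0 / lam)   # both approximately 0.6
```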
The null distribution of the one-sided Kolmogorov test of fit is that of ||U_n^+||.
Also, let D_n^+ ≡ ||(G_n − I)^+||.

Theorem 1. (Smirnov; Birnbaum and Tingey) Consider ||U_n^+||. Note that 0 ≤
||U_n^+|| ≤ sqrt(n). Moreover, for 0 ≤ lambda ≤ 1 we have P(||U_n^+|| ≥ sqrt(n) lambda) = P(D_n^+ ≥ lambda), and
(1) P(D_n^+ ≥ lambda) = lambda Sum_{i=0}^{[n(1−lambda)]} (n choose i) (lambda + i/n)^{i−1} (1 − lambda − i/n)^{n−i}
(2)
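The Birnbaum–Tingey sum in (1) is easy to evaluate directly. The sketch below (the function name is ours) checks it against two cases that can be computed by hand: for n = 1, D_1^+ = 1 − xi_1, and for n = 2, lambda = 1/2, the event is that both points fall at or below 1/2.

```python
import math

def birnbaum_tingey(n, lam):
    """P(D_n^+ >= lam) = lam * sum_{i=0}^{floor(n(1-lam))} C(n,i)
       * (lam + i/n)^(i-1) * (1 - lam - i/n)^(n-i), for 0 < lam < 1."""
    total = 0.0
    for i in range(int(n * (1 - lam)) + 1):
        total += (math.comb(n, i) * (lam + i / n) ** (i - 1)
                  * (1 - lam - i / n) ** (n - i))
    return lam * total

# n = 1: D_1^+ = 1 - xi_1, so P(D_1^+ >= lam) = 1 - lam.
print(birnbaum_tingey(1, 0.3))   # 0.7
# n = 2, lam = 0.5: event is {both xi_i <= 1/2}, probability 1/4.
print(birnbaum_tingey(2, 0.5))   # 0.25
```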
Asymptotic Expansions

(3) P(||U_n^+|| ≥ lambda) = exp(−2 lambda²)[1 − (2 lambda)/(3 sqrt(n)) + O(1/n)].
Table 1 (cont.) [columns c = 1 through 80 omitted]
c 81 82 83 84 85 86 87 88
1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
2 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
3 .00030 .00026 .00023 .00021 .00018 .00016 .00015 .00013
4 .01647 .01542 .01444 .01353 .01267 .01186 .01110 .01040
5 .10178 .09776 .09389 .09017 .08659 .08314 .07983 .07664
6 .26253 .25569 .24900 .24247 .23609 .22986 .22379 .21786
7 .44855 .44056 .43269 .42493 .41727 .40973 .40229 .39497
8 .61675 .60914 .60159 .59408 .58662 .57922 .57188 .56459
9 .74917 .74276 .73636 .72996 .72356 .71717 .71079 .70442
10 .84442 .83949 .83452 .82953 .82451 .81947 .81440 .80932
11 .90833 .90480 .90123 .89761 .89395 .89025 .88651 .88273
12 .94867 .94630 .94390 .94144 .93894 .93640 .93381 .93118
13 .97268 .97119 .96967 .96811 .96650 .96486 .96317 .96145
14 .98618 .98531 .98440 .98346 .98249 .98149 .98046 .97939
15 .99336 .99287 .99237 .99184 .99129 .99071 .99011 .98949
c 89 90 91 92 94 96 98 100
1 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
2 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
3 .00011 .00010 .00009 .00008 .00006 .00005 .00004 .00003
4 .00973 .00911 .00853 .00798 .00699 .00612 .00536 .00469
5 .07357 .07063 .06779 .06507 .05994 .05520 .05082 .04678
6 .21207 .20643 .20092 .19555 .18520 .17536 .16600 .15712
7 .38775 .38064 .37364 .36674 .35327 .34021 .32757 .31533
8 .55735 .55018 .54306 .53600 .52207 .50839 .49496 .48178
9 .69806 .69172 .68539 .67908 .66651 .65403 .64165 .62937
10 .80421 .79909 .79395 .78880 .77847 .76810 .75771 .74731
11 .87892 .87507 .87119 .86728 .85937 .85136 .84324 .83504
12 .92851 .92580 .92305 .92026 .91457 .90874 .90278 .89670
13 .95969 .95789 .95605 .95418 .95032 .94632 .94218 .93791
14 .97830 .97717 .97601 .97482 .97234 .96974 .96702 .96417
15 .98884 .98818 .98748 .98677 .98526 .98366 .98196 .98016
Remark 1. (Harter) Let eta^(alpha) denote the upper alpha percent point of the rv ||U||;
see (1.3.8). Then (1.3.6) suggests that the upper alpha percent point eta_n^(alpha) of ||G_n − I||
be approximated by eta^(alpha)/sqrt(n). However, Harter's (1980) numerical investigation
suggests that
n+4
( 4 ) 17n - r7^°^/ n^
3.5
is far more accurate in small samples. For alpha values of 0.20, 0.10, 0.05, 0.02,
0.01, approximation (4) seems to have an error of less than 0.5% for n ≥ 6
and of less than 0.25% for n ≥ 11.
354 SOME EXACT DISTRIBUTIONS
Similar results are reported for each of D_n, D_n^+, K_n, W_n, and U_n by Stephens
(1970); both upper and lower tails are considered. See Chapter 4 for these
results.
A Slight Modification
Exercise 2. (Pyke, 1959) (i) Let 0 ≤ c ≤ 1. Then for 0 < nc − lambda < 1
Proof. The first inequality follows trivially from ||U_n|| = ||U_n^+|| ∨ ||U_n^−||, where
||U_n^−|| ≅ ||U_n^+||.
From (5.2.2) we have for 0 < lambda < sqrt(n) that
where
—,1n2
d
da
log Q,, (j, A) _ (j —Avfn)(n —j +A^) n—j +A/
—4A 16A
-4A —
( n_ j+.^l ^ l 2
(b) = -4A-16A(---- A I z
\2
We will divide the sum (a) into the two parts Sum′ and Sum″:
^-;-' nn< e_
e
Q„(je0)=\j17i(n— j)
I(n J) n l J
n
< (0.4337)(2 — a) 312 (+ a) "2 / n 312
n
AfriE'Q„(j,A)< A exp ( - 2k 2)^' exp l —8^1 z 12— n +3
^ Iz l .n
( 8A 2J2 \
< 2^ 6 exp( -2,1) exp Z
n z =o — ni--
L o J
/ 1 2 l
<_ 26 ` ^+ 8 I exp (-2A 2 )
9n )
for A ? T where
\ z \
CT=exp (2T2 +8T 2 ^2 + 3
2 I +9 4 I.
If 2 lambda/(3 sqrt(n)) ≤ ad, where d ≡ 2 sqrt(2)/(1 + 2 sqrt(2)) = 0.738, then the second term in the
exponent of (e) does not exceed −(lambda a d)². Thus the sum of the last two terms
in the exponent of (e) never exceeds −lambda² a² d². Thus for j ∈ Sum″ and lambda ≥ T we
have from (e) that
C T Z 5.625 2
exp ( -2,1 ) = exp (-2^1 )
<v adT 2e ad
(f) = (3.266/a) exp (-2A 2 )
Plugging (d) and (f) into (a) with a = 0.221 yields the theorem.
The choices d = 0.738, T = 0.27, and a = 0.221 each resulted from a minimization problem. However, the constant 29 is still excessive. What is the minimum
constant possible? Hu (1985) shows that C = 2 sqrt(2) works.
This proof is taken from Dvoretzky et al. (1956). ❑
Open Question 2. What is the minimum value of c that works in (6)? Does
c= 1 work as conjectured in Birnbaum and McCarty (1958)?
We will demonstrate here that the DKW inequality is quite good. The best
we will get for a Binomial (n, 1/2) rv is Hoeffding's inequality (11.1.6), which gives
the same exponential bound at t = 1/2. The DKW inequality extends this from t = 1/2 to all t in [0, 1] and maintains
the same exponential rate; only the constant is increased. Also, as we compare
the DKW inequality with the limiting distributions of Example 3.8.1, we see
that the constant 2 in the exponent cannot be improved on.
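A quick Monte Carlo illustrates how sharp the inequality is. Note that the sketch below uses the constant C = 2, which is Massart's later refinement rather than any constant established in the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam, reps = 50, 1.5, 4000

# ||U_n|| = sqrt(n) * sup_t |G_n(t) - t|; the sup is attained at the order
# statistics, so it suffices to check i/n - xi_(i) and xi_(i) - (i-1)/n.
xi = np.sort(rng.uniform(size=(reps, n)), axis=1)
i = np.arange(1, n + 1)
dn = np.maximum(i / n - xi, xi - (i - 1) / n).max(axis=1)
freq = np.mean(np.sqrt(n) * dn >= lam)

bound = 2 * np.exp(-2 * lam ** 2)   # DKW with Massart's constant C = 2
print(freq, bound)                   # freq is roughly at or below the bound
```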
E ||U_n||^r ≤ 2c r Gamma(r/2)/2^{(r/2)+1} for all real r > 0
and
E exp (r ||U_n||) ≤ 1 + sqrt(2 pi) c r exp (r²/8) for all r > 0.
K_n ≡ sup_{0≤t≤1} |(1/n) Sum_{i=1}^{N_n} 1_{[0,t]}(xi_i) − t|
satisfies
for all 0 < x ≤ 1. Hint: Compute the probability conditionally on the value of
N_n, using the Birnbaum and Tingey theorem (Theorem 9.2.1) on each term.
In case a_i = −∞ (in case b_i = +∞) for 1 ≤ i ≤ n, we will label the probability
accordingly. Now let
(3) a_i = (i/n − lambda/sqrt(n)) ∨ 0 and b_i = ((i−1)/n + lambda/sqrt(n)) ∧ 1 for 1 ≤ i ≤ n.
Note that
P_n ≡ P(a_i ≤ xi_{n:i} ≤ b_i for 1 ≤ i ≤ n) = P(||U_n|| ≤ lambda),
[With regard to (3), we note that on the interval [e _,, en: ,) the event I -G <_
;
Functions psi of interest include psi(t) = sqrt(t(1 − t)) and psi(t) = −log (t(1 − t)).
Table 1 gives the percentage points of the distribution of ||U_n/sqrt(G_n(1 − G_n))||,
as computed by Kotel'nikova and Chmaladze (1983). ❑
We now define
and
Table 1 (cont.)
Figure 1.
RECURSIONS FOR P(g ≤ G_n ≤ h)
for 1 ≤ i ≤ n. Then
is also of the form (1). Table 2 gives percentage points of the distribution of
||v_n/sqrt(I(1 − I))|| as computed by Kotel'nikova and Chmaladze (1983). ❑
Table 2.
Percentage Points of the d.f. of the Standardized Smirnov Statistic
(from Kotel'nikova and Chmaladze (1983))
The Recursions
We now turn back to the basic problem stated in (1) ; we wish to compute
(15) a 1 <a, b,
and
and let
(18) h(m) —1= (the number of b- boundaries among co , c,, ... , cm _,),
1 <_m<2n +1.
Thus g(0) = 0, g(2n) =n, h(1)-1=0, and h(n +1)-1=n. The probabilities
of the various intervals (c,„_,, cm ] are
(20) Q_0(0) = 1.
For 1 ≤ m ≤ 2n,
(23) P_n = Q_n(2n).
See Noe (1972) for the above. The following one-sided version is from Noe
and Vandewiele (1968).
In case there is only a lower boundary (b_j = +∞ for 1 ≤ j ≤ n), then
(25) Q_0(0) = 1.
For 1 ≤ m ≤ n + 1,
Let
(27) Q_m(m) = 0.
(30) P„ = Q„(n+1)
Example 4. Suppose a_1, ..., a_10, b_1, ..., b_10 are ordered from smallest to
largest, with ties allowed in any combination, as in column 1 of Figure 2. The
quantities g(m) and h(m) − 1 of (17) and (18) appear in columns 3 and 4. As
m ranges from 1 to 2n = 20, the indices i for which the Q_i(m) of formula (21)
must be computed are given in columns 5 and 6; that is, indices h(m + 1) − 1 ≤
i ≤ g(m − 1) are specified. Thus, in this example with n = 10, Q_i(m) must be
0 0
a, 1 1 0 0 0
a2 2 2 0 0 1
b^ 3 2 0 1 2
a3 4 3 1 1 2
a4 5 4 1 1 3
ab 6 5 1 1 4
as 7 6 1 1 5
bg 8 6 1 2 6
b9 9 6 2 3 6
b4 10 6 3 4 6
a7 11 7 4 4 6
be 12 7 4 5 7
be 13 7 5 6 7
as 14 6 6 6 7
b7 15 8 6 7 8
as 16 9 7 7 8
be 17 9 7 8 9
al o 18 10 8 8 9
be 19 10 8 9 10
b io 20 10 9 10 10
21 10 •
Figure 2.
Example 5. (Exact power of tests of fit.) Note that minor changes allow us
to use these same methods to compute the exact power of tests of fit. Stephens
(1974) makes various power computations based on formulas of the type
discussed in this section. ❑
Note that
(b) Qo(0) = 1,
and also
since we will interpret an event placing no restrictions as the sure event. Thus
the recursions (20)-(22) hold for m = 0 and m = 1.
Now for the inductive step. Fix 1 <m <_ 2n. We will assume that
The recursion will be completed if we can show that (provided m <_ 2n)
Now if k < h(m) − 1, then b_{k+1} ≤ b_{h(m)−1} ≤ c_{m−1}, so that
(i)Q(I)i_k
(k) Q(m)= ±
k=h(m)-I k
if h(m + 1) − 1 ≤ i ≤ g(m − 1).
with
(33) P_n(b_1, ..., b_n) = 1 − Sum_{m=1}^n (n choose m) (1 − b_{n−m+1})^m P_{n−m}(b_1, ..., b_{n−m}),
and
(35) P_n(a_1, ..., a_n) = 1 − Sum_{m=1}^n (n choose m) a_m^m P_{n−m}(a_{m+1}, ..., a_n),
with P_0 = 1. Kotel'nikova and Chmaladze (1983) used (35) to table the exact
distribution of ||U_n/sqrt(I(1 − I))|| and ||v_n/sqrt(G_n(1 − G_n))|| for sample sizes n up
Figure 3. (Here u_i = 1 − v_{n−i+1}.)
to 100; see Tables 1 and 2. All this raises the interesting prospect of finding
the distribution of limit functionals such as IU «/log (1(1 — I ))-' II through
finite n calculations.
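Recursion (33) is easily coded. The sketch below (our naming) checks it against two boundary vectors whose probabilities P(xi_{n:i} ≤ b_i for all i) can be computed by hand.

```python
import math

def P_upper(b):
    """Recursion (33): P_n(b_1,...,b_n) = P(xi_{n:i} <= b_i for all i)
       = 1 - sum_{m=1}^{n} C(n,m) (1 - b_{n-m+1})^m * P_{n-m}(b_1,...,b_{n-m}),
       with P_0 = 1.  Assumes 0 <= b_1 <= ... <= b_n <= 1."""
    n = len(b)
    if n == 0:
        return 1.0
    return 1.0 - sum(math.comb(n, m) * (1.0 - b[n - m]) ** m * P_upper(b[:n - m])
                     for m in range(1, n + 1))

# Checks against directly computable cases:
print(P_upper((0.5, 0.9)))       # P(min <= 0.5, max <= 0.9) = 0.81 - 0.16 = 0.65
print(P_upper((0.3, 0.5, 0.7)))  # 0.243
```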
with Po = 1. See Steck (1968), though our proof is found in Breth (1976). Steck
indicates this performs well at n = 100. See also Epanechnikov (1968).
Proof of Bolshev's recursion. Now
P_n(b_1, ..., b_n) = P(nG_n(b_k) ≥ k for 1 ≤ k ≤ n)
= 1 − Sum_{k=1}^n P([nG_n(b_k) = k − 1] ∩ ∩_{j=1}^{k−1} [nG_n(b_j) ≥ j])
(a)
= 1 − Sum_{k=1}^n (n choose k−1) (1 − b_k)^{n−k+1} b_k^{k−1} P_{k−1}(b_1/b_k, ..., b_{k−1}/b_k).
Letting m = n − k + 1 in (a) gives
as claimed in (12). ❑
A_0 ≡ [b_1 < xi_{n:1}] ∩ A,
(b) A_j ≡ [xi_{n:i} ≤ b_i for 1 ≤ i ≤ j and b_{j+1} < xi_{n:j+1}] ∩ A
for 1 ≤ j ≤ n − 2,
A_{n−1} ≡ [xi_{n:i} ≤ b_i for 1 ≤ i ≤ n] ∩ A;
note how A is partitioned on the basis of the smallest index where xi_{n:i} ≤ b_i
fails. Now, using (a),
(c) P_n = P(A_{n−1}) = P(A) − Sum_{j=0}^{n−2} P(A_j) = b_n^n − Sum_{j=0}^{n−2} P(A_j).
Now A_j, for j < n − 1, occurs if and only if j of xi_1, ..., xi_n fall in [0, b_{j+1}] and
satisfy xi_{n:i} ≤ b_i for i ≤ j, while the remaining n − j points fall in (b_{j+1}, b_n]; thus,
using conditional probability,
P(A_j) = (n choose j) b_{j+1}^j (b_n − b_{j+1})^{n−j} P_j(b_1/b_{j+1}, ..., b_j/b_{j+1}).
Remark 1. It may be of some interest to compare the first few terms of the
recursions (33) and (36) for P_n(b). For (36) we obtain
P_0 = 1, P_1(b_1) = b_1.
We note that the recursions really proceed differently. Note that both lend
themselves very well to symbolic programming. ❑
Exercise 2. Write out Noé's recursion (25)-(27) to obtain formulas for P_1(b_1)
and P_2(b_1, b_2).
Steck's Formula 4. If a and b satisfy (15) and (16), then
where (x)_+ ≡ x ∨ 0 and where it is understood that all elements of the matrix
having subscripts i > j + 1 are zero. We prove this below. Steck indicates that
problems of numerical accuracy set in with (37) prior to n = 50. Steck's formula
also provides the basis for the following result of Ruben (1976), which we
state without reproducing its lengthy elementary proof by induction.
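Steck's determinant (37) is also straightforward to evaluate numerically. The sketch below assumes the standard form P(a_j < xi_{n:j} ≤ b_j for all j) = n! det M with M_{ij} = (b_i − a_j)_+^{j−i+1}/(j−i+1)! and zero entries for i > j + 1.

```python
import math
import numpy as np

def steck(a, b):
    """Assumed Steck formula: P(a_j < xi_{n:j} <= b_j for all j)
       = n! * det(M),  M[i][j] = (b_i - a_j)_+^(j-i+1) / (j-i+1)!,
       with M[i][j] = 0 when i > j+1 (the exponent would be negative)."""
    n = len(a)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            p = j - i + 1
            if p < 0:
                continue            # subscript i > j+1: entry stays 0
            M[i, j] = max(b[i] - a[j], 0.0) ** p / math.factorial(p)
    return math.factorial(n) * np.linalg.det(M)

# Agrees with direct calculation: P(min <= 0.5, max <= 0.9) = 0.65.
print(steck([0, 0], [0.5, 0.9]))        # 0.65
print(steck([0, 0, 0], [0.3, 0.5, 0.7]))  # 0.243
```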
where the 2 " ' n -tuples k = (k,, .. . , k„) in K. and the corresponding constants
-
and
k c(k)
3 !P3(a, b) = (b, — al)+( b , — a2) +(b3 — a3)+ — z(bl — a2) +(b3 — a3)+
Table 3
Coefficients in Ruben's Formula
(from Ruben (1976))
K1 K3 K4 K5
1 (1) 111 (1) 1111 (1) 11111 (1) 11120 (-2)
201 (-2) 2011 (-2) 20111 (-2) 20120 (4)
120 (-2) 1201 (-2) 12011 (-2) 12020 (4)
300 (6) 3001 (6) 30011 (6) 30020 (-12)
K2 1120 (-2) 11201 (-2) 11300 (6)
2020 (4) 20201 (4) 20300 (-12)
11(1) 1300 (6) 13001 (6) 14000 (-24)
20 (-2) 4000 (-24) 40001 (-24) 50000 (120)
KB
111111 (1) 111201 (-2) 111120 (-2) 111300 (6)
201111 (-2) 201201 (4) 201120 (4) 201300 (-12)
120111 (-2) 120201 (4) 120120 (4) 120300 (-12)
300111 (6) 300201 (-12) 300120 (-12) 300300 (36)
112011 (-2) 113001 (6) 112020 (4) 114000 (-24)
202011 (4) 203001 (-12) 202020 (-8) 204000 (48)
130011 (6) 140001 (-24) 130020 (-12) 150000 (120)
400011 (-24) 500001 (120) 400020 (48) 600000 (-720)
was shown above. Using Ruben's formula, Ruben and Gambino (1982) computed Table 4, giving the coefficients of the polynomials in the exact distribution
of ||G_n − I|| for 3 ≤ n ≤ 10 on the range 1/n ≤ t ≤ 1 − 1/n. For example, when
n = 4 and 1/2 ≤ t ≤ 3/4,
They did not give coefficients for 0 ≤ t < 1/n and 1 − 1/n < t ≤ 1, since
(42) P(||G_n − I|| ≤ t) = 0 if 0 ≤ t < 1/(2n),
= n!(2t − 1/n)^n if 1/(2n) ≤ t ≤ 1/n,
= 1 − 2(1 − t)^n if 1 − 1/n < t ≤ 1,
is true for all n.
Table 4. Coefficients of P(D_n ≤ a) for a in the various subintervals of
[1/n, 1 − 1/n], n = 3(1)10
(from Ruben and Gambino (1982))
1 0 -1 1 3/8 3/8 -1
a -8/3 10/3 a -9/2 -63/8 37/8
a2 14 2 aE 21/2 75/2 3/2
a3 -12 -4 a3 24 -48 -10
a4 -48 16 6
1 96/625 336/625 0 -1 -1
a 672/125 - 188/25 -728/125 522/125 738/125
ae -1464/25 12 224/5 24/5 12/25
a3 240 424/5 -458/5 -56/5 -92/5
a• -288 -240 74 -6 22
ab 0 160 -20 12 -8
[4/7,5/7] [5/7,6/7]
1 -1 -1
a 104486/75 141986/75
a2 11220/74 -7530/74
a3 - 11120/75 - 14870/73
a4 710/49 5210/49
ab 414/7 -766/7
aB -80 58
alp 30 -12
1 -1 -1 -1
a 1500284/88 1894034/88 2547218/88
as 33740/84 138670/85 -187922/85
a3 -131480/84 -192710/84 -274142/84
a4 -140/8 20790/83 98390/83
as 6188/64 5838/64 -16842/64
ae -140/8 -1686/8 1610/8
a° -110 158 -82
a5 70 -42 14
1 0 -1 -1 -1 -1
a -80189328/97 25924114/9 7 31524114/9? 39382322/97 52539010/97
az 9157696/95 654840/95 4491760/9 6 1879024/96 -4709320/9 6
as -41500928/9 5 -1820000/9 5 -2730000/9 5 -3818640/9 5 -4759832/9 5
a4 1434944/99 -34720/93 -42980/9' 537628/94 2016644/9 4
as -2859920/93 79408/93 13412/81 90468/93 -389732/9 3
as 49224/9 840/9 -9632/81 -35840/81 43736/81
a7 -43232/9 -1760/9 -1704/9 4512/9 -2938/9
as 2234 -70 294 -226 110
as -372 140 -112 56 -16
Table 4. (cont.)
1 .24809375 .24609375 -1 -1 -1 -1
a - 11.95911504 - 19.70752482 8.19872518 7.45283848 9.23169134 12.25159022
a2 82.9458144 237.91401 11.648385 8.5131018 2.5835922 - 12.5159022
as 152.373432 - 1225.12164 -44.62164 -62.910792 -85.4994 - 104.373768
a4 - 3033.98844 4108.5788 - 48.7964 13.44168 142.51944 472.82088
a5 13862.5956 -9945.9612 197.0388 255.0996 151.3764 -984.0348
as -35923.104 16984.8 48.3 - 320.828 -831.012 1233.372
a7 59053.68 - 19328.4 -440.4 -232.08 1273.2 -984.72
as - 60672.6 13977 128 781.2 - 1004.4 493.2
as 35232 -6240 392 -618 418 -142
a 10 -8712 1528 -252 168 -72 18
Additional Results
A Related Result
IS ism
where R_1, ..., R_m denote the ranks of the observations X_1, ..., X_m in the first
sample when ranked among the combined sample X_1, ..., X_m, Y_1, ..., Y_n.
(ii) Show that
Proofs
Lemma 1. Let A_1, ..., A_N and B_1, ..., B_N denote events such that for any
integer k the sets
(48) {A_r, B_s: r ≤ k, s ≤ k} and {A_r, B_s: r > k, s > k} are independent.
Then
(50) d_{ij} = 0 if i > j + 1,
= 1 if i = j + 1,
= P(B_i) if i = j,
= P(B_i B_{i+1} ··· B_j Ā_i Ā_{i+1} ··· Ā_{j−1}) if i < j,
Consider O N+ ,. The elements of the last row of D , are all zero except
Therefore,
(e) AN +l = P(BN +1)' 1 N — zN,
(i) ^ 2= I P(B 1
) P(B,BzA,) =P(B1B2)—P(B,B2Ai)=P(BIB2AI)
1 P(B2)
as desired. ❑
376 SOME EXACT DISTRIBUTIONS
By (50) we have
and
(d) d_{ii} = P(B_i) = b_i − a_i.
If i < j, then
Combinatorial methods have proved fruitful in analyzing both one- and two-sample empirical processes; see Takács (1967). We present a brief sampling
here; it is in the spirit of Vincze (1970). The only application made in this
section is a rederivation of Dempster's proposition (Proposition 9.1.1).
However, in the next section we will use them to derive some new results; see
also Takács (1967).
Lemma 1. (Andersen) Let x_1, ..., x_{n+1} denote any n + 1 real numbers that
satisfy: (i) the sum of all n + 1 is zero, and (ii) no proper subset sums to zero.
Let X_1, ..., X_{n+1} denote a random permutation of the x_i's, and let S_i ≡
X_1 + ··· + X_i for 1 ≤ i ≤ n + 1, with S_0 ≡ 0. Let
Then we have
Figure 1.
Figure 2. (a) and (b).
P(C) =1
We will now use Tusnády's lemma (Lemma 3) to evaluate P(D_i | C_i). For this
purpose, let a_1, ..., a_i be Uniform (0, a + ib) rv's, and let A_0 ≡ [0, a), A_j ≡
[a + (i − j)b, a + (i − j + 1)b) for 1 ≤ j ≤ i. Then
P(D_i | C_i) = 1 − ib/(a + ib) = a/(a + ib).
Thus
(a/(a + ib)) (n choose i) (a + ib)^i (1 − a − ib)^{n−i}
Lemma 4. (Csáki and Tusnády, 1972) If, in the context of Lemma 3, M ≡ M_n
denotes the number of indices i, 1 ≤ i ≤ n, such that v_1 + ··· + v_i = i, then
(6) P(M ≥ m) = m! (n choose m) theta^m for 0 ≤ m ≤ n
and
(a) P(M ≥ m | v_0) = m! ((N − v_0) choose m) (1/N^m) on [v_0 > 0],
since the conditional probability of any A_i, 1 ≤ i ≤ N, given A_0' is 1/N. We now establish (b).
For m = 1, (b) is obvious, since both probabilities equal 1. For m > 1, letting
and
^-' n —1 7 ' j \ ^-^-;
(d) 1°n-1(m-1; 1/n)= ' 1 n I P;(m-1, 1/j).
\
But these sums are termwise equal since (n choose j)(1 − j/n) = ((n−1) choose j). Thus (b) holds.
Hence by (b) and the induction hypothesis,
P(M ≥ m | v_0 = 0) = P_N(m; 1/N)
= P_{N−1}(m − 1; 1/N) by (b)
= (m − 1)! ((N−1) choose (m−1)) (1/N)^{m−1} by induction
= m! (N choose m) (1/N)^m,
so that (a) holds on [v_0 = 0] too.
Now it follows easily from (a) and the fact that v_0 ≅ Binomial (N, 1 − N theta)
that
P(M ≥ m) = E P(M ≥ m | v_0) = (m!/N^m) E ((N − v_0) choose m) = m! (n choose m) theta^m,
as was claimed. Verify (7) by easy subtraction. ❑
Theorem 1. (Takács; Csáki and Tusnády) For 0 ≤ d < 1, c > 0, and all n ≥ 1,
the Birnbaum and Tingey theorem (Theorem 9.2.1) follows from the case c = 1,
d = lambda, i = 0 as a corollary. Similarly, upon noting that
I [N(A. =i
(5) LI^1 II^
„1 <^ – 1 a/ ]
(the intersection at t = 1 always counts 1 in this case), the special case c = lambda ≥ 1,
d = 1 − 1/lambda, and i = 1 yields Daniels's theorem (Theorem 9.1.2) as a corollary.
Note that ||(1 − G_n)/(1 − I)|| ≅ ||G_n/I||.
Moreover, noting that N_n(lambda, 1 − 1/lambda) equals the number of times that
1 − G_n(t) = lambda(1 − t) for 0 ≤ t ≤ 1, which in turn is distributed as the number of
times G_n(t) = lambda t for 0 ≤ t ≤ 1,
( ) 1
7 ^ as n -^ oo.
A'
(8) E_k ≡ [G_n(d + k/(nc)) = k/n],
so that
(a) P(E_k) = (n choose k) (d + k/(nc))^k (1 − d − k/(nc))^{n−k}.
(9) k 1 l(d+ncj
–tlj i, (nc)' (d+—
k/n
0
0 d 1
Figure 1.
Hence,
(b) = Sum_{k=i}^{[nc(1−d)]} P(exactly i occurrences among E_0, ..., E_{k−1} | E_k) P(E_k).
NUMBER OF INTERSECTIONS OF G_n WITH GENERAL LINE
Thus the assertion of the theorem follows upon substitution of (a) and (9)
into (b), coupled with a regrouping of the factorial terms. ❑
Exercise 2. (Takács, 1971; Csáki and Tusnády, 1972) Show that if c ≤ 1 is
fixed and d = u/n, u ≥ 0, then
(10) lim_{n→∞} P(N_n(c, u/n) ≥ i + 1) = ((u + i/c)/c) Sum_{k=i}^∞ ((u + k/c)^{k−i−1}/(k − i)!) e^{−u−k/c}, i = 0, 1, 2, ....
Exercise 3. (Csáki and Tusnády, 1972) Show that if lambda > 0 is fixed and
c = lambda(1 − v/n), then
In particular, with N_n(1, n^{−1/2} u) ≡ (the number of times U_n(t) = −u, 0 ≤ t ≤ 1),
for b > 0, a + b ≥ 0. Darling (1960) expands (13) to obtain the 1/sqrt(n) term.
Csáki and Tusnády (1972) obtain similar results for the empirical df of a
Poisson sum of independent Uniform (0, 1) rv's.
In this section we will see that the point at which U_n is maximized is uniformly
distributed over [0, 1]. Another kind of uniformity obtains for the slightly
different Û_n process defined below.
Let the smoothed empirical df Ĝ_n be defined by
with Ĝ_n linear on each of the intervals [(i − 1)/(n + 1), i/(n + 1)] for 1 ≤ i ≤
n + 1. The smoothed uniform empirical process is
the smoothed uniform quantile process. Note that E Û_n(i/(n + 1)) = 0 for all
1 ≤ i ≤ n.
Note that
so that U_n(t) is maximized at one of the points xi_{n:0}, xi_{n:1}, ..., xi_{n:n}, and that,
except on a set of measure zero, the maximizing point is unique on [0, 1). We
let
and
Note that
we also let
We now define
Also
Remark 1. We will follow Dwass (1959) for Theorem 1 and Pyke (1959) for
Theorem 2. We will not, however, prove the claim 0 < p_1 < ··· < p_K < 1. See
n-1
Pi = Y 1 n)ji( n - j)„-;-I/nn,
(
(17)
j_-n -i j+ 1 j
i !-1
(18) np -
; exp (-j) — oo,
as n-,
i =I ji
(19)
i -I
np„_ tee-Y exp(-j) (j
;
j=o
+ l)I as
1 n ! n -1 n i
(20) EK/n = 1+ n n+l E=o I
2
and the joint df of K and From (9), (11), and (20) it follows that
n ! n -I n i
(21) EIIvnII-
2n n+1 Y ^-•
The result (13) is due to Gnedenko and Mihalevič (1952). Cheng (1962) shows
that the number J ≡ J_n of indices i for which U_n(xi_{n:i}) ≥ 0 satisfies
1 i 1 n \ i 1 / i \ ”-'
(22) P(J=j)=- Y_- I- 11- - I for 1j<_ n.
i-1 1
(a) X_i ≡ G_n(i/(m+1)) − G_n((i−1)/(m+1)) − 1/(m+1) for 1 ≤ i ≤ m + 1,
and note that X_1, ..., X_{m+1} are symmetrically dependent. Let L ≡ L_m and
LI I 2n
and k
(b)n m+1 cm +1 m+1 c m +1
for all m ≥ some m_0 (note Figure 1). We will verify below that the S_i ≡
X_1 + ··· + X_i satisfy
Figure 1. (n jumps; m + 1 subdivisions of [0, 1].)
Thus Andersen's corollary (Corollary 9.4.1) yields that L and N are both
uniformly distributed on 0, 1, ..., m. Thus
Thus (11) and (13) hold, subject to proving (c) in the next paragraph.
We will establish (c) only when m + 1 is a prime; and thus we note that
(d) holds when m + 1 → ∞ through the primes. We now establish (c). Assume
some S_i = 0. Letting J ≡ nG_n(i/(m+1)), this means J/n = i/(m+1), or (m+1)J = ni. But m + 1 is prime, so n divides J. Thus J = n, and hence m + 1 = i.
But 0 ≤ i ≤ m. Thus we have a contradiction of the assumption that some S_i = 0.
Thus (c) holds.
The result (12) follows from (11) and the fact that
max_{1≤i≤n} |i/(n + 1) − xi_{n:i}| ≤ ||G_n − I|| + 1/(n + 1) →_a.s. 0.
(3) P(T < ∞) = 1
follows immediately from the SLLN and the representation of the Poisson
process in terms of exponential interarrival times. Following Lemma 1 below,
we show that
Now for [T = n] to occur we must have [N(n) = n] and [N(t + n) < t + n for
all t > 0]; conditional on [N(n) = n], N(· + n) − n ≅ N; thus, using independent
increments,
Conversely, given a function f_n on G_n for all n ≥ 1 and making an appropriate
definition of f_0, we can determine a function f on N satisfying (8).
Theorem 1. (Dwass) If
then
Proof. Now, using (9) and (5) for (a) and (b),
(a) Sum_{n=0}^∞ a_n (lambda e^{−lambda})^n (1 − lambda) = Sum_{n=0}^∞ E f_n(G_n) P(T = n)
Equating the coefficients of the power series (a) and (b) gives (10). ❑
Figure 1. Arrows mark the renewal points leading to the geometric distribution of N.
DWASS'S APPROACH TO G. BASED ON POISSON PROCESSES 391
As in (10), we conclude that
(b) P(N_n ≥ k) = (n!/(n − k)!) (1/n)^k
(13) = k! (n choose k) (1/n)^k,
which agrees with the fact that N_n ≅ N_n(1, 0) in (9.5.3). It is an easy application
of Stirling's formula that
Example 2. (Ladder points of U ") Let L denote the number of ladder points
of IN( t) — t for t> 0; that is, L denotes the number of times that N( t) — t achieves
positive maxima which exceed all preceding maxima (note Figure 2). Since
Figure 2. Arrows mark the ladder points, while J_1, J_2, ... indicate the newly achieved excesses
over previous maxima.
the ladder points are also "renewal points", it is again clear from (4) that L
is a Geometric (A) ry satisfying
Thus the calculations of Example 1 show that the number L n of ladder points
of the empirical df G. satisfies
(16) P(L_n ≥ k) = k! (n choose k) (1/n)^k for k ≥ 0,
while
(17) P(L_n/sqrt(n) ≥ x) → exp (−x²/2) as n → ∞
for all x ≥ 0. ❑
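Formulas (13) and (16) give the same tail k! (n choose k) n^{−k}. The sketch below evaluates it and compares with a Monte Carlo count of the lattice intersections N_n, taking N_n = #{i : exactly i of the xi's lie at or below i/n} (our reading of Example 1; the intersection at t = 1, i = n, always counts).

```python
import math
import numpy as np

rng = np.random.default_rng(5)

def p_geq(n, k):
    # Formulas (13)/(16): P(N_n >= k) = P(L_n >= k) = k! * C(n,k) * n^(-k)
    return math.factorial(k) * math.comb(n, k) / n ** k

n, reps = 4, 40000
xi = rng.uniform(size=(reps, n))
# counts[r, i] = number of points in replication r at or below (i+1)/n
counts = (xi[:, :, None] <= np.arange(1, n + 1) / n).sum(axis=1)
Nn = (counts == np.arange(1, n + 1)).sum(axis=1)
print((Nn >= 2).mean(), p_geq(4, 2))   # both approximately 0.75
```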
392 SOME EXACT DISTRIBUTIONS
Exercise 2. Prove (18). Then use (18) and Lemma 2 below to rederive the
Birnbaum and Tingey formula (9.2.1).
N. Clearly, I,, ... , I,j [N ? k] are iid because of the strong Markov property
of the Poisson process.
Proof. Let 0 < x < 1. Let f_J denote the conditional density of J given [T > 0].
Let f_n denote the density of the nth waiting time eta_n. We must show
(a) f_J(x) = Sum_{n=1}^∞ [1 − (n − 1)/(n − x)] f_n(n − x)/P(T > 0).
Figure 3.
has probability [1 − (n − 1)/(n − x)]. Thus
as claimed. ❑
We define the ladder excesses J_1, J_2, ... as in Figure 2; thus J_i is the amount
by which the excess at the ith ladder point exceeds the excess at the (i − 1)st
ladder point. Again note that J = J_1 in Figure 2. It is again clear that
J_1, ..., J_k | [L ≥ k] are iid rv's.
We now summarize these distributional results we have established for the
excesses I_1, I_2, ... and the ladder excesses J_1, J_2, ....
(i) Given that N ≥ k > 0, the rv's I_1, ..., I_k are conditionally iid Uniform
(0, 1).
(ii) Given that L ≥ k > 0, the rv's J_1, ..., J_k are conditionally iid Uniform
(0, 1).
for independent Uniform (0, 1) rv's xi_1, xi_2, ..., independent of L. Verify that
n n n
P ,max/ = 1)'(x — i)/n!
1 r=o()(-
i
Example 3. (Excesses and ladder excesses of n(G_n − I)) Let I_{n1}, I_{n2}, ... be
the excesses and J_{n1}, J_{n2}, ... be the ladder excesses of n(G_n − I) [just replace
N(t) and t by nG_n(t) and nt in Figures 1 and 2, respectively]. Let N_n and L_n
be as in Examples 1 and 2, respectively, and let
so that
This yields
Likewise
so that
and
(28) P(S"/V > t)-+exp (-2t2) as n-^oc for all t-0. ❑
      (λ e^{−λ})ⁿ (1 − λ) ···

      Σ P([(J_{n1}, ..., J_{nk}) ∈ B] ∩ [L_n ≥ k]) / nⁿ ···

(d)   P([(J_{n1}, ..., J_{nk}) ∈ B] ∩ [L_n ≥ k]) = P((ξ₁, ..., ξ_k) ∈ B) P(L_n ≥ k).
and

satisfies

(32)   P(N_n(c) ≥ k) = k! C(n, k) (1/(nc))^k   for k = 0, 1, ... and c ≥ 1.

This is an alternative proof of Daniels's theorem (Theorem 9.1.2) and a special
case of Csáki and Tusnády's theorem (Theorem 9.5.1). Dwass also considers
for c > 1, as well as excesses and ladder excesses for c > 1. He also makes
some limited remarks in the two-sided cases.
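Daniels's theorem (Theorem 9.1.2), cited here, gives the exact value P(‖G_n/I‖ ≥ λ) = 1/λ for λ ≥ 1 and every n, and since the supremum of G_n(t)/t is attained at an order statistic, it is easy to check by simulation (a sketch; the sample size, λ, and seed are arbitrary choices):

```python
import random

random.seed(1)

def sup_ratio(n):
    # sup_t G_n(t)/t is attained just at an order statistic: max_i (i/n)/xi_(i)
    xs = sorted(random.random() for _ in range(n))
    return max((i + 1) / n / x for i, x in enumerate(xs))

n, lam, reps = 50, 2.0, 4000
freq = sum(sup_ratio(n) >= lam for _ in range(reps)) / reps  # target: 1/lam = 0.5
```

The empirical frequency should sit near 1/λ for every n, not just in the limit.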
between. A crossing from below is said to occur in the interval (i/(n + 1),
(i + 1)/(n + 1)] for i = 1, ..., n if
We let K_n denote the number of such crossings from below. A crossing from
above occurs in (i/(n + 1), (i + 1)/(n + 1)] for 1 ≤ i ≤ n − 1 if
crossing from above. For technical reasons, a crossing from below on (n/(n +
1), 1] is not called a crossing. Let M_n denote the number of crossings.

(37)   P(K_n/√n ≥ t) → exp(−e² t²/2)   as n → ∞ for all t ≥ 0

and

(38)   P(M_n/√n ≥ t) → exp(−e² t²/8)   as n → ∞ for all t ≥ 0.
Some additional results, from a different point of view, are stated in Darling
(1960).
Lemmas
(39)   M(X₁, X₂, ...) = max(X₁, X₁ + X₂, ...)   and   M⁺ = max(0, M).

(40)   P(M < 0) = −E X₁.

Thus

(b)   E M = E(M⁺ 1_{[M≥0]}) − P(M < 0),

and
      P(N(1) = k) = λᵏ e^{−λ}/k!
         = Σ_{n≥k} P(N(1) = k, t − (u + 1) is last crossed at height n),
which proves (41) in the special case that u>_0. Now write
      Σ_{n≥k} ((n + u)^{k−1}/(n − k)!) (λ e^{−λ})ⁿ ···
(g)      = (λᵏ e^{λu} − (λ e^{−λ}) λᵏ e^{λ(u+1)}) / (1 − λ)
         = λᵏ e^{λu},
which proves (42) also in the case that u ≥ 0. It follows that the left-hand sides
of (41) and (42) have absolutely convergent power-series expansions in u for
all u. Hence, by analytic continuation, both sides of (41) and (42) are equal
for all u. ❑
8. LOCAL TIME OF U_n

Let
Thus L_n serves as local time or "occupation density" for the empirical process
U_n.

(3)   ∫₀¹ f(U_n(t)) dt = ∫_{−∞}^{∞} f(x) 2L_n(x) dx.
      ∫_{{0 ≤ t ≤ 1: U_n(t) ∈ A}} dt
         = Σ_{i=0}^{n} ∫_{{ξ_{n:i} ≤ t < ξ_{n:i+1}: √n(i/n − t) ∈ A}} dt
         = n^{−1/2} Σ_{i=0}^{n} ∫_{{√n(i/n − ξ_{n:i+1}) < s ≤ √n(i/n − ξ_{n:i}): s ∈ A}} ds
              [letting s = √n(i/n − t)]
(a)      = n^{−1/2} Σ_{i=0}^{n} ∫_A 1_{(√n(i/n − ξ_{n:i+1}), √n(i/n − ξ_{n:i})]}(s) ds
(b)      = ∫_A n^{−1/2} {the number of times U_n(t) equals s} ds.

Note that (b) holds since the graph of U_n(t) is a series of n + 1 line segments
of slope −√n, and (a) simply counts the number of these segments that contain
each fixed s in A. ❑
(4)   ∫₀¹ [U_n(t)]² dt = 2 ∫ x² L_n(x) dx,

so that the classical Cramér-von Mises statistic may also be viewed as the
second moment of the local time (density) 2L_n. [Set f ≡ 1 in (3) to see that
the word density is appropriate.] ❑
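Identity (4) can be confirmed numerically: the left side by integrating U_n² on a fine grid, the right side in closed form by integrating x² over the n + 1 segments of slope −√n (a sketch under the segment bookkeeping used above; n, the seed, and the grid size are arbitrary):

```python
import bisect
import math
import random

random.seed(7)
n = 20
xs = sorted(random.random() for _ in range(n))
knots = [0.0] + xs + [1.0]

def U(t):
    # U_n(t) = sqrt(n) * (G_n(t) - t), with G_n(t) = #{x_i <= t} / n
    return math.sqrt(n) * (bisect.bisect_right(xs, t) / n - t)

m = 100000
lhs = sum(U((j + 0.5) / m) ** 2 for j in range(m)) / m  # grid value of the left side

# right side of (4): on [xi_(i), xi_(i+1)) the process runs linearly from
# sqrt(n)(i/n - xi_(i)) down to sqrt(n)(i/n - xi_(i+1)); integrate x^2 over each piece
rhs = sum((n / 3.0) * ((i / n - knots[i]) ** 3 - (i / n - knots[i + 1]) ** 3)
          for i in range(n + 1))
```

The two values agree to the accuracy of the grid.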
      2L_n(x) = n^{−1/2} Σ_{i=0}^{n} 1_{(√n(i/n − ξ_{n:i+1}), √n(i/n − ξ_{n:i})]}(x).
Exercise 3. Use the result of Exercise 2 together with the joint density of two
uniform order statistics to show that, for x ≥ 0,

(6)   E(2L_n(x)) = n^{−1/2} Σ_{i=0}^{n} C(n, i) (i/n − n^{−1/2}x)₊^i (1 − i/n + n^{−1/2}x)^{n−i}

and

(7)   E(2L_n(x)) → ∫₀¹ (2π u(1 − u))^{−1/2} exp( −x²/(2u(1 − u)) ) du
                 = (4x/√(2π)) ∫_{2x}^{∞} e^{−y²/2} / ( y √(y² − 4x²) ) dy
(thanks to M. Gutjahr and E. Haeusler), and use the DKW inequality for
Exercise 4. Show that for any real x and for all y > 0

(10)   P(‖(U − x)⁺‖ > y) = P(‖U⁺‖ > x + y).
      L_n ⇒ L,

where L denotes the local time of the Brownian bridge U, satisfying, for Borel
sets A,
(2)   P(D⁺_nn ≥ d) = C(2n, n + nd) / C(2n, n)

and

(3)   P(D_nn ≥ d) = 2 [ C(2n, n + nd) − C(2n, n + 2nd) + C(2n, n + 3nd) − ···
                        + (−1)^{J+1} C(2n, n + Jnd) ] / C(2n, n),

where J = [1/d], the greatest integer in 1/d.
and
(5)   P(√(n/2) D_nn ≥ x) → 2 Σ_{k=1}^{∞} (−1)^{k+1} exp(−2(kx)²)   as n → ∞.
k=1
These two rather elementary theorems will be proved at the end of this
section.
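Formula (2) is explicit, so it can be compared directly with the limit e^{−2x²} of (4); in the sketch below, n and the integer a = nd are arbitrary choices:

```python
import math

def exact_upper(n, a):
    # (2): P(D+_nn >= a/n) = C(2n, n+a) / C(2n, n), with a = nd an integer
    return math.comb(2 * n, n + a) / math.comb(2 * n, n)

n, a = 50, 7
x = a / math.sqrt(2 * n)          # then sqrt(n/2) * d = x
p_exact = exact_upper(n, a)
p_limit = math.exp(-2 * x * x)    # limit from (4)
```

Already at n = 50 the exact probability and the limit agree to about 1%.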
Note the identity
where N = m + n
(9)   √(mn/N) (F_m − G_n) = ···   for all m, n,

and

(10)   √(mn/(m + n)) D_mn →_d ‖U‖   as m ∧ n → ∞.
Combining this last result with (4) and (5) gives the limiting distribution,
as already recorded in Example 3.8.1.
Proof of Theorems 1 and 2. Now the number of distinct orderings of n X's
and n Y's is C(2n, n), and these are equally likely. Thus following the graph of
n[F_n(t) − G_n(t)] from t = 0 to t = 1 is the same as taking 2n steps from left to
right starting at the origin, where at n of these steps we also go one step up
and at the other n we also go one step down. (See the dark graph in Fig. 1.)
Note that there are C(2n, n) such "random walks" or "paths," and they inherit an
equally likely distribution. Thus

(a)   P(D⁺_nn ≥ d) = N⁺_nn / C(2n, n),
where N⁺_nn denotes the number of such paths that reach a height of at least
nd. But any path that reaches height nd does so for a first time, and when the

[Figure: a path reflected about height nd ends at height 2nd.]
Figure 1.
THE TWO-SAMPLE PROBLEM 403
path from this first time on is reflected about height nd (see the dotted part
of the graph in Fig. 1) it will eventually end up at height 2nd. We thus note
that there is a one-to-one correspondence between all paths from (0, 0) to
(2n, 0) that ever reach height nd and all paths from (0, 0) to (2n, 2nd). But
the latter set of paths correspond to n + nd steps up and n — nd steps down;
there are (nd) such paths. Thus
_ 2n
(b) Nnn
n + nd .
Plugging (b) into (a) gives (2).
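The reflection step (b) can be verified by brute force for small n: enumerate all C(2n, n) balanced paths, count those that reach height h, and compare with C(2n, n + h). (A sketch; n and h are arbitrary small values.)

```python
import math
from itertools import combinations

def paths_reaching(n, h):
    # enumerate all C(2n, n) paths with n up-steps and n down-steps,
    # counting those whose partial sums ever reach height h
    count = 0
    for ups in combinations(range(2 * n), n):
        s, level = set(ups), 0
        for step in range(2 * n):
            level += 1 if step in s else -1
            if level >= h:
                count += 1
                break
    return count

n, h = 5, 2
lhs = paths_reaching(n, h)          # direct count
rhs = math.comb(2 * n, n + h)       # reflection: paths (0,0) -> (2n, 2h)
```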
Now when m = n, formula (2) and Stirling's formula (Formula A.9.1) give

      P(√(n/2) D⁺_nn ≥ x) = C(2n, n + √(2n)x) / C(2n, n)
            (with equality when √(2n)x is an integer)

         = n! n! / [ (n + √(2n)x)! (n − √(2n)x)! ]

         ~ (n^{n+1/2} e^{−n})² / [ (n + √(2n)x)^{n+√(2n)x+1/2} e^{−(n+√(2n)x)}
              × (n − √(2n)x)^{n−√(2n)x+1/2} e^{−(n−√(2n)x)} ]

         = 1 / [ (1 + √2 x/√n)^{n+√(2n)x+1/2} (1 − √2 x/√n)^{n−√(2n)x+1/2} ]

         = 1 / [ (1 − 2x²/n)^{n+1/2} { (1 + √2 x/√n)/(1 − √2 x/√n) }^{√(2n)x} ]

(c)      → 1 / [ e^{−2x²} · e^{4x²} ] = e^{−2x²}.
This establishes (4). The proofs of (3) and (5) are outlined in the next exercise.
Exercise 1. Use Figure 1 and a reflection principle [see Fig. 2.2.3 and equation
(b) of the proof of (2.2.7)] to establish (3). Then use (3) to prove (5).
CHAPTER 10
0. SUMMARY
as well as Chang's (1955) exact expression for P(‖I/G_n‖ ≥ λ). These, when
coupled with a bound on the latter (Inequality 1 in Section 3), provide upper
and lower bounds on G_n, respectively; in particular they imply that for any
ε > 0 there exists a constant M_ε such that both

and

occur with probability exceeding 1 − ε. This last pair of results, very useful
technically in establishing →_d of many statistics [as in Pyke and Shorack
(1968)], is recorded separately for greater visibility in Section 4. We refer to
(2a) and (2b) as establishing the existence of "in probability linear bounds."
In Section 1, we examine from an a.s. point of view just how small and
how large ξ_{n:k} (with k fixed) can be; see Robbins and Siegmund (1972) and
Kiefer (1972), respectively. This seems an appropriate chapter for these results.
[Figure: the empirical df G_n, a staircase rising by 1/n at each order statistic
ξ_{n:1}, ξ_{n:2}, ξ_{n:3}, ..., ξ_{n:n−1}, ξ_{n:n}, with heights 1/n, 2/n, 3/n, ..., (n−1)/n, 1.]
Figure 1.
and
thus
and
That is, G_n is a.s. squeezed between a pair of nearly linear functions; see
Wellner (1977b) and Fears and Mehra (1974) for applications. Functions
differing only logarithmically from linearity [see (5a, b)] are possible, but the
results similar to (4) are recorded in a separate Section 6 for greater visibility.

406 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G_n
The best upper and lower a.s. bounds we obtain are
Of course, these immediately translate into lower and upper a.s. bounds on
G'. In particular, (5b) and (5a) yield
and
Note that Eψ(ξ) < ∞ is equivalent to ∫ ψ⁻¹(t) dt < ∞, and this may be helpful
-
(7b) o l
ALMOST SURE BEHAVIOR OF ξ_{n:k} WITH k FIXED 407
This yields either ‖ψ‖ or ∞ as the a.s. lim sup of ‖G_n ψ‖ for ψ ↘. In Section
6, we establish the parallel results that
while
Wellner's results for truncation to ‖·‖_{a_n}^{1} carry over to this problem also; see
Section 5. Also found in Section 5 is Chang's (1955) result that
In a number of our theorems a key role is played by the first few order statistics.
In this section we will characterize both how small (see Theorem 1) and how
large (see Theorem 2) the sequence ξ_{n:k} may be.
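Before the theorems, it may help to recall the classical fact that n ξ_{n:1} →_d Exponential(1) (indeed P(n ξ_{n:1} > t) = (1 − t/n)ⁿ exactly); the sketch below checks this by simulation with arbitrary parameters:

```python
import math
import random

random.seed(3)
n, reps, t = 200, 5000, 1.0
# xi_{n:1} = min of n uniforms; n * xi_{n:1} is approximately Exponential(1)
hits = sum(n * min(random.random() for _ in range(n)) > t for _ in range(reps))
freq = hits / reps
exact = (1 - t / n) ** n  # P(n xi_{n:1} > t) exactly; tends to exp(-t)
```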
The convergence part of (1) does not require a_n or n a_n to be ↘. Recall that
[n G_n(a_n) ≥ k] = [ξ_{n:k} ≤ a_n].

(2)   P(ξ_{n:k} ≤ a_n* i.o.) = 0 or 1   according as Σ_n (n a_n*)ᵏ/n < ∞ or = ∞.

Set i = 1 to conclude that

(3)   lim sup_{n→∞} log(1/(n ξ_{n:k})) / log₂ n = 1/k   a.s.;
Corollary 1. Let X₁, ..., X_n be iid F. Let k ≥ 1 be a fixed integer and let
n P(X > c_n) → 0 with c_n ↗. Then (the convergence part does not require c_n to
be ↗)

(4)   P(X_{n:n−k+1} > c_n i.o.) = 0 or 1   according as Σ_n n^{k−1} P(X > c_n)ᵏ < ∞ or = ∞.

Note (1.1.15) for the proof, and note (10.7.9) for contrast.
Note that [n G_n(a_n) < k] = [ξ_{n:k} > a_n].
(6)   lim sup_{n→∞} n ξ_{n:k} / log₂ n = 1   a.s. for each fixed integer k;

      n a_n* = log₂ n + k log₃ n + Σ_{j=4}^{i} log_j n + (1 + ε) log_{i+1} n.
(The sum is interpreted as 0 unless i ≥ 4.) Then

(7)   P(ξ_{n:k} ≥ a_n* i.o.) = 0 or 1   according as ε > 0 or ε ≤ 0.
Now observe (Exercise 6 below asks the reader to verify the trivial details)
that since a_n → 0
thus the Borel-Cantelli lemma implies that the rhs of (b) has probability 0.
Hence the lhs of (b) has probability 0.
Suppose Σ_n (n a_n)ᵏ/n = ∞. Then Σ_i (2ⁱ a_{2ⁱ})ᵏ = ∞ by Proposition A.9.3 if
n a_n ↘, and by a minor modification of the proof of Proposition A.9.3 if a_n ↘.
Define
Exercise 5. Show that if a_n ↘ 0, then [n G_n(a_n) ≥ k i.o.] ⊂ [(n − 1) G_{n−1}(a_n) ≥
k − 1 and ξ_n ≤ a_n i.o.] for any fixed integer k ≥ 1.
"Proof" of Theorem 2. The proof of this is very long. See Robbins and
Siegmund (1972) for k = 1. That proof has been modified for k > 1; though the proof
is given in Shorack and Wellner (1977), the result is just stated in Shorack
and Wellner (1978). See Frankel (1976) for an alternative method. Galambos
(1978, p. 214) gives nearly the same result for k = 1. Our proof of Chung's
theorem (Theorem 13.1.2) contains much of the same flavor. ❑
Proof of Corollary 1. Now by the inverse transformation (1.1.15)
=
1 0 as <co
and we seek to characterize those ψ for which ‖(G_n − I) ψ‖ → 0 as n → ∞.

(2)   lim sup_{n→∞} ‖(G_n − I) ψ‖ = 0 a.s. or ∞ a.s.   according as ∫₀¹ ψ(t) dt < ∞ or = ∞.

Corollary 1. If ψ ↘, then

(3)   lim sup_{n→∞} ‖G_n ψ‖ = ‖ψ‖ or ∞   a.s. according as ∫₀¹ ψ(t) dt < ∞ or = ∞.
Proof. By symmetry we may suppose ψ ↘ on (0, 1). Suppose first that
∫₀¹ ψ(t) dt < ∞. (We will follow Wellner, 1977a.) Note that
Also

      (1/n) Σ_{i=1}^{n} ψ(ξ_i) 1_{(0,δ]}(ξ_i) →_{a.s.} ∫₀^δ ψ(t) dt   as n → ∞ by the SLLN
         < ε   by (b);

thus we have
Also
-o \(2n) 21 '
Exercise 2. (Lai, 1974) Derive an analog of (2) for lim sup_{n→∞} ∫₀¹ (G_n − I)² ψ dI.
The distributions of ‖G_n/I‖ and ‖I/G_n‖ were obtained in Theorems 9.1.2
and 9.1.3, respectively. Now we examine the limit distribution corresponding
to Theorem 9.1.3 and give some inequalities.
      P(‖I/G_n‖ ≥ λ) = P( max_{1≤i≤n−1} ξ_{n:i+1}/(i/n) ≥ λ )

(1)   = ···

(2)   → e^{−λ} + Σ_{i=1}^{∞} (i^{i−1}/i!) (λ e^{−λ})^i ···

(3)   = e^{−λ} + 1 − e^{−λ*},   where 0 < λ* ≤ 1 satisfies λ* e^{−λ*} = λ e^{−λ}.
_(_)n + )^”>
(a)
(': n n
a" . ;
r=o
Note that
and complete our proof, provided we can exhibit a sequence b_i for which

(b)   a_{ni} ≤ b_i   for all n ≥ 0, where Σ_{i=0}^{∞} b_i < ∞.

Clearly, a_{n0} ≤ b₀ = exp(−λ) for all n ≥ 1. For 1 ≤ i ≤ (n/λ) we have
( iA ) ' [ ,__ J
an,— (i ) (n) n !'
I I
by Lemma I below, and el e - 'A <_ I
1 —i /n i
1 1
sinceis(n/a)<n/a.
-1/A i3-2
b.
;
( n) 1 '— 1
^ andforAz2wehave 1 1— fi I"
i n I! fl J
for0^i^n/a.
)i (} 1 n] 2ar i(1—i/n)
< (^^ .
nJ -
n +i/2
n
e e i /12n
2Tr A i [n —^li] n-i
i.+i/2 e
-` 27I (n — i) "2 e - ^" ,2 n n "-'
ei /12n
1 n —Ai
'
as was claimed. ❑
^1 e'/Q = [(l
na ^ ex/Qlx i/n
\ \ a
where
Now
aJ
INEQUALITIES FOR THE DISTRIBUTIONS OF ‖G_n/I‖ AND ‖I/G_n‖ 415
and
We must show that g_a(x) ≥ 0 for all 0 ≤ x ≤ a and 0 < a < 1.

[Figure: the sign pattern of g_a and its derivatives on [0, a], with g_a = g_a' = 0 at
x = 0 and a sign change at x = 2a − 1.]
Figure 1.
Now for all 0 < a < 1 we have g_a(0) = g_a'(0) = 0; while g_a''(0) = (2a − 1)/a²
is ≤ 0 for 0 < a ≤ 1/2. Since g_a''(x) < 0 for all 0 < x < a when 0 < a ≤ 1/2, we have
Exercise 1. (Chang, 1955) Show that (3) is true. [Hint: Use Lagrange's
formula; see Whittaker and Watson, 1969, p. 133.]
Inequalities
Inequality 2. (Wellner) For all λ ≥ 1, 0 < a ≤ 1, and 0 ≤ b < 1 we have
where

(9)   h(x) = x(log x − 1) + 1   for x > 0, and
      h̃(x) = x h(1/x) = x + log(1/x) − 1   for x > 0.
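The functions h and h̃ of (9) are elementary to compute; the sketch below checks h(1) = 0, positivity away from 1, and the identity h̃(x) = x h(1/x) numerically:

```python
import math

def h(x):
    # (9): h(x) = x(log x - 1) + 1; h(1) = 0, h is convex, h'(x) = log x
    return x * (math.log(x) - 1) + 1

def h_tilde(x):
    # (9): h~(x) = x + log(1/x) - 1, which equals x * h(1/x)
    return x + math.log(1 / x) - 1

checks = [abs(x * h(1 / x) - h_tilde(x)) for x in (0.25, 0.5, 2.0, 5.0)]
```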
      [‖I/G_n‖_a^1 ≥ λ]
         = [t ≥ λ G_n(t) for some a ≤ t ≤ 1]
         = [−G_n(t)/t ≥ −1/λ for some a ≤ t ≤ 1]

satisfies

(10)   [‖I/G_n⁻¹‖ ≥ λ] = [‖G_n/I‖ ≥ λ]
and

(11)   [‖G_n⁻¹/I‖_b ≥ λ] = [‖I/G_n‖_b ≥ λ]. ❑
Exercise 2. Prove the event identities (10) and (11).
for all λ ≥ 1. Verify these claims. (It should be noted that this inequality
is not strong enough to permit characterizing the upper-class sequences of
‖I/G_n‖.)
Exercise 6. Generalize (10) and (11) by replacing I by a monotone
function g.
The Glivenko-Cantelli theorem (Theorem 3.1.3) says that as n gets large, the
empirical df becomes nearly linear in the sense that ‖G_n − I‖ → 0 a.s. as n → ∞.
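Using ‖G_n − I‖ = max_i max(i/n − ξ_{n:i}, ξ_{n:i} − (i − 1)/n), the Glivenko-Cantelli convergence is easy to watch in simulation (a sketch with arbitrary sample sizes and seed):

```python
import random

random.seed(11)

def dn(n):
    # ||G_n - I|| = max_i max(i/n - xi_(i), xi_(i) - (i-1)/n)
    xs = sorted(random.random() for _ in range(n))
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

avg = {n: sum(dn(n) for _ in range(200)) / 200 for n in (100, 1600)}
```

The averages shrink roughly like 1/√n.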
IN-PROBABILITY LINEAR BOUNDS ON G_n 419
[Figure: the lines t/M and Mt on [0, 1], forming a narrow wedge about the diagonal.]
Figure 1.
This result is not nearly as strong as what is actually true. In this section we
show that we can choose the slopes of the lines in Figure 1 so that the narrow
central strip with pointed ends will include G_n with very high probability.
and
(6) lim 1 I/G„ 1 ? iii neni2 = oo a.s.
420 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G,
(1)   P(‖G_n/I‖ ≥ λ_n i.o.) = 0 or 1   according as Σ_n 1/(n λ_n) < ∞ or = ∞.
Theorem 2. Let λ_n/n ↘ and suppose either λ_n ↗ or lim_{n→∞} λ_n/log₂ n ≥ 1.
Then

(2)   P(‖I/G_n‖ ≥ λ_n i.o.) = 0 or 1   according as Σ_n 1/(n λ_n) < ∞ or = ∞.
Corollary 1.

(3)   lim sup_{n→∞} log ‖G_n/I‖ / log₂ n = 1   a.s.
Corollary 2.

(4)   ‖G_n/I‖ = max_{1≤k≤n} k/(n ξ_{n:k}),

and the limit in (16.1.3) is ↘ in k. These two facts indicate that it is the small
values of ξ_{n:k} that control the largeness of ‖G_n/I‖. Note, however, that (n ξ_{n:1})⁻¹
and ‖G_n/I‖ have different limiting distributions.
UPPER-CLASS SEQUENCES FOR ‖G_n/I‖ AND ‖I/G_n‖ 421

and (10.1.6) being constant in k suggests that the k = 2 term controls the
maximum. In fact, Theorems 10.1.2 and 2 show that for λ_n/n ↘ and λ_n ↗
Now define

      A_k = [ max_{n_k < n ≤ n_{k+1}} ‖G_n/I‖ ≥ λ_{n_k} ];

      P(A_k) ≤ P( ‖G_{n_{k+1}}/I‖ ≥ n_k λ_{n_k}/n_{k+1} )

(b)      = n_{k+1}/(n_k λ_{n_k}),

where the last equality follows from Daniels' theorem (Theorem 9.1.2). Com-
bining (a) and (b), we have Σ P(A_k) < ∞; so that

(c)   P(A_k i.o.) = 0

by the Borel-Cantelli lemma. But (c) implies that P(‖G_n/I‖ ≥ λ_n i.o.) = 0.
Now suppose Σ 1/(n λ_n) = ∞. Let

      A_n = [ ‖G_n/I‖ ≥ λ_n ]
          ⊃ [ sup_{0<t≤1} (1/n) 1_{[ξ_n, 1]}(t)/t ≥ λ_n ]
          = [ 1/(n ξ_n) ≥ λ_n ]

(d)   = D_n.
422 BOUNDS ON THE EMPIRICAL DISTRIBUTION FUNCTION G.
Since the events D_n are independent and

      Σ_{n=1}^{∞} P(D_n) = Σ_{n=1}^{∞} 1/(n λ_n) = ∞,
An=LMn—An ,
where
To see this, suppose ξ_{n+1} falls between ξ_{n:k} and ξ_{n:k+1} for 0 ≤ k ≤ n; then
      ξ_{n+1:i+1} = ξ_{n:i+1}   for 1 ≤ i ≤ k − 1,
      ξ_{n+1:k+1} = ξ_{n+1} ≤ ξ_{n:k+1},   and

so that (7) is established. The monotonicity of (7) and the fact that λ_n/n is
↘ yield
      ∪_{n=n_j}^{n_{j+1}} A_n ⊂ ∪_{n=n_j}^{n_{j+1}} [ M_n/n ≥ λ_{n_{j+1}}/n_{j+1} ] = B_j.

Thus P(A_n i.o.) = 0 will follow from the Borel-Cantelli lemma once we show
that

(c)   Σ_j P(B_j) < ∞.

It thus suffices to prove (c).
Let d_n = λ_n ∧ 2 log₂ n. Then

      B_j ⊂ D_j = [ M_{n_j} ≥ (n_j/n_{j+1}) d_{n_j} ],

and

      Σ_{j=2}^{∞} P(D_j) ≤ Σ_{j=2}^{∞} exp(−d_{n_j}) exp( (1 − n_j/n_{j+1}) d_{n_{j+1}} ).
      ‖I/G_n‖_{ξ_{n:1}}^{1} ···

and from Theorem 10.1.2 we have that P(n ξ_{n:2} ≥ λ_n i.o.) = 1 in case (a) fails. ❑
      n ‖G_n/I‖ is ↗ in n.
Behavior on [a_n, 1]

We consider briefly limit theorems for ‖G_n/I‖_{a_n}^{1}, ‖(G_n − I)/I‖_{a_n}^{1}, and associ-
ated rv's as a_n ↓ 0. The first claim of (8) below is from Chang (1955); the other
results are from Wellner (1978b).
If a_n ↓ 0 and n a_n → ∞, then

(8)   ‖(G_n − I)/I‖_{a_n}^{1} →_p 0   and   ‖(G_n⁻¹ − I)/I‖_{a_n}^{1} →_p 0   as n → ∞,
where β_c ↘ from ∞ to 1 as c ↗ from 0 to ∞, and 1/β_c⁻ equals ∞ on (0, 1] and
↘ from ∞ to 1 as c ↗ from 1 to ∞. Also
where 1/γ_c and γ_c' both ↘ from ∞ to 1 as c ↗ from 0 to ∞. From (12) and
(13) the behavior of ‖(G_n − I)/I‖_{a_n} and ‖(G_n⁻¹ − I)/I‖_{a_n} can be determined.
If c_n → 0, then

Another Extension

We report here results from Mason (1981b). He shows that for a_n ≥ 0 and ↘
we have for each 0 ≤ ν ≤ 1/2 that

(15)   P( ‖ n^ν (G_n − I) / [I(1 − I)]^{1−ν} ‖_{a_n} ≥ λ i.o. ) ···
For ν = 1/2, this is a result due to Csáki (1975) that will be considered in detail
in Theorem 16.2.3. For ν = 0, this is equivalent to Theorem 10.5.1 of Shorack
and Wellner. For 0 < ν < 1/2, this is due to Mason (1981); he uses the method
of proof of Shorack and Wellner, coupled with his following extension of
Daniels's theorem (Theorem 9.1.2).
For each 0 ≤ ν < 1/2 there exists a constant c_ν such that for all λ > 1 we have

(16)   P( ‖ n^ν (G_n − I) / [I(1 − I)]^{1−ν} ‖ ≥ λ ) ≤ c_ν ···
Exercise 4. Establish (15) and (16). [See Mason, 1981b and Marcus and Zinn,
1983 for the two parts of (16).]

(17)   lim sup_{n→∞} log ‖ (G_n − I) / [I(1 − I)]^{1−ν} ‖ / log₂ n = 1 − ν   a.s.
We now improve the result of Section 10.4 to an a.s. bound by using bounding
functions having slopes 0 (lower bound) and ∞ (upper bound) at t = 0.

Theorem 1. (Wellner) Fix 0 < δ < 1. Let ε > 0 be given. Then a.s. for n
exceeding some n₀ we have

and

Proof. Equation (1) follows immediately from Corollary 10.2.1 with ψ(t) =
t^{1−δ} and Theorem 2 below; then (2) follows from the trivial identities of
Exercise 10.3.6. This is proven via different methods in Wellner (1977b).
1
(3) lim inf = lim — =1 a.s.
n_'°° fn:t5ts1 Jig n-.00 ^n f:i
and
G n'
(4) lim = 1 a.s.
Ig vn
Note that (10.2.3) and (3) provide rather tight a.s. nearly linear bounds on
G_n.
Proof. See James (1971) for a preliminary version of this result; this is from
Wellner (1978b).
Now
and
Plugging (b) and (c) into (a) gives the upper bound. Thus (3) holds.
We note that (10.5.10) implies
provided

(6)   lim_{n→∞} n a_n / log₂ n > 0   and   lim_{n→∞} d_n = 0.

Note that (3) implies ψ = I/g is an a.s. lower bound for G_n on [ξ_{n:1}, 1].
Thus ψ⁻¹ is an a.s. upper bound for G_n⁻¹ on [1/n, 1]. We claim that
Exercise 1. Show that (10.0.5b) and (10.0.6a) follow easily from Theorem 2.
Exercise 2. Show that Eψ(ξ) = ∫ ψ(t) dt < ∞ if and only if ∫ [ψ⁻¹(x)] dx < ∞.
Corollary 1. We have

(8)   lim sup_{n→∞} sup_{1/(n+1) ≤ t ≤ n/(n+1)} |G_n⁻¹(t) − t| / ( t(1 − t) log₂[1/(t(1 − t))] ) = 1   a.s.

and

(9)   lim sup_{n→∞} sup_{1/(n+1) ≤ t ≤ n/(n+1)} (G_n⁻¹(t) − t) / (t(1 − t)) = lim sup_{n→∞} n ξ_{n:1} = ∞   a.s.
Proof. The lower bounds in both (8) and (9) follow from (10.1.6) by replacing the
sup by the value at t = 1/n. The upper bound in (8) is mainly just (4); but breaking [1/n, 1]
into [1/n, ε] [to be treated via (4)] and [ε, 1] (to be treated via Glivenko-
Cantelli) is necessary because of the (1 − t) factors in (8). ❑
Open Question 1. Determine the a.s. behavior of G_n⁻¹ − I more carefully than
in (6) and (7).

(2)   1/(n ξ_{n:1}).

The almost sure behavior of (1) and (2) is similar, as noted in Remark 10.5.1.
We now consider the more general quantity

(3)   max_{1≤i≤k_n} g(ξ_{n:i}) / (n a_n).
Theorem 1. (Mason) Let a_n ≥ 0 be ↗ and let g > 0 be ↘ on (0, 1). Then
the following three statements are equivalent:

(4)   Σ_{n=1}^{∞} P(g(ξ) > n a_n) < ∞.

(5)   lim sup_{n→∞} max_{1≤i≤(nδ)} g(ξ_{n:i}) / (n a_n) < ε   a.s.

then even

Proof. We follow Mason (1982). Clearly (5) implies (6) for all k_n ≥ 1 having
k_n/n → 0.
If (6) holds for any single k_n, then it holds for k_n = 1; that is,
Case 1: E g(ξ) < ∞.

(c)   max_{1≤i≤(nδ)} g(ξ_{n:i}) / (n a_n) ≤ (1/a₁) (1/n) Σ_{i=1}^{(nδ)} g(ξ_{n:i}).

Thus we have

(d)   lim sup_{n→∞} max_{1≤i≤(nδ)} g(ξ_{n:i}) / (n a_n)
         ≤ lim sup_{n→∞} (1/a₁) (1/n) Σ_{i=1}^{(nδ)} g(ξ_{n:i})
         ≤ lim sup_{n→∞} (1/a₁) (1/n) Σ_{i=1}^{n} g(ξ_i) 1_{[0,2δ]}(ξ_i)   since ξ_{n:(nδ)} ≤ 2δ a.s. for n large
         = (1/a₁) E[ g(ξ) 1_{[0,2δ]}(ξ) ]   by the SLLN

(e)      < ε   for small enough δ.
Case 2: E g(ξ) = ∞.

Since g ↘,

(f)   max_{1≤i≤(nδ)} g(ξ_{n:i}) / (n a_n) ≤ Σ_{i=1}^{(nδ)} g(ξ_{n:i}) / (n a_n) ≤ Σ_{i=1}^{n} g(ξ_i) / (n a_n)

(g)      → 0   a.s.
(9)   lim_{n→∞} |X₁ + ··· + X_n| / b_n = 0 or ∞ a.s.   according as Σ_{n=1}^{∞} P(|X_n| > b_n) < ∞ or = ∞.

This takes care of the case E g(ξ) < ∞. The old proof works in case E g(ξ) = ∞.
BOUNDS ON FUNCTIONS OF ORDER STATISTICS 431
Corollary 2. Let a_n ≥ 0 be ↗ ∞ and let g > 0 be ↘ on (0, 1). Then

and

(12)   lim sup_{n→∞} max_{1≤i≤n} g(ξ_{n:i}) / n = ∞   if and only if E g(ξ) = ∞.

Proof. The result (11) and half of (12) are established in our work above.
Though we do not need it, we remind the reader that for any rv X ≥ 0 [and
in particular for X = g(ξ)]

(13)   Σ_{n=1}^{∞} P(X > n) ≤ E X ≤ Σ_{n=0}^{∞} P(X ≥ n).

The other half of (12) follows from (f) (with δ = 1) in the proof of Theorem
1, (9), and (13). ❑
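Inequality (13) is easy to check on a concrete rv; the sketch below uses a Geometric(p) variable on {1, 2, ...}, for which E X = 1/p, Σ_{n≥1} P(X > n) = E X − 1, and Σ_{n≥0} P(X ≥ n) = E X + 1 (the choice of p is arbitrary):

```python
p, q = 0.3, 0.7                     # X ~ Geometric(p) on {1, 2, ...}
EX = 1 / p
# P(X > n) = q**n and P(X >= n) = q**(n-1) for n >= 1, with P(X >= 0) = 1
lower = sum(q**n for n in range(1, 300))              # sum of P(X > n), n >= 1
upper = 1 + sum(q ** (n - 1) for n in range(1, 300))  # sum of P(X >= n), n >= 0
```

Both bounds in (13) hold, and here they bracket E X within exactly one unit on each side.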
      Z_n = max_{1≤i≤n} (n − i + 1) X_{n:i}.

Suppose θ = sup_{0<t<1} (1 − t) F⁻¹(t) is finite. Show that Z_n/n → θ a.s. as n → ∞.

Exercise 3. (Sen, 1973b) Let X₁, ..., X_n be iid F on [0, ∞) with 0 < E X² <
∞. Let ε > 0 be given. Then there exists 0 < C_ε < ∞, 0 < ρ_ε < 1, and n_ε < ∞ such
that
      for all 0 < x < ∞.
Moreover,
In this section we will present without proof results concerning the almost
sure behavior of Z_n(a_n)/b_n. The corresponding results for the quantile process
will also be presented in Section 9. All material is taken from Kiefer (1972).
Recall that the ordinary LIL yields
thus a lim sup of 1 is maintained provided a_n does not decrease too fast.
The most important special case occurs when lim_n c_n = c. To describe the
results we must introduce the function (which arises in exponential bounds
for binomial rv's)
Then

(8)   lim sup_{n→∞} Z_n(a_n)/b_n = √(c/2) (β_c − 1)   provided lim c_n = c > 0.

Also,

(9)   lim sup_{n→∞} Z_n⁻(a_n)/b_n = √(c/2) (1 − β_c⁻)   provided lim c_n = c > 1.
These limiting functions are graphed in Figures 1 and 2.

[Figure: graph of the limiting function; it goes to 1 as c grows.]
Figure 1.

[Figure: graph of the limiting function; it goes to 1 as c grows.]
Figure 2.

(The graphs of c β_c and c β_c⁻ that are also shown are geared to the statements of
Theorems 1 and 2.)
It is trivially true that
We can combine (4), (9), and (10) to yield Figure 3; the solid line gives an
a.s. value of lim sup_n Z_n(a_n)/b_n, while the dotted line provides an a.s. upper
bound on this lim sup.

[Figure 3: the cases a_n = log₂ n/n, a_n = c log₂ n/n with c > 1, and
a_n = c_n log₂ n/n² with c_n → ∞.]

Then

(12)   lim_{n→∞} Z_n(a_n) log(1/c_n) / (c_n log₂ n) = 1   a.s.

Figure 4 below should provide a quick visual summary of key special cases
of (4), (8), and (12); values beyond ∞ on the vertical axis are used to indicate
the rate at which Z_n(a_n)/b_n converges to ∞.
Exercise 1. Use Section 6.1 to establish that lim sup_n Z_n(a_n)/b_n = 0 a.s. when
n a_n = (log n)^{−(1+ε)} with ε > 0.
Equations (8), (9), and (12) above are special cases of the following
theorems.
ALMOST SURE BEHAVIOR OF NORMALIZED QUANTILES AS a„ 1 0 435
[Figure 4: the a.s. limits for a_n equal to 1/(n(log n)^{1+ε}), 1/(n log n), log₂ n/n,
c log₂ n/n, and c_n log₂ n/n², with c > 1 and c_n → ∞; the vertical axis shows
log n/(2 log₂ n), log₂ n/2, and log₂ n/(2c).]
Theorem 1. (Kiefer) Suppose a_n ↓ 0 and 1 < lim inf_n c_n ≤ lim sup_n c_n < ∞; and sup-
pose there exists k_n ↑ such that k_n = [1 + o(1)] c_n β_{c_n} log₂ n. Then

(13)   lim sup_{n→∞} n G_n(a_n) / (c_n β_{c_n} log₂ n) = 1   a.s.
Theorem 2. (Kiefer) Suppose a_n ↓ 0, lim sup_n c_n < ∞, and lim_n log(1/c_n)/
log₂ n = 0; and suppose there exists k_n ↑ such that k_n = [1 + o(1)] c_n β⁺ log₂ n.
Then
We now state the results corresponding to those of Section 8 for the quantile
process. We have
Once again, this lim sup is maintained if a„ does not decrease too fast. Thus
we state
In case lim_n c_n = c in (0, ∞), we will need to introduce some notation. For
each c > 0,

(3)   let γ_c > 1 solve h(γ_c) = γ_c/c   [or equivalently h'(1/γ_c) = 1/c]

and

(4)   let γ_c' < 1 solve h(γ_c') = γ_c'/c   [or equivalently h̃'(1/γ_c') = 1/c].
[Figure: graphs of the limiting functions, each going to 1.]
Figure 1.

[Figure: graphs of the limiting functions.]
Figure 2.
Now h̃(1/λ) = (1/λ) − 1 + log λ, and it is apparent from Figure 10.3.2 that γ_c
and γ_c' are well defined. Now
Also,

(6)   lim_{n→∞} V_n(a_n) / ( √(a_n(1 − a_n)) b_n ) = −(1 − γ_c')   a.s. provided c_n → c > 0.

See Figures 1 and 2 opposite. (The other functions graphed are the a.s. lim inf
and lim sup of n G_n⁻¹(a_n)/log₂ n, which is the form in which Kiefer (1972)
states the theorem.)
If a_n ≥ 1/n, then

(7)   lim_{n→∞} V_n(a_n) / ( √(a_n(1 − a_n)) b_n √(2 c_n) ) = 1   a.s. provided c_n → 0.

(9)   lim_{n→∞} V_n(a_n) / ( √(a_n) b_n ) = 0   a.s. provided c_n ↓ 0 and n a_n ↑ ∞.
CHAPTER 11
Exponential Inequalities
and ‖·/q‖-Metric Convergence
of U_n and V_n
0. INTRODUCTION
(1)   ∫₀¹ q(t)⁻² dt < ∞,

(2)   ‖(U_n − U)/q‖ →_p 0   and   ‖(V_n − V)/q‖ →_p 0   as n → ∞.

Our main goal in this chapter is to prove sharpened versions of this type of
theorem for both the uniform empirical process U_n and the uniform quantile
process V_n. The theorem for U_n was originally proved by Chibisov (1964),
and was reexamined by O'Reilly (1974). For symmetric functions q which
satisfy both q ↗ on (0, 1/2] and t^{−1/2} q(t) ↘ on (0, 1/2], the Chibisov-O'Reilly
theorem refines "(1) implies (2)" to the assertion that (2) holds for the special
construction if and only if
If a symmetric q satisfies only q ↗ on (0, 1/2], then (2) holds if and only if
UNIVERSAL EXPONENTIAL BOUNDS FOR BINOMIAL rv's 439
Our approach to proving these ‖·/q‖ convergence results for U_n and V_n
will be to first develop good exponential bounds for binomial rv's and for
uniform order statistics. The exponential bounds are then extended to neigh-
borhoods of 0 for the processes U_n and V_n, and these inequalities yield, in
turn, the probability inequalities for ‖U_n/q‖_a and ‖V_n/q‖_a, respectively, upon
which our proofs will be based.
In Section 6, we establish the convergence of U_n and other processes in
weighted metrics.
In the last three sections of this chapter we collect facts concerning moments
of order statistics, the binomial distribution, and exponential inequalities for
gamma, beta, and Poisson rv's related to those of Sections 1 and 3.
A careful analysis of the detailed behavior of the empirical process will require
tight exponential bounds on the tail behavior of the normalized Binomial (n, t)
rv U_n(t); we suppose for convenience that 0 < t ≤ 1/2. These can be derived from
the inequalities and methods of Section A.4. For the short, well-behaved
left-hand tail we obtain Hoeffding's bound (6), which is strong and clean. For
the long and troublesome right tail we have the necessarily much weaker bound
(3) of Bennett.
We will extend these results to intervals of t's for the process U_n. The analog
for U will also be considered. These results are of full strength only for very
short intervals. In the next section we will piece strong results on short intervals
together to get strong results on any interval.
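For orientation, an exact binomial tail can be compared with the standard Hoeffding form P(Bin(n, p) ≥ n(p + ε)) ≤ exp(−2nε²) — a simpler relative of (6), not the book's exact expressions; n, p, ε below are arbitrary:

```python
import math

def binom_upper(n, p, k):
    # exact P(Binomial(n, p) >= k)
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

n, p, eps = 100, 0.3, 0.1
tail = binom_upper(n, p, math.ceil(n * (p + eps)))
hoeffding = math.exp(-2 * n * eps * eps)  # standard Hoeffding upper bound
```

The bound holds but is loose, which is why the sharper ψ-corrected bounds of this section matter.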
We will seek to bound P(±X ≥ λ). Our bounds will be expressed in terms of
the important function

(4)   ≤ exp( −λ² / (2[pq + qλ/(3√n)]) )   (Bernstein),

and for all λ > 0 we have

(5)   ≤ exp( −(λ²/(2pq)) (1 − λ/(p√n)) )   (Wellner)

and

(6)   ≤ exp( −λ²/(2pq) ) ≤ [expression (3)]   (Hoeffding).
      P( |Binomial(n, p) − np| ≥ λ√n ) ≤ 2 exp( −(λ²/(2p)) ψ(λ/(√n p)) )
         for all 0 < p ≤ 1/2,

(9)   λ ψ(λ) is ↗ for λ ≥ −1,

(11)  ψ(λ) ≥ 1 − δ   if 0 ≤ λ ≤ 3δ and 0 ≤ δ ≤ 1,
      ψ(λ) ≥ 3δ(1 − δ)/λ   if λ ≥ 3δ and 0 ≤ δ ≤ 1,

(12)  0 ≤ 1 − ψ(λ) ≤ λ/3   if 0 ≤ λ ≤ 3,
Differentiate the exponent of (a) to find that it is minimized by the choice
r_min = √n [ log(1 + λ/(q√n)) − log(1 − λ/(p√n)) ]. Plugging this back into (a) gives
where
A 2 A3
3+...
= —(P —A) )
AP z 1 + (3p
L\P +2
...
/ A3
= Z +— lz— lz)_(P_A)(A3 3 +...) +(q +A) 3 —_ ..^
2pq 2 p q 3P 3q
A z k3 1 A 2 (1 1)
—(P —A) 3p3
'2Pq +...J+ 2 `P z_ 9 2 J
A2
2 pq
+(p—,t) log 1 —A
L \ P/ p 2P
f ( l
+ A + ,1 z + A2 ( 1
2 9/
z — lz l
2 2 2 3
A2 1 !
= A +ph(1— - +- Az + z — zl
2pq p 2 P 2P 2 P q
I — q
2p \ Pl + 2Pq( — qP^
2Pq P) q J
and establishes (5). This is actually a slight improvement over Wellner (1978b),
and we summarize it as
(17) P( — X>—A)<_exp( — A2
[q4l(_^
n
p )+
P(1 9 2P)
1/
The following constants will arise repeatedly. Let β > 1 denote the solution
of

Exercise 1. We have

(19)   h'(λ) = log λ,

      h(1 + λ) = λ²/2 − λ³/6 + ··· + (−1)ⁿ λⁿ/[n(n − 1)] + ···   for |λ| < 1.
Solve this exercise by passing to the limit in Inequality 1. (Note Example A.4.1.)
Exercise 3. (Slud, 1977) If 0 ≤ p ≤ 1/2 and np ≤ k ≤ n (or if np ≤ k ≤ nq), then
a AZ A
<^1+ p
/^A eXp \ 2p o \Vnp
for all A ? 0 and
A
ctr
— \ Tp/-1^A exp \ 2p \ ^p//
Inequality 2.
(i) (James) Let 0 < p ≤ 1/2, n ≥ 1, and λ > 0. Then

      ≤ exp( −λ²/(2p) ··· )

(26)   P( ‖U_n/(1 − I)‖₀^p ≥ λ/(1 − p) ) ≤ 2 exp( −λ²/(2pq) ).

Proof. Since {U_n(t)/(1 − t): 0 ≤ t ≤ p} is a martingale we have that {exp(±
r U_n(t)/(1 − t)): 0 ≤ t ≤ p} are both submartingales by Proposition A.10.1, where
BOUNDS ON THE MAGNITUDE OF ‖U_n/q‖_a^b 445

r denotes any real number. Thus for any 0 < p < 1 we have

      P( ‖U_n/(1 − I)‖₀^p ≥ λ/(1 − p) ) = P( sup_{0≤t≤p} U_n(t)/(1 − t) ≥ λ/(1 − p) ).
Section 14.5 contains the natural analogs of this inequality for the Poisson
process.
In this section we seek to bound the probability that the empirical process
ever exceeds boundaries such as those shown in Figure 1. Our inequalities
will be built up out of the binomial bounds of the previous section. We will
need these bounds to prove Chibisov's theorem in Section 5.
[Figure: boundaries ±q(t) for the empirical process.]
Figure 1.
Inequality 1. (Shorack and Wellner) Let 0 ≤ a ≤ (1 − δ)b < b ≤ δ ≤ 1/2 and λ >
0. Then for q ∈ Q

(3)   P( ‖U_n^±/q‖_a^b ≥ λ ) ≤ (3/δ) ∫_a^b (1/t) exp( −(1 − δ)² (λ²/2) (q²(t)/t) γ^±(t) ) dt,

where γ⁻(t) = 1 always works, and

      γ⁺(t) = ···   if a ≥ ···/n.
Moreover, for q ∈ Q*

(6)   P( ‖U_n^±/q‖_a^b ≥ λ ) ≤ (3/δ) ∫_a^b (1/t) exp( −(1 − δ)³ (λ²/2) (q²(t)/t) γ^±(t) ) dt,

where

(7)   γ(t) = 1

and

(8)
[When a = 0, both probability bounds (3) and (6) exceed 1 in the "+" case.]
If b ↘ 0 it may be an advantage to use a fixed small δ in (3) or (6) rather
than δ = b; the exponent typically will not matter either way, but the 3/δ term
in front will stay bounded. For q ∈ Q* it is difficult to bound γ⁺(t) below by
a constant. Hence in applying (6), we will use (11.1.11) to bound the right-hand
side of (6) by the sum of two terms, a device used by both Chibisov (1964)
and O'Reilly (1974).
Corollary 1. When q(t) = √t, 0 ≤ a ≤ (1 − δ)b < b ≤ δ ≤ 1/2, and λ > 0 we have

(9)   P( ‖U_n^±/√I‖_a^b ≥ λ ) ≤ 3 log(b/a) exp( −(1 − δ)³ λ²/2 ).
Proofs. Let

(a)

Define

(b)   θ = 1 − δ

and integers 0 ≤ J < K (note that J ≥ 2 since θ ≥ 1/2 and b ≤ 1/2) by

[From here on, θⁱ denotes θⁱ for J ≤ i < K, but θ^K denotes a and θ^{J−1} denotes
b. Note that (new θ^{i−1}) ≤ (new θⁱ)/θ holds for all J ≤ i < K.] Since q is ↗
we have
so that
K
(1-6')
(e) P(An)s^ l.
p
\II1^ Iflo ''^q(B') {1-e' ')J -
(f) < K p
ex -- 2 q2(e`)
, 8y since 1- B' ' >_1-b _ 0
; .J 2 8
1 - exp - B2 y dt
=J+I1-0 t
K-'1 J BB1
_, 1 \ 2 q2(t)
A2 t
A 2 g 2 (eb) 82y _1
+exp 1 -- q2(a) B e y I +exp I
2 a III \\\ 2 Ob J
b 2 2
3
(g) a -ex
t (-(1- S) 2 q ` t) Y - ) dt
by (b), (c), and the fact that (log(1/θ))/δ ≥ 1, which follows from
a ≤ (1 − δ)b and δ ≤ 1/2. This completes the proof of (3) in the "−" case.
We now consider A_n⁺. From (e), (f), and (11.1.25) we have
K ( A2 g 2 ( 0 1 ) ( Ag(0`)(1- 0i -')
(h) P(An) 5 ^ exp — 2 ®^-^ B 4` B; -, f
1
11
But a = θ^K ≤ θⁱ for i ≤ K and q(t)/t is ↘, so
Thus, as in case A , we have
(1) P (A ) 8 fa t exp j —
2 q tt) 82 (^a.^ )> dt
completing the proof of (3) in the "+" case. We need only remark that for
(5) we use
We now prove (6). Up to step (f) the proof is the same as that for (3).
Again we first consider A_n⁻, but now q ∈ Q*. By (11.1.16) and (f),
(A„)<_ Y_ exp
2
i
K
\— 2
1
0)
c.' p ( — A2 q (e'.)
ex 1 +ex p
B3/ q(a) 92)J
_j 2 0 0 (_• 2 a
1 I (t)03)
1 1 exp 1
—A2 dt+exp (—^2 qz(a) 0 2 )
S B y e t \ 2 t 2 a
z t z e
since q ^ <—) 8(+, ) for 8 ' +' < t < 0 '
b
(l) s
f ae
i exp ( — 2z q 2 t) (1— S) 3 ) dt+exp 2 g2aa) . 2
f
2 2 a 2 2
I \
ea exp (-8 3 q (t) dt ^ s Jea r exp C—B 3 2 q t t) 1 dt
(n) >_ C1 J a t dt 1 exp (-83 2
ea q 9a )) '—exp ( —BZ 2Z g2aa) )
K n 8'-'
exp ( -101_1h
q0 ll
t (
e \
(o) <_ Y t 1
-j S f 0 i o 1t
exp —nth 1+
K-1
'kq(t)
✓nt //
03 dt
n9 K q( K ) ^ (1
+exp(_ + '^ — o K- ')
1-9 -ih(
,
^8 //
3
J I
b C—nt
ex
t 2nt2
aB p
A2g2(t) 8 e C O3Aq(t)
../itt i,
dt
K -I
+expC-1n eOK_, VnOK-, (1—BK ')//
hC1+
2 fb1 (0 6A2 q2(t) , ( o3A(t) y
(cSaB t exp dt
2 t vt
1
3aot
l
b
—exp (—O6A 2 q2 (t)
2 t
(0
3 Aq(t) 1 1 dt
-, /it //
>_(!
1
Eidt^ exp 1 —nah(l+ '19(a) ®3
t \ 'aO '/
(q) n 0K-I)/
? exp C — 1 nBB K _ 1 I—
K I A q(a) l
.
Then

(13)   ∫₀^{1/2} (1/t) exp( −ε g²(t) ) dt < ∞   for all ε > 0,

provided
If (13) holds, then lim sup_{t↓0} g(t) = +∞ necessarily, but lim inf_{t↓0} g(t) < ∞ is
possible. (ii) If q ∈ Q, then (13) implies (14).
Proof. (i) Merely note that

(a)   ∫₀^{1/2} (1/t) exp( −ε g²(t) ) dt = ∫₀^{1/2} dt / ( t [log(1/t)]^{ε ···} ),

where

(b)   ∫₀^{1/2} dt / ( t [log(1/t)]^η ) < ∞   when η > 1.
(a) I 1 _e 2 (t)
J‚ s exp t — q-Ls/
(s) I ds? exp ( )
J s
where
P(IIvn/v'iTIIa>A)
{any Inequality 11.1.1 bound on P(I v„(b)I //E>— A a b)}.
We now develop analogous results for the Brownian bridge U; see Exercise 3.6.2 and Section 6.6. The following is now easy.
while
Proof. Since {U(t)/(1 − t): 0 ≤ t ≤ b} is a martingale, {exp (rU(t)/(1 − t)): 0 ≤ t ≤ b} is a submartingale. Thus
=2 inffexp \1 rb 2b(1—b)
+ 2(lr2 b) /
using the fact that the moment generating function of a N(0, σ²) rv is exp (σ²t²/2). Now let Aₙ ≡ [ ‖U/q‖ₐᵇ ≥ λ ] and just recopy the proof for Aₙ out of Inequality 1. ❑
We now summarize some stronger results in this vein [the reader is referred
to Section 1.8 of Ito and McKean (1974)]. If q denotes a continuous function
that is positive on (0, 1) with q(0) = 0, then
(18)  P( S(t) < q(t), t ↓ 0 ) ≡ P([ ω: S(t) < q(t) for all 0 < t ≤ some t_ω ]) = 0 or 1.
(In Exercise 2.5.1 the reader was asked to prove Blumenthal's 0-1 law; (18) is a trivial consequence of that law.) The function q is called upper class for S if the probability in (18) equals 1; otherwise, it is called lower class for S.
(20)  P( S(t) < q(t), t ↓ 0 ) = 1 or 0

according as

     ∫₀₊ ( q(t)/√(2π t³) ) exp( −q²(t)/(2t) ) dt   is < ∞ or = ∞;

for the Brownian bridge the analogous integral is

     ∫₀₊ ( q(t)/√(2π t³(1−t)³) ) exp( −q²(t)/(2t(1−t)) ) dt.
P( ±√n (ξₙ:ᵢ − pᵢ) ≥ √(pᵢqᵢ) λ ),

where

and for

Recall that

Inequality 1. (Exponential bounds for Vₙ(b)) For all λ > 0 and 0 < b < 1 we have
(3)  P( ±Vₙ(b) ≥ λ ) ≤ exp( −(λ²/(2b)) ψ( λ/(√n b) ) )   in the "+" case,

(4)       ≤ exp( −(λ²/(2b)) ψ( −λ/(√n b) ) )   in the "−" case.
Corollary 1. For all i = 1, . . . , (n/2) and all λ ≥ 1 we have the crude bounds

(5)  P( ±√n (ξₙ:ᵢ − pᵢ) ≥ √(pᵢqᵢ) λ ) ≤ exp( −λ/10 )
and
     (λ²/4) ψ(·) ≥ (λ/4) ψ(·) ≥ λ/10   for λ ≥ 1, since λψ(λ) is ↗ by Proposition 1,

where (1/4)ψ(·) = 0.13321 . . . > 0.1, and thus (a) implies (5). The inequality (7) follows from (5) and (6) by symmetry about ½. ❑
(9)  ψ(λ) = (2/λ²) [ (1+λ) log (1+λ) − λ ],
(14)  ψ(λ) = 1 − λ/3 + λ²/6 − · · · + (−1)ⁿ 2λⁿ/((n+1)(n+2)) + · · ·   for |λ| < 1,

(15)  ψ′(0) = −1/3,
and
and
Letting first λ = (η − 1)b√n in (c), and then λ = (1 − 1/η)b√n in (d), then (c) and (d) give both the + and − parts of (2) by use of (1). The inequality (4) follows immediately from (10). ❑
0
— r
<
f so
S r—
' ds+2r
50
exp ( hs/so )s' ' ds
— —
(1+2rF(r)h ')s^. -
Note that use of an exponential bound of the type (7), but with the constant improved, only changes C_r. ❑
Inequality 2. (Exponential bounds for suprema of Vₙ) Let λ > 0, 0 < p < 1, and n ≥ 1. Then for λ ≤ √n p,

Our proof of Inequality 2 will rely upon the following simple events identity.

Lemma 1. For all 0 < p < 1, 0 < λ < 1, and n ≥ 1 we have
(22)
1
[1 ( ^1 '— I ) 110 'A ^^I(G1—I)+^Iov av)
°>
1+A 1
[Figure 2. The bounding lines t ↦ r + (1−r)t and t ↦ (1+r)t − r.]
[ ‖(Gₙ − I)/(1 − I)‖₀ᵃ ≥ r ]
     = [ Gₙ(t) ≥ t + r(1−t) for some 0 ≤ t ≤ a ].

Thus, letting p ≡ r + (1−r)a and λ ≡ r/(1−r), so that (p − r)/(1 − r) = a and p(1+λ) − λ = p − λq = a, (a) yields (22). Similarly, from Figure 2 we also have
Proof of Inequality 2. From (22) and (i) of Inequality 11.1.2 (written in terms of the function h rather than the function ψ),
=1'
(V 1— It A> 1+A/9)
h [ l+1
ceXp \ —n l (p — A) + ,t/9P - A))
(b)
=expt —nq+A (1+1+Algp-A)/'
note that the probability in (a) is zero for λ > p, so we have assumed λ ≤ p. Thus q + λ ≤ 1, and hence
and

(d)  1 + (λ/q)/(1 + λ/q) · (q + λ)/(p − λ) = 1/(1 − λ/p).
Combining (c) and (d) with (b) and using that h(x) is ↗ for x ≥ 1 yields

(e)  ≤ exp( −np (1 − λ/p) h( 1/(1 − λ/p) ) ) = exp( −n (λ²/(2p)) ψ( −λ/p ) ).
(f)  = P( ‖(Gₙ − I)/(1 − I)‖₀ᵃ ≥ λ/(1 − λ/q) . . . )
     ≤ exp( −n(p+λ) h( 1 − (λ/q)/(1 − λ/q) · (q − λ)/(p + λ) ) ).
Note that

(g)  1 − (λ/q)/(1 − λ/q) · (q − λ)/(p + λ) = 1/(1 + λ/p).
     = exp( −np (1 + λ/p) h( 1/(1 + λ/p) ) )

(h)  = exp( −n (λ²/(2p)) ψ( λ/p ) ).
Now the inequalities of the preceding section yield probability bounds for ‖Vₙ/q‖ₐᵇ in the same way that the inequalities of Section 11.1 were used in Section 11.2. Since the proofs parallel those of Section 11.2, we leave them as an exercise.
Recall the definitions of the classes Q ⊂ Q* used in Section 2.
WEAK CONVERGENCE OF Uₙ AND Vₙ IN ‖·/q‖ METRICS 461
Inequality 1. Let q ∈ Q, n ≥ 1, 0 ≤ a < (1−δ)b < b ≤ δ ≤ ½, and λ > 0 satisfy δ + n^{−1/2}λ ≤ ½. Then
(3)  γ₊ = γ₊(t) ≡ . . .

(5)  P( ‖Vₙ/q‖ₐᵇ ≥ λ ) ≤ δ⁻¹ ∫_{a(1−δ)}^{b} t⁻¹ exp( −(1−δ) (λ²q²(t)/(2t)) γ(t) ) dt,

where

(6)  γ(t) ≡ 1 ∧ γ₊(t) = . . . ,

and

(7)  P( ‖Vₙ/q‖ₐᵇ ≥ λ ) ≤ 3 log (b/a) exp( −(1−δ)³ (λ²/2) γ ).
(a)  ∫_{λt}^{t} s⁻¹ exp( −λ q²(s)/s ) ds ≥ (−log λ) exp( −q²(t)/t ).

(h)  γ(t) = 1−δ   if εq(t)/(t√n) ≤ 3δ,
     γ(t) = 3δ(1−δ) t√n/(εq(t))   if εq(t)/(t√n) ≥ 3δ.
Hence by adding the two resulting terms and using naₙ = s, it follows from (g) and (h) that

     P(Bₙ) ≤ 4 ∫₀^θ t⁻¹ exp( −(1−δ)⁷ ε²q²(t)/(2t) ) dt
             + 4 ∫_{aₙ}^{θ(1−δ)} t⁻¹ exp( −½δ(1−δ)⁷ ε q(t)√n ) dt
         ≤ 4 ∫₀^θ t⁻¹ exp( −(1−δ)⁷ ε²q²(t)/(2t) ) dt
             + 4 ∫_{aₙ}^{θ(1−δ)} t⁻¹ exp( −½δ(1−δ)⁷ ε h(θ) nt ) dt
         ≤ ε/2 + (8/3) ∫_{ε(1−δ)}^{∞} s⁻¹ exp( −½δ(1−δ)⁷ ε h(θ) s ) ds,   setting s = nt,

(i)  ≤ ε   for θ ≤ θ_ε,

since (3) guarantees that the first integral can be made small by choice of θ sufficiently small, and the second integral can be made arbitrarily small for small θ in view of (b). Combining (d)-(f) and (i) yields

(j)  . . .   for n ≥ some N_ε.
Combining (j), (11.2.17) of Inequality 11.2.2, and Theorem 3.3.1 with (c) yields (4) (in view of symmetry of the process about t = ½).
Now our given condition (4) implies P( ‖(Vₙ − V)/q‖ ≥ ε ) → 0, and thus (k) implies

(l)  P( −U(t) ≤ ε√t (1 + q(t)/√t) for 0 < t ≤ δ ∧ ξₙ:₁ ∧ (1/n) ) → 1

as n → ∞. Letting . . . we see from (l) (using U ≅ −U) that for any fixed small number β,

and hence (a) implies that h(t)/√t = 1 + q(t)/√t → ∞ as t → 0. Hence
Thus

     ∞ > T(h, λ/4) = ∫₀^{1/2} t⁻¹ exp( −(λ/4) (√t + q(t))²/t ) dt
        ≥ ∫_{(some δ)}^{1/2} t⁻¹ exp( −λ [q(t) + q(t)]²/(4t) ) dt . . . .

Thus

(t)  T(q, λ) < ∞,

establishing (3).
That (5) is equivalent to (3) for q in Q was shown in Proposition 11.2.1. [Hint: See Inequality (11.2.2) to get that (3) implies (7); the second paragraph only of O'Reilly, 1974, p. 644 gives that (7) implies (3).]
(8)  1/q²(t) = [ (aₖ − t)/(aₖ − aₖ₊₁) ] (bₖ₊₁ − bₖ) + bₖ   for aₖ₊₁ ≤ t ≤ aₖ and k = 1, 2, . . . ,

so that

     lim_{t→0} q²(t) / ( t log log (1/t) ) = 1 < ∞.
Combining this with Exercise 1, it follows that (3) does not imply (5). [Shorack, 1979c and Shorack and Wellner, 1982 claimed it did. Condition (5) was introduced by Shorack, 1979c.] (ii) In fact, if h(t) → ∞ as t ↓ 0, then there exists a q in Q* having ∫₀¹ [q(t)]⁻² dt < ∞ for which lim_{t→0} [q(t)/(√t h(t))] = 0.
We remark that the investigation of (4) by Chibisov (1964) used (3), while O'Reilly (1974) introduced (7). Chibisov (1964) proved slightly less than the equivalence of (4) and (7). He used the representation 8.4.4 of Uₙ in terms of a Poisson process. Using this same Poisson representation, O'Reilly (1974) established the equivalence of (3), (4), and (7). The new Inequality 11.2.6 is used here in their proofs.
Chibisov (1964) established a result analogous to Theorem 1 for the process
vn of (8.4.2). O'Reilly (1974) established a result analogous to Theorem 1 for
a smoothed quantile process V; we develop this in Theorem 2.
Note that f 1_{[a,b]} with 0 ≤ a < b ≤ 1 denotes the function on [0, 1] that equals f on [a, b] and is 0 elsewhere. Because of minor difficulties [Vₙ(0+) equals √n ξₙ:₁ and not 0, etc.], our theorem will be stated in terms of 𝕍ₙ 1_{[1/(n+1), n/(n+1)]}. [We use 1/(n + 1) instead of 1/n so that ξₙ:₁ is included in the conclusion, etc.]
if and only if
Let q ∈ Q; thus we also have that q(t)/√t is ↘ on [0, ½]. Then (10) holds if and only if
(d) 4 J (
S 1/(2n+2)
8 21
(1—S)6y+(t)E2g2(t ))
By Proposition 11.3.1 we have

(e)  γ(t) = 1−δ   if εq(t)/(t√n) ≤ 2δ,
     γ(t) = ½δ(1−δ) t√n/(εq(t))   if εq(t)/(t√n) ≥ 2δ.
Hence by adding the two resulting terms, it follows from (d) and (e) that

     P(Aₙ) ≤ (4/δ) ∫_{1/(2n+2)}^{θ} t⁻¹ exp( −(1−δ)⁷ ε²q²(t)/(2t) ) dt
             + (4/δ) ∫_{1/(2n+2)}^{θ(1−δ)} t⁻¹ exp( −½δ(1−δ)⁷ ε q(t)√n ) dt
     → 0
as n → ∞. Now (h) implies that εq is upper class for S for every ε > 0, and hence by Exercise 2, (11) holds.
The equivalence of (11) and (12) for q ∈ Q is found in Proposition 11.2.1. ❑
(14)  max_{1 ≤ i ≤ n} √n sₙᵢ / q(i/(n+1)) →ₚ 0   as n → ∞.

Proof. Because V/q has continuous paths a.s., (10) implies that the maximum discontinuity of 𝕍ₙ 1_{(1/(n+1), n/(n+1)]}/q must converge to zero in probability; but that is what was claimed. ❑
(15)  𝕍ₙ(t) = √n [ξₙ:ᵢ − t]   for i/(n+1) ≤ t < (i+1)/(n+1), 0 ≤ i ≤ n.

The corresponding functions Gₙ⁻¹, 𝕍ₙ, and Gₙ are shown in Figure 1. Paralleling (8.2.3) we have the representations (valid for each fixed n)
+l
V,, n r^„+^(n +l) tS,,(1)+ n+1(n+1
(16) n+l ^l„ +1 L —tIJ
[Figure 1. Shown for n = 4 (n + 1 = 5).]
and
and
if and only if (11) holds. The results (18) and (19) hold for each q E Q if and
only if (12) holds.
Proof. Suppose q E Q* satisfies (11). Then since
(a)  max_{1 ≤ i ≤ n} . . . / q(i/(n+1)) → 0   as n → ∞

and

(b)  ‖ (𝕍ₙ 1_{[0, n/(n+1)]} − 𝕍ₙ)/q ‖ ≤ max_{1 ≤ i ≤ n} √n sₙᵢ / q(i/(n+1)) →ₚ 0   as n → ∞
by Corollary 2, the "if" part of the assertion follows from Theorem 1 and the triangle inequality. The "only if" part of the assertion can be routinely proved along the lines of Theorem 1. ❑
(2)  ∫₀¹ ψ |Wₙ − W|^r dI →ₚ 0   as n → ∞   for the special construction

and

(3)  ∫₀¹ ψ |𝕍ₙ + W|^r dI →ₚ 0   as n → ∞   for the special construction.
Note that we can think of (2) as implying either . . . for the usual norm ‖f‖ᵣ = (∫₀¹ |f|^r dI)^{1/r} on L_r, or as implying . . . for the weighted metric ‖f‖ = (∫₀¹ ψ|f|^r dI)^{1/r} on L_r(ψ) = { f: ψ^{1/r} f ∈ L_r }. We note that (2″) implies
Remark 1. Shepp (1966) shows that ∫₀¹ ψ(t)S²(t) dt either converges a.s. or diverges a.s. according as ∫₀¹ tψ(t) dt is finite or infinite. Thus the converse of Theorem 1 (and Exercise 1) holds in case r = 2.
Proof. Consider Wₙ first. Now for any 0 < θ ≤ ½,

     E ∫₀^θ ψ(t) |Wₙ(t)|^r dt = ∫₀^θ ψ(t) E|Wₙ(t)|^r dt   by Fubini's theorem

     (Inequality A.1.5)

(c)  P( ∫₀^θ ψ |Wₙ|^r dI ≥ ε ) ≤ E ∫₀^θ ψ |Wₙ|^r dI / ε ≤ ε²/ε = ε.
fe
B 4Iw^-WI'dl- IIIke 8 IIW -wil'
Jo
1
+uIm
n
E 0
+cr j
l
i/4W„I'dI +
o f
o "
J4Wr'dI
(g)  P( ∫₀¹ ψ |Wₙ − W|^r dI > 5εc_r ) ≤ 5ε   for all n ≥ some n_ε.

That is, (2) holds. Chibisov (1965) proved (2) for r = 2 with Wₙ replaced by Uₙ.
For the 𝕍ₙ process the above argument still holds, since we will now show [allowing 𝕍ₙ to replace Wₙ in (a)] that

(5)  E|𝕍ₙ(t)|^r ≤ C_r [t(1−t)]^{r/2}   for all 0 ≤ t ≤ 1.
linearity of 𝕍ₙ on [pᵢ, pᵢ₊₁] we have
Exercise 2. (Serfling, 1980, p. 283) Show that if X₁, . . . , Xₙ are iid F where

(7)  ∫_{−∞}^{∞} [F(x)(1 − F(x))]^{1/2} dx < ∞,

then

(8)  ∫_{−∞}^{∞} |Fₙ(x) − F(x)| dx = Oₚ(n^{−1/2}).

(9)  ψ(t) = 1 / ( [t(1−t)]^{1+r/2} [log (1/t(1−t)) (log₂ (1/t(1−t)))]^r )
Exercise 3. Show that Theorem 11.5.1 easily implies that (2), with r = 2, holds provided ∫₀¹ ψq² dI < ∞ for some function q satisfying (11.5.5); basically, these conditions require that q²(t)/[t(1−t) log₂ (1/t(1−t))] → ∞ as t → 0 or 1. Thus note that Theorem 11.5.1 does not quite imply Theorem 1.
(10)  Aₙ² →d A² ≡ ∫₀¹ [U(t)]² / (t(1−t)) dt   as n → ∞.
Existence of Moments

(1)  a ≡ lim_{x→F⁻¹(1)} −log (1 − F(x)) / log g(x).

Then

(2)  Eg(X) is < ∞ or = ∞   according as a > 1 or a < 1.

Then

(7)  E|Xₙ:ᵢ|^r is < ∞ if r/a₋ < i < n + 1 − r/a₊, and = ∞ if i < r/a₋ or i > n + 1 − r/a₊.
Theorem 3. Let X₁, . . . , Xₙ be iid F where F⁻¹(1) = ∞. Let g ≥ 0 be ↗ to ∞. Suppose

(8)  a = lim_{x→∞} −log (1 − F(x)) / log g(x) = 0.

Then for all possible centering constants a ∈ (−∞, ∞) and all possible scale factors b > 1 (typically, b = √n) we have

(14)  there exist M, x* > 0 such that log g(tx) ≤ Mt log g(x) for t ≥ 1 and x ≥ x*.
Exercise 1. Show that g(x) = exp (tx) with t > 0 and g(x) = x^r 1_{[0,∞)}(x) with r > 0 satisfy (14).
where

(16)  W ≅ N( 0, p(1−p)/f²(xₚ) ).
Exercise 2. Suppose (10) and (11) hold, and let r >0. Then
if and only if
if and only if
if and only if
if and only if
Bounds on Moments
(24)  ( n/(pᵢqᵢ) )^{r/2} E | ∫_{pᵢ}^{ξₙ:ᵢ} h(t) dt |^r ≤ C_r pᵢ^{−r d₁} qᵢ^{−r d₂},

(26)  F⁻¹(t) = ∫ h(s) ds   for some h.

Then

(27)  ∫_{pᵢ}^{ξₙ:ᵢ} h(s) ds = Xₙ:ᵢ − F⁻¹(pᵢ)

and

(28)  E|Xₙ:ᵢ − F⁻¹(pᵢ)|^r ≤ c₁ pᵢ^{−r d₁} qᵢ^{−r d₂}   for d₁ − 1 < i < (n+1) − (d₂ − 1).

Inequality 2. Let r ≥ 1. If (26) and (23) hold for (d₁ ∧ d₂) > 1, then

(29)  ( n/(pᵢqᵢ) )^{r/2} E|Xₙ:ᵢ − EXₙ:ᵢ|^r ≤ 2^{r−1} (C_r + C₁^r) pᵢ^{−r d₁} qᵢ^{−r d₂}   for all i as in (25),

(30)  ( n/(pᵢqᵢ) )^{1/2} E(Xₙ:ᵢ − X_{pᵢ}) ≤ C₁ qᵢ^{−d₂}   for all i as in (25).
Pig1
"Proof" of Inequality 1. We offer only the following intuition for this result. Now

(a)  √(n/(pᵢqᵢ)) ∫_{pᵢ}^{ξₙ:ᵢ} h(t) dt ≐ √(n/(pᵢqᵢ)) (ξₙ:ᵢ − pᵢ) h(pᵢ).
(b)  ( n/(pᵢqᵢ) )^{r/2} E|ξₙ:ᵢ − pᵢ|^r h(pᵢ)^r

(c)  ≤ (Constant) pᵢ^{−r d₁} qᵢ^{−r d₂}   by (11.3.17) and (1),

provided the rth moment exists. Now existence of the rth moment of the integral in (a) should be controlled [use (1)] by the integral of

(d)  x^r × (density of ξₙ:ᵢ).

The integral (e) will be finite precisely when (3) holds. Details of this proof are found in Mason (1984b). ❑
Suppose a = 1. Now F(x) = 1 − 1/x for x ≥ 1 has a = 1 with EX = ∞. However, F(x) = 1 − x⁻¹ exp (−√(log x)) for x ≥ 1 has a = 1 and EX = 3 < ∞.
Suppose a > 1. Then there exist 1 < c < a and x* such that −log (1 − F(x)) ≥ c log x for x ≥ x*. Thus 1 − F(x) ≤ 1/x^c for x ≥ x*, which implies Eg(X) < ∞.
Suppose a < 1. We will show Σ_{n=1}^{∞} P(g(X) > n) = ∞, which implies Eg(X) = ∞. Since a < 1 there exist a < c < 1 and a subsequence x₁ < x₂ < · · · having 1 − F(xₖ) ≥ xₖ^{−c} for all k, with xₖ → ∞ as k → ∞. Let yₖ = g(xₖ) for k ≥ 1, with y₀ = 0. Then
(a)  P( |Xₙ:ᵢ|^r > x ) ≥ P( Xₙ:ᵢ > x^{1/r} )
and so

     lim_{x→∞} −log P(|Xₙ:ᵢ|^r > x) / log x ≤ lim_{x→∞} −log (1 − F(x^{1/r}))^{n−i+1} / log x
        = ((n−i+1)/r) lim_{y→∞} −log (1 − F(y)) / log y = ((n−i+1)/r) a₊

(b)  < 1.
It follows from (b) and Theorem 1 that E|Xₙ:ᵢ|^r = ∞. The case i < r/a₋ is analogous.
Suppose now that r/a₋ < i and r/a₊ < n − i + 1. Now . . . Since

(d)  P( Xₙ:ᵢ > x^{1/r} ) ≥ P( X₁, . . . , X_{n−i+1} all exceed x^{1/r} ) = [1 − F(x^{1/r})]^{n−i+1},

the inequality in (b) also switches in this case to ≥, and gives . . . Similarly,

(g)  lim_{x→∞} −log P(|Xₙ:ᵢ|^r > x) / log x > 1.
(2)  b(n, p; k) ≡ P(X = k) = C(n,k) p^k q^{n−k}.

(a)  b(n, p; k) / b(n, p; k−1) = (n − k + 1)p / (kq) = 1 + ((n+1)p − k) / (kq),

so that

(b)  k ≤ (n+1)p implies b(n, p; k−1) ≤ b(n, p; k), with both inequalities reversed when k ≥ (n+1)p. ❑
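The ratio identity (a) and the resulting mode location can be checked directly (the particular n and p are our test values):

```python
from math import comb

def b(n, p, k):
    """Binomial pmf b(n, p; k) from (2)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

n, p = 20, 0.3
q = 1.0 - p
# (a): b(n,p;k)/b(n,p;k-1) = 1 + ((n+1)p - k)/(k q).
for k in range(1, n + 1):
    ratio = b(n, p, k) / b(n, p, k - 1)
    assert abs(ratio - (1.0 + ((n + 1) * p - k) / (k * q))) < 1e-12
# (b): the pmf increases while k <= (n+1)p and decreases after, so the mode
# is floor((n+1)p) when (n+1)p is not an integer; here floor(6.3) = 6.
pmf = [b(n, p, k) for k in range(n + 1)]
assert pmf.index(max(pmf)) == int((n + 1) * p)
```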
We can also reexpress (4) and (5) as

One of the most celebrated results of all probability theory is the limit theorem that

(8)  (X − np)/√(npq) →d N(0, 1)   as n → ∞;

that is, P( (X − np)/√(npq) ≤ λ ) → Φ(λ) for all real λ.
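The normal limit (8) can be checked against an exact binomial cdf computed in log space (the particular n, p, λ, and the 0.02 tolerance are our choices):

```python
from math import lgamma, log, exp, erf, sqrt, floor

def binom_cdf(n, p, m):
    """P(X <= m) for X ~ Binomial(n, p), with terms built via lgamma to avoid
    overflow/underflow for large n."""
    lp, lq = log(p), log(1.0 - p)
    total = 0.0
    for k in range(m + 1):
        lc = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        total += exp(lc + k * lp + (n - k) * lq)
    return total

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

n, p, lam = 10000, 0.3, 1.0
m = floor(n * p + lam * sqrt(n * p * (1 - p)))
# (8): P((X - np)/sqrt(npq) <= lam) is close to Phi(lam) for large n.
assert abs(binom_cdf(n, p, m) - Phi(lam)) < 0.02
```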
Bounding the Binomial Tail by the Tail Term Nearest the Center

For k ≥ np and i ≥ 0,

(b)  b(n, p; k + i) ≤ b(n, p; k) θⁱ,   where θ ≡ (n − k)p/((k+1)q) < 1.

Thus summing the resulting geometric series establishes (10). Interchange the roles of p and q to obtain (11) from (10). ❑
Note the crude inequalities that for all n ≥ 1, 1 ≤ i ≤ n, and 0 ≤ t ≤ 1,

(12)  P( ξₙ:ᵢ ≥ t ) ≥ P( ξ₁, . . . , ξ_{n−i+1} all exceed t ) = (1 − t)^{n−i+1}
ADDITIONAL BINOMIAL RESULTS 483
and

(13)  P( ξₙ:ᵢ ≥ t ) ≤ C(n, n−i+1) P( ξ₁, . . . , ξ_{n−i+1} all exceed t ) = C(n, n−i+1) (1 − t)^{n−i+1}.
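The bounds (12)-(13) can be compared with the exact tail, which is a binomial probability: ξₙ:ᵢ ≥ t exactly when fewer than i of the n uniforms fall in [0, t]. A check at one test point (the values n, i, t are ours):

```python
from math import comb

def os_tail(n, i, t):
    """P(xi_{n:i} >= t) = P(Binomial(n, t) <= i - 1)."""
    return sum(comb(n, j) * t ** j * (1 - t) ** (n - j) for j in range(i))

n, i, t = 10, 4, 0.5
exact = os_tail(n, i, t)
lower = (1 - t) ** (n - i + 1)                        # (12)
upper = comb(n, n - i + 1) * (1 - t) ** (n - i + 1)   # (13), union bound
assert lower <= exact <= upper
```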
The next result is a special case of a theorem of Bahadur and Rao (1960).
Theorem 2. (Bahadur and Rao) Let 0 < p < r < 1, and let Xₙ ≅ Binomial (n, p). Then

(14)  P( Xₙ/n ≥ r ) = . . . ,

where . . . and

(16)  1/√(2π) ≤ lim inf_n dₙ ≤ lim sup_n dₙ ≤ (1−p) / ( (1 − p/r) √(2π) ),
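The exponential rate in Theorem 2 is the Kullback-Leibler-type exponent K(r, p) = r log(r/p) + (1−r) log((1−r)/(1−p)); Chernoff's bound gives P(Xₙ/n ≥ r) ≤ exp(−nK) exactly, and Bahadur-Rao says the bound is sharp up to a √n factor. A numerical sketch of the exponent only (the sharp constant is not checked; n, p, r, and the 0.02 slack are our choices):

```python
from math import lgamma, log, exp

def binom_tail(n, p, m):
    """P(X >= m) for X ~ Binomial(n, p), via lgamma."""
    lp, lq = log(p), log(1.0 - p)
    return sum(exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * lp + (n - k) * lq) for k in range(m, n + 1))

n, p, r = 500, 0.2, 0.4
K = r * log(r / p) + (1 - r) * log((1 - r) / (1 - p))   # large-deviation exponent
tail = binom_tail(n, p, int(n * r))
val = -log(tail) / n
# Chernoff: P(X_n/n >= r) <= exp(-nK), so val > K; Bahadur-Rao sharpens the
# prefactor to order 1/sqrt(n), so val exceeds K only by O(log n / n).
assert K < val < K + 0.02
```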
so that we have a lower bound of the proper order. An upper bound is provided
by Feller's inequality (Inequality 1); thus
P"`- -(n+11)pl(k+1)P(X°=k)- 11
i plrP(X"=k)
1-p n -I/2pfl
(c) ^1-p/r 2^r
Thus
Poisson rv's
and we define
By observing that
we conclude that
satisfies
rA
2^r
As in the binomial case, the tail probabilities are of the same order as the tail term nearest the center. Thus, by (4),

(8)  P( X ≥ k ) ≤ pₖ / [ 1 − r/(k+1) ]   for all k ≥ r,

while

(9)  P( X ≤ k ) = Σ_{j=0}^{k} pⱼ = pₖ ( 1 + k/r + k(k−1)/r² + · · · ) ≤ pₖ / ( 1 − k/r )   for all k < r.
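Both geometric-domination bounds (8) and (9) are easy to confirm numerically (r and the test indices k are our choices; the upper truncation at 60 terms is far into the tail):

```python
from math import exp, factorial

r = 5.0
pk = lambda k: exp(-r) * r ** k / factorial(k)   # Poisson(r) pmf
# (8): for k >= r the terms shrink at least geometrically with ratio r/(k+1).
k = 8
tail = sum(pk(j) for j in range(k, 60))          # P(X >= k)
assert pk(k) <= tail <= pk(k) / (1.0 - r / (k + 1))
# (9): for k < r the terms below k shrink with ratio at most k/r.
k2 = 2
head = sum(pk(j) for j in range(0, k2 + 1))      # P(X <= k2)
assert pk(k2) <= head <= pk(k2) / (1.0 - k2 / r)
```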
Exercise 1. (Large deviations) (i) Using the same technique as in Theorem
11.8.2 of Bahadur and Rao, show that
where
(12)  (1/r) log P( X_r/r ≥ λ ) → (λ − 1) − λ log λ   as r → ∞, for each λ > 1.

Now show that for 0 < λ < 1, Eqs. (10) and (12) still hold, but λ/(λ − 1) on the rhs of (11) must be replaced by 1/(1 − λ) [since (9) is now used instead of (8)].
(ii) Obtain (12) again, this time as a corollary to Chernoff's theorem (Theorem A.4.1).
Inequality 1. For each λ > 0 in the "+" case (each 0 < λ ≤ √r in the "−" case) the rv X_r ≅ Poisson (r) satisfies

(13)  P( ±(X_r − r)/√r ≥ λ ) ≤ inf_{t>0} exp (−tλ) E exp ( ±t(X_r − r)/√r )

(14)       ≤ exp( −(λ²/2) ψ( ±λ/√r ) ).
Proof. Now

(a)  P( ±(X_r − r) ≥ λ√r ) = P( exp (±t(X_r − r)) ≥ exp (tλ√r) )   for all t > 0
     ≤ inf_{t>0} exp ( −tλ√r + r(e^{±t} − 1) ∓ tr ),

where Gamma (r+1) has density x^r e^{−x}/r! for x > 0. [Recall that r! = Γ(r+1).]

     = (a(k±)/√(2πr)) exp ( −r[ (1 ± λ/√r)(log (1 ± λ/√r) − 1) + 1 ] )

(19)  = (a(k±)/√(2πr)) exp ( −(λ²/2) ψ( ±λ/√r ) ).
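The optimization in (a) can be carried out numerically and compared with the closed form: with h(x) = x(log x − 1) + 1, minimizing exp(−ta + r(eᵗ − 1)) over t > 0 at a = r + λ√r gives t* = log(a/r) and the value exp(−r h(a/r)). A sketch (grid search and parameter values are ours):

```python
from math import exp, log, factorial, sqrt, ceil

r, lam = 10, 2.0          # integer r, so X_r ~ Poisson(r) exactly
a = r + lam * sqrt(r)     # threshold
h = lambda x: x * (log(x) - 1.0) + 1.0
closed = exp(-r * h(a / r))                       # optimized Chernoff bound
# mgf bound exp(r(e^t - 1) - t a), scanned over a grid of t:
grid = [exp(-t * a + r * (exp(t) - 1.0)) for t in
        [0.001 * j for j in range(1, 2000)]]
assert min(grid) >= closed          # closed form is the true infimum
assert min(grid) < closed * 1.001   # and the grid comes very close to it
# The bound dominates the exact tail P(X_r >= a):
tail = sum(exp(-r) * r ** k / factorial(k) for k in range(ceil(a), 80))
assert tail <= closed
```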
EXPONENTIAL BOUNDS FOR POISSON, GAMMA, AND BETA rv's 487
We write c = a ± b if |c − a| ≤ b.
(20)
_ —a(kt)
P(Xr=rt.lr.l)= 2irr 1O+Sexp(— 2O+2I
I A2 8 \ if
I'kI V IA13
y sS.
<_(1+J /A ifl,1J/f<s
and
(22) ( l r_VA y 1
=f/a.
Thus (8), (21), and (19) and (8), (22), and (19) give, respectively,

(23)  P( ±(X_r − r)/√r ≥ λ ) ≤ (1 ± λ/√r)^{±1/2} (√r/λ) exp( −(λ²/2) ψ( ±λ/√r ) )

in case r ± √r λ is an integer; this is typically, but not always, stronger than (14). Then, using ψ(±λ/√r) = 1 ∓ λ/(3√r) + O(λ²/r), you can show that
I ^' ( 1)(j+2)W
Gamma rv's

We thus manipulate

     P( (X_r − r)/√r ≥ λ ) = ∫_{λ}^{∞} ( (r + √r y)^r e^{−(r + √r y)} / r! ) √r dy

        = ( e^{−a(r)}/√(2π) ) ∫_{λ}^{∞} exp ( −r[ (1 + y/√r) − 1 − log (1 + y/√r) ] ) dy

(30)    = ( e^{−a(r)}/√(2π) ) ∫_{λ}^{∞} exp ( −(y²/2) ψ( y/√r ) ) dy   for λ > 0

and

(31)  P( −(X_r − r)/√r ≥ λ ) = ( e^{−a(r)}/√(2π) ) ∫_{λ}^{√r} exp ( −(y²/2) ψ( −y/√r ) ) dy   for 0 < λ < √r,

where . . . The function ψ is described in Proposition 11.3.1. [See Devroye (1981) for an alternative.]
Exercises. (i) Use the fact that λψ(λ) is ↗ in the exponent of (30) and (31) to show

(33)  P( ±(X_r − r)/√r ≥ λ ) ≤ ( e^{−a(r)}/√(2π) ) ( 1/(λ ψ(±λ/√r)) ) exp( −(λ²/2) ψ( ±λ/√r ) );

the rhs of (34) can be further bounded below using Mills' ratio to obtain an expression looking more like (33).
(ii) Obtain also the bounds based on the moment generating functions, as in (13) and (14).
[1 — 4'(A,)] exp ( FA 2
i =I (J+ 1 )r'^ z
provided A, = o(r(k+,)i(z(k+a)))
Beta rv's
(35) X = XQ , b = Betaa + 1, b + 1)
(a+b+1)!
with density x°(1—x)' on [0, 1].
a!b!
Recall that
a+1 (a+1)(b+1)
(36) Xa,b
a+b+2'(a+b+2)z(a+b+3)^
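The normalization and moments in (35)-(36) can be verified by direct quadrature of the stated density (a, b and the grid are our test choices):

```python
from math import factorial

a, b = 2, 3
norm = factorial(a + b + 1) / (factorial(a) * factorial(b))   # (a+b+1)!/(a! b!)
step = 1e-4
xs = [step * j for j in range(0, int(1 / step) + 1)]
dens = [norm * x ** a * (1 - x) ** b for x in xs]
trap = lambda f: step * (sum(f) - 0.5 * (f[0] + f[-1]))       # trapezoid rule
mass = trap(dens)
mean = trap([d * x for d, x in zip(dens, xs)])
var = trap([d * x * x for d, x in zip(dens, xs)]) - mean ** 2
assert abs(mass - 1.0) < 1e-6
assert abs(mean - (a + 1) / (a + b + 2)) < 1e-6               # 3/7
assert abs(var - (a + 1) * (b + 1) / ((a + b + 2) ** 2 * (a + b + 3))) < 1e-6
```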
     P( ( X − a/(a+b) ) √( (a+b)³/(ab) ) ≥ λ )

(38)  = ( (a+b+1)/(a+b) ) ( e^{−a(·)}/√(2π) )
        × ∫_λ exp( −(y²/2) [ (a/(a+b)) ψ( by/√(ab(a+b)) ) + (b/(a+b)) ψ( −ay/√(ab(a+b)) ) ] ) dy.
Derive an analogous expression for the other tail; you get the same expression with ψ(·) replaced by ψ(−·), and different limits on the integral. Now derive analogs of Exercises 6-8.
CHAPTER 12
0. INTRODUCTION
THE HUNGARIAN CONSTRUCTIONS OF Kₙ, Uₙ, AND Vₙ
(4)  Kₙ(m/n, ·) = √(m/n) U_m   for the empirical process U_m of ξ₁, . . . , ξ_m.
In Sections 2.5 and 2.7 we saw two other approaches for special constructions, and we applied them to the partial-sum process Sₙ. The two additional techniques are Skorokhod embedding and the Hungarian construction. Both provide rates of convergence; that is, both would allow us to improve a CLT-type result whose proof rested on (1) to a Berry-Esseen-type result. However, only the Hungarian construction provides a single sequence of iid ξ₁, ξ₂, . . . useful in proving strong limit theorems.
In Section 1 we briefly summarize the Hungarian construction of Kₙ. We do not give a proof establishing the basic construction. It is long and difficult! Our purpose is to learn how to use it. The basic fact is that a construction satisfying ‖Kₙ − K‖ = O((log n)²/√n) a.s. is possible. The key paper is Komlós, Major, and Tusnády (1975). [An alternative that has rate (log n)/√n is also presented, but its incorrect joint distribution theory limits its use.]
In Section 2 we see how a sequence of uniform quantile processes Vₙ that converge at a rate can be constructed from the partial-sum process of independent exponential rv's. The basic ideas here are due to Breiman (1968) and Brillinger (1969). Although use of Skorokhod embedding gives an a.s. rate of nearly O(n^{−1/4}), it is also possible to use a Hungarian construction to obtain an a.s. rate of O((log n)/√n). Again we stress, these results can yield Berry-Esseen-type theorems, but not strong limit theorems. The Hungarian construction here is different from that used in Section 1.
Note that the approach of Section 1 yielded an a.s. rate of O((log n)²/√n) and the potential for strong limit theorems. It is known that the rate cannot be improved past O((log n)/√n); it is still an open question whether this rate can be achieved. If so, applications await it (see Section 14.4 for one example). This is another good reason to omit the proof of what is currently available, but perhaps not best possible.
The constructions and approximation rates discussed in Sections 1 and 2 all concern the supremum metric ‖·‖. In Section 3 we summarize yet another construction of 𝕍ₙ and Uₙ which pays close attention to the behavior of these processes near 0 and 1; it is due to Csörgő, Csörgő, Horváth, and Mason (1984a). This refined construction is based on the same partial-sum approximation as in Section 2, but particular care is taken in treating the approximation error near 0 and 1. The result is a construction which is suited to the stronger metrics ‖·/q‖ where q(t) = [t(1−t)]^{1/2−ν} with 0 ≤ ν < ½. In Section 4 we prove a Berry-Esseen-type theorem for functionals of Uₙ, Vₙ, and 𝕍ₙ.
THE HUNGARIAN CONSTRUCTION OF Kₙ 493
Even though the proof of the powerful Theorem 12.1.1 is not presented, one of the basic elementary tools is quite important. The Wasserstein distance was discussed in Section 2.6.
To extend the empirical process results of this chapter to the general empirical process √n(Fₙ − F) on the real line, one simply plugs F into both Kₙ and K and notes that
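The "plug in F" step can be seen concretely: if Xᵢ = F⁻¹(ξᵢ), then the general empirical process √n(Fₙ − F) at x coincides with the uniform one at F(x), because the events [Xᵢ ≤ x] and [ξᵢ ≤ F(x)] are identical. A sketch (the particular F and the evaluation grid are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
xi = rng.uniform(size=n)             # uniform sample
X = np.sqrt(xi)                      # X = F^{-1}(xi) for F(x) = x^2 on [0, 1]
F = lambda x: x ** 2
grid = rng.uniform(0.05, 0.95, size=50)
# Uniform empirical process evaluated at F(x) ...
Un = np.array([np.sum(xi <= F(x)) / n - F(x) for x in grid]) * np.sqrt(n)
# ... equals the general empirical process evaluated at x:
En = np.array([np.sum(X <= x) / n - F(x) for x in grid]) * np.sqrt(n)
assert np.allclose(Un, En)
```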
Let Kₙ denote the sequential uniform empirical process of Section 3.5 defined by

(1)  Kₙ(s, t) = (1/√n) Σ_{i=1}^{⌊ns⌋} [ 1_{[0,t]}(ξᵢ) − t ]   for s ≥ 0 and 0 ≤ t ≤ 1

for a sequence ξ₁, ξ₂, . . . of independent Uniform (0, 1) rv's. Suppose

We recall that

(4)  Cov [ K(s₁, t₁), K(s₂, t₂) ] = (s₁ ∧ s₂) [ (t₁ ∧ t₂) − t₁t₂ ].
/
g(lo n)2
M <oo a.s.
(8)
„-,m L l-5ksn n v n Vn
^
>((c,logn)+x)lognf
(9) P(^mk xIf /iKn \ n' ' / —K(k,') II
c2 exp ( c3x)
—
Corollary 1. We have

(10)  E( max_{1 ≤ k ≤ n} ‖ Kₙ(k/n, ·) − K(k, ·)/√n ‖ ) ≤ c (log n)²/√n

for some 0 < c < ∞.
(a)  EMₙ = ∫₀^∞ P(Mₙ ≥ x) dx
        ≤ c₁ (log n)²/√n + ∫₀^∞ P( Mₙ ≥ x + (c₁ log n)(log n)/√n ) dx
        = c₁ (log n)²/√n + ∫₀^∞ P( Mₙ ≥ (y + c₁ log n)(log n)/√n ) ((log n)/√n) dy,
             letting x = y (log n)/√n,

(b)  ≤ c₁ (log n)²/√n + ((log n)/√n) ∫₀^∞ c₂ exp (−c₃ y) dy   by Theorem 2,

as claimed. ❑
Open question 1. It is known (see Komlós et al., 1975) that the conclusions in (7) and (8) cannot be improved past (log n)/√n. Is it possible to remove one log n term from (7)-(10)?
The proof of Theorem 1 is very long and very difficult, and it would seem that it might not give the best possible result either. We omit it.
It is possible to get the best possible rate if you are willing to give up something, namely, the Kiefer-process structure. See Csörgő and Révész (1981) for a readable proof.
for all x and all n ≥ 1; here 0 < c₁, c₂, c₃ < ∞ and Uₙ' is the empirical process of ξₙ₁, . . . , ξₙₙ.

(12)  lim_n ‖ 𝕍ₙ + K(n, ·)/√n ‖ n^{1/4} / ( (log n)^{1/2} (log₂ n)^{1/4} ) ≤ some M < ∞   a.s.
Open question 2. What are the best possible rates for simultaneous results of the type (7) and (12)?

Open question 3. Give a result analogous to (7) for the Wₙ process of Section 3.1.

Note that this is an alternate way to establish that Uₙ satisfies (2.3.4) or (2.3.8) . . . of the origin.

If all Xᵢ are Exponential (θ), then recall from Proposition 8.2.1 that S₁/Sₙ, . . . , Sₙ₋₁/Sₙ are distributed as Uniform (0, 1) order statistics. In any case let Uₙ and Vₙ denote their empirical and quantile processes, and let 𝕍ₙ denote the smoothed quantile process.
Let us define the Brillinger process

Exercise 1. Show that B(s, t) and K(s, t)/√s have different joint distributions even though B(s, ·) ≅ K(s, ·)/√s for each fixed s > 0.
THE HUNGARIAN RENEWAL CONSTRUCTION OF Vₙ 497
Theorem 1. (The Hungarian renewal construction) Suppose (1) holds; then there exists a version of iid X₁, X₂, . . . and B as above for which

(5)  P( ‖Vₙ − Bₙ‖ > ((c₁ log n) + x)/√n ) ≤ c₂ exp (−c₃ x)

for all x and all n ≥ 1; here 0 < c₁, c₂, c₃ < ∞. We may replace Vₙ by 𝕍ₙ in (5).
Note that this is an improvement over Theorem 12.1.1 in that the extra log n factor has been removed, as in Theorem 12.1.3. In fact, the best possible rate has been achieved. However, it is deficient in that the joint distributions of these 𝕍ₙ and Vₙ are different from what they would be if only a single sequence of iid Uniform (0, 1) rv's were involved. Thus the best rate has been achieved, but at a price. However, this is the appropriate process to consider when testing whether or not the interarrival times of a renewal process are exponential.
yields (a) via the Borel-Cantelli lemma. Now define c/ n in terms of these
via (8.3.1), and let 4" denote a random permutation of
;
n+1
Ilc n —8 n (Ij C q%n +l If
yn+l
_ S((n +1)I)
n- + 1 1
+ )F n
.Ln - 1 11211 §((n+1)I )I
= [1 +o(1)][O((log n) /',[ti)] +O(b"/Vi)O(b" + ,)
a.s.
This is a simpler proof, but along the rough lines of Csörgő and Révész (1978a). The treatment of Vₙ is even simpler [we really do not need (b)]. ❑
for the class of functions ℋ in (2.9.8). We may replace Vₙ by 𝕍ₙ in (8). [Note that (8) is an assertion concerning the Vₙ process constructed via (2), and not about the Vₙ resulting from ξ₁, ξ₂, . . . iid Uniform (0, 1); other methods are needed to make a similar assertion for the latter process.]
A REFINED CONSTRUCTION OF Uₙ AND Vₙ 499
Exercise 5. Show that the best rate that could be achieved in (6) with the Skorokhod embedding of (2.5.25) is (log n)^{1/2} (log₂ n)^{1/4} / n^{1/4}. [This is Brillinger's (1969) rate. The rate in the theorem is from Csörgő and Révész (1978a).]
The constructions of Kₙ, Uₙ, 𝕍ₙ, and Vₙ in Sections 1 and 2 work well for proving results requiring only convergence of the processes with respect to the supremum metric ‖·‖, but do not say anything if convergence with respect to a stronger weighted ‖·/q‖ metric is needed. A refined construction of Vₙ and Uₙ, which emphasizes the behavior of the processes near 0 and 1 and gives a weak (in-probability) approximation rather than the strong (almost sure) approximations of the preceding sections, has been given by Csörgő et al. (1984a).
This remarkable construction proceeds by first improving the inequality of Theorem 12.2.1 near 0 and 1. The basic idea is that (12.2.5) involves approximation errors of partial sums of exponential rv's that are accumulated over the whole interval [0, 1], which results in the c₁ log n term inside the probability; this order of approximation is possible when all n of the partial sums are required to be close to a Brownian motion. If attention is focused on a neighborhood of zero, however, for example, a neighborhood corresponding to a fixed number of summands, then many fewer terms will have contributed to the sum, and the corresponding approximation error will actually be much smaller. In order to obtain the same order of approximation near 1, it is necessary to begin a separate partial-sum approximation at 1 and "run it backwards" over the interval. Of course, it then becomes necessary to "patch the approximations together" at t = ½. This program can be carried out, and the result is the following refinement of Theorem 12.2.1.
and
log n. Theorem 1 can be used to control the error of approximation when the difference between 𝕍ₙ and Bₙ is divided by a function which is small near 0 and 1. The resulting weak approximation theorem follows.
n
(3)
—^n
ei "
The most important special case is that of ν = 0: thus for any constant c > 0 we have

(4)  ‖ (𝕍ₙ − Bₙ) / [I(1−I)]^{1/2} ‖_{c/(n+1)}^{1−c/(n+1)} = Oₚ(1),

while

(5)  ‖ 𝕍ₙ / [I(1−I)]^{1/2} ‖_{c/(n+1)}^{1−c/(n+1)} = Oₚ(bₙ)

and

(6)  ‖ Bₙ / [I(1−I)]^{1/2} ‖_{c/(n+1)}^{1−c/(n+1)} = Oₚ(bₙ),

so that they both go (slowly) to infinity (for any versions of the processes), but nevertheless, the special construction of 𝕍ₙ and Bₙ satisfies (4).
When Theorem 2 is converted to a theorem for the corresponding version
of U,,, the following partial analogue of Theorem 2 results.
and
(8) n' /° n = O 10 n ).
and
V
(12) l
3 JJ
1/2
n1(n+1)
g 1/(n+I)
"pp0 as
(a)  P( sup_{0 ≤ t ≤ aₙ} Vₙ(t)/√aₙ ≤ x ) → P( sup_{0 ≤ t ≤ 1} S(t) ≤ x )

if 0 < aₙ ≤ some fixed β < 1 for all n sufficiently large, and naₙ → ∞, where S denotes a Brownian motion process. [If aₙ = a ∈ (0, 1) is fixed, then this becomes (3.8.8).]
As we were in the galley proof stage, the following major theorem of Mason
and van Zwet (1985), that gives a different construction for U,, completely
paralleling results for V,,, was announced.
for all 1 ≤ d ≤ n and all −∞ < x < ∞. (We may replace ‖·‖₀^{d/n} by ‖·‖_{1−d/n}^{1} in (13).) Thus

(14)  lim_n ( √n ‖ Uₙ − Bₙ ‖_{[1/n, 1−1/n]} / log n ) ≤ some M < ∞   a.s.
loan )
(2) sup I P(cf(vn) - x) — P(^ i(U) <_ x)I - o (
1
We may replace U. in (2) by V. or 4/ n .
This is a best possible rate for a proof based on Theorems 12.1.3 or 12.3.1.
Unfortunately, the rate 1/f is typically possible.
Proof. Let D denote the bound on the density of g(U). Recall from Theorem 12.3.1 (use Theorem 12.1.3 for Uₙ) that

(b)  . . .

where W² = ∫₀¹ U²(t) dt. (Recall from Remark 5.3.4 that the rate 1/n is possible.)
CHAPTER 13
0. INTRODUCTION
In Section 1 we establish Smirnov's LIL that lim_n ‖Uₙ‖/bₙ = ½ a.s., and then strengthen it to Chung's characterization of upper-class sequences. We also estimate the order of magnitude of the probability that ‖Uₙ‖ ever crosses an ↗ barrier. These are based on maximal inequalities for ‖Uₙ/q‖ contained in Section 2, and our earlier exponential bounds.
Section 3 contains Finkelstein's theorem that Uₙ/bₙ ⇝ ℋ a.s., and Cassels's variation on this. The same results hold for Vₙ also. We use these results, via the Hungarian construction, to establish K(n, ·)/(√n bₙ) ⇝ ℋ a.s. Section 4 extends these results to ‖·/q‖ metrics by giving James's characterization of those functions q for which Uₙ/(q bₙ) ⇝ ℋ_q ≡ { h/q: h ∈ ℋ } a.s.

Recall that ‖𝕍ₙ‖ = ‖Uₙ‖, so that 𝕍ₙ may replace Uₙ in this result.
A LIL FOR ‖Uₙ‖ 505
(2)  P( ‖Uₙ‖ ≥ λₙ i.o. ) = 0 or 1   according as Σₙ (λₙ²/n) exp (−2λₙ²) is < ∞ or = ∞.

Recall that ‖Vₙ‖ = ‖Uₙ‖, so that Vₙ may replace Uₙ in this result.
Thus Pₙ(λ) is the probability that a plot of ‖Uₘ‖ will ever intersect the shaded region in Figure 1.

[Figure 1. The ↗ barrier λₘ, m = n, n+1, . . . , with the shaded region above it.]
506 LAWS OF THE ITERATED LOGARITHM
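The crossing probabilities Pₙ(λ) are driven by exponential tail bounds of DKW type, P(‖Uₙ‖ ≥ λ) ≤ 2 exp(−2λ²) (Inequality 9.2.1 in the text; the sharp constant 2 is Massart's form, assumed here). A quick Monte Carlo sketch, with a deliberately loose assertion since this is a statistical check and not a proof:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, lam = 100, 4000, 1.5
exceed = 0
i = np.arange(1, n + 1)
for _ in range(reps):
    xi = np.sort(rng.uniform(size=n))
    # ||U_n|| = sqrt(n) * ||G_n - I||, via the one-sided sups at the jumps.
    dplus = np.max(i / n - xi)
    dminus = np.max(xi - (i - 1) / n)
    if np.sqrt(n) * max(dplus, dminus) >= lam:
        exceed += 1
frac = exceed / reps
bound = 2 * np.exp(-2 * lam ** 2)   # DKW-type exponential bound, ~0.0222
assert frac <= 2 * bound            # empirical frequency respects the bound
```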
(b)  a.s.

If we set λₙ = (1+ε)bₙ/2 for any ε > 0, then the series in Chung's theorem converges; thus P( ‖Uₙ‖/bₙ ≥ (1+ε)/2 i.o. ) = 0 for any ε > 0. Thus (b) follows from the convergence half of Chung's theorem (to the proof of which we now turn).
The proof given below of the convergence half of Theorem 2 follows Shorack (1980); the theorem is due to Chung (1949).
We first note that we may assume without loss of generality that we have the rather coarse bound

Then λ̃ₙ ≡ λₙ ∧ log₂ n is ↗, and using λ̃ₙ in (d) still yields a finite sum. Thus . . . would imply P( ‖Uₙ‖ > λ̃ₙ i.o. ) = 0; and since λ̃ₙ ≤ λₙ, it trivially follows that P( ‖Uₙ‖ > λₙ i.o. ) = 0. Thus we may assume (c) in what follows. It is also true that

We now justify this. Since x exp (−x) is ↘ for all sufficiently large x, there exists an n₀ such that n ≥ n₀ implies

     Σ_{m=n₀}^{n} (λₘ²/m) exp (−2λₘ²) ≥ λₙ² exp (−2λₙ²) Σ_{m=n₀}^{n} m⁻¹,

and note that (f) converges to ∞ on any subsequence n′ for which λₙ′² ≤ (log₂ n′)/2. Thus λₙ² > (log₂ n)/2 for all but a finite number of n, which yields (e). This paragraph is essentially contained in Robbins and Siegmund (1972).
Let nₖ be a subsequence ↗ to ∞. Let

     Aₖ ≡ [ max_{nₖ ≤ n ≤ nₖ₊₁} ‖Uₙ‖ > λ_{nₖ} ].

Now [ ‖Uₙ‖ > λₙ i.o. ] ⊂ [ Aₖ i.o. ] since λₙ ↗; thus Borel-Cantelli will yield P(Aₖ i.o.) = 0 and complete the proof once we show that

(g)  Σ_{k=2}^{∞} P(Aₖ) < ∞.

We now specify our choice for nₖ as (note the discussion in Section A.9.1, which is not used here)

(i)  c ≡ 1 − 7/(8 log k)   and   a ≡ nₖ₊₁/nₖ = 1 + 7/(8 log k).
It is appropriate [see (13.2.2)] to apply this maximal inequality since (j) implies . . . Thus . . . Now

     ∞ > Σ_{k=2}^{∞} Σ_{n=nₖ+1}^{nₖ₊₁} (cₙ/n) exp (−cₙ)   by (d),

so that (q) and (r) yield (g). This completes the proof.
The term e⁵ in (o) shows why the sequence nₖ of (h) was required. ❑
Proof of Theorem 3. According to Chung's Theorem 2 we have P(‖U_n‖ ≥ λ_n i.o.) = 1 unless
(a) Σ_{n=1}^∞ (c_n/n) exp (−c_n) < ∞, where c_n ≡ 2λ_n²;
and thus the present theorem is uninteresting unless (a) holds. Let n k and A k
be as defined in the proof of Chung's theorem, and recall from that proof that
(a) implies
[which proves part (i)] and that for each K ≥ 1 we have
P_{n_K}(λ) ≤ Σ_{k=K}^∞ P(A_k).
Also
c_{n_k} = d_{n_k} log₂ n_k
(d) ≥ d_{n_k} log (k/log k) − d_{n_k},   allowing coarsely for (h).
Thus
P_{n_K}(λ) ≤ 10⁶ Σ_{k=K}^∞ exp ( d_{n_k} − d_{n_k} log (k/log k) )
 ≤ 10⁶ Σ_{k=K}^∞ k^{−d_{n_k}(1−ε)} ≤ 10⁶ ∫_{K−1}^∞ x^{−d_{n_K}(1−ε)} dx
 = 10⁶ (K−1)^{−[d_{n_K}(1−ε)−1]} / [d_{n_K}(1−ε)−1],
and hence
log P_{n_K}(λ)/c_{n_K}
 ≤ { log 10⁶ − log [d_{n_K}(1−ε)−1] − [d_{n_K}(1−ε)−1] log (K−1) } / c_{n_K}
 = o(1) − d_{n_K}(1−ε) (log K)/c_{n_K} + (log K)/c_{n_K}
(e) ≤ −1 + 2ε + (log K)/c_{n_K} + o(1)   by (d)
(f) = −1 + 2ε + o(1)   by hypotheses.
(h) lim_{n→∞} log P_n(λ)/c_n ≤ −1.
This completes the proof of (ii). [Note that P_n(λ) becomes smaller if we replace ‖U_n‖ by either ‖U_n^±‖.] [Note that this proof fails going from (e) to (f) if 2λ_n²/log₂ n ↗ 1+δ for some 0 < δ < ∞. In this case we obtain, in place of (h), that
(8) lim log P_n(λ)/(2λ_n²) ≤ −δ/(1+δ)   when 2λ_n²/log₂ n ↗ 1+δ.]
Remark 2. (Erdős, 1942) Let X₁, X₂, ... be iid Bernoulli (1/2). Let S_n ≡ X₁ + ⋯ + X_n − n/2. Let λ_n ↗. Then
(9) P(S_n ≥ √n λ_n i.o.) = 1 or 0 according as Σ_{n=1}^∞ (λ_n²/n) exp (−2λ_n²) = ∞ or < ∞.
His key exponential bound was
(10) c λ exp (−2λ²) ≤ P(S_n ≥ √n λ) ≤ d λ exp (−2λ²)
for some c, d > 0. He introduced the subsequence n_k = ⟨exp (k/log k)⟩. Exercise
1 below asks the reader to make minor changes in our proof of Chung's
theorem (Theorem 2) [note the minor differences between (10) and the DKW
inequality (Inequality 9.2.1)] in order to establish the convergence half of (9).
The divergence half of (9) in Erdős (1942) is long and delicate, but the same type of minor changes in it will produce a proof of the divergence half of (2).
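For comparison with (10), the two-sided DKW-type bound (Inequality 9.2.1) controls P(‖U_n‖ ≥ λ); in Massart's sharp form it reads P(‖U_n‖ ≥ λ) ≤ 2 exp (−2λ²). The following seeded Monte Carlo check is our own illustration (the constant 2 and the function names are our assumptions, not the book's notation):

```python
import math, random

def sup_norm_Un(sample):
    """||U_n|| = sqrt(n) * sup_t |G_n(t) - t|, via order statistics."""
    xs = sorted(sample)
    n = len(xs)
    d = max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, 1))
    return math.sqrt(n) * d

def dkw_check(n=200, reps=2000, lam=1.0, seed=0):
    """Estimate P(||U_n|| >= lam) and return it with the bound
    2 * exp(-2 lam^2) (Massart's form of the DKW inequality)."""
    rng = random.Random(seed)
    hits = sum(sup_norm_Un([rng.random() for _ in range(n)]) >= lam
               for _ in range(reps))
    return hits / reps, 2 * math.exp(-2 * lam * lam)

freq, bound = dkw_check()
```

At λ = 1 the bound 2e⁻² ≈ 0.271 is nearly attained in the limit, which is why (10) and the DKW inequality differ only in the polynomial factor in λ; the Monte Carlo frequency sits just below the bound, up to sampling noise.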
A MAXIMAL INEQUALITY FOR ‖U_n/q‖

We saw in the proof of Theorem 13.1.2 how delicate LIL-type results require us to relate probabilities dealing with a whole block of subscripts to a probability that just involves the last subscript of the block. We now develop such inequalities for rv's of the type ‖U_n/q‖. They are due to James (1975) and Shorack (1980).
A_ik ≡ [ ±S_m(r_j)/q(r_j) ≤ aλ for all indices prior to (k, i), and ±S_k(r_i)/q(r_i) > aλ ]
and
B_ik ≡ [ ±(S_{n″}(r_i) − S_k(r_i)) ≤ (1−c) q(r_i) λ √n′ ],   where 0 < c < 1.
Figure 1.
Now for fixed k, the A_ik are disjoint in i; also A_{i*k*} and B_ik are independent provided k* ≤ k. Thus by Inequality A.7.1
P( sup_{a≤t≤b} ±U_{n′}(t)/q(t) ≥ cλ ) ≥ P( sup_i ±S_{n″}(r_i)/(√n′ q(r_i)) ≥ cλ ),   using n″/n′ ≤ a,
where for all i, k (provided b, c, a were chosen small enough) we have the requisite bound. We will now inquire into how small b and c must be; by (a), it suffices to bound the expression in terms of q(t).
Inequality 2. (Shorack) Let a > 1 be given, and let n′ ≤ n″ be any pair of integers satisfying n″/n′ ≤ a. Let 0 ≤ a ≤ b ≤ 1 be arbitrary. Then for any 0 < c < 1 and all λ ≥ λ_{a,c} ≡ √(2(a−1))/(1−c) we have
(4) P( max_{n′≤n≤n″} ‖U_n/(I(1−I))‖ₐᵇ > λ ) ≤ 2 P( ‖U_{n″}/(I(1−I))‖ₐᵇ > cλ ).
Proof. The only change needed in the previous proof is in Eq. (a); this now becomes
(a) P(B_ik) ≤ [(a−1)/((1−c)²λ²)] sup_{a≤t≤b} [t(1−t)/(t(1−t))] = (a−1)/((1−c)²λ²) ≤ 1/2   for λ ≥ λ_{a,c}. ❑
The main contents of this section are functional LIL's for U_n and V_n. Properties of the functions in the limit set ℋ are put forth in Proposition 2.9.1. The theorem is one of the central results presented; it is a natural analog of Strassen (1964).
RELATIVE COMPACTNESS OF U_n AND V_n
The empirical process U_n has trajectories in the metric space (D, ‖·‖). Let
(1) ℋ ≡ { h: h is absolutely continuous on [0, 1] with h(0) = h(1) = 0 and ∫₀¹ [h′(t)]² dt ≤ 1 }.
Theorem 2. (Cassels) Let b_n ≡ √(2 log₂ n). Let ε > 0. For a.e. ω there exists an N_{ε,ω} such that for all n ≥ N_{ε,ω} the conclusion below holds.
Note that y = √(t(1−t)) for all 0 ≤ t ≤ 1 describes a semicircle of radius 1/2 centered at (1/2, 0).
Exercise 1. (Finkelstein, 1971)
lim_{n→∞} ∫₀¹ [U_n(t)/b_n]² dt = 1/π² a.s.
[Hint: This result is easy once the calculus of variations is used to show that sup {∫₀¹ h²(t) dt: h ∈ ℋ} = 1/π².]
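The variational step in the hint can be sketched as follows (a standard Lagrange-multiplier argument; this is our own derivation, not the book's):

```latex
\text{Maximize } \int_0^1 h^2\,dt \ \text{ over } h\in\mathcal{H}:\quad
h(0)=h(1)=0,\ \ \int_0^1 (h')^2\,dt\le 1.
% Euler--Lagrange equation for \int (h^2 - \mu (h')^2)\,dt:
h'' + \mu^{-1}h = 0 \;\Longrightarrow\; h_k(t)=c\,\sin(k\pi t),\quad k=1,2,\dots
% The active constraint \int_0^1 (h_k')^2\,dt = c^2 k^2\pi^2/2 = 1
% gives c^2 = 2/(k^2\pi^2), so that
\int_0^1 h_k^2\,dt = \frac{c^2}{2} = \frac{1}{k^2\pi^2},
% which is largest at k=1. Hence
\sup\Bigl\{\int_0^1 h^2\,dt:\ h\in\mathcal{H}\Bigr\} = \frac{1}{\pi^2},
\qquad\text{attained at } h(t)=\frac{\sqrt{2}}{\pi}\sin(\pi t).
```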
Proof of Theorem 1. Cassels's (1951) theorem was first established using very delicate approximations to binomial-type expressions. Finkelstein's (1971) theorem was inspired by Strassen (1964). The results for V_n are trivial in light of the results for U_n.
where Π_m(U_n/b_n) is the Π_m projection of U_n/b_n. Since a₁, a₂, ... are iid (0, T) with σ_ii = (1/m)(1 − 1/m) and σ_ij = −1/m² for i ≠ j, the multivariate LIL applies,
where J_m ≡ {(t₁, ..., t_m)′: Σ₁ᵐ tᵢ = 0 and m Σ₁ᵐ tᵢ² ≤ 1}. Let R₀^{m+1} denote all x = (x₀, x₁, ..., x_m)′ in R^{m+1} having x₀ = 0, and let ψ: R₀^{m+1} → R^m be given by
(d) ψ(x) = (x₁ − x₀, x₂ − x₁, ..., x_m − x_{m−1}).
∫₀¹ [h′(t)]² dt = Σ_{i=1}^m ∫_{(i−1)/m}^{i/m} [h′(t)]² dt ≤ 1.
Thus ψ(Π_m(h)) ∈ J_m, and Π_m(ℋ) ⊂ ψ⁻¹(J_m). Now let t = (t₁, ..., t_m)′ ∈ J_m. Then x = (x₀, x₁, ..., x_m) = ψ⁻¹(t) is defined by letting x₀ = 0 and xᵢ = Σ_{j≤i} tⱼ for 1 ≤ i ≤ m. Now define h by letting h(i/m) = xᵢ for 0 ≤ i ≤ m, with h continuous and linear on each [(i−1)/m, i/m]. Absolute continuity of this h follows from its piecewise linearity. Clearly, h(0) = h(1) = 0. Also ∫₀¹ [h′(t)]² dt = Σᵢ m⁻¹(m tᵢ)² ≤ 1 by definition of J_m. Thus h ∈ ℋ; h has Π_m(h) = ψ⁻¹(t) for t ∈ J_m above; and ψ⁻¹(J_m) ⊂ Π_m(ℋ). Thus (g) holds.
Now let us turn to (2.8.19). Note that the Π_m approximation Π_m(U_n/b_n) of U_n/b_n satisfies
(h) ‖U_n/b_n − Π_m(U_n/b_n)‖ ≤ max_{1≤i≤m} sup_{(i−1)/m ≤ t ≤ i/m} |U_n(t) − U_n((i−1)/m)| / b_n ≡ max_{1≤i≤m} Δ_{in}.
Let ν_n denote the number of times 0 ≤ ξ_k ≤ 1/m for 1 ≤ k ≤ n. Now let ξ₁*, ξ₂*, ... be independent Uniform (0, 1) rv's that are independent of all other random quantities introduced so far, including ν_n. Let U*_ν denote the empirical process based on ξ₁*, ..., ξ*_ν. Now note that Δ_{1n}, ..., Δ_{mn} are identically distributed (and dependent) and the sequence Δ_{in} (with 1 ≤ i ≤ m fixed) satisfies (≅ here means jointly in all values of n = 1, 2, ...)
Δ_{1n} = sup_{0≤t≤1/m} | (1/√n) Σ_{k=1}^n [1_{[0,t]}(ξ_k) − t] | / b_n
 = sup_{0≤s≤1} | (1/√n) Σ_{k=1}^n [1_{[0,s/m]}(ξ_k) − s/m] | / b_n
 ≅ sup_{0≤s≤1} | √(ν_n/n) U*_{ν_n}(s) + (ν_n − n/m) s/√n | / b_n
(i) ≤ √(ν_n/n) (b_{ν_n}/b_n) ‖U*_{ν_n}‖/b_{ν_n} + |ν_n − n/m| / (√n b_n).
Considering separately the terms in (i), we note that since lim sup_k ‖U*_k‖/b_k = 1/2 a.s. by Smirnov's theorem (Theorem 13.1.1), and since ν_n/n →_{a.s.} 1/m by the SLLN, we thus have an a.s. lim sup of 1/(2√m) for the first term in (i). Applying the classic LIL to independent Bernoulli (1/m) trials shows that the second term in (i) has an a.s. lim sup of √((1/m)(1 − 1/m)). Thus (2.8.19) holds with
a_m = 1/(2√m) + √((1/m)(1 − 1/m)). ❑
Proof of Theorem 2. Now U_n/b_n ⇝ ℋ a.s. wrt ‖·‖ on D. Thus by Proposition 2.9.1 we have, for all n ≥ some N_{ε,ω}, that
|U_n(b) − U_n(a)| / b_n ≤ sup_{h∈ℋ} |h(b) − h(a)| + ε
for all 0 ≤ a ≤ b ≤ 1. This establishes (4).
That the lim sups of (5) are no bigger than claimed follows from (4). That
they are at least as big as claimed follows from applying the ordinary LIL for
each of the countable set of pairs a, b having both members rational; recall
that U, is right continuous.
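The ordinary LIL applies to each increment because U_n(b) − U_n(a) is simply a standardized iid sum, (1/√n) Σ_k [1_{(a,b]}(ξ_k) − (b−a)], with variance p(1−p) for p = b − a. A seeded sanity check of that variance (our own illustration, not from the text):

```python
import math, random

def increment_Un(sample, a, b):
    """U_n(b) - U_n(a) = (count of sample points in (a,b] minus
    n*(b-a)) / sqrt(n)."""
    n = len(sample)
    count = sum(1 for x in sample if a < x <= b)
    return (count - n * (b - a)) / math.sqrt(n)

# Monte Carlo check that Var = p(1-p) with p = b - a:
rng = random.Random(1)
a, b, n, reps = 0.2, 0.5, 50, 4000
vals = [increment_Un([rng.random() for _ in range(n)], a, b)
        for _ in range(reps)]
mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
# Here p = 0.3, so the target variance is 0.21.
```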
The proof for V_n is completely analogous. However, a bit of epsilonics is needed for the second result. The key elements are the corresponding result for U_n and the identity (3.1.12). We leave it as an exercise. ❑
Proposition 1. If {𝕂(s, t): s ≥ 0, 0 ≤ t ≤ 1} denotes the Kiefer process and b_n = √(2 log₂ n), then
(7) 𝕂(n, ·)/(√n b_n) ⇝ ℋ a.s. wrt ‖·‖ on D.
(i) If
(2) ∫₀¹ [1/(t(1−t))] exp ( −θ q²(t) log₂ [1/(t(1−t))] / (t(1−t)) ) dt < ∞ for all θ > 0,
then
(3) U_n/(q b_n) ⇝ ℋ_q ≡ {h/q: h ∈ ℋ} a.s. wrt ‖·‖ on D.
(ii) Conversely, if
(4) ∫₀¹ [1/(t(1−t))] exp ( −θ q²(t) log₂ [1/(t(1−t))] / (t(1−t)) ) dt = ∞ for some θ > 0,
then
(5) lim_{n→∞} ‖U_n/(q b_n)‖ = ∞ a.s., and indeed lim_{n→∞} |U_n(ξ_{n:1})|/(q(ξ_{n:1}) b_n) = ∞ a.s.,
while
Our proof of James's theorem (Theorem 1) will show that (8) is sufficient to imply that for all ε > 0 there exists 0 < a < 1/2 for which (9) holds, and that
(10) U_n/(q b_n) ⇝ ℋ_q ≡ {h/q: h ∈ ℋ} a.s. wrt ‖·‖ on D
if and only if
Thus the Borel–Cantelli lemma implies that a.s. we have ξ_n > a_n for all n ≥ some N_ω; and since a_n ↘ 0, this implies that a.s. ξ_{n:1} > a_n for all n ≥ some N_ω [or see Kiefer's theorem (Theorem 10.1.1)]. Thus a.s. we have for n ≥ N′_ω that
(b) ‖U_n/(q b_n)‖₀^{a_n} ≤ √n a_n / (q(a_n) b_n) → 0 as n → ∞.
Our proof of (b) is the only place where we require the full strength of (2). In the remainder of this proof we will assume only (8); in particular, the requirement q(t)/√t ↘ will not be required any further.
RELATIVE COMPACTNESS OF U_n IN ‖·/q‖-METRICS
Now that we have dispensed with the integral condition and just have g(t) → ∞, we want to show that we can replace g by a smaller function g̃ satisfying g̃(t) → ∞, but that → ∞ in a nonerratic fashion. (Replacing g by a smaller smooth g̃ still satisfying the integral condition would be much harder.) Suppose g(t) → ∞ as t ↓ 0. Define g̃ ≤ g via
(e) g̃(t) ≡ g(t) log (1/t) / log₂ (1/t),
and set ã_n ≡ M/(n g̃²(1/n) b_n²) for some fixed large M. Then 1/ã_n ≤ n log₂ n. It is thus possible to replace f by a function f̃ ≤ f that ↗ to ∞ so slowly that g̃(t) ≡ f̃(1/t), which ↗ ∞ as t ↓ 0, satisfies (g) as n → ∞.
We now define the decomposition
(k) [0, a_n] = [0, 1/n] ∪ [1/n, (log n)/n] ∪ [(log n)/n, (log n)²/n] ∪ [(log n)²/n, a_n],
the final interval being present only when a_n > (log n)²/n.
Let
(n) A_n ≡ [ ‖U_n/(q̃ b_n)‖_{ã_n}^{a_n} ≥ ε ].
To establish that
(q) ∪_{m=n_k}^{n_{k+1}} A_m ⊂ D_k ≡ [ max_{n_k ≤ m ≤ n_{k+1}} ‖U_m/q̃‖_{c_{n_{k+1}}}^{a_{n_k}} ≥ ε b_{n_k} ],
we use the threshold (12), which involves ε b_{n_{k+1}} q̃(c_{n_{k+1}}) together with n_{k+1} c_{n_{k+1}} and n_k c_{n_{k+1}}. The maximal inequality then gives
(13) P(D_k) ≤ M₆ [log (1/c_{n_{k+1}})] / (log n_k)^{(1−ε)⁶ ε² ψ²(d_{n_k})}.
(w) γ_k φ²(1/n_k) ≥ (any large M we choose) for all k ≥ some k_M, since
(x) e_k ≡ γ_k φ²(1/n_k)
(y)  = φ²(1/n_k) ψ(f̃_k/M)   by (i).
We now specify M = M_ε to be so large that when k ≥ k_{M,ε} ≡ k_ε, the exponent in (v) exceeds 3. Thus
(dd) P(D_k) ≤ (some M_ε)/k² for k ≥ k_ε,
and thus (p) holds. Hence, from (o), since ε > 0 is arbitrary,
(ee) lim_{n→∞} ‖U_n/(q̃ b_n)‖_{ã_n}^{a_n} = 0.
We next consider the interval [1/n, (log n)/n]. In this case (13) becomes
(ff) P(D_k) ≤ M (log n_k)/(log n_k)^{(1−θ) e_k},   where
e_k ≡ γ_k φ²((log n_{k+1})/n_{k+1}),
and using (g) and (h) one checks that
(gg) e_k → ∞ as k → ∞.
We next consider the interval [(log n)/n, (log n)²/n]. In this case (13) becomes
(jj) P(D_k) ≤ M (log n_k)/(log n_k)^{(1−θ) e_k},
where
(kk) e_k ≡ γ_k φ²( (log n_{k+1})²/n_{k+1} ),
with γ_k built from b_{n_k} and φ((log n_{k+1})²/n_{k+1})/(log n_{k+1}), and
(oo) ψ(f̃_k) → ψ(0) = 1
since f̃_k → 0. As before this yields
(mm) lim (the corresponding supremum) = 0 a.s.
We summarize what we have accomplished so far: for any ε > 0, there exists 0 < a < 1/2 such that (9) holds. Also
(ss) ‖1/q̃‖₀^{a} = 1/q̃(a) ≤ ε if a is chosen sufficiently small.
Thus
(tt) lim sup_n ‖(U_n/b_n − h)/q̃‖₀^{a} ≤ 2ε a.s.
Coupling (tt) with
(uu) ‖(U_n/b_n − h)/q‖ₐ^{1/2} ≤ [1/q(a)] ‖U_n/b_n − h‖ₐ^{1/2},
Finkelstein's theorem (Theorem 13.3.1) shows that for any h ∈ ℋ, any ω in the probability space, and any subsequence n′ we have
(vv) ‖(U_{n′}/b_{n′} − h)/q‖ → 0 if and only if ‖U_{n′}/b_{n′} − h‖ → 0.
(c) Σ_{n=0}^∞ (Constant)/[(n+1) f(a_{n+1})] = ∞.
But log₂ (1/a_n) ≅ log₂ n, so that Σ_n d_n = ∞. Then P(ξ_{n+1} < d_n i.o.) ≥ P(ξ_n < d_n i.o.), and P(ξ_n < d_n i.o.) = 1 by Borel–Cantelli. Also, Σ_n a_n < ∞ implies by Borel–Cantelli that a.s. ξ_n > a_n is true for all n ≥ some N_ω; and since a_n ↘ 0, this implies that a.s. ξ_{n:1} > a_n for all n ≥ some N_ω. We have thus shown that [or see Kiefer's theorem (Theorem 10.1.1)]
Now on the a.s. event of (e) we have, for n ≥ some N′_ω, the claimed behavior i.o. (note that n d_n → 0 as n → ∞). Since K > 0 is arbitrary, we have thus shown that the bound is
= (a summable series)
provided a is chosen small enough for the exponent in (c) to exceed 2. Thus, as in the proof of Theorem 1, we obtain (e) provided a = a_ε was chosen small enough. Combining (a), (d), and (e) gives
(f) lim sup_n ‖((U_n/b_n) − h)/q‖₀^{a} ≤ ε a.s., uniformly for h ∈ ℋ,
and
(h) sup_{h∈ℋ} ‖h/√(I(1−I))‖ = 1.
Thus we cannot have U_n/(√(I(1−I)) b_n) ⇝ ℋ_{√(I(1−I))} a.s. wrt ‖·‖ on D; but if lim_{t↓0} φ(t) < ∞, this ⇝ is required. Hence lim_{t↓0} φ(t) = ∞. ❑
Exercise 1. (James) If f is positive and ↗ on [0, θ] where ∫₀^θ [t f(t)]⁻¹ dt < ∞, then there exists M > 0, a₀ > 0, and n₀ such that for any 0 < a ≤ a₀ and any n ≥ n₀ we have
Open Question 1. Formulate and prove the analogous results for V_n.
The successive maxima of partial sums satisfy the "other LIL" given in (1.1.9). Similar behavior is exhibited by ‖U_n‖; this is the content of the following theorem of Mogulskii (1980).
Recall that ‖U_n‖ ≡ sup_{0≤t≤1} |U_n(t)|.
THE OTHER LIL FOR ‖U_n‖
The key ingredients in the proof of this theorem are the exponential bound
Note that
where
(f) Σ_{k=1}^∞ P(D_k) < ∞.
We define
(i) f(x) ≡ a^{x^{1−θ}}, which has f′(x) = f(x)(1−θ)(log a)/x^θ.
528 LAWS OF THE ITERATED LOGARITHM
(j) ≤ 2 P( ‖U_{n_k}‖ ≤ (1+ε)λ_{n_k} ) for all large k.
Since the second term on the rhs of (1) is clearly convergent, (f) follows from
P( ‖U_{n_k}‖ ≤ (1−ε) π/√(8 log₂ n_k) )
 ≤ (Constant)(log k)/(log n_k)^{1/(1−ε)²}
 ≤ (Constant)/k^{1/(1−ε)²} for all large k,
where
(o) n_k ≡ k^k,
and we define
for S_n as in (e). The R_k's are independent, and for all large k
P(R_k ≥ λ_{n_k}) ≤ 4√(2 log₂ n_k) exp ( −(4 log₂ n_k)/(1+ε) ) + (a convergent term)   by (2.2.12)
 ≤ (Constant)(log k)/(k log k)^{4/(1+ε)} + (a convergent term).
Hence
(r) lim_{k→∞} b_{n_k} R_k = 0,
and, since ‖U_{n_k}‖ ≥ ‖S_{n_k}‖/√n_k − R_k,
(s) lim_k b_{n_k} ‖U_{n_k}‖ ≥ lim_k b_{n_k} ‖S_{n_k}‖/√n_k − lim_k b_{n_k} R_k − lim_k b_{n_k} ‖S_{n_{k−1}}‖ √(n_{k−1})/n_k
(t) ≥ π/√8,
using Smirnov's theorem (Theorem 13.1.1) on the first lim in (s) and using (o) on the second. We note that (t) implies (n). ❑
(3) P(‖U_n‖ ≤ λ_n i.o.) = 1 or 0 according as Σ_{n=1}^∞ (1/n) exp ( −π²/(8λ_n²) ) = ∞ or < ∞.
Note that λ_n = (1±ε)π/√(8 log₂ n) are the "crude" dividing sequences.
(4) lim_n b_n ∫₀¹ |U_n(t)| dt = 1/4 a.s.
Other interesting references are Jain et al. (1975) and Csáki (1978).
6. EXTENSION TO GENERAL F
Using Theorem 1.1.2, it is clear that for a sample X₁, X₂, ... from F we have the following results.
Smirnov's theorem becomes
The extensions of the other results to arbitrary F are also easy, but we do
not know an extension of Theorem 13.5.1.
Oscillations of the
Empirical Process
0. INTRODUCTION
In Section 2 we consider the analog of this result for the empirical process U_n. For sequences a_n satisfying the conditions (4) and (5) below, we show that
(6) lim_{n→∞} ω_n(a_n)/√(2 a_n log (1/a_n)) = 1 a.s.
OSCILLATIONS OF THE EMPIRICAL PROCESS
for the modulus of continuity ω_n of U_n. The condition (4) is such that
the a.s. limiting behavior of ω_n(a_n)/[2a_n log (1/a_n)]^{1/2} on the boundary sequences a_n of (7) and (8) with c_n = c is also obtained. Our proofs are based on the inequality that, for 0 < a ≤ δ ≤ 1/2,
(9) P( ω_n(a) ≥ λ√a ) ≤ [3/(aδ³)] exp ( −(1−δ)(λ²/2) ψ(λ/√(na)) ) for all λ > 0,
for the ψ function of Inequality 11.1.1; we can improve the ψ term to 1 in the case of ω̃_n.
All of these results apply also to the Lipschitz-1/2 modulus ω̃, for which
(11) lim_{a↓0} ω̃(a)/√(2 log (1/a)) = 1 a.s.
and
(12) lim_{n→∞} ω̃_n(a_n)/√(2 log (1/a_n)) = 1 a.s. provided the a_n satisfy (3)-(5).
Moreover, the results for ω_n(a_n)/√(a_n) and ω̃_n(a_n) on the boundary sequences of (7) and (8) with c_n = c are identical.
The theorems of Section 14.2 are proven via classic methods involving
exponential bounds. There are two alternative approaches of importance to
these same theorems. These are based on the Hungarian construction and
Poisson embedding.
In light of the Hungarian construction of Chapter 12 we might expect that (6) could be established via consideration of the sequence of Brownian bridges B_n ≡ 𝕂(n, ·)/√n. The analogs of (2) for B_n are proven in Section 14.3. In Section 14.4 we apply these results to U_n via the key result of the Hungarian
THE OSCILLATION MODULI ω, ω̄, AND ω̃ OF U_n AND S
construction that for a version of U_n we have the coupling bound (13).
It is suspected, but has never been proven, that log n may replace log₂ n in
(13). The improved version of (13) would allow us to prove (6) and treat the
case (7) with c„ = c via these methods [the case (8) with c„ = c cannot be
handled by these methods since "Poissonization" sets in]; as it is, (13) only
allows us to treat the case (7) with c„ = c and to prove (6) under a condition
far more restrictive than (5). All this is carefully discussed in Section 14.4.
Many results for empirical processes have been proven via some form of
Poisson embedding. This approach is taken up in Sections 14.5 and 14.6, the
necessary inequalities having been developed in Section 11.9. We find that the
proofs based on this approach are very similar to earlier ones; however, the
inequalities developed earlier apply directly to U_n, whereas this Poisson approach uses the analogous inequalities for the Poisson bridge. The key idea is that for a two-dimensional Poisson process ℕ, the process in (14) has the same probabilistic behavior as does U₁, U₂, .... Our inequalities from Section 5 are for the process 𝕄 that replaces the denominator ℕ(s, 1) of (14) by s; and results can then extend to ℕ via the SLLN result that ℕ(s, 1)/s → 1 a.s. as s → ∞. We illustrate this on Chibisov's theorem (Theorem 11.5.1) in Section 14.5.
We feel that all three methods described above are of such importance as
to deserve the careful treatment we have given them.
Let X denote a process on (D, 𝒟). Recall that C denotes an interval (s, t] in [0, 1], that |C| = t − s, and that X(C) = X(t) − X(s) denotes the increment. The modulus of continuity ω_X of the X process is defined by (1)-(2).
Since the big intervals typically control the behavior of this modulus, we also define
(3) ω̄_X(a) ≡ sup_{a≤|C|≤1} |X(C)|/√|C|,
(4) ω ≡ ω_𝕌, ω̄ ≡ ω̄_𝕌, and ω̃ ≡ ω̃_𝕌,
where
(5) 𝕌 ≡ S − I S(1).
Theorem 1. (Lévy)
(6) lim_{a↓0} ω(a)/√(2a log (1/a)) = 1 a.s.
and
(8) lim_{a↓0} sup_{|C|=a} ±S(C)/√(2a log (1/a)) ≥ 1 a.s.
and
(10) lim_{a↓0} sup_{|C|=a, C⊂[c₀,d₀]} ±S(C)/√(2a log (1/a)) ≥ 1 a.s. for each 0 ≤ c₀ < d₀ ≤ 1,
and
Theorem 2.
(17) lim_{a↓0} ω̃(a)/√(2 log (1/a)) = 1 a.s.
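Lévy's modulus can be watched on a simulated path. The sketch below is our own illustration (grid sizes are arbitrary choices): it discretizes Brownian motion and compares ω(a) with √(2a log (1/a)). At a fixed small a the ratio is only near 1, since (6) is an a ↓ 0 limit.

```python
import math, random

def brownian_path(n, rng):
    """Standard Brownian motion sampled at i/n, i = 0..n."""
    w, path = 0.0, [0.0]
    for _ in range(n):
        w += rng.gauss(0.0, 1.0) / math.sqrt(n)
        path.append(w)
    return path

def modulus(path, window):
    """omega(a): sup over grid intervals of length <= a = window/n of
    the absolute increment |S(t) - S(s)|."""
    n = len(path) - 1
    return max(abs(path[i + k] - path[i])
               for k in range(1, window + 1)
               for i in range(n - k + 1))

rng = random.Random(7)
n = 4096
a = 1.0 / 64.0
w = modulus(brownian_path(n, rng), int(a * n))
ratio = w / math.sqrt(2 * a * math.log(1.0 / a))
```

The discretization both misses within-interval oscillation and stops short of the a ↓ 0 regime, so the ratio hovers near (but not exactly at) 1.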
Inequality 1. Let 0 < a < 1 be given. Then for any 0 < δ < 1 we have
(18) P( ω̃_S(a) ≥ λ√a ) ≤ [64/(aδ³)] exp ( −(1−δ)λ²/2 ) for all λ > 0.
Proof of Inequality 1. (This seems to be a new and simpler proof of this type of inequality.) Let M be the smallest integer satisfying
(a) 1/M ≤ aδ²/4, so that 1/(M−1) > aδ²/4.
Then partition [0, 1] into M subintervals of length 1/M. Let (s, t] have length
at most a (Figure 1).
Let a_k ↓ 0 and set
(c) λ_k ≡ (1+ε)√(2 log (1/a_k)).
Then for sufficiently large k we have from Inequality 1 that
P(A_k) ≤ [64/(a_k δ³)] exp { −(1−δ)(1+ε)² log (1/a_k) },
which is summable for δ small enough, so that
(e) lim_{k→∞} ω̃(a_k)/√(2 log (1/a_k)) ≤ 1+ε a.s.
Applying (e) to (f) gives (a). We restate (a) as: For almost every sample point ω there exists an a₀(ε, ω) for which
(g) |S(C)| ≤ (1+ε)√(2|C| log (1/|C|)) whenever |C| ≤ a₀(ε, ω).
P(A_n) = P( max_{1≤i≤n} ±S((i−1)/n, i/n] ≥ (1−θ) q(1/n) )
 = 1 − [1 − P( N(0, 1) > (1−θ)√(2 log n) )]ⁿ   by independence
 ≥ 1 − exp (−n^θ)   since 1 − x ≤ exp (−x).
Hence
(m) sup_{|C|=a} ±S(C)/q(a) ≥ [ max_i ±S((i−1)/n, i/n]/q(1/n) ] q(1/n)/q(a) − (a remainder term).
Since q(1/n)/q(a) → 1, the lim inf of the first term on the rhs of (m) is ≥ 1 by (l). Since q(1/n(n+1))/q(a) → 0, the lim sup of the second term on the rhs of (m) is 0 by (7). Thus (8) holds.
The extension from (8) to (10) is easy. Just note that if the exponent n in (i) is replaced by ⟨(d₀ − c₀)n⟩, the series in (j) is still convergent. ❑
Figure 2.
It thus suffices to establish (20) for E ω̃_S(a). Now with K ≡ ⟨1/a⟩ + 1, we have
(b) ω̃_S(a) ≤ 3 max_{0≤k≤K−1} sup_{k/K ≤ t ≤ (k+1)/K} |S(k/K, t]| √K ≡ 3 max_{0≤k≤K−1} M_k,
so that
E ω̃_S(a) ≤ 3 ∫₀^∞ [1 − P( max_k M_k < λ )] dλ = 3 ∫₀^∞ { 1 − [1 − P(‖S‖ √K ≥ λ)]^K } dλ
(d) ≤ 3K ∫_{ν(a)}^∞ P( ‖S‖ ≥ λ√K ) dλ + 3 ∫₀^{ν(a)} dλ,
where ν(a) ≡ q(a). Thus
(f) E ω̃_S(a) ≤ 3 q(a) + 3K ∫_{q(a)}^∞ 4 P( N(0, 1) > λ√K ) dλ
(g) ≤ 3 q(a) + ∫_{q(a)}^∞ (12√K/λ)(2π)^{−1/2} exp (−Kλ²/2) dλ   by Mill's ratio A.4.1
 = 3 q(a) + [6√K/√(2π)] ∫_{(log 1/a)Ka}^∞ y^{−1} e^{−y} dy,   letting y = Kλ²/2,
(h) ≤ 3 q(a) + [6√K/√(2π)] e^{−(log 1/a)Ka}/((log 1/a)Ka) ≤ 3 q(a) + 3.5√a ≤ 6 q(a)
Theorem 4
(a) lim sup_{a↓0} sup_{t−s≤a} |f(t) − f(s)|/(√(t−s) h(a)) ≥ M, with h(a) ≡ √(2 log (1/a)).
t
Consider the three terms enclosed in curly braces in (b); the first is bounded, the second converges to ∞, and hence the third converges to 0. We now redefine a_k (and go to a subsequence if need be, to replace → 0 by ↓ 0) so that
(d) lim sup_{a↓0} sup_{t−s≤a} |f(t) − f(s)|/(√a h(a)) ≥ lim_{k→∞} |f(t_k) − f(s_k)|/(√(t_k − s_k) h(t_k − s_k)) = M.
Thus
(e) lim_{a↓0} ω_f(a)/√(2a log (1/a)) ≤ lim_{a↓0} ω̃_f(a)/√(2 log (1/a)) for any bounded f.
Suppose
(f) |f(t) − f(s)|/(√a h(a)) ≥ M for some 0 ≤ t − s ≤ a; then |f(t) − f(s)|/(√(t−s) h(t−s)) ≥ M.
Thus
(g) lim sup_{a↓0} ω̃_f(a)/h(a) ≤ lim sup_{a↓0} ω_f(a)/(√a h(a)),
so that
(h) lim_{a↓0} ω̃_f(a)/√(2 log (1/a)) ≤ lim_{a↓0} ω_f(a)/√(2a log (1/a)) for any f.
Combining (e) and (h) with Lévy's theorem (Theorem 1) gives (21). This proof
is due to R. Pyke. Shorack's (1982) original proof was based on the inequality
of Exercise 1 below. ❑
for continuous f
(24) 0<_a(1—S)b<b^5<_1
Our primary concern will be with the modulus of continuity ω_n ≡ ω_{U_n} of the empirical process U_n. In analogy with Lévy's theorem (Theorem 14.1.1), we would like to show that
lim_{n→∞} ω_n(a_n)/√(2 a_n log (1/a_n)) = 1 a.s. for a_n ↘ 0.
However, the rate at which a_n ↘ 0 is sensitive; it can affect both the value of the a.s. limit and the form of the appropriate normalizing constant.
We will assume that a n satisfies the regularity conditions
and (5). Condition (4) requires that a_n not be too small; if
(6) a_n = (c_n log n)/n,
then (4) holds if c_n → ∞ and (4) fails if c_n = c ∈ (0, ∞). Condition (3) requires that a_n not be too big; if
(7) a_n = 1/(log n)^{c_n},
then (3) holds if c_n → ∞ and (3) fails if c_n = c ∈ (0, ∞). Roughly speaking then,
THE OSCILLATION MODULI OF U_n
We now state two theorems that deal with the boundary cases a_n = (c log n)/n and a_n = 1/(log n)^c, respectively.
For the case a_n = (c log n)/n we will have need of the functions β_c⁺ and β_c⁻ defined earlier. Recall that for c > 0 we let β_c⁺ > 1 and β_c⁻ < 1 denote the solutions of (9), and that
(10) β_c⁺ ↓ from ∞ to 1 as c ↑ from 0 to ∞, while β_c⁻ ↑ from 0 to 1 as c ↑ from 1 to ∞.
Then we have
(12) lim_{n→∞} ω_n(a_n)/√(2a_n log (1/a_n)) = √(c/2) (1 − β_c⁻) a.s.
The case a_n = 1/(log n)^c is one that will arise in connection with the Kiefer process in Section 14.3. [Theorems 1 and 2 above are phrased in the spirit of conclusion (14.3.6) of Theorem 3.1 below, while the upcoming Theorem 3 is phrased in terms of conclusion (14.3.7).]
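The constants β_c⁺ can be computed numerically. Assuming that (9) defines β_c⁺ > 1 as the root of c·h(β) = 1 with h(x) = x(log x − 1) + 1, as in the Poisson-tail bounds elsewhere in the book (this reading of the definition is our assumption), a bisection solver exhibits the monotonicity in (10):

```python
import math

def h(x):
    """h(x) = x(log x - 1) + 1; h(1) = 0 and h is increasing for x > 1.
    (The role of h here is an assumption about the earlier definition.)"""
    return x * (math.log(x) - 1.0) + 1.0

def beta_plus(c, tol=1e-12):
    """Solve c * h(beta) = 1 for the root beta > 1 by bisection."""
    lo, hi = 1.0, 2.0
    while c * h(hi) < 1.0:   # grow the bracket until it straddles the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c * h(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For c = 1 the root is β₁⁺ = e, since h(e) = 1; and β_c⁺ decreases toward 1 as c increases, matching (10).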
Then we have
(14′) 1 = lim inf_{n→∞} ω_n(a_n)/√(2a_n log (1/a_n)) ≤ lim sup_{n→∞} ω_n(a_n)/√(2a_n log (1/a_n)) = √((1+c)/c) a.s.
This is the format used in Theorems 1 and 2; we used the format (14) for Theorem 3 since the rhs of (14) with c = 0 is stronger than (14′) with c = 0.
Wn(an)
+C] a.s. wrt I I?
2a„ log Z n I
An examination of our proofs of Theorems 1-3 will show that they, in fact,
establish even more than is claimed. The additional results are easily sum-
marized.
for any fixed 0 ≤ c₀ < d₀ ≤ 1. For c₀ = 0 and d₀ = 1 we denote this quantity by ω_n*(a_n), and we obtain the special case that
(16) ω_n*(a_n) ≡ sup_{|C|=a_n} ±U_n(C) may replace ω_n(a_n) in Theorems 1-3.
Also,
(17) ω̄_n(a_n)/√(2 log (1/a_n)) may replace ω_n(a_n)/√(2a_n log (1/a_n)) in Theorems 1-3,
where
Remark 2. Suppose
Then we have
lim sup_{n→∞} ω_n(a_n) √n log (1/c_n)/log n ≤ 2 a.s.
Inequality 1. (Mason, Shorack, Wellner) Let 0 < a ≤ δ ≤ 1/2. Then for all λ > 0
Moreover,
The next inequality follows along the lines of Stute (1982), with care taken
to extend the generality.
Let r > 0 be given. For n_k ≡ ⟨(1+θ)^k⟩ we can choose θ = θ_{ε,d} so small that for k ≥ some k_{ε,d} we have
(22) P( max_{n_{k−1} ≤ m ≤ n_k} ω_m((1+ε)² a_m)/A_m ≥ r+2ε ) ≤ 2 P( ω_{n_k}((1+ε)² a_{n_k}) (1+θ)/A_{n_k} ≥ r+ε ).
[Stute then got a bound similar to, but weaker than, (20) by using moment
generating function bounds on the rhs of (23).] (Bolthausen, 1977b also used
a version of this inequality.)
Exercise 2. Attempt to prove a version of Inequality 2 using Proposition 1
below.
Proposition 1. Let 𝒢_n ≡ σ[U_m(t): 0 ≤ t ≤ 1 and 1 ≤ m ≤ n]. Then (√n ω_n(a), 𝒢_n), n ≥ 1, is a submartingale for any fixed a > 0.
Proof. Now for any fixed set C we note that (√n U_n(C), 𝒢_n), n ≥ 1, is a martingale, since it is just an iid sum. Thus
(a) A_n ≡ [ ω_n(a) ≥ λ√a ].
Considering the grid points k/M and (k+1)/M leads to
P(A_n) ≤ M exp ( −[(λ√a − √n/M)² (1−a−1/M) / (2(a+1/M))] ψ( (λ√a − √n/M)(1−a−1/M)/[(a+1/M)√n] ) )
(g) ≤ M exp ( −[λ²a(1−a−1/M)/(2(a+1/M))] (1 − √n/(Mλ√a))² ψ( λ/√(na) ) ),   since ψ is ↘.
We now require M to be the smallest integer for which
(j) [a(1−a−1/M)/(a+1/M)] (1−δ) ≥ (1−δ)² and 1 − √n/(Mλ√a) ≥ 1−δ,
provided λ ≥ δ²√(na). Now (l) is (20) for λ ≥ δ²√(na), and combined with (k) it establishes (19) in case λ ≥ δ²√(na).
Before turning to the proof of (19) and (20) in case λ < δ²√(na), we will improve on (l) for later use. Now using Shorack's inequality (Inequality 11.1.2),
(24) P( ω_n(a) ≥ λ√a ) ≤ [3/(aδ³)] exp ( −(1−δ)⁴ (λ²/2) ψ( λ(1−δ)²/√(na) ) )   for λ ≥ δ²√(na),
provided 0 < a ≤ δ ≤ 1/2.
We now turn to the proof of (19) and (20) in case λ < δ²√(na). We now note that for 0 ≤ t − s ≤ a we also have
(m) |U_n(s, t]| ≤ max_j |U_n(j/M, (j+1)/M]| + 2 max_j sup_{0≤r≤1/M} |U_n(j/M, j/M + r]|;
the factor 2 on the second term is needed in case no j/M point lies between s and t (Figure 2). Let M now satisfy
(n) 1/M ≤ aδ²/4, which entails M ≤ 4/(aδ²) + 1 ≤ 5/(aδ²).
Then
P(A_n) ≤ Σ_{j=1}^M P( |U_n(j/M, (j+1)/M]| ≥ λ√a (1−δ)/(1+δ) )
 + Σ_{j=1}^M P( ‖U_n(j/M, j/M + ·]‖₀^{1/M} ≥ λ√a δ/(2(1+δ)) )
(o) ≡ b + d.
using (n), and then ψ(t) ≥ 1 − δ for t ≤ δ² by (11.1.11). James's inequality (Inequality 11.1.2) also gives [note also (n) and (11.1.12)]
(s) λ < δ²√(na) ≤ δ√(5n/M)   since M ≤ 5/(aδ²) by (n).
Suppose
Now define
We will now show that we can choose K = K_{ε,d} so large that for all n_{k−1} ≤ m ≤ n_k,
≤ 10 exp ( −G log (1/(160 a_m)) )   [see (g)],
where
(g) K = K_{ε,d} ≡ 2, B = B_{ε,d} ≡ 4D, D = D_{ε,d} ≡ e exp ( 64 G d/ε² ).
≥ λ_m [1/(1+θ)] [ (r+2ε) − √((1+θ)² − 1) K√θ ]   (as k → ∞)
(i) ≥ λ_m (r+ε)   by examining Kθ in (g)
(j) ≥ λ_n (r+ε)   since λ_m ↗.
Thus
(k) (A_m ∩ B_m) ⊂ D_n ≡ [ sup_{(s,t]} U_n(s, t] ≥ (r+ε) λ_n ].
Hence
Σ_{m=n′}^{n″} P( (A_m ∩ B_m) \ ∪_{k=n′}^{m−1} A_k ) ≥ Σ_{m=n′}^{n″} P( (A_m \ ∪_{k=n′}^{m−1} A_k) ∩ B_m )
(l) = Σ_{m=n′}^{n″} P( A_m \ ∪_{k=n′}^{m−1} A_k ) P(B_m)   by independence
 ≥ [ inf_{n′≤m≤n″} P(B_m) ] P( ∪_{m=n′}^{n″} A_m )
(m) ≥ (1/2) P( ∪_{m=n′}^{n″} A_m ),
as claimed. ❑
Proof of Theorems 1, 2, and 3 upper bounds. This proof is from Mason et al. (1983). Let ε > 0 be given. Define
(a) A_m ≡ [ ω_m(a_m) ≥ (r+2ε) √(2 a_m log (1/a_m)) ].
It suffices to show that
(b) Σ_{k=1}^∞ P(D_k) < ∞,
where
(c) D_k ≡ [ ω_{n_{k+1}}((1+d)² a_{n_k}) ≥ [(r+ε)/(1+d)] √(2 (1+d)² a_{n_k} log (1/a_{n_k})) ],
with n_k ≡ ⟨(1+d)^k⟩. Applying (19) gives
(d) P(D_k) ≤ [3/(a_{n_k} δ³)] (1+d)² exp ( −(1−δ)⁴ γ_k [(r+ε)²/(1+d)²] log (1/a_{n_k}) )
(e) ≤ C a_{n_{k−1}}^{(1−δ)⁴ (r+ε)² γ_k/(1+d)² − 1}   using (2),
where
(g) γ_k → 1 as k → ∞,
since the argument of ψ converges to 0. Thus from (e) we have, for δ = δ_ε sufficiently small and K = K_{ε,δ} sufficiently large, that (h) holds. The worst situation for the convergence of the rhs of (h) is when the a_m's are as large as possible; but from (3) we know that for k sufficiently large
a_{n_k} ≤ 1/(log n_k)^c for any constant c ∈ (0, ∞),
where
(i) γ_k ≥ γ⁺ ≡ ψ( √2 (r+ε)/√c ) for k sufficiently large.
The series on the rhs of (k) will be convergent for any small ε > 0 and sufficiently small choice of δ = δ_ε provided r is at least as large as the solution R of the equation [note the exponent of (k)]
1 = R² γ⁺ = R² ψ( √2 R/√c )
(m) = c h( 1 + √(2/c) R ),
so that 1 + √(2/c) R = β_c⁺,
or
(n) R = √(c/2) (β_c⁺ − 1).
Thus
(o) lim sup_{n→∞} ω_n(a_n)/√(2a_n log (1/a_n)) ≤ √(c/2) (β_c⁺ − 1) a.s.
under Theorem 2 hypotheses.
We now turn to the upper bound in (12). We note that (24) applies since the correspondence
λ ↔ [(r+ε)/(1+d)] √(2 log (1/a_{n_k})) and na ↔ n_{k+1}(1+d)² a_{n_k}
satisfies the requirement λ ≥ δ²√(na) of (25) for all sufficiently small δ, since [(r+ε)/(1+d)^{5/2}] √(2/c) ≥ δ². Thus applying (24) to the D_k of (c) gives
P(D_k) ≤ [3/(a_{n_k} δ³)] (1+d)² exp ( −(1−δ)⁴ γ_k [(r+ε)²/(1+d)²] log (1/a_{n_{k−1}}) )
(q) = C a_{n_{k−1}}^{(1−δ)⁴ (r+ε)² γ_k/(1+d)² − 1},
where
γ_k ≡ ψ( [(1−δ)²(r+ε)/(1+d)²] √(2 log (1/a_{n_{k−1}})) / √(n_{k+1} a_{n_k}) )
(r) ≥ γ⁻ ≡ ψ( −(r+ε) √(2/c) ) for d and δ sufficiently small.
Thus the series in (q) will be convergent for any small ε > 0 and sufficiently small choice of δ = δ_ε provided r is at least as large as the solution R of the equation [note the exponent of (q)]
1 = R² γ⁻ = c h( 1 − √(2/c) R ),
so that R = √(c/2) (1 − β_c⁻), which gives the upper bound in (12). Thus
(u) P(D_{k+1}) ≤ C { [k log (1+d)]^c }^{−(exponent as above)}.
The series on the rhs of (u) will be convergent for any small ε > 0 and sufficiently small choice of δ = δ_ε provided r is at least as large as the solution R of
(w) c(R² − 1) = 1.
Thus
(x) R = √((1+c)/c),
(y) lim sup_{n→∞} ω_n(a_n)/√(2a_n log (1/a_n)) ≤ √((1+c)/c) a.s. under Theorem 3 hypotheses.
[The trivial modifications needed in definition (a) for the case c = 0 are easily made.]
Comment: We cannot proceed along these same lines in a proof of Remark 2, since in that case the hypotheses of our maximal inequality fail. The upper bound of Remark 2 is the cheap one that results when the whole sequence Σ₁^∞ P(A_m) [not the subsequence Σ₁^∞ P(D_k)] is made convergent by choosing λ large enough in the exponential bound. ❑
Proof of the Theorem 1 (and its Theorem 4 version) lower bound. This particular proof is here for a purpose. It is based on the conditional Poisson representation of U_n. It is a crude proof in that the factor 3√n enters below at line (f). Because of this crudeness, it is not possible to establish the lower bound in Theorem 2 by this method; it will be established in Section 6 via consideration of the Poisson bridge. This proof is presented here so that the shortcomings of the conditional Poisson representation are made explicit.
Let
(a) M = M_n ≡ ⟨1/a_n⟩
and define
(b) B_n ≡ [ max_{1≤i≤M} ±ℕ((i−1)a_n, i a_n]/√(n a_n) ≥ (1−ε) r √(2 log (1/a_n)) ].
As in (8.4.2), we use the conditional construction from a Poisson process ℕ. Thus
(25) P(E) = P(E | ℕ(n) = n) ≤ P(E′)/P(ℕ(n) = n)
(e) = P( max_i ±[ℕ(i n a_n) − ℕ((i−1) n a_n) − n a_n]/√(n a_n) ≥ (1−ε) r √(2 log (1/a_n)) ) / P(ℕ(n) = n)
 ≤ 3√n × (the same probability),   setting r = 1,
 ≤ 3√n exp ( −(log n)^{c_n′} ) with c_n′ → ∞ by (3)
(h) = (a term of a convergent series).
(i) lim_{n→∞} max_{1≤i≤M} ±U_n((i−1)a_n, i a_n]/√(2a_n log (1/a_n)) ≥ 1 a.s., with M = ⟨1/a_n⟩.
This completes the proof. Note that our proof can be trivially modified to show that under Theorem 1 hypotheses
(27) lim_{n→∞} sup_{|C|=a_n, C⊂[c₀,d₀]} ±U_n(C)/√(2a_n log (1/a_n)) ≥ 1 a.s. for each 0 ≤ c₀ < d₀ ≤ 1.
^ (n^ ) =v foralln1.
Let us define c_n by
(4) c_n ≡ log (1/a_n)/log₂ n, or equivalently a_n = (log n)^{−c_n}.
MODULUS OF CONTINUITY FOR THE KIEFER PROCESS
The following theorem is due to Chan (1977) for c = ∞. The lim sup result is due to Csörgő and Révész (1978b). The lim inf result seems to be due to Mason et al. (1983), though the same conclusion for S(n·)/√n had been established by Book and Shore (1978). Note that for c ∈ (0, ∞), the statements (6) and (7) agree.
(6) 1 = lim inf_{n→∞} ω_n(a_n)/√(2a_n log (1/a_n)) ≤ lim sup_{n→∞} ω_n(a_n)/√(2a_n log (1/a_n)) = √((1+c)/c) a.s.
(7) √c = lim inf_{n→∞} ω_n(a_n)/√(2a_n log₂ n) ≤ lim sup_{n→∞} ω_n(a_n)/√(2a_n log₂ n) = √(1+c) a.s.
(8) ω_n(a_n)/√(2a_n log (1/a_n)) →_p 1 as n → ∞
provided that a_n → 0. That is, the →_p limit agrees with the lim inf in (6) and (7).
From Exercise 2.2.12 we know that (9) holds, where the LIL of Theorem 2.8.3 implies that S(n, 1)/(√n b_n) ⇝ [−1, 1] a.s. wrt |·|, with b_n ≡ √(2 log₂ n). Thus, note (6), the difference between the two normalized moduli in (6) and (7)
→ 0 a.s.
as n → ∞ when 0 ≤ c < ∞.
Our proof will require the exponential bound of Inequality 14.1.1 and the following maximal inequality.
Inequality 1. Let 0 < a < 1 be given. Then for any 0 < δ < 1
(11) P( max_{1≤m≤n} sup_{|C|≤a} |𝕊(m, C)|/√n ≥ λ√a ) ≤ 3 exp ( −(1−δ)³ λ²/2 )
for all λ > 0. [This continues to hold if n is regarded as continuous on (0, ∞).]
Proof. Let 𝒢_n ≡ σ[𝕊(m, t): 1 ≤ m ≤ n, 0 ≤ t ≤ 1]. Then
≤ exp (−rλ²) E exp ( r M_n²/(an) )
(b) = e² [ 1 + ∫_e^∞ [64/(aδ²)] (log x) x^{−(1−δ)²/(2r)} dx ]   by Inequality 14.1.1.
Now for any 0 < δ < 1/2 the quantity in square brackets in (b) does not exceed (64/(aδ³)) provided r = r_δ ≡ (1−δ)³/2. ❑
(b) A_m ≡ [ sup_{|C|≤a_m} |𝕊(m, C)| ≥ (r+ε) √(2 m a_m log (1/a_m)) ],
and
(c) ∪_{m=n_k}^{n_{k+1}} A_m ⊂ D_k ≡ [ max_{n_k≤m≤n_{k+1}} sup_{|C|≤a_{n_k}} |𝕊(m, C)| ≥ (r+ε) √(2 n_k a_{n_{k+1}} log (1/a_{n_k})) ].
By Inequality 1,
(d) P(D_k) ≤ K_B exp ( −(1−θ)³ (r+ε)² (n_k/n_{k+1}) log (1/a_{n_k}) ) ≤ K_B / [k log (1+θ)]^{(1−θ)³(r+ε)²}.
Thus Borel–Cantelli gives P(D_k i.o.) = 0, and hence (c) implies P(A_m i.o.) = 0. That is, since r > 0 is arbitrary,
(h) lim_{n→∞} ω_n(a_n)/√(2 n a_n log (1/a_n)) = lim_{n→∞} sup_{|C|≤a_n} |𝕊(n, C)|/√(2 n a_n log (1/a_n)) ≤ 1 a.s.
when c = ∞.
(i) M = M_n ≡ ⟨1/a_n⟩
and define
(j) B_n ≡ [ max_{1≤i≤M} ±𝕊(n, ((i−1)a_n, i a_n])/√(n a_n) ≥ (1−θ) r √(2 log (1/a_n)) ].
Then
P(B_n^c) = [ P( N(0, 1) < (1−θ) r √(2 log (1/a_n)) ) ]^M
 = [ 1 − P( N(0, 1) > (1−θ) r √(2 log (1/a_n)) ) ]^M
 ≤ [ 1 − exp ( −(1−θ)³ r² log (1/a_n) ) ]^M for all large n, by Mill's ratio A.4.1
 = [ 1 − a_n^{(1−θ)³ r²} ]^M ≤ exp ( −M a_n^{(1−θ)³ r²} ).
Thus, since the lim inf of the supremum of a larger set of values is larger,
Note that (13) still holds if the sup is restricted to C ⊂ [c₀, d₀] for some
0 ≤ c₀ < d₀ ≤ 1.
Thus Corollary 1 holds in case c = ∞. Theorem 1, in the case c = ∞, follows
from (9) through (10). [Since (6) and (7) are different versions of the same
statement in case c ∈ (0, ∞), a proof of (7) will complete the proof of Corollary
1.] ❑
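The argument above leans repeatedly on Mills'-ratio bounds for normal tails (A.4.1). A quick numerical check of the standard sandwich φ(λ)(1/λ − 1/λ³) ≤ P(N(0,1) > λ) ≤ φ(λ)/λ — a generic fact, not the book's exact statement of A.4.1:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def upper_tail(x):
    """P(N(0,1) > x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for lam in [1.0, 2.0, 4.0, 8.0]:
    p = upper_tail(lam)
    assert phi(lam) * (1 / lam - 1 / lam**3) <= p <= phi(lam) / lam
    print(lam, p)
```

The two sides agree to relative order 1/λ², which is what makes the tail exponent (1−θ)³r² in the display above sharp as θ ↓ 0.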
Proof of Corollary 1 for c ∈ [0, ∞). We will consider the lim sup in (7) first.
The proof that
(a) lim sup_n w(a_n)/√(2a_n log₂ n) ≤ √(1+c) a.s.
is similar to the proof of (h) in the proof of the c = ∞ case. Just replace
log(1/a_m) by log₂ m in line (b) of the c = ∞ proof, and obtain at line (f) of
that proof that
(b) P(D_k) ≤ K_θ / ( a_{n_k} [log n_k]^{(1−θ)⁷(r+ε)²} )
(c) ≤ K_θ K / [log n_k]^{(1−θ)⁷(r+ε)² − c}.
If we now let r = √(1+c), then (c) is a convergent series for any ε > 0 provided
θ = θ_ε is chosen small enough. Thus (a) holds.
We must next show that
Our proof will be modeled after the proof of Proposition 2.8.1, which could
be consulted at this time for motivation.
OSCILLATIONS OF THE EMPIRICAL PROCESS
(e) Zk = max
I M "k+t
2 (nk +l — nk la nk +t 1°g 2 nk +1
M = Mnk+,
≥ 1 − [1 − exp(−(1−ε)(1+c) log₂ n_{k+1})]^M
by Mills' ratio A.4.1, for all large k
= 1 − [1 − (log n_{k+1})^{−(1−ε)(1+c)}]^M
≥ 1 − exp(−M (log n_{k+1})^{−(1−ε)(1+c)})
≥ 1 − exp(−(log n_{k+1})^{c−(1−ε)(1+c)}) by (4) and (5)
≥ 1 − exp(−K_θ k) for some K_θ
2
KB 1 K0 ) ]
KB K
>_1— 1 —
-E + 2 (V`
ki = kl- E - 2 kz-zE
(h) nk +l — nk
Zk
nk +l
—F S(nk,
max
1,i M k+t
lank +t) —S(nk,
2nkank +t logt
( i— 1)ank +t)
nk +l
Just replace log(1/a_n) by log₂ n in the definition of B_n at line (j) of the proof
in the c = ∞ case, and obtain at line (k) of that c = ∞ proof that for any
0 < θ < 1 and c ∈ (0, ∞)
P(B_{n_k}) ≤ exp(−(log n_k)^{c(1−(1−θ)²r²)}) ≤ exp(−d′ k^d) for some d, d′ > 0, if r is chosen appropriately,
(l) = (a convergent series, by the integral test).
We also note from Lévy's theorem (Theorem 14.1.1) [or (8)] that
(15) sup_{|C|≤a_n} |W_n(C)|/√(2a_n log₂ n) − sup_{|C|≤a_n} |W_n(C)|/√(2a_n log(1/a_n)) →_p 0 when c ∈ [0, ∞]
for any sequence of Brownian motions W_n.
Combining (m) and (15) gives (14). Minor changes in (m) and (15) give (14)
with |C| = a_n replaced by |C| ≤ a_n. Also, this latter version of (14) implies that
(16) lim sup_n sup_{|C|=a_n} S⁺(n, C)/√(2 n a_n log₂ n) ≥ √(1+c) a.s.
To go from (14) to (16) we must "fill in the gaps." Defining k by n_k ≤ n < n_{k+1},
Figure 1. [Decomposition of the increment over (t, t + a_n] into three terms, using the subsequence n_k and the grid point t + a_{n_k}.]
(o) lim sup_n sup_{0≤t≤1−a_n} [S(n, t + a_n) − S(n, t)]/√(2 n a_n log₂ n)
≥ √(1+c)/(1+θ) − 2√(θ(1+c)/(1+θ)) a.s.;
MODULUS OF CONTINUITY VIA HUNGARIAN CONSTRUCTION
the first term in (o) comes from (14), the second term comes from (a) [note
that the length of the intervals in the supremum is a_n − a_{n_k} and that
(p) 0 ≤ (a_n − a_{n_k})/a_n ≤ 1 − (n_k a_{n_k})/(n a_n) ≤ 1 − n_k/n_{k+1} ≤ θ/(1+θ),
so that (a) can be applied on the subsequence n_k with interval length of order
(θ/(1+θ))a_{n_k}], and the third term in (o) is similarly implied by the proof of
(a). Since θ > 0 is arbitrarily small, (o) implies (16). ❑
Corollary 2. Show that if the c„ of (4) satisfies c" -* c E (0, co), then
B
(17) lim sup "(C) =f a.s.
^ck"„ J2lCI log 2 n
satisfies
(2) lim sup_n (√n/(log n)²) ||U_n − B_n|| ≤ some M < ∞ a.s.
provided a_n ↘, n a_n ↗, and log(1/a_n)/log₂ n → ∞ as n → ∞. Combining these
should tell us something about ω_n(a_n). It does yield Theorem 14.2.3 but only
part of Stute's theorem (Theorem 14.2.1) (see Proposition 1 below). These
results are from Mason et al. (1983).
and
as n → ∞. Then
(6) lim_n ω_n(a_n)/√(2a_n log(1/a_n)) = 1 a.s.
(b) ω_n(a_n)/√(2a_n log(1/a_n)) = sup_{|C|≤a_n} |U_n(C)|/√(2a_n log(1/a_n))
(c) = sup_{|C|≤a_n} |B_n(C)|/√(2a_n log(1/a_n)) + o(1)
fails to satisfy the present (5), but would satisfy the improved (5).
[There is another way to look at this: if (2) is ever improved to the rate (log n)/√n,
then our proof of Stute's theorem (Theorem 14.2.1) implies Theorem 14.3.1
via the improved (2).]
for A >0 in the ”+" case and for 0< A <_ f in the "—" case.
Inequality 2. Let q ↗ and q(t)/√t ↘ for t ≥ 0. Let 0 ≤ a ≤ (1−δ)b < b ≤ δ < 1.
Then
(2) P(||(N − I)^±/q||_a^b ≥ λ) ≤ (6/δ) ∫_a^b t^{−2} exp( −(1−δ)(λ² q²(t))/(2t) γ_± ) dt,
where
(3) γ₋ = 1 and γ₊ = ψ(λ q(a)/a).
a /
Proofs. Now {exp(±r(N(t) − t)): 0 ≤ t ≤ b} are both submartingales. As in
the proof of Inequality 11.1.2
for all ε > 0. [Hint: Set q(t) = √t, b = b_n = (log₂ n)^{−1}, and a = a_n = (log n)^{−1} in
Inequality 2 to handle the interval [a_n, b_n]. Then letting A_n ≡
[ ||(N − I)/√I||_{a_n}^{b_n} ≥ ε log₂ n ], we have for the first arrival time η₁ of that
It is routine to mimic these results completely for the centered Poisson process
(s > 0 is fixed) defined by
for λ > 0 in the "+" case and for 0 < λ ≤ √s in the "−" case.
Inequality 4. Let q ↗ and q(t)/√t ↘ for t ≥ 0. Let 0 ≤ a ≤ (1−δ)b < b ≤ δ < 1.
Then
(7) P(||N̄^±/q||_a^b ≥ λ) ≤ (6/δ) ∫_a^b t^{−2} exp( −(1−δ)(λ² q²(t))/(2t) γ_± ) dt,
where
(8) γ₋ = 1 and γ₊ = ψ(λ q(a)/a).
Proposition 1. Let `ii 2 = o [D_,(t): 0 <_ t <_ 1 and 0 <_ r<_ s]. Then
Inequality 6. Let x > 0. Let sequences 0 < a_s < 1 and λ_s > 0 be given that satisfy
for some 0 < d < ∞. Let ε > 0 be given. For s_k ≡ (1+θ)^k we can choose θ = θ_{ε,d}
so small that for k ≥ some k_{ε,d} we have
and define
We also define
s —r ✓ar J
(d) _ [w,-r(ar) < KV OA r \ a
< 160 p
P(B;) a,
( 1)
(_ !^fOAK',1,
2„
eX 16 s-r JJ
/ z
! exp j - -j--0 log (1/a,)0 (Ka.)) by ( 12 )(iv)
<_ 60
D E2 ( 32GA
(g) K=KE,d= ^ 0 Oe ,a-
D=DE.d-sexpt
,
EJ
4D'
Thus (e) holds.
Now on the event [T = r] n B, we have
ra, sa s
w s ((1+0)a S )>ws (a, ) since a,=—<—=(1+0)a,
r s
1 s(C)
1, -
VD_.(C)j
(h) w.(ar) - sup
s s ^C^^ a , s-r
That is,
^x + 2 e) = 1 P(s<T <
_s) by(b)
2 P \ sup s VA,
)
2
< P(D)
SSrSs
f s
^
P(B r ) dFT (r)
(n) by ( 1 ).
Proof. Our proof of Theorems 14.2.1 - 14.2.3 depended only on the analogs
of Inequalities 5 and 6. ❑
where N(s, 1)/s →_{a.s.} 1 by the SLLN and the result for M_s give the result
for N̄_s. ❑
Proof. Fix s. Let 𝒩_r ≡ σ[N(s, r′): 0 ≤ r′ ≤ r]. Knowing M_s(t) in any neighborhood
of t = 0 gives knowledge of N(s, 1) since N(s, ·) is a pure jump process
plus a linear function with slope proportional to N(s, 1). Also, the conditional
distribution of N(s, t) given 𝒩_r is N(s, r) plus a Binomial rv with N(s, 1) − N(s, r)
trials and probability (t − r)/(1 − r) of success. Thus for 0 ≤ r < t < 1
(a) E(M_s(t) | 𝒩_r) = ({N(s, r) + [N(s, 1) − N(s, r)](t − r)/(1 − r)} − t N(s, 1))/(1 − t)
(b) = (N(s, r) − r N(s, 1))/(1 − r) = M_s(r). ❑
Inequality 7. Consider any fixed s > 0. Let 0 < b < 1 be fixed. Then for all
λ > 0 we have
(18) P(||M_s^±||₀^b ≥ λ) ≤ 2 exp( −(λ²/(2b(1−b))) ψ( λ(1−b)/(√s b(1−b)) ) ).
Proof. Now both of {exp(±r N̄_s(t)/(1 − t)): 0 ≤ t ≤ b} are submartingales by
Proposition A.10.1 for any real r. Thus, for M = M_s,
(a) P(||M^±||₀^b ≥ λ) ≤ inf_{r>0} exp(−rλ) E exp(±r M(b)) by Inequality A.10.1.
Expanding the Poisson moment generating function term by term bounds the
exponent in (a) by
(f) g(r) ≡ −λr + s b(1−b)[ e^{r/√s} − 1 − r/√s ].
Solving g′(r) = 0 for r_min and plugging into (f) gives (18). [Note that this very
same calculation is performed in going from line (c) to line (d) in the proof
of Bennett's inequality (Inequality A.4.3) if we replace the nσ² and b of that
proof by the present b(1−b) and 1/√s.] ❑
Exercise 5. Verify that the bound (f) in the previous proof reduces to (18).
for all λ > 0. On the rhs of (20) the (1−b)² does not matter at all (since we
typically work with b's that are less than the generic ε > 0), and the rest of
the bound is the exact analog of (11.1.25). Not surprisingly, we can prove the
same theorems that (11.1.25) yields. [In fact, the unwanted (1−b) in the
argument of ψ in (18) will typically cancel out.]
EXPONENTIAL INEQUALITIES FOR POISSON PROCESSES
thus q E Q.
(22)
b
where
Proof. Reread the proof of Inequality 11.2.1 in the A_n case verbatim. Note
that at line (i) of that proof the extra factor (1 − b) in the argument of ψ in
(18) even cancels out, so that we obtain exactly the same result here as we
did there. ❑
An Application
(24) >nc
N (x)= {n G,(X/n) if 0n
Recall from Section 8.5 that N(s, t) denotes a two-dimensional Poisson process,
and that for each fixed s > 0
(2) v_s(t) ≡ [N(s, t) − t N(s, 1)]/√s = L_s(t) − t L_s(1) for 0 ≤ t ≤ 1,
(3) N̄_s(t) ≡ [N(s, t) − t N(s, 1)]/√(N(s, 1)) for 0 ≤ t ≤ 1,
(5) (N̄_{s₁}, N̄_{s₂}, …) ≅ (v₁, v₂, …)
Over the years many results for U_n have been proven by one or another
form of Poisson representation. In this section, we are considering some of
the more difficult results for U_n, results dealing with its modulus of continuity.
This is our vehicle for a careful discussion of Poisson embedding, which is
the real purpose of Sections 14.5–14.7. The basic point we wish to make here
is that
Why did we follow the approach we did? Using our approach, the inequalities
used apply directly to U_n; following (7), the inequalities developed would
apply instead to N. (Since these inequalities are interesting in their own right,
we put them in Section 14.5. We also note that Lemma 1 has independent
interest.) Finally, with regard to the lower bounds in Section 14.2, we observe
that Theorem 14.2.1 was proven via the crude conditional Poisson representation,
Theorem 14.2.2 will be proven later in this section via two-dimensional
embedding, and Theorem 14.2.3 could have been proven via a Poisson representation.
Exercise 1. Reprove the lower bound of Theorem 14.2.3 via Poisson representation.
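A Poisson representation is easy to simulate. The sketch below is illustrative only (not from the text); it realizes N(s, t) by drawing a Poisson total N(s, 1) and uniform second coordinates, which is one standard construction of a two-dimensional Poisson process, and checks that v_s(t) has the Brownian-bridge variance t(1 − t):

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_bridge(s, t, size):
    """Sample v_s(t) = (N(s,t) - t*N(s,1))/sqrt(s) for a planar Poisson process."""
    out = np.empty(size)
    for i in range(size):
        k = rng.poisson(s)          # N(s, 1): points of [0,s] x [0,1]
        u = rng.uniform(size=k)     # their second coordinates
        n_st = np.sum(u <= t)       # N(s, t)
        out[i] = (n_st - t * k) / np.sqrt(s)
    return out

s, t = 400.0, 0.3
v = poisson_bridge(s, t, 4000)
print(round(v.mean(), 2), round(v.var(), 2))
```

An exact computation confirms the target: writing v_s(t) as a linear combination of the independent Poisson counts over [0, t] and (t, 1] gives Var v_s(t) = t(1 − t) exactly, for every s.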
clogs
(a) M=Ms=(1/a,) whereas =
s
Now define
(b)
E
B5= t ma^
±L,((i-1)a,, ia s ] 1
<_,t5=(1—e)r 21og1/a s .
]
1 ± (1-e)r log
2 (1 /a ) J "'
(d) <- [ i -exp ( -(1- e)r2 \log
as/ \ Sas ))J
by (11.9.19)
E ) r 2 'o] M y_dr 2 ko).,. exp ((-
_ay (I -_ E)r 2
ar ))
[1- a (I-
S -exp (-Ma S
_ 1/ (s log
e /C
and this is a convergent series for any r exceeding the solution R of the
equation determined by the argument of ψ in (d).
Thus
(i) lim sup_{k→∞} sup_{|C|≤a_{s_k}} ±L_{s_k}(C)/√(2 a_{s_k} log(1/a_{s_k})) ≥ R.
We now leave it to the reader as an exercise to "fill in the gaps" and show
that (i) can be extended to
(j) lim sup_{s→∞} sup_{|C|≤a_s} ±L_s(C)/√(2 a_s log(1/a_s)) ≥ R.
[Just follow the outline of step (o) in the proof of Corollary 14.3.1 for the
case of c ∈ [0, ∞). Step (o) calls for use of the lim sup result; for this you can
appeal to Lemma 1.] Thus, combined with the upper bound of Lemma 1, we
have
(k) lim sup_{s→∞} sup_{|C|≤a_s} ±L_s(C)/√(2 a_s log(1/a_s)) = R a.s.
A corresponding statement holds for ±N̄_s(C). (One has to use again the same
argument used to "fill in the gaps.") This completes the proof of (14.2.11).
For (14.2.12), you can replace ψ in (d) by 1 so that (g) is replaced by R² = 1.
Our thanks to D. Mason, J. Einmahl, and F. Ruymgaart for pointing out
the earlier erroneous statement of Theorem 14.2.2, which was the result of our
treating the "—" case incorrectly here. ❑
(i) If
then
ns"(k")
= n-li - log n = 1 a.s.
(3) h^ W log n
111
(6) & — + J=
with h(λ) ≡ λ − 1 − log λ.
(7) there exist ↗ sequences 0 < d_n ≤ k_n ≤ d′_n with d_n/n ↘ 0, d′_n/n ↘ 0,
and d′_n/d_n → 1,
(8) log(n/k_n)/log₂ n → ∞ as n → ∞ (k_n is not too big),
then
(10) lim_n n ω_n(k_n/n)/√(2 k_n log(n/k_n)) = lim_n n ω̄_n(k_n/n)/√(2 k_n log(n/k_n)) = 1 a.s.
n+l
V [ Sn+l( — t^„+^(1)+ n+1 (n+lt
(12) n+1 „+^
”/J n+l )
THE MODULUS OF CONTINUITY OF V_n
for (i − 1)/n < t ≤ i/n, 1 ≤ i ≤ n. Thus the modulus of continuity of V_n is
asymptotically identical to that of the partial-sum process S_n of iid rv's
X₁, …, X_n where X_i + 1 ≅ Exponential(1). Since an increment of S_n of the
form S_n(i/n, j/n] ≅ [Gamma(j − i) − (j − i)]/√n, Eq. (11.9.33) shows why ψ
turns out to be the key function. Model your proof after the proofs of Section
2, checking Mason (1984a) for the appropriate maximal inequality.
0. INTRODUCTION
(1)
(2) = U_n − U_n(G_n^{-1}) + √n[G_n ∘ G_n^{-1} − I].
(3) a
limn i'n 1^.s.
g°II _v
n-.ao v'
(1) D_n ≡ U_n + V_n;
here V_n denotes the uniform quantile process. Figure 1 illustrates the sense
in which D_n is a difference, and thus shows why we expect it to be smaller
THE UNIFORM EMPIRICAL DIFFERENCE PROCESS D_n
Figure 1. [Graphs of U_n(t)/√n, −V_n(t)/√n, and D_n(t)/√n; the horizontal axes mark 0, G_n^{-1}(t), t, 1.]
(3) √n ||G_n ∘ G_n^{-1} − I|| ≤ 1/√n.
From Eq. (2) we see that D_n is naturally associated with the modulus of
continuity ω_n of U_n. By virtue of Smirnov's theorem (Theorem 13.1.1) we have
JIQ^„' — I^^
1 -
n h /41 B'
(5) a.s_ 1 as n -> o0.
OJ„ log n
Recall that ||V_n|| = ||U_n||.
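The sense in which D_n is small can be seen directly by simulation. A minimal sketch (not from the text) evaluates U_n, V_n, and D_n = U_n + V_n on the grid i/n and compares the sup norms:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 200_000
x = np.sort(rng.uniform(size=n))
t = np.arange(1, n + 1) / n                 # evaluation grid i/n
Gn = np.searchsorted(x, t, side="right") / n
U = np.sqrt(n) * (Gn - t)                   # empirical process U_n(i/n)
V = np.sqrt(n) * (x - t)                    # quantile process V_n(i/n); G_n^{-1}(i/n) = x_(i)
D = U + V
r = np.max(np.abs(D)) / np.max(np.abs(U))
print(round(r, 3))
```

Here ||U_n|| stays of order 1 while ||D_n|| is of the Kiefer order n^{-1/4}(log n)^{1/2}(log₂ n)^{1/4}, so the ratio is well below 1 for large n.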
where
(8) P(√(||U||) > x) = 2 Σ_{k=1}^∞ (−1)^{k+1} exp(−2k² x⁴) for all x > 0
for 0 ≤ t ≤ 1.
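The alternating series in (8) converges very fast and is easy to evaluate; substituting y = x² shows it is just Kolmogorov's series for P(||U|| > y), the sup of the absolute Brownian bridge. A quick check (not from the text):

```python
import math

def tail8(x, terms=100):
    """The series in (8): 2 * sum_{k>=1} (-1)^(k+1) exp(-2 k^2 x^4)."""
    return 2 * sum((-1) ** (k + 1) * math.exp(-2 * k * k * x ** 4)
                   for k in range(1, terms + 1))

def kolmogorov_tail(y, terms=100):
    """Kolmogorov's series for P(||U|| > y)."""
    return 2 * sum((-1) ** (k + 1) * math.exp(-2 * k * k * y * y)
                   for k in range(1, terms + 1))

x = 1.1
print(round(tail8(x), 6), round(kolmogorov_tail(x * x), 6))
```

The two printed values agree identically, confirming P(√(||U||) > x) = P(||U|| > x²).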
v_n ≡ √n[F_n ∘ F^{-1} − I]. Then since F^{-1} = F^{-1}(G), two easy applications of the
above show
that is, we may replace D_n, U_n by D̂_n, Û_n. We will extend this in Theorem 18.2.3
to df's on (−∞, ∞). ❑
since
E(p — n :i)l[o,,J(P — en:i) Elena pI {E(4n:i — — p) 2 }' /2
Hence, ifpn = i/(n+1) -+pE(0, 1),
Proof of Theorem 2 (See Shorack, 1982a). We will work with the Hungarian
construction of (12.1.7) for which
(a) lim sup_n (√n/(log n)²) ||v_n − B_n|| ≤ some M < ∞.
_ U,(G ')—I3n(G;')
+0(1) by (3)
(b //i) log n n
G '
(14) _ "—"( " ) +o(1) a.s. by (a)
b n (log ^) /✓ n
for
(c) a_n ≡ (½ − 2ε) b_n/√n with log(1/a_n) ∼ ½ log n.
(d) h_ε(t) ≡ t if 0 ≤ t < ½ − 2ε,
       ½ − 2ε if ½ − 2ε ≤ t ≤ ½ + 2ε,
       1 − t if ½ + 2ε ≤ t ≤ 1.
Figure 2. [Graph of h_ε on [0, 1]; the axis marks ½ − 2ε and ½ + 2ε.]
Let
(e) I ≡ [½ − ε, ½ + ε].
Then
(f) lim sup_n sup_{t∈I} [B_n(t) − B_n(t + h_ε(t) b_n/√n)]/√(2a_n log(1/a_n)) = 1 a.s.
follows from writing the term in (h) as the sum of the term in (f) plus another
term that contributes nothing. Now Finkelstein's theorem (Theorem 13.2.1)
guarantees a.s. the existence of a subsequence m = m_ω on which
(i) g_m ≡ V_m/b_m − h_ε satisfies ||g_m|| → 0 as m → ∞.
[he(t) +m(t)]bm
G) t+ '(t).
(1+8)b„
(m) ||G_n^{-1} − I|| ≤ c_n ≡ 2b_n/√n for all n ≥ some n_{ω,ε}.
hrn n1/4JI®„ 11
;—
n- b„ log n
(a) a” = 1 /(Kb„.!n)
g n 1
^^"II® lo-
n- 1/4
lim n3 /4II6"°Gn'-III ^ lim
„-^ !IG -III l og n „ IIGlog
1 6„ 1 b„
(c) = lim < lim
n- b„ II 0/„ ^I log n lim b„ I4Nn II n -. to g n
2
— • 0 by Mogulskii's theorem (Theorem 13.5.1)
(d) = 0.
t)I
L^ lim sup IU"(s,
„ ,.o It
- -
n
sitllG ' IJI
- - IIG" ' - III log
jU „(s, t]i
►
1 m sup
”z-sa uiIG — 'ii log n
-.W
+lim sup IU"(s, t)I
a, IIG„` - III log n
U,S,t I U,, s,t
(e) : tim sup b^ tim
m sup I ( ]I
II -.st a,, lo g n + I^-.^I>a„ 2(t—s) log ( 1 /an)
/1 w,, (a,,) h
Wn(a,,)
(f) 4W 1i+
--v/2a log(I/a )
n n n 2log (1/an)
The previous upper bound proof is short and clear, but it traces back through
many previous deep results. For this reason, we present the following proof,
essentially Bahadur's (1966), of a slightly weaker result (15) below.
Proof. For constants d₁ and d₂ to be specified later, let
(a) a_n ≡ d₁ n^{−1/2}(log₂ n)^{1/2}, b_n ≡ (log n)^{1/2}(log₂ n)^{1/4}, c_n ≡ d₂ n^{−3/4}(log n)^{1/2}(log₂ n)^{1/4};
there is no loss in assuming that a_n^{−1}, b_n, c_n^{−1} are integers. Fix |s − t| ≤ a_n. Then
one of the intervals [0, 2a_n], [a_n, 3a_n], [2a_n, 4a_n], …, [1 − 3a_n, 1 − a_n], [1 −
2a_n, 1] must contain both s and t; and there are fewer than a_n^{−1} such intervals.
Let I_n = [t* − a_n, t* + a_n] denote the interval containing s and t and let {t_{n,i}}
denote a partition of I_n. If
then
|G_n(s) − G_n(t) − (s − t)| ≤ max_i |G_n(t_{n,i+1}) − G_n(t_{n,i}) − (t_{n,i+1} − t_{n,i})| + 2a_n/b_n.
Thus
M„+ b".
n
2 ex nc”
2
‚ \n
(
n )
_.. 1 b (11.1.3)
C ^ p 2Pnij(1— p) \Pnij y
/ 8b \ c„
P M
10 Z
lim g 2d2
logg n >cn)1- 1
(c) < -1,
From (c) we obtain Σ_n P(M_n ≥ c_n) < ∞, so that P(M_n ≥ c_n i.o.) = 0; thus
(h) lim_n ||U_n − U_n(G_n^{-1})||/(√n c_n) ≤ 1 a.s.
Since ||G_n ∘ G_n^{-1} − I||/c_n → 0 as n → ∞ is trivial, we obtain from (2) and (h) that
(i) lim sup_n (n^{1/4}/((log n)^{1/2}(2 log₂ n)^{1/4})) ||D_n|| ≤ 2^{1/4} d₂ a.s.
Now recall that (i) holds for any d₂ exceeding 2√d₁ = 2(2^{−1/2} + ε)^{1/2} for all
ε > 0. Thus
(15) lim sup_n (n^{1/4}/b_n) ||D_n|| ≤ 2 a.s.
Then
(19) him
n-w
n°2(lo g2 n ) `
(log )d2 I U +V I—M (logn)dl/n
II(1—I)]°1
some MZ <a^ a.s.
n° lo )
l+° U +N I-9(io g n)/n
(20) lim Z(gz " ° Z <_ some M <co a.s.
log n [1(1—I)] , 9(log2n)/n
hence a simple proof of (19) based on this replacement and Theorem 2 will
not work. Also the case a₁ = a₂ = ¼, d₂ = ½, d₁ = 2 "just fails" to satisfy the first
condition of (18); but it does satisfy the second condition of (18), and is of
the same order as the Kiefer–Bahadur theorem (Theorem 2).
Proof. The proof is deferred until Section 16.4, where a necessary lemma is
established. ❑
the second term on the rhs of (a) corresponds to the shaded areas in Figure
1. Thus
(5) T_n ≡ ∫₀¹ W_n(t) dt, where W ≡ ∫₀¹ U²(t) dt,
(12) W_n ≡ ∫₀¹ U_n²(t) dt →_d W ≡ ∫₀¹ U²(t) dt as n → ∞.
Open Question 1. How good are tests of fit based on T_n and W_n?
CHAPTER 16
0. INTRODUCTION
The process
Z_n(t) ≡ U_n(t)/√(t(1 − t)) for 0 < t < 1
has mean 0 and variance 1 for each fixed value of t; hence it is called the
normalized uniform empirical process.
We note from Chibisov's theorem (Theorem 11.5.1) that ⇒ fails for Z_n and
from James's theorem (Theorem 13.4.1) that ⇒ fails for Z_n/b_n; but in both
cases, the function I(1 − I) "just missed."
In this chapter we will consider the rate at which Z_n blows up. Section 1
considers →_d, while Sections 2 and 3 consider →_{a.s.}. Analogous ⇒ results for
U_n/√(I(1 − I)) are established in Section 1, while an a.s. result appears in Section
16.4.
THE NORMALIZED UNIFORM EMPIRICAL PROCESS Z_n
It is obvious that
[Note from the LIL (2.8.7) that Z is not a random element on the measurable
function space (C, 𝒞) of Section 2.1.] Recall from (2.2.3) and Exercise 2.2.9
that the Uhlenbeck process X on (C_R, 𝒞_R) with R ≡ (−∞, ∞) is a stationary
normal process having
(4) EX(t) = 0 and Cov[X(s), X(t)] = exp(−|t − s|) for all s, t ∈ R,
(5) Z(t) = (1 − t) S(t/(1 − t))/√(t(1 − t))
(6) ≅ X(½ log(t/(1 − t))) for 0 < t < 1.
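The time-change identity behind (6) can be verified exactly: for a Brownian bridge U with Cov[U(s), U(t)] = s(1 − t), s ≤ t, the correlation of Z matches the Uhlenbeck covariance exp(−|τ(t) − τ(s)|) with τ(t) = ½ log(t/(1 − t)). A small check (not from the text):

```python
import math

def bridge_corr(s, t):
    """Corr[U(s)/sqrt(s(1-s)), U(t)/sqrt(t(1-t))] for a Brownian bridge, s <= t."""
    return math.sqrt(s * (1 - t) / (t * (1 - s)))

def tau(t):
    """The time change carrying Z onto the stationary Uhlenbeck process X."""
    return 0.5 * math.log(t / (1 - t))

s, t = 0.2, 0.7
print(abs(bridge_corr(s, t) - math.exp(-(tau(t) - tau(s)))) < 1e-12)
```

The agreement is an algebraic identity, not an approximation, which is why the Darling–Erdős theory for X transfers to Z without error terms from this step.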
We now turn to consideration of the norm of Z. Again, the LIL of (2.8.7)
implies ||Z|| = ∞ a.s.; thus we let 0 < d_n < e_n < 1 and consider ||Z||_{d_n}^{e_n}. We define
(7) a_n ≡ (e_n(1 − d_n))/(d_n(1 − e_n)) where 0 < d_n < e_n < 1.
Then ||Z||_{d_n}^{e_n} ≅ ||X||₀^{½ log a_n} by stationarity of X
2.11 and necessitate the introduction of some notation. We will need the
normalizing functions
(9) b(t) ≡ √(2 log₂ t) and c(t) ≡ 2 log₂ t + ½ log₃ t − ½ log(4π)
with b_n ≡ b(n) and c_n ≡ c(n). We also let E_v denote the df of the extreme value
distribution; thus
We now state the Darling and Erdős theorem (Theorem 2.10.1) that
while
(13) b(t) ||X||₀^t − c(t) →_d E_v as t → ∞.
[The key to the Darling and Erdős proof was the representation (5) and (8)
of Z in terms of X.]
From (8), (12), and (13) we see immediately that
and
via elementary calculations. Combining this with (14), (15), and (11) gives
(16) b_n ||Z_n||_{d_n}^{e_n} − c_n →_d E_v as n → ∞
and
Theorem 1. (Jaeschke)
and
U(t)
sup
f^:^ VG t)( 1 — G(t))
II 7)I
7
-
vn
WEAK CONVERGENCE OF ||Z_n||
The second version still seems the most preferable. Thus we let
n + 2„ :;
and
while
(24) T_n/b_n →_p 1 as n → ∞.
(a) lim sup_n (√n/(log n)²) ||U_n − B_n|| ≤ some M < ∞.
Now
U,,-8,, bM(log2n)/
„^ Mb„
lim b„ ^` li m = li w log n
„-- [TI—I led
(c) = 0 a.s.,
and
e
bn ll^n 11 ti„—Cn=bn
11
1 Ms I)I „ d —en
b„
1111 Bn
M —I) dn
e
— c„ +op (1) by(6)
bnIIZ#II„—cn+op(1)
for any k >_ 1; (25) will be established in the rest of this proof.
We will first show that (25) is implied by
p(bn II Z n 1I ö..0„fix)=(
IImnIIÖ ] x +zc„)\(
b n b„
Moreover,
IIZnIIO
bn —
Q-D„
I
.^ l o
I — ^^„
I
1
b„
n
n-1
C 2 [II G n/ I IIO n+1 ]
bn
4G(Eb"))
P \ " II ,i" / \ 8
-^ (12k log t n) exp — 4" log b„
)
(h) -0 as n-+oo,
implying
JjZn1^ i " -+ P 0 oo
asn-,.
(') b”
0
(27) ∫₀¹ arcsin(2t − 1) dU_n(t) = (1/√n) Σ_{i=1}^n arcsin(2ξ_i − 1) ≡ (1/√n) Σ_{i=1}^n Y_i;
Show that Y₁, …, Y_n are iid (0, (π² − 8)/4) = (0, 0.467). This would seem to
be a very useful statistic.
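The variance (π² − 8)/4 is easy to confirm by Monte Carlo; an illustrative check, not from the text:

```python
import math
import random

random.seed(3)

# Y = arcsin(2*xi - 1) with xi ~ Uniform(0, 1) should have
# mean 0 and variance (pi^2 - 8)/4 = 0.4674...
N = 200_000
ys = [math.asin(2 * random.random() - 1) for _ in range(N)]
mean = sum(ys) / N
var = sum(y * y for y in ys) / N - mean * mean
print(round(mean, 2), round(var, 2))
```

The exact value follows from ∫₀^{π/2} θ² cos θ dθ = π²/4 − 2, after the substitution u = (1 + sin θ)/2.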
For each fixed t the rv Z_n(t) is a normalized Binomial(n, t) rv. Recall from
Inequality 11.1.1 that for 0 < t < ½ the attainable bounds on the long tail
P(Z_n(t) ≥ λ) are much poorer than those on the short tail P(Z_n(t) ≤ −λ). This
difference is responsible for the difference in the behavior of ||Z_n⁺||₀^{1/2} and
||Z_n⁻||₀^{1/2} that we will see exhibited below.
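The tail asymmetry is concrete for small t. An exact binomial computation (illustrative, not from the text), with Z_n(t) = (Bin(n, t) − nt)/√(nt(1 − t)):

```python
import math

def binom_tail_ge(n, p, k):
    """P(Bin(n, p) >= k), exact summation."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(max(k, 0), n + 1))

n, t, lam = 400, 0.01, 2.0
sd = math.sqrt(n * t * (1 - t))
k_hi = math.ceil(n * t + lam * sd)              # event Z_n(t) >= lam
long_tail = binom_tail_ge(n, t, k_hi)
k_lo = math.floor(n * t - lam * sd)             # event Z_n(t) <= -lam
short_tail = 1 - binom_tail_ge(n, t, k_lo + 1)  # P(Bin <= k_lo)
print(long_tail > short_tail > 0)
```

With nt = 4 the upward tail at λ = 2 is several times larger than the downward tail, and for larger λ the downward tail vanishes entirely (the count cannot go below 0), matching the ||Z_n⁺|| versus ||Z_n⁻|| dichotomy above.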
Note also that Z(t) [or Z_n(t)] as t goes from 0 to ½ behaves the same as
Z(t) [or Z_n(t)] as t goes from 1 to ½. Thus we concentrate solely on [0, ½] in
most of this section.
We recall for contrast with the results below that (16.1.20) gives
(2) lim sup_n Z_n(t)/b_n = 1 a.s. for each fixed t ∈ (0, 1).
We have also seen in Section 10.8 that the a.s. lim sup of Z_n^#(a_n)/b_n with
a_n ↘ 0 may be different from 1 (it may even be infinite when # denotes +
or −).
The question that led to Theorems 1 and 2 below was posed by Csörgő and
Révész (1974). Theorem 1 was obtained by Csáki (1977). The fact that √2 is
an a.s. upper bound was also obtained by Shorack (1977), which was not
published until Shorack (1980). Theorem 2 is the form due to Shorack; we
simply take this opportunity to state what had been proven. This result also
follows from Csáki, who also evaluated the lim sup for a few key sequences.
His results are given in Section 3. Shorack used a martingale in n for his
proofs and Csáki used a martingale in t. We use Shorack's method, improved
further via Inequality 11.1.1, for all theorems in this and the next section.
Theorem 1. (Csáki)
(3) lim_{n→∞} ||Z_n⁻||₀^{1/2}/b_n = √2 a.s.
We now ask at what rate the divergence of ||Z_n⁺||₀^{1/2} takes place, and give a
characterization.
(6) lim_{n→∞} ||Z_n⁺||₀^{1/2}/λ_n = 0 or = ∞ a.s. according as
(7) Σ_{n=1}^∞ 1/(n λ_n) < ∞ or = ∞.
THE a.s. RATE OF DIVERGENCE OF ||Z_n||₀^{1/2}
It thus holds that
+1 1 1/2
(8) lim
n-,ao
lo gon
gz
l0 2 a.s.
Il
Remark 1. We may replace ||Z_n⁺|| by ||Z_n|| in Theorems 2 and 3.
Remark 2. Equations (3) and (8) show that ||Z_n⁻||₀^{1/2} and ||Z_n⁺||₀^{1/2} behave
rather differently. Recall also from Corollaries 10.5.1 and 10.5.2 that
(9) n-w
go gz
n
lim lo ll " III = 1 a.s.,
while
(11) um log
n-.
in / n"
gz
n k) _ a.s, for all fixed k,
while
(12) ►
1 m n:
nSk
= I a.s. for all fixed k.
log t n
with
E+b
(b) Yk = 4' "kyk = 1,
nk+l Cn+l
(c) Dk =
[ t^
^k mnk+t
e
1l
max 1 1LIIIwith n k =((1+9) k ),
C^ k+1
J
1 1 °° 1
(d) P fin: .o. /
f =0 if ^—<o
nA n ] nA n
(f) r° 1)110„= 0. -
_ t
(g) um II s a.s.
n-, °° bn
and choose 0 <_ some 0, so small that (a) and (e) give (h) via
Consider the "+" case of Theorem 3. Recall from Proposition A.9.4 that
there exists a sequence e_n for which
(k) Σ_{n=1}^∞ 1/(n λ_n e_n) < ∞ and e_n ↘ 0 (slowly) as n → ∞.
Let
(l) c_n ≡ 1/(n λ_n e_n);
but we note that (f) still holds for this new c_n. We also replace ε⁺ by
(m) ε⁺ ≡ ε_k⁺ ≡ ε √(λ_{n_k})/b_{n_k} for any small ε > 0
[so that the definition of D_k is appropriate for (6)]. The exponent of (a) then
satisfies
satisfies
iI„x ^ 2 ^ E VAnt
EV
(Ek) Z i^k =( b1k
l bn `Y
k b 1/(An, en,) /
2
n;
EA nk 2 log An k
since i/i(A) --- (2 log A )/A and A -> co
b„ k EA nk
s log A n , E•
for large k
,/e„^ log 2 n k V
Letting λ_n = (log n)^{(1+ε)/2} for any fixed ε > 0 in Csáki's theorem (Theorem 3) we
find that a.s.
(b) (||Z_n⁺||₀^{1/2})^{1/log₂ n} ≤ ((log n)^{(1+ε)/2})^{1/log₂ n} ≤ M for all n > some n_ε.
where the events [ξ_{n:1} < p_n] are independent. Thus Borel–Cantelli implies
P(ξ_{n:1} < p_n i.o.) = 1, and hence
Thus
+ 1/2
)—Pn
^^[nQ^n(Pn)—nPn]
(b) III 1 ( 1„ 1 )IIo an^p((
I1
≥ √(M n G_n(p_n)) − M since λ_n ≥ 1
≥ √M (1 − 1/M) i.o. with probability one, by (a).
Since √M(1 − 1/M) can be arbitrarily large, we have that the lim sup in (b) is
a.s. infinite. Thus (7) holds. ❑
The proof of the lower bound (3) is omitted. Consult Csäki (1977), or
consider a Poisson bridge proof that follows the rough outline of Csäki's proof.
(14) lim sup_n ||Z_n||₀^{(log_m n)/n}/b_n ≥ 1 a.s.,
while
(15) ||Z_n||₀^{(log_m n)/n}/b_n →_p 0 as n → ∞.
ALMOST SURE BEHAVIOR OF ||Z_n^±||₀^{a_n} WITH a_n ↘ 0
Recall that this is the function that arises in bounding the longer tail of a
binomial distribution. As usual, b_n ≡ √(2 log₂ n).
Figure 1. [Graph of L_c as a function of c; the horizontal axis marks .236… and 1.]
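Assuming the book's ψ(λ) = 2h(1 + λ)/λ² with h(x) = x(log x − 1) + 1 (this exact form is an assumption here, not quoted from this section), the two facts used repeatedly below — ψ(0+) = 1 and ψ(λ) ∼ (2 log λ)/λ as λ → ∞ — can be checked numerically:

```python
import math

def h(x):
    return x * (math.log(x) - 1) + 1

def psi(lam):
    """Assumed form psi(lam) = 2 h(1 + lam)/lam^2 for the binomial long-tail bound."""
    if lam == 0:
        return 1.0
    return 2 * h(1 + lam) / lam ** 2

print(round(psi(1e-4), 3))      # near-zero argument: psi ~ 1
big = 1e6
print(round(psi(big) / (2 * math.log(big) / big), 3))
```

The second ratio approaches 1 only logarithmically, which is why the c_n → 0 and c_n → ∞ regimes of Csáki's theorems require different normalizations.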
We define c_n by
(2) a_n ≡ c_n log₂ n/n.
We will present three theorems; they concern the cases c_n → c ∈ (0, ∞), c_n → ∞,
and c_n → 0. All results are from Csáki (1977).
We consider first the case c_n → c ∈ (0, ∞). For each c > 0
and let
(5) lim sup_n ||Z_n^±||₀^{a_n}/b_n = L_c a.s.
As we see from Figure 1, the a.s. lim sup of ||Z_n^±||₀^{a_n}/b_n varies continuously
from √2 to ∞ on the class of sequences of Theorem 1. This is more detailed
information than is contained in Theorem 2 of the previous section. The
ordinary LIL at t = ½ tells us that the a.s. lim sup in question can never be less
(6) log₂(1/a_n)/log₂ n → c as n → ∞ with 0 ≤ c ≤ 1,
then
(7) lim
n- ao
bn
" _
° a.s.
(8) a_n = 1/log n yields c = 0,
(9) a_n = exp(−(log n)^c) yields c ∈ (0, 1),
(10) a_n = n^{−1} log n yields c = 1,
(11) a_n = n^{−δ} with δ ≤ 1 yields c = 1.
Theorem 3. If c_n ↘ 0 and log(1/c_n)/log₂ n → 0, then
(13) lim sup_n ||Z_n^±||₀^{a_n} √(c_n log(1/c_n))/√(log₂ n) = 1 a.s.
Examples include
(14) c_n = 1/log₃ n
and
(15) c_n = c₁/(log₂ n)^{c₂} where c₁, c₂ > 0.
this should be compared with the result of Baxter (1955) (see Csörgő and
Révész, 1974), which can be rephrased as
c log, n )_
(17) lim Z n - V 1 a.s.
n-. n log, n
(a) a_n ≡ c_n log₂ n/n where c_n → c ∈ (0, ∞)
n
and let
We define
7L
(d) lim II b D °"<r a.s.
n-+o n
(f) P(Dk)<oo,
(g) Dk=[llZnk+i+,'(1-9)2rbk].
have [the VT7 in the denominator of 7l„ accounts for one (1- 0)]
3 *
log (an k /an,+,) 7 r z (bflk) z +
(h) P(Dj) 50
exp (—(1 —8) 2 Yk),
rb
(i) nk
n k+l a nk+,
[Note the complete analogy with James's theorem (Theorem 13.4.1), Stute's
theorem (Theorem 14.2.1), Theorem 14.2.2, and Csäki's theorems of the pre-
vious section.]
For Theorem 1 we set b*_{n_k} ≡ b_{n_k} and consider three subintervals:
4(lo0dnk ) $ (_ ))
P(Dk) exp —(1—B) rz (logznk)^G`
_
^lCrJl
(k) ( 4/ 0 )(log dn k )
(l og nk )(I e^B^Z^c,rirr)l
where log n_k ∼ k log(1 + θ) and log d_{n_k} washes out, since we specified that it
goes to ∞ "slowly." Thus for k sufficiently large we have
(some Me )
P(D k )—
(m) 1 = R 2 1J!( R)
(n) R=\/(/3_1).
Now consider the interval [d a_n, a_n]. As in (h), we have
for k sufficiently large since ψ(0) = 1. Since log n_k ∼ k log(1 + θ), this series
converges for any r > √2. Combining this with (n) gives
(p) lim sup_n ||Z_n^±||₀^{a_n}/b_n ≤ (√2 ∨ R) + ε = L_c + ε a.s. for all ε > 0.
n-." bn
(q) R*=
2
(r) i — r) >_ (1— 0) for all sufficiently large k,
( Cnk
P(D_k) ≤ M_θ/k^{1+δ} for some δ > 0,
provided θ = θ_ε is chosen small enough. Thus the lim sup over this first
subinterval is a.s. bounded above by 1. From (o) we obtain
and this is all we will need to establish the upper bound. We let
(w) b*_n ≡ log₂ n / √(c_n log₂ n log(1/c_n)).
In this case we also consider the three intervals in (j). For [d a_n, a_n] we obtain
from (h) that
P(Dk) s 3 log (a *k /a „k.,)
0
( 1 - 0) 7 r 2 loge n k r
xexp (
2 c„ k log 2 (1/c„ k ) \c„ k log (1/c„ k ))
(1 -8) 'r z 2
Me (log d„ k ) exp (-
2 r logt nk )
since i/i(A) -- (2 log A)/ A as A -x'
_ M9 (log d k „ )
(log nk )( 1- B) 7r
1
^-M0 log — exp( -(1-8) $ r log2nk)
a „k
_ MB(log nk)
G
)(t
(l og nk -8 f(1„ k
for some sufficiently small a₁. James's theorem (Theorem 13.4.1) ensures that
the lim sup over the remaining interval does not exceed 1. This gives the upper bound in
Theorem 3. ❑
0 °° 1 < oa
p(n k IIwk(Gn — I)Il a„ i.o.)= 1 according asnnl _f k>^=
na '
n /c 1- l
and
Hk is the df of II (R — I) I +k Ii1i ö
and
In this section we will settle for a partial analog of Theorem 16.3.1; we present this
because it is required in Chapter 18.
(1) lim sup_n ||V_n/√(I(1 − I))||_{a_n}^{1−a_n}/b_n ≤ 2 a.s.
where
(d) a_n ≡ d(log₂ n)/n
and
(e) x_t ≥ c(log₂ n)/n for all t ≤ 1 − a_n
for some fixed constant c; then by Csáki's theorem (Theorem 16.3.1) we would
have, for each ε > 0 and all n ≥ some n_ε, that
+e) I—x,)b„
Q^„(x,)5x,+ (I'` byCsäki
(f) < t
1—xt
=l+=1+K t b„ by (b)
1—t 1—t 1—tom
(K 1/a„)b„
s l + since t <— 1— a„ by (e)
=1+K ,
d
we see that (g) holds so long as K is large enough to satisfy (recall e > 0 is
arbitrary)
clog e n b„clogtn
x,— n =I—K t(1—t) b
c c1og 2 n 1 c \ b„
= dt — n
J +^1 It—K t(1—t)^
>_0+f^(1—d)^—K ( iftexceedsthea„of(d)
JJ
i j n
>riL(1—d)^—fK
(i) K 1C ) 1 d
a choice which is made possible by Theorem 16.3.1 and Figure 16.3.1. Trial and
error then shows that specifying
leads to the joint satisfaction of (h) and (i), hence (e) and (f). But (f) is just
the left-hand inequality in (a). The right-hand inequality in (a) is verified by
a completely symmetric argument. ❑
Proof of Theorem 15.1.3. We consider only M₁(log n)^{d₁}/n ≤ t ≤ ½; the rest is
treated symmetrically. Note that
(b) _I 1-9(log2n)/nC (
2+e)bn
<4(
log2 n ) 1/2
v' , 9(Iog2 n)/n
„/ ; • \ n /)
(The case v_n((t ∧ G_n^{-1}(t), t ∨ G_n^{-1}(t)]) > 0 is similar, and is omitted.) Thus for
max IUn(t)—Un(Gn`(t))It-°'
Mj(log n) d '/nsts1 /2
P,;=P
\I „ \n'^n 3 Jl >
((i /n 3 ) a '(K„1 n a2 )) 2
(g) —2 exp
({ 2.4((log 2 n)/n)'i2(i/n3)1/2}
fl
\ X ^ n24((log2
(I/n3)a'(K„/na2) ^^
)/ n)`f2(i/n3)'tz
for all j satisfying the condition in (d), by inequality 11.1.2. Now the bracketed
{ } term in (g) satisfies
(2a,-iiz)^onli2 za2K
(h) { } [M,(log n)d'/n] 2
2 .
8(log 2 n)'i
n )1/2-°, K
(1)M,(log n) d 1 4na2(1og2 n)'i 2
_K„
4M; /2-° I(log n) d i (I/2- a ,) (Iog 2 n )'i 2= B„
for all i as specified by (d); and thus the % term of (g) satisfies
(positiveConstant)(log2n) 1
(j)
E
+y — B
„
n 8 J for some s> 0
[this first bound on the rhs of (j) is appropriate if B„ oo, else the second]
for n sufficiently large. Combining (h) and (j) into (g) gives
Constant (log n)(d ,c2a,-112)ßo)+d,+d,(112-a,)
(k) pij 2 exp — n((2a,-I/2)ßo)+2a2-t/2(log2 n)`-'
n)d,a,+a,)
(1) =2 exp (—Constant (log
since a, ;2! ä, letting c = 1, and so forth
2
(m) <2 exp (-8 log n) < 8
n
6
(o) P 0„ >_ <_ n = = (term of a convergent series).
( -)
Thus
n a26 'n
(p) um K < 1 a.s.
n^ „
Combining (p) and (q) into (a) gives the first result (15.1.19).
To prove the second result (15.1.20), we note that the B. of /i is of order
(log n) d=/(log 2 n)' -° i + `, with d2 = 1. Hence the p;; of (k) is bounded by
for n sufficiently large. Now apply steps (n) through (q) again.
for
The special case having a₁ = a₂ = ¼, d₁ = 1, d₂ = ½, c = 0 (but allowing
m-dependent rv's) can be found in Singh (1979). ❑
CHAPTER 17
0. INTRODUCTION
In this chapter we will consider the uniform empirical process U_n "indexed"
by a collection of functions ℱ. We write
(1) U_n(f) ≡ ∫₀¹ f dU_n = (1/√n) Σ_{i=1}^n [f(ξ_i) − Ef(ξ_i)] for f ∈ ℱ.
In Sections 17.1 and 17.2 we treat the case of (indicators of) intervals, ℱ =
{1_C: C = (s, t], 0 ≤ s ≤ t ≤ 1}. In this case we write U_n(C) for U_n(1_C) =
U_n(t) − U_n(s) with C = (s, t], and let |C| = |t − s|. Since
our interest will focus on functions q for which "U_n ⇒ U with respect to ||·/q||"
where both U_n and U are considered as processes indexed by intervals. These
two sections parallel Sections 11.2 and 11.5.
Indexing by 𝒮 ⊂ C[0, 1] is considered (as a special case) in Section 17.3.
The main theorem there implies weak convergence of {U_n(f): f ∈ 𝒮} when
𝒮 = Lip(α) ≡ {f ∈ C[0, 1]: |f(t) − f(s)| ≤ |t − s|^α} if α > ½; these results are due
to Strassen and Dudley (1969) and are closely related to the higher-dimensional
results stated in Chapter 26.
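The definition (1) below is straightforward to compute for a sample. A minimal sketch (not from the text), with f the indicator of an interval so that U_n(f) = U_n(t) − U_n(s):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 100_000
xi = rng.uniform(size=n)

def U_n(f, mean_f):
    """U_n(f) = n^{-1/2} * sum_i [f(xi_i) - E f(xi_1)]."""
    return (f(xi) - mean_f).sum() / np.sqrt(n)

s, t = 0.2, 0.5
indicator = lambda x: ((x > s) & (x <= t)).astype(float)
val = U_n(indicator, t - s)      # equals U_n(t) - U_n(s)
print(abs(val) < 5.0)
```

For this f the variance of U_n(f) is |C|(1 − |C|) = 0.21, so values stay of order 1; the point of the chapter is uniformity of such statements over whole classes ℱ.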
For any interval C = (s, t] with 0 ≤ s ≤ t ≤ 1, let f(C) = f(s, t] ≡ f(t) − f(s), and
let |C| = t − s. Let 𝒞 denote the class of such intervals, let 𝒞(a, b) ≡
{C ∈ 𝒞: a ≤ |C| ≤ b}, and let 𝒞(a) ≡ 𝒞(a, 1). For any subcollection 𝒞′ let
||h||_{𝒞′} ≡ sup{|h(C)|: C ∈ 𝒞′} and if q is a nonnegative function, let ||h/q||_{𝒞′}
denote sup{|h(C)|/q(|C|): C ∈ 𝒞′}.
Our goal in this section is to develop inequalities for ||U_n/q||_{𝒞(a,b)}. The
development here closely parallels that of Section 11.2, making strong use of
the inequalities contained in Section 11.1, and is also related to the inequalities
in Section 14.2.
Our inequalities again involve the function
and
Inequality 1. (Shorack and Wellner) Let 0 < a ≤ (1 − δ)b < b ≤ δ ≤ ½ and
λ > 0. Then for q ∈ 𝒬
(6) P( sup_{C∈𝒞(a,b)} |U_n(C)|/q(|C|) ≥ λ ) ≤ (24/δ) ∫_a^b t^{−2} exp( −(1 − δ)⁴ (λ² q²(t))/(2t) γ ) dt,
where
3
21/2 (1-5) if,t< 3 52 an
__ y"(C)I
>A
(a) A” Slop q(I CI )
Define
(b) 0=(1-3)
and integers J<— K by (if a = 0, then K = ce and we consider J ^ i< K)
(we let 0' denote 0' for J:5 i < K while 6 K denotes a and 6 J- ' denotes b);
and let
Since q is ↗ we have
Figure 1.
max sup ;
JsisK `B, q ( u
C€%
(g) M16201
(h) M; < Z
S e; + 1 < 5 288 ; -, SZe,-, .
1+8
K M.—t u l/M S
1-1 lM'
1
(i) + P ^Aq(g') = B+D.
' =J j=0 1 —I o 1-1/M; 1+S/
f Aq(01)(1 —O)
X ^ 9' Vi 1 /1
1+S
K 4 _ A 2 9
ei- ') s ( OV—n
^ ex 1
, —S by (h)
l
since a = 6 K <_ B' for i <_ K and q(1)/t\ implies for J <— i <_ K that
tZ exp — 2 t (1_8) 4 y) dt
B < S Z ; ^+^ 1— B ^ J `
+ Sz 0 K exp (—
A2
(1_5)4i)
4 z z
(1) +S b6 exp ( 2 qgb )(1-6)`7)
by (g)
K. 1
I — A2
q (01)
(m) 2M ; exp (1—S)3 1
r=r \ 0 M, 2(1/M; )(1-1/M1 )
( Aq(0')(1—S)2 //
`e' /2 (1/M ).%n
;
symmetric about t = 1/2.

(2)  T(q, λ) ≡ ∫_0^{1/2} t^{−2} exp(−λ q²(t)/t) dt,  where λ > 0.

Bounding |C| away from 0 in (4) is essential: if |C| shrinks down to a single point, ξ_1 say, then U_n(C)/q(|C|) → sqrt(n)(n^{−1} − 0)/0 = ∞, since all interesting q's have q(0) = 0.
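The blow-up just described is easy to see numerically (a hypothetical sketch of ours, assuming q(t) = sqrt(t), which indeed has q(0) = 0): with one observation inside an interval of width a, |U_n(C)|/q(|C|) = sqrt(n)|1/n − a|/sqrt(a), which grows without bound as a ↓ 0.

```python
import math

def weighted_ratio(n, a, q):
    # One observation inside an interval C of width a gives
    # U_n(C) = sqrt(n) * (1/n - a), so the q-weighted ratio is:
    return math.sqrt(n) * abs(1.0 / n - a) / q(a)

vals = [weighted_ratio(100, 10.0 ** -k, math.sqrt) for k in range(3, 9)]
print(vals)  # strictly increasing: the weighted sup over C(0, 1) is infinite
```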
The condition (3) in this theorem can be stated much more simply; this is done in the following proposition. The Chung-Erdös-Sirao (1959) criterion (see also Itô and McKean, 1974, p. 36) implies that for q ∈ Q

(5)  T(q, λ) < ∞ for all λ > 0  if and only if  ∫_{0+} t^{−3/2} q(t) exp(−q²(t)/(2t)) dt < ∞.
If the integral on the right-hand side of (5) is finite (infinite) we say that q is interval upper class (interval lower class) for U.

(7)  g(t) ≡ q²(t)/[t log (1/t)] → ∞  as t ↘ 0.
Exercise 1. Prove Proposition 1. [Hint: To show that (3) implies (7), first use the monotonicity of q to obtain the corresponding bound for 0 < λ < 1, then let t ↘ 0. To show that (7) implies (3), note that

∫_0^b t^{−2} exp(−λ q²(t)/t) dt = ∫_0^b t^{−2+λg(t)} dt,

since exp(−λ q²(t)/t) = exp(−λ g(t) log (1/t)) = t^{λg(t)}. To show the equivalence of (3) and (6), argue directly using (2) and (5).]
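The dichotomy in the hint can be checked numerically (a sketch of ours under an assumed choice of q, not from the text). With q(t) = sqrt(c · t · log(1/t)) one has λg(t) = λc, so the integrand is t^{λc−2}: the integral converges if and only if λc > 1.

```python
import math

def T_integral(lam_qsq_over_t, eps, m=100000):
    """Midpoint-rule approximation of the T(q, lam)-type integral on (eps, 1/2]:
    integral of t^{-2} * exp(-lam * q^2(t)/t) dt."""
    a, b = eps, 0.5
    h = (b - a) / m
    total = 0.0
    for i in range(m):
        t = a + (i + 0.5) * h
        total += t ** -2 * math.exp(-lam_qsq_over_t(t))
    return total * h

# q(t) = sqrt(c * t * log(1/t))  =>  lam * q^2(t)/t = lam * c * log(1/t)
conv = [T_integral(lambda t: 3.0 * math.log(1 / t), 10.0 ** -k) for k in (2, 3, 4)]  # lam*c = 3 > 1
div = [T_integral(lambda t: 0.6 * math.log(1 / t), 10.0 ** -k) for k in (2, 3, 4)]   # lam*c = 0.6 < 1
# conv stabilizes as eps -> 0, while div keeps growing
```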
WEAK CONVERGENCE OF U_n IN ||·/q|| METRICS
An Application

(8)  T_n ≡ ∫_0^1 ∫_0^1 [U_n(t) − U_n(s)]² / ( |t − s|(1 − |t − s|) ) ds dt
        = ∫_0^1 ∫_0^1 n[G_n(t) − G_n(s) − (t − s)]² / ( |t − s|(1 − |t − s|) ) ds dt,

while

T ≡ ∫_0^1 ∫_0^1 [U(t) − U(s)]² / ( |t − s|(1 − |t − s|) ) ds dt.
Example 1.

|T_n − T| ≤ ∫∫_{D_1} | [U_n(t) − U_n(s)]² − [U(t) − U(s)]² | / ( |t − s|(1 − |t − s|) ) ds dt
   + ∫∫_{D_2} ( [U_n(t) − U_n(s)]² + [U(t) − U(s)]² ) / ( |t − s|(1 − |t − s|) ) ds dt
   = ∫∫_{D_1} |U_n(t) − U_n(s) − (U(t) − U(s))| · |U_n(t) − U_n(s) + U(t) − U(s)| / ( |t − s|(1 − |t − s|) ) ds dt + R_n.
Open Question. What is the distribution of T in (9)? What are the power properties of the test based on T_n?

Exercise 2. Let D_ij ≡ |ξ_i − ξ_j| for 1 ≤ i, j ≤ n. Show that the T_n of (8) may be written as

(10)  T_n = n + (2/n) Σ_{i=1}^n Σ_{j=1}^n { D_ij log (D_ij) + (1 − D_ij) log (1 − D_ij) }.
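The equality of the two integrand forms in (8) is just the definition of U_n; a quick Python check (ours, not from the text):

```python
import math

def G_n(xs, t):
    """Empirical df of the sample xs at t."""
    return sum(1 for x in xs if x <= t) / len(xs)

def U_n(xs, t):
    """Uniform empirical process U_n(t) = sqrt(n) * (G_n(t) - t)."""
    return math.sqrt(len(xs)) * (G_n(xs, t) - t)

xs = [0.1, 0.35, 0.6, 0.92]
n = len(xs)
for s, t in [(0.2, 0.7), (0.05, 0.5), (0.3, 0.95)]:
    lhs = (U_n(xs, t) - U_n(xs, s)) ** 2
    rhs = n * (G_n(xs, t) - G_n(xs, s) - (t - s)) ** 2
    assert abs(lhs - rhs) < 1e-12   # the two forms of the integrand agree
```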
Proof of Theorem 1. We first prove necessity: Suppose that (4) holds for
every E > 0. Choose and fix e > 0. Then
J ` (E 2 n - ' 1ogn)
CI = e 2 n - ' log n}
sup j q(ICI)
Thus εq*(t) ≡ ε[2(t log (1/t))^{1/2} + q(t)] is interval upper class for every ε > 0,
(d) P(Iwn/gjjW(b,.an)? e)
'/2
IQJn (C)i
P sup ,/2 _8 log
a
{g{an))
1 {C:b,sICIsa n } ICI ( .) /
Also
(g) P(IIU/g11w(0,e)? )^E for 0=some BE
by (17.1.4) and (17.1.5) with 8=0 since y? l/i(2 1 " 2 E/0). Applying (f)-(h) and
^n I supt LU (t)I
: t?1 -01 +sup j
IUn(s)1 :S^ B^
(
and likewise
fu >eI .
(k) P \II9^c^ e,n
(2)  Z_n(f) ≡ sqrt(n) ∫ f d(P_n − P) = (1/sqrt(n)) Σ_{i=1}^n [ f(X_i) − Ef(X) ]  for f ∈ F ⊂ C(𝒳).
INDEXING BY CONTINUOUS FUNCTIONS VIA CHAINING
Note that f(X_i) is just a bounded real-valued rv, and thus Ef(X) is well defined. Since the Y_i(f) are bounded a.s., the Y_i's and hence also Z_n can be viewed as random elements with sample paths in the separable metric space (C(F), || ||), where || || now denotes the sup norm on the set C(F) of all real-valued continuous functions on F.
Easy calculations show that
(3) E7L„(f)=0
and
this follows from the hypothesis (8) in Theorem I below and Dudley, 1973,
Theorem 1.1.]
Exercise 1. In the case 𝒳 = [0, 1], P = Uniform (0, 1), show that for the family of (discontinuous) functions F = {1_{[0,t]}: 0 ≤ t ≤ 1}, Z_n(1_{[0,·]}) = U_n and Z(1_{[0,·]}) = U, where U_n and U are as in Chapter 3.

To use Theorem 2.3.3 to show that Z_n ⇒ Z, we need to show that {Z_n} is tight; and this will hold if F ⊂ C(𝒳) is not "too big."
To make "too big" precise, for any metric space (S, ρ), let

(7)  N(ε, S, ρ) ≡ inf { k: S ⊂ ∪_{i=1}^k V_i with diam (V_i) ≤ 2ε },

where the V_i are Borel subsets of S. Then H(ε, S, ρ) ≡ log N(ε, S, ρ) is called the metric entropy of (S, ρ). In our present context, N(ε, F, || ||) is a measure of the "bigness" of F. The hypothesis (8) in the following theorem guarantees that H (and hence N) grows sufficiently slowly as ε ↘ 0.
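For the simplest metric space the covering number of (7) is explicit: to cover S = [0, 1] with sets of diameter at most 2ε one needs exactly ⌈1/(2ε)⌉ intervals, so H(ε) = log ⌈1/(2ε)⌉ grows like log (1/ε). A small sketch (ours, with hypothetical function names):

```python
import math

def covering_number_unit_interval(eps):
    """N(eps, [0,1], |.|): the fewest sets of diameter <= 2*eps covering [0, 1]."""
    return math.ceil(1.0 / (2.0 * eps))

def metric_entropy_unit_interval(eps):
    """H(eps, [0,1], |.|) = log N(eps, [0,1], |.|)."""
    return math.log(covering_number_unit_interval(eps))

print(covering_number_unit_interval(0.25))  # two intervals of length 1/2 suffice
```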
then

(11)  M ε^{−k/r} ≤ H(ε, F_r, || ||) ≤ M′ ε^{−k/r}  for all ε > 0.

We will not prove Proposition 1; see Dudley (1983) for a proof, references, and discussion.

Since (11) implies that (8) with F = F_r holds if r > k/2, the following corollary is an immediate consequence of Theorem 1.

Corollary 3. If 𝒳 = [0, 1], P = Uniform (0, 1), and F = Lip (r) with r > 1/2, then U_n ⇒ U as n → ∞ in (C(F), || ||).
f(mx)/m k/2 xE C1
.%(x)= t
o XE[0, I) k n C;.
f 2(mx) dx = Zk fzdP.
E[Z(.%)] 2 = f f dP = I
c, m m f[o,,]k
But
max{I7Lt ^^s'j'
1 I:S =^1}=maxj^ y s 7L( ;):S =f1}
' ' - j . ;
mk
= i =1 IZ(I)1
mk
= m - '` F, jZ; j where Z; are iid N(0, c 2 )
i =l
-+ a.s. c 2/ 1r as m - ► oo,
while
as m -> O,
Il; l S' f II 2m'`^ z
_ y, then
A preparatory exponential bound is the starting point: if AIf —g,41 <
Zn(gr) — (1,(gm)1n(gm-f))I
m=r+l
m=r+l
by (c)
^ N ( 2 -r-a , `fir, II II )2 sup P()Zn (fr) —7tn (gr)I > dr)
l IIf,-g,Il 2 • -
by (b) with y = 2 - m
2
Now
log 3—log d,,, <_ log 3+2 log m <_ 4mm -a/20
(f) :4' dm/20<4' dm, 0-0 .
Hence
Thus (d) is bounded by 2 ZM =r dm < E, and this completes the proof of (a).
To show that {7L „} is tight, let 0< < 1, choose E = e k = 773 -k, 8 k = S ek for
k ? I in (a), and set
Now if -9, is a finite set dense within 3, in ^, we can choose M so large that
(h) P(sup[IZ,(f)l:fE^,]>M)^2
for all n since E7L„(f) 2 = EY;(f) < oo. But then, with
the set K = A. n B,1 is compact by Arzela-Ascoli, and by (g) and (h) we have
Thus the laws of {7L„} on C(-q) are tight, and this together with (6) implies
(9) by Theorem 2.3.3. ❑
CHAPTER 18

The Standardized Quantile Process Q_n

0. INTRODUCTION
(3) O.,(t)=Vn(t)•
for V. represented in terms of iid Exponential (1) rv's a 1 ,.. . , a,, 1 ,...
... and
(t) )
(5) T(q,A)= t -'exp( t dt forgEQ*,
(6) asn-► co
if and only if T(q, A) <x for all A >0
then
V. 91o2 g n
(10) I m lb,:52 a.s. fora„=
I(1 —I )°n
for the special construction of Theorem 3.1.1. We now fix a df F and define sample quantiles. For 0 < p < 1 we let X_{n:p} denote X_{n:k_n} for any integer k_n satisfying k_n/n → p; we call any such X_{n:p} a pth quantile, while X_{n:(np)+1} is called the sample pth quantile.
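In code the definition reads as follows (a sketch of ours; we take k_n = ⌈np⌉, one admissible choice with k_n/n → p):

```python
import math

def pth_quantile(xs, p):
    """A pth sample quantile X_{n:k_n} with k_n = ceil(n*p), so that k_n/n -> p."""
    n = len(xs)
    k = max(1, math.ceil(n * p))
    return sorted(xs)[k - 1]

xs = [5.0, 1.0, 3.0, 2.0, 4.0]
print(pth_quantile(xs, 0.5))   # k_5 = 3: the third order statistic, 3.0
```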
Suppose 0 < p_1 < ··· < p_K < 1 and F^{−1} is differentiable at each p_i. Then
(a)  sqrt(n)(X_{n:p} − F^{−1}(p)) = sqrt(n)(F^{−1}(ξ_{n:k_n}) − F^{−1}(p))
      = [ (F^{−1}(ξ_{n:k_n}) − F^{−1}(p)) / (ξ_{n:k_n} − p) ] sqrt(n)(ξ_{n:k_n} − p)
      ≡ A_n sqrt(n)(ξ_{n:k_n} − p)
      = A_n [ V_n(k_n/n) + sqrt(n)(k_n/n − p) ],

using (3.1.61) on the first term and using the continuous sample paths of V and k_n/n → p on the second. The difference quotient A_n satisfies

(d)  A_n →_{a.s.} DF^{−1}(p)  as n → ∞.
(e)  |ξ_{n:k_n} − p| ≤ |ξ_{n:k_n} − k_n/n| + |k_n/n − p| ≤ ||G_n^{−1} − I|| + |k_n/n − p| →_{a.s.} 0,

using the Glivenko-Cantelli theorem (Theorem 3.1.3). Plugging (c) and (d) into (b) gives, for the special construction,
The limiting rv in (6) clearly has the claimed normal distribution. This completes the proof for K = 1.

For K > 1 we simply observe that a random vector converges a.s. if and only if each coordinate does.

We have actually established a.s. convergence in (6). This implies that the weaker →_d of (3) holds for a pth quantile of this, hence for any, triangular array of row-iid F rv's (not just for the special construction). ❑
distribution centered at 0 with different N(0, p(1 − p)/[f(F^{−1}(p))]²) densities for the several coordinates. This format is slightly handier when we extend from the vector situation of (3) to a process.
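The asymptotic variance p(1 − p)/[f(F^{−1}(p))]² is easy to evaluate; for example, for the median of a standard normal sample it equals (1/4)/φ(0)² = π/2 (a sketch of ours):

```python
import math

def asymptotic_quantile_variance(p, f_at_quantile):
    """Variance p(1-p)/f(F^{-1}(p))^2 of the limiting normal law of
    sqrt(n)(X_{n:p} - F^{-1}(p))."""
    return p * (1.0 - p) / f_at_quantile ** 2

phi0 = 1.0 / math.sqrt(2.0 * math.pi)   # standard normal density at its median 0
print(asymptotic_quantile_variance(0.5, phi0))  # ~ pi/2 = 1.5708...
```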
If F has density f, then the standardized quantile process 0„ is defined by
WEAK CONVERGENCE OF THE QUANTILE PROCESS Q_n

Theorem 1. (Hájek; Bickel) Let 0 ≤ a < b ≤ 1. If g ≡ f(F^{−1}) is positive and continuous on an open subinterval of [0, 1] containing [a, b], then the special construction of (1) and (2) satisfies
we have that

(15)  sup_{a ≤ t ≤ b} |g(t)/g(t_n*) − 1| →_{a.s.} 0.

Thus (15), ||V_n − V|| →_{a.s.} 0, and the triangle inequality applied to (13) give
Extension to ||·/q|| Metrics

such that

(18)  F has a continuous density f that is positive on some interval (c, d) ⊂ (−∞, ∞) having F(c) = 0 and F(d) = 1.

(19)  If f(c) = 0 (if f(d) = 0), then f is ↗ (is ↘) on some interval whose left (right) endpoint is c (is d).
The monotonicity of (19) combined with (18) implies that for t sufficiently small g(t)/g(t_n*) is bounded by g(t)/g(bt) so long as ω ∈ A_{nε}, where g ≡ f(F^{−1}). Define

(20)  q_b(t) ≡ q(t) g(bt)/g(t)  for 0 < t ≤ 1/4,
       q_b(t) ≡ q(t) g(1 − bt)/g(1 − t)  for 3/4 ≤ t < 1,

with q_b defined by linear interpolation, and continuous, for 1/4 ≤ t ≤ 3/4.
(a)
q g(t *n) q l
[ g(t) ] 4l-. -4/ + g(t) —1 ^0/
g(l*n)
Then for a = a_ε > 0 chosen so small that g is ↗ on (0, a) [see (19)], we have
(b) lnE
1
:(t) — V(t) g(t) 1V^(t) — V(t)j + 2 g(t) IV(t)I
for0t<a
q(t) J g(bt) q(t) g(bt) q(t)
(d) V
(V q ;
IV) 1 0 0.
Also Inequality 11.4.1 implies that a = a E can be supposed to have been chosen
so small that
v
I<_e.
(e) q
Thus for n >_ some NE' we have from applying (d) and (e) to (c) that
)
l flo
(h)
9
P( (Q "
1 >_3e) <_3e
1 ^a
'
for n>some N.
P! (fin)
(i) sl<_e for n>_ some NE's.
9 0 /
This is just (21). See Shorack (1972b) for the first real extension of
Theorem 1 to (0, 1). The same key methods give this theorem [note Shorack
(1979)]. However, O'Reilly's (1974) result was needed to make things work
at full power. ❑
A condition that is often easy to work with was discovered by Csörgö and
Revesz (1978). Given their Lemma 1 below, it will be trivial to check that
Theorem 3 follows from the proof of Theorem 2. Suppose f exists on (c, d)
and satisfies
Theorem 3. Suppose F satisfies (18) and (22). Then the special construction
of V. given in Theorem 3.1.1 satisfies
for all q satisfying either (23) or (24). Here Q'n equals Q. on [1/(n+l),
n/(n+1)] and 0 elsewhere.
It suffices to consider the case when F is N(0, 1). Then f′(x) = −xf(x), and the Mills' ratio result A.4.1 shows that (22) holds with M = 1. It also holds that
Exercise 2. See if (18) and (22) are satisfied for all (i) Uniform and (ii)
Cauchy df's.
(29)  g(s)/g(t) ≤ [ (s∨t)(1 − (s∧t)) / ( (s∧t)(1 − (s∨t)) ) ]^M  for all 0 ≤ s, t ≤ 1.

(a)  |(d/dt) log g(t)| ≤ M/t + M/(1 − t) = M (d/dt) log (t/(1 − t)),

so that for s ≤ t

(b)  log g(t) − log g(s) ≤ M[log (t/(1 − t)) − log (s/(1 − s))] = M log [t(1 − s)/(s(1 − t))];
Proof of Theorem 3. Now for 0 < t ≤ 1/2, on the set A_n of Inequality 10.4.1 we have from (29) that

(a)  g(t)/g(t ∧ G_n^{−1}(t)) ≤ C b^{−2M}

for a constant C. Just replace the bound g(t)/g(bt) in line (b) of the proof of Theorem 2 by the bound C b^{−2M} just developed in (a), and note that the rest of that proof then carries through with q replacing q_b. ❑
(3)  f is ↗ (is ↘) in some interval with left end c (right end d).
and M is as in (2). (We stress that this theorem holds for any iid sequence.
No mention is made of constructions.)
Proof. We follow Csörgö and Revesz (1978a). Now an application of Taylor's theorem shows
(8) =------dn(t)g(t)
V(t)g( g2
fn
2v 9 ([ )
(t*)
1 N „([) t(1— t) *It (1—t)
* S (t *) S(t)
(a) =[ - 2^ t(1 —t)JLt *(1—t*)J g(t * )J 8(t * )
We now define

(c)  a_n ≡ 9(log_2 n)/n.
Recall from Theorem 16.4.1 that the first bracketed term in (a) satisfies
1 N2 ' an / 1 092 n
-
Consider now the second bracketed term of (a). Now by (10.3.10), for λ ≥ 1 we have

A_n ≡ [G_n^{−1}(t) is ever ≤ t/λ on [a_n, 1]] = [G_n(t) is ever ≥ λt on [a_n/λ, 1]] ≡ B_n;

and the lim sup of G_n/I on [a_n/λ, 1] equals β_{9/λ} by (10.5.12), with β_{9/2} < 2 for λ = 2 by Figure 10.8.1. Choosing λ = 2 we thus have

0 = P(B_n i.o.) = P(A_n i.o.);

and thus

(e)  lim sup_n ||t/t*||_{a_n} ≤ 2 a.s.
(g) I t*(1—g((*)
t*)S'(t*
an
II'-an )
by assumption (2).
The fourth bracketed term of (a) satisfies
a,\ M
by Lemma 18.1.1
Ilg( t *)Ila, an— \IItA t * 1 — (tV t * )„
( ' Ila
(h) —
< 2 M using (f) and Exercise 1.
n - M2"" 91og 2 n
(9) lim ^^^n ^n ^^ ö„ a" logt
— a.s when a n
n- 0 n
under hypotheses (1)-(3). Thus, since (0, a n ] and [1—a n , 1) can be treated
symmetrically, the theorem will follow if we can show
a log 2 n ^
(10) li m +IV 1i 0 18 a.s.
9 (logz n)
(j)IVn(t)1 `.ant <—.man <-
✓n
Equation (i), and thus the theorem, will follow once we show that
(1) li,m IjQ„ II0 - = o(r„) a.s. for r„ as in (6a)-(6c).
If t <_e„ :( „, ) , then we note from (11) and (3) that for n large enough
M M ( 1 \
<_(1—a„) - ,t t ds
SM)
^() (
M
(1 — a„) —
n ifM<1
1— na
(n) Wit)
log \ if M = 1
— (1 —a„
—M M
(1 M n 1 •^ t M-1 ifM>1
S n:(nt)
0( 1og2n )
ifM<1
(o) n ) O( lo g 2 n t ifM =1
< I log(a
e J n:l
^ ^
v n /
Jr an M-1
_j Or lo^nl
iif M> 1
` J
(p) = o(rn)
log (an/en:1)
(r) lim = 1 a.s.
n-= l ogt n
and
(s) Q
r " = O((log n)) for alle > 0.
Sn:l
This completes the proof. We note that our (s) gave us a slightly better rate
than Csörgö and Revesz obtained in case M> 1. ❑
Exercise I. Use (10.3.10) with A =2 and the ß9 x limit of (10.5.12) and Figure
10.8.2 to show, in analogy with the proof of (e) in the previous theorem, that
lim n —,!It/tII ,,2 a.s.
Exercise 2. Show that if 0 < f(c) < ∞ [if 0 < f(d) < ∞], then the monotonicity of f in a neighborhood of c (of d) can be dropped in Theorem 1. [Begin with line (11) of the proof.]
The following applications of Theorem 1 are also from Csörgö and Revesz
(1978a) and (1981).
(a) Uri+On=(Un+vn)+(On—Vn)
(a) on—vn+on—vn
b„ b„ b„
(14) li m io n +0$„ ^ ^
1
log n) 2 (log 2 n) „
where
(15) Bn=^(^')=U
where the middle term gives the rate. See Theorem 1, Theorem 15.1.2, and
Theorem 12.1.1 for the rates of the three terms in (a). ❑
When (12) and Exercise 2.9.1 are combined with Finkelstein's exercise
(Exercise 13.3.1) and Mogulskii's exercise (Exercise 13.5.1) they yield
J0(r) dt 1 f' 1 Z
(16) lim bZ = ^2 a.s. and lim b„ J „(t) dt =4 a.s.,
n+0
n o
Thus Theorem 1 leaves little room for improvement (see Csörgö and Revesz,
1981, p. 1953).
(18)  f(ct)/f(t) → 1  as t → 0.
If
Moreover, if a_0 ∨ a_1 < 0, then all absolute moments are finite; while if a_0 ∨ a_1 > 0, then
Exercise 5. (Mason, 1982) Suppose F^{−1} is continuous. Then for each r_1, r_2 > 0
Doksum (1974) indicates that if the X_i's are controls and the Y_j's are
(3)
_
- g°G -^ °F
mit
m +n(m„
D) on(-,);
mit
(4) ®mn— g °G'°F m "i:m - G -`°IFm +G -' °Fm-G'°F]
+n[Gn
_ , _,
° m - G (^m)]
-g°G °F m +n[G
G -I ( F m )- G '(F) mit -
+ g ° G ' °F [i=m-F]
t=m -F m+n
(5)m+n
On(F)+vm(F);
and
using Exercise 2.2.6. The following theorem is now routine; see Doksum (1974)
for an early version without a q function on an interval [a, b] F [0, 1], and
Aly (1983) for this version.
mn(F`)Wmn]
II -*„0 as m n n-+oo;
(11) (I[® q
Let
m n
(12) Hmn= m+n Fm + m+n f'n
(13)  T_mn ≡ sqrt(mn/(m + n)) sup_x |F_m(x) − G_n(x)| / q(H_mn(x)),  with q(t) ≡ sqrt(t(1 − t)).
Now note that (using Section 1.1)
G (0 +I)=G„(G ' ° F) -
i n in
= nl ^ 1 [Y G - '-F( ) ] = n j L t 1[F- '(G(yj)),5.] a.s.
(14) = t„(= the edf of a sample of size n from F).
(15)  1 − α = P_{F=G}( sqrt(mn/(m + n)) sup_{a ≤ H_mn(x) ≤ b} |F_m(x) − G_n(x)| / [H_mn(x)(1 − H_mn(x))]^{1/2} ≤ K_mn )

defines K_mn.
Define

N = m + n,  λ = m/N,  c = K_mn²/N,  u = F_m(x),  v = G_n(x).
Since the coefficient of v 2 in d (v) is positive, d( v) < 0 if and only if v is between the
two real roots of d(v) =0. These roots are
1+c(1—A)2
ASYMPTOTIC THEORY OF THE Q-Q PLOT
when G_n^{−1} is the usual left-continuous inverse, and the right-continuous inverse is defined by sup {x: G_n(x) ≤ t}. ❑
Of course,
This is from Doksum and Sievers (1976); see Switzer (1976) for q ≡ 1.

Let (x) denote our usual greatest integer less than or equal to x, and let «x» denote the least integer greater than or equal to x. Then the band (20) can be rewritten as

on a ≤ F_m(x) ≤ b (use X_{n:0} ≡ −∞ and X_{n:n+1} ≡ +∞ if need be). The W band (q(t) = sqrt(t(1 − t))) of Doksum and Sievers (1976) and the S band (q(t) ≡ 1) of Switzer (1976) are shown in Figure 1 (from Doksum and Sievers, 1976). They found that the W band performed well in the tails, and recommend it.
The width of the W band at x, when multiplied by sqrt(mn/(m + n)), is equal to [see (20)]
2K F(x)[1— F(x)]
(24) ^ p , uniformly on [a, b]
g° G - (F(x))
Figure 1. Synthetic data: the level 0.95 S band (solid line) and the W band (dashes).
provided
+ rn +n [G'
Note that the first term in (26) is zero. The second term in (26) results when the lead terms u in both the h^+ and h^− of (18) cancel as h^+ − h^− is formed, and sqrt(mn/(m + n)) times the [±4cu(1 − u)]^{1/2} term of both h^+ and h^− contributes K_mn sqrt(F(1 − F))/g∘G^{−1}(F); the other terms in h^+ and h^− all wash out since c → 0.
Exercise 3. Derive formulas analogous to (22) and (24) for the S bound. In
particular, show that in case q(t) = I for all t, then
[i(x), llmn(x)) =
g°
Of course K_mn(1 − α) and K(1 − α) now refer to percentage points of sqrt(mn/(m + n)) ||F_m − G_n|| and ||U||, respectively.
Note that we cannot hope to estimate F^{−1}(t) for t > F(τ) since all the observations are ≤ τ. Assuming that F has density f, the standardized product-limit quantile process is
where B = D(F '); see (20.4.12) for the definition of X. Thus V is a mean-zero
-
with
J [a,:]
(1— u) -2 (1— G(F '(u))) ' du.
- -
(a) L(F-')=V'[tn(F-')—I]
satisfies
(b) 1Ivn(F-')—X(F-')IIö-*a.5.0 as n r
for θ < F(τ). By Vervaat's (1972) lemma (see Exercise 1 below), (b) implies that
satisfies
where V —X(F - '). But, by (d /dt)F - '(t) = 1 /g(t) and the mean-value
theorem, it follows that
Vn = —
F, ( gYn
F(i -) — I
(f) 1
11'n—Yn11öC g(In) -1 , 11 ly 11 _...0 n 0
by (d) and the continuity and positivity of g. Combining (d) and (f) yields
where y is a continuous function on [0, 1]. If x_n^{−1}(t) ≡ inf {u: x_n(u) ≥ t}, show that
CHAPTER 19

L-Statistics

0. INTRODUCTION

(1)  T_n ≡ (1/n) Σ_{i=1}^n c_{ni} h(X_{n:i})
asymptotic normality of T_n, as well as a functional CLT and LIL for both T_1, ..., T_n and T_n, T_{n+1}, .... The proofs are all contained in Section 4. General examples are given in Section 2, and examples built around randomly trimmed and Winsorized means are considered in Section 3.
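The basic object (1) is simply a weighted average of order statistics; a minimal Python sketch (ours, not the book's notation):

```python
def l_statistic(xs, c, h=lambda x: x):
    """T_n = (1/n) * sum_i c_{n,i} h(X_{n:i}): known weights c applied to the
    order statistics of the sample xs, through a known function h."""
    n = len(xs)
    return sum(ci * h(xi) for ci, xi in zip(c, sorted(xs))) / n

xs = [4.0, 1.0, 3.0, 2.0]
print(l_statistic(xs, [1.0, 1.0, 1.0, 1.0]))   # all weights 1 gives the sample mean, 2.5
print(l_statistic(xs, [0.0, 2.0, 2.0, 0.0]))   # an interior-weighted, trimmed-mean-type statistic
```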
Let c_{n1}, ..., c_{nn} denote known constants and let h denote a known function of the form h = h_1 − h_2 with each h_i ↗ and left continuous. Consider

(1)  T_n ≡ (1/n) Σ_{i=1}^n c_{ni} h(X_{n:i})

and

(2)  T̃_n ≡ ∫_0^1 h(F_n^{−1}) J_n dt = ∫_0^1 h(F_n^{−1}) dΨ_n,
STATEMENT OF THE THEOREMS
where we define

(5)  μ_n ≡ ∫_0^1 h(F^{−1}) J_n dt = ∫_0^1 h(F^{−1}) dΨ_n.
where
for iid Uniform (0, 1) rv's ξ_1, ..., ξ_n. Let G_n and U_n denote the empirical df and empirical process of ξ_1, ..., ξ_n. Define
Then [we will verify step (11) below in the next section]

(11)  T_n − μ_n = ∫_0^1 g(G_n) dΨ_n − ∫_0^1 g dΨ_n

(12)  ≐ −∫_0^1 [G_n − I] J dg  (note the approximation sign)
      = −(1/n) Σ_{i=1}^n ∫_0^1 [ 1_{[ξ_i ≤ t]} − t ] J dg

(13)  ≡ (1/n) Σ_{i=1}^n Y_i.
Thus, the key to this approach is to control the size of the error made in using the approximation of step (12). That is, we need to control the error

(14)  γ_n ≡ T_n − μ_n − (1/n) Σ_{i=1}^n Y_i = T_n − μ_n − Ȳ_n.
The Assumptions
where
and
(19)  |h_i(F^{−1})| ≤ D on (0, 1) for i = 1, 2,
where

(20)  D(t) ≡ M t^{−d_1}(1 − t)^{−d_2}  for 0 < t < 1, with any fixed d_1, d_2.

(For a CLT or LIL, we will require a < 1/2; while a < 1 will yield a SLLN.)
integral ∫ · dg; integration with respect to the total variation measure will be denoted by ∫ · d|g|.
Remark 1. Suppose g ≥ 0 is ↘ and has ∫_0^1 g(t) dt < ∞. Then tg(t) ≤ ∫_0^t g(s) ds → 0 as t → 0. Thus for h_i ↗,

(22)  E|h_i(X)|^r < ∞

implies

(24)  ∫_0^1 [t(1 − t)]^r B(t) d|g|(t) < ∞  if r > a.
Thus the rv Y of (13) satisfies

(26)  EY = 0  if a < 1.
All proofs for this section are grouped together in Section 19.4.
ASSUMPTION 2. (Smoothness) Except on a set of t's of |g|-measure 0, we have both: J is continuous at t, and J_n → J uniformly in some small neighborhood of t, as n → ∞.
ASSUMPTION 2'. (Smoothness) Suppose
satisfying

(31)  c_{ni} = J(t)  for some t ∈ [(i − 1)/n, i/n], 1 ≤ i ≤ n.

[The interval in (31) could be widened to width M/n, but this should be sufficient.]
ASSUMPTION 2″. (Smoothness) Suppose J is Lipschitz, and suppose the c_{ni} satisfy (31). (This is a special case of Assumption 2.)
(ii) Suppose Assumptions 1 [recall (23)] and 2′ hold with a < 1/2. Then
0
and
(36)  sqrt(n)|μ_n − μ| ≤ M[E|h_1(X)| + E|h_2(X)|]/sqrt(n).
(iv) If J equals 0 on [0, a] and [b, 1] for some 0 < a < b < 1, then Assumption 1 may be omitted entirely in (i)-(iii), while Assumption 2, 2′, or 2″ need only hold on [a, b].
(v) In all six cases above, the rv ∫_0^1 J U_n dg satisfies

(37)  ∫_0^1 J U_n dg ⇒ ∫_0^1 J U dg
or
(39) '/i( bn — µ) ms [—u, a]
as. asn-400
n
Theorem 3. (SLLN) Let Assumption 1 hold, but now with a < 1. Then

(40)  T_n − μ_n →_{a.s.} 0  as n → ∞,

or

(41)  T̃_n →_{a.s.} μ  as n → ∞.
k(Tk —µ k ) k-1 k
(42) 7L„(t)= for <t—,1<_k<_n,
fo, n n
1(1)_ T Irk) n
(43) for n <t-- k?n
U k+1 k'
with ß l„(0) = 0.
(44) Z Sandi'Son(D,9,^^
= 11) asn-oo
In Chapter 2 we showed that both (44) and (45) hold for the partial-sum
processes of the Yk 's of (13) defined by
1 (n!) Yk I
(46) S„ - 1 0' (the past)
and
(ii) (LIL) Let b_n ≡ sqrt(2 log_2 n). We have the functional LIL for the past
(54) max
ka:n
6J yk as ^
.
0 as n .
(55)  γ_n →_{a.s.} 0  as n → ∞.
Remark 2. Consider the special case of (1) defined by

(56)  T_n ≡ (1/n) Σ_{i=1}^n c_{ni} h(X_{n:i})  where c_{ni}/n ≡ ∫_{(i−1)/n}^{i/n} J(t) dt.
so that

(57)  T̃_n = ∫_0^1 h(F_n^{−1}) J_n dt = ∫_0^1 g(G_n) J dt = ∫_0^1 g dΨ(G_n).
Thus for the special scores c,, ; of (56) we have µ„ = p, and (33) could be of
direct interest to us in this case.
(60)  c_{ni} = J(t)  for some t ∈ [(i − 1)/n, i/n], 1 ≤ i ≤ n.
Then
(iii) If J is Lipschitz, E|h_i(X)| < ∞ for i = 1, 2, and (60) holds, then we can even claim that
Remark 3. A careful look at the proofs shows that extending from J_n → J to a random function Ĵ_n → J can typically be adjusted for in the proof of asymptotic normality with only minor changes.
Better Rates of Embedding for T̃_n, with Extensions to T_n, Based on the Hungarian Construction

To establish the following less important result, we require the special scores (56) used in T̃_n, as well as the added smoothness Assumption 2′ on J.
Remark 4. Let Y denote a (0, σ²) rv. Komlós et al. (1975) showed that if Y has a finite moment generating function in a neighborhood of 0, then there exist iid rv's Y_1*, ..., Y_n*, ... distributed as Y and a Brownian motion S on [0, ∞) for which

This is the best possible rate of embedding. If we only have E|Y|^r < ∞ for some r > 2, then the rhs of (66) is to be replaced by O(n^{−(1/2)+1/r}) a.s. If we only have EY² < ∞, the rhs of (66) can be replaced by either o_p(1) (for a functional CLT) or o(b_n) a.s. (for a functional LIL).
Remark 5. If we plug (64) and (66) into (48), we see that if Assumptions 1 and 2′ hold with a < 1/2, then
(nt)(T(n,)-p. ) S(nt)
lim sup I
n -+coOstsl ^o, vi
Y* _ S(nt)
I a.s.
,ntt
n(67) -= i ii su p S a.s.
n-.ao
Otss l .'/ j Ø
for the KMT construction of rv's Y_1*, ..., Y_n*, ... of (13); this holds since (64) implies that max_{1≤k≤n} k|γ_k|/sqrt(n) = O(b_n/sqrt(n)) = o((log n)/sqrt(n)), which is better than the best possible rate O((log n)/sqrt(n)) of (66). The appropriate rate for the rhs in (67) is obtained from the Y_i*'s of Remark 4. The key point is that the rate is determined solely by the Y_i's of (13) or (25), and not by the γ_i's. If we wish to replace T̃_n in (67) by T_n, then we also use (62) to determine the appropriate rate that is possible. [Equation (65) is just an unused observation.]
A Brief SLLN
If h is the identity function, then for r, s> 1 having r - ' + s - ' = 1 we have
<_ [Jo IF
(69) 0 a.s. if h = I
(70)  lim sup_n (1/n) Σ_{i=1}^n |c_{ni}|^r < ∞ a.s.  and  E|X|^s < ∞ with 1/r + 1/s = 1.
Note that this does not require any limiting function J to exist. For the proof,
merely apply the obvious analog of (2.6.4) or (2.6.8).
Estimates of Location
Suppose
J
1
(7) = N(0, u 2 )
(12) f
%(N) l/2dU =i(µ) 0 1[1
/2,l]dU
( ❑
(13) N 0'4l2(µ)/.
Example 3. (The sign test) The sign statistic (centered at μ) can be represented as
F
(15)
%n (X„ µ x ) a 1
fo ,
-'dOJ
provided σ² < ∞ and F has derivative f as in (11).
' dU
fl/2
Note that the covariance of the limiting ry is (let h, = F ' with h, = µ, let
-
Suppose F has density f on (−∞, ∞) with finite Fisher information for location I_0(f). As an alternative to the P_n of (1) we consider
i 1 f(r1)
and according to Theorem 4.5.1, when the null hypothesis P n is actually true,
where
(21) Zn _ —1 Y- f (En;) = 1
vI1 i =I JJ
E ß0(m1)
(E V ^! 1=1
with 1O
0
F1
Jo F —
(22) =Zoe= f 1 ßo dU
a
=— J O x df(x)=xff + J ^ f(x) dx
(24) =1
and
1 1
Coy[Tmed, Z1oc]_ ,' 4odt=— f(x)dx
f(l-^)f 12 f(L) Jµ
(25) =1.
Thus Theorem 4.1.4 (Le Cam's third lemma) gives

(26)  sqrt(n)(X̄_n − μ) →_d N[·, ·]  as n → ∞ under Q_n,

provided σ² < ∞ and I_0(f) < ∞, for the asymptotic joint behavior. ❑
SOME EXAMPLES OF L-STATISTICS
Suppose tests of Ho : Pn vs. H,: Qb, b> 0, can be based on either of two
test statistics. How do we define the asymptotic efficiency? Pitman efficiency
is described as the limit of the ratio of the sample sizes that produces equal
asymptotic power for the same sequence of alternatives. Suppose
Then if the T. and T„ tests are compared, equal alternatives requires [see
(18), and note (20)]
b b
be be
T T
Thus the Pitman efficiency of the Tn -test with respect to the Ta -test is
Example 5 (cont.) The Pitman efficiency of the median test with respect to
the mean test is thus
using (20)
]_
2 2
[
JO z \. l 7 n j ^lJ
1 ^(.n
z z
(34) ^ I Z^ (f) — Iö Z T — Z^ ( f, if T„ (some T) ° T(U).
)
(35) Tn =Z/I
n
0 (f), typically, for the MLE.
(36) Z 0 /I0(f)
However, if we used a χ_1² percentage point, then (34) shows that our test is asymptotically conservative; it gives us a representation of the correct asymptotic distribution that assumes only that I_0(f) < ∞ and that our estimate θ̂_n satisfies sqrt(n)(θ̂_n − θ) ⇒ T ≡ T(U) under P_n. ❑
Example 7. (The linearly trimmed mean) Let 0 < a < 1/2 be fixed, and let a_n ≡ (na). For n even (we omit n odd) we define
(
(37)
7'o
'n
1
— L.
‚i/2
n
i— a —1/2 / 1 _.J!
a\ 2
(Xn:i+X,:n —i +l)
(_
n i_ a . +I n 2n
and
f. (i— a )z
as in Crow and Siddiqui (1967). Then
Figure 1. J for the linearly trimmed mean.
2
where
r
4o(s) ds dg(t)
(41) =1 ifI0(f)<oo.
°
Thus, provided I (f) <oo, we have
(44)  sqrt(n) [ T_{Ln} − ∫_0^1 J(t) F^{−1}(t) dt ] ⇒ T_L ≡ −∫_0^1 J U dg
Cov[T_L, Z_loc] = ∫_0^1 J(t) dt  if I_0(f) < ∞

(45)  = 1  if we have location equivariance: (1/n) Σ_{i=1}^n c_{ni} = 1.
(47) STL, m*a"=Var[TL]
(1/n) Σ_{i=1}^n J(i/(n + 1)) X_{n:i}. Derive the asymptotic form and covariance with the Z_loc of (22). Specialize to (i) logistic H, (ii) Cauchy H, (iii) Student's t_m df H, and (iv) the extreme value H.
Other Examples
i n
- n
(— n+2i -1)X^:i \2)
(
(49) _— I
in
Then Theorem 19.1.1(iii) gives (assuming E|X|^{2+δ} < ∞ for some δ > 0)
(50)  sqrt(n)(T_n − μ) →_d ∫_0^1 2(1 − 2t) U(t) dF^{−1}(t) ≡ N(0, σ²)

with

(51)  σ² = ∫_0^1 ∫_0^1 4(1 − 2s)(1 − 2t)(s∧t − st) dF^{−1}(s) dF^{−1}(t),
since

(52)  μ = ∫_0^1 h(F^{−1}) J dt = −2 ∫_0^1 F^{−1}(t)(1 − 2t) dt
        = 2 [ ∫_0^1 F^{−1}(t) t dt − ∫_0^1 F^{−1}(s)(1 − s) ds ]
        = 2 [ ∫_0^1 ∫_0^t F^{−1}(t) ds dt − ∫_0^1 ∫_s^1 F^{−1}(s) dt ds ]
        = 2 ∫_0^1 ∫_0^t [F^{−1}(t) − F^{−1}(s)] ds dt
        = ∫_0^1 ∫_0^1 |F^{−1}(t) − F^{−1}(s)| ds dt ≥ 0.
k
[(p-,(3i-1 [(D - , ] k und (D -t
3n+1), Xn:i-EXk+tl =-(
n t J ° Jo
t
( 53 ) _ Ud[V-t]k +t = N(0. 1 [ EX zk +i_ (EXk+t) 2
]
k+1 o (k +l)2 /
for integers k ≥ 1. Show this. [Shorack (1972a) missed the 1/(k + 1)² in (53).]
Example 10. Let X_1, ..., X_n be iid Bernoulli (θ) rv's. Then g(t) ≡ F^{−1}(t) equals −∞, 0, 1 according as t = 0, 0 < t ≤ 1 − θ, 1 − θ < t ≤ 1. Let J(t) equal 0 or 1 as 0 ≤ t < 1/2 or 1/2 ≤ t ≤ 1, and let c_{ni} = J(i/n). Then T_n ≡ (1/n) Σ_{i=1}^n c_{ni} X_{n:i} equals 1/2 if more than half of the X_i's are positive, while T_n equals the proportion of positive X_i's if less than half of the X_i's are positive.
(i) Suppose θ = 1/2. Then sqrt(n)(T_n − 1/2) is asymptotically distributed as a rv having a N(0, 1/4) density on (−∞, 0) and having point mass 1/2 at 0. Note that J is not continuous a.e. |g|; hence the hypothesis of Theorem 19.1.1 fails.

(ii) Suppose θ < 1/2. Then sqrt(n)(T_n − θ) is asymptotically N(0, θ(1 − θ)) by Theorem 19.1.1.

(iii) Suppose θ > 1/2. Then sqrt(n)(T_n − 1/2) is asymptotically degenerate at 0 by Theorem 19.1.1.
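A quick numerical check of this trichotomy (a sketch of ours; we use the index convention c_{n,i} = 1 for i > n/2, an assumption that makes the 1/2 cap exact for even n):

```python
def T_n_bernoulli(xs):
    """T_n = (1/n) * sum of the upper half of the order statistics of a 0/1 sample."""
    n = len(xs)
    order = sorted(xs)
    return sum(order[n // 2:]) / n

print(T_n_bernoulli([1, 1, 1, 1, 0, 0]))   # more than half positive -> capped at 0.5
print(T_n_bernoulli([1, 1, 0, 0, 0, 0]))   # less than half positive -> the proportion 2/6
```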
Let X_{n:1} ≤ ··· ≤ X_{n:n} denote the order statistics from a random sample X_1, ..., X_n of size n from a population having df F. Let 0 ≤ a < b ≤ 1. We let

(1)  g ≡ F^{−1},  A ≡ g(a),  B ≡ g(b),  g_{a,b} ≡ B on t > b, g on a ≤ t ≤ b, A on t < a,

where we suppose g, a, b are such that A and B are both finite. We let α_n and β_n denote integer-valued rv's depending on X_1, ..., X_n for which 0 ≤ α_n < β_n ≤ n. The randomly trimmed mean is defined by
(2)  T_n ≡ T_n(α_n, n − β_n) ≡ (1/(β_n − α_n)) Σ_{i=α_n+1}^{β_n} X_{n:i},

and the randomly Winsorized mean by

(3)  T_n* ≡ T_n*(α_n, n − β_n) ≡ (1/n) [ α_n X_{n:α_n+1} + Σ_{i=α_n+1}^{β_n} X_{n:i} + (n − β_n) X_{n:β_n} ].
We call α_n/n and (n − β_n)/n the random adjustment percentages. The key to the asymptotic behavior of T_n and T_n* is contained in the next lemma. We follow Shorack (1974) throughout this entire section.
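Definitions (2) and (3) in code (a sketch of ours; alpha_n and beta_n are the integer trim counts):

```python
def trimmed_mean(xs, alpha_n, beta_n):
    """(2): average of X_{n:alpha_n+1}, ..., X_{n:beta_n}."""
    order = sorted(xs)
    return sum(order[alpha_n:beta_n]) / (beta_n - alpha_n)

def winsorized_mean(xs, alpha_n, beta_n):
    """(3): trimmed observations are replaced by the nearest retained order
    statistic (X_{n:alpha_n+1} below, X_{n:beta_n} above) and all n are averaged."""
    n = len(xs)
    order = sorted(xs)
    total = (alpha_n * order[alpha_n]
             + sum(order[alpha_n:beta_n])
             + (n - beta_n) * order[beta_n - 1])
    return total / n

xs = [10.0, 1.0, 2.0, 3.0, 4.0]
print(trimmed_mean(xs, 1, 4))      # average of 2, 3, 4 -> 3.0
print(winsorized_mean(xs, 1, 4))   # (1*2 + (2+3+4) + 1*4)/5 -> 3.0
```

Note that the Winsorized mean is far less sensitive than the raw mean to the outlying value 10.0.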
Theorem 1. Suppose
la1Ja ]
-[B-/L],^^ß"-b^„
where

μ* ≡ aA + ∫_A^B x dF(x) + (1 − b)B.
(12)
a
^(Ta — u) — 6 1 a U dg=
b
f 1
b—a o
l &,bdU^
(13)
a .b lo f ('
Likewise
=T *(a, b) =—{ ! U
b dg+ag'(a)D_1(a) +(1— b)g'(b)U(b)
a (Ja JJ
for the special construction
(17) = N(0, o,(a, b)),
where
a
Q* (a, b)= x 2 dF(x)+aA 2 +(1 —b)B 2
A
(22) S„ mI —
1 p”
Xn:; — f ß^/n 1 f
g dI = —IJ„ dg,
b
.
n i =a„+I a„/" j a
since each subsequence n′ contains a further subsequence n″ on which J_{n″}(t) → J(t) for all t ∉ {a, b}, where J_n(t) ≡ 1_{(α_n/n, β_n/n]}(t) and J(t) ≡ 1_{(a,b]}(t), and since |g|({a, b}) = 0 under (5).
(i) Now write
n r ß,/n b
(ii) Write
fl,/n f b
. l
fi(T n—µ*)={ Sn+.in n gdI—
/n
gdl
I
+^n a"Xn:a" +i —a"g(a)+ an g(a) — ag(a)]
En n n
+VRr n nßn nß" g(b)—(1—b)g(b)11
x":Q„— n nß
ßn
(c) ^Sn+n/ [Xn:a n +lg(a)]+ n n [Xn:l3^ — g(b)],
using (8)
(e) ^g'(b)^—UJn(b)+^(Pn—b^J
Note that
a (8 „ - 0) (B„ -B)
-
( c) =Un(a)+f(-B)-
a
► [( 9n,-0)-(B„-B)]
RANDOMLY TRIMMED AND WINSORIZED MEANS
while
n—b)=1J„(6)+,ßn(i — b)
(d) =UJ
a
„(b)+f(B)./n [(en— e)+(B„ — B)J.
Thus (9) gives
, /n(T„ — A*)
-a
—ag'(a)[—f(——B n +B)—f(B)V(B^+B„—B)]
(f) II —
f I-aU„ dg+af(B)g'(a)2 n(en — 0) ,
=— I a UJ„dg+2a ✓n(6„-0)
Ja
(g) = (1-2a) T„(a)+2a f (e„— 9)
as claimed.
Note that if assumption (23) is dropped, we can still claim
vn n —µ ;k )=(b—a)Tn ((na),n—(nb))+(a+l—b)^(e„-9)
a
(29) —a^(Bn—B)+(1—b)^(Bn—B).
(37)  γ_n ≡ A(H_n)  where A: [1, ∞) → [0, 1/2) with A ↗ and continuous;

thus, lighter-looking tails cause us to trim less the second time. Note that the interpretation of the weighted combination on the right-hand side of (35) is thus very reasonable. We note that
(38)  2a_0 = A( sqrt(E(X − θ)²) / E|X − θ| )

for the choice of H_n above. This appears to give a very reasonable adaptive estimator. The natural way to Studentize sqrt(n)(T_n* − θ) is to divide by V_n* where
, *2 _ 4a 4a2
(39) Vn — SYY+I —2aoSYz +(1 -2ao)2Szz
Figure 1. Sample and pseudo-sample.
Suppose we use the ordinary trimmed mean T_n(a_0), with 0 < a_0 < 1/2, as our preliminary estimator θ̂_n of θ. Then the total number of observations trimmed is
(43)  Z_i ≡ X_{n:α_n+1} if i ≤ α_n,  Z_i ≡ X_{n:i} if α_n + 1 ≤ i ≤ β_n,  Z_i ≡ X_{n:β_n} if i > β_n.
Note that the sample mean Z̄_n of the pseudo observations is exactly the Winsorized mean T_n*(α_n, n − β_n). Let
(44)  V_n ≡ (1/(n − 1)) Σ_{i=1}^n (Z_i − Z̄_n)²,
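The remark that Z̄_n equals the Winsorized mean is immediate once the pseudo-sample of (43) is written out (a sketch of ours):

```python
def pseudo_sample(xs, alpha_n, beta_n):
    """Z_i of (43): order statistics, with the lowest alpha_n values raised to
    X_{n:alpha_n+1} and the highest n - beta_n values lowered to X_{n:beta_n}."""
    order = sorted(xs)
    lo, hi = order[alpha_n], order[beta_n - 1]
    return [min(max(x, lo), hi) for x in order]

xs = [10.0, 1.0, 2.0, 3.0, 4.0]
z = pseudo_sample(xs, 1, 4)
print(z)                 # [2.0, 2.0, 3.0, 4.0, 4.0]
print(sum(z) / len(z))   # the Winsorized mean of (3): here 3.0
```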
Figure 2. Sample and pseudo-sample.
and note that this is the V„ of (20). It is proposed to use
Example 4. (Nearly symmetric random means) If (23) and (26) hold, and
if the random adjustment percentages are equal to the extent that
(47)
an
- n
=_o1/2
n — —n Ya p (n ) where n
an
a
some a,
1 2a fal
—1 a a U d
and
This would be true if your choice for θ̂_n was a nearly symmetric randomly trimmed mean à la (48) (or if it was a Huber-type estimator). Now define

(51)  γ_n ≡ 2ã_n,

and repeat Example 3, from line (42) on, verbatim. Nice estimator! ❑
Example 6. Let 0 < a_1 < ··· < a_K < 1/2. Suppose c_{ni} ≡ c_{ni}(X_1, ..., X_n) →_p c_i as n → ∞ for 1 ≤ i ≤ K, where Σ_{i=1}^K c_i = 1. Suppose F is symmetric as in (23) and
(28), (45), and Example 5 [use the temporary notation M_n(a_i) for any one of them] all satisfy

(52)  sqrt(n) [ Σ_{i=1}^K c_{ni} M_n(a_i) − θ ] ≐ Σ_{i=1}^K c_i T_n((na_i)).
Johns (1971) shows how the c_{ni}'s can be chosen so that the asymptotic variance of (52) is uniformly close to the Cramér-Rao bound over a large class of rather smooth F. Moreover, K = 2 is shown to produce good results. (Note that F need not be symmetric if the ordinary random mean of Example 1 is used.) ❑
(53) Tₘ(a) =ₐ −[1/(1−2a)] ∫ₐ^{1−a} U dg
and
(55) [1/(1−2a)] ∫ₐ^{1−a} S dg,
where
(56) S(t) ≡ [U(t) + U(1−t)]/√2 (Brownian motion on [0, ½]).
Shorack (1974) extends this to an analog of the two-sample t-test, for which
F need not be symmetric.
4. PROOFS
= −∫₀^{ξₙ:₁} [φᵣₙ(Gₙ) − φᵣₙ] dg + ⋯;
since on the interval 0 < t < ξₙ:₁ we have, under Assumption 19.1.1,
≤ M ∫₀^{ξₙ:₁} [t(1−t)]^{r−b} d|g|
≤ M [t(1−t)]^{r−b} |g| |₀^{ξₙ:₁} + M ∫₀^{ξₙ:₁} |([t(1−t)]^{r−b})′| |g| dt
≤ 0 + M ∫₀^{ξₙ:₁} [t(1−t)]^{−a} |([t(1−t)]^{r−b})′| dt   using a < r
≤ M′ ∫₀^{ξₙ:₁} [t(1−t)]^{r−b−a−1} dt
E|Y| ≤ E ∫₀¹ |1_{[ξ,1]}(t) − t| |J| d|g| < ∞,
(e) ∫₀¹ [t(1−t)]^{−aθ} dt < ∞ if aθ < 1,
and set 2 + δ = 1/a for the boundary value. We can thus apply Fubini's theorem to find
Var[Y] = ∫₀¹ ∫₀¹ Cov[1_{[ξ,1]}(s), 1_{[ξ,1]}(t)] J(s)J(t) dg(s) dg(t)
= ∫₀¹ ∫₀¹ [s∧t − st] J(s)J(t) dg(s) dg(t) = σ²
as in (19.1.27). ❑
= ∫₀¹ { [Gₙ(t) − t]⁻¹ ∫ₜ^{Gₙ(t)} Jₙ(s) ds − J(t) } [Gₙ(t) − t] dg(t).
[We define the ratio to be 0 if Gₙ(t) = t.] Consider Aₙ. Since ‖Gₙ − I‖ → 0 a.s. by Glivenko–Cantelli, Assumption 19.1.2 implies that
(4) for a.e. ω we have Aₙ(t) → 0 a.e. |g| as n → ∞.
Also
|Aₙ(t)| ≤ [B(Gₙ) ∨ B] + B ≤ M_θ [t(1−t)]^{−(b+θ)} a.s. for n ≥ n_{θ,ω},
using Theorem 10.6.1 of Wellner. For 0 < t < ξₙ:₁ we have Gₙ(t) = 0, so that
|Aₙ(t)| ≤ ∫₀ᵗ B(s) ds / t + B ≤ M t^{1−b}/t = M t^{−b};
thus
(5) |Aₙ(t)| ≤ M_θ [t(1−t)]^{−(b+θ)} on (0, 1) a.s. for n ≥ some n_{θ,ω}.
Case (ii). (LIL) Recall bₙ ≡ √(2 log₂ n). In this case for n ≥ n_{θ,ω} we have from (3) that
lim_{n→∞} |γₙ|/bₙ ≤ { lim_{n→∞} ‖ Uₙ/(bₙ [t(1−t)]^{1/2−θ}) ‖ } { lim_{n→∞} ∫₀¹ |Aₙ(t)| [t(1−t)]^{1/2−θ} d|g|(t) }
(6) ≤ 1 · 0 = 0 a.s.
(provided θ was chosen so small that a + 2θ < ½) using James's theorem (Theorem 13.4.1) to obtain the 1 in (6) and using the dominated convergence theorem, with dominating function guaranteed as in Lemma 19.1.1, to obtain the 0 in (6). Now (6) trivially implies
max_{k≥n} |Yₖ|/bₖ → 0 a.s. as n → ∞,
and thus (19.1.54). Also, for K_ε and then n_ε chosen large enough, we have
Case (iii). (SLLN) In this case, for n ≥ n_{θ,ω}, we have from (3) that ⋯, using Lai's theorem (Theorem 10.2.1) for the first 0, and using (4) and (5) and the dominated convergence theorem with dominating function
M_θ [t(1−t)]^{r−b−2θ}
guaranteed by Lemma 19.1.1 for the second 0. See Wellner (1977a) for a version; see also Sen (1981).
(9) ‖ Uₙ/[t(1−t)]^{1/2−θ} ‖ ∫₀¹ |Aₙ(t)| [t(1−t)]^{1/2−θ} d|g|(t) ≡ Zₙ.
max_{1≤k≤n} √(k/n) |γₖ| ≤ max_{1≤k≤n} √(k/n) Zₖ Aₖ ≤ ε + M[ε/M] = 2ε with probability exceeding 1 − ε, since
(13) max_{1≤k≤n} √(k/n) Zₖ = Oₚ(1).
Versions of this CLT appear in Sen (1978, 1981) and Govindarajulu (1980),
and a somewhat different version in Mason (1981a).
To improve →ₚ 0 to →_{a.s.} 0 we would need to expand to a J′ term. If we do, uniformity in g, Jₙ, J using a common B, D is fairly straightforward. The
proof of Theorem 19.1.7 should serve as a model for both of these remarks.
Proof of Theorem 19.1.6. Now
(15) ≤ ∫₀^{1/n} DB dt + ∫_{1−1/n}^{1} DB dt + (D/√n) ∫_{1/n}^{1−1/n} √(I(1−I)) dt
≤ M √n ∫₀^{2/n} t^{−a} dt + (M/√n) ∫_{2/n}^{1/2} t^{−a−1} dt + ⋯
= (M n^{1/2}/n^{1−a}) + M n^{−1/2} n^{a} = 2M/n^{1/2−a}
as claimed.
It is easy to show from (14) that
(17) ≤ M √n ∫₀^{2/n} t^{−(a+θ)} t^{−b} dt + (M/√n) ∫_{2/n}^{1/2} t^{−(a+θ)} t^{−(1+θ)} dt
+ (analogous terms for t near 1).
(19) −2nγₙ = −2[n(Tₙ − µ) − Sₙ] = 2n ∫₀¹ [Ψ(Gₙ) − Ψ(I) − (Gₙ − I)J] dg = ∫₀¹ J′(Gₙ*) Uₙ² dg
for some Gₙ*(t) between Gₙ(t) and t. Now the integrand of the middle term of (19) is bounded by ⋯, so that
|2nγₙ| ≤ ‖Uₙ/q‖² ∫₀¹ |J′(Gₙ*)| q² d|g|.
Hence (52) and (53) follow from James's theorem (Theorem 13.4.1). Representations with a lesser rate appear in Govindarajulu and Mason (1980) and Sen (1981). ❑
CHAPTER 20
Rank Statistics
Let X₁, …, Xₙ be iid having a continuous df. Let Rₙ₁, …, Rₙₙ denote the ranks and Dₙ₁, …, Dₙₙ the antiranks of these observations. Let cₙ₁, …, cₙₙ denote known weights and dₙ₁, …, dₙₙ denote known scores. We wish to consider linear rank statistics built from the scores function
(3) hₙ(t) = dₙᵢ for (i−1)/n < t ≤ i/n and 1 ≤ i ≤ n.
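A linear rank statistic of this kind can be sketched numerically. This is an illustrative, un-normalized version (no √(c′c) factor), assuming no ties; the function names are mine.

```python
def ranks(x):
    """Rank R_i of x[i] within x (1 = smallest), assuming no ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank0, i in enumerate(order):
        r[i] = rank0 + 1
    return r

def linear_rank_statistic(x, c, d):
    """sum_i c[i] * d[R_i - 1]: weights c applied to the scores d
    evaluated at the ranks, i.e. the step-scores construction of (3)."""
    return sum(ci * d[ri - 1] for ci, ri in zip(c, ranks(x)))
```

With c an indicator of group membership and d the identity scores, this reduces to a rank-sum type statistic.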
Remark 1. The key result we will try to mimic is (6.1.49). Thus Wₙ denotes the weighted empirical process. Moreover, Wₙ(t)/(1−t) is a martingale on [0, 1), and we have the relationship
(1) Wₙ(t) = (1−t) ∫₀ᵗ [1/(1−s)] dMₙ(s), where Mₙ(t) ≡ Wₙ(t) + ∫₀ᵗ [Wₙ(s)/(1−s)] ds for 0 ≤ t ≤ 1,
of (6.1.41).
We now turn to ranks. We will make use of the following notation. Let
(4) Zᵢ ≡ ℝₙ(pₙᵢ)/(1 − i/n) for 0 ≤ i ≤ n−1; this is a martingale.
We will also let
(5) Mᵢ ≡ M(i/n)
for an appropriately defined basic martingale M. The analog of Eq. (1) that
will provide us with our definition of M is
(6) ΔZᵢ ≡ Zᵢ − Zᵢ₋₁ = [n/(n−i+1)] ΔMᵢ ≡ [n/(n−i+1)] (Mᵢ − Mᵢ₋₁) for 1 ≤ i ≤ n−1.
_ n n n—i
n—i R ' n —in— i+1 R, ' -
(7) AZ'
= n [ n_i +1 C + 1 R ]
—i'-'
n—i +1 n—i n
_ n
(8) n—i+1
provided
(10) = σ[ℝₙ(t): 0 ≤ t ≤ pₙᵢ] = σ[C₁, …, Cᵢ].
we have that
(12) Mᵢ ≡ Σⱼ₌₁ⁱ ΔMⱼ for 0 ≤ i ≤ n (with M₀ ≡ 0 and ΔMₙ ≡ 0)
satisfies
(13) M(i/n), 0 ≤ i ≤ n, is a martingale with respect to the σ-fields of (10).
(14) M(i/n) = ℝₙ(pₙᵢ) + (1/n) Σⱼ₌₁ⁱ ℝₙ(pₙⱼ)/(1 − j/n);
note that this is exactly the same process as the 𝕄 of (6.1.44) when F = I.
Recall from (6.1.44) or (6.1.24) that
(17) (1/n) [(n−i+1)/(n−i)] [1 + 1/(n−1)] = [1/(n−1)] [(n−i+1)/(n−i)] ≤ 4/n for 1 ≤ i ≤ n−1 and n ≥ 2.
Thus
of Theorem 3.1.1.
Proof. Note that
(21) 𝕄ₙ(t) = ℝₙ(pₙᵢ) + [(n+1)/n] ∫₀^{pₙᵢ} [ℝₙ/(1 − Iₙ)] dt for (i−1)/n ≤ t < i/n, 1 ≤ i ≤ n,
where Iₙ(t) = j/n for pₙⱼ ≤ t < pₙ,ⱼ₊₁ and 0 ≤ j ≤ n−1, with the convention that the term ℝₙ/(1−Iₙ) in the integrand of (21) equals 0 on [pₙₙ, 1]. Now ‖ℝₙ − W‖ →_{a.s.} 0 by Theorem 3.1.1. Apply this to (21), just as was done in the proof of Theorem 6.1.1, to obtain ‖𝕄ₙ − 𝕄‖ →_{a.s.} 0. Now apply the Birnbaum and Marshall inequality (Inequality A.10.4) using the proof of Theorem 6.2.1 for a model. ❑
by plugging (14) into the rhs. Show that this, in turn, gives
(23) ℝₙ(pₙᵢ)/(1 − i/n) = Σⱼ₌₁ⁱ [1/(1 − j/n)] ΔMⱼ for 0 ≤ i < n.
These are the analogs of (6.6.3) and either (6.6.2) or (1), respectively. Use Theorem 20.2.1 to justify passing to the limit in these two formulas. Equation (23) gives
(24) W(t) = (1−t) ∫₀ᵗ [1/(1−s)] dM(s) for all 0 ≤ t < 1;
We also suppose dₙ₁, …, dₙₙ are known constants, and we define a scores function
(2) hₙ(t) = dₙᵢ for (i−1)/n < t ≤ i/n, 1 ≤ i ≤ n,
and the linear rank statistic
(3) Tₙ = (1/√(c′c)) Σᵢ₌₁ⁿ cₙᵢ dₙ,Rₙᵢ = ∫₀¹ hₙ dℝₙ.
Suppose, additionally, that the data come sequentially, so that only the first k order statistics are available. This could well be the case if we were dealing with life-testing data. If at a particular moment only the first k order statistics are available, then (3.1.33) shows that Dₙ₁, …, Dₙₖ are available. We can thus compute
(4) Tₙₖ = (1/√(c′c)) Σᵢ₌₁ᵏ dₙᵢ cₙ,Dₙᵢ = ∫₀^{pₙₖ} hₙ dℝₙ
(5) = hₙ(pₙₖ) ℝₙ(pₙₖ) − ∫₀^{pₙₖ} ℝₙ dhₙ;
recall (4) and (3.1.63). We know from Theorem 6.4.1 that ET(t) = 0 and
for 0 ≤ s, t ≤ 1, and we compare this with (6). See (3.1.39) for the σ-notation used in (9).
We consider ‖·/q‖ metrics where q is ↗ on [0, 1]. We require that the function q satisfies
We now require that the h„'s converge to some limit function h at a rate
sufficiently fast that
Theorem 1. Suppose (1), (10), and (11) hold. Then the process T„ of (7)
satisfies
for the σ of (3.1.38). Hájek and Šidák (1967, p. 163) prove that (13) holds under these same conditions with q = 1.
k i
(c) =—^(1-- Od;+jZi+dk+I Rk
n
k
(d) _ —> OP;+, ZI + dk +l R k with Apr+i = p; +I Pi —
provided
(e) ρᵢ ≡ [1 − (i−1)/n] dᵢ + (1/n)(d₁ + ··· + dᵢ₋₁) for 1 ≤ i ≤ n+1.
Note that
(f) ρₙ = ρₙ₊₁ = d̄ₙ.
^k•
Tnk —pIAZa — pk+iZk+dk+IRk
k
_^ PjnZ; —Zk (d,+...+d k )ln
(d,+• • •+dk)
k n OMA —Zk by (6.1.6)
(g) —^p'n —i+1 n
— k j d +d,+...+ d1_, +...+dk) Z
II, i+i
n—
J
1 AM — (d^
n
k
k (d,^.....+dk) k d 1 +...+d i _,
(14) =^d;^M;—Zk
n +Y n—i+1
(15) = ∫₀^{k/n} hₙ dMₙ − {Mₙ(k/n)/[1 − k/n]} ∫₀^{k/n} hₙ dt,
for {x} the greatest integer less than x. Compare (15) to (6.4.14).
Applying the Hájek and Rényi inequality (Exercise A.10.3) to (15) [as the Birnbaum and Marshall inequality was applied to (6.4.14)] gives
(16) P(‖Tₙ/q‖₀^{k/n} > 3ε) ≤ 4ε⁻² [ ∫₀^{k/n} (hₙ/q)² dt + 2 ∫₀^{k/n} hₙ² dt ∫₀^{k/n} q⁻² dt/(1 − k/n)² ]
where the factor 4 comes from (20.1.17). Writing hₙ = (hₙ − h) + h and applying (11) shows that we can choose θ = θ_ε in (16) so small that ⋯ < ε
for the function hₘ of Proposition A.8.1. Now for n ≥ n_ε we have P(|γₙ₁| ≥ ε) < ε (by (16) and since
(j) ∫_θ^{1−θ} (hₙ − hₘ)² dt ≤ 2 ∫ [hₙ − h]² dt + 2 ∫ [hₘ − h]² dt → 0
by (11) and Proposition A.8.1), P(|γₙ₃| ≥ ε) < ε by (6.4.15) and ∫ (hₘ − h)² dt → 0, and γₙ₂ →ₚ 0 for each m since
so that
(26) Var[Tₙ⁽¹⁾ − Tₙ⁽²⁾] = Var[Tₙ⁽¹⁾] − Var[Tₙ⁽²⁾] = σ² − Var[Tₙ⁽²⁾],
(28) Var[Tₙ] → σ²,
(30) Tₙ(v) = Σᵢ bₙ(Rₙᵢ, v) with bₙ(k, v) = P(ξᵢ < v | Rₙᵢ = k).
+_^ c c
3. CONTIGUOUS ALTERNATIVES
The Model
Let µ denote a σ-finite measure on (ℝ, ℬ). Let Xₙᵢ denote the identity map on ℝ for 1 ≤ i ≤ n, n ≥ 1. Consider testing the null hypothesis
all densities are with respect to µ. We suppose that f₀ = f and also write Xₙᵢ⁰ for Xₙᵢ. The special construction for this model is Xₙᵢ = F⁻¹(ξₙᵢ). We agree that in this section
where the space of all square-integrable functions h is denoted by ℒ₂(µ). Let pₙ and qₙᵦ denote the densities of Pₙ and Qₙᵦ with respect to µ × ··· × µ. We assume the existence of constants aₙ₁, …, aₙₙ, n ≥ 1, satisfying
(4) max_{1≤i≤n} aₙᵢ²/(a′a) = max_{1≤i≤n} aₙᵢ²/Σᵢ₌₁ⁿ aₙᵢ² → 0 as n → ∞,
where a = aₙ = (aₙ₁, …, aₙₙ)′.
We let rank(aₙᵢ) denote the rank of aₙᵢ among aₙ₁, …, aₙₙ; in case of ties we will refer to possible rankings of aₙ₁, …, aₙₙ. We also assume the existence of a function
(5) δ in ℒ₂(µ)
for which the densities satisfy our key contiguity condition
(6) Σᵢ₌₁ⁿ ∫ [√(fₙᵢ) − √f − (b aₙᵢ/(2√(a′a))) δ]² dµ → 0 as n → ∞,
and we set
(7) δ₀ ≡ (2δ/√f) ∘ F⁻¹ on (0, 1).
We wish to consider linear rank statistics Tₙ under the alternatives (2); see (3.2.30). Thus we introduce constants cₙ₁, …, cₙₙ satisfying the u.a.n. condition
We will appeal primarily to Hodges and Lehmann (1963) for direction, and to Jurečková (1971) and Theorem 5.9.1 for technique.
Testing
Let βₙ₁, …, βₙₙ denote the rv's of the reduction defined in (3.3.31). Now the ranks Rₙ₁, …, Rₙₙ of Xₙ₁, …, Xₙₙ also serve as possible ranks of βₙ₁, …, βₙₙ. (For now we have one fixed b.) Our concern is with
(9) Tₙ = Tₙᵇ = (1/√(c′c)) Σᵢ₌₁ⁿ cₙᵢ h(Rₙᵢ/(n+1)) = ∫₀¹ h dℝₙ under the alternatives Qₙᵦ.
We let
(12) γₙ ≡ Tₙ − Sₙ = (1/√(c′c)) Σᵢ₌₁ⁿ cₙᵢ [h(Rₙᵢ/(n+1)) − h(βₙᵢ)],
and
(15) ∫₀¹ h dℤₙ →ₚ ∫₀¹ h dW under Qₙ.
Thus
(16) Sₙ =ₐ Tₙ = ∫₀¹ h dℤₙ + µₙ + γₙ =ₐ S + ⋯ under Qₙ.
(19) Sₙ →_d S + b ρₙ(c, a) ∫₀¹ h δ₀ dt under Qₙ,
since Cov[S, bℤ] = b ρₙ(c, a) ∫₀¹ h δ₀ dt under Pₙ. Combining (16) and (19) shows that
(22) Tₙ =ₐ S + b ρₙ(c, a) ∫₀¹ h δ₀ dt under Qₙ.
It is important to note that (21) and (22) each give a representation of the limiting form of Tₙ that is valid under the contiguous alternatives Qₙ of (4.1.1)–(4.1.6).
Suppose now that we wish to test the hypothesis H: b = b₀ vs. the alternative K: b > b₀. As our test we will choose to reject H if the statistic
(23) Tₙ^{b₀} = (1/√(c′c)) Σᵢ₌₁ⁿ cₙᵢ h(Rₙᵢ^{b₀}/(n+1))
of (9) is "too big." Now from (22) [or from (3.8.20)] we have
(24) Tₙ^{b₀} =ₐ ∫₀¹ h dW + b₀ ρₙ(c, a) ∫₀¹ h δ₀ dt under H: b = b₀.
(25) P_b(Tₙ^{b₀} > z_α σ_h) →ₐ P( N(0, 1) > z_α − (b − b₀) ρₙ(c, a) ∫₀¹ h δ₀ dt / σ_h ).
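Equation (25) says the limiting power is a shifted normal tail probability. A minimal numeric sketch, with the shift treated as a single illustrative number standing in for (b − b₀)ρₙ(c, a)∫hδ₀ dt/σ_h:

```python
import math

def normal_sf(z):
    """P(N(0,1) > z), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def asymptotic_power(z_alpha, shift):
    """Limiting power P(N(0,1) > z_alpha - shift), the form of (25);
    'shift' is the noncentrality produced by the contiguous alternative."""
    return normal_sf(z_alpha - shift)
```

At shift 0 the power reduces to the level α, and it increases monotonically with the noncentrality, which is why one maximizes the shift over h and the cₙᵢ's.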
Now δ₀ and the aₙᵢ's are given by nature, while we choose h and the cₙᵢ's. How can we choose h and the cₙᵢ's to maximize the power? Clearly
(26) the asymptotic power depends on the cₙᵢ's only through ρₙ(c, a)
and is maximized if ρₙ(c, a) = 1 [ρₙ(c, a) = 1 if cₙᵢ = aₙᵢ for all i]. As was shown in (4.1.7),
(27) ∫₀¹ δ₀(t) dt = 0.
Thus
As a measure of the efficiency e₂,₁ of the tests Tₙ⁽²⁾ with respect to the tests Tₙ⁽¹⁾ we will take the ratio of distances of alternative hypothesis from null hypothesis that produces equal asymptotic power for the two sequences of tests. That is, equal power
Asymptotic Linearity
G⁻¹(t) for all t, with > for at least one t (this is equivalent to requiring F(x) ≤ G(x) for all x with strict inequality for at least one x).
We now assume of the df's Fₙᵢ in (2) that there is a single-parameter family of df's F_θ for which
(32) Fₙᵢᵇ = F_{b aₙᵢ/√(a′a)}, where F_θ is stochastically increasing in θ
(in the location case, when Fₙᵢᵇ = F(· − b aₙᵢ/√(a′a)), this condition clearly holds).
Consider now the rank statistic
(35) Tₙᵇ = (1/√(c′c)) Σᵢ₌₁ⁿ cₙᵢ h(Rₙᵢᵇ/(n+1)),
where
(36) h is ↗
and the aₙᵢ's and cₙᵢ's are concordant in that
(39) lim_{b→∞} Tₙᵇ = tₙ^{max} ≡ (1/√(c′c)) Σᵢ₌₁ⁿ cₙᵢ h(rank(aₙᵢ)/(n+1))
= (the maximum of all n! possible values of Tₙ)
and
We also implicitly assume that our statistic is nontrivial in that tₙ^{min} < E_{b=0} Tₙ < tₙ^{max}.
n
Proof. Now for each fixed b the term, call it Aₙᵇ, inside the supremum in (41) converges to 0 in probability by (22). We will assume first that ρₙ(a, c) ≥ 0 for all n and ∫₀¹ h δ₀ dt > 0. Note that in this case both Tₙ − Tₙ^{b₀} and
Estimation
where (4)–(6), (34), (36), and (37) are still assumed for these aₙᵢ.
Note that (34) implies that for each fixed ω ∈ Ω we have
(43) lim_{θ→−∞} Tₙ^θ = tₙ^{max} ≡ (1/√(c′c)) Σᵢ₌₁ⁿ cₙᵢ h(rank(aₙᵢ)/(n+1))
= (the maximum of all n! possible values of Tₙ)
and
(44) lim_{θ→∞} Tₙ^θ = tₙ^{min} ≡ (1/√(c′c)) Σᵢ₌₁ⁿ cₙᵢ h((n+1−rank(aₙᵢ))/(n+1))
= (the minimum of all n! possible values of Tₙ).
We also implicitly assume that our statistic is nontrivial in that tₙ^{min} < 0 < tₙ^{max}; note that 0 = E_{θ=0} Tₙ^θ. Thus
Theorem 2. Suppose (4)–(6), (8), (32), (42), (43), (34)–(37), (47), (48) all hold. Then
(c) P( sup_{|b|≤B_ε} | ⋯ ∫₀¹ h δ₀ dt | > ε ) < ε
for all n ≥ (some n₂ε). Let Λₙε denote the indicator of the intersection of the complements of these two sets; thus P(Λₙε) > 1 − ε for all n ≥ n_ε ≡ (n₁ε ∨ n₂ε). Moreover, for ω ∈ Λₙε, the graph of Tₙᵇ must lie within vertical distance ε of the line y = Tₙ⁰ + b ρₙ(c, a) ∫₀¹ h δ₀ dt on the interval |b| ≤ B_ε. That is, on the event Λₙε, the rv b̂ₙ is within horizontal distance ε/[ρₙ(c, a) ∫₀¹ h δ₀ dt] of the solution point of the equation 0 = Tₙ⁰ + b ρₙ(c, a) ∫₀¹ h δ₀ dt. Thus
(d) b̂ₙ − b =ₐ −Tₙ⁰/[ρₙ(c, a) ∫₀¹ h δ₀ dt].
The upper and lower endpoints θ̄ₙ and θ̲ₙ of a confidence interval for θ based on (5) are asymptotically
θ̄ₙ, θ̲ₙ =ₐ ⋯ ± z_α σ_h/[√(a′a) ρₙ(c, a) ∫₀¹ h δ₀ dt].
Thus
(50) √(a′a)(θ̄ₙ − θ̲ₙ)/(2 z_α) =ₐ σ_h/[ρₙ(c, a) ∫₀¹ h δ₀ dt] = 1/[σ_{δ₀} ρₙ(c, a) ρ(h, δ₀)].
Note that (50) implies that if the optimal scoring function δ₀ does not have much dispersion, then the estimation of θ inherently results in rather long intervals.
bx x ,,
(51) Fes = F ; ß with ß = + • • • +
' '' 2 b „° i Z (call this Q")
• =1 x1 ^i• =i x i'p
for u.a.n. x ;; 's [as in (4.5.30)] and suppose (6) holds uniformly in IbI < B for
any 0 < B < co and for all a ; 's as in (4). Then, as in Theorems 4.1.3 and 4.5.3
the log likelihood ratio statistic L„ for testing Qn vs. P. satisfies
where
(53) n
Z = r b1xi1 2 + ... ^ ^— b 'p 2
—
]o0(en1) = a bb fo' bo dW(i),
L
^x ;•=1 x r'P i=
where
for the jth column X3 of the matrix X of x ;; 's. Also, for U we have
(56) Tₙ⁽ᵏ⁾ = (1/√(c⁽ᵏ⁾′c⁽ᵏ⁾)) Σᵢ₌₁ⁿ cₙᵢ⁽ᵏ⁾ hₖ(Rₙᵢ/(n+1))
[note (3.1.66)]. Let Tₙ and S denote column vectors with kth entries Tₙ⁽ᵏ⁾ and Sₖ, let H denote a p × p diagonal matrix with kth entry ∫₀¹ hₖ δ₀ dt, and let Mₙ denote a p × p matrix with (k, j)th entry ρₙ(c⁽ᵏ⁾, c⁽ʲ⁾). We may summarize (57) as
(60) each hₖ is ↗
and
[Jurečková (1971) contains a weaker condition than (61) that will suffice.] The following result is then no more complicated than Theorem 1.
Theorem 3. Suppose (4)–(6), (8) for each k, (51), (33), (34), (56), (60), (61) all hold. Then
and try to estimate θ. The natural analog of (45), (46) is to choose θ̂ₙ to make Tₙ^θ (with the obvious definition) come as close as possible to achieving the value 0. Then (59) shows (as in the proof of Theorem 2) that θ̂ₙ satisfies
(64) HMₙ [√(Xⱼ′Xⱼ)(θ̂ₙⱼ − θⱼ)]ⱼ₌₁ᵖ =ₐ −S.
Of course, when HMₙ has an inverse, we can solve for our standardized estimator and see that it is asymptotically normal with 0 mean vector and
covariance matrix
(65) Mₙ⁻¹ H⁻¹ Σ H⁻¹ Mₙ⁻¹, where Σ has (j, k)th entry ρₙ(c⁽ʲ⁾, c⁽ᵏ⁾)(hⱼ, hₖ).
Proof. The analog of equation (3.3.27) and then Lemma 4.1.1 together imply
(b) Zₙ =ₐ Z ≡ ∫₀¹ δ₀ dW⁰.
Recall Remark 3.1.1 to clarify the meaning of W⁰ and W. Since any subsequence n′ contains a further subsequence n″ on which ρₙ″(a, c) → some ρ₀ as n″ → ∞, we have
(c) 1
)J ° [TZ)] N L[O] ' LP°-h 1 oo3,So pO hl} tolf}bo ^^
Since the →_d of (c) trivially extends to vectors (Tₙ″(t₁), …, Tₙ″(tₖ)), so does the →_d of (d). Checking for proper covariance is trivial. Subtract the mean from the left-hand side so that we do not need to assume ρₙ″ converges to some ρ₀. Replace →_d by =ₐ in (d) as done in (22). ❑
Suppose X₁, …, X_m are iid with continuous df F and Y₁, …, Yₙ are iid with continuous df G. Let N denote (m, n) when it is used as a subscript, and let N = m + n otherwise. Let λ_N = m/N and H_N = λ_N F + (1 − λ_N) G. Let F_m and G_n denote the empirical df's of the Xᵢ's and Yⱼ's, respectively, and then ℍ_N ≡ λ_N F_m + (1 − λ_N) G_n denotes the empirical df of the combined sample of size N. Many classical statistics used to test the hypothesis that F and G are identical are of the form
(1) T_N = (1/m) Σᵢ₌₁ᴺ c_{Ni} Z_{Ni},
where c_{N1}, …, c_{NN} are known constants and where Z_{Ni} equals 1 or 0 according as the ith largest of the combined sample is an X or a Y. We define a score function J_N by
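The indicator form (1) is easy to sketch. This illustration orders the pooled sample ascending and assumes no ties; with scores c_{Ni} = i the statistic is, up to constants, the Wilcoxon rank-sum of the X's.

```python
def two_sample_statistic(xs, ys, c):
    """T_N = (1/m) * sum_i c[i] * Z_i, where Z_i = 1 if the ith element
    of the pooled (ascending) sample is an X; assumes no ties."""
    pooled = sorted([(v, 'x') for v in xs] + [(v, 'y') for v in ys])
    m = len(xs)
    return sum(c[i] * (1 if lab == 'x' else 0)
               for i, (v, lab) in enumerate(pooled)) / m

def wilcoxon_rank_sum(xs, ys):
    """Rank-sum of the X's; equals m * T_N for scores c[i] = i + 1."""
    N = len(xs) + len(ys)
    return len(xs) * two_sample_statistic(xs, ys, list(range(1, N + 1)))
```

Other classical choices of c_{Ni} (normal scores, Savage scores) fit the same template.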
Note that
+ f J(HN)d^(F,,, - F)
= r ✓N(HN)- ✓ (HN) -
/ [HN-HN] dF.
V^►! + J ✓ (HN) dU -(F)
J HN—HN
+ f J(HN) dUm(F)
(9) = 1 -A N f V—
Af ✓ '(HN)Vn(G) dF
- '11 -A N f ✓'(HN)Um(F)dG)
(10) =TN= 1- AN {
-/1 -A J J'(H
N N )UJ(F) dG}
Suppose that for some function J the scores c_{Ni} = J_N(i/N) are close to J(i/N) in the sense that
(14) (1/N) Σᵢ₌₁^{N−1} |c_{Ni} − J(i/N)| → 0 and c_{NN}/√N → 0 as N → ∞.
Exercise 1. Let 𝕍ₙ denote the uniform quantile process. Recall that ξₙ:₁ ≤ ··· ≤ ξₙ:ₙ denote the order statistics and δₙᵢ = ξₙ:ᵢ − ξₙ:ᵢ₋₁ is the ith spacing. Let pₙᵢ = i/(n+1). The fact that
(1) E(ξₙ:ᵢ₊₁ | ξₙ:ᵢ) = ξₙ:ᵢ + [1/(n+1)] (1 − ξₙ:ᵢ)/(1 − pₙᵢ)
suggests that
(2) Mₙᵢ ≡ (ξₙ:ᵢ − pₙᵢ) − [1/(n+1)] Σⱼ₌₁ⁱ (ξₙ:ⱼ − pₙⱼ)/(1 − pₙⱼ)
= [ 𝕍ₙ(pₙᵢ) − (1/(n+1)) Σⱼ₌₁ⁱ 𝕍ₙ(pₙⱼ)/(1 − pₙⱼ) ]/√n
is the appropriate basic martingale for this situation. Solving for 𝕍ₙ gives
h( - -)61_/ ] h d4/ „.
(6) T„=•/n[YI = fo,
Exercise 2. Find conditions on h under which ⋯ .
Define hₙ(t) ≡ h(i/(n+1)) for (i−1)/(n+1) ≤ t < i/(n+1) and 1 ≤ i ≤ n+1. Then
(9) Tₙ ≡ √n [ (1/n) Σᵢ₌₁ⁿ h(i/(n+1)) ξₙ:ᵢ − ∫₀¹ t h(t) dt ] = ∫₀¹ hₙ dMₙ.
(11) ∫₀¹ |𝕌ₙ(t)|ᵏ dt = ∫₀¹ |𝕍ₙ(t)|ᵏ dt for all k > 0.
CHAPTER 21
Spacings
0. INTRODUCTION
The joint density of the order statistics ξₙ:₁, …, ξₙ:ₙ of a Uniform (0, 1) sample is n! [see (3.1.95)]. Define Sₙᵢ ≡ ξₙ:ᵢ − ξₙ:ᵢ₋₁, where ξₙ:₀ ≡ 0 and ξₙ:ₙ₊₁ ≡ 1; these are called uniform spacings. The joint density of Sₙ₁, …, Sₙₙ is n!
for Sₙᵢ ≥ 0, 1 ≤ i ≤ n, and Sₙ₁ + ··· + Sₙₙ ≤ 1;
just note that the Jacobian of the transformation is one. Note also from Proposition 8.2.1 that
(Sₙ₁, …, Sₙ,ₙ₊₁) ≅ ( α₁/Σⱼ₌₁ⁿ⁺¹ αⱼ, …, αₙ₊₁/Σⱼ₌₁ⁿ⁺¹ αⱼ ),
where α₁, …, αₙ₊₁ are iid Exponential (1) rv's. From this we see that Sₙ₁, …, Sₙ,ₙ₊₁ are exchangeable rv's; that is, permuting the coordinates of (Sₙ₁, …, Sₙ,ₙ₊₁) does not change the distribution of the vector.
Thus
(7) ESₙᵢ = 1/(n+1) and Var[Sₙᵢ] = n/[(n+1)²(n+2)],
(8) Cov[Sₙᵢ, Sₙⱼ] = −1/[(n+1)²(n+2)] and Corr[Sₙᵢ, Sₙⱼ] = −1/n for i ≠ j.
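The exponential representation above gives a direct way to simulate uniform spacings. A small sketch (seeded for reproducibility); note that since the n+1 spacings always sum to one, their per-sample average is exactly 1/(n+1), matching (7).

```python
import random

def uniform_spacings(n, rng):
    """The n+1 spacings of n Uniform(0,1) points, generated as iid
    Exponential(1) variables normalized by their sum."""
    alpha = [rng.expovariate(1.0) for _ in range(n + 1)]
    total = sum(alpha)
    return [a / total for a in alpha]

rng = random.Random(0)
s = uniform_spacings(10, rng)
# s has 11 positive entries summing to 1.
```

The exchangeability of the coordinates is visible in the construction: permuting the αᵢ's permutes the spacings without changing the joint law.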
We let 0 ≤ Sₙ:₁ ≤ ··· ≤ Sₙ:ₙ₊₁ denote the ordered uniform spacings. Note that
(10) (Sₙ:₁, …, Sₙ:ₙ₊₁) ≅ ( αₙ₊₁:₁/Σⱼ₌₁ⁿ⁺¹ αⱼ, …, αₙ₊₁:ₙ₊₁/Σⱼ₌₁ⁿ⁺¹ αⱼ ),
where 0 ≡ αₙ₊₁:₀ ≤ αₙ₊₁:₁ ≤ ··· ≤ αₙ₊₁:ₙ₊₁ are the order statistics of the sample α₁, …, αₙ₊₁ of Exponential (1) rv's.
(12) Yₙᵢ ≡ (n − i + 1)(αₙ:ᵢ − αₙ:ᵢ₋₁),
and now show that (Sₙ:₁, …, Sₙ:ₙ₊₁) has the same distribution as does the vector with ith coordinate [Σⱼ₌₁ⁱ αⱼ/(n−j+2)]/(α₁ + ··· + αₙ₊₁). Thus
(15) ESₙ:ᵢ = [1/(n+1)] Σⱼ₌₁ⁱ 1/(n−j+2),
(16) Cov[Sₙ:ᵢ, Sₙ:ⱼ] = [1/((n+1)(n+2))] { Σₖ₌₁^{i∧j} (n−k+2)⁻² − [1/(n+1)] Σₖ₌₁ⁱ (n−k+2)⁻¹ Σₖ₌₁ʲ (n−k+2)⁻¹ }.
Also show that (Sₙ₁, …, Sₙ,ₙ₊₁) has the same distribution as does the vector whose ith coordinate is
(17) (n − i + 2)[Sₙ:ᵢ − Sₙ:ᵢ₋₁].
is distributed as the sum of n iid Uniform (0, 1) rv's. It thus has mean n/2, variance n/12, and approaches normality very fast as n increases. Show that
(19) Lₙ = 2 Σᵢ₌₁ⁿ⁺¹ i αₙ₊₁:ᵢ / Σⱼ₌₁ⁿ⁺¹ αⱼ − (n+1) = Σᵢ₌₁ⁿ⁺¹ [2i − (n+1)] Sₙ:ᵢ,
and note in this expression that the large Sₙ:ᵢ's are weighted the most heavily.
are Exponential (1) order statistics. As in Exercise 2 or Exercise 8.2.1, the rv's
(22) ξₙ:ᵢ = exp(−αₙ:ₙ₋ᵢ₊₁) = exp( −Σⱼ₌₁^{n−i+1} Yₙⱼ/(n−j+1) )
are Uniform (0, 1) order statistics, and
(23) Sₙᵢ ≡ ξₙ:ᵢ − ξₙ:ᵢ₋₁ = exp( −Σⱼ₌₁^{n−i+1} Yₙⱼ/(n−j+1) ) [ 1 − exp( −Yₙ,ₙ₋ᵢ₊₂/(i−1) ) ] for 1 ≤ i ≤ n+1;
note that (23) represents the spacings as the product of two independent factors.
l i Ynj 1 2 c
( 24 ) E ( n:i—
n+1) — ( 1 n/ 1 n—j+1J n2
Suppose i/n → u and j/n → v as n → ∞. Then the rv's n(Xₙ:ᵢ − Xₙ:ᵢ₋₁) and n(Xₙ:ⱼ − Xₙ:ⱼ₋₁) are asymptotically distributed as independent exponential rv's with means 1/f(a) and 1/f(b), where a ≡ F⁻¹(u) and b ≡ F⁻¹(v).
for 0 < z < 1. [Sweeting (1982) uses this fact to test for outliers.]
Proof of (9). We note that
φ(u) ≡ P(Sₙ₂ ≤ y | Sₙ₁ = u),
so that
(a) P(Sₙ₂ ≤ y | Sₙ₁ ≤ x) = ∫₀ˣ P(Sₙ₂ ≤ y | Sₙ₁ = u) dF_{Sₙ₁}(u)/P(Sₙ₁ ≤ x)
(b) ≤ P(Sₙ₂ ≤ y)   by (26).
Likewise
(27) P(Sₙ₁ ≤ x₁, …, Sₙₖ ≤ xₖ) ≤ Πᵢ₌₁ᵏ P(Sₙᵢ ≤ xᵢ)
and
(28) P(Sₙ₁ > x₁, …, Sₙₖ > xₖ) ≤ Πᵢ₌₁ᵏ P(Sₙᵢ > xᵢ).
Now ξₙ₋₁:₁, …, ξₙ₋₁:ₙ₋₁ divide [0, 1] into the n subintervals [ξₙ₋₁:ᵢ₋₁, ξₙ₋₁:ᵢ] of length Sₙ₋₁,ᵢ for 1 ≤ i ≤ n. Let Sₙ* be the length of the subinterval containing the next observation ξₙ. Then Sₙ* has density
(29) n(n−1) t (1−t)ⁿ⁻² for 0 ≤ t ≤ 1; that is, Sₙ* ≅ Beta (2, n−1).
Proof. Since all Sₙ₋₁,ᵢ ≅ Sₙ₋₁,₁, with density (n−1)(1−t)ⁿ⁻² for 0 ≤ t ≤ 1 by (3.1.90), (29) is immediate from the generalized version of Theorem 1 contained in (30) below.
Let [0, 1] be divided into n disjoint subintervals in such a way that the length Xᵢ of each subinterval is a rv with the same df F for each 1 ≤ i ≤ n. Let ξ be Uniform (0, 1), and independent of the Xᵢ's. Let X* denote the length of the subinterval that contains the point ξ, and let its df be F*. Then F* has a density with respect to F given by
F*(x) = Σᵢ₌₁ⁿ ∫₀ˣ P(I = i | Xᵢ = s) dF(s)
(a) = n ∫₀ˣ s dF(s).
Thus
(31) EX* = ∫₀¹ t dF*(t) = ∫₀¹ t · n t dF(t) = n E X₁²
holds in the general case. (This was taken from a Berkeley preliminary.) ❑
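The length-biasing in (31) is easy to see in a discrete sketch: with fixed interval lengths summing to 1, the probability that the uniform point falls in interval i is exactly its length, so E[X*] = Σᵢ xᵢ², which is n times the average of the xᵢ². The particular lengths below are illustrative.

```python
def mean_length_containing_point(lengths):
    """E[X*] when [0,1] is split into fixed intervals of the given
    lengths (summing to 1) and X* is the length of the interval that
    contains an independent uniform point: P(interval i) = lengths[i],
    so the mean is length-biased."""
    assert abs(sum(lengths) - 1.0) < 1e-12
    return sum(x * x for x in lengths)

lengths = [0.5, 0.3, 0.2]
ex_star = mean_length_containing_point(lengths)
n = len(lengths)
ex2 = sum(x * x for x in lengths) / n   # average squared length
# ex_star == n * ex2, the discrete analog of (31).
```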
We now show that the limiting distributions of the minimal and maximal uniform spacings are exponential and extreme value, respectively.
Note that
since 1„ has mean 1 and variance 1/(n+1). From (21.1.10) we know that
M .
Now
=(n + 1 ) 2 MI I„
n
log((n +l)/e ` )
Also
(n+1)Mₙ − log(n+1)
(e) = [(n+1)Sₙ:ₙ₊₁ − log(n+1)][1 + oₚ(1)] + oₚ(1)
by (a) and (b).
Exercise 2. (The exact distribution of Sₙ:ₙ₊₁) Use (21.1.3) to show that the joint density function of (Sₙ:₁, …, Sₙ:ᵢ) at the point (t₁, …, tᵢ) is ⋯, where the series continues as long as (1 − it) > 0. (See David, 1981, p. 100.) [The exact joint distribution of Sₙ:₁ and Sₙ:ₙ₊₁ is given by Darling (1953).]
[See Rao and Sobel (1980) for generalizations to ith largest and ith smallest
spacings.]
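The classical alternating series for the df of the maximal spacing (the series runs only while 1 − jx > 0, as noted above) can be sketched directly; this is an illustrative implementation, not the book's notation.

```python
from math import comb

def max_spacing_cdf(x, n):
    """P(maximum of the n+1 spacings of n uniform points <= x):
    sum over j of (-1)^j * C(n+1, j) * (1 - j*x)^n,
    with the sum running only while 1 - j*x > 0."""
    total, j = 0.0, 0
    while j <= n + 1 and 1.0 - j * x > 0.0:
        total += (-1) ** j * comb(n + 1, j) * (1.0 - j * x) ** n
        j += 1
    return total
```

Two sanity checks follow from the combinatorics: the df equals 1 at x = 1, and equals 0 at x = 1/(n+1), the smallest possible value of the maximum.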
(2) (1/n) Σᵢ₌₁ⁿ 1_{[Dₙᵢ ≤ y]} ≡ 𝔽ₙ(y) for 0 ≤ y < ∞.
(3) The normalized spacings and the Dₙᵢ of (1), when F in (1) is any exponential df with mean µ, have the same finite-dimensional distributions.
If we are interested in testing that the Xᵢ's are exponential against some alternative F, then
(5) Xᵢ ≡ F⁻¹(ξᵢ) for 1 ≤ i ≤ n.
Theorem 1. If F satisfies (9), then the special construction (5) above satisfies
(11) 𝕌 ≡ U + (Z/µ)(F⁻¹ f ∘ F⁻¹) with Z ≡ ∫₀¹ F⁻¹ dU.
(12) X_ J(F(p.))•
‖𝕌ₙ(F(X̄ₙ ·)) − U(F(µ ·))‖ ≤ ‖𝕌ₙ(F(X̄ₙ ·)) − U(F(X̄ₙ ·))‖ + ‖U(F(X̄ₙ ·)) − U(F(µ ·))‖.
Now γₙ₂ →ₚ 0 also, since U has uniformly continuous sample paths and the mean-value theorem gives
(c) ‖F(X̄ₙ ·) − F(µ ·)‖ = sup_y |X̄ₙ − µ| y f(y µₙ*) = oₚ(1).
But (d) is clear since (9) implies that z f(z) is uniformly continuous on some [0, M], with M chosen so large that both y µ f(y µ) and y µ* f(y µ*) are uniformly small on [M, ∞) with probability exceeding 1 − ε. [Note that (9) is a bit stronger than we actually need.] See Pyke (1965) for a version of this theorem. ❑
with V(0) = 0; we will call it the ordered renewal spacings process. Note that
(14) ^'„=µf(F-') ✓ f1 1
—n J
F- 'f(F-' )
=Xn f(F ')V(F.'— F ']—^(X„— s)
- -
Xnk
cn'
(18) ℤₙ(y) ≡ (1/√n) Σᵢ₌₁ⁿ cₙᵢ [1_{[Xᵢ ≤ X̄ₙ y]} − F(µ y)] for y ≥ 0,
where 𝕎ₙ is the uniform weighted empirical process of the Uniform (0, 1) rv's ξᵢ ≡ F(Xᵢ) and ρₙ(1, c) ≡ c′1/√(1′1 c′c) as in Section 3.1. Thus virtually the same argument yields
(22) 𝕎 ≡ W + Z (F⁻¹ f ∘ F⁻¹) with Z ≡ ∫₀¹ F⁻¹ dU.
[even though these ξₙ:ᵢ are not Uniform (0, 1) order statistics unless the Xᵢ's are exponential], then the quantile process of these ξₙ:ᵢ was considered in Section 12.2.
and
A
(5) 0 = — v,
for0ss,t<_1.
(Note that the 𝔻 process arose in Pettitt's Table 5.6.1, where the distribution of rv's like ∫₀¹ [vₙ(t)]² dt is tabled; see the rows labeled "E".)
Consider the weighted normalized uniform spacings process
where Z ≡ −∫₀¹ [−log(1 − t)] dU(t)
for0<_s,t<_1
provided
(12) ρₙ(1, c) ≡ c′1/√(1′1 c′c) → some ρ₁ as n → ∞.
As in Chibisov's theorem (Theorem 11.5.1) and as in Theorem 11.5.3 we suppose that either
(13) q ∈ Q* and T(q, λ) ≡ ∫₀^{1/2} t⁻¹ exp(−λ q²(t)/t) dt < ∞ for all λ > 0
or
Proof. Consider ℕₙ, and note the identity (21.3.15). Apply Theorem 18.1.3 to the 𝕆ₙ term and either a Taylor expansion or l'Hôpital's rule to the multiplier −(1 − t) log(1 − t) of Z. We leave the rest as an exercise. ❑
Exercise 1. Provide the details of a proof of (15) and (16). See Beirlant et
al. (1982).
Suppose now that we are testing n − 1 data values ξ₁, …, ξₙ₋₁ in [0, 1] to see if they can be regarded as a sample from the Uniform (0, 1) distribution. Thus we have the
Suppose now that the null hypothesis of uniformity is true. If Sₙ₋₁,₁, …, Sₙ₋₁,ₙ denote the n spacings between the data values, then under the null hypothesis
(3) (Sₙ₋₁,₁, …, Sₙ₋₁,ₙ) ≅ (Y₁/Σⱼ₌₁ⁿ Yⱼ, …, Yₙ/Σⱼ₌₁ⁿ Yⱼ) for iid Exponential (1) rv's Y₁, …, Yₙ.
(4) alternative hypothesis: ξ₁, …, ξₙ₋₁ are iid on [0, 1], but are not Uniform (0, 1).
(5) Yᵢ ≅ G, where G(x) ≡ 1 − exp(−x) denotes the Exponential (1) df.
Many statistics designed for testing uniformity against the alternatives (4)
are of the form
(9) Tₙ ≡ ∫₀¹ h d𝕃ₙ,
(10) Ûₙ(t) ≡ (1/√n) Σᵢ₌₁ⁿ [1_{[Dₙᵢ ≤ G⁻¹(t)]} − t] for 0 ≤ t ≤ 1.
and
Proposition 2. We have
Theorem 1. Suppose
and
(18) ∫₀¹ √(t(1−t)) d|φₖ|(t) < ∞ for k = 1, 2, with φₖ ≡ hₖ ∘ G⁻¹,
where
(20) σ² ≡ Var[h(Y)] − {E[(Y−1)h(Y)]}².
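For the choice h(y) = y² (Greenwood's statistic on normalized spacings), the variance formula (20) can be evaluated exactly using the exponential moments E Yᵏ = k!: σ² = (24 − 4) − (6 − 2)² = 4. A small sketch of both the constant and the statistic; the function names are mine.

```python
from math import factorial

def sigma2_greenwood():
    """sigma^2 of (20) for h(y) = y^2 and Y ~ Exponential(1):
    Var[Y^2] - (E[(Y-1)Y^2])^2, using E[Y^k] = k!."""
    var_h = factorial(4) - factorial(2) ** 2      # E Y^4 - (E Y^2)^2
    cross = factorial(3) - factorial(2)           # E Y^3 - E Y^2
    return var_h - cross ** 2

def greenwood(spacings):
    """Greenwood's statistic sum_i (n * S_i)^2 / n for n spacings
    summing to one; equal spacings give the minimal value 1."""
    n = len(spacings)
    return sum((n * s) ** 2 for s in spacings) / n
```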
(21) 1 − Fₙ(y) = (1 − y/n)ⁿ⁻¹ and fₙ(y) = [(n−1)/n](1 − y/n)ⁿ⁻² for 0 ≤ y ≤ n.
Thus
v 1—(1—n) ( _ f/(i
L exp n//J}
2a,2a„ 1 a n a„
^g(Y) n exp
\ n / v \n + 2n^\ 1 nl/
an g(Y)[ 2 e2 v 2]
n
(22) ≤ 15aₙ/[n g(y)] for 0 < y ≤ aₙ when 1 ≤ aₙ ≤ n/2.
^f a
Ih(Y)I
e-Ydy< Ih (a)+a j Ih(Y)I[Ih(Y)I+y4]e-'dy
1/2
+2 f Y e 8 -Y dy I I
by Cauchy-Schwarz and then the c, inequality
≤ (15aₙ/n) ∫₀^∞ |h(y)| e⁻ʸ dy + 10 |h(aₙ)| + aₙ o(1)   by (e)
(f) = O(n^{−1/6}) if we now let aₙ = n^{1/6},
as claimed. ❑
Exercise 2. Minor changes in the proof of Proposition 1 give
(24) √n ‖Fₙ − G‖ = O(n^{−1/6}).
−log(1 − t) = t + t²/2 + t³/3 + ⋯ ≤ t + t² + t³ + ⋯ = t/(1 − t)
(b) ≤ 2t for 0 ≤ t ≤ ½.
Thus
Var U. doII = foO f oO E[vn(s)U„(t)] dcb(s) d(t)
L f."
e z
J {E[v„(t)]
('
:[ L
J B {Constant' t(1—t) }'
o
/Z y (25)
do(t) zb
]
0 do I ?s)Var
<_ [J
(d) P\J o
o / oBLd ]/ czcE3,EZ —E '
∫₀¹ ∫₀¹ K(s, t) dφ(s) dφ(t) = ∫₀¹ ∫₀¹ [s∧t − st] dφ(s) dφ(t)
= Var[h ∘ G⁻¹(ξ)] − { ∫₀^∞ (y − 1) h(y) e⁻ʸ dy }²
(h) = Var[h(Y)] − {E[(Y−1)h(Y)]}²
as claimed. ❑
Exercise 3. (Holst and Rao, 1981) Prove a limit theorem for statistics of the form Σᵢ₌₁ⁿ h(Dₙᵢ, (i − ½)/n)/n.
Let T̃ₙ and 𝕄̃ₙ denote the obvious modifications of the Tₙ of (27) and 𝕄ₙ in the spirit of (12). Such a statistic with cₙᵢ ↗ in i might be appropriate for testing against the
Theorem 2. If (17) and (18) hold, then the Tₙ of (27) and its modification T̃ₙ satisfy ⋯, where
(33) alternative hypothesis: X₁, …, Xₙ are iid F(·/θ) on [0, ∞) for some θ > 0 where 0 < ∫ x² dF(x) < ∞, but F is not Exponential (1).
(34) Tₙ = (1/n) Σᵢ₌₁ⁿ J(i/(n+1)) (Xₙ:ᵢ/X̄ₙ) − ∫₀^∞ x J(F(x)) dF(x).
(35) Tₙ = (1/n) Σᵢ₌₁ⁿ cₙᵢ Xₙ:ᵢ − ∫₀^∞ x J(F(x)) dF(x),
(37) Tₙ →_d T provided Tₙ ⇒ T, and
(38) T =ₐ −∫₀¹ 𝕌 J dF⁻¹ under "certain hypotheses";
Exercise 4. Compute Q 2 .
Now define a function aₙ on [0, 1] by letting aₙ(t) = aₙᵢ for (i−1)/n < t ≤ i/n and 1 ≤ i ≤ n, with aₙ(0) ≡ 0. Show that
(41) Tₙ = (1/n) Σᵢ₌₁ⁿ [Xₙ:ᵢ/X̄ₙ] aₙᵢ
and
(43) ∫₀¹ ( aₙ(t)/[−(1−t) log(1−t)] )² dt
under the null hypothesis.
Devroye (1981) shows that the kth largest spacing (denote it by Kₙ) induced by ξ₁, …, ξₙ₋₁ satisfies both
(1) limsup_{n→∞} (nKₙ − log n)/(2 log₂ n) = 1/k a.s.
and
(3) P(n² Sₙ:ₖ ≤ aₙ i.o.) = 0 or 1 according as Σₙ₌₁^∞ aₙᵏ/n converges or diverges,
(4) P(n² Sₙ:ₖ ≥ aₙ i.o.) = 0 or 1 according as Σₙ₌₁^∞ (aₙ/n) exp(−aₙ) converges or diverges.
CHAPTER 22
Symmetry
As usual, we let 𝔾ₙ, 𝕌ₙ, 𝕎ₙ, and ℝₙ denote the empirical df, the empirical process, the weighted empirical process, and the empirical rank process of ξ₁, …, ξₙ; see Section 3.1. Of course, ⋯
The df of −X₁, …, −Xₙ is ⋯
In fact, (4) holds on (−∞, ∞) unless some ξᵢ achieves a value t satisfying F(x₁) = F(x₂) = t with x₁ ≠ x₂.
Then
and
Thus
+/[Fo119„'(t)—FoU-0(t)] in 11 11
(23) = 𝕌((1+t)/2) + 𝕌((1−t)/2) ≡ ℝ(t)
(25) when F is a continuous and symmetric df, then ℝₙ(1−t), 0 ≤ t ≤ 1, is just a "version" of the partial-sum process of n independent rv's that equal ±1 with probability ½;
see Figure 1. Note that Sₙ ∘ Hₙ⁻¹(1−t), 0 ≤ t ≤ 1, is exactly the partial sum of n independent rv's that equal ±1 with probability ½; however, our reasons for
Figure 1. Plot of X₁, …, Xₙ.
Suppose Xₙ₁, …, Xₙₙ [with Xₙᵢ = F⁻¹(ξₙᵢ) as in Section 3.1] are iid F, where ⋯
for the empirical df ℍₙ of |Xₙ₁|, …, |Xₙₙ|. It is immediate from (3) and (22.1.13) that ⋯. It is also clear from (3) and (22.1.3) that (see Schuster, 1973) ⋯.
These results can be used to test whether the true df is a specified df F. Also, confidence bands based on ‖𝔽̄ₙ − F‖ are exactly twice as wide as those based on ‖𝔽̂ₙ − F‖. Likewise, use (22.1.6) again,
To test symmetry without specifying the true df, it is natural to reject for large values of the statistic
(12) Tₙ ≡ ∫ Sₙ² dℍₙ.
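The statistic (12) can be sketched by evaluating Sₙ at the points |Xᵢ| and averaging, which is what integrating against the empirical df ℍₙ amounts to. This is an illustrative implementation with my own names; for a sample that is exactly antisymmetric as a set, Sₙ vanishes identically and Tₙ = 0.

```python
def symmetry_statistic(x):
    """T_n = integral of S_n^2 dH_n, where
    S_n(t) = (1/sqrt n) * (#{X_i <= t} - #{-X_i <= t}) and H_n is the
    empirical df of |X_1|, ..., |X_n|; the dH_n integral is the
    average of S_n^2 over the points |X_i|."""
    n = len(x)
    neg = [-v for v in x]

    def s_n(t):
        return (sum(v <= t for v in x) - sum(v <= t for v in neg)) / n ** 0.5

    return sum(s_n(abs(v)) ** 2 for v in x) / n
```

A clearly right-skewed sample (all observations positive, say) produces a strictly positive value, which is what the rejection rule exploits.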
as n → ∞. Thus ⋯
where 𝕊(1−t), 0 ≤ t ≤ 1, is the Brownian motion of (22.1.19). The percentage points of T/4 are given in Table 1 from Orlov (1972), with the two adjustments indicated by Koziol (1980a).
Table 1
Limiting Distribution of the Symmetry Statistic Tₙ (from Orlov (1972))
(14) T ≅ ∫₀¹ 𝕊²(t) dt ≅ 4 Σⱼ₌₁^∞ Zⱼ²/[(2j−1)π]²
for iid N(0, 1) rv's Z₁, Z₂, …. (Recall the methods of Chapter 5.)
Exercise 3. (Srinivasan and Godio, 1974) Let Pₖ (let Nₖ) denote the number of positive (of negative) Xᵢ's for which |Xᵢ| ≤ |X|ₙ:ₖ. Define Sₙ by 4Sₙ ≡ Σₖ₌₁ⁿ (Nₖ − Pₖ)². Show that ⋯
These authors table the exact distribution of Sₙ for 10 ≤ n ≤ 20. They suggest that the asymptotic approximation (16) should give at least two-digit accuracy for n > 20.
Show that while Tₙ and T̃ₙ are both invariant under the transformation x → −x, Tₙ + T̃ₙ also has the desirable property of being invariant under the transformation x → 1/x. Thus
Components of S.
„]
(19)
where
with the fⱼ's orthonormal on [0, 1], and where Z₁*, Z₂*, … are iid N(0, 1). Now verify that the series in (22) converges uniformly and absolutely on [0, 1].
(24) Zn `
=^^ J;(t) 7 1(t) dtv
o
Verify that the Zₙ₁*, Zₙ₂*, … are uncorrelated with means 0 and variances 1, and that they are asymptotically independent N(0, 1). Then use integration by parts to verify that
(26) Zₙ₁* = −(√2/√n) Σᵢ₌₁ⁿ [sign(Xᵢ)] cos( π(1 + Rₙᵢ/n)/2 ),
where J₁*(t) equals √2 cos(π(1−t)) or −√2 cos(π t) according as 0 ≤ t ≤ ½ or ½ ≤ t ≤ 1. (Use the results of Section 22.4.) Koziol's consideration of Zₙ₁* is based on the fact that the first component usually contributes most of the power; he finds that this test has good power against location shifts.
is to be tested against
(2) Q„: X ,, ... , X„„ are independent with densities f ,, ... 'f,,,,.
As in Section 4.1, suppose
(5) δ ∈ L₂(μ),
and
(6) [max_{1≤i≤n} a²_{ni} / ∑_{j=1}^{n} a²_{nj}] → 0
as n → ∞.
Theorems 4.1.1 and 4.1.2 show that the natural extension of the null absolute empirical process, namely
For S, we really just combine the bracketed results of (10) with a plus sign
instead of a minus sign. Thus, Theorems 4.1.1 and 4.1.2 show that the natural
extension of the null S n , namely
(11) S_n(x) ≡ (1/√n) ∑_{i=1}^{n} {1_{[X_{ni} ≤ x]} − 1_{[−X_{ni} ≤ x]}} for −∞ < x < ∞
converges in ‖ ‖-norm
for the special construction of Section 3.1, if μ = Lebesgue.
Exercise 1. Prove (13) for S_n. Hint: Follow the outline of the proof of Theorem 4.1.2. There is only one real difference: since the mean is not 0, we cannot put in the centering term in step (f) for free; however, a canceling term (F is symmetric) must be put in for F̄.
where we suppose that the a_{ni}'s are u.a.n. as in (4) and that
Likewise,
n
Exercise 2. On the one hand we have proved (12) and (13) under (1)-(6).
Now replace (1) by (4.6.1) and obtain the natural extension of Theorem 4.6.1.
That is, extend (12) and (13) to processes of residuals from a general linear
model.
T_n ≡ (1/√n) ∑_{i=1}^{n} sign(X_{ni}) J(R_{ni}/(n + 1)),
where X_{n1}, ..., X_{nn} are iid with continuous and symmetric df F and J is a score function on [0, 1] such as J(t) = t or J(t) = cos(π(1 + t)/2). Of course, the R_{ni} and the D_{ni} are the ranks and antiranks, respectively, of the |X_{ni}|.
Note that
(3) T_n = ∫₀¹ J dR_n.
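The signed rank statistic above is simple to compute directly. A minimal sketch (my illustration, not from the text; the normalization T_n = n^(−1/2) ∑ sign(X_i) J(R_i/(n+1)) and the Wilcoxon-type score J(t) = t are the assumptions here):

```python
import numpy as np

def signed_rank_stat(x, J=lambda t: t):
    # T_n = n^(-1/2) * sum_i sign(X_i) * J(R_i/(n+1)), where R_i is the
    # rank of |X_i| among |X_1|, ..., |X_n| (no ties assumed).
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(np.abs(x))) + 1
    return float(np.sign(x) @ J(ranks / (n + 1))) / np.sqrt(n)

# Under symmetry about 0 the statistic is asymptotically N(0, 1/3) for J(t) = t.
rng = np.random.default_rng(0)
t_n = signed_rank_stat(rng.normal(size=1000))
```

For J(t) = t this is a normalized Wilcoxon signed-rank statistic, matching the N(0, ⅓) limit noted below.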
754 SYMMETRY
Also, recall from (22.1.25) that if X_{ni} = F^{−1}(ξ_{ni}), then (in fact)
(6) R_n(t) = U_n((1 + t)/2) − U_n((1 − t)/2).
Moreover, the empirical processes √n(F_n − F) and v_n of X_{n1}, ..., X_{nn} and ξ_{n1}, ..., ξ_{nn}, respectively, satisfy
These last two formulas are of interest since they allow an asymptotic representation of T_n on the special probability space of Section 3.1; see (11) below.
satisfies
(10) T ≡ ∫₀¹ J dR.
Hint: Mimic the proof of Theorem 3.1.2 given in Section 3.4. Note that (4)
makes calculation of variances easy.
(ii) Now note that the T of (10) satisfies
T = ∫₀¹ J(t) dR(t) = ∫₀¹ J(t) d[U((1 − t)/2) + U((1 + t)/2)]
 = −∫₀^{1/2} J(1 − 2s) dv(s) + ∫_{1/2}^{1} J(2s − 1) dv(s)
(11) = ∫₀¹ J* dU
SIGNED RANK STATISTICS UNDER SYMMETRY 755
Figure 1.
(12) J*(t) = −J(1 − 2t) for 0 ≤ t < ½,
  J*(t) = J(2t − 1) for ½ ≤ t ≤ 1;
equivalently, J(t) = J*((1 + t)/2) for 0 ≤ t ≤ 1.
(13) T ≅ N(0, σ²)
with
(15) W_n →_d W ≡ ∫₀¹ R(t) dt ≅ N(0, ⅓).
Contiguous Alternatives
(16) Z = f dw° J
W cpo d ° with cpo
o `l /f(F )
' o ✓.%(F-5)
see (4.1.18) and Theorem 4.1.4 (Le Cam's third lemma). Note that
(17) Cov[T, Z] = ρ(a, 1) ∫₀¹ J* φ₀ dt provided ρ_n(a, 1) → some ρ(a, 1)
as n → ∞.
Fixed Alternatives
(20) J J(H)d[U(F)+v(1—F)]
[U(F)—U(1—F)]J'(H)d(F—F)
+J 0
(iii) Of course, the results of (i) and (ii) can be added together if J has
both types of components.
5. ESTIMATING AN UNKNOWN POINT OF SYMMETRY
Suppose that
(1) F is symmetric about 0 and I₀(f) ≡ ∫ (f′/f)² f dx < ∞.
Let X_{n1}, ..., X_{nn} be iid F_θ ≡ F(· − θ) with θ unknown; that is, X_{ni} = F_θ^{−1}(ξ_{ni}). Suppose also that
Note that
(4) T_n(b) is ↘ in b.
Define
where the right-hand side of (6) is also ↘ in b. With √n(θ̂_n − θ) → b, Eq. (6) suggests (see Exercise 1) that
(8) θ̲_n ≡ sup {b: T_n(b) < −t_n^{(α/2)}} and θ̄_n ≡ inf {b: T_n(b) > t_n^{(α/2)}}.
(9) 1 − α ≈ P_θ(θ̲_n ≤ θ ≤ θ̄_n),
so that [θ̲_n, θ̄_n] is a (1 − α) · 100% confidence interval for θ. Then (6) again suggests (see Exercise 1) that
and
i i
(I1) In(@-8}
a
— z (a/z) Uj — f J* J*(Gods];
JdU]
o[—J0
Remark 1. For our statistics T_n one can show that T_n →_p (some T₀), for the special construction of Theorem 3.1.1, under the null hypothesis. Then for the contiguous alternatives of (4.1.4)-(4.1.6) one immediately has
The monotonicity of both T_n(b) and the rhs of (13) immediately yields the asymptotic linearity
Exercise 1. Using the technique of Section 20.3, establish (7) and (10)-(14).
Extend this to location type regression situations.
ESTIMATING THE DF OF A SYMMETRIC DISTRIBUTION 759
Estimators Based on Integral Test of Fit
(16) √n(θ̂_n − θ) →_d ∫ f²(x) U(2F(x) − 1) dx / ∫ f³(x) dx.
Suppose that
where F̃_n(x) ≡ 1 − F_n((2θ̂_n − x)−) is the empirical df of the 2θ̂_n − X_{ni}'s. By way of regularity of F, we assume
(Schuster, 1975)
as n → ∞.
Exercise 3. Suppose (1) and (4) hold. Suppose Var[θ̂_n] < ∞ and √n(θ̂_n − θ) →_d N(0, σ²). Then
(9) 2F(xe)-0.2f2(xo)•
(11) v'i(f* —F0 ),B(2F-1) +fO T(U) in ' under (1), (4), and (10)
for the Brownian bridge B(t) = v((1 + t)/2)+v((1— t)/2) of (22.1.14) and
(22.2.8).
Exercise 4. Suppose either (i) F is normal and θ̂_n = X̄_n, (ii) F is double exponential and θ̂_n is either X̄_n or the median X̃_n, or (iii) F is Cauchy and θ̂_n is the MLE. Show that the expression in (9) is ≥ 0 for all x in all the special cases described here. Can you show it more generally?
Estimation of F
√n[F̂_n(x) − F(x)]
 = (√n/2)[F_n(x + θ̂_n) + F̃_n((x − θ̂_n)−) − F_θ(x + θ) − 1 + F_θ(θ − x)]
by symmetry of F. Thus
(15) ‖√n(F̂_n − F) − B(2F − 1)/2‖ →_p 0 as n → ∞ under (1) and (4).
Comparing with (22.2.8), we see that we can estimate F equally well whether θ is known or unknown. This is not the case when estimating F_θ, as (11) shows.
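The symmetrized estimator behind (15) is easy to sketch numerically. The following is my illustration, not code from the text: it averages the edf with its reflection about an estimated center, taking θ̂_n to be the sample median (any √n-consistent estimate would do):

```python
import numpy as np

def symmetrized_cdf(data, theta_hat, t):
    # Fhat(t) = [F_n(t) + 1 - F_n((2*theta_hat - t)-)] / 2:
    # average the edf with its reflection about the estimated center.
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    Fn_t = np.searchsorted(x, t, side="right") / n                    # F_n(t)
    Fn_left = np.searchsorted(x, 2 * theta_hat - t, side="left") / n  # F_n(u-)
    return 0.5 * (Fn_t + 1.0 - Fn_left)

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, size=2000)        # symmetric about theta = 3
theta_hat = float(np.median(data))
est = symmetrized_cdf(data, theta_hat, 3.0)  # estimates F_theta(3) = 1/2
```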
X; +X
_– n
z21IX, – X;I+ ni I2
—
I ^–o.
;,;
TT-*,T=
11' .
[R(2F-1)]2dx
f 2 (x) dx}/Qc
—[ J o' 68(2t-1)dt ]
w
(19) 7 . _ Z2j-1 _ Z
j = 2 1(2 . 1) = 1 (2i+1)(i+1)
for iid N(0, 1) rv's Z 1 , Z2 ,.... Note that for logistic F we have
_ z 2 _,
(20) ; 'j(j -1 / 2 )'
while
' i
{21} j I8{2t-1} dt] f2(x) dx=2Z;
JJo
and aQ=2.
CHAPTER 23
Further Applications
Let X_n = (X₁, ..., X_n) where the X_i's are iid F. Let F_n denote the empirical df of these observations. Suppose now that the distribution of some rv R(X_n, F) depending on both X_n and F is desired. The bootstrap principle of Efron (1979) suggests approximating the distribution of R(X_n, F) by that of R(X*_n, F_n), where X*_n = (X*₁, ..., X*_n) with the X*_i's being an independent random sample from the df F_n. It is possible to obtain as many independent values of R(X*_n, F_n) as desired by Monte Carlo sampling, and thus estimate the distribution of R(X*_n, F_n) to whatever accuracy is desired. This distribution is then used as an approximation to the distribution of R(X_n, F).
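The Monte Carlo recipe just described can be sketched as follows for R(X_n, F) = √n‖F_n − F‖ (a sketch only; the exponential sample, the sample size, and the replication count are my arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.exponential(size=n)   # observed X_1, ..., X_n; F itself is never used below
xs = np.sort(x)

def sup_dist(sample, base_sorted):
    # sqrt(n) * sup_x |F*_n(x) - F_n(x)|; the sup is attained at jump
    # points, so it suffices to check both one-sided limits there.
    m = len(base_sorted)
    s = np.sort(sample)
    right = np.abs(np.searchsorted(s, base_sorted, side="right") / m
                   - np.arange(1, m + 1) / m)
    left = np.abs(np.searchsorted(s, base_sorted, side="left") / m
                  - np.arange(0, m) / m)
    return np.sqrt(m) * max(right.max(), left.max())

# Bootstrap replicates of R(X*_n, F_n) approximate the law of R(X_n, F).
boot = np.array([sup_dist(rng.choice(x, size=n, replace=True), xs)
                 for _ in range(500)])
crit = float(np.quantile(boot, 0.95))
```

The 95% point of the bootstrap distribution should sit near the Kolmogorov critical value ≈ 1.358, in line with the asymptotic correctness discussed next.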
The fact that this procedure is asymptotically correct in many situations is well established in the literature. Here we simply show that the empirical process √n(F_n − F) itself can be bootstrapped; this result is due to Bickel and Freedman (1981), but the following simple proof is from Shorack (1982b).
Let X₁, ..., X_n be iid F with empirical df F_n. Let X*₁, ..., X*_N be iid F_n with empirical df F*_N. We will assume that a special construction of the X*'s has been used. Thus X*_{Ni} = F_n^{−1}(ξ_{Ni}) where ξ_{N1}, ..., ξ_{NN} are iid Uniform (0, 1) rv's whose empirical process U*_N satisfies ‖U*_N − U‖ →_{a.s.} 0 as N → ∞ for a Brownian bridge U. Thus
‖√N(F*_N − F_n) − U(F)‖
 = ‖U*_N(F_n) − U(F)‖
 ≤ ‖U*_N(F_n) − U(F_n)‖ + ‖U(F_n) − U(F)‖.
From this we can draw the following conclusion, which does not depend on
a special construction for its validity.
Theorem 1. For a.e. infinite sequence X 1 , X2 ,.. . , X,,, ... , relative to the
conditional distribution given X 1 ,. . . , X„, we have
2. SMOOTH ESTIMATES OF F
and suppose b_n ↓ 0. If X₁, ..., X_n are iid F with density f = F′, then
(3) F̂_n(x) ≡ ∫_{−∞}^{x} f̂_n(t) dt = ∫_{−∞}^{∞} K((x − y)/b_n) dF_n(y).
If we let ξ₁, ..., ξ_n be iid Uniform (0, 1) rv's and construct the X_i's as X_i = F^{−1}(ξ_i) for i ≥ 1, then U_n(F) = √n[F_n − F] and a natural question arises: How close are the processes X_n and U_n(F)? The following theorem gives an example of this type of result:
Theorem 1. Suppose that ‖f‖_∞ < ∞ and that f has derivative f′ with ‖f′‖_∞ < ∞. If k satisfies (1), then, with c_n → ∞,
(a) X_n(x) = √n[F̂_n(x) − EF̂_n(x) + EF̂_n(x) − F(x)]
 = ∫ K((x − y)/b_n) dU_n(F(y)) + √n(F̄_n(x) − F(x))
 = −∫ U_n(F(y)) d_yK((x − y)/b_n) + √n(F̄_n(x) − F(x))
 = ∫ U_n(F(x − s b_n)) dK(s) + √n(F̄_n(x) − F(x)),
where
(b) F̄_n(x) ≡ EF̂_n(x) = ∫ K((x − y)/b_n) dF(y).
Thus
X_n(x) − U_n(F(x)) = ∫ [U_n(F(x − s b_n)) − U_n(F(x))] dK(s) + √n(F̄_n(x) − F(x))
 ≡ R_{1n}(x) + R_{2n}(x).
Now with c = c_n,
‖R_{1n}‖ ≤ ‖ ∫_{[−c,c]} [U_n(F(· − s b_n)) − U_n(F(·))] dK(s) ‖ + 2‖U_n‖ K([−c, c]^c)
 ≤ ω_{U_n}(a_n) + 2‖U_n‖ K([−c, c]^c).
Also,
R_{2n}(x) = √n ∫ [F(x − s b_n) − F(x)] k(s) ds
 = −√n b_n f(x) ∫ s k(s) ds + (√n b_n²/2) ∫ s² f′(z_s) k(s) ds
 = (√n b_n²/2) ∫ s² f′(z_s) k(s) ds since ∫ s k(s) ds = 0,
so that
(e) ‖R_{2n}‖ ≤ (√n b_n² ‖f′‖_∞/2) ∫ s² k(s) ds
as n → ∞.
Proof. This follows immediately from Theorem 1, Theorem 14.2.1, and
Theorem 13.1.1. ❑
Exercise 1. If k is the standard normal kernel k(x) = (2π)^{−1/2} exp(−x²/2), find c_n and b_n which minimize the bound on the right-hand side of (6). (Claim: the "right" b_n is b_n = n^{−1/3}.)
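The smooth estimate (3) with the normal kernel of Exercise 1 can be sketched as follows (my illustration; the default bandwidth simply adopts the b_n = n^(−1/3) claimed in the exercise):

```python
import numpy as np
from math import erf, sqrt

def smoothed_cdf(data, t, b=None):
    # Fhat_n(t) = (1/n) * sum_i K((t - X_i)/b_n), where K is the N(0,1) df
    # and b_n = n^(-1/3) is the bandwidth claimed in Exercise 1.
    x = np.asarray(data, dtype=float)
    n = len(x)
    if b is None:
        b = n ** (-1.0 / 3.0)
    u = (t - x) / b
    return float(np.mean([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in u]))

rng = np.random.default_rng(3)
data = rng.normal(size=1000)
val = smoothed_cdf(data, 0.0)   # estimates F(0) = 1/2
```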
3. THE SHORTH
(3) n'/3(0)—,ABT
where
(a) G' 10 F.
G=F ',g-G'= and g'=G "= —( foF
1)3.
Our first task is to determine which (1 − 2α)-fraction of the data is the shortest one. To this end we define
(6) M_n(t) ≡ n^{2/3}[ X_{n:(n(1−α+At/n^{1/3}))} − X_{n:(n(α+At/n^{1/3}))} − (F^{−1}(1 − α) − F^{−1}(α)) ].
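Finding the shortest (1 − 2α)-fraction itself is a one-pass scan over windows of consecutive order statistics. A minimal sketch (my code, not the text's; returning the mean of the shortest window is one common choice of functional):

```python
import numpy as np

def shorth(data, alpha=0.25):
    # Scan all windows of k = ceil((1 - 2*alpha)*n) consecutive order
    # statistics, keep the shortest, and return the mean of that window.
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    k = int(np.ceil((1 - 2 * alpha) * n))
    widths = x[k - 1:] - x[: n - k + 1]   # widths of all candidate windows
    i = int(np.argmin(widths))
    return float(x[i: i + k].mean())

rng = np.random.default_rng(4)
est = shorth(rng.normal(loc=1.0, size=4000))  # n^(1/3)-rate estimate of the center
```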
We note that for any d >0 the Hungarian construction of the process
satisfies (see Theorems 18.2.1 and 12.2.2, but note that we are concerned here
only with the interval [—d, d])
→_{a.s.} 0 as n → ∞.
Moreover,
I sup I +nt31n'
/6t3„(1—a+n
t
r At )
—g(1—a)n' /6 B„tl—a+ l
n /3
[The reader is asked to verify the O(1) term of line (h) in Exercise 1 below.] Since the M_{2n} term is analogous to the M_{1n} term, we have thus shown that
— B„ a+ _ n (a) }
L \ n 3/ J
= sup_{|t| ≤ K} |M_{3n}(t)|
(m) →_{a.s.} 0 as n → ∞
then, asymptotically, n^{1/3}T_n/A plays the role of t in (6). Thus (7) shows that n^{1/3}T_n/A behaves asymptotically like the value τ_n that minimizes
We thus have
(8) n^{1/3} T_n / A →_d τ,
where τ is defined by
(p) (1 − 2α) n^{1/3} B_n = n^{1/3} ∫₀^{1−2α} X_{n:n(α+τ_n+u)} du + o_p(1)
 = n^{1/3} ∫₀^{1−2α} [F_n^{−1}(α + τ_n + u) − F^{−1}(α + τ_n + u) + F^{−1}(α + τ_n + u) − F^{−1}(α + u)] du + o_p(1)
(q) = o_p(1) + n^{1/3} ∫₀^{1−2α} [F^{−1}(α + τ_n + u) − F^{−1}(α + u)] du + o_p(1)
 = n^{1/3} τ_n F^{−1}(α + u) |₀^{1−2α} + o_p(1)
(r) = n^{1/3} τ_n 2F^{−1}(1 − α) + o_p(1).
Thus
Thus we see that the constant A is inherent to this type of estimator, while
the other constant depends on what functional is applied to the shortest
(1 —2a) -fraction of the sample.
(1) T_n ≡ (n choose m)^{−1} ∑_c h(X_{i₁}, ..., X_{i_m}),
where h is a real symmetric function (the kernel) of m variables and ∑_c denotes the summation over all (n choose m) combinations of m distinct elements {i₁, ..., i_m} chosen from {1, ..., n}. T_n is called a U-statistic. We suppose that
(2) θ ≡ Eh and Eh²(X₁, ..., X_m) < ∞,
and let
We let
provided (2) and (4) hold. If T̃_n is a second such statistic with kernel h̃, then (T_n, T̃_n) are jointly normal with covariance
(6) Cov[E(h(X₁, ..., X_m) | X₁), E(h̃(X₁, ..., X_m) | X₁)].
where
(8) T*_n ≡ (m/√n) ∑_{i=1}^{n} [h₁(X_i) − θ] with h₁(x) ≡ E(h(X₁, ..., X_m) | X₁ = x).
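Definition (1) can be computed by brute force over all m-subsets. A small sketch (my example, not from the text): with the kernel h(x, y) = (x − y)²/2, whose mean is Var(X), the U-statistic coincides exactly with the usual unbiased sample variance.

```python
import numpy as np
from itertools import combinations

def u_statistic(x, h, m=2):
    # T_n = (n choose m)^(-1) * sum of h over all m-subsets, as in (1).
    return float(np.mean([h(*c) for c in combinations(x, m)]))

# Kernel h(x, y) = (x - y)^2 / 2 has Eh = Var(X); its U-statistic is the
# unbiased sample variance.
h = lambda a, b: 0.5 * (a - b) ** 2

rng = np.random.default_rng(5)
x = rng.normal(size=300)
t_n = u_statistic(x, h)
```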
Since Ĥ_n(x) and Ĥ_n(y) are U-statistics whose kernels are indicator functions with means H(x) and H(y) and finite variances, the joint normality above clearly gives the finite-dimensional convergence
(13) Z_n →_{f.d.} Z as n → ∞,
CONVERGENCE OF U- STATISTIC EMPIRICAL PROCESSES 773
where Z is a 0-mean normal process with
of the rv's h(X11 , ... , X,,) and of the normal process Z above.
Proof. In this first paragraph we present the key idea. Let a = (a(1), ..., a(n)) denote an arbitrary permutation of (1, ..., n), and define iid rv's
(17) Y_{j+1} ≡ h(X_{a(jm+1)}, X_{a(jm+2)}, ..., X_{a(jm+m)}), j = 0, 1, ..., (n/m) − 1.
(18) H^a_n ≡ (n/m)^{−1} ∑_{j=1}^{(n/m)} 1_{[Y_j ≤ ·]};
here U^a_{(n/m)} is the uniform empirical process of ξ₁, ..., ξ_{(n/m)}, where ξ_j ≡ H(Y_j) is the associated continuous rv of (3.2.48). Note that
The interesting feature of (20) is the fact that although the H^a_n are dependent, each H^a_n is the empirical df of an iid sample of size (n/m). Now note that
where
(22) W_n ≡ (1/n!) ∑_{all a} U^a_{(n/m)}.
[Recall that the definition of the Y's involves extraneous Uniform (0, 1) rv's. However, the key equation Z_n = W_n(H) of (21) shows that almost all sample paths of Z_n are totally unaffected by these extraneous rv's. Thus we can introduce them in the most convenient manner possible. We now specify that the extraneous rv's introduced for permutation a will be totally independent of those introduced for a* whenever a ≠ a*. Note that this does have an effect on the sample paths of W_n, even though it does not have an effect on the sample paths of Z_n. It is simply that Z_n = W_n(H) does not "look in on" W_n at any points where there is an effect.] Using ω_Z to denote the modulus of continuity of the process Z, we have
since the sup of a sum does not exceed the sum of the sups. Now Chebyshev's
inequality gives
We also have
If s and t are in the range of H, then Cov[W(s), W(t)] is of the U-statistic form (14); while on each interval [H_−(d), H(d)] associated with a point of discontinuity d of H the process W behaves like Brownian motion conditioned to match up with the other part of W at H_−(d) (this is a result of how we choose our extraneous rv's above). Thus W_n ⇒ W on (D, 𝒟, ‖ ‖) by Theorem 2.3.1. A Skorokhod construction as in the proof of Theorem 3.1.1 now finishes this proof. ❑
Exercise 2. (Serfling, 1984) Let N ≡ (n choose m), and let W_{n:1} ≤ ··· ≤ W_{n:N} denote the ordered values of h(X_{i₁}, ..., X_{i_m}). Consider the generalized L-statistic
(29) T_n ≡ ∑_{i=1}^{N} c_{ni} W_{n:i}
RELIABILITY AND ECONOMETRIC FUNCTIONS 775
for known u.a.n. constants c_{n1}, ..., c_{nN}. Establish conditions under which T_n is asymptotically normal.
(2) L(t) ≡ ∫₀^t F^{−1}(s) ds / ∫₀¹ F^{−1}(s) ds = (1/μ) ∫₀^t F^{−1}(s) ds, 0 ≤ t ≤ 1,
and
(3) H^{−1}(t)/μ ≡ (1/μ) ∫₀^{F^{−1}(t)} (1 − F(x)) dx, 0 ≤ t ≤ 1.
initial tth fraction of the population. While e and H^{−1}/μ have been used
Exponential F:
e(x) = μ for 0 ≤ x < ∞,
(4) L(t) = t + (1 − t) log (1 − t) for 0 ≤ t ≤ 1,
(1/μ) H^{−1}(t) = t for 0 ≤ t ≤ 1,
1], and hence L(t) ≤ t for all 0 ≤ t ≤ 1. If F is absolutely continuous with density function f = F′, then
(d/dt) H^{−1}(t) = 1/(λ ∘ F^{−1}(t))
for almost every 0 < t < 1, where λ ≡ f/(1 − F) is the hazard rate. Thus H^{−1} is concave
Hall and Wellner (1981); for more about H^{−1} see Chandra and Singpurwalla
(1978) or Barlow and Campo (1975). Goldie (1977) gives a brief review of
applications of L including distributions of scientific grants and publishing
productivity of scientists.
These three transforms of F are all closely related:
and, if F is continuous,
(a) e(x)(1 − F(x)) = E X 1_{(x,∞)}(X)
 = μ − E X 1_{(0,x]}(X)
 = μ − ∫_{(0,F(x)]} F^{−1}(t) dt by Theorem 1.1.1
 = μ(1 − L(F(x))).
(b) H^{−1}(t) = ∫₀^{F^{−1}(t)} (1 − F(x)) dx = ∫₀^{F^{−1}(t)} ∫_{(x,∞)} dF(y) dx
 = ∫ [F^{−1}(t) ∧ y] dF(y)
 = ∫_{(0,F^{−1}(t)]} y dF(y) + F^{−1}(t)(1 − t),
where the last two equalities follow from continuity of F and Proposition 1.1.1. ❑
(8) e_n(x) ≡ ∫_x^∞ (1 − F_n(y)) dy / (1 − F_n(x)), −∞ < x < ∞,
(9) L_n(t) ≡ (1/X̄) ∫₀^t F_n^{−1}(s) ds, 0 ≤ t ≤ 1,
(10) H_n^{−1}(t)/X̄ ≡ (1/X̄) ∫₀^{F_n^{−1}(t)} (1 − F_n(x)) dx, 0 ≤ t ≤ 1.
Note that
H_n^{−1}(k/n) = ∑_{i=1}^{k} ((n − i + 1)/n)(X_{n:i} − X_{n:i−1}) = (1/n)[ ∑_{i=1}^{k} X_{n:i} + (n − k)X_{n:k} ].
After a little bit of algebra it is easily seen that the corresponding limit processes are
(15) Z ≡ (1/μ){ ∫₀^t v dF^{−1} − L(t) ∫₀¹ v dF^{−1} }
(16) T=Z+(1—I)Z'
_ ( 1
=z+ (1 µ I) {— f 1 _ I U +L' UdF - ' .
l °F o J
The following theorem gives convergence of the processes R_n, Z_n, and T_n to R, Z, and T, respectively. Under slightly more stringent hypotheses the R_n part is due to Wang (1978) and Hall and Wellner (1979), the convergence of Z_n was proved by Goldie (1977), and convergence of T_n was established by Barlow and Proschan (1977). The theory of these processes has been given a unified treatment by Csörgö et al. (1983).
We suppose that the X's have been constructed as X_i = F^{−1}(ξ_i) where the ξ's are the Uniform (0, 1) rv's of Theorem 3.1.1 having empirical process U_n satisfying (3.1.60), and that the limit processes (14)-(16) are defined in terms of U of Theorem 3.1.1.
If, in addition, f = F′ is continuous and > 0 on the open support of F and ‖(1 − I)q/(f ∘ F^{−1})‖ < ∞ for some q in Q* satisfying (11.5.3), then
Now define
σ²(x) ≡ Var[X − x | X > x]
and set
(20) G(x) ≡ F̄(x)σ²(x)/e²(x).
Then
(24) 1 − F(x) = (e(0)/e(x)) exp( −∫₀^x (1/e) dI ).
Exercise. If K̄(x) ≡ ∫ F̄ dI/μ, show that K̄(x) = exp(−∫ (1/e) dI) and K has hazard rate k/K̄ = 1/e.
the cumulative total time on test transform, and let G(F) ≡ (½ − ∫₀¹ L dI)/(½) be the Gini index. Give a geometric interpretation of G(F). Show that
G(F) = (1/(2μ)) ∫∫ |x − y| dF(x) dF(y) = E|X − Y|/(2μ),
T(F) ≡ (2/μ) ∫₀¹ (1 − t)F^{−1}(t) dt,
and
G(F) = 1 − T(F).
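The mean-difference form of the Gini index can be checked numerically against its brute-force definition. A sketch (my code; it uses the standard order-statistic identity ∑_{i<j}(x_(j) − x_(i)) = ∑_i (2i − n − 1) x_(i)):

```python
import numpy as np

def gini(x):
    # G = E|X - Y| / (2*mu), via the order-statistic identity
    # sum_{i<j} (x_(j) - x_(i)) = sum_i (2i - n - 1) * x_(i).
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    mean_abs_diff = 2.0 * float(np.sum((2 * i - n - 1) * x)) / (n * n)
    return mean_abs_diff / (2.0 * x.mean())

def gini_bruteforce(x):
    # Direct double sum over all pairs, for comparison.
    x = np.asarray(x, dtype=float)
    return float(np.abs(x[:, None] - x[None, :]).mean()) / (2.0 * x.mean())

data = np.random.default_rng(6).exponential(size=500)
g = gini(data)   # population value for exponential F is 1/2
```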
Exercise 9. Use (17) to show that ‖(U_n − U)ρ‖₀^{c_n} →_p 0 for some sequence c_n → ∞ as n → ∞.
CHAPTER 24
Large Deviations
0. INTRODUCTION
1. BAHADUR EFFICIENCY
then
(2) L_n = L_n(X₁, ..., X_n) ≡ p_n(T_n) is the p-value
actually obtained. Bahadur defines {T_n: n ≥ 1} to have exact slope c(θ), when θ is true, if
(3) −(1/n) log L_n → ½ c(θ) a.s. P_θ.
Note that (3) implies that a plot of (n, −2 log L_n) a.s. tends to look, for large n, like a half-line through the origin having slope c(θ). For 0 < ε < 1 Bahadur defines
(4) N(ε) = N_ε(X₁, X₂, ...) ≡ min {m: L_n(X₁, ..., X_n) < ε for all n ≥ m},
 = ∞ if no such m exists.
Thus, N(e) is the minimum sample size required for the T„-test to become
and stay significant at level e.
(5) N(ε)/(2 log (1/ε)) → 1/c(θ) as ε → 0, a.s. P_θ.
Suppose T_{1n} and T_{2n} are two different sequences of tests, and suppose they have finite and positive exact slopes c₁(θ) and c₂(θ) when θ is true. Then
N₁(ε)/N₂(ε) → c₂(θ)/c₁(θ);
we call e₁₂(θ) ≡ c₁(θ)/c₂(θ) the Bahadur efficiency of the T_{1n}-test with respect to the T_{2n}-test when θ is true. It is thus of some importance to verify (3).
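The convergence in (3) is easy to watch numerically. A sketch (entirely my illustration: the one-sided normal-mean z-test is not from the text; its exact slope is c(θ) = θ², since log(1 − Φ(t)) ~ −t²/2 for large t):

```python
import numpy as np
from math import erfc, log, sqrt, pi

def exact_slope_estimate(x):
    # T_n = sqrt(n)*mean(x); L_n = 1 - Phi(T_n) (one-sided z-test p-value);
    # return -(2/n)*log(L_n), which by (3) converges a.s. to c(theta).
    n = len(x)
    t = sqrt(n) * float(np.mean(x))
    if t < 10.0:
        log_Ln = log(0.5 * erfc(t / sqrt(2.0)))
    else:
        # Mills-ratio asymptotics avoid underflow for large t:
        # log(1 - Phi(t)) ~ -t^2/2 - log(t*sqrt(2*pi))
        log_Ln = -0.5 * t * t - log(t * sqrt(2.0 * pi))
    return -2.0 * log_Ln / n

theta = 1.5
x = np.random.default_rng(7).normal(loc=theta, size=20000)
slope = exact_slope_estimate(x)   # approaches theta**2 = 2.25 here
```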
(1) D_{ψ,n} ≡ ‖(G_n − I)ψ‖
for a weight function ψ such that
where
Table 1. f(a, t) = (a + t) log ((a + t)/t) + (1 − a − t) log ((1 − a − t)/(1 − t))
Note that
_
I a
sup f (a,t) a log :t <_S anda<
1
-1-+5e E
l
for some δ_ε > 0 depending only on ε > 0. To see this, just write
[ alfa -I ía 1 a I
t II +G(t) +r log +rJ -r log r + ^
I ()
L«G(r)J t) le
a l 1_t a/ iP(t)
+ I1 - t - +G(t)] [
log 1-t
-
LARGE DEVIATIONS FOR SUPREMUM TESTS OF FIT 785
Remark 1. The functions ψ_δ(t) ≡ [t(1 − t)]^δ, with δ > 0, satisfy both (2) and (8). Thus their g functions are identically 0. Intuitively, these functions are too severe in that they put too much weight on the extreme order statistics; in fact
(14) ψ(t) ≡ −log (t(1 − t)) is the most extreme weight function ψ
for which the exact slope c_ψ(F) is nonzero.
This suggests the potential value of tests of fit based on T_{ψ,n}. (We recall that the weight function ψ_{1/2}(t) ≡ [t(1 − t)]^{−1/2} that produced the constant-variance
process Z_n(t) = U_n(t)/√(t(1 − t)) was shown to have ‖Z_n‖ perform poorly in Chapter 16. Though A_n ≡ ∫₀¹ Z_n²(t) dt performed well in Chapter 5, we note that A_n does not lend itself to confidence bands.)
while g₂(a) → ∞ as a → 1. Show that the value t_a at which the infimum in (5) is achieved for g₂ converges to a solution of t(1 − t) = exp(−2) as a ↓ 0.
gç°gm^g^•
(19) P(‖ψ(G_n − I)⁺‖ > λ)
 ≈ exp(−n[(θ₁ − θ₂)λ + θ₂ + log (1 − θ₂)]) / {λ|θ₂|^{−1}(1 − θ₂)[1 + (|θ₂|/θ₁)³(1 − θ₁)/(1 − θ₂)]}^{1/2},
where θ₂ < 0 < θ₁ satisfy θ₁ − θ₂ = log [(1 − θ₂)/(1 − θ₁)] and θ₁^{−1} + θ₂^{−1} = λ^{−1}.
This has surprisingly good accuracy.
Proof of Theorem 1. Consider first D_{ψ,n}. Fix a ≥ 0. For any fixed t we have from (3) that
(a) n^{−1} log P(D_{ψ,n} ≥ a) ≥ n^{−1} log P(G_n(t) ≥ t + a/ψ(t)) → −f(a/ψ(t), t).
Hence
(b) lim inf_n n^{−1} log P(D_{ψ,n} ≥ a) ≥ −inf_t f(a/ψ(t), t) ≡ −g_ψ(a).
Case 1: q(t) ↓ 0 as t → 0. Note that for all θ₁ and θ₂ in [γ_ε, 2], with γ_ε > 0, and for 0 < t ≤ δ we have
aq(t) e9,t+021
<I [aq(t)—B,t+0211log
L 1 —aq(t) log I +g
provided aq(t) > θ₁ t [which is true by (d), a > 0, and γ_ε > 0]. For each n we let t_{n1} < ··· < t_{n,m+1} = ½ (with t_{n,m+i+1} ≡ 1 − t_{n,m−i+1} for 1 ≤ i ≤ m) satisfy
and
Let
a
(j) ≤ exp( −n f(aq(t_{ni}), t_{n,i+1}) ) by Theorem 11.8.1
where
and
p_{ni} ≤ exp( −n[f(aq(t_{ni}), t_{ni}) − ε] ).
That
satisfy
(p) p_{n0} ∨ p_{n,2m+1} ≤ P(G_n(t_{n1}) > 0) ≤ nP(ξ₁ ≤ t_{n1}) ≤ n exp (−2n²) ≤ exp (−n²).
Case 2: lim_{t→0} q(t) exists in (0, ∞). The derivation of (q) is just a repeat of step (o).
Combining (b) and (q) gives
≤ 2P(D_{ψ,n} ≥ a).
we may suppose 0 ≤ dQ/dP ≤ ∞ for all x ∈ X. We now define the Kullback-Leibler information number K(Q, P) by
(1) K(Q, P) ≡ ∫ (log (dQ/dP)) dQ if Q ≪ P,
 ≡ ∞ else.
Proof. Note that r ≡ dQ/dP satisfies 0 < r(x) < ∞ a.s. Q. Also log t ≤ t − 1 for 0 ≤ t < ∞ with equality if and only if t = 1. Thus, for D ≡ [x: 0 < r(x) < ∞],
Let Π = {A₁, ..., A_m, ...} denote a measurable partition of Ω. Let
(3) K_Π(Q, P) ≡ ∑_m Q(A_m) log (Q(A_m)/P(A_m)).
Proposition 2. For any measurable partition Π we have
Proof. We may assume K(Q, P) < ∞, else the result is trivial. Thus 0 ≤ dQ/dP < ∞ for all ω. Also φ(x) ≡ x log x is convex on [0, ∞) (with 0 log 0 ≡ 0). Thus for any measurable set A having P(A) > 0, we can use Jensen's inequality to write
(1/P(A)) ∫_A (dQ/dP) log (dQ/dP) dP ≥ φ( (1/P(A)) ∫_A (dQ/dP) dP ) = (Q(A)/P(A)) log (Q(A)/P(A)).
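Proposition 2 (coarsening a partition can only decrease the information number) is easy to verify on a small discrete example. A sketch (my example; the particular Q, P, and blocks are arbitrary):

```python
import numpy as np

def kl(q, p):
    # K(Q, P) = sum_x q(x) * log(q(x)/p(x)) on a finite space with Q << P.
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def kl_partition(q, p, blocks):
    # K_Pi(Q, P) of (3) for the partition given by index blocks.
    qb = [sum(q[i] for i in b) for b in blocks]
    pb = [sum(p[i] for i in b) for b in blocks]
    return kl(qb, pb)

q = [0.1, 0.2, 0.3, 0.4]
p = [0.25, 0.25, 0.25, 0.25]
full = kl(q, p)
coarse = kl_partition(q, p, [[0, 1], [2, 3]])  # Proposition 2: coarse <= full
```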
Proposition 3. Let I denote the Uniform (0, 1) df. Among all df's G that
pass through the point (t, h) with t E (0, 1), the one that minimizes K (G, I)
is the df G, that is linear from (0, 0) to (t, h), and from (t, h) to (1, 1). The
minimum value is
We let
(c) K(G, F) ≥ G(x) log (G(x)/F(x)) + (1 − G(x)) log ((1 − G(x))/(1 − F(x)))
for any fixed x by (6). Thus for m′ sufficiently large we have
m'
(d ) K(G F)>-->_ K(Sl, F)-- I whenever I'(G— F)t, F)I) >_ m;
where
(8) −(1/n) log P(T(F_n) ≥ a + u_n) → K(Ω_a, F) as n → ∞.
(9) K_Π(G, F) ≡ ∑_i P_G(B_i) log (P_G(B_i)/P_F(B_i)).
(We use the conventions 0 log ∞ ≡ 0 and a log 0 ≡ −∞ for a > 0.) Consider also the pseudometric d_Π on 𝒢 given by
the smallest topology such that the sets {G ∈ 𝒢: d_Π(G, F) < ε} are open for each ε > 0, each F ∈ 𝒢, and each finite partition Π. In fact, these sets form a subbase for the topology.
Exercise 1. The ‖ ‖-topology is strictly coarser than the τ-topology. That is, all ‖ ‖-open sets are τ-open, but the converse fails. (See Groeneboom et al., 1979.)
(13) (1/n) log P(T(F_n) ≥ a + u_n) → −K(Ω_a, F) as n → ∞.
Large deviation results for the Anderson and Darling statistic A_n and for L-statistics are derived in Groeneboom and Shorack (1981).
Let ℱ denote the set of all df's on the real line. For a fixed positive function ψ and a fixed df F, we define the function T_F by
(15) T_F(F_n) = D_{ψ,n}.
(This offers an alternative expression for the limit in (24.2.10), and is in the
format of the Sanov result.)
Proof. Since F ∘ F^{−1} is the identity for continuous df's F, we can define the constant-on-intervals density
(a) g_t(u) ≡ [t + a/ψ(t)]/t if 0 < u ≤ t,
  ≡ [(1 − t) − a/ψ(t)]/(1 − t) if t < u < 1.
THE SANOV PROBLEM 795
Thus
To obtain the reverse inequality, we note that for ε > 0 there exists a df G_ε ∈ Ω_a for which
Now G* ∈ Ω_a implies |G*(t_a) − t_a| ≥ a/ψ(t_a) for some t_a in (0, 1). But by Proposition 24.3.3 the df G°_a that is linear from (0, 0) to (t_a, G*(t_a)) to (1, 1) has the smallest Kullback-Leibler number, when compared to the Uniform (0, 1) df, among all df's that pass through the point (t_a, G*(t_a)), and its information number is ≥ K(G_a, F) [note that G°_a = G_a when G*(t_a) − t_a = a/ψ(t_a)]. Thus for ε > 0
Actually the definition of the τ-topology makes sense much more generally; just imagine that F and G above (or introduce P and Q) are probability measures on a complete separable metric space. Let int_τ and cl_τ denote interior and closure in the τ-topology on the set of all measures on the metric space. The next theorem is from Groeneboom et al. (1979).
(19) (1/n) log P(P̂_n ∈ Ω) → −K(Ω, P) as n → ∞.
CHAPTER 25
0. INTRODUCTION
We now return to the situation of Section 3.2 in which X_{n1}, ..., X_{nn} are independent with df's F_{n1}, ..., F_{nn}. Good theorems for the empirical process in this present situation will require an extension of the DKW inequality (Inequality 9.2.1). This is the subject of Section 1. Just as nG_n(t) is Binomial (n, t), in the present situation the marginal of the empirical process at a fixed point has the generalized binomial distribution. In a very strong sense the generalized binomial distribution is less dispersed than an appropriately chosen ordinary binomial distribution. This fact is the subject of Section 2. This result is then used to obtain in probability "linear bounds" on the empirical df in Section 3. Armed with these inequalities, we then establish the weak convergence of the empirical, weighted empirical, and quantile processes of independent but not identically distributed rv's in ‖/q‖ metrics in Section 4. Section 5 extends our earlier work on L-statistics to this case.
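The iid DKW inequality being extended here, P(√n ‖G_n − I‖ > λ) ≤ 2 exp(−2λ²), can be checked by simulation. A sketch (my illustration; sample size, replication count, and λ are arbitrary, and the comparison carries a tolerance for Monte Carlo noise since the bound is nearly sharp at moderate λ):

```python
import numpy as np

def sup_stat(u):
    # sqrt(n) * ||G_n - I|| for a Uniform(0, 1) sample, checking both
    # one-sided limits at the jump points of G_n.
    u = np.sort(u)
    n = len(u)
    upper = (np.arange(1, n + 1) / n - u).max()
    lower = (u - np.arange(0, n) / n).max()
    return np.sqrt(n) * max(upper, lower)

rng = np.random.default_rng(8)
n, reps, lam = 100, 2000, 1.0
exceed = float(np.mean([sup_stat(rng.uniform(size=n)) > lam
                        for _ in range(reps)]))
bound = 2.0 * np.exp(-2.0 * lam * lam)   # DKW: P(sup > lam) <= 2*exp(-2*lam^2)
```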
Let X_{n1}, ..., X_{nn} be independent rv's with arbitrary df's F_{n1}, ..., F_{nn} as in Section 3.2, and let
(1) F_n(x) ≡ n^{−1} ∑_{i=1}^{n} 1_{[X_{ni} ≤ x]}
and
(2) F̄_n(x) ≡ EF_n(x) = n^{−1} ∑_{i=1}^{n} F_{ni}(x),
796
EXTENSIONS OF THE DKW INEQUALITY 797
and
(5) Eφ(V) ≤ E*φ(V),
(6) Eφ(V_{nF}) ≤ E*φ(V_{nF}).
(8) ∫₀^∞ φ d(−H) ≤ ∫₀^∞ φ d(−K)
for each function φ on R⁺ of the form φ(t) = c(t − b)⁺ with c, b > 0. Finally, suppose that K(x)^{−a} is convex for some 0 < a ≤ 1. Then
If K(x)^{−a} is convex for all small a > 0 [or equivalently K(x) = exp (−f(x)) with f convex] then
Note that if (8) holds for all φ convex from R⁺ to R⁺ with φ(0) = 0, then the hypothesis of the lemma, and hence (9), holds.
Proof of Lemma 1. Let f(t) ≡ K(t)^{−a}. Since f is convex and ↗, there exist constants p and q describing a supporting hyperplane such that
side can be integrated by parts, and using (b) and K(∞)φ(∞) = 0 by (b) gives
(d) ∫_x^∞ K(t) dφ(t) ≤ ∫_x^∞ [p + qt]^{−1/a} dt.
Hence
Proof of Inequality 2. Let M_n ≡ ‖(F_n − F̄_n)⁺‖ = n^{−1/2} V_{nF}, and set, for λ ≥ 0,
and
where c ≡ 2. For φ satisfying the hypotheses of Lemma 1 it follows from the DKW inequality (Inequality 9.2.1) that
(a) ∫₀^∞ φ d(−H*) ≤ ∫₀^∞ φ d(−K),
(c) ∫₀^∞ φ d(−H_n) ≤ ∫₀^∞ φ d(−K).
Since K^{−a} is convex for every a > 0 and H_n(0) = 1 ≤ c = K(0), (7) follows from Lemma 1. ❑
(iii) Replace both x_i and x_j by their average to obtain a new vector
x^{(1)} = (x₁, ..., x_{i−1}, ½(x_i + x_j), x_{i+1}, ..., x_{j−1}, ½(x_i + x_j), x_{j+1}, ..., x_n);
(iv) Repeat this process, denoting the vector obtained after m steps by x^{(m)}: x^{(m)} = M_m ··· M₁ x.
For any x ∈ R^n, let x̄ ≡ n^{−1} ∑_{i=1}^{n} x_i and x̄ ≡ (x̄, ..., x̄) ∈ R^n.
equivalently,
S_m ≡ M_m ··· M₁ → (1/n) J as m → ∞,
‖S_m F − F̄‖ ≡ max_{1≤i≤n} sup_{−∞<x<∞} | ∑_j S_m(i, j)F_{nj}(x) − n^{−1} ∑_j F_{nj}(x) | → 0 as m → ∞,
and hence
(b) = φ(a) + ∫_{[a, a+2]} φ′(z)P(V > z) dz,
and to complete the proof when n = 2 we need to show that the right-hand side of (c) is ≥ 0.
802 INDEPENDENT BUT NOT IDENTICALLY DISTRIBUTED RV'S
Let T, = min {X,, X2 } and T2 = max {X,, X2 } denote the order statistics, and
note that
_ (1J(t)+A(t))
Now let
a₁(z) ≡ sup {t: A(t) > z − 1},
and
so that a₁ and a₂ are monotone and a₂(z + 1) = a₁(z). Thus it follows from (e) that V = a ∨ (1 + A(T₁)) ∨ (2 + A(T₂)), and hence that
Figure 1.
EXTENSIONS OF THE DKW INEQUALITY 803
while
P(T₁ ≤ x or T₂ ≤ y) = F₁(y)F₂(y) + F₁(x)(1 − F₂(y)) + F₂(x)(1 − F₁(y)),
it follows that
(f) P*(V > z) − P(V > z) = G² ∘ a₂(z) for a + 1 ≤ z,
  = G² ∘ a₂(z) − 2G ∘ a₁(z) G ∘ a₂(z) for a ≤ z < a + 1.
E*φ(V) − Eφ(V)
 = ∫_a^{a+1} φ′(z){G² ∘ a₂(z) − 2G ∘ a₁(z) G ∘ a₂(z)} dz + ∫_{a+1}^{a+2} φ′(z) G² ∘ a₂(z) dz
 = ∫_a^{a+1} φ′(z){G ∘ a₂(z) − G ∘ a₁(z)}² dz + ∫_a^{a+1} {φ′(z + 1) − φ′(z)} G² ∘ a₁(z) dz
 ≥ 0
since φ is ↗ and convex implies that φ′ is ≥ 0 and ↗. This completes the proof when n = 2.
Now suppose that n ≥ 3. Choose two variables X_i and X_j; call them X₁ and X₂.
(h) E(φ(V) | X₃, ..., X_n) ≤ E*(φ(V) | X₃, ..., X_n)
(i) Eφ(V) ≤ E₁φ(V)
where E denotes expectation under F = (F₁, ..., F_n) and E₁ denotes expectation under F₁ = (½(F₁ + F₂), ½(F₁ + F₂), F₃, ..., F_n), and we have shown that the "regularization operation" which consists of replacing two coordinates F_i and F_j of F = (F₁, ..., F_n) by their average ½(F_i + F_j) increases Eφ(V).
To complete the proof, first note that since a ≤ V ≤ a + n a.s. where a ≡ sup_x (−a(x)), φ(V) is a bounded continuous function of X₁, ..., X_n, and hence the function F = (F₁, ..., F_n) → Eφ(V) = ∫···∫ φ(V(x)) dF₁(x₁) ··· dF_n(x_n) is continuous with respect to the topology of weak convergence of F. But it follows from Lemma 3 that S_m F → F̄ uniformly, and hence weakly, as m → ∞. It therefore follows from (i) that
then we have
(3) Eg(X) ≤ Eg(Y).
Now
f(a) ≡ ∑_{k=0}^{n} g(k)P(X = k | a)
 = ∑_{k=0}^{n} g(k)[(1 − a_i)h_i(k) + a_i h_i(k − 1)]
 = ∑_{k=0}^{n} g(k){(1 − a_i)[(1 − a_j)h_{ij}(k) + a_j h_{ij}(k − 1)] + a_i[(1 − a_j)h_{ij}(k − 1) + a_j h_{ij}(k − 2)]}
 = ∑_k g(k){h_{ij}(k) + (a_i + a_j)[h_{ij}(k − 1) − h_{ij}(k)] + a_i a_j[h_{ij}(k − 2) − 2h_{ij}(k − 1) + h_{ij}(k)]}
(b) = B_{ij} + (a_i + a_j)C_{ij} + a_i a_j D_{ij};
(c) C_{ij} ≡ ∑_{k=0}^{n−1} [g(k + 1) − g(k)]h_{ij}(k)
and
D_{ij} ≡ ∑_{k=0}^{n−2} [g(k + 2) − 2g(k + 1) + g(k)]h_{ij}(k),
(d) D_{ij} ≤ 0 whenever a_i ≠ a_j, with strict inequality whenever a_i ≠ a_j with 0 < a_i, a_j < 1.
We now assume (2). We saw from (d) that if f(p) ≡ E(g(X) | p) is maximized in L at a, then
Applying (e) to the representation for D_{ij} given in (c) shows that h_{ij}(k) = 0 for k = 0, ..., n − 2 for any pair having a_i ≠ a_j. But we must always have h_{ij}(0) + ··· + h_{ij}(n − 2) = 1. Thus a_i ≠ a_j for some i ≠ j is impossible. Thus the maximizing value a must have all coordinates equal. Thus a = p̄, and (3) is proved. ❑
Exercise 1. (Feller, 1968) Show directly that for p in L we have Var(X) = ∑_{i=1}^{n} p_i(1 − p_i) ≤ np̄(1 − p̄) = Var(Y), with equality if and only if p₁ = ··· = p_n = p̄. (This is from Feller, 1968, Vol. I, p. 231.)
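Feller's variance comparison is a two-line computation; a sketch with an arbitrary vector of success probabilities (my example):

```python
import numpy as np

p = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
pbar = float(p.mean())                      # 0.5
var_gen = float(np.sum(p * (1 - p)))        # Var(X), X = sum of Bernoulli(p_i)
var_bin = len(p) * pbar * (1 - pbar)        # Var(Y), Y ~ Binomial(n, pbar)
# Feller's inequality: var_gen <= var_bin, with equality iff all p_i are equal
```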
3. BOUNDS ON F_n
It is often useful to bound F̄_n above or below by some function of F_n. Inequality 1 gives the probability bounds necessary to establish an extension of Inequality 10.4.1 to the case of independent but nonidentically distributed rv's. To simplify notation, we state these theorems for the left tail of F_n and F̄_n only; by symmetry, analogous results hold for the right tail.
Let X_{n1}, ..., X_{nn} be independent rv's with arbitrary df's F_{n1}, ..., F_{nn} as in Section 1, and let
(1) ‖F̄_n/F_n‖ ≡ sup_{−∞<x<∞} F̄_n(x)/F_n(x)
and
The following inequalities should be compared with those of Section 10.3 for ‖G_n/I‖ and ‖I/G_n‖.
Inequality 1. (van Zuijlen) For all n ≥ 1
(3) P(‖F̄_n/F_n‖ ≥ λ) ≤ (1/λ) exp (1 − 1/λ) / [1 − (1/λ) exp (1 − 1/λ)] for λ > 1
and
(4) P(‖F_n/F̄_n‖ ≥ λ) ≤ λ e^{1−λ} / (1 − λ e^{1−λ}) for λ > 1.
Theorem 1. For an array of arbitrary df's, and any ε > 0 there exists a number A = A(ε) such that
(5) P(F̄_n ≤ AF_n on (−∞, ∞) and F_n ≤ AF̄_n on [X_{n:1}, ∞)) ≥ 1 − ε.
Proof of Inequality 1. By (3.2.36) and (3.2.37) the suprema in (1) and (2) are bounded by the same suprema defined in terms of H_n and R_n of Section 3.2; hence it suffices to consider the case when all the F_{ni}'s are continuous.
F (I >A) = P(F n (x) >_ AFn (x) for some —oo < x < oo)
P( n
(Xn:i Fn'( ))
i1 P
(a) = L P(Sn - 0,
f=1
P(S_n ≥ i) ≤ P(Y ≥ i)
 = P(Y/(np̄) ≥ λ)
 ≤ exp (−np̄ h(λ)) by Inequality 10.3.2.
Thus
P(‖F̄_n/F_n‖ ≥ λ) ≤ ∑_{i=1}^{∞} exp (−i h(1/λ))
(c) = exp (−h(1/λ)) / [1 − exp (−h(1/λ))],
and (3) follows upon noting that h(1/λ) = 1/λ − 1 + log λ > 0 for λ > 1.
We now prove (4). For λ ≥ 1 and n ≥ 1 we have
P(F_n(X_{n:i}) ≥ λ F̄_n(X_{n:i}) for some i)
(d) ≤ ∑_{i=2}^{(n/λ)+1} P(S_n ≥ n − i + 1)
CONVERGENCE OF X_n, V_n, AND Z_n WITH RESPECT TO ‖/q‖ METRICS 809
 ≤ ∑_{i=2}^{∞} exp (−(i − 1)h(λ))
(f) = exp (−h(λ)) / [1 − exp (−h(λ))],
and (4) follows upon noting that h(λ) = λ + log (1/λ) − 1. ❑
Open Problem 1. Formulate and prove appropriate analogs, for the case of
independent but not identically distributed random variables, of Theorems
10.2.1, 10.5.1, and 10.6.1.
Let c_{n1}, ..., c_{nn} denote arbitrary constants (Inequality 1 below is true even if the c_{ni} are rv's such that the (c_{ni}, X_{ni}) are independent). As in Sections 3.2 and 3.3 the weighted empirical process is
(2) W_n(x) ≡ (∑_{j=1}^{n} c²_{nj})^{−1/2} ∑_{i=1}^{n} c_{ni}[1_{(−∞,x]}(X_{ni}) − F_{ni}(x)] for −∞ < x < ∞,
in
(3) F„(x)=— c. F (x) for —co<x<oo. ;
for the reduced empirical process 1„ of the associated array of continuous rv's.
∫₀¹ q(t)^{−2} dt < ∞
and that
Then
(iv) Suppose (8) holds and a Skorokhod construction has been performed so that ‖Z_n − Z‖ →_p 0. Let h ∈ 𝒟. Then
(9) ∫₀¹ h dZ_n →_p ∫₀¹ h dZ
for some rv ∫₀¹ h dZ on the Skorokhod probability space.
X_n to satisfy
Corollary 2. If (7) holds [with all c_{ni} = 1 and n^{−1} ∑_i G_{ni}(t) = t, 0 < t ≤ 1],
The next corollary is useful for proving limit theorems under local alternatives. Recall that the df's F_{n1}, ..., F_{nn}, n ≥ 1, are said to form a nearly null
array if

(14)   max_{1≤i≤n} ‖F_ni − F̄_n‖ → 0   as n → ∞.
Corollary 3. Let F_n1, ..., F_nn, n ≥ 1, denote any nearly null array. Then there exist
row-independent rv's X_n1, ..., X_nn, n ≥ 1, with these df's such that the reduced
empirical and quantile processes 𝕏_n and 𝕍_n of the associated array of con-
tinuous rv's satisfy
for all q ∈ Q with ∫_0^1 [q(t)]^{-2} dt < ∞. Here 𝕌 denotes a Brownian bridge. Thus
the empirical process √n(𝔽_n − F̄_n) of X_n1, ..., X_nn, n ≥ 1, satisfies
Our proof could be made to rest on the following inequalities, but won't.
P(JIPEn11>A +3)
k \
^P max I E cni[ 1 (_0,.](Xni) — Fni]+^II > A +
( :^^ki5n
(18)   ≤ 8P(‖T_n‖ > λ/4)/[1 − 64λ^{-2} E T_n^2],

where ε_1, ..., ε_n are iid Rademacher rv's independent of X_n1, ..., X_nn and
(19)   T_n ≡ (c'c)^{-1/2} Σ_{i=1}^n ε_i c_ni ψ(X_ni),   where   E T_n^2 = ∫ ψ^2 dF̄_n,

and

(20)   q ≥ 0 is ↗ and right continuous on [0, 1/2] with ∫_0^{1/2} [q(t)]^{-2} dt < ∞.
o ,
Any value γ with F̄_n(γ) ≤ θ, where ∫_0^θ [q(t)]^{-2} dt < ε^3/1024, will work; thus γ
depends on n, c_n1, ..., c_nn, F_n1, ..., F_nn only through F̄_n. Of course,

(22)   P(‖ℤ_n/q‖_0^θ > ε) ≤ ε

for this choice of θ, independent of n, c_n1, ..., c_nn, F_n1, ..., F_nn.
k \
(d) p,=P^1ma jXsjII>A
„I E I by (A.14.7)
k
C 2 P( 1 ma n (^' e i Cnil( - m, • ](Xni)Y'll
} ^/
/ n A
(e) - 4P1 y^ E1Cn1 1 (-m .I(Xni )+P ^ 2) '
Let RC n:; denote the process associated with the order statistic X ; thus ;;
X n:; = X (D); , where D(i) = D, 1 denotes the rank of Xn;. Since cr is \, we have
(f) I n
^ EiCnil(—^']\Xni)WII ^ max c1 (Xn:k)I Y-
i= 1 I--kern
k
^`
k
J=1
- D(J) CnD(j)I
(g) ^ 2 t mk
x L
n E D(j) CnD(j)I( Xn:j)
j1
using the monotone inequality (Inequality A.2.2) for (g). Thus Lévy's inequality
(Inequality A.2.8) at step (i) gives
k /
max Z
P1- 5 4P lsksn ED(j)CnD(j)`Y(Xf:j) >A /4
j=1
k
(h) =4E max I Y- £D(j)CnD(j)'tl(Xn:j) > A/ 4 Xnl, ... , Xnn
jp( --5k,^n 1j= 1 1
(1)
[1 :5k
^n i Ii
k2
= S -2 E max Y- E,(RC n; —RC n,)
H k
by (A.14.7)
2 1
=S -Z E max
[ 1!5k^sn Z
,.^(Xn,)-1(_^*^(X *n;)}
J
j ( k 2
<_ S Z E max 2111 E;Cnr1(-m.•l(Xn,)]
i =1
k1 ] Z II]
r k
<4S -Z EI max
L lsksn ;_1 e;Cn;l(_^.)(Xn;)I
since Z Z*
j k 2
(m) X83-2EI EiCni1(_.*](Xni)H
i =1
((( k Z
<_ 323 Z ES E j I max 1-DQ)CnD<j)+G(Xn:j)
l ll lsk<_n j _ 1
( Xn 1, ...
)( n /
(0) ^ 645-ZE {E} ED(j)CnD(j) 1 (Xn:j)] xnl, ... > x}}
l j=1
∫ ψ^2 dF̄_n = ∫ 1_{[0,θ]}(F̄_n)[q(F̄_n)]^{-2} dF̄_n

(a)   ≤ ∫_0^θ q^{-2} dt   by (3.2.58).

Let γ be such that F̄_n(γ) ≤ θ. Then plugging (a) and (b) into the Marcus and
Zinn inequality (Inequality 1) gives
and that
( a a
(e) EZ„(S, t] 4 2 3 j Z4(t)t] +EZ,,(S)a
\q(s) q(t)) }
35 2 +s[max c „ /c'c] ; Z 1 1 a
+ s 2 s \q(s) q(t)!
3(t —s) 2 Z 1 _ 1 a
+3s
c 8L q4(t) (q(s) q(t)/
r 2
<48(` q u ( ) -2 du )
J
\s /
1 1 2 _ s
[ I_ q(S) ] 2
t _ q(s) 2
s Lq(s) q(t)^ q 2 (s) q(t) q 2 (t) L 1 q(t)J
z(t)J
t I s 2 t—s
q 1 —t
q2(t)
Now (24), Exercise 2.3.6 and (b) show that (i) -(iii) hold. The proof of Theorem
3.3.3 still suffices to imply (9). This completes the proof.
We make one further comment on the continuity of the process ℤ/q,
inasmuch as we appealed to the nonstandard Exercise 2.3.6. On any subse-
quence n′ where ℤ_{n′} →_d ℤ we have convergence of third moments. Thus
(g)- { 7 f 5
q(u)du}1
by (24). Thus the familiar Exercise 2.3.5 shows that P(ℤ ∈ C) = 1; that is,
P(ℤ/q ∈ C) = 1.
Our original proof of this theorem used (22) to show that for 0 small enough
we have
this is the key step. We can replace ℤ_{n′} by ℤ in (h) since P(ℤ/q ∈ C) = 1. The
rest is trivial. ∎
Remark 1. Theorem 1 is from Shorack and Beirlant (1985). The same con-
clusions hold for the processes 𝕃_n of the a_ni's, but the additional hypothesis
max_{1≤i≤n} n c_ni^2/c'c ≤ M is required; see Shorack (1979a).
Proof of Theorem 2 and Corollary 2. All but (11) follow immediately from
Theorem 1.
Now the 𝕏_n part of Corollary 2 follows from (10) and Theorem 2.3.4. To
prove the 𝕍_n part of Corollary 2, and hence (11), note that by (3.2.17)
(a)   = R_1 + R_2 + R_3 + R_4.
Now van Zuijlen's inequality (Inequality 25.3.1) implies that for every ε > 0 there
is a λ = λ_ε > 1 such that
Hence
I
1 /n q
'/2
Ilq(A )I '/2
0 o
su I/2
q(At)
q(t)^
Vl
R,
—I [Xn(C.,') —
q(G )
X(C„')] q(G ')
q
ll'"-
Ihn
G
q vn
S^11(X„ SC)/q11
-0 — , as n-'ac'
by the 𝕏_n part of Corollary 2. Using the same idea with (c) and Theorem 3.2.1
shows that R_2 →_p 0. Now R_3 ≤ 1/[√n q(1/n)] → 0, and R_4 →_p 0 by continuity
of 𝕏/q with 𝕏(0)/q(0) = 0. Hence the right-hand side of (a) →_p 0 as n → ∞
and the 𝕍_n part of (13) holds. But (13) implies (11). ∎
Proof of Corollary 1. From (3.2.15) and n^{-1} Σ_{i=1}^n G_ni(t) = t for 0 ≤ t ≤ 1 it
follows that

(a)   (s ∧ t − st) − K_n(s, t) = (1/n) Σ_{i=1}^n G_ni(s)G_ni(t) − st
                              = (1/n) Σ_{i=1}^n (G_ni(s) − s)(G_ni(t) − t).
and
M„ =11 tr n is tr ts II,
n ( / / /
(b) 1,—lv= l im I i Gni(tr)—tr)(Gni(ts)—ts)II
(
„-. = n i =1
for all λ > 0. Letting k = 2^m, t_j = j/2^m for j = 1, ..., 2^m and letting m → ∞, (c)
implies (12). ∎
( z
(27) P(±TT >, 1)<expl — for all A>0,
2B ^AIBcII))
P (
I` —n i=1
— j`
j
max I Et CnD ( ^ x) ^ ni 2P Y_ x
(
\ i=1
n
Eicni ^ )
for arbitrary independent rv's X1 , X2 ,... and arbitrary constants c,, c2 ,....
With
(31) a ^ — \ c? / loge ( ci )
Z
(32) lira sup <oo a.s.
Use Inequality 3 to establish (32) in the case Σ_i c_i^2 < ∞. (Note the reduction of
Section 3.2.)
Marcus and Zinn in fact consider the more general process

(33)   { Σ_{i=1}^n [η_i 1_{[X_i ≤ x]} − E(η_i 1_{[X_i ≤ x]})] } / q(x),   −∞ < x < ∞,

and establish conditions under which the ‖·‖ of this process is a.s. O(a_n) for
appropriate sequences a_n → ∞; here the (η_i, X_i) are independent vectors.
5. MORE ON L-STATISTICS
Suppose that X_n1, ..., X_nn are independent with continuous df's
F_n1, ..., F_nn. Let c_n1, ..., c_nn denote known constants, the scores, and define
J_n by J_n(t) = c_ni for (i − 1)/n < t ≤ i/n, 1 ≤ i ≤ n. Let h be a fixed function. Then,
if X_n1, ..., X_nn are independent rv's with df's F_n1, ..., F_nn, the linear combina-
tion T_n we want to consider is defined by

(1)   T_n = (1/n) Σ_{i=1}^n c_ni h(X_{n:i}).
in
n ;_ i
we have
where, as in (14.1.4),

(5)   μ_n(J) = ∫_0^1 g_n(t)J(t) dt = (1/n) Σ_{i=1}^n ∫_0^1 h(F_ni^{-1}(t)) J(t) dt.
0
A version of the following theorem for bounded scores was given in Shorack
(1973).
all n >_ 1). Further, suppose that the covariance functions K. of the reduced
empirical process X,, satisfy
where
In addition, if
then
Corollary 1. If lim inf ra o, >0 and the hypotheses of Theorem 1 hold, then
so that

σ_0^2 − σ_{T_n}^2 = ∫_0^1 ∫_0^1 [s ∧ t − st − K_n(s, t)] J(s)J(t) dg_n(s) dg_n(t)
               = ∫_0^1 ∫_0^1 [ (1/n) Σ_{i=1}^n (G_ni(s) − s)(G_ni(t) − t) ] J(s)J(t) dg_n(s) dg_n(t)
il
I 2
K(s,t) foralln>_1
Example 1. (The sample mean) When h(x) = x and J_n = 1 for all n ≥ 1,
T_n = X̄_n, the sample mean, and (13) yields the well-known formula

σ_0^2 − σ_{T_n}^2 = (1/n) Σ_{i=1}^n (μ_ni − μ̄_n)^2.
F_T(x) = 0 for x < A,  = F(x) for A ≤ x < B,  and  = 1 for x ≥ B.
the f_ni exist, let f̄_n = n^{-1} Σ_{i=1}^n f_ni, and suppose that f̄_n(F̄_n^{-1}(1/2)) > 0; then,
12(J)- 1)
and
n -
where (1 − ε)G(θ) + εH(θ) = 1/2.
Example 4. (The mean deviation from the median) Let h(x) = x, J(t) =
sign(t − 1/2), and suppose that θ = F̄_n^{-1}(1/2) is unique. Then
and
Empirical Measures
and Processes for
General Spaces
0. INTRODUCTION
Let X_1, X_2, ... be iid 𝒳-valued random elements with induced distribution
(probability measure) P ∈ 𝒫 on 𝒳. Here (𝒳, 𝒜) is a measurable space and 𝒫
denotes the set of all probability measures on 𝒳. The empirical measure of
X_1, ..., X_n is

(1)   ℙ_n = n^{-1} Σ_{i=1}^n δ_{X_i},

where δ_x is the measure with mass one at x ∈ 𝒳: δ_x(A) = 1_A(x) for all A ∈ 𝒜.
The corresponding empirical process is

ℤ_n(f) = √n [ℙ_n(f) − P(f)] = n^{-1/2} Σ_{i=1}^n [f(X_i) − Ef(X_1)].
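As a concrete illustration (not from the text), the empirical measure (1) and the empirical process can be simulated directly; here P is taken to be Uniform(0, 1) and f an indicator function, both hypothetical choices for this sketch:

```python
import random
import math

def empirical_process(xs, f, p_f):
    """Z_n(f) = sqrt(n) * (P_n(f) - P(f)) for the empirical measure P_n."""
    n = len(xs)
    p_n_f = sum(f(x) for x in xs) / n          # P_n(f) = n^{-1} sum f(X_i)
    return math.sqrt(n) * (p_n_f - p_f)

random.seed(0)
n = 10_000
xs = [random.random() for _ in range(n)]       # iid from P = Uniform(0, 1)
f = lambda x: 1.0 if x <= 0.3 else 0.0         # f = indicator of [0, 0.3]
z = empirical_process(xs, f, 0.3)              # one realization of Z_n(f)
# By the CLT, Z_n(f) is approximately N(0, 0.3 * 0.7); one realization
# should be within a few standard deviations of 0.
print(abs(z) <= 5 * math.sqrt(0.3 * 0.7))
```

By the SLLN, ℙ_n(f) itself converges a.s. to P(f) = 0.3 here, which is why the normalized quantity stays bounded in probability.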
For a fixed function f it follows from the strong law of large numbers, central
limit theorem, and law of the iterated logarithm that: If E|f(X)| < ∞, then
GLIVENKO–CANTELLI THEOREMS VIA VAPNIK–CHERVONENKIS 827
and if Ef^2(X) < ∞, then with σ^2(f) = Var[f(X)], both
and
or
converge a.s. to 0? Throughout this section we assume that D_n(𝒞) and D_n(ℱ)
are measurable. A variety of generalizations of the classical Glivenko–Cantelli
theorem are available. Here we consider extensions to rather general classes
based on the following combinatorial ideas.
Let 𝒳 be the space in which observations X_1, ..., X_n take their values, and
let 𝒞 denote a class of subsets of 𝒳. For any finite subset F of 𝒳, let

(2)   Δ^𝒞(F) = #{F ∩ C: C ∈ 𝒞}

and call it the growth function for 𝒞. Note that m^𝒞(r) ≤ 2^r. Let

and

so that v_𝒞 denotes the smallest integer v for which some subset of size v cannot
be shattered by 𝒞. The integer v_𝒞 is called the Vapnik–Chervonenkis index
number of 𝒞. If v_𝒞 < ∞, then 𝒞 is called a Vapnik–Chervonenkis class, or VC
class.
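The growth function (2) can be computed by brute force for a small illustrative class (not one from the text). The half-lines (−∞, c] on ℝ are the standard example: they pick up only r + 1 distinct traces on any r points, so no 2-point set is shattered and the VC index is 2.

```python
from itertools import combinations

def growth(points, classes):
    """Delta^C(F) = #{F intersect C : C in classes} for a finite set F."""
    return len({frozenset(p for p in points if p in c) for c in classes})

# Half-lines (-inf, c] restricted to a small grid of cutoffs c:
pts = [1, 2, 3, 4]
half_lines = [set(p for p in pts if p <= c) for c in [0, 1, 2, 3, 4]]

print(growth(pts, half_lines))        # number of traces on these 4 points
# A 2-point set is shattered only if all 2^2 = 4 subsets appear as traces;
# for half-lines the smaller point is always included first, so this fails.
shattered_pair = any(growth(list(pair), half_lines) == 4
                     for pair in combinations(pts, 2))
print(shattered_pair)
```

Exhaustive enumeration of this kind is only feasible for tiny examples, but it makes the shattering definition concrete.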
(7)   sup_{P∈𝒫} Pr[ max_{m≥n} D_m(𝒞) ≥ ε ] = sup_{P∈𝒫} Pr[ max_{m≥n} ‖ℙ_m − P‖_𝒞 ≥ ε ] → 0

as n → ∞.
Several inequalities provide the key to the proof. The first is the classical
Vapnik–Chervonenkis (1971) inequality and the second is a variant thereof
due to Devroye (1982).
Inequality 1. If 𝒞 is a VC class of subsets of 𝒳, then for all λ > 0 and n ≥ 2
we have

and

(9)   Pr(D_n(𝒞) ≥ λ) ≤ 4 m^𝒞(n^2) exp(4λ + 4λ^2) exp(−2nλ^2)
for n>(2vA - ')
provided 4θ^2λ^2n′ > 1. To prove this, let 𝒳^(1) = 𝒳^n and 𝒳^(2) = 𝒳^{n′}, and
let P^(1) and P^(2) denote the measures on 𝒳^(1) and 𝒳^(2) induced by (X_1, ..., X_n)
and (X_{n+1}, ..., X_{n+n′}).
(a)   ≥ ∫_{A^(1)} [ ∫_{𝒳^(2)} 1_{[‖ℙ_{n′} − Q_{n′}‖ > (1−θ)λ]} dP^(2) ] dP^(1)
      using Fubini's theorem,

where

(b)   A^(1) = [x^(1): ‖ℙ_n − P‖ > λ].
Now if x^(1) ∈ A^(1), then there exists a set C_{x^(1)} ∈ 𝒞 such that |ℙ_n(C_{x^(1)}, x^(1)) −
P(C_{x^(1)})| > λ. Thus if x^(2) ∈ A^(2)_{x^(1)}, where

then ‖ℙ_n(·, x^(1)) − Q_{n′}(·, x^(2))‖_𝒞 > (1 − θ)λ. Thus, from (a),

   ≥ ∫_{A^(1)} P^(2)(A^(2)_{x^(1)}) dP^(1)
   ≥ ∫_{A^(1)} [1 − 1/(4θ^2λ^2n′)] dP^(1)   by Chebyshev's inequality
it will suffice to
where X = (X_1, ..., X_{n+n′}) and x = (x_1, ..., x_{n+n′}). For this it will be handy
to have the notation so that

(12)   ℙ_{n+n′}(C) = Σ_{i=1}^{n+n′} 1_C(x_i)/(n + n′)   for C ∈ 𝒞.
Consider the following urn model for (13). Imagine an urn that contains n + n'
balls of which
(14)   k = k_C = (n + n′) ℙ_{n+n′}(C)
bear the number 1 while the rest bear the number 0. These are thoroughly
mixed (since X,, ... , Xn+n • are iid). Then n of them are chosen at random
without replacement and designated to comprise the first sample. The number
W of 1's in the first sample has the same distribution as does the conditional
distribution of n P n (C) given X = x. Thus
We also note that for n′ = n^2 − n, θ = 1/(nλ), and n ≥ 2, the last term in (10)
satisfies

(j)   [1 − 1/(4θ^2λ^2n′)]^{-1} ≤ 2.
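The urn model for (13)–(14) says that W is hypergeometric. A minimal sketch (the urn sizes below are illustrative, not from the text) checks that the hypergeometric pmf sums to one and that EW = nk/N, consistent with drawing n of the N = n + n′ balls without replacement:

```python
from math import comb

def hypergeom_pmf(N, k, n, w):
    """P(W = w): w ones when n balls are drawn without replacement
    from an urn holding k ones among N balls."""
    return comb(k, w) * comb(N - k, n - w) / comb(N, n)

N, k, n = 10, 4, 5                    # hypothetical small urn
pmf = [hypergeom_pmf(N, k, n, w) for w in range(min(n, k) + 1)]
mean = sum(w * p for w, p in enumerate(pmf))
print(abs(sum(pmf) - 1.0) < 1e-12)    # probabilities sum to one
print(abs(mean - n * k / N) < 1e-12)  # E W = n k / N
```

The "thoroughly mixed" assumption in the text is exactly what makes every size-n subset of balls equally likely, which is the defining property of this distribution.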
Now for the third key step. Each set C E 16 defines a set (see Fig. 1)
Combining (22) and (j) into (10) gives the inequality of (9).
Most of this was learned from a P. Gänßler seminar, given at the University
of Washington, in which a slightly different inequality was proved. ❑
One reason for the importance of VC classes is that when v < co it is possible
to obtain a bound on the growth function m(r) defined in (3) that is of an
order far less than 2'. This is the fourth key step.
for all PE 9 by applying (8) with the bound (23) on m w (n 2 ). Thus (7)
holds. ❑
Remark 1. Note that for all sets C we have used Hoeffding's crude bound
(A.4.9) on the hypergeometric rv W_C of (18). For most sets, those C having
the k_C of (18) larger than 1, we can use the far stronger bounds of Inequality
11.1.1. However, the combinatorial method of proof given does not allow us
to claim that with high probability k c is substantially larger than 1. The next
section will incorporate such ideas in a different context.
Exercise 1. Use θ = 1/2 and n′ = n in the above proof to show that (8) holds.
Examples of VC Classes
Example 6. If 𝒳 = ℝ^d and 𝒞 = {all lower layers in ℝ^d} = {B: x ∈ B and y ≤ x
implies y ∈ B}, or if 𝒞 = {all convex subsets of ℝ^d}, then v_𝒞 = ∞.
Exercise 3. Let X_1, ..., X_n be iid d-dimensional random vectors, let u ∈ S^{d−1} be a unit vector in
ℝ^d, and consider the empirical df of u·X_1, ..., u·X_n:
Even though v_𝒞 = ∞, it may still be true that Δ^𝒞({X_1, ..., X_n}) < 2^n with
probability converging to 1, and hence the Glivenko–Cantelli theorem may
continue to hold for such classes. The following theorem addresses this
possibility.
Note that (28) holds for any VC class W (uniformly in P). In general,
however, the convergence in (26) and (27) need not be uniform in P.
Exercise 4. Let
Entropy Conditions
A useful way to say that a class of functions ℱ is not "too big" is to control
the growth of some measure of the number of sets needed to cover ℱ. Several
different versions of the metric entropy functions have proved useful, and we
now define a few of the more common variants.
Let ε > 0, 0 < r ≤ ∞, and suppose that P is a probability measure on (𝒳, 𝒜).
Let ‖f‖_r = {∫ |f|^r dP}^{1/r}. Then define
The bracket [f, g] is defined by

[f, g] = {h ∈ ℒ_0(𝒳, 𝒜): f ≤ h ≤ g},

where the set is empty unless f ≤ g. Given ε > 0, r > 0, and a probability
measure P on (𝒳, 𝒜), define
(5)   D_r(ε, S, ℱ) = min{ k: there exist f_1, ..., f_k ∈ ℱ such that
      min_{1≤i≤k} Σ_{x∈S} |f(x) − f_i(x)|^r ≤ ε^r Σ_{x∈S} F(x)^r for all f ∈ ℱ }

and

D_r(ε, ℱ, F) = sup_S D_r(ε, S, ℱ),

where the supremum is over all finite subsets S of 𝒳. Then Pollard's entropy
is

(6)   H_r(ε, ℱ, F) = log D_r(ε, ℱ, F).
Note that if F ∈ ℒ_2(P), then (7) implies that (8) holds with room to spare.
Pollard (1982) shows that when F ∈ ℒ_2(P) and (8) holds, then ℤ_n indexed by
ℱ satisfies the central limit theorem uniformly in P ∈ 𝒫; see Section 3.
Example 1. Let ℱ = ℱ_{β,M,d} = {f: [0, 1]^d → ℝ such that max_{|p|≤m} sup{|D^p f(x)|:
x ∈ [0, 1]^d} + max_{|p|=m} sup{|D^p f(x) − D^p f(y)|/|x − y|^γ: x ≠ y ∈ [0, 1]^d} ≤ M},
where m = [β], γ = β − [β], and |p| = p_1 + ⋯ + p_d with p = (p_1, ..., p_d) for
integers p_i. Then

for some constant A = A(β, M, d); this is due to Kolmogorov and Tihomirov
(1959). Thus
Theorem 1. (Blum, DeHardt) Suppose that ℱ ⊂ ℒ_1(𝒳, 𝒜, P), that
N_B(ε, ℱ, P) < ∞ for all ε > 0, and that ‖ℙ_n − P‖_ℱ is measurable. Then

It is not hard to show that (10) holds uniformly in P ∈ 𝒫_1, for any collection
𝒫_1 satisfying

sup_{P∈𝒫_1} ∫ F 1_{[F ≥ λ]} dP → 0   as λ → ∞.
(1)   e_P(f, g) = { ∫ (f − g)^2 dP }^{1/2}   for f, g ∈ ℒ_2.

Let W_P be the isonormal Gaussian process indexed by ℒ_2(𝒳, 𝒜, P); i.e., the
rv's {W_P(f): f ∈ ℒ_2} are jointly Gaussian with mean 0 and covariance

(2)   E W_P(f) W_P(g) = P(fg) = ∫ fg dP   for f, g ∈ ℒ_2.

See Dudley (1973) for more information about W_P. Let ℤ_P be another Gaussian
process indexed by ℒ_2(𝒳, 𝒜, P) with mean 0 and covariance function

(3)   E ℤ_P(f) ℤ_P(g) = P(fg) − P(f)P(g) = ∫ (f − ∫ f dP)(g − ∫ g dP) dP.

(4)   ℤ_P(f) = W_P(f) − P(f)W_P(1)
Exercise 1. Show that if (𝒳, 𝒜, P) = ([0, 1], ℬ, I) where I denotes Lebesgue
measure, then {W_I(1_{[0,t]}): 0 ≤ t ≤ 1} is Brownian motion, while {ℤ_I(1_{[0,t]}): 0 ≤
t ≤ 1} is Brownian bridge.
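The covariance computation behind Exercise 1 can be checked deterministically: expanding (4) with E W_P(f)W_P(g) = P(fg) gives the covariance (3), and for indicator functions under Lebesgue measure that is the Brownian-bridge covariance s ∧ t − st. A small sketch (the helper name is ours, not the text's):

```python
# Expand E[W(f) - P(f)W(1)][W(g) - P(g)W(1)] using E W(a)W(b) = P(ab):
# the cross terms leave P(fg) - P(f)P(g) when P(1) = 1.
def bridge_cov_via_W(P_fg, P_f, P_g, P_1=1.0):
    return P_fg - P_f * P_g - P_g * P_f + P_f * P_g * P_1

# For f = 1_[0,s], g = 1_[0,t] under Lebesgue measure: P(fg) = min(s, t).
for s in [0.1, 0.4, 0.9]:
    for t in [0.2, 0.5, 0.8]:
        cov = bridge_cov_via_W(min(s, t), s, t)
        assert abs(cov - (min(s, t) - s * t)) < 1e-12
print("covariance (3) recovered from (4)")
```

This is only the covariance identity, of course; the exercise itself also requires checking joint Gaussianity and path properties.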
(6)   ρ_P(f, g) ≡ σ_P(f − g) ≤ e_P(f, g).
while
and
(12) for every ε > 0 there exists a δ > 0 and n_0 such that for all n ≥ n_0

Pr*{ sup[ |∫ (f − g) dℤ_n| : f, g ∈ ℱ, e_P(f, g) < δ ] > ε } < ε.

Then ℱ is a functional Donsker class for P; that is,
k
(13) n - ' /2 manx VkZ k — VjDI M^ -p0 as n-00.
i I
-
and write

(16)   ∫_0^1 [H_1(u^2)]^{1/2} du < ∞.
Another result in this same vein is the following theorem due to Pollard
(1981b).
To state Pollard's theorem, we introduce the notion of K-measurability:
Regard ℤ_n as an element of the space B(ℱ) of all bounded functions on ℱ
metrized by uniform convergence. Let C(ℱ) consist of all bounded real
functions on ℱ which are uniformly continuous with respect to the ℒ_2(𝒳, 𝒜, P)
norm on ℱ. Finally, a random element of B(ℱ) is K-measurable if it is
measurable with respect to the σ-field generated by all closed balls in B(ℱ)
centered at points in C(ℱ). If ℤ_n is stochastically separable for all n ≥ 1, then
ℤ_n is K-measurable for all n ≥ 1.
(17)   ∫_0^1 [H_2(u, ℱ, F)]^{1/2} du < ∞,
Inequalities and
Miscellaneous
0. INTRODUCTION
Recorded here are many of the most useful inequalities in probability theory.
Many of them are used in this book, but not all.
When dealing with sums of independent rv's X_1, ..., X_n we will let
σ_k^2 = Var[X_k] and s_n^2 ≡ σ_1^2 + ⋯ + σ_n^2. Also S_n = X_1 + ⋯ + X_n and X̄_n = S_n/n.
Recall that we write

(1)   X_k ≅ (0, σ_k^2)

(2)   X_k ≅ F_k(0, σ_k^2)
MAXIMAL INEQUALITIES FOR SUMS AND A MINIMAL INEQUALITY 843
Jensen's Inequality 4. If g is convex on (a, b) with −∞ ≤ a < b ≤ ∞, P(a < X <
b) = 1, and E|X| < ∞, then

(4)   E g(X) ≥ g(EX).
(7)   E|XY| ≤ [EX^2 EY^2]^{1/2}
(3)   P( max_{1≤k≤n} |S_k|/b_k ≥ λ ) ≤ [ Σ_{k=1}^n σ_k^2/b_k^2 ] / λ^2   for all λ > 0

and (Lévy)
(6)   P( max_{1≤k≤n} S_k ≥ λ ) ≤ 2P(S_n ≥ λ − √2 s_n)   for all λ > 0.
(8)   E( max_{1≤k≤n} S_k^2 ) ≤ (log 4n^2 / log 2) s_n^2.
Symmetrization Inequalities
0 we have
Inequality 10. Let X,, ... , X„ be iid (0, 0.2 ). Let 101 < 1 and let 101 < S < 1.
Then for all s >0 we have
X1 +... +Xn
(13) P
C ✓nb.
-0 'SE
for all n 2t1 and some 0<c 1 , c2 < oo (where bn =v' 2 log t n).
See Loève (1977) for most of these. See Mogulskii (1980) for Inequality 9
and Goodman et al. (1981) for Inequality 10.
(14)   ‖Z/q‖_a^b ≤ 2 ‖ ∫_{[a,·]} [q(s)]^{-1} dZ(s) ‖_a^b

(we can replace 2 by 1 if Z is ↗, but 2 is best possible in the general case), and
if both q and Z are left continuous, the result still holds. Likewise q ↘ and
b < a is allowable provided q and Z are both right (or left) continuous.
Note that
Thus
= V+ ( t)
I[at]dq+ V+(t)q-(a) — V (a)q-(a)— V
fla,t)
q
(f) =I [a.t]
[V+ (t)— V_(s)]dq(s) since wma q_(a)=0
Hence
(k) Z(t)
f fa,t]
[V+ ( t)—V-(s)]dq(s).
Thus
Thus
Z
(n)ilgilb^ q +li 1ib
C
1 d[Z—Z(b)]ii^
cIZ(b)I+iif(b,-,
q(b) q b
by applying (14).
Suppose now both q and Z are left continuous. At any point x where q is
continuous, changing the sense of continuity of Z does not change the value
of [|Z_+(x)| ∨ |Z_−(x)|]/q(x). At any point where q has a jump, changing the
sense of continuity of both q and Z does not change the value of
[|Z_+(x)|/q_+(x)] ∨ [|Z_−(x)|/q_−(x)]. ∎
The basic idea of this sort of theorem is that the df of the normalized sum of
independent rv's will converge uniformly to the N(0, 1) df at a rate provided
slightly more than a second moment exists. Thus it provides a rate of conver-
gence for CLT-type results. In many cases, it can be used as an alternative to
exponential bounds in LIL-type results.
The basic results were provided independently by Berry and Esseen. The
versions we give here are variations on the original theme; we give references,
but not original sources. We also present refinements due to Cramer and Stein.
Let Φ(x) denote the N(0, 1) df, and let φ denote its density.
(1)   g is even, ≥ 0, ↗ for x > 0, and x/g(x) ↗ for x > 0.
The classical result of Berry and Esseen uses g(x) = |x|. The next version
does not require the existence of anything beyond a second moment.
The next theorem produces a bound that improves as |x| ↗; but stronger
assumptions are needed.
Theorem 3. Let X_1, ..., X_n be iid (0, 1) with E|X_1|^{2+δ} < ∞ for 0 < δ ≤ 1. Then
for all x
where
The key to the above results is the following lemma of Esseen [see Petrov
(1975)].
Theorem 5. (Stein) Let X and Y be random vectors. Let T(X) and S(X, Y)
be rv's. Suppose for some a > 0 and 0 < ε, δ ≤ 1 and some K_1, K_2, K_3 we have:

(i) T′ ≅ T + S.
(ii) E|E(S|X) + aT/2| ≤ K_1 aε (typically, K_1 = 0).
(iii) E|E(S^2|X) − a| ≤ K_2 aε.
(iv) E|S|^{2+δ} ≤ K_3 aε^δ.

Then we have

(10)   sup_{−∞<x<∞} |P(T ≤ x) − P(N(0, 1) ≤ x)| ≤ (5/δ)(K_1 + K_2 + K_3) ε^{δ/(1+δ)}.

If K_1 = 0, then

(11)   ET = 0,  |Var[T] − 1| ≤ K_2 ε,  and  E|T|^{2+δ} < ∞.
We shall let the bound for N(0, 1) rv's serve as a standard for later comparison.
as n → ∞ for some δ > 0. Michel (1976) contains some refinements and improve-
ments of this in the iid case. According to Kostka (1973) in the iid case, if (3)
holds for a_n = (1 + ε)√(2 log_2 n) with ε > 0, then EX^2[1 ∨ log|X|]^ε < ∞. Con-
versely, if EX^2[1 ∨ log|X|]^{1+3ε} < ∞, then (3) holds with a_n = (1 ± ε)√(2 log_2 n).
More generally, if EX^2 g(X) < ∞ where g(x) ↗ ∞ and x/g(x) ↗ as x ↑ ∞ and
if exp((λ_n^2/2){1 + o(1)})/g(√n) → 0 as n → ∞, then (3) holds. See also Michel
(1976) for refinements when g(x) = |x|^δ.
For bounded rv's we can obtain inequalities valid for all λ > 0 that are
improvements on the above claims. Those due to Bennett and Hoeffding will
be used heavily, but we list others also.

Bennett's Inequality 3. Let X_1, ..., X_n be independent with X_k ≤ b, EX_k = 0,
and Var[X_k] = σ_k^2 for 1 ≤ k ≤ n. Let σ̄^2 = (σ_1^2 + ⋯ + σ_n^2)/n. Then for all λ ≥ 0

and

(8)   ψ(λ) ≥ 1 − θ if 0 ≤ λ ≤ 3θ, since ψ(λ) ≥ 1/(1 + λ/3) for λ ≥ 0.
Exercise 2. (Chow and Teicher, 1978) Show that the function g(x) =
(e^x − 1 − x)/x^2, x ≠ 0, defined in the preceding proof, is ≥ 0, ↗, and convex
for all real x. (Hint: Use the mean-value theorem.)
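The claims of Exercise 2 about g(x) = (e^x − 1 − x)/x^2 are easy to check numerically on a grid; this is a sanity check, not a proof:

```python
import math

def g(x):
    # (e^x - 1 - x)/x^2, extended by continuity to g(0) = 1/2
    return 0.5 if x == 0 else (math.exp(x) - 1 - x) / x**2

xs = [i / 10 for i in range(-50, 51)]          # grid on [-5, 5], includes 0
vals = [g(x) for x in xs]
print(all(v >= 0 for v in vals))                                  # g >= 0
print(all(a <= b + 1e-12 for a, b in zip(vals, vals[1:])))        # g is nondecreasing
second_diff = [vals[i-1] - 2*vals[i] + vals[i+1]
               for i in range(1, len(vals) - 1)]
print(all(d >= -1e-12 for d in second_diff))                      # g is convex
```

The power-series expansion g(x) = Σ_{k≥0} x^k/(k+2)! makes the value g(0) = 1/2 and the positivity for x ≥ 0 transparent; the grid check covers the negative axis as well.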
(a)   M(s) ≤ 1 − EX + EX e^s.

Thus by Markov's inequality, (a), and the fact that the geometric mean does
not exceed the arithmetic mean,

   ≤ exp(−n(μ + λ)s) Π_{i=1}^n (1 − μ_i + μ_i e^s)
   ≤ exp(−n(μ + λ)s) [ (1/n) Σ_{i=1}^n (1 − μ_i + μ_i e^s) ]^n
   = exp(−n(μ + λ)s) (1 − μ + μ e^s)^n.

The minimizing value of s is

s_0 = log[ (μ + λ)(1 − μ) / (μ(1 − μ − λ)) ] > 0.
854 INEQUALITIES AND MISCELLANEOUS
where
We will now minimize G(λ, μ) with respect to λ for 0 < λ < 1 − μ, and the
resulting minimum will be denoted as g(μ). Now
(f)H(-_
1 A ) —H(_A
)
where 0 <A /(1 —µ)<I and 0 <A/(µ+A)<1 and where for 11slt<1 we have
2
H(s)= 1 -- log(1 —s)
s
with all coefficients of this power series positive. Thus H(s) is ↗ for 0 < s < 1.
Hence we see from (f) that (∂/∂λ)G(λ, μ) > 0 if and only if λ/(1 − μ) >
λ/(μ + λ), or equivalently λ > 1 − 2μ. Hence for 1 − 2μ > 0, G(λ, μ) achieves
its minimum at λ = 1 − 2μ; while if 1 − 2μ ≤ 0, then G(λ, μ) achieves its
minimum at λ = 0. Plugging these values into G(λ, μ) yields for the minimum
(11)   P( √n X̄_n ≥ λ ) ≤ exp( −(λ^2/2) / [ Σ_{k=1}^n σ_k^2/n + cλ/√n ] ).
Exercise 3. Prove Bernstein's inequality. Note that for bounded rv's it follows
from Bennett's inequality (5) and (8).
(12)   P( Σ_{i=1}^n (X_i − EX_i) ≥ λ ) ≤ exp( −2λ^2 / Σ_{i=1}^n (b_i − a_i)^2 ).
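Hoeffding's inequality (12) can be checked by Monte Carlo; the uniform rv's and the level λ below are illustrative choices, not from the text:

```python
import random
import math

def hoeffding_bound(lam, ranges):
    """exp(-2 lam^2 / sum (b_i - a_i)^2), the right side of (12)."""
    return math.exp(-2 * lam**2 / sum((b - a)**2 for a, b in ranges))

random.seed(1)
n, lam, trials = 50, 5.0, 20_000
ranges = [(0.0, 1.0)] * n                      # X_i ~ Uniform(0, 1), b_i - a_i = 1
hits = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    if s - n * 0.5 >= lam:                     # sum (X_i - E X_i) >= lam
        hits += 1
print(hits / trials <= hoeffding_bound(lam, ranges))
```

Here the bound is exp(−2·25/50) = e^{−1} ≈ 0.37, while the true tail probability is far smaller, illustrating that (12) is crude but universal over bounded rv's.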
Kolmogorov's Exponential Inequality 7. Let X_k ≅ (0, σ_k^2) with |X_k| ≤ K, 1 ≤ k ≤
n, be independent. Then

(13)   P(S_n/s_n ≥ λ) ≤ exp( −(λ^2/2)(1 − λK/(2s_n)) )   for all 0 ≤ λ ≤ s_n/K,
       P(S_n/s_n ≥ λ) ≤ exp( −λs_n/(4K) )   for all λ ≥ s_n/K;

(14)   P(S_n/s_n ≥ λ) ≥ exp( −(λ^2/2)(1 + ε) )   provided λ ≥ λ_ε and λK/s_n ≤ κ_ε.
E
Klass (1976) notes how this upper bound (see Loève, 1977, p. 254) can be
extended to the event P(max_{1≤k≤n} S_k/s_n > λ); see Klass for the exact details.
Exponential bounds for the special cases of binomial, Poisson, beta, and
gamma rv's are found in Sections 11.1 and 11.9.
Large Deviations
We define
(17) P(X?0)<_K.
(20)   f(a) = aτ − log M(τ)   for τ defined by a = (d/dt) log M(t)|_{t=τ}.
(21)   r^{-1} log P(±[Poisson(r) − r]/r ≥ λ) → −(λ^2/2)ψ(±λ)   as r → ∞. ∎
A version of this type of theorem that applies to more general rv's was
proved by Sievers (1969) and improved by Plachky and Steinback (1975). We
present it following a listing of some of the properties of mgf's. See Bahadur
(1971) for Theorem 1 and the next three exercises.
Exercise 5. Suppose P(X = 0) < 1 and M(t_0) < ∞ for some t_0 > 0. Then M is
strictly convex and continuous on [0, t_0], M has derivatives of all orders on
(0, t_0), and M′ is continuous and ↗ on (0, t_0).
Exercise 6. Suppose M(t) < ∞ for all t > 0, −∞ ≤ EX < 0, and P(X > 0) > 0.
Then M(t) < ∞ on some (0, t_0) with t_0 > 0, M′(0+) < 0, and M′(t) > 0 for some
0 < t < t_0.
MOMENTS OF SUMS 857
Figure 1. [Graph of the mgf M(t) against t, showing the level K and the point t_0.]
Let
t>o
5. MOMENTS OF SUMS
Von Bahr Inequality 1. Let X_k ≅ (0, σ_k^2) with E|X_k|^r < ∞ for r ≥ 2, 1 ≤ k ≤ n,
be independent. Then

where

E|N(0, 1)|^r = (2^{r/2}/√π) Γ((r + 1)/2).

If r is an integer ≥ 4 (≥ 3 works for iid rv's), then we may replace absolute
moments by moments.
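The constant E|N(0, 1)|^r = 2^{r/2} Γ((r + 1)/2)/√π can be checked against the familiar normal moments (E|Z| = √(2/π), EZ^2 = 1, EZ^4 = 3):

```python
import math

def abs_normal_moment(r):
    """E|N(0,1)|^r = 2^{r/2} Gamma((r+1)/2) / sqrt(pi)."""
    return 2 ** (r / 2) * math.gamma((r + 1) / 2) / math.sqrt(math.pi)

print(abs(abs_normal_moment(1) - math.sqrt(2 / math.pi)) < 1e-12)  # E|Z|
print(abs(abs_normal_moment(2) - 1.0) < 1e-12)                     # E Z^2
print(abs(abs_normal_moment(4) - 3.0) < 1e-12)                     # E Z^4
```

For even integer r the formula reduces to the double factorial (r − 1)!!, the usual normal moment sequence.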
See Michel (1976) for the above result and von Bahr (1965) for refinements.
See Pinsky (1969) for analogs for Eh(S_n/s_n) with h″ continuous. Gut (1975)
provides a good review of rth mean convergence. See also von Bahr and
Esseen (1965).
rth Mean Convergence Theorem 2. Let X_1, X_2, ... be iid and let 0 < r < 2.
The following are equivalent
We now extend our consideration to the whole infinite sequence. Gut (1978)
summarizes the following results and extends the second to higher-dimensional
indices; Gabriel (1977) extends the first.
For iid rv's we have
For iid mean 0 rv's, Siegmund's (1969) and Teicher's (1971) results show
that the following are equivalent:
c_r E( Σ_{i=1}^n X_i^2 )^{r/2} ≤ E|S_n|^r ≤ C_r E( Σ_{i=1}^n X_i^2 )^{r/2}

for some constants c_r and C_r. [See Burkholder (1973).] Also E max_{1≤k≤n} |S_k|^r
may, using Doob's inequality, replace E|S_n|^r.
6. BOREL—CANTELLI LEMMAS
Borel—Cantelli Lemma 1.
If Σ_n P(A_n) = ∞ and

lim_{n→∞} [ Σ_{j,k=1}^n P(A_j A_k) ] / [ Σ_{k=1}^n P(A_k) ]^2 = 1,  then P(A_n i.o.) = 1.
Baum, Katz, Stratton Variation 5. Suppose Σ_n P(A_n) = ∞, P(B_n) ≥ a for all
n where 0 < a < 1, and P(A_n B_n A_k B_k) = P(A_n)P(B_n A_k B_k) for all 1 ≤ k ≤ n − 1
and n ≥ 2. Then P(A_n B_n i.o.) ≥ a.
See Chung (1974), Renyi (1970), Serfling (1975), Kostka (1974), Baum et
al. (1971), and Klass (1976) for these results.
We also have the following results approximating tail probabilities based
on the principle that rare independent events are almost disjoint. See Wichura
(1973b, p. 446) and Chung (1974, p. 62)
P( ⋃_{k≥n} A_k ) ≅ Σ_{k≥n} P(A_k)   as n → ∞.
7. MISCELLANEOUS INEQUALITIES
Inequality 1. (Events lemma) (i) Suppose that for every k ≥ 1 the events
A_k, A_{k−1}, ..., A_0 and B_k are independent (where A_0 denotes the empty set).
Then
(ii) Suppose that for every k ≥ 1 the two classes of events
{A_{ik}(Σ_j A_{j,k−1})^c ⋯ (Σ_j A_{j,1})^c: i arbitrary} and {B_{ik}: any i} are independent.
Then
= cP(uA k ).
(ii) Now
U Y AikBik)
P( k i /
/
=P1 Ai1B11])+P([EA11B11] IY_Al2B;z])+ ...
! 1 i
= cP(U Aik I • ❑
\k i /
P(XEC)-P(YeC).
Exercise 4. (See Stigler, 1974) Let (X, Y) have joint df F with marginals
G and H. Then

(1)   EX = −∫_{−∞}^0 G(x) dx + ∫_0^∞ [1 − G(x)] dx

if EX exists, and
and this same function of x converges to 0 as x -^ too. [Hint: 1%, iyl' dF(y)
lxIrF(x)•]
(i) X_n → X as n → ∞,
(ii) E|X_n|^r → E|X|^r as n → ∞,
(iii) the rv's |X_n|^r, n ≥ 1, are uniformly integrable.
Then
Proof. Now
m / i
0Sf (h-h m ) 2 dt= Y_ f ' h(t)- hm Zdt
_
o,
rn
i =1
i
i m
u-1)/m
z
h dt-- h m
m L
=1 (i -1)/m
2 m i
m;-,
m +—Ehm
^—
m , m
(
1 m
—/J
m
^ ^
(b) hm(t)=m(m—t){fi/mhdsl —t
(i )}
+m(t—iMI)
{J(i-l)/m h ds/(t—` ►nl)}
since each term in { } in (b) converges a.e. to h(t) by the Fundamental Theorem
of Calculus. ❑
(6)   h_m(t) ≡ h_{mi}   for (i − 1)/m < t ≤ i/m and 1 ≤ i ≤ m.
Use Scheffé's theorem (as in Hájek and Šidák (1967, p. 164)) to show that

(7)   ∫_0^1 (h − h_m)^2 dt → 0   as m → ∞.
where

1/(12n + 1) < a_n < 1/(12n).
Euler's Constant 2.

Σ_{k=1}^n 1/k − log n → γ = 0.5772156649...   as n → ∞.
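Both the Stirling remainder a_n of Result 1 and Euler's constant are easy to verify numerically; `math.lgamma` supplies log n! exactly enough for the purpose:

```python
import math

def harmonic_minus_log(n):
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n)

gamma = 0.5772156649
print(abs(harmonic_minus_log(100_000) - gamma) < 1e-4)

# Stirling: log n! = n log n - n + (1/2) log(2 pi n) + a_n,
# with 1/(12n+1) < a_n < 1/(12n).
n = 20
a_n = math.lgamma(n + 1) - (n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n))
print(1 / (12 * n + 1) < a_n < 1 / (12 * n))
```

The harmonic-sum error is of order 1/(2n), which is why even a modest n already pins γ to several digits.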
Σ_{n=1}^∞ 1/(n d_n) < ∞   implies   lim_{n→∞} (log n)/d_n = 0;

in particular, Σ_{n=1}^∞ 1/(n d_n) < ∞ implies lim_{n→∞} d_n = ∞.
Proof. (i) For each integer K there exists a constant M_K ↓ 0 such that

∞ > M_K ≥ Σ_{n=K+1}^m 1/(n d_n) ≥ (1/d_m) Σ_{n=K+1}^m 1/n
(a)    ≥ [log m − (log K) − 1 + γ]/d_m   for all m ≥ K.

Thus

0 = lim_{K→∞} M_K ≥ lim sup_{m→∞} (log m)/d_m.
MISCELLANEOUS DETERMINISTIC RESULTS 865
∫_0^δ [t f(t)]^{-1} dt < ∞   implies   lim_{t↓0} log(1/t)/f(t) = 0

and

∫_0^δ [t g(t) log(1/t)]^{-1} dt < ∞   implies   lim_{t↓0} log_2(1/t)/g(t) = 0.

And so on.
Proof. Let 0 < ε < δ, and fix t so that 0 < t < δ − ε. Then
"
f' ,
1'
uf(u) du
11log(t+E)—logt
uf(u)
du> –
.f(t)
f udu=
+f
f(t)
f06 uf(u)
logt +e)+log (I/t) - -log (1/t)
du_li m
f(t) = iii f(t)
_
an> Y an (n k —1— nk -1) cZ Y- a n , ❑
n=1 n k=1 nk k=1
I an k
— ak« <
an
k-1 n =, n n =0 k-1 n., n
Proposition 4. If Σ_n a_n < ∞, then there exists a sequence c_n ↗ ∞ such that
Σ_n c_n a_n < ∞.
Proof. (J. Fabius) Let s,,, = a 2m with 0< a < 1. Choose n m T so that Y„ a k <
E. for all m. Let ek = i/I for n m <_ k < n m + , and m >_ I with c, _ = c- 1 =
0. Then
ao m nm + 1 -1 m 1 =Q
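Proposition 4 is constructive. One standard construction (a sketch under our own choice of weights, not Fabius's blocks): take c_n = 1/√(tail_n) with tail_n = Σ_{k≥n} a_k, so c_n ↗ ∞ while Σ c_n a_n ≤ 2√(Σ a_n) by a telescoping bound, since a_n/√(tail_n) ≤ 2(√(tail_n) − √(tail_{n+1})). With the illustrative choice a_n = 1/n²:

```python
import math

def slow_weights(a):
    """Given a summable positive sequence a_n, return c_n increasing to
    infinity with sum c_n a_n finite: c_n = 1/sqrt(tail_n)."""
    tails = []
    t = sum(a)
    for x in a:
        tails.append(t)   # tail_n = sum_{k >= n} a_k
        t -= x
    return [1.0 / math.sqrt(t) for t in tails]

a = [1.0 / n**2 for n in range(1, 10_001)]
c = slow_weights(a)
print(all(c1 <= c2 for c1, c2 in zip(c, c[1:])))     # c_n is nondecreasing
print(c[-1] > 50 * c[0])                              # c_n grows without bound
print(sum(ci * ai for ci, ai in zip(c, a)) <= 2 * math.sqrt(sum(a)))
```

Here tail_n ≈ 1/n, so c_n ≈ √n, which indeed tends to infinity while the weighted series stays summable.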
We would like to claim that Σ_n (c_n/n) exp(−c_n) is < ∞ or = ∞ according
as Σ_j c_{n_j}^{-1} exp(−c_{n_j}) is < ∞ or = ∞. This is not true, but we now see what
we can do in this direction.
We first claim that

(2)   log_2 n_j ~ log j   as j → ∞

and

(3)   n_{j+1}/n_j − 1 ~ 1/log j   as j → ∞.
If we now define d_n by

(5)   d_n = c_n ∧ 2 log_2 n,

then

(8)   d_n ≥ A log_2 n for all n ≥ some n_A, for each 0 < A < 1,
(9)   M_α^{-1} ≤ both (1 − n_j/n_{j+1}) d_{n_j} and (n_{j+1}/n_j − 1) d_{n_j} ≤ M_α

for some 0 < M_α < ∞ and for all j ≥ some J_α, and finally

(10)   Σ_{j=2}^∞ d_{n_j}^{-1} exp(−d_{n_j}) < ∞.
If, however,
then
00
Exercise 6. Establish (3), (8)-(10), and (12). [The key points, for k = 1, are
contained in Robbins and Siegmund (1972).]
Integration by Parts
If U and V are of bounded variation on the real line, then Hewitt and
Stromberg (1969, p. 419) show that
and
(13′)   U_+(b)V_+(b) − U_+(a)V_+(a) = ∫_{(a,b]} U_− dV + ∫_{(a,b]} V_+ dU
or, symbolically,
∫ h d( F/(1 − F) ) = ∫ h { F_− d( 1/(1 − F) ) + (1/(1 − F)) dF }
                  = ∫ h { F_−/[(1 − F)(1 − F_−)] + 1/(1 − F) } dF

(18)              = ∫ h / [(1 − F)(1 − F_−)] dF
Also, (S_n, ℱ_n), n ≥ 1, is said to be a submartingale if E|S_n| < ∞ for all n and
if
In the most common situation ℱ_n = σ[S_1, ..., S_n], and we omit ℱ_n and refer
only to S_n. We define
Exercise 1. If (S_n, ℱ_n) and (S*_n, ℱ_n) are submartingales, then (S_n ∨ S*_n, ℱ_n)
is a submartingale.
Proposition 2. Let S_n be adapted to ℱ_n with E|S_n| < ∞ for all n. Then (S_n, ℱ_n)
is a martingale (submartingale) if and only if for every n > m and every A ∈ ℱ_m
we have
Inequality 1. (Doob) Let (S_n, ℱ_n) be a submartingale. Then for all λ > 0

λP(M_n ≥ λ) ≤ ∫_{[M_n ≥ λ]} S_τ dP = Σ_{k=1}^n ∫_{[M_n ≥ λ and τ = k]} S_k dP = Σ_{k=1}^n ∫_{[τ = k]} S_k dP
            ≤ Σ_{k=1}^n ∫_{[τ = k]} S_n dP = ∫_{[M_n ≥ λ]} S_n dP. ∎
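Doob's inequality can be checked by Monte Carlo for the submartingale |S_k| built from a ±1 random walk (an illustrative case; |martingale| is a submartingale, so λP(max_k |S_k| ≥ λ) ≤ E|S_n|):

```python
import random

random.seed(2)
n, lam, trials = 100, 25.0, 5_000
exceed = 0
final_abs = 0.0
for _ in range(trials):
    s, m = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))    # +-1 steps: |S_k| is a submartingale
        m = max(m, abs(s))
    exceed += (m >= lam)
    final_abs += abs(s)
# Doob: lam * P(max_k |S_k| >= lam) <= E |S_n|
print(lam * exceed / trials <= final_abs / trials)
```

For n = 100 one has E|S_n| ≈ √(2n/π) ≈ 8 while λ·P(max ≥ 25) is far smaller, so the inequality holds with a wide margin here.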
The previous inequality of Doob is a generalization of Kolmogorov's
inequality. The next inequality extends this to a type of Skorokhod inequality.
λP(M_n ≥ λ) ≤ ∫_{[M_n ≥ λ]} S_n dP
           = ∫_{[M_n ≥ λ and S_n > cλ]} S_n dP + ∫_{[M_n ≥ λ and S_n ≤ cλ]} S_n dP
           ≤ ∫_{[S_n > cλ]} S_n dP + cλP(M_n ≥ λ).
Remark 1. If (S_n, ℱ_n) is a submartingale, then so is (exp(rS_n), ℱ_n) for any
r > 0. Applying Doob's inequality (Inequality 1) yields

(8)   E M_n ≤ (e/(e − 1))(1 + E[S_n(log S_n)^+])   if p = 1,
      E M_n^p ≤ (p/(p − 1))^p E S_n^p   if p > 1.
(10)   b_1 ≥ ⋯ ≥ b_n ≥ 0 = b_{n+1}.

Let r ≥ 1. Let
Proof. Since the given condition implies E(|S_k|^r | ℱ_{k−1}) ≥ θ_k|S_{k−1}|^r for any
r ≥ 1 by Jensen's inequality applied to the convex function x^r, without loss of
generality we set r = 1 and assume all S_k ≥ 0. Let
Ak
dP.
And since
n j k
(c) Y_ (bj — 0j +1 bj+ ,) I II O i = b k provided fj 0 = 1, ;
we have
_ E Y, (bj—Bj+lbj Sk dP by(c)
k=1j=k -n ei /
+l)\ i=k+t f1 k
n j
(bj _0j±1 bj±1 ) J S1 dP
j=1 k=1 Ak
(14)   P( max_{m≤k≤n} b_k|S_k| ≥ λ ) ≤ [ b_m^2 Σ_{k=1}^m σ_k^2 + Σ_{k=m+1}^n b_k^2 σ_k^2 ] / λ^2.
State and prove the analog for a reverse martingale (see the next section).
Inequality 4. (Birnbaum and Marshall) Let (|S_t|, ℱ_t), 0 ≤ t ≤ θ, be a submar-
tingale whose sample paths are right (or left) continuous. Suppose S(0) = 0
and ν(t) ≡ ES^2(t) < ∞ on [0, θ]. Let q > 0 be an ↗ right- (or left-) continuous
function on [0, θ]. Then
Proof. By the right (left) continuity of the sample paths and S(0) = 0,

p ≡ P( sup_{0≤t≤θ} |S(t)|/q(t) ≥ 1 )
  = lim_{n→∞} P( max_{0≤i≤2^n} |S(θi/2^n)|/q(θi/2^n) ≥ 1 )
  ≤ lim_{n→∞} Σ_i q^{-2}(θi/2^n)[ν(θi/2^n) − ν(θ(i − 1)/2^n)]
  = ∫_0^θ [q(t)]^{-2} dν(t),

where the inequality comes from Inequality 3 and the convergence comes from
the monotone convergence theorem. In fact, we only need S to be separable
and q to be ↗. ∎
(17)   λ P(‖S^+‖_a^b ≥ λ) ≤ ∫_{[‖S^+‖_a^b ≥ λ]} S_b dP ≤ ES_b^+ ≤ E|S_b|.
Also, (S_n, ℱ_n) is said to be a reverse submartingale if E|S_n| < ∞ for all n and
if
Exercise 2. Let S_n be reverse adapted to ℱ_n with E|S_n| < ∞ for all n. Then
(S_n, ℱ_n) is a reverse martingale (submartingale) if and only if for every m < n
INEQUALITIES FOR REVERSED MARTINGALES 875
and every A ∈ ℱ_n we have

(3)   ∫_A S_m dP = ∫_A S_n dP.
Then the S_n's are uniformly integrable and there exists a rv S_∞ and a σ-field
ℱ_∞ such that (S_k, ℱ_k), 1 ≤ k ≤ ∞, is a martingale. Finally,

(7)   S_n → S_∞ a.s. as n → ∞
and
(2)   P(M_𝐧 ≥ λ) ≤ 2^r P(|S_𝐧| ≥ c^r λ).
Proof. We give the proof in dimension r = 2 only; the rest is only notation.
With 𝐧 = (m, n) and S_ij = S_(i,j), let

and

B_ik = [ |S_mk − S_ik| < (1 − c)λ ].
Now
Hence
= P(Mn A)/2.
INEQUALITIES IN HIGHER DIMENSIONS 877
So
In cases where Wichura's inequality can be applied to the T_k's on the rhs
of Shorack and Smythe's (1976) inequality (Inequality 2) below, the result is
an analog of the Hájek and Rényi inequality.
Let b_k, k ∈ N^r, be a set of positive constants such that

here Δb_k is the usual r-dimensional differencing around the 2^r points of N^r
neighboring k which are ≤ k. Define
Inequality 2. (Shorack and Smythe) If b k > 0 satisfy (3), then for arbitrary
Xk we have
Sk = I bj(AT) =
jsk - -
Y_ (b)=
jsk
(ATj)
- i^j isk
where T_ik is as above. Note that each T_ik is formed by differencing 2^r sums
of the type T_j. Since

Σ_{i≤k} (Δb_i)/b_k = 1   with each Δb_i ≥ 0,

we have

Σ_{i≤k} (Δb_i)|T_ik|/b_k ≤ max_{i≤k} |T_ik|.

Thus

max_{k≤n} |S_k|/b_k ≤ max_{k≤n} Σ_{i≤k} (Δb_i)|T_ik|/b_k ≤ max_{k≤n} max_{i≤k} |T_ik|
                  ≤ 2^r max_{k≤n} |T_k|.
Let a finite population consist of the N values c_1, ..., c_N. Let Y_1, ..., Y_n
denote a random sample without replacement, and let X_1, ..., X_n denote a
random sample with replacement from the population. Let

(1) μ ≡ (1/N) Σ_{i=1}^N c_i and σ² ≡ (1/N) Σ_{i=1}^N (c_i − μ)².

Then, for every convex function h,

(3) E h(Ȳ_n) ≤ E h(X̄_n).
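The without-versus-with-replacement comparison above (Hoeffding's result, for convex h) can be verified exactly for a small population by enumerating all samples. The population values and the choice h(x) = (x − μ)² below are hypothetical; for this particular h the two expectations are just the two variances of the sample mean, so they can also be checked against the finite-population-correction formula.

```python
import itertools

# hypothetical finite population of N = 5 values
c = [1.0, 2.0, 4.0, 8.0, 16.0]
N, n = len(c), 3
mu = sum(c) / N
sig2 = sum((ci - mu) ** 2 for ci in c) / N

def h(x):                      # a convex function
    return (x - mu) ** 2

def mean(xs):
    return sum(xs) / len(xs)

# exact expectations over all equally likely samples
without = [h(mean(s)) for s in itertools.combinations(c, n)]
with_   = [h(mean(s)) for s in itertools.product(c, repeat=n)]
E_without = sum(without) / len(without)
E_with = sum(with_) / len(with_)

assert E_without <= E_with                                  # the convex ordering
assert abs(E_with - sig2 / n) < 1e-9                        # Var of mean, with replacement
assert abs(E_without - (sig2 / n) * (N - n) / (N - 1)) < 1e-9   # finite-population correction
```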
are also processes on (D, 𝒟). Let S_0 ≡ 0. Also, for any process X on (D, 𝒟),
we note that ||X|| = lim_{m→∞} max_{0≤i≤2^m} |X(i/2^m)| is a rv. Thus all rv's and
probabilities are well defined in what follows. All conclusions claimed so far
are also true on (D_R, 𝒟_R). We now present generalizations of the inequalities
of Skorokhod and Lévy.
Then

a P( max_{1≤k≤n} ||S_k|| ≥ λ) ≤ Σ_{k=1}^n P(A_k) P(||S_n − S_k|| ≤ (1−c)λ)
= Σ_{k=1}^n P(A_k ∩ [||S_n − S_k|| ≤ (1−c)λ])
(b) ≤ P(||S_n|| ≥ cλ),
where
Then

(b) P(||S_n|| > λ) ≥ Σ_{k=1}^n P(A_k ∩ [||S_n|| > λ]).

(c) A_km ≡ [ max_{1≤i<k} ||S_i|| ≤ λ < ||S_k||_m ] and J_km ≡ min{ j: S_k(j/2^m) > λ }.
= Σ_{k=1}^n lim_{m→∞} Σ_{j=0}^{2^m} P(A_km ∩ [||S_n||_m > λ] ∩ [J_km = j])
≥ Σ_{k=1}^n lim_{m→∞} Σ_{j=0}^{2^m} P(A_km ∩ [S_n(j/2^m) ≥ S_k(j/2^m)] ∩ [J_km = j])
= Σ_{k=1}^n lim_{m→∞} Σ_{j=0}^{2^m} P(A_km ∩ [J_km = j]) P(S_n(j/2^m) − S_k(j/2^m) ≥ 0)
≥ (1/2) Σ_{k=1}^n lim_{m→∞} Σ_{j=0}^{2^m} P(A_km ∩ [J_km = j]) = (1/2) Σ_{k=1}^n lim_{m→∞} P(A_km)
(5) ε(X − X*) ≅ X − X*

for ε a Rademacher rv independent of X and X*.
(The one-dimensional argument suffices to make (5) clear. Thus

as claimed.) Thus if

(6) X_i ≅ X*_i for 1 ≤ i ≤ n when X_1, ..., X_n ≅ X*_1, ..., X*_n

and if

(7) ε_1, ..., ε_n are iid Rademacher rv's independent of the X_i's and X*_i's,

then

Σ_{i=1}^n ε_i (X_i − X*_i) ≅ Σ_{i=1}^n (X_i − X*_i).
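The symmetrization identity (5) can be checked by exact enumeration for a discrete law. The two-point distribution below is a hypothetical example; the computation verifies that ε(X − X*) and X − X* have identical laws even when X itself is skewed.

```python
from collections import Counter
from fractions import Fraction

# hypothetical skewed two-point law for X (and an independent copy X*)
pX = {0: Fraction(3, 4), 1: Fraction(1, 4)}
pe = {1: Fraction(1, 2), -1: Fraction(1, 2)}   # Rademacher sign

dist_D = Counter()                 # exact law of D = X - X*
for x, px in pX.items():
    for y, py in pX.items():
        dist_D[x - y] += px * py

dist_eD = Counter()                # exact law of eps * (X - X*)
for e, q in pe.items():
    for d, pd in dist_D.items():
        dist_eD[e * d] += q * pd

assert dist_D == dist_eD           # eps*(X - X*) has the same law as X - X*
```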
Now for any function g we have E||g − X|| ≥ E|g(t) − X(t)| ≥ |E[g(t) − X(t)]| ≥
|g(t) − EX(t)| for all t, so that

E{ max_{1≤k≤n} ||g_k − X_k|| } ≥ E||g_k − X_k|| ≥ ||g_k − EX_k||.

Let X ≡ (X_1, ..., X_n), EX ≡ (EX_1, ..., EX_n), g ≡ (g_1, ..., g_n), and define

(12) |||g||| ≡ max_{1≤k≤n} ||g_k||.
Then
Proof. Now
as claimed. O
Martingales and
Counting Processes
In this section we set forth the notation and basic definitions that will be used
throughout the remainder of Appendix B and in Chapter 7. Additional useful
references are Bremaud (1981), Jacod (1979), Meyer (1976), Dellacherie and
Meyer (1978, 1982), Liptser and Shiryayev (1978), and the survey by Shiryayev
(1981).
Suppose that (Ω, ℱ, P) is a fixed complete probability space. A family of
σ-fields 𝔽 = {ℱ_t: t ∈ [0, ∞)} is a filtration if
An adapted right-continuous process M with E|M(t)| < ∞ for each 0 ≤ t < ∞
is a martingale [or, to be explicit, an (𝔽, P)-martingale] if
Although [M] is not in general predictable, [M] ≐ ⟨M⟩ (so that [M] − ⟨M⟩ ∈
𝓜_loc[𝔽, P]), and [M]^{1/2} ∈ 𝓐⁺_loc[𝔽, P]. Note that [M] − ⟨M⟩ =
Σ (ΔM)² − ⟨M^d⟩. The quadratic covariation process [M, N] is defined by
Figure 1. Jump times T_1, T_2, T_3, T_4, ··· .
Example 2. (Renewal counting process) If X_1, X_2, ... are iid positive ran-
dom variables [so P(X_i = 0) = 0] and T_i ≡ X_1 + ··· + X_i, then N(t) ≡
Σ_{i=1}^∞ 1_{[0,t]}(T_i) is a (renewal) counting process. ❑
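A minimal simulation of Example 2, with hypothetical Exp(2) interarrival times (in which case N is in fact a Poisson process of rate 2); it only uses the defining representation N(t) = Σ_i 1_{[0,t]}(T_i).

```python
import random

random.seed(1)

def renewal_N(t, interarrivals):
    """N(t) = sum_i 1[T_i <= t] with T_i = X_1 + ... + X_i."""
    count, T = 0, 0.0
    for x in interarrivals:
        T += x
        if T > t:
            break
        count += 1
    return count

X = [random.expovariate(2.0) for _ in range(10000)]   # iid positive gaps
grid = [0.1 * k for k in range(101)]
N = [renewal_N(t, X) for t in grid]

# a counting process: starts at 0 and is nondecreasing
assert N[0] == 0
assert all(a <= b for a, b in zip(N, N[1:]))
```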
(1) M ≡ N − A ∈ 𝓜_loc[𝔽, P].
The following theorem shows that for the counting process local martingale
of (1), the predictable variation process (M) can be easily computed from the
compensator A.
(3) ⟨M⟩(t) = ∫_(0,t] (1 − ΔA) dA, 0 ≤ t < ∞.
(5) A = Σ_i A_i,
COUNTING PROCESSES AND MARTINGALES 889
where

(6) A_i(t) = ∫_(0, t∧T_i] (1 − F_{i−})^{−1} dF_i = ∫_(0,t] 1_{[s,∞)}(T_i) (1 − F_i(s−))^{−1} dF_i(s)

for 0 ≤ t < ∞.
Proof. See Theorem 18.2, Liptser and Shiryayev (1978). ❑
Example 1. (continued) Here A(t) = λt for 0 ≤ t < ∞ and {N(t) − λt: t ≥ 0}
is a martingale. ❑
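A quick Monte Carlo check of this example: for a rate-λ Poisson process, M(t) = N(t) − λt should have mean 0 (the martingale property) and variance ⟨M⟩(t) = A(t) = λt, by (3) with ΔA = 0. The rate, horizon, and tolerances below are arbitrary choices.

```python
import random

random.seed(2)
lam, t, reps = 3.0, 2.0, 20000

def poisson_N(lam, t):
    """Number of points of an Exp(lam) renewal sequence in (0, t]."""
    n, T = 0, 0.0
    while True:
        T += random.expovariate(lam)
        if T > t:
            return n
        n += 1

M = [poisson_N(lam, t) - lam * t for _ in range(reps)]
mean = sum(M) / reps
var = sum(m * m for m in M) / reps

assert abs(mean) < 0.2            # E M(t) = 0
assert abs(var - lam * t) < 0.5   # <M>(t) = A(t) = lam * t
```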
(7) Ā_n(t) = Σ_{i=1}^n ∫_(0, t∧X_i] (1 − F_−)^{−1} dF = Σ_{i=1}^n A_i(t),

where

A_i(t) = ∫_(0, t∧X_i] (1 − F_−)^{−1} dF = ∫_(0,t] 1_{[s,∞)}(X_i) (1 − F(s−))^{−1} dF(s).
Noting that N̄_n is a sum of iid counting processes each having the same
distribution as N_1, this yields

Ā_n(t) = ∫_(0,t] (n − N̄_{n−}) (1 − F_−)^{−1} dF.
Equivalently,

(8) N̄_n = M̄_n + Ā_n.
The first theorem of this section asserts that the martingale property is preserved
for stochastic integrals of predictable processes with respect to counting process
martingales.
Let N be an (𝔽, P)-counting process.
(2) ⟨Y⟩(t) = ∫_(0,t] H² d⟨M⟩ = ∫_(0,t] H² (1 − ΔA) dA for 0 ≤ t < ∞.
(d) P(∫_(0,t] H² d⟨M⟩ < ∞ for all 0 ≤ t < ∞) = 1 if and only if Y ∈ 𝓜_loc[𝔽, P]
and has predictable variation process ⟨Y⟩ given by (2).
(e) If P(∫ H² d⟨M⟩ < ∞) = 1, then P(||Y||₀^∞ < ∞) = 1.
Proof. See Theorems 18.7 and 18.8 on pp. 268-270 of Liptser and Shiryayev
(1978); Chapter II of Meyer (1976); and Jacod (1979). Also see the survey of
stochastic integration by Dellacherie (1980). ❑
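As a numerical illustration of the theorem (not of its proof), take N a rate-λ Poisson process, so M = N − A with A(t) = λt, and integrate a deterministic (hence predictable) H. Then Y(t) = ∫ H dM should again have mean zero. The specific H below is a hypothetical choice.

```python
import math
import random

random.seed(3)
lam, t, reps = 2.0, 1.0, 20000

def H(s):
    return math.sin(s) + 1.0          # deterministic, hence predictable

def integral_Y(lam, t):
    """Y(t) = int H dM = sum_{T_i <= t} H(T_i) - lam * int_0^t H(s) ds."""
    jumps, T = 0.0, 0.0
    while True:
        T += random.expovariate(lam)
        if T > t:
            break
        jumps += H(T)
    compensator = lam * ((1.0 - math.cos(t)) + t)   # lam * int_0^t (sin s + 1) ds
    return jumps - compensator

Y = [integral_Y(lam, t) for _ in range(reps)]
mean = sum(Y) / reps
assert abs(mean) < 0.1                # E Y(t) = 0: the integral stays a martingale
```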
The second theorem of this section asserts that all the local martingales
associated with a counting process N are of the form (1).
Suppose that N is an (𝔽, P)-counting process, let 𝔽^N = {ℱ_t^N: 0 ≤ t < ∞}
denote the natural or minimal σ-fields associated with N, that is,
(3) P(∫_(0,t] |H| dA < ∞ for all 0 ≤ t < ∞) = 1
and
Proof. See Theorem 19.1, p. 282, Liptser and Shiryayev (1978). Results of
this type were given by Davis (1974), Boel et al. (1975a), Chou and Meyer
(1974), Dellacherie (1974), Kabanov (1973), and Grigelionis (1975). ❑
4. MARTINGALE INEQUALITIES
(i) If Y is predictable, then for all λ > 0, η > 0, and all stopping times T
(ii) If ||ΔY|| ≤ c, then for all λ > 0, η > 0, and all stopping times T
Part (i) of this inequality is due to Lenglart (1977); part (ii) was proved by
Rebolledo (1979).
Proof. We first show that
Let S ≡ inf{s ≤ T∧n: X(s) ≥ λ}, ≡ T∧n if the set is empty. Thus S is a stopping
time and S ≤ T∧n. Hence

E(Y(T)) ≥ E(Y(S))
≥ E(X(S)) since X is dominated by Y
≥ E(X(S) 1_{[||X||₀^{T∧n} ≥ λ]})
(b) ≥ λ P(||X||₀^{T∧n} ≥ λ).

Letting n → ∞ in (b) yields (a).
To prove (1), we will in fact show that for λ > 0, η > 0, and all predictable
stopping times S

Then (1) follows from (c) applied to the processes X^T ≡ X(· ∧ T) and Y^T ≡
Y(· ∧ T) with the predictable stopping time S = ∞.
Let R ≡ inf{t: Y(t) ≥ η}. Then R > 0 by right continuity of Y and is predict-
able since Y is predictable. Thus R ∧ S is predictable and

≤ P(X_{(R∧S)−} ≥ λ) + P(Y_{S−} ≥ η)
   since 1_{[Y_{S−} < η]} X*_{S−} ≤ X_{(R∧S)−}
≤ (λ − ε)^{−1} E(Y_{(R∧S)−}) + P(Y_{S−} ≥ η)
≤ (λ − ε)^{−1} E(Y_{S−} ∧ η) + P(Y_{S−} ≥ η)
(d) ≤ η/(λ − ε) + P(Y_{S−} ≥ η).
Letting ε → 0 in (d) yields (c), which in turn implies (1). The argument for (2)
is similar. ❑
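Part (i) can be sanity-checked on the simplest dominated pair: a Poisson process N is dominated by its (predictable, continuous) compensator Y(t) = λt, and the inequality reads P(||N||₀ᵗ ≥ level) ≤ η/level + P(Y(t) ≥ η). The parameters below are arbitrary; since Y is deterministic we pick η > λt so the second term vanishes.

```python
import random

random.seed(4)
rate, t, reps = 1.0, 2.0, 20000
level, eta = 6.0, 3.0          # eta > rate * t = 2, so P(Y(t) >= eta) = 0

def poisson_N(rate, t):
    n, T = 0, 0.0
    while True:
        T += random.expovariate(rate)
        if T > t:
            return n
        n += 1

# N is nondecreasing, so sup_{s<=t} N(s) = N(t)
hits = sum(poisson_N(rate, t) >= level for _ in range(reps))
empirical = hits / reps
lenglart_bound = eta / level + 0.0
assert empirical <= lenglart_bound
```

The bound (here 0.5) is crude but valid: the true tail P(N(2) ≥ 6) is far smaller.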
Also let
(4) I=_A[M]_A6[M]
and
(5) M_ε ≡ M − M^ε.
Then (M_n)_{n≥1} ∈ Π_{n≥1} 𝓜_loc[𝔽_n, P] satisfies the strong ARJ(1) condition if, as n → ∞,

(7) σ_ε[M_n](t) →_p 0

for all ε > 0, t ∈ ℝ⁺.
Then (M_n)_{n≥1} ∈ Π_{n≥1} 𝓜_loc[𝔽_n, P] satisfies the ARJ(2) condition if, as n → ∞,
Proposition 1. (Rebolledo)
(i) For (M_n)_{n≥1} ∈ Π_{n≥1} 𝓜_loc[𝔽_n, P] the strong ARJ(1) condition implies
the ARJ(1) condition: (7) implies (6).
(ii) For (M_n)_{n≥1} ∈ Π_{n≥1} 𝓜²_loc[𝔽_n, P] the Lindeberg condition implies
the strong ARJ(2) condition, which implies both the ARJ(2) and the strong
ARJ(1) conditions. Also, the ARJ(2) condition implies the ARJ(1) condition
[and, of course, (i) continues to hold]:

(10) ⇒ (9) ⇒ (8) ⇒ (6)  and  (9) ⇒ (7) ⇒ (6).
and

(12) [M_n](t) →_p A(t) as n → ∞ for all t ∈ ℝ⁺

and either

where A is ↗ and continuous with A(0) = 0. Then (15) and (16) are equivalent
and
where X_0 is ℱ_0-measurable, M ∈ 𝓜_loc[𝔽, P], and A ∈ 𝓥_0[𝔽, P]. The decompo-
sition (1) is not unique. The collection of (𝔽, P)-semimartingales will be
denoted by 𝓢[𝔽, P].
+ (1/2) Σ_{i,j=1}^k ∫_(0,t] D^i D^j F∘X(s−) d⟨X^{ic}, X^{jc}⟩(s)
+ Σ_{s≤t} [ F∘X(s) − F∘X(s−) − Σ_{i=1}^k D^i F∘X(s−) ΔX^i(s) ]

+ (1/2) Σ_{i,j=1}^k ∫_(0,t] D^i D^j F∘X(s−) d[X^i, X^j](s)
+ Σ_{s≤t} [ F∘X(s) − F∘X(s−) − Σ_{i=1}^k D^i F∘X(s−) ΔX^i(s)
− (1/2) Σ_{i,j=1}^k D^i D^j F∘X(s−) ΔX^i(s) ΔX^j(s) ].
It is given by
Proof. See Doléans-Dade (1970) or Meyer (1976), p. 304ff. The proof uses
the change-of-variable formula (2). ❑
or equivalently
(12) Z(t) = Z(0) + ∫_(0,t] a Z_− dA
is given by
By Theorem 3, Z ∈ 𝓜_loc[𝔽, P]

(15) Z(t) = exp( ε(N(t) − A(t)) ) ∏_{s≤t: ΔN(s)=1} (1 + ε) e^{−ε}
and
and
(21) P(||M||₀ᵗ ≥ λ) ≤ 2 exp( −(λ²/2r) ψ(λc/r) ) + P(⟨M⟩(t) > r).
For the discrete case see Freedman (1975), and note that the bound in
Freedman's (1.6) may be written as exp(−(a²/2b)ψ(a/b)).
Proof. Now E Z_r(0) = 1 for every r > 0 and

P(||M⁺||₀ᵗ ≥ λ, ⟨M⟩(t) ≤ r)
= P(M(s) ≥ λ for some 0 ≤ s ≤ t, ⟨M⟩(t) ≤ r)
≤ P(rM(s) − φ_c(r)⟨M⟩(s) ≥ rλ − φ_c(r)⟨M⟩(s) for some s ≤ t, ⟨M⟩(t) ≤ r)
≤ P(exp[rM(s) − φ_c(r)⟨M⟩(s)] ≥ e^{rλ − φ_c(r)r} for some s ≤ t)
≤ 1/exp(rλ − φ_c(r)r).

Choosing r = (1/c) log(1 + λc/r) yields (17). Since −M ∈ 𝓜_loc[𝔽, P] also has
jumps bounded by c, (17) holds with ||M⁻||₀ᵗ, and combining the two inequalities
yields (18). ❑
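The exponential bound (21) can be tested against a Poisson martingale M(t) = N(t) − λ₀t, which has jumps bounded by c = 1 and ⟨M⟩(t) = λ₀t exactly, so the second term in (21) vanishes with r = λ₀t. The form ψ(x) = (2/x²)[(1 + x)log(1 + x) − x] used below is the one matching Freedman's (1.6) and is an assumption here; the simulation checks the (weaker) tail of |M(t)| itself, which the sup-norm bound also covers.

```python
import math
import random

random.seed(5)
rate, t, reps = 5.0, 2.0, 40000
r = rate * t                      # <M>(t) = rate * t exactly here
c, lam = 1.0, 8.0                 # jump bound and deviation level

def psi(x):
    # assumed form: psi(x) = (2/x^2) * ((1 + x) * log(1 + x) - x)
    return 2.0 * ((1.0 + x) * math.log(1.0 + x) - x) / (x * x)

def poisson_N(rate, t):
    n, T = 0, 0.0
    while True:
        T += random.expovariate(rate)
        if T > t:
            return n
        n += 1

bound = 2.0 * math.exp(-(lam ** 2 / (2.0 * r)) * psi(lam * c / r))
hits = sum(abs(poisson_N(rate, t) - r) >= lam for _ in range(reps))
assert hits / reps <= bound       # empirical tail below the Bennett-type bound
```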
Errata
1 Introduction
Since publication of our book Empirical Processes with Applications to Statistics
in 1986, we have become aware of several mathematical errors and a number of
typographical and other minor errors. Although we would now do many things
differently, our purpose here is only to give corrections of the errors of which we
are currently aware.
We encourage readers finding further errors to let us know of them.
We owe thanks to the following friends, colleagues, reviewers, and users of the
book for telling us about errors, difficulties, and shortcomings: N. H. Bingham,
M. Csörgö, S. Csörgö, Kjell Doksum, Peter Gaenssler, Richard Gill, Paul Janssen,
Keith Knight, Ivan Mizera, D. W. Muller, David Pollard, Peter Sasieni, and Ben
Winter.
We owe special thanks to Peter Gaenssler for providing us with a long list of
typographical errors which provided the starting point for section 3 here.
The corrections of Chapters 7, 21, and 23 given in section 2 were aided by
discussions and correspondence with Richard Gill and Ben Winter (in the case of
Chapter 7), I. Bomze and E. Reschenhofer, and W.D. Kaigh (in the case of Chapter
21), and Keith Knight (in the case of section 23.3).
3. CONSISTENCY OF Â_n AND F̂_n.
In this section we use the representations of Theorem 7.2.1 and continuity of
the product integral map ℰ which takes A to F (see section B.6 and especially
example B.6.1, page 898) to establish weak and strong consistency of Â_n and F̂_n.
Our first result gives strong consistency of both Â_n and F̂_n on any interval [0, θ]
with θ < τ ≡ τ_H ≡ H⁻¹(1).
Theorem 1. Suppose that F and G are arbitrary df's on [0, ∞). Recall τ ≡ τ_H ≡
H⁻¹(1) where 1 − H ≡ (1 − F)(1 − G). Then for any fixed θ < τ
and
The following theorem is more satisfactory since F and G are completely ar-
bitrary; the price is that the consistency is in probability (and the supremum in (5)
is just over the interval [0, τ)).
Theorem 3 (Wang). Suppose that F and G are completely arbitrary. Then
Open Question 1. Does Wang's Theorem 3 continue to hold with →_p replaced
by →_{a.s.}? (The hard case not covered by Theorem 2 is F(τ−) < 1, G(τ−) = 1.)
Recall that for an arbitrary hazard function A (of a df F on ℝ⁺), the (product
integral or) exponential map ℰ(−A) recovers 1 − F:

1 − F(t) = ℰ(−A)(t) ≡ ∏_{0<s≤t} (1 − dA)
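The product-integral identity can be verified exactly for a purely discrete df, where ∏_{0<s≤t}(1 − dA) is a finite product of factors 1 − ΔA(s) with ΔA(s) = ΔF(s)/(1 − F(s−)). The three-point df below is a hypothetical example; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction as Fr

# hypothetical discrete df: masses of F at times 1, 2, 3
mass = {1: Fr(1, 4), 2: Fr(1, 4), 3: Fr(1, 2)}

F_minus, surv = Fr(0), Fr(1)
for s in sorted(mass):
    dA = mass[s] / (1 - F_minus)     # hazard increment dA(s) = dF(s)/(1 - F(s-))
    surv *= (1 - dA)                 # product integral: prod (1 - dA)
    F_minus += mass[s]
    assert surv == 1 - F_minus       # the map recovers 1 - F exactly

print(surv)  # prints 0, since F reaches 1 at t = 3
```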
then
(8) sup_{t∈A} |h_n(t) − h_0(t)| → 0 as n → ∞.
Proof of Theorem 1. Now ||H_n − H|| →_{a.s.} 0 by Glivenko–Cantelli, so that
||H_n^1 − H^1|| →_{a.s.} 0 also. Thus for any fixed t ≤ θ we have a.s. that

|Â_n(t) − A(t)| ≤ ∫_0^t |(1 − H_{n−})^{−1} − (1 − H_−)^{−1}| dH_n^1
+ |∫_0^t (1 − H_−)^{−1} d(H_n^1 − H^1)|
(a) →_{a.s.} 0 + 0 = 0

by the Glivenko–Cantelli theorem and H(t−) ≤ H(θ) < 1 for the first term, and
by the SLLN for the second term. Since Â_n and A are ↗, the standard argument of
(3.1.83) improves (a) to (2).
But then (1) follows from (2) and continuity of the product integral map ℰ
given in Lemma 2.1. ❑
Proof of Theorem 2. First suppose H(τ−) < 1. Then as in (a) of the proof of
Theorem 1,

|Â_n(t) − A(t)| ≤ ∫_0^t |(1 − H_{n−})^{−1} − (1 − H_−)^{−1}| dH_n^1
+ |∫_0^t (1 − H_−)^{−1} d(H_n^1 − H^1)|,

where the first term converges to zero uniformly on [0, τ] by the Glivenko–Cantelli
theorem since 1 − H(τ−) > 0 and H_n(τ−) < 1. Now the second term: for 0 ≤ t <
τ,

|∫_0^t (1 − H_−)^{−1} d(H_n^1 − H^1)|
≤ 2 ||H_n^1 − H^1||₀^τ / (1 − H(τ−)) + |ΔH_n^1(τ) − ΔH^1(τ)| / (1 − H(τ−))
→_{a.s.} 0 + 0 = 0,

so the second term converges to zero a.s. uniformly in t ∈ [0, τ]. Hence
If ΔA(τ) < 1, then (3) follows from Lemma 1. If ΔA(τ) = 1 (so F(τ) = 1), then
Lemma 1 and (a) imply that

and

F̂_n(t) < 1

and

1 − ε < F(τ) ≤ 1.

Hence

by (1). Since ε is arbitrary, (1) and (b) imply (3) in this case (F(τ−) = 1), too.
Since Z_{n:n} ≤ τ a.s., (4) follows from (3). ❑
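A simulation consistent with Theorems 1-2: censored exponential data, the product-limit estimator computed from the ordered sample, and the sup distance to the true F on [0, θ]. The distributions F = Exp(1), G = Exp(1/2), θ = 2, and the tolerance are all hypothetical choices.

```python
import math
import random

random.seed(6)
n = 20000
data = []
for _ in range(n):
    x = random.expovariate(1.0)        # true lifetime, F = Exp(1)
    c = random.expovariate(0.5)        # censoring time, G = Exp(1/2)
    data.append((min(x, c), x <= c))   # (Z_i, delta_i)
data.sort()

# product-limit estimator: 1 - F_n(t) = prod_{Z_(i) <= t, uncensored} (1 - 1/(n - i))
surv, max_err = 1.0, 0.0
for i, (z, delta) in enumerate(data):
    if delta:
        surv *= 1.0 - 1.0 / (n - i)
    if z <= 2.0:                       # sup distance over [0, theta], theta = 2 < tau
        err = abs((1.0 - surv) - (1.0 - math.exp(-z)))
        max_err = max(max_err, err)

assert max_err < 0.05                  # uniform closeness on [0, theta]
```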
Proof of Theorem 3. We first suppose θ < τ with F(θ−) < 1 and show that

and

To see this, let ε > 0, and choose θ₀ < θ so that A(θ−) − A(θ₀) < ε, and hence

𝔻_n(θ₀, θ] ≡ ∫_{(θ₀,θ]} (1 − ΔA(s))/(1 − H_n(s−)) dA(s)
→_{a.s.} ∫_{(θ₀,θ]} (1 − ΔA(s))/(1 − H(s−)) dA(s) < ∞.

Therefore,

and

and

so that
Now g_n(t) is the sum of at most a finite number of terms. Thus by (7), for every
ε > 0 with

it follows that

(f) sup_{t∈A} | Σ_{s≤t} Δg_n(s) 1_{[|Δg_n(s)|>ε]} − Σ_{s≤t} Δg_0(s) 1_{[|Δg_0(s)|>ε]} | → 0

as n → ∞ and
|g_n(t) − g(t)|
≤ | g_n(t) + Σ_{s≤t} Δg_n(s) 1_{[|Δg_n(s)|≤ε]} − g_0(t) − Σ_{s≤t} Δg_0(s) 1_{[|Δg_0(s)|≤ε]} |
+ |g_n(t) − g_0(t)|
+ | Σ_{s≤t} {log(1 − Δg_0(s)) + Δg_0(s)} 1_{[|Δg_0(s)|≤ε]} |
+ |g_n(t) − g_0(t)|.
(h) lim sup_{n→∞} sup_{t∈A} |log h_n(t) − log h_0(t)| ≤ 2ε g_0(τ).
(1 − K)/(1 − F) = 1 + ∫_0^· C dF
fo [l < j — t]J(t)dg(t)j
≤ n^{1/6} sup_{|t|≤K} | g(1 − a + λt/n^{1/3}) − g(1 − a) | · sup_{|t|≤K} | h_n(1 − a + λt/n^{1/3}) |
≤ n^{1/4} sup_{|t|≤K} | g(1 − a + λt/n^{1/3}) − g(1 − a) |
· n^{−1/12} sup_{|t|≤K} | B_n(1 − a + λt/n^{1/3}) |
= o(1) O(1) a.s.
= o(1) a.s.
Exercise 1. Show that for any 0 < K < ∞ and 0 < λ < ∞ and 0 < a < 1 we
have

n^{−1/12} sup_{|t|≤K} | g_n(a + tλ/n^{1/3}) | = O(1) a.s.
Knight's alternative correction for this section involves the following alternative
exercise.
Exercise 1'. Show that for any 0 < K < ∞ and 0 < λ < ∞ and 0 < a < 1 we
have
699 fo fo
731 (22) ∫ f ∘ F⁻¹ → ∫ f ∘ F⁻¹ Ψ_n(1, c)
REFERENCES
BASS, R. F. AND KHOSHNEVISAN, D. (1995). "Laws of the iterated logarithm for local times of
the empirical process," Ann. Probab., 23, 388-399.
BRETAGNOLLE, J. AND MASSART, P. (1989). "Hungarian constructions from the nonasymptotic
viewpoint," Ann. Probab., 17, 239-256.
CHERNOFF, H. AND SAVAGE, I. R. (1958). "Asymptotic normality and efficiency of certain non-
parametric test statistics," Ann. Math. Statist., 29, 972-994.
CSÖRGÖ, M., CSÖRGÖ, S. AND HORVÁTH, L. (1986). An Asymptotic Theory for Empirical Re-
liability and Concentration Processes, vol. 33 of Lecture Notes in Statistics, Springer-Verlag,
Berlin.
DEHEUVELS, P. (1991). "Functional Erdős–Rényi laws," Studia Sci. Math. Hungar., 26, 261-295.
DEHEUVELS, P. (1997). "Strong laws for local quantile processes," Ann. Probab., 25, 2007-2054.
DEHEUVELS, P. (1998). "On the approximation of quantile processes by Kiefer processes," J. The-
oret. Probab., 11, 997-1018.
EINMAHL, J. H. J. AND MASON, D. M. (1988). "Strong limit theorems for weighted quantile
processes," Ann. Probab., 16, 1623-1643.
EINMAHL, J. H. J. AND RUYMGAART, F. H. (1987). "The almost sure behavior of the oscillation
modulus of the multivariate empirical process," Statist. Probab. Lett., 6, 87-96.
GÖTZE, F. (1985). "Asymptotic expansions in functional limit theorems," J. Multivariate Anal., 16,
1-20.
KHOSHNEVISAN, D. (1992). "Level crossings of the empirical process," Stochastic Process. Appl.,
43, 331-343.
LIFSHITS, M. A. AND SHI, Z. (2003). "Lower functions of an empirical process and of a Brownian
sheet," Teor. Veroyatnost. i Primenen., 48, 321-339.
MARTYNOV, G. V. (1978). Kriterii omega-kvadrat, Nauka, Moscow.
MASSART, P. (1990). "The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality," Ann.
Probab., 18, 1269-1283.
SMIRNOV, N. V. (1947). "Sur un critère de symétrie de la loi de distribution d'une variable aléatoire,"
C. R. (Doklady) Acad. Sci. URSS (N.S.), 56, 11-14.
WANG, J. G. (1987). "A note on the uniform consistency of the Kaplan–Meier estimator," Ann.
Statist., 15, 1313-1316.
References
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., and Tukey, J. W. (1972).
Robust Estimates of Location, Princeton University Press, Princeton, New Jersey.
Bahadur, R. R. (1966). "A note on quantiles in large samples," Ann. Math. Statist., 37, 577-580.
Bahadur, R. R. (1971). "Some limit theorems in statistics," Regional Conference Series on Applied
Mathematics, Vol. 4, SIAM, Philadelphia, Pennsylvania.
Bahadur, R. R. and Rao, R. R. (1960). "On deviations of the sample mean," Ann. Math. Statist.,
31, 1015-1027.
Bahadur, R. R. and Zabell, S. L. (1979). "Large deviations of the sample mean in general vector
spaces," Ann. Prob., 7, 587-621.
Barbour, A. D. and Hall, Peter (1984). "Stein's method and the Berry-Esseen theorem," Aust. J.
Statist., 26, 8-15.
Barlow, R. E. and Campo, R. (1975). "Total time on test processes and applications to failure
data analysis," Reliability and Fault Tree Analysis, pp. 451-481, SIAM, Philadelphia, Pennsyl-
vania.
Barlow, R. E. and Proschan, F. (1977). "Asymptotic theory of total time on test processes with
applications to life testing," in Multivariate Analysis-IV, P. R. Krishnaiah, ed., pp. 227-237,
North-Holland, New York.
Baum, L., Katz, M., and Stratton, H. (1971). "Strong laws for ruled sums," Ann. Math. Statist.,
42, 625-629.
Baxter, G. (1955). "An analogue of the law of the iterated logarithm," Proc. Am. Math. Soc., 6,
177-181.
Beekman, J. A. (1974). Two Stochastic Processes, Almqvist & Wilksells, Stockholm.
Beirlant, J., van der Meulen, E. C., and van Zuijlen, M. C. A. (1982). "On functions bounding
the empirical distribution of uniform spacings," Z. Wahrsch. verw. Geb., 61, 417-430.
Bennett, G. (1962). "Probability inequalities for the sum of independent random variables," J.
Am. Statist. Assoc., 57, 33-45.
Beran, R. (1977a). "Estimating a distribution function," Ann. Statist., 5, 400-404.
Beran, R. (1977b). "Robust location estimates," Ann. Statist., 5, 431-444.
Bergman, B. (1977). "Crossings in the total time on test plot," Scand. J. Statist., 4, 171-177.
Berk, R. H. and Jones, D. H. (1979). "Goodness-of-fit statistics that dominate the Kolmogorov
statistics," Z. Wahrsch. verw. Geb., 47, 47-59.
Berning, J. A. (1979). "On the multivariate law of the iterated logarithm," Ann. Prob., 7, 980-
988.
Bickel, P. J. (1967). "Some contributions to the theory of order statistics," Proceedings of the Fifth
Berkeley Symposium Mathematical Statistics and Probability, Vol. 1, pp. 575-591, University
of California Press, Berkeley, California.
Bickel, P. J. (1973). "On some analogues to linear combinations of order statistics in the linear
model," Ann. Statist., 1, 597-616.
Bickel, P. J. and Freedman, D. A. (1981). "Some asymptotic theory for the bootstrap," Ann.
Statist., 9, 1196-1217.
Bickel, P. J. and van Zwet, W. R. (1980). "On a theorem of Hoeffding," in Asymptotic Theory of
Statistical Tests and Estimation, I. M. Chakravarti, ed., pp. 307-324, Academic Press, New
York.
Bickel, P. J. and Wichura, M. J. (1971). "Convergence for multiparameter stochastic processes
and some applications," Ann. Math. Statist., 42, 1656-1670.
Billingsley, P. (1968). Convergence of Probability Measures, Wiley, New York.
Billingsley, P. (1971). "Weak convergence of measures: Applications in probability," Regional
Conference Series on Applied Mathematics, Vol. 5, SIAM, Philadelphia, Pennsylvania.
Billingsley, P. (1979). Probability and Measure, Wiley, New York.
Birnbaum, Z. and Marshall, A. (1961). "Some multivariate Chebyshev inequalities with extensions
to continuous parameter processes," Ann. Math. Statist., 32, 687-703.
Birnbaum, Z. W. (1952). "Numerical tabulation of the distribution of Kolmogorov's statistic for
finite sample sizes," J. Am. Statist. Assoc., 47, 425-441.
Birnbaum, Z. W. (1953). "On the power of a one-sided test of fit for continuous probability
functions," Ann. Math. Statist., 24, 484-489.
Birnbaum, Z. W. and McCarty, R. C. (1958). "A distribution-free upper confidence bound for
P( Y< X) based on independent samples of X and Y," Ann. Math. Statist., 29, 558-562.
Birnbaum, Z. W. and Pyke, R. (1958). "On some distributions related to the statistic D_n⁺," Ann.
Math. Statist., 29, 179-187.
Birnbaum, Z. W. and Tingey, F. H. (1951). "One sided confidence contours for probability
distribution functions," Ann. Math. Statist., 22, 592-596.
Blackman, J. (1955). "On the approximation of a distribution function by an empiric distribution,"
Ann. Math. Statist., 26, 256-267.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables, Wiley, New York.
Blum, J. R. (1955). "On the convergence of empiric distributions," Ann. Math. Statist., 26, 527-529.
Boel, R., Varaiya, P., and Wong, E. (1975a). "Martingales on jump processes, I: Representation
results," SIAM J. Control, 13, 999-1022.
Boel, R., Varaiya, P., and Wong, E. (1975b). "Martingales on jump processes, II: Applications,"
SIAM J. Control, 13, 1022-1061.
Bohman, H. (1963). "Two inequalities for Poisson distributions," Skand. Aktuarietidskr., 46, 47-52.
Bolthausen, E. (1977a). "Convergence in distribution of minimum distance estimators," Metrika,
24, 215-227.
Bolthausen, E. (1977b). "A nonuniform Glivenko-Cantelli theorem," unpublished technical
report.
Book, S. A. and Shore, T. R. (1978). "On large intervals in the Csörgö–Revesz theorem on
increments of a Wiener process," Z. Wahrsch. verw. Geb., 46, 1-11.
Boos, D. (1979). "A differential for L-statistics," Ann. Statist., 7, 955-959.
Boos, D. (1982). "A test for asymmetry associated with the Hodges-Lehmann estimator," J. Am.
Statist. Assoc., 77, 647-651.
Borovskih, Ju. (1980). "An estimate of the rate of convergence of the Anderson-Darling criterion,"
Theor. Prob. and Math. Statist., 19, 29-33.
Braun, H. I. (1976). "Weak convergence of sequential linear rank statistics," Ann. Statist., 4,
554-575.
Breiman, L. (1967). "On the tail behavior of sums of independent random variables," Z. Wahrsch.
verw. Geb., 9, 20-25.
Breiman, L. (1968). Probability, Addison-Wesley, Reading, Massachusetts.
Bremaud, J. P. (1981). Point Processes and Queues: Martingale Dynamics, Springer-Verlag, New
York.
Bremaud, J. P. and Jacod, J. (1977). "Processus ponctuel et martingales," Adv. Appl. Prob., 9,
362-416.
Breslow, N. and Crowley, J. (1974). "A large sample study of the life table and product limit
estimates under random censorship," Ann. Statist., 2, 437-453.
Bretagnolle, J. (1980). "Statistique de Kolmogorov-Smirnov pour un échantillon non équiréparti,"
Colloq. internat. CNRS, 307, 39-44.
Breth, M. (1976). "On a recurrence of Steck," J. Appl. Prob., 13, 823-825.
Brillinger, D. R. (1969). "The asymptotic representation of the sample distribution function,"
Bull. Am. Math. Soc., 75, 545-547.
Bronstein, E. M. (1976). "Epsilon-entropy of convex sets and functions," Siberian Math. J., 17,
393-398. (Sibirskii Mat. Zh., 17, 508-514, in Russian.)
Brown, B. (1971). "Martingale central limit theorems," Ann. Math. Statist., 42, 59-66.
Burke, M. D., Csörgö, S., and Horvath, L. (1981). "Strong approximations of some biometric
estimates under random censorship," Z. Wahrsch. verw. Geb., 56, 87-112.
Burke, M. D., Csörgö, M., Csörgö, S., and Revesz, P. (1978). "Approximations of the empirical
process when the parameters are estimated," Ann. Prob., 7, 790-810.
Burkholder, D. L. (1973). "Distribution function inequalities for martingales," Ann. Prob., 1, 19-42.
Butler, C. (1969). "A test for symmetry using the sample distribution function," Ann. Math.
Statist., 40, 2209-2210.
Cantelli, F. P. (1933). "Sulla determinazione empirica delle leggi di probabilità," Giorn. Ist. Ital.
Attuari, 4, 421-424.
Cassels, J. W. S. (1951). "An extension of the law of the iterated logarithm," Proc. Cambridge
Phil. Soc., 47, 55-64.
Chan, A. H. C. (1977). "On the increments of multiparameter Gaussian processes," Ph.D. Thesis,
Carleton University.
Chandra, M. and Singpurwalla, N. D. (1978). "The Gini index, the Lorenz curve, and the total
time on test transforms," unpublished technical report, T-368, George Washington University,
School of Engineering and Applied Science, Institute for Management Science and Engineering.
Chang, Li-Chien (1955). "On the ratio of the empirical distribution to the theoretical distribution
function," Acta Math. Sinica, 5, 347-368. (English Translations in Selected Transl. Math.
Statist. Prob.), 4, 17-38 (1964).
Chapman, D. (1958). "A comparative study of several one-sided goodness of fit tests," Ann. Math.
Statist., 29, 655-674.
Cheng, P. (1962). "Nonnegative jump points of an empirical distribution function relative to a
theoretical distribution function," Acta Math. Sinica, 8, 333-347. [English translation in
Selected Transl. Math. Statist. Prob., 3, 205-224 (1962).]
Chernoff, H. (1952). "A measure of asymptotic efficiency for tests of a hypothesis based on the
sum of observations," Ann. Math. Statist., 23, 493-507.
Chernoff, H., Gastwirth, J., and Johns, M. V. (1967). "Asymptotic distribution of linear combina-
tions of order statistics with applications to estimation," Ann. Math. Statist., 38, 52-72.
Chibisov, D. M. (1964). "Some theorems on the limiting behavior of empirical distribution
functions," Selected Transl. Math. Statist. Prob., 6, 147-156.
Chibisov, D. M. (1965). "An investigation of the asymptotic power of tests of fit," Theor. Prob.
Appl., 10, 421-437.
Chibisov, D. M. (1969). "Transition to the limiting process for deriving asymptotically optimal
tests," Sankhya, A31, 241-258.
Chou, C. S. and Meyer, P. A. (1974). "La représentation des martingales relatives à un processus
ponctuel discret," C. R. Acad. Sci. Paris A, 278, 1561-1563.
Chow, Y. S. and Lai, T. L. (1978). "Paley type inequalities and convergence rates related to the
law of large numbers and extended renewal theory," Z. Wahrsch. verw. Geb. 45, 1-20.
Chow, Y. S. and Teicher, H. (1978). Probability Theory: Independence, Interchangeability, Martin-
gales, Springer-Verlag, Heidelberg.
Chung, K. L. (1948). "On the maximum partial sums of sequences of independent random
variables," Trans. Am. Math. Soc., 64, 205-233.
Chung, K. L. (1949). "An estimate concerning the Kolmogorov limit distributions," Trans. Am.
Math. Soc., 67, 36-50.
Chung, K. L. (1974). A Course in Probability Theory, 2nd ed., Academic Press, New York.
Chung, K. L., Erdös, P., and Sirao, T. (1959). "On the Lipschitz condition for Brownian motion,"
J. Math. Soc. Jpn, 11, 263-274.
Clements, G. F. (1963). "Entropies of several sets of real valued functions," Pacific J. Math., 13,
1085-1095.
Cotterill, D. S. and Csörgö, M. (1982). "On the limiting distribution of and critical values for the
multivariate Cramer-von Mises statistic," Ann. Statist., 10, 233-244.
Cramer, H. (1928). "On the composition of elementary errors, Second paper: Statistical applica-
tions," Skand. Aktuarietidskr., 11, 141-180.
Cramer, H. (1962). Random Variables and Probability Distributions, Cambridge University Press,
Cambridge.
Crow, E. and Siddiqui, M. (1967). "Robust estimation of location," J. Am. Statist. Assoc., 62,
353-389.
Csáki, E. (1968). "An iterated logarithm law for semimartingales and its application to empirical
distribution function," Studia Sci. Math. Hung., 3, 287-292.
Csáki, E. (1975). "Some notes on the law of the iterated logarithm for empirical distribution
function," in Colloq. Math. Soc. J. Bolyai, Limit Theorems of Probability Theory, P. Revesz,
ed., Vol. 11, pp. 45-58, North-Holland, Amsterdam.
Csáki, E. (1977). "The law of the iterated logarithm for normalized empirical distribution function,"
Z. Wahrsch. verw. Geb., 38, 147-167.
Csáki, E. (1978). "On the lower limits of maxima and minima of a Wiener process and partial
sums," Z. Wahrsch. verw. Geb., 43, 205-222.
Csáki, E. (1981). "Investigations concerning the empirical distribution function," Selected Transl.
in Math. Statist. and Prob., 15, 229-317.
Csáki, E. and Tusnády, G. (1972). "On the number of intersections and the ballot theorem,"
Periodica Math. Hung., 2, 5-13.
Csörgö, M. (1967). "A new proof of some results of Renyi and the asymptotic distribution of the
range of his Kolmogorov-Smirnov-type random variable," Can. J. Math., 19, 550-558.
Csörgö, M. (1983). "Quantile Processes with Statistical Applications," Regional Conference Series
on Applied Mathematics, Vol. 42, SIAM, Philadelphia, Pennsylvania.
Csörgö, M. and Revesz, P. (1974). "Some notes on the empirical distribution function and the
quantile process," in Colloq. Math. Soc. J. Bolyai, Limit Theorems of Probability Theory, P.
Revesz, ed., Vol. 11, pp. 59-71, North-Holland, Amsterdam.
Csörgö, M. and Revesz, P. (1975). "A new method to prove Strassen-type laws of invariance
principle, I and II," Z. Wahrsch. verw. Geb., 31, 255-260, 261-269.
Csörgö, M. and Revesz, P. (1978a). "Strong approximations of the quantile process," Ann. Statist.,
6, 882-894.
Csörgö, M. and Revesz, P. (1978b). "How big are the increments of a multiparameter Wiener
process?," Z. Wahrsch. verw. Geb., 42, 1-12.
Csörgö, M. and Revesz, P. (1981). Strong Approximations in Probability and Statistics, Academic
Press, New York.
Csörgö, M., Csörgö, S., Horvath, L., and Mason, D. M. (1983). "An asymptotic theory for empirical
reliability and concentration processes," unpublished manuscript.
Csörgö, S. (1975). "Asymptotic expansions of the Laplace transform of the von Mises ω² test,"
Theor. Prob. Appl., 20, 158-160.
Csörgö, S. (1976). "On an asymptotic expansion for the von Mises ω²-statistic," Acta Sci. Math.
Hung., 38, 45-67.
Csörgö, S. and Horvath, L. (1983). "The rate of strong uniform consistency for the product-limit
estimator," Z. Wahrsch. verw. Geb., 62, 411-426.
Csörgö, M., Csörgö, S., Horvath, L., and Mason, D. M. (1984a). "A new approximation of uniform
empirical and quantile processes and some of its applications to probability and statistics,"
Technical Report, Vol. 24, Laboratory for Research in Statistics and Probability, Carleton
University.
Csörgö, M., Csörgö, S., Horvath, L., and Mason, D. M. (1984b). "Applications of a new
approximation of uniform empirical and quantile processes to probability and statistics,"
Technical Report, Vol. 25, Laboratory for Research in Statistics and Probability, Carleton
University.
Daniels, H. E. (1945). "The statistical theory of the strength of bundles of thread," Proc. Roy.
Soc., London Ser. A 183, 405-435.
Darling, D. A. (1953). "On a class of problems relating to the random division of an interval,"
Ann. Math. Statist., 24, 239-253.
Darling, D. A. (1955). "The Cramer-Smirnov test in the parametric case," Ann. Math. Statist.,
26, 1-20.
Darling, D. A. (1957). "The Kolmogorov-Smirnov, Cramer-von Mises tests," Ann. Math. Statist.,
28, 823-838.
Darling, D. A. (1960). "On the theorems of Kolmogorov-Smirnov," Theor. Prob. Appl., 5,356-360.
Darling, D. A. and Erdös, P. (1956). "A limit theorem for the maximum of normalized sums of
independent random variables," Duke Math. J., 23, 143-155.
David, H. A. (1981). Order Statistics, 2nd ed., Wiley, New York.
Davis, M. H. A. (1974). "The representation of martingales of jump processes," research report,
pp. 74/78, Imperial College, London.
de Haan, L. (1975). "On regular variation and its application to the weak convergence of sample
extremes," Mathematical Centre Tract, Vol. 32, Mathematisch Centrum, Amsterdam.
DeHardt, J. (1971). "Generalizations of the Glivenko-Cantelli theorem," Ann. Math. Statist., 42,
2050-2055.
Dellacherie, C. (1972). Capacités et Processus Stochastiques, Springer-Verlag, New York.
Dellacherie, C. (1974). "Intégrales stochastiques par rapport aux processus de Wiener et de Poisson,"
Séminaire de Probabilités VIII, Vol. 381, Springer-Verlag, Berlin.
Dellacherie, C. (1980). "Un survol de la théorie de l'intégrale stochastique," Stoch. Proc. Appl.,
10, 115-144.
Dellacherie, C. and Meyer, P. A. (1978). Probabilities and Potential, I. Hermann/North-Holland,
Paris/Amsterdam.
Dellacherie, C. and Meyer, P. A. (1982). Probabilités et Potentiel, II, Hermann, Paris.
Dempster, A. P. (1959). "Generalized D_n⁺ statistics," Ann. Math. Statist., 30, 593-597.
Devroye, L. (1981). "Laws of the iterated logarithm for order statistics of uniform spacings,"
Ann. Prob., 9, 860-867.
Devroye, L. (1982). "Upper and lower class sequences for minimal uniform spacings," Z. Wahrsch.
verw. Geb., 61, 237-254.
Diaconis, P. and Freedman, D. (1984). "Asymptotics of graphical projection pursuit," Ann. Statist.,
12, 793-815.
Dobrushin, R. L. (1970). "Describing a system of random variables by conditional distributions,"
Theor. Prob. Appl., 15, 458-486.
Doksum, K. (1974). "Empirical probability plots and statistical inference for nonlinear models
in the two-sample case," Ann. Statist., 2, 267-277.
Doksum, Kjell A. and Sievers, Gerald L. (1976). "Plotting with confidence: Graphical comparisons
of two populations," Biometrika, 63, 421-434.
REFERENCES 925
Doléans-Dade, C. (1970). "Quelques applications de la formule de changement de variables pour
les semimartingales," Z. Wahrsch. verw. Geb., 16, 181-194.
Doléans-Dade, C. and Meyer, P. A. (1970). "Intégrales stochastiques par rapport aux martingales
locales," Séminaire de Probabilités IV, Université de Strasbourg, Vol. 124, pp. 77-107,
Springer-Verlag, Berlin.
Donsker, M. D. (1951). "An invariance principle for certain probability limit theorems," Mem.
Am. Math. Soc., 6, 1-12.
Donsker, M. D. (1952). "Justification and extension of Doob's heuristic approach to the
Kolmogorov-Smirnov theorems," Ann. Math. Statist., 23, 277-281.
Donsker, M. D. and Varadhan, S. R. S. (1977). "On laws of the iterated logarithm for local
times," Commun. Pure Appl. Math., 30, 707-753.
Doob, J. L. (1949). "Heuristic approach to the Kolmogorov-Smirnov theorems," Ann. Math.
Statist., 20, 393-403.
Doob, J. L. (1953). Stochastic Processes, Wiley, New York.
Dudley, R. M. (1966). "Weak convergence of probabilities on nonseparable metric spaces and
empirical measures on Euclidean spaces," Ill. J. Math., 10, 109-126.
Dudley, R. M. (1968). "Distances of probability measures and random variables," Ann. Math.
Statist., 39, 1563-1572.
Dudley, R. M. (1973). "Sample functions of the Gaussian process," Ann. Prob., 1, 66-103.
Dudley, R. M. (1976). "Probabilities and metrics: Convergence of laws on metric spaces, with a
view to statistical testing," Lecture Notes Series, Vol. 45, Matematisk Institut, Aarhus Universitet.
Dudley, R. M. (1978). "Central limit theorems for empirical measures," Ann. Prob., 6, 899-929.
Dudley, R. M. (1979). "Balls in R^k do not cut all subsets of k+2 points," Adv. Math., 31, 306-308.
Dudley, R. M. (1981). "Some recent results on empirical processes," Probability in Banach Spaces
III, Lecture Notes in Mathematics, Vol. 860, Springer-Verlag, New York.
Dudley, R. M. (1983). "A course on empirical processes," preprint.
Dudley, R. M. and Philipp, W. (1983). "Invariance principles for sums of Banach space valued
random elements and empirical processes," Z. Wahrsch. verw. Geb., 62, 509-552.
Durbin, J. (1971). "Boundary-crossing probabilities for the Brownian motion and Poisson processes
and techniques for computing the power of the Kolmogorov-Smirnov test," J. Appl. Prob.,
8, 431-453.
Durbin, J. (1973a). "Distribution theory for tests based on the sample distribution function,"
Regional Conference Series in Applied Mathematics, Vol. 9, SIAM, Philadelphia, Pennsylvania.
Durbin, J. (1973b). "Weak convergence of the sample distribution function when parameters are
estimated," Ann. Statist., 1, 279-290.
Durbin, J. and Knott, M. (1972). "Components of Cramer-von Mises statistics, I," J. Roy. Statist.
Soc. B, 34, 290-307.
Durbin, J., Knott, M., and Taylor, C. (1975). "Components of Cramer-von Mises statistics, II,"
J. Roy. Statist. Soc. B, 37, 216-237.
Durbin, J., Knott, M., and Taylor, C. (1977). "Corrigenda to: Components of Cramer-von Mises
Statistics, II," J. Roy. Statist. Soc. B, 39, 394.
Duttweiler, D. L. (1973). "The mean-square approximation error of Bahadur's order statistic
approximation," Ann. Statist., 1, 446-453.
Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956). "Asymptotic minimax character of the sample
distribution functions and of the classical multinomial estimator," Ann. Math. Statist., 27,
642-669.
Dwass, M. (1959). "The distribution of a generalized D_n^+ statistic," Ann. Math. Statist., 30,
1024-1028.
Dwass, M. (1974). "Poisson processes and distribution free statistics," Adv. Appl. Prob., 6, 359-
375.
Efron, B. (1967). "The two-sample problem with censored data," Proceedings of the Fifth Berkeley
Symposium on Mathematical Statistics and Probability, Vol. 4, pp. 831-853, University of
California Press, Berkeley, California.
Efron, B. (1979). "Bootstrap methods: another look at the jackknife," Ann. Statist., 7, 1-26.
Eicker, F. (1979). "The asymptotic distribution of the suprema of the standardized empirical
process," Ann. Statist., 7, 116-138.
Elliot, R. J. (1976). "Stochastic integrals for martingales of a jump process with partially accessible
jump times," Z. Wahrsch. verw. Geb., 36, 213-226.
Epanechnikov, V. A. (1968). "The significance level and power of the two-sided Kolmogorov test
in the case of small samples," Theor. Prob. Appl., 13, 686-690.
Erdös, P. (1942). "On the law of the iterated logarithm," Ann. Math., 43, 419-436.
Fears, T. R. and Mehra, K. L. (1974). "Weak convergence of a two-sample empirical process and
a Chernoff-Savage theorem for φ-mixing sequences," Ann. Statist., 2, 586-596.
Feller, W. (1943). "On the general form of the so-called law of the iterated logarithm," Trans.
Am. Math. Soc., 54, 373-402.
Feller, W. (1946). "A limit theorem for random variables with infinite moments," Am. J. Math.,
68, 257-262.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Wiley, New York.
Ferguson, T. S. (1967). Mathematical Statistics, Academic Press, New York.
Fernandez, P. J. (1970). "A weak convergence theorem for random sums of independent random
variables," Ann. Math. Statist., 41, 710-712.
Finkelstein, H. (1971). "The law of the iterated logarithm for empirical distributions," Ann. Math.
Statist., 42, 607-615.
Földes, A. and Rejto, L. (1981a). "Strong uniform consistency for nonparametric survival curve
estimators from randomly censored data," Ann. Statist., 9, 122-129.
Földes, A. and Rejto, L. (1981b). "A LIL type result for the product limit estimator on the whole
line," Z. Wahrsch. verw. Geb., 56, 75-86.
Frankel, J. (1976). "A note on downcrossings for extremal processes," Ann. Prob., 4, 151-152.
Freedman, David (1971). Brownian Motion and Diffusion, Holden-Day, San Francisco.
Freedman, David A. (1975). "On tail probabilities for martingales," Ann. Prob., 3, 100-118.
Gabriel, J. (1977). "Martingales with a countable filtering index set," Ann. Prob., 5, 888-898.
Gänßler, P. (1983). Empirical Processes, IMS Lecture Notes-Monograph Series, Vol. 3, Institute of
Mathematical Statistics, Hayward, California.
Gänßler, P. and Stute, W. (1979). "Empirical processes: a survey of results for independent and
identically distributed random variables," Ann. Prob., 7, 193-243.
Galambos, J. (1978). The Asymptotic Theory of Extreme Order Statistics, Wiley, New York.
Ghosh, M. (1972). "On the representation of linear functions of order statistics," Sankhya, A34,
349-356.
Gill, R. D. (1980). "Censoring and stochastic integrals," Mathematical Centre Tract, Vol. 124,
Mathematisch Centrum, Amsterdam.
Gill, R. D. (1983). "Large sample behavior of the product-limit estimator on the whole line,"
Ann. Statist., 11, 49-58.
Gill, R. D. (1984). "Understanding Cox's regression model: a martingale approach," J. Am. Statist.
Assoc., 79, 441-447.
Gine, E. and Zinn, J. (1984). "Some limit theorems for empirical processes," Ann. Prob., 12,
929-989.
Gleser, L. (1975). "On the distribution of the number of successes in independent trials," Ann.
Prob., 3, 182-188.
Glivenko, V. (1933). "Sulla determinazione empirica della legge di probabilita," Giorn. Ist. Ital.
Attuari, 4, 92-99.
Gnedenko, B. V., Koroluk, V. S., and Skorokhod, A. V. (1961). "Asymptotic expansions in
probability theory," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics
and Probability, Vol. 2, pp. 153-170, University of California Press, Berkeley, California.
Gnedenko, B. V. and Korolyuk, V. S. (1961). "On the maximum discrepancy between two empirical
distributions," Selected Transl. Math. Statist. Prob., 1, 13-16.
Gnedenko, B. V. and Mihalevic, V. (1952). "Two theorems on the behavior of empirical distribution
functions," Selected Transl. Math. Statist. Prob., 1, 55-57.
Götze, F. (1979). "Asymptotic expansions for bivariate von Mises functionals," Z. Wahrsch. verw.
Geb., 50, 333-355.
Goldie, C. M. (1977). "Convergence theorems for empirical Lorenz curves and their inverses,"
Adv. Appl. Prob., 9, 765-791.
Goodman, V., Kuelbs, J., and Zinn, J. (1981). "Some results on the LIL in Banach space with
applications to weighted empirical processes," Ann. Prob., 9, 713-752.
Govindarajulu, Z. (1980). "Asymptotic normality of linear combinations of functions of order
statistics in one and several samples," Colloq. Math. Soc. J. Bolyai, Nonparametric Statistical
Inference, Vol. 32, B. V. Gnedenko, M. L. Puri, and I. Vincze, eds., North-Holland, Amsterdam.
Govindarajulu, Z. and Mason, D. (1980). "A strong representation for linear combinations of
order statistics with application to fixed width confidence intervals for location and scale
parameters," Technical Report No. 154, Department of Statistics, University of Kentucky.
Grigelionis, B. (1975). "Random point processes and martingales," Litovsk Matem. Sbornik, XV,
4. (In Russian).
Groeneboom, P. and Shorack, G. R. (1981). "Large deviations of goodness of fit statistics and
linear combinations of order statistics," Ann. Prob., 9, 971-987.
Groeneboom, P., Oosterhoff, J., and Ruymgaart, F. H. (1979). "Large deviation theorems for
empirical probability measures," Ann. Prob., 7, 553-586.
Gut, A. (1975). "On a.s. and r-mean convergence of random processes with an application to
passage times," Z. Wahrsch. verw. Geb., 31, 333-342.
Gut, A. A. (1978). "Moments of the maximum of normal partial sums of random variables with
multidimensional indices," Z. Wahrsch. verw. Geb., 46, 205-220.
Hájek, J. (1960). "On a simple linear model in Gaussian processes," Trans. 2nd Prague Conf.
Inf. Theory, Stat. Dec. Functions, Random Processes, pp. 185-197.
Hájek, J. (1970). "Discussion of Pyke's paper," in Nonparametric Techniques in Statistical
Inference, M. L. Puri, ed., pp. 38-40, Cambridge University Press, Cambridge.
Hájek, J. (1972). "Local asymptotic minimax and admissibility in estimation," Proceedings of the
Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 175-194,
University of California Press, Berkeley, California.
Hájek, Jaroslav and Šidák, Zbyněk (1967). Theory of Rank Tests, Academic Press, New York.
Hall, W. J. and Loynes, R. M. (1977). "On the concept of contiguity," Ann. Prob. 5, 278-282.
Hall, W. J. and Wellner, J. A. (1979). "Estimation of mean residual life," unpublished technical
report, University of Rochester.
Hall, W. J. and Wellner, J. A. (1980). "Confidence bands for a survival curve from censored
data," Biometrika, 67, 133-143.
Hall, W. J. and Wellner, J. A. (1981). "Mean residual life," in Statistics and Related Topics, M.
Csörgö et al., eds., pp. 169-184, North-Holland, Amsterdam.
Harter, H. L. (1980). "Modified asymptotic formulas for critical values of the Kolmogorov test
statistic," Am. Statist., 34, 110-111.
Hartman, P. and Wintner, A. (1941). "On the law of the iterated logarithm," Am. J. Math., 63,
169-176.
Hewitt, E. and Stromberg, K. (1969). Real and Abstract Analysis, Springer-Verlag, New York.
Hill, D. and Rao, P. (1977). "Tests of symmetry based on Cramer-von Mises Statistics," Biometrika,
64, 489-494.
Hoadley, A. B. (1965). "The theory of large deviations with statistical applications," unpublished
dissertation, University of California, Berkeley, California.
Hoadley, A. B. (1967). "On the probability of large deviations of functions of several empirical
cdfs," Ann. Math. Statist., 38, 360-381.
Hodges, J. L. and Lehmann, E. L. (1963). "Estimates of location based on rank tests," Ann. Math.
Statist., 34, 598-611.
Hoeffding, W. (1956). "On the distribution of the number of successes in independent trials,"
Ann. Math. Statist., 27, 713-721.
Hoeffding, W. (1963). "Probability inequalities for sums of bounded random variables," J. Am.
Statist. Assoc., 58, 13-30.
Holst, Lars and Rao, J. S. (1981). "Asymptotic spacings theory with applications to the two-sample
problem," Can. J. Statist., 9, 79-89.
Hu, Inchi (1985). "A uniform bound for the tail probability of Kolmogorov-Smirnov statistics,"
Ann. Statist., 13.
Iman, R. (1982). "Graphs for use with the Lilliefors test for normal and exponential distributions,"
Am. Statist., 36, 109-112.
Ito, K. and McKean, H. P. (1974). Diffusion Processes and Their Sample Paths, Springer-Verlag,
Berlin. Second Printing, Corrected.
Jackson, O. (1967). "An analysis of departures from the exponential distribution," J. Roy. Statist.
Soc. B 29, 540-549.
Jacobsen, Martin (1982). "Statistical analysis of counting processes," Lecture Notes in Statistics,
Vol. 12, Springer-Verlag, New York.
Jacod, J. (1975). "Multivariate point processes: Predictable projection, Radon-Nikodym deriva-
tives, representation of martingales," Z. Wahrsch. verw. Geb., 31, 235-253.
Jacod, J. (1979). "Calcul stochastique et problemes de martingales," Lecture Notes in Mathematics,
Vol. 714, Springer-Verlag, Berlin.
Jaeschke, D. (1979). "The asymptotic distribution of the supremum of the standardized empirical
distribution function on subintervals," Ann. Statist., 7, 108-115.
Jain, N. and Pruitt, W. (1975). "The other law of the iterated logarithm," Ann. Prob., 3, 1046-1049.
Jain, N. C. and Marcus, M. B. (1975). "Central limit theorems for C(S)-valued random variables,"
J. Funct. Anal., 19, 216-231.
Jain, N. C., Jogedeo, K., and Stout, W. F. (1975). "Upper and lower functions for martingales
and mixing processes," Ann. Prob., 3, 119-145.
James, B. R. (1971). "A functional law of the iterated logarithm for weighted empirical distribu-
tions," Ph.D. dissertation, Department of Statistics, University of California at Berkeley,
Berkeley, California.
James, B. R. (1975). "A functional law of the iterated logarithm for weighted empirical distribu-
tions," Ann. Prob., 3, 762-772.
Johnson, B. McK. and Killeen, T. (1983). "An explicit formula for the cdf of the L_1 norm of the
Brownian bridge," Ann. Prob., 11, 807-808.
Johns, M. V. (1974). "Nonparametric estimation of location," J. Am. Statist. Assoc., 69, 453-460.
Jurečková, J. (1969). "Asymptotic linearity of a rank statistic in regression parameter," Ann. Math.
Statist., 40, 1889-1900.
Jurečková, J. (1971). "Nonparametric estimate of regression coefficients," Ann. Math. Statist., 42,
1328-1338.
Kabanov, Yu. M. (1973). "Representation of the functionals of Wiener processes and Poisson
processes as stochastic integrals," Teoria Verojatn Primenen, XVIII, 376-380.
Kac, M. (1949). "On deviations between theoretical and empirical distributions," Proc. Natl.
Acad. Sci. USA, 35, 252-257.
Kac, M. and Siegert, A. J. F. (1947). "An explicit representation of a stationary Gaussian process,"
Ann. Math. Statist., 18, 438-442.
Kac, M., Kiefer, J., and Wolfowitz, J. (1955). "On tests of normality and other tests of goodness
of fit based on distance methods," Ann. Math. Statist., 26, 189-211.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data, Wiley,
New York.
Kale, B. K. (1969). "Unified derivation of tests of goodness of fit based on spacings," Sankhya,
A31, 43-48.
Kallianpur, Gopinath (1980). Stochastic Filtering Theory, Springer-Verlag, New York.
Kanwal, R. (1971). Linear Integral Equations: Theory and Technique, Academic Press, New York.
Kaplan, E. L. and Meier, P. (1958). "Nonparametric estimation from incomplete observations,"
J. Am. Statist. Assoc., 53, 457-481.
Karlin, Samuel and Taylor, Howard M. (1975). A First Course in Stochastic Processes, 2nd ed.,
Academic Press, New York.
Kaufman, R. and Philipp, W. (1978). "A uniform law of the iterated logarithm for classes of
functions," Ann. Prob., 6, 930-952.
Khmaladze, E. V. (1981). "Martingale approach in the theory of goodness-of-fit tests," Theor.
Prob. Appl., 26, 240-257.
Kiefer, J. (1961). "On large deviations of the empiric d.f. of vector chance variables and a law
of the iterated logarithm," Pacific J. Math., 11, 649-660.
Kiefer, J. (1967). "On Bahadur's representation of sample quantiles," Ann. Math. Statist., 38,
1323-1342.
Kiefer, J. (1969). "On the deviations in the Skorokhod-Strassen approximation scheme," Z.
Wahrsch. verw. Geb., 13, 321-332.
Kiefer, J. (1970). "Deviations between the sample quantile process and the sample df," in
Nonparametric Techniques in Statistical Inference, M. L. Puri, ed., Cambridge University Press,
Cambridge.
Kiefer, J. (1972). "Iterated logarithm analogues for sample quantiles when p_n → 0," Proceedings
of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 227-244,
University of California Press, Berkeley, California.
Kiefer, J. and Wolfowitz, J. (1956). "Consistency of the maximum likelihood estimator in the
presence of infinitely many nuisance parameters," Ann. Math. Statist., 27, 887-906.
Kiefer, J. and Wolfowitz, J. (1976). "Asymptotically minimax estimation of concave and convex
distribution functions," Z. Wahrsch. verw. Geb., 34, 89-95.
Klass, M. (1976). "Toward a universal law of the iterated logarithm, Part I," Z. Wahrsch. verw.
Geb., 36, 165-178.
Kolmogorov, A. N. (1933). "Sulla determinazione empirica di una legge di distribuzione," Giorn.
Ist. Ital. Attuari, 4, 83-91.
Kolmogorov, A. N. and Tihomirov, V. M. (1959). "The epsilon-entropy and epsilon-capacity of
sets in functional spaces," Usp. Mat. Nauk, 14, no. 2 (86), 3-86. Am. Math. Soc. Transl. 17
277-364 (1961).
Komlós, J., Major, P., and Tusnády, G. (1975). "An approximation of partial sums of independent
rv's and the sample distribution function, I," Z. Wahrsch. verw. Geb., 32, 111-131.
Komlós, J., Major, P., and Tusnády, G. (1976). "An approximation of partial sums of independent
rv's and the sample df, II," Z. Wahrsch. verw. Geb., 34, 33-58.
Kostka, D. (1973). "On Khintchine's estimate for large deviations," Ann. Prob., 1, 509-512.
Kostka, D. (1974). "Lower class sequences for the Skorokhod-Strassen approximation scheme,"
Ann. Prob., 2, 1172-1178.
Kotel'nikova, V. F. and Chmaladze, E. V. (1983). "On computing the probability of an empirical
process not crossing a curvilinear boundary," Theor. Prob. Appl., 27, 640-648.
Koul, H. (1970). "Some convergence theorems for ranks and weighted empirical cumulatives,"
Ann. Math. Statist., 41, 1768-1773.
Koul, H. (1977). "Behaviour of robust estimators in the regression model with dependent errors,"
Ann. Statist., 5, 681-699.
Koul, H. and Staudte, R. (1972). "Weak convergence of weighted empirical cumulatives based
on ranks," Ann. Math. Statist., 43, 832-841.
Koul, H. and DeWet, T. (1983). "Minimum distance estimation in a linear regression model,"
Ann. Statist., 11, 921-932.
Koziol, J. (1980a). "On a Cramer-von Mises type statistic for testing symmetry," J. Am. Statist.
Assoc., 75, 161-167.
Koziol, J. A. (1980b). "A note on limiting distributions for spacings statistics," Z. Wahrsch. verw.
Geb., 51, 55-62.
Kuelbs, J. and Dudley, R. M. (1980). "Log log laws for empirical measures," Ann. Prob., 8, 405-418.
Kuelbs, J. and Philipp, W. (1980). "Almost sure invariance principles for partial sums of mixing
B-valued random variables," Ann. Prob., 8, 1003-1036,
Kuiper, N. H. (1960). "Tests concerning random points on a circle," Proc. Kon. Akad. Wetensch.,
A63, 38-47.
Kunita, H. and Watanabe, S. (1967). "On square-integrable martingales," Nagoya Math. J., 30,
209-245.
Lai, T. L. (1974). "Convergence rates in the strong law of large numbers for random variables
taking values in Banach spaces," Bull. Inst. Math. Acad. Sinica, 2, 67-85.
Lauwerier, H. A. (1963). "The asymptotic expansion of the statistical distribution of N. V.
Smirnov," Z. Wahrsch. verw. Geb., 2, 61-68.
Le Cam, L. (1958). "Un théorème sur la division d'un intervalle par des points pris au hasard,"
Pub. Inst. Statist. Univ. Paris, 7, 7-16.
Le Cam, L. (1969). Théorie Asymptotique de la Décision Statistique, Les Presses de l'Université de
Montreal.
Le Cam, L. (1972). "Limits of experiments," Proceedings of the Sixth Berkeley Symposium on
Mathematical Statistics and Probability, Vol. 1, pp. 245-261, University of California Press,
Berkeley, California.
Le Cam, L. (1979). "On a theorem of J. Hajek," in Contributions to Statistics; Jaroslav Hajek
Memorial Volume, J. Jureckova, ed., pp. 119-135, Reidel, Dordrecht.
Le Cam, L. (1982). "Limit theorems for empirical measures and poissonization," in Statistics and
Probability: Essays in Honor of C. R. Rao, P. R. Krishnaiah and J. K. Ghosh, eds., North-
Holland, Amsterdam.
Le Cam, L. (1983). "A remark on empirical measures," in A Festschrift for Erich L. Lehmann in
Honor of His 65th Birthday, P. Bickel et al., eds., Wadsworth, Belmont, California.
Lehmann, E. L. (1947). "On optimum tests of composite hypotheses with one constraint," Ann.
Math. Statist., 18, 473-494.
Lehmann, E. L. (1959). Testing Statistical Hypotheses, Wiley, New York.
Lenglart, E. (1977). "Relation de domination entre deux processus," Ann. Inst. Henri Poincare,
13, 171-179.
Lepingle, D. (1978). "Sur le comportement asymptotique des martingales locales," Séminaire de
Probabilités XII, Vol. 649, pp. 148-161.
Levit, B. Ya. (1978). "Infinite-dimensional informational lower bounds," Theor. Prob. Appl., 23,
388-394.
Lévy, P. (1937). Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris.
Lewis, P. A. W. (1965). "Some results on tests for Poisson processes," Biometrika, 52, 67-78.
Lilliefors, H. W. (1967). "On the Kolmogorov-Smirnov test for normality with mean and variance
unknown," J. Am. Statist. Assoc., 62, 399-402.
Liptser, R. S. and Shiryayev, A. N. (1978). Statistics of Random Processes II: Applications,
Springer-Verlag, New York.
Loeve, M. (1977). Probability Theory, 4th ed., Springer-Verlag, New York.
Lombard, F. (1984). "An elementary proof of asymptotic normality for linear rank statistics,"
unpublished technical report, University of South Africa.
Lorentz, G. G. (1966). "Metric entropy and approximation," Bull. Am. Math. Soc., 72, 903-
937.
Loynes, R. M. (1980). "The empirical distribution function of residuals from generalized
regression," Ann. Statist., 8, 285-298.
Major, P. (1976a). "Approximation of partial sums of independent rv's," Z. Wahrsch. verw. Geb.,
35, 213-220.
Major, P. (1976b). "Approximation of partial sums of iid rv's when the summands have only two
moments," Z. Wahrsch. verw. Geb., 35, 221-229.
Major, P. (1978). "On the invariance principle for sums of independent identically distributed
random variables," J. Mult. Anal., 8, 487-501.
Mallows, C. L. (1972). "A note on asymptotic joint normality," Ann. Math. Statist., 43, 508-
515.
Malmquist, S. (1950). "On a property of order statistics from a rectangular distribution," Skand.
Aktuarietidskr, 33, 214-222.
Malmquist, S. (1954). "On certain confidence contours for distribution functions," Ann. Math.
Statist., 25, 523-533.
Marcus, M. B. and Zinn, J. (1984). "The bounded law of the iterated logarithm for the weighted
empirical distribution process in the non-iid case," Ann. Prob., 12, 335-360.
Marshall, A. W. (1958). "The small sample distribution of nw_n^2," Ann. Math. Statist., 29, 307-309.
Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications,
Academic Press, New York.
Mason, D. M. (1981a). "Asymptotic normality of linear combinations of order statistics with a
smooth score function," Ann. Statist., 9, 899-908.
Mason, D. M. (1981b). "Bounds for weighted empirical distribution functions," Ann. Prob., 9,
881-884.
Mason, D. M. (1981c). "On the use of a statistic based on sequential ranks to prove limit theorems
for simple linear rank statistics," Ann. Statist., 9, 424-436.
Mason, D. M. (1982b). "Some characterizations of almost sure bounds for weighted multi-
dimensional empirical distributions and a Glivenko-Cantelli theorem for sample quantiles,"
Z. Wahrsch. verw. Geb., 59, 505-513.
Mason, D. M. (1983). "The asymptotic distribution of weighted empirical distribution functions,"
Stoch. Proc. Appl., 15, 99-109.
Mason, D. M. (1984). "Weak convergence of the weighted empirical quantile process in L_2(0, 1),"
Ann. Prob., 12, 243-255.
Mason, D. M. (1984a). "A strong limit theorem for the oscillation modulus of the uniform empirical
quantile process," Stoch. Proc. Appl., 17, 127-136.
Mason, D. M., Shorack, G. R., and Wellner, J. A. (1983). "Strong limit theorems for oscillation
moduli of the uniform empirical process," Z. Wahrsch. verw. Geb., 65, 83-97.
Mason, D. and van Zwet, W. (1985). "A refinement of the KMT inequality for the uniform
empirical process," Universitaet Muenchen Mathematisches Institut technical report no. 30,
1-18.
Massey, F. J. (1950). "A note on the power of a nonparametric test," Ann. Math. Statist., 21,
440-443. [Correction: Ann. Math. Statist. 23, 637-638 (1952).]
McKean, H. P. (1969). Stochastic Integrals, Academic Press, New York.
Mehra, K. and Rao, J. (1975). "Weak convergence of generalized empirical processes relative to
d_q under strong mixing," Ann. Prob., 3, 979-991.
Meier, P. (1975). "Estimation of a distribution function from incomplete observations," in
Perspectives in Statistics, J. Gani, ed., pp. 67-87, Academic Press, London.
Meyer, P. A. (1976). "Un cours sur les intégrales stochastiques," Séminaire de Probabilités X,
Lecture Notes in Mathematics, Vol. 511, pp. 246-400, Springer-Verlag, Berlin.
Michael, J. (1983). "The stabilized probability plot," Biometrika, 70, 11-17.
Michel, R. (1976). "Nonuniform central limit bounds with applications to probabilities of large
deviations," Ann. Prob., 4, 102-106.
Millar, P. W. (1979). "Asymptotic minimax theorems for the sample distribution function," Z.
Wahrsch. verw. Geb., 55, 72-89.
Miller, R. (1981). Survival Analysis, Wiley, New York.
Mogulskii, A. A. (1980). "On the law of the iterated logarithm in Chung's form for functional
spaces," Theor. Prob. Appl., 24, 405-413.
Mogulskii, A. A. (1984). "Large deviations of the Wiener process," in Advances in Probability
Theory: Limit Theorems and Related Problems, A. Borovkov, ed., Optimization Software, Inc.,
New York.
Mogyoródi, J. (1975). "Some inequalities for the maximum of partial sums of random variables,"
Math. Nach., 70, 71-85.
Müller, D. W. (1968). "Verteilungs-Invarianzprinzipien für das starke Gesetz der großen Zahl,"
Z. Wahrsch. verw. Geb., 10, 173-192.
Nagaev, S. V. (1970). "On the speed of convergence in a boundary problem, I and II," Theor.
Prob. Appl., 15, 163-186 and 403-429.
Nair, V. N. (1981). "Plots and tests for goodness of fit with randomly censored data," Biometrika,
68, 99-103.
Nair, V. N. (1984). "Confidence bands for survival functions with censored data: A comparative
study," Technometrics, 26, 265-275.
Neuhaus, G. (1975). "Convergence of the reduced empirical process for non-i.i.d. random vectors,"
Ann. Statist., 3, 528-531.
Neveu, Jacques (1975). Discrete-Parameter Martingales, North-Holland, Amsterdam.
Niederhausen, H. (1981a). "Sheffer polynomials for computing exact Kolmogorov-Smirnov and
Rényi type distributions," Ann. Statist., 9, 923-944.
Niederhausen, H. (1981b). "Tables of significance points for the variance-weighted Kolmogorov-
Smirnov statistics," Technical Report No. 298, Department of Statistics, Stanford University,
Stanford, California.
Noe, M. (1972). "The calculation of distributions of two-sided Kolmogorov-Smirnov type
statistics," Ann. Math. Statist., 43, 58-64.
Noe, M. and Vandewiele, G. (1968). "The calculation of distributions of Kolmogorov-Smirnov
type statistics including a table of significance points for a particular case," Ann. Math.
Statist., 39, 233-241.
O'Reilly, N. E. (1974). "On the weak convergence of empirical processes in sup-norm metrics,"
Ann. Prob., 2, 642-651.
Oodaira, H. (1975). "Some functional laws of the iterated logarithm for dependent random
variables," in Colloq. Math. Soc. J. Bolyai, Limit Theorems of Probability Theory, P. Revesz,
ed., pp. 253-272, North-Holland, Amsterdam.
Oodaira, H. (1976). "Some limit theorems for the maximum of normalized sums of weakly
dependent random variables," Proceedings Third Japan-USSR Symposium Probability Theory,
Vol. 550, pp. 467-474, Springer-Verlag, New York.
Orlov, A. (1972). "On testing the symmetry of distributions," Theor. Prob. Appl., 17, 357-361.
Owen, D. B. (1962). Handbook of Statistical Tables, Addison-Wesley, Reading, Massachusetts.
Parr, W. C. (1981). "On minimum Cramer-von Mises-norm parameter estimation," Commun.
Statist., A10, 1149-1166.
Parr, W. C. and Schucany, W. R. (1982). "Minimum distance estimation and components of
goodness-of-fit statistics," J. Roy. Statist. Soc. B 44, 178-189.
Parthasarathy, K. R. (1967). Probability Measures on Metric Spaces, Academic Press, New York.
Parzen, E. (1980). "Quantile functions, convergence in quantile, and extreme value distribution
theory," Technical Report B-3, Statistical Institute, Texas A & M University, College Station,
Texas.
Pearson, E. and Hartley, H., eds. (1972). Biometrika Tables for Statisticians, 2, Cambridge University
Press, Cambridge.
Penkov, B. I. (1976). "Asymptotic distribution of Pyke's statistic," Theor. Prob. Appl., 21, 370-374.
Peterson, A. V. (1977). "Expressing the Kaplan-Meier estimator as a function of empirical
subsurvival functions," J. Am. Statist. Assoc., 72, 854-858.
Petrov, V. V. (1975). Sums of Independent Random Variables, Springer-Verlag, New York.
Petrovski, I. (1935). "Zur ersten Randwertaufgabe der Wärmeleitungsgleichung," Compositio
Math., 1, 383-419.
Pettitt, A. N. (1976). "Cramer-von Mises statistics for testing normality with censored samples," Biometrika,
63, 475-481.
Pettitt, A. N. (1977). "Tests for the exponential distribution with censored data using Cramer-von
Mises statistics," Biometrika, 64, 629-632.
Pettitt, A. N. (1978). "Generalized Cramer-von Mises statistics for the gamma distribution," Biometrika,
65, 232-235.
Phadia, E. G. (1973). "Minimax estimation of a cumulative distribution function," Ann. Statist.,
1, 1149-1157.
Philipp, W. and Pinzur, L. (1980). "Almost sure approximation theorems for the multivariate
empirical process," Z. Wahrsch. verw. Geb., 54, 1-13.
Pierce, D. A. and Kopecky, K. J. (1979). "Testing goodness of fit for the distribution of errors in
regression models," Biometrika, 66, 1-6.
Pinsky, M. (1969). "An elementary derivation of Khintchine's estimate for deviations," Proc. Am.
Math. Soc., 22, 288-290.
Pitman, E. (1979). Some Basic Theory for Statistical Inference, Chapman & Hall, London.
Pitman, E. J. G. (1972). "Simple proofs of Steck's determinantal expressions for probabilities in
the Kolmogorov and Smirnov tests," Bull. Aust. Math. Soc., 7, 227-232.
Plachky, D. and Steinbach, J. (1975). "A theorem about probabilities of large deviations with an
application to queuing theory," Period. Math. Hungar., 6, 343-345.
Pollard, D. (1979). "General chi-square goodness-of-fit tests with data dependent cells," Z.
Wahrsch. verw. Geb., 50, 317-331.
Pollard, D. (1980). "The minimum distance method of testing," Metrika, 27, 43-70.
Pollard, D. (1981a). "Limit theorems for empirical processes," Z. Wahrsch. verw. Geb., 57, 181-195.
Pollard, D. (1981b). "A central limit theorem for empirical processes," J. Aust. Math. Soc. A 33,
235-248.
Pollard, D. (1982). "A central limit theorem for k-means clustering," Ann. Prob., 10, 919-926.
Pollard, D. (1984). Convergence of Stochastic Processes, Springer-Verlag, New York.
Prohorov, Yu. V. (1956). "Convergence of random processes and limit theorems in probability
theory," Theor. Prob. Appl., 1, 157-214.
Proschan, F. and Pyke, R. (1967). "Tests for monotone failure rate," Proceedings of the Fifth
Berkeley Symposium on Mathematical Statistics and Probability, Vol. 3, pp. 293-312, University
of California Press, Berkeley, California.
Puri, M. L. and Sen, P. (1969). "On the asymptotic normality of one-sample rank order test
statistics," Theor. Prob. Appl., 14, 163-167.
Pyke, R. (1959). "The supremum and infimum of the Poisson process," Ann. Math. Statist., 30,
568-576.
Pyke, R. (1965). "Spacings (with discussion)," J. Roy. Statist. Soc. B 27, 395-449.
Pyke, R. (1968). "The weak convergence of the empirical process with random sample size," Proc.
Cambridge Phil. Soc., 64, 155-160.
Pyke, R. (1970). "Asymptotic results for rank statistics," in Nonparametric Techniques in Statistical
Inference, M. L. Puri, ed., Cambridge University Press, Cambridge.
Pyke, R. (1972). "Spacings revisited," Proceedings of the Sixth Berkeley Symposium on Mathematical
Statistics and Probability, Vol. 1, pp. 417-427, University of California Press, Berkeley,
California.
Pyke, R. and Shorack, G. R. (1968). "Weak convergence of a two-sample empirical process and
a new approach to Chernoff-Savage theorems," Ann. Math. Statist., 39, 755-771.
Quade, D. (1965). "On the asymptotic power of the one-sample Kolmogorov-Smirnov tests,"
Ann. Math. Statist., 36, 1000-1018.
Raghavachari, M. (1973). "Limiting distributions of Kolmogorov-Smirnov statistics under the
alternative," Ann. Statist., 1, 67-73.
Rao, J. S. and Sethuraman, J. (1975). "Weak convergence of empirical distribution functions of
random variables subject to perturbations and scale factors," Ann. Statist., 3, 299-313.
Rao, J. S. and Sobel, M. (1980). "Incomplete Dirichlet integrals with applications to ordered
uniform spacings," J. Multivar. Anal., 10, 603-610.
Rao, K. C. (1972). "The Kolmogoroff, Cramer-von Mises, chi-square statistics for goodness-of-fit
in the parametric case, Abstract 133-6," Bull. Inst. Math. Statist., 1, 87.
Rao, P. V., Schuster, E. F., and Littell, R. C. (1975). "Estimation of shift and center of symmetry
based on Kolmogorov-Smirnov statistics," Ann. Statist., 3, 862-873.
Rao, R. R. (1963). "The law of large numbers for D[0, 1]-valued random variables," Theor. Prob.
Appl., 8, 70-74.
Read, R. R. (1972). "The asymptotic inadmissibility of the sample distribution function," Ann.
Math. Statist., 43, 89-95.
Rebolledo, R. (1978). "Sur les applications de la theorie des martingales a l'etude statistique
d'une famille de processus ponctuels," Journees de Statistique des Processus Stochastiques,
Lecture Notes in Mathematics, Vol. 636, pp. 27-70, Springer-Verlag, Berlin.
Rebolledo, R. (1979). "La methode des martingales appliquee a l'etude de la convergence en loi
de processus," Mem. Soc. Math. France, 62, 125.
Rebolledo, R. (1980). "Central limit theorems for local martingales," Z. Wahrsch. verw. Geb., 51,
269-286.
Rechtschaffen, R. (1975). "Weak convergence of the empiric process for independent random
variables," Ann. Statist., 3, 787-792.
Renyi, A. (1953). "On the theory of order statistics," Acta Sci. Math. Hung., 4, 191-227.
Renyi, A. (1970). Probability Theory, North-Holland, Amsterdam.
Renyi, A. (1973). "On a group of problems in the theory of ordered samples," Selected Transl.
Math. Statist. Prob., 13, 289-297.
Rice, S. O. (1982). "The integral of the absolute value of the pinned Wiener process—calculation
of its probability density by numerical integration," Ann. Prob., 10, 240-243.
Robbins, H. (1954). "A one-sided confidence interval for an unknown distribution function,"
Ann. Math. Statist., 25, 409.
Robbins, H. and Siegmund, D. (1972). "On the law of the iterated logarithm for maxima and
minima," Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probabil-
ity, Vol. 3, pp. 51-70, University of California Press, Berkeley, California.
Root, D. (1969). "The existence of certain stopping times on Brownian motion," Ann. Math.
Statist., 40, 715-718.
Ruben, H. (1976). "On the evaluation of Steck's determinant for the rectangle probabilities of
uniform order statistics," Commun. Statist., A(1), 535-543.
Ruben, H. and Gambino, J. (1982). "The exact distribution of Kolmogorov's statistic D_n
for n ≤ 10," Ann. Inst. Statist. Math., 34, 167-173.
Runnenberg, J. T. and Vervaat, W. (1969). "Asymptotical independence of the lengths of subintervals
of a randomly partitioned interval; a sample from S. Ikeda's work," Statist. Neerlandica,
23, 66-77.
Salvia, A. A. (1981). "Minimum Kolmogorov-Smirnov estimation," unpublished technical report,
Pennsylvania State University.
Samuels, S. M. (1965). "On the number of successes in independent trials," Ann. Math. Statist.,
36, 1272-1278.
Sander, Joan M. (1975). "The weak convergence of quantiles of the product," Technical Report
# 5, Division of Biostatistics, Stanford University, Stanford, California.
Sanov, I. N. (1957). "On the probability of large deviations of random variables," Selected Transl.
Math. Statist. Prob., 1, 213-244.
Schoenfeld, D. (1980). "Tests based on linear combinations of the orthogonal components of the
Cramer-von Mises statistics when parameters are estimated," Ann. Statist., 8, 1017-1022.
Scholz, F. W. (1980). "Towards a unified definition of maximum likelihood," Can. J. Statist., 8,
193-203.
Schuster, Eugene F. (1973). "On the goodness-of-fit problem for continuous symmetric distribu-
tions," J. Am. Statist. Assoc., 68, 713-715. [Corrigenda (1974). J. Am. Statist. Assoc. 69, 288.]
Schuster, Eugene F. (1975). "Estimating the distribution function of a symmetric distribution,"
Biometrika, 62, 631-635.
Sen, P. K. (1973a). "An almost sure invariance principle for multivariate Kolmogorov-Smirnov
statistics," Ann. Prob., 1, 488-496.
Sen, P. K. (1973b). "On fixed size confidence bands for the bundle of filaments," Ann. Statist.,
1, 526-537.
Sen, P. K. (1978). "An invariance principle for linear combinations of order statistics," Z. Wahrsch.
verw. Geb., 42, 327-340.
Sen, P. K. (1981). Sequential Nonparametrics, Wiley, New York.
Sen, P. K., Bhattacharyya, B. B., and Suh, M. W. (1973). "Limiting behavior of the extremum of
certain sample functions," Ann. Statist., 1, 297-311.
Seneta, E. (1976). "Regularly varying functions," Lecture Notes in Mathematics, Vol. 508, Springer-
Verlag, Berlin.
Serfling, R. (1975). "A general Poisson approximation theorem," Ann. Prob., 3, 726-731.
Serfling, R. (1980). Approximation Theorems of Mathematical Statistics, Wiley, New York.
Serfling, R. J. (1984). "Generalized L-, M-, and R-statistics," Ann. Statist., 12, 76-86.
A41, 271-277.
Skorokhod, A. V. (1956). "Limit theorems for stochastic processes," Theor. Prob. Appl., 1, 261-290.
Slud, E. (1977). "Distribution inequalities for the binomial law," Ann. Prob., 5, 404-412.
Slud, E. (1978). "Entropy and maximal spacings for random partitions," Z. Wahrsch. verw. Geb.,
41, 341-352. [Correction: Z. Wahrsch. verw. Geb., 60, 139-141 (1982).]
Smirnov, N. V. (1939). "An estimate of divergence between empirical curves of a distribution in
two independent samples," Bull. MGU, 2, 3-14 (in Russian).
Smirnov, N. V. (1944). "Approximate laws of distribution of random variables from empirical
data," Usp. Mat. Nauk, 10, 179-206 (in Russian).
Smirnov, N. V. (1948). "Table for estimating the goodness of fit of empirical distributions," Ann.
Math. Statist., 19, 279-281.
Srinivasan, R. and Godio, L. (1974). "A Cramer-von Mises type statistic for testing symmetry,"
Biometrika, 61, 196-198.
Steck, G. P. (1968). "The Smirnov two-sample tests as rank tests," Ann. Math. Statist., 40,
1449-1466.
Steck, G. P. (1971). "Rectangle probabilities for uniform order statistics and the probability that
the empirical distribution function lies between two distribution functions," Ann. Math.
Statist., 42, 1-11.
Steele, J. M. (1978). "Empirical discrepancies and subadditive processes," Ann. Prob., 6,
118-127.
Stein, C. (1972). "A bound for the errors in the normal approximation to the distribution of a
sum of dependent random variables," Proceedings of the Sixth Berkeley Symposium on Mathe-
matical Statistics and Probability, Vol. 2, pp. 583-602, University of California Press, Berkeley,
California.
Stephens, M. A. (1970). "Use of the Kolmogorov-Smirnov, Cramer-von Mises and related statistics
without extensive tables," J. Roy. Statist. Soc. B 32, 115-122.
Stephens, M. A. (1974). "EDF statistics for goodness of fit and some comparisons," J. Am. Statist.
Assoc., 69, 730-737.
Stephens, M. A. (1976). "Asymptotic results for goodness of fit statistics with unknown para-
meters," Ann. Statist., 4, 357-369.
Stephens, M. A. (1977). "Goodness of fit for the extreme value distribution," Biometrika, 64,
583-588.
Stigler, S. (1969). "Linear functions of order statistics," Ann. Math. Statist., 40, 770-788.
Stigler, S. (1974). "Linear functions of order statistics with smooth weight functions," Ann. Statist.,
2, 676-693.
Stigler, S. M. (1976). "The effect of sample heterogeneity on linear functions of order statistics,
with applications to robust estimation," J. Am. Statist. Assoc., 71, 956-960.
Stone, M. (1974). "Large deviations of empirical probability measures," Ann. Statist., 2, 362-366.
Stout, W. (1974). Almost Sure Convergence, Academic Press, New York.
Strassen, V. (1964). "An invariance principle for the law of the iterated logarithm," Z. Wahrsch.
verw. Geb., 3, 211-226.
Strassen, V. (1966). "A converse to the law of the iterated logarithm," Z. Wahrsch. verw. Geb., 4,
265-268.
Strassen, V. (1967). "Almost sure behaviour of sums of independent random variables and
martingales," Proceedings of the Fifth Berkeley symposium on Mathematical Statistics and
Probability, Vol. 2, pp. 315-343, University of California Press, Berkeley, California.
Strassen, V. and Dudley, R. M. (1969). "The central limit theorem and epsilon-entropy," Lecture
Notes in Mathematics, Vol. 89, pp. 224-231, Springer-Verlag, Berlin.
Stute, W. (1982). "The oscillation behaviour of empirical processes," Ann. Prob., 10, 86-107.
Sukhatme, P. V. (1937). "Tests of significance for samples of the chi-square population with two
degrees of freedom," Ann. Eugen. Lond., 8, 52-56.
Sukhatme, S. (1972). "Fredholm determinant of a positive definite kernel of a special type and
its applications," Ann. Math. Statist., 43, 1914-1926.
Sweeting, T. J. (1982). "Independent scale-free spacings for the exponential and uniform distribu-
tions," Technical Report 21, Dept. of Mathematics, University of Surrey, Surrey, U.K.
Switzer, P. (1976). "Confidence procedures for two-sample problems," Biometrika, 63, 13-26.
Takács, L. (1967). Combinatorial Methods in the Theory of Stochastic Processes, Wiley, New York.
Takács, L. (1971). "On the comparison of a theoretical and an empirical distribution function,"
J. Appl. Prob., 8, 321-330.
Uspensky, J. V. (1937). Introduction to Mathematical Probability, McGraw-Hill, New York.
Vanderzanden, A. J. (1980). "Some results for the weighted empirical process concerning the law
of the iterated logarithm and weak convergence," thesis, Michigan State University.
van Zuijlen, M. C. A. (1976). "Some properties of the empirical distribution function in the
non-i.i.d. case," Ann. Statist., 4, 406-408.
van Zuijlen, M. C. A. (1978). "Properties of the empirical distribution function for independent
non-identically distributed random variables," Ann. Prob., 6, 250-266.
van Zuijlen, M. C. A. (1982). "Properties of the empirical distribution function for independent
non-identically distributed random vectors," Ann. Prob., 10, 108-123.
Vapnik, V. N. and Chervonenkis, A. Ya. (1971). "On uniform convergence of the frequencies of
events to their probabilities," Theor. Prob. Appl., 16, 264-280.
Venter, J. H. (1967). "On estimation of the mode," Ann. Math. Statist., 38, 1446-1455.
Vervaat, W. (1972). "Functional central limit theorems for processes with positive drift and their
inverses," Z. Wahrsch. verw. Geb., 23, 245-253.
Vincze, I. (1970). "On Kolmogorov-Smirnov type distribution theorems," in Nonparametric
Techniques in Statistical Inference, M. L. Puri, ed., Cambridge University Press, Cambridge.
von Bahr, B. and Esseen, C. (1965). "Inequalities for the rth absolute moment of a sum of random
variables, 1 ≤ r ≤ 2," Ann. Math. Statist., 36, 299-303.
von Bahr, Bengt (1965). "On convergence of moments in the central limit theorem," Ann. Math.
Statist., 36, 808-818.
Watson, G. S. (1961). "Goodness-of-fit tests on a circle," Biometrika, 48, 109-114.
Watson, G. S. (1967). "Another test for the uniformity of a circular distribution," Biometrika, 54,
675-676.
Weiss, L. (1970). "Asymptotic distribution of quantiles in some non-standard cases," in Nonpara-
metric Techniques in Statistical Inference, ed. M. L. Puri, 343-348, Cambridge University Press,
Cambridge.
Wellner, J. A. (1977a). "A martingale inequality for the empirical process," Ann. Prob., 5, 303-308.
Wellner, J. A. (1977b). "A law of the iterated logarithm for functions of order statistics," Ann.
Statist., 5, 481-494.
Wellner, J. A. (1977c). "Distributions related to linear bounds for the empirical distribution
function," Ann. Statist., 5, 1003-1016.
Wellner, J. A. (1977d). "A Glivenko-Cantelli theorem and strong laws of large numbers for
functions of order statistics," Ann. Statist., 5, 473-480. [Correction: Ann. Statist., 6, 1394.]
Wellner, J. A. (1978a). "A strong invariance theorem for the strong law of large numbers," Ann.
Prob., 6, 673-679.
Wellner, J. A. (1978b). "Limit theorems for the ratio of the empirical distribution function to the
true distribution function," Z. Wahrsch. verw. Geb., 45, 73-88.
Whittaker, E. T. and Watson, G. N. (1969). A Course of Modern Analysis, 4th ed., reprinted
Cambridge University Press, Cambridge.
Wichura, M. (1969). "Inequalities with applications to the weak convergence of random processes
with multidimensional time parameters," Ann. Math. Statist., 40, 681-687.
Wichura, M. J. (1970). "On the construction of almost uniformly convergent random variables
with given weakly convergent image laws," Ann. Math. Statist., 41, 284-291.
Wichura, M. J. (1971). "A note on the weak convergence of stochastic processes," Ann. Math.
Statist., 42, 1769-1772.
Wichura, M. J. (1973a). "Some Strassen-type laws of the iterated logarithm for multiparameter
stochastic processes with independent increments," Ann. Prob., 1, 272-296.
Wichura, M. J. (1973b). "Boundary crossing probabilities associated with Motoo's law of the
iterated logarithm," Ann. Prob., 1, 437-456.
Wichura, M. J. (1974a). "On the functional form of the law of the iterated logarithm for the
partial maximum of independent identically distributed random variables," Ann. Prob., 2,
202-230.
Wichura, M. J. (1974b). "Functional laws of the iterated logarithm for the partial sums of i.i.d.
random variables in the domain of attraction of a completely asymmetric stable law," Ann.
Prob., 2, 1108-1138.
Williamson, Mark A. (1982). "Cramer-von Mises type estimation of the regression parameter:
The rank analogue," J. Multivar. Anal., 12, 248-255.
Withers, C. S. (1976). "Convergence of rank processes of mixing random variables on R, II,"
Technical Report No. 61, Department of Scientific and Industrial Research, Wellington, New
Zealand.
Yang, G. L. (1978). "Estimation of a biometric function," Ann. Statist., 6, 112-116.
Yoeurp, C. (1976a). "Decomposition des martingales locales et formules exponentielles," Semi-
naire de Probabilites X, Lecture Notes in Mathematics, Vol. 511, pp. 432-480, Springer-Verlag,
Berlin.
Yoeurp, C. (1976b). "Decomposition des martingales locales et formules exponentielles," Semi-
naire de Probabilites X, Lecture Notes in Mathematics, Vol. 511, pp. 481-500, Springer-Verlag,
Berlin.
Yor, M. (1976). "Sur les integrales stochastiques optionnelles et une suite remarquable de formules
exponentielles," Seminaire de Probabilites X, Lecture Notes in Mathematics, Vol. 511, pp. 481-
500, Springer-Verlag, Berlin.
Author Index
Samuels, S. M., 806
Sander, J. M., 657
Sanov, I. N., 792
Schoenfeld, D., 251
Scholz, F. W., 174, 333
Schucany, W. R., 254
Schuster, E. F., 746, 761
Sen, P. K., 139, 140, 432, 691, 692, 694, 703, 756, 811
Seneta, E., 651
Serfling, R., 174, 473, 772, 774, 860
Shepp, L. A., 149, 471
Sheu, S. S., 75, 844
Shiryayev, A. N., 302, 884, 888, 889, 890, 891, 898
Shorack, G. R., 18, 21, 82, 106, 110, 134, 141, 162, 197, 404, 410, 415, 419, 420, 465, 506, 511, 512, 518, 541, 545, 569, 585, 587, 604, 627, 643, 662, 668, 677, 678, 687, 730, 741, 746, 753, 756, 759, 783, 784, 789, 792, 793, 794, 812, 817, 822, 853, 877
Shore, T. R., 559
Šidák, Z., 4, 9, 37, 38, 157, 162, 165, 169, 194, 700, 703
Siegers, A. J. F., 14, 208
Siegmund, D. O., 404, 408, 507, 786, 859, 865, 867
Sievers, G. L., 655, 856
Silverman, B. W., 773, 774
Singh, K., 628
Singpurwalla, N. D., 776, 779
Sirao, T., 626
Skorokhod, A. V., 16, 48
Slud, E., 443
Smirnov, N. V., 11, 142, 143, 349, 384, 401
Smythe, R., 877
Sobel, M., 727
Srinivasan, R., 749
Staudte, R., 110
Steck, G. P., 13, 367, 373
Steele, J. M., 834
Stein, C., 849, 850
Steinbach, J., 856
Stephens, M. A., 149, 212, 234, 237, 238, 243, 354, 364
Stigler, S. M., 668, 824, 862
Stone, M., 793
Stout, W., 844
Strassen, V., 55, 60, 69, 74, 80, 81, 452, 512, 513, 621, 632
Stratton, H., 860
Stromberg, K., 867
Stute, W., 542, 545, 546, 837
Suh, M. W., 811
Sukhatme, P. V., 336, 721
Sukhatme, S., 230, 232, 233, 243, 246
Sweeting, T. J., 724
Switzer, P., 655
Takács, L., 345, 376, 383
Taylor, C., 41, 42
Taylor, H. M., 42
Teicher, H., 852, 853, 859
Tikhomirov, V. M., 836
Tingey, F. H., 11, 349
Tusnády, G., 16, 379, 383, 492, 493
van der Meulen, E. C., 724, 733
Vanderzanden, A. J., 51, 140
Vandewiele, G., 363
van Zuijlen, M. C. A., 104, 724, 733, 807
Van Zwet, W., 501, 806
Vapnik, V. N., 829
Varadhan, S. R. S., 40
Varaiya, P., 888, 891
Venter, J. H., 771
Vervaat, W., 159, 594, 595, 658, 659
Vincze, I., 347, 376, 377, 378, 379, 386
von Bahr, B., 857
von Mises, R., 145
Wang, G. L., 768
Watson, G. N., 415
Watson, G. S., 147, 212, 220, 223
Weiss, L., 640
Wellner, J. A., 18, 69, 276, 324, 386, 405, 406, 410, 415, 420, 424, 425, 426, 442, 465, 545, 627, 668, 690, 691, 776, 778, 846
Whittaker, E. T., 415
Wichura, M. J., 29, 47, 51, 77, 78, 81, 131, 132, 860, 876
Wintner, A., 74
Withers, C. S., 9
Wolfowitz, J., 12, 173, 174, 356
Wong, E., 888, 891
Zinn, J., 425, 812, 821, 837, 846, 883
Subject Index
order statistics, 86, 97, 334
quantile process, 13
smoothed empirical df, 86, 87
spacings, 98, 717, 720
spacings processes, 728, 729, 730, 731
Uniformity, test of, 733
reduced Z of the β_n's, 117, 810
z, 109, 810
uniform, 19, 88, 695
Weight function, 784
Zero-one law, 57