Mathematics and Its Applications
Managing Editor:
M. HAZEWINKEL
Centre for Mathematics and Computer Science, Amsterdam, The Netherlands
Volume 582
PROBABILITY THEORY
WITH APPLICATIONS
Second Edition
M.M. RAO
University of California, Riverside, California
R.J. SWIFT
California State Polytechnic University, Pomona, California
Springer
Library of Congress Control Number: 2005049973
AMS Subject Classifications: 60Axx, 60Exx, 60Fxx, 60Gxx, 62Bxx, 62Exx, 62Gxx, 62Mxx, 93Cxx
focused and the division into parts makes the work more useful. We now turn to explaining the new format.
The first part, on foundations, treats the two fundamental ideas of probability, independence and conditioning. In Chapter 1 we recall the necessary results from Real Analysis, which we recommend for a perusal. It is also important that readers take a careful look at the fundamental law of probability and the basic uniform continuity of characteristic functions.
Chapter 2 undertakes a serious study of (statistical) independence, which is a distinguishing feature of Probability Theory. Independence is treated in considerable detail in this chapter, both the basic strong and weak laws, as well as the convergence of series of random variables. The applications considered here illustrate such results as the Glivenko-Cantelli Theorem for empiric and density estimation, random walks, and queueing theory. There are also exercises (with hints) of special interest, and we recommend that all readers pay particular attention to Problems 5 and 6, and also 7, 15 and 21, which explain the very special nature of the subject and the concept of independence itself.
The somewhat long third chapter is devoted to the second fundamental idea, namely conditioning. As far as we know, no other graduate text in probability has treated the subject of conditional probability in such detail and specificity. To mention some noteworthy points of our presentation, we have included: (i) the unsuspected, but spectacular, failure of the Vitali convergence theorem for conditional probabilities. This is a consequence of an interesting theorem of Blackwell and Dubins. We include a discussion and the imposition of a restriction for a positive conclusion to prevail, (ii) the basic problem (still unresolved) of calculating conditional expected values (probabilities) when the conditioning is relative to random variables taking uncountably many values, particularly when the random variables arise from continuous distributions. In this setting, multiple answers (all natural) for the same question are exhibited via a Gaussian family. The calculations we give follow some work by Kac and Slepian, leading to paradoxes. These difficulties arise from the necessary calculation of the Radon-Nikodým derivative, which is fundamental here, and for which no algorithmic procedure exists in the literature. A search through E. Bishop's text on the foundations of constructivism (in the way of L.E.J. Brouwer) shows that we do not yet have a solution or a resolution for the problems discussed. Thus our results are on existence and hence use "idealistic methods", which present, to future researchers in Bishop's words, "a challenge to find a constructive version and to give a constructive proof." Until this is fulfilled, we have to live with subjectively chosen solutions, for applications of our work in practice.
It is in this context that we detail, in Chapter 3, the Jessen-Kolmogorov-Bochner-Tulcea theorems on the existence of arbitrary families of random variables on (suitable) spaces. We also include here the basic martingale limit theorems with applications to U-statistics, likelihood ratios, Markov processes and quasi-martingales. Several exercises (about 50) add complements to the
her assistance and diligence with aspects of typesetting and other technical points of the manuscript. Our colleague Michael L. Green offered valuable comments, and Kunthel By, who read drafts of the early chapters with a student's perspective, provided clarifications. We would like to thank our wives, Durgamba Rao and Kelly Swift, for their love, support, and understanding.
We sincerely thank all these people, and hope that the new edition will serve well as a graduate text as well as a reference volume for many aspiring and working mathematical scientists. It is our hope that we have succeeded, at least to some extent, in conveying the beauty and magnificence of probability theory and its manifold applications to our audience.
Riverside, CA M.M. Rao
Pomona, CA R.J. Swift
Preface to First Edition
The material in this book is designed for a standard graduate course on probability theory, including some important applications. It was prepared from the sets of lecture notes for a course that I have taught several times over the past 20 years. The present version reflects the reactions of my audiences as well as some of the textbooks that I used. Here I have tried to focus on those aspects of the subject that appeared to me to add interest both pedagogically and methodologically. In this regard, I mention the following features of the book: it emphasizes the special character of the subject and its problems while eliminating the mystery surrounding it as much as possible; it gradually expands the content, thus showing the blossoming of the subject; it indicates the need for abstract theory even in applications and shows the inadequacy of existing results for certain apparently simple real-world problems (see Chapter 6); it attempts to deal with the existence problems for various classes of random families that figure in the main results of the subject; it contains a more complete (and I hope more detailed) treatment of conditional expectations and of conditional probabilities than any existing textbook known to me; it shows a deep internal relation among the Lévy continuity theorem, Bochner's theorem on positive definite functions, and the Kolmogorov-Bochner existence theorem; it makes a somewhat more detailed treatment of the invariance principles and of limit laws for a random number of (ordered) random variables together with applications in both areas; and it provides an unhurried treatment that pays particular attention to motivation at every stage of development.
Since this is a textbook, essentially all proofs are given in complete detail (even at the risk of repetition), and some key results are given multiple proofs when each argument has something to contribute. On the other hand, generalization for its own sake is avoided, and as a rule, abstract Banach-space-valued random variables have not been included (if they had been, the demands on the reader's preparation would have been much higher).
Regarding the prerequisites, a knowledge of the Lebesgue integral would be ideal, and at least a concurrent study of real analysis is recommended. The necessary results are reviewed in Chapter 1, and some results that are generally not covered in such a course, but are essential for our work, are given with proofs. In the rest of the book, the treatment is detailed and complete, in accordance with the basic purpose of the text. Thus it can be used for self-study by mature scientists having no prior knowledge of probability.
The main part of the book consists of Chapters 2-5. Even though I regard the order presented here to be the most natural, one can start, after a review of the relevant part of Chapter 1, with Chapter 2, 3 or 4, and with a little discussion of independence, Chapter 5 can be studied. The last four chapters concern applications and problems arising from the preceding work and partly generalizing it. The material there indicates some of the many directions along which the theory is progressing.
There are several exercises at the end of each chapter. Some of these
are routine, but others demand more serious effort. For many of the latter
type, hints are provided, and there are a few that complement the text (e.g.,
Spitzer's identity and aspects of stability in Chapter 5); for them, essentially
complete details are given. I present some of these not only as good illustra-
tions but also for reference purposes.
I have included in the list of references only those books and articles that
influenced my treatment; but other works can be obtained from these sources.
Detailed credits and priorities of discovery have not been scrupulously as-
signed, although historical accounts are given in the interest of motivation.
For cross-referencing purposes, all the items in the book are serially numbered. Thus 3.4.9 is the ninth item of Section 4 of Chapter 3. In a given section (chapter) the corresponding section (and chapter) number is omitted.
The material presented here is based on the subject as I learned it from Professor M. D. Donsker's beautiful lectures many years ago. I feel it is appropriate here to express my gratitude to him for that opportunity. This book has benefited from my experience with generations of participants in my classes and has been read by Derek K. Chang from a student's point of view; his questions have resolved several ambiguities in the text. The manuscript was prepared with partial support from an Office of Naval Research contract and a University of California, Riverside, research grant. The difficult task of converting my handwritten copy into the finished typed product was ably done by Joyce Kepler, Joanne McIntosh, and Anna McDermott, with the care and interest of Florence Kelly. Both D. Chang and J. Sroka have aided me in proofreading and preparation of the Index. To all these people and organizations I wish to express my appreciation for this help and support.
(M.M. Rao)
List of Symbols
$\|X\|_p = [E(|X|^p)]^{1/p} = [\int_\Omega |X|^p\,dP]^{1/p}$ : the $p$-norm of $X$
$[x]$ : integral part of the real number $x \ge 0$
topological equivalence
$a_n \sim b_n$ : means $a_n/b_n \to 1$ as $n \to \infty$
$\partial A$ : boundary of the set $A$
distinguished logarithm
$\operatorname{sgn}$ : signum function
$f_1 * f_2$ : convolution of $f_1$ and $f_2$ in $L^1(\mathbb{R})$
$\binom{n}{k}$ : the $k$th binomial coefficient
Part I Foundations
As late as the early 1920s, R. von Mises summed up the situation, no doubt in despair, by saying, "Today, probability theory is not a mathematical science."
called a probability, defined for all elements of $\Sigma$ so that the following rules are satisfied.
(1) For each $A \in \Sigma$, $0 \le P(A)$ and $P(\Omega) = 1$.
(2) $A, B \in \Sigma$, $A \cap B = \emptyset$, implies $P(A \cup B) = P(A) + P(B)$.
From these two rules, we deduce immediately that (i) (taking $B = \emptyset$) $P(\emptyset) = 0$ and (ii) $A \supset B$, $A, B \in \Sigma$, implies $P(A - B) = P(A) - P(B)$. In particular, $P(A^c) = 1 - P(A)$ for any $A \in \Sigma$, where $A^c = \Omega - A$.
The reader should verify that $\mathcal{C}$ is a semiring and that $P$ satisfies conditions (1) and (2'), so that it is a probability on $\mathcal{C}$ with the above-stated properties. We use this example for some other illustrations.
As a special case, we get $f : \Omega \to \mathbb{C}$, where $f = f_1 + i f_2$, $f_j : \Omega \to \mathbb{R}$, $j = 1, 2$; $f$ is a complex random variable if its real and imaginary parts $f_1, f_2$ are (real) random variables. To illustrate the above ideas, consider the following:
The following two results relate a given collection and the class it generates. They will be needed later on. Note that a $\lambda$-class which is a $\pi$-class is a $\sigma$-algebra. We detail some nonobvious (mathematical) facts.
Proof The argument is similar for both parts. Since the proof of (a) is in most textbooks, here we prove (b).
The proof of (b) is not straightforward, but is based on the following idea. Consider the collection $\mathcal{A}_1 = \{A \subset \Omega : A \cap B \in \mathcal{A}_0 \text{ for all } B \in \mathcal{B}\}$. Here we take $\mathcal{A}_0 \supset \mathcal{B}$, and $\mathcal{A}_0$ is the smallest $\lambda$-class, which is the intersection of all such collections containing $\mathcal{B}$. The class $\mathcal{A}_1$ is not empty; in fact $\mathcal{B} \subset \mathcal{A}_1$. We observe that $\mathcal{A}_1$ is a $\lambda$-class. Clearly $\Omega \in \mathcal{A}_1$; $A_1, A_2 \in \mathcal{A}_1$, $A_1 \cap A_2 = \emptyset$ imply that $A_i \cap B$, $i = 1, 2$, are disjoint for all $B \in \mathcal{B}$, and $A_i \cap B \in \mathcal{A}_0$. Since $\mathcal{A}_0$ is a $\lambda$-class, $(A_1 \cup A_2) \cap B = (A_1 \cap B) \cup (A_2 \cap B) \in \mathcal{A}_0$, so that $A_1 \cup A_2 \in \mathcal{A}_1$. Similarly $A_1 \supset A_2 \Rightarrow A_1 \cap B - A_2 \cap B = (A_1 - A_2) \cap B \in \mathcal{A}_0$ and $A_1 - A_2 \in \mathcal{A}_1$.
Proof The basic idea is similar to that of the above result. Let $\mathcal{A}_0$ be an algebra, generated by $\mathcal{C}$ and 1, which is closed under uniform convergence and is contained in $\mathcal{H}$. Clearly $\mathcal{A}_0$ exists. Let $\mathcal{A}_1$ be the largest such algebra. The existence of $\mathcal{A}_1$ is a consequence of the fact that the class of all such $\mathcal{A}_0$ is closed under unions and hence is partially ordered by inclusion. The existence of the desired class $\mathcal{A}_1$ follows from the maximal principle of Hausdorff.
If $f \in \mathcal{A}_1$, then there is a $k > 0$ such that $|f| \le k$, and if $p(\cdot)$ is any polynomial on $[-k, k]$, then $p(f) \in \mathcal{A}_1$. Also, by the classical Weierstrass approximation theorem, the function $h : [-k, k] \to \mathbb{R}$, $h(x) = |x|$, is the uniform limit of polynomials $p_n$ on $[-k, k]$. Hence $p_n(f) \to |f|$ uniformly, so that (by the uniform closure of $\mathcal{A}_1$) $|f| \in \mathcal{A}_1$ and $\mathcal{A}_1$ is a vector lattice.
Observe that $\mathcal{A}_1$ automatically satisfies (ii), since if $0 \le g_n \in \mathcal{A}_1$, $g_n \uparrow g \in B(\Omega)$, then $g \in \mathcal{H}$, and if $\mathcal{A}_2$ is generated by $\mathcal{A}_1$ and $g$ (as $\mathcal{A}_0$ was), then by the maximality of $\mathcal{A}_1$, $\mathcal{A}_2 = \mathcal{A}_1$. Thus $\mathcal{A}_1$ satisfies (i) and (ii) and is a vector lattice. The second part essentially has this conclusion as its hypothesis. Let us verify this. By (i'), if $f \in \mathcal{H}$, then $f^{\pm} \in \mathcal{H}$, so that $f^+ + f^- = |f| \in \mathcal{H}$. Hence if $f, g \in \mathcal{H}$, then $f \vee g = \frac{1}{2}(|f - g| + f + g) \in \mathcal{H}$, since $f - g \in \mathcal{H}$ (because $\mathcal{H}$ is a vector space). Thus $\mathcal{H}$ is a vector lattice. Consequently we consider vector lattices containing $\mathcal{C}$ and 1 which are subsets of $\mathcal{H}$. Next one chooses a maximal lattice (as above). If this is $\mathcal{A}_2'$, then it has the same properties as $\mathcal{A}_2$. Thus it suffices to consider $\mathcal{A}_2$ and prove that each $f$ in $B(\Omega)$ which is $\mathcal{C}$-measurable is in $\mathcal{A}_2$ ($\subset \mathcal{H}$).
If $0 \le f \in \mathcal{C}$, $B_a = [f > a] = [\frac{f}{a} > 1]$ for $a > 0$, then $\frac{f}{a} \in \mathcal{A}_2$ and by the above proof $B_a \in \mathcal{S}$ for each $a$. But such sets as $B_a$ clearly generate $\mathcal{C}$, so that $\mathcal{C} \subset \mathcal{S}$. This completes the result in the algebra case.
In the lattice case $A, B \in \mathcal{S} \Rightarrow \chi_A \chi_B = \min(\chi_A, \chi_B) \in \mathcal{A}_2'$, so that $A \cap B \in \mathcal{S}$. Thus $\mathcal{S}$ is a $\pi$-class again. That it is a $\lambda$-class is proved as before, so that $\mathcal{S}$ is a $\sigma$-algebra. The rest of the argument holds verbatim. Since with each $f \in \mathcal{C}$ one has $f \wedge 1 \in \mathcal{C}$, we do not need to go to $\mathcal{A}_2'$, and the proof is simplified. This establishes the result in both cases.
$$E(X) = E_P(X) = \int_\Omega X\,dP,$$
the integral on the right being the (absolute) Lebesgue integral. Thus $E(X)$ exists, by definition, iff $E(|X|)$ exists. Let $L^1$ be the class of all Lebesgue integrable functions on $(\Omega, \Sigma, P)$. Then $E : L^1 \to \mathbb{R}$ is a positive linear mapping since the integral has that property. Thus for $X, Y \in L^1$ we have
text on real analysis [see, e.g., Royden (1968, 1988), Sion (1968), or Rao (1987, 2004)].
The basic Lebesgue theorems that are often used in the sequel are the following:
$$\lim_{n\to\infty} E(X_n) = E(X)$$
Since log is an increasing function, this yields for $\alpha > 0$, $\beta > 0$, $x > 0$, $y > 0$
Thus (5) is true in this case. Let then $p > 1$, so that $q = p/(p-1) > 1$. In (4) set $\alpha = 1/p$, $\beta = 1/q$, $x = (|X|/\|X\|_p)^p(\omega)$, and $y = (|Y|/\|Y\|_q)^q(\omega)$. Then it becomes
This proves (5) in this case also, and hence it is true as stated.
(ii) Since $|X + Y|^p \le 2^p \max(|X|^p, |Y|^p) \le 2^p[|X|^p + |Y|^p]$, the linearity of $E$ implies $E(|X + Y|^p) < \infty$, so that (6) is meaningful. If $p = 1$, the result follows from $|X + Y| \le |X| + |Y|$. If $p = \infty$, $|X| \le \|X\|_\infty$, $|Y| \le \|Y\|_\infty$ a.e. Hence $|X + Y| \le \|X\|_\infty + \|Y\|_\infty$ a.e., so that (6) again holds in this case.
Now let $1 < p < \infty$. If $\|X + Y\|_p = 0$, then (6) is trivial and true. Thus let $\|X + Y\|_p > 0$ also. Consider
or
$$\|X + Y\|_p \le \|X\|_p + \|Y\|_p.$$
This completes the proof.
Some specializations of the above result, which holds for any measure space, are needed in the context of probability spaces. Taking $Y = 1$ a.e. in
so that all lower-order moments exist. The inequality holds if we show that $\|X\|_r$ is a nondecreasing function of $r > 0$. But this follows from (9) if we let $p = s/r \ge 1$ and replace $X$ by $|X|^r$ there. Thus
because
which is (10). Note that the convexity assertion can also be proved with a direct application of the Hölder inequality. This completes the proof.
Remark The conditions for equality in (5), (6), (10), and (11) can be obtained immediately, and will be left to the reader. We invoke them later when necessary.
One can now present the promised generalization of (9) as
Proof Let $x_0, x_1$ be two points on the line and $x$ be an intermediate point, so that $x = \alpha x_1 + \beta x_0$, where $0 \le \alpha \le 1$, $\alpha + \beta = 1$. Then by (3)
so that
$$(x - x_0)(\phi(x) - \phi(x_1)) \le (x_1 - x)(\phi(x_0) - \phi(x)).$$
By setting $y = x_1$, $y_0 = x$, this becomes
This is (12), and the result holds. [Note: $t_1 < t_2 \Rightarrow g(t_1) \le g(t_2)$. This is not entirely trivial. Use (3) in different forms carefully. See, e.g., G.H. Hardy, J.E. Littlewood, and G. Pólya (1934, p. 93).]
which always exists since $\sigma^2(X) \le E(X^2) < \infty$; and (b) the covariance of $X, Y$ as
(ii) (Čebyšev's Inequality) If $X$ has a finite variance, then
$$P[|X - E(X)| \ge \varepsilon] \le \frac{\sigma^2(X)}{\varepsilon^2}, \qquad \varepsilon > 0.$$
This follows immediately from definitions. The second line says that for
uncorrelated random variables, the variance of the sum is the sum of the
variances. We later strengthen this concept into what is called "independence"
and deduce several results of great importance in the subject.
For future use, we include two fundamental results on multiple integration and differentiation of set functions.
$$\int_{\Omega_1} f(\omega_1, \cdot)\,\mu_1(d\omega_1) \text{ is } \mu_2\text{-measurable},$$
and, moreover,
$$\int_{\Omega_2} f(\cdot, \omega_2)\,\mu_2(d\omega_2) \text{ is } \mu_1\text{-measurable},$$
(ii) (Tonelli) If in the above $\mu_1, \mu_2$ are $\sigma$-finite and $f : \Omega \to \mathbb{R}^+$ is measurable, or the $\mu_i$ are arbitrary measures but there exists a sequence of $\mu$-integrable simple functions $f_n : \Omega \to \mathbb{R}^+$ such that $f_n \uparrow f$ a.e. $(\mu)$, then again (21) holds even though both sides may now be infinite.
The detailed arguments for this result are found in most standard texts [cf., e.g., Zaanen (1967), Rao (1987, 2004)]. The other key result is the following:
in the sense that if either side exists, so does the other and equality holds.
(ii) (Fundamental Law of Probability). If $(\Omega, \Sigma, P)$ is a probability space and $X : \Omega \to \mathbb{R}$ is a random variable with distribution function $F_X$, and $g : \mathbb{R} \to \mathbb{R}$ is a Borel function, $Y = g(X)$, then
$$E(Y) = E(g(X)) = \int_\Omega g(X)\,dP = \int_{\mathbb{R}} g(x)\,dF_X(x),$$
in the sense that if either side exists, so does the other with equality holding.
(iii) In particular, for any $p > 0$,
Proof (i) This very general statement is easily deduced from the definition of the image measure. Indeed, if $g(s) = \chi_A(s)$, $A \in \mathcal{A}$, then the left side of (1) becomes
Thus (1) is true, and by the linearity of the integral and the $\sigma$-additivity of $\nu$, the same result holds if $g = \sum_{i=1}^n a_i \chi_{A_i}$, a simple function with $a_i \ge 0$. If $g \ge 0$ is measurable, then there exist simple functions $0 \le g_n \uparrow g$, so that (1) holds by the Lebesgue monotone convergence theorem. Since any measurable $g = g^+ - g^-$ with $g^{\pm} \ge 0$ and measurable, the last statement implies the truth of (1) in general for which $g^+$ or $g^-$ is integrable.
(ii) Taking $S = \mathbb{R}$ ($\mu$ is a probability) we get $\nu(-\infty, x) = F_X(x)$, the distribution function of $X$. Thus (1) is simply (2). If $Y = g(X) : \Omega \to \mathbb{R}$, then clearly $Y$ is a random variable. Replace $X$ by $Y$, $g$ by the identity, and $S$ by $\mathbb{R}$ in (1). This is (3), and the proof is complete. In the last equality, $F_X(-\infty) = 0$ and $F_X(+\infty) = 1$ are substituted.
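The fundamental law is what justifies computing $E(g(X))$ either by averaging $g$ over realizations of $X$ or by integrating $g$ against the distribution of $X$. A minimal numerical check, with an exponential distribution and $g(x) = x^2$ chosen purely for illustration (so that $E(g(X)) = 2$):

```python
import numpy as np

rng = np.random.default_rng(1)
g = lambda x: x**2

# Left side: E(g(X)) estimated directly from realizations of X on (Omega, Sigma, P).
X = rng.exponential(scale=1.0, size=200_000)
lhs = g(X).mean()

# Right side: integral of g against the distribution of X (density e^{-x} on [0, inf)).
xs = np.linspace(0.0, 50.0, 200_001)
dx = xs[1] - xs[0]
rhs = np.sum(g(xs) * np.exp(-xs)) * dx

print(lhs, rhs)   # both close to E(X^2) = 2 for the Exp(1) distribution
```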
In the above theorem, it is clear that $g$ can be complex valued, since the stated result applies to $g = g_1 + i g_2$, where $g_1, g_2$ are real measurable functions. We use this fact to illustrate the following important concept, the Fourier transform of real random variables. Indeed, if $X : \Omega \to \mathbb{R}$ is any random variable and $g : \mathbb{R} \to \mathbb{C}$ is a Borel function, then $g \circ X : \Omega \to \mathbb{C}$ is a complex random variable. If $g_t(x) = \cos tx + i \sin tx = e^{itx}$, then $g_t : \mathbb{R} \to \mathbb{C}$ is a bounded continuous function and $g_t(X)$ is a bounded complex random variable for all $t \in \mathbb{R}$.
Thus the following definition is meaningful:
Hence given $\varepsilon > 0$, choose $L_\varepsilon > 0$ such that $F_X(L_\varepsilon) - F_X(-L_\varepsilon) > 1 - (\varepsilon/4)$. If $t_1 < t_2$, consider, with the elementary properties of Stieltjes integrals,
$$\left|\int_{\mathbb{R}} (e^{it_1x} - e^{it_2x})\,dF_X(x)\right| \le \int_{|x|\le L_\varepsilon} |e^{it_1x} - e^{it_2x}|\,dF_X(x) + \int_{|x|>L_\varepsilon} |e^{it_1x} - e^{it_2x}|\,dF_X(x).$$
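The characteristic function being discussed is $\varphi_X(t) = E(e^{itX})$, a bounded, uniformly continuous function of $t$. As a quick illustration (the standard normal example and the sample size are assumptions made here, not taken from the text), the sample average of $e^{itX}$ approximates $\varphi_X(t)$ uniformly over a grid of $t$ values:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=20_000)            # sample from N(0, 1)
t = np.linspace(-5.0, 5.0, 101)

# Empirical characteristic function: average of e^{itX} over the sample.
phi_hat = np.exp(1j * np.outer(t, X)).mean(axis=1)

# For N(0,1) the characteristic function is exp(-t^2/2).
phi = np.exp(-t**2 / 2)
print(np.max(np.abs(phi_hat - phi)))   # small uniform error over the grid
```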
The earliest occasion on which the reader may have encountered this con-
cept is perhaps in studying real analysis, in the form of the Vitali theorem,
which for finite measure spaces is a generalization of the dominated conver-
gence criterion (Theorem 2.2). Let us recall this result.
Actually the conclusion holds if only $E(|X_n|) < \infty$, $n \ge 1$, and (ii) of Definition 3 is satisfied for $\{X_n, n \ge 1\}$.
where $\xi(t)$ is a constant on $[k, k+1)$ and increases only by jumps. Clearly $\phi(\cdot)$ is convex, $\phi(-x) = \phi(x)$, $\phi(0) = 0$, $\phi(x)/x \ge \xi(k)((x-k)/x) \uparrow \infty$ for $k < x$ and $x, k \uparrow \infty$. We claim that this function satisfies the requirements of (iii). Indeed, let us calculate $E(\phi(X_t))$. We have
However,
(11)
Thus (10) and (11) imply $\sup_t E(\phi(X_t)) \le 1$, and (iii) follows.
(iii) $\Rightarrow$ (i) is a consequence of the Hölder inequality for Orlicz spaces, since $\phi(\cdot)$ can be assumed here to be the so-called Young function. The proof is similar to the case in which $\phi(x) = |x|^p$, $p > 1$. By the support line property, the boundedness of $E(\phi(X_t)) \le k < \infty$ implies that of $E(|X_t|) \le k_1 < \infty$. The second condition follows from $[q = p/(p-1)]$
So there is a $k\,(> 1 + |a_1|) < \infty$ such that $\sup_t E(|X_t|) < k < \infty$. To verify the second condition of Definition 3, we have for $A \in \Sigma$
Given $\varepsilon > 0$, choose $a = a_\varepsilon > 0$ so that, by (ii), the first integral is $< \varepsilon$ uniformly in $t$. For this $a_\varepsilon$, (12) becomes
Since $\varepsilon > 0$ is arbitrary this integral is zero, and (i) holds. This completes the demonstration.
is employed in the argument. Since $\min(X_n, X) \le X$ and $\min(X_n, X) \to X$ a.e., the dominated convergence theorem implies $E(\min(X_n, X)) \to E(X)$ as $n \to \infty$. Hence taking expectations on both sides of (14) and letting $n \to \infty$, we get $E(\max(X_n, X)) \to E(X)$ as well. On the other hand,
implies for each $\varepsilon > 0$ that there is an $n_\varepsilon$ such that for all $n > n_\varepsilon$ and all $A \in \Sigma$,
It follows that, because each finite set of integrable random variables is always uniformly integrable,
$$\lim_{P(A)\to 0} \int_A X_n\,dP \le \lim_{P(A)\to 0} \qquad (16)$$
Exercises
1. Let $\Omega$ be a nonempty set and $A \subset \Omega$. Then $\chi_A$, called the indicator (or "characteristic," in older terminology) function, which is 1 on $A$ and 0 on $\Omega - A = A^c$, is useful in some calculations on set operations. We illustrate its uses by this problem.
Thus the first two terms usually underestimate the probability of $\bigcup_{i=1}^n A_i$.
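Since the problem's displayed identity is not reproduced above, the following sketch only illustrates the general device: indicators turn set operations into arithmetic, and keeping just the first two groups of terms in the inclusion-exclusion expansion gives the lower (Bonferroni-type) estimate alluded to. The three events chosen are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(size=50_000)                     # points sampled from ([0,1], Lebesgue)

# Three events, represented by their indicator functions.
ind = [(u < 0.5).astype(float), (u > 0.3).astype(float), (np.abs(u - 0.5) < 0.2).astype(float)]
P = lambda h: h.mean()

union = P(np.maximum.reduce(ind))                # indicator of the union = max of indicators
pairs = sum(P(ind[i] * ind[j]) for i in range(3) for j in range(i + 1, 3))
first_two_terms = sum(P(a) for a in ind) - pairs

print(first_two_terms, "<=", union)              # the truncated expansion underestimates P(union)
```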
(b) Let $(\Omega_i, \mathcal{A}_i)$, $i = 0, 1, \ldots, n$, be measurable spaces, $f : \Omega_0 \to \times_{i=1}^n \Omega_i$ be a mapping. Establish that $f$ is measurable iff each component of $f = (f_1, \ldots, f_n)$ is. [Hint: Verify $f^{-1}(\sigma(\mathcal{C})) = \sigma(f^{-1}(\mathcal{C}))$ for a collection $\mathcal{C}$ of sets.]
3. (a) Let $\{X_n, n \ge 1\}$ be a sequence of random variables on a probability space $(\Omega, \Sigma, P)$. Show that $X_n \to X$, a random variable, in probability iff
(b) If $X, Y$ are any pair of random variables, and $L^0$ is the set of all random variables, define
6. (a) Let $X$ be a bounded random variable on $(\Omega, \Sigma, P)$. Then for any $\varepsilon > 0$ and any $r > 0$, verify that $E(|X|^r) \le \varepsilon^r + a^r P[|X| \ge \varepsilon]$, where $a$ is the bound on $|X|$. In particular, if $a = 1$, we have $E(|X|^r) - \varepsilon^r \le P[|X| \ge \varepsilon]$.
(b) Obtain an improved version of the one-sided Čebyšev inequality as follows:
$$P[X \ge E(X) + \varepsilon] \le \frac{\mathrm{Var}(X)}{\varepsilon^2 + \mathrm{Var}(X)}.$$
Then if $B = [Y \ge \varepsilon]$, verify that $E(f(Y)) \ge P(B)$ and $E(f(Y)) = \cdots$.]
7. Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s on $(\Omega, \Sigma, P)$ such that $X_n \to X$ a.e., where $X$ is an r.v. If $0 < p < \infty$ and $E(|X_n|^p) < \infty$, $n \ge 1$, then $\{|X_n|^p, n \ge 1\}$ is uniformly integrable iff $E(|X_n - X|^p) \to 0$ as $n \to \infty$. The same argument applies to a more general situation as follows. Suppose $\phi : \mathbb{R}^+ \to \mathbb{R}^+$ is a symmetric function, $\phi(0) = 0$, and either $\phi$ is a continuous concave increasing function on $\mathbb{R}^+$ or is a convex function satisfying $\phi(2x) \le c\phi(x)$, $x \ge 0$, for some $0 < c < \infty$. If $E(\phi(X_n)) \le k < \infty$ and $E(\phi(X_n)) \to E(\phi(X))$, then $E(\phi(X_n - X)) \to 0$ as $n \to \infty$ and $\{\phi(X_n), n \ge 1\}$ is uniformly integrable. [Hint: Observe that there is a constant $0 < c < \infty$ such that in both the above convex and concave cases, $\phi(x + y) \le c[\phi(x) + \phi(y)]$, $x, y \in \mathbb{R}$. Hence $c[\phi(X_n) + \phi(X)] - \phi(X_n - X) \ge 0$ a.e.]
$$\lim_{m\to\infty}\lim_{n\to\infty}\int Y_n\,d\mu = 0$$
vanish outside a fixed $\sigma$-finite set, and $\mu$ may thus be assumed $\sigma$-finite. It may be noted that (iii) is a consequence of ($\beta$) if $\mu(\Omega) < \infty$. Next, ($\beta$) also implies
using ($\alpha$), this implies the result. The difficulty is that the hypothesis is weaker than the dominated or Vitali convergence theorems, and the $X_n$, $n \ge 1$, are not uniformly integrable. The result can be extended if the $X_n$ are vector valued.]
(b) The following example shows how the hypotheses of the above part can be specialized. Let $X_n, g_n, h_n$ be random variables such that (i) $X_n \to X$ a.e., $g_n \to g$ a.e., and $h_n \to h$ a.e. as $n \to \infty$, (ii) $g_n \le X_n \le h_n$, $n \ge 1$, and (iii) $\int_\Omega g_n\,d\mu \to \int_\Omega g\,d\mu \in \mathbb{R}$, $\int_\Omega h_n\,d\mu \to \int_\Omega h\,d\mu \in \mathbb{R}$, $n \to \infty$. Then $\lim_{n\to\infty} \int_\Omega X_n\,d\mu = \int_\Omega X\,d\mu \in \mathbb{R}$. [Let $Y_n = X_n - g_n$, $Z_n = g_n$. Then (i) and (ii $\alpha$) of (a) hold.
Now $0 \le Y_n \le h_n - g_n$ and $\int_\Omega (h_n - g_n)\,d\mu \to \int_\Omega (h - g)\,d\mu$ by hypothesis. Since $h_n - g_n \ge 0$ and we may assume that these are finite after some $n$, let us take $n = 1$ for convenience. As shown in Proposition 4.6, this implies the uniform integrability of $\{h_n - g_n, n \ge 1\}$, and (ii $\beta$) and (iii) will hold, since $\int_\Omega (h_n - g_n) - (h - g)\,d\mu \to 0$ is then true. Note that no order relation of the range is involved in (a), while this is crucial in the present formulation.] Observe that if $g_n \le 0 \le h_n$, we may take $g_n = -h_n$, replacing $h_n$ by $\max(h_n, -g_n)$ if necessary, so that $|X_n| \le h_n$ and $\int_\Omega h_n\,d\mu \to \int_\Omega h\,d\mu$ implies the $h_n$ sequence, and hence the $X_n$ sequence, is uniformly integrable as in Proposition 4.6. The result of (b) (proved differently) is due to John W. Pratt. The problem is presented here to show how uniform integrability can appear in different forms. The latter are neither more natural nor elegant than the ones usually given.
[If $P(\omega_1, \cdot)$ is independent of $\omega_1$, then this reduces to Theorem 3.11(ii) and the proof is a modification of that result.]
[Hints: If $Q_1 \ll Q_2$, then observe that, by considering the marginal measures $Q_i(\cdot, \Omega_2)$, we also have $\mu_1 \ll \mu_2$. Next note that for a.a. $(\omega_1)$,
is a monotone class and an algebra. Deduce that $P_1(\omega_1, \cdot) \ll P_2(\omega_1, \cdot)$, a.a. $(\omega_1)$. The converse is simpler, and then the above formula follows. Only a careful application of the "chain rule" is needed. Here the proof can be simplified and the application of the monotone class theorem avoided if $\Sigma_2$ is assumed countably generated, as was originally done.]
Chapter 2
2.1 Independence
If A and B are two events of a probability space $(\Omega, \Sigma, P)$, it is natural to
say that A is independent of B whenever the occurrence or nonoccurrence of
A has no influence on the occurrence or nonoccurrence of B . Consequently
the uncertainty about joint occurrence of both A and B must be higher than
either of the individual events. This means that the probability of a joint
occurrence of A and B should be "much smaller" than either of the individual
probabilities. This intuitive feeling can be formalized mathematically by the
equation
for a pair of events A, B. How should intuition translate for three events A, B, C if every pair among them is independent? The following ancient example, due to S. Bernstein, shows that, for a satisfactory mathematical abstraction, more care is necessary. Thus if $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, $\Sigma = \mathcal{P}(\Omega)$, the power set, let each point carry the same weight, so that each of the three two-point events $A, B, C$ has probability $\frac{1}{2}$ and every pair is independent. But $P(A \cap B \cap C) = \frac{1}{4}$, and $P(A)P(B)P(C) = \frac{1}{8}$. Thus A, B, C are not independent. Also A, $(B \cap C)$ are not independent, and similarly B, $(C \cap A)$ and C, $(A \cap B)$ are not independent.
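A direct computation of one standard realization of Bernstein's example confirms the point. The particular two-point events below are an illustrative choice (the book's own display is not reproduced above): four equally likely points, each event holding on exactly two of them.

```python
from itertools import combinations

omega = [1, 2, 3, 4]                      # four equally likely sample points
P = lambda S: len(S) / len(omega)

A, B, C = {1, 4}, {2, 4}, {3, 4}

for X, Y in combinations([A, B, C], 2):
    print(P(X & Y), "==", P(X) * P(Y))    # pairwise independence: 0.25 == 0.25

print(P(A & B & C), "!=", P(A) * P(B) * P(C))   # mutual independence fails: 0.25 != 0.125
```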
These considerations lead us to introduce the precise concept of mutual independence of a collection of events not pairwise but by systems of equations, so that the above anomaly cannot occur.
holds simultaneously for each $m = 2, 3, \ldots, n$. The whole class $\{A_i, i \in I\}$ is said to be mutually independent if each finite subcollection is mutually independent in the above sense, i.e., equations (1) hold for each $n \ge 2$. Similarly if $\{\mathcal{A}_i, i \in I\}$ is a collection of families of events from $\Sigma$, then they are mutually independent if for each $n$, $A_{i_k} \in \mathcal{A}_{i_k}$, we have the set of equations (1) holding for $A_{i_k}$, $k = 1, \ldots, m$, $1 \le m \le n$. Thus if $A_i \in \mathcal{A}_i$ then $\{A_i, i \in I\}$ is a mutually independent family. [Following custom, we usually omit the word "mutually."]
pairwise case needs only $\binom{n}{2}$ equations. Similarly $m$-wise independence has $\binom{n}{m}$ equations, and it does not imply other independences if $2 < m < n$ is a fixed number $m$. It is the strength of the (mutual) concept that allows all $n \ge 2$. This is the mathematical abstraction of the intuitive feeling of independence that experience has shown to be the best possible one. It seems to give a satisfactory approximation to the heuristic idea of independence in the physical world. In addition, this mathematical formulation has been found successful in applications to such areas as number theory and Fourier analysis. The notion of independence is fundamental to probability theory and distinguishes it from measure theory. The concept translates itself to random variables in the following form.
Taking $S = \mathbb{R}$ (or $\mathbb{R}^n$) and $\mathcal{A}$ as its Borel $\sigma$-algebra, one gets the corresponding concept for real (or vector) random families.
$$\mathcal{G} = \{B \in \sigma(\mathcal{D}) : P(A \cap B) = P(A)P(B), A \in \mathcal{A}\}. \qquad (3)$$
Evidently $\mathcal{D} \subset \mathcal{G}$. Also $\Omega \in \mathcal{G}$, and if $B_1, B_2 \in \mathcal{G}$ with $B_1 \cap B_2 = \emptyset$, then
Thus $B_1 \cup B_2 \in \mathcal{G}$. Finally, if $B_n \in \mathcal{G}$, $B_n \subset B_{n+1}$, we can show, from the fact that $P$ is $\sigma$-additive, that $\lim_n B_n = \bigcup_{n\ge1} B_n \in \mathcal{G}$. Hence $\mathcal{G}$ is a $\lambda$-class. Since $\mathcal{G} \supset \mathcal{D}$, by Proposition 1.2.8b, $\mathcal{G} \supset \sigma(\mathcal{D})$. But (3) implies $\mathcal{G}$ and $\mathcal{A}$ are independent. Thus $\mathcal{A}$ and $\sigma(\mathcal{D})$ are independent also, as asserted. Note that since $J \subset I$ is an arbitrary subset, we need the full hypothesis that $\{\mathcal{A}, \mathcal{B}_i, i \in I\}$ is a mutually independent collection, and not a mere two-by-two independence.
(b) It is clear that Definition 2 implies the statement here. Conversely, let $\mathcal{B}_1$ be the collection of sets $\{[X_{i_1} < x], x \in \mathbb{R}\}$, and
and similarly for $\mathcal{B}_2$. Hence by (a), $\mathcal{B}_1$ and $\sigma(\mathcal{B}_2)$ are independent. Since $\mathcal{B}_1$ is a $\pi$-class, we also get, by (a) again, that $\sigma(\mathcal{B}_1)$ and $\sigma(\mathcal{B}_2)$ are independent. But $\sigma(\mathcal{B}_1) = X_{i_1}^{-1}(\mathcal{R})[= \sigma(X_{i_1})]$, and $\sigma(\mathcal{B}_2) = \sigma(\bigcup_{j=2}^n X_{i_j}^{-1}(\mathcal{R}))[= \sigma(X_{i_2}, \ldots, X_{i_n})]$, where $\mathcal{R}$ is the Borel $\sigma$-algebra of $\mathbb{R}$.
Hence if $A_1 \in \sigma(X_{i_1})$, $A_j \in X_{i_j}^{-1}(\mathcal{R})(= \sigma(X_{i_j})) \subset \sigma(\mathcal{B}_2)$, then $A_1$ and $\{A_2, \ldots, A_n\}$ are independent. Thus
The above result says that we can obtain (1) for random variables if we assume the apparently weaker condition in part (b) of the above theorem. This is particularly useful in computations. Let us record some consequences.
If, moreover, each of the random variables is integrable, then their product is integrable and we have
$$E\Big(\prod_{i=1}^n |X_i|\Big) = \prod_{i=1}^n E(|X_i|), \quad \text{[by Theorem 1.4.1(i)]}.$$
Since the right side is finite by hypothesis, so is the left side. Now that $\prod_{i=1}^n |X_i|$ is integrable we can use the same computation above for $X_i$ and $F_{X_1, \ldots, X_n} (= \prod_{i=1}^n F_{X_i})$, and this time use Fubini's theorem in place of Tonelli's. Then we get (6) in place of (7). This proves the result.
We shall now establish one of the most surprising consequences of the independence concept, the zero-one law. If $X_1, X_2, \ldots$ is a sequence of random variables, then $\bigcap_{n\ge1} \sigma(X_i, i \ge n)$ is called the tail $\sigma$-algebra of $\{X_n, n \ge 1\}$.
Proof The argument is similar to that of the theorem. Note that $\mathcal{T}_0$ and $\mathcal{B}_J = \sigma(X_i, i \in J)$ are independent for each $J \in \mathcal{J}$, as in the above proof. So by Theorem 3a, $\mathcal{T}_0$ and $\mathcal{B} = \sigma(\mathcal{B}_J, J \in \mathcal{J})$ are independent. But clearly $\mathcal{B} = \sigma(X_i, i \in I)$, so that $\mathcal{T}_0 \subset \mathcal{B}$. Hence the result follows as before.
by Fubini's theorem and the fact that $P_i(\Omega_i) = 1$. Consequently the $\tilde{X}_i$ are independent (cf. Theorem 3b) and each $\tilde{X}_i$ has the same distribution as $X_i$. Thus by enlargement of $(\Omega, \Sigma, P)$ to $(\tilde\Omega, \tilde\Sigma, \tilde P)$, we have $n$ independent random variables. This procedure can be employed for the existence of any finite collection of independent random variables without altering the probability structure (see also Problem 5(a)). The results of Section 3.4 establishing the Kolmogorov-Bochner theorem will show that this enlargement can be used for any collection of random variables (countable or not). Consequently, we can and do develop the theory without any question of the richness of the
Proof (i) This simple result is used more often than the other more involved parts, since the events need not be (even pairwise) independent. By definition, $A = \limsup_n A_n = \bigcap_{n\ge1}\bigcup_{k\ge n} A_k \subset \bigcup_{k\ge n} A_k$ for all $n \ge 1$. Hence by the $\sigma$-subadditivity of $P$, we have
$$P(A) \le \sum_{k\ge n} P(A_k).$$
Letting $n \to \infty$, and using the convergence of the series $\sum_{k=1}^\infty P(A_k)$, the result follows.
(ii) (After Chung, 1974) Let $\{A_n, n \ge 1\}$ be pairwise independent. By Problem 1 of Chapter 1, we have
$$P([\lim_{n\to\infty} S_n = \infty]) = 1.$$
Equivalently,
If $l_n = \chi_{A_n} - p_n$, then the $l_n$ are orthogonal random variables. In fact, using the inner product notation,
Thus
Let $n \to \infty$, and then $N \to \infty$ (so that $\beta \to 1$); (17) gives $P[\lim_{n\to\infty} S_n = \infty] = 1$. This establishes the result because of (10).
(iii) This is an immediate consequence of (ii), and again gives a zero-one phenomenon! However, in the case of mutual independence, the proof is simpler than that of (ii), and we give the easy argument here, for variety. Let $A_n^c = E_n$. Then $E_n$, $n \ge 1$, are independent, since $\{\sigma(A_n), n \ge 1\}$ forms an independent class. Let $P(A_n) = a_n$. To show the assertion, consider
$$P\Big(\bigcap_{k>n} E_k\Big) = \lim_{m\to\infty}\prod_{k=n+1}^{m}(1-a_k) \quad (\text{by independence of the } E_k)$$
$$\le \lim_{m\to\infty}\exp\Big(-\sum_{k=n+1}^{m}a_k\Big) = 0 \quad \Big(\text{since } \sum_{k=1}^{\infty}a_k = \infty \text{ by hypothesis}\Big).$$
$$P\Big[\lim_{n\to\infty} \frac{S_n}{E(S_n)} = 1\Big] = 1.$$
$$P\Big[\limsup_n \frac{S_n}{E(S_n)} \le 1\Big] \ge 1 - \frac{1}{N^2},$$
and letting $N \to \infty$, we get
$$\limsup_n \frac{S_n}{E(S_n)} \le 1 \quad \text{a.e.}$$
On the other hand by (17), $P[\beta \le S_n/E(S_n)] \ge 1 - 1/N^2$, $n \ge n_0$. Hence for each fixed $N$, this yields
$$P\Big[\beta \le \liminf_n \frac{S_n}{E(S_n)}\Big] \ge 1 - \frac{1}{N^2}.$$
have the same (finite dimensional) distributions, because the $X_n$ are identically distributed, and we have for any $B \in \mathcal{B}^m$, $P(X \in B) = P(\tilde X \in B)$. In particular, if the permutation is such that $\tilde A_n = [(X_{2n}, X_{2n-1}, \ldots, X_{n+1}) \in B_n]$, then $A_n$ and $\tilde A_n$ are independent and $\rho(A_n, \tilde A_n) \to 0$ as $n \to \infty$ again. Indeed, let $T$ be the 1-1 measurable permutation mapping $TA_n = \tilde A_n$ and $TA = A$ since $A$ is symmetric. So
by independence.
Letting $n \to \infty$, and noting that the metric function is also continuous in the resulting topology, it follows that $A_n \to A$ in $\rho \Rightarrow P(A_n) \to P(A)$. Hence
$$\lim_{n\to\infty} P(A_n \cap \tilde A_n) = P(A \cap A) = \lim_{n\to\infty} P(A_n)\cdot P(\tilde A_n) = P(A)^2$$
(2) Note 10, as well as part (ii) of Theorem 9, indicates how several weakenings of the independence condition can be formulated. A number of different extensions of Borel-Cantelli lemmas have appeared in the literature, and they are useful for special problems. The point here is that the concept of independence, as given in Definitions 1 and 2, leads to some very striking results, which then motivate the introduction of different types of dependences for a sustained study. In this chapter we present only the basic results founded on the independence hypothesis; later on we discuss how some natural extensions suggest themselves.
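The two halves of the Borel-Cantelli lemma discussed above are easy to watch in simulation. In the sketch below, the independent uniform variables and the events $[U_n < c_n]$ are illustrative assumptions: a summable sequence of probabilities produces only finitely many occurrences, while a divergent one (with independence) produces them without end.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
U = rng.uniform(size=N)
n = np.arange(1, N + 1)

# Convergent case: sum P(A_n) = sum 1/n^2 < infinity, so only finitely many A_n occur a.s.
hits_conv = np.flatnonzero(U < 1.0 / n**2)
# Divergent case with independent events: sum 1/n = infinity, so A_n occurs infinitely often a.s.
hits_div = np.flatnonzero(U < 1.0 / n)

print("summable case: occurrences =", hits_conv.size, " last index =", hits_conv.max(initial=-1))
print("divergent case: occurrences =", hits_div.size)
```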
The first two as well as the last convergences have already appeared, and
these are defined and profitably employed in general analysis on arbitrary
measure spaces. However, on finite measure spaces there are some additional
relations which are of particular interest in our study. The third concept, on
Proof The first implication is a standard result for any finite measure. In fact, if $X_n \to X$ a.e., then there is a set $N \in \Sigma$, $P(N) = 0$, and on $\Omega - N$, $X_n(\omega) \to X(\omega)$. Thus $\limsup_n X_n(\omega) = X(\omega)$, $\omega \in \Omega - N$, and for each $\varepsilon > 0$,
Hence the set has measure zero. Since $P$ is a finite measure, this implies
$$\le P(N) = 0. \qquad (1)$$
Consequently, $X_n \stackrel{P}{\to} X$, and the first assertion is proved.
For the next implication, let $F_X$, $F_{X_n}$ be the distribution functions of $X$ and $X_n$, and let $a, b$ be continuity points of $F_X$ with $a < b$. Then
Also, since $X_n \stackrel{P}{\to} X$, with $\varepsilon = b - a > 0$, one has from the inclusion
$$\le 1 - F_{X_n}(a + \varepsilon) + F_{X_n}(a - \varepsilon) \to 0 \quad \text{as } n \to \infty,$$
since
and $a \pm \varepsilon$ are points of continuity of $F_X$ for each $\varepsilon > 0$. Thus $X_n \stackrel{P}{\to} a$. This completes the proof except for the last comment, which is illustrated by the following simple pair of standard counter-examples.
$X_n \stackrel{D}{\to} X$ trivially. This shows that the last implication cannot be reversed in general. Next, consider the first one. Let $\Omega = [0, 1]$, $\Sigma$ = Borel $\sigma$-algebra of $\Omega$, and $P$ = Lebesgue measure. For each $n \ge 1$, express $n$ in a binary expansion, $n = 2^r + k$, $0 \le k < 2^r$, $r \ge 0$. Define $f_n = \chi_{A_n}$, where $A_n = [k/2^r, (k+1)/2^r]$. It is clear that $f_n$ is measurable, and for $0 < \varepsilon < 1$,
But $f_n(\omega) \not\to 0$ for any $\omega \in \Omega$. This establishes all assertions. (If we are allowed to change probability spaces, keeping the same image measures of the random variables, these problems become less significant. Cf. Problem 5(b).)
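The "sliding indicator" example just given can be coded directly: $P[f_n = 1] = 2^{-r} \to 0$, so the sequence converges to 0 in probability, yet $f_n(\omega) = 1$ infinitely often at every $\omega$, so there is no a.e. convergence. A minimal sketch:

```python
import numpy as np

def f(n, omega):
    # f_n = indicator of [k/2^r, (k+1)/2^r], where n = 2^r + k, 0 <= k < 2^r.
    r = int(np.floor(np.log2(n)))
    k = n - 2**r
    return float(k / 2**r <= omega <= (k + 1) / 2**r)

omega = 0.37                                    # any fixed sample point
vals = [f(n, omega) for n in range(1, 4097)]

print("P[f_4096 = 1] =", 2.0 ** -int(np.floor(np.log2(4096))))       # 2^{-12}, tends to 0
print("but f_n(omega) = 1 for", int(sum(vals)), "values of n <= 4096")  # keeps recurring
```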
In spite of the last part, we shall be able to prove the equivalence for a subclass of random variables, namely, if the $X_n$ form a sequence of partial sums of independent random variables. For this result we need to develop probability theory much further, and thus it is postponed until Chapter 4. (For a partial result, see Problem 16.) Here we proceed with the implications that do not refer to "convergence in distribution."
The following result is of interest in many calculations.
Recall that a sequence $\{X_n, n \ge 1\}$ on $(\Omega, \Sigma, P)$ converges $P$-uniformly to $X$ if for each $\varepsilon > 0$, there is a set $A_\varepsilon \in \Sigma$ such that $P(A_\varepsilon) < \varepsilon$ and on $\Omega - A_\varepsilon$, $X_n \to X$ uniformly. We then have
then $X_n(\omega) \to X(\omega)$, i.e., the sequence converges a.e. The other direction is nontrivial.
Thus let $X_n \to X$ a.e. Then there is an $N \in \Sigma$, $P(N) = 0$, and $X_n(\omega) \to X(\omega)$ for each $\omega \in \Omega - N$. If $k \ge 1$, $m \ge 1$ are integers and we define
$$A_{k,m} = \Big\{\omega \in \Omega - N : |X_n(\omega) - X(\omega)| < \frac{1}{m} \text{ for all } n \ge k\Big\},$$
then the facts that $X_n \to X$ on $\Omega - N$ and $A_{k,m} \subset A_{k+1,m}$ imply that $\Omega - N = \bigcup_{k=1}^\infty A_{k,m}$ for all $m \ge 1$. Consequently for each $\varepsilon > 0$ and each $m \ge 1$, we can find a large enough $k_0 = k_0(\varepsilon, m)$ such that $A_{k_0,m}$ has large measure, i.e., $P(\Omega - A_{k_0,m}) < \varepsilon/2^m$. If $A_\varepsilon = \bigcup_{m=1}^\infty (\Omega - A_{k_0(\varepsilon,m),m})$, then
On the other hand, $n \ge k_0(\varepsilon, m) \Rightarrow |X_n(\omega) - X(\omega)| < 1/m$ for $\omega \in A_{k_0,m}$. Thus
for every $m \ge 1$, so that $X_n \to X$ uniformly on $A_\varepsilon^c$. This completes the proof.
by the $p$th mean convergence hypothesis. Note that there is generally no relation between mean convergence and pointwise a.e., since for the latter the random variables need not be in any $L^p$, $p > 0$.
We now specialize the convergence theory if the sequences are partial sums
of independent random variables, and present important consequences. Some
further, less sharp, assertions in the general case are possible. Some of these
are included as problems at the end of the chapter.
At the root of the pointwise convergence theory, there is usually a "maximal inequality" for a set of random variables. Here is a generalized version of Čebyšev's inequality. The latter was proved for only one r.v. We thus start with the fundamental result:
time. It is clear that the $A_k$ are disjoint, $A_k \in \Sigma$, and $A = \bigcup_{k=1}^n A_k$. Let $Y_k = X_k - \mu_k$ and $\tilde S_n = \sum_{k=1}^n Y_k$, so that $E(\tilde S_n) = 0$, $\mathrm{Var}\,\tilde S_n = \mathrm{Var}\,S_n$. Now consider
$$\int_{A_k} (Y_{k+1} + \cdots + Y_n)^2\,dP, \quad \text{since } \tilde S_n = \tilde S_k + \sum_{i=k+1}^{n} Y_i,$$
Remark The only place in the above proof where we use the independence hypothesis is to go from (11) to the next line to conclude that
Any other hypothesis that guarantees the nonnegativity of this term gives the corresponding maximal inequality. There are several classes of nonindependent random variables, including (positive sub-) $L^2$-martingale sequences, giving such a result. This will be seen in the next chapter.
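Kolmogorov's maximal inequality bounds $P[\max_{k\le n}|S_k - E(S_k)| \ge \varepsilon]$ by $\mathrm{Var}(S_n)/\varepsilon^2$ for independent summands. A quick simulation check; the uniform summands and the constants below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, eps = 50, 20_000, 8.0

# Centered i.i.d. summands (uniform on [-1, 1], variance 1/3); S_k are the partial sums.
X = rng.uniform(-1.0, 1.0, size=(reps, n))
S = np.cumsum(X, axis=1)

lhs = np.mean(np.max(np.abs(S), axis=1) >= eps)   # P[max_k |S_k| >= eps], estimated
rhs = n * (1.0 / 3.0) / eps**2                    # Var(S_n) / eps^2
print(lhs, "<=", rhs)
```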
All the strong convergence theorems that follow in this section are due to Kolmogorov.
Proof It should be shown that $\lim_n S_n$ exists a.e. If this is proved, since $\sum_{k=1}^\infty \mu_k$ converges, we get
$$\lim_n \sum_{k=1}^n X_k = \lim_{n\to\infty} S_n + \lim_{n\to\infty} \sum_{k=1}^n \mu_k = X \quad \text{exists a.e.}$$
But the sequence $\{S_n(\omega), n \ge 1\}$ of scalars converges iff it satisfies the Cauchy criterion, i.e., iff $\inf_m \sup_k |S_{m+k}(\omega) - S_m(\omega)| = 0$ a.e. Thus let $\varepsilon > 0$ be given, and by Theorem 5,
It follows that
Letting $\varepsilon \uparrow \infty$, since $\sum_k \sigma_k^2 < \infty$, the right side of (15) goes to zero, so that $\limsup_n |S_n - S_m| < \infty$ a.e. But $|S_n| \le |S_n - S_m| + |S_m|$, so
as $m \to \infty$ for each $\varepsilon > 0$. It follows that $\limsup_n S_n = \liminf_n S_n$ a.e., and the limit exists as asserted.
If we let $m = 0$ in (14) and $X_0 = 0$, then (14) implies (12). It remains to establish mean convergence. In fact, consider for $m < n$, with $\tilde X_k = X_k - \mu_k$,
$$E((S_n - S_m)^2) = E((\tilde X_{m+1} + \cdots + \tilde X_n)^2) = \sum_{k=m+1}^{n} \sigma_k^2 \to 0 \quad \text{as } m, n \to \infty. \qquad (17)$$
But $\sum_{n=1}^{\infty} X_n = X \sum_{n=1}^{\infty} \frac{1}{n}$ diverges a.e. on the set where $X > 0$.
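A concrete instance of the convergence criterion just proved is the random harmonic series $\sum_n \varepsilon_n/n$ with independent signs $\varepsilon_n = \pm 1$: the terms have mean zero and $\sum_n \sigma_n^2 = \sum_n 1/n^2 < \infty$, so the series converges a.e. A simulation sketch (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 1_000_000
signs = rng.choice([-1.0, 1.0], size=N)

# Partial sums of sum_n eps_n / n; by the theorem above they converge a.e.
partial = np.cumsum(signs / np.arange(1, N + 1))
print(partial[[999, 9_999, 99_999, N - 1]])   # successive partial sums settle toward a limit
```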
A partial converse of the above theorem is as follows.
However,
$$\int_{A_{n-1}} X_n^2\,dP = E(\chi_{A_{n-1}} X_n^2) = \sigma_n^2 P(A_{n-1}) \quad \text{by independence of } \chi_{A_{n-1}} \text{ and } X_n,$$
and
$$\int_{A_{n-1}} X_n S_{n-1}\,dP = E(X_n)E(\chi_{A_{n-1}} S_{n-1}) = 0,$$
since $E(X_n) = 0$. Thus by noting that $P(A_{n-1}) \ge P(A_n)$, (18) becomes, with these simplifications and the hypothesis that $|X_n| \le c < \infty$ a.e.,
Since $P(Z) > 0$, (19) implies that $\sum_{k=1}^\infty \sigma_k^2 < \infty$. This yields the last statement and, in view of Theorem 6, completes the proof.
$X_k'$-sequence. Now, the given probability space may not support two such sequences. In that case, we enlarge it by adjunction as explained after Corollary 8 in the last section. The details are as follows.
Let $(\tilde\Omega, \tilde\Sigma, \tilde P) = (\Omega, \Sigma, P) \otimes (\Omega, \Sigma, P)$, and let $X_k, X_k'$ be defined on $\tilde\Omega$ by the equations
Remarks (1) If any one of the three series of the above theorem diverges, then $\sum_{n\ge1} X_n$ diverges a.e. This means the set $[\sum_{n=1}^\infty X_n \text{ converges}]$ has probability zero, so that the zero-one criterion obtains. The proof of this statement is a simple consequence of the preceding results (since the convergence is determined by $\sum_{k\ge n} X_k$ for large $n$), but not of Theorem 1.12.
(2) Observe that the convergence statements on series in all these theorems relate to unconditional convergence. It is not absolute convergence, as simple examples show. For instance, if $a_n > 0$, $\sum_{n=1}^\infty a_n = \infty$, but
Proof By Čebyšev's inequality, (1) follows at once.
Note that if all the $X_n$ have the same distribution, then they have equal moments, i.e., $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma^2$, so that $\sigma^2(S_n) = \sum_{k=1}^n \sigma_k^2 = n\sigma^2$, and $\sigma^2(S_n)/n^2 = \sigma^2/n \to 0$ is automatically satisfied. The result was improved in 1928 by A. Khintchine, by assuming just one moment. For the proof, he used a truncation argument, originally introduced in 1913 by A. A. Markov. Here we present this proof as it became a powerful tool. Later we see that the result can be proved, using the characteristic function technique, in a very elementary manner, and even with a slightly weaker hypothesis than the existence of the first moment [i.e., only with the existence of a derivative at the origin for its Fourier transform; that does not imply $E(X)$ exists].
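A small simulation of the weak law under discussion: the proportion of runs whose sample mean misses $\mu$ by more than $\varepsilon$ shrinks as $n$ grows. The exponential distribution and the constants below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, eps, reps = 1.0, 0.05, 500

# WLLN: P[|S_n/n - mu| >= eps] -> 0 as n -> infinity.
for n in (100, 1_000, 10_000):
    means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - mu) >= eps))
```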
Proof We use the preceding result in the proof for the truncated functions and then complete the argument with a detailed analysis. Let $\varepsilon > 0$, $\delta > 0$ be given. Define
$$U_k^n = X_k \chi_{[|X_k| \le n\delta]}, \qquad V_k^n = X_k \chi_{[|X_k| > n\delta]},$$
so that $X_k = U_k^n + V_k^n$. Let $F$ be the common distribution function of the $X_k$. Since $E(|X_k|) < \infty$, we have $M = E(|X_k|) = \int_{\mathbb{R}} |x|\,dF(x) < \infty$, by the fundamental (image) law of probability. If $\mu = E(X_k) = \int_{\mathbb{R}} x\,dF(x)$ and $\mu_n' = E(U_k^n)$, then
Thus there is $N_1$ such that $n \ge N_1 \Rightarrow |\mu_n' - \mu| < \varepsilon/2$. Note that $\mu_n'$ depends only on $n$, and hence not on $k$, because of the common distribution of the $X_k$. Similarly
By hypothesis $U_1^n, U_2^n, \ldots$ are independent (bounded) random variables with means $\mu_n'$ and variances bounded by $n\delta M$. Let $T_n = U_1^n + \cdots + U_n^n$ and $W_n = V_1^n + \cdots + V_n^n$. Then by the preceding proposition, or rather Čebyšev's inequality,
On the other hand, adding and subtracting $n\mu$ and using the triangle inequality gives
Thus if $n > N_1$, we have, with the choice of $N_1$ after (2), on the set
the following:
Now this holds if $X_i, X_j$ are independent when $i \ne j$. Thus the above proof actually yields the following stronger result, stated for reference.
In our development of the subject, the next result serves as a link be-
tween the preceding considerations and the "strong laws." It was obtained
by A. Rajchman in the early 1930's. The hypothesis is weaker than pairwise
independence, but demands the existence of a uniform bound on variances,
and then yields a stronger conclusion. The proof uses a different technique, of
interest in the subject.
and let $T_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|$. Since as $k \to \infty$ the first term on the right $\to 0$ a.e. (shown above), it suffices to establish $T_n/n^2 \to 0$ a.e. To use the orthogonality property of the $Y_k$, consider $T_n^2$: We have
by Markov's inequality and (12). Thus $\sum_{n\ge1} P[|T_n| > n^2\varepsilon] < \infty$ and the Borel-Cantelli lemma again yields $P[|T_n|/n^2 > \varepsilon, \text{ i.o.}] = 0$. Hence $T_n/n^2 \to 0$ a.e. and by (11) $S_k/k \to 0$ a.e., proving the result.
Hence
because $s_n \to s$ implies that for any $\varepsilon > 0$, there is $n_0[= n_0(\varepsilon)]$ such that $n > n_0 \Rightarrow |s_n - s| < \varepsilon$, and hence
a.e. as $n \to \infty$.
Proof To prove the sufficiency of the first part, suppose $E(|X_1|) < \infty$. We use the truncation method of Theorem 2. For simplicity, let $E(X_1) = 0$, since otherwise we consider the sequence $Y_k = X_k - E(X_1)$. For each $n$, define
Hence
$$\sum_{n=1}^{\infty} \frac{1}{n^2}\,(\cdots) \le \sum_{n=1}^{\infty} \frac{1}{n^2} \sum_{k=0}^{n-1} a_{k,k+1} \quad [\text{using the notation of (15)}].$$
Thus
as $n \to \infty$.
Conversely, suppose that $S_n/n \to \alpha_0$, a constant, a.e. We observe that
so that $\limsup_n (|X_n|/n) = 0$ a.e. Again by the Borel-Cantelli lemma, this is equivalent to saying that $\sum_{n=1}^\infty P[|X_n| > n] < \infty$. But
$$= \sum_{n=1}^{\infty} n\,a_{n,n+1} \quad [\text{as shown for (15)}],$$
since the $X_n$ have the same distribution. Consequently, by the second Borel-Cantelli lemma, we have
$$\limsup_n \frac{|S_n|}{n} \ge \frac{\alpha}{2} \quad \text{on } \Omega - A_\alpha.$$
Letting $\alpha$ run through the rationals and setting $A = \bigcup_{\alpha \in \text{rationals}} A_\alpha$, we get $P(A) = 0$, and on $\Omega - A$, $\limsup_n (|S_n|/n) \ge k$ for every $k > 0$. Hence $\limsup_n (|S_n|/n) = +\infty$ a.e. This completes the proof of the theorem.
The above result contains slightly more information. In fact, we have the
following:
Proof Since the $X_n$ are i.i.d., so are the $X_n'$, $n \ge 1$, and they are clearly independent. Moreover, by the i.i.d. property $P[-x < X_n < x] = P[-x < X_1 < x]$. Indeed
$$= \int_{\mathbb{R}} \chi_{[|x| < x]}\,dF(x), \quad \text{since } X_n \text{ has } F \text{ as its d.f. for all } n \ge 1.$$
If $S_n' = \sum_{k=1}^n X_k'$ and $S_n'' = S_n - S_n'$, then $\{S_n'/n, n \ge 1\}$ is uniformly bounded, so that it is uniformly integrable. But
uniformly in $n$. Thus $\{(1/n)S_n'', n \ge 1\}$ is also uniformly integrable. Consequently $\{(1/n)|S_n|, n \ge 1\}$ is a uniformly integrable set. Hence the result follows by Vitali's theorem and the limits must agree, as asserted.
follows by Vitali's theorem and the limits must agree, as asserted.
These results and their methods of proofs have been extended in various
directions. The idea of investigating the averages (both the WLLN and SLLN)
has served an important role in creating the modern ergodic theory. Here the
random variables X n are derived from one fixed function X1 : R + R in terms
of a measurable mapping T : R + 0 [TP1(C) c C] which preserves measure,
meaning P = P o T p l , or P ( A ) = P ( T p l ( A ) ) ,A E C. Then
>
where T 2 = T O Tand T n = T o T n p l , n 1. Since X1 : R + R and T : f l + R
are both measurable, so that (Xl o T)-I (B) = TP1 ( X P 1(B)) c T P 1(C) c C ,
where B is the Bore1 a-algebra of R, X 2 is an r.v., and similarly X n is an r.v.
For such a sequence, which is no longer independent, the prototypes of the
laws of large numbers have been proved. These are called ergodic theorems.
The correspoiideiits of weak laws are called mean ergodic theorems and those
of the strong laws are termed individual ergodic theorems. This theory has
branched out into a separate discipline, leaning more toward measure theoretic
functional analysis than probability, but still retaining important connections
with the latter. For a brief account, see Section 3 of Chapter 7.
Another result suggested by the above theorem is to investigate the growth of sums of independent random variables. How fast does $S_n$ cross some prescribed bound? The laws of the iterated logarithm are of this type, for which more tools are needed. We consider some of them in Chapters 5 and later. We now turn to some applications.
One of the important and popular applications of the SLLN is to show that the empiric distribution converges a.e. and uniformly to the distribution of the random variable. To make this statement precise, consider a sequence of random variables $X_1, X_2, \ldots$ on $(\Omega, \Sigma, P)$ such that $P[X_n < x] = F(x)$,
$x \in \mathbb{R}$, $n \ge 1$; i.e., they are identically distributed. If we observe "the segment" $X_1, \ldots, X_n$, then the empiric distribution is defined as the "natural" proportion for each outcome $\omega \in \Omega$:
$$F_n(x, \omega) = \frac{1}{n}\{\text{number of } X_i(\omega) < x\}. \qquad (1)$$
Equivalently, let us define
$$P\Big[\lim_{n\to\infty} \sup_{-\infty<x<\infty} |F_n(x, \cdot) - F(x)| = 0\Big] = 1.$$
for all $i \ge 1$. Hence by the (special case of) SLLN, we get
and
$$H_{k,r} = \Big\{\omega : \lim_{n\to\infty} F_n(x_{k,r} + 0, \omega) = F(x_{k,r} + 0)\Big\}.$$
Then by (4), $P(E_{k,r}) = 1 = P(H_{k,r})$, $1 \le k \le r$. Let
then $S \in \Sigma$, because if
Since $r \ge 1$ is arbitrary, the left-side inequality holds if the right-side inequality does, for almost all $\omega$. Hence $\omega \in S \Rightarrow \omega \in E$. Thus $S \subset E$, and the theorem is proved.
Remark: The empiric distribution has found substantial use in the statistical method known as the "Bootstrap". In the theory of statistics, bootstrapping is a method for estimating the sampling distribution of an estimator by "resampling" with replacement from the original sample.
In the proof of the theorem, one notes that the detailed analysis was needed above in extending the a.e. convergence of (4) for each $x$ to uniform convergence in $x$ over $\mathbb{R}$. This extension does not involve any real probabilistic ideas. It is essentially classical analysis. If we denote by $\mathcal{C}$ the class of all intervals $(-\infty, x)$, and denote by
and similarly
$$P\Big[\lim_{n\to\infty} \sup_{A\in\mathcal{C}} |\mu_n(A) - \mu(A)| = 0\Big] = 1.$$
We shall not present a proof of this result, since it needs several other
auxiliary facts related to convergence in distribution, which have not been es-
tablished thus far. However, this result, just as the preceding one, also starts
its analysis from the basic SLLN for its probabilistic part.
$$P\Big[\lim_{n\to\infty} \sup_{-\infty<x<\infty} |f_n(x) - f(x)| = 0\Big] = 1.$$
Proof The argument here is somewhat different from the previous one,
and it will be presented again in steps for convenience. As usual, let E be the
expectation operator.
1. Consider
But, given $\varepsilon > 0$, there is a $\delta_\varepsilon > 0$ such that $|f(x - t) - f(x)| < \varepsilon$ for $|t| < \delta_\varepsilon$, by the uniform continuity of $f$. Thus
since $f$ is bounded. Letting $n \to \infty$, so that $h_n \to 0$, by (i) and (ii) both the second and third terms go to zero. Since the right side is independent of $x$, it follows that $v_n(x) \to f(x)$ uniformly in $x$, as $n \to \infty$.
where $F_n$ is the empiric distribution of the $X_i$. Now using the "inversion formula" (16) for $k$, we can express $f_n$ as follows:
3. The preceding work can be used in our proof in the following manner. By Markov's inequality
where the limit can be brought outside of the $P$-measure by Fatou's lemma. (Note that the sup inside the square brackets is bounded by hypothesis and is a measurable function, by the same argument as in step 2 of the proof of Theorem 1. The existence of the limit in (20) will be proved.) We now show that the right side of (20) is zero, so that (12) results. But if $\|\cdot\|_u$ is the uniform (or supremum) norm over $\mathbb{R}$, then
where we used (19) and the fact that $g_n(x) = E(f_n(x))$, which is again obtained from (19) with $E(\psi_n(u))$. With the same computation, first using the Fubini theorem and then the dominated convergence theorem to interchange integrals on $[-a, a] \times \Omega$, we can pass to the limit as $a \to \infty$ through a sequence under the expectation. Thus
(22)
But by (17), $\psi_n(u) - E(\psi_n(u)) \to 0$ a.e., and since these quantities are bounded, this is also true boundedly. Thus by letting $n \to \infty$ in both sides of (22) and noting that the limits on $a$ and $n$ are on independent sets, it follows that the right side of (22) is zero a.e. By the uniform boundedness of the left-side norms in (22), we can take expectations, and the result is zero.
Thus $E(\|f_n(\cdot) - f(\cdot)\|_u) \to 0$ as $n \to \infty$, and the right side of (20) is zero. This completes the proof.
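The estimator studied above is a kernel-type density estimator built from a smoothing kernel $k$ and bandwidths $h_n \to 0$. The sketch below uses a Gaussian kernel and the bandwidth $h_n = n^{-1/5}$ purely as illustrative assumptions (not necessarily the choices in the text); it shows the sup-norm error shrinking as in (12).

```python
import numpy as np

rng = np.random.default_rng(10)

def kde(x_grid, sample, h):
    # f_n(x) = (1/(n h)) * sum_i k((x - X_i)/h), with a Gaussian kernel k.
    u = (x_grid[:, None] - sample[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return k.mean(axis=1) / h

true_density = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
x_grid = np.linspace(-4.0, 4.0, 401)

for n in (200, 2_000, 20_000):
    sample = rng.normal(size=n)
    fn = kde(x_grid, sample, h=n ** (-1 / 5))             # h_n -> 0, n h_n -> infinity
    print(n, np.max(np.abs(fn - true_density(x_grid))))   # sup-norm error decreases with n
```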
(C) Queueing
If we let $X_k = b_{k-1} - a_k$, then the $X_k$ are i.i.d. random variables, and (23) becomes $W_0 = 0$ and $W_{k+1} = \max(W_k + X_{k+1}, 0)$, $k \ge 0$. Note that whenever $W_k = 0$ for some $k$, the server is free and the situation is like the one at the beginning, so that we have a recurrent pattern. This recurrence is a key ingredient of the solution of the problem of finding the limiting behavior of the $W_k$-sequence. It is called the single server queueing problem.
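The recursion just obtained is immediate to simulate. In the sketch below, exponential service and interarrival times with $E(X_1) = E(b_1) - E(a_1) < 0$ are illustrative assumptions under which, as the theorem below shows, the waiting times stabilize.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000

# Illustrative choice: service times with mean 0.8, interarrival times with mean 1.0,
# so E(X_1) = 0.8 - 1.0 < 0 and the queue is stable.
b = rng.exponential(scale=0.8, size=n)
a = rng.exponential(scale=1.0, size=n)
X = b - a

# Lindley recursion: W_0 = 0, W_{k+1} = max(W_k + X_{k+1}, 0).
W = np.zeros(n + 1)
for k in range(n):
    W[k + 1] = max(W[k] + X[k], 0.0)

print("mean waiting time over the run:", W.mean())
print("fraction of customers who wait:", np.mean(W[1:] > 0))
```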
Consider $S_0 = 0$, $S_n = \sum_{k=1}^n X_k$. Then the sequence $\{S_n, n \ge 0\}$ is also said to perform a random walk on $\mathbb{R}$, and if $S_k \in A$ for some $k \ge 0$ and Borel set $A$, one says that the walk $S_n$ visits $A$ at step $k$. In the queueing situation, we have the following statement about the process $\{W_n, n \ge 0\}$.
Then for each $n \ge 0$, the quantities $W_n$ and $M_n = \max\{S_j, 0 \le j \le n\}$ are identically distributed random variables. Moreover, if $F_n(x) = P[W_n \le x]$, then
$$\lim_{n\to\infty} F_n(x) = F(x)$$
Proof For the first part of the proof we follow Feller (1966), even though it can also be proved by using the method of convolutions and the fact that $W_k$ and $X_{k+1}$ are independent. The argument to be given is probabilistic and has independent interest.
Since $W_0 = 0 = S_0$, we may express $W_n$ in an alternative form as $W_n = \max\{(S_n - S_k) : 0 \le k \le n\}$. In fact, this is trivial for $n = 0$; suppose it is verified for $n = m$. Then consider the case $n = m + 1$. Writing $S_{m+1} - S_k = S_m - S_k + X_{m+1}$, we have, with $\vee$ for "max,"
Hence the statement is true for all $m \ge 0$. On the other hand, $X_1, \ldots, X_n$ are i.i.d. random variables. Thus the joint distribution of $X_1, X_2, \ldots, X_n$ is the same as that of $X_1', X_2', \ldots, X_n'$, where $X_1' = X_n$, $X_2' = X_{n-1}, \ldots, X_n' = X_1$. But the joint distribution of $S_0', S_1', \ldots, S_n'$, where $S_k' = \sum_{i=1}^k X_i'$ ($S_0' = 0$), and that of $S_0, S_1, \ldots, S_n$ must also be the same. This in turn means, on substituting the unprimed variables, that $S_0, S_1, \ldots, S_n$ and $S_i' = S_n - S_{n-i}$, $0 \le i \le n$, have the same joint distribution. But
so that
$$F(x) = \lim_{n\to\infty} F_n(x) = P\Big[\lim_{n\to\infty} \max_{0\le k\le n} S_k < x\Big] = P\Big[\sup_{k\ge 0} S_k < x\Big], \qquad (25)$$
Case (i): $\mu = E(X_1) > 0$: By the SLLN, $S_n/n \to E(X_1)$ a.e., so that for sufficiently large $n$, $S_n > \frac{n}{2}E(X_1)$ a.e. Thus
Case (ii): $\mu = E(X_1) < 0$: Again by the SLLN, $S_n/n \to E(X_1)$ a.e., and given $\varepsilon > 0$ and $\delta > 0$, one can choose $N_{\varepsilon\delta}$ such that $n > N_{\varepsilon\delta}$ implies
This may be expressed in the following manner. Let $\varepsilon > 0$ be small enough so that $E(X_1) + \varepsilon < 0$. Then for $0 < \delta < \frac{1}{2}$, choose $N_{\varepsilon\delta}$ such that with (26),
$$= \limsup_n (X_1 + \cdots + X_n) = X_1 + \limsup_n (X_2 + \cdots + X_n)$$
Since $Y = k_0$ a.e. and $X_1$ is a real nonzero r.v., (29) can hold only if $k_0 = +\infty$. Now $[\limsup_n S_n = +\infty] \subset [\sup_{n\ge0} S_n = \infty]$, and so we are back in the situation treated in case (i), i.e., $F(x) \equiv 0$, $x \in \mathbb{R}$. This completes the proof of the theorem.
The preceding result raises several related questions, some of which are the following. When $E(X_1) < 0$, we saw that the waiting times $W_n \to W$ in distribution, where $W$ is a (proper) r.v. Thus, in this case, if $Q_n$ is the number of customers in the queue when the service of the $n$th customer is completed, then $Q_n$ is an r.v. But then what is the distribution of $Q_n$, and does $Q_n \stackrel{D}{\to} Q$? Since $Q_n$ is no more than $k$ iff the completion of the $n$th customer's service time is no more than the interarrival times of the last $k$ customers, we get
$$P[Q_n \le k] = P[W_n + b_n \le a_{n+1} + \cdots + a_{n+k}]. \qquad (30)$$
The random variables on the right are all independent, and thus this may be calculated explicitly in principle. Moreover, it can be shown, since $W_n \stackrel{D}{\to} W$ and the $b_n$ and $a_n$ are identically distributed, that $Q_n \stackrel{D}{\to} Q$ from this expression.
In Theorem 4 we saw that the behavior of the waiting time sequence is governed
by S_n = sum_{k=1}^n X_k, the sequence of partial sums of i.i.d. random variables.
In Section 2 we considered the convergence of sums of general independent
random variables, but the surprising behavior of i.i.d. sums was not
analyzed more thoroughly. Such a sequence is called a random walk. Here we
include an introduction to the subject that will elaborate on the proof of Theorem
4 and complete it. The results are due to Chung and Fuchs. We refer to
Chung (1974). For a detailed analysis of the subject, and its relation to the
group structure of the range space, see Spitzer (1964).
Thus if X_n, n ≥ 1, are i.i.d., and {S_n = sum_{k=1}^n X_k, n ≥ 1} is a random
walk sequence, let Y = lim sup_n S_n. We showed in the proof of Theorem 4
[Case (iii)] that Y = X_1 + Y and that Y is a "permutation invariant" r.v. Then
this equation implies Y = k_0 a.e. (= ±∞ possibly), by the Hewitt-Savage
zero-one law. If X_1 = 0 a.e., then by the i.i.d. condition all X_n = 0 a.e., so
that S_n = 0 for all n (and Y = 0 a.e.). If X_1 ≢ 0 a.e., then k_0 = −∞ or +∞
only. If k_0 = −∞, then clearly −∞ ≤ lim inf_n S_n ≤ lim sup_n S_n = −∞, so
that lim_{n→∞} S_n = −∞ a.e.; if k_0 = +∞, then lim inf_n S_n can be +∞, in
which case S_n → +∞ a.e., or lim inf_n S_n = −∞ < lim sup_n S_n = +∞. Since
lim sup_n (S_n) = −lim inf_n (−S_n), no other possibilities can occur. In the case
−∞ = lim inf_n S_n < lim sup_n S_n = +∞ a.e. (the interesting case), we can
look into the behavior of {S_n, n ≥ 1} and analyze its fluctuations.
A state x ∈ ℝ is called a recurrent point of the range of the sequence if for
each ε > 0, P[|S_n − x| < ε, i.o.] = 1, i.e., the random walk visits x infinitely
often with probability one. Let R be the set of all recurrent points. A
point y ∈ ℝ is termed a possible value of the sequence if for each ε > 0 there
is a k such that P[|S_k − y| < ε] > 0. We remark that by Cases (i) and (ii)
of the proof of Theorem 4, if E(X_1) > 0 or < 0, then lim_{n→∞} S_n = +∞ or
= −∞, respectively. Thus fluctuations show up only in the case E(X_1) = 0,
when the expectation exists. However, E(|X_1|) < ∞ will not be assumed for
the present discussion.
Theorem 5 For the random walk {S_n, n ≥ 1}, the set R of recurrent
values (or points) has the following description: either R = ∅, or R is a
closed subgroup of ℝ.¹

Proof Let y ∈ R, and let m be such that P[|S_m − y| < ε] > 0. Since y is recurrent,

0 = P[ |S_n − y| < ε finitely often ] ≥ P[|S_m − y| < ε] · P[|S_{m+n} − S_m| ≥ 2ε, n ≥ 1].   (31)

By hypothesis P[|S_m − y| < ε] > 0, and this shows that the second factor of
(31) is zero. But by the i.i.d. hypothesis, S_n and S_{m+n} − S_m have the same
distribution.
It is clear from the above result that 0 plays a key role in the recurrence
phenomenon of the random walk. A characterization of this is available:

Theorem 6 Let {X_n, n ≥ 1} be i.i.d. random variables on (Ω, Σ, P) and let
{S_n, n ≥ 0} be the corresponding random walk sequence. If for an ε > 0 we have

sum_{n=1}^∞ P[|S_n| < ε] < ∞,   (32)

then 0 is not a recurrent value of {S_n, n ≥ 0}. If, on the other hand, for every
ε > 0 the series in (32) diverges, then 0 is recurrent. [It follows
from (36) below that if the series in (32) diverges for one ε > 0, then the same
is true for all ε > 0.]

¹ Indeed, if R ≠ ∅, then because R is a closed subgroup of ℝ, let d = inf{x ∈ R : x >
0}. Then d ≥ 0 and there exist d_n ∈ R, d_n ↓ d. If d = 0, one can verify that
{k d_n : k = 0, ±1, ±2, ... ; n ≥ 1} is dense in ℝ and contained in R + R = R, so R = ℝ. If d > 0, then
{nd : n = 0, ±1, ...} ⊂ R and is all of R. There are no other kinds of closed subgroups. Note
that if R ≠ ∅, every possible value is also a recurrent value of the random walk.
Proof If the series in (32) converges, then the first Borel-Cantelli lemma
implies P[|S_n| < ε finitely often] = 1, so that 0 ∉ R. The second part is
harder, since the events {[|S_n| < ε], n ≥ 1} are not independent. Here one
needs to show that P[|S_n| < ε, i.o.] = 1. We consider the complementary
event and verify that it has probability zero, after using the structure of the
S_n-sequence.
Consider for any fixed k ≥ 1 the event A_m^k, defined as: the walk does not visit
(−ε, ε) from the (m + k)th trial onward, but visits it at the mth trial [from the (m + 1)th to the (m + k − 1)th trials,
it may or may not visit]. Hence A_m^k, A_{m+k}^k, A_{m+2k}^k, ... are disjoint events for
m ≥ 1 and fixed k ≥ 1. Thus

P(A_m^k) + P(A_{m+k}^k) + P(A_{m+2k}^k) + ... ≤ 1.   (34)

But for each k ≥ 1, [|S_m| < ε] and [|S_n − S_m| ≥ 2ε, n ≥ m + k] are independent,
and A_m^k ⊃ [|S_m| < ε] ∩ [|S_n − S_m| ≥ 2ε, n ≥ m + k], k ≥ 1, since |S_n| ≥
|S_n − S_m| − |S_m| ≥ 2ε − ε = ε on the displayed set. Hence, with independence,
(34) becomes
But by hypothesis the series sum_n P[|S_n| < ε] diverges, while by (34) the
left side stays bounded; hence the independence factorization forces

P[|S_n − S_m| ≥ 2ε, n ≥ m + k] = 0,

or P[|S_n| < ε, i.o.] = 1 for any ε > 0. This means 0 ∈ R and completes the
proof of the theorem.
Moreover, the set R of all recurrent values forms a closed subgroup of the
additive group ℝ^k.

The proof of this last statement is the same as that for Theorem 5, which
gives a more precise description of R in the case k = 1.

If R = ∅, the random walk is called transient; it is termed recurrent
(or persistent) if R ≠ ∅.
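The dichotomy is easy to observe numerically. The following small sketch (illustrative only; the simple ±1 step law is an assumption) counts returns of the symmetric walk to the origin:

```python
import random

def returns_to_zero(steps, rng):
    """Count visits of the simple symmetric walk S_n to 0 within `steps` steps."""
    s, visits = 0, 0
    for _ in range(steps):
        s += rng.choice((-1, 1))
        if s == 0:
            visits += 1
    return visits

rng = random.Random(0)
# For this recurrent walk the visit count keeps growing (roughly like sqrt(n));
# a walk with nonzero mean is transient and eventually stops returning.
for n in (10_000, 40_000, 160_000):
    print(n, returns_to_zero(n, rng))
```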
We can now present a sufficient condition for the recurrence of a random
walk; this completes the proof of Case (iii) of Theorem 4.

Remark As noted prior to Theorem 3.2, this condition holds for certain
symmetric random variables without the existence of the first moment. On the
other hand, if E(|X_1|) < ∞, then it always holds by the WLLN (or SLLN).

We shall establish the result with the weaker hypothesis as stated. The proof
uses the linear order structure of the range of S_n. Actually the result itself is
not valid in higher dimensions (≥ 3). It is true in 2 dimensions, but needs a
different method with characteristic functions (cf. Problem 21).
Since b > 0 is arbitrary, (32) fails for each ε > 0, and so {S_n, n ≥ 1} is
recurrent.

It remains to establish (36). Consider, for each integer m, [mε ≤ S_n <
(m+1)ε] and write it as a disjoint union:

[mε ≤ S_n < (m+1)ε] = ⋃_{k=0}^n ([mε ≤ S_n < (m+1)ε] ∩ A_k),   (38)

where A_0 = [mε ≤ S_0 < (m+1)ε] and, for k ≥ 1, A_k = [S_k ∈ [mε, (m+1)ε),
S_j ∉ [mε, (m+1)ε), 0 ≤ j ≤ k−1]. Thus A_k is the ω-set for which S_k falls
in [mε, (m+1)ε) for the first time at step k.

This proves the inequality (36), and hence also the theorem.
Exercises

1. (a) Let (Ω, Σ, P) be a probability space with Ω having at least three
points. If X : Ω → ℝ is a random variable taking three or more distinct
values, verify that 1, X, X² are linearly independent (in the sense of linear
algebra) but can be stochastically independent only if X is two-valued and
X² is a constant with probability 1, in which case 1, X, X² are not linearly
independent. Give an example satisfying the latter conditions. On the other
hand, if X, Y are stochastically independent and not both constant, then
they are linearly independent, whenever X ≠ 0 and Y ≠ 0.
and that for 0 < a < b < c < 1, 1 ≤ i < j ≤ n, the joint density involves the
combinatorial factor

n! / ((i − 1)! (j − i − 1)! (n − j)!).

[Note that for 0 ≤ y_1 < y_2 < ... < y_n ≤ 1 and small enough ε > 0 such that
the intervals [y_i, y_i + ε] are disjoint for 1 ≤ i ≤ n, we have

P[y_i ≤ Y_i ≤ y_i + ε_i, 1 ≤ i ≤ n] = sum_{(i_1,...,i_n)} P[y_j ≤ X_{i_j} ≤ y_j + ε_j, 1 ≤ j ≤ n],

the sum being over all permutations (i_1, ..., i_n) of (1, 2, ..., n); the X_i are i.i.d.
for each permutation, and there are n! permutations.]
(b) Let Z_1, ..., Z_n be i.i.d. random variables on (Ω, Σ, P) with their common
distribution F on ℝ continuous and strictly increasing. If X_i = F(Z_i),
1 ≤ i ≤ n, show that X_1, ..., X_n are random variables satisfying the hypothesis
of (a). Deduce from the above that if Z̃_i is the ith-order statistic of
(Z_1, ..., Z_n), then
have the same distributions, and the (stronger) pointwise convergence is true
for the Y_n-sequence. (Compare this with Proposition 2.2.) [Sketch of proof:
Let F_n(x) = P[X_n < x], F(x) = P[X < x], x ∈ ℝ, n ≥ 1. If Y_n, Y are inverses
to F_n, F, then Y_n(z) = inf{y ∈ ℝ : F_n(y) > z}, and similarly for Y. Clearly
Y_n, Y are Borel functions from (0, 1) to ℝ. Since Y_n(z) < y iff F_n(y) > z,
we have, on letting Ω_1 = (0, 1), Σ_1 = the Borel σ-algebra of Ω_1, with P_1 as the
Lebesgue measure, P_1[Y_n < y] = P_1[x : x < F_n(y)] = F_n(y); and similarly
P_1[Y < y] = F(y). Since F_n(x) → F(x) at all continuity points of F, let x be a
continuity point of F. If the F_n are strictly increasing, then Y_n = F_n^{-1} and the
result is immediate. In the general case, follow the argument of Proposition
2.2, by showing that for a < b ≤ c < d,
a_{n,m}, taking values 0, 1, ..., 9 each with probability 1/10, independent of one
another. (This is where probability enters!) Then each Y_n is uniformly distributed
and they are mutually independent. (Clearly binary or ternary, etc.,
expansions can be used in lieu of the decimal expansion. Unfortunately, no recipe
exists for choosing the a_{n,m} here. A similar frustration was (reportedly) expressed
by A. Einstein regarding his inability to find a recipe for a particular Brownian
particle to be in a prescribed region; only a probability of the event can be
given. [cf. Science, 309 (2005), pp. 865-890, special issue on Einstein's legacy].)
If {F_n, n ≥ 1} is a sequence of distribution functions on ℝ, let F_n^{-1} be the generalized
inverse of F_n as defined (in part (b)) above. Let X_n = F_n^{-1}(Y_n), n ≥ 1.
Then {X_n, n ≥ 1} is a sequence of (mutually independent) random variables
with distributions F_n. [It is even possible to take a single uniformly distributed
random variable Y, by reordering the a_{n,m} into a single sequence {b_k, k ≥ 1} so
that Y = sum_{k≥1} b_k 10^{-k}, excluding the terminating decimal expansions, which
are countable and hence constitute a set of (Lebesgue) measure zero, and
then X_n = F_n^{-1}(Y), n ≥ 1.] It should be observed that in the representation
of X_n as a mapping h of (Y_1, ..., Y_n) [or of Y], which is one-to-one, there
are infinitely many representations, while a unique distribution obtains if the mapping
is nondecreasing, such as F_n^{-1}. This fact is of interest in applications such as
those implied in part (b) above.
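The generalized inverse F^{-1}(y) = inf{x : F(x) > y} used in this construction is easy to implement. A minimal sketch (ours; the discrete target distribution and the function names are illustrative assumptions):

```python
import bisect, random

def generalized_inverse(cdf_grid, values):
    """F^{-1}(y) = inf{x : F(x) > y}, for a distribution supported on `values`
    with cumulative probabilities `cdf_grid` (nondecreasing)."""
    def inv(y):
        i = bisect.bisect_right(cdf_grid, y)   # first index with F > y
        return values[min(i, len(values) - 1)]
    return inv

# Example: sample from a discrete distribution via X = F^{-1}(Y), Y uniform.
values = [0, 1, 2, 3]
probs  = [0.1, 0.2, 0.3, 0.4]
cdf = [sum(probs[: i + 1]) for i in range(len(probs))]
Finv = generalized_inverse(cdf, values)

rng = random.Random(7)
sample = [Finv(rng.random()) for _ in range(100_000)]
print([sample.count(v) / len(sample) for v in values])  # close to probs
```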
The following example is considered by Wiener (in the book cited above,
p. 146). Let Y_1, Y_2 be independent uniformly distributed random variables on
(0, 1), and define R = (−2 log Y_1)^{1/2} and Θ = 2πY_2, and let X_1 = R cos Θ, X_2 =
R sin Θ. Then the Jacobian is easily computed, and one has

dy_1 dy_2 = (1/(2π)) e^{−(x_1² + x_2²)/2} dx_1 dx_2,

so that X_1, X_2 are independent normal random variables generated
by Y_1, Y_2. Extending this procedure, establish the following n-dimensional
version. Let Y_1, ..., Y_n be independent uniformly distributed random variables
on (0, 1), Θ_k = 2πY_{k+1}, and X_1 = R sin Θ_{n−1} ··· sin Θ_2 sin Θ_1, X_2 =
R sin Θ_{n−1} ··· sin Θ_2 cos Θ_1, ..., X_{n−1} = R sin Θ_{n−1} cos Θ_{n−2}, and X_n = R cos Θ_{n−1},
where R = (−2 log Y_1)^{1/2}. The Jacobian is much more difficult [use induction],
but is nonvanishing, giving a one-to-one mapping. (With R = 1, the
transformation has Jacobian (−1)^n (sin θ_1)^{n−1}(sin θ_2)^{n−2} ··· sin θ_{n−1},
so that it is 1-1 between the open unit n-ball and the open rectangle
0 < θ_i < π, i = 1, ..., n.) This shows that the Θ_i-sequence (different from the
F_n) can be somewhat involved, but the procedure is quite general, as noted by
N. Wiener, whose use of it in a construction of Brownian motion is now legendary
and was emphasized by P. Lévy later. [In the last chapter we again consider
the Brownian motion construction with a more recent and (hopefully) simpler
method.]
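The two-dimensional transformation above is what is now commonly called the Box-Muller method, and it is immediate to test. A minimal sketch (ours, not Wiener's):

```python
import math, random

def box_muller(rng):
    """Map two independent uniforms on (0,1) to two independent standard
    normals via R = sqrt(-2 log Y1), Theta = 2*pi*Y2."""
    y1 = 1.0 - rng.random()          # in (0, 1], avoids log(0)
    y2 = rng.random()
    r, theta = math.sqrt(-2.0 * math.log(y1)), 2.0 * math.pi * y2
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(3)
xs = [box_muller(rng) for _ in range(100_000)]
mean = sum(x for x, _ in xs) / len(xs)
var  = sum(x * x for x, _ in xs) / len(xs)
print(mean, var)   # approximately 0 and 1
```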
points x_j ∈ ℝ, and no mixed types can occur. [Hints: Let G ⊂ ℝ be the group
generated by the ranges of the X_n, so that G is countable. Note that for any
Borel set B, the vector sum G + B = {x + y : x ∈ G, y ∈ B} is again Borel.
If Ω_0 = {ω : S_n(ω) → S(ω)}, then let A = {ω : S(ω) ∈ (G + B), ω ∈ Ω_0},
and verify that A is a tail event, so that P(A) = 0 or 1 by Theorem 1.7.
Indeed, if g_1 − g_2 ∈ G, then g_1 ∈ G + B for some Borel set B iff g_2 ∈ G + B.
Now if S_n = S − (S − S_n) ∈ G, then S ∈ G + B iff S − S_n ∈ G + B, and conversely. But
S − S_n → 0 on Ω_0. Hence A = [S − S_n ∈ G + B] ∩ Ω_0, so that A is a tail event,
and P(A) = 0 or 1. This implies either S is countably valued or else, since
P(Ω_0) = 1, P[S ∈ G + B] = 0 for each countable B. In this case P[S ∈ B] = 0
for each countable B, so that S has a continuous distribution, with noncountable
range. Consequently, either the distribution of S is singular relative to the
Lebesgue measure, or it satisfies P[S ∈ G + B] = 0 for all Borel B of zero
Lebesgue measure. Since G is countable, this again implies P[S ∈ G + B] = 0,
so that P[S ∈ B] = 0 for all Lebesgue null sets. This means the distribution
of S is absolutely continuous. To see which type the distribution of S is, one
has to exclude the other two cases, and no recipe is provided by this result.
In fact this is the last result of Jessen and Wintner's long paper (1935).]
(b) To decide among the types above, we need to resort to other tricks, and
some will be noted here. Let {X_n, n ≥ 1} be i.i.d. random variables with
P[X_n = 0] = P[X_n = 1] = 1/2, and show that U = sum_{k≥1} X_k/2^k defines a random
variable on the Lebesgue unit interval [0, 1] with the
uniform distribution F, i.e., F(x) = 0 if x ≤ 0, = x if 0 < x ≤ 1, and F(x) = 1
for x > 1, and hence U has an absolutely continuous distribution. [Hints: Note
that if F_U, F_V are the distributions of U, V, then F_{U+V} can be obtained by
the image law (cf. Theorem 1.4.1) as a convolution:

F_{U+V}(x) = ∫_ℝ F_U(x − y) dF_V(y),

with μ as the Lebesgue measure. Deduce that U has the same distribution
as the identity mapping I : (0, 1) → (0, 1) with Lebesgue measure.] (Explicit
calculation with ch.f.s is easier and will be noted in Exercise 4.11.)
(c) By similar indirect arguments verify the following: (i) If {X_n, n ≥ 1} is
as above, then S_n = sum_{k=1}^n X_k/3^k → S a.e., and S has a singular distribution.
(ii) (P. Lévy) If Y_n, n = 1, 2, ..., are independent with values in a countable
set C ⊂ ℝ, and if there is a sequence of numbers c_n ∈ C with sum_n c_n convergent
such that sum_{n=1}^∞ P[Y_n ≠ c_n] < ∞, then S = sum_{k=1}^∞ Y_k exists a.e., and S takes
only countably many values with positive probability.
(d) The proofs of Theorems 2.6 and 2.7 used the Kronecker lemma and
(C,1)-summability. Thus the Kolmogorov SLLN (Theorem 2.7) can be
considered a probabilistic analog of classical (C,1)-summability, in the
sense that a sequence {X_n, n ≥ 1} of i.i.d. r.v.s on (Ω, Σ, P) obeys (C,1)-summability
pointwise a.e. iff E(X_1) = μ ∈ ℝ exists. Since classical analysis shows that
(C,1)-summability implies (C,p)-summability for p ≥ 1, one can expect a similar
result for i.i.d. sequences. In fact the following precise version holds. Let
μ ∈ ℝ, p ≥ 1. Verify the following equivalences for i.i.d. r.v.s:

(i) {X_n, n ≥ 1} obeys the SLLN;
(ii) E(X_1) = μ;
(iii) {X_n, n ≥ 1} obeys (C,1)-summability a.e. with limit μ;
(iv) {X_n, n ≥ 1} obeys (C,p)-summability a.e. with limit μ,
i.e., lim_n C(n+p, n)^{-1} sum_{k=0}^n C(n−k+p−1, n−k) X_k = μ a.e.;
(v) {X_n, n ≥ 1} obeys Abel summability a.e. with value μ,
i.e., lim_{λ↑1} (1 − λ) sum_{i=1}^∞ λ^i X_i = μ a.e.
[Hints: The classical theories of summability imply that (i) ⇒ (iii) ⇒ (iv)
⇒ (v), and Theorem 2.7 gives (i) ⇔ (ii). So it suffices to show (v) ⇒ (ii).
For ordinary sequences of reals, Abel convergence does not imply even (C,1)-convergence.
(Here the converse holds if the sequence is bounded in addition,
as shown by J.E. Littlewood.) But the i.i.d. hypothesis implies the converse
a.e., as follows. Using the method of Theorem 2.9, called symmetrization, let
X_n^s = X_n − X_n', where X_n and X_n' are i.i.d. (one may use an enlargement of
the basic probability space as in the proof of 2.9, where X_n' is denoted Z_n
there); then (v) can be expressed, taking 1 − λ = 1/m, m ≥ 1, as

lim_{m→∞} (1/m) sum_{i=1}^∞ (1 − 1/m)^i X_i^s = μ − μ = 0 a.e.,

or in an equivalent alternate form.
7. This problem illustrates the strengths and limitations of our a.e. convergence
statements. Let (Ω, Σ, P) be the Lebesgue unit interval, so that
Ω = (0, 1) and P = Lebesgue measure on the completed Borel σ-algebra Σ.
If ω ∈ Ω, expand it in decimals: ω = 0.x_1x_2..., so that if X_n(ω) = x_n, then
X_n : Ω → {0, 1, ..., 9} is an r.v. Verify that {X_n, n ≥ 1} is an i.i.d. sequence
with the common distribution F given by F(y) = (k+1)/10 for k ≤ y < k+1,
k = 0, 1, ..., 9; = 0 if y < 0; = 1 for y > 9. Let δ_k(·) be the Dirac delta function,
and consider δ_k(X_n). Then P[δ_k(X_n) = 1] = 1/10, P[δ_k(X_n) = 0] =
9/10, and the δ_k(X_n), n ≥ 1, are i.i.d., for each k = 0, 1, ..., 9. If k_1, k_2, ..., k_r is
a fixed r-tuple of integers such that 0 ≤ k_i ≤ 9, define (cf. Problem 5(d) also)

ε_{n,r} = δ_{k_1}(X_{(n−1)r+1}) δ_{k_2}(X_{(n−1)r+2}) ··· δ_{k_r}(X_{nr}).

Show that the ε_{n,r}, n ≥ 1, are bounded uncorrelated random variables for
which (1/m) sum_{n=1}^m ε_{n,r} → 1/10^r a.e. as m → ∞ (apply Theorem
3.4), r = 1, 2, .... This means that for a.a. ω ∈ Ω, the ordered set of numbers
(k_1, ..., k_r) appears in the decimal expansion of ω with the asymptotic relative
frequency 1/10^r. Every number ω ∈ Ω for which this holds is called a
normal number. It follows that sum_{n=1}^m ε_{n,r} → ∞ as m → ∞ for a.a. (ω) (as in
the proof of Theorem 4.4); thus ε_{n,r} = 1 infinitely often, which means that the
given set (k_1, ..., k_r) in the same order occurs infinitely often in the expansion
of each normal number, and that almost all ω ∈ Ω are normal. [This fact was
established by E. Borel in 1909.] However, there is no known recipe to decide
which numbers in Ω are normal. Since the transcendental number π − e ∈ (0, 1), it
is not known whether π − e is normal; otherwise it would have settled the old
question of H. Weyl: is it true or false that in the decimal expansion of the
irrational number π, the integers 0, 1, ..., 9 occur somewhere in their natural
order? This question was raised in the 1920s to counter the assertion of the
logicians of Hilbert's school that every statement is either "true" or
"false," i.e., has only two truth values. As of now we do not know the definitive
answer to Weyl's question, even though π has been expanded to over 10^5 decimal
places and the above sequence still did not appear! [See D. Shanks and J.
W. Wrench, Jr. (1962), Math. Computation 16, 76-89, for such an expansion
of π. On the other hand, it is known that 0.123456789101112131415161718...,
using all the natural numbers, is normal. Recently two Japanese computer
scientists seem to have shown that the answer is 'yes' after expanding π to
several billions of decimal places. See, e.g., J.M. Borwein (1998), Math. Intelligencer,
20, 14-15.]
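The asymptotic block frequencies are easy to examine empirically on the number 0.123456789101112... mentioned above. A short sketch (ours; the prefix length is an arbitrary choice):

```python
def digit_block_frequency(digits, block):
    """Relative frequency of an ordered digit block in a digit string."""
    positions = len(digits) - len(block) + 1
    hits = sum(digits[i:i + len(block)] == block for i in range(positions))
    return hits / positions

# Digits of 0.123456789101112... (concatenate the natural numbers),
# which is known to be normal in base 10.
digits = "".join(str(k) for k in range(1, 20000))
for block in ("7", "42", "123"):
    print(block, digit_block_frequency(digits, block))  # near 10**(-len(block))
```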
8. The WLLN of Theorem 3.2 does not hold if (even) the symmetric
moment does not exist. To see this, we present the classical St. Petersburg
game, called a "paradox," since people applied the WLLN without its hypothesis
being satisfied. Let X be an r.v. such that

P[X = 2^n] = 2^{-n}, n ≥ 1,

on (Ω, Σ, P). Let {X_n, n ≥ 1} be i.i.d. random variables with the distribution
of X. If S_n = sum_{k=1}^n X_k, show that S_n/n ↛ a as n → ∞, for any a ∈ ℝ,
either in probability or a.e. for any subsequence. (Use the last part of Theorem
3.7.) The game interpretation is that a player tosses a fair coin until the
head shows up. If this happens on the nth toss, the player gets 2^n dollars. If
any fixed entrance fee per game is charged, the player ultimately wins and the
house is ruined. Thus the "fair" fee would have to be "infinite," and this is the
paradox! Show, however, by the truncation argument, that S_n/(n log_2 n) →^P 2
as n → ∞, where log_2 n is the logarithm of n to base 2. If the denominator is
replaced by h(n) so that (n log_2 n)/h(n) → 0, then S_n/h(n) → 0 in probability and a.e. In
fact, show that for any sequence of random variables {Y_n, n ≥ 1} there exists
an increasing sequence k_n such that P[|Y_n| > k_n, i.o.] = 0, so that Y_n/k_n → 0
a.e. Thus n log_2 n is the correct "normalization" for the St. Petersburg game.
(An interesting and elementary variation of the St. Petersburg game can be
found in D.K. Neal and R.J. Swift (1999), Missouri J. Math. Sciences, 11, No.
2, 93-102.)
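The n log_2 n normalization can be observed by direct simulation of the game. A minimal sketch (ours; the payoff 2^k for a first head at toss k is the classical convention assumed here):

```python
import math, random

def st_petersburg_sum(n, rng):
    """S_n for n independent plays: toss a fair coin until the first head,
    at toss k, and receive 2**k (classical payoff; an assumption here)."""
    total = 0
    for _ in range(n):
        k = 1
        while rng.random() < 0.5:   # tail with probability 1/2
            k += 1
        total += 2 ** k
    return total

rng = random.Random(11)
for n in (10_000, 100_000, 1_000_000):
    s = st_petersburg_sum(n, rng)
    # The ratio stabilizes under the n*log2(n) normalization.
    print(n, s / (n * math.log2(n)))
```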
9. (a) X_n →^P X, Y_n →^P Y ⇒ X_n ± Y_n →^P X ± Y, and X_n Y_n →^P XY.

(b) If f : ℝ² → ℝ is a Borel function such that the set of discontinuities
of f is measurable and is of measure zero relative to the Stieltjes
measure determined by the d.f. F_{X,Y} of the limit vector (X, Y) of (a), then
f(X_n, Y_n) →^D f(X, Y) under either of the conditions: (i) X_n →^P X, Y_n →^P Y,
or (ii) αX_n + βY_n →^D αX + βY for all real α, β. If f is continuous, then
strengthen this to the assertion that f(X_n, Y_n) →^P f(X, Y) if condition (i)
holds. [Hint: For (ii), use Problem 5(b) and the fact that (X_n, Y_n) →^D (X, Y)
iff αX_n + βY_n →^D αX + βY for all real α, β.]
10. Suppose that for a sequence {X_n, n ≥ 1, X} in L¹(P) we have
X_n →^D X. Show that E(|X|) ≤ lim inf_n E(|X_n|), and if, further,
the set is uniformly integrable, then E(X) = lim_n E(X_n). [Hint: Use Problem
5(b) and the image probability Theorem 1.4.1. This strengthening of
Vitali's convergence theorem (and Fatou's lemma) is a nontrivial contribution
of Probability Theory to Real Analysis!]
12. (Kolmogorov). Using the method of proof of Theorem 2.7, show that
if {X_n, n ≥ 1} is an independent sequence of bounded random variables on
(Ω, Σ, P), with common bound M and means zero, then for any d > 0 we have,
with S_n = sum_{k=1}^n X_k,

P[ max_{1≤k≤n} |S_k| ≤ d ] ≤ (M + d)² / Var(S_n).

Deduce that if Var(S_n) → ∞, then for each d > 0, P[|S_n| ≤ d] → 0 as n → ∞.
13. (Ottaviani). Let {X_n, n ≥ 1} be independent random variables on
(Ω, Σ, P) and let ε > 0 be given. If S_n = sum_{k=1}^n X_k and P[|X_k + ... + X_n| ≤ ε] ≥
q > 0, 1 ≤ k ≤ n, show that

P[ max_{1≤k≤n} |S_k| ≥ 2ε ] ≤ (1/q) P[|S_n| ≥ ε].

[Note that if A_1 = [|S_1| ≥ 2ε], and for k > 1, A_k = [|S_k| ≥ 2ε, |S_j| < 2ε, 1 ≤
j ≤ k − 1], then [|S_n| ≥ ε] ⊃ ⋃_k (A_k ∩ [|X_{k+1} + ... + X_n| ≤ ε]). The
decomposition of [max_{k≤n} |S_k| ≥ 2ε] is analogous to that used for the proof of
Theorem 2.5.]
Hence, if sum_{n≥1} (σ_n²/n²) < ∞, conclude that the sequence {X_n, n ≥ 1} obeys
the SLLN. (Thus we need not use Kronecker's lemma.)
given 1 > ε > 0, there is an n_0 [= n_0(ε)] such that m, n > n_0 ⇒ P[|S_n − S_m| > ε] < ε.
Thus P[|S_k − S_m| ≤ ε] ≥ 1 − ε for all m < k ≤ n. Hence, by Problem 13 applied
to the set {X_j, j > m > n_0}, we get

P[ max_{m<k≤n} |S_k − S_m| ≥ 2ε ] ≤ (1/(1−ε)) P[|S_n − S_m| > ε] < ε/(1−ε).

This implies, upon first letting n → ∞ and then letting m → ∞, since
0 < ε < 1 is arbitrary, that {S_k, k ≥ 1} is pointwise Cauchy a.e. and hence
converges a.e.]
[Hints: By Problem 17(b) and the i.i.d. hypothesis, we have, with S_0 = 0 = X_0,
... The converse here is similar, so that the last equivalence follows. It should
be remarked that actually all the implications are equivalences. The difficult
part (the first one) needs additional computations, and we have not yet developed
the necessary tools for its proof. This (harder) implication is due to Hsu
and Robbins (1947), and we establish it later, in Chapter 4.] Show, however,
that what has been given is valid if the symmetry assumption is dropped from
the hypothesis.
20. In the context of the preceding problem, we say [after Hsu and Robbins
(1947)] that a sequence {Y_n, n ≥ 1} of random variables on (Ω, Σ, P)
converges completely if for each ε > 0, (*) sum_{n=1}^∞ P[|Y_n| > ε] < ∞. Show that
complete convergence implies convergence a.e. Also, verify that (*) implies
that the a.e. limit of Y_n is necessarily zero. Establish by simple examples
that the converse fails. [For example, consider the Lebesgue unit interval and
Y_n = n χ_{[0,1/n]}.] Show, however, that the converse implication does hold if
there is a probability space (Ω', Σ', P') and a sequence {Z_n, n ≥ 1} of independent
random variables on it such that P[Y_n < z] = P'[Z_n < z], z ∈ ℝ, n ≥ 1,
21. The following surprising behavior of the symmetric random walk
sequence was discovered by G. Pólya in 1921. Consider a symmetric random
walk of a particle in the space ℝ^k. If k = 1, the particle moves in unit steps
to the left or right, from the origin, with equal probability. If k = 2, it moves
in unit steps in one of the four directions parallel to the natural coordinate
axes with equal probability, which is 1/4. In general, it moves in unit steps in
the 2k directions parallel to the natural coordinate axes, each step with probability
1/(2k). Show that the particle visits the origin infinitely often if k = 1
or 2, and only finitely often for k = 3. (The last is also true if k > 3.) [Hints:
If e_1, ..., e_k are the unit vectors in ℝ^k, so that e_i = (0, ..., 1, 0, ..., 0) with 1
in the ith place, and X_n : Ω → ℝ^k are i.i.d., then

P[X_n = e_i] = P[X_n = −e_i] = 1/(2k), i = 1, ..., k.

Let S_n = sum_{j=1}^n X_j. Then if k = 1, the result follows from Theorem 4.7, and
if k = 2 or 3, we need to use Theorem 4.8 and verify the convergence or
divergence of (35) there. If p_n = P[|S_n| = 0], so that the particle visits 0 at
step n with probability p_n, then the particle can visit 0 only if the positive
and negative steps are equal. Thus p_n = 0 for odd n and p_{2n} > 0. However,
by a counting argument (the "multinomial distribution"), we see that
ν_X : A ↦ ∫_A X dP, A ∈ B,

defines a signed measure and ν_X ≪ P_B. Thus dν_X/dP_B = E^B(X) a.e. [P_B]
exists and is P_B-unique by that theorem. Any member of the P_B-equivalence
class is called a version of the conditional expectation E^B(X), and it is customary
to call E^B(X) an r.v. when a version is meant.
This general concept could not have been formulated before the availability
(before 1930) of the abstract Radon-Nikodým theorem. [Alternatively,
if the martingale convergence for directed indexes is granted, then (1) can
be extended; cf. Problem 30.] Since (3) is a special case of (4), the elementary
definition is included in the present one. We note that in (4), if X is
not integrable [i.e., E(X⁺) = ∞ but E(X⁻) < ∞, or E(X⁺) < ∞ but
E(X⁻) = ∞], then dν_X/dP_B = +∞ (or = −∞) on a set of positive P_B-measure,
so that E^B(X) = E^B(X⁺) − E^B(X⁻) is still defined but need not
be a proper r.v., in contrast with X. Thus the general case is deeper and
not quite intuitive, compared to the elementary formulation (1). Similarly,
P^B : Σ → L^∞(P), the conditional probability, is not a (scalar) measure, but
a vector-valued (bounded) function. These concepts constitute an
enormous generalization of the classical expectation and probability notions.
Just as the definition of independence is given by a system of equations, so are
these conditional notions given by (4) and (5). We now present some simple
properties and also show some of their individual characteristics that are not
possessed by the unconditional concepts. These will give us a clear idea of
their structure.
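When B is generated by a finite partition, E^B(X) reduces to cell averages, which makes the averaging property below concrete. A minimal sketch (ours; the uniform weighting of sample points and all names are illustrative assumptions):

```python
def conditional_expectation(xs, labels):
    """E^B(X) when B is generated by a finite partition: on each cell,
    E^B(X) equals the average of X over that cell (with respect to the
    uniform measure on the sample points, an illustrative assumption)."""
    cells = {}
    for x, lab in zip(xs, labels):
        cells.setdefault(lab, []).append(x)
    means = {lab: sum(v) / len(v) for lab, v in cells.items()}
    return [means[lab] for lab in labels]

X      = [1.0, 3.0, 2.0, 6.0, 4.0]
labels = ['a', 'a', 'b', 'b', 'b']    # partition cells generating B
EX_B = conditional_expectation(X, labels)
print(EX_B)                            # [2.0, 2.0, 4.0, 4.0, 4.0]
# Averaging (defining) property: the total mass of X is preserved.
print(sum(X), sum(EX_B))               # both 16.0
```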
The first consequences are contained in the following:
Proof Definition 1 implies both (i) and (ii); taken together, they say that
E^B is a positive linear operator on L¹(P).

For (iii), if X_1 = E^B(Y) ∈ L¹(Ω, B, P_B), we have to show that ...

Remark Property (iii) is often called the averaging property, and (v) the
commutativity property, of the conditional expectation operator. Item (iv) is
termed the contractivity property. Also, E^B(|X|) = 0 a.e. iff X = 0 a.e., since
Several of the standard limiting operations are also valid for E^B, as shown
by the following result.

= ∫_A X dP = ∫_A E^B(X) dP_B (by definition).
≤ lim sup_n (E^B(X_n) + E^B(Y)) = lim sup_n E^B(X_n + Y).

Thus there is equality throughout. Cancelling the (finite) r.v. E^B(Y), we get
lim_n E^B(X_n) = E^B(X) a.e. Finally, as n → ∞,

This yields the last statement also. The negative statement is clear since, e.g.,
if B = Σ, then X_n →^P X does not imply a.e. convergence.
(iv) Again we have
as n → ∞, by the classical Vitali theorem, since the X_n are uniformly
integrable and the measure space is finite. The last statement is obvious. We
obtain the negative result as a consequence of (the deeper) Theorem 5 below,
finishing the proof.
r.v.s X_n^i, 1 ≤ i ≤ k < ∞, such that for some σ-algebra B_0 ⊂ Σ, E^{B_0}(X_n^i) →
E^{B_0}(X^i) at almost no point as n → ∞. In other words, even if the X_n are
uniformly integrable, the X_n^i satisfy:
X_n ≥ X, so that (recalling X_n, X ≥ 0)

sup_{n≥1} |Z_n| ≤ X and E(X) < ∞.

P[ lim_{n→∞} E^B(X_n − X) = 0 ] = P[ lim_{n→∞} E^B(Z_n) = 0 ] = P[ lim_{n→∞} E^B(Y_n) = 0 ].
We now assert that it is even sufficient to find a suitable two-valued Y_n'-sequence.
More explicitly, we only need to construct a sequence {Y_n', n ≥ 1}
such that Y_n' → Y' = 0, P[Y_n' = a_n] = p_n = 1 − P[Y_n' = 0], 0 < p_n < 1,
a_n > 0, and for each ω ∈ Ω, Y_n'(ω) · Y_m'(ω) = 0 if n ≠ m, sum_{n=1}^∞ p_n = 1, with
U' = sup_{n≥1} Y_n' = sum_{n=1}^∞ Y_n', satisfying

For let A_k be the event that Y_k > U − 1 for the first time, with the Y_k, U
of the last step. This means

Here h_k = sum_i b_i χ_{B_i} may be assumed to be in canonical form, so that
B_i ∩ B_j = ∅, B_i ⊂ A_k, b_i > 0. Hence sup_{k≥1} h_k = sum_{k=1}^∞ h_k, since the A_k are
also disjoint. Thus
and hence P[lim_k E^B(h_k) = 0] ≥ P[lim_k Y_k = 0] ≥ 0, and it suffices to show
that P[lim_k E^B(h_k) = 0] = 0 when h_k → 0, each h_k being a positive simple
function satisfying (12). Since each h_k is in canonical form, it is a sum
Let T(= T_k) = ∪_n N_n, with N_n = {i ≥ 2 : a_i > 2^{n+k}}. Set r = sum_{i∈T} p_i ≤ 1 − p_1 < 1,
and

P[W = 0] = 1 − t, P[W = n] = t_n, n ≥ 1,

P[Z_n = i] = p_i if i ≥ 2, i ∈ N_n; = 0 otherwise,

for n ∈ ℕ, and for i ∈ T,

P[V = i] = P[W = n, Z_n = i] = p_i.   (16)
This proves the assertion. Define Y_n' = a_n χ_{[V=n]}. Then the Y_n' have the two
values 0 and a_n > 0, and, the [V = n] being disjoint events, for each ω only one Y_n'(ω)
is different from zero. Moreover, Y_n' → 0 a.e., and

Thus a_i P[V = i | Z_n = i] = a_i t_n, i ∈ N_n. But t_n ≥ 2^{-n} p_k and, for i ∈ N_n,
a_i > 2^{n+k}, so that a_i t_n > 1. Consequently, if A_n = [Z_n ≠ 1] and

B_n = [ a_i P[V = i | Z_n = i] > 1 for some i ∈ N_n ],

then for n ≥ 1, A_n ⊂ B_n. Though the B_n are not necessarily independent,
the A_n are mutually independent, since the Z_n, n ≥ 0, are. Thus [A_n,
i.o.] ⊂ [B_n, i.o.], and it suffices to show that P[A_n, i.o.] = 1 by verifying that
sum_{n=1}^∞ P(A_n) = ∞ (cf. Theorem 2.1.9(ii)).
By (15), P(A_n) = sum_{i∈N_n} p_i/t_n = r_n/t_n. To show that sum_n (r_n/t_n) = ∞:

Consequently,

and by choice sum_{i∈T_0} p_i a_i = +∞, so that the series in
(19) diverges. Thus in all cases sum_n (r_n/t_n) = +∞. It follows that P[A_n,
i.o.] = 1, which completes the proof of the theorem.
= E^B( lim inf_n (−X_n χ_{[·]}) )

≤ −E^B( −lim inf_n (−X_n) ) a.e. [by (23)].

Since X_n → X a.e., lim sup_n X_n = lim inf_n X_n = X a.e., so that (23) and (24)
imply
Proof Since E^B is faithful (cf. the remark after Proposition 2), these inequalities
follow from the unconditional results. Briefly, (26) is true if X = 0 a.e. Thus
let 0 < E^B(|X|^p) = N_X^p < ∞ a.e., and 0 < E^B(|Y|^q) = N_Y^q < ∞ a.e. Then
the numerical inequality of Eq. (4) of Section 1.3, with α = 1/p, β = 1/q,
and 1 < p < ∞ there (N_X^p = N_X, N_Y^q = N_Y), implies

This is clearly equivalent to (i), called the conditional Hölder inequality. Similarly
(ii) and (iii) are established. Because of its importance, we outline (28)
again in this setting.
Recall that the support line property of a convex function φ at y_0 is written
as [cf. Eq. (13) of Section 1.3]

φ(y) ≥ φ(y_0) + g(y_0)(y − y_0), y ∈ ℝ,   (30)

where g : ℝ → ℝ is in fact the right (or left) derivative of φ at y_0 and
is nondecreasing. (It is strictly increasing if φ is strictly convex.) Take y =
X(ω), y_0 = E^B(X)(ω) in (30). Then we get

which is (28). Under the alternative set of hypotheses of (iii), one has from
(32), since φ is now increasing,

Note that (32) and (32') are valid for any bounded r.v. X, and then the general
case follows by the conditional monotone convergence theorem, so that
E^B(φ(X)) exists. This completes the proof.
Thus let 1 < p < ∞. Then by (28), since |E^B(X)|^p is B-measurable,

this implies (33), and that E^B(X) ∈ L^p for each X ∈ L^p, as asserted.
problems, we present just a sample result. [It is not used in later work.]
Proof The second part has already been established. We need to show
the more involved result Q = E^B. For this we first assert that Q is a positive
operator with the property E(QX) = E(X), X ∈ L¹(Σ), and reduce the
general case to this special result. Indeed, by the density of bounded functions
in L¹(Σ), and the fact that L¹(Σ) is a vector lattice [i.e., X ∈ L¹(Σ) ⇒ X =
X⁺ − X⁻, 0 ≤ X^± ∈ L¹(Σ)], it suffices to establish the result for 0 ≤ X ≤ 1
a.e. Since Q is the identity on its range (because Q² = Q), and 1 ∈ L¹(B), it
follows that Q1 = 1. Then
This implies
If we show that the right sides of these equations in (34) are equal, then we
get the equality of the left sides giving the middle equality below:
With further work, one can prove that the above theorem remains true if L¹(Σ)
is replaced by L^p(Σ), 1 ≤ p < ∞. It is also true if Q is a contractive projection
on L^p(Σ), 1 ≤ p < ∞, p ≠ 2, with Q1 = 1 a.e.; then its range can be shown
to be of the form L^p(B). The result again holds for p = 2 if Q is assumed
positive in addition. These and related characterizations are not essential for
the present work and are not pursued. A detailed account of the latter may
be found in the first author's book (1981).
sum_{n>m} P^B(A_n) → 0 as m → ∞.

These assertions are immediate from Theorem 1.3. Consider (iii) as an illustration:

sum_{n=1}^∞ P^B(A_n) − sum_{n=1}^m P^B(A_n) = sum_{n>m} P^B(A_n) = P^B(∪_{n>m} A_n),

and

∫_Ω P^B(∪_{n>m} A_n) dP = P(∪_{n>m} A_n) → 0 as m → ∞ [using (ii)].

Taking B = Σ (or A_n ∈ B for all n), one sees that the assertion fails if
p = +∞, since P^B(A) = E^B(χ_A) = χ_A, and hence ‖χ_{∪_{n>m}A_n}‖_∞ = 1 ↛ 0 as
m → ∞.
This proposition states that P^B(·) has formally the same properties as the
ordinary measure P. However, each property has an exceptional P-null set,
which varies with the sequence. Thus if B is not generated by a (countable)
partition of Σ, then the collection of these exceptional null sets can have a
union of positive P-measure. This indicates that there may be difficulties in
treating P^B(·)(ω) : Σ → [0, 1] as a standard probability measure for almost
all ω ∈ Ω. Indeed, there are counterexamples showing that P^B(·)(ω) cannot
always be regarded as an ordinary probability function. We analyze this fact
and the significance of property (iii) of the above proposition, since the structure
of conditional probability functions is essential for the subject.
Proof It was already seen that P^B exists satisfying the functional equation
(1) for any σ-algebra B ⊂ Σ. By definition, E^B(χ_A) = P^B(A), and hence, by
the linearity of E^B, we have for each simple function f = sum_{i=1}^n a_i χ_{A_i}, A_i ∈ Σ,

where the integral for simple functions is defined by the sum. It is easily seen
that the integral in (2) is well defined for all simple functions and does not
depend on the representation of f. Thus for such f, there is a P-null set N_f
such that for ω ∈ Ω − N_f,

= lim_{n→∞} E^B(f_n)(ω).
P^B(·)(ω) : Σ → [0, 1]

P[X ∈ B_1 | Y = v] = ∫_{B_1} g(x|v) dx, where g(x|v) = f(x, v) / ∫_ℝ f(u, v) du.   (*)
Now both sides of (7) are σ-additive in A (∈ B × B) for each fixed B_2 ∈ B_2.
Since B × B generates B ⊗ B = Σ, the Hahn extension theorem for σ-finite
measures implies that (7) holds on Σ for each B_2 in B_2. Hence Q(·, ·) is a
Proof The argument uses the fact that the rationals are dense in ℝ (or use
any dense denumerable set in ℝ^n), and properties (i) and (ii) of Proposition
1 are available. Let {r_i, i ≥ 1} ⊂ ℝ be an enumeration of the dense set, which
we take as the rationals, and consider for each ω ∈ Ω,

and the right side always defines a conditional probability distribution by Theorem
5; and (15) says that F_{X|Y}(·|y) is a distribution function (can be chosen
to be such for each y), and F_{X|Y}(a|·) is Borel measurable. Hence we may state
the following (but we show later that there is room for trouble in this calculation!):
= ∫_ℝ P(A|t) P_Y(dt) (by the image probability law)
It is clear that C contains all intervals of the form [a, b) and their intersections,
so that the semiring S of such intervals is in C and ℝ ∈ C. It is a π-class,
and by the monotone convergence theorem it follows that C is also a λ-class.
Hence by Proposition 1.2.8, C ⊃ σ(S) = R. Since C ⊂ R, we have C = R,
and (17) is verified. Note that for the argument here, and for (17), h can be
any F_{X|Y}-integrable function and need not be bounded. This completes the
proof.
Since this equation is true for all bounded Borel h, it follows from the Lebesgue
theory that f_{X,Y}(x, y) = f_{X|Y}(x|y) f_Y(y) for a.a. (x, y) (Lebesgue). Hence the
example given for regular conditional distributions [cf. (6)] is recovered.
E(h(X, Y)) = ∫_{ℝ^m} ∫_{ℝ^n} h(x, y) P_{X|Y}(dx|y) P_Y(dy).
All these statements have already been proved above. Because of (21), the
conditional Hölder and Minkowski inequalities (cf. Theorem 1.9) can also be
obtained using the corresponding classical procedures with Lebesgue-Stieltjes
integrals, once the regular conditional probability theory is available, i.e., if
Theorem 5 is given.

If X, Y are random variables, as above, having an absolutely continuous
(joint) distribution function with density f_{X,Y}, then for any real a, b, and
δ_1 > 0, δ_2 > 0, we have
as δ_1 → 0 or δ_2 → 0. Thus, e.g., P[a ≤ X < a + δ_1, Y = b] = 0. However, by
(14) and (20) one has

On the other hand, using the naive approach [cf. (1) of Section 1],
This result will not be proved. [It is quite involved, and indeed is related to
the existence of a "lifting map" on (Ω, Σ, P). A complete proof can be found
in books on differentiation, and a recent version (and discussion) of it is given
in the first author's monograph (Rao (1993), (2005), Section 3.4).] We do not
need the result here except to draw the reader's attention to the existence of a
nontrivial problem with calculations of regular conditional (probabilities or)
distributions.
Recall that in a partially ordered set I, a subset J is called cofinal if for each i ∈ I
there is a j ∈ J with j ≥ i.
for t ≥ 0. Hence we are in the predicament discussed following (25). Since Y
is obtained by a linear operation on X(t, ·), it is easily verified that E(Y) = 0.
Let E(Y²) = σ² > 0 (σ² < ∞ always), and

so that with (27) the conditional density is obtained as δ → 0 [since the right
side of (28) does not depend on δ]:

However, in this approximation, the fact that X(0) is part of {X(t), t ≥ 0} and
that X(0) = lim_{t→0} X(t) is not used; i.e., part of the information is ignored.
So we remedy this in the next set of approximations.
(ii) Approximation 2(m). Let δ > 0 and let m be a real number, the
slope of a straight line through (0, a) (y_t = a + mt, t ≥ 0). Let A_δ^m = [X(t) :
X(t) passes through the line y = a + mt, of length δ, for some t ≥ 0]. Thus
A_δ^m = {ω : X(t, ω) = a + mt for some 0 ≤ t ≤ δ/(1 + m²)^{1/2}}, and A_δ^m is an event.
Again P(A_δ^m) > 0, and for each m, A_δ^m ↓ A as δ → 0. We now calculate
the "conditional density" p_m(·|a) and show that for each m it is a different
function. First let Y > m; using the procedure of (28) and differentiating
relative to y to obtain the density, we get (on noting that Y and X(0) are
independent, and that the approximation obtains only when a − (y − m)δ <
X(0) ≤ a):

p_m(y|a) = lim_{δ→0} [ p(y) ∫_{a−(y−m)δ}^{a} f(z) dz ] / [ ∫_m^∞ p(v) ( ∫_{a−(v−m)δ}^{a} f(z) dz ) dv + (the analogous term over v < m) ],   (29)

where f(·) is the density of X(0) as in (26) and p(·) is the density of Y given by
(27). To simplify (29), let us find the limits of the numerator (= N_δ)
and denominator (= D_δ) separately:

Using similar calculations for Y < m, and combining them with the
above for (29), one finds
The affirmative answer to (C) has already been indicated in the preceding
discussion. This is a consequence of the general differentiation theory that
can be found, for instance, in the references of Bruckner (1971) or Hayes and
Pauc (1970). There is also a positive solution to (D), but it lies somewhat
Hence the extreme integrands agree a.e. [P] on σ(B ∪ B_2), which is (ii), because
all such sets generate the latter σ-algebra (or use Proposition 1.2.8).

(ii) ⇒ (i) Since B ⊂ σ(B ∪ B_2), and hence E^B = E^B E^{σ(B∪B_2)}, (ii) yields,
for a.a. (ω),

P^B(B_1 ∩ B_2) = E^B(E^{σ(B∪B_2)}(χ_{B_1} · χ_{B_2})).

Using the (π, λ)-class theorem (cf. Proposition 1.2.8), just as in the unconditional
case, we deduce the following result from the above.

Hence P^B(A ∩ B) = P^B(A) P^B(B) a.e. Now, the case B ⊂ B_2 being similar,
the result holds as stated.
Since in the unconditional case B = (∅, Ω), this says that if {X_n, n ≥ 1}
are mutually independent, then the tail σ-algebra 𝒯 and (∅, Ω) are equivalent,
which is Kolmogorov's zero-one law. Another interesting observation is that if the X_n, n ≥ 1, are
independent with the same distribution and if 𝒫 is the σ-algebra of permutable
events relative to {X_n, n ≥ 1}, then by the definition of permutability, we can
conclude that {X_n, n ≥ 1} will still be conditionally independent given 𝒫, since
in Definition 11 only finitely many X_n appear each time. Consequently,
by the above proposition, 𝒫 and 𝒯 are equivalent. Since each event of 𝒯 has
probability zero or one, so must each event of 𝒫. Thus each permutable event
determined by the independent X_n with the same distribution has probability
zero or one, which is the Hewitt-Savage law (cf. Theorem 2.1.12).
it follows that G_{t_n} and B_{t_{n+1}} are also conditionally independent, given B_{t_n}. Then
by Proposition 2.12(iii)

Consider τ_n = {t_1, ..., t_n}. Clearly G_{τ_n} ⊂ G_{t_n} and B_{t_n} ⊂ G_{τ_n}. Hence, applying
the operator E^{B_{t_n}} to both sides of (2) and noting that E^{B_{t_n}} E^{G_{τ_n}} = E^{B_{t_n}}, i.e.,

E^{G_{τ_n}}(P(A|B_{t_n})) = P(A|B_{t_n}) a.e., A ∈ B_{t_{n+1}} [by (2)].

Hence (ii) follows.
(ii) ⇒ (iii) First note that if F is the collection of all finite subsets of
T_1 (i.e., a ∈ F iff a = {u_1, ..., u_n} ⊂ T_1 for some 1 ≤ n < ∞), where
T_1 = {t ∈ T : t ≤ t_1}, then G_{t_1} = σ(∪_{a∈F} G_a), with G_a = σ(X_t, t ∈ a).
Indeed, since each G_a ⊂ G_{t_1}, a ∈ F, G_{t_1} contains the right-side σ-algebra. But
G_{t_1} = σ(∪_{a∈F} G_a), and each B ∈ G_a for some a ∈ F. Hence G_{t_1} is contained
in the right side. Now (ii) implies, by Proposition 2.12(iii) again, that G_a and
B_{t_2} are conditionally independent given B_{t_1} for each a ∈ F. The argument
proceeds as in the case of independent events (see the proof of Theorem 2.1.3).
Thus, to use Proposition 2.12(ii), we introduce two families of sets D and C as
follows. Let D be the class of all finite intersections of events each belonging
to some G_a, a ∈ F. Then G_a ⊂ D, a ∈ F. But this clearly implies
G_{t_1} ⊂ σ(D). It suffices to show, for (iii), that B_{t_2} and σ(D) are conditionally
independent given B_{t_1}. For this, consider the class
(iii) ⇒ (iv) Writing t for t_1 in (iii), we get that G_t and B_{t_2} are conditionally
independent given B_t for any t_2 ∈ T, t_2 > t. From this, by Proposition 2.12(ii),
we deduce that G_t and σ(X_{t_i}, i = 1, ..., n, t_i > t, t_i ∈ T) are conditionally
independent given B_t. Then, by the argument of the preceding paragraph, G_t
and G^t are conditionally independent given B_t, which is (iv).
[by independence of X_1, X_2].

From (1) of Section 2.1, the left side = ∫_{X_1^{-1}(A)} P[X_1 + X_2 ∈ B | X_1] dP_B, where
B = σ(X_1). It follows that (*) holds, since X_1^{-1}(A) is a generator of B. This
can be extended for n ≥ 2 to obtain the following, which we leave to the
reader (cf. also Chung (1974), p. 308):
(ii) For any t_1 < t_2 < ... < t_{n+1} in T and Borel set A ⊂ ℝ,

P([X_{t_{n+1}} ∈ A] | X_{t_1}, ..., X_{t_n}) = P([X_{t_{n+1}} ∈ A] | X_{t_n}) a.e., n ≥ 1;   (4)

(iii) For any s_1 < s_2 < ... < s_n < t < t_1 < ... < t_m in T
and Borel sets A_i ⊂ ℝ, B_j ⊂ ℝ,

for almost all ω ∈ Ω, the exceptional null set depending on r, s, t and A.
[Often Q_{s,t}(A, X_s(ω)) is written as p(X_s(ω), s; A, t) in (7) and interpreted
as the probability of the motion of a particle ω starting at time s from the
state X_s(ω) and moving into a position or state in the set A at time t > s.]
Proof Consider the process {X_u, r ≤ u ≤ s < t, X_t : r, u, s, t in T}. Then
by (3),

P([X_t < x] | X_u, r ≤ u ≤ s) = P([X_t < x] | X_s) a.e.   (8)

Hence, applying the operator E(·|X_r) to both sides of (8), and noting that
σ(X_r) ⊂ σ(X_u, r ≤ u ≤ s), one gets, by Proposition 1.2,

for a σ(X_r)-adapted bounded Y, so that (7) follows from this and (6), completing
the proof.
The preceding proposition implies that for every Markov process its family
of conditional probability functions {p(·, t; ·, s), s < t in T} must satisfy the
Chapman-Kolmogorov equation (6) or (7). It would be surprising if a Markov
process could be characterized by this property, in the sense that the only
conditional probability functions satisfying (6) or (7) are those given by a Markov
process. Unfortunately this is not true, as was first pointed out by P. Lévy
already in 1949. The following simple example, due to W. Feller, illustrates
this point.
P[X_m = j | X_n = i] = 1/3 if m ≠ n; = 1 if i = j, m = n; = 0 otherwise.
Also, {X_1, X_2, X_3} are pairwise independent. They are not Markovian. To see
the latter,

so that the "future" depends not only on the "present" but also on the "past."
But for 1 ≤ i < j ≤ 3, the Q_{i,j}(·, ·) satisfy (7), since

and similarly the other combinations are verified. Note that (7) holds identically
(not a.e.).
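For a finite state space, the Chapman-Kolmogorov equation (7) is simply matrix multiplication, and the uniform transitions of this example compose to themselves. A small sketch (ours, not Feller's computation):

```python
def compose(q1, q2):
    """Chapman-Kolmogorov composition of two transition matrices:
    (q1 q2)[i][j] = sum_k q1[i][k] * q2[k][j]."""
    n = len(q1)
    return [[sum(q1[i][k] * q2[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Uniform transitions on 3 states, as in the example: Q[i][j] = 1/3.
Q = [[1.0 / 3] * 3 for _ in range(3)]
print(compose(Q, Q))   # again the uniform matrix, so (7) holds identically
```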
The preceding example can be extended to an infinite family of random
variables with the same (non-Markovian) properties. Let us assume
that the basic probability space (Ω, Σ, P) is rich enough to support the following
structure. [Actually, it is possible to enlarge the space to suit the
needs. For this, if (Ω̃, Σ̃, P̃) is as in the example, then we let (Ω, Σ, P) =
⊗_{i≥1}(Ω̃_i, Σ̃_i, P̃_i), Ω̃_i = Ω̃, etc., as a product space, and the correctness of
this procedure is a simple consequence of Theorem 3 of the next section.] Let
{X_n, n ≥ 1} be random variables such that X_1, X_2, X_3 are the ones defined
above, and let each successive block of three r.v.s have the same distributions
as these. Thus for any m ≥ 1,

P[X_{3m+1} = i_1, X_{3m+2} = i_2, X_{3m+3} = i_3] = P[X_1 = i_1, X_2 = i_2, X_3 = i_3],

where 1 ≤ i_1, i_2, i_3 ≤ 3 are integers. It then follows immediately that the
family {X_n, n ≥ 1} is non-Markovian, but p_{i_1 i_2} = P[X_{k+1} = i_2 | X_k = i_1] =
1/3, k ≥ 1, and (7) holds. Here one defines X_n at ω ∈ Ω, ω = (ω̃_1, ω̃_2, ...), ω̃_i ∈
Ω̃_i, by the equation
for any bounded Borel f : ℝ → ℝ and any random variable X on (Ω, Σ_{n+1}, P),
where Σ_{n+1} = σ(X_1, ..., X_{n+1}), of the family. Indeed, we have

for any real bounded Borel functions f_1, ..., f_{n+1} and the image measure Q_1
of P given by Q_1(A) = P(X_1^{-1}(A)) = P[X_1 ∈ A]. This is called the initial
distribution of the process {X_n, n ≥ 1}, being the image measure of the first,
or initial, random variable X_1. Taking f_i = χ_{A_i}, A_i ⊂ ℝ Borel, (12) yields an
important expression for the joint distribution of (X_1, ..., X_{n+1}):

P[X_1 ∈ A_1, ..., X_{n+1} ∈ A_{n+1}] = ∫_{A_1} Q_1(dx_1) ∫_{A_2} Q_2(dx_2; x_1) ··· ∫_{A_{n+1}} Q_n(dx_{n+1}; x_n).   (13)
for all Borel sets A ⊂ ℝ. Similarly, the others are obtained. Here Q_{1,2}(·, x_1) is
the same as in (7) with r = 1, t = 2 there. Even though the conditional
distributions {Q_{r,s}(·, ·), r < s} do not uniquely correspond to a Markov process,
as shown in the counterexample above, the (absolute) finite-dimensional or
joint distributions (as n varies) given by (13) are uniquely defined and, as
demonstrated in the next section, they determine a Markov process. In this
sense both (7) and (13) play crucial roles in Markov process work.
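For a finite state space, (13) can be evaluated by propagating the "mass" of the event through the kernels one step at a time. A minimal sketch (ours; the two-state kernel and all names are illustrative assumptions):

```python
def joint_probability(sets, q1, kernels):
    """P[X_1 in A_1, ..., X_{n+1} in A_{n+1}] via the iterated integrals of (13),
    for a finite state space: q1 is the initial distribution, kernels[k][x] is
    the one-step distribution of X_{k+2} given X_{k+1} = x."""
    states = range(len(q1))
    # mass[x] = probability of the event so far, with the current state x
    mass = [q1[x] if x in sets[0] else 0.0 for x in states]
    for A, Q in zip(sets[1:], kernels):
        mass = [sum(mass[x] * Q[x][y] for x in states) if y in A else 0.0
                for y in states]
    return sum(mass)

q1 = [0.5, 0.5]                    # initial distribution on {0, 1}
Q  = [[0.9, 0.1], [0.2, 0.8]]      # a stationary kernel (an assumption)
print(joint_probability([{0}, {0, 1}, {1}], q1, [Q, Q]))   # 0.085
```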
In the preceding discussion, Q_{s,t}(A, x) is a version of P([X_t ∈ A] | X_s = x),
and is a regular conditional distribution of X_t given X_s = x. Since generally
there exist several versions of the conditional probability, how should one
choose a family {Q_{r,t}(·, x) : r < t, x ∈ ℝ} in order that all these measures
simultaneously satisfy (13)? There is no problem if the family is finite or even
countable, since we can find a fixed null set and arrange things so that this
is possible. Also, if the state space is at most countable (i.e., the Markov
chain case), we can take the exceptional set empty and answer the problem
affirmatively. In the general case, no such method is available. To include
all these cases, one assumes the desired property and develops the general
theory. This is called the (Markov) transition probability (a nonconstructive, or
idealistic, procedure) family, defined as follows [we write Q(·, t; x, r) in place of
Q_{r,t}(·, x) for convenience].
A mapping Q : B × ℝ⁺ × ℝ × ℝ⁺ → [0, 1], where B is the Borel σ-algebra
of the state space ℝ, is a (Markov) transition probability if for
each 0 ≤ r < t, x ∈ ℝ:

(i) Q(·, t; x, r) : B → [0, 1] is a probability;
(ii) Q(A, t; ·, r) : ℝ → [0, 1] is B-measurable for each A ∈ B, 0 ≤ r ≤ t;
(iii) for each 0 ≤ r < s < t, one has

Q(A, t; x, r) = ∫_ℝ Q(A, t; y, s) Q(dy, s; x, r),   (14)

identically in all the variables shown. [For this definition, ℝ⁺ can be replaced
by a subinterval of ℝ. But for the following work we use ℝ⁺.]
If μ is an initial probability on B, then substituting μ and Q in (13) one
can generate an n-dimensional probability measure on ℝ^n. We also take
Q(·, t; ·, s) and the conditional distribution of X_t given X_s to be
equal a.e. Thus (14) is just the Chapman-Kolmogorov equation, which is now
assumed to hold identically (without any exceptional sets). It follows from
the work of the next section that there exists a Markov process on a probability
space (Ω, Σ, P) such that μ(A) = P[X_0 ∈ A], and P([X_t ∈ A] | X_s)(ω) =
Q(A, t; X_s(ω), s) for a.a. (ω). [The exceptional null sets depend on A, s, t in
general, and this is why we assume the conditions (i)-(iii) above identically.]
A consequence of the above (strengthened) conditions is that one can
transform Markov process theory into one of functional operations by
means of (14):
ρ(s, t) = E(X_s X̄_t)/E(|X_s|²) if E(|X_s|²) > 0, s < t in T; = 0 otherwise.

[ρ(s, t) will be the correlation if E(|X_s|²) is replaced by [E(|X_s|²) E(|X_t|²)]^{1/2}
and all the X_t have zero means; X̄_t is the complex conjugate of X_t.]
(b) If t_1 < ... < t_n are points of T, and ℳ_n = sp{X_{t_1}, ..., X_{t_n}} is the
linear span, let π(·|ℳ_n) be the orthogonal projection of L²(Σ) onto ℳ_n. Then
the given family is called a Markov process in the wide sense if for each such
collection of points t_1 < ... < t_n, n ≥ 1, of T,

which implies π_s X_t = ρ(s, t) X_s [and (I − π_s) X_t = X_t − ρ(s, t) X_s]. But
and Z_1, ..., Z_n are the order statistics of an r.v. whose density is (23). Note that
in the above, we have also obtained the d.f. of η_k = F(X_k). In fact, 0 ≤ η_k ≤ 1
a.e., and its d.f. is given by

[the sum is over all permutations σ of (1, ..., n)]

= sum_σ ∫ ··· ∫ dF(h_1) ··· dF(h_n) (since the X_i are i.i.d.).

But this is true for all Borel subsets B of A. Hence, by the Radon-Nikodým
theorem, (X_1', ..., X_n') has a density, say f*, given by

f*(x_1, ..., x_n) = n! f(x_1) ··· f(x_n) if x_1 < x_2 < ... < x_n; 0 otherwise.   (26)

If 1 ≤ i_1 < i_2 < ... < i_k ≤ n is a subset of the set (1, 2, ..., n), then the
marginal density of X_{i_1}', ..., X_{i_k}', say g_{i_1,...,i_k}(x_1, ..., x_k), is obtained from
(26) by integrating f*, for fixed x_1 < x_2 < ... < x_k, relative to all the x over
the following set

This gives (22). The details of this elementary integration are left to the reader.
It remains to establish the special properties of the Y_n- and Z_n-sequences.
First consider the Z_n. Let U_k = Z_k − Z_{k−1} (with Z_0 = 0 a.e.), k = 1, ..., n.
Then U_k ≥ 0, and Z_1 = U_1, Z_2 = U_1 + U_2, ..., Z_n = sum_{k=1}^n U_k. The mapping
from the range space of (Z_1, ..., Z_n) to that of (U_1, ..., U_n) is one-to-one,
and we shall compute the joint distribution of U_1, ..., U_n and show that they
are mutually independent. Since, as seen in Example 3, each partial-sum
sequence of independent random variables is a Markov process, both properties
announced for the Z-sequence are established at once. Now with (23), the
joint distribution of Z_1, ..., Z_n is given from (26) as

Since B is an arbitrary Borel set in the positive orthant of ℝ^n, the range
space of (U_1, ..., U_n), it follows from (27), by means of the Radon-Nikodým
theorem, that the U_k have a density function h given by

h(u_1, ..., u_n) = n! prod_{j=0}^{n−1} e^{−(n−j) u_{j+1}} if u_1 > 0, ..., u_n > 0; 0 otherwise.

Since the density factors, it results that the set {U_1, ..., U_n} is a mutually
independent family (U_{j+1} with density = constant · e^{−(n−j) u_{j+1}} for u_{j+1} ≥ 0).
Thus the Z-sequence is Markovian, and is in fact of independent increments.
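The factorization of h can be checked by simulation: suitably rescaled spacings of exponential order statistics behave like i.i.d. standard exponentials. A sketch under that (standard exponential) assumption:

```python
import random

def rescaled_spacings(n, rng):
    """Sort n i.i.d. standard exponentials and return (n-k)*U_{k+1}, where
    U_{k+1} = Z_{k+1} - Z_k are the spacings; by the factorization of h above,
    these rescaled spacings are again i.i.d. standard exponentials."""
    z = sorted(rng.expovariate(1.0) for _ in range(n))
    u = [z[0]] + [z[k] - z[k - 1] for k in range(1, n)]
    return [(n - k) * u[k] for k in range(n)]

rng = random.Random(5)
samples = [rescaled_spacings(10, rng) for _ in range(20000)]
# Each rescaled spacing should have mean close to 1 (standard exponential).
for k in range(10):
    print(k + 1, sum(s[k] for s in samples) / len(samples))
```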
It only remains to verify the wide-sense Markov property for the real Y_n-sequence.
Let r(i_1, i_2) = E(Y_{i_1} Y_{i_2}). Now using (24) in (26) or (22) for 1 ≤ i_1 < i_2 ≤
n, we get, on substitution, the densities g_{i_1} and g_{i_1,i_2} of Y_{i_1} and (Y_{i_1}, Y_{i_2});
from these one computes r(i_1, i_2), and similarly the other moments.
The mysterious condition (3) simply states that the intersection of the sets
inside P(·) in (1) is commutative, and thus the determination of F does not
depend on the ordering of the indices of T. Equations (2) and (3) can be
put compactly into a single relation, namely: if for any set α_n = (t_1, ..., t_n)
of T we write ℝ^n = ×_{t_i∈α_n} ℝ_{t_i}, ℝ_{t_i} = ℝ, B^n = ⊗_{t_i∈α_n} B_{t_i}, B_{t_i} = B, the Borel σ-algebra
of ℝ, and π_{n,n+1} : ℝ^{n+1} → ℝ^n is the coordinate projection [i.e.,
π_{n,n+1}(x_1, ..., x_{n+1}) = (x_1, ..., x_n)], then ℝ^{n+1} = π_{n,n+1}^{-1}(ℝ^n), B^{n+1} ⊃
π_{n,n+1}^{-1}(B^n). Moreover, if we define

then (2) and (3) can be combined into the single statement that
measures defined by the right side of (13) in the preceding section can be shown
to be a compatible family as well. A class of distributions {F_n, n ≥ 1} which
cannot be so factored is the following "multinomial" (discrete) distribution:

where α_n = (1, 2, ..., n) and
Proof Since inverse mappings preserve all set operations (i.e., unions,
intersections, differences, and complements) for any collection [which need
not be countable, e.g., π_α^{-1}(∩_β A_β) = ∩_β π_α^{-1}(A_β)], it follows that π_α^{-1}(Σ_α)
is a σ-algebra on Ω, α ∈ F, and if α_1 < α_2, then π_{α_1}^{-1}(Σ_{α_1}) ⊂ π_{α_2}^{-1}(Σ_{α_2}). This
immediately implies that Σ_0 is an algebra (and in general not a σ-algebra).
We now define P on Σ_0. The compatibility of the P_α was already verified
preceding the statement of the proposition, and it is essential for the definition
of P.

Let A ∈ Σ_0. Then A ∈ π_α^{-1}(Σ_α) and perhaps also A ∈ π_β^{-1}(Σ_β). Thus
there are B_1 ∈ Σ_α, B_2 ∈ Σ_β such that A = π_α^{-1}(B_1) = π_β^{-1}(B_2). Let γ =
α ∪ β ∈ F, so that γ > α, γ > β. Then we know from the definition of the
π_α above that π_α = π_{αγ} ∘ π_γ and π_β = π_{βγ} ∘ π_γ. Consequently A can be
represented in two ways:

argument, and the result is thus valid for any compatible family {P_α, α ∈
F}. The proof is complete.
The main problem now is to show (under suitable conditions) that P is
σ-additive on Σ_0, so that it has a unique σ-additive extension P to σ(Σ_0)
(= Σ, say). The triple (Ω, Σ, P) is then the desired probability space, giving
our first existence result. We now establish this property for the product case.
and deduce that there is an ω_2^0 ∈ Ω_{i_2} such that g_n(ω_2^0) ↛ 0. Repeating this
procedure, we see that χ_{B_n}(ω_1^0, ..., ω_k^0, ω_{i_{k+1}}, ..., ω_{i_n}) cannot be zero for all
points (ω_{i_{k+1}}, ..., ω_{i_n}) ∈ Ω_{i_{k+1}} × ... × Ω_{i_n}. Thus there exists an ω^0 ∈ Ω such
that ω^0 = (ω_1^0, ..., ω_k^0, ω_{i_{k+1}}^0, ...) ∈ A_n for any n > k. If β = ∪_{n≥1} α_n, we
can choose ω^0 such that its countable set of components corresponds to the
countable set β, with the rest chosen arbitrarily in ×_{i∈I−β} Ω_i. Then, by the form of
the A_n (they are cylinders), ω^0 ∈ A_n for all n ≥ 1, so that ω^0 ∈ ∩_{n=1}^∞ A_n.
This shows that P(A_n) ≥ δ > 0 implies ∩_n A_n ≠ ∅, and hence P is σ-additive
on Σ_0. Thus (Ω, Σ, P) exists, where P is the unique extension of P onto σ(Σ_0).

Remark The space (Ω, Σ, P) is called the product of {(Ω_i, Σ_i, P_i), i ∈ I},
and usually it is denoted (Ω, Σ, P) = ⊗_{i∈I}(Ω_i, Σ_i, P_i).
= prod_{k=1}^n P(A_k) (since P is a product measure).   (8)

Thus (8) is (7), and (9) proves independence, since it is true on Σ_0 and hence,
by the (π, λ) criterion (cf. Theorem 2.1.3), on all of Σ. This completes the
proof of the theorem.
1 ≤ i ≤ n. The P_{α_n} need not be product measures. Indeed, (11) is just the
statement that P_{α_n} = P_{α_{n+1}} | Σ_{α_n}. (Verify this.)

Suppose now that we are given an initial probability P_1 on Σ_1. Let us then
say that P_{α_n} is a productlike measure on Σ_{α_n}, where α_n = (1, ..., n) as before,
if there exist mappings P_n(·, ·) : Σ_n × Ω_{α_{n−1}} → ℝ⁺ such that

The right side of (13) is just the right side of (12) when regular conditional
probabilities exist. We now present the desired extension using the above
notation.
One of the interesting features of this result is that Jessen's theorem extends
to certain nonindependent (but productlike) cases. Again, no topological
conditions intervene, but the existence of regular conditional
probabilities is assumed instead. We now show that this result implies the
existence of the Markov processes that were discussed in the last section.

Then there is a unique probability P : Σ → [0, 1] and a Markov process {X_n, n ≥
1} on (Ω, Σ, P) such that P[X_1 ∈ A] = P_1(A), A ∈ Σ_1, and for each A_{α_n} ∈
Σ_α of the form A_{α_n} = A_1 × ... × A_n, α_n = (1, 2, ..., n), we have
This result implies that all random families considered in the last section
exist. Regarding both Theorems 5 and 6, the reader may have noticed the
special role played by the availability of a minimum value in the index set.
The proof is almost identical to that of Theorem 5, and is left to the reader.
$P \circ \pi_\alpha^{-1}(A_\alpha) = \int_{A_{t_1}} P_{t_1}(d\omega_{t_1}) \int_{A_{t_2}} P_{t_2,t_1}(d\omega_{t_2};\omega_{t_1}) \cdots \int_{A_{t_n}} P_{t_n,t_{n-1}}(d\omega_{t_n};\omega_{t_{n-1}}),$  (18)
where $A_\alpha = A_{t_1} \times \cdots \times A_{t_n} \in \mathcal{C}_\alpha$, $\alpha = (t_1, \ldots, t_n) \subset T$. In fact, $X_t(\omega) = \omega_t \in \Omega_t$, $t \in T$, $\omega = (\omega_t, t \in T) \in \Omega$, defines the above Markov process.
This result follows immediately from Theorem 7, and the fact that (17) implies the consistency of the system defined by (18). Note that if the minimal element exists, then we can always start, for each $\alpha \in \mathcal{F}$, from the minimal $t_0 \in T$. The modifications are simple, and again are omitted (given as Problem 23).
Observe that we strengthened (17) so as to be valid for all $x \in \Omega_\alpha$, not just a.a. $x \in \Omega_\alpha$. This is useful in constructing Markov processes from each given starting point; i.e., in case (i'), $P_{t_0}$ may be replaced by $P_{x_0}[= P_{t_0}(\cdot | x_0)$ with $P_{t_0}(A | x_0) = \chi_A(x_0)]$. With this result we now have the complete existence theory for the work of Section 3. However, the demand for regular conditional probabilities is not always fulfilled naturally unless we have topologies on the $\Omega_t$-spaces with which to use the theory of Section 2. In those cases (with topologies) we can present a more general and natural proposition for applications. This is due to Kolmogorov (1933), and its extension to Bochner (1955). These results are sufficient for all the processes we deal with in this book, and in fact for essentially all stochastic theory. We thus turn to them. They again may be skipped at a first reading.
and by induction
Proposition 2 Let $\{X_n, \mathcal{B}_n, n \ge 1\} \subset L^1(P)$. Then it is a martingale iff $X_n$ can be expressed as $X_n = \sum_{k=1}^{n} Y_k$, where $E^{\mathcal{B}_k}(Y_{k+1}) = 0$ a.e., $k \ge 1$. Moreover, for a martingale sequence $\{X_n, \mathcal{B}_n, n \ge 1\}$, if each $X_n$ is in $L^2(P)$, then its increments $\{Y_{n+1} = X_{n+1} - X_n, n \ge 1, Y_1 = X_1\}$ form an orthogonal sequence. [The $Y_n$, $n \ge 1$, are also termed a martingale difference sequence.]
Proof For a martingale $\{X_n, \mathcal{B}_n, n \ge 1\}$, if we set $Y_n = X_n - X_{n-1}$, $n > 1$, and $Y_1 = X_1$, then
Conversely, if the condition holds, then for each $n \ge 1$, since $X_{n+1} = \sum_{k=1}^{n+1} Y_k = Y_{n+1} + \sum_{k=1}^{n} Y_k = Y_{n+1} + X_n$, then
Hence $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a martingale.
If the martingale is square integrable, then for the increments sequence $\{Y_n, n \ge 1\}$ we have, with $m < n$,
Hence it is a martingale.
For the second statement, note that
$= \int E^{\mathcal{B}_m}\big(\chi_{[\cdot]}\, X_n\big)\, dP$, (because $X_m$ is $\mathcal{B}_m$-adapted),
Proof The assertions follow from the conditional Jensen inequality. Indeed, by hypothesis
$E^{\mathcal{B}_n}(X_{n+1}) \ge X_n$ a.e.
provided $E(\phi(X_n)) < \infty$ for all $n \le n_0$. Since $\phi$ is convex, there is a support line such that $ax + b \le \phi(x)$ for some real $a, b$ and all $x$. Hence $E(\phi(X_n)) \ge aE(X_n) + b > -\infty$ for each $n \ge 1$. Thus if $E(\phi(X_n)) < \infty$, then $|E(\phi(X_n))| < \infty$ for $1 \le n \le n_0$, and (6) implies $\{\phi(X_n), \mathcal{B}_n, 1 \le n \le n_0\}$ is a submartingale, as asserted.
Since $\epsilon$, $n_0$ are arbitrary, (8) plus additivity of $\nu_n$ give $\nu_n\big(\bigcup_k A_k\big) = \sum_{k=1}^{\infty}\nu_n(A_k)$, and hence $\nu_n$ is $\sigma$-additive on $\mathcal{B}_n$ for each $n$. But each $\mu_n$ is $P$-continuous, and thus it follows that $\nu_n$ is also $P$-continuous. By the Radon-Nikodým theorem (cf. 1.3.12ii), we may define $X_n^{(2)} = d\nu_n/dP_{\mathcal{B}_n}$ on $\mathcal{B}_n$. Then the fact that $\nu_n(A) = \nu_{n+1}(A)$ for all $A \in \mathcal{B}_n$ implies
The integrands are $\mathcal{B}_n$-measurable. Hence $X_n^{(2)} = X_n^{(1)} - X_n \ge 0$ a.e., and thus $\{X_n^{(2)}, \mathcal{B}_n, n \ge 1\}$ is also a positive martingale, and this finishes the proof.
The next result extends Kolmogorov's inequality in two ways. The extensions are due to Doob, and to Hájek and Rényi. These are frequently used in martingale analysis. [Here (11), (12), and (14) are due to Doob.]
and
(12)
(ii) If, moreover, $X_n \ge 0$ a.e. for each $n$, we have for $\lambda > 0$, and $a_n \downarrow 0$,
$E\Big(\max_{k \le n} X_k^p\Big) \le \begin{cases} q^p\, E(X_n^p), & \text{if } p > 1, \\ \dfrac{e}{e-1}\big[1 + E(X_n \log^+ X_n)\big], & \text{if } p = 1. \end{cases}$  (14)
This gives (11). For (12), we consider $N = [\min_{k \le n} X_k \le \lambda]$, and set $N_1 = [X_1 \le \lambda]$. If $1 < k \le n$, let $N_k = [X_k \le \lambda,\ X_i > \lambda,\ 1 \le i \le k-1]$. Thus $N_k \in \mathcal{B}_k$, the $N_k$ are disjoint, and $N = \bigcup_{k=1}^{n} N_k$. Hence $N_k \subset N_1^c$ and
$= \lambda P(N_1) + \int_{N_2} X_2\, dP + \int_{N_1^c \cap N_2^c} X_2\, dP$ (since $N_2 \subset N_1^c$)
Since $X_n \ge 0$ and $a_n - a_{n+1} \ge 0$, we have
If $\|Y\|_p = 0$, then the inequality is true; if $\|Y\|_p > 0$, dividing both sides by this number, (15) reduces to (14) in this case.
If $p = 1$, we let $Z = (Y - 1)\chi_{[Y \ge e]}$ and calculate (since $(Y - 1)\chi_{[Y < e]} \le e - 1$), so that (†) becomes
$E(Y) - 1 \le E\big(X_n \log^+ X_n\big) + \tfrac{1}{e}E(Y).$
and
iiif X,
XP'n21
< A] > liin sup X L d P + E(X1)
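The maximal inequalities above are easy to probe by simulation. Here is a minimal Python sketch (NumPy assumed; the simple random walk, the horizon, and the choice $p = 2$ are arbitrary illustrative choices) of the case $p > 1$ of (14), applied to the positive submartingale $|S_k|$:

import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 20000
steps = rng.choice([-1.0, 1.0], size=(trials, n))
S = np.cumsum(steps, axis=1)            # martingale S_k
X = np.abs(S)                           # positive submartingale |S_k|

p = 2.0
q = p / (p - 1.0)                       # conjugate exponent, q = 2 here
lhs = np.mean(np.max(X, axis=1) ** p)   # E( max_{k<=n} X_k^p )
rhs = q ** p * np.mean(X[:, -1] ** p)   # q^p E( X_n^p )
print(f"E(max X_k^2) = {lhs:.1f}  <=  q^p E(X_n^2) = {rhs:.1f}")

The sample estimate of the left side is indeed dominated by the right side, in accordance with (14).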
First Proof To begin with, we establish that $X_n \to X'$ a.e., and then that $X' = X_\infty$ a.e. Thus let $X_* = \liminf_n X_n$ and $X^* = \limsup_n X_n$. Then $X_* \le X^*$ and both are $\mathcal{B}_\infty$-measurable. For the first assertion it suffices to show that if $B = [X_* < X^*]$, then $P(B) = 0$. Equivalently, if $B_{r_1,r_2} = [X_* < r_1 < r_2 < X^*]$, so that $B = \bigcup\{B_{r_1,r_2} : r_1, r_2 \text{ rationals}\}$, then $P(B_{r_1,r_2}) = 0$, since the union is countable. [We used the fact that $\nu$ is $P$-continuous $\Rightarrow$ $\nu_n$ is $P_n$-continuous and so $X_n = d\nu_n/dP_n$.]
Let $a, b$ be in $\mathbb{R}$ and consider $H_a = [X_* < a]$, $K_b = [X^* > b]$. Then $H_a \in \mathcal{B}_\infty$, $K_b \in \mathcal{B}_\infty$. We assert that
Since $\nu_1(\Omega) < \infty$, on letting $n \to \infty$, (21) reduces to (i) of (20) if $A \in \bigcup_n \mathcal{B}_n$. For the general case, let $\rho(A) = aP(H_a \cap A) - \nu(H_a \cap A)$. Then $\rho$ is a real
Since $m \ge 1$ is arbitrary, this shows that $\nu_\infty(A) = \int_A X'\, dP$, $A \in \mathcal{B}_\infty$, and then, as in the preceding argument, the $\sigma$-additive function $\nu_\infty(\cdot) -$
Since $X_\infty, X'$ are $\mathcal{B}_\infty$-adapted, $X' = X_\infty$ a.e. Moreover, by the same Vitali theorem, $E(|X_n - X_\infty|) \to 0$ as $n \to \infty$. This proves the theorem completely.
$E^{\mathcal{B}_n}(Y_m) = Y_n$ a.e. for all $n \ge n_0$, we have for $n \ge m \ge n_0$
$|X_n - X_m| \le |Y_n - Y_m| + E^{\mathcal{B}_n}(|X_m - Y_m|)$ a.e.
$\le E^{\mathcal{B}_n}(|X_m - Y_m|) + E^{\mathcal{B}_m}(|X_m - Y_m|)$ a.e.,
$\le \lim_m P\Big[\sup_{k \le n} E^{\mathcal{B}_k}(|X_m - Y_m|) > \lambda/2\Big]$
The preceding argument shows, for any integrable r.v. $Z$ such that $X_n = E^{\mathcal{B}_n}(Z)$, $n \ge 1$ (cf. also Proposition 3), that $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a uniformly integrable martingale. Conversely, given any uniformly integrable martingale $\{Y_n, \mathcal{B}_n, n \ge 1\}$, define
Then (23) shows that $\nu_n = \nu_{n+1} | \mathcal{B}_n$. Hence we may define $\nu : \bigcup_{n \ge 1}\mathcal{B}_n \to \mathbb{R}$ by setting $\nu(A) = \nu_n(A)$ if $A \in \mathcal{B}_n$, and this gives $\nu$ unambiguously on the algebra $\bigcup_{n \ge 1}\mathcal{B}_n$, and it is additive there. The uniform integrability now additionally implies that $\nu$ is $\sigma$-additive on this algebra, and hence has a unique $\sigma$-additive extension to $\mathcal{B}_\infty$. Thus for each such martingale $\{X_n, \mathcal{B}_n, n \ge 1\}$, we can associate a compatible system $\{\Omega, \mathcal{B}_n, \nu_n, P, n \ge 1\}$ which determines a signed measure space $(\Omega, \mathcal{B}_\infty, \nu)$. Here $\pi_n : \Omega \to \Omega_n = \Omega$ are identity mappings. This exhibits an inherent relation between martingale theory and the existence theory of Kolmogorov and Bochner (cf. Theorem 4.10). This seemingly simple connection actually runs much deeper between these two theories. An aspect of this is exemplified in the second proof below. However, if the sequence $\{X_n, \mathcal{B}_n, n \ge 1\}$ is merely a martingale (but not uniformly integrable), then $\nu : \bigcup_{n \ge 1}\mathcal{B}_n \to \mathbb{R}$ is still uniquely defined, but is only a finitely additive function. Finally, note that in Theorem 7, the sequence $\{X_n, \mathcal{B}_n, 1 \le n \le \infty\}$ is a martingale, so that $X_n = E^{\mathcal{B}_n}(X_\infty)$, $n \ge 1$. If there is such an r.v. $X \in L^1(P)$ with $X_n = E^{\mathcal{B}_n}(X)$, then the martingale $\{X_n, \mathcal{B}_n, E^{\mathcal{B}_n}(X), n \ge 1\}$ is said to be closed on the right.
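A closed (hence uniformly integrable) martingale is easy to visualize numerically. A minimal Python sketch (NumPy assumed; the coin-tossing setup is an arbitrary illustrative choice) of $X_n = E^{\mathcal{B}_n}(Z)$, with $Z$ the fraction of heads in $N$ fair tosses and $\mathcal{B}_n$ generated by the first $n$ tosses:

import numpy as np

rng = np.random.default_rng(1)
N = 1000
tosses = rng.integers(0, 2, size=N)      # one sample point omega
S = np.cumsum(tosses)
n = np.arange(1, N + 1)
X = (S + (N - n) * 0.5) / N              # X_n = E(Z | B_n) along this omega
Z = S[-1] / N
print("X_10, X_100, X_1000, Z:", X[9], X[99], X[999], Z)

Along each sample path the conditional expectations $X_n$ settle down to the closing random variable $Z$, as the theory predicts.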
We shall now present the general martingale convergence theorem, again with two proofs. The first one is direct, in the sense that it is based only on Theorem 6i after a preliminary simplification. The second one, based on an application of Theorem 4.10, is a reduction of the proof to that of the preceding theorem of Andersen and Jessen. It therefore shows that both these
results are equivalent, although this connection lies somewhat deeper. There are several other proofs of both these theorems (and also of their equivalence assertion), but we shall present a relatively simple argument. However, all these different proofs have independent interest, since they lead to various extensions of the subject.
Theorem 8 (Doob) Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a martingale on $(\Omega, \Sigma, P)$ with $\sup_n E(|X_n|) < \infty$. Then $X_n \to X_\infty$ a.e. and $E(|X_\infty|) \le \liminf_n E(|X_n|)$.
First Proof Here the convergence assertion follows if we express each $X_n = X_n^{(1)} - X_n^{(2)}$ with $\{X_n^{(i)}, \mathcal{B}_n, n \ge 1\}$ (since $\sup_n E(|X_n|) < \infty$) as positive martingales for $i = 1, 2$, by Lemma 5, and prove that $X_n^{(i)} \to X_\infty^{(i)}$ a.e. Thus the result obtains if each positive martingale $\{Y_n, \mathcal{B}_n, n \ge 1\}$ is shown to converge a.e. Since $\{e^{-Y_n}, \mathcal{B}_n, n \ge 1\}$ is a positive uniformly bounded submartingale, and $Y_n \to Y_\infty$ a.e. iff $e^{-Y_n} \to e^{-Y_\infty}$ a.e., it is clearly enough to establish that each bounded positive submartingale converges a.e. Since $L^\infty(P) \subset L^2(P)$, this will follow if we demonstrate that each positive square integrable submartingale $\{Z_n, \mathcal{B}_n, n \ge 1\}$ satisfying $E(Z_n^2) \le K < \infty$ converges a.e. Now by Lemma 4, $\{Z_n^2, \mathcal{B}_n, n \ge 1\}$ is a submartingale, and if $a_n = E(Z_n^2)$, then $a_n \uparrow a \le K < \infty$ as $n \to \infty$, because
and both terms on the right are nonnegative, since by the submartingale hypothesis $E^{\mathcal{B}_m}\big(Z_m(Z_n - Z_m)\big) = Z_m\big(E^{\mathcal{B}_m}(Z_n) - Z_m\big) \ge 0$ a.e. Now let $n \to \infty$, and then $m \to \infty$; the left side of (25) tends to zero, and hence so does the right side. Thus $E(Z_n - Z_m)^2 \to 0$, implying that $Z_n \to Z_\infty$ in $L^2(P)$. Using this, we can deduce the pointwise convergence.
Let $m \ge 1$ be fixed and consider $\{Z_n - Z_m, \mathcal{B}_n, n \ge m\}$. This is clearly a submartingale. Hence by Theorem 6i, given $\lambda > 0$, we have
$\lambda P\Big[\max_{m \le k \le n}(Z_k - Z_m) > \lambda\Big] \le E(|Z_n - Z_m|)$
and
$\lim_m P\Big[\sup_{k \ge m}|Z_k - Z_m| > \lambda\Big] = 0.$
Hence $|Z_k - Z_m| \to 0$ a.e., and so $\{Z_k(\omega), k \ge 1\}$ is a scalar Cauchy sequence for a.a. $\omega$. It follows that $Z_n \to Z_\infty$ a.e. [and in $L^1(P)$]. Thus, recapitulating the argument, what we have shown implies that $X_n \to X_\infty$ a.e. for the original martingale.
The preceding result also implies $|X_n| \to |X_\infty|$ a.e., and then by the Fatou inequality one gets $E(|X_\infty|) \le \liminf_n E(|X_n|)$. This proves the result completely.
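The a.e. convergence asserted in Theorem 8, and the possible disagreement between the a.e. limit and the expectations without uniform integrability, can be seen numerically. A minimal Python sketch (NumPy assumed; the factor distribution is an arbitrary choice) uses the positive martingale $X_n = \prod_{k \le n} W_k$ with i.i.d. $W_k > 0$, $E(W_k) = 1$:

import numpy as np

rng = np.random.default_rng(2)
paths, n = 8, 2000
W = rng.choice([0.5, 1.5], size=(paths, n))   # E(W) = 1
X = np.cumprod(W, axis=1)                     # positive martingale, E(X_n) = 1
print("terminal values X_n of eight paths:", X[:, -1])
# Each path tends to 0 a.e., even though E(X_n) = 1 for every n:
# the a.e. limit 0 and the constant expectations 1 can disagree,
# because this martingale is not uniformly integrable.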
Since $A \in \mathcal{F}_n$ is arbitrary, $\{T_n, \mathcal{F}_n, n \ge 1\}$ is a positive martingale.
Finally, let $\nu_n : \mathcal{F}_n \to \mathbb{R}^+$ be defined by $\nu_n(A) = \int_A T_n(s)\, d\mu(s)$. Then $\nu_n$ is $\sigma$-additive on $\mathcal{F}_n$ for each $n$. Also, this gives a unique additive set function $\nu$ on all the cylinder sets of $\Omega$ since $\nu_n = \nu_{n+1} | \mathcal{F}_n$, by (29). On the other hand, let $\tilde{G}_{\alpha_n} = \nu_n \circ \pi_{\alpha_n}^{-1}$. Then $\tilde{G}_{\alpha_n}$ is a finite measure on $\Sigma_{\alpha_n}$, and so is a Lebesgue-Stieltjes or regular measure (i.e., by standard real analysis theory, each open set has finite $\tilde{G}_{\alpha_n}$-measure, and it can be approximated from inside by compact sets, even intervals). If the constant value $E(X_n) = a$ is taken as $a = 1$, by dividing if necessary, then $\tilde{G}_{\alpha_n}$ is even an $n$-dimensional distribution function on $S_{\alpha_n}$. Hence by Theorem 4.10, the compatible family $\{\tilde{G}_{\alpha_n}, n \ge 1\}$ uniquely determines a $\sigma$-additive function $\tilde{\nu} : \Sigma \to [0, a]$ such that $\tilde{G}_{\alpha_n} = \tilde{\nu} \circ \pi_{\alpha_n}^{-1} = \nu \circ \pi_{\alpha_n}^{-1}$, $n \ge 1$. It follows that $\nu$ is $\sigma$-additive and uniquely (by extension) defined on $\Sigma$, $\nu_n = \nu | \mathcal{F}_n$, $T_n = d\nu_n/d\mu_{\mathcal{F}_n}$. Hence by Theorem 7, $T_n \to T_\infty$ a.e. (and $T_\infty = d\nu/d\mu$ also). Thus there is a set $A \in \Sigma$ with $\mu(A) = 0$, and $T_n(s) \to T_\infty(s)$ for all $s \in S - A$. Let $N = f^{-1}(A)$, so
The last statement now follows by Fatou's lemma, as before. This completes the second demonstration of the theorem.
and the left side gives a martingale, while $\{A_n - A_n', \mathcal{B}_{n-1}, n \ge 1\}$ is adapted. Since $B_0 = 0$, $Y_0 = Y_0'$ a.e., and for $n \ge 1$ one has
$E^{\mathcal{B}_n}(B_{n+1}) = E^{\mathcal{B}_n}(Y_{n+1} - Y_{n+1}') = Y_n - Y_n'$ (by the martingale property)
$= Y_{n+1} - Y_{n+1}'$ (because $B_{n+1}$ is $\mathcal{B}_n$-adapted).
Theorem 10 Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a submartingale with $\sup_n E(|X_n|) < \infty$. Then $X_n \to X_\infty$ a.e., and $E(|X_\infty|) \le \liminf_n E(|X_n|)$.
Proof By the above theorem, $X_n = Y_n + A_n$, where $A_n \ge A_{n-1} \ge 0$ a.e., and $\{Y_n, \mathcal{B}_n, n \ge 1\}$ is a martingale. Hence
by hypothesis. Thus $\sup_n E(|Y_n|) < \infty$, and Theorem 8 implies $Y_n \to Y_\infty$ a.e. Consequently $X_n = Y_n + A_n \to Y_\infty + A_\infty = X_\infty$, say, a.e. The last inequality between expectations follows again by Fatou's lemma, completing the proof.
Remark In (33) only the weaker hypothesis that $E(X_n^+) \le K_0 < \infty$ is used, but in (34) we needed $E(|X_n|) \le K_1 < \infty$. However, these two are equivalent conditions. In fact, $|X_n| = 2X_n^+ - X_n$ and $E(X_n) \ge E(X_1)$ for submartingales. Thus
$E(|X_n|) \le 2E(X_n^+) - E(X_1) \le 2\sup_n E(X_n^+) - E(X_1),$  (35)
and hence if $E(X_n^+) \le K_0 < \infty$, then $E(|X_n|) \le K_1 < \infty$ [since $X_1 \in L^1(P)$]. On the other hand, $X_n^+ \le |X_n|$, so that the opposite implication is always true.
= Lx,~P Lx,~P
+
Proof (i)$\Rightarrow$(ii)$\Rightarrow$(iii) and (i)$\Rightarrow$(iv) have the same proof as in the preceding result, with equality in (36). That (iii)$\Rightarrow$(i) follows from the fact that $X_n = E^{\mathcal{B}_n}(X_\infty)$ a.e., by the present hypothesis, $|X_n| \le E^{\mathcal{B}_n}(|X_\infty|)$ a.e., by the conditional Jensen inequality, and that $\{E^{\mathcal{B}_n}(|X_\infty|), n \ge 1\}$ is a uniformly integrable set. [This property was noted before, and is an immediate consequence of Theorem 1.4.5. Indeed, if $Y_n = E^{\mathcal{B}_n}(|X_\infty|)$,
Proof By (ii), $Z_n$, $Z_\infty$ are integrable. Also, by Theorem 12iii, $\{E^{\mathcal{B}_n}(X_\infty), 1 \le n \le \infty\}$ is a uniformly integrable martingale, so that $E^{\mathcal{B}_n}(X_\infty) \to X_\infty$ a.e., and in $L^1(P)$. Let $U_m = \sup_{n \ge m}|X_n - X_\infty| \le 2Y$, $m \ge 1$. Then by hypothesis $U_m \to U_\infty = \lim_m \sup_{n \ge m}|X_n - X_\infty| = 0$ a.e., and dominatedly. In particular, $E(U_m) \le 2E(Y) < \infty$, and $U_m \downarrow 0$ a.e. On the other hand, if $n > m$,
The last term of (37) tends to zero a.e. as well as in $L^1(P)$, and the first term also goes to zero by the conditional dominated convergence criterion. Hence $Z_n \to Z_\infty$ a.e. Thus $E(E^{\mathcal{B}_n}(U_m)) = E(U_m) \to 0$ by the Lebesgue dominated convergence, so that $E(|Z_n - Z_\infty|) \to 0$ also, as $n \to \infty$. This proves all the statements.
orems 7 and 8 on probability spaces. It must be noted that there are analogous results if the probability space $(\Omega, \Sigma, P)$ is replaced by a nonfinite ($\sigma$-finite) space $(\Omega, \Sigma, \mu)$, and then the comparison of difficulties will be reversed. However, we do not treat this case here. (See Problems 39, 40 and 42.)
Proof We follow the argument of the first proof of Theorem 8, for convenience. If $\nu_k : A \mapsto \int_A X_k\, dP$, $A \in \mathcal{B}_k$, then $\nu_k$ is a signed measure and the martingale property implies $\nu | \mathcal{B}_k = \nu_k$, $k \ge 1$. Since a signed measure is bounded, and by the Jordan decomposition $\nu_k = \nu_k^+ - \nu_k^-$, we let $\nu_k = \tilde{\nu}_k^+ - \tilde{\nu}_k^-$, where $\tilde{\nu}_k^\pm : \mathcal{B}_k \to \mathbb{R}^+$ is a (finite) measure such that $\nu^\pm | \mathcal{B}_k = \tilde{\nu}_k^\pm$, $k \ge 1$. Evidently, $\tilde{\nu}_k^\pm$ is $P_k(= P | \mathcal{B}_k)$-continuous. By the Radon-Nikodým theorem, there exist $X_k^{(1)} = d\tilde{\nu}_k^+/dP_k$, $X_k^{(2)} = d\tilde{\nu}_k^-/dP_k$ such that $X_k = X_k^{(1)} - X_k^{(2)}$ (because $\nu_k = \tilde{\nu}_k^+ - \tilde{\nu}_k^-$), and $\{X_k^{(i)}, \mathcal{B}_k, k \ge 1\}$, $i = 1, 2$, are positive decreasing martingales. Hence to prove $X_k \to X_\infty$ a.e., it suffices to prove that each positive decreasing martingale converges a.e. [Note that the proof of a Jordan-type decomposition for decreasing indexed martingales is simpler than that for the increasing case (cf. Lemma 5), since in the latter there need be no $\sigma$-additive $\nu$ on $\sigma(\bigcup_{n \ge 1}\mathcal{B}_n)$ such that $\nu | \mathcal{B}_k = \nu_k$. Even though the Jordan decomposition is valid for finitely additive set functions, their restrictions $\tilde{\nu}_k^\pm$ (of $\nu^\pm$ to $\mathcal{B}_k$) are not suitable, and to obtain a useful splitting, one needs the computations given in Lemma 5.]
The proof is now essentially the same as in Theorem 8. Thus briefly, if $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a positive decreasing martingale, then $\{e^{-X_n}, \mathcal{B}_n, n \ge 1\}$ is a positive bounded decreasing submartingale by Lemma 4, and $X_n \to X_\infty$ a.e. iff $e^{-X_n} \to e^{-X_\infty}$ a.e. If $\{Z_n, \mathcal{B}_n, n \ge 1\}$ is a positive decreasing $L^2(P)$-bounded submartingale, then $E(Z_n^2) = a_n \downarrow a \ge 0$ as $n \to \infty$. Next (25) implies, on considering $0 \le a_m - a_n \to 0$, letting first $n \to \infty$ and later $m \to \infty$, that $E(Z_m - Z_n)^2 \to 0$, so that $Z_n \to Z_\infty$ in $L^2(P)$. If $Y_n = Z_n - Z_\infty$, then $\{Y_n, \mathcal{B}_n, n \ge 1\}$ is a submartingale such that $Y_n \to 0$ in $L^2(P)$. With this the work of (26), (27) holds, since the maximal inequalities in the decreasing case take the form, for any $\lambda > 0$,
$-\lambda P\Big[\min_{m \le k \le n} Y_k \le -\lambda\Big] \ge E(Y_n) - E(Y_m^+),$
as $n \to \infty$ and then $m \to \infty$. It follows from this that $Y_n \to Y_\infty$ a.e. and that $Y_\infty = 0$ a.e. [since $Y_n \to 0$ in $L^2(P)$]. Hence $Z_n \to Z_\infty$ a.e. and in $L^2(P)$. This proves that $X_n \to X_\infty$ a.e.
The uniform integrability follows from the fact that
Theorem 15 Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a submartingale on $(\Omega, \Sigma, P)$, where $\mathcal{B}_n \supset \mathcal{B}_{n+1}$ are $\sigma$-subalgebras of $\Sigma$. Then $X_n \to X_\infty$ a.e. Moreover, the sequence $\{X_n, \mathcal{B}_n, 1 \le n \le \infty\}$ is a submartingale iff $E(X_n) \ge K > -\infty$, or equivalently the submartingale is uniformly integrable. (The last condition is automatic for martingales, but not for submartingales.)
Proof For the convergence statement, one uses a form of Theorem 9, then reduces to the result of Theorem 14 as follows. Define $a_1 = 0$, and recursively for $n > 1$,
active areas in probability. A few extensions will be treated later, for instance,
in Chapter 7.
In the theory of nonparametric statistics, a class of random sequences, called U-statistics (originally introduced by W. Hoeffding in the late 1940s), together with their limiting behavior, plays a key role. The latter can be studied effectively with martingale theory. Let us define these and deduce their convergence properties.
Let $\{X_n, n \ge 1\}$ be an i.i.d. sequence of random variables, or more generally an exchangeable (or permutable, or symmetrically dependent) sequence, in the sense that for each $n \ge 1$, $\{X_{i_1}, \ldots, X_{i_n}\}$ and $\{X_1, \ldots, X_n\}$ have the same joint distributions for any permutation $(i_1, \ldots, i_n)$ of $(1, 2, \ldots, n)$ (cf. Definition 2.1.11). If $f : \mathbb{R}^k \to \mathbb{R}$ is a Borel function such that $f(x_{i_1}, \ldots, x_{i_k}) = f(x_1, \ldots, x_k)$, so that it is symmetric, and if either $f$ is bounded or
then the U-statistics are the random variables $\{U_{k,n}^f, n \ge k\}$ defined by
so that $E(U_{k,n}^f) = E(f(X_1, \ldots, X_k))$ in the i.i.d. case, and complicated higher symmetric functions are possible candidates for other parameter estimations. The matter is not pursued further.)
This sequence has the following general property:
$U_{k,n}^f \to U_{k,\infty}^f = E$
then $U_{k,\infty}^f = E(f(X_1, \ldots, X_k))$ a.e., and hence is a constant.
Proof First note that, for symmetrically dependent r.v.s, by definition, for each subset $i_1 < i_2 < \cdots < i_k$ of $(1, 2, \ldots, n)$, $1 \le k \le n < \infty$, the joint distributions of $(X_{i_1}, \ldots, X_{i_k})$ and $(X_1, \ldots, X_k)$ are identical. Hence if
Since $\mathcal{F}_{n+1} = \sigma(U_{k,m}^f,\ m \ge n+1)$, the first r.v. in the sequence which is
In case the r.v.s are i.i.d. also, then $\mathcal{F}_\infty$, being the tail $\sigma$-algebra, is degenerate, so that $U_{k,\infty}^f$ is constant by Theorem 2.1.12. The $L^1(P)$-convergence implies that $E(U_{k,n}^f)$ tends to $E(f(X_1, \ldots, X_k))$, as asserted.
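A minimal Python sketch (NumPy assumed; the kernel and the Gaussian data are arbitrary illustrative choices) of this convergence: with the symmetric kernel $f(x,y) = (x-y)^2/2$ the U-statistic is the unbiased sample variance, and it tends to $E(f(X_1, X_2)) = \mathrm{Var}(X_1)$ as $n$ grows.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
X = rng.normal(loc=1.0, scale=2.0, size=400)   # Var(X_1) = 4

def u_stat(sample):
    # average of the symmetric kernel over all unordered pairs
    pairs = combinations(range(len(sample)), 2)
    vals = [0.5 * (sample[i] - sample[j]) ** 2 for i, j in pairs]
    return np.mean(vals)

for n in (50, 200, 400):
    print(n, u_stat(X[:n]))
print("target E f(X_1, X_2) = Var(X_1) =", 4.0)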
Corollary 18 Let $\{X_i, i \ge 1\}$ be symmetrically dependent integrable r.v.s on $(\Omega, \Sigma, P)$. Then, if $\mathcal{F}_n = \sigma(X_k, k \ge n)$ and $\mathcal{F}_\infty = \bigcap_n \mathcal{F}_n$, we have
Remark If the $X_i$ are i.i.d., then the above result becomes, since $X_1$ is independent of $\mathcal{F}_\infty$, so that $E^{\mathcal{F}_\infty}(X_1) = E(X_1)$, the sufficiency part of the SLLN given in Theorem 2.3.7. The necessity part, however, is not implied by the martingale theory, and one has to refer to the original work presented there.
Proof By hypothesis, if $Q_n = Q_n^c + Q_n^s$ is the Lebesgue decomposition of $Q_n$ on $\mathcal{B}_n$ relative to $P_n$, into the absolutely continuous part $Q_n^c$ and the singular part $Q_n^s$, so that $Q_n^s(B_0) > 0$ only on $B_0$ with $P_n(B_0) = 0$, we have for any $B \in \mathcal{B}_n$,
a.e. $[\mu]$.
It follows from (47) that $[(g/f_0) > a] = [(g_1/f_0) > a]$ a.e. $[P]$. This also shows that these sets do not depend on the auxiliary dominating measure $\mu$, and any other such $\mu'$ can be used. Since $f = g_1/f_0$, by (46) one has $0 \le f(\omega) \le (g/f_0)(\omega) < \infty$ for a.a. $\omega$. Replacing $\Sigma$ by $\mathcal{B}_n$ and $\mathcal{B}_\infty$, we deduce that $f_n = (g_n/f_{0,n})$ and $f_\infty = (g_\infty/f_{0,\infty})$ a.e. $[P]$. But $\{g_n, \mathcal{B}_n, 1 \le n \le \infty\}$ and $\{f_{0,n}, \mathcal{B}_n, 1 \le n \le \infty\}$ are martingales on $(\Omega, \Sigma, \mu)$ by Theorem 8, so that $g_n \to g_\infty$ a.e. $[\mu]$ and in $L^1(\mu)$; similarly $f_{0,n} \to f_{0,\infty}$ a.e. $[\mu]$ and in $L^1(\mu)$, with $f_{0,\infty} = 0$ a.e. only if the measure $P_\infty = 0$. Since this is not the case $[Q(\Omega) = 1 = P(\Omega)]$, we get $f_n = g_n/f_{0,n} \to f_\infty = g_\infty/f_{0,\infty} = (dQ_\infty/d\mu_\infty)/(dP_\infty/d\mu_\infty)$ a.e., and the last ratio is $dQ_\infty^c/dP_\infty$ a.e. $[\mu]$ (hence $[P]$), by (46). This proves the theorem completely.
The preceding result is usually given in terms of the image measures, which are distribution functions. This can be translated as follows. Let $F_n, G_n$ be the $n$-dimensional d.f.s of $P_n$ and $Q_n$, i.e.,
Here $x_i = X_i(\omega)$ is the value of $X_i$ for the realization $\omega$. Thus the ratios $(g_n/f_n)$ form a supermartingale, and their values can be calculated in any problem. It should be noted, however, that each $g_n/f_n : \mathbb{R}^n \to \mathbb{R}^+$ is defined on $\mathbb{R}^n$, and as $n$ changes their domains vary and are not nested. Thus $\{g_n/f_n, n \ge 1\}$ cannot be meaningfully described as a supermartingale on the spaces $\{\mathbb{R}^n, n \ge 1\}$, though informally it may be and is so described in the literature. The rigorous definition (and meaning) is that given by (48).
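The behavior of such likelihood ratios is easy to simulate. A minimal Python sketch (NumPy assumed; the normal densities are arbitrary choices) computes the ratio of the $Q = N(0.5, 1)$ density to the $P = N(0,1)$ density along data generated under $P$; when $Q \ll P$ the ratio is in fact a nonnegative martingale under $P$, and it converges a.e. — here to 0, since $Q \ne P$.

import numpy as np

rng = np.random.default_rng(4)
n = 5000
X = rng.normal(0.0, 1.0, size=n)           # sample drawn under P
# log( g_n / f_n ) for N(0.5, 1) against N(0, 1): each term is 0.5*x - 0.125
log_ratio = np.cumsum(0.5 * X - 0.125)
for k in (10, 100, 1000, 5000):
    print(k, np.exp(log_ratio[k - 1]))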
An interesting consequence of the above theorem is the following result, which we present as a final item of this chapter.
Remark The term "quasi-martingale" was used for a more general process in the middle 1960s by D. L. Fisk, and it was also termed an F-process and a (*)-process. It is an adapted process satisfying (49), called a star condition. The term has not yet become standard. We use it here since in the discrete case they all coincide. Clearly every martingale and every $L^1(P)$-bounded sub- or supermartingale (see below) is a quasi-martingale. This class presents a nice generalization of these important processes. The main convergence result above is originally due to E. S. Andersen and B. Jessen, who gave a direct proof as in Theorem 8.
(since $E^{\mathcal{B}_n}(Z_{n+1}) \ge Z_n$, and similarly for $Y_n$),
Exercises
1. Let $X, Y$ be two square integrable random variables on $(\Omega, \Sigma, P)$. If, further, $E(X|Y) = Y$ a.e. and $E(Y|X) = X$ a.e., show that $X = Y$ a.e. The same result is true even if $X, Y$ are only integrable, but the proof is slightly more involved. (For the latter use Theorem 1.4.5iii and Theorem 1.9iii.)
2. Let $\{X_n, n \ge 1;\, Y\}$ be a set of r.v.s in $L^2(\Omega, \Sigma, P)$, and $\mathcal{B} = \sigma(X_n, n \ge 1)$. Consider the subspace $L^2(\Omega, \mathcal{B}, P)$ of $L^2(\Omega, \Sigma, P)$. Each $X_n$ is in the subspace, but $Y$ need not be. Show that there is a unique $Y_0 \in L^2(\Omega, \mathcal{B}, P)$ such that $E(|Y - Y_0|^2) = \inf\{E(|Y - X|^2) : X \in L^2(\Omega, \mathcal{B}, P)\}$, and that
6. Let $(\Omega, \Sigma, \mu)$ be a ($\sigma$-)finite space and $L^p(\mu)$ be the usual real Lebesgue space on it, $1 \le p < \infty$. An operator $T : L^p(\mu) \to L^p(\mu)$ is termed positive if $Tf \ge 0$ a.e. for each $f \ge 0$ a.e. Establish the following statements:
(a) If $T$ is a positive linear operator on $L^p(\mu) \to L^p(\mu)$, then (i) $|T(f)| \le T(|f|)$ a.e., (ii) $f_n \ge 0$, $f_n \le g \in L^p(\mu) \Rightarrow \sup_n T(f_n) \le T(\sup_n f_n)$ a.e., and (iii) a positive linear operator $T : L^p(\mu) \to L^p(\mu)$ is always continuous (= bounded).
(b) Let $T : L^p(\mu) \to L^p(\mu)$ be as above, and $|f_n| \le g \in L^p(\mu)$. If $f_n \to f$ a.e., then $T(f_n) \to T(f)$ a.e., and in $L^p(\mu)$. (In other words, the assertions of Theorem 1.3 for conditional expectation operators extend to a more general class. Here $T$ need not be faithful or a projection.)
7. (i) If $\{X_n, n \ge 1\} \subset L^p(P)$ on a probability space $(\Omega, \Sigma, P)$, then $X_n$ is said to converge weakly to $X \in L^p(P)$ if for each $Y \in L^q(P)$, where $p^{-1} + q^{-1} = 1$, $1 \le p < \infty$, $E(X_nY) \to E(XY)$ as $n \to \infty$. Show that for any $\sigma$-algebra $\mathcal{B} \subset \Sigma$, $E^{\mathcal{B}}(X_n) \to E^{\mathcal{B}}(X)$ weakly if $X_n \to X$ weakly. [Hint: Verify, by Proposition 1.2, that $E(YE^{\mathcal{B}}(X)) = E(XE^{\mathcal{B}}(Y))$.] This property of $E^{\mathcal{B}}$ is called "self-adjointness."
(ii) Let $\{X_n, n \ge 1\}$ be as in (i) and $T : L^p(P) \to L^p(P)$ be a continuous linear operator. It is known that there is a continuous linear $T^* : L^q(P) \to L^q(P)$ such that $E(YT(X)) = E(XT^*(Y))$, $X \in L^p(P)$, $Y \in L^q(P)$, and if $1 < p < \infty$, then $(T^*)^* = T$. Show that if $X_n \to X$ weakly in $L^p(P)$, $1 < p < \infty$, then $TX_n \to TX$ in the same sense. (This reduces to (i) if $T = E^{\mathcal{B}}$.)
(a) Suppose that $\{P_\theta, \theta \in I\}$ is a dominated family, in the sense that there is a finite (or $\sigma$-finite) measure $\lambda : \Sigma \to \mathbb{R}^+$ such that $P_\theta \ll \lambda$ for all $\theta \in I$. [Note that if $\lambda$ is $\sigma$-finite, then there exist $A_n \in \Sigma$, disjoint, $\Omega = \bigcup_{n \ge 1} A_n$, $0 < \lambda(A_n) < \infty$, by definition, so that
with $0 < \sigma_i^2 < \infty$, $\rho < 1$, let $Z = X_1 + X_2$. Show that $E^{\sigma(Z)}(X_1) = aZ$ a.e. $[P]$ is a representation of the conditional operator, where $a$ is some real number. Verify that $a_1 = (\sigma_1^2 + \rho\sigma_1\sigma_2)(\sigma_1^2 + 2\rho\sigma_1\sigma_2 + \sigma_2^2)^{-1}$. [First compute the d.f. of $(X_1, Z)$.] Is $a = a_1$?
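A minimal numerical check of this representation (NumPy assumed; the particular parameter values are arbitrary): for zero-mean jointly Gaussian $X_1, X_2$ with correlation $\rho$ and $Z = X_1 + X_2$, the least-squares slope of $X_1$ on $Z$ should match $a_1$ above.

import numpy as np

rng = np.random.default_rng(5)
sig1, sig2, rho, n = 1.0, 2.0, 0.3, 200000
cov = [[sig1**2, rho*sig1*sig2], [rho*sig1*sig2, sig2**2]]
X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
Z = X[:, 0] + X[:, 1]
a1_theory = (sig1**2 + rho*sig1*sig2) / (sig1**2 + 2*rho*sig1*sig2 + sig2**2)
a1_sample = np.dot(X[:, 0], Z) / np.dot(Z, Z)   # least-squares slope through the origin
print("a_1 (formula) =", a1_theory, "   a_1 (sample) =", a1_sample)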
13. We now state a form of the classical example showing the nonexistence of a regular conditional probability. Let $(\Omega, \Sigma, P)$ be the Lebesgue unit interval, and $A \subset \Omega$ be a nonmeasurable set of Lebesgue outer measure one and inner measure zero. Let $\tilde{\Sigma}$ be the $\sigma$-algebra generated by $\Sigma$ and $A$. Verify that $\tilde{A} \in \tilde{\Sigma}$ iff $\tilde{A} = (A \cap B) \cup (A^c \cap D)$, $B, D \in \Sigma$. Let $\tilde{P}(\tilde{A}) = \alpha P(B) + (1 - \alpha)P(D)$, $0 < \alpha < 1$. Show that $\tilde{P}$ is well defined on
14. In the opposite direction of the above result, here is a positive assertion. If $(S, \mathcal{S})$ is a measurable space, it is called a Borel space if there exists a Borel subset $A \subset \mathbb{R}$ and a one-to-one, bimeasurable, onto mapping $f : S \to A$ [so $f$ is $(\mathcal{S}, \mathcal{R}(A))$-measurable and the inverse $f^{-1} : A \to S$ is $(\mathcal{R}(A), \mathcal{S})$-measurable, where $\mathcal{R}(A)$ is the trace $\sigma$-algebra of the Borel $\sigma$-algebra $\mathcal{R}$ of $\mathbb{R}$]. It is known that every separable complete metric space with its Borel $\sigma$-algebra is a Borel space. [Cf. Parthasarathy (1967), Section V.2, on Borel spaces.] If $\mathcal{B} \subset \Sigma$ is any $\sigma$-algebra of a probability space $(\Omega, \Sigma, P)$, $(S, \mathcal{S})$ is a Borel space, and $X : \Omega \to S$ is an r.v. [i.e., $X^{-1}(\mathcal{S}) \subset \Sigma$], then a conditional distribution $P^{\mathcal{B}} \circ X^{-1} = P(\cdot, \cdot) : \mathcal{S} \times \Omega \to [0, 1]$ exists, and if $X(\Omega) \in \mathcal{S}$, then $P^{\mathcal{B}}$ itself is regular on $\Sigma_1 = X^{-1}(\mathcal{S})$. [Hint: Consider $Y = f \circ X : \Omega \to A$, and by Theorem 2.5, $P^{\mathcal{B}} \circ Y^{-1}$ gives a version which is a regular conditional distribution, $Q(\cdot, \cdot)$, and let $P(\cdot, \cdot) = Q(f(\cdot), \cdot)$. Since $f = (f^{-1})^{-1}$, it preserves countable operations. The second part is as in Proposition 2.6.]
where $\sigma^2 = E(Y^2) > 0$. Thus we have yet another form of the density, depending on the derivation basis, which is different from all the others!
Thus the limiting moments do not depend on the initial distribution of the r.v. $Z_0$. What happens if $P[|X - Y| = 1] = 1$? [It results from the Helly-Bray theorem of the next chapter that $Z_n \stackrel{D}{\to} Z$ and $\lim_n E(Z_n^k) = E(Z^k)$. The last equation follows from Problem 10 of Chapter 2 also, if the convergence result is given, since $\{Z_n, n \ge 1\}$, being a bounded set, is uniformly integrable. This result (and the next one) appears in DeGroot and the first author (1963).]
21. (Continuation) Let $Z_n \stackrel{D}{\to} Z$ as in the preceding problem, keeping the same assumptions and notation. For some applications, indicated in Problem 19 above, it is of interest to find the limit Z-distribution. If $F_n$ is the distribution of $Z_n$ and $F$ that of $Z$, using the relation $Z_n = X_nZ_{n-1} + Y_n(1 - Z_{n-1})$ and knowing that $F_n \to F$, show that for each $n$ (the intervening conditional probabilities are regular and)
with $a(z,t) = \max[0, (z + t - 1)/t]$ and $b(z,t) = \min[(z/t), 1]$. It is of interest to solve the integral equation (*) for general classes of $g(\cdot, \cdot)$, but this does not seem easy. The reader is urged to work on this interesting open problem. As an example, verify that if $X, Y$ are independent and have densities $g_X$ and $g_Y$, given (for $p > 0$, $q > 0$) by the beta density
$0 < x < 1$, $0 < y < 1$, then $f_Z$ of (*) is also a beta density, given as
22. (a) Let $X_1, \ldots, X_n$ be i.i.d. with a d.f. given by $F(x) = 0$ if $x \le 0$, $= x$ if $0 < x < 1$, and $= 1$ if $x \ge 1$, called the uniform d.f. If $X_1^*, \ldots, X_n^*$ are the order statistics, show that a conditional density of $X_1^*, \ldots, X_k^*$ given $X_i^* = c_i$, $0 < c_i < 1$, $i = k+1, \ldots, n$, is given by
and $= 0$ otherwise, and hence that $\{X_k^*, 1 \le k \le n\}$ forms a Markov process. Deduce that, if $X_1, \ldots, X_n$ are i.i.d. with a continuous strictly increasing d.f., then their order statistics also form a Markov process.
(b) We have seen in the proof of Theorem 3.9 that $Z_k = \sum_{i=k}^{n} U_i$, where the $U_i$ are independent exponentially distributed random variables. Changing the notation slightly, suppose that $S_n = \sum_{i=1}^{n} X_i$, where the $X_i$ are independent, each exponentially distributed, i.e., $P(X_i < x) = 1 - e^{-x}$, $x \ge 0$, so that $S_n$ is increasing (in the proof $Z_k$ has a reverse labelling!). Show that, using the change of variables technique of elementary probability, the variables $Y_k = S_k/S_{n+1}$, $k = 1, \ldots, n$, have a multivariate beta (also called a Dirichlet)
distribution. Thus, $Y_k$ and $X_k^*$ of part (a) above have the same (joint) distribution, the latter being obtained using a random sample from the uniform distribution on the unit interval.
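A minimal simulation of this coincidence of laws (NumPy assumed; $n$ and the sample size are arbitrary choices): the order statistics of $n$ uniforms and the ratios $Y_k = S_k/S_{n+1}$ built from exponential partial sums should agree in law; here the coordinate means of both, each close to $k/(n+1)$, are compared.

import numpy as np

rng = np.random.default_rng(6)
n, trials = 5, 100000
U_sorted = np.sort(rng.uniform(size=(trials, n)), axis=1)        # X_k^*
E = rng.exponential(size=(trials, n + 1))
S = np.cumsum(E, axis=1)
Y = S[:, :n] / S[:, n:n + 1]                                     # Y_k = S_k / S_{n+1}
print("E(X_k^*) :", U_sorted.mean(axis=0))
print("E(Y_k)   :", Y.mean(axis=0))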
23. Complete the details of the proof of Theorem 4.8 on the existence of continuous parameter Markov processes, and also the details of Theorem 4.7 to use it there.
where $\mathcal{B}_n = \sigma(X_i, 1 \le i \le n)$. Thus if $k = 1$, we have the ordinary Markovian condition, and if $k > 1$, the $X_n$ are said to form a $k$th order (or multiple) Markov process. (Follow the proof of Proposition 3.2.) The preceding relation shows that many properties of multiple Markov processes are obtainable from ordinary (vector) Markovian results. Show that any $k$th-order Markovian family is also $(k + l)$th-order Markovian for all integers $l > 0$, but not if $l < 0$. (This is an important distinction to note, since in the theory of higher-order "stochastic differential equations" of the standard type, the solutions turn out to be multiple Markov processes, but not lower order ones.)
25. (a) As an application of Theorem 4.5 (or 4.7), show that there exist multiple Markov processes (on suitable probability spaces) of any given order $k$.
(b) Using Theorem 4.11, extend Problem 5(c) of Chapter 2 as follows. Let $(\Omega, \Sigma, P)$ be a complete probability space, and $\{B_t^x, x \in \mathbb{R}, t \in I\} \subset \Sigma$, where $I$ is any index set and $B_t^x \subset B_t^{x'}$ a.e. for any $x < x'$. For each $t \in I$, there is a (unique) r.v. $X_t : \Omega \to \mathbb{R}$ such that $[X_t < x] \subset B_t^x$ a.e. and $[X_t > x] \subset (B_t^x)^c$ a.e. Show that for each $t_1, \ldots, t_n$, if $P[B_{t_1}^{x_1} \cap B_{t_2}^{x_2} \cap \cdots \cap B_{t_n}^{x_n}] = F_{t_1,\ldots,t_n}(x_1, \ldots, x_n)$, (+), then $\{F_{t_1,\ldots,t_n}, t_i \in I, n \ge 1\}$ is a compatible family of d.f.'s and $\{X_t, t \in I\}$ is a process with the $F$'s as its d.f.'s. [Thus the process is determined solely by $(\Omega, \Sigma, P)$.] Conversely, any compatible family of d.f.'s as in Theorem 4.11 determines a probability space and a family $\{B_t^x, x \in \mathbb{R}, t \in I\}$ satisfying (+). Thus we have an adjunct of Theorem 4.11.
28. Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s on $(\Omega, \Sigma, P)$ with a finite state space (= range), say, $T = (1, 2, \ldots, s)$. If the sequence is a $k$th-order Markov process (= a chain in this case), it can be regarded as an ordinary Markov chain with values in $T^k \subset \mathbb{R}^k$, by Problem 22. Let $Y_n = (X_{n+1}, \ldots, X_{n+k})$, $n \ge 1$, be this chain. For each $j, l \in T^k$, let $p_{jl}^{(n)} = P([Y_n = l]\,|\,Y_{n-1} = j)$ be the transition probability. Suppose that the one-step transitions are independent of $n$ (i.e., the chain has constant or stationary transition probabilities). Find the Chapman-Kolmogorov equations in this case. (Note that, of the possible $s^{2k}$ values of the transitions, all but $s^{k+1}$ values vanish in this representation.)
Show that an irreducible Markov chain is path reversible iff the chain admits a positive invariant measure $\{m_j, j \ge 1\}$ such that for all $j, k$ we have $m_jp_{jk} = m_kp_{kj}$. In this case the invariant measure is also unique except for a constant multiplicative factor. [Further properties of such chains can be found in Feller (1957) and Kendall (1959). Note that in our definition of a Markov process (cf. Definition 3.1), it is symmetric with regard to the ordering of the index set, and here the symmetry is about the motion of the chain relative to its states.]
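A minimal Python sketch of the reversibility criterion (NumPy assumed; the birth-and-death transition rates are arbitrary choices): an invariant measure is obtained by solving the detailed-balance relations $m_jp_{jk} = m_kp_{kj}$, and then $mP = m$ is verified.

import numpy as np

s, up, down = 5, 0.3, 0.2
P = np.zeros((s, s))
for i in range(s):
    if i + 1 < s: P[i, i + 1] = up
    if i - 1 >= 0: P[i, i - 1] = down
    P[i, i] = 1.0 - P[i].sum()            # remaining mass stays put

# detailed balance for a birth-death chain: m_{i+1}/m_i = up/down
m = np.cumprod(np.concatenate(([1.0], np.full(s - 1, up / down))))
flows = m[:, None] * P                    # entry (j, k) is m_j p_{jk}
print("detailed balance m_j p_jk = m_k p_kj:", np.allclose(flows, flows.T))
print("invariance m P = m             :", np.allclose(m @ P, m))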
Since $\nu \ll \mu$, this is well defined. Show that $\{X_n, \mathcal{F}_n, n \ge 1\}$ is a positive martingale, so that $X_n \to X_\infty$ a.e. Verify that the sequence is also uniformly integrable. (Note $\int_{[X_n > a]} X_n\, d\mu = \nu[X_n > a]$.) Hence deduce that $\nu(A) = \int_A X_\infty\, d\mu$, $A \in \Sigma$, and that $d\nu/d\mu = X_\infty$ a.e. $[\mu]$. If $N = [X_\infty = 1]$, then $P(N) = 0$. If $g = X_\infty/(1 - X_\infty)$, and $\nu^c = \nu(N^c \cap \cdot)$, then $\nu^c \ll P$, and $g = d\nu^c/dP$ a.e. is the Radon-Nikodým derivative. If $\nu^s = \nu(N \cap \cdot)$, then $\nu(A) = \int_A g\, dP + \nu^s(A)$, $A \in \Sigma$, gives the Lebesgue-Radon-Nikodým theorem, where $\nu^s : \Sigma \to \mathbb{R}^+$ becomes the $P$-singular part of $\nu$ and $\nu^c$ its $P$-continuous part.
(b) If now (a) is given, and $g_n = d\nu_n^c/dP_n$, where $\nu_n = \nu | \mathcal{F}_n$, $P_n = P | \mathcal{F}_n$, show that $\{g_n, \mathcal{F}_n, n \ge 1\}$ is a supermartingale on $(\Omega, \Sigma, P)$ such that $g_n \to g$ a.e. [Remark: Part (a), and hence (b), can be extended to the general case that $\Sigma$ is not countably generated, as follows. Consider $\mathcal{A} \subset \Sigma$, a countably generated $\sigma$-algebra, and let $X_{\mathcal{A}}$ be the corresponding $d\nu_{\mathcal{A}}/d\mu_{\mathcal{A}}$. The collection of all such $\sigma$-algebras can be directed (into $I$) under inclusion. Then $\{X_{\mathcal{A}}, \mathcal{A}, \mathcal{A} \in I\}$ forms a uniformly integrable positive martingale. Such a martingale can be shown to converge in $L^1(\mu)$ to $X$ (but not a.e.). This is sufficient to conclude that $X = d\nu/d\mu$, and then $g = X/(1 - X)$ on $N^c$, where $N = [X = 1]$, gives $g = d\nu^c/dP$. Thus the general result is obtainable from the martingale theory. See, e.g., Theorem 5.3.3 in Rao (1987, 2004), for more details.]
31. Show by an example that Theorem 5.8 is not valid if the moment condition is omitted; i.e., a non-$L^1(P)$-bounded martingale need not converge a.e. (Let $\{X_n, n \ge 1\}$ be independent, $P[X_n = +n] = P[X_n = -n] = \frac{1}{2}$, and consider $\{S_n = \sum_{k=1}^{n} a_kX_k, n \ge 1\}$, and follow the discussion in Section 4C of Chapter 2, with suitable $a_k$, the interarrival times.)
32. Let $\{V_k, \mathcal{F}_{k-1}, k \ge 1\}$ be an adapted uniformly bounded sequence of r.v.s and $\{X_n, \mathcal{F}_n, n \ge 1\}$ be an $L^2(P)$-bounded martingale on a probability space $(\Omega, \Sigma, P)$. Show that the sequence $\{S_n = \sum_{k=1}^{n} V_k\phi_k, n \ge 1\}$ converges a.e. and in $L^2(P)$, where $\phi_k = X_k - X_{k-1}$ is the martingale increment or difference, and $X_0 = 0$ a.e. (Such a $V_k$-sequence is termed predictable and $S_n$ a predictable transform of the $X_n$-process. Note that the $\phi_k$ are orthogonal.)
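A minimal Python sketch of such a predictable transform (NumPy assumed; the particular martingale and predictable sequence are arbitrary illustrative choices): $X_n = \sum_{k \le n}\xi_k/k$ with i.i.d. $\xi_k = \pm 1$ is an $L^2(P)$-bounded martingale, $V_k = \mathrm{sign}(X_{k-1})$ is bounded and $\mathcal{F}_{k-1}$-measurable, and $S_n = \sum_{k \le n} V_k\phi_k$ settles down along a path.

import numpy as np

rng = np.random.default_rng(7)
n = 200000
xi = rng.choice([-1.0, 1.0], size=n)
k = np.arange(1, n + 1)
phi = xi / k                                   # martingale differences
X = np.cumsum(phi)
V = np.sign(np.concatenate(([0.0], X[:-1])))   # predictable, |V_k| <= 1
S = np.cumsum(V * phi)
print("S_n at n = 10^3, 10^4, 10^5, 2*10^5:", S[999], S[9999], S[99999], S[-1])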
33. Let $\{X_n, n \ge 1\}$ be i.i.d. and $\mathcal{B}_n = \sigma(X_k, 1 \le k \le n)$, $\mathcal{B} = \sigma(\bigcup_n \mathcal{B}_n)$,
In case $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a submartingale, then one has for any $\lambda$,
$P\Big[\sup_n X_n > \lambda\Big] \le \lim_n E\big(e^{t(X_n - \lambda)}\big), \quad t > 0.$
This yields the result, due originally to D. G. Austin, that $\sum_{k \ge 1}\phi_k^2 < \infty$ a.e., and then our assertion follows.
[Note that, by the submartingale property, we have $\|S_n\|_p \uparrow \|S\|_p$. If $p > 1$, this is a consequence of Theorem 5.6ii, with $C_p$ replaced by $p/(p-1) = q$; thus if $p \ge 1.25$, this is $\le 5$. For $1 \le p < 1.25$, we need to prove this. Let us sketch the argument which gives the result. This application is due to Doob. First suppose that $X_n$ has a symmetric distribution; i.e., $P[X_n \in A] = P[X_n \in -A]$ for any Borel set $A \subset \mathbb{R}$. Since $P[\max_{k \le n} S_k \ge \lambda, S_n \ge \lambda] = P[S_n \ge \lambda]$, verify with the decomposition of the event $[\max_{k \le n} S_k \ge \lambda]$, as in Theorem 5.6, that $P[\max_{k \le n} S_k \ge \lambda, S_n < \lambda] \le P[S_n \ge \lambda]$, so that by addition one has
Then again, as in the proof of Theorem 5.6ii with the above inequality,
In the unsymmetric case, let $(\Omega', \Sigma', P')$ be a copy of the given probability space, and $\{X_n', n \ge 1\}$ be independent r.v.s distributed as the $X_n$. Then on the product space $(\Omega, \Sigma, P) \times (\Omega', \Sigma', P')$, $\{X_n - X_n', n \ge 1\}$ is a symmetrically distributed sequence, and the above result applies to give
But $|x|^p$ is convex, and so $|A|^p \le 2^{p-1}\big[|A - A'|^p + |A'|^p\big]$. Hence
and $\mathcal{B}_n$ is the $\sigma$-algebra generated by the partition, show that $\{f_n, \mathcal{B}_n, n \ge 1\}$ is a uniformly integrable martingale and that $f_n \to f$ a.e. and in $L^1(\lambda)$. [Thus every element of $L^1(\lambda)$ can be written as the a.e. and in-mean limit of a martingale of simple functions.]
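A minimal Python sketch of such a dyadic martingale (NumPy assumed; $f(x) = \sqrt{x}$ and the integration grid are arbitrary choices): $f_n$ is the average of $f$ over each interval of the $n$-th dyadic partition of $[0,1]$, and its approximate $L^1(\lambda)$ distance to $f$ decreases to 0.

import numpy as np

f = lambda x: np.sqrt(x)
x = (np.arange(2**20) + 0.5) / 2**20          # fine grid for numerical integration
fx = f(x)
for n in (1, 3, 6, 9):
    blocks = fx.reshape(2**n, -1)
    f_n = np.repeat(blocks.mean(axis=1), blocks.shape[1])   # simple, B_n-measurable
    print(n, np.mean(np.abs(fx - f_n)))       # approximate L^1 error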
44. (Continuation) With the setup of the preceding problem, and if the tail $\sigma$-algebra $\mathcal{T}$ of $\{X_n, n \ge 0\}$ on $(\Omega, \mathcal{B}, P(x_0, \cdot))$ is degenerate, so that each $A \in \mathcal{T}$ satisfies $P(x_0, A) = 0$ or $1$, and if $f$ is invariant, then show that $f$ must be a constant relative to $P(x_0, \cdot)$. In particular, if $P(x_0, dy) = p(x_0, y)\, dy$, so that we can define the convolution $g(x) = (f * p(x_0, \cdot))(x)$ on $\mathbb{R}$ as
show that each bounded continuous invariant $f$ depends only on $x_0$. [Reduce the problem to the above one by writing $p$ for $p(x_0, \cdot)$ and considering the special Markov process $\{S_n = \sum_{k=1}^{n} X_k, n \ge 1\}$, where the $X_k$ are i.i.d. This is discussed, in a more general case, for finding the solutions $p$ of $p = p * P$, by G. Choquet and J. Deny in C.R. Acad. Sci., Ser. A, 250 (1960), 799-801.]
45. The theory of super- and subharmonic functions was primarily developed by F. Riesz in the late 1920s; among other theorems, Riesz proved an important decomposition of such functions. These results have analogs in martingale theory (by substituting "martingale" for "harmonic"), of which the above two assertions are illustrative. Thus the corresponding Riesz decomposition in our theory is as follows. A positive supermartingale $\{X_n, \mathcal{B}_n, n \ge 1\}$ is termed a potential if $X_n \to 0$ a.e. and in $L^1(P)$-mean. Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be an arbitrary supermartingale. Then it admits a unique Riesz decomposition $X_n = Y_n + Z_n$, where $\{Y_n, \mathcal{B}_n, n \ge 1\}$ is a martingale and $\{Z_n, \mathcal{B}_n, n \ge 1\}$ is a potential, iff it dominates a martingale, i.e., iff there exists a martingale $\{W_n, \mathcal{B}_n, n \ge 1\}$ such that $X_n \ge W_n$ a.e. for all $n \ge 1$, or iff there is an $\alpha > -\infty$ such that $\lim_n E(X_n) \ge \alpha$. [Hint: If $X_n$ dominates a martingale, then $E(X_n) \ge E(X_{n+1}) \ge E(W_n) = \alpha > -\infty$, since the expectation of a martingale is a constant. This implies $\sup_n |E(X_n)| < \infty$. If $X_n = X_n' - A_n$ is the decomposition of Theorem 5.9, where $\{A_{n+1}, \mathcal{B}_n, n \ge 1\}$ is the increasing adapted process, $A_1 = 0$, and $\{X_n', \mathcal{B}_n, n \ge 1\}$ is a martingale, so that $E(A_n) \le E(X_1') + \sup_n |E(X_n)| < \infty$, let $A_\infty = \lim_n A_n$. Then $E(A_\infty) < \infty$. If we let $Y_n = X_n' - E^{\mathcal{B}_n}(A_\infty)$, and $Z_n = E^{\mathcal{B}_n}(A_\infty) - A_n \ge 0$, then we have $X_n = Y_n + Z_n$ as the desired decomposition. For uniqueness, if $\tilde{Y}_n + \tilde{Z}_n = X_n$ is another decomposition, then $Y_n - \tilde{Y}_n = \tilde{Z}_n - Z_n$, so that
Since $\{|Y_n - \tilde{Y}_n|, \mathcal{B}_n, n \ge 1\}$ is a submartingale whose expectations are nondecreasing, we must have $E(|Y_n - \tilde{Y}_n|) = 0$, so that $Y_n = \tilde{Y}_n$ and then $Z_n = \tilde{Z}_n$. The converse is immediate.]
46. Consider the Haar system of functions on $[0,1]$. These are defined on $[0,1]$ as $H_0(\cdot) = 1$, $H_{2^k}(\cdot) = 2^{k/2}\big(\chi_{[0,2^{-k-1})} - \chi_{[2^{-k-1},2^{-k})}\big)$, $k = 0, 1, \ldots$, and if $1 \le j < 2^k$, $k \ge 1$, $H_{2^k+j}(\cdot) = H_{2^k}(\cdot - j2^{-k})\chi_{[j2^{-k},(j+1)2^{-k})}(\cdot)$. Then $\{H_n, n \ge 1\}$
50. The preceding three problems admit the following further extension. Let $(\Omega, \Sigma, P)$ be a probability space and $\Sigma$ be countably generated. Then there exists a (universal) martingale $\{X_n, \mathcal{B}_n, n \ge 1\}$ on $(\Omega, \Sigma, P)$ such that $\Sigma = \sigma(\bigcup_n \mathcal{B}_n)$, and every extended real-valued measurable (for $\Sigma$), but not necessarily integrable, $f$ on $\Omega$ is a pointwise a.e. limit of a subsequence
$\{X_{n_k}, k \ge 1\}$ of the $X_n$, $n \ge 1$, sequence. [This result is involved, and it abstracts the preceding work, and is due to Lamb (1974). However, not all a.e. convergence results of Fourier series may be established by martingale methods. For instance, if $f \in L^p(0,1)$, $1 < p < \infty$, and $\phi_k(x) = e^{2\pi ikx}$, then $f = \sum_{k \ge 1}a_k\phi_k$, $a_k = \int_0^1 f(x)\bar{\phi}_k(x)\, dx$, converges a.e. (and in $L^p$-norm), but the partial sums do not form a martingale, and as yet no martingale proof of this statement is available. A relatively "simple" proof of this result, which is originally due to L. Carleson and R. A. Hunt, in the middle 1960s, is given by C. L. Fefferman, Ann. Math. 98 (1973), 551-571.]
Part II Analytical Theory
The fine properties of probability are often obtainable by using the sharp Fourier analysis techniques. They are called characteristic function methods, and Chapter 4 is utilized in establishing various important results, including the Lévy-Bochner-Cramér representation theorems applicable for uniqueness, derivation of distributions of ratios of random variables, and special properties of sums of independent random elements. Then the longest chapter of the book, Chapter 5, is devoted to the central limit theory with error estimations, to stable distributions as well as to invariance or functional limit theorems. Here the Kolmogorov law of the iterated logarithm and certain m-dependent theorems are also included. Several important complements are considered in both the chapters and should be of interest for students as well as researchers.
Chapter 4
Probability Distributions
Then for all $n > N$ we have, by the monotonicity of the $F_n$ and $G$ and the above inequalities:
Similarly
Remarks (1) The fact that the set $\{r_i, i \ge 1\} \subset \mathbb{R}$ is the set of rationals played no part in the above proof. Any dense denumerable set will do. Consider $F_n(x) = 0$ for $x < n$, $= 1$ for $x \ge n$; we see that $F_n(x) \to 0$ as $n \to \infty$ for all $x \in \mathbb{R}$, so that the limit $F$ satisfies $F(x) = 0$. Thus such an $F$ is not necessarily a distribution function (d.f.).
(2) If $+\infty$ and $-\infty$ are continuity points of each $F_n$, then $F_n(-\infty) \to F(-\infty)$, $F_n(+\infty) \to F(+\infty)$, so that $F_n(+\infty) - F_n(-\infty) = 1$ implies in this case that $F$ is a distribution. In particular, if $\{F_n, n \ge 1\}$ is a sequence of d.f.s on a compact interval $[a,b] \subset \mathbb{R}$, then the limit of any convergent subsequence is necessarily a d.f., since $\{a, b\}$ may be included in the set $\{r_i, i \ge 1\}$.
(3) The preceding theorem can be stated as follows: A uniformly bounded sequence of nondecreasing functions on $\mathbb{R}$ is weakly sequentially compact in the sense that it has a convergent subsequence whose limit is a bounded nondecreasing function.
The next theorem supplements the above result and is very useful in our study. It should be contrasted with the Lebesgue limit theorems, for which the integrands vary and the measure space is fixed, whereas the opposite is true in the following. It is due to E. Helly in a special case, and the general case to H. E. Bray in 1919. The connection between these two viewpoints is clarified in an alternative proof below.
Hence $\{G_n, n \ge 1\}$ is uniformly bounded. Thus there is an $M < \infty$, with $|G_n(x)| \le M$, for all $x \in \mathbb{R}$. Next, for any $a > 0$, we consider
This is possible since $G$ is bounded and its continuity points are dense in $\mathbb{R}$. By (i) and (ii), we may choose an $N_1(\epsilon)$, such that $n \ge N_1(\epsilon)$ implies
Then
Similarly
$|I_3| \le c\big[G_n(+\infty) - G_n(a) + G(+\infty) - G(a)\big].$
Adding these two and using (2) and (3), we get
For $I_1$, since $|G_n| \le M$ and $[-a, a]$ is a compact interval, divide $[-a, a]$ at the $x_i$ into $m$ subintervals such that the oscillation of $f$ on each is bounded by $\epsilon/16M$, where $-a = x_0 < x_1 < \cdots < x_m = a$, the $x_i$ also being continuity points of $G$. All this is clearly possible. Hence
Let $N_2(\epsilon)$ be chosen so that $n \ge N_2(\epsilon) \Rightarrow |G_n(x_i) - G(x_i)| < \epsilon/8mc$, $i = 1, \ldots, m$. Then the above inequality becomes
Thus (4) and (5) imply, for $n \ge \max(N_1(\epsilon), N_2(\epsilon))$,
Note that, as the example in the first part of the above remark shows, condition (ii) of the hypothesis in the Helly-Bray theorem is essential for the conclusion of (1). In that example, $F_n$ is the d.f. of a discrete r.v. $X_n$, and $P[X_n > a] \to 1$ as $n \to \infty$ for each $a > 0$. Thus the probability "escapes to infinity," and condition (ii) is simply to prevent this phenomenon from happening, so that (1) is true.
We present an alternative sketch of the important Helly-Bray theorem by reducing it to the Lebesgue bounded convergence theorem through the image probability law (cf. 1.4.1) and the representation in Problem 5b of Chapter 2. [Readers should complete the details given as hints there, if they have not already done so. A more general case will be proved in Proposition 5.4.2.]
Alternative Proof For simplicity we take $G_n, G$ as d.f.s. (The general case, which can be reduced to this, is left to the reader.) Thus $G_n(x) \to G(x)$ as $n \to \infty$ for all continuity points $x$ of $G$. Hence by the above noted problem, there exists a probability space $(\Omega, \Sigma, P)$ and a sequence of r.v.s $X_n, X$ on it such that
and $X_n \to X$ a.e. [In fact, $(\Omega, \Sigma, P)$ can be taken as the Lebesgue unit interval, and $X_n(\omega) = G_n^{-1}(\omega)$, $\omega \in (0,1)$, where $G_n^{-1}$ is the (generalized) inverse of $G_n$, i.e., $G_n^{-1}(\omega) = \inf\{y \in \mathbb{R} : G_n(y) > \omega\}$, and similarly $X(\omega) = G^{-1}(\omega)$, $\omega \in (0,1)$.] Now if $f$ is as given, $f(X_n) \to f(X)$ a.e. and $f(X)$ is an r.v. By the image law 1.4.1, we have
$= \int_{\mathbb{R}} f(x)\, dG(x).$
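The device used in this proof is easy to reproduce numerically. A minimal Python sketch (NumPy assumed; exponential d.f.s with rates $1 + 1/n$ and $f = \cos$ are arbitrary choices): realizing $X_n = G_n^{-1}(U)$ and $X = G^{-1}(U)$ on a common uniform sample makes $\int f\, dG_n \to \int f\, dG$ visible as an ordinary averaging computation.

import numpy as np

rng = np.random.default_rng(8)
U = rng.uniform(size=200000)                     # points of the unit interval
f = np.cos
G_inv = lambda u, lam: -np.log(1.0 - u) / lam    # generalized inverse of 1 - exp(-lam*x)

target = f(G_inv(U, 1.0)).mean()                 # approximation of  int f dG
for n in (1, 10, 100, 1000):
    approx = f(G_inv(U, 1.0 + 1.0 / n)).mean()   # approximation of  int f dG_n
    print(n, approx, "  target:", round(target, 4))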
The technique of the second proof above will be used to establish the following result on the convergence of moments.
and
$E(|X|^r) = \int_{\mathbb{R}} |x|^r\, dF(x) = \int_{\Omega'} |Y|^r\, dP'.$  (7)
Since $0 < r < s$, the second condition implies the uniform integrability of $\{|Y_n|^r, n \ge 1\}$. Indeed, we have for any $\alpha > 0$,
uniformly in $n$. Hence, since $|Y_n|^r \to |Y|^r$ a.e., by the Vitali convergence theorem we deduce that
Using the same ideas as in the above proof, we can also deduce the following result, which complements Theorem 2 in some respects.
Proof The method of proof here is to show that each part is equivalent to (i) or (ii). Now the Helly-Bray theorem already established that (i)$\Rightarrow$(ii). For the converse (cf. also the remark after the proof), let $\epsilon > 0$ be given. If $x_0$ is a point of continuity of $F$, there is a $\delta[= \delta(x_0, \epsilon) > 0]$ such that $|x - x_0| < \delta \Rightarrow |F(x) - F(x_0)| < \epsilon/2$. We construct two bounded continuous functions $f_1, f_2$ on $\mathbb{R}$, $f_1 \le f_2$, as follows:
Hence
Similarly
By (17)-(19) and (11) we have $d(F_n, F) \le \epsilon$. Thus (iii) holds.
Conversely, if (iii) is given, let $\epsilon > 0$ and $x_0$ be a continuity point of $F$. Then there is a $\delta_1[= \delta_1(x_0, \epsilon) > 0]$ such that for $0 < \delta < \delta_1$, $|x - x_0| < \delta \Rightarrow |F(x) - F(x_0)| < \epsilon/2$. If $\eta = \min(\epsilon/2, \delta) > 0$, then by (iii) there exists $N_0$ such that $n \ge N_0 \Rightarrow d(F_n, F) < \eta$, and from (11) we have
(i)$\Rightarrow$(iii) can be quickly proved using Exercise 5(b) of Chapter 2 again. Thus $F_n \to F$, a d.f., $\Rightarrow X_n \to X$ a.e. on $(\Omega', \Sigma', P')$, $F_n = F_{X_n}$, $F = F_X$. So for $\epsilon > 0$, there is an $n_0$, $n \ge n_0 \Rightarrow P'[|X_n - X| < \epsilon] \ge 1 - \epsilon$. If $\Omega_0 = [|X_n - X| < \epsilon]$, then on $\Omega_0$, $X_n - \epsilon < X < X_n + \epsilon \Rightarrow F_n(x - \epsilon) - \epsilon \le F(x) \le F_n(x + \epsilon) + \epsilon$. Hence $d(F_n, F) \le \epsilon$.
h k ( ~ ) P ( d-~ ) (21)1.
and the left side is equal to $1 - \limsup_n P_n(C)$. Hence (iv) is true. Conversely, if (iv) holds, then considering the complements as here, we get (v). Thus (iv)$\Leftrightarrow$(v) is always true.
(v) and (iv) together imply (i). Indeed, let $A \subset \mathbb{R}$ be a Borel set whose boundary has $P$-measure zero. [For example, if $A = (a, b)$, then $\{a\}, \{b\}$ have $P$-measure zero, which is equivalent to saying that $a, b$ are continuity points of $F$.] Thus $P(\bar{A} - \mathrm{int}(A)) = 0$, where $\bar{A}$ is the closure of $A$ and $\mathrm{int}(A)$ is the interior of $A$. Thus by (v) and its equivalence with (iv) we have
$P(\bar{A}) \ge \limsup_n P_n(\bar{A}) \ge \limsup_n P_n(A) \ge \liminf_n P_n(A) \ge \liminf_n P_n(\mathrm{int}(A)) \ge P(\mathrm{int}(A))$ [by (v)]
But the extremes are equal. Thus $\lim_n P_n(A) = P(A)$ for every Borel $A$ whose boundary has $P$-measure zero. In particular, if $A_x = (-\infty, x)$ and noting that $F_n(x) = P_n(A_x)$, $F(x) = P(A_x)$, this yields (i). Thus the proof is complete.
Proposition 6 Let $\{F_n, n \ge 1\}$ be a sequence of d.f.s having moments of all orders $\{\alpha_n^{(k)}, k \ge 1\}$. Thus we have
If $\lim_{n \to \infty}\alpha_n^{(k)} = \alpha^{(k)}$ exists for each $k \ge 1$, and if $\{\alpha^{(k)}, k \ge 1\}$ determines a distribution $F$ uniquely, then $F_n(x) \to F(x)$, as $n \to \infty$, holds for each $x$ which is a continuity point of $F$, so that $F_n \to F$ as $n \to \infty$.
$\ge \lim_{k \to \infty}\Big[1 - \frac{1}{a^2}\int_{|x| > a} x^2\, dF_{n_k}(x)\Big] \ge \lim_{k \to \infty}\Big[1 - \frac{1}{a^2}\int_{\mathbb{R}} x^2\, dF_{n_k}(x)\Big] = 1 - \alpha^{(2)}/a^2$ [since $\alpha_{n_k}^{(2)} \to \alpha^{(2)}$, by hypothesis] $\ge 1 - \epsilon$.
Since $\epsilon > 0$ is arbitrary, we deduce that $F(\cdot)$ is a d.f., and $F_{n_k} \to F$.
Hence there is a probability space $(\Omega, \Sigma, P)$ (as in the proof of Proposition 4) and a sequence of r.v.s $\{X_{n_k}, Y, k \ge 1\}$ such that $X_{n_k} \to Y$ a.e. and $P[X_{n_k} < x] = F_{n_k}(x)$, $P[Y < x] = F(x)$, $x \in \mathbb{R}$. Also for each integer $p \ge 1$, $0 \le \alpha_{n_k}^{(2p)} \to \alpha^{(2p)}$, so that $\{\alpha_{n_k}^{(2p)}, k \ge 1\}$ is bounded. By Proposition 3, $E(|X_{n_k}|^q) \to E(|Y|^q)$, $0 < q < 2p$. Hence by Proposition 1.4.6, $\{|X_{n_k}|^q, k \ge 1\}$ is uniformly integrable. But this implies that $\{X_{n_k}^q, k \ge 1\}$, $q \ge 1$ integer, is also uniformly integrable. Consequently by Theorem 1.4.4, $E(X_{n_k}^q) \to E(Y^q)$, and the $\alpha^{(q)}$, $1 \le q < 2p$, are the $q$th moments of $F$. Since $p$ is arbitrary, it follows that $\{\alpha^{(q)}, q \ge 1\}$ are all the moments of $F$, and by hypothesis these determine $F$ uniquely.
If $\{F_{n'}, n' \ge 1\} \subset \{F_n, n \ge 1\}$ is any other convergent subsequence, then the preceding paragraph shows that $F_{n'} \to F'$, a d.f. Since $F'$ also has $\{\alpha^{(q)}, q \ge 1\}$ as its moments, by the uniqueness hypothesis $F = F'$. But by Theorem 5iii, the set of distribution functions on $\mathbb{R}$ is a metric space under convergence in the d.f. topology, and in a metric space a sequence converges iff each of its convergent subsequences has the same limit. Thus the full sequence $\{F_n, n \ge 1\}$ converges to $F$, completing the proof.
This result is of use in applications only if we have some criteria for the unique determination of d.f.s by their moments. The question involved here is nontrivial, and there has been a considerable amount of research on what is called the "moment problem." For an account, see Shohat and Tamarkin (1950). For instance, if $S = \{x : 0 < F(x) < 1\} \subset \mathbb{R}$ is bounded, then $F$ is uniquely determined by its moments. This and certain other easily verifiable sufficient conditions can be obtained from the work on characteristic functions of distribution functions. We now turn to a detailed analysis of these functions and devote the rest of the chapter to this topic, as it is one of the most effective tools in the subject.
This concept was already introduced for, and the uniform continuity of $\phi : \mathbb{R} \to \mathbb{C}$ was established in, Proposition 1.4.2. It is clear that complex analysis plays a role, since generally $\phi$ is complex valued. Note that by the image probability law (Theorem 1.4.1), (1) is equivalent to saying that, if $X$ is an r.v. on $(\Omega, \Sigma, P)$ with $F$ as its d.f., then
Thus $F_X$ is the convolution of $F_{X_1}$ and $F_{X_2}$ (already seen in Problem 6(b) of Chapter 2). Also it is a commutative operation. Equations (4) and (5) together imply that the ch.f. of the convolution of a pair of d.f.s is the product of their ch.f.s. It is also clear from (5) that, if $F_{X_1}$ and $F_{X_2}$ have densities $f_1$ and $f_2$ relative to the Lebesgue measure, then the convolution $F_X = F_{X_1} * F_{X_2}$ becomes
Note also that (5) implies $F_X$ is absolutely continuous if either $F_{X_1}$ or $F_{X_2}$ has this property.
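A minimal empirical check of this multiplicative property (NumPy assumed; the uniform and exponential summands are arbitrary choices): estimating each ch.f. by the sample mean of $e^{itX}$, the ch.f. of the sum of two independent r.v.s agrees with the product of the individual ch.f.s up to sampling error.

import numpy as np

rng = np.random.default_rng(9)
n = 400000
X1 = rng.uniform(-1.0, 1.0, size=n)
X2 = rng.exponential(1.0, size=n)
emp_chf = lambda sample, t: np.mean(np.exp(1j * t * sample))
for t in (0.5, 1.0, 2.0):
    lhs = emp_chf(X1 + X2, t)
    rhs = emp_chf(X1, t) * emp_chf(X2, t)
    print(t, abs(lhs - rhs))        # small sampling error only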
To get some feeling for these functions, we list a set of basic d.f.s that occur frequently in the theory, and then their ch.f.s will be given.
2. Poisson:
$F(x) = \begin{cases} \sum_{0 \le k < x} e^{-\lambda}\dfrac{\lambda^k}{k!}, & x > 0, \\ 0, & x \le 0, \end{cases}$ for $0 < \lambda < \infty$ (we take $\lambda^0 = 1$ for $\lambda > 0$).
3. Cauchy:
4. Gamma:
5. Uniform:
$F(x) = \begin{cases} 0, & x < a,\ a \in \mathbb{R}, \\ \dfrac{x-a}{b-a}, & a \le x < b,\ b \in \mathbb{R}, \\ 1, & x \ge b. \end{cases}$
6. Bernoulli:
8. Binomial:
$F(x) = \sum_{0 \le k < x,\ k \in \mathbb{Z}} \binom{n}{k} p^k (1-p)^{n-k}$ for $x > 0$, and $= 0$ otherwise.
We leave a verification of these formulas to the reader, with a reminder that one can use, for convenience, the calculus of residues for some of these evaluations (such is the case with the Cauchy and Gaussian distributions).
The interest in ch.f.s stems from the fact that there is a one-to-one relation between the d.f.s and their ch.f.s, and since the latter are (uniformly) continuous, they are more suitable for a finer analysis using the well-developed results from Fourier transform theory. The uniqueness statement is a consequence of the following important result.
$F'(x) = f(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\phi(t)\, dt.$
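A minimal numerical sketch of the inversion formula (9) (NumPy assumed; the truncation $T$, the integration grid, and the choice of the standard normal ch.f. $\phi(t) = e^{-t^2/2}$ are arbitrary): the truncated integral recovers the $N(0,1)$ density at a few points.

import numpy as np

phi = lambda t: np.exp(-0.5 * t**2)      # ch.f. of the standard normal law
T, dt = 40.0, 0.01
t = np.arange(-T, T, dt)
for x in (0.0, 1.0, 2.0):
    f_x = np.real(np.sum(np.exp(-1j * t * x) * phi(t)) * dt / (2.0 * np.pi))
    exact = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    print(x, f_x, exact)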
Proof The importance of the result lies in the discovery of the formula (8). Once it is given, its truth can be ascertained by substitution as follows.
$= \frac{1}{\pi}\int_{\mathbb{R}}\int_{-T}^{T}\frac{\sin ht}{t}\, e^{it(x-a)}\, dt\, dF(x)$ (by Fubini's theorem)
[where $\frac{\sin ht}{t}\cos t(x-a)$ is an even function of $t$, while $\frac{\sin ht}{t}\sin t(x-a)$ is an odd function of $t$.]
$= \frac{1}{\pi}\int_0^T \frac{\sin(x-a+h)t - \sin(x-a-h)t}{t}\, dt = \frac{1}{\pi}\int_0^{(x-a+h)T}\frac{\sin u}{u}\, du - \frac{1}{\pi}\int_0^{(x-a-h)T}\frac{\sin u}{u}\, du.$
$\lim_{T \to \infty} G_T(x) = \frac{1}{2}\big[\mathrm{sgn}(x - a + h) - \mathrm{sgn}(x - a - h)\big] = \begin{cases} 0, & x < a - h, \\ \frac{1}{2}, & x = a - h, \\ 1, & a - h < x < a + h, \\ \frac{1}{2}, & x = a + h, \\ 0, & x > a + h, \end{cases}$
$\lim_{T \to \infty}\frac{1}{\pi}\int_{-T}^{T}\frac{\sin ht}{t}\, e^{-ita}\phi(t)\, dt = \int_{a-h}^{a+h} 1\, dF(x) = F(a+h) - F(a-h),$
since $(dF)(a \pm h) = 0$, by hypothesis. This establishes (8).
For the second part, let $\phi$ be integrable. Dividing both sides of (8) by $2h$, and noting that $(\sin ht)/ht$ is bounded (by 1) for all $t$, we can first let $T \to \infty$ and then $h \to 0$, both by the dominated convergence theorem, since $|\phi(\cdot)|$ is the dominating integrable function. It follows that the right side has a limit, and so must the left side; i.e., $F'(a)$ exists. Since
$\lim_{h \to 0}\frac{\sin ht}{ht} = 1,$
(8) reduces to (9). Further, for all $x$, it is clear that $|f(x)| \le \frac{1}{2\pi}\int_{\mathbb{R}}|\phi(t)|\, dt$, and $f$ is bounded. Also, expressing $\phi$ by its real and imaginary parts and the latter by the positive and negative parts, we deduce that $\phi$ is a linear combination of four nonnegative integrable functions (since $|\phi|$ is), and (9) implies that $f$ is a sum of four terms, each of which is a Fourier transform of a nonnegative integrable function. By Proposition 1.4.2, it follows that each of the terms is continuous, and hence so is $f$. This proves the theorem completely.
Proof By definition, every d.f. associates a ch.f. with it. If two d.f.s $F_1$ and $F_2$ have the same ch.f. $\phi$, we need to show that $F_1 = F_2$. To this end, since each $F_i$ has at most a countable set of discontinuities, the collection of continuity points for both $F_1$ and $F_2$, say, $C_0$, is the complement of a countable set, and hence is everywhere dense in $\mathbb{R}$. Let $a_i \in C_0$, $i = 1, 2$, $a_1 < a_2$. Then by (8),
$F_1(a_2) - F_1(a_1) = F_2(a_2) - F_2(a_1),$  (11)
since their right sides are equal. If $P_i$ is the Lebesgue-Stieltjes probability determined by $F_i$ on $\mathbb{R}$, then (11) implies that $P_1(A) = P_2(A)$ for all intervals $A \subset \mathbb{R}$ with end points in $C_0$. Consequently, $P_1$ and $P_2$ agree on the semiring generated by such intervals. Since $C_0$ is dense in $\mathbb{R}$, the $\sigma$-algebra generated by this semiring is the Borel $\sigma$-algebra of $\mathbb{R}$. But the $P_i$ are $\sigma$-additive on the semiring, and agree there. By the Hahn extension theorem, they have unique extensions to the Borel $\sigma$-algebra of $\mathbb{R}$ and agree there. Thus $P_1 = P_2$ on this $\sigma$-algebra, so that if $A = (-\infty, x)$, $x \in \mathbb{R}$, which is a Borel set, we get
and the d.f.s are identical. (A direct proof of this for d.f.s is also easy.)
Proposition 3 Let $F$ be a d.f. and $\phi$ its ch.f. Then for any $b > 0$, we have
$\int_0^b\big[F(a + x) - F(a - x)\big]\, dx = \frac{1}{\pi}\int_{\mathbb{R}}\frac{1 - \cos bt}{t^2}\, e^{-ita}\phi(t)\, dt, \quad a \in \mathbb{R}.$  (12)
Proof Replacing $\phi$ by its definition and simplifying the right side, exactly as in the proof of Theorem 1, we get the left side. However, for variety we present an alternative argument, following H. Cramér, and deduce the result from Theorem 1.
Let $h > 0$ be arbitrarily fixed and consider $G(x) = \frac{1}{h}\int_x^{x+h} F(y)\, dy$. Then $G$ is a continuous d.f. In fact, if $\tilde{F}$ is the uniform d.f. on $(-h, 0)$, as defined in the list above, then $G(x) = \int_{\mathbb{R}} F(x - y)\, d\tilde{F}(y) = (F * \tilde{F})(x)$ is the convolution. Since $\tilde{F}$ is continuous, $G$ is also. Let $\psi$ be the ch.f. of $G$. Then $\psi(t) = \phi(t)\tilde{\phi}(t)$, $t \in \mathbb{R}$, where $\tilde{\phi}(t) = (1 - e^{-ith})/ith$, $\tilde{\phi}(\cdot)$ being the ch.f. of $\tilde{F}$.
Hence by Theorem 1,
$G(a + h) - G(a) = \lim_{T \to \infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita} - e^{-it(a+h)}}{it}\,\psi(t)\, dt$
Proof The necessity follows from our previous work. Indeed, let $F_n \to F$, a d.f. Then by the Helly-Bray theorem (cf. Theorem 1.2, which is valid for complex functions also, by treating separately the real and imaginary parts), with $f(x) = e^{itx}$,
The converse requires more detail and uses the preceding technical result. Since $\{F_n, n \ge 1\}$ is uniformly bounded, by the Helly selection principle (Theorem 1.1), there is a subsequence $F_{n_k} \to F$, where $0 \le F \le 1$ and $F$ is a left continuous nondecreasing function. We first claim that $F$ is a d.f., using the hypothesis
Since
$\frac{1 - \cos bt}{t^2} = \frac{2\sin^2(bt/2)}{t^2}$
is integrable on $\mathbb{R}$, by the dominated convergence theorem we can let $k \to \infty$ on both sides of (15) to get
$\frac{1}{\pi}\int_{\mathbb{R}}\frac{1 - \cos u}{u^2}\,\phi\Big(\frac{u}{b}\Big)\, du.$
Letting $b \to +\infty$ and using L'Hôpital's rule on the left and the dominated convergence theorem on the right, we get
Thus $F$ is a d.f., and by the necessity proof we can now conclude that $\phi$ is the ch.f. of $F$.
Let $\{F_{n'}, n' \ge 1\}$ be any other convergent subsequence of $\{F_n, n \ge 1\}$, with limit $\tilde{F}$. Then by the preceding paragraph $\tilde{F}$ is a d.f. with ch.f. $\phi$ again (since $\phi_n \to \phi$, implying that every convergent subsequence has the same limit). By the uniqueness theorem (Corollary 2), $\tilde{F} = F$. Hence all convergent subsequences of $\{F_n, n \ge 1\}$ have the same limit d.f. $F$, so that the whole sequence $F_n$ converges to the d.f. $F$ with ch.f. $\phi$. This completes the proof.
The preceding remark and the theorem itself heighten the interest in
the structure of ch.f.s. First, how does one recognize a uniformly continuous
bounded function to be a ch.f., and second, how extensive and constructible
are they? (These are nontrivial.) Regarding the second problem, note that
since the product of two (or a finite number of) ch.f.s is a ch.f. [cf. (3) and
(4)], we may construct new ones from a given set. In fact, if {φ_n, n ≥ 1} is a
sequence of ch.f.s and a_n ≥ 0 with Σ_{n≥1} a_n = 1, then

    φ = Σ_{n≥1} a_n φ_n                                            (16)

is also a ch.f. Indeed, if F_n is the d.f. corresponding to the ch.f. φ_n, then
F = Σ_n a_n F_n is clearly a d.f., and its ch.f. is given by (16).
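To make (16) concrete, here is a small numerical sketch (ours; the weights, component laws, and sample size are arbitrary choices): the empirical ch.f. of samples from a two-component normal mixture agrees with the weighted sum of the component ch.f.s.

```python
# Sketch: phi(t) = sum_n a_n phi_n(t) is the ch.f. of the mixture d.f. F = sum_n a_n F_n.
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.3, 0.7])                  # mixing weights, a_n >= 0, sum = 1
mu = np.array([-2.0, 1.0])                # component means
sd = np.array([1.0, 0.5])                 # component standard deviations

n = 200_000
comp = rng.choice(2, size=n, p=a)         # pick a component, then draw from it
x = rng.normal(mu[comp], sd[comp])

t = np.linspace(-3, 3, 13)
emp = np.exp(1j * np.outer(t, x)).mean(axis=1)                      # empirical ch.f.
exact = sum(a[k] * np.exp(1j*t*mu[k] - 0.5*(sd[k]*t)**2) for k in range(2))
print(np.max(np.abs(emp - exact)))        # small: Monte Carlo error only
```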
The preceding example admits the following extension.
Proof Let G = R. The first part of the hypothesis implies that |h(s, t)| ≤
h(s, 0) = 1, s ∈ R and t ∈ R, so that the integral in (17) exists. By the

But H_n is a discrete d.f., since X_n is a discrete r.v. Hence the left-side integral,
for each n, is of the form (16) with a finite sum, so that it is a ch.f., say, ψ_n(·).
By (18), ψ_n(t) → ψ(t), t ∈ R, as n → ∞. Since h(s, ·) is continuous, it follows
immediately that the right-side integral of (18) is continuous on R. Theorem
4 then implies that φ is a ch.f. The case when G = Z is simpler and is left to
the reader.

The last part is immediate. In fact, expanding the exponential,

    φ(t) = Σ_n a_n(λ) ψ(t)^n,

where a_n(λ) ≥ 0, Σ_n a_n(λ) = 1, and ψ(t)^n is a ch.f. for each n. This is thus
of the form (16), so that φ(·) is a ch.f. as shown there. The proof is completed.
Using this proposition, it is clear that we can generate a great many ch.f.s
from those of the list given at the beginning of this section. The first ques-
tion is treated in Section 4. To gain further insight into this powerful tool, we
consider some differentiability properties.
whenever this limit exists. Similarly the pth symmetric derivative, if it exists,
is given by the expression

    lim_{h→0} (1/(2h)^p) Σ_{k=0}^{p} (-1)^k (p choose k) f(x + (p - 2k)h).

If f has a pth (p ≥ 1) ordinary derivative, then it also has the pth symmetric derivative and
they are equal. But the converse is false, cf. the example after the proof. (This
is also a standard fact of differentiation theory.) In our case, f = φ, x = 0.
If p = 2m + 1, or p = 2m, then 2[p/2] = 2m, m ≥ 1. Since φ^{(p)}(0) exists by
hypothesis, φ^{(2m)}(0) exists, and hence so does the 2m-th symmetric derivative, which equals φ^{(2m)}(0).
Thus we have

    (-1)^m φ^{(2m)}(0) = lim_{t→0} ∫_R ((sin tx)/t)^{2m} dF(x)
                       ≥ ∫_R liminf_{t→0} ((sin tx)/t)^{2m} dF(x)   (by Fatou's lemma)
                       = ∫_R x^{2m} dF(x).

Hence the moment α_{2m} exists. With this, the dominated convergence theorem applied to the
difference quotient [φ(t + h) - φ(t)]/h shows that the ordinary derivative
exists. The general case for p > 1 follows by induction, or by a similar direct
argument. Note that φ^{(p)}(0) = i^p ∫_R x^p dF(x) = i^p α_p, where α_p is the pth
moment of F, if α_p exists. This completes the proof.
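As a quick numerical sanity check (ours, not from the text), the second symmetric difference of a ch.f. at the origin approximates φ''(0) = i²α_2 = -α_2; for the standard normal, φ(t) = e^{-t²/2} and α_2 = 1.

```python
# Second symmetric difference quotient of phi at 0 approximates phi''(0) = -alpha_2.
import numpy as np

def phi(t):                      # ch.f. of N(0,1)
    return np.exp(-t**2 / 2.0)

for h in (1e-1, 1e-2, 1e-3):
    d2 = (phi(h) - 2.0*phi(0.0) + phi(-h)) / h**2
    print(h, d2)                 # tends to -1.0 = -alpha_2 as h -> 0
```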
To see that the first part cannot be strengthened, consider the example of
a symmetric density f given by
where
Then it is not difficult to verify that φ^{(3)}(0) exists, and φ^{(3)}(0) = 0. But it is
clear that E(|X|³) = ∫_R |x|³ f(x) dx = +∞. The details are left to the reader.

These expansions will be useful in some calculations for the weak limit
laws in the next chapter. An immediate simple but useful observation is that
if ψ : R → R is any continuous function which can be expanded in the form
ψ(t) = 1 + O(|t|^{2+ε}), ε > 0, it will not be a ch.f. unless ψ(t) = 1, t ∈ R, so
that it is the ch.f. of the unitary distribution at the origin. Indeed, if ψ(·) is
a ch.f. with such an expansion, by the above corollary, α_1 = α_2 = 0. If the
second (even, or absolute) moment is zero, then the d.f. concentrates at the
origin, proving our claim. In Section 4, we characterize functions which are
ch.f.s, but this is a nontrivial problem, and in general there is no easy recipe
for recognizing them.
To illustrate the power of ch.f.s we present an application characterizing
the Gaussian law. If X_1, X_2 are two independent r.v.s with means μ_1, μ_2 and
variances σ_1², σ_2² > 0, then Y_i = (X_i - μ_i)/σ_i, i = 1, 2, are said to have a
"reduced law," since E(Y_i) = 0 and Var Y_i = 1, i = 1, 2. Note that Y_1, Y_2 are
still independent. Now if X_i is Gaussian, N(μ_i, σ_i²), then it is very easy to
find that E(X_i) = μ_i and Var X_i = σ_i², i = 1, 2. Hence the sum X = X_1 + X_2
is seen to be N(μ, σ²), where μ = μ_1 + μ_2 and σ² = σ_1² + σ_2². Indeed, using
the ch.f.s we have

    φ(t) = E(e^{it(X_1+X_2)}) = E(e^{itX_1}) E(e^{itX_2})   (by independence)
upon Pólya's above noted work and the second one is noted by Ibragimov and
Linnik (1971), p. 32.

(ii) Let X, Y be r.v.s with ch.f.s φ_X, φ_Y. If Y is normal N(μ, σ²), and
there is a bounded countable set of distinct real values u_n, n ≥ 1, such that
φ_X(u_n) = φ_Y(u_n), n ≥ 1, then φ_X(u) = φ_Y(u) for all u ∈ R, so that X is
also N(μ, σ²).

Proof (i) If they are Gaussian, then (20) shows that the reduced law of
the sum is of the same form. Thus only the converse is new. This is trivial if
σ_1² σ_2² = 0. Thus let σ_i² > 0, i = 1, 2.

Let Y_1, Y_2 be the reduced r.v.s from X_1, X_2 and Z be that of X_1 + X_2. By
hypothesis, Y_1, Y_2 and Z have the same d.f., or equivalently the same ch.f.,
= ψ (say). If φ_1, φ_2, and φ_3 are the ch.f.s of X_1, X_2, and X_1 + X_2, then we
have the relations [since X_i = σ_i Y_i + μ_i, X_1 + X_2 = (σ_1² + σ_2²)^{1/2} Z + (μ_1 + μ_2)]:

    φ_3(t) = e^{i(μ_1+μ_2)t} ψ((σ_1² + σ_2²)^{1/2} t) = φ_1(t) φ_2(t),   t ∈ R.

This equation simplifies on substitution for φ_j(t) to
Repeating the procedure for each term, we get at the nth stage

    log φ(t) = -(1/2) t² (α² + β²)^n + R_n,                              (26)

    R_n = t² [p_0 δ_0 α^{2n} + p_1 δ_1 α^{2n-2} β² + ... + p_n δ_n β^{2n}],

where p_0, p_1, . . . , p_n are the coefficients in the binomial expansion of (1 + x)^n =
Σ_{i=0}^{n} p_i x^i, which are integers, and |δ_j| < 1. Because 0 < α < 1, 0 < β < 1, it is clear that
for each ε > 0 we can choose an N such that n > N implies |δ_j| <
ε/t² for the given |t| > 0. Since α² + β² = 1, we get |R_n| < ε. Hence (26) for
n ≥ N becomes

    |log φ(t) + t²/2| ≤ |R_n| < ε.                                       (27)

Since the left side is independent of n, and ε > 0 is arbitrary,

    φ(t) = e^{-t²/2}.                                                    (28)

But (23) shows that (25), (26), and hence (27), are valid for any t, since for
n large enough α^{n-k} β^k t is in a neighborhood of the origin. Thus the result
holds for all t ∈ R, and by the uniqueness theorem, (28) implies that φ is a
Gaussian ch.f.
This shows
and Re(f) are continuous on [0, t_1] and f(t) ≠ 0 on [0, t_1]. Now extend g by
defining it inductively as:

Alternative Proof (due to K.L. Chung (1968), p. 241) The argument is based
on the Maclaurin expansion of the log function about 1, and is slightly
longer, and is as follows. One has

    Log z = Σ_{n≥1} ((-1)^{n-1}/n) (z - 1)^n,   |z - 1| < 1,

the expansion about z = 1. If h(t) = Log f(t) here, then h(0) = 0 and h(·) is continuous. Let
a > 0, and for ε = 1/2, we find a δ > 0 and a partition of [-a, a], -a = t_{-p} <
t_{-p+1} < · · · < t_0 < t_1 < · · · < t_p = a, t_{j+1} - t_j = t_1 - t_0 < δ, such that
|f(t_1) - f(t)| < 1/2 for t ∈ [t_{-1}, t_1], and |f(t) - 1| = |f(t) - f(0)| < 1/2. Hence
Log f = h is well-defined by the series with z = f(t), t ∈ [t_{-1}, t_1], h(0) = 0.
As a power series, h(·) is continuous, and |f(t) - f(t_i)| = |f(t_i)| |f(t)/f(t_i) - 1| <
1/2, t ∈ [t_i, t_{i+1}], since |f(t_i)| ≤ 1. We may extend h onto [t_i, t_{i+1}] by setting

In the same way it can be extended to the left, so that h(·) is defined on
[-a, a]. As before, we can iterate the procedure to R = ∪_{n>0}[-n, n], by
compactness. The uniqueness is, as before, immediate and the result (the
unique continuous representation of Log f) follows.
Remark In the first edition of this book, the proof given was analogous
to the first one above. But an anonymous reader indicated that the series
argument is superior to the first one. So we have included both methods of
proof, which might appeal to a wider audience.

A useful consequence of the above proposition is given by

Proof Since f_n, f do not vanish, by the above proposition Log f_n and
Log f exist and
Proof We establish (1) for x ∈ R, h > 0 such that x and x + h are points
of continuity of G. The latter set is dense in R. This will prove (1) since both
sides of (1) are continuous in h and x for each 0 < α < 1. To see this, let
(u - x)/h = v in the left side of (1). It becomes

as x → x_0, since G(x + vh) → G(x_0 + vh) a.e. (Lebesgue) as x → x_0. Also,
the left side tends to zero as h → 0, by bounded convergence. Hence it is
continuous in x and h. For the right side of (1), since G = G_1 - G_2, where
the G_i are bounded and nondecreasing with one moment existing, their Fourier
transforms are differentiable by Proposition 2.8, and hence so is g(t). Since
g(0) = 0, this implies g(t) = tg'(0) + o(t), so that g(t)/t = O(1) as t → 0.
Regarding the last integral,
= ( a ) 1 x
z+h
(y - x)YdG(y) + ( h Y / a ) G ( x+ h)
(after simplification)
    f(x) = F'(x) = (1/2π) ∫_R e^{-itx} φ(t) dt,   x ∈ R,                  (5)

and the given asymptotic growth rate implies ∫_{|t|≥1} |g(t)/t| dt < ∞. Thus
we can apply the second part of Theorem 1 to get

Proof Since φ(t, u) = φ_1(t) φ_2(u), the result follows from (4) because the
integrability conditions of (6) are satisfied as t → 0, and for |t| ≥ M > 0, we
have

The current hypothesis implies that this is integrable on |t| > M. The rest is
exactly as in the theorem, and the result holds as stated.
=
1
lim
1 + (iz/T)
-
271 T-a [ ( I + x2)([1+ T 2 + 2 i T ~ j / T ~ ) l / ~
Thus the distribution of the ratio of two dependent r.v.s, neither of which is
N(0, 1), is again Cauchy.
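This phenomenon is easy to observe by simulation. The sketch below (ours; the correlation and variances are arbitrary, and the location/scale parametrization of the limiting Cauchy law is the standard one for a ratio of centered jointly normal variables) compares empirical quantiles of the ratio with Cauchy quantiles.

```python
# Ratio of two correlated centered normals (neither N(0,1)) compared with a Cauchy law
# of location rho*sx/sy and scale (sx/sy)*sqrt(1 - rho^2).
import numpy as np

rng = np.random.default_rng(5)
rho, sx, sy = 0.6, 2.0, 1.5
cov = [[sx**2, rho*sx*sy], [rho*sx*sy, sy**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T
r = x / y

loc = rho * sx / sy
scale = (sx / sy) * np.sqrt(1.0 - rho**2)
probs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
emp = np.quantile(r, probs)
cauchy = loc + scale * np.tan(np.pi * (probs - 0.5))    # Cauchy quantile function
print(np.round(emp, 3))
print(np.round(cauchy, 3))                              # the two rows nearly agree
```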
nature and the third one is more Fourier analytic in content. We therefore
give only a probabilistic proof, which is due to Cramér.

Proof The necessity is simple and classical, due to M. Mathias, who ob-
served it in 1923. Thus if φ is the ch.f. of a d.f. F, then φ is continuous,
φ(0) = 1, and

since the integral exists on the compact set [0, T] × [0, T], and is a limit of
the finite (Riemann) sums Σ_i Σ_j φ(t_i - t_j) f(t_i) f̄(t_j) Δt_i Δt_j. But the latter

so that

|x| ≤ N, and = 0 for |x| > N. Then ψ_N p_T : R → R^+ is continuous, and
ψ_N(x) ↑ 1 as N → ∞, for each x ∈ R. Hence

where we used the dominated convergence theorem to move the limit inside
the integral [since (sin v / v)² is integrable] and then the continuity of φ at
t = 0, plus the fact that φ(0) = 1, implying φ_T(0) = 1, and |φ(t)| ≤ 1 (cf.
Proposition 3 below).
(ii) To see that 4T is the ch.f. of p ~ we, use the just established integra-
bility of p ~ and
, the dominated coilvergelice t o conclude
= lim
N-oo
S, e""$N(x)m(x) dx [$.N (.) is defined above for (4)]
Proof If b_{ij} = φ(t_i - t_j) and a = (a_1, . . . , a_n)^t, then (1) implies that the
matrix B = (b_{ij}) is positive definite; i.e., using the inner product notation,
t for conjugate transposition of a vector or matrix, and B* for the adjoint
of B, we get by positive definiteness:

    4(Ba, b) = (B(a + b), a + b) - (B(a - b), a - b)
               + i(B(a + ib), a + ib) - i(B(a - ib), a - ib)
             = 4(B*a, b)   [by (5)].

The positive definiteness of B implies that each of its principal minors has a positive determinant. Thus
|φ(t)|² ≤ φ(0)², which is (iii). That (iv) is true follows from (1) itself (or also
from (5)).

For (v) consider the third order principal minor B_3 of B. Excluding the
trivial case that φ(0) = 0, we may and do normalize: φ(0) = 1 [since otherwise
ψ(t) = φ(t)/φ(0) will satisfy the conditions]. Then the determinant of B_3 is

which tends to zero uniformly as t → t'. This establishes (v), and the proof is
complete.
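The positive definiteness property (i) can be spot-checked numerically. The sketch below (ours; the Cauchy ch.f. and the sampled points are arbitrary illustrative choices) verifies that the matrix (φ(t_i - t_j)) has no negative eigenvalues.

```python
# Numerically check that B = (phi(t_i - t_j)) is positive semidefinite for a known ch.f.,
# here phi(t) = exp(-|t|), the ch.f. of the standard Cauchy law.
import numpy as np

rng = np.random.default_rng(1)
t = rng.uniform(-5, 5, size=8)                    # arbitrary real points t_1, ..., t_n
B = np.exp(-np.abs(t[:, None] - t[None, :]))      # b_ij = phi(t_i - t_j)
eigs = np.linalg.eigvalsh(B)                      # B is (real) symmetric here
print(eigs.min() >= -1e-12)                       # True: all eigenvalues >= 0 up to rounding
```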
p,(x) = iT
iT cj(u - v)ei(u-v)z d u d v > O , xER.
The proof is essentially the same as that of Theorem 2 with simple alter-
ations and is left to the reader.

In view of Proposition 3, it is natural to ask whether the continuity hy-
pothesis can be eliminated from Theorem 2. F. Riesz showed in 1933 that
this is essentially possible, but we get a slightly weaker conclusion. A precise
version of this statement is as follows.
This inequality holds for any vector (t_1, . . . , t_n) ∈ R^n. Hence integrating the
expressions on both sides of (7) relative to the n-dimensional Lebesgue mea-
sure on the compact n-rectangle [0, N]^n ⊂ R^n and using Proposition 3, we get
on the left for the diagonal terms nφ(0)N^n = nN^n, and for the non-diagonal
terms [there are n(n - 1) of them, and φ is a bounded measurable function]

    n(n - 1) N^{n-2} ∫_0^N ∫_0^N φ(t - t') e^{ix(t-t')} dt dt'.

Consequently (7) becomes

But this is 2πN p_N(x) of (3), and we can use the argument there.
Thus if φ_N and p_N are defined as in that theorem, then p_N(x) ≥ 0 and
φ_N(t) = [1 - (|t|/N)] φ(t), where [1 - (|t|/N)] itself defines a ch.f.
Now the work leading to (4) implies that 0 ≤ ∫_R p_N(x) dx ≤ M_0 < ∞,
uniformly in N. Next consider for any u ∈ R
uniformly in N . Next consider for any u E B
= lim
K-oo
1
T
/ 1"4~
0
(t + (a) sin v
2
dt du [by step (ii) after (411
= lim
K-a
11
R v/K
u+v/K
4 N ( r ) dr &)
sin v 2
dv
5 ~ lb
gu(x) d(GNk- G) (x) + ifif0s ~ { g u ( x ): x $ ( a )
The last term can be made small if a, b are chosen suitably large, and then
the integral on the compact set [a, b] goes to zero by the Helly-Bray theorem.   (12)

But the right side of (11) is ∫_0^u φ(t) dt. By this argument we deduce that
each convergent subsequence of {G_N, N ≥ 1} has the same limit = ∫_0^u φ(t) dt,
which is absolutely continuous. It follows that the limit functions of G_N differ
by at most a constant, i.e., dG_N → dG for a unique G, and hence their
Fourier transforms converge, and if ∫_R e^{itx} dG(x) = ψ(t), then ∫_0^u ψ(t) dt =
∫_0^u φ(t) dt, u ∈ R. Thus ψ(t) = φ(t) a.e. But ψ is continuous and ψ(0) = 1 =
φ(0), so that ψ is a ch.f. by Proposition 3. This proves the main part of the
theorem.
For the converse, let φ = ψ a.e., where ψ is a ch.f. and φ(0) = 1. We
again form p_T(x) as in (3) with φ. Since the Lebesgue integral is unaltered if
we replace the integrand by an a.e. equivalent function, p_T remains the same
if ψ is replaced by φ. Consequently (3) implies p_T ≥ 0, and (4) shows p_T is a
probability density. Also, ψ_T is the ch.f. of p_T and hence is positive definite.
The same is true if φ_T is used in place of ψ_T, and φ_T is continuous, being
the product of the bounded function φ and the ch.f. defined there. Thus
φ_T is positive definite and φ_T → φ pointwise as T → ∞. Since a pointwise
limit of a sequence of positive definite functions is clearly positive definite,
it follows that φ is positive definite. Since φ is not necessarily continuous at
t = 0, φ is not generally a ch.f., as simple counterexamples show. This proves
the converse, and the theorem is established.

Even though we used the continuity theorem for ch.f.s as a key result in
the proof of Theorem 2, one can establish that the continuity theorem is a
consequence of Bochner's theorem. Thus there is a certain equivalence be-
tween these results. Let us now establish this statement.
and similarly φ is the ch.f. of μ. Since K^ω is separable, we can use the same
reasoning as in Helly's selection principle and conclude that there is a subse-
quence μ_{n_k} → μ̃, and the compactness of K^ω implies μ̃ is a probability measure
(no mass "escapes" to infinity). By the Helly-Bray theorem (also applicable
here), φ_{n_k}(t) → φ̃(t), the ch.f. of μ̃. But then φ̃ = φ, and so by the uniqueness
theorem μ = μ̃. Repeating the argument for each convergent subsequence, we
deduce that μ_n → μ. (If we had extended Theorem 1.5 somewhat, this could
have been immediately deduced from it.) We still have to prove the result for
the P_n-sequence.
Let R = ∪_k A_k, where the A_k are disjoint bounded Borel sets whose bound-
aries have zero P-measure. If C ⊂ R is any closed set, then (Ā_k denoting the
closure of A_k)

    lim sup_n P_n(C) = lim sup_n Σ_{k≥1} P_n(C ∩ A_k) ≤ Σ_{k≥1} lim sup_n P_n(C ∩ Ā_k).   (15)

Moreover,

    lim sup_n P_n(C ∩ Ā_k) = lim sup_n P_n(τ^{-1}(τ(C ∩ Ā_k))) = lim sup_n p_n(τ(C ∩ Ā_k)),
    φ(t) = φ(0) + ∫_0^t f(u) du,   t ≥ 0,                               (17)

and φ(0) = 1. Here f is the right derivative of φ, which exists and is non-
decreasing. Since φ is decreasing by hypothesis, f must be negative on R^+,
and lim_{u→∞} f(u) = 0 [lim_{t→∞} φ(t) = 0 being the present condition]. We now
complete the proof (using a remark of K. L. Chung) by reducing the result to
Proposition 2.5.

As noted in Remark 2, after Theorem 2.4, we know that h_1(t) = (1 - |t|)^+ is a ch.f.;
then for each t, h(·, t) is also a ch.f., and h(s, ·) is continuous for each s. We
now produce a d.f. H on R using (17), and then get a mixture of h and H to
represent φ and complete the argument.

Since f in (17) is increasing, consider H(s) = ∫_0^s t df(t) = 1 - φ(s) + s f(s).
Then H(·) (≥ 0, nondecreasing) is a d.f., because φ(s) → 0 as s → ∞, so that f(s) = o(s^{-1}),
implying lim_{s→∞} s f(s) = 0. Hence for t ≥ 0, if
The preceding results for ch.f.s are all on R or (the discrete case) the inte-
gers. Some of these extend immediately to k dimensions, but others, such
as Bochner's theorem, are more involved. In this section we indicate these
possibilities, but prove a generalization of Bochner's theorem using the pro-
jective limit method. This section may be skipped on a first reading.
is a d.f., where ΔF_n is the (n-dimensional) increment of F_n. The ch.f. is defined
as usual as

Hence

Using this result, for instance, the multidimensional continuity theorem for
ch.f.s can be reduced to the one-dimensional case. We sketch this argument
here. If P_n → P, where P_n, P are the Lebesgue-Stieltjes probabilities on R^k,
then by the k-dimensional Helly-Bray theorem (same proof as in the one-
dimensional case) the corresponding ch.f.s converge to that of P. Thus for the
sufficiency, the above (reduction) technique can be applied.

If φ_n is the ch.f. of a k-dimensional distribution (= image probability)
P_n, then, by the (multidimensional analog of the) Helly selection principle, there
exists a σ-additive bounded measure P on R^k such that P_n(S) → P(S) for all
Borel sets S ⊂ R^k such that P(∂S) = 0, ∂S being the boundary of S. On the
other hand, for each fixed (t_1, . . . , t_k) ≠ 0 in R^k, ψ_n(t) = φ_n(tt_1, . . . , tt_k) →
φ(tt_1, . . . , tt_k) = ψ(t), t ∈ R, and ψ is continuous at t = 0. Hence by the
one-dimensional continuity theorem ψ(·) is a characteristic function. If S_{a,x} is
given by (4) as a subset of R^k (with a_i = t_i here), then F_n^a(x) = P_n(S_{a,x}) →
P(S_{a,x}), and F_n^a(x) → F^a(x) at all continuity points x of F^a, a d.f. with ψ
as its ch.f. Now letting x → ∞ (because F^a(+∞) = 1), it follows that

    P(R^k) = lim_{x→∞} P(S_{a,x}) = F^a(+∞) = 1.

Hence P is a probability function, and then φ will be its ch.f. Next, by the
familiar argument, with Theorem 1, we conclude that each convergent sub-
sequence of {P_n, n ≥ 1} has the same limit P, and thus the whole sequence
converges to P. This gives the multidimensional continuity theorem for ch.f.s.
Consider a line ℓ_θ through the origin making an angle θ with the x axis. Then
the projection of (x, y) on this line is given by

is known. Since the vector (X, Y) has a bounded range, the measure deter-
mined by the d.f. F has a compact support. Thus its Fourier transform φ is an
analytic function (since φ is continuously differentiable; also see Problem 6).
But this transform is known for a convergent subsequence {θ_{n'}, n' ≥ 1}, so that
(9) implies, by the method of analytic continuation, that φ is defined uniquely
on all of the complex plane C. This shows that {φ_θ(·), θ ∈ {θ_n, n ≥ 1}} deter-
mines φ, and hence F, as asserted.
where T^k is the cube with side [0, 2π] in R^k and μ is a probability on the
Borel sets of T^k, iff φ is positive definite and φ(0) = 1, i.e., iff φ is a ch.f. on Z^k.

for each Borel set A of T^k; then μ_N is a probability, since φ(0) = 1. More-
over, the orthogonality relations of trigonometric functions on the circle T, obtained by
identifying 0 and 2π, give

    ∫_{T^k} e^{i(j,θ)} dμ_N(θ) = { φ(j) Π_{r=1}^{k} (1 - |j_r|/N)   if |j_r| < N, r = 1, . . . , k,
                                 { 0                               otherwise.              (13)

This is (10).

The uniqueness can be obtained from Theorem 1. Alternatively, since T^k is
compact, we note that the set of functions {exp(i Σ_{r=1}^{k} j_r θ_r), θ_r ∈ T, j ∈ Z^k}
is uniformly dense in C(T^k), the space of continuous complex functions on
T^k, by the Stone-Weierstrass theorem. This implies the uniqueness of μ in
(10) at once, completing the proof.
The preceding result will now be given in a general form using Theorem
3.4.10 in lieu of the Lévy continuity theorem for ch.f.s. If S is a set, then
T^S (= ×_{t∈S} T_t, T_t = T) also denotes the space of all functions defined on S
with values in T. Since T = [0, 2π] is compact, T^S is then compact under the
product topology and, with componentwise addition modulo 2π, it becomes
a compact abelian group. We can thus express θ ∈ T^S as θ = (θ(s), s ∈ S),
so that the coordinate projection p_s : θ ↦ θ(s) is a mapping of T^S onto T.
Let B be the smallest σ-algebra (= cylinder algebra in the earlier terminology)
with respect to which each p_s, s ∈ S, is measurable when T is given its Borel
σ-algebra 𝒯. Thus B = σ(∪_s p_s^{-1}(𝒯)). Let Z̃ be the set of mappings n : S → Z
such that n(s) = 0 for all but a finite number of s ∈ S. With componentwise
addition, Z̃ becomes an abelian group (the "dual" of T^S).

Using this terminology, we have the following simple but crucial extension
of the preceding result:

Proof First note that the integral in (14) is really a finite-dimensional one,
since n(s) = 0 for all but a finite subset of S. Let us define for each finite set
F ⊂ S,

    Z̃_F = {n ∈ Z̃ : n(S - F) = 0}.

Clearly Z̃_F is a group (isomorphic to Z^F). Hence by Proposition 5 applied to
T^F, there is a unique probability P_F on the Borel σ-algebra of T^F such that

If ℱ denotes the directed (by inclusion) family of all finite subsets of S, then
the uniqueness of P_F in (15) implies that {P_F, F ∈ ℱ} is a compatible family
of Borel (or Radon) probability measures on {T^F, F ∈ ℱ}. Now by Theorem
3.4.10 there exists a unique probability P : B → [0, 1] such that P ∘ p_F^{-1} = P_F,
where p_F : T^S → T^F is the corresponding coordinate projection. This means
we can replace T^F and P_F in (15) by T^S and P, so that (14) holds. This
completes the proof.
will be given here, as this is easy. If the "duality" is assumed, then the proof
carries over to the general locally compact abelian groups. However, for sim-
plicity we restrict our treatment to R^k here. [A similar idea will be employed
in Section 8.4 in discussing a subclass of strictly stationary processes.]

Recall that if G is a locally compact abelian group, then a continuous ho-
momorphism α of G into the multiplicative group of complex numbers with
absolute value one is called a character of G. Thus α : G → C satisfies (i)
|α(x)| = 1, (ii) α(x + y) = α(x)α(y), x, y ∈ G (with + as group operation),
and (iii) α is continuous. If G = R^k, the additive group of k-tuples of reals,
then we can give the following simple and explicit description of these char-
acters. In addition, the set of all characters on G, denoted Ĝ (⊂ T^G, endowed
with the product topology), will be identified.

A consequence of the preceding two results is the following. Since the space
T can be identified with 0 ≤ θ < 2π (with group operation addition modulo
2π), or equivalently with the group of all complex numbers of absolute value one,
Thus assume (c) and let φ_k(t) = E(e^{itX_k}). By independence, if ψ_n is the
ch.f. of S_n, then ψ_n(t) = Π_{k=1}^{n} φ_k(t), t ∈ R. Since S_n → S in distribution, the
continuity theorem for ch.f.s implies ψ_n(t) → ψ(t), t ∈ R, and ψ is the ch.f.
of S. Hence there is an interval I : -a ≤ t ≤ a, a > 0, such that ψ(t) ≠ 0.
Let t be arbitrarily fixed in this interval. Then for each ε > 0, there is an
n_0 = n_0(ε, t) such that n ≥ n_0 ⇒ |ψ_n(t) - ψ(t)| < ε, and by compactness
of [-a, a], we can even choose n_0 as a function of ε alone (but this fact will
not be used below). Thus ψ_n(t) ≠ 0 for all n ≥ n_0. The idea of the proof
here is to consider {e^{itS_n}, n ≥ n_0} and show that it converges a.e., and then
deduce, by an exclusion of a suitable null set, that S_n → S pointwise on the
complement of the null set. The most convenient tool here turns out to be the
martingale convergence theorem, though the result can also be proved purely
by the (finer) properties of ch.f.s. We use the martingale method.
Let Y_n = e^{itS_n}/ψ_n(t) and F_n = σ(S_1, . . . , S_n). Then we assert that
{Y_n, F_n, n ≥ n_0} is a uniformly bounded martingale. In fact |Y_n| = 1/|ψ_n(t)| ≤
(|ψ(t)| - ε)^{-1} < ∞, where ε may and will be taken smaller than min
{|ψ(s)| : -a ≤ s ≤ a} > 0. Next consider

are constants, we deduce that e^{itS_n(ω)} → ρ_t(ω) for each ω ∈ Ω - N_t, where
ρ_t(ω) = Y_∞(ω)ψ(t). From this one obtains the desired convergence as follows.
Consider the mapping (t, ω) ↦ e^{itS_n(ω)}. This is continuous in t for each ω
and measurable in ω for each t. We assert that
These properties imply the result. Indeed, if they are granted, then from
the form of the exponential function one has f_ω(t + t') = f_ω(t) f_ω(t'). Since
this is true for each t, t' in I_ω for which t + t' ∈ I_ω, it follows that f_ω satisfies
the classical Cauchy functional equation (cf. Proposition 6.7 or Problem 23
for another argument), and since |f_ω(t)| = lim_n |e^{itS_n(ω)}| = 1, the solution
is f_ω(t) = e^{itα(ω)} for some α(ω) ∈ R. Hence f_ω(t) ≠ 0 for t ∈ I_ω, and it is
continuous for all t ∈ I_ω (whence at t = 0), so that e^{itS_n(ω)} = f_{n,ω}(t) → f_ω(t)
as n → ∞ for all t ∈ R, and f_ω is a ch.f. (of a unitary d.f.) for each ω ∈
Ω_0. Therefore the hypothesis of Corollary 2.10 is satisfied, and so tS_n(ω) =
-i Log f_{n,ω}(t) → -i Log f_ω(t) = tα(ω), t ∈ R. It follows that S_n(ω) →
α(ω), ω ∈ Ω_0, and so S_n → α a.e. But then S_n → α in distribution and
by hypothesis S_n → S in distribution. The limits being unique (in the Lévy
metric), we must have S = α a.e., and (a) follows. Let us then establish (i)
and (ii).

The joint measurability does not generally follow from sectional measur-
ability. Fortunately, in the present case, this is quite easy. By hypothesis,
Now that the basic technical tools are available, we proceed to develop the
key results on distributional convergence in the next chapter. Several useful
adjuncts are given as problems below.

Exercises

1. Let M be the set of all d.f.s on R, and d(·, ·) be the Lévy distance, as
defined in (11) of Section 1. Show that d(·, ·) is a metric on M (i.e., d : M × M → R^+)
and that (M, d) is a complete metric space.

where d is the Lévy distance. Verify that the metric d_1 is stronger than the
metric d_2 in the sense that if X_n → X in d_1, then the same is true in d_2. Give
an example to show that the converse is false [i.e., convergence in distribution
does not imply convergence in probability].
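A concrete instance of the gap asked for in Exercise 2 (our illustration, not part of the exercise): for a symmetric X, the sequence X_n = -X converges to X in distribution trivially, yet not in probability.

```python
# X ~ N(0,1) is symmetric, so X_n := -X has the same d.f. as X for every n
# (convergence in distribution is trivial), but P(|X_n - X| > eps) = P(2|X| > eps)
# does not tend to zero.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)
xn = -x
eps = 0.5
print(np.mean(np.abs(xn - x) > eps))   # ~0.80 for every n: no convergence in probability
```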
4. The alternative proof of the Helly-Bray theorem given for Theorem 1.5
extends to the following case. Let C be the algebra generated by the open
sets of R, and let {P, P_n, n ≥ 1} be finitely additive "probabilities" on C.
Suppose that for each open set A such that P(∂A) = 0 we have P_n(A) →
P(A), where ∂A is the boundary of A. Then show that for each real bounded
continuous f on R

and f_i(x) = 0 for x ≤ 0, i = 1, 2. Show that, with a calculus of residues com-
putation, ∫_{R^+} x^n sin[(√3/2) x^{1/3}] f_1(x) dx = 0 for all integers n ≥ 0. Deduce
that f_1, f_2 are densities having the same moments of all orders, even though
f_1 ≠ f_2 on R^+.

(b) If the r.v. X is normally distributed, N(0, 1), and Y = e^X, then Y
has a d.f., called the log-normal, and the densities are
are both densities with the same moments of all orders even though f_Y ≠ g_Y
on R^+.

12. Strengthen the conclusion of Corollary 2.10 as follows: If {φ, φ_n, n ≥ 1}
is a sequence of ch.f.s such that φ_n(t) → φ(t), t ∈ R, then the conver-
gence is uniform on each compact set of R. [First show that the set of
functions is equicontinuous; i.e., given ε > 0, there is a δ_ε > 0 such that
|t - t'| < δ_ε ⇒ |f(t) - f(t')| < ε for all f ∈ {φ, φ_n, n ≥ 1}.] Suppose
that none of the ch.f.s in this set vanishes. Show that Log φ_n(t) → Log φ(t) iff
φ_n(t) → φ(t), t ∈ R, and the convergence is uniform on compact intervals in
both directions.
13. This problem contains a simple extension of the Lévy inversion formula
with the same argument. Thus let F be a d.f. with φ as its ch.f. If g : R → R is
an absolutely (i.e., g, |g| are) Riemann integrable function such that for each
x ∈ R,

    lim_{h↓0} g(x ± h) = g(x ± 0)

exists. Show that we then have
lo ( t t = h ( t ) d t , u t R.
16. We present three important facts in this exercise along with sketches
of their proofs.

(a) Let X_1, . . . , X_n be i.i.d. N(0, 1) r.v.s and set X̄ = (1/n) Σ_{i=1}^{n} X_i. Show
that X̄ and {(X_i - X̄), i = 1, . . . , n} are independent. [Use Problem 15, and
the algebraic identity

and t̄ = (1/n) Σ_{i=1}^{n} t_i.] Show that the r.v. V = Σ_{i=1}^{n} (X_i - X̄)² has a gamma
distribution whose ch.f. φ is given by φ(t) = (1 - 2it)^{-(n-1)/2}. [Use the identity

and the fact that the left side has a ch.f. t ↦ (1 - 2it)^{-n/2}. This result is
important in statistical inference, where V/(n-1) is called the sample variance
and X̄ the sample mean. Just as in Proposition 2.8, it can be shown (with
further work) that the independence property of X̄ and V characterizes a
normal distribution. See below.]
(b) It suffices to establish the above converse for n = 2. Thus let X_1, X_2 be
i.i.d. with F as the common d.f., having two moments. Let Y = X_1 + X_2 and
Z = X_1 - X_2. Show that if Y and Z are independent, then F is N(μ, σ²), and
this essentially gives the last statement of (a). [Let φ_Y, φ_Z and ψ be the ch.f.'s
of the r.v.s Y, Z and of the d.f. F. Then the independence of Y, Z implies
φ_Y(s) φ_Z(t) = ψ(s + t) ψ(s - t), and so
Now use the fact that F has two moments, so that by Proposition 2.6, ψ,
and hence f, is twice differentiable. Thus differentiate (**) twice relative to
t and set t = 0 to get f''(s) = -σ², where σ² = -f''(0) is the variance of F.
Since f'(0) = iμ, the mean of F, the solution of this differential equation is
f(t) = iμt - σ²t²/2, so that ψ(t) = exp(iμt - σ²t²/2), as asserted.

Remark: The result is true without assuming any moments. But then we
need a different method. The conclusion holds even if X_1, X_2 do not have the
same distribution. Then φ_{X_1}(s) = e^{iμ_1 s - σ²s²/2} and φ_{X_2}(s) = e^{iμ_2 s - σ²s²/2}.
See Stromberg (1994), p. 104. The above argument is a nice application of
Propositions 2.6 and 2.9.]
(c) The independence concept is so special for probability theory that, even for
Gaussian families, it has distinct properties. Thus let X_1, . . . , X_n be N(0, 1)
random variables. Then they can be uncorrelated without being independent,
or pairwise (or m-wise, m < n) independent without mutual independence.
Verify these statements by the following examples with n = 2 and 3.

(i) Let X_1 be N(0, 1) and X_2 = X_1 χ_I - X_1 χ_{I^c}, where I is an interval

so that they are not jointly normal or Gaussian and are not independent.

(ii) Let X_1, X_2, X_3 be each N(0, 1), but with a joint density
where λ > 0, a > 0, x ∈ R, y ∈ R^+. Using Corollary 3.3, show that the
distribution of XY^{-1/2} has a density f given by

i.e., r is the "ch.f." of P. (Consider Herglotz's lemma. See Section 8.5, where
the use of such r is discussed further.)

20. Suppose that {X_t, t ∈ R} is a set of complex r.v.s each with mean
zero and variance one. Let f(s, t) = E(X_s X̄_t) and be of the form f(s, t) =
r(s - t). This is the continuous parameter analog of the above problem. If r(·)
where A has Lebesgue measure zero. [Consider the Riesz extension of Bochner's
theorem. In both problems, finite variances suffice, in which case P(·) is a fi-
nite (Borel) measure, but not necessarily a probability.]

Hence φ_n(t) → 1 for |t| < 2a, and since R ⊂ ∪_{n≥1} (-na, na), the result holds
first for |t| < na, n ≥ 1, by induction, and then for all t.

Remark The above result is true somewhat more generally, namely: If
φ_n(t) → φ(t) for all |t| < a, and φ is the ch.f. of a d.f. for which the m.g.f.
also exists, then φ_n(t) → φ(t) on all of R. See Problem 34 below for a more
analytical statement.]
and h_τ(x) = 0 for x ≤ 0. If μ_τ and σ_τ² are the mean and variance of Y_τ, show
then that, as an application of the continuity theorem for ch.f.s,

What happens if σ_τ/μ_τ does not tend to 0? (The ratio σ_τ/μ_τ is called the coefficient of vari-
ation of Y_τ, and in statistics it is sometimes used to indicate the spread of
probability in relation to the mean. In contrast to Problem 21, ch.f.s can be
directly used here, and by Proposition 2.9, Log φ_τ is well defined.)
(a) φ(t) = exp{-|t|^r}, 0 < r ≤ 2 (see Example 5.3.11 (ii) later),
(b) φ(t) = (1 + |t|)^{-1},
(c) φ(t) = 1 - |t| if |t| ≤ 1/2; = 1/(4|t|) if |t| ≥ 1/2.

[If ψ(t) = 1 + o(t²) as t → 0, then it is a ch.f. iff ψ(t) ≡ 1. Also g : R^+ → R^+
is convex iff g(x) = g(0) + ∫_0^x h(t) dt, where h(·) is nondecreasing. Use this in
part (a). In (a), if 1 < r < 2, show that φ(t) = lim_n [ψ(t/n^{1/r})]^n, where ψ is
the ch.f. of the so-called symmetric Pareto density, p : x ↦ (r/2)|x|^{-(r+1)} if
|x| > 1; = 0 if |x| ≤ 1; and so ψ(t) = 1 - c_r|t|^r + O(t²) as t → 0, but Pólya's
    f(x, y) = (1/(2π(1 - ρ²)^{1/2})) exp{-(x² - 2ρxy + y²)/(2(1 - ρ²))}.

Show that, by the image law theorem,

Show that for each such α, F_α is a d.f. with the same marginals F_X, F_Y. [It
should be verified also that the increment ΔF_α(x, y) ≥ 0 for all (x, y) ∈ R².]
Thus the marginals generally do not determine a unique joint d.f.
30. Let X_1, X_2, . . . be i.i.d. random variables each with mean zero and a
finite variance 0 < σ² < ∞. Let S_n = Σ_{k=1}^{n} X_k. Suppose that the common
ch.f. φ of the X_n is integrable on R. If a ∈ R and δ > 0, show that, by use of
the Lévy inversion formula,

and
C P[1 % > 2711 < C(p[sn+ zhp1l> n] + P[IZI > nh]),
n>l n) 1
Use the above inequalities for q5(t),and choose 6 > 0 so that for some constants
B7 ( 3 7
dt + O(1).
By a careful estimation, show that these integrals are bounded. There is some
delicate estimation here, which proves that (*) is bounded, and thus the result
follows.]
(b) If {a_n, n ≥ 1} is the set of all discontinuity points of the d.f. of X,
then show that we have an extension of the above result as

[Hints: Let Y be another r.v. which is independent of X and which has the
same d.f. F as X. Then |φ(t)|² = E(e^{it(X-Y)}). But the d.f. G of X - Y is

where F̃(y) = P[-Y < y]. By (a), the left side (above) gives the discon-
tinuity of G at x = 0, and for this value show that the integral for G is
Σ_n (P[X = a_n])², since a_n is a discontinuity of F iff -a_n is such for F̃.]
and for 0 < x, y < a, since 0 < x^{2n} < (x + y)^{2n}, we get from this

34. If φ_1, φ_2 are ch.f.s on R which agree on (-a, a), a > 0, and if one of
them is regular, then they agree everywhere. (Compare with Exercise 9. This
is a form of a classical result due to J. Marcinkiewicz, and the proof depends
on complex function theory. Compare with Proposition 2.8 (ii).)

35. An r.v. X is of lattice type if its range is of the form {a + kβ : k =
0, ±1, ±2, . . . , β > 0 and a real}. Show that its ch.f. φ_X is periodic of period
β. If β = 2π, then
5 Weak Limit Laws
The strong (or pointwise a.e.) limit theory of Chapter 2 naturally leads to
the distributional convergence of random sequences. Such a shift in viewpoint
enabled an enormous growth of probability theory. This chapter contains a
general outline of this picture. It starts with the classical central limit theo-
rems of Lévy and Liapounov and contains their modern versions as well as
an error estimate of Berry and Esseen. Some aspects of infinite divisibility
together with the Lévy-Khintchine representation and stability are treated.
The invariance principles of Donsker and Prokhorov are discussed, and two
important applications are included. Further, Kolmogorov's law of the iter-
ated logarithm and related results are given. Applications and extensions to
m-dependent sequences establish the generality and limitations of invariance
principles. The tools developed in Chapter 4 are essential here. The material
in this chapter represents a central aspect of analytical probability theory,
some of which will be essential for Part III of this book dealing with some
important applications. In fact, much of additive process analysis in Chapter
8 depends upon the work of this chapter.
Remark It was noted in the last chapter (Problem 10) that a ch.f. can
have a derivative at the origin without the d.f. having a finite mean. Thus the
hypothesis here is weaker than that of Theorem 2.3.2. That proof does not
apply. (However, the proof there illustrates the truncation technique, which is
useful for other results.)

Theorem 2 Let {X_n, n ≥ 1} be i.i.d. random variables each with mean
μ and variance σ² > 0. If S_n = Σ_{k=1}^{n} X_k, then

    = Π_{k=1}^{n} E(exp{(it/(σ√n))(X_k - μ)})   (by independence)

Since t ↦ e^{-t²/2} is the ch.f. of the standard normal N(0, 1) d.f., the result
follows by the continuity theorem for ch.f.s, completing the proof.
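The convergence asserted in Theorem 2 is easy to observe numerically. The following sketch (ours; the choice of an exponential summand, the grid, and the sample sizes are arbitrary) compares the empirical d.f. of the normalized sum with the standard normal d.f.

```python
# Empirical check of Theorem 2: normalized sums of i.i.d. Exp(1) r.v.s (mu = sigma = 1)
# approach the N(0,1) d.f.; the sup-distance over a grid shrinks as n grows.
import numpy as np
from math import erf

def std_normal_cdf(v):
    return 0.5 * (1.0 + erf(v / np.sqrt(2.0)))

rng = np.random.default_rng(2)
reps, grid = 20_000, np.linspace(-3, 3, 25)
target = np.array([std_normal_cdf(v) for v in grid])
for n in (5, 50, 500):
    s = rng.exponential(1.0, size=(reps, n)).sum(axis=1)
    z = (s - n) / np.sqrt(n)                      # (S_n - n*mu)/(sigma*sqrt(n))
    emp = (z[:, None] <= grid).mean(axis=0)       # empirical d.f. on the grid
    print(n, round(np.max(np.abs(emp - target)), 4))
```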
A natural question is t o consider the case that the X, are not identically
distributed; if this is solved one tries t o extend the results for sequences which
form certain dependence classes. We consider the first problem here in some
detail aiid aspects of the second (dependent) case later. It is noted that the
above theorem fails if the i.i.d. hypothesis is omitted. (See Example 4 be-
low.) The early treatment of a nonidentically distributed case was due t o A.
Liapounov in 1901, and his results were generalized in the works of J. W. Lin-
deberg aiid W. Feller. This circle of results attained the most precise and
definitive treatment in the theory of illfinitely divisible distributions. First we
present Liapounov's theorem, since his method of proof has not only admitted
a solution of the central limit problem, as indicated above, but contained a
calculation for the speed of convergence in the limit theory as well. Here is
the convergence result.
    lim_{n→∞} P[(S_n - E(S_n))/σ(S_n) < x] = (2π)^{-1/2} ∫_{-∞}^{x} e^{-t²/2} dt,   x ∈ R,   (3)
so that

since σ_k ≤ ρ_k by Liapounov's inequality (cf. Corollary 1.3.5 with r = 2, s = 3,
and p = 0 there). But ρ_k/σ(S_n) ≤ ρ(S_n)/σ(S_n), which tends to zero as
n → ∞. Thus there exists an n_0 such that n ≥ n_0 ⇒ ρ_k/σ(S_n) < 1, and for
fixed but arbitrary t, |γ_k| < 1/2, since γ_k → 0 as n → ∞ for each k. Now

where θ_4 = θ_{4,k} and |θ_4| ≤ 1. Hence we can take the logarithms of ψ_n and φ_k
(cf. Proposition 4.2.9) to get
Hence lim_{n→∞} ψ_n(t) = exp(-t²/2), and the proof is completed by the conti-
nuity theorem.

and

    P[X_k = 0] = 1 - (1/k).

Then E(X_k) = 0, σ²(X_k) = 1, so that σ²(S_n) = n, where S_n = Σ_{k=1}^{n} X_k,
and ρ³(S_n) = Σ_{k=1}^{n} k^{1/2}. Thus ρ³(S_n) is asymptotically of the order n^{3/2} (by
Euler's formula on such expressions). Hence ρ³(S_n)/σ³(S_n) does not tend to 0. On the other
hand,

Therefore

Hence the limit distribution of S_n/√n exists, but it is not normal, even though
all the X_k have the same means and the same variances, but not identically
For the validity of Liapounov's theorem it suffices to have 2 + δ, δ > 0, mo-
ments for the r.v.s. Then define ρ_k^{2+δ} = E(|X_k|^{2+δ}), ρ^{2+δ}(S_n) = Σ_{k=1}^{n} ρ_k^{2+δ}.
The sufficient condition of the theorem becomes

    lim_{n→∞} [ρ^{2+δ}(S_n)/σ^{2+δ}(S_n)] = 0.

If the X_n are i.i.d., and the rest of the hypothesis is satisfied, let Var X_1 =
σ², ρ³ = E(|X_1 - μ|³), so that ρ³(S_n)/σ³(S_n) = (ρ/σ)³(1/√n). Under these
conditions we deduce from (6) the following:
Proof Since the result is clearly true if the integral is infinite on the right
side of (8), let it be finite. Then by definition

since H(±∞) = 0, and integration by parts is used. Hence for any fixed but
arbitrary a_0 ∈ R (t ≠ 0),
2(1 - cos T z )
dz
2 sin2(Tz/2)
H (x + ao) dx
=2 k (T)sin v
H~, (g) dv;
It is now necessary to find a lower estimate of the left side of the integral of
(11), after an analysis of the behavior of H. This involves some computation,
and we now set down the details.
If α = sup_{x∈R} |H(x)|, then ±α is attained at some a ∈ R, i.e., H(a) = ±α
(by left continuity), or H(a + 0) = ±α. Indeed, since 0 < α ≤ 2, there exists a
sequence {x_n, n ≥ 1} ⊂ R ⊂ R̄ with a convergent subsequence x_{n_i} → a ∈ R̄.
But H(x) → 0 as x → ±∞, and α > 0, so that a ∈ R, i.e., a must be finite.
Thus there is a subsequence {x_{n_j}, j ≥ 1} ⊂ {x_{n_i}, i ≥ 1} such that x_{n_j} → a
and H(x_{n_j}) → α or H(x_{n_j}) → -α. Consider the first case: H(x_{n_j}) → α. Now
{x_{n_j}, j ≥ 1} must have a further subsequence which converges to a from the
left or from the right. In the former case, by left continuity H(a) = +α. In the
latter case H(a + 0) = α. Also, by the Fatou inequality, and the continuity of
G,
1 - cos T z
dz [by (1211
I - cos T z
dx
x2
O0 sin2 v
dv.
= 2 a LT,l2
~ 7
But then the left side of (11) simplifies, using (13) and (14) (set a0 = b there),
as follows:
2 1sin2 v
--Hb
R v2
(F) dv, since Hb can be negative,
1 cos T z
-
= Hb(x) Tx2 dx
1 - cos T x 1 - cos T x
Hb(z) Tx2 Hb(z) Tx2 dz
2 2MP (1 O0
PTl2
sin2 u
>PI
I cos T z
-
T 2
2a (r 1 - 3
O0
PTl2
sin2 u
dv) [by (13) and (1411
2a (: +)- (since
sin2 v
7 1
< 7).
Putting this in (11) and transposing the terms, we get (8). In case H(a) = -α,
with the reverse inequality of (12), we get the same result after an analogous
computation. This completes the proof of the lemma.

Let us specialize this lemma to the case where G = Φ and F is the d.f. F_n
of our normalized partial sum (S_n - E(S_n))/σ(S_n). To use the result of (8),
one needs to find an upper bound for the right-side integral involving h(·),
which is the difference of two ch.f.s. This is obtained in the next lemma.
Proof It may be assumed that the means μ_n are zero. Let φ_k(t) =
E(e^{itX_k}). Then by Corollary 4.2.7, we can again use the computations of
Liapounov's proof, i.e., of (4) and (5), if, with the notations there, we write

where

and if |t| < σ(S_n)/2ρ(S_n), we get |γ_k| < (1/8) + (1/48) < 1/2. Thus for t in
this range with the |θ_i| ≤ 1, one gets on using σ_k ≤ ρ_k
2
-
- --
o;t2
2a2(S,)
+-olp;l t13
6a3(S,)
Hence, summing on k ,
Log &(t) =
t2
-
2
+ Q5
97 p3(Sn) t 1 2
~ m
Let us now simplify the left side of (15) with the estimate (16). Here we
use the trivial inequality |e^z - 1| ≤ |z| e^{|z|}. Thus
Next we extend the range of t as given in (15). For this we symmetrize the
d.f. Thus let X_k' be an r.v. independent of X_k but with the same d.f. Then
X_k - X_k' has for its ch.f. |φ_k|², and since (by convexity of |z|³)

    E(|X_k - X_k'|³) ≤ 8ρ_k³,   Var(X_k - X_k') = 2σ_k²

(with zero means), we have on writing |φ_k|² in place of φ_k in (4), and using
the above estimates for the variance and third absolute moment there (this is
simpler than squaring the value of |φ_k|),

Thus (17) and (19) together establish (15), with C_2 = 16, and the lemma
follows.
then by Čebyšev's inequality F_n(x) ≤ 1/x² if x < 0 and ≥ 1 - (1/x²) if
x > 0. The same is true for Φ(x) = P[X < x]. Hence H = F_n - Φ is Lebesgue
integrable first on [|x| > ε] and, being bounded, also on [-ε, ε] for ε > 0,
thus on R itself. Let T = [σ(S_n)/ρ(S_n)]³ in Lemma 8. Since the hypothesis of
Lemma 7 is also satisfied, we get

where C_0 is the above constant. This is (6) and the proof is complete.
Remark It should be noted that the various "standard" tricks used in the
above estimates have their clear origins in Liapounov's proof of this theorem,
and thus the important problem of error estimation is considered there for
the first time. Taking the X_k as Bernoulli r.v.s in the Berry-Esseen theorem,
one may note that the order of (ρ³/σ³)(S_n) cannot be smaller than what we
have obtained in (6). Under the given hypothesis, it is "the best" order of
magnitude, though C_0 can (with care) be improved. Also, if the limit distri-
bution is different (but continuous with a bounded density such as gamma),
Lemma 7 can still be used. For an up-to-date treatment of this subject, see
R. N. Bhattacharya and R. R. Rao (1976). For another account including
the multidimensional problem, see Sazonov (1981), Springer Lecture Notes in
Math. No. 879.
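As an illustration of the Berry-Esseen bound in the Bernoulli case (ours, not from the text), one can compare the exact Kolmogorov distance between the normalized binomial d.f. and Φ with C ρ³/(σ³√n); the constant C = 0.8 below is an arbitrary placeholder, not a claim about the best known value.

```python
# Bernoulli(p) summands: S_n ~ Binomial(n, p).  Compare the exact sup-distance
# sup_x |F_n(x) - Phi(x)| with the Berry-Esseen bound C * rho3 / (sigma^3 * sqrt(n)).
import numpy as np
from math import erf, comb

def phi_cdf(x):
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

p = 0.2
sigma = np.sqrt(p * (1 - p))
rho3 = p * (1 - p) * ((1 - p)**2 + p**2)        # E|X - p|^3 for a Bernoulli(p) r.v.

for n in (10, 100, 1000):
    pmf = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
    cdf = np.cumsum(pmf)                         # P[S_n <= k]
    z = (np.arange(n + 1) - n * p) / (sigma * np.sqrt(n))
    sup_dist = max(max(abs(cdf[k] - phi_cdf(z[k])),
                       abs((cdf[k] - pmf[k]) - phi_cdf(z[k]))) for k in range(n + 1))
    bound = 0.8 * rho3 / (sigma**3 * np.sqrt(n))
    print(n, round(sup_dist, 4), "<=", round(bound, 4))
```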
In the central limit theorems of Lévy and Liapounov, we considered the
independent sequences {X_k/σ(S_n), 1 ≤ k ≤ n}, n = 1, 2, . . . , and their partial
sums. Abstracting this, one may consider double sequences {X_{nk}, 1 ≤ k ≤
n} of independent r.v.s in each row. These results can appear as interesting
problems in their own right; S. D. Poisson already considered such a question
in 1832. We can establish it with our present tools quite easily, but as seen
in the next section, this turns out to be an important new step in the devel-
opment of our subject. Its striking applications will appear later in Chapter 8.
Since the limit is a Poisson ch.f., the result follows from the continuity theo-
rem.
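For a concrete feel for this proposition (our illustration), the row sums of Bernoulli(λ/n) triangular arrays converge in distribution to a Poisson(λ) law; the total variation distance computed below shrinks as n grows. The parameter λ and truncation level are arbitrary choices.

```python
# Row sums of n i.i.d. Bernoulli(lambda/n) r.v.s, i.e. Binomial(n, lambda/n),
# approach the Poisson(lambda) law; we measure total variation distance.
import numpy as np
from math import comb, exp, factorial

lam = 2.0
for n in (10, 100, 1000):
    p = lam / n
    K = 40                                              # tail mass beyond K is negligible
    binom = np.array([comb(n, k) * p**k * (1 - p)**(n - k) if k <= n else 0.0
                      for k in range(K)])
    poisson = np.array([exp(-lam) * lam**k / factorial(k) for k in range(K)])
    tv = 0.5 * np.abs(binom - poisson).sum()
    print(n, round(tv, 5))                              # decreases roughly like lam^2/n
```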
The interest in this proposition stems from the fact that the limit d.f. is
not Gaussian. Unfortunately, the significance of the result was not recognized
until the late 1920s, and the theorem remained a curio for all those years. Its
status, on a par with the Gaussian d.f., was realized only after the formulation
and study of infinitely divisible distributions as a culmination of the central
limit problem. We take it up in the next section, presenting several key results
in that theory, and then deduce certain important theorems, which were orig-
inally proved by different methods. For instance, we obtain, among others,
the Lindeberg-Feller theorem, which generalizes both Theorems 2 and 3, later
in Section 5.3. Its key part is considered further in the continuous parameter
case in Section 8.4. As is clear from the above work, the use of ch.f.s in the
analysis is crucial. The basic probability space will be in the background. For
that reason, this aspect of the subject is often referred to as the analytical
theory of probability.
φ_n are ch.f.s, their product is a ch.f. (It is the ch.f. of X - X', where X, X'
are i.i.d. with ch.f. φ_n.) Similarly, |φ|² is a ch.f. Since |φ(t)| ≤ 1,

    g(t) = lim_{n→∞} |φ_n(t)|² = lim_{n→∞} |φ(t)|^{2/n}

exists, and g(t) = 0 on the set {t : φ(t) = 0}, and = 1 on the set {t : |φ(t)| > 0}.
But φ(0) = 1 and φ is continuous at t = 0. Thus there exists a T > 0 such that
|φ(t)| > 0 for |t| < T. Hence g(0) = 1 and g is continuous for -T < t < T.
Since each |φ_n|² is a ch.f., and |φ_n|² → g, which is continuous at t = 0, by
the continuity theorem, g must be a ch.f., and hence is continuous on R. But
g takes only two values: 0, 1. Thus we can conclude that g(t) = 1 for all t ∈ R,
which yields |φ(t)| > 0 for all t ∈ R. This proves (a).
(b) If X concentrates at a single point μ ∈ R, then its ch.f. φ : t ↦ e^{itμ} is
clearly infinitely divisible. If X takes finitely many values, then it is a bounded
r.v. Thus, more generally, suppose X is a bounded (infinitely divisible) r.v. We assert that it
must be degenerate. Indeed, if |X| ≤ M < ∞ a.e., and if X is an infinitely
divisible r.v., then by definition, for each integer n ≥ 1, there exist n i.i.d.
variables X_{n1}, . . . , X_{nn} such that X = Σ_{k=1}^{n} X_{nk}. The fact that |X| ≤ M
a.e. implies |X_{nk}| ≤ M/n a.e. for all 1 ≤ k ≤ n. If not, let A_1 = [X_{nk} > M/n] or
A_2 = [X_{nk} < -M/n]; then P(A_i) > 0 for i = 1 or 2, for at least one (hence, by the
i.i.d. property, every) k, so that P[X_{nk} > M/n] > 0 or P[X_{nk} < -M/n] > 0; but then [X > M]
or [X < -M] has positive probability, which is impossible. Thus P(A_1 ∪ A_2) = 0. Hence
Proof Let φ_i ∈ ℱ, i = 1, . . . , k. Then by definition, for each integer n ≥ 1,
there exist ch.f.s ψ_{in} such that φ_i = (ψ_{in})^n. Thus φ = Π_{i=1}^{k} φ_i = (Π_{i=1}^{k} ψ_{in})^n, and
Π_{i=1}^{k} ψ_{in} is a ch.f. Hence φ is infinitely divisible and φ ∈ ℱ. Next suppose
that φ_n ∈ ℱ and φ_n → φ, a ch.f., as n → ∞. Then for each integer m ≥ 1, we
have that |φ_n|^{2/m} is a ch.f., as seen before, and the hypothesis implies that
|φ_n|^{2/m} → |φ|^{2/m} and the limit is continuous at t = 0. Thus by the continuity
theorem |φ|^{2/m} is a ch.f., and |φ|² is infinitely divisible. Now by Proposition
2(a), φ never vanishes and is continuous, so that Log φ is well defined. The same
is true of Log φ_n. Hence on R
The argument used for (2) implies that, if φ ∈ ℱ, then φ^r is a ch.f. for
each rational r ≥ 0. Then the continuity theorem and the fact that each real
λ ≥ 0 can be approximated by a sequence of rationals r_n ≥ 0 give the result
for all λ ≥ 0. A similar argument (left to the reader) gives the second part of
the following, since (|φ|²)^{1/n} is a ch.f., n ≥ 1.

Corollary 4 Let φ be an infinitely divisible ch.f. Then for each λ > 0, φ^λ
is also a ch.f. Further, |φ| is an infinitely divisible ch.f.

But there are ch.f.s φ such that φ^λ is not a ch.f. for some λ > 0. (See
Problem 9.) These properties already show the special nature and intricate
structure of infinitely divisible ch.f.s. There exist pairs of noninfinitely divisible
ch.f.s whose product is infinitely divisible; and also pairs one member of which
is infinitely divisible and the other one not, and the product is not infinitely
divisible. We illustrate this later (see Example 8).
The problem of characterizing these ch.f.s was proposed, and a first solu-
tion given, by B. de Finetti in 1929, and a general solution, if the r.v.s have
finite variances, was given by A. Kolmogorov in 1932. Since, as is easily seen,
the Cauchy distribution is infinitely divisible and it has no moments, a further
extension was needed. Later P. Lévy succeeded in obtaining the general for-
mula, which includes all these cases, but it was not yet in canonical form. The
final formula was derived by A. Khintchine in 1937 in a more compact form
from Lévy's work. We now present this fundamental result and then obtain
the original formulas of Lévy and of Kolmogorov from it.

    φ(t) = exp{ iγt + ∫_R (e^{itx} - 1 - itx/(1 + x²)) ((1 + x²)/x²) dG(x) },   (3)
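As a small worked check of (3) (ours, under the representation as written above), the Poisson(λ) ch.f. exp{λ(e^{it} - 1)} is obtained by taking γ = λ/2 and G a point mass of size λ/2 at x = 1; the sketch below verifies the identity numerically.

```python
# The Poisson(lambda) ch.f. fits the canonical form (3) with gamma = lambda/2
# and G = (lambda/2) * delta_{x=1}.
import numpy as np

lam = 1.7
gamma = lam / 2.0
t = np.linspace(-10, 10, 201)

x = 1.0                                     # G puts all its mass at x = 1
integrand = (np.exp(1j*t*x) - 1.0 - 1j*t*x/(1.0 + x**2)) * (1.0 + x**2) / x**2
lk = np.exp(1j*gamma*t + (lam/2.0) * integrand)

direct = np.exp(lam * (np.exp(1j*t) - 1.0))
print(np.max(np.abs(lk - direct)))          # ~1e-15: the two expressions coincide
```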
Using the Riemann-Stieltjes approximating sums for the integral on the left
in (6), one has, by (5),

But this is a finite product of the ch.f.s of Gaussian and Poisson d.f.s. Thus
it is infinitely divisible for each n, and by Proposition 3, the limit φ is also
infinitely divisible, as φ is clearly continuous at t = 0.

Let us establish uniqueness before turning to the proof of necessity. As
noted above, Log φ is a well-defined continuous complex function on R by
Proposition 4.2.9. Consider

    w(t) = Log φ(t) - (1/2) ∫_{-1}^{1} Log φ(t + u) du,   t ∈ R.

    n(ψ_n(t) - 1) = n [exp((1/n) Log φ(t)) - 1]
If F_n is the d.f. of ψ_n, then (9) may be written as

Let

Hence the real and imaginary parts of I_n(t) converge to the corresponding
parts of Log φ(t), so that one has, from the real parts,

Thus {E_n, n ≥ 1} is bounded, and hence {G_n(+∞), n ≥ 1} is bounded.
Now by the Helly selection principle there is a subsequence {G_{n_k}}_{k≥1}
which converges to G, a bounded nondecreasing left continuous function, at
all points of continuity of G. Clearly G(-∞) = 0. To see that G_{n_k}(+∞) →
G(+∞), let ε > 0 be given and choose a_0 > 1, a continuity point of G, such
that G(+∞) - G(a_0) < ε/3 and, for |t| ≤ 2/a_0, |log φ(t)| < ε/12. Choose
N_ε > 1 such that n ≥ N_ε ⇒ |G_{n_k}(a_0) - G(a_0)| < ε/3. Then

Now if N_1 > n_ε, then (12b) holds if x > a_0 > 1, so that for n > N_1 we get

Thus

    ∫_{|x|>a_0} dG_n(x) < ε/6 + 2 sup_{|t|≤2/a_0} |log |φ_n(t)|| < ε/6 + 2·(ε/12) = ε/3,   (15)

by the choice of a_0. Hence if n > N_2 = max(N_ε, N_1), then (13) and (15) yield
Finally, let

Since the integrand in (17) is bounded and continuous for each t, and
G_{n_k} → G as shown by (16), we may apply the Helly-Bray theorem for it and
interchange the limits. But by (11), I_{n_k}(t) → Log φ(t), so that (17) implies
γ_{n_k} must converge to some number γ. Hence we get

Given G, (18) and (19) determine M and N, and conversely, if M, N are given
to satisfy (18) and (19) and the conditions (i)-(iv), then G is determined, in
terms of which (3) becomes, with a γ ∈ R,

    φ(t) = exp{ iγt - σ²t²/2 + ∫_{-∞}^{0^-} (e^{ixt} - 1 - ixt/(1 + x²)) dM(x)
                + ∫_{0^+}^{∞} (e^{ixt} - 1 - ixt/(1 + x²)) dN(x) },   t ∈ R.
The collection (γ, σ², M, N) is the Lévy (spectral) set for the infinitely divisible
ch.f. φ, and the pair (M, N) the Lévy measures. These will be needed in Section
8.4 as well.

If the infinitely divisible d.f. has a finite variance (equivalently, its ch.f. φ
is twice differentiable), then we can also get the Kolmogorov formula from (3)
as follows.

Define K : R → R^+, called the Kolmogorov function, by

We have to show that K(+∞) < ∞ if the d.f. has finite variance. In fact,
from (22), by differentiation of the holomorphic function Log φ at t = 0, we
get

where X is the r.v. with φ as its ch.f. Thus K(+∞) < ∞, and then

Thus (22) is obtained from (3) and is rigorously correct. We state this result
as follows:

It may be remarked that (22) can also be obtained directly using the argu-
ment of (3) with slight simplifications, though it is unnecessary to reproduce
them here (cf. Problem 12).

To present the alternative proof of the necessity part of Theorem 5 noted
earlier, we need the following result on the convergence of infinitely divisible
d.f.s, which is of independent interest. It is due to B.V. Gnedenko.
Proof Let F_n be infinitely divisible and φ_n be its ch.f. Then for each n,

and G_n is increasing, G_n(-∞) = 0. Thus w_n(·) is the ch.f. of the "d.f." H_n,
where

Since 1/h(u) is also bounded and continuous, one concludes from (25) and
the Helly-Bray theorem that

    G_n(x) → G(x) = ∫_{-∞}^{x} (1/h(u)) dH(u),
Remark The corresponding limit theorems hold for the Lévy and Kol-
mogorov representations, although each statement needs a separate but en-
tirely similar argument.

    (e^{itx} - 1 - itx/(1 + x²)) dG_n(x),

where

If φ_n is defined by (3) with (γ_n, G_n), then it is clear (by the sufficiency) that φ_n
is infinitely divisible. The hypothesis and (9') imply that φ_n(t) → φ(t), t ∈ R.
Thus the necessity part of Proposition 7 implies that γ_n → γ, G_n → G satis-
fying the stated conditions and that φ is given by (3) for the pair (γ, G). This
and

    P[Y = 0] = 1/(1 + a) = 1 - P[Y = 1].

Let Z = X + Y. Then X is an infinitely divisible r.v., but Y and Z are not.
However, if Z̃ is independent of Z and has the same d.f. as Z, then V = Z - Z̃
is an infinitely divisible r.v.

Hence

Since the ch.f. of Y in absolute value lies between (1 - a)/(1 + a) and 1, it
also never vanishes. Thus ψ(t) never vanishes, and again by 4.2.9,

where γ = Σ_{n≥1} (ρ_n + (-1)^n α_n)/(1 + n²) and G is a function on the integers
of bounded variation with jumps of size nρ_n/(1 + n²) and (-1)^{n-1}[nα_n/(1 + n²)]
at positive and negative integers, respectively. It has no jump at 0. This
formula is like (3), but G is not monotone increasing. Hence by Theorem 5, ψ
cannot be infinitely divisible.

If ζ(t) = E(e^{itV}) = ψ(t)ψ(-t) = |ψ(t)|², then using (27) we get

This last part implies that ζ = |ψ|² is an infinitely divisible ch.f. even
though neither ψ nor ψ̄ is such. Also ζ(t) = |φ|²(t) |E(e^{itY})|² is a product of
an infinitely divisible |φ|² and a noninfinitely divisible one. Thus the intricate
structure of these ch.f.s as well as the beauty and depth of the Lévy-Khintchine
formula are exhibited by this example.
The next result is motivational for the work of the following section.

n ≥ 1, but S_n → S in distribution and S is infinitely divisible; it is not Gaussian.
k=l
E (eztxrb" (by independence)
Thus 4, is real and for each n ( > 5), &(t) is never zero [> (i)n].
Consequently
log &(t) = 2
k=l
log [l + (cos
kt
; I)]
-
Z
= - [(cosk t j n ) - 11 - 0(1/n)
n
is a ch.f., which, by the continuity theorem, implies S_n → S in distribution and φ is the ch.f.
of S. Clearly S is not a normal (= Gaussian) r.v.
To see that q5 is infinitely divisible, note that
-1<x<1
x > 1.
By Theorem 5, (0, G) is a Lkvy-Khintchine pair and 4 is infinitely divisible,
as asserted.
thus

and

    lim_{n→∞} max_{1≤k≤n} P[|X_{nk} - E(X_{nk})| > ε] = 0.

Note that the variances of the partial sums are bounded, and the independent
components X_{nk} are uniformly absolutely negligible.

Abstracting these two properties, we present some limit theorems and
some important results, such as the Lindeberg-Feller theorem, conditions for
the limit law to be Poisson, and the like.

    lim_{n→∞} max_{1≤k≤k_n} P[|X_{nk}| > ε] = 0.
(b) More generally, the X_{nk} are called asymptotically constant if there ex-
ist constants a_{nk} [which typically are E(X_{nk}) if these exist, and are medians
in general] such that in {X'_{nk} = X_{nk} - a_{nk}, 1 ≤ k ≤ k_n, n ≥ 1}, the X'_{nk} are
infinitesimal.

Proposition 2 Let {X_{nk}, 1 ≤ k ≤ k_n, n ≥ 1} be a sequence of r.v.s. Then
the following are equivalent:

    lim_{n→∞} max_{1≤k≤k_n} |φ_{nk}(t) - 1| = 0
Proof (i)⇒(ii) Let T > 0 be arbitrarily fixed. Since (1) holds, given ε > 0,
there exists an N_ε such that n ≥ N_ε implies

    < (ε/(2T))·T + ε/2 = ε   for all |t| ≤ T.

Thus (ii) holds. This implication is the one often used.

Hence

    max_{1≤k≤k_n} ∫_R (x²/(1 + x²)) dF_{nk}(x) ≥ max_{1≤k≤k_n} (ε²/(1 + ε²)) ∫_{[|x|>ε]} dF_{nk}(x),
    because x²/(1 + x²) ↗ on R^+,

Here

F_{nk} being the d.f. of X_{nk}. The limit d.f. in both cases is the same, and it is
infinitely divisible.
Remark The d.f.s associated with the ch.f.s ψ_n are called the accompany-
ing laws of the given sequence {S_n - A_n, n ≥ 1}.

Proof Let

Then by independence

We now associate an infinitely divisible ch.f. with the right side on using the
infinitesimality of X'_{nk} = X_{nk} - E(X_{nk}).

Let F'_{nk} be the d.f. of X'_{nk}, so that F'_{nk} = F_{nk}(x + E(X_{nk})), F_{nk} being the
d.f. of X_{nk}. Consider

By Proposition 2 [(i)⇔(ii)],

    lim_{n→∞} max_{1≤k≤k_n} |α_{nk}| = 0,   |t| ≤ T, T > 0.

If ψ_n is as defined in (3) and φ_n is given by (4), then one has the following
simplification, in which n ≥ N_1, so that Log φ_n is well defined for |t| ≤ T:
$$= it a_n + \int_{\mathbb{R}} \frac{e^{itx} - 1 - itx}{x^2}\, dK_n(x) - it a_n + \sum_{k=1}^{k_n} \operatorname{Log} \phi_{nk}(t)$$
(ii) $$\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} u^2\, dF_{nk}\big(u + E(X_{nk})\big) = K(+\infty),$$
(iii)
where $F_{nk}$ is the d.f. of the $X_{nk}$. The ch.f. of $F$ is given by (22) of Section 2, with $(a, K)$ as the Kolmogorov pair determined by (i)-(iii) above.
Let us now present a specialization of the above result if the desired limit d.f. is the standard normal $N(0,1)$. Since its ch.f. is $t \mapsto e^{-t^2/2}$, in the Kolmogorov formula [see (22) in Section 2] it is seen that $\gamma = 0$ and $K$ must have a jump of size 1 at $x = 0$; i.e., $K(x) = 0$ for $x < 0$, $= 1$ for $x \ge 0$. With this knowledge of $(a, K)$ the following normal convergence criterion holds.
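As a quick check of this identification (a sketch, assuming (22) of Section 2 is the Kolmogorov representation $\log\phi(t) = i\gamma t + \int_{\mathbb{R}} (e^{itx} - 1 - itx)x^{-2}\, dK(x)$, with the integrand given its limiting value at $x = 0$): a unit jump of $K$ at the origin and $\gamma = 0$ yield
$$\log\phi(t) = \Big[\frac{e^{itx} - 1 - itx}{x^2}\Big]_{x=0} \cdot 1 = -\frac{t^2}{2},$$
which is indeed the ch.f. of $N(0,1)$.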
(iii) $$\lim_{n\to\infty} \Big(\sum_{k=1}^{k_n} E(X_{nk}) - A_n\Big)$$
Proof Suppose that the conditions hold. Then the $(X_{nk} - E(X_{nk}))$ are infinitesimal since for each $\epsilon > 0$,
The variances $\sigma^2(S'_n)$ are bounded because, by adding (i) and (ii), we get
and
$$\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{-\infty}^{x} u^2\, dF_{nk}\big(u + E(X_{nk})\big) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0. \end{cases}$$
Thus if
then conditions (i)-(iii) of Theorem 4 are satisfied for the infinitesimal sequence $\{X_{nk} - E(X_{nk}),\ 1 \le k \le k_n,\ n \ge 1\}$. Hence, by that theorem, the limit
$$\int_{[|x|\ge\epsilon]} x^2\, dF_{nk}(x) = \int_{[|x|\ge\epsilon]} x^2\, dF_k\big(x\,\sigma(S_n)\big) = \frac{1}{\sigma^2(S_n)} \int_{[|y|\ge\epsilon\sigma(S_n)]} y^2\, dF_k(y). \tag{10}$$
Thus we have the following result as a consequence of Theorem 5. The sufficiency is due to Lindeberg and the necessity to Feller.
It is useful to note that if the r.v.s have three moments finite, then the Liapounov condition $\rho^3(S_n)/\sigma^3(S_n) \to 0$ implies (11), so that this is an important generalization of the Liapounov theorem. To see this implication, consider for $\epsilon > 0$,
$$\frac{1}{\sigma^2(S_n)} \sum_{k=1}^{n} \int_{[|x|\ge\epsilon\sigma(S_n)]} x^2\, dF_k(x).$$
The same computation holds if only $2 + \delta$, $\delta > 0$, moments exist and moreover $\sum_k E\big(|X_k|^{2+\delta}\big)\big/\sigma^{2+\delta}(S_n)$ goes to zero as $n \to \infty$.
Observing that, for a Poisson distribution with parameter $\lambda > 0$, the Kolmogorov pair $(a, K)$ is given by $a = \lambda$,
$$K(x) = \begin{cases} 0, & x < 1 \\ \lambda, & x \ge 1, \end{cases}$$
we can present analogous conditions for sums of independent r.v.s for convergence to a Poisson limit. This will extend Proposition 1.9. However, it is again an immediate consequence of Theorem 4. The easy verification is left to the reader.
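The following minimal sketch (illustrative only; the array of Bernoulli$(\lambda/n)$ variables and all parameter values are our choices, not the text's) shows the row sums of an infinitesimal triangular array approaching the Poisson($\lambda$) law:

    import numpy as np
    from math import exp, factorial

    rng = np.random.default_rng(0)
    lam, n, reps = 3.0, 10_000, 5_000

    # Each row holds n i.i.d. Bernoulli(lam/n) r.v.s -- an infinitesimal array.
    sums = rng.binomial(1, lam / n, size=(reps, n)).sum(axis=1)

    # Compare empirical frequencies of the row sums with Poisson(lam) masses.
    for v in range(6):
        print(v, np.mean(sums == v), exp(-lam) * lam**v / factorial(v))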
independent sequences of infinitesimal r.v.s with finite variances. Then for some sequences $\{A_n,\ n \ge 1\}$ of constants, $S'_n = \sum_{k=1}^{k_n} X_{nk} - A_n$ converges in distribution to a Poisson r.v. with parameter $\lambda > 0$, and $\sigma^2(S'_n) \to \lambda$, iff for each $\epsilon > 0$ we have
(ii)
(iii)
It is now natural to ask whether there are analogous results for the sequences of partial sums $\{S_n,\ n \ge 1\}$, $S_n = \sum_{k=1}^{k_n} X_{nk}$, if the infinitesimal $X_{nk}$ do not have finite moments. Indeed, the answer is yes, and with the Lévy or Lévy-Khintchine representations and Proposition 2.7, such results have also been obtained, primarily by B.V. Gnedenko. Now a more delicate estimation of various integrals is needed. We state the main result, a generalized Bawly theorem. Then one can understand the type of conditions that replace those of Theorem 4. For a proof of the result, we direct the reader to the classic by Gnedenko and Kolmogorov (1954), where references to the original sources and other details are given.
$$E\big(e^{itY_n}\big) = \psi_n(t) = \exp\{\cdots\},$$
where for any fixed but arbitrary $\tau > 0$, with $F_{nk}$ as the d.f. of $X_{nk}$, we have
(Even though the $Y_n$ depend on $\tau > 0$, the limit r.v. $Y$ does not.)
The other, equally useful, forms of the limit distributions use the Lévy representation. All such results are discussed in detail, with a beautiful presentation, in the above monograph. In all these theorems the r.v.s $X_{nk}$ were assumed asymptotically constant. If this hypothesis is dropped, the methods of proof undergo drastic changes. No general theory is available. But the following extension of the Lindeberg-Feller theorem, obtained by V.M. Zolotarev in 1967, indicates these possibilities.
Let $X_{nk}$, $k \ge 1$, $n \ge 1$, be rowwise independent r.v.s with $E(X_{nk}) = 0$, $\operatorname{Var} X_{nk} = \sigma_{nk}^2$, and $\sum_{k\ge 1} \sigma_{nk}^2 = 1$. Let $S_n = \sum_{k\ge 1} X_{nk}$, which converges by Theorem 2.2.6. If $G$ is the normal d.f. $N(0,1)$, let $G_{nk}$ be defined by $G_{nk}(x) = G(x/\sigma_{nk})$. If $F, H$ are two d.f.s, let $d(F, H)$ denote the Lévy metric defined in Eq. (4.1.11). We then have
(i) $\alpha_n = \sup_k d(F_{nk}, G_{nk}) \to 0$, where $F_{nk}$ is the d.f. of $X_{nk}$ and $G_{nk}$ is defined above, and
(ii) for each $\epsilon > 0$, with $A_n = \{k : \sigma_{nk}^2 < \epsilon\}$,
Again we omit the proof of this interesting result to avoid the digression. In this vein, we state an important consequence of Theorem 8. Its proof involves an alternative form of the conditions of Theorem 8, and they are still nontrivial.
$$\lim_{n\to\infty} P\big[\,|X_{nk}| > \epsilon\,\big] = 0,$$
which is equivalent to the condition
There are other interesting specializations for Poisson and degenerate convergence, and then to single sequences. For details, we refer the reader to the Gnedenko-Kolmogorov fundamental monograph noted above.
The preceding theory shows that any infinitely divisible d.f. can be a limit d.f. of the sequence of sums of rowwise independent infinitesimal r.v.s. The classical central limit theory leads to the normal d.f. as the limit element. However, it is of interest to look for a subclass of the infinitely divisible laws which can arise having a "simpler" form than the general family. This turns out to be a family, called stable laws. Let us introduce this class by the following motivational example, which gives some concreteness to the general concept to be discussed.
Example 11 Let $\{X_n,\ n \ge 1\}$ be i.i.d. random variables with the common d.f. $F$, given by its density $F' = f_p$ (Pareto density, cf. Exercise 4.25)
Thus we only need to consider $0 < p \le 2$. This has to be treated in two parts: $0 < p < 2$ and $p = 2$. The variances do not exist in both cases.
(ii) $0 < p < 2$. We cannot directly try to verify (12) here, since it is first necessary to find the $b_n(p)$ such that $S_n/b_n(p) \xrightarrow{D} S$. Only then does (12) give conditions for $S$ to be $N(0,1)$. However, this is the key part. Let us try, by analogy with (i), $n^{\alpha}$, for some $\alpha > 0$, as the normalizing factor. A heuristic reason for this will become clear a little later. We use the technique of ch.f.s. Thus
Taking $\alpha p = 1$, we can then let $n \to \infty$. But note that the integral on the right converges only if $p < 2$. (Its singularity is at the lower end point if $p = 2$.) Hence for $0 < p < 2$, letting $\alpha = 1/p$, we have
and since the right side is continuous, it is a ch.f., by the continuity theorem. If $p = 1$, then the limit is the Cauchy ch.f. Thus for $0 < p < 2$, $b_n(p) = n^{1/p}$ is the correct normalizing constant for $S_n$.
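A simulation makes the choice $b_n(p) = n^{1/p}$ concrete. The sketch below assumes a symmetric Pareto density proportional to $|x|^{-(p+1)}$ on $|x| \ge 1$ (our reading of the example; Exercise 4.25 is not reproduced here), and only illustrates that the normalized sums stay heavy-tailed rather than Gaussian:

    import numpy as np

    rng = np.random.default_rng(0)

    def sym_pareto(p, size):
        # P[|X| > x] = x**(-p) for x >= 1, with a random sign for symmetry.
        signs = np.where(rng.random(size) < 0.5, -1.0, 1.0)
        return signs * rng.random(size) ** (-1.0 / p)

    p, n, reps = 1.5, 10_000, 2_000
    normalized = sym_pareto(p, (reps, n)).sum(axis=1) / n ** (1.0 / p)

    # Heavy (non-Gaussian) tails persist in the limit law.
    print("P[|S_n| / n^(1/p) > 4] ~", np.mean(np.abs(normalized) > 4))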
(iii) $p = 2$. Since $b_n = \sqrt{n}$ is not enough to control the growth of $S_n$, as seen from the divergence of (16), we need $b_n$ to grow somewhat faster. Let us try $b_n = (n \log n)^{1/2}$. Then (15) becomes, with this new normalization,
if $J_n < 1$. Here
$$nJ_n = 4n \int_{1}^{\infty} \frac{\sin^2\big(tx/2[n\log n]^{1/2}\big)}{x^3}\, dx.$$
It is clear that the integral on $(\delta, \infty)$ converges for each $\delta > 0$, and hence the right side goes to zero as $n \to \infty$ on this interval. Therefore its value, as $n \to \infty$, is the same as the limit of
We now estimate this. First, choose $\delta > 0$ such that, given $\eta > 0$, $1 - \eta < \sin^2 u/u^2 < 1$ for $0 \le u \le \delta$. Then
$$\cdots < J'_n < \frac{t^2}{\log n}\Big[\log \delta - \log\big(t/2(n\log n)^{1/2}\big)\Big].$$
First letting $n \to \infty$ on both sides and then $\eta \to 0$, we see that the extremes have the limit $t^2/2$. Consequently $\lim_n J'_n = \lim_n nJ_n = t^2/2$. Substituting this in (17) we get
Now that $b_n = (n \log n)^{1/2}$ is seen as the correct normalizing constant, so that $S_n/b_n \xrightarrow{D} S$, we could also immediately verify (12), so that $S$ is $N(0,1)$, which
From this it follows that normal, Cauchy, and degenerate d.f.s are stable. But there are others. Their structure is quite interesting, though somewhat intricate, as we shall see. [Some important applications of this class will be considered in Section 8.4.]
$S_n = \sum_{i=1}^{n} X_i$ and $a'_n X + b'_n$ have the same d.f. for each $n$ and for some $a'_n > 0$, $b'_n \in \mathbb{R}$. [Equivalently, $(S_n - b_n)/a_n$ has the same d.f. as $X$.] However, the converse of this statement is also true; i.e., if $\phi$ is a ch.f. which satisfies (21) for each $n$, then $\phi$ is stable in the sense of Definition 12, or (20). This follows from the representation (or characterization) below of the class of stable ch.f.s. In (21), the $a_n$ are called norming constants. One says that the d.f.s $F, G$ are of the same type if $F(x) = G(ax + b)$ for all $x \in \mathbb{R}$ and some $a > 0$, $b \in \mathbb{R}$. In words, $F, G$ differ only by the scale and location parameters. From this point of view, if $F$ is stable, then so is $G$, where $G(x) = F(ax + b)$ for some $a > 0$, $b \in \mathbb{R}$, and all $x \in \mathbb{R}$. Thus we have stable types of laws.
the same norming constants. Indeed, if $\phi$ is the (common) ch.f. of the $X_n$ and $\psi$ that of $Y$, we have by (21)
and $\psi(t) > 0$. Thus $\psi$ and $\phi$ have the same norming constants $a_n > 0$. Consequently we may (and do) assume that $X$ is a symmetric stable r.v. with $a_n > 0$ as its norming constant for the rest of the proof. Also let $X \not\equiv 0$ (to avoid trivialities).
As noted in the remark before the statement of the proposition, the stability hypothesis on $X$ implies that if $X_1, \ldots, X_n$ are i.i.d. as $X$, then $S_n = \sum_{i=1}^{n} X_i$ and $a_n X + b'_n$ are identically distributed for each $n$, where $a_n$ is the norming constant and $n \ge 1$ is arbitrary. Replacing $n$ by $m + n$ ($m, n \ge 1$ integers), we first note that $(S_{m+n} - S_n)$ and $S_n$ are independent r.v.s, and the stability hypothesis implies the following set of equations, since $(S_{m+n} - S_n)$ and $S_m$ are identically distributed:
where $X', X''$ are i.i.d. as $X$. The symmetry of $X$ implies that $b' = 0$ in this representation of the $S_n$. Since $S_{m+n} = (S_{m+n} - S_n) + S_n$, (22) yields
since $X', X''$ are i.i.d. and have the same ch.f. as $X$. Replacing $t$ by $t/a_m$ and setting $m = n$ in (24), we obtain
Since $t \in \mathbb{R}$ is arbitrary, we see (why?) from (25) that $a_{2n} = a_2 a_n$. In a similar manner, considering $r$ blocks of $n$ terms each for $S_{rn}$, we get $a_{rn} = a_r a_n$, $r \ge 1$, $n \ge 1$. If now $n = r^p$, so that $a_{r^{k+1}} = a_r a_{r^k}$, $k = 1, 2, \ldots, p$, multiply them to get $a_n = (a_r)^p$. We next obtain a few other arithmetical properties of $\{a_n,\ n \ge 1\}$. Since the result is true and trivial in the degenerate case for $\phi$ (any $a_n$ works), we exclude this case in the following argument.
The sequence $\{a_n,\ n \ge 1\}$ is monotone increasing and tends to infinity. Indeed, let $u > 0$ be arbitrarily fixed. Consider, with (23) and the symmetry of the r.v. $X$,
Thus
$$P\big[X > (a_m/a_{m+n})u\big] \ge \frac{1}{2} P[X' > u] = \frac{1}{2} P[X > u]. \tag{26}$$
Now the right side is a fixed positive constant for a suitable $u > 0$. If $a_m/a_{m+n}$ is not bounded as $m \to \infty$ and $n \to \infty$, then the left side goes to zero, contradicting the inequality of (26). Hence
by the preceding paragraph. Letting $k \to \infty$, this implies $a_r/a_{r+1} \le 1$, so that $\{a_r,\ r \ge 1\}$ is monotone.
Next we assert that $a_r = r^{1/\alpha}$ for some $\alpha > 0$, so that $a_r$ tends to infinity, and this proves the main part. In fact, if $k, p \ge 1$ are integers and $q > 1$, then we can find a unique integer $j \ge 0$ such that
This implies $a_q > 1$, and then on taking "logs" for these inequalities and dividing, one gets (all $a_k = 1 \Rightarrow X = 0$ a.e. by (21), so that $a_k \ne 1$):
$$\frac{j \log p}{(j+1)\log a_p} \le \frac{\log k}{\log a_k} \le \frac{(j+1)\log p}{j \log a_p}.$$
iff $X$ has the same property (cf. Problem 33 of Chapter 4), and its ch.f. is $|\psi|^2 > 0$. If $F$ is the d.f. of $Y$, then
$$k \int_{\mathbb{R}} x^2\, dF(x) = 2k \int_{\mathbb{R}} \lim_{t\to 0} \frac{1 - \cos tx}{t^2}\, dF(x) \le 2 \liminf_{t\to 0} k \int_{\mathbb{R}} \frac{1 - \cos tx}{t^2}\, dF(x) \quad \text{(by Fatou's lemma)}$$
The preceding result has another consequence. We also term the constant $\alpha > 0$ of the above proposition the characteristic exponent of (the d.f. of) $X$.
Corollary 14 If $X$ is a nondegenerate symmetric stable r.v. with a characteristic exponent $\alpha > 0$, and $X_1, X_2$ are i.i.d. as $X$, then for any positive numbers $a, b$ we have
for any positive integers $m, n$. Replacing $m, n$ by $np$ and $mq$ ($p, q \ge 1$ integers) and $t$ by $t/(nq)^{1/\alpha}$ in (30), we get
Hence (29) is true if $a = p/q$, $b = m/n$, i.e., all positive rationals. If $a, b > 0$ are real, then they can be approximated by sequences of rationals for which (31) holds. Since $\phi$ is continuous, (31) implies
The interest in stable laws is enhanced by the fact that they are the only laws that can arise as limit distributions of the normalized sums of i.i.d. sequences of r.v.s, as originally noted by P. Lévy. That is how he first introduced this concept. Let us present this result precisely.
Proof That every stable law is a limit of the described type is immediate from the definition. In fact, by the remark following Definition 12, if $X$ is a stable r.v. and $X_1, \ldots, X_n$ are i.i.d. as $X$, then $S_n = \sum_{i=1}^{n} X_i \stackrel{D}{=} a_n X + b_n$, $a_n > 0$, so that $(1/a_n)S_n - b'_n \stackrel{D}{=} X$, where $b'_n = b_n/a_n$. Thus only the converse is nontrivial. The true and trivial case that $S$ is degenerate will again be eliminated in the rest of the proof.
Suppose then $(1/A_n)S_n - B_n \xrightarrow{D} S$, as given. Hence every convergent subsequence on the left has the same limit $S$. Consider the following blocks of i.i.d. sequences: $S_{1,n} = \sum_{i=1}^{n} X_i$, $S_{2,n} = \sum_{i=n+1}^{2n} X_i$, $\ldots$, $S_{k,n} = \sum_{i=(k-1)n+1}^{kn} X_i$. By hypothesis, for each $k \ge 1$,
as $n \to \infty$, since $Y_{1n} \xrightarrow{D} S$ by hypothesis and $\{kn,\ n \ge 1\}$ is a cofinal subsequence of the integers. (This is immediate if we use the image laws and go to another probability space on which the corresponding sequence converges a.e., as in the second proof of Theorem 4.1.2.) From (32) and the definition of $Y_{kn}$ in (33), one gets
But (33) and (34) imply that $Y_{kn} \xrightarrow{D} S$ and $a_{kn} Y_{kn} + b_{kn} \xrightarrow{D} S$. Since $S$ is nondegenerate by assumption, we can apply the result of Problem 22 in Chapter 4 (and the reader should verify it now, if it was not already done): $a_{kn} \to a_k > 0$, $b_{kn} \to b_k$, so that (34) implies
Since $k \ge 1$ is arbitrary, this implies $S$ is stable, which ends the proof.
The preceding work heightens interest in the stable laws, and it is thus natural to study and determine this subset of the infinitely divisible class. Such a characterization has been obtained, again jointly by A. Khintchine and P. Lévy in 1938, and we present it now with a somewhat simpler proof. The original one depended on the canonical representation of infinitely divisible laws given in Theorem 2.5.
Remark The constant $\alpha > 0$ here will be seen to be the same as that of Proposition 13, and hence it is just the characteristic exponent of $\phi$. The case that $\alpha > 2$ of Proposition 13 corresponds to $c = 0$ here. Also, $\alpha = 2$ gives the normal, and $\alpha = 1$, $\beta = 0$, $r = 0$ gives the Cauchy ch.f.s. If $\phi$ is nondegenerate (thus $c > 0$), then $|\phi|$ is Lebesgue integrable on $\mathbb{R}$, and so every nondegenerate stable d.f. is absolutely continuous with a continuous density (by Theorem 4.2.1). However, an explicit calculation of most of these densities is not simple. [Some asymptotic expansions for such densities have been presented in 1954 by A.V. Skorokhod.] We derive the representation (35), but the complete proof will be given only for the symmetric stable case, and we comment on the omitted part (shifted to the problem section, see Exercises 24 and 25). The argument uses some number theoretical techniques of great value.
Proof Let $\phi$ be a stable ch.f. in the sense of Definition 12. Then (21) holds by Proposition 13, with $a_n = n^{1/\alpha}$ for some $\alpha > 0$, so that, setting $\delta = 1/\alpha$ for convenience, we have
so that
$$h\big((m/n)^{\delta}\big) = \big(h(m^{\delta})\big)^{1/n} = \big(h(1)\big)^{m/n}.$$
By the continuity of $h$, we get $h(t^{\delta}) = (h(1))^t$ for $t > 0$. Replacing $t$ by $t^{1/\delta}$ in the above, one obtains
where $c = -\log h(1) > 0$. Clearly (39) is true for $t = 0$ also, and then for all $t \in \mathbb{R}$.
Let us next consider $g(\cdot)$. From (36) and (37) (considering the imaginary parts), one has
$$mn\,g(t) = g(m^{\delta}n^{\delta}t) + tb_{mn} = m\,g(n^{\delta}t) - b_m n^{\delta}t + tb_{mn} \quad [\text{by } (40)] = m\big(n\,g(t) - tb_n\big) - b_m n^{\delta}t + tb_{mn}.$$
Rewriting this, one has (set $t = 1$)
Thus
$$b_m(n - n^{\delta}) = b_n(m - m^{\delta}), \qquad n, m \ge 1 \text{ integers}. \tag{41}$$
If now $\alpha \ne 1$, so that $\delta \ne 1$, we get a solution of (41), for some $a_0 \in \mathbb{R}$: $b_n = a_0(n - n^{\delta})$. Setting $f(t) = g(t) - a_0 t$, then with (40) one has
$$nf(t) = f(n^{\delta}t), \qquad t \in \mathbb{R},\ n \ge 1. \tag{42}$$
For this functional equation we can apply the same argument as in (38). Thus
$$f(n^{\delta}) = nf(1) \quad \text{and} \quad f(m^{\delta}) = f\big(n^{\delta}(m/n)^{\delta}\big) = nf\big((m/n)^{\delta}\big).$$
$$n\,g(mt) = g(nmt) + mtb_n = g(nmt) + m\big[n\,g(t) - g(nt)\big]. \tag{46}$$
To solve the functional equation (46), let $w(u) = e^{-u} g(e^u)$, $u \in \mathbb{R}$. Then if $u = \log t$, $a_n = \log n$, $n \ge 1$, $t > 0$, we get from (46)
If $m, n$ are replaced by $m_0, n_0$, then, using the density argument, we can deduce that
$$\xi(u + \tau) = \xi(u) + \xi(\tau), \qquad u, \tau \in \mathbb{R}. \tag{50}$$
But $\xi(\cdot)$ is continuous, and thus (50) is the classical Cauchy functional equation. So $\xi(u) = a_0 u$, or $w(u) = a_0 u + b_0$ for some constants $a_0, b_0$. Hence for $t > 0$, and $u = \log t$,
which proves (35). Note that the $\alpha$ in (35) is the same constant as that in Proposition 13. For $\alpha > 2$, $\phi$ is a ch.f. only if $c = 0$.
Conversely, if $\phi$ is given by (35), in the symmetric case it reduces to (53). The latter is a ch.f. by Pólya's criterion (cf. Problem 25 of Chapter 4). To see that $\phi$ is then a stable ch.f., it suffices to verify the relation (20) (i.e., Definition 12). Thus for $a_1 > 0$, $a_2 > 0$ we see that
where $a = (a_1^{\alpha} + a_2^{\alpha})^{1/\alpha}$. Hence a function $\phi$ defined by (53) is always a symmetric stable ch.f. Actually, if we first verify that $\phi$ given by (35) is a ch.f. (indicated in the problem), then the above simple argument implies that it is a stable ch.f. (Of course, the "if" here involves a nontrivial amount of work.) This finishes the proof.
Remark Let $X$ be a symmetric stable r.v., so that its ch.f. $\phi_{X,\alpha}(\cdot)$ is given by (53). Then Schilder (1970) has observed that $\|\cdot\|_{\alpha} : X \mapsto c$, $0 < \alpha \le 2$, defines a metric on the class of independent symmetric r.v.s in $L^0(P)$. Thus $\|X\|_{\alpha} = -\log \phi_{X,\alpha}(1)$, and with a small computation it is also seen that for independent r.v.s $X_1, X_2$ with the same characteristic exponent $\alpha$, $\|X_1 + X_2\|_{\alpha} = \|X_1\|_{\alpha} + \|X_2\|_{\alpha}$. This statement is also true if $X_1, X_2$ and $X_1 + X_2$ are all symmetric stable with the same $\alpha > 0$, but here the joint (or multivariate) stability concept is needed. (See Problem 25(c) on this notion.)
where $[nt]$ is the integral part of $nt$, so that for $t = 1$, $Y_n(1, \omega) = S_n(\omega)/\sigma\sqrt{n}$, $\omega \in \Omega$. Thus if we set $S_0 = 0$, $Y_n(\cdot, \omega)$ is a polygonal path, joining $(0,0)$ and
$Z_n(\cdot) = Y_n(\cdot, \cdot) : \Omega \to C[0,1]$, the space of real continuous functions, as the range space of $\{Z_n,\ n \ge 1\}$; and if $\mu_n = P \circ Z_n^{-1}$ is the image law, then to investigate the convergence of $\mu_n$ as well as to determine the limit. Thus it is desired to show, in general, that, under reasonable conditions, one can assert
few of the results of Donsker and Prokhorov in this section because of their importance and great interest in applications.
The preceding discussion clearly implies that we need to consider new technical problems before establishing any general results. The first one is the definition and existence of Wiener measure and process, which can be stated as follows. (From now on an r.v. $X(t, \cdot)$ is also written as $X_t$, for convenience.)
Definition 1 An indexed family $\{X_t,\ 0 \le t \le 1\}$ of r.v.s on a probability space $(\Omega, \Sigma, P)$ is called a Brownian motion (or a Wiener process) if each $X_t$ is a Gaussian (or normal) r.v. $N(0, \sigma^2 t)$ and for each $0 \le t_1 < t_2 < \cdots < t_n \le 1$, $n \ge 1$, the r.v.s $X_{t_1}, X_{t_2} - X_{t_1}, \ldots, X_{t_n} - X_{t_{n-1}}$ are mutually independent. (Here the index $[0,1]$ is taken only for convenience. The concept holds if the index is any subset of $\mathbb{R}^+$, or even $\mathbb{R}$, with simple modifications.)
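A direct discretization of Definition 1 (a sketch with an arbitrary grid size; this is not one of the constructions discussed below) samples a path by accumulating independent $N(0, \sigma^2 \Delta t)$ increments:

    import numpy as np

    rng = np.random.default_rng(0)

    def brownian_path(n_steps=1_000, sigma=1.0):
        """Polygonal sample of {X_t, 0 <= t <= 1}: X_t ~ N(0, sigma^2 t)
        with mutually independent increments, per Definition 1."""
        dt = 1.0 / n_steps
        increments = rng.normal(0.0, sigma * np.sqrt(dt), size=n_steps)
        return np.concatenate(([0.0], np.cumsum(increments)))

    # X_1 should be N(0, sigma^2): estimate its variance over many paths.
    samples = np.array([brownian_path()[-1] for _ in range(2_000)])
    print("sample variance of X_1:", samples.var())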
It is not obvious that such a process exists. Indeed, if $F_{t_1,\ldots,t_n}$ is the joint d.f. of $X_{t_1}, \ldots, X_{t_n}$, then from (3) we can immediately note that $\{F_{t_1,\ldots,t_n},\ n \ge 1\}$ is a compatible family. In terms of ch.f.s this is immediate, since
$$\phi_{t_1,\ldots,t_n}(u_1, \ldots, u_n) = E\big(\exp\{iu_1 X_{t_1} + \cdots + iu_n X_{t_n}\}\big),$$
and the compatibility conditions on the $F$ become [cf. Eqs. (3.4.2), (3.4.3)]
The right side is given by (3). Fortunately this is all possible, and not too difficult. We omit the proof here, since it is not essential for this discussion. (It may be found, for instance, in the first author's (1979) monograph, pp. 186-191.)
One then notes from the work in real analysis that, because $C[0,1]$ is a separable metric space [under the sup norm as metric, $\|X\| = \sup_{0\le t\le 1} |X(t)|$], its Borel $\sigma$-algebra (i.e., the one determined by the open sets) and $\mathcal{C}$ are the same, and that any finite measure on such a Borel $\sigma$-algebra is automatically regular [i.e., $P(A) = \sup\{P(K) : K \subset A,\ K \text{ compact}\}$, $A \in \mathcal{C}$]. This regular probability measure $P$ on the Borel $\sigma$-algebra $\mathcal{B}$ of $C[0,1]$ is called the Wiener measure, and is also denoted $W(\cdot)$. Thus $(C[0,1], \mathcal{B}, W)$ is the Wiener space, and $\{X_t,\ 0 \le t \le 1\}$ can be regarded as a process on $(\Omega, \Sigma, P)$ with its sample functions in $C[0,1]$. It is the Wiener (or Brownian motion) process.
There are other ways of constructing this process. We establish one such construction in the last chapter and present some deep results. N. Wiener was the first, in 1923, to demonstrate rigorously the existence of this process (hence the name Wiener process), even though R. Brown, an English botanist, observed the process experimentally, i.e., the erratic behavior of its sample paths (or functions), as early as 1826 (hence Brownian motion). Now it can be approximated by a random walk; and other methods, such as Wiener's
and $A_k = B_k - \bigcup_{i=1}^{k-1} B_i$, and let us continue this procedure for each $k \ge 1$. If we let $S_{i_1,\ldots,i_k} = S_{i_1,\ldots,i_{k-1}} \cap A^k_{i_k}$, then for each $k$-tuple of integers $(i_1, \ldots, i_k)$, the collection $\{S_{i_1,\ldots,i_k},\ k \ge 1\}$ are disjoint $P$-continuity sets such that for each $k$, $\bigcup_{i_{k+1}\ge 1} S_{i_1,\ldots,i_{k+1}} = S_{i_1,\ldots,i_k}$, the diameters satisfy diam$(S_{i_1,\ldots,i_k}) \le 2^{-k}$, and together they cover $S$. (Verification is left to the reader.)
Next obtain for the interval $[0,1)$ the corresponding decompositions such that $I_{i_1,\ldots,i_k}$ and $I'_{i_1,\ldots,i_k}$ are chosen to satisfy $\mu(I_{i_1,\ldots,i_k}) = P(S_{i_1,\ldots,i_k})$ and $\mu(I'_{i_1,\ldots,i_k}) = P_n(S_{i_1,\ldots,i_k})$; if $(i_1, \ldots, i_k) \prec (i'_1, \ldots, i'_k)$ (lexicographic order), take $I_{i_1,\ldots,i_k}$ to the left of $I_{i'_1,\ldots,i'_k}$. Here we use the order property of the real line. Similarly we order $I'_{i_1,\ldots,i_k}$ to the left of $I'_{i'_1,\ldots,i'_k}$, and both of these cover the unit interval. With such a decomposition we construct the desired r.v.s as follows: Choose a point $x_{i_1,\ldots,i_k} \in S_{i_1,\ldots,i_k}$ and set
(Omit the empty $I$'s.) This defines $X_n^k$ and $X^k$ on $[0,1)$ as r.v.s into $S$, since $S_{i_1,\ldots,i_k} \in \mathcal{B}$ for each $k \ge 1$. Furthermore, $\rho\big(X_n^k(\omega), X_n^{k+1}(\omega)\big) \le 2^{-k}$, $k \ge 1$, and the same is true of $X^k(\omega)$. Thus for each $n$ these are Cauchy sequences, and by the completeness of $S$ there exist mappings $X_n$ and $X$ such that $X_n^k(\omega) \to X_n(\omega)$, $X^k(\omega) \to X(\omega)$, as $k \to \infty$, for all but at most a countable set of $\omega$ that are boundary points of these $I$-sets. Hence, defining them arbitrarily at these points (which have $\mu$-measure zero), we see that $X_n, X$ are measurable mappings on $[0,1)$ into $S$, i.e., r.v.s. Also,
by hypothesis, since the $S_{i_1,\ldots,i_k}$ are $P$-continuity sets. Thus for each $\omega \in I_{i_1,\ldots,i_k}$, an $n_0(\omega, k)$ can be found such that $n \ge n_0(\omega, k) \Rightarrow X_n^k(\omega) = X^k(\omega)$, and so
Corollary 3 Let $(S, \mathcal{B})$ and $(\tilde S, \tilde{\mathcal{B}})$ be two metric spaces as in the proposition. If $P_n, P$ are probabilities on $\mathcal{B}$ such that $P_n \to P$ in the sense of the proposition, and if $f : S \to \tilde S$ is a $(\mathcal{B}, \tilde{\mathcal{B}})$-measurable mapping such that the discontinuity points $D_f \in \mathcal{B}$ satisfy $P(D_f) = 0$, then $P_n \circ f^{-1} \to P \circ f^{-1}$. In particular, if $(\Omega, \Sigma, Q)$ is a probability space, and $Y_n, Y$ are r.v.s from $\Omega$ into $S$ such that $Y_n \xrightarrow{D} Y$ (i.e., $Q \circ Y_n^{-1} \to Q \circ Y^{-1}$), then $f(Y_n) \xrightarrow{D} f(Y)$ for each continuous $f : S \to \tilde S$.
Proof By the above proposition, there exist r.v.s $X_n, X$ on the Lebesgue unit interval $([0,1), \mathcal{L}, \mu)$ into $(S, \mathcal{B})$ such that $X_n \to X$ a.e., and $P_n = \mu \circ X_n^{-1}$, $P = \mu \circ X^{-1}$. Also, $f(X_n)(\omega) \to f(X)(\omega)$ for all $\omega$ for which $f$ is continuous at $X(\omega)$. The set of discontinuities of $f$ is contained in $X^{-1}(D_f)$, which is $\mu$-null. Hence $\mu \circ f(X_n)^{-1} \to \mu \circ f(X)^{-1}$, or equivalently
Note that both the above proposition and corollary reduce to what we have seen before if $S = \mathbb{R}$. Moreover, the calculus of "in probability" results of Mann and Wald, given as Problems 9-11 in Chapter 2, extends to the case considered here.
We are ready to prove Donsker's (1951) invariance principle discussed earlier in this section, and it still needs many details, to be given here.
Theorem 4 Let $\{X_n,\ n \ge 1\}$ be a sequence of i.i.d. random variables on $(\Omega, \Sigma, P)$ with zero means and variances $\sigma^2 > 0$. If $Y_n$ is defined by (1) as a polygonal function on $\Omega \to C[0,1]$, which is a random element with distribution $P_n(= P \circ Y_n^{-1})$ on the Borel sets of the separable metric space $C[0,1]$ (with uniform norm as its metric), then $P_n \to W$, where $W$ is the Wiener measure. Equivalently, $Y_n \xrightarrow{D} Z$, where $\{Z(t, \cdot),\ t \in [0,1]\}$ is the Wiener or Brownian motion process with $Z(t, \cdot)$ as $N(0, \sigma^2 t)$. Hence for each $h : C[0,1] \to C[0,1]$ which is measurable and whose discontinuities form a set of Wiener measure zero, we have $h(Y_n) \xrightarrow{D} h(Z)$ or $P_n \circ h^{-1} \to W \circ h^{-1}$.
$$\le \frac{1}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty \quad (\text{Čebyšev's inequality}). \tag{9}$$
On the other hand, since $[nt]/n \to t$ as $n \to \infty$, from Lévy's central limit theorem we get, with $\phi(u) = E(e^{iuX_1})$,
measure of $(1/\sigma\sqrt{n})\big(S_{[nt_1]}, \ldots, S_{[nt_k]}\big)$, $0 \le t_1 < \cdots < t_k \le 1$; then it converges (weakly or distributionally) to $W(\cdot)$ iff $P_n \circ h^{-1} \to W \circ h^{-1}$ for each fixed $k$. Taking $k = 2$, $0 < t_1 < t_2 < 1$, the general case being similar, we get by independence of $S_{[nt_1]}$ and $S_{[nt_2]} - S_{[nt_1]}$,
Let $k \ge 1$ be fixed and $t = j/2^k$, $j = 0, 1, \ldots, 2^k$, a dyadic rational. Replacing $t$ by these numbers in the definition of $A$, the new set $A_k$ (say) approximates $A$ from above as $k \to \infty$. Given $\delta > 0$, we can find a $k \ge 1$ such that [we assume $W(\partial A) = 0$] $W(A_k) - W(A) < \delta$. But by the first paragraph, (the
It follows from the arbitrariness of $\delta > 0$ that $\overline{\lim}_n P_n(A) \le W(A)$. Thus the next step is to establish the opposite inequality, $\liminf_n P_n(A) \ge W(A)$, which needs more work than that of establishing (11).
II. Let $\epsilon > 0$, $\eta > 0$ be given. Choose $\alpha > 0$ such that if
$$H_{\alpha} = \{g : \underline{f}(t) + \alpha < g(t) < \overline{f}(t) - \alpha,\ 0 \le t \le 1\},$$
which increases to $A$ as $\alpha \downarrow 0$, then $W(H_{\alpha}) > W(A) - \epsilon$. One can approximate $H_{\alpha}$ from above if $t$ is replaced by the rationals, as in the last paragraph. If $t = i/k$, $i = 0, 1, \ldots, k$, and
then $W(A) < W(H_{\alpha}) + \epsilon \le W(H_{\alpha}^k) + \epsilon$ for each $k \ge 1$. But $\underline{f}, \overline{f} \in C[0,1]$. Hence they are uniformly continuous. Thus we can find $k \ge 1$ such that if $|t - s| < 1/k$, then $|\underline{f}(t) - \underline{f}(s)| < \alpha/3$ and $|\overline{f}(t) - \overline{f}(s)| < \alpha/3$. In particular, if $n > k$, and $C_n \subset C[0,1]$ is the set of functions which are piecewise linear on $[(i-1)/n, i/n]$, $i = 1, \ldots, n$, then by the uniform continuity of these functions
If $q_{n,r}$ is that $q$ satisfying $(q-1)/k \le r/n < q/k$, $1 \le q \le k \le n$ (set $q_{n,0} = 0$), then
By definition of $F_{n,r}$,
is false, and hence $g \notin H_{\alpha}$, or is in $H_{\alpha}^c$. Hence the first term on the right side of (13) is dominated by $P_n\big((H_{\alpha}^k)^c\big)$, since $n \ge k$.
Consider the second term of (13). Since the $Y_n(t, \cdot)$- and $Y(t, \cdot)$-processes have independent increments, we have for $0 \le t, t' \le 1$ and $S = C[0,1]$,
by the triangle inequality. But by the first paragraph [cf. (10)] the first and last integrals of (14) tend to 0. The middle term is $\le \big(([nt] - [nt'])/n\big)^{1/2}$. Hence, for large enough $n$, the right side of (14) is $\le 3(t - t')^{1/2}$.
Consequently
From this we deduce that ($\epsilon > 0$, $\eta > 0$ being arbitrary) $\liminf_n P_n(A) \ge W(A)$. This and the inequality (11) above imply $\lim_n P_n(A) = W(A)$, and the proof of the theorem is complete, since the last part is an immediate consequence of Corollary 3.
Let us state Prokhorov's result, which illuminates the structure of the invariance principle as well as the above theorem.
Theorem 6 (Prokhorov) Let $\{X_{nk},\ 1 \le k \le k_n\}_{n\ge 1}$ be a sequence of rowwise independent sequences of r.v.s which are infinitesimal and which have two moments finite, such that $E(X_{nk}) = 0$, $\operatorname{Var} S_n = 1$, where $S_{nk} = \sum_{j=1}^{k} X_{nj}$ and $S_n = S_{nk_n}$. Let
$t \in [t_{nk}, t_{n(k+1)}]$, $S_{n0} = 0$, $t_{n0} = 0$, and $t_{nk} = \operatorname{Var} S_{nk}$. Let $P_n$ be the distribution of $Y_n$ in $C[0,1]$. Then $P_n \to W$, the Wiener measure in $C[0,1]$, iff the $X_{nk}$-sequence satisfies the Lindeberg condition:
for each $\lambda > 0$, where $F_{nk}$ is the d.f. of $X_{nk}$. [Compare this form with the Lindeberg-Feller form given in Theorem 3.6.]
Theorem 7 Suppose that the $X_{nk}$ satisfy the Lindeberg condition (16). Thus if $t_n = t_{nk_n}$, then with the above notation and assumptions, (16) becomes
implying that
Proof The result is established in two stages. First we show that the left-side limit of (17) exists and is independent of the $F_{nk}$. (This would be immediate from Theorem 4 if the $X_n$ were i.i.d., since then $h(f) = \sup_{0\le t\le 1} f(t)$, $f \in C[0,1]$, is a continuous functional on $C[0,1]$.) Next we choose $X_{nk}$ to be symmetric i.i.d. Bernoulli r.v.s and evaluate the limit. These are nontrivial computations.
I. For the first point, let $\{Y_n,\ n \ge 1\}$ be a sequence of independent $N(0,1)$ r.v.s on $(\Omega, \Sigma, P)$ on which the $X_{nk}$ are defined. The probability space may be assumed rich enough to support all these r.v. families, by enlarging it if necessary. Let $Z_n = \sum_{k=1}^{n} Y_k$, and if $F_n(x)$ is the left side of (17) before taking the limit, then we assert that for each $\epsilon > 0$ and integer $r \ge 1$,
$$\liminf_n F_n(x) \ge P\Big[\max_{1\le k\le r} Z_k r^{-1/2} < x - \epsilon\Big] \tag{18}$$
and
$$\limsup_n F_n(x) \le P\Big[\max_{1\le k\le r} Z_k r^{-1/2} < x\Big]. \tag{19}$$
These two inequalities are analogous to those of steps I and II of the proof of Theorem 4, and the following argument proceeds on the same lines but is tailored to the case under consideration.
Let $i_1, \ldots, i_r$ be integers chosen such that for each $1 \le j \le r$ we have
where $t_n = \operatorname{Var} S_{nk_n}$, as in the statement. Thus $i_j$ is the smallest integer such that $t_{ni_j} > (j/r)t_n$, and $1 \le i_1 < \cdots < i_r = k_n$. Consider $U_{n1} = S_{ni_1}$ and $U_{nj} = S_{ni_j} - S_{ni_{j-1}}$, $j = 2, \ldots, r$. We note that as $n \to \infty$ the $U_{nj}$ satisfy Lindeberg's condition. In fact, by (16') one has
as $n \to \infty$, using (16'). Thus the $U_{nj}$ also satisfy Lindeberg's condition for each $j$. By Theorem 3.5, $U_{nj}/\sigma_{nj} \xrightarrow{D}$ an $N(0,1)$ r.v., for each $j = 1, \ldots, r$, as $n \to \infty$. Since the $U_{nj}$, $j = 1, \ldots, r$, are independent, so are these limit r.v.s. But by (20), $\lim_n (\sigma_{nj}^2/t_n) = 1/r$. Consequently, $r^{1/2} U_{nj}/t_n^{1/2} \xrightarrow{D}$ an $N(0,1)$ r.v. as $n \to \infty$, $j = 1, \ldots, r$, these being independent. This means
Now going to an auxiliary probability space and using Corollary 3 with $h(\tilde z) = \max_{1\le j\le r}(z_1 + \cdots + z_j)$, $\tilde z = (z_1, \ldots, z_r) \in \mathbb{R}^r$, we get from (20) that
II. It will be shown presently that (22) implies (19) as well as (18). For this let $G_{n,r}(x) = P\big[\max_{1\le j\le r} S_{ni_j} < x\,t_n^{1/2}\big]$. We express the event $\big[\max_{1\le k\le k_n} S_{nk} < x\,t_n^{1/2}\big]$ here as a disjoint union, in much the same way as in (13) (as was done many times before, e.g., in martingale convergence). Thus, let
$$H_{n1}(x) = \big[S_{n1} \ge x\,t_n^{1/2}\big], \quad \text{and for } j > 1,\ H_{nj}(x) = \big[S_{nj} \ge x\,t_n^{1/2},\ S_{ni} < x\,t_n^{1/2} \text{ for } 1 \le i \le j-1\big].$$
These are disjoint, and if $Q_{nj}(x) = P(H_{nj}(x))$, we have
and
$$P\Big[S_{nk} > x\,t_n^{1/2},\ \max_{1\le i\le k-1} S_{ni} < x\,t_n^{1/2},\ |S_{nk} - S_{ni_l}| \ge \epsilon\,t_n^{1/2}\Big]$$
by Čebyšev's inequality for the last one. By (20), this becomes
However, on each set appearing in $Q_{nk}$ we have $S_{nk} \ge x\,t_n^{1/2}$ and $|S_{ni_l} - S_{nk}| < \epsilon\,t_n^{1/2}$, so that $S_{ni_l} > S_{nk} - \epsilon\,t_n^{1/2} > (x - \epsilon)\,t_n^{1/2}$. Hence
Clearly $F_n(x) \le G_{n,r}(x)$ for all $r \ge 1$. Hence the above inequality becomes
which is (19). Taking lim inf of the first two terms and using (22), we get (18), since $\eta > 0$ is arbitrary.
From (18) and (19) it follows that
$$\Big[\max_{1\le j\le r} Z_j r^{-1/2} < x\Big]$$
If $Q_r(x) = P\big[\max_{1\le j\le r} Z_j r^{-1/2} < x\big]$, then by the Helly selection principle, for a subsequence $r_i$, $Q_{r_i}(x) \to Q(x)$ at all continuity points $x$ of $Q$, a nondecreasing function, $0 \le Q \le 1$. Since $F_n$ does not depend on $r_i$, we get from (27), on letting $r_i \to \infty$,
(27') [$\Rightarrow Q$ is a d.f. But this is again independently obtained in step III.] This proves our first assertion, as well as the fact that the limit does not depend on the d.f.s of $X_{nk}$. It only remains to calculate this limit.
III. Consider $\{Y'_n,\ n \ge 1\}$, where the $Y'_n$ are independent and $P[Y'_n = 1] = P[Y'_n = -1] = \frac{1}{2}$, $n \ge 1$. Now we may take $X_{nk} = Y'_k$, $1 \le k \le k_n = n$ (or $X_{nk} = Y'_k n^{-1/2}$, $1 \le k \le n$, so that the $X_{nk}$ are infinitesimal, but this is not used at this point) and set $S_n = \sum_{k=1}^{n} Y'_k$.
Let $S^*_n = \max_{1\le k\le n} S_k$ and $x > 0$. Now if $N = [x\sqrt{n}]$, the integral part, then since $S^*_n$ takes only integer values, we have (with largest $N \le x\sqrt{n}$)
(the first set being $[S_1 \ge J,\ S_n < J]$ and the $n$th term is zero)
(by independence)
$$= \sum_{j=1}^{n-1} P\big[S_j \ge J,\ S_i < J,\ 1 \le i \le j-1\big]\, P\big[S_n - S_j > 0\big],$$
(by symmetry of the $Y_i$)
$$= \sum_{j=1}^{n-1} P\big[S_j \ge J,\ S_i < J,\ 1 \le i \le j-1,\ S_n - S_j > 0\big],$$
(by independence)
$$= \sum_{j=1}^{n-1} P\big[S_j \ge J,\ S_i < J,\ 1 \le i \le j-1,\ S_n > J\big] = P\big[S^*_n \ge J,\ S_n > J\big] = P\big[S_n > J\big].$$
But
$$\lim_n P\big[S_n/\sqrt{n} < x\big] = \Big(\frac{1}{2\pi}\Big)^{1/2} \int_{-\infty}^{x} e^{-u^2/2}\, du, \qquad x \in \mathbb{R},$$
by the central limit theorem. Since the limit is continuous, the last term in (29) goes to zero, and hence for all $x > 0$
(and, of course, also the independence). This is called the reflection principle, due to D. André. It is thus clear that in all these problems involving invariance principles a considerable amount of special ingenuity is needed to obtain specific results. For instance, $\max_{j\le n} |S_j|$, $\min_{j\le n} S_j$, etc., are all useful problems with applicational potential and admit a similar analysis.
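The reflection argument can be checked numerically. The sketch below (all parameters are ours) compares $P[\max_{k\le n} S_k \le x\sqrt{n}]$ for a symmetric Bernoulli walk with the limit $2\Phi(x) - 1$ implied by (30):

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(0)
    n, reps, x = 5_000, 4_000, 1.0

    steps = rng.choice([-1.0, 1.0], size=(reps, n))   # symmetric Bernoulli walk
    walk_max = steps.cumsum(axis=1).max(axis=1)       # S_n^* = max_{k<=n} S_k

    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    print("empirical:", np.mean(walk_max <= x * sqrt(n)),
          "limit 2*Phi(x)-1:", 2 * Phi(x) - 1)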
The next illustration shows how another class of problems, called empiric distributional processes, can be treated, and new insight gained. We use the preceding result in the following work.
formly distributed on the unit interval. By the classical central limit theorem, $\sqrt{n}\big(F_n(x) - F(x)\big) \xrightarrow{D} Z_x$, an $N(0,1)$ r.v. Consequently, one should consider
and consider $-\infty < x_{\alpha} < \infty$ such that $0 < F(x_{\alpha}) = \alpha < 1$. Then one has the following result, which complements an application detailed in Theorem 3.3.9. However, we need to use the latter theorem in the present proof.
Theorem 8 Let $F$ be a continuous strictly increasing d.f. and $x_{\alpha}$, $0 < \alpha < 1$, be as above. If $F_n$ is the empiric d.f. of a set of $n$ independent r.v.s with $F$ as their common d.f., then
However, (31) is also an easy extension of (30). Thus we obtain (30) for simplicity, and then modify it to deduce (31). Because of the separability of $\mathbb{R}$, there are no measurability problems for the events in (30) and (31) (cf. the proof of the Glivenko-Cantelli Theorem 2.4.1). We present the proof again in steps, for clarity. The first step deals with a reduction of the problem.
$$\lim_{n\to\infty} P\Big[\sup_{x\le x_{\alpha}} \sqrt{n}\big(F_n(x) - F(x)\big) < y\Big] = \Big(\frac{2}{\pi}\Big)^{1/2} \int_{0}^{y\gamma(\alpha)} e^{-u^2/2}\, du, \qquad y > 0. \tag{33}$$
To see this, let
$$\overline{\lim}_n P(A_n) \le \lim_n P(\tilde A_n) = \Big(\frac{2}{\pi}\Big)^{1/2} \int_{0}^{y\gamma(\alpha+\epsilon)} e^{-u^2/2}\, du,$$
and if
then $\tilde A_n \cap B_n \supset A_n \cap B_n$, and since $x_{\alpha} \le x$, we get with (33)
Since the right sides of (34) and (35) are continuous functions of $\gamma(\alpha)$, and $\gamma(\alpha \pm \epsilon) \to \gamma(\alpha)$ as $\epsilon \to 0$, (34) and (35) imply (32) if (33) is true.
II. Let us therefore establish (33). Here we use the properties of order statistics, especially Theorem 3.3.9. If $Y^*_1 < Y^*_2 < \cdots < Y^*_n$ are the order statistics of the uniform r.v.s $Y_1, \ldots, Y_n$ [$Y_k = F(X_k)$], then clearly the empiric d.f. $H_n$ can also be defined as
(This is obviously true for any d.f., not necessarily uniform.) Since $H_n$ is a step function which is constant between $Y^*_i$ and $Y^*_{i+1}$, it follows that
$$\lim_{n\to\infty} P\Big[\max_{\alpha n \le i \le n} \sqrt{n}\Big(\frac{i}{nY^*_i} - 1\Big) < y\Big] = \lim_{n\to\infty} P\Big[\max_{\alpha n \le k \le n} \sqrt{n}\, \log \frac{k}{nY^*_k} < y\Big].$$
Thus (33) will be proved, because of (36) and (38), if we show
+
In other words, if Uk = Vk/(n k I ) , theii Vk has the standard exponential
-
Thus
n k
the V, are i.i.d. with density given by (40). But -log: Yc = Zn-k+l =
E":; Uj, and the Uj satisfy the Lindeberg condition. In fact they satisfy
the stronger Liapoullov condition lim,, p(Zn)/a(Zn) = 0, where
where z > 0. But the left-handed side (LHS) of (41) can be written as
Since 0 < a < 1 and j 2 an, we have with the standard calculus approxima-
tions
Thus (A1) and (A2) are independent r.v.s, and (A1) $\xrightarrow{D}$ an r.v. $N(0, (1-b)/b)$ as $n \to \infty$, and by the first part
$$\Big(\frac{2}{\pi}\Big)^{1/2} \int_{0}^{[ab/(b-a)]^{1/2} y} e^{-v^2/2}\, dv.$$
We have included here all the (brutal) details because of the importance of this result in statistical hypothesis testing. It is related to classical theorems of Kolmogorov and Smirnov, and is used to test the hypothesis that a sample came from a specified continuous (strictly increasing) d.f. Various alterations [e.g., with absolute deviations in (30) and (31)] are possible. We do not consider them here, but indicate some in a problem (see Problem 32).
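For a feel of such statistics, the sketch below simulates the simpler one-sided deviation $\sqrt{n}\sup_x (F_n(x) - F(x))$ for the uniform d.f. and compares it with Smirnov's classical limit $1 - e^{-2y^2}$ (a neighboring statistic, not the restricted supremum of (30) itself; all parameters are ours):

    import numpy as np

    rng = np.random.default_rng(0)

    def d_plus(sample):
        # sup_x (F_n(x) - x) for F = uniform d.f. on [0, 1].
        x = np.sort(sample)
        n = len(x)
        return np.max(np.arange(1, n + 1) / n - x)

    n, reps, y = 1_000, 3_000, 1.0
    stats = np.array([np.sqrt(n) * d_plus(rng.random(n)) for _ in range(reps)])
    print("empirical P[sqrt(n) D_n+ <= y]:", np.mean(stats <= y),
          "Smirnov limit:", 1 - np.exp(-2 * y**2))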
Before ending this section, we add some remarks on two points raised before. Processes $\{X_t,\ 0 \le t \le 1\}$ more general than Brownian motion are those which have independent increments and are stochastically continuous, i.e., $\lim_{t\to s} P[|X_t - X_s| > \epsilon] = 0$ for $t, s \in [0,1]$ and a given $\epsilon > 0$. Such processes can be shown to have no discontinuities of the second kind, so that their sample paths belong to $D[0,1]$. Consequently one can seek conditions for the convergence of $P_n$ of $Y_n(t, \cdot)$--the random polygon obtained as in Theorem 6 for sequences of rowwise independent asymptotically constant r.v.s. Here the theory of infinitely divisible distributions (as a generalization of the classical central limit problem) enters. The corresponding characterization has been obtained by Prokhorov (1956). The second point is about the "strong" invariance principle. This is the statement that the random polygonal processes, obtained as in (1), converge to the Brownian motion process with probability one, if both can be defined on the same probability space. Can this always be done, or only sometimes? A first positive solution was given by V. Strassen in the early 1960s. To describe his formulation, let $\{X_n,\ n \ge 1\}$ be i.i.d. with zero means and unit variances. If $S_0 = 0$, $S_n = \sum_{k=1}^{n} X_k$, let $Y_n(t, \cdot)$ be the polygonal process defined similarly (but we cannot use the central limit theorem and so must have a different normalization). Thus let us define, on the probability space $(\Omega, \Sigma, P)$ of the $X_n$,
where $[t]$ is the integral part. Extending the ideas from the embedding method of Proposition 2, and using other tools (such as the iterated logarithm law--see the next section), one can obtain the following result.
Even though we discussed the existence of Brownian motion on $[0,1]$, the general case of $\mathbb{R}^+$ is similar. A number of other "old" results have been extended to this setting. A survey of these and other possible extensions, with references, has been given by Csörgő and Révész (1981), to which we refer the interested reader for information on this line of investigation. As is clear from the statement of Theorem 9, one needs to use several properties of Brownian motion (some of these will be studied in Chapter 8), and this would lead us tangentially in our treatment. So the proof will not be detailed here.
It is clear that the results of this section indicate that a study of limit theorems in suitable metric spaces more general than those of $C[0,1]$ and $D[0,1]$ can be useful in applications. These may clarify the structure of the concrete problems we described above. Relative compactness of families of measures and their equivalent usable forms, called tightness conditions from Prokhorov's work, have been the focus of much recent research. One of the main points of this section is that an essential part of probability theory merges with the study of measures on (infinite-dimensional) function spaces. We have to leave these specializations at this stage.
This result showed the power of the probabilistic methods and represented a great achievement in the subject. That was enhanced by the next result of Kolmogorov's, when he generalized the above for an arbitrary independent sequence of bounded r.v.s. We establish this here. Because it is a pointwise convergence statement, the result is a strong limit theorem. However, error estimates in the proof depend critically on the weak limit theory, and so the result is placed in this chapter. (Thus the strong statement is really based on the "weak statement.") Actually, in the modern development of probability, the strong and weak limit theories intertwine; and this greatly enriches the subject.
and
For a proof of (3a), we need to establish with the weak limit theory certain exponential bounds on probabilities. Thus we first present them separately for clarity. Note that if (3a) is established, then considering the sequence $\{-X_n,\ n \ge 1\}$, which satisfies the same hypothesis, one gets (3b), and then (3c) is a consequence. Hence it suffices to establish (3a). This is done in two stages. First one proves that
and then, for each $\epsilon > 0$, one shows that the left side is $\ge 1 - \epsilon$ a.e. These
$$h(x) = \frac{e^x - 1 - x}{x^2} = \frac{1}{2!} + \frac{x}{3!} + \frac{x^2}{4!} + \cdots. \tag{5}$$
For $x > 0$, $h(x) > 0$, $h'(x) = (dh/dx)(x) > 0$, and $h''(x) = (d^2h/dx^2)(x) > 0$. Hence $h(\cdot)$ is a positive increasing convex function. The same is also true if $x < 0$ [$h(0) = 1/2$]. To see this, let $x = -y$, $y > 0$, so that
Hence $h(x) > 0$ for $x < 0$, and $\lim_{y\to 0} g(y) = 1/2$, $\lim_{y\to\infty} g(y) = 0$. Also, $g'(y) = -[(y+2)g(y) - 1]/y$. Thus $g'(y) < 0$ iff $g(y) > (y+2)^{-1}$. And $g''(y) = [g(y)(y^2 + 4y + 6) - (y+3)]/y^2$, so that $g''(y) > 0$ iff $g(y) > (y+3)(y^2+4y+6)^{-1}$.
For us it suffices to establish the first inequality. Since $(y+2)^{-1} < (y+3)(y^2+4y+6)^{-1}$, $y > 0$, if we verify the second, both inequalities follow. Thus consider the latter. Now
Thus
If $y \ge 3$, $g(y) > (y-1)/y^2 > (y+3)/(y^2+4y+6)$. Hence in all cases $h$ given by (5) is a nonnegative increasing convex function on $\mathbb{R}$. This function $h$ plays a key role in the next two lemmas, giving the exponential bounds alluded to before. (Only $0 \le h \uparrow$ is used in the next lemma, and the $X_k$'s are as in Theorem 1.)
(ii) For (9), we use the inequality $1 + x \ge \exp(x - x^2)$ for $x \ge 0$. In fact, this is obvious if $x \ge 1$ and for $x = 0$. If $0 < x < 1$, then this is equivalent to showing that $(1 + x)\exp(x^2 - x) \ge 1$, or that
Multiplying over $1 \le k \le n$, this yields (9), since $s_n^2 = \sum_{k=1}^{n} \sigma_k^2$. If also $\sigma_k^2 \le c_n^2 s_n^2$, so that
we get (10) from (9) on substituting the bound $c_n^2$ for $\sum_{k=1}^{n} \sigma_k^4/s_n^4$. This completes the proof of the lemma.
Next we proceed to derive the lower bound using (10). This is somewhat involved, as remarked before.
The idea of the proof is (i) to obtain lower and upper bounds for $\psi_{kn}$ (so we get exponential bounds for $\phi_{kn}$), and (ii) then use an ancient transformation, due to F. Esscher in 1932, to relate the result to the desired inequality (14). There is no motivation for this computation, except that it was successfully used by Feller (1943) in an extension of a result of Cramér, and then in his generalization of Kolmogorov's LIL. We employ it here in the same manner.
To proceed with point (i), let $c_n = d_n/s_n$ and consider $\phi_{kn}(t) = \phi_k(t/s_n)$. Since $\phi_k$ is actually holomorphic (on $\mathbb{C}$), we have on differentiation
However, $\sigma_k^2 \le d_n^2 = c_n^2 s_n^2$, $k \le n$, and for $t > 0$, $\phi_k(t/s_n) \ge 1$. Thus the inequalities (16), (17), and (11) yield, with $h$ of that lemma [$\psi_{kn} = \log \phi_{kn}$],
and similarly
For point (ii) we proceed to the key Esscher transformation and use these bounds in its simplification. Let $0 < t < t_0$ be fixed, and if $F_k$ is the d.f. of $X_k$, so that $F_{kn}$ is the d.f. of $X_k/s_n$, $1 \le k \le n$, define a new d.f. $F'_{kn}$ by the equation
$$dF'_{kn}(x) = [\phi_{kn}(t)]^{-1} e^{tx}\, dF_{kn}(x). \tag{21}$$
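Two immediate properties of the transformation (21) are worth recording (a sketch, reading $\phi_{kn}(t) = \int e^{tx}\, dF_{kn}(x)$ as the moment generating function used in the proof): $F'_{kn}$ is a genuine d.f., and the exponential tilting shifts the mean to the logarithmic derivative of $\phi_{kn}$:
$$\int_{\mathbb{R}} dF'_{kn}(x) = \frac{1}{\phi_{kn}(t)}\int_{\mathbb{R}} e^{tx}\, dF_{kn}(x) = 1, \qquad \int_{\mathbb{R}} x\, dF'_{kn}(x) = \frac{\phi'_{kn}(t)}{\phi_{kn}(t)} = \psi'_{kn}(t).$$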
Let $\{X_{kn}(t),\ 1 \le k \le n\}$ be independent r.v.s each with d.f. $F'_{kn}$. It may be assumed that these r.v.s are defined on the same probability space as the original $X_k$, by enlarging the underlying space if necessary. Let $S_n(t) = \sum_{k=1}^{n} X_{kn}(t)$ and $F'_n$ be its d.f. Noting that if the ch.f. $\tilde\phi_{kn}$ of $F'_{kn}$ is calculated, then the ch.f. of $S_n(t)$ is given by
respectively, so that they are $\psi'_n(t)$ and $\psi''_n(t)$. If $\tilde F_n$ is the d.f. of $G_n(t) = \big(S_n(t) - \psi'_n(t)\big)/\sqrt{\psi''_n(t)}$, then it is clear that
Here we use
Now,
$$\ge t_n^2\Big[\frac{1}{2} + o(t_nc_n) - o(t_n^2c_n^2) - (1 + o(t_nc_n))\Big]$$
[since $h(z) = \frac{1}{2} + o(z)$ and $h'(z) = \frac{1}{6} + o(z)$ as $z = t_nc_n \to 0$].
Also, by hypothesis,
$$\lim_{n\to\infty} \frac{d_n x_n}{s_n} \le \lim_{n\to\infty} \frac{d_n z_n}{s_n} = 0,$$
because $dF_k(x) = 0$ for $|x| > d_k$ and $\epsilon s_n > d_n$ for large enough $n$, by the above noted condition. But $\operatorname{Var}\big(S_n(t_n)/\sqrt{\psi''_n(t_n)}\big) = 1$. It follows from Theorem 3.5 that $\big(S_n(t_n) - \psi'_n(t_n)\big)/\sqrt{\psi''_n(t_n)} \xrightarrow{D}$ an r.v. which is $N(0,1)$. Consequently, the right-side integral of (27) is $\ge \frac{1}{2}$. Hence (27) becomes, for large enough $n$,
This is (14) if $t_n = x_n(1 - \epsilon)2^{1/2}$, and the proof of the lemma is complete.
Note how significantly the central limit theorem (for rowwise independent sequences) enters into the argument, in addition to all the other computations for the lower bound. We are now ready to complete the
Proof of Theorem 1 Let $d_n = o\big(s_n(\log\log s_n)^{-1/2}\big)$ and $d_n \uparrow$, $|X_n| \le d_n$ a.e. If $c_n = d_n/s_n$, then $c_n = o(1)$ and $\sigma_n^2/s_n^2 \le c_n^2$. Also, $c_n(\log\log s_n)^{1/2} \to 0$ as $n \to \infty$. For any $a > 0$, $b > 0$, consider $\alpha \ge b^{-1} + bh(ab)$, where $h(x) = (e^x - 1 - x)/x^2$. Choose $\beta > 1$ such that $\alpha/\beta^2 > b^{-1} + bh(ab)$. Since $s_{n+1}^2 = s_n^2 + \sigma_{n+1}^2$, so that $s_{n+1}/s_n \to 1$ as $n \to \infty$ (because $\sigma_n^2/s_n^2 \le c_n^2 \to 0$), we deduce that there exist $n_k < n_{k+1} < \cdots$ such that $s_{n_k} \le \beta^k < s_{n_k+1}$. For otherwise there would be no $s_n$ in $(\beta^k, \beta^{k+1})$, so that $\lim(s_{n+1}/s_n) \ge \beta > 1$, contradicting the preceding sentence. This implies also that $s_{n_k} \sim \beta^k$, the symbol indicating that the ratio $\to 1$ as $k \to \infty$.
Let $x_n^2 = \log\log s_n$, so that for large enough $n$, $c_n x_n \le a$, since $c_n x_n$ goes to zero. We can now use Lemma 2. Taking $\lambda_n = \alpha/\beta^2 > 0$, $b > 0$, and $x_n > 0$ as here, we then get by (8),
$$P\Big[\max_{1\le k\le n_k} S_k > \alpha\beta^{-2} s_{n_k} (\log\log s_{n_k})^{1/2}\Big] \le \exp\big\{-\big[(\alpha b/\beta^2) - b^2 h(ab)\big] \log\log s_{n_k}\big\}. \tag{29}$$
But $(\alpha b/\beta^2) - b^2 h(ab) > 1$, by the choice of $\beta$. Hence there is an $\eta > 0$ such that
$$\exp\big\{-(1+\eta)\log\log s_{n_k}\big\} \le \{k \log \beta\}^{-(1+\eta)}$$
for all large enough $n$, because of $s_{n_k} \sim \beta^k$. Thus by the first Borel-Cantelli lemma, since $\sum_k (k\log\beta)^{-(1+\eta)} < \infty$, we have
$$\le P\Big[\max_{n_k \le n \le n_{k+1}} S_n > \alpha s_{n_k} (\log\log s_{n_k})^{1/2},\ \text{i.o.}\Big].$$
It follows that
Since $\alpha > b^{-1} + bh(ab)$ for all $a > 0$, $b > 0$, we see, on letting $a \to 0$, so that $h(0) = 1/2$, that (30) is true for all $\alpha > b^{-1} + b/2$. The least value of the right side is $\sqrt{2}$, and so (30) holds a.e. if $\alpha = \sqrt{2}$. This establishes (4). Note that, by applying this result to $\{-X_n,\ n \ge 1\}$, we deduce that
$$\overline{\lim_n}\ \frac{|S_n|}{s_n(\log\log s_n)^{1/2}} \le \sqrt{2} \quad \text{a.e.}$$
We now prove the opposite inequality to (4). Again choose the $\{n_k,\ k \ge 1\}$ as before, and let $0 < \epsilon < 1$. To use Lemma 3 and the second Borel-Cantelli lemma (cf. Theorem 2.1.9ii), it is necessary first to consider at least pairwise independent events. We actually can define mutually independent ones as follows. Let
$$A_k = \big[S_{n_k} - S_{n_{k-1}} > (1-\epsilon)^{1/2} a_k b_k\big], \qquad a_k^2 = s_{n_k}^2 - s_{n_{k-1}}^2 \ge s_{n_k}^2\big(1 - \beta^{-2}\big),$$
Consequently, we have
This important theorem answers some crucial questions but raises others. The first one of the latter kind is this: Since many of the standard applications involving i.i.d. sequences of r.v.s are not necessarily bounded but will have some finite moments, how does one apply the above result? Naturally one should try the truncation method. As remarked by Feller (1943), if the $X_n$, $n \ge 1$, are i.i.d. with slightly more than two moments, then they obey the LIL. In fact, let $Y_n = X_n \chi_{A_n}$, where $A_n = [|X_n| \le n^{1/2} \log\log n]$ and $E\big(X_1^2 (\log |X_1|)^{1+\epsilon}\big) < \infty$ for some $\epsilon > 0$. Then one can verify with the Borel-Cantelli lemma that $\sum_n P[X_n \ne Y_n] < \infty$, so that $X_n = Y_n$ a.e. for all large $n$. But the $Y_n$ are bounded, and the reader may verify that the $Y_n$-sequence satisfies the hypothesis of Theorem 1. Hence that result can be applied to the i.i.d. case with this moment condition. However, this is not the best possible result. The following actually holds. The sufficiency was proved in 1941 by P. Hartman and A. Wintner, and finally in 1966 the necessity by V. Strassen. We state the result without proof. Note that, since "$\limsup_n S_n$" defines a tail event, the probability of that event is either 0 or 1, by the "0-1 law."
In a deep analysis of the case of bounded r.v.s, Feller (1943) has shown that for any increasing positive sequence $\{a_n,\ n \ge 1\}$ and $S_n = \sum_k X_k$, with the $X_k$ as (bounded) independent r.v.s, one has, with $s_n^2 = \operatorname{Var} S_n$, $E(X_k) = 0$,
In the i.i.d. case, (35) reduces to the sufficiency part of the previously stated result. (For the Bernoulli case, see Problem 35.)
Another point to note here is that in both Theorem 1 and Proposition 4 the $\{S_n,\ n \ge 1\}$-sequence satisfies the central limit theorem, in the sense that
Thus one can ask the question: Does every sequence of independent r.v.s $\{X_n,\ n \ge 1\}$ which obeys the central limit theorem also obey the LIL? What about the converse? In the i.i.d. case, Proposition 4 essentially gives an answer. The general case is still one of the current research interests in probability theory. The preceding work already shows how the combination of the ideas and methods of the strong and weak limit theory is essential for important investigations in the subject. In another direction, these results are being extended to dependent r.v. sequences. Some of these ideas will be presented in the next section, which will also motivate the topic of the following chapter.
$$\lim_{n\to\infty} P\big[g(n; a)(\hat a_n - a) < x\big] = \int_{-\infty}^{x} f(u)\, du, \qquad x \in \mathbb{R}, \tag{3}$$
exists, where (i) $f(u) = (1/\sqrt{2\pi}) \exp(-u^2/2)$ if $|a| < 1$ [so that the limit is the $N(0,1)$ d.f.], (ii) $f(u) = [\pi(1 + u^2)]^{-1}$ for $|a| > 1$ when the $\epsilon_n$ are also normal (so that the limit is the Cauchy d.f.), and (iii) for $|a| = 1$ [the $\epsilon_n$ being as in (ii)]
the density $f$ in case (iii) is expressed through functions $\rho^2(x,t)$, $r^2(x,t)$, and phases $\theta_i(x,t)$ built from $\sinh\sqrt{2x}$, $\sin\sqrt{2x}$, and $\cos\sqrt{2x}$; in particular,
$$r^2(x,t) = \sinh^2\sqrt{2x} + \cos^2\sqrt{2x} + 2xt\big(\sinh^2\sqrt{2x} + \sin^2\sqrt{2x}\big).$$
Furthermore, if $|a| > 1$ and the $\epsilon_n$ are also normal, or $|a| < 1$ and the $\epsilon_n$ satisfy only the conditions of (i) (so that they need not be normal), then we have
In fact, if $|a| > 1$, the limit d.f. of the $(\hat a_n - a)$ depends on the d.f. of the $\epsilon_n$--the "noise" in the model (1) (with $k = 1$)--and the invariance principle is inapplicable.
The proof of this result is long and must be considered separately in the three cases: $|a| < 1$, $|a| = 1$, and $|a| > 1$. This result exemplifies how real
since the $X_j$ are independent and $E(X_j^2) \le m^2\sigma^2$. Similarly (since the $Y_j$ are also independent) one gets the following estimate:
Hence
Thus we have reduced the problem to finding the limit d.f. of $S'_n/\sigma(S'_n)$ of independent summands which, however, are not identically distributed. For this we now verify Lindeberg's condition (cf. Theorem 5.3.6). If $F_j$ is the d.f. of $X_j$, then the following should tend to zero. Indeed,
We now turn to the proof of Theorem 1, in stages. First the case $|a| < 1$ is considered.
$$\stackrel{D}{=} n^{-1/2} \sum_{t=1}^{n} \epsilon_t X_{t-1} \cdots \quad \text{(cf. Problem 38)}. \tag{10}$$
[Actually, in this calculation we use the result of Problem 11 of Chapter 2, especially the Cramér-Slutsky calculus. The reader should review it now if not already done so.] We now assert that the d.f. of this r.v. tends to $N(0,1)$.
In fact, let
Thus $(A_n/\sqrt{n}) \stackrel{D}{=} (E_{m,n}/\sqrt{n})$. Consider $Y_j = \epsilon_j \sum_{i=0}^{m} a^i \epsilon_{j-i-1}$. Then $\sigma(E_{m,n})/n^{1/2}$ converges as $n \to \infty$, and $\{Y_j,\ j \ge 1\}$ is an $m$-dependent sequence satisfying the hypothesis of Proposition 2. Hence (10) and (12) imply that
For the cases $|a| \ge 1$, by hypothesis the $\epsilon_i$ are $N(0,1)$. The exact d.f. of $g(n; a)(\hat a_n - a)$ can be derived in principle. This is accomplished by calculating the moment-generating function (m.g.f.) of $(X_1, \ldots, X_n)$, and then we shall be able to find the limit m.g.f. of this quantity. By inverting the latter, the limit d.f. of $g(n; a)(\hat a_n - a)$ is obtained. Let us fill in the details.
Because the $\epsilon_i$ are independent $N(0,1)$ and $\epsilon_k = X_k - aX_{k-1}$, the transformation from the $\epsilon_i$ to the $X_i$ is one-to-one with Jacobian unity, and we find the density of the $X_k$ to be:
But if we let $p = 1 + a^2 - 2u + 2av$, $q = -(a + v)$, then, writing $D_n$ for the $D$ which depends on $n$, we get by expansion (of $D_n = A_n + 2uB_n + 2vC_n$)
with boundary values $D_1 = 1$ and $D_2 = p - q^2$. Let $\mu_1, \mu_2$ be the roots of the characteristic equation of the difference equation (17):
$$\mu^2 - p\mu + q = 0, \quad \text{so that} \quad \mu_1, \mu_2 = \frac{1}{2}p \pm \frac{1}{2}\big(p^2 - 4q\big)^{1/2}. \tag{18}$$
Substituting (19) in (16), we get the m.g.f. of the r.v.s for each $n$. As we shall see later, one can employ Cramér's theorem (Theorem 4.3.2) to obtain the exact distribution of $\hat a_n - a$ without inverting $m(\cdot, \cdot)$ itself. This is still involved. So we turn to the asymptotic result.
For the limiting case, first observe that
Case 1: $|a| < 1$:
Case 3: $|a| = 1$:
$$\mu_1 = 1 + \frac{\sqrt{2}\,|t_1|}{n} + \cdots + O(n^{-2}),$$
$$\lim_n m_n(t_1, t_2) = \begin{cases} \exp\big(t_2 + t_1^2/2\big) & \text{if } |a| < 1, \\ \big(1 - 2t_2 - t_1^2\big)^{-1/2} & \text{if } |a| > 1, \\ \Big(\cos\sqrt{2t_2} - \dfrac{2t_1}{\sqrt{2t_2}}\sin\sqrt{2t_2}\Big)^{-1/2} & \text{if } |a| = 1. \end{cases} \tag{21}$$
[The calculations from (17) leading to (21) are due to John S. White.] Replacing $t_1, t_2$ by $it_1$ and $it_2$, we see that the limit ch.f. $\phi$ is given by
$$\phi(t_1, t_2) = \lim_{n\to\infty} m_n(it_1, it_2). \tag{22}$$
Incidentally, this shows that (corresponding to $|a| > 1$),
where
$$X'B_nX, \qquad X'C_nX.$$
Hence, knowing the ch.f. of $(U, V)$ by (22), we need to find the d.f. of $U/V$, which gives the desired result. Here one invokes Theorem 4.3.2. With that result, we get the density $f$ of $F$, the d.f. of $U/V$, as follows:
This was seen in Proposition 3, even without the normality of the $\epsilon_i$. The above simple calculation verifies the result with this specialization, and is included only for illustration. It also shows how the invariance principle is at work.
using the notation of (23). Hence the asserted density is obtainable from Theorem 4.3.2 after getting the ch.f. $\psi$ of $(U, V)$. This is seen by using the inversion formula for $(U, V)$ and simplifying the following, where
Here, on substituting for $\phi$ from (21) [or (22)], one recognizes the ch.f. of the gamma density in the simplification. The easy computation is left to the reader. Then by a straightforward evaluation, one gets
which is (4').
In the case that $|a| < 1$ (cf. Problem 38), we have
$$\big[(1 - a^2)/n\big] \sum_{t=1}^{n} X_{t-1}^2 \to 1 \quad \text{in probability}.$$
However, if $|a| > 1$, then a slightly more involved calculation of a similar nature shows that
where $V > 0$ a.e. Hence $g(n; a)(\hat a_n - a) \xrightarrow{D} U/V$; and since $V$ is a positive r.v. (which is not a constant), its d.f. is determined by that of the $\epsilon_i$. Consequently, the limit d.f. of the estimators in this case depends on the initial d.f. of the i.i.d. "errors." This is the substance of the last comment. With this, the proof of Theorem 1 is finally finished.
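For the stable case, the conclusion of Theorem 1(i) is easy to probe by simulation. The sketch below assumes the least-squares estimator $\hat a_n = \sum_t X_t X_{t-1}/\sum_t X_{t-1}^2$ and the normalization $g(n; a) = \big(n/(1 - a^2)\big)^{1/2}$ for $|a| < 1$ (our reading of the statement); the normalized estimator should look like $N(0,1)$ regardless of the noise d.f.:

    import numpy as np

    rng = np.random.default_rng(0)

    def ar1_estimate(a, n):
        # Simulate X_t = a X_{t-1} + eps_t and return the least-squares slope.
        eps = rng.normal(size=n)
        x = np.empty(n)
        x[0] = eps[0]
        for t in range(1, n):
            x[t] = a * x[t - 1] + eps[t]
        return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

    a, n, reps = 0.5, 2_000, 1_000
    est = np.array([ar1_estimate(a, n) for _ in range(reps)])
    z = np.sqrt(n / (1 - a**2)) * (est - a)
    print("mean, variance of normalized estimator:", z.mean(), z.var())  # ~ (0, 1)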
Remarks 4 The result of Theorem 1 for $|a| < 1$ holds for all $\epsilon_i$ as stated there, without specification of their d.f. This is a case of an invariance principle for dependent r.v.s which we have not proved. Also, in the case that $a = 1$ one can apply Donsker's theorem itself, since the functional
and ($f' = df/dx$ is to exist) has for its set of discontinuities zero Wiener measure, and we may apply the corollary of Skorokhod's theorem. If the $\epsilon_i$ have a symmetric distribution, then the same reasoning holds for $a = -1$. However, if $|a| > 1$, then no invariance principle applies. Even in the best circumstances, the Lindeberg condition fails, and, as Theorem 4.6 (Prokhorov) implies, an invariance principle cannot be expected. The last part of Theorem 1 is an explicit recognition of this situation. An extension of Theorem 1 if $X_0$ is a constant ($\ne 0$) is possible and is not difficult.
We have not considered the case that $k > 1$ in (1) here. It is not a simple extension of the above work, but needs additional new ideas and work. A brief discussion was included in the earlier edition. We omit its consideration now.
Exercises
1. Let $X_1, X_2, \ldots$ be independent r.v.s such that $P[X_k = k^{3/2}] = P[X_k = -k^{3/2}] = 1/2k$ and $P[X_k = 0] = 1 - k^{-1}$. If $S_n = \sum_{k=1}^{n} X_k$, verify that the sufficiency condition $\rho(S_n)/\sigma(S_n) \to 0$ as $n \to \infty$ is not satisfied in Theorem 1.3. Show, however, that $S_n/\sigma(S_n) \xrightarrow{D} S$, which is not $N(0,1)$.
4. Just as in Problem 2, show that the Berry-Esséen result can be given the following form if only 2 + δ, 0 < δ ≤ 1, moments exist (and the corresponding ρ/σ → 0): If the X_n, n ≥ 1, have 2 + δ moments and the r.v.s are independent, let S_n = Σ_{k=1}^n X_k, ρ^{2+δ}(S_n) = Σ_k E(|X_k|^{2+δ}) and σ²(S_n) = Var S_n > 0. Then there exists an absolute constant C_0 [= C_0(δ)] such that, if G(·) is N(0, 1),

where
If S_n = Σ_{i=1}^n X_i, show that (log n/n) S_n →^D S as n → ∞, and find the d.f. of S. [Hint: Use ch.f.'s and the continuity theorem.]
7. If in the above example the common d.f. has the following density:

find the normalizing sequence a_n > 0 such that S_n/a_n →^D S as n → ∞. Does a_n = [n/(2 log n)]^{1/2} work? What is the d.f. of S? Discuss the situation when

× [1 − ρ² t_1 t_2 / ((1 − it_1)(1 − it_2))]^{−1/2}.
10. Let {X_{nk}, 1 ≤ k ≤ k_n, n ≥ 1} be rowwise independent r.v.s with S_n = Σ_{k=1}^{k_n} X_{nk}, E(X_{nk}) = 0. Suppose that E(S_n^{4+δ}) ≤ K_0 < ∞ for a 0 < δ < 1 (so that each X_{nk} has 4 + δ moments by Problem 2 in Chapter 2). Suppose also that Var X_{nk} → 0 uniformly in k as n → ∞ (essentially infinitesimal) and Var S_n → σ² < ∞. Show that S_n →^D S, which is N(0, σ²), iff E(S_n⁴) − 3[E(S_n²)]² → 0 as n → ∞. [Hints: Use Proposition 4.1.3 in one direction; for the converse, infinitesimality implies that when S_n →^D S for any subsequence, S must be infinitely divisible. Then the given moment condition implies that in the Kolmogorov pair (γ, K), γ = 0, and K must have a jump at x = 0 of size σ² and be constant elsewhere.]
11. Let {S_n, n ≥ 1} be partial sums as in the preceding problem and with 4 moments, but assume that each S_n is infinitely divisible. (This is the tradeoff for δ = 0 in the above.) Then S_n →^D S and S is N(0, σ²) iff E(S_n²) → σ² and E(S_n⁴) − 3[E(S_n²)]² → 0 as n → ∞. (Since S is necessarily infinitely divisible now, one can proceed as in the last problem again. The point of these two problems is that the conditions are only on the moments and not on the d.f.s themselves. These observations are due to P. A. Pierre (1971).)
13. (a) Let φ be an infinitely divisible ch.f. and (γ, G) be its Lévy-Khintchine pair. Then show that φ^{(2k)}, the (2k)th derivative, exists iff ∫_{−∞}^{∞} x^{2k} dG(x) < ∞. (This is a specialization of Proposition 4.2.6.) (b) Let ψ be the ch.f. of a d.f. Show that φ = exp(ψ − 1) is an infinitely divisible ch.f. Hence, using the remark after Theorem 4.2.4, conclude that φ_n → φ ≠ 0, φ_n infinitely divisible ch.f.s ⟹ φ is a ch.f. [But if φ is a ch.f., then it must necessarily be infinitely divisible, by Proposition 2.3.] (First define γ_n, G_n as in the proof of Theorem 2.5 and then obtain γ, G by a limit process.)
15. For the same d.f. as in the preceding problem, show that the corresponding Kolmogorov and Lévy pairs are (γ, K) and (γ, σ², M, N), where

where the product is extended over all prime numbers p_i > 1. If φ is defined as φ(t) = ζ(σ + it)/ζ(σ), show that φ(·) is an infinitely divisible ch.f. (Note that φ never vanishes, and Log φ can be expressed, with the Euler product, as a limit of suitable Poisson ch.f.s.)
|t| ≤ 1/ε, so that both n and the φ_k depend on ε. We then have the following assertion: The r.v. X (or its ch.f. φ) is infinitely divisible in the generalized sense iff it is infinitely divisible in the sense of Definition 2.1. [The method of proof uses several estimates on ch.f.s. That the ordinary sense implies the generalized sense is easy. For the reverse implication, note that φ never vanishes, and the proof proceeds by centering the r.v.s at their medians and writing

Log φ(t) = iγ_ε t + Σ_{j=1}^{n} [ψ_j(t) − 1] + o(1),

ψ_j(t) = E(exp{it(X_j − m_j)}) with m_j as a median of X_j, so that
Log φ(t) = iγ_ε t + ∫_R ( e^{itx} − 1 − itx/(1 + x²) ) ((1 + x²)/x²) dG_ε(x) + o(1)

for suitable γ_ε, F_ε, G_ε. Since ε > 0 is arbitrary, let ε ↘ 0, and show that the right side gives the Lévy-Khintchine formula, and hence the converse. The details need care, as in the necessity proof of Theorem 1.5. This result is due to Doob (1953). Because of this result, "in the generalized sense" is omitted.]
The factor exp{iγ't − ½ t'Γt} is called the Gaussian component of φ and the rest the generalized Poisson component. The corresponding Kolmogorov form, if X has two moments finite, is given by
where K({0}) = 0, and the measure K is finite on the Borel σ-algebra of R^p. Here the vector γ̃ is given by

Verify these two forms following closely the proof in the one-dimensional case. In the finite variance case, show that the mean and covariance matrices are
20. We now present an example, also due to P. Lévy, showing that the converse of the last statement of the preceding problem is not true. Further, this result has additional interesting information. Let X, Y be independent N(0, 1) r.v.s and ξ = X², η = 2XY and ζ = Y². Then show that any two of the three r.v.s (ξ, η, ζ) have a joint d.f. which is infinitely divisible, but that the joint d.f. of (ξ, η, ζ) is not. Deduce that if each linear combination Σ_{i=1}^{p} a_i ξ_i is infinitely divisible, (a_1, ..., a_p) ∈ R^p, then one cannot conclude that the vector (ξ_1, ..., ξ_p), p > 1, must be infinitely divisible. [Sketch: By the image law theorem, the ch.f. φ of (ξ, η, ζ) is

Since ξ, ζ are independent and each is infinitely divisible, being gamma (or chi-square) variables, so is the pair (ξ, ζ). It suffices to verify that (ξ, η) is also. From the Lévy canonical form for ξ, one gets

where

and g(v) = |v|^{−1} e^{v} if v < 0, = v^{−1}(e^{−v} − e^{−vA}) if v > 0. Comparing this with the Lévy representation of Problem 19, we get that (ξ, η) is infinitely divisible. From this one concludes that a_1ξ + a_2η + a_3ζ is also infinitely divisible for any (a_1, a_2, a_3) ∈ R³.
To see that (ξ, η, ζ) itself is not infinitely divisible, suppose the contrary. Then, by the representation of φ (cf. Problem 19), since ξ, η, ζ depend on the infinitely divisible r.v.s without Gaussian components, Log φ is a sum of terms of the form ∫_R (e^{it·x} − 1) ν(dx), each a Stieltjes integral. Looking at terms for t_1 = 0, t_2 = 0, or t_3 = 0, one concludes that this is the exponent of the generalized Poisson part, and thus it must be a sum of functions of a form that depends on (t_1, t_2), (t_2, t_3), and (t_1, t_3) alone. Hence, in particular, it must satisfy (ξ, η, ζ having all moments finite)

But the φ of our example does not satisfy this, so that it cannot be infinitely divisible. Thus there are more surprises than one ordinarily expects from these r.v.s. Incidentally, this implies that the distribution of the sample covariance matrix of a random sample from a multivariate normal r.v., called the Wishart distribution, is not infinitely divisible. Also compare this with the pathological example given by Problem 9.]
q(X_i, X_j) = ∂⁴ Log φ / (∂t_i² ∂t_j²) |_{t=0},   1 ≤ i, j ≤ p.
Show that q(X_i, X_j) ≥ 0. Verify that if all the X_i are bounded below by a constant (so that the ch.f. has no Gaussian component), then the X_i are independent whenever they are uncorrelated. Without the boundedness hypothesis (using the Kolmogorov representation) show that the components X_1, ..., X_p are independent iff X_1², ..., X_p² are uncorrelated. [Hints: Since q(X_i, X_j) ≥ 0, the uncorrelatedness of the X_i² implies that Σ_{i≠j} q(X_i, X_j) = 0. Thus the measure K(·) must concentrate on the axes x_i = 0, i = 1, ..., p, and this implies the uncorrelatedness of X_i and X_j, i ≠ j. The converse is clear. Several special properties of such r.v.s with four moments have been discussed by Pierre (1971).]
23. Complete the details of the remark after Proposition 3.13 regarding the case of the characteristic exponent α = 2 (namely, φ nondegenerate, α = 2 ⟹ φ is a normal ch.f.).
24. This problem and the next show that the bounds on α, β in Theorem 3.16 are necessary and complete the proof there (at least for the case 0 < α < 2, α ≠ 1). Let us write the ch.f. φ of Eq. (44) of Section 3 as follows:

(a) Verify that g(x; α, β) = g(−x; α, −β), and if 0 < α < 1 and β > tan(πα/2), then g(0; α, β) < 0. For this note that
25. (Continuation) The result for 1 < α < 2 is somewhat more involved. First verify that p(x; α, γ) = p(−x; α, −γ) as before, since ψ_{α,γ}(t) = ψ_{α,−γ}(−t).

(a) Let 0 < α, −1 < γ < min(2α − 1, 1); then for each x > 0

|β*| > tan(π/2α) iff |γ*| > α^{−1} > 2^{−1}. But if |γ*| > α^{−1}, then, by Problem 24a, g(0; α^{−1}, β*) < 0. Thus p(0; α^{−1}, γ*) < 0. Hence if β < −tan(πα/2), then γ < α − 2 ⟹ γ* < −1/α, and so p(x; α, γ) < 0 as x → ∞. By the continuity of p, this again shows that φ_{α,β} is not a ch.f. However, the set {β : φ_{α,β} is a ch.f.} is convex and symmetric around the origin. This shows that if |β| > |tan(πα/2)|, then φ_{α,β} is not a ch.f., and completes the necessity (α = 2 being immediate, since then β = 0). [This argument, which did not use the theory of infinitely divisible d.f.s, is due to Dharmadhikari and Sreehari (1976).]
then show that it is a metric on all such random variables (and a norm if α ≥ 1). [This is a standard Minkowski metric, cf., e.g., Rao (1987, or 2004), Theorem 4.5.4 and Proposition 4.5.6.] The result shows that each linear combination of the X_j (Σ_{j=1}^n a_j X_j) is a symmetric α-stable random variable (e.g., take a_i = b_i u in the above). Show that if one defines on the measure space (R, B, G) [B being the Borel σ-algebra] a probability measure R whose Fourier transform R̂ is given by

and, moreover, R_{A_1,...,A_n} = P ∘ T^{−1}_{A_1,...,A_n}, with T_{A_1,...,A_n} : Ω → R^n being the coordinate projection, guaranteed by Theorem 3.4.10. Finally, conclude that the family of finite dimensional distributions (or image measures) of {ρ(A), A ∈ B_0} is precisely {R_A, A ∈ B_0}, which is a symmetric stable class with the property that the values of ρ are pairwise independent on disjoint sets, hence also mutually independent in this case, and ρ(A) = Σ_{n=1}^∞ ρ(A_n), A_n ∈ B_0, with A = ∪_{n=1}^∞ A_n a disjoint union, the series converging in probability (whence a.e. by Theorem 4.6.1). [A brief application of such random measures will be given in the last chapter, and the resulting analysis plays an important role in certain parts of probability as well as potential theories. For many extensions and generalizations of these results to Banach space valued functions, the book by Linde (1986) is a good place to turn to. Contrast the conditions here with Exercise 20 above, in obtaining a multivariate extension for the subclass under consideration.]
26. This and the next problem present an extension of the stable distribution theory and help in understanding that subject further.

Call a continuous mapping ψ : R_+ → C stable if (i) ψ(0) = 1, (ii) |ψ(t)| ≤ 1, and (iii) for each n (= 1, 2, ...), there exist a_n > 0 and b_n ∈ R such that

[The proof of this result proceeds in all its details similarly to the argument given for Theorem 3.16. Starting from the identity nψ̃(t) = ψ̃(a_n t) + i b_n t, and hence for all rationals r > 0, one gets

p(t) = (ψ̃(t)/t) − ψ̃(1); then p(αβ) = p(α) + p(β), and the familiar functional equations result, as in the text. For another generalization of stability, see Ramachandran and C. R. Rao (1968).]
for 0 < p ≤ 1 and 1 < p ≤ 2, with d_1 = (log 2)^{−1}. [Since the matrix (φ(t_i − t_j), 1 ≤ i, j ≤ 3) is positive definite, letting u = φ(t), v = φ(t'), w = φ(t + t'), where t, t' > 0, we get from its determinant
28. Let L be the class of d.f.s F which are the d.f.s of r.v.s S such that for some sequence of independent r.v.s X_1, X_2, ..., with S_n = Σ_{k=1}^n X_k, one has (S_n − b_n)/a_n →^D S, where b_n ∈ R and 0 < a_n ↑ ∞, with a_{n+1}/a_n → 1. The problem of characterization of L was proposed by A. Khintchine in 1936, and P. Lévy gave a complete solution in 1937. Thus it is called the L (or Lévy) class. Show that F ∈ L iff the ch.f. φ of F has the property that for each 0 < α < 1, ψ_α is a ch.f., where ψ_α(t) = φ(t)/φ(αt), and hence φ is infinitely divisible. [Note first that φ cannot vanish. Indeed, if φ(2a) = 0 for some a > 0 such that φ(t) ≠ 0, 0 ≤ t < 2a, then ψ_α(2a) = 0 and

(see the inequality in the last problem for φ satisfying [even minimal] positive definiteness). But ψ_α(a) → 1 as α → 1, for each a ∈ R, by the continuity of
Find its ch.f. φ. Show that f_X is of class L but that φ is not a stable ch.f. [Hint: Observe that an exponential density is infinitely divisible, and using the result of Problem 14 with the Lévy-Khintchine or Kolmogorov canonical form, deduce that f_X is of class L. With the help of Theorem 3.16, conclude that φ cannot be in the stable family; thus the latter is a proper subset.]
(S_n − b_n)/a_n →^D S, whose d.f. is V. Thus if F has two moments, then [with a_n = (Var S_n)^{1/2}, b_n = E(S_n)] F belongs to the domain of attraction of a normal law. Show that only stable laws have nonempty domains of attraction and that each such law belongs to its own domain of attraction (cf. Proposition 3.15).
If in the above definition the full sequence {(S_n − b_n)/a_n, n ≥ 1} does not converge, but there is a subsequence n_1 < n_2 < ... such that (S_{n_k} − b_{n_k})/a_{n_k} →^D W, then the d.f. H of W is said to have a domain of partial attraction, with F as a member of this domain. Establish the following beautiful result due to Khintchine: Every infinitely divisible law has a nonempty domain of partial attraction. [Sketch (after Feller): If φ is an infinitely divisible ch.f., then φ = e^ξ by the Lévy-Khintchine formula, and in fact φ(t) = lim_{n→∞} Π_{k=1}^n e^{ξ_k(t)}, where the right side is a Poisson ch.f. (a "generalized Poisson") and ξ_n(0) = 0 = ψ_n(0), the ψ_n being continuous. Each ψ_n is bounded. As a preliminary, let φ_k = e^{ζ_k} be any sequence of infinitely divisible ch.f.s with each ζ_k bounded. Let λ(t) = Σ_{k≥1} n_k^{−1} ζ_k(a_k t). Choose a_k > 0, n_k < n_{k+1} < ... such that |n_r λ(t/a_r) − ζ_r(t)| → 0. Indeed, choose first a sequence of integers such that (n_k/n_{k−1}) ≥ 2^k sup_{t∈R} |ζ_k(t)|. Then

Now choose a_1 = 1, and after a_1, ..., a_{r−1}, let a_r be so large that for |t| < r [since ζ_k(τ) → 0 as τ → 0] |ζ_k(t)| < (2r² n_r)^{−1}. With this choice, the right side is < r^{−1}, so that the left side → 0 as r → ∞. Next, since the given φ is a limit of φ_n = Π_{k=1}^n e^{ξ_k(t)} = e^{ζ_n(t)} (say), so that the ζ_n are bounded and continuous, ζ_n(t) → 0 as t → 0. Define λ(·) with this ζ_n in place of ζ_k above. Then for a choice of n_k, we have e^{λ(·)} infinitely divisible, and

Since exp{n_r λ(t/a_r)} is a ch.f. tending to the ch.f. φ, it follows that e^{λ(·)} belongs to the domain of partial attraction. The original proof is more involved.]
Suppose the sequence satisfies the Lindeberg condition (as in Theorem 4.6). Show that, following the argument of Theorem 4.7,

(4/π) Σ_{k=0}^∞ (2k+1)^{−1} exp{ −(2k+1)² π² / (2(a+b)²) } sin( (2k+1)πa/(a+b) )   if a > 0, b > 0,

= 0, otherwise.
[Hints: First show that the limit exists as in Theorem 4.7. To calculate that limit, again consider the special r.v.s P[X_{nk} = +1] = P[X_{nk} = −1] = 1/2. Letting α_n = [a√n] + 1, β_n = [b√n] + 1, verify that

where

with

the binomial coefficient being zero if (n + k)/2 is not an integer. Verify that Σ_k A_{kn} tends to the desired limit as n → ∞ on using the central limit theorem. The A_{kn} are essentially planar random walk probabilities.]
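The limit can also be checked numerically; the following hedged sketch (parameters are illustrative choices of ours) compares a Monte Carlo estimate for the symmetric Bernoulli walk with the series expression displayed above:

```python
# Sketch: estimate P[-a sqrt(n) < S_k < b sqrt(n), 1 <= k <= n] and compare
# with (4/pi) sum (2k+1)^{-1} exp{-(2k+1)^2 pi^2/(2(a+b)^2)} sin((2k+1) pi a/(a+b)).
import numpy as np

rng = np.random.default_rng(1)
a, b, n, n_sims = 1.0, 1.5, 2000, 2000

steps = rng.choice([-1, 1], size=(n_sims, n))
S = steps.cumsum(axis=1)
mc = ((S > -a * np.sqrt(n)) & (S < b * np.sqrt(n))).all(axis=1).mean()

L = a + b
k = np.arange(200)
series = (4 / np.pi) * np.sum(
    np.exp(-(2 * k + 1) ** 2 * np.pi ** 2 / (2 * L ** 2))
    * np.sin((2 * k + 1) * np.pi * a / L) / (2 * k + 1))
print(mc, "vs", series)   # the two values agree to a few percent for large n
```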
Deduce from the above that

lim_{n→∞} P[ sup_{a ≤ F(x)} √n (F_n(x) − F(x)) / F(x) < y ]

equals the corresponding explicit limit expression for y > 0, and is 0 otherwise, where

and Y_k, 1 ≤ k ≤ n, are order statistics from Y_i, 1 ≤ i ≤ n. Thus it suffices to calculate the limit d.f. of

Use the last part of the preceding exercise for this. The details are similar to those of Theorem 4.8. See Rényi (1953) in connection with both of these results, i.e., this and the preceding one.]
>
33. Let {X,, n 1) be i.i.d. random variables with S, = Cr=lX k , So=
0. In the study of fluctuations of the random walk sequence {S,,n 0) in>
we~ had to analyze the behavior of R, = max{Sk : 0 k
Section 2 . 4 ~ < <
n).
In many of these random walk problems, the joint distributions of (R,, S,)
are of interest. Let &(u,v) = E(exp{iuR, +
ivS,)), the ch.f. of the vector
(R,,S,). If S2,S; are the positive aiid negative parts of S, [i.e., S$ =
+
(IS, Sn)/2, S; = Sk - S,], let $,(u, v) = E(exp{iuSi ivS,)). Show +
that one has the identity
The important identity (+) [or equivalently (*)], and hence (*+), was obtained by Spitzer (1956), and is sometimes referred to as Spitzer's identity. He established it by first proving a combinatorial lemma, and using it in the argument. Here we outline an alternative algebraic method for (*), which is simple, elegant, and short, due to J. G. Wendel, who obtained it after Spitzer's original work. Since φ_n(u, v) = ψ_n(u + v, −v), we get (*) ⟺ (+).

Proof Let G_n(x, y) = P[R_n < x, S_n < y], and note that R_{n+1} = max(R_n, S_n + X_{n+1}) and S_{n+1} = S_n + X_{n+1}, so that {(R_n, S_n), n ≥ 0} is a Markov process. Also, if A_n = [R_n < x, S_n < y], then by a simple property of conditioning (cf. Proposition 3.1.2), we have
Thus, going to the image spaces, this becomes, if F is the common d.f. of the X_n (using a manipulation as though the conditioning event has positive probability, cf. the Kac-Slepian paradox, Section 3.2; the following therefore needs a rigorous justification, which may be provided since all the conditional measures are regular in this application, and since Spitzer's original argument does not use the conditioning! See also Example 3.3.3(b)). We outline the argument as follows:

(iii)

Substituting this in (iv) and using it in (ii) and (iii), (*) follows. [Can this method be used to solve the integral equation of Exercise 21 in Chapter 3? In both cases the method of (i) is to be justified with the Kolmogorov definition of conditioning, without taking the manipulation for granted.]
34. Let {X_n, n ≥ 1} be an i.i.d. sequence as in the preceding problem, with S_n = Σ_{k=1}^n X_k, S_0 = 0, and S_n^+ its positive part. Show that

and
converges, where n_r = [exp{r/log r}], the integral part of the number shown. If φ²(n) = 2 log log n, then one obtains the LIL for the Bernoulli r.v.s. [Hints: Let A_r = [S_n − np > √(npq) φ(n_r) for some n, n_r ≤ n < n_{r+1}]. Verify that P(A_r) ≤ C exp{−½ φ²(n_r)} φ(n_r) for some constant 0 < C < ∞. By hypothesis, this is the r-th term of a convergent series, so that Σ_r P(A_r) < ∞. For the second part, which is more involved, note that n_r − n_{r−1} ~ n_r(1 − (log r)^{−1}), and that Σ_n [φ(n)/n] exp{−½ φ²(n)} < ∞ only if φ²(n) > 2 log log n, and it diverges if φ²(n) < 2 log log n. By various estimates deduce that for some positive constants C_1, C_2,

and hence there are positive constants C_3 and C_4 such that

C_3 ≤ ½ log(n_{r+1}/n_r) ≤ C_4,
and show that φ(m_r)/φ(n_r) → 1. This gives, after a careful estimation of terms,

and

These imply the last part, and then the probability statement obtains. This result, if p = q = ½ and φ²(n) = 2λ log log n, was first proved by P. Erdős, and the case of φ for more general r.v.s is due to Feller (1943), as noted in the text. Regarding this problem, see also Feller (1957).]
38. Let {X_t, t ≥ 0} be a sequence of r.v.s which satisfy the first-order (stochastic) difference equation, where {ε_t, t ≥ 0} is an i.i.d. sequence with E(ε_t) = 0, 0 < E(ε_t²) = τ² < ∞. To determine the convergence properties one needs to consider the cases that (i) |a| < 1, (ii) |a| = 1, and (iii) |a| > 1, where (for reference) the corresponding processes are termed stable, unstable, and explosive, respectively. Since
39. Let â_n be the least squares estimator of a of the first order model as given by Eq. (2) of Section 6, under the same conditions. Show that if |a| > 1 and g(n; a) = |a|^n/(a² − 1)^{1/2}, then g(n; a)(â_n − a) →^D V, and that the limit d.f., namely that of V, depends on the common distribution of the errors ε_n. [Thus V is Cauchy distributed if the ε_n are N(0, 1).]
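A hedged simulation of this phenomenon is easy to write (all parameters below are illustrative choices of ours, and the norming constant affects only the scale, not the Cauchy shape of the limit):

```python
# Sketch: least squares estimator a_hat in X_t = a X_{t-1} + eps_t with
# |a| > 1 and N(0,1) errors; g(n; a)(a_hat - a) should display Cauchy-type
# heavy tails, in line with Exercise 39.
import numpy as np

rng = np.random.default_rng(2)
a, n, n_sims = 1.05, 300, 4000
g = abs(a) ** n / np.sqrt(a ** 2 - 1)          # norming factor g(n; a)
stats = np.empty(n_sims)

for i in range(n_sims):
    eps = rng.standard_normal(n + 1)
    X = np.empty(n + 1)
    X[0] = eps[0]
    for t in range(1, n + 1):
        X[t] = a * X[t - 1] + eps[t]
    a_hat = np.sum(X[1:] * X[:-1]) / np.sum(X[:-1] ** 2)
    stats[i] = g * (a_hat - a)

q25, q75 = np.quantile(stats, [0.25, 0.75])
print("IQR ~", q75 - q25)                       # finite, but the tails are very heavy:
print("P(|stat| > 10) ~", (np.abs(stats) > 10).mean())
```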
40. Use the method of proof of Theorem 6.1 (i.e., employ Theorem 4.3.2), and complete the proof of the case a = 1 of that result. With a similar method find the limit d.f. of â_n if a = −1, and note that this is different from that of the case a = 1 [the norming factor g(n; a) being the same]. [The computations are not simple and need care.]
41. Proposition 6.2 admits the following extension. Let {U_n, n ≥ 1} be a sequence of m-dependent r.v.s with E(U_n) = 0, E(U_n⁴) ≤ M < ∞, n ≥ 1. If S_n = Σ_{k=1}^n U_k, then S_n/σ(S_n) →^D Z, where Z is N(0, 1) distributed, provided that (Σ_{k=1}^n Var U_k)/σ²(S_n) = O(1) and that the Lindeberg condition holds, i.e., for each ε > 0

where F_k is the d.f. of U_k. [This is a special case of a result due to S. Orey.]
This part uses the theory developed in the preceding two parts, and presents different ideas based upon them. Chapters 6 and 7 are short, but the former includes the relatively new concept of stopping times and classes of (central) limit theorems for dependent sequences. This leads to the introduction of ergodic sequences and Birkhoff's theorem, and the strict stationarity concept. The work motivates a glimpse of stochastic processes proper. We consider the key classes related to Brownian motion (its quick existence through the random Fourier transform) and the Poisson measures, leading to a brief analysis of general classes of additive processes. These are used to show how various classes of families arise in applications. They include strong, strict and weak stationarities as well as the corresponding strict, strong and weak harmonizabilities. The key role of Bochner's V-boundedness principle is discussed. Numerous problems of interest, relating to queueing, birth-death processes, generalized random measures, and several new facts are discussed as applications with extended sketches. As a result, this part, and particularly Chapter 8, received the largest amount of new material in comparison with the earlier presentation.
Chapter 6
Definition 1 Let {F_n, n ≥ 1} be an increasing sequence of σ-subalgebras of Σ in a probability space (Ω, Σ, P). Then a mapping T : Ω → N ∪ {∞} is called a stopping time (or an optional or a Markov time) of the class {F_n, n ≥ 1} if for each k ∈ N, [T = k] ∈ F_k, or, equivalently, [T ≤ k] ∈ F_k [or [T > k] ∈ F_k]. A sequence {T_n, n ≥ 1} of stopping times of the fixed class {F_n, n ≥ 1} is termed a stopping time process if T_n ≤ T_{n+1}, n ≥ 1. The family F(T) = {A ∈ F_∞ = σ(∪_n F_n) : A ∩ [T = k] ∈ F_k, all k ≥ 1} is known as the class of events prior to T.

In this definition, if P[T = +∞] = 0, then T is a finite stopping time, and if this probability is positive, then it is nonfinite (or extended real valued). In general a linear combination of stopping times is not a stopping time. Because of their use in our work, we detail some properties in the following:
and if {T_n, n ≥ 1} is a monotone sequence of stopping times of the same F_n-family, then lim_n T_n = T is a stopping time, and lim_n F(T_n) = F(T).
and

[min(T_1, T_2) ≤ n] = [T_1 ≤ n] ∪ [T_2 ≤ n] ∈ F_n,   n ≥ 1,

we deduce that min(T_1, T_2) and max(T_1, T_2) are stopping times of {F_n, n ≥ 1}. Note that the argument holds for sup_n T_n, inf_n T_n also for sequences. Next

so that T_1 + T_2 is a stopping time of {F_n, n ≥ 1}; and (i) and (iii) are simple. For (iv), let A ∈ F(T_1). Since T_1 ≤ T_2, A can be expressed as A = A ∩ [T_1 ≤ T_2]. Thus it suffices to show that the latter set is in F(T_2). But by definition, we need to verify that A ∩ [T_1 ≤ T_2] ∩ [T_2 ≤ n] ∈ F_n, n ≥ 1. Now for any stopping times T_1, T_2 of {F_n, n ≥ 1} we have

if x ≤ n, = Ω if x > n. Thus for all x ≥ 0, min(T_i, n) is F_n-measurable, so that the last set of (2) is in F_n. Since A ∈ F(T_1), the first set is in F_n. But T_2 is an {F_n, n ≥ 1}-stopping time. Thus the middle one is also in F_n, so that A ∈ F(T_2). This proves (iv) a little more generally than asserted.
Regarding (v), the argument for (2) above with A = Ω shows that [T_1 ≤ T_2] ∈ F(T_2), and hence its complement [T_1 > T_2] ∈ F(T_2). On the other hand, min(T_1, T_2) is a stopping time by (ii) and is measurable relative to F(T_2) by (iv). Hence [min(T_1, T_2) < T_1] ∈ F(T_1) by the first line of this paragraph, and [min(T_1, T_2) = T_2] ∈ F(T_2), so that [T_1 = T_2] and [T_1 < T_2] belong to F(T_2). By interchanging 1 and 2, it follows that these events are also in F(T_1), and hence are in their intersection.

Finally, for (vi), since lim sup_n T_n = inf_k sup_{n≥k} T_n, and by (ii) sup_{n≥k} T_n is a stopping time of {F_n, n ≥ 1}, it follows that lim sup and lim inf, and lim if it exists, of stopping times of {F_n, n ≥ 1} are again stopping times of the same family. Let T_n(ω) → T(ω), ω ∈ Ω. If T_n ↑ T, then by (iv) σ(∪_n F(T_n)) ⊂ F(T). To show there is equality, we verify that each generator of F(T) is in the left-side σ-algebra. Let A ∈ F_n, and consider A ∩ [T > n], which is a generator of F(T), since this is just A ∩ [T ≤ n]^c. Now A ∩ [T > n] = ∪_k (A ∩ [T_k > n]) ∈ σ(∪_k F(T_k)). Thus F(T) ⊂ σ(∪_{k≥1} F(T_k)). Next let T_k ↓ T. Then ∩_{k≥1} F(T_k) ⊃ F(T). For the reverse inclusion, let A ∈ F(T_k) for all k ≥ 1 (i.e., A is in the intersection). Then we have, for n ≥ 1,
A standard and natural manner in which stopping times enter our analyses may be illustrated as follows. Let X_1, X_2, ... be a sequence of r.v.s and A ⊂ R be an interval (or a Borel set). The first time the sequence X_n enters A is clearly

T_A = inf{n > 0 : X_n ∈ A},   (3)

where T_A = +∞ if the set {·} = ∅. If F_n = σ(X_1, ..., X_n), then we assert that T_A is a stopping time of {F_n, n ≥ 1} (or of the X_n-sequence). Since

and these sets belong to F_k, it follows that T_A is an r.v. and is an {F_n, n ≥ 1}-stopping time. It is called the debut of A. If {X_n, F_n, n ≥ 1} is an arbitrary adapted (i.e., X_n is F_n-measurable and F_n ⊂ F_{n+1}) sequence and T is an {F_n, n ≥ 1}-stopping time, we define X_T : Ω → R̄ to be that function which is given by the equation

Thus [X_T < x] ∈ F(T), x ∈ R (cf. Definition 1). Hence {X_{T_n}, F(T_n), n ≥ 1} is an adapted sequence. It should be noted that we are fully using the linear ordering of the range space of the T. If it is only partially ordered, then the arguments get more involved. Here we consider only the simple case of N. If the range of T is finite, then T is called a simple stopping time. [If the range is in R̄_+ and it is bounded (or finite), then T is a bounded (simple) stopping time.] Now we present an application which has interest in statistics.
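As a concrete illustration of the debut (3), here is a minimal sketch (names and parameters are ours): for a simulated path, the event [T_A = k] is decided by X_1, ..., X_k alone, which is exactly the stopping time property.

```python
# Sketch: debut T_A = inf{n > 0 : X_n in A} of a set A for one path,
# with T_A = +infinity (None here) if the path never enters A.
import numpy as np

rng = np.random.default_rng(3)

def debut(X, low, high):
    """First n >= 1 with X[n-1] in [low, high]; None if the path never enters."""
    hits = np.where((X >= low) & (X <= high))[0]
    return int(hits[0]) + 1 if hits.size else None

X = rng.standard_normal(1000).cumsum()     # a random-walk path X_1, X_2, ...
print(debut(X, 5.0, np.inf))               # first passage above level 5
```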
Lemma 2 Let {Y_n, F_n, n ≥ 1} be an adapted integrable sequence of r.v.s, and T be a bounded {F_n, n ≥ 1}-stopping time. Then, setting Y_0 = 0, we have

Hence, on noting that [T > k] ∈ F_k, one has, from (4), by adding for 0 ≤ n ≤ n_0,

which implies (3), since min(T, n_0) = T and Y_{min(T, n_0)} = Y_T by the boundedness of T.
Letting n → ∞, by the monotone convergence on the right and the dominated convergence on the left in (6), we get (1), since S_{T_n} → S_T a.e., the |S_{T_n}| are dominated, and S_T is integrable. In fact,

For (2), let us set Y_n = S_n², T_n = min(T, n), and F_n as given. Then Y_{n+1} = S_n² + X_{n+1}² + 2X_{n+1}(X_1 + ... + X_n), so that (3) becomes
[since E^{F_k}(X_{k+1}) = E(X_{k+1}) = 0, by independence and the vanishing mean hypothesis]

= Var X_1 · E(T_n)   (as before).   (7)

E(S_{T_n} − S_{T_m})² = Σ_{k=m}^{n−1} P[T > k] E(X_{k+1}²)   (by independence)

= Var X_1 · [E(T_n) − E(T_m)] → 0   as n, m → ∞.
then T_a is a stopping time of {σ(X_1, ..., X_n), n ≥ 1}, E(T_a^k) < ∞, k ≥ 1, and one has the renewal assertion lim_{a→∞} E(T_a)/a = μ^{−1}, where μ^{−1} = 0 if μ = +∞. Further, for each a > 0, we have the bounds

(1/2) E(T_a) ≤ a / E(min(X_1, a)) ≤ E(T_a).   (9)
Since the X_i are i.i.d., so are the S_j', j = 1, 2, .... Next let p = P[S_1' ≤ a]. Then the fact that P[X_1 > 0] > 0 implies that p > 0. Also p < 1 if m is chosen large enough. This is because (1/m) Σ_{i=1}^m X_i → E(X_1) > 0 a.e. by the SLLN, and hence Σ_{i=1}^m X_i → ∞ a.e. as m → ∞. Thus for any a > 0, P[S_1' > a] > 0 if m is large enough. We fix such an m, so that 0 < p < 1. Consider now, for jm ≤ n < (j + 1)m,

P[T_a > jm] ≤ P[S_1 ≤ a, S_2 ≤ a, ..., S_{jm−1} ≤ a]
≤ P[S_1' ≤ a, S_2' ≤ a, ..., S_{j−1}' ≤ a]
= p^{j−1}   (by the i.i.d. property of the S_j')
≤ p^{[n/m]−1} ≤ p^{(n/m)−2} = (p^{1/m})^n p^{−2}

(with [n/m] for the integral part of n/m).   (10)
Hence for any integers n_0 > 1,

Let 0 < t < −log p^{1/m}, where m and p are chosen for (10). Now we take n_0 large enough so that k log n < tn for n ≥ n_0. Then (11) becomes

E(T_a^k) ≤ Σ_{n=1}^{n_0} n^k P[T_a = n] + Σ_{n>n_0} e^{tn} p^{−2} (p^{1/m})^n   [by (10)].

Since k ≥ 1 is an arbitrary integer, (11') shows that T_a has all moments finite. It also implies that, for each a, T_a < ∞ a.e.
For the second part, since μ > 0, let 0 < α < μ. Set X_n' = X_n if X_n ≤ a_0, = 0 otherwise. Then {X_n', n ≥ 1} are i.i.d. bounded r.v.s, with
where F is the d.f. of X_1. Thus choose a_0 such that α < E(X_1') < μ. If S_n' = Σ_{k=1}^n X_k' and T_a' = inf{n ≥ 1 : S_n' > a}, then by the first part T_a' is a stopping time with E(T_a') < ∞. Hence by (1),

Since by definition S_n' ≤ S_n, we get T_a ≤ T_a', and then (12) implies

Hence

lim sup_{a→∞} E(T_a)/a ≤ 1/α.

Since T_a' ≥ T_a, (15) implies (9), and this completes the proof of the theorem.

The limit statement that lim_{a→∞} E(T_a/a) = μ^{−1} > 0 is called a renewal theorem. Many extensions and applications of the renewal theorem have appeared; for an account of these we refer the reader to Feller (1966).
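Both Wald's equation (1) and the renewal assertion lend themselves to a quick numerical check; the following is a sketch under assumed exponential steps (our choice, not the text's):

```python
# Sketch: for T_a = inf{n : S_n > a} with i.i.d. exponential steps of mean mu,
# check E(T_a)/a ~ 1/mu (renewal assertion) and E(S_{T_a}) ~ mu E(T_a) (Wald).
import numpy as np

rng = np.random.default_rng(4)
mu, a = 0.5, 50.0
T, ST = [], []
for _ in range(2000):
    s, n = 0.0, 0
    while s <= a:
        s += rng.exponential(mu)
        n += 1
    T.append(n)
    ST.append(s)

print("E(T_a)/a   ~", np.mean(T) / a, "  vs 1/mu =", 1 / mu)
print("E(S_{T_a}) ~", np.mean(ST), "  vs mu*E(T_a) =", mu * np.mean(T))
```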
It is clear that {(V·X)_n, F_n, n ≥ 1} is an adapted sequence. The (V·X)_n-sequence is called the predictable transform of the X-sequence, and the V-sequence itself is termed predictable, since the present V_n is already determined (i.e., measurable relative to F_{n−1}, the "past" σ-algebra). Thus the predictable sequence transforms the increments (X_k − X_{k−1}) into a new sequence {(V·X)_n, F_n, n ≥ 1}. If the X-process is a martingale and the V-sequence is bounded, then the transformed process (1) is also called a "martingale transform," and the increment sequence {X_k − X_{k−1}, k ≥ 1} is sometimes termed a "martingale difference sequence," for the discrete time index t.

If T is a stopping time of {F_n, n ≥ 1}, and V_n = χ_{[T ≥ n]}, then

is a bounded (in fact {0, 1}-valued) predictable sequence. On the other hand, if {V_{n+1}, F_n, n ≥ 1} is any {0, 1}-valued decreasing predictable sequence, it arises from a stopping time T of {F_n, n ≥ 1}. In fact, let T = inf{n ≥ 1 : V_{n+1} = 0}, where inf(∅) = +∞. Then [T = n] ∈ F_n, and [V_n = 1] = [T ≥ n], so that T is the desired stopping time.
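The correspondence between predictable {0, 1}-valued sequences and stopping times can be made concrete; the following sketch (our notation) verifies on a simulated path that the transform with V_n = χ_{[T ≥ n]} reproduces the stopped process:

```python
# Sketch: with V_n = 1_{[T >= n]} (predictable, since [T >= n] is decided by
# X_1, ..., X_{n-1}), the transform sum_{k<=n} V_k (X_k - X_{k-1}) equals
# X_{min(T,n)} - X_0 for every n.
import numpy as np

rng = np.random.default_rng(5)
X = np.concatenate([[0.0], rng.choice([-1.0, 1.0], size=30).cumsum()])
T = next((n for n in range(1, 31) if X[n] >= 3), 31)   # a first-passage time

VX = 0.0
for n in range(1, 31):
    V_n = 1.0 if T >= n else 0.0
    VX += V_n * (X[n] - X[n - 1])
    assert VX == X[min(T, n)] - X[0]
print("martingale transform = stopped process: checked")
```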
If {X_n, F_n, n ≥ 1} is an adapted process and T is a stopping time of {F_n, n ≥ 1}, then the adapted process {Y_n, F(T_n), n ≥ 1} is called the transformed X-process by T, where T_n = min(T, n) (a stopping time by Proposition 1.2ii) and Y_n = X_{T_n}. If V_n = χ_{[T ≥ n]}, then X·V is written X^T, called a stopped process. This is a special form of (1). The problem considered here is this. If {X_n, F_n}_{n≥1} is a martingale and T is a stopping time of {F_n, n ≥ 1}, when is the transformed process {X_{T_n}, F(T_n)}_{n≥1} also a martingale? Without further restrictions, {X_{T_n}, F(T_n)}_{n≥1} need not be a
martingale, as the following example shows. Let {X_n, n ≥ 1} be i.i.d. with mean zero. If S_n = Σ_{k=1}^n X_k, F_n = σ(X_1, ..., X_n), then {S_n, F_n, n ≥ 1} is a martingale. Consider

T = inf{n ≥ 1 : S_n > 0}.   (2)

Then T is an {F_n, n ≥ 1}-stopping time, and

since S_n > 0 on each set [T = n]. But T < ∞ a.e. because, by the queueing aspects of the random walk {S_n, n ≥ 1}, P[sup_n S_n > 0] = 1 (cf. Theorem 2.4.4). If the transformed process {S_1, S_T} by {1, T} is a martingale relative to {F_1, F(T)}, then we must have 0 = E(S_1) = E(S_T), and this contradicts (3). Incidentally, this shows that, in the Wald equation (1) of Section 2, E(T) = +∞ must be true, so that the expected time for the random walk to reach the positive part of the real line is infinite!
We start with the following optional stopping assertion:

Theorem 2 (Doob) Let {X_n, F_n, n ≥ 1} be a submartingale and {T_n, n ≥ 1} a stopping time process of {F_n, n ≥ 1}, where P[T_n < ∞] = 1. If Y_n = X_{T_n}, suppose that (i) E(|Y_n|) < ∞ and (ii) lim inf_{n→∞} E(X_n^+ χ_{[T_k > n]}) = 0, k ≥ 1. Then {Y_n, F(T_n), n ≥ 1} is a submartingale. In particular, if the X_n-process is a positive (super)martingale, then (i) holds, (ii) can be omitted, so that {Y_n, F(T_n), n ≥ 1} is a (super)martingale. The same conclusion obtains for the given X_n-process (not necessarily positive) if either T_n is bounded for each n, or X_n ≤ E^{F_n}(Z) for some Z ∈ L¹(P), n ≥ 1.
Proof It is sufficient to consider a pair of stopping times, say S and T, with S ≤ T, for which (i) and (ii) are satisfied. We need to show, since X_S is F(S)-adapted and F(S) ⊂ F(T) by Proposition 1.2, that E(|X_S|) < ∞, E(|X_T|) < ∞, and

E^{F(S)}(X_T) ≥ X_S a.e.   (7)

Indeed, let A ∈ F_n and consider, for the integrability of X_S,

(by iteration)

Since X_n ≤ X_n^+, we can use (ii) with T_k = S in (8), so that (letting ℓ → ∞)

using (i). Since E(·) is a Lebesgue integral, this implies E(|X_S|) < ∞. Similarly E(|X_T|) < ∞. Also, (7) follows from (8) and (9). In fact, if A_1 ∈ F(S), then A_1 ∩ [S = k] ∈ F_k, and S ≤ T implies [S = k] ⊂ [T ≥ k]. Hence letting A = A_1 ∩ [S = k] in (9) and replacing S by T there, we get

which is (7).
For the second part, if X_n is a positive martingale or supermartingale, (ii) is unnecessary [cf. (8)], and we verify (i). The positivity implies

with equality in the martingale case. Now for any stopping time T of {F_n, n ≥ 1}, min(T, k) is also one, and by Proposition 1, {X_1, X_{min(T,k)}} is a martingale or a supermartingale for {F_1, F(min(T, k))}, so that

Now letting k → ∞ on the left, using Fatou's lemma, we get E(|X_T|) < ∞. But
The last part of the above proof actually gives a little more than the assertion. We record this for reference. The bound follows from (11).

Corollary 3 Let {X_n, F_n, n ≥ 1} be a submartingale. If sup_n E(|X_n|) < ∞ and T is any stopping time of {F_n, n ≥ 1}, then E(|X_T|) ≤ 3 sup_n E(|X_n|). If the submartingale is uniformly integrable and {T_n, n ≥ 1} is a stopping time process of {F_n, n ≥ 1}, then {X_{T_n}, F(T_n), n ≥ 1} is a submartingale.
λ P[ min_{k≤n} X_k < −λ ] ≤ E(X_n^+) − E(X_1).

The second inequality is similarly proved, and the details are omitted.
Theorem 5 Let {X_n, F_n, n ≥ 1} be an L¹(P)-bounded martingale. Then X_n → X_∞ a.e. and E(|X_∞|) ≤ lim inf_n E(|X_n|).

Proof By Lemma 3.5.5, which is purely measure theoretic and does not involve any martingale convergence, X_n = X_n^{(1)} − X_n^{(2)}, and {X_n^{(i)}, F_n, n ≥ 1} is a positive martingale, i = 1, 2. Thus for this proof it may be assumed that X_n ≥ 0 itself. We give an indirect argument.

Suppose that a positive martingale does not converge a.e. Then there exist 0 < a < b < ∞ such that the following event

must have positive probability. Let us define a stopping time process {T_n, n ≥ 1} of {F_n, n ≥ 1} such that {X_{T_n}, F(T_n), n ≥ 1} is not a martingale, contradicting Theorem 2 and thereby proving the result.
Thus

The argument here is due to J. Horowitz. We can extend it, even without the optional stopping result and by weakening the integrability hypothesis, to include some non-L¹(P)-bounded (sub)martingales. In this case Lemma 3.5.5 is not applicable. The generalization is adapted from one due to Y. S. Chow, who obtained the result for directed index sets satisfying a "Vitali condition." The details of the latter are spelled out in the first author's book.

We thus end this chapter with these results using stopping times. A few complements are included in the exercises, as usual.
Exercises
1. Complete the proofs of the omitted parts of Proposition 1.2.
such that E^{F_n}(|S_{n+1}| χ_{[T ≥ n]}) ≤ K_0 < ∞ for all n and E(T) < ∞, then show that {Y_n = e^{tS_n}/(φ(t))^n, F_n, n ≥ 1} is a martingale and, if T_0 = 1, then {Y_{T_0}, Y_T} is a martingale for {F_1, F(T)} and the fundamental identity of sequential analysis obtains:

E( e^{tS_T} (φ(t))^{−T} ) = 1.

(Hint: Use the result of Problem 5 in showing the martingale property.) Deduce, from this result, the conclusions of Theorem 2.1 after justifying the differentiation under the integral sign.
Deduce that

and

(For (ii), note that A ∈ F(T_1) ∩ F(T_2) implies A ∩ [min(T_1, T_2) ≤ n] ∈ F_n, n ≥ 1. The analysis needs some thought. See, e.g., the first author's book (1979), p. 351, eq. (6).)
Chapter 7
This chapter is devoted to a brief account of some limit laws, including the central limit theorem and SLLN, for certain classes of dependent random variables. These cover martingale increments and stationary sequences. A limit theorem and a general problem for a random number of certain dependent random variables are also considered in some detail. Moreover, Birkhoff's ergodic theorem, its comparison with the SLLN, and a motivation for strict stationarity are discussed.
Y_n(t) = Π_{k=1}^{k_n} (1 + it X_{nk}),   t ∈ R.
Suppose that the following conditions hold:

(iv) the X_{nk} are strongly infinitesimal in that max_{1≤k≤k_n} |X_{nk}| →^P 0 as n → ∞.

e^{ix} = (1 + ix) exp{r(x) − x²/2}.   (1)
This is elementary, but it needs a little care. Thus taking logarithms and expanding (1),

Π_{k=1}^{k_n} exp{it X_{nk}} = Y_n(t) Z_n(t)   (say).

Taking expectations and letting n → ∞, the left side gives a sequence of ch.f.s that tends to the desired normal N(0, 1) ch.f. [with the first term because of (i) of the hypothesis] if we show that the second term on the right side of (4) goes to 0 in L¹(P), using conditions (ii)-(iv). This is verified as follows. By

we get a uniformly integrable sequence which →^P 0. Thus it also goes to zero in L¹(P) by the Vitali convergence (cf. Theorem 1.4.4), completing the proof.
This result will be used to obtain a central limit theorem for martingale increments (double) arrays. Recall that an adapted integrable process {X_n, F_n, n ≥ 1} qualifies as a martingale increments sequence iff for each n ≥ 1, E^{F_n}(X_{n+1}) = 0 a.e. (cf. Proposition 3.5.2). Similarly, if {X_{nk}, F_{nk}, 1 ≤ k ≤ k_n, n ≥ 1} is a double array of martingale increments sequences, and if for each n, F_{nk} ⊂ F_{nk'} for k ≤ k', then E^{F_{nk}}(X_{n(k+1)}) = 0 a.e. For such a family the following result holds:
Theorem 2 Let the double array {X_{nk}, F_{nk}, 1 ≤ k ≤ k_n, n ≥ 1} of martingale increments satisfy the three conditions:

(ii) lim_n P[ max_{k≤k_n} |X_{nk}| > ε ] = 0 for each ε > 0,

(iii) Σ_{k=1}^{k_n} X_{nk}² →^P 1 as n → ∞.

Then S_n = Σ_{k=1}^{k_n} X_{nk} →^D S, an r.v. which is N(0, 1).
Remark After the proof we shall indicate how (i) and (ii) are consequences of the classical Lindeberg condition. Also other forms of (iii), and some specializations of this result, will be recorded.
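Before the proof, a hedged numerical sketch may help fix ideas (the particular array below is our construction, chosen so that the conditional variances are genuinely random while Σ_k X_{nk}² → 1):

```python
# Sketch: X_{nk} = V_k sqrt(w_k^2) with V_k = +-1 fair coins and predictable
# weights w_k^2 = (1 + 0.5 V_{k-1})/n; this is a martingale-increment array
# (not independent) satisfying (ii) and (iii), and S_n is approximately N(0,1).
import numpy as np

rng = np.random.default_rng(6)
n, n_sims = 1000, 5000
V = rng.choice([-1.0, 1.0], size=(n_sims, n))
w2 = np.empty_like(V)
w2[:, 0] = 1.0 / n
w2[:, 1:] = (1.0 + 0.5 * V[:, :-1]) / n    # decided by the past: predictable
S = (V * np.sqrt(w2)).sum(axis=1)
print("mean ~", S.mean(), " var ~", S.var())
print("P(S <= 1.96) ~", (S <= 1.96).mean(), " vs 0.975 for N(0,1)")
```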
In particular, the last two inequalities of (5) show that X'_{nk} →^P X_{nk} as n → ∞. This implies that {X'_{nk}, 1 ≤ k ≤ k_n, n ≥ 1} also satisfies conditions (i)-(iii) of the theorem. We now assert that this transformed sequence satisfies the hypothesis of Proposition 1.
But by (i) the expectation on the right side of (6) is bounded by C, and hence {Y_n(t), n ≥ 1} is uniformly integrable, since it is a bounded set of L²(P). Conditions (iii) and (ii) of this theorem are the same as conditions (iii) and (iv) of Proposition 1. Thus we have verified that all four conditions are satisfied by the X'_{nk}-sequence, and hence S_n' →^D an r.v. which is N(0, 1) distributed, so that S_n has the same limit distribution. This completes the proof of the theorem.
Before discussing the relation of the above hypothesis (iii) to the Lindeberg condition, it is useful to show that the assumptions are nearly optimal for normal convergence. More precisely, the following supplement to the above result holds.
Taking the maximum of both sides and noting, by hypothesis (ii), that Z_n = max_{k≤k_n} |Y_{nk}| →^P 0.

To see that the last part of (8) is true, note that {E^{F_{nk}}(Z_n), F_{nk}, 1 ≤ k ≤ k_n} is a positive submartingale and Z_n ∈ L²(P). Hence by Theorem 3.5.6ii, with p = 3/2, q = p/(p − 1) (= 3), we have, if U_{n(k−1)} = E^{F_{n(k−1)}}(Z_n),

as n → ∞, since Z_n →^P 0 and {Z_n, n ≥ 1}, being L²(P)-bounded by (i) and (ii), is uniformly integrable, so that the first factor → 0 and the second one is bounded. Hence

by (9), and this shows that (8) is true. Thus condition (ii) of Theorem 2 holds for the Y_{nk}, even in a stronger form. [Namely, the sequence → 0 in L¹(P).]
For (iii) of Theorem 2, note that by (iii) and (v) of the hypothesis,

E( max_{k≤k_n} |Y_{nk}|² ) ≤ 2 E( max_{k≤k_n} (X_{nk})² ) + 2 E( max_{k≤k_n} [E^{F_{n(k−1)}}(Z_n)]² ),

since (a_1 + a_2)² ≤ 2(a_1² + a_2²) for a_i ∈ R. But the first term is bounded by the present hypothesis (i), and the second term is majorized by E(max_{k≤k_n} E^{F_{n(k−1)}}(Z_n))², where Z_n is given in (8). The inequalities (9) now establish the desired bound, since, using the notation there,

which is bounded by (i) of the present hypothesis. Thus {Y_{nk}, F_{nk}, 1 ≤ k ≤ k_n, n ≥ 1} is a martingale increments sequence satisfying the hypothesis of Theorem 2. Hence, as noted at the beginning, the result follows.
Recall that the usual Lindeberg condition states, for {X_{nk}, 1 ≤ k ≤ k_n}_{n≥1}, that for all ε > 0, as n → ∞

It can be shown by examples that (12) is strictly weaker than (11). On the other hand, condition (ii) of Theorem 2 above can be written as
Remark If the set of r.v.s {N_n, n ≥ 1} is independent of the X_n-sequence, then such a result was known before in special studies in sequential analysis; without such an independence assumption but with Y = constant, it was treated by F. J. Anscombe in the early 1950s. The result in the present form is due to Rényi (1960). Note that {S_{N_n}, n ≥ 1} is no longer a sequence of sums of independent r.v.s.
where P(A_n | A_m) is the conditional probability of A_n given the event A_m; then {A_n, n ≥ 1} is mixing with density α.

Then as m → ∞ we get

E(g_n g_k) = (1 − α)² P(A_n ∩ A_k) + α² P(A_n^c ∩ A_k^c) − α(1 − α)[P(A_n ∩ A_k^c) + P(A_k ∩ A_n^c)]
= P(A_n ∩ A_k) + α² − α(P(A_n) + P(A_k))
→ α P(A_k) + α² − α(α + P(A_k)) = 0   (3)
as n → ∞ for each k ≥ 1, by (2). Let M_0 = sp{g_n, n ≥ 1} be the linear span and M_1 = M̄_0 ⊂ L²(P) be its closure. Thus (3) implies lim_n E(g_n h_0) = 0 for each h_0 ∈ M_0. If h ∈ M_1 and ε > 0, then there exists a g_ε ∈ M_0 such that ‖h − g_ε‖_2 < ε, where ‖f‖_2² = E(|f|²). Hence

by the CBS inequality and the fact that ‖g_n‖_2 ≤ 1. It follows that, with (3), lim sup_n |E(h g_n)| ≤ ε, so that lim_{n→∞} E(h g_n) = 0 for all h ∈ M_1. If M_2 = M_1^⊥ ⊂ L²(P), the orthogonal complement, then E(f h) = 0 for all f ∈ M_2, h ∈ M_1, and since each u ∈ L²(P) (= M_1 ⊕ M_2) can be uniquely expressed as u = u_1 + u_2, u_i ∈ M_i, i = 1, 2, it follows that lim_n E(u g_n) = 0 for all u ∈ L²(P). In particular, if u = χ_B, B ∈ Σ, then
exists at all continuity points x of the d.f. F, where 0 < a_n ↑ ∞, b_n ∈ R, and S_n = Σ_{k=1}^n X_k. Then S_n* = (S_n − b_n)/a_n is a mixing sequence with "density"

It follows that, since S_k/a_n →^P 0 for each k > n_0,
We note that N_n/M_n = (N_n/n)(n/M_n) →^P Y/Y = 1. Since S*_{M_n} →^D Z, an N(0, 1) distributed r.v., it is bounded in probability. Thus the last term of (7) →^P 0. The first term on the right converges to the desired limit r.v. Thus the theorem follows (from Slutsky's result again) if the middle term on the right side of (7) →^P 0. Since M_n/N_n →^P 1, it suffices to show that (S_{N_n} − S_{M_n}) M_n^{−1/2} →^P 0. This is inferred from the Kolmogorov inequality (cf. Theorem 2.2.5) as follows.

Let ε > 0, δ > 0 be given. Let A_k = [Y = t_k] as before, and set B_n = [|N_n − [nY]| < nη], η > 0. If C_{nk} = [|S_{N_n} − S_{M_n}| > ε√M_n], then
Now choose k_0 large enough so that P(D_{k_0}) < δ/3, and then choose η_0 > 0 small enough so that 0 < η < η_0 implies

Remark 4 It is clear from the above proof that the independent r.v.s should satisfy an appropriate central limit theorem (e.g., Liapounov's or the Lindeberg-Feller's), and that the i.i.d. assumption is not crucial. With a little care in the error estimates, the result can be shown to be valid if the limit r.v. Y is merely positive and not necessarily discrete, since (as noted in Problem 3 in Chapter 4) it can be approximated by a discrete r.v. to any degree of accuracy desired.
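The content of the theorem is easily visualized; the sketch below (parameters ours) uses the independent special case noted in the Remark above, with a discrete positive Y:

```python
# Sketch: random index N_n ~ nY with Y in {1/2, 1, 2}; S_{N_n}/sqrt(N_n)
# remains approximately N(0,1) although the number of summands is random.
import numpy as np

rng = np.random.default_rng(7)
n, n_sims = 20000, 4000
vals = np.empty(n_sims)
for i in range(n_sims):
    Y = rng.choice([0.5, 1.0, 2.0])
    N = int(n * Y)
    X = rng.choice([-1.0, 1.0], size=N)
    vals[i] = X.sum() / np.sqrt(N)
print("var ~", vals.var(), "  P(<=1.96) ~", (vals <= 1.96).mean())
```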
We now sketch another extension of the above ideas when the sequence {X_n, n ≥ 1} is replaced by a certain dependent double array of considerable practical as well as theoretical interest. This will also indicate how other extensions are possible.
where G is a k-variate Gaussian d.f. with mean vector zero and covariance matrix DΣD of the following description: D is a k × k diagonal matrix (d_ii), d_ii = q_i/F'(Q_i), and Σ = (σ_ij), with σ_ij = σ_i = (1 − q_i)/q_i, i ≤ j ≤ k, i = 1, ..., k, where k ≥ 1 is fixed (and F'(x) = dF/dx, which exists a.e.). (A k-variate Gaussian d.f. is given in Definition 5 above.)
dom. In any event Q_i is the "ith quantile" of F and Y_{r_i} is an estimator of Q_i. Then (11) gives the limit d.f. of the vector of estimators of {Q_i, 1 ≤ i ≤ k} and has independent interest in applications.
... ≤ i_k ≤ n and lim_{n→∞}(i_j/n) = q_j, 0 < q_1 < ... < q_k < 1 (p_i = 1 − q_i); then

lim_{n→∞} P[ √n (Z_{i_j} + log p_j) < x_j, 1 ≤ j ≤ k ] = G(x_1, ..., x_k),   (12)

where G is a k-variate normal d.f. with mean zero and covariance matrix Σ̃, which is of the same form as Σ of Theorem 6, with σ_i^{−1} for σ_i there.
and similarly,

For any integers h, H, 0 < h < H, and a such that s + a ≠ 0, one has

where 0 < θ < 1, and θ/h → 0 if (H − h)(h + a)^{−1} → 0 as h → ∞. Substituting this,

where |ε_n| = O((n − i_j)^{−1}); one has E(Z_{i_j}) → −log p_j and √n(a_j + log p_j) → 0 as n → ∞. Hence √n(Z_{i_j} − a_j) = √n(Z_{i_j} + log p_j) + o(1), and by the first paragraph this tends to the standard normal r.v. as n → ∞. The proof is finished.
Hence

(by iteration)

as in (16). Here we used the inequality 1 + x ≤ e^x, x ≥ 0. Hence {Y_n(t), n ≥ 1} is bounded in L²(P), which implies its uniform integrability. We now verify that Y_n(t) converges to a constant weakly in L¹(P), and this gives the desired distributional convergence, as in Proposition 1.1.
Thus let k' be fixed, 1 ≤ k' ≤ i_j − 1, and A ∈ F_{nk'}. Then for n ≥ k' we have, as in (25),

E[ χ_A Π_{ℓ=0}^{k'} (1 + it X_{nℓ}) Π_{ℓ=k'+1}^{i_j−1} (1 + it E(X_{nℓ})) ],

the factors being integrable. This shows that the last integral in (30) → 0, and by (29) the right side tends to e^{−t²/2} P(A). Thus we have, for each A ∈ C_0 with P(A) > 0,

Since S_n = √n(Z_{i_j} − E(Z_{i_j})), the proposition is established.
−Σ_{i=1}^k t_i log γ_i, with M given by the right side of (16). If X takes values d_1, d_2, ... and δ_j = P[X = d_j] (δ_j > 0), then

(32)

where n_i = [nδ_i] and the Z̃_i are the corresponding normalized sums. Letting n → ∞ and using Proposition 8 and the bounded convergence theorem in (32), one gets

for j = 1, ..., k.
For this observe that r_j/r_k →^P q_j/q_k > 0 as n → ∞, and hence r_j and r_k grow at the same rate. Let d_1, d_2, ... be the values taken by X, and, fixing j, set n_k = [nq_j d_k], the integral part. The desired conclusion depends on Kolmogorov's inequality (or equivalently here, on the submartingale inequality). Given ε > 0, δ > 0, consider, with r_j, s_j as before, B_n^ε = [|r_j − n_k| < nδ], and (for n_k − δn < ℓ < n_k + δn) one has
P[A_k ∩ B_n^ε ∩ C_{nk}]
≤ P[ max_{|ℓ − n_k| < δn} √n |Z_ℓ − Z_{n_k}| > ε ]

(where U_t = Σ_ℓ U_{nℓ}, the summands being i.i.d. gamma r.v.s with unit mean), so that

LHS(34) ≤ 2K_0 δ/ε² + Σ_{k=1}^{m_0−1} … /(n_k − d_k + 1) + P(B^c).
Since m_0 is fixed, and n_k/(n_k − d_k + 1) → 1, for each k, as n → ∞, we can choose δ > 0 small enough so that the first term on the right of (36) is at most ε/3. Then choose n large enough so that P(B^c) < ε/3. Thus as n → ∞, (36) gives

lim_{n→∞} P[ √n |Z_{r_j} − Z_{s_j}| > ε ] ≤ ε.

We now extend the above proposition from the Z_i to the original r.v. sequence, namely the X_n's:
lim_{n→∞} P[ h_n(X_1 − θ_1) < x_1, ..., h_n(X_k − θ_k) < x_k ] = F(x_1, ..., x_k),   (37)

where D is the (Jacobian) matrix given in the statement and the Euclidean norm of the (column) vector |η(x − θ)| = o(|x − θ|) by elementary analysis. Hence

h_n( f(X_n) − f(θ) ) = h_n(X_n − θ) D' + h_n η(X_n − θ).   (39)

Since by (37) h_n(X_n − θ) is bounded in probability and h_n ↑ ∞ a.e., it follows from the stochastic calculus that X_n − θ →^P 0, and then h_n η(X_n − θ) →^P 0. From this and (39) we deduce that the stated limit d.f. of (38) must be true.
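The mapping step (38)-(39) is the familiar "delta method"; a minimal check (our example, f(x) = x², X_n a sample mean) runs as follows:

```python
# Sketch: if sqrt(n)(X_n - theta) => N(0, sigma^2), then
# sqrt(n)(f(X_n) - f(theta)) => N(0, f'(theta)^2 sigma^2); here f(x) = x^2,
# so the limiting variance should be (2 theta)^2 sigma^2.
import numpy as np

rng = np.random.default_rng(8)
n, n_sims, theta, sigma = 1000, 5000, 2.0, 1.0
X_bar = rng.normal(theta, sigma, size=(n_sims, n)).mean(axis=1)
lhs = np.sqrt(n) * (X_bar ** 2 - theta ** 2)
print("var ~", lhs.var(), " vs (2*theta)^2 sigma^2 =", (2 * theta) ** 2 * sigma ** 2)
```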
Proof We first note that the result holds for at least one n. Indeed, let

B = {ω ∈ A : T^n(ω) ∉ A for all n ≥ 1} = ∩_{n=1}^∞ {ω ∈ A : T^n(ω) ∉ A} = A ∩ ∩_{n=1}^∞ T^{−n}(A^c),   A^c = Ω − A.   (1)
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(T^k(ω))
for various transformations T. This forms the ergodic theory. Note that the X_k are not independent. Here we shall prove the pointwise convergence of (4), which is a fundamental result originally obtained by G. D. Birkhoff in 1931. In many respects the proof proceeds along the lines of the martingale convergence theorem, but cannot be deduced from the latter. As in the martingale case, a certain maximal inequality is needed. The proof of the latter has been re-worked and simplified since its first appearance, and the following short version is essentially due to A. M. Garsia, who obtained it in the middle 1960s. If T is measure preserving and Q : f ↦ f ∘ T, f ∈ L¹(P), it defines a positive linear mapping with ‖Qf‖_1 ≤ ‖f‖_1 and Q1 = 1. We can actually prove the result, more generally, for such "contractive" operators Q on L¹(P). More precisely, one has a basic inequality as:
Proof Let f_k = Σ_{m=0}^{k} Q^m f and f_n* = sup_{0≤k≤n} f_k. Then f_n* ↑ and [f_n* > 0] ↑ A_f. Since Q is positive and f = f_0, we have f_k ≤ f + Q f_n*^+ and f_{k+1} = f + Σ_{m=0}^{k} Q^{m+1} f = f + Q f_k ≤ f + Q f_n*^+, 0 ≤ k ≤ n. Hence f*_{n+1} ≤ f + Q f_n*^+. Consequently
Note that the finiteness of P is not used here. With this we can establish

Since clearly lim inf_n S_n(f)(Tω) = lim inf_n S_n(f)(ω), and similarly for lim sup, we conclude that B_{ab} = T^{−1}(B_{ab}), so that B_{ab} is invariant. If μ(B_{ab}) < ∞, then the argument of Proposition 1 yields μ(B_{ab}) = 0. Let us show that μ(B_{ab}) < ∞.

If A ∈ Σ, A ⊂ B_{ab} and μ(A) < ∞, consider g = f − b χ_A ∈ L¹(μ). Also, let B be the set on which S_n(g) ≥ 0 for at least one n. Then by Proposition 2,

Consider h' = a − f, so that S_n(h') > 0 on B_{ab}; then Proposition 2 again implies

0 ≤ ∫_B h' dμ = ∫_B (a − f) dμ,   (12)

so that μ(B_{ab}) = 0. Letting a < b run through the rationals, one deduces that the set N = [lim inf_n S_n(f) < lim sup_n S_n(f)] is μ-null, whence lim_{n→∞} S_n(f) = f* exists a.e. (μ), proving the SLLN result in (7). We take μ = P, a probability measure, for the last part.
It is clear that F is a σ-algebra and f* is F-measurable. Since f ∈ L¹(P), S_n(f) ∈ L¹(P), and by Fatou's inequality

= ∫_A f dP   [since T^{−1}(A) = A a.e.]

because P(T^{−1}(F)) = P(F) < δ, so that |∫_F f dP| < ε. Thus the uniform integrability holds. This also implies, by the same Vitali theorem, that ‖S_n(f) − f*‖_1 → 0, completing the proof.
(1/n) Σ_{k=0}^{n−1} X_k → E(X) (= E^I(X), I = {∅, Ω}), a.e. and in L¹(P).   (16)
Starting with the strictly stationary (and perhaps ergodic) processes, one can extend the central limit theory and other results to this dependent class. We leave the matter here; a few complements are indicated as exercises.
Exercises
1. The result of Proposition 1.1 can be stated in the following slightly more general form. Assume the conditions (ii) and (iv) there and let (i) and (iii) be replaced by (i') E(Y_n(t) χ_A) → P(A), A ∈ Σ, and (iii') Σ_{k=1}^{k_n} X_{nk}² →^P Z ≥ 0 a.e. Then, with simple changes in the proof (see also Proposition 2.7), show that S_n →^D V, where the ch.f. of V is given by φ_V(t) = E(e^{−t²Z/2}).
2. Let (Ω, Σ, P) be the Lebesgue unit interval (Ω = [0, 1], P = Lebesgue measure). Consider the r.v.s V_n defined by V_n(ω) = sgn(sin(2^{n+1}πω)), ω ∈ Ω, n ≥ 0. These are independent with means 0 and variances 1, known as the Rademacher functions. If X_{nk} = (V_k/√n) + 2^{n/2} χ_{A_n}, A_n = [0, 2^{−2n}], 0 ≤ k ≤ n, F_{nk} = σ(X_{nj}, j ≤ k), then verify that the sequence satisfies condition (ii) of Theorem 1.2 but not the Lindeberg condition [not even its weaker form: Σ_{k} X_{nk}² χ_{[|X_{nk}|>ε]} →^D 0 as n → ∞]. Both these observations are due to McLeish (1974).
[The argument is similar to (but simpler than) that of Theorem 2.6. Both these results on order statistics were sketched by the first author (1962). Also Rényi's (1953) results on order statistics are of interest.]
7. Let L^p(P) be the Lebesgue space on (Ω, Σ, P), p ≥ 1, and let T_n : L¹(P) → L¹(P) be a positive linear mapping such that ‖T_n f‖_1 ≤ ‖f‖_1 and ‖T_n f‖_∞ ≤ ‖f‖_∞. Then it can be verified that the adjoint operator T_n* of T_n is also positive and satisfies the same norm conditions on L¹(P) and L^∞(P). Let T_{1n} = T_n T_{n−1} ⋯ T_1 and V_n = T*_{1n} T_{1n}. Then V_n is well defined on L^∞(P), satisfies the same norm conditions, and is positive. If f ∈ L^p(P), 1 < p < ∞, and g_n^f = V_n f, show that {g_n^f, F_n, n ≥ 1} is an adapted integrable sequence. Let 𝒯 be the directed set of all bounded or simple stopping times of {F_n, n ≥ 1}. The sequence {g_n^f, F_n, n ≥ 1} is called an asymptotic martingale (or amart) if the net {E(g_τ^f), τ ∈ 𝒯} of real numbers converges. It is asserted that g_n^f → g^f a.e. and in L^p(P), f ∈ L^p(P), 1 < p < ∞. This is equivalent to showing that {g_n^f, F_n, n ≥ 1} is an amart. [Proofs of these results are nontrivial, cf. G. A. Edgar and L. Sucheston (1976). We hope that a simple proof can be found!]
8. Show that the conclusion of Theorem 2.1 is valid if Y there is not necessarily discrete but just strictly positive.
Let (Y, 𝒴, ν) = ⊗_{k=−∞}^{∞}(S, 𝒮, μ)_k, the doubly infinite product measure space in the sense of Jessen (cf. Theorem 3.4.3); let φ_s : Ω → Ω, s ∈ S, be a family of one-to-one measurable and P-measure-preserving transformations. Note that the mapping (ω, s) ↦ (φ_s(ω), s) is P × μ-measurable. If X : Ω → R is an integrable r.v. and for each y ∈ Y, s_k(y) is as usual the kth coordinate of y [s_k(y) ∈ S, all k], then the mapping (ω, y) ↦ (φ_{s_1(y)}(ω), ψ(y)) is P × ν-measurable and P × ν-measure preserving, where ψ : Y → Y, defined by s_n(ψ(y)) = s_{n+1}(y), is called a shift transformation. Then Φ^n(ω, y) = (φ_{s_n(y)} ∘ ⋯ ∘ φ_{s_1(y)}(ω), ψ^n(y)) is well defined for n = 1, 2, .... Show, using Theorem 3.3, that

The subsidiary points noted above are useful in the proof. More on this and extensions with interesting applications to Markov processes can be found in an article by S. Kakutani (1951).)
Chapter 8
independent, with X_{t_{i+1}} − X_{t_i} distributed as N(0, σ²(t_{i+1} − t_i)) for some constant σ² > 0. (For simplicity, we take σ² = 1. Compare with Definition 5.4.1.)
1
where a k f( ) = Jo f (z)Hk(z)dz. The completeness of the Haar system may
be proved in different ways. For instance, one can show that they form a
<
basis in Lp(0, I ) , 1 p < co. However, a short probabilistic argument runs
as follows. If B, = a(H1,.. . , H,), then the atoms of B, are intervals of the
+
form [k2Tn,(k 1)2Tn), and ~ " r 7 - l (H,) = 0 by definition of each Hn. Thus
>
{S,, B,, n 1) of (2) is an L2(0,1)-bounded martingale for each f E L2(0,1).
Hence it converges a.e. aiid in L2(0,1) to f , since a(U,,,- B,) is the Bore1 0-
algebra of (0,l). Thus So1
(fH,)(z)dz = 0 for all n >
0 implies f = 0 a.e.,
which is coinpleteness of the system by definition. As is well-known, in L2(0,1)
this is equivalent to Parseval's equation. Thus f , g E L2(0,1) implies
converges uniformly in t for almost all ω ∈ Ω, and the family {X_t, 0 ≤ t ≤ 1} is a Brownian motion process on this probability space, which can then be extended to all of R_+.

Remark: The crucial part is the uniform convergence of the series in (4) for all t, and a.a. (ω). This is an important reason for using "random Fourier series" to study the sample path properties of their sums.
Proof Let us first establish the uniform convergence of (4). Using the N(0, 1) hypothesis, we have for u > 0

With this, if φ_m is the Schauder function, then by integration we note that φ_m(t) ≥ 0, and if 2^n ≤ m < 2^{n+1}, 0 ≤ t ≤ 1, then 0 ≤ φ_m(t) ≤ 2^{−n/2}/4. But (5) implies

a_n = max_{2^n ≤ k < 2^{n+1}} |ξ_k| ≤ [3 log 2^n]^{1/2}
By (6) the series defined in (4) is uniformly convergent for all $t$, $0 \le t \le 1$, with probability one. It remains to verify only that $\{X_t, 0 \le t \le 1\}$ is a Brownian motion process, since the continuity of $X_t(\omega)$, a.a. $(\omega)$, is immediate from the uniform convergence of continuous summands.

It is clear that $X_0 = 0$ a.e. One uses the completeness property in the form of (3) to verify that Definition 1(ii) is also true. Let $0 < t_1 < \cdots < t_m < 1$. In view of the uniqueness theorem for ch.f.s (cf. Theorem 4.5.1), it suffices to show that the joint ch.f. of $(X_{t_1}, \ldots, X_{t_m})$ is normal and that the $(X_{t_{k+1}} - X_{t_k})$ are independent. For this consider
To get the result for $0 < t < \infty$, define (inductively) the process as follows. Let $X_t = X_t^{(1)}$ for $0 \le t \le 1$, and if $X_t$ is defined on $0 \le t \le n$, then let $X_t = X_{t-n}^{(n+1)} + X_n$ for $n \le t \le n+1$, where $\{X_t^{(n)}, 0 \le t \le 1\}$, $n \ge 1$, are independent copies given by (4).
From Eq. (9), for $0 < t_1 < t_2$, we get $E(X_{t_1}X_{t_2}) = E(X_{t_1}^2) = t_1$, so that the covariance function $r(\cdot,\cdot)$ of the Brownian motion process is given by
are also Brownian motion processes (i.e., all have the same finite-dimensional d.f.s).
We can now present a mathematical description of the irregular behavior
of the process, observed by R. Brown under a microscope:
Proof In view of the preceding corollary, it suffices to prove this result for $\{X_t, 0 \le t \le 1\}$. We present a short argument here following Dvoretzky et al. (1961). If $X_t$ is differentiable at a point $s \in [0,1]$, then for $t$, $s$ close enough we must have $|X_t(\omega) - X_s(\omega)| \le C|t - s|$. One may translate this into the following statement: Let $j = [ns] + 1$, the next integer to the integral part $[ns]$, for any natural number $n \ge 1$. Then the above inequality gives

for an integer $\ell \ge 1$ and $n$ large enough. Let $A(j, \ell, n)$ be the $\omega$-set defined by (3) and let $A$ be the set of $\omega$ for which $X_{(\cdot)}(\omega)$ is differentiable at some point of $[0,1]$. It is not obvious that $A \in \Sigma$. However, we show that it is contained in a $P$-null set, so that (by completeness of the space $(\Omega, \Sigma, P)$) $\Sigma$ contains all subsets of $P$-null sets, which then have zero probabilities. Note that $A(j, \ell, n) \in \Sigma$.
It is seen that, by definition,

Consider
$$\le \liminf_n \sum_{j=1}^{n+1}\big(P[\,|X_{1/n}| \le \ell/n\,]\big)^3 = \liminf_n\,(n+1)\big(P[\,|X_{1/n}| \le \ell/n\,]\big)^3.$$
From (4) and (5) it follows that $A$ is contained in a countable union of $P$-null sets and hence has probability zero.
If the path $\{X_t(\omega), 0 \le t \le 1\}$ is rectifiable, then it is of bounded variation. But from real variable theory we know that a function of bounded variation has a derivative at almost all points $s \in [0,1]$ (Lebesgue measure), and, by the above, the set of $\omega$-points with this property is a $P$-null set. Thus the last assertion follows, and the proof is complete.
Actually the above authors have shown, in the paper cited, a stronger result than this theorem, namely: the process $\{X_t + at, 0 \le t \le 1\}$, with a "linear trend" $at$, has almost no points of increase or decrease for any $a \in \mathbb{R}$. Many intricate properties are found for this process because of the analytical work available for the d.f.s $N(0,t)$ of $X_t$, as in (5).
By the preceding theorem, almost no Brownian path $t \mapsto X_t(\omega)$, $0 \le t \le 1$, is of bounded variation. However, P. Lévy has given the following interesting property of its "second variation":
$$\lim_{n \to \infty} \sum_{k=1}^{2^n}\big[X_{k2^{-n}} - X_{(k-1)2^{-n}}\big]^2 = 1 \quad \text{a.e.}$$
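Before turning to the proof, this "second variation" property is easy to observe numerically. The sketch below is ours; it uses the elementary discrete-increment simulation of the path rather than the series of Section 1, and it evaluates the dyadic sums above at several resolutions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
m = 2**n
# Brownian values on the dyadic grid {k 2^-n}: cumulative sums of N(0, 2^-n) increments.
x = np.concatenate(([0.0], np.cumsum(rng.standard_normal(m) * np.sqrt(2.0**-n))))

for level in (8, 10, 12, 14, 16):
    step = 2**(n - level)
    y = x[::step]                                          # the path at spacing 2^-level
    print(level, round(float(np.sum(np.diff(y)**2)), 4))   # tends to 1 as the level grows
```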
Proof Let $Y_{kn} = X_{k2^{-n}} - X_{(k-1)2^{-n}}$, so that the $Y_{kn}$ are $N(0, 2^{-n})$;

then
$$P(A_{n,\varepsilon}) \le \varepsilon^{-2}2^{-n+1}.$$
Thus

and by the Borel–Cantelli lemma (cf. Theorem 2.1.9i), $P(A_{n,\varepsilon}, \text{ i.o.}) = 0$. Hence for the complementary events, $P(A_{n,\varepsilon}^c,\ n \ge n_0) = 1$, so that
$$P\Big[\limsup_{t \searrow 0} \frac{X_t}{(2t\log\log(1/t))^{1/2}} = 1\Big] = 1, \tag{1}$$
and also
$$P\Big[\liminf_{t \searrow 0} \frac{X_t}{(2t\log\log(1/t))^{1/2}} = -1\Big] = 1. \tag{2}$$
with the formula for ch.f.s. Also, $Y_t$ is $\mathcal{F}_t$-adapted, and for $0 < s < t$ we have
$$P\Big[\max_{s \le t}\big(X_s - \tfrac{a}{2}s\big) \ge b\Big] = P\Big[\max_{s \le t} Y_s \ge e^{ab}\Big].$$
Note that since $X_s$, and hence $Y_s$, is (uniformly) continuous on $[0, t]$, the maximum exists, and there are again no measurability problems. (In fact we can consider the result when $s$ ranges over all the rationals and by continuity the above assertion follows.) In order to apply the Borel–Cantelli lemma, one chooses $a, b$ of (5) suitably.
Let $h(t) = (2t\log\log(1/t))^{1/2}$, $0 < \delta < 1$, $\varepsilon > 0$, and $t > 0$ be small. Set $t_n = \delta^n$, $a_n = (1+\varepsilon)\delta^{-n}h(\delta^n)$, and $b_n = \tfrac{1}{2}h(\delta^n)$. Then
$$P\Big[\max_{s \le t_n}\big(X_s - \tfrac{1}{2}a_n s\big) \le b_n, \ \text{all but finitely many } n\Big] = 1. \tag{7}$$
Hence for a.a. $(\omega)$, we can find $n_0(\omega)$ such that $n \ge n_0(\omega)$ and $t_{n-1} < t \le t_n$ implies

Thus
$$\limsup_{t \searrow 0}\,(X_t/h(t)) \le 1 \quad \text{a.e.}$$
$$= a e^{-a^2/2} - a^2\int_a^{\infty} e^{-u^2/2}\,du \quad \text{(by integration by parts)}.$$
Hence

and this is the general term of a divergent series. Since the $A_n$ of (9) are independent events, we get by the second Borel–Cantelli lemma (cf. Theorem 2.1.9iii), because $\sum_n P(A_n) = \infty$, that $P(A_n, \text{ i.o.}) = 1$. Hence for infinitely many $n$,
$$X_{\delta^n} \ge (1-\varepsilon)h(\delta^n) + X_{\delta^{n+1}} \quad \text{a.e.} \tag{12}$$
On the other hand, by (8), $X_{\delta^{n+1}} < h(\delta^{n+1})$ from some $n$ onward. But $X_{\delta^n}$ is $N(0, \delta^n)$, so that (by symmetry) $X_{\delta^{n+1}} > -h(\delta^{n+1})$ also. Thus (12) becomes

Consequently
$$\limsup_{t \searrow 0} X_t/h(t) \ge (1 - 4\varepsilon) \quad \text{a.e.} \tag{13}$$
Since $0 < \delta < 1$ is arbitrary, (13) implies
In view of the "scale invariance" (cf. Corollary 2.2ii), one has the following consequence of (1) and (2):
$$P\Big[\limsup_{t \to \infty} \frac{X_t}{(2t\log\log t)^{1/2}} = 1\Big] = 1$$
and
$$P\Big[\liminf_{t \to \infty} \frac{X_t}{(2t\log\log t)^{1/2}} = -1\Big] = 1. \tag{15}$$
the exponential form of the d.f.s. For a good account, we refer the reader to Lévy's (1948) classic. For further developments, McKean (1969) and Hida (1980) may be consulted. The analysis leads to stochastic integration with numerous applications. (See, e.g., Revuz and Yor (1999), 3rd ed., Springer, for a recent treatment.)
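The normalization $h(t) = (2t\log\log(1/t))^{1/2}$ appearing in (1) and (2) can also be explored numerically. The following sketch is our own (the grid of dyadic times is arbitrary); it evaluates $X_t/h(t)$ along one simulated path as $t \searrow 0$, where the ratio typically stays of order one, in the spirit of the law of the iterated logarithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def h(t):
    """The local law-of-the-iterated-logarithm normalization (2 t log log(1/t))^(1/2)."""
    return np.sqrt(2.0 * t * np.log(np.log(1.0 / t)))

# Build X_t at times t_k = 2^-k from the smallest time upward by adding independent increments.
ts = 2.0 ** (-np.arange(40, 4, -1, dtype=float))   # 2^-40, 2^-39, ..., 2^-5
x, prev_t, ratios = 0.0, 0.0, []
for t in ts:
    x += rng.standard_normal() * np.sqrt(t - prev_t)
    ratios.append(x / h(t))
    prev_t = t

print(max(ratios))   # of order 1; the a.e. limsup over t -> 0 equals 1
```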
the series converging absolutely and uniformly in the square $[a,b] \times [a,b]$. Because of (2), Theorem 2.2.6 implies that the above series converges a.e. and in $L^2(P)$-mean. Hence $E(X_t) = 0$, and by independence of the $\xi_n$,

using (2) again. Then by Proposition 2.1, since each $X_t$ is clearly Gaussian, $\{X_t, t \in [a, b]\}$ is a Gaussian process on the space $(\Omega, \Sigma, P)$ of the $\xi_n$ with mean zero and covariance $r$. Such a representation as (2) and (3) is useful for many computations.
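A hedged numerical sketch of how such a representation is used in computations (the grid, the example covariance $r(s,t) = \min(s,t)$, and all names below are our illustrative choices, not the book's (2) and (3)): discretize $r$, take its eigen-decomposition (the discrete analogue of a Mercer expansion), and weight the eigenvectors by independent $N(0,1)$ coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
t = (np.arange(n) + 0.5) / n                 # grid points in (0, 1)
r = np.minimum.outer(t, t)                   # example covariance r(s, t) = min(s, t)

lam, u = np.linalg.eigh(r)                   # r = u diag(lam) u^T; columns of u play the role of the u_i
lam = np.clip(lam, 0.0, None)                # guard against tiny negative round-off eigenvalues

paths = 2000
x = rng.standard_normal((paths, n)) @ (u * np.sqrt(lam)).T   # rows: sample paths with covariance r

emp = x.T @ x / paths                        # empirical covariance on the grid
print(float(np.max(np.abs(emp - r))))        # small (sampling error) for large `paths`
```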
To illustrate the effectiveness of the representation (3), and the need for special techniques, let us calculate the distribution of a general quadratic functional of a Gaussian process $\{X_t, 0 \le t \le 1\}$ with mean zero and a continuous covariance $r$. The quadratic functional is

$$\cdots + a^2\int_0^1 q^2(t)\,dt = \cdots + a^2\int_0^1 q^2(t)\,dt \quad \text{(by the orthonormality of the } u_i\text{)},$$

since $\{u_n, n \ge 1\}$ is complete in $L^2(0,1)$ and $q \in L^2(0,1)$, so that by Parseval's equation we may cancel the last two terms. Consider the moment-generating function (m.g.f.), which exists and is more convenient here than the ch.f.,
analytic function of $\lambda$ in a small disc around the origin. The second term is still complicated. However, in the classical theory, one sets $r_1 = r$ and, for $k > 1$, lets $r_k(s,t) = \int_0^1 r(s,x)\,r_{k-1}(x,t)\,dx$, to use induction. Then set

which converges absolutely and uniformly if $|\lambda|\max_{s,t}|r(s,t)| < 1$ on the unit square. $R$ is called the reciprocal kernel of $r$, which clearly satisfies
$$r(s,t) + R(s,t;\lambda) = \lambda\int_0^1 r(s,x)\,R(x,t;\lambda)\,dx.$$
Now using (2), it can be verified that
Suppose that the process is stochastically continuous; i.e., for each $\varepsilon > 0$,
$$\lim_{t \to s} P[\,|X_t - X_s| > \varepsilon\,] = 0, \quad s \in (0,1).$$
Then it is not hard to verify (cf. Problem 4) that $\lim_{t \to s}\phi_{s,t}(u) = 1$ uniformly in $u$ and in $s, t$ in compact intervals. Hence if $0 \le s = t_0 < t_1 < \cdots < t_n = t \le 1$, with $t_k = s + (k(t-s)/n)$, we get

and the factors can be made close to 1 uniformly if $n$ is large enough. Thus $\phi_{s,t}$ is infinitely divisible (in the generalized sense, and hence in the ordinary sense; cf. Problem 17 in Chapter 5). Consequently, by the Lévy–Khintchine representation (cf. Theorem 5.2.5), with $s = 0 < t < 1$, $u \in \mathbb{R}$,
$$\log\phi_{0,t}(u) = i\gamma_t u + \int_{\mathbb{R}}\Big(e^{iuv} - 1 - \frac{iuv}{1+v^2}\Big)\frac{1+v^2}{v^2}\,dG_t(v). \tag{12}$$
Substituting (12) in (13), with the uniqueness of the formula, one can deduce that $\gamma_{s,t} = \gamma_t - \gamma_s$ and $G_{s,t} = G_t - G_s$. Thus
This formula can be used to analyze the processes with independent increments which are stochastically continuous. By choosing different such pairs, and hence different classes of these processes, one can study various properties. Note (by Example 3.3.3) that a process with independent increments is Markovian, and if it has a finite first moment and mean zero increments (in particular, a Brownian motion), then it is also a martingale. These relations indicate the directions in which the above types of processes can be studied. (See Problems 5 and 6.) We now illustrate with some simple but important examples.
An interesting and concrete example of an additive process is obtained by considering a nonnegative integer-valued process $\{N_t, t \ge 0\}$ with independent increments. We now detail this view to explain the growth in new directions.

For instance, $N_t$ can represent the total number of events that have occurred up to time $t$, so that such a process is often termed a counting process. Now, unlike Brownian motion where it is assumed that increments of the process are Gaussian, we shall assume only that the independent increments are strictly stationary, with no distribution implied.
For the sake of simplicity, let $N_0 = 0$, and assume that the probability of an event occurring during an interval of length $t$ depends upon $t$. One simple way to develop this dependence is to assume, as originally done by Poisson (c. 1800), that
$$P[N_{\Delta t} = 1] = \lambda\Delta t + o(\Delta t), \tag{15}$$
where $\lambda$ is a nonnegative constant and $\Delta t$ is the length of the interval $[0, \Delta t]$. For a small value of $\Delta t$, equation (15) implies
be the conditional probability of $n$ events at time $t$ given that there were none initially. (Note that, in this illustration, the conditioning events always have positive probability and so the difficulties considered in Section 3.2 do not arise.) It follows that at time $t + \Delta t$,

Thus
$$P_0(t + \Delta t) - P_0(t) = \big(-\lambda\Delta t + o(\Delta t)\big)P_0(t),$$
which is

so upon letting $\Delta t \to 0$, one has $P_0'(t) = -\lambda P_0(t)$, since the last term tends to zero by definition of $o(\Delta t)$. This simple differential equation, with the assumption $N_0 = 0$ so that $P_0(0) = 1$, gives
$$P_0(t) = e^{-\lambda t}.$$
Similarly, for $n \ge 1$,
$$P_n(t + \Delta t) = P[N_{t+\Delta t} = n \mid N_0 = 0] = P[N_t = n,\ N_{t+\Delta t} - N_t = 0 \mid N_0 = 0] + P[N_t = n-1,\ N_{t+\Delta t} - N_t = 1 \mid N_0 = 0]$$
For each $t > 0$, $P_n(t)$ is the Poisson probability distribution with parameter $\lambda > 0$. Thus the nonnegative integer-valued process $N_t$ that has independent and strictly stationary increments and satisfies (15) is called the Poisson process with rate parameter $\lambda > 0$.
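The derivation can be checked numerically. The sketch below is ours: it integrates the system $P_0' = -\lambda P_0$, $P_n' = -\lambda P_n + \lambda P_{n-1}$ by a simple Euler scheme, truncated at a level `nmax` of our choosing, and compares the result with the Poisson probabilities $e^{-\lambda t}(\lambda t)^n/n!$.

```python
import numpy as np
from math import exp, factorial

lam, t_end, nmax, steps = 2.0, 3.0, 40, 30000
dt = t_end / steps

P = np.zeros(nmax + 1)
P[0] = 1.0                                    # N_0 = 0, so P_0(0) = 1 and P_n(0) = 0 for n >= 1
for _ in range(steps):
    dP = np.empty_like(P)
    dP[0] = -lam * P[0]                       # P_0'(t) = -lambda P_0(t)
    dP[1:] = -lam * P[1:] + lam * P[:-1]      # P_n'(t) = -lambda P_n(t) + lambda P_{n-1}(t)
    P += dt * dP

poisson = np.array([exp(-lam * t_end) * (lam * t_end)**k / factorial(k) for k in range(nmax + 1)])
print(float(np.max(np.abs(P - poisson))))     # small: the system reproduces the Poisson law
```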
$$P[X_{t+\Delta t} = n+1 \mid X_t = n] = \lambda_n\Delta t + o(\Delta t) \quad \text{for } n \ge 0,$$
$$P[X_{t+\Delta t} = n-1 \mid X_t = n] = \mu_n\Delta t + o(\Delta t) \quad \text{for } n \ge 1,$$
and

so that during a time $\Delta t$ the process can only increase (a "birth") by one unit or decrease (a "death") by one unit. Birth–death processes have a wide variety of applications in the biological and physical sciences. A few examples of these processes are included in the exercises by considering various generalizations of equations (15) and (16).
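A minimal event-by-event simulation of such a birth–death mechanism is sketched below; the rate functions and the linear choices $\lambda_n = 0.9n$, $\mu_n = n$ are purely illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_birth_death(n0, t_end, birth_rate, death_rate):
    """Waiting times are exponential with the total rate; each event is a birth (+1)
    or a death (-1) with probability proportional to its rate."""
    t, n, history = 0.0, n0, [(0.0, n0)]
    while t < t_end:
        b, d = birth_rate(n), death_rate(n)
        total = b + d
        if total == 0.0:                      # absorbing state (e.g., extinction)
            break
        t += rng.exponential(1.0 / total)
        n += 1 if rng.random() < b / total else -1
        history.append((t, n))
    return history

hist = simulate_birth_death(10, 5.0, birth_rate=lambda n: 0.9 * n, death_rate=lambda n: 1.0 * n)
print(hist[-1])                               # (last event time, final population size)
```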
An important application of the Poisson process occurs in queueing theory, where the process $N_t$ represents the number of arrivals to the queue and equation (15) gives the probability of an arrival to the queue during a small time interval. This is a specific example of the queueing model considered in Section 2.4. We now reconsider the process $N_t$ from a slightly more advanced point of view.
Thus, alternatively, the $N_t$ process can be obtained as follows: Let $X$ be an exponentially distributed random variable, so that $P[X \le x] = 1 - e^{-\lambda x}$, $x \ge 0$, $\lambda > 0$. If $X_1, \ldots, X_n$ are independent with the same distribution as $X$, let $S_n = \sum_{k=1}^{n} X_k$ be the partial sum and, for $t \ge 0$, set $N_t = \sup\{n \ge 1 : S_n \le t\}$, so that $N_t$ is the last time before the sequence $\{S_n, n \ge 1\}$ crosses the level $t > 0$, where as usual $\sup(\emptyset) = 0$. Then $N_t$ is an integer-valued random variable, and its distribution is easily obtained. In fact, since $S_n$ has a gamma distribution (cf. Section 4.2) whose density is given by

we have for $n = 0, 1, 2, \ldots$ (set $S_0 = 0$), since $[N_t \ge n] = [S_n \le t]$,

which is obtained by integrating the second term by parts and cancelling the resulting integrals. Thus $\{N_t, t \ge 0\}$ is a Poisson process. Moreover, it has the properties, for $\omega \in \Omega$: (a) $N_0(\omega) = 0$, $\lim_{t\to\infty} N_t(\omega) = \infty$; (b) integer valued, nondecreasing, right continuous (i.e., $\lim_{s \downarrow t} N_s(\omega) = N_t(\omega)$), and at discontinuity points, say $t_0$, $N_{t_0}(\omega) - N_{t_0-}(\omega) = 1$. We may ask whether these properties characterize the Poisson process in the sense that such a process has independent stationary increments as well as the distribution given by equation (20). For a positive answer, we need to strengthen the hypothesis, which can be done in different, but equivalent, ways. This is presented as follows.
Proof of 1. $\Rightarrow$ 2. From the definition of $N_t$, we note that the sets $[N_t \ge n] = [S_n \le t]$ and, as seen in (20), that $[N_t = n] = [S_n \le t < S_{n+1}]$. For given $t > 0$ and $y > 0$, we have, on using the independence of $S_n$ and $X_{n+1}$, with exponential distribution for $X_n$ and with $S_{n+1} = S_n + X_{n+1}$,
$$= e^{-\lambda y}\,P[S_n \le t,\ X_{n+1} > t - S_n]. \tag{21}$$
On the other hand, the properties of the independent $X_n$ of hypothesis 1 imply, for $y_j \ge 0$,

(since $S_{n+1} > t + y$ on $[S_n \le t < S_{n+1}]$ implies $X_{n+1} > y_1$; use (21)).

To simplify the left side, we define new random variables depending on the fixed (but arbitrary) $t$ as follows. Let $X_1^{(t)} = S_{N_t+1} - t$, $X_2^{(t)} = X_{N_t+2}$, $X_3^{(t)} = X_{N_t+3}, \ldots$, and observe that for $0 < s < t$, $[N_{t+s} - N_t \ge m] = [S_{N_t+m} \le t + s] = [\sum_{j=1}^{m} X_j^{(t)} \le s]$. This implies

which brings in the increments of the $N_t$-process with the "new" partial sums $S_m^{(t)}$ in terms of the $X_j^{(t)}$ random variables, and the increment process $\{N_{t+s} - N_t, s \ge 0\}$ is similar to the $\{N_t, t \ge 0\}$ process for the $X_j$ random variables. Thus we have $[N_{t+s} - N_t = m] = [S_m^{(t)} \le s < S_{m+1}^{(t)}]$. With this new definition, if $A = \times_{j=1}^{k}[y_j, \infty) \subset \mathbb{R}^k$ is a rectangle, then (22) becomes

But such rectangles as $A$ are generators of the Borel $\sigma$-algebra of $\mathbb{R}^k$, so that by a standard result (cf. Proposition 1.2.8), (23) holds for all Borel sets $A$.
The next step is to express the joint event $[N_{s_i} = m_i,\ i = 1, \ldots, \ell]$ as one of the events in (23), using $m_\ell + 1$ of the variables $X_j^{(t)}$. Thus if we consider $A$ as the rectangle $\times_{i=1}^{\ell}\{z : z_1 + \cdots + z_{m_i} \le s_i < z_1 + \cdots + z_{m_i+1}\}$, then we find

This is the key step in using induction with $\ell = 1, 2, \ldots$ and $0 = t_0 < t_1 < \cdots < t_\ell$, since $N_0 = 0$, so that the following is trivial for $\ell = 1$; one then assumes it for $\ell = k$, to complete the induction:

and obtains the result for $\ell = k+1$. The equations (24) and (20) imply our assertion and give 2.
From this we get for any Borel set $A \subset [(y_1, y_2) : 0 < y_1 < y_2] \subset \mathbb{R}^2$ that

The result can now be extended to the sector $(0 < y_1 < \cdots < y_k)$ in $\mathbb{R}^k$ using the mapping $z_i = y_i - y_{i-1}$ and computing the Jacobian, as in the proof of Theorem 3.3.9. One deduces that the $X_i$ are independent and each has an
where $t = 1$ and $\delta_x(\cdot)$ is the Dirac point measure (and $\pi_0 = \delta_0$, $\operatorname{supp}(\pi_\lambda) = \{0, 1, 2, \ldots\} = \mathbb{Z}_+$). Since $\pi_\lambda(\cdot)$ is clearly a measure on the power set $\mathcal{P}(\mathbb{Z}_+)$, if $\lambda_1, \lambda_2 \ge 0$ one has the convolution

and its ch.f. gives, with
$$\hat{\pi}_\lambda(t) = \int_{\mathbb{Z}_+} e^{itx}\,\pi_\lambda(dx) = e^{-\lambda}\sum_{n=0}^{\infty}\frac{e^{itn}\lambda^n}{n!} = e^{\lambda(e^{it}-1)},$$
so that $\{\pi_\lambda, \lambda \ge 0\}$ is a semigroup of probability measures under convolution.
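The convolution semigroup property is easy to verify concretely; in the sketch below the supports are truncated at a level `nmax` of our choosing.

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(lam, nmax):
    return np.array([exp(-lam) * lam**n / factorial(n) for n in range(nmax + 1)])

lam1, lam2, nmax = 1.3, 2.2, 60
conv = np.convolve(poisson_pmf(lam1, nmax), poisson_pmf(lam2, nmax))[:nmax + 1]
# pi_{lam1} * pi_{lam2} agrees with pi_{lam1 + lam2} (up to rounding) on {0, ..., nmax}.
print(float(np.max(np.abs(conv - poisson_pmf(lam1 + lam2, nmax)))))
```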
But (25) motivates the following immediate extension.
Let $(S, \mathcal{B}, \nu)$ be a finite measure space and $0 < c = \nu(S) < \infty$. If $\tilde{\nu}(\cdot) = \frac{1}{c}\nu(\cdot)$, then $(S, \mathcal{B}, \tilde{\nu})$ is a probability space, (usually) different from $(\Omega, \Sigma, P)$. Let $X_j : S \to \mathbb{R}$ be independent identically distributed random variables
where $N$ is the Poisson random variable with the intensity noted above. Here $N$ and the $X_j$ are independent. As a composition of $N$ and the $X_j$, all at most countable, $\tilde{\pi}(\cdot)$ is a random variable. In fact $[\tilde{\pi}(B) = n] = \bigcup_{m \ge n}\big[\sum_{j=1}^{m}\chi_B(X_j) = n\big] \cap [N = m]$ for each integer $n \ge 1$, so that $\tilde{\pi}(B)$ is measurable for $\Sigma$, and thus is a random element for all $B \in \mathcal{B}$. To find the distribution of $\tilde{\pi}(B)$ we proceed through its characteristic function, and establish the following statement.

Lemma 4 For each $B \in \mathcal{B}$, $\tilde{\pi}(B)$ is Poisson distributed with intensity $c\cdot\tilde{\nu}(B) = \nu(B)$, implying that $\tilde{\pi}(\cdot)$ is pointwise a.e. $\sigma$-additive. Moreover, the result extends even if the intensity measure $\nu(\cdot)$ is $\sigma$-finite.
Proof In establishing this result through ch.f.s, we employ the fact that $E(Y) = E(E(Y \mid Z))$ for any integrable (or nonnegative) random variable $Y$ and any r.v. $Z$ (cf. Proposition 3.1.2). In view of Corollary 4.2.2 (uniqueness), this is seen to establish the result. First we assume that $0 < \nu(S) < \infty$, as in the statement, so that $\tilde{\nu}(S) = 1$.

Thus, denoting again by $\hat{\pi}_B(t) = E(e^{it\tilde{\pi}(B)})$, and using the hypothesis that the $X_j$ are i.i.d. on $(S, \mathcal{B}, \tilde{\nu})$ and independent of $N$, one has
$$\hat{\pi}_B(t) = E\big(E\big(e^{it\sum_{j=1}^{N}\chi_B(X_j)} \mid N\big)\big), \quad \text{by the above identity,}$$
$$= \sum_{n} E\big(e^{it\sum_{j=1}^{n}\chi_B(X_j)}\big)\,P(N = n), \quad \text{since } N \text{ is discrete.}$$
Comparing this with (25) and the following discussion, we conclude that $\tilde{\pi}(B)$ is Poisson distributed with intensity $\nu(B)$.
Now, if $B = \bigcup_{k=1}^{\infty} B_k$, $B_k \in \mathcal{B}$, $0 < \nu(B_k) < \infty$, $B_k$ disjoint, then $\nu(B) = \sum_{k=1}^{\infty}\nu(B_k) < \infty$, implying that $\nu(B_n) \to 0$ as $n \to \infty$, so that the ch.f. of $\tilde{\pi}(B_n)$ tends to unity and hence $\tilde{\pi}(B_n) \to 0$ in probability. Hence $\tilde{\pi}(\cdot)$ is $\sigma$-additive in $P$-measure. Since it is also seen that the $\tilde{\pi}(B_n)$ are independent on disjoint sets (and $\ge 0$ a.e.), it follows that $\tilde{\pi}(B) = \sum_{n=1}^{\infty}\tilde{\pi}(B_n)$ holds pointwise a.e.

Finally, let $\nu(\cdot)$ be $\sigma$-finite, and so, writing $S = \bigcup_{n=1}^{\infty} S_n$, $\nu(S_n) < \infty$, $S_n$ disjoint, let $\tilde{\pi}_n = \tilde{\pi}|_{S_n}$, which are independent Poisson measures on $(S_n, \mathcal{B}(S_n), \nu(S_n \cap \cdot))$, $n \ge 1$, by the preceding paragraph. If $\tilde{\pi} = \sum_{n=1}^{\infty}\tilde{\pi}|_{S_n}$, a sum of independent Poisson random measures, it is Poisson with intensity $0 < \nu(S_n) < \infty$ on $S_n$, and this depends only on $\nu$. Thus the result holds for $\sigma$-finite $\nu(\cdot)$ also, completing the proof.
Hereafter we write $\pi(\cdot)$ for $\tilde{\pi}(\cdot)$ to simplify notation. Also, the relation between the intensity parameters of $N$ and $\pi(\cdot)$ should be noted.
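Lemma 4 also suggests a direct simulation of $\pi(\cdot)$ when $0 < \nu(S) < \infty$. The sketch below is our own, with $S = [0,1]$ and $\nu = c\cdot$Lebesgue: draw $N$ Poisson with mean $\nu(S)$, scatter $N$ i.i.d. points according to $\tilde{\nu}$, and count the points falling in a set $B$; the count should then be Poisson with mean $\nu(B)$.

```python
import numpy as np

rng = np.random.default_rng(5)
c = 5.0                                       # nu(S) for S = [0, 1], nu = c * Lebesgue measure

def sample_pi(B_left, B_right, trials):
    """Samples of pi(B) for B = [B_left, B_right): Poisson(c) many nu~-distributed points, counted in B."""
    counts = np.empty(trials, dtype=int)
    for i in range(trials):
        N = rng.poisson(c)                    # total number of scattered points
        pts = rng.random(N)                   # i.i.d. with law nu~ = nu / nu(S) (uniform here)
        counts[i] = np.count_nonzero((pts >= B_left) & (pts < B_right))
    return counts

samples = sample_pi(0.2, 0.5, 100_000)
# nu(B) = c * 0.3 = 1.5: a Poisson(1.5) variable has mean and variance both equal to 1.5.
print(samples.mean(), samples.var())
```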
This result implies several generalizations (the versatility of the Poisson measure!) of the subject, originally investigated by P. Lévy in 1937. We indicate a few consequences here. The above property of $\pi(\cdot)$ motivates the following generalization.
Definition 5 Let $L^0(P)$ be the space of all real (or complex) random variables on a probability space $(\Omega, \Sigma, P)$ and $(S, \mathcal{B})$ be a measurable space. A mapping $\mu : \mathcal{B} \to L^0(P)$ is called a random measure if the following (abstraction of the Poisson measure given in Lemma 4) holds:
The classical results assert that $f_r(x) \to f(x)$, for all continuous periodic functions $f$, uniformly as $r \to 1$. Thus $T$ is a continuous linear mapping on $L^1(-\pi, \pi]$. The study leads to a profound analysis in harmonic function theory and elsewhere (cf., e.g., Zygmund (1959), p. 96).

Replacing $P(r, x)\,dx$ by $\pi(\omega, ds)$, or more inclusively by the $\mu(ds)(\omega)$ of Definition 5 above, one could consider the corresponding analysis for random functions or processes (or sequences) that admit an integral representation modeled on that of (30), and then study the sample path behavior of them. Here the Lebesgue interval $[-\pi, \pi]$ is replaced by $(S, \mathcal{B}, \nu)$ and $\omega$ (in lieu of $r$) varies in $(\Omega, \Sigma, P)$.
Such a general study has been undertaken by P. Lévy when $\mu$ is a stable random measure (cf. (28), (29)). The resulting class of processes is now called Lévy processes. Almost all the classical results have important nontrivial extensions and the ensuing theory has an enormous growth potential in many directions. We include here a glimpse of this aspect.
The basic step in the analysis is to define an integral of a scalar (non-random) function relative to a stable random measure $\mu : \mathcal{B} \to L^0(P)$. In the case of a Poisson random measure, the intensity measure $\nu : \mathcal{B} \to \bar{\mathbb{R}}_+$ (but $\sigma$-finite) is a natural one defining the triple $(S, \mathcal{B}, \nu)$. In the general case (of a stable random measure), (28) or (29), we have $\gamma(\cdot)$, $c(\cdot)$ and $\beta(\cdot)$ as set functions with $\sigma$-additivity properties, but they are not generally related to the $\nu$ of the triple. So the first simplification made is to assume that $\gamma(\cdot)$ and $c(\cdot)$ are related (or proportional) to $\nu$ and that $\beta$ is a constant. Thus, let $\gamma(A) = a\nu(A)$ $(a \in \mathbb{R})$, $c(A) = c\nu(A)$ $(c \ge 0)$, and $\beta$, $|\beta| \le 1$, a constant, so that the characteristic exponent $\psi(\cdot,\cdot)$ of (29) becomes, for $a \in \mathbb{R}$, $0 < \alpha \le 2$,
$$\psi(A, t) = ia\nu(A)t - c\nu(A)|t|^{\alpha}\{1 - i\beta\,\mathrm{sgn}\,t\cdot\omega(t, \alpha)\}, \quad A \in \mathcal{B}_0,\ t \in \mathbb{R}. \tag{31}$$
It is now necessary (and nontrivial) to show that $\exp\{\psi(A, \cdot)\}$ is a characteristic function. This is true, and then one can establish the existence of an $\alpha$-stable random measure into $L^0(P)$ on a probability space $(\Omega, \Sigma, P)$, using a form of Theorem 3.4.10. This will show that the random measure $\mu : \mathcal{B}_0 \to L^0(P)$ is "controlled" by $\nu$ in the sense that $\mu(A) = 0$ a.e. $[P]$ holds whenever $\nu(A) = 0$, and $\mu$ is governed by the quadruple $(a, c, \beta, \nu)$. With this, a stochastic integral corresponding to the classical one defined by (29) can be introduced. It will be specialized to show a close relation to strictly stationary processes, represented as integrals, which also connects with the measure-preserving mappings of the last chapter at the same time.
Thus if $f_n = \sum_{i=1}^{k} a_i\chi_{A_i}$, $A_i \in \mathcal{B}_0$ disjoint, so that $f_n$ is a simple function and $f_n \in L^{\infty}(S, \mathcal{B}, \nu)$, define as usual
since a Cauchy sequence has a unique limit in a metric space. It may now be shown that the limit does not depend on the sequence $\{f_n, n \ge 1\}$. The method is standard but not trivial (cf. Dunford–Schwartz (1958), Sec. IV.10), and the uniqueness proof for (33) depends on the availability of a controlling measure $\nu$. Using the existence of such a $\nu$, it is possible to consider two measures $\mu_1, \mu_2$ and obtain a Lebesgue-type decomposition as well as the Radon–Nikodým theory for them. This analysis has deep interest in applications. (See Section 5.4 of Rao (2000) for an aspect of this work where the Lévy–Itô representation and related integral formulas are given.)
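For the symmetric case ($a = 0$, $\beta = 0$ in (31)) the integral of a simple function can be simulated directly: the values $\mu(A_i)$ are independent symmetric $\alpha$-stable variables with scale $(c\,\nu(A_i))^{1/\alpha}$, and $\int f_n\,d\mu = \sum_i a_i\mu(A_i)$. The sketch below is ours and uses the Chambers–Mallows–Stuck sampler for symmetric stable variables; the particular $\nu$-values and coefficients are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)

def sym_stable(alpha, scale, size):
    """Symmetric alpha-stable samples (Chambers-Mallows-Stuck); ch.f. exp(-(scale*|t|)**alpha)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    X = (np.sin(alpha * V) / np.cos(V)**(1.0 / alpha)) \
        * (np.cos((1.0 - alpha) * V) / W)**((1.0 - alpha) / alpha)
    return scale * X

def stable_integral(coeffs, nu_values, alpha, c=1.0, size=1):
    """Simulate the integral of f_n = sum_i coeffs[i] * chi_{A_i} against the stable random measure."""
    total = np.zeros(size)
    for a_i, nu_i in zip(coeffs, nu_values):
        total += a_i * sym_stable(alpha, (c * nu_i)**(1.0 / alpha), size)
    return total

alpha = 1.5
samples = stable_integral([2.0, -1.0], [0.3, 0.7], alpha, size=100_000)
# The integral is again symmetric alpha-stable, with scale (sum_i |a_i|^alpha nu(A_i))^(1/alpha):
scale_alpha = abs(2.0)**alpha * 0.3 + abs(-1.0)**alpha * 0.7
print(np.mean(np.cos(samples)), np.exp(-scale_alpha))   # empirical vs. exact ch.f. at t = 1
```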
Thus $T : f \mapsto \int_S f\,d\mu$ is well-defined, and it may be verified that the integral (33), or the mapping $T$, is linear. The next important concern is to characterize the class of $\mu$-integrable functions $f \in L^0(S, \mathcal{B}, \nu)$. We state a result in this direction for an understanding of this new area and to relate it with the strictly stationary processes introduced at the end of the preceding chapter. The following result is a substitute for (30) in the present context.
$$E\big(e^{it\int_S f\,d\mu}\big) = \exp\Big\{-c\,|t|^{\alpha}\int_S |f|^{\alpha}\,d\nu\Big\}, \qquad f \in L^{\alpha}(\nu). \tag{35}$$
A detailed proof of this result, with extensions when the spaces are vector valued, $L^{\alpha_1}(\nu; \mathcal{X})$ and $L^{\alpha_2}(P; \mathcal{Y})$, where $\mathcal{X}, \mathcal{Y}$ are certain Banach (or even Fréchet) spaces, is given by Y. Okazaki (1979). We omit the details here. It uses various properties of ch.f.'s and the work of Chapter 5. For a special class of random measures, namely those defined by a symmetric stable independent-increment process, M. Schilder (1970) has given a simpler description of the stochastic integral (for a brief sketch of this procedure, see Problem 14). It is of interest to characterize the range $\mathcal{R}$ of the stochastic integral $T$ as a subspace of $L^0(P)$, but this is not a familiar object. For instance, it may be shown that, with ($\wedge$ = min, $\vee$ = max), the metric given by
If the process is complex valued, then it has the corresponding form (Re =
real part) as:
by Marcus and Pisier (1984), who introduced and analyzed this class, that the inclusion is proper unless $\alpha = 2$, which corresponds to the Gaussian case, in which they both coincide. The following (canonical) example of a strongly stationary $\alpha$-stable process shows the motivation for this class and its close affinity with certain problems in (random) trigonometric series.
where $\delta_\lambda(\cdot)$ is the Dirac measure at $\lambda \in \mathbb{R}^n$. We omit the details, which are not difficult although not entirely simple, and refer to the above noted paper.

An interesting outcome of this example is that if $\alpha = 2$ then $Q_\lambda$ must be Gaussian by Theorem 5.3.2, and if $0 < \alpha < 2$ it is a stable process. However, if the $Q_\lambda$ are the Rademacher functions (i.e., an independent sequence taking values $+1$ and $-1$ with equal probability), then (42) reduces to a series considered by Paley and Zygmund in 1932 regarding its convergence with probability one for every $\lambda$. If $Q_\lambda = e^{2\pi i\theta_\lambda}$, where for $\lambda = n$ the $\theta_n$ are independent uniformly distributed random variables on $[0,1]$, then the series becomes a Steinhaus series. Here the hypothesis of $\alpha$-stability of $Q_\lambda$ was not (and could not be) assumed. But the convergent series represents an $\alpha$-stable process. These are not simple, and both were considered in the Paley–Zygmund papers and later by Salem and Zygmund in 1954. The subject was further detailed and generalized by Kahane (1968 and 1985). The point of this discussion is that the strongly stationary $\alpha$-stable class contains these interesting results, and a detailed analysis and characterizations are obtained by Marcus and Pisier (1984), noted above. This study moves in a different direction if $\alpha = 2$, where we can admit a much wider class of processes, to be discussed in the following section; it is called weak stationarity, and it will be of equal (and perhaps more) importance in applications. A surprising fact is that for the coefficients $Q_\lambda$ taken as Rademacher, Steinhaus, or Gaussian i.i.d. variables, the series represents either an a.e. continuous function of $t$ or an a.e. unbounded function (for all $t$). One has to employ different methods. We refer to Kahane (1985) for details.
We have noted in (33) above that integrals of the form $\int_S f\,d\mu$ can be defined for random measures $\mu$ (with independent values on disjoint sets) on $(S, \mathcal{S}, \nu)$, for $f : S \to \mathbb{R}$ (or $\mathbb{C}$) of bounded measurable class from $L^0(\nu)$. In particular, if $S = \mathbb{R}$ and $f_\lambda(s) = e^{i\lambda s}$, $\lambda \in \mathbb{R}$, then one has
is such that its closure $\bar{W}$ in the weak topology of $\mathcal{X}$ is compact, where '$dt$' is the invariant or Haar measure of $G$ [the Lebesgue measure if $G = \mathbb{R}^n$], and $\hat{g}$ is the Fourier transform of $g$ [i.e., $\hat{g}(s) = \int_{\hat G}\langle s, \gamma\rangle\,g(\gamma)\,d\gamma$, or, in the case that $G = \mathbb{R}^n$, $\hat{G} = \mathbb{R}^n$ and $\hat{g}(s) = \int_{\mathbb{R}^n} e^{is\cdot\lambda}g(\lambda)\,d\lambda$].
The point of this definition is that $f$ is not required to be positive definite. Our aim is to get a corresponding representation of $f$ as in Theorem 4.4.2 (or 4.5.8), for a process $\{X_t, t \in G\}$. Here is the solution of the first step noted, when $G$ is as above.
Details of the above result can be found, for instance, in the first author's book (Rao (2004), p. 550) and will be omitted. A consequence of this representation is that $\{X_t, t \in G\}$ under the stated conditions is an integral of a vector (particularly stochastic) measure $Z : \mathcal{B}(G) \to L^{\alpha}(P)$, $\alpha \ge 1$. But now one has to find the (probabilistic) properties of the measure $Z$ related to the random field $\{X_t, t \in G\}$ under consideration. If $\{X_t, t \in G\}$ is strictly stationary, then some special properties of $Z$ should be obtained. The following result for strictly stationary $\alpha$-stable processes answers the above question and shows the basic role of the classical theory of probability here. It is essentially a restatement of similar analyses of K. Urbanik (1968) and of Y. Hosoya (1982), given here using the new concept of strong stationarity introduced above.
divisible family to get another strongly stationary process. This enlarges the previous case and yet does not exhaust strict stationarity. We indicate this notion, introduced by Marcus (1987), to round out these ideas known at this time.

Comparing formulas (35) and (40), it is clear that the characteristic exponent $t \mapsto \psi(t)$ in $\exp(-\psi(t))$ is a nonnegative, nondecreasing function, and a similar statement holds for real symmetric infinitely divisible random variables in Lévy's form, as seen in expression (20) of Section 5.2. In detail, let $\xi$ be a symmetric infinitely divisible real random variable whose ch.f. can, therefore, be expressed as
To make effective use of the Hilbert space geometry, we take the r.v.s as complex valued (otherwise we complexify the process for convenience), so

where, as usual, the overbar denotes the complex conjugate. The process is called weakly stationary (also termed K-stationary, K for Khintchine, or stationary in the wide sense) if $m(t) = $ constant and $r(s,t) = f(s-t)$, with $f$ assumed

for almost all $s - t \in \mathbb{R}$ (Lebesgue); and if $r$ is also continuous, then (2) holds for all $s - t \in \mathbb{R}$. Here $F$ is a bounded nondecreasing nonnegative function, called
$$\sup\Big\{\sum_{i=1}^{n}\sum_{j=1}^{n}|F(A_i, A_j)| : A_i \text{ are disjoint intervals of } \mathbb{R}\Big\} < \infty. \tag{4}$$
Here $F(A, B) = \int_{\mathbb{R}}\int_{\mathbb{R}}\chi_A(\lambda)\chi_B(\lambda')\,F(d\lambda, d\lambda')$ and $F$ is again called the spectral function (bimeasure) of the process. Clearly every weakly stationary process is strongly harmonizable, as can be seen when $F(\cdot,\cdot)$ concentrates on the diagonal $\lambda = \lambda'$. A very simple harmonizable process which is not weakly stationary is the following.
Let $f \in L^1(\mathbb{R})$ and $\hat{f}$ be its Fourier transform: $\hat{f}(t) = \int_{\mathbb{R}} e^{it\lambda}f(\lambda)\,d\lambda$. If $\xi$ is an r.v. with mean zero and unit variance, and $X_t = \xi\hat{f}(t)$, then $\{X_t, t \in \mathbb{R}\}$ is such a process. Harmonizable processes are finding interesting applications in the areas mentioned above, and some basic theory of the strongly harmonizable class is given by Loève (1963).
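A concrete check of this example (our own choice of $f$): taking $f = \chi_{[0,1]}$, so that $f \in L^1(\mathbb{R})$ and $\hat{f}(t) = \int_0^1 e^{it\lambda}\,d\lambda$, the covariance of $X_t = \xi\hat{f}(t)$ is $\hat{f}(s)\overline{\hat{f}(t)}$, which depends on $s$ and $t$ separately and not only on $s - t$.

```python
import numpy as np

def f_hat(t):
    """Fourier transform of f = chi_[0,1]: the integral over [0,1] of e^{i t lambda} d lambda."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t == 0.0, 1.0, t)                 # avoid 0/0; the value at t = 0 is set below
    out = (np.exp(1j * safe) - 1.0) / (1j * safe)
    return np.where(t == 0.0, 1.0 + 0.0j, out)

def r(s, t):
    """Covariance of X_t = xi * f_hat(t) with E(xi) = 0 and E|xi|^2 = 1."""
    return f_hat(s) * np.conjugate(f_hat(t))

# Two pairs with the same difference s - t = 1 give different covariances, so r is not a
# function of s - t alone: the process is harmonizable but not weakly stationary.
print(abs(r(1.0, 0.0)), abs(r(11.0, 10.0)))
```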
Consider the simple stationary example given above. If $\mathcal{H}_1 = \overline{\mathrm{sp}}\{f_n, n \ge 0\}$ is the closed linear span in $L^2(P)$, let $Q : L^2(P) \to \mathcal{H}_1$ be the orthogonal projection with range $\mathcal{H}_1$. If $g_n = Qf_n$, then $g_n = f_n$ for $n \ge 0$, and $g_n = 0$ for $n < 0$. Even though $\{f_n, -\infty < n < \infty\}$ is a very simple weakly stationary process, the $g_n$-sequence is not strongly harmonizable. A proof is not difficult, but is nontrivial. In fact, if $\{X_t, t \in \mathbb{R}\} \subset L^2(P)$ is weakly stationary and $T$ is a continuous linear mapping of $L^2(P)$ into itself, then, letting $\tilde{X}_t = TX_t$, the process $\{\tilde{X}_t, t \in \mathbb{R}\}$ is generally not strongly harmonizable. However, every such process can be shown to be weakly harmonizable in the following sense.
A process $\{X_t, t \in \mathbb{R}\} \subset L^2(P)$ is weakly harmonizable if $E(X_t) = 0$ and its covariance can be represented as in (3), in which the spectral function $F$ is a covariance function of bounded variation in Fréchet's sense:

but now the integral here has to be defined in the (weaker) sense of M. Morse and W. Transue, and it is not an absolute integral, in contrast to Lebesgue's definition used in (3). It is clear that each strongly harmonizable process is weakly harmonizable, and the above examples show that the converse does not hold. Most of the Loève theory extends to this general class, although different methods and techniques of proof are now necessary. These processes are of considerable interest in the applications noted above. The structure theory of such processes and other results can be found in the literature [in particular, see the first author's paper (1982)], and some further details will be given in the problems section.
We present a brief characterization of weakly harmonizable processes to give a feeling for these classes and classifications, to contrast with the strict-sense classes discussed in the preceding section. Thus, we have the following direct characterization of weakly harmonizable processes, specializing Theorem 4.9 above to the present (Hilbert space) context. For simplicity we consider $G = \mathbb{R}$, and give the essential details of the proof.
$L_0^2(P)$. We use the same symbol for the extension also. Hence, by a classical Riesz representation theorem (cf. Dunford–Schwartz (1958), IV.7.3), which is seen to extend to the locally compact case (such as $\mathbb{R}$ here), there exists a unique measure $Z : \mathcal{B}(\mathbb{R}) \to L_0^2(P)$ such that
$$\int_{\mathbb{R}}\hat{f}(t)\,(\ell\circ Z)(dt) = \int_{\mathbb{R}} f(t)\,(\ell\circ X)(t)\,dt.$$
Substituting for $\hat{f}$ and using Fubini's theorem on the left side (clearly valid for signed measures), one has

so that

Since $f$ is arbitrary, its coefficient function must vanish a.e., and it actually vanishes everywhere by the (weak) continuity of that element. This establishes that $X(\cdot)$ is the Fourier transform of the (stochastic) measure $Z(\cdot)$, hence weakly harmonizable.
Conversely, let $X(\cdot)$ be weakly harmonizable, so that it is the Fourier transform of a (stochastic or vector) measure $Z(\cdot)$. We claim it is V-bounded. Using the representation of $X_t$, it is seen that

If $\ell \in (L_0^2(P))^*$, then $\ell(X)(\cdot)$ is the Fourier transform of $(\ell\circ Z)(\cdot)$, and hence, for $f \in L^1(\mathbb{R})$, it follows that, as before, for the Bochner (or vector Lebesgue) integrable $(fX)(\cdot)$, with $e_t(\lambda) = e^{i\lambda t}$,

It follows that (14) implies (cf. Definition 4.7 above) that $X$ is V-bounded, since $L_0^2(P)$ is reflexive and, as a Fourier transform of $(\ell\circ Z)(\cdot) \mapsto (\ell\circ X_t)$, it is continuous; i.e., $X(\cdot)$ is weakly continuous, completing the proof.
The same argument extends easily to any locally compact abelian group. It is not too difficult to show that the covariance function $r(\cdot,\cdot)$ of $X(\cdot)$ is representable as

the vigorous growth of stochastic theory into essentially all branches of analysis, and will now be in a position to study specialized works on these and related subjects.
Exercises
1. Find an explicit form of an $n$-dimensional density of the Brownian motion process $\{X_t, 0 \le t \le 1\}$.
2. Let $\{X_t, t \ge 0\}$ be a Gaussian process with mean function zero and covariance $r$ given by $r(s,t) = e^{-|s-t|}$, $s, t \ge 0$. If $0 < t_1 < \cdots < t_n$, show that $(X_{t_1}, \ldots, X_{t_n})$ has a density $f_{t_1,\ldots,t_n}$ given by
3. If $\{X_t, t \ge 0\}$ is the Brownian motion of Section 1 and $A_a = \{(t, \omega) : X_t(\omega) > a\}$, then verify that $A_a$ is $\lambda\times P$-measurable, where $\lambda$ is the Lebesgue measure on $\mathbb{R}_+$ and $P$ is the given probability. Thus $X = \{X_t, t \ge 0\}$ is a (jointly) measurable random function on $\mathbb{R}_+ \times \Omega \to \mathbb{R}$.
4. Prove the remark following Eq. (4.10), namely: If $\{X_t, a \le t \le b\}$ is a process such that for each $\varepsilon > 0$, $a \le t, u \le b$, $\lim_{t\to u} P[\,|X_t - X_u| > \varepsilon\,] = 0$, then the ch.f. $\phi_{t,u} : v \mapsto E(e^{iv(X_t - X_u)})$ satisfies $\lim_{t\to u}\phi_{t,u}(v) = 1$ uniformly in $v$; and conversely, the latter property implies the stochastic continuity, as given there.
6. (Converse to 5) Let $\{X_t, t \ge 0\}$ be a stochastically continuous process, $X_0 = 0$, with strictly stationary independent increments, on a probability

is a Brownian motion. (This observation is due to L.A. Shepp, and the proof is analogous to that of Theorem 1.2, but the uniform convergence is less simple.)
$$P_0'(t) = -\lambda P_0(t), \qquad P_n'(t) = -\lambda P_n(t) + \lambda P_{n-1}(t) \quad \text{for } n \ge 1 \tag{1}$$
which is the probability generating function (p.g.f.) for the system (1). Rewriting (1), on multiplying by an appropriate power of $s$, show that the partial derivatives yield the p.g.f. $\psi$, which satisfies
10. Some properties of the Poisson process $N_t$ are detailed in this problem.

(a) (Memoryless property of the exponential distribution) A random variable $T$ is said to have the memoryless property if for each $s, t \in \mathbb{R}$

for some constant in $\mathbb{R}_+$. Using this, deduce that $T$ has an exponential distribution by considering
11. (The pure birth (linear birth) process) Consider a nonnegative integer-valued process $\{X_t, t \ge 0\}$ with independent increments. A pure birth process is obtained by assuming

for $n = n_0, n_0+1, \ldots$ and $\lambda > 0$. [Compare this with equation (1) above.] Letting
$$P_n(t) = P[X_t = n \mid X_0 = n_0],$$
and in a manner analogous to the derivation of the system of differential equations for the Poisson process, show that a pure birth process $\{X_t, t \ge 0\}$ has probabilities $P_n(t)$ which satisfy
Either solving this system recursively, or using the p.g.f. method detailed in Problem 9, show

Using $P_n'(t)$, the derivative, show that the expected value $m(t) = E(X_t)$ satisfies the differential equation

exists. Then, provided that the limit and the derivative can be interchanged (formulate a simple [uniform] condition for this), show that the system (5) can be written as
with

which converges for $|z| < \infty$ if $p \le q$; and if $p + q = 1$, then the series converges if $|z| < 1$. The symbol $(a)_n$ is known as the Pochhammer notation and is defined by
$$(a)_n = \frac{\Gamma(a+n)}{\Gamma(a)}, \quad \text{with } (a)_0 = 1,$$
and
$$\sum_{j=1}^{\infty} P_j(t) = 1 - P_0(t),$$
to give $P_0'(t) = \gamma - (\alpha + \gamma)P_0(t)$. For simplicity, assume $X(0) = 0$. This implies that $P_0(0) = 1$ and $P_n(0) = 0$ for all $n \ge 1$. Show that this system of differential equations has a solution

or more succinctly
[This simple process was first considered by Swift (2000a), but has since found a wide range of additional applications. Indeed, this idea has been applied to queueing models by B.K. Kumar and D. Arivudainambi (2000) and by A. Di Crescenzo, V. Giorno, and A.G. Nobile (2003), as well as to population models by R.J. Swift (2001) and M.L. Green (2004). This simple catastrophe process has also been extended to a multiple catastrophe process by I. Chang, A.C. Krinik, and R.J. Swift (2006), although a different method of solution is required.]
Verify that the integral is well-defined (show that it does not depend upon the representation of $f$) and is $\alpha$-stable. [Here the availability of the control measure is needed.] Show that for each $0 < p < \alpha \le 2$ one has, for each simple $f_n$ in $L^{\alpha}(\nu)$,

for some constant $k(p, \alpha) > 0$ depending only on $p$ and $\alpha$. [This nontrivial fact was established independently by M. Schilder (1970) and, in an equivalent form slightly earlier, by J. Bretagnolle, D. Dacunha-Castelle and J.L. Krivine (1966).] If $f \in L^{\alpha}(\nu)$ is arbitrary, verify that there exist simple $f_n \in L^{\alpha}(\nu)$ such that $f_n \to f$ in $\alpha$-norm, $1 \le \alpha \le 2$, and $\{\int_S f_n\,dX, n \ge 1\}$ is Cauchy in $L^p(P)$. If we set
$$\int_S f\,dX = \lim_n \int_S f_n\,dX,$$
exists. (This class was also independently given by Yu. A. Rozanov and E. Parzen, the latter under the name "asymptotic stationarity".) Show that $r$ is positive definite, and hence, by the Bochner–Riesz theorem, coincides a.e. (Lebesgue) with the Fourier transform of a positive bounded nondecreasing $F$, called the associated spectral function of the process. Verify that if $X_t$ is real valued and stationary then $X_t$ belongs to class (KF), so that the stationary processes are contained in class (KF). [It can be shown that if $X_t = Y_t + Z_t$, where $Z_t$ is a zero mean stationary process and $Y_t$ a process with zero mean and periodic covariance, i.e., $r(s+k, t+k) = r(s,t)$ for some $k$, with $Y \perp Z$ (these processes are known as periodically correlated), then $X$ is in class (KF) but is not stationary, so class (KF) contains nonstationary processes.] Show that any strongly harmonizable process belongs to the class (KF). [It is also true that some weakly harmonizable processes are contained in the class (KF), but this proof is somewhat involved. In a slightly more general direction, a process $X_t$ which has covariance of the form
Billingsley, P. (1995). "Probability and Measure," 3rd ed., Wiley, New York.
Chung, K.L., and Ornstein, D.S. (1962). On the recurrence of sums of random variables. Bull. Amer. Math. Soc. 68, 30-32.
Cuzick, J., and Lai, T.L. (1980). On random Fourier series. Trans. Amer. Math. Soc. 261, 53-80.
DeGroot, M.H., and Rao, M.M. (1963). Stochastic give-and-take. J. Math. Anal. Appl. 7, 489-498.
Feller, W. (1943). The general form of the so-called law of the iterated logarithm. Trans. Amer. Math. Soc. 54, 373-402.
Hall, P., and Heyde, C.C. (1980). "Martingale Limit Theory and its Application." Academic Press, New York.
Hsu, P.L., and Robbins, H. (1947). Complete convergence and the law of large numbers. Proc. Nat. Acad. Sci. 33, 25-31.
Ibragimov, I.A., and Linnik, Ju.V. (1971). "Independent and Stationary Random Variables." Noordhoff Publishers, The Netherlands.
Ionescu Tulcea, C. (1949). Mesures dans les espaces produits. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 7, 208-211.
Ionescu Tulcea, A., and Ionescu Tulcea, C. (1969). "Topics in the Theory of Lifting." Springer-Verlag, Berlin.
Jessen, B., and Wintner, A. (1935). Distribution functions and the Riemann zeta function. Trans. Amer. Math. Soc. 38, 48-88.
Kac, M., and Slepian, D. (1959). Large excursions of Gaussian processes. Ann. Math. Statist. 30, 1215-1228.
Lai, T.L. (1974). Summability methods for i.i.d. random variables. Proc. Amer. Math. Soc. 45, 253-261.
Lévy, P. (1937; 2nd ed. 1954). "Théorie de l'addition des variables." Gauthier-Villars, Paris.
McKean, H.P., Jr. (1969). "Stochastic Integrals." Academic Press, New York.
Mooney, D.D., and Swift, R.J. (1999). "A Course in Mathematical Modeling." The Math. Assoc. of America, Washington, D.C.
Morse, M., and Transue, W. (1956). Bimeasures and their integral extensions. Ann. Math. 64, 480-504.
Neal, D.K., and Swift, R.J. (1999). Designing Payoffs for Some Probabilistic Gambling Games. Miss. J. Math. Sci. 11, 93-102.
Rao, M.M. (1962). Theory of order statistics. Math. Ann. 147, 298-312.
Rényi, A. (1960). On the central limit theorem for the sum of a random number of independent random variables. Acta Math. (Hung.) 11, 97-102.
Schilder, M. (1970). Some structure theorems for the symmetric stable laws. Ann. Math. Statist. 41, 412-421.
Switkes, J., Wirkus, S., Swift, R.J., and Mihaila, I. (2003). On the Means of Deterministic and Stochastic Populations. The Math. Sci. 28, 91-98.